Bonfyre — Technical Documentation & Benchmarks

BonfyreFPQ — Functional Polar Quantization

Compress any model ~4×. Run inference directly on compressed weights.

Pure C model compression engine. No GPU required. No Python dependencies. No training. The SLI bridge is live — native .fpq files run inference directly, no decompression step.

✓ Runtime gap closed — patch_model() and go. TinyLlama: 155 layers, cos=0.997, top-1=97.3%

0.9999 per-weight cosine

0.9976 output cosine (30 layers)

1,790 tensors compressed

54→27 GB Wan2.1-T2V-14B

15 models on Hugging Face

Three layers of correctness — all verified

Layer 1 — Weight Space

cos ≈ 0.9999

Near-lossless per-tensor across all 307 encoded tensors in Wan2.1-T2V-1.3B. Worst tensor: 0.999590.

Layer 2 — Network Propagation

cos ≈ 0.9976

Error stays controlled after 30 stacked transformer blocks. PSNR 35.97 dB. MSE 1.01e-3.

Layer 3 — System Behavior

stable × all timesteps

Cosine holds 0.9976–0.9983 across the full diffusion schedule. No drift amplification.

Models compressed — real files, real numbers

Model	Domain	Original	Compressed	Tensors	Avg Cos	Worst Cos	Avg bpw	HF Model
Wan2.1-T2V-14B	Video diffusion	54 GB	27 GB	402	0.999882	0.999826	4.05	Download →
Phi-4 (14B)	Language model	28 GB	28 GB	162	1.000614	1.000149	4.08	Download →
Whisper Large V3	Speech recognition	8.7 GB	5.8 GB	998	0.999916	0.999834	4.19	Download →
Whisper Large V3 Turbo	Speech recognition	1.6 GB	1.6 GB	228	0.999929	0.999858	4.18	Download →
Wan2.1-T2V-1.3B	Video diffusion	5.3 GB	2.7 GB	307	0.999874	0.999590	—	local
SmolLM2-135M	Language model	101 MB	258 MB F16	211	0.999855	0.999589	—	GGUF
Gemma 2B-it	Language model	—	—	sampled	0.99995	0.99995	—	local
Whisper base.en	Speech recognition	—	—	sampled	0.999808	0.999763	—	GGML

Artifacts on Hugging Face: (1) compatibility safetensors (direct Transformers load), and (2) native .fpq v12 files (rANS entropy-coded, 3.5–5.2× smaller, direct inference via SLI bridge).

Output-Level Proof — Wan2.1-T2V-1.3B DiT Forward Pass

Loaded the original Wan2.1 model and FPQ-compressed version into the same WanTransformer3DModel architecture. Fed identical synthetic inputs (seed=42, shape [1,16,1,60,104], BF16 on MPS). Compared full forward pass outputs.

Metric	Value
Cosine Similarity	0.99759
PSNR	35.97 dB
MSE	1.01e-3
Inference Time	6.18s → 6.40s

Per-channel: ch0 cos=0.9960, ch1 cos=0.9993, ch2 cos=0.9952, ch3 cos=0.9980
Max absolute error: 0.138 (on a ±0.45 std output range)
Relative error: 6.97% — well within the visually safe zone for video generation
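The figures above are standard tensor-comparison metrics. A minimal sketch of how numbers of this kind can be computed on flattened outputs, in plain Python. The helper names and toy vectors are illustrative, and peak=2.0 is an assumption: it is the value that reproduces the reported PSNR/MSE pairs, but the source does not state it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two flattened outputs."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mse(a, b):
    """Mean squared error between two flattened outputs."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr_db(mse_value, peak=2.0):
    # peak=2.0 is assumed, not taken from the source.
    return 10.0 * math.log10(peak * peak / mse_value)

reference = [0.10, -0.32, 0.45, 0.07, -0.21]    # toy stand-in for the original output
compressed = [0.11, -0.30, 0.44, 0.09, -0.20]   # toy stand-in for the FPQ output
print(f"cos={cosine(reference, compressed):.5f} mse={mse(reference, compressed):.2e}")
print(f"PSNR at MSE 1.01e-3: {psnr_db(1.01e-3):.2f} dB")
```

In a real comparison the two lists would be the flattened forward-pass outputs of the original and compressed models on identical inputs.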

Diffusion timestep sweep — Wan2.1-T2V-1.3B

Timestep	Cosine	PSNR (dB)	MSE
t = 0	0.99831	34.82	1.32e-3
t = 100	0.99792	35.49	1.13e-3
t = 500	0.99759	35.97	1.01e-3
t = 900	0.99782	36.02	1.00e-3
t = 999	0.99804	35.71	1.07e-3

Identical inputs at each timestep, BF16 on MPS. Cosine range: 0.99759–0.99831, a spread of 0.00072 with no monotonic trend across the schedule.

Perplexity benchmark — Qwen 2.5 0.5B, WikiText-2

994-token slice, max-length 512, stride 256. All runs on same hardware, same data.

Method	PPL	Δ baseline	Avg Cos	Worst Cos
Baseline (FP32)	14.20	—	1.0000	1.0000
BonfyreFPQ @3-bit	14.48	+1.97%	0.999783	0.999588
HQQ @3-bit (g64)	32.38	+128%	—	—
COORD @3-bit (v4)	35.59	+150%	0.982761	0.982327

169 tensors. HQQ via standalone benchmark (group-size 64, axis 1, CPU). Proof pack.
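For readers reproducing the setup, the window arithmetic for a 994-token slice with max-length 512 and stride 256 can be sketched as below. This assumes the common strided-evaluation convention, where each window scores only the tokens that earlier windows have not scored; the source does not say which windowing scheme the benchmark script uses, and the function names are illustrative.

```python
import math

def sliding_windows(seq_len, max_length=512, stride=256):
    """Yield (begin, end, n_scored); each window scores only new tokens."""
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        yield begin, end, end - prev_end
        prev_end = end
        if end == seq_len:
            break

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per scored token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

windows = list(sliding_windows(994))
print(windows)
print("tokens scored:", sum(n for _, _, n in windows))
```

The per-token negative log-likelihoods fed to perplexity() would come from the model under test; they are not reproduced here.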

Published 3-bit benchmarks — Llama-2-7B, WikiText-2

From authors' own papers. Lower PPL = better. FP16 baseline: 5.12.

Method	Bits	PPL	Δ FP16	Source
FP16 (baseline)	16	5.12	—	—
AQLM	3.04	5.46	+6.6%	Egiazarian et al., ICML 2024
SpQR	2.98	6.20	+21.1%	Dettmers et al., 2023
AWQ	3	6.24	+21.9%	Lin et al., MLSys 2024
GPTQ	3.00	8.06	+57.4%	Frantar et al., 2022
HQQ	3	Not published on Llama-2		Badri & Shelor, 2023

Recent additions — runtime and system surfaces

The newer work is not just “more helper modules.” It broadened both the shared runtime and the system-facing binaries around proxy, orchestration, communications, realtime relay, and delivery.

bf_opts — CLI parsing

Lightweight, getopt-style parser with subcommand support, typed options, and automatic `--help` generation for each binary.

bf_pool — Thread pool

Per-process worker pool with task queues, priorities, and cooperative shutdown hooks used by servers and background jobs.

bf_utf8 — SIMD UTF‑8 validator

High-speed UTF‑8 validator with SSE/NEON paths for safe, zero-copy text ingestion and pipeline sanitization.

bf_conf — Config parser

Simple INI/TOML-style parser with typed readers and layered includes for runtime configuration and hardware profiles.

bf_lz4 — Compression adapter

Thin LZ4 wrapper with streaming encode/decode helpers used by artifact transport and the local cache.

bf_hashmap — Robin‑Hood map

Cache-friendly hash table with robin‑hood probing and predictable iteration used in in-memory indexes and caches.

bf_csv — CSV reader/writer

Low-allocation CSV I/O with typed columns and dialect handling for bulk import/export tasks.

bf_json / bf_msgpack

Fast JSON helpers and a MessagePack event encoder/decoder for compact telemetry, ledgers, and manifest interchange.

bf_crypto

Portable BLAKE2b + XSalsa20‑Poly1305 wrappers for authenticated encryption, with password‑hash and signature helpers.

bf_bloom

Small Bloom filter utilities for single-pass dedup and membership checks in ingestion and pipeline stages.

bf_dns

Asynchronous DNS resolver with a tiny LRU cache and reactor integration for network-facing binaries.

bf_log

Structured, ring-buffer logging with per-thread buffers and file rotation for low-latency telemetry.

bf_mmap

Zero-copy mmap helpers with sensible madvise hints and safe fallbacks for pipes and small files.

bf_lmdb

LMDB-backed artifact cache with zero-copy reads and named sub-databases for manifests, blobs, and families.

bf_pipeline_coro / bf_textfuse

Coroutine-based pipeline primitives (libdill style) and a Hyperscan-powered single-pass text fusion engine for fast, composable pipeline stages.

bf_flatbuf

FlatBuffers builders/readers for compact artifact manifests and zero-copy transport between components.

bf_picohttpparser + mimalloc integration

SIMD-accelerated HTTP parsing in the reactor and optional per-reactor mimalloc arenas to reduce contention on high-concurrency servers.

bonfyre-proxy

OpenAI-compatible local API shim for transcriptions, chat-style summarization, and model listing. Useful when existing clients need to keep working while Bonfyre takes over the backend path.

bonfyre-orchestrate

Machine-only planner with policy memory, typed request state, and optional Gemma/OpenAI-compatible assist. Decides when extra Bonfyre blocks should be used instead of exposing a prompt UI.

bonfyre-tel / bonfyre-moq

Communications edge plus browser-native realtime relay. SIP voice/SMS/MMS on one side, MoQ/WebTransport on the other, both kept in the same C-first operating model.

bonfyre-swarm / bonfyre-narrate

Metered artifact distribution and verified audio output. One expands how artifacts move; the other expands how finished outputs are published and checked.


What's inside

Low-Rank SVD

Global structure extraction

E8 Lattice

Optimal 8D quantization

16D RVQ

Structured residual correction

Ghost Head

Rank-1 error correction

GGUF format support (llama.cpp compatible)

Reads & dequantizes

F32, F16, Q4_0, Q5_0, Q8_0
Q4_K, Q5_K, Q6_K

Writes

GGUF v3 F16 — direct llama.cpp load
Preserves all metadata verbatim

Quick start

bonfyre-fpq quantize model.gguf compressed.gguf --bits 3

Input formats:
GGUF (llama.cpp, whisper.cpp)
Safetensors (HuggingFace)
GGML (legacy whisper)

Output formats:
GGUF F16 → llama.cpp direct load
BF16 safetensors → PyTorch/diffusers
Preserves all metadata + tokenizer

SLI Bridge — Direct Runtime Inference

Direct runtime inference from .fpq is now working. Load a compressed model and run it immediately — no conversion, no extra RAM, no hacks. The SLI bridge (Spectral Lattice Inference) is fully integrated.

Results (TinyLlama, 155 SLI layers):
• 97.3% top-1 agreement vs original
• 0.997 cosine similarity
• 2/5 text matches (identical output)
(Full logs & proof pack)

Usage: patch_model(hf_model, fpq, resolver) — replaces nn.Linear layers with FPQLinear. No decode step, no weight copy.

FPQ-X — Generalized Compression Algebra · All 6 Operators Live

Six operators. One compiler. Rate–distortion–execution optimized.

FPQ-X evolves BonfyreFPQ from a quantizer into a full compression algebra. All six operator families are implemented and validated in fpqx_ops.c + fpq_bridge.py.

✓ A — Additive ✓ M — Multiplicative Row Scale ✓ Π — Predictive ✓ D — Distilled ✓ Λ — Adaptive Policy ✓ H — NEON Packing

𝒯(x,c,h,t) = (B + R + P) ⊙ S + Π(x,c,h,t) + Δ_seq(c,t)

A = Additive core · M = Multiplicative manifold · Π = Predictive restoration · D = Sequence distillation

Six operator families — each derived from 2026 published research

Additive

Inherited from FPQ v10

Low-rank SVD + E8 lattice + 16D RVQ + QJL projection + Ghost correction. The proven foundation delivering 0.999+ cosine across 1,790 tensors.

Multiplicative

Low-rank scaling manifold

Learns S = I + AB^T via thin SVD of the ratio matrix Q = W/Ŵ − 1. Captures scaling distortion that additive methods miss. Auto-rollback if cosine doesn't improve.

Derived from: LoRDS, WaterSIC
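A minimal sketch of the multiplicative idea, under stated assumptions: rank 1 only, power iteration standing in for the thin SVD, the correction applied element-wise as Ŵ ⊙ (1 + a bᵀ), and no zero entries in Ŵ. All function names are hypothetical and this is not the code in fpqx_ops.c.

```python
import math

def cosine(A, B):
    """Cosine similarity between two matrices, treated as flat vectors."""
    a = [x for row in A for x in row]
    b = [x for row in B for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))

def rank1_factor(Q, iters=50):
    """Dominant rank-1 factor Q ~ a b^T by power iteration."""
    rows, cols = len(Q), len(Q[0])
    a, b = [0.0] * rows, [1.0] * cols
    for _ in range(iters):
        a = [sum(Q[i][j] * b[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(x * x for x in a)) or 1.0
        a = [x / norm for x in a]          # unit-norm left factor
        b = [sum(Q[i][j] * a[i] for i in range(rows)) for j in range(cols)]
    return a, b                            # b carries the singular value

def apply_m_scale(W, W_hat):
    """Return (matrix, used); falls back to W_hat if cosine does not improve."""
    Q = [[w / wh - 1.0 for w, wh in zip(rw, rh)] for rw, rh in zip(W, W_hat)]
    a, b = rank1_factor(Q)
    scaled = [[wh * (1.0 + a[i] * b[j]) for j, wh in enumerate(row)]
              for i, row in enumerate(W_hat)]
    if cosine(W, scaled) > cosine(W, W_hat):
        return scaled, True
    return W_hat, False                    # auto-rollback

W_hat = [[1.0, 2.0], [3.0, 4.0]]
W = [[1.05, 1.94], [3.30, 3.76]]  # W_hat * (1 + u v^T), u=(0.1, 0.2), v=(0.5, -0.3)
corrected, used = apply_m_scale(W, W_hat)
print(used, corrected)
```

The toy W is constructed so that the scaling distortion is exactly rank 1, which is the case a purely additive residual handles least efficiently.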

Predictive

Context-conditioned restoration

Per-column linear predictor from the low-rank basis to the quantization residual. Uses already-available L factor to predict and cancel systematic error.

Derived from: EchoKV, MoBiQuant

Distilled

Sequence-axis compression

Attention-weighted K-means++ on KV cache vectors. Compresses along the sequence dimension — tokens that attend similarly share one cache atom. Orthogonal to weight quantization.

Derived from: KVSculpt, KV-CoRE

Adaptive

Per-tensor policy selection

Profiles each tensor: η_L (low-rank energy), spectral gap, kurtosis, outlier fraction. Decision tree selects which operators to activate and at what rank.

Derived from: KV-CoRE, MoBiQuant
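Two of the profile statistics named above, kurtosis and outlier fraction, can be sketched in a few lines. The 3-sigma outlier rule, the thresholds, and the two-rule policy are placeholders chosen for illustration; the actual Λ decision tree and its cut points are not given in the source.

```python
import math

def profile_tensor(values, outlier_sigma=3.0):
    """Kurtosis and outlier fraction of a flattened tensor."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    std = math.sqrt(var)
    kurtosis = m4 / (var * var) if var > 0 else 0.0   # about 3.0 for a Gaussian
    outliers = sum(1 for v in values if abs(v - mean) > outlier_sigma * std) / n
    return {"kurtosis": kurtosis, "outlier_fraction": outliers}

def choose_operators(profile, kurtosis_cut=6.0, outlier_cut=0.01):
    """Hypothetical policy: always run A, add M for heavy-tailed tensors."""
    ops = ["A"]
    if profile["kurtosis"] > kurtosis_cut or profile["outlier_fraction"] > outlier_cut:
        ops.append("M")
    return ops

flat = [-1.0, 1.0] * 50
spiky = [0.0] * 99 + [10.0]
print(choose_operators(profile_tensor(flat)), choose_operators(profile_tensor(spiky)))
```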

Hardware

Kernel-aligned packing

Inner-group quantization that aligns bit boundaries to SIMD lanes. Stores scales per group, enabling vectorized unpacking without scatter/gather overhead.

Derived from: InnerQ, High-Rate QMM
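The group-scale idea can be illustrated with a byte-aligned toy. This sketch assumes signed 4-bit codes packed two per byte with one scale per group; the real H operator targets NEON lanes and other bit widths, and its on-disk layout is not described in the source. At 4 bits, a 32-value group fills exactly one 128-bit register.

```python
def pack_group_4bit(values):
    """Quantize one group to signed 4-bit codes with a shared scale, two per byte."""
    assert len(values) % 2 == 0
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(v / scale))) & 0x0F for v in values]
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return scale, packed

def unpack_group_4bit(scale, packed):
    """Inverse of pack_group_4bit: sign-extend each nibble and rescale."""
    out = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            signed = nibble - 16 if nibble >= 8 else nibble
            out.append(signed * scale)
    return out

group = [0.7, -0.7, 0.1, 0.0]
scale, packed = pack_group_4bit(group)
print(len(packed), unpack_group_4bit(scale, packed))
```

Because every code sits at a fixed offset inside its byte, unpacking is a mask and a shift per lane with no data-dependent addressing.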

The FPQ-X encode pipeline

1. Λ Profile → 2. BWA Prune → 3. A Encode (v9) → 4. M Scale → 5. Π Predict

Each stage has automatic quality rollback — if an operator doesn't improve cosine by >1e-7, it's disabled for that tensor.
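The rollback rule amounts to a one-line acceptance test per stage. A sketch with hypothetical names; only the 1e-7 threshold comes from the text.

```python
MIN_GAIN = 1e-7  # threshold quoted in the text

def run_stage(name, cos_before, cos_after, enabled):
    """Keep a stage only if it lifts cosine by more than MIN_GAIN."""
    if cos_after - cos_before > MIN_GAIN:
        enabled.append(name)
        return cos_after
    return cos_before  # rollback: operator disabled for this tensor

enabled = []
cos = run_stage("M", 0.9990, 0.9995, enabled)      # clear gain, kept
cos = run_stage("Pi", cos, cos + 5e-8, enabled)    # gain below threshold, rolled back
print(enabled, cos)
```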

FPQ v10 vs FPQ-X

Dimension	FPQ v10	FPQ-X
Error model	Additive only (W ≈ Ŵ)	Additive × Multiplicative + Predictive
Per-tensor policy	Same pipeline for all	Λ profiles η_L, gap, kurtosis → selects operators
KV cache	Weight-only quantization	D operator: sequence-axis distillation
Hardware awareness	Generic packing	H operator: SIMD-lane-aligned groups
Objective	min ‖W − Ŵ‖	min λ_R·Rate + λ_D·Distortion + λ_E·Execution
Research basis	Original FPQ design	9 papers from early 2026

bonfyre-fpqx CLI


# Full A+M+Π pipeline
bonfyre-fpqx compress model.safetensors compressed.safetensors --bits 3

# Roundtrip quality test
bonfyre-fpqx roundtrip model.safetensors --bits 3

# Per-tensor compressibility analysis
bonfyre-fpqx profile model.safetensors

# KV cache distillation
bonfyre-fpqx distill cache.safetensors distilled.safetensors --atoms 256

# Hardware-aligned repacking
bonfyre-fpqx pack model.safetensors packed.safetensors --bits 3 --group-size 128

Research foundation — 9 papers synthesized

LoRDS

Multiplicative low-rank scaling

arXiv:2601.22716

WaterSIC

Activation-aware rate–distortion

arXiv:2603.04956

EchoKV

Predictive KV reconstruction

arXiv:2603.22910

KVSculpt

Attention-weighted cache distillation

arXiv:2603.27819

KV-CoRE

Data-dependent compressibility

arXiv:2602.05929

InnerQ

Hardware-aligned inner quantization

arXiv:2602.23200

MoBiQuant

Token-adaptive mixed precision

arXiv:2602.20191

High-Rate QMM

Activation-weighted matrix multiply

arXiv:2601.17187

Codebook Opt.

Optimal codebook initialization

arXiv:2602.06557

KV Cache Compression — 9 Optimizations

Baseline cosine numbers (bonfyre-kvcache C benchmark). All 9 Python optimizations live in fpq_bridge.py.

Bits	KV Cosine	Hardware implication
5-bit	0.99996	~3.2× more context: 8K ctx → ~26K in same VRAM
4-bit	0.99994	4× more context — recommended for production
3-bit	0.99990	5.3× context, some quality loss on long sequences
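The context multipliers follow from the ratio of baseline to compressed bit width. A back-of-envelope sketch, assuming a 16-bit baseline and ignoring codebook and scale overhead (both assumptions; the function name is illustrative):

```python
def context_multiplier(kv_bits, baseline_bits=16):
    """How many times more tokens fit in the same memory budget."""
    return baseline_bits / kv_bits

for bits in (5, 4, 3):
    print(f"{bits}-bit KV: {context_multiplier(bits):.1f}x context")
```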

#3 Attention-Weighted Tiles

High-attention blocks dominate tile assignment. Codebook quality concentrates where the model actually looks.

#4 Per-Layer Adaptive Bits

Λ-profiler analyzes each K/V layer — kurtosis, spectral gap, outlier fraction — to pick the right bit depth automatically.

#5 Cross-Layer Shared Codebook

One 256-tile codebook learned across 8 sample layers. Per-call K-means is skipped, so codebook training is paid once and amortized across all layers.

#6 D-Operator Distillation

K-means++ on KV vectors to K atoms (K ≪ N). Tokens that attend similarly share one atom, assigned by nearest-centroid lookup.

#7 Delta Encoding

Only the delta vs previous frame is compressed. Each new token costs far less than storing a new frame.
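A sketch of the delta idea, with a uniform integer quantizer standing in for the real codec (an assumption, as are the function names and the step size). Each delta is taken against the previous reconstruction rather than the previous input, so quantization error does not accumulate across frames.

```python
def encode_deltas(frames, step=0.01):
    """Quantize each frame as integer deltas against the previous reconstruction."""
    prev = [0.0] * len(frames[0])
    codes = []
    for frame in frames:
        delta = [round((x - p) / step) for x, p in zip(frame, prev)]
        codes.append(delta)
        prev = [p + d * step for p, d in zip(prev, delta)]  # mirror the decoder state
    return codes

def decode_deltas(codes, step=0.01):
    """Rebuild frames by accumulating the integer deltas."""
    prev = [0.0] * len(codes[0])
    frames = []
    for delta in codes:
        prev = [p + d * step for p, d in zip(prev, delta)]
        frames.append(prev)
    return frames

frames = [[0.50, -0.20], [0.52, -0.21], [0.55, -0.19]]
codes = encode_deltas(frames)
print(codes)
```

After the first frame the codes are small integers, which is what makes each new token cheap to store.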

#8 Huffman PMF Weighting

E8 coordinate magnitude as Huffman code length proxy. High-cost blocks get upweighted — rate-quality jointly optimized.

#9 LT_SMALL_INT Fast Path

Near-zero blocks (max abs ≤ 63) bypass E8 lattice — 7-bit integer round + clamp. Significant throughput win on embedding layers.
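The fast-path test is a magnitude check followed by round and clamp. A sketch that assumes block values are already in the quantizer's integer-scaled units; the E8 path is represented only by a placeholder, and the names are hypothetical.

```python
SMALL_INT_LIMIT = 63  # "max abs <= 63" from the text; fits a signed 7-bit integer

def encode_block(block):
    """Return ("small_int", codes) for near-zero blocks, else ("lattice", block)."""
    if max(abs(v) for v in block) <= SMALL_INT_LIMIT:
        codes = [max(-SMALL_INT_LIMIT, min(SMALL_INT_LIMIT, round(v))) for v in block]
        return "small_int", codes
    return "lattice", list(block)  # placeholder: E8 encoding not reproduced here

print(encode_block([0.4, -2.6, 62.9]))
print(encode_block([100.0, 1.0])[0])
```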

#10 M-Operator Row Scale

Per-row scale vector on each FPQLinear. Applied after SLI matmul: corrects per-output-channel amplitude drift.
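A sketch of a per-row scale applied after the matmul. The least-squares fitting rule shown here is an assumption (the source does not say how the scale vector is obtained), and the function names are illustrative.

```python
def fit_row_scale(W, W_hat):
    """Least-squares scale per output row: s_i = <w_i, h_i> / <h_i, h_i>."""
    scales = []
    for w_row, h_row in zip(W, W_hat):
        denom = sum(h * h for h in h_row)
        scales.append(sum(w * h for w, h in zip(w_row, h_row)) / denom if denom else 1.0)
    return scales

def scaled_linear(W_hat, row_scale, x):
    """y_i = s_i * (h_i . x): matmul on the quantized weights, then rescale."""
    return [s * sum(h * xi for h, xi in zip(row, x)) for row, s in zip(W_hat, row_scale)]

W = [[1.1, 2.2], [2.7, 3.6]]        # original rows
W_hat = [[1.0, 2.0], [3.0, 4.0]]    # quantized rows with amplitude drift
s = fit_row_scale(W, W_hat)
print(s, scaled_linear(W_hat, s, [1.0, 1.0]))
```

Rescaling after the matmul costs one multiply per output channel, independent of the input width.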

#11 H-Operator NEON Packing

ARM NEON 128-bit aligned pre-packing. Eliminates scatter/gather — vectorized unpacking on Apple Silicon and Jetson.

Use individually or compose: patch_kv_cache(adaptive_bits=True, shared_tiles=tiles) activates #4 + #5 simultaneously.

Hardware Fits & Pricing

What hardware runs what — and what it costs.

FPQ weight compression + KV cache optimizations change the hardware equation. What used to need cloud GPUs now fits on-device. Use cases shift depending on your hardware budget.

Device	RAM	Approx. Cost (2026)	Before (BF16)	After (FPQ 4-bit + KV)
Raspberry Pi 5	8 GB	$80	TinyLlama only, 512-token ctx, no video	TinyLlama + 2K ctx, Whisper turbo inference, local ASR pipeline
Jetson Orin Nano	8 GB	$250	Qwen 0.5B only, degraded at >512 ctx	Qwen 0.5B @ 4K ctx · NEON packing · embeddings + FPQ inference co-resident
Apple M1 MacBook (16 GB)	16 GB unified	$900 refurb	Qwen 0.5B (tight), Wan 1.3B (no headroom)	Wan 1.3B + 4K ctx KV · HCP speech + SLI co-resident · NEON-packed
Apple M2/M3 Max (64 GB)	64 GB unified	$2,500–3,500	Phi-4 14B (no KV headroom past 2K)	Wan 14B @ 8K ctx · Phi-4 @ 32K ctx with delta KV · full pipeline concurrent
T4 cloud (16 GB VRAM)	16 GB VRAM	~$0.35/hr spot	Qwen 3B, 2K ctx max before OOM	Qwen 3B @ 8K ctx · Wan 1.3B + full diffusion sweep · shared KV codebook
RTX 4090 (24 GB VRAM)	24 GB VRAM	$1,600 GPU / ~$0.50/hr cloud	Wan 1.3B (tight), Phi-4 14B doesn't fit	Wan 1.3B @ 32K ctx · Phi-4 14B fits · adaptive bits saves ~30% KV RAM
RTX 6000 Ada (48 GB VRAM)	48 GB VRAM	~$1.10/hr (RunPod)	Wan 14B (tight), long video sequences OOM	Wan 14B · 287 SLI layers @ 5-timestep sweep · multi-second video KV cached

Budget tiers — what opens up at each price point

Under $100

Raspberry Pi 5

Local ASR with Whisper turbo. TinyLlama inference. Bonfyre pipeline binaries. Edge transcription kiosk.

$250 – $1,000

Jetson Orin / M1 Mac

Qwen 0.5B + 4K ctx. Wan 1.3B video. HCP speech + SLI co-resident. NEON packing. Full local pipeline.

$2,500 – $3,500

M2/M3 Max

Wan 14B @ 8K ctx. Phi-4 @ 32K ctx with delta KV. Full Bonfyre pipeline + inference concurrent. Production-grade local stack.

Cloud spot ($0.35–$1.10/hr)

T4 / RTX 4090 / RTX 6000

Burst GPU for video generation, large model inference, SLI sweeps. Use FPQ to fit bigger models on cheaper instances. T4 now handles what used to need A100.

Weight footprint

~2.2 GB → 1.1 GB

TinyLlama 1.1B — stays in .fpq at runtime, no decode step. BF16 copy never materializes.

KV context scaling

8K → 32K tokens

4-bit KV compression in same VRAM budget. Delta encoding makes each new token incremental.

ARM throughput

NEON 128-bit

H-operator pre-packing on Apple Silicon and Jetson — vectorized unpacking, no scatter/gather.

Co-residency

Inference + pipeline

FPQ model + HCP speech + vector search + pipeline can run concurrently on a 16 GB Mac.

Metric	Before (P0)	After (P5)	Improvement
Single embed	~600 ms	237 ms	2.5×
10-file batch embed	~6,000 ms	386 ms	15.5×
Pipeline (6 stages)	76 ms	8 ms	9.5×
Tag inference	~150 ms	6 ms	25×
Hash hex	~100 ns	~10 ns	~10×
Artifact struct	1,076 bytes	536 bytes	2× cache density
Vector file (384-dim)	6.4 KB JSON	1,544 bytes VECF	4.2× smaller

BonfyreTel — Realtime Relay

Pure C realtime media relay. Same Bonfyre philosophy.

bonfyre-moq is the C-first realtime transport layer for BonfyreTel. Built on ngtcp2 + nghttp3 + OpenSSL 3 + SQLite, it gives Bonfyre a browser-native relay primitive that matches the rest of the stack: small binary, local control, SQLite observability, and no dependency on a hosted realtime vendor.

✓ MoQ-Transport draft-14 ✓ Pure C11 ✓ SQLite stream log ✓ WebTransport session termination ✓ Subscriber fan-out ✓ Zero-Copy Object Forwarding

Core pieces first, optional experiments second

Transport Core

bonfyre-moq.c

Handles QUIC, HTTP/3, WebTransport session setup, MoQ control parsing, subscriber fan-out, and SQLite stream/session logging in one C binary.

ngtcp2 + nghttp3 + OpenSSL 3 + SQLite

Observability

transport_sessions + stream_events

Uses the same Bonfyre style as the rest of the stack: log the useful state to SQLite, keep the binary understandable, and make downstream processing compose cleanly with other tools.

session.open · broadcast.announce · track.forward.stop

C-First Direction

bonfyre-moq is the product path

The goal is not “a Node relay with a C rewrite later.” The goal is all C. The Node moq-edge subtree is useful as a spec/reference harness, but the main narrative and target runtime stay native.

make bonfyre-moq

Experimental Hooks

inference.c / optimizer.c / mesh.c / consensus.c

These modules exist, but they should be read as optional experiments around the relay, not the core promise. The core promise is simpler: private realtime media transport in a small native binary.

keep the core path understandable first

Relay internals

QUIC / WebTransport (ngtcp2+nghttp3)
↳ MoQ control parsing → publish / subscribe / namespace tracking
↳ zero-copy forward → subscriber fan-out
↳ SQLite stream log → session + event visibility for the rest of Bonfyre
kqueue/epoll event loop · graceful drain · optional experimental extension hooks

Build & run


# Build the native relay
make bonfyre-moq

# Run relay
./bonfyre-moq --host 127.0.0.1 --port 4443 \
    --runtime-dir /tmp/bonfyre-moq \
    --db /tmp/bonfyre-moq/relay.db

# Optional extension-module check
make test-bonfyre

C11: No Node.js

MoQ: Browser-native transport

SQLite: Realtime event log

Small: Fits Bonfyre narrative

C-first: Main product path

Stack Binaries And Libraries

The stack now spans 51 focused binaries shown here, plus shared libraries. Every piece is standalone enough to use alone. ~2.1 MB total disk.

Substrate (9 binaries) — cold, stable infrastructure. Never imports product concepts. Content-addressed output.

bonfyre-ingest 35 KB — universal intake: type detection, BOM stripping, CRLF→LF, inline SHA-256 (no double-read), manifest stamping. PURE|CACHEABLE|IDEMPOTENT.

bonfyre-hash 34 KB — FIPS 180-4 SHA-256 content addressing. Streaming hash over arbitrary-size inputs. Dedup via Bloom filter (8 KB, k=7, 0.01% FPR).

bonfyre-index 68 KB — SQLite artifact index with FTS5 virtual table for full-text search. Family-key grouping for structural queries.

bonfyre-compress 34 KB — family-aware zstd with dictionary training from sample corpus. Per-family ratio reporting. Level tuning (1-19). Dict library management.

bonfyre-stitch 34 KB — DAG materializer: execution plan generation, dead-branch pruning, cache hit statistics.

bonfyre-graph 51 KB — Merkle-DAG artifact graph with breadth-first materialization. SHA-256 interior nodes.

bonfyre-runtime 34 KB — process lifecycle manager: dispatches queue/pipeline/ledger subprocesses.

bonfyre-queue 34 KB — SQLite WAL job queue: concurrent writers, per-worker thread pool (default 4), FTS5 result search, webhook dispatch, exponential backoff retry.

bonfyre-sync 34 KB — cross-instance artifact replication with sync manifests.

Transform (22 binaries) — pure, cacheable, stateless. Same input → same output. Cache key = (operator, params, input_hash).

bonfyre-media-prep 34 KB — 16 kHz mono WAV normalization, noise gate, denoising via ffmpeg resampling.

bonfyre-transcribe 34 KB — Whisper + HCP v3.2: quad-channel spectral (acoustic + morpho + bigram + trigram), KIEL-CC Kalman, E-T Gate, formant anchoring, 9-layer hallucination detection, morphological logit bias. 0.999 quality (base).

bonfyre-transcript-clean 34 KB — Hyperscan single-pass >1 GB/s pattern fusion (6 classes: filler/halluc/keyword/quality/tag/chunk-hdr) or Aho-Corasick fallback 200 MB/s.

bonfyre-paragraph 35 KB — sentence boundary detection with structural paragraph formatting.

bonfyre-brief 34 KB — executive summary + action items via TF-IDF keyword extraction + BM25 ranking (Robertson '94, document-length normalization).

bonfyre-proof 34 KB — quality scoring: length, filler ratio, hallucination probability, BM25 confidence metrics. Machine-verifiable proof bundle.

bonfyre-embed 52 KB — ONNX all-MiniLM-L6-v2 (384-dim), trie tokenizer, batch mode (6.5× faster), --insert-db zero-file-I/O path.

bonfyre-vec 35 KB — sqlite-vec similarity search with NEON SIMD cosine. 5 ms exact search over 3K artifacts.

bonfyre-narrate 68 KB — Piper neural TTS with 6-layer fidelity verification.

bonfyre-render 34 KB — Handlebars/Jinja2 template rendering from artifact manifests.

bonfyre-emit 34 KB — multi-format output via pandoc: HTML, PDF, EPUB, RSS.

bonfyre-mfa-dict 34 KB — CMU dict + custom rules for MFA pronunciation dictionary generation.

bonfyre-weaviate-index 34 KB — Weaviate vector database integration for semantic search.

bonfyre-transcript-family 34 KB — composed chain: ingest → transcribe → clean → paragraph in one call.

bonfyre-repurpose 34 KB — brief → social formats: tweets, LinkedIn, carousel, YouTube description, newsletter. Template-based.

bonfyre-segment 50 KB — idea boundary detection via temporal clustering + energy peaks. Outputs segment graph + rhythm profile.

bonfyre-clips 35 KB — auto clip discovery: silence detection + energy thresholding → candidate timestamps.

bonfyre-speechloop 34 KB — Whisper → transform → Piper feedback loop for iterative speech refinement.

bonfyre-tone 34 KB — OpenSMILE eGeMAPSv02: 88 acoustic features (MFCCs, pitch, jitter, shimmer, voiced energy). Tone profile + diff.

bonfyre-tag 35 KB — native C fastText inference: topic + intent tagging + language detection. 2 ms, no GPU.

bonfyre-quant 42 KB — v8 RLF weight quantization: E8 lattice snap + μ-law warp + 16D RVQ. 0.9999+ per-weight cosine.

bonfyre-kvcache 42 KB — KV cache compression: 9 optimization passes, attention-weighted tiles, per-layer adaptive bits.

Surface (11 binaries) — product-facing, stateful services. Own mutable state. HTTP/CLI interfaces.

bonfyre-cms 287 KB — dynamic schemas + versioning. Every record is Lambda Tensors-compressed. REST API. 921 μs/create. Replaces Strapi (1,742× smaller).

bonfyre-api 69 KB — async HTTP gateway: per-core kqueue/io_uring reactors, SSE real-time streams, token-bucket rate limiting (120 req/min), CORS, SPA fallback.

bonfyre-auth 35 KB — user signup/login with SHA-256 passwords, session tokens, expiry management.

bonfyre-pipeline 52 KB — unified in-process pipeline via libdill coroutines (~50 ns context-switch). Gate → Ingest → Hash → Index → Compress → Meter → Stitch → Ledger. 5-8 ms/stage, >93% faster than forking.

bonfyre 34 KB — unified CLI dispatcher: pipes to any binary via subcommand.

bonfyre-project 34 KB — project scaffolding with template-based directory generation.

bonfyre-tel 68 KB — communications edge runtime: FreeSWITCH ESL voice/SMS/MMS today, with persistent topics, identities, intents, and transport-agnostic event state underneath.

bonfyre-moq ~120 KB — pure C WebTransport/MoQ relay: ngtcp2 + nghttp3 + OpenSSL 3, SQLite stream log, subscriber fan-out, zero-copy object forwarding.

bonfyre-canon 35 KB — structural canonicalization via Tree-sitter AST parsing. Produces structural hash + structural diff.

bonfyre-proxy 53 KB — OpenAI-compatible API: set OPENAI_API_BASE=localhost:8787. Existing code works unchanged. Drop-in replacement.

bonfyre-orchestrate 45 KB — machine-only planner with typed request state, policy memory, feedback scoring, and optional Gemma/OpenAI-compatible assist.

Value (9 binaries) — monetization, metering, delivery. Separate privilege boundary.

bonfyre-offer 34 KB — quality-based dynamic pricing: proof-score → tier options → proposal generation.

bonfyre-gate 34 KB — API key tiers (Free/Pro/Enterprise): entitlement checking, key file validation (JSON + expiry + allowed_ops whitelist).

bonfyre-meter 34 KB — per-operation usage tracking: operation counters, billing aggregation.

bonfyre-ledger 34 KB — append-only financial records: immutable log, audit trail.

bonfyre-finance 51 KB — service arbitrage + bundle pricing: offer + meter → revenue model.

bonfyre-outreach 51 KB — outreach event tracking + follow-up routing.

bonfyre-pay 35 KB — invoicing, payments, credits: entry point for Stripe/Cash integration.

bonfyre-pack 34 KB — deliverable packaging: ZIP + JSON manifest for complete artifact families.

bonfyre-distribute 34 KB — async delivery to email, Slack, webhooks. Fire-and-forget with receipt tracking.

Libraries

libbonfyre ~180 KB — shared runtime with 26 modules across 10 domains (see below)

liblambda-tensors 72 KB — structural JSON compression

libbonfyre — 26 modules, 10 domains

Every binary links libbonfyre.a (~180 KB). Zero external dependencies beyond libc + SQLite. All modules are C11, portable across macOS/Linux/BSD, with SIMD fast paths on ARM NEON and x86 SSE4.2 where applicable.

Core (4)
bf_artifact — content-addressed manifest: FNV-1a-64 family/canonical keys, binary cache (.bfrec/.bfsum), parse/serialize
bf_operators — 51-operator registry with O(1) lookup, behavioral flags (PURE|STATEFUL|CACHEABLE|REVERSIBLE|IDEMPOTENT), exactness classes
bf_sha256 — FIPS 180-4 streaming SHA-256
bf_common — dir creation, file I/O, timestamps, FNV-1a, JSON field extraction

Networking (3)
bf_reactor — per-core async I/O (kqueue on macOS, io_uring on Linux 5.1+). Zero-copy recv via registered buffers, SSE persistent read, token-bucket rate limiting
bf_dns — async DNS resolver
bf_picohttpparser — zero-copy HTTP/1.1 request parser
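Token-bucket rate limiting, as used by the reactor, can be sketched as follows. The 120-per-minute figure is the bonfyre-api limit quoted earlier; treating 120 as the burst capacity as well is an assumption, and the class is illustrative rather than the bf_reactor API.

```python
class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously over `per_seconds`."""

    def __init__(self, capacity=120, per_seconds=60.0, now=0.0):
        self.capacity = float(capacity)
        self.rate = capacity / per_seconds   # tokens per second
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        """Refill for the elapsed time, then spend one token if available."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket()
print(sum(bucket.allow(0.0) for _ in range(130)), "of 130 burst requests admitted")
```

Timestamps are passed in explicitly so the sketch stays deterministic; a server would read a monotonic clock instead.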

Crypto (2)
bf_crypto — BLAKE2b hash/HMAC/KDF, XSalsa20-Poly1305 AEAD secretbox, Ed25519 signing. Portable impl + libhydrogen/libsodium delegation
bf_bloom — split-block Bloom filter: 8 KB fixed, k=7 hashes, FNV-1a seeded, ~3K items @ 0.01% FPR by the textbook estimate for m = 65,536 bits (roughly 5% at 10K items)
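Capacity at a given false-positive rate can be sanity-checked with the textbook estimate (1 − e^(−kn/m))^k. This models a standard Bloom filter; a split-block layout only approximates it, so the numbers are a guide rather than a statement about bf_bloom itself. Function names are illustrative.

```python
import math

M_BITS = 8 * 1024 * 8   # 8 KB filter
K_HASHES = 7

def bloom_fpr(n_items, m_bits=M_BITS, k_hashes=K_HASHES):
    """Textbook false-positive estimate for a standard Bloom filter."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes

def max_items(target_fpr, m_bits=M_BITS, k_hashes=K_HASHES):
    """Largest n that keeps the estimate at or below target_fpr."""
    per_hash = target_fpr ** (1.0 / k_hashes)
    return int(-m_bits / k_hashes * math.log(1.0 - per_hash))

print(f"items at 0.01% FPR: {max_items(1e-4)}")
print(f"FPR at 10,000 items: {bloom_fpr(10_000):.2%}")
```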

Serialization (3)
bf_json — SIMD JSON engine (yyjson-inspired): 16-byte aligned scanning, SSE4.2/NEON acceleration, lazy number parsing, zero-copy strings
bf_flatbuf — FlatBuffers zero-copy serialization: read directly from wire buffer without unpacking
bf_msgpack — MessagePack binary serialization

Concurrency (2)
bf_pool — Chase-Lev work-stealing thread pool: lock-free per-worker deques (C11 atomics), idle workers steal from random peers, auto core detection
bf_pipeline_coro — structured concurrency via libdill: lightweight coroutines, typed channels, deadline API. ~50 ns context-switch vs ~5 μs fork

Data & I/O (4)
bf_lmdb — zero-copy LMDB cache: 512 MB default, 4 named DBs (artifacts/families/blobs/meta), pointer casts into mmap (no deserialization)
bf_mmap — portable mmap: MAP_PRIVATE + MADV_SEQUENTIAL hints, pipe/stdin fallback for non-seekable inputs
bf_csv — RFC 4180 streaming parser: zero-copy field access, quoted field handling, configurable delimiter, writer with auto-quoting
bf_conf — INI/TOML config: layered search (/etc/bonfyre/ → ~/.config/bonfyre/ → ./), env override (BONFYRE_SECTION_KEY), $VAR expansion

Text & Unicode (2)
bf_utf8 — SIMD UTF-8: SSE4.2 PCMPISTRI / NEON validation, Hoehrmann DFA, codepoint iteration, case folding (ASCII + Latin-1/Extended-A + Greek + Cyrillic), U+FFFD sanitization
bf_textfuse — Hyperscan/Aho-Corasick pattern fusion: 6 pattern classes (filler/halluc/keyword/quality/tag/chunk-hdr), single-pass >1 GB/s or fallback 200 MB/s

Compression (2)
bf_lz4 — self-contained LZ4 block compression: single-shot + streaming API, hash-table greedy match finder, 64-byte 8-wide match length. No external tool spawn
bf_fountain — Luby Transform fountain codes: O(N) rateless encoder/decoder, 5% overhead, robust soliton distribution, any K-of-N reconstruction

Containers (1)
bf_hashmap — Robin-hood open-addressing: power-of-2 capacity, ~0.75 load factor, backshift deletion (no tombstones), FNV-1a default hash, configurable callbacks, string convenience API

CLI & Ops (3)
bf_opts — declarative CLI parser: typed args (BOOL/STRING/INT/DOUBLE/PATH), subcommands, auto --help, combined short opts, ~ expansion
bf_log — structured logging with level filtering, color terminal output, file rotation
bf_hotload — live pipeline reload: dlopen plugins, file-watcher (kqueue/inotify), A/B test support, per-stage metadata/flags

liblambda-tensors — 5-tier structural compression

Family-based JSON compression: artifacts sharing structure (same schema, same field names) compress dramatically better together. Every tier preserves O(1) random field access — no full decompression needed.

V1 — Varint + zigzag + type tags
88% of raw · 12 type codes · header-only overhead

V2 — Small-int/float32/empty-str + LZ77
64.9% of raw · Position stride table

Interned — Cross-member string dedup
29% of raw · Family string pool index

Huffman — Per-position canonical codes
13.5% of raw · Cost-pruned codebooks

Arithmetic — Range coding
~11% of raw · Shannon entropy measurement
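The V1 building blocks, zigzag mapping and variable-length integers, are standard techniques. A sketch in the LEB128 style used by Protocol Buffers; whether liblambda-tensors uses exactly this byte layout is an assumption.

```python
def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)

def unzigzag(z):
    """Inverse of zigzag."""
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)

def encode_varint(u):
    """7 payload bits per byte; the high bit marks that more bytes follow."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Decode one varint from the start of data."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return value

print(encode_varint(zigzag(-3)).hex(), encode_varint(300).hex())
```

Values near zero, positive or negative, encode to a single byte, which is where the saving over fixed-width integers comes from.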

Operator Model

Every pipeline stage is a typed operator with declared behavior — the scheduler uses these flags to parallelize, cache, and reorder safely.

51 Operators Across 4 Layers

Substrate (9 ops) — ingest, normalize, segment, chunk
Transform (22 ops) — transcribe, summarize, score, semantics
Surface (10 ops) — format, render, template, deliver
Value (10 ops) — price, invoice, pay, distribute, prove

Behavioral Flags

PURE — deterministic, no side effects → safe to cache & parallelize
STATEFUL — mutates external state (DB, filesystem)
CACHEABLE — output cached by SHA-256(input) in LMDB
REVERSIBLE — undo supported (audit, rollback)
IDEMPOTENT — safe to retry without duplication

Exactness Classes

BF_EXACT_BYTE — bit-identical output every run (hash, compress)
BF_EXACT_CANON — canonical but with format flexibility (JSON field reorder)
BF_EXACT_LOSSY — model-dependent (transcription, summarization)

Content-Addressed Artifacts

Every operator input/output is a BfArtifact. SHA-256 content hash for identity. family_key (FNV-1a-64 of type+system) groups related artifacts. canonical_key distinguishes format variants. Binary cache files: .bfrec records + .bfsum summaries.
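FNV-1a-64 itself is a public algorithm with fixed constants. A sketch of a family key built on it; joining type and system with a NUL byte is an assumption, since the source only says "type+system", and family_key here is a hypothetical helper rather than the bf_artifact function.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3

def fnv1a_64(data: bytes) -> int:
    """FNV-1a, 64-bit: XOR each byte in, then multiply by the prime (mod 2^64)."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h

def family_key(artifact_type: str, system: str) -> int:
    # Separator choice is assumed, not taken from the source.
    return fnv1a_64(artifact_type.encode() + b"\x00" + system.encode())

print(f"{family_key('transcript', 'whisper'):016x}")
```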

Cross-cutting Architecture

Capabilities that span every binary, not just one stage.

Content-Addressed Dedup

Every artifact gets a SHA-256 identity. If the hash already exists in the local LMDB cache, the stage is skipped entirely. Re-running a pipeline on the same input costs near zero — only changed stages execute.
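The skip-on-hit behaviour can be sketched with a dictionary standing in for the LMDB store. The key follows the (operator, params, input_hash) shape given for Transform binaries; the JSON canonicalization and the function names are assumptions made for illustration.

```python
import hashlib
import json

def cache_key(operator, params, input_bytes):
    """SHA-256 over (operator, canonical params, SHA-256 of the input)."""
    input_hash = hashlib.sha256(input_bytes).hexdigest()
    canonical = json.dumps([operator, params, input_hash],
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def run_cached(cache, operator, params, input_bytes, fn):
    """Run fn only on a cache miss; otherwise return the stored output."""
    key = cache_key(operator, params, input_bytes)
    if key not in cache:
        cache[key] = fn(input_bytes)
    return cache[key]

cache = {}
first = run_cached(cache, "upper", {"v": 1}, b"abc", bytes.upper)
second = run_cached(cache, "upper", {"v": 1}, b"abc", bytes.upper)  # served from cache
print(first, second, len(cache))
```

Sorting the parameter keys before hashing keeps the key stable when the same parameters arrive in a different order.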

Zero-Copy LMDB Cache

512 MB memory-mapped store with 4 named databases (artifacts, families, blobs, meta). Reads are pointer casts into mmap — no malloc, no deserialization. Cache lookups take ~200 ns vs ~5 ms disk I/O.
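The idea of reading fixed-layout records straight out of a memory map, without an intermediate `read()` buffer, can be sketched with Python's `mmap` and `struct`. This is an analogy rather than the LMDB API: the record layout and helper names are hypothetical, and Python still builds integer objects for the decoded fields, which a C pointer cast would avoid.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: 8-byte key, 4-byte value, little-endian.
RECORD = struct.Struct("<QI")


def write_records(path, records):
    """Write (key, value) pairs back to back as fixed-size records."""
    with open(path, "wb") as f:
        for key, value in records:
            f.write(RECORD.pack(key, value))


def read_record(mapped, index):
    """Decode record `index` directly from the mapped buffer at its offset."""
    return RECORD.unpack_from(mapped, index * RECORD.size)


def demo():
    """Write three records, map the file read-only, and fetch the last one."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, [(1, 10), (2, 20), (3, 30)])
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return read_record(mapped, 2)
    finally:
        os.remove(path)
```

Because every record has the same size, locating record `i` is a multiplication rather than a scan, which is the same property that makes mapped lookups cheap.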

Family-Aware Storage

Artifacts with the same schema (type+system) share a family_key. Lambda-tensors compress a family 7× better than compressing each artifact individually, because field names, types, and structure are shared across the family.

Fountain Codes for Lossy Links

Luby Transform rateless encoding: the sender emits coded packets indefinitely, and the receiver reconstructs the K source blocks from any sufficiently large subset of the packets it receives, about 5% more than K. Well suited to satellite, mobile, or lossy last-mile delivery where TCP retransmits are expensive.
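A toy version of the rateless idea is sketched below: each coded packet is the XOR of a random subset of source blocks, and a peeling decoder recovers blocks whenever a packet is reduced to a single unknown. The degree distribution is a simple stand-in (an assumption), not the robust soliton distribution a production LT code would use, so the overhead figure quoted above is not reproduced. All names are hypothetical.

```python
import random


def lt_encode_packet(blocks, rng):
    """Return (indices, payload): the XOR of a random subset of source blocks."""
    k = len(blocks)
    degree = min(k, rng.choice([1, 1, 2, 2, 2, 3, 4]))  # toy distribution
    indices = frozenset(rng.sample(range(k), degree))
    payload = 0
    for i in indices:
        payload ^= blocks[i]
    return indices, payload


def lt_decode(packets, k):
    """Peeling decoder: return the k blocks, or None if not yet decodable."""
    known = {}
    pending = [[set(idx), val] for idx, val in packets]
    progress = True
    while progress and len(known) < k:
        progress = False
        for pkt in pending:
            idx = pkt[0]
            # Remove already-recovered blocks from this packet.
            for i in [i for i in idx if i in known]:
                idx.discard(i)
                pkt[1] ^= known[i]
            if len(idx) == 1:  # one unknown left: it equals the payload
                known[idx.pop()] = pkt[1]
                progress = True
    return [known[i] for i in range(k)] if len(known) == k else None


def transmit(blocks, loss=0.3, seed=0, max_packets=10000):
    """Stream packets over a lossy link until the receiver can decode."""
    rng = random.Random(seed)
    received = []
    for _ in range(max_packets):
        packet = lt_encode_packet(blocks, rng)
        if rng.random() < loss:
            continue  # packet dropped; the sender never retransmits it
        received.append(packet)
        decoded = lt_decode(received, len(blocks))
        if decoded is not None:
            return decoded, len(received)
    raise RuntimeError("did not decode within max_packets")
```

The sender never learns which packets were lost. It keeps emitting fresh combinations, and the receiver stops listening once the peeling decoder has recovered every block.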

Operator Hot-Loading

bf_hotload watches operator .so plugins via kqueue/inotify. When a new version appears, the pipeline swaps it in without restart. Supports A/B testing across operator versions with per-stage metadata.

Work-Stealing Parallelism

Chase-Lev lock-free deques with C11 atomics: each core gets a private work queue, idle cores steal from random peers. Combined with libdill coroutines (~50 ns context switch), pipelines fully saturate all cores with minimal coordination.

SIMD Throughout

SSE4.2 and ARM NEON fast paths in JSON parsing (16-byte aligned scan), UTF-8 validation (PCMPISTRI), text pattern fusion (Hyperscan >1 GB/s), and vector cosine similarity. Automatic runtime dispatch — falls back cleanly on older hardware.

Crypto Primitives

BLAKE2b (hash/HMAC/KDF), XSalsa20-Poly1305 (AEAD encryption), Ed25519 (signing). No OpenSSL dependency. Portable C implementation with optional libsodium delegation if available.
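The three BLAKE2b roles listed above (hash, keyed MAC, key derivation) can be illustrated with Python's `hashlib.blake2b`, which exposes the algorithm's native `key` and `person` parameters. This is a stand-in for illustration, not Bonfyre's C implementation: the helper names are hypothetical, and keyed mode is used here in place of an HMAC construction.

```python
import hashlib
import hmac


def blake2b_hash(data: bytes, size: int = 32) -> bytes:
    """Plain BLAKE2b digest of the requested size (1 to 64 bytes)."""
    return hashlib.blake2b(data, digest_size=size).digest()


def blake2b_mac(key: bytes, data: bytes, size: int = 32) -> bytes:
    """Keyed BLAKE2b as a message authentication code (key up to 64 bytes)."""
    return hashlib.blake2b(data, key=key, digest_size=size).digest()


def blake2b_derive(master: bytes, context: bytes, size: int = 32) -> bytes:
    """Derive a subkey; `context` goes in the personalization field (<= 16 bytes)."""
    return hashlib.blake2b(key=master, person=context, digest_size=size).digest()


def mac_verify(key: bytes, data: bytes, tag: bytes) -> bool:
    """Constant-time comparison avoids leaking how many tag bytes matched."""
    return hmac.compare_digest(blake2b_mac(key, data, len(tag)), tag)
```

Deriving separate subkeys per purpose, for example `blake2b_derive(master, b"sign")` and `blake2b_derive(master, b"encrypt")`, keeps one compromised subkey from exposing the others.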

How-To: Pick Your Entry Point

Each is standalone — you don't need to understand the whole system.

Direct .fpq Inference (SLI Bridge)

Run models directly from .fpq files. patch_model() replaces nn.Linear with FPQLinear. TinyLlama: 155 layers, cos=0.997, top-1=97.3%. bonfyre-oss.

patch_model(hf_model, fpq, resolver)
5 min to try

KV Cache Compression

4× more context tokens in same VRAM. 9 optimization passes. Works with nn.Linear and FPQLinear simultaneously.

patch_kv_cache(model, bits=4, adaptive_bits=True)
2 min to try

Model Quantization (v8 RLF)

Quantize LLM weights to 3-bit with 0.9999+ cosine. E8 lattice snap + μ-law warp + 16D RVQ. 42 KB binary.

bonfyre-quant benchmark model.gguf --bits 3
5 min to try
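Of the three techniques named in this card, the μ-law warp is the easiest to show in isolation. The sketch below applies standard scalar μ-law companding before a uniform quantizer, which spends more levels near zero where weights cluster. It does not implement the E8 lattice snap or the 16D RVQ, and the value μ = 255 plus all function names are assumptions.

```python
import math


def mu_law_compress(x: float, mu: float = 255.0) -> float:
    """Warp x in [-1, 1]: sign(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: float = 255.0) -> float:
    """Inverse warp: sign(y) * ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize_uniform(y: float, bits: int) -> float:
    """Snap y in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    q = round((y + 1.0) / 2.0 * levels)
    return q / levels * 2.0 - 1.0


def warp_quantize(x: float, bits: int = 3, mu: float = 255.0) -> float:
    """Compress, quantize uniformly in the warped domain, then expand."""
    return mu_law_expand(quantize_uniform(mu_law_compress(x, mu), bits), mu)
```

For a small value such as 0.01 at 3 bits, the warped quantizer lands much closer to the input than a plain uniform quantizer does, at the cost of coarser steps near ±1.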

Lightweight CMS

Replace Strapi's 500 MB with a 287 KB binary. Dynamic schemas, token auth, REST API. bonfyre-cms.

bonfyre-cms serve --port 8800
2 min to try

Local Transcription + HCP

Local speech for public or private audio. Live proof. bonfyre-intake.

bonfyre-transcribe run audio.wav
5 min to try

Audio-to-Invoice Pipeline

Audio → transcript → summary → quality score → pricing → deliverable. 5–8 ms per stage. bonfyre-pipeline.

bonfyre-pipeline run --input audio.mp3
2 min to try

Semantic Vector Search

Embed docs + NEON SIMD cosine search. Replace $250/mo Pinecone. bonfyre-embed.

bonfyre-embed --insert-db my.db
5 min to try
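The search step reduces to ranking stored vectors by cosine similarity against a query vector. A scalar reference version is sketched below to show what the SIMD path computes. The `top_k` helper and the in-memory dict are illustrative assumptions, not bonfyre-embed's storage format.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force scan: score every document, return the k best matches."""
    scored = sorted(((cosine(query, vec), name) for name, vec in docs.items()),
                    reverse=True)
    return [(name, score) for score, name in scored[:k]]
```

Cosine ignores vector length, so a document embedding and a scaled copy of it score identically against any query.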

Realtime Relay

Pure C WebTransport/MoQ relay for private live audio and browser-native media flows. Keeps the same Bonfyre style: small binary, local control, SQLite observability, no hosted relay vendor in the middle.

./bonfyre-moq --port 4443 --db relay.db
make bonfyre-moq to build

OpenAI-Compatible API

Drop-in replacement for OpenAI API clients: set OPENAI_API_BASE=http://localhost:8787. 53 KB binary.

bonfyre-proxy serve --port 8787
1 min to try

Install

Build from source in under 60 seconds. All 83 modules, pure C11, no runtime dependencies.

Platform support

Platform	Status	Path
macOS arm64 (Apple Silicon)	✓ native	`make && make install` or `./install.sh`
macOS x86_64 (Intel)	✓ native	`make && make install` or `./install.sh`
Linux x86_64	✓ build from source	`make && make install` or `docker build .`
Linux arm64	✓ build from source	`make && make install`
Windows	WSL2	use WSL2 + Ubuntu path above

⚠ If you downloaded a zip: bundled binaries are Mach-O arm64 (macOS Apple Silicon only) and will not run on Linux. Linux users must build from source or use Docker.

macOS — build from source


        # Clone and build (macOS, recommended)
        git clone https://github.com/Nickgonzales76017/bonfyre-oss.git
        cd bonfyre-oss
        make            # builds all 83 modules from source
        make install    # copies binaries to ~/.local/bin

        # One-liner (macOS)
        curl -fsSL https://raw.githubusercontent.com/Nickgonzales76017/bonfyre-oss/main/install.sh | sh

Linux — build from source


        # Ubuntu / Debian: install deps, then build
        sudo apt-get install -y gcc make libsqlite3-dev zlib1g-dev pkg-config
        git clone https://github.com/Nickgonzales76017/bonfyre-oss.git
        cd bonfyre-oss
        make CC=gcc     # builds all modules
        make install    # copies to ~/.local/bin

        # One-liner (Linux)
        curl -fsSL https://raw.githubusercontent.com/Nickgonzales76017/bonfyre-oss/main/install.sh | sh

Docker — any platform


        # Build + run the full Bonfyre stack in Docker (Linux x86_64 binaries)
        git clone https://github.com/Nickgonzales76017/bonfyre-oss.git
        cd bonfyre-oss
        docker compose up -d    # API on :9999, queue worker, SQLite WAL

        # Extract Linux binaries from Docker image (macOS → Linux cross-build)
        make docker-build       # outputs Linux x86_64 binaries to out/linux-x86_64/

Build requirements: C11 compiler (gcc ≥ 9 or clang ≥ 11), make, SQLite3 dev headers, zlib. Optional: ONNX Runtime (bonfyre embed), FreeSWITCH (bonfyre tel), PyTorch (SLI bridge).

Latest upgrades in this repo