Everything you need to evaluate, install, and run BonfyreFPQ and the Bonfyre Stack. Benchmarks, hardware fits with pricing, reproduction commands, and full architecture reference.
A short technical changelog for what moved since the last public push.
When runtime code, registries, or CLI surfaces change, this is the command block that repopulates the operator-facing views and keeps `bonfyre-oss` honest after a pull.
Pure C model compression engine. No GPU required. No Python dependencies. No training.
The SLI bridge is live: native .fpq files run inference directly, with no decompression step.
Near-lossless per-tensor across all 307 encoded tensors in Wan2.1-T2V-1.3B. Worst tensor: 0.999590.
Error stays controlled after 30 stacked transformer blocks. PSNR 35.97 dB. MSE 1.01e-3.
Cosine holds 0.9976–0.9983 across the full diffusion schedule. No drift amplification.
| Model | Domain | Original | Compressed | Tensors | Avg Cos | Worst Cos | Avg bpw | HF Model |
|---|---|---|---|---|---|---|---|---|
| Wan2.1-T2V-14B | Video diffusion | 54 GB | 27 GB | 402 | 0.999882 | 0.999826 | 4.05 | Download → |
| Phi-4 (14B) | Language model | 28 GB | 28 GB | 162 | 1.000614 | 1.000149 | 4.08 | Download → |
| Whisper Large V3 | Speech recognition | 8.7 GB | 5.8 GB | 998 | 0.999916 | 0.999834 | 4.19 | Download → |
| Whisper Large V3 Turbo | Speech recognition | 1.6 GB | 1.6 GB | 228 | 0.999929 | 0.999858 | 4.18 | Download → |
| Wan2.1-T2V-1.3B | Video diffusion | 5.3 GB | 2.7 GB | 307 | 0.999874 | 0.999590 | – | local |
| SmolLM2-135M | Language model | 101 MB | 258 MB F16 | 211 | 0.999855 | 0.999589 | – | GGUF |
| Gemma 2B-it | Language model | – | – | sampled | 0.99995 | 0.99995 | – | local |
| Whisper base.en | Speech recognition | – | – | sampled | 0.999808 | 0.999763 | – | GGML |
Artifacts on Hugging Face: (1) compatibility safetensors (direct Transformers load), and (2) native .fpq v12 files (rANS entropy-coded, 3.5–5.2× smaller, direct inference via SLI bridge).
Loaded the original Wan2.1 model and FPQ-compressed version into the same WanTransformer3DModel architecture. Fed identical synthetic inputs (seed=42, shape [1,16,1,60,104], BF16 on MPS). Compared full forward pass outputs.
| Timestep | Cosine | PSNR (dB) | MSE |
|---|---|---|---|
| t = 0 | 0.99831 | 34.82 | 1.32e-3 |
| t = 100 | 0.99792 | 35.49 | 1.13e-3 |
| t = 500 | 0.99759 | 35.97 | 1.01e-3 |
| t = 900 | 0.99782 | 36.02 | 1.00e-3 |
| t = 999 | 0.99804 | 35.71 | 1.07e-3 |
Identical inputs at each timestep, BF16 on MPS. Cosine range: 0.99759–0.99831, with no systematic drift across the schedule.
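The three columns in the timestep table can be reproduced from any pair of output tensors. The sketch below is a minimal, dependency-free illustration on toy vectors, not the repository's comparison script. The PSNR and MSE columns above appear mutually consistent with a peak value of about 2.0; that is an inference from the numbers, not a documented setting, so `peak` is left as an explicit parameter.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr_from_mse(err, peak):
    """PSNR in dB for a given MSE. The peak value is an assumption here."""
    return float("inf") if err == 0 else 10.0 * math.log10(peak * peak / err)

# Toy stand-ins for the flattened original and compressed model outputs.
reference = [0.5, -1.0, 2.0, 0.25]
compressed = [0.49, -1.02, 1.98, 0.26]

err = mse(reference, compressed)
print(f"cosine={cosine(reference, compressed):.6f}")
print(f"mse={err:.3e}")
print(f"psnr={psnr_from_mse(err, peak=2.0):.2f} dB")
```

In a real comparison the two lists would be the flattened forward-pass outputs of the original and FPQ-compressed models at one timestep.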
994-token slice, max-length 512, stride 256. All runs on same hardware, same data.
| Method | PPL | Δ baseline | Avg Cos | Worst Cos |
|---|---|---|---|---|
| Baseline (FP32) | 14.20 | – | 1.0000 | 1.0000 |
| BonfyreFPQ @3-bit | 14.48 | +1.97% | 0.999783 | 0.999588 |
| HQQ @3-bit (g64) | 32.38 | +128% | – | – |
| COORD @3-bit (v4) | 35.59 | +150% | 0.982761 | 0.982327 |
169 tensors. HQQ via standalone benchmark (group-size 64, axis 1, CPU). Proof pack.
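The evaluation settings above (994 tokens, max-length 512, stride 256) describe a strided sliding-window perplexity run. A minimal sketch of how such a window plan is usually laid out follows; it assumes the common convention that each window only scores the tokens not already scored by the previous window, and it ignores the one-token label shift. The function names are illustrative and are not taken from `perplexity_benchmark.py`.

```python
import math

def sliding_windows(seq_len, max_length, stride):
    """Return (begin, end, n_scored) tuples.

    Tokens [begin, end) are fed to the model; only the last n_scored
    tokens of each window contribute to the loss, so no token is
    counted twice.
    """
    windows = []
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        windows.append((begin, end, end - prev_end))
        prev_end = end
        if end == seq_len:
            break
    return windows

def perplexity(total_nll, n_tokens):
    """Perplexity is exp of the mean negative log-likelihood per token."""
    return math.exp(total_nll / n_tokens)

plan = sliding_windows(seq_len=994, max_length=512, stride=256)
for begin, end, n_scored in plan:
    print(f"tokens [{begin}, {end}) scored={n_scored}")
```

With these settings the plan has three windows, and every one of the 994 tokens is scored exactly once.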
From authors' own papers. Lower PPL = better. FP16 baseline: 5.12.
| Method | Bits | PPL | Δ FP16 | Source |
|---|---|---|---|---|
| FP16 (baseline) | 16 | 5.12 | – | – |
| AQLM | 3.04 | 5.46 | +6.6% | Egiazarian et al., ICML 2024 |
| SpQR | 2.98 | 6.20 | +21.1% | Dettmers et al., 2023 |
| AWQ | 3 | 6.24 | +21.9% | Lin et al., MLSys 2024 |
| GPTQ | 3.00 | 8.06 | +57.4% | Frantar et al., 2022 |
| HQQ | 3 | Not published on Llama-2 | – | Badri & Shelor, 2023 |
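The Δ columns in both perplexity tables are relative increases over the respective baseline. A short sketch of that arithmetic, using the published values from the table above:

```python
def relative_delta_pct(ppl, baseline):
    """Percentage increase of a perplexity over its baseline."""
    return 100.0 * (ppl - baseline) / baseline

FP16_BASELINE = 5.12
published = {"AQLM": 5.46, "SpQR": 6.20, "AWQ": 6.24, "GPTQ": 8.06}

for method, ppl in published.items():
    print(f"{method}: {relative_delta_pct(ppl, FP16_BASELINE):+.1f}%")
```

The same function applied to the 3-bit BonfyreFPQ row (14.48 against a 14.20 baseline) gives roughly +1.97%.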
The newer work is not just “more helper modules.” It broadened both the shared runtime and the system-facing binaries around proxy, orchestration, communications, realtime relay, and delivery.
Lightweight, getopt-style parser with subcommand support, typed options, and automatic `--help` generation for each binary.
Per-process worker pool with task queues, priorities, and cooperative shutdown hooks used by servers and background jobs.
High-speed UTF-8 validator with SSE/NEON paths for safe, zero-copy text ingestion and pipeline sanitization.
Simple INI/TOML-style parser with typed readers and layered includes for runtime configuration and hardware profiles.
Thin LZ4 wrapper with streaming encode/decode helpers used by artifact transport and the local cache.
Cache-friendly hash table with robin-hood probing and predictable iteration used in in-memory indexes and caches.
Low-allocation CSV I/O with typed columns and dialect handling for bulk import/export tasks.
Fast JSON helpers and a MessagePack event encoder/decoder for compact telemetry, ledgers, and manifest interchange.
Portable BLAKE2b + ChaCha20-Poly1305 wrappers for authenticated encryption, with password-hash and signature helpers.
Small Bloom filter utilities for single-pass dedup and membership checks in ingestion and pipeline stages.
Asynchronous DNS resolver with a tiny LRU cache and reactor integration for network-facing binaries.
Structured, ring-buffer logging with per-thread buffers and file rotation for low-latency telemetry.
Zero-copy mmap helpers with sensible madvise hints and safe fallbacks for pipes and small files.
LMDB-backed artifact cache with zero-copy reads and named sub-databases for manifests, blobs, and families.
Coroutine-based pipeline primitives (libdill style) and a Hyperscan-powered single-pass text fusion engine for fast, composable pipeline stages.
FlatBuffers builders/readers for compact artifact manifests and zero-copy transport between components.
SIMD-accelerated HTTP parsing in the reactor and optional per-reactor mimalloc arenas to reduce contention on high-concurrency servers.
OpenAI-compatible local API shim for transcriptions, chat-style summarization, and model listing. Useful when existing clients need to keep working while Bonfyre takes over the backend path.
Machine-only planner with policy memory, typed request state, and optional Gemma/OpenAI-compatible assist. Decides when extra Bonfyre blocks should be used instead of exposing a prompt UI.
Communications edge plus browser-native realtime relay. SIP voice/SMS/MMS on one side, MoQ/WebTransport on the other, both kept in the same C-first operating model.
Metered artifact distribution and verified audio output. One expands how artifacts move; the other expands how finished outputs are published and checked.
bonfyre-fpq quantize model.gguf compressed.gguf --bits 3
Direct runtime inference from .fpq is now working. Load a compressed model and run it immediately, with no conversion step and no extra RAM for a decoded copy. The SLI bridge (Spectral Lattice Inference) is fully integrated.
patch_model(hf_model, fpq, resolver): replaces nn.Linear layers with FPQLinear. No decode step, no weight copy.
FPQ-X evolves BonfyreFPQ from a quantizer into a full compression algebra.
All six operator families are implemented and validated in fpqx_ops.c + fpq_bridge.py.
Low-rank SVD + E8 lattice + 16D RVQ + QJL projection + Ghost correction. The proven foundation delivering 0.999+ cosine across 1,790 tensors.
Learns S = I + ABᵀ via thin SVD of the ratio matrix Q = W/Ŵ − 1. Captures scaling distortion that additive methods miss. Auto-rollback if cosine doesn't improve.
Per-column linear predictor from the low-rank basis to the quantization residual. Uses already-available L factor to predict and cancel systematic error.
Attention-weighted K-means++ on KV cache vectors. Compresses along the sequence dimension β tokens that attend similarly share one cache atom. Orthogonal to weight quantization.
Profiles each tensor: η_L (low-rank energy), spectral gap, kurtosis, outlier fraction. A decision tree selects which operators to activate and at what rank.
Inner-group quantization that aligns bit boundaries to SIMD lanes. Stores scales per group, enabling vectorized unpacking without scatter/gather overhead.
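The per-group scale idea in the last operator can be shown in a few lines. This is a generic symmetric group-quantization sketch, not the H operator's actual packing format: the group size of 8, the symmetric integer range, and the function names are all assumptions made for illustration. The point it demonstrates is that each group carries one scale, so a whole group can be unpacked with a single multiply.

```python
def quantize_groups(values, bits=4, group_size=8):
    """Symmetric per-group quantization: one float scale per group."""
    qmax = (1 << (bits - 1)) - 1          # e.g. 7 for 4-bit, 3 for 3-bit
    scales, codes = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        peak = max(abs(v) for v in group)
        scale = peak / qmax if peak > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer level and clamp into [-qmax, qmax].
        codes.extend(max(-qmax, min(qmax, round(v / scale))) for v in group)
    return scales, codes

def dequantize_groups(scales, codes, group_size=8):
    """Rebuild approximate values: code times the scale of its group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]

weights = [0.12, -0.50, 0.33, 0.05, -0.27, 0.41, -0.08, 0.19,
           2.10, -1.75, 0.90, 1.30, -2.40, 0.65, -0.15, 1.05]
scales, codes = quantize_groups(weights, bits=4, group_size=8)
restored = dequantize_groups(scales, codes, group_size=8)
print(scales)
print(codes)
```

Because the second group has larger magnitudes, it gets a larger scale; the small values in the first group keep their resolution instead of being crushed by a tensor-wide scale.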
| Dimension | FPQ v10 | FPQ-X |
|---|---|---|
| Error model | Additive only (W − Ŵ) | Additive × Multiplicative + Predictive |
| Per-tensor policy | Same pipeline for all | Ξ profiles η_L, gap, kurtosis → selects operators |
| KV cache | Weight-only quantization | D operator: sequence-axis distillation |
| Hardware awareness | Generic packing | H operator: SIMD-lane-aligned groups |
| Objective | min ‖W − Ŵ‖ | min λ_R·Rate + λ_D·Distortion + λ_E·Execution |
| Research basis | Original FPQ design | 9 papers from early 2026 |
# Full A+M+Ξ pipeline
bonfyre-fpqx compress model.safetensors compressed.safetensors --bits 3
# Roundtrip quality test
bonfyre-fpqx roundtrip model.safetensors --bits 3
# Per-tensor compressibility analysis
bonfyre-fpqx profile model.safetensors
# KV cache distillation
bonfyre-fpqx distill cache.safetensors distilled.safetensors --atoms 256
# Hardware-aligned repacking
bonfyre-fpqx pack model.safetensors packed.safetensors --bits 3 --group-size 128
Baseline cosine numbers (bonfyre-kvcache C benchmark). All 9 Python optimizations live in fpq_bridge.py.
| Bits | KV Cosine | Hardware implication |
|---|---|---|
| 5-bit | 0.99996 | ~3.2× more context: 8K ctx → ~26K in same VRAM |
| 4-bit | 0.99994 | 4× more context; recommended for production |
| 3-bit | 0.99990 | 5.3× context, some quality loss on long sequences |
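The context multipliers follow from the ratio of bit widths. The sketch below assumes a 16-bit KV baseline (which is what the 4-bit → 4× row implies) and ignores codebook and scale overhead, so the results are upper bounds rather than measured figures.

```python
def context_multiplier(kv_bits, baseline_bits=16):
    """How many times more KV entries fit in the same memory budget."""
    return baseline_bits / kv_bits

def max_context(base_ctx, kv_bits, baseline_bits=16):
    """Context length that fits where base_ctx fit at the baseline width."""
    return int(base_ctx * context_multiplier(kv_bits, baseline_bits))

for bits in (5, 4, 3):
    print(f"{bits}-bit: {context_multiplier(bits):.1f}x, "
          f"8000 ctx -> {max_context(8000, bits)}")
```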
High-attention blocks dominate tile assignment. Codebook quality concentrates where the model actually looks.
Ξ-profiler analyzes each K/V layer (kurtosis, spectral gap, outlier fraction) to pick the right bit depth automatically.
One 256-tile codebook learned across 8 sample layers. Skips per-call K-means, so all layers compress in amortized O(1).
K-means++ on KV vectors to K atoms (K ≪ N). Tokens that attend similarly share one atom. Bug-free nearest-centroid lookup.
Only the delta vs previous frame is compressed. Each new token costs far less than storing a new frame.
E8 coordinate magnitude as Huffman code length proxy. High-cost blocks get upweighted, so rate and quality are optimized jointly.
Near-zero blocks (max abs ≤ 63) bypass the E8 lattice and take a 7-bit integer round + clamp instead. Significant throughput win on embedding layers.
Per-row scale vector on each FPQLinear. Applied after SLI matmul: corrects per-output-channel amplitude drift.
ARM NEON 128-bit aligned pre-packing. Eliminates scatter/gather, giving vectorized unpacking on Apple Silicon and Jetson.
Use individually or compose: patch_kv_cache(adaptive_bits=True, shared_tiles=tiles) activates #4 + #5 simultaneously.
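Of the passes listed above, the near-zero fast path is the simplest to illustrate. The sketch assumes the block is already expressed in the codec's integer coordinate units (so that the threshold of 63 is meaningful) and that a 7-bit code is a signed integer in [-64, 63]; both are assumptions, and `fast_path_block` is a hypothetical name rather than a function from `fpq_bridge.py`.

```python
def fast_path_block(block, limit=63):
    """Return 7-bit signed codes for a near-zero block, else None.

    None tells the caller to fall back to the full lattice path.
    """
    if max(abs(v) for v in block) > limit:
        return None
    # Round to nearest integer, then clamp into the signed 7-bit range.
    # Python's round() uses round-half-to-even, so 12.5 becomes 12.
    return [max(-64, min(63, round(v))) for v in block]

print(fast_path_block([0.4, -62.6, 12.5, 63.0]))
print(fast_path_block([0.0, 64.2]))
```

The check is a single pass over the block, which is why skipping the lattice search for such blocks can pay off on layers dominated by small values.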
FPQ weight compression + KV cache optimizations change the hardware equation. What used to need cloud GPUs now fits on-device. Use cases shift depending on your hardware budget.
| Device | RAM | Approx. Cost (2026) | Before (BF16) | After (FPQ 4-bit + KV) |
|---|---|---|---|---|
| Raspberry Pi 5 | 8 GB | $80 | TinyLlama only, 512-token ctx, no video | TinyLlama + 2K ctx, Whisper turbo inference, local ASR pipeline |
| Jetson Orin Nano | 8 GB | $250 | Qwen 0.5B only, degraded at >512 ctx | Qwen 0.5B @ 4K ctx · NEON packing · embeddings + FPQ inference co-resident |
| Apple M1 MacBook (16 GB) | 16 GB unified | $900 refurb | Qwen 0.5B (tight), Wan 1.3B (no headroom) | Wan 1.3B + 4K ctx KV · HCP speech + SLI co-resident · NEON-packed |
| Apple M2/M3 Max (64 GB) | 64 GB unified | $2,500–3,500 | Phi-4 14B (no KV headroom past 2K) | Wan 14B @ 8K ctx · Phi-4 @ 32K ctx with delta KV · full pipeline concurrent |
| T4 cloud (16 GB VRAM) | 16 GB VRAM | ~$0.35/hr spot | Qwen 3B, 2K ctx max before OOM | Qwen 3B @ 8K ctx · Wan 1.3B + full diffusion sweep · shared KV codebook |
| RTX 4090 (24 GB VRAM) | 24 GB VRAM | $1,600 GPU / ~$0.50/hr cloud | Wan 1.3B (tight), Phi-4 14B doesn't fit | Wan 1.3B @ 32K ctx · Phi-4 14B fits · adaptive bits saves ~30% KV RAM |
| RTX 6000 Ada (48 GB VRAM) | 48 GB VRAM | ~$1.10/hr (RunPod) | Wan 14B (tight), long video sequences OOM | Wan 14B · 287 SLI layers @ 5-timestep sweep · multi-second video KV cached |
Local ASR with Whisper turbo. TinyLlama inference. Bonfyre pipeline binaries. Edge transcription kiosk.
Qwen 0.5B + 4K ctx. Wan 1.3B video. HCP speech + SLI co-resident. NEON packing. Full local pipeline.
Wan 14B @ 8K ctx. Phi-4 @ 32K ctx with delta KV. Full Bonfyre pipeline + inference concurrent. Production-grade local stack.
Burst GPU for video generation, large model inference, SLI sweeps. Use FPQ to fit bigger models on cheaper instances. T4 now handles what used to need A100.
TinyLlama 1.1B stays in .fpq at runtime, with no decode step. The BF16 copy never materializes.
4-bit KV compression in same VRAM budget. Delta encoding makes each new token incremental.
H-operator pre-packing on Apple Silicon and Jetson: vectorized unpacking, no scatter/gather.
FPQ model + HCP speech + vector search + pipeline can run concurrently on a 16 GB Mac.
Every number from a real run. Raw logs, scripts, and CSVs in the repo.
BF16 safetensors (drop-in) + native .fpq v12 files (rANS, direct SLI inference).
Qwen PPL (v8 vs v4 vs HQQ), Whisper roundtrip, CSV, PNG chart, reproduction commands.
Full benchmark report: version progression, weight tables, KV cache, speed optimization, binary sizes.
Pure C11: main.c, fpq_codec.c, ggml_reader.c, fpq.h. Builds with make on macOS/Linux.
Apple M-series, after 6 optimization passes (P0–P6). All real.
| Metric | Before (P0) | After P5 | Improvement |
|---|---|---|---|
| Single embed | ~600 ms | 237 ms | 2.5× |
| 10-file batch embed | ~6,000 ms | 386 ms | 15.5× |
| Pipeline (6 stages) | 76 ms | 8 ms | 9.5× |
| Tag inference | ~150 ms | 6 ms | 25× |
| Hash hex | ~100 ns | ~10 ns | ~10× |
| Artifact struct | 1,076 bytes | 536 bytes | 2× cache density |
| Vector file (384-dim) | 6.4 KB JSON | 1,544 bytes VECF | 4.2× smaller |
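The 1,544-byte vector file size is consistent with 384 float32 values (1,536 bytes) plus an 8-byte header. The actual VECF header layout is not described here, so the sketch below uses a hypothetical one (a 4-byte magic followed by a little-endian uint32 dimension) purely to show why a binary layout is several times smaller than JSON text for the same vector.

```python
import struct

MAGIC = b"VECF"  # hypothetical magic bytes; the real header may differ

def pack_vecf(vector):
    """Pack a float vector as: 4-byte magic, uint32 dim, float32 payload."""
    header = struct.pack("<4sI", MAGIC, len(vector))
    payload = struct.pack(f"<{len(vector)}f", *vector)
    return header + payload

def unpack_vecf(blob):
    """Inverse of pack_vecf; raises ValueError on a bad magic."""
    magic, dim = struct.unpack_from("<4sI", blob, 0)
    if magic != MAGIC:
        raise ValueError("not a VECF blob")
    return list(struct.unpack_from(f"<{dim}f", blob, 8))

blob = pack_vecf([0.0] * 384)
print(len(blob), "bytes")
```

Reading such a file back needs no parsing beyond the fixed header, which is also what makes memory-mapped access straightforward.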
bonfyre-moq is the C-first realtime transport layer for BonfyreTel.
Built on ngtcp2 + nghttp3 + OpenSSL 3 + SQLite, it gives Bonfyre a browser-native relay primitive that matches the rest of the stack:
small binary, local control, SQLite observability, and no dependency on a hosted realtime vendor.
bonfyre-moq.c: Handles QUIC, HTTP/3, WebTransport session setup, MoQ control parsing, subscriber fan-out, and SQLite stream/session logging in one C binary.
transport_sessions + stream_events: Uses the same Bonfyre style as the rest of the stack: log the useful state to SQLite, keep the binary understandable, and make downstream processing compose cleanly with other tools.
bonfyre-moq is the product path: The goal is not “a Node relay with a C rewrite later.” The goal is all C. The Node moq-edge subtree is useful as a spec/reference harness, but the main narrative and target runtime stay native.
inference.c / optimizer.c / mesh.c / consensus.c: These modules exist, but they should be read as optional experiments around the relay, not the core promise. The core promise is simpler: private realtime media transport in a small native binary.
# Build the native relay
make bonfyre-moq
# Run relay
./bonfyre-moq --host 127.0.0.1 --port 4443 \
--runtime-dir /tmp/bonfyre-moq \
--db /tmp/bonfyre-moq/relay.db
# Optional extension-module check
make test-bonfyre
A small stack of focused binaries and libraries. Not a monolith. Not a framework. Each piece stays legible enough to run on its own.
Each binary does one thing. Compose with pipes, files, or the pipeline binary, for example: `bonfyre-media-prep audio.wav | bonfyre-transcribe | bonfyre-brief`
Every binary runs as its own process. No shared memory. If one crashes, nothing else does. Separate processes clean up on exit.
Whisper via libwhisper (Homebrew). LLM via llama-completion subprocess. SQLite via system library.
The stack now spans 51 focused binaries shown here, plus shared libraries. Every piece is standalone enough to use alone. ~2.1 MB total disk.
Every binary links libbonfyre.a (~180 KB). Zero external dependencies beyond libc + SQLite. All modules are C11, portable across macOS/Linux/BSD, with SIMD fast paths on ARM NEON and x86 SSE4.2 where applicable.
Family-based JSON compression: artifacts sharing structure (same schema, same field names) compress dramatically better together. Every tier preserves O(1) random field access, so no full decompression is needed.
Every pipeline stage is a typed operator with declared behavior. The scheduler uses these flags to parallelize, cache, and reorder safely.
Substrate (9 ops): ingest, normalize, segment, chunk
Transform (22 ops): transcribe, summarize, score, semantics
Surface (10 ops): format, render, template, deliver
Value (10 ops): price, invoice, pay, distribute, prove
PURE: deterministic, no side effects, safe to cache & parallelize
STATEFUL: mutates external state (DB, filesystem)
CACHEABLE: output cached by SHA-256(input) in LMDB
REVERSIBLE: undo supported (audit, rollback)
IDEMPOTENT: safe to retry without duplication
BF_EXACT_BYTE: bit-identical output every run (hash, compress)
BF_EXACT_CANON: canonical but with format flexibility (JSON field reorder)
BF_EXACT_LOSSY: model-dependent (transcription, summarization)
Every operator input/output is a BfArtifact. SHA-256 content hash for identity. family_key (FNV-1a-64 of type+system) groups related artifacts. canonical_key distinguishes format variants. Binary cache files: .bfrec records + .bfsum summaries.
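FNV-1a-64 is a small, well-specified hash, so the family key is easy to reproduce outside the C code. The hash function below is the standard algorithm; how `type` and `system` are combined before hashing is not stated here, so the NUL-separated concatenation in `family_key` is an assumption for illustration only.

```python
FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1a_64(data: bytes) -> int:
    """Standard FNV-1a, 64-bit: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h

def family_key(artifact_type: str, system: str) -> int:
    # Assumption: the two fields are joined with a NUL byte before hashing.
    return fnv1a_64(artifact_type.encode("utf-8") + b"\x00" + system.encode("utf-8"))

print(hex(fnv1a_64(b"a")))
print(hex(family_key("transcript", "bonfyre-transcribe")))
```

Two artifacts with the same type and system get the same key regardless of their content, which is what lets them be grouped into a family; the SHA-256 content hash remains the per-artifact identity.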
Capabilities that span every binary, not just one stage.
Every artifact gets a SHA-256 identity. If the hash already exists in the local LMDB cache, the stage is skipped entirely. Re-running a pipeline on the same input costs near zero, because only changed stages execute.
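The skip-if-cached behavior can be sketched with a content-addressed lookup. In this illustration a plain dictionary stands in for the LMDB store, and the cache key mixes the stage name into the SHA-256 input so that two stages fed the same bytes do not collide; that key layout and the `StageCache` class are assumptions, not the stack's actual scheme.

```python
import hashlib

class StageCache:
    """Content-addressed stage cache; a dict stands in for LMDB."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def run(self, stage_name, fn, payload: bytes) -> bytes:
        key = hashlib.sha256(stage_name.encode() + b"\x00" + payload).hexdigest()
        if key in self._store:
            self.hits += 1            # stage skipped entirely
            return self._store[key]
        result = fn(payload)          # only runs on a cache miss
        self._store[key] = result
        return result

executions = []

def normalize(data: bytes) -> bytes:
    executions.append(data)
    return data.strip().lower()

cache = StageCache()
first = cache.run("normalize", normalize, b"  Hello Bonfyre  ")
second = cache.run("normalize", normalize, b"  Hello Bonfyre  ")
print(first, second, len(executions), cache.hits)
```

This only makes sense for stages whose output depends solely on their input, which is why the operator flags distinguish PURE and CACHEABLE stages from STATEFUL ones.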
512 MB memory-mapped store with 4 named databases (artifacts, families, blobs, meta). Reads are pointer casts into mmap, with no malloc and no deserialization. Cache lookups take ~200 ns vs ~5 ms disk I/O.
Artifacts with the same schema (type+system) share a family_key. Lambda-tensors compress families 7× better than individual compression because field names, types, and structure are shared across the family.
Luby Transform rateless encoding: sender emits coded packets forever, receiver reconstructs from any K-of-N. 5% overhead. Perfect for satellite, mobile, or lossy last-mile delivery where TCP retransmits are expensive.
bf_hotload watches operator .so plugins via kqueue/inotify. When a new version appears, the pipeline swaps it in without restart. Supports A/B testing across operator versions with per-stage metadata.
Chase-Lev lock-free deques with C11 atomics: each core gets a private work queue, idle cores steal from random peers. Combined with libdill coroutines (~50 ns context switch), pipelines fully saturate all cores with minimal coordination.
SSE4.2 and ARM NEON fast paths in JSON parsing (16-byte aligned scan), UTF-8 validation (PCMPISTRI), text pattern fusion (Hyperscan >1 GB/s), and vector cosine similarity. Automatic runtime dispatch falls back cleanly on older hardware.
BLAKE2b (hash/HMAC/KDF), XSalsa20-Poly1305 (AEAD encryption), Ed25519 (signing). No OpenSSL dependency. Portable C implementation with optional libsodium delegation if available.
Each is standalone; you don't need to understand the whole system.
Run models directly from .fpq files. patch_model() replaces nn.Linear with FPQLinear. TinyLlama: 155 layers, cos=0.997, top-1=97.3%. bonfyre-oss.
patch_model(hf_model, fpq, resolver)
4× more context tokens in same VRAM. 9 optimization passes. Works with nn.Linear and FPQLinear simultaneously.
patch_kv_cache(model, bits=4, adaptive_bits=True)
Quantize LLM weights to 3-bit with 0.9999+ cosine. E8 lattice snap + μ-law warp + 16D RVQ. 42 KB binary.
bonfyre-quant benchmark model.gguf --bits 3
Replace Strapi's 500 MB with a 287 KB binary. Dynamic schemas, token auth, REST API. bonfyre-cms.
bonfyre-cms serve --port 8800
Local speech for public or private audio. Live proof. bonfyre-intake.
bonfyre-transcribe run audio.wav
Audio → transcript → summary → quality score → pricing → deliverable. 5–8 ms per stage. bonfyre-pipeline.
bonfyre-pipeline run --input audio.mp3
Embed docs + NEON SIMD cosine search. Replace $250/mo Pinecone. bonfyre-embed.
bonfyre-embed --insert-db my.db
Pure C WebTransport/MoQ relay for private live audio and browser-native media flows. Keeps the same Bonfyre style: small binary, local control, SQLite observability, no hosted relay vendor in the middle.
./bonfyre-moq --port 4443 --db relay.db
Drop-in replacement. Set OPENAI_API_BASE=http://localhost:8787. 53 KB binary.
bonfyre-proxy serve --port 8787
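Because the proxy is described as OpenAI-compatible, an existing client only needs its base URL changed. The sketch below builds a model-listing request with the standard library. The `/v1/models` path follows the usual OpenAI convention and is an assumption about this proxy; whether the base URL should already include `/v1` depends on the client in use, so the path is kept as a parameter.

```python
import json
import os
import urllib.request

DEFAULT_BASE = "http://localhost:8787"

def build_models_request(base_url=None, path="/v1/models"):
    """Build (but do not send) a GET request for the model list."""
    base = base_url or os.environ.get("OPENAI_API_BASE", DEFAULT_BASE)
    url = base.rstrip("/") + path
    return urllib.request.Request(
        url, headers={"Accept": "application/json"}, method="GET"
    )

def list_models(base_url=None, timeout=5.0):
    """Send the request; requires a running bonfyre-proxy instance."""
    with urllib.request.urlopen(build_models_request(base_url), timeout=timeout) as resp:
        return json.load(resp)

request = build_models_request("http://localhost:8787")
print(request.get_method(), request.full_url)
```

Calling `list_models()` only works once `bonfyre-proxy serve --port 8787` is running locally; the request builder on its own has no network side effects.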
Every number on this site comes from scripts in the repo.
# Weight roundtrip: Wan2.1 (307 tensors)
./bonfyre-fpq roundtrip-v9 ~/.local/share/models/wan2.1-t2v-1.3b/diffusion_pytorch_model.safetensors --bits 3
# Compress GGUF (llama.cpp compatible)
./bonfyre-fpq quantize model.gguf compressed.gguf --bits 3
# Compress safetensors
./bonfyre-fpq quantize model.safetensors compressed.safetensors --bits 3
# Perplexity benchmark
python3 perplexity_benchmark.py --model Qwen/Qwen2.5-0.5B --bits 3 --mode v8
# DiT forward-pass comparison
python3 scripts/wan_dit_compare.py
# SLI bridge inference test
python3 test_sli_plus_kvcache.py --device mps
# Full Bonfyre pipeline
git clone https://github.com/Nickgonzales76017/bonfyre-oss.git && cd bonfyre-oss && make
time ./bin/bonfyre-pipeline run --input audio.wav
# Run tests
make test # 167/167 tests
All scripts, logs, and CSVs: 10-Code/BonfyreFPQ/ · Proof pack: results/2026-04-10-proof-pack/
Build from source in under 60 seconds. All 83 modules, pure C11, no runtime dependencies.
Platform support
| Platform | Status | Path |
|---|---|---|
| macOS arm64 (Apple Silicon) | ✓ native | make && make install or ./install.sh |
| macOS x86_64 (Intel) | ✓ native | make && make install or ./install.sh |
| Linux x86_64 | ✓ build from source | make && make install or docker build . |
| Linux arm64 | ✓ build from source | make && make install |
| Windows | WSL2 | use WSL2 + Ubuntu path above |
Note: if you downloaded a zip, the bundled binaries are Mach-O arm64 (macOS Apple Silicon only) and will not run on Linux. Linux users must build from source or use Docker.
macOS: build from source
# Clone and build (macOS, recommended)
git clone https://github.com/Nickgonzales76017/bonfyre-oss.git
cd bonfyre-oss
make # builds all 83 modules from source
make install # copies binaries to ~/.local/bin
# One-liner (macOS)
curl -fsSL https://raw.githubusercontent.com/Nickgonzales76017/bonfyre-oss/main/install.sh | sh
Linux: build from source
# Ubuntu / Debian: install deps, then build
sudo apt-get install -y gcc make libsqlite3-dev zlib1g-dev pkg-config
git clone https://github.com/Nickgonzales76017/bonfyre-oss.git
cd bonfyre-oss
make CC=gcc # builds all modules
make install # copies to ~/.local/bin
# One-liner (Linux)
curl -fsSL https://raw.githubusercontent.com/Nickgonzales76017/bonfyre-oss/main/install.sh | sh
Docker: any platform
# Build + run the full Bonfyre stack in Docker (Linux x86_64 binaries)
git clone https://github.com/Nickgonzales76017/bonfyre-oss.git
cd bonfyre-oss
docker compose up -d # API on :9999, queue worker, SQLite WAL
# Extract Linux binaries from Docker image (macOS → Linux cross-build)
make docker-build # outputs Linux x86_64 binaries to out/linux-x86_64/
Build requirements: C11 compiler (gcc ≥ 9 or clang ≥ 11), make, SQLite3 dev headers, zlib.
Optional: ONNX Runtime (bonfyre embed), FreeSWITCH (bonfyre tel), PyTorch SLI bridge.
Full source. Real benchmarks. Reproduction scripts included.