πŸ”₯ Bonfyre

50 composable C11 binaries that replace your entire backend stack. CMS, transcription, telephony, auth, payments, vector search, compression β€” each under 70 KB.

50 Binaries
~2.1 MB Total Disk
5–8 ms Per-Stage Latency
9.3% JSON Compression
167 Tests Passing
3 Public-Origin Proofs

The first live proof corpus now traces public YouTube sources through Bonfyre's transient audio path into derived transcripts, handoff briefs, proof bundles, and in-app search without hosting the original media.

git clone https://github.com/Nickgonzales76017/bonfyre.git && cd bonfyre && make
Now shown on public-origin audio
YouTube source in. Bonfyre proof out.

The Shift Handoff app now starts from linked public videos, processes them transiently through Bonfyre, removes the downloaded media, and keeps only derived transcripts, briefs, and proof bundles. The rough edges are visible on purpose.

3 public-origin handoff videos
24-36x realtime on current proofs
0 mirrored source media
4 derived surfaces per source
1 click back to the original video
BonfyreFPQ β€” Functional Polar Quantization

Compress any model ~4Γ—.
Keep the outputs identical.

BonfyreFPQ is a pure C model compression engine that reduces neural network weight files while preserving output quality. No GPU required. No Python dependencies. No training. Just better math.

0.9999 per-weight cosine
0.9976 output cosine (30 layers)
1,790 tensors compressed
54β†’27 GB Wan2.1-T2V-14B
4 models on Hugging Face

Three layers of correctness β€” all verified

Layer 1 β€” Weight Space
cos β‰ˆ 0.9999

Near-lossless per-tensor representation across all 307 encoded tensors in Wan2.1-T2V-1.3B. Worst tensor: 0.999590.

Layer 2 β€” Network Propagation
cos β‰ˆ 0.9976

Error stays controlled after 30 stacked transformer blocks. PSNR 35.97 dB. MSE 1.01e-3.

Layer 3 β€” System Behavior
stable Γ— all timesteps

Cosine holds 0.9976–0.9983 across the full diffusion schedule. No drift amplification.

Models compressed β€” real files, real numbers

Model Domain Original Compressed Tensors Avg Cos Worst Cos Avg bpw HF Model
Wan2.1-T2V-14B Video diffusion 54 GB 27 GB 402 0.999882 0.999826 4.05 Download β†’
Phi-4 (14B) Language model 28 GB 28 GB 162 1.000614 1.000149 4.08 Download β†’
Whisper Large V3 Speech recognition 8.7 GB 5.8 GB 998 0.999916 0.999834 4.19 Download β†’
Whisper Large V3 Turbo Speech recognition 1.6 GB 1.6 GB 228 0.999929 0.999858 4.18 Download β†’
Wan2.1-T2V-1.3B Video diffusion 5.3 GB 2.7 GB 307 0.999874 0.999590 β€” local
SmolLM2-135M Language model 101 MB 258 MB F16 211 0.999855 0.999589 β€” GGUF
Gemma 2B-it Language model β€” β€” sampled 0.99995 0.99995 β€” local
Whisper base.en Speech recognition β€” β€” sampled 0.999808 0.999763 β€” GGML

All tests at 3-bit (FPQ3). Artifacts are published in two tracks on Hugging Face: (1) compatibility safetensors (direct Transformers load, larger files), and (2) native .fpq files (much smaller, but not yet directly usable in standard Transformers inference).

Production bar: compressed artifacts only count as success if they run inference directly with no offline decompression step and no quality regression.

Current blocker: until native .fpq closes that runtime gap, this is a storage result, not a finished inference innovation.

Verified example: Qwen2.5-3B safetensors = 6.18 GB total vs Qwen2.5-3B native .fpq = 692 MB total (~8.9x smaller).

Output-Level Proof β€” Wan2.1-T2V-1.3B DiT Forward Pass

Loaded the original Wan2.1 model and the FPQ-compressed version into the same WanTransformer3DModel architecture. Fed identical synthetic inputs (seed=42, shape [1,16,1,60,104], BF16 on MPS). Compared full forward pass outputs.

0.99759
Cosine Similarity
35.97 dB
PSNR
1.01e-3
MSE
6.18s β†’ 6.40s
Inference Time
Per-channel: ch0 cos=0.9960, ch1 cos=0.9993, ch2 cos=0.9952, ch3 cos=0.9980
Max absolute error: 0.138 (on a Β±0.45 std output range)
Relative error: 6.97% β€” well within the visually safe zone for video generation

Diffusion timestep sweep β€” Wan2.1-T2V-1.3B

Timestep Cosine PSNR (dB) MSE
t = 00.9983134.821.32e-3
t = 1000.9979235.491.13e-3
t = 5000.9975935.971.01e-3
t = 9000.9978236.021.00e-3
t = 9990.9980435.711.07e-3

Identical inputs at each timestep, BF16 on MPS. Cosine range: 0.99759–0.99831. Zero drift.

Perplexity benchmark β€” Qwen 2.5 0.5B, WikiText-2

994-token slice, max-length 512, stride 256. All runs on same hardware, same data.

Method PPL Ξ” baseline Avg Cos Worst Cos
Baseline (FP32)14.20β€”1.00001.0000
BonfyreFPQ @3-bit14.48+1.97%0.9997830.999588
HQQ @3-bit (g64)32.38+128%β€”β€”
COORD @3-bit (v4)35.59+150%0.9827610.982327

169 tensors quantized. HQQ run via standalone benchmark script (group-size 64, axis 1, CPU). Reproduced from proof pack.

Published 3-bit benchmarks β€” Llama-2-7B, WikiText-2

All numbers from the authors' own papers. Lower PPL = better. FP16 baseline: 5.12.

Method Bits PPL Ξ” FP16 Source
FP16 (baseline)165.12β€”β€”
AQLM3.045.46+6.6%Egiazarian et al., ICML 2024
SpQR2.986.20+21.1%Dettmers et al., 2023
AWQ36.24+21.9%Lin et al., MLSys 2024
GPTQ3.008.06+57.4%Frantar et al., 2022
HQQ3Not published on Llama-2Badri & Shelor, 2023

AQLM, SpQR, AWQ, and GPTQ numbers are from their published Llama-2-7B tables (AQLM Table 2, AWQ Table 4). All use WikiText-2 validation. AQLM is currently the best published result at 3-bit.

Where BonfyreFPQ fits: Our reproduced benchmark on Qwen 2.5 0.5B shows +1.97% PPL degradation at 3-bit (14.48 vs 14.20 FP32) β€” see proof pack above. Published methods above are benchmarked on Llama-2-7B (14Γ— larger). Direct cross-model comparison isn't valid, but the degradation pattern is informative: at 3-bit, every published method shows measurable PPL loss. BonfyreFPQ keeps it under 2%. We will publish Llama-2 benchmarks as GPU-hours allow.

Tested across domains

Per-model cosine and PPL numbers in the models table above. All artifacts on Hugging Face.

βœ“14B LLM (Phi-4)
βœ“14B Video diffusion (Wan2.1)
βœ“Speech (Whisper Large V3 + Turbo)
βœ“GGUF round-trip (SmolLM2 Q4_K)
βœ“DiT forward pass verified
βœ“1,790 tensors across 4 models

GGUF format support (llama.cpp compatible)

Reads & dequantizes
F32, F16, Q4_0, Q5_0, Q8_0
Q4_K, Q5_K, Q6_K
Writes
GGUF v3 F16 β€” direct llama.cpp load
Preserves all metadata verbatim

What's inside

Low-Rank SVD
Global structure extraction
E8 Lattice
Optimal 8D quantization
16D RVQ
Structured residual correction
Ghost Head
Rank-1 error correction

One command. Any model. llama.cpp compatible.

bonfyre-fpq quantize model.gguf compressed.gguf --bits 3
Input formats:
GGUF (llama.cpp, whisper.cpp)
Safetensors (HuggingFace)
GGML (legacy whisper)
Output formats:
GGUF F16 β†’ llama.cpp direct load
BF16 safetensors β†’ PyTorch/diffusers
Preserves all metadata + tokenizer
What this means (in plain English)

A 14B video model that needed 54 GB of disk now fits in 27 GB. Same outputs. No retraining. No calibration data. No GPU required for compression. Compressed models are standard BF16 safetensors β€” load them exactly like the originals.

Most compression methods look good at the weight level but degrade when outputs are actually measured. FPQ is the first to prove β€” through real end-to-end inference β€” that compression error doesn't accumulate across deep transformer stacks or iterative diffusion processes.

Current status: compatibility safetensors are inference-ready today. Native .fpq is the smallest storage path but is not yet direct-inference-ready, which means the core product gap is still open. Both are published on Hugging Face with explicit naming.

Published 2-bit benchmarks β€” Llama-2-7B, WikiText-2

Extreme compression regime. FP16 baseline: 5.12.

Method Bits PPL Ξ” FP16 Source
AQLM2.026.59+28.7%AQLM Table 1
QuIP#2.028.22+60.5%Tseng et al., 2024

At 2-bit, even state-of-the-art methods show 29–61% PPL degradation. BonfyreFPQ targets the 3–4 bit regime where near-lossless is achievable.

KV cache compression β€” Qwen 0.5B

Bits PPL Ξ” baseline Avg Cos
FP3211.95β€”1.000
4-bit14.77+23.6%0.9999
3-bit17.89+49.7%0.9997

KV cache is harder than weights β€” errors compound across 24 layers Γ— every token. 4-bit recommended. 3-bit degrades.

Reproduce everything

# Weight roundtrip β€” Wan2.1 (307 tensors, ~15 min on M-series)
./bonfyre-fpq roundtrip-v9 ~/.local/share/models/wan2.1-t2v-1.3b/diffusion_pytorch_model.safetensors --bits 3

# Compress to GGUF (llama.cpp compatible)
./bonfyre-fpq quantize model.gguf compressed.gguf --bits 3

# Compress safetensors (PyTorch/diffusers)
./bonfyre-fpq quantize model.safetensors compressed.safetensors --bits 3

# Perplexity benchmark
python3 perplexity_benchmark.py --model Qwen/Qwen2.5-0.5B --bits 3 --mode v8

# DiT forward-pass comparison (requires PyTorch + diffusers)
python3 scripts/wan_dit_compare.py

All scripts, logs, and CSV artifacts live in 10-Code/BonfyreFPQ/. Proof pack with raw logs: results/2026-04-10-proof-pack/

FPQ-X β€” Generalized Compression Algebra

Six operators. One compiler.
Rate–distortion–execution optimized.

FPQ-X evolves BonfyreFPQ from a quantizer into a full compression algebra. Instead of compressing tensors, FPQ-X compresses information flow β€” optimizing the joint objective of rate, distortion, and hardware execution cost.

𝒯(x,c,h,t) = (B + R + P) βŠ™ S + Ξ (x,c,h,t) + Ξ”_seq(c,t)
A = Additive core Β· M = Multiplicative manifold Β· Ξ  = Predictive restoration Β· D = Sequence distillation

Six operator families β€” each derived from 2026 published research

A
Additive
Inherited from FPQ v10

Low-rank SVD + E8 lattice + 16D RVQ + QJL projection + Ghost correction. The proven foundation delivering 0.999+ cosine across 1,790 tensors.

M
Multiplicative
Low-rank scaling manifold

Learns S = I + ABT via thin SVD of the ratio matrix Q = W/Ε΄ βˆ’ 1. Captures scaling distortion that additive methods miss. Auto-rollback if cosine doesn't improve.

Derived from: LoRDS, WaterSIC
Ξ 
Predictive
Context-conditioned restoration

Per-column linear predictor from the low-rank basis to the quantization residual. At decode time, uses the already-available L factor to predict and cancel systematic error.

Derived from: EchoKV, MoBiQuant
D
Distilled
Sequence-axis compression

Attention-weighted K-means++ on KV cache vectors. Compresses along the sequence dimension β€” tokens that attend similarly share one cache atom. Orthogonal to weight quantization.

Derived from: KVSculpt, KV-CoRE
Ξ›
Adaptive
Per-tensor policy selection

Profiles each tensor: Ξ·L (low-rank energy), spectral gap, kurtosis, outlier fraction. Decision tree selects which operators to activate and at what rank β€” no blanket compression.

Derived from: KV-CoRE, MoBiQuant
H
Hardware
Kernel-aligned packing

Inner-group quantization that aligns bit boundaries to hardware SIMD lanes. Stores scales per group instead of per-channel, enabling vectorized unpacking without scatter/gather overhead.

Derived from: InnerQ, High-Rate QMM

The FPQ-X encode pipeline

1. Ξ› Profile
β†’
2. BWA Prune
β†’
3. A Encode (v9)
β†’
4. M Scale
β†’
5. Ξ  Predict
Each stage has automatic quality rollback β€” if an operator doesn't improve cosine by >1e-7, it's disabled for that tensor. The Ξ› profiler pre-selects operators based on tensor statistics, so most tensors skip inapplicable stages entirely.

FPQ v10 vs FPQ-X β€” what changes

Dimension FPQ v10 FPQ-X
Error modelAdditive only (W β‰ˆ Ε΄)Additive Γ— Multiplicative + Predictive
Per-tensor policySame pipeline for allΞ› profiles Ξ·L, gap, kurtosis β†’ selects operators
KV cacheWeight-only quantizationD operator: sequence-axis distillation
Hardware awarenessGeneric packingH operator: SIMD-lane-aligned groups
Objectivemin β€–W βˆ’ Ε΄β€–min Ξ»RΒ·Rate + Ξ»DΒ·Distortion + Ξ»EΒ·Execution
Research basisOriginal FPQ design9 papers from early 2026

bonfyre-fpqx β€” the new CLI

# Full A+M+Ξ  pipeline β€” compress and write output
bonfyre-fpqx compress model.safetensors compressed.safetensors --bits 3

# Encode+decode roundtrip β€” measure quality (no output file)
bonfyre-fpqx roundtrip model.safetensors --bits 3

# Per-tensor compressibility analysis β€” see which operators activate
bonfyre-fpqx profile model.safetensors

# KV cache distillation β€” sequence-axis compression
bonfyre-fpqx distill cache.safetensors distilled.safetensors --atoms 256

# Hardware-aligned repacking
bonfyre-fpqx pack model.safetensors packed.safetensors --bits 3 --group-size 128

Research foundation β€” 9 papers synthesized

LoRDS
Multiplicative low-rank scaling
arXiv:2601.22716
WaterSIC
Activation-aware rate–distortion
arXiv:2603.04956
EchoKV
Predictive KV reconstruction
arXiv:2603.22910
KVSculpt
Attention-weighted cache distillation
arXiv:2603.27819
KV-CoRE
Data-dependent compressibility
arXiv:2602.05929
InnerQ
Hardware-aligned inner quantization
arXiv:2602.23200
MoBiQuant
Token-adaptive mixed precision
arXiv:2602.20191
High-Rate QMM
Activation-weighted matrix multiply
arXiv:2601.17187
Codebook Opt.
Optimal codebook initialization
arXiv:2602.06557

What Bonfyre replaces

Side-by-side against the industry incumbents. These numbers are real.

vs Strapi (CMS)
1,742Γ— smaller
500 MB install β†’ 287 KB binary. 400 npm deps β†’ 0. 200 MB RAM β†’ 15 MB. Cold start 120 sec β†’ 50 ms.
vs Deepgram (transcription)
local speech + visible proof path
Current public handoff proofs run 24-36x realtime on linked YouTube sources, keep no mirrored media, and publish transcript, clean text, brief, and proof JSON artifacts. Open live proof. Speech engine.
vs Pinecone (vector search)
$0 / month
$70–250/mo hosted β†’ 35 KB binary. Local SQLite + NEON SIMD cosine. 5 ms exact search.
vs Twilio (telephony)
68 KB binary
SaaS vendor lock-in β†’ FreeSWITCH ESL adapter. SIP/RTP, call routing, IVR β€” no per-call billing.
vs Express + Prisma
95Γ— smaller
200 MB + Node.js runtime β†’ ~2.1 MB total. Static binaries, zero runtime deps, < 50 ms startup.
vs full SaaS stack
$0 / month
Auth + billing + gateway + CMS + search: typically $2,500/mo β†’ 240 KB of Bonfyre binaries.
DeepgramOpenAI Whisper APIBonfyre + HCP
Cost$0.006/min$0.006/min$0 / minute
Current public proofNot run hereNot run here3 linked YouTube handoff sources, 0.5303-0.6887 confidence, 0.027-0.041 realtime factor
Model sizeCloud (N/A)Cloud (N/A)29 MB default (tiny q5_0) / 44 MB (base q4_0) / 24 MB (tiny q4_0)
Quality visibilityCloud summaryCloud summarySegment counts, confidence, realtime factor, proof JSON, and source trace exposed in the app
Post-process overheadN/A (cloud)N/A (cloud)<1% of decode time (unified FFT, -O3)
PrivacyCloud β€” data leaves deviceCloud β€” data leaves device100% local, offline, private
Internet requiredYesYesNo
Output formatsJSON, SRTJSON, SRT, VTTJSON + HCP metrics, TXT, SRT, VTT, meta.json
Novel algorithmProprietary cloudWhisper (standard)HCP quad-channel spectral + KIEL-CC Kalman + unified E-T Gate/formant + bigram/trigram semantic + morphological logit bias + context-seeded re-decode + quantization (q4_0/q5_0)
StrapiExpress + PrismaBonfyre
Install size~500 MB~200 MB~2.1 MB
DependenciesNode + 400 packagesNode + 80 packageslibc + SQLite
Startup time30–120 sec2–5 sec< 50 ms
Idle memory~200 MB~80 MB15 MB
Build stepnpm install (2 min)npm install (45 sec)make (8 sec)
RuntimeNode.js 18+Node.js 18+None (static binary)
Binaries1 monolith1 monolith50 composable

Pick the one that matches your problem

Each is a standalone entry point β€” you don't need to understand the whole system.

Lightweight CMS

Replace Strapi's 500 MB install with a 287 KB binary. Dynamic schemas, token auth, REST API. Repo: bonfyre-cms.

bonfyre-cms serve --port 8800
2 min to try

Local Transcription + HCP v3.2

Local speech path for public or private audio: media prep, transcription, cleaning, paragraphs, and proof artifacts. The live Shift Handoff app shows the current public-origin results, including where transcription still needs pressure. Live proof. Repo: bonfyre-intake.

bonfyre-transcribe run audio.wav
5 min to try

JSON Compression

Shrink JSON payloads to 9.3% with O(1) random field access. Near Shannon limit with arithmetic coding. Repo: bonfyre-core. Library repo.

liblambda-tensors
10 min to try

Audio-to-Invoice Pipeline

Audio β†’ transcript β†’ summary β†’ quality score β†’ pricing β†’ packaged deliverable. One command, 5–8 ms per stage. Repo: bonfyre-pipeline.

bonfyre-pipeline run --input audio.mp3
2 min to try

Semantic Vector Search

Embed documents + NEON SIMD cosine search. Replace $250/mo Pinecone β€” local, 5 ms queries. Repo: bonfyre-embed.

bonfyre-embed --insert-db my.db
5 min to try

Self-Host a SaaS Backend

Auth, payments, metering, API keys, rate limiting, telephony β€” composable binaries. ~240 KB total. Umbrella repo. Telephony repo.

bonfyre-api + auth + pay + gate + tel
15 min to try

OpenAI-Compatible API

Drop-in replacement for OpenAI endpoints. Set OPENAI_API_BASE=http://localhost:8787 and existing code just works β€” transcription via HCP, completions via bonfyre-brief. 53 KB binary, localhost only.

bonfyre-proxy serve --port 8787
1 min to try

Model Quantization (v8 RLF)

Quantize LLM weights to 3-bit with 0.9999+ cosine similarity and zero perplexity loss. E8 lattice snap + ΞΌ-law warp + 16D RVQ. Qwen 0.5B: PPL 12.07 vs 11.95 baseline (+0.9%). 42 KB binary.

bonfyre-quant benchmark model.gguf --bits 3
5 min to try

Repo Graph

Bonfyre is no longer a single opaque repo. These are the public entry vectors linked from the live product surface.

Umbrella

The top-level router for architecture, comparisons, and the full system story.

bonfyre

Shared Core

Core substrate, hashing, canonicalization, compression helpers, and the shared C runtime library.

bonfyre-core

Intake

Transcription, ingest, media prep, transcript cleanup, paragraphization, and transcript-family workflows.

bonfyre-intake

Compression Library

Standalone JSON compression and family-aware tensor substrate.

liblambda-tensors

Pipeline

The single-process pipeline surface for audio-to-invoice and other end-to-end flows.

bonfyre-pipeline

Semantic Search

ONNX-backed embeddings and local vector search for document workflows.

bonfyre-embed

CMS

Dynamic schemas, REST API, token auth, and compact content operations in one binary.

bonfyre-cms

Telephony

FreeSWITCH-based telephony, mock call flows, SMS/MMS, and verification without Twilio lock-in.

bonfyre-tel

WordPress

Bonfyre works as a high-performance companion backend for WordPress.

Use WordPress as the experience layer. Use Bonfyre as the tiny local-first engine behind search, media, AI workflows, packaging, auth, and monetization.

Use WordPress for themes, editors, plugins, and admin workflows.
Use Bonfyre for the heavy lifting: transcription, vector search, structured compression, packaging, metering, auth, pricing, and output generation.

WordPress UI Bonfyre binaries search / auth / packages / outputs

15 concrete uses

1. Podcast-to-post pipeline

media-prep → transcribe → brief

Turn episode audio into draft blog posts, summaries, and quotes — automatically.

2. Semantic site search

embed + vec

Index posts by meaning, not just keywords. Replace bloated search plugins.

3. Auto-generated article briefs

brief

Create editorial summaries and action items from long transcripts or notes for editors.

4. Premium member gateway

auth + gate + meter + pay

Back premium features or content tiers without plugin sprawl.

5. Lead magnet generator

render + emit

Produce PDFs, EPUBs, and downloadable guides from WordPress content.

6. Knowledge base search

embed + vec + query

Index docs, FAQs, uploads, and help content for fast semantic retrieval.

7. Client portal backend

auth + meter + pack

WordPress handles presentation. Bonfyre handles auth, metering, file packaging, and deliverables.

8. Call recording to CRM notes

ingest → transcribe → brief → proof → pack

For agencies and consultants: raw call audio into organized, quality-scored client packets.

9. Auto-tagging archives

tag + embed

Enrich old WordPress content with topics, categories, and semantic clusters.

10. Content repurposing engine

render + emit + distribute

Turn one long post or transcript into snippets, email copy, and social-ready assets.

11. Research library companion

embed + pack + emit

WordPress as public frontend. Bonfyre as semantic index + artifact pipeline for PDFs and transcripts.

12. Proposal & invoice automation

offer + ledger + finance + pay

Quoting and billing workflows for agencies — from proof bundles to invoices.

13. Voice memo publishing

transcribe → clean → brief → emit

Upload raw voice notes, publish cleaned, structured, summarized versions.

14. Local AI features

transcribe + embed + vec

Local-first transcription and search without cloud APIs, billing, or vendor lock-in.

15. Fast static publishing

emit + render + distribute

Use WordPress as editor/admin, then Bonfyre to emit alternate site outputs, packages, and feeds.

Binary mapping for WordPress users

WordPress needBonfyre binaries
Smarter CMS / data layerbonfyre-cms, bonfyre-api, bonfyre-index
Audio → article workflowbonfyre-media-prep, bonfyre-transcribe, bonfyre-brief, bonfyre-pack
Semantic searchbonfyre-embed, bonfyre-vec, bonfyre-query
Premium content / subscriptionsbonfyre-auth, bonfyre-gate, bonfyre-meter, bonfyre-pay
Offers / quoting / deliverablesbonfyre-offer, bonfyre-render, bonfyre-emit, bonfyre-pack
Repurposing / multi-format outputbonfyre-render, bonfyre-emit, bonfyre-distribute

Replace plugin sprawl

Typical WordPress

  • Yoast SEO — $99/yr
  • MemberPress — $179/yr
  • SearchWP — $99/yr
  • WP All Import — $99/yr
  • Gravity Forms — $59/yr
  • WooCommerce + 8 add-ons
  • Deepgram/Otter API — $/min
  • Zapier — $49/mo
vs

Bonfyre

  • 50 binaries — $0/month
  • ~2.1 MB total on disk
  • Auth + billing + metering
  • Local transcription
  • Semantic search
  • Multi-format output
  • Dynamic pricing engine
  • Zero vendor lock-in

Good fit

Publishers Agencies Course creators Membership sites Podcast networks Documentation portals Niche research sites Local businesses

Real-world recipes

You don't need to understand the binaries. Bonfyre is a behind-the-scenes engine that takes messy business input — calls, files, notes, recordings — and turns it into something useful, organized, and ready to use.

🏢 Property Managers

What you already have
Maintenance calls, voicemails, inspection notes, resident complaints, vendor quotes.
What Bonfyre does
Turns recordings and notes into clean summaries, searchable records, follow-up items, and downloadable packets.
What you get
Fewer missed requests, faster vendor coordination, cleaner records.
Try this
Resident leaves a voicemail → upload it → get a transcript, summary, issue type, follow-up notes, and maintenance packet.

🍺 Bars & Nightlife

What you already have
Staff updates, vendor calls, event ideas, shift notes, inventory issues, promo materials.
What Bonfyre does
Organizes into shift summaries, event prep notes, staff instructions, vendor records, and promo content.
What you get
Less confusion between shifts, smoother event planning, less wasted time.
Try this
Manager records a voice memo after a busy night → shift recap, issue list, and event lessons-learned.

🍕 Restaurants

What you already have
Staff training notes, supplier calls, customer comments, shift reports, menu updates.
What Bonfyre does
Turns that into clean training docs, supplier summaries, manager reports, and feedback trends.
What you get
Better staff consistency, easier training, fewer details getting lost.
Try this
Weekly manager meeting recorded → clear summary, action list, staff notices, training updates.

✂️ Salons & Barbershops

What you already have
Team updates, service notes, customer questions, training needs, promo ideas.
What Bonfyre does
Organizes into service guides, training materials, marketing drafts, and a staff knowledge base.
What you get
Faster onboarding, more consistent service, less repeated instructions.
Try this
Owner records “how we handle premium appointments” → staff guide, checklist, and FAQ page.

🏋️ Gyms & Fitness Studios

What you already have
Coach notes, member questions, program explanations, onboarding conversations.
What Bonfyre does
Creates member onboarding materials, coach summaries, training guides, searchable internal knowledge.
What you get
Better member experience, less repeated explanation, cleaner team communication.
Try this
Coach records a new class explanation → member description, coaching notes, and onboarding content.

🔧 Local Service Businesses

What you already have
Phone calls, job notes, technician updates, customer questions, estimates.
What Bonfyre does
Turns that into clean summaries, estimate materials, follow-up drafts, and job documentation.
What you get
Faster quoting, better follow-up, fewer details through the cracks.
Try this
Technician leaves a job-site voice note → job summary, customer recap, and estimate-ready notes.

🏠 Real Estate Teams

What you already have
Showing notes, listing ideas, client calls, buyer concerns, neighborhood research.
What Bonfyre does
Organizes into client-ready summaries, listing support material, and searchable deal notes.
What you get
Better communication, faster follow-up, more polished client service.
Try this
Agent uploads buyer consultation recording → needs summary, budget priorities, follow-up recommendations.

🛡️ Insurance Agencies

What you already have
Client calls, policy explanations, onboarding materials, renewal questions.
What Bonfyre does
Turns conversations into clear summaries, training materials, FAQ content, and intake records.
What you get
Clearer communication, easier staff training, less policy confusion.
Try this
Customer call uploaded → policy discussion summary, next-step notes, follow-up checklist.

⚖️ Law Offices

What you already have
Client intake calls, meeting notes, matter updates, document reviews.
What Bonfyre does
Structures into organized summaries, intake records, case prep notes, and packaged materials.
What you get
More usable internal records, less time reorganizing conversations.
Try this
Client intake call → transcript, intake summary, issues list, organized review packet.

🏥 Medical & Dental Offices

What you already have
Staff procedures, patient education material, office training notes, admin processes.
What Bonfyre does
Turns those into reusable office guides, staff SOPs, patient handouts, and searchable documentation.
What you get
Better consistency and easier staff training.
Try this
Office manager records a front-desk process → written SOP, checklist, new-staff training material.

💚 Nonprofits

What you already have
Meeting recordings, community interviews, field notes, grant ideas, donor drafts.
What Bonfyre does
Turns them into reports, grant support material, board packets, and outreach drafts.
What you get
Less admin overhead, more polished outputs from a small team.
Try this
Program debrief uploaded → summary, outcomes list, grant-language draft, board update.

📚 Schools & Training Orgs

What you already have
Lectures, workshop recordings, training notes, teacher knowledge, learning content.
What Bonfyre does
Turns that into study guides, training documents, searchable archives, and reusable course materials.
What you get
More value from the same teaching time, easier content reuse.
Try this
Workshop recording uploaded → summary, lesson notes, handout draft, archive entry.

⛪ Churches & Faith Communities

What you already have
Sermons, teaching notes, ministry updates, volunteer instructions, archive material.
What Bonfyre does
Turns them into summaries, newsletters, volunteer guides, and searchable ministry archives.
What you get
Better communication, easier reuse of important content.
Try this
Sermon audio uploaded → transcript, summary, devotional notes, newsletter-ready content.

🏛️ Museums & Historical Groups

What you already have
Oral histories, archive notes, exhibit research, recordings, educational material.
What Bonfyre does
Organizes into searchable archives, summaries, educational packets, and public-ready content.
What you get
Preservation work that is easier to access and reuse.
Try this
Oral history recording → transcript, topic summary, archive entry, exhibit-support content.

💼 Agencies & Consultants

What you already have
Client calls, workshop recordings, sales notes, proposals, reports.
What Bonfyre does
Turns those into clean summaries, proposals, report drafts, pricing support, and client-ready deliverables.
What you get
Faster service delivery, more polished outputs from the same conversations.
Try this
Discovery call uploaded → transcript, executive summary, next steps, proposal starter packet.

🎧 Clubs & Event Venues

What you already have
DJ notes, promoter messages, event recaps, guest lists, sponsor conversations.
What Bonfyre does
Turns scattered event info into organized summaries, promo assets, sponsor packets, and planning records.
What you get
Better run events, cleaner communication, faster promo cycles.
Try this
Post-event notes uploaded → event summary, top issues, social post ideas, sponsor recap.

You keep using familiar tools on the front end. Bonfyre handles the hard part behind the scenes.

Browse all recipes →

FPQ Compression Benchmarks

Every number below comes from a real run on this machine. Raw logs, scripts, and CSVs are in the repo. ↑ Back to FPQ overview

0.999882 Avg cosine β€” Wan2.1-T2V-14B (402 tensors, v9@3-bit)
0.999916 Avg cosine β€” Whisper Large V3 (998 tensors, v9@3-bit)
1,790 Tensors compressed across 4 production models
+1.97% PPL degradation β€” Qwen 0.5B @3-bit
54β†’27 GB Wan2.1-T2V-14B (50% compression, 14B params)
28 GB Phi-4 14B β€” near-lossless (cos 1.000614)
8.7β†’5.8 GB Whisper Large V3 (33% compression, 1.55B params)
0.999826 Worst-case tensor cosine (Wan2.1-T2V-14B, 402 tensors)
4.05–4.19 Bits per weight range across all models @3-bit
0.99759 DiT output cosine (30 transformer blocks)
+128% HQQ @3-bit PPL (32.38 β€” FPQ is 65Γ— less degradation)

Artifact links

Hugging Face Model Hub

Inference-ready track: BF16 safetensors (drop-in, no special loader). Native .fpq track is published separately as an unfinished storage path until direct inference works without that extra runtime gap.

Wan2.1-T2V-14B (54β†’27 GB) β†’ Phi-4 14B (28 GB) β†’ Whisper Large V3 (8.7β†’5.8 GB) β†’ Whisper Large V3 Turbo (1.6 GB) β†’
Proof Pack (2026-04-10)

Qwen perplexity (v8 vs v4 vs HQQ), Whisper roundtrip, CSV, PNG chart, reproduction commands.

View proof pack β†’
DiT Comparison JSON

Forward pass metrics, per-channel analysis, timestep sweep, timing data. Machine-readable.

View comparison script β†’
BENCHMARKS.md

Full benchmark report: version progression, weight tables, KV cache, speed optimization, binary sizes.

View benchmarks doc β†’
Perplexity Benchmark Script

Python script to reproduce Qwen PPL results. Supports v4/v8 modes, configurable tokens/stride.

View script β†’
BonfyreFPQ Source

Pure C11 engine: main.c, fpq_codec.c, ggml_reader.c, fpq.h. Builds with make on macOS/Linux.

View source β†’
Wan2.1 Roundtrip Log

Full 307-tensor v9 roundtrip log showing per-tensor cosine, adaptive rank, E8/RVQ diagnostics.

View log β†’

Benchmarks

Apple M-series, measured after 5 optimization passes (P0–P5). All numbers are real. See also: FPQ compression benchmarks.

5–8 ms Per-stage latency (was 76 ms)
9.3% Lambda Tensors compression (N=10K)
237 ms ONNX embed (was 600 ms Python)
6 ms fastText inference (was 150 ms Python)
536 bytes Artifact struct (was 1,076)
5 ms SIMD exact vector search
15.5Γ— Batch embed speedup (10 files)
~10Γ— Hash hex (LUT vs snprintf)
3 public-origin handoff proofs live now
0 mirrored source media retained
6/43 segments flagged in current public proof set

Reproduce these numbers

# Public-source proof path (transient media, retained artifacts)
git clone https://github.com/Nickgonzales76017/hcp-whisper.git
cd hcp-whisper && make
./hcp-whisper -m models/ggml-tiny.en-q5_0.bin -f your-audio.wav --output-json
# Inspect confidence, realtime factor, and retained derived artifacts
# Run the full test suite (167 tests)
make test # hcp-whisper: 167/167 tests
# Bonfyre pipeline (5-8 ms per stage)
git clone https://github.com/Nickgonzales76017/bonfyre.git
cd bonfyre && make
time ./bin/bonfyre-pipeline run --input audio.wav

Current public proof set: Nursing School Explained and AHRQ Patient Safety YouTube sources linked inside the Shift Handoff app. Bonfyre downloads source media transiently, processes it, deletes the local media copy, and publishes only derived artifacts.

5 optimization passes. Zero regressions.

Every pass shipped. Every test passes. 167 tests across hcp-whisper + 2 libraries + 47 binaries.

P0 β€” Foundation
Pure C rewrite
Python β†’ C11, ONNX multi-thread, VECF binary format, -O3 -march=native -flto
P1 β€” Tokenizer
Trie + inline DB
Hash table β†’ trie tokenizer, --insert-db zero-file-I/O embed path
P2 β€” SIMD
Batch + cosine
NEON SIMD cosine, batch embed, libbonfyre shared runtime (8 binaries)
P3 β€” Native
fastText in C
Pure C fastText inference, libbonfyre β†’ 29 binaries, DB connection pooling
P4 β€” Architecture
Hardening pass
FNV hash registry, SHA-256 dedup, PGO targets, TCP_NODELAY, SIGPIPE handling
P5 β€” Datatype
10 syntax wins
Hex LUT ~10Γ—, struct 1076β†’536, O(nΒ²)β†’O(n), raw syscalls, switch dispatch
MetricBefore (P0)After P5Improvement
Single embed~600 ms (Python)237 ms2.5Γ—
10-file batch embed~6,000 ms386 ms15.5Γ—
Pipeline (6 stages)76 ms8 ms9.5Γ—
Tag inference~150 ms (Python)6 ms25Γ—
Hash hex conversion~100 ns (snprintf)~10 ns (LUT)~10Γ—
Artifact struct1,076 bytes536 bytes2Γ— cache density
Operator lookupO(n) linearO(1) FNV hashalgorithmic
Token generationO(nΒ²) strlen loopO(n) tracked offsetalgorithmic
Vector file (384-dim)6.4 KB JSON1,544 bytes VECF4.2Γ— smaller
Public proof confidencenot measured here0.5303-0.6887 across 3 linked handoff videosvisible, not hidden
HCP pipelineN/Aspectral + KIEL-CC + E-T Gate + formant + logit bias<1% overhead (unified FFT)
Flagged segmentsundetected6 / 43 in the current public-origin proof setshown in proof JSON
Duplicate code34 copies1 each (libbonfyre)eliminated

Architecture

48 separate binaries. Not a monolith. Not a framework. Each is a standalone Unix process.

Unix philosophy

Each binary does one thing. Compose them with pipes, files, or the pipeline binary. bonfyre-media-prep audio.wav | bonfyre-transcribe | bonfyre-brief

Process isolation

Every binary runs as its own process. No shared memory. If one crashes, nothing else does. 15-minute audio files process without leaks β€” separate processes clean up on exit.

Dynamic linking

Whisper via libwhisper (Homebrew). LLM via llama-completion as a subprocess. SQLite via system library. No static megabinary.

Pipeline DAG

Audio in β†’ ingest β†’ media-prep β†’ transcribe β†’ transcript-clean β†’ paragraph β†’ brief β†’ proof β†’ pack β†’ distribute
                                    ↳ embed β†’ vec (semantic search branch)
                                    ↳ tag + tone (enrichment branch)
                                    ↳ render β†’ emit (HTML/PDF/EPUB/RSS output)

What Bonfyre is not

Not an LLM runner

Ollama, LocalAI, and LM Studio serve LLM inference. Bonfyre is a content processing pipeline β€” it uses models as tools inside a larger workflow, not as the product itself.

Not a framework

No SDK, no plugins, no config DSL. Each binary reads files or stdin, writes files or stdout. Compose them however you want β€” shell scripts, Makefiles, GitHub Actions.

Not a monolith

47 separate executables, each 34–287 KB. Use one binary for one job, or chain 10 into a pipeline. No coupling β€” swap, skip, or replace any stage.

Five layers

Every binary declares its behavioral class. Transform binaries are pure β€” same inputs, same outputs, cacheable.

Surface cms Β· api Β· auth Β· pipeline Β· cli Β· transcript-family Β· project Β· tel Β· proxy
Value offer Β· gate Β· meter Β· ledger Β· finance Β· outreach Β· pay Β· pack Β· distribute
Transform media-prep Β· transcribe Β· transcript-clean Β· paragraph Β· brief Β· proof Β· embed Β· narrate Β· render Β· emit Β· mfa-dict Β· weaviate-index Β· repurpose Β· segment Β· clips Β· speechloop Β· tone Β· tag Β· canon Β· query
Substrate ingest Β· hash Β· index Β· compress Β· stitch Β· graph Β· runtime Β· queue Β· sync
Libraries libbonfyre (runtime contract, FNV hash registry, SHA-256, 47 operators) Β· liblambda-tensors (family compression, Huffman, arithmetic coding)

Full architecture doc β†’

All 50 binaries

Every binary is standalone. Use one or use all. ~2.1 MB total disk.

Substrate (9 binaries)

bonfyre-ingest 35 KB β€” intake + type detection
bonfyre-hash 34 KB β€” SHA-256 content addressing
bonfyre-index 68 KB β€” SQLite artifact index + FTS
bonfyre-compress 34 KB β€” zstd family-aware compression
bonfyre-stitch 34 KB β€” DAG materializer
bonfyre-graph 51 KB β€” Merkle-DAG artifact graph
bonfyre-runtime 34 KB β€” process lifecycle
bonfyre-queue 34 KB β€” persistent job queue
bonfyre-sync 34 KB β€” cross-instance replication

Transform (22 binaries)

bonfyre-media-prep 34 KB β€” audio normalization
bonfyre-transcribe 34 KB β€” speech-to-text (Whisper)
bonfyre-transcript-clean 34 KB β€” remove filler words
bonfyre-paragraph 35 KB β€” structure paragraphs
bonfyre-brief 34 KB β€” summary + action items
bonfyre-proof 34 KB β€” quality scoring
bonfyre-embed 52 KB β€” ONNX embeddings, trie tokenizer, batch, --insert-db
bonfyre-vec 35 KB β€” SIMD cosine vector search (sqlite-vec)
bonfyre-narrate 68 KB β€” verified TTS: 6-layer fidelity, inline FFT, 27-feature fingerprint, closed-loop verification, zero external deps
bonfyre-render 34 KB β€” template rendering
bonfyre-emit 34 KB β€” HTML/PDF/EPUB/RSS output
bonfyre-mfa-dict 34 KB β€” pronunciation dictionary
bonfyre-weaviate-index 34 KB β€” Weaviate vector search
bonfyre-transcript-family 34 KB β€” full transcription chain
bonfyre-repurpose 34 KB β€” content repurposing
bonfyre-segment 50 KB β€” speaker segmentation
bonfyre-clips 35 KB β€” audio clip extraction
bonfyre-speechloop 34 KB β€” live speech loop
bonfyre-tone 34 KB β€” tone/sentiment (openSMILE)
bonfyre-tag 35 KB β€” topic tagging (native fastText)
bonfyre-quant 42 KB β€” v8 RLF weight quantization (E8 lattice + ΞΌ-law + 16D RVQ, 0.9999 cos @ 3-bit)
bonfyre-kvcache 42 KB β€” KV cache compression (E8 lattice + ΞΌ-law + 16D RVQ, 4-bit recommended)

Surface (9 binaries)

bonfyre-cms 287 KB β€” CMS + Lambda Tensors
bonfyre-api 69 KB β€” HTTP gateway + dashboard
bonfyre-auth 35 KB β€” user auth + sessions
bonfyre-pipeline 52 KB β€” unified pipeline (5-8 ms/stage)
bonfyre 34 KB β€” unified CLI dispatcher
bonfyre-project 34 KB β€” project scaffolding
bonfyre-tel 68 KB β€” FreeSWITCH telephony (SIP/RTP)
bonfyre-canon 35 KB β€” canonical artifact format
bonfyre-proxy 53 KB β€” OpenAI-compatible API shim (drop-in replacement)

Value (9 binaries)

bonfyre-offer 34 KB β€” dynamic pricing
bonfyre-gate 34 KB β€” API key tiers
bonfyre-meter 34 KB β€” usage tracking
bonfyre-ledger 34 KB β€” financial records
bonfyre-finance 51 KB β€” bundle pricing
bonfyre-outreach 51 KB β€” outreach tracking
bonfyre-pay 35 KB β€” invoicing + payments
bonfyre-pack 34 KB β€” deliverable packaging
bonfyre-distribute 34 KB β€” email/Slack/webhooks

Libraries

libbonfyre 64 KB β€” runtime contract, FNV hash operator registry, SHA-256
liblambda-tensors 72 KB β€” structural JSON compression (Huffman, arithmetic coding)

Install

Build from source in under 60 seconds.

# From source (recommended)
git clone https://github.com/Nickgonzales76017/bonfyre.git
cd bonfyre
make # builds 2 libraries + 47 binaries
make install # copies to ~/.local/bin
# One command (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/Nickgonzales76017/bonfyre/main/install.sh | sh

Requirements: C11 compiler (gcc or clang), SQLite3 dev headers, zlib. Optional: ONNX Runtime (for embed), FreeSWITCH (for tel).

Live Apps

20 apps running on GitHub Pages

Real end-user applications powered by Bonfyre binaries. Each runs a hybrid architecture: WASM client-side preview + GitHub Actions server-side pipeline. Drag-and-drop a file, watch it process.

Shift Handoff Board
Public-origin handoff videos become traceable shift cards, clean transcripts, briefs, and proof bundles.
media-prep → transcribe → clean → brief → proof
Memory Atlas
Voice notes transcribed and placed on an interactive timeline.
transcribe → brief → embed → render
Freelancer Evidence Vault
Client calls become timestamped invoices and proof-of-work records.
transcribe → brief → proof → pack
Customer Voice Board
Customer interviews distilled into a searchable insight dashboard.
transcribe → tone → tag → embed
Family History Museum
Family recordings organized into a browsable oral history museum.
transcribe → brief → render
Podcast Plant
Raw audio becomes a published podcast site with RSS feed.
media-prep → transcribe → brief → emit
Postmortem Atlas
War room recordings become searchable postmortem archives.
transcribe → tag → embed → render
Explain This Repo
Source code analyzed and transformed into an onboarding guide.
ingest → canon → brief → render
Town Box
Civic OS β€” meeting recordings become public-facing town dashboards.
transcribe → brief → tag → render
Grant Evidence Pack
Stories and interviews packaged into grant-ready evidence bundles.
transcribe → proof → pack → emit
Micro-Consulting Storefront
Package and meter consulting engagements with dynamic pricing.
offer → meter → pack
Personal Legal Prep Binder
Documents scored, tagged, and packed into legal-ready binders.
proof → tag → pack
OSS Maintainer Cockpit
Ingest issues and PRs into a searchable embedded knowledge base.
ingest → tag → embed
Release-Note Radio
Changelogs narrated into audio and published as an HTML + RSS site.
narrate → render → emit
Async Standup Newspaper
Voice standups analyzed for tone and rendered as a daily newspaper.
tone → render
Competitive Intelligence Scrapbook
Market data embedded and searchable in a vector-powered intel database.
embed → vec → index
Sales Call Distiller
Sales calls analyzed for tone, clipped, tagged, and embedded for search.
tone → clips → tag → embed
Procurement Memory Site
Procurement docs embedded into a vector-searchable ledger.
embed → vec → ledger
Museum Exhibit Builder
Audio clips rendered into interactive museum exhibit pages.
clips → render
Local Archive Explorer
Documents embedded and indexed into a searchable local archive.
embed → vec → index

Every app uses the same pattern: Git hooks run Bonfyre pipelines on commit. GitHub Actions process server-side. 22 KB WASM module gives instant client-side previews.

MIT Licensed. Do whatever you want with it.

One repo. 50 binaries. ~2.1 MB. 167 tests. 5 optimization passes.