Bonfyre is a behind-the-scenes engine that takes messy business input β calls, files, notes, recordings β and turns it into something useful, organized, and ready to use. No cloud bills. No vendor lock-in. Runs on your hardware.
This isn't a wrapper around someone else's tooling. Every piece is built from scratch in C.
Bonfyre's binaries are organized into four layers: Substrate (ingest, normalize, segment), Transform (transcribe, summarize, score, embed), Surface (format, render, deliver), and Value (price, invoice, pay, prove). Each operator declares whether it's cacheable, reversible, and idempotent β so the pipeline can parallelize, skip, and retry safely.
The stack is not just bigger. The shape of the product has changed in useful ways.
A week ago the public story was still mostly compression, transcription, and backend footprint. The newer repo shape adds a more operational Bonfyre: captures become artifacts, artifacts become recipes, recipes become execution plans, and the CLI now exposes more of the health and registry state needed to work with that system confidently.
The Shift Handoff app takes linked public videos, transcribes them locally, generates summaries and proof bundles, and removes the original media. Everything is traceable back to the source.
Real cost and complexity savings. These are real numbers.
These workflows weren't possible before β they needed expensive cloud GPUs or huge servers. Now they run on your laptop.
Record a client intake call. Bonfyre transcribes it locally (HCP Whisper with speaker segmentation), identifies key issues via BM25 ranking, scores transcript quality, and packages everything into a structured brief with content-addressed artifacts. Your client's data never touches the cloud β BLAKE2b hashing proves provenance.
Patient conversations transcribed and summarized into structured SOAP notes β all on-premise. HIPAA-compliant by default because nothing leaves the building. XSalsa20-Poly1305 encryption at rest, Ed25519 signing for audit trails. An 8 GB device runs the entire stack: transcription, summarization, quality scoring, and delivery.
Drop in a raw episode β get a transcript (speaker-segmented), show notes, social media clips, and a checkout page with dynamic pricing β all generated locally in under 5 minutes. The pipeline: bonfyre-intake β bonfyre-transcribe β bonfyre-summarize β bonfyre-format β bonfyre-offer. Each stage is content-addressed; re-runs skip unchanged work.
AI video generation models that used to need $10K+ cloud GPUs now run on a $2,000 workstation. Generate longer sequences without running out of memory β Bonfyre's compression makes the difference.
Record interviews in the field with no internet. Bonfyre transcribes on-device, embeds documents for SIMD vector search, creates searchable archives with zero-copy LMDB caching, and publishes a static site β all from a MacBook. Fountain codes handle lossy syncs back to base.
Record a 90-minute lecture. Bonfyre transcribes it, generates structured notes, and produces a study guide β all locally. Student data never goes to any third party.
Run AI quality inspection on the factory floor with no internet required. A small device handles the entire model + logging + API. Fits in a 10-watt power envelope.
Full AI inference on hardware with no internet connection. Bonfyre's compression means large AI models now fit on portable, field-deployable devices.
bonfyre-moq is the realtime version of the same idea as the rest of Bonfyre: a small C binary you can run yourself. It terminates WebTransport, speaks MoQ, records stream events to SQLite, and gives you a private relay primitive for live audio, agent calls, and browser-native media flows.
Calls, files, notes, recordings, repos, and live streams come in β Bonfyre processes them locally β organized output comes out as summaries, briefs, archives, pages, audio, search surfaces, and delivery flows. No cloud dependency. No recurring vendor chain.
If you have a laptop made in the last 5 years, you can probably run Bonfyre. Here's what different hardware can handle.
| What You Have | What You Can Run |
|---|---|
| Raspberry Pi / 8 GB device | Small AI models, local transcription, full pipeline processing |
| MacBook / 16 GB laptop | Speech transcription + AI inference + video processing β all running together |
| MacBook Pro / 64 GB Mac | Large AI models (14B parameters), full video generation, 32K-token context windows |
| Cloud GPU (T4 / 16 GB) | Everything a 64 GB Mac can do, plus batch processing and multi-model workflows |
| Workstation GPU (RTX 4090+) | Full video generation, multi-second sequences, all models simultaneously |
AI models that used to require expensive cloud GPUs now fit on hardware you already own. Bonfyre's compression technology makes models ~4Γ smaller while keeping quality at 99.9%+. That's the difference between "cloud only" and "runs on your laptop."
Each is a standalone starting point β you don't need to understand the whole system.
Replace bloated CMS platforms with a 287 KB binary. Dynamic schemas, token auth, REST API, full-text search β 1,742Γ smaller than Strapi with zero npm dependencies. Repo β
HCP-enhanced Whisper runs locally: speaker segmentation, filler/hallucination filtering, quality scoring (88 audio features via eGeMAPSv02). No cloud, no API keys, no per-minute charges. Live proof β Repo β
Drop in audio β transcript β summary β quality score β pricing β packaged deliverable. 5β8 ms per stage. Content-addressed caching skips unchanged stages on re-runs. Repo β
NEON SIMD cosine similarity + BM25 text ranking. Zero-copy LMDB storage (pointer casts into mmap). Replace $250/mo Pinecone with local 5 ms queries. Repo β
Auth, metering, API gateway, rate limiting, telephony, proxy, orchestration, work-stealing thread pools, compression, config, and CLI surfaces β all backed by the same shared runtime instead of a pile of separate services. Repo β
3-bit quantization via E8 lattice snap + ΞΌ-law warp + 16D RVQ. Cosine 0.9999+. 15 models published on HuggingFace. Run on a $500 laptop what used to require a $10K GPU. Repo β
53 KB binary. Set OPENAI_API_BASE=http://localhost:8787. Same endpoints, local processing, $0 cost. Good when you want existing tools to keep working while Bonfyre handles transcription, briefing, and local model-facing flows underneath.
bonfyre-orchestrate is not a chat wrapper. It is a planner layer that scores pathways, stores policy memory, and decides when extra Bonfyre blocks like narrate, render, emit, or pack should be used to improve the result.
KV cache compression: 4Γ more context tokens in the same VRAM. 9 optimization passes. Works with both nn.Linear and FPQ layers. Process entire 90-minute lectures in a single pass.
bonfyre-narrate turns briefs, updates, and changelogs into audio with inline feature extraction, fidelity scoring, and refinement. It is not just βTTS added laterβ β it is a proof-minded output layer in the same stack.
Run a private WebTransport/MoQ relay in C for browser-native audio, live agent sessions, and local media fan-out. Same Bonfyre values: local-first, low overhead, SQLite logs, no recurring relay vendor tax. Docs β
Use SIP where it fits, MoQ where it fits, and swarm delivery where it fits. Bonfyre now has a broader transport story: telephony, browser-native relay, QUIC-aware artifact movement, and metered peer distribution in one family.
Keep using WordPress for the front end. Let Bonfyre handle the heavy lifting behind the scenes.
Turn episode audio into draft posts, summaries, and quotes automatically.
Search by meaning, not just keywords. Replace bloated search plugins.
Create editorial summaries from long transcripts or notes.
Back premium content tiers without plugin sprawl.
Produce PDFs, EPUBs, and guides from your WordPress content.
Index docs, FAQs, and help content for fast retrieval.
WordPress handles presentation. Bonfyre handles auth, metering, and deliverables.
Raw call audio into organized, quality-scored client packets.
Enrich old content with topics, categories, and clusters.
Turn one post into snippets, email copy, and social-ready assets.
Semantic index + artifact pipeline for PDFs and transcripts.
Quoting and billing from proof bundles to invoices.
Upload voice notes, publish cleaned, summarized versions.
Transcription and search without cloud APIs or vendor lock-in.
WordPress as editor, Bonfyre to emit alternate outputs and feeds.
You don't need to understand the technical details. Bonfyre takes your messy business input and turns it into something useful.
Real applications you can click and use. Each one is powered by Bonfyre behind the scenes.
Newer runtime pieces and outward-facing surfaces, summarized without turning this page into release notes.
53 binaries in the stack. 26 shared library modules. Pages, proxy, orchestration, relay, and delivery still keep the same local-first shape.