12 Kernels | Pure Rust + WASM (wasm32-wasip2) | No CUDA | No unsafe
Production inference kernels you can actually read.
12 Rust + WASM (wasm32-wasip2) kernels covering the full LLM inference stack: embedding, attention, flash-attention, KV-cache, RoPE, LayerNorm+GeLU, RMSNorm, fused MLP, SwiGLU, int8-matmul, bf16-matmul, and token-sampler. No CUDA. No unsafe code. Ships to any runtime.
- $1,500 — Flat fee, source included
- 197 tests — Across 12 kernels, all green
- 0 unsafe — Pure safe Rust, audit it yourself
- wasm32-wasip2 — Ships to any runtime, no CUDA needed
Why we built it
Our agent memory system (NovaMem) processes memories through a vector search layer. We needed an embedding kernel we could verify, modify, and compile to wasm32-wasip2 (WASI Preview 2) without pulling in a Python runtime or CUDA toolchain. So we wrote one from scratch in Rust.
| Approach | Cost | Auditable | WASM-compatible |
|---|---|---|---|
| sentence-transformers (Python) | Free | Complex | No |
| ONNX Runtime | Free | Partial | Partial |
| Custom CUDA kernel | $10K–$50K consulting | Yes | No |
| blitz-embedding | $1,500 | Yes — pure safe Rust | Yes — wasm32-wasip2 |
What's in the box
1. Fused embedding pipeline: Token lookup → mean pooling → layer norm → L2 normalization in a single pass. No intermediate allocations.
2. Ragged batch support: Real variable-length inputs — no padding needed. Tested at BERT-base scale: 256 sequences × 64 tokens × 768 dims.
3. Load your own weights: EmbeddingTable::from_weights() accepts your checkpoint. Unit-norm outputs, ready for cosine-similarity search.
4. 30-min architecture call included: We walk you through the code and the integration path, and answer every question. Flat fee, no ongoing obligation.
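The four fused stages can be sketched in pure safe Rust. What follows is an illustrative, hypothetical simplification, not the shipped kernel: the real `EmbeddingTable::from_weights()` signature may differ, and the production version fuses the passes without the intermediate work shown here.

```rust
/// Simplified sketch of the fused embedding pipeline:
/// token lookup -> mean pooling -> layer norm -> L2 normalization.
/// Hypothetical API; the shipped kernel's signatures may differ.
struct EmbeddingTable {
    dim: usize,
    weights: Vec<f32>, // row-major: vocab_size x dim
}

impl EmbeddingTable {
    fn from_weights(dim: usize, weights: Vec<f32>) -> Self {
        assert_eq!(weights.len() % dim, 0, "weights must be vocab_size x dim");
        Self { dim, weights }
    }

    /// Embed one variable-length token sequence into a unit-norm vector.
    fn embed(&self, tokens: &[u32]) -> Vec<f32> {
        let d = self.dim;
        let mut out = vec![0.0f32; d];

        // 1. Token lookup + mean pooling, fused into one accumulation pass.
        for &t in tokens {
            let row = &self.weights[t as usize * d..(t as usize + 1) * d];
            for (o, w) in out.iter_mut().zip(row) {
                *o += w;
            }
        }
        let n = tokens.len() as f32;
        out.iter_mut().for_each(|x| *x /= n);

        // 2. Layer norm (no learned scale/shift in this sketch).
        let mean = out.iter().sum::<f32>() / d as f32;
        let var = out.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / d as f32;
        let inv_std = 1.0 / (var + 1e-5).sqrt();
        out.iter_mut().for_each(|x| *x = (*x - mean) * inv_std);

        // 3. L2 normalize so a plain dot product equals cosine similarity.
        let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
        out.iter_mut().for_each(|x| *x /= norm);
        out
    }
}

fn main() {
    // Tiny 4-token vocab, 3-dim embeddings.
    let table = EmbeddingTable::from_weights(
        3,
        vec![
            0.1, 0.2, 0.3, // token 0
            0.4, 0.5, 0.6, // token 1
            0.7, 0.8, 0.9, // token 2
            1.0, 1.1, 1.2, // token 3
        ],
    );
    let v = table.embed(&[0, 2, 3]);
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    assert!((norm - 1.0).abs() < 1e-4, "output is unit-norm");
    println!("embedding = {:?}", v);
}
```

Because the output is unit-norm, downstream similarity search needs only a dot product — no per-query normalization.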
Who this is for
If you're building an embedding pipeline and want a kernel you can actually read, audit, and ship to any runtime, this is for you.
- Cloudflare Workers / wasmCloud: Need embedding inference at the edge without Python. wasm32-wasip2 target, Component Model ready.
- Inference Providers: Together AI, Fireworks, Baseten — want a CPU reference impl that's readable and hackable. Bring your own weights.
- Teams Who Hate Black Boxes: Every line is readable, every algorithm is documented, every test is explicit. No magic. No unsafe.
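Targeting wasm32-wasip2 takes a few standard commands. A hedged sketch (the crate and output names are placeholders for this page's `blitz-embedding`; the exact artifact path depends on your crate name):

```shell
# Install the WASI Preview 2 target (tier-2 in Rust since 1.78).
rustup target add wasm32-wasip2

# Build the kernel crate as a WASI component.
cargo build --release --target wasm32-wasip2

# Run it in any WASI-capable runtime, e.g. Wasmtime.
wasmtime run target/wasm32-wasip2/release/blitz_embedding.wasm
```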
Early Access — No Risk
If you buy and find anything in the source that doesn't match the spec, reach out and we'll fix it or refund you immediately. We only want customers who get exactly what they expected.
Pricing
One-time purchase. No subscriptions. No vendor lock-in.
Early Access
$1,500
blitz-embedding — source, tests, and integration support.
- Full source (pure safe Rust)
- wasm32-wasip2 compatible binary
- EmbeddingTable::from_weights() API
- 18 tests (15 unit + 3 doc), all green
- 30-min architecture call included
- No-questions refund if spec not met
Get Early Access — $1,500
12-Kernel Bundle
$6,500 (save ~64%)
All 12 kernels: embedding, attention, flash-attention, kv-cache, rope, layernorm-gelu, rmsnorm, fused-mlp, swiglu, int8-matmul, bf16-matmul, token-sampler.
- All 12 kernels — full source
- Design partnership access
- Architecture consultation call
- Priority on GPU path (H200)
- 30-day support included
Buy 12-Kernel Bundle — $6,500
Enterprise
$15,000 /year
Full kernel library + custom builds + SLA.
- Unlimited kernel library access
- Custom kernel requests (priority)
- Dedicated support engineer
- SLA: 99.9% delivery uptime
- Quarterly performance reviews
- Early access to new kernels
Contact Sales
Optional: Kernel Support
Keep your kernels current as hardware and models evolve.
A kernel you can read is a kernel you can trust.
Flat fee. Source included. 30-min integration call. No ongoing obligation.
Contact to Buy — from $1,500