12 Kernels | Pure Rust + WASM (wasm32-wasip2) | No CUDA | No unsafe
Production inference kernels you can actually read.
12 Rust + WASM (wasm32-wasip2) kernels covering the full LLM inference stack: embedding, attention, flash-attention, KV-cache, RoPE, LayerNorm+GeLU, RMSNorm, fused MLP, SwiGLU, int8-matmul, bf16-matmul, and token-sampler. No CUDA. No unsafe code. Ships to any runtime.
- $1,500 — Flat fee, source included
- 197 tests — Across 12 kernels, all green
- 0 unsafe — Pure safe Rust, audit it yourself
- wasm32-wasip2 — Ships to any runtime, no CUDA needed
Why we built it
Our agent memory system (NovaMem) processes memories through a vector search layer. We needed an embedding kernel we could verify, modify, and compile to wasm32-wasip2 (WASI Preview 2) without pulling in a Python runtime or CUDA toolchain. So we wrote one from scratch in Rust.
| Approach | Cost | Auditable | WASM-compatible |
|---|---|---|---|
| sentence-transformers (Python) | Free | Complex | No |
| ONNX Runtime | Free | Partial | Partial |
| Custom CUDA kernel | $10K–$50K consulting | Yes | No |
| blitz-embedding | $1,500 | Yes — pure safe Rust | Yes — wasm32-wasip2 |
What's in the box
1. Fused embedding pipeline: Token lookup → mean pooling → layer norm → L2 normalization in a single pass. No intermediate allocations.
2. Ragged batch support: Real variable-length inputs — no padding needed. Tested at BERT-base scale: 256 sequences × 64 tokens × 768 dims.
3. Load your own weights: EmbeddingTable::from_weights() accepts your checkpoint. Unit-norm outputs, ready for cosine-similarity search.
4. 30-min architecture call included: We walk you through the code and the integration path, and answer every question. Flat fee, no ongoing obligation.
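The four fused stages can be sketched in pure safe Rust. What follows is an illustrative, hypothetical simplification, not the shipped kernel: the real `EmbeddingTable::from_weights()` signature may differ, and the production version fuses the passes without the intermediate work shown here.

```rust
/// Simplified sketch of the fused embedding pipeline:
/// token lookup -> mean pooling -> layer norm -> L2 normalization.
/// Hypothetical API; the shipped kernel's signatures may differ.
struct EmbeddingTable {
    dim: usize,
    weights: Vec<f32>, // row-major: vocab_size x dim
}

impl EmbeddingTable {
    fn from_weights(dim: usize, weights: Vec<f32>) -> Self {
        assert_eq!(weights.len() % dim, 0, "weights must be vocab_size x dim");
        Self { dim, weights }
    }

    /// Embed one variable-length token sequence into a unit-norm vector.
    fn embed(&self, tokens: &[u32]) -> Vec<f32> {
        let d = self.dim;
        let mut out = vec![0.0f32; d];

        // 1. Token lookup + mean pooling, fused into one accumulation pass.
        for &t in tokens {
            let row = &self.weights[t as usize * d..(t as usize + 1) * d];
            for (o, w) in out.iter_mut().zip(row) {
                *o += w;
            }
        }
        let n = tokens.len() as f32;
        out.iter_mut().for_each(|x| *x /= n);

        // 2. Layer norm (no learned scale/shift in this sketch).
        let mean = out.iter().sum::<f32>() / d as f32;
        let var = out.iter().map(|x| (x - mean).powi(2)).sum::<f32>() / d as f32;
        let inv_std = 1.0 / (var + 1e-5).sqrt();
        out.iter_mut().for_each(|x| *x = (*x - mean) * inv_std);

        // 3. L2 normalize so a plain dot product equals cosine similarity.
        let norm = out.iter().map(|x| x * x).sum::<f32>().sqrt();
        out.iter_mut().for_each(|x| *x /= norm);
        out
    }
}

fn main() {
    // Tiny 4-token vocab, 3-dim embeddings.
    let table = EmbeddingTable::from_weights(
        3,
        vec![
            0.1, 0.2, 0.3, // token 0
            0.4, 0.5, 0.6, // token 1
            0.7, 0.8, 0.9, // token 2
            1.0, 1.1, 1.2, // token 3
        ],
    );
    let v = table.embed(&[0, 2, 3]);
    let norm: f32 = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    assert!((norm - 1.0).abs() < 1e-4, "output is unit-norm");
    println!("embedding = {:?}", v);
}
```

Because the output is unit-norm, downstream similarity search needs only a dot product — no per-query normalization.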
Who this is for
If you're building an embedding pipeline and want a kernel you can actually read, audit, and ship to any runtime, this is for you.
- Cloudflare Workers / wasmCloud: Need embedding inference at the edge without Python. wasm32-wasip2 target, Component Model ready.
- Inference Providers: Together AI, Fireworks, Baseten — want a CPU reference impl that's readable and hackable. Bring your own weights.
- Teams Who Hate Black Boxes: Every line is readable, every algorithm is documented, every test is explicit. No magic. No unsafe.
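Targeting wasm32-wasip2 takes a few standard commands. A hedged sketch (the crate and output names are placeholders for this page's `blitz-embedding`; the exact artifact path depends on your crate name):

```shell
# Install the WASI Preview 2 target (tier-2 in Rust since 1.78).
rustup target add wasm32-wasip2

# Build the kernel crate as a WASI component.
cargo build --release --target wasm32-wasip2

# Run it in any WASI-capable runtime, e.g. Wasmtime.
wasmtime run target/wasm32-wasip2/release/blitz_embedding.wasm
```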
Early Access — No Risk
If you buy and find anything in the source that doesn't match the spec, reach out and we'll fix it or refund you immediately. We only want customers who get exactly what they expected.
Pricing
One-time purchase. No subscriptions. No vendor lock-in.
Early Access
$1,500
blitz-embedding — source, tests, and integration support.
- Full source (pure safe Rust)
- wasm32-wasip2 compatible binary
- EmbeddingTable::from_weights() API
- 18 tests (15 unit + 3 doc), all green
- 30-min architecture call included
- No-questions refund if spec not met
Get Early Access — $1,500
12-Kernel Bundle
$6,500 (save ~64%)
All 12 kernels: embedding, attention, flash-attention, kv-cache, rope, layernorm-gelu, rmsnorm, fused-mlp, swiglu, int8-matmul, bf16-matmul, token-sampler.
- All 12 kernels — full source
- Design partnership access
- Architecture consultation call
- Priority on GPU path (H200)
- 30-day support included
Buy 12-Kernel Bundle — $6,500
Enterprise
$15,000 /year
Full kernel library + custom builds + SLA.
- Unlimited kernel library access
- Custom kernel requests (priority)
- Dedicated support engineer
- SLA: 99.9% delivery uptime
- Quarterly performance reviews
- Early access to new kernels
Contact Sales
Optional: Kernel Support
Keep your kernels current as hardware and models evolve.
A kernel you can read is a kernel you can trust.
Flat fee. Source included. 30-min integration call. No ongoing obligation.
Contact to Buy — from $1,500