❯ ncpu // a neural computer that reasons in code
Programs are discovered, not written.
Give it input/output examples and it searches program space until it finds code that provably reproduces them — then keeps that program forever, in a library you can read, verify, and ship in 130 KB of WASM. No prompt engineering. No hallucinated code: it refuses rather than guesses. Try it right here in your browser.
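A minimal sketch of the idea, not nCPU's actual search: enumerate programs over a tiny hypothetical DSL, keep the first one that reproduces every example when executed, and return None rather than guess. The primitive names, `run`, and `synthesize` are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy DSL: each primitive maps an int to an int.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "neg": lambda x: -x,
}

def run(program, x):
    """Apply the named primitives left to right."""
    for name in program:
        x = PRIMITIVES[name](x)
    return x

def synthesize(examples, max_len=3):
    """Return the shortest program matching every example, or None (refuse)."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            # Verification is by execution: every example must match exactly.
            if all(run(program, x) == y for x, y in examples):
                return program
    return None  # nothing in the search space fits: refuse instead of guessing

print(synthesize([(1, 4), (2, 6), (3, 8)]))  # ('inc', 'double')
print(synthesize([(1, 2), (1, 3)]))          # None: the examples contradict each other
```

The returned tuple is the whole program: readable, re-runnable, and checkable against the same examples at any time.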
❯ thesis
One thesis, five subsystems.
A computer can be built out of learned components — and once the whole execution stack is differentiable, programs stop being things you write and become things you search for by gradient descent. NPCoT is one pillar of nCPU, not the whole project.
❯ pillar 1
The neural computer
Every ALU operation is a trained network — 100% exact 32-bit integer arithmetic, exhaustively verified. Neural multiplication is 12× faster than neural addition.
❯ pillar 2
The GPU computer
A complete UNIX machine on a single GPU: 25-command shell, a self-hosting C compiler, real BusyBox and Alpine Linux v3.20, ~1.9M IPS with zero timing variance.
❯ pillar 3
Differentiable synthesis
Programs discovered by gradient descent through a differentiable CPU: Mog at 315/315 and nSynth at 105/105 benchmark coverage. Try it live, or watch it learn Pong.
❯ pillar 4
The coprocessor
The neural ALU injected into a transformer forward pass. Qwen3.5-2B arithmetic: 14.5% → 71.0%. Measured on real HumanEval, not extrapolated.
❯ pillar 5
JEPA machine dynamics
A predictive world model of the computer itself — latent speculation and anomaly detection over an exact execution substrate with unlimited free ground truth.
The headline above — 86% HumanEval from a 4B model — is what happens when pillar 3 is pointed at code. Pillars 1, 2, and 5 are the computer it runs on.
The neural-computer story →
❯ pipeline
What actually happens.
Conventional chain-of-thought asks a language model to emit reasoning as tokens and trusts it to follow them. NPCoT compiles reasoning into a discrete program inside the forward pass, then caches the program so the next invocation runs without a single gradient op.
❯ train »
A transformer's hidden state drives a differentiable array-reduction head. Gradient descent finds the program that matches the target.
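One way to picture the training step, as a toy sketch rather than the real ArrayExecutableThoughtHead: keep one logit per candidate reduction, output the softmax-weighted mix of their results, and descend the squared error. The op menu, `train`, and the hyperparameters are assumptions, and no transformer hidden state is involved here.

```python
import math

# Hypothetical menu of reductions the toy head can choose between.
OPS = {
    "sum": sum,
    "max": max,
    "min": min,
    "mean": lambda xs: sum(xs) / len(xs),
}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(examples, steps=2000, lr=0.5):
    """Fit one logit per op so the softmax-weighted output matches the targets."""
    names = list(OPS)
    logits = [0.0] * len(names)
    for _ in range(steps):
        grads = [0.0] * len(names)
        probs = softmax(logits)
        for xs, target in examples:
            outs = [OPS[n](xs) for n in names]
            soft = sum(p * o for p, o in zip(probs, outs))
            err = soft - target  # loss per example is 0.5 * err**2
            # d(soft)/d(logit_i) = p_i * (out_i - soft)
            for i in range(len(names)):
                grads[i] += err * probs[i] * (outs[i] - soft)
        logits = [v - lr * g / len(examples) for v, g in zip(logits, grads)]
    return dict(zip(names, softmax(logits)))

# Targets are the maximum of each array; the weights should drift toward "max".
weights = train([([1, 2, 3], 3), ([4, 0, 2], 4), ([2, 2, 5], 5)])
print(max(weights, key=weights.get))
```

The soft mixture is what makes the choice differentiable; the discrete program is whatever op the weights end up concentrating on.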
❯ crystallize »
When soft-path and discrete-path outputs agree within threshold, the 5-tuple program is cached in the library with a signed fingerprint.
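The agreement gate can be sketched as follows. This is an assumption-heavy illustration: the real program is a 5-tuple whose fields are not described here, so the sketch stands in a single op name for it, and `crystallize`, the probe arrays, the threshold, and the HMAC-SHA256 fingerprint scheme are all hypothetical.

```python
import hashlib
import hmac
import json

OPS = {"sum": sum, "max": max, "min": min}  # hypothetical op menu

def soft_output(weights, xs):
    """Soft path: probability-weighted mix of every op's result."""
    return sum(w * OPS[name](xs) for name, w in weights.items())

def crystallize(weights, probes, library, key, threshold=1e-2):
    """Cache the argmax op only if it agrees with the soft path on every probe."""
    best = max(weights, key=weights.get)  # discrete path: the single winning op
    for xs in probes:
        if abs(soft_output(weights, xs) - OPS[best](xs)) > threshold:
            return None  # paths disagree: keep training, store nothing
    payload = json.dumps({"op": best}, sort_keys=True).encode()
    fingerprint = hmac.new(key, payload, hashlib.sha256).hexdigest()
    library[fingerprint] = best
    return fingerprint

library = {}
probes = [[1, 2, 3], [4, 0, 2]]
converged = {"sum": 0.0005, "max": 0.999, "min": 0.0005}
undecided = {"sum": 0.4, "max": 0.5, "min": 0.1}
print(crystallize(undecided, probes, library, b"demo-key"))  # None
print(crystallize(converged, probes, library, b"demo-key") in library)  # True
```

Gating on agreement means a program only enters the library once discretizing it no longer changes the answer.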
❯ reuse
Consult the library on new hidden states: 100% hit rate → ~4 ns consult + execute. Runs on CPU, GPU, WASM, or a 475 KB standalone binary.
❯ evidence
Verified on real hardware.
No synthetic benchmarks. Every number below comes from real rented GPUs (April 18, 2026): library verification on an RTX A4000 ($0.16), the HumanEval cascade on an RTX 3090 ($0.39).
Library self-consistency check
200 array-reduction problems — not a real LLM comparison
This is a regression test for the library itself. For real LLM numbers see the HumanEval runs.
Qwen3.5-4B on HumanEval: +27.5 pt system result
164 problems, RTX 3090 — same weights, the lift is the harness
The gains come from verifier-gated sampling plus a compounding solve store, not from extra parameters — 30 hard-fails rescued for $0.39 GPU, beating the Qwen3.5-9B baseline (71.3%). Full honest attribution, including the layers that did nothing, on the benchmarks page.
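Roughly, verifier-gated sampling with a solve store could look like the sketch below. It is not the project's harness: `sample_candidate` is a hypothetical stand-in for drawing one completion from the model, the verifier is plain test execution, and the store is an ordinary dict that later calls consult first.

```python
def verifier_gated_solve(task_id, sample_candidate, tests, store, budget=8):
    """Return a candidate that passes every test, preferring the store; else None."""
    if task_id in store:
        return store[task_id]  # solved before: reuse, no sampling needed
    for _ in range(budget):
        candidate = sample_candidate()
        try:
            if all(candidate(*args) == expected for args, expected in tests):
                store[task_id] = candidate  # only verified solutions are kept
                return candidate
        except Exception:
            continue  # a crashing candidate counts as a failed sample
    return None  # budget exhausted: report failure instead of returning a guess

# Toy sampler: the first draw is wrong, the second is right.
pool = iter([lambda a, b: a - b, lambda a, b: a + b])
tests = [((1, 2), 3), ((5, 5), 10)]
store = {}
solver = verifier_gated_solve("add", lambda: next(pool), tests, store)
print(solver(2, 2))  # 4
```

Nothing unverified ever reaches the store, so each later run starts from a larger set of known-good solutions.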
Cross-platform reproducibility
Same library, same MAE
Shipping artifacts
❯ components
The stack.
Differentiable training
ArrayExecutableThoughtHead learns programs by gradient descent. Coprocessor wraps any HF transformer layer behind a max_gate safety cap.
Discrete library
Cosine-similarity keyed cache of DiscreteArrayProgram 5-tuples. LRU eviction, HMAC signing, (ε,δ)-DP perturbation, fingerprint IDs.
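A toy version of the lookup, assuming a linear scan and a fixed similarity threshold: `ToyLibrary`, its capacity, and the 0.95 cutoff are illustrative, and signing and DP perturbation are left out.

```python
import math
from collections import OrderedDict

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class ToyLibrary:
    """Hypothetical cache: key vectors map to programs, least recently used first."""

    def __init__(self, capacity=2, threshold=0.95):
        self.capacity = capacity
        self.threshold = threshold
        self.entries = OrderedDict()  # key vector (tuple) -> program

    def store(self, key, program):
        key = tuple(key)
        self.entries[key] = program
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def consult(self, query):
        best_key, best_sim = None, self.threshold
        for key in self.entries:
            sim = cosine(key, query)
            if sim >= best_sim:
                best_key, best_sim = key, sim
        if best_key is None:
            return None  # miss: nothing is similar enough
        self.entries.move_to_end(best_key)  # a hit refreshes recency
        return self.entries[best_key]

lib = ToyLibrary(capacity=2)
lib.store([1.0, 0.0], "sum")
lib.store([0.0, 1.0], "max")
print(lib.consult([0.99, 0.05]))  # sum
lib.store([1.0, 1.0], "min")      # over capacity: the "max" entry is evicted
print(lib.consult([0.0, 1.0]))    # None
```

A production library would likely replace the linear scan with an index; the scan keeps the hit-or-miss rule easy to read.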
Native runtime
Pure-Rust executor, Metal compute shader, 130 KB WASM. Zero Python, zero PyTorch on the inference path.
Compliance pipeline
Static verifier proves termination, division safety, overflow bounds. Compliance report emits a safe/warn/high aggregate for regulated deployments.
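To make the idea concrete, here is a small interval-analysis sketch under loud assumptions: programs are straight-line lists of hypothetical `(op, constant)` steps (so termination is trivial), and the mapping of findings to safe/warn/high is invented for illustration, not taken from the real verifier.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def check(program, lo, hi):
    """Return 'safe', 'warn', or 'high' for inputs in the closed range [lo, hi]."""
    level = "safe"
    for op, const in program:
        if op == "add":
            lo, hi = lo + const, hi + const
        elif op == "mul":
            lo, hi = sorted((lo * const, hi * const))
        elif op == "div_by_value":  # computes const // value
            if lo <= 0 <= hi:
                return "high"  # the divisor range contains zero
            # const // value is monotone on a range that excludes zero,
            # so the endpoints bound the result.
            lo, hi = sorted((const // lo, const // hi))
        else:
            raise ValueError(f"unknown op: {op}")
        if lo < INT32_MIN or hi > INT32_MAX:
            level = "warn"  # a 32-bit overflow is possible somewhere in range
    return level

print(check([("add", 1), ("mul", 2)], 0, 10))              # safe
print(check([("add", -5), ("div_by_value", 100)], 0, 10))  # high
print(check([("mul", 2**20)], 0, 2**20))                   # warn
```

The check never runs the program; it only tracks the range of values each step can produce.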
Federation
merge_libraries across organizations with conflict resolution. Teacher→student distillation via least-squares projection fit on paired hiddens.
Session lifecycle
ProgramLibrarySession handles load/save + snapshot/diff. Every task produces an audit trail of what skills changed.
❯ run it
On your laptop in 30 seconds.
❯ git clone https://github.com/robertcprice/nCPU
❯ cd nCPU
❯ python3 -m pytest tests/self_optimizing/ -q
❯ python3 -m demos.npcot_scale_practicality
458 tests, Apple Silicon native, real benchmarks. Takes about one minute total.