ncpu // a neural computer that reasons in code

Programs are discovered, not written.

Give it input/output examples and it searches program space until it finds code that provably reproduces them — then keeps that program forever, in a library you can read, verify, and ship in 130 KB of WASM. No prompt engineering. No hallucinated code: it refuses rather than guesses. Try it right here in your browser.


+27.5 pt · HumanEval, system result
130 KB · WASM runtime
475 KB · native binary
$0.39 · autoresearch cost
552 · tests passing

thesis

One thesis, five subsystems.

A computer can be built out of learned components — and once the whole execution stack is differentiable, programs stop being things you write and become things you search for by gradient descent. NPCoT is one pillar of nCPU, not the whole project.

pillar 1

The neural computer

Every ALU operation is a trained network — 100% exact 32-bit integer arithmetic, exhaustively verified. Neural multiplication is 12× faster than neural addition.

pillar 2

The GPU computer

A complete UNIX machine on a single GPU: a 25-command shell, a self-hosting C compiler, real BusyBox and Alpine Linux v3.20, and ~1.9M instructions per second (IPS) with zero timing variance.

pillar 3

Differentiable synthesis

Programs discovered by gradient descent through a differentiable CPU: Mog at 315/315 and nSynth at 105/105 benchmark coverage. Try it live, or watch it learn Pong.

pillar 4

The coprocessor

The neural ALU injected into a transformer forward pass. Qwen3.5-2B arithmetic: 14.5% → 71.0%. Measured on real HumanEval, not extrapolated.

pillar 5

JEPA machine dynamics

A predictive world model of the computer itself — latent speculation and anomaly detection over an exact execution substrate with unlimited free ground truth.
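One way to read "anomaly detection over an exact execution substrate": because the real machine state is always available as ground truth, the predictor's error on every step can be measured for free and thresholded. The minimal Python sketch below assumes predictions and observations are plain vectors (the actual system works in a learned latent space) and uses a mean + k·stdev threshold calibrated on known-normal runs. `calibrate`, `flag_anomalies`, and the default k=4 are hypothetical, not nCPU's API.

```python
import statistics


def squared_error(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual))


def calibrate(normal_errors, k=4.0):
    """Threshold = mean + k * population stdev of errors on known-normal runs."""
    return statistics.fmean(normal_errors) + k * statistics.pstdev(normal_errors)


def flag_anomalies(preds, actuals, threshold):
    """Indices of steps whose prediction error exceeds the threshold.

    `actuals` is the ground truth, which an exact execution substrate
    provides at no extra cost.
    """
    return [
        i
        for i, (p, a) in enumerate(zip(preds, actuals))
        if squared_error(p, a) > threshold
    ]


thr = calibrate([0.01, 0.02, 0.015, 0.012])
print(flag_anomalies([[1.0, 2.0], [1.0, 2.0]], [[1.05, 2.0], [3.0, 2.0]], thr))  # [1]
```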

The headline above — 86% HumanEval from a 4B model — is what happens when pillar 3 is pointed at code. Pillars 1, 2, and 5 are the computer it runs on.


pipeline

What actually happens.

Conventional chain-of-thought asks a language model to emit reasoning as tokens and trusts it to follow them. NPCoT compiles reasoning into a discrete program inside the forward pass, then caches the program so the next invocation runs without a single gradient op.
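As a toy illustration of "search for the program by gradient descent, then make it discrete", the Python sketch below relaxes a choice among four candidate array reductions into a softmax mixture, trains the mixture logits with plain gradient descent on mean squared error against input/output examples, and keeps the argmax as the discrete program. It is a stand-in built on assumptions: the reduction set, `train_soft_head`, and the hyperparameters are hypothetical, and unlike ArrayExecutableThoughtHead it is not driven by transformer hidden states.

```python
import math

# Hypothetical candidate reductions; the real head's op set is not documented here.
REDUCTIONS = {
    "sum": lambda xs: float(sum(xs)),
    "max": lambda xs: float(max(xs)),
    "min": lambda xs: float(min(xs)),
    "mean": lambda xs: sum(xs) / len(xs),
}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_soft_head(examples, steps=500, lr=0.05):
    """Fit softmax logits over REDUCTIONS by gradient descent on MSE,
    then discretize to the most probable reduction."""
    names = list(REDUCTIONS)
    logits = [0.0] * len(names)
    for _ in range(steps):
        probs = softmax(logits)
        grads = [0.0] * len(names)
        for xs, target in examples:
            outs = [REDUCTIONS[n](xs) for n in names]
            y_hat = sum(p * o for p, o in zip(probs, outs))  # soft output
            err = y_hat - target
            for j in range(len(names)):
                # d(err^2)/d(logit_j) = 2 * err * p_j * (out_j - y_hat)
                grads[j] += 2 * err * probs[j] * (outs[j] - y_hat)
        logits = [z - lr * g / len(examples) for z, g in zip(logits, grads)]
    probs = softmax(logits)
    best = max(range(len(names)), key=lambda j: probs[j])
    return names[best], probs


name, probs = train_soft_head([([1, 2, 3], 6.0), ([4, -1, 2], 5.0), ([0, 0, 7], 7.0)])
print(name)  # sum
```

Because the sum is the only candidate that matches every example exactly, minimizing the loss pushes the softmax toward that entry, and the argmax recovers it as the discrete program.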

train »

A transformer's hidden state drives a differentiable array-reduction head. Gradient descent finds the program that matches the target.

crystallize »

When the soft-path and discrete-path outputs agree within a threshold, the 5-tuple program is cached in the library with a signed fingerprint.

reuse

Consult the library on new hidden states: with a 100% hit rate, consult plus execute takes ~4 ns. Runs on CPU, GPU, WASM, or a 475 KB standalone binary.
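The crystallize and reuse steps can be sketched together in Python under stated assumptions: programs are stand-in 5-tuples whose second field names a reduction, fingerprints are HMAC-SHA256 over a JSON encoding, and lookup is a linear cosine-similarity scan with a 0.95 cutoff. `ProgramCache`, its methods, and both thresholds are hypothetical; the real DiscreteArrayProgram format and its ~4 ns path are not reproduced here (a Python scan is far slower).

```python
import hashlib
import hmac
import json
import math

# Hypothetical executor: the program's second field names the reduction.
EXEC = {"sum": sum, "max": max, "min": min}


def outputs_agree(soft_outputs, discrete_outputs, threshold=1e-3):
    """Crystallization gate: every soft/discrete pair within `threshold`."""
    return len(soft_outputs) == len(discrete_outputs) and all(
        abs(s - d) <= threshold for s, d in zip(soft_outputs, discrete_outputs)
    )


def fingerprint(program, key):
    """HMAC-SHA256 over a canonical JSON encoding of the program tuple."""
    payload = json.dumps(list(program), separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ProgramCache:
    def __init__(self, key, min_sim=0.95):
        self.key = key
        self.min_sim = min_sim
        self.entries = []  # (hidden-state key, program, fingerprint)

    def crystallize(self, hidden, program, soft_outputs, discrete_outputs, threshold=1e-3):
        """Cache `program` only if the soft and discrete paths agree."""
        if not outputs_agree(soft_outputs, discrete_outputs, threshold):
            return None
        fp = fingerprint(program, self.key)
        self.entries.append((hidden, program, fp))
        return fp

    def consult_and_execute(self, hidden, xs):
        """Run the most similar signed program on xs, or return None on a miss."""
        best, best_sim = None, self.min_sim
        for key_vec, program, fp in self.entries:
            sim = cosine(key_vec, hidden)
            # Skip entries whose signature no longer matches (tampered programs).
            if sim >= best_sim and hmac.compare_digest(fp, fingerprint(program, self.key)):
                best, best_sim = program, sim
        if best is None:
            return None  # miss: fall back to the slow, differentiable path
        return EXEC[best[1]](xs)


cache = ProgramCache(b"demo-key")
cache.crystallize([1.0, 0.0, 0.0], ("reduce", "sum", 0, -1, 1), [6.0004], [6.0])
print(cache.consult_and_execute([0.99, 0.05, 0.0], [1, 2, 3]))  # 6
```

Re-checking the HMAC on every hit rejects entries whose program was altered after signing, at the cost of one hash per candidate.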

evidence

Verified on real hardware.

No synthetic benchmarks. Every number below comes from real rented GPUs (April 18, 2026): library verification on an RTX A4000 ($0.16), the HumanEval cascade on an RTX 3090 ($0.39).

Library self-consistency check

200 array-reduction problems — not a real LLM comparison

ground truth reference: 100.0%
synthetic noise floor: 22.0%
NPCoT library consult: 60.5%

This is a regression test for the library itself. For real LLM numbers see the HumanEval runs.

Qwen3.5-4B on HumanEval: +27.5 pt system result

164 problems, RTX 3090 — same weights, the lift is the harness

baseline (greedy): 58.5%
+ verified retry (sampled): 67.68%
+ autoresearch (best-of-16): 85.98%

The gains come from verifier-gated sampling plus a compounding solve store, not from extra parameters: 30 hard failures were rescued for $0.39 of GPU time, beating the Qwen3.5-9B baseline (71.3%). The benchmarks page gives the full, honest attribution, including the layers that did nothing.
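A minimal sketch of verifier-gated sampling with a solve store, assuming the model is abstracted as a `sample(rng)` callable that returns a candidate function and the verifier is the task's own input/output tests. `verifier_gated_solve` and its parameters are hypothetical; `n=16` mirrors the best-of-16 row, though this version stops at the first candidate that passes.

```python
import random


def verifier_gated_solve(task_id, tests, sample, store, n=16, seed=0):
    """Return a candidate that passes every test, drawing at most n samples.

    `sample(rng)` stands in for one sampled model completion; here it returns
    a Python callable. `store` maps task ids to verified solutions, so a task
    solved once is never sampled again (the "compounding" part).
    """
    if task_id in store:
        return store[task_id]
    rng = random.Random(seed)
    for _ in range(n):
        candidate = sample(rng)
        try:
            ok = all(candidate(x) == y for x, y in tests)  # verifier gate
        except Exception:
            ok = False  # a crashing candidate counts as a failure
        if ok:
            store[task_id] = candidate
            return candidate
    return None  # still a hard fail after n samples
```

Only verified candidates ever enter the store, so the store can grow across runs without admitting a wrong answer that happened to be sampled.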

Cross-platform reproducibility

Same library, same mean absolute error (MAE)

macOS MPS: 0.560
Linux CUDA: 0.560
Linux CPU: 0.560
Bit-for-bit identical.
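An MAE that rounds to 0.560 on all three platforms is necessary but not sufficient for bit-for-bit identity, because outputs can differ in their low-order bits and still round to the same figure. A stricter check, sketched here in Python under the assumption that outputs are exported as float64 values, hashes the raw IEEE-754 bytes and compares digests across machines. `mae` and `output_digest` are hypothetical helpers.

```python
import hashlib
import struct


def mae(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


def output_digest(values):
    """SHA-256 of the raw little-endian IEEE-754 float64 bytes of `values`."""
    return hashlib.sha256(struct.pack(f"<{len(values)}d", *values)).hexdigest()


a = [0.1 + 0.2, 1.0]
b = [0.3, 1.0]
print(round(mae(a, [0.0, 0.0]), 3) == round(mae(b, [0.0, 0.0]), 3))  # True
print(output_digest(a) == output_digest(b))  # False: low-order bits differ
```

Here `[0.1 + 0.2, 1.0]` and `[0.3, 1.0]` give the same rounded MAE against zeros but different digests, which is exactly the case a rounded metric cannot catch.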

Shipping artifacts

standalone binary: 475 KB
WASM runtime: 130 KB
release tarball: 224 KB
library on disk: 2.2 KB

components

The stack.

Differentiable training

ArrayExecutableThoughtHead learns programs by gradient descent. Coprocessor wraps any Hugging Face (HF) transformer layer behind a max_gate safety cap.
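The source does not spell out how max_gate works. One plausible reading, sketched here purely as an assumption, is a gated residual injection whose gate is clamped to `[0, max_gate]`, so the coprocessor's contribution to the hidden state stays bounded whatever the gate network outputs. `coprocessor_inject` and the default cap of 0.25 are hypothetical.

```python
def coprocessor_inject(hidden, coproc_out, gate, max_gate=0.25):
    """Add a coprocessor signal to a hidden state through a capped gate.

    The gate is clamped to [0, max_gate], so the coprocessor can never add
    more than max_gate times its own output to any hidden-state entry.
    """
    g = min(max(gate, 0.0), max_gate)
    return [h + g * c for h, c in zip(hidden, coproc_out)]


# A gate network asking for 0.9 is held to the 0.25 cap.
print(coprocessor_inject([1.0, 2.0], [10.0, 10.0], gate=0.9))  # [3.5, 4.5]
```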

Discrete library

Cosine-similarity keyed cache of DiscreteArrayProgram 5-tuples. LRU eviction, HMAC signing, (ε,δ)-DP perturbation, fingerprint IDs.
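Two of the listed mechanisms have standard textbook forms, sketched here in Python as assumptions rather than nCPU's code: LRU eviction via `collections.OrderedDict`, and (ε,δ)-DP perturbation via the classic Gaussian mechanism, which adds noise with σ = Δ₂·√(2 ln(1.25/δ))/ε to each key coordinate (valid for 0 < ε < 1, where Δ₂ is the key's L2 sensitivity). `LRULibrary`, `gaussian_sigma`, and `perturb_key` are hypothetical names, and the source does not say that nCPU uses the Gaussian mechanism.

```python
import math
import random
from collections import OrderedDict


class LRULibrary:
    """Fingerprint -> entry map that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, fp):
        if fp not in self._entries:
            return None
        self._entries.move_to_end(fp)  # mark as most recently used
        return self._entries[fp]

    def put(self, fp, entry):
        self._entries[fp] = entry
        self._entries.move_to_end(fp)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used

    def __len__(self):
        return len(self._entries)


def gaussian_sigma(l2_sensitivity, eps, delta):
    """Noise scale of the classic Gaussian mechanism (requires 0 < eps < 1)."""
    if not 0 < eps < 1:
        raise ValueError("classic Gaussian mechanism needs 0 < eps < 1")
    return l2_sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps


def perturb_key(key, l2_sensitivity, eps, delta, rng):
    """Add i.i.d. Gaussian noise to each coordinate of a key vector."""
    sigma = gaussian_sigma(l2_sensitivity, eps, delta)
    return [k + rng.gauss(0.0, sigma) for k in key]


lib = LRULibrary(capacity=2)
lib.put("a", 1)
lib.put("b", 2)
lib.get("a")     # "a" is now the most recently used entry
lib.put("c", 3)  # evicts "b"
print(lib.get("b"))  # None
```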

Native runtime

Pure-Rust executor, Metal compute shader, 130 KB WASM. Zero Python, zero PyTorch on the inference path.

Compliance pipeline

Static verifier proves termination, division safety, overflow bounds. Compliance report emits a safe/warn/high aggregate for regulated deployments.
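To make "proves overflow bounds" concrete, here is a hypothetical interval-analysis check for a one-op reduction over a length-bounded int array, with optional division by a constant. The program shape, the int32/int64 severity split, and `verify_reduction` are assumptions for illustration; the aggregate is simply the worst per-check severity, mirroring the safe/warn/high report.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
SEVERITY = {"safe": 0, "warn": 1, "high": 2}


def verify_reduction(op, elem_range, max_len, divisor=None):
    """Hypothetical static check for: acc = op(xs), optionally acc // divisor,
    where xs is a non-empty int array of length <= max_len with elements in
    elem_range = (lo, hi). Returns (per-check findings, aggregate severity)."""
    lo, hi = elem_range
    # Termination: the only loop is one pass over a length-bounded array.
    findings = {"termination": "safe"}

    # Overflow: interval bounds on every partial accumulator value.
    if op == "sum":
        acc_lo, acc_hi = min(0, max_len * lo), max(0, max_len * hi)
    elif op in ("max", "min"):
        acc_lo, acc_hi = lo, hi
    else:
        raise ValueError(f"unknown op: {op}")
    if INT32_MIN <= acc_lo and acc_hi <= INT32_MAX:
        findings["overflow"] = "safe"
    elif INT64_MIN <= acc_lo and acc_hi <= INT64_MAX:
        findings["overflow"] = "warn"  # fits only in a 64-bit accumulator
    else:
        findings["overflow"] = "high"

    # Division safety: a constant divisor must be nonzero.
    findings["division"] = "high" if divisor == 0 else "safe"

    aggregate = max(findings.values(), key=SEVERITY.__getitem__)
    return findings, aggregate


print(verify_reduction("sum", (0, 2**31 - 1), 4))
```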

Federation

merge_libraries merges libraries across organizations with conflict resolution. Teacher→student distillation uses a least-squares projection fit on paired hidden states.
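A minimal NumPy sketch of the least-squares step, assuming the projection maps teacher hidden states into the student's hidden space so that teacher-keyed library entries can be re-keyed for the student. `fit_projection` and `project_keys` are hypothetical; `numpy.linalg.lstsq` solves min over W of ‖H_t W − H_s‖_F using the paired rows.

```python
import numpy as np


def fit_projection(teacher_h, student_h):
    """Least-squares W minimizing ||teacher_h @ W - student_h||_F.

    teacher_h: (n_pairs, d_teacher), student_h: (n_pairs, d_student).
    """
    W, *_ = np.linalg.lstsq(teacher_h, student_h, rcond=None)
    return W


def project_keys(teacher_keys, W):
    """Re-key teacher-space library keys into the student's hidden space."""
    return teacher_keys @ W


rng = np.random.default_rng(0)
H_t = rng.normal(size=(64, 8))     # paired teacher hidden states
W_true = rng.normal(size=(8, 4))
H_s = H_t @ W_true                 # student hidden states (noise-free toy data)
W = fit_projection(H_t, H_s)
print(np.allclose(W, W_true))      # True on this noise-free toy data
```

With real paired hiddens the fit is only approximate, so the residual returned by `lstsq` is worth logging as a distillation-quality signal.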

Session lifecycle

ProgramLibrarySession handles load/save + snapshot/diff. Every task produces an audit trail of what skills changed.
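Snapshot/diff can be sketched as a deep copy plus a key comparison. This Python version assumes the library is a dict from fingerprint to program; `snapshot` and `diff` are hypothetical helpers rather than ProgramLibrarySession's methods, and the returned dict is one possible shape for an audit-trail record.

```python
import copy


def snapshot(library):
    """Independent copy of a library dict (fingerprint -> program)."""
    return copy.deepcopy(library)


def diff(before, after):
    """Audit record of which skills a task added, removed, or changed."""
    return {
        "added": sorted(k for k in after if k not in before),
        "removed": sorted(k for k in before if k not in after),
        "changed": sorted(k for k in after if k in before and after[k] != before[k]),
    }


library = {"fp-a": ("reduce", "sum"), "fp-b": ("reduce", "max")}
before = snapshot(library)
library["fp-b"] = ("reduce", "min")   # a task rewrites a skill
library["fp-c"] = ("reduce", "mean")  # and learns a new one
del library["fp-a"]                   # and evicts an old one
print(diff(before, library))
# {'added': ['fp-c'], 'removed': ['fp-a'], 'changed': ['fp-b']}
```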

run it

On your laptop in 30 seconds.

 git clone https://github.com/robertcprice/nCPU
 cd nCPU
 python3 -m pytest tests/self_optimizing/ -q
 python3 -m demos.npcot_scale_practicality

458 tests, Apple Silicon native, real benchmarks. Takes about one minute total.