/deploy — six targets

Pick where it runs.

Library inference needs no GPU. Library training needs a GPU once per skill. There are six deployment targets; pick the one that matches your scenario.

vast.ai

cheapest
$0.08–$0.15 / hr · Cost-sensitive benchmarks, ad-hoc GPU rentals
./packaging/scripts/vast_run.sh tests
./packaging/scripts/vast_run.sh bench3
./packaging/scripts/vast_run.sh humaneval Qwen/Qwen3.5-1.5B lib.json

The April 18, 2026 verification run cost $0.16 in total.
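To put the hourly range in perspective, the sketch below multiplies the quoted $0.08–$0.15 / hr rates by a run duration, and inverts the calculation for a fixed budget. The function names are illustrative and not part of the repository, and the arithmetic assumes a single GPU billed purely by the hour, with storage and bandwidth charges ignored.

```python
# Rough vast.ai cost arithmetic using the hourly range quoted above.
# Function names are hypothetical; assumes one GPU billed per hour only.

LOW_RATE = 0.08   # USD per hour, low end of the quoted range
HIGH_RATE = 0.15  # USD per hour, high end of the quoted range


def cost_range(hours, low_rate=LOW_RATE, high_rate=HIGH_RATE):
    """Return (min_cost, max_cost) in USD for a run lasting `hours`."""
    return (round(hours * low_rate, 2), round(hours * high_rate, 2))


def hours_for_budget(budget, low_rate=LOW_RATE, high_rate=HIGH_RATE):
    """Return (min_hours, max_hours) of GPU time a USD budget buys."""
    return (budget / high_rate, budget / low_rate)


if __name__ == "__main__":
    print(cost_range(2))            # cost bounds for a two-hour run
    print(hours_for_budget(0.16))   # GPU time that $0.16 buys
```

Under these assumptions, a $0.16 total corresponds to somewhere between about one and two GPU-hours at the quoted rates.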

Modal

zero ops
$0.60–$3.75 / hr · CI jobs, serverless-style fire-and-forget
pip install modal && modal setup
modal run packaging/modal_run.py::run_bench3
modal run packaging/modal_run.py::run_humaneval --model X --library Y

RunPod

spot pricing
$0.20–$1.60 / hr · Long training sweeps, enterprise support

Provision a pod, SSH in, and run the standard scripts:

git clone <mirror>/nCPU
cd nCPU
pip install torch transformers datasets pytest
pytest tests/self_optimizing/ -q

Local Apple Silicon

free
$0 · Day-to-day development, MPS profiling
python3 -m pytest tests/self_optimizing/ -q
python3 -m demos.npcot_scale_practicality
python3 -m benchmarks.benchmark_npcot_library --device mps

Serverless

library inference only
$0 (cold) → tiny · Production API, GPU-free autoscaling

The 475 KB standalone binary ships as a Lambda custom runtime. Cold start ~1 ms, warm consult ~4 ns.

Browser (WASM)

client-side
$0 · Private-by-default inference, offline tools
import init, { NpcotRuntime } from './npcot_wasm.js'
await init()
const lib = await fetch('/library.json').then(r => r.text())
const rt = new NpcotRuntime(lib)
rt.consult(hidden, array, length)

The 130 KB WASM binary loads faster than most page analytics scripts. It powers the live demo on this site.

Decision matrix

Scenario                            Pick
Day-to-day Mac dev                  Local Apple Silicon
One-off benchmark, tight budget     vast.ai
CI / automated GPU validation       Modal
Long training sweep                 RunPod
Production library-inference API    Serverless
Ship to end users' browsers         WASM
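
The matrix above is small enough to encode as a lookup table, for example in a setup script that prints the recommended target. This is a minimal sketch: the scenario keys and the `pick_target` helper are hypothetical names chosen for illustration, and only the target names come from the matrix itself.

```python
# Decision matrix as a lookup table.
# Scenario keys and pick_target are hypothetical; targets mirror the matrix.

DEPLOY_TARGETS = {
    "mac-dev": "Local Apple Silicon",      # day-to-day Mac development
    "budget-benchmark": "vast.ai",         # one-off benchmark, tight budget
    "ci": "Modal",                         # CI / automated GPU validation
    "training-sweep": "RunPod",            # long training sweep
    "production-api": "Serverless",        # production library-inference API
    "browser": "WASM",                     # ship to end users' browsers
}


def pick_target(scenario: str) -> str:
    """Return the deployment target recommended for a scenario key."""
    try:
        return DEPLOY_TARGETS[scenario]
    except KeyError:
        known = ", ".join(sorted(DEPLOY_TARGETS))
        raise ValueError(
            f"unknown scenario {scenario!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    print(pick_target("ci"))
```

Raising on an unknown key, rather than falling back to a default, keeps a typo from silently selecting a paid GPU target.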