❯ /deploy — six targets
Pick where it runs.
Library inference needs no GPU. Library training needs a GPU once per skill. Six deployment targets — pick the one that matches the scenario.
vast.ai
cheapest · $0.08–$0.15 / hr · Cost-sensitive benchmarks, ad-hoc GPU rentals
./packaging/scripts/vast_run.sh tests
./packaging/scripts/vast_run.sh bench3
./packaging/scripts/vast_run.sh humaneval Qwen/Qwen3.5-1.5B lib.json
The April 18, 2026 verification run cost $0.16 in total.
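To put the hourly range and the $0.16 figure side by side, here is a minimal arithmetic sketch. It assumes the bill is purely GPU rental time at the listed vast.ai rates, which the page does not state; `cost_range` and `implied_hours` are hypothetical helper names, not part of the repository.

```python
# vast.ai hourly price range in USD, copied from the listing above.
VAST_AI_RATE = (0.08, 0.15)


def cost_range(hours: float, low: float, high: float) -> tuple[float, float]:
    """Return the (cheapest, priciest) rental cost in USD for `hours` of GPU time."""
    return (low * hours, high * hours)


def implied_hours(total_cost: float, low: float, high: float) -> tuple[float, float]:
    """Return the (shortest, longest) run time that a total bill could cover.

    Assumption: the whole bill is rental time at a rate inside [low, high].
    """
    return (total_cost / high, total_cost / low)


if __name__ == "__main__":
    cheapest, priciest = cost_range(2.0, *VAST_AI_RATE)
    print(f"2 GPU-hours on vast.ai: ${cheapest:.2f} to ${priciest:.2f}")

    shortest, longest = implied_hours(0.16, *VAST_AI_RATE)
    print(f"$0.16 covers roughly {shortest:.2f} to {longest:.2f} hours")
```

Under that assumption, a $0.16 bill corresponds to somewhere between just over one hour (at $0.15 / hr) and two hours (at $0.08 / hr) of rental time.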
Modal
zero ops · $0.60–$3.75 / hr · CI jobs, serverless-style fire-and-forget
pip install modal && modal setup
modal run packaging/modal_run.py::run_bench3
modal run packaging/modal_run.py::run_humaneval --model X --library Y
RunPod
spot pricing · $0.20–$1.60 / hr · Long training sweeps, enterprise support
Provision a pod, SSH in, run the standard scripts:
git clone <mirror>/nCPU
cd nCPU
pip install torch transformers datasets pytest
pytest tests/self_optimizing/ -q
Local Apple Silicon
free · $0 · Day-to-day development, MPS profiling
python3 -m pytest tests/self_optimizing/ -q
python3 -m demos.npcot_scale_practicality
python3 -m benchmarks.benchmark_npcot_library --device mps
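The `--device mps` flag above targets Apple's Metal backend. A script meant to run unchanged on a Mac and on a rented CUDA machine could choose the device at runtime. The sketch below assumes the benchmarks are PyTorch-based (the RunPod setup installs `torch`); `pick_device` is a hypothetical helper, not something the repository is known to provide.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", whichever is usable first."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU path is possible.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is missing on older PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"selected device: {pick_device()}")
```

The returned string could then be passed as the `--device` value instead of hard-coding `mps`.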
Serverless
library inference only · $0 (cold) → tiny · Production API, GPU-free autoscaling
The 475 KB standalone binary ships as a Lambda custom runtime. Cold start ~1 ms, warm consult ~4 ns.
Browser (WASM)
client-side · $0 · Private-by-default inference, offline tools
import init, { NpcotRuntime } from './npcot_wasm.js'
await init()
const lib = await fetch('/library.json').then(r => r.text())
const rt = new NpcotRuntime(lib)
rt.consult(hidden, array, length)
The 130 KB WASM binary loads faster than most page analytics scripts. It powers the live demo on this site.
Decision matrix
| Scenario | Pick |
|---|---|
| Day-to-day Mac dev | Local Apple Silicon |
| One-off benchmark, tight budget | vast.ai |
| CI / automated GPU validation | Modal |
| Long training sweep | RunPod |
| Production library-inference API | Serverless |
| Ship to end users' browsers | WASM |
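For tooling that needs the matrix in machine-readable form, it can be sketched as a plain lookup table. The scenario keys below are hypothetical short names invented for illustration; the target names are taken from the table as written.

```python
# Scenario keys are hypothetical labels for the matrix rows;
# the values are the picks listed in the decision matrix.
DECISION_MATRIX = {
    "mac-dev": "Local Apple Silicon",
    "one-off-benchmark": "vast.ai",
    "ci-validation": "Modal",
    "training-sweep": "RunPod",
    "production-api": "Serverless",
    "browser": "WASM",
}


def pick_target(scenario: str) -> str:
    """Return the deployment target for a scenario key, or raise ValueError."""
    try:
        return DECISION_MATRIX[scenario]
    except KeyError:
        known = ", ".join(sorted(DECISION_MATRIX))
        raise ValueError(
            f"unknown scenario {scenario!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for key in DECISION_MATRIX:
        print(f"{key} -> {pick_target(key)}")
```

Raising on an unknown key, rather than falling back to a default target, keeps a typo in a CI config from silently provisioning the wrong (and possibly pricier) environment.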