Apples-to-Apples Benchmark Policy
The rules we follow before publishing any performance claim.
We believe benchmark numbers are only meaningful when every language gets a fair shot. This policy defines the minimum standard for results that can be shared publicly — in research, blog posts, talks, or release notes.
Non-Negotiable Requirements
Same workload semantics across languages
- No proxies, stubs, canned outputs, or precomputed output shortcuts.
- Each language implementation must execute equivalent algorithmic work for the same input data.
Same benchmark harness and inputs
- Use canonical PLB-CI problem inputs and workload definitions.
- No language-specific workload simplification.
Same environment class
- Docker-backed reproducible runs only for public claims.
- Fixed host profile (dedicated VM or dedicated bare-metal).
Full matrix completeness
- No missing required language lanes for publication mode.
- No "shared-subset only" claims presented as full-language conclusions.
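As an illustration of the completeness rule, a minimal sketch of a lane check might look like the following. The function names and the shape of the inputs (a list of required lane names and a mapping of lane name to result) are assumptions for this example, not the project's actual scripts.

```python
def missing_lanes(required, results):
    """Return the required lanes that have no result, sorted for stable output.

    `required` is an iterable of lane names; `results` is a mapping keyed by
    lane name. Both shapes are assumptions made for this sketch.
    """
    return sorted(set(required) - set(results))


def require_full_matrix(required, results):
    """Raise if any required lane is absent, instead of reporting a subset."""
    missing = missing_lanes(required, results)
    if missing:
        raise RuntimeError(
            "publication blocked, missing lanes: " + ", ".join(missing)
        )
```

Failing hard here, rather than silently comparing the lanes that happen to exist, is what keeps a shared-subset result from being presented as a full-language conclusion.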
Reproducible provenance
- Pin upstream benchmark suite by commit (not floating branch tip).
- Record toolchain versions, host metadata, and run timestamp IDs.
Strict execution mode
- Publication runs must not use --no-docker.
- Publication runs must not use --allow-preflight-degraded.
- Publication runs must not use permissive missing-lane behavior.
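The two flags named above could be enforced with a guard along these lines. This is a sketch only: the function names are hypothetical, and the third rule (permissive missing-lane behavior) is not covered because the policy does not name a flag for it.

```python
# Flags the policy forbids for publication runs.
FORBIDDEN_PUBLICATION_FLAGS = ("--no-docker", "--allow-preflight-degraded")


def find_forbidden_flags(argv):
    """Return the forbidden flags present in argv, in policy order."""
    found = []
    for flag in FORBIDDEN_PUBLICATION_FLAGS:
        # Accept both the bare "--flag" and a "--flag=value" spelling.
        if any(arg == flag or arg.startswith(flag + "=") for arg in argv):
            found.append(flag)
    return found


def assert_publication_safe(argv):
    """Raise ValueError if a publication run was started with a forbidden flag."""
    found = find_forbidden_flags(argv)
    if found:
        raise ValueError(
            "publication run rejected, forbidden flags: " + ", ".join(found)
        )
```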
Publication Blockers
6 resolved · 2 open
VibeLang adapter parity
All 18 adapters have been rewritten to read .benchmark_input and use canonical problem sizes: 11 are fully canonical, 4 are runtime-backed, and 3 use native Float codegen with canonical f64 output.
Runtime/compile matrix incomplete
Some required lanes are still missing in strict publication runs. Go, Kotlin, and Swift have build output path mismatches in the PLB-CI BenchTool.
Docker reproducibility
Docker-backed full run completed on 2026-03-11. Docker 28.0.1 on WSL2, AMD Ryzen 9 5900X, 31.3 GiB RAM.
Upstream benchmark suite ref is floating
language_matrix.json now pins plbci_ref to immutable commit ad18b203dd1769724f4eea94fc3ac1e99f6593e0.
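A pin like this can be checked mechanically. The sketch below assumes that plbci_ref is a top-level key in language_matrix.json and that the upstream repository uses 40-character SHA-1 commit IDs; both are assumptions, and the function names are hypothetical.

```python
import json
import re

# A full, lowercase, 40-character hex SHA-1 commit ID.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_commit(ref):
    """True only for a full 40-character hex commit ID, not a branch or tag name."""
    return isinstance(ref, str) and FULL_SHA.fullmatch(ref) is not None


def check_matrix_text(text):
    """Parse language_matrix.json content and verify that plbci_ref is pinned."""
    matrix = json.loads(text)
    ref = matrix.get("plbci_ref")  # assumed to be a top-level key
    if not is_pinned_commit(ref):
        raise ValueError(f"plbci_ref is not an immutable commit: {ref!r}")
    return ref
```

The check is purely syntactic. It rejects values such as a branch name, but it does not confirm that the commit exists upstream.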
Publication gating permits permissive paths
Publication mode now hard-fails degraded/permissive execution paths in collect, validate, and compare scripts.
Runtime feature gaps block canonical parity
Added math.edigits, net.*/http.server_bench, crypto.secp256k1_bench, json.canonical + hash.md5_hex for full workload parity.
Go and Kotlin baseline data is corrupted
Both Docker runs produced timings of roughly 1.1 to 1.9 ms for every Go and Kotlin problem, regardless of complexity. The PLB-CI harness is failing to capture real execution timing. Root cause: a build output path mismatch, so the binaries are not found at the expected locations.
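A symptom like this (near-identical timings across problems of very different complexity) can be caught with a simple spread check before results are trusted. The sketch below is one possible heuristic, not the project's validation logic; the function name, the input shape, and the default threshold of 3.0 are all assumptions.

```python
def looks_flat(timings_ms, min_spread=3.0):
    """Flag a lane whose slowest problem is under min_spread times its fastest.

    `timings_ms` maps problem name to a timing in milliseconds (assumed shape).
    `min_spread` is an arbitrary threshold chosen for this sketch.
    """
    values = [t for t in timings_ms.values() if t > 0]
    if len(values) < 2:
        return False  # not enough data to judge
    return max(values) / min(values) < min_spread


# Illustrative values inside the reported 1.1 to 1.9 ms band, not measured data.
suspect = {"nbody": 1.1, "spectral-norm": 1.5, "mandelbrot": 1.9}
print(looks_flat(suspect))
```

A lane flagged this way still needs manual inspection, since a flat profile only suggests that the harness timed something other than the intended workload.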
Float codegen blocks canonical parity
Float codegen has landed. The nbody, spectral-norm, and mandelbrot adapters have been rewritten with native f64 and now produce canonical output.
Current Status
- Publication: Blocked. B2 (runtime/compile matrix incomplete) and B7 (Go and Kotlin baseline data corrupted) remain open.
- Shareability: Comparisons of VibeLang against Rust, C, C++, Zig, Python, and TypeScript are honest and can be cited, with caveats about the Go/Kotlin exclusion and the runtime-backed adapters.
- Last full run: 2026-03-11T17:48:10Z (Docker-backed, full profile)
This policy is maintained in the VibeLang repository at benchmarks/third_party/APPLE_TO_APPLE_BENCHMARK_POLICY.md