Apples-to-Apples Benchmark Policy

The rules we follow before publishing any performance claim.

We believe benchmark numbers are only meaningful when every language gets a fair shot. This policy defines the minimum standard for results that can be shared publicly, whether in research, blog posts, talks, or release notes.

Non-Negotiable Requirements

Same workload semantics across languages

• No proxies, stubs, canned outputs, or precomputed-output shortcuts.
• Each language implementation must perform equivalent algorithmic work on the same input data.
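A cheap first check for workload parity is to confirm that every lane prints the same output for the same input. The sketch below is a minimal illustration, not part of the PLB-CI harness. `output_digest` and `mismatched_lanes` are hypothetical names, and the sketch assumes each lane's stdout has already been captured as a string. Matching output is necessary but not sufficient, because a canned output would also pass. This check complements review of the implementations; it does not replace it.

```python
import hashlib


def output_digest(stdout: str) -> str:
    """Hash program output after normalizing line endings and trailing spaces."""
    lines = stdout.replace("\r\n", "\n").splitlines()
    normalized = "\n".join(line.rstrip() for line in lines)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def mismatched_lanes(outputs: dict[str, str], reference_lang: str) -> list[str]:
    """Return languages whose output differs from the reference lane's output."""
    expected = output_digest(outputs[reference_lang])
    return sorted(
        lang for lang, out in outputs.items() if output_digest(out) != expected
    )


if __name__ == "__main__":
    # Hypothetical captured outputs for one problem and one input.
    outputs = {
        "rust": "1.0\n2.0\n",
        "vibelang": "1.0\r\n2.0\r\n",  # same values, Windows line endings
        "python": "1.0\n2.5\n",        # different second value
    }
    print(mismatched_lanes(outputs, "rust"))  # ['python']
```

Only whitespace is normalized. Any other difference, including float formatting, counts as a mismatch. This is the conservative choice for a publication gate.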

Same benchmark harness and inputs

• Use the canonical PLB-CI problem inputs and workload definitions.
• No language-specific simplification of workloads.

Same environment class

• Only Docker-backed, reproducible runs may support public claims.
• Use a fixed host profile (a dedicated VM or a dedicated bare-metal host).

Full matrix completeness

• In publication mode, no required language lane may be missing.
• Results for a shared subset of languages must not be presented as full-language conclusions.
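One way to enforce completeness mechanically is to refuse publication whenever any required lane lacks a result for any problem. This is a sketch under assumptions. Results are assumed to be a mapping from problem name to {language: timing}. `check_matrix`, `MatrixCheck`, and the placeholder values in the example are hypothetical and do not come from the VibeLang scripts.

```python
from dataclasses import dataclass, field


@dataclass
class MatrixCheck:
    # problem name -> sorted list of required languages with no result
    missing: dict[str, list[str]] = field(default_factory=dict)

    @property
    def complete(self) -> bool:
        return not self.missing


def check_matrix(
    results: dict[str, dict[str, float]],
    problems: list[str],
    required: set[str],
) -> MatrixCheck:
    """Report every (problem, language) cell that a publication run still lacks."""
    check = MatrixCheck()
    for problem in problems:
        present = set(results.get(problem, {}))
        gaps = sorted(required - present)
        if gaps:
            check.missing[problem] = gaps
    return check


if __name__ == "__main__":
    # Hypothetical partial run with placeholder timings: vibelang lacks spectral-norm.
    results = {
        "nbody": {"rust": 1.0, "vibelang": 1.1},
        "spectral-norm": {"rust": 2.0},
    }
    check = check_matrix(results, ["nbody", "spectral-norm"], {"rust", "vibelang"})
    print(check.complete)  # False
    print(check.missing)   # {'spectral-norm': ['vibelang']}
```

The problem list is passed explicitly, so a problem with no results at all is also reported instead of silently dropping out of the matrix.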

Reproducible provenance

• Pin the upstream benchmark suite to a commit, not a floating branch tip.
• Record toolchain versions, host metadata, and run timestamp IDs.
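The provenance fields can be captured in a single record stored alongside each run's results. This is a minimal sketch that assumes Python's standard `platform` and `subprocess` modules are acceptable for probing the host and toolchains. `build_provenance`, `probe_version`, and the record's field names are hypothetical and are not the VibeLang schema. The run ID reuses the UTC timestamp format shown under Current Status.

```python
import os
import platform
import subprocess
from datetime import datetime, timezone


def probe_version(cmd: list[str]) -> str:
    """Return the first line of a toolchain's version output, or 'unavailable'."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=30, check=False)
    except (OSError, subprocess.TimeoutExpired):
        # Missing or non-executable tool: record the gap instead of crashing.
        return "unavailable"
    out = (done.stdout or done.stderr).strip()
    return out.splitlines()[0] if out else "unavailable"


def build_provenance(plbci_ref: str, version_cmds: dict[str, list[str]]) -> dict:
    """Collect the pinned suite ref, host metadata, toolchain versions, and a run ID."""
    now = datetime.now(timezone.utc)
    return {
        # Second-resolution UTC timestamp, e.g. 2026-03-11T17:48:10Z
        "run_id": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "plbci_ref": plbci_ref,
        "host": {
            "system": platform.system(),
            "release": platform.release(),
            "machine": platform.machine(),
            "cpu_count": os.cpu_count(),
        },
        "toolchains": {name: probe_version(cmd) for name, cmd in version_cmds.items()},
    }


if __name__ == "__main__":
    import json

    record = build_provenance(
        "ad18b203dd1769724f4eea94fc3ac1e99f6593e0",
        {"rustc": ["rustc", "--version"], "go": ["go", "version"], "zig": ["zig", "version"]},
    )
    print(json.dumps(record, indent=2))
```

In publication mode, an "unavailable" toolchain entry would reasonably be a hard failure rather than something recorded and ignored.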

Strict execution mode

• Publication runs must not use --no-docker.
• Publication runs must not use --allow-preflight-degraded.
• Publication runs must not use permissive missing-lane behavior.
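A publication entry point can reject the forbidden flags before any work starts. This sketch covers only the two flags the policy names. The option that enables permissive missing-lane behavior is not named here, so it is left out. The function names and the handling of `--flag=value` are assumptions and do not describe the actual collect, validate, and compare scripts.

```python
import sys

# Flags the policy bans in publication runs.
FORBIDDEN_IN_PUBLICATION = ("--no-docker", "--allow-preflight-degraded")


def publication_violations(argv: list[str]) -> list[str]:
    """Return forbidden flags found in argv, matching both '--flag' and '--flag=value'."""
    found: list[str] = []
    for arg in argv:
        name = arg.split("=", 1)[0]
        if name in FORBIDDEN_IN_PUBLICATION and name not in found:
            found.append(name)
    return found


def enforce_publication_mode(argv: list[str]) -> None:
    """Exit with a non-zero status if a publication run uses a forbidden flag."""
    violations = publication_violations(argv)
    if violations:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"publication mode forbids: {', '.join(violations)}")


if __name__ == "__main__":
    enforce_publication_mode(sys.argv[1:])
    print("strict mode OK")
```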

Publication Blockers

6 resolved · 2 open

B1 · closed

VibeLang adapter parity

All 18 adapters were rewritten to read .benchmark_input and use canonical problem sizes: 11 are fully canonical, 4 are runtime-backed, and 3 use native Float codegen with canonical f64 output.

B2 · open

Runtime/compile matrix incomplete

Some required lanes are still missing in strict publication runs. Go, Kotlin, and Swift have build output path mismatches in the PLB-CI BenchTool.

B3 · closed

Docker reproducibility

A Docker-backed full run completed on 2026-03-11 using Docker 28.0.1 on WSL2 (AMD Ryzen 9 5900X, 31.3 GiB RAM).

B4 · closed

Upstream benchmark suite ref is floating

language_matrix.json now pins plbci_ref to the immutable commit ad18b203dd1769724f4eea94fc3ac1e99f6593e0.
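A validate step can confirm that the pin stays immutable by rejecting anything that is not a full commit hash, such as a branch name or an abbreviated SHA. This sketch assumes that `plbci_ref` is a top-level string key in language_matrix.json and that the suite uses 40-character SHA-1 object IDs. `is_immutable_ref` and `load_pinned_ref` are hypothetical helpers.

```python
import json
import re

# A full SHA-1 commit ID: exactly 40 lowercase hex characters.
FULL_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def is_immutable_ref(ref: str) -> bool:
    """True only for a full commit SHA; branch names and short SHAs are rejected."""
    return FULL_COMMIT_SHA.fullmatch(ref) is not None


def load_pinned_ref(matrix_path: str) -> str:
    """Read plbci_ref from the language matrix and fail if it could float."""
    with open(matrix_path, encoding="utf-8") as fh:
        matrix = json.load(fh)
    ref = matrix.get("plbci_ref", "")
    if not isinstance(ref, str) or not is_immutable_ref(ref):
        raise ValueError(f"plbci_ref {ref!r} is not a full 40-character commit SHA")
    return ref


if __name__ == "__main__":
    print(is_immutable_ref("ad18b203dd1769724f4eea94fc3ac1e99f6593e0"))  # True
    print(is_immutable_ref("main"))     # False
    print(is_immutable_ref("ad18b20"))  # False
```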

B5 · closed

Publication gating permits permissive paths

Publication mode now hard-fails on degraded or permissive execution paths in the collect, validate, and compare scripts.

B6 · closed

Runtime feature gaps block canonical parity

Added math.edigits, net.*/http.server_bench, crypto.secp256k1_bench, json.canonical, and hash.md5_hex to reach full workload parity.

B7 · open

Go and Kotlin baseline data is corrupted

Both Docker runs reported ~1.1-1.9 ms for every Go and Kotlin problem, regardless of complexity. The PLB-CI harness is not capturing real execution timing. Root cause: a build output path mismatch, so the binaries are not found at their expected locations.
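Symptoms like these can be caught automatically. The idea is to compare how much a lane's timings vary across problems with how much a trusted reference lane varies. This is a minimal heuristic sketch. The 2x and 10x thresholds, the function names, and the example timings are illustrative assumptions. They are not measured PLB-CI results, and this is not the project's actual check.

```python
def timing_spread(per_problem: dict[str, float]) -> float:
    """Ratio of the slowest to the fastest timing for one lane."""
    values = [t for t in per_problem.values() if t > 0]
    if not values:
        return 1.0  # no usable timings: treat the lane as completely flat
    return max(values) / min(values)


def suspicious_flat_lanes(
    timings: dict[str, dict[str, float]],
    reference: str,
    max_spread: float = 2.0,
    min_reference_spread: float = 10.0,
) -> list[str]:
    """Flag lanes whose timings barely vary while the reference lane varies widely."""
    if timing_spread(timings[reference]) < min_reference_spread:
        return []  # the reference is flat too, so there is nothing to compare against
    return sorted(
        lang
        for lang, per_problem in timings.items()
        if lang != reference and timing_spread(per_problem) < max_spread
    )


if __name__ == "__main__":
    # Illustrative numbers in milliseconds, not measured results.
    timings = {
        "rust": {"nbody": 120.0, "spectral-norm": 850.0, "mandelbrot": 40.0},
        "zig": {"nbody": 110.0, "spectral-norm": 900.0, "mandelbrot": 35.0},
        "go": {"nbody": 1.4, "spectral-norm": 1.9, "mandelbrot": 1.1},
    }
    print(suspicious_flat_lanes(timings, reference="rust"))  # ['go']
```

A flagged lane does not prove corruption. It marks results for a re-check, for example by confirming that the expected binaries exist before timing starts.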

B8 · closed

Float codegen blocks canonical parity

Float codegen has landed. The nbody, spectral-norm, and mandelbrot adapters were rewritten to use native f64 and now produce canonical output.

Current Status

Publication: Blocked (B2, B7 open)
Shareability: VibeLang vs. Rust/C/C++/Zig/Python/TypeScript comparisons are honest and can be cited, with caveats noting that Go and Kotlin are excluded and that some adapters are runtime-backed.
Last full run: 2026-03-11T17:48:10Z (Docker-backed, full profile)

This policy is maintained in the VibeLang repository at benchmarks/third_party/APPLE_TO_APPLE_BENCHMARK_POLICY.md.
