Apples-to-Apples Benchmark Policy

The rules we follow before publishing any performance claim.

We believe benchmark numbers are only meaningful when every language gets a fair shot. This policy defines the minimum standard for results that can be shared publicly — in research, blog posts, talks, or release notes.

Non-Negotiable Requirements

1. Same workload semantics across languages

  • No proxies, stubs, or shortcuts that return canned or precomputed output.
  • Each language implementation must execute equivalent algorithmic work for the same input data.

2. Same benchmark harness and inputs

  • Use canonical PLB-CI problem inputs and workload definitions.
  • No language-specific workload simplification.

3. Same environment class

  • Docker-backed reproducible runs only for public claims.
  • Fixed host profile (dedicated VM or dedicated bare-metal).

4. Full matrix completeness

  • No missing required language lanes for publication mode.
  • No "shared-subset only" claims presented as full-language conclusions.

5. Reproducible provenance

  • Pin upstream benchmark suite by commit (not floating branch tip).
  • Record toolchain versions, host metadata, and run timestamp IDs.

6. Strict execution mode

  • Publication runs must not use --no-docker.
  • Publication runs must not use --allow-preflight-degraded.
  • Publication runs must not use permissive missing-lane behavior.
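The strict execution rules above can be enforced mechanically before a run starts. The following is a minimal sketch, assuming the run's command-line arguments are available as a list of strings. Only the two flag names come from this policy; the function names are hypothetical, and the third rule (permissive missing-lane behavior) is left out because the policy does not name a flag for it.

```python
# Flags this policy forbids in publication runs (names taken from the policy text).
FORBIDDEN_FLAGS = ("--no-docker", "--allow-preflight-degraded")


def forbidden_flags_used(argv):
    """Return the forbidden flags present in argv, in policy order."""
    found = []
    for flag in FORBIDDEN_FLAGS:
        # Accept both the bare "--flag" and the "--flag=value" spelling.
        if any(arg == flag or arg.startswith(flag + "=") for arg in argv):
            found.append(flag)
    return found


def check_publication_args(argv):
    """Hypothetical gate: abort if a publication run uses a forbidden flag."""
    used = forbidden_flags_used(argv)
    if used:
        raise SystemExit("publication mode forbids: " + ", ".join(used))
```

Calling `check_publication_args(sys.argv[1:])` at the top of a run script would stop the run before any timing is collected, which keeps a degraded result from ever being written.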

Publication Blockers

6 resolved · 2 open

closed · B1

VibeLang adapter parity

All 18 adapters rewritten to read .benchmark_input and use canonical problem sizes: 11 are fully canonical, 4 are runtime-backed, and 3 use native Float codegen with canonical f64 output.

open · B2

Runtime/compile matrix incomplete

Some required lanes are still missing in strict publication runs. Go, Kotlin, and Swift have build output path mismatches in the PLB-CI BenchTool.
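A completeness check of the kind this blocker implies could look like the sketch below. It assumes results are collected into a mapping from lane name to a result record, with `None` or an absent key meaning the lane produced nothing. The function names and the data shape are hypothetical and are not taken from the VibeLang scripts.

```python
def missing_lanes(required, results):
    """Return required lanes with no usable result, sorted by name.

    results maps lane name -> result record; None or a missing key counts as missing.
    """
    return sorted(lane for lane in required if results.get(lane) is None)


def assert_matrix_complete(required, results):
    """Hypothetical publication gate: fail if any required lane is missing."""
    missing = missing_lanes(required, results)
    if missing:
        raise RuntimeError("publication blocked, missing lanes: " + ", ".join(missing))
```

Returning the full list of missing lanes, rather than failing on the first one, makes the error message usable as a to-do list for closing the blocker.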

closed · B3

Docker reproducibility

Docker-backed full run completed on 2026-03-11. Docker 28.0.1 on WSL2, AMD Ryzen 9 5900X, 31.3 GiB RAM.

closed · B4

Upstream benchmark suite ref is floating

language_matrix.json now pins plbci_ref to immutable commit ad18b203dd1769724f4eea94fc3ac1e99f6593e0.
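A pin like this can be checked automatically so that a floating ref cannot slip back in. The sketch below assumes `plbci_ref` is a top-level key in language_matrix.json, which the policy does not confirm, and it treats any 40-character lowercase hex string as a full SHA-1 commit id. That is a shape check only: it does not prove the commit exists upstream. The function names are hypothetical.

```python
import json
import re

# A full SHA-1 commit id is exactly 40 lowercase hex characters.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def is_pinned_ref(ref):
    """True if ref has the shape of a full commit SHA, not a branch or tag name."""
    return isinstance(ref, str) and FULL_SHA1.fullmatch(ref) is not None


def check_matrix_text(matrix_json):
    """Parse language_matrix.json content and return plbci_ref if it is pinned.

    Assumes plbci_ref is a top-level key; the real file layout may differ.
    """
    ref = json.loads(matrix_json).get("plbci_ref")
    if not is_pinned_ref(ref):
        raise ValueError(f"plbci_ref is not an immutable commit: {ref!r}")
    return ref
```

Abbreviated SHAs are rejected on purpose, since a short prefix can become ambiguous as the upstream repository grows.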

closed · B5

Publication gating permits permissive paths

Publication mode now hard-fails degraded/permissive execution paths in collect, validate, and compare scripts.

closed · B6

Runtime feature gaps block canonical parity

Added math.edigits, net.*/http.server_bench, crypto.secp256k1_bench, and json.canonical + hash.md5_hex for full workload parity.

open · B7

Go and Kotlin baseline data is corrupted

Both Docker runs reported ~1.1-1.9 ms for every Go and Kotlin problem regardless of complexity, so the PLB-CI harness is not capturing real execution time. Root cause: a build output path mismatch, so the binaries are not found at their expected locations.
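A symptom like this, where every problem in a lane lands in the same tiny time band, can be caught by a simple plausibility check before results are accepted. The sketch below is one possible heuristic, not the project's validator. The function name and all thresholds (`max_ms`, `max_spread`, `min_problems`) are illustrative assumptions that would need tuning against real data.

```python
def timings_look_flat(timings_ms, max_ms=5.0, max_spread=2.0, min_problems=3):
    """Flag a lane whose problems all finish in one narrow, very short band.

    timings_ms maps problem name -> measured time in milliseconds.
    Thresholds are illustrative assumptions, not values from the policy.
    """
    values = list(timings_ms.values())
    if len(values) < min_problems:
        return False  # too few problems to judge
    lo, hi = min(values), max(values)
    if lo <= 0:
        return True  # non-positive timings are never plausible
    # Suspicious when even the slowest problem is tiny and the slowest and
    # fastest problems differ by less than the allowed ratio.
    return hi <= max_ms and hi / lo <= max_spread
```

A lane flagged this way should be treated as missing rather than fast, since the numbers most likely measure process launch or a failed lookup instead of the workload.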

closed · B8

Float codegen blocks canonical parity

Float codegen landed. nbody, spectral-norm, and mandelbrot adapters rewritten with native f64 and producing canonical output.

Current Status

Publication: Blocked (B2, B7 open)
Shareability: VibeLang vs Rust/C/C++/Zig/Python/TS comparisons are honest and can be cited, provided the Go/Kotlin exclusion and the runtime-backed adapters are stated as caveats.
Last full run: 2026-03-11T17:48:10Z (Docker-backed, full profile)

This policy is maintained in the VibeLang repository at benchmarks/third_party/APPLE_TO_APPLE_BENCHMARK_POLICY.md