GPT-5.5 / Codex parity maintainer notes

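What this page is actually solving

The maintainer notes spell out the boundaries of the four GPT-5.5 parity PRs

This page is for reviewers and maintainers: PR A covers strict-agentic execution, PR B covers runtime truthfulness, PR C covers tool schema and replay/liveness correctness, and PR D covers the same-scenario parity harness and the release gate. GPT-5.5 can be said to reach parity only when the runtime fixes, the truthfulness suites, and the parity verdict all pass.

The original document has 16 sections. Start with the Start Here path: /help/gpt55-codex-agentic-parity-maintainers

Sort out responsibilities first

The four PRs are like four cabinets, and each cabinet holds only its own tools

What maintainer review fears most is a single PR that tries to fix the runtime, change the benchmark, and rewrite permission semantics along the way. The official notes list the boundaries so that each piece can be judged independently.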

A

strict-agentic

Owns executionContract, same-turn follow-through, non-terminal semantics for update_plan, and explicit blocked states.

B

truthfulness

Owns Codex OAuth scope, typed provider/runtime failures, and whether /elevated full is actually available.

C

correctness

Owns OpenAI/Codex tool compatibility, parameter-free strict schemas, and replay-invalid and liveness state.

D

harness

Owns the GPT-5.5 vs Opus 4.6 scenario pack, parity docs, the report, and release-gate mechanics.

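Review order

Land the runtime fixes first, then use PR D as the proof layer

The recommended order is PR A, PR B, PR C, PR D. PR D is the proof layer: it should not hold up the runtime-correctness PRs, and it should not pretend to have fixed runtime behavior itself.

Reviewing PR A

Whether GPT-5 runs act or fail closed; whether update_plan is no longer treated as an endpoint; whether the scope stays GPT-5-first and embedded-PI.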
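As a rough illustration of the "act or fail closed" check, the sketch below classifies a single turn. The TurnSummary type, its fields, and the three outcome labels are hypothetical names invented for this example; only update_plan comes from the notes, and the real runtime may represent turns quite differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TurnSummary:
    """Hypothetical summary of one turn; not the runtime's real type."""
    tool_calls: List[str]          # tool names invoked in this turn, in order
    blocked_reason: Optional[str]  # explicit blocked state, if one was reported


def classify_turn(turn: TurnSummary) -> str:
    """Return 'acted', 'blocked', or 'stalled' for one turn."""
    if turn.blocked_reason:
        # Failing closed with a stated reason is an acceptable outcome.
        return "blocked"
    # update_plan alone is not follow-through, so it does not count as acting.
    real_actions = [name for name in turn.tool_calls if name != "update_plan"]
    return "acted" if real_actions else "stalled"
```

Under this assumption, a turn that only calls update_plan is reported as "stalled", which is the behavior a reviewer would want flagged.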

Reviewing PR B

Whether auth, proxy, and runtime failures are no longer collapsed into a generic error; whether full-access blocked reasons are visible to both the model and the user.
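One way to picture "typed rather than generic" failures is a small tagged record that renders the same reason for both audiences. This is a minimal sketch: FailureKind, TypedFailure, and visible_messages are hypothetical names for illustration, and the set of kinds is an assumption drawn from the failure categories mentioned in these notes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class FailureKind(Enum):
    """Hypothetical failure categories; the real runtime's types may differ."""
    AUTH = "auth"
    PROXY = "proxy"
    DNS = "dns"
    FULL_ACCESS_BLOCKED = "full_access_blocked"
    UNKNOWN = "unknown"  # fallback only when nothing more specific is known


@dataclass(frozen=True)
class TypedFailure:
    kind: FailureKind
    reason: str


def visible_messages(failure: TypedFailure) -> Dict[str, str]:
    """Render one failure identically for the model and for the user."""
    text = f"{failure.kind.value}: {failure.reason}"
    return {"model": text, "user": text}
```

Keeping a single rendering path is a design choice in this sketch: it makes it hard for the model and the user to see different explanations of the same blocked state.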

Reviewing PR C

Whether strict tool registration is predictable; whether parameter-free tools pass schema validation; whether replay and compaction state is reported honestly.
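The parameter-free case can be illustrated with a small normalizer. This sketch assumes that a strict tool schema wants an explicit object with every listed property marked required and additionalProperties set to false; strict_parameters is a hypothetical helper, not the function PR C actually adds.

```python
from typing import Any, Dict, Optional


def strict_parameters(schema: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return an explicit object schema, even for a tool that takes no arguments.

    Assumption: strict mode expects all properties in `required` and
    `additionalProperties: False`.
    """
    base = dict(schema or {})
    properties = dict(base.get("properties", {}))
    return {
        **base,
        "type": "object",
        "properties": properties,
        "required": sorted(properties),
        "additionalProperties": False,
    }
```

A tool registered with no schema at all then yields an empty but well-formed object schema instead of a missing or null one.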

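Reviewing PR D

Whether the scenario pack is reproducible, whether it includes a mutating replay-safety lane, and whether the report is readable by both humans and automation.

Release gate

Without four green lights, do not claim GPT-5.5 parity or superiority

The release conditions are: PR A/B/C are merged; PR D runs the first-wave pack cleanly on the merged runtime; the runtime-truthfulness regressions stay green; and the parity report shows no fake-success and no regression in stop behavior.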
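The release conditions above can be read as a simple conjunction. The following sketch encodes them as a function that returns the list of blockers; GateInputs and its field names are hypothetical, and how each input is actually measured is left to the real harness.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class GateInputs:
    """Hypothetical inputs to the gate; field names are illustrative only."""
    merged_prs: FrozenSet[str]
    first_wave_pack_clean: bool
    truthfulness_suites_green: bool
    fake_success_count: int
    stop_behavior_regressed: bool


def release_gate_blockers(inputs: GateInputs) -> List[str]:
    """Return every reason the gate stays closed; an empty list means open."""
    blockers: List[str] = []
    missing = sorted({"A", "B", "C"} - inputs.merged_prs)
    if missing:
        blockers.append("unmerged runtime PRs: " + ", ".join(missing))
    if not inputs.first_wave_pack_clean:
        blockers.append("PR D first-wave pack did not run clean on the merged runtime")
    if not inputs.truthfulness_suites_green:
        blockers.append("runtime-truthfulness regressions are not green")
    if inputs.fake_success_count > 0:
        blockers.append("parity report contains fake-success results")
    if inputs.stop_behavior_regressed:
        blockers.append("stop behavior regressed")
    return blockers
```

Returning all blockers rather than the first one keeps the result useful for a human reading the report as well as for automation.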

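PR D evidence

Provides a same-scenario comparison of GPT-5.5 and Opus 4.6; the artifacts include qa-suite-summary.json and the parity report.

PR B evidence

Auth, proxy, DNS, and full-access truthfulness are still proven by deterministic suites.

Not a substitute

The scenario harness does not replace runtime truthfulness, and the truthfulness suites do not replace the same-scenario parity comparison.

Machine verdict

qa-agentic-parity-summary.json is the automation-readable pass/fail verdict.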
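A consumer of the machine verdict might look roughly like the sketch below. The notes do not describe the file's schema, so the top-level "verdict" field and its "pass" value are purely hypothetical; the point of the example is only that anything unreadable or unexpected should count as a failure.

```python
import json


def verdict_passes(summary_text: str) -> bool:
    """Fail closed: only an explicit pass counts as a pass.

    Assumption: the summary is a JSON object with a hypothetical
    top-level "verdict" field; the real schema may differ.
    """
    try:
        summary = json.loads(summary_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(summary, dict):
        return False
    return summary.get("verdict") == "pass"
```

In practice the text would come from reading qa-agentic-parity-summary.json off disk, and the field lookup would be adjusted to whatever the harness actually emits.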

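Merge workflow

Before landing a PR, first ask whether the evidence bar is met

The maintainer workflow emphasizes low risk and repeatability: first confirm that the problem is reproducible, that the root cause is in the touched code, that the fix is on the relevant path, and that regression tests or manual verification notes are in place; then enter the standard landing flow.

Evidence bar

Symptom, root cause, fix path, and test or verification notes must all line up.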
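The evidence bar can be treated as a four-field checklist. The sketch below flags which parts are still empty; the Evidence type and its field names are hypothetical, and it checks only presence, not whether the parts actually agree with each other, which remains a reviewer's judgment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Evidence:
    """Hypothetical checklist for one PR; not part of any real tooling."""
    symptom: str
    root_cause: str
    fix_path: str
    verification: str  # regression test or manual verification notes


def missing_evidence(evidence: Evidence) -> List[str]:
    """List the checklist fields that are empty or whitespace-only."""
    return [name for name, value in vars(evidence).items() if not value.strip()]
```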

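Labels and threads

Handle labels that should trigger auto-close; do not force a merge while a blocker review thread is open.

Local validation

At minimum, run pnpm check:changed for the touched surface; for test-related changes, also run pnpm test:changed.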
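The local validation rule is small enough to write down as a command list. In this sketch the two pnpm script names come from the notes, while the helper name and its touches_tests flag are hypothetical; deciding whether a change is test-related is left to the caller.

```python
from typing import List


def local_validation_commands(touches_tests: bool) -> List[List[str]]:
    """Return the commands to run locally before landing, in order."""
    commands = [["pnpm", "check:changed"]]  # always run for the touched surface
    if touches_tests:
        commands.append(["pnpm", "test:changed"])  # only for test-related changes
    return commands
```

Each entry can be passed to subprocess.run(cmd, check=True) from the repository root so that the first failing command stops the sequence.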

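After merge

Check linked-issue auto-close, CI, and the state of main; search for duplicate PRs or issues, and close them using only the canonical reference.

One last thing to remember

The goal of the maintainer page is a parity conclusion that holds up under re-review

Do not let one PR quietly take on every responsibility. Review by boundary and release on evidence, so that the final parity claim is not "we feel it got better" but "under the same rules, it really passed."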