Personal agent benchmark pack

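The plain-language version first

This page is not piling up jargon. It puts the small machine called the "Personal Agent Benchmark Pack" on the table and takes it apart in front of you. You don't need to remember everything yet; first grasp what it is actually busy doing: The Personal Agent Benchmark Pack is a small repo-backed QA scenario pack for local personal as…

If you treat this page as the version for ordinary readers, the things to take away are what it is teaching you, when you would do it this way, and where mistakes are most likely.

Original path: /concepts/personal-agent-benchmark-pack. Number of sections: 4.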

Stop 1

Start Here

This section explains how one kind of capability works: what it can do, what it cannot do, and which situations it suits.

Why it is worth reading

You come away understanding the boundaries of the capability, not just the name of the feature.

When you actually do it

If this section contains commands, configuration, and examples together, read the examples first and then go back to the configuration.

Don't memorize the terms yet

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: The Personal Agent Benchmark Pack is a small repo-backed QA scenario pack for local personal assistant workflows. It is…

Note from the original

The Personal Agent Benchmark Pack is a small repo-backed QA scenario pack for local personal assistant workflows. It is not a generic model benchmark and it does not require a new runner. The pack reuses the private QA stack described in QA overview, the synthetic QA channel, and the existing qa/scenarios markdown catalog.

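Told like a picture book

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: the first pack is intentionally narrow.

Note from the original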

The first pack is intentionally narrow:

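Like a prep checklist

Don't memorize this list; treat it as a few sticky notes posted at the door of "Start Here". They remind you what to prepare first, what not to leave out, and where it is easiest to go wrong: fake personal reminders through local cro…; fake DM and thread reply routing through…; fake preference recall from the temporary…; fake secret no-echo checks.

Note from the original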

fake personal reminders through local cron delivery
fake DM and thread reply routing through qa-channel
fake preference recall from the temporary QA workspace memory files
fake secret no-echo checks
safe read-backed tool followthrough after a short approval-style turn
approval denial stop behavior for a sensitive local read request
proof-backed task status reporting that keeps pending, blocked, and done separate
share-safe diagnostics artifacts that keep useful status while omitting raw personal content
proof-backed completion claims that avoid fake progress before local evidence exists
failure recovery that reports partial status and keeps retry boundaries clear
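
One of the behaviors listed above, status reporting that keeps pending, blocked, and done separate and avoids claiming progress before evidence exists, can be pictured with a minimal sketch. Everything here is an assumption for illustration: the `TaskRecord` type, the `proof` field, and `buildStatusReport` are hypothetical names, not part of the real QA stack.

```typescript
// Hypothetical types: the real scenario schema is not shown on this page.
type TaskState = "pending" | "blocked" | "done";

interface TaskRecord {
  id: string;
  state: TaskState;
  // Pointer to local evidence; this sketch requires it before reporting "done".
  proof?: string;
}

interface StatusReport {
  pending: string[];
  blocked: string[];
  done: string[];
}

// Group task ids by state. A task marked "done" without proof is reported
// as pending instead, so the report never claims unproven progress.
function buildStatusReport(tasks: TaskRecord[]): StatusReport {
  const report: StatusReport = { pending: [], blocked: [], done: [] };
  for (const task of tasks) {
    if (task.state === "done" && !task.proof) {
      report.pending.push(task.id);
    } else {
      report[task.state].push(task.id);
    }
  }
  return report;
}

// Illustrative input only; ids and the proof path are made up.
const exampleReport = buildStatusReport([
  { id: "reminder-sent", state: "done", proof: "artifacts/cron.log" },
  { id: "summary-written", state: "done" },
  { id: "read-notes", state: "blocked" },
]);
```

In this sketch the unproven "summary-written" task lands in the pending bucket, which is one simple way to express "no fake progress"; the real scenarios may check this differently.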

Stop 2

Scenarios

This section mainly explains what "Scenarios" is for and when you will run into it.

Why it is worth reading

If this is your first contact with OpenClaw, the most valuable part of this section is not the terminology itself but the use cases and limits behind it.

When you actually do it

Before you act, check whether it has default values, whether any option must be switched on, and whether it affects the security boundary.

Don't memorize the terms yet

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: The machine-readable pack metadata lives in extensions/qa-lab/src/scenario-packs.ts. Run the pack with --pack personal-…

Note from the original

The machine-readable pack metadata lives in extensions/qa-lab/src/scenario-packs.ts. Run the pack with --pack personal-agent:

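Breaking down the magic words

This is a terminal command, like standing at a console and pressing buttons one at a time to wake the machine up.

The line "OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm openclaw qa suite \" sets the OPENCLAW_ENABLE_PRIVATE_QA_CLI environment variable to 1 for this one command and runs the qa suite subcommand through pnpm.
The line "--provider-mode mock-openai \" selects the mock-openai provider mode.
The line "--pack personal-agent \" selects the personal-agent pack.
The line "--concurrency 1" sets the concurrency to 1.

Original code block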

OPENCLAW_ENABLE_PRIVATE_QA_CLI=1 pnpm openclaw qa suite \
  --provider-mode mock-openai \
  --pack personal-agent \
  --concurrency 1

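Told like a picture book

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: --pack is additive with repeated --scenario flags. Explicit scenarios run first, then the pack scenarios run in QA_PERS…

Note from the original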

--pack is additive with repeated --scenario flags. Explicit scenarios run first, then the pack scenarios run in QA_PERSONAL_AGENT_SCENARIO_IDS order with duplicates removed.
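
The ordering rule described above (explicit scenarios first, then the pack list in its own order, with duplicates removed) can be sketched as a small merge function. This is only an illustration under stated assumptions: `resolveScenarioOrder` is a hypothetical helper and the scenario ids are placeholders; the real logic lives in the QA suite and may differ in detail.

```typescript
// Hypothetical helper: merges explicit --scenario ids with a pack's ids.
// Explicit ids keep their position at the front; pack ids follow in pack
// order; any id that has already been seen is skipped.
function resolveScenarioOrder(
  explicitIds: readonly string[],
  packIds: readonly string[],
): string[] {
  const ordered: string[] = [];
  for (const id of explicitIds.concat(packIds)) {
    if (ordered.indexOf(id) === -1) {
      ordered.push(id);
    }
  }
  return ordered;
}

// Placeholder ids for illustration only; they are not real scenario names.
const examplePackIds = ["example-reminder", "example-no-echo", "example-status"];
const exampleOrder = resolveScenarioOrder(
  ["example-no-echo", "example-extra"],
  examplePackIds,
);
```

With these placeholder inputs, "example-no-echo" runs once, in its explicit position, and the remaining pack ids follow in pack order.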

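Told like a picture book

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: The pack is designed for qa-channel with mock-openai or another local QA provider lane. It should not be pointed at live c…

Note from the original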

The pack is designed for qa-channel with mock-openai or another local QA provider lane. It should not be pointed at live chat services or real personal accounts.

Stop 3

Privacy Model

This section is about rules and boundaries: what is allowed by default, what must be switched on explicitly, and what is forbidden.

Why it is worth reading

This kind of content decides whether OpenClaw "can do it" or "cannot do it yet", so understanding it matters more than memorizing terms.

When you actually do it

Treat this section as a permissions manual; when you configure things, watch the words default, required, allow, and deny first.

Don't memorize the terms yet

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: The scenarios use only fake users, fake preferences, fake secrets, and the temporary QA gateway workspace created by the…

Note from the original

The scenarios use only fake users, fake preferences, fake secrets, and the temporary QA gateway workspace created by the suite. They must not read or write real OpenClaw user memory, sessions, credentials, launch agents, global configs, or live gateway state.
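
The boundary described above, staying inside the temporary QA workspace and away from real user state, can be pictured as a path check. The sketch below is an assumption for illustration, not how the suite enforces it: `normalizePosix` and `isInsideWorkspace` are hypothetical names, the paths are made up, and the check handles only absolute POSIX-style paths with a non-root workspace and does not resolve symlinks.

```typescript
// Resolve "." and ".." segments of an absolute POSIX-style path.
function normalizePosix(path: string): string {
  const out: string[] = [];
  for (const part of path.split("/")) {
    if (part === "" || part === ".") {
      continue;
    }
    if (part === "..") {
      out.pop();
    } else {
      out.push(part);
    }
  }
  return "/" + out.join("/");
}

// Hypothetical guard: true only when the candidate path is the workspace
// root itself or sits below it after normalization.
function isInsideWorkspace(workspaceRoot: string, candidate: string): boolean {
  const root = normalizePosix(workspaceRoot);
  const target = normalizePosix(candidate);
  return target === root || target.indexOf(root + "/") === 0;
}

// Made-up paths for illustration only.
const insideExample = isInsideWorkspace("/tmp/qa-ws", "/tmp/qa-ws/memory/prefs.md");
const escapeExample = isInsideWorkspace("/tmp/qa-ws", "/tmp/qa-ws/../real-home/config");
```

Comparing against the root plus a trailing slash keeps a sibling directory such as /tmp/qa-ws-other from passing as if it were inside /tmp/qa-ws.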

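Told like a picture book

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: Artifacts stay under the existing QA suite artifact directory and should be treated like test output. Redaction checks…

Note from the original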

Artifacts stay under the existing QA suite artifact directory and should be treated like test output. Redaction checks use fake markers so failures are safe to inspect and file in issues.
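
A redaction check built on fake markers can be sketched as a plain substring scan. The marker strings and the `findLeakedMarkers` helper below are hypothetical, chosen only to show why a failure stays safe to inspect: the only thing that can leak into an artifact is a made-up value.

```typescript
// Hypothetical fake markers; real scenarios would define their own.
const FAKE_SECRET_MARKERS: readonly string[] = [
  "FAKE-SECRET-1234",
  "FAKE-TOKEN-abcd",
];

// Return every marker that appears verbatim in the text.
// An empty result means the text passed the no-echo check.
function findLeakedMarkers(text: string, markers: readonly string[]): string[] {
  return markers.filter(function (marker) {
    return text.indexOf(marker) !== -1;
  });
}

const cleanReply = "Your reminder is set for tomorrow morning.";
const leakyReply = "Sure, the token is FAKE-TOKEN-abcd.";

const cleanLeaks = findLeakedMarkers(cleanReply, FAKE_SECRET_MARKERS);
const leakyLeaks = findLeakedMarkers(leakyReply, FAKE_SECRET_MARKERS);
```

A verbatim match is the simplest possible check; it would miss a marker that has been reformatted or split, so treat it as a starting point rather than a complete redaction test.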

Stop 4

Extending The Pack

This section explains how one kind of capability works: what it can do, what it cannot do, and which situations it suits.

Why it is worth reading

You come away understanding the boundaries of the capability, not just the name of the feature.

When you actually do it

If this section contains commands, configuration, and examples together, read the examples first and then go back to the configuration.

Don't memorize the terms yet

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: Add new cases under qa/scenarios/personal/, then add the scenario id to QA_PERSONAL_AGENT_SCENARIO_IDS. Keep each case…

Note from the original

Add new cases under qa/scenarios/personal/, then add the scenario id to QA_PERSONAL_AGENT_SCENARIO_IDS. Keep each case small, local, deterministic in mock-openai, and focused on one personal assistant behavior.
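
Registering a new case might look roughly like appending one id to an ordered list. The sketch below is an assumption: the actual shape of QA_PERSONAL_AGENT_SCENARIO_IDS in extensions/qa-lab/src/scenario-packs.ts is not shown on this page, so it uses a differently named stand-in constant, placeholder ids, and a hypothetical `findDuplicateIds` sanity check.

```typescript
// Stand-in for the real constant; the name and ids here are placeholders.
const SCENARIO_IDS_SKETCH: readonly string[] = [
  "personal-example-reminder",
  "personal-example-no-echo",
  "personal-example-new-case", // the newly added scenario id goes here
];

// Hypothetical sanity check: report ids that appear more than once, so an
// accidental double registration is caught before the suite runs.
function findDuplicateIds(ids: readonly string[]): string[] {
  const dupes: string[] = [];
  ids.forEach(function (id, index) {
    if (ids.indexOf(id) !== index && dupes.indexOf(id) === -1) {
      dupes.push(id);
    }
  });
  return dupes;
}

const sketchDuplicates = findDuplicateIds(SCENARIO_IDS_SKETCH);
```

Because the pack runs in list order, where the new id is placed decides when the case runs relative to the others.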

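Told like a picture book

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: the page is about to name good follow-up candidates.

Note from the original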

Good follow-up candidates:

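Like a prep checklist

Don't memorize this list; treat it as a few sticky notes posted at the door of "Extending The Pack". They remind you what to prepare first, what not to leave out, and where it is easiest to go wrong: redacted trajectory export checks; local-only plugin workflow checks.

Note from the original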

redacted trajectory export checks
local-only plugin workflow checks

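Told like a picture book

Picture this passage as a small scene in which a few things are greeting each other, blocking the way, or handing something over. Don't memorize the nouns yet; first catch what is happening at this moment: Avoid adding a new runner, plugin, dependency, live transport, or model judge until the scenario catalog has enough sta…

Note from the original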

Avoid adding a new runner, plugin, dependency, live transport, or model judge until the scenario catalog has enough stable cases to justify that surface.

