🐛 fix(server): rehydrate subagent runs from DB on cold replica (#15788)

* 🐛 fix(server): rehydrate subagent runs from DB on cold replica

Server-side hetero persistence kept per-operation state in a module-level map. On a cold serverless replica (or any cross-replica batch), the main agent state is rebuilt from DB but `MainAgentRunState.subagents` was seeded empty. A continuing subagent event then hit the `!existing` branch of `ensureRun` and forked a brand-new isolation thread for a parentToolCallId that already had one, producing piles of generic "Subagent" threads that were never attached to the right thread.

Desktop never hit this (one long-lived run-state closure).

Rebuild `state.main.subagents` from DB the same way the main half is rehydrated: add `rehydrateSubagentRunsState` to @lobechat/heterogeneous-agents and call a new `refreshSubagentRunsFromDb` each ingest. Only runs MISSING from memory are rehydrated (warm accumulators win); finalized (Active) threads are excluded so completed spawns are never resurrected.

Sibling of #15783 (main message chaining): same root cause, subagent half.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(server): scope subagent rehydration to operation + de-dupe inner tools

Two follow-up fixes on the cold-replica subagent rehydration:

- P1: de-dupe inner tool creation against the run-lifetime tool set, not just the per-turn `persistedIds`. Per-turn state is reset on every turn boundary and starts empty after a rehydration, so a replayed / continued tools_calling on a cold replica minted a SECOND tool message for an id the run already wrote. `lifetimeToolCallIds` survives boundaries and is restored from DB, so it is the durable de-dupe key. Mirrors the main-agent retry protection.
- P2: scope `refreshSubagentRunsFromDb` to the current operation. Topics are reused across turns; a prior crashed/cancelled run can leave a subagent thread stuck `Processing`. Rehydrating purely by topic+status would merge that unrelated thread into the new operation's reducer state and finalize it on the new run's terminal drain. Stamp `operationId` on the subagent thread metadata at creation and filter rehydration by it.

Adds regression cases for both (each verified to fail without its fix).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
✨ feat: support drag-to-reorder for desktop tabs (#15787)
2026-06-14 03:30:19 +00:00 · 2026-06-14 03:13:35 +08:00 · 2026-06-14 02:57:21 +08:00 · 2026-06-14 02:00:49 +08:00 · 2026-06-14 01:40:36 +08:00 · 2026-06-14 00:56:53 +08:00
1738 changed files with 59760 additions and 15163 deletions
@@ -0,0 +1,376 @@
+---
+name: agent-testing
+description: >
+  Agentic end-to-end testing for LobeHub: backend verification via the CLI,
+  frontend verification via agent-browser (Electron), full-stack verification in
+  the browser, and bot-channel verification via osascript. Local-first today,
+  designed to extend to cloud automation. Triggers on 'cli test', 'test with cli',
+  'verify with cli', 'backend test with cli', 'local test', 'test in electron',
+  'test desktop', 'test bot', 'bot test', 'test in discord', 'test in telegram',
+  'test in slack', 'test in wechat', 'test in weixin', 'test in lark', 'test in feishu',
+  'test in qq', 'manual test', 'osascript', 'test report', or any local
+  end-to-end verification task.
+---
+
+# Agent Testing (Agentic End-to-End Verification)
+
+One skill for all agentic end-to-end testing — local-first today, designed to
+also run as full cloud automation. Every test session follows the same
+five-step contract:
+
+```
+Step -1: Plan approval  →  Step 0: Env + Auth  →  Step 1: Pick surface  →  Step 2: Run  →  Step 3: Structured report
+```
+
+## Step -1 — Plan approval for non-trivial tests
+
+Skip directly to Step 0 if: the test is a single re-run after a fix, the plan
+was already agreed on, or the user gave exact commands.
+
+Otherwise, propose a test plan (surface, cases, expected evidence, assumptions)
+and use the runtime structured question tool (`request_user_input` /
+ask-user-question equivalent) with two fixed choices:
+
+1. `开始执行 (Recommended)` ("Start execution"): the test plan looks fine, start executing
+2. `先讨论下` ("Discuss first"): the plan has problems, discuss it first
+
+Wait for the user's choice before proceeding.
+
+## Step 0 — Environment setup + auth check (mandatory)
+
+Step 0 is about getting the environment ready: **dependencies are healthy**
+and **auth is green**. A test run that dies halfway on a missing dependency or
+a login wall wastes the whole session — clear both gates BEFORE writing a
+single test step.
+
+### 0.0 Resolve the current test environment
+
+Before starting a dev server, checking auth, opening agent-browser, or writing
+test steps, print and confirm the current local test environment:
+
+```bash
+./.agents/skills/agent-testing/scripts/test-env.sh
+```
+
+This command is the source of truth for local test ports. It reads the current
+shell plus `.env` files using the same precedence as `scripts/runWithEnv.mts`,
+then prints:
+
+- `APP_URL`
+- `PORT`
+- `SERVER_URL`
+- `AUTH_TRUSTED_ORIGINS`
+- `SPA_PORT`
+- `MOBILE_SPA_PORT`
+- `DESKTOP_PORT`
+
+For commands that need these values, export them from the same resolver:
+
+```bash
+eval "$(./.agents/skills/agent-testing/scripts/test-env.sh --exports)"
+```
+
+Do not rely on hard-coded port tables. If the printed values do not match the
+running dev server, fix/export the env first, then continue.
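
A minimal sketch of the "do the printed values match" check, assuming the dev server listens on the port embedded in `APP_URL` and that this port should equal `PORT`. The helper names `url_port` and `env_ports_agree` are hypothetical and are not part of `test-env.sh`; IPv6 literals are not handled.

```shell
# Hypothetical helper: print the explicit port of a URL, or fail if none is given.
url_port() {
  hostport="${1#*://}"        # drop the scheme, if any
  hostport="${hostport%%/*}"  # drop the path, if any
  case "$hostport" in
    *:*) echo "${hostport##*:}" ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeed only when the URL's port equals the expected port.
# Assumption: APP_URL and PORT are meant to agree in the local dev setup.
env_ports_agree() {
  [ "$(url_port "$1")" = "$2" ]
}
```

After the `eval ... --exports` line above, something like `env_ports_agree "$APP_URL" "$PORT" || echo "APP_URL and PORT disagree" >&2` would surface a mismatch before any test step runs.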
+
+### 0.1 Dependencies are installed — root AND standalone apps
+
+The root pnpm workspace does **NOT** cover every app: `pnpm-workspace.yaml`
+lists `packages/**`, `e2e`, `apps/server`, and only `apps/desktop/src/main` —
+**`apps/desktop` and `apps/cli` are standalone**, each keeping its own
+`node_modules` with its own links into `packages/`. A root install does not
+refresh them, so install in every app the test will touch:
+
+```bash
+pnpm install                    # root workspace
+cd apps/desktop && pnpm install # Electron surface
+cd apps/cli && pnpm install     # CLI surface
+```
+
+Symptom of a stale standalone install: the build/launch fails to resolve a
+recently added workspace package — `Rolldown failed to resolve import
+"@lobechat/<pkg>"` (Electron) or `Cannot find module '@lobechat/<pkg>'` (CLI).
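
As a rough pre-flight for the three install locations named above, the sketch below lists the ones that have no `node_modules` directory at all. The function name `missing_installs` is hypothetical. It only detects a missing install, not a stale one, so a clean result does not replace running `pnpm install` after new workspace packages land.

```shell
# Hypothetical helper: print each install location under ROOT (default: cwd)
# that has no node_modules directory. "." stands for the root workspace.
missing_installs() {
  root="${1:-.}"
  for app in . apps/desktop apps/cli; do
    if [ ! -d "$root/$app/node_modules" ]; then
      echo "$app"
    fi
  done
}
```

Called from the repo root with no argument, it prints nothing when all three locations have been installed at least once.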
+
+### 0.2 Run scripts from the repo root
+
+All paths in this skill (`./.agents/skills/agent-testing/...`) are
+repo-root-relative, and background commands inherit the current working
+directory — a script launched while `cwd` is `apps/desktop` fails with
+`No such file or directory`. Verify `pwd` is the repo root before launching
+long-running scripts.
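
The `pwd` check can be made mechanical. This is a minimal sketch, assuming the repo root is recognizable by the presence of the skill's `scripts` directory; the function name `require_repo_root` is hypothetical.

```shell
# Hypothetical guard: succeed only when DIR (default: current directory)
# contains the agent-testing scripts directory, i.e. looks like the repo root.
require_repo_root() {
  dir="${1:-$PWD}"
  if [ -d "$dir/.agents/skills/agent-testing/scripts" ]; then
    return 0
  fi
  echo "not at repo root: $dir" >&2
  return 1
}
```

Running `require_repo_root || exit 1` before launching a background script turns the late `No such file or directory` failure into an immediate, readable one.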
+
+### 0.3 Init local dev env without `.env`
+
+For Web smoke against local code, start a **normal local dev environment**.
+First check the repo root for `.env`:
+
+- If `.env` exists, use the existing local configuration and start the dev
+  server normally.
+- If `.env` does not exist, use the agent-testing env bootstrap.
+
+Do not start the standalone e2e server as the product under test.
+
+Use `scripts/init-dev-env.sh`. It follows the e2e setup pattern — Postgres,
+migrations, auth/key-vault/S3 test env, seed user — but it is owned by this
+skill and starts the repo's dev server (`pnpm run dev:next` / `bun run dev`),
+not `e2e/scripts/setup.ts --start`. The script hard-blocks when root `.env`
+exists, so it cannot accidentally override a user's local config. When `.env`
+exists, do not call any `init-dev-env.sh` subcommand.
+
+Decision flow:
+
+```bash
+if [[ -f .env ]]; then
+  bun run dev
+else
+  ./.agents/skills/agent-testing/scripts/init-dev-env.sh setup-db
+  ./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
+  ./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
+fi
+```
+
+Bootstrap flow when no `.env` exists:
+
+```bash
+# From repo root. Managed DB flow requires Docker Desktop.
+./.agents/skills/agent-testing/scripts/init-dev-env.sh setup-db
+./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
+./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
+```
+
+If using an existing Postgres instead of the managed Docker DB, set
+`DATABASE_URL` and skip `setup-db`:
+
+```bash
+DATABASE_URL=postgresql://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh migrate
+DATABASE_URL=postgresql://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
+DATABASE_URL=postgresql://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
+```
+
+For backend-only checks, `dev-next` is available, but Web smoke needs the
+full-stack `dev` command so Next can proxy the SPA HTML from Vite:
+
+```bash
+./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
+```
+
+Useful subcommands:
+
+```bash
+./.agents/skills/agent-testing/scripts/init-dev-env.sh env       # print exports
+./.agents/skills/agent-testing/scripts/init-dev-env.sh write     # write .records/env/agent-testing-dev.env
+./.agents/skills/agent-testing/scripts/init-dev-env.sh migrate   # migrations only
+./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user # seed user + CLI API key
+./.agents/skills/agent-testing/scripts/init-dev-env.sh qstash    # local QStash for workflow paths
+./.agents/skills/agent-testing/scripts/init-dev-env.sh clean-db  # remove managed DB container
+```
+
+Default script env:
+
+- `APP_URL=http://localhost:3010`
+- `DATABASE_URL=postgresql://postgres:postgres@localhost:5433/postgres`
+- `DATABASE_DRIVER=node`
+- `FEATURE_FLAGS=-agent_self_iteration` so local smoke does not require QStash
+- Local QStash defaults (`QSTASH_URL`, `QSTASH_TOKEN`, signing keys) are exported;
+  run `init-dev-env.sh qstash` in a separate terminal when the path under test
+  triggers QStash/Workflow.
+- `KEY_VAULTS_SECRET`, `AUTH_SECRET`, auth verification off
+- S3 mock vars
+- Managed DB container: `lobehub-agent-testing-postgres`
+
+`seed-user` creates `agent-testing@lobehub.com` / `TestPassword123!` with
+onboarding already completed, plus a local API key in
+`.records/env/agent-testing-cli.env` for CLI automation. When running Cucumber
+against this dev server, pass the same script env into the test process too;
+Cucumber has its own `BeforeAll` seed path and it must see `DATABASE_URL`
+instead of silently skipping setup:
+
+```bash
+cd e2e
+# Only in the no-.env branch.
+eval "$(../.agents/skills/agent-testing/scripts/init-dev-env.sh env)"
+BASE_URL=http://localhost:3010 HEADLESS=true bun run test:smoke
+```
+
+### 0.4 Auth is green for the selected surface
+
+**Auth is the gate for automated testing, but the gate is surface-scoped.**
+Pick the intended surface first when it is already clear from the task, then
+check only that surface. Do not block a Web test on CLI device-code auth or an
+Electron login state unless the test spans those surfaces.
+
+```bash
+./.agents/skills/agent-testing/scripts/setup-auth.sh status --surface web
+```
+
+Use `status` with no `--surface` only for cross-surface test plans.
+
+| Surface  | Mechanism                                     | One-key path             | Standard check                            |
+| -------- | --------------------------------------------- | ------------------------ | ----------------------------------------- |
+| CLI      | Seeded API key, device-code fallback          | `setup-auth.sh cli-seed` | `setup-auth.sh status --surface cli`      |
+| Web      | Seeded better-auth login into `agent-browser` | `setup-auth.sh web-seed` | `setup-auth.sh status --surface web`      |
+| Electron | App's own persistent login state              | Log in once in the app   | `setup-auth.sh status --surface electron` |
+| Bot      | Native apps already logged in                 | —                        | per-platform screenshot                   |
+
+Login-state checks are standardized — do NOT hand-roll `window.__LOBE_STORES`
+eval snippets; use `scripts/app-probe.sh auth` (returns `{ isSignedIn, userId }`,
+works for Electron CDP and web sessions via `AB_TARGET`).
+
+For Web tests, the test surface is always `agent-browser --session lobehub-dev`.
+Use `setup-auth.sh web-seed` first in the seeded local env. The user's normal
+Chrome is only a source for copying the Cookie header when seed auth is not
+available or `status --surface web` still fails. If Chrome is already logged in,
+do not open a login page; verify agent-browser first, then request the Network
+`Cookie:` header only if that verification fails. Full background and failure modes:
+[references/auth.md](./references/auth.md).
+
+## Step 1 — Pick the surface by change scope
+
+| Change scope                                            | Default surface                      | Why                                                               | Guide                              |
+| ------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------------------- | ---------------------------------- |
+| **Backend** (TRPC router / service / model / migration) | **CLI**                              | Fastest loop, text-assertable output, zero UI flakiness           | [cli/index.md](./cli/index.md)     |
+| **Pure frontend** (components, store, styles, UX)       | **Electron** (agent-browser + CDP)   | Primary product shape; `__LOBE_STORES` state introspection        | [ui/electron.md](./ui/electron.md) |
+| **Full-stack** (new API + UI consuming it)              | **Web** (browser + local dev server) | One surface where network requests and UI are observable together | [ui/web.md](./ui/web.md)           |
+| **Bot channels** (Discord / WeChat / Lark / …)          | Native app via osascript / bridge    | Only way to exercise the real channel end-to-end                  | `bot/<platform>/index.md`          |
+
+Escalate, don't duplicate: verify a backend change with the CLI first; only add
+a UI pass when the change actually affects the UI.
+
+### Environment support (local macOS vs cloud Linux)
+
+The decisive constraint per surface is **how evidence (screenshots) is
+captured**: CDP-based capture (`agent-browser screenshot`) renders from the
+browser engine and needs no real display; OS-level capture (`screencapture`,
+osascript) is macOS-only.
+
+| Surface  | macOS (local) | Linux / cloud (headless)                                  | Screenshot mechanism                                   |
+| -------- | ------------- | --------------------------------------------------------- | ------------------------------------------------------ |
+| CLI      | ✅            | ✅                                                        | n/a — text output                                      |
+| Web      | ✅            | ✅ headless Chromium works natively                       | CDP — no display needed                                |
+| Electron | ✅            | ⚠️ runs, but needs a display server: wrap with `xvfb-run` | CDP works under Xvfb; `capture-app-window.sh` does NOT |
+| Bot      | ✅            | ❌ osascript + native apps are macOS-only                 | macOS `screencapture` only                             |
+
+When a test must stay cloud-portable, prefer CDP-based evidence over
+OS-level capture wherever both exist.
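
The support table can be read as a small lookup. The sketch below encodes it under the assumption that `uname -s` reports `Darwin` on macOS and that any other value is treated as headless Linux; the function name `capture_mechanism` and its output labels are hypothetical.

```shell
# Hypothetical lookup: print the evidence mechanism for SURFACE on OS.
# OS defaults to `uname -s`; anything other than Darwin is treated as Linux/cloud.
capture_mechanism() {
  surface="$1"
  os="${2:-$(uname -s)}"
  case "$surface" in
    cli) echo "text" ;;
    web) echo "cdp" ;;
    electron)
      if [ "$os" = "Darwin" ]; then echo "cdp"; else echo "cdp-under-xvfb"; fi ;;
    bot)
      if [ "$os" = "Darwin" ]; then echo "screencapture"; else echo "unsupported"; fi ;;
    *)
      echo "unknown surface: $surface" >&2
      return 1 ;;
  esac
}
```

A plan that must stay cloud-portable can call this for each surface it touches and drop any case that comes back `unsupported`.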
+
+### Bot platforms
+
+| Platform      | Guide                                            | Quick switcher        |
+| ------------- | ------------------------------------------------ | --------------------- |
+| Discord       | [bot/discord/index.md](./bot/discord/index.md)   | `Cmd+K`               |
+| Slack         | [bot/slack/index.md](./bot/slack/index.md)       | `Cmd+K`               |
+| Telegram      | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F`               |
+| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md)     | `Cmd+F`               |
+| Lark / 飞书   | [bot/lark/index.md](./bot/lark/index.md)         | `Cmd+K`               |
+| QQ            | [bot/qq/index.md](./bot/qq/index.md)             | `Cmd+F`               |
+| iMessage      | [bot/imessage/index.md](./bot/imessage/index.md) | bridge (no osascript) |
+
+Each platform folder contains an `index.md` (activation, navigation,
+send-message, verification snippets) and a `test-<platform>-bot.sh` script
+sharing the interface:
+
+```bash
+./.agents/skills/agent-testing/bot/<platform>/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
+```
+
+New to osascript automation? Read
+[references/osascript.md](./references/osascript.md) first — it is a general
+macOS-automation asset (activate, type, paste, screenshot, accessibility reads,
+gotchas), not bot-specific.
+
+## Step 2 — Run
+
+Surface guides above carry the detailed workflows. Shared infrastructure:
+
+| Need                                 | Where                                                                |
+| ------------------------------------ | -------------------------------------------------------------------- |
+| Start / restart the local dev server | [references/dev-server.md](./references/dev-server.md)               |
+| `agent-browser` command reference    | [references/agent-browser.md](./references/agent-browser.md)         |
+| osascript patterns (general macOS)   | [references/osascript.md](./references/osascript.md)                 |
+| Agent gateway probing                | [references/agent-gateway.md](./references/agent-gateway.md)         |
+| Screen recording                     | [references/record-app-screen.md](./references/record-app-screen.md) |
+
+### Scripts
+
+All under `.agents/skills/agent-testing/scripts/`:
+
+| Script                    | Usage                                                                        |
+| ------------------------- | ---------------------------------------------------------------------------- |
+| `test-env.sh`             | Print/export the resolved local test env and ports                           |
+| `setup-auth.sh`           | One-stop auth setup & status check (`status` / `cli` / `web`)                |
+| `init-dev-env.sh`         | Self-contained local dev env (`setup-db` / `seed-user` / `dev-next` / `dev`) |
+| `app-probe.sh`            | LobeHub app probes: `auth` / `route` / `ops` / `goto <path>` / `errors`      |
+| `record-gif.sh`           | Frame-sequence → GIF for time-based behavior (streaming, timers, animations) |
+| `report-init.sh`          | Scaffold a structured test report (Step 3)                                   |
+| `electron-dev.sh`         | Manage Electron dev env (start/stop/status/restart, CDP 9222)                |
+| `capture-app-window.sh`   | Screenshot a specific app window (general; used by bot tests)                |
+| `record-app-screen.sh`    | Record app screen (video + periodic screenshots)                             |
+| `record-electron-demo.sh` | Record Electron app demo with ffmpeg                                         |
+| `agent-gateway/`          | Gateway probe / dump / analyze tools                                         |
+
+`app-probe.sh` is the LobeHub-specific fast path into app state — auth check,
+current route, running operations, and `goto <path>` quick navigation
+(`/agent/<agentId>/<topicId>`, `/task/<taskId>`, `/settings`, …) so a test can
+jump straight to the state under test instead of clicking through the UI. See
+[ui/electron.md](./ui/electron.md#lobehub-probes--quick-navigation) for usage.
+
+## Step 3 — Structured report (mandatory deliverable)
+
+Every automated test session ends with a structured, evidence-backed report —
+not a chat-only summary. Scaffold it up front and fill it as you test:
+
+```bash
+DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh my-feature "Verify my feature")
+# ... test, saving screenshots / CLI transcripts into $DIR/assets/ ...
+# fill $DIR/report.md (scope, case table with inline evidence, verdict, score) and $DIR/result.json
+```
+
+Reports live in `.records/reports/<timestamp>-<slug>/` (gitignored): `report.md`
+(human-readable, with screenshots/GIFs embedded directly in the case table),
+`result.json` (machine-readable pass/fail + score), `assets/` (evidence).
+Format spec and evidence rules:
+[references/report.md](./references/report.md).
+
+Five hard rules worth front-loading:
+
+- **Report language = the user's conversation language.** Write the ENTIRE
+  `report.md` (headings included) in the language the user is conversing in —
+  no mixed English. `result.json` keys/status values stay English.
+- **The case table is the main reading surface.** Prefer the compact
+  `# | case | result | key observation | evidence` shape and embed the
+  screenshot/GIF in the evidence cell. Use separate evidence sections only for
+  long CLI transcripts, HAR summaries, or supplemental detail.
+- **Visual evidence must render inline.** Screenshots and GIFs in `report.md`
+  must use Markdown image syntax like `![case 1](assets/case1.png)`. Do not
+  use bare file paths, Markdown links, or local file links as the primary
+  visual evidence; those make the report unreadable without opening each asset.
+- **Final replies must include visual evidence links.** When a run includes UI
+  screenshots or GIFs, include the report directory and the most important
+  visual artifacts in the final chat response. Each item must include a stable
+  label, an evidence caption describing the observed UI outcome, and a
+  repo-relative path, for example:
+  `[Image #1 - error toast shows provider auth failure](<report-dir>/assets/foo.png)`.
+  Use repo-relative paths, not absolute paths.
+- **Time-based behavior needs a GIF, not a screenshot.** If a case asserts
+  change over time (streaming output, a ticking timer, loading states,
+  animations), record it with `scripts/record-gif.sh` and embed the GIF —
+  a static screenshot cannot prove the behavior.
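
The inline-evidence rule is easy to lint. This is a rough sketch, assuming evidence files are referenced with paths starting `assets/` as in the example above; the function name `check_inline_evidence` is hypothetical and it only catches plain Markdown links, not bare file paths.

```shell
# Hypothetical lint: fail when report.md links to an asset with [text](assets/...)
# instead of embedding it with ![text](assets/...). Offending lines are printed.
check_inline_evidence() {
  if grep -nE '(^|[^!])\[[^]]*\]\(assets/' "$1"; then
    echo "evidence above is linked, not embedded" >&2
    return 1
  fi
  return 0
}
```

Running `check_inline_evidence "$DIR/report.md"` before writing the final reply gives a quick signal that the case table will render its screenshots and GIFs inline.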
+
+## Directory map
+
+```
+agent-testing/
+├── SKILL.md            # this router
+├── cli/index.md        # backend verification via the LobeHub CLI
+├── ui/electron.md      # pure-frontend verification in the desktop app
+├── ui/web.md           # full-stack verification in the browser
+├── bot/<platform>/     # bot-channel verification (osascript / bridge)
+├── references/         # shared knowledge: auth, dev-server, agent-browser, osascript, report
+└── scripts/            # setup-auth, report-init, electron-dev, capture, recording, gateway
+```
+
+## Gotchas
+
+- agent-browser: see [references/agent-browser.md](./references/agent-browser.md#gotchas)
+- Electron: see [ui/electron.md](./ui/electron.md#electron-gotchas)
+- osascript: see [references/osascript.md](./references/osascript.md#gotchas)
@@ -2,7 +2,7 @@

 **App name:** `Discord` | **Process name:** `Discord`

-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.

 ## Activate & Navigate

@@ -92,6 +92,6 @@ echo "Screenshot saved to /tmp/discord-test-result.png"
 ## Script

 ```bash
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
+./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
+./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
 ```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"

 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -21,7 +21,7 @@ So the test surface is three layers:
  curl -sS -m4 -o /dev/null -w '%{http_code}\n' \
    "http://127.0.0.1:1234/api/v1/server/info?password=<PW>" # expect 200
  ```
-- **Electron dev running with CDP**: `./.agents/skills/local-testing/scripts/electron-dev.sh start`
+- **Electron dev running with CDP**: `./.agents/skills/agent-testing/scripts/electron-dev.sh start`
 - The **iMessage Desktop branch** checked out (the `imessageBridge` IPC group
  and `@lobechat/chat-adapter-imessage` must be compiled into the main bundle).
  Run `pnpm install --ignore-scripts` at the repo root **and** in `apps/desktop/`
@@ -31,7 +31,7 @@ So the test surface is three layers:
 ## Fast path: automated script

 ```bash
-./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
+./.agents/skills/agent-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
 ```

 Asserts the whole flow and self-cleans (unique `applicationId` per run, removes
@@ -136,7 +136,7 @@ Verifies the leg the bridge uses to _reply_: `BlueBubblesApiClient.sendText`
 → `POST /api/v1/message/text`. Run the helper against your own number:

 ```bash
-./.agents/skills/local-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
+./.agents/skills/agent-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
 ```

 **Gotcha that bites everyone:** with `method=apple-script` and a _new_
@@ -2,7 +2,7 @@

 **App name:** `Lark` or `飞书` | **Process name:** `Lark` or `飞书`

-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.

 ## Activate & Navigate

@@ -56,6 +56,6 @@ screencapture /tmp/lark-bot-response.png
 ## Script

 ```bash
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
+./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
+./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
 ```
@@ -80,5 +80,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"

 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@

 **App name:** `QQ` | **Process name:** `QQ`

-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.

 ## Activate & Navigate

@@ -57,6 +57,6 @@ screencapture /tmp/qq-bot-response.png
 ## Script

 ```bash
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
+./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
+./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
 ```
@@ -72,5 +72,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"

 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@

 **App name:** `Slack` | **Process name:** `Slack`

-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.

 ## Activate & Navigate

@@ -68,6 +68,6 @@ screencapture /tmp/slack-bot-response.png
 ## Script

 ```bash
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
+./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
+./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
 ```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"

 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@

 **App name:** `Telegram` | **Process name:** `Telegram`

-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.

 ## Activate & Navigate

@@ -75,6 +75,6 @@ curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/getUpdates?limit=5" | j
 ## Script

 ```bash
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
+./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
+./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
 ```
@@ -75,5 +75,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"

 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@

 **App name:** `微信` or `WeChat` | **Process name:** `WeChat`

-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.

 ## Activate & Navigate

@@ -76,6 +76,6 @@ screencapture /tmp/wechat-bot-response.png
 ## Script

 ```bash
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
+./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
+./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
 ```
@@ -81,5 +81,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"

 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -0,0 +1,152 @@
+# CLI Backend Verification
+
+Default surface for verifying **backend changes** (TRPC routers, services,
+models, migrations) end-to-end: fastest loop, text-assertable output, zero UI
+flakiness.
+
+## When to use
+
+- Verifying TRPC router / service / model changes end-to-end
+- Testing new API fields or response structure changes
+- Validating CLI command output after backend modifications
+- Debugging data flow issues between server and CLI
+
+## Prerequisites
+
+| Requirement  | Details                                                                                                                                        |
+| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
+| Dev server   | `localhost:3010` — see [../references/dev-server.md](../references/dev-server.md)                                                              |
+| CLI source   | `apps/cli/` — runs from source, no rebuild; standalone `node_modules` — run `pnpm install` inside `apps/cli/` (root install does not cover it) |
+| CLI dev mode | `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated settings                                                                                          |
+| Auth         | Seeded API key first; Device Code Flow only as fallback — see [../references/auth.md](../references/auth.md)                                   |
+
+All CLI dev commands run from `apps/cli/`. Subsequent examples use `$CLI`:
+
+```bash
+source ../../.records/env/agent-testing-cli.env
+CLI="bun src/index.ts"
+```
+
+## Workflow
+
+### Step 1 — Server up?
+
+See [../references/dev-server.md](../references/dev-server.md) for the health
+check, start, and restart commands. Server-side code changes require a restart.
+
+### Step 2 — Auth ready?
+
+Run the auth scripts from the repository root; their paths are root-relative.
+
+```bash
+./.agents/skills/agent-testing/scripts/setup-auth.sh status
+```
+
+If the CLI is not ready in the seeded local environment:
+
+```bash
+./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
+source .records/env/agent-testing-cli.env
+./.agents/skills/agent-testing/scripts/setup-auth.sh cli-seed
+```
+
+If the target environment is not seeded, use the interactive fallback:
+
+```bash
+cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
+```
+
+Seeded API-key auth does not store credentials. It writes local settings under
+`$HOME/.lobehub-dev` and requires the generated env file to be sourced before
+CLI commands. Details:
+[../references/auth.md](../references/auth.md).
+
+### Step 3 — Test with CLI commands
+
+CLI runs from source, so CLI-side code changes take effect immediately without
+rebuilding:
+
+```bash
+cd apps/cli
+$CLI <command>
+```
+
+Capture output for the report as you go (e.g. `$CLI task list | tee "$DIR/assets/task-list.txt"`).
+
+### Step 4 — Clean up test data
+
+```bash
+$CLI task delete <id> -y
+$CLI agent delete <id> -y
+```
+
+### Step 5 — Report
+
+Finish with a structured report —
+[../references/report.md](../references/report.md). CLI evidence = exact
+command + trimmed output.
+
+## Common testing patterns
+
+### Task system
+
+```bash
+$CLI task list
+$CLI task create -n "Root Task" -i "Test instruction"
+$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
+$CLI task view T-1
+$CLI task tree T-1
+$CLI task edit T-1 --status running
+$CLI task comment T-1 -m "Test comment"
+$CLI task delete T-1 -y
+```
+
+### Agent system
+
+```bash
+$CLI agent list
+$CLI agent view <agent-id>
+$CLI agent run <agent-id> -m "Test prompt"
+```
+
+### Document & knowledge base
+
+```bash
+$CLI doc list
+$CLI doc create -t "Test Doc" -c "Content here"
+$CLI doc view <doc-id>
+$CLI kb list
+$CLI kb tree <kb-id>
+```
+
+### Model & provider
+
+```bash
+$CLI model list
+$CLI provider list
+$CLI provider test <provider-id>
+```
+
+## Dev-test cycle
+
+```
+1. Make code changes (service/model/router/type)
+         |
+2. Run unit tests (fast feedback)
+   bunx vitest run --silent='passed-only' '<test-file>'
+         |
+3. Restart dev server (if server-side changes — see dev-server.md)
+         |
+4. CLI verification (end-to-end)
+   $CLI <command>
+         |
+5. Clean up test data + write the report
+```
+
+## Troubleshooting
+
+| Issue                       | Solution                                                                                               |
+| --------------------------- | ------------------------------------------------------------------------------------------------------ |
+| `No authentication found`   | Source `.records/env/agent-testing-cli.env`, or run device-code `login --server http://localhost:3010` |
+| `UNAUTHORIZED` on API calls | Re-run `init-dev-env.sh seed-user` and re-source the env file; for device-code fallback, re-run login  |
+| `ECONNREFUSED`              | Dev server not running — see dev-server.md                                                             |
+| CLI shows old data/behavior | Server needs restart to pick up code changes                                                           |
+| Login opens wrong server    | Must use `--server` flag (env var doesn't work)                                                        |
@@ -0,0 +1,257 @@
+# agent-browser CLI Reference
+
+Generic reference for the `agent-browser` CLI — automate Chromium-based apps (Electron, Chrome, web) via Chrome DevTools Protocol. LobeHub-specific patterns live in [../ui/electron.md](../ui/electron.md) and [../ui/web.md](../ui/web.md); authentication recipes live in [auth.md](./auth.md).
+
+Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
+
+## Core Workflow
+
+Every browser automation follows this pattern:
+
+1. **Navigate**: `agent-browser open <url>`
+2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
+3. **Interact**: Use refs to click, fill, select
+4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
+
+```bash
+agent-browser open https://example.com/form
+agent-browser snapshot -i
+# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
+
+agent-browser fill @e1 "user@example.com"
+agent-browser fill @e2 "password123"
+agent-browser click @e3
+agent-browser wait --load networkidle
+agent-browser snapshot -i # Check result
+```
+
+## Command Chaining
+
+```bash
+# Chain open + wait + snapshot in one call
+agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
+```
+
+Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
+
+## Essential Commands
+
+```bash
+# Navigation
+agent-browser open <url>              # Navigate (aliases: goto, navigate)
+agent-browser close                   # Close browser
+agent-browser close --all             # Close all active sessions
+
+# Snapshot
+agent-browser snapshot -i             # Interactive elements with refs (recommended)
+agent-browser snapshot -s "#selector" # Scope to CSS selector
+
+# Interaction (use @refs from snapshot)
+agent-browser click @e1               # Click element
+agent-browser click @e1 --new-tab     # Click and open in new tab
+agent-browser fill @e2 "text"         # Clear and type text
+agent-browser type @e2 "text"         # Type without clearing
+agent-browser select @e1 "option"     # Select dropdown option
+agent-browser check @e1               # Check checkbox
+agent-browser press Enter             # Press key
+agent-browser keyboard type "text"    # Type at current focus (no selector)
+agent-browser keyboard inserttext "text"  # Insert without key events
+agent-browser scroll down 500         # Scroll page
+agent-browser scroll down 500 --selector "div.content"  # Scroll within container
+
+# Get information
+agent-browser get text @e1            # Get element text
+agent-browser get url                 # Get current URL
+agent-browser get title               # Get page title
+agent-browser get cdp-url             # Get CDP WebSocket URL
+
+# Wait
+agent-browser wait @e1                # Wait for element
+agent-browser wait --load networkidle # Wait for network idle
+agent-browser wait --url "**/page"    # Wait for URL pattern
+agent-browser wait 2000               # Wait milliseconds
+agent-browser wait --text "Welcome"   # Wait for text to appear
+agent-browser wait --fn "!document.body.innerText.includes('Loading...')"  # Wait for text to disappear
+agent-browser wait "#spinner" --state hidden  # Wait for element to disappear
+
+# Downloads
+agent-browser download @e1 ./file.pdf          # Click element to trigger download
+agent-browser wait --download ./output.zip     # Wait for any download to complete
+
+# Network
+agent-browser network requests                 # Inspect tracked requests
+agent-browser network requests --type xhr,fetch  # Filter by resource type
+agent-browser network requests --method POST   # Filter by HTTP method
+agent-browser network route "**/api/*" --abort # Block matching requests
+agent-browser network har start                # Start HAR recording
+agent-browser network har stop ./capture.har   # Stop and save HAR file
+
+# Viewport & Device Emulation
+agent-browser set viewport 1920 1080          # Set viewport size (default: 1280x720)
+agent-browser set viewport 1920 1080 2        # 2x retina
+agent-browser set device "iPhone 14"          # Emulate device (viewport + user agent)
+
+# Capture
+agent-browser screenshot              # Screenshot to temp dir
+agent-browser screenshot --full       # Full page screenshot
+agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
+agent-browser pdf output.pdf          # Save as PDF
+
+# Clipboard
+agent-browser clipboard read          # Read text from clipboard
+agent-browser clipboard write "text"  # Write text to clipboard
+agent-browser clipboard copy          # Copy current selection
+agent-browser clipboard paste         # Paste from clipboard
+
+# Dialogs (alert, confirm, prompt, beforeunload)
+agent-browser dialog accept           # Accept dialog
+agent-browser dialog accept "input"   # Accept prompt dialog with text
+agent-browser dialog dismiss          # Dismiss/cancel dialog
+agent-browser dialog status           # Check if dialog is open
+
+# Diff (compare page states)
+agent-browser diff snapshot                        # Compare current vs last snapshot
+agent-browser diff screenshot --baseline before.png  # Visual pixel diff
+agent-browser diff url <url1> <url2>               # Compare two pages
+
+# Streaming
+agent-browser stream enable           # Start WebSocket streaming
+agent-browser stream status           # Inspect streaming state
+agent-browser stream disable          # Stop streaming
+```
+
+## Batch Execution
+
+```bash
+echo '[
+  ["open", "https://example.com"],
+  ["snapshot", "-i"],
+  ["click", "@e1"],
+  ["screenshot", "result.png"]
+]' | agent-browser batch --json
+```
+
+## Authentication
+
+```bash
+# Option 1: Auth vault (credentials stored encrypted)
+echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
+agent-browser auth login myapp
+
+# Option 2: Session name (auto-save/restore cookies + localStorage)
+agent-browser --session-name myapp open https://app.example.com/login
+agent-browser close                                                       # State auto-saved
+agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
+
+# Option 3: Persistent profile
+agent-browser --profile ~/.myapp open https://app.example.com/login
+
+# Option 4: State file
+agent-browser state save auth.json
+agent-browser state load auth.json
+```
+
+### LobeHub dev server — inject better-auth cookie
+
+`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3010`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [auth.md](./auth.md) for the full recipe.
+
+## Semantic Locators (Alternative to Refs)
+
+```bash
+agent-browser find text "Sign In" click
+agent-browser find label "Email" fill "user@test.com"
+agent-browser find role button click --name "Submit"
+agent-browser find placeholder "Search" type "query"
+agent-browser find testid "submit-btn" click
+```
+
+## JavaScript Evaluation (eval)
+
+```bash
+# Simple expressions
+agent-browser eval 'document.title'
+
+# Complex JS: use --stdin with heredoc (RECOMMENDED)
+agent-browser eval --stdin << 'EVALEOF'
+JSON.stringify(
+  Array.from(document.querySelectorAll("img"))
+    .filter(i => !i.alt)
+    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
+)
+EVALEOF
+
+# Base64 encoding (avoids all shell escaping issues)
+agent-browser eval -b "$(echo -n 'document.title' | base64)"
+```
+
+## Ref Lifecycle
+
+Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
+
+## Annotated Screenshots (Vision Mode)
+
+```bash
+agent-browser screenshot --annotate
+# Output includes the image path and a legend:
+#   [1] @e1 button "Submit"
+#   [2] @e2 link "Home"
+agent-browser click @e2 # Click using ref from annotated screenshot
+```
+
+## Parallel Sessions
+
+```bash
+agent-browser --session site1 open https://site-a.com
+agent-browser --session site2 open https://site-b.com
+agent-browser session list
+```
+
+## Connect to Existing Chrome
+
+```bash
+agent-browser --auto-connect snapshot # Auto-discover running Chrome
+agent-browser --cdp 9222 snapshot     # Explicit CDP port
+```
+
+## iOS Simulator (Mobile Safari)
+
+```bash
+agent-browser device list
+agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
+agent-browser -p ios snapshot -i
+agent-browser -p ios tap @e1
+agent-browser -p ios swipe up
+agent-browser -p ios screenshot mobile.png
+agent-browser -p ios close
+```
+
+## Observability Dashboard
+
+```bash
+agent-browser dashboard install
+agent-browser dashboard start # Background server on port 4848
+agent-browser dashboard stop
+```
+
+## Cloud Providers
+
+Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
+
+## Browser Engine Selection
+
+```bash
+agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
+```
+
+## Gotchas
+
+- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
+- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
+- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
+- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
+- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
+- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
+- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
+- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
@@ -19,13 +19,13 @@ works for any LobeHub streaming session.

 ```bash
 # 1. Start Electron with CDP
-./.agents/skills/local-testing/scripts/electron-dev.sh start
+./.agents/skills/agent-testing/scripts/electron-dev.sh start

 # 2. Navigate to a chat, switch runtime to Cloud Sandbox (gateway mode)

 # 3. Install the probe + helpers
 agent-browser --cdp 9222 eval --stdin \
-  < .agents/skills/local-testing/scripts/agent-gateway/probe.js
+  < .agents/skills/agent-testing/scripts/agent-gateway/probe.js

 # 4. Send a tool-call message — manually or via type+press
 agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
@@ -34,15 +34,15 @@ agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
 #    rightmost inactive tab as AWAY — edit ROUND_TRIPS / DWELL_MS in the
 #    file if you want different timing)
 agent-browser --cdp 9222 eval --stdin \
-  < .agents/skills/local-testing/scripts/agent-gateway/tab-switch.js
+  < .agents/skills/agent-testing/scripts/agent-gateway/tab-switch.js

 # 6. Wait for streaming to finish, then dump
 agent-browser --cdp 9222 eval --stdin \
-  < .agents/skills/local-testing/scripts/agent-gateway/probe-dump.js \
+  < .agents/skills/agent-testing/scripts/agent-gateway/probe-dump.js \
  > /tmp/probe.json

 # 7. Analyze
-node .agents/skills/local-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
+node .agents/skills/agent-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
 ```

 The analyzer prints three sections: EVENTS, TIMELINE, REGRESSIONS. If
@@ -0,0 +1,166 @@
+# Auth Setup for Local Agent Testing
+
+**Auth is the gate for all automated testing.** Complete
+[Step 0.0](../SKILL.md#00-resolve-the-current-test-environment) first so
+`SERVER_URL` and ports are resolved, then verify auth before writing any test
+step.
+
+Initialize helpers first:
+
+```bash
+SCRIPT="./.agents/skills/agent-testing/scripts/setup-auth.sh"
+TEST_ENV="./.agents/skills/agent-testing/scripts/test-env.sh"
+eval "$($TEST_ENV --exports)"
+```
+
+Quick reference after initialization:
+
+| Command                        | Purpose                                            |
+| ------------------------------ | -------------------------------------------------- |
+| `$SCRIPT status`               | Check all surfaces (server + CLI + web + Electron) |
+| `$SCRIPT status --surface web` | Check only the Web surface gate                    |
+| `$SCRIPT cli-seed`             | Configure CLI API-key auth from the seeded key     |
+| `$SCRIPT cli`                  | Interactive CLI device-code login (user must run)  |
+| `$SCRIPT open-chrome`          | Open Chrome at `SERVER_URL` with DevTools          |
+| `$SCRIPT web-seed`             | Sign in the seeded user and inject cookies         |
+| `pbpaste \| $SCRIPT web`       | Inject a copied Cookie header into agent-browser   |
+| `$SCRIPT web-verify`           | Live-check agent-browser session auth              |
+
+Use `localhost` for Web auth; better-auth cookies are stored for `localhost`,
+not `127.0.0.1`.
+
+## Per-surface overview
+
+| Surface  | Mechanism                                | Persistence                                                       | Human interaction                              |
+| -------- | ---------------------------------------- | ----------------------------------------------------------------- | ---------------------------------------------- |
+| CLI      | Seeded API key or OIDC Device Code Flow  | `.records/env/agent-testing-cli.env` + `$HOME/.lobehub-dev`       | No for seed path; yes for device-code fallback |
+| Web      | Seeded better-auth login or cookie copy  | `~/.lobehub-agent-testing/web-state.json` + agent-browser session | No for seed path; copy cookie only as fallback |
+| Electron | App's own login state                    | Electron user-data dir                                            | Log in once manually in the app                |
+| Bot      | Native apps (Discord/WeChat/…) logged in | Each app's own session                                            | Once per app                                   |
+
+## CLI — Seeded API key
+
+For the self-contained no-root-`.env` dev environment, seed the baseline user
+and API key once:
+
+```bash
+./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
+source .records/env/agent-testing-cli.env
+./.agents/skills/agent-testing/scripts/setup-auth.sh cli-seed
+```
+
+The seed step writes `LOBE_API_KEY` for humans and maps it to the CLI's current
+auth variable, `LOBEHUB_CLI_API_KEY`. It also sets `LOBEHUB_SERVER` so CLI
+commands hit the local server without needing a stored device-code token.
+
+Use this for automated CLI verification:
+
+```bash
+cd apps/cli
+source ../../.records/env/agent-testing-cli.env
+bun src/index.ts <command>
+```
+
+## CLI — Device Code Flow fallback
+
+Use device-code login only when testing against a non-seeded environment.
+Credentials are isolated from the user's real CLI config via
+`LOBEHUB_CLI_HOME=.lobehub-dev`, which the current CLI stores under
+`$HOME/.lobehub-dev`.
+
+```bash
+cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
+```
+
+- The `--server` flag is required — an env var does NOT work and login will hit
+  the wrong server without it.
+- Check state without logging in: `setup-auth.sh status` (verifies
+  `LOBEHUB_CLI_API_KEY` when present, otherwise checks the stored server URL).
+- `UNAUTHORIZED` on API calls means the token expired — re-run login.
+
+## Web — seeded better-auth login
+
+The Web test surface is `agent-browser --session lobehub-dev`. The user's
+ordinary Chrome is only a cookie source; Chrome screenshots, Chrome Network
+records, and Chrome logged-in state do not prove the agent-browser test session
+is authenticated.
+
+For the seeded local dev environment, use the automatic path:
+
+```bash
+./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
+./.agents/skills/agent-testing/scripts/setup-auth.sh web-seed
+```
+
+`web-seed` posts the seeded email/password to
+`/api/auth/sign-in/email`, stores the returned cookie jar under
+`~/.lobehub-agent-testing/`, converts it to Playwright `storageState`, loads it
+into the `agent-browser` session, and verifies the session does not land on
+`/signin`.
+
+## Web — manual cookie injection fallback
+
+`agent-browser --headed` on macOS often creates the Chromium window off-screen —
+the user can't see or interact with it, so manual login inside the agent-browser
+session fails. Instead, copy the **better-auth session cookie** out of the
+user's own logged-in Chrome and inject it as a Playwright-style state file.
+
+Do **not** use this on production URLs — only local dev. Treat the cookie as a
+secret: don't paste it into shared logs, PRs, or commit it anywhere.
+
+### Web — decision flow
+
+1. `$SCRIPT status --surface web` — green? Start testing. Do not ask for a Cookie header.
+2. Not green and using the seeded local env → `$SCRIPT web-seed`.
+3. Still not green or not using the seed env → `$SCRIPT open-chrome` opens Chrome at `SERVER_URL` with DevTools.
+4. User copies the `Cookie:` header from Network tab → any same-origin request → Request Headers → right-click `Cookie:` → **Copy value**. Must be from Network, NOT `document.cookie` (HttpOnly cookies are invisible to `document.cookie`).
+5. `pbpaste | $SCRIPT web` — filters to better-auth cookies (`session_token`, `session_data`, `state`), builds Playwright `storageState`, loads it into the `agent-browser` session (`lobehub-dev`), opens `SERVER_URL`, and asserts the URL is not `/signin`.
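+
+The filtering in step 5 can be sketched in a few lines of shell. This is an
+illustration only, not the code in `setup-auth.sh`: the function name
+`filter_better_auth_cookies` is hypothetical, and it assumes the cookie names
+carry a `better-auth.` prefix (as `better-auth.session_token` does).
+
+```bash
+# Hypothetical sketch: keep only better-auth cookies from a raw Cookie header.
+# Reads the header value on stdin and prints one name=value pair per line.
+filter_better_auth_cookies() {
+  tr ';' '\n' |
+    sed 's/^[[:space:]]*//' |
+    grep -E 'better-auth\.(session_token|session_data|state)='
+}
+```
+
+Usage: `pbpaste | filter_better_auth_cookies`. The output contains the session
+secret, so do not redirect it into shared logs.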
+
+### Using the authenticated session
+
+```bash
+agent-browser --session lobehub-dev open "$SERVER_URL/"
+agent-browser --session lobehub-dev snapshot -i | head -20
+```
+
+### Notes
+
+- `storageState` doesn't enforce the HttpOnly flag on load — the script stores
+  cookies with `httpOnly: false`, which is fine for local dev and sidesteps a
+  CDP-context quirk where HttpOnly cookies sometimes fail to attach.
+- The state file is kept at `~/.lobehub-agent-testing/web-state.json` so
+  `setup-auth.sh status` can report web-auth readiness across sessions.
+
+### Common failure modes
+
+| Symptom                                       | Cause                                                                     | Fix                                                                                            |
+| --------------------------------------------- | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
+| Still redirects to `/signin` after injection  | User pasted from `document.cookie` → missed HttpOnly session              | Re-pull from Network request Headers, not console                                              |
+| Script reports `no better-auth cookies found` | User pasted the wrong value, or the cookie parser regressed               | Keep the raw `Cookie:` header as-is; run `scripts/setup-auth.test.sh` if the input looks valid |
+| Login works briefly then expires              | `better-auth.session_token` rotated (user logged out / signed in again)   | Re-copy and re-inject                                                                          |
+| Domain mismatch                               | Cookie domain is not literally `localhost`                                | Use `localhost` with no leading dot for local dev                                              |
+
+## Electron
+
+The desktop app keeps its own persistent login state in its user-data
+directory — log in once manually inside the app and it survives restarts of
+`electron-dev.sh`. No injection needed. The standard check (do NOT hand-roll a
+store eval) once Electron is up with CDP:
+
+```bash
+./.agents/skills/agent-testing/scripts/app-probe.sh auth
+# → {"ok":true,"isSignedIn":true,"userId":"user_xxx"}
+```
+
+`setup-auth.sh status` runs this probe automatically when CDP 9222 is
+reachable.
+
+## Scope
+
+These recipes only cover **local dev** authentication. They do not:
+
+- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com`
+  and must be delivered over HTTPS.
+- Replace real OAuth flows — tests that must exercise the login UI itself need a
+  real Chromium with `--remote-debugging-port` or a bot account.
+- Flow cookies back to the user's Chrome — injection is one-way.
@@ -0,0 +1,98 @@
+# Local Dev Server
+
+Single source of truth for starting / restarting the backend that all test
+surfaces (CLI, Electron, Web) hit.
+
+## Resolve ports first
+
+Run `test-env.sh` as described in
+[SKILL.md Step 0.0](../SKILL.md#00-resolve-the-current-test-environment)
+before starting or probing any local test surface.
+
+## Ports & modes
+
+| Command             | What it runs                                              | Port source         |
+| ------------------- | --------------------------------------------------------- | ------------------- |
+| `pnpm run dev:next` | Next.js backend (API + auth)                              | `PORT`              |
+| `bun run dev`       | Full-stack (Next.js + Vite SPA, via `devStartupSequence`) | `PORT` + `SPA_PORT` |
+| `bun run dev:spa`   | Vite SPA only, proxies API to `PORT`                      | `SPA_PORT`          |
+
+In the **cloud repo** (where this repo is the `lobehub/` submodule), local
+worktree names map to fallback defaults only when `.env` and shell env do not
+provide values:
+
+| Workspace directory | Default `SERVER_URL`             |
+| ------------------- | -------------------------------- |
+| `lobehub`           | `http://localhost:3010`          |
+| `lobehub-cloud`     | `http://localhost:3020`          |
+| `lobehub-cloud-1`   | `http://localhost:3021`          |
+| `lobehub-cloud-N`   | `http://localhost:$((3020 + N))` |
+
+`test-env.sh` and `setup-auth.sh` both use the resolved env first and these
+worktree defaults only as fallback. Treat the dev-server terminal output as the
+final source of truth when testing a non-standard port, then export it for every
+agent-testing command:
+
+```bash
+export SERVER_URL=http://localhost:<port-from-dev-output>
+```
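+
+The worktree fallback table above amounts to a small lookup. The sketch below
+is a hypothetical helper (`default_server_url` is not part of `test-env.sh`)
+that only mirrors the table; the real scripts still check `.env` and the shell
+environment first.
+
+```bash
+# Hypothetical sketch: map a workspace directory name to its fallback
+# SERVER_URL, following the worktree table. Returns 1 for unknown names.
+default_server_url() {
+  local dir="$1" n
+  case "$dir" in
+    lobehub) echo "http://localhost:3010" ;;
+    lobehub-cloud) echo "http://localhost:3020" ;;
+    lobehub-cloud-*)
+      n="${dir#lobehub-cloud-}"
+      case "$n" in
+        '' | *[!0-9]*) return 1 ;; # suffix must be a plain number
+      esac
+      echo "http://localhost:$((3020 + 10#$n))" ;; # 10# forces base 10
+    *) return 1 ;;
+  esac
+}
+```
+
+Usage: `default_server_url "$(basename "$PWD")"` prints the fallback for the
+current workspace directory. An already exported `SERVER_URL` should still win.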
+
+## Health check
+
+```bash
+curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/"
+```
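+
+Right after a start or restart the server may not answer yet, so a single probe
+can give a false negative. A minimal polling sketch, assuming that any HTTP
+response means the server is up (curl reports `000` when it cannot connect);
+`wait_for_server` and its defaults are hypothetical, not part of the
+agent-testing scripts:
+
+```bash
+# Hypothetical sketch: poll a URL until it returns any HTTP status.
+# Args: url [attempts=30] [delay-seconds=1]. Prints the status on success.
+wait_for_server() {
+  local url="$1" attempts="${2:-30}" delay="${3:-1}" code i
+  for ((i = 1; i <= attempts; i++)); do
+    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" || true)
+    if [ -n "$code" ] && [ "$code" != "000" ]; then
+      echo "$code"
+      return 0
+    fi
+    sleep "$delay"
+  done
+  return 1
+}
+```
+
+Usage: `wait_for_server "$SERVER_URL/" 60 1 || echo "dev server did not come up"`.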
+
+## Start / restart
+
+```bash
+# Start backend only.
+# With root .env: use the existing local config.
+pnpm run dev:next
+
+# Without root .env: use the self-contained agent-testing env.
+./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
+
+# Full-stack SPA + backend. Required for Web smoke.
+# With root .env:
+bun run dev
+
+# Without root .env:
+./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
+
+# Local QStash. Run in a separate terminal only when testing workflow paths.
+./.agents/skills/agent-testing/scripts/init-dev-env.sh qstash
+
+# Restart — required to pick up server-side code changes
+lsof -ti:"$PORT" | xargs kill
+pnpm run dev:next
+# or, when no root .env exists:
+# ./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
+```
+
+## When a server restart is needed
+
+Next.js hot-reload may not pick up changes in workspace packages — restart when
+in doubt.
+
+| Change location                                 | Restart? |
+| ----------------------------------------------- | -------- |
+| `apps/server/src/` (routers, services, modules) | Yes      |
+| `src/server/` (agent-hono, workflows-hono)      | Yes      |
+| `packages/database/` (models)                   | Yes      |
+| `packages/types/`                               | Yes      |
+| `packages/prompts/`                             | Yes      |
+| `apps/cli/` (CLI runs from source)              | No       |
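+
+The table reduces to a path check. A hypothetical helper (`needs_restart` is
+not an existing script) that follows the table and, for paths the table does
+not list, falls back to the "restart when in doubt" rule:
+
+```bash
+# Hypothetical sketch: exit 0 if a changed path calls for a server restart.
+needs_restart() {
+  case "$1" in
+    apps/cli/*) return 1 ;; # CLI runs from source
+    apps/server/src/* | src/server/*) return 0 ;;
+    packages/database/* | packages/types/* | packages/prompts/*) return 0 ;;
+    *) return 0 ;; # unlisted path: restart when in doubt
+  esac
+}
+```
+
+Usage: `needs_restart "$path" && echo "restart the dev server"`.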
+
+## Troubleshooting
+
+| Issue                     | Solution                                                                                      |
+| ------------------------- | --------------------------------------------------------------------------------------------- |
+| `ECONNREFUSED`            | Server not running — start it                                                                 |
+| `EADDRINUSE` on the port  | Already running — `lsof -ti:<port> \| xargs kill` first                                       |
+| Stale data / old behavior | Server needs a restart to pick up code changes                                                |
+| QStash workflow failures  | Start `init-dev-env.sh qstash` and make sure dev server inherited the script's `QSTASH_*` env |
+
+Marketplace/community endpoints are not part of the local agent-testing auth
+gate. Do not block local product-chain verification on marketplace API auth
+unless the change explicitly targets marketplace behavior.
@@ -12,13 +12,13 @@ General-purpose screen recording tool for the Electron app. Captures CDP screens

 ```bash
 # Start recording (Electron must be running with CDP)
-.agents/skills/local-testing/scripts/record-app-screen.sh start [output_name]
+.agents/skills/agent-testing/scripts/record-app-screen.sh start [output_name]

 # Stop recording and assemble video
-.agents/skills/local-testing/scripts/record-app-screen.sh stop
+.agents/skills/agent-testing/scripts/record-app-screen.sh stop

 # Check if recording is active
-.agents/skills/local-testing/scripts/record-app-screen.sh status
+.agents/skills/agent-testing/scripts/record-app-screen.sh status
 ```

 ### Arguments
@@ -74,10 +74,10 @@ The `.records/` directory is at the project root and is gitignored.

 ```bash
 # Start Electron
-.agents/skills/local-testing/scripts/electron-dev.sh start
+.agents/skills/agent-testing/scripts/electron-dev.sh start

 # Start recording
-.agents/skills/local-testing/scripts/record-app-screen.sh start my-test
+.agents/skills/agent-testing/scripts/record-app-screen.sh start my-test

 # Run automation
 agent-browser --cdp 9222 click @e61
@@ -86,14 +86,14 @@ agent-browser --cdp 9222 press Enter
 sleep 10

 # Stop and get results
-.agents/skills/local-testing/scripts/record-app-screen.sh stop
+.agents/skills/agent-testing/scripts/record-app-screen.sh stop
 # → .records/my-test.mp4 + .records/my-test/*.png
 ```

 ### Gateway Streaming Demo

 ```bash
-.agents/skills/local-testing/scripts/electron-dev.sh start
+.agents/skills/agent-testing/scripts/electron-dev.sh start

 # Inject gateway URL
 agent-browser --cdp 9222 eval --stdin << 'EOF'
@@ -106,19 +106,19 @@ agent-browser --cdp 9222 eval --stdin << 'EOF'
 EOF

 # Record
-.agents/skills/local-testing/scripts/record-app-screen.sh start gateway-demo
+.agents/skills/agent-testing/scripts/record-app-screen.sh start gateway-demo

 # Navigate to agent, send message, wait for completion...
 # (automation commands here)

-.agents/skills/local-testing/scripts/record-app-screen.sh stop
+.agents/skills/agent-testing/scripts/record-app-screen.sh stop
 open .records/gateway-demo.mp4
 ```

 ### Check Active Recording

 ```bash
-.agents/skills/local-testing/scripts/record-app-screen.sh status
+.agents/skills/agent-testing/scripts/record-app-screen.sh status
 # [record] Active recording
 #   Frames:      42 captured (running: yes)
 #   Screenshots: 14 captured (running: yes)
@@ -0,0 +1,186 @@
+# Structured Test Reports
+
+Every automated test session ends with a structured, evidence-backed report.
+A chat-only summary is not an acceptable deliverable: the report is what the
+user (or a reviewer, or a later agent) audits without replaying the session.
+
+## Location & layout
+
+Reports live under `.records/reports/` (gitignored, like all `.records/`
+output):
+
+```
+.records/reports/<YYYYMMDD-HHMMSS>-<slug>/
+├── report.md      # human-readable report (case table with inline screenshots, verdict)
+├── result.json    # machine-readable results (pass/fail counts, score)
+└── assets/        # evidence: screenshots, HAR files, CLI transcripts
+```
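+
+The directory name is a timestamp plus a slug. `report-init.sh` (see Workflow)
+creates it for you; the sketch below only shows how such a name can be built
+with `date`, and `report_dir_name` is a hypothetical function, not the script's
+actual code.
+
+```bash
+# Hypothetical sketch: print a report path in the <YYYYMMDD-HHMMSS>-<slug> form.
+report_dir_name() {
+  local slug="$1"
+  printf '.records/reports/%s-%s\n' "$(date +%Y%m%d-%H%M%S)" "$slug"
+}
+```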
+
+## Workflow
+
+1. **Scaffold up front** — before running the first test step:
+
+   ```bash
+   DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh <slug> "<title>")
+   ```
+
+   The script creates the directory, pre-fills branch / commit / date in both
+   files, and prints the directory path. The scaffold uses the compact report
+   shape below; translate its headings and table labels to the user's language
+   before delivery if needed.
+
+2. **Collect evidence as you test** — every asserted behavior gets one evidence
+   item in `$DIR/assets/`:
+   - UI (static state): `agent-browser screenshot` or `capture-app-window.sh`,
+     then **verify the screenshot with the Read tool before citing it** —
+     never cite an image you haven't looked at.
+
+   - UI (time-based behavior): **screenshot vs GIF is a judgment you must
+     make per case.** If the assertion is about change over time — streaming
+     output, a ticking timer, loading/progress states, animations,
+     appear/disappear transitions — a static screenshot cannot prove it.
+     Record a frame sequence and synthesize a GIF:
+
+     ```bash
+     # start recording (background), trigger the behavior, wait for it to finish
+     ./.agents/skills/agent-testing/scripts/record-gif.sh "$DIR/assets/case2-streaming.gif" 12 2 &
+     GIF_PID=$!
+     # ... drive the scenario ...
+     wait $GIF_PID
+     ```
+
+     Embed it like an image: `![case 2](assets/case2-streaming.gif)`. Verify
+     at least the first/last frames visually (Read the GIF) before citing.
+
+   - CLI: exact command + trimmed output (`$CLI task list | tee "$DIR/assets/task-list.txt"`).
+
+   - Network: `agent-browser network requests` dumps or HAR files.
+
+3. **Fill `report.md` as you go** — don't reconstruct from memory at the end.
+   The primary evidence belongs in the case table itself: each row should pair
+   the assertion with the screenshot/GIF or non-visual artifact that proves it,
+   so readers can scan the result without jumping between sections. UI evidence
+   must render inline with Markdown image syntax; a plain link or file path is
+   not acceptable as primary visual evidence.
+
+4. **Set the verdict** in both `report.md` and `result.json`, then link the
+   report directory in your final answer to the user. If UI evidence exists,
+   list the key screenshot/GIF links in the final chat response. Use Markdown
+   link text as the evidence caption, for example:
+   `[Image #1 - observed outcome](<report-dir>/assets/case1.png)`.
+
+## Report language (hard rule)
+
+**`report.md` MUST be written in the language the user is conversing in** —
+the whole file, headings included. If the conversation is in Chinese, the
+report is in Chinese; do not mix English prose into it. The scaffold headings
+are placeholders — translate them when filling if the user is not conversing in
+the scaffold language. Exceptions that stay as-is: code/commands, identifiers,
+log excerpts, and `result.json` (its keys and status values are machine-read
+and stay English; the `title` and case `name` fields follow the user's
+language).
+
+## report.md sections
+
+Default report shape:
+
+| Section          | Content                                                                                      |
+| ---------------- | -------------------------------------------------------------------------------------------- |
+| **Scope**        | What changed / what is being verified; branch, commit, date, surface, entry URL/page, focus  |
+| **Cases**        | Compact table: `# \| Case \| Result \| Key observation \| Evidence`                          |
+| **Verdict**      | Overall verdict first (`pass` / `partial` / `fail`), then the concise reasons and follow-ups |
+| **Verification** | Commands or automated checks run in this session, with trimmed results                       |
+| **Score**        | Pass/fail/blocked counts, optional 0–100 score                                               |
+
+The case table is the main reading surface. Prefer one clear row per user
+scenario or regression assertion, and put the screenshot/GIF directly in the
+`Evidence` cell:
+
+```markdown
+| #   | Case                     | Result | Key observation                                                   | Evidence                                         |
+| --- | ------------------------ | ------ | ----------------------------------------------------------------- | ------------------------------------------------ |
+| 1   | Create a new page        | pass   | Title and body persisted after refresh                            | ![created page](assets/new-page-created.png)     |
+| 2   | Respect requested length | fail   | Requested about 600 Chinese characters; final body was about 1286 | ![final article](assets/write-article-final.png) |
+```
+
+## Inline visual evidence
+
+Screenshots and GIFs must be embedded so the report shows the image inline:
+
+```markdown
+![case 1 result](assets/case1-result.png)
+![streaming response](assets/case2-streaming.gif)
+```
+
+Do **not** use these as the primary evidence for UI cases:
+
+```markdown
+[case 1 result](assets/case1-result.png)
+assets/case1-result.png
+file:///tmp/case1-result.png
+```
+
+Links are acceptable for non-visual artifacts such as CLI transcripts, HAR
+files, or long logs. For videos, embed a representative screenshot/GIF inline in
+the case row and link the full video as supplemental evidence.
+
+Avoid the old wide table with separate `steps`, `expected`, and `actual`
+columns unless the test is purely non-visual and truly needs that breakdown.
+For UI reports, those columns make screenshot-backed reading harder. Put
+procedural detail in the row's key observation only when it changes the
+interpretation of the result.
+
+Use an extra evidence/detail section only when the inline table cannot carry
+the material cleanly, such as long CLI transcripts, HAR summaries, or multiple
+screenshots for one case. In that situation, keep the table evidence cell as an
+inline visual proof for UI cases or a concise link for non-visual artifacts,
+then put the longer material under `Verification` or a brief
+`Additional Evidence` section.
+
+Status values: `pass` / `fail` / `blocked` (couldn't run — e.g. auth or env
+missing; a blocked case is not a pass).
+
+## result.json schema
+
+```json
+{
+  "branch": "feat/task-tree",
+  "cases": [
+    {
+      "id": "1",
+      "name": "task tree returns nested children",
+      "surface": "cli",
+      "status": "pass",
+      "evidence": ["assets/task-tree.txt"]
+    }
+  ],
+  "commit": "abc1234",
+  "createdAt": "2026-06-11T15:30:00+08:00",
+  "summary": {
+    "total": 1,
+    "passed": 1,
+    "failed": 0,
+    "blocked": 0,
+    "score": 100,
+    "verdict": "pass"
+  },
+  "surfaces": ["cli"],
+  "title": "Verify task tree API"
+}
+```
+
+`score` is optional: use it when the verdict has a subjective component (UI
+polish, copy quality) and omit it for purely binary runs. `verdict` is the word
+the user reads first: `pass`, `fail`, or `partial` (never the scaffold's `pending`).
+
+## Rules
+
+- **No evidence, no claim** — every `pass`/`fail` in the case table must link
+  at least one asset. UI cases must inline-embed their primary screenshot/GIF;
+  non-visual CLI/network cases may link transcripts, HAR files, or logs.
+- **Screenshots must be visually verified** with the Read tool before being
+  cited.
+- **Report failures faithfully** — a failing case with clear evidence is a good
+  report; a vague green one is not.
+- If coverage was cut (cases skipped, surfaces not exercised), say so in the
+  Verdict section — silent truncation reads as "covered everything".
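The inline-evidence rule in the report reference above can be spot-checked mechanically. The following is a minimal sketch, not part of this change: `rows_missing_inline_evidence` is a hypothetical helper that reads a report on stdin and prints the case number of each table row whose last cell contains no Markdown image. It assumes the five-column case table shown in the reference and no escaped pipes inside data rows. Non-visual CLI or network rows may be flagged even though they are allowed to link instead of embed, so treat the output as a review list rather than a failure list.

```bash
#!/usr/bin/env bash
# Hypothetical helper (assumption, not shipped with the skill): list case rows
# whose Evidence cell has no inline image of the form ![alt](path).
rows_missing_inline_evidence() {
  awk -F'|' '
    # Data rows look like "| <number> | ... |"; header and separator rows are skipped.
    $2 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      evidence = $(NF - 1)          # last real cell, before the trailing pipe
      if (index(evidence, "![") == 0 || index(evidence, "](") == 0) {
        gsub(/[ \t]/, "", $2)
        print $2
      }
    }
  '
}
```

A possible invocation, assuming `$DIR` was captured from `report-init.sh`: `rows_missing_inline_evidence < "$DIR/report.md"`.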
@@ -11,7 +11,7 @@
 //   6. ROLLBACKS — msgN / childN / role drops in the active-topic timeline
 //
 // Usage:
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts <dump.json>

 import { readFileSync } from 'node:fs';

@@ -5,16 +5,16 @@
 // streaming-replay test fixtures.
 //
 // Commands:
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts install
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts install
 //       Bundle probe-events.ts and inject into the CDP-attached browser.
 //       Re-installing clears all buffers and re-patches WebSocket / fetch.
 //
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts dump [name]
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts dump [name]
 //       Stop the timeline timer, fetch the capture as JSON, write it to
 //       `.agent-gateway/<name>-<YYYYMMDD-HHmmss>.json`. `name` defaults to
 //       `dump`. Prints the absolute path written.
 //
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts analyze [path]
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts analyze [path]
 //       Run analyze-events.ts on the dump. `path` defaults to the most
 //       recently modified file in `.agent-gateway/`.
 //
@@ -28,7 +28,7 @@ import path from 'node:path';
 import { fileURLToPath } from 'node:url';

 const SCRIPT_DIR = path.dirname(fileURLToPath(import.meta.url));
-// .agents/skills/local-testing/scripts/agent-gateway/ → 5 levels up
+// .agents/skills/agent-testing/scripts/agent-gateway/ → 5 levels up
 const PROJECT_ROOT = path.resolve(SCRIPT_DIR, '../../../../..');
 const DUMP_DIR = path.join(PROJECT_ROOT, '.agent-gateway');

@@ -0,0 +1,95 @@
+#!/usr/bin/env bash
+# app-probe.sh — standardized probes for a running LobeHub app (Electron via
+# CDP, or a web agent-browser session). Use these instead of hand-rolling
+# `window.__LOBE_STORES` eval snippets — especially the auth check.
+#
+# Usage:
+#   app-probe.sh auth              # { ok, isSignedIn, userId } from the user store
+#   app-probe.sh route             # current SPA route
+#   app-probe.sh ops               # running chat operations (type / startTime) plus running/total counts
+#   app-probe.sh goto <path>       # navigate the SPA to a route (full reload), e.g. goto /agent/agt_xxx
+#   app-probe.sh errors-install    # install a console.error interceptor
+#   app-probe.sh errors            # dump errors captured since errors-install
+#
+# Target selection (default: Electron over CDP 9222):
+#   AB_TARGET="--cdp 9222"             # Electron (default; CDP_PORT also honored)
+#   AB_TARGET="--session lobehub-dev"  # web agent-browser session
+#
+# Common routes (desktop SPA): /  /agent/<agentId>  /agent/<agentId>/<topicId>
+#   /task  /task/<taskId>  /page  /settings  /community
+
+set -euo pipefail
+
+AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
+
+run_eval() {
+  # shellcheck disable=SC2086
+  agent-browser $AB_TARGET eval --stdin
+}
+
+case "${1:-}" in
+  auth)
+    run_eval << 'EVALEOF'
+(function () {
+  var stores = window.__LOBE_STORES;
+  if (!stores || !stores.user) return JSON.stringify({ ok: false, reason: 'no user store — app not loaded yet?' });
+  var u = stores.user();
+  return JSON.stringify({ ok: !!u.isSignedIn, isSignedIn: !!u.isSignedIn, userId: (u.user && u.user.id) || null });
+})()
+EVALEOF
+    ;;
+  route)
+    run_eval << 'EVALEOF'
+location.pathname + location.search + location.hash
+EVALEOF
+    ;;
+  ops)
+    run_eval << 'EVALEOF'
+(function () {
+  var stores = window.__LOBE_STORES;
+  if (!stores || !stores.chat) return JSON.stringify({ ok: false, reason: 'no chat store — open a conversation first' });
+  var ops = Object.values(stores.chat().operations || {});
+  var running = ops.filter(function (o) { return o.status === 'running'; });
+  return JSON.stringify({
+    ok: true,
+    running: running.map(function (o) { return { startTime: o.metadata && o.metadata.startTime, type: o.type }; }),
+    runningCount: running.length,
+    total: ops.length,
+  });
+})()
+EVALEOF
+    ;;
+  goto)
+    TARGET_PATH="${2:?Usage: app-probe.sh goto <path>}"
+    # shellcheck disable=SC2086
+    agent-browser $AB_TARGET eval "location.href = '$TARGET_PATH'" > /dev/null
+    sleep 2
+    bash "${BASH_SOURCE[0]}" route
+    ;;
+  errors-install)
+    run_eval << 'EVALEOF'
+(function () {
+  window.__CAPTURED_ERRORS = [];
+  var orig = console.error;
+  console.error = function () {
+    var msg = Array.from(arguments).map(function (a) {
+      if (a instanceof Error) return a.message;
+      return typeof a === 'object' ? JSON.stringify(a) : String(a);
+    }).join(' ');
+    window.__CAPTURED_ERRORS.push(msg);
+    orig.apply(console, arguments);
+  };
+  return 'installed';
+})()
+EVALEOF
+    ;;
+  errors)
+    run_eval << 'EVALEOF'
+JSON.stringify(window.__CAPTURED_ERRORS || 'interceptor not installed — run errors-install first')
+EVALEOF
+    ;;
+  *)
+    echo "Usage: $0 {auth|route|ops|goto <path>|errors-install|errors}" >&2
+    exit 2
+    ;;
+esac
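The target selection in `app-probe.sh` rests on one nested parameter expansion. The sketch below isolates it so the precedence is easy to see: an explicit `AB_TARGET` wins, otherwise `CDP_PORT` is used, otherwise port 9222. The wrapper name `resolve_ab_target` is hypothetical and exists only for illustration; the expansion itself is copied from the script.

```bash
#!/usr/bin/env bash
# Illustration only: resolve_ab_target is a hypothetical wrapper around the
# expansion used by app-probe.sh and record-gif.sh.
resolve_ab_target() {
  # ${VAR:-default} falls back when VAR is unset or empty; the default here
  # itself contains a second fallback for the CDP port.
  printf '%s\n' "${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
}
```

With nothing set this prints `--cdp 9222`; with `CDP_PORT=9333` it prints `--cdp 9333`; with `AB_TARGET="--session lobehub-dev"` the CDP settings are ignored entirely.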
@@ -0,0 +1,407 @@
+#!/usr/bin/env bash
+# init-dev-env.sh — self-contained local dev env for agent testing.
+#
+# This script initializes the env needed to run LobeHub's normal local dev
+# server without depending on a root .env file. It follows the same shape as
+# the e2e bootstrap (Postgres + migrations + auth/key-vault/S3 test env), but
+# starts the repo's dev server, not the standalone e2e server.
+#
+# Guardrail: if repo-root .env exists, every non-help command exits immediately.
+# Existing local config always wins.
+#
+# Usage:
+#   init-dev-env.sh env              # print shell exports
+#   init-dev-env.sh write [file]     # write a source-able env file
+#   init-dev-env.sh setup-db         # start local Postgres and run migrations
+#   init-dev-env.sh migrate          # run DB migrations against the configured DB
+#   init-dev-env.sh seed-user        # seed the baseline test user + CLI API key
+#   init-dev-env.sh qstash           # run local Upstash QStash dev server
+#   init-dev-env.sh dev-next         # exec `pnpm run dev:next` with this env
+#   init-dev-env.sh dev              # exec `bun run dev` with this env
+#   init-dev-env.sh clean-db         # remove the managed Postgres container
+#   init-dev-env.sh status           # (default) print resolved env and Docker/Postgres state
+# Overrides:
+#   SERVER_PORT=3010 DB_PORT=5433 DB_CONTAINER=lobehub-agent-testing-postgres QSTASH_DEV_PORT=8080
+
+set -euo pipefail
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
+ROOT_ENV_FILE="$REPO_ROOT/.env"
+
+SERVER_PORT="${SERVER_PORT:-3010}"
+DB_PORT="${DB_PORT:-5433}"
+DB_CONTAINER="${DB_CONTAINER:-lobehub-agent-testing-postgres}"
+DATABASE_URL="${DATABASE_URL:-postgresql://postgres:postgres@localhost:${DB_PORT}/postgres}"
+ENV_FILE_DEFAULT="$REPO_ROOT/.records/env/agent-testing-dev.env"
+CLI_ENV_FILE_DEFAULT="$REPO_ROOT/.records/env/agent-testing-cli.env"
+AGENT_TESTING_API_KEY="${AGENT_TESTING_API_KEY:-sk-lh-agenttesting0001}"
+QSTASH_DEV_PORT="${QSTASH_DEV_PORT:-8080}"
+QSTASH_LOCAL_TOKEN="${QSTASH_LOCAL_TOKEN:-eyJVc2VySUQiOiJkZWZhdWx0VXNlciIsIlBhc3N3b3JkIjoiZGVmYXVsdFBhc3N3b3JkIn0=}"
+QSTASH_LOCAL_CURRENT_SIGNING_KEY="${QSTASH_LOCAL_CURRENT_SIGNING_KEY:-sig_7kYjw48mhY7kAjqNGcy6cr29RJ6r}"
+QSTASH_LOCAL_NEXT_SIGNING_KEY="${QSTASH_LOCAL_NEXT_SIGNING_KEY:-sig_5ZB6DVzB1wjE8S6rZ7eenA8Pdnhs}"
+
+ok() { printf '  \033[32m✔\033[0m %s\n' "$1"; }
+bad() { printf '  \033[31m✘\033[0m %s\n' "$1"; }
+note() { printf '      %s\n' "$1"; }
+
+guard_no_root_env() {
+  if [[ -f "$ROOT_ENV_FILE" ]]; then
+    bad "root .env exists: $ROOT_ENV_FILE"
+    note "Use the existing local configuration instead of init-dev-env.sh."
+    note "Start normally from repo root, e.g. pnpm run dev:next or bun run dev."
+    exit 1
+  fi
+}
+
+apply_env() {
+  export APP_URL="${APP_URL:-http://localhost:${SERVER_PORT}}"
+  export AUTH_EMAIL_VERIFICATION="${AUTH_EMAIL_VERIFICATION:-0}"
+  export AUTH_SECRET="${AUTH_SECRET:-agent-testing-local-auth-secret-32chars}"
+  export DATABASE_DRIVER="${DATABASE_DRIVER:-node}"
+  export DATABASE_URL
+  export FEATURE_FLAGS="${FEATURE_FLAGS:--agent_self_iteration}"
+  export KEY_VAULTS_SECRET="${KEY_VAULTS_SECRET:-r2gbBPKyJ8ZRKCLKt+I3DImfcL+wGxaQyRC56xtm9Uk=}"
+  export NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION="${NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION:-0}"
+  export NODE_OPTIONS="${NODE_OPTIONS:---max-old-space-size=6144}"
+  export PORT="${PORT:-$SERVER_PORT}"
+  export QSTASH_CURRENT_SIGNING_KEY="${QSTASH_CURRENT_SIGNING_KEY:-$QSTASH_LOCAL_CURRENT_SIGNING_KEY}"
+  export QSTASH_DEV_PORT
+  export QSTASH_NEXT_SIGNING_KEY="${QSTASH_NEXT_SIGNING_KEY:-$QSTASH_LOCAL_NEXT_SIGNING_KEY}"
+  export QSTASH_TOKEN="${QSTASH_TOKEN:-$QSTASH_LOCAL_TOKEN}"
+  export QSTASH_URL="${QSTASH_URL:-http://127.0.0.1:${QSTASH_DEV_PORT}}"
+  export S3_ACCESS_KEY_ID="${S3_ACCESS_KEY_ID:-agent-testing-access-key}"
+  export S3_BUCKET="${S3_BUCKET:-agent-testing-bucket}"
+  export S3_ENDPOINT="${S3_ENDPOINT:-https://agent-testing-s3.localhost}"
+  export S3_SECRET_ACCESS_KEY="${S3_SECRET_ACCESS_KEY:-agent-testing-secret-key}"
+}
+
+env_keys() {
+  printf '%s\n' \
+    APP_URL \
+    AUTH_EMAIL_VERIFICATION \
+    AUTH_SECRET \
+    DATABASE_DRIVER \
+    DATABASE_URL \
+    FEATURE_FLAGS \
+    KEY_VAULTS_SECRET \
+    NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION \
+    NODE_OPTIONS \
+    PORT \
+    QSTASH_CURRENT_SIGNING_KEY \
+    QSTASH_DEV_PORT \
+    QSTASH_NEXT_SIGNING_KEY \
+    QSTASH_TOKEN \
+    QSTASH_URL \
+    S3_ACCESS_KEY_ID \
+    S3_BUCKET \
+    S3_ENDPOINT \
+    S3_SECRET_ACCESS_KEY
+}
+
+print_env() {
+  apply_env
+  while IFS= read -r key; do
+    printf 'export %s=%q\n' "$key" "${!key}"
+  done < <(env_keys)
+}
+
+write_env() {
+  local file="${1:-$ENV_FILE_DEFAULT}"
+  apply_env
+  mkdir -p "$(dirname "$file")"
+  {
+    printf '# Source this file before starting LobeHub local dev server.\n'
+    printf '# Generated by %s\n' "$0"
+    while IFS= read -r key; do
+      printf 'export %s=%q\n' "$key" "${!key}"
+    done < <(env_keys)
+  } > "$file"
+  ok "wrote env file: $file"
+  note "source it with: source $file"
+}
+
+require_docker() {
+  if ! command -v docker > /dev/null 2>&1; then
+    bad "docker CLI is not available"
+    note "Install/start Docker Desktop, or provide DATABASE_URL for an existing Postgres."
+    return 1
+  fi
+}
+
+wait_for_db() {
+  printf '      waiting for Postgres'
+  until docker exec "$DB_CONTAINER" pg_isready -U postgres > /dev/null 2>&1; do
+    printf '.'
+    sleep 2
+  done
+  printf '\n'
+}
+
+start_db() {
+  require_docker
+
+  if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
+    ok "Postgres container already running: $DB_CONTAINER"
+  elif docker ps -a --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
+    docker start "$DB_CONTAINER" > /dev/null
+    ok "started existing Postgres container: $DB_CONTAINER"
+  else
+    docker run -d \
+      --name "$DB_CONTAINER" \
+      -e POSTGRES_PASSWORD=postgres \
+      -p "${DB_PORT}:5432" \
+      paradedb/paradedb:latest > /dev/null
+    ok "created Postgres container: $DB_CONTAINER"
+  fi
+
+  wait_for_db
+}
+
+migrate_db() {
+  apply_env
+  cd "$REPO_ROOT"
+  bun run db:migrate
+}
+
+seed_user() {
+  apply_env
+  export AGENT_TESTING_API_KEY
+  export AGENT_TESTING_CLI_ENV_FILE="${AGENT_TESTING_CLI_ENV_FILE:-$CLI_ENV_FILE_DEFAULT}"
+  cd "$REPO_ROOT"
+  node <<'NODE'
+const bcrypt = require('bcryptjs');
+const crypto = require('node:crypto');
+const fs = require('node:fs');
+const path = require('node:path');
+const pg = require('pg');
+
+const databaseUrl = process.env.DATABASE_URL;
+if (!databaseUrl) {
+  throw new Error('DATABASE_URL is required to seed the baseline test user.');
+}
+
+const TEST_USER = {
+  email: 'agent-testing@lobehub.com',
+  fullName: 'Agent Testing User',
+  id: 'user_agent_testing_001',
+  password: 'TestPassword123!',
+  username: 'agent_testing_user',
+};
+
+const TEST_API_KEY = {
+  id: 'api_key_agent_testing_001',
+  key: process.env.AGENT_TESTING_API_KEY || 'sk-lh-agenttesting0001',
+  name: 'Agent Testing CLI API Key',
+};
+
+const validateApiKeyFormat = (apiKey) => /^sk-lh-[\da-z]{16}$/.test(apiKey);
+
+const hashApiKey = (apiKey) => {
+  const secret = process.env.KEY_VAULTS_SECRET;
+  if (!secret) throw new Error('KEY_VAULTS_SECRET is required to seed the baseline API key.');
+
+  return crypto.createHmac('sha256', secret).update(apiKey).digest('hex');
+};
+
+const encryptWithKeyVaultsSecret = (plaintext) => {
+  const secret = process.env.KEY_VAULTS_SECRET;
+  if (!secret) throw new Error('KEY_VAULTS_SECRET is required to seed the baseline API key.');
+
+  const rawKey = Buffer.from(secret, 'base64');
+  if (![16, 24, 32].includes(rawKey.length)) {
+    throw new Error(
+      `KEY_VAULTS_SECRET must decode to 16, 24, or 32 bytes, got ${rawKey.length} bytes.`,
+    );
+  }
+
+  const iv = crypto.randomBytes(12);
+  const cipher = crypto.createCipheriv(`aes-${rawKey.length * 8}-gcm`, rawKey, iv);
+  const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
+  const authTag = cipher.getAuthTag();
+
+  return `${iv.toString('hex')}:${authTag.toString('hex')}:${encrypted.toString('hex')}`;
+};
+
+const writeCliEnvFile = () => {
+  const file = process.env.AGENT_TESTING_CLI_ENV_FILE || '.records/env/agent-testing-cli.env';
+  fs.mkdirSync(path.dirname(file), { recursive: true });
+  fs.writeFileSync(
+    file,
+    [
+      '# Source this file before running LobeHub CLI agent tests.',
+      '# Generated by init-dev-env.sh seed-user',
+      `export LOBE_API_KEY=${TEST_API_KEY.key}`,
+      `export LOBEHUB_CLI_API_KEY="${'${LOBE_API_KEY}'}"`,
+      `export LOBEHUB_SERVER=${process.env.APP_URL}`,
+      'export LOBEHUB_CLI_HOME=.lobehub-dev',
+      '',
+    ].join('\n'),
+  );
+
+  return file;
+};
+
+const client = new pg.Client({ connectionString: databaseUrl });
+
+(async () => {
+  if (!validateApiKeyFormat(TEST_API_KEY.key)) {
+    throw new Error(`Invalid AGENT_TESTING_API_KEY format: ${TEST_API_KEY.key}`);
+  }
+
+  await client.connect();
+  const now = new Date().toISOString();
+  const onboarding = JSON.stringify({ finishedAt: now, version: 1 });
+  const passwordHash = await bcrypt.hash(TEST_USER.password, 10);
+  const encryptedApiKey = encryptWithKeyVaultsSecret(TEST_API_KEY.key);
+  const apiKeyHash = hashApiKey(TEST_API_KEY.key);
+
+  await client.query(
+    `INSERT INTO users (id, email, normalized_email, username, full_name, email_verified, onboarding, created_at, updated_at, last_active_at)
+     VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $8, $8)
+     ON CONFLICT (id) DO UPDATE SET onboarding = $7, updated_at = $8`,
+    [
+      TEST_USER.id,
+      TEST_USER.email,
+      TEST_USER.email.toLowerCase(),
+      TEST_USER.username,
+      TEST_USER.fullName,
+      true,
+      onboarding,
+      now,
+    ],
+  );
+
+  await client.query(
+    `INSERT INTO accounts (id, user_id, account_id, provider_id, password, created_at, updated_at)
+     VALUES ($1, $2, $3, $4, $5, $6, $6)
+     ON CONFLICT DO NOTHING`,
+    [
+      'agent_testing_account_001',
+      TEST_USER.id,
+      TEST_USER.email,
+      'credential',
+      passwordHash,
+      now,
+    ],
+  );
+
+  await client.query(
+    `INSERT INTO api_keys (id, name, key, key_hash, enabled, expires_at, user_id, workspace_id, created_at, updated_at)
+     VALUES ($1, $2, $3, $4, $5, NULL, $6, NULL, $7, $7)
+     ON CONFLICT (id) DO UPDATE
+     SET name = EXCLUDED.name,
+         key = EXCLUDED.key,
+         key_hash = EXCLUDED.key_hash,
+         enabled = EXCLUDED.enabled,
+         expires_at = NULL,
+         updated_at = EXCLUDED.updated_at`,
+    [
+      TEST_API_KEY.id,
+      TEST_API_KEY.name,
+      encryptedApiKey,
+      apiKeyHash,
+      true,
+      TEST_USER.id,
+      now,
+    ],
+  );
+
+  const cliEnvFile = writeCliEnvFile();
+
+  console.log('seeded baseline user:');
+  console.log(`  email: ${TEST_USER.email}`);
+  console.log(`  password: ${TEST_USER.password}`);
+  console.log('seeded baseline API key:');
+  console.log(`  LOBE_API_KEY: ${TEST_API_KEY.key}`);
+  console.log(`  CLI env: ${cliEnvFile}`);
+})()
+  .finally(() => client.end())
+  .catch((error) => {
+    console.error(error);
+    process.exit(1);
+  });
+NODE
+}
+
+cmd_status() {
+  apply_env
+  echo "agent-testing local dev env:"
+  note "APP_URL=$APP_URL"
+  note "DATABASE_URL=$DATABASE_URL"
+  note "PORT=$PORT"
+  note "QSTASH_URL=$QSTASH_URL"
+  if command -v docker > /dev/null 2>&1; then
+    ok "docker CLI available"
+    if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
+      ok "managed Postgres running: $DB_CONTAINER"
+    else
+      note "managed Postgres is not running: $DB_CONTAINER"
+    fi
+  else
+    bad "docker CLI is not available"
+  fi
+}
+
+cmd_qstash() {
+  apply_env
+  cd "$REPO_ROOT"
+  note "starting local QStash dev server at $QSTASH_URL"
+  note "keep this process running while testing workflow paths"
+  exec pnpm run qstash -- -port "$QSTASH_DEV_PORT"
+}
+
+cmd_dev_next() {
+  apply_env
+  cd "$REPO_ROOT"
+  exec pnpm run dev:next
+}
+
+cmd_dev() {
+  apply_env
+  cd "$REPO_ROOT"
+  exec bun run dev
+}
+
+cmd_clean_db() {
+  require_docker
+  if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
+    docker stop "$DB_CONTAINER" > /dev/null
+  fi
+  if docker ps -a --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
+    docker rm "$DB_CONTAINER" > /dev/null
+    ok "removed Postgres container: $DB_CONTAINER"
+  else
+    note "Postgres container not found: $DB_CONTAINER"
+  fi
+}
+
+usage() {
+  sed -n '3,24p' "$0" >&2
+}
+
+COMMAND="${1:-status}"
+
+case "$COMMAND" in
+  help|-h|--help) usage; exit 0 ;;
+  *) guard_no_root_env ;;
+esac
+
+case "$COMMAND" in
+  env) print_env ;;
+  write) shift; write_env "${1:-}" ;;
+  setup-db)
+    start_db
+    migrate_db
+    ;;
+  migrate) migrate_db ;;
+  seed-user) seed_user ;;
+  qstash) cmd_qstash ;;
+  dev-next) cmd_dev_next ;;
+  dev) cmd_dev ;;
+  clean-db) cmd_clean_db ;;
+  status) cmd_status ;;
+  *)
+    usage
+    exit 2
+    ;;
+esac
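`seed-user` rejects an `AGENT_TESTING_API_KEY` that does not match `^sk-lh-[\da-z]{16}$`, but only after Node has started. A shell-side pre-check can fail earlier. This is a sketch under that assumption: `is_valid_api_key` is a hypothetical helper, and the pattern is a direct ERE translation of the regex in the seeding script.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check (not in the PR): mirror validateApiKeyFormat in shell.
is_valid_api_key() {
  # "sk-lh-" followed by exactly 16 lowercase letters or digits.
  printf '%s\n' "$1" | grep -Eq '^sk-lh-[0-9a-z]{16}$'
}
```

For example, `is_valid_api_key "$AGENT_TESTING_API_KEY" || echo "bad key format" >&2` could run before `init-dev-env.sh seed-user` when the default key is overridden.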
@@ -0,0 +1,61 @@
+#!/usr/bin/env bash
+# record-gif.sh — capture a frame sequence via agent-browser (CDP) and
+# synthesize a GIF for embedding in a test report.
+#
+# Use this whenever the asserted behavior is about CHANGE OVER TIME —
+# streaming output, a ticking timer, loading states, animations. A static
+# screenshot cannot prove those; a GIF can. Cloud-portable: frames come from
+# CDP rendering, no OS-level screen capture.
+#
+# Usage:
+#   record-gif.sh <output.gif> <duration_seconds> [fps]
+#
+#   AB_TARGET="--cdp 9222"             # Electron (default; CDP_PORT honored)
+#   AB_TARGET="--session lobehub-dev"  # web agent-browser session
+#   GIF_WIDTH=960                      # output width (px), default 960
+#
+# Requires ffmpeg (`brew install ffmpeg`) and python3 (used to compute the frame
+# interval). Screenshot latency (~0.3-0.5s per frame) caps real fps at about 1-2.
+#
+# Example — record a 12s run and embed it in the report:
+#   ./record-gif.sh "$DIR/assets/case2-tray-running.gif" 12 2 &
+#   GIF_PID=$!
+#   # ... trigger the streaming behavior ...
+#   wait $GIF_PID
+
+set -euo pipefail
+
+OUT="${1:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
+DUR="${2:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
+FPS="${3:-2}"
+AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
+GIF_WIDTH="${GIF_WIDTH:-960}"
+
+command -v ffmpeg > /dev/null || {
+  echo "ffmpeg not found — install with: brew install ffmpeg" >&2
+  exit 1
+}
+
+TMP=$(mktemp -d)
+trap 'rm -rf "$TMP"' EXIT
+
+FRAMES=$((DUR * FPS))
+INTERVAL=$(python3 -c "print(1 / $FPS)")
+
+for i in $(seq -f '%04g' 1 "$FRAMES"); do
+  # shellcheck disable=SC2086
+  agent-browser $AB_TARGET screenshot "$TMP/frame-$i.png" > /dev/null 2>&1 || true
+  sleep "$INTERVAL"
+done
+
+CAPTURED=$(find "$TMP" -name 'frame-*.png' | wc -l | tr -d ' ')
+[ "$CAPTURED" -gt 0 ] || {
+  echo "no frames captured — is the app reachable via $AB_TARGET?" >&2
+  exit 1
+}
+
+ffmpeg -y -loglevel error -framerate "$FPS" -pattern_type glob -i "$TMP/frame-*.png" \
+  -vf "fps=$FPS,scale=$GIF_WIDTH:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
+  "$OUT"
+
+echo "$OUT ($CAPTURED frames @ ${FPS}fps)"
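The frame budget in `record-gif.sh` is simple arithmetic: `duration * fps` frames, with `1 / fps` seconds of sleep between captures. The sketch below reproduces that calculation with `awk` instead of `python3`, as one possible way to drop the extra dependency. `frame_plan` is a hypothetical name, and this is an alternative, not what the script does.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to the python3 call: compute the frame count and
# the per-frame sleep interval in one awk invocation.
frame_plan() {
  # $1 = duration in seconds, $2 = frames per second
  # Prints "<frames> <interval_seconds>".
  awk -v dur="$1" -v fps="$2" 'BEGIN { printf "%d %.3f\n", dur * fps, 1 / fps }'
}
```

`frame_plan 12 2` prints `24 0.500`, matching the 12-second, 2 fps example in the header comment. Because each screenshot also takes time, the real recording runs longer than the nominal duration.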
@@ -0,0 +1,88 @@
+#!/usr/bin/env bash
+# report-init.sh — scaffold a structured test report under .records/reports/.
+#
+# Format spec and evidence rules: ../references/report.md
+#
+# Usage:
+#   report-init.sh <slug> [title]
+#
+# Prints the report directory path (capture it: DIR=$(report-init.sh my-test)).
+
+set -euo pipefail
+
+SLUG="${1:?Usage: report-init.sh <slug> [title]}"
+TITLE="${2:-$SLUG}"
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
+TS="$(date +%Y%m%d-%H%M%S)"
+DIR="$REPO_ROOT/.records/reports/$TS-$SLUG"
+mkdir -p "$DIR/assets"
+
+BRANCH=$(git -C "$REPO_ROOT" branch --show-current 2> /dev/null || true); BRANCH="${BRANCH:-unknown}" # empty on detached HEAD
+COMMIT=$(git -C "$REPO_ROOT" rev-parse --short HEAD 2> /dev/null || echo "unknown")
+DATE_HUMAN=$(date '+%Y-%m-%d %H:%M')
+DATE_ISO=$(date '+%Y-%m-%dT%H:%M:%S%z')
+
+cat > "$DIR/report.md" << EOF
+# 测试报告：$TITLE
+
+## 范围
+
+<!-- 测试目标 / 变更范围 / 重点风险 -->
+
+- 分支：\`$BRANCH\`
+- 当前提交：\`$COMMIT\`
+- 日期：$DATE_HUMAN
+- 表面：<!-- CLI / Electron + CDP / Web / Bot:<platform> -->
+- 测试页 / 入口：<!-- e.g. /settings or http://localhost:3010 -->
+- 重点：<!-- 本轮最关心的体验、功能或回归点 -->
+
+## 用例
+
+| # | 用例 | 结果 | 关键现象 | 证据 |
+| - | ---- | ---- | -------- | ---- |
+| 1 |      | 待测 |          | ![用例 1](assets/case1.png) |
+
+## 结论
+
+整体结论：\`pending\`。
+
+<!-- 用 1-2 段概括用户最需要知道的结果；失败和阻塞必须明确说明影响。 -->
+
+仍需处理 / 跟进：
+
+- <!-- TODO -->
+
+## 本轮验证
+
+<!-- 如有自动化或命令行验证，保留精简命令与结果；没有则写“未运行额外自动化验证”。 -->
+
+\`\`\`bash
+# command
+\`\`\`
+
+结果：
+
+- <!-- TODO -->
+
+## 评分
+
+- 通过：0
+- 失败：0
+- 阻塞：0
+- 评分：— / 100
+EOF
+
+cat > "$DIR/result.json" << EOF
+{
+  "title": "$TITLE",
+  "createdAt": "$DATE_ISO",
+  "branch": "$BRANCH",
+  "commit": "$COMMIT",
+  "surfaces": [],
+  "cases": [],
+  "summary": { "total": 0, "passed": 0, "failed": 0, "blocked": 0, "verdict": "pending" }
+}
+EOF
+
+echo "$DIR"
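`report-init.sh` interpolates `$TITLE` straight into `result.json`, so a title containing a double quote or backslash would produce invalid JSON. A minimal sketch of an escaping step follows. `json_escape` is a hypothetical helper, and it handles only backslashes and double quotes; newlines and other control characters are assumed not to appear in a title.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the PR): escape a string for use inside a JSON
# string literal. Covers backslash and double quote only.
json_escape() {
  # Backslashes first, so the ones added for quotes are not doubled again.
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}
```

In the heredoc, `"title": "$(json_escape "$TITLE")"` would be one way to apply it.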
@@ -0,0 +1,553 @@
+#!/usr/bin/env bash
+# setup-auth.sh — one-stop auth setup & check for local agent testing.
+#
+# Auth is the gate for all automated testing: prepare it BEFORE writing any
+# test step. Background and failure modes: ../references/auth.md
+#
+# Usage:
+#   setup-auth.sh status        # check server + CLI + web + Electron readiness
+#   setup-auth.sh status --surface web  # check only the Web surface gate
+#   setup-auth.sh cli-seed      # configure CLI API-key auth from seeded local env
+#   setup-auth.sh cli           # interactive CLI device-code login (run by a human)
+#   setup-auth.sh open-chrome   # open SERVER_URL in Chrome and show DevTools
+#   setup-auth.sh web-seed      # sign in seeded user and inject cookies automatically
+#   setup-auth.sh web           # stdin = Cookie header -> inject into agent-browser session
+#   setup-auth.sh web-verify    # live-check the agent-browser session is authenticated
+#
+# Env:
+#   SERVER_URL  (default from test-env.sh)        dev server under test
+#   SESSION     (default lobehub-dev)             agent-browser session name
+#   AUTH_DIR    (default ~/.lobehub-agent-testing) where web state is persisted
+#   SEED_EMAIL / SEED_PASSWORD                    seeded better-auth login
+
+set -euo pipefail
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
+
+workspace_root_for_port() {
+  local root="$REPO_ROOT"
+  local name
+  name="$(basename "$root")"
+
+  if [[ "$name" == "lobehub" ]]; then
+    local parent
+    parent="$(cd "$root/.." && pwd)"
+    local parent_name
+    parent_name="$(basename "$parent")"
+    if [[ "$parent_name" == lobehub-cloud* ]]; then
+      root="$parent"
+    fi
+  fi
+
+  printf '%s\n' "$root"
+}
+
+default_server_url() {
+  local env_resolver resolved
+  env_resolver="$(dirname "${BASH_SOURCE[0]}")/test-env.sh"
+  if [[ -x "$env_resolver" ]]; then
+    resolved="$("$env_resolver" --value SERVER_URL 2> /dev/null || true)"
+    if [[ -n "$resolved" ]]; then
+      printf '%s\n' "$resolved"
+      return 0
+    fi
+  fi
+
+  local root name suffix port
+  root="$(workspace_root_for_port)"
+  name="$(basename "$root")"
+
+  case "$name" in
+    lobehub-cloud)
+      port=3020
+      ;;
+    lobehub-cloud-*)
+      suffix="${name#lobehub-cloud-}"
+      if [[ "$suffix" =~ ^[0-9]+$ ]]; then
+        port=$((3020 + 10#$suffix))
+      else
+        port=3010
+      fi
+      ;;
+    *)
+      port=3010
+      ;;
+  esac
+
+  printf 'http://localhost:%s\n' "$port"
+}
+
+SERVER_URL="${SERVER_URL:-$(default_server_url)}"
+SESSION="${SESSION:-lobehub-dev}"
+AUTH_DIR="${AUTH_DIR:-$HOME/.lobehub-agent-testing}"
+STATE_FILE="$AUTH_DIR/web-state.json"
+CLI_HOME_NAME="${LOBEHUB_CLI_HOME:-.lobehub-dev}"
+CLI_HOME="$HOME/${CLI_HOME_NAME#/}"
+CLI_CREDENTIALS_FILE="$CLI_HOME/credentials.json"
+SEED_EMAIL="${SEED_EMAIL:-agent-testing@lobehub.com}"
+SEED_PASSWORD="${SEED_PASSWORD:-TestPassword123!}"
+SEED_API_KEY="${SEED_API_KEY:-${AGENT_TESTING_API_KEY:-sk-lh-agenttesting0001}}"
+CLI_ENV_FILE="${CLI_ENV_FILE:-$REPO_ROOT/.records/env/agent-testing-cli.env}"
+
+ok()   { printf '  \033[32m✔\033[0m %s\n' "$1"; }
+bad()  { printf '  \033[31m✘\033[0m %s\n' "$1"; }
+note() { printf '      %s\n' "$1"; }
+
+usage() {
+  cat << EOF
+Usage:
+  $0 status [--surface all|cli|web|electron]
+  $0 cli-seed
+  $0 cli
+  $0 open-chrome [--dry-run]
+  $0 web-seed
+  $0 web
+  $0 web-verify
+
+Env:
+  SERVER_URL=$SERVER_URL
+  SESSION=$SESSION
+  AUTH_DIR=$AUTH_DIR
+  SEED_EMAIL=$SEED_EMAIL
+  CLI_HOME=$CLI_HOME
+EOF
+}
+
+check_server() {
+  local code
+  code=$(curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/" 2> /dev/null || true)
+  if [[ "$code" =~ ^[23] ]]; then
+    ok "dev server reachable at $SERVER_URL"
+  else
+    bad "dev server NOT reachable at $SERVER_URL (http_code='$code')"
+    note "start it: pnpm run dev:next  (see references/dev-server.md)"
+    return 1
+  fi
+}
+
+check_cli() {
+  local api_key="${LOBEHUB_CLI_API_KEY:-${LOBE_API_KEY:-}}"
+  if [[ -n "$api_key" ]]; then
+    local body_file code
+    body_file="$(mktemp)"
+    code=$(curl -sS -o "$body_file" -w '%{http_code}' \
+      -H "Authorization: Bearer $api_key" \
+      "$SERVER_URL/api/v1/users/me?includeCount=0" 2> /dev/null || true)
+
+    if [[ "$code" =~ ^[23] ]]; then
+      rm -f "$body_file"
+      ok "CLI API-key auth valid for $SERVER_URL"
+      return 0
+    fi
+
+    bad "CLI API-key auth failed for $SERVER_URL (http_code='$code')"
+    note "seed the local API key first:"
+    note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user"
+    note "source $CLI_ENV_FILE"
+    rm -f "$body_file"
+    return 1
+  fi
+
+  if [[ -f "$CLI_HOME/settings.json" ]] && grep -Fq -- "$SERVER_URL" "$CLI_HOME/settings.json" && [[ -f "$CLI_CREDENTIALS_FILE" ]]; then
+    ok "CLI device-code credentials configured for $SERVER_URL (creds: $CLI_HOME)"
+  else
+    bad "CLI not logged in to $SERVER_URL"
+    note "automated path:"
+    note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user && source $CLI_ENV_FILE && $0 cli-seed"
+    note "interactive fallback:"
+    note "cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server $SERVER_URL"
+    return 1
+  fi
+}
+
+check_web() {
+  if [[ -f "$STATE_FILE" ]]; then
+    ok "web auth state saved ($STATE_FILE)"
+  else
+    bad "no web auth state for agent-browser"
+    note "for the seeded local user, run: $0 web-seed"
+    note "or copy the Cookie header from Chrome DevTools (Network tab), then:"
+    note "pbpaste | $0 web   (see references/auth.md)"
+    return 1
+  fi
+  cmd_web_verify --skip-server-check
+}
+
+check_agent_browser() {
+  if command -v agent-browser > /dev/null 2>&1; then
+    ok "agent-browser available"
+  else
+    bad "agent-browser command not found"
+    note "install or expose agent-browser before Web/Electron UI testing"
+    return 1
+  fi
+}
+
+check_electron() {
+  local cdp_port="${CDP_PORT:-9222}"
+  if ! curl -s -o /dev/null --max-time 2 "http://localhost:$cdp_port/json/version" 2> /dev/null; then
+    note "electron: not running (CDP $cdp_port unreachable) — start with electron-dev.sh; check skipped"
+    return 0
+  fi
+  local probe result
+  probe="$(dirname "${BASH_SOURCE[0]}")/app-probe.sh"
+  result=$(bash "$probe" auth 2> /dev/null || true)
+  # agent-browser eval returns the JSON string with escaped quotes — normalize.
+  result="${result//\\/}"
+  if [[ "$result" == *'"isSignedIn":true'* ]]; then
+    ok "electron app signed in ($result)"
+  else
+    bad "electron app NOT signed in ($result)"
+    note "log in once manually inside the app (state persists across restarts)"
+    return 1
+  fi
+}
+
+cmd_status() {
+  local surface="all"
+  while [[ $# -gt 0 ]]; do
+    case "$1" in
+      --surface)
+        if [[ $# -lt 2 ]]; then
+          echo "--surface requires one of: all, cli, web, electron" >&2
+          return 2
+        fi
+        surface="${2:-}"
+        shift 2
+        ;;
+      --surface=*)
+        surface="${1#*=}"
+        shift
+        ;;
+      all|cli|web|electron)
+        surface="$1"
+        shift
+        ;;
+      -h|--help)
+        usage
+        return 0
+        ;;
+      *)
+        echo "unknown status option: $1" >&2
+        usage >&2
+        return 2
+        ;;
+    esac
+  done
+
+  case "$surface" in
+    all|cli|web|electron) ;;
+    "")
+      echo "--surface requires one of: all, cli, web, electron" >&2
+      return 2
+      ;;
+    *)
+      echo "unknown surface: $surface" >&2
+      usage >&2
+      return 2
+      ;;
+  esac
+
+  echo "agent-testing auth status (surface=$surface, SERVER_URL=$SERVER_URL):"
+  local rc=0
+  case "$surface" in
+    all)
+      check_server || rc=1
+      check_cli || rc=1
+      check_web || rc=1
+      check_electron || rc=1
+      ;;
+    cli)
+      check_server || rc=1
+      check_cli || rc=1
+      ;;
+    web)
+      check_server || rc=1
+      check_web || rc=1
+      ;;
+    electron)
+      check_electron || rc=1
+      ;;
+  esac
+  if [[ $rc -eq 0 ]]; then
+    echo "$surface auth green — safe to start automated testing on this surface."
+  else
+    echo "$surface auth NOT ready — fix the ✘ items before writing any test step."
+  fi
+  return $rc
+}
+
+cmd_cli() {
+  echo "Starting CLI device-code login against $SERVER_URL ..."
+  echo "(opens a browser authorization — must be run by a human in a terminal)"
+  cd "$REPO_ROOT/apps/cli"
+  LOBEHUB_CLI_HOME="$CLI_HOME_NAME" bun src/index.ts login --server "$SERVER_URL"
+}
+
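+write_cli_seed_env() {
+  mkdir -p "$(dirname "$CLI_ENV_FILE")"
+  # %q shell-quotes each value so the file stays source-able even if a value
+  # contains spaces or shell metacharacters.
+  {
+    printf '# Source this file before running LobeHub CLI agent tests.\n'
+    printf '# Generated by setup-auth.sh cli-seed\n'
+    printf 'export LOBE_API_KEY=%q\n' "$SEED_API_KEY"
+    printf 'export LOBEHUB_CLI_API_KEY="${LOBE_API_KEY}"\n'
+    printf 'export LOBEHUB_SERVER=%q\n' "$SERVER_URL"
+    printf 'export LOBEHUB_CLI_HOME=%q\n' "$CLI_HOME_NAME"
+  } > "$CLI_ENV_FILE"
+  chmod 600 "$CLI_ENV_FILE"  # the file holds an API key
+}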
+
+write_cli_settings() {
+  mkdir -p "$CLI_HOME"
+  python3 - "$CLI_HOME/settings.json" "$SERVER_URL" << 'PY'
+import json
+import os
+import sys
+
+path, server_url = sys.argv[1], sys.argv[2]
+os.makedirs(os.path.dirname(path), exist_ok=True)
+with open(path, "w") as f:
+    json.dump({"serverUrl": server_url}, f, indent=2)
+    f.write("\n")
+os.chmod(path, 0o600)
+PY
+}
+
+cmd_cli_seed() {
+  check_server || return 1
+  write_cli_seed_env
+  write_cli_settings
+  ok "wrote CLI seed env: $CLI_ENV_FILE"
+  note "source it before CLI commands: source $CLI_ENV_FILE"
+  note "settings saved at: $CLI_HOME/settings.json"
+  LOBE_API_KEY="$SEED_API_KEY" LOBEHUB_CLI_API_KEY="$SEED_API_KEY" check_cli
+}
+
+cmd_open_chrome() {
+  local mode="${1:-}"
+  if [[ "$mode" != "" && "$mode" != "--dry-run" ]]; then
+    echo "unknown open-chrome option: $mode" >&2
+    usage >&2
+    return 2
+  fi
+
+  if [[ "$mode" == "--dry-run" ]]; then
+    echo "would open Google Chrome at $SERVER_URL/"
+    echo "would press Cmd+Option+I to open DevTools"
+    echo "would open DevTools command menu and run 'Show Network'"
+    return 0
+  fi
+
+  if [[ "$(uname -s)" != "Darwin" ]]; then
+    bad "open-chrome is macOS-only"
+    note "open $SERVER_URL/ in your browser and open DevTools manually"
+    return 1
+  fi
+
+  if ! command -v osascript > /dev/null 2>&1; then
+    bad "osascript not found"
+    note "open $SERVER_URL/ in Chrome and press Cmd+Option+I manually"
+    return 1
+  fi
+
+  SERVER_URL="$SERVER_URL" osascript << 'OSA'
+set targetUrl to (system attribute "SERVER_URL") & "/"
+
+tell application "Google Chrome"
+  activate
+  if (count of windows) = 0 then
+    make new window
+  end if
+  tell front window to make new tab with properties {URL:targetUrl}
+end tell
+
+delay 1
+
+tell application "System Events"
+  tell process "Google Chrome"
+    set frontmost to true
+    keystroke "i" using {command down, option down}
+    delay 1
+    keystroke "p" using {command down, shift down}
+    delay 0.2
+    keystroke "Show Network"
+    key code 36
+  end tell
+end tell
+OSA
+  ok "opened Chrome at $SERVER_URL/ and requested DevTools Network panel"
+}
+
+cookie_header_from_jar() {
+  local jar="$1"
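+  # curl writes Netscape-format cookie jars with tab-separated fields; split on
+  # tabs so empty or space-containing values keep their column position.
+  awk -F '\t' '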
+    BEGIN { first = 1 }
+    /^$/ { next }
+    /^#/ {
+      if ($0 !~ /^#HttpOnly_/) next
+      sub(/^#HttpOnly_/, "")
+    }
+    NF >= 7 {
+      if (!first) printf "; "
+      printf "%s=%s", $6, $7
+      first = 0
+    }
+    END {
+      if (!first) printf "\n"
+    }
+  ' "$jar"
+}
+
+# Build a Playwright storageState file from a raw Cookie header on stdin,
+# keeping only the better-auth cookies. See references/auth.md for why the
+# header must come from a Network request (HttpOnly) and why httpOnly=false.
+cmd_web() {
+  mkdir -p "$AUTH_DIR"
+  local raw
+  raw="$(cat)"
+  COOKIE_INPUT="$raw" python3 - "$STATE_FILE" << 'PY'
+import json, os, sys, time
+
+raw = os.environ.get("COOKIE_INPUT", "").strip()
+cookie_lines = []
+for line in raw.splitlines():
+    stripped = line.strip()
+    if not stripped:
+        continue
+    if stripped.lower().startswith("cookie:"):
+        cookie_lines.append(stripped.split(":", 1)[1].strip())
+    else:
+        cookie_lines.append(stripped)
+
+raw = "; ".join(cookie_lines)
+
+WANTED = {"better-auth.session_token", "better-auth.session_data", "better-auth.state"}
+exp = int(time.time()) + 30 * 24 * 3600  # 30 days
+
+cookies = []
+for pair in raw.split(";"):
+    pair = pair.strip()
+    if "=" not in pair:
+        continue
+    name, _, value = pair.partition("=")
+    if name not in WANTED:
+        continue
+    cookies.append({
+        "name": name,
+        "value": value,
+        "domain": "localhost",
+        "path": "/",
+        "expires": exp,
+        "httpOnly": False,
+        "secure": False,
+        "sameSite": "Lax",
+    })
+
+if not cookies:
+    sys.stderr.write("no better-auth cookies found in input — paste the raw Cookie header from a Network request\n")
+    sys.exit(1)
+
+with open(sys.argv[1], "w") as f:
+    json.dump({"cookies": cookies, "origins": []}, f, indent=2)
+os.chmod(sys.argv[1], 0o600)  # session cookies are credentials; keep the file private
+print(f"wrote {len(cookies)} cookie(s) to {sys.argv[1]}")
+PY
+  cmd_web_verify
+}
+
+cmd_web_seed() {
+  check_server || return 1
+  mkdir -p "$AUTH_DIR"
+
+  local cookie_jar="$AUTH_DIR/web-seed-cookie.jar"
+  local response_body="$AUTH_DIR/web-seed-response.json"
+  local payload code
+  payload="$(
+    SEED_EMAIL="$SEED_EMAIL" SEED_PASSWORD="$SEED_PASSWORD" python3 - << 'PY'
+import json
+import os
+
+print(json.dumps({
+    "callbackURL": "/",
+    "email": os.environ["SEED_EMAIL"],
+    "password": os.environ["SEED_PASSWORD"],
+}))
+PY
+  )"
+
+  code=$(curl -sS -o "$response_body" -w '%{http_code}' \
+    -c "$cookie_jar" \
+    -H 'Content-Type: application/json' \
+    -X POST "$SERVER_URL/api/auth/sign-in/email" \
+    --data "$payload" 2> /dev/null || true)
+
+  if [[ ! "$code" =~ ^[23] ]]; then
+    bad "seed user sign-in failed at $SERVER_URL/api/auth/sign-in/email (http_code='$code')"
+    note "make sure the seed user exists:"
+    note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user"
+    return 1
+  fi
+
+  local cookie_header
+  cookie_header="$(cookie_header_from_jar "$cookie_jar")"
+  if [[ -z "$cookie_header" ]]; then
+    bad "seed sign-in succeeded but no cookies were written to $cookie_jar"
+    return 1
+  fi
+
+  printf '%s\n' "$cookie_header" | cmd_web
+}
+
+cmd_web_verify() {
+  local skip_server_check="${1:-}"
+  if [[ "$skip_server_check" != "--skip-server-check" ]]; then
+    check_server || return 1
+  fi
+  if [[ ! -f "$STATE_FILE" ]]; then
+    bad "no web auth state for agent-browser"
+    note "for the seeded local user, run: $0 web-seed"
+    note "or copy the Cookie header from Chrome DevTools (Network tab), then:"
+    note "pbpaste | $0 web"
+    return 1
+  fi
+  check_agent_browser || return 1
+  if ! agent-browser --session "$SESSION" state load "$STATE_FILE" > /dev/null; then
+    bad "failed to load web auth state into agent-browser session '$SESSION'"
+    return 1
+  fi
+  if ! agent-browser --session "$SESSION" open "$SERVER_URL/" > /dev/null; then
+    bad "failed to open $SERVER_URL in agent-browser session '$SESSION'"
+    return 1
+  fi
+  local url
+  url=$(agent-browser --session "$SESSION" get url 2> /dev/null || true)
+  if [[ -z "$url" ]]; then
+    bad "agent-browser session '$SESSION' did not report a current URL"
+    return 1
+  fi
+  if [[ "$url" == *"/signin"* || "$url" == *"/login"* ]]; then
+    bad "agent-browser session '$SESSION' NOT authenticated (landed on $url)"
+    note "re-copy the Cookie header and re-run: pbpaste | $0 web"
+    return 1
+  fi
+  ok "agent-browser session '$SESSION' authenticated (at $url)"
+}
+
+case "${1:-status}" in
+  status)
+    shift || true
+    cmd_status "$@"
+    ;;
+  cli-seed) cmd_cli_seed ;;
+  cli) cmd_cli ;;
+  open-chrome)
+    shift || true
+    cmd_open_chrome "$@"
+    ;;
+  web-seed) cmd_web_seed ;;
+  web) cmd_web ;;
+  web-verify) cmd_web_verify ;;
+  -h|--help) usage ;;
+  *)
+    echo "unknown command: $1" >&2
+    usage >&2
+    exit 2
+    ;;
+esac
@@ -0,0 +1,197 @@
+#!/usr/bin/env bash
+# Smoke tests for setup-auth.sh. Uses a temporary agent-browser stub and local
+# HTTP server, so it does not need real browser auth.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SCRIPT="$SCRIPT_DIR/setup-auth.sh"
+
+fail() {
+  echo "FAIL: $*" >&2
+  exit 1
+}
+
+assert_contains() {
+  local file="$1"
+  local text="$2"
+  grep -Fq "$text" "$file" || fail "expected '$text' in $file"
+}
+
+tmp_dir="$(mktemp -d)"
+server_pid=""
+
+cleanup() {
+  if [[ -n "$server_pid" ]]; then
+    kill "$server_pid" > /dev/null 2>&1 || true
+    wait "$server_pid" > /dev/null 2>&1 || true
+  fi
+  rm -rf "$tmp_dir"
+}
+trap cleanup EXIT
+export HOME="$tmp_dir/home"
+
+port="$(python3 - << 'PY'
+import socket
+
+sock = socket.socket()
+sock.bind(("127.0.0.1", 0))
+print(sock.getsockname()[1])
+sock.close()
+PY
+)"
+
+python3 - "$port" << 'PY' > "$tmp_dir/http.log" 2>&1 &
+from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
+import sys
+
+
+class Handler(BaseHTTPRequestHandler):
+    def do_GET(self):
+        if self.path.startswith("/api/v1/users/me"):
+            if self.headers.get("authorization") != "Bearer sk-lh-agenttesting0001":
+                self.send_response(401)
+                self.end_headers()
+                self.wfile.write(b'{"success":false}')
+                return
+
+            self.send_response(200)
+            self.send_header("Content-Type", "application/json")
+            self.end_headers()
+            self.wfile.write(b'{"success":true,"data":{"id":"user_agent_testing_001"}}')
+            return
+
+        self.send_response(200)
+        self.end_headers()
+        self.wfile.write(b"ok")
+
+    def do_POST(self):
+        length = int(self.headers.get("content-length") or "0")
+        if length:
+            self.rfile.read(length)
+
+        if self.path != "/api/auth/sign-in/email":
+            self.send_response(404)
+            self.end_headers()
+            return
+
+        self.send_response(200)
+        self.send_header(
+            "Set-Cookie",
+            "better-auth.session_token=seed.token; Path=/; HttpOnly; SameSite=Lax",
+        )
+        self.send_header(
+            "Set-Cookie",
+            "better-auth.session_data=seed.data; Path=/; HttpOnly; SameSite=Lax",
+        )
+        self.send_header("Content-Type", "application/json")
+        self.end_headers()
+        self.wfile.write(b'{"ok":true}')
+
+    def log_message(self, format, *args):
+        return
+
+
+ThreadingHTTPServer(("localhost", int(sys.argv[1])), Handler).serve_forever()
+PY
+server_pid="$!"
+
+server_url="http://localhost:$port"
+for _ in {1..50}; do
+  if curl -s -o /dev/null "$server_url/"; then
+    break
+  fi
+  sleep 0.1
+done
+curl -s -o /dev/null "$server_url/" || fail "test HTTP server did not start"
+
+mkdir -p "$tmp_dir/bin" "$tmp_dir/auth"
+cat > "$tmp_dir/bin/agent-browser" << 'SH'
+#!/usr/bin/env bash
+set -euo pipefail
+
+if [[ "${1:-}" == "--session" ]]; then
+  shift 2
+fi
+
+case "${1:-}" in
+  state)
+    [[ "${2:-}" == "load" ]] || exit 2
+    [[ -f "${3:-}" ]] || exit 1
+    ;;
+  open)
+    printf '%s\n' "${2:-}" > "${AGENT_BROWSER_URL_FILE:?}"
+    ;;
+  get)
+    [[ "${2:-}" == "url" ]] || exit 2
+    cat "${AGENT_BROWSER_URL_FILE:?}"
+    ;;
+  *)
+    echo "unexpected agent-browser command: $*" >&2
+    exit 2
+    ;;
+esac
+SH
+chmod +x "$tmp_dir/bin/agent-browser"
+
+export PATH="$tmp_dir/bin:$PATH"
+export AUTH_DIR="$tmp_dir/auth"
+export SESSION="setup-auth-test"
+export SERVER_URL="$server_url"
+export AGENT_BROWSER_URL_FILE="$tmp_dir/current-url"
+
+cookie_header="Cookie: foo=bar; better-auth.session_token=test.token; better-auth.session_data=encoded%3D; theme=dark"
+printf '%s\n' "$cookie_header" | "$SCRIPT" web > "$tmp_dir/web.out"
+
+python3 - "$AUTH_DIR/web-state.json" << 'PY'
+import json, sys
+
+with open(sys.argv[1]) as f:
+    state = json.load(f)
+
+names = {cookie["name"] for cookie in state["cookies"]}
+expected = {"better-auth.session_token", "better-auth.session_data"}
+if names != expected:
+    raise SystemExit(f"unexpected cookies: {sorted(names)}")
+PY
+
+"$SCRIPT" web-seed > "$tmp_dir/web-seed.out"
+
+python3 - "$AUTH_DIR/web-state.json" << 'PY'
+import json, sys
+
+with open(sys.argv[1]) as f:
+    state = json.load(f)
+
+values = {cookie["name"]: cookie["value"] for cookie in state["cookies"]}
+expected = {
+    "better-auth.session_token": "seed.token",
+    "better-auth.session_data": "seed.data",
+}
+if values != expected:
+    raise SystemExit(f"unexpected seeded cookies: {values}")
+PY
+
+"$SCRIPT" status --surface web > "$tmp_dir/status.out"
+assert_contains "$tmp_dir/status.out" "surface=web"
+assert_contains "$tmp_dir/status.out" "web auth green"
+
+"$SCRIPT" cli-seed > "$tmp_dir/cli-seed.out"
+assert_contains "$tmp_dir/cli-seed.out" "CLI API-key auth valid"
+assert_contains "$tmp_dir/cli-seed.out" "settings saved at: $HOME/.lobehub-dev/settings.json"
+
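+# Clear inherited API keys so this exercises the no-credentials path.
+if env -u LOBEHUB_CLI_API_KEY -u LOBE_API_KEY "$SCRIPT" status --surface cli > "$tmp_dir/cli-no-env.out"; then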
+  fail "cli status without API key unexpectedly passed"
+fi
+assert_contains "$tmp_dir/cli-no-env.out" "CLI not logged in"
+
+LOBEHUB_CLI_API_KEY=sk-lh-agenttesting0001 "$SCRIPT" status --surface cli > "$tmp_dir/cli-status.out"
+assert_contains "$tmp_dir/cli-status.out" "CLI API-key auth valid"
+assert_contains "$tmp_dir/cli-status.out" "cli auth green"
+
+if printf 'foo=bar\n' | "$SCRIPT" web > "$tmp_dir/invalid.out" 2> "$tmp_dir/invalid.err"; then
+  fail "invalid cookie unexpectedly passed"
+fi
+assert_contains "$tmp_dir/invalid.err" "no better-auth cookies found"
+
+echo "setup-auth tests passed"
@@ -0,0 +1,377 @@
+#!/usr/bin/env bash
+# Print the resolved local test environment for agent-testing.
+#
+# This is intentionally read-only. It mirrors scripts/runWithEnv.mts precedence:
+# .env -> .env.$NODE_ENV -> .env.local -> .env.$NODE_ENV.local, then shell env.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
+NODE_ENV="${NODE_ENV:-development}"
+
+VALUE_APP_URL=""
+VALUE_PORT=""
+VALUE_SERVER_URL=""
+VALUE_AUTH_TRUSTED_ORIGINS=""
+VALUE_SPA_PORT=""
+VALUE_MOBILE_SPA_PORT=""
+VALUE_DESKTOP_PORT=""
+
+SOURCE_APP_URL=""
+SOURCE_PORT=""
+SOURCE_SERVER_URL=""
+SOURCE_AUTH_TRUSTED_ORIGINS=""
+SOURCE_SPA_PORT=""
+SOURCE_MOBILE_SPA_PORT=""
+SOURCE_DESKTOP_PORT=""
+
+LOADED_ENV_FILES=""
+
+keys() {
+  printf '%s\n' \
+    APP_URL \
+    PORT \
+    SERVER_URL \
+    AUTH_TRUSTED_ORIGINS \
+    SPA_PORT \
+    MOBILE_SPA_PORT \
+    DESKTOP_PORT
+}
+
+trim() {
+  local value="$1"
+  value="${value#"${value%%[![:space:]]*}"}"
+  value="${value%"${value##*[![:space:]]}"}"
+  printf '%s' "$value"
+}
+
+workspace_root() {
+  local root="$REPO_ROOT"
+  local name
+  name="$(basename "$root")"
+
+  if [[ "$name" == "lobehub" ]]; then
+    local parent parent_name
+    parent="$(cd "$root/.." && pwd)"
+    parent_name="$(basename "$parent")"
+    if [[ "$parent_name" == lobehub-cloud* ]]; then
+      root="$parent"
+    fi
+  fi
+
+  printf '%s\n' "$root"
+}
+
+workspace_offset() {
+  local name="$1"
+
+  case "$name" in
+    lobehub-cloud)
+      printf '0\n'
+      ;;
+    lobehub-cloud-*)
+      local suffix="${name#lobehub-cloud-}"
+      if [[ "$suffix" =~ ^[0-9]+$ ]]; then
+        printf '%s\n' "$((10#$suffix))"
+      else
+        printf '\n'
+      fi
+      ;;
+    *)
+      printf '\n'
+      ;;
+  esac
+}
+
+default_port() {
+  local base="$1"
+  local fallback="$2"
+  local root name offset
+  root="$(workspace_root)"
+  name="$(basename "$root")"
+  offset="$(workspace_offset "$name")"
+
+  if [[ -n "$offset" ]]; then
+    printf '%s\n' "$((base + offset))"
+  else
+    printf '%s\n' "$fallback"
+  fi
+}
+
+url_port() {
+  local url="$1"
+  local hostport
+  hostport="${url#*://}"
+  hostport="${hostport%%/*}"
+
+  if [[ "$hostport" == *:* ]]; then
+    local port="${hostport##*:}"
+    if [[ "$port" =~ ^[0-9]+$ ]]; then
+      printf '%s\n' "$port"
+      return 0
+    fi
+  fi
+
+  return 1
+}
+
+url_origin() {
+  local url="$1"
+  local scheme rest hostport
+  if [[ "$url" == *"://"* ]]; then
+    scheme="${url%%://*}"
+    rest="${url#*://}"
+    hostport="${rest%%/*}"
+    printf '%s://%s\n' "$scheme" "$hostport"
+  else
+    printf '%s\n' "$url"
+  fi
+}
+
+set_value() {
+  local key="$1"
+  local value="$2"
+  local source="$3"
+
+  case "$key" in
+    APP_URL) VALUE_APP_URL="$value"; SOURCE_APP_URL="$source" ;;
+    PORT) VALUE_PORT="$value"; SOURCE_PORT="$source" ;;
+    SERVER_URL) VALUE_SERVER_URL="$value"; SOURCE_SERVER_URL="$source" ;;
+    AUTH_TRUSTED_ORIGINS) VALUE_AUTH_TRUSTED_ORIGINS="$value"; SOURCE_AUTH_TRUSTED_ORIGINS="$source" ;;
+    SPA_PORT) VALUE_SPA_PORT="$value"; SOURCE_SPA_PORT="$source" ;;
+    MOBILE_SPA_PORT) VALUE_MOBILE_SPA_PORT="$value"; SOURCE_MOBILE_SPA_PORT="$source" ;;
+    DESKTOP_PORT) VALUE_DESKTOP_PORT="$value"; SOURCE_DESKTOP_PORT="$source" ;;
+  esac
+}
+
+value_for() {
+  case "$1" in
+    APP_URL) printf '%s\n' "$VALUE_APP_URL" ;;
+    PORT) printf '%s\n' "$VALUE_PORT" ;;
+    SERVER_URL) printf '%s\n' "$VALUE_SERVER_URL" ;;
+    AUTH_TRUSTED_ORIGINS) printf '%s\n' "$VALUE_AUTH_TRUSTED_ORIGINS" ;;
+    SPA_PORT) printf '%s\n' "$VALUE_SPA_PORT" ;;
+    MOBILE_SPA_PORT) printf '%s\n' "$VALUE_MOBILE_SPA_PORT" ;;
+    DESKTOP_PORT) printf '%s\n' "$VALUE_DESKTOP_PORT" ;;
+  esac
+}
+
+source_for() {
+  case "$1" in
+    APP_URL) printf '%s\n' "$SOURCE_APP_URL" ;;
+    PORT) printf '%s\n' "$SOURCE_PORT" ;;
+    SERVER_URL) printf '%s\n' "$SOURCE_SERVER_URL" ;;
+    AUTH_TRUSTED_ORIGINS) printf '%s\n' "$SOURCE_AUTH_TRUSTED_ORIGINS" ;;
+    SPA_PORT) printf '%s\n' "$SOURCE_SPA_PORT" ;;
+    MOBILE_SPA_PORT) printf '%s\n' "$SOURCE_MOBILE_SPA_PORT" ;;
+    DESKTOP_PORT) printf '%s\n' "$SOURCE_DESKTOP_PORT" ;;
+  esac
+}
+
+is_tracked_key() {
+  case "$1" in
+    APP_URL|PORT|SERVER_URL|AUTH_TRUSTED_ORIGINS|SPA_PORT|MOBILE_SPA_PORT|DESKTOP_PORT) return 0 ;;
+    *) return 1 ;;
+  esac
+}
+
+parse_env_file() {
+  local file="$1"
+  local root="$2"
+  local label="${file#"$root"/}"
+  local line key value
+
+  [[ -f "$file" ]] || return 0
+  if [[ -z "$LOADED_ENV_FILES" ]]; then
+    LOADED_ENV_FILES="$label"
+  else
+    LOADED_ENV_FILES="$LOADED_ENV_FILES, $label"
+  fi
+
+  while IFS= read -r line || [[ -n "$line" ]]; do
+    line="$(trim "$line")"
+    [[ -z "$line" || "$line" == \#* ]] && continue
+
+    if [[ "$line" == export[[:space:]]* ]]; then
+      line="$(trim "${line#export}")"
+    fi
+
+    [[ "$line" == *=* ]] || continue
+    key="$(trim "${line%%=*}")"
+    value="$(trim "${line#*=}")"
+    is_tracked_key "$key" || continue
+
+    # Strip one pair of matching surrounding quotes, if present.
+    if [[ ${#value} -ge 2 && "$value" == \"*\" ]]; then
+      value="${value:1:${#value}-2}"
+    elif [[ ${#value} -ge 2 && "$value" == \'*\' ]]; then
+      value="${value:1:${#value}-2}"
+    fi
+
+    set_value "$key" "$value" "$label"
+  done < "$file"
+}
+
+apply_env_files() {
+  local root="$1"
+  parse_env_file "$root/.env" "$root"
+  parse_env_file "$root/.env.$NODE_ENV" "$root"
+  parse_env_file "$root/.env.local" "$root"
+  parse_env_file "$root/.env.$NODE_ENV.local" "$root"
+}
+
+apply_shell_overrides() {
+  local key value
+  while IFS= read -r key; do
+    if [[ -n "${!key+x}" ]]; then
+      value="${!key}"
+      set_value "$key" "$value" "shell"
+    fi
+  done < <(keys)
+}
+
+resolve_defaults() {
+  local app_port spa_port mobile_spa_port desktop_port
+  app_port="$(default_port 3020 3010)"
+  spa_port="$(default_port 9800 9876)"
+  mobile_spa_port="$(default_port 3810 3012)"
+  desktop_port="$(default_port 3030 3015)"
+
+  if [[ -z "$VALUE_APP_URL" ]]; then
+    set_value APP_URL "http://localhost:$app_port" "inferred"
+  fi
+
+  if [[ -z "$VALUE_PORT" ]]; then
+    if app_port="$(url_port "$VALUE_APP_URL")"; then
+      set_value PORT "$app_port" "inferred from APP_URL"
+    else
+      set_value PORT "$(default_port 3020 3010)" "inferred"
+    fi
+  fi
+
+  if [[ -z "$VALUE_SERVER_URL" ]]; then
+    set_value SERVER_URL "$VALUE_APP_URL" "from APP_URL"
+  fi
+
+  if [[ -z "$VALUE_SPA_PORT" ]]; then
+    set_value SPA_PORT "$spa_port" "inferred"
+  fi
+
+  if [[ -z "$VALUE_MOBILE_SPA_PORT" ]]; then
+    set_value MOBILE_SPA_PORT "$mobile_spa_port" "inferred"
+  fi
+
+  if [[ -z "$VALUE_DESKTOP_PORT" ]]; then
+    set_value DESKTOP_PORT "$desktop_port" "inferred"
+  fi
+
+  if [[ -z "$VALUE_AUTH_TRUSTED_ORIGINS" ]]; then
+    set_value AUTH_TRUSTED_ORIGINS "$(url_origin "$VALUE_APP_URL"),http://localhost:$VALUE_SPA_PORT" "inferred"
+  fi
+}
+
+contains_origin() {
+  local list="$1"
+  local expected="$2"
+  local item items
+  IFS=',' read -r -a items <<< "$list"
+  for item in "${items[@]}"; do
+    item="$(trim "$item")"
+    [[ "$item" == "$expected" ]] && return 0
+  done
+  return 1
+}
+
+print_exports() {
+  local key value
+  while IFS= read -r key; do
+    value="$(value_for "$key")"
+    printf 'export %s=%q\n' "$key" "$value"
+  done < <(keys)
+}
+
+print_value() {
+  local key="$1"
+  if ! is_tracked_key "$key"; then
+    echo "unknown key: $key" >&2
+    exit 2
+  fi
+  value_for "$key"
+}
+
+print_human() {
+  local root="$1"
+  local key value source
+
+  echo "agent-testing test env:"
+  printf '  workspace: %s\n' "$root"
+  printf '  NODE_ENV: %s\n' "$NODE_ENV"
+  printf '  env files: %s\n' "${LOADED_ENV_FILES:-none}"
+  echo
+  echo "resolved values:"
+  while IFS= read -r key; do
+    value="$(value_for "$key")"
+    source="$(source_for "$key")"
+    printf '  %-22s   (%s)\n' "$key=$value" "$source"
+  done < <(keys)
+  echo
+  echo "checks:"
+
+  local app_origin spa_origin app_port
+  app_origin="$(url_origin "$VALUE_APP_URL")"
+  spa_origin="http://localhost:$VALUE_SPA_PORT"
+  if app_port="$(url_port "$VALUE_APP_URL")" && [[ "$app_port" == "$VALUE_PORT" ]]; then
+    printf '  OK   PORT matches APP_URL (%s)\n' "$VALUE_PORT"
+  else
+    printf '  WARN PORT (%s) does not match APP_URL (%s)\n' "$VALUE_PORT" "$VALUE_APP_URL"
+  fi
+
+  if contains_origin "$VALUE_AUTH_TRUSTED_ORIGINS" "$app_origin"; then
+    printf '  OK   AUTH_TRUSTED_ORIGINS includes %s\n' "$app_origin"
+  else
+    printf '  WARN AUTH_TRUSTED_ORIGINS is missing %s\n' "$app_origin"
+  fi
+
+  if contains_origin "$VALUE_AUTH_TRUSTED_ORIGINS" "$spa_origin"; then
+    printf '  OK   AUTH_TRUSTED_ORIGINS includes %s\n' "$spa_origin"
+  else
+    printf '  WARN AUTH_TRUSTED_ORIGINS is missing %s\n' "$spa_origin"
+  fi
+}
+
+usage() {
+  cat << EOF
+Usage:
+  $0                 # print resolved test environment
+  $0 --exports       # print source-able export lines
+  $0 --value KEY     # print one resolved value
+
+Tracked keys:
+  APP_URL PORT SERVER_URL AUTH_TRUSTED_ORIGINS SPA_PORT MOBILE_SPA_PORT DESKTOP_PORT
+EOF
+}
+
+ROOT="$(workspace_root)"
+apply_env_files "$ROOT"
+apply_shell_overrides
+resolve_defaults
+
+case "${1:-}" in
+  "")
+    print_human "$ROOT"
+    ;;
+  --exports)
+    print_exports
+    ;;
+  --value)
+    print_value "${2:-}"
+    ;;
+  -h|--help)
+    usage
+    ;;
+  *)
+    echo "unknown option: $1" >&2
+    usage >&2
+    exit 2
+    ;;
+esac
@@ -0,0 +1,57 @@
+#!/usr/bin/env bash
+# Smoke tests for test-env.sh.
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+
+fail() {
+  echo "FAIL: $*" >&2
+  exit 1
+}
+
+assert_eq() {
+  local actual="$1"
+  local expected="$2"
+  [[ "$actual" == "$expected" ]] || fail "expected '$expected', got '$actual'"
+}
+
+assert_contains() {
+  local file="$1"
+  local text="$2"
+  grep -Fq "$text" "$file" || fail "expected '$text' in $file"
+}
+
+tmp_dir="$(mktemp -d)"
+trap 'rm -rf "$tmp_dir"' EXIT
+
+mkdir -p "$tmp_dir/lobehub-cloud-1/.agents/skills" "$tmp_dir/lobehub/.agents/skills"
+ln -s "$SCRIPT_DIR/.." "$tmp_dir/lobehub-cloud-1/.agents/skills/agent-testing"
+ln -s "$SCRIPT_DIR/.." "$tmp_dir/lobehub/.agents/skills/agent-testing"
+
+cloud_script="$tmp_dir/lobehub-cloud-1/.agents/skills/agent-testing/scripts/test-env.sh"
+oss_script="$tmp_dir/lobehub/.agents/skills/agent-testing/scripts/test-env.sh"
+
+assert_eq "$("$cloud_script" --value SERVER_URL)" "http://localhost:3021"
+assert_eq "$("$cloud_script" --value SPA_PORT)" "9801"
+assert_eq "$("$cloud_script" --value MOBILE_SPA_PORT)" "3811"
+assert_eq "$("$cloud_script" --value DESKTOP_PORT)" "3031"
+assert_eq "$("$oss_script" --value SERVER_URL)" "http://localhost:3010"
+
+cat > "$tmp_dir/lobehub-cloud-1/.env" << 'EOF'
+APP_URL=http://localhost:4123
+PORT=4123
+AUTH_TRUSTED_ORIGINS=http://localhost:4123,http://localhost:9823
+SPA_PORT=9823
+MOBILE_SPA_PORT=3823
+DESKTOP_PORT=3043
+EOF
+
+assert_eq "$("$cloud_script" --value SERVER_URL)" "http://localhost:4123"
+assert_eq "$("$cloud_script" --value SPA_PORT)" "9823"
+"$cloud_script" --exports > "$tmp_dir/exports.out"
+assert_contains "$tmp_dir/exports.out" "export APP_URL=http://localhost:4123"
+assert_contains "$tmp_dir/exports.out" "export SERVER_URL=http://localhost:4123"
+assert_contains "$tmp_dir/exports.out" "export AUTH_TRUSTED_ORIGINS=http://localhost:4123\\,http://localhost:9823"
+
+echo "test-env tests passed"
@@ -0,0 +1,154 @@
+# Electron (LobeHub Desktop) UI Testing
+
+Default surface for verifying **pure frontend changes** (components, store logic, styles, interactions) in the primary product shape. Drives the Electron renderer over CDP with `agent-browser` — see [../references/agent-browser.md](../references/agent-browser.md) for the full command reference.
+
+**Auth**: the Electron app keeps its own persistent login state — log in once manually in the app; sessions survive restarts. Run `../scripts/setup-auth.sh status` before testing (see [../references/auth.md](../references/auth.md)).
+
+**Linux / headless (cloud)**: Electron itself runs on Linux, but it has no true headless mode — it needs a display server. In a headless environment wrap the launch with `xvfb-run` (virtual framebuffer). Everything CDP-based keeps working under Xvfb: the `agent-browser --cdp 9222` connection, snapshots, eval, and `agent-browser screenshot` (captured from the renderer via CDP, not the OS screen). What does NOT work on Linux: `capture-app-window.sh` (macOS `screencapture`), osascript, and the ffmpeg recording scripts in their current form.
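
As a rough sketch of the headless case above, the helper below decides whether a launch command needs an `xvfb-run` prefix. The function name `headless_prefix` and its two arguments are hypothetical and not part of the scripts in this skill; `-a` is the `xvfb-run` option that picks a free display number automatically.

```bash
# Hypothetical helper: print the prefix to put in front of an Electron launch
# command. Arguments: OS name (as printed by `uname -s`) and current $DISPLAY.
headless_prefix() {
  local os="$1" display="$2"
  if [[ "$os" == "Linux" && -z "$display" ]]; then
    # No display server available: run under a virtual framebuffer.
    printf 'xvfb-run -a\n'
  fi
}

# Print the prefix for the current machine (empty when a display exists).
headless_prefix "$(uname -s)" "${DISPLAY:-}"
```

`xvfb-run` shuts the virtual display down when the wrapped command exits, so the prefix belongs on the long-running Electron process rather than on a wrapper that backgrounds it and returns.
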
+
+### Setup / Teardown
+
+Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
+
+```bash
+SCRIPT=".agents/skills/agent-testing/scripts/electron-dev.sh"
+
+# Start Electron dev with CDP (idempotent — skips if already running)
+$SCRIPT start
+
+# Check if Electron is running and CDP is reachable
+$SCRIPT status
+
+# Kill all Electron-related processes (main + helper + vite)
+$SCRIPT stop
+
+# Force fresh restart
+$SCRIPT restart
+```
+
+After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
+
+**Always run `$SCRIPT stop` when done testing** — `pkill -f "Electron"` alone won't catch all helper processes.
+
+#### Environment Variables
+
+| Variable          | Default                 | Description                              |
+| ----------------- | ----------------------- | ---------------------------------------- |
+| `CDP_PORT`        | `9222`                  | Chrome DevTools Protocol port            |
+| `ELECTRON_LOG`    | `/tmp/electron-dev.log` | Electron process log                     |
+| `ELECTRON_WAIT_S` | `60`                    | Max seconds to wait for Electron process |
+| `RENDERER_WAIT_S` | `60`                    | Max seconds to wait for SPA to load      |
+
+### LobeHub Probes & Quick Navigation
+
+`scripts/app-probe.sh` is the standard fast path into app state — **use it
+instead of hand-rolling `__LOBE_STORES` eval snippets** for these common needs:
+
+```bash
+PROBE=".agents/skills/agent-testing/scripts/app-probe.sh"
+
+$PROBE auth              # login check (Step 0.3) → { isSignedIn, userId }
+$PROBE route             # current SPA route
+$PROBE ops               # running chat operations (type / startTime)
+$PROBE goto /settings    # jump the SPA straight to a route (full reload)
+$PROBE errors-install    # install console.error interceptor
+$PROBE errors            # dump captured errors
+```
+
+`goto` lets a test enter the state under test directly instead of clicking
+through the UI. Common desktop routes:
+
+| Route                         | Where it lands                       |
+| ----------------------------- | ------------------------------------ |
+| `/`                           | Home (has a chat input)              |
+| `/agent/<agentId>`            | Agent conversation (latest topic)    |
+| `/agent/<agentId>/<topicId>`  | Specific topic in a conversation     |
+| `/task` · `/task/<taskId>`    | Task list / task detail              |
+| `/page`                       | Documents (文稿)                     |
+| `/settings`                   | Settings                             |
+| `/community`                  | Discover / community                 |
+
+Targets default to Electron (`--cdp 9222`); set `AB_TARGET="--session <name>"`
+for web sessions. For deeper or one-off state inspection, fall back to raw
+eval below.
+
+### LobeHub-Specific Patterns
+
+#### Access Zustand Store State
+
+```bash
+agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
+(function() {
+  var chat = window.__LOBE_STORES.chat();
+  var ops = Object.values(chat.operations);
+  return JSON.stringify({
+    ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
+    activeAgent: chat.activeAgentId,
+    activeTopic: chat.activeTopicId,
+  });
+})()
+EVALEOF
+```
+
+#### Find and Use the Chat Input
+
+```bash
+# The chat input is contenteditable — must use -C flag
+agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
+
+# @e48 is an example ref; use the ref printed by the snapshot above
+agent-browser --cdp 9222 click @e48
+agent-browser --cdp 9222 type @e48 "Hello world"
+agent-browser --cdp 9222 press Enter
+```
+
+#### Wait for Agent to Complete
+
+```bash
+agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
+(function() {
+  var chat = window.__LOBE_STORES.chat();
+  var ops = Object.values(chat.operations);
+  var running = ops.filter(function(o) { return o.status === 'running'; });
+  return running.length === 0 ? 'done' : 'running: ' + running.length;
+})()
+EVALEOF
+```
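+
+The snippet above is a single check, so a test that has to block until the agent finishes needs to poll it. A minimal sketch, assuming the eval output contains the returned string (`done` or `running: N`); `check_agent`, `wait_until_done`, `TIMEOUT_S`, and `INTERVAL_S` are hypothetical names, not part of the skill's scripts:
+
+```bash
+# Hypothetical wrapper around the eval snippet above; prints "done" or "running: N".
+check_agent() {
+  agent-browser --cdp "${CDP_PORT:-9222}" eval --stdin << 'EVALEOF'
+(function() {
+  var ops = Object.values(window.__LOBE_STORES.chat().operations);
+  var running = ops.filter(function(o) { return o.status === 'running'; });
+  return running.length === 0 ? 'done' : 'running: ' + running.length;
+})()
+EVALEOF
+}
+
+# Poll a check command until its output contains "done" or the timeout elapses.
+# $1: check command (defaults to check_agent). Returns 0 on done, 1 on timeout.
+wait_until_done() {
+  local check_cmd="${1:-check_agent}"
+  local timeout_s="${TIMEOUT_S:-120}"   # assumed default, tune per scenario
+  local interval_s="${INTERVAL_S:-2}"   # assumed default
+  local elapsed=0
+  while [ "$elapsed" -lt "$timeout_s" ]; do
+    case "$("$check_cmd" 2> /dev/null)" in
+      *done*) return 0 ;;
+    esac
+    sleep "$interval_s"
+    elapsed=$((elapsed + interval_s))
+  done
+  return 1
+}
+```
+
+A call such as `TIMEOUT_S=300 wait_until_done || echo "agent still running"` keeps the timeout explicit in the test script instead of hiding it in a fixed `sleep`.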
+
+#### Install Error Interceptor
+
+```bash
+agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
+(function() {
+  window.__CAPTURED_ERRORS = [];
+  var orig = console.error;
+  console.error = function() {
+    var msg = Array.from(arguments).map(function(a) {
+      if (a instanceof Error) return a.message;
+      return typeof a === 'object' ? JSON.stringify(a) : String(a);
+    }).join(' ');
+    window.__CAPTURED_ERRORS.push(msg);
+    orig.apply(console, arguments);
+  };
+  return 'installed';
+})()
+EVALEOF
+
+# Later, check captured errors:
+agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
+```
+
+## Electron Gotchas
+
+- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
+- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
+- **Dev build auto-opens DevTools, which hijacks the CDP target** — `agent-browser --cdp 9222` may attach to the DevTools page (`devtools://…`) instead of the app (`app://renderer/`). Symptom: `get url` returns a `devtools://` URL. Fix: close the DevTools target and reconnect:
+
+  ```bash
+  DT_ID=$(curl -s http://localhost:9222/json/list | python3 -c "import json,sys; ts=json.load(sys.stdin); print(next(t['id'] for t in ts if t['type']=='page' and t['url'].startswith('devtools://')))")
+  curl -s "http://localhost:9222/json/close/$DT_ID" > /dev/null
+  agent-browser close --all && agent-browser --cdp 9222 get url   # expect app://renderer/
+  ```
+
+- **Don't resize the Electron window after load**: resizing triggers a full SPA reload.
+- **Store is at `window.__LOBE_STORES`**, not `window.__ZUSTAND_STORES__`.
+- **Streaming / ticking UI needs GIF evidence** — see `scripts/record-gif.sh`; a static screenshot cannot prove time-based behavior.
@@ -0,0 +1,78 @@
+# Web (Full-Stack) Testing
+
+Default surface for **full-stack changes** — a new/changed API plus the UI that
+consumes it. The browser is the one surface where network requests and UI state
+are observable together, so you can assert both sides of the contract in a
+single run.
+
+For pure-frontend changes prefer [electron.md](./electron.md); for
+backend-only changes prefer [../cli/index.md](../cli/index.md).
+
+## Prerequisites
+
+- Complete [Step 0.0](../SKILL.md#00-resolve-the-current-test-environment) (resolve ports) and [Step -1](../SKILL.md#step--1--plan-approval-for-non-trivial-tests) (plan approval) first.
+- Local dev server running — [../references/dev-server.md](../references/dev-server.md)
+- Web auth verified in agent-browser — prefer `setup-auth.sh web-seed`, see [auth decision flow](../references/auth.md#web--decision-flow).
+
+## Option A — agent-browser with seeded auth (recommended)
+
+```bash
+./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
+./.agents/skills/agent-testing/scripts/setup-auth.sh web-seed
+```
+
+Then drive the verified session:
+
+```bash
+SESSION=lobehub-dev
+
+agent-browser --session $SESSION open "$SERVER_URL/"
+agent-browser --session $SESSION snapshot -i
+# interact via refs — full command reference: ../references/agent-browser.md
+```
+
+Use this session as the evidence source. Do not use ordinary Chrome screenshots
+or Chrome Network records as proof for Web tests; ordinary Chrome is only a
+fallback source for copying cookies into agent-browser when the seeded login is
+not available.
+
+### Watch the API while driving the UI
+
+```bash
+# After triggering the UI action under test:
+agent-browser --session $SESSION network requests --type xhr,fetch
+agent-browser --session $SESSION network requests --method POST
+
+# Record a full HAR for the report
+agent-browser --session $SESSION network har start
+# ... drive the scenario ...
+agent-browser --session $SESSION network har stop ./capture.har
+```
+
+Assert both layers: the request/response shape (network) and the rendered
+result (snapshot/screenshot). Both belong in the report as evidence.
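+
+The network half of that assertion can be checked mechanically against the saved
+HAR. A minimal sketch, assuming the file written by `network har stop` follows
+the standard HAR layout (`log.entries[].request` / `.response`) and that
+`python3` is available; `har_has_request` is a hypothetical helper name:
+
+```bash
+# Hypothetical helper: succeed if the HAR contains a 2xx response for a request
+# with the given method whose URL contains the given substring.
+# Usage: har_has_request <har-file> <METHOD> <url-substring>
+har_has_request() {
+  local har_file="$1" method="$2" url_part="$3"
+  python3 - "$har_file" "$method" "$url_part" << 'PYEOF'
+import json
+import sys
+
+har_file, method, url_part = sys.argv[1:4]
+with open(har_file) as f:
+    entries = json.load(f)["log"]["entries"]
+
+# Match on method, URL substring, and a successful status code.
+ok = any(
+    e["request"]["method"] == method
+    and url_part in e["request"]["url"]
+    and 200 <= e["response"]["status"] < 300
+    for e in entries
+)
+sys.exit(0 if ok else 1)
+PYEOF
+}
+```
+
+For example, `har_has_request ./capture.har POST "<path under test>"` exits
+non-zero when the UI action never produced the expected call, which makes the
+failure visible in the report alongside the snapshot.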
+
+## Option B — real Chrome with remote debugging
+
+For flows that need a real, visible browser (e.g. exercising the login UI
+itself):
+
+```bash
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
+  --remote-debugging-port=9222 \
+  --user-data-dir=/tmp/chrome-test-profile \
+  "<URL>" &
+sleep 5
+agent-browser --cdp 9222 snapshot -i
+
+# Or auto-discover running Chrome with remote debugging
+agent-browser --auto-connect snapshot -i
+```
+
+## Option C — Debug Proxy (local frontend, production backend)
+
+`bun run dev:spa` prints a **Debug Proxy** URL
+(`https://app.lobehub.com/_dangerous_local_dev_proxy?debug-host=…`) that loads
+your local Vite SPA inside the online environment — HMR against real server
+config. Useful for verifying frontend behavior against production data, **not**
+for testing backend changes (the backend is production, not your branch).
@@ -1,172 +0,0 @@
---
-name: cli-backend-testing
-description: >
-  CLI + Backend integration testing workflow. Use when verifying backend API changes
-  (TRPC routers, services, models) via the LobeHub CLI against a local dev server.
-  Triggers on 'cli test', 'test with cli', 'verify with cli', 'local cli test',
-  'backend test with cli', or when needing to validate server-side changes end-to-end.
---
-
-# CLI + Backend Integration Testing
-
-Standard workflow for verifying backend changes using the LobeHub CLI (`lh`) against a local dev server.
-
-## When to Use
-
-- Verifying TRPC router / service / model changes end-to-end
-- Testing new API fields or response structure changes
-- Validating CLI command output after backend modifications
-- Debugging data flow issues between server and CLI
-
-## Prerequisites
-
-| Requirement  | Details                                                       |
-| ------------ | ------------------------------------------------------------- |
-| Dev server   | `localhost:3011` (Next.js)                                    |
-| CLI source   | `lobehub/apps/cli/`                                           |
-| CLI dev mode | Uses `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated credentials |
-| Auth         | Device Code Flow login to local server                        |
-
-## Quick Reference
-
-All CLI dev commands run from `lobehub/apps/cli/`. Subsequent examples use `$CLI`:
-
-```bash
-CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
-```
-
-## Workflow
-
-### Step 1: Ensure Dev Server is Running
-
-```bash
-curl -s -o /dev/null -w '%{http_code}' http://localhost:3011/ 2> /dev/null
-```
-
-- **If reachable**: skip to Step 2.
-- **If unreachable**: start from cloud repo root:
-
-```bash
-pnpm run dev:next
-```
-
-To **restart** (pick up server-side code changes):
-
-```bash
-lsof -ti:3011 | xargs kill
-pnpm run dev:next
-```
-
-**Important:** Server-side code changes in the submodule (`lobehub/apps/server/src/`, `lobehub/src/server/`, `lobehub/packages/`) require a server restart. Next.js hot-reload may not pick up changes in submodule packages.
-
-### Step 2: Check CLI Authentication
-
-```bash
-cat lobehub/apps/cli/.lobehub-dev/settings.json 2> /dev/null
-```
-
-- **If file exists and contains `"serverUrl": "http://localhost:3011"`**: skip to Step 3.
-- **If missing or wrong server**: ask the user to run:
-
-```bash
-! cd lobehub/apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3011
-```
-
-> Login requires interactive browser authorization (OIDC Device Code Flow), so the user must run it themselves via `!` prefix. Credentials persist in `lobehub/apps/cli/.lobehub-dev/`.
-
-### Step 3: Test with CLI Commands
-
-CLI runs from source, so CLI-side code changes take effect immediately without rebuilding.
-
-```bash
-cd lobehub/apps/cli
-$CLI <command>
-```
-
-### Step 4: Clean Up Test Data
-
-```bash
-$CLI task delete < id > -y
-$CLI agent delete < id > -y
-```
-
-## Common Testing Patterns
-
-### Task System
-
-```bash
-$CLI task list
-$CLI task create -n "Root Task" -i "Test instruction"
-$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
-$CLI task view T-1
-$CLI task tree T-1
-$CLI task edit T-1 --status running
-$CLI task comment T-1 -m "Test comment"
-$CLI task delete T-1 -y
-```
-
-### Agent System
-
-```bash
-$CLI agent list
-$CLI agent view <agent-id>
-$CLI agent run <agent-id> -m "Test prompt"
-```
-
-### Document & Knowledge Base
-
-```bash
-$CLI doc list
-$CLI doc create -t "Test Doc" -c "Content here"
-$CLI doc view <doc-id>
-$CLI kb list
-$CLI kb tree <kb-id>
-```
-
-### Model & Provider
-
-```bash
-$CLI model list
-$CLI provider list
-$CLI provider test <provider-id>
-```
-
-## Dev-Test Cycle
-
-```
-1. Make code changes (service/model/router/type)
-         |
-2. Run unit tests (fast feedback)
-   bunx vitest run --silent='passed-only' '<test-file>'
-         |
-3. Restart dev server (if server-side changes)
-   lsof -ti:3011 | xargs kill && pnpm run dev:next
-         |
-4. CLI verification (end-to-end)
-   $CLI <command>
-         |
-5. Clean up test data
-```
-
-### When Server Restart is Needed
-
-| Change Location                                         | Restart? |
-| ------------------------------------------------------- | -------- |
-| `lobehub/apps/server/src/` (routers, services, modules) | Yes      |
-| `lobehub/src/server/` (agent-hono, workflows-hono)      | Yes      |
-| `lobehub/packages/database/` (models)                   | Yes      |
-| `lobehub/packages/types/`                               | Yes      |
-| `lobehub/packages/prompts/`                             | Yes      |
-| `lobehub/apps/cli/` (CLI code)                          | No       |
-| `src/` (cloud overrides)                                | Yes      |
-
-## Troubleshooting
-
-| Issue                       | Solution                                                              |
-| --------------------------- | --------------------------------------------------------------------- |
-| `No authentication found`   | Run `login --server http://localhost:3011`                            |
-| `UNAUTHORIZED` on API calls | Token expired; re-run login                                           |
-| `ECONNREFUSED`              | Dev server not running; start with `pnpm run dev:next`                |
-| CLI shows old data/behavior | Server needs restart to pick up code changes                          |
-| `EADDRINUSE` on port 3011   | Server already running; kill with `lsof -ti:3011 \| xargs kill`       |
-| Login opens wrong server    | Must use `--server http://localhost:3011` flag (env var doesn't work) |
@@ -241,6 +241,6 @@ When the bug comes from a real trace, distill it into the closest existing test
 3. Add or update the narrowest failing test near the broken layer.
 4. Fix the smallest layer that can explain the symptom.
 5. Re-run focused tests.
-6. Only then do an Electron smoke test with the `local-testing` skill if UI confirmation is still needed.
+6. Only then do an Electron smoke test with the `agent-testing` skill if UI confirmation is still needed.

 Do not start with a broad Electron repro if a raw trace or adapter test can prove the fault zone faster.
@@ -1,561 +0,0 @@
---
-name: local-testing
-description: >
-  Local app and bot testing. Uses agent-browser CLI for Electron/web app UI testing,
-  and osascript (AppleScript) for controlling native macOS apps (WeChat, Discord, Telegram, Slack, Lark/飞书, QQ)
-  to test bots. Triggers on 'local test', 'test in electron', 'test desktop', 'test bot',
-  'bot test', 'test in discord', 'test in telegram', 'test in slack', 'test in weixin',
-  'test in wechat', 'test in lark', 'test in feishu', 'test in qq',
-  'manual test', 'osascript', or UI/bot verification tasks.
---
-
-# Local App & Bot Testing
-
-Two approaches for local testing on macOS:
-
-| Approach                    | Tool                | Best For                                             |
-| --------------------------- | ------------------- | ---------------------------------------------------- |
-| **agent-browser + CDP**     | `agent-browser` CLI | Electron apps, web apps (DOM access, JS eval)        |
-| **osascript (AppleScript)** | `osascript -e`      | Native macOS apps (WeChat, Discord, Telegram, Slack) |
-
---
-
-# Part 1: agent-browser (Electron / Web Apps)
-
-Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
-
-Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
-
-## Core Workflow
-
-Every browser automation follows this pattern:
-
-1. **Navigate**: `agent-browser open <url>`
-2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
-3. **Interact**: Use refs to click, fill, select
-4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
-
-```bash
-agent-browser open https://example.com/form
-agent-browser snapshot -i
-# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
-
-agent-browser fill @e1 "user@example.com"
-agent-browser fill @e2 "password123"
-agent-browser click @e3
-agent-browser wait --load networkidle
-agent-browser snapshot -i # Check result
-```
-
-## Command Chaining
-
-```bash
-# Chain open + wait + snapshot in one call
-agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
-```
-
-Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
-
-## Essential Commands
-
-```bash
-# Navigation
-agent-browser open <url>              # Navigate (aliases: goto, navigate)
-agent-browser close                   # Close browser
-agent-browser close --all             # Close all active sessions
-
-# Snapshot
-agent-browser snapshot -i             # Interactive elements with refs (recommended)
-agent-browser snapshot -s "#selector" # Scope to CSS selector
-
-# Interaction (use @refs from snapshot)
-agent-browser click @e1               # Click element
-agent-browser click @e1 --new-tab     # Click and open in new tab
-agent-browser fill @e2 "text"         # Clear and type text
-agent-browser type @e2 "text"         # Type without clearing
-agent-browser select @e1 "option"     # Select dropdown option
-agent-browser check @e1               # Check checkbox
-agent-browser press Enter             # Press key
-agent-browser keyboard type "text"    # Type at current focus (no selector)
-agent-browser keyboard inserttext "text"  # Insert without key events
-agent-browser scroll down 500         # Scroll page
-agent-browser scroll down 500 --selector "div.content"  # Scroll within container
-
-# Get information
-agent-browser get text @e1            # Get element text
-agent-browser get url                 # Get current URL
-agent-browser get title               # Get page title
-agent-browser get cdp-url             # Get CDP WebSocket URL
-
-# Wait
-agent-browser wait @e1                # Wait for element
-agent-browser wait --load networkidle # Wait for network idle
-agent-browser wait --url "**/page"    # Wait for URL pattern
-agent-browser wait 2000               # Wait milliseconds
-agent-browser wait --text "Welcome"   # Wait for text to appear
-agent-browser wait --fn "!document.body.innerText.includes('Loading...')"  # Wait for text to disappear
-agent-browser wait "#spinner" --state hidden  # Wait for element to disappear
-
-# Downloads
-agent-browser download @e1 ./file.pdf          # Click element to trigger download
-agent-browser wait --download ./output.zip     # Wait for any download to complete
-
-# Network
-agent-browser network requests                 # Inspect tracked requests
-agent-browser network requests --type xhr,fetch  # Filter by resource type
-agent-browser network requests --method POST   # Filter by HTTP method
-agent-browser network route "**/api/*" --abort # Block matching requests
-agent-browser network har start                # Start HAR recording
-agent-browser network har stop ./capture.har   # Stop and save HAR file
-
-# Viewport & Device Emulation
-agent-browser set viewport 1920 1080          # Set viewport size (default: 1280x720)
-agent-browser set viewport 1920 1080 2        # 2x retina
-agent-browser set device "iPhone 14"          # Emulate device (viewport + user agent)
-
-# Capture
-agent-browser screenshot              # Screenshot to temp dir
-agent-browser screenshot --full       # Full page screenshot
-agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
-agent-browser pdf output.pdf          # Save as PDF
-
-# Clipboard
-agent-browser clipboard read          # Read text from clipboard
-agent-browser clipboard write "text"  # Write text to clipboard
-agent-browser clipboard copy          # Copy current selection
-agent-browser clipboard paste         # Paste from clipboard
-
-# Dialogs (alert, confirm, prompt, beforeunload)
-agent-browser dialog accept           # Accept dialog
-agent-browser dialog accept "input"   # Accept prompt dialog with text
-agent-browser dialog dismiss          # Dismiss/cancel dialog
-agent-browser dialog status           # Check if dialog is open
-
-# Diff (compare page states)
-agent-browser diff snapshot                        # Compare current vs last snapshot
-agent-browser diff screenshot --baseline before.png  # Visual pixel diff
-agent-browser diff url <url1> <url2>               # Compare two pages
-
-# Streaming
-agent-browser stream enable           # Start WebSocket streaming
-agent-browser stream status           # Inspect streaming state
-agent-browser stream disable          # Stop streaming
-```
-
-## Batch Execution
-
-```bash
-echo '[
-  ["open", "https://example.com"],
-  ["snapshot", "-i"],
-  ["click", "@e1"],
-  ["screenshot", "result.png"]
-]' | agent-browser batch --json
-```
-
-## Authentication
-
-```bash
-# Option 1: Auth vault (credentials stored encrypted)
-echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
-agent-browser auth login myapp
-
-# Option 2: Session name (auto-save/restore cookies + localStorage)
-agent-browser --session-name myapp open https://app.example.com/login
-agent-browser close                                                       # State auto-saved
-agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
-
-# Option 3: Persistent profile
-agent-browser --profile ~/.myapp open https://app.example.com/login
-
-# Option 4: State file
-agent-browser state save auth.json
-agent-browser state load auth.json
-```
-
-### LobeHub dev server — inject better-auth cookie
-
-`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3011`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [references/agent-browser-login.md](./references/agent-browser-login.md) for the full recipe.
-
-## Semantic Locators (Alternative to Refs)
-
-```bash
-agent-browser find text "Sign In" click
-agent-browser find label "Email" fill "user@test.com"
-agent-browser find role button click --name "Submit"
-agent-browser find placeholder "Search" type "query"
-agent-browser find testid "submit-btn" click
-```
-
-## JavaScript Evaluation (eval)
-
-```bash
-# Simple expressions
-agent-browser eval 'document.title'
-
-# Complex JS: use --stdin with heredoc (RECOMMENDED)
-agent-browser eval --stdin << 'EVALEOF'
-JSON.stringify(
-  Array.from(document.querySelectorAll("img"))
-    .filter(i => !i.alt)
-    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
-)
-EVALEOF
-
-# Base64 encoding (avoids all shell escaping issues)
-agent-browser eval -b "$(echo -n 'document.title' | base64)"
-```
-
-## Ref Lifecycle
-
-Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
-
-## Annotated Screenshots (Vision Mode)
-
-```bash
-agent-browser screenshot --annotate
-# Output includes the image path and a legend:
-#   [1] @e1 button "Submit"
-#   [2] @e2 link "Home"
-agent-browser click @e2 # Click using ref from annotated screenshot
-```
-
-## Parallel Sessions
-
-```bash
-agent-browser --session site1 open https://site-a.com
-agent-browser --session site2 open https://site-b.com
-agent-browser session list
-```
-
-## Connect to Existing Chrome
-
-```bash
-agent-browser --auto-connect snapshot # Auto-discover running Chrome
-agent-browser --cdp 9222 snapshot     # Explicit CDP port
-```
-
-## iOS Simulator (Mobile Safari)
-
-```bash
-agent-browser device list
-agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
-agent-browser -p ios snapshot -i
-agent-browser -p ios tap @e1
-agent-browser -p ios swipe up
-agent-browser -p ios screenshot mobile.png
-agent-browser -p ios close
-```
-
-## Observability Dashboard
-
-```bash
-agent-browser dashboard install
-agent-browser dashboard start # Background server on port 4848
-agent-browser dashboard stop
-```
-
-## Cloud Providers
-
-Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
-
-## Browser Engine Selection
-
-```bash
-agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
-```
-
-## Electron (LobeHub Desktop)
-
-### Setup / Teardown
-
-Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
-
-```bash
-SCRIPT=".agents/skills/local-testing/scripts/electron-dev.sh"
-
-# Start Electron dev with CDP (idempotent — skips if already running)
-$SCRIPT start
-
-# Check if Electron is running and CDP is reachable
-$SCRIPT status
-
-# Kill all Electron-related processes (main + helper + vite)
-$SCRIPT stop
-
-# Force fresh restart
-$SCRIPT restart
-```
-
-After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
-
-**Always run `$SCRIPT stop` when done testing** — `pkill -f "Electron"` alone won't catch all helper processes.
-
-#### Environment Variables
-
-| Variable          | Default                 | Description                              |
-| ----------------- | ----------------------- | ---------------------------------------- |
-| `CDP_PORT`        | `9222`                  | Chrome DevTools Protocol port            |
-| `ELECTRON_LOG`    | `/tmp/electron-dev.log` | Electron process log                     |
-| `ELECTRON_WAIT_S` | `60`                    | Max seconds to wait for Electron process |
-| `RENDERER_WAIT_S` | `60`                    | Max seconds to wait for SPA to load      |
-
-### LobeHub-Specific Patterns
-
-#### Access Zustand Store State
-
-```bash
-agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
-(function() {
-  var chat = window.__LOBE_STORES.chat();
-  var ops = Object.values(chat.operations);
-  return JSON.stringify({
-    ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
-    activeAgent: chat.activeAgentId,
-    activeTopic: chat.activeTopicId,
-  });
-})()
-EVALEOF
-```
-
-#### Find and Use the Chat Input
-
-```bash
-# The chat input is contenteditable — must use -C flag
-agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
-
-agent-browser --cdp 9222 click @e48
-agent-browser --cdp 9222 type @e48 "Hello world"
-agent-browser --cdp 9222 press Enter
-```
-
-#### Wait for Agent to Complete
-
-```bash
-agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
-(function() {
-  var chat = window.__LOBE_STORES.chat();
-  var ops = Object.values(chat.operations);
-  var running = ops.filter(function(o) { return o.status === 'running'; });
-  return running.length === 0 ? 'done' : 'running: ' + running.length;
-})()
-EVALEOF
-```
-
-#### Install Error Interceptor
-
-```bash
-agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
-(function() {
-  window.__CAPTURED_ERRORS = [];
-  var orig = console.error;
-  console.error = function() {
-    var msg = Array.from(arguments).map(function(a) {
-      if (a instanceof Error) return a.message;
-      return typeof a === 'object' ? JSON.stringify(a) : String(a);
-    }).join(' ');
-    window.__CAPTURED_ERRORS.push(msg);
-    orig.apply(console, arguments);
-  };
-  return 'installed';
-})()
-EVALEOF
-
-# Later, check captured errors:
-agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
-```
-
-## Chrome / Web Apps
-
-```bash
-/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
-  --remote-debugging-port=9222 \
-  --user-data-dir=/tmp/chrome-test-profile \
-  "<URL>" &
-sleep 5
-agent-browser --cdp 9222 snapshot -i
-
-# Or auto-discover running Chrome with remote debugging
-agent-browser --auto-connect snapshot -i
-```
-
---
-
-# Part 2: osascript (Native macOS App Bot Testing)
-
-Use AppleScript via `osascript` to control native macOS desktop apps for bot testing. Works with any app that supports macOS Accessibility, no CDP or Chromium needed.
-
-The pattern is the same for every platform:
-
-1. **Activate** the app (`tell application "X" to activate`)
-2. **Navigate** to a channel/chat (Quick Switcher `Cmd+K` or Search `Cmd+F`)
-3. **Send** a message (clipboard paste `Cmd+V` + Enter)
-4. **Wait** for the bot response
-5. **Screenshot** for verification (`screencapture` + `Read` tool)
-
-## Per-Platform References
-
-Pick the file for your target platform — each contains activation, navigation, send-message, and verification snippets specific to that app:
-
-Each channel has its own folder under `bot/<channel>/` containing an `index.md`
-(activation, navigation, send-message, and verification snippets specific to
-that app) and its test script:
-
-| Platform      | Reference                                        | Quick switcher |
-| ------------- | ------------------------------------------------ | -------------- |
-| Discord       | [bot/discord/index.md](./bot/discord/index.md)   | `Cmd+K`        |
-| Slack         | [bot/slack/index.md](./bot/slack/index.md)       | `Cmd+K`        |
-| Telegram      | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F`        |
-| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md)     | `Cmd+F`        |
-| Lark / 飞书   | [bot/lark/index.md](./bot/lark/index.md)         | `Cmd+K`        |
-| QQ            | [bot/qq/index.md](./bot/qq/index.md)             | `Cmd+F`        |
-
-For **shared osascript patterns** (activate, type, paste, screenshot, read accessibility, common workflow template, gotchas), see [bot/osascript-common.md](./bot/osascript-common.md). Read this first if you're new to osascript automation.
-
-## Bridge-based channels (no native app)
-
-Some channels have no native app to drive with osascript — they connect through
-a local bridge inside the Desktop app. These are tested with agent-browser
-(IPC + UI) plus the bridge's own HTTP/REST endpoints, not osascript:
-
-| Channel  | Reference                                        | What it drives                                           |
-| -------- | ------------------------------------------------ | -------------------------------------------------------- |
-| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | `imessageBridge.*` IPC + local bridge + BlueBubbles REST |
-
-For iMessage there is a one-shot regression script — see `test-imessage-bridge.sh` below.
-
---
-
-# Scripts
-
-**App / recording scripts** in `.agents/skills/local-testing/scripts/`:
-
-| Script                    | Usage                                               |
-| ------------------------- | --------------------------------------------------- |
-| `electron-dev.sh`         | Manage Electron dev env (start/stop/status/restart) |
-| `record-electron-demo.sh` | Record Electron app demo with ffmpeg                |
-| `record-app-screen.sh`    | Record app screen (video + screenshots, start/stop) |
-
-**Bot scripts** live under `.agents/skills/local-testing/bot/`, one folder per
-channel (alongside that channel's `index.md`). The shared
-`capture-app-window.sh` sits at the `bot/` root:
-
-| Script                             | Usage                                                               |
-| ---------------------------------- | ------------------------------------------------------------------- |
-| `capture-app-window.sh`            | Capture screenshot of a specific app window (used by bot tests)     |
-| `discord/test-discord-bot.sh`      | Send message to Discord bot via osascript                           |
-| `slack/test-slack-bot.sh`          | Send message to Slack bot via osascript                             |
-| `telegram/test-telegram-bot.sh`    | Send message to Telegram bot via osascript                          |
-| `wechat/test-wechat-bot.sh`        | Send message to WeChat bot via osascript                            |
-| `lark/test-lark-bot.sh`            | Send message to Lark / 飞书 bot via osascript                       |
-| `qq/test-qq-bot.sh`                | Send message to QQ bot via osascript                                |
-| `imessage/test-imessage-bridge.sh` | Regression-test the iMessage BlueBubbles bridge (IPC + HTTP)        |
-| `imessage/send-imessage-test.sh`   | Send one real iMessage (desktop → BB → iMessage) and verify it sent |
-
-### Window Screenshot Utility
-
-`capture-app-window.sh` captures a screenshot of a specific app window using `screencapture -l <windowID>`. It uses Swift + CGWindowList to find the window by process name, so screenshots work correctly even when the window is on an external monitor or behind other windows.
-
-```bash
-# Standalone usage
-./.agents/skills/local-testing/bot/capture-app-window.sh "Discord" /tmp/discord.png
-./.agents/skills/local-testing/bot/capture-app-window.sh "Slack" /tmp/slack.png
-./.agents/skills/local-testing/bot/capture-app-window.sh "WeChat" /tmp/wechat.png
-```
-
-All bot test scripts use this utility automatically for their screenshots.
-
-### Bot Test Scripts
-
-All bot test scripts share the same interface:
-
-```bash
-./.agents/skills/local-testing/bot/<platform>/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
-```
-
-Examples:
-
-```bash
-# Discord — test a bot in #bot-testing channel
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
-
-# Slack — test a bot in #bot-testing channel
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
-
-# Telegram — test a bot by username
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
-
-# WeChat — test a bot or send to a contact
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
-
-# Lark/飞书 — test a bot in a group chat
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
-
-# QQ — test a bot in a group or direct chat
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
-```
-
-Each script: activates the app, navigates to the channel/contact, pastes the message via clipboard, sends, waits, and takes a screenshot. Use the `Read` tool on the screenshot for visual verification.
-
-### iMessage bridge regression script
-
-`test-imessage-bridge.sh` does **not** follow the osascript bot interface — it
-drives the Desktop bridge's IPC + HTTP layers and asserts the result, then
-self-cleans. Needs BlueBubbles running and Electron up with CDP.
-
-```bash
-./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
-# defaults: bb_url=http://127.0.0.1:1234  cdp_port=9222 — exit 0 = all green
-```
-
-It guards the connect/configure flow (testConfig happy + reject paths, first-time
-`upsertConfig` save, bridge running + webhook registered, local-server secret
-enforcement). See [bot/imessage/index.md](./bot/imessage/index.md)
-for the full manual UI flow and known bugs.
-
----
-
-# Screen Recording
-
-Record automated demos using `record-app-screen.sh` (start/stop lifecycle, CDP screenshots + ffmpeg assembly). See [references/record-app-screen.md](references/record-app-screen.md) for full documentation.
-
-```bash
-./.agents/skills/local-testing/scripts/electron-dev.sh start
-./.agents/skills/local-testing/scripts/record-app-screen.sh start my-demo
-# ... run automation ...
-./.agents/skills/local-testing/scripts/record-app-screen.sh stop
-```
-
-Outputs to `.records/` directory (gitignored): `<name>.mp4` (video) + `<name>/` (screenshots every 3s).
-
----
-
-# Gotchas
-
-### agent-browser
-
-- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
-- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
-- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
-- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
-- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
-- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
-- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
-- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
-
-### Electron-specific
-
-- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
-- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
-- **Don't resize the Electron window after load** — resizing triggers full SPA reload
-- **Store is at `window.__LOBE_STORES`** not `window.__ZUSTAND_STORES__`
-
-### osascript
-
-See [bot/osascript-common.md](./bot/osascript-common.md#gotchas) for the full osascript gotchas list (accessibility permissions, `keystroke` non-ASCII issues, locale-specific app names, rate limiting, etc.).
@@ -1,110 +0,0 @@
-# Log `agent-browser` into a local LobeHub dev server
-
-`agent-browser --headed` on macOS often creates the Chromium window off-screen — the user can't see or interact with it, so manual login inside the agent-browser session fails. Instead of sharing the user's real Chrome profile, copy the **better-auth session cookie** out of a request in DevTools and inject it into the agent-browser session as a Playwright-style state file.
-
-## When to use
-
-- You need `agent-browser` to reach an authenticated page on `http://localhost:<port>` (e.g. `localhost:3011`).
-- The user already has a logged-in tab of the same dev server in their own Chrome.
-- Spawning a headed Chromium to let the user log in manually is unreliable (window off-screen, no interaction).
-
-Do **not** use this on production URLs — only local dev. Treat the cookie as a secret: don't paste it into shared logs, PRs, or commit it anywhere.
-
-## Step 1 — Ask the user to copy the cookie from a Network request, NOT `document.cookie`
-
-`document.cookie` will not return HttpOnly cookies, which is exactly where better-auth puts its session. Instruct the user:
-
-1. Open the logged-in tab (`http://localhost:<port>/…`) in their own Chrome.
-2. `Cmd+Option+I` → **Network** tab.
-3. Refresh, click any same-origin request (e.g. the top-level document request).
-4. In the right pane under **Request Headers**, right-click the `Cookie:` line → **Copy value** (or copy the entire header).
-5. Paste the string into chat.
-
-You only need the better-auth pieces. Everything else (Clerk, `LOBE_LOCALE`, HMR hash, theme vars) is noise; it can stay in the pasted string because the script in Step 2 filters it out. The minimum viable set is:
-
-```
-better-auth.session_token=<value>; better-auth.state=<value>
-```
-
-## Step 2 — Build a Playwright-style state file
-
-`agent-browser state load` expects Playwright's `storageState` format: a JSON object with a `cookies` array and an `origins` array.
-
-```bash
-cat > /tmp/mkstate.py << 'PY'
-import json, sys, time
-
-# Read the Cookie header from stdin (allows optional "Cookie: " prefix).
-raw = sys.stdin.read().strip()
-if raw.lower().startswith("cookie:"):
-    raw = raw.split(":", 1)[1].strip()
-
-# Keep only better-auth cookies. Extend this set if the app genuinely needs more.
-WANTED = {"better-auth.session_token", "better-auth.state"}
-
-cookies = []
-exp = int(time.time()) + 30 * 24 * 3600  # 30 days
-for pair in raw.split("; "):
-    if "=" not in pair:
-        continue
-    name, _, value = pair.partition("=")
-    if name not in WANTED:
-        continue
-    cookies.append({
-        "name": name,
-        "value": value,
-        "domain": "localhost",
-        "path": "/",
-        "expires": exp,
-        "httpOnly": False,
-        "secure": False,
-        "sameSite": "Lax",
-    })
-
-if not cookies:
-    sys.stderr.write("no better-auth cookies found in input\n")
-    sys.exit(1)
-
-print(json.dumps({"cookies": cookies, "origins": []}, indent=2))
-PY
-
-# Feed the copied Cookie header in via env var or heredoc.
-printf '%s' "$COOKIE_HEADER" | python3 /tmp/mkstate.py > /tmp/state.json
-```
-
-**Note on `httpOnly`**: the real cookie in the user's browser is HttpOnly, but `storageState` doesn't enforce the flag on load — it just attaches the value. Storing with `httpOnly: false` is fine for local dev and sidesteps a CDP-context quirk where HttpOnly cookies sometimes fail to attach.
-
-## Step 3 — Load state and navigate
-
-```bash
-SESSION="my-test" # any stable session name
-
-agent-browser --session "$SESSION" state load /tmp/state.json
-agent-browser --session "$SESSION" open "http://localhost:3011/"
-agent-browser --session "$SESSION" get url
-# Expect NOT /signin?callbackUrl=… — if you still see signin, cookie didn't apply.
-```
-
-## Step 4 — Verify
-
-```bash
-agent-browser --session "$SESSION" snapshot -i | head -20
-# Look for the user's avatar/name in the sidebar, or absence of the signin form.
-```
-
-## Common failure modes
-
-| Symptom                                         | Cause                                                                   | Fix                                                  |
-| ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------- |
-| Still redirects to `/signin` after `state load` | User pasted from `document.cookie` → missed HttpOnly session            | Re-pull from Network request Headers, not console    |
-| `state load` reports 0 cookies                  | Separator wrong, or user pasted URL-decoded value                       | Keep the raw `Cookie:` header as-is; split on `"; "` |
-| Login works briefly then expires                | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-load                                  |
-| Domain mismatch                                 | Cookie `domain` doesn't match the dev server host                       | Use `domain: "localhost"`, no leading dot            |
-
-## Scope
-
-Only covers authenticating an **agent-browser** session into a **local** LobeHub dev server. It does not:
-
-- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com` and must be delivered over HTTPS.
-- Replace real OAuth flows — tests that must exercise the login UI need a real Chromium with `--remote-debugging-port` or a bot account.
-- Flow cookies back to the user's Chrome — injection is one-way (into agent-browser only).
@@ -0,0 +1,69 @@
+---
+name: model-bank-metadata
+description: 'Backfill and maintain model-bank metadata (knowledgeCutoff, family, generation). Use when adding models, fixing cutoff/family data, running a metadata sweep across aiModels providers, or researching official knowledge cutoffs.'
+user-invocable: false
+---
+
+# Model-Bank Metadata (knowledgeCutoff / family / generation)
+
+How to populate and maintain the three structured metadata fields on `packages/model-bank/src/aiModels/*.ts` model cards, at single-model scale (new model PR) or repo-wide scale (sweep across \~80 provider files / \~1900 entries).
+
+## Field semantics
+
+| Field             | Format                                                                              | Meaning                                                                                                                                                                                 |
+| ----------------- | ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `knowledgeCutoff` | `'YYYY-MM'` (or `'YYYY'` if only the year is published)                             | World-knowledge cutoff. When a vendor distinguishes a **"reliable knowledge cutoff"** from the broader training-data cutoff (Anthropic does), always use the **reliable** one.          |
+| `family`          | lowercase slug (`claude`, `gpt`, `o-series`, `qwen`, `deepseek`, `llama`, `glm`, …) | Model lineage, finer than `organization`. Lets the UI group models and match the same model across aggregator providers.                                                                |
+| `generation`      | family slug + version (`claude-4.6`, `gpt-5.2`, `qwen3.5`, `llama-3.1`)             | Generation within the family. Only set when confidently derivable from the model line's naming. Rolling aliases (`qwen-max`, `deepseek-chat`, `gemini-flash-latest`) get `family` only. |
+
+All three are optional. **The cardinal rule: only fill what an authoritative source states or naming rules derive — never guess.** An empty field is correct for vendors that publish nothing.
+
+No DB migration is ever needed for these: builtin models are merged from model-bank at read time (`repositories/aiInfra/index.ts` spreads the whole card), so new card fields flow to the client automatically.
+
+## Sourcing rules for knowledgeCutoff
+
+Accept only:
+
+- Vendor official docs (platform.openai.com / developers.openai.com, docs.x.ai, ai.google.dev, docs.anthropic.com / platform.claude.com)
+- Official Hugging Face org model cards (huggingface.co/meta-llama/..., etc.)
+- Official tech reports / system cards / launch blog posts
+
+Reject:
+
+- **Third-party aggregator sites** (aiknowledgecutoff.com and similar) — proven to copy one model's value across a whole family. A Cohere sweep once claimed `2024-06` for four distinct base models; none of the cited Cohere pages said that, and the only cutoff Cohere actually publishes is Feb 2023 for the 08-2024 Command R/R+ refresh.
+- **AWS Bedrock model cards as sole source** — proven to conflate launch date with knowledge cutoff (DeepSeek R1's card lists both as "Jan 2025"). If Bedrock is the only place a value appears, leave the field empty.
+- Inference from `releasedAt` — a release date is not a cutoff.
+
+Variant inheritance: dated snapshots (`-2024-08-06`), speed/price tiers of the same checkpoint, quantizations (`-fp8`, `-awq`), context-length variants (`-32k`), ollama `:NNb` tags, and cloud-prefixed ids (`anthropic.`/`us.`/`global.` Bedrock ids) share their base model's cutoff. **Distills do not inherit** from teacher or base — use the distill's own published value or leave empty. **Sizes within one generation can genuinely differ**: Llama 3 8B is Mar 2023 while 70B is Dec 2023 (per Meta's own card) — don't "fix" that to one family-wide value.
+
+Vendors that publish no cutoffs (leave empty, don't chase): Qwen, DeepSeek, GLM/Zhipu, ERNIE, Doubao, Hunyuan, SenseNova, Spark, MiniMax, StepFun, Yi (mostly), Moonshot.
+
+Known per-vendor footguns:
+
+- **Anthropic**: Opus 4.6 reliable cutoff is `2025-05`, Sonnet 4.6 is `2025-08` — easy to swap. Claude 3.7 is `2024-10` (system card: trained through Nov 2024, knowledge cutoff end of Oct 2024). Cite system cards / the models overview, not the Help Center article (a living page that drops retired models — citation rot).
+- **xAI**: docs.x.ai has one blanket sentence covering grok-3/grok-4; mini variants are not named there. Grok 4.20/4.3 have no official cutoff anywhere.
+- **OpenAI**: per-model docs pages (developers.openai.com/api/docs/models/<id>) state cutoffs explicitly, including snapshot differences (gpt-4-1106-preview `2023-04` vs gpt-4-0125-preview `2023-12`).
+
+## family/generation derivation
+
+Rule-based, no research needed: `scripts/derive-family.ts` holds the per-family regex rules. Traps already encoded there — keep them when extending:
+
+- Date suffixes are not versions: `claude-sonnet-4-20250514` is generation `claude-4`, not `claude-4.2`.
+- Size suffixes are not versions: `llama-3-8b` → `llama-3` (not `llama-3.8`); `gemma-7b-it` is **gemma-1** (not gemma-7).
+- Vendor spelling variants: `qwen2p5` = qwen2.5, `llama-v3p1` = llama-3.1, ollama `:NNb` tags, Bedrock `us.`/`global.`/`anthropic.` prefixes.
+- `claude-X.0` normalizes to `claude-X`.
+- Fable/Mythos-class ids (`claude-fable-5`) don't match the opus/sonnet/haiku regex. They belong to the Mythos class, so set `family: 'claude-mythos'` and `generation: 'mythos-5'` manually (the launch page calls Fable 5 "the generally available Mythos-class model").
+
+## Repo-wide sweep workflow
+
+1. **Extract ids**: `bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts` → unique normalized chat-model ids (normalization = last path segment, lowercased). Non-chat types (image/video/embedding/tts) have no knowledge cutoff — skip them.
+2. **Research (multi-agent)**: chunk ids by family (≤50 per chunk) and fan out one research agent per chunk (Workflow tool), each returning `{id, cutoff, source}` with the sourcing rules above baked into the prompt, **plus** one adversarial verify agent per chunk that re-fetches cited sources and refutes unsupported claims. The verify pass is load-bearing: it caught the Cohere aggregator copy-paste and the AWS launch-date conflation.
+3. **Policy filter**: before applying, drop entries whose only source is a rejected category (check the returned `sources` map — e.g. drop everything sourced to aws.amazon.com).
+4. **Apply**: `bun scripts/apply-cutoffs.ts <map.json>` and `bun scripts/apply-family.ts <map.json>` (run from repo root). Both are idempotent codemods keyed on normalized id — aggregator providers get the same values automatically; entries that already have the field are skipped. They rely on the uniform prettier formatting of the data files (entries start `  {` / end `  },`, fields at 4-space indent).
+5. **Verify**: `cd packages/model-bank && bunx vitest run src/aiModels/__tests__/index.test.ts && bunx tsc --noEmit`.
+
+## Maintenance rules
+
+- **New model PRs** should fill all three fields inline, citing the official source in the PR body (see the Anthropic entries in `anthropic.ts` for reference values).
+- **After resolving merge conflicts** in model-bank data files, sanity-check that metadata didn't vanish: `git grep -c knowledgeCutoff -- 'packages/model-bank/src/aiModels/*.ts'` before vs after. A three-way stack of model PRs once silently dropped all 10 Anthropic cutoffs during conflict resolution.
+- Dirty ids exist in aggregator data (a sambanova id once carried a trailing tab). The codemods match ids verbatim — if a map key won't apply, check for invisible characters before assuming the model is missing.
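Review note: the id normalization and the invisible-character check from the last maintenance rule can be sketched in a few lines. `normalize` mirrors the one-liner the codemods in this PR use. `findDirtyKeys` is a hypothetical helper that is not part of the repository, and it assumes clean model ids are printable ASCII without spaces. The ids and cutoff values are made up for illustration.

```typescript
// Same normalization the codemods use: last path segment, lowercased.
const normalize = (id: string): string => id.split('/').pop()!.toLowerCase();

// Hypothetical helper (not in the repo): list map keys containing anything
// other than printable, non-space ASCII, e.g. tabs or zero-width characters.
// Assumption: a clean model id never contains such characters.
const findDirtyKeys = (map: Record<string, unknown>): string[] =>
  Object.keys(map).filter((key) => /[^\x21-\x7E]/.test(key));

// Illustrative data only; the ids and cutoffs are invented.
const sampleMap: Record<string, string> = {
  'example-model-7b': '2024-01',
  'example-model-70b\t': '2024-01', // trailing tab, invisible in most editors
};

console.log(normalize('Vendor/Example-Model-7B')); // example-model-7b

// JSON.stringify makes the offending character visible as an escape sequence.
const dirtyKeys = findDirtyKeys(sampleMap);
for (const key of dirtyKeys) console.log(JSON.stringify(key));
```

Running a check like this over a cutoff or family map before applying it would surface a key that can never match, instead of leaving it to show up later in the "unused map keys" output.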
@@ -0,0 +1,73 @@
+/**
+ * One-off codemod: apply a canonical { normalizedModelId: 'YYYY-MM' } map onto
+ * packages/model-bank/src/aiModels/*.ts, inserting `knowledgeCutoff` after the
+ * `id:` line of every chat-model entry that matches and doesn't already have one.
+ *
+ * Relies on the uniform prettier formatting of these files:
+ *   - each model entry starts with `  {` and ends with `  },` (2-space indent)
+ *   - fields are at 4-space indent: `    id: '...'`, `    type: 'chat'`
+ *
+ * Usage (from repo root, per SKILL.md): bun scripts/apply-cutoffs.ts <map.json>
+ */
+import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
+import { join } from 'node:path';
+
+const mapPath = process.argv[2];
+if (!mapPath) throw new Error('usage: bun apply-cutoffs.ts <map.json>');
+const map: Record<string, string> = JSON.parse(readFileSync(mapPath, 'utf8'));
+
+const dir = 'packages/model-bank/src/aiModels';
+const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
+
+let touchedFiles = 0;
+let inserted = 0;
+const matchedIds = new Set<string>();
+
+for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
+  const path = join(dir, file);
+  const lines = readFileSync(path, 'utf8').split('\n');
+  const out: string[] = [];
+  let changed = false;
+
+  let i = 0;
+  while (i < lines.length) {
+    if (lines[i] !== '  {') {
+      out.push(lines[i]);
+      i++;
+      continue;
+    }
+    // collect one model entry block
+    const start = i;
+    let end = i;
+    while (end < lines.length && lines[end] !== '  },') end++;
+    const block = lines.slice(start, end + 1);
+
+    const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
+    const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
+    const hasCutoff = block.some((l) => /^ {4}knowledgeCutoff:/.test(l));
+
+    if (idLineIdx >= 0 && isChat && !hasCutoff) {
+      const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
+      const norm = rawId ? normalize(rawId) : undefined;
+      const cutoff = norm ? map[norm] : undefined;
+      if (cutoff && /^\d{4}(?:-\d{2})?$/.test(cutoff)) {
+        block.splice(idLineIdx + 1, 0, `    knowledgeCutoff: '${cutoff}',`);
+        inserted++;
+        changed = true;
+        matchedIds.add(norm!);
+      }
+    }
+    out.push(...block);
+    i = end + 1;
+  }
+
+  if (changed) {
+    writeFileSync(path, out.join('\n'));
+    touchedFiles++;
+  }
+}
+
+console.log(`inserted ${inserted} knowledgeCutoff fields across ${touchedFiles} files`);
+console.log(`map ids used: ${matchedIds.size}/${Object.keys(map).length}`);
+const unused = Object.keys(map).filter((k) => !matchedIds.has(k));
+if (unused.length) console.log('unused map keys (first 20):', unused.slice(0, 20));
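Review note: the insertion logic in this codemod can be exercised without touching the filesystem. The following is a minimal sketch under the same prettier-layout assumption. `insertCutoffs` is a hypothetical name introduced here, not a function in the PR, and the sample ids and cutoff values are invented.

```typescript
const normalize = (id: string): string => id.split('/').pop()!.toLowerCase();

// Pure variant of the codemod loop: takes file text, returns patched text.
const insertCutoffs = (source: string, map: Record<string, string>): string => {
  const lines = source.split('\n');
  const out: string[] = [];
  let i = 0;
  while (i < lines.length) {
    if (lines[i] !== '  {') {
      out.push(lines[i]);
      i++;
      continue;
    }
    // Collect one model entry block: from `  {` through the next `  },`.
    let end = i;
    while (end < lines.length && lines[end] !== '  },') end++;
    const block = lines.slice(i, end + 1);

    const idIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
    const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
    const hasCutoff = block.some((l) => /^ {4}knowledgeCutoff:/.test(l));

    if (idIdx >= 0 && isChat && !hasCutoff) {
      const rawId = block[idIdx].match(/^ {4}id: '(.+)',$/)?.[1];
      const cutoff = rawId ? map[normalize(rawId)] : undefined;
      // Only accept 'YYYY' or 'YYYY-MM' values.
      if (cutoff && /^\d{4}(?:-\d{2})?$/.test(cutoff)) {
        block.splice(idIdx + 1, 0, `    knowledgeCutoff: '${cutoff}',`);
      }
    }
    out.push(...block);
    i = end + 1;
  }
  return out.join('\n');
};

// Illustrative input: one chat entry and one non-chat entry.
const sample = [
  'const models = [',
  '  {',
  "    id: 'vendor/Example-Chat',",
  "    type: 'chat',",
  '  },',
  '  {',
  "    id: 'example-image',",
  "    type: 'image',",
  '  },',
  '];',
].join('\n');

const map = { 'example-chat': '2024-01', 'example-image': '2024-01' };
const once = insertCutoffs(sample, map);
const twice = insertCutoffs(once, map);
console.log(once.includes("knowledgeCutoff: '2024-01'")); // true
console.log(once === twice); // true: a second run is a no-op
```

The second call returning identical text illustrates the idempotence that SKILL.md relies on: an entry that already carries `knowledgeCutoff` is skipped, and the non-chat entry is never touched even though its id is in the map.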
@@ -0,0 +1,49 @@
+import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
+import { join } from 'node:path';
+
+const map: Record<string, { family: string; generation?: string }> = JSON.parse(
+  readFileSync(process.argv[2] ?? '/tmp/family-map.json', 'utf8'), // map path from argv, else the legacy default
+);
+const dir = 'packages/model-bank/src/aiModels';
+const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
+
+let inserted = 0;
+let touchedFiles = 0;
+for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
+  const path = join(dir, file);
+  const lines = readFileSync(path, 'utf8').split('\n');
+  const out: string[] = [];
+  let changed = false;
+  let i = 0;
+  while (i < lines.length) {
+    if (lines[i] !== '  {') {
+      out.push(lines[i]);
+      i++;
+      continue;
+    }
+    let end = i;
+    while (end < lines.length && lines[end] !== '  },') end++;
+    const block = lines.slice(i, end + 1);
+    const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
+    const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
+    const hasFamily = block.some((l) => /^ {4}family:/.test(l));
+    if (idLineIdx >= 0 && isChat && !hasFamily) {
+      const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
+      const r = rawId ? map[normalize(rawId)] : undefined;
+      if (r) {
+        const add = [`    family: '${r.family}',`];
+        if (r.generation) add.push(`    generation: '${r.generation}',`);
+        block.splice(idLineIdx, 0, ...add);
+        inserted++;
+        changed = true;
+      }
+    }
+    out.push(...block);
+    i = end + 1;
+  }
+  if (changed) {
+    writeFileSync(path, out.join('\n'));
+    touchedFiles++;
+  }
+}
+console.log(`annotated ${inserted} model entries across ${touchedFiles} files`);
@@ -0,0 +1,237 @@
+/* eslint-disable regexp/no-unused-capturing-group */
+/**
+ * Rule-based derivation of { family, generation } from normalized model ids.
+ * Principle: only fill what is confidently derivable; otherwise omit.
+ *
+ * Usage: bun /tmp/derive-family.ts            # print distinct pairs for review
+ *        bun /tmp/derive-family.ts --emit     # write /tmp/family-map.json
+ */
+import { readFileSync, writeFileSync } from 'node:fs';
+
+const ids: string[] = JSON.parse(readFileSync('/tmp/model-ids.json', 'utf8'));
+
+type R = { family: string; generation?: string };
+
+const derive = (id: string): R | undefined => {
+  // strip cloud/bedrock prefixes for matching
+  const m = id.replace(/^(us\.|global\.|eu\.|apac\.)?(anthropic\.|meta\.|cohere\.|azure-)/, '');
+
+  // ---- anthropic ----
+  if (m.startsWith('claude')) {
+    // family = product-line tier (claude-opus/sonnet/haiku/instant); bare claude-2.x has no tier
+    const tier = m.match(/(opus|sonnet|haiku|instant)/)?.[1];
+    const family = tier ? `claude-${tier}` : 'claude';
+    let g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)[.-](\d)(?!\d)/); // claude-opus-4-8 / claude-haiku-4.5
+    if (g) return { family, generation: `claude-${g[1]}.${g[2]}` };
+    g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)(?!\d)/); // claude-opus-4
+    if (g) return { family, generation: `claude-${g[1]}` };
+    g = m.match(/^claude-(\d)[.-](\d)(?!\d)/); // claude-3-5-haiku / claude-3.7-sonnet / claude-2.1
+    if (g) return { family, generation: g[2] === '0' ? `claude-${g[1]}` : `claude-${g[1]}.${g[2]}` };
+    g = m.match(/^claude-(\d)(?!\d)/); // claude-3-haiku
+    if (g) return { family, generation: `claude-${g[1]}` };
+    if (m.startsWith('claude-instant')) return { family: 'claude-instant' };
+    if (/^claude-v?2/.test(m)) return { family: 'claude', generation: 'claude-2' };
+    return { family };
+  }
+
+  // ---- openai ----
+  // gpt-oss / gpt_oss, including ollama-style `gpt-oss:20b` tags
+  if (/^gpt[-_]oss/.test(m)) return { family: 'gpt-oss', generation: 'gpt-oss' };
+  if (/^(chatgpt-4o|gpt-4o)/.test(m)) return { family: 'gpt', generation: 'gpt-4o' };
+  if (/^gpt-(3\.5|35)/.test(m)) return { family: 'gpt', generation: 'gpt-3.5' };
+  if (m.startsWith('gpt-audio')) return { family: 'gpt', generation: 'gpt-audio' };
+  {
+    const g = m.match(/^gpt-(\d)\.(\d)/); // gpt-4.1 / gpt-5.2
+    if (g) return { family: 'gpt', generation: `gpt-${g[1]}.${g[2]}` };
+    const g2 = m.match(/^gpt-(\d)(?!\d)/); // gpt-4 / gpt-5
+    if (g2) return { family: 'gpt', generation: `gpt-${g2[1]}` };
+  }
+  {
+    const g = m.match(/^o([134])(-|$)/); // o1 / o3 / o4
+    if (g) return { family: 'o-series', generation: `o${g[1]}` };
+  }
+  if (/^(codex|computer-use-preview)/.test(m)) return { family: 'gpt' };
+
+  // ---- google ----
+  {
+    const g = m.match(/^gemini-(\d+(?:\.\d+)?)/);
+    if (g) return { family: 'gemini', generation: `gemini-${g[1]}` };
+    if (/^gemini-(pro|flash)/.test(m)) return { family: 'gemini' }; // rolling aliases
+    if (m.startsWith('gemma')) {
+      if (/^gemma-?\db/.test(m)) return { family: 'gemma', generation: 'gemma-1' };
+      const v = m.match(/^gemma-?(\d)(?!b)/);
+      return { family: 'gemma', generation: v ? `gemma-${v[1]}` : undefined };
+    }
+    if (/^(codegemma|learnlm|palm)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
+  }
+
+  // ---- qwen ----
+  if (m.startsWith('qwq')) return { family: 'qwen', generation: 'qwq' };
+  if (m.startsWith('qvq')) return { family: 'qwen', generation: 'qvq' };
+  if (m.startsWith('codeqwen')) return { family: 'qwen' };
+  if (m.startsWith('qwen')) {
+    // "p" as decimal point (qwen2p5 -> qwen2.5): checked first, so the generic
+    // patterns below never have to handle it.
+    const p = m.match(/^qwen(\d)p(\d)/);
+    if (p) return { family: 'qwen', generation: `qwen${p[1]}.${p[2]}` };
+
+    const g =
+      m.match(/^qwen-?([123](?:\.\d+)?)(?![0-9b])/) || // qwen3.5-plus / qwen-3-14b / qwen2-7b / qwen1.5
+      m.match(/^qwen([23](?:\.\d+)?):/); // qwen2.5:72b
+    if (g) return { family: 'qwen', generation: `qwen${g[1]}` };
+    return { family: 'qwen' }; // qwen-max/plus/turbo/vl rolling aliases
+  }
+
+  // ---- deepseek ----
+  if (/^(deepseek|azure-deepseek|pro-deepseek)/.test(m) || m.startsWith('deepseek_')) {
+    const s = m.replace(/^pro-/, '').replaceAll('_', '-');
+    if (s.startsWith('deepseek-r1-distill'))
+      return { family: 'deepseek', generation: 'deepseek-r1-distill' };
+    if (s.startsWith('deepseek-r1')) return { family: 'deepseek', generation: 'deepseek-r1' };
+    const g = s.match(/^deepseek-(?:chat-)?v(\d(?:\.\d)?)/);
+    if (g) return { family: 'deepseek', generation: `deepseek-v${g[1]}` };
+    if (/^deepseek-(coder-v2|coder)/.test(s))
+      return { family: 'deepseek', generation: 'deepseek-coder' };
+    return { family: 'deepseek' }; // deepseek-chat / reasoner rolling aliases
+  }
+
+  // ---- meta llama ----
+  if (m.startsWith('codellama')) return { family: 'llama', generation: 'codellama' };
+  if (/^(meta-)?llama|^l3(\d)?-|^llava/.test(m)) {
+    if (m.startsWith('llava')) return { family: 'llava' };
+    const s = m.replace(/^meta-/, '');
+    const g =
+      s.match(/^llama-?([234])(?:[.-](\d))?(?![0-9b])/) || // llama-3.1 / llama3.3 / llama-4
+      s.match(/^llama-?v([234])p?(\d)?/) || // llama-v3p1
+      s.match(/^llama([234])[.:-](\d)?/);
+    if (g) {
+      const gen = g[2] ? `llama-${g[1]}.${g[2]}` : `llama-${g[1]}`;
+      return { family: 'llama', generation: gen };
+    }
+    if (m.startsWith('l3-')) return { family: 'llama', generation: 'llama-3' };
+    if (m.startsWith('l31-')) return { family: 'llama', generation: 'llama-3.1' };
+    return { family: 'llama' };
+  }
+
+  // ---- zhipu ----
+  if (/^(zai-)?glm/.test(m)) {
+    const s = m.replace(/^zai-/, '');
+    if (s.startsWith('glm-z1')) return { family: 'glm', generation: 'glm-z1' };
+    if (s.startsWith('glm-zero')) return { family: 'glm', generation: 'glm-zero' };
+    const g = s.match(/^glm-(\d(?:\.\d)?)/);
+    if (g) return { family: 'glm', generation: `glm-${g[1]}` };
+    return { family: 'glm' };
+  }
+  if (/^(charglm|codegeex|emohaa)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
+
+  // ---- mistral ----
+  if (
+    /^(open-)?(mistral|mixtral|ministral|codestral|devstral|magistral|pixtral|mathstral|labs-devstral|labs-leanstral|open-codestral)/.test(
+      m,
+    )
+  ) {
+    const fam = m.replace(/^(open-|labs-)/, '').match(/^[a-z]+/)![0];
+    return { family: fam };
+  }
+
+  // ---- xai ----
+  if (m.startsWith('grok')) {
+    const g = m.match(/^grok-(\d(?:\.\d+)?)/);
+    return { family: 'grok', generation: g ? `grok-${g[1]}` : undefined };
+  }
+
+  // ---- moonshot ----
+  if (m.startsWith('kimi')) {
+    const g = m.match(/^kimi-k(\d(?:\.\d)?)/);
+    return { family: 'kimi', generation: g ? `kimi-k${g[1]}` : undefined };
+  }
+  if (m.startsWith('moonshot-kimi-k2')) return { family: 'kimi', generation: 'kimi-k2' };
+  if (m.startsWith('moonshot-v1')) return { family: 'kimi', generation: 'moonshot-v1' };
+
+  // ---- minimax ----
+  if (m.startsWith('minimax')) {
+    if (m.startsWith('minimax-text')) return { family: 'minimax', generation: 'minimax-text-01' };
+    const g = m.match(/^minimax-m(\d(?:\.\d)?)/);
+    return { family: 'minimax', generation: g ? `minimax-m${g[1]}` : undefined };
+  }
+  if (m.startsWith('abab')) return { family: 'minimax', generation: 'abab' };
+
+  // ---- baidu ----
+  if (m.startsWith('ernie')) {
+    if (m.startsWith('ernie-x1')) return { family: 'ernie', generation: 'ernie-x1' };
+    const g = m.match(/^ernie-(\d\.\d)/);
+    return { family: 'ernie', generation: g ? `ernie-${g[1]}` : undefined };
+  }
+  if (m.startsWith('qianfan')) return { family: 'qianfan' };
+
+  // ---- bytedance ----
+  if (m.startsWith('doubao')) {
+    const g = m.match(/^doubao-seed-(\d[.-]\d|\d)/) || m.match(/^doubao-(\d\.\d)/);
+    return { family: 'doubao', generation: g ? `doubao-${g[1].replace('-', '.')}` : undefined };
+  }
+  if (/^(seed-oss|skylark)/.test(m)) return { family: m.startsWith('seed') ? 'doubao' : 'skylark' };
+
+  // ---- tencent ----
+  if (m.startsWith('hunyuan')) {
+    const g = m.match(/^hunyuan-(\d\.\d)/);
+    return { family: 'hunyuan', generation: g ? `hunyuan-${g[1]}` : undefined };
+  }
+  if (m.startsWith('hy3')) return { family: 'hunyuan', generation: 'hunyuan-3' };
+
+  // ---- others (family only / simple version) ----
+  if (m.startsWith('yi-')) return { family: 'yi' };
+  if (/^(command|c4ai-command)/.test(m)) return { family: 'command' };
+  if (/^(aya|c4ai-aya)/.test(m)) return { family: 'aya' };
+  if (m.startsWith('phi')) {
+    const g = m.match(/^phi-?(\d(?:\.\d)?)/);
+    return { family: 'phi', generation: g ? `phi-${g[1]}` : undefined };
+  }
+  if (m.startsWith('wizardlm')) return { family: 'wizardlm' };
+  if (m.startsWith('step-')) {
+    const g = m.match(/^step-(?:r1|(\d(?:\.\d)?))/);
+    return { family: 'step', generation: g?.[1] ? `step-${g[1]}` : undefined };
+  }
+  if (/^(internlm|intern-)/.test(m)) return { family: 'intern' };
+  if (m.startsWith('internvl')) return { family: 'internvl' };
+  if (m.startsWith('baichuan')) {
+    const g = m.match(/^baichuan-?(m?\d)/);
+    return { family: 'baichuan', generation: g ? `baichuan-${g[1]}` : undefined };
+  }
+  if (/^(sensechat|sensenova)/.test(m)) return { family: 'sensenova' };
+  if (/^(spark|generalv|4\.0ultra)/.test(m)) return { family: 'spark' };
+  if (/^(360gpt|360zhinao)/.test(m)) return { family: '360zhinao' };
+  if (/^(jamba|ai21-jamba)/.test(m)) return { family: 'jamba' };
+  if (m.startsWith('sonar')) return { family: 'sonar' };
+  if (/^(nova-lite|nova-micro|nova-pro)/.test(m)) return { family: 'nova' };
+  if (/^(ling|ring)-/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
+  if (m.startsWith('longcat')) return { family: 'longcat' };
+  if (m.startsWith('mimo')) return { family: 'mimo' };
+  if (m.startsWith('taichu')) return { family: 'taichu' };
+  if (/^(hermes|nous-hermes)/.test(m)) return { family: 'hermes' };
+  if (m.startsWith('solar')) return { family: 'solar' };
+  if (m.startsWith('kat-coder')) return { family: 'kat-coder' };
+  if (m.startsWith('dbrx')) return { family: 'dbrx' };
+  if (m.startsWith('morph')) return { family: 'morph' };
+
+  return undefined;
+};
+
+const map: Record<string, R> = {};
+const pairs = new Map<string, number>();
+let derived = 0;
+for (const id of ids) {
+  const r = derive(id);
+  if (!r) continue;
+  derived++;
+  map[id] = r;
+  const key = `${r.family} :: ${r.generation ?? '—'}`;
+  pairs.set(key, (pairs.get(key) || 0) + 1);
+}
+
+console.log(`derived ${derived}/${ids.length}`);
+for (const [k, n] of [...pairs.entries()].sort()) console.log(String(n).padStart(4), k);
+
+if (process.argv.includes('--emit')) {
+  writeFileSync('/tmp/family-map.json', JSON.stringify(map, null, 1));
+  console.log('\nwritten /tmp/family-map.json');
+}
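
Review note: most branches of `derive` above follow one pattern: test a family prefix, then capture an optional version to build the generation. The sketch below isolates that pattern for two families, reusing the regexes from the hunk. `deriveSketch` and the sample ids are illustrative names, not part of the PR, and the input is assumed to be already normalized (lowercased, provider prefix stripped).

```typescript
type DerivedSketch = { family: string; generation?: string };

// Hypothetical stand-alone version of two branches of `derive`.
const deriveSketch = (m: string): DerivedSketch | undefined => {
  if (m.startsWith('grok')) {
    // Capture "4" or "4.1" directly after the "grok-" prefix.
    const g = m.match(/^grok-(\d(?:\.\d+)?)/);
    return { family: 'grok', generation: g ? `grok-${g[1]}` : undefined };
  }
  if (m.startsWith('doubao')) {
    // "doubao-seed-1-6" and "doubao-1.5" both carry a version; unify the separator to ".".
    const g = m.match(/^doubao-seed-(\d[.-]\d|\d)/) || m.match(/^doubao-(\d\.\d)/);
    return { family: 'doubao', generation: g ? `doubao-${g[1].replace('-', '.')}` : undefined };
  }
  return undefined; // no rule matched; the caller skips the id
};

const grok = deriveSketch('grok-4.1-fast'); // illustrative id
const doubao = deriveSketch('doubao-seed-1-6-flash'); // illustrative id
const alias = deriveSketch('grok-beta'); // family only, no version to capture
```

An id with no version segment still resolves to its family, which is why `generation` is optional in the result type.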
@@ -0,0 +1,23 @@
+/**
+ * Extract unique normalized chat-model ids from packages/model-bank/src/aiModels/*.ts.
+ * Normalization: last path segment, lowercased (matches the apply codemods).
+ *
+ * Usage (repo root): bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts [out.json]
+ * Default output: /tmp/model-ids.json
+ */
+import { readdirSync, writeFileSync } from 'node:fs';
+import { join, resolve } from 'node:path';
+
+const dir = resolve('packages/model-bank/src/aiModels');
+const out = process.argv[2] || '/tmp/model-ids.json';
+
+const ids = new Set<string>();
+for (const f of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
+  const mod = await import(join(dir, f));
+  for (const m of mod.default || []) {
+    if (!m?.id || m.type !== 'chat') continue;
+    ids.add(m.id.split('/').pop()!.toLowerCase());
+  }
+}
+writeFileSync(out, JSON.stringify([...ids].sort(), null, 1));
+console.log(`${ids.size} unique normalized chat ids -> ${out}`);
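
Review note: the normalization in the script above (last path segment, lowercased, de-duplicated through a `Set`, then sorted) can be sketched in isolation as follows. The helper names and sample ids are hypothetical and only illustrate the rule stated in the script's header comment.

```typescript
// Last path segment, lowercased: the same expression the script applies to each chat model id.
const normalizeModelId = (id: string): string => id.split('/').pop()!.toLowerCase();

// Collapse ids that differ only by provider prefix or casing, then sort for stable output.
const uniqueNormalizedIds = (ids: string[]): string[] =>
  [...new Set(ids.map(normalizeModelId))].sort();

const normalized = uniqueNormalizedIds(['OpenAI/GPT-4o', 'gpt-4o', 'meta/Llama-3.1-8B']);
```

Two providers that expose the same model under different prefixes therefore contribute a single entry to the output file.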
@@ -50,7 +50,7 @@ Common false positives (do NOT merge):
 - `db-migrations` vs `drizzle` — distinct workflows (migration files vs schema authoring).
 - `microcopy` vs `i18n` — content vs mechanics.
 - `agent-runtime-hooks` vs `agent-tracing` vs `agent-signal` — different surfaces of the agent system.
- `testing` vs `local-testing` vs `cli-backend-testing` — different test types.
+- `testing` vs `agent-testing` — different test types.

 ### 4 — Description format consistency

@@ -5,6 +5,18 @@ inputs:
  node-version:
    description: Node.js version
    required: true
+  cloud-repository:
+    description: Cloud repository to overlay for commercial desktop builds
+    required: false
+    default: lobehub/lobehub-cloud
+  cloud-ref:
+    description: Optional Cloud repository ref
+    required: false
+    default: ''
+  cloud-token:
+    description: GitHub token with permission to read the Cloud repository
+    required: false
+    default: ''

 runs:
  using: composite
@@ -14,9 +26,77 @@ runs:
      with:
        node-version: ${{ inputs.node-version }}

+    - name: Overlay Cloud repository for desktop build
+      if: inputs.cloud-token != ''
+      shell: bash
+      env:
+        CLOUD_CHECKOUT: ${{ runner.temp }}/lobehub-cloud
+        CLOUD_REF: ${{ inputs.cloud-ref }}
+        CLOUD_REPOSITORY: ${{ inputs.cloud-repository }}
+        CLOUD_ROOT: ${{ github.workspace }}/..
+        CLOUD_TOKEN: ${{ inputs.cloud-token }}
+      run: |
+        set -euo pipefail
+
+        cloud_root="$(cd "$GITHUB_WORKSPACE/.." && pwd)"
+        cloud_checkout="$RUNNER_TEMP/lobehub-cloud"
+
+        rm -rf "$cloud_checkout"
+
+        clone_args=(--depth 1)
+        if [ -n "$CLOUD_REF" ]; then
+          clone_args+=(--branch "$CLOUD_REF")
+        fi
+
+        git clone "${clone_args[@]}" "https://x-access-token:${CLOUD_TOKEN}@github.com/${CLOUD_REPOSITORY}.git" "$cloud_checkout"
+
+        node <<'NODE'
+        const fs = require('node:fs');
+        const path = require('node:path');
+
+        const source = process.env.CLOUD_CHECKOUT;
+        const target = process.env.CLOUD_ROOT;
+        const skip = new Set(['.git', 'lobehub', 'node_modules']);
+
+        const copy = (from, to) => {
+          const stat = fs.lstatSync(from);
+          if (stat.isSymbolicLink()) {
+            const link = fs.readlinkSync(from);
+            fs.rmSync(to, { force: true, recursive: true });
+            fs.symlinkSync(link, to);
+            return;
+          }
+
+          if (stat.isDirectory()) {
+            fs.mkdirSync(to, { recursive: true });
+            for (const entry of fs.readdirSync(from)) {
+              if (skip.has(entry)) continue;
+              copy(path.join(from, entry), path.join(to, entry));
+            }
+            return;
+          }
+
+          fs.mkdirSync(path.dirname(to), { recursive: true });
+          fs.copyFileSync(from, to);
+        };
+
+        for (const entry of fs.readdirSync(source)) {
+          if (skip.has(entry)) continue;
+          copy(path.join(source, entry), path.join(target, entry));
+        }
+        NODE
+
+        echo "CLOUD_DESKTOP=1" >> "$GITHUB_ENV"
+        echo "✅ Cloud repository overlaid at $cloud_root"
+
    - name: Install dependencies
      shell: bash
-      run: pnpm install --node-linker=hoisted
+      run: |
+        set -euo pipefail
+        if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
+          cd ..
+        fi
+        pnpm install --node-linker=hoisted

    # Remove the China electron mirror config; the official source is faster on GitHub Actions
    - name: Remove China electron mirror from .npmrc
@@ -31,4 +111,11 @@ runs:

    - name: Install deps on Desktop
      shell: bash
-      run: npm run install-isolated --prefix=./apps/desktop
+      run: |
+        set -euo pipefail
+        if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
+          cd ..
+          npm run install-isolated --prefix=./lobehub/apps/desktop
+        else
+          npm run install-isolated --prefix=./apps/desktop
+        fi
@@ -30,7 +30,7 @@ jobs:
            This issue is closed. If you have any questions, you can comment and reply.
      - name: Checkout repository
        if: github.event_name == 'pull_request_target' && github.event.pull_request.merged == true
-        uses: actions/checkout@v4
+        uses: actions/checkout@v6

      - name: Check if PR author is maintainer
        if: github.event.pull_request.merged == true
@@ -104,6 +104,7 @@ jobs:
      - name: Setup build environment
        uses: ./.github/actions/desktop-build-setup
        with:
+          cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
          node-version: ${{ env.NODE_VERSION }}

      - name: Set package version
@@ -172,6 +173,7 @@ jobs:
      - name: Setup build environment
        uses: ./.github/actions/desktop-build-setup
        with:
+          cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
          node-version: ${{ env.NODE_VERSION }}

      - name: Set package version
@@ -216,6 +218,7 @@ jobs:
      - name: Setup build environment
        uses: ./.github/actions/desktop-build-setup
        with:
+          cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
          node-version: ${{ env.NODE_VERSION }}

      - name: Set package version
@@ -54,7 +54,7 @@ jobs:
      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
-          node-version: 24.11.1
+          node-version: 24.16.0
          package-manager-cache: false

      # Main logic: determine the build version number
@@ -92,6 +92,7 @@ jobs:
      - name: Setup build environment
        uses: ./.github/actions/desktop-build-setup
        with:
+          cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
          node-version: 24.11.1

      # Set the version number in package.json
@@ -87,6 +87,7 @@ jobs:
      - name: Setup build environment
        uses: ./.github/actions/desktop-build-setup
        with:
+          cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
          node-version: ${{ env.NODE_VERSION }}

      - name: Set package version
@@ -223,6 +223,7 @@ jobs:
      - name: Setup build environment
        uses: ./.github/actions/desktop-build-setup
        with:
+          cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
          node-version: ${{ env.NODE_VERSION }}

      - name: Set package version
@@ -409,7 +410,7 @@ jobs:
      - uses: actions/checkout@v6

      - name: Delete old canary GitHub releases
-        uses: actions/github-script@v7
+        uses: actions/github-script@v8
        with:
          script: |
            const { data: releases } = await github.rest.repos.listReleases({
@@ -180,6 +180,7 @@ jobs:
      - name: Setup build environment
        uses: ./.github/actions/desktop-build-setup
        with:
+          cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
          node-version: ${{ env.NODE_VERSION }}

      - name: Set package version
@@ -28,7 +28,7 @@ jobs:
      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
-          node-version: 24.11.1
+          node-version: 24.16.0

      - name: Setup pnpm
        uses: pnpm/action-setup@v4
@@ -51,7 +51,7 @@ jobs:
      - name: Setup Node.js
        uses: actions/setup-node@v6
        with:
-          node-version: 24.11.1
+          node-version: 24.16.0
          registry-url: https://registry.npmjs.org

      - name: Setup pnpm
@@ -19,12 +19,6 @@ jobs:
    steps:
      - uses: actions/checkout@v6

-      - name: Clean issue notice
-        uses: actions-cool/issues-helper@e361abf610221f09495ad510cb1e69328d839e1c # v3.7.6
-        with:
-          actions: 'close-issues'
-          labels: '🚨 Sync Fail'
-
      - name: Sync upstream changes
        id: sync
        uses: aormsby/Fork-Sync-With-Upstream-action@v3.4
@@ -33,22 +27,4 @@ jobs:
          upstream_sync_branch: main
          target_sync_branch: main
          target_repo_token: ${{ secrets.GITHUB_TOKEN }} # automatically generated, no need to set
-          test_mode: false
-
-      - name: Sync check
-        if: failure()
-        uses: actions-cool/issues-helper@e361abf610221f09495ad510cb1e69328d839e1c # v3.7.6
-        with:
-          actions: 'create-issue'
-          title: '🚨 同步失败 | Sync Fail'
-          labels: '🚨 Sync Fail'
-          body: |
-            Due to a change in the workflow file of the [LobeChat][lobechat] upstream repository, GitHub has automatically suspended the scheduled automatic update. You need to manually sync your fork. Please refer to the detailed [Tutorial][tutorial-en-US] for instructions.
-
-            由于 [LobeChat][lobechat] 上游仓库的 workflow 文件变更，导致 GitHub 自动暂停了本次自动更新，你需要手动 Sync Fork 一次，请查看 [详细教程][tutorial-zh-CN]
-
-            ![](https://github-production-user-asset-6210df.s3.amazonaws.com/17870709/273954625-df80c890-0822-4ac2-95e6-c990785cbed5.png)
-
-            [lobechat]: https://github.com/lobehub/lobe-chat
-            [tutorial-zh-CN]: https://lobehub.com/zh/docs/self-hosting/advanced/upstream-sync
-            [tutorial-en-US]: https://lobehub.com/docs/self-hosting/advanced/upstream-sync
+          test_mode: false
@@ -32,7 +32,7 @@ jobs:
    runs-on: ubuntu-latest
    name: Test Packages
    env:
-      PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/python-interpreter @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/trpc @lobechat/app-config @lobechat/locales @lobechat/env @lobechat/builtin-tool-lobe-agent model-bank @lobechat/agent-gateway-client @lobechat/agent-manager-runtime @lobechat/device-gateway-client @lobechat/device-identity @lobechat/eval-dataset-parser @lobechat/eval-rubric @lobechat/fetch-sse @lobechat/heterogeneous-agents'
+      PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/trpc @lobechat/app-config @lobechat/locales @lobechat/env @lobechat/builtin-tool-lobe-agent model-bank @lobechat/agent-gateway-client @lobechat/agent-manager-runtime @lobechat/device-gateway-client @lobechat/device-identity @lobechat/eval-dataset-parser @lobechat/eval-rubric @lobechat/fetch-sse @lobechat/heterogeneous-agents'

    steps:
      - name: Checkout
@@ -90,11 +90,23 @@ jobs:
          for package in $PACKAGES; do
            dir="${package#@lobechat/}"
            if [ -f "./packages/$dir/coverage/lcov.info" ]; then
-              echo "Uploading coverage for $dir..."
+              flag="packages/$dir"
+
+              case "$dir" in
+                builtin-tool-*)
+                  flag="builtin-tools"
+                  ;;
+                locales|env|device-gateway-client)
+                  echo "Skipping Codecov upload for $dir."
+                  continue
+                  ;;
+              esac
+
+              echo "Uploading coverage for $dir as $flag..."
              ./codecov upload-coverage \
                $COMMON_ARGS \
                --file ./packages/$dir/coverage/lcov.info \
-                --flag packages/$dir \
+                --flag "$flag" \
                --disable-search
            fi
          done
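
Review note: the `case` statement added in this hunk decides, per package, which Codecov flag to upload under, or whether to skip the upload. A minimal TypeScript restatement of that mapping, for readers who want to check it against the `PACKAGES` list, might look like this. `codecovFlagFor` is a hypothetical name, and the sketch ignores the `lcov.info` existence check that the shell loop performs first.

```typescript
// Packages whose coverage is intentionally not uploaded (taken from the case arm in the hunk).
const SKIPPED_DIRS = new Set(['locales', 'env', 'device-gateway-client']);

// Returns the Codecov flag for a package, or null when the upload is skipped.
const codecovFlagFor = (pkg: string): string | null => {
  const dir = pkg.replace(/^@lobechat\//, ''); // same effect as ${package#@lobechat/}
  if (dir.startsWith('builtin-tool-')) return 'builtin-tools'; // all builtin tools share one flag
  if (SKIPPED_DIRS.has(dir)) return null;
  return `packages/${dir}`;
};

const flags = ['@lobechat/builtin-tool-lobe-agent', '@lobechat/env', 'model-bank'].map(codecovFlagFor);
```

Grouping every `builtin-tool-*` package under one flag keeps the number of distinct flags bounded as more builtin tools are added.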
@@ -105,8 +117,8 @@ jobs:
    if: needs.check-duplicate-run.outputs.should_skip != 'true'
    strategy:
      matrix:
-        shard: [1, 2, 3]
-    name: Test App (shard ${{ matrix.shard }}/3)
+        shard: [1, 2]
+    name: Test App (shard ${{ matrix.shard }}/2)
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
@@ -126,7 +138,7 @@ jobs:
        run: pnpm install

      - name: Run tests
-        run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/3
+        run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/2 --exclude '**/apps/server/**'

      - name: Upload blob report
        if: ${{ !cancelled() }}
@@ -219,6 +231,40 @@ jobs:
          files: ./apps/desktop/coverage/lcov.info
          flags: desktop

+  test-server:
+    needs: check-duplicate-run
+    if: needs.check-duplicate-run.outputs.should_skip != 'true'
+    name: Test Server
+
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout
+        env:
+          REF_SHA: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
+          REPOSITORY: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name || github.repository }}
+        run: |
+          git init .
+          git remote add origin "https://github.com/${REPOSITORY}.git"
+          git fetch --no-tags --depth=1 origin "${REF_SHA}"
+          git checkout --force FETCH_HEAD
+
+      - name: Setup environment
+        uses: ./.github/actions/setup-env
+
+      - name: Install deps
+        run: pnpm install
+
+      - name: Test Server Coverage
+        run: bunx vitest --coverage --silent='passed-only' --reporter=default --coverage.reportsDirectory=./apps/server/coverage --dir apps/server
+
+      - name: Upload Server coverage to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          token: ${{ secrets.CODECOV_TOKEN }}
+          files: ./apps/server/coverage/lcov.info
+          flags: server
+
  test-databsae:
    needs: check-duplicate-run
    if: needs.check-duplicate-run.outputs.should_skip != 'true'
@@ -59,6 +59,7 @@ bun.lockb
 # Build outputs
 dist/
 public/_spa/
+public/_spa-auth/
 public/spa/
 es/
 lib/
@@ -92,10 +93,8 @@ public/swe-worker*

 # Generated files
 src/app/spa/[variants]/[[...path]]/spaHtmlTemplates.ts
+src/app/spa-auth/authHtmlTemplate.ts
 public/*.js
-public/sitemap.xml
-public/sitemap-index.xml
-sitemap*.xml
 robots.txt

 # Git hooks
@@ -29,13 +29,14 @@
  },
  "devDependencies": {
    "@lobechat/agent-gateway-client": "workspace:*",
+    "@lobechat/device-control": "workspace:*",
    "@lobechat/device-gateway-client": "workspace:*",
    "@lobechat/device-identity": "workspace:*",
    "@lobechat/heterogeneous-agents": "workspace:*",
    "@lobechat/local-file-shell": "workspace:*",
    "@lobechat/tool-runtime": "workspace:*",
    "@trpc/client": "^11.8.1",
-    "@types/node": "^22.13.5",
+    "@types/node": "^24.13.2",
    "@types/ws": "^8.18.1",
    "commander": "^13.1.0",
    "dayjs": "^1.11.19",
@@ -1,5 +1,6 @@
 packages:
  - '../../packages/agent-gateway-client'
+  - '../../packages/device-control'
  - '../../packages/device-gateway-client'
  - '../../packages/device-identity'
  - '../../packages/heterogeneous-agents'
@@ -2,9 +2,16 @@ import fs from 'node:fs';
 import os from 'node:os';
 import path from 'node:path';

+import {
+  defaultGetLocalFilePreview,
+  defaultGetProjectFileIndex,
+  type DeviceControlDeps,
+  executeDeviceRpc,
+} from '@lobechat/device-control';
 import type {
  AgentRunRequestMessage,
  DeviceSystemInfo,
+  RpcRequestMessage,
  SystemInfoRequestMessage,
  ToolCallRequestMessage,
 } from '@lobechat/device-gateway-client';
@@ -262,19 +269,23 @@ async function runConnect(options: ConnectOptions, isDaemonChild: boolean) {

  // Handle tool call requests
  client.on('tool_call_request', async (request: ToolCallRequestMessage) => {
-    const { requestId, timeout, toolCall } = request;
+    const { operationId, requestId, timeout, toolCall } = request;
    if (isDaemonChild) {
-      appendLog(`[TOOL] ${toolCall.apiName} (${requestId})`);
+      appendLog(
+        `[TOOL] ${toolCall.apiName}${operationId ? ` op=${operationId}` : ''} (${requestId})`,
+      );
    } else {
-      log.toolCall(toolCall.apiName, requestId, toolCall.arguments);
+      log.toolCall(toolCall.apiName, requestId, toolCall.arguments, operationId);
    }

    const result = await executeToolCall(toolCall.apiName, toolCall.arguments, timeout);

    if (isDaemonChild) {
-      appendLog(`[RESULT] ${result.success ? 'OK' : 'FAIL'} (${requestId})`);
+      appendLog(
+        `[RESULT] ${result.success ? 'OK' : 'FAIL'}${operationId ? ` op=${operationId}` : ''} (${requestId})`,
+      );
    } else {
-      log.toolResult(requestId, result.success, result.content);
+      log.toolResult(requestId, result.success, result.content, operationId);
    }

    client.sendToolCallResponse({
@@ -288,6 +299,31 @@ async function runConnect(options: ConnectOptions, isDaemonChild: boolean) {
    });
  });

+  // Handle generic server-internal device RPCs (git / workspace / file ops).
+  // Shares the `@lobechat/device-control` dispatcher with the desktop app so the
+  // CLI exposes the same remote-device control surface. File preview / index use
+  // the package's portable defaults (no preview-protocol approval on the CLI).
+  const deviceControlDeps: DeviceControlDeps = {
+    getLocalFilePreview: defaultGetLocalFilePreview,
+    getProjectFileIndex: defaultGetProjectFileIndex,
+  };
+
+  client.on('rpc_request', async (request: RpcRequestMessage) => {
+    const { method, params, requestId } = request;
+    if (isDaemonChild) appendLog(`[RPC] ${method} (${requestId})`);
+    else info(`Received rpc_request: method=${method} (${requestId})`);
+
+    try {
+      const data = await executeDeviceRpc(method, params, deviceControlDeps);
+      client.sendRpcResponse({ requestId, result: { data, success: true } });
+    } catch (err) {
+      const message = err instanceof Error ? err.message : String(err);
+      if (isDaemonChild) appendLog(`[RPC ERROR] ${method}: ${message} (${requestId})`);
+      else error(`rpc_request method=${method} failed: ${message}`);
+      client.sendRpcResponse({ requestId, result: { error: message, success: false } });
+    }
+  });
+
  // Handle gateway-dispatched agent runs (heterogeneous agents, e.g. Claude
  // Code). Mirrors the desktop app: spawn `lh hetero exec`, which owns the full
  // execution + server-ingest pipeline. Ack with the spawn outcome — `accepted`
@@ -302,6 +338,7 @@ async function runConnect(options: ConnectOptions, isDaemonChild: boolean) {
        {
          agentType: request.agentType,
          cwd: request.cwd,
+          imageList: request.imageList,
          jwt: request.jwt,
          operationId: request.operationId,
          prompt: request.prompt,
@@ -650,7 +650,7 @@ describe('hetero exec command', () => {
  });

  it('resets the per-message text accumulator at message boundaries (no cross-message duplication)', async () => {
-    // LOBE-10157 Bug 3: the `replace` snapshot accumulator must not span
+    // The `replace` snapshot accumulator must not span
    // message boundaries. Two assistant messages separated by a
    // stream_end/stream_start boundary must each snapshot only their OWN
    // text — otherwise the second message re-emits the first's text verbatim.
@@ -261,7 +261,7 @@ class SerialServerIngester {
    // adapter's `openMainMessage`) must reset it — otherwise it spans the
    // whole run and every later message's snapshot re-emits all prior
    // messages' text verbatim, which the server then persists into the new
-    // DB message (LOBE-10157 Bug 3: cross-message text duplication). Reset
+    // DB message: cross-message text duplication. Reset
    // AFTER flushing the just-ended message's pending snapshot above.
    if (event.type === 'stream_start' || event.type === 'stream_end') {
      this.accumulatedText = '';
@@ -122,4 +122,24 @@ describe('spawnHeteroAgentRun', () => {
      ]),
    );
  });
+
+  it('appends image blocks to stdin when imageList is provided', async () => {
+    const child = makeFakeChild();
+    spawnMock.mockReturnValue(child);
+
+    const ackPromise = spawnHeteroAgentRun({
+      ...baseParams,
+      imageList: [{ id: 'file-1', url: 'https://signed/a.png' }],
+      prompt: 'look at this',
+    });
+    child.emit('spawn');
+    await ackPromise;
+
+    expect(child.stdin.write).toHaveBeenCalledWith(
+      JSON.stringify([
+        { text: 'look at this', type: 'text' },
+        { source: { id: 'file-1', type: 'url', url: 'https://signed/a.png' }, type: 'image' },
+      ]),
+    );
+  });
 });
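
Review note: the test above pins down the stdin payload shape when `imageList` is present. The real `buildHeteroExecStdinPayload` lives in `@lobechat/heterogeneous-agents/protocol` and is not shown in this diff, so the following is only a sketch of one implementation that would satisfy the assertion and the plain-string fallback of the code it replaces. The name `buildStdinPayloadSketch` and the `ImageRefSketch` type are assumptions.

```typescript
interface ImageRefSketch {
  id: string;
  url: string;
}

type ContentBlockSketch =
  | { text: string; type: 'text' }
  | { source: { id: string; type: 'url'; url: string }; type: 'image' };

// Hypothetical stand-in for buildHeteroExecStdinPayload; not the package's actual code.
const buildStdinPayloadSketch = (params: {
  imageList?: ImageRefSketch[];
  prompt: string;
  systemContext?: string;
}): string => {
  const { imageList, prompt, systemContext } = params;
  const images = imageList ?? [];

  // Without context or images, keep the plain JSON string form.
  if (!systemContext && images.length === 0) return JSON.stringify(prompt);

  const blocks: ContentBlockSketch[] = [];
  if (systemContext) blocks.push({ text: systemContext, type: 'text' }); // context block first
  blocks.push({ text: prompt, type: 'text' }); // then the user's prompt
  for (const image of images) {
    // then one image block per attachment
    blocks.push({ source: { id: image.id, type: 'url', url: image.url }, type: 'image' });
  }
  return JSON.stringify(blocks);
};

const withImage = buildStdinPayloadSketch({
  imageList: [{ id: 'file-1', url: 'https://signed/a.png' }],
  prompt: 'look at this',
});
```

Key order inside each block matters here only because the test compares serialized strings; a consumer that parses the JSON would not depend on it.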
@@ -1,8 +1,15 @@
 import { spawn } from 'node:child_process';

+import {
+  buildHeteroExecStdinPayload,
+  type HeteroExecImageRef,
+} from '@lobechat/heterogeneous-agents/protocol';
+
 export interface SpawnHeteroAgentRunParams {
  agentType: string;
  cwd?: string;
+  /** Image attachments (signed URLs) appended as image content blocks. */
+  imageList?: HeteroExecImageRef[];
  jwt: string;
  operationId: string;
  prompt: string;
@@ -46,6 +53,7 @@ export function spawnHeteroAgentRun(
  const {
    agentType,
    cwd,
+    imageList,
    jwt,
    operationId,
    prompt,
@@ -77,15 +85,11 @@ export function spawnHeteroAgentRun(
    ...(resumeSessionId ? ['--resume', resumeSessionId] : []),
  ];

-  // With systemContext, send a content-block array so the agent sees the
-  // context block first, then the user's actual prompt — mirrors the desktop
-  // path. `lh hetero exec` coerces both shapes via coerceJsonPrompt.
-  const stdinPayload = systemContext
-    ? JSON.stringify([
-        { text: systemContext, type: 'text' },
-        { text: prompt, type: 'text' },
-      ])
-    : JSON.stringify(prompt);
+  // systemContext / image attachments turn the payload into a content-block
+  // array: context block first, then the user's prompt, then images — mirrors
+  // the desktop path. `lh hetero exec` coerces both shapes via
+  // coerceJsonPrompt.
+  const stdinPayload = buildHeteroExecStdinPayload({ imageList, prompt, systemContext });

  return new Promise<AgentRunAckResult>((resolve) => {
    let settled = false;
@@ -1,4 +1,3 @@
-/* eslint-disable no-console */
 import pc from 'picocolors';

 let verbose = false;
@@ -41,18 +40,20 @@ export const log = {
    console.log(`${timestamp()} ${pc.bold('[STATUS]')} ${color(status)}`);
  },

-  toolCall: (apiName: string, requestId: string, args?: string) => {
+  toolCall: (apiName: string, requestId: string, args?: string, operationId?: string) => {
    console.log(
-      `${timestamp()} ${pc.magenta('[TOOL]')} ${pc.bold(apiName)} ${pc.dim(`(${requestId})`)}`,
+      `${timestamp()} ${pc.magenta('[TOOL]')} ${pc.bold(apiName)}${operationId ? ` ${pc.dim(`op=${operationId}`)}` : ''} ${pc.dim(`(${requestId})`)}`,
    );
    if (args && verbose) {
      console.log(`  ${pc.dim(args)}`);
    }
  },

-  toolResult: (requestId: string, success: boolean, content?: string) => {
+  toolResult: (requestId: string, success: boolean, content?: string, operationId?: string) => {
    const icon = success ? pc.green('OK') : pc.red('FAIL');
-    console.log(`${timestamp()} ${pc.magenta('[RESULT]')} ${icon} ${pc.dim(`(${requestId})`)}`);
+    console.log(
+      `${timestamp()} ${pc.magenta('[RESULT]')} ${icon}${operationId ? ` ${pc.dim(`op=${operationId}`)}` : ''} ${pc.dim(`(${requestId})`)}`,
+    );
    if (content && verbose) {
      const preview = content.length > 200 ? content.slice(0, 200) + '...' : content;
      console.log(`  ${pc.dim(preview)}`);
@@ -6,6 +6,7 @@ import dotenv from 'dotenv';
 import { defineConfig } from 'electron-vite';
 import type { PluginOption, ViteDevServer } from 'vite';
 import { loadEnv } from 'vite';
+import tsconfigPaths from 'vite-tsconfig-paths';

 import {
  sharedOptimizeDeps,
@@ -88,10 +89,112 @@ function electronDesktopHtmlPlugin(): PluginOption {
  };
 }

+const CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG = '__LOBECLOUD_DESKTOP_BUSINESS_FEATURES__';
+const BUSINESS_CONST_MODULE_ID = '@lobechat/business-const';
+const CLOUD_BUSINESS_CONST_MODULE_ID = '@cloud/business-const';
+const DYNAMIC_BUSINESS_CONST_QUERY = '?lobe-cloud-desktop-business-const';
+
+const createBusinessFeaturesBootstrapScript = () =>
+  `globalThis[${JSON.stringify(CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG)}] = true;`;
+
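+// Rewrites `export const|let|var <name> = ...;` into a mutable `export let` with the given
+// initializer. `replaced` reports whether the export was found in the source.
+const replaceBusinessFlagExport = (code: string, name: string, initializer: string) => {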
+  const pattern = new RegExp(`export\\s+(?:const|let|var)\\s+${name}\\s*=\\s*[\\s\\S]*?;`);
+
+  return {
+    code: code.replace(pattern, `export let ${name} = ${initializer};`),
+    replaced: pattern.test(code),
+  };
+};
+
+const injectDynamicBusinessFeatureFlag = (code: string) => {
+  const businessFlag = replaceBusinessFlagExport(
+    code,
+    'ENABLE_BUSINESS_FEATURES',
+    `Boolean(globalThis['${CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG}'])`,
+  );
+  const topicLinkFlag = replaceBusinessFlagExport(
+    businessFlag.code,
+    'ENABLE_TOPIC_LINK_SHARE',
+    'ENABLE_BUSINESS_FEATURES',
+  );
+
+  if (!businessFlag.replaced) {
+    throw new Error('Cannot find ENABLE_BUSINESS_FEATURES export in @cloud/business-const');
+  }
+
+  const topicLinkAssignment = topicLinkFlag.replaced
+    ? '\n  ENABLE_TOPIC_LINK_SHARE = enabled;'
+    : '';
+
+  return `${topicLinkFlag.code}
+
+const __lobeCloudDesktopBusinessFeaturesFlagKey = '${CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG}';
+const __lobeCloudDesktopApplyBusinessFeaturesFlag = (value) => {
+  const enabled = Boolean(value);
+  ENABLE_BUSINESS_FEATURES = enabled;${topicLinkAssignment}
+  return enabled;
+};
+
+const __lobeCloudDesktopExistingDescriptor = Object.getOwnPropertyDescriptor(
+  globalThis,
+  __lobeCloudDesktopBusinessFeaturesFlagKey,
+);
+const __lobeCloudDesktopInitialValue = __lobeCloudDesktopExistingDescriptor?.get
+  ? __lobeCloudDesktopExistingDescriptor.get.call(globalThis)
+  : globalThis[__lobeCloudDesktopBusinessFeaturesFlagKey];
+
+Object.defineProperty(globalThis, __lobeCloudDesktopBusinessFeaturesFlagKey, {
+  configurable: true,
+  get() {
+    return ENABLE_BUSINESS_FEATURES;
+  },
+  set(value) {
+    __lobeCloudDesktopApplyBusinessFeaturesFlag(value);
+  },
+});
+
+__lobeCloudDesktopApplyBusinessFeaturesFlag(__lobeCloudDesktopInitialValue);
+`;
+};
+
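+// Redirects `@lobechat/business-const` to `@cloud/business-const`, loads that module with
+// runtime-switchable feature flags, and injects a bootstrap script that turns the flag on.
+function cloudDesktopBusinessConstPlugin(): PluginOption {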
+  return {
+    enforce: 'pre',
+    async resolveId(id, importer) {
+      if (id !== BUSINESS_CONST_MODULE_ID) return;
+
+      const resolved = await this.resolve(CLOUD_BUSINESS_CONST_MODULE_ID, importer, {
+        skipSelf: true,
+      });
+      if (!resolved) throw new Error(`Cannot resolve ${CLOUD_BUSINESS_CONST_MODULE_ID}`);
+
+      return `${resolved.id}${DYNAMIC_BUSINESS_CONST_QUERY}`;
+    },
+    load(id) {
+      if (!id.endsWith(DYNAMIC_BUSINESS_CONST_QUERY)) return;
+
+      const sourcePath = id.slice(0, -DYNAMIC_BUSINESS_CONST_QUERY.length);
+      return injectDynamicBusinessFeatureFlag(readFileSync(sourcePath, 'utf8'));
+    },
+    name: 'lobe-cloud-desktop-business-const',
+    transformIndexHtml() {
+      return [
+        {
+          children: createBusinessFeaturesBootstrapScript(),
+          injectTo: 'head-prepend',
+          tag: 'script',
+        },
+      ];
+    },
+  };
+}
+
 dotenv.config();

 const isDev = process.env.NODE_ENV === 'development';
 const ROOT_DIR = path.resolve(__dirname, '../..');
+const CLOUD_ROOT_DIR = path.resolve(__dirname, '../../..');
+const isCloudDesktopBuild = process.env.CLOUD_DESKTOP === '1';
 const mode = process.env.NODE_ENV === 'production' ? 'production' : 'development';

 Object.assign(process.env, loadEnv(mode, ROOT_DIR, ''));
@@ -105,8 +208,17 @@ const mainProcessRuntimeExternals = [
  ...externalRuntimeModules,
  'node-mac-permissions',
 ];
+const externalNavigationHosts =
+  process.env.DESKTOP_EXTERNAL_NAVIGATION_HOSTS ?? (isCloudDesktopBuild ? 'stripe.com' : '');

 console.info(`[electron-vite.config.ts] Detected UPDATE_CHANNEL: ${updateChannel}`);
+console.info(`[electron-vite.config.ts] Cloud desktop build: ${isCloudDesktopBuild}`);
+
+const cloudTsconfigPathsPlugin = () =>
+  ({
+    ...tsconfigPaths({ projects: [path.resolve(CLOUD_ROOT_DIR, 'tsconfig.json')] }),
+    name: 'lobe-cloud-desktop-tsconfig-paths',
+  }) satisfies PluginOption;

 export default defineConfig({
  main: {
@@ -169,6 +281,7 @@ export default defineConfig({
      sourcemap: isDev ? 'inline' : false,
    },
    define: {
+      'process.env.DESKTOP_EXTERNAL_NAVIGATION_HOSTS': JSON.stringify(externalNavigationHosts),
      'process.env.UPDATE_CHANNEL': JSON.stringify(process.env.UPDATE_CHANNEL),
      'process.env.UPDATE_SERVER_URL': JSON.stringify(process.env.UPDATE_SERVER_URL),
    },
@@ -214,6 +327,8 @@ export default defineConfig({
    },
    optimizeDeps: sharedOptimizeDeps,
    plugins: [
+      isCloudDesktopBuild && cloudTsconfigPathsPlugin(),
+      isCloudDesktopBuild && cloudDesktopBusinessConstPlugin(),
      forceAbsoluteBasePlugin(),
      electronDesktopHtmlPlugin(),
      vanillaExtractPlugin(),
@@ -221,7 +336,7 @@ export default defineConfig({
    ],
    resolve: {
      dedupe: ['react', 'react-dom'],
-      tsconfigPaths: true,
+      tsconfigPaths: !isCloudDesktopBuild,
    },
    // In dev the BrowserWindow loads `app://renderer/` and the Electron main process
    // proxies non-backend requests to this Vite dev server via `net.fetch`. The HMR
@@ -56,6 +56,7 @@
    "@electron-toolkit/utils": "^4.0.0",
    "@lobechat/chat-adapter-imessage": "workspace:*",
    "@lobechat/desktop-bridge": "workspace:*",
+    "@lobechat/device-control": "workspace:*",
    "@lobechat/device-gateway-client": "workspace:*",
    "@lobechat/device-identity": "workspace:*",
    "@lobechat/electron-client-ipc": "workspace:*",
@@ -111,7 +112,7 @@
    "undici": "^7.16.0",
    "uuid": "^14.0.0",
    "vite": "8.0.14",
-    "vitest": "3.2.4",
+    "vitest": "3.2.6",
    "zod": "^3.25.76"
  },
  "optionalDependencies": {
@@ -128,7 +129,7 @@
      "node-gyp": "^12.4.0",
      "react": "19.2.4",
      "react-dom": "19.2.4",
-      "vitest": "3.2.4"
+      "vitest": "3.2.6"
    }
  }
 }
@@ -8,6 +8,7 @@ packages:
  - '../../packages/electron-client-ipc'
  - '../../packages/file-loaders'
  - '../../packages/desktop-bridge'
+  - '../../packages/device-control'
  - '../../packages/device-gateway-client'
  - '../../packages/device-identity'
  - '../../packages/local-file-shell'
@@ -7,6 +7,7 @@ import { getDesktopEnv } from '@/env';
 export const isDev = electronIs.dev();

 export const OFFICIAL_CLOUD_SERVER = getDesktopEnv().OFFICIAL_CLOUD_SERVER;
+export const DESKTOP_EXTERNAL_NAVIGATION_HOSTS = getDesktopEnv().DESKTOP_EXTERNAL_NAVIGATION_HOSTS;

 export const isMac = electronIs.macOS();
 export const isWindows = electronIs.windows();
@@ -91,6 +91,13 @@ export default class BrowserWindowsCtr extends ControllerModule {
    });
  }

+  @IpcMethod()
+  isWindowFullScreen() {
+    return this.withSenderIdentifier((identifier) => {
+      return this.app.browserManager.isWindowFullScreen(identifier);
+    });
+  }
+
  @IpcMethod()
  setWindowAlwaysOnTop(flag: boolean) {
    this.withSenderIdentifier((identifier) => {
@@ -3,6 +3,7 @@ import fs from 'node:fs';
 import os from 'node:os';
 import path from 'node:path';

+import { type DeviceControlDeps, executeDeviceRpc as runDeviceRpc } from '@lobechat/device-control';
 import type {
  AgentRunRequestMessage,
  GatewayMcpStdioParams,
@@ -13,10 +14,8 @@ import type {
  GetCommandOutputParams,
  GlobFilesParams,
  GrepContentParams,
-  InitWorkspaceParams,
  KillCommandParams,
  ListLocalFileParams,
-  ListProjectSkillsParams,
  LocalReadFileParams,
  LocalReadFilesParams,
  LocalSearchFilesParams,
@@ -29,15 +28,16 @@ import { type ILocalSystemService, LocalSystemExecutionRuntime } from '@lobechat

 import GatewayConnectionService from '@/services/gatewayConnectionSrv';
 import ImessageBridgeService from '@/services/imessageBridgeSrv';
+import { createLogger } from '@/utils/logger';

-import GitCtr from './GitCtr';
 import HeterogeneousAgentCtr from './HeterogeneousAgentCtr';
 import { ControllerModule, IpcMethod } from './index';
 import LocalFileCtr from './LocalFileCtr';
 import McpCtr from './McpCtr';
 import RemoteServerConfigCtr from './RemoteServerConfigCtr';
 import ShellCommandCtr from './ShellCommandCtr';
-import WorkspaceCtr from './WorkspaceCtr';
+
+const logger = createLogger('controllers:GatewayConnectionCtr');

 /**
 * Inject the lh-notify protocol into the first turn of a new hetero-agent session.
@@ -166,14 +166,6 @@ export default class GatewayConnectionCtr extends ControllerModule {
    return this.app.getController(LocalFileCtr);
  }

-  private get workspaceCtr() {
-    return this.app.getController(WorkspaceCtr);
-  }
-
-  private get gitCtr() {
-    return this.app.getController(GitCtr);
-  }
-
  private get shellCommandCtr() {
    return this.app.getController(ShellCommandCtr);
  }
@@ -300,6 +292,7 @@ export default class GatewayConnectionCtr extends ControllerModule {
      this.heterogeneousAgentCtr.spawnLhHeteroExec({
        agentType: request.agentType,
        cwd: request.cwd,
+        imageList: request.imageList,
        jwt: request.jwt,
        operationId: request.operationId,
        prompt: request.prompt,
@@ -351,87 +344,33 @@ export default class GatewayConnectionCtr extends ControllerModule {
    return this.localSystemRuntime;
  }

+  /**
+   * Platform-specific handlers the shared `@lobechat/device-control` dispatcher
+   * delegates to. Git + workspace-scan methods run inside device-control over
+   * `@lobechat/local-file-shell`; only file preview / index (and preview
+   * approval) are desktop-specific and routed back to the controllers here.
+   */
+  private get deviceControlDeps(): DeviceControlDeps {
+    return {
+      approveProjectRoot: async (root) => {
+        try {
+          await this.app.localFileProtocolManager.approveIndexedProjectRoot(root);
+        } catch (error) {
+          logger.error(`Failed to approve project preview root ${root}:`, error);
+        }
+      },
+      getLocalFilePreview: (params) => this.localFileCtr.getLocalFilePreview(params),
+      getProjectFileIndex: (params) => this.localFileCtr.getProjectFileIndex(params),
+    };
+  }
+
  /**
   * Dispatch a generic server-internal device RPC (not an agent tool call) by
-   * method name. Currently only `initWorkspace` (scan the bound project root for
-   * skills + AGENTS.md); add new server-only device methods here.
+   * method name. The dispatch logic lives in `@lobechat/device-control` so the
+   * desktop main process and the CLI daemon share one device RPC surface.
   */
  private async executeDeviceRpc(method: string, params: unknown): Promise<unknown> {
-    switch (method) {
-      case 'initWorkspace': {
-        return this.workspaceCtr.initWorkspace(params as InitWorkspaceParams);
-      }
-
-      case 'getGitBranch': {
-        return this.gitCtr.getGitBranch((params as { path: string }).path);
-      }
-
-      case 'getLinkedPullRequest': {
-        return this.gitCtr.getLinkedPullRequest(params as { branch: string; path: string });
-      }
-
-      case 'getGitWorkingTreeStatus': {
-        return this.gitCtr.getGitWorkingTreeStatus((params as { path: string }).path);
-      }
-
-      case 'getGitAheadBehind': {
-        return this.gitCtr.getGitAheadBehind((params as { path: string }).path);
-      }
-
-      case 'listGitBranches': {
-        return this.gitCtr.listGitBranches((params as { path: string }).path);
-      }
-
-      case 'checkoutGitBranch': {
-        return this.gitCtr.checkoutGitBranch(
-          params as { branch: string; create?: boolean; path: string },
-        );
-      }
-
-      case 'pullGitBranch': {
-        return this.gitCtr.pullGitBranch(params as { path: string });
-      }
-
-      case 'pushGitBranch': {
-        return this.gitCtr.pushGitBranch(params as { path: string });
-      }
-
-      case 'getGitWorkingTreePatches': {
-        return this.gitCtr.getGitWorkingTreePatches((params as { path: string }).path);
-      }
-
-      case 'getGitWorkingTreeFiles': {
-        return this.gitCtr.getGitWorkingTreeFiles((params as { path: string }).path);
-      }
-
-      case 'getProjectFileIndex': {
-        return this.localFileCtr.getProjectFileIndex(params as { scope?: string });
-      }
-
-      case 'listProjectSkills': {
-        return this.workspaceCtr.listProjectSkills(params as ListProjectSkillsParams);
-      }
-
-      case 'getGitBranchDiff': {
-        return this.gitCtr.getGitBranchDiff(params as { baseRef?: string; path: string });
-      }
-
-      case 'listGitRemoteBranches': {
-        return this.gitCtr.listGitRemoteBranches((params as { path: string }).path);
-      }
-
-      case 'revertGitFile': {
-        return this.gitCtr.revertGitFile(params as { filePath: string; path: string });
-      }
-
-      case 'statPath': {
-        return this.workspaceCtr.statPath(params as { path: string });
-      }
-
-      default: {
-        throw new Error(`Unknown device RPC method: ${method}`);
-      }
-    }
+    return runDeviceRpc(method, params, this.deviceControlDeps);
  }

  private async executeToolCall(
@@ -18,13 +18,20 @@ import {
 } from '@lobechat/electron-client-ipc';
 import type { AskUserBridge } from '@lobechat/heterogeneous-agents/askUser';
 import { AskUserMcpServer } from '@lobechat/heterogeneous-agents/askUser';
-import type { AgentContentBlock } from '@lobechat/heterogeneous-agents/spawn';
+import type {
+  AgentContentBlock,
+  HeteroExecImageRef,
+} from '@lobechat/heterogeneous-agents/protocol';
+import { buildHeteroExecStdinPayload } from '@lobechat/heterogeneous-agents/protocol';
+import type { AgentStreamEvent, UsageData } from '@lobechat/heterogeneous-agents/spawn';
 import {
  AgentStreamPipeline,
  buildAgentInput,
  materializeImageToPath,
  normalizeImage,
+  readCodexSessionModel,
  resolveCliSpawnPlan,
+  resolveCodexInitialModel,
 } from '@lobechat/heterogeneous-agents/spawn';
 import { app as electronApp, BrowserWindow } from 'electron';

@@ -176,9 +183,33 @@ interface AgentSession {
  command: string;
  cwd?: string;
  env?: Record<string, string>;
+  model?: string;
+  modelSource?: string;
+  modelVerificationLastAttemptAt?: number;
+  modelVerificationLastAttemptSessionId?: string;
  process?: ChildProcess;
+  /**
+   * Absolute CLI path resolved by spawn preflight detection. Used for spawn()
+   * when the configured command is bare: detection can find the CLI through
+   * the login-shell PATH or a well-known install location (e.g. the Codex.app
+   * bundled CLI) that plain spawn() with the inherited env can't resolve.
+   */
+  resolvedCommandPath?: string;
+  /**
+   * PATH the preflight detector used to resolve `resolvedCommandPath`, set only
+   * when it fell back to the login-shell PATH. Merged into the child PATH at
+   * spawn so a `#!/usr/bin/env node` shim still finds its interpreter — the
+   * shim resolving in preflight doesn't guarantee `node` is on the leaner
+   * inherited PATH (Finder-launched Electron).
+   */
+  resolvedCommandSearchPath?: string;
  resumeSessionId?: string;
  sessionId: string;
+  verifiedModel?: string;
+  verifiedModelContextWindow?: number;
+  verifiedModelProvider?: string;
+  verifiedModelSessionId?: string;
+  verifiedModelSourceFile?: string;
 }

 type SessionErrorPayload = HeterogeneousAgentSessionError | string;
@@ -454,11 +485,20 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
            session.agentType === 'claude-code' ? 'claude-code' : 'codex',
            command,
          );
-    const cliMissingError = this.buildCliMissingError(session);

-    if (!status || status.available || !cliMissingError) return;
+    if (!status || status.available) {
+      // Spawn through the detector-resolved absolute path when the configured
+      // command is bare — detection may have located the CLI somewhere plain
+      // spawn() can't (login-shell PATH, Codex.app bundled CLI, …).
+      const useResolvedPath = Boolean(status?.path) && !command.includes(path.sep);
+      session.resolvedCommandPath = useResolvedPath ? status!.path : undefined;
+      // Carry the login-shell PATH the detector resolved through, so a
+      // `#!/usr/bin/env node` shim spawned by absolute path still finds `node`.
+      session.resolvedCommandSearchPath = useResolvedPath ? status!.resolvedPathEnv : undefined;
+      return;
+    }

-    return cliMissingError;
+    return this.buildCliMissingError(session);
  }

  private get shouldTraceCliOutput(): boolean {
@@ -581,12 +621,19 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
            createdAt: createdAt.toISOString(),
            cwd,
            envKeys: session.env ? Object.keys(session.env).sort() : [],
+            model: session.model,
+            modelSource: session.modelSource,
            resumeSessionId: session.resumeSessionId,
            sessionId: session.sessionId,
            stdinBytes: stdinPayload === undefined ? 0 : Buffer.byteLength(stdinPayload),
            stdinFile: stdinPayload === undefined ? undefined : 'stdin.txt',
            stderrFile: 'stderr.log',
            stdoutFile: 'stdout.jsonl',
+            verifiedModel: session.verifiedModel,
+            verifiedModelContextWindow: session.verifiedModelContextWindow,
+            verifiedModelProvider: session.verifiedModelProvider,
+            verifiedModelSessionId: session.verifiedModelSessionId,
+            verifiedModelSourceFile: session.verifiedModelSourceFile,
          },
          null,
          2,
@@ -888,6 +935,8 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
    let spawnPlan;
    let traceSession;
    let cwd: string;
+    let initialCumulativeUsage: UsageData | undefined;
+    let spawnEnv: NodeJS.ProcessEnv;
    try {
      const driver = getHeterogeneousAgentDriver(session.agentType);
      spawnPlan = await driver.buildSpawnPlan({
@@ -906,6 +955,34 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
      // Fall back to the user's Desktop so the process never inherits
      // the Electron parent's cwd (which is `/` when launched from Finder).
      cwd = session.cwd || electronApp.getPath('desktop');
+
+      // Forward the user's proxy settings to the CLI. The main-process undici
+      // dispatcher doesn't reach child processes — they need env vars.
+      const proxyEnv = buildProxyEnv(this.app.storeManager.get('networkProxy'));
+      const inheritedEnv = buildInheritedSpawnEnv();
+      // When preflight resolved the CLI via the login-shell PATH, spawn with
+      // that PATH (a superset of the inherited one) so a `#!/usr/bin/env node`
+      // shim finds its interpreter. `session.env` still wins if it sets PATH.
+      if (session.resolvedCommandSearchPath) inheritedEnv.PATH = session.resolvedCommandSearchPath;
+      spawnEnv = { ...inheritedEnv, ...proxyEnv, ...session.env };
+
+      if (session.agentType === 'codex') {
+        const initialModel = await resolveCodexInitialModel({
+          args: spawnPlan.args,
+          env: spawnEnv,
+        });
+        if (initialModel?.model) {
+          session.model = initialModel.model;
+          session.modelSource = initialModel.source;
+        }
+
+        if (session.agentSessionId) {
+          initialCumulativeUsage = (
+            await readCodexSessionModel(session.agentSessionId, { env: spawnEnv })
+          )?.cumulativeUsage;
+        }
+      }
+
      traceSession = await this.createCliTraceSession({
        cliArgs: spawnPlan.args,
        cwd,
@@ -925,7 +1002,10 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
    }
    const useStdin = spawnPlan.stdinPayload !== undefined;
    const cliArgs = spawnPlan.args;
-    const resolvedCliSpawnPlan = await resolveCliSpawnPlan(session.command, cliArgs);
+    const resolvedCliSpawnPlan = await resolveCliSpawnPlan(
+      session.resolvedCommandPath ?? session.command,
+      cliArgs,
+    );

    logger.info(
      'Spawning agent:',
@@ -940,29 +1020,28 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
    // the claude binary can leave bash/grep/etc. tool children running and
    // the CLI hung waiting on them. Windows has different semantics — use
    // taskkill /T /F there; no detached flag needed.
-    // Forward the user's proxy settings to the CLI. The main-process undici
-    // dispatcher doesn't reach child processes — they need env vars.
-    const proxyEnv = buildProxyEnv(this.app.storeManager.get('networkProxy'));
-
    const spawnOptions = {
      cwd,
      detached: process.platform !== 'win32',
      // Strip host Anthropic creds from the inherited env so a developer's
      // shell `ANTHROPIC_API_KEY` can't hijack the CLI's own auth. `session.env`
      // is spread last, so an agent that explicitly configures a key still wins.
-      env: { ...buildInheritedSpawnEnv(), ...proxyEnv, ...session.env },
+      env: spawnEnv,
      stdio: [useStdin ? 'pipe' : 'ignore', 'pipe', 'pipe'] as ['pipe' | 'ignore', 'pipe', 'pipe'],
    };

    return new Promise<void>((resolve, reject) => {
      const proc = spawn(resolvedCliSpawnPlan.command, resolvedCliSpawnPlan.args, spawnOptions);
      this.handleSpawnedAgentProcess({
+        cwd,
        intervention,
        params,
        proc,
        reject,
        resolve,
        session,
+        initialCumulativeUsage,
+        spawnEnv,
        traceSession,
        useStdin,
        spawnPlan,
@@ -970,23 +1049,88 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
    });
  }

+  private async verifyCodexSessionModel({
+    env,
+    pipeline,
+    session,
+    traceSession,
+  }: {
+    env: NodeJS.ProcessEnv;
+    pipeline: AgentStreamPipeline;
+    session: AgentSession;
+    traceSession: CliTraceSession | undefined;
+  }): Promise<AgentStreamEvent[]> {
+    if (
+      session.agentType !== 'codex' ||
+      !pipeline.sessionId ||
+      session.verifiedModelSessionId === pipeline.sessionId
+    ) {
+      return [];
+    }
+
+    const now = Date.now();
+    if (
+      session.modelVerificationLastAttemptSessionId === pipeline.sessionId &&
+      session.modelVerificationLastAttemptAt &&
+      now - session.modelVerificationLastAttemptAt < 1000
+    ) {
+      return [];
+    }
+    session.modelVerificationLastAttemptSessionId = pipeline.sessionId;
+    session.modelVerificationLastAttemptAt = now;
+
+    const sessionModel = await readCodexSessionModel(pipeline.sessionId, { env });
+    if (!sessionModel?.model) return [];
+
+    const previousModel = session.model;
+    session.verifiedModel = sessionModel.model;
+    session.verifiedModelContextWindow = sessionModel.contextWindow;
+    session.verifiedModelProvider = sessionModel.provider;
+    session.verifiedModelSessionId = pipeline.sessionId;
+    session.verifiedModelSourceFile = sessionModel.sourceFile;
+
+    void this.writeCliTraceJson(traceSession, 'model.json', {
+      initialModel: previousModel,
+      initialModelSource: session.modelSource,
+      sessionId: pipeline.sessionId,
+      verifiedAt: new Date().toISOString(),
+      verifiedContextWindow: sessionModel.contextWindow,
+      verifiedLine: sessionModel.line,
+      verifiedModel: sessionModel.model,
+      verifiedModelProvider: sessionModel.provider,
+      verifiedSourceFile: sessionModel.sourceFile,
+    });
+
+    if (previousModel === sessionModel.model) return [];
+
+    session.model = sessionModel.model;
+    session.modelSource = 'codex-session';
+    return pipeline.configureSession({ model: sessionModel.model });
+  }
+
  private handleSpawnedAgentProcess({
+    cwd,
+    initialCumulativeUsage,
    intervention,
    params,
    proc,
    reject,
    resolve,
    session,
+    spawnEnv,
    spawnPlan,
    traceSession,
    useStdin,
  }: {
+    cwd: string;
    intervention?: Awaited<ReturnType<HeterogeneousAgentCtr['setupInterventionForOp']>>;
    params: SendPromptParams;
    proc: ChildProcess;
    reject: (reason?: unknown) => void;
    resolve: () => void;
    session: AgentSession;
+    initialCumulativeUsage?: UsageData | undefined;
+    spawnEnv: NodeJS.ProcessEnv;
    spawnPlan: HeterogeneousAgentBuildPlan;
    traceSession: CliTraceSession | undefined;
    useStdin: boolean;
@@ -1021,10 +1165,13 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
    // toStreamEvent all run inside the shared pipeline, so renderer + future
    // server `heteroIngest` see the same `AgentStreamEvent` wire shape with
    // no per-consumer adapter. The pipeline auto-wires the Codex
-    // file-change line-stat tracker when `agentType === 'codex'`, so this
+    // file-change diff/stat tracker when `agentType === 'codex'`, so this
    // controller stays agent-agnostic.
    const pipeline = new AgentStreamPipeline({
      agentType: session.agentType,
+      cwd,
+      initialCumulativeUsage,
+      initialModel: session.model,
      operationId: params.operationId,
    });
    let stdoutBroadcastQueue: Promise<void> = Promise.resolve();
@@ -1039,6 +1186,14 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
          if (pipeline.sessionId && pipeline.sessionId !== session.agentSessionId) {
            session.agentSessionId = pipeline.sessionId;
          }
+          events.push(
+            ...(await this.verifyCodexSessionModel({
+              env: spawnEnv,
+              pipeline,
+              session,
+              traceSession,
+            })),
+          );
          for (const event of events) {
            this.broadcast('heteroAgentEvent', {
              event,
@@ -1317,6 +1472,8 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
  spawnLhHeteroExec(params: {
    agentType: string;
    cwd?: string;
+    /** Image attachments (signed URLs) appended as image content blocks. */
+    imageList?: HeteroExecImageRef[];
    jwt: string;
    operationId: string;
    prompt: string;
@@ -1328,6 +1485,7 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
    const {
      agentType,
      cwd,
+      imageList,
      jwt,
      operationId,
      prompt,
@@ -1380,16 +1538,11 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
      stdio: ['pipe', 'inherit', 'inherit'],
    });

-    // When systemContext is provided, send a content-block array so CC sees the
-    // context block first, then the user's actual message — mirrors
-    // spawnHeteroSandbox. lh handles JSON arrays via coerceJsonPrompt, so no lh
-    // changes are required.
-    const stdinPayload = systemContext
-      ? JSON.stringify([
-          { text: systemContext, type: 'text' },
-          { text: prompt, type: 'text' },
-        ])
-      : JSON.stringify(prompt);
+    // systemContext / image attachments turn the payload into a content-block
+    // array so CC sees the context block first, then the user's message, then
+    // the images — mirrors spawnHeteroSandbox. lh handles both shapes via
+    // coerceJsonPrompt, so no lh changes are required.
+    const stdinPayload = buildHeteroExecStdinPayload({ imageList, prompt, systemContext });
    child.stdin.write(stdinPayload);
    child.stdin.end();

@@ -12,6 +12,7 @@ import {
  type GrepContentParams,
  type GrepContentResult,
  type ListLocalFileParams,
+  type LocalFilePreviewResult,
  type LocalFilePreviewUrlParams,
  type LocalFilePreviewUrlResult,
  type LocalMoveFilesResultItem,
@@ -65,6 +66,19 @@ const logger = createLogger('controllers:LocalFileCtr');

 const SAFE_PATH_PREFIXES = ['/tmp', '/var/tmp'] as const;

+const TEXT_PREVIEW_MIME_TYPES = new Set([
+  'application/graphql',
+  'application/javascript',
+  'application/json',
+  'application/markdown',
+  'application/toml',
+  'application/xml',
+  'application/yaml',
+  'text/markdown',
+  'text/mdx',
+  'text/x-markdown',
+]);
+
 const normalizeAbsolutePath = (inputPath: string): string =>
  path.normalize(path.isAbsolute(inputPath) ? inputPath : `/${inputPath}`);

@@ -91,6 +105,48 @@ const resolveNearestExistingRealPath = async (targetPath: string): Promise<strin

 const toPosixRelativePath = (filePath: string) => filePath.split(path.sep).join('/');

+const normalizeContentType = (contentType: string): string =>
+  contentType.split(';')[0].trim().toLowerCase();
+
+const isTextPreviewMimeType = (mimeType: string): boolean =>
+  mimeType.startsWith('text/') || TEXT_PREVIEW_MIME_TYPES.has(mimeType);
+
+const serializePreviewFile = ({
+  buffer,
+  contentType,
+}: {
+  buffer: Buffer;
+  contentType: string;
+}): NonNullable<LocalFilePreviewResult['preview']> => {
+  const normalizedContentType = normalizeContentType(contentType);
+
+  if (normalizedContentType.startsWith('image/')) {
+    return {
+      base64: buffer.toString('base64'),
+      contentType: normalizedContentType,
+      type: 'image',
+    };
+  }
+
+  if (isTextPreviewMimeType(normalizedContentType)) {
+    return {
+      content: buffer.toString('utf8'),
+      contentType: normalizedContentType,
+      type: 'text',
+    };
+  }
+
+  if (normalizedContentType === 'application/pdf') {
+    return { contentType: normalizedContentType, type: 'pdf' };
+  }
+
+  if (normalizedContentType.startsWith('video/')) {
+    return { contentType: normalizedContentType, type: 'video' };
+  }
+
+  return { contentType: normalizedContentType, type: 'binary' };
+};
+
 const createProjectFileEntry = (
  root: string,
  absolutePath: string,
@@ -381,11 +437,13 @@ export default class LocalFileCtr extends ControllerModule {

  @IpcMethod()
  async getLocalFilePreviewUrl({
+    accept,
    path: filePath,
    workingDirectory,
  }: LocalFilePreviewUrlParams): Promise<LocalFilePreviewUrlResult> {
    try {
      const url = await this.app.localFileProtocolManager.createPreviewUrl({
+        accept,
        filePath,
        workspaceRoot: workingDirectory,
      });
@@ -401,6 +459,33 @@ export default class LocalFileCtr extends ControllerModule {
    }
  }

+  @IpcMethod()
+  async getLocalFilePreview({
+    accept,
+    path: filePath,
+    workingDirectory,
+  }: LocalFilePreviewUrlParams): Promise<LocalFilePreviewResult> {
+    try {
+      const preview = await this.app.localFileProtocolManager.readPreviewFile({
+        accept,
+        filePath,
+        workspaceRoot: workingDirectory,
+      });
+
+      if (!preview) {
+        return { error: 'File is outside the approved workspace', success: false };
+      }
+
+      return {
+        preview: serializePreviewFile(preview),
+        success: true,
+      };
+    } catch (error) {
+      logger.error('Failed to read local file preview:', error);
+      return { error: (error as Error).message, success: false };
+    }
+  }
+
  @IpcMethod()
  async handlePrepareSkillDirectory({
    forceRefresh,
@@ -1,244 +1,53 @@
-import { readdir, readFile, stat } from 'node:fs/promises';
-import path from 'node:path';
-
+import {
+  initWorkspace as runInitWorkspace,
+  listProjectSkills as runListProjectSkills,
+  statPath as runStatPath,
+  type WorkspaceScanDeps,
+} from '@lobechat/device-control';
 import {
  type InitWorkspaceParams,
  type InitWorkspaceResult,
  type ListProjectSkillsParams,
  type ListProjectSkillsResult,
-  type ProjectSkillItem,
 } from '@lobechat/electron-client-ipc';

-import { detectRepoType } from '@/utils/git';
 import { createLogger } from '@/utils/logger';

 import { ControllerModule, IpcMethod } from './index';

 const logger = createLogger('controllers:WorkspaceCtr');

-const SKILL_FRONTMATTER_RE = /^---\r?\n([\s\S]*?)\r?\n---/;
-
-// Cap recursion to guard against pathological directory trees.
-const MAX_SKILL_FILE_COUNT = 1000;
-
-const toPosixRelativePath = (filePath: string) => filePath.split(path.sep).join('/');
-
-const listSkillFilesRecursive = async (dir: string): Promise<string[]> => {
-  const results: string[] = [];
-  const stack: string[] = [dir];
-
-  while (stack.length > 0 && results.length < MAX_SKILL_FILE_COUNT) {
-    const current = stack.pop()!;
-    let entries;
-    try {
-      entries = await readdir(current, { withFileTypes: true });
-    } catch {
-      continue;
-    }
-    for (const entry of entries) {
-      if (entry.name.startsWith('.')) continue;
-      const full = path.join(current, entry.name);
-      if (entry.isDirectory()) {
-        stack.push(full);
-      } else if (entry.isFile()) {
-        results.push(toPosixRelativePath(path.relative(dir, full)));
-        if (results.length >= MAX_SKILL_FILE_COUNT) break;
-      }
-    }
-  }
-  return results.sort();
-};
-
-// Parse a minimal YAML frontmatter block for SKILL.md files.
-// Only handles `key: value` lines; multi-line block scalars fall back to the first line.
-const parseSkillFrontmatter = (raw: string): Record<string, string> => {
-  const match = raw.match(SKILL_FRONTMATTER_RE);
-  if (!match) return {};
-
-  const fields: Record<string, string> = {};
-  for (const line of match[1].split(/\r?\n/)) {
-    const colonIdx = line.indexOf(':');
-    if (colonIdx === -1) continue;
-    const key = line.slice(0, colonIdx).trim();
-    if (!key || key.startsWith('#')) continue;
-    let value = line.slice(colonIdx + 1).trim();
-    if (value.startsWith('|') || value.startsWith('>')) continue;
-    if (
-      (value.startsWith('"') && value.endsWith('"')) ||
-      (value.startsWith("'") && value.endsWith("'"))
-    ) {
-      value = value.slice(1, -1);
-    }
-    fields[key] = value;
-  }
-  return fields;
-};
-
 /**
 * WorkspaceCtr
 *
- * Owns "project workspace" scanning: discovering agent skills (`.agents/skills`
- * / `.claude/skills`) and project-root instructions (`AGENTS.md` / `CLAUDE.md`)
- * under a bound project directory. Split out of LocalFileCtr so the
- * workspace/agent-config concern is distinct from generic local file ops.
+ * Thin IPC layer over `@lobechat/device-control`'s workspace-scan helpers
+ * (skills discovery under `.agents/skills` / `.claude/skills` + project-root
+ * instructions). The scan logic is shared with the device-control RPC dispatch
+ * so the local desktop IPC path, the remote device RPC, and the CLI all run
+ * identical scans; the desktop-only preview-protocol approval is injected here.
 */
 export default class WorkspaceCtr extends ControllerModule {
  static override readonly groupName = 'workspace';

-  /**
-   * Scan one skill source directory (e.g. `.agents/skills`) under `root` and
-   * return parsed frontmatter for each `SKILL.md`. Returns `[]` when the source
-   * directory is absent or unreadable. Unsorted — callers sort/merge.
-   */
-  private async scanSkillsInSource(
-    root: string,
-    source: ProjectSkillItem['source'],
-  ): Promise<ProjectSkillItem[]> {
-    const dir = path.join(root, source);
-    let entries;
-    try {
-      entries = await readdir(dir, { withFileTypes: true });
-    } catch {
-      // Directory does not exist or is not readable.
-      return [];
-    }
-
-    const skills = await Promise.all(
-      entries
-        .filter((entry) => entry.isDirectory() || entry.isSymbolicLink())
-        .map(async (entry) => {
-          const skillDir = path.join(dir, entry.name);
-          const skillFile = path.join(skillDir, 'SKILL.md');
-          try {
-            const raw = await readFile(skillFile, 'utf8');
-            const fields = parseSkillFrontmatter(raw);
-            const files = await listSkillFilesRecursive(skillDir);
-            return {
-              description: fields.description || undefined,
-              fileCount: files.length,
-              files,
-              name: fields.name || entry.name,
-              path: skillFile,
-              skillDir,
-              source,
-            };
-          } catch {
-            return null;
-          }
-        }),
-    );
-
-    return skills.filter((skill): skill is ProjectSkillItem => skill !== null);
+  private get scanDeps(): WorkspaceScanDeps {
+    return { approveProjectRoot: (root) => this.approveProjectRootForPreview(root) };
  }

-  /**
-   * Scan agent skill directories under the project root and return parsed
-   * frontmatter for each SKILL.md. Used by the hetero agent's working sidebar
-   * to surface skills available in the current project. Returns the first
-   * source directory that yields any skills (`.agents/skills` wins).
-   */
  @IpcMethod()
  async listProjectSkills(params: ListProjectSkillsParams): Promise<ListProjectSkillsResult> {
-    const root = params.scope;
-    const sources = ['.agents/skills', '.claude/skills'] as const;
-
-    for (const source of sources) {
-      const skills = (await this.scanSkillsInSource(root, source)).sort((a, b) =>
-        a.name.localeCompare(b.name),
-      );
-
-      if (skills.length > 0) {
-        await this.approveProjectRootForPreview(root);
-        return { root, skills, source };
-      }
-    }
-
-    return { root, skills: [], source: null };
+    return runListProjectSkills(params, this.scanDeps);
  }

-  /**
-   * One-call "workspace init" scan of a bound project directory: merge the
-   * project skills from BOTH `.agents/skills` and `.claude/skills` (deduped by
-   * name, `.agents/skills` winning) and read the project-root agent
-   * instructions file (`AGENTS.md`, else `CLAUDE.md`). Driven server-side at run
-   * start via the generic device RPC (not an LLM-visible tool) and cached onto
-   * `devices.workingDirs[].workspace`.
-   *
-   * Approves the root for the `lobe-file://` preview protocol (same as
-   * `listProjectSkills`) so the user can later click through to the scanned
-   * skills / instructions in the UI.
-   */
  @IpcMethod()
  async initWorkspace(params: InitWorkspaceParams): Promise<InitWorkspaceResult> {
-    const root = params.scope;
-    const sources = ['.agents/skills', '.claude/skills'] as const;
-
-    const seen = new Set<string>();
-    const skills: ProjectSkillItem[] = [];
-    for (const source of sources) {
-      for (const skill of await this.scanSkillsInSource(root, source)) {
-        if (seen.has(skill.name)) continue;
-        seen.add(skill.name);
-        skills.push(skill);
-      }
-    }
-    skills.sort((a, b) => a.name.localeCompare(b.name));
-
-    const instructions = await this.readWorkspaceInstructions(root);
-
-    // Approve regardless of what was found — the run is now bound to this root,
-    // so any later click-through to it should resolve through the preview
-    // protocol even if the project carries neither skills nor instructions.
-    await this.approveProjectRootForPreview(root);
-
-    return { instructions, root, skills };
+    return runInitWorkspace(params, this.scanDeps);
  }

-  /**
-   * Check whether a path exists on this device and is a directory, plus its git
-   * repo type (`git` / `github` / none). Used to validate a manually-entered
-   * working directory from a web / remote client (which can't browse this
-   * device's filesystem) before binding it, and to render the right dir icon.
-   */
  @IpcMethod()
  async statPath(params: {
    path: string;
  }): Promise<{ exists: boolean; isDirectory: boolean; repoType?: 'git' | 'github' }> {
-    try {
-      const stats = await stat(params.path);
-      if (!stats.isDirectory()) return { exists: true, isDirectory: false };
-      const repoType = await detectRepoType(params.path);
-      return { exists: true, isDirectory: true, repoType };
-    } catch {
-      return { exists: false, isDirectory: false };
-    }
-  }
-
-  /**
-   * Read the project-root agent instructions files. Collects every present
-   * candidate (`AGENTS.md`, then `CLAUDE.md`) rather than first-match, since both
-   * can coexist. Each body is capped so a pathologically large file can't bloat
-   * the cached `workingDirs` payload or the injected system role.
-   */
-  private async readWorkspaceInstructions(
-    root: string,
-  ): Promise<InitWorkspaceResult['instructions']> {
-    const MAX_INSTRUCTIONS_BYTES = 64 * 1024;
-    const candidates = ['AGENTS.md', 'CLAUDE.md'] as const;
-
-    const instructions: InitWorkspaceResult['instructions'] = [];
-    for (const source of candidates) {
-      try {
-        const raw = await readFile(path.join(root, source), 'utf8');
-        const content =
-          raw.length > MAX_INSTRUCTIONS_BYTES ? raw.slice(0, MAX_INSTRUCTIONS_BYTES) : raw;
-        instructions.push({ content, source });
-      } catch {
-        // File absent or unreadable; skip it.
-      }
-    }
-
-    return instructions;
+    return runStatPath(params);
  }

  private async approveProjectRootForPreview(root: string) {
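Reviewer note: the removed `initWorkspace` body merged skills from both source directories, deduped by name with `.agents/skills` winning, then sorted by name. The shared helper in `@lobechat/device-control` is not shown in this diff, so the following is only a minimal sketch of that merge rule. `SkillSketch` and `mergeSkillsByName` are hypothetical names, and the real `ProjectSkillItem` carries more fields.

```typescript
// Hypothetical minimal shape; the real ProjectSkillItem has more fields.
interface SkillSketch {
  name: string;
  source: '.agents/skills' | '.claude/skills';
}

// Merge per-source scan results in priority order: the first source to
// claim a name wins, and the merged list is sorted by name.
const mergeSkillsByName = (scans: SkillSketch[][]): SkillSketch[] => {
  const seen = new Set<string>();
  const merged: SkillSketch[] = [];
  for (const scan of scans) {
    for (const skill of scan) {
      if (seen.has(skill.name)) continue;
      seen.add(skill.name);
      merged.push(skill);
    }
  }
  return merged.sort((a, b) => a.name.localeCompare(b.name));
};

// `.agents/skills` is passed first, so its `review` entry shadows the
// `.claude/skills` one.
const mergedSkills = mergeSkillsByName([
  [{ name: 'review', source: '.agents/skills' }],
  [
    { name: 'review', source: '.claude/skills' },
    { name: 'deploy', source: '.claude/skills' },
  ],
]);
```

Passing the scans as an ordered array keeps the precedence rule in one place: whichever source is listed first wins a name collision.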
@@ -29,6 +29,7 @@ const mockCloseWindow = vi.fn();
 const mockMinimizeWindow = vi.fn();
 const mockMaximizeWindow = vi.fn();
 const mockIsWindowMaximized = vi.fn();
+const mockIsWindowFullScreen = vi.fn();
 const mockRetrieveByIdentifier = vi.fn();
 const mockStartSession = vi.fn();
 const testSenderIdentifierString: string = 'test-window-event-id';
@@ -58,6 +59,7 @@ const mockApp = {
    minimizeWindow: mockMinimizeWindow,
    maximizeWindow: mockMaximizeWindow,
    isWindowMaximized: mockIsWindowMaximized,
+    isWindowFullScreen: mockIsWindowFullScreen,
    retrieveByIdentifier: mockRetrieveByIdentifier.mockImplementation(
      (identifier: AppBrowsersIdentifiers | string) => {
        if (identifier === 'some-other-window') {
@@ -166,6 +168,20 @@ describe('BrowserWindowsCtr', () => {
    });
  });

+  describe('isWindowFullScreen', () => {
+    it('should return fullscreen state for the sender window', () => {
+      mockIsWindowFullScreen.mockReturnValueOnce(true);
+
+      const sender = {} as any;
+      const context = { sender, event: { sender } as any } as IpcContext;
+      const result = runWithIpcContext(context, () => browserWindowsCtr.isWindowFullScreen());
+
+      expect(mockGetIdentifierByWebContents).toHaveBeenCalledWith(context.sender);
+      expect(mockIsWindowFullScreen).toHaveBeenCalledWith(testSenderIdentifierString);
+      expect(result).toBe(true);
+    });
+  });
+
  describe('interceptRoute', () => {
    const baseParams = { source: 'link-click' as const };

@@ -480,6 +480,87 @@ describe('HeterogeneousAgentCtr', () => {
      expect(spawnCalls).toHaveLength(0);
    });

+    it('spawns through the detector-resolved absolute path when the bare command is off PATH', async () => {
+      // Codex desktop app case: `codex` is not on PATH, but the preflight
+      // detector finds the CLI bundled inside Codex.app. Spawning the bare
+      // command would ENOENT — spawn must use the resolved absolute path.
+      const resolvedPath = '/Applications/Codex.app/Contents/Resources/codex';
+      const detect = vi.fn().mockResolvedValue({ available: true, path: resolvedPath });
+      const { proc } = createFakeProc();
+      nextFakeProc = proc;
+
+      const ctr = new HeterogeneousAgentCtr({
+        appStoragePath,
+        storeManager: { get: vi.fn() },
+        toolDetectorManager: { detect },
+      } as any);
+      const { sessionId } = await ctr.startSession({
+        agentType: 'codex',
+        command: 'codex',
+      });
+      await ctr.sendPrompt({ operationId: 'op-test', prompt: 'hello', sessionId });
+
+      expect(spawnCalls[0].command).toBe(resolvedPath);
+    });
+
+    it('carries the detector login-shell PATH into the spawn env for `env node` shims', async () => {
+      // `codex` resolved via the login-shell PATH (mise/nvm). Spawning the
+      // absolute shim under the leaner inherited PATH would fail at its
+      // `#!/usr/bin/env node` shebang — the resolved PATH must reach the child.
+      const resolvedPath = '/Users/h/.local/share/mise/shims/codex';
+      const searchPath = '/Users/h/.local/share/mise/shims:/usr/bin:/bin';
+      const detect = vi
+        .fn()
+        .mockResolvedValue({ available: true, path: resolvedPath, resolvedPathEnv: searchPath });
+      const { proc } = createFakeProc();
+      nextFakeProc = proc;
+
+      const ctr = new HeterogeneousAgentCtr({
+        appStoragePath,
+        storeManager: { get: vi.fn() },
+        toolDetectorManager: { detect },
+      } as any);
+      const { sessionId } = await ctr.startSession({ agentType: 'codex', command: 'codex' });
+      await ctr.sendPrompt({ operationId: 'op-test', prompt: 'hello', sessionId });
+
+      expect(spawnCalls[0].command).toBe(resolvedPath);
+      expect(spawnCalls[0].options.env.PATH).toBe(searchPath);
+    });
+
+    it('keeps an explicit path-like command for spawn instead of the detector result', async () => {
+      // detectHeterogeneousCliCommand validates the custom path via --version.
+      execFileMock.mockImplementation(
+        (
+          _file: string,
+          _args: string[],
+          optionsOrCallback: unknown,
+          callback?: (error: Error | null, result: { stderr: string; stdout: string }) => void,
+        ) => {
+          const resolvedCallback =
+            typeof optionsOrCallback === 'function' ? optionsOrCallback : callback;
+          (resolvedCallback as any)?.(null, { stderr: '', stdout: 'codex-cli 0.99.0' });
+        },
+      );
+
+      const detect = vi.fn();
+      const { proc } = createFakeProc();
+      nextFakeProc = proc;
+
+      const ctr = new HeterogeneousAgentCtr({
+        appStoragePath,
+        storeManager: { get: vi.fn() },
+        toolDetectorManager: { detect },
+      } as any);
+      const { sessionId } = await ctr.startSession({
+        agentType: 'codex',
+        command: '/custom/bin/codex',
+      });
+      await ctr.sendPrompt({ operationId: 'op-test', prompt: 'hello', sessionId });
+
+      expect(detect).not.toHaveBeenCalled();
+      expect(spawnCalls[0].command).toBe('/custom/bin/codex');
+    });
+
    it('passes prompt via stdin to codex exec instead of argv', async () => {
      const prompt = '--run a shell-like prompt safely';
      const { cliArgs, command, writes } = await runSendPrompt(prompt);
@@ -1,3 +1,5 @@
+import path from 'node:path';
+
 import { zipSync } from 'fflate';
 import { beforeEach, describe, expect, it, vi } from 'vitest';

@@ -88,6 +90,7 @@ const mockLocalFileProtocolManager = {
  approveIndexedProjectRoot: vi.fn(),
  approveProjectRootFromScope: vi.fn(),
  createPreviewUrl: vi.fn(),
+  readPreviewFile: vi.fn(),
 };

 // Mock makeSureDirExist
@@ -146,7 +149,6 @@ describe('LocalFileCtr', () => {

    it('should expand a leading ~ to the user home directory', async () => {
      const os = await import('node:os');
-      const path = await import('node:path');
      vi.mocked(mockShell.openPath).mockResolvedValue('');

      const result = await localFileCtr.handleOpenLocalFile({ path: '~/git/work/file.txt' });
@@ -171,7 +173,6 @@ describe('LocalFileCtr', () => {

    it('should expand a leading ~ when opening a directory', async () => {
      const os = await import('node:os');
-      const path = await import('node:path');
      vi.mocked(mockShell.openPath).mockResolvedValue('');

      const result = await localFileCtr.handleOpenLocalFolder({
@@ -224,6 +225,7 @@ describe('LocalFileCtr', () => {
      });

      expect(mockLocalFileProtocolManager.createPreviewUrl).toHaveBeenCalledWith({
+        accept: undefined,
        filePath: '/workspace/app.ts',
        workspaceRoot: '/workspace',
      });
@@ -246,6 +248,99 @@ describe('LocalFileCtr', () => {
        success: false,
      });
    });
+
+    it('should forward image-only preview URL constraints', async () => {
+      mockLocalFileProtocolManager.createPreviewUrl.mockResolvedValue(
+        'localfile://file/workspace/image.png?token=abc',
+      );
+
+      const result = await localFileCtr.getLocalFilePreviewUrl({
+        accept: 'image',
+        path: '/workspace/image.png',
+        workingDirectory: '/workspace',
+      });
+
+      expect(mockLocalFileProtocolManager.createPreviewUrl).toHaveBeenCalledWith({
+        accept: 'image',
+        filePath: '/workspace/image.png',
+        workspaceRoot: '/workspace',
+      });
+      expect(result).toEqual({
+        success: true,
+        url: 'localfile://file/workspace/image.png?token=abc',
+      });
+    });
+  });
+
+  describe('getLocalFilePreview', () => {
+    it('should return text preview content for an approved workspace file', async () => {
+      mockLocalFileProtocolManager.readPreviewFile.mockResolvedValue({
+        buffer: Buffer.from('const value = 1;'),
+        contentType: 'text/plain; charset=utf-8',
+        realPath: '/workspace/app.ts',
+      });
+
+      const result = await localFileCtr.getLocalFilePreview({
+        path: '/workspace/app.ts',
+        workingDirectory: '/workspace',
+      });
+
+      expect(mockLocalFileProtocolManager.readPreviewFile).toHaveBeenCalledWith({
+        accept: undefined,
+        filePath: '/workspace/app.ts',
+        workspaceRoot: '/workspace',
+      });
+      expect(result).toEqual({
+        preview: {
+          content: 'const value = 1;',
+          contentType: 'text/plain',
+          type: 'text',
+        },
+        success: true,
+      });
+    });
+
+    it('should reject preview payload creation outside an approved workspace', async () => {
+      mockLocalFileProtocolManager.readPreviewFile.mockResolvedValue(null);
+
+      const result = await localFileCtr.getLocalFilePreview({
+        path: '/Users/alice/.ssh/id_rsa',
+        workingDirectory: '/workspace',
+      });
+
+      expect(result).toEqual({
+        error: 'File is outside the approved workspace',
+        success: false,
+      });
+    });
+
+    it('should forward image-only preview read constraints', async () => {
+      mockLocalFileProtocolManager.readPreviewFile.mockResolvedValue({
+        buffer: Buffer.from('image-bytes'),
+        contentType: 'image/png',
+        realPath: '/workspace/image.png',
+      });
+
+      const result = await localFileCtr.getLocalFilePreview({
+        accept: 'image',
+        path: '/workspace/image.png',
+        workingDirectory: '/workspace',
+      });
+
+      expect(mockLocalFileProtocolManager.readPreviewFile).toHaveBeenCalledWith({
+        accept: 'image',
+        filePath: '/workspace/image.png',
+        workspaceRoot: '/workspace',
+      });
+      expect(result).toEqual({
+        preview: {
+          base64: Buffer.from('image-bytes').toString('base64'),
+          contentType: 'image/png',
+          type: 'image',
+        },
+        success: true,
+      });
+    });
  });

  describe('handleWriteFile', () => {
@@ -7,7 +7,7 @@ import type { BrowserWindowConstructorOptions } from 'electron';
 import { app, BrowserWindow, ipcMain, screen, session as electronSession, shell } from 'electron';

 import { preloadDir, resourcesDir } from '@/const/dir';
-import { isMac } from '@/const/env';
+import { DESKTOP_EXTERNAL_NAVIGATION_HOSTS, isMac } from '@/const/env';
 import RemoteServerConfigCtr from '@/controllers/RemoteServerConfigCtr';
 import { backendProxyProtocolManager } from '@/core/infrastructure/BackendProxyProtocolManager';
 import { appendVercelCookie, setResponseHeader } from '@/utils/http-headers';
@@ -19,6 +19,31 @@ import { WindowThemeManager } from './WindowThemeManager';

 const logger = createLogger('core:Browser');

+const getExternalNavigationHosts = () =>
+  DESKTOP_EXTERNAL_NAVIGATION_HOSTS.split(',')
+    .map((host) => host.trim().toLowerCase())
+    .filter(Boolean);
+
+const shouldOpenTopLevelNavigationExternally = (rawUrl: string) => {
+  const externalNavigationHosts = getExternalNavigationHosts();
+  if (externalNavigationHosts.length === 0) return false;
+
+  let url: URL;
+  try {
+    url = new URL(rawUrl);
+  } catch {
+    return false;
+  }
+
+  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
+
+  const hostname = url.hostname.toLowerCase();
+
+  return externalNavigationHosts.some(
+    (externalHost) => hostname === externalHost || hostname.endsWith(`.${externalHost}`),
+  );
+};
+
 // ==================== Types ====================

 export interface BrowserWindowOpts extends BrowserWindowConstructorOptions {
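Reviewer note: the host check above matches either the exact host or a dot-delimited subdomain, so a lookalike such as `notstripe.com` does not match `stripe.com`. The standalone sketch below restates that rule so the edge cases can be exercised without Electron. `isExternalHost` and `parseHostList` are hypothetical names, and the host list is passed in as a parameter rather than read from `DESKTOP_EXTERNAL_NAVIGATION_HOSTS`.

```typescript
// Split a comma-separated host list, dropping blanks and normalizing case.
const parseHostList = (raw: string): string[] =>
  raw
    .split(',')
    .map((host) => host.trim().toLowerCase())
    .filter(Boolean);

// True when rawUrl is an http(s) URL whose hostname equals a listed host
// or is a subdomain of one. An empty list or an unparsable URL yields false.
const isExternalHost = (rawUrl: string, hostList: string): boolean => {
  const hosts = parseHostList(hostList);
  if (hosts.length === 0) return false;

  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;

  const hostname = url.hostname.toLowerCase();
  return hosts.some((host) => hostname === host || hostname.endsWith(`.${host}`));
};
```

The leading dot in the suffix test is what separates `checkout.stripe.com` (matches) from `notstripe.com` and `stripe.com.evil.example` (neither matches).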
@@ -194,10 +219,27 @@ export default class Browser {
    this.setupReadyToShowListener(browserWindow);
    this.setupCloseListener(browserWindow);
    this.setupFocusListener(browserWindow);
+    this.setupFullscreenListener(browserWindow);
+    this.setupTopLevelNavigationListener(browserWindow);
    this.setupWillPreventUnloadListener(browserWindow);
    this.setupContextMenu(browserWindow);
  }

+  private setupTopLevelNavigationListener(browserWindow: BrowserWindow): void {
+    logger.debug(`[${this.identifier}] Setting up top-level navigation listener.`);
+
+    browserWindow.webContents.on('will-navigate', (event, url) => {
+      if (!shouldOpenTopLevelNavigationExternally(url)) return;
+
+      logger.info(`[${this.identifier}] Opening top-level navigation externally: ${url}`);
+      event.preventDefault();
+
+      shell.openExternal(url).catch((error) => {
+        logger.error(`[${this.identifier}] Failed to open external navigation URL: ${url}`, error);
+      });
+    });
+  }
+
  /**
   * Setup window open handler to intercept external links
   * Prevents opening new windows in renderer and uses system browser instead
@@ -268,6 +310,18 @@ export default class Browser {
    });
  }

+  private setupFullscreenListener(browserWindow: BrowserWindow): void {
+    logger.debug(`[${this.identifier}] Setting up fullscreen event listeners.`);
+
+    browserWindow.on('enter-full-screen', () => {
+      this.broadcast('windowFullscreenChanged', { isFullScreen: true });
+    });
+
+    browserWindow.on('leave-full-screen', () => {
+      this.broadcast('windowFullscreenChanged', { isFullScreen: false });
+    });
+  }
+
  /**
   * Setup context menu with platform-specific features
   * Delegates to MenuManager for consistent platform behavior
@@ -368,6 +368,11 @@ export class BrowserManager {
    return browser?.browserWindow.isMaximized() ?? false;
  }

+  isWindowFullScreen(identifier: string) {
+    const browser = this.browsers.get(identifier);
+    return browser?.browserWindow.isFullScreen() ?? false;
+  }
+
  setWindowSize(identifier: string, size: { height?: number; width?: number }) {
    const browser = this.browsers.get(identifier);
    browser?.setWindowSize(size);
@@ -9,6 +9,7 @@ const {
  mockBrowserWindow,
  mockNativeTheme,
  mockIpcMain,
+  mockShell,
  mockScreen,
  MockBrowserWindow,
  mockEnv,
@@ -64,6 +65,7 @@ const {
    MockBrowserWindow: vi.fn().mockImplementation(() => mockBrowserWindow),
    mockBrowserWindow,
    mockEnv: {
+      externalNavigationHosts: '',
      isDev: false,
      isLinux: false,
      isMac: false,
@@ -91,6 +93,9 @@ const {
        workArea: { height: 1080, width: 1920, x: 0, y: 0 },
      }),
    },
+    mockShell: {
+      openExternal: vi.fn().mockResolvedValue(undefined),
+    },
  };
 });

@@ -101,6 +106,7 @@ vi.mock('electron', () => ({
  ipcMain: mockIpcMain,
  nativeTheme: mockNativeTheme,
  screen: mockScreen,
+  shell: mockShell,
 }));

 // Mock logger
@@ -121,6 +127,9 @@ vi.mock('@/const/dir', () => ({
 }));

 vi.mock('@/const/env', () => ({
+  get DESKTOP_EXTERNAL_NAVIGATION_HOSTS() {
+    return mockEnv.externalNavigationHosts;
+  },
  get isDev() {
    return mockEnv.isDev;
  },
@@ -182,6 +191,7 @@ describe('Browser', () => {
    mockEnv.isMac = false;
    mockEnv.isMacTahoe = false;
    mockEnv.isWindows = true;
+    mockEnv.externalNavigationHosts = '';

    // Create mock App
    mockStoreManagerGet = vi.fn().mockReturnValue(undefined);
@@ -531,6 +541,30 @@ describe('Browser', () => {
      });
    });

+    describe('fullscreen events', () => {
+      it('should broadcast fullscreen state changes', () => {
+        const enterHandler = mockBrowserWindow.on.mock.calls.find(
+          (call) => call[0] === 'enter-full-screen',
+        )?.[1];
+        const leaveHandler = mockBrowserWindow.on.mock.calls.find(
+          (call) => call[0] === 'leave-full-screen',
+        )?.[1];
+
+        expect(enterHandler).toBeDefined();
+        expect(leaveHandler).toBeDefined();
+
+        enterHandler();
+        expect(mockBrowserWindow.webContents.send).toHaveBeenCalledWith('windowFullscreenChanged', {
+          isFullScreen: true,
+        });
+
+        leaveHandler();
+        expect(mockBrowserWindow.webContents.send).toHaveBeenCalledWith('windowFullscreenChanged', {
+          isFullScreen: false,
+        });
+      });
+    });
+
    describe('close', () => {
      it('should close window', () => {
        browser.close();
@@ -730,4 +764,38 @@ describe('Browser', () => {
      expect(mockEvent.preventDefault).not.toHaveBeenCalled();
    });
  });
+
+  describe('top-level navigation handling', () => {
+    let willNavigateHandler: (event: any, url: string) => void;
+
+    beforeEach(() => {
+      willNavigateHandler = mockBrowserWindow.webContents.on.mock.calls.find(
+        (call) => call[0] === 'will-navigate',
+      )?.[1];
+    });
+
+    it('should open configured external navigation hosts in system browser', () => {
+      mockEnv.externalNavigationHosts = 'stripe.com';
+      const mockEvent = { preventDefault: vi.fn() };
+
+      expect(willNavigateHandler).toBeDefined();
+      willNavigateHandler(mockEvent, 'https://checkout.stripe.com/c/pay/session_id');
+
+      expect(mockEvent.preventDefault).toHaveBeenCalled();
+      expect(mockShell.openExternal).toHaveBeenCalledWith(
+        'https://checkout.stripe.com/c/pay/session_id',
+      );
+    });
+
+    it('should allow internal result routes in the app window', () => {
+      mockEnv.externalNavigationHosts = 'stripe.com';
+      const mockEvent = { preventDefault: vi.fn() };
+
+      expect(willNavigateHandler).toBeDefined();
+      willNavigateHandler(mockEvent, 'http://localhost:3000/payment/upgrade-success');
+
+      expect(mockEvent.preventDefault).not.toHaveBeenCalled();
+      expect(mockShell.openExternal).not.toHaveBeenCalled();
+    });
+  });
 });
@@ -48,6 +48,27 @@ interface PreviewTokenRecord {
  realPath: string;
 }

+export interface PreviewFileReadResult {
+  buffer: Buffer;
+  contentType: string;
+  realPath: string;
+}
+
+type PreviewFileAccept = 'image';
+
+const normalizeContentType = (contentType: string): string =>
+  contentType.split(';')[0].trim().toLowerCase();
+
+const isAcceptedPreviewContentType = (
+  contentType: string,
+  accept?: PreviewFileAccept,
+): boolean => {
+  if (!accept) return true;
+
+  const normalizedContentType = normalizeContentType(contentType);
+  return accept === 'image' && normalizedContentType.startsWith('image/');
+};
+
 /**
 * Custom `localfile://` protocol for project file previews.
 *
@@ -207,43 +228,65 @@ export class LocalFileProtocolManager {
  }

  async createPreviewUrl({
+    accept,
    filePath,
    workspaceRoot,
  }: {
+    accept?: PreviewFileAccept;
    filePath: string;
    workspaceRoot: string;
  }): Promise<string | null> {
    const normalizedFilePath = normalizeAbsolutePath(filePath);
-    const normalizedWorkspaceRoot = normalizeAbsolutePath(workspaceRoot);
-    if (!normalizedFilePath || !normalizedWorkspaceRoot) return null;
+    if (!normalizedFilePath) return null;

-    const [realFilePath, realWorkspaceRoot] = await Promise.all([
-      realpath(normalizedFilePath),
-      realpath(normalizedWorkspaceRoot),
-    ]);
-    const normalizedRealFilePath = normalizeAbsolutePath(realFilePath);
-    const normalizedRealWorkspaceRoot = normalizeAbsolutePath(realWorkspaceRoot);
-
-    if (!normalizedRealFilePath || !normalizedRealWorkspaceRoot) return null;
-    if (
-      !this.approvedWorkspaceRoots.has(normalizedRealWorkspaceRoot) &&
-      !this.indexedProjectRoots.has(normalizedRealWorkspaceRoot)
-    ) {
-      return null;
-    }
-    if (!isPathWithinRoot(normalizedRealFilePath, normalizedRealWorkspaceRoot)) return null;
+    const realFilePath = accept
+      ? (
+          await this.readPreviewFile({
+            accept,
+            filePath,
+            workspaceRoot,
+          })
+        )?.realPath
+      : await this.resolveApprovedPreviewPath({ filePath, workspaceRoot });
+    if (!realFilePath) return null;

    this.cleanupExpiredTokens();

    const token = randomUUID();
    this.previewTokens.set(token, {
      expiresAt: Date.now() + PREVIEW_TOKEN_TTL_MS,
-      realPath: normalizedRealFilePath,
+      realPath: realFilePath,
    });

    return buildLocalFileUrl(normalizedFilePath, token);
  }

+  async readPreviewFile({
+    accept,
+    filePath,
+    workspaceRoot,
+  }: {
+    accept?: PreviewFileAccept;
+    filePath: string;
+    workspaceRoot: string;
+  }): Promise<PreviewFileReadResult | null> {
+    const realFilePath = await this.resolveApprovedPreviewPath({ filePath, workspaceRoot });
+    if (!realFilePath) return null;
+
+    const fileStat = await stat(realFilePath);
+    if (!fileStat.isFile()) return null;
+
+    const buffer = await readFile(realFilePath);
+    const contentType = resolveLocalFileMimeType(realFilePath, buffer);
+    if (!isAcceptedPreviewContentType(contentType, accept)) return null;
+
+    return {
+      buffer,
+      contentType,
+      realPath: realFilePath,
+    };
+  }
+
  /**
   * Decode the URL pathname back into an absolute filesystem path.
   *
@@ -283,6 +326,36 @@ export class LocalFileProtocolManager {
    return normalized;
  }

+  private async resolveApprovedPreviewPath({
+    filePath,
+    workspaceRoot,
+  }: {
+    filePath: string;
+    workspaceRoot: string;
+  }): Promise<string | null> {
+    const normalizedFilePath = normalizeAbsolutePath(filePath);
+    const normalizedWorkspaceRoot = normalizeAbsolutePath(workspaceRoot);
+    if (!normalizedFilePath || !normalizedWorkspaceRoot) return null;
+
+    const [realFilePath, realWorkspaceRoot] = await Promise.all([
+      realpath(normalizedFilePath),
+      realpath(normalizedWorkspaceRoot),
+    ]);
+    const normalizedRealFilePath = normalizeAbsolutePath(realFilePath);
+    const normalizedRealWorkspaceRoot = normalizeAbsolutePath(realWorkspaceRoot);
+
+    if (!normalizedRealFilePath || !normalizedRealWorkspaceRoot) return null;
+    if (
+      !this.approvedWorkspaceRoots.has(normalizedRealWorkspaceRoot) &&
+      !this.indexedProjectRoots.has(normalizedRealWorkspaceRoot)
+    ) {
+      return null;
+    }
+    if (!isPathWithinRoot(normalizedRealFilePath, normalizedRealWorkspaceRoot)) return null;
+
+    return normalizedRealFilePath;
+  }
+
  private cleanupExpiredTokens() {
    const now = Date.now();
    for (const [token, record] of this.previewTokens) {
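Reviewer note: the `accept` gate in this file compares against the MIME type with its parameters stripped, which is why a `text/plain; charset=utf-8` file is rejected for `accept: 'image'`. The sketch below isolates that check. `AcceptSketch` and `isAcceptedSketch` are hypothetical stand-ins for the names in the diff, and the `?? ''` guard is an addition so the snippet also compiles under stricter indexed-access settings.

```typescript
// Stand-in for PreviewFileAccept; only 'image' exists in the diff.
type AcceptSketch = 'image';

// Drop MIME parameters (e.g. "; charset=utf-8") and normalize case.
const normalizeContentType = (contentType: string): string =>
  (contentType.split(';')[0] ?? '').trim().toLowerCase();

// No constraint means everything passes; 'image' requires an image/* type.
const isAcceptedSketch = (contentType: string, accept?: AcceptSketch): boolean => {
  if (!accept) return true;
  return accept === 'image' && normalizeContentType(contentType).startsWith('image/');
};
```

Because `createPreviewUrl` routes through `readPreviewFile` whenever `accept` is set, the same gate decides both whether a payload is returned and whether a token is minted.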
@@ -15,6 +15,15 @@ export interface ToolStatus {
  error?: string;
  lastChecked?: Date;
  path?: string;
+  /**
+   * PATH value used to resolve/validate the command, surfaced only when it
+   * differs from the detector process's `process.env.PATH` (e.g. resolution
+   * fell back to the login-shell PATH). A caller that spawns the resolved
+   * `path` must carry this into the child's PATH, or a `#!/usr/bin/env node`
+   * shim that resolved here still fails with `env: node: No such file or
+   * directory` under the leaner inherited env.
+   */
+  resolvedPathEnv?: string;
  version?: string;
 }
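Reviewer note: the doc comment on `resolvedPathEnv` asks callers to carry that PATH into the child when spawning the resolved `path`. The controller code that does this is not part of this hunk, so the following is only a sketch of the contract under that assumption. `ToolStatusSketch`, `buildSpawnEnv`, and `resolveSpawnCommand` are hypothetical names, and the case where an explicit path-like command bypasses the detector entirely is not modeled here.

```typescript
// Hypothetical subset of ToolStatus, limited to the fields used below.
interface ToolStatusSketch {
  available: boolean;
  path?: string;
  resolvedPathEnv?: string;
}

type EnvSketch = Record<string, string | undefined>;

// Prefer the detector's absolute path when the tool was found; otherwise
// fall back to the bare command.
const resolveSpawnCommand = (command: string, status: ToolStatusSketch): string =>
  status.available && status.path ? status.path : command;

// Override PATH only when the detector resolved under a different PATH
// (for example the login-shell PATH); otherwise copy the env unchanged.
const buildSpawnEnv = (baseEnv: EnvSketch, status: ToolStatusSketch): EnvSketch =>
  status.resolvedPathEnv ? { ...baseEnv, PATH: status.resolvedPathEnv } : { ...baseEnv };
```

Both helpers return new values rather than mutating `process.env`, so a PATH borrowed for one child does not leak into later spawns.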

@@ -119,6 +119,21 @@ describe('LocalFileProtocolManager', () => {
    expect(response.headers.get('Content-Type')).toBe('text/plain; charset=utf-8');
  });

+  it('does not mint image-only preview URLs for text files', async () => {
+    const manager = new LocalFileProtocolManager();
+    await manager.approveWorkspaceRoot('/Users/alice/project');
+    mockReadFile.mockResolvedValue(Buffer.from('const value = 1;'));
+
+    const url = await manager.createPreviewUrl({
+      accept: 'image',
+      filePath: '/Users/alice/project/App.tsx',
+      workspaceRoot: '/Users/alice/project',
+    });
+
+    expect(url).toBeNull();
+    expect(mockReadFile).toHaveBeenCalledWith('/Users/alice/project/App.tsx');
+  });
+
  it('decodes percent-encoded characters in the path', async () => {
    const manager = new LocalFileProtocolManager();
    manager.registerHandler();
@@ -278,6 +293,52 @@ describe('LocalFileProtocolManager', () => {
    expect(url).toContain('token=');
  });

+  it('reads preview payloads only from approved project roots', async () => {
+    const manager = new LocalFileProtocolManager();
+    await manager.approveIndexedProjectRoot('/Users/alice/project');
+    mockReadFile.mockResolvedValue(Buffer.from('const value = 1;'));
+
+    const result = await manager.readPreviewFile({
+      filePath: '/Users/alice/project/App.tsx',
+      workspaceRoot: '/Users/alice/project',
+    });
+
+    expect(result).toEqual({
+      buffer: Buffer.from('const value = 1;'),
+      contentType: 'text/plain; charset=utf-8',
+      realPath: '/Users/alice/project/App.tsx',
+    });
+    expect(mockReadFile).toHaveBeenCalledWith('/Users/alice/project/App.tsx');
+  });
+
+  it('does not return text payloads for image-only preview reads', async () => {
+    const manager = new LocalFileProtocolManager();
+    await manager.approveIndexedProjectRoot('/Users/alice/project');
+    mockReadFile.mockResolvedValue(Buffer.from('SECRET=value'));
+
+    const result = await manager.readPreviewFile({
+      accept: 'image',
+      filePath: '/Users/alice/project/.env',
+      workspaceRoot: '/Users/alice/project',
+    });
+
+    expect(result).toBeNull();
+    expect(mockReadFile).toHaveBeenCalledWith('/Users/alice/project/.env');
+  });
+
+  it('does not read preview payloads outside the approved workspace root', async () => {
+    const manager = new LocalFileProtocolManager();
+    await manager.approveIndexedProjectRoot('/Users/alice/project');
+
+    const result = await manager.readPreviewFile({
+      filePath: '/Users/alice/.ssh/id_rsa',
+      workspaceRoot: '/Users/alice/project',
+    });
+
+    expect(result).toBeNull();
+    expect(mockReadFile).not.toHaveBeenCalled();
+  });
+
  it('defers registration until app ready when not yet ready', async () => {
    mockApp.isReady.mockReturnValue(false);
    let resolveReady: () => void = () => undefined;
@@ -50,6 +50,13 @@ const envNumber = (defaultValue: number) =>
    }, z.number().optional())
    .default(defaultValue);

+const getRuntimeEnv = () => ({
+  ...process.env,
+  DESKTOP_EXTERNAL_NAVIGATION_HOSTS: process.env.DESKTOP_EXTERNAL_NAVIGATION_HOSTS,
+  UPDATE_CHANNEL: process.env.UPDATE_CHANNEL,
+  UPDATE_SERVER_URL: process.env.UPDATE_SERVER_URL,
+});
+
 /**
 * Desktop (Electron main process) runtime env access.
 *
@@ -63,13 +70,15 @@ export const getDesktopEnv = memoize(() =>
    clientPrefix: 'PUBLIC_',
    emptyStringAsUndefined: true,
    isServer: true,
-    runtimeEnv: process.env,
+    runtimeEnv: getRuntimeEnv(),
    server: {
      DEBUG_VERBOSE: envBoolean(false),

      // escape hatch: allow testing static renderer in dev via env
      DESKTOP_RENDERER_STATIC: envBoolean(false),

+      DESKTOP_EXTERNAL_NAVIGATION_HOSTS: z.string().optional().default(''),
+
      // device gateway url override (dev: point at a local `wrangler dev` instance,
      // e.g. http://localhost:8787). Falls back to the stored value, then the
      // production gateway.
@@ -1,5 +1,6 @@
 import * as childProcess from 'node:child_process';
 import * as os from 'node:os';
+import path from 'node:path';

 import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';

@@ -180,6 +181,76 @@ describe('cliAgentDetectors', () => {
      expect(status.path).toBe('/usr/local/bin/claude');
      expect(execMock).not.toHaveBeenCalled();
      expect(execFileMock).toHaveBeenCalledTimes(2);
+      // Resolved on the inherited PATH — nothing extra to carry into spawn.
+      expect(status.resolvedPathEnv).toBeUndefined();
+    });
+
+    it('falls back to the Codex.app bundled CLI when `codex` is not on any PATH', async () => {
+      const originalPath = process.env.PATH;
+      const originalShell = process.env.SHELL;
+      // Deterministic env: no SHELL → no login-shell lookup, merged PATH
+      // equals process.env.PATH → no second `which` attempt.
+      process.env.PATH = '/usr/bin:/bin';
+      delete process.env.SHELL;
+
+      try {
+        callExecFileError(new Error('not found')); // which codex
+        callExecFile('codex-cli 0.138.0'); // bundled CLI --version
+
+        const { codexDetector } = await import('../cliAgentDetectors');
+        const status = await codexDetector.detect();
+
+        expect(status.available).toBe(true);
+        expect(status.path).toBe('/Applications/Codex.app/Contents/Resources/codex');
+        expect(status.version).toBe('codex-cli 0.138.0');
+
+        expect(execFileMock).toHaveBeenCalledTimes(2);
+        expect(execFileMock.mock.calls[0]![0]).toBe('which');
+        expect(execFileMock.mock.calls[1]![0]).toBe(
+          '/Applications/Codex.app/Contents/Resources/codex',
+        );
+      } finally {
+        process.env.PATH = originalPath;
+        if (originalShell === undefined) delete process.env.SHELL;
+        else process.env.SHELL = originalShell;
+      }
+    });
+
+    it('stays unavailable when neither PATH nor the well-known locations have codex', async () => {
+      const originalPath = process.env.PATH;
+      const originalShell = process.env.SHELL;
+      process.env.PATH = '/usr/bin:/bin';
+      delete process.env.SHELL;
+
+      try {
+        callExecFileError(new Error('not found')); // which codex
+        callExecFileError(new Error('ENOENT')); // /Applications candidate
+        callExecFileError(new Error('ENOENT')); // ~/Applications candidate
+
+        const { codexDetector } = await import('../cliAgentDetectors');
+        const status = await codexDetector.detect();
+
+        expect(status.available).toBe(false);
+        expect(execFileMock).toHaveBeenCalledTimes(3);
+        expect(execFileMock.mock.calls[2]![0]).toBe(
+          path.join(os.homedir(), 'Applications', 'Codex.app', 'Contents', 'Resources', 'codex'),
+        );
+      } finally {
+        process.env.PATH = originalPath;
+        if (originalShell === undefined) delete process.env.SHELL;
+        else process.env.SHELL = originalShell;
+      }
+    });
+
+    it('does not probe well-known locations for an explicit path-like command', async () => {
+      callExecFileError(new Error('ENOENT')); // /custom/bin/codex --version
+
+      const { detectHeterogeneousCliCommand } = await import('../cliAgentDetectors');
+      const status = await detectHeterogeneousCliCommand('codex', '/custom/bin/codex');
+
+      expect(status.available).toBe(false);
+      // Only the explicit path's --version attempt — no fallback probing.
+      expect(execFileMock).toHaveBeenCalledTimes(1);
    });

    it('falls back to the login shell PATH for tools installed by shell setup', async () => {
@@ -200,6 +271,12 @@ describe('cliAgentDetectors', () => {
        expect(status.available).toBe(true);
        expect(status.path).toBe('/Users/Hanam/.local/share/mise/shims/gemini');
        expect(status.version).toBe('gemini 0.2.0');
+        // The login-shell PATH that resolved the shim must be surfaced so the
+        // spawn site can carry it into the child env (mise/nvm `node` lives
+        // there, not on the leaner inherited PATH).
+        expect(status.resolvedPathEnv).toBe(
+          '/opt/homebrew/bin:/Users/Hanam/.local/share/mise/shims:/usr/bin:/bin',
+        );

        expect(execFileMock).toHaveBeenCalledTimes(4);
        expect(execFileMock.mock.calls[0]![0]).toBe('which');
@@ -1,5 +1,5 @@
 import { exec, execFile } from 'node:child_process';
-import { platform } from 'node:os';
+import { homedir, platform } from 'node:os';
 import path from 'node:path';
 import { promisify } from 'node:util';

@@ -190,6 +190,11 @@ const detectValidatedCommand = async (
    return {
      available: true,
      path: resolvedPath,
+      // `env` is set only when resolution fell back to the login-shell PATH.
+      // Surface that PATH so the spawn site can carry it into the child env —
+      // otherwise a `#!/usr/bin/env node` shim resolved here can't find `node`
+      // under the leaner inherited PATH (Finder-launched Electron).
+      resolvedPathEnv: env?.PATH,
      version: output.split(/\r?\n/)[0],
    };
  } catch {
@@ -209,6 +214,27 @@ const HETEROGENEOUS_CLI_AGENT_OPTIONS = {
  Pick<ValidatedDetectorOptions, 'validateKeywords'>
 >;

+// Well-known absolute install locations probed when a bare command isn't on
+// PATH. The Codex desktop app bundles a fully functional CLI inside Codex.app
+// (sharing ~/.codex auth/config) but never symlinks it into PATH, so
+// `which codex` misses an otherwise working install.
+const getWellKnownCommandPaths = (agentType: HeterogeneousCliAgentType): string[] => {
+  if (platform() !== 'darwin') return [];
+
+  switch (agentType) {
+    case 'codex': {
+      const bundledCli = path.join('Codex.app', 'Contents', 'Resources', 'codex');
+      return [
+        path.join('/Applications', bundledCli),
+        path.join(homedir(), 'Applications', bundledCli),
+      ];
+    }
+    default: {
+      return [];
+    }
+  }
+};
+
 export const detectHeterogeneousCliCommand = async (
  agentType: HeterogeneousCliAgentType,
  command: string,
@@ -216,7 +242,20 @@ export const detectHeterogeneousCliCommand = async (
  const validator = HETEROGENEOUS_CLI_AGENT_OPTIONS[agentType];
  if (!validator) return { available: false };

-  return detectValidatedCommand(command, validator);
+  const status = await detectValidatedCommand(command, validator);
+  if (status.available) return status;
+
+  // A bare command missing from PATH may still live at a well-known install
+  // location (e.g. the Codex desktop app's bundled CLI). Don't second-guess
+  // an explicit user-configured path.
+  if (!command.trim().includes(path.sep)) {
+    for (const candidate of getWellKnownCommandPaths(agentType)) {
+      const fallbackStatus = await detectValidatedCommand(candidate, validator);
+      if (fallbackStatus.available) return fallbackStatus;
+    }
+  }
+
+  return status;
 };

 /**
@@ -261,14 +300,17 @@ export const claudeCodeDetector: IToolDetector = createValidatedDetector({
 /**
 * OpenAI Codex CLI
 * @see https://github.com/openai/codex
+ *
+ * Goes through `detectHeterogeneousCliCommand` so the Codex.app bundled-CLI
+ * fallback applies here too, keeping the manager path and the custom-command
+ * path in sync.
 */
-export const codexDetector: IToolDetector = createValidatedDetector({
-  candidates: ['codex'],
+export const codexDetector: IToolDetector = {
  description: 'Codex - OpenAI agentic coding CLI',
+  detect: () => detectHeterogeneousCliCommand('codex', 'codex'),
  name: 'codex',
  priority: 2,
-  validateKeywords: ['codex'],
-});
+};

 /**
 * Google Gemini CLI
@@ -17,24 +17,23 @@ const log = debug('lobe-server:agent-runtime:coordinator');
 * decision) starts, but that resume runs under a **new** operationId with
 * its own event stream. For the paused operationId no further events will
 * arrive, so clients should stop waiting the same way they do on done.
+ *
+ * `waiting_for_async_tool` is different: deferred tools such as server
+ * sub-agents resume the SAME operationId after the out-of-band result is
+ * backfilled. Ending the stream at park time makes the client mark the turn
+ * as stopped while the server is still waiting for sub-agents.
 */
 const STREAM_END_STATUSES = new Set<AgentState['status']>([
  'done',
  'error',
  'interrupted',
  'waiting_for_human',
-  'waiting_for_async_tool',
 ]);

 const hasEnteredStreamEndState = (
  previousStatus?: AgentState['status'],
  nextStatus?: AgentState['status'],
-): nextStatus is
-  | 'done'
-  | 'error'
-  | 'interrupted'
-  | 'waiting_for_human'
-  | 'waiting_for_async_tool' => {
+): nextStatus is 'done' | 'error' | 'interrupted' | 'waiting_for_human' => {
  const wasStreamEnd = previousStatus ? STREAM_END_STATUSES.has(previousStatus) : false;
  return Boolean(nextStatus && STREAM_END_STATUSES.has(nextStatus) && !wasStreamEnd);
 };
@@ -61,6 +61,7 @@ import { chainCompressContext } from '@lobechat/prompts';
 import {
  type ChatToolPayload,
  type ExecSubAgentParams,
+  type ExecVirtualSubAgentParams,
  type MessageToolCall,
  type UIChatMessage,
 } from '@lobechat/types';
@@ -73,6 +74,7 @@ import { TopicModel } from '@/database/models/topic';
 import { UserModel } from '@/database/models/user';
 import { type LobeChatDatabase } from '@/database/type';
 import { fileEnv } from '@/envs/file';
+import { type ExecutionPlan, isDeviceCapablePlan } from '@/helpers/executionTarget';
 import { serverMessagesEngine } from '@/server/modules/Mecha/ContextEngineering';
 import { type EvalContext } from '@/server/modules/Mecha/ContextEngineering/types';
 import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
@@ -202,6 +204,51 @@ const isEmptyModelCompletion = (params: {
  return true;
 };

+type ReasoningReplayNode = {
+  children?: ReasoningReplayNode[];
+  members?: ReasoningReplayNode[];
+  reasoning?: unknown;
+};
+
+const stripAssistantReasoningForReplay = (messages: UIChatMessage[]): UIChatMessage[] => {
+  const stripMessage = <T extends ReasoningReplayNode>(message: T): T => {
+    let changed = false;
+
+    const children = message.children?.map((child) => {
+      const strippedChild = stripMessage(child);
+      if (strippedChild !== child) changed = true;
+      return strippedChild;
+    });
+
+    const members = message.members?.map((member) => {
+      const strippedMember = stripMessage(member);
+      if (strippedMember !== member) changed = true;
+      return strippedMember;
+    });
+
+    if ('reasoning' in message) changed = true;
+    if (!changed) return message;
+
+    const { reasoning: _reasoning, ...messageWithoutReasoning } = message;
+
+    return {
+      ...messageWithoutReasoning,
+      ...(children ? { children } : {}),
+      ...(members ? { members } : {}),
+    } as T;
+  };
+
+  let changed = false;
+
+  const strippedMessages = messages.map((message) => {
+    const strippedMessage = stripMessage(message);
+    if (strippedMessage !== message) changed = true;
+    return strippedMessage;
+  });
+
+  return changed ? strippedMessages : messages;
+};
+
 const GEN_AI_FUNCTION_TOOL_TYPE: ToolType = 'function';

 type ToolFailureKind = 'replan' | 'retry' | 'stop';
@@ -277,7 +324,7 @@ const buildPostProcessUrl = (
 };

 /**
- * Build the per-tool-call server sub-agent runner injected into the tool
+ * Build the per-tool-call server virtual sub-agent runner injected into the tool
 * execution context. Closes over the current tool payload + parent message so
 * the `callSubAgent` server tool can fork a child op without re-deriving the
 * message anchor (which it cannot do correctly from its own context).
@@ -285,17 +332,18 @@ const buildPostProcessUrl = (
 * The runner creates the pending placeholder tool message that anchors the
 * isolation thread (so the UI shows a loading state and the completion bridge
 * has a message to backfill), then kicks off the child op asynchronously and
- * returns immediately. Returns `undefined` when sub-agent execution is not
- * available (no `execSubAgent` callback, or missing agent/topic context).
+ * returns immediately. Returns `undefined` when virtual sub-agent execution is
+ * not available (no `execVirtualSubAgent` callback, or missing agent/topic
+ * context).
 */
-const buildServerSubAgentRunner = (
+const buildServerVirtualSubAgentRunner = (
  ctx: RuntimeExecutorContext,
  state: AgentState,
  chatToolPayload: ChatToolPayload,
  parentMessageId: string,
 ): ServerSubAgentRunner | undefined => {
-  const execSubAgent = ctx.execSubAgent;
-  if (!execSubAgent) return undefined;
+  const execVirtualSubAgent = ctx.execVirtualSubAgent;
+  if (!execVirtualSubAgent) return undefined;

  const agentId = state.metadata?.agentId;
  const topicId = ctx.topicId ?? state.metadata?.topicId;
@@ -318,16 +366,15 @@ const buildServerSubAgentRunner = (
        topicId,
      });

-      // 2. Fork the child op anchored to the placeholder. `resumeParentOnComplete`
-      //    tells execSubAgent to register the completion bridge that
-      //    backfills this tool message and resumes the parent op.
-      const result = (await execSubAgent({
+      // 2. Fork the virtual child op anchored to the placeholder. The virtual
+      //    entry marks the child as `isSubAgent` and registers the completion
+      //    bridge that backfills this tool message and resumes the parent op.
+      const result = (await execVirtualSubAgent({
        agentId: targetAgentId ?? agentId,
        groupId: state.metadata?.groupId ?? undefined,
        instruction,
        parentMessageId: placeholder.id,
        parentOperationId: ctx.operationId,
-        resumeParentOnComplete: true,
        timeout,
        title: description,
        topicId,
@@ -341,7 +388,7 @@ const buildServerSubAgentRunner = (
          await ctx.messageModel.deleteMessage(placeholder.id);
        } catch (error) {
          log(
-            'buildServerSubAgentRunner: failed to clean up placeholder %s: %O',
+            'buildServerVirtualSubAgentRunner: failed to clean up placeholder %s: %O',
            placeholder.id,
            error,
          );
@@ -476,11 +523,17 @@ export interface RuntimeExecutorContext {
  discordContext?: any;
  evalContext?: EvalContext;
  /**
-   * Callback to spawn a sub-agent task server-side.
+   * Callback to run a legacy agent invocation server-side.
   * Injected by AiAgentService so exec_sub_agent / exec_sub_agents executors
-   * can dispatch callAgent-triggered tasks without a circular import.
+   * can dispatch callAgent-triggered runs without a circular import.
   */
  execSubAgent?: (params: ExecSubAgentParams) => Promise<unknown>;
+  /**
+   * Callback to fork a `lobe-agent.callSubAgent` virtual child run. Unlike
+   * execSubAgent, this path installs the async completion bridge and marks the
+   * child operation as a sub-agent.
+   */
+  execVirtualSubAgent?: (params: ExecVirtualSubAgentParams) => Promise<unknown>;
  hookDispatcher?: HookDispatcher;
  loadAgentState?: (operationId: string) => Promise<AgentState | null>;
  messageModel: MessageModel;
@@ -532,17 +585,23 @@ export const createRuntimeExecutors = (
    const provider = llmPayload.provider || state.modelRuntimeConfig?.provider;
    // Resolve tools via ToolResolver (unified tool injection).
    //
-    // Belt-and-suspenders: even if `aiAgent.execAgent` ever forgets to clear
-    // `state.metadata.activeDeviceId` for a non-trusted sender, swallowing
-    // it here keeps `buildStepToolDelta` from re-injecting `local-system` —
-    // the engine's enabledToolIds exclusion alone is not enough, since the
-    // delta builder treats activeDeviceId as an independent activation
-    // signal and only dedupes against already-enabled tools.
+    // Single-track device gate: `buildStepToolDelta` treats activeDeviceId as
+    // an independent activation signal (it only dedupes against already-
+    // enabled tools), so any id that reaches it WILL inject local-system. The
+    // execution plan is the only authority on whether this session may touch
+    // a device — swallow the id for non-device-capable plans (`none`,
+    // `sandbox`) and for denied senders, even if `state.metadata.activeDeviceId`
+    // was populated by a bug or a mid-run side effect. Plans absent on old /
+    // resumed operations fall back to the policy-only gate.
    const devicePolicy = state.metadata?.deviceAccessPolicy as
      | { canUseDevice: boolean; reason: DeviceAccessReason }
      | undefined;
+    const executionPlan = state.metadata?.executionPlan as ExecutionPlan | undefined;
+    const planAllowsDevice = !executionPlan || isDeviceCapablePlan(executionPlan);
    const activeDeviceId =
-      devicePolicy?.canUseDevice === false ? undefined : state.metadata?.activeDeviceId;
+      devicePolicy?.canUseDevice === false || !planAllowsDevice
+        ? undefined
+        : state.metadata?.activeDeviceId;
    const operationToolSet: OperationToolSet = state.operationToolSet ?? {
      enabledToolIds: [],
      executorMap: state.toolExecutorMap ?? {},
@@ -660,7 +719,7 @@ export const createRuntimeExecutors = (

    try {
      type ContentPart = { text: string; type: 'text' } | { image: string; type: 'image' };
-      let shouldPersistAssistantReasoning = false;
+      let shouldReplayAssistantReasoning = false;
      let preserveThinkingForPayload: boolean | undefined;

      // Process messages through serverMessagesEngine to inject system role, knowledge, etc.
@@ -699,19 +758,21 @@ export const createRuntimeExecutors = (
          modelSupportsPreserveThinkingFromCard ||
          (!modelCard && providerSupportsPreserveThinkingFallback);

-        shouldPersistAssistantReasoning =
-          preserveThinkingRequested && modelSupportsPreserveThinking;
+        shouldReplayAssistantReasoning = preserveThinkingRequested && modelSupportsPreserveThinking;
        preserveThinkingForPayload =
          modelSupportsPreserveThinking && typeof preserveThinkingConfigured === 'boolean'
            ? preserveThinkingConfigured
            : undefined;
+        const messagesForContext = shouldReplayAssistantReasoning
+          ? (llmPayload.messages as UIChatMessage[])
+          : stripAssistantReasoningForReplay(llmPayload.messages as UIChatMessage[]);

        // Extract <refer_topic> tags from messages and fetch summaries.
        // Skip if messages already contain injected topic_reference_context
        // (e.g., from client-side contextEngineering preprocessing) to avoid double injection.
        let topicReferences;
        const alreadyHasTopicRefs = (
-          llmPayload.messages as Array<{ content: string | unknown }>
+          messagesForContext as Array<{ content: string | unknown }>
        ).some(
          (m) => typeof m.content === 'string' && m.content.includes('topic_reference_context'),
        );
@@ -720,7 +781,7 @@ export const createRuntimeExecutors = (
          const topicModel = new TopicModel(ctx.serverDB, ctx.userId, ctx.workspaceId);
          const messageModel = new MessageModelClass(ctx.serverDB, ctx.userId, ctx.workspaceId);
          topicReferences = await resolveTopicReferences(
-            llmPayload.messages as Array<{ content: string | unknown }>,
+            messagesForContext as Array<{ content: string | unknown }>,
            async (topicId) => topicModel.findById(topicId),
            async (topicId) => {
              const topic = await topicModel.findById(topicId);
@@ -762,7 +823,7 @@ export const createRuntimeExecutors = (
          agentConfig?.slug === 'web-onboarding' ||
          resolved.enabledToolIds.includes('lobe-web-onboarding');
        const alreadyHasOnboardingContext = (
-          llmPayload.messages as Array<{ content: string | unknown }>
+          messagesForContext as Array<{ content: string | unknown }>
        ).some((message) => {
          if (typeof message.content !== 'string') return false;

@@ -1043,7 +1104,7 @@ export const createRuntimeExecutors = (
                name: kb.name ?? '',
              })),
          },
-          messages: llmPayload.messages as UIChatMessage[],
+          messages: messagesForContext,
          model,
          provider,
          systemRole: agentConfig.systemRole ?? undefined,
@@ -1071,14 +1132,14 @@ export const createRuntimeExecutors = (
          CONTEXT_ENGINEERING_SPAN_NAME,
          {
            attributes: buildContextEngineeringAttributes({
-              hasImages: (llmPayload.messages as Array<{ content?: unknown }>).some(
+              hasImages: (messagesForContext as Array<{ content?: unknown }>).some(
                (m) =>
                  Array.isArray(m.content) &&
                  (m.content as Array<{ type?: string }>).some((p) => p?.type === 'image_url'),
              ),
              historyCompressed:
-                Array.isArray(llmPayload.messages) &&
-                llmPayload.messages.some((m: { role?: string }) => m?.role === 'compressedGroup'),
+                Array.isArray(messagesForContext) &&
+                messagesForContext.some((m: { role?: string }) => m?.role === 'compressedGroup'),
              knowledgeCount:
                (contextEngineInput.knowledge?.knowledgeBases?.length ?? 0) +
                (contextEngineInput.knowledge?.fileContents?.length ?? 0),
@@ -1086,7 +1147,7 @@ export const createRuntimeExecutors = (
                (contextEngineInput.knowledge?.knowledgeBases?.length ?? 0) > 0 ||
                (contextEngineInput.knowledge?.fileContents?.length ?? 0) > 0,
              memoryInjected: Boolean(contextEngineInput.userMemory?.memories),
-              messageCount: llmPayload.messages.length,
+              messageCount: messagesForContext.length,
              operationId,
              stepIndex,
              systemRoleLength: contextEngineInput.systemRole?.length,
@@ -1639,9 +1700,10 @@ export const createRuntimeExecutors = (
                };
              }

-              const persistedReasoning = shouldPersistAssistantReasoning
-                ? finalReasoning
-                : undefined;
+              // preserveThinking only gates whether reasoning is replayed into the
+              // next LLM payload (state.messages); the DB copy powers UI display
+              // after refresh and must always be saved.
+              const replayedReasoning = shouldReplayAssistantReasoning ? finalReasoning : undefined;

              try {
                // Build metadata object
@@ -1675,7 +1737,7 @@ export const createRuntimeExecutors = (
                  content: finalContent,
                  imageList: imageList.length > 0 ? imageList : undefined,
                  metadata: Object.keys(metadata).length > 0 ? metadata : undefined,
-                  reasoning: persistedReasoning,
+                  reasoning: finalReasoning,
                  search: grounding,
                  tools: persistedTools,
                });
@@ -1708,7 +1770,7 @@ export const createRuntimeExecutors = (
              newState.messages.push({
                content,
                id: assistantMessageItem.id,
-                reasoning: persistedReasoning,
+                reasoning: replayedReasoning,
                role: 'assistant',
                tool_calls: stateToolCalls,
              });
@@ -2421,7 +2483,7 @@ export const createRuntimeExecutors = (
                scope: state.metadata?.scope,
                serverDB: ctx.serverDB,
                skipResultTruncation: true,
-                subAgent: buildServerSubAgentRunner(
+                subAgent: buildServerVirtualSubAgentRunner(
                  ctx,
                  state,
                  chatToolPayload,
@@ -2663,14 +2725,15 @@ export const createRuntimeExecutors = (

        log('[%s:%d] Tool execution completed', operationId, stepIndex);

-        // When the tool result carries an execSubAgent / execSubAgents state the
-        // GeneralChatAgent needs `stop: true` in the payload to detect it and
-        // emit the matching exec_sub_agent / exec_sub_agents instruction.  Without
-        // this flag the agent falls through to the normal LLM-call path and the
-        // sub-agent is never spawned.
-        const execTaskStateType = executionResult.state?.type as string | undefined;
-        const isExecTaskState =
-          execTaskStateType === 'execSubAgent' || execTaskStateType === 'execSubAgents';
+        // When a legacy callAgent task result carries execSubAgent / execSubAgents
+        // state, the GeneralChatAgent needs `stop: true` in the payload to detect
+        // it and emit the matching exec_sub_agent / exec_sub_agents instruction.
+        // Without this flag the agent falls through to the normal LLM-call path
+        // and the background agent run is never spawned.
+        const legacyAgentInvocationStateType = executionResult.state?.type as string | undefined;
+        const isLegacyAgentInvocationState =
+          legacyAgentInvocationStateType === 'execSubAgent' ||
+          legacyAgentInvocationStateType === 'execSubAgents';

        executeToolSpan.setAttributes(
          buildExecuteToolResultAttributes({ attempts: execution.attempts, success: isSuccess }),
@@ -2686,7 +2749,7 @@ export const createRuntimeExecutors = (
              isSuccess,
              // Pass tool message ID as parentMessageId for the next LLM call
              parentMessageId: toolMessageId,
-              ...(isExecTaskState && { stop: true }),
+              ...(isLegacyAgentInvocationState && { stop: true }),
              toolCall: chatToolPayload,
              toolCallId: chatToolPayload.id,
            },
@@ -2993,7 +3056,7 @@ export const createRuntimeExecutors = (
                    scope: state.metadata?.scope,
                    serverDB: ctx.serverDB,
                    skipResultTruncation: true,
-                    subAgent: buildServerSubAgentRunner(
+                    subAgent: buildServerVirtualSubAgentRunner(
                      ctx,
                      state,
                      chatToolPayload,