diff --git a/.agents/skills/agent-testing/SKILL.md b/.agents/skills/agent-testing/SKILL.md
new file mode 100644
index 0000000000..11bd3581cd
--- /dev/null
+++ b/.agents/skills/agent-testing/SKILL.md
@@ -0,0 +1,159 @@
+---
+name: agent-testing
+description: >
+  Agentic end-to-end testing for LobeHub: backend verification via the CLI,
+  frontend verification via agent-browser (Electron), full-stack verification in
+  the browser, and bot-channel verification via osascript. Local-first today,
+  designed to extend to cloud automation. Triggers on 'cli test', 'test with cli',
+  'verify with cli', 'backend test with cli', 'local test', 'test in electron',
+  'test desktop', 'test bot', 'bot test', 'test in discord', 'test in telegram',
+  'test in slack', 'test in wechat', 'test in weixin', 'test in lark', 'test in feishu',
+  'test in qq', 'manual test', 'osascript', 'test report', or any local
+  end-to-end verification task.
+---
+
+# Agent Testing (Agentic End-to-End Verification)
+
+One skill for all agentic end-to-end testing — local-first today, designed to
+also run as full cloud automation. Every test session follows the same
+four-step contract:
+
+```
+Step 0: Auth ready?  →  Step 1: Pick surface  →  Step 2: Run  →  Step 3: Structured report
+```
+
+## Step 0 — Auth first (mandatory)
+
+**Auth is the gate for all automated testing.** Prepare and verify it BEFORE
+writing a single test step — a half-finished test run that dies on a login wall
+wastes the whole session.
+
+```bash
+./.agents/skills/agent-testing/scripts/setup-auth.sh status
+```
+
+| Surface  | Mechanism                                         | One-key path                   | Human needed?                 |
+| -------- | ------------------------------------------------- | ------------------------------ | ----------------------------- |
+| CLI      | OIDC Device Code Flow (`apps/cli/.lobehub-dev`)   | `setup-auth.sh cli`            | Yes — browser authorization   |
+| Web      | better-auth cookie injection into `agent-browser` | `pbpaste \| setup-auth.sh web` | Copy cookie once per rotation |
+| Electron | App's own persistent login state                  | Log in once in the app         | Once                          |
+| Bot      | Native apps already logged in                     | —                              | Once per app                  |
+
+If `status` is not all green, fix auth first (the steps that need a human must be
+requested from the user explicitly). Full background and failure modes:
+[references/auth.md](./references/auth.md).
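+
+A minimal guard sketch for scripted sessions. It assumes, without verification,
+that `setup-auth.sh status` exits non-zero when any surface is not ready;
+`require_auth` and `AUTH_SCRIPT` are hypothetical names, not repo scripts:
+
+```bash
+# Hypothetical guard. ASSUMPTION: `setup-auth.sh status` exits non-zero when auth is not ready.
+AUTH_SCRIPT="${AUTH_SCRIPT:-./.agents/skills/agent-testing/scripts/setup-auth.sh}"
+
+require_auth() {
+  if ! "$AUTH_SCRIPT" status; then
+    echo "Auth not ready: fix it (or ask the user) before any test step." >&2
+    return 1
+  fi
+}
+
+# Usage at the top of a test script:
+#   require_auth || exit 1
+```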
+
+## Step 1 — Pick the surface by change scope
+
+| Change scope                                            | Default surface                      | Why                                                               | Guide                              |
+| ------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------------------- | ---------------------------------- |
+| **Backend** (TRPC router / service / model / migration) | **CLI**                              | Fastest loop, text-assertable output, zero UI flakiness           | [cli/index.md](./cli/index.md)     |
+| **Pure frontend** (components, store, styles, UX)       | **Electron** (agent-browser + CDP)   | Primary product shape; `__LOBE_STORES` state introspection        | [ui/electron.md](./ui/electron.md) |
+| **Full-stack** (new API + UI consuming it)              | **Web** (browser + local dev server) | One surface where network requests and UI are observable together | [ui/web.md](./ui/web.md)           |
+| **Bot channels** (Discord / WeChat / Lark / …)          | Native app via osascript / bridge    | Only way to exercise the real channel end-to-end                  | `bot/<platform>/index.md`          |
+
+Escalate, don't duplicate: verify a backend change with the CLI first; only add
+a UI pass when the change actually affects the UI.
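+
+The table can be approximated as a first-pass heuristic over changed files. The
+sketch below is hypothetical: `pick_surface` is not a repo script, the path
+patterns are assumptions about the repo layout, and bot-channel changes still
+need a human call:
+
+```bash
+# Hypothetical helper: reads changed paths (one per line) on stdin and prints a default surface.
+# ASSUMPTION: the path patterns below; adjust them to the real repo layout before relying on them.
+pick_surface() {
+  local backend=0 frontend=0 path
+  while IFS= read -r path; do
+    case "$path" in
+      src/server/* | packages/database/*) backend=1 ;;
+      src/app/* | src/features/* | src/components/* | src/store/*) frontend=1 ;;
+    esac
+  done
+  if [ "$backend" = 1 ] && [ "$frontend" = 1 ]; then
+    echo web # full-stack: observe network + UI together
+  elif [ "$backend" = 1 ]; then
+    echo cli
+  elif [ "$frontend" = 1 ]; then
+    echo electron
+  else
+    echo manual # nothing matched: decide by hand
+  fi
+}
+```
+
+For example, `git diff --name-only main... | pick_surface` prints `cli`,
+`electron`, `web`, or `manual`.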
+
+### Environment support (local macOS vs cloud Linux)
+
+The decisive constraint per surface is **how evidence (screenshots) is
+captured**: CDP-based capture (`agent-browser screenshot`) renders from the
+browser engine and needs no real display; OS-level capture (`screencapture`,
+osascript) is macOS-only.
+
+| Surface  | macOS (local) | Linux / cloud (headless)                                  | Screenshot mechanism                                   |
+| -------- | ------------- | --------------------------------------------------------- | ------------------------------------------------------ |
+| CLI      | ✅            | ✅                                                        | n/a — text output                                      |
+| Web      | ✅            | ✅ headless Chromium works natively                       | CDP — no display needed                                |
+| Electron | ✅            | ⚠️ runs, but needs a display server: wrap with `xvfb-run` | CDP works under Xvfb; `capture-app-window.sh` does NOT |
+| Bot      | ✅            | ❌ osascript + native apps are macOS-only                 | macOS `screencapture` only                             |
+
+When a test must stay cloud-portable, prefer CDP-based evidence over
+OS-level capture wherever both exist.
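+
+A sketch of that preference as a shell function. It assumes `agent-browser` is
+already attached to the target app and that `agent-browser screenshot` accepts
+an output path (check [references/agent-browser.md](./references/agent-browser.md));
+`capture_evidence` and `SCRIPTS` are hypothetical names:
+
+```bash
+SCRIPTS="${SCRIPTS:-./.agents/skills/agent-testing/scripts}"
+
+capture_evidence() {
+  local app="$1" out="$2"
+  # 1. CDP capture: works on macOS, headless Linux, and Electron under Xvfb.
+  if agent-browser screenshot "$out"; then
+    return 0
+  fi
+  # 2. OS-level fallback: macOS only, so it is not cloud-portable.
+  if [ "$(uname -s)" = Darwin ]; then
+    "$SCRIPTS/capture-app-window.sh" "$app" "$out"
+  else
+    echo "CDP capture failed and there is no OS-level fallback on $(uname -s)" >&2
+    return 1
+  fi
+}
+```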
+
+### Bot platforms
+
+| Platform      | Guide                                            | Quick switcher        |
+| ------------- | ------------------------------------------------ | --------------------- |
+| Discord       | [bot/discord/index.md](./bot/discord/index.md)   | `Cmd+K`               |
+| Slack         | [bot/slack/index.md](./bot/slack/index.md)       | `Cmd+K`               |
+| Telegram      | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F`               |
+| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md)     | `Cmd+F`               |
+| Lark / 飞书   | [bot/lark/index.md](./bot/lark/index.md)         | `Cmd+K`               |
+| QQ            | [bot/qq/index.md](./bot/qq/index.md)             | `Cmd+F`               |
+| iMessage      | [bot/imessage/index.md](./bot/imessage/index.md) | bridge (no osascript) |
+
+Each platform folder contains an `index.md` (activation, navigation,
+send-message, verification snippets) and a `test-<platform>-bot.sh` script
+sharing the interface:
+
+```bash
+./.agents/skills/agent-testing/bot/<platform>/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
+```
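+
+Because the interface is shared, one loop can smoke-test several platforms. A
+sketch with a hypothetical `run_bot_smoke` function: the target name must exist
+on every platform you list (WeChat, for example, expects a contact name rather
+than a channel), and iMessage is excluded because it uses the bridge scripts:
+
+```bash
+BOT_DIR="${BOT_DIR:-./.agents/skills/agent-testing/bot}"
+
+# run_bot_smoke <channel_or_contact> <message> <wait_seconds> <platform>...
+run_bot_smoke() {
+  local target="$1" message="$2" wait="$3" platform failed=0
+  shift 3
+  for platform in "$@"; do
+    # Same positional interface for every platform; one screenshot per platform.
+    if ! "$BOT_DIR/$platform/test-$platform-bot.sh" "$target" "$message" "$wait" \
+      "/tmp/$platform-smoke.png"; then
+      echo "[$platform] smoke test failed" >&2
+      failed=1
+    fi
+  done
+  return "$failed"
+}
+```
+
+For example: `run_bot_smoke bot-testing "!ping" 20 discord slack`.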
+
+New to osascript automation? Read
+[references/osascript.md](./references/osascript.md) first — it is a general
+macOS-automation asset (activate, type, paste, screenshot, accessibility reads,
+gotchas), not bot-specific.
+
+## Step 2 — Run
+
+Surface guides above carry the detailed workflows. Shared infrastructure:
+
+| Need                                 | Where                                                                |
+| ------------------------------------ | -------------------------------------------------------------------- |
+| Start / restart the local dev server | [references/dev-server.md](./references/dev-server.md)               |
+| `agent-browser` command reference    | [references/agent-browser.md](./references/agent-browser.md)         |
+| osascript patterns (general macOS)   | [references/osascript.md](./references/osascript.md)                 |
+| Agent gateway probing                | [references/agent-gateway.md](./references/agent-gateway.md)         |
+| Screen recording                     | [references/record-app-screen.md](./references/record-app-screen.md) |
+
+### Scripts
+
+All under `.agents/skills/agent-testing/scripts/`:
+
+| Script                    | Usage                                                         |
+| ------------------------- | ------------------------------------------------------------- |
+| `setup-auth.sh`           | One-stop auth setup & status check (`status` / `cli` / `web`) |
+| `report-init.sh`          | Scaffold a structured test report (Step 3)                    |
+| `electron-dev.sh`         | Manage Electron dev env (start/stop/status/restart, CDP 9222) |
+| `capture-app-window.sh`   | Screenshot a specific app window (general; used by bot tests) |
+| `record-app-screen.sh`    | Record app screen (video + periodic screenshots)              |
+| `record-electron-demo.sh` | Record Electron app demo with ffmpeg                          |
+| `agent-gateway/`          | Gateway probe / dump / analyze tools                          |
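+
+After `electron-dev.sh start`, the CDP port may take a few seconds to open. A
+sketch that polls Chrome's standard DevTools HTTP endpoint `/json/version` on
+port 9222; `wait_for_cdp` and its 60-try default are hypothetical choices:
+
+```bash
+# Poll the CDP endpoint once per second until it answers or the tries run out.
+wait_for_cdp() {
+  local port="${1:-9222}" tries="${2:-60}" i
+  for ((i = 0; i < tries; i++)); do
+    if curl -fsS -m 2 "http://127.0.0.1:$port/json/version" >/dev/null 2>&1; then
+      return 0
+    fi
+    sleep 1
+  done
+  echo "CDP on port $port did not come up" >&2
+  return 1
+}
+
+# Usage:
+#   ./.agents/skills/agent-testing/scripts/electron-dev.sh start
+#   wait_for_cdp 9222 && agent-browser snapshot -i
+```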
+
+## Step 3 — Structured report (mandatory deliverable)
+
+Every automated test session ends with a structured, evidence-backed report —
+not a chat-only summary. Scaffold it up front and fill it as you test:
+
+```bash
+DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh my-feature "Verify my feature")
+# ... test, saving screenshots / CLI transcripts into $DIR/assets/ ...
+# fill $DIR/report.md (case table, embedded evidence, verdict) and $DIR/result.json
+```
+
+Reports live in `.records/reports/<timestamp>-<slug>/` (gitignored): `report.md`
+(human-readable, with embedded screenshots), `result.json` (machine-readable
+pass/fail + score), `assets/` (evidence). Format spec and evidence rules:
+[references/report.md](./references/report.md).
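+
+To pick up the newest report and confirm it has the three expected parts, a
+small sketch that relies only on the layout above (`latest_report` and
+`check_report` are hypothetical helpers, run from the repo root):
+
+```bash
+# Most recently modified report directory (by mtime, not by name).
+latest_report() {
+  ls -td .records/reports/*/ 2>/dev/null | head -n 1
+}
+
+# Succeeds only if report.md, result.json and assets/ are all present.
+check_report() {
+  local dir="${1%/}"
+  [ -f "$dir/report.md" ] && [ -f "$dir/result.json" ] && [ -d "$dir/assets" ]
+}
+
+# Usage:
+#   check_report "$(latest_report)" || echo "report incomplete"
+```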
+
+## Directory map
+
+```
+agent-testing/
+├── SKILL.md            # this router
+├── cli/index.md        # backend verification via the LobeHub CLI
+├── ui/electron.md      # pure-frontend verification in the desktop app
+├── ui/web.md           # full-stack verification in the browser
+├── bot/<platform>/     # bot-channel verification (osascript / bridge)
+├── references/         # shared knowledge: auth, dev-server, agent-browser, osascript, report
+└── scripts/            # setup-auth, report-init, electron-dev, capture, recording, gateway
+```
+
+## Gotchas
+
+- agent-browser: see [references/agent-browser.md](./references/agent-browser.md#gotchas)
+- Electron: see [ui/electron.md](./ui/electron.md#electron-gotchas)
+- osascript: see [references/osascript.md](./references/osascript.md#gotchas)
diff --git a/.agents/skills/local-testing/bot/discord/index.md b/.agents/skills/agent-testing/bot/discord/index.md
similarity index 90%
rename from .agents/skills/local-testing/bot/discord/index.md
rename to .agents/skills/agent-testing/bot/discord/index.md
index 8dd3789f40..00b9096f10 100644
--- a/.agents/skills/local-testing/bot/discord/index.md
+++ b/.agents/skills/agent-testing/bot/discord/index.md
@@ -2,7 +2,7 @@
 
 **App name:** `Discord` | **Process name:** `Discord`
 
-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.
 
 ## Activate & Navigate
 
@@ -92,6 +92,6 @@ echo "Screenshot saved to /tmp/discord-test-result.png"
 ## Script
 
 ```bash
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
+./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
+./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
 ```
diff --git a/.agents/skills/local-testing/bot/discord/test-discord-bot.sh b/.agents/skills/agent-testing/bot/discord/test-discord-bot.sh
similarity index 96%
rename from .agents/skills/local-testing/bot/discord/test-discord-bot.sh
rename to .agents/skills/agent-testing/bot/discord/test-discord-bot.sh
index 6eeff16408..be71ec1df9 100755
--- a/.agents/skills/local-testing/bot/discord/test-discord-bot.sh
+++ b/.agents/skills/agent-testing/bot/discord/test-discord-bot.sh
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"
 
 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
diff --git a/.agents/skills/local-testing/bot/imessage/index.md b/.agents/skills/agent-testing/bot/imessage/index.md
similarity index 98%
rename from .agents/skills/local-testing/bot/imessage/index.md
rename to .agents/skills/agent-testing/bot/imessage/index.md
index 481e61a557..f966f16d4f 100644
--- a/.agents/skills/local-testing/bot/imessage/index.md
+++ b/.agents/skills/agent-testing/bot/imessage/index.md
@@ -21,7 +21,7 @@ So the test surface is three layers:
   curl -sS -m4 -o /dev/null -w '%{http_code}\n' \
     "http://127.0.0.1:1234/api/v1/server/info?password=<PW>" # expect 200
   ```
-- **Electron dev running with CDP**: `./.agents/skills/local-testing/scripts/electron-dev.sh start`
+- **Electron dev running with CDP**: `./.agents/skills/agent-testing/scripts/electron-dev.sh start`
 - The **iMessage Desktop branch** checked out (the `imessageBridge` IPC group
   and `@lobechat/chat-adapter-imessage` must be compiled into the main bundle).
   Run `pnpm install --ignore-scripts` at the repo root **and** in `apps/desktop/`
@@ -31,7 +31,7 @@ So the test surface is three layers:
 ## Fast path: automated script
 
 ```bash
-./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
+./.agents/skills/agent-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
 ```
 
 Asserts the whole flow and self-cleans (unique `applicationId` per run, removes
@@ -136,7 +136,7 @@ Verifies the leg the bridge uses to _reply_: `BlueBubblesApiClient.sendText`
 → `POST /api/v1/message/text`. Run the helper against your own number:
 
 ```bash
-./.agents/skills/local-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
+./.agents/skills/agent-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
 ```
 
 **Gotcha that bites everyone:** with `method=apple-script` and a _new_
diff --git a/.agents/skills/local-testing/bot/imessage/send-imessage-test.sh b/.agents/skills/agent-testing/bot/imessage/send-imessage-test.sh
similarity index 100%
rename from .agents/skills/local-testing/bot/imessage/send-imessage-test.sh
rename to .agents/skills/agent-testing/bot/imessage/send-imessage-test.sh
diff --git a/.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh b/.agents/skills/agent-testing/bot/imessage/test-imessage-bridge.sh
similarity index 100%
rename from .agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh
rename to .agents/skills/agent-testing/bot/imessage/test-imessage-bridge.sh
diff --git a/.agents/skills/local-testing/bot/lark/index.md b/.agents/skills/agent-testing/bot/lark/index.md
similarity index 85%
rename from .agents/skills/local-testing/bot/lark/index.md
rename to .agents/skills/agent-testing/bot/lark/index.md
index 9ec25caf9e..9fbbc01e5b 100644
--- a/.agents/skills/local-testing/bot/lark/index.md
+++ b/.agents/skills/agent-testing/bot/lark/index.md
@@ -2,7 +2,7 @@
 
 **App name:** `Lark` or `飞书` | **Process name:** `Lark` or `飞书`
 
-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.
 
 ## Activate & Navigate
 
@@ -56,6 +56,6 @@ screencapture /tmp/lark-bot-response.png
 ## Script
 
 ```bash
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
+./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
+./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
 ```
diff --git a/.agents/skills/local-testing/bot/lark/test-lark-bot.sh b/.agents/skills/agent-testing/bot/lark/test-lark-bot.sh
similarity index 97%
rename from .agents/skills/local-testing/bot/lark/test-lark-bot.sh
rename to .agents/skills/agent-testing/bot/lark/test-lark-bot.sh
index a62903cdd1..daa0dd89d0 100755
--- a/.agents/skills/local-testing/bot/lark/test-lark-bot.sh
+++ b/.agents/skills/agent-testing/bot/lark/test-lark-bot.sh
@@ -80,5 +80,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"
 
 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
diff --git a/.agents/skills/local-testing/bot/qq/index.md b/.agents/skills/agent-testing/bot/qq/index.md
similarity index 82%
rename from .agents/skills/local-testing/bot/qq/index.md
rename to .agents/skills/agent-testing/bot/qq/index.md
index e5c3815b72..afc0fb0297 100644
--- a/.agents/skills/local-testing/bot/qq/index.md
+++ b/.agents/skills/agent-testing/bot/qq/index.md
@@ -2,7 +2,7 @@
 
 **App name:** `QQ` | **Process name:** `QQ`
 
-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.
 
 ## Activate & Navigate
 
@@ -57,6 +57,6 @@ screencapture /tmp/qq-bot-response.png
 ## Script
 
 ```bash
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
+./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
+./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
 ```
diff --git a/.agents/skills/local-testing/bot/qq/test-qq-bot.sh b/.agents/skills/agent-testing/bot/qq/test-qq-bot.sh
similarity index 96%
rename from .agents/skills/local-testing/bot/qq/test-qq-bot.sh
rename to .agents/skills/agent-testing/bot/qq/test-qq-bot.sh
index 58896a52c8..763bae8752 100755
--- a/.agents/skills/local-testing/bot/qq/test-qq-bot.sh
+++ b/.agents/skills/agent-testing/bot/qq/test-qq-bot.sh
@@ -72,5 +72,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"
 
 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
diff --git a/.agents/skills/local-testing/bot/slack/index.md b/.agents/skills/agent-testing/bot/slack/index.md
similarity index 86%
rename from .agents/skills/local-testing/bot/slack/index.md
rename to .agents/skills/agent-testing/bot/slack/index.md
index 929b6bbdc5..b5c591d4ab 100644
--- a/.agents/skills/local-testing/bot/slack/index.md
+++ b/.agents/skills/agent-testing/bot/slack/index.md
@@ -2,7 +2,7 @@
 
 **App name:** `Slack` | **Process name:** `Slack`
 
-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.
 
 ## Activate & Navigate
 
@@ -68,6 +68,6 @@ screencapture /tmp/slack-bot-response.png
 ## Script
 
 ```bash
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
+./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
+./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
 ```
diff --git a/.agents/skills/local-testing/bot/slack/test-slack-bot.sh b/.agents/skills/agent-testing/bot/slack/test-slack-bot.sh
similarity index 96%
rename from .agents/skills/local-testing/bot/slack/test-slack-bot.sh
rename to .agents/skills/agent-testing/bot/slack/test-slack-bot.sh
index 8841def381..2906d2dfe8 100755
--- a/.agents/skills/local-testing/bot/slack/test-slack-bot.sh
+++ b/.agents/skills/agent-testing/bot/slack/test-slack-bot.sh
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"
 
 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
diff --git a/.agents/skills/local-testing/bot/telegram/index.md b/.agents/skills/agent-testing/bot/telegram/index.md
similarity index 87%
rename from .agents/skills/local-testing/bot/telegram/index.md
rename to .agents/skills/agent-testing/bot/telegram/index.md
index 9c5435141b..df094a374a 100644
--- a/.agents/skills/local-testing/bot/telegram/index.md
+++ b/.agents/skills/agent-testing/bot/telegram/index.md
@@ -2,7 +2,7 @@
 
 **App name:** `Telegram` | **Process name:** `Telegram`
 
-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.
 
 ## Activate & Navigate
 
@@ -75,6 +75,6 @@ curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/getUpdates?limit=5" | j
 ## Script
 
 ```bash
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
+./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
+./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
 ```
diff --git a/.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh b/.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh
similarity index 97%
rename from .agents/skills/local-testing/bot/telegram/test-telegram-bot.sh
rename to .agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh
index 02e5a059c4..3d18f0291c 100755
--- a/.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh
+++ b/.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh
@@ -75,5 +75,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"
 
 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
diff --git a/.agents/skills/local-testing/bot/wechat/index.md b/.agents/skills/agent-testing/bot/wechat/index.md
similarity index 88%
rename from .agents/skills/local-testing/bot/wechat/index.md
rename to .agents/skills/agent-testing/bot/wechat/index.md
index d5a88969f7..53afdef8ca 100644
--- a/.agents/skills/local-testing/bot/wechat/index.md
+++ b/.agents/skills/agent-testing/bot/wechat/index.md
@@ -2,7 +2,7 @@
 
 **App name:** `微信` or `WeChat` | **Process name:** `WeChat`
 
-See [osascript-common.md](../osascript-common.md) for shared patterns.
+See [references/osascript.md](../../references/osascript.md) for shared patterns.
 
 ## Activate & Navigate
 
@@ -76,6 +76,6 @@ screencapture /tmp/wechat-bot-response.png
 ## Script
 
 ```bash
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
+./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
+./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
 ```
diff --git a/.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh b/.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh
similarity index 97%
rename from .agents/skills/local-testing/bot/wechat/test-wechat-bot.sh
rename to .agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh
index 44d322fddf..4f76f28fc3 100755
--- a/.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh
+++ b/.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh
@@ -81,5 +81,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
 sleep "$WAIT"
 
 echo "[$APP] Capturing screenshot..."
-"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
+"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
 echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
diff --git a/.agents/skills/agent-testing/cli/index.md b/.agents/skills/agent-testing/cli/index.md
new file mode 100644
index 0000000000..e7088c43b2
--- /dev/null
+++ b/.agents/skills/agent-testing/cli/index.md
@@ -0,0 +1,142 @@
+# CLI Backend Verification
+
+Default surface for verifying **backend changes** (TRPC routers, services,
+models, migrations) end-to-end: fastest loop, text-assertable output, zero UI
+flakiness.
+
+## When to use
+
+- Verifying TRPC router / service / model changes end-to-end
+- Testing new API fields or response structure changes
+- Validating CLI command output after backend modifications
+- Debugging data flow issues between server and CLI
+
+## Prerequisites
+
+| Requirement  | Details                                                                           |
+| ------------ | --------------------------------------------------------------------------------- |
+| Dev server   | `localhost:3010` — see [../references/dev-server.md](../references/dev-server.md) |
+| CLI source   | `apps/cli/` — runs from source, no rebuild needed                                 |
+| CLI dev mode | `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated credentials                          |
+| Auth         | Device Code Flow login — see [../references/auth.md](../references/auth.md)       |
+
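+All CLI dev commands run from `apps/cli/`. Export the isolated credentials home
+once per shell and keep only the command itself in `$CLI` (an assignment prefix
+stored inside a variable is not honored when the variable expands, so bash would
+try to run `LOBEHUB_CLI_HOME=.lobehub-dev` as a command):
+
+```bash
+export LOBEHUB_CLI_HOME=.lobehub-dev
+CLI="bun src/index.ts"
+```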
+
+## Workflow
+
+### Step 1 — Server up?
+
+See [../references/dev-server.md](../references/dev-server.md) for the health
+check, start, and restart commands. Server-side code changes require a restart.
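+
+For a quick scripted reachability check before running CLI commands, a sketch
+with a hypothetical `server_up` helper. It only confirms that something answers
+HTTP on port 3010, not that the server is healthy; the canonical check stays in
+dev-server.md:
+
+```bash
+server_up() {
+  local url="${1:-http://localhost:3010}" code
+  # curl prints 000 as the status code when it cannot connect at all.
+  code=$(curl -sS -m 4 -o /dev/null -w '%{http_code}' "$url" 2>/dev/null || true)
+  [ -n "$code" ] && [ "$code" != 000 ]
+}
+
+# Usage:
+#   server_up || echo "Dev server down: see dev-server.md"
+```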
+
+### Step 2 — Auth ready?
+
+```bash
+./.agents/skills/agent-testing/scripts/setup-auth.sh status
+```
+
+If the CLI is not logged in, **the user must run the login themselves**
+(interactive browser authorization):
+
+```bash
+cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
+```
+
+Credentials persist in `apps/cli/.lobehub-dev/`. Details:
+[../references/auth.md](../references/auth.md).
+
+### Step 3 — Test with CLI commands
+
+The CLI runs from source, so CLI-side code changes take effect immediately,
+without a rebuild:
+
+```bash
+cd apps/cli
+$CLI <command>
+```
+
+Capture output for the report as you go (e.g. `$CLI task list | tee "$DIR/assets/task-list.txt"`).
+
+### Step 4 — Clean up test data
+
+```bash
+$CLI task delete <id> -y
+$CLI agent delete <id> -y
+```
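+
+To avoid leaking test data when a step fails midway, you can register the
+cleanup with a shell `trap`. A sketch with hypothetical `CREATED_TASKS` /
+`CREATED_AGENTS` arrays that you fill by hand after each create (the exact
+output of `create` is not assumed here); it relies on `$CLI` from Prerequisites:
+
+```bash
+CREATED_TASKS=()
+CREATED_AGENTS=()
+
+cleanup_test_data() {
+  local id
+  # Delete everything recorded during the session, with `-y` as in the commands above.
+  for id in "${CREATED_TASKS[@]}"; do $CLI task delete "$id" -y; done
+  for id in "${CREATED_AGENTS[@]}"; do $CLI agent delete "$id" -y; done
+}
+# Runs on normal exit and after an `exit` from a failing step.
+trap cleanup_test_data EXIT
+
+# After each create, record the ID it reported, e.g.:
+#   CREATED_TASKS+=(T-1)
+```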
+
+### Step 5 — Report
+
+Finish with a structured report —
+[../references/report.md](../references/report.md). CLI evidence = exact
+command + trimmed output.
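+
+A hypothetical `evidence` helper that saves the exact command plus its output
+into the report assets while still printing it. It assumes `$DIR` comes from
+`report-init.sh`; trimming stays manual, so cut noise from the saved file before
+embedding it:
+
+```bash
+# evidence <name> <command...>: writes "$ command" + output to $DIR/assets/<name>.txt
+evidence() {
+  local name="$1"
+  shift
+  {
+    printf '$ %s\n' "$*"
+    "$@" 2>&1
+  } | tee "$DIR/assets/$name.txt"
+  # Propagate the command's exit status rather than tee's.
+  return "${PIPESTATUS[0]}"
+}
+
+# Usage:
+#   evidence task-list $CLI task list
+```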
+
+## Common testing patterns
+
+### Task system
+
+```bash
+$CLI task list
+$CLI task create -n "Root Task" -i "Test instruction"
+$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
+$CLI task view T-1
+$CLI task tree T-1
+$CLI task edit T-1 --status running
+$CLI task comment T-1 -m "Test comment"
+$CLI task delete T-1 -y
+```
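+
+Since CLI output is plain text, a substring check is often enough for a pass /
+fail verdict. A hypothetical `assert_contains` helper; what `task view` actually
+prints (for example, whether it shows the status word) is an assumption to check
+against real output first:
+
+```bash
+# assert_contains <expected> <command...>: fails if the command errors or lacks the text.
+assert_contains() {
+  local expected="$1" out
+  shift
+  out=$("$@" 2>&1) || { echo "FAIL: command exited non-zero: $*" >&2; return 1; }
+  if [[ "$out" != *"$expected"* ]]; then
+    echo "FAIL: expected '$expected' in output of: $*" >&2
+    return 1
+  fi
+  echo "PASS: $*"
+}
+
+# Usage:
+#   assert_contains running $CLI task view T-1
+```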
+
+### Agent system
+
+```bash
+$CLI agent list
+$CLI agent view <agent-id>
+$CLI agent run <agent-id> -m "Test prompt"
+```
+
+### Document & knowledge base
+
+```bash
+$CLI doc list
+$CLI doc create -t "Test Doc" -c "Content here"
+$CLI doc view <doc-id>
+$CLI kb list
+$CLI kb tree <kb-id>
+```
+
+### Model & provider
+
+```bash
+$CLI model list
+$CLI provider list
+$CLI provider test <provider-id>
+```
+
+## Dev-test cycle
+
+```
+1. Make code changes (service/model/router/type)
+         |
+2. Run unit tests (fast feedback)
+   bunx vitest run --silent='passed-only' '<test-file>'
+         |
+3. Restart dev server (if server-side changes — see dev-server.md)
+         |
+4. CLI verification (end-to-end)
+   $CLI <command>
+         |
+5. Clean up test data + write the report
+```
+
+## Troubleshooting
+
+| Issue                       | Solution                                        |
+| --------------------------- | ----------------------------------------------- |
+| `No authentication found`   | Run `login --server http://localhost:3010`      |
+| `UNAUTHORIZED` on API calls | Token expired; re-run login                     |
+| `ECONNREFUSED`              | Dev server not running — see dev-server.md      |
+| CLI shows old data/behavior | Server needs restart to pick up code changes    |
+| Login opens wrong server    | Must use `--server` flag (env var doesn't work) |
diff --git a/.agents/skills/agent-testing/references/agent-browser.md b/.agents/skills/agent-testing/references/agent-browser.md
new file mode 100644
index 0000000000..2d0d4cecd7
--- /dev/null
+++ b/.agents/skills/agent-testing/references/agent-browser.md
@@ -0,0 +1,257 @@
+# agent-browser CLI Reference
+
+Generic reference for the `agent-browser` CLI — automate Chromium-based apps (Electron, Chrome, web) via Chrome DevTools Protocol. LobeHub-specific patterns live in [../ui/electron.md](../ui/electron.md) and [../ui/web.md](../ui/web.md); authentication recipes live in [auth.md](./auth.md).
+
+Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
+
+## Core Workflow
+
+Every browser automation follows this pattern:
+
+1. **Navigate**: `agent-browser open <url>`
+2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
+3. **Interact**: Use refs to click, fill, select
+4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
+
+```bash
+agent-browser open https://example.com/form
+agent-browser snapshot -i
+# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
+
+agent-browser fill @e1 "user@example.com"
+agent-browser fill @e2 "password123"
+agent-browser click @e3
+agent-browser wait --load networkidle
+agent-browser snapshot -i # Check result
+```
+
+## Command Chaining
+
+```bash
+# Chain open + wait + snapshot in one call
+agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
+```
+
+Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
+
+## Essential Commands
+
+```bash
+# Navigation
+agent-browser open <url>              # Navigate (aliases: goto, navigate)
+agent-browser close                   # Close browser
+agent-browser close --all             # Close all active sessions
+
+# Snapshot
+agent-browser snapshot -i             # Interactive elements with refs (recommended)
+agent-browser snapshot -s "#selector" # Scope to CSS selector
+
+# Interaction (use @refs from snapshot)
+agent-browser click @e1               # Click element
+agent-browser click @e1 --new-tab     # Click and open in new tab
+agent-browser fill @e2 "text"         # Clear and type text
+agent-browser type @e2 "text"         # Type without clearing
+agent-browser select @e1 "option"     # Select dropdown option
+agent-browser check @e1               # Check checkbox
+agent-browser press Enter             # Press key
+agent-browser keyboard type "text"    # Type at current focus (no selector)
+agent-browser keyboard inserttext "text"  # Insert without key events
+agent-browser scroll down 500         # Scroll page
+agent-browser scroll down 500 --selector "div.content"  # Scroll within container
+
+# Get information
+agent-browser get text @e1            # Get element text
+agent-browser get url                 # Get current URL
+agent-browser get title               # Get page title
+agent-browser get cdp-url             # Get CDP WebSocket URL
+
+# Wait
+agent-browser wait @e1                # Wait for element
+agent-browser wait --load networkidle # Wait for network idle
+agent-browser wait --url "**/page"    # Wait for URL pattern
+agent-browser wait 2000               # Wait milliseconds
+agent-browser wait --text "Welcome"   # Wait for text to appear
+agent-browser wait --fn "!document.body.innerText.includes('Loading...')"  # Wait for text to disappear
+agent-browser wait "#spinner" --state hidden  # Wait for element to disappear
+
+# Downloads
+agent-browser download @e1 ./file.pdf          # Click element to trigger download
+agent-browser wait --download ./output.zip     # Wait for any download to complete
+
+# Network
+agent-browser network requests                 # Inspect tracked requests
+agent-browser network requests --type xhr,fetch  # Filter by resource type
+agent-browser network requests --method POST   # Filter by HTTP method
+agent-browser network route "**/api/*" --abort # Block matching requests
+agent-browser network har start                # Start HAR recording
+agent-browser network har stop ./capture.har   # Stop and save HAR file
+
+# Viewport & Device Emulation
+agent-browser set viewport 1920 1080          # Set viewport size (default: 1280x720)
+agent-browser set viewport 1920 1080 2        # 2x retina
+agent-browser set device "iPhone 14"          # Emulate device (viewport + user agent)
+
+# Capture
+agent-browser screenshot              # Screenshot to temp dir
+agent-browser screenshot --full       # Full page screenshot
+agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
+agent-browser pdf output.pdf          # Save as PDF
+
+# Clipboard
+agent-browser clipboard read          # Read text from clipboard
+agent-browser clipboard write "text"  # Write text to clipboard
+agent-browser clipboard copy          # Copy current selection
+agent-browser clipboard paste         # Paste from clipboard
+
+# Dialogs (alert, confirm, prompt, beforeunload)
+agent-browser dialog accept           # Accept dialog
+agent-browser dialog accept "input"   # Accept prompt dialog with text
+agent-browser dialog dismiss          # Dismiss/cancel dialog
+agent-browser dialog status           # Check if dialog is open
+
+# Diff (compare page states)
+agent-browser diff snapshot                        # Compare current vs last snapshot
+agent-browser diff screenshot --baseline before.png  # Visual pixel diff
+agent-browser diff url <url1> <url2>               # Compare two pages
+
+# Streaming
+agent-browser stream enable           # Start WebSocket streaming
+agent-browser stream status           # Inspect streaming state
+agent-browser stream disable          # Stop streaming
+```
+
+## Batch Execution
+
+```bash
+echo '[
+  ["open", "https://example.com"],
+  ["snapshot", "-i"],
+  ["click", "@e1"],
+  ["screenshot", "result.png"]
+]' | agent-browser batch --json
+```
+
+## Authentication
+
+```bash
+# Option 1: Auth vault (credentials stored encrypted)
+echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
+agent-browser auth login myapp
+
+# Option 2: Session name (auto-save/restore cookies + localStorage)
+agent-browser --session-name myapp open https://app.example.com/login
+agent-browser close                                                       # State auto-saved
+agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
+
+# Option 3: Persistent profile
+agent-browser --profile ~/.myapp open https://app.example.com/login
+
+# Option 4: State file
+agent-browser state save auth.json
+agent-browser state load auth.json
+```
+
+### LobeHub dev server — inject better-auth cookie
+
+`agent-browser --headed` on macOS can create an off-screen Chromium window, so manual login inside it is impractical. For a local LobeHub dev server (e.g. `localhost:3010`), copy the `better-auth.session_token` cookie from a **Network request** in the user's own Chrome DevTools (not `document.cookie`, which can't see HttpOnly cookies) and load it with `setup-auth.sh web`, which injects it into the session as a state file. See [auth.md](./auth.md) for the full recipe.
+
+## Semantic Locators (Alternative to Refs)
+
+```bash
+agent-browser find text "Sign In" click
+agent-browser find label "Email" fill "user@test.com"
+agent-browser find role button click --name "Submit"
+agent-browser find placeholder "Search" type "query"
+agent-browser find testid "submit-btn" click
+```
+
+## JavaScript Evaluation (eval)
+
+```bash
+# Simple expressions
+agent-browser eval 'document.title'
+
+# Complex JS: use --stdin with heredoc (RECOMMENDED)
+agent-browser eval --stdin << 'EVALEOF'
+JSON.stringify(
+  Array.from(document.querySelectorAll("img"))
+    .filter(i => !i.alt)
+    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
+)
+EVALEOF
+
+# Base64 encoding (avoids all shell escaping issues)
+agent-browser eval -b "$(echo -n 'document.title' | base64)"
+```
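+
+`base64` output differs by platform: GNU coreutils wraps it at 76 columns by
+default, while macOS's `base64` does not wrap. A minimal sketch (the
+`js_to_b64` name is hypothetical, not part of agent-browser) that strips the
+line breaks so longer scripts still reach `eval -b` as a single argument:
+
+```bash
+# Hypothetical helper: base64-encode a JS file (or stdin) as one line.
+# `tr -d '\n'` removes GNU base64's 76-column wraps and the trailing newline.
+js_to_b64() {
+  base64 < "${1:-/dev/stdin}" | tr -d '\n'
+}
+
+# Usage (check-images.js is a placeholder file name):
+#   agent-browser eval -b "$(js_to_b64 ./check-images.js)"
+```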
+
+## Ref Lifecycle
+
+Refs (`@e1`, `@e2`, etc.) are invalidated whenever the page changes. Always re-snapshot after navigation (clicking a link or button that navigates), after form submissions, and after dynamic content loads.
+
+## Annotated Screenshots (Vision Mode)
+
+```bash
+agent-browser screenshot --annotate
+# Output includes the image path and a legend:
+#   [1] @e1 button "Submit"
+#   [2] @e2 link "Home"
+agent-browser click @e2 # Click using ref from annotated screenshot
+```
+
+## Parallel Sessions
+
+```bash
+agent-browser --session site1 open https://site-a.com
+agent-browser --session site2 open https://site-b.com
+agent-browser session list
+```
+
+## Connect to Existing Chrome
+
+```bash
+agent-browser --auto-connect snapshot # Auto-discover running Chrome
+agent-browser --cdp 9222 snapshot     # Explicit CDP port
+```
+
+## iOS Simulator (Mobile Safari)
+
+```bash
+agent-browser device list
+agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
+agent-browser -p ios snapshot -i
+agent-browser -p ios tap @e1
+agent-browser -p ios swipe up
+agent-browser -p ios screenshot mobile.png
+agent-browser -p ios close
+```
+
+## Observability Dashboard
+
+```bash
+agent-browser dashboard install
+agent-browser dashboard start # Background server on port 4848
+agent-browser dashboard stop
+```
+
+## Cloud Providers
+
+Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
+
+## Browser Engine Selection
+
+```bash
+agent-browser --engine lightpanda open example.com # Lightpanda: much faster and lighter than Chromium (~10x per the vendor's own benchmarks)
+```
+
+## Gotchas
+
+- **Daemon can get stuck** — if commands hang, run `agent-browser close --all` (or `pkill -f agent-browser`) to reset
+- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
+- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
+- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
+- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
+- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
+- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
+- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
diff --git a/.agents/skills/local-testing/references/agent-gateway.md b/.agents/skills/agent-testing/references/agent-gateway.md
similarity index 92%
rename from .agents/skills/local-testing/references/agent-gateway.md
rename to .agents/skills/agent-testing/references/agent-gateway.md
index de30fa2362..3e5fbf7b3b 100644
--- a/.agents/skills/local-testing/references/agent-gateway.md
+++ b/.agents/skills/agent-testing/references/agent-gateway.md
@@ -19,13 +19,13 @@ works for any LobeHub streaming session.
 
 ```bash
 # 1. Start Electron with CDP
-./.agents/skills/local-testing/scripts/electron-dev.sh start
+./.agents/skills/agent-testing/scripts/electron-dev.sh start
 
 # 2. Navigate to a chat, switch runtime to Cloud Sandbox (gateway mode)
 
 # 3. Install the probe + helpers
 agent-browser --cdp 9222 eval --stdin \
-  < .agents/skills/local-testing/scripts/agent-gateway/probe.js
+  < .agents/skills/agent-testing/scripts/agent-gateway/probe.js
 
 # 4. Send a tool-call message — manually or via type+press
 agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
@@ -34,15 +34,15 @@ agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
 #    rightmost inactive tab as AWAY — edit ROUND_TRIPS / DWELL_MS in the
 #    file if you want different timing)
 agent-browser --cdp 9222 eval --stdin \
-  < .agents/skills/local-testing/scripts/agent-gateway/tab-switch.js
+  < .agents/skills/agent-testing/scripts/agent-gateway/tab-switch.js
 
 # 6. Wait for streaming to finish, then dump
 agent-browser --cdp 9222 eval --stdin \
-  < .agents/skills/local-testing/scripts/agent-gateway/probe-dump.js \
+  < .agents/skills/agent-testing/scripts/agent-gateway/probe-dump.js \
   > /tmp/probe.json
 
 # 7. Analyze
-node .agents/skills/local-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
+node .agents/skills/agent-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
 ```
 
 The analyzer prints three sections: EVENTS, TIMELINE, REGRESSIONS. If
diff --git a/.agents/skills/agent-testing/references/auth.md b/.agents/skills/agent-testing/references/auth.md
new file mode 100644
index 0000000000..8e13c50717
--- /dev/null
+++ b/.agents/skills/agent-testing/references/auth.md
@@ -0,0 +1,115 @@
+# Auth Setup for Local Agent Testing
+
+**Auth is the gate for all automated testing.** Prepare and verify it before
+writing any test step. The one-stop entry point is:
+
+```bash
+SCRIPT=".agents/skills/agent-testing/scripts/setup-auth.sh"
+
+$SCRIPT status        # check server + CLI + web auth readiness
+$SCRIPT cli           # interactive CLI device-code login (must be run by the user)
+pbpaste | $SCRIPT web # inject a copied Cookie header into the agent-browser session
+$SCRIPT web-verify    # live-check that the agent-browser session is authenticated
+```
+
+`SERVER_URL` defaults to `http://localhost:3010` (this repo's `dev:next` port).
+Override it when testing against another server (e.g. `SERVER_URL=http://localhost:3011`
+in the cloud repo).
+
+## Per-surface overview
+
+| Surface  | Mechanism                                | Persistence                                                       | Human interaction                               |
+| -------- | ---------------------------------------- | ----------------------------------------------------------------- | ----------------------------------------------- |
+| CLI      | OIDC Device Code Flow                    | `apps/cli/.lobehub-dev/settings.json`                             | Yes — browser authorization, every token expiry |
+| Web      | better-auth cookie injection             | `~/.lobehub-agent-testing/web-state.json` + agent-browser session | Copy the Cookie header once per token rotation  |
+| Electron | App's own login state                    | Electron user-data dir                                            | Log in once manually in the app                 |
+| Bot      | Native apps (Discord/WeChat/…) logged in | Each app's own session                                            | Once per app                                    |
+
+## CLI — Device Code Flow
+
+Credentials are isolated from the user's real CLI config via
+`LOBEHUB_CLI_HOME=.lobehub-dev` (kept inside `apps/cli/`, gitignored).
+
+Login requires interactive browser authorization, so **the user must run it
+themselves** (e.g. via the `!` prefix in Claude Code):
+
+```bash
+cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
+```
+
+- The `--server` flag is required — an env var does NOT work and login will hit
+  the wrong server without it.
+- Check state without logging in: `setup-auth.sh status` (verifies
+  `settings.json` exists and `serverUrl` matches).
+- `UNAUTHORIZED` on API calls means the token expired — re-run login.
+
+## Web — better-auth cookie injection (agent-browser)
+
+`agent-browser --headed` on macOS often creates the Chromium window off-screen —
+the user can't see or interact with it, so manual login inside the agent-browser
+session fails. Instead, copy the **better-auth session cookie** out of the
+user's own logged-in Chrome and inject it as a Playwright-style state file.
+
+Do **not** use this on production URLs — only local dev. Treat the cookie as a
+secret: don't paste it into shared logs, PRs, or commit it anywhere.
+
+### One-key path
+
+1. Ask the user to copy the Cookie header **from a Network request, NOT
+   `document.cookie`** (`document.cookie` cannot see HttpOnly cookies, which is
+   exactly where better-auth puts its session):
+   - Open the logged-in tab (`http://localhost:<port>/…`) in Chrome.
+   - `Cmd+Option+I` → **Network** tab → refresh → click any same-origin request.
+   - Under **Request Headers**, right-click the `Cookie:` line → **Copy value**.
+2. Inject and verify in one shot:
+
+```bash
+pbpaste | ./.agents/skills/agent-testing/scripts/setup-auth.sh web
+```
+
+The script filters the header down to the better-auth cookies
+(`better-auth.session_token`, `better-auth.state`), builds the Playwright
+`storageState` JSON, loads it into the `agent-browser` session (default name
+`lobehub-dev`), opens `SERVER_URL`, and asserts the URL is not `/signin`.
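+
+A minimal sketch of that filtering step, written from the description above
+rather than copied from `setup-auth.sh`. The function name, the `localhost` /
+`/` cookie scope, and `sameSite: "Lax"` are assumptions for local dev. It reads
+the raw header on stdin and prints a Playwright `storageState` document:
+
+```bash
+# Sketch: raw Cookie header on stdin -> Playwright storageState on stdout,
+# keeping only the better-auth cookies. Scope (localhost, path /) and
+# sameSite=Lax are assumptions for local dev.
+cookie_header_to_state() {
+  local header pair name value first=1
+  local -a pairs
+  IFS= read -r header || true
+  header="${header#Cookie: }" # tolerate a pasted "Cookie: " prefix
+  IFS=';' read -r -a pairs <<< "$header"
+  printf '{"cookies":['
+  for pair in "${pairs[@]}"; do
+    pair="${pair# }" # pairs are separated by "; "
+    name="${pair%%=*}"
+    value="${pair#*=}" # keep any later '=' inside the value
+    case "$name" in
+      better-auth.session_token | better-auth.state) ;;
+      *) continue ;;
+    esac
+    ((first)) || printf ','
+    first=0
+    printf '{"name":"%s","value":"%s","domain":"localhost","path":"/","expires":-1,"httpOnly":false,"secure":false,"sameSite":"Lax"}' \
+      "$name" "$value"
+  done
+  printf '],"origins":[]}\n'
+  # Exit non-zero when no better-auth cookie was found
+  ((first == 0))
+}
+```
+
+Usage: `pbpaste | cookie_header_to_state > /tmp/web-state.json`, then
+`agent-browser state load /tmp/web-state.json`.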
+
+### Using the authenticated session
+
+```bash
+agent-browser --session lobehub-dev open "http://localhost:3010/"
+agent-browser --session lobehub-dev snapshot -i | head -20
+# Look for the user's avatar/name in the sidebar, or absence of the signin form.
+```
+
+### Notes
+
+- `storageState` doesn't enforce the HttpOnly flag on load — the script stores
+  cookies with `httpOnly: false`, which is fine for local dev and sidesteps a
+  CDP-context quirk where HttpOnly cookies sometimes fail to attach.
+- The state file is kept at `~/.lobehub-agent-testing/web-state.json` so
+  `setup-auth.sh status` can report web-auth readiness across sessions.
+
+### Common failure modes
+
+| Symptom                                       | Cause                                                                     | Fix                                               |
+| --------------------------------------------- | ------------------------------------------------------------------------- | ------------------------------------------------- |
+| Still redirects to `/signin` after injection  | User pasted from `document.cookie` → missed HttpOnly session              | Re-pull from Network request Headers, not console |
+| Script reports `no better-auth cookies found` | Separator wrong, or user pasted URL-decoded value                         | Keep the raw `Cookie:` header as-is               |
+| Login works briefly then expires              | `better-auth.session_token` rotated (user logged out / signed in again)   | Re-copy and re-inject                             |
+| Domain mismatch                               | Cookie domain is not literally `localhost` (e.g. has a leading dot)       | Set domain to `localhost`, no leading dot         |
+
+## Electron
+
+The desktop app keeps its own persistent login state in its user-data
+directory — log in once manually inside the app and it survives restarts of
+`electron-dev.sh`. No injection needed; just confirm with a snapshot that the
+app is past the signin screen before running UI tests.
+
+## Scope
+
+These recipes only cover **local dev** authentication. They do not:
+
+- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com`
+  and must be delivered over HTTPS.
+- Replace real OAuth flows — tests that must exercise the login UI itself need a
+  real Chromium with `--remote-debugging-port` or a bot account.
+- Flow cookies back to the user's Chrome — injection is one-way.
diff --git a/.agents/skills/agent-testing/references/dev-server.md b/.agents/skills/agent-testing/references/dev-server.md
new file mode 100644
index 0000000000..395b118387
--- /dev/null
+++ b/.agents/skills/agent-testing/references/dev-server.md
@@ -0,0 +1,55 @@
+# Local Dev Server
+
+Single source of truth for starting / restarting the backend that all test
+surfaces (CLI, Electron, Web) hit.
+
+## Ports & modes
+
+| Command             | What it runs                                              | Port                              |
+| ------------------- | --------------------------------------------------------- | --------------------------------- |
+| `pnpm run dev:next` | Next.js backend (API + auth)                              | `3010`                            |
+| `bun run dev`       | Full-stack (Next.js + Vite SPA, via `devStartupSequence`) | `3010` (API) + SPA                |
+| `bun run dev:spa`   | Vite SPA only, proxies API to `3010`                      | `9876` (prints a Debug Proxy URL) |
+
+In the **cloud repo** (where this repo is the `lobehub/` submodule) the dev
+server conventionally runs on `3011` — set `SERVER_URL=http://localhost:3011`
+for the scripts in this skill when testing there.
+
+## Health check
+
+```bash
+curl -s -o /dev/null -w '%{http_code}' http://localhost:3010/
+```
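+
+Right after `pnpm run dev:next` starts, the port can still refuse connections
+(and the first request may take a while to compile), so scripted runs usually
+need to wait rather than check once. A minimal polling sketch built on the
+same `curl` call; the function name and the 60-attempt default are
+assumptions:
+
+```bash
+# Sketch: poll the dev server until it answers, or give up after roughly
+# `timeout` one-second attempts. Any 2xx/3xx counts as "up", since an
+# unauthenticated `/` may redirect.
+wait_for_server() {
+  local url="${1:-http://localhost:3010/}" timeout="${2:-60}" elapsed=0 code
+  while ((elapsed < timeout)); do
+    # curl prints 000 when the connection is refused; `|| true` keeps set -e happy
+    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url" || true)
+    [[ "$code" =~ ^[23] ]] && return 0
+    sleep 1
+    elapsed=$((elapsed + 1))
+  done
+  echo "server at $url not ready after ~${timeout}s (last status: ${code:-none})" >&2
+  return 1
+}
+```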
+
+## Start / restart
+
+```bash
+# Start (from repo root)
+pnpm run dev:next
+
+# Restart — required to pick up server-side code changes
+lsof -ti:3010 | xargs kill
+pnpm run dev:next
+```
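+
+`lsof -ti:3010 | xargs kill` misbehaves when nothing is listening: GNU xargs
+still runs `kill` once with no PIDs (it needs `-r` to skip empty input), so the
+restart errors out on Linux, while BSD/macOS xargs skips empty input. A guarded
+sketch (the `free_port` name is hypothetical):
+
+```bash
+# Sketch: stop whatever listens on a port, and succeed quietly when nothing does.
+free_port() {
+  local port="${1:-3010}" pids
+  pids=$(lsof -ti:"$port" 2> /dev/null || true)
+  if [ -n "$pids" ]; then
+    # shellcheck disable=SC2086 # one PID per line; word splitting is intended
+    kill $pids
+  fi
+}
+
+# Usage: free_port 3010 && pnpm run dev:next
+```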
+
+## When a server restart is needed
+
+Next.js hot-reload may not pick up changes in workspace packages — restart when
+in doubt.
+
+| Change location                                 | Restart? |
+| ----------------------------------------------- | -------- |
+| `apps/server/src/` (routers, services, modules) | Yes      |
+| `src/server/` (agent-hono, workflows-hono)      | Yes      |
+| `packages/database/` (models)                   | Yes      |
+| `packages/types/`                               | Yes      |
+| `packages/prompts/`                             | Yes      |
+| `apps/cli/` (CLI runs from source)              | No       |
+
+## Troubleshooting
+
+| Issue                     | Solution                                                |
+| ------------------------- | ------------------------------------------------------- |
+| `ECONNREFUSED`            | Server not running — start it                           |
+| `EADDRINUSE` on the port  | Already running — `lsof -ti:<port> \| xargs kill` first |
+| Stale data / old behavior | Server needs a restart to pick up code changes          |
diff --git a/.agents/skills/local-testing/bot/osascript-common.md b/.agents/skills/agent-testing/references/osascript.md
similarity index 100%
rename from .agents/skills/local-testing/bot/osascript-common.md
rename to .agents/skills/agent-testing/references/osascript.md
diff --git a/.agents/skills/local-testing/references/record-app-screen.md b/.agents/skills/agent-testing/references/record-app-screen.md
similarity index 88%
rename from .agents/skills/local-testing/references/record-app-screen.md
rename to .agents/skills/agent-testing/references/record-app-screen.md
index 193a5a38bb..1f06192d3b 100644
--- a/.agents/skills/local-testing/references/record-app-screen.md
+++ b/.agents/skills/agent-testing/references/record-app-screen.md
@@ -12,13 +12,13 @@ General-purpose screen recording tool for the Electron app. Captures CDP screens
 
 ```bash
 # Start recording (Electron must be running with CDP)
-.agents/skills/local-testing/scripts/record-app-screen.sh start [output_name]
+.agents/skills/agent-testing/scripts/record-app-screen.sh start [output_name]
 
 # Stop recording and assemble video
-.agents/skills/local-testing/scripts/record-app-screen.sh stop
+.agents/skills/agent-testing/scripts/record-app-screen.sh stop
 
 # Check if recording is active
-.agents/skills/local-testing/scripts/record-app-screen.sh status
+.agents/skills/agent-testing/scripts/record-app-screen.sh status
 ```
 
 ### Arguments
@@ -74,10 +74,10 @@ The `.records/` directory is at the project root and is gitignored.
 
 ```bash
 # Start Electron
-.agents/skills/local-testing/scripts/electron-dev.sh start
+.agents/skills/agent-testing/scripts/electron-dev.sh start
 
 # Start recording
-.agents/skills/local-testing/scripts/record-app-screen.sh start my-test
+.agents/skills/agent-testing/scripts/record-app-screen.sh start my-test
 
 # Run automation
 agent-browser --cdp 9222 click @e61
@@ -86,14 +86,14 @@ agent-browser --cdp 9222 press Enter
 sleep 10
 
 # Stop and get results
-.agents/skills/local-testing/scripts/record-app-screen.sh stop
+.agents/skills/agent-testing/scripts/record-app-screen.sh stop
 # → .records/my-test.mp4 + .records/my-test/*.png
 ```
 
 ### Gateway Streaming Demo
 
 ```bash
-.agents/skills/local-testing/scripts/electron-dev.sh start
+.agents/skills/agent-testing/scripts/electron-dev.sh start
 
 # Inject gateway URL
 agent-browser --cdp 9222 eval --stdin << 'EOF'
@@ -106,19 +106,19 @@ agent-browser --cdp 9222 eval --stdin << 'EOF'
 EOF
 
 # Record
-.agents/skills/local-testing/scripts/record-app-screen.sh start gateway-demo
+.agents/skills/agent-testing/scripts/record-app-screen.sh start gateway-demo
 
 # Navigate to agent, send message, wait for completion...
 # (automation commands here)
 
-.agents/skills/local-testing/scripts/record-app-screen.sh stop
+.agents/skills/agent-testing/scripts/record-app-screen.sh stop
 open .records/gateway-demo.mp4
 ```
 
 ### Check Active Recording
 
 ```bash
-.agents/skills/local-testing/scripts/record-app-screen.sh status
+.agents/skills/agent-testing/scripts/record-app-screen.sh status
 # [record] Active recording
 #   Frames:      42 captured (running: yes)
 #   Screenshots: 14 captured (running: yes)
diff --git a/.agents/skills/agent-testing/references/report.md b/.agents/skills/agent-testing/references/report.md
new file mode 100644
index 0000000000..24064d7fc8
--- /dev/null
+++ b/.agents/skills/agent-testing/references/report.md
@@ -0,0 +1,98 @@
+# Structured Test Reports
+
+Every automated test session ends with a structured, evidence-backed report.
+A chat-only summary is not an acceptable deliverable: the report is what the
+user (or a reviewer, or a later agent) audits without replaying the session.
+
+## Location & layout
+
+Reports live under `.records/reports/` (gitignored, like all `.records/`
+output):
+
+```
+.records/reports/<YYYYMMDD-HHMMSS>-<slug>/
+├── report.md      # human-readable report (embedded screenshots, case table, verdict)
+├── result.json    # machine-readable results (pass/fail counts, score)
+└── assets/        # evidence: screenshots, HAR files, CLI transcripts
+```
+
+## Workflow
+
+1. **Scaffold up front** — before running the first test step:
+
+   ```bash
+   DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh task-tree "Verify task tree API")
+   ```
+
+   The script creates the directory, pre-fills branch / commit / date in both
+   files, and prints the directory path.
+
+2. **Collect evidence as you test** — every asserted behavior gets one evidence
+   item in `$DIR/assets/`:
+   - UI: `agent-browser screenshot` or `capture-app-window.sh`, then **verify
+     the screenshot with the Read tool before citing it** — never cite an
+     image you haven't looked at.
+   - CLI: exact command + trimmed output (`$CLI task list | tee "$DIR/assets/task-list.txt"`).
+   - Network: `agent-browser network requests` dumps or HAR files.
+
+3. **Fill `report.md` as you go** — don't reconstruct from memory at the end.
+
+4. **Set the verdict** in both `report.md` and `result.json`, then link the
+   report directory in your final answer to the user.
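+
+For the CLI evidence in step 2, a small wrapper keeps the exact command and its
+output together in one asset file. A sketch assuming bash; the `capture_cli`
+name is hypothetical, and arguments are echoed with `%q` so quoting stays
+unambiguous:
+
+```bash
+# Hypothetical helper: write "$ <command>" plus its stdout/stderr to a file
+# (and to the terminal), returning the wrapped command's exit status.
+capture_cli() {
+  local out="$1"
+  shift
+  { printf '$'; printf ' %q' "$@"; printf '\n'; "$@" 2>&1; } | tee "$out"
+  return "${PIPESTATUS[0]}"
+}
+
+# Usage: capture_cli "$DIR/assets/task-list.txt" $CLI task list
+```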
+
+## report.md sections
+
+| Section         | Content                                                                            |
+| --------------- | ---------------------------------------------------------------------------------- |
+| **Scope**       | What changed / what is being verified; branch + commit                             |
+| **Environment** | Server URL, surfaces used (cli / electron / web / bot), relevant versions          |
+| **Cases**       | Table: `# \| case \| surface \| steps \| expected \| actual \| status \| evidence` |
+| **Evidence**    | Embedded screenshots (`![case 1](assets/case1.png)`), fenced CLI transcripts       |
+| **Verdict**     | Pass/fail/blocked counts, optional 0–100 score, open issues / follow-ups           |
+
+Status values: `pass` / `fail` / `blocked` (couldn't run — e.g. auth or env
+missing; a blocked case is not a pass).
+
+## result.json schema
+
+```json
+{
+  "branch": "feat/task-tree",
+  "cases": [
+    {
+      "id": "1",
+      "name": "task tree returns nested children",
+      "surface": "cli",
+      "status": "pass",
+      "evidence": ["assets/task-tree.txt"]
+    }
+  ],
+  "commit": "abc1234",
+  "createdAt": "2026-06-11T15:30:00+08:00",
+  "summary": {
+    "total": 1,
+    "passed": 1,
+    "failed": 0,
+    "blocked": 0,
+    "score": 100,
+    "verdict": "pass"
+  },
+  "surfaces": ["cli"],
+  "title": "Verify task tree API"
+}
+```
+
+`score` is optional — use it when the verdict has a subjective component (UI
+polish, copy quality); omit it for purely binary runs. `verdict` is the single
+word the user reads first: `pass`, `fail`, or `partial`. `report-init.sh`
+scaffolds it as `pending`; replace that before reporting.
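+
+Counting statuses by hand is error-prone, so a sketch like the following can
+fill in `summary`. The count-to-verdict mapping is an assumption (this page only
+names the three values): every case passed gives `pass`, no case passed gives
+`fail`, anything else gives `partial`, and `blocked` never counts as a pass.
+
+```bash
+# Hypothetical helper: summarize case statuses into a result.json summary object.
+# Assumed verdict rule: all pass -> pass, none pass -> fail, otherwise partial.
+summarize_cases() {
+  local total=0 passed=0 failed=0 blocked=0 verdict s
+  for s in "$@"; do
+    total=$((total + 1))
+    case "$s" in
+      pass) passed=$((passed + 1)) ;;
+      fail) failed=$((failed + 1)) ;;
+      blocked) blocked=$((blocked + 1)) ;;
+      *) echo "unknown status: $s" >&2; return 1 ;;
+    esac
+  done
+  if ((total > 0 && passed == total)); then verdict=pass
+  elif ((passed == 0)); then verdict=fail
+  else verdict=partial
+  fi
+  printf '{ "total": %d, "passed": %d, "failed": %d, "blocked": %d, "verdict": "%s" }\n' \
+    "$total" "$passed" "$failed" "$blocked" "$verdict"
+}
+
+# Usage: summarize_cases pass pass blocked
+```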
+
+## Rules
+
+- **No evidence, no claim** — every `pass`/`fail` in the case table must link
+  at least one asset.
+- **Screenshots must be visually verified** with the Read tool before being
+  cited.
+- **Report failures faithfully** — a failing case with clear evidence is a good
+  report; a vague green one is not.
+- If coverage was cut (cases skipped, surfaces not exercised), say so in the
+  Verdict section — silent truncation reads as "covered everything".
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts b/.agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts
similarity index 99%
rename from .agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts
rename to .agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts
index c3e755fcf1..82895013cf 100644
--- a/.agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts
+++ b/.agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts
@@ -11,7 +11,7 @@
 //   6. ROLLBACKS — msgN / childN / role drops in the active-topic timeline
 //
 // Usage:
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
 
 import { readFileSync } from 'node:fs';
 
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/analyze.mjs b/.agents/skills/agent-testing/scripts/agent-gateway/analyze.mjs
similarity index 100%
rename from .agents/skills/local-testing/scripts/agent-gateway/analyze.mjs
rename to .agents/skills/agent-testing/scripts/agent-gateway/analyze.mjs
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/probe-dump.js b/.agents/skills/agent-testing/scripts/agent-gateway/probe-dump.js
similarity index 100%
rename from .agents/skills/local-testing/scripts/agent-gateway/probe-dump.js
rename to .agents/skills/agent-testing/scripts/agent-gateway/probe-dump.js
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/probe-dump.ts b/.agents/skills/agent-testing/scripts/agent-gateway/probe-dump.ts
similarity index 100%
rename from .agents/skills/local-testing/scripts/agent-gateway/probe-dump.ts
rename to .agents/skills/agent-testing/scripts/agent-gateway/probe-dump.ts
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/probe-events.ts b/.agents/skills/agent-testing/scripts/agent-gateway/probe-events.ts
similarity index 100%
rename from .agents/skills/local-testing/scripts/agent-gateway/probe-events.ts
rename to .agents/skills/agent-testing/scripts/agent-gateway/probe-events.ts
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/probe.js b/.agents/skills/agent-testing/scripts/agent-gateway/probe.js
similarity index 100%
rename from .agents/skills/local-testing/scripts/agent-gateway/probe.js
rename to .agents/skills/agent-testing/scripts/agent-gateway/probe.js
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/run.ts b/.agents/skills/agent-testing/scripts/agent-gateway/run.ts
similarity index 96%
rename from .agents/skills/local-testing/scripts/agent-gateway/run.ts
rename to .agents/skills/agent-testing/scripts/agent-gateway/run.ts
index f407541b81..03b909d4e0 100644
--- a/.agents/skills/local-testing/scripts/agent-gateway/run.ts
+++ b/.agents/skills/agent-testing/scripts/agent-gateway/run.ts
@@ -5,16 +5,16 @@
 // streaming-replay test fixtures.
 //
 // Commands:
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts install
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts install
 //       Bundle probe-events.ts and inject into the CDP-attached browser.
 //       Re-installing clears all buffers and re-patches WebSocket / fetch.
 //
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts dump [name]
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts dump [name]
 //       Stop the timeline timer, fetch the capture as JSON, write it to
 //       `.agent-gateway/<name>-<YYYYMMDD-HHmmss>.json`. `name` defaults to
 //       `dump`. Prints the absolute path written.
 //
-//   bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts analyze [path]
+//   bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts analyze [path]
 //       Run analyze-events.ts on the dump. `path` defaults to the most
 //       recently modified file in `.agent-gateway/`.
 //
@@ -28,7 +28,7 @@ import path from 'node:path';
 import { fileURLToPath } from 'node:url';
 
 const SCRIPT_DIR = path.dirname(fileURLToPath(import.meta.url));
-// .agents/skills/local-testing/scripts/agent-gateway/ → 5 levels up
+// .agents/skills/agent-testing/scripts/agent-gateway/ → 5 levels up
 const PROJECT_ROOT = path.resolve(SCRIPT_DIR, '../../../../..');
 const DUMP_DIR = path.join(PROJECT_ROOT, '.agent-gateway');
 
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/tab-switch.js b/.agents/skills/agent-testing/scripts/agent-gateway/tab-switch.js
similarity index 100%
rename from .agents/skills/local-testing/scripts/agent-gateway/tab-switch.js
rename to .agents/skills/agent-testing/scripts/agent-gateway/tab-switch.js
diff --git a/.agents/skills/local-testing/scripts/agent-gateway/types.ts b/.agents/skills/agent-testing/scripts/agent-gateway/types.ts
similarity index 100%
rename from .agents/skills/local-testing/scripts/agent-gateway/types.ts
rename to .agents/skills/agent-testing/scripts/agent-gateway/types.ts
diff --git a/.agents/skills/local-testing/bot/capture-app-window.sh b/.agents/skills/agent-testing/scripts/capture-app-window.sh
similarity index 100%
rename from .agents/skills/local-testing/bot/capture-app-window.sh
rename to .agents/skills/agent-testing/scripts/capture-app-window.sh
diff --git a/.agents/skills/local-testing/scripts/electron-dev.sh b/.agents/skills/agent-testing/scripts/electron-dev.sh
similarity index 100%
rename from .agents/skills/local-testing/scripts/electron-dev.sh
rename to .agents/skills/agent-testing/scripts/electron-dev.sh
diff --git a/.agents/skills/local-testing/scripts/record-app-screen.sh b/.agents/skills/agent-testing/scripts/record-app-screen.sh
similarity index 100%
rename from .agents/skills/local-testing/scripts/record-app-screen.sh
rename to .agents/skills/agent-testing/scripts/record-app-screen.sh
diff --git a/.agents/skills/local-testing/scripts/record-electron-demo.sh b/.agents/skills/agent-testing/scripts/record-electron-demo.sh
similarity index 100%
rename from .agents/skills/local-testing/scripts/record-electron-demo.sh
rename to .agents/skills/agent-testing/scripts/record-electron-demo.sh
diff --git a/.agents/skills/agent-testing/scripts/report-init.sh b/.agents/skills/agent-testing/scripts/report-init.sh
new file mode 100755
index 0000000000..8497da2aba
--- /dev/null
+++ b/.agents/skills/agent-testing/scripts/report-init.sh
@@ -0,0 +1,74 @@
+#!/usr/bin/env bash
+# report-init.sh — scaffold a structured test report under .records/reports/.
+#
+# Format spec and evidence rules: ../references/report.md
+#
+# Usage:
+#   report-init.sh <slug> [title]
+#
+# Prints the report directory path (capture it: DIR=$(report-init.sh my-test)).
+
+set -euo pipefail
+
+SLUG="${1:?Usage: report-init.sh <slug> [title]}"
+TITLE="${2:-$SLUG}"
+
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
+TS="$(date +%Y%m%d-%H%M%S)"
+DIR="$REPO_ROOT/.records/reports/$TS-$SLUG"
+mkdir -p "$DIR/assets"
+
+BRANCH=$(git -C "$REPO_ROOT" branch --show-current 2> /dev/null || echo "unknown")
+COMMIT=$(git -C "$REPO_ROOT" rev-parse --short HEAD 2> /dev/null || echo "unknown")
+DATE_HUMAN=$(date '+%Y-%m-%d %H:%M')
+DATE_ISO=$(date '+%Y-%m-%dT%H:%M:%S%z')
+
+cat > "$DIR/report.md" << EOF
+# Test Report: $TITLE
+
+## Scope
+
+<!-- What changed / what is being verified -->
+
+- Branch: \`$BRANCH\`
+- Commit: \`$COMMIT\`
+- Date: $DATE_HUMAN
+
+## Environment
+
+- Server: <!-- e.g. http://localhost:3010 -->
+- Surfaces: <!-- cli / electron / web / bot:<platform> -->
+
+## Cases
+
+| # | Case | Surface | Steps | Expected | Actual | Status | Evidence |
+| - | ---- | ------- | ----- | -------- | ------ | ------ | -------- |
+| 1 |      |         |       |          |        |        |          |
+
+## Evidence
+
+<!-- Embed screenshots: ![case 1](assets/case1.png) -->
+<!-- CLI transcripts in fenced blocks, with the exact command -->
+
+## Verdict
+
+- Passed: 0 / 0
+- Failed: 0
+- Blocked: 0
+- Score (optional): —
+- Open issues / follow-ups:
+EOF
+
+cat > "$DIR/result.json" << EOF
+{
+  "title": "$TITLE",
+  "createdAt": "$DATE_ISO",
+  "branch": "$BRANCH",
+  "commit": "$COMMIT",
+  "surfaces": [],
+  "cases": [],
+  "summary": { "total": 0, "passed": 0, "failed": 0, "blocked": 0, "verdict": "pending" }
+}
+EOF
+
+echo "$DIR"
diff --git a/.agents/skills/agent-testing/scripts/setup-auth.sh b/.agents/skills/agent-testing/scripts/setup-auth.sh
new file mode 100755
index 0000000000..29723e77dd
--- /dev/null
+++ b/.agents/skills/agent-testing/scripts/setup-auth.sh
@@ -0,0 +1,153 @@
+#!/usr/bin/env bash
+# setup-auth.sh — one-stop auth setup & check for local agent testing.
+#
+# Auth is the gate for all automated testing: prepare it BEFORE writing any
+# test step. Background and failure modes: ../references/auth.md
+#
+# Usage:
+#   setup-auth.sh status        # check server + CLI + web auth readiness
+#   setup-auth.sh cli           # interactive CLI device-code login (run by a human)
+#   setup-auth.sh web           # stdin = Cookie header -> inject into agent-browser session
+#   setup-auth.sh web-verify    # live-check the agent-browser session is authenticated
+#
+# Env:
+#   SERVER_URL  (default http://localhost:3010)   dev server under test
+#   SESSION     (default lobehub-dev)             agent-browser session name
+#   AUTH_DIR    (default ~/.lobehub-agent-testing) where web state is persisted
+
+set -euo pipefail
+
+SERVER_URL="${SERVER_URL:-http://localhost:3010}"
+SESSION="${SESSION:-lobehub-dev}"
+AUTH_DIR="${AUTH_DIR:-$HOME/.lobehub-agent-testing}"
+STATE_FILE="$AUTH_DIR/web-state.json"
+REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
+CLI_HOME="$REPO_ROOT/apps/cli/.lobehub-dev"
+
+ok()   { printf '  \033[32m✔\033[0m %s\n' "$1"; }
+bad()  { printf '  \033[31m✘\033[0m %s\n' "$1"; }
+note() { printf '      %s\n' "$1"; }
+
+check_server() {
+  local code
+  code=$(curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/" 2> /dev/null || true)
+  if [[ "$code" =~ ^[23] ]]; then
+    ok "dev server reachable at $SERVER_URL"
+  else
+    bad "dev server NOT reachable at $SERVER_URL (http_code='$code')"
+    note "start it: pnpm run dev:next  (see references/dev-server.md)"
+    return 1
+  fi
+}
+
+check_cli() {
+  if [[ -f "$CLI_HOME/settings.json" ]] && grep -qF "$SERVER_URL" "$CLI_HOME/settings.json"; then
+    ok "CLI logged in to $SERVER_URL (creds: apps/cli/.lobehub-dev)"
+  else
+    bad "CLI not logged in to $SERVER_URL"
+    note "ask the user to run:"
+    note "cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server $SERVER_URL"
+    return 1
+  fi
+}
+
+check_web() {
+  if [[ -f "$STATE_FILE" ]]; then
+    ok "web auth state saved ($STATE_FILE)"
+    note "live-verify: $0 web-verify"
+  else
+    bad "no web auth state for agent-browser"
+    note "copy the Cookie header from Chrome DevTools (Network tab), then:"
+    note "pbpaste | $0 web   (see references/auth.md)"
+    return 1
+  fi
+}
+
+cmd_status() {
+  echo "agent-testing auth status (SERVER_URL=$SERVER_URL):"
+  local rc=0
+  check_server || rc=1
+  check_cli || rc=1
+  check_web || rc=1
+  if [[ $rc -eq 0 ]]; then
+    echo "all green — safe to start automated testing."
+  else
+    echo "auth NOT ready — fix the ✘ items before writing any test step."
+  fi
+  return $rc
+}
+
+cmd_cli() {
+  echo "Starting CLI device-code login against $SERVER_URL ..."
+  echo "(opens a browser authorization — must be run by a human in a terminal)"
+  cd "$REPO_ROOT/apps/cli"
+  LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server "$SERVER_URL"
+}
+
+# Build a Playwright storageState file from a raw Cookie header on stdin,
+# keeping only the better-auth cookies. See references/auth.md for why the
+# header must come from a Network request (HttpOnly) and why httpOnly=false.
+cmd_web() {
+  mkdir -p "$AUTH_DIR"
+  python3 - "$STATE_FILE" "$(cat)" << 'PY'
+import json, sys, time
+
+raw = sys.argv[2].strip()
+if raw.lower().startswith("cookie:"):
+    raw = raw.split(":", 1)[1].strip()
+
+WANTED = {"better-auth.session_token", "better-auth.state"}
+exp = int(time.time()) + 30 * 24 * 3600  # 30 days
+
+cookies = []
+for pair in (p.strip() for p in raw.split(";")):
+    if "=" not in pair:
+        continue
+    name, _, value = pair.partition("=")
+    if name not in WANTED:
+        continue
+    cookies.append({
+        "name": name,
+        "value": value,
+        "domain": "localhost",
+        "path": "/",
+        "expires": exp,
+        "httpOnly": False,
+        "secure": False,
+        "sameSite": "Lax",
+    })
+
+if not cookies:
+    sys.stderr.write("no better-auth cookies found in input — paste the raw Cookie header from a Network request\n")
+    sys.exit(1)
+
+with open(sys.argv[1], "w") as f:
+    json.dump({"cookies": cookies, "origins": []}, f, indent=2)
+print(f"wrote {len(cookies)} cookie(s) to {sys.argv[1]}")
+PY
+  agent-browser --session "$SESSION" state load "$STATE_FILE"
+  cmd_web_verify
+}
+
+cmd_web_verify() {
+  agent-browser --session "$SESSION" open "$SERVER_URL/" > /dev/null
+  local url
+  url=$(agent-browser --session "$SESSION" get url)
+  if [[ "$url" == *"/signin"* || "$url" == *"/login"* ]]; then
+    bad "agent-browser session '$SESSION' NOT authenticated (landed on $url)"
+    note "re-copy the Cookie header and re-run: pbpaste | $0 web"
+    return 1
+  fi
+  ok "agent-browser session '$SESSION' authenticated (at $url)"
+}
+
+case "${1:-status}" in
+  status) cmd_status ;;
+  cli) cmd_cli ;;
+  web) cmd_web ;;
+  web-verify) cmd_web_verify ;;
+  *)
+    echo "Usage: $0 {status|cli|web|web-verify}" >&2
+    exit 2
+    ;;
+esac
diff --git a/.agents/skills/agent-testing/ui/electron.md b/.agents/skills/agent-testing/ui/electron.md
new file mode 100644
index 0000000000..3c1f864d6d
--- /dev/null
+++ b/.agents/skills/agent-testing/ui/electron.md
@@ -0,0 +1,112 @@
+# Electron (LobeHub Desktop) UI Testing
+
+Default surface for verifying **pure frontend changes** (components, store logic, styles, interactions) in the primary product shape. Drives the Electron renderer over CDP with `agent-browser` — see [../references/agent-browser.md](../references/agent-browser.md) for the full command reference.
+
+**Auth**: the Electron app keeps its own persistent login state — log in once manually in the app; sessions survive restarts. Run `../scripts/setup-auth.sh status` before testing (see [../references/auth.md](../references/auth.md)).
+
+**Linux / headless (cloud)**: Electron itself runs on Linux, but it has no true headless mode — it needs a display server. In a headless environment wrap the launch with `xvfb-run` (virtual framebuffer). Everything CDP-based keeps working under Xvfb: the `agent-browser --cdp 9222` connection, snapshots, eval, and `agent-browser screenshot` (captured from the renderer via CDP, not the OS screen). What does NOT work on Linux: `capture-app-window.sh` (macOS `screencapture`), osascript, and the ffmpeg recording scripts in their current form.
+
+## Setup / Teardown
+
+Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
+
+```bash
+SCRIPT=".agents/skills/agent-testing/scripts/electron-dev.sh"
+
+# Start Electron dev with CDP (idempotent — skips if already running)
+$SCRIPT start
+
+# Check if Electron is running and CDP is reachable
+$SCRIPT status
+
+# Kill all Electron-related processes (main + helper + vite)
+$SCRIPT stop
+
+# Force fresh restart
+$SCRIPT restart
+```
+
+After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
+
+**Always run `$SCRIPT stop` when done testing** — `pkill -f "Electron"` alone won't catch all helper processes.
+
+### Environment Variables
+
+| Variable          | Default                 | Description                              |
+| ----------------- | ----------------------- | ---------------------------------------- |
+| `CDP_PORT`        | `9222`                  | Chrome DevTools Protocol port            |
+| `ELECTRON_LOG`    | `/tmp/electron-dev.log` | Electron process log                     |
+| `ELECTRON_WAIT_S` | `60`                    | Max seconds to wait for Electron process |
+| `RENDERER_WAIT_S` | `60`                    | Max seconds to wait for SPA to load      |
+
+## LobeHub-Specific Patterns
+
+### Access Zustand Store State
+
+```bash
+agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
+(function() {
+  var chat = window.__LOBE_STORES.chat();
+  var ops = Object.values(chat.operations);
+  return JSON.stringify({
+    ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
+    activeAgent: chat.activeAgentId,
+    activeTopic: chat.activeTopicId,
+  });
+})()
+EVALEOF
+```
+
+### Find and Use the Chat Input
+
+```bash
+# The chat input is contenteditable, so pass -C; @e48 is illustrative, use the ref your snapshot prints
+agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
+
+agent-browser --cdp 9222 click @e48
+agent-browser --cdp 9222 type @e48 "Hello world"
+agent-browser --cdp 9222 press Enter
+```
+
+### Wait for Agent to Complete
+
+```bash
+agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
+(function() {
+  var chat = window.__LOBE_STORES.chat();
+  var ops = Object.values(chat.operations);
+  var running = ops.filter(function(o) { return o.status === 'running'; });
+  return running.length === 0 ? 'done' : 'running: ' + running.length;
+})()
+EVALEOF
+```
+
+### Install Error Interceptor
+
+```bash
+agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
+(function() {
+  window.__CAPTURED_ERRORS = [];
+  var orig = console.error;
+  console.error = function() {
+    var msg = Array.from(arguments).map(function(a) {
+      if (a instanceof Error) return a.message;
+      return typeof a === 'object' ? JSON.stringify(a) : String(a);
+    }).join(' ');
+    window.__CAPTURED_ERRORS.push(msg);
+    orig.apply(console, arguments);
+  };
+  return 'installed';
+})()
+EVALEOF
+
+# Later, check captured errors:
+agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
+```
+
+## Electron Gotchas
+
+- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
+- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
+- **Don't resize the Electron window after load** — resizing triggers full SPA reload
+- **Store is at `window.__LOBE_STORES`**, not `window.__ZUSTAND_STORES__`.
diff --git a/.agents/skills/agent-testing/ui/web.md b/.agents/skills/agent-testing/ui/web.md
new file mode 100644
index 0000000000..78678b4ad6
--- /dev/null
+++ b/.agents/skills/agent-testing/ui/web.md
@@ -0,0 +1,69 @@
+# Web (Full-Stack) Testing
+
+Default surface for **full-stack changes** — a new/changed API plus the UI that
+consumes it. The browser is the one surface where network requests and UI state
+are observable together, so you can assert both sides of the contract in a
+single run.
+
+For pure-frontend changes prefer [electron.md](./electron.md); for
+backend-only changes prefer [../cli/index.md](../cli/index.md).
+
+## Prerequisites
+
+- Local dev server running — [../references/dev-server.md](../references/dev-server.md)
+- Web auth injected into agent-browser — [../references/auth.md](../references/auth.md):
+
+```bash
+pbpaste | ./.agents/skills/agent-testing/scripts/setup-auth.sh web # after copying the Cookie header
+```
+
+## Option A — agent-browser with injected auth (recommended)
+
+```bash
+SESSION=lobehub-dev
+
+agent-browser --session $SESSION open "http://localhost:3010/"
+agent-browser --session $SESSION snapshot -i
+# interact via refs — full command reference: ../references/agent-browser.md
+```
+
+### Watch the API while driving the UI
+
+```bash
+# After triggering the UI action under test:
+agent-browser --session $SESSION network requests --type xhr,fetch
+agent-browser --session $SESSION network requests --method POST
+
+# Record a full HAR for the report
+agent-browser --session $SESSION network har start
+# ... drive the scenario ...
+agent-browser --session $SESSION network har stop ./capture.har
+```
+
+Assert both layers: the request/response shape (network) and the rendered
+result (snapshot/screenshot). Both belong in the report as evidence.
+
+## Option B — real Chrome with remote debugging
+
+For flows that need a real, visible browser (e.g. exercising the login UI
+itself):
+
+```bash
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
+  --remote-debugging-port=9222 \
+  --user-data-dir=/tmp/chrome-test-profile \
+  "<URL>" &
+sleep 5
+agent-browser --cdp 9222 snapshot -i
+
+# Or auto-discover running Chrome with remote debugging
+agent-browser --auto-connect snapshot -i
+```
+
+## Option C — Debug Proxy (local frontend, production backend)
+
+`bun run dev:spa` prints a **Debug Proxy** URL
+(`https://app.lobehub.com/_dangerous_local_dev_proxy?debug-host=…`) that loads
+your local Vite SPA inside the online environment — HMR against real server
+config. Useful for verifying frontend behavior against production data, **not**
+for testing backend changes (the backend is production, not your branch).
diff --git a/.agents/skills/cli-backend-testing/SKILL.md b/.agents/skills/cli-backend-testing/SKILL.md
deleted file mode 100644
index 8f9ff62820..0000000000
--- a/.agents/skills/cli-backend-testing/SKILL.md
+++ /dev/null
@@ -1,172 +0,0 @@
----
-name: cli-backend-testing
-description: >
-  CLI + Backend integration testing workflow. Use when verifying backend API changes
-  (TRPC routers, services, models) via the LobeHub CLI against a local dev server.
-  Triggers on 'cli test', 'test with cli', 'verify with cli', 'local cli test',
-  'backend test with cli', or when needing to validate server-side changes end-to-end.
----
-
-# CLI + Backend Integration Testing
-
-Standard workflow for verifying backend changes using the LobeHub CLI (`lh`) against a local dev server.
-
-## When to Use
-
-- Verifying TRPC router / service / model changes end-to-end
-- Testing new API fields or response structure changes
-- Validating CLI command output after backend modifications
-- Debugging data flow issues between server and CLI
-
-## Prerequisites
-
-| Requirement  | Details                                                       |
-| ------------ | ------------------------------------------------------------- |
-| Dev server   | `localhost:3011` (Next.js)                                    |
-| CLI source   | `lobehub/apps/cli/`                                           |
-| CLI dev mode | Uses `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated credentials |
-| Auth         | Device Code Flow login to local server                        |
-
-## Quick Reference
-
-All CLI dev commands run from `lobehub/apps/cli/`. Subsequent examples use `$CLI`:
-
-```bash
-CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
-```
-
-## Workflow
-
-### Step 1: Ensure Dev Server is Running
-
-```bash
-curl -s -o /dev/null -w '%{http_code}' http://localhost:3011/ 2> /dev/null
-```
-
-- **If reachable**: skip to Step 2.
-- **If unreachable**: start from cloud repo root:
-
-```bash
-pnpm run dev:next
-```
-
-To **restart** (pick up server-side code changes):
-
-```bash
-lsof -ti:3011 | xargs kill
-pnpm run dev:next
-```
-
-**Important:** Server-side code changes in the submodule (`lobehub/apps/server/src/`, `lobehub/src/server/`, `lobehub/packages/`) require a server restart. Next.js hot-reload may not pick up changes in submodule packages.
-
-### Step 2: Check CLI Authentication
-
-```bash
-cat lobehub/apps/cli/.lobehub-dev/settings.json 2> /dev/null
-```
-
-- **If file exists and contains `"serverUrl": "http://localhost:3011"`**: skip to Step 3.
-- **If missing or wrong server**: ask the user to run:
-
-```bash
-! cd lobehub/apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3011
-```
-
-> Login requires interactive browser authorization (OIDC Device Code Flow), so the user must run it themselves via `!` prefix. Credentials persist in `lobehub/apps/cli/.lobehub-dev/`.
-
-### Step 3: Test with CLI Commands
-
-CLI runs from source, so CLI-side code changes take effect immediately without rebuilding.
-
-```bash
-cd lobehub/apps/cli
-$CLI <command>
-```
-
-### Step 4: Clean Up Test Data
-
-```bash
-$CLI task delete < id > -y
-$CLI agent delete < id > -y
-```
-
-## Common Testing Patterns
-
-### Task System
-
-```bash
-$CLI task list
-$CLI task create -n "Root Task" -i "Test instruction"
-$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
-$CLI task view T-1
-$CLI task tree T-1
-$CLI task edit T-1 --status running
-$CLI task comment T-1 -m "Test comment"
-$CLI task delete T-1 -y
-```
-
-### Agent System
-
-```bash
-$CLI agent list
-$CLI agent view <agent-id>
-$CLI agent run <agent-id> -m "Test prompt"
-```
-
-### Document & Knowledge Base
-
-```bash
-$CLI doc list
-$CLI doc create -t "Test Doc" -c "Content here"
-$CLI doc view <doc-id>
-$CLI kb list
-$CLI kb tree <kb-id>
-```
-
-### Model & Provider
-
-```bash
-$CLI model list
-$CLI provider list
-$CLI provider test <provider-id>
-```
-
-## Dev-Test Cycle
-
-```
-1. Make code changes (service/model/router/type)
-         |
-2. Run unit tests (fast feedback)
-   bunx vitest run --silent='passed-only' '<test-file>'
-         |
-3. Restart dev server (if server-side changes)
-   lsof -ti:3011 | xargs kill && pnpm run dev:next
-         |
-4. CLI verification (end-to-end)
-   $CLI <command>
-         |
-5. Clean up test data
-```
-
-### When Server Restart is Needed
-
-| Change Location                                         | Restart? |
-| ------------------------------------------------------- | -------- |
-| `lobehub/apps/server/src/` (routers, services, modules) | Yes      |
-| `lobehub/src/server/` (agent-hono, workflows-hono)      | Yes      |
-| `lobehub/packages/database/` (models)                   | Yes      |
-| `lobehub/packages/types/`                               | Yes      |
-| `lobehub/packages/prompts/`                             | Yes      |
-| `lobehub/apps/cli/` (CLI code)                          | No       |
-| `src/` (cloud overrides)                                | Yes      |
-
-## Troubleshooting
-
-| Issue                       | Solution                                                              |
-| --------------------------- | --------------------------------------------------------------------- |
-| `No authentication found`   | Run `login --server http://localhost:3011`                            |
-| `UNAUTHORIZED` on API calls | Token expired; re-run login                                           |
-| `ECONNREFUSED`              | Dev server not running; start with `pnpm run dev:next`                |
-| CLI shows old data/behavior | Server needs restart to pick up code changes                          |
-| `EADDRINUSE` on port 3011   | Server already running; kill with `lsof -ti:3011 \| xargs kill`       |
-| Login opens wrong server    | Must use `--server http://localhost:3011` flag (env var doesn't work) |
diff --git a/.agents/skills/heterogeneous-agent/references/debug-workflow.md b/.agents/skills/heterogeneous-agent/references/debug-workflow.md
index 85e3ea5136..90fe59caab 100644
--- a/.agents/skills/heterogeneous-agent/references/debug-workflow.md
+++ b/.agents/skills/heterogeneous-agent/references/debug-workflow.md
@@ -241,6 +241,6 @@ When the bug comes from a real trace, distill it into the closest existing test
 3. Add or update the narrowest failing test near the broken layer.
 4. Fix the smallest layer that can explain the symptom.
 5. Re-run focused tests.
-6. Only then do an Electron smoke test with the `local-testing` skill if UI confirmation is still needed.
+6. Only then do an Electron smoke test with the `agent-testing` skill if UI confirmation is still needed.
 
 Do not start with a broad Electron repro if a raw trace or adapter test can prove the fault zone faster.
diff --git a/.agents/skills/local-testing/SKILL.md b/.agents/skills/local-testing/SKILL.md
deleted file mode 100644
index 3524aa4680..0000000000
--- a/.agents/skills/local-testing/SKILL.md
+++ /dev/null
@@ -1,561 +0,0 @@
----
-name: local-testing
-description: >
-  Local app and bot testing. Uses agent-browser CLI for Electron/web app UI testing,
-  and osascript (AppleScript) for controlling native macOS apps (WeChat, Discord, Telegram, Slack, Lark/飞书, QQ)
-  to test bots. Triggers on 'local test', 'test in electron', 'test desktop', 'test bot',
-  'bot test', 'test in discord', 'test in telegram', 'test in slack', 'test in weixin',
-  'test in wechat', 'test in lark', 'test in feishu', 'test in qq',
-  'manual test', 'osascript', or UI/bot verification tasks.
----
-
-# Local App & Bot Testing
-
-Two approaches for local testing on macOS:
-
-| Approach                    | Tool                | Best For                                             |
-| --------------------------- | ------------------- | ---------------------------------------------------- |
-| **agent-browser + CDP**     | `agent-browser` CLI | Electron apps, web apps (DOM access, JS eval)        |
-| **osascript (AppleScript)** | `osascript -e`      | Native macOS apps (WeChat, Discord, Telegram, Slack) |
-
----
-
-# Part 1: agent-browser (Electron / Web Apps)
-
-Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
-
-Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
-
-## Core Workflow
-
-Every browser automation follows this pattern:
-
-1. **Navigate**: `agent-browser open <url>`
-2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
-3. **Interact**: Use refs to click, fill, select
-4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
-
-```bash
-agent-browser open https://example.com/form
-agent-browser snapshot -i
-# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
-
-agent-browser fill @e1 "user@example.com"
-agent-browser fill @e2 "password123"
-agent-browser click @e3
-agent-browser wait --load networkidle
-agent-browser snapshot -i # Check result
-```
-
-## Command Chaining
-
-```bash
-# Chain open + wait + snapshot in one call
-agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
-```
-
-Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
-
-## Essential Commands
-
-```bash
-# Navigation
-agent-browser open <url>              # Navigate (aliases: goto, navigate)
-agent-browser close                   # Close browser
-agent-browser close --all             # Close all active sessions
-
-# Snapshot
-agent-browser snapshot -i             # Interactive elements with refs (recommended)
-agent-browser snapshot -s "#selector" # Scope to CSS selector
-
-# Interaction (use @refs from snapshot)
-agent-browser click @e1               # Click element
-agent-browser click @e1 --new-tab     # Click and open in new tab
-agent-browser fill @e2 "text"         # Clear and type text
-agent-browser type @e2 "text"         # Type without clearing
-agent-browser select @e1 "option"     # Select dropdown option
-agent-browser check @e1               # Check checkbox
-agent-browser press Enter             # Press key
-agent-browser keyboard type "text"    # Type at current focus (no selector)
-agent-browser keyboard inserttext "text"  # Insert without key events
-agent-browser scroll down 500         # Scroll page
-agent-browser scroll down 500 --selector "div.content"  # Scroll within container
-
-# Get information
-agent-browser get text @e1            # Get element text
-agent-browser get url                 # Get current URL
-agent-browser get title               # Get page title
-agent-browser get cdp-url             # Get CDP WebSocket URL
-
-# Wait
-agent-browser wait @e1                # Wait for element
-agent-browser wait --load networkidle # Wait for network idle
-agent-browser wait --url "**/page"    # Wait for URL pattern
-agent-browser wait 2000               # Wait milliseconds
-agent-browser wait --text "Welcome"   # Wait for text to appear
-agent-browser wait --fn "!document.body.innerText.includes('Loading...')"  # Wait for text to disappear
-agent-browser wait "#spinner" --state hidden  # Wait for element to disappear
-
-# Downloads
-agent-browser download @e1 ./file.pdf          # Click element to trigger download
-agent-browser wait --download ./output.zip     # Wait for any download to complete
-
-# Network
-agent-browser network requests                 # Inspect tracked requests
-agent-browser network requests --type xhr,fetch  # Filter by resource type
-agent-browser network requests --method POST   # Filter by HTTP method
-agent-browser network route "**/api/*" --abort # Block matching requests
-agent-browser network har start                # Start HAR recording
-agent-browser network har stop ./capture.har   # Stop and save HAR file
-
-# Viewport & Device Emulation
-agent-browser set viewport 1920 1080          # Set viewport size (default: 1280x720)
-agent-browser set viewport 1920 1080 2        # 2x retina
-agent-browser set device "iPhone 14"          # Emulate device (viewport + user agent)
-
-# Capture
-agent-browser screenshot              # Screenshot to temp dir
-agent-browser screenshot --full       # Full page screenshot
-agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
-agent-browser pdf output.pdf          # Save as PDF
-
-# Clipboard
-agent-browser clipboard read          # Read text from clipboard
-agent-browser clipboard write "text"  # Write text to clipboard
-agent-browser clipboard copy          # Copy current selection
-agent-browser clipboard paste         # Paste from clipboard
-
-# Dialogs (alert, confirm, prompt, beforeunload)
-agent-browser dialog accept           # Accept dialog
-agent-browser dialog accept "input"   # Accept prompt dialog with text
-agent-browser dialog dismiss          # Dismiss/cancel dialog
-agent-browser dialog status           # Check if dialog is open
-
-# Diff (compare page states)
-agent-browser diff snapshot                        # Compare current vs last snapshot
-agent-browser diff screenshot --baseline before.png  # Visual pixel diff
-agent-browser diff url <url1> <url2>               # Compare two pages
-
-# Streaming
-agent-browser stream enable           # Start WebSocket streaming
-agent-browser stream status           # Inspect streaming state
-agent-browser stream disable          # Stop streaming
-```
-
-## Batch Execution
-
-```bash
-echo '[
-  ["open", "https://example.com"],
-  ["snapshot", "-i"],
-  ["click", "@e1"],
-  ["screenshot", "result.png"]
-]' | agent-browser batch --json
-```
-
-## Authentication
-
-```bash
-# Option 1: Auth vault (credentials stored encrypted)
-echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
-agent-browser auth login myapp
-
-# Option 2: Session name (auto-save/restore cookies + localStorage)
-agent-browser --session-name myapp open https://app.example.com/login
-agent-browser close                                                       # State auto-saved
-agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
-
-# Option 3: Persistent profile
-agent-browser --profile ~/.myapp open https://app.example.com/login
-
-# Option 4: State file
-agent-browser state save auth.json
-agent-browser state load auth.json
-```
-
-### LobeHub dev server — inject better-auth cookie
-
-`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3011`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [references/agent-browser-login.md](./references/agent-browser-login.md) for the full recipe.
-
-## Semantic Locators (Alternative to Refs)
-
-```bash
-agent-browser find text "Sign In" click
-agent-browser find label "Email" fill "user@test.com"
-agent-browser find role button click --name "Submit"
-agent-browser find placeholder "Search" type "query"
-agent-browser find testid "submit-btn" click
-```
-
-## JavaScript Evaluation (eval)
-
-```bash
-# Simple expressions
-agent-browser eval 'document.title'
-
-# Complex JS: use --stdin with heredoc (RECOMMENDED)
-agent-browser eval --stdin << 'EVALEOF'
-JSON.stringify(
-  Array.from(document.querySelectorAll("img"))
-    .filter(i => !i.alt)
-    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
-)
-EVALEOF
-
-# Base64 encoding (avoids all shell escaping issues)
-agent-browser eval -b "$(echo -n 'document.title' | base64)"
-```
-
-## Ref Lifecycle
-
-Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
-
-## Annotated Screenshots (Vision Mode)
-
-```bash
-agent-browser screenshot --annotate
-# Output includes the image path and a legend:
-#   [1] @e1 button "Submit"
-#   [2] @e2 link "Home"
-agent-browser click @e2 # Click using ref from annotated screenshot
-```
-
-## Parallel Sessions
-
-```bash
-agent-browser --session site1 open https://site-a.com
-agent-browser --session site2 open https://site-b.com
-agent-browser session list
-```
-
-## Connect to Existing Chrome
-
-```bash
-agent-browser --auto-connect snapshot # Auto-discover running Chrome
-agent-browser --cdp 9222 snapshot     # Explicit CDP port
-```
-
-## iOS Simulator (Mobile Safari)
-
-```bash
-agent-browser device list
-agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
-agent-browser -p ios snapshot -i
-agent-browser -p ios tap @e1
-agent-browser -p ios swipe up
-agent-browser -p ios screenshot mobile.png
-agent-browser -p ios close
-```
-
-## Observability Dashboard
-
-```bash
-agent-browser dashboard install
-agent-browser dashboard start # Background server on port 4848
-agent-browser dashboard stop
-```
-
-## Cloud Providers
-
-Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
-
-## Browser Engine Selection
-
-```bash
-agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
-```
-
-## Electron (LobeHub Desktop)
-
-### Setup / Teardown
-
-Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
-
-```bash
-SCRIPT=".agents/skills/local-testing/scripts/electron-dev.sh"
-
-# Start Electron dev with CDP (idempotent — skips if already running)
-$SCRIPT start
-
-# Check if Electron is running and CDP is reachable
-$SCRIPT status
-
-# Kill all Electron-related processes (main + helper + vite)
-$SCRIPT stop
-
-# Force fresh restart
-$SCRIPT restart
-```
-
-After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
-
-**Always run `$SCRIPT stop` when done testing** — `pkill -f "Electron"` alone won't catch all helper processes.
-
-#### Environment Variables
-
-| Variable          | Default                 | Description                              |
-| ----------------- | ----------------------- | ---------------------------------------- |
-| `CDP_PORT`        | `9222`                  | Chrome DevTools Protocol port            |
-| `ELECTRON_LOG`    | `/tmp/electron-dev.log` | Electron process log                     |
-| `ELECTRON_WAIT_S` | `60`                    | Max seconds to wait for Electron process |
-| `RENDERER_WAIT_S` | `60`                    | Max seconds to wait for SPA to load      |
-
-### LobeHub-Specific Patterns
-
-#### Access Zustand Store State
-
-```bash
-agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
-(function() {
-  var chat = window.__LOBE_STORES.chat();
-  var ops = Object.values(chat.operations);
-  return JSON.stringify({
-    ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
-    activeAgent: chat.activeAgentId,
-    activeTopic: chat.activeTopicId,
-  });
-})()
-EVALEOF
-```
-
-#### Find and Use the Chat Input
-
-```bash
-# The chat input is contenteditable, so it only appears with -C; use its ref in place of @e48 below
-agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
-
-agent-browser --cdp 9222 click @e48
-agent-browser --cdp 9222 type @e48 "Hello world"
-agent-browser --cdp 9222 press Enter
-```
-
-#### Wait for Agent to Complete
-
-```bash
-agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
-(function() {
-  var chat = window.__LOBE_STORES.chat();
-  var ops = Object.values(chat.operations);
-  var running = ops.filter(function(o) { return o.status === 'running'; });
-  return running.length === 0 ? 'done' : 'running: ' + running.length;
-})()
-EVALEOF
-```
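-
-To avoid re-running the check above by hand, you can poll it. The following is a minimal sketch. `wait_until_done` is a hypothetical shell helper, not part of agent-browser. It assumes the probe prints `done` when nothing is running, and that the value may be wrapped in JSON quotes.
-
-```bash
-# Hypothetical helper (not part of agent-browser): re-run a probe command until it
-# prints "done", or give up after <timeout_s> seconds.
-# usage: wait_until_done <timeout_s> <interval_s> <command...>
-wait_until_done() {
-  local timeout_s=$1 interval_s=$2
-  shift 2
-  local waited=0 out=""
-  while [ "$waited" -lt "$timeout_s" ]; do
-    # Drop double quotes in case the result is printed as a JSON string ("done").
-    out=$("$@" 2>/dev/null | tr -d '"')
-    if [ "$out" = "done" ]; then
-      return 0
-    fi
-    sleep "$interval_s"
-    waited=$((waited + interval_s))
-  done
-  echo "gave up after ${timeout_s}s, last result: $out" >&2
-  return 1
-}
-
-# Example, assuming the probe above is saved to /tmp/wait-ops.js (hypothetical path):
-#   probe() { agent-browser --cdp 9222 eval --stdin < /tmp/wait-ops.js; }
-#   wait_until_done 120 2 probe
-```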
-
-#### Install Error Interceptor
-
-```bash
-agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
-(function() {
-  window.__CAPTURED_ERRORS = [];
-  var orig = console.error;
-  console.error = function() {
-    var msg = Array.from(arguments).map(function(a) {
-      if (a instanceof Error) return a.message;
-      return typeof a === 'object' ? JSON.stringify(a) : String(a);
-    }).join(' ');
-    window.__CAPTURED_ERRORS.push(msg);
-    orig.apply(console, arguments);
-  };
-  return 'installed';
-})()
-EVALEOF
-
-# Later, check captured errors:
-agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
-```
-
-## Chrome / Web Apps
-
-```bash
-/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
-  --remote-debugging-port=9222 \
-  --user-data-dir=/tmp/chrome-test-profile \
-  "<URL>" &
-sleep 5
-agent-browser --cdp 9222 snapshot -i
-
-# Or auto-discover running Chrome with remote debugging
-agent-browser --auto-connect snapshot -i
-```
-
----
-
-# Part 2: osascript (Native macOS App Bot Testing)
-
-Use AppleScript via `osascript` to control native macOS desktop apps for bot testing. This works with any app that supports macOS Accessibility; no CDP or Chromium is needed.
-
-The pattern is the same for every platform:
-
-1. **Activate** the app (`tell application "X" to activate`)
-2. **Navigate** to a channel/chat (Quick Switcher `Cmd+K` or Search `Cmd+F`)
-3. **Send** a message (clipboard paste `Cmd+V` + Enter)
-4. **Wait** for the bot response
-5. **Screenshot** for verification (`screencapture` + `Read` tool)
-
-## Per-Platform References
-
-Each channel has its own folder under `bot/<channel>/` containing an `index.md`
-(activation, navigation, send-message, and verification snippets specific to
-that app) and its test script. Pick the folder for your target platform:
-
-| Platform      | Reference                                        | Quick switcher |
-| ------------- | ------------------------------------------------ | -------------- |
-| Discord       | [bot/discord/index.md](./bot/discord/index.md)   | `Cmd+K`        |
-| Slack         | [bot/slack/index.md](./bot/slack/index.md)       | `Cmd+K`        |
-| Telegram      | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F`        |
-| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md)     | `Cmd+F`        |
-| Lark / 飞书   | [bot/lark/index.md](./bot/lark/index.md)         | `Cmd+K`        |
-| QQ            | [bot/qq/index.md](./bot/qq/index.md)             | `Cmd+F`        |
-
-For **shared osascript patterns** (activate, type, paste, screenshot, read accessibility, common workflow template, gotchas), see [bot/osascript-common.md](./bot/osascript-common.md). Read this first if you're new to osascript automation.
-
-## Bridge-based channels (no native app)
-
-Some channels have no native app to drive with osascript — they connect through
-a local bridge inside the Desktop app. These are tested with agent-browser
-(IPC + UI) plus the bridge's own HTTP/REST endpoints, not osascript:
-
-| Channel  | Reference                                        | What it drives                                           |
-| -------- | ------------------------------------------------ | -------------------------------------------------------- |
-| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | `imessageBridge.*` IPC + local bridge + BlueBubbles REST |
-
-For iMessage there is a one-shot regression script — see `test-imessage-bridge.sh` below.
-
----
-
-# Scripts
-
-**App / recording scripts** in `.agents/skills/local-testing/scripts/`:
-
-| Script                    | Usage                                               |
-| ------------------------- | --------------------------------------------------- |
-| `electron-dev.sh`         | Manage Electron dev env (start/stop/status/restart) |
-| `record-electron-demo.sh` | Record Electron app demo with ffmpeg                |
-| `record-app-screen.sh`    | Record app screen (video + screenshots, start/stop) |
-
-**Bot scripts** live under `.agents/skills/local-testing/bot/`, one folder per
-channel (alongside that channel's `index.md`). The shared
-`capture-app-window.sh` sits at the `bot/` root:
-
-| Script                             | Usage                                                               |
-| ---------------------------------- | ------------------------------------------------------------------- |
-| `capture-app-window.sh`            | Capture screenshot of a specific app window (used by bot tests)     |
-| `discord/test-discord-bot.sh`      | Send message to Discord bot via osascript                           |
-| `slack/test-slack-bot.sh`          | Send message to Slack bot via osascript                             |
-| `telegram/test-telegram-bot.sh`    | Send message to Telegram bot via osascript                          |
-| `wechat/test-wechat-bot.sh`        | Send message to WeChat bot via osascript                            |
-| `lark/test-lark-bot.sh`            | Send message to Lark / 飞书 bot via osascript                       |
-| `qq/test-qq-bot.sh`                | Send message to QQ bot via osascript                                |
-| `imessage/test-imessage-bridge.sh` | Regression-test the iMessage BlueBubbles bridge (IPC + HTTP)        |
-| `imessage/send-imessage-test.sh`   | Send one real iMessage (desktop → BB → iMessage) and verify it sent |
-
-### Window Screenshot Utility
-
-`capture-app-window.sh` captures a screenshot of a specific app window using `screencapture -l <windowID>`. It uses Swift + CGWindowList to find the window by process name, so screenshots work correctly even when the window is on an external monitor or behind other windows.
-
-```bash
-# Standalone usage
-./.agents/skills/local-testing/bot/capture-app-window.sh "Discord" /tmp/discord.png
-./.agents/skills/local-testing/bot/capture-app-window.sh "Slack" /tmp/slack.png
-./.agents/skills/local-testing/bot/capture-app-window.sh "WeChat" /tmp/wechat.png
-```
-
-All bot test scripts use this utility automatically for their screenshots.
-
-### Bot Test Scripts
-
-All bot test scripts share the same interface:
-
-```bash
-./.agents/skills/local-testing/bot/<platform>/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
-```
-
-Examples:
-
-```bash
-# Discord — test a bot in #bot-testing channel
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
-./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
-
-# Slack — test a bot in #bot-testing channel
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
-./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
-
-# Telegram — test a bot by username
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
-./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
-
-# WeChat — test a bot or send to a contact
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
-./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
-
-# Lark/飞书 — test a bot in a group chat
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
-./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
-
-# QQ — test a bot in a group or direct chat
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
-./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
-```
-
-Each script: activates the app, navigates to the channel/contact, pastes the message via clipboard, sends, waits, and takes a screenshot. Use the `Read` tool on the screenshot for visual verification.
-
-### iMessage bridge regression script
-
-`test-imessage-bridge.sh` does **not** follow the osascript bot interface — it
-drives the Desktop bridge's IPC + HTTP layers and asserts the result, then
-self-cleans. Needs BlueBubbles running and Electron up with CDP.
-
-```bash
-./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
-# defaults: bb_url=http://127.0.0.1:1234  cdp_port=9222 — exit 0 = all green
-```
-
-It guards the connect/configure flow (testConfig happy + reject paths, first-time
-`upsertConfig` save, bridge running + webhook registered, local-server secret
-enforcement). See [bot/imessage/index.md](./bot/imessage/index.md)
-for the full manual UI flow and known bugs.
-
----
-
-# Screen Recording
-
-Record automated demos using `record-app-screen.sh` (start/stop lifecycle, CDP screenshots + ffmpeg assembly). See [references/record-app-screen.md](references/record-app-screen.md) for full documentation.
-
-```bash
-./.agents/skills/local-testing/scripts/electron-dev.sh start
-./.agents/skills/local-testing/scripts/record-app-screen.sh start my-demo
-# ... run automation ...
-./.agents/skills/local-testing/scripts/record-app-screen.sh stop
-```
-
-Outputs to `.records/` directory (gitignored): `<name>.mp4` (video) + `<name>/` (screenshots every 3s).
-
----
-
-# Gotchas
-
-### agent-browser
-
-- **Daemon can get stuck** — if commands hang, run `agent-browser close --all` or `pkill -f agent-browser` to reset it
-- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
-- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
-- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
-- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
-- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
-- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
-- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
-
-### Electron-specific
-
-- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
-- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
-- **Don't resize the Electron window after load** — resizing triggers a full SPA reload
-- **Store is at `window.__LOBE_STORES`**, not `window.__ZUSTAND_STORES__`
-
-### osascript
-
-See [bot/osascript-common.md](./bot/osascript-common.md#gotchas) for the full osascript gotchas list (accessibility permissions, `keystroke` non-ASCII issues, locale-specific app names, rate limiting, etc.).
diff --git a/.agents/skills/local-testing/references/agent-browser-login.md b/.agents/skills/local-testing/references/agent-browser-login.md
deleted file mode 100644
index cdd638fbf4..0000000000
--- a/.agents/skills/local-testing/references/agent-browser-login.md
+++ /dev/null
@@ -1,110 +0,0 @@
-# Log `agent-browser` into a local LobeHub dev server
-
-`agent-browser --headed` on macOS often creates the Chromium window off-screen — the user can't see or interact with it, so manual login inside the agent-browser session fails. Instead of sharing the user's real Chrome profile, copy the **better-auth session cookie** out of a request in DevTools and inject it into the agent-browser session as a Playwright-style state file.
-
-## When to use
-
-- You need `agent-browser` to reach an authenticated page on `http://localhost:<port>` (e.g. `localhost:3011`).
-- The user already has a logged-in tab of the same dev server in their own Chrome.
-- Spawning a headed Chromium to let the user log in manually is unreliable (window off-screen, no interaction).
-
-Do **not** use this on production URLs — only local dev. Treat the cookie as a secret: don't paste it into shared logs, PRs, or commit it anywhere.
-
-## Step 1 — Ask the user to copy the cookie from a Network request, NOT `document.cookie`
-
-`document.cookie` will not return HttpOnly cookies, which is exactly where better-auth puts its session. Instruct the user:
-
-1. Open the logged-in tab (`http://localhost:<port>/…`) in their own Chrome.
-2. `Cmd+Option+I` → **Network** tab.
-3. Refresh, click any same-origin request (e.g. the top-level document request).
-4. In the right pane under **Request Headers**, right-click the `Cookie:` line → **Copy value** (or copy the entire header).
-5. Paste the string into chat.
-
-You only need the better-auth pieces. Everything else (Clerk, `LOBE_LOCALE`, HMR hash, theme vars) is noise. It can stay in the pasted string because the Step 2 script keeps only the better-auth cookies. The minimum viable set is:
-
-```
-better-auth.session_token=<value>; better-auth.state=<value>
-```
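-
-You can trim a pasted header down to that set in the terminal, so the rest of the user's cookies are not kept around. The following is a sketch. `better_auth_only` is a hypothetical helper, not part of agent-browser or LobeHub. It assumes the standard `name=value; name=value` Cookie header format, with an optional `Cookie:` prefix.
-
-```bash
-# Hypothetical helper: reduce a raw Cookie header (with or without the
-# "Cookie:" prefix) to just its better-auth.* pairs.
-better_auth_only() {
-  printf '%s\n' "$1" \
-    | sed 's/^[Cc]ookie:[[:space:]]*//' \
-    | tr ';' '\n' \
-    | sed 's/^[[:space:]]*//' \
-    | grep '^better-auth\.' \
-    | paste -s -d ';' - \
-    | sed 's/;/; /g'
-}
-
-# Example (macOS clipboard): COOKIE_HEADER="$(better_auth_only "$(pbpaste)")"
-```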
-
-## Step 2 — Build a Playwright-style state file
-
-`agent-browser state load` expects Playwright's `storageState` format: a JSON with a `cookies` array and an `origins` array.
-
-```bash
-cat > /tmp/mkstate.py << 'PY'
-import json, sys, time
-
-# Read the Cookie header from stdin (allows optional "Cookie: " prefix).
-raw = sys.stdin.read().strip()
-if raw.lower().startswith("cookie:"):
-    raw = raw.split(":", 1)[1].strip()
-
-# Keep only better-auth cookies. Extend this set if the app genuinely needs more.
-WANTED = {"better-auth.session_token", "better-auth.state"}
-
-cookies = []
-exp = int(time.time()) + 30 * 24 * 3600  # 30 days
-for pair in raw.split("; "):
-    if "=" not in pair:
-        continue
-    name, _, value = pair.partition("=")
-    if name not in WANTED:
-        continue
-    cookies.append({
-        "name": name,
-        "value": value,
-        "domain": "localhost",
-        "path": "/",
-        "expires": exp,
-        "httpOnly": False,
-        "secure": False,
-        "sameSite": "Lax",
-    })
-
-if not cookies:
-    sys.stderr.write("no better-auth cookies found in input\n")
-    sys.exit(1)
-
-print(json.dumps({"cookies": cookies, "origins": []}, indent=2))
-PY
-
-# Feed the copied Cookie header in via env var or heredoc.
-printf '%s' "$COOKIE_HEADER" | python3 /tmp/mkstate.py > /tmp/state.json
-```
-
-**Note on `httpOnly`**: the real cookie in the user's browser is HttpOnly. That flag only hides a cookie from page JavaScript; it does not change whether the browser sends the cookie with requests. Storing it with `httpOnly: false` is therefore fine for local dev, and it sidesteps a CDP-context quirk where HttpOnly cookies sometimes fail to attach.
-
-## Step 3 — Load state and navigate
-
-```bash
-SESSION="my-test" # any stable session name
-
-agent-browser --session "$SESSION" state load /tmp/state.json
-agent-browser --session "$SESSION" open "http://localhost:3011/"
-agent-browser --session "$SESSION" get url
-# Expect NOT /signin?callbackUrl=… — if you still see signin, cookie didn't apply.
-```
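-
-The sign-in check above can be turned into a pass/fail step for scripted runs. The following is a minimal sketch. `assert_signed_in` is a hypothetical helper. It assumes `agent-browser get url` prints the bare URL on stdout, and that the better-auth redirect contains `/signin`.
-
-```bash
-# Hypothetical helper: succeed only if a URL (e.g. the output of
-# `agent-browser get url`) is not the better-auth sign-in page.
-assert_signed_in() {
-  case "$1" in
-    */signin*)
-      echo "not signed in, still at: $1" >&2
-      return 1
-      ;;
-    *)
-      return 0
-      ;;
-  esac
-}
-
-# Example:
-#   assert_signed_in "$(agent-browser --session "$SESSION" get url)"
-```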
-
-## Step 4 — Verify
-
-```bash
-agent-browser --session "$SESSION" snapshot -i | head -20
-# Look for the user's avatar/name in the sidebar, or absence of the signin form.
-```
-
-## Common failure modes
-
-| Symptom                                         | Cause                                                                   | Fix                                                  |
-| ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------- |
-| Still redirects to `/signin` after `state load` | User pasted from `document.cookie` → missed HttpOnly session            | Re-pull from Network request Headers, not console    |
-| `state load` reports 0 cookies                  | Separator wrong, or user pasted URL-decoded value                       | Keep the raw `Cookie:` header as-is; split on `"; "` |
-| Login works briefly then expires                | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-load                                  |
-| Domain mismatch                                 | Cookie `domain` does not match the host being opened                    | Use `domain: "localhost"` literally, no leading dot  |
-
-## Scope
-
-Only covers authenticating an **agent-browser** session into a **local** LobeHub dev server. It does not:
-
-- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com` and must be delivered over HTTPS.
-- Replace real OAuth flows — tests that must exercise the login UI need a real Chromium with `--remote-debugging-port` or a bot account.
-- Flow cookies back to the user's Chrome — injection is one-way (into agent-browser only).
diff --git a/.agents/skills/skills-audit/SKILL.md b/.agents/skills/skills-audit/SKILL.md
index 669e5ba2c8..0b19ed38f3 100644
--- a/.agents/skills/skills-audit/SKILL.md
+++ b/.agents/skills/skills-audit/SKILL.md
@@ -50,7 +50,7 @@ Common false positives (do NOT merge):
 - `db-migrations` vs `drizzle` — distinct workflows (migration files vs schema authoring).
 - `microcopy` vs `i18n` — content vs mechanics.
 - `agent-runtime-hooks` vs `agent-tracing` vs `agent-signal` — different surfaces of the agent system.
-- `testing` vs `local-testing` vs `cli-backend-testing` — different test types.
+- `testing` vs `agent-testing` — different test types.
 
 ### 4 — Description format consistency