Compare commits

..

1 Commits

Author SHA1 Message Date
ONLY-yours 8c5d2d7e9a chore: lock the missing dependices 2026-06-10 11:44:07 +08:00
3036 changed files with 36745 additions and 158833 deletions
-380
View File
@@ -1,380 +0,0 @@
---
name: agent-testing
description: >
Agentic end-to-end testing for LobeHub: backend verification via the CLI,
frontend verification via agent-browser (Electron), full-stack verification in
the browser, and bot-channel verification via osascript. Local-first today,
designed to extend to cloud automation. Triggers on 'cli test', 'test with cli',
'verify with cli', 'backend test with cli', 'local test', 'test in electron',
'test desktop', 'test bot', 'bot test', 'test in discord', 'test in telegram',
'test in slack', 'test in wechat', 'test in weixin', 'test in lark', 'test in feishu',
'test in qq', 'manual test', 'osascript', 'test report', or any local
end-to-end verification task.
---
# Agent Testing (Agentic End-to-End Verification)
One skill for all agentic end-to-end testing — local-first today, designed to
also run as full cloud automation. Every test session follows the same
four-step contract:
```
Step -1: Plan approval → Step 0: Env + Auth → Step 1: Pick surface → Step 2: Run → Step 3: Structured report
```
## Step -1 — Plan approval for non-trivial tests
Skip directly to Step 0 if: the test is a single re-run after a fix, the plan
was already agreed on, or the user gave exact commands.
Otherwise, propose a test plan (surface, cases, expected evidence, assumptions)
and use the runtime structured question tool (`request_user_input` /
ask-user-question equivalent) with two fixed choices:
1. `开始执行 (Recommended)` — 测试方案没问题,开始执行
2. `先讨论下` — 方案有问题,先讨论下
Wait for the user's choice before proceeding.
## Step 0 — Environment setup + auth check (mandatory)
Step 0 is about getting the environment ready: **dependencies are healthy**
and **auth is green**. A test run that dies halfway on a missing dependency or
a login wall wastes the whole session — clear both gates BEFORE writing a
single test step.
### 0.0 Resolve the current test environment
Before starting a dev server, checking auth, opening agent-browser, or writing
test steps, print and confirm the current local test environment:
```bash
./.agents/skills/agent-testing/scripts/test-env.sh
```
This command is the source of truth for local test ports. It reads the current
shell plus `.env` files using the same precedence as `scripts/runWithEnv.mts`,
then prints:
- `APP_URL`
- `PORT`
- `SERVER_URL`
- `AUTH_TRUSTED_ORIGINS`
- `SPA_PORT`
- `MOBILE_SPA_PORT`
- `DESKTOP_PORT`
For commands that need these values, export them from the same resolver:
```bash
eval "$(./.agents/skills/agent-testing/scripts/test-env.sh --exports)"
```
Do not rely on hard-coded port tables. If the printed values do not match the
running dev server, fix/export the env first, then continue.
### 0.1 Dependencies are installed — root AND standalone apps
The root pnpm workspace does **NOT** cover every app: `pnpm-workspace.yaml`
lists `packages/**`, `e2e`, `apps/server`, and only `apps/desktop/src/main`
**`apps/desktop` and `apps/cli` are standalone**, each keeping its own
`node_modules` with its own links into `packages/`. A root install does not
refresh them, so install in every app the test will touch:
```bash
pnpm install # root workspace
cd apps/desktop && pnpm install # Electron surface
cd apps/cli && pnpm install # CLI surface
```
Symptom of a stale standalone install: the build/launch fails to resolve a
recently added workspace package — `Rolldown failed to resolve import
"@lobechat/<pkg>"` (Electron) or `Cannot find module '@lobechat/<pkg>'` (CLI).
### 0.2 Run scripts from the repo root
All paths in this skill (`./.agents/skills/agent-testing/...`) are
repo-root-relative, and background commands inherit the current working
directory — a script launched while `cwd` is `apps/desktop` fails with
`No such file or directory`. Verify `pwd` is the repo root before launching
long-running scripts.
### 0.3 Init local dev env without `.env`
For Web smoke against local code, start a **normal local dev environment**.
First check the repo root for `.env`:
- If `.env` exists, use the existing local configuration and start the dev
server normally.
- If `.env` does not exist, use the agent-testing env bootstrap.
Do not start the standalone e2e server as the product under test.
Use `scripts/init-dev-env.sh`. It follows the e2e setup pattern — Postgres,
Redis, migrations, auth/key-vault/S3 test env, seed user — but it is owned by this
skill and starts the repo's dev server (`pnpm run dev:next` / `bun run dev`),
not `e2e/scripts/setup.ts --start`. The script hard-blocks when root `.env`
exists, so it cannot accidentally override a user's local config. When `.env`
exists, do not call any `init-dev-env.sh` subcommand.
Decision flow:
```bash
if [[ -f .env ]]; then
bun run dev
else
./.agents/skills/agent-testing/scripts/init-dev-env.sh setup-db
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
fi
```
Bootstrap flow when no `.env` exists:
```bash
# From repo root. Managed Postgres/Redis flow requires Docker Desktop.
./.agents/skills/agent-testing/scripts/init-dev-env.sh setup-db
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
```
If using an existing Postgres instead of the managed Docker DB, set
`DATABASE_URL` and `REDIS_URL`, then skip `setup-db`:
```bash
DATABASE_URL=postgresql://... REDIS_URL=redis://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh migrate
DATABASE_URL=postgresql://... REDIS_URL=redis://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
DATABASE_URL=postgresql://... REDIS_URL=redis://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
```
For backend-only checks, `dev-next` is available, but Web smoke needs the
full-stack `dev` command so Next can proxy the SPA HTML from Vite:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
```
Useful subcommands:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh env # print exports
./.agents/skills/agent-testing/scripts/init-dev-env.sh write # write .records/env/agent-testing-dev.env
./.agents/skills/agent-testing/scripts/init-dev-env.sh migrate # migrations only
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user # seed user + CLI API key
./.agents/skills/agent-testing/scripts/init-dev-env.sh qstash # local QStash for workflow paths
./.agents/skills/agent-testing/scripts/init-dev-env.sh clean-db # remove managed DB container
```
Default script env:
- `APP_URL=http://localhost:3010`
- `DATABASE_URL=postgresql://postgres:postgres@localhost:5433/postgres`
- `DATABASE_DRIVER=node`
- `AGENT_RUNTIME_MODE=queue` so backend-only agent runtime checks use the
same queued execution path as production
- `REDIS_URL=redis://localhost:6380` for queue-mode agent runtime state
- `FEATURE_FLAGS=-agent_self_iteration` so local smoke does not require QStash
- Local QStash defaults (`QSTASH_URL`, `QSTASH_TOKEN`, signing keys) are exported;
run `init-dev-env.sh qstash` in a separate terminal when the path under test
triggers QStash/Workflow.
- `KEY_VAULTS_SECRET`, `AUTH_SECRET`, auth verification off
- S3 mock vars
- Managed DB container: `lobehub-agent-testing-postgres`
- Managed Redis container: `lobehub-agent-testing-redis`
`seed-user` creates `agent-testing@lobehub.com` / `TestPassword123!` with
onboarding already completed, plus a local API key in
`.records/env/agent-testing-cli.env` for CLI automation. When running Cucumber
against this dev server, pass the same script env into the test process too;
Cucumber has its own `BeforeAll` seed path and it must see `DATABASE_URL`
instead of silently skipping setup:
```bash
cd e2e
# Only in the no-.env branch.
eval "$(../.agents/skills/agent-testing/scripts/init-dev-env.sh env)"
BASE_URL=http://localhost:3010 HEADLESS=true bun run test:smoke
```
### 0.4 Auth is green for the selected surface
**Auth is the gate for automated testing, but the gate is surface-scoped.**
Pick the intended surface first when it is already clear from the task, then
check only that surface. Do not block a Web test on CLI device-code auth or an
Electron login state unless the test spans those surfaces.
```bash
./.agents/skills/agent-testing/scripts/setup-auth.sh status --surface web
```
Use `status` with no `--surface` only for cross-surface test plans.
| Surface | Mechanism | One-key path | Standard check |
| -------- | --------------------------------------------- | ------------------------ | ----------------------------------------- |
| CLI | Seeded API key, device-code fallback | `setup-auth.sh cli-seed` | `setup-auth.sh status --surface cli` |
| Web | Seeded better-auth login into `agent-browser` | `setup-auth.sh web-seed` | `setup-auth.sh status --surface web` |
| Electron | App's own persistent login state | Log in once in the app | `setup-auth.sh status --surface electron` |
| Bot | Native apps already logged in | — | per-platform screenshot |
Login-state checks are standardized — do NOT hand-roll `window.__LOBE_STORES`
eval snippets; use `scripts/app-probe.sh auth` (returns `{ isSignedIn, userId }`,
works for Electron CDP and web sessions via `AB_TARGET`).
For Web tests, the test surface is always `agent-browser --session lobehub-dev`.
Use `setup-auth.sh web-seed` first in the seeded local env. The user's normal
Chrome is only a source for copying the Cookie header when seed auth is not
available or `status --surface web` still fails. If Chrome is already logged in,
do not open a login page; verify agent-browser first, then request the Network
`Cookie:` header only if that verification fails. Full background and failure modes:
[references/auth.md](./references/auth.md).
## Step 1 — Pick the surface by change scope
| Change scope | Default surface | Why | Guide |
| ------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------------------- | ---------------------------------- |
| **Backend** (TRPC router / service / model / migration) | **CLI** | Fastest loop, text-assertable output, zero UI flakiness | [cli/index.md](./cli/index.md) |
| **Pure frontend** (components, store, styles, UX) | **Electron** (agent-browser + CDP) | Primary product shape; `__LOBE_STORES` state introspection | [ui/electron.md](./ui/electron.md) |
| **Full-stack** (new API + UI consuming it) | **Web** (browser + local dev server) | One surface where network requests and UI are observable together | [ui/web.md](./ui/web.md) |
| **Bot channels** (Discord / WeChat / Lark / …) | Native app via osascript / bridge | Only way to exercise the real channel end-to-end | `bot/<platform>/index.md` |
Escalate, don't duplicate: verify a backend change with the CLI first; only add
a UI pass when the change actually affects the UI.
### Environment support (local macOS vs cloud Linux)
The decisive constraint per surface is **how evidence (screenshots) is
captured**: CDP-based capture (`agent-browser screenshot`) renders from the
browser engine and needs no real display; OS-level capture (`screencapture`,
osascript) is macOS-only.
| Surface | macOS (local) | Linux / cloud (headless) | Screenshot mechanism |
| -------- | ------------- | --------------------------------------------------------- | ------------------------------------------------------ |
| CLI | ✅ | ✅ | n/a — text output |
| Web | ✅ | ✅ headless Chromium works natively | CDP — no display needed |
| Electron | ✅ | ⚠️ runs, but needs a display server: wrap with `xvfb-run` | CDP works under Xvfb; `capture-app-window.sh` does NOT |
| Bot | ✅ | ❌ osascript + native apps are macOS-only | macOS `screencapture` only |
When a test must stay cloud-portable, prefer CDP-based evidence over
OS-level capture wherever both exist.
### Bot platforms
| Platform | Guide | Quick switcher |
| ------------- | ------------------------------------------------ | --------------------- |
| Discord | [bot/discord/index.md](./bot/discord/index.md) | `Cmd+K` |
| Slack | [bot/slack/index.md](./bot/slack/index.md) | `Cmd+K` |
| Telegram | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F` |
| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md) | `Cmd+F` |
| Lark / 飞书 | [bot/lark/index.md](./bot/lark/index.md) | `Cmd+K` |
| QQ | [bot/qq/index.md](./bot/qq/index.md) | `Cmd+F` |
| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | bridge (no osascript) |
Each platform folder contains an `index.md` (activation, navigation,
send-message, verification snippets) and a `test-<platform>-bot.sh` script
sharing the interface:
```bash
./.agents/skills/agent-testing/bot/<platform>/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
```
New to osascript automation? Read
[references/osascript.md](./references/osascript.md) first — it is a general
macOS-automation asset (activate, type, paste, screenshot, accessibility reads,
gotchas), not bot-specific.
## Step 2 — Run
Surface guides above carry the detailed workflows. Shared infrastructure:
| Need | Where |
| ------------------------------------ | -------------------------------------------------------------------- |
| Start / restart the local dev server | [references/dev-server.md](./references/dev-server.md) |
| `agent-browser` command reference | [references/agent-browser.md](./references/agent-browser.md) |
| osascript patterns (general macOS) | [references/osascript.md](./references/osascript.md) |
| Agent gateway probing | [references/agent-gateway.md](./references/agent-gateway.md) |
| Screen recording | [references/record-app-screen.md](./references/record-app-screen.md) |
### Scripts
All under `.agents/skills/agent-testing/scripts/`:
| Script | Usage |
| ------------------------- | ---------------------------------------------------------------------------- |
| `test-env.sh` | Print/export the resolved local test env and ports |
| `setup-auth.sh` | One-stop auth setup & status check (`status` / `cli` / `web`) |
| `init-dev-env.sh` | Self-contained local dev env (`setup-db` / `seed-user` / `dev-next` / `dev`) |
| `app-probe.sh` | LobeHub app probes: `auth` / `route` / `ops` / `goto <path>` / `errors` |
| `record-gif.sh` | Frame-sequence → GIF for time-based behavior (streaming, timers, animations) |
| `report-init.sh` | Scaffold a structured test report (Step 3) |
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart, CDP 9222) |
| `capture-app-window.sh` | Screenshot a specific app window (general; used by bot tests) |
| `record-app-screen.sh` | Record app screen (video + periodic screenshots) |
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
| `agent-gateway/` | Gateway probe / dump / analyze tools |
`app-probe.sh` is the LobeHub-specific fast path into app state — auth check,
current route, running operations, and `goto <path>` quick navigation
(`/agent/<agentId>/<topicId>`, `/task/<taskId>`, `/settings`, …) so a test can
jump straight to the state under test instead of clicking through the UI. See
[ui/electron.md](./ui/electron.md#lobehub-probes--quick-navigation) for usage.
## Step 3 — Structured report (mandatory deliverable)
Every automated test session ends with a structured, evidence-backed report —
not a chat-only summary. Scaffold it up front and fill it as you test:
```bash
DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh my-feature "Verify my feature")
# ... test, saving screenshots / CLI transcripts into $DIR/assets/ ...
# fill $DIR/report.md (scope, case table with inline evidence, verdict, score) and $DIR/result.json
```
Reports live in `.records/reports/<timestamp>-<slug>/` (gitignored): `report.md`
(human-readable, with screenshots/GIFs embedded directly in the case table),
`result.json` (machine-readable pass/fail + score), `assets/` (evidence).
Format spec and evidence rules:
[references/report.md](./references/report.md).
Two hard rules worth front-loading:
- **Report language = the user's conversation language.** Write the ENTIRE
`report.md` (headings included) in the language the user is conversing in —
no mixed English. `result.json` keys/status values stay English.
- **The case table is the main reading surface.** Prefer the compact
`# | case | result | key observation | evidence` shape and embed the
screenshot/GIF in the evidence cell. Use separate evidence sections only for
long CLI transcripts, HAR summaries, or supplemental detail.
- **Visual evidence must render inline.** Screenshots and GIFs in `report.md`
must use Markdown image syntax like `![case 1](assets/case1.png)`. Do not
use bare file paths, Markdown links, or local file links as the primary
visual evidence; those make the report unreadable without opening each asset.
- **Final replies must include visual evidence links.** When a run includes UI
screenshots or GIFs, include the report directory and the most important
visual artifacts in the final chat response. Each item must include a stable
label, an evidence caption describing the observed UI outcome, and a
repo-relative path, for example:
`[Image #1 - error toast shows provider auth failure](<report-dir>/assets/foo.png)`.
Use repo-relative paths, not absolute paths.
- **Time-based behavior needs a GIF, not a screenshot.** If a case asserts
change over time (streaming output, a ticking timer, loading states,
animations), record it with `scripts/record-gif.sh` and embed the GIF —
a static screenshot cannot prove the behavior.
## Directory map
```
agent-testing/
├── SKILL.md # this router
├── cli/index.md # backend verification via the LobeHub CLI
├── ui/electron.md # pure-frontend verification in the desktop app
├── ui/web.md # full-stack verification in the browser
├── bot/<platform>/ # bot-channel verification (osascript / bridge)
├── references/ # shared knowledge: auth, dev-server, agent-browser, osascript, report
└── scripts/ # setup-auth, report-init, electron-dev, capture, recording, gateway
```
## Gotchas
- agent-browser: see [references/agent-browser.md](./references/agent-browser.md#gotchas)
- Electron: see [ui/electron.md](./ui/electron.md#electron-gotchas)
- osascript: see [references/osascript.md](./references/osascript.md#gotchas)
-152
View File
@@ -1,152 +0,0 @@
# CLI Backend Verification
Default surface for verifying **backend changes** (TRPC routers, services,
models, migrations) end-to-end: fastest loop, text-assertable output, zero UI
flakiness.
## When to use
- Verifying TRPC router / service / model changes end-to-end
- Testing new API fields or response structure changes
- Validating CLI command output after backend modifications
- Debugging data flow issues between server and CLI
## Prerequisites
| Requirement | Details |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| Dev server | `localhost:3010` — see [../references/dev-server.md](../references/dev-server.md) |
| CLI source | `apps/cli/` — runs from source, no rebuild; standalone `node_modules` — run `pnpm install` inside `apps/cli/` (root install does not cover it) |
| CLI dev mode | `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated settings |
| Auth | Seeded API key first; Device Code Flow only as fallback — see [../references/auth.md](../references/auth.md) |
All CLI dev commands run from `apps/cli/`. Subsequent examples use `$CLI`:
```bash
source ../../.records/env/agent-testing-cli.env
CLI="bun src/index.ts"
```
## Workflow
### Step 1 — Server up?
See [../references/dev-server.md](../references/dev-server.md) for the health
check, start, and restart commands. Server-side code changes require a restart.
### Step 2 — Auth ready?
```bash
./.agents/skills/agent-testing/scripts/setup-auth.sh status
```
If the CLI is not ready in the seeded local environment:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
source .records/env/agent-testing-cli.env
./.agents/skills/agent-testing/scripts/setup-auth.sh cli-seed
```
If the target environment is not seeded, use the interactive fallback:
```bash
cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
```
Seeded API-key auth does not store credentials. It writes local settings under
`$HOME/.lobehub-dev` and requires the generated env file to be sourced before
CLI commands. Details:
[../references/auth.md](../references/auth.md).
### Step 3 — Test with CLI commands
CLI runs from source, so CLI-side code changes take effect immediately without
rebuilding:
```bash
cd apps/cli
$CLI <command>
```
Capture output for the report as you go (e.g. `$CLI task list | tee "$DIR/assets/task-list.txt"`).
### Step 4 — Clean up test data
```bash
$CLI task delete < id > -y
$CLI agent delete < id > -y
```
### Step 5 — Report
Finish with a structured report —
[../references/report.md](../references/report.md). CLI evidence = exact
command + trimmed output.
## Common testing patterns
### Task system
```bash
$CLI task list
$CLI task create -n "Root Task" -i "Test instruction"
$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
$CLI task view T-1
$CLI task tree T-1
$CLI task edit T-1 --status running
$CLI task comment T-1 -m "Test comment"
$CLI task delete T-1 -y
```
### Agent system
```bash
$CLI agent list
$CLI agent view <agent-id>
$CLI agent run <agent-id> -m "Test prompt"
```
### Document & knowledge base
```bash
$CLI doc list
$CLI doc create -t "Test Doc" -c "Content here"
$CLI doc view <doc-id>
$CLI kb list
$CLI kb tree <kb-id>
```
### Model & provider
```bash
$CLI model list
$CLI provider list
$CLI provider test <provider-id>
```
## Dev-test cycle
```
1. Make code changes (service/model/router/type)
|
2. Run unit tests (fast feedback)
bunx vitest run --silent='passed-only' '<test-file>'
|
3. Restart dev server (if server-side changes — see dev-server.md)
|
4. CLI verification (end-to-end)
$CLI <command>
|
5. Clean up test data + write the report
```
## Troubleshooting
| Issue | Solution |
| --------------------------- | ------------------------------------------------------------------------------------------------------ |
| `No authentication found` | Source `.records/env/agent-testing-cli.env`, or run device-code `login --server http://localhost:3010` |
| `UNAUTHORIZED` on API calls | Re-run `init-dev-env.sh seed-user` and re-source the env file; for device-code fallback, re-run login |
| `ECONNREFUSED` | Dev server not running — see dev-server.md |
| CLI shows old data/behavior | Server needs restart to pick up code changes |
| Login opens wrong server | Must use `--server` flag (env var doesn't work) |
@@ -1,257 +0,0 @@
# agent-browser CLI Reference
Generic reference for the `agent-browser` CLI — automate Chromium-based apps (Electron, Chrome, web) via Chrome DevTools Protocol. LobeHub-specific patterns live in [../ui/electron.md](../ui/electron.md) and [../ui/web.md](../ui/web.md); authentication recipes live in [auth.md](./auth.md).
Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
## Core Workflow
Every browser automation follows this pattern:
1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
```
## Command Chaining
```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```
Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
## Essential Commands
```bash
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
agent-browser close --all # Close all active sessions
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network requests --type xhr,fetch # Filter by resource type
agent-browser network requests --method POST # Filter by HTTP method
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "text" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Dialogs (alert, confirm, prompt, beforeunload)
agent-browser dialog accept # Accept dialog
agent-browser dialog accept "input" # Accept prompt dialog with text
agent-browser dialog dismiss # Dismiss/cancel dialog
agent-browser dialog status # Check if dialog is open
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
# Streaming
agent-browser stream enable # Start WebSocket streaming
agent-browser stream status # Inspect streaming state
agent-browser stream disable # Stop streaming
```
## Batch Execution
```bash
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
```
## Authentication
```bash
# Option 1: Auth vault (credentials stored encrypted)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
# Option 2: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close # State auto-saved
agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
# Option 3: Persistent profile
agent-browser --profile ~/.myapp open https://app.example.com/login
# Option 4: State file
agent-browser state save auth.json
agent-browser state load auth.json
```
### LobeHub dev server — inject better-auth cookie
`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3010`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [auth.md](./auth.md) for the full recipe.
## Semantic Locators (Alternative to Refs)
```bash
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
```
## JavaScript Evaluation (eval)
```bash
# Simple expressions
agent-browser eval 'document.title'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin << 'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(echo -n 'document.title' | base64)"
```
## Ref Lifecycle
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
## Annotated Screenshots (Vision Mode)
```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
agent-browser click @e2 # Click using ref from annotated screenshot
```
## Parallel Sessions
```bash
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser session list
```
## Connect to Existing Chrome
```bash
agent-browser --auto-connect snapshot # Auto-discover running Chrome
agent-browser --cdp 9222 snapshot # Explicit CDP port
```
## iOS Simulator (Mobile Safari)
```bash
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
```
## Observability Dashboard
```bash
agent-browser dashboard install
agent-browser dashboard start # Background server on port 4848
agent-browser dashboard stop
```
## Cloud Providers
Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
## Browser Engine Selection
```bash
agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
```
## Gotchas
- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
@@ -1,171 +0,0 @@
# Auth Setup for Local Agent Testing
**Auth is the gate for all automated testing.** Complete
[Step 0.0](../SKILL.md#00-resolve-the-current-test-environment) first so
`SERVER_URL` and ports are resolved, then verify auth before writing any test
step.
Initialize helpers first:
```bash
SCRIPT="./.agents/skills/agent-testing/scripts/setup-auth.sh"
TEST_ENV="./.agents/skills/agent-testing/scripts/test-env.sh"
eval "$($TEST_ENV --exports)"
```
Quick reference after initialization:
| Command | Purpose |
| ------------------------------ | -------------------------------------------------- |
| `$SCRIPT status` | Check all surfaces (server + CLI + web + Electron) |
| `$SCRIPT status --surface web` | Check only the Web surface gate |
| `$SCRIPT cli-seed` | Configure CLI API-key auth from the seeded key |
| `$SCRIPT cli` | Interactive CLI device-code login (user must run) |
| `$SCRIPT open-chrome` | Open Chrome at `SERVER_URL` with DevTools |
| `$SCRIPT web-seed` | Sign in the seeded user and inject cookies |
| `pbpaste \| $SCRIPT web` | Inject a copied Cookie header into agent-browser |
| `$SCRIPT web-verify` | Live-check agent-browser session auth |
Use `localhost` for Web auth; better-auth cookies are stored for `localhost`,
not `127.0.0.1`.
## Per-surface overview
| Surface | Mechanism | Persistence | Human interaction |
| -------- | ---------------------------------------- | ----------------------------------------------------------------- | ---------------------------------------------- |
| CLI | Seeded API key or OIDC Device Code Flow | `.records/env/agent-testing-cli.env` + `$HOME/.lobehub-dev` | No for seed path; yes for device-code fallback |
| Web | Seeded better-auth login or cookie copy | `~/.lobehub-agent-testing/web-state.json` + agent-browser session | No for seed path; copy cookie only as fallback |
| Electron | App's own login state | Electron user-data dir | Log in once manually in the app |
| Bot | Native apps (Discord/WeChat/…) logged in | Each app's own session | Once per app |
## CLI — Seeded API key
For the self-contained no-root-`.env` dev environment, seed the baseline user
and API key once:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
source .records/env/agent-testing-cli.env
./.agents/skills/agent-testing/scripts/setup-auth.sh cli-seed
```
The seed step writes `LOBE_API_KEY` for humans and maps it to the CLI's current
auth variable, `LOBEHUB_CLI_API_KEY`. It also sets `LOBEHUB_SERVER` so CLI
commands hit the local server without needing a stored device-code token.
Use this for automated CLI verification:
```bash
cd apps/cli
source ../../.records/env/agent-testing-cli.env
bun src/index.ts <command>
```
## CLI — Device Code Flow fallback
Use device-code login only when testing against a non-seeded environment.
Credentials are isolated from the user's real CLI config via
`LOBEHUB_CLI_HOME=.lobehub-dev`, which the current CLI stores under
`$HOME/.lobehub-dev`.
```bash
cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
```
- The `--server` flag is required — an env var does NOT work and login will hit
the wrong server without it.
- Check state without logging in: `setup-auth.sh status` (verifies
`LOBEHUB_CLI_API_KEY` when present, otherwise checks the stored server URL).
- `UNAUTHORIZED` on API calls means the token expired — re-run login.
## Web — seeded better-auth login
The Web test surface is `agent-browser --session lobehub-dev`. The user's
ordinary Chrome is only a cookie source; Chrome screenshots, Chrome Network
records, and Chrome logged-in state do not prove the agent-browser test session
is authenticated.
For the seeded local dev environment, use the automatic path:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/setup-auth.sh web-seed
```
`web-seed` posts the seeded email/password to
`/api/auth/sign-in/email`, stores the returned cookie jar under
`~/.lobehub-agent-testing/`, converts it to Playwright `storageState`, loads it
into the `agent-browser` session, and verifies the session does not land on
`/signin`.
## Web — manual cookie injection fallback
`agent-browser --headed` on macOS often creates the Chromium window off-screen —
the user can't see or interact with it, so manual login inside the agent-browser
session fails. Instead, copy the **better-auth session cookie** out of the
user's own logged-in Chrome and inject it as a Playwright-style state file.
Do **not** use this on production URLs — only local dev. Treat the cookie as a
secret: don't paste it into shared logs, PRs, or commit it anywhere.
### Web — decision flow
1. `$SCRIPT status --surface web` — green? Start testing. Do not ask for a Cookie header.
2. Not green and using the seeded local env → `$SCRIPT web-seed`.
3. If repo-root `.env` exists and `web-seed` fails, do **not** seed or modify the current DB; treat it as an existing local environment and use Cookie injection.
4. Still not green or not using the seed env → `$SCRIPT open-chrome` opens Chrome at `SERVER_URL` with DevTools.
5. User copies the `Cookie:` header from Network tab → any same-origin request → Request Headers → right-click `Cookie:`**Copy value**. Must be from Network, NOT `document.cookie` (HttpOnly cookies are invisible to `document.cookie`).
6. `pbpaste | $SCRIPT web` — filters to better-auth cookies (`session_token`, `session_data`, `state`), builds Playwright `storageState`, loads it into the `agent-browser` session (`lobehub-dev`), opens `SERVER_URL`, and asserts the URL is not `/signin`.
`ENABLE_MOCK_DEV_USER` is not Web auth. It only affects server-side API context
and does not satisfy Better Auth or stop the SPA from redirecting to `/signin`.
Do not use it as a substitute for `status --surface web` or Cookie injection.
### Using the authenticated session
```bash
agent-browser --session lobehub-dev open "$SERVER_URL/"
agent-browser --session lobehub-dev snapshot -i | head -20
```
### Notes
- `storageState` doesn't enforce the HttpOnly flag on load — the script stores
cookies with `httpOnly: false`, which is fine for local dev and sidesteps a
CDP-context quirk where HttpOnly cookies sometimes fail to attach.
- The state file is kept at `~/.lobehub-agent-testing/web-state.json` so
`setup-auth.sh status` can report web-auth readiness across sessions.
### Common failure modes
| Symptom | Cause | Fix |
| --------------------------------------------- | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| Still redirects to `/signin` after injection | User pasted from `document.cookie` → missed HttpOnly session | Re-pull from Network request Headers, not console |
| Script reports `no better-auth cookies found` | User pasted the wrong value, or the cookie parser regressed | Keep the raw `Cookie:` header as-is; run `scripts/setup-auth.test.sh` if the input looks valid |
| Login works briefly then expires | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-inject |
| Domain mismatch | Cookie domain must be `localhost` literally, no leading dot for local dev | — |
## Electron
The desktop app keeps its own persistent login state in its user-data
directory — log in once manually inside the app and it survives restarts of
`electron-dev.sh`. No injection needed. The standard check (do NOT hand-roll a
store eval) once Electron is up with CDP:
```bash
./.agents/skills/agent-testing/scripts/app-probe.sh auth
# → {"ok":true,"isSignedIn":true,"userId":"user_xxx"}
```
`setup-auth.sh status` runs this probe automatically when CDP 9222 is
reachable.
## Scope
These recipes only cover **local dev** authentication. They do not:
- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com`
and must be delivered over HTTPS.
- Replace real OAuth flows — tests that must exercise the login UI itself need a
real Chromium with `--remote-debugging-port` or a bot account.
- Flow cookies back to the user's Chrome — injection is one-way.
@@ -1,101 +0,0 @@
# Local Dev Server
Single source of truth for starting / restarting the backend that all test
surfaces (CLI, Electron, Web) hit.
## Resolve ports first
Run `test-env.sh` as described in
[SKILL.md Step 0.0](../SKILL.md#00-resolve-the-current-test-environment)
before starting or probing any local test surface.
## Ports & modes
| Command | What it runs | Port source |
| ------------------- | --------------------------------------------------------- | ------------------- |
| `pnpm run dev:next` | Next.js backend (API + auth) | `PORT` |
| `bun run dev` | Full-stack (Next.js + Vite SPA, via `devStartupSequence`) | `PORT` + `SPA_PORT` |
| `bun run dev:spa` | Vite SPA only, proxies API to `PORT` | `SPA_PORT` |
In the **cloud repo** (where this repo is the `lobehub/` submodule), local
worktree names map to fallback defaults only when `.env` and shell env do not
provide values:
| Workspace directory | Default `SERVER_URL` |
| ------------------- | -------------------------------- |
| `lobehub` | `http://localhost:3010` |
| `lobehub-cloud` | `http://localhost:3020` |
| `lobehub-cloud-1` | `http://localhost:3021` |
| `lobehub-cloud-N` | `http://localhost:$((3020 + N))` |
`test-env.sh` and `setup-auth.sh` both use the resolved env first and these
worktree defaults only as fallback. Treat the dev-server terminal output as the
final source of truth when testing a non-standard port, then export it for every
agent-testing command:
```bash
export SERVER_URL=http://localhost:<port-from-dev-output>
```
## Health check
```bash
curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/"
```
## Start / restart
```bash
# Start backend only.
# With root .env: use the existing local config.
# Agent runtime queue mode is required to mirror production async execution.
AGENT_RUNTIME_MODE=queue pnpm run dev:next
# Without root .env: use the self-contained agent-testing env.
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
# Full-stack SPA + backend. Required for Web smoke.
# With root .env:
AGENT_RUNTIME_MODE=queue bun run dev
# Without root .env:
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
# Local QStash. Run in a separate terminal only when testing workflow paths.
./.agents/skills/agent-testing/scripts/init-dev-env.sh qstash
# Restart — required to pick up server-side code changes
lsof -ti:"$PORT" | xargs kill
pnpm run dev:next
# or, when no root .env exists:
# ./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
```
## When a server restart is needed
Next.js hot-reload may not pick up changes in workspace packages — restart when
in doubt.
| Change location | Restart? |
| ----------------------------------------------- | -------- |
| `apps/server/src/` (routers, services, modules) | Yes |
| `src/server/` (agent-hono, workflows-hono) | Yes |
| `packages/database/` (models) | Yes |
| `packages/types/` | Yes |
| `packages/prompts/` | Yes |
| `apps/cli/` (CLI runs from source) | No |
## Troubleshooting
| Issue | Solution |
| ------------------------- | --------------------------------------------------------------------------------------------- |
| `ECONNREFUSED` | Server not running — start it |
| `EADDRINUSE` on the port | Already running — `lsof -ti:<port> \| xargs kill` first |
| Stale data / old behavior | Server needs a restart to pick up code changes |
| Agent call runs inline | Set `AGENT_RUNTIME_MODE=queue`, make sure `REDIS_URL` is configured, then restart the server |
| Queue mode needs Redis | Run `init-dev-env.sh setup-db`, or provide `REDIS_URL=redis://...` for an existing Redis |
| QStash workflow failures | Start `init-dev-env.sh qstash` and make sure dev server inherited the script's `QSTASH_*` env |
Marketplace/community endpoints are not part of the local agent-testing auth
gate. Do not block local product-chain verification on marketplace API auth
unless the change explicitly targets marketplace behavior.
@@ -1,186 +0,0 @@
# Structured Test Reports
Every automated test session ends with a structured, evidence-backed report.
A chat-only summary is not an acceptable deliverable: the report is what the
user (or a reviewer, or a later agent) audits without replaying the session.
## Location & layout
Reports live under `.records/reports/` (gitignored, like all `.records/`
output):
```
.records/reports/<YYYYMMDD-HHMMSS>-<slug>/
├── report.md # human-readable report (case table with inline screenshots, verdict)
├── result.json # machine-readable results (pass/fail counts, score)
└── assets/ # evidence: screenshots, HAR files, CLI transcripts
```
## Workflow
1. **Scaffold up front** — before running the first test step:
```bash
DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh < slug > "<title>")
```
The script creates the directory, pre-fills branch / commit / date in both
files, and prints the directory path. The scaffold uses the compact report
shape below; translate its headings and table labels to the user's language
before delivery if needed.
2. **Collect evidence as you test** — every asserted behavior gets one evidence
item in `$DIR/assets/`:
- UI (static state): `agent-browser screenshot` or `capture-app-window.sh`,
then **verify the screenshot with the Read tool before citing it** —
never cite an image you haven't looked at.
- UI (time-based behavior): **screenshot vs GIF is a judgment you must
make per case.** If the assertion is about change over time — streaming
output, a ticking timer, loading/progress states, animations,
appear/disappear transitions — a static screenshot cannot prove it.
Record a frame sequence and synthesize a GIF:
```bash
# start recording (background), trigger the behavior, wait for it to finish
../scripts/record-gif.sh "$DIR/assets/case2-streaming.gif" 12 2 &
GIF_PID=$!
# ... drive the scenario ...
wait $GIF_PID
```
Embed it like an image: `![case 2](assets/case2-streaming.gif)`. Verify
at least the first/last frames visually (Read the GIF) before citing.
- CLI: exact command + trimmed output (`$CLI task list | tee "$DIR/assets/task-list.txt"`).
- Network: `agent-browser network requests` dumps or HAR files.
3. **Fill `report.md` as you go** — don't reconstruct from memory at the end.
The primary evidence belongs in the case table itself: each row should pair
the assertion with the screenshot/GIF or non-visual artifact that proves it,
so readers can scan the result without jumping between sections. UI evidence
must render inline with Markdown image syntax; a plain link or file path is
not acceptable as primary visual evidence.
4. **Set the verdict** in both `report.md` and `result.json`, then link the
report directory in your final answer to the user. If UI evidence exists,
list the key screenshot/GIF links in the final chat response. Use Markdown
link text as the evidence caption, for example:
`[Image #1 - observed outcome](<report-dir>/assets/case1.png)`.
## Report language (hard rule)
**`report.md` MUST be written in the language the user is conversing in** —
the whole file, headings included. If the conversation is in Chinese, the
report is in Chinese; do not mix English prose into it. The scaffold headings
are placeholders — translate them when filling if the user is not conversing in
the scaffold language. Exceptions that stay as-is: code/commands, identifiers,
log excerpts, and `result.json` (its keys and status values are machine-read
and stay English; the `title` and case `name` fields follow the user's
language).
## report.md sections
Default report shape:
| Section | Content |
| ---------------- | -------------------------------------------------------------------------------------------- |
| **Scope** | What changed / what is being verified; branch, commit, date, surface, entry URL/page, focus |
| **Cases** | Compact table: `# \| Case \| Result \| Key observation \| Evidence` |
| **Verdict** | Overall verdict first (`pass` / `partial` / `fail`), then the concise reasons and follow-ups |
| **Verification** | Commands or automated checks run in this session, with trimmed results |
| **Score** | Pass/fail/blocked counts, optional 0100 score |
The case table is the main reading surface. Prefer one clear row per user
scenario or regression assertion, and put the screenshot/GIF directly in the
`Evidence` cell:
```markdown
| # | Case | Result | Key observation | Evidence |
| --- | ------------------------ | ------ | ----------------------------------------------------------------- | ------------------------------------------------ |
| 1 | Create a new page | pass | Title and body persisted after refresh | ![created page](assets/new-page-created.png) |
| 2 | Respect requested length | fail | Requested about 600 Chinese characters; final body was about 1286 | ![final article](assets/write-article-final.png) |
```
## Inline visual evidence
Screenshots and GIFs must be embedded so the report shows the image inline:
```markdown
![case 1 result](assets/case1-result.png)
![streaming response](assets/case2-streaming.gif)
```
Do **not** use these as the primary evidence for UI cases:
```markdown
[case 1 result](assets/case1-result.png)
assets/case1-result.png
file:///tmp/case1-result.png
```
Links are acceptable for non-visual artifacts such as CLI transcripts, HAR
files, or long logs. For videos, embed a representative screenshot/GIF inline in
the case row and link the full video as supplemental evidence.
Avoid the old wide table with separate `steps`, `expected`, and `actual`
columns unless the test is purely non-visual and truly needs that breakdown.
For UI reports, those columns make screenshot-backed reading harder. Put
procedural detail in the row's key observation only when it changes the
interpretation of the result.
Use an extra evidence/detail section only when the inline table cannot carry
the material cleanly, such as long CLI transcripts, HAR summaries, or multiple
screenshots for one case. In that situation, keep the table evidence cell as an
inline visual proof for UI cases or a concise link for non-visual artifacts,
then put the longer material under `Verification` or a brief
`Additional Evidence` section.
Status values: `pass` / `fail` / `blocked` (couldn't run — e.g. auth or env
missing; a blocked case is not a pass).
## result.json schema
```json
{
"branch": "feat/task-tree",
"cases": [
{
"id": "1",
"name": "task tree returns nested children",
"surface": "cli",
"status": "pass",
"evidence": ["assets/task-tree.txt"]
}
],
"commit": "abc1234",
"createdAt": "2026-06-11T15:30:00+08:00",
"summary": {
"total": 1,
"passed": 1,
"failed": 0,
"blocked": 0,
"score": 100,
"verdict": "pass"
},
"surfaces": ["cli"],
"title": "Verify task tree API"
}
```
`score` is optional — use it when the verdict has a subjective component (UI
polish, copy quality); omit it for purely binary runs. `verdict` is the single
word the user reads first: `pass`, `fail`, or `partial`.
## Rules
- **No evidence, no claim** — every `pass`/`fail` in the case table must link
at least one asset. UI cases must inline-embed their primary screenshot/GIF;
non-visual CLI/network cases may link transcripts, HAR files, or logs.
- **Screenshots must be visually verified** with the Read tool before being
cited.
- **Report failures faithfully** — a failing case with clear evidence is a good
report; a vague green one is not.
- If coverage was cut (cases skipped, surfaces not exercised), say so in the
Verdict section — silent truncation reads as "covered everything".
@@ -1,95 +0,0 @@
#!/usr/bin/env bash
# app-probe.sh — standardized probes for a running LobeHub app (Electron via
# CDP, or a web agent-browser session). Use these instead of hand-rolling
# `window.__LOBE_STORES` eval snippets — especially the auth check.
#
# Usage:
# app-probe.sh auth # { isSignedIn, userId } from the user store
# app-probe.sh route # current SPA route
# app-probe.sh ops # running chat operations (type / status / startTime)
# app-probe.sh goto <path> # navigate the SPA to a route (full reload), e.g. goto /agent/agt_xxx
# app-probe.sh errors-install # install a console.error interceptor
# app-probe.sh errors # dump errors captured since errors-install
#
# Target selection (default: Electron over CDP 9222):
# AB_TARGET="--cdp 9222" # Electron (default; CDP_PORT also honored)
# AB_TARGET="--session lobehub-dev" # web agent-browser session
#
# Common routes (desktop SPA): / /agent/<agentId> /agent/<agentId>/<topicId>
# /task /task/<taskId> /page /settings /community
set -euo pipefail
AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
run_eval() {
# shellcheck disable=SC2086
agent-browser $AB_TARGET eval --stdin
}
case "${1:-}" in
auth)
run_eval << 'EVALEOF'
(function () {
var stores = window.__LOBE_STORES;
if (!stores || !stores.user) return JSON.stringify({ ok: false, reason: 'no user store — app not loaded yet?' });
var u = stores.user();
return JSON.stringify({ ok: !!u.isSignedIn, isSignedIn: !!u.isSignedIn, userId: (u.user && u.user.id) || null });
})()
EVALEOF
;;
route)
run_eval << 'EVALEOF'
location.pathname + location.search + location.hash
EVALEOF
;;
ops)
run_eval << 'EVALEOF'
(function () {
var stores = window.__LOBE_STORES;
if (!stores || !stores.chat) return JSON.stringify({ ok: false, reason: 'no chat store — open a conversation first' });
var ops = Object.values(stores.chat().operations || {});
var running = ops.filter(function (o) { return o.status === 'running'; });
return JSON.stringify({
ok: true,
running: running.map(function (o) { return { startTime: o.metadata && o.metadata.startTime, type: o.type }; }),
runningCount: running.length,
total: ops.length,
});
})()
EVALEOF
;;
goto)
TARGET_PATH="${2:?Usage: app-probe.sh goto <path>}"
# shellcheck disable=SC2086
agent-browser $AB_TARGET eval "location.href = '$TARGET_PATH'" > /dev/null
sleep 2
bash "${BASH_SOURCE[0]}" route
;;
errors-install)
run_eval << 'EVALEOF'
(function () {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function () {
var msg = Array.from(arguments).map(function (a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
;;
errors)
run_eval << 'EVALEOF'
JSON.stringify(window.__CAPTURED_ERRORS || 'interceptor not installed — run errors-install first')
EVALEOF
;;
*)
echo "Usage: $0 {auth|route|ops|goto <path>|errors-install|errors}" >&2
exit 2
;;
esac
@@ -1,459 +0,0 @@
#!/usr/bin/env bash
# init-dev-env.sh — self-contained local dev env for agent testing.
#
# This script initializes the env needed to run LobeHub's normal local dev
# server without depending on a root .env file. It follows the same shape as
# the e2e bootstrap (Postgres + migrations + auth/key-vault/S3 test env), but
# starts the repo's dev server, not the standalone e2e server.
#
# Guardrail: if repo-root .env exists, every non-help command exits immediately.
# Existing local config always wins.
#
# Usage:
# init-dev-env.sh env # print shell exports
# init-dev-env.sh write [file] # write a source-able env file
# init-dev-env.sh setup-db # start local Postgres/Redis and run migrations
# init-dev-env.sh migrate # run DB migrations against the configured DB
# init-dev-env.sh seed-user # seed the baseline test user + CLI API key
# init-dev-env.sh qstash # run local Upstash QStash dev server
# init-dev-env.sh dev-next # exec `pnpm run dev:next` with this env
# init-dev-env.sh dev # exec `bun run dev` with this env
# init-dev-env.sh clean-db # remove the managed Postgres/Redis containers
#
# Overrides:
# SERVER_PORT=3010 DB_PORT=5433 DB_CONTAINER=lobehub-agent-testing-postgres REDIS_PORT=6380 REDIS_CONTAINER=lobehub-agent-testing-redis QSTASH_DEV_PORT=8080
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
ROOT_ENV_FILE="$REPO_ROOT/.env"
SERVER_PORT="${SERVER_PORT:-3010}"
DB_PORT="${DB_PORT:-5433}"
DB_CONTAINER="${DB_CONTAINER:-lobehub-agent-testing-postgres}"
DATABASE_URL="${DATABASE_URL:-postgresql://postgres:postgres@localhost:${DB_PORT}/postgres}"
REDIS_PORT="${REDIS_PORT:-6380}"
REDIS_CONTAINER="${REDIS_CONTAINER:-lobehub-agent-testing-redis}"
REDIS_URL="${REDIS_URL:-redis://localhost:${REDIS_PORT}}"
ENV_FILE_DEFAULT="$REPO_ROOT/.records/env/agent-testing-dev.env"
CLI_ENV_FILE_DEFAULT="$REPO_ROOT/.records/env/agent-testing-cli.env"
AGENT_TESTING_API_KEY="${AGENT_TESTING_API_KEY:-sk-lh-agenttesting0001}"
QSTASH_DEV_PORT="${QSTASH_DEV_PORT:-8080}"
QSTASH_LOCAL_TOKEN="${QSTASH_LOCAL_TOKEN:-eyJVc2VySUQiOiJkZWZhdWx0VXNlciIsIlBhc3N3b3JkIjoiZGVmYXVsdFBhc3N3b3JkIn0=}"
QSTASH_LOCAL_CURRENT_SIGNING_KEY="${QSTASH_LOCAL_CURRENT_SIGNING_KEY:-sig_7kYjw48mhY7kAjqNGcy6cr29RJ6r}"
QSTASH_LOCAL_NEXT_SIGNING_KEY="${QSTASH_LOCAL_NEXT_SIGNING_KEY:-sig_5ZB6DVzB1wjE8S6rZ7eenA8Pdnhs}"
ok() { printf ' \033[32m✔\033[0m %s\n' "$1"; }
bad() { printf ' \033[31m✘\033[0m %s\n' "$1"; }
note() { printf ' %s\n' "$1"; }
guard_no_root_env() {
if [[ -f "$ROOT_ENV_FILE" ]]; then
bad "root .env exists: $ROOT_ENV_FILE"
note "Use the existing local configuration instead of init-dev-env.sh."
note "Start normally from repo root, e.g. pnpm run dev:next or bun run dev."
exit 1
fi
}
apply_env() {
export AGENT_RUNTIME_MODE="${AGENT_RUNTIME_MODE:-queue}"
export APP_URL="${APP_URL:-http://localhost:${SERVER_PORT}}"
export AUTH_EMAIL_VERIFICATION="${AUTH_EMAIL_VERIFICATION:-0}"
export AUTH_SECRET="${AUTH_SECRET:-agent-testing-local-auth-secret-32chars}"
export DATABASE_DRIVER="${DATABASE_DRIVER:-node}"
export DATABASE_URL
export FEATURE_FLAGS="${FEATURE_FLAGS:--agent_self_iteration}"
export KEY_VAULTS_SECRET="${KEY_VAULTS_SECRET:-r2gbBPKyJ8ZRKCLKt+I3DImfcL+wGxaQyRC56xtm9Uk=}"
export NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION="${NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION:-0}"
export NODE_OPTIONS="${NODE_OPTIONS:---max-old-space-size=6144}"
export PORT="${PORT:-$SERVER_PORT}"
export QSTASH_CURRENT_SIGNING_KEY="${QSTASH_CURRENT_SIGNING_KEY:-$QSTASH_LOCAL_CURRENT_SIGNING_KEY}"
export QSTASH_DEV_PORT
export QSTASH_NEXT_SIGNING_KEY="${QSTASH_NEXT_SIGNING_KEY:-$QSTASH_LOCAL_NEXT_SIGNING_KEY}"
export QSTASH_TOKEN="${QSTASH_TOKEN:-$QSTASH_LOCAL_TOKEN}"
export QSTASH_URL="${QSTASH_URL:-http://127.0.0.1:${QSTASH_DEV_PORT}}"
export REDIS_URL
export S3_ACCESS_KEY_ID="${S3_ACCESS_KEY_ID:-agent-testing-access-key}"
export S3_BUCKET="${S3_BUCKET:-agent-testing-bucket}"
export S3_ENDPOINT="${S3_ENDPOINT:-https://agent-testing-s3.localhost}"
export S3_SECRET_ACCESS_KEY="${S3_SECRET_ACCESS_KEY:-agent-testing-secret-key}"
}
env_keys() {
printf '%s\n' \
APP_URL \
AGENT_RUNTIME_MODE \
AUTH_EMAIL_VERIFICATION \
AUTH_SECRET \
DATABASE_DRIVER \
DATABASE_URL \
FEATURE_FLAGS \
KEY_VAULTS_SECRET \
NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION \
NODE_OPTIONS \
PORT \
QSTASH_CURRENT_SIGNING_KEY \
QSTASH_DEV_PORT \
QSTASH_NEXT_SIGNING_KEY \
QSTASH_TOKEN \
QSTASH_URL \
REDIS_URL \
S3_ACCESS_KEY_ID \
S3_BUCKET \
S3_ENDPOINT \
S3_SECRET_ACCESS_KEY
}
print_env() {
apply_env
while IFS= read -r key; do
printf 'export %s=%q\n' "$key" "${!key}"
done < <(env_keys)
}
write_env() {
local file="${1:-$ENV_FILE_DEFAULT}"
apply_env
mkdir -p "$(dirname "$file")"
{
printf '# Source this file before starting LobeHub local dev server.\n'
printf '# Generated by %s\n' "$0"
while IFS= read -r key; do
printf 'export %s=%q\n' "$key" "${!key}"
done < <(env_keys)
} > "$file"
ok "wrote env file: $file"
note "source it with: source $file"
}
require_docker() {
if ! command -v docker > /dev/null 2>&1; then
bad "docker CLI is not available"
note "Install/start Docker Desktop, or provide DATABASE_URL for an existing Postgres."
return 1
fi
}
wait_for_db() {
printf ' waiting for Postgres'
until docker exec "$DB_CONTAINER" pg_isready -U postgres > /dev/null 2>&1; do
printf '.'
sleep 2
done
printf '\n'
}
wait_for_redis() {
printf ' waiting for Redis'
until docker exec "$REDIS_CONTAINER" redis-cli ping > /dev/null 2>&1; do
printf '.'
sleep 1
done
printf '\n'
}
start_db() {
require_docker
if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
ok "Postgres container already running: $DB_CONTAINER"
elif docker ps -a --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
docker start "$DB_CONTAINER" > /dev/null
ok "started existing Postgres container: $DB_CONTAINER"
else
docker run -d \
--name "$DB_CONTAINER" \
-e POSTGRES_PASSWORD=postgres \
-p "${DB_PORT}:5432" \
paradedb/paradedb:latest > /dev/null
ok "created Postgres container: $DB_CONTAINER"
fi
wait_for_db
}
start_redis() {
require_docker
if docker ps --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
ok "Redis container already running: $REDIS_CONTAINER"
elif docker ps -a --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
docker start "$REDIS_CONTAINER" > /dev/null
ok "started existing Redis container: $REDIS_CONTAINER"
else
docker run -d \
--name "$REDIS_CONTAINER" \
-p "${REDIS_PORT}:6379" \
redis:7-alpine > /dev/null
ok "created Redis container: $REDIS_CONTAINER"
fi
wait_for_redis
}
migrate_db() {
apply_env
cd "$REPO_ROOT"
bun run db:migrate
}
seed_user() {
apply_env
export AGENT_TESTING_API_KEY
export AGENT_TESTING_CLI_ENV_FILE="${AGENT_TESTING_CLI_ENV_FILE:-$CLI_ENV_FILE_DEFAULT}"
cd "$REPO_ROOT"
node <<'NODE'
const bcrypt = require('bcryptjs');
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');
const pg = require('pg');
const databaseUrl = process.env.DATABASE_URL;
if (!databaseUrl) {
throw new Error('DATABASE_URL is required to seed the baseline test user.');
}
const TEST_USER = {
email: 'agent-testing@lobehub.com',
fullName: 'Agent Testing User',
id: 'user_agent_testing_001',
password: 'TestPassword123!',
username: 'agent_testing_user',
};
const TEST_API_KEY = {
id: 'api_key_agent_testing_001',
key: process.env.AGENT_TESTING_API_KEY || 'sk-lh-agenttesting0001',
name: 'Agent Testing CLI API Key',
};
const validateApiKeyFormat = (apiKey) => /^sk-lh-[\da-z]{16}$/.test(apiKey);
const hashApiKey = (apiKey) => {
const secret = process.env.KEY_VAULTS_SECRET;
if (!secret) throw new Error('KEY_VAULTS_SECRET is required to seed the baseline API key.');
return crypto.createHmac('sha256', secret).update(apiKey).digest('hex');
};
const encryptWithKeyVaultsSecret = (plaintext) => {
const secret = process.env.KEY_VAULTS_SECRET;
if (!secret) throw new Error('KEY_VAULTS_SECRET is required to seed the baseline API key.');
const rawKey = Buffer.from(secret, 'base64');
if (![16, 24, 32].includes(rawKey.length)) {
throw new Error(
`KEY_VAULTS_SECRET must decode to 16, 24, or 32 bytes, got ${rawKey.length} bytes.`,
);
}
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv(`aes-${rawKey.length * 8}-gcm`, rawKey, iv);
const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
const authTag = cipher.getAuthTag();
return `${iv.toString('hex')}:${authTag.toString('hex')}:${encrypted.toString('hex')}`;
};
const writeCliEnvFile = () => {
const file = process.env.AGENT_TESTING_CLI_ENV_FILE || '.records/env/agent-testing-cli.env';
fs.mkdirSync(path.dirname(file), { recursive: true });
fs.writeFileSync(
file,
[
'# Source this file before running LobeHub CLI agent tests.',
'# Generated by init-dev-env.sh seed-user',
`export LOBE_API_KEY=${TEST_API_KEY.key}`,
`export LOBEHUB_CLI_API_KEY="${'${LOBE_API_KEY}'}"`,
`export LOBEHUB_SERVER=${process.env.APP_URL}`,
'export LOBEHUB_CLI_HOME=.lobehub-dev',
'',
].join('\n'),
);
return file;
};
const client = new pg.Client({ connectionString: databaseUrl });
(async () => {
if (!validateApiKeyFormat(TEST_API_KEY.key)) {
throw new Error(`Invalid AGENT_TESTING_API_KEY format: ${TEST_API_KEY.key}`);
}
await client.connect();
const now = new Date().toISOString();
const onboarding = JSON.stringify({ finishedAt: now, version: 1 });
const passwordHash = await bcrypt.hash(TEST_USER.password, 10);
const encryptedApiKey = encryptWithKeyVaultsSecret(TEST_API_KEY.key);
const apiKeyHash = hashApiKey(TEST_API_KEY.key);
await client.query(
`INSERT INTO users (id, email, normalized_email, username, full_name, email_verified, onboarding, created_at, updated_at, last_active_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $8, $8)
ON CONFLICT (id) DO UPDATE SET onboarding = $7, updated_at = $8`,
[
TEST_USER.id,
TEST_USER.email,
TEST_USER.email.toLowerCase(),
TEST_USER.username,
TEST_USER.fullName,
true,
onboarding,
now,
],
);
await client.query(
`INSERT INTO accounts (id, user_id, account_id, provider_id, password, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, $6, $6)
ON CONFLICT DO NOTHING`,
[
'agent_testing_account_001',
TEST_USER.id,
TEST_USER.email,
'credential',
passwordHash,
now,
],
);
await client.query(
`INSERT INTO api_keys (id, name, key, key_hash, enabled, expires_at, user_id, workspace_id, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, NULL, $6, NULL, $7, $7)
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
key = EXCLUDED.key,
key_hash = EXCLUDED.key_hash,
enabled = EXCLUDED.enabled,
expires_at = NULL,
updated_at = EXCLUDED.updated_at`,
[
TEST_API_KEY.id,
TEST_API_KEY.name,
encryptedApiKey,
apiKeyHash,
true,
TEST_USER.id,
now,
],
);
const cliEnvFile = writeCliEnvFile();
console.log('seeded baseline user:');
console.log(` email: ${TEST_USER.email}`);
console.log(` password: ${TEST_USER.password}`);
console.log('seeded baseline API key:');
console.log(` LOBE_API_KEY: ${TEST_API_KEY.key}`);
console.log(` CLI env: ${cliEnvFile}`);
})()
.finally(() => client.end())
.catch((error) => {
console.error(error);
process.exit(1);
});
NODE
}
cmd_status() {
apply_env
echo "agent-testing local dev env:"
note "APP_URL=$APP_URL"
note "AGENT_RUNTIME_MODE=$AGENT_RUNTIME_MODE"
note "DATABASE_URL=$DATABASE_URL"
note "PORT=$PORT"
note "QSTASH_URL=$QSTASH_URL"
note "REDIS_URL=$REDIS_URL"
if command -v docker > /dev/null 2>&1; then
ok "docker CLI available"
if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
ok "managed Postgres running: $DB_CONTAINER"
else
note "managed Postgres is not running: $DB_CONTAINER"
fi
if docker ps --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
ok "managed Redis running: $REDIS_CONTAINER"
else
note "managed Redis is not running: $REDIS_CONTAINER"
fi
else
bad "docker CLI is not available"
fi
}
cmd_qstash() {
apply_env
cd "$REPO_ROOT"
note "starting local QStash dev server at $QSTASH_URL"
note "keep this process running while testing workflow paths"
exec pnpm run qstash -- -port "$QSTASH_DEV_PORT"
}
cmd_dev_next() {
apply_env
cd "$REPO_ROOT"
exec pnpm run dev:next
}
cmd_dev() {
apply_env
cd "$REPO_ROOT"
exec bun run dev
}
cmd_clean_db() {
require_docker
if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
docker stop "$DB_CONTAINER" > /dev/null
fi
if docker ps -a --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
docker rm "$DB_CONTAINER" > /dev/null
ok "removed Postgres container: $DB_CONTAINER"
else
note "Postgres container not found: $DB_CONTAINER"
fi
if docker ps --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
docker stop "$REDIS_CONTAINER" > /dev/null
fi
if docker ps -a --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
docker rm "$REDIS_CONTAINER" > /dev/null
ok "removed Redis container: $REDIS_CONTAINER"
else
note "Redis container not found: $REDIS_CONTAINER"
fi
}
usage() {
sed -n '3,24p' "$0" >&2
}
COMMAND="${1:-status}"
case "$COMMAND" in
help|-h|--help) usage; exit 0 ;;
*) guard_no_root_env ;;
esac
case "$COMMAND" in
env) print_env ;;
write) shift; write_env "${1:-}" ;;
setup-db)
start_db
start_redis
migrate_db
;;
migrate) migrate_db ;;
seed-user) seed_user ;;
qstash) cmd_qstash ;;
dev-next) cmd_dev_next ;;
dev) cmd_dev ;;
clean-db) cmd_clean_db ;;
status) cmd_status ;;
*)
usage
exit 2
;;
esac
@@ -1,61 +0,0 @@
#!/usr/bin/env bash
# record-gif.sh — capture a frame sequence via agent-browser (CDP) and
# synthesize a GIF for embedding in a test report.
#
# Use this whenever the asserted behavior is about CHANGE OVER TIME —
# streaming output, a ticking timer, loading states, animations. A static
# screenshot cannot prove those; a GIF can. Cloud-portable: frames come from
# CDP rendering, no OS-level screen capture.
#
# Usage:
# record-gif.sh <output.gif> <duration_seconds> [fps]
#
# AB_TARGET="--cdp 9222" # Electron (default; CDP_PORT honored)
# AB_TARGET="--session lobehub-dev" # web agent-browser session
# GIF_WIDTH=960 # output width (px), default 960
#
# Requires ffmpeg (`brew install ffmpeg`). Effective fps is capped by
# screenshot latency (~0.3-0.5s per frame); 1-2 fps is the realistic range.
#
# Example — record a 12s run and embed it in the report:
# ./record-gif.sh "$DIR/assets/case2-tray-running.gif" 12 2 &
# GIF_PID=$!
# # ... trigger the streaming behavior ...
# wait $GIF_PID
set -euo pipefail
OUT="${1:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
DUR="${2:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
FPS="${3:-2}"
AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
GIF_WIDTH="${GIF_WIDTH:-960}"
command -v ffmpeg > /dev/null || {
echo "ffmpeg not found — install with: brew install ffmpeg" >&2
exit 1
}
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT
FRAMES=$((DUR * FPS))
INTERVAL=$(python3 -c "print(1 / $FPS)")
for i in $(seq -f '%04g' 1 "$FRAMES"); do
# shellcheck disable=SC2086
agent-browser $AB_TARGET screenshot "$TMP/frame-$i.png" > /dev/null 2>&1 || true
sleep "$INTERVAL"
done
CAPTURED=$(find "$TMP" -name 'frame-*.png' | wc -l | tr -d ' ')
[ "$CAPTURED" -gt 0 ] || {
echo "no frames captured — is the app reachable via $AB_TARGET?" >&2
exit 1
}
ffmpeg -y -loglevel error -framerate "$FPS" -pattern_type glob -i "$TMP/frame-*.png" \
-vf "fps=$FPS,scale=$GIF_WIDTH:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
"$OUT"
echo "$OUT ($CAPTURED frames @ ${FPS}fps)"
@@ -1,88 +0,0 @@
#!/usr/bin/env bash
# report-init.sh — scaffold a structured test report under .records/reports/.
#
# Format spec and evidence rules: ../references/report.md
#
# Usage:
# report-init.sh <slug> [title]
#
# Prints the report directory path (capture it: DIR=$(report-init.sh my-test)).
set -euo pipefail
SLUG="${1:?Usage: report-init.sh <slug> [title]}"
TITLE="${2:-$SLUG}"
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
TS="$(date +%Y%m%d-%H%M%S)"
DIR="$REPO_ROOT/.records/reports/$TS-$SLUG"
mkdir -p "$DIR/assets"
BRANCH=$(git -C "$REPO_ROOT" branch --show-current 2> /dev/null || echo "unknown")
COMMIT=$(git -C "$REPO_ROOT" rev-parse --short HEAD 2> /dev/null || echo "unknown")
DATE_HUMAN=$(date '+%Y-%m-%d %H:%M')
DATE_ISO=$(date '+%Y-%m-%dT%H:%M:%S%z')
cat > "$DIR/report.md" << EOF
# 测试报告:$TITLE
## 范围
<!-- 测试目标 / 变更范围 / 重点风险 -->
- 分支:\`$BRANCH\`
- 当前提交:\`$COMMIT\`
- 日期:$DATE_HUMAN
- 表面:<!-- CLI / Electron + CDP / Web / Bot:<platform> -->
- 测试页 / 入口:<!-- e.g. /settings or http://localhost:3010 -->
- 重点:<!-- 本轮最关心的体验、功能或回归点 -->
## 用例
| # | 用例 | 结果 | 关键现象 | 证据 |
| - | ---- | ---- | -------- | ---- |
| 1 | | 待测 | | ![用例 1](assets/case1.png) |
## 结论
整体结论:\`pending\`。
<!-- 用 1-2 段概括用户最需要知道的结果;失败和阻塞必须明确说明影响。 -->
仍需处理 / 跟进:
- <!-- TODO -->
## 本轮验证
<!-- 如有自动化或命令行验证,保留精简命令与结果;没有则写“未运行额外自动化验证”。 -->
\`\`\`bash
# command
\`\`\`
结果:
- <!-- TODO -->
## 评分
- 通过:0
- 失败:0
- 阻塞:0
- 评分:— / 100
EOF
cat > "$DIR/result.json" << EOF
{
"title": "$TITLE",
"createdAt": "$DATE_ISO",
"branch": "$BRANCH",
"commit": "$COMMIT",
"surfaces": [],
"cases": [],
"summary": { "total": 0, "passed": 0, "failed": 0, "blocked": 0, "verdict": "pending" }
}
EOF
echo "$DIR"
@@ -1,560 +0,0 @@
#!/usr/bin/env bash
# setup-auth.sh — one-stop auth setup & check for local agent testing.
#
# Auth is the gate for all automated testing: prepare it BEFORE writing any
# test step. Background and failure modes: ../references/auth.md
#
# Usage:
# setup-auth.sh status # check server + CLI + web + Electron readiness
# setup-auth.sh status --surface web # check only the Web surface gate
# setup-auth.sh cli-seed # configure CLI API-key auth from seeded local env
# setup-auth.sh cli # interactive CLI device-code login (run by a human)
# setup-auth.sh open-chrome # open SERVER_URL in Chrome and show DevTools
# setup-auth.sh web-seed # sign in seeded user and inject cookies automatically
# setup-auth.sh web # stdin = Cookie header -> inject into agent-browser session
# setup-auth.sh web-verify # live-check the agent-browser session is authenticated
#
# Env:
# SERVER_URL (default from test-env.sh) dev server under test
# SESSION (default lobehub-dev) agent-browser session name
# AUTH_DIR (default ~/.lobehub-agent-testing) where web state is persisted
# SEED_EMAIL / SEED_PASSWORD seeded better-auth login
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
workspace_root_for_port() {
local root="$REPO_ROOT"
local name
name="$(basename "$root")"
if [[ "$name" == "lobehub" ]]; then
local parent
parent="$(cd "$root/.." && pwd)"
local parent_name
parent_name="$(basename "$parent")"
if [[ "$parent_name" == lobehub-cloud* ]]; then
root="$parent"
fi
fi
printf '%s\n' "$root"
}
default_server_url() {
local env_resolver resolved
env_resolver="$(dirname "${BASH_SOURCE[0]}")/test-env.sh"
if [[ -x "$env_resolver" ]]; then
resolved="$("$env_resolver" --value SERVER_URL 2> /dev/null || true)"
if [[ -n "$resolved" ]]; then
printf '%s\n' "$resolved"
return 0
fi
fi
local root name suffix port
root="$(workspace_root_for_port)"
name="$(basename "$root")"
case "$name" in
lobehub-cloud)
port=3020
;;
lobehub-cloud-*)
suffix="${name#lobehub-cloud-}"
if [[ "$suffix" =~ ^[0-9]+$ ]]; then
port=$((3020 + 10#$suffix))
else
port=3010
fi
;;
*)
port=3010
;;
esac
printf 'http://localhost:%s\n' "$port"
}
SERVER_URL="${SERVER_URL:-$(default_server_url)}"
SESSION="${SESSION:-lobehub-dev}"
AUTH_DIR="${AUTH_DIR:-$HOME/.lobehub-agent-testing}"
STATE_FILE="$AUTH_DIR/web-state.json"
ROOT_ENV_FILE="$REPO_ROOT/.env"
CLI_HOME_NAME="${LOBEHUB_CLI_HOME:-.lobehub-dev}"
CLI_HOME="$HOME/${CLI_HOME_NAME#/}"
CLI_CREDENTIALS_FILE="$CLI_HOME/credentials.json"
SEED_EMAIL="${SEED_EMAIL:-agent-testing@lobehub.com}"
SEED_PASSWORD="${SEED_PASSWORD:-TestPassword123!}"
SEED_API_KEY="${SEED_API_KEY:-${AGENT_TESTING_API_KEY:-sk-lh-agenttesting0001}}"
CLI_ENV_FILE="${CLI_ENV_FILE:-$REPO_ROOT/.records/env/agent-testing-cli.env}"
ok() { printf ' \033[32m✔\033[0m %s\n' "$1"; }
bad() { printf ' \033[31m✘\033[0m %s\n' "$1"; }
note() { printf ' %s\n' "$1"; }
usage() {
cat << EOF
Usage:
$0 status [--surface all|cli|web|electron]
$0 cli-seed
$0 cli
$0 open-chrome [--dry-run]
$0 web-seed
$0 web
$0 web-verify
Env:
SERVER_URL=$SERVER_URL
SESSION=$SESSION
AUTH_DIR=$AUTH_DIR
SEED_EMAIL=$SEED_EMAIL
CLI_HOME=$CLI_HOME
EOF
}
check_server() {
local code
code=$(curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/" 2> /dev/null || true)
if [[ "$code" =~ ^[23] ]]; then
ok "dev server reachable at $SERVER_URL"
else
bad "dev server NOT reachable at $SERVER_URL (http_code='$code')"
note "start it: pnpm run dev:next (see references/dev-server.md)"
return 1
fi
}
check_cli() {
local api_key="${LOBEHUB_CLI_API_KEY:-${LOBE_API_KEY:-}}"
if [[ -n "$api_key" ]]; then
local body_file code
body_file="$(mktemp)"
code=$(curl -sS -o "$body_file" -w '%{http_code}' \
-H "Authorization: Bearer $api_key" \
"$SERVER_URL/api/v1/users/me?includeCount=0" 2> /dev/null || true)
if [[ "$code" =~ ^[23] ]]; then
rm -f "$body_file"
ok "CLI API-key auth valid for $SERVER_URL"
return 0
fi
bad "CLI API-key auth failed for $SERVER_URL (http_code='$code')"
note "seed the local API key first:"
note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user"
note "source $CLI_ENV_FILE"
rm -f "$body_file"
return 1
fi
if [[ -f "$CLI_HOME/settings.json" ]] && grep -q "$SERVER_URL" "$CLI_HOME/settings.json" && [[ -f "$CLI_CREDENTIALS_FILE" ]]; then
ok "CLI device-code credentials configured for $SERVER_URL (creds: $CLI_HOME)"
else
bad "CLI not logged in to $SERVER_URL"
note "automated path:"
note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user && source $CLI_ENV_FILE && $0 cli-seed"
note "interactive fallback:"
note "cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server $SERVER_URL"
return 1
fi
}
check_web() {
if [[ -f "$STATE_FILE" ]]; then
ok "web auth state saved ($STATE_FILE)"
else
bad "no web auth state for agent-browser"
note "for the seeded local user, run: $0 web-seed"
note "or copy the Cookie header from Chrome DevTools (Network tab), then:"
note "pbpaste | $0 web (see references/auth.md)"
return 1
fi
cmd_web_verify --skip-server-check
}
check_agent_browser() {
if command -v agent-browser > /dev/null 2>&1; then
ok "agent-browser available"
else
bad "agent-browser command not found"
note "install or expose agent-browser before Web/Electron UI testing"
return 1
fi
}
check_electron() {
local cdp_port="${CDP_PORT:-9222}"
if ! curl -s -o /dev/null --max-time 2 "http://localhost:$cdp_port/json/version" 2> /dev/null; then
note "electron: not running (CDP $cdp_port unreachable) — start with electron-dev.sh; check skipped"
return 0
fi
local probe result
probe="$(dirname "${BASH_SOURCE[0]}")/app-probe.sh"
result=$(bash "$probe" auth 2> /dev/null || true)
# agent-browser eval returns the JSON string with escaped quotes — normalize.
result="${result//\\/}"
if [[ "$result" == *'"isSignedIn":true'* ]]; then
ok "electron app signed in ($result)"
else
bad "electron app NOT signed in ($result)"
note "log in once manually inside the app (state persists across restarts)"
return 1
fi
}
cmd_status() {
local surface="all"
while [[ $# -gt 0 ]]; do
case "$1" in
--surface)
if [[ $# -lt 2 ]]; then
echo "--surface requires one of: all, cli, web, electron" >&2
return 2
fi
surface="${2:-}"
shift 2
;;
--surface=*)
surface="${1#*=}"
shift
;;
all|cli|web|electron)
surface="$1"
shift
;;
-h|--help)
usage
return 0
;;
*)
echo "unknown status option: $1" >&2
usage >&2
return 2
;;
esac
done
case "$surface" in
all|cli|web|electron) ;;
"")
echo "--surface requires one of: all, cli, web, electron" >&2
return 2
;;
*)
echo "unknown surface: $surface" >&2
usage >&2
return 2
;;
esac
echo "agent-testing auth status (surface=$surface, SERVER_URL=$SERVER_URL):"
local rc=0
case "$surface" in
all)
check_server || rc=1
check_cli || rc=1
check_web || rc=1
check_electron || rc=1
;;
cli)
check_server || rc=1
check_cli || rc=1
;;
web)
check_server || rc=1
check_web || rc=1
;;
electron)
check_electron || rc=1
;;
esac
if [[ $rc -eq 0 ]]; then
echo "$surface auth green — safe to start automated testing on this surface."
else
echo "$surface auth NOT ready — fix the ✘ items before writing any test step."
fi
return $rc
}
cmd_cli() {
echo "Starting CLI device-code login against $SERVER_URL ..."
echo "(opens a browser authorization — must be run by a human in a terminal)"
cd "$REPO_ROOT/apps/cli"
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server "$SERVER_URL"
}
write_cli_seed_env() {
mkdir -p "$(dirname "$CLI_ENV_FILE")"
cat > "$CLI_ENV_FILE" << EOF
# Source this file before running LobeHub CLI agent tests.
# Generated by setup-auth.sh cli-seed
export LOBE_API_KEY=$SEED_API_KEY
export LOBEHUB_CLI_API_KEY="\${LOBE_API_KEY}"
export LOBEHUB_SERVER=$SERVER_URL
export LOBEHUB_CLI_HOME=.lobehub-dev
EOF
}
write_cli_settings() {
mkdir -p "$CLI_HOME"
python3 - "$CLI_HOME/settings.json" "$SERVER_URL" << 'PY'
import json
import os
import sys
path, server_url = sys.argv[1], sys.argv[2]
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
json.dump({"serverUrl": server_url}, f, indent=2)
f.write("\n")
os.chmod(path, 0o600)
PY
}
cmd_cli_seed() {
check_server || return 1
write_cli_seed_env
write_cli_settings
ok "wrote CLI seed env: $CLI_ENV_FILE"
note "source it before CLI commands: source $CLI_ENV_FILE"
note "settings saved at: $CLI_HOME/settings.json"
LOBE_API_KEY="$SEED_API_KEY" LOBEHUB_CLI_API_KEY="$SEED_API_KEY" check_cli
}
cmd_open_chrome() {
local mode="${1:-}"
if [[ "$mode" != "" && "$mode" != "--dry-run" ]]; then
echo "unknown open-chrome option: $mode" >&2
usage >&2
return 2
fi
if [[ "$mode" == "--dry-run" ]]; then
echo "would open Google Chrome at $SERVER_URL/"
echo "would press Cmd+Option+I to open DevTools"
echo "would open DevTools command menu and run 'Show Network'"
return 0
fi
if [[ "$(uname -s)" != "Darwin" ]]; then
bad "open-chrome is macOS-only"
note "open $SERVER_URL/ in your browser and open DevTools manually"
return 1
fi
if ! command -v osascript > /dev/null 2>&1; then
bad "osascript not found"
note "open $SERVER_URL/ in Chrome and press Cmd+Option+I manually"
return 1
fi
SERVER_URL="$SERVER_URL" osascript << 'OSA'
set targetUrl to (system attribute "SERVER_URL") & "/"
tell application "Google Chrome"
activate
if (count of windows) = 0 then
make new window
end if
tell front window to make new tab with properties {URL:targetUrl}
end tell
delay 1
tell application "System Events"
tell process "Google Chrome"
set frontmost to true
keystroke "i" using {command down, option down}
delay 1
keystroke "p" using {command down, shift down}
delay 0.2
keystroke "Show Network"
key code 36
end tell
end tell
OSA
ok "opened Chrome at $SERVER_URL/ and requested DevTools Network panel"
}
cookie_header_from_jar() {
local jar="$1"
awk '
BEGIN { first = 1 }
/^$/ { next }
/^#/ {
if ($0 !~ /^#HttpOnly_/) next
sub(/^#HttpOnly_/, "")
}
NF >= 7 {
if (!first) printf "; "
printf "%s=%s", $6, $7
first = 0
}
END {
if (!first) printf "\n"
}
' "$jar"
}
# Build a Playwright storageState file from a raw Cookie header on stdin,
# keeping only the better-auth cookies. See references/auth.md for why the
# header must come from a Network request (HttpOnly) and why httpOnly=false.
cmd_web() {
mkdir -p "$AUTH_DIR"
local raw
raw="$(cat)"
COOKIE_INPUT="$raw" python3 - "$STATE_FILE" << 'PY'
import json, os, sys, time
raw = os.environ.get("COOKIE_INPUT", "").strip()
cookie_lines = []
for line in raw.splitlines():
stripped = line.strip()
if not stripped:
continue
if stripped.lower().startswith("cookie:"):
cookie_lines.append(stripped.split(":", 1)[1].strip())
else:
cookie_lines.append(stripped)
raw = "; ".join(cookie_lines)
WANTED = {"better-auth.session_token", "better-auth.session_data", "better-auth.state"}
exp = int(time.time()) + 30 * 24 * 3600 # 30 days
cookies = []
for pair in raw.split(";"):
pair = pair.strip()
if "=" not in pair:
continue
name, _, value = pair.partition("=")
if name not in WANTED:
continue
cookies.append({
"name": name,
"value": value,
"domain": "localhost",
"path": "/",
"expires": exp,
"httpOnly": False,
"secure": False,
"sameSite": "Lax",
})
if not cookies:
sys.stderr.write("no better-auth cookies found in input — paste the raw Cookie header from a Network request\n")
sys.exit(1)
with open(sys.argv[1], "w") as f:
json.dump({"cookies": cookies, "origins": []}, f, indent=2)
print(f"wrote {len(cookies)} cookie(s) to {sys.argv[1]}")
PY
cmd_web_verify
}
cmd_web_seed() {
check_server || return 1
mkdir -p "$AUTH_DIR"
local cookie_jar="$AUTH_DIR/web-seed-cookie.jar"
local response_body="$AUTH_DIR/web-seed-response.json"
local payload code
payload="$(
SEED_EMAIL="$SEED_EMAIL" SEED_PASSWORD="$SEED_PASSWORD" python3 - << 'PY'
import json
import os
print(json.dumps({
"callbackURL": "/",
"email": os.environ["SEED_EMAIL"],
"password": os.environ["SEED_PASSWORD"],
}))
PY
)"
code=$(curl -sS -o "$response_body" -w '%{http_code}' \
-c "$cookie_jar" \
-H 'Content-Type: application/json' \
-X POST "$SERVER_URL/api/auth/sign-in/email" \
--data "$payload" 2> /dev/null || true)
if [[ ! "$code" =~ ^[23] ]]; then
bad "seed user sign-in failed at $SERVER_URL/api/auth/sign-in/email (http_code='$code')"
if [[ -f "$ROOT_ENV_FILE" ]]; then
note "root .env exists; do not seed or modify this DB for Web auth."
note "Use Chrome Cookie injection instead: $0 open-chrome, then pbpaste | $0 web"
else
note "make sure the seed user exists:"
note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user"
fi
return 1
fi
local cookie_header
cookie_header="$(cookie_header_from_jar "$cookie_jar")"
if [[ -z "$cookie_header" ]]; then
bad "seed sign-in succeeded but no cookies were written to $cookie_jar"
return 1
fi
printf '%s\n' "$cookie_header" | cmd_web
}
cmd_web_verify() {
local skip_server_check="${1:-}"
if [[ "$skip_server_check" != "--skip-server-check" ]]; then
check_server || return 1
fi
if [[ ! -f "$STATE_FILE" ]]; then
bad "no web auth state for agent-browser"
note "for the seeded local user, run: $0 web-seed"
note "or copy the Cookie header from Chrome DevTools (Network tab), then:"
note "pbpaste | $0 web"
return 1
fi
check_agent_browser || return 1
if ! agent-browser --session "$SESSION" state load "$STATE_FILE" > /dev/null; then
bad "failed to load web auth state into agent-browser session '$SESSION'"
return 1
fi
if ! agent-browser --session "$SESSION" open "$SERVER_URL/" > /dev/null; then
bad "failed to open $SERVER_URL in agent-browser session '$SESSION'"
return 1
fi
agent-browser --session "$SESSION" wait --load networkidle > /dev/null 2>&1 || true
local url
url=$(agent-browser --session "$SESSION" get url 2> /dev/null || true)
if [[ -z "$url" ]]; then
bad "agent-browser session '$SESSION' did not report a current URL"
return 1
fi
if [[ "$url" == *"/signin"* || "$url" == *"/login"* ]]; then
bad "agent-browser session '$SESSION' NOT authenticated (landed on $url)"
note "re-copy the Cookie header and re-run: pbpaste | $0 web"
return 1
fi
ok "agent-browser session '$SESSION' authenticated (at $url)"
}
case "${1:-status}" in
status)
shift || true
cmd_status "$@"
;;
cli-seed) cmd_cli_seed ;;
cli) cmd_cli ;;
open-chrome)
shift || true
cmd_open_chrome "$@"
;;
web-seed) cmd_web_seed ;;
web) cmd_web ;;
web-verify) cmd_web_verify ;;
-h|--help) usage ;;
*)
echo "Usage: $0 {status|cli-seed|cli|open-chrome|web-seed|web|web-verify}" >&2
exit 2
;;
esac
@@ -1,197 +0,0 @@
#!/usr/bin/env bash
# Smoke tests for setup-auth.sh. Uses a temporary agent-browser stub and local
# HTTP server, so it does not need real browser auth.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT="$SCRIPT_DIR/setup-auth.sh"
fail() {
echo "FAIL: $*" >&2
exit 1
}
assert_contains() {
local file="$1"
local text="$2"
grep -Fq "$text" "$file" || fail "expected '$text' in $file"
}
tmp_dir="$(mktemp -d)"
server_pid=""
cleanup() {
if [[ -n "$server_pid" ]]; then
kill "$server_pid" > /dev/null 2>&1 || true
wait "$server_pid" > /dev/null 2>&1 || true
fi
rm -rf "$tmp_dir"
}
trap cleanup EXIT
export HOME="$tmp_dir/home"
port="$(python3 - << 'PY'
import socket
sock = socket.socket()
sock.bind(("127.0.0.1", 0))
print(sock.getsockname()[1])
sock.close()
PY
)"
python3 - "$port" << 'PY' > "$tmp_dir/http.log" 2>&1 &
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import sys
class Handler(BaseHTTPRequestHandler):
def do_GET(self):
if self.path.startswith("/api/v1/users/me"):
if self.headers.get("authorization") != "Bearer sk-lh-agenttesting0001":
self.send_response(401)
self.end_headers()
self.wfile.write(b'{"success":false}')
return
self.send_response(200)
self.send_header("Content-Type", "application/json")
self.end_headers()
self.wfile.write(b'{"success":true,"data":{"id":"user_agent_testing_001"}}')
return
self.send_response(200)
self.end_headers()
self.wfile.write(b"ok")
def do_POST(self):
length = int(self.headers.get("content-length") or "0")
if length:
self.rfile.read(length)
if self.path != "/api/auth/sign-in/email":
self.send_response(404)
self.end_headers()
return
self.send_response(200)
self.send_header(
"Set-Cookie",
"better-auth.session_token=seed.token; Path=/; HttpOnly; SameSite=Lax",
)
self.send_header(
"Set-Cookie",
"better-auth.session_data=seed.data; Path=/; HttpOnly; SameSite=Lax",
)
self.send_header("Content-Type", "application/json")
self.end_headers()
self.wfile.write(b'{"ok":true}')
def log_message(self, format, *args):
return
ThreadingHTTPServer(("localhost", int(sys.argv[1])), Handler).serve_forever()
PY
server_pid="$!"
server_url="http://localhost:$port"
for _ in {1..50}; do
if curl -s -o /dev/null "$server_url/"; then
break
fi
sleep 0.1
done
curl -s -o /dev/null "$server_url/" || fail "test HTTP server did not start"
mkdir -p "$tmp_dir/bin" "$tmp_dir/auth"
cat > "$tmp_dir/bin/agent-browser" << 'SH'
#!/usr/bin/env bash
set -euo pipefail
if [[ "${1:-}" == "--session" ]]; then
shift 2
fi
case "${1:-}" in
state)
[[ "${2:-}" == "load" ]] || exit 2
[[ -f "${3:-}" ]] || exit 1
;;
open)
printf '%s\n' "${2:-}" > "${AGENT_BROWSER_URL_FILE:?}"
;;
get)
[[ "${2:-}" == "url" ]] || exit 2
cat "${AGENT_BROWSER_URL_FILE:?}"
;;
*)
echo "unexpected agent-browser command: $*" >&2
exit 2
;;
esac
SH
chmod +x "$tmp_dir/bin/agent-browser"
export PATH="$tmp_dir/bin:$PATH"
export AUTH_DIR="$tmp_dir/auth"
export SESSION="setup-auth-test"
export SERVER_URL="$server_url"
export AGENT_BROWSER_URL_FILE="$tmp_dir/current-url"
cookie_header="Cookie: foo=bar; better-auth.session_token=test.token; better-auth.session_data=encoded%3D; theme=dark"
printf '%s\n' "$cookie_header" | "$SCRIPT" web > "$tmp_dir/web.out"
python3 - "$AUTH_DIR/web-state.json" << 'PY'
import json, sys
with open(sys.argv[1]) as f:
state = json.load(f)
names = {cookie["name"] for cookie in state["cookies"]}
expected = {"better-auth.session_token", "better-auth.session_data"}
if names != expected:
raise SystemExit(f"unexpected cookies: {sorted(names)}")
PY
"$SCRIPT" web-seed > "$tmp_dir/web-seed.out"
python3 - "$AUTH_DIR/web-state.json" << 'PY'
import json, sys
with open(sys.argv[1]) as f:
state = json.load(f)
values = {cookie["name"]: cookie["value"] for cookie in state["cookies"]}
expected = {
"better-auth.session_token": "seed.token",
"better-auth.session_data": "seed.data",
}
if values != expected:
raise SystemExit(f"unexpected seeded cookies: {values}")
PY
"$SCRIPT" status --surface web > "$tmp_dir/status.out"
assert_contains "$tmp_dir/status.out" "surface=web"
assert_contains "$tmp_dir/status.out" "web auth green"
"$SCRIPT" cli-seed > "$tmp_dir/cli-seed.out"
assert_contains "$tmp_dir/cli-seed.out" "CLI API-key auth valid"
assert_contains "$tmp_dir/cli-seed.out" "settings saved at: $HOME/.lobehub-dev/settings.json"
if "$SCRIPT" status --surface cli > "$tmp_dir/cli-no-env.out"; then
fail "cli status without API key unexpectedly passed"
fi
assert_contains "$tmp_dir/cli-no-env.out" "CLI not logged in"
LOBEHUB_CLI_API_KEY=sk-lh-agenttesting0001 "$SCRIPT" status --surface cli > "$tmp_dir/cli-status.out"
assert_contains "$tmp_dir/cli-status.out" "CLI API-key auth valid"
assert_contains "$tmp_dir/cli-status.out" "cli auth green"
if printf 'foo=bar\n' | "$SCRIPT" web > "$tmp_dir/invalid.out" 2> "$tmp_dir/invalid.err"; then
fail "invalid cookie unexpectedly passed"
fi
assert_contains "$tmp_dir/invalid.err" "no better-auth cookies found"
echo "setup-auth tests passed"
@@ -1,377 +0,0 @@
#!/usr/bin/env bash
# Print the resolved local test environment for agent-testing.
#
# This is intentionally read-only. It mirrors scripts/runWithEnv.mts precedence:
# .env -> .env.$NODE_ENV -> .env.local -> .env.$NODE_ENV.local, then shell env.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
NODE_ENV="${NODE_ENV:-development}"
VALUE_APP_URL=""
VALUE_PORT=""
VALUE_SERVER_URL=""
VALUE_AUTH_TRUSTED_ORIGINS=""
VALUE_SPA_PORT=""
VALUE_MOBILE_SPA_PORT=""
VALUE_DESKTOP_PORT=""
SOURCE_APP_URL=""
SOURCE_PORT=""
SOURCE_SERVER_URL=""
SOURCE_AUTH_TRUSTED_ORIGINS=""
SOURCE_SPA_PORT=""
SOURCE_MOBILE_SPA_PORT=""
SOURCE_DESKTOP_PORT=""
LOADED_ENV_FILES=""
keys() {
printf '%s\n' \
APP_URL \
PORT \
SERVER_URL \
AUTH_TRUSTED_ORIGINS \
SPA_PORT \
MOBILE_SPA_PORT \
DESKTOP_PORT
}
trim() {
local value="$1"
value="${value#"${value%%[![:space:]]*}"}"
value="${value%"${value##*[![:space:]]}"}"
printf '%s' "$value"
}
workspace_root() {
local root="$REPO_ROOT"
local name
name="$(basename "$root")"
if [[ "$name" == "lobehub" ]]; then
local parent parent_name
parent="$(cd "$root/.." && pwd)"
parent_name="$(basename "$parent")"
if [[ "$parent_name" == lobehub-cloud* ]]; then
root="$parent"
fi
fi
printf '%s\n' "$root"
}
workspace_offset() {
local name="$1"
case "$name" in
lobehub-cloud)
printf '0\n'
;;
lobehub-cloud-*)
local suffix="${name#lobehub-cloud-}"
if [[ "$suffix" =~ ^[0-9]+$ ]]; then
printf '%s\n' "$((10#$suffix))"
else
printf '\n'
fi
;;
*)
printf '\n'
;;
esac
}
default_port() {
local base="$1"
local fallback="$2"
local root name offset
root="$(workspace_root)"
name="$(basename "$root")"
offset="$(workspace_offset "$name")"
if [[ -n "$offset" ]]; then
printf '%s\n' "$((base + offset))"
else
printf '%s\n' "$fallback"
fi
}
url_port() {
local url="$1"
local hostport
hostport="${url#*://}"
hostport="${hostport%%/*}"
if [[ "$hostport" == *:* ]]; then
local port="${hostport##*:}"
if [[ "$port" =~ ^[0-9]+$ ]]; then
printf '%s\n' "$port"
return 0
fi
fi
return 1
}
url_origin() {
local url="$1"
local scheme rest hostport
if [[ "$url" == *"://"* ]]; then
scheme="${url%%://*}"
rest="${url#*://}"
hostport="${rest%%/*}"
printf '%s://%s\n' "$scheme" "$hostport"
else
printf '%s\n' "$url"
fi
}
set_value() {
local key="$1"
local value="$2"
local source="$3"
case "$key" in
APP_URL) VALUE_APP_URL="$value"; SOURCE_APP_URL="$source" ;;
PORT) VALUE_PORT="$value"; SOURCE_PORT="$source" ;;
SERVER_URL) VALUE_SERVER_URL="$value"; SOURCE_SERVER_URL="$source" ;;
AUTH_TRUSTED_ORIGINS) VALUE_AUTH_TRUSTED_ORIGINS="$value"; SOURCE_AUTH_TRUSTED_ORIGINS="$source" ;;
SPA_PORT) VALUE_SPA_PORT="$value"; SOURCE_SPA_PORT="$source" ;;
MOBILE_SPA_PORT) VALUE_MOBILE_SPA_PORT="$value"; SOURCE_MOBILE_SPA_PORT="$source" ;;
DESKTOP_PORT) VALUE_DESKTOP_PORT="$value"; SOURCE_DESKTOP_PORT="$source" ;;
esac
}
value_for() {
case "$1" in
APP_URL) printf '%s\n' "$VALUE_APP_URL" ;;
PORT) printf '%s\n' "$VALUE_PORT" ;;
SERVER_URL) printf '%s\n' "$VALUE_SERVER_URL" ;;
AUTH_TRUSTED_ORIGINS) printf '%s\n' "$VALUE_AUTH_TRUSTED_ORIGINS" ;;
SPA_PORT) printf '%s\n' "$VALUE_SPA_PORT" ;;
MOBILE_SPA_PORT) printf '%s\n' "$VALUE_MOBILE_SPA_PORT" ;;
DESKTOP_PORT) printf '%s\n' "$VALUE_DESKTOP_PORT" ;;
esac
}
source_for() {
case "$1" in
APP_URL) printf '%s\n' "$SOURCE_APP_URL" ;;
PORT) printf '%s\n' "$SOURCE_PORT" ;;
SERVER_URL) printf '%s\n' "$SOURCE_SERVER_URL" ;;
AUTH_TRUSTED_ORIGINS) printf '%s\n' "$SOURCE_AUTH_TRUSTED_ORIGINS" ;;
SPA_PORT) printf '%s\n' "$SOURCE_SPA_PORT" ;;
MOBILE_SPA_PORT) printf '%s\n' "$SOURCE_MOBILE_SPA_PORT" ;;
DESKTOP_PORT) printf '%s\n' "$SOURCE_DESKTOP_PORT" ;;
esac
}
is_tracked_key() {
case "$1" in
APP_URL|PORT|SERVER_URL|AUTH_TRUSTED_ORIGINS|SPA_PORT|MOBILE_SPA_PORT|DESKTOP_PORT) return 0 ;;
*) return 1 ;;
esac
}
parse_env_file() {
local file="$1"
local root="$2"
local label="${file#$root/}"
local line key value
[[ -f "$file" ]] || return 0
if [[ -z "$LOADED_ENV_FILES" ]]; then
LOADED_ENV_FILES="$label"
else
LOADED_ENV_FILES="$LOADED_ENV_FILES, $label"
fi
while IFS= read -r line || [[ -n "$line" ]]; do
line="$(trim "$line")"
[[ -z "$line" || "$line" == \#* ]] && continue
if [[ "$line" == export[[:space:]]* ]]; then
line="$(trim "${line#export}")"
fi
[[ "$line" == *=* ]] || continue
key="$(trim "${line%%=*}")"
value="$(trim "${line#*=}")"
is_tracked_key "$key" || continue
if [[ "$value" == \"*\" && "$value" == *\" && ${#value} -ge 2 ]]; then
value="${value:1:${#value}-2}"
elif [[ "$value" == \'* && "$value" == *\' && ${#value} -ge 2 ]]; then
value="${value:1:${#value}-2}"
fi
set_value "$key" "$value" "$label"
done < "$file"
}
apply_env_files() {
local root="$1"
parse_env_file "$root/.env" "$root"
parse_env_file "$root/.env.$NODE_ENV" "$root"
parse_env_file "$root/.env.local" "$root"
parse_env_file "$root/.env.$NODE_ENV.local" "$root"
}
apply_shell_overrides() {
local key value
while IFS= read -r key; do
if [[ -n "${!key+x}" ]]; then
value="${!key}"
set_value "$key" "$value" "shell"
fi
done < <(keys)
}
resolve_defaults() {
local app_port spa_port mobile_spa_port desktop_port
app_port="$(default_port 3020 3010)"
spa_port="$(default_port 9800 9876)"
mobile_spa_port="$(default_port 3810 3012)"
desktop_port="$(default_port 3030 3015)"
if [[ -z "$VALUE_APP_URL" ]]; then
set_value APP_URL "http://localhost:$app_port" "inferred"
fi
if [[ -z "$VALUE_PORT" ]]; then
if app_port="$(url_port "$VALUE_APP_URL")"; then
set_value PORT "$app_port" "inferred from APP_URL"
else
set_value PORT "$(default_port 3020 3010)" "inferred"
fi
fi
if [[ -z "$VALUE_SERVER_URL" ]]; then
set_value SERVER_URL "$VALUE_APP_URL" "from APP_URL"
fi
if [[ -z "$VALUE_SPA_PORT" ]]; then
set_value SPA_PORT "$spa_port" "inferred"
fi
if [[ -z "$VALUE_MOBILE_SPA_PORT" ]]; then
set_value MOBILE_SPA_PORT "$mobile_spa_port" "inferred"
fi
if [[ -z "$VALUE_DESKTOP_PORT" ]]; then
set_value DESKTOP_PORT "$desktop_port" "inferred"
fi
if [[ -z "$VALUE_AUTH_TRUSTED_ORIGINS" ]]; then
set_value AUTH_TRUSTED_ORIGINS "$(url_origin "$VALUE_APP_URL"),http://localhost:$VALUE_SPA_PORT" "inferred"
fi
}
contains_origin() {
local list="$1"
local expected="$2"
local item
IFS=',' read -r -a items <<< "$list"
for item in "${items[@]}"; do
item="$(trim "$item")"
[[ "$item" == "$expected" ]] && return 0
done
return 1
}
print_exports() {
local key value
while IFS= read -r key; do
value="$(value_for "$key")"
printf 'export %s=%q\n' "$key" "$value"
done < <(keys)
}
print_value() {
local key="$1"
if ! is_tracked_key "$key"; then
echo "unknown key: $key" >&2
exit 2
fi
value_for "$key"
}
print_human() {
local root="$1"
local key value source
echo "agent-testing test env:"
printf ' workspace: %s\n' "$root"
printf ' NODE_ENV: %s\n' "$NODE_ENV"
printf ' env files: %s\n' "${LOADED_ENV_FILES:-none}"
echo
echo "resolved values:"
while IFS= read -r key; do
value="$(value_for "$key")"
source="$(source_for "$key")"
printf ' %-22s %s (%s)\n' "$key=$value" "" "$source"
done < <(keys)
echo
echo "checks:"
local app_origin spa_origin app_port
app_origin="$(url_origin "$VALUE_APP_URL")"
spa_origin="http://localhost:$VALUE_SPA_PORT"
if app_port="$(url_port "$VALUE_APP_URL")" && [[ "$app_port" == "$VALUE_PORT" ]]; then
printf ' OK PORT matches APP_URL (%s)\n' "$VALUE_PORT"
else
printf ' WARN PORT (%s) does not match APP_URL (%s)\n' "$VALUE_PORT" "$VALUE_APP_URL"
fi
if contains_origin "$VALUE_AUTH_TRUSTED_ORIGINS" "$app_origin"; then
printf ' OK AUTH_TRUSTED_ORIGINS includes %s\n' "$app_origin"
else
printf ' WARN AUTH_TRUSTED_ORIGINS is missing %s\n' "$app_origin"
fi
if contains_origin "$VALUE_AUTH_TRUSTED_ORIGINS" "$spa_origin"; then
printf ' OK AUTH_TRUSTED_ORIGINS includes %s\n' "$spa_origin"
else
printf ' WARN AUTH_TRUSTED_ORIGINS is missing %s\n' "$spa_origin"
fi
}
usage() {
cat << EOF
Usage:
$0 # print resolved test environment
$0 --exports # print source-able export lines
$0 --value KEY # print one resolved value
Tracked keys:
APP_URL PORT SERVER_URL AUTH_TRUSTED_ORIGINS SPA_PORT MOBILE_SPA_PORT DESKTOP_PORT
EOF
}
ROOT="$(workspace_root)"
apply_env_files "$ROOT"
apply_shell_overrides
resolve_defaults
case "${1:-}" in
"")
print_human "$ROOT"
;;
--exports)
print_exports
;;
--value)
print_value "${2:-}"
;;
-h|--help)
usage
;;
*)
echo "unknown option: $1" >&2
usage >&2
exit 2
;;
esac
@@ -1,57 +0,0 @@
#!/usr/bin/env bash
# Smoke tests for test-env.sh.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
fail() {
echo "FAIL: $*" >&2
exit 1
}
assert_eq() {
local actual="$1"
local expected="$2"
[[ "$actual" == "$expected" ]] || fail "expected '$expected', got '$actual'"
}
assert_contains() {
local file="$1"
local text="$2"
grep -Fq "$text" "$file" || fail "expected '$text' in $file"
}
tmp_dir="$(mktemp -d)"
trap 'rm -rf "$tmp_dir"' EXIT
mkdir -p "$tmp_dir/lobehub-cloud-1/.agents/skills" "$tmp_dir/lobehub/.agents/skills"
ln -s "$SCRIPT_DIR/.." "$tmp_dir/lobehub-cloud-1/.agents/skills/agent-testing"
ln -s "$SCRIPT_DIR/.." "$tmp_dir/lobehub/.agents/skills/agent-testing"
cloud_script="$tmp_dir/lobehub-cloud-1/.agents/skills/agent-testing/scripts/test-env.sh"
oss_script="$tmp_dir/lobehub/.agents/skills/agent-testing/scripts/test-env.sh"
assert_eq "$("$cloud_script" --value SERVER_URL)" "http://localhost:3021"
assert_eq "$("$cloud_script" --value SPA_PORT)" "9801"
assert_eq "$("$cloud_script" --value MOBILE_SPA_PORT)" "3811"
assert_eq "$("$cloud_script" --value DESKTOP_PORT)" "3031"
assert_eq "$("$oss_script" --value SERVER_URL)" "http://localhost:3010"
cat > "$tmp_dir/lobehub-cloud-1/.env" << 'EOF'
APP_URL=http://localhost:4123
PORT=4123
AUTH_TRUSTED_ORIGINS=http://localhost:4123,http://localhost:9823
SPA_PORT=9823
MOBILE_SPA_PORT=3823
DESKTOP_PORT=3043
EOF
assert_eq "$("$cloud_script" --value SERVER_URL)" "http://localhost:4123"
assert_eq "$("$cloud_script" --value SPA_PORT)" "9823"
"$cloud_script" --exports > "$tmp_dir/exports.out"
assert_contains "$tmp_dir/exports.out" "export APP_URL=http://localhost:4123"
assert_contains "$tmp_dir/exports.out" "export SERVER_URL=http://localhost:4123"
assert_contains "$tmp_dir/exports.out" "export AUTH_TRUSTED_ORIGINS=http://localhost:4123\\,http://localhost:9823"
echo "test-env tests passed"
-154
View File
@@ -1,154 +0,0 @@
# Electron (LobeHub Desktop) UI Testing
Default surface for verifying **pure frontend changes** (components, store logic, styles, interactions) in the primary product shape. Drives the Electron renderer over CDP with `agent-browser` — see [../references/agent-browser.md](../references/agent-browser.md) for the full command reference.
**Auth**: the Electron app keeps its own persistent login state — log in once manually in the app; sessions survive restarts. Run `../scripts/setup-auth.sh status` before testing (see [../references/auth.md](../references/auth.md)).
**Linux / headless (cloud)**: Electron itself runs on Linux, but it has no true headless mode — it needs a display server. In a headless environment wrap the launch with `xvfb-run` (virtual framebuffer). Everything CDP-based keeps working under Xvfb: the `agent-browser --cdp 9222` connection, snapshots, eval, and `agent-browser screenshot` (captured from the renderer via CDP, not the OS screen). What does NOT work on Linux: `capture-app-window.sh` (macOS `screencapture`), osascript, and the ffmpeg recording scripts in their current form.
### Setup / Teardown
Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
```bash
SCRIPT=".agents/skills/agent-testing/scripts/electron-dev.sh"
# Start Electron dev with CDP (idempotent — skips if already running)
$SCRIPT start
# Check if Electron is running and CDP is reachable
$SCRIPT status
# Kill all Electron-related processes (main + helper + vite)
$SCRIPT stop
# Force fresh restart
$SCRIPT restart
```
After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
**Always run `$SCRIPT stop` when done testing**`pkill -f "Electron"` alone won't catch all helper processes.
#### Environment Variables
| Variable | Default | Description |
| ----------------- | ----------------------- | ---------------------------------------- |
| `CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `ELECTRON_LOG` | `/tmp/electron-dev.log` | Electron process log |
| `ELECTRON_WAIT_S` | `60` | Max seconds to wait for Electron process |
| `RENDERER_WAIT_S` | `60` | Max seconds to wait for SPA to load |
### LobeHub Probes & Quick Navigation
`scripts/app-probe.sh` is the standard fast path into app state — **use it
instead of hand-rolling `__LOBE_STORES` eval snippets** for these common needs:
```bash
PROBE=".agents/skills/agent-testing/scripts/app-probe.sh"
$PROBE auth # login check (Step 0.3) → { isSignedIn, userId }
$PROBE route # current SPA route
$PROBE ops # running chat operations (type / startTime)
$PROBE goto /settings # jump the SPA straight to a route (full reload)
$PROBE errors-install # install console.error interceptor
$PROBE errors # dump captured errors
```
`goto` lets a test enter the state under test directly instead of clicking
through the UI. Common desktop routes:
| Route | Where it lands |
| ----------------------------- | ------------------------------------ |
| `/` | Home (has a chat input) |
| `/agent/<agentId>` | Agent conversation (latest topic) |
| `/agent/<agentId>/<topicId>` | Specific topic in a conversation |
| `/task` · `/task/<taskId>` | Task list / task detail |
| `/page` | Documents (文稿) |
| `/settings` | Settings |
| `/community` | Discover / community |
Targets default to Electron (`--cdp 9222`); set `AB_TARGET="--session <name>"`
for web sessions. For deeper or one-off state inspection, fall back to raw
eval below.
### LobeHub-Specific Patterns
#### Access Zustand Store State
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
return JSON.stringify({
ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
activeAgent: chat.activeAgentId,
activeTopic: chat.activeTopicId,
});
})()
EVALEOF
```
#### Find and Use the Chat Input
```bash
# The chat input is contenteditable — must use -C flag
agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
agent-browser --cdp 9222 click @e48
agent-browser --cdp 9222 type @e48 "Hello world"
agent-browser --cdp 9222 press Enter
```
#### Wait for Agent to Complete
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
var running = ops.filter(function(o) { return o.status === 'running'; });
return running.length === 0 ? 'done' : 'running: ' + running.length;
})()
EVALEOF
```
#### Install Error Interceptor
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function() {
var msg = Array.from(arguments).map(function(a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
# Later, check captured errors:
agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
```
## Electron Gotchas
- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
- **Dev build auto-opens DevTools, which hijacks the CDP target** — `agent-browser --cdp 9222` may attach to the DevTools page (`devtools://…`) instead of the app (`app://renderer/`). Symptom: `get url` returns a `devtools://` URL. Fix: close the DevTools target and reconnect:
```bash
DT_ID=$(curl -s http://localhost:9222/json/list | python3 -c "import json,sys; ts=json.load(sys.stdin); print(next(t['id'] for t in ts if t['type']=='page' and t['url'].startswith('devtools://')))")
curl -s "http://localhost:9222/json/close/$DT_ID" > /dev/null
agent-browser close --all && agent-browser --cdp 9222 get url # expect app://renderer/
```
- **Don't resize the Electron window after load** — resizing triggers full SPA reload
- **Store is at `window.__LOBE_STORES`** not `window.__ZUSTAND_STORES__`
- **Streaming / ticking UI needs GIF evidence** — see `scripts/record-gif.sh`; a static screenshot cannot prove time-based behavior.
-78
View File
@@ -1,78 +0,0 @@
# Web (Full-Stack) Testing
Default surface for **full-stack changes** — a new/changed API plus the UI that
consumes it. The browser is the one surface where network requests and UI state
are observable together, so you can assert both sides of the contract in a
single run.
For pure-frontend changes prefer [electron.md](./electron.md); for
backend-only changes prefer [../cli/index.md](../cli/index.md).
## Prerequisites
- Complete [Step 0.0](../SKILL.md#00-resolve-the-current-test-environment) (resolve ports) and [Step -1](../SKILL.md#step--1--plan-approval-for-non-trivial-tests) (plan approval) first.
- Local dev server running — [../references/dev-server.md](../references/dev-server.md)
- Web auth verified in agent-browser — prefer `setup-auth.sh web-seed`, see [auth decision flow](../references/auth.md#web--decision-flow).
## Option A — agent-browser with seeded auth (recommended)
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/setup-auth.sh web-seed
```
Then drive the verified session:
```bash
SESSION=lobehub-dev
agent-browser --session $SESSION open "$SERVER_URL/"
agent-browser --session $SESSION snapshot -i
# interact via refs — full command reference: ../references/agent-browser.md
```
Use this session as the evidence source. Do not use ordinary Chrome screenshots
or Chrome Network records as proof for Web tests; ordinary Chrome is only a
fallback source for copying cookies into agent-browser when the seeded login is
not available.
### Watch the API while driving the UI
```bash
# After triggering the UI action under test:
agent-browser --session $SESSION network requests --type xhr,fetch
agent-browser --session $SESSION network requests --method POST
# Record a full HAR for the report
agent-browser --session $SESSION network har start
# ... drive the scenario ...
agent-browser --session $SESSION network har stop ./capture.har
```
Assert both layers: the request/response shape (network) and the rendered
result (snapshot/screenshot). Both belong in the report as evidence.
## Option B — real Chrome with remote debugging
For flows that need a real, visible browser (e.g. exercising the login UI
itself):
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-test-profile \
"<URL>" &
sleep 5
agent-browser --cdp 9222 snapshot -i
# Or auto-discover running Chrome with remote debugging
agent-browser --auto-connect snapshot -i
```
## Option C — Debug Proxy (local frontend, production backend)
`bun run dev:spa` prints a **Debug Proxy** URL
(`https://app.lobehub.com/_dangerous_local_dev_proxy?debug-host=…`) that loads
your local Vite SPA inside the online environment — HMR against real server
config. Useful for verifying frontend behavior against production data, **not**
for testing backend changes (the backend is production, not your branch).
+172
View File
@@ -0,0 +1,172 @@
---
name: cli-backend-testing
description: >
CLI + Backend integration testing workflow. Use when verifying backend API changes
(TRPC routers, services, models) via the LobeHub CLI against a local dev server.
Triggers on 'cli test', 'test with cli', 'verify with cli', 'local cli test',
'backend test with cli', or when needing to validate server-side changes end-to-end.
---
# CLI + Backend Integration Testing
Standard workflow for verifying backend changes using the LobeHub CLI (`lh`) against a local dev server.
## When to Use
- Verifying TRPC router / service / model changes end-to-end
- Testing new API fields or response structure changes
- Validating CLI command output after backend modifications
- Debugging data flow issues between server and CLI
## Prerequisites
| Requirement | Details |
| ------------ | ------------------------------------------------------------- |
| Dev server | `localhost:3011` (Next.js) |
| CLI source | `lobehub/apps/cli/` |
| CLI dev mode | Uses `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated credentials |
| Auth | Device Code Flow login to local server |
## Quick Reference
All CLI dev commands run from `lobehub/apps/cli/`. Subsequent examples use `$CLI`:
```bash
CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
```
## Workflow
### Step 1: Ensure Dev Server is Running
```bash
curl -s -o /dev/null -w '%{http_code}' http://localhost:3011/ 2> /dev/null
```
- **If reachable**: skip to Step 2.
- **If unreachable**: start from cloud repo root:
```bash
pnpm run dev:next
```
To **restart** (pick up server-side code changes):
```bash
lsof -ti:3011 | xargs kill
pnpm run dev:next
```
**Important:** Server-side code changes in the submodule (`lobehub/apps/server/src/`, `lobehub/src/server/`, `lobehub/packages/`) require a server restart. Next.js hot-reload may not pick up changes in submodule packages.
### Step 2: Check CLI Authentication
```bash
cat lobehub/apps/cli/.lobehub-dev/settings.json 2> /dev/null
```
- **If file exists and contains `"serverUrl": "http://localhost:3011"`**: skip to Step 3.
- **If missing or wrong server**: ask the user to run:
```bash
! cd lobehub/apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3011
```
> Login requires interactive browser authorization (OIDC Device Code Flow), so the user must run it themselves via `!` prefix. Credentials persist in `lobehub/apps/cli/.lobehub-dev/`.
### Step 3: Test with CLI Commands
CLI runs from source, so CLI-side code changes take effect immediately without rebuilding.
```bash
cd lobehub/apps/cli
$CLI <command>
```
### Step 4: Clean Up Test Data
```bash
$CLI task delete < id > -y
$CLI agent delete < id > -y
```
## Common Testing Patterns
### Task System
```bash
$CLI task list
$CLI task create -n "Root Task" -i "Test instruction"
$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
$CLI task view T-1
$CLI task tree T-1
$CLI task edit T-1 --status running
$CLI task comment T-1 -m "Test comment"
$CLI task delete T-1 -y
```
### Agent System
```bash
$CLI agent list
$CLI agent view <agent-id>
$CLI agent run <agent-id> -m "Test prompt"
```
### Document & Knowledge Base
```bash
$CLI doc list
$CLI doc create -t "Test Doc" -c "Content here"
$CLI doc view <doc-id>
$CLI kb list
$CLI kb tree <kb-id>
```
### Model & Provider
```bash
$CLI model list
$CLI provider list
$CLI provider test <provider-id>
```
## Dev-Test Cycle
```
1. Make code changes (service/model/router/type)
|
2. Run unit tests (fast feedback)
bunx vitest run --silent='passed-only' '<test-file>'
|
3. Restart dev server (if server-side changes)
lsof -ti:3011 | xargs kill && pnpm run dev:next
|
4. CLI verification (end-to-end)
$CLI <command>
|
5. Clean up test data
```
### When Server Restart is Needed
| Change Location | Restart? |
| ------------------------------------------------------- | -------- |
| `lobehub/apps/server/src/` (routers, services, modules) | Yes |
| `lobehub/src/server/` (agent-hono, workflows-hono) | Yes |
| `lobehub/packages/database/` (models) | Yes |
| `lobehub/packages/types/` | Yes |
| `lobehub/packages/prompts/` | Yes |
| `lobehub/apps/cli/` (CLI code) | No |
| `src/` (cloud overrides) | Yes |
## Troubleshooting
| Issue | Solution |
| --------------------------- | --------------------------------------------------------------------- |
| `No authentication found` | Run `login --server http://localhost:3011` |
| `UNAUTHORIZED` on API calls | Token expired; re-run login |
| `ECONNREFUSED` | Dev server not running; start with `pnpm run dev:next` |
| CLI shows old data/behavior | Server needs restart to pick up code changes |
| `EADDRINUSE` on port 3011 | Server already running; kill with `lsof -ti:3011 \| xargs kill` |
| Login opens wrong server | Must use `--server http://localhost:3011` flag (env var doesn't work) |
+1 -5
View File
@@ -38,7 +38,7 @@ Use this skill when the bug or feature lives in the external CLI agent pipeline,
## Default Debug Order
1. Prove whether the raw CLI output is correct before touching UI code. The app records every real session — read the most recent one via `cat .heerogeneous-tracing/.last-live-trace` rather than hand-rolling a `claude -p` repro (see references/debug-workflow\.md §2).
1. Prove whether the raw CLI output is correct before touching UI code.
2. If raw output is correct, compare it with adapter output. In dev, `executeHeterogeneousAgent` exposes `window.__HETERO_AGENT_TRACE`.
3. If adapted events look correct, inspect `persistToolBatch`, `persistToolResult`, step transitions, and subagent routing.
4. Turn the repro into a focused test before fixing.
@@ -77,10 +77,6 @@ Use this skill when the bug or feature lives in the external CLI agent pipeline,
look for `tool_result for unknown toolCallId` and missing `result_msg_id` backfill.
- Subagent tools show up in the main bubble:
check for subagent chunks reaching the main gateway handler.
- Wrong terminal-error guide (e.g. "usage limit reached" shown for a network drop):
a classifier is branching on a structured field whose mere presence isn't its meaning.
Grep the field across all event states in a real trace before trusting it — see
references/debug-workflow\.md §8 (CC `rate_limit_info` rides on `status: "allowed"` too).
## References
@@ -3,13 +3,12 @@
## Contents
1. Pipeline map
2. Capture raw CLI traces first (incl. in-app live traces)
2. Capture raw CLI traces first
3. Compare raw and adapted events
4. Check step boundaries before persistence
5. Check tool persistence invariants
6. Focused tests
7. Repro-to-fix workflow
8. Verify a structured-field classifier against a real trace
## 1. Pipeline Map
@@ -28,54 +27,6 @@ Start at the leftmost broken layer. Do not jump straight to UI rendering unless
## 2. Capture Raw CLI Traces First
### In-app live traces (the faithful capture — prefer this)
The running app already records every CLI session it spawns. This is the most
faithful trace you can get, because it captures the **exact** spawn args, env
keys, cwd, `--resume`/`--mcp-config` flags, model, and stdin that the app used —
things a hand-rolled `claude -p` / `codex exec` repro will not reproduce. Reach
for this before reproducing manually. The recorder lives in
`apps/desktop/src/main/controllers/HeterogeneousAgentCtr.ts`
(`createCliTraceSession`, `shouldTraceCliOutput`, `resolveTraceRootDir`).
When it records:
- **Dev build** (`!app.isPackaged`): always.
- **Packaged build**: only when the user flips the Help-menu developer toggle
(`heteroTracingEnabled`). Off by default so normal runs aren't polluted.
- Never under `NODE_ENV=test`.
Where it writes:
- Toggle **off** (plain dev run): `<cwd>/.heerogeneous-tracing/` — i.e. inside
the repo you're running against. (Yes, the dir name is misspelled
`heerogeneous`; it is the real path.)
- Toggle **on**: `<appStoragePath>/heteroAgent/tracing/` — keeps traces out of
the user's project. This is the only path packaged builds ever use.
Layout per session — `.../<agentType>/<YYYYMMDD-HHMMSS>-<sessionId>/`:
- `meta.json` — spawn `args`, `command`, `cwd`, `envKeys`, `model`,
`resumeSessionId`/`agentSessionId`, attachment summaries. **Read this first**
to know exactly how the CLI was invoked.
- `stdin.txt` — the stream-json request fed to the CLI.
- `stdout.jsonl` — the raw provider NDJSON (the trace you actually read).
- `stderr.log` — CLI stderr.
- `exit.json``{ code, signal, finishedAt }`.
`.heerogeneous-tracing/.last-live-trace` always points at the most recent
session dir, so the fast path to "what just happened" is:
```bash
dir=$(cat .heerogeneous-tracing/.last-live-trace)
cat "$dir/meta.json" # how the CLI was spawned
wc -l "$dir/stdout.jsonl" # raw event count
```
Reproduce the same session yourself by reusing the recorded `meta.json` `args`
together with `stdin.txt` (the args already include `--resume <sessionId>`),
instead of guessing flags.
### Codex raw JSONL
Use a read-only prompt and save traces under the repo-local scratch directory `.heerogeneous-tracing/`.
@@ -290,58 +241,6 @@ When the bug comes from a real trace, distill it into the closest existing test
3. Add or update the narrowest failing test near the broken layer.
4. Fix the smallest layer that can explain the symptom.
5. Re-run focused tests.
6. Only then do an Electron smoke test with the `agent-testing` skill if UI confirmation is still needed.
6. Only then do an Electron smoke test with the `local-testing` skill if UI confirmation is still needed.
Do not start with a broad Electron repro if a raw trace or adapter test can prove the fault zone faster.
## 8. Verify A Structured-Field Classifier Against A Real Trace
Whenever the adapter **branches on a structured field** from the raw stream —
`status`, `usage`, `rateLimitType`, `stop_reason`, `parent_tool_use_id`,
`subtype`, etc. — do not trust your mental model of the wire format. The field
you key on almost always also appears on **benign / non-target** events, and a
classifier that ignores the surrounding state will misfire on those.
The procedure (recurring — run it every time):
1. Pull the most recent real session: `dir=$(cat .heerogeneous-tracing/.last-live-trace)`.
2. Grep the field across **every** event state, not just the failing one, and
count by co-occurring state. Example:
```bash
# Which event statuses carry a rate_limit_info block?
grep -o '"status":"[a-z]*"' "$dir/stdout.jsonl" | sort | uniq -c
grep -c 'rate_limit_info' "$dir/stdout.jsonl"
```
3. If the field rides on states you did not account for, the classifier needs an
extra gate. Add the trace as a fixture/assertion to the adapter test so the
regression can't come back.
### Worked example: CC usage-limit vs. transient throttle (`fix/cc-rate-limit-quota-misclassify`)
- **Symptom:** an unrelated terminal failure (e.g. an `ECONNRESET` network drop)
rendered a bogus "usage limit reached, resets at X" guide.
- **What the trace showed:** Anthropic stamps a `rate_limit_info` block —
carrying `resetsAt` and `rateLimitType` (e.g. `seven_day`) — onto events even
when the request **goes through** (`status: "allowed"`). In real traces those
reset-window fields appear on \~all `rate_limit_info` blocks, the vast majority
of which are `allowed`, not `rejected`. So the window is rolling-window
_metadata for an allowed call_, NOT evidence the limit was hit.
- **The bug:** `isUserQuotaRateLimit` keyed only on the presence of a reset
window (`info.resetsAt != null || info.rateLimitType != null`). A later
terminal error inherited the last allowed event's window → false positive.
- **The fix:** require `status === 'rejected'` **and** a concrete reset window.
A bare `rejected` with no window is the transient server throttle → leave it
to the overloaded (retry) classifier. Status codes (429 / 529) and message
text are deliberately not consulted — only this structured signal decides the
guide.
- `packages/heterogeneous-agents/src/adapters/claudeCode.ts` →
`isUserQuotaRateLimit`
- regression assertions in
`packages/heterogeneous-agents/src/adapters/claudeCode.test.ts`
The general lesson: a field's **presence** is not its **meaning**. Confirm which
event states a discriminator field co-occurs with in a real recorded trace
before branching on it.
+561
View File
@@ -0,0 +1,561 @@
---
name: local-testing
description: >
Local app and bot testing. Uses agent-browser CLI for Electron/web app UI testing,
and osascript (AppleScript) for controlling native macOS apps (WeChat, Discord, Telegram, Slack, Lark/飞书, QQ)
to test bots. Triggers on 'local test', 'test in electron', 'test desktop', 'test bot',
'bot test', 'test in discord', 'test in telegram', 'test in slack', 'test in weixin',
'test in wechat', 'test in lark', 'test in feishu', 'test in qq',
'manual test', 'osascript', or UI/bot verification tasks.
---
# Local App & Bot Testing
Two approaches for local testing on macOS:
| Approach | Tool | Best For |
| --------------------------- | ------------------- | ---------------------------------------------------- |
| **agent-browser + CDP** | `agent-browser` CLI | Electron apps, web apps (DOM access, JS eval) |
| **osascript (AppleScript)** | `osascript -e` | Native macOS apps (WeChat, Discord, Telegram, Slack) |
---
# Part 1: agent-browser (Electron / Web Apps)
Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
## Core Workflow
Every browser automation follows this pattern:
1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
```
## Command Chaining
```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```
Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
## Essential Commands
```bash
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
agent-browser close --all # Close all active sessions
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network requests --type xhr,fetch # Filter by resource type
agent-browser network requests --method POST # Filter by HTTP method
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "text" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Dialogs (alert, confirm, prompt, beforeunload)
agent-browser dialog accept # Accept dialog
agent-browser dialog accept "input" # Accept prompt dialog with text
agent-browser dialog dismiss # Dismiss/cancel dialog
agent-browser dialog status # Check if dialog is open
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
# Streaming
agent-browser stream enable # Start WebSocket streaming
agent-browser stream status # Inspect streaming state
agent-browser stream disable # Stop streaming
```
## Batch Execution
```bash
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
```
## Authentication
```bash
# Option 1: Auth vault (credentials stored encrypted)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
# Option 2: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close # State auto-saved
agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
# Option 3: Persistent profile
agent-browser --profile ~/.myapp open https://app.example.com/login
# Option 4: State file
agent-browser state save auth.json
agent-browser state load auth.json
```
### LobeHub dev server — inject better-auth cookie
`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3011`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [references/agent-browser-login.md](./references/agent-browser-login.md) for the full recipe.
## Semantic Locators (Alternative to Refs)
```bash
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
```
## JavaScript Evaluation (eval)
```bash
# Simple expressions
agent-browser eval 'document.title'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin << 'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(echo -n 'document.title' | base64)"
```
## Ref Lifecycle
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
## Annotated Screenshots (Vision Mode)
```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
agent-browser click @e2 # Click using ref from annotated screenshot
```
## Parallel Sessions
```bash
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser session list
```
## Connect to Existing Chrome
```bash
agent-browser --auto-connect snapshot # Auto-discover running Chrome
agent-browser --cdp 9222 snapshot # Explicit CDP port
```
## iOS Simulator (Mobile Safari)
```bash
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
```
## Observability Dashboard
```bash
agent-browser dashboard install
agent-browser dashboard start # Background server on port 4848
agent-browser dashboard stop
```
## Cloud Providers
Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
## Browser Engine Selection
```bash
agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
```
## Electron (LobeHub Desktop)
### Setup / Teardown
Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
```bash
SCRIPT=".agents/skills/local-testing/scripts/electron-dev.sh"
# Start Electron dev with CDP (idempotent — skips if already running)
$SCRIPT start
# Check if Electron is running and CDP is reachable
$SCRIPT status
# Kill all Electron-related processes (main + helper + vite)
$SCRIPT stop
# Force fresh restart
$SCRIPT restart
```
After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
**Always run `$SCRIPT stop` when done testing**`pkill -f "Electron"` alone won't catch all helper processes.
#### Environment Variables
| Variable | Default | Description |
| ----------------- | ----------------------- | ---------------------------------------- |
| `CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `ELECTRON_LOG` | `/tmp/electron-dev.log` | Electron process log |
| `ELECTRON_WAIT_S` | `60` | Max seconds to wait for Electron process |
| `RENDERER_WAIT_S` | `60` | Max seconds to wait for SPA to load |
### LobeHub-Specific Patterns
#### Access Zustand Store State
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
return JSON.stringify({
ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
activeAgent: chat.activeAgentId,
activeTopic: chat.activeTopicId,
});
})()
EVALEOF
```
#### Find and Use the Chat Input
```bash
# The chat input is contenteditable — must use -C flag
agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
agent-browser --cdp 9222 click @e48
agent-browser --cdp 9222 type @e48 "Hello world"
agent-browser --cdp 9222 press Enter
```
#### Wait for Agent to Complete
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
var running = ops.filter(function(o) { return o.status === 'running'; });
return running.length === 0 ? 'done' : 'running: ' + running.length;
})()
EVALEOF
```
#### Install Error Interceptor
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function() {
var msg = Array.from(arguments).map(function(a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
# Later, check captured errors:
agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
```
## Chrome / Web Apps
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-test-profile \
"<URL>" &
sleep 5
agent-browser --cdp 9222 snapshot -i
# Or auto-discover running Chrome with remote debugging
agent-browser --auto-connect snapshot -i
```
---
# Part 2: osascript (Native macOS App Bot Testing)
Use AppleScript via `osascript` to control native macOS desktop apps for bot testing. Works with any app that supports macOS Accessibility, no CDP or Chromium needed.
The pattern is the same for every platform:
1. **Activate** the app (`tell application "X" to activate`)
2. **Navigate** to a channel/chat (Quick Switcher `Cmd+K` or Search `Cmd+F`)
3. **Send** a message (clipboard paste `Cmd+V` + Enter)
4. **Wait** for the bot response
5. **Screenshot** for verification (`screencapture` + `Read` tool)
## Per-Platform References
Pick the file for your target platform — each contains activation, navigation, send-message, and verification snippets specific to that app:
Each channel has its own folder under `bot/<channel>/` containing an `index.md`
(activation, navigation, send-message, and verification snippets specific to
that app) and its test script:
| Platform | Reference | Quick switcher |
| ------------- | ------------------------------------------------ | -------------- |
| Discord | [bot/discord/index.md](./bot/discord/index.md) | `Cmd+K` |
| Slack | [bot/slack/index.md](./bot/slack/index.md) | `Cmd+K` |
| Telegram | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F` |
| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md) | `Cmd+F` |
| Lark / 飞书 | [bot/lark/index.md](./bot/lark/index.md) | `Cmd+K` |
| QQ | [bot/qq/index.md](./bot/qq/index.md) | `Cmd+F` |
For **shared osascript patterns** (activate, type, paste, screenshot, read accessibility, common workflow template, gotchas), see [bot/osascript-common.md](./bot/osascript-common.md). Read this first if you're new to osascript automation.
## Bridge-based channels (no native app)
Some channels have no native app to drive with osascript — they connect through
a local bridge inside the Desktop app. These are tested with agent-browser
(IPC + UI) plus the bridge's own HTTP/REST endpoints, not osascript:
| Channel | Reference | What it drives |
| -------- | ------------------------------------------------ | -------------------------------------------------------- |
| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | `imessageBridge.*` IPC + local bridge + BlueBubbles REST |
For iMessage there is a one-shot regression script — see `test-imessage-bridge.sh` below.
---
# Scripts
**App / recording scripts** in `.agents/skills/local-testing/scripts/`:
| Script | Usage |
| ------------------------- | --------------------------------------------------- |
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart) |
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
| `record-app-screen.sh` | Record app screen (video + screenshots, start/stop) |
**Bot scripts** live under `.agents/skills/local-testing/bot/`, one folder per
channel (alongside that channel's `index.md`). The shared
`capture-app-window.sh` sits at the `bot/` root:
| Script | Usage |
| ---------------------------------- | ------------------------------------------------------------------- |
| `capture-app-window.sh` | Capture screenshot of a specific app window (used by bot tests) |
| `discord/test-discord-bot.sh` | Send message to Discord bot via osascript |
| `slack/test-slack-bot.sh` | Send message to Slack bot via osascript |
| `telegram/test-telegram-bot.sh` | Send message to Telegram bot via osascript |
| `wechat/test-wechat-bot.sh` | Send message to WeChat bot via osascript |
| `lark/test-lark-bot.sh` | Send message to Lark / 飞书 bot via osascript |
| `qq/test-qq-bot.sh` | Send message to QQ bot via osascript |
| `imessage/test-imessage-bridge.sh` | Regression-test the iMessage BlueBubbles bridge (IPC + HTTP) |
| `imessage/send-imessage-test.sh` | Send one real iMessage (desktop → BB → iMessage) and verify it sent |
### Window Screenshot Utility
`capture-app-window.sh` captures a screenshot of a specific app window using `screencapture -l <windowID>`. It uses Swift + CGWindowList to find the window by process name, so screenshots work correctly even when the window is on an external monitor or behind other windows.
```bash
# Standalone usage
./.agents/skills/local-testing/bot/capture-app-window.sh "Discord" /tmp/discord.png
./.agents/skills/local-testing/bot/capture-app-window.sh "Slack" /tmp/slack.png
./.agents/skills/local-testing/bot/capture-app-window.sh "WeChat" /tmp/wechat.png
```
All bot test scripts use this utility automatically for their screenshots.
### Bot Test Scripts
All bot test scripts share the same interface:
```bash
./scripts/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
```
Examples:
```bash
# Discord — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
# Slack — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
# Telegram — test a bot by username
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
# WeChat — test a bot or send to a contact
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
# Lark/飞书 — test a bot in a group chat
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
# QQ — test a bot in a group or direct chat
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
```
Each script: activates the app, navigates to the channel/contact, pastes the message via clipboard, sends, waits, and takes a screenshot. Use the `Read` tool on the screenshot for visual verification.
### iMessage bridge regression script
`test-imessage-bridge.sh` does **not** follow the osascript bot interface — it
drives the Desktop bridge's IPC + HTTP layers and asserts the result, then
self-cleans. Needs BlueBubbles running and Electron up with CDP.
```bash
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
# defaults: bb_url=http://127.0.0.1:1234 cdp_port=9222 — exit 0 = all green
```
It guards the connect/configure flow (testConfig happy + reject paths, first-time
`upsertConfig` save, bridge running + webhook registered, local-server secret
enforcement). See [bot/imessage/index.md](./bot/imessage/index.md)
for the full manual UI flow and known bugs.
---
# Screen Recording
Record automated demos using `record-app-screen.sh` (start/stop lifecycle, CDP screenshots + ffmpeg assembly). See [references/record-app-screen.md](references/record-app-screen.md) for full documentation.
```bash
./.agents/skills/local-testing/scripts/electron-dev.sh start
./.agents/skills/local-testing/scripts/record-app-screen.sh start my-demo
# ... run automation ...
./.agents/skills/local-testing/scripts/record-app-screen.sh stop
```
Outputs to `.records/` directory (gitignored): `<name>.mp4` (video) + `<name>/` (screenshots every 3s).
---
# Gotchas
### agent-browser
- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
### Electron-specific
- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
- **Don't resize the Electron window after load** — resizing triggers full SPA reload
- **Store is at `window.__LOBE_STORES`** not `window.__ZUSTAND_STORES__`
### osascript
See [bot/osascript-common.md](./bot/osascript-common.md#gotchas) for the full osascript gotchas list (accessibility permissions, `keystroke` non-ASCII issues, locale-specific app names, rate limiting, etc.).
@@ -2,7 +2,7 @@
**App name:** `Discord` | **Process name:** `Discord`
See [references/osascript.md](../../references/osascript.md) for shared patterns.
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
@@ -92,6 +92,6 @@ echo "Screenshot saved to /tmp/discord-test-result.png"
## Script
```bash
./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -21,7 +21,7 @@ So the test surface is three layers:
curl -sS -m4 -o /dev/null -w '%{http_code}\n' \
"http://127.0.0.1:1234/api/v1/server/info?password=<PW>" # expect 200
```
- **Electron dev running with CDP**: `./.agents/skills/agent-testing/scripts/electron-dev.sh start`
- **Electron dev running with CDP**: `./.agents/skills/local-testing/scripts/electron-dev.sh start`
- The **iMessage Desktop branch** checked out (the `imessageBridge` IPC group
and `@lobechat/chat-adapter-imessage` must be compiled into the main bundle).
Run `pnpm install --ignore-scripts` at the repo root **and** in `apps/desktop/`
@@ -31,7 +31,7 @@ So the test surface is three layers:
## Fast path: automated script
```bash
./.agents/skills/agent-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
```
Asserts the whole flow and self-cleans (unique `applicationId` per run, removes
@@ -136,7 +136,7 @@ Verifies the leg the bridge uses to _reply_: `BlueBubblesApiClient.sendText`
`POST /api/v1/message/text`. Run the helper against your own number:
```bash
./.agents/skills/agent-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
./.agents/skills/local-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
```
**Gotcha that bites everyone:** with `method=apple-script` and a _new_
@@ -2,7 +2,7 @@
**App name:** `Lark` or `飞书` | **Process name:** `Lark` or `飞书`
See [references/osascript.md](../../references/osascript.md) for shared patterns.
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
@@ -56,6 +56,6 @@ screencapture /tmp/lark-bot-response.png
## Script
```bash
./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
```
@@ -80,5 +80,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `QQ` | **Process name:** `QQ`
See [references/osascript.md](../../references/osascript.md) for shared patterns.
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
@@ -57,6 +57,6 @@ screencapture /tmp/qq-bot-response.png
## Script
```bash
./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
```
@@ -72,5 +72,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `Slack` | **Process name:** `Slack`
See [references/osascript.md](../../references/osascript.md) for shared patterns.
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
@@ -68,6 +68,6 @@ screencapture /tmp/slack-bot-response.png
## Script
```bash
./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `Telegram` | **Process name:** `Telegram`
See [references/osascript.md](../../references/osascript.md) for shared patterns.
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
@@ -75,6 +75,6 @@ curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/getUpdates?limit=5" | j
## Script
```bash
./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
```
@@ -75,5 +75,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `微信` or `WeChat` | **Process name:** `WeChat`
See [references/osascript.md](../../references/osascript.md) for shared patterns.
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
@@ -76,6 +76,6 @@ screencapture /tmp/wechat-bot-response.png
## Script
```bash
./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
```
@@ -81,5 +81,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -0,0 +1,110 @@
# Log `agent-browser` into a local LobeHub dev server
`agent-browser --headed` on macOS often creates the Chromium window off-screen — the user can't see or interact with it, so manual login inside the agent-browser session fails. Instead of sharing the user's real Chrome profile, copy the **better-auth session cookie** out of a request in DevTools and inject it into the agent-browser session as a Playwright-style state file.
## When to use
- You need `agent-browser` to reach an authenticated page on `http://localhost:<port>` (e.g. `localhost:3011`).
- The user already has a logged-in tab of the same dev server in their own Chrome.
- Spawning a headed Chromium to let the user log in manually is unreliable (window off-screen, no interaction).
Do **not** use this on production URLs — only local dev. Treat the cookie as a secret: don't paste it into shared logs, PRs, or commit it anywhere.
## Step 1 — Ask the user to copy the cookie from a Network request, NOT `document.cookie`
`document.cookie` will not return HttpOnly cookies, which is exactly where better-auth puts its session. Instruct the user:
1. Open the logged-in tab (`http://localhost:<port>/…`) in their own Chrome.
2. `Cmd+Option+I`**Network** tab.
3. Refresh, click any same-origin request (e.g. the top-level document request).
4. In the right pane under **Request Headers**, right-click the `Cookie:` line → **Copy value** (or copy the entire header).
5. Paste the string into chat.
You only need the better-auth pieces. Everything else (Clerk, `LOBE_LOCALE`, HMR hash, theme vars) is noise and can stay. The minimum viable set is:
```
better-auth.session_token=<value>; better-auth.state=<value>
```
## Step 2 — Build a Playwright-style state file
`agent-browser state load` expects Playwright's `storageState` format: a JSON with a `cookies` array and an `origins` array.
```bash
cat > /tmp/mkstate.py << 'PY'
import json, sys, time
# Read the Cookie header from stdin (allows optional "Cookie: " prefix).
raw = sys.stdin.read().strip()
if raw.lower().startswith("cookie:"):
raw = raw.split(":", 1)[1].strip()
# Keep only better-auth cookies. Extend this set if the app genuinely needs more.
WANTED = {"better-auth.session_token", "better-auth.state"}
cookies = []
exp = int(time.time()) + 30 * 24 * 3600 # 30 days
for pair in raw.split("; "):
if "=" not in pair:
continue
name, _, value = pair.partition("=")
if name not in WANTED:
continue
cookies.append({
"name": name,
"value": value,
"domain": "localhost",
"path": "/",
"expires": exp,
"httpOnly": False,
"secure": False,
"sameSite": "Lax",
})
if not cookies:
sys.stderr.write("no better-auth cookies found in input\n")
sys.exit(1)
print(json.dumps({"cookies": cookies, "origins": []}, indent=2))
PY
# Feed the copied Cookie header in via env var or heredoc.
printf '%s' "$COOKIE_HEADER" | python3 /tmp/mkstate.py > /tmp/state.json
```
**Note on `httpOnly`**: the real cookie in the user's browser is HttpOnly, but `storageState` doesn't enforce the flag on load — it just attaches the value. Storing with `httpOnly: false` is fine for local dev and sidesteps a CDP-context quirk where HttpOnly cookies sometimes fail to attach.
## Step 3 — Load state and navigate
```bash
SESSION="my-test" # any stable session name
agent-browser --session "$SESSION" state load /tmp/state.json
agent-browser --session "$SESSION" open "http://localhost:3011/"
agent-browser --session "$SESSION" get url
# Expect NOT /signin?callbackUrl=… — if you still see signin, cookie didn't apply.
```
## Step 4 — Verify
```bash
agent-browser --session "$SESSION" snapshot -i | head -20
# Look for the user's avatar/name in the sidebar, or absence of the signin form.
```
## Common failure modes
| Symptom | Cause | Fix |
| ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------- |
| Still redirects to `/signin` after `state load` | User pasted from `document.cookie` → missed HttpOnly session | Re-pull from Network request Headers, not console |
| `state load` reports 0 cookies | Separator wrong, or user pasted URL-decoded value | Keep the raw `Cookie:` header as-is; split on `"; "` |
| Login works briefly then expires | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-load |
| Domain mismatch | Use `domain: "localhost"` literally, no leading dot for local dev | — |
## Scope
Only covers authenticating an **agent-browser** session into a **local** LobeHub dev server. It does not:
- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com` and must be delivered over HTTPS.
- Replace real OAuth flows — tests that must exercise the login UI need a real Chromium with `--remote-debugging-port` or a bot account.
- Flow cookies back to the user's Chrome — injection is one-way (into agent-browser only).
@@ -19,13 +19,13 @@ works for any LobeHub streaming session.
```bash
# 1. Start Electron with CDP
./.agents/skills/agent-testing/scripts/electron-dev.sh start
./.agents/skills/local-testing/scripts/electron-dev.sh start
# 2. Navigate to a chat, switch runtime to Cloud Sandbox (gateway mode)
# 3. Install the probe + helpers
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/agent-testing/scripts/agent-gateway/probe.js
< .agents/skills/local-testing/scripts/agent-gateway/probe.js
# 4. Send a tool-call message — manually or via type+press
agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
@@ -34,15 +34,15 @@ agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
# rightmost inactive tab as AWAY — edit ROUND_TRIPS / DWELL_MS in the
# file if you want different timing)
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/agent-testing/scripts/agent-gateway/tab-switch.js
< .agents/skills/local-testing/scripts/agent-gateway/tab-switch.js
# 6. Wait for streaming to finish, then dump
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/agent-testing/scripts/agent-gateway/probe-dump.js \
< .agents/skills/local-testing/scripts/agent-gateway/probe-dump.js \
> /tmp/probe.json
# 7. Analyze
node .agents/skills/agent-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
node .agents/skills/local-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
```
The analyzer prints three sections: EVENTS, TIMELINE, REGRESSIONS. If
@@ -12,13 +12,13 @@ General-purpose screen recording tool for the Electron app. Captures CDP screens
```bash
# Start recording (Electron must be running with CDP)
.agents/skills/agent-testing/scripts/record-app-screen.sh start [output_name]
.agents/skills/local-testing/scripts/record-app-screen.sh start [output_name]
# Stop recording and assemble video
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
.agents/skills/local-testing/scripts/record-app-screen.sh stop
# Check if recording is active
.agents/skills/agent-testing/scripts/record-app-screen.sh status
.agents/skills/local-testing/scripts/record-app-screen.sh status
```
### Arguments
@@ -74,10 +74,10 @@ The `.records/` directory is at the project root and is gitignored.
```bash
# Start Electron
.agents/skills/agent-testing/scripts/electron-dev.sh start
.agents/skills/local-testing/scripts/electron-dev.sh start
# Start recording
.agents/skills/agent-testing/scripts/record-app-screen.sh start my-test
.agents/skills/local-testing/scripts/record-app-screen.sh start my-test
# Run automation
agent-browser --cdp 9222 click @e61
@@ -86,14 +86,14 @@ agent-browser --cdp 9222 press Enter
sleep 10
# Stop and get results
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
.agents/skills/local-testing/scripts/record-app-screen.sh stop
# → .records/my-test.mp4 + .records/my-test/*.png
```
### Gateway Streaming Demo
```bash
.agents/skills/agent-testing/scripts/electron-dev.sh start
.agents/skills/local-testing/scripts/electron-dev.sh start
# Inject gateway URL
agent-browser --cdp 9222 eval --stdin << 'EOF'
@@ -106,19 +106,19 @@ agent-browser --cdp 9222 eval --stdin << 'EOF'
EOF
# Record
.agents/skills/agent-testing/scripts/record-app-screen.sh start gateway-demo
.agents/skills/local-testing/scripts/record-app-screen.sh start gateway-demo
# Navigate to agent, send message, wait for completion...
# (automation commands here)
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
.agents/skills/local-testing/scripts/record-app-screen.sh stop
open .records/gateway-demo.mp4
```
### Check Active Recording
```bash
.agents/skills/agent-testing/scripts/record-app-screen.sh status
.agents/skills/local-testing/scripts/record-app-screen.sh status
# [record] Active recording
# Frames: 42 captured (running: yes)
# Screenshots: 14 captured (running: yes)
@@ -11,7 +11,7 @@
// 6. ROLLBACKS — msgN / childN / role drops in the active-topic timeline
//
// Usage:
// bun run .agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
// bun run .agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
import { readFileSync } from 'node:fs';
@@ -5,16 +5,16 @@
// streaming-replay test fixtures.
//
// Commands:
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts install
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts install
// Bundle probe-events.ts and inject into the CDP-attached browser.
// Re-installing clears all buffers and re-patches WebSocket / fetch.
//
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts dump [name]
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts dump [name]
// Stop the timeline timer, fetch the capture as JSON, write it to
// `.agent-gateway/<name>-<YYYYMMDD-HHmmss>.json`. `name` defaults to
// `dump`. Prints the absolute path written.
//
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts analyze [path]
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts analyze [path]
// Run analyze-events.ts on the dump. `path` defaults to the most
// recently modified file in `.agent-gateway/`.
//
@@ -28,7 +28,7 @@ import path from 'node:path';
import { fileURLToPath } from 'node:url';
const SCRIPT_DIR = path.dirname(fileURLToPath(import.meta.url));
// .agents/skills/agent-testing/scripts/agent-gateway/ → 5 levels up
// .agents/skills/local-testing/scripts/agent-gateway/ → 5 levels up
const PROJECT_ROOT = path.resolve(SCRIPT_DIR, '../../../../..');
const DUMP_DIR = path.join(PROJECT_ROOT, '.agent-gateway');
@@ -1,69 +0,0 @@
---
name: model-bank-metadata
description: 'Backfill and maintain model-bank metadata (knowledgeCutoff, family, generation). Use when adding models, fixing cutoff/family data, running a metadata sweep across aiModels providers, or researching official knowledge cutoffs.'
user-invocable: false
---
# Model-Bank Metadata (knowledgeCutoff / family / generation)
How to populate and maintain the three structured metadata fields on `packages/model-bank/src/aiModels/*.ts` model cards, at single-model scale (new model PR) or repo-wide scale (sweep across \~80 provider files / \~1900 entries).
## Field semantics
| Field | Format | Meaning |
| ----------------- | ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `knowledgeCutoff` | `'YYYY-MM'` (or `'YYYY'` if only the year is published) | World-knowledge cutoff. When a vendor distinguishes a **"reliable knowledge cutoff"** from the broader training-data cutoff (Anthropic does), always use the **reliable** one. |
| `family` | lowercase slug (`claude`, `gpt`, `o-series`, `qwen`, `deepseek`, `llama`, `glm`, …) | Model lineage, finer than `organization`. Lets the UI group models and match the same model across aggregator providers. |
| `generation` | family slug + version (`claude-4.6`, `gpt-5.2`, `qwen3.5`, `llama-3.1`) | Generation within the family. Only set when confidently derivable from the model line's naming. Rolling aliases (`qwen-max`, `deepseek-chat`, `gemini-flash-latest`) get `family` only. |
All three are optional. **The cardinal rule: only fill what an authoritative source states or naming rules derive — never guess.** An empty field is correct for vendors that publish nothing.
No DB migration is ever needed for these: builtin models are merged from model-bank at read time (`repositories/aiInfra/index.ts` spreads the whole card), so new card fields flow to the client automatically.
## Sourcing rules for knowledgeCutoff
Accept only:
- Vendor official docs (platform.openai.com / developers.openai.com, docs.x.ai, ai.google.dev, docs.anthropic.com / platform.claude.com)
- Official Hugging Face org model cards (huggingface.co/meta-llama/..., etc.)
- Official tech reports / system cards / launch blog posts
Reject:
- **Third-party aggregator sites** (aiknowledgecutoff.com and similar) — proven to copy one model's value across a whole family. A Cohere sweep once claimed `2024-06` for four distinct base models; none of the cited Cohere pages said that, and the only cutoff Cohere actually publishes is Feb 2023 for the 08-2024 Command R/R+ refresh.
- **AWS Bedrock model cards as sole source** — proven to conflate launch date with knowledge cutoff (DeepSeek R1's card lists both as "Jan 2025"). If Bedrock is the only place a value appears, leave the field empty.
- Inference from `releasedAt` — a release date is not a cutoff.
Variant inheritance: dated snapshots (`-2024-08-06`), speed/price tiers of the same checkpoint, quantizations (`-fp8`, `-awq`), context-length variants (`-32k`), ollama `:NNb` tags, and cloud-prefixed ids (`anthropic.`/`us.`/`global.` Bedrock ids) share their base model's cutoff. **Distills do not inherit** from teacher or base — use the distill's own published value or leave empty. **Sizes within one generation can genuinely differ**: Llama 3 8B is Mar 2023 while 70B is Dec 2023 (per Meta's own card) — don't "fix" that to one family-wide value.
Vendors that publish no cutoffs (leave empty, don't chase): Qwen, DeepSeek, GLM/Zhipu, ERNIE, Doubao, Hunyuan, SenseNova, Spark, MiniMax, StepFun, Yi (mostly), Moonshot.
Known per-vendor footguns:
- **Anthropic**: Opus 4.6 reliable cutoff is `2025-05`, Sonnet 4.6 is `2025-08` — easy to swap. Claude 3.7 is `2024-10` (system card: trained through Nov 2024, knowledge cutoff end of Oct 2024). Cite system cards / the models overview, not the Help Center article (a living page that drops retired models — citation rot).
- **xAI**: docs.x.ai has one blanket sentence covering grok-3/grok-4; mini variants are not named there. Grok 4.20/4.3 have no official cutoff anywhere.
- **OpenAI**: per-model docs pages (developers.openai.com/api/docs/models/<id>) state cutoffs explicitly, including snapshot differences (gpt-4-1106-preview `2023-04` vs gpt-4-0125-preview `2023-12`).
## family/generation derivation
Rule-based, no research needed: `scripts/derive-family.ts` holds the per-family regex rules. Traps already encoded there — keep them when extending:
- Date suffixes are not versions: `claude-sonnet-4-20250514` is generation `claude-4`, not `claude-4.2`.
- Size suffixes are not versions: `llama-3-8b``llama-3` (not `llama-3.8`); `gemma-7b-it` is **gemma-1** (not gemma-7).
- Vendor spelling variants: `qwen2p5` = qwen2.5, `llama-v3p1` = llama-3.1, ollama `:NNb` tags, Bedrock `us.`/`global.`/`anthropic.` prefixes.
- `claude-X.0` normalizes to `claude-X`.
- Fable/Mythos-class ids (`claude-fable-5`) don't match the opus/sonnet/haiku regex — they are the Mythos class — `family: 'claude-mythos'`, `generation: 'mythos-5'` (set manually; the launch page calls Fable 5 "the generally available Mythos-class model").
## Repo-wide sweep workflow
1. **Extract ids**: `bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts` → unique normalized chat-model ids (normalization = last path segment, lowercased). Non-chat types (image/video/embedding/tts) have no knowledge cutoff — skip them.
2. **Research (multi-agent)**: chunk ids by family (≤50 per chunk) and fan out one research agent per chunk (Workflow tool), each returning `{id, cutoff, source}` with the sourcing rules above baked into the prompt, **plus** one adversarial verify agent per chunk that re-fetches cited sources and refutes unsupported claims. The verify pass is load-bearing: it caught the Cohere aggregator copy-paste and the AWS launch-date conflation.
3. **Policy filter**: before applying, drop entries whose only source is a rejected category (check the returned `sources` map — e.g. drop everything sourced to aws.amazon.com).
4. **Apply**: `bun scripts/apply-cutoffs.ts <map.json>` and `bun scripts/apply-family.ts <map.json>` (run from repo root). Both are idempotent codemods keyed on normalized id — aggregator providers get the same values automatically; entries that already have the field are skipped. They rely on the uniform prettier formatting of the data files (entries start ` {` / end ` },`, fields at 4-space indent).
5. **Verify**: `cd packages/model-bank && bunx vitest run src/aiModels/__tests__/index.test.ts && bunx tsc --noEmit`.
## Maintenance rules
- **New model PRs** should fill all three fields inline, citing the official source in the PR body (see the Anthropic entries in `anthropic.ts` for reference values).
- **After resolving merge conflicts** in model-bank data files, sanity-check that metadata didn't vanish: `git grep -c knowledgeCutoff -- 'packages/model-bank/src/aiModels/*.ts'` before vs after. A three-way stack of model PRs once silently dropped all 10 Anthropic cutoffs during conflict resolution.
- Dirty ids exist in aggregator data (a sambanova id once carried a trailing tab). The codemods match ids verbatim — if a map key won't apply, check for invisible characters before assuming the model is missing.
@@ -1,73 +0,0 @@
/**
* One-off codemod: apply a canonical { normalizedModelId: 'YYYY-MM' } map onto
* packages/model-bank/src/aiModels/*.ts, inserting `knowledgeCutoff` after the
* `id:` line of every chat-model entry that matches and doesn't already have one.
*
* Relies on the uniform prettier formatting of these files:
* - each model entry starts with ` {` and ends with ` },` (2-space indent)
* - fields are at 4-space indent: ` id: '...'`, ` type: 'chat'`
*
* Usage: bun /tmp/apply-cutoffs.ts /tmp/cutoff-map.json
*/
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
const mapPath = process.argv[2];
if (!mapPath) throw new Error('usage: bun apply-cutoffs.ts <map.json>');
const map: Record<string, string> = JSON.parse(readFileSync(mapPath, 'utf8'));
const dir = 'packages/model-bank/src/aiModels';
const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
let touchedFiles = 0;
let inserted = 0;
const matchedIds = new Set<string>();
for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const path = join(dir, file);
const lines = readFileSync(path, 'utf8').split('\n');
const out: string[] = [];
let changed = false;
let i = 0;
while (i < lines.length) {
if (lines[i] !== ' {') {
out.push(lines[i]);
i++;
continue;
}
// collect one model entry block
const start = i;
let end = i;
while (end < lines.length && lines[end] !== ' },') end++;
const block = lines.slice(start, end + 1);
const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
const hasCutoff = block.some((l) => /^ {4}knowledgeCutoff:/.test(l));
if (idLineIdx >= 0 && isChat && !hasCutoff) {
const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
const norm = rawId ? normalize(rawId) : undefined;
const cutoff = norm ? map[norm] : undefined;
if (cutoff && /^\d{4}(?:-\d{2})?$/.test(cutoff)) {
block.splice(idLineIdx + 1, 0, ` knowledgeCutoff: '${cutoff}',`);
inserted++;
changed = true;
matchedIds.add(norm!);
}
}
out.push(...block);
i = end + 1;
}
if (changed) {
writeFileSync(path, out.join('\n'));
touchedFiles++;
}
}
console.log(`inserted ${inserted} knowledgeCutoff fields across ${touchedFiles} files`);
console.log(`map ids used: ${matchedIds.size}/${Object.keys(map).length}`);
const unused = Object.keys(map).filter((k) => !matchedIds.has(k));
if (unused.length) console.log('unused map keys (first 20):', unused.slice(0, 20));
@@ -1,49 +0,0 @@
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
const map: Record<string, { family: string; generation?: string }> = JSON.parse(
readFileSync('/tmp/family-map.json', 'utf8'),
);
const dir = 'packages/model-bank/src/aiModels';
const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
let inserted = 0;
let touchedFiles = 0;
for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const path = join(dir, file);
const lines = readFileSync(path, 'utf8').split('\n');
const out: string[] = [];
let changed = false;
let i = 0;
while (i < lines.length) {
if (lines[i] !== ' {') {
out.push(lines[i]);
i++;
continue;
}
let end = i;
while (end < lines.length && lines[end] !== ' },') end++;
const block = lines.slice(i, end + 1);
const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
const hasFamily = block.some((l) => /^ {4}family:/.test(l));
if (idLineIdx >= 0 && isChat && !hasFamily) {
const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
const r = rawId ? map[normalize(rawId)] : undefined;
if (r) {
const add = [` family: '${r.family}',`];
if (r.generation) add.push(` generation: '${r.generation}',`);
block.splice(idLineIdx, 0, ...add);
inserted++;
changed = true;
}
}
out.push(...block);
i = end + 1;
}
if (changed) {
writeFileSync(path, out.join('\n'));
touchedFiles++;
}
}
console.log(`annotated ${inserted} model entries across ${touchedFiles} files`);
@@ -1,237 +0,0 @@
/* eslint-disable regexp/no-unused-capturing-group */
/**
* Rule-based derivation of { family, generation } from normalized model ids.
* Principle: only fill what is confidently derivable; otherwise omit.
*
* Usage: bun /tmp/derive-family.ts # print distinct pairs for review
* bun /tmp/derive-family.ts --emit # write /tmp/family-map.json
*/
import { readFileSync, writeFileSync } from 'node:fs';
const ids: string[] = JSON.parse(readFileSync('/tmp/model-ids.json', 'utf8'));
type R = { family: string; generation?: string };
const derive = (id: string): R | undefined => {
// strip cloud/bedrock prefixes for matching
const m = id.replace(/^(us\.|global\.|eu\.|apac\.)?(anthropic\.|meta\.|cohere\.|azure-)/, '');
// ---- anthropic ----
if (m.startsWith('claude')) {
// family = product-line tier (claude-opus/sonnet/haiku/instant); bare claude-2.x has no tier
const tier = m.match(/(opus|sonnet|haiku|instant)/)?.[1];
const family = tier ? `claude-${tier}` : 'claude';
let g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)[.-](\d)(?!\d)/); // claude-opus-4-8 / claude-haiku-4.5
if (g) return { family, generation: `claude-${g[1]}.${g[2]}` };
g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)(?!\d)/); // claude-opus-4
if (g) return { family, generation: `claude-${g[1]}` };
g = m.match(/^claude-(\d)[.-](\d)(?!\d)/); // claude-3-5-haiku / claude-3.7-sonnet / claude-2.1
if (g) return { family, generation: g[2] === '0' ? `claude-${g[1]}` : `claude-${g[1]}.${g[2]}` };
g = m.match(/^claude-(\d)(?!\d)/); // claude-3-haiku
if (g) return { family, generation: `claude-${g[1]}` };
if (m.startsWith('claude-instant')) return { family: 'claude-instant' };
if (/^claude-v?2/.test(m)) return { family: 'claude', generation: 'claude-2' };
return { family };
}
// ---- openai ----
if (/^(gpt-oss|gpt_oss)/.test(m) || m.startsWith('gpt-oss:'))
return { family: 'gpt-oss', generation: 'gpt-oss' };
if (/^(chatgpt-4o|gpt-4o)/.test(m)) return { family: 'gpt', generation: 'gpt-4o' };
if (/^gpt-(3\.5|35)/.test(m)) return { family: 'gpt', generation: 'gpt-3.5' };
if (m.startsWith('gpt-audio')) return { family: 'gpt', generation: 'gpt-audio' };
{
const g = m.match(/^gpt-(\d)\.(\d)/); // gpt-4.1 / gpt-5.2
if (g) return { family: 'gpt', generation: `gpt-${g[1]}.${g[2]}` };
const g2 = m.match(/^gpt-(\d)(?!\d)/); // gpt-4 / gpt-5
if (g2) return { family: 'gpt', generation: `gpt-${g2[1]}` };
}
{
const g = m.match(/^o([134])(-|$)/); // o1 / o3 / o4
if (g) return { family: 'o-series', generation: `o${g[1]}` };
}
if (/^(codex|computer-use-preview)/.test(m)) return { family: 'gpt' };
// ---- google ----
{
const g = m.match(/^gemini-(\d+(?:\.\d+)?)/);
if (g) return { family: 'gemini', generation: `gemini-${g[1]}` };
if (/^gemini-(pro|flash)/.test(m)) return { family: 'gemini' }; // rolling aliases
if (m.startsWith('gemma')) {
if (/^gemma-?\db/.test(m)) return { family: 'gemma', generation: 'gemma-1' };
const v = m.match(/^gemma-?(\d)(?!b)/);
return { family: 'gemma', generation: v ? `gemma-${v[1]}` : undefined };
}
if (/^(codegemma|learnlm|palm)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
}
// ---- qwen ----
if (m.startsWith('qwq')) return { family: 'qwen', generation: 'qwq' };
if (m.startsWith('qvq')) return { family: 'qwen', generation: 'qvq' };
if (m.startsWith('codeqwen')) return { family: 'qwen' };
if (m.startsWith('qwen')) {
const g =
m.match(/^qwen-?([123](?:\.\d+)?)(?![0-9b])/) || // qwen3.5-plus / qwen-3-14b / qwen2-7b / qwen1.5
m.match(/^qwen([23](?:\.\d+)?):/) || // qwen2.5:72b
m.match(/^qwen([23])p(\d)/); // qwen2p5 -> handled below
if (/^qwen(\d)p(\d)/.test(m)) {
const p = m.match(/^qwen(\d)p(\d)/)!;
return { family: 'qwen', generation: `qwen${p[1]}.${p[2]}` };
}
if (g) return { family: 'qwen', generation: `qwen${g[1]}` };
return { family: 'qwen' }; // qwen-max/plus/turbo/vl rolling aliases
}
// ---- deepseek ----
if (/^(deepseek|azure-deepseek|pro-deepseek)/.test(m) || m.startsWith('deepseek_')) {
const s = m.replace(/^pro-/, '').replaceAll('_', '-');
if (s.startsWith('deepseek-r1-distill'))
return { family: 'deepseek', generation: 'deepseek-r1-distill' };
if (s.startsWith('deepseek-r1')) return { family: 'deepseek', generation: 'deepseek-r1' };
const g = s.match(/^deepseek-(?:chat-)?v(\d(?:\.\d)?)/);
if (g) return { family: 'deepseek', generation: `deepseek-v${g[1]}` };
if (/^deepseek-(coder-v2|coder)/.test(s))
return { family: 'deepseek', generation: 'deepseek-coder' };
return { family: 'deepseek' }; // deepseek-chat / reasoner rolling aliases
}
// ---- meta llama ----
if (m.startsWith('codellama')) return { family: 'llama', generation: 'codellama' };
if (/^(meta-)?llama|^l3(\d)?-|^llava/.test(m)) {
if (m.startsWith('llava')) return { family: 'llava' };
const s = m.replace(/^meta-/, '');
const g =
s.match(/^llama-?([234])(?:[.-](\d))?(?![0-9b])/) || // llama-3.1 / llama3.3 / llama-4
s.match(/^llama-?v([234])p?(\d)?/) || // llama-v3p1
s.match(/^llama([234])[.:-](\d)?/);
if (g) {
const gen = g[2] ? `llama-${g[1]}.${g[2]}` : `llama-${g[1]}`;
return { family: 'llama', generation: gen };
}
if (m.startsWith('l3-')) return { family: 'llama', generation: 'llama-3' };
if (m.startsWith('l31-')) return { family: 'llama', generation: 'llama-3.1' };
return { family: 'llama' };
}
// ---- zhipu ----
if (/^(zai-)?glm/.test(m)) {
const s = m.replace(/^zai-/, '');
if (s.startsWith('glm-z1')) return { family: 'glm', generation: 'glm-z1' };
if (s.startsWith('glm-zero')) return { family: 'glm', generation: 'glm-zero' };
const g = s.match(/^glm-(\d(?:\.\d)?)/);
if (g) return { family: 'glm', generation: `glm-${g[1]}` };
return { family: 'glm' };
}
if (/^(charglm|codegeex|emohaa)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
// ---- mistral ----
if (
/^(open-)?(mistral|mixtral|ministral|codestral|devstral|magistral|pixtral|mathstral|labs-devstral|labs-leanstral|open-codestral)/.test(
m,
)
) {
const fam = m.replace(/^(open-|labs-)/, '').match(/^[a-z]+/)![0];
return { family: fam };
}
// ---- xai ----
if (m.startsWith('grok')) {
const g = m.match(/^grok-(\d(?:\.\d+)?)/);
return { family: 'grok', generation: g ? `grok-${g[1]}` : undefined };
}
// ---- moonshot ----
if (m.startsWith('kimi')) {
const g = m.match(/^kimi-k(\d(?:\.\d)?)/);
return { family: 'kimi', generation: g ? `kimi-k${g[1]}` : undefined };
}
if (m.startsWith('moonshot-kimi-k2')) return { family: 'kimi', generation: 'kimi-k2' };
if (m.startsWith('moonshot-v1')) return { family: 'kimi', generation: 'moonshot-v1' };
// ---- minimax ----
if (m.startsWith('minimax')) {
if (m.startsWith('minimax-text')) return { family: 'minimax', generation: 'minimax-text-01' };
const g = m.match(/^minimax-m(\d(?:\.\d)?)/);
return { family: 'minimax', generation: g ? `minimax-m${g[1]}` : undefined };
}
if (m.startsWith('abab')) return { family: 'minimax', generation: 'abab' };
// ---- baidu ----
if (m.startsWith('ernie')) {
if (m.startsWith('ernie-x1')) return { family: 'ernie', generation: 'ernie-x1' };
const g = m.match(/^ernie-(\d\.\d)/);
return { family: 'ernie', generation: g ? `ernie-${g[1]}` : undefined };
}
if (m.startsWith('qianfan')) return { family: 'qianfan' };
// ---- bytedance ----
if (m.startsWith('doubao')) {
const g = m.match(/^doubao-seed-(\d[.-]\d|\d)/) || m.match(/^doubao-(\d\.\d)/);
return { family: 'doubao', generation: g ? `doubao-${g[1].replace('-', '.')}` : undefined };
}
if (/^(seed-oss|skylark)/.test(m)) return { family: m.startsWith('seed') ? 'doubao' : 'skylark' };
// ---- tencent ----
if (m.startsWith('hunyuan')) {
const g = m.match(/^hunyuan-(\d\.\d)/);
return { family: 'hunyuan', generation: g ? `hunyuan-${g[1]}` : undefined };
}
if (m.startsWith('hy3')) return { family: 'hunyuan', generation: 'hunyuan-3' };
// ---- others (family only / simple version) ----
if (m.startsWith('yi-')) return { family: 'yi' };
if (/^(command|c4ai-command)/.test(m)) return { family: 'command' };
if (/^(aya|c4ai-aya)/.test(m)) return { family: 'aya' };
if (/^phi-?(\d)?/.test(m) && m.startsWith('phi')) {
const g = m.match(/^phi-?(\d(?:\.\d)?)/);
return { family: 'phi', generation: g ? `phi-${g[1]}` : undefined };
}
if (m.startsWith('wizardlm')) return { family: 'wizardlm' };
if (m.startsWith('step-')) {
const g = m.match(/^step-(?:r1|(\d(?:\.\d)?))/);
return { family: 'step', generation: g?.[1] ? `step-${g[1]}` : undefined };
}
if (/^(internlm|intern-)/.test(m)) return { family: 'intern' };
if (m.startsWith('internvl')) return { family: 'internvl' };
if (m.startsWith('baichuan')) {
const g = m.match(/^baichuan-?(m?\d)/);
return { family: 'baichuan', generation: g ? `baichuan-${g[1]}` : undefined };
}
if (/^(sensechat|sensenova)/.test(m)) return { family: 'sensenova' };
if (/^(spark|generalv|4\.0ultra)/.test(m)) return { family: 'spark' };
if (/^(360gpt|360zhinao)/.test(m)) return { family: '360zhinao' };
if (/^(jamba|ai21-jamba)/.test(m)) return { family: 'jamba' };
if (m.startsWith('sonar')) return { family: 'sonar' };
if (/^(nova-lite|nova-micro|nova-pro)/.test(m)) return { family: 'nova' };
if (/^(ling|ring)-/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
if (m.startsWith('longcat')) return { family: 'longcat' };
if (m.startsWith('mimo')) return { family: 'mimo' };
if (m.startsWith('taichu')) return { family: 'taichu' };
if (/^(hermes|nous-hermes)/.test(m)) return { family: 'hermes' };
if (m.startsWith('solar')) return { family: 'solar' };
if (m.startsWith('kat-coder')) return { family: 'kat-coder' };
if (m.startsWith('dbrx')) return { family: 'dbrx' };
if (m.startsWith('morph')) return { family: 'morph' };
return undefined;
};
const map: Record<string, R> = {};
const pairs = new Map<string, number>();
let derived = 0;
for (const id of ids) {
const r = derive(id);
if (!r) continue;
derived++;
map[id] = r;
const key = `${r.family} :: ${r.generation ?? '—'}`;
pairs.set(key, (pairs.get(key) || 0) + 1);
}
console.log(`derived ${derived}/${ids.length}`);
for (const [k, n] of [...pairs.entries()].sort()) console.log(String(n).padStart(4), k);
if (process.argv.includes('--emit')) {
writeFileSync('/tmp/family-map.json', JSON.stringify(map, null, 1));
console.log('\nwritten /tmp/family-map.json');
}
@@ -1,23 +0,0 @@
/**
* Extract unique normalized chat-model ids from packages/model-bank/src/aiModels/*.ts.
* Normalization: last path segment, lowercased (matches the apply codemods).
*
* Usage (repo root): bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts [out.json]
* Default output: /tmp/model-ids.json
*/
import { readdirSync, writeFileSync } from 'node:fs';
import { join, resolve } from 'node:path';
const dir = resolve('packages/model-bank/src/aiModels');
const out = process.argv[2] || '/tmp/model-ids.json';
const ids = new Set<string>();
for (const f of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const mod = await import(join(dir, f));
for (const m of mod.default || []) {
if (!m?.id || m.type !== 'chat') continue;
ids.add(m.id.split('/').pop()!.toLowerCase());
}
}
writeFileSync(out, JSON.stringify([...ids].sort(), null, 1));
console.log(`${ids.size} unique normalized chat ids -> ${out}`);
-7
View File
@@ -53,12 +53,6 @@ For Modal specifically, see the dedicated **modal** skill — use the imperative
| Layout | Center, DraggablePanel, Flexbox, Grid, Header, MaskShadow |
| Navigation | Burger, Menu, SideNav, Tabs |
## Loading indicators
**Do NOT use antd `Spin` / `<Spin />`.** Use a project loader
(`NeuralNetworkLoading`, `DotsLoading`, …) — see the **ux** skill ("Loading
visuals") for the component table and when to use each.
## State
When a feature component manages more than 3 pieces of state (`useState`/`useReducer`/derived state), extract the logic into a custom hook (e.g. `useXxx`). Keep the component focused on rendering — the hook holds state and handlers, so logic can be unit-tested without rendering the component.
@@ -118,7 +112,6 @@ errorElement: <ErrorBoundary />;
| ------------------------------------------------------------------ | --------------------------------------------------------------------------- |
| Using `next/link` in SPA | Use `react-router-dom` `Link` |
| Using antd directly | Use `@lobehub/ui/base-ui` first, then `@lobehub/ui` |
| antd `Spin` / `<Spin />` for loading | Use `NeuralNetworkLoading` / project loaders (see the **ux** skill) |
| `import { Select } from '@lobehub/ui'` | `import { Select } from '@lobehub/ui/base-ui'` |
| `import { Modal } from '@lobehub/ui'` + `<Modal open>` declarative | `createModal` / `confirmModal` from `@lobehub/ui/base-ui` (see modal skill) |
| `import { DropdownMenu/Popover/Switch } from '@lobehub/ui'` | Import same name from `@lobehub/ui/base-ui` instead |
+3 -3
View File
@@ -18,8 +18,8 @@ Periodic review of the project-local skill set under `.agents/skills/`. The goal
Build a fresh census of all SKILL.md files. Do NOT trust any prior cached list.
```bash
find -L .agents/skills -name SKILL.md | wc -l # total count, including symlinked skills
find -L .agents/skills -name SKILL.md -exec wc -l {} \; | sort -rn # by body length, including symlinked skills
find .agents/skills -name SKILL.md | wc -l # total count
find .agents/skills -name SKILL.md -exec wc -l {} \; | sort -rn # by body length
```
Group by domain in a mental table (DB / state / UI / agent / testing / workflow / docs / etc.). Note new arrivals since last audit (`git log --since="1 week ago" -- .agents/skills/`).
@@ -50,7 +50,7 @@ Common false positives (do NOT merge):
- `db-migrations` vs `drizzle` — distinct workflows (migration files vs schema authoring).
- `microcopy` vs `i18n` — content vs mechanics.
- `agent-runtime-hooks` vs `agent-tracing` vs `agent-signal` — different surfaces of the agent system.
- `testing` vs `agent-testing` — different test types.
- `testing` vs `local-testing` vs `cli-backend-testing` — different test types.
### 4 — Description format consistency
-3
View File
@@ -43,9 +43,6 @@ cd packages/database && TEST_SERVER_DB=1 bunx vitest run --silent='passed-only'
2. **Tests must pass type check** - Run `bun run type-check` after writing tests
3. **After 1-2 failed fix attempts, stop and ask for help**
4. **Test behavior, not implementation details**
5. **Regression tests for bug fixes** - After fixing a bug, add a regression test that fails before the fix and passes after, to prevent recurrence
6. **No new component tests** - Only update existing React component tests. Complex logic should be extracted into hooks and tested there instead
7. **All source changes before any test changes** - Complete all source file edits first, then update tests in a separate pass. Interleaving disrupts reasoning about the source changes, especially across many files
## Basic Test Structure
-318
View File
@@ -1,318 +0,0 @@
---
name: ux
description: 'LobeHub product design values / principles / checklists. Load this skill whenever the work touches user-interface features or implementation — designing or building any user-facing flow — to get better UX results.'
user-invocable: false
---
# UX — Design Values & Execution Checklists
How LobeHub products should feel, and concrete rules to get there. Use this when
**building or reviewing** any user-facing flow. For component/styling choices see
**react**, for wording see **microcopy**, for imperative modal wiring see **modal**.
## Design values
LobeHub follows four product design values — **Natural・Meaningful・Certainty・
Growth**. Read them before designing:
**[references/design-values.md](references/design-values.md)** (definitions +
conflict priority).
> The checklists below are the execution layer. Each item is tagged with the
> value(s) it serves; for what those values mean, see the file above.
## How this is organized
The checklists are grouped by **interaction type** — the kind of thing the user
is doing. Jump to the module that matches the surface you're building (reading a
list, editing content, running an action, …); each module collects the rules
specific to that interaction. The same surface often spans several modules (an
editable list is Read + Edit + Act) — walk each that applies.
---
## 1. Read — viewing data & lists
Any surface that **displays** records, lists, or detail. Covers the states a data
view can be in, behavior at scale, and keeping the user's place visible.
### 1.1 Data states: empty / loading / error・Meaningful・Certainty
Every data surface has **four** states — design all of them, not just "has data".
- [ ] **Empty state is a purpose-built page, not a blank screen.** It explains what
this is, why it's empty, and gives a clear next action (CTA + value props).
✅ Devices: an empty "Connect your first device" page with primary/secondary
connect paths and "what you can do once connected" cards — ❌ not a bare title
over skeleton rows or a blank body. _(Meaningful)_
- [ ] **Distinguish the empty variants** — "no data yet" (onboarding CTA) vs
"no match for filters" (clear-filters affordance) are different screens. _(Certainty)_
- [ ] **Always-rendered chrome still needs a body empty state.** When a surface
keeps its toolbar / header mounted even with no data (so a create / `+`
affordance stays reachable), the **body** below it must still render an empty
placeholder — persistent chrome is not an excuse to leave the content area
blank. ✅ The agent **Documents** tab keeps its new-folder / new-doc toolbar
and renders an `Empty` below it when there are no documents — ❌ not a toolbar
over dead space. _(Meaningful)_
- [ ] **Loading state** designed (skeleton / NeuralNetworkLoading), not a flash of
blank or layout shift. _(Natural)_
- [ ] **Error state** designed — surface the reason and a retry/back path. _(Meaningful)_
### 1.2 Lists at scale・Certainty・Natural
A list/data page must be designed for its **whole range of sizes**, not just the
demo data.
- [ ] **Walk the scale: 1 / 2 / 5 / 20 / 100 / 1k10k rows.** Pick the right
mechanism per range — plain render → load-more / pagination → virtual scroll;
add batch-select / bulk actions once counts get large. _(Certainty)_
- [ ] **Co-design empty / loading / error with the data state** (see §1.1). A list
isn't done until all four render well. _(Natural)_
### 1.3 Selection visibility in scrolled lists・Certainty・Natural
A capped / scrollable / virtualized list mounts at `scrollTop = 0`. If the
active item sits below the fold, the user lands on a valid selection that is
**off-screen** — and reads it as "nothing is selected" or a broken page. Any
list that can open with a pre-selected item must **scroll that item into view**.
This is an easy case to miss: it only shows up once the list is long enough and
the selection is restored rather than freshly clicked.
- [ ] **Scroll the active item into view on mount / restore.** When the selection
is restored from a URL query, deep link, or persisted state (not a fresh
click), bring it into view — the container starts at the top otherwise. ✅
The nested thread list is capped to \~9 rows; a thread restored from
`?thread=` below the fold is scrolled into view on mount. _(Certainty)_
- [ ] **Hardest when the selection has no other anchor.** If the parent/container
row isn't highlighted while a child is active (no breadcrumb, no header
echo), an off-screen active row means **zero** visible feedback — design
for exactly this case. _(Meaningful)_
- [ ] **Use `block: 'nearest'` (or equivalent).** Only scroll when the row is
actually off-screen; an already-visible selection must not jump. _(Natural)_
- [ ] **Re-run once async rows mount.** The active id is usually known before the
list finishes loading; key the scroll off a list-ready signal (e.g. row
count), not only off the id, so a restored selection still lands when the
data arrives. _(Certainty)_
- [ ] **Mirror it across duplicated list variants** so the behavior can't regress
in just one (e.g. parallel agent / group lists). _(Certainty)_
### 1.4 Option visibility in pickers・Certainty・Meaningful
- [ ] **Pickers list every valid target.** Watch for options dropped by backend
list queries (pagination, `virtual` flags, scope filters) and add them back.
✅ The default "LobeAI" (inbox) agent is `virtual` and excluded from the
sidebar list, so the move picker re-adds it. An empty picker must mean
"genuinely none", never "we filtered out the only option". _(Meaningful)_
### 1.5 Default view reflects entry intent & data state・Certainty・Meaningful
A surface with multiple tabs / views / panels has a **landing** selection. Don't
hardcode it to "the first tab" — derive it from **(a) how the user got here** (the
intent their navigation carried) and **(b) which views actually have data**. A
static default that lands the user on an empty tab while a sibling holds exactly
what they came for reads as broken. This pairs with §1.1: the empty state is the
fallback _within_ a view; this rule is about not landing on that empty view in the
first place when a better one exists.
- [ ] **Open on the tab the entry implies.** When navigation carries intent — the
user clicked a Skill, a file, a record of a specific type — land on the view
that shows it, not the static first tab. ✅ Opening a document page by clicking
a **skill** lands the right panel on the **Skills** tab; opening a plain
document lands on **Documents**. _(Meaningful)_
- [ ] **Fall back to a populated view when the default would be empty.** If the
default tab has no data but a sibling does, default to the populated one so
the surface opens on content. ✅ An agent with only skills (no documents)
opens the panel on **Skills** instead of an empty **Documents** tab. _(Certainty)_
- [ ] **Decide from resolved state, not mid-load.** Compute the default once the
data has loaded — choosing off an empty _in-flight_ list flips the tab as data
arrives. Hold the static default while loading, switch on resolved-empty. _(Certainty)_
- [ ] **A manual choice wins and sticks.** Once the user picks a tab, stop
auto-selecting — track "user-picked" separately (e.g. a nullable `pickedTab`
that overrides the derived default) so later data changes don't yank them off
their choice. _(Natural)_
---
## 2. Edit — entering & changing content
Any surface where the user **types or edits**. Input is expensive effort; the
overriding rule is **never lose it**.
### 2.1 Protect in-progress edits・Certainty・Meaningful
Typed / edited content is real user effort; losing it is one of the most
infuriating outcomes a product can produce. Whenever an editor holds unsaved
input, assume the exit can be **accidental** — a misclick, a refresh, a crash, a
navigation, a failed save — and build a safety net: back the draft up locally and
recover it.
- [ ] **Back up the draft locally as the user types.** Persist to
localStorage / IndexedDB / store so a refresh, crash, accidental close, or
navigation doesn't vaporize the content. _(Certainty)_
- [ ] **Restore on return.** Coming back to the same editing context auto-restores
(or offers to restore) the unsaved draft, rather than showing a blank field. _(Meaningful)_
- [ ] **Guard destructive exits.** Closing / navigating / switching items away
from a dirty editor warns or auto-saves — never silently discards. _(Certainty)_
- [ ] **Survive a failed save.** If the save errors, keep the user's content in
the field / draft and let them retry; never clear the input on failure. _(Meaningful)_
- [ ] **Scope the draft to its target** (per topic / message / item id) so drafts
don't bleed across entities or resurrect on the wrong item. _(Certainty)_
---
## 3. Act — operations, flows & buttons
Any surface where the user **performs an action** — a single op, a bulk op, or a
multi-step flow. Covers momentum, focus, and full entity lifecycle.
### 3.1 Flow & momentum・Natural・Meaningful
Every action chain must **push the user forward**, never dead-end or block the flow.
- [ ] **Forward momentum** — after any operation, lead the user to the next step,
don't just stop. _(Meaningful)_
- [ ] **Success state = primary "go to result", secondary "dismiss"** — the strong
button is the forward action (take me to the result); "Done" is the weak/
secondary button. ✅ After moving topics: primary = "Go to «target»", secondary
\= "Done". _(Meaningful・Natural)_
- [ ] **Bulk ⇄ single-item parity** — an action on a multi-select toolbar must also
be reachable on a single item (its context menu), and vice versa. _(Certainty)_
- [ ] **Confirm → in-progress → done, in one surface** — bulk/irreversible/async
ops use a modal state machine: a confirm step stating exactly what happens →
an in-progress view with **dismissal locked** → a done (or error) view in the
same modal. Never fire-and-forget with only a toast; never leave a dead
spinner. _(Certainty・Meaningful)_
### 3.2 One primary button per surface・Certainty
- [ ] **One primary button per surface.** The single primary CTA tells the user the
core action; everything else is secondary/tertiary. Never a pile of primary
buttons competing for attention. _(Certainty)_
### 3.3 Entity lifecycle completeness・Meaningful・Certainty
The recurring trap: a feature ships only the **display** of a list, but edit /
delete / management are never built — so the user can add something and then be
stuck with it. For every entity a user can see, design its **full lifecycle**:
create / read / update / delete, plus state transitions (enable/disable,
connect/disconnect, install/uninstall). A read-only list the user can't manage
breaks the flow.
**The allowed operation set depends on the entity's source / ownership** — decide
it explicitly _before_ building. Worked example, the tools/connectors list:
| Entity class | Add | Edit | Remove |
| ----------------------------------- | ------- | --------- | ------------------ |
| Official / built-in (skills, tools) | — | — | ✗ not removable |
| Community (installed MCP) | install | configure | uninstall / remove |
| User-custom (custom connector) | create | edit | delete |
- [ ] **No display-only features.** For every listed entity, enumerate CRUD +
lifecycle ops and build the ones that apply. _(Meaningful)_
- [ ] **Operation set per source/ownership class** — built-in may be read-only;
anything the user _installed_ must be removable; anything the user _created_
must be editable **and** deletable. _(Certainty)_
- [ ] **Each item exposes its allowed ops** (hover action / context menu / detail
page), and there's a clear entry point to add/create where applicable. _(Natural)_
- [ ] **An intentionally-absent op is a documented decision, not an oversight**
(e.g. official tools can't be deleted — by design). _(Certainty)_
---
## 4. Feedback — loading & system response
How the product **answers back** while and after the user acts — loading visuals
and proactive guardrails.
### 4.1 Loading visuals・Natural
**Never use antd `Spin`** — it doesn't match the product's loading visual. Use a
project loader:
| Need | Component |
| --------------------------- | ----------------------------------------------------------------------------- |
| Default loading (in-flight) | `NeuralNetworkLoading` from `@/components/NeuralNetworkLoading` (`size` prop) |
| Inline dots | `DotsLoading` / `BubblesLoading` from `@/components` |
| Branded full-page | `Loading` from `@/components/Loading/BrandTextLoading` |
| List / card placeholder | a skeleton (e.g. `SkeletonList`) |
When in doubt, reach for `NeuralNetworkLoading` — it's the default in-flight
indicator (e.g. modal "in progress" states).
### 4.2 Capability-gated features・Certainty・Meaningful
A feature can be fully built and still produce a broken result when the selected
model — or its still-loading config — **can't deliver the capability the feature
depends on** (for example, an agentic run on a model without tool calling). This
is usually the user's configuration choice, not a defect; but if the product stays
silent the user reads it as the product being broken. When a feature's success
depends on a capability the current config may lack, the product owes a
**proactive, non-blocking reminder** — a guardrail, not a gate.
- [ ] **Surface the mismatch, don't fail silently.** When a feature needs a model
capability (tool calling, vision, reasoning, long context) the current model
lacks, show a soft inline warning at the point of action — never a hard block
or a modal that stops the user. _(Meaningful)_
- [ ] **Stay reactive.** The reminder clears the moment the user switches to a
capable model — derive it from live state, not a one-shot check. _(Natural)_
- [ ] **Don't warn while config is loading.** A capability that hasn't resolved yet
looks "unsupported"; warning then is a false alarm — exactly the glitch users
mistake for a product bug. Warn only on a _resolved_ unsupported state. _(Certainty)_
- [ ] **Scope to the mode that needs it.** Show only when the capability-dependent
mode is on; one reminder per root cause, never a pile of overlapping notices. _(Natural・Certainty)_
- [ ] **State the problem and the remedy.** The copy says what's wrong _and_ what
the user should do about it. _(Meaningful)_
---
## 5. Grow — discoverability & progressive disclosure
How the product **deepens** as the user's needs deepen.
### 5.1 Progressive disclosure・Growth
The product should grow with the user — deeper power shows up as needs deepen.
- [ ] **Progressive disclosure** — keep the novice path clean; reveal advanced
capabilities as the user gets there, don't dump everything at once. _(Growth・Natural)_
- [ ] **Surface related actions at the moment of need** — make the next capability
discoverable in context (e.g. after the first item exists, offer what to do
with it), not buried in a far-off menu. _(Growth・Meaningful)_
---
## Quick review checklist
**Read — viewing data & lists**
- [ ] Empty / loading / error states are all designed; empty is a real page with a CTA. Always-rendered chrome (toolbar/header) still gets a body empty state.
- [ ] List designed across 1 → 10k rows (virtual scroll / pagination / batch as needed).
- [ ] Capped/scrollable/virtualized list scrolls the restored active item into view on mount (`block: 'nearest'`, re-run after async rows mount).
- [ ] Pickers show all valid targets (default/inbox included); empty = truly none.
- [ ] Multi-tab/view surface lands on the tab the entry intent implies (and falls back to a populated view, decided from resolved state); a manual pick sticks.
**Edit — entering & changing content**
- [ ] Editors back up in-progress input locally and recover it after refresh/crash/failed-save; destructive exits warn, never silently discard.
**Act — operations, flows & buttons**
- [ ] Action leads the user forward; success offers a primary "go to result".
- [ ] Bulk action has a single-item entry (and vice versa).
- [ ] Async/bulk/irreversible action: confirm → in-progress (locked) → done/error.
- [ ] Exactly one primary button per surface.
- [ ] Listed entities have their full lifecycle (not display-only); ops match source (built-in / installed / custom).
**Feedback — loading & system response**
- [ ] No antd `Spin`; use `NeuralNetworkLoading` / project loaders.
- [ ] Capability-gated feature warns (soft, reactive, load-gated) when the model can't deliver it; copy gives the remedy.
**Grow — discoverability & progressive disclosure**
- [ ] Advanced capability is progressively disclosed / discoverable at the moment of need.
## Related skills
- **modal** — imperative `createModal` state-machine wiring for confirm/progress/done.
- **microcopy** — wording for confirm / done / empty / error states.
- **react** — component priority, `Button` usage, styling.
@@ -1,51 +0,0 @@
# LobeHub Design Values (设计价值观)
The philosophy behind every LobeHub interface. Read this before designing or
reviewing a flow; the per-aspect execution rules live in the parent
[SKILL.md](../SKILL.md) and each checklist item is tagged with the value(s) it serves.
Adapted from Ant Design's design values
(<https://ant.design/docs/spec/values-cn>, <https://zhuanlan.zhihu.com/p/44809866>).
LobeHub adopts all four.
## 自然 (Natural)
Minimise cognitive load. Digital products keep getting more complex while human
attention stays scarce — so the interface should feel as effortless as the
physical world. The next step should be obvious without thinking; the product
proactively carries the user forward (sensible defaults, AI-assisted decisions,
smooth transitions) rather than making them stop and figure things out.
## 意义感 (Meaningful)
Every screen is rooted in the user's real goal, not an isolated feature. Make the
objective clear, give immediate feedback on the result of each action, and always
point at the next meaningful step. Calibrate difficulty — neither a patronising
over-simplification nor an overwhelming wall — so the user keeps a sense of
progress and accomplishment.
## 确定性 (Certainty)
Low-entropy, predictable interactions. Reuse the same patterns, components, and
wording so behaviour is never surprising. Keep a single clear focus per surface,
and design **every** state (empty / loading / error / success) so nothing is left
undefined. Restraint over cleverness: fewer, consistent rules beat many bespoke
ones.
## 生长性 (Growth)
The product grows together with the user. As needs deepen and roles evolve,
surface advanced capabilities progressively and make related features
discoverable at the moment they become relevant — without crowding the novice
path. Bridge product value to the user's changing scenarios and aim for
humanmachine symbiosis (人机共生): the user and the agent co-evolve, each making
the other more capable over time.
## Priority when values conflict
For moment-to-moment interaction decisions: **意义感 ≳ 自然 > 确定性** — never
sacrifice the user's goal or forward momentum just to keep things uniform.
**生长性 (Growth)** is a longer-horizon lens: weigh it when shaping how a feature
is discovered and how it scales with the user, not when resolving a single-screen
layout trade-off.
@@ -49,4 +49,4 @@ Migration owner: @{pr-author}
The migration owner is responsible for rollout follow-up and incident handling for this schema change.
> \[!NOTE]: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'` or from commit metadata. Do not hardcode a username.
> **Note for Claude**: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'` or from commit metadata. Do not hardcode a username.
@@ -18,4 +18,4 @@
@{pr-author}
> \[!NOTE]: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'`. Do not hardcode a username.
> **Note for Claude**: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'`. Do not hardcode a username.
@@ -86,7 +86,7 @@ New AI model or provider support, typically contributed via community PRs.
- These PR title prefixes (`feat` / `style`) are in the auto-tag trigger list
- No special branch naming or manual release steps required — merging the PR triggers auto patch +1
### When an agent is involved
### When Claude is involved
If asked to add model support, just create a normal feature PR. The title prefix will trigger the release automatically.
+5 -14
View File
@@ -425,14 +425,14 @@ OPENAI_API_KEY=sk-xxxxxxxxx
# MCP_TOOL_TIMEOUT=60000
# #######################################
# ######### Composio Service ############
# ######### Klavis Service ##############
# #######################################
# Composio API Key for accessing hosted integrations (Gmail, Slack, etc.)
# Get your API key from: https://composio.dev
# Klavis API Key for accessing Strata hosted MCP servers
# Get your API key from: https://klavis.io
# IMPORTANT: This key is stored server-side only and NEVER exposed to the client
# When this key is set, Composio integration will be automatically enabled
# COMPOSIO_API_KEY=your_composio_api_key_here
# When this key is set, Klavis integration will be automatically enabled
# KLAVIS_API_KEY=your_klavis_api_key_here
# #######################################
# #### Message Gateway (IM Integration) ##
@@ -445,15 +445,6 @@ OPENAI_API_KEY=sk-xxxxxxxxx
# MESSAGE_GATEWAY_URL=https://message-gateway.lobehub.com
# MESSAGE_GATEWAY_SERVICE_TOKEN=your_service_token_here
# #######################################
# ######### Agent Gateway Mode ##########
# #######################################
# Enable Gateway Mode for self-hosted deployments. Requires AGENT_GATEWAY_URL.
# ENABLE_AGENT_GATEWAY=1
# AGENT_GATEWAY_URL=https://agent-gateway.example.com
# AGENT_GATEWAY_SERVICE_TOKEN=your_service_token_here
# #######################################
# ########### Messenger Bot #############
# #######################################
+2 -89
View File
@@ -5,18 +5,6 @@ inputs:
node-version:
description: Node.js version
required: true
cloud-repository:
description: Cloud repository to overlay for commercial desktop builds
required: false
default: lobehub/lobehub-cloud
cloud-ref:
description: Optional Cloud repository ref
required: false
default: ''
cloud-token:
description: GitHub token with permission to read the Cloud repository
required: false
default: ''
runs:
using: composite
@@ -26,77 +14,9 @@ runs:
with:
node-version: ${{ inputs.node-version }}
- name: Overlay Cloud repository for desktop build
if: inputs.cloud-token != ''
shell: bash
env:
CLOUD_CHECKOUT: ${{ runner.temp }}/lobehub-cloud
CLOUD_REF: ${{ inputs.cloud-ref }}
CLOUD_REPOSITORY: ${{ inputs.cloud-repository }}
CLOUD_ROOT: ${{ github.workspace }}/..
CLOUD_TOKEN: ${{ inputs.cloud-token }}
run: |
set -euo pipefail
cloud_root="$(cd "$GITHUB_WORKSPACE/.." && pwd)"
cloud_checkout="$RUNNER_TEMP/lobehub-cloud"
rm -rf "$cloud_checkout"
clone_args=(--depth 1)
if [ -n "$CLOUD_REF" ]; then
clone_args+=(--branch "$CLOUD_REF")
fi
git clone "${clone_args[@]}" "https://x-access-token:${CLOUD_TOKEN}@github.com/${CLOUD_REPOSITORY}.git" "$cloud_checkout"
node <<'NODE'
const fs = require('node:fs');
const path = require('node:path');
const source = process.env.CLOUD_CHECKOUT;
const target = process.env.CLOUD_ROOT;
const skip = new Set(['.git', 'lobehub', 'node_modules']);
const copy = (from, to) => {
const stat = fs.lstatSync(from);
if (stat.isSymbolicLink()) {
const link = fs.readlinkSync(from);
fs.rmSync(to, { force: true, recursive: true });
fs.symlinkSync(link, to);
return;
}
if (stat.isDirectory()) {
fs.mkdirSync(to, { recursive: true });
for (const entry of fs.readdirSync(from)) {
if (skip.has(entry)) continue;
copy(path.join(from, entry), path.join(to, entry));
}
return;
}
fs.mkdirSync(path.dirname(to), { recursive: true });
fs.copyFileSync(from, to);
};
for (const entry of fs.readdirSync(source)) {
if (skip.has(entry)) continue;
copy(path.join(source, entry), path.join(target, entry));
}
NODE
echo "CLOUD_DESKTOP=1" >> "$GITHUB_ENV"
echo "✅ Cloud repository overlaid at $cloud_root"
- name: Install dependencies
shell: bash
run: |
set -euo pipefail
if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
cd ..
fi
pnpm install --node-linker=hoisted
run: pnpm install --node-linker=hoisted
# 移除国内 electron 镜像配置,GitHub Actions 使用官方源更快
- name: Remove China electron mirror from .npmrc
@@ -111,11 +31,4 @@ runs:
- name: Install deps on Desktop
shell: bash
run: |
set -euo pipefail
if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
cd ..
npm run install-isolated --prefix=./lobehub/apps/desktop
else
npm run install-isolated --prefix=./apps/desktop
fi
run: npm run install-isolated --prefix=./apps/desktop
+1 -1
View File
@@ -6,7 +6,7 @@ const prComment = async ({ github, context, releaseUrl, artifactsUrl, version, t
const COMMENT_IDENTIFIER = '<!-- DESKTOP-BUILD-COMMENT -->';
/**
* Generate comment body content
* 生成评论内容
*/
const generateCommentBody = async () => {
try {
+1 -1
View File
@@ -30,7 +30,7 @@ jobs:
This issue is closed, If you have any questions, you can comment and reply.
- name: Checkout repository
if: github.event_name == 'pull_request_target' && github.event.pull_request.merged == true
uses: actions/checkout@v6
uses: actions/checkout@v4
- name: Check if PR author is maintainer
if: github.event.pull_request.merged == true
@@ -104,7 +104,6 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -173,7 +172,6 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -218,7 +216,6 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+1 -2
View File
@@ -54,7 +54,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.16.0
node-version: 24.11.1
package-manager-cache: false
# 主要逻辑:确定构建版本号
@@ -92,7 +92,6 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: 24.11.1
# 设置 package.json 的版本号
@@ -87,7 +87,6 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+1 -2
View File
@@ -223,7 +223,6 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -410,7 +409,7 @@ jobs:
- uses: actions/checkout@v6
- name: Delete old canary GitHub releases
uses: actions/github-script@v8
uses: actions/github-script@v7
with:
script: |
const { data: releases } = await github.rest.repos.listReleases({
@@ -180,7 +180,6 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+2 -2
View File
@@ -28,7 +28,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.16.0
node-version: 24.11.1
- name: Setup pnpm
uses: pnpm/action-setup@v4
@@ -51,7 +51,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.16.0
node-version: 24.11.1
registry-url: https://registry.npmjs.org
- name: Setup pnpm
+25 -1
View File
@@ -19,6 +19,12 @@ jobs:
steps:
- uses: actions/checkout@v6
- name: Clean issue notice
uses: actions-cool/issues-helper@e361abf610221f09495ad510cb1e69328d839e1c # v3.7.6
with:
actions: 'close-issues'
labels: '🚨 Sync Fail'
- name: Sync upstream changes
id: sync
uses: aormsby/Fork-Sync-With-Upstream-action@v3.4
@@ -27,4 +33,22 @@ jobs:
upstream_sync_branch: main
target_sync_branch: main
target_repo_token: ${{ secrets.GITHUB_TOKEN }} # automatically generated, no need to set
test_mode: false
test_mode: false
- name: Sync check
if: failure()
uses: actions-cool/issues-helper@e361abf610221f09495ad510cb1e69328d839e1c # v3.7.6
with:
actions: 'create-issue'
title: '🚨 同步失败 | Sync Fail'
labels: '🚨 Sync Fail'
body: |
Due to a change in the workflow file of the [LobeChat][lobechat] upstream repository, GitHub has automatically suspended the scheduled automatic update. You need to manually sync your fork. Please refer to the detailed [Tutorial][tutorial-en-US] for instructions.
由于 [LobeChat][lobechat] 上游仓库的 workflow 文件变更,导致 GitHub 自动暂停了本次自动更新,你需要手动 Sync Fork 一次,请查看 [详细教程][tutorial-zh-CN]
![](https://github-production-user-asset-6210df.s3.amazonaws.com/17870709/273954625-df80c890-0822-4ac2-95e6-c990785cbed5.png)
[lobechat]: https://github.com/lobehub/lobe-chat
[tutorial-zh-CN]: https://lobehub.com/zh/docs/self-hosting/advanced/upstream-sync
[tutorial-en-US]: https://lobehub.com/docs/self-hosting/advanced/upstream-sync
+6 -52
View File
@@ -32,7 +32,7 @@ jobs:
runs-on: ubuntu-latest
name: Test Packages
env:
PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/trpc @lobechat/app-config @lobechat/locales @lobechat/env @lobechat/builtin-tool-lobe-agent model-bank @lobechat/agent-gateway-client @lobechat/agent-manager-runtime @lobechat/device-gateway-client @lobechat/device-identity @lobechat/eval-dataset-parser @lobechat/eval-rubric @lobechat/fetch-sse @lobechat/heterogeneous-agents'
PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/python-interpreter @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/trpc @lobechat/app-config @lobechat/locales @lobechat/env @lobechat/builtin-tool-lobe-agent model-bank @lobechat/agent-gateway-client @lobechat/agent-manager-runtime @lobechat/device-gateway-client @lobechat/device-identity @lobechat/eval-dataset-parser @lobechat/eval-rubric @lobechat/fetch-sse @lobechat/heterogeneous-agents'
steps:
- name: Checkout
@@ -90,23 +90,11 @@ jobs:
for package in $PACKAGES; do
dir="${package#@lobechat/}"
if [ -f "./packages/$dir/coverage/lcov.info" ]; then
flag="packages/$dir"
case "$dir" in
builtin-tool-*)
flag="builtin-tools"
;;
locales|env|device-gateway-client)
echo "Skipping Codecov upload for $dir."
continue
;;
esac
echo "Uploading coverage for $dir as $flag..."
echo "Uploading coverage for $dir..."
./codecov upload-coverage \
$COMMON_ARGS \
--file ./packages/$dir/coverage/lcov.info \
--flag "$flag" \
--flag packages/$dir \
--disable-search
fi
done
@@ -117,8 +105,8 @@ jobs:
if: needs.check-duplicate-run.outputs.should_skip != 'true'
strategy:
matrix:
shard: [1, 2]
name: Test App (shard ${{ matrix.shard }}/2)
shard: [1, 2, 3]
name: Test App (shard ${{ matrix.shard }}/3)
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -138,7 +126,7 @@ jobs:
run: pnpm install
- name: Run tests
run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/2 --exclude '**/apps/server/**'
run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/3
- name: Upload blob report
if: ${{ !cancelled() }}
@@ -231,40 +219,6 @@ jobs:
files: ./apps/desktop/coverage/lcov.info
flags: desktop
test-server:
needs: check-duplicate-run
if: needs.check-duplicate-run.outputs.should_skip != 'true'
name: Test Server
runs-on: ubuntu-latest
steps:
- name: Checkout
env:
REF_SHA: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
REPOSITORY: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name || github.repository }}
run: |
git init .
git remote add origin "https://github.com/${REPOSITORY}.git"
git fetch --no-tags --depth=1 origin "${REF_SHA}"
git checkout --force FETCH_HEAD
- name: Setup environment
uses: ./.github/actions/setup-env
- name: Install deps
run: pnpm install
- name: Test Server Coverage
run: bunx vitest --coverage --silent='passed-only' --reporter=default --coverage.reportsDirectory=./apps/server/coverage --dir apps/server
- name: Upload Server coverage to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./apps/server/coverage/lcov.info
flags: server
test-databsae:
needs: check-duplicate-run
if: needs.check-duplicate-run.outputs.should_skip != 'true'
+3 -2
View File
@@ -59,7 +59,6 @@ bun.lockb
# Build outputs
dist/
public/_spa/
public/_spa-auth/
public/spa/
es/
lib/
@@ -93,8 +92,10 @@ public/swe-worker*
# Generated files
src/app/spa/[variants]/[[...path]]/spaHtmlTemplates.ts
src/app/spa-auth/authHtmlTemplate.ts
public/*.js
public/sitemap.xml
public/sitemap-index.xml
sitemap*.xml
robots.txt
# Git hooks
-2
View File
@@ -136,5 +136,3 @@ bun run type-check
### Code Review
Before reviewing a PR / diff / branch change, read the **review-checklist** skill (`.agents/skills/review-checklist/SKILL.md`) — it lists the recurring mistakes specific to this codebase.
When designing or reviewing user-facing flows (empty/loading/error states, confirmations, async feedback, button hierarchy, lists at scale, pickers), follow the **ux** skill (`.agents/skills/ux/SKILL.md`) — LobeHub's design values (自然 / 意义感 / 确定性) plus per-aspect execution checklists.
-25
View File
@@ -2,31 +2,6 @@
# Changelog
## [Version 2.2.6](https://github.com/lobehub/lobe-chat/compare/v2.2.6-canary.8...v2.2.6)
<sup>Released on **2026-06-17**</sup>
#### ✨ Features
- **agent**: improve connector, document, and fleet workflows.
<br/>
<details>
<summary><kbd>Improvements and Fixes</kbd></summary>
#### What's improved
- **agent**: improve connector, document, and fleet workflows, closes [#15936](https://github.com/lobehub/lobe-chat/issues/15936) ([3f82033](https://github.com/lobehub/lobe-chat/commit/3f82033))
</details>
<div align="right">
[![](https://img.shields.io/badge/-BACK_TO_TOP-151515?style=flat-square)](#readme-top)
</div>
## [Version 2.2.1](https://github.com/lobehub/lobe-chat/compare/v0.0.0-nightly.pr15228.13999...v2.2.1)
<sup>Released on **2026-05-29**</sup>
-37
View File
@@ -1,7 +1,4 @@
import { execSync } from 'node:child_process';
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import { describe, expect, it } from 'vitest';
@@ -80,40 +77,6 @@ describe('lh file - E2E', () => {
});
});
// ── upload (local file) ───────────────────────────────
describe('upload', () => {
it('should upload a local file passed as a positional argument', () => {
const tmpFile = path.join(os.tmpdir(), `lh-e2e-upload-${Date.now()}.txt`);
fs.writeFileSync(tmpFile, 'hello from lh e2e upload');
try {
const result = runJson<{ id: string }>(`file upload ${tmpFile} --json id`);
expect(result).toHaveProperty('id');
if (result.id) run(`file delete ${result.id} --yes`);
} finally {
fs.rmSync(tmpFile, { force: true });
}
});
it('should upload a local file passed via --file', () => {
const tmpFile = path.join(os.tmpdir(), `lh-e2e-upload-f-${Date.now()}.txt`);
fs.writeFileSync(tmpFile, 'hello from lh e2e --file upload');
try {
const result = runJson<{ id: string }>(`file upload --file ${tmpFile} --json id`);
expect(result).toHaveProperty('id');
if (result.id) run(`file delete ${result.id} --yes`);
} finally {
fs.rmSync(tmpFile, { force: true });
}
});
it('should error when the local file does not exist', () => {
expect(() => run('file upload -f /no/such/lh-file.txt')).toThrow();
});
});
// ── recent ────────────────────────────────────────────
describe('recent', () => {
+1 -1
View File
@@ -1,6 +1,6 @@
.\" Code generated by `npm run man:generate`; DO NOT EDIT.
.\" Manual command details come from the Commander command tree.
.TH LH 1 "" "@lobehub/cli 0.0.32" "User Commands"
.TH LH 1 "" "@lobehub/cli 0.0.27" "User Commands"
.SH NAME
lh \- LobeHub CLI \- manage and connect to LobeHub services
.SH SYNOPSIS
+2 -5
View File
@@ -1,6 +1,6 @@
{
"name": "@lobehub/cli",
"version": "0.0.32",
"version": "0.0.27",
"type": "module",
"bin": {
"lh": "./dist/index.js",
@@ -29,15 +29,13 @@
},
"devDependencies": {
"@lobechat/agent-gateway-client": "workspace:*",
"@lobechat/device-control": "workspace:*",
"@lobechat/device-gateway-client": "workspace:*",
"@lobechat/device-identity": "workspace:*",
"@lobechat/heterogeneous-agents": "workspace:*",
"@lobechat/local-file-shell": "workspace:*",
"@lobechat/tool-runtime": "workspace:*",
"@trpc/client": "^11.8.1",
"@types/node": "^24.13.2",
"@types/semver": "^7.7.1",
"@types/node": "^22.13.5",
"@types/ws": "^8.18.1",
"commander": "^13.1.0",
"dayjs": "^1.11.19",
@@ -46,7 +44,6 @@
"fast-glob": "^3.3.3",
"ignore": "^7.0.5",
"picocolors": "^1.1.1",
"semver": "^7.7.3",
"superjson": "^2.2.6",
"tsdown": "^0.21.4",
"typescript": "^6.0.3",
-1
View File
@@ -1,6 +1,5 @@
packages:
- '../../packages/agent-gateway-client'
- '../../packages/device-control'
- '../../packages/device-gateway-client'
- '../../packages/device-identity'
- '../../packages/heterogeneous-agents'
-19
View File
@@ -440,25 +440,6 @@ describe('connect command', () => {
});
});
describe('disconnect (alias for connect stop)', () => {
it('should stop running daemon', async () => {
mockRunningPid = 12345;
const program = createProgram();
await program.parseAsync(['node', 'test', 'disconnect']);
expect(stopDaemon).toHaveBeenCalled();
expect(log.info).toHaveBeenCalledWith(expect.stringContaining('Daemon stopped'));
});
it('should warn if no daemon is running', async () => {
const program = createProgram();
await program.parseAsync(['node', 'test', 'disconnect']);
expect(log.warn).toHaveBeenCalledWith(expect.stringContaining('No daemon'));
});
});
describe('connect status', () => {
it('should show no daemon running', async () => {
const program = createProgram();
+16 -94
View File
@@ -2,16 +2,8 @@ import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import {
defaultGetLocalFilePreview,
defaultGetProjectFileIndex,
type DeviceControlDeps,
executeDeviceRpc,
} from '@lobechat/device-control';
import type {
AgentRunRequestMessage,
DeviceSystemInfo,
RpcRequestMessage,
SystemInfoRequestMessage,
ToolCallRequestMessage,
} from '@lobechat/device-gateway-client';
@@ -33,7 +25,6 @@ import {
stopDaemon,
writeStatus,
} from '../daemon/manager';
import { spawnHeteroAgentRun } from '../device/agentRun';
import { registerDevice, resolveDeviceIdentity } from '../device/register';
import { loadOrCreateConnectionId, loadSettings, normalizeUrl, saveSettings } from '../settings';
import { executeToolCall } from '../tools';
@@ -74,7 +65,17 @@ export function registerConnectCommand(program: Command) {
});
// Subcommands
connectCmd.command('stop').description('Stop the background daemon process').action(handleStop);
connectCmd
.command('stop')
.description('Stop the background daemon process')
.action(() => {
const stopped = stopDaemon();
if (stopped) {
log.info('Daemon stopped.');
} else {
log.warn('No daemon is running.');
}
});
connectCmd
.command('status')
@@ -138,27 +139,10 @@ export function registerConnectCommand(program: Command) {
}
handleDaemonStart({ ...options, daemon: true });
});
// Top-level alias for `connect stop`. Users who run `lh connect` naturally
// reach for `lh disconnect` to undo it; the nested `connect stop` is not
// discoverable enough on its own.
program
.command('disconnect')
.description('Disconnect from the device gateway (alias for `connect stop`)')
.action(handleStop);
}
// --- Internal helpers ---
function handleStop() {
const stopped = stopDaemon();
if (stopped) {
log.info('Daemon stopped.');
} else {
log.warn('No daemon is running.');
}
}
function handleDaemonStart(options: ConnectOptions) {
const existingPid = getRunningDaemonPid();
if (existingPid !== null) {
@@ -276,23 +260,19 @@ async function runConnect(options: ConnectOptions, isDaemonChild: boolean) {
// Handle tool call requests
client.on('tool_call_request', async (request: ToolCallRequestMessage) => {
const { operationId, requestId, timeout, toolCall } = request;
const { requestId, timeout, toolCall } = request;
if (isDaemonChild) {
appendLog(
`[TOOL] ${toolCall.apiName}${operationId ? ` op=${operationId}` : ''} (${requestId})`,
);
appendLog(`[TOOL] ${toolCall.apiName} (${requestId})`);
} else {
log.toolCall(toolCall.apiName, requestId, toolCall.arguments, operationId);
log.toolCall(toolCall.apiName, requestId, toolCall.arguments);
}
const result = await executeToolCall(toolCall.apiName, toolCall.arguments, timeout);
if (isDaemonChild) {
appendLog(
`[RESULT] ${result.success ? 'OK' : 'FAIL'}${operationId ? ` op=${operationId}` : ''} (${requestId})`,
);
appendLog(`[RESULT] ${result.success ? 'OK' : 'FAIL'} (${requestId})`);
} else {
log.toolResult(requestId, result.success, result.content, operationId);
log.toolResult(requestId, result.success, result.content);
}
client.sendToolCallResponse({
@@ -306,64 +286,6 @@ async function runConnect(options: ConnectOptions, isDaemonChild: boolean) {
});
});
// Handle generic server-internal device RPCs (git / workspace / file ops).
// Shares the `@lobechat/device-control` dispatcher with the desktop app so the
// CLI exposes the same remote-device control surface. File preview / index use
// the package's portable defaults (no preview-protocol approval on the CLI).
const deviceControlDeps: DeviceControlDeps = {
getLocalFilePreview: defaultGetLocalFilePreview,
getProjectFileIndex: defaultGetProjectFileIndex,
};
client.on('rpc_request', async (request: RpcRequestMessage) => {
const { method, params, requestId } = request;
if (isDaemonChild) appendLog(`[RPC] ${method} (${requestId})`);
else info(`Received rpc_request: method=${method} (${requestId})`);
try {
const data = await executeDeviceRpc(method, params, deviceControlDeps);
client.sendRpcResponse({ requestId, result: { data, success: true } });
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
if (isDaemonChild) appendLog(`[RPC ERROR] ${method}: ${message} (${requestId})`);
else error(`rpc_request method=${method} failed: ${message}`);
client.sendRpcResponse({ requestId, result: { error: message, success: false } });
}
});
// Handle gateway-dispatched agent runs (heterogeneous agents, e.g. Claude
// Code). Mirrors the desktop app: spawn `lh hetero exec`, which owns the full
// execution + server-ingest pipeline. Ack with the spawn outcome — `accepted`
// once the child starts, `rejected` if it fails to spawn (e.g. bad cwd) — so
// a failed dispatch surfaces as an error instead of a stuck assistant message.
client.on('agent_run_request', async (request: AgentRunRequestMessage) => {
info(
`Received agent_run_request: operationId=${request.operationId} type=${request.agentType}`,
);
try {
const ack = await spawnHeteroAgentRun(
{
agentType: request.agentType,
cwd: request.cwd,
imageList: request.imageList,
jwt: request.jwt,
operationId: request.operationId,
prompt: request.prompt,
resumeSessionId: request.resumeSessionId,
serverUrl: auth.serverUrl,
systemContext: request.systemContext,
topicId: request.topicId,
},
{ error, info },
);
client.sendAgentRunAck({ operationId: request.operationId, ...ack });
} catch (err) {
const reason = err instanceof Error ? err.message : String(err);
error(`agent_run_request failed: ${reason}`);
client.sendAgentRunAck({ operationId: request.operationId, reason, status: 'rejected' });
}
});
client.on('connected', () => {
updateStatus('connected');
});
+3 -117
View File
@@ -1,7 +1,3 @@
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import { Command } from 'commander';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
@@ -21,9 +17,6 @@ const { mockTrpcClient } = vi.hoisted(() => ({
removeFiles: { mutate: vi.fn() },
updateFile: { mutate: vi.fn() },
},
upload: {
createS3PreSignedUrl: { mutate: vi.fn() },
},
},
}));
@@ -45,11 +38,9 @@ describe('file command', () => {
exitSpy = vi.spyOn(process, 'exit').mockImplementation((() => {}) as any);
consoleSpy = vi.spyOn(console, 'log').mockImplementation(() => {});
mockGetTrpcClient.mockResolvedValue(mockTrpcClient);
for (const group of [mockTrpcClient.file, mockTrpcClient.upload]) {
for (const method of Object.values(group)) {
for (const fn of Object.values(method)) {
(fn as ReturnType<typeof vi.fn>).mockReset();
}
for (const method of Object.values(mockTrpcClient.file)) {
for (const fn of Object.values(method)) {
(fn as ReturnType<typeof vi.fn>).mockReset();
}
}
});
@@ -214,111 +205,6 @@ describe('file command', () => {
expect(mockTrpcClient.file.createFile.mutate).not.toHaveBeenCalled();
expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('already exists'));
});
it('should upload a local file passed as a positional argument', async () => {
const tmpFile = path.join(os.tmpdir(), `lh-upload-${process.pid}.txt`);
fs.writeFileSync(tmpFile, 'hello world');
const fetchSpy = vi
.spyOn(globalThis, 'fetch')
.mockResolvedValue({ ok: true, status: 200, statusText: 'OK' } as Response);
mockTrpcClient.file.checkFileHash.mutate.mockResolvedValue({ isExist: false });
mockTrpcClient.upload.createS3PreSignedUrl.mutate.mockResolvedValue('https://s3/presigned');
mockTrpcClient.file.createFile.mutate.mockResolvedValue({
id: 'f-local',
url: 'files/x.txt',
});
try {
const program = createProgram();
await program.parseAsync(['node', 'test', 'file', 'upload', tmpFile]);
expect(mockTrpcClient.upload.createS3PreSignedUrl.mutate).toHaveBeenCalled();
expect(fetchSpy).toHaveBeenCalledWith(
'https://s3/presigned',
expect.objectContaining({ method: 'PUT' }),
);
expect(mockTrpcClient.file.createFile.mutate).toHaveBeenCalledWith(
expect.objectContaining({
fileType: 'text/plain',
name: path.basename(tmpFile),
url: expect.stringContaining('.txt'),
}),
);
expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('File created'));
} finally {
fetchSpy.mockRestore();
fs.rmSync(tmpFile, { force: true });
}
});
it('should upload a local file passed via --file', async () => {
const tmpFile = path.join(os.tmpdir(), `lh-upload-f-${process.pid}.json`);
fs.writeFileSync(tmpFile, '{}');
const fetchSpy = vi
.spyOn(globalThis, 'fetch')
.mockResolvedValue({ ok: true, status: 200, statusText: 'OK' } as Response);
mockTrpcClient.file.checkFileHash.mutate.mockResolvedValue({ isExist: false });
mockTrpcClient.upload.createS3PreSignedUrl.mutate.mockResolvedValue('https://s3/presigned');
mockTrpcClient.file.createFile.mutate.mockResolvedValue({ id: 'f-json' });
try {
const program = createProgram();
await program.parseAsync(['node', 'test', 'file', 'upload', '--file', tmpFile]);
expect(mockTrpcClient.file.createFile.mutate).toHaveBeenCalledWith(
expect.objectContaining({ fileType: 'application/json' }),
);
} finally {
fetchSpy.mockRestore();
fs.rmSync(tmpFile, { force: true });
}
});
it('should skip the S3 upload when the local file hash already exists', async () => {
const tmpFile = path.join(os.tmpdir(), `lh-upload-dedup-${process.pid}.txt`);
fs.writeFileSync(tmpFile, 'dedup me');
const fetchSpy = vi.spyOn(globalThis, 'fetch');
mockTrpcClient.file.checkFileHash.mutate.mockResolvedValue({
isExist: true,
url: 'files/2024-01-01/existing.txt',
});
mockTrpcClient.file.createFile.mutate.mockResolvedValue({ id: 'f-dedup' });
try {
const program = createProgram();
await program.parseAsync(['node', 'test', 'file', 'upload', tmpFile]);
// No pre-sign and no S3 PUT should happen
expect(mockTrpcClient.upload.createS3PreSignedUrl.mutate).not.toHaveBeenCalled();
expect(fetchSpy).not.toHaveBeenCalled();
// The record reuses the existing url
expect(mockTrpcClient.file.createFile.mutate).toHaveBeenCalledWith(
expect.objectContaining({ url: 'files/2024-01-01/existing.txt' }),
);
} finally {
fetchSpy.mockRestore();
fs.rmSync(tmpFile, { force: true });
}
});
it('should error when local file does not exist', async () => {
const program = createProgram();
await program.parseAsync(['node', 'test', 'file', 'upload', '-f', '/no/such/file.txt']);
expect(log.error).toHaveBeenCalledWith(expect.stringContaining('File not found'));
expect(exitSpy).toHaveBeenCalledWith(1);
});
it('should error when no source is provided', async () => {
const program = createProgram();
await program.parseAsync(['node', 'test', 'file', 'upload']);
expect(log.error).toHaveBeenCalledWith(expect.stringContaining('Provide a local file path'));
expect(exitSpy).toHaveBeenCalledWith(1);
});
});
describe('edit', () => {
+7 -49
View File
@@ -4,7 +4,6 @@ import pc from 'picocolors';
import { getTrpcClient } from '../api/client';
import { confirm, outputJson, printTable, timeAgo, truncate } from '../utils/format';
import { log } from '../utils/logger';
import { uploadLocalFile } from '../utils/uploadLocalFile';
export function registerFileCommand(program: Command) {
const file = program.command('file').description('Manage files');
@@ -114,20 +113,18 @@ export function registerFileCommand(program: Command) {
// ── upload ───────────────────────────────────────────
file
.command('upload [source]')
.description('Upload a file from a local path or a URL')
.option('-f, --file <path>', 'Local file path to upload')
.option('--hash <hash>', 'File hash for deduplication check (URL mode)')
.option('--name <name>', 'File name (URL mode)')
.option('--type <type>', 'File MIME type (URL mode)')
.option('--size <size>', 'File size in bytes (URL mode)')
.command('upload <url>')
.description('Upload a file by URL (checks hash first)')
.option('--hash <hash>', 'File hash for deduplication check')
.option('--name <name>', 'File name')
.option('--type <type>', 'File MIME type')
.option('--size <size>', 'File size in bytes')
.option('--parent-id <id>', 'Parent folder ID')
.option('--json [fields]', 'Output JSON, optionally specify fields (comma-separated)')
.action(
async (
source: string | undefined,
url: string,
options: {
file?: string;
hash?: string;
json?: string | boolean;
name?: string;
@@ -136,47 +133,8 @@ export function registerFileCommand(program: Command) {
type?: string;
},
) => {
const isUrl = (value: string) =>
value.startsWith('http://') || value.startsWith('https://');
// Resolve the local file path: explicit --file, or a positional that is
// not a URL (e.g. `lh file upload ./games_list.txt`).
const localPath = options.file ?? (source && !isUrl(source) ? source : undefined);
const client = await getTrpcClient();
// ── Local file upload ──
if (localPath) {
let result;
try {
result = await uploadLocalFile(client, localPath, { parentId: options.parentId });
} catch (error) {
log.error(error instanceof Error ? error.message : String(error));
process.exit(1);
return;
}
if (options.json !== undefined) {
const fields = typeof options.json === 'string' ? options.json : undefined;
outputJson(result, fields);
return;
}
const r = result as any;
console.log(`${pc.green('✓')} File created: ${pc.bold(r.id || '')}`);
if (r.url) console.log(` URL: ${pc.dim(r.url)}`);
return;
}
// ── URL upload ──
if (!source) {
log.error('Provide a local file path, --file <path>, or a URL to upload.');
process.exit(1);
return;
}
const url = source;
// Check hash first if provided
if (options.hash) {
const check = await client.file.checkFileHash.mutate({ hash: options.hash });
-140
View File
@@ -1,7 +1,3 @@
import { rm as fsRm, writeFile as fsWriteFile } from 'node:fs/promises';
import os from 'node:os';
import path from 'node:path';
import { Command } from 'commander';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
@@ -10,9 +6,6 @@ import { registerGenerateCommand } from './generate';
const { mockTrpcClient } = vi.hoisted(() => ({
mockTrpcClient: {
asr: {
transcribe: { mutate: vi.fn() },
},
generation: {
deleteGeneration: { mutate: vi.fn() },
getGenerationStatus: { query: vi.fn() },
@@ -42,15 +35,6 @@ const { writeFileSync: mockWriteFileSync } = vi.hoisted(() => ({
writeFileSync: vi.fn(),
}));
const { uploadLocalFile: mockUploadLocalFile } = vi.hoisted(() => ({
uploadLocalFile: vi.fn(),
}));
vi.mock('../utils/uploadLocalFile', async (importOriginal) => {
const actual: Record<string, unknown> = await importOriginal();
return { ...actual, uploadLocalFile: mockUploadLocalFile };
});
vi.mock('../api/client', () => ({ getTrpcClient: mockGetTrpcClient }));
vi.mock('../api/http', () => ({ getAuthInfo: mockGetAuthInfo }));
vi.mock('node:fs', async (importOriginal) => {
@@ -385,130 +369,6 @@ describe('generate command', () => {
expect(log.error).toHaveBeenCalledWith(expect.stringContaining('not found'));
expect(exitSpy).toHaveBeenCalledWith(1);
});
it('should upload large local audio and transcribe by fileId', async () => {
// Real >3MB temp file so existsSync/statSync (unmocked) see it as large.
const bigPath = path.join(os.tmpdir(), `lh-asr-test-${process.pid}-${Date.now()}.mp3`);
await fsWriteFile(bigPath, Buffer.alloc(4 * 1024 * 1024));
mockUploadLocalFile.mockResolvedValue({ id: 'file_999' });
mockTrpcClient.asr.transcribe.mutate.mockResolvedValue({ text: 'big result' });
try {
const program = createProgram();
await program.parseAsync(['node', 'test', 'generate', 'asr', bigPath]);
expect(mockUploadLocalFile).toHaveBeenCalledWith(expect.anything(), bigPath);
expect(mockTrpcClient.asr.transcribe.mutate).toHaveBeenCalledWith(
expect.objectContaining({ fileId: 'file_999', model: 'whisper-1', provider: 'openai' }),
);
// never inlines bytes for the large file
expect(mockTrpcClient.asr.transcribe.mutate.mock.calls[0][0]).not.toHaveProperty(
'audioBase64',
);
expect(stdoutSpy).toHaveBeenCalledWith('big result');
} finally {
await fsRm(bigPath, { force: true });
}
});
it('should download and transcribe an audio URL', async () => {
const fetchMock = vi.fn().mockResolvedValue({
arrayBuffer: vi.fn().mockResolvedValue(new TextEncoder().encode('audio-bytes').buffer),
headers: new Headers(),
ok: true,
});
vi.stubGlobal('fetch', fetchMock);
mockTrpcClient.asr.transcribe.mutate.mockResolvedValue({ text: 'hello world' });
const program = createProgram();
await program.parseAsync([
'node',
'test',
'generate',
'asr',
'https://example.com/audio/sample.mp3',
]);
expect(fetchMock).toHaveBeenCalledWith('https://example.com/audio/sample.mp3');
expect(mockTrpcClient.asr.transcribe.mutate).toHaveBeenCalledWith(
expect.objectContaining({
audioBase64: Buffer.from('audio-bytes').toString('base64'),
fileName: 'sample.mp3',
model: 'whisper-1',
provider: 'openai',
}),
);
expect(stdoutSpy).toHaveBeenCalledWith('hello world');
expect(exitSpy).not.toHaveBeenCalled();
});
it('should derive an extension and mime type from Content-Type when the URL has none', async () => {
vi.stubGlobal(
'fetch',
vi.fn().mockResolvedValue({
arrayBuffer: vi.fn().mockResolvedValue(new TextEncoder().encode('audio-bytes').buffer),
headers: new Headers({ 'content-type': 'audio/mpeg; charset=binary' }),
ok: true,
}),
);
mockTrpcClient.asr.transcribe.mutate.mockResolvedValue({ text: 'ok' });
const program = createProgram();
await program.parseAsync(['node', 'test', 'generate', 'asr', 'https://example.com/download']);
expect(mockTrpcClient.asr.transcribe.mutate).toHaveBeenCalledWith(
expect.objectContaining({
fileName: 'download.mp3',
mimeType: 'audio/mpeg',
}),
);
});
it('should prefer the filename from Content-Disposition', async () => {
vi.stubGlobal(
'fetch',
vi.fn().mockResolvedValue({
arrayBuffer: vi.fn().mockResolvedValue(new TextEncoder().encode('audio-bytes').buffer),
headers: new Headers({
'content-disposition': 'attachment; filename="recording.wav"',
}),
ok: true,
}),
);
mockTrpcClient.asr.transcribe.mutate.mockResolvedValue({ text: 'ok' });
const program = createProgram();
await program.parseAsync([
'node',
'test',
'generate',
'asr',
'https://example.com/files/abc123?sig=xyz',
]);
expect(mockTrpcClient.asr.transcribe.mutate).toHaveBeenCalledWith(
expect.objectContaining({ fileName: 'recording.wav' }),
);
});
it('should exit when audio URL download fails', async () => {
vi.stubGlobal(
'fetch',
vi.fn().mockResolvedValue({ ok: false, status: 404, statusText: 'Not Found' }),
);
const program = createProgram();
await program.parseAsync([
'node',
'test',
'generate',
'asr',
'https://example.com/missing.mp3',
]);
expect(log.error).toHaveBeenCalledWith(expect.stringContaining('Failed to download audio'));
expect(exitSpy).toHaveBeenCalledWith(1);
});
});
describe('delete', () => {
+39 -167
View File
@@ -1,27 +1,16 @@
import { existsSync, statSync } from 'node:fs';
import { readFile, rm, writeFile } from 'node:fs/promises';
import os from 'node:os';
import { createReadStream, existsSync } from 'node:fs';
import path from 'node:path';
import type { Command } from 'commander';
import { getTrpcClient } from '../../api/client';
import { getAuthInfo } from '../../api/http';
import { log } from '../../utils/logger';
import { uploadLocalFile } from '../../utils/uploadLocalFile';
// Audio at or below this size is sent inline as base64; anything larger is
// uploaded first and transcribed by `fileId`. Kept in sync with the server-side
// inline cap in `apps/server/src/routers/lambda/asr.ts`.
const MAX_INLINE_AUDIO_BYTES = 3 * 1024 * 1024;
export function registerAsrCommand(parent: Command) {
parent
.command('asr <audio-file>')
.description(
'Convert speech to text (automatic speech recognition). Accepts a local path or a URL',
)
.description('Convert speech to text (automatic speech recognition)')
.option('--model <model>', 'STT model', 'whisper-1')
.option('--provider <provider>', 'AI provider', 'openai')
.option('--language <lang>', 'Language code (e.g. en, zh)')
.option('--json', 'Output raw JSON')
.action(
@@ -31,175 +20,58 @@ export function registerAsrCommand(parent: Command) {
json?: boolean;
language?: string;
model: string;
provider: string;
},
) => {
const isUrl = audioFile.startsWith('http://') || audioFile.startsWith('https://');
if (!isUrl && !existsSync(audioFile)) {
if (!existsSync(audioFile)) {
log.error(`File not found: ${audioFile}`);
process.exit(1);
return;
}
// Resolve the input to a local file path (downloading URLs to a temp
// file) so large audio can reuse the shared upload flow.
let localPath: string;
let fileName: string;
let mimeType: string | undefined;
let size: number;
let tempPath: string | undefined;
try {
if (isUrl) {
const downloaded = await fetchAudioFromUrl(audioFile);
fileName = downloaded.name;
mimeType = downloaded.mimeType;
size = downloaded.bytes.byteLength;
tempPath = path.join(os.tmpdir(), `lh-asr-${process.pid}-${Date.now()}-${fileName}`);
await writeFile(tempPath, downloaded.bytes);
localPath = tempPath;
} else {
localPath = audioFile;
fileName = path.basename(audioFile);
size = statSync(audioFile).size;
}
} catch (error) {
log.error(error instanceof Error ? error.message : String(error));
const { serverUrl, headers } = await getAuthInfo();
const sttOptions: Record<string, any> = { model: options.model };
if (options.language) sttOptions.language = options.language;
const formData = new FormData();
const fileBuffer = await readFileAsBlob(audioFile);
formData.append('speech', fileBuffer, path.basename(audioFile));
formData.append('options', JSON.stringify(sttOptions));
// Remove Content-Type for multipart/form-data (let fetch set it with boundary)
const { 'Content-Type': _, ...formHeaders } = headers;
const res = await fetch(`${serverUrl}/webapi/stt/openai`, {
body: formData,
headers: formHeaders,
method: 'POST',
});
if (!res.ok) {
const errText = await res.text();
log.error(`ASR failed: ${res.status} ${errText}`);
process.exit(1);
return;
}
try {
const client = await getTrpcClient();
const result = await res.json();
let result: { text: string };
if (size > MAX_INLINE_AUDIO_BYTES) {
// Large audio: upload to storage, then transcribe by fileId so the
// bytes never travel inline through tRPC.
process.stderr.write(
`Audio is ${(size / 1024 / 1024).toFixed(1)}MB — uploading before transcription…\n`,
);
const record = (await uploadLocalFile(client, localPath)) as { id: string };
result = await client.asr.transcribe.mutate({
fileId: record.id,
language: options.language,
model: options.model,
provider: options.provider,
});
} else {
const bytes = await readFile(localPath);
result = await client.asr.transcribe.mutate({
audioBase64: Buffer.from(bytes).toString('base64'),
fileName,
language: options.language,
mimeType,
model: options.model,
provider: options.provider,
});
}
if (options.json) {
console.log(JSON.stringify(result, null, 2));
} else {
process.stdout.write(result.text);
process.stdout.write('\n');
}
} catch (error) {
log.error(`ASR failed: ${error instanceof Error ? error.message : String(error)}`);
process.exit(1);
} finally {
if (tempPath) {
await rm(tempPath, { force: true }).catch(() => {});
}
if (options.json) {
console.log(JSON.stringify(result, null, 2));
} else {
const text = (result as any).text || JSON.stringify(result);
process.stdout.write(text);
process.stdout.write('\n');
}
},
);
}
// Common audio MIME types mapped to a file extension the transcription
// provider can recognize. Keep the extensions within the set OpenAI's
// /audio/transcriptions endpoint accepts.
const AUDIO_MIME_TO_EXT: Record<string, string> = {
'audio/aac': 'aac',
'audio/flac': 'flac',
'audio/m4a': 'm4a',
'audio/mp3': 'mp3',
'audio/mp4': 'm4a',
'audio/mpeg': 'mp3',
'audio/mpga': 'mp3',
'audio/ogg': 'ogg',
'audio/opus': 'ogg',
'audio/wav': 'wav',
'audio/wave': 'wav',
'audio/webm': 'webm',
'audio/x-m4a': 'm4a',
'audio/x-wav': 'wav',
};
async function fetchAudioFromUrl(
url: string,
): Promise<{ bytes: Uint8Array; mimeType?: string; name: string }> {
const res = await fetch(url);
if (!res.ok) {
throw new Error(`Failed to download audio: ${res.status} ${res.statusText}`);
}
const bytes = new Uint8Array(await res.arrayBuffer());
// Strip any parameters from the Content-Type (e.g. `audio/mpeg; charset=...`).
const contentType = res.headers.get('content-type')?.split(';')[0]?.trim().toLowerCase();
const mimeType = contentType?.startsWith('audio/') ? contentType : undefined;
// Prefer the name the server advertises, then the URL path, then a fallback.
const name =
fileNameFromContentDisposition(res.headers.get('content-disposition')) ||
basenameFromUrl(url) ||
'audio';
// Transcription providers infer the audio format from the file extension, so
// make sure the name carries one. Signed URLs and /download endpoints often
// have no extension in the path — in that case borrow it from the
// Content-Type when we recognize it.
const ext = contentType ? AUDIO_MIME_TO_EXT[contentType] : undefined;
const finalName = path.extname(name) || !ext ? name : `${name}.${ext}`;
return { bytes, mimeType, name: finalName };
}
// Extract a file name from a Content-Disposition header, handling both the
// plain `filename="x"` form and the RFC 5987 extended `filename*=UTF-8''x` form.
function fileNameFromContentDisposition(header: string | null): string | undefined {
if (!header) return undefined;
// Extended form takes precedence and may be percent-encoded.
const extended = /filename\*=\s*(?:UTF-8|ISO-8859-1)?''([^;]+)/i.exec(header);
if (extended?.[1]) {
try {
return path.basename(decodeURIComponent(extended[1].trim()));
} catch {
// Malformed encoding — fall through to the plain form.
}
}
const plain = /filename=\s*"?([^";]+)"?/i.exec(header);
const value = plain?.[1]?.trim();
return value ? path.basename(value) : undefined;
}
// Derive the (URL-decoded) last path segment of a URL, if any.
function basenameFromUrl(url: string): string | undefined {
let pathname: string;
try {
pathname = new URL(url).pathname;
} catch {
return undefined;
}
const base = path.basename(pathname);
if (!base) return undefined;
try {
return decodeURIComponent(base);
} catch {
return base;
async function readFileAsBlob(filePath: string): Promise<Blob> {
const chunks: Uint8Array[] = [];
const stream = createReadStream(filePath);
for await (const chunk of stream) {
chunks.push(chunk as Uint8Array);
}
return new Blob(chunks);
}
+1 -48
View File
@@ -649,55 +649,8 @@ describe('hetero exec command', () => {
]);
});
it('finishes with result "error" when a terminal error event is pushed despite a clean exit', async () => {
// CC relays an API/rate-limit error as an in-stream `error` event but still
// exits 0. The finish result must NOT be derived from the exit code alone,
// otherwise the topic/task is wrongly marked completed.
mockSpawnAgent.mockReturnValue(
createFakeHandle({
events: [
{
data: {
error: 'API Error: Server is temporarily limiting requests · Rate limited',
message: 'API Error: Server is temporarily limiting requests · Rate limited',
},
operationId: 'op-err',
stepIndex: 0,
timestamp: 1,
type: 'error',
},
],
exitCode: 0,
}),
);
await runCmd([
'hetero',
'exec',
'--type',
'claude-code',
'--prompt',
'hi',
'--topic',
'topic-1',
'--operation-id',
'op-err',
'--render',
'none',
]);
expect(mockHeteroFinishMutate).toHaveBeenCalledTimes(1);
expect(mockHeteroFinishMutate.mock.calls[0][0]).toMatchObject({
error: {
message: 'API Error: Server is temporarily limiting requests · Rate limited',
type: 'AgentRuntimeError',
},
result: 'error',
});
});
it('resets the per-message text accumulator at message boundaries (no cross-message duplication)', async () => {
// The `replace` snapshot accumulator must not span
// LOBE-10157 Bug 3: the `replace` snapshot accumulator must not span
// message boundaries. Two assistant messages separated by a
// stream_end/stream_start boundary must each snapshot only their OWN
// text — otherwise the second message re-emits the first's text verbatim.
+8 -36
View File
@@ -261,7 +261,7 @@ class SerialServerIngester {
// adapter's `openMainMessage`) must reset it — otherwise it spans the
// whole run and every later message's snapshot re-emits all prior
// messages' text verbatim, which the server then persists into the new
// DB message: cross-message text duplication. Reset
// DB message (LOBE-10157 Bug 3: cross-message text duplication). Reset
// AFTER flushing the just-ended message's pending snapshot above.
if (event.type === 'stream_start' || event.type === 'stream_end') {
this.accumulatedText = '';
@@ -467,11 +467,6 @@ const exec = async (options: ExecOptions): Promise<void> => {
* sessionId — CC session id from `system.init` (undefined on resume failure)
* ingestError — true when a batch could not be flushed after retries
* resumeNotFound — true when a resume-not-found error was intercepted
* sawTerminalError — true when a terminal `error` event was pushed to the
* ingester (CC can relay an API/rate-limit error this way
* and still exit 0, so the exit code alone is not enough)
* terminalErrorMessage — the message from that terminal `error` event, used
* as the task-level error detail in the finish payload
* stderrContent — accumulated stderr (only when interceptResumeErrors=true)
*/
const runOneAgent = async (
@@ -482,11 +477,9 @@ const exec = async (options: ExecOptions): Promise<void> => {
code: number | null;
ingestError: boolean;
resumeNotFound: boolean;
sawTerminalError: boolean;
sessionId: string | undefined;
signal: NodeJS.Signals | null;
stderrContent: string;
terminalErrorMessage: string | undefined;
}> => {
// One raw-dump file pair per spawn attempt (the resume retry is a second
// attempt). The stdout tee runs inside `spawnAgent` before the adapter.
@@ -556,8 +549,6 @@ const exec = async (options: ExecOptions): Promise<void> => {
// into the ingester. When intercepting resume errors, a matching
// `error` event is withheld from the ingester and flags a retry instead.
let resumeNotFound = false;
let sawTerminalError = false;
let terminalErrorMessage: string | undefined;
const ingestError = false;
try {
for await (const event of handle.events) {
@@ -572,16 +563,6 @@ const exec = async (options: ExecOptions): Promise<void> => {
continue;
}
}
// A terminal `error` event (e.g. an API/rate-limit error relayed by CC)
// must mark the run as failed even when the child exits 0 — track it so
// the finish result is not derived from the exit code alone. Capture the
// message too, so the finish payload can surface it as the task-level
// error detail (CC relays these on stdout, not stderr).
if (event.type === 'error') {
sawTerminalError = true;
const data = event.data as Record<string, unknown> | undefined;
terminalErrorMessage = String(data?.message ?? data?.error ?? '') || undefined;
}
if (emitJsonl) process.stdout.write(`${JSON.stringify(event)}\n`);
serverIngester?.push(event);
}
@@ -627,11 +608,9 @@ const exec = async (options: ExecOptions): Promise<void> => {
code,
ingestError,
resumeNotFound,
sawTerminalError,
sessionId: handle.sessionId,
signal,
stderrContent,
terminalErrorMessage,
};
};
@@ -696,23 +675,16 @@ const exec = async (options: ExecOptions): Promise<void> => {
result = { ...result, ingestError: true };
}
// CC relays API/rate-limit errors as an in-stream terminal `error` event but
// still exits 0, so the exit code alone would report `success`. Treat any
// pushed terminal error as a failed run so the topic/task is marked failed.
const exitedClean =
!result.ingestError && !result.sawTerminalError && (code === 0 || signal === 'SIGTERM');
const exitedClean = !result.ingestError && (code === 0 || signal === 'SIGTERM');
// When the run failed, pass an error detail so the server surfaces a useful
// message instead of the generic "Agent execution failed" fallback. Prefer
// the in-stream terminal error (CC relays API/rate-limit errors here while
// exiting 0, so stderr is empty); otherwise fall back to the stderr tail.
// Trim to the last 1 KB — the tail is most informative and keeps the tRPC
// payload small.
// When the run failed, pass stderr as the error detail so the server can
// surface a useful message instead of the generic "Agent execution failed"
// fallback. Trim to the last 1 KB — the tail is most informative and
// keeps the tRPC payload small.
const stderrTail = result.stderrContent.trim();
const errorDetail = result.terminalErrorMessage || stderrTail;
const finishError =
!exitedClean && errorDetail
? { message: errorDetail.slice(-1024), type: 'AgentRuntimeError' }
!exitedClean && stderrTail
? { message: stderrTail.slice(-1024), type: 'AgentRuntimeError' }
: undefined;
try {
+74 -13
View File
@@ -1,12 +1,14 @@
import crypto from 'node:crypto';
import fs from 'node:fs';
import path from 'node:path';
import type { Command } from 'commander';
import pc from 'picocolors';
import { getTrpcClient } from '../api/client';
import { getAuthInfo } from '../api/http';
import { confirm, outputJson, printTable, timeAgo, truncate } from '../utils/format';
import { log } from '../utils/logger';
import { uploadLocalFile } from '../utils/uploadLocalFile';
function formatFileType(fileType: string): string {
if (!fileType) return '';
@@ -322,22 +324,81 @@ export function registerKbCommand(program: Command) {
.description('Upload a file to a knowledge base')
.option('--parent <parentId>', 'Parent folder ID')
.action(async (knowledgeBaseId: string, filePath: string, options: { parent?: string }) => {
const client = await getTrpcClient();
let result;
try {
result = await uploadLocalFile(client, filePath, {
knowledgeBaseId,
parentId: options.parent,
});
} catch (error) {
log.error(error instanceof Error ? error.message : String(error));
const resolved = path.resolve(filePath);
if (!fs.existsSync(resolved)) {
log.error(`File not found: ${resolved}`);
process.exit(1);
return;
}
const stat = fs.statSync(resolved);
const fileName = path.basename(resolved);
const fileBuffer = fs.readFileSync(resolved);
// Compute SHA-256 hash
const hash = crypto.createHash('sha256').update(fileBuffer).digest('hex');
// Detect MIME type from extension
const ext = path.extname(fileName).toLowerCase().slice(1);
const mimeMap: Record<string, string> = {
csv: 'text/csv',
doc: 'application/msword',
docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
gif: 'image/gif',
jpeg: 'image/jpeg',
jpg: 'image/jpeg',
json: 'application/json',
md: 'text/markdown',
mp3: 'audio/mpeg',
mp4: 'video/mp4',
pdf: 'application/pdf',
png: 'image/png',
pptx: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
svg: 'image/svg+xml',
txt: 'text/plain',
webp: 'image/webp',
xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
};
const fileType = mimeMap[ext] || 'application/octet-stream';
const client = await getTrpcClient();
const { serverUrl, headers } = await getAuthInfo();
// 1. Get presigned URL
const date = new Date().toLocaleDateString('en-CA'); // YYYY-MM-DD
const pathname = `files/${date}/${hash}.${ext}`;
const presigned = await client.upload.createS3PreSignedUrl.mutate({ pathname });
// 2. Upload to S3
const presignedUrl = typeof presigned === 'string' ? presigned : (presigned as any).url;
const uploadRes = await fetch(presignedUrl, {
body: fileBuffer,
headers: { 'Content-Type': fileType },
method: 'PUT',
});
if (!uploadRes.ok) {
log.error(`Upload failed: ${uploadRes.status} ${uploadRes.statusText}`);
process.exit(1);
}
// 3. Create file record
const result = await client.file.createFile.mutate({
fileType,
hash,
knowledgeBaseId,
metadata: {
date,
dirname: '',
filename: fileName,
path: pathname,
},
name: fileName,
parentId: options.parent,
size: stat.size,
url: pathname,
});
console.log(
`${pc.green('✓')} Uploaded ${pc.bold(path.basename(filePath))}${pc.bold((result as any).id)}`,
`${pc.green('✓')} Uploaded ${pc.bold(fileName)}${pc.bold((result as any).id)}`,
);
});
}
+1 -31
View File
@@ -1,8 +1,7 @@
import { Command } from 'commander';
import { beforeEach, describe, expect, it, vi } from 'vitest';
import { describe, expect, it, vi } from 'vitest';
import { clearCredentials } from '../auth/credentials';
import { stopDaemon } from '../daemon/manager';
import { log } from '../utils/logger';
import { registerLogoutCommand } from './logout';
@@ -10,10 +9,6 @@ vi.mock('../auth/credentials', () => ({
clearCredentials: vi.fn(),
}));
vi.mock('../daemon/manager', () => ({
stopDaemon: vi.fn(),
}));
vi.mock('../utils/logger', () => ({
log: {
debug: vi.fn(),
@@ -24,11 +19,6 @@ vi.mock('../utils/logger', () => ({
}));
describe('logout command', () => {
beforeEach(() => {
vi.clearAllMocks();
vi.mocked(stopDaemon).mockReturnValue(false);
});
function createProgram() {
const program = new Command();
program.exitOverride();
@@ -54,24 +44,4 @@ describe('logout command', () => {
expect(log.info).toHaveBeenCalledWith(expect.stringContaining('Already logged out'));
});
it('should stop the connect daemon before clearing credentials', async () => {
vi.mocked(stopDaemon).mockReturnValue(true);
vi.mocked(clearCredentials).mockReturnValue(true);
const program = createProgram();
await program.parseAsync(['node', 'test', 'logout']);
expect(stopDaemon).toHaveBeenCalled();
expect(log.info).toHaveBeenCalledWith(expect.stringContaining('Disconnected device daemon'));
});
it('should still attempt daemon teardown when no credentials exist', async () => {
vi.mocked(clearCredentials).mockReturnValue(false);
const program = createProgram();
await program.parseAsync(['node', 'test', 'logout']);
expect(stopDaemon).toHaveBeenCalled();
});
});
-9
View File
@@ -1,7 +1,6 @@
import type { Command } from 'commander';
import { clearCredentials } from '../auth/credentials';
import { stopDaemon } from '../daemon/manager';
import { log } from '../utils/logger';
export function registerLogoutCommand(program: Command) {
@@ -9,14 +8,6 @@ export function registerLogoutCommand(program: Command) {
.command('logout')
.description('Log out and remove stored credentials')
.action(() => {
// Tear down the connect daemon first — otherwise it keeps the device
// online on the gateway with the cached token even after credentials are
// gone, leaving the machine remotely driveable past "logout".
const stopped = stopDaemon();
if (stopped) {
log.info('Disconnected device daemon.');
}
const removed = clearCredentials();
if (removed) {
log.info('Logged out. Credentials removed.');
-58
View File
@@ -100,19 +100,6 @@ describe('model command', () => {
expect(consoleSpy).toHaveBeenCalledWith(JSON.stringify(visibleModels, null, 2));
});
it('should normalize the legacy `stt` type to `asr` when filtering', async () => {
mockTrpcClient.aiModel.getAiProviderModelList.query.mockResolvedValue([
{ displayName: 'Whisper', enabled: true, id: 'whisper-1', type: 'asr' },
]);
const program = createProgram();
await program.parseAsync(['node', 'test', 'model', 'list', 'openai', '--type', 'stt']);
expect(mockTrpcClient.aiModel.getAiProviderModelList.query).toHaveBeenCalledWith(
expect.objectContaining({ id: 'openai', type: 'asr' }),
);
});
});
describe('view', () => {
@@ -170,28 +157,6 @@ describe('model command', () => {
);
expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('Created model'));
});
it('should normalize the legacy `stt` type to `asr`', async () => {
mockTrpcClient.aiModel.createAiModel.mutate.mockResolvedValue('whisper-1');
const program = createProgram();
await program.parseAsync([
'node',
'test',
'model',
'create',
'--id',
'whisper-1',
'--provider',
'openai',
'--type',
'stt',
]);
expect(mockTrpcClient.aiModel.createAiModel.mutate).toHaveBeenCalledWith(
expect.objectContaining({ id: 'whisper-1', providerId: 'openai', type: 'asr' }),
);
});
});
describe('edit', () => {
@@ -219,29 +184,6 @@ describe('model command', () => {
expect(consoleSpy).toHaveBeenCalledWith(expect.stringContaining('Updated model'));
});
it('should normalize the legacy `stt` type to `asr`', async () => {
mockTrpcClient.aiModel.updateAiModel.mutate.mockResolvedValue({});
const program = createProgram();
await program.parseAsync([
'node',
'test',
'model',
'edit',
'whisper-1',
'--provider',
'openai',
'--type',
'stt',
]);
expect(mockTrpcClient.aiModel.updateAiModel.mutate).toHaveBeenCalledWith({
id: 'whisper-1',
providerId: 'openai',
value: expect.objectContaining({ type: 'asr' }),
});
});
it('should error when no changes specified', async () => {
const program = createProgram();
await program.parseAsync(['node', 'test', 'model', 'edit', 'gpt-4', '--provider', 'openai']);
+8 -15
View File
@@ -7,11 +7,6 @@ import { log } from '../utils/logger';
const isVisibleModel = (model: { visible?: boolean }) => model.visible !== false;
// The model type `stt` was renamed to the standard `asr`. Accept the legacy
// alias on CLI input and forward/compare `asr`, so existing scripts and muscle
// memory keep working against the new router schema.
const normalizeModelType = (type: string): string => (type === 'stt' ? 'asr' : type);
export function registerModelCommand(program: Command) {
const model = program.command('model').description('Manage AI models');
@@ -24,7 +19,7 @@ export function registerModelCommand(program: Command) {
.option('--enabled', 'Only show enabled models')
.option(
'--type <type>',
'Filter by model type (chat|embedding|tts|asr|image|video|text2music|realtime)',
'Filter by model type (chat|embedding|tts|stt|image|video|text2music|realtime)',
)
.option('--json [fields]', 'Output JSON, optionally specify fields (comma-separated)')
.action(
@@ -34,20 +29,18 @@ export function registerModelCommand(program: Command) {
) => {
const client = await getTrpcClient();
const typeFilter = options.type ? normalizeModelType(options.type) : undefined;
const input: Record<string, any> = { id: providerId };
if (options.limit) input.limit = Number.parseInt(options.limit, 10);
if (options.enabled) input.enabled = true;
if (typeFilter) input.type = typeFilter;
if (options.type) input.type = options.type;
const result = await client.aiModel.getAiProviderModelList.query(input as any);
let items = (Array.isArray(result) ? result : ((result as any).items ?? [])).filter(
isVisibleModel,
);
if (typeFilter) {
items = items.filter((m: any) => m.type === typeFilter);
if (options.type) {
items = items.filter((m: any) => m.type === options.type);
}
if (options.json !== undefined) {
@@ -113,7 +106,7 @@ export function registerModelCommand(program: Command) {
.option('--display-name <name>', 'Display name')
.option(
'--type <type>',
'Model type (chat|embedding|tts|asr|image|video|text2music|realtime)',
'Model type (chat|embedding|tts|stt|image|video|text2music|realtime)',
'chat',
)
.action(
@@ -123,7 +116,7 @@ export function registerModelCommand(program: Command) {
const input: Record<string, any> = {
id: options.id,
providerId: options.provider,
type: normalizeModelType(options.type || 'chat'),
type: options.type || 'chat',
};
if (options.displayName) input.displayName = options.displayName;
@@ -139,7 +132,7 @@ export function registerModelCommand(program: Command) {
.description('Update model info')
.requiredOption('--provider <providerId>', 'Provider ID')
.option('--display-name <name>', 'Display name')
.option('--type <type>', 'Model type (chat|embedding|tts|asr|image|video|text2music|realtime)')
.option('--type <type>', 'Model type (chat|embedding|tts|stt|image|video|text2music|realtime)')
.action(
async (id: string, options: { displayName?: string; provider: string; type?: string }) => {
if (!options.displayName && !options.type) {
@@ -151,7 +144,7 @@ export function registerModelCommand(program: Command) {
const value: Record<string, any> = {};
if (options.displayName) value.displayName = options.displayName;
if (options.type) value.type = normalizeModelType(options.type);
if (options.type) value.type = options.type;
await client.aiModel.updateAiModel.mutate({
id,
+16 -20
View File
@@ -64,18 +64,15 @@ describe('skill command', () => {
describe('list', () => {
it('should display skills in table format', async () => {
mockTrpcClient.agentSkills.list.query.mockResolvedValue({
data: [
{
description: 'A skill',
id: 's1',
identifier: 'test-skill',
name: 'Test Skill',
source: 'user',
},
],
total: 1,
});
mockTrpcClient.agentSkills.list.query.mockResolvedValue([
{
description: 'A skill',
id: 's1',
identifier: 'test-skill',
name: 'Test Skill',
source: 'user',
},
]);
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list']);
@@ -86,7 +83,7 @@ describe('skill command', () => {
it('should output JSON when --json flag is used', async () => {
const items = [{ id: 's1', name: 'Test' }];
mockTrpcClient.agentSkills.list.query.mockResolvedValue({ data: items, total: items.length });
mockTrpcClient.agentSkills.list.query.mockResolvedValue(items);
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list', '--json']);
@@ -95,7 +92,7 @@ describe('skill command', () => {
});
it('should filter by source', async () => {
mockTrpcClient.agentSkills.list.query.mockResolvedValue({ data: [], total: 0 });
mockTrpcClient.agentSkills.list.query.mockResolvedValue([]);
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list', '--source', 'builtin']);
@@ -114,7 +111,7 @@ describe('skill command', () => {
});
it('should show message when no skills found', async () => {
mockTrpcClient.agentSkills.list.query.mockResolvedValue({ data: [], total: 0 });
mockTrpcClient.agentSkills.list.query.mockResolvedValue([]);
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list']);
@@ -214,10 +211,9 @@ describe('skill command', () => {
describe('search', () => {
it('should search skills', async () => {
mockTrpcClient.agentSkills.search.query.mockResolvedValue({
data: [{ description: 'A skill', id: 's1', name: 'Found Skill' }],
total: 1,
});
mockTrpcClient.agentSkills.search.query.mockResolvedValue([
{ description: 'A skill', id: 's1', name: 'Found Skill' },
]);
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'search', 'test']);
@@ -227,7 +223,7 @@ describe('skill command', () => {
});
it('should show message when no results', async () => {
mockTrpcClient.agentSkills.search.query.mockResolvedValue({ data: [], total: 0 });
mockTrpcClient.agentSkills.search.query.mockResolvedValue([]);
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'search', 'nothing']);
+2 -2
View File
@@ -47,7 +47,7 @@ export function registerSkillCommand(program: Command) {
if (options.source) input.source = options.source as 'builtin' | 'market' | 'user';
const result = await client.agentSkills.list.query(input);
const items = result?.data ?? [];
const items = Array.isArray(result) ? result : [];
if (options.json !== undefined) {
const fields = typeof options.json === 'string' ? options.json : undefined;
@@ -206,7 +206,7 @@ export function registerSkillCommand(program: Command) {
.action(async (query: string, options: { json?: string | boolean }) => {
const client = await getTrpcClient();
const result = await client.agentSkills.search.query({ query });
const items = result?.data ?? [];
const items = Array.isArray(result) ? result : [];
if (options.json !== undefined) {
const fields = typeof options.json === 'string' ? options.json : undefined;
-57
View File
@@ -1,57 +0,0 @@
import { describe, expect, it } from 'vitest';
import { buildInstallCommand, isNewerVersion } from './update';
describe('isNewerVersion', () => {
it('compares core versions', () => {
expect(isNewerVersion('1.2.3', '1.2.2')).toBe(true);
expect(isNewerVersion('1.2.2', '1.2.3')).toBe(false);
expect(isNewerVersion('1.2.3', '1.2.3')).toBe(false);
expect(isNewerVersion('2.0.0', '1.9.9')).toBe(true);
});
it('tolerates a leading v and missing segments', () => {
expect(isNewerVersion('v1.2.0', '1.2.0')).toBe(false);
expect(isNewerVersion('1.2', '1.2.0')).toBe(false);
expect(isNewerVersion('1.3', '1.2.9')).toBe(true);
});
it('ranks a stable release above a prerelease of the same core', () => {
expect(isNewerVersion('1.2.3', '1.2.3-beta.1')).toBe(true);
expect(isNewerVersion('1.2.3-beta.1', '1.2.3')).toBe(false);
expect(isNewerVersion('1.2.3-beta.2', '1.2.3-beta.1')).toBe(true);
expect(isNewerVersion('1.2.3-beta.1', '1.2.3-beta.1')).toBe(false);
});
it('orders numeric prerelease identifiers numerically, not lexicographically', () => {
// The bug a raw string compare gets wrong: beta.10 must outrank beta.9.
expect(isNewerVersion('1.0.0-beta.10', '1.0.0-beta.9')).toBe(true);
expect(isNewerVersion('1.0.0-beta.9', '1.0.0-beta.10')).toBe(false);
expect(isNewerVersion('1.0.0-beta.2', '1.0.0-beta.10')).toBe(false);
});
it('returns false for an unparseable latest version', () => {
expect(isNewerVersion('not-a-version', '1.0.0')).toBe(false);
});
});
describe('buildInstallCommand', () => {
it('builds the global install command per package manager', () => {
expect(buildInstallCommand('npm', '@lobehub/cli@1.0.0')).toEqual({
args: ['install', '-g', '@lobehub/cli@1.0.0'],
command: 'npm',
});
expect(buildInstallCommand('pnpm', '@lobehub/cli@1.0.0')).toEqual({
args: ['add', '-g', '@lobehub/cli@1.0.0'],
command: 'pnpm',
});
expect(buildInstallCommand('bun', '@lobehub/cli@1.0.0')).toEqual({
args: ['add', '-g', '@lobehub/cli@1.0.0'],
command: 'bun',
});
expect(buildInstallCommand('yarn', '@lobehub/cli@1.0.0')).toEqual({
args: ['global', 'add', '@lobehub/cli@1.0.0'],
command: 'yarn',
});
});
});

Some files were not shown because too many files have changed in this diff Show More