mirror of
https://github.com/lobehub/lobe-chat.git
synced 2026-06-14 03:30:19 +00:00
🐛 fix(chat): compact operation metrics on narrow inputs (#15735)
* 🐛 fix: compact operation metrics on narrow inputs * 📝 docs: improve agent testing report template
This commit is contained in:
@@ -38,9 +38,9 @@ lists `packages/**`, `e2e`, `apps/server`, and only `apps/desktop/src/main` —
|
||||
refresh them, so install in every app the test will touch:
|
||||
|
||||
```bash
|
||||
pnpm install # root workspace
|
||||
cd apps/desktop && pnpm install # Electron surface
|
||||
cd apps/cli && pnpm install # CLI surface
|
||||
pnpm install # root workspace
|
||||
cd apps/desktop && pnpm install # Electron surface
|
||||
cd apps/cli && pnpm install # CLI surface
|
||||
```
|
||||
|
||||
Symptom of a stale standalone install: the build/launch fails to resolve a
|
||||
@@ -63,12 +63,12 @@ long-running scripts.
|
||||
./.agents/skills/agent-testing/scripts/setup-auth.sh status
|
||||
```
|
||||
|
||||
| Surface | Mechanism | One-key path | Standard check |
|
||||
| -------- | ------------------------------------------------- | ------------------------------ | ------------------------------ |
|
||||
| CLI | OIDC Device Code Flow (`apps/cli/.lobehub-dev`) | `setup-auth.sh cli` | `setup-auth.sh status` |
|
||||
| Web | better-auth cookie injection into `agent-browser` | `pbpaste \| setup-auth.sh web` | `setup-auth.sh web-verify` |
|
||||
| Electron | App's own persistent login state | Log in once in the app | `app-probe.sh auth` |
|
||||
| Bot | Native apps already logged in | — | per-platform screenshot |
|
||||
| Surface | Mechanism | One-key path | Standard check |
|
||||
| -------- | ------------------------------------------------- | ------------------------------ | -------------------------- |
|
||||
| CLI | OIDC Device Code Flow (`apps/cli/.lobehub-dev`) | `setup-auth.sh cli` | `setup-auth.sh status` |
|
||||
| Web | better-auth cookie injection into `agent-browser` | `pbpaste \| setup-auth.sh web` | `setup-auth.sh web-verify` |
|
||||
| Electron | App's own persistent login state | Log in once in the app | `app-probe.sh auth` |
|
||||
| Bot | Native apps already logged in | — | per-platform screenshot |
|
||||
|
||||
Login-state checks are standardized — do NOT hand-roll `window.__LOBE_STORES`
|
||||
eval snippets; use `scripts/app-probe.sh auth` (returns `{ isSignedIn, userId }`,
|
||||
@@ -148,17 +148,17 @@ Surface guides above carry the detailed workflows. Shared infrastructure:
|
||||
|
||||
All under `.agents/skills/agent-testing/scripts/`:
|
||||
|
||||
| Script | Usage |
|
||||
| ------------------------- | ------------------------------------------------------------------------------ |
|
||||
| `setup-auth.sh` | One-stop auth setup & status check (`status` / `cli` / `web`) |
|
||||
| `app-probe.sh` | LobeHub app probes: `auth` / `route` / `ops` / `goto <path>` / `errors` |
|
||||
| `record-gif.sh` | Frame-sequence → GIF for time-based behavior (streaming, timers, animations) |
|
||||
| `report-init.sh` | Scaffold a structured test report (Step 3) |
|
||||
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart, CDP 9222) |
|
||||
| `capture-app-window.sh` | Screenshot a specific app window (general; used by bot tests) |
|
||||
| `record-app-screen.sh` | Record app screen (video + periodic screenshots) |
|
||||
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
|
||||
| `agent-gateway/` | Gateway probe / dump / analyze tools |
|
||||
| Script | Usage |
|
||||
| ------------------------- | ---------------------------------------------------------------------------- |
|
||||
| `setup-auth.sh` | One-stop auth setup & status check (`status` / `cli` / `web`) |
|
||||
| `app-probe.sh` | LobeHub app probes: `auth` / `route` / `ops` / `goto <path>` / `errors` |
|
||||
| `record-gif.sh` | Frame-sequence → GIF for time-based behavior (streaming, timers, animations) |
|
||||
| `report-init.sh` | Scaffold a structured test report (Step 3) |
|
||||
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart, CDP 9222) |
|
||||
| `capture-app-window.sh` | Screenshot a specific app window (general; used by bot tests) |
|
||||
| `record-app-screen.sh` | Record app screen (video + periodic screenshots) |
|
||||
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
|
||||
| `agent-gateway/` | Gateway probe / dump / analyze tools |
|
||||
|
||||
`app-probe.sh` is the LobeHub-specific fast path into app state — auth check,
|
||||
current route, running operations, and `goto <path>` quick navigation
|
||||
@@ -174,12 +174,13 @@ not a chat-only summary. Scaffold it up front and fill it as you test:
|
||||
```bash
|
||||
DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh my-feature "Verify my feature")
|
||||
# ... test, saving screenshots / CLI transcripts into $DIR/assets/ ...
|
||||
# fill $DIR/report.md (case table, embedded evidence, verdict) and $DIR/result.json
|
||||
# fill $DIR/report.md (scope, case table with inline evidence, verdict, score) and $DIR/result.json
|
||||
```
|
||||
|
||||
Reports live in `.records/reports/<timestamp>-<slug>/` (gitignored): `report.md`
|
||||
(human-readable, with embedded screenshots), `result.json` (machine-readable
|
||||
pass/fail + score), `assets/` (evidence). Format spec and evidence rules:
|
||||
(human-readable, with screenshots/GIFs embedded directly in the case table),
|
||||
`result.json` (machine-readable pass/fail + score), `assets/` (evidence).
|
||||
Format spec and evidence rules:
|
||||
[references/report.md](./references/report.md).
|
||||
|
||||
Two hard rules worth front-loading:
|
||||
@@ -187,6 +188,10 @@ Two hard rules worth front-loading:
|
||||
- **Report language = the user's conversation language.** Write the ENTIRE
|
||||
`report.md` (headings included) in the language the user is conversing in —
|
||||
no mixed English. `result.json` keys/status values stay English.
|
||||
- **The case table is the main reading surface.** Prefer the compact
|
||||
`# | case | result | key observation | evidence` shape and embed the
|
||||
screenshot/GIF in the evidence cell. Use separate evidence sections only for
|
||||
long CLI transcripts, HAR summaries, or supplemental detail.
|
||||
- **Time-based behavior needs a GIF, not a screenshot.** If a case asserts
|
||||
change over time (streaming output, a ticking timer, loading states,
|
||||
animations), record it with `scripts/record-gif.sh` and embed the GIF —
|
||||
|
||||
@@ -11,7 +11,7 @@ output):
|
||||
|
||||
```
|
||||
.records/reports/<YYYYMMDD-HHMMSS>-<slug>/
|
||||
├── report.md # human-readable report (embedded screenshots, case table, verdict)
|
||||
├── report.md # human-readable report (case table with inline screenshots, verdict)
|
||||
├── result.json # machine-readable results (pass/fail counts, score)
|
||||
└── assets/ # evidence: screenshots, HAR files, CLI transcripts
|
||||
```
|
||||
@@ -25,7 +25,9 @@ output):
|
||||
```
|
||||
|
||||
The script creates the directory, pre-fills branch / commit / date in both
|
||||
files, and prints the directory path.
|
||||
files, and prints the directory path. The scaffold uses the compact report
|
||||
shape below; translate its headings and table labels to the user's language
|
||||
before delivery if needed.
|
||||
|
||||
2. **Collect evidence as you test** — every asserted behavior gets one evidence
|
||||
item in `$DIR/assets/`:
|
||||
@@ -48,10 +50,14 @@ output):
|
||||
|
||||
Embed it like an image: ``. Verify
|
||||
at least the first/last frames visually (Read the GIF) before citing.
|
||||
|
||||
- CLI: exact command + trimmed output (`$CLI task list | tee "$DIR/assets/task-list.txt"`).
|
||||
- Network: `agent-browser network requests` dumps or HAR files.
|
||||
|
||||
3. **Fill `report.md` as you go** — don't reconstruct from memory at the end.
|
||||
The primary evidence belongs in the case table itself: each row should pair
|
||||
the assertion with the screenshot/GIF/link that proves it, so readers can
|
||||
scan the result without jumping between sections.
|
||||
|
||||
4. **Set the verdict** in both `report.md` and `result.json`, then link the
|
||||
report directory in your final answer to the user.
|
||||
@@ -60,21 +66,47 @@ output):
|
||||
|
||||
**`report.md` MUST be written in the language the user is conversing in** —
|
||||
the whole file, headings included. If the conversation is in Chinese, the
|
||||
report is in Chinese; do not mix English prose into it. The scaffold's English
|
||||
headings are placeholders — translate them when filling. Exceptions that stay
|
||||
as-is: code/commands, identifiers, log excerpts, and `result.json` (its keys
|
||||
and status values are machine-read and stay English; the `title` and case
|
||||
`name` fields follow the user's language).
|
||||
report is in Chinese; do not mix English prose into it. The scaffold headings
|
||||
are placeholders — translate them when filling if the user is not conversing in
|
||||
the scaffold language. Exceptions that stay as-is: code/commands, identifiers,
|
||||
log excerpts, and `result.json` (its keys and status values are machine-read
|
||||
and stay English; the `title` and case `name` fields follow the user's
|
||||
language).
|
||||
|
||||
## report.md sections
|
||||
|
||||
| Section | Content |
|
||||
| --------------- | ---------------------------------------------------------------------------------- |
|
||||
| **Scope** | What changed / what is being verified; branch + commit |
|
||||
| **Environment** | Server URL, surfaces used (cli / electron / web / bot), relevant versions |
|
||||
| **Cases** | Table: `# \| case \| surface \| steps \| expected \| actual \| status \| evidence` |
|
||||
| **Evidence** | Embedded screenshots/GIFs (``), fenced CLI transcripts |
|
||||
| **Verdict** | Pass/fail/blocked counts, optional 0–100 score, open issues / follow-ups |
|
||||
Default report shape:
|
||||
|
||||
| Section | Content |
|
||||
| ---------------- | -------------------------------------------------------------------------------------------- |
|
||||
| **Scope** | What changed / what is being verified; branch, commit, date, surface, entry URL/page, focus |
|
||||
| **Cases** | Compact table: `# \| Case \| Result \| Key observation \| Evidence` |
|
||||
| **Verdict** | Overall verdict first (`pass` / `partial` / `fail`), then the concise reasons and follow-ups |
|
||||
| **Verification** | Commands or automated checks run in this session, with trimmed results |
|
||||
| **Score** | Pass/fail/blocked counts, optional 0–100 score |
|
||||
|
||||
The case table is the main reading surface. Prefer one clear row per user
|
||||
scenario or regression assertion, and put the screenshot/GIF directly in the
|
||||
`Evidence` cell:
|
||||
|
||||
```markdown
|
||||
| # | Case | Result | Key observation | Evidence |
|
||||
| --- | ------------------------ | ------ | ----------------------------------------------------------------- | ------------------------------------------------ |
|
||||
| 1 | Create a new page | pass | Title and body persisted after refresh |  |
|
||||
| 2 | Respect requested length | fail | Requested about 600 Chinese characters; final body was about 1286 |  |
|
||||
```
|
||||
|
||||
Avoid the old wide table with separate `steps`, `expected`, and `actual`
|
||||
columns unless the test is purely non-visual and truly needs that breakdown.
|
||||
For UI reports, those columns make screenshot-backed reading harder. Put
|
||||
procedural detail in the row's key observation only when it changes the
|
||||
interpretation of the result.
|
||||
|
||||
Use an extra evidence/detail section only when the inline table cannot carry
|
||||
the material cleanly, such as long CLI transcripts, HAR summaries, or multiple
|
||||
screenshots for one case. In that situation, keep the table evidence cell as a
|
||||
short inline proof or link, then put the longer material under `Verification`
|
||||
or a brief `Additional Evidence` section.
|
||||
|
||||
Status values: `pass` / `fail` / `blocked` (couldn't run — e.g. auth or env
|
||||
missing; a blocked case is not a pass).
|
||||
|
||||
@@ -24,39 +24,53 @@ DATE_HUMAN=$(date '+%Y-%m-%d %H:%M')
|
||||
DATE_ISO=$(date '+%Y-%m-%dT%H:%M:%S%z')
|
||||
|
||||
cat > "$DIR/report.md" << EOF
|
||||
# Test Report: $TITLE
|
||||
# 测试报告:$TITLE
|
||||
|
||||
## Scope
|
||||
## 范围
|
||||
|
||||
<!-- What changed / what is being verified -->
|
||||
<!-- 测试目标 / 变更范围 / 重点风险 -->
|
||||
|
||||
- Branch: \`$BRANCH\`
|
||||
- Commit: \`$COMMIT\`
|
||||
- Date: $DATE_HUMAN
|
||||
- 分支:\`$BRANCH\`
|
||||
- 当前提交:\`$COMMIT\`
|
||||
- 日期:$DATE_HUMAN
|
||||
- 表面:<!-- CLI / Electron + CDP / Web / Bot:<platform> -->
|
||||
- 测试页 / 入口:<!-- e.g. /settings or http://localhost:3010 -->
|
||||
- 重点:<!-- 本轮最关心的体验、功能或回归点 -->
|
||||
|
||||
## Environment
|
||||
## 用例
|
||||
|
||||
- Server: <!-- e.g. http://localhost:3010 -->
|
||||
- Surfaces: <!-- cli / electron / web / bot:<platform> -->
|
||||
| # | 用例 | 结果 | 关键现象 | 证据 |
|
||||
| - | ---- | ---- | -------- | ---- |
|
||||
| 1 | | 待测 | |  |
|
||||
|
||||
## Cases
|
||||
## 结论
|
||||
|
||||
| # | Case | Surface | Steps | Expected | Actual | Status | Evidence |
|
||||
| - | ---- | ------- | ----- | -------- | ------ | ------ | -------- |
|
||||
| 1 | | | | | | | |
|
||||
整体结论:\`pending\`。
|
||||
|
||||
## Evidence
|
||||
<!-- 用 1-2 段概括用户最需要知道的结果;失败和阻塞必须明确说明影响。 -->
|
||||
|
||||
<!-- Embed screenshots:  -->
|
||||
<!-- CLI transcripts in fenced blocks, with the exact command -->
|
||||
仍需处理 / 跟进:
|
||||
|
||||
## Verdict
|
||||
- <!-- TODO -->
|
||||
|
||||
- Passed: 0 / 0
|
||||
- Failed: 0
|
||||
- Blocked: 0
|
||||
- Score (optional): —
|
||||
- Open issues / follow-ups:
|
||||
## 本轮验证
|
||||
|
||||
<!-- 如有自动化或命令行验证,保留精简命令与结果;没有则写“未运行额外自动化验证”。 -->
|
||||
|
||||
\`\`\`bash
|
||||
# command
|
||||
\`\`\`
|
||||
|
||||
结果:
|
||||
|
||||
- <!-- TODO -->
|
||||
|
||||
## 评分
|
||||
|
||||
- 通过:0
|
||||
- 失败:0
|
||||
- 阻塞:0
|
||||
- 评分:— / 100
|
||||
EOF
|
||||
|
||||
cat > "$DIR/result.json" << EOF
|
||||
|
||||
Reference in New Issue
Block a user