mirror of
https://github.com/lobehub/lobe-chat.git
synced 2026-06-14 03:30:19 +00:00
📝 docs(agent-testing): require inline visual evidence (#15750)
This commit is contained in:
@@ -192,6 +192,10 @@ Two hard rules worth front-loading:
|
||||
`# | case | result | key observation | evidence` shape and embed the
|
||||
screenshot/GIF in the evidence cell. Use separate evidence sections only for
|
||||
long CLI transcripts, HAR summaries, or supplemental detail.
|
||||
- **Visual evidence must render inline.** Screenshots and GIFs in `report.md`
|
||||
must use Markdown image syntax like ``. Do not
|
||||
use bare file paths, Markdown links, or local file links as the primary
|
||||
visual evidence; those make the report unreadable without opening each asset.
|
||||
- **Time-based behavior needs a GIF, not a screenshot.** If a case asserts
|
||||
change over time (streaming output, a ticking timer, loading states,
|
||||
animations), record it with `scripts/record-gif.sh` and embed the GIF —
|
||||
|
||||
@@ -34,6 +34,7 @@ output):
|
||||
- UI (static state): `agent-browser screenshot` or `capture-app-window.sh`,
|
||||
then **verify the screenshot with the Read tool before citing it** —
|
||||
never cite an image you haven't looked at.
|
||||
|
||||
- UI (time-based behavior): **screenshot vs GIF is a judgment you must
|
||||
make per case.** If the assertion is about change over time — streaming
|
||||
output, a ticking timer, loading/progress states, animations,
|
||||
@@ -52,12 +53,15 @@ output):
|
||||
at least the first/last frames visually (Read the GIF) before citing.
|
||||
|
||||
- CLI: exact command + trimmed output (`$CLI task list | tee "$DIR/assets/task-list.txt"`).
|
||||
|
||||
- Network: `agent-browser network requests` dumps or HAR files.
|
||||
|
||||
3. **Fill `report.md` as you go** — don't reconstruct from memory at the end.
|
||||
The primary evidence belongs in the case table itself: each row should pair
|
||||
the assertion with the screenshot/GIF/link that proves it, so readers can
|
||||
scan the result without jumping between sections.
|
||||
the assertion with the screenshot/GIF or non-visual artifact that proves it,
|
||||
so readers can scan the result without jumping between sections. UI evidence
|
||||
must render inline with Markdown image syntax; a plain link or file path is
|
||||
not acceptable as primary visual evidence.
|
||||
|
||||
4. **Set the verdict** in both `report.md` and `result.json`, then link the
|
||||
report directory in your final answer to the user.
|
||||
@@ -96,6 +100,27 @@ scenario or regression assertion, and put the screenshot/GIF directly in the
|
||||
| 2 | Respect requested length | fail | Requested about 600 Chinese characters; final body was about 1286 |  |
|
||||
```
|
||||
|
||||
## Inline visual evidence
|
||||
|
||||
Screenshots and GIFs must be embedded so the report shows the image inline:
|
||||
|
||||
```markdown
|
||||

|
||||

|
||||
```
|
||||
|
||||
Do **not** use these as the primary evidence for UI cases:
|
||||
|
||||
```markdown
|
||||
[case 1 result](assets/case1-result.png)
|
||||
assets/case1-result.png
|
||||
file:///tmp/case1-result.png
|
||||
```
|
||||
|
||||
Links are acceptable for non-visual artifacts such as CLI transcripts, HAR
|
||||
files, or long logs. For videos, embed a representative screenshot/GIF inline in
|
||||
the case row and link the full video as supplemental evidence.
|
||||
|
||||
Avoid the old wide table with separate `steps`, `expected`, and `actual`
|
||||
columns unless the test is purely non-visual and truly needs that breakdown.
|
||||
For UI reports, those columns make screenshot-backed reading harder. Put
|
||||
@@ -104,9 +129,10 @@ interpretation of the result.
|
||||
|
||||
Use an extra evidence/detail section only when the inline table cannot carry
|
||||
the material cleanly, such as long CLI transcripts, HAR summaries, or multiple
|
||||
screenshots for one case. In that situation, keep the table evidence cell as a
|
||||
short inline proof or link, then put the longer material under `Verification`
|
||||
or a brief `Additional Evidence` section.
|
||||
screenshots for one case. In that situation, keep the table evidence cell as an
|
||||
inline visual proof for UI cases or a concise link for non-visual artifacts,
|
||||
then put the longer material under `Verification` or a brief
|
||||
`Additional Evidence` section.
|
||||
|
||||
Status values: `pass` / `fail` / `blocked` (couldn't run — e.g. auth or env
|
||||
missing; a blocked case is not a pass).
|
||||
@@ -147,7 +173,8 @@ word the user reads first: `pass`, `fail`, or `partial`.
|
||||
## Rules
|
||||
|
||||
- **No evidence, no claim** — every `pass`/`fail` in the case table must link
|
||||
at least one asset.
|
||||
at least one asset. UI cases must inline-embed their primary screenshot/GIF;
|
||||
non-visual CLI/network cases may link transcripts, HAR files, or logs.
|
||||
- **Screenshots must be visually verified** with the Read tool before being
|
||||
cited.
|
||||
- **Report failures faithfully** — a failing case with clear evidence is a good
|
||||
|
||||
Reference in New Issue
Block a user