Compare commits

...

59 Commits

Author SHA1 Message Date
yutengjing 5b19ca3990 💄 style: extend artifact code background 2026-06-12 11:29:19 +08:00
yutengjing c8b5e337c0 🐛 fix: keep artifact code panel scrolled 2026-06-12 11:12:34 +08:00
Arvin Xu 61586b9377 🐛 fix(agent): persist & deliver image attachments for device/sandbox hetero runs (#15685)
* 🐛 fix(agent): persist file attachments in hetero early-exit user message

The hetero-agent early exit in execAgent created the user message without
the `files` relation, so attachments sent from the SPA gateway path
(executionTarget=device / sandbox) were never linked via messagesFiles and
disappeared once the optimistic client message was replaced by the server
snapshot. Attach the deduped `fileIds` the same way sendMessageInServer
does on the local-mode path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): deliver image attachments to device/sandbox hetero runs

Persisting the messagesFiles relation fixed display, but the dispatched
CLI still never saw the image — local mode feeds the persisted imageList
into sendPrompt for vision, while the device/sandbox dispatch protocols
(agent_run_request / sandbox runner) only carried a text prompt.

- resolve attached images into signed URLs in the hetero early exit
  (metadata-only, non-fatal) and carry them through heteroParams
- add imageList to the agent_run_request wire type and dispatchAgentRun
  params (gateway client + server service)
- extract buildHeteroExecStdinPayload into @lobechat/heterogeneous-agents
  so the three dispatch sites (desktop spawnLhHeteroExec, lh connect
  daemon, server sandbox runner) build the same content-block payload:
  systemContext, prompt, then image blocks
- lh hetero exec already coerces image blocks via coerceJsonPrompt and
  normalizeImage (url → base64 for Claude Code, materialized path for
  Codex), so no CLI consumer changes are needed

openclaw/hermes (runHeteroTask) keep text-only prompts — their dispatch
goes through a separate one-shot tool protocol.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(heterogeneous-agents): move exec stdin wire contract to a pure /protocol entry

The server sandbox runner imported `buildHeteroExecStdinPayload` through the
`/spawn` barrel, which (with no `sideEffects` hint) bundles the whole spawn
machinery into the Next.js server chunk. Its `process.cwd()`-rooted dynamic
fs calls then make Vercel's output file tracing glob the entire repo source
tree into every serverless function (+~69 MB each), pushing the 4 largest
functions past the 250 MB uncompressed limit and failing the deployment.

Split the dispatch wire contract (stdin payload builder + content-block
types) into a new pure, isomorphic `/protocol` export and point all three
dispatch sites (server sandbox runner, desktop main, `lh connect` daemon) at
it. `/spawn` re-exports the moved symbols so executor-side callers are
unaffected. Also declare `sideEffects: false` for the package.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 00:02:51 +08:00
Arvin Xu eca449e4e2 feat(skills): agent-testing iteration after first real-world run (#15700)
* 📝 docs(skills): make agent-testing Step 0 an env-setup + auth checklist

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(skills): agent-testing probes, GIF evidence, and report-language rule

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 23:52:25 +08:00
renovate[bot] 6c8976b641 Update dependency vitest to v3.2.6 [SECURITY] (#15698)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 23:34:38 +08:00
Arvin Xu 60d9d3c3c7 ♻️ refactor(skills): merge local-testing and cli-backend-testing into agent-testing (#15699)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 23:14:45 +08:00
Arvin Xu 2dd4cf7a1d fix(agentDocument): replace getDocuments with listDocuments in useFetchAgentDocuments to avoid over-fetching (#15301)
* fix(agentDocument): listDocuments returns templateId and derived fields

* fix(agentDocument): useFetchAgentDocuments use listDocuments instead of getDocuments

* fix(agentDocument): derive AgentDocumentItem from listDocuments return type

* fix(agentDocument): export AgentDocumentListItem type

* 🐛 fix(agentDocument): align list projections and consumers after rebase onto canary

- listDocumentsForTopic now returns the same projection as listDocuments
  (derived fields + templateId), so the tRPC union no longer collapses
  the inferred client type to the old 8-field shape
- add description/updatedAt to both projections for sidebar consumers
- AgentDocumentsGroup switches getDocuments -> listDocuments (it already
  shared the documentsList SWR key)
- makePendingDocument trimmed to the lean list item shape
- update useFetchAgentDocuments test to the listDocuments behavior

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agentDocument): migrate agentDocumentSkills sync to slim listDocuments

The tool store's skill registry sync shared agentDocumentSWRKeys.documentsList
with the working sidebar and the new useFetchAgentDocuments hook, but still
fetched the full getDocuments payload. Sharing one SWR key across different
payload shapes made the cached result order-dependent: whichever consumer
mounted first decided whether the cache held the heavy full documents or the
slim list items. Migrate the skills sync to listDocuments, whose projection
covers every field mapDocsToSkills reads.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:41:24 +08:00
Arvin Xu 575ef1e8ee ♻️ refactor(agent): single-track device-tool injection via execution plan (#15683)
* ♻️ refactor(agent): single-track device-tool injection via execution plan

P3 follow-up to #15669 — downstream layers now consume the resolved
ExecutionPlan instead of re-deriving device capability:

- ExecutionPlan carries the effective `target`; persisted into
  state.metadata.executionPlan via createOperation
- call_llm executor gates buildStepToolDelta's activeDeviceId signal on
  the plan (none/sandbox can never re-inject local-system mid-run)
- AgentToolsEngine consumes the plan's target; redundant rule-level
  canUseDevice checks removed (physical manifest walls remain)
- builtin agent runtime config can now override agencyConfig
  (web-onboarding pins executionTarget=none)
- hetero desktop 'local' selection persists this desktop's deviceId so
  opening the agent from web dispatches to the same machine via gateway
- 'local' vs 'device' stay distinct user choices even for the same
  machine: gateway dispatch streams progress to all clients (mobile),
  IPC is faster but desktop-session-only — guarded by a regression test

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): enforce device access policy on hetero dispatch

resolveDeviceAccessPolicy now runs BEFORE the hetero early exit and feeds
canUseDevice into the hetero execution plan: a denied sender (external
bot user) degrades local/device-bound CLI hetero runs to the cloud
sandbox instead of dispatching to the owner's machine, and requestedDeviceId
cannot bypass the policy. Remote hetero agents (openclaw/hermes) are
device-only with no sandbox fallback, so denied senders are refused
outright.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(agent): fix interface field order in RuntimeSelectionContext

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:39:11 +08:00
YuTengjing ba6976c063 🐛 fix: pause input completion after errors (#15692) 2026-06-11 22:05:45 +08:00
Innei bfdfd3bca3 🐛 fix(desktop): adjust mac fullscreen titlebar spacing (#15693) 2026-06-11 22:02:48 +08:00
YuTengjing f6c23e3654 🐛 fix(agent-runtime): persist assistant reasoning to DB (#15690) 2026-06-11 21:05:23 +08:00
Arvin Xu 813d756b9c 🐛 fix(editor-canvas): re-check editor init state before subscribing (#15686)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:42:28 +08:00
renovate[bot] 671bc26e0d Update opentelemetry-js-contrib monorepo (#13582)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 20:41:48 +08:00
renovate[bot] 309c25cb44 Update dependency code-inspector-plugin to v1.3.6 (#14612)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 20:41:40 +08:00
Arvin Xu a810bf3dcd 🐛 fix(agent-runtime): always persist assistant reasoning to DB (#15687)
* 🐛 fix(agent-runtime): always persist assistant reasoning to DB

PR #13494 gated message reasoning persistence behind preserveThinking
(agent chatConfig + model extendParams / qwen|zhipu fallback). That gate
is only meant to control whether reasoning is replayed into the next LLM
payload — applying it to the DB write dropped thinking content for every
non-qwen/zhipu reasoning model in server-side agent mode: reasoning
streamed live via stream_end but vanished after refresh.

Restore unconditional reasoning persistence in messageModel.update and
keep the preserveThinking gate only for state.messages payload replay.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(i18n): localize callSubAgent tool labels

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:41:29 +08:00
Arvin Xu 7d6be512b8 🐛 fix(model-runtime): align tool-calling fallback tests & surface missing tool call as error (#15691)
*  test(model-runtime): align tool-calling fallback tests with new return shape

#15680 changed generateObject's tool-calling fallback to return the parsed
schema object (same shape as the json_schema path) instead of an array of
tool calls, and reworked its error handling, but left the pre-existing
"tool calling fallback" block in index.test.ts asserting the old behavior,
breaking CI on canary:

- result is now the parsed object, not [{ name, arguments }]
- the no-tool-call path returns undefined via debug log without console.error
- the parse-failure path logs the single matched tool call, not the array

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(model-runtime): surface missing tool call in generateObject fallback as error

tool_choice forces the structured-output function, so a response without a
tool call means the provider misbehaved. #15680 routed this branch to a
debug-namespace log that is invisible in production, leaving callers with
an unexplained undefined. Log it via console.error with the response
message as context, matching the parse-failure branch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:40:12 +08:00
LiJian 1130f7df32 feat(devices): add browser device pairing flow (#15678)
*  feat: add browser device pairing flow to /settings/devices

- Add "Via Browser" tab to ConnectDeviceModal with pairing code display and input
- Add "Register this browser as a device" callout card above DeviceList
- Support ?pair=<code> URL param to auto-open browser pairing modal with pre-filled code
- Improve DeviceList empty state with method cards (Desktop + CLI)
- Ship en-US and zh-CN i18n keys for all new browser/sync strings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🔨 fix(devices): fix lint warnings — import sort order and empty catch block

* fix(devices): add pair API route and invalidate device list cache

- Create /api/devices/pair POST handler that authenticates the user via
  Better Auth session, validates the code against the user's registered
  devices via DeviceModel.findByDeviceId, and returns JSON.
- Replace the setListKey/key-prop re-mount trick with
  lambdaQuery.useUtils().device.listDevices.invalidate() so the tRPC
  React Query cache is properly busted after a successful pair (fixes
  staleTime: 30s preventing the new device from appearing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ♻️ refactor(devices): drop browser pairing, fix modal close, redesign UI

- Remove the "Via Browser" pairing flow entirely: browser tab in
  ConnectDeviceModal, the "register this browser" callout card, the
  ?pair=<code> deep-link, and the /api/devices/pair stub route. Only the
  real Desktop and CLI connection methods remain.
- Fix the modal that couldn't be closed: @lobehub/ui Modal closes via
  onCancel (antd), not onClose — the X button was a no-op.
- Redesign the connect modal (segmented tabs, numbered steps, command
  blocks with copy, security footer) and the empty state (onboarding
  hero with Desktop/CLI options + capability cards).
- Clean up browser/sync i18n keys; add capabilities + footer keys for
  en-US and zh-CN.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 fix(devices): apply card radius — cssVar.borderRadius already has unit

The radius tokens (cssVar.borderRadius / borderRadiusLG) already include
their unit, so the trailing `px` produced `var(--…)px`, which browsers
drop — leaving the cards with sharp corners. Drop the `px` so the cards
pick up the same rounded radius as the appearance settings FormGroup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 19:50:28 +08:00
Arvin Xu e20496e444 🐛 fix(codex): persist model metadata and file diffs (#15672)
* 🐛 fix(codex): persist model metadata

* 🐛 fix(codex): show file change diffs
2026-06-11 19:15:45 +08:00
Innei dbc8d76c8d feat(desktop): restore cloud desktop builds (#15666) 2026-06-11 19:14:26 +08:00
renovate[bot] ecfdac5395 Update dependency @opentelemetry/sdk-node to ^0.217.0 [SECURITY] (#14687)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 18:51:20 +08:00
YuTengjing 5f4bec347b 🐛 fix(model-runtime): improve DeepSeek structured output (#15680) 2026-06-11 16:57:57 +08:00
Arvin Xu 77e4d0492b ♻️ refactor(agent): resolve device routing via a single execution plan (#15669)
- add resolveExecutionPlan as THE device decision (none/sandbox never
  route to a device; offline bindings stay unrouted; single-online-device
  auto-activation only for device-capable targets)
- fix executionTarget=none being bypassed by single-device auto-activation
  (background runs executed device tools despite 无设备)
- stop exposing the remote-device proxy in none/sandbox sessions
- converge native execAgent, hetero dispatch fork and client
  selectRuntimeType onto the shared resolution
- drop the legacy per-platform chatConfig.runtimeEnv.runtimeMode fallback
  entirely (no migration: unset targets resolve to platform defaults)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 16:29:37 +08:00
Neko a60d11df48 🐛 fix(chat): preserve message order after tool results (#15657) 2026-06-11 16:18:18 +08:00
YuTengjing 14501ea69a 🐛 fix: keep model guard in provider grouping (#15681) 2026-06-11 15:35:15 +08:00
Arvin Xu b76992e581 feat(file-preview): support remote read-only local previews (#15673)
*  feat(file-preview): support remote read-only local previews

*  feat(local-file): identify tabs by context

* ♻️ refactor(file-preview): route previews through project file service

* 🐛 fix(desktop): clamp nav panel width

*  feat(file-preview): improve local preview controls

* 🐛 fix(file-preview): reload html after refresh completes
2026-06-11 15:10:25 +08:00
Arvin Xu 97e4e345d1 🔨 chore(codecov): update coverage grouping (#15650)
🔨 chore: update codecov coverage grouping
2026-06-11 14:40:06 +08:00
cokeSEE1 c609a60f0e 🔨 chore(ci): bump outdated action versions to latest (#15655)
- actions/checkout@v4 -> @v6 in issue-auto-comments.yml
  (last remaining @v4 usage; all other 48 uses are already @v6)
- actions/github-script@v7 -> @v8 in release-desktop-canary.yml
  (last remaining @v7 usage; all other 4 uses are already @v8)

Co-authored-by: 章岚 <zhanglan@datagrand.com>
2026-06-11 09:54:53 +08:00
renovate[bot] 06bf82f3e0 Update dependency node to v24.16.0 (#14621) 2026-06-11 09:24:21 +08:00
Zhijie He 3ccc23152c 💄 style: add sensenova-6.7-flash-lite & sensenova-u1-fastsupport (#15306) 2026-06-11 09:22:49 +08:00
Zhijie He 3a780a62f6 feat: add AntGroup (蚂蚁百灵) provider support (#13713) 2026-06-11 09:21:54 +08:00
Zhijie He e98ad7edca 💄 style: update models for Longcat, support api fetch model list (#15134) 2026-06-11 09:20:55 +08:00
Arvin Xu 686778fe51 feat(file-preview): render HTML files inline (#15671)
 feat(file-preview): render html files inline
2026-06-11 02:39:05 +08:00
Arvin Xu 914976a52f feat(model-bank): knowledgeCutoff batch 2, metadata skill & always-visible tab bar (#15663)
*  feat(model-bank): backfill knowledgeCutoff batch 2 and restore lost Anthropic values

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(skills): add model-bank-metadata skill for cutoff/family backfill

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(model-bank): Claude Fable 5 belongs to the claude-mythos family

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(desktop): always surface the tab bar by creating a tab on first navigation

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ♻️ refactor(model-bank): family is the product lineage (claude-opus/sonnet/haiku), not the brand

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): backfill activeAgentId before paint on tab/route switches

Tab switches are plain route navigations, so leaving an agent page cleared
activeAgentId via a passive useUnmount and the next page re-set it in a
passive useEffect — the first painted frame always had no active id, flashing
a skeleton even when agentMap already cached the config. Move both the
backfill and the unmount clear to layout effects: removed-tree layout
cleanups run before new-tree layout effects in one commit, so the clear can
never wipe a freshly synced id and the id is in place before paint.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(agent): surface agent config fetch errors with a retry action

isAgentConfigLoading only knows "no data yet", so a failed fetch (e.g. a 401
that SWR deliberately does not retry, with no focus revalidation inside a
single Electron window) left the agent page on a skeleton forever — only a
manual reload recovered. Record per-agent fetch errors in
agentConfigErrorMap (set by onError, cleared on data / retry), expose
currentAgentConfigError / isAgentConfigError selectors, add a
retryAgentConfigFetch action that revalidates the agent's SWR entries, and
show an error alert with a retry button above the main chat input while the
config is still missing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(ci): sync model metadata test expectations

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 01:29:17 +08:00
Arvin Xu fdd955404d feat(codex): add collab tool render (#15662)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 01:15:29 +08:00
LiJian 6d47c1d07e feat(connector): fold OAuth into the custom MCP (PluginDevModal) form (#15661)
*  feat(connector): support API key / custom header / OAuth auth in custom connector

Make the connector backend a full replacement for the legacy custom-MCP plugin form:

- connector create/update now accept bearer/apikey/header credentials (encrypted at rest);
  oauth2 stays callback-only
- map apikey → bearer auth and header → request headers in both the sync path
  (syncTools + callTool) and the agent-runtime manifest path
- pass custom HTTP headers through to the MCP client
- AddConnectorModal becomes a rich form: MCP type (HTTP/STDIO), auth type
  (None / API Key / Custom Headers / OAuth), reusing the plugin form inputs;
  OAuth keeps the existing popup authorize flow, others create + sync directly

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(connector): fold OAuth into the PluginDevModal MCP form

Pivot the custom-MCP entry to reuse the rich PluginDevModal / MCPManifestForm
instead of a bespoke connector modal, and add OAuth as an auth type inside it:

- MCPManifestForm: gated `enableOAuth` adds an "OAuth" auth type with
  Client ID / Secret (optional) + redirect-URI hint. Only the custom-connector
  entry enables it, so plain custom-plugin DevModal callers (editing plugins,
  agent tools, …) are unaffected.
- DevModal: opens the OAuth popup synchronously on the save click (browsers
  block window.open once an async boundary is crossed), validates, then hands
  the popup to onSave which navigates it to the authorize URL.
- New CustomConnectorModal wraps DevModal and persists every auth type onto the
  connector backend (none / bearer / custom headers → create + sync; OAuth →
  create with OIDC config + run the authorize popup).
- settings/skill entry now opens CustomConnectorModal; the standalone
  AddConnectorModal rich rewrite from the previous commit is reverted to the
  canary original (it is only referenced by the unused ConnectorList).
- i18n: dev.mcp.auth.oauth* keys (default + en-US + zh-CN).

Backend stays as in the prior commit (connector create/update accept
bearer/apikey/header credentials; sync + manifest paths apply them).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(connector): route the OAuth auth type through the authorize flow, not the token-less manifest test

Selecting OAuth and clicking "Test connection" called the plugin manifest test
(getStreamableMcpServerManifest), which connects with no token and 401s on any
OAuth-gated server (e.g. Linear MCP / DCR). For OAuth there is nothing to test
without authorizing first, so the button now becomes "Authorize & Connect" and
runs the connector OAuth flow (discovery + DCR + authorize popup), shared with
the footer save button via DevModal.runOAuthFlow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(connector): make connector.create idempotent on (user, identifier)

Re-adding or re-authorizing a custom connector with an existing identifier hit
the user_connectors unique constraint and 500'd. Now an existing row is updated
(reset to disconnected, refreshed name/url/oidcConfig/credentials) and its id
reused, instead of inserting a duplicate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(skill-store): route Add Custom MCP through the connector modal, drop the Custom tab

- Skill Store "Add → Add Custom MCP Skill" now opens CustomConnectorModal
  (connector backend + OAuth), matching the settings/skill entry, instead of
  the legacy plugin DevModal (installCustomPlugin + togglePlugin).
- Remove the now-redundant "Custom" tab from the Skill Store (custom MCP lives
  in the connector list now): drop SkillStoreTab.Custom, its tab option,
  CustomList render, and the matching search branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 01:00:38 +08:00
renovate[bot] c65cf8c2a0 Update dependency @opentelemetry/auto-instrumentations-node to ^0.76.0 [SECURITY] (#14686)
Update dependency @opentelemetry/auto-instrumentations-node to ^0.75.0 [SECURITY]

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 00:09:31 +08:00
Arvin Xu 981c57d6f9 🐛 fix(codex): scope repeated tool results (#15659)
* 🐛 fix(codex): scope repeated tool results

* 💄 style(codex): refine local file link states
2026-06-10 23:22:56 +08:00
Arvin Xu 87eba86514 chore(model-bank): backfill knowledgeCutoff + family/generation data (#15642)
*  feat(model-bank): backfill knowledgeCutoff for OpenAI/Claude/Llama/Phi families (batch 1)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(model-bank): add family/generation fields with rule-derived data for chat models

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(model-bank): add canonical knowledge-cutoff map with build-time backfill

Adds MODEL_KNOWLEDGE_CUTOFFS (canonical id → YYYY-MM, all values verified
against official provider docs) plus normalizeModelIdForCutoff, which reduces
provider-specific spellings (openrouter/bedrock prefixes, dated snapshots,
-thinking/-fast/-latest/-preview variants, claude dot-versions) to canonical
ids. buildDefaultModelList backfills knowledgeCutoff from the map when a model
card has no inline value, so all aggregator providers inherit cutoffs
automatically; inline values always win.

Covers Anthropic (incl. legacy 3.x), OpenAI, Google Gemini/Gemma, xAI Grok,
Meta Llama, Amazon Nova, and Cohere. DeepSeek/Qwen/GLM/Kimi/MiniMax/Mistral
publish no official cutoffs and are intentionally absent. Anthropic inline
PoC entries migrate into the map (single source of truth).

Cross-checked against the batch-1 inline backfill: 0 value mismatches.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(model-bank): correct Claude Sonnet 4.6 cutoff

*  test(model-bank): sync metadata expectations

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:59:36 +08:00
Rdmclin2 09e6f02e45 🔨 chore: modify workspace sidebar (#15658)
* chore: change back to user style sidebar panel

* chore: optimize personal menu

* chore: update i18n files
2026-06-10 22:21:27 +08:00
Arvin Xu a2ea314cd8 feat(codex): refine Codex tool renders (#15651)
* 💄 style(codex): refine file change tool render

*  feat(codex): add web search tool render

*  feat(codex): add mcp tool render

*  feat(codex): improve tool command display

* 💄 style(files): refine explorer tree icons

*  test: fix local file link render props
2026-06-10 22:13:56 +08:00
Arvin Xu e2be720726 🐛 fix(agent-runtime): keep async sub-agent stream alive (#15646)
* 🐛 fix: keep async sub-agent stream alive

* 🐛 fix: preserve async tool resume parent chain
2026-06-10 22:12:22 +08:00
Arvin Xu 8b6905ec7e 💄 style(desktop): tighten tab close button right padding (#15636)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:12:02 +08:00
Arvin Xu e4830943cf 🔨 chore(model-bank): add knowledgeCutoff field to model cards (#15640)
*  feat(model-bank): add knowledgeCutoff field with Anthropic models as PoC

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(model-bank): add family/generation fields to model card types

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 20:02:34 +08:00
Arvin Xu 5dfb6fc288 chore: clean [LOBE-XXX] code annotations (2026-06-10) (#15623)
chore: clean up [LOBE-XXX] code annotations (2026-06-10)

Remove LOBE-XXX markers from comments and URLs across 7 files:
- apps/cli/hetero.ts & hetero.test.ts: Remove LOBE-10157 markers, keep context
- apps/server/ModelRuntime: Remove LOBE-10056, keep PK migration note
- packages/database/rbac.ts: Remove LOBE-9193, keep API doc
- scripts/codemodWorkspaceNav.ts: Remove LOBE-9024 from description
- parse.ts & parse.test.ts: Replace LOBE-10141/LOBE-123 with generic IDs

Co-authored-by: lobehub-bot <lobehub-bot@users.noreply.github.com>
2026-06-10 19:59:54 +08:00
Arvin Xu 94ea3f6a34 🚀 release: 20260610 (#15647)
# 🚀 LobeHub Release (20260610)

**Release Date:** June 10, 2026  
**Since v2.2.2:** 131 merged PRs · 13 contributors

> This weekly release strengthens agent collaboration across cloud,
desktop, CLI, and workspace flows, with steadier runtime behavior and a
broader foundation for workspace-scoped data.

---

##  Highlights

- **Agent execution across devices** — Unifies per-device working
directories, project skill discovery, and sub-agent suspend/resume
behavior across server, QStash, and device RPC flows. (#15543, #15566,
#15481, #15620, #15591)
- **Connector and sandbox platform** — Expands connector permissions,
custom OAuth MCP connector onboarding, sandbox provider support, and
user-uploaded file sync into cloud sandbox runs. (#15463, #15546,
#15184, #15550)
- **Desktop and CLI reliability** — Fixes desktop cold-start,
auto-update, Windows build, CLI skill discovery, and `lh connect` agent
dispatch paths. (#15547, #15525, #15527, #15562, #15632, #15634)
- **Pages and sharing** — Refreshes topic sharing, improves Page Editor
layout behavior, and routes Page Agent tool execution through the
server-side editor path. (#15581, #15556, #15588, #15023, #15610)
- **Model availability and provider updates** — Adds user-scoped LobeHub
model availability, Claude Fable 5, Qwen thinking preservation, and
MiniMax M3 updates. (#15590, #15639, #13494, #15376)

---

## 🏗️ Core Product & Architecture

### Agent Runtime & Heterogeneous Agents

- Improves sub-agent lifecycle handling, including async suspend/resume,
queue-mode QStash resume delivery, and blocking nested sub-agent calls.
(#15481, #15620, #15575)
- Stabilizes heterogeneous agent ingestion and streaming with raw stream
dumps, per-turn usage, image forwarding on regenerate, and
duplicate-text fixes. (#15602, #15577, #15592, #15585)
- Adds execution-device and working-directory controls across device
RPC, legacy defaults, and remote-spawned Claude Code sessions. (#15543,
#15566, #15591, #15572)
- Improves runtime diagnostics and compatibility, including Gemini
multimodal output capture, abort stream semantics, and trace quality
analysis. (#15535, #13677, #15508)

---

## 📱 Platforms, Integrations & UX

### Connectors, Sandbox & Tools

- Ships API-level connector tool permissions, custom OAuth MCP connector
onboarding, and connector-first runtime execution. (#15463, #15546)
- Adds sandbox provider support, cloud sandbox file sync, and safer
external URL file input handling with SSRF validation. (#15184, #15550,
#12657)
- Improves tool visibility and execution with pinned app-fixed tools,
ANSI output rendering, gateway-tunneled MCP calls, and automatic
headless tool runs. (#15509, #15516, #15469, #15492)

### Desktop, CLI & Web UX

- Restores desktop startup and reload behavior, preserves IPC error
causes, and keeps the tab bar new-tab action visible across routes.
(#15547, #15597, #15638)
- Fixes desktop update and build stability for browser quit guards,
macOS update signing, and Windows Visual Studio detection. (#15525,
#15527, #15562)
- Shows the plan-limit upgrade UI on desktop builds. (#15628)
- Adds the Agent Run delivery checker and fixes CLI device dispatch plus
skill list/search output. (#15489, #15634, #15632)
- Refreshes onboarding, auth source preservation, topic UI states,
referral/Fable campaign copy, and chat-input control bar behavior.
(#15629, #15544, #15573, #15614, #15616, #15617, #15622, #15643)

---

## 🔒 Security, Reliability & Rollout Notes

- External URL file input now includes SSRF validation for safer Google
file handling. (#12657)
- Database workspace-scope migrations are part of this release;
self-hosted operators should run the normal migration path before
serving the updated app. (#15446, #15465, #15468, #15472)
- The release branch was re-cut from `canary` and includes the latest
`main` release-version commit so `v2.2.2` is the verified compare base.

---

## 👥 Contributors

@ONLY-yours, @sxjeru, @hardy-one, @xujingli, @hezhijie0327, @Coooolfan,
@arvinxx, @tjx666, @Innei, @rivertwilight, @rdmclin2, @cy948,
@AmAzing129

**Full Changelog**:
https://github.com/lobehub/lobehub/compare/v2.2.2...release/weekly-20260610-recut-3
2026-06-10 19:35:47 +08:00
YuTengjing b8339abc76 🐛 fix: show plan limit upgrade UI on desktop builds (#15628) 2026-06-10 18:19:25 +08:00
Innei c037609b8b 💄 style(chat-input): fix control bar height jump when TokenTag appears (#15643) 2026-06-10 17:43:13 +08:00
René Wang b8b37cffa3 feat: refresh topic sharing experience (share page + popover) (#15581) 2026-06-10 17:43:02 +08:00
Rdmclin2 e8e4b2e822 feat: support workspace lobehub (#13977)
feat: support workspace (full) — store→business-hook + workspace router
2026-06-10 17:34:12 +08:00
Arvin Xu c02e5720c2 feat(model-bank): add claude-fable-5 to Anthropic models (#15639)
*  feat(model-bank): add claude-fable-5 to Anthropic models

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): allow adding directory topics on web when agent targets a bound device

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 16:57:57 +08:00
Arvin Xu 3fb732da66 💄 style(desktop): keep tab bar new-tab button visible on every route (#15638)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 16:01:38 +08:00
Arvin Xu fdb529d598 🐛 fix(agent): deliver sub-agent resume bridge via QStash webhook in queue mode (#15620)
* 🐛 fix(agent): deliver sub-agent resume bridge via QStash webhook in queue mode

The callSubAgent completion bridge was a handler-only hook, which lives in
process memory: in queue mode (AGENT_RUNTIME_MODE=queue) HookDispatcher only
delivers webhook-configured hooks, so the bridge never fired — the parent op
stayed parked in waiting_for_async_tool forever after all sub-agents finished.

- Give the bridge hook a webhook config (delivery: qstash) targeting the new
  /api/agent/webhooks/subagent-callback endpoint; local mode keeps the
  in-process handler. Both paths converge on
  AgentRuntimeService.completeSubAgentBridge (backfill + barrier/CAS resume).
- Park-time self-check: after the parked state and operation row are
  persisted, re-run the resume barrier once to recover children that
  completed before the parent finished parking.
- One-shot verify watchdog: when a completion finds the parent not yet
  resumable, schedule a delayed verifyAsyncToolBarrier re-check (no step
  lock, CAS-idempotent, never re-arms).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(agent): correct verify-watchdog rationale comment

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(agent): clarify eventFields trimming rationale

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ♻️ refactor(agent): align subagent-callback with workspace-scoped step worker

Post-rebase adaptation to canary's runtime restructure (#15609):

- Route the webhook bridge through AiAgentService (like the /run step
  worker) so the runtime's models stay workspace-scoped — a bare
  AgentRuntimeService would be personal-scoped and the tool-message
  backfill / resume barrier could miss workspace-scoped rows.
- Extract SubAgentBridgeParams into agentRuntime/types and add the
  completeSubAgentBridge passthrough next to executeStep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): fail sub-agent callback loudly on backfill or delivery failure

Address two review findings on the resume bridge:

- completeSubAgentBridge now checks updateToolMessage's { success } result
  (it swallows transaction errors instead of throwing) and propagates all
  infrastructure failures. The webhook endpoint then returns non-2xx so
  QStash redelivers the whole bridge — previously a failed backfill was
  acked with 200 and the parent stayed parked forever, since the verify
  recheck only re-reads the barrier and cannot retry the backfill.
- New AgentHookWebhook.fallback: 'none' opts a qstash-delivered hook out of
  the unsigned plain-fetch fallback, which can never authenticate against a
  QStash-signed endpoint and only masked publish failures as silently
  dropped 401s. The bridge hook uses it; dispatch escalates such delivery
  failures to console.error instead of the debug namespace.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 16:00:17 +08:00
Arvin Xu 4c5c8795ef 🐛 fix(model-runtime): emit stop:abort instead of error when stream is aborted (#13677)
* 🐛 fix(model-runtime): emit stop:abort instead of error when stream request is aborted

When user cancels a streaming request, the provider SDK throws abort errors
(e.g. "Request was aborted"). Previously these were propagated as error chunks,
causing the client to display a provider error message. Now abort errors emit
a stop:abort event through the SSE pipeline, allowing the client to handle
cancellation gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* 🐛 fix(model-runtime): fix type error in abort pipeline test

Use `as const` for type literal to satisfy StreamProtocolChunk union type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

*  test(fetch-sse): add planUpgradeAfterFinish to onFinish expectations

#15616 added planUpgradeAfterFinish to the onFinish context but missed
updating fetchSSE.test.ts, breaking 13 tests on canary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(model-runtime): harden abort detection against non-Error throws

isAbortError assumed error.message is always a string, but catch
clauses receive unknown — a non-Error throw (string, object without
message) would make the abort check itself throw inside the stream
error handler, swallowing both ABORT_CHUNK and the first-chunk error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-10 15:56:39 +08:00
YuTengjing 8b342c600f feat: land new signups directly on onboarding (#15629) 2026-06-10 15:31:32 +08:00
LiJian 723c4d6daa 🐛 fix(cli): handle agent_run_request in lh connect so device dispatch doesn't time out (#15634)
* 🐛 fix(cli): handle agent_run_request in `lh connect` so device dispatch doesn't time out

`lh connect` auto-registers the CLI as a device, so the gateway can pick it
as the dispatch target for a heterogeneous agent run (`agent_run_request`).
But the connect daemon only listened for `system_info_request` and
`tool_call_request` — it never handled `agent_run_request`, so it never sent
`agent_run_ack`. The gateway waited out its ack window and returned
`{error:'TIMEOUT',success:false}`, surfaced server-side as "Hetero agent
device dispatch failed".

Add an `agent_run_request` handler mirroring the desktop app: spawn
`lh hetero exec` fire-and-forget and ack `accepted` immediately. The spawned
process owns the full execution + server-ingest pipeline. It re-invokes the
current CLI entry (process.execPath + argv[1]) rather than relying on `lh`
being on PATH, so it works inside the detached daemon.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: bump the cli version

* chore: bump the cli manifest

* 🐛 fix(cli): ack agent run only after spawn succeeds, reject on spawn error

`child_process.spawn` reports a missing/inaccessible cwd asynchronously via
the child's `error` event, after the handler had already sent an `accepted`
ack. The gateway/server then recorded dispatch success while no `lh hetero
exec` process existed to emit `heteroFinish`, leaving the assistant message
stuck instead of surfacing a failure.

`spawnHeteroAgentRun` now resolves on the child's outcome: `accepted` on the
`spawn` event (stdin is written only then), `rejected` on an early `error`. A
rejected ack returns the gateway 422 → execAgent writes a ServerAgentRuntimeError
onto the assistant message, so a failed dispatch is visible. Still resolves in
milliseconds, well within the gateway's 10s ack window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:19:01 +08:00
LiJian 5b02563659 🐛 fix(cli): skill list/search commands returning empty results (#15632)
🐛 fix: skill list/search commands returning empty results

tRPC endpoints return { data, total } but CLI was treating the result as
an array; switch to result?.data ?? [] and update mocks to match.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 14:20:35 +08:00
YuTengjing a5f16c1184 🐛 fix: import button from ui root (#15599) 2026-06-10 14:19:04 +08:00
YuTengjing 7641cda958 💄 style: update i18n locales (#15630) 2026-06-10 14:02:02 +08:00
lobehubbot 248a4dcab5 🔖 chore(release): release version v2.2.2 [skip ci] 2026-06-04 03:59:37 +00:00
1546 changed files with 63666 additions and 10403 deletions
+212
View File
@@ -0,0 +1,212 @@
---
name: agent-testing
description: >
Agentic end-to-end testing for LobeHub: backend verification via the CLI,
frontend verification via agent-browser (Electron), full-stack verification in
the browser, and bot-channel verification via osascript. Local-first today,
designed to extend to cloud automation. Triggers on 'cli test', 'test with cli',
'verify with cli', 'backend test with cli', 'local test', 'test in electron',
'test desktop', 'test bot', 'bot test', 'test in discord', 'test in telegram',
'test in slack', 'test in wechat', 'test in weixin', 'test in lark', 'test in feishu',
'test in qq', 'manual test', 'osascript', 'test report', or any local
end-to-end verification task.
---
# Agent Testing (Agentic End-to-End Verification)
One skill for all agentic end-to-end testing — local-first today, designed to
also run as full cloud automation. Every test session follows the same
four-step contract:
```
Step 0: Env + Auth → Step 1: Pick surface → Step 2: Run → Step 3: Structured report
```
## Step 0 — Environment setup + auth check (mandatory)
Step 0 is about getting the environment ready: **dependencies are healthy**
and **auth is green**. A test run that dies halfway on a missing dependency or
a login wall wastes the whole session — clear both gates BEFORE writing a
single test step.
### 0.1 Dependencies are installed — root AND standalone apps
The root pnpm workspace does **NOT** cover every app: `pnpm-workspace.yaml`
lists `packages/**`, `e2e`, `apps/server`, and only `apps/desktop/src/main`
**`apps/desktop` and `apps/cli` are standalone**, each keeping its own
`node_modules` with its own links into `packages/`. A root install does not
refresh them, so install in every app the test will touch:
```bash
pnpm install # root workspace
cd apps/desktop && pnpm install # Electron surface
cd apps/cli && pnpm install # CLI surface
```
Symptom of a stale standalone install: the build/launch fails to resolve a
recently added workspace package — `Rolldown failed to resolve import
"@lobechat/<pkg>"` (Electron) or `Cannot find module '@lobechat/<pkg>'` (CLI).
### 0.2 Run scripts from the repo root
All paths in this skill (`./.agents/skills/agent-testing/...`) are
repo-root-relative, and background commands inherit the current working
directory — a script launched while `cwd` is `apps/desktop` fails with
`No such file or directory`. Verify `pwd` is the repo root before launching
long-running scripts.
### 0.3 Auth is green
**Auth is the gate for all automated testing.**
```bash
./.agents/skills/agent-testing/scripts/setup-auth.sh status
```
| Surface | Mechanism | One-key path | Standard check |
| -------- | ------------------------------------------------- | ------------------------------ | ------------------------------ |
| CLI | OIDC Device Code Flow (`apps/cli/.lobehub-dev`) | `setup-auth.sh cli` | `setup-auth.sh status` |
| Web | better-auth cookie injection into `agent-browser` | `pbpaste \| setup-auth.sh web` | `setup-auth.sh web-verify` |
| Electron | App's own persistent login state | Log in once in the app | `app-probe.sh auth` |
| Bot | Native apps already logged in | — | per-platform screenshot |
Login-state checks are standardized — do NOT hand-roll `window.__LOBE_STORES`
eval snippets; use `scripts/app-probe.sh auth` (returns `{ isSignedIn, userId }`,
works for Electron CDP and web sessions via `AB_TARGET`).
If `status` is not all green, fix auth first (the steps that need a human must be
requested from the user explicitly). Full background and failure modes:
[references/auth.md](./references/auth.md).
## Step 1 — Pick the surface by change scope
| Change scope | Default surface | Why | Guide |
| ------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------------------- | ---------------------------------- |
| **Backend** (TRPC router / service / model / migration) | **CLI** | Fastest loop, text-assertable output, zero UI flakiness | [cli/index.md](./cli/index.md) |
| **Pure frontend** (components, store, styles, UX) | **Electron** (agent-browser + CDP) | Primary product shape; `__LOBE_STORES` state introspection | [ui/electron.md](./ui/electron.md) |
| **Full-stack** (new API + UI consuming it) | **Web** (browser + local dev server) | One surface where network requests and UI are observable together | [ui/web.md](./ui/web.md) |
| **Bot channels** (Discord / WeChat / Lark / …) | Native app via osascript / bridge | Only way to exercise the real channel end-to-end | `bot/<platform>/index.md` |
Escalate, don't duplicate: verify a backend change with the CLI first; only add
a UI pass when the change actually affects the UI.
### Environment support (local macOS vs cloud Linux)
The decisive constraint per surface is **how evidence (screenshots) is
captured**: CDP-based capture (`agent-browser screenshot`) renders from the
browser engine and needs no real display; OS-level capture (`screencapture`,
osascript) is macOS-only.
| Surface | macOS (local) | Linux / cloud (headless) | Screenshot mechanism |
| -------- | ------------- | --------------------------------------------------------- | ------------------------------------------------------ |
| CLI | ✅ | ✅ | n/a — text output |
| Web | ✅ | ✅ headless Chromium works natively | CDP — no display needed |
| Electron | ✅ | ⚠️ runs, but needs a display server: wrap with `xvfb-run` | CDP works under Xvfb; `capture-app-window.sh` does NOT |
| Bot | ✅ | ❌ osascript + native apps are macOS-only | macOS `screencapture` only |
When a test must stay cloud-portable, prefer CDP-based evidence over
OS-level capture wherever both exist.
### Bot platforms
| Platform | Guide | Quick switcher |
| ------------- | ------------------------------------------------ | --------------------- |
| Discord | [bot/discord/index.md](./bot/discord/index.md) | `Cmd+K` |
| Slack | [bot/slack/index.md](./bot/slack/index.md) | `Cmd+K` |
| Telegram | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F` |
| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md) | `Cmd+F` |
| Lark / 飞书 | [bot/lark/index.md](./bot/lark/index.md) | `Cmd+K` |
| QQ | [bot/qq/index.md](./bot/qq/index.md) | `Cmd+F` |
| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | bridge (no osascript) |
Each platform folder contains an `index.md` (activation, navigation,
send-message, verification snippets) and a `test-<platform>-bot.sh` script
sharing the interface:
```bash
./.agents/skills/agent-testing/bot/<platform>/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
```
New to osascript automation? Read
[references/osascript.md](./references/osascript.md) first — it is a general
macOS-automation asset (activate, type, paste, screenshot, accessibility reads,
gotchas), not bot-specific.
## Step 2 — Run
Surface guides above carry the detailed workflows. Shared infrastructure:
| Need | Where |
| ------------------------------------ | -------------------------------------------------------------------- |
| Start / restart the local dev server | [references/dev-server.md](./references/dev-server.md) |
| `agent-browser` command reference | [references/agent-browser.md](./references/agent-browser.md) |
| osascript patterns (general macOS) | [references/osascript.md](./references/osascript.md) |
| Agent gateway probing | [references/agent-gateway.md](./references/agent-gateway.md) |
| Screen recording | [references/record-app-screen.md](./references/record-app-screen.md) |
### Scripts
All under `.agents/skills/agent-testing/scripts/`:
| Script | Usage |
| ------------------------- | ------------------------------------------------------------------------------ |
| `setup-auth.sh` | One-stop auth setup & status check (`status` / `cli` / `web`) |
| `app-probe.sh` | LobeHub app probes: `auth` / `route` / `ops` / `goto <path>` / `errors` |
| `record-gif.sh` | Frame-sequence → GIF for time-based behavior (streaming, timers, animations) |
| `report-init.sh` | Scaffold a structured test report (Step 3) |
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart, CDP 9222) |
| `capture-app-window.sh` | Screenshot a specific app window (general; used by bot tests) |
| `record-app-screen.sh` | Record app screen (video + periodic screenshots) |
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
| `agent-gateway/` | Gateway probe / dump / analyze tools |
`app-probe.sh` is the LobeHub-specific fast path into app state — auth check,
current route, running operations, and `goto <path>` quick navigation
(`/agent/<agentId>/<topicId>`, `/task/<taskId>`, `/settings`, …) so a test can
jump straight to the state under test instead of clicking through the UI. See
[ui/electron.md](./ui/electron.md#lobehub-probes--quick-navigation) for usage.
## Step 3 — Structured report (mandatory deliverable)
Every automated test session ends with a structured, evidence-backed report —
not a chat-only summary. Scaffold it up front and fill it as you test:
```bash
DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh my-feature "Verify my feature")
# ... test, saving screenshots / CLI transcripts into $DIR/assets/ ...
# fill $DIR/report.md (case table, embedded evidence, verdict) and $DIR/result.json
```
Reports live in `.records/reports/<timestamp>-<slug>/` (gitignored): `report.md`
(human-readable, with embedded screenshots), `result.json` (machine-readable
pass/fail + score), `assets/` (evidence). Format spec and evidence rules:
[references/report.md](./references/report.md).
Two hard rules worth front-loading:
- **Report language = the user's conversation language.** Write the ENTIRE
`report.md` (headings included) in the language the user is conversing in —
no mixed English. `result.json` keys/status values stay English.
- **Time-based behavior needs a GIF, not a screenshot.** If a case asserts
change over time (streaming output, a ticking timer, loading states,
animations), record it with `scripts/record-gif.sh` and embed the GIF —
a static screenshot cannot prove the behavior.
## Directory map
```
agent-testing/
├── SKILL.md # this router
├── cli/index.md # backend verification via the LobeHub CLI
├── ui/electron.md # pure-frontend verification in the desktop app
├── ui/web.md # full-stack verification in the browser
├── bot/<platform>/ # bot-channel verification (osascript / bridge)
├── references/ # shared knowledge: auth, dev-server, agent-browser, osascript, report
└── scripts/ # setup-auth, report-init, electron-dev, capture, recording, gateway
```
## Gotchas
- agent-browser: see [references/agent-browser.md](./references/agent-browser.md#gotchas)
- Electron: see [ui/electron.md](./ui/electron.md#electron-gotchas)
- osascript: see [references/osascript.md](./references/osascript.md#gotchas)
@@ -2,7 +2,7 @@
**App name:** `Discord` | **Process name:** `Discord`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -92,6 +92,6 @@ echo "Screenshot saved to /tmp/discord-test-result.png"
## Script
```bash
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -21,7 +21,7 @@ So the test surface is three layers:
curl -sS -m4 -o /dev/null -w '%{http_code}\n' \
"http://127.0.0.1:1234/api/v1/server/info?password=<PW>" # expect 200
```
- **Electron dev running with CDP**: `./.agents/skills/local-testing/scripts/electron-dev.sh start`
- **Electron dev running with CDP**: `./.agents/skills/agent-testing/scripts/electron-dev.sh start`
- The **iMessage Desktop branch** checked out (the `imessageBridge` IPC group
and `@lobechat/chat-adapter-imessage` must be compiled into the main bundle).
Run `pnpm install --ignore-scripts` at the repo root **and** in `apps/desktop/`
@@ -31,7 +31,7 @@ So the test surface is three layers:
## Fast path: automated script
```bash
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
./.agents/skills/agent-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
```
Asserts the whole flow and self-cleans (unique `applicationId` per run, removes
@@ -136,7 +136,7 @@ Verifies the leg the bridge uses to _reply_: `BlueBubblesApiClient.sendText`
`POST /api/v1/message/text`. Run the helper against your own number:
```bash
./.agents/skills/local-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
./.agents/skills/agent-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
```
**Gotcha that bites everyone:** with `method=apple-script` and a _new_
@@ -2,7 +2,7 @@
**App name:** `Lark` or `飞书` | **Process name:** `Lark` or `飞书`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -56,6 +56,6 @@ screencapture /tmp/lark-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
```
@@ -80,5 +80,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `QQ` | **Process name:** `QQ`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -57,6 +57,6 @@ screencapture /tmp/qq-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
```
@@ -72,5 +72,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `Slack` | **Process name:** `Slack`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -68,6 +68,6 @@ screencapture /tmp/slack-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `Telegram` | **Process name:** `Telegram`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -75,6 +75,6 @@ curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/getUpdates?limit=5" | j
## Script
```bash
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
```
@@ -75,5 +75,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `微信` or `WeChat` | **Process name:** `WeChat`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -76,6 +76,6 @@ screencapture /tmp/wechat-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
```
@@ -81,5 +81,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
+142
View File
@@ -0,0 +1,142 @@
# CLI Backend Verification
Default surface for verifying **backend changes** (TRPC routers, services,
models, migrations) end-to-end: fastest loop, text-assertable output, zero UI
flakiness.
## When to use
- Verifying TRPC router / service / model changes end-to-end
- Testing new API fields or response structure changes
- Validating CLI command output after backend modifications
- Debugging data flow issues between server and CLI
## Prerequisites
| Requirement | Details |
| ------------ | --------------------------------------------------------------------------------- |
| Dev server | `localhost:3010` — see [../references/dev-server.md](../references/dev-server.md) |
| CLI source | `apps/cli/` — runs from source, no rebuild; standalone `node_modules` — run `pnpm install` inside `apps/cli/` (root install does not cover it) |
| CLI dev mode | `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated credentials |
| Auth | Device Code Flow login — see [../references/auth.md](../references/auth.md) |
All CLI dev commands run from `apps/cli/`. Subsequent examples use `$CLI`:
```bash
CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
```
## Workflow
### Step 1 — Server up?
See [../references/dev-server.md](../references/dev-server.md) for the health
check, start, and restart commands. Server-side code changes require a restart.
### Step 2 — Auth ready?
```bash
./.agents/skills/agent-testing/scripts/setup-auth.sh status
```
If the CLI is not logged in, **the user must run the login themselves**
(interactive browser authorization):
```bash
cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
```
Credentials persist in `apps/cli/.lobehub-dev/`. Details:
[../references/auth.md](../references/auth.md).
### Step 3 — Test with CLI commands
CLI runs from source, so CLI-side code changes take effect immediately without
rebuilding:
```bash
cd apps/cli
$CLI <command>
```
Capture output for the report as you go (e.g. `$CLI task list | tee "$DIR/assets/task-list.txt"`).
### Step 4 — Clean up test data
```bash
$CLI task delete < id > -y
$CLI agent delete < id > -y
```
### Step 5 — Report
Finish with a structured report —
[../references/report.md](../references/report.md). CLI evidence = exact
command + trimmed output.
## Common testing patterns
### Task system
```bash
$CLI task list
$CLI task create -n "Root Task" -i "Test instruction"
$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
$CLI task view T-1
$CLI task tree T-1
$CLI task edit T-1 --status running
$CLI task comment T-1 -m "Test comment"
$CLI task delete T-1 -y
```
### Agent system
```bash
$CLI agent list
$CLI agent view <agent-id>
$CLI agent run <agent-id> -m "Test prompt"
```
### Document & knowledge base
```bash
$CLI doc list
$CLI doc create -t "Test Doc" -c "Content here"
$CLI doc view <doc-id>
$CLI kb list
$CLI kb tree <kb-id>
```
### Model & provider
```bash
$CLI model list
$CLI provider list
$CLI provider test <provider-id>
```
## Dev-test cycle
```
1. Make code changes (service/model/router/type)
|
2. Run unit tests (fast feedback)
bunx vitest run --silent='passed-only' '<test-file>'
|
3. Restart dev server (if server-side changes — see dev-server.md)
|
4. CLI verification (end-to-end)
$CLI <command>
|
5. Clean up test data + write the report
```
## Troubleshooting
| Issue | Solution |
| --------------------------- | ----------------------------------------------- |
| `No authentication found` | Run `login --server http://localhost:3010` |
| `UNAUTHORIZED` on API calls | Token expired; re-run login |
| `ECONNREFUSED` | Dev server not running — see dev-server.md |
| CLI shows old data/behavior | Server needs restart to pick up code changes |
| Login opens wrong server | Must use `--server` flag (env var doesn't work) |
@@ -0,0 +1,257 @@
# agent-browser CLI Reference
Generic reference for the `agent-browser` CLI — automate Chromium-based apps (Electron, Chrome, web) via Chrome DevTools Protocol. LobeHub-specific patterns live in [../ui/electron.md](../ui/electron.md) and [../ui/web.md](../ui/web.md); authentication recipes live in [auth.md](./auth.md).
Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
## Core Workflow
Every browser automation follows this pattern:
1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
```
## Command Chaining
```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```
Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
## Essential Commands
```bash
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
agent-browser close --all # Close all active sessions
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network requests --type xhr,fetch # Filter by resource type
agent-browser network requests --method POST # Filter by HTTP method
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "text" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Dialogs (alert, confirm, prompt, beforeunload)
agent-browser dialog accept # Accept dialog
agent-browser dialog accept "input" # Accept prompt dialog with text
agent-browser dialog dismiss # Dismiss/cancel dialog
agent-browser dialog status # Check if dialog is open
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
# Streaming
agent-browser stream enable # Start WebSocket streaming
agent-browser stream status # Inspect streaming state
agent-browser stream disable # Stop streaming
```
## Batch Execution
```bash
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
```
## Authentication
```bash
# Option 1: Auth vault (credentials stored encrypted)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
# Option 2: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close # State auto-saved
agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
# Option 3: Persistent profile
agent-browser --profile ~/.myapp open https://app.example.com/login
# Option 4: State file
agent-browser state save auth.json
agent-browser state load auth.json
```
### LobeHub dev server — inject better-auth cookie
`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3010`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [auth.md](./auth.md) for the full recipe.
## Semantic Locators (Alternative to Refs)
```bash
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
```
## JavaScript Evaluation (eval)
```bash
# Simple expressions
agent-browser eval 'document.title'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin << 'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(echo -n 'document.title' | base64)"
```
## Ref Lifecycle
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
## Annotated Screenshots (Vision Mode)
```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
agent-browser click @e2 # Click using ref from annotated screenshot
```
## Parallel Sessions
```bash
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser session list
```
## Connect to Existing Chrome
```bash
agent-browser --auto-connect snapshot # Auto-discover running Chrome
agent-browser --cdp 9222 snapshot # Explicit CDP port
```
## iOS Simulator (Mobile Safari)
```bash
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
```
## Observability Dashboard
```bash
agent-browser dashboard install
agent-browser dashboard start # Background server on port 4848
agent-browser dashboard stop
```
## Cloud Providers
Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
## Browser Engine Selection
```bash
agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
```
## Gotchas
- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
@@ -19,13 +19,13 @@ works for any LobeHub streaming session.
```bash
# 1. Start Electron with CDP
./.agents/skills/local-testing/scripts/electron-dev.sh start
./.agents/skills/agent-testing/scripts/electron-dev.sh start
# 2. Navigate to a chat, switch runtime to Cloud Sandbox (gateway mode)
# 3. Install the probe + helpers
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/probe.js
< .agents/skills/agent-testing/scripts/agent-gateway/probe.js
# 4. Send a tool-call message — manually or via type+press
agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
@@ -34,15 +34,15 @@ agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
# rightmost inactive tab as AWAY — edit ROUND_TRIPS / DWELL_MS in the
# file if you want different timing)
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/tab-switch.js
< .agents/skills/agent-testing/scripts/agent-gateway/tab-switch.js
# 6. Wait for streaming to finish, then dump
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/probe-dump.js \
< .agents/skills/agent-testing/scripts/agent-gateway/probe-dump.js \
> /tmp/probe.json
# 7. Analyze
node .agents/skills/local-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
node .agents/skills/agent-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
```
The analyzer prints three sections: EVENTS, TIMELINE, REGRESSIONS. If
@@ -0,0 +1,123 @@
# Auth Setup for Local Agent Testing
**Auth is the gate for all automated testing.** Prepare and verify it before
writing any test step. The one-stop entry point is:
```bash
SCRIPT=".agents/skills/agent-testing/scripts/setup-auth.sh"
$SCRIPT status # check server + CLI + web auth readiness
$SCRIPT cli # interactive CLI device-code login (must be run by the user)
pbpaste | $SCRIPT web # inject a copied Cookie header into the agent-browser session
$SCRIPT web-verify # live-check that the agent-browser session is authenticated
```
`SERVER_URL` defaults to `http://localhost:3010` (this repo's `dev:next` port).
Override it when testing against another server (e.g. `SERVER_URL=http://localhost:3011`
in the cloud repo).
## Per-surface overview
| Surface | Mechanism | Persistence | Human interaction |
| -------- | ---------------------------------------- | ----------------------------------------------------------------- | ----------------------------------------------- |
| CLI | OIDC Device Code Flow | `apps/cli/.lobehub-dev/settings.json` | Yes — browser authorization, every token expiry |
| Web | better-auth cookie injection | `~/.lobehub-agent-testing/web-state.json` + agent-browser session | Copy the Cookie header once per token rotation |
| Electron | App's own login state | Electron user-data dir | Log in once manually in the app |
| Bot | Native apps (Discord/WeChat/…) logged in | Each app's own session | Once per app |
## CLI — Device Code Flow
Credentials are isolated from the user's real CLI config via
`LOBEHUB_CLI_HOME=.lobehub-dev` (kept inside `apps/cli/`, gitignored).
Login requires interactive browser authorization, so **the user must run it
themselves** (e.g. via the `!` prefix in Claude Code):
```bash
cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
```
- The `--server` flag is required — an env var does NOT work and login will hit
the wrong server without it.
- Check state without logging in: `setup-auth.sh status` (verifies
`settings.json` exists and `serverUrl` matches).
- `UNAUTHORIZED` on API calls means the token expired — re-run login.
## Web — better-auth cookie injection (agent-browser)
`agent-browser --headed` on macOS often creates the Chromium window off-screen —
the user can't see or interact with it, so manual login inside the agent-browser
session fails. Instead, copy the **better-auth session cookie** out of the
user's own logged-in Chrome and inject it as a Playwright-style state file.
Do **not** use this on production URLs — only local dev. Treat the cookie as a
secret: don't paste it into shared logs, PRs, or commit it anywhere.
### One-key path
1. Ask the user to copy the Cookie header **from a Network request, NOT
`document.cookie`** (`document.cookie` cannot see HttpOnly cookies, which is
exactly where better-auth puts its session):
- Open the logged-in tab (`http://localhost:<port>/…`) in Chrome.
- `Cmd+Option+I`**Network** tab → refresh → click any same-origin request.
- Under **Request Headers**, right-click the `Cookie:` line → **Copy value**.
2. Inject and verify in one shot:
```bash
pbpaste | ./.agents/skills/agent-testing/scripts/setup-auth.sh web
```
The script filters the header down to the better-auth cookies
(`better-auth.session_token`, `better-auth.state`), builds the Playwright
`storageState` JSON, loads it into the `agent-browser` session (default name
`lobehub-dev`), opens `SERVER_URL`, and asserts the URL is not `/signin`.
### Using the authenticated session
```bash
agent-browser --session lobehub-dev open "http://localhost:3010/"
agent-browser --session lobehub-dev snapshot -i | head -20
# Look for the user's avatar/name in the sidebar, or absence of the signin form.
```
### Notes
- `storageState` doesn't enforce the HttpOnly flag on load — the script stores
cookies with `httpOnly: false`, which is fine for local dev and sidesteps a
CDP-context quirk where HttpOnly cookies sometimes fail to attach.
- The state file is kept at `~/.lobehub-agent-testing/web-state.json` so
`setup-auth.sh status` can report web-auth readiness across sessions.
### Common failure modes
| Symptom | Cause | Fix |
| --------------------------------------------- | ------------------------------------------------------------------------- | ------------------------------------------------- |
| Still redirects to `/signin` after injection | User pasted from `document.cookie` → missed HttpOnly session | Re-pull from Network request Headers, not console |
| Script reports `no better-auth cookies found` | Separator wrong, or user pasted URL-decoded value | Keep the raw `Cookie:` header as-is |
| Login works briefly then expires | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-inject |
| Domain mismatch | Cookie domain must be `localhost` literally, no leading dot for local dev | — |
## Electron
The desktop app keeps its own persistent login state in its user-data
directory — log in once manually inside the app and it survives restarts of
`electron-dev.sh`. No injection needed. The standard check (do NOT hand-roll a
store eval) once Electron is up with CDP:
```bash
./.agents/skills/agent-testing/scripts/app-probe.sh auth
# → {"ok":true,"isSignedIn":true,"userId":"user_xxx"}
```
`setup-auth.sh status` runs this probe automatically when CDP 9222 is
reachable.
## Scope
These recipes only cover **local dev** authentication. They do not:
- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com`
and must be delivered over HTTPS.
- Replace real OAuth flows — tests that must exercise the login UI itself need a
real Chromium with `--remote-debugging-port` or a bot account.
- Flow cookies back to the user's Chrome — injection is one-way.
@@ -0,0 +1,55 @@
# Local Dev Server
Single source of truth for starting / restarting the backend that all test
surfaces (CLI, Electron, Web) hit.
## Ports & modes
| Command | What it runs | Port |
| ------------------- | --------------------------------------------------------- | --------------------------------- |
| `pnpm run dev:next` | Next.js backend (API + auth) | `3010` |
| `bun run dev` | Full-stack (Next.js + Vite SPA, via `devStartupSequence`) | `3010` (API) + SPA |
| `bun run dev:spa` | Vite SPA only, proxies API to `3010` | `9876` (prints a Debug Proxy URL) |
In the **cloud repo** (where this repo is the `lobehub/` submodule) the dev
server conventionally runs on `3011` — set `SERVER_URL=http://localhost:3011`
for the scripts in this skill when testing there.
## Health check
```bash
curl -s -o /dev/null -w '%{http_code}' http://localhost:3010/
```
## Start / restart
```bash
# Start (from repo root)
pnpm run dev:next
# Restart — required to pick up server-side code changes
lsof -ti:3010 | xargs kill
pnpm run dev:next
```
## When a server restart is needed
Next.js hot-reload may not pick up changes in workspace packages — restart when
in doubt.
| Change location | Restart? |
| ----------------------------------------------- | -------- |
| `apps/server/src/` (routers, services, modules) | Yes |
| `src/server/` (agent-hono, workflows-hono) | Yes |
| `packages/database/` (models) | Yes |
| `packages/types/` | Yes |
| `packages/prompts/` | Yes |
| `apps/cli/` (CLI runs from source) | No |
## Troubleshooting
| Issue | Solution |
| ------------------------- | ------------------------------------------------------- |
| `ECONNREFUSED` | Server not running — start it |
| `EADDRINUSE` on the port | Already running — `lsof -ti:<port> \| xargs kill` first |
| Stale data / old behavior | Server needs a restart to pick up code changes |
@@ -12,13 +12,13 @@ General-purpose screen recording tool for the Electron app. Captures CDP screens
```bash
# Start recording (Electron must be running with CDP)
.agents/skills/local-testing/scripts/record-app-screen.sh start [output_name]
.agents/skills/agent-testing/scripts/record-app-screen.sh start [output_name]
# Stop recording and assemble video
.agents/skills/local-testing/scripts/record-app-screen.sh stop
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
# Check if recording is active
.agents/skills/local-testing/scripts/record-app-screen.sh status
.agents/skills/agent-testing/scripts/record-app-screen.sh status
```
### Arguments
@@ -74,10 +74,10 @@ The `.records/` directory is at the project root and is gitignored.
```bash
# Start Electron
.agents/skills/local-testing/scripts/electron-dev.sh start
.agents/skills/agent-testing/scripts/electron-dev.sh start
# Start recording
.agents/skills/local-testing/scripts/record-app-screen.sh start my-test
.agents/skills/agent-testing/scripts/record-app-screen.sh start my-test
# Run automation
agent-browser --cdp 9222 click @e61
@@ -86,14 +86,14 @@ agent-browser --cdp 9222 press Enter
sleep 10
# Stop and get results
.agents/skills/local-testing/scripts/record-app-screen.sh stop
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
# → .records/my-test.mp4 + .records/my-test/*.png
```
### Gateway Streaming Demo
```bash
.agents/skills/local-testing/scripts/electron-dev.sh start
.agents/skills/agent-testing/scripts/electron-dev.sh start
# Inject gateway URL
agent-browser --cdp 9222 eval --stdin << 'EOF'
@@ -106,19 +106,19 @@ agent-browser --cdp 9222 eval --stdin << 'EOF'
EOF
# Record
.agents/skills/local-testing/scripts/record-app-screen.sh start gateway-demo
.agents/skills/agent-testing/scripts/record-app-screen.sh start gateway-demo
# Navigate to agent, send message, wait for completion...
# (automation commands here)
.agents/skills/local-testing/scripts/record-app-screen.sh stop
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
open .records/gateway-demo.mp4
```
### Check Active Recording
```bash
.agents/skills/local-testing/scripts/record-app-screen.sh status
.agents/skills/agent-testing/scripts/record-app-screen.sh status
# [record] Active recording
# Frames: 42 captured (running: yes)
# Screenshots: 14 captured (running: yes)
@@ -0,0 +1,124 @@
# Structured Test Reports
Every automated test session ends with a structured, evidence-backed report.
A chat-only summary is not an acceptable deliverable: the report is what the
user (or a reviewer, or a later agent) audits without replaying the session.
## Location & layout
Reports live under `.records/reports/` (gitignored, like all `.records/`
output):
```
.records/reports/<YYYYMMDD-HHMMSS>-<slug>/
├── report.md # human-readable report (embedded screenshots, case table, verdict)
├── result.json # machine-readable results (pass/fail counts, score)
└── assets/ # evidence: screenshots, HAR files, CLI transcripts
```
## Workflow
1. **Scaffold up front** — before running the first test step:
```bash
DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh < slug > "<title>")
```
The script creates the directory, pre-fills branch / commit / date in both
files, and prints the directory path.
2. **Collect evidence as you test** — every asserted behavior gets one evidence
item in `$DIR/assets/`:
- UI (static state): `agent-browser screenshot` or `capture-app-window.sh`,
then **verify the screenshot with the Read tool before citing it** —
never cite an image you haven't looked at.
- UI (time-based behavior): **screenshot vs GIF is a judgment you must
make per case.** If the assertion is about change over time — streaming
output, a ticking timer, loading/progress states, animations,
appear/disappear transitions — a static screenshot cannot prove it.
Record a frame sequence and synthesize a GIF:
```bash
# start recording (background), trigger the behavior, wait for it to finish
../scripts/record-gif.sh "$DIR/assets/case2-streaming.gif" 12 2 &
GIF_PID=$!
# ... drive the scenario ...
wait $GIF_PID
```
Embed it like an image: `![case 2](assets/case2-streaming.gif)`. Verify
at least the first/last frames visually (Read the GIF) before citing.
- CLI: exact command + trimmed output (`$CLI task list | tee "$DIR/assets/task-list.txt"`).
- Network: `agent-browser network requests` dumps or HAR files.
3. **Fill `report.md` as you go** — don't reconstruct from memory at the end.
4. **Set the verdict** in both `report.md` and `result.json`, then link the
report directory in your final answer to the user.
## Report language (hard rule)
**`report.md` MUST be written in the language the user is conversing in** —
the whole file, headings included. If the conversation is in Chinese, the
report is in Chinese; do not mix English prose into it. The scaffold's English
headings are placeholders — translate them when filling. Exceptions that stay
as-is: code/commands, identifiers, log excerpts, and `result.json` (its keys
and status values are machine-read and stay English; the `title` and case
`name` fields follow the user's language).
## report.md sections
| Section | Content |
| --------------- | ---------------------------------------------------------------------------------- |
| **Scope** | What changed / what is being verified; branch + commit |
| **Environment** | Server URL, surfaces used (cli / electron / web / bot), relevant versions |
| **Cases** | Table: `# \| case \| surface \| steps \| expected \| actual \| status \| evidence` |
| **Evidence** | Embedded screenshots/GIFs (`![case 1](assets/case1.png)`), fenced CLI transcripts |
| **Verdict** | Pass/fail/blocked counts, optional 0100 score, open issues / follow-ups |
Status values: `pass` / `fail` / `blocked` (couldn't run — e.g. auth or env
missing; a blocked case is not a pass).
## result.json schema
```json
{
"branch": "feat/task-tree",
"cases": [
{
"id": "1",
"name": "task tree returns nested children",
"surface": "cli",
"status": "pass",
"evidence": ["assets/task-tree.txt"]
}
],
"commit": "abc1234",
"createdAt": "2026-06-11T15:30:00+08:00",
"summary": {
"total": 1,
"passed": 1,
"failed": 0,
"blocked": 0,
"score": 100,
"verdict": "pass"
},
"surfaces": ["cli"],
"title": "Verify task tree API"
}
```
`score` is optional — use it when the verdict has a subjective component (UI
polish, copy quality); omit it for purely binary runs. `verdict` is the single
word the user reads first: `pass`, `fail`, or `partial`.
## Rules
- **No evidence, no claim** — every `pass`/`fail` in the case table must link
at least one asset.
- **Screenshots must be visually verified** with the Read tool before being
cited.
- **Report failures faithfully** — a failing case with clear evidence is a good
report; a vague green one is not.
- If coverage was cut (cases skipped, surfaces not exercised), say so in the
Verdict section — silent truncation reads as "covered everything".
@@ -11,7 +11,7 @@
// 6. ROLLBACKS — msgN / childN / role drops in the active-topic timeline
//
// Usage:
// bun run .agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
// bun run .agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
import { readFileSync } from 'node:fs';
@@ -5,16 +5,16 @@
// streaming-replay test fixtures.
//
// Commands:
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts install
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts install
// Bundle probe-events.ts and inject into the CDP-attached browser.
// Re-installing clears all buffers and re-patches WebSocket / fetch.
//
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts dump [name]
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts dump [name]
// Stop the timeline timer, fetch the capture as JSON, write it to
// `.agent-gateway/<name>-<YYYYMMDD-HHmmss>.json`. `name` defaults to
// `dump`. Prints the absolute path written.
//
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts analyze [path]
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts analyze [path]
// Run analyze-events.ts on the dump. `path` defaults to the most
// recently modified file in `.agent-gateway/`.
//
@@ -28,7 +28,7 @@ import path from 'node:path';
import { fileURLToPath } from 'node:url';
const SCRIPT_DIR = path.dirname(fileURLToPath(import.meta.url));
// .agents/skills/local-testing/scripts/agent-gateway/ → 5 levels up
// .agents/skills/agent-testing/scripts/agent-gateway/ → 5 levels up
const PROJECT_ROOT = path.resolve(SCRIPT_DIR, '../../../../..');
const DUMP_DIR = path.join(PROJECT_ROOT, '.agent-gateway');
+95
View File
@@ -0,0 +1,95 @@
#!/usr/bin/env bash
# app-probe.sh — standardized probes for a running LobeHub app (Electron via
# CDP, or a web agent-browser session). Use these instead of hand-rolling
# `window.__LOBE_STORES` eval snippets — especially the auth check.
#
# Usage:
# app-probe.sh auth # { isSignedIn, userId } from the user store
# app-probe.sh route # current SPA route
# app-probe.sh ops # running chat operations (type / status / startTime)
# app-probe.sh goto <path> # navigate the SPA to a route (full reload), e.g. goto /agent/agt_xxx
# app-probe.sh errors-install # install a console.error interceptor
# app-probe.sh errors # dump errors captured since errors-install
#
# Target selection (default: Electron over CDP 9222):
# AB_TARGET="--cdp 9222" # Electron (default; CDP_PORT also honored)
# AB_TARGET="--session lobehub-dev" # web agent-browser session
#
# Common routes (desktop SPA): / /agent/<agentId> /agent/<agentId>/<topicId>
# /task /task/<taskId> /page /settings /community
set -euo pipefail
AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
run_eval() {
# shellcheck disable=SC2086
agent-browser $AB_TARGET eval --stdin
}
case "${1:-}" in
auth)
run_eval << 'EVALEOF'
(function () {
var stores = window.__LOBE_STORES;
if (!stores || !stores.user) return JSON.stringify({ ok: false, reason: 'no user store — app not loaded yet?' });
var u = stores.user();
return JSON.stringify({ ok: !!u.isSignedIn, isSignedIn: !!u.isSignedIn, userId: (u.user && u.user.id) || null });
})()
EVALEOF
;;
route)
run_eval << 'EVALEOF'
location.pathname + location.search + location.hash
EVALEOF
;;
ops)
run_eval << 'EVALEOF'
(function () {
var stores = window.__LOBE_STORES;
if (!stores || !stores.chat) return JSON.stringify({ ok: false, reason: 'no chat store — open a conversation first' });
var ops = Object.values(stores.chat().operations || {});
var running = ops.filter(function (o) { return o.status === 'running'; });
return JSON.stringify({
ok: true,
running: running.map(function (o) { return { startTime: o.metadata && o.metadata.startTime, type: o.type }; }),
runningCount: running.length,
total: ops.length,
});
})()
EVALEOF
;;
goto)
TARGET_PATH="${2:?Usage: app-probe.sh goto <path>}"
# shellcheck disable=SC2086
agent-browser $AB_TARGET eval "location.href = '$TARGET_PATH'" > /dev/null
sleep 2
bash "${BASH_SOURCE[0]}" route
;;
errors-install)
run_eval << 'EVALEOF'
(function () {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function () {
var msg = Array.from(arguments).map(function (a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
;;
errors)
run_eval << 'EVALEOF'
JSON.stringify(window.__CAPTURED_ERRORS || 'interceptor not installed — run errors-install first')
EVALEOF
;;
*)
echo "Usage: $0 {auth|route|ops|goto <path>|errors-install|errors}" >&2
exit 2
;;
esac
+61
View File
@@ -0,0 +1,61 @@
#!/usr/bin/env bash
# record-gif.sh — capture a frame sequence via agent-browser (CDP) and
# synthesize a GIF for embedding in a test report.
#
# Use this whenever the asserted behavior is about CHANGE OVER TIME —
# streaming output, a ticking timer, loading states, animations. A static
# screenshot cannot prove those; a GIF can. Cloud-portable: frames come from
# CDP rendering, no OS-level screen capture.
#
# Usage:
# record-gif.sh <output.gif> <duration_seconds> [fps]
#
# AB_TARGET="--cdp 9222" # Electron (default; CDP_PORT honored)
# AB_TARGET="--session lobehub-dev" # web agent-browser session
# GIF_WIDTH=960 # output width (px), default 960
#
# Requires ffmpeg (`brew install ffmpeg`). Effective fps is capped by
# screenshot latency (~0.3-0.5s per frame); 1-2 fps is the realistic range.
#
# Example — record a 12s run and embed it in the report:
# ./record-gif.sh "$DIR/assets/case2-tray-running.gif" 12 2 &
# GIF_PID=$!
# # ... trigger the streaming behavior ...
# wait $GIF_PID
set -euo pipefail
OUT="${1:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
DUR="${2:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
FPS="${3:-2}"
AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
GIF_WIDTH="${GIF_WIDTH:-960}"
command -v ffmpeg > /dev/null || {
echo "ffmpeg not found — install with: brew install ffmpeg" >&2
exit 1
}
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT
FRAMES=$((DUR * FPS))
INTERVAL=$(python3 -c "print(1 / $FPS)")
for i in $(seq -f '%04g' 1 "$FRAMES"); do
# shellcheck disable=SC2086
agent-browser $AB_TARGET screenshot "$TMP/frame-$i.png" > /dev/null 2>&1 || true
sleep "$INTERVAL"
done
CAPTURED=$(find "$TMP" -name 'frame-*.png' | wc -l | tr -d ' ')
[ "$CAPTURED" -gt 0 ] || {
echo "no frames captured — is the app reachable via $AB_TARGET?" >&2
exit 1
}
ffmpeg -y -loglevel error -framerate "$FPS" -pattern_type glob -i "$TMP/frame-*.png" \
-vf "fps=$FPS,scale=$GIF_WIDTH:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
"$OUT"
echo "$OUT ($CAPTURED frames @ ${FPS}fps)"
+74
View File
@@ -0,0 +1,74 @@
#!/usr/bin/env bash
# report-init.sh — scaffold a structured test report under .records/reports/.
#
# Format spec and evidence rules: ../references/report.md
#
# Usage:
# report-init.sh <slug> [title]
#
# Prints the report directory path (capture it: DIR=$(report-init.sh my-test)).
set -euo pipefail
SLUG="${1:?Usage: report-init.sh <slug> [title]}"
TITLE="${2:-$SLUG}"
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
TS="$(date +%Y%m%d-%H%M%S)"
DIR="$REPO_ROOT/.records/reports/$TS-$SLUG"
mkdir -p "$DIR/assets"
BRANCH=$(git -C "$REPO_ROOT" branch --show-current 2> /dev/null || echo "unknown")
COMMIT=$(git -C "$REPO_ROOT" rev-parse --short HEAD 2> /dev/null || echo "unknown")
DATE_HUMAN=$(date '+%Y-%m-%d %H:%M')
DATE_ISO=$(date '+%Y-%m-%dT%H:%M:%S%z')
cat > "$DIR/report.md" << EOF
# Test Report: $TITLE
## Scope
<!-- What changed / what is being verified -->
- Branch: \`$BRANCH\`
- Commit: \`$COMMIT\`
- Date: $DATE_HUMAN
## Environment
- Server: <!-- e.g. http://localhost:3010 -->
- Surfaces: <!-- cli / electron / web / bot:<platform> -->
## Cases
| # | Case | Surface | Steps | Expected | Actual | Status | Evidence |
| - | ---- | ------- | ----- | -------- | ------ | ------ | -------- |
| 1 | | | | | | | |
## Evidence
<!-- Embed screenshots: ![case 1](assets/case1.png) -->
<!-- CLI transcripts in fenced blocks, with the exact command -->
## Verdict
- Passed: 0 / 0
- Failed: 0
- Blocked: 0
- Score (optional): —
- Open issues / follow-ups:
EOF
cat > "$DIR/result.json" << EOF
{
"title": "$TITLE",
"createdAt": "$DATE_ISO",
"branch": "$BRANCH",
"commit": "$COMMIT",
"surfaces": [],
"cases": [],
"summary": { "total": 0, "passed": 0, "failed": 0, "blocked": 0, "verdict": "pending" }
}
EOF
echo "$DIR"
+174
View File
@@ -0,0 +1,174 @@
#!/usr/bin/env bash
# setup-auth.sh — one-stop auth setup & check for local agent testing.
#
# Auth is the gate for all automated testing: prepare it BEFORE writing any
# test step. Background and failure modes: ../references/auth.md
#
# Usage:
# setup-auth.sh status # check server + CLI + web auth readiness
# setup-auth.sh cli # interactive CLI device-code login (run by a human)
# setup-auth.sh web # stdin = Cookie header -> inject into agent-browser session
# setup-auth.sh web-verify # live-check the agent-browser session is authenticated
#
# Env:
# SERVER_URL (default http://localhost:3010) dev server under test
# SESSION (default lobehub-dev) agent-browser session name
# AUTH_DIR (default ~/.lobehub-agent-testing) where web state is persisted
set -euo pipefail
SERVER_URL="${SERVER_URL:-http://localhost:3010}"
SESSION="${SESSION:-lobehub-dev}"
AUTH_DIR="${AUTH_DIR:-$HOME/.lobehub-agent-testing}"
STATE_FILE="$AUTH_DIR/web-state.json"
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
CLI_HOME="$REPO_ROOT/apps/cli/.lobehub-dev"
ok() { printf ' \033[32m✔\033[0m %s\n' "$1"; }
bad() { printf ' \033[31m✘\033[0m %s\n' "$1"; }
note() { printf ' %s\n' "$1"; }
check_server() {
local code
code=$(curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/" 2> /dev/null || true)
if [[ "$code" =~ ^[23] ]]; then
ok "dev server reachable at $SERVER_URL"
else
bad "dev server NOT reachable at $SERVER_URL (http_code='$code')"
note "start it: pnpm run dev:next (see references/dev-server.md)"
return 1
fi
}
check_cli() {
if [[ -f "$CLI_HOME/settings.json" ]] && grep -q "$SERVER_URL" "$CLI_HOME/settings.json"; then
ok "CLI logged in to $SERVER_URL (creds: apps/cli/.lobehub-dev)"
else
bad "CLI not logged in to $SERVER_URL"
note "ask the user to run:"
note "cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server $SERVER_URL"
return 1
fi
}
check_web() {
if [[ -f "$STATE_FILE" ]]; then
ok "web auth state saved ($STATE_FILE)"
note "live-verify: $0 web-verify"
else
bad "no web auth state for agent-browser"
note "copy the Cookie header from Chrome DevTools (Network tab), then:"
note "pbpaste | $0 web (see references/auth.md)"
return 1
fi
}
check_electron() {
local cdp_port="${CDP_PORT:-9222}"
if ! curl -s -o /dev/null --max-time 2 "http://localhost:$cdp_port/json/version" 2> /dev/null; then
note "electron: not running (CDP $cdp_port unreachable) — start with electron-dev.sh; check skipped"
return 0
fi
local probe result
probe="$(dirname "${BASH_SOURCE[0]}")/app-probe.sh"
result=$(bash "$probe" auth 2> /dev/null || true)
# agent-browser eval returns the JSON string with escaped quotes — normalize.
result="${result//\\/}"
if [[ "$result" == *'"isSignedIn":true'* ]]; then
ok "electron app signed in ($result)"
else
bad "electron app NOT signed in ($result)"
note "log in once manually inside the app (state persists across restarts)"
return 1
fi
}
cmd_status() {
echo "agent-testing auth status (SERVER_URL=$SERVER_URL):"
local rc=0
check_server || rc=1
check_cli || rc=1
check_web || rc=1
check_electron || rc=1
if [[ $rc -eq 0 ]]; then
echo "all green — safe to start automated testing."
else
echo "auth NOT ready — fix the ✘ items before writing any test step."
fi
return $rc
}
cmd_cli() {
echo "Starting CLI device-code login against $SERVER_URL ..."
echo "(opens a browser authorization — must be run by a human in a terminal)"
cd "$REPO_ROOT/apps/cli"
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server "$SERVER_URL"
}
# Build a Playwright storageState file from a raw Cookie header on stdin,
# keeping only the better-auth cookies. See references/auth.md for why the
# header must come from a Network request (HttpOnly) and why httpOnly=false.
cmd_web() {
mkdir -p "$AUTH_DIR"
python3 - "$STATE_FILE" << 'PY'
import json, sys, time
raw = sys.stdin.read().strip()
if raw.lower().startswith("cookie:"):
raw = raw.split(":", 1)[1].strip()
WANTED = {"better-auth.session_token", "better-auth.state"}
exp = int(time.time()) + 30 * 24 * 3600 # 30 days
cookies = []
for pair in raw.split("; "):
if "=" not in pair:
continue
name, _, value = pair.partition("=")
if name not in WANTED:
continue
cookies.append({
"name": name,
"value": value,
"domain": "localhost",
"path": "/",
"expires": exp,
"httpOnly": False,
"secure": False,
"sameSite": "Lax",
})
if not cookies:
sys.stderr.write("no better-auth cookies found in input — paste the raw Cookie header from a Network request\n")
sys.exit(1)
with open(sys.argv[1], "w") as f:
json.dump({"cookies": cookies, "origins": []}, f, indent=2)
print(f"wrote {len(cookies)} cookie(s) to {sys.argv[1]}")
PY
agent-browser --session "$SESSION" state load "$STATE_FILE"
cmd_web_verify
}
cmd_web_verify() {
agent-browser --session "$SESSION" open "$SERVER_URL/" > /dev/null
local url
url=$(agent-browser --session "$SESSION" get url)
if [[ "$url" == *"/signin"* || "$url" == *"/login"* ]]; then
bad "agent-browser session '$SESSION' NOT authenticated (landed on $url)"
note "re-copy the Cookie header and re-run: pbpaste | $0 web"
return 1
fi
ok "agent-browser session '$SESSION' authenticated (at $url)"
}
case "${1:-status}" in
status) cmd_status ;;
cli) cmd_cli ;;
web) cmd_web ;;
web-verify) cmd_web_verify ;;
*)
echo "Usage: $0 {status|cli|web|web-verify}" >&2
exit 2
;;
esac
+154
View File
@@ -0,0 +1,154 @@
# Electron (LobeHub Desktop) UI Testing
Default surface for verifying **pure frontend changes** (components, store logic, styles, interactions) in the primary product shape. Drives the Electron renderer over CDP with `agent-browser` — see [../references/agent-browser.md](../references/agent-browser.md) for the full command reference.
**Auth**: the Electron app keeps its own persistent login state — log in once manually in the app; sessions survive restarts. Run `../scripts/setup-auth.sh status` before testing (see [../references/auth.md](../references/auth.md)).
**Linux / headless (cloud)**: Electron itself runs on Linux, but it has no true headless mode — it needs a display server. In a headless environment wrap the launch with `xvfb-run` (virtual framebuffer). Everything CDP-based keeps working under Xvfb: the `agent-browser --cdp 9222` connection, snapshots, eval, and `agent-browser screenshot` (captured from the renderer via CDP, not the OS screen). What does NOT work on Linux: `capture-app-window.sh` (macOS `screencapture`), osascript, and the ffmpeg recording scripts in their current form.
### Setup / Teardown
Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
```bash
SCRIPT=".agents/skills/agent-testing/scripts/electron-dev.sh"
# Start Electron dev with CDP (idempotent — skips if already running)
$SCRIPT start
# Check if Electron is running and CDP is reachable
$SCRIPT status
# Kill all Electron-related processes (main + helper + vite)
$SCRIPT stop
# Force fresh restart
$SCRIPT restart
```
After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
**Always run `$SCRIPT stop` when done testing**`pkill -f "Electron"` alone won't catch all helper processes.
#### Environment Variables
| Variable | Default | Description |
| ----------------- | ----------------------- | ---------------------------------------- |
| `CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `ELECTRON_LOG` | `/tmp/electron-dev.log` | Electron process log |
| `ELECTRON_WAIT_S` | `60` | Max seconds to wait for Electron process |
| `RENDERER_WAIT_S` | `60` | Max seconds to wait for SPA to load |
### LobeHub Probes & Quick Navigation
`scripts/app-probe.sh` is the standard fast path into app state — **use it
instead of hand-rolling `__LOBE_STORES` eval snippets** for these common needs:
```bash
PROBE=".agents/skills/agent-testing/scripts/app-probe.sh"
$PROBE auth # login check (Step 0.3) → { isSignedIn, userId }
$PROBE route # current SPA route
$PROBE ops # running chat operations (type / startTime)
$PROBE goto /settings # jump the SPA straight to a route (full reload)
$PROBE errors-install # install console.error interceptor
$PROBE errors # dump captured errors
```
`goto` lets a test enter the state under test directly instead of clicking
through the UI. Common desktop routes:
| Route | Where it lands |
| ----------------------------- | ------------------------------------ |
| `/` | Home (has a chat input) |
| `/agent/<agentId>` | Agent conversation (latest topic) |
| `/agent/<agentId>/<topicId>` | Specific topic in a conversation |
| `/task` · `/task/<taskId>` | Task list / task detail |
| `/page` | Documents (文稿) |
| `/settings` | Settings |
| `/community` | Discover / community |
Targets default to Electron (`--cdp 9222`); set `AB_TARGET="--session <name>"`
for web sessions. For deeper or one-off state inspection, fall back to raw
eval below.
### LobeHub-Specific Patterns
#### Access Zustand Store State
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
return JSON.stringify({
ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
activeAgent: chat.activeAgentId,
activeTopic: chat.activeTopicId,
});
})()
EVALEOF
```
#### Find and Use the Chat Input
```bash
# The chat input is contenteditable — must use -C flag
agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
agent-browser --cdp 9222 click @e48
agent-browser --cdp 9222 type @e48 "Hello world"
agent-browser --cdp 9222 press Enter
```
#### Wait for Agent to Complete
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
var running = ops.filter(function(o) { return o.status === 'running'; });
return running.length === 0 ? 'done' : 'running: ' + running.length;
})()
EVALEOF
```
#### Install Error Interceptor
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function() {
var msg = Array.from(arguments).map(function(a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
# Later, check captured errors:
agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
```
## Electron Gotchas
- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
- **Dev build auto-opens DevTools, which hijacks the CDP target** — `agent-browser --cdp 9222` may attach to the DevTools page (`devtools://…`) instead of the app (`app://renderer/`). Symptom: `get url` returns a `devtools://` URL. Fix: close the DevTools target and reconnect:
```bash
DT_ID=$(curl -s http://localhost:9222/json/list | python3 -c "import json,sys; ts=json.load(sys.stdin); print(next(t['id'] for t in ts if t['type']=='page' and t['url'].startswith('devtools://')))")
curl -s "http://localhost:9222/json/close/$DT_ID" > /dev/null
agent-browser close --all && agent-browser --cdp 9222 get url # expect app://renderer/
```
- **Don't resize the Electron window after load** — resizing triggers full SPA reload
- **Store is at `window.__LOBE_STORES`** not `window.__ZUSTAND_STORES__`
- **Streaming / ticking UI needs GIF evidence** — see `scripts/record-gif.sh`; a static screenshot cannot prove time-based behavior.
+69
View File
@@ -0,0 +1,69 @@
# Web (Full-Stack) Testing
Default surface for **full-stack changes** — a new/changed API plus the UI that
consumes it. The browser is the one surface where network requests and UI state
are observable together, so you can assert both sides of the contract in a
single run.
For pure-frontend changes prefer [electron.md](./electron.md); for
backend-only changes prefer [../cli/index.md](../cli/index.md).
## Prerequisites
- Local dev server running — [../references/dev-server.md](../references/dev-server.md)
- Web auth injected into agent-browser — [../references/auth.md](../references/auth.md):
```bash
pbpaste | ./.agents/skills/agent-testing/scripts/setup-auth.sh web # after copying the Cookie header
```
## Option A — agent-browser with injected auth (recommended)
```bash
SESSION=lobehub-dev
agent-browser --session $SESSION open "http://localhost:3010/"
agent-browser --session $SESSION snapshot -i
# interact via refs — full command reference: ../references/agent-browser.md
```
### Watch the API while driving the UI
```bash
# After triggering the UI action under test:
agent-browser --session $SESSION network requests --type xhr,fetch
agent-browser --session $SESSION network requests --method POST
# Record a full HAR for the report
agent-browser --session $SESSION network har start
# ... drive the scenario ...
agent-browser --session $SESSION network har stop ./capture.har
```
Assert both layers: the request/response shape (network) and the rendered
result (snapshot/screenshot). Both belong in the report as evidence.
## Option B — real Chrome with remote debugging
For flows that need a real, visible browser (e.g. exercising the login UI
itself):
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-test-profile \
"<URL>" &
sleep 5
agent-browser --cdp 9222 snapshot -i
# Or auto-discover running Chrome with remote debugging
agent-browser --auto-connect snapshot -i
```
## Option C — Debug Proxy (local frontend, production backend)
`bun run dev:spa` prints a **Debug Proxy** URL
(`https://app.lobehub.com/_dangerous_local_dev_proxy?debug-host=…`) that loads
your local Vite SPA inside the online environment — HMR against real server
config. Useful for verifying frontend behavior against production data, **not**
for testing backend changes (the backend is production, not your branch).
-172
View File
@@ -1,172 +0,0 @@
---
name: cli-backend-testing
description: >
CLI + Backend integration testing workflow. Use when verifying backend API changes
(TRPC routers, services, models) via the LobeHub CLI against a local dev server.
Triggers on 'cli test', 'test with cli', 'verify with cli', 'local cli test',
'backend test with cli', or when needing to validate server-side changes end-to-end.
---
# CLI + Backend Integration Testing
Standard workflow for verifying backend changes using the LobeHub CLI (`lh`) against a local dev server.
## When to Use
- Verifying TRPC router / service / model changes end-to-end
- Testing new API fields or response structure changes
- Validating CLI command output after backend modifications
- Debugging data flow issues between server and CLI
## Prerequisites
| Requirement | Details |
| ------------ | ------------------------------------------------------------- |
| Dev server | `localhost:3011` (Next.js) |
| CLI source | `lobehub/apps/cli/` |
| CLI dev mode | Uses `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated credentials |
| Auth | Device Code Flow login to local server |
## Quick Reference
All CLI dev commands run from `lobehub/apps/cli/`. Subsequent examples use `$CLI`:
```bash
CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
```
## Workflow
### Step 1: Ensure Dev Server is Running
```bash
curl -s -o /dev/null -w '%{http_code}' http://localhost:3011/ 2> /dev/null
```
- **If reachable**: skip to Step 2.
- **If unreachable**: start from cloud repo root:
```bash
pnpm run dev:next
```
To **restart** (pick up server-side code changes):
```bash
lsof -ti:3011 | xargs kill
pnpm run dev:next
```
**Important:** Server-side code changes in the submodule (`lobehub/apps/server/src/`, `lobehub/src/server/`, `lobehub/packages/`) require a server restart. Next.js hot-reload may not pick up changes in submodule packages.
### Step 2: Check CLI Authentication
```bash
cat lobehub/apps/cli/.lobehub-dev/settings.json 2> /dev/null
```
- **If file exists and contains `"serverUrl": "http://localhost:3011"`**: skip to Step 3.
- **If missing or wrong server**: ask the user to run:
```bash
! cd lobehub/apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3011
```
> Login requires interactive browser authorization (OIDC Device Code Flow), so the user must run it themselves via `!` prefix. Credentials persist in `lobehub/apps/cli/.lobehub-dev/`.
### Step 3: Test with CLI Commands
CLI runs from source, so CLI-side code changes take effect immediately without rebuilding.
```bash
cd lobehub/apps/cli
$CLI <command>
```
### Step 4: Clean Up Test Data
```bash
$CLI task delete < id > -y
$CLI agent delete < id > -y
```
## Common Testing Patterns
### Task System
```bash
$CLI task list
$CLI task create -n "Root Task" -i "Test instruction"
$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
$CLI task view T-1
$CLI task tree T-1
$CLI task edit T-1 --status running
$CLI task comment T-1 -m "Test comment"
$CLI task delete T-1 -y
```
### Agent System
```bash
$CLI agent list
$CLI agent view <agent-id>
$CLI agent run <agent-id> -m "Test prompt"
```
### Document & Knowledge Base
```bash
$CLI doc list
$CLI doc create -t "Test Doc" -c "Content here"
$CLI doc view <doc-id>
$CLI kb list
$CLI kb tree <kb-id>
```
### Model & Provider
```bash
$CLI model list
$CLI provider list
$CLI provider test <provider-id>
```
## Dev-Test Cycle
```
1. Make code changes (service/model/router/type)
|
2. Run unit tests (fast feedback)
bunx vitest run --silent='passed-only' '<test-file>'
|
3. Restart dev server (if server-side changes)
lsof -ti:3011 | xargs kill && pnpm run dev:next
|
4. CLI verification (end-to-end)
$CLI <command>
|
5. Clean up test data
```
### When Server Restart is Needed
| Change Location | Restart? |
| ------------------------------------------------------- | -------- |
| `lobehub/apps/server/src/` (routers, services, modules) | Yes |
| `lobehub/src/server/` (agent-hono, workflows-hono) | Yes |
| `lobehub/packages/database/` (models) | Yes |
| `lobehub/packages/types/` | Yes |
| `lobehub/packages/prompts/` | Yes |
| `lobehub/apps/cli/` (CLI code) | No |
| `src/` (cloud overrides) | Yes |
## Troubleshooting
| Issue | Solution |
| --------------------------- | --------------------------------------------------------------------- |
| `No authentication found` | Run `login --server http://localhost:3011` |
| `UNAUTHORIZED` on API calls | Token expired; re-run login |
| `ECONNREFUSED` | Dev server not running; start with `pnpm run dev:next` |
| CLI shows old data/behavior | Server needs restart to pick up code changes |
| `EADDRINUSE` on port 3011 | Server already running; kill with `lsof -ti:3011 \| xargs kill` |
| Login opens wrong server | Must use `--server http://localhost:3011` flag (env var doesn't work) |
@@ -241,6 +241,6 @@ When the bug comes from a real trace, distill it into the closest existing test
3. Add or update the narrowest failing test near the broken layer.
4. Fix the smallest layer that can explain the symptom.
5. Re-run focused tests.
6. Only then do an Electron smoke test with the `local-testing` skill if UI confirmation is still needed.
6. Only then do an Electron smoke test with the `agent-testing` skill if UI confirmation is still needed.
Do not start with a broad Electron repro if a raw trace or adapter test can prove the fault zone faster.
-561
View File
@@ -1,561 +0,0 @@
---
name: local-testing
description: >
Local app and bot testing. Uses agent-browser CLI for Electron/web app UI testing,
and osascript (AppleScript) for controlling native macOS apps (WeChat, Discord, Telegram, Slack, Lark/飞书, QQ)
to test bots. Triggers on 'local test', 'test in electron', 'test desktop', 'test bot',
'bot test', 'test in discord', 'test in telegram', 'test in slack', 'test in weixin',
'test in wechat', 'test in lark', 'test in feishu', 'test in qq',
'manual test', 'osascript', or UI/bot verification tasks.
---
# Local App & Bot Testing
Two approaches for local testing on macOS:
| Approach | Tool | Best For |
| --------------------------- | ------------------- | ---------------------------------------------------- |
| **agent-browser + CDP** | `agent-browser` CLI | Electron apps, web apps (DOM access, JS eval) |
| **osascript (AppleScript)** | `osascript -e` | Native macOS apps (WeChat, Discord, Telegram, Slack) |
---
# Part 1: agent-browser (Electron / Web Apps)
Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
## Core Workflow
Every browser automation follows this pattern:
1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
```
## Command Chaining
```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```
Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
## Essential Commands
```bash
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
agent-browser close --all # Close all active sessions
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network requests --type xhr,fetch # Filter by resource type
agent-browser network requests --method POST # Filter by HTTP method
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "text" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Dialogs (alert, confirm, prompt, beforeunload)
agent-browser dialog accept # Accept dialog
agent-browser dialog accept "input" # Accept prompt dialog with text
agent-browser dialog dismiss # Dismiss/cancel dialog
agent-browser dialog status # Check if dialog is open
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
# Streaming
agent-browser stream enable # Start WebSocket streaming
agent-browser stream status # Inspect streaming state
agent-browser stream disable # Stop streaming
```
## Batch Execution
```bash
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
```
## Authentication
```bash
# Option 1: Auth vault (credentials stored encrypted)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
# Option 2: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close # State auto-saved
agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
# Option 3: Persistent profile
agent-browser --profile ~/.myapp open https://app.example.com/login
# Option 4: State file
agent-browser state save auth.json
agent-browser state load auth.json
```
### LobeHub dev server — inject better-auth cookie
`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3011`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [references/agent-browser-login.md](./references/agent-browser-login.md) for the full recipe.
## Semantic Locators (Alternative to Refs)
```bash
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
```
## JavaScript Evaluation (eval)
```bash
# Simple expressions
agent-browser eval 'document.title'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin << 'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(echo -n 'document.title' | base64)"
```
## Ref Lifecycle
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
## Annotated Screenshots (Vision Mode)
```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
agent-browser click @e2 # Click using ref from annotated screenshot
```
## Parallel Sessions
```bash
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser session list
```
## Connect to Existing Chrome
```bash
agent-browser --auto-connect snapshot # Auto-discover running Chrome
agent-browser --cdp 9222 snapshot # Explicit CDP port
```
## iOS Simulator (Mobile Safari)
```bash
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
```
## Observability Dashboard
```bash
agent-browser dashboard install
agent-browser dashboard start # Background server on port 4848
agent-browser dashboard stop
```
## Cloud Providers
Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
## Browser Engine Selection
```bash
agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
```
## Electron (LobeHub Desktop)
### Setup / Teardown
Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
```bash
SCRIPT=".agents/skills/local-testing/scripts/electron-dev.sh"
# Start Electron dev with CDP (idempotent — skips if already running)
$SCRIPT start
# Check if Electron is running and CDP is reachable
$SCRIPT status
# Kill all Electron-related processes (main + helper + vite)
$SCRIPT stop
# Force fresh restart
$SCRIPT restart
```
After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
**Always run `$SCRIPT stop` when done testing**`pkill -f "Electron"` alone won't catch all helper processes.
#### Environment Variables
| Variable | Default | Description |
| ----------------- | ----------------------- | ---------------------------------------- |
| `CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `ELECTRON_LOG` | `/tmp/electron-dev.log` | Electron process log |
| `ELECTRON_WAIT_S` | `60` | Max seconds to wait for Electron process |
| `RENDERER_WAIT_S` | `60` | Max seconds to wait for SPA to load |
### LobeHub-Specific Patterns
#### Access Zustand Store State
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
return JSON.stringify({
ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
activeAgent: chat.activeAgentId,
activeTopic: chat.activeTopicId,
});
})()
EVALEOF
```
#### Find and Use the Chat Input
```bash
# The chat input is contenteditable — must use -C flag
agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
agent-browser --cdp 9222 click @e48
agent-browser --cdp 9222 type @e48 "Hello world"
agent-browser --cdp 9222 press Enter
```
#### Wait for Agent to Complete
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
var running = ops.filter(function(o) { return o.status === 'running'; });
return running.length === 0 ? 'done' : 'running: ' + running.length;
})()
EVALEOF
```
#### Install Error Interceptor
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function() {
var msg = Array.from(arguments).map(function(a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
# Later, check captured errors:
agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
```
## Chrome / Web Apps
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-test-profile \
"<URL>" &
sleep 5
agent-browser --cdp 9222 snapshot -i
# Or auto-discover running Chrome with remote debugging
agent-browser --auto-connect snapshot -i
```
---
# Part 2: osascript (Native macOS App Bot Testing)
Use AppleScript via `osascript` to control native macOS desktop apps for bot testing. Works with any app that supports macOS Accessibility, no CDP or Chromium needed.
The pattern is the same for every platform:
1. **Activate** the app (`tell application "X" to activate`)
2. **Navigate** to a channel/chat (Quick Switcher `Cmd+K` or Search `Cmd+F`)
3. **Send** a message (clipboard paste `Cmd+V` + Enter)
4. **Wait** for the bot response
5. **Screenshot** for verification (`screencapture` + `Read` tool)
## Per-Platform References
Pick the file for your target platform — each contains activation, navigation, send-message, and verification snippets specific to that app:
Each channel has its own folder under `bot/<channel>/` containing an `index.md`
(activation, navigation, send-message, and verification snippets specific to
that app) and its test script:
| Platform | Reference | Quick switcher |
| ------------- | ------------------------------------------------ | -------------- |
| Discord | [bot/discord/index.md](./bot/discord/index.md) | `Cmd+K` |
| Slack | [bot/slack/index.md](./bot/slack/index.md) | `Cmd+K` |
| Telegram | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F` |
| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md) | `Cmd+F` |
| Lark / 飞书 | [bot/lark/index.md](./bot/lark/index.md) | `Cmd+K` |
| QQ | [bot/qq/index.md](./bot/qq/index.md) | `Cmd+F` |
For **shared osascript patterns** (activate, type, paste, screenshot, read accessibility, common workflow template, gotchas), see [bot/osascript-common.md](./bot/osascript-common.md). Read this first if you're new to osascript automation.
## Bridge-based channels (no native app)
Some channels have no native app to drive with osascript — they connect through
a local bridge inside the Desktop app. These are tested with agent-browser
(IPC + UI) plus the bridge's own HTTP/REST endpoints, not osascript:
| Channel | Reference | What it drives |
| -------- | ------------------------------------------------ | -------------------------------------------------------- |
| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | `imessageBridge.*` IPC + local bridge + BlueBubbles REST |
For iMessage there is a one-shot regression script — see `test-imessage-bridge.sh` below.
---
# Scripts
**App / recording scripts** in `.agents/skills/local-testing/scripts/`:
| Script | Usage |
| ------------------------- | --------------------------------------------------- |
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart) |
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
| `record-app-screen.sh` | Record app screen (video + screenshots, start/stop) |
**Bot scripts** live under `.agents/skills/local-testing/bot/`, one folder per
channel (alongside that channel's `index.md`). The shared
`capture-app-window.sh` sits at the `bot/` root:
| Script | Usage |
| ---------------------------------- | ------------------------------------------------------------------- |
| `capture-app-window.sh` | Capture screenshot of a specific app window (used by bot tests) |
| `discord/test-discord-bot.sh` | Send message to Discord bot via osascript |
| `slack/test-slack-bot.sh` | Send message to Slack bot via osascript |
| `telegram/test-telegram-bot.sh` | Send message to Telegram bot via osascript |
| `wechat/test-wechat-bot.sh` | Send message to WeChat bot via osascript |
| `lark/test-lark-bot.sh` | Send message to Lark / 飞书 bot via osascript |
| `qq/test-qq-bot.sh` | Send message to QQ bot via osascript |
| `imessage/test-imessage-bridge.sh` | Regression-test the iMessage BlueBubbles bridge (IPC + HTTP) |
| `imessage/send-imessage-test.sh` | Send one real iMessage (desktop → BB → iMessage) and verify it sent |
### Window Screenshot Utility
`capture-app-window.sh` captures a screenshot of a specific app window using `screencapture -l <windowID>`. It uses Swift + CGWindowList to find the window by process name, so screenshots work correctly even when the window is on an external monitor or behind other windows.
```bash
# Standalone usage
./.agents/skills/local-testing/bot/capture-app-window.sh "Discord" /tmp/discord.png
./.agents/skills/local-testing/bot/capture-app-window.sh "Slack" /tmp/slack.png
./.agents/skills/local-testing/bot/capture-app-window.sh "WeChat" /tmp/wechat.png
```
All bot test scripts use this utility automatically for their screenshots.
### Bot Test Scripts
All bot test scripts share the same interface:
```bash
./scripts/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
```
Examples:
```bash
# Discord — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
# Slack — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
# Telegram — test a bot by username
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
# WeChat — test a bot or send to a contact
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
# Lark/飞书 — test a bot in a group chat
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
# QQ — test a bot in a group or direct chat
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
```
Each script: activates the app, navigates to the channel/contact, pastes the message via clipboard, sends, waits, and takes a screenshot. Use the `Read` tool on the screenshot for visual verification.
### iMessage bridge regression script
`test-imessage-bridge.sh` does **not** follow the osascript bot interface — it
drives the Desktop bridge's IPC + HTTP layers and asserts the result, then
self-cleans. Needs BlueBubbles running and Electron up with CDP.
```bash
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
# defaults: bb_url=http://127.0.0.1:1234 cdp_port=9222 — exit 0 = all green
```
It guards the connect/configure flow (testConfig happy + reject paths, first-time
`upsertConfig` save, bridge running + webhook registered, local-server secret
enforcement). See [bot/imessage/index.md](./bot/imessage/index.md)
for the full manual UI flow and known bugs.
---
# Screen Recording
Record automated demos using `record-app-screen.sh` (start/stop lifecycle, CDP screenshots + ffmpeg assembly). See [references/record-app-screen.md](references/record-app-screen.md) for full documentation.
```bash
./.agents/skills/local-testing/scripts/electron-dev.sh start
./.agents/skills/local-testing/scripts/record-app-screen.sh start my-demo
# ... run automation ...
./.agents/skills/local-testing/scripts/record-app-screen.sh stop
```
Outputs to `.records/` directory (gitignored): `<name>.mp4` (video) + `<name>/` (screenshots every 3s).
---
# Gotchas
### agent-browser
- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
### Electron-specific
- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
- **Don't resize the Electron window after load** — resizing triggers full SPA reload
- **Store is at `window.__LOBE_STORES`** not `window.__ZUSTAND_STORES__`
### osascript
See [bot/osascript-common.md](./bot/osascript-common.md#gotchas) for the full osascript gotchas list (accessibility permissions, `keystroke` non-ASCII issues, locale-specific app names, rate limiting, etc.).
@@ -1,110 +0,0 @@
# Log `agent-browser` into a local LobeHub dev server
`agent-browser --headed` on macOS often creates the Chromium window off-screen — the user can't see or interact with it, so manual login inside the agent-browser session fails. Instead of sharing the user's real Chrome profile, copy the **better-auth session cookie** out of a request in DevTools and inject it into the agent-browser session as a Playwright-style state file.
## When to use
- You need `agent-browser` to reach an authenticated page on `http://localhost:<port>` (e.g. `localhost:3011`).
- The user already has a logged-in tab of the same dev server in their own Chrome.
- Spawning a headed Chromium to let the user log in manually is unreliable (window off-screen, no interaction).
Do **not** use this on production URLs — only local dev. Treat the cookie as a secret: don't paste it into shared logs, PRs, or commit it anywhere.
## Step 1 — Ask the user to copy the cookie from a Network request, NOT `document.cookie`
`document.cookie` will not return HttpOnly cookies, which is exactly where better-auth puts its session. Instruct the user:
1. Open the logged-in tab (`http://localhost:<port>/…`) in their own Chrome.
2. `Cmd+Option+I`**Network** tab.
3. Refresh, click any same-origin request (e.g. the top-level document request).
4. In the right pane under **Request Headers**, right-click the `Cookie:` line → **Copy value** (or copy the entire header).
5. Paste the string into chat.
You only need the better-auth pieces. Everything else (Clerk, `LOBE_LOCALE`, HMR hash, theme vars) is noise and can stay. The minimum viable set is:
```
better-auth.session_token=<value>; better-auth.state=<value>
```
## Step 2 — Build a Playwright-style state file
`agent-browser state load` expects Playwright's `storageState` format: a JSON with a `cookies` array and an `origins` array.
```bash
cat > /tmp/mkstate.py << 'PY'
import json, sys, time
# Read the Cookie header from stdin (allows optional "Cookie: " prefix).
raw = sys.stdin.read().strip()
if raw.lower().startswith("cookie:"):
raw = raw.split(":", 1)[1].strip()
# Keep only better-auth cookies. Extend this set if the app genuinely needs more.
WANTED = {"better-auth.session_token", "better-auth.state"}
cookies = []
exp = int(time.time()) + 30 * 24 * 3600 # 30 days
for pair in raw.split("; "):
if "=" not in pair:
continue
name, _, value = pair.partition("=")
if name not in WANTED:
continue
cookies.append({
"name": name,
"value": value,
"domain": "localhost",
"path": "/",
"expires": exp,
"httpOnly": False,
"secure": False,
"sameSite": "Lax",
})
if not cookies:
sys.stderr.write("no better-auth cookies found in input\n")
sys.exit(1)
print(json.dumps({"cookies": cookies, "origins": []}, indent=2))
PY
# Feed the copied Cookie header in via env var or heredoc.
printf '%s' "$COOKIE_HEADER" | python3 /tmp/mkstate.py > /tmp/state.json
```
**Note on `httpOnly`**: the real cookie in the user's browser is HttpOnly, but `storageState` doesn't enforce the flag on load — it just attaches the value. Storing with `httpOnly: false` is fine for local dev and sidesteps a CDP-context quirk where HttpOnly cookies sometimes fail to attach.
## Step 3 — Load state and navigate
```bash
SESSION="my-test" # any stable session name
agent-browser --session "$SESSION" state load /tmp/state.json
agent-browser --session "$SESSION" open "http://localhost:3011/"
agent-browser --session "$SESSION" get url
# Expect NOT /signin?callbackUrl=… — if you still see signin, cookie didn't apply.
```
## Step 4 — Verify
```bash
agent-browser --session "$SESSION" snapshot -i | head -20
# Look for the user's avatar/name in the sidebar, or absence of the signin form.
```
## Common failure modes
| Symptom | Cause | Fix |
| ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------- |
| Still redirects to `/signin` after `state load` | User pasted from `document.cookie` → missed HttpOnly session | Re-pull from Network request Headers, not console |
| `state load` reports 0 cookies | Separator wrong, or user pasted URL-decoded value | Keep the raw `Cookie:` header as-is; split on `"; "` |
| Login works briefly then expires | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-load |
| Domain mismatch | Use `domain: "localhost"` literally, no leading dot for local dev | — |
## Scope
Only covers authenticating an **agent-browser** session into a **local** LobeHub dev server. It does not:
- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com` and must be delivered over HTTPS.
- Replace real OAuth flows — tests that must exercise the login UI need a real Chromium with `--remote-debugging-port` or a bot account.
- Flow cookies back to the user's Chrome — injection is one-way (into agent-browser only).
@@ -0,0 +1,69 @@
---
name: model-bank-metadata
description: 'Backfill and maintain model-bank metadata (knowledgeCutoff, family, generation). Use when adding models, fixing cutoff/family data, running a metadata sweep across aiModels providers, or researching official knowledge cutoffs.'
user-invocable: false
---
# Model-Bank Metadata (knowledgeCutoff / family / generation)
How to populate and maintain the three structured metadata fields on `packages/model-bank/src/aiModels/*.ts` model cards, at single-model scale (new model PR) or repo-wide scale (sweep across \~80 provider files / \~1900 entries).
## Field semantics
| Field | Format | Meaning |
| ----------------- | ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `knowledgeCutoff` | `'YYYY-MM'` (or `'YYYY'` if only the year is published) | World-knowledge cutoff. When a vendor distinguishes a **"reliable knowledge cutoff"** from the broader training-data cutoff (Anthropic does), always use the **reliable** one. |
| `family` | lowercase slug (`claude`, `gpt`, `o-series`, `qwen`, `deepseek`, `llama`, `glm`, …) | Model lineage, finer than `organization`. Lets the UI group models and match the same model across aggregator providers. |
| `generation` | family slug + version (`claude-4.6`, `gpt-5.2`, `qwen3.5`, `llama-3.1`) | Generation within the family. Only set when confidently derivable from the model line's naming. Rolling aliases (`qwen-max`, `deepseek-chat`, `gemini-flash-latest`) get `family` only. |
All three are optional. **The cardinal rule: only fill what an authoritative source states or naming rules derive — never guess.** An empty field is correct for vendors that publish nothing.
No DB migration is ever needed for these: builtin models are merged from model-bank at read time (`repositories/aiInfra/index.ts` spreads the whole card), so new card fields flow to the client automatically.
## Sourcing rules for knowledgeCutoff
Accept only:
- Vendor official docs (platform.openai.com / developers.openai.com, docs.x.ai, ai.google.dev, docs.anthropic.com / platform.claude.com)
- Official Hugging Face org model cards (huggingface.co/meta-llama/..., etc.)
- Official tech reports / system cards / launch blog posts
Reject:
- **Third-party aggregator sites** (aiknowledgecutoff.com and similar) — proven to copy one model's value across a whole family. A Cohere sweep once claimed `2024-06` for four distinct base models; none of the cited Cohere pages said that, and the only cutoff Cohere actually publishes is Feb 2023 for the 08-2024 Command R/R+ refresh.
- **AWS Bedrock model cards as sole source** — proven to conflate launch date with knowledge cutoff (DeepSeek R1's card lists both as "Jan 2025"). If Bedrock is the only place a value appears, leave the field empty.
- Inference from `releasedAt` — a release date is not a cutoff.
Variant inheritance: dated snapshots (`-2024-08-06`), speed/price tiers of the same checkpoint, quantizations (`-fp8`, `-awq`), context-length variants (`-32k`), ollama `:NNb` tags, and cloud-prefixed ids (`anthropic.`/`us.`/`global.` Bedrock ids) share their base model's cutoff. **Distills do not inherit** from teacher or base — use the distill's own published value or leave empty. **Sizes within one generation can genuinely differ**: Llama 3 8B is Mar 2023 while 70B is Dec 2023 (per Meta's own card) — don't "fix" that to one family-wide value.
Vendors that publish no cutoffs (leave empty, don't chase): Qwen, DeepSeek, GLM/Zhipu, ERNIE, Doubao, Hunyuan, SenseNova, Spark, MiniMax, StepFun, Yi (mostly), Moonshot.
Known per-vendor footguns:
- **Anthropic**: Opus 4.6 reliable cutoff is `2025-05`, Sonnet 4.6 is `2025-08` — easy to swap. Claude 3.7 is `2024-10` (system card: trained through Nov 2024, knowledge cutoff end of Oct 2024). Cite system cards / the models overview, not the Help Center article (a living page that drops retired models — citation rot).
- **xAI**: docs.x.ai has one blanket sentence covering grok-3/grok-4; mini variants are not named there. Grok 4.20/4.3 have no official cutoff anywhere.
- **OpenAI**: per-model docs pages (developers.openai.com/api/docs/models/<id>) state cutoffs explicitly, including snapshot differences (gpt-4-1106-preview `2023-04` vs gpt-4-0125-preview `2023-12`).
## family/generation derivation
Rule-based, no research needed: `scripts/derive-family.ts` holds the per-family regex rules. Traps already encoded there — keep them when extending:
- Date suffixes are not versions: `claude-sonnet-4-20250514` is generation `claude-4`, not `claude-4.2`.
- Size suffixes are not versions: `llama-3-8b``llama-3` (not `llama-3.8`); `gemma-7b-it` is **gemma-1** (not gemma-7).
- Vendor spelling variants: `qwen2p5` = qwen2.5, `llama-v3p1` = llama-3.1, ollama `:NNb` tags, Bedrock `us.`/`global.`/`anthropic.` prefixes.
- `claude-X.0` normalizes to `claude-X`.
- Fable/Mythos-class ids (`claude-fable-5`) don't match the opus/sonnet/haiku regex — they are the Mythos class — `family: 'claude-mythos'`, `generation: 'mythos-5'` (set manually; the launch page calls Fable 5 "the generally available Mythos-class model").
## Repo-wide sweep workflow
1. **Extract ids**: `bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts` → unique normalized chat-model ids (normalization = last path segment, lowercased). Non-chat types (image/video/embedding/tts) have no knowledge cutoff — skip them.
2. **Research (multi-agent)**: chunk ids by family (≤50 per chunk) and fan out one research agent per chunk (Workflow tool), each returning `{id, cutoff, source}` with the sourcing rules above baked into the prompt, **plus** one adversarial verify agent per chunk that re-fetches cited sources and refutes unsupported claims. The verify pass is load-bearing: it caught the Cohere aggregator copy-paste and the AWS launch-date conflation.
3. **Policy filter**: before applying, drop entries whose only source is a rejected category (check the returned `sources` map — e.g. drop everything sourced to aws.amazon.com).
4. **Apply**: `bun scripts/apply-cutoffs.ts <map.json>` and `bun scripts/apply-family.ts <map.json>` (run from repo root). Both are idempotent codemods keyed on normalized id — aggregator providers get the same values automatically; entries that already have the field are skipped. They rely on the uniform prettier formatting of the data files (entries start ` {` / end ` },`, fields at 4-space indent).
5. **Verify**: `cd packages/model-bank && bunx vitest run src/aiModels/__tests__/index.test.ts && bunx tsc --noEmit`.
## Maintenance rules
- **New model PRs** should fill all three fields inline, citing the official source in the PR body (see the Anthropic entries in `anthropic.ts` for reference values).
- **After resolving merge conflicts** in model-bank data files, sanity-check that metadata didn't vanish: `git grep -c knowledgeCutoff -- 'packages/model-bank/src/aiModels/*.ts'` before vs after. A three-way stack of model PRs once silently dropped all 10 Anthropic cutoffs during conflict resolution.
- Dirty ids exist in aggregator data (a sambanova id once carried a trailing tab). The codemods match ids verbatim — if a map key won't apply, check for invisible characters before assuming the model is missing.
@@ -0,0 +1,73 @@
/**
* One-off codemod: apply a canonical { normalizedModelId: 'YYYY-MM' } map onto
* packages/model-bank/src/aiModels/*.ts, inserting `knowledgeCutoff` after the
* `id:` line of every chat-model entry that matches and doesn't already have one.
*
* Relies on the uniform prettier formatting of these files:
* - each model entry starts with ` {` and ends with ` },` (2-space indent)
* - fields are at 4-space indent: ` id: '...'`, ` type: 'chat'`
*
* Usage: bun /tmp/apply-cutoffs.ts /tmp/cutoff-map.json
*/
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
const mapPath = process.argv[2];
if (!mapPath) throw new Error('usage: bun apply-cutoffs.ts <map.json>');
const map: Record<string, string> = JSON.parse(readFileSync(mapPath, 'utf8'));
const dir = 'packages/model-bank/src/aiModels';
const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
let touchedFiles = 0;
let inserted = 0;
const matchedIds = new Set<string>();
for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const path = join(dir, file);
const lines = readFileSync(path, 'utf8').split('\n');
const out: string[] = [];
let changed = false;
let i = 0;
while (i < lines.length) {
if (lines[i] !== ' {') {
out.push(lines[i]);
i++;
continue;
}
// collect one model entry block
const start = i;
let end = i;
while (end < lines.length && lines[end] !== ' },') end++;
const block = lines.slice(start, end + 1);
const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
const hasCutoff = block.some((l) => /^ {4}knowledgeCutoff:/.test(l));
if (idLineIdx >= 0 && isChat && !hasCutoff) {
const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
const norm = rawId ? normalize(rawId) : undefined;
const cutoff = norm ? map[norm] : undefined;
if (cutoff && /^\d{4}(?:-\d{2})?$/.test(cutoff)) {
block.splice(idLineIdx + 1, 0, ` knowledgeCutoff: '${cutoff}',`);
inserted++;
changed = true;
matchedIds.add(norm!);
}
}
out.push(...block);
i = end + 1;
}
if (changed) {
writeFileSync(path, out.join('\n'));
touchedFiles++;
}
}
console.log(`inserted ${inserted} knowledgeCutoff fields across ${touchedFiles} files`);
console.log(`map ids used: ${matchedIds.size}/${Object.keys(map).length}`);
const unused = Object.keys(map).filter((k) => !matchedIds.has(k));
if (unused.length) console.log('unused map keys (first 20):', unused.slice(0, 20));
@@ -0,0 +1,49 @@
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
const map: Record<string, { family: string; generation?: string }> = JSON.parse(
readFileSync('/tmp/family-map.json', 'utf8'),
);
const dir = 'packages/model-bank/src/aiModels';
const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
let inserted = 0;
let touchedFiles = 0;
for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const path = join(dir, file);
const lines = readFileSync(path, 'utf8').split('\n');
const out: string[] = [];
let changed = false;
let i = 0;
while (i < lines.length) {
if (lines[i] !== ' {') {
out.push(lines[i]);
i++;
continue;
}
let end = i;
while (end < lines.length && lines[end] !== ' },') end++;
const block = lines.slice(i, end + 1);
const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
const hasFamily = block.some((l) => /^ {4}family:/.test(l));
if (idLineIdx >= 0 && isChat && !hasFamily) {
const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
const r = rawId ? map[normalize(rawId)] : undefined;
if (r) {
const add = [` family: '${r.family}',`];
if (r.generation) add.push(` generation: '${r.generation}',`);
block.splice(idLineIdx, 0, ...add);
inserted++;
changed = true;
}
}
out.push(...block);
i = end + 1;
}
if (changed) {
writeFileSync(path, out.join('\n'));
touchedFiles++;
}
}
console.log(`annotated ${inserted} model entries across ${touchedFiles} files`);
@@ -0,0 +1,237 @@
/* eslint-disable regexp/no-unused-capturing-group */
/**
* Rule-based derivation of { family, generation } from normalized model ids.
* Principle: only fill what is confidently derivable; otherwise omit.
*
* Usage: bun /tmp/derive-family.ts # print distinct pairs for review
* bun /tmp/derive-family.ts --emit # write /tmp/family-map.json
*/
import { readFileSync, writeFileSync } from 'node:fs';
const ids: string[] = JSON.parse(readFileSync('/tmp/model-ids.json', 'utf8'));
type R = { family: string; generation?: string };
const derive = (id: string): R | undefined => {
// strip cloud/bedrock prefixes for matching
const m = id.replace(/^(us\.|global\.|eu\.|apac\.)?(anthropic\.|meta\.|cohere\.|azure-)/, '');
// ---- anthropic ----
if (m.startsWith('claude')) {
// family = product-line tier (claude-opus/sonnet/haiku/instant); bare claude-2.x has no tier
const tier = m.match(/(opus|sonnet|haiku|instant)/)?.[1];
const family = tier ? `claude-${tier}` : 'claude';
let g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)[.-](\d)(?!\d)/); // claude-opus-4-8 / claude-haiku-4.5
if (g) return { family, generation: `claude-${g[1]}.${g[2]}` };
g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)(?!\d)/); // claude-opus-4
if (g) return { family, generation: `claude-${g[1]}` };
g = m.match(/^claude-(\d)[.-](\d)(?!\d)/); // claude-3-5-haiku / claude-3.7-sonnet / claude-2.1
if (g) return { family, generation: g[2] === '0' ? `claude-${g[1]}` : `claude-${g[1]}.${g[2]}` };
g = m.match(/^claude-(\d)(?!\d)/); // claude-3-haiku
if (g) return { family, generation: `claude-${g[1]}` };
if (m.startsWith('claude-instant')) return { family: 'claude-instant' };
if (/^claude-v?2/.test(m)) return { family: 'claude', generation: 'claude-2' };
return { family };
}
// ---- openai ----
if (/^(gpt-oss|gpt_oss)/.test(m) || m.startsWith('gpt-oss:'))
return { family: 'gpt-oss', generation: 'gpt-oss' };
if (/^(chatgpt-4o|gpt-4o)/.test(m)) return { family: 'gpt', generation: 'gpt-4o' };
if (/^gpt-(3\.5|35)/.test(m)) return { family: 'gpt', generation: 'gpt-3.5' };
if (m.startsWith('gpt-audio')) return { family: 'gpt', generation: 'gpt-audio' };
{
const g = m.match(/^gpt-(\d)\.(\d)/); // gpt-4.1 / gpt-5.2
if (g) return { family: 'gpt', generation: `gpt-${g[1]}.${g[2]}` };
const g2 = m.match(/^gpt-(\d)(?!\d)/); // gpt-4 / gpt-5
if (g2) return { family: 'gpt', generation: `gpt-${g2[1]}` };
}
{
const g = m.match(/^o([134])(-|$)/); // o1 / o3 / o4
if (g) return { family: 'o-series', generation: `o${g[1]}` };
}
if (/^(codex|computer-use-preview)/.test(m)) return { family: 'gpt' };
// ---- google ----
{
const g = m.match(/^gemini-(\d+(?:\.\d+)?)/);
if (g) return { family: 'gemini', generation: `gemini-${g[1]}` };
if (/^gemini-(pro|flash)/.test(m)) return { family: 'gemini' }; // rolling aliases
if (m.startsWith('gemma')) {
if (/^gemma-?\db/.test(m)) return { family: 'gemma', generation: 'gemma-1' };
const v = m.match(/^gemma-?(\d)(?!b)/);
return { family: 'gemma', generation: v ? `gemma-${v[1]}` : undefined };
}
if (/^(codegemma|learnlm|palm)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
}
// ---- qwen ----
if (m.startsWith('qwq')) return { family: 'qwen', generation: 'qwq' };
if (m.startsWith('qvq')) return { family: 'qwen', generation: 'qvq' };
if (m.startsWith('codeqwen')) return { family: 'qwen' };
if (m.startsWith('qwen')) {
const g =
m.match(/^qwen-?([123](?:\.\d+)?)(?![0-9b])/) || // qwen3.5-plus / qwen-3-14b / qwen2-7b / qwen1.5
m.match(/^qwen([23](?:\.\d+)?):/) || // qwen2.5:72b
m.match(/^qwen([23])p(\d)/); // qwen2p5 -> handled below
if (/^qwen(\d)p(\d)/.test(m)) {
const p = m.match(/^qwen(\d)p(\d)/)!;
return { family: 'qwen', generation: `qwen${p[1]}.${p[2]}` };
}
if (g) return { family: 'qwen', generation: `qwen${g[1]}` };
return { family: 'qwen' }; // qwen-max/plus/turbo/vl rolling aliases
}
// ---- deepseek ----
if (/^(deepseek|azure-deepseek|pro-deepseek)/.test(m) || m.startsWith('deepseek_')) {
const s = m.replace(/^pro-/, '').replaceAll('_', '-');
if (s.startsWith('deepseek-r1-distill'))
return { family: 'deepseek', generation: 'deepseek-r1-distill' };
if (s.startsWith('deepseek-r1')) return { family: 'deepseek', generation: 'deepseek-r1' };
const g = s.match(/^deepseek-(?:chat-)?v(\d(?:\.\d)?)/);
if (g) return { family: 'deepseek', generation: `deepseek-v${g[1]}` };
if (/^deepseek-(coder-v2|coder)/.test(s))
return { family: 'deepseek', generation: 'deepseek-coder' };
return { family: 'deepseek' }; // deepseek-chat / reasoner rolling aliases
}
// ---- meta llama ----
if (m.startsWith('codellama')) return { family: 'llama', generation: 'codellama' };
if (/^(meta-)?llama|^l3(\d)?-|^llava/.test(m)) {
if (m.startsWith('llava')) return { family: 'llava' };
const s = m.replace(/^meta-/, '');
const g =
s.match(/^llama-?([234])(?:[.-](\d))?(?![0-9b])/) || // llama-3.1 / llama3.3 / llama-4
s.match(/^llama-?v([234])p?(\d)?/) || // llama-v3p1
s.match(/^llama([234])[.:-](\d)?/);
if (g) {
const gen = g[2] ? `llama-${g[1]}.${g[2]}` : `llama-${g[1]}`;
return { family: 'llama', generation: gen };
}
if (m.startsWith('l3-')) return { family: 'llama', generation: 'llama-3' };
if (m.startsWith('l31-')) return { family: 'llama', generation: 'llama-3.1' };
return { family: 'llama' };
}
// ---- zhipu ----
if (/^(zai-)?glm/.test(m)) {
const s = m.replace(/^zai-/, '');
if (s.startsWith('glm-z1')) return { family: 'glm', generation: 'glm-z1' };
if (s.startsWith('glm-zero')) return { family: 'glm', generation: 'glm-zero' };
const g = s.match(/^glm-(\d(?:\.\d)?)/);
if (g) return { family: 'glm', generation: `glm-${g[1]}` };
return { family: 'glm' };
}
if (/^(charglm|codegeex|emohaa)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
// ---- mistral ----
if (
/^(open-)?(mistral|mixtral|ministral|codestral|devstral|magistral|pixtral|mathstral|labs-devstral|labs-leanstral|open-codestral)/.test(
m,
)
) {
const fam = m.replace(/^(open-|labs-)/, '').match(/^[a-z]+/)![0];
return { family: fam };
}
// ---- xai ----
if (m.startsWith('grok')) {
const g = m.match(/^grok-(\d(?:\.\d+)?)/);
return { family: 'grok', generation: g ? `grok-${g[1]}` : undefined };
}
// ---- moonshot ----
if (m.startsWith('kimi')) {
const g = m.match(/^kimi-k(\d(?:\.\d)?)/);
return { family: 'kimi', generation: g ? `kimi-k${g[1]}` : undefined };
}
if (m.startsWith('moonshot-kimi-k2')) return { family: 'kimi', generation: 'kimi-k2' };
if (m.startsWith('moonshot-v1')) return { family: 'kimi', generation: 'moonshot-v1' };
// ---- minimax ----
if (m.startsWith('minimax')) {
if (m.startsWith('minimax-text')) return { family: 'minimax', generation: 'minimax-text-01' };
const g = m.match(/^minimax-m(\d(?:\.\d)?)/);
return { family: 'minimax', generation: g ? `minimax-m${g[1]}` : undefined };
}
if (m.startsWith('abab')) return { family: 'minimax', generation: 'abab' };
// ---- baidu ----
if (m.startsWith('ernie')) {
if (m.startsWith('ernie-x1')) return { family: 'ernie', generation: 'ernie-x1' };
const g = m.match(/^ernie-(\d\.\d)/);
return { family: 'ernie', generation: g ? `ernie-${g[1]}` : undefined };
}
if (m.startsWith('qianfan')) return { family: 'qianfan' };
// ---- bytedance ----
if (m.startsWith('doubao')) {
const g = m.match(/^doubao-seed-(\d[.-]\d|\d)/) || m.match(/^doubao-(\d\.\d)/);
return { family: 'doubao', generation: g ? `doubao-${g[1].replace('-', '.')}` : undefined };
}
if (/^(seed-oss|skylark)/.test(m)) return { family: m.startsWith('seed') ? 'doubao' : 'skylark' };
// ---- tencent ----
if (m.startsWith('hunyuan')) {
const g = m.match(/^hunyuan-(\d\.\d)/);
return { family: 'hunyuan', generation: g ? `hunyuan-${g[1]}` : undefined };
}
if (m.startsWith('hy3')) return { family: 'hunyuan', generation: 'hunyuan-3' };
// ---- others (family only / simple version) ----
if (m.startsWith('yi-')) return { family: 'yi' };
if (/^(command|c4ai-command)/.test(m)) return { family: 'command' };
if (/^(aya|c4ai-aya)/.test(m)) return { family: 'aya' };
if (/^phi-?(\d)?/.test(m) && m.startsWith('phi')) {
const g = m.match(/^phi-?(\d(?:\.\d)?)/);
return { family: 'phi', generation: g ? `phi-${g[1]}` : undefined };
}
if (m.startsWith('wizardlm')) return { family: 'wizardlm' };
if (m.startsWith('step-')) {
const g = m.match(/^step-(?:r1|(\d(?:\.\d)?))/);
return { family: 'step', generation: g?.[1] ? `step-${g[1]}` : undefined };
}
if (/^(internlm|intern-)/.test(m)) return { family: 'intern' };
if (m.startsWith('internvl')) return { family: 'internvl' };
if (m.startsWith('baichuan')) {
const g = m.match(/^baichuan-?(m?\d)/);
return { family: 'baichuan', generation: g ? `baichuan-${g[1]}` : undefined };
}
if (/^(sensechat|sensenova)/.test(m)) return { family: 'sensenova' };
if (/^(spark|generalv|4\.0ultra)/.test(m)) return { family: 'spark' };
if (/^(360gpt|360zhinao)/.test(m)) return { family: '360zhinao' };
if (/^(jamba|ai21-jamba)/.test(m)) return { family: 'jamba' };
if (m.startsWith('sonar')) return { family: 'sonar' };
if (/^(nova-lite|nova-micro|nova-pro)/.test(m)) return { family: 'nova' };
if (/^(ling|ring)-/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
if (m.startsWith('longcat')) return { family: 'longcat' };
if (m.startsWith('mimo')) return { family: 'mimo' };
if (m.startsWith('taichu')) return { family: 'taichu' };
if (/^(hermes|nous-hermes)/.test(m)) return { family: 'hermes' };
if (m.startsWith('solar')) return { family: 'solar' };
if (m.startsWith('kat-coder')) return { family: 'kat-coder' };
if (m.startsWith('dbrx')) return { family: 'dbrx' };
if (m.startsWith('morph')) return { family: 'morph' };
return undefined;
};
const map: Record<string, R> = {};
const pairs = new Map<string, number>();
let derived = 0;
for (const id of ids) {
const r = derive(id);
if (!r) continue;
derived++;
map[id] = r;
const key = `${r.family} :: ${r.generation ?? '—'}`;
pairs.set(key, (pairs.get(key) || 0) + 1);
}
console.log(`derived ${derived}/${ids.length}`);
for (const [k, n] of [...pairs.entries()].sort()) console.log(String(n).padStart(4), k);
if (process.argv.includes('--emit')) {
writeFileSync('/tmp/family-map.json', JSON.stringify(map, null, 1));
console.log('\nwritten /tmp/family-map.json');
}
@@ -0,0 +1,23 @@
/**
* Extract unique normalized chat-model ids from packages/model-bank/src/aiModels/*.ts.
* Normalization: last path segment, lowercased (matches the apply codemods).
*
* Usage (repo root): bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts [out.json]
* Default output: /tmp/model-ids.json
*/
import { readdirSync, writeFileSync } from 'node:fs';
import { join, resolve } from 'node:path';
const dir = resolve('packages/model-bank/src/aiModels');
const out = process.argv[2] || '/tmp/model-ids.json';
const ids = new Set<string>();
for (const f of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const mod = await import(join(dir, f));
for (const m of mod.default || []) {
if (!m?.id || m.type !== 'chat') continue;
ids.add(m.id.split('/').pop()!.toLowerCase());
}
}
writeFileSync(out, JSON.stringify([...ids].sort(), null, 1));
console.log(`${ids.size} unique normalized chat ids -> ${out}`);
+1 -1
View File
@@ -50,7 +50,7 @@ Common false positives (do NOT merge):
- `db-migrations` vs `drizzle` — distinct workflows (migration files vs schema authoring).
- `microcopy` vs `i18n` — content vs mechanics.
- `agent-runtime-hooks` vs `agent-tracing` vs `agent-signal` — different surfaces of the agent system.
- `testing` vs `local-testing` vs `cli-backend-testing` — different test types.
- `testing` vs `agent-testing` — different test types.
### 4 — Description format consistency
+89 -2
View File
@@ -5,6 +5,18 @@ inputs:
node-version:
description: Node.js version
required: true
cloud-repository:
description: Cloud repository to overlay for commercial desktop builds
required: false
default: lobehub/lobehub-cloud
cloud-ref:
description: Optional Cloud repository ref
required: false
default: ''
cloud-token:
description: GitHub token with permission to read the Cloud repository
required: false
default: ''
runs:
using: composite
@@ -14,9 +26,77 @@ runs:
with:
node-version: ${{ inputs.node-version }}
- name: Overlay Cloud repository for desktop build
if: inputs.cloud-token != ''
shell: bash
env:
CLOUD_CHECKOUT: ${{ runner.temp }}/lobehub-cloud
CLOUD_REF: ${{ inputs.cloud-ref }}
CLOUD_REPOSITORY: ${{ inputs.cloud-repository }}
CLOUD_ROOT: ${{ github.workspace }}/..
CLOUD_TOKEN: ${{ inputs.cloud-token }}
run: |
set -euo pipefail
cloud_root="$(cd "$GITHUB_WORKSPACE/.." && pwd)"
cloud_checkout="$RUNNER_TEMP/lobehub-cloud"
rm -rf "$cloud_checkout"
clone_args=(--depth 1)
if [ -n "$CLOUD_REF" ]; then
clone_args+=(--branch "$CLOUD_REF")
fi
git clone "${clone_args[@]}" "https://x-access-token:${CLOUD_TOKEN}@github.com/${CLOUD_REPOSITORY}.git" "$cloud_checkout"
node <<'NODE'
const fs = require('node:fs');
const path = require('node:path');
const source = process.env.CLOUD_CHECKOUT;
const target = process.env.CLOUD_ROOT;
const skip = new Set(['.git', 'lobehub', 'node_modules']);
const copy = (from, to) => {
const stat = fs.lstatSync(from);
if (stat.isSymbolicLink()) {
const link = fs.readlinkSync(from);
fs.rmSync(to, { force: true, recursive: true });
fs.symlinkSync(link, to);
return;
}
if (stat.isDirectory()) {
fs.mkdirSync(to, { recursive: true });
for (const entry of fs.readdirSync(from)) {
if (skip.has(entry)) continue;
copy(path.join(from, entry), path.join(to, entry));
}
return;
}
fs.mkdirSync(path.dirname(to), { recursive: true });
fs.copyFileSync(from, to);
};
for (const entry of fs.readdirSync(source)) {
if (skip.has(entry)) continue;
copy(path.join(source, entry), path.join(target, entry));
}
NODE
echo "CLOUD_DESKTOP=1" >> "$GITHUB_ENV"
echo "✅ Cloud repository overlaid at $cloud_root"
- name: Install dependencies
shell: bash
run: pnpm install --node-linker=hoisted
run: |
set -euo pipefail
if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
cd ..
fi
pnpm install --node-linker=hoisted
# 移除国内 electron 镜像配置,GitHub Actions 使用官方源更快
- name: Remove China electron mirror from .npmrc
@@ -31,4 +111,11 @@ runs:
- name: Install deps on Desktop
shell: bash
run: npm run install-isolated --prefix=./apps/desktop
run: |
set -euo pipefail
if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
cd ..
npm run install-isolated --prefix=./lobehub/apps/desktop
else
npm run install-isolated --prefix=./apps/desktop
fi
+1 -1
View File
@@ -30,7 +30,7 @@ jobs:
This issue is closed, If you have any questions, you can comment and reply.
- name: Checkout repository
if: github.event_name == 'pull_request_target' && github.event.pull_request.merged == true
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Check if PR author is maintainer
if: github.event.pull_request.merged == true
@@ -104,6 +104,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -172,6 +173,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -216,6 +218,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+2 -1
View File
@@ -54,7 +54,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.11.1
node-version: 24.16.0
package-manager-cache: false
# 主要逻辑:确定构建版本号
@@ -92,6 +92,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: 24.11.1
# 设置 package.json 的版本号
@@ -87,6 +87,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+2 -1
View File
@@ -223,6 +223,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -409,7 +410,7 @@ jobs:
- uses: actions/checkout@v6
- name: Delete old canary GitHub releases
uses: actions/github-script@v7
uses: actions/github-script@v8
with:
script: |
const { data: releases } = await github.rest.repos.listReleases({
@@ -180,6 +180,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+2 -2
View File
@@ -28,7 +28,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.11.1
node-version: 24.16.0
- name: Setup pnpm
uses: pnpm/action-setup@v4
@@ -51,7 +51,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.11.1
node-version: 24.16.0
registry-url: https://registry.npmjs.org
- name: Setup pnpm
+52 -6
View File
@@ -32,7 +32,7 @@ jobs:
runs-on: ubuntu-latest
name: Test Packages
env:
PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/python-interpreter @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/trpc @lobechat/app-config @lobechat/locales @lobechat/env @lobechat/builtin-tool-lobe-agent model-bank @lobechat/agent-gateway-client @lobechat/agent-manager-runtime @lobechat/device-gateway-client @lobechat/device-identity @lobechat/eval-dataset-parser @lobechat/eval-rubric @lobechat/fetch-sse @lobechat/heterogeneous-agents'
PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/trpc @lobechat/app-config @lobechat/locales @lobechat/env @lobechat/builtin-tool-lobe-agent model-bank @lobechat/agent-gateway-client @lobechat/agent-manager-runtime @lobechat/device-gateway-client @lobechat/device-identity @lobechat/eval-dataset-parser @lobechat/eval-rubric @lobechat/fetch-sse @lobechat/heterogeneous-agents'
steps:
- name: Checkout
@@ -90,11 +90,23 @@ jobs:
for package in $PACKAGES; do
dir="${package#@lobechat/}"
if [ -f "./packages/$dir/coverage/lcov.info" ]; then
echo "Uploading coverage for $dir..."
flag="packages/$dir"
case "$dir" in
builtin-tool-*)
flag="builtin-tools"
;;
locales|env|device-gateway-client)
echo "Skipping Codecov upload for $dir."
continue
;;
esac
echo "Uploading coverage for $dir as $flag..."
./codecov upload-coverage \
$COMMON_ARGS \
--file ./packages/$dir/coverage/lcov.info \
--flag packages/$dir \
--flag "$flag" \
--disable-search
fi
done
@@ -105,8 +117,8 @@ jobs:
if: needs.check-duplicate-run.outputs.should_skip != 'true'
strategy:
matrix:
shard: [1, 2, 3]
name: Test App (shard ${{ matrix.shard }}/3)
shard: [1, 2]
name: Test App (shard ${{ matrix.shard }}/2)
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -126,7 +138,7 @@ jobs:
run: pnpm install
- name: Run tests
run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/3
run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/2 --exclude '**/apps/server/**'
- name: Upload blob report
if: ${{ !cancelled() }}
@@ -219,6 +231,40 @@ jobs:
files: ./apps/desktop/coverage/lcov.info
flags: desktop
test-server:
needs: check-duplicate-run
if: needs.check-duplicate-run.outputs.should_skip != 'true'
name: Test Server
runs-on: ubuntu-latest
steps:
- name: Checkout
env:
REF_SHA: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
REPOSITORY: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name || github.repository }}
run: |
git init .
git remote add origin "https://github.com/${REPOSITORY}.git"
git fetch --no-tags --depth=1 origin "${REF_SHA}"
git checkout --force FETCH_HEAD
- name: Setup environment
uses: ./.github/actions/setup-env
- name: Install deps
run: pnpm install
- name: Test Server Coverage
run: bunx vitest --coverage --silent='passed-only' --reporter=default --coverage.reportsDirectory=./apps/server/coverage --dir apps/server
- name: Upload Server coverage to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./apps/server/coverage/lcov.info
flags: server
test-databsae:
needs: check-duplicate-run
if: needs.check-duplicate-run.outputs.should_skip != 'true'
+1 -1
View File
@@ -1,6 +1,6 @@
.\" Code generated by `npm run man:generate`; DO NOT EDIT.
.\" Manual command details come from the Commander command tree.
.TH LH 1 "" "@lobehub/cli 0.0.27" "User Commands"
.TH LH 1 "" "@lobehub/cli 0.0.29" "User Commands"
.SH NAME
lh \- LobeHub CLI \- manage and connect to LobeHub services
.SH SYNOPSIS
+2 -2
View File
@@ -1,6 +1,6 @@
{
"name": "@lobehub/cli",
"version": "0.0.27",
"version": "0.0.29",
"type": "module",
"bin": {
"lh": "./dist/index.js",
@@ -35,7 +35,7 @@
"@lobechat/local-file-shell": "workspace:*",
"@lobechat/tool-runtime": "workspace:*",
"@trpc/client": "^11.8.1",
"@types/node": "^22.13.5",
"@types/node": "^24.13.2",
"@types/ws": "^8.18.1",
"commander": "^13.1.0",
"dayjs": "^1.11.19",
+35
View File
@@ -3,6 +3,7 @@ import os from 'node:os';
import path from 'node:path';
import type {
AgentRunRequestMessage,
DeviceSystemInfo,
SystemInfoRequestMessage,
ToolCallRequestMessage,
@@ -25,6 +26,7 @@ import {
stopDaemon,
writeStatus,
} from '../daemon/manager';
import { spawnHeteroAgentRun } from '../device/agentRun';
import { registerDevice, resolveDeviceIdentity } from '../device/register';
import { loadOrCreateConnectionId, loadSettings, normalizeUrl, saveSettings } from '../settings';
import { executeToolCall } from '../tools';
@@ -286,6 +288,39 @@ async function runConnect(options: ConnectOptions, isDaemonChild: boolean) {
});
});
// Handle gateway-dispatched agent runs (heterogeneous agents, e.g. Claude
// Code). Mirrors the desktop app: spawn `lh hetero exec`, which owns the full
// execution + server-ingest pipeline. Ack with the spawn outcome — `accepted`
// once the child starts, `rejected` if it fails to spawn (e.g. bad cwd) — so
// a failed dispatch surfaces as an error instead of a stuck assistant message.
client.on('agent_run_request', async (request: AgentRunRequestMessage) => {
info(
`Received agent_run_request: operationId=${request.operationId} type=${request.agentType}`,
);
try {
const ack = await spawnHeteroAgentRun(
{
agentType: request.agentType,
cwd: request.cwd,
imageList: request.imageList,
jwt: request.jwt,
operationId: request.operationId,
prompt: request.prompt,
resumeSessionId: request.resumeSessionId,
serverUrl: auth.serverUrl,
systemContext: request.systemContext,
topicId: request.topicId,
},
{ error, info },
);
client.sendAgentRunAck({ operationId: request.operationId, ...ack });
} catch (err) {
const reason = err instanceof Error ? err.message : String(err);
error(`agent_run_request failed: ${reason}`);
client.sendAgentRunAck({ operationId: request.operationId, reason, status: 'rejected' });
}
});
client.on('connected', () => {
updateStatus('connected');
});
+1 -1
View File
@@ -650,7 +650,7 @@ describe('hetero exec command', () => {
});
it('resets the per-message text accumulator at message boundaries (no cross-message duplication)', async () => {
// LOBE-10157 Bug 3: the `replace` snapshot accumulator must not span
// The `replace` snapshot accumulator must not span
// message boundaries. Two assistant messages separated by a
// stream_end/stream_start boundary must each snapshot only their OWN
// text — otherwise the second message re-emits the first's text verbatim.
+1 -1
View File
@@ -261,7 +261,7 @@ class SerialServerIngester {
// adapter's `openMainMessage`) must reset it — otherwise it spans the
// whole run and every later message's snapshot re-emits all prior
// messages' text verbatim, which the server then persists into the new
// DB message (LOBE-10157 Bug 3: cross-message text duplication). Reset
// DB message: cross-message text duplication. Reset
// AFTER flushing the just-ended message's pending snapshot above.
if (event.type === 'stream_start' || event.type === 'stream_end') {
this.accumulatedText = '';
+20 -16
View File
@@ -64,15 +64,18 @@ describe('skill command', () => {
describe('list', () => {
it('should display skills in table format', async () => {
mockTrpcClient.agentSkills.list.query.mockResolvedValue([
{
description: 'A skill',
id: 's1',
identifier: 'test-skill',
name: 'Test Skill',
source: 'user',
},
]);
mockTrpcClient.agentSkills.list.query.mockResolvedValue({
data: [
{
description: 'A skill',
id: 's1',
identifier: 'test-skill',
name: 'Test Skill',
source: 'user',
},
],
total: 1,
});
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list']);
@@ -83,7 +86,7 @@ describe('skill command', () => {
it('should output JSON when --json flag is used', async () => {
const items = [{ id: 's1', name: 'Test' }];
mockTrpcClient.agentSkills.list.query.mockResolvedValue(items);
mockTrpcClient.agentSkills.list.query.mockResolvedValue({ data: items, total: items.length });
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list', '--json']);
@@ -92,7 +95,7 @@ describe('skill command', () => {
});
it('should filter by source', async () => {
mockTrpcClient.agentSkills.list.query.mockResolvedValue([]);
mockTrpcClient.agentSkills.list.query.mockResolvedValue({ data: [], total: 0 });
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list', '--source', 'builtin']);
@@ -111,7 +114,7 @@ describe('skill command', () => {
});
it('should show message when no skills found', async () => {
mockTrpcClient.agentSkills.list.query.mockResolvedValue([]);
mockTrpcClient.agentSkills.list.query.mockResolvedValue({ data: [], total: 0 });
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'list']);
@@ -211,9 +214,10 @@ describe('skill command', () => {
describe('search', () => {
it('should search skills', async () => {
mockTrpcClient.agentSkills.search.query.mockResolvedValue([
{ description: 'A skill', id: 's1', name: 'Found Skill' },
]);
mockTrpcClient.agentSkills.search.query.mockResolvedValue({
data: [{ description: 'A skill', id: 's1', name: 'Found Skill' }],
total: 1,
});
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'search', 'test']);
@@ -223,7 +227,7 @@ describe('skill command', () => {
});
it('should show message when no results', async () => {
mockTrpcClient.agentSkills.search.query.mockResolvedValue([]);
mockTrpcClient.agentSkills.search.query.mockResolvedValue({ data: [], total: 0 });
const program = createProgram();
await program.parseAsync(['node', 'test', 'skill', 'search', 'nothing']);
+2 -2
View File
@@ -47,7 +47,7 @@ export function registerSkillCommand(program: Command) {
if (options.source) input.source = options.source as 'builtin' | 'market' | 'user';
const result = await client.agentSkills.list.query(input);
const items = Array.isArray(result) ? result : [];
const items = result?.data ?? [];
if (options.json !== undefined) {
const fields = typeof options.json === 'string' ? options.json : undefined;
@@ -206,7 +206,7 @@ export function registerSkillCommand(program: Command) {
.action(async (query: string, options: { json?: string | boolean }) => {
const client = await getTrpcClient();
const result = await client.agentSkills.search.query({ query });
const items = Array.isArray(result) ? result : [];
const items = result?.data ?? [];
if (options.json !== undefined) {
const fields = typeof options.json === 'string' ? options.json : undefined;
+145
View File
@@ -0,0 +1,145 @@
import { EventEmitter } from 'node:events';
import { afterEach, describe, expect, it, vi } from 'vitest';
import { spawnHeteroAgentRun } from './agentRun';
const { spawnMock } = vi.hoisted(() => ({ spawnMock: vi.fn() }));
vi.mock('node:child_process', () => ({ spawn: spawnMock }));
const makeFakeChild = () => {
const child = new EventEmitter() as EventEmitter & {
stdin: { end: ReturnType<typeof vi.fn>; write: ReturnType<typeof vi.fn> };
};
child.stdin = { end: vi.fn(), write: vi.fn() };
return child;
};
const baseParams = {
agentType: 'claudeCode',
jwt: 'jwt',
operationId: 'op',
prompt: 'hi',
serverUrl: 'https://app.lobehub.com',
topicId: 'tpc',
};
describe('spawnHeteroAgentRun', () => {
afterEach(() => {
spawnMock.mockReset();
});
it('spawns `lh hetero exec` in server-ingest mode via the current CLI entry', async () => {
const child = makeFakeChild();
spawnMock.mockReturnValue(child);
const ackPromise = spawnHeteroAgentRun({
...baseParams,
cwd: '/work/dir',
jwt: 'jwt-token',
operationId: 'op-1',
topicId: 'tpc-1',
});
expect(spawnMock).toHaveBeenCalledTimes(1);
const [bin, args, opts] = spawnMock.mock.calls[0];
expect(bin).toBe(process.execPath);
expect(args).toEqual([
...process.execArgv,
process.argv[1],
'hetero',
'exec',
'--type',
'claudeCode',
'--operation-id',
'op-1',
'--topic',
'tpc-1',
'--render',
'none',
'--input-json',
'-',
'--cwd',
'/work/dir',
]);
expect(opts).toMatchObject({
cwd: '/work/dir',
env: expect.objectContaining({
LOBEHUB_JWT: 'jwt-token',
LOBEHUB_SERVER: 'https://app.lobehub.com',
}),
});
// stdin is only written after the child actually spawns.
expect(child.stdin.write).not.toHaveBeenCalled();
child.emit('spawn');
await expect(ackPromise).resolves.toEqual({ status: 'accepted' });
expect(child.stdin.write).toHaveBeenCalledWith(JSON.stringify('hi'));
expect(child.stdin.end).toHaveBeenCalledTimes(1);
});
it('rejects (no stuck run) when the child errors before spawning, e.g. bad cwd', async () => {
const child = makeFakeChild();
spawnMock.mockReturnValue(child);
const ackPromise = spawnHeteroAgentRun({ ...baseParams, cwd: '/missing' });
child.emit('error', new Error('spawn ENOENT'));
await expect(ackPromise).resolves.toEqual({ reason: 'spawn ENOENT', status: 'rejected' });
expect(child.stdin.write).not.toHaveBeenCalled();
});
it('appends --resume when resuming a session', () => {
const child = makeFakeChild();
spawnMock.mockReturnValue(child);
void spawnHeteroAgentRun({ ...baseParams, resumeSessionId: 'sess-9' });
const [, args] = spawnMock.mock.calls[0];
expect(args).toContain('--resume');
expect(args).toContain('sess-9');
});
it('sends a content-block array to stdin when systemContext is provided', async () => {
const child = makeFakeChild();
spawnMock.mockReturnValue(child);
const ackPromise = spawnHeteroAgentRun({
...baseParams,
prompt: 'do it',
systemContext: 'workspace rules',
});
child.emit('spawn');
await ackPromise;
expect(child.stdin.write).toHaveBeenCalledWith(
JSON.stringify([
{ text: 'workspace rules', type: 'text' },
{ text: 'do it', type: 'text' },
]),
);
});
it('appends image blocks to stdin when imageList is provided', async () => {
const child = makeFakeChild();
spawnMock.mockReturnValue(child);
const ackPromise = spawnHeteroAgentRun({
...baseParams,
imageList: [{ id: 'file-1', url: 'https://signed/a.png' }],
prompt: 'look at this',
});
child.emit('spawn');
await ackPromise;
expect(child.stdin.write).toHaveBeenCalledWith(
JSON.stringify([
{ text: 'look at this', type: 'text' },
{ source: { id: 'file-1', type: 'url', url: 'https://signed/a.png' }, type: 'image' },
]),
);
});
});
+134
View File
@@ -0,0 +1,134 @@
import { spawn } from 'node:child_process';
import {
buildHeteroExecStdinPayload,
type HeteroExecImageRef,
} from '@lobechat/heterogeneous-agents/protocol';
export interface SpawnHeteroAgentRunParams {
agentType: string;
cwd?: string;
/** Image attachments (signed URLs) appended as image content blocks. */
imageList?: HeteroExecImageRef[];
jwt: string;
operationId: string;
prompt: string;
resumeSessionId?: string;
serverUrl: string;
systemContext?: string;
topicId: string;
}
export interface AgentRunAckResult {
reason?: string;
status: 'accepted' | 'rejected';
}
interface SpawnHeteroAgentRunLogger {
error?: (msg: string) => void;
info?: (msg: string) => void;
}
/**
* Spawn `lh hetero exec` for a gateway-dispatched agent run. Mirrors the
* desktop app's `spawnLhHeteroExec`: the spawned CLI owns the full pipeline
* (spawn -> adapt -> BatchIngester -> server ingest), so the connect daemon
* needs no local stream handling — it only kicks off the process.
*
* Re-invokes the current CLI entry (`process.execPath` + `process.argv[1]`)
* instead of relying on `lh` being on `PATH`, so it also works inside the
* detached `lh connect --daemon` child where `PATH` may be minimal.
*
* Resolves only once the child's outcome is known: `accepted` on the `spawn`
* event, `rejected` on an early `error`. `spawn()` reports failures (missing or
* inaccessible `cwd`, etc.) asynchronously via `error`, so acking eagerly would
* report a false success and leave the run with no process to emit
* `heteroFinish` — surfacing as a stuck assistant message. A rejected ack
* instead flows back as a dispatch failure the user can see.
*/
export function spawnHeteroAgentRun(
params: SpawnHeteroAgentRunParams,
logger?: SpawnHeteroAgentRunLogger,
): Promise<AgentRunAckResult> {
const {
agentType,
cwd,
imageList,
jwt,
operationId,
prompt,
resumeSessionId,
serverUrl,
systemContext,
topicId,
} = params;
const workDir = cwd ?? process.cwd();
// Server-ingest mode (--topic + --operation-id): events are batch-POSTed to
// the server, not rendered. `--input-json -` reads the prompt from stdin.
const cliArgs = [
process.argv[1],
'hetero',
'exec',
'--type',
agentType,
'--operation-id',
operationId,
'--topic',
topicId,
'--render',
'none',
'--input-json',
'-',
'--cwd',
workDir,
...(resumeSessionId ? ['--resume', resumeSessionId] : []),
];
// systemContext / image attachments turn the payload into a content-block
// array: context block first, then the user's prompt, then images — mirrors
// the desktop path. `lh hetero exec` coerces both shapes via
// coerceJsonPrompt.
const stdinPayload = buildHeteroExecStdinPayload({ imageList, prompt, systemContext });
return new Promise<AgentRunAckResult>((resolve) => {
let settled = false;
const settle = (result: AgentRunAckResult) => {
if (settled) return;
settled = true;
resolve(result);
};
const child = spawn(process.execPath, [...process.execArgv, ...cliArgs], {
cwd: workDir,
env: {
...process.env,
LOBEHUB_JWT: jwt,
LOBEHUB_SERVER: serverUrl,
},
stdio: ['pipe', 'inherit', 'inherit'],
});
child.once('spawn', () => {
// Only safe to write stdin once the process actually started.
try {
child.stdin?.write(stdinPayload);
child.stdin?.end();
} catch (err) {
logger?.error?.(
`hetero exec stdin write failed (op=${operationId}): ${(err as Error).message}`,
);
}
settle({ status: 'accepted' });
});
child.once('error', (err) => {
logger?.error?.(`hetero exec spawn failed (op=${operationId}): ${err.message}`);
settle({ reason: err.message, status: 'rejected' });
});
child.on('exit', (code, signal) => {
logger?.info?.(`hetero exec exited (op=${operationId}) code=${code} signal=${signal}`);
});
});
}
+116 -1
View File
@@ -6,6 +6,7 @@ import dotenv from 'dotenv';
import { defineConfig } from 'electron-vite';
import type { PluginOption, ViteDevServer } from 'vite';
import { loadEnv } from 'vite';
import tsconfigPaths from 'vite-tsconfig-paths';
import {
sharedOptimizeDeps,
@@ -88,10 +89,112 @@ function electronDesktopHtmlPlugin(): PluginOption {
};
}
const CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG = '__LOBECLOUD_DESKTOP_BUSINESS_FEATURES__';
const BUSINESS_CONST_MODULE_ID = '@lobechat/business-const';
const CLOUD_BUSINESS_CONST_MODULE_ID = '@cloud/business-const';
const DYNAMIC_BUSINESS_CONST_QUERY = '?lobe-cloud-desktop-business-const';
const createBusinessFeaturesBootstrapScript = () =>
`globalThis[${JSON.stringify(CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG)}] = true;`;
const replaceBusinessFlagExport = (code: string, name: string, initializer: string) => {
const pattern = new RegExp(`export\\s+(?:const|let|var)\\s+${name}\\s*=\\s*[\\s\\S]*?;`);
return {
code: code.replace(pattern, `export let ${name} = ${initializer};`),
replaced: pattern.test(code),
};
};
const injectDynamicBusinessFeatureFlag = (code: string) => {
const businessFlag = replaceBusinessFlagExport(
code,
'ENABLE_BUSINESS_FEATURES',
`Boolean(globalThis['${CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG}'])`,
);
const topicLinkFlag = replaceBusinessFlagExport(
businessFlag.code,
'ENABLE_TOPIC_LINK_SHARE',
'ENABLE_BUSINESS_FEATURES',
);
if (!businessFlag.replaced) {
throw new Error('Cannot find ENABLE_BUSINESS_FEATURES export in @cloud/business-const');
}
const topicLinkAssignment = topicLinkFlag.replaced
? '\n ENABLE_TOPIC_LINK_SHARE = enabled;'
: '';
return `${topicLinkFlag.code}
const __lobeCloudDesktopBusinessFeaturesFlagKey = '${CLOUD_DESKTOP_BUSINESS_FEATURES_FLAG}';
const __lobeCloudDesktopApplyBusinessFeaturesFlag = (value) => {
const enabled = Boolean(value);
ENABLE_BUSINESS_FEATURES = enabled;${topicLinkAssignment}
return enabled;
};
const __lobeCloudDesktopExistingDescriptor = Object.getOwnPropertyDescriptor(
globalThis,
__lobeCloudDesktopBusinessFeaturesFlagKey,
);
const __lobeCloudDesktopInitialValue = __lobeCloudDesktopExistingDescriptor?.get
? __lobeCloudDesktopExistingDescriptor.get.call(globalThis)
: globalThis[__lobeCloudDesktopBusinessFeaturesFlagKey];
Object.defineProperty(globalThis, __lobeCloudDesktopBusinessFeaturesFlagKey, {
configurable: true,
get() {
return ENABLE_BUSINESS_FEATURES;
},
set(value) {
__lobeCloudDesktopApplyBusinessFeaturesFlag(value);
},
});
__lobeCloudDesktopApplyBusinessFeaturesFlag(__lobeCloudDesktopInitialValue);
`;
};
function cloudDesktopBusinessConstPlugin(): PluginOption {
return {
enforce: 'pre',
async resolveId(id, importer) {
if (id !== BUSINESS_CONST_MODULE_ID) return;
const resolved = await this.resolve(CLOUD_BUSINESS_CONST_MODULE_ID, importer, {
skipSelf: true,
});
if (!resolved) throw new Error(`Cannot resolve ${CLOUD_BUSINESS_CONST_MODULE_ID}`);
return `${resolved.id}${DYNAMIC_BUSINESS_CONST_QUERY}`;
},
load(id) {
if (!id.endsWith(DYNAMIC_BUSINESS_CONST_QUERY)) return;
const sourcePath = id.slice(0, -DYNAMIC_BUSINESS_CONST_QUERY.length);
return injectDynamicBusinessFeatureFlag(readFileSync(sourcePath, 'utf8'));
},
name: 'lobe-cloud-desktop-business-const',
transformIndexHtml() {
return [
{
children: createBusinessFeaturesBootstrapScript(),
injectTo: 'head-prepend',
tag: 'script',
},
];
},
};
}
dotenv.config();
const isDev = process.env.NODE_ENV === 'development';
const ROOT_DIR = path.resolve(__dirname, '../..');
const CLOUD_ROOT_DIR = path.resolve(__dirname, '../../..');
const isCloudDesktopBuild = process.env.CLOUD_DESKTOP === '1';
const mode = process.env.NODE_ENV === 'production' ? 'production' : 'development';
Object.assign(process.env, loadEnv(mode, ROOT_DIR, ''));
@@ -105,8 +208,17 @@ const mainProcessRuntimeExternals = [
...externalRuntimeModules,
'node-mac-permissions',
];
const externalNavigationHosts =
process.env.DESKTOP_EXTERNAL_NAVIGATION_HOSTS ?? (isCloudDesktopBuild ? 'stripe.com' : '');
console.info(`[electron-vite.config.ts] Detected UPDATE_CHANNEL: ${updateChannel}`);
console.info(`[electron-vite.config.ts] Cloud desktop build: ${isCloudDesktopBuild}`);
const cloudTsconfigPathsPlugin = () =>
({
...tsconfigPaths({ projects: [path.resolve(CLOUD_ROOT_DIR, 'tsconfig.json')] }),
name: 'lobe-cloud-desktop-tsconfig-paths',
}) satisfies PluginOption;
export default defineConfig({
main: {
@@ -169,6 +281,7 @@ export default defineConfig({
sourcemap: isDev ? 'inline' : false,
},
define: {
'process.env.DESKTOP_EXTERNAL_NAVIGATION_HOSTS': JSON.stringify(externalNavigationHosts),
'process.env.UPDATE_CHANNEL': JSON.stringify(process.env.UPDATE_CHANNEL),
'process.env.UPDATE_SERVER_URL': JSON.stringify(process.env.UPDATE_SERVER_URL),
},
@@ -214,6 +327,8 @@ export default defineConfig({
},
optimizeDeps: sharedOptimizeDeps,
plugins: [
isCloudDesktopBuild && cloudTsconfigPathsPlugin(),
isCloudDesktopBuild && cloudDesktopBusinessConstPlugin(),
forceAbsoluteBasePlugin(),
electronDesktopHtmlPlugin(),
vanillaExtractPlugin(),
@@ -221,7 +336,7 @@ export default defineConfig({
],
resolve: {
dedupe: ['react', 'react-dom'],
tsconfigPaths: true,
tsconfigPaths: !isCloudDesktopBuild,
},
// In dev the BrowserWindow loads `app://renderer/` and the Electron main process
// proxies non-backend requests to this Vite dev server via `net.fetch`. The HMR
+2 -2
View File
@@ -111,7 +111,7 @@
"undici": "^7.16.0",
"uuid": "^14.0.0",
"vite": "8.0.14",
"vitest": "3.2.4",
"vitest": "3.2.6",
"zod": "^3.25.76"
},
"optionalDependencies": {
@@ -128,7 +128,7 @@
"node-gyp": "^12.4.0",
"react": "19.2.4",
"react-dom": "19.2.4",
"vitest": "3.2.4"
"vitest": "3.2.6"
}
}
}
+1
View File
@@ -7,6 +7,7 @@ import { getDesktopEnv } from '@/env';
export const isDev = electronIs.dev();
export const OFFICIAL_CLOUD_SERVER = getDesktopEnv().OFFICIAL_CLOUD_SERVER;
export const DESKTOP_EXTERNAL_NAVIGATION_HOSTS = getDesktopEnv().DESKTOP_EXTERNAL_NAVIGATION_HOSTS;
export const isMac = electronIs.macOS();
export const isWindows = electronIs.windows();
@@ -91,6 +91,13 @@ export default class BrowserWindowsCtr extends ControllerModule {
});
}
@IpcMethod()
isWindowFullScreen() {
return this.withSenderIdentifier((identifier) => {
return this.app.browserManager.isWindowFullScreen(identifier);
});
}
@IpcMethod()
setWindowAlwaysOnTop(flag: boolean) {
this.withSenderIdentifier((identifier) => {
@@ -17,6 +17,7 @@ import type {
KillCommandParams,
ListLocalFileParams,
ListProjectSkillsParams,
LocalFilePreviewUrlParams,
LocalReadFileParams,
LocalReadFilesParams,
LocalSearchFilesParams,
@@ -300,6 +301,7 @@ export default class GatewayConnectionCtr extends ControllerModule {
this.heterogeneousAgentCtr.spawnLhHeteroExec({
agentType: request.agentType,
cwd: request.cwd,
imageList: request.imageList,
jwt: request.jwt,
operationId: request.operationId,
prompt: request.prompt,
@@ -408,6 +410,10 @@ export default class GatewayConnectionCtr extends ControllerModule {
return this.localFileCtr.getProjectFileIndex(params as { scope?: string });
}
case 'getLocalFilePreview': {
return this.localFileCtr.getLocalFilePreview(params as LocalFilePreviewUrlParams);
}
case 'listProjectSkills': {
return this.workspaceCtr.listProjectSkills(params as ListProjectSkillsParams);
}
@@ -18,13 +18,20 @@ import {
} from '@lobechat/electron-client-ipc';
import type { AskUserBridge } from '@lobechat/heterogeneous-agents/askUser';
import { AskUserMcpServer } from '@lobechat/heterogeneous-agents/askUser';
import type { AgentContentBlock } from '@lobechat/heterogeneous-agents/spawn';
import type {
AgentContentBlock,
HeteroExecImageRef,
} from '@lobechat/heterogeneous-agents/protocol';
import { buildHeteroExecStdinPayload } from '@lobechat/heterogeneous-agents/protocol';
import type { AgentStreamEvent } from '@lobechat/heterogeneous-agents/spawn';
import {
AgentStreamPipeline,
buildAgentInput,
materializeImageToPath,
normalizeImage,
readCodexSessionModel,
resolveCliSpawnPlan,
resolveCodexInitialModel,
} from '@lobechat/heterogeneous-agents/spawn';
import { app as electronApp, BrowserWindow } from 'electron';
@@ -176,9 +183,18 @@ interface AgentSession {
command: string;
cwd?: string;
env?: Record<string, string>;
model?: string;
modelSource?: string;
modelVerificationLastAttemptAt?: number;
modelVerificationLastAttemptSessionId?: string;
process?: ChildProcess;
resumeSessionId?: string;
sessionId: string;
verifiedModel?: string;
verifiedModelContextWindow?: number;
verifiedModelProvider?: string;
verifiedModelSessionId?: string;
verifiedModelSourceFile?: string;
}
type SessionErrorPayload = HeterogeneousAgentSessionError | string;
@@ -581,12 +597,19 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
createdAt: createdAt.toISOString(),
cwd,
envKeys: session.env ? Object.keys(session.env).sort() : [],
model: session.model,
modelSource: session.modelSource,
resumeSessionId: session.resumeSessionId,
sessionId: session.sessionId,
stdinBytes: stdinPayload === undefined ? 0 : Buffer.byteLength(stdinPayload),
stdinFile: stdinPayload === undefined ? undefined : 'stdin.txt',
stderrFile: 'stderr.log',
stdoutFile: 'stdout.jsonl',
verifiedModel: session.verifiedModel,
verifiedModelContextWindow: session.verifiedModelContextWindow,
verifiedModelProvider: session.verifiedModelProvider,
verifiedModelSessionId: session.verifiedModelSessionId,
verifiedModelSourceFile: session.verifiedModelSourceFile,
},
null,
2,
@@ -888,6 +911,7 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
let spawnPlan;
let traceSession;
let cwd: string;
let spawnEnv: NodeJS.ProcessEnv;
try {
const driver = getHeterogeneousAgentDriver(session.agentType);
spawnPlan = await driver.buildSpawnPlan({
@@ -906,6 +930,23 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
// Fall back to the user's Desktop so the process never inherits
// the Electron parent's cwd (which is `/` when launched from Finder).
cwd = session.cwd || electronApp.getPath('desktop');
// Forward the user's proxy settings to the CLI. The main-process undici
// dispatcher doesn't reach child processes — they need env vars.
const proxyEnv = buildProxyEnv(this.app.storeManager.get('networkProxy'));
spawnEnv = { ...buildInheritedSpawnEnv(), ...proxyEnv, ...session.env };
if (session.agentType === 'codex') {
const initialModel = await resolveCodexInitialModel({
args: spawnPlan.args,
env: spawnEnv,
});
if (initialModel?.model) {
session.model = initialModel.model;
session.modelSource = initialModel.source;
}
}
traceSession = await this.createCliTraceSession({
cliArgs: spawnPlan.args,
cwd,
@@ -940,29 +981,27 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
// the claude binary can leave bash/grep/etc. tool children running and
// the CLI hung waiting on them. Windows has different semantics — use
// taskkill /T /F there; no detached flag needed.
// Forward the user's proxy settings to the CLI. The main-process undici
// dispatcher doesn't reach child processes — they need env vars.
const proxyEnv = buildProxyEnv(this.app.storeManager.get('networkProxy'));
const spawnOptions = {
cwd,
detached: process.platform !== 'win32',
// Strip host Anthropic creds from the inherited env so a developer's
// shell `ANTHROPIC_API_KEY` can't hijack the CLI's own auth. `session.env`
// is spread last, so an agent that explicitly configures a key still wins.
env: { ...buildInheritedSpawnEnv(), ...proxyEnv, ...session.env },
env: spawnEnv,
stdio: [useStdin ? 'pipe' : 'ignore', 'pipe', 'pipe'] as ['pipe' | 'ignore', 'pipe', 'pipe'],
};
return new Promise<void>((resolve, reject) => {
const proc = spawn(resolvedCliSpawnPlan.command, resolvedCliSpawnPlan.args, spawnOptions);
this.handleSpawnedAgentProcess({
cwd,
intervention,
params,
proc,
reject,
resolve,
session,
spawnEnv,
traceSession,
useStdin,
spawnPlan,
@@ -970,23 +1009,86 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
});
}
private async verifyCodexSessionModel({
env,
pipeline,
session,
traceSession,
}: {
env: NodeJS.ProcessEnv;
pipeline: AgentStreamPipeline;
session: AgentSession;
traceSession: CliTraceSession | undefined;
}): Promise<AgentStreamEvent[]> {
if (
session.agentType !== 'codex' ||
!pipeline.sessionId ||
session.verifiedModelSessionId === pipeline.sessionId
) {
return [];
}
const now = Date.now();
if (
session.modelVerificationLastAttemptSessionId === pipeline.sessionId &&
session.modelVerificationLastAttemptAt &&
now - session.modelVerificationLastAttemptAt < 1000
) {
return [];
}
session.modelVerificationLastAttemptSessionId = pipeline.sessionId;
session.modelVerificationLastAttemptAt = now;
const sessionModel = await readCodexSessionModel(pipeline.sessionId, { env });
if (!sessionModel?.model) return [];
const previousModel = session.model;
session.verifiedModel = sessionModel.model;
session.verifiedModelContextWindow = sessionModel.contextWindow;
session.verifiedModelProvider = sessionModel.provider;
session.verifiedModelSessionId = pipeline.sessionId;
session.verifiedModelSourceFile = sessionModel.sourceFile;
void this.writeCliTraceJson(traceSession, 'model.json', {
initialModel: previousModel,
initialModelSource: session.modelSource,
sessionId: pipeline.sessionId,
verifiedAt: new Date().toISOString(),
verifiedContextWindow: sessionModel.contextWindow,
verifiedLine: sessionModel.line,
verifiedModel: sessionModel.model,
verifiedModelProvider: sessionModel.provider,
verifiedSourceFile: sessionModel.sourceFile,
});
if (previousModel === sessionModel.model) return [];
session.model = sessionModel.model;
session.modelSource = 'codex-session';
return pipeline.configureSession({ model: sessionModel.model });
}
private handleSpawnedAgentProcess({
cwd,
intervention,
params,
proc,
reject,
resolve,
session,
spawnEnv,
spawnPlan,
traceSession,
useStdin,
}: {
cwd: string;
intervention?: Awaited<ReturnType<HeterogeneousAgentCtr['setupInterventionForOp']>>;
params: SendPromptParams;
proc: ChildProcess;
reject: (reason?: unknown) => void;
resolve: () => void;
session: AgentSession;
spawnEnv: NodeJS.ProcessEnv;
spawnPlan: HeterogeneousAgentBuildPlan;
traceSession: CliTraceSession | undefined;
useStdin: boolean;
@@ -1021,10 +1123,12 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
// toStreamEvent all run inside the shared pipeline, so renderer + future
// server `heteroIngest` see the same `AgentStreamEvent` wire shape with
// no per-consumer adapter. The pipeline auto-wires the Codex
// file-change line-stat tracker when `agentType === 'codex'`, so this
// file-change diff/stat tracker when `agentType === 'codex'`, so this
// controller stays agent-agnostic.
const pipeline = new AgentStreamPipeline({
agentType: session.agentType,
cwd,
initialModel: session.model,
operationId: params.operationId,
});
let stdoutBroadcastQueue: Promise<void> = Promise.resolve();
@@ -1039,6 +1143,14 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
if (pipeline.sessionId && pipeline.sessionId !== session.agentSessionId) {
session.agentSessionId = pipeline.sessionId;
}
events.push(
...(await this.verifyCodexSessionModel({
env: spawnEnv,
pipeline,
session,
traceSession,
})),
);
for (const event of events) {
this.broadcast('heteroAgentEvent', {
event,
@@ -1317,6 +1429,8 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
spawnLhHeteroExec(params: {
agentType: string;
cwd?: string;
/** Image attachments (signed URLs) appended as image content blocks. */
imageList?: HeteroExecImageRef[];
jwt: string;
operationId: string;
prompt: string;
@@ -1328,6 +1442,7 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
const {
agentType,
cwd,
imageList,
jwt,
operationId,
prompt,
@@ -1380,16 +1495,11 @@ export default class HeterogeneousAgentCtr extends ControllerModule {
stdio: ['pipe', 'inherit', 'inherit'],
});
// When systemContext is provided, send a content-block array so CC sees the
// context block first, then the user's actual message — mirrors
// spawnHeteroSandbox. lh handles JSON arrays via coerceJsonPrompt, so no lh
// changes are required.
const stdinPayload = systemContext
? JSON.stringify([
{ text: systemContext, type: 'text' },
{ text: prompt, type: 'text' },
])
: JSON.stringify(prompt);
// systemContext / image attachments turn the payload into a content-block
// array so CC sees the context block first, then the user's message, then
// the images — mirrors spawnHeteroSandbox. lh handles both shapes via
// coerceJsonPrompt, so no lh changes are required.
const stdinPayload = buildHeteroExecStdinPayload({ imageList, prompt, systemContext });
child.stdin.write(stdinPayload);
child.stdin.end();
@@ -12,6 +12,7 @@ import {
type GrepContentParams,
type GrepContentResult,
type ListLocalFileParams,
type LocalFilePreviewResult,
type LocalFilePreviewUrlParams,
type LocalFilePreviewUrlResult,
type LocalMoveFilesResultItem,
@@ -65,6 +66,19 @@ const logger = createLogger('controllers:LocalFileCtr');
const SAFE_PATH_PREFIXES = ['/tmp', '/var/tmp'] as const;
const TEXT_PREVIEW_MIME_TYPES = new Set([
'application/graphql',
'application/javascript',
'application/json',
'application/markdown',
'application/toml',
'application/xml',
'application/yaml',
'text/markdown',
'text/mdx',
'text/x-markdown',
]);
const normalizeAbsolutePath = (inputPath: string): string =>
path.normalize(path.isAbsolute(inputPath) ? inputPath : `/${inputPath}`);
@@ -91,6 +105,48 @@ const resolveNearestExistingRealPath = async (targetPath: string): Promise<strin
const toPosixRelativePath = (filePath: string) => filePath.split(path.sep).join('/');
const normalizeContentType = (contentType: string): string =>
contentType.split(';')[0].trim().toLowerCase();
const isTextPreviewMimeType = (mimeType: string): boolean =>
mimeType.startsWith('text/') || TEXT_PREVIEW_MIME_TYPES.has(mimeType);
const serializePreviewFile = ({
buffer,
contentType,
}: {
buffer: Buffer;
contentType: string;
}): NonNullable<LocalFilePreviewResult['preview']> => {
const normalizedContentType = normalizeContentType(contentType);
if (normalizedContentType.startsWith('image/')) {
return {
base64: buffer.toString('base64'),
contentType: normalizedContentType,
type: 'image',
};
}
if (isTextPreviewMimeType(normalizedContentType)) {
return {
content: buffer.toString('utf8'),
contentType: normalizedContentType,
type: 'text',
};
}
if (normalizedContentType === 'application/pdf') {
return { contentType: normalizedContentType, type: 'pdf' };
}
if (normalizedContentType.startsWith('video/')) {
return { contentType: normalizedContentType, type: 'video' };
}
return { contentType: normalizedContentType, type: 'binary' };
};
const createProjectFileEntry = (
root: string,
absolutePath: string,
@@ -401,6 +457,31 @@ export default class LocalFileCtr extends ControllerModule {
}
}
@IpcMethod()
async getLocalFilePreview({
path: filePath,
workingDirectory,
}: LocalFilePreviewUrlParams): Promise<LocalFilePreviewResult> {
try {
const preview = await this.app.localFileProtocolManager.readPreviewFile({
filePath,
workspaceRoot: workingDirectory,
});
if (!preview) {
return { error: 'File is outside the approved workspace', success: false };
}
return {
preview: serializePreviewFile(preview),
success: true,
};
} catch (error) {
logger.error('Failed to read local file preview:', error);
return { error: (error as Error).message, success: false };
}
}
@IpcMethod()
async handlePrepareSkillDirectory({
forceRefresh,
@@ -29,6 +29,7 @@ const mockCloseWindow = vi.fn();
const mockMinimizeWindow = vi.fn();
const mockMaximizeWindow = vi.fn();
const mockIsWindowMaximized = vi.fn();
const mockIsWindowFullScreen = vi.fn();
const mockRetrieveByIdentifier = vi.fn();
const mockStartSession = vi.fn();
const testSenderIdentifierString: string = 'test-window-event-id';
@@ -58,6 +59,7 @@ const mockApp = {
minimizeWindow: mockMinimizeWindow,
maximizeWindow: mockMaximizeWindow,
isWindowMaximized: mockIsWindowMaximized,
isWindowFullScreen: mockIsWindowFullScreen,
retrieveByIdentifier: mockRetrieveByIdentifier.mockImplementation(
(identifier: AppBrowsersIdentifiers | string) => {
if (identifier === 'some-other-window') {
@@ -166,6 +168,20 @@ describe('BrowserWindowsCtr', () => {
});
});
describe('isWindowFullScreen', () => {
it('should return fullscreen state for the sender window', () => {
mockIsWindowFullScreen.mockReturnValueOnce(true);
const sender = {} as any;
const context = { sender, event: { sender } as any } as IpcContext;
const result = runWithIpcContext(context, () => browserWindowsCtr.isWindowFullScreen());
expect(mockGetIdentifierByWebContents).toHaveBeenCalledWith(context.sender);
expect(mockIsWindowFullScreen).toHaveBeenCalledWith(testSenderIdentifierString);
expect(result).toBe(true);
});
});
describe('interceptRoute', () => {
const baseParams = { source: 'link-click' as const };
@@ -1,3 +1,5 @@
import path from 'node:path';
import { zipSync } from 'fflate';
import { beforeEach, describe, expect, it, vi } from 'vitest';
@@ -88,6 +90,7 @@ const mockLocalFileProtocolManager = {
approveIndexedProjectRoot: vi.fn(),
approveProjectRootFromScope: vi.fn(),
createPreviewUrl: vi.fn(),
readPreviewFile: vi.fn(),
};
// Mock makeSureDirExist
@@ -146,7 +149,6 @@ describe('LocalFileCtr', () => {
it('should expand a leading ~ to the user home directory', async () => {
const os = await import('node:os');
const path = await import('node:path');
vi.mocked(mockShell.openPath).mockResolvedValue('');
const result = await localFileCtr.handleOpenLocalFile({ path: '~/git/work/file.txt' });
@@ -171,7 +173,6 @@ describe('LocalFileCtr', () => {
it('should expand a leading ~ when opening a directory', async () => {
const os = await import('node:os');
const path = await import('node:path');
vi.mocked(mockShell.openPath).mockResolvedValue('');
const result = await localFileCtr.handleOpenLocalFolder({
@@ -248,6 +249,48 @@ describe('LocalFileCtr', () => {
});
});
describe('getLocalFilePreview', () => {
it('should return text preview content for an approved workspace file', async () => {
mockLocalFileProtocolManager.readPreviewFile.mockResolvedValue({
buffer: Buffer.from('const value = 1;'),
contentType: 'text/plain; charset=utf-8',
realPath: '/workspace/app.ts',
});
const result = await localFileCtr.getLocalFilePreview({
path: '/workspace/app.ts',
workingDirectory: '/workspace',
});
expect(mockLocalFileProtocolManager.readPreviewFile).toHaveBeenCalledWith({
filePath: '/workspace/app.ts',
workspaceRoot: '/workspace',
});
expect(result).toEqual({
preview: {
content: 'const value = 1;',
contentType: 'text/plain',
type: 'text',
},
success: true,
});
});
it('should reject preview payload creation outside an approved workspace', async () => {
mockLocalFileProtocolManager.readPreviewFile.mockResolvedValue(null);
const result = await localFileCtr.getLocalFilePreview({
path: '/Users/alice/.ssh/id_rsa',
workingDirectory: '/workspace',
});
expect(result).toEqual({
error: 'File is outside the approved workspace',
success: false,
});
});
});
describe('handleWriteFile', () => {
it('should write file successfully', async () => {
vi.mocked(mockFsPromises.mkdir).mockResolvedValue(undefined);
+55 -1
View File
@@ -7,7 +7,7 @@ import type { BrowserWindowConstructorOptions } from 'electron';
import { app, BrowserWindow, ipcMain, screen, session as electronSession, shell } from 'electron';
import { preloadDir, resourcesDir } from '@/const/dir';
import { isMac } from '@/const/env';
import { DESKTOP_EXTERNAL_NAVIGATION_HOSTS, isMac } from '@/const/env';
import RemoteServerConfigCtr from '@/controllers/RemoteServerConfigCtr';
import { backendProxyProtocolManager } from '@/core/infrastructure/BackendProxyProtocolManager';
import { appendVercelCookie, setResponseHeader } from '@/utils/http-headers';
@@ -19,6 +19,31 @@ import { WindowThemeManager } from './WindowThemeManager';
const logger = createLogger('core:Browser');
const getExternalNavigationHosts = () =>
DESKTOP_EXTERNAL_NAVIGATION_HOSTS.split(',')
.map((host) => host.trim().toLowerCase())
.filter(Boolean);
const shouldOpenTopLevelNavigationExternally = (rawUrl: string) => {
const externalNavigationHosts = getExternalNavigationHosts();
if (externalNavigationHosts.length === 0) return false;
let url: URL;
try {
url = new URL(rawUrl);
} catch {
return false;
}
if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
const hostname = url.hostname.toLowerCase();
return externalNavigationHosts.some(
(externalHost) => hostname === externalHost || hostname.endsWith(`.${externalHost}`),
);
};
// ==================== Types ====================
export interface BrowserWindowOpts extends BrowserWindowConstructorOptions {
@@ -194,10 +219,27 @@ export default class Browser {
this.setupReadyToShowListener(browserWindow);
this.setupCloseListener(browserWindow);
this.setupFocusListener(browserWindow);
this.setupFullscreenListener(browserWindow);
this.setupTopLevelNavigationListener(browserWindow);
this.setupWillPreventUnloadListener(browserWindow);
this.setupContextMenu(browserWindow);
}
private setupTopLevelNavigationListener(browserWindow: BrowserWindow): void {
logger.debug(`[${this.identifier}] Setting up top-level navigation listener.`);
browserWindow.webContents.on('will-navigate', (event, url) => {
if (!shouldOpenTopLevelNavigationExternally(url)) return;
logger.info(`[${this.identifier}] Opening top-level navigation externally: ${url}`);
event.preventDefault();
shell.openExternal(url).catch((error) => {
logger.error(`[${this.identifier}] Failed to open external navigation URL: ${url}`, error);
});
});
}
/**
* Setup window open handler to intercept external links
* Prevents opening new windows in renderer and uses system browser instead
@@ -268,6 +310,18 @@ export default class Browser {
});
}
private setupFullscreenListener(browserWindow: BrowserWindow): void {
logger.debug(`[${this.identifier}] Setting up fullscreen event listeners.`);
browserWindow.on('enter-full-screen', () => {
this.broadcast('windowFullscreenChanged', { isFullScreen: true });
});
browserWindow.on('leave-full-screen', () => {
this.broadcast('windowFullscreenChanged', { isFullScreen: false });
});
}
/**
* Setup context menu with platform-specific features
* Delegates to MenuManager for consistent platform behavior
@@ -368,6 +368,11 @@ export class BrowserManager {
return browser?.browserWindow.isMaximized() ?? false;
}
isWindowFullScreen(identifier: string) {
const browser = this.browsers.get(identifier);
return browser?.browserWindow.isFullScreen() ?? false;
}
setWindowSize(identifier: string, size: { height?: number; width?: number }) {
const browser = this.browsers.get(identifier);
browser?.setWindowSize(size);
@@ -9,6 +9,7 @@ const {
mockBrowserWindow,
mockNativeTheme,
mockIpcMain,
mockShell,
mockScreen,
MockBrowserWindow,
mockEnv,
@@ -64,6 +65,7 @@ const {
MockBrowserWindow: vi.fn().mockImplementation(() => mockBrowserWindow),
mockBrowserWindow,
mockEnv: {
externalNavigationHosts: '',
isDev: false,
isLinux: false,
isMac: false,
@@ -91,6 +93,9 @@ const {
workArea: { height: 1080, width: 1920, x: 0, y: 0 },
}),
},
mockShell: {
openExternal: vi.fn().mockResolvedValue(undefined),
},
};
});
@@ -101,6 +106,7 @@ vi.mock('electron', () => ({
ipcMain: mockIpcMain,
nativeTheme: mockNativeTheme,
screen: mockScreen,
shell: mockShell,
}));
// Mock logger
@@ -121,6 +127,9 @@ vi.mock('@/const/dir', () => ({
}));
vi.mock('@/const/env', () => ({
get DESKTOP_EXTERNAL_NAVIGATION_HOSTS() {
return mockEnv.externalNavigationHosts;
},
get isDev() {
return mockEnv.isDev;
},
@@ -182,6 +191,7 @@ describe('Browser', () => {
mockEnv.isMac = false;
mockEnv.isMacTahoe = false;
mockEnv.isWindows = true;
mockEnv.externalNavigationHosts = '';
// Create mock App
mockStoreManagerGet = vi.fn().mockReturnValue(undefined);
@@ -531,6 +541,30 @@ describe('Browser', () => {
});
});
describe('fullscreen events', () => {
it('should broadcast fullscreen state changes', () => {
const enterHandler = mockBrowserWindow.on.mock.calls.find(
(call) => call[0] === 'enter-full-screen',
)?.[1];
const leaveHandler = mockBrowserWindow.on.mock.calls.find(
(call) => call[0] === 'leave-full-screen',
)?.[1];
expect(enterHandler).toBeDefined();
expect(leaveHandler).toBeDefined();
enterHandler();
expect(mockBrowserWindow.webContents.send).toHaveBeenCalledWith('windowFullscreenChanged', {
isFullScreen: true,
});
leaveHandler();
expect(mockBrowserWindow.webContents.send).toHaveBeenCalledWith('windowFullscreenChanged', {
isFullScreen: false,
});
});
});
describe('close', () => {
it('should close window', () => {
browser.close();
@@ -730,4 +764,38 @@ describe('Browser', () => {
expect(mockEvent.preventDefault).not.toHaveBeenCalled();
});
});
describe('top-level navigation handling', () => {
let willNavigateHandler: (event: any, url: string) => void;
beforeEach(() => {
willNavigateHandler = mockBrowserWindow.webContents.on.mock.calls.find(
(call) => call[0] === 'will-navigate',
)?.[1];
});
it('should open configured external navigation hosts in system browser', () => {
mockEnv.externalNavigationHosts = 'stripe.com';
const mockEvent = { preventDefault: vi.fn() };
expect(willNavigateHandler).toBeDefined();
willNavigateHandler(mockEvent, 'https://checkout.stripe.com/c/pay/session_id');
expect(mockEvent.preventDefault).toHaveBeenCalled();
expect(mockShell.openExternal).toHaveBeenCalledWith(
'https://checkout.stripe.com/c/pay/session_id',
);
});
it('should allow internal result routes in the app window', () => {
mockEnv.externalNavigationHosts = 'stripe.com';
const mockEvent = { preventDefault: vi.fn() };
expect(willNavigateHandler).toBeDefined();
willNavigateHandler(mockEvent, 'http://localhost:3000/payment/upgrade-success');
expect(mockEvent.preventDefault).not.toHaveBeenCalled();
expect(mockShell.openExternal).not.toHaveBeenCalled();
});
});
});
@@ -48,6 +48,12 @@ interface PreviewTokenRecord {
realPath: string;
}
export interface PreviewFileReadResult {
buffer: Buffer;
contentType: string;
realPath: string;
}
/**
* Custom `localfile://` protocol for project file previews.
*
@@ -214,36 +220,43 @@ export class LocalFileProtocolManager {
workspaceRoot: string;
}): Promise<string | null> {
const normalizedFilePath = normalizeAbsolutePath(filePath);
const normalizedWorkspaceRoot = normalizeAbsolutePath(workspaceRoot);
if (!normalizedFilePath || !normalizedWorkspaceRoot) return null;
if (!normalizedFilePath) return null;
const [realFilePath, realWorkspaceRoot] = await Promise.all([
realpath(normalizedFilePath),
realpath(normalizedWorkspaceRoot),
]);
const normalizedRealFilePath = normalizeAbsolutePath(realFilePath);
const normalizedRealWorkspaceRoot = normalizeAbsolutePath(realWorkspaceRoot);
if (!normalizedRealFilePath || !normalizedRealWorkspaceRoot) return null;
if (
!this.approvedWorkspaceRoots.has(normalizedRealWorkspaceRoot) &&
!this.indexedProjectRoots.has(normalizedRealWorkspaceRoot)
) {
return null;
}
if (!isPathWithinRoot(normalizedRealFilePath, normalizedRealWorkspaceRoot)) return null;
const realFilePath = await this.resolveApprovedPreviewPath({ filePath, workspaceRoot });
if (!realFilePath) return null;
this.cleanupExpiredTokens();
const token = randomUUID();
this.previewTokens.set(token, {
expiresAt: Date.now() + PREVIEW_TOKEN_TTL_MS,
realPath: normalizedRealFilePath,
realPath: realFilePath,
});
return buildLocalFileUrl(normalizedFilePath, token);
}
async readPreviewFile({
filePath,
workspaceRoot,
}: {
filePath: string;
workspaceRoot: string;
}): Promise<PreviewFileReadResult | null> {
const realFilePath = await this.resolveApprovedPreviewPath({ filePath, workspaceRoot });
if (!realFilePath) return null;
const fileStat = await stat(realFilePath);
if (!fileStat.isFile()) return null;
const buffer = await readFile(realFilePath);
return {
buffer,
contentType: resolveLocalFileMimeType(realFilePath, buffer),
realPath: realFilePath,
};
}
/**
* Decode the URL pathname back into an absolute filesystem path.
*
@@ -283,6 +296,36 @@ export class LocalFileProtocolManager {
return normalized;
}
private async resolveApprovedPreviewPath({
filePath,
workspaceRoot,
}: {
filePath: string;
workspaceRoot: string;
}): Promise<string | null> {
const normalizedFilePath = normalizeAbsolutePath(filePath);
const normalizedWorkspaceRoot = normalizeAbsolutePath(workspaceRoot);
if (!normalizedFilePath || !normalizedWorkspaceRoot) return null;
const [realFilePath, realWorkspaceRoot] = await Promise.all([
realpath(normalizedFilePath),
realpath(normalizedWorkspaceRoot),
]);
const normalizedRealFilePath = normalizeAbsolutePath(realFilePath);
const normalizedRealWorkspaceRoot = normalizeAbsolutePath(realWorkspaceRoot);
if (!normalizedRealFilePath || !normalizedRealWorkspaceRoot) return null;
if (
!this.approvedWorkspaceRoots.has(normalizedRealWorkspaceRoot) &&
!this.indexedProjectRoots.has(normalizedRealWorkspaceRoot)
) {
return null;
}
if (!isPathWithinRoot(normalizedRealFilePath, normalizedRealWorkspaceRoot)) return null;
return normalizedRealFilePath;
}
private cleanupExpiredTokens() {
const now = Date.now();
for (const [token, record] of this.previewTokens) {
@@ -278,6 +278,37 @@ describe('LocalFileProtocolManager', () => {
expect(url).toContain('token=');
});
it('reads preview payloads only from approved project roots', async () => {
const manager = new LocalFileProtocolManager();
await manager.approveIndexedProjectRoot('/Users/alice/project');
mockReadFile.mockResolvedValue(Buffer.from('const value = 1;'));
const result = await manager.readPreviewFile({
filePath: '/Users/alice/project/App.tsx',
workspaceRoot: '/Users/alice/project',
});
expect(result).toEqual({
buffer: Buffer.from('const value = 1;'),
contentType: 'text/plain; charset=utf-8',
realPath: '/Users/alice/project/App.tsx',
});
expect(mockReadFile).toHaveBeenCalledWith('/Users/alice/project/App.tsx');
});
it('does not read preview payloads outside the approved workspace root', async () => {
const manager = new LocalFileProtocolManager();
await manager.approveIndexedProjectRoot('/Users/alice/project');
const result = await manager.readPreviewFile({
filePath: '/Users/alice/.ssh/id_rsa',
workspaceRoot: '/Users/alice/project',
});
expect(result).toBeNull();
expect(mockReadFile).not.toHaveBeenCalled();
});
it('defers registration until app ready when not yet ready', async () => {
mockApp.isReady.mockReturnValue(false);
let resolveReady: () => void = () => undefined;
+10 -1
View File
@@ -50,6 +50,13 @@ const envNumber = (defaultValue: number) =>
}, z.number().optional())
.default(defaultValue);
const getRuntimeEnv = () => ({
...process.env,
DESKTOP_EXTERNAL_NAVIGATION_HOSTS: process.env.DESKTOP_EXTERNAL_NAVIGATION_HOSTS,
UPDATE_CHANNEL: process.env.UPDATE_CHANNEL,
UPDATE_SERVER_URL: process.env.UPDATE_SERVER_URL,
});
/**
* Desktop (Electron main process) runtime env access.
*
@@ -63,13 +70,15 @@ export const getDesktopEnv = memoize(() =>
clientPrefix: 'PUBLIC_',
emptyStringAsUndefined: true,
isServer: true,
runtimeEnv: process.env,
runtimeEnv: getRuntimeEnv(),
server: {
DEBUG_VERBOSE: envBoolean(false),
// escape hatch: allow testing static renderer in dev via env
DESKTOP_RENDERER_STATIC: envBoolean(false),
DESKTOP_EXTERNAL_NAVIGATION_HOSTS: z.string().optional().default(''),
// device gateway url override (dev: point at a local `wrangler dev` instance,
// e.g. http://localhost:8787). Falls back to the stored value, then the
// production gateway.
@@ -17,24 +17,23 @@ const log = debug('lobe-server:agent-runtime:coordinator');
* decision) starts, but that resume runs under a **new** operationId with
* its own event stream. For the paused operationId no further events will
* arrive, so clients should stop waiting the same way they do on done.
*
* `waiting_for_async_tool` is different: deferred tools such as server
* sub-agents resume the SAME operationId after the out-of-band result is
* backfilled. Ending the stream at park time makes the client mark the turn
* as stopped while the server is still waiting for sub-agents.
*/
const STREAM_END_STATUSES = new Set<AgentState['status']>([
'done',
'error',
'interrupted',
'waiting_for_human',
'waiting_for_async_tool',
]);
const hasEnteredStreamEndState = (
previousStatus?: AgentState['status'],
nextStatus?: AgentState['status'],
): nextStatus is
| 'done'
| 'error'
| 'interrupted'
| 'waiting_for_human'
| 'waiting_for_async_tool' => {
): nextStatus is 'done' | 'error' | 'interrupted' | 'waiting_for_human' => {
const wasStreamEnd = previousStatus ? STREAM_END_STATUSES.has(previousStatus) : false;
return Boolean(nextStatus && STREAM_END_STATUSES.has(nextStatus) && !wasStreamEnd);
};
@@ -73,6 +73,7 @@ import { TopicModel } from '@/database/models/topic';
import { UserModel } from '@/database/models/user';
import { type LobeChatDatabase } from '@/database/type';
import { fileEnv } from '@/envs/file';
import { type ExecutionPlan, isDeviceCapablePlan } from '@/helpers/executionTarget';
import { serverMessagesEngine } from '@/server/modules/Mecha/ContextEngineering';
import { type EvalContext } from '@/server/modules/Mecha/ContextEngineering/types';
import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
@@ -202,6 +203,51 @@ const isEmptyModelCompletion = (params: {
return true;
};
type ReasoningReplayNode = {
children?: ReasoningReplayNode[];
members?: ReasoningReplayNode[];
reasoning?: unknown;
};
const stripAssistantReasoningForReplay = (messages: UIChatMessage[]): UIChatMessage[] => {
const stripMessage = <T extends ReasoningReplayNode>(message: T): T => {
let changed = false;
const children = message.children?.map((child) => {
const strippedChild = stripMessage(child);
if (strippedChild !== child) changed = true;
return strippedChild;
});
const members = message.members?.map((member) => {
const strippedMember = stripMessage(member);
if (strippedMember !== member) changed = true;
return strippedMember;
});
if ('reasoning' in message) changed = true;
if (!changed) return message;
const { reasoning: _reasoning, ...messageWithoutReasoning } = message;
return {
...messageWithoutReasoning,
...(children ? { children } : {}),
...(members ? { members } : {}),
} as T;
};
let changed = false;
const strippedMessages = messages.map((message) => {
const strippedMessage = stripMessage(message);
if (strippedMessage !== message) changed = true;
return strippedMessage;
});
return changed ? strippedMessages : messages;
};
const GEN_AI_FUNCTION_TOOL_TYPE: ToolType = 'function';
type ToolFailureKind = 'replan' | 'retry' | 'stop';
@@ -532,17 +578,23 @@ export const createRuntimeExecutors = (
const provider = llmPayload.provider || state.modelRuntimeConfig?.provider;
// Resolve tools via ToolResolver (unified tool injection).
//
// Belt-and-suspenders: even if `aiAgent.execAgent` ever forgets to clear
// `state.metadata.activeDeviceId` for a non-trusted sender, swallowing
// it here keeps `buildStepToolDelta` from re-injecting `local-system` —
// the engine's enabledToolIds exclusion alone is not enough, since the
// delta builder treats activeDeviceId as an independent activation
// signal and only dedupes against already-enabled tools.
// Single-track device gate: `buildStepToolDelta` treats activeDeviceId as
// an independent activation signal (it only dedupes against already-
// enabled tools), so any id that reaches it WILL inject local-system. The
// execution plan is the only authority on whether this session may touch
// a device — swallow the id for non-device-capable plans (`none`,
// `sandbox`) and for denied senders, even if `state.metadata.activeDeviceId`
// was populated by a bug or a mid-run side effect. Plans absent on old /
// resumed operations fall back to the policy-only gate.
const devicePolicy = state.metadata?.deviceAccessPolicy as
| { canUseDevice: boolean; reason: DeviceAccessReason }
| undefined;
const executionPlan = state.metadata?.executionPlan as ExecutionPlan | undefined;
const planAllowsDevice = !executionPlan || isDeviceCapablePlan(executionPlan);
const activeDeviceId =
devicePolicy?.canUseDevice === false ? undefined : state.metadata?.activeDeviceId;
devicePolicy?.canUseDevice === false || !planAllowsDevice
? undefined
: state.metadata?.activeDeviceId;
const operationToolSet: OperationToolSet = state.operationToolSet ?? {
enabledToolIds: [],
executorMap: state.toolExecutorMap ?? {},
@@ -660,7 +712,7 @@ export const createRuntimeExecutors = (
try {
type ContentPart = { text: string; type: 'text' } | { image: string; type: 'image' };
let shouldPersistAssistantReasoning = false;
let shouldReplayAssistantReasoning = false;
let preserveThinkingForPayload: boolean | undefined;
// Process messages through serverMessagesEngine to inject system role, knowledge, etc.
@@ -699,19 +751,21 @@ export const createRuntimeExecutors = (
modelSupportsPreserveThinkingFromCard ||
(!modelCard && providerSupportsPreserveThinkingFallback);
shouldPersistAssistantReasoning =
preserveThinkingRequested && modelSupportsPreserveThinking;
shouldReplayAssistantReasoning = preserveThinkingRequested && modelSupportsPreserveThinking;
preserveThinkingForPayload =
modelSupportsPreserveThinking && typeof preserveThinkingConfigured === 'boolean'
? preserveThinkingConfigured
: undefined;
const messagesForContext = shouldReplayAssistantReasoning
? (llmPayload.messages as UIChatMessage[])
: stripAssistantReasoningForReplay(llmPayload.messages as UIChatMessage[]);
// Extract <refer_topic> tags from messages and fetch summaries.
// Skip if messages already contain injected topic_reference_context
// (e.g., from client-side contextEngineering preprocessing) to avoid double injection.
let topicReferences;
const alreadyHasTopicRefs = (
llmPayload.messages as Array<{ content: string | unknown }>
messagesForContext as Array<{ content: string | unknown }>
).some(
(m) => typeof m.content === 'string' && m.content.includes('topic_reference_context'),
);
@@ -720,7 +774,7 @@ export const createRuntimeExecutors = (
const topicModel = new TopicModel(ctx.serverDB, ctx.userId, ctx.workspaceId);
const messageModel = new MessageModelClass(ctx.serverDB, ctx.userId, ctx.workspaceId);
topicReferences = await resolveTopicReferences(
llmPayload.messages as Array<{ content: string | unknown }>,
messagesForContext as Array<{ content: string | unknown }>,
async (topicId) => topicModel.findById(topicId),
async (topicId) => {
const topic = await topicModel.findById(topicId);
@@ -762,7 +816,7 @@ export const createRuntimeExecutors = (
agentConfig?.slug === 'web-onboarding' ||
resolved.enabledToolIds.includes('lobe-web-onboarding');
const alreadyHasOnboardingContext = (
llmPayload.messages as Array<{ content: string | unknown }>
messagesForContext as Array<{ content: string | unknown }>
).some((message) => {
if (typeof message.content !== 'string') return false;
@@ -1043,7 +1097,7 @@ export const createRuntimeExecutors = (
name: kb.name ?? '',
})),
},
messages: llmPayload.messages as UIChatMessage[],
messages: messagesForContext,
model,
provider,
systemRole: agentConfig.systemRole ?? undefined,
@@ -1071,14 +1125,14 @@ export const createRuntimeExecutors = (
CONTEXT_ENGINEERING_SPAN_NAME,
{
attributes: buildContextEngineeringAttributes({
hasImages: (llmPayload.messages as Array<{ content?: unknown }>).some(
hasImages: (messagesForContext as Array<{ content?: unknown }>).some(
(m) =>
Array.isArray(m.content) &&
(m.content as Array<{ type?: string }>).some((p) => p?.type === 'image_url'),
),
historyCompressed:
Array.isArray(llmPayload.messages) &&
llmPayload.messages.some((m: { role?: string }) => m?.role === 'compressedGroup'),
Array.isArray(messagesForContext) &&
messagesForContext.some((m: { role?: string }) => m?.role === 'compressedGroup'),
knowledgeCount:
(contextEngineInput.knowledge?.knowledgeBases?.length ?? 0) +
(contextEngineInput.knowledge?.fileContents?.length ?? 0),
@@ -1086,7 +1140,7 @@ export const createRuntimeExecutors = (
(contextEngineInput.knowledge?.knowledgeBases?.length ?? 0) > 0 ||
(contextEngineInput.knowledge?.fileContents?.length ?? 0) > 0,
memoryInjected: Boolean(contextEngineInput.userMemory?.memories),
messageCount: llmPayload.messages.length,
messageCount: messagesForContext.length,
operationId,
stepIndex,
systemRoleLength: contextEngineInput.systemRole?.length,
@@ -1639,9 +1693,10 @@ export const createRuntimeExecutors = (
};
}
const persistedReasoning = shouldPersistAssistantReasoning
? finalReasoning
: undefined;
// preserveThinking only gates whether reasoning is replayed into the
// next LLM payload (state.messages); the DB copy powers UI display
// after refresh and must always be saved.
const replayedReasoning = shouldReplayAssistantReasoning ? finalReasoning : undefined;
try {
// Build metadata object
@@ -1675,7 +1730,7 @@ export const createRuntimeExecutors = (
content: finalContent,
imageList: imageList.length > 0 ? imageList : undefined,
metadata: Object.keys(metadata).length > 0 ? metadata : undefined,
reasoning: persistedReasoning,
reasoning: finalReasoning,
search: grounding,
tools: persistedTools,
});
@@ -1708,7 +1763,7 @@ export const createRuntimeExecutors = (
newState.messages.push({
content,
id: assistantMessageItem.id,
reasoning: persistedReasoning,
reasoning: replayedReasoning,
role: 'assistant',
tool_calls: stateToolCalls,
});
@@ -176,6 +176,19 @@ describe('AgentRuntimeCoordinator', () => {
});
});
it('should not publish end event when status changes to waiting_for_async_tool because the same stream will resume', async () => {
const operationId = 'test-operation-id';
const previousState = { status: 'running', stepCount: 3 };
const newState = { status: 'waiting_for_async_tool', stepCount: 4 };
mockStateManager.loadAgentState.mockResolvedValue(previousState);
await coordinator.saveAgentState(operationId, newState as any);
expect(mockStateManager.saveAgentState).toHaveBeenCalledWith(operationId, newState);
expect(mockStreamManager.publishAgentRuntimeEnd).not.toHaveBeenCalled();
});
it('should not publish end event when status was already done', async () => {
const operationId = 'test-operation-id';
const previousState = { status: 'done', stepCount: 5 };
@@ -291,6 +304,22 @@ describe('AgentRuntimeCoordinator', () => {
});
});
it('should not publish end event when status becomes waiting_for_async_tool because deferred tools resume this operation', async () => {
const operationId = 'test-operation-id';
const stepResult = {
executionTime: 1000,
newState: { status: 'waiting_for_async_tool', stepCount: 4 },
stepIndex: 4,
};
mockStateManager.loadAgentState.mockResolvedValue({ status: 'running', stepCount: 3 });
await coordinator.saveStepResult(operationId, stepResult as any);
expect(mockStateManager.saveStepResult).toHaveBeenCalledWith(operationId, stepResult);
expect(mockStreamManager.publishAgentRuntimeEnd).not.toHaveBeenCalled();
});
it('should publish end event when status becomes interrupted', async () => {
const operationId = 'test-operation-id';
const stepResult = {
@@ -408,8 +408,11 @@ describe('RuntimeExecutors', () => {
);
});
describe('reasoning persistence gate', () => {
it('should persist assistant reasoning with tool calls when preserveThinking is enabled on a supported model', async () => {
// preserveThinking gates whether reasoning is replayed into the next LLM
// payload (state.messages). The DB copy powers UI display after refresh and
// is always persisted regardless of the gate.
describe('reasoning replay gate', () => {
it('should replay assistant reasoning with tool calls when preserveThinking is enabled on a supported model', async () => {
const toolCallPayload = [
{
function: { arguments: '{}', name: 'search' },
@@ -474,7 +477,7 @@ describe('RuntimeExecutors', () => {
);
});
it('should not persist assistant reasoning when preserveThinking is not enabled', async () => {
it('should persist reasoning to DB but not replay it when preserveThinking is not enabled', async () => {
const mockChat = vi.fn().mockImplementation(async (_payload, options) => {
await options?.callback?.onThinking?.('hidden reasoning');
await options?.callback?.onText?.('answer');
@@ -498,9 +501,14 @@ describe('RuntimeExecutors', () => {
const assistant = result.newState.messages.at(-1) as any;
expect(assistant.reasoning).toBeUndefined();
// DB persistence must NOT be gated — UI shows reasoning after refresh
expect(mockMessageModel.update).toHaveBeenCalledWith(
'msg-123',
expect.objectContaining({ reasoning: { content: 'hidden reasoning' } }),
);
});
it('should persist assistant reasoning when preserveThinking is enabled on a supported model', async () => {
it('should replay assistant reasoning when preserveThinking is enabled on a supported model', async () => {
const mockChat = vi.fn().mockImplementation(async (_payload, options) => {
await options?.callback?.onThinking?.('preserved reasoning');
await options?.callback?.onText?.('answer');
@@ -546,7 +554,7 @@ describe('RuntimeExecutors', () => {
);
});
it('should persist reasoning for unknown custom deployments on supported providers', async () => {
it('should replay reasoning for unknown custom deployments on supported providers', async () => {
const mockChat = vi.fn().mockImplementation(async (_payload, options) => {
await options?.callback?.onThinking?.('custom deployment reasoning');
await options?.callback?.onText?.('answer');
@@ -592,9 +600,9 @@ describe('RuntimeExecutors', () => {
);
});
it('should not persist reasoning when model does not declare preserveThinking capability', async () => {
it('should persist reasoning to DB but not replay it when model does not declare preserveThinking capability', async () => {
const mockChat = vi.fn().mockImplementation(async (_payload, options) => {
await options?.callback?.onThinking?.('reasoning that should not be saved');
await options?.callback?.onThinking?.('reasoning on an unsupported model');
await options?.callback?.onText?.('answer');
return new Response('done');
});
@@ -634,6 +642,13 @@ describe('RuntimeExecutors', () => {
expect.not.objectContaining({ preserveThinking: expect.any(Boolean) }),
expect.anything(),
);
// DB persistence must NOT be gated — UI shows reasoning after refresh
expect(mockMessageModel.update).toHaveBeenCalledWith(
'msg-123',
expect.objectContaining({
reasoning: { content: 'reasoning on an unsupported model' },
}),
);
});
});
@@ -771,9 +786,9 @@ describe('RuntimeExecutors', () => {
});
vi.mocked(initModelRuntimeFromDB).mockResolvedValueOnce({ chat: mockChat } as any);
// Reasoning only lands in the finalized message when preserveThinking is
// enabled on a supported model; otherwise it is intentionally dropped.
// Enable it here so this still guards reasoning_part capture (not drop).
// Reasoning is only replayed into state.messages when preserveThinking is
// enabled on a supported model. Enable it here so this asserts
// reasoning_part capture via the state replay path.
const ctxWithThinking: RuntimeExecutorContext = {
...ctx,
agentConfig: { chatConfig: { preserveThinking: true }, plugins: [], systemRole: 'test' },
@@ -1596,6 +1611,168 @@ describe('RuntimeExecutors', () => {
);
});
it('should strip stored assistant reasoning before context processing when replay gate is off', async () => {
const ctxWithConfig: RuntimeExecutorContext = {
...ctx,
agentConfig: {
plugins: [],
systemRole: 'test',
},
};
const executors = createRuntimeExecutors(ctxWithConfig);
const state = createMockState();
const messages = [
{
content: 'Previous answer',
reasoning: { content: 'stored reasoning should stay display-only' },
role: 'assistant',
},
{ content: 'Continue', role: 'user' },
];
await executors.call_llm!(
{
payload: {
messages,
model: 'gpt-4',
provider: 'openai',
},
type: 'call_llm' as const,
},
state,
);
const engineInput = engineSpy.mock.calls[0][0];
expect(engineInput.messages[0]).toEqual({
content: 'Previous answer',
role: 'assistant',
});
expect(messages[0]).toEqual(
expect.objectContaining({
reasoning: { content: 'stored reasoning should stay display-only' },
}),
);
});
it('should strip stored reasoning from grouped assistant messages before context processing when replay gate is off', async () => {
const ctxWithConfig: RuntimeExecutorContext = {
...ctx,
agentConfig: {
plugins: [],
systemRole: 'test',
},
};
const executors = createRuntimeExecutors(ctxWithConfig);
const state = createMockState();
const groupedChild = {
content: 'Grouped answer',
id: 'group-child-1',
reasoning: { content: 'grouped child reasoning should stay display-only' },
role: 'assistant',
};
const councilMember = {
content: 'Council member answer',
id: 'member-1',
reasoning: { content: 'member reasoning should stay display-only' },
role: 'assistant',
};
const nestedCouncilChild = {
content: 'Nested council answer',
id: 'member-child-1',
reasoning: { content: 'nested member reasoning should stay display-only' },
role: 'assistant',
};
const messages = [
{
children: [groupedChild],
content: '',
id: 'group-1',
role: 'assistantGroup',
},
{
content: '',
id: 'council-1',
members: [
councilMember,
{
children: [nestedCouncilChild],
content: '',
id: 'member-group-1',
role: 'assistantGroup',
},
],
role: 'agentCouncil',
},
{ content: 'Continue', role: 'user' },
];
await executors.call_llm!(
{
payload: {
messages,
model: 'gpt-4',
provider: 'openai',
},
type: 'call_llm' as const,
},
state,
);
const engineInput = engineSpy.mock.calls[0][0];
expect(engineInput.messages[0].children[0]).not.toHaveProperty('reasoning');
expect(engineInput.messages[1].members[0]).not.toHaveProperty('reasoning');
expect(engineInput.messages[1].members[1].children[0]).not.toHaveProperty('reasoning');
expect(groupedChild).toHaveProperty('reasoning');
expect(councilMember).toHaveProperty('reasoning');
expect(nestedCouncilChild).toHaveProperty('reasoning');
});
it('should keep stored assistant reasoning before context processing when replay gate is enabled', async () => {
const ctxWithConfig: RuntimeExecutorContext = {
...ctx,
agentConfig: {
chatConfig: { preserveThinking: true },
plugins: [],
systemRole: 'test',
},
};
const executors = createRuntimeExecutors(ctxWithConfig);
const state = createMockState({
modelRuntimeConfig: {
model: 'qwen3.6-plus',
provider: 'qwen',
},
});
await executors.call_llm!(
{
payload: {
messages: [
{
content: 'Previous answer',
reasoning: { content: 'reasoning to replay' },
role: 'assistant',
},
{ content: 'Continue', role: 'user' },
],
model: 'qwen3.6-plus',
provider: 'qwen',
},
type: 'call_llm' as const,
},
state,
);
const engineInput = engineSpy.mock.calls[0][0];
expect(engineInput.messages[0]).toEqual(
expect.objectContaining({
content: 'Previous answer',
reasoning: { content: 'reasoning to replay' },
role: 'assistant',
}),
);
});
it('should not call serverMessagesEngine when agentConfig is not set', async () => {
const executors = createRuntimeExecutors(ctx); // ctx without agentConfig
const state = createMockState();
@@ -508,12 +508,34 @@ describe('createServerAgentToolsEngine', () => {
expect(result.enabledToolIds).not.toContain(LocalSystemManifest.identifier);
});
it('should disable LocalSystem when runtimeMode is explicitly set to cloud', () => {
it('should disable LocalSystem when executionTarget is sandbox', () => {
const context = createMockContext();
const engine = createServerAgentToolsEngine(context, {
agentConfig: {
agencyConfig: { executionTarget: 'sandbox' },
plugins: [LocalSystemManifest.identifier],
},
canUseDevice: true,
deviceContext: { gatewayConfigured: true, deviceOnline: true, autoActivated: true },
model: 'gpt-4',
provider: 'openai',
});
const result = engine.generateToolsDetailed({
toolIds: [LocalSystemManifest.identifier],
model: 'gpt-4',
provider: 'openai',
});
expect(result.enabledToolIds).not.toContain(LocalSystemManifest.identifier);
});
it('should disable LocalSystem when executionTarget is none', () => {
const context = createMockContext();
const engine = createServerAgentToolsEngine(context, {
agentConfig: {
agencyConfig: { executionTarget: 'none' },
plugins: [LocalSystemManifest.identifier],
chatConfig: { runtimeEnv: { runtimeMode: { desktop: 'cloud' } } },
},
canUseDevice: true,
deviceContext: { gatewayConfigured: true, deviceOnline: true, autoActivated: true },
@@ -574,6 +596,50 @@ describe('createServerAgentToolsEngine', () => {
expect(result.enabledToolIds).not.toContain(RemoteDeviceManifest.identifier);
});
it('should disable RemoteDevice when executionTarget is none — 无设备 means NO device', () => {
const context = createMockContext();
const engine = createServerAgentToolsEngine(context, {
agentConfig: {
agencyConfig: { executionTarget: 'none' },
plugins: [RemoteDeviceManifest.identifier],
},
canUseDevice: true,
deviceContext: { gatewayConfigured: true },
model: 'gpt-4',
provider: 'openai',
});
const result = engine.generateToolsDetailed({
toolIds: [RemoteDeviceManifest.identifier],
model: 'gpt-4',
provider: 'openai',
});
expect(result.enabledToolIds).not.toContain(RemoteDeviceManifest.identifier);
});
it('should disable RemoteDevice when executionTarget is sandbox — sandbox and devices are mutually exclusive', () => {
const context = createMockContext();
const engine = createServerAgentToolsEngine(context, {
agentConfig: {
agencyConfig: { executionTarget: 'sandbox' },
plugins: [RemoteDeviceManifest.identifier],
},
canUseDevice: true,
deviceContext: { gatewayConfigured: true },
model: 'gpt-4',
provider: 'openai',
});
const result = engine.generateToolsDetailed({
toolIds: [RemoteDeviceManifest.identifier],
model: 'gpt-4',
provider: 'openai',
});
expect(result.enabledToolIds).not.toContain(RemoteDeviceManifest.identifier);
});
it('should disable RemoteDevice when device is already auto-activated', () => {
const context = createMockContext();
const engine = createServerAgentToolsEngine(context, {
@@ -28,7 +28,7 @@ import { ToolsEngine } from '@lobechat/context-engine';
import { type RuntimeEnvMode, type RuntimePlatform } from '@lobechat/types';
import debug from 'debug';
import { resolveRuntimeMode } from '@/helpers/executionTarget';
import { executionTargetToRuntimeMode, resolveExecutionTarget } from '@/helpers/executionTarget';
import {
buildAllowedBuiltinTools,
DEVICE_TOOL_IDENTIFIERS,
@@ -129,6 +129,7 @@ export const createServerAgentToolsEngine = (
canUseDevice = false,
deviceContext,
disableLocalSystem = false,
executionPlan,
globalMemoryEnabled = false,
hasAgentDocuments = false,
hasEnabledKnowledgeBases = false,
@@ -144,20 +145,26 @@ export const createServerAgentToolsEngine = (
// back to the caller) is removed.
const hasDeviceProxy = !!deviceContext?.gatewayConfigured;
// Platform key is used only to look up the user's per-platform
// `runtimeMode` preference. A server configured with a device-gateway is
// serving desktop-class users; otherwise the caller is treated as web.
// A server configured with a device-gateway is serving desktop-class users
// (the unset-target default resolves to `local`); otherwise the caller is
// treated as web.
const platform: RuntimePlatform = hasDeviceProxy ? 'desktop' : 'web';
// Tool gate derived from the single `agencyConfig.executionTarget` param
// (sandbox → cloud tools, local → local-system tools, device → gateway), with
// a no-regression fallback to the legacy per-platform `runtimeMode` for agents
// that predate `executionTarget`.
const runtimeMode: RuntimeEnvMode = resolveRuntimeMode(
agentConfig.agencyConfig,
agentConfig.chatConfig?.runtimeEnv?.runtimeMode?.[platform],
platform === 'desktop',
);
// Tool gate derived from the run's resolved execution plan (sandbox → cloud
// tools, local → local-system tools, device → gateway). Callers that don't
// resolve a plan (focused sub-agent engines) fall back to deriving the
// effective target from agencyConfig.
const executionTarget =
executionPlan?.target ??
resolveExecutionTarget(agentConfig.agencyConfig, {
isDesktop: platform === 'desktop',
});
const runtimeMode: RuntimeEnvMode = executionTargetToRuntimeMode(executionTarget);
// Device tools (local-system, remote-device proxy) only exist for
// device-capable targets. `none` means NO device — the proxy that could
// activate one mid-run must not be offered either; `sandbox` and devices
// are mutually exclusive.
const deviceCapable = executionTarget === 'local' || executionTarget === 'device';
const searchMode = agentConfig.chatConfig?.searchMode ?? 'auto';
const isSearchEnabled = searchMode !== 'off';
@@ -207,13 +214,14 @@ export const createServerAgentToolsEngine = (
// System-level rules (may override user selection for specific tools)
[CloudSandboxManifest.identifier]: runtimeMode === 'cloud',
[KnowledgeBaseManifest.identifier]: hasEnabledKnowledgeBases,
// Local-system: gated by `canUseDevice` (resolveDeviceAccessPolicy)
// first — keeps external bot senders out before runtime checks even
// run. Then user must have opted into local runtime on this platform
// (`runtimeMode === 'local'`) AND have an online, auto-activated
// device registered with the device-gateway.
// Local-system: the user must have opted into local runtime
// (`runtimeMode === 'local'`) AND have an online, auto-activated device
// registered with the device-gateway. Access policy (external bot
// senders) is enforced upstream: `resolveExecutionPlan` degrades denied
// targets to `none`, and `buildAllowedBuiltinTools` +
// `excludeIdentifiers` physically drop the manifest for
// `canUseDevice=false` turns.
[LocalSystemManifest.identifier]:
canUseDevice &&
!disableLocalSystem &&
runtimeMode === 'local' &&
hasDeviceProxy &&
@@ -222,16 +230,13 @@ export const createServerAgentToolsEngine = (
[MemoryManifest.identifier]: globalMemoryEnabled,
// Only auto-enable in bot conversations; otherwise let user's plugin selection take effect
...(isBotConversation && { [MessageManifest.identifier]: true }),
// Remote-device proxy: shown only when the server has a proxy but
// no specific device is auto-activated yet (user must pick).
//
// `canUseDevice` is the first short-circuit: external bot senders
// (and unconfigured bot owners) never reach the proxy, both because
// it would let them poke at the owner's machine AND because its
// systemRole would otherwise leak the device list into the LLM
// context — see the gated injection in `aiAgent.execAgent`.
// Remote-device proxy: shown only for device-capable targets when the
// server has a proxy but no specific device is auto-activated yet (user
// must pick). External bot senders never reach it: the plan degrades
// denied targets to `none` (→ not deviceCapable) and the physical
// manifest walls drop it for `canUseDevice=false` turns.
[RemoteDeviceManifest.identifier]:
canUseDevice && hasDeviceProxy && !deviceContext?.autoActivated,
deviceCapable && hasDeviceProxy && !deviceContext?.autoActivated,
[AgentDocumentsManifest.identifier]: hasAgentDocuments,
[WebBrowsingManifest.identifier]: isSearchEnabled,
};
@@ -1,10 +1,7 @@
import { type LobeToolManifest, type PluginEnableChecker } from '@lobechat/context-engine';
import {
type LobeAgentAgencyConfig,
type LobeBuiltinTool,
type LobeTool,
type RuntimeEnvConfig,
} from '@lobechat/types';
import { type LobeAgentAgencyConfig, type LobeBuiltinTool, type LobeTool } from '@lobechat/types';
import type { ExecutionPlan } from '@/helpers/executionTarget';
/**
* Installed plugin with manifest
@@ -67,7 +64,6 @@ export interface ServerCreateAgentToolsEngineParams {
* `plugins` and `alwaysOnToolIds`. Undefined / true agent mode.
*/
enableAgentMode?: boolean;
runtimeEnv?: RuntimeEnvConfig;
searchMode?: 'off' | 'on' | 'auto';
/**
* Overrides the `enableAgentMode` derivation. `custom` = the toolset is
@@ -97,6 +93,12 @@ export interface ServerCreateAgentToolsEngineParams {
};
/** Whether to suppress the local-system builtin while preserving other tools. */
disableLocalSystem?: boolean;
/**
* The run's resolved execution plan (see `resolveExecutionPlan`). When
* provided, its effective `target` drives the runtime tool gate; when
* omitted the engine derives the target from `agencyConfig` directly.
*/
executionPlan?: ExecutionPlan;
/** Whether the user's global memory setting is enabled */
globalMemoryEnabled?: boolean;
/** Whether agent has agent documents */
@@ -408,7 +408,7 @@ export const initModelRuntimeFromDB = async (
): Promise<ModelRuntime> => {
// 1. Get user's provider configuration from database
// NOTE: workspace-scoped ai_infra is deferred until the ai_infra surrogate-`_id`
// PK migration (LOBE-10056) lands; AiProviderModel stays personal-scoped for now.
// PK migration lands; AiProviderModel stays personal-scoped for now.
const aiProviderModel = new AiProviderModel(db, userId);
// Use getAiProviderById with KeyVaultsGateKeeper.getUserKeyVaults as decryptor
@@ -1032,6 +1032,93 @@ describe('aiChatRouter', () => {
}
});
it('marks input completion runtime 4xx errors to skip tRPC handler logging', async () => {
const { initModelRuntimeFromDB } = await import('@/server/modules/ModelRuntime');
const runtimeError = {
error: { message: 'rate limited' },
errorType: AgentRuntimeErrorType.RateLimitExceeded,
};
vi.mocked(initModelRuntimeFromDB).mockRejectedValueOnce(runtimeError);
const caller = aiChatRouter.createCaller({ ...mockCtx, serverDB: {} } as any);
try {
await caller.outputJSON({
messages: [{ content: 'test', role: 'user' }],
model: 'gpt-4o',
provider: 'openai',
tracing: { scenario: 'input_completion' },
});
throw new Error('Expected outputJSON to throw');
} catch (error) {
expect(error).toBeInstanceOf(TRPCError);
expect((runtimeError as any).__lobeSilentTRPCErrorLog).toBe(true);
}
});
it('does not mark non-input-completion runtime errors as silent', async () => {
const { initModelRuntimeFromDB } = await import('@/server/modules/ModelRuntime');
const runtimeError = {
error: { message: 'rate limited' },
errorType: AgentRuntimeErrorType.RateLimitExceeded,
};
vi.mocked(initModelRuntimeFromDB).mockRejectedValueOnce(runtimeError);
const caller = aiChatRouter.createCaller({ ...mockCtx, serverDB: {} } as any);
try {
await caller.outputJSON({
messages: [{ content: 'test', role: 'user' }],
model: 'gpt-4o',
provider: 'openai',
tracing: { scenario: 'topic_title' },
});
throw new Error('Expected outputJSON to throw');
} catch (error) {
expect(error).toBeInstanceOf(TRPCError);
expect((runtimeError as any).__lobeSilentTRPCErrorLog).toBeUndefined();
}
});
it('maps raw provider 4xx errors to BAD_REQUEST instead of internal errors', async () => {
const { initModelRuntimeFromDB } = await import('@/server/modules/ModelRuntime');
// Raw SDK APIError shape: carries an HTTP status but no errorType — the
// generateObject path rethrows upstream errors verbatim (e.g. a BYOK
// gateway rejecting response_format json_schema).
const providerError = Object.assign(
new Error(
'400 Error from provider (DeepSeek): This response_format type is unavailable now',
),
{ status: 400 },
);
const mockGenerateObject = vi.fn().mockRejectedValue(providerError);
vi.mocked(initModelRuntimeFromDB).mockResolvedValue({
generateObject: mockGenerateObject,
} as any);
const caller = aiChatRouter.createCaller({ ...mockCtx, serverDB: {} } as any);
try {
await caller.outputJSON({
messages: [{ content: 'test', role: 'user' }],
model: 'deepseek-v4-flash-free',
provider: 'opencodezen',
});
throw new Error('Expected outputJSON to throw');
} catch (error) {
expect(error).toBeInstanceOf(TRPCError);
expect(error).toMatchObject({
cause: providerError,
code: 'BAD_REQUEST',
message: providerError.message,
});
}
});
it('should handle tools parameter when provided', async () => {
const { initModelRuntimeFromDB } = await import('@/server/modules/ModelRuntime');
+48 -8
View File
@@ -1,5 +1,6 @@
import { randomUUID } from 'node:crypto';
import { TRACING_SCENARIOS } from '@lobechat/const';
import { getErrorCodeSpec } from '@lobechat/model-runtime';
import type { CreateMessageParams, SendMessageServerResponse } from '@lobechat/types';
import { AiSendMessageServerSchema, RequestTrigger, StructureOutputSchema } from '@lobechat/types';
@@ -28,6 +29,7 @@ const log = debug('lobe-lambda-router:ai-chat');
const { createPrefixedTimingContext, logTiming, runTimedStage } = createTimingHelpers(
'lobe-server:chat:lobehub:timing',
);
const SILENT_TRPC_ERROR_LOG_KEY = '__lobeSilentTRPCErrorLog';
type TRPCErrorCode = ConstructorParameters<typeof TRPCError>[0]['code'];
type TRPCStatusCode = Parameters<typeof getStatusKeyFromCode>[0];
@@ -49,16 +51,52 @@ const getTRPCErrorCodeFromStatus = (status: number): TRPCErrorCode => {
return 'INTERNAL_SERVER_ERROR';
};
const createRuntimeTRPCError = (error: unknown): TRPCError | undefined => {
const markSilentTRPCErrorLog = (error: unknown) => {
if (!error || typeof error !== 'object') return;
try {
Object.defineProperty(error, SILENT_TRPC_ERROR_LOG_KEY, {
configurable: true,
value: true,
});
} catch {
// Best-effort logging hint; never let it mask the original runtime error.
}
};
const createRuntimeTRPCError = (
error: unknown,
options?: { silentHandlerLog?: boolean },
): TRPCError | undefined => {
const errorType = getRuntimeErrorType(error);
const spec = getErrorCodeSpec(errorType);
if (!errorType || !spec) return;
if (errorType && spec) {
if (options?.silentHandlerLog && spec.httpStatus < 500) markSilentTRPCErrorLog(error);
return new TRPCError({
cause: error,
code: getTRPCErrorCodeFromStatus(spec.httpStatus),
message: errorType,
});
return new TRPCError({
cause: error,
code: getTRPCErrorCodeFromStatus(spec.httpStatus),
message: errorType,
});
}
// Raw provider SDK errors (OpenAI/Anthropic APIError) carry an HTTP status
// but no errorType — the generateObject path rethrows upstream errors
// verbatim. Without this mapping, tRPC classifies them as
// INTERNAL_SERVER_ERROR, so a user-channel 4xx (e.g. a BYOK provider
// rejecting the request) pollutes server 500 monitoring.
const status = (error as { status?: unknown } | undefined)?.status;
if (typeof status === 'number' && status >= 400 && status < 500) {
if (options?.silentHandlerLog) markSilentTRPCErrorLog(error);
return new TRPCError({
cause: error,
code: getTRPCErrorCodeFromStatus(status),
message: error instanceof Error ? error.message : `Provider error (${status})`,
});
}
return undefined;
};
const aiChatProcedure = wsCompatProcedure.use(serverDatabase).use(async (opts) => {
@@ -113,7 +151,9 @@ export const aiChatRouter = router({
},
);
} catch (error) {
const runtimeTRPCError = createRuntimeTRPCError(error);
const runtimeTRPCError = createRuntimeTRPCError(error, {
silentHandlerLog: input.tracing?.scenario === TRACING_SCENARIOS.InputCompletion,
});
if (runtimeTRPCError) throw runtimeTRPCError;
throw error;
@@ -31,7 +31,10 @@ exports[`configRouter > getGlobalConfig > Model Provider env > OPENAI_MODEL_LIST
"description": "The latest GPT-4 Turbo model includes vision. Vision requests can use JSON mode and function calling. GPT-4 Turbo is an enhanced version that balances accuracy and efficiency for cost-effective multimodal tasks and real-time interactions.",
"displayName": "gpt-4-32k",
"enabled": true,
"family": "gpt",
"generation": "gpt-4",
"id": "gpt-4-0125-preview",
"knowledgeCutoff": "2023-12",
"pricing": {
"units": [
{
@@ -66,7 +69,10 @@ exports[`configRouter > getGlobalConfig > Model Provider env > OPENAI_MODEL_LIST
"description": "GPT 3.5 Turbo for text generation and understanding; currently points to gpt-3.5-turbo-0125.",
"displayName": "GPT-3.5 Turbo 1106",
"enabled": true,
"family": "gpt",
"generation": "gpt-3.5",
"id": "gpt-3.5-turbo-1106",
"knowledgeCutoff": "2021-09",
"pricing": {
"units": [
{
@@ -96,7 +102,10 @@ exports[`configRouter > getGlobalConfig > Model Provider env > OPENAI_MODEL_LIST
"description": "GPT 3.5 Turbo for text generation and understanding; currently points to gpt-3.5-turbo-0125.",
"displayName": "GPT-3.5 Turbo",
"enabled": true,
"family": "gpt",
"generation": "gpt-3.5",
"id": "gpt-3.5-turbo",
"knowledgeCutoff": "2021-09",
"pricing": {
"units": [
{
@@ -125,7 +134,10 @@ exports[`configRouter > getGlobalConfig > Model Provider env > OPENAI_MODEL_LIST
"description": "GPT-4 provides a larger context window to handle longer inputs, suitable for broad information synthesis and data analysis.",
"displayName": "GPT-4",
"enabled": true,
"family": "gpt",
"generation": "gpt-4",
"id": "gpt-4",
"knowledgeCutoff": "2021-09",
"pricing": {
"units": [
{
@@ -154,7 +166,10 @@ exports[`configRouter > getGlobalConfig > Model Provider env > OPENAI_MODEL_LIST
"description": "GPT-4 provides a larger context window to handle longer inputs for scenarios needing broad information integration and data analysis.",
"displayName": "GPT-4 32K",
"enabled": true,
"family": "gpt",
"generation": "gpt-4",
"id": "gpt-4-32k",
"knowledgeCutoff": "2021-09",
"pricing": {
"units": [
{
@@ -183,7 +198,10 @@ exports[`configRouter > getGlobalConfig > Model Provider env > OPENAI_MODEL_LIST
"description": "The latest GPT-4 Turbo model includes vision. Vision requests can use JSON mode and function calling. GPT-4 Turbo is an enhanced version that balances accuracy and efficiency for cost-effective multimodal tasks and real-time interactions.",
"displayName": "GPT-4 Turbo Preview 1106",
"enabled": true,
"family": "gpt",
"generation": "gpt-4",
"id": "gpt-4-1106-preview",
"knowledgeCutoff": "2023-04",
"pricing": {
"units": [
{
@@ -223,7 +241,10 @@ exports[`configRouter > getGlobalConfig > Model Provider env > OPENAI_MODEL_LIST
"description": "The latest GPT-4 Turbo model includes vision. Vision requests can use JSON mode and function calling. GPT-4 Turbo is an enhanced version that balances accuracy and efficiency for cost-effective multimodal tasks and real-time interactions.",
"displayName": "GPT-4 Turbo Preview 1106",
"enabled": true,
"family": "gpt",
"generation": "gpt-4",
"id": "gpt-4-1106-preview",
"knowledgeCutoff": "2023-04",
"pricing": {
"units": [
{
+50 -4
View File
@@ -56,7 +56,19 @@ const oidcConfigSchema = z.object({
usePKCE: z.boolean().optional(),
});
/**
* Non-OAuth credentials the client may set directly when creating/updating a
* connector. OAuth2 tokens are intentionally excluded those are written only
* by the OAuth callback after a successful authorization exchange.
*/
const connectorCredentialsInputSchema = z.discriminatedUnion('type', [
z.object({ token: z.string().min(1), type: z.literal('bearer') }),
z.object({ apiKey: z.string().min(1), type: z.literal('apikey') }),
z.object({ headers: z.record(z.string()), type: z.literal('header') }),
]);
const createConnectorSchema = z.object({
credentials: connectorCredentialsInputSchema.optional(),
identifier: z.string().min(1).max(255),
isEnabled: z.boolean().optional().default(true),
mcpConnectionType: z
@@ -113,13 +125,36 @@ export const connectorRouter = router({
// ── Mutations ─────────────────────────────────────────────────────────────
create: connectorProcedure.input(createConnectorSchema).mutation(async ({ input, ctx }) => {
return ctx.connectorModel.create({
...input,
const fields = {
// The model expects the decrypted JSON string and encrypts it at rest.
credentials: input.credentials ? JSON.stringify(input.credentials) : null,
mcpConnectionType: input.mcpConnectionType ?? null,
mcpServerUrl: input.mcpServerUrl ?? null,
mcpStdioConfig: input.mcpStdioConfig ?? null,
metadata: input.metadata ?? null,
name: input.name,
oidcConfig: input.oidcConfig ?? null,
};
// Idempotent on (user_id, identifier): re-adding or re-authorizing the same
// connector updates the existing row instead of violating the unique index.
// Status resets to `disconnected` — the OAuth callback / tool sync promotes
// it back to `connected` on success.
const [existing] = await ctx.connectorModel.queryByIdentifiers([input.identifier]);
if (existing) {
await ctx.connectorModel.update(existing.id, {
...fields,
isEnabled: input.isEnabled ?? true,
status: ConnectorStatus.disconnected,
});
return { id: existing.id };
}
return ctx.connectorModel.create({
...fields,
identifier: input.identifier,
isEnabled: input.isEnabled ?? true,
sourceType: input.sourceType,
status: ConnectorStatus.disconnected,
});
}),
@@ -221,11 +256,22 @@ export const connectorRouter = router({
.input(
z.object({
id: z.string().uuid(),
patch: createConnectorSchema.partial().omit({ identifier: true, sourceType: true }),
patch: createConnectorSchema
.partial()
.omit({ identifier: true, sourceType: true })
// Allow `null` here so an edit can clear credentials (switch to no-auth).
.extend({ credentials: connectorCredentialsInputSchema.nullable().optional() }),
}),
)
.mutation(async ({ input, ctx }) => {
await ctx.connectorModel.update(input.id, input.patch as any);
const { credentials, ...patch } = input.patch;
await ctx.connectorModel.update(input.id, {
...patch,
// undefined → leave untouched; null → clear; object → encrypt the JSON string.
...(credentials === undefined
? {}
: { credentials: credentials ? JSON.stringify(credentials) : null }),
} as any);
}),
delete: connectorProcedure
+15
View File
@@ -270,6 +270,21 @@ export const deviceRouter = router({
return result ?? null;
}),
/**
* Read-only local file preview for a file on a remote device. The web client
* receives render data, not a `localfile://` URL; saving remains unsupported.
*/
getLocalFilePreview: deviceProcedure
.input(z.object({ deviceId: z.string(), path: z.string(), workingDirectory: z.string() }))
.query(async ({ ctx, input }) =>
deviceGateway.getLocalFilePreview({
deviceId: input.deviceId,
path: input.path,
userId: ctx.userId,
workingDirectory: input.workingDirectory,
}),
),
/**
* Project skills (`.agents/skills` / `.claude/skills`) for a directory on a
* remote device, via the device's `listProjectSkills` RPC. Powers the
@@ -616,6 +616,8 @@ export class AgentDocumentsService {
const filtered =
sourceType && sourceType !== 'all' ? docs.filter((d) => d.sourceType === sourceType) : docs;
return filtered.map((d) => ({
...deriveAgentDocumentFields(d),
description: d.description,
documentId: d.documentId,
fileType: d.fileType,
filename: d.filename,
@@ -623,7 +625,9 @@ export class AgentDocumentsService {
loadPosition: d.policy?.context?.position,
parentId: d.parentId,
sourceType: d.sourceType,
templateId: d.templateId,
title: d.title,
updatedAt: d.updatedAt,
}));
}
@@ -642,6 +646,8 @@ export class AgentDocumentsService {
.filter((doc): doc is AgentDocumentWithRules => Boolean(doc))
.filter((doc) => !sourceType || sourceType === 'all' || doc.sourceType === sourceType)
.map((doc) => ({
...deriveAgentDocumentFields(doc),
description: doc.description,
documentId: doc.documentId,
fileType: doc.fileType,
filename: doc.filename,
@@ -649,7 +655,9 @@ export class AgentDocumentsService {
loadPosition: doc.policy?.context?.position,
parentId: doc.parentId,
sourceType: doc.sourceType,
templateId: doc.templateId,
title: doc.title,
updatedAt: doc.updatedAt,
}));
}

Some files were not shown because too many files have changed in this diff Show More