Compare commits

...

316 Commits

Author SHA1 Message Date
Arvin Xu def6029d30 ️ perf: write message mutations through to the message:list SWR cache
Message mutations only touched the in-memory store, so the message:list
SWR/IndexedDB cache stayed stale until a network refetch. Because the
Conversation store is recreated on every topic/session switch and
re-hydrates from that cache, the stale cache is what forced a refetch on
every switch.

replaceMessages now seeds the message:list cache for the exact bucket via
mutate(matcher, messages, { revalidate: false }). Skips the
useFetchMessages onData sync path (SWR already holds it) and skips while
the context is streaming to avoid per-token IndexedDB thrash; the
agent_runtime_end snapshot still writes through since it clears the
running flag first.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 20:12:06 +08:00
Arvin Xu 10b3cbda3e ♻️ refactor(agent): run callAgent as deferred tool (#15765)
* ♻️ refactor(agent): delegate callAgent via server runner

* ♻️ refactor(agent): run callAgent as deferred tool

*  test(agent): cover server callAgent deferred flow
2026-06-16 14:29:36 +08:00
Arvin Xu b6ad1ed4be ♻️ refactor(conversation-flow): role-aware dual-form message-chain reader (#15908)
* ♻️ refactor(conversation-flow): role-aware dual-form message-chain reader

Make the read side role-aware so both persisted chain shapes parse to
equivalent display output (LOBE-10445 phase 1):

- tool-anchored (legacy): next step's assistant hangs off the previous
  step's last tool result
- assistant-anchored (new): next step's assistant hangs off the most
  recent non-tool message, so a tool result and the next assistant are
  siblings under one assistant

Two invariants drive a single reader: a `tool` message is always inline
data of its assistant; a branch is >=2 non-tool siblings under one parent.
The continuation walk now looks for the next spine assistant among the
assistant's own non-tool children as well as its tools' children;
group detection keys on "has >=1 tool child"; branch detection counts
non-tool children only.

Pure read-side, no write-path change — ships independently. Verified
against 5 fixture classes (old / new / mixed / parallel-tool /
regenerate-branch) asserting flatList + contextTree parity.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(conversation-flow): guard assistant-anchored chain continuation

Address two edge cases in the dual-form continuation finder where seeding
the candidate set with the assistant's own id (the new-form path) bypassed
logic the per-tool path already had:

1. Regenerated continuation: a tool-using assistant can have two non-tool
   assistant children beside its tool result. The finder flattened all
   candidates and returned the earliest, ignoring the parent's
   activeBranchIndex and dropping the other branch. Route >1 non-tool-child
   sets through BranchResolver before picking a linear continuation.

2. Async-task summary: when a tool spawned tasks but the follow-up summary
   uses the assistant-anchored parent (summary.parentId === assistant.id),
   the assistant seed bypassed the task/AgentCouncil fan-out guard and the
   summary got folded into the AssistantGroup before the tasks aggregation.
   Apply the same fan-out guard to the assistant-anchored candidate so the
   group -> tasks -> summary order is preserved.

Both the flat (findFlatChainContinuation) and tree (findChainContinuationNode)
variants share a resolveActiveContinuationId helper; BranchResolver is now
injected into MessageCollector. Adds two fixtures (⑥ regenerated branch,
⑦ async-task summary). conversation-flow: 143 passed, type-check clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:26:28 +08:00
Arvin Xu 5c5a719186 🐛 fix(device): lock a run to the explicitly selected device, never offer device-switching (#15914)
When a device is explicitly selected (`boundDeviceId`), the run must stay on
it and the model must never be able to activate / switch to another machine.

The remote-device (activate-device) tool was gated only by `!autoActivated`.
That left a hole: when the selected device went OFFLINE the plan became
`device-unrouted`, `autoActivated` flipped to false, and the activate-device
tool resurfaced — letting the model silently hop onto a *different* online
device. That is exactly the "auto-replace to another device after offline"
behavior we want gone.

Also suppress the tool whenever `boundDeviceId` is set, regardless of online
status. An explicitly selected device now locks the run: the tool is never
offered, so the run stays unrouted until that device comes back instead of
switching machines. The unbound case is unchanged — the tool is still offered
so the model/user can pick a device.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 14:24:55 +08:00
LobeHub Bot 3d594e77f5 🌐 chore: translate non-English comments to English in openapi-remaining-services (#15913)
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-16 13:43:26 +08:00
Arvin Xu ac8f324a3b chore: remove LOBE-XXX markers from code comments (2026-06-16) (#15905)
chore: remove LOBE-XXX markers from code comments

Replace LOBE-XXX ticket references in code comments with descriptive
context from the corresponding Linear issues. The markers served as
internal tracking anchors during development but are inappropriate
for the open-source codebase.

Files changed:
- AgentRuntimeService.ts: LOBE-10385 → async sub-agent suspend/resume
  stability hardening context
- observability-otel/agent-runtime/index.ts: same LOBE-10385 context
- buildRunLifecycle.ts: LOBE-10378/10379/10382 → run lifecycle and
  transport unification context
- streamingExecutor.ts: LOBE-10378 reference removed
- modelExtendParams.test.ts: LOBE-10442 → Gemini 3 Pro reasoning token
  context

Co-authored-by: Arvin Xu <arvinxx@users.noreply.github.com>
2026-06-16 13:38:09 +08:00
René Wang 53e82d2e13 📝 docs: add June 15 weekly changelog (#15907)
* 📝 docs: add June 15 weekly changelog

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 docs: restructure last 12 changelog entries into Features/Improvements/Fixes

Normalize section headings, split improvements from fixes, plainer wording, and fewer em-dashes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 📝 docs: remove Claude Fable 5 from June 15 changelog

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🍱 docs: add cover image for June 15 changelog

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 12:00:03 +08:00
LiJian bbbe1a96d7 🐛 fix(connector): restore credentials in edit mode, prevent silent wipe on save (#15909)
* 🐛 fix(agent-builder): correct target agentId and refresh sidebar in gateway mode

In gateway mode, AgentBuilder's tool calls (updateConfig / updatePrompt /
installPlugin) were targeting the builtin builder agent instead of the agent
being edited, and the left-sidebar never refreshed after a successful write.

Two root causes fixed:

1. **Wrong agentId on server** — `executeGatewayAgent` only sent `context.agentId`
   (= the AgentBuilder builtin) to the server. The editing target was only held
   in `chatStore.activeAgentId` (synced by AgentBuilderProvider) but never
   forwarded. Now, when `scope === 'agent_builder'`, the client sends
   `appContext.editingAgentId = chatStore.activeAgentId`. The tRPC Zod schema
   and `ExecAgentAppContext` type both accept the new field, and
   `aiAgent/index.ts` uses it to override the operation's `agentId` so
   `state.metadata.agentId` (and therefore `ctx.agentId` in the server
   executor) points to the correct editing target.

2. **No sidebar refresh** — In client mode the `AgentManagerRuntime` directly
   calls `agentStore.optimisticUpdateAgentConfig()`, which triggers a Zustand
   re-render. In gateway mode the update happens server-side so no Zustand
   mutation ever fires. Fixed by adding an `onAfterCall` hook to
   `AgentBuilderExecutor`: after any successful write it reads the editing
   agent ID from `chatStore.activeAgentId` and calls
   `getAgentStoreState().internal_refreshAgentConfig()` to re-fetch and
   re-render the sidebar.

Closes LOBE-10441

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(agent-builder): isolate editingAgentId to avoid message ownership desync

Per code review: the previous fix overrode state.metadata.agentId with the
editing target, but messages are already written with persistAgentId =
resolvedAgentId (the builder builtin). AgentRuntimeService.queryUiMessages
reads metadata.agentId to filter messages, so overriding it would cause the
gateway handler to snapshot the wrong topic and desync the builder conversation.

Correct approach: keep agentId as the builder builtin throughout. Carry
editingAgentId as a separate metadata field that only flows through to
ToolExecutionContext, where the AgentBuilder server runtime reads it via
ctx.editingAgentId ?? ctx.agentId. No other part of the pipeline is affected.

Changes:
- apps/server/src/services/aiAgent/index.ts: revert agentId override; keep
  editingAgentId as an independent appContext field (conditional spread)
- apps/server/src/services/toolExecution/types.ts: add editingAgentId to
  ToolExecutionContext
- apps/server/src/modules/AgentRuntime/RuntimeExecutors.ts: forward
  state.metadata.editingAgentId into the ToolExecutionContext
- apps/server/src/services/toolExecution/serverRuntimes/agentBuilder.ts:
  use ctx.editingAgentId ?? ctx.agentId in all three write methods

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): restore credentials in edit mode, prevent silent wipe on save

Three bugs caused custom connector headers / bearer tokens to be lost silently:

1. Dead-code branch in edit-mode save: `authType === 'header'` could never be
   true (the auth radio only has none/bearer/oauth2), so every save with
   `authType === 'none'` hit `patch.credentials = null` and wiped whatever
   was stored — including valid header credentials.  Fixed by mirroring the
   create-mode logic: `authType !== 'oauth2'` → check Advanced headers → save
   `{type:'header'}` if present, null otherwise.

2. `list` API strips credentials entirely, so `editValue` always computed
   `authType = 'none'` and `headers = undefined`, leaving the edit form blank
   even when credentials were saved.  Added `getForEdit` tRPC query that
   returns the decrypted user-set credentials (bearer token, custom headers)
   while still excluding machine-managed OAuth tokens and DCR client secrets.
   `CustomConnectorModal` now fetches this on open and builds `editValue`
   from the real data.

3. `DevModal` seeded the form once on mount (`useEffect([], [])`).  Since
   credentials are loaded asynchronously after open, the form was already
   seeded with empty data before the fetch completed.  Changed to a
   `seededRef`-guarded effect on `[open, value]`: resets on close, seeds once
   when the value arrives, and never overwrites user edits mid-session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-16 11:06:17 +08:00
Arvin Xu a94b3a4ce2 ️ perf: optimize agent document list query (#15904) 2026-06-16 10:47:19 +08:00
Innei 30a62ab478 🐛 fix(desktop): remove web onboarding aliases (#15902) 2026-06-16 10:42:27 +08:00
LiJian d3fbc19473 🐛 fix(agent-builder): correct target agentId and refresh sidebar in gateway mode (#15888)
* 🐛 fix(agent-builder): correct target agentId and refresh sidebar in gateway mode

In gateway mode, AgentBuilder's tool calls (updateConfig / updatePrompt /
installPlugin) were targeting the builtin builder agent instead of the agent
being edited, and the left-sidebar never refreshed after a successful write.

Two root causes fixed:

1. **Wrong agentId on server** — `executeGatewayAgent` only sent `context.agentId`
   (= the AgentBuilder builtin) to the server. The editing target was only held
   in `chatStore.activeAgentId` (synced by AgentBuilderProvider) but never
   forwarded. Now, when `scope === 'agent_builder'`, the client sends
   `appContext.editingAgentId = chatStore.activeAgentId`. The tRPC Zod schema
   and `ExecAgentAppContext` type both accept the new field, and
   `aiAgent/index.ts` uses it to override the operation's `agentId` so
   `state.metadata.agentId` (and therefore `ctx.agentId` in the server
   executor) points to the correct editing target.

2. **No sidebar refresh** — In client mode the `AgentManagerRuntime` directly
   calls `agentStore.optimisticUpdateAgentConfig()`, which triggers a Zustand
   re-render. In gateway mode the update happens server-side so no Zustand
   mutation ever fires. Fixed by adding an `onAfterCall` hook to
   `AgentBuilderExecutor`: after any successful write it reads the editing
   agent ID from `chatStore.activeAgentId` and calls
   `getAgentStoreState().internal_refreshAgentConfig()` to re-fetch and
   re-render the sidebar.

Closes LOBE-10441

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(agent-builder): isolate editingAgentId to avoid message ownership desync

Per code review: the previous fix overrode state.metadata.agentId with the
editing target, but messages are already written with persistAgentId =
resolvedAgentId (the builder builtin). AgentRuntimeService.queryUiMessages
reads metadata.agentId to filter messages, so overriding it would cause the
gateway handler to snapshot the wrong topic and desync the builder conversation.

Correct approach: keep agentId as the builder builtin throughout. Carry
editingAgentId as a separate metadata field that only flows through to
ToolExecutionContext, where the AgentBuilder server runtime reads it via
ctx.editingAgentId ?? ctx.agentId. No other part of the pipeline is affected.

Changes:
- apps/server/src/services/aiAgent/index.ts: revert agentId override; keep
  editingAgentId as an independent appContext field (conditional spread)
- apps/server/src/services/toolExecution/types.ts: add editingAgentId to
  ToolExecutionContext
- apps/server/src/modules/AgentRuntime/RuntimeExecutors.ts: forward
  state.metadata.editingAgentId into the ToolExecutionContext
- apps/server/src/services/toolExecution/serverRuntimes/agentBuilder.ts:
  use ctx.editingAgentId ?? ctx.agentId in all three write methods

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-16 10:19:30 +08:00
Arvin Xu 11f0083074 💄 style(chat): add breathing room around message refresh hint (#15906)
* 💄 style(chat): add breathing room around message refresh hint

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(chat): keep refresh hint top flush, widen bottom gap to 24px

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 10:19:21 +08:00
Arvin Xu 89b74fd7eb 🐛 fix(chat): show cached message refresh hint (#15901)
* 🐛 fix(chat): show cached message refresh hint

* 🐛 fix(chat): show refresh hint for store-backed cache

* 🐛 fix(chat): wait for model config before agent notice
2026-06-16 01:47:06 +08:00
Arvin Xu 68aaa2f6f2 🐛 fix(agent-runtime): forward model extend params on server-side agent runtime (#15891)
* 🐛 fix(agent-runtime): forward model extend params on server-side agent runtime

Share the model extend-params resolution between the client chat service and
the server-side agent runtime so reasoning/thinking params (e.g. Gemini's
thinkingLevel) actually reach the request. Previously only the client resolved
them, so server-driven agent runs returned empty thought summaries.

- extract applyModelExtendParams into @lobechat/model-runtime
- client resolveModelExtendParams delegates to the shared core
- server RuntimeExecutors resolves extendParams (with canonical-card fallback
  for aggregation providers like lobehub) and forwards them in the payload

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(agent-runtime): mock applyModelExtendParams in agent-runtime suites

The executor now imports applyModelExtendParams from @lobechat/model-runtime,
which these suites mock as a fixed object. Add the new named export (returning
an empty result, preserving prior payload behavior) so the mocked module
resolves and call_llm can run.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 01:37:49 +08:00
Arvin Xu 906e385e8a feat(desktop): support approved external local file previews (#15895)
*  feat(desktop): support approved external local file previews

* 🐛 fix: keep external local tabs in close scope
2026-06-16 01:24:47 +08:00
AmAzing- 91d8025421 💄 style(agent): clarify workspace copy and move actions (#15897) 2026-06-16 01:13:04 +08:00
Innei 3e261ca2c9 🐛 fix(sidebar): anchor spacer immediately after the accordion block (#15871)
* 🐛 fix(sidebar): anchor spacer immediately after the accordion block

The home sidebar spacer (`__spacer__`) drifts away from the recents+agent
accordion block in two reachable cases: (1) the dropdown-menu "move recents/
agent up/down" leaves the spacer floating above the accordion, and the
CustomizeSidebarModal then silently relocates it on the next drag; (2)
`withAllKnownKeys` appends every missing default to the tail, so any future
top-group default would land in the bottom group for existing users.

Enforce a single invariant in the selector: the spacer always sits right
after the last accordion item. `normalizeSpacerPosition` re-anchors on read
so legacy state self-heals, `withAllKnownKeys` splits backfilled defaults
into top vs bottom by their position in `DEFAULT_SIDEBAR_ITEMS`, and
`reorderSidebarItems` normalizes its result and returns the input reference
when the move is a visible no-op so callers' `next === items` short-circuit
still fires.

* 🐛 fix(sidebar): keep customize drag overlay within modal context

* 🐛 fix(sidebar): apply customization after confirm
2026-06-16 01:10:33 +08:00
Innei 2746a4b454 🐛 fix(locale): align dayjs locale imports (#15896) 2026-06-16 01:10:21 +08:00
Arvin Xu a15ef2e19d feat(git): add git worktree listing across local and gateway paths (#15889)
Add a listGitWorktrees read that powers a worktree picker on both the
local desktop (IPC) and remote device (gateway) paths, mirroring the
existing branch/working-tree read plumbing.

- local-file-shell: parse `git worktree list --porcelain -z`, mark the
  current worktree and attach dirty-file status per worktree
- desktop GitController IpcMethod + electron client service
- deviceGateway.listGitWorktrees + device.listGitWorktrees TRPC procedure
- DeviceGitWorktreeListItem type + useFetchGitWorktrees SWR hook

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 01:02:00 +08:00
Arvin Xu 7bceba5c19 feat(fleet): lab-gated Fleet running-tasks dashboard (#15817)
*  feat(fleet): add lab-gated Fleet running-tasks dashboard

Side-by-side board of all running tasks across the account. The running-task
list is portaled into the NavPanel (replacing the standard nav rail), and each
task renders as a resizable, reorderable conversation column with its own
ChatInput. Columns default to every running task, support drag reorder, width
resize (persisted), close and a "+" to re-add. Gated behind the `enableFleet`
lab flag (Settings → Advanced → Labs); the title-bar entry is hidden by default.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(fleet): add store unit tests for column reorder/add/remove/width

Covers seedColumns (seed-once), addColumn (dedupe), removeColumn,
reorderColumns (the dnd onDragEnd path) and setWidth.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(fleet): running-topic data source, X-axis drag lock, hover border, full-width back bar

- Data source switched from the task running-group to all topics whose
  status is `running` (one column per running topic). getAllTopics is
  filtered client-side; a server-side getRunningTopics query is a planned
  follow-up for accounts with many topics.
- Reorder drag is now locked to the horizontal axis (inline dnd-kit modifier).
- The resize-handle highlight only shows when hovering the handle itself,
  not anywhere on the column.
- Back-to-home now lives in a full-width SideBarHeaderLayout top bar.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(fleet): server-side queryTopics, drag border, loading-state input

- Replace getAllTopics with a `queryTopics` query that filters by status
  server-side (topicModel.queryTopics + lambda TRPC + topicService). The
  board now pulls only running topics instead of the full topic set, and the
  unused getAllTopics procedures (lambda + mobile) and queryAll are removed.
- Dragging a column shows a primary border ring instead of dimming the column.
- ChatInput renders its loading skeleton while the column's messages load.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 fix(fleet): reuse StatusDot, topic-first header, on-demand reply, create-task entry

- Status: reuse the app's StatusDot (running = warning spinner) instead of a
  bespoke badge; drop StatusBadge/status.ts.
- Column header: topic title is now the primary line; agent name + avatar +
  status moved to a smaller secondary line.
- Reply: each column's always-on ChatInput is replaced by a "Reply" button
  that reveals the input on click (lower pressure).
- Sidebar: add a "Create task" button (createTaskModal) above the list.
- Drag: dragging a column shows a fill tint + 1px border instead of the 2px
  primary ring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(fleet): observation tab, status spinner reuse, column menu/op-tray/workdir, agent-picker add

- Create-task entry moved into the sidebar header (next to the title).
- Column "open in chat" icon replaced by a ⋯ menu; the action now opens a
  new Electron tab via electron addTab.
- Fleet route shows "Observation Mode" as its tab title (fleetRouteMeta).
- Each column shows its topic working directory + live git branch under the
  agent name (useFetchGitInfo).
- Dragging a column is opaque now (solid bg + 1px border), not see-through.
- OpStatusTray added to each column to surface running-op progress / tokens.
- Trailing "+" opens an agent picker (AssigneeAgentSelector); selecting an
  agent creates a fresh topic and opens it as a new column. Empty board keeps
  the "+" available.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(fleet): align topic creation params

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 00:43:55 +08:00
Arvin Xu 984815cfc6 🐛 fix(file): only enforce chat upload file-type whitelist in chat mode (#15884)
* 🐛 fix(file): only enforce chat upload file-type whitelist in chat mode

The chat upload file-type whitelist rejects files that agents can readily
parse via tool calls (zip, html, provisioning profiles, files without an
extension, etc.), which hurts agent and heterogeneous-agent workflows where
the whitelist adds no value.

Scope the whitelist to plain chat mode only: `uploadChatFiles` now takes the
conversation's agent id and skips type validation when that agent has agent
mode enabled or is heterogeneous (Claude Code / Codex, etc.). The decision is
keyed off the input/conversation agent id via the by-id selectors rather than
the global current agent, because the chat input can be scoped to a different
agent than activeAgentId (e.g. another desktop tab). Closes #15770.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🚨 fix: sort imports in file chat action to satisfy lint

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 00:41:48 +08:00
AmAzing- b873f26a8c 🐛 fix(agent-signal): preserve preference memory receipt routing (#15892) 2026-06-16 00:03:39 +08:00
Arvin Xu 294400383d feat(group): server-side group orchestration (call agent member) (#15870)
*  feat(group): add server-side group orchestration (call agent member)

Mirror the client GroupOrchestrationRuntime on the server: the supervisor's
own durable QStash operation drives the loop, with lobe-group-management
registered as a server deferred tool. speak/broadcast/delegate run members in
the shared group session via execAgentMember; executeAgentTask(s) reuse the
isolated sub-agent thread. A K=N member barrier backfills the group tool
message and resumes (or finishes, for skipCallSupervisor/delegate) the parked
supervisor through the existing async-tool bridge + CAS. Adds the
group-member-callback QStash webhook for queue mode.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(group): address PR review (finish disposition, ephemeral prompt, task timeout)

- finish-vs-resume now scans ALL pending tools, not just pending[0], so a
  group skipCallSupervisor/delegate call that isn't the first deferred tool in
  a batched turn no longer wrongly schedules a resume.
- in-group member instructions are injected as ephemeral LLM context
  (execAgent: suppressUserMessage + new ephemeralUserMessage) instead of being
  persisted as real `role: 'user'` group messages — matches the client's
  virtual supervisor instruction.
- isolated executeAgentTask(s) now enforce the requested timeout: a watchdog
  interrupts the member and bridges a `timeout` completion so the supervisor
  resumes/finishes instead of staying parked indefinitely.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 23:14:44 +08:00
Arvin Xu 9b93c47415 🐛 fix(hetero): anchor server-side main chain to run's real last tool (#15883)
Server-triggered heterogeneous-agent runs forked the message chain on a
remote-device WS reconnect: several consecutive, distinct main-agent steps
all parented onto the run's FIRST tool message instead of chaining linearly,
leaving orphan sibling assistants.

The chain rule (`computeTurnParentId = lastToolMsgIdEver ?? currentAssistantId`)
relies on in-memory reducer state. On a non-sticky / cold replica the state is
rebuilt from DB by `refreshMainStateFromDb`, which anchored off
`getLastChildToolMessageId(currentAssistantId)`. When `heteroCurrentMsgId` is
not yet bound to the operation, `currentAssistantId` regresses to the seeded
placeholder assistant, so the anchor collapses to the seed's first child tool
and every later step opens off that same node. The class already documents the
"must be sticky to a single replica" caveat — the remote-device path breaks it.

Anchor the chain to the run's real latest main-thread tool instead, read from
the DB and ordered by createdAt, independent of currentAssistantId. Scope to
the run via the seed assistant's createdAt floor (messages carry no operationId,
and a topic runs at most one operation at a time). This also sidesteps the
multi-tool-batch hazard where an earlier tool's result_msg_id is backfilled
before a later tool row's JSONB is rewritten.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:52:02 +08:00
LobeHub Bot f341507fa9 🌐 chore: translate non-English comments to English in src-helpers (#15856)
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 22:04:13 +08:00
Arvin Xu eb31b7d8b9 feat(electron): open a new Home tab from the tab bar "+" button (#15825)
Previously the "+" button ran the active route's createNewTab handler, so
on an agent/group/page tab it created a new topic/page of that same kind.
Make it always open Home instead, and remove the now-dead createNewTab
route-meta machinery.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 22:03:33 +08:00
Arvin Xu 97df98b269 fix: remove ParamsPanelToggle (control icon) from chat header (#15860)
fix: remove ParamsPanelToggle icon from chat header
2026-06-15 22:02:27 +08:00
Arvin Xu cdb0280cd6 ♻️ refactor(agent): scope agent conversation subtree to explicit agentId (#15866)
* 🐛 fix(agent): stop background config fetch from hijacking the active agent

Switching to or opening an agent tab could flash the conversation
header/welcome back to the inbox "Lobe AI" identity. Two causes:

- `useFetchAgentConfig.onData` set the global `activeAgentId` to whatever
  config resolved, so a background/secondary fetch (the inbox config from
  the home input, a side-panel copilot, or another open tab) hijacked the
  routed agent. It now only adopts the fetched agent when none is active;
  route-level sync (AgentIdSync on desktop/mobile, the popup pages' own
  setState) owns `activeAgentId`.
- `AgentInfo` (the agent conversation welcome) read the global
  `currentAgentMeta` / `isInboxAgent`. Scope it to the conversation's agent
  via `useConversationStore(contextSelectors.agentId)` + `*ById` selectors,
  so it renders the routed agent even if the global races.

Also remove the dead `Conversation/AgentWelcome/{index,OpeningQuestions}`
(the conversation welcome is `AgentHome`/`AgentInfo`; this variant was
unreferenced).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(agent): scope agent conversation subtree to explicit agentId (LOBE-10402 phase 1)

Replace global `current*` selectors (which implicitly read the hijack-prone
`agentStore.activeAgentId`) with `*ById(agentId)` in the agent conversation
subtree and two shared features. The agentId is sourced explicitly:

- inside the ConversationProvider → `useConversationStore(contextSelectors.agentId)`
  (MainChatInput, AgentConfigError, HeterogeneousChatInput, ToolAuthAlert, TTS,
  ShareImage, History)
- ConversationArea → its own `context.agentId`
- above the provider → `useChatStore(s => s.activeAgentId)` (route-driven via
  AgentIdSync) — ChatConversation, AgentSummary
- already-available id → prop (AgentTopicManager/Header) or resolved context
  (ShareModal/ShareDataProvider)

Add the missing `getAgentTTSVoiceById` and `getAgentConfigErrorById` byId
selectors (+ tests). The `current*` selectors are left in place for now; they
are removed in the final phase once every caller is migrated.

Refs LOBE-10402.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent): pass the scoped conversation agentId into the hetero guards

`useHeteroAgentCloudConfig` and `useRemoteAgentDeviceGuard` read the global
`activeAgentId` internally, so when the conversation agent differs from it
(the tab-hijack scenario), the cloud-credential and bound-device checks
validated a different agent than the one `agencyConfig`/`isDeviceExecution`
were computed from — the input could be enabled without the routed Claude Code
agent's credential check, or blocked with the wrong device status.

Both hooks now take the conversation `agentId` explicitly and read that agent's
agencyConfig by id, keeping every hetero check on the same routed agent.

Refs LOBE-10402.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:58:42 +08:00
Arvin Xu 45dfc4cf87 💄 style(thread-list): cap nested thread list height with scroll overflow (#15861)
* 💄 style(thread-list): cap nested thread list height with scroll overflow

When an active topic has many threads, the nested list grows unbounded and
pushes the rest of the topic list off-screen. Cap it at ~9 rows and scroll
the overflow within the list itself.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(thread-list): scroll the active thread into view in the capped list

The new max-height scroll container always mounts at scrollTop=0, so a thread
restored from the ?thread= query that sits below the visible rows stayed out of
view — and since the topic row isn't highlighted while a thread is active, the
sidebar showed no selection at all.

Add a shared useScrollActiveThreadIntoView hook that nudges the capped list so
the active row (marked via data-thread-id) is visible, keyed off the list-ready
signal so it also fires once async-fetched threads mount. Wired into both the
agent and group ThreadList variants.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 📝 docs(ux): add list selection-visibility & in-progress-edit rules

Distill two UX learnings from the capped thread-list work into the ux skill:
restoring an off-screen selection in a scrolled/capped/virtualized list must
scroll it into view, and editors must back up in-progress input locally so an
accidental exit, crash, or failed save can't vaporize the user's work.

Reorganize the checklist by interaction type (Read / Edit / Act / Feedback /
Grow) instead of a flat list, and use English-only headings and value tags.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(thread-list): make the capped nested list actually scroll

The cap was on the container but never engaged: the list is a flex column,
so the rows (default flex-shrink) compressed to fit max-height instead of
overflowing. Pin each row to min-height 36 so the content overflows, and
swap the wrapper to ScrollShadow so the cut-off shows an edge fade instead
of an invisible hard clip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 21:50:58 +08:00
YuTengjing 6461f4053c ️ perf: dedupe unread count polling (#15881) 2026-06-15 20:18:14 +08:00
LiJian fb5566cdbc 🚀 release: 20260615 (#15877)
# 🚀 LobeHub Release (20260615)

**Release Date:** June 15, 2026
**Since v2.2.4:** 48 merged PRs · 5 contributors

> This cycle lands the Composio integration as the new connector
backbone, a unified tiered client cache, and a deep round of
agent-runtime reliability hardening for cold-replica and sub-agent
flows.

---

##  Highlights

- **Composio integration** — New Composio integration layer replaces
Klavis as the connector backbone for third-party skills. (#15461)
- **Tiered client cache** — Unified localStorage + IndexedDB cache
provider with per-scope isolation, plus a registry-wide convergence of
SWR keys for predictable invalidation. (#15844)
- **Gateway mode in chat config** — Gateway mode now lives in chat
config, making it per-conversation rather than a global toggle. (#15714)
- **Bulk move topics** — Move multiple topics to another assistant in
one action. (#15809)
- **Skills row actions** — View / rename / delete row actions in the
working sidebar, plus edit / uninstall for connectors in Skill detail.
(#15864, #15829)
- **Token usage cache rate** — Conversations now surface the
prompt-cache hit rate alongside token usage. (#15812)

---

## 🏗️ Core Agent & Architecture

- **Run lifecycle** — Extracted client run-completion into a shared
`buildRunLifecycle`, with a characterization net over agent-runtime
run-lifecycle. (#15854, #15843)
- **Sub-agent resilience** — Hardened async sub-agent suspend/resume
against missed wakeups. (#15855)
- **Cold-replica correctness** — Fixed main-turn idempotency and now
mark topics failed on terminal errors; persist sub-agent turn id so cold
replicas don't fragment a turn; dedupe sub-agent thread creation after
finalize. (#15838, #15808, #15849)
- **Stream routing** — Drop sub-agent-tagged events from the main
gateway stream handler, and preserve `subAgentId` / `documentId` in the
message bucket key context. (#15814, #15865)
- **Heterogeneous agents** — Forward bot / IM image attachments to
heterogeneous agents. (#15868)
- **Agent state** — Stop background config fetch from hijacking the
active agent, and warn when agent mode is on but the model lacks tool
calling. (#15862, #15828)
- **Tracing** — Enable S3 tracing by default in production. (#15841)

---

## 🔌 Integrations & Skills

- **Skill panel** — Dedupe skill-panel rows and allow deleting pending
integrations; stop connected integrations from duplicating in the
chat-input skill panel. (#15872, #15869)
- **Connectors** — Edit / uninstall buttons for connectors in Skill
detail. (#15829)

---

## 🖥️ Chat & User Experience

- **Topics** — Server-side status filter via a new `queryTopics` query,
and per-agent topic search scoped by `agentId`. (#15822, #15798)
- **Message rendering** — Render mixed assistant blocks in natural
order, fold short mixed tool blocks together, and render mention names
from the serialized attribute instead of falling back to "unknown".
(#15810, #15857, #15831)
- **Tool workflow** — Tool-workflow collapse no longer shows "in
progress" once content renders below it. (#15815)
- **Token usage** — Derive operation token usage from messages rather
than a parallel accumulation. (#15819)
- **Reconnect** — Normalize reconnect `startTime` to epoch ms. (#15811)
- **Home & editor** — Hide the agent-mode notice while config is
loading, and isolate the page-editor copilot context from global
agent/document state. (#15846, #15826)
- **Polish** — base-ui modal fixes the provider delete-confirm z-index,
the updater renders release notes as Markdown, revert-confirm and toast
copy tightened. (#15845, #15867, #15813)
- **Desktop** — Tray double-click opens the main window. (#15816)

---

## 🔒 Reliability

- **Auth gating** — Gate the `listDevices` request behind login state so
it no longer fires before authentication. (#15876)

---

## 🔧 Tooling & Internal

- **SWR convergence** — Converged store-, UI-, and straggler SWR keys
into the `swrKeys` registry, fixing a stale prefetch key along the way.
(#15863, #15858, #15853, #15850, #15848)
- **Tests** — Characterization coverage for parked states and
post-persist title wiring; removed stale `LOBE-XXX` markers; updated
testing skill rules. (#15847, #15852, #15807)
- **Docs** — Added the ux design-values / execution-checklist skill and
a capability-gated feature checklist. (#15823, #15832)
- **Misc** — Fixed workspace prefix handling; bumped
`@vitest/coverage-v8` to v3.2.6. (#15837, #15802)

---

## 👥 Contributors

Huge thanks to **5 contributors** who shipped **48 merged PRs** this
cycle.

@arvinxx · @LiJian · @Innei · @tjx666 · @Rdmclin2

Plus @lobehubbot and renovate[bot] for maintenance.

---

**Full Changelog**: v2.2.4...release/weekly-20260615
2026-06-15 19:37:10 +08:00
Arvin Xu e444a886ff 🐛 fix(device): gate listDevices request behind login state (#15876)
Devices are served by an authed lambda procedure, but the client fired
`device.listDevices` unconditionally — `useEffectiveWorkingDirectory`
(broadly mounted in chat) and `WorkingDirectoryPicker` both called
`useFetchDevices()` with no argument, so logged-out web users sent a bare
request that 401s. The settings `DeviceList` queried it directly with no
`enabled` gate too.

Thread `isLogin` (|| isDesktop, matching `useInitUserState`) into all three
call sites and flip `useFetchDevices`'s default to `false` so the safe
default is opt-in.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 18:05:44 +08:00
LiJian aa7bc81fbc 🐛 fix(skill): dedupe skill panel rows & allow deleting pending integrations (#15872)
* 🐛 fix(skill): dedupe skill panel rows and allow deleting pending integrations

Two related fixes for the chat-input "+" → Skills panel:

1. Dedupe by key: the same app can be sourced from more than one list
   (a Composio/LobeHub integration item plus an installed plugin sharing the
   same identifier), which rendered the row multiple times. Add a key-based
   dedup pass on the final skill list, keeping the first (richer) occurrence.

2. Deletable pending integrations: a Composio server that exists but isn't
   ACTIVE (pending auth / re-authorize — e.g. after closing the OAuth popup)
   only rendered a Connect/Re-authorize link with no "..." menu, so it could
   never be removed. Give these rows a delete-only policy menu (via the "..."
   button and right-click) backed by removeComposioConnection, while keeping
   the Re-authorize action. renderPolicyMenu gains a `deleteOnly` mode that
   hides the meaningless Pinned/Auto options for not-yet-connected entries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(skill): drop optimistic plugin id when deleting a Composio connection

handleConnect adds the new server id to the agent's plugins before OAuth
completes, so removeComposioConnection alone left an orphan id in the config:
the row stayed counted as pinned, and a later reconnect's togglePlugin flipped
the freshly-connected skill back off. Wrap removal so it also unpins the id via
togglePlugin(id, false) (a no-op when absent), for both active and pending
delete paths.

Addresses Codex review feedback on #15872.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🔨 chore(skill): make Composio plugin-id cleanup best-effort on delete

Swallow togglePlugin failures so the optimistic plugin-id cleanup can never
break the actual connection removal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(skill): allow removing orphaned Composio entries with no server

A Composio app whose id lingers in the agent's plugins but has no server yet
(added optimistically, never authorized) rendered a plain "Connect" row with no
"..." menu, so it couldn't be removed. Surface such ids in the list and give
them the same delete-only menu (via "..." and right-click) as pending servers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 17:05:17 +08:00
Arvin Xu ced1d5dec5 🐛 fix(agent): forward bot/IM image attachments to heterogeneous agents (#15868)
Bot/IM channels (Slack, Telegram, …) deliver attachments as raw `files`
buffers, while the SPA gateway delivers pre-uploaded `fileIds`. The
heterogeneous-agent branch of `execAgent` forked early and only handled
`fileIds`, so images sent through a bot were silently dropped — the CLI
(Claude Code / Codex) received text only.

Unify the turn setup so both branches share one implementation:
- Extract `resolveRunAttachments` (raw `files` → S3 via ingestAttachment +
  `attachedFileIds` → resolveAttachmentsByFileIds), returning
  {fileIds, imageList, videoList, fileList, warnings}; attachment resolution
  is non-fatal.
- Hoist attachment ingestion + user-message + assistant-placeholder creation
  above the hetero/normal fork; both branches consume the same records.
- Exclude the freshly-created turn from `loadHistoryMessages` via a
  `selfMessageIds` set so the prompt isn't double-counted in the LLM context.
- Assistant-placeholder fields stay conditional (hetero seeds provider only;
  the CLI reports the real model later). Agent Signal stays normal-only.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 16:30:46 +08:00
Arvin Xu 4b72bcfe99 feat(skills): add view/rename/delete row actions in working sidebar (#15864)
Add hover-revealed action buttons and a shared right-click context menu to
skill rows across project, agent, and user skill lists in the working
sidebar, plus a shared RenameSkillModal.

- SkillsList: per-row `getRowActions` descriptor drives both the hover icon
  cluster and the context menu; disabled actions render greyed for
  not-yet-supported operations
- User skills: view (detail modal), rename (user-authored only), delete
- Agent skills: view/rename/delete via the agent-document service
- Project skills: view (local only); rename/delete stubbed "coming soon"
  until the filesystem-mutation IPC lands

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 16:26:32 +08:00
LiJian 6f07089ea7 🐛 fix(skill): stop connected integrations duplicating in chat-input skill panel (#15869)
The chat-input "+" → Skills panel listed connected integrations (Gmail,
Google Calendar, Google Drive, etc.) twice: once as a brand-icon item under
the LobeHub group, and again as a generic plug-icon "community plugin".

Root cause: community plugins were filtered with a blacklist
(`type !== 'customPlugin'`), so integration gateway plugins whose source is
`'self'`/`'builtin'` leaked into the community group. The /settings/skill
page already avoids this by whitelisting `type === 'plugin'`. Align the
chat-input panel with the same whitelist.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:59:22 +08:00
Arvin Xu 5770ba67a8 ♻️ refactor(chat): extract client run-completion into buildRunLifecycle (#15854)
Introduce the unified store/UI run-lifecycle contract (AgentRunLifecycle) + a
buildRunLifecycle factory, and wire the CLIENT streaming runtime through it.
Behavior-preserving (strategy A): the client completion effects are relocated
verbatim into the factory hooks, so the characterization net stays green.

- runLifecycle/types.ts — AgentRunLifecycle contract: 9 lifecycle hooks incl.
  onRunParked/onRunResumed, carrying a runId that survives across operations and
  a runScope gate. Explicitly separate from the runtime-internal BLOCKING hooks.
- runLifecycle/buildRunLifecycle.ts — factory implementing the client effect set
  (afterCompletion → drain/requeue → completeOperation/markUnread → normalized
  client.runtime.complete signal → desktop notification). normalize/findCompletion
  helpers relocated here.
- streamingExecutor — completion block replaced by completeRun + afterRunComplete
  calls; dead emit closure removed.

Gateway/hetero adapters + hoisting the assembly to the sendMessage seam land in
LOBE-10379. No behavior change: streamingExecutor net 43/43, sibling suites 79/79,
type-check + eslint clean.

Part of LOBE-10376
Closes LOBE-10378

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 15:38:24 +08:00
Arvin Xu b684305667 chore: remove LOBE-XXX markers from characterization test comments (2026-06-15) (#15852)
chore: remove LOBE-XXX markers from streamingExecutor characterization tests

- Replace LOBE-10377 with cross-transport baseline description
- Replace LOBE-10382 with parked/resumed/terminal signal normalization context
- Preserve test semantics — comments now explain intent without Linear ticket references

Co-authored-by: Arvin Xu <arvinx@lobehub.com>
2026-06-15 14:56:05 +08:00
Arvin Xu 8ad6c2180d 🐛 fix(chat): preserve subAgentId/documentId in message bucket key context (#15865)
* 🐛 fix(chat): preserve subAgentId/documentId in message bucket key context

`replaceMessages` and `internal_getConversationContext` rebuilt the
conversation context with a hand-picked field whitelist, silently dropping
`subAgentId` (and others). Since `messageMapKey` uses `subAgentId` as the
group_agent scope subTopicId, group-agent writes collapsed into the wrong
bucket. Spread the whole context instead and only special-case the fields
that need a fallback/assertion (agentId, topicId), so every bucket-key
field carries through.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(database): deterministic ordering in topic.duplicate test

Both seed messages were inserted in one transaction with no explicit
createdAt, so they shared the same `now()` default. `duplicate`'s
`orderBy(createdAt)` then returned the tied rows in arbitrary order,
making the positional assertions flaky. Give them distinct createdAt
(user before assistant) so the order is well-defined.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:57:09 +08:00
Arvin Xu b8ed49ce5b 🐛 fix(updater): render update release notes as Markdown instead of raw source (#15867)
🐛 fix(updater): render release notes as Markdown instead of raw source

The update modal injected release notes via dangerouslySetInnerHTML, but
the content is a Markdown source string (e.g. `## Canary Build`, GFM
tables), so headings/tables/bold were shown literally as raw text.

Render it with @lobehub/ui's <Markdown> component instead. Also handle the
`ReleaseNoteInfo[]` shape of `releaseNotes` by rendering each note.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:56:52 +08:00
Arvin Xu 2df87284cb 🐛 fix(agent): stop background config fetch from hijacking the active agent (#15862)
Switching to or opening an agent tab could flash the conversation
header/welcome back to the inbox "Lobe AI" identity. Two causes:

- `useFetchAgentConfig.onData` set the global `activeAgentId` to whatever
  config resolved, so a background/secondary fetch (the inbox config from
  the home input, a side-panel copilot, or another open tab) hijacked the
  routed agent. It now only adopts the fetched agent when none is active;
  route-level sync (AgentIdSync on desktop/mobile, the popup pages' own
  setState) owns `activeAgentId`.
- `AgentInfo` (the agent conversation welcome) read the global
  `currentAgentMeta` / `isInboxAgent`. Scope it to the conversation's agent
  via `useConversationStore(contextSelectors.agentId)` + `*ById` selectors,
  so it renders the routed agent even if the global races.

Also remove the dead `Conversation/AgentWelcome/{index,OpeningQuestions}`
(the conversation welcome is `AgentHome`/`AgentInfo`; this variant was
unreferenced).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:44:55 +08:00
Arvin Xu 3f7f50edef ♻️ refactor(swr): converge the last straggler SWR keys + fix stale prefetch key (#15863)
* ♻️ refactor(swr): converge the last straggler SWR keys + fix stale prefetch key

Final cleanup of the SWR key convergence. Migrates the remaining ad-hoc keys
that earlier grep-based sweeps missed (they hid behind non-obvious const names
like SWR_KEY / FETCH_*_KEY / SWR_RESOURCES, template-literal keys, the electron
store, and assorted one-off hooks):

- hooks: usePrefetchAgent, useHomeDailyBrief, useGatewayReconnect
- features: OpenInAppButton, Recommendations/useHeteroDetections,
  RecommendTaskTemplates, ResourceManager search
- routes: provider ClientMode + DisabledModels (useSWRInfinite), memory
  analysis task, sidebar task groups, imessage bridge status, Review git patches
- store: user initState + checkTrace, builtin agent init, file resources,
  electron settings/gateway/sync

New registry domains: home, taskTemplate, resource, provider, recommendations,
openInApp, gateway, user, builtinAgent, imessage, sidebar, electron — plus
extensions to aiModel (disabledModelsPage), device (gitReviewPatches /
gitRemoteBranches), userMemory (analysisTask).

🐛 Fix: usePrefetchAgent warmed `['FETCH_AGENT_CONFIG', agentId]`, which never
matched what `useFetchAgentConfig` reads. It now warms
`augmentKey(agentConfigKeys.config(agentId), getActiveWorkspaceId())` — the
exact workspace-scoped key the consumer subscribes to, so hover-prefetch
actually populates the cache.

No tiering/caching change: every new prefix is kept out of CACHE_TIERS
(names avoid the cached agent:/task:/brief: tiers). The electron factory roots
retain their original `electron:getXxx` strings, so those cache identities are
unchanged.

After this, the only ad-hoc SWR keys left are in `packages/*` (can't import
`@/libs/swr/keys`); every `src/` SWR call site now routes through the registry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(swr): drop suspense: true from data-fetching hooks

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(swr): update refreshUserState assertion to registry key

Follow-up to the prior commit: the auth-slice test still expected
mutate('initUserState'); refreshUserState now passes userKeys.initState()
(['user:initState']). Assert against the factory so it tracks the registry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 13:11:21 +08:00
Arvin Xu dcff321290 🐛 fix(assistant): fold short mixed tool blocks together (#15857) 2026-06-15 12:42:30 +08:00
Arvin Xu b28e3672f6 ♻️ refactor(swr): converge UI-layer SWR keys into swrKeys registry (#15858)
Completes the SWR key convergence by migrating the remaining UI-layer ad-hoc
keys (features / routes / components) into the central registry. New domains:
stats, messenger, verify, inbox, share, fork, portal, favorite, changelog,
onboarding, agentHome, agentProfile, agentSignal, ollama, auth, cron,
topicAction — plus extensions to discover (mcpAgents/skillAgents/market),
device (gitBranches/repoType), session (createSession), group (queryAgents*).

- Shared keys (availablePlatforms, agentsForBinding, bindingScopes,
  shared-topic, favorite-status, openNewTopicOrSaveTopic, portal-document-header,
  inbox notifications/unread) are routed through one factory at every call site
  so they still dedupe to a single cache entry.
- The notifications useSWRInfinite getKey and the userMemory-style matcher
  invalidations were migrated in lockstep with their fetch keys.
- No tiering/caching change: every new prefix is kept out of CACHE_TIERS, and
  names avoid the cached prefixes (share:/portal:/agentHome:/agentProfile: etc.
  instead of topic:/document:/agent:). Behavior preserved.
- Folds in the lone cross-layer `cronTopicsWithJobInfo` store mutate.

Packages (builtin-tool *) keep their local keys — they can't import from
`@/libs/swr/keys`; left as-is intentionally.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 12:05:18 +08:00
LiJian 866af8d2f0 feat(composio): add Composio integration layer as Klavis replacement (#15461)
*  feat(composio): add Composio integration layer as Klavis replacement

- Add @composio/core SDK client factory (src/libs/composio)
- Add COMPOSIO_API_KEY server config + enableComposio flag
- Add COMPOSIO_APP_TYPES const with 21 curated apps (appSlug-based)
- Add lambda/composio tRPC router (createConnection, deleteConnection, getConnection, updateComposioPlugin)
- Add tools/composio tRPC router (executeAction, listActions, getActions)
- Add ComposioService with executeComposioTool + getComposioManifests
- Add composioStore Zustand slice (7 files: types, initialState, action, selectors, index, test)
- Wire composioStore into ToolStore state and action tree
- Add composioStoreSelectors to tool selectors index
- Add handleComposioInstall to AgentManagerRuntime
- Extend CustomPluginParams with composio field
- Add enableComposio to GlobalServerConfig types

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🔥 refactor(klavis): remove Klavis integration and migrate all references to Composio

- Delete all Klavis source files (libs, config, const, routers, services, store, UI components)
- Rename KlavisX components to ComposioX equivalents
- Replace all Klavis store selectors, types, and action names with Composio counterparts
- Fix authConfigId to be server-side managed (auto-fetch/create from Composio API)
- Update DB customParams.klavis → customParams.composio throughout
- Fix ToolSource type: 'klavis' → 'composio'
- Fix TaskTemplateSkillSource: 'klavis' → 'composio'
- Fix RecommendedSkillType.Klavis → RecommendedSkillType.Composio
- Remove klavis npm package dependency
- Update builtin-tool-creds: connectKlavisService → connectComposioService
- Update RuntimeExecutors: KLAVIS_SERVICES_LIST → COMPOSIO_SERVICES_LIST
- All Composio-related type errors: 0 remaining

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(composio): complete the klavis→composio migration and wire the OAuth callback

The composio branch had renamed the klavis modules but left consumers
half-migrated, so the OAuth connect link did not work end-to-end. Finish it:

- Add the missing OAuth callback route `/api/composio/oauth/callback` (Composio
  uses managed auth, so it only lands the user back and closes the popup; the
  opener then polls getConnection and syncs tools). Allowlist it as a public
  cross-site redirect landing in the proxy define-config.
- Remove leftover `import { type Klavis } from 'composio'` (non-existent package)
  and type the prop as `string`.
- Fix undefined `oauthUrl` → `redirectUrl` in every OAuth popup opener.
- Map `serverName` to `appSlug` (API) / `label` (display); unify every
  createComposioConnection call to `{ appSlug, identifier, label }`.
- Compare against the `ComposioServerStatus` enum instead of the `'ACTIVE'`
  string literal.
- Use the renamed store fields `composioServers` / `isComposioServersInit`.
- executeComposioTool: `toolName` → `toolSlug`.
- Rename onboarding `KlavisServerItem.tsx` → `ComposioServerItem.tsx` to match
  its import.

* 🐛 fix(composio): use connectedAccounts.link for Composio-managed OAuth

`connectedAccounts.initiate` is no longer supported for Composio-managed OAuth
auth configs (HTTP 400), which broke connecting apps like Gmail. Switch to
`connectedAccounts.link` (POST /api/v3/connected_accounts/link) — same
`{ callbackUrl }` options and `{ id, redirectUrl }` result, so it is a drop-in.

Also treat Composio's `status=failed` callback query param as a failed
authorization in the OAuth callback page.

* 🐛 fix(composio): correct tool sync, execution, callback build, and list dedup

Four fixes found while testing the Composio integration end-to-end:

- listActions: use `getRawComposioTools` (raw defs with slug/inputParameters)
  instead of `tools.get()` (provider-wrapped, name/params under `.function`).
  The wrapped shape left every synced tool with an empty name, so they all
  collapsed to `${identifier}____` and the LLM rejected the request with
  "Tool names must be unique."
- tools.execute: pass `dangerouslySkipVersionCheck: true` (manual execution
  otherwise throws ComposioToolVersionRequiredError when the toolkit version
  resolves to "latest"). Applied to both the executeAction router and the
  ComposioService used by the agent runtime.
- OAuth callback route: escape only `<`/`>`/`&` for the inline-script payload;
  the previous regex embedded literal U+2028/U+2029 line separators which broke
  the regex literal at build time ("Unterminated regular expression").
- installed-plugin selectors: filter out `customParams.composio` (was still
  checking the old `customParams.klavis`), so a connected Composio app no longer
  shows up twice in the skill picker / tool discovery list.

*  feat(composio): pin auth config id per toolkit via env

Add `COMPOSIO_AUTH_CONFIG_IDS` (JSON map of `identifier -> authConfigId`) so a
pre-created Composio auth config (e.g. a custom/white-label OAuth app set up in
the dashboard) can be used directly per toolkit. `createConnection` now resolves
the pinned auth config first, then falls back to discovering an existing one for
the toolkit (matched case-insensitively), and only auto-creates a
Composio-managed config when nothing is configured.

* 🐛 fix(composio): update plugin invoke test to composio + sort tool initialState imports

- action.test.ts: the action was renamed invokeKlavisTypePlugin → invokeComposioTypePlugin
  (Klavis is being removed); update the test to call the composio action and drop
  the klavis-era naming/mock field.
- store/tool/initialState.ts: order the composioStore import before connector to
  satisfy simple-import-sort/imports.

* 🐛 fix(composio): stop client deleting remote connections by static allowlist

useFetchUserComposioConnections no longer deletes remote connections/plugins
for identifiers outside the compile-time COMPOSIO_APP_TYPES list — an outdated
client bundle would silently destroy a legitimate connection. Unknown
identifiers are now only hidden locally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(composio): resolve connectedAccountId server-side in executeAction

executeAction now takes `identifier` and looks up the connectedAccountId from
the caller's own user-scoped plugin record (PluginModel), instead of trusting a
connectedAccountId supplied by the client — which would let a user drive
another user's connection. Callers (callComposioTool, composioExecutor) pass
identifier accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(composio): enable plugin only after OAuth succeeds

Move enablePluginForAgent into the ACTIVE and post-auth-success branches so a
cancelled/timed-out authorization no longer leaves an enabled-but-unauthorized
Composio tool on the agent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🔥 fix(composio): drop dead OAuth callback postMessage

The lobe-composio-oauth postMessage had no consumer — the OAuth wait uses
polling + window.closed detection. Remove it and its escaping helpers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(composio): resolve type-check errors after canary merge

- Guard authConfigId to a definite string before persisting/returning it
  (createConnection), fixing the string|undefined assignment in both the
  server router and the composio store server object.
- Replace leftover KLAVIS_SERVER_TYPES with COMPOSIO_APP_TYPES in AgentTool.
- Update SkillAuthRow test to a composio source/provider (klavis is removed
  from TaskTemplateSkillSource).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ♻️ refactor(composio): remove leftover klavis naming after migration

Klavis is deprecated and fully replaced by Composio. The migration kept the
underlying composio wiring but left klavis-named identifiers, comments, prompt
tags, i18n keys, and files throughout. Sweep them to composio:

- Code identifiers/comments across ~70 files (isKlavisEnabled→isComposioEnabled,
  allKlavisServers→allComposioServers, klavisManifests→composioManifests, etc.)
- LLM prompt tags (<klavis_tools>→<composio_tools>, KLAVIS_SERVICES_LIST→
  COMPOSIO_SERVICES_LIST) — kept consistent across definition and substitution
- i18n keys tools.klavis.*→tools.composio.* + user-facing "Klavis"→"Composio"
  brand strings, in default setting.ts and all locale setting.json files
- Rename useKlavisOAuth→useComposioOAuth, useKlavisServerActions→
  useComposioServerActions (+ imports)
- klavis.ai homepage URLs → composio.dev
- Remove the dead `klavis` npm peerDependency; swap .env.example Klavis section
  for Composio; update product docs

Changelog history left untouched. Pure rename — no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(composio): remove duplicate composio key in CustomPluginParams

The klavis→composio rename collapsed the deprecated klavis param block onto
the live composio one, producing a duplicate `composio` property. The klavis
shape (instanceId/serverName/serverUrl/isAuthenticated) is dead — no code reads
it — so drop it and keep the live composio shape.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(composio): let pending/errored connections re-authorize or be deleted

A Composio connection link (lk_...) expires after a while. Previously a
pending/errored row only offered to reopen the stored — now expired —
redirectUrl, and the delete action existed only for ACTIVE connections, so an
expired link left the tool permanently stuck: unauthenticatable and
unremovable.

- Add reauthorizeComposioConnection store action: best-effort delete the stale
  connection, then mint a fresh link (replaces the record in place)
- Settings skill item + chat toolbar item: PENDING/ERROR now render a ··· menu
  with Re-authorize (fresh link) and Delete
- Onboarding: pending/errored row click re-mints a fresh link instead of
  reopening the stale one
- i18n: add tools.composio.reauthorize (en-US + zh-CN)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(composio): return auth link instead of opening popup from agent

connectComposioService runs from the agent's response, which carries no user
gesture, so window.open was blocked by the browser and the flow always failed
with "Authorization was cancelled or timed out". Instead of opening the popup
ourselves, return the authorization redirectUrl in the tool result so the agent
can surface a clickable link — the user's click is a real gesture and completes
the OAuth normally. Drops the now-unused popup/poll helper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 💄 fix(composio): match pending toolbar item to sibling authorize affordance

The ··· dropdown I added to the chat-toolbar Composio item was a bare icon
(inconsistent color/size with the app's standard menus), its popup was
mis-anchored/offset, and replacing the visible "authorize" cue with a ···
made an un-authorized (pending) row look connected.

Match the sibling LobehubSkillServerItem instead: render a clickable
"Re-authorize" text + external-link icon for PENDING/ERROR. Clicking re-mints a
fresh link (the prior one may have expired) and opens it. No dropdown, so no
offset; the explicit affordance makes it clear the row still needs auth. Delete
stays on the settings page (siblings have no inline delete here either).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-15 11:53:48 +08:00
Arvin Xu f362dcb5db 🐛 fix(agent-runtime): harden async sub-agent suspend/resume against missed wakeups (#15855)
* 🐛 fix(agent-runtime): harden async sub-agent suspend/resume against missed wakeups

The server callSubAgent async park/resume chain (#15481) had a one-shot,
no-retry recovery: a single transient miss left the parent stuck in
waiting_for_async_tool forever. Harden the resume barrier and watchdog
(LOBE-10385 parts 1-3, 5; the park-side deadline fallback follows separately):

- Read-your-writes barrier: completeSubAgentBridge passes the just-backfilled
  toolMessageId to the barrier, which trusts that local write instead of
  re-reading message_plugins from a possibly-stale read replica.
- Bounded backoff watchdog: verifyAsyncToolBarrier now re-arms with exponential
  backoff (15s→30s→60s→120s→240s, 5 attempts) until the barrier passes or the
  op is terminal, replacing the single 15s shot that never re-armed.
- Plug silent bails: !state and pending.length===0 now warn + emit a metric;
  the empty-pending case also arms a fallback verify for snapshot-persist lag.
- Observability: new agent_runtime_async_tool_resume_total counter keyed by
  outcome (resumed/barrier_held/no_pending/no_state/lost_cas/verify_exhausted)
  so missed wakeups surface instead of accumulating silently.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(hetero): reconstruct queued upload files from filesPreview on run continuation

When continuing a heterogeneous agent run with remaining queued messages, rebuild
the upload file items from filesPreview metadata instead of passing bare { id }
stubs, so file context (name/type/preview) survives the continuation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 11:50:05 +08:00
Arvin Xu 3c43d55c69 🐛 fix: render mention name from serialized attribute instead of falling back to unknown (#15831)
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 11:46:19 +08:00
Arvin Xu fa17f00d56 ♻️ refactor(swr): converge remaining store-layer SWR keys (discover/tool/global/userMemory) (#15853)
♻️ refactor(swr): converge remaining discover/tool/global/userMemory store keys

Completes the store-layer SWR key convergence into the central registry
(batch3 only partially covered discover). Migrates the remaining ~39 ad-hoc
keys:

- discover: model/plugin/provider/skill/mcp/groupAgent list+detail+categories
  and user profile (the `.join('-')` string keys → registry array factories).
- tool: agentSkills, installedPlugins, builtin uninstalled-tools, lobehubSkill
  store, mcpPluginList, klavis store. (The dynamic `plugins`-array key is left
  as-is — it's data-derived, not a named key.)
- global: latest/server version, system status.
- userMemory: retrieve / memoryDetail / activities / contexts / experiences /
  identityList / preferences. The `purgeAllMemories` invalidation was rewritten
  from `startsWith('useFetch…')` string matchers to array `key[0] === *.root`
  matchers, in lockstep with the fetch keys.

No tiering/caching change: all new prefixes (discover/tool/global/userMemory)
are kept out of CACHE_TIERS, so everything stays memory-only as before.
Behavior preserved (key identity, mutate match sets, personal-vs-workspace).
UI-layer keys + the cross-layer `cronTopicsWithJobInfo` remain for the next PR.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 11:12:35 +08:00
Arvin Xu c9c57bb7ba 🐛 fix(hetero): dedupe subagent thread create on cold replica after finalize (#15849)
The subagent run coordinator keys thread creation purely on the in-memory
`runs` map. On a cold serverless replica / BatchIngester retry the map is
empty, and `refreshSubagentRunsFromDb` only rehydrates `Processing` isolation
threads — a spawn that already finalized (thread flipped `Active`) is excluded.
So a replayed first-event for a finished subagent hits the `!existing` branch of
`ensureRun` and forks a SECOND thread with the identical title ("一模一样的两个
thread"). Sibling of #15838 (main-turn) / #15808 (subagent-turn), but for the
thread-create step.

Fix: give thread creation a DB-homed, status-independent idempotency guard keyed
by `sourceToolCallId`.
- `SubagentRunsState` gains `finalizedParents: Set<string>`; `finalizeRun`
  records the parent there (instead of just deleting the run), so `ensureRun`
  returns a no-op for a replayed finished spawn — no duplicate thread or message.
- `refreshSubagentRunsFromDb` seeds `finalizedParents` from this operation's
  `Active` isolation threads (without resurrecting them as live runs, which would
  mint empty assistants / re-finalize churn).

Regression: subagent reducer unit test (finalize → replay first event → 0
intents) + handler cold-replica test (finished subagent replay → still 1 thread).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 10:02:21 +08:00
Arvin Xu d6ca168199 ♻️ refactor(swr): converge remaining store-layer SWR keys into swrKeys registry (#15850)
♻️ refactor(swr): converge remaining store-layer keys into swrKeys registry

Migrate all ad-hoc SWR keys still living in the store/service layer onto the
central registry (src/libs/swr/keys.ts), under the uniform `domain:resource`
naming. New domains: discover, eval (agent eval), ragEval, knowledgeBase,
device (incl. git), userMemory, agentKnowledge, agentBot, file, chatTool.

- Pure key convergence: no tiering/caching change. The new prefixes are kept
  deliberately OUT of CACHE_TIERS, so every migrated key stays memory-only
  exactly as before (agentKnowledge:/agentBot: avoid the cached `agent:` tier).
- Behavior preserved: key array shapes, mutate matchers (key[0] === *.root),
  and personal-vs-workspace match semantics are unchanged; string-join keys
  (discover assistant/social) become arrays with equivalent identity.
- UI-embedded SWR keys (features/routes/components/packages) intentionally left
  for a later pass.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 09:55:24 +08:00
Arvin Xu ae88d7535f ♻️ refactor(swr): centralize session/thread/recent/group keys into swrKeys registry (#15848)
* ♻️ refactor(swr): migrate session/thread/recent/group-list keys into swrKeys registry

Batch 1 of the SWR key centralization: add session/thread/recent keys and
group:list to the registry under the domain:resource convention, migrate call
sites + mutate matchers, update the localStorage tier patterns (recent:list,
group:list), and update tests. Removes the ALL_RECENTS_DRAWER_SWR_PREFIX export
in favor of recentKeys.allDrawer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(swr): version+unify message key, drop isLogin from keys, migrate agent/aiModel/image/video/serverConfig

- message: drop `listLegacy`; both stores use the accurate `message:list` key,
  now carrying MESSAGE_CACHE_VERSION; fix the chat store `refreshMessages` to
  invalidate the real key via a context matcher (was a dead key, never matched).
- keys: remove the redundant `isLogin` arg from all list factories (the app is
  always authenticated); drop the now-unused isLogin param from useFetchSessions.
- migrate agent config/available/search, aiModel, image+video generation, and
  serverConfig keys into the registry; update call sites, mutate matchers, tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(swr): restore isLogin arg in list keys

Re-introduce the isLogin argument across the session/agent/group/recent/brief
list key factories and their call sites (incl. useFetchSessions). The key must
vary with auth state so login/logout transitions invalidate the cached list
instead of serving another user's snapshot.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(swr): harden tiered cache flush + scope re-hydration

- localStorageProvider: flush both tiers on visibilitychange→hidden (and
  pagehide) instead of beforeunload. IndexedDB writes are async and can't be
  awaited on teardown; flushing while the page is still alive (hidden) gives
  them time to land before unload.
- Query: reset the new scope's hydration readiness before reloadScope() (in a
  layout effect), so the boot gate keeps blocking through the async IDB re-load
  instead of rendering stale data from a previously-visited scope.
- CacheHydrationGate: render the brand logo while gating instead of returning
  null, keeping the hand-off from the static loading screen seamless.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 03:04:29 +08:00
Arvin Xu 66370675ab test(chat): characterize parked states + post-persist title wiring (#15847)
Fills the three refactor-critical holes left in the characterization net
(LOBE-10377) — exactly the invariants LOBE-10378/10379/10382 will rewrite.

- client (streamingExecutor): waiting_for_async_tool leaves the op UNcompleted
  (no switch case) and emits an undefined complete-signal status (normalize
  falls through); waiting_for_human completes-for-UI but does NOT drain queue
  or mark unread (parked != terminal).
- gateway (gatewayEventHandler): waiting_for_async_tool park is currently
  treated as a completed + unread terminal (no pause short-circuit), and shares
  the `interrupted` reconciliation branch (preserve streamed content vs DB
  refetch, uiMessages SoT takes precedence).
- lifecycle (conversationLifecycle): post-persist summaryTopicTitle fires on the
  CLIENT path (new-topic OR empty-title gate) and is NOT invoked on the GATEWAY
  path (early return; title handled server-side).

Tests-only; characterization (locks current behavior, incl. suspected gaps with
comments). 135 tests pass across the 3 files.

Part of LOBE-10376
2026-06-15 02:33:57 +08:00
Arvin Xu 457b4638c1 🐛 fix(home): hide agent-mode notice while config is loading (#15846)
Home InputArea computed isAgentConfigLoading but never passed it to
DesktopChatInput, so AgentModeNotice flashed the "model unsupported"
warning during hydration. Forward isConfigLoading like every other
call site so the notice only appears after config loads.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 02:16:24 +08:00
Arvin Xu edf058e325 feat(swr): unified tiered cache provider (localStorage + IndexedDB) with scope isolation (#15844)
*  feat(swr): unified tiered cache provider (localStorage + IndexedDB) with scope isolation

Route SWR persistence to a tier chosen centrally by key — IndexedDB for large
business entities (messages, topics, tasks, documents, agents), localStorage for
small list shells (recents) — instead of stuffing everything into one ~5MB
localStorage blob. Partition every tier by identity scope (`${userId}:${workspaceId}`)
so users/workspaces sharing an origin never collide, and add a boot hydration gate
so local-first data is present before the routed app mounts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(swr): centralize IndexedDB-tier keys into swrKeys registry with domain:resource naming

Introduce src/libs/swr/keys.ts as the single source of truth for SWR cache keys,
named uniformly as `<domain>:<resource>` (e.g. message:list, topic:list,
task:detail). Migrate the IndexedDB-tier domains (message, topic, agent, group,
task, document/page/notebook, brief) off scattered local consts/inline literals
onto registry factories, updating call sites, mutate matchers, and tests. The
tiered cache provider now routes by `domain:` prefix instead of ad-hoc
substrings, and matchDomain() enables refreshing a whole domain at once.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 02:02:09 +08:00
Innei c740b13021 🐛 fix(provider): correct delete confirm z-index by switching to base-ui modal (#15845)
* 🐛 fix(provider): use base-ui modal so delete confirm stacks above the config dialog

Closes #15836

* 💄 style(provider): split delete confirm into short title and description

* 🌐 chore(i18n): sync delete confirm title/description across all locales
2026-06-15 01:53:27 +08:00
Arvin Xu f9e7ca5b68 test(chat): characterization net for agent runtime run-lifecycle (#15843)
*  test(chat): characterization net for agent runtime run-lifecycle

Lock the CURRENT client / gateway / heterogeneous run-completion behavior
across all terminal branches BEFORE the unified run-lifecycle refactor
(LOBE-10376), so any behavioral drift is caught by tests.

- client (streamingExecutor): afterCompletion fires on error terminal;
  complete-signal status=failed on error; queue-drain + markUnread skipped
  on error (negative); desktop-notification gating (content && !tools)
- gateway (gatewayEventHandler): error event completes op WITHOUT markUnread
  (asymmetry vs agent_runtime_end); completeOperation double-call idempotency
- hetero (heterogeneousAgentExecutor): notification + dock badge on success;
  updateTopicMetadata-rejection behavior; queue-drain gating
  (success / !aborted / !error); error & abort paths fire no notification/drain
- entry points: regenerate-hetero (imageList + parentOperationId +
  onRegenerateComplete), continue-hetero early-return, rejectAndContinue
  client dual-op, submitHeteroIntervention IPC submit + GC fallback

Tests-only; no implementation changes. 255 tests pass across the affected files.

Part of LOBE-10376
Closes LOBE-10377
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(chat): surface executor rejections in hetero completion helpers

The clean-completion `runToComplete` helper (and its sibling `runToError`)
awaited the executor with `.catch(() => {})`, swallowing any rejection. Both
paths resolve today, so this only masked future regressions: a happy/error
run that starts rejecting after some side effects would still pass — the
isDesktop=false "no notification" negative assertion is especially vulnerable
since an early rejection before the notification step trivially satisfies it.

Await the executor promise directly so a rejection fails the characterization
test instead of passing silently. 70/70 still green (both paths resolve today).

Part of LOBE-10376

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 01:49:48 +08:00
Arvin Xu 542197d8ab 🐛 fix(hetero): correct cold-replica main-turn idempotency and mark topic failed on terminal errors (#15838)
* 🐛 fix(server): dedupe replayed main-turn newStep on a cold replica

The main-agent coordinator cuts a turn purely on the adapter's `newStep` signal and minted a fresh random assistant id each time, with no DB-homed idempotency key for the turn (unlike the subagent path after #15808). On a cold serverless replica the in-memory `processedKeys` dedupe is empty, so a BatchIngester retry reprocesses the `newStep` and `openTurn` forks a second assistant — orphaning the first as a usage-only empty shell (the remote-CC "空壳" bubble).

Mirror #15808 onto the main chain: the adapter emits the turn's CC `message.id` on `stream_start{newStep}`; the reducer records it as `currentMainMessageId` and treats a same-id `newStep` as a replay (no-op); the server stamps it on `metadata.mainMessageId` and recovers it on a cold replica. Backward-compatible: a `newStep` without a message id opens a turn as before.

Regression: HeterogeneousPersistenceHandler.mainTurnRehydration.test.ts (cold-replica retry: 2 assistants + empty shell -> 1) plus 4 mainAgentCoordinator reducer unit tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(cli): mark topic failed when remote CC relays a terminal error on a clean exit

Claude Code relays API/rate-limit errors as an in-stream terminal `error`
event but still exits 0. The CLI derived the heteroFinish result from the
process exit code alone, so such runs reported `result: 'success'` →
`reason: 'done'` and the topic/task was wrongly marked completed instead of
failed (the error was only persisted on the message).

Track whether a terminal `error` event was pushed to the ingester and force
`result: 'error'` even on a clean exit, mirroring the desktop executor where
the stream error drives both the message error and the topic status. Also
surface the terminal error message as the finish error detail (CC relays these
on stdout, so stderr is empty in this case).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-15 01:00:19 +08:00
Arvin Xu 507f251ac5 🔨 chore(agent-runtime): enable S3 tracing by default in production (#15841)
 feat(agent-runtime): enable S3 tracing by default in production
2026-06-15 00:56:20 +08:00
Rdmclin2 d3cc667c97 fix: workspace preifx (#15837)
chore: remove workspace prefix
2026-06-14 22:17:19 +08:00
LiJian 346d5be27c feat(connectors): add edit/uninstall buttons for connectors in SkillDetail (#15829)
* 🐛 fix: clear credentials on URL change; gate Edit button to http connectors

P1 (AddConnectorModal): when handleEdit detects a URL change, pass
credentials: null so the server drops the old OAuth token — a stale token
from the previous server must not be sent to the new one. The server-side
update mutation now also clears tokenExpiresAt in the same round-trip
whenever credentials are set to null.

P2 (ConnectorDetail): narrow the Edit button (and the modal mount) from
isMcpConnector to isMcpConnector && connector.mcpConnectionType === 'http'.
stdio connectors have no mcpServerUrl, so the URL-edit dialog would open
with an empty field and mislead the user.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connectors): add edit/uninstall buttons for SkillDetail connectors

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: re-enable OAuth in edit mode + pre-fill bearer/header credentials

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: resolve TypeScript errors in CustomConnectorModal edit mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: add clientId/clientSecret to mcp.auth type to resolve TS error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix: correct description field location in editValue

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-14 21:55:49 +08:00
Arvin Xu 43c91caf6a 📝 docs: add capability-gated feature checklist to ux skill (#15832)
* 📝 docs: add capability-gated feature checklist to ux skill

Guide designers to fulfil the reminder obligation when a selected model
or its still-loading config can't deliver a feature's required capability
(e.g. agentic tool calling): surface a soft, reactive, load-gated warning
with the remedy, rather than failing silently.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 📝 docs: broaden ux skill trigger to any UI work

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 📝 docs: simplify ux skill description

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 20:38:31 +08:00
Arvin Xu 602e768419 🐛 fix(page-editor): isolate page copilot context from global agent/document state (#15826)
* 🐛 fix(page-editor): isolate page copilot context from global agent/document state

Two independent bugs both rooted in the page conversation context leaning on
process-global singletons that can't express multiple tabs/documents:

- Heterogeneous agents (Claude Code / Codex) leaked into the page copilot:
  `selectedAgentId` only excluded empty and chat-group ids, so navigating from
  a heterogeneous agent tab made the page right panel run that external agent.
  Also fall back to the page agent when the active agent is heterogeneous.

- `documentId` was lost in multi-tab scenarios because the conversation context
  carried no documentId and relied on the `pageAgentRuntime` singleton, which
  represents only one open document and is cleared on tab switch — causing
  "PageAgent server runtime received a tool call without documentId". Inject the
  editor's `pageId` straight into `context.documentId` so the send-time guard
  uses a deterministic value instead of the singleton.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(page-editor): include documentId in the page conversation key

The previous fix injected `documentId` into the conversation context, but all
state isolation (messages, operations, input-loading/runtime selectors,
replaceMessages) is keyed through `messageMapKey(context)`, which dropped
`documentId` entirely for page scope. Two documents sharing the page agent thus
collapsed into one `page_<agent>_new` bucket — document B could inherit A's
copilot history or be queued behind A's running operation while tool calls now
target B.

Carry `documentId` into the page-scoped key (as subTopicId) so each open
document gets its own isolated bucket; topicless page keys avoid emitting a
literal `null` segment, and the no-document case still falls back to
`page_<agent>_new` without colliding with document-specific keys.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 20:29:42 +08:00
Arvin Xu 87966afec8 🐛 fix(chat): warn when agent mode is on but the model lacks tool calling (#15828)
 feat(chat): warn when agent mode is on but the model lacks tool calling

Show a warning above the desktop chat input when Agent mode is enabled
but the selected model does not support function/tool calling, suggesting
switching to a model with agent capability.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 20:17:58 +08:00
Arvin Xu 8b59a71f29 feat(topic): add queryTopics query with server-side status filter (#15822)
*  feat(topic): add queryTopics query with server-side status filter

Adds `topicModel.queryTopics({ statuses?, pageSize? })`, a lambda `queryTopics`
TRPC procedure, and `topicService.queryTopics` — filtering topics by status
server-side (e.g. to list actively-running topics across all agents without
pulling the full topic set to the client).

Removes the now-unused `getAllTopics` procedures (lambda + mobile),
`topicModel.queryAll`, and the `getAllTopics` service method.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(topic): ownership isolation tests for queryTopics; authed mobile getTopics

- queryTopics: assert it only returns the model user's topics (a status filter
  must not leak another user's data) and that personal vs workspace scopes stay
  isolated.
- mobile getTopics: switch from publicProcedure to the authed topicProcedure
  (drops the manual userId guard + ad-hoc TopicModel construction).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 19:43:32 +08:00
Arvin Xu 097987a262 📝 docs(skills): add ux design-values & execution-checklist skill (#15823)
Define LobeHub's four product design values — 自然 Natural / 意义感 Meaningful /
确定性 Certainty / 生长性 Growth (adapted from Ant Design's values) — in a
dedicated reference file (references/design-values.md), and keep the skill index
focused on per-aspect execution checklists, each tagged with the value it serves:

- Flow & momentum: push the user forward; success state = primary "go to result".
- States: empty / loading / error all designed; empty is a purpose-built page.
- Buttons & focus: exactly one primary button per surface.
- Lists at scale: design for 1 → 10k rows (virtual scroll / pagination / batch).
- Option visibility: pickers list all valid targets (e.g. the virtual inbox).
- Loading visuals: no antd Spin; use NeuralNetworkLoading / project loaders.
- Discoverability & growth: progressive disclosure; surface next capability in context.
- Entity lifecycle completeness: no display-only features — design full CRUD +
  lifecycle, with the operation set scoped to the entity's source (official =
  read-only, community = install/uninstall, custom = full CRUD).

Also: react skill points to ux for loading components, and AGENTS.md references
the ux skill for designing/reviewing user-facing flows.
2026-06-14 19:37:17 +08:00
Arvin Xu 455c25ed1b feat(topic): add bulk move topics to another assistant UI (#15809)
*  feat(topic): add bulk move topics to another assistant UI

Surface the batch-move feature in the per-agent Topics manager:

- `MoveToAgentButton`: a bulk action that opens an assistant picker
  (excludes the source agent) and moves the selected topics over.
- Wire it into `BulkActionBar` next to favorite/archive/delete.
- `batchMoveTopicsToAgent` store action: calls `topicService.batchMoveTopics`,
  optimistically drops moved topics from the current list, refreshes, and
  switches away if the active topic was moved.
- i18n keys (en-US source + zh-CN) for the move action, picker, and toast.

Depends on the server `topic.batchMoveTopics` mutation (already on canary).
Part of LOBE-10330

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(topic): add per-topic move menu + confirm/progress move modal

Address review feedback on the move-topics UI:

- Add a "move to another assistant" item to the per-topic dropdown menu in
  the left sidebar topic list (single-topic move).
- Introduce a shared MoveTopicsModal (base-ui) with a pick → confirm →
  moving → done state machine: a confirmation step before the move, an
  in-progress "Moving…" view that locks dismissal, and a "moved" completion
  view. Both the bulk action and the per-topic menu open this modal.
- BulkActionBar's move button now opens the modal instead of a popover +
  toast, so multi-select moves get the confirm + progress + done flow.
- i18n: add management.moveModal.* + actions.moveToAgent (en-US + zh-CN);
  drop the now-unused management.bulk.moveSuccess toast keys.

Part of LOBE-10330

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(topic): allow moving topics to the inbox (LobeAI) assistant

The move picker sourced agents from the sidebar list, which excludes the
virtual inbox agent — so the default "LobeAI" assistant could never be
chosen as a move target (picker showed "no other assistants"). Prepend the
inbox agent to the target list (unless it is the source), mirroring
AssigneeAgentSelector. The DB-layer ownership check already accepts the
inbox agent, so moving into it is valid.

Part of LOBE-10330

* 💄 style(topic): use NeuralNetworkLoading for the move-in-progress state

Replace the antd Spin in the move modal's "moving" step with the project's
NeuralNetworkLoading, matching the product loading visual. Also document the
rule in the react skill: antd Spin is forbidden — use NeuralNetworkLoading
(or the other src/components loaders) instead.

Part of LOBE-10330

* 💄 style(topic): add "go to target assistant" action on move success

On the move modal's done step, make "Done" a secondary (weak) button and add
a primary "Go to <target>" button that navigates to the assistant the topics
were moved into, so the user can jump straight to the relocated topics.

Part of LOBE-10330

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 18:41:52 +08:00
Arvin Xu 6c8bcf0c8a 🐛 fix(conversation): stop tool workflow collapse showing 'in progress' once content renders below it (#15815)
🐛 fix(conversation): stop tool workflow collapse showing "working" once content renders below it

When an assistant group is still generating, a workflow segment can have a real
answer segment rendered below it — most notably an errored tool block, which
splits into a folded workflow (the tools) plus a trailing answer segment (the
error text). The group-level `workflowChromeComplete` only accounts for the
promoted-final-answer path (`postToolTailPromoted`), so in these cases the
collapse kept rendering its streaming "working" header even though the model had
already moved past it and content was visible below.

Derive completeness from segment ordering: a workflow segment that has any
rendered content after it is no longer the active step. Add
`hasRenderedContentAfter` and OR it into the per-segment `workflowChromeComplete`.

Guard the shortcut with `hasPendingIntervention`: `areWorkflowToolsComplete`
ignores pending-intervention tools and the "awaiting confirmation" UI only shows
while streaming, so a segment still awaiting user confirmation must keep its
streaming chrome even with content below it.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 18:31:16 +08:00
Arvin Xu 32cf754ae3 🐛 fix(chat): derive operation token usage from messages, not a parallel accumulation (#15819)
The operation status tray maintained its OWN running token total by summing
every `turn_metadata` event's usage (`addUsageToOperationMetrics`), separate
from the per-message usage written via `recordUsage`. The two diverged badly:
in an agentic Claude Code loop the tray showed ~8M while the per-message bubbles
summed to ~2.2M.

Root cause is two computations for one number:
- `recordUsage` OVERWRITES each assistant message's usage (last turn wins when
  multiple turns map to one message).
- the tray ADDED every turn's usage — and each turn's `totalTokens` includes
  `cache_read_input_tokens`, so a re-read context got counted once per turn.

Make the per-message usage the single source of truth: `OpStatusTray` always
derives the total via `calculateOperationUsageMetrics(messages)` (previously
only a fallback), and the parallel `addUsageToOperationMetrics` accumulation is
removed from both the heterogeneous-agent executor and the gateway handler. The
tray now equals the sum of the bubbles and refreshes as messages do.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 18:22:09 +08:00
Arvin Xu 8ee5f1c806 🐛 fix(chat): drop subagent-tagged events from the main gateway stream handler (#15814)
On a live gateway / remote-CC stream, a subagent (Claude Code `Agent`/`Task`)
inner-tool event is tagged with `data.subagent` and belongs to an isolation
Thread, not the main bubble. The gateway path fed raw events straight into
`createGatewayEventHandler` (main-agent-only), so a subagent `tools_calling`
chunk appended the inner tool onto the MAIN assistant's `tools[]` — the tools
"leaked" into the parent bubble DURING streaming, then snapped back when the
terminal `fetchAndReplaceMessages` pulled correct DB state (where they live
under the Thread). Classic "流式时漏出来、结束后正常".

The local desktop executor already drops `data.subagent` events before
forwarding (`heterogeneousAgentExecutor`); the gateway path didn't. Drop them at
the top of the handler — one place that covers every gateway caller, and a
no-op for the local executor (which already pre-drops). DB persistence is
unaffected: the server writes subagent rows under the Thread regardless, so they
still appear — correctly under their Thread — after the terminal fetch.

Regression: a subagent-tagged `tools_calling` chunk no longer dispatches onto
the main assistant (verified red without the drop); a non-subagent chunk still
dispatches.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 17:02:50 +08:00
Innei 536290973b feat(desktop): tray double-click opens main window (#15816)
Single click on the tray still starts the Quick Composer capture
session, but is now debounced 250ms so a follow-up double-click can
pre-empt it. Double-click surfaces the main window via
browserManager.showMainWindow(). macOS / Windows only; Linux trays
under AppIndicator do not emit click events and remain unaffected.
2026-06-14 16:44:28 +08:00
Innei 46b379f446 💄 style(chat): tighten revert confirm and toast copy (#15813)
* 💄 style(chat): tighten revert confirm and toast copy

Trim the file-revert Popconfirm description from a two-sentence warning
to a single line ("This can't be undone."), and switch the success toast
from full {{filePath}} to just {{fileName}} so it doesn't span the screen
for deep paths. Updated across all 18 locales.

* ♻️ refactor(chat): migrate file revert from Popconfirm to base-ui confirmModal

Per @lobehub/ui/base-ui-first convention. Drops the local confirmOpen/reverting
state and the data-force-visible CSS pin (no longer anchored to the trigger),
and lets confirmModal handle the OK button's in-flight loading.
2026-06-14 16:29:13 +08:00
Arvin Xu 97708c3fbb 🐛 fix(conversation): render mixed assistant blocks in natural order (#15810)
* 🐛 fix(conversation): render mixed assistant blocks in natural order

Drop the `shouldPromoteMixedBlockContent` heuristic that relocated a
tool-bearing block's prose below its tool when the text scored as
"final-answer-like". Within one assistant message the model's text always
precedes its tool_use (tool_use ends the turn; post-tool prose lands in a
separate, tool-less block), so a mixed block's content is always a preamble
and must stay above its tool. This fixes Claude Code turns (e.g.
askUserQuestion) that rendered the tool card above its own explanatory text.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(conversation): keep mixed multi-tool preamble outside the workflow fold

A mixed block's prose is a preamble, so in a multi-tool turn lift the full
text into a visible answer segment above the workflow and leave only the
tool(s) in the fold. Previously `leadingSentenceSplit` kept only the first
sentence visible and pushed the remaining prose into the WorkflowCollapse
body, which defaults to collapsed once complete — hiding most of the
explanation until the user expanded it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 16:26:11 +08:00
YuTengjing 5872468c17 🔨 chore: update testing skill rules (#15807) 2026-06-14 15:33:23 +08:00
Arvin Xu bc9a7cfab8 feat(gateway): move gateway mode to chat config (#15714)
*  feat(gateway): move gateway mode to chat config

*  feat(gateway): add agent gateway env flag
2026-06-14 15:18:40 +08:00
lobehubbot 8992df3187 🔖 chore(release): release version v2.2.4 [skip ci] 2026-06-14 07:13:33 +00:00
lobehubbot d62843b90b Merge remote-tracking branch 'origin/main' into canary 2026-06-14 07:05:23 +00:00
Arvin Xu 21f78d3a6d 🚀 release: 20260614 (#15806)
# 🚀 LobeHub Release (20260614)

**Release Date:** June 14, 2026\
**Since v2.2.3:** 99 commits · 99 merged PRs · 11 contributors

> This cycle deepens cross-device collaboration — browser pairing, a
shared desktop/CLI device gateway, and edit locks that keep multiple
agents and people aligned on the same Context.

---

##  Highlights

- **Browser device pairing** — Pair a browser as a device and route
agent tools to it, with rename/delete actions on the branch switcher.
(#15678, #15774)
- **Shared device gateway** — Desktop and CLI now share one
remote-device gateway RPC, so device-bound runs behave the same
everywhere. (#15780)
- **Operation status tray** — A live op-status tray sits above the chat
input, tracking operation usage and staying compact on narrow screens.
(#14737, #15736, #15735)
- **Inline file previews** — HTML files render inline and remote
read-only local files preview directly in the portal. (#15671, #15673)
- **New providers** — Added AntGroup (蚂蚁百灵), Longcat with live
model-list fetch, and new SenseNova models. (#13713, #15134, #15306)
- **Desktop tab management** — Drag-to-reorder desktop tabs, plus
restored cloud desktop builds. (#15787, #15666)

---

## 🏗️ Core Agent & Runtime

- **Heterogeneous chaining** — Stabilized main-message chaining and
unified the client hetero executor on a shared `mainAgentReducer`.
(#15783, #15762)
- **Sub-agent resilience** — Block recursive server sub-agents, keep
async sub-agent streams alive, and rehydrate sub-agent runs from DB on
cold replicas. (#15731, #15646, #15788)
- **Reasoning persistence** — Always persist assistant reasoning to the
DB so it survives reloads. (#15687, #15690)
- **Device routing** — Resolve device routing and device-tool injection
through a single execution plan. (#15669, #15683)
- **Image attachments** — Persist and deliver image attachments for
device/sandbox hetero runs. (#15685)
- **Virtual sub-agents** — Split the virtual sub-agent entry and
clarified its naming. (#15733, #15737)

---

## 🖥️ Chat & User Experience

- **Topic management** — Topic sidebar status indicators, selector topic
actions, and a `batchMoveTopics` mutation for bulk moves. (#15739,
#15744, #15793)
- **Local file portals** — Scope local file tabs by working directory
and auto-close empty local previews. (#15732, #15760)
- **Editing** — Coalesce document autosave history into 10-minute
windows and fold connector OAuth into the custom MCP form. (#15716,
#15661)
- **Skills** — Delete/remove actions on settings skill items. (#15708)
- **Polish** — Preserve message order after tool results and stop
ContentLoading from leaking raw operation i18n keys. (#15657, #15752)

---

## 🤖 Models & Providers

- **Model bank metadata** — `knowledgeCutoff` batch 2 with a metadata
skill and an always-visible tab bar, plus backfilled family/generation
data. (#15663, #15642, #15640)
- **Provider quality** — Improved DeepSeek structured output, Kimi code
thinking mode, and a model guard kept in provider grouping. (#15680,
#15725, #15681)
- **Discoverability** — Surface model-list fetch failures instead of
failing silently. (#15753)

---

## 🔒 Reliability & Security

- **Error classification** — Classify "Agent state not found" as
`StateStoreReadError`, classify untyped `Error` throws via message
patterns, and surface missing tool calls as errors. (#15778, #15767,
#15691)
- **Codex** — Parse retry time in the stated timezone and detect the
bundled Codex CLI from Codex.app on macOS. (#15758, #15759)
- **Mobile** — Stop the `pushToken.unregister` 401 storm while
preserving authenticated legacy cleanup, and gate inbox unread count by
login state. (#15719, #15723, #15724)
- **Performance** — Derive topic activity from messages and drop sitemap
generation to cut static export time. (#15726, #15702)
- **Security:** Bumped `@opentelemetry/auto-instrumentations-node`,
`@opentelemetry/sdk-node`, and `vitest`. (#14686, #14687, #15698)

---

## 🔧 Tooling & Docs

- **Agent testing** — Merged local-testing and cli-backend-testing into
a single `agent-testing` skill, with local dev env bootstrap and
post-run iteration. (#15699, #15757, #15700, #15750)
- **Docs** — Replaced Claude-specific references with generic agent
wording across skills. (#15785)

---

## 👥 Contributors

Huge thanks to **11 contributors** who shipped **99 merged PRs** this
cycle.

@hezhijie0327 · @cokeSEE1 · @R3pl4c3r · @arvinxx · @tjx666 · @Innei ·
@Rdmclin2 · @LiJian · @sudongyuer · @Neko · @cy948

Plus @lobehubbot and renovate[bot] for maintenance.

---

**Full Changelog**:
https://github.com/lobehub/lobehub/compare/v2.2.3...release/weekly-20260614
2026-06-14 15:03:54 +08:00
Arvin Xu 9f1ab92242 🐛 fix(chat): normalize reconnect startTime to epoch ms (#15811)
* 🐛 fix(chat): normalize reconnect startTime to epoch ms

After a DB rehydrate (quit + relaunch), an assistant message's `createdAt`
can arrive as an ISO string / Date rather than epoch ms (the message service
casts rows `as unknown` without converting). The gateway reconnect path
anchored a running operation's `startTime` to that value verbatim, so the
running-elapsed-time label computed `Date.now() - startTime` as NaN and
rendered "NaN:NaN" in the topic list.

Normalize `createdAt` to epoch ms and only set `startTime` when the result is
finite; otherwise fall back to `startOperation`'s default `Date.now()`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(chat): assert reconnect omits startTime via matcher

Avoid indexing mock.calls (TS2532/TS2493 on the untyped spy tuple); use
toHaveBeenCalledWith + expect.not.objectContaining instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 15:01:58 +08:00
Arvin Xu 73b58d5bba feat(chat): show token usage cache rate (#15812) 2026-06-14 14:44:24 +08:00
renovate[bot] 729393ca1b Update dependency @vitest/coverage-v8 to v3.2.6 (#15802)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-14 14:33:01 +08:00
Arvin Xu 3335072bdb 🐛 fix(topic): scope per-agent topic search by agentId (#15798)
* 🐛 fix(topic): scope per-agent topic search by agentId

The per-agent Topics search resolved agentId→sessionId and filtered only
by the container (sessionId/groupId). Topics created by the new agent
system carry `agentId` directly with a null sessionId, so they were never
matched — the search showed "No topics match these filters" even though
the topics list (filtered by agentId) and global search displayed them.

`queryByKeyword` now accepts an agentId-aware scope mirroring `query`'s
precedence (groupId > agentId > containerId), matching `topics.agentId`
directly while still matching the resolved sessionId for legacy
un-migrated rows. The lambda searchTopics router passes the agentId
through.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(topic): align keyword search scope with the topics list

Address review on #15798:
- Drop the resolved-sessionId fallback in the agent branch. The topics list
  (`query`) scopes by agentId only, so the fallback (a) surfaced un-migrated
  rows the list hides and (b) leaked topics owned by another agent that shares
  the same session mapping. `matchKeywordScope` now mirrors `query` exactly:
  groupId > agentId > containerId (the last only for legacy/mobile string args).
- Topic inbox no longer exists, so no isInbox handling is threaded through.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:32:33 +08:00
Arvin Xu 01278efdde 🐛 fix(server): persist subagent turn id so cold replicas don't fragment a turn (#15808)
On a cold serverless replica the subagent run is rebuilt from DB, but the run's
turn identity — CC's per-turn `message.id` (`currentSubagentMessageId`) — was the
one field with no DB home, so rehydration hard-set it to ''. The subagent reducer
detects in-thread turn boundaries by comparing that id, so the first event of
every cold batch satisfied `'' !== realId` → a SPURIOUS turn boundary. One CC
subagent turn then fragmented across multiple in-thread assistant rows (text on
one, tools on another), spawned empty-shell assistants (only usage, no
content/tools), and mis-anchored siblings under the same old tool.

Give the turn id a DB home: stamp it on the in-thread assistant's
`metadata.subagentMessageId` at creation (`CreateMessageIntent.subagentMessageId`
→ server interpreter), and recover it in `buildSubagentSnapshot` →
`SubagentRunSnapshot.currentSubagentMessageId` → `rehydrateSubagentRunsState`. A
continuation is then recognized as the SAME turn — no spurious boundary, no
fragmentation, no empty shells. `MessageModel.update` deep-merges metadata, so
later usage/content writes don't clobber the stored id.

Follow-up to #15788 (subagent thread rehydration): that fixed the thread-
duplication half of cold-replica recovery; this fixes the turn-boundary half.

Regression: a CC turn continued on a fresh replica now yields exactly one
in-thread assistant (verified red without the recovery).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 14:21:40 +08:00
Arvin Xu 60fa4e31cd 🐛 fix(chat): inject device-bound project skills into the slash menu (#15797)
* 🐛 fix(chat): inject device-bound project skills into the slash menu

The `/` slash menu loaded project skills via `localFileService.listProjectSkills`
(local Electron IPC) and gated on `isDesktop` alone, so a device-bound (remote)
run scanned the controlling machine instead of the device — and the device's
`.claude/skills` / `.agents/skills` never appeared.

Route through the device-aware `projectSkillService` with the resolved
`remoteDeviceId` and gate on `(isDesktop || !!remoteDeviceId)`, mirroring the
WorkingSidebar's `SkillsGroup`. The SWR key shape matches `useProjectSkills` so
the two share one fetch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(chat): extract shared useFetchProjectSkills hook

Both the `/` slash menu and the SkillsList UI hook duplicated the same
project-skills SWR call (key, fetcher, options). Pull it into a single
`useFetchProjectSkills(workingDirectory, deviceId)` hook so the transport choice
and SWR key live in one place and the two callers dedupe one fetch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(chat): revalidate remote project skills on focus

Remote skills live on a device this client can't watch for filesystem changes,
so refetch them on window focus to pick up edits made on the device. The local
IPC path keeps revalidateOnFocus off — the desktop already sees its own
filesystem.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(chat): resolve effective execution target before picking device id

The slash menu read the raw stored `executionTarget`, so a hetero agent saved as
desktop "This device" (`local` + boundDeviceId) opened on web — where
`resolveExecutionTarget` coerces it to `device` — kept `remoteDeviceId`
undefined and left the menu without project skills, even though the
WorkingSidebar (which resolves the effective target) lists them for the same
agent. Resolve the effective target the same way and treat it as remote only
when it lands on `device` with a bound device.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 13:11:17 +08:00
Arvin Xu 21a1b45fa1 feat(server): add batchMoveTopics TRPC mutation (#15793)
Expose TopicModel.batchMoveToAgent through a new topic.batchMoveTopics
lambda mutation (topic:update scoped permission, input { topicIds,
targetAgentId }) and add the matching topicService.batchMoveTopics client
wrapper.

Depends on the database layer (TopicModel.batchMoveToAgent).
Part of LOBE-10330

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 13:00:49 +08:00
Arvin Xu 30fc25fad7 chore(database): add batchMoveToAgent to TopicModel (#15792)
 feat(database): add batchMoveToAgent to TopicModel

Add a transactional TopicModel.batchMoveToAgent(topicIds, targetAgentId)
that reassigns topics to another agent purely via the agentId foreign key.
Both topics.agentId and messages.agentId are updated together (topic lists
query by topics.agentId and message queries filter by messages.agentId),
and sessionId is cleared on both tables so rows fully detach from the
source agent's legacy session. Scoped by ownership to prevent cross-user
moves.

Part of LOBE-10330

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 12:36:29 +08:00
YuTengjing 2f1a746756 📝 docs: replace Claude-specific references with generic agent wording in skills (#15785) 2026-06-14 11:47:20 +08:00
Arvin Xu f47e65d215 🐛 fix(server): rehydrate subagent runs from DB on cold replica (#15788)
* 🐛 fix(server): rehydrate subagent runs from DB on cold replica

Server-side hetero persistence kept per-operation state in a module-level
map. On a cold serverless replica (or any cross-replica batch), the main
agent state is rebuilt from DB but `MainAgentRunState.subagents` was seeded
empty. A continuing subagent event then hit the `!existing` branch of
`ensureRun` and forked a brand-new isolation thread for a parentToolCallId
that already had one — producing piles of generic "Subagent" threads that
were never attached to the right thread. Desktop never hit this (one
long-lived run-state closure).

Rebuild `state.main.subagents` from DB the same way the main half is
rehydrated: add `rehydrateSubagentRunsState` to @lobechat/heterogeneous-agents
and call a new `refreshSubagentRunsFromDb` each ingest. Only runs MISSING
from memory are rehydrated (warm accumulators win); finalized (Active)
threads are excluded so completed spawns are never resurrected.

Sibling of #15783 (main message chaining) — same root cause, subagent half.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(server): scope subagent rehydration to operation + de-dupe inner tools

Two follow-up fixes on the cold-replica subagent rehydration:

- P1: de-dupe inner tool creation against the run-lifetime tool set, not just
  the per-turn `persistedIds`. Per-turn state is reset on every turn boundary
  and starts empty after a rehydration, so a replayed / continued tools_calling
  on a cold replica minted a SECOND tool message for an id the run already
  wrote. `lifetimeToolCallIds` survives boundaries and is restored from DB, so
  it is the durable de-dupe key. Mirrors the main-agent retry protection.

- P2: scope `refreshSubagentRunsFromDb` to the current operation. Topics are
  reused across turns; a prior crashed/cancelled run can leave a subagent
  thread stuck `Processing`. Rehydrating purely by topic+status would merge
  that unrelated thread into the new operation's reducer state and finalize it
  on the new run's terminal drain. Stamp `operationId` on the subagent thread
  metadata at creation and filter rehydration by it.

Adds regression cases for both (each verified to fail without its fix).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 03:13:35 +08:00
Arvin Xu 6dcbd387f7 feat: support drag-to-reorder for desktop tabs (#15787)
*  feat: support drag-to-reorder for desktop tabs

Make the Electron titlebar tabs draggable horizontally to reorder them,
like Chrome tab dragging. Wires the existing `reorderTabs` store action
to a @dnd-kit sortable context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix: preserve scroll position when reordering background tabs

The active-tab auto-scroll effect depends on `tabs`, so reordering
retriggered it and jumped the viewport back to the active tab. Guard it
with a ref so it only scrolls when the active tab id actually changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 02:57:21 +08:00
Arvin Xu fa58fd12a0 🔨 chore(testing): automate local auth setup (#15790)
🧪 test(agent): automate local auth setup
2026-06-14 02:00:49 +08:00
Rdmclin2 913ee4210d feat: page/agent/agentGroup/task edit lock (#15786)
* feat: support page editor lock

Squashed page-lock feature work:
- support page editor lock
- support agent group / agent / task edit
- add edit lock to agent/agentgroup/task
- refactor page lock
- fix workspaceId for edit objects
- align with agent/group/task

* fix: collaborative edit lock

* chore: update i18n

* fix: redis acquire

* fix: release lock

* fix: test case

* chore: complement page lock test cases
2026-06-14 01:40:36 +08:00
Arvin Xu 99411041b9 feat(device): share remote-device gateway RPC between desktop and CLI (#15780)
*  feat(device): share remote-device gateway RPC between desktop and CLI

Extract the desktop's remote-device gateway RPC surface into a shared
`@lobechat/device-control` package and wire it into the CLI so `lh connect`
serves the same git / workspace / file device RPCs as the desktop app.

- local-file-shell: relocate all git operations (branches, working-tree
  patches, branch diff, checkout/rename/delete/pull/push/revert) from the
  desktop GitCtr into the shared package as pure functions
- device-control (new): the `executeDeviceRpc` dispatch + workspace scan +
  portable file-preview / file-index defaults, with platform hooks injected
- desktop: GitCtr / WorkspaceCtr / GatewayConnectionCtr become thin wrappers
  delegating to the shared package (local IPC path unchanged)
- cli: handle `rpc_request` over the gateway via the shared dispatcher

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(device): cover git branch ops and device-control portable defaults

- local-file-shell: real-git integration tests for branch checkout / rename /
  delete (+ validation), working-tree files & patches, revert, branch-diff with
  no remote, and push / pull / ahead-behind against a bare origin
- device-control: defaultGetLocalFilePreview (text / image / accept filter /
  workspace containment / missing file) and defaultGetProjectFileIndex (git
  ls-files path + glob fallback)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(device): preserve directory entries in the glob project-file index

The CLI `getProjectFileIndex` glob fallback used `globLocalFiles`, which returns
only non-hidden file paths and no directory entries — so the Files tree builder
flattened nested files to the root and dropped dot-directories.

Walk with fast-glob (`dot: true`) and synthesize directory entries via the same
`collectProjectDirectories` path the git branch uses, so nesting and dot-dirs
(e.g. `.agents`) render correctly. Extracted a shared `buildEntries` helper.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 00:56:53 +08:00
YuTengjing 39bce329fd 🐛 fix: surface model list fetch failures (#15753) 2026-06-13 23:05:44 +08:00
Arvin Xu 55a969a3c1 🐛 fix(server): stabilize heterogeneous main message chaining (#15783)
* ♻️ refactor(server): reduce main heterogeneous persistence

* 🐛 fix(server): anchor hetero turns to latest tool row
2026-06-13 22:13:45 +08:00
Arvin Xu f51dd06a36 🐛 fix(model-runtime): classify "Agent state not found" as StateStoreReadError (#15778)
`coordinator.loadAgentState(operationId)` returning null throws a raw
`Error("Agent state not found for operation …")`, which (after the refine fix)
otherwise lands as a bare 500. It is a state-store READ failure, so route it to
StateStoreReadError alongside the caller-gone abort.

Because losing an operation's state is a genuine system fault (not benign
client abandonment), promote StateStoreReadError to countAsFailure: true /
severity: error. `ERR caller gone` now counts too — accepted trade-off, both
are system-side read failures worth tracking.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 21:11:33 +08:00
Arvin Xu 24e34c7545 Revert "🐛 fix(agent-document): support image LiteXML in headless editor (#15764)"
This reverts commit 3f3f12dbd2.
2026-06-13 20:29:35 +08:00
Arvin Xu 81d40b90d4 ♻️ refactor(chat): unify client hetero executor on a shared mainAgentReducer (#15762)
*  feat(hetero): add shared mainAgentCoordinator reducer

Pure, transactional main-agent run reducer mirroring subagentCoordinator.
Owns the asst→tool→asst chain rule (lastToolMsgIdEver) as the single source
of truth so client and server can converge on one processing flow. Not yet
wired into either interpreter.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(chat): drive client hetero executor via shared mainAgentReducer

Replace the renderer's hand-written main-agent event state machine with the
shared reduceMainAgent + an applyIntent interpreter (main + delegated subagent
intents). The executor keeps its shell (persistQueue/IPC ordering, optimistic
intervention UI, op usage-metrics tray, notifications, resume fallback) and
still forwards raw events to the gateway handler for live UI; durable DB writes
now flow through the reducer's intents, so the asst→tool→asst parent chain
(incl. the lastToolMsgIdEver toolless-step rescue) is a single shared source of
truth with the server.

Tool/assistant message ids are now pre-allocated by the reducer (matching the
subagent path); updated the executor tests to honor caller-provided ids and
assert against captured ids instead of mock-minted ones.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 📝 docs(chat): clarify why main-scope streamContent intent is a no-op

It's intentional, not dead code: main live token UI is driven by the raw
stream_chunk forward to the gateway handler; the intent only drives the
subagent thread bucket (whose events are dropped before that forward).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(chat): close two hetero executor races from reducer refactor

Two review-found bugs introduced by moving main-agent state into the queued
reduceAndApplyMain:

1. retryWithoutResume's hasStreamedState() read mainState, which is now only
   updated inside the queued reduce — so a recoverable resume error landing
   after partial output was queued (but before the queue drained) could start a
   second run and duplicate/interleave messages. Restore the old synchronous
   guarantee with a `sawStreamedEvent` flag set the moment a stream_chunk /
   tool_result arrives, before queueing.

2. A transient createMessage failure on a step-boundary assistant was
   best-effort (logged, not rethrown), so reduceAndApplyMain still committed
   currentAssistantId to a row that was never created — every later
   content/tool/result write then targeted a missing assistant and was lost.
   Rethrow so the commit is skipped and currentAssistantId stays valid, mirroring
   the subagent createMessage path.

Both guarded by regression tests that fail without the fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 20:10:51 +08:00
Arvin Xu 9cde29fb14 💄 style(workflow): inset partial warning badge (#15773)
* 💄 style(workflow): inset partial warning badge

*  feat(portal): support preview for local markdown images

* 🐛 fix(portal): narrow markdown image src
2026-06-13 20:10:08 +08:00
Arvin Xu ebe8411e7e 💄 style: compact device guard alert (#15776) 2026-06-13 20:09:16 +08:00
Arvin Xu 381e87474c feat(device): add rename & delete actions to branch switcher (#15774)
Hover a branch row in the branch switcher to rename or delete it. Wires
new renameGitBranch / deleteGitBranch operations through both transports
(Electron IPC for the local machine, device.* TRPC RPCs for remote/web),
mirroring the existing checkoutGitBranch / revertGitFile stack.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 20:07:45 +08:00
Arvin Xu 09fd6f3411 💄 style(chat): carousel the OpStatusTray generating phrase every 4s (#15775)
The generating status phrase was picked once per operation and stayed
frozen for the whole run. Rotate it like a carousel — advancing to the
next phrase every 4s with a subtle fade — so a long-running task feels
alive instead of stuck on one line.

- add pickRotatingStatusPhrase: seed keeps the starting phrase stable
  per operation, step advances the carousel; reuses the existing 1s
  elapsed ticker so no extra timer is needed
- fade/slide the phrase on each switch via a keyed wrapper span (keeps
  the shiny-text shimmer animation intact)

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 20:03:07 +08:00
Arvin Xu d9d9f44cb2 🐛 fix(model-runtime): classify untyped Error throws via message patterns (#15767)
* 🐛 fix(model-runtime): classify untyped Error throws via message patterns

`refineErrorCode` only re-derived a specific code when the incoming errorType
was `ProviderBizError`, so raw `Error` throws — which `formatErrorForState`
wraps as `InternalServerError` (HTTP 500) — never reached `matchErrorPattern`.
Persistence-layer (`Failed query: …`) and state-store drops therefore landed
as bare, un-classified 500s instead of `DatabasePersistError` etc.

Add the two un-typed fallback wrappers (`InternalServerError`, `AgentRuntimeError`)
to `REFINABLE_CODES` so their message runs through the pattern registry before
falling back. The existing `Failed query:` pattern already classifies these;
this just lets it run again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(model-runtime): classify Upstash readonly-upgrade & dropped-caller drops

Add `READONLY Writes are temporarily rejected` and `ERR caller gone` to the
StateStorePersistError pattern block — both are Redis/Upstash state-store
failures that otherwise fall through to a bare 500. They describe the
connection/server condition rather than a specific command, so there is no
read-vs-write signal to split on.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(model-runtime): split caller-gone state-store reads into StateStoreReadError

`ERR caller gone` is an Upstash reply when an in-flight blocking READ
(XREAD on the agent event stream, BLPOP on a tool result) is aborted because
the originating caller disconnected — a benign client abandonment tied to the
request lifecycle, not a write/persist fault. Bucketing it under
StateStorePersistError mislabelled it as a harness failure (attribution:
harness, countAsFailure: true).

Add a dedicated StateStoreReadError (E7007, attribution: system, severity:
warning, countAsFailure: false) and route `ERR caller gone` to it. The
write-side rejection `READONLY Writes are temporarily rejected` stays under
StateStorePersistError.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(model-runtime): scope HTTP-status fallback to provider catch-alls

Opening the un-typed wrappers (InternalServerError / AgentRuntimeError) to the
full refine path also let them hit the leadingStatusFromMessage /
codeFromHttpStatus fallback. A harness/DB/Redis throw like `Error('429 …')` or
`Error('500 …')` with no registered pattern would then be recast as
RateLimitExceeded / ProviderServiceUnavailable — provider retry/failure
semantics on a harness error.

Split the sets: PATTERN_REFINABLE_CODES (message matching) stays open to the
wrappers; STATUS_REFINABLE_CODES (the coarse HTTP-status bucket) is limited to
ProviderBizError, where a leading status is a real upstream signal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 19:16:43 +08:00
Arvin Xu 1244a40950 🐛 fix(chat): stop ContentLoading from leaking raw operation i18n keys (#15752)
Internal/bookkeeping operation types (createToolMessage, executeToolCall,
pluginApi, builtinTool*, callLLM, searchWorkflow, ...) have no `operation.*`
locale key, so ContentLoading fell back to rendering the raw key
(e.g. `operation.toolCalling...`).

Extract OpStatusTray's operation→activity mapping into a shared
`resolveOperationActivity` helper and reuse it in ContentLoading: mappable
ops show the localized `opStatusTray.status.*` phase label, container ops
keep their dedicated copy, and unmappable ones fall back to the dot loader.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 19:14:24 +08:00
Arvin Xu a48c2badd9 💄 style: improve shared Linear tool rendering (#15769) 2026-06-13 18:37:51 +08:00
Arvin Xu 3f3f12dbd2 🐛 fix(agent-document): support image LiteXML in headless editor (#15764) 2026-06-13 17:37:51 +08:00
Rylan Cai 99023811d8 📝 fix: clarify local system shell result wording (#15745)
* 🔥 remove local system listFiles exposure

* 📝 clarify local system shell result wording

* 📝 refine local system shell manifest copy

* 📝 simplify local system shell prompt semantics

* 🐛 fix command wait-window result wording

* 📝 limit transient device retry guidance

*  show command output duration

* 🏷️ narrow command duration result type

* 🐛 propagate operation id for device tool calls

* 🐛 update project skill discovery hint

* 📝 clarify project skill file access

* 📝 align project skill discovery comment
2026-06-13 16:34:10 +08:00
Arvin Xu 480a2979e1 🐛 fix(codex): parse retry time in stated timezone (#15758)
* 🐛 fix(codex): parse retry time in stated timezone

* 🐛 fix: enable remote git review panel

* 🐛 fix(codex): preserve adjacent retry meridiem
2026-06-13 16:32:35 +08:00
Arvin Xu 531900cf70 🐛 fix(desktop): detect bundled Codex CLI from Codex.app on macOS (#15759)
* 🐛 fix(desktop): detect bundled Codex CLI from Codex.app on macOS

OpenAI's Codex desktop app bundles the real codex CLI inside Codex.app
(Contents/Resources/codex) but never symlinks it onto PATH. A user with
only the desktop app installed failed PATH-based detection, so codex was
never spawned and the chat silently produced no reply.

Add a well-known install-location fallback inside detectHeterogeneousCliCommand
(tried after the PATH lookup, so a user's own install still wins), covering
both /Applications and ~/Applications. The fallback runs at detection time,
not module load, so it touches no node:os named exports on import. Feed the
detector-resolved absolute path through to spawn so a bare `codex` doesn't
ENOENT under spawn's leaner env.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(desktop): carry login-shell PATH into CLI spawn env

When the detector resolved a bare command via the login-shell PATH, only
the absolute shim path was kept; the PATH used for resolution was dropped.
spawn() then built its env from the leaner Finder-inherited PATH, so an
absolute shim with `#!/usr/bin/env node` still failed with
`env: node: No such file or directory` even though preflight succeeded
(npm/Homebrew/mise installs launched from Finder on macOS).

Surface the resolved PATH through ToolStatus.resolvedPathEnv, stash it on
the session, and merge it into spawnEnv (session.env still wins). Only set
when resolution fell back to the login-shell PATH, so the common on-PATH
case is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 16:32:27 +08:00
Arvin Xu c9325794e5 🐛 fix(portal): close empty local file preview (#15760) 2026-06-13 16:31:56 +08:00
Innei 4a11ed9887 ♻️ refactor(auth): migrate auth pages to a standalone lightweight SPA (#15689)
*  feat(oidc): add interaction details endpoint

*  feat(auth-spa): scaffold standalone auth SPA shell and build pipeline

* 🐛 fix(auth-spa): address review findings in AuthShell copies

*  feat(auth-spa): add spa-auth html route handler

* ♻️ refactor(auth-spa): migrate simple auth pages into auth SPA

* 🔒 fix(auth-spa): validate locale segment in spa-auth route

* ♻️ refactor(auth-spa): move verify-im route to main SPA

* 🔒 fix(auth-spa): sanitize callbackUrl, fix signup form wiring, add router error element

* ♻️ refactor(auth-spa): migrate oauth pages into auth SPA

* 🐛 fix(auth-spa): address oauth migration review findings

* ♻️ refactor(auth): route auth pages to standalone SPA and drop Next auth tree

* 🔒 fix(auth): validate locale before middleware rewrite

* 🔥 chore(auth-spa): drop unused messenger i18n namespace from auth shell

* ️ perf(build): share one react vendor bundle across web/mobile/auth SPA builds

Build react core (react, react-dom, react-dom/client, react/jsx-runtime)
once as a self-contained ESM bundle under /_spa/vendor-shared, then mark
those specifiers external in every SPA build and map them via rolldown
output.paths to the same hashed URLs, so the auth page warms the main
app's react cache. react-router-dom stays per-build: apps use ~19K of it
after tree shaking while a shared bundle must export all 252K.

Also split auth i18n namespaces into per-locale chunks, keep locale
runtime helpers out of the default locale chunk, and group packages/const
into app-const so vendor-ai-runtime no longer captures it.

* ♻️ refactor(spa): extract shared SPA html serving helpers

Both the main SPA and auth SPA route handlers duplicated the Vite dev
asset rewriting, analytics config assembly and html template rendering.
Move them into src/server/spaHtml.ts; the desktop umami block becomes an
opt-in flag only the main SPA enables.

* 🐛 fix(auth-spa): bundle default locale resources and disable i18n suspense to fix signin mount loop

*  feat(auth-spa): wrap auth shell with BusinessAuthProvider slot

* 👷 build(spa): support custom vite dev origin and mark SPA entries side-effectful

* 🔥 chore: drop dead /welcome entry from nextjsOnlyRoutes

* 🐛 fix(auth-spa): forward referral to signup and fix error boundary dark-mode contrast

* ♻️ refactor(spa): lift NextThemeProvider above RouterProvider so route error boundaries are theme-aware

* update
2026-06-13 16:15:04 +08:00
Arvin Xu be7b759820 🛠️ chore(agent-testing): add local dev env bootstrap (#15757) 2026-06-13 13:54:13 +08:00
Arvin Xu fa76928f62 🐛 fix: fix Codex resumed usage reporting for heterogeneous agents (#15751)
🐛 fix(heterogeneous-agent): normalize codex resumed usage
2026-06-13 13:34:41 +08:00
Arvin Xu f6db1361ee feat(agent): show topic sidebar status indicators (#15739) 2026-06-13 13:32:56 +08:00
Arvin Xu 5d6eaf53f3 📝 docs(agent-testing): require inline visual evidence (#15750) 2026-06-13 12:28:56 +08:00
YuTengjing c4e4469083 🐛 fix: improve fallback trace error UI (#15746) 2026-06-13 12:18:56 +08:00
Arvin Xu 800b534741 🐛 fix(chat): track operation usage in status tray (#15736) 2026-06-13 11:55:39 +08:00
Arvin Xu 03b9d07d0b feat(topic): add selector topic actions (#15744) 2026-06-13 11:53:21 +08:00
Arvin Xu f60d1fe8dd 🐛 fix(codex): reuse Linear inspector for MCP calls (#15738)
* 🐛 fix(codex): reuse Linear inspector for MCP calls

* 🐛 fix(codex): gate generic Linear MCP labels
2026-06-13 11:46:16 +08:00
YuTengjing e5a27dc97c 🐛 fix: handle Kimi code thinking mode (#15725) 2026-06-13 11:21:25 +08:00
Arvin Xu c7e0c83174 ♻️ refactor(agent-runtime): clarify virtual sub-agent naming (#15737) 2026-06-13 11:10:14 +08:00
Arvin Xu ab958a0b98 🐛 fix(chat): compact operation metrics on narrow inputs (#15735)
* 🐛 fix: compact operation metrics on narrow inputs

* 📝 docs: improve agent testing report template
2026-06-13 02:28:38 +08:00
Arvin Xu 5362be4078 ♻️ refactor(agent): split virtual sub-agent entry (#15733) 2026-06-13 02:10:47 +08:00
Arvin Xu 6887930428 🐛 fix: resolve local markdown image assets (#15729)
* 🐛 fix: resolve local markdown image assets

* 🐛 fix: preserve UNC markdown asset paths

* 🔒️ fix: restrict markdown image previews to images

* ♻️ refactor: pass markdown image preview accept directly
2026-06-13 01:55:00 +08:00
Arvin Xu da94942d9c 🐛 fix(portal): scope local file tabs by working directory (#15732) 2026-06-13 01:54:44 +08:00
Arvin Xu a9141c8ade 🐛 fix(page): stabilize agent editor sync (#15730) 2026-06-13 01:36:38 +08:00
R3pl4c3r 8ab5ec5364 🐛 chore(workflow): fix Upstream Sync workflow running error (#15706)
fix(workflow): fix Upstream Sync workflow running error
2026-06-13 01:29:44 +08:00
Arvin Xu 222534dbe1 🐛 fix(agent): block recursive server sub-agents (#15731) 2026-06-13 01:24:41 +08:00
Neko f31c94490d ️ perf(app,database): derive topic activity from messages (#15726) 2026-06-13 00:57:45 +08:00
Rdmclin2 52eaf2702e 🐛 fix: workspace url sync (#15728)
* fix: workspace url sync

* chore: remove billing as personal
2026-06-13 00:15:48 +08:00
YuTengjing ce81ea44bf 🐛 fix: gate inbox unread count by login state (#15724) 2026-06-12 23:32:14 +08:00
Tsuki 29974d3ab9 🐛 fix(mobile): preserve authenticated legacy unregister cleanup (#15723)
Follow-up to #15719 addressing a Codex P2 review note.

After #15719, legacy v1.0.7 clients that only send `deviceId` were
silent-OKed unconditionally. But `publicProcedure` still receives
`ctx.userId` from `createLambdaContext` — and in the *active*
sign-out path (the user is still authenticated when logout fires)
that userId is valid. Skipping the delete in that case orphans the
existing `(userId, deviceId)` row, so `PushChannel.deliver` keeps
fanning notifications out to a signed-out device. Expo's
`DeviceNotRegistered` receipt only fires on uninstall, not on
logout, so the cron worker doesn't catch this either.

Fix: add a Path B fallback — when `ctx.userId` is available, run
the original `(userId, deviceId)` delete. Path A (expoToken pair)
still wins when present; Path C (silent OK) is now reserved for
the case the original PR was actually targeting: a v1.0.7 client
whose session is already gone, which is the source of the 401
storm.

Path matrix:
  expoToken present                  → Path A: precise delete by (expoToken, deviceId)
  no expoToken, ctx.userId present   → Path B: legacy (userId, deviceId) delete
  no expoToken, no session           → Path C: silent OK, cron cleans up

Tests added:
- legacy + valid session → falls back to (userId, deviceId)
- legacy + no session    → silent OK
- expoToken always takes precedence over userId fallback
2026-06-12 21:58:23 +08:00
Tsuki f4c431b028 🐛 fix(mobile): stop pushToken.unregister 401 storm (#15719)
Symptom: app.lobehub.com production logs show ~50+ TRPCError
UNAUTHORIZED traces per second on /trpc/mobile/pushToken.unregister,
starting from the v1.0.7 mobile release. Only `unregister` is hit
— `register` never appears in logs.

Root cause: the v1.0.7 client calls unregister *during* sign-out,
after the session is already invalid in practice (expired OIDC
token / cleared cookie). With authedProcedure gating, every logout
turns into a 401 that the client mistakes for an auth-expired
event and retries → a storm. Inside the client this also creates
a logout → 401 → authExpired.redirect → logout recursion.

Fix: change `unregister` to publicProcedure and authorize by the
(deviceId, expoToken) pair the client received at registration —
holding both is proof of ownership of that row, same trust model
as APNs/FCM unregister. Legacy v1.0.7 clients that only send
deviceId get a silent 200; the stale row is cleaned up by the
existing `process-push-receipts` worker via Expo's
DeviceNotRegistered receipts.

Returning 200 to those legacy calls also breaks the client-side
recursion at the source — the in-the-wild v1.0.7 fleet stops 401
flooding the moment this ships, before users update.

Tests:
- Router (mocked): expoToken path deletes by (expoToken, deviceId);
  no-expoToken path silently succeeds; unauthenticated caller
  succeeds; empty-string fields rejected.
- Model (integration): only the row matching both fields is
  removed; mismatched expoToken is preserved (defense against
  callers who only guess deviceId).

Fixes LOBE-10174
2026-06-12 21:47:19 +08:00
Innei 34fbd9ffd3 feat(document): coalesce autosave history versions into 10-minute windows (#15716)
*  feat(document): coalesce autosave history versions into 10-minute windows

*  feat(document): break autosave history window on new page load session
2026-06-12 20:55:28 +08:00
Arvin Xu 09b5e926bf feat(conversation): add op status tray above chat input (#14737)
*  feat(conversation): add op status tray above chat input

Show elapsed time, total tokens, and total cost while an AI-runtime
operation is running in the current conversation. Lives in the floating
overlay above the chat input alongside QueueTray and TodoProgress,
attaches flush to the input panel below.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* 🐛 fix(conversation): read top-level message.usage in op status tray

Token totals stayed at 0 during regular agent runs because the standard
agent path writes usage to `message.usage` (top-level) while the
heterogeneous executor writes `metadata.usage`. Read both. Also drop the
fragile createdAt window — assistant messages can be created before the
AI_RUNTIME op's startTime, which excluded otherwise-valid rows — and
aggregate across the whole conversation instead.

UI: a little more padding, a pulsing dot to mark the running state, a
tokens label, and a divider between tokens and cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

*  feat(conversation): streaming phase, ping dot, and richer metrics in op status tray

- Left side now shows the current streaming phase (thinking / calling tools /
  searching / compressing / generating) derived from the most recent running
  sub-operation; server runtimes surface no sub-ops on the client and fall
  back to 'generating'.
- Pulse dot upgraded to an expanding ping ring animation.
- Zero-valued metrics are hidden entirely (no more '0 tokens / $0').
- Long-running tasks additionally surface turns and tool-call counts next to
  tokens and total cost.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(conversation): polish op status tray display

* 💄 style(conversation): unify op status tray glyph to a single hue

The activity glyph mixed purple and cyan accents into the primary color;
all layers now derive from colorPrimary alone (opacity-only variation).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(conversation): strip glyph halo fill and drop-shadow

The halo's tinted fill plus the drop-shadow rendered as a muddy disc
behind the glyph (worst in light theme). Reduce to a breathing core dot
plus a single rotating dashed orbit, primary hue only.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(conversation): drop dollar prefix and code font in op status tray

The dollar icon already conveys currency, and the code font made the
numbers feel out of place next to the body text.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(conversation): show per-message cost next to the token chip

Renders usage.cost beside the token count in the assistant message
footer; hidden in credit mode (credits already express cost) and when
the value is zero/absent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(conversation): hide per-message cost below $0.20

Cheap messages don't need a cost callout — the chip only surfaces once
the cost is large enough to matter.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(conversation): anchor reconnected op timer to real run start, surface steps

- Page-refresh reconnect recreated the gateway operation with
  startTime=Date.now(), resetting the tray timer to 00:00 mid-run.
  Anchor it to the assistant message's createdAt instead.
- Mirror the server's authoritative stepIndex onto op.metadata.stepCount
  at every step_start event, so the steps metric shows for real
  server-side runs (and survives reconnects).
- Drop the tool-call count metric from the tray.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  test(conversation): stub updateOperationMetadata in gateway event handler mock store

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-12 18:10:29 +08:00
Innei d3e8e7cb65 🐛 fix(locale): support eager dayjs locale modules (#15711) 2026-06-12 16:57:42 +08:00
Rdmclin2 60bed5782f chore: update i18n (#15712)
chore: update i18n files
2026-06-12 16:21:34 +08:00
Rdmclin2 35b6bc55b8 🐛 fix: workspace error (#15701)
feat: support workspace (page author, copyTo/transferTo, notifications, i18n & fixes)

Squashed 13 commits from fix/workspace-error for clean rebase onto main's submodule base.
2026-06-12 16:08:31 +08:00
Innei 365dd1ff64 ️ perf(build): remove sitemap generation to cut static export time (#15702)
* ️ perf(build): remove sitemap generation to cut static export time

The sitemap accounted for 772 of 827 prerendered pages, each fetching
marketplace data at build time. Static generation drops from 28.2s to
0.3s and total next build from ~59s to ~32s.

* Redirect legacy sitemap URLs to the landing site

* Redirect sitemap index to landing sitemap
2026-06-12 15:17:52 +08:00
Innei 7633c0e83f 🐛 fix(share): always serve desktop bundle for share routes (#15710) 2026-06-12 14:54:18 +08:00
LiJian 87b1f39c0f feat(skill): add delete/remove actions to settings/skill items (#15708)
*  feat: add delete/uninstall actions to settings/skill items

- LobehubSkillItem: show compact `...` dropdown in list mode for connected items with Disconnect action (revokes OAuth)
- KlavisSkillItem: show compact `...` dropdown in list mode for connected/pending servers with Remove action (true delete via removeKlavisServer)
- ConnectorDetail: add Delete button for custom (mcp) connectors; calls deleteConnector + notifies parent via onDelete
- SkillDetail / Page: thread onDelete callback so selecting null after deletion triggers auto-select of next item
- Locales: add tools.klavis.remove / removeConfirm.title / removeConfirm.desc in en-US, zh-CN, and default source

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(skill): gate Klavis remove by canEdit and clear selected after removal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(skill): show dropdown for all Klavis/Lobehub items in list mode

Previously, the ... button was gated behind `server` (Klavis) and
`isConnected` (LobehubSkill), so disconnected/never-connected items
showed no actions. Remove those guards so the dropdown always renders
in list mode. handleRemove/handleDisconnect now skip the server call
when no server instance exists and instead clear the selected item.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(skill): move delete/uninstall actions from list dropdown to detail panel

- Remove heavy ... dropdown from KlavisSkillItem / LobehubSkillItem list items
- Add danger Uninstall button to builtin-skill detail header (matches ConnectorDetail style)
- Add slim action bar with Uninstall to agent-skill detail panel
- All actions respect canEdit / canCreate permissions with confirmModal gating

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-12 12:38:22 +08:00
LiJian ca91d2d756 refactor: replace Segmented tabs with SearchBar in ProfileEditor; gate local-system injection (#15593)
* 🐛 fix: activator tool discovery for cloud-sandbox and local-system

- P0: Explicitly inject LocalSystemManifest when device gateway is configured
  (discoverable: isDesktop is always false on server, so it never enters
  the discovery loop. The explicit injection mirrors the canUseDevice guard.)

- P1: Skip CloudSandboxManifest when runtimeMode is not 'cloud'
  (resolveRuntimeMode unifies executionTarget='sandbox' and legacy
  chatConfig.runtimeEnv.runtimeMode paths, so agents with sandbox
  disabled correctly exclude the cloud-sandbox tool.)

Both fixes operate at the manifest-map build stage, consistently affecting
all downstream consumers (activator discovery, availableTools, etc.)

* 🐛 fix: remove cloud-sandbox manifest when runtime is not sandbox

The initial manifest seed via getEnabledPluginManifests includes
defaultToolIds (which contains lobe-cloud-sandbox), so the manifest
was already in toolManifestMap before the allowedBuiltinTools loop's
continue guard. This made lobe-cloud-sandbox activatable even when
sandbox was disabled.

Add a delete right after resolveRuntimeMode to cover both the
manifestMap seed and the allowedBuiltinTools loop in one place.

Co-authored-by: chatgpt-codex-connector[bot]

* ♻️ refactor: replace Segmented tabs with SearchBar in ProfileEditor tool dropdown

- PopoverContent: replace Segmented with SearchBar + internal client-side filtering (same pattern as ChatInput ActionBar)
- AgentTool: remove ~270 lines of duplicated installedTabItems useMemo; pass unified items
- AgentTool: add auto-cleanup for stale plugin identifiers in agent config
2026-06-12 11:18:44 +08:00
Arvin Xu 61586b9377 🐛 fix(agent): persist & deliver image attachments for device/sandbox hetero runs (#15685)
* 🐛 fix(agent): persist file attachments in hetero early-exit user message

The hetero-agent early exit in execAgent created the user message without
the `files` relation, so attachments sent from the SPA gateway path
(executionTarget=device / sandbox) were never linked via messagesFiles and
disappeared once the optimistic client message was replaced by the server
snapshot. Attach the deduped `fileIds` the same way sendMessageInServer
does on the local-mode path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): deliver image attachments to device/sandbox hetero runs

Persisting the messagesFiles relation fixed display, but the dispatched
CLI still never saw the image — local mode feeds the persisted imageList
into sendPrompt for vision, while the device/sandbox dispatch protocols
(agent_run_request / sandbox runner) only carried a text prompt.

- resolve attached images into signed URLs in the hetero early exit
  (metadata-only, non-fatal) and carry them through heteroParams
- add imageList to the agent_run_request wire type and dispatchAgentRun
  params (gateway client + server service)
- extract buildHeteroExecStdinPayload into @lobechat/heterogeneous-agents
  so the three dispatch sites (desktop spawnLhHeteroExec, lh connect
  daemon, server sandbox runner) build the same content-block payload:
  systemContext, prompt, then image blocks
- lh hetero exec already coerces image blocks via coerceJsonPrompt and
  normalizeImage (url → base64 for Claude Code, materialized path for
  Codex), so no CLI consumer changes are needed

openclaw/hermes (runHeteroTask) keep text-only prompts — their dispatch
goes through a separate one-shot tool protocol.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(heterogeneous-agents): move exec stdin wire contract to a pure /protocol entry

The server sandbox runner imported `buildHeteroExecStdinPayload` through the
`/spawn` barrel, which (with no `sideEffects` hint) bundles the whole spawn
machinery into the Next.js server chunk. Its `process.cwd()`-rooted dynamic
fs calls then make Vercel's output file tracing glob the entire repo source
tree into every serverless function (+~69 MB each), pushing the 4 largest
functions past the 250 MB uncompressed limit and failing the deployment.

Split the dispatch wire contract (stdin payload builder + content-block
types) into a new pure, isomorphic `/protocol` export and point all three
dispatch sites (server sandbox runner, desktop main, `lh connect` daemon) at
it. `/spawn` re-exports the moved symbols so executor-side callers are
unaffected. Also declare `sideEffects: false` for the package.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 00:02:51 +08:00
Arvin Xu eca449e4e2 feat(skills): agent-testing iteration after first real-world run (#15700)
* 📝 docs(skills): make agent-testing Step 0 an env-setup + auth checklist

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(skills): agent-testing probes, GIF evidence, and report-language rule

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 23:52:25 +08:00
renovate[bot] 6c8976b641 Update dependency vitest to v3.2.6 [SECURITY] (#15698)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 23:34:38 +08:00
Arvin Xu 60d9d3c3c7 ♻️ refactor(skills): merge local-testing and cli-backend-testing into agent-testing (#15699)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 23:14:45 +08:00
Arvin Xu 2dd4cf7a1d fix(agentDocument): replace getDocuments with listDocuments in useFetchAgentDocuments to avoid over-fetching (#15301)
* fix(agentDocument): listDocuments returns templateId and derived fields

* fix(agentDocument): useFetchAgentDocuments use listDocuments instead of getDocuments

* fix(agentDocument): derive AgentDocumentItem from listDocuments return type

* fix(agentDocument): export AgentDocumentListItem type

* 🐛 fix(agentDocument): align list projections and consumers after rebase onto canary

- listDocumentsForTopic now returns the same projection as listDocuments
  (derived fields + templateId), so the tRPC union no longer collapses
  the inferred client type to the old 8-field shape
- add description/updatedAt to both projections for sidebar consumers
- AgentDocumentsGroup switches getDocuments -> listDocuments (it already
  shared the documentsList SWR key)
- makePendingDocument trimmed to the lean list item shape
- update useFetchAgentDocuments test to the listDocuments behavior

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agentDocument): migrate agentDocumentSkills sync to slim listDocuments

The tool store's skill registry sync shared agentDocumentSWRKeys.documentsList
with the working sidebar and the new useFetchAgentDocuments hook, but still
fetched the full getDocuments payload. Sharing one SWR key across different
payload shapes made the cached result order-dependent: whichever consumer
mounted first decided whether the cache held the heavy full documents or the
slim list items. Migrate the skills sync to listDocuments, whose projection
covers every field mapDocsToSkills reads.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:41:24 +08:00
Arvin Xu 575ef1e8ee ♻️ refactor(agent): single-track device-tool injection via execution plan (#15683)
* ♻️ refactor(agent): single-track device-tool injection via execution plan

P3 follow-up to #15669 — downstream layers now consume the resolved
ExecutionPlan instead of re-deriving device capability:

- ExecutionPlan carries the effective `target`; persisted into
  state.metadata.executionPlan via createOperation
- call_llm executor gates buildStepToolDelta's activeDeviceId signal on
  the plan (none/sandbox can never re-inject local-system mid-run)
- AgentToolsEngine consumes the plan's target; redundant rule-level
  canUseDevice checks removed (physical manifest walls remain)
- builtin agent runtime config can now override agencyConfig
  (web-onboarding pins executionTarget=none)
- hetero desktop 'local' selection persists this desktop's deviceId so
  opening the agent from web dispatches to the same machine via gateway
- 'local' vs 'device' stay distinct user choices even for the same
  machine: gateway dispatch streams progress to all clients (mobile),
  IPC is faster but desktop-session-only — guarded by a regression test

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): enforce device access policy on hetero dispatch

resolveDeviceAccessPolicy now runs BEFORE the hetero early exit and feeds
canUseDevice into the hetero execution plan: a denied sender (external
bot user) degrades local/device-bound CLI hetero runs to the cloud
sandbox instead of dispatching to the owner's machine, and requestedDeviceId
cannot bypass the policy. Remote hetero agents (openclaw/hermes) are
device-only with no sandbox fallback, so denied senders are refused
outright.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(agent): fix interface field order in RuntimeSelectionContext

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 22:39:11 +08:00
YuTengjing ba6976c063 🐛 fix: pause input completion after errors (#15692) 2026-06-11 22:05:45 +08:00
Innei bfdfd3bca3 🐛 fix(desktop): adjust mac fullscreen titlebar spacing (#15693) 2026-06-11 22:02:48 +08:00
YuTengjing f6c23e3654 🐛 fix(agent-runtime): persist assistant reasoning to DB (#15690) 2026-06-11 21:05:23 +08:00
Arvin Xu 813d756b9c 🐛 fix(editor-canvas): re-check editor init state before subscribing (#15686)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:42:28 +08:00
renovate[bot] 671bc26e0d Update opentelemetry-js-contrib monorepo (#13582)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 20:41:48 +08:00
renovate[bot] 309c25cb44 Update dependency code-inspector-plugin to v1.3.6 (#14612)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 20:41:40 +08:00
Arvin Xu a810bf3dcd 🐛 fix(agent-runtime): always persist assistant reasoning to DB (#15687)
* 🐛 fix(agent-runtime): always persist assistant reasoning to DB

PR #13494 gated message reasoning persistence behind preserveThinking
(agent chatConfig + model extendParams / qwen|zhipu fallback). That gate
is only meant to control whether reasoning is replayed into the next LLM
payload — applying it to the DB write dropped thinking content for every
non-qwen/zhipu reasoning model in server-side agent mode: reasoning
streamed live via stream_end but vanished after refresh.

Restore unconditional reasoning persistence in messageModel.update and
keep the preserveThinking gate only for state.messages payload replay.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(i18n): localize callSubAgent tool labels

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:41:29 +08:00
Arvin Xu 7d6be512b8 🐛 fix(model-runtime): align tool-calling fallback tests & surface missing tool call as error (#15691)
*  test(model-runtime): align tool-calling fallback tests with new return shape

#15680 changed generateObject's tool-calling fallback to return the parsed
schema object (same shape as the json_schema path) instead of an array of
tool calls, and reworked its error handling, but left the pre-existing
"tool calling fallback" block in index.test.ts asserting the old behavior,
breaking CI on canary:

- result is now the parsed object, not [{ name, arguments }]
- the no-tool-call path returns undefined via debug log without console.error
- the parse-failure path logs the single matched tool call, not the array

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(model-runtime): surface missing tool call in generateObject fallback as error

tool_choice forces the structured-output function, so a response without a
tool call means the provider misbehaved. #15680 routed this branch to a
debug-namespace log that is invisible in production, leaving callers with
an unexplained undefined. Log it via console.error with the response
message as context, matching the parse-failure branch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 20:40:12 +08:00
LiJian 1130f7df32 feat(devices): add browser device pairing flow (#15678)
*  feat: add browser device pairing flow to /settings/devices

- Add "Via Browser" tab to ConnectDeviceModal with pairing code display and input
- Add "Register this browser as a device" callout card above DeviceList
- Support ?pair=<code> URL param to auto-open browser pairing modal with pre-filled code
- Improve DeviceList empty state with method cards (Desktop + CLI)
- Ship en-US and zh-CN i18n keys for all new browser/sync strings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🔨 fix(devices): fix lint warnings — import sort order and empty catch block

* fix(devices): add pair API route and invalidate device list cache

- Create /api/devices/pair POST handler that authenticates the user via
  Better Auth session, validates the code against the user's registered
  devices via DeviceModel.findByDeviceId, and returns JSON.
- Replace the setListKey/key-prop re-mount trick with
  lambdaQuery.useUtils().device.listDevices.invalidate() so the tRPC
  React Query cache is properly busted after a successful pair (fixes
  staleTime: 30s preventing the new device from appearing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ♻️ refactor(devices): drop browser pairing, fix modal close, redesign UI

- Remove the "Via Browser" pairing flow entirely: browser tab in
  ConnectDeviceModal, the "register this browser" callout card, the
  ?pair=<code> deep-link, and the /api/devices/pair stub route. Only the
  real Desktop and CLI connection methods remain.
- Fix the modal that couldn't be closed: @lobehub/ui Modal closes via
  onCancel (antd), not onClose — the X button was a no-op.
- Redesign the connect modal (segmented tabs, numbered steps, command
  blocks with copy, security footer) and the empty state (onboarding
  hero with Desktop/CLI options + capability cards).
- Clean up browser/sync i18n keys; add capabilities + footer keys for
  en-US and zh-CN.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 fix(devices): apply card radius — cssVar.borderRadius already has unit

The radius tokens (cssVar.borderRadius / borderRadiusLG) already include
their unit, so the trailing `px` produced `var(--…)px`, which browsers
drop — leaving the cards with sharp corners. Drop the `px` so the cards
pick up the same rounded radius as the appearance settings FormGroup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-11 19:50:28 +08:00
Arvin Xu e20496e444 🐛 fix(codex): persist model metadata and file diffs (#15672)
* 🐛 fix(codex): persist model metadata

* 🐛 fix(codex): show file change diffs
2026-06-11 19:15:45 +08:00
Innei dbc8d76c8d feat(desktop): restore cloud desktop builds (#15666) 2026-06-11 19:14:26 +08:00
renovate[bot] ecfdac5395 Update dependency @opentelemetry/sdk-node to ^0.217.0 [SECURITY] (#14687)
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 18:51:20 +08:00
YuTengjing 5f4bec347b 🐛 fix(model-runtime): improve DeepSeek structured output (#15680) 2026-06-11 16:57:57 +08:00
Arvin Xu 77e4d0492b ♻️ refactor(agent): resolve device routing via a single execution plan (#15669)
- add resolveExecutionPlan as THE device decision (none/sandbox never
  route to a device; offline bindings stay unrouted; single-online-device
  auto-activation only for device-capable targets)
- fix executionTarget=none being bypassed by single-device auto-activation
  (background runs executed device tools despite 无设备)
- stop exposing the remote-device proxy in none/sandbox sessions
- converge native execAgent, hetero dispatch fork and client
  selectRuntimeType onto the shared resolution
- drop the legacy per-platform chatConfig.runtimeEnv.runtimeMode fallback
  entirely (no migration: unset targets resolve to platform defaults)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 16:29:37 +08:00
Neko a60d11df48 🐛 fix(chat): preserve message order after tool results (#15657) 2026-06-11 16:18:18 +08:00
YuTengjing 14501ea69a 🐛 fix: keep model guard in provider grouping (#15681) 2026-06-11 15:35:15 +08:00
Arvin Xu b76992e581 feat(file-preview): support remote read-only local previews (#15673)
*  feat(file-preview): support remote read-only local previews

*  feat(local-file): identify tabs by context

* ♻️ refactor(file-preview): route previews through project file service

* 🐛 fix(desktop): clamp nav panel width

*  feat(file-preview): improve local preview controls

* 🐛 fix(file-preview): reload html after refresh completes
2026-06-11 15:10:25 +08:00
Arvin Xu 97e4e345d1 🔨 chore(codecov): update coverage grouping (#15650)
🔨 chore: update codecov coverage grouping
2026-06-11 14:40:06 +08:00
cokeSEE1 c609a60f0e 🔨 chore(ci): bump outdated action versions to latest (#15655)
- actions/checkout@v4 -> @v6 in issue-auto-comments.yml
  (last remaining @v4 usage; all other 48 uses are already @v6)
- actions/github-script@v7 -> @v8 in release-desktop-canary.yml
  (last remaining @v7 usage; all other 4 uses are already @v8)

Co-authored-by: 章岚 <zhanglan@datagrand.com>
2026-06-11 09:54:53 +08:00
renovate[bot] 06bf82f3e0 Update dependency node to v24.16.0 (#14621) 2026-06-11 09:24:21 +08:00
Zhijie He 3ccc23152c 💄 style: add sensenova-6.7-flash-lite & sensenova-u1-fastsupport (#15306) 2026-06-11 09:22:49 +08:00
Zhijie He 3a780a62f6 feat: add AntGroup (蚂蚁百灵) provider support (#13713) 2026-06-11 09:21:54 +08:00
Zhijie He e98ad7edca 💄 style: update models for Longcat, support api fetch model list (#15134) 2026-06-11 09:20:55 +08:00
Arvin Xu 686778fe51 feat(file-preview): render HTML files inline (#15671)
 feat(file-preview): render html files inline
2026-06-11 02:39:05 +08:00
Arvin Xu 914976a52f feat(model-bank): knowledgeCutoff batch 2, metadata skill & always-visible tab bar (#15663)
*  feat(model-bank): backfill knowledgeCutoff batch 2 and restore lost Anthropic values

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(skills): add model-bank-metadata skill for cutoff/family backfill

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(model-bank): Claude Fable 5 belongs to the claude-mythos family

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 💄 style(desktop): always surface the tab bar by creating a tab on first navigation

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ♻️ refactor(model-bank): family is the product lineage (claude-opus/sonnet/haiku), not the brand

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): backfill activeAgentId before paint on tab/route switches

Tab switches are plain route navigations, so leaving an agent page cleared
activeAgentId via a passive useUnmount and the next page re-set it in a
passive useEffect — the first painted frame always had no active id, flashing
a skeleton even when agentMap already cached the config. Move both the
backfill and the unmount clear to layout effects: removed-tree layout
cleanups run before new-tree layout effects in one commit, so the clear can
never wipe a freshly synced id and the id is in place before paint.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(agent): surface agent config fetch errors with a retry action

isAgentConfigLoading only knows "no data yet", so a failed fetch (e.g. a 401
that SWR deliberately does not retry, with no focus revalidation inside a
single Electron window) left the agent page on a skeleton forever — only a
manual reload recovered. Record per-agent fetch errors in
agentConfigErrorMap (set by onError, cleared on data / retry), expose
currentAgentConfigError / isAgentConfigError selectors, add a
retryAgentConfigFetch action that revalidates the agent's SWR entries, and
show an error alert with a retry button above the main chat input while the
config is still missing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(ci): sync model metadata test expectations

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 01:29:17 +08:00
Arvin Xu fdd955404d feat(codex): add collab tool render (#15662)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 01:15:29 +08:00
LiJian 6d47c1d07e feat(connector): fold OAuth into the custom MCP (PluginDevModal) form (#15661)
*  feat(connector): support API key / custom header / OAuth auth in custom connector

Make the connector backend a full replacement for the legacy custom-MCP plugin form:

- connector create/update now accept bearer/apikey/header credentials (encrypted at rest);
  oauth2 stays callback-only
- map apikey → bearer auth and header → request headers in both the sync path
  (syncTools + callTool) and the agent-runtime manifest path
- pass custom HTTP headers through to the MCP client
- AddConnectorModal becomes a rich form: MCP type (HTTP/STDIO), auth type
  (None / API Key / Custom Headers / OAuth), reusing the plugin form inputs;
  OAuth keeps the existing popup authorize flow, others create + sync directly

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(connector): fold OAuth into the PluginDevModal MCP form

Pivot the custom-MCP entry to reuse the rich PluginDevModal / MCPManifestForm
instead of a bespoke connector modal, and add OAuth as an auth type inside it:

- MCPManifestForm: gated `enableOAuth` adds an "OAuth" auth type with
  Client ID / Secret (optional) + redirect-URI hint. Only the custom-connector
  entry enables it, so plain custom-plugin DevModal callers (editing plugins,
  agent tools, …) are unaffected.
- DevModal: opens the OAuth popup synchronously on the save click (browsers
  block window.open once an async boundary is crossed), validates, then hands
  the popup to onSave which navigates it to the authorize URL.
- New CustomConnectorModal wraps DevModal and persists every auth type onto the
  connector backend (none / bearer / custom headers → create + sync; OAuth →
  create with OIDC config + run the authorize popup).
- settings/skill entry now opens CustomConnectorModal; the standalone
  AddConnectorModal rich rewrite from the previous commit is reverted to the
  canary original (it is only referenced by the unused ConnectorList).
- i18n: dev.mcp.auth.oauth* keys (default + en-US + zh-CN).

Backend stays as in the prior commit (connector create/update accept
bearer/apikey/header credentials; sync + manifest paths apply them).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(connector): route the OAuth auth type through the authorize flow, not the token-less manifest test

Selecting OAuth and clicking "Test connection" called the plugin manifest test
(getStreamableMcpServerManifest), which connects with no token and 401s on any
OAuth-gated server (e.g. Linear MCP / DCR). For OAuth there is nothing to test
without authorizing first, so the button now becomes "Authorize & Connect" and
runs the connector OAuth flow (discovery + DCR + authorize popup), shared with
the footer save button via DevModal.runOAuthFlow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(connector): make connector.create idempotent on (user, identifier)

Re-adding or re-authorizing a custom connector with an existing identifier hit
the user_connectors unique constraint and 500'd. Now an existing row is updated
(reset to disconnected, refreshed name/url/oidcConfig/credentials) and its id
reused, instead of inserting a duplicate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(skill-store): route Add Custom MCP through the connector modal, drop the Custom tab

- Skill Store "Add → Add Custom MCP Skill" now opens CustomConnectorModal
  (connector backend + OAuth), matching the settings/skill entry, instead of
  the legacy plugin DevModal (installCustomPlugin + togglePlugin).
- Remove the now-redundant "Custom" tab from the Skill Store (custom MCP lives
  in the connector list now): drop SkillStoreTab.Custom, its tab option,
  CustomList render, and the matching search branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 01:00:38 +08:00
renovate[bot] c65cf8c2a0 Update dependency @opentelemetry/auto-instrumentations-node to ^0.76.0 [SECURITY] (#14686)
Update dependency @opentelemetry/auto-instrumentations-node to ^0.75.0 [SECURITY]

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
2026-06-11 00:09:31 +08:00
Arvin Xu 981c57d6f9 🐛 fix(codex): scope repeated tool results (#15659)
* 🐛 fix(codex): scope repeated tool results

* 💄 style(codex): refine local file link states
2026-06-10 23:22:56 +08:00
Arvin Xu 87eba86514 chore(model-bank): backfill knowledgeCutoff + family/generation data (#15642)
*  feat(model-bank): backfill knowledgeCutoff for OpenAI/Claude/Llama/Phi families (batch 1)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(model-bank): add family/generation fields with rule-derived data for chat models

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(model-bank): add canonical knowledge-cutoff map with build-time backfill

Adds MODEL_KNOWLEDGE_CUTOFFS (canonical id → YYYY-MM, all values verified
against official provider docs) plus normalizeModelIdForCutoff, which reduces
provider-specific spellings (openrouter/bedrock prefixes, dated snapshots,
-thinking/-fast/-latest/-preview variants, claude dot-versions) to canonical
ids. buildDefaultModelList backfills knowledgeCutoff from the map when a model
card has no inline value, so all aggregator providers inherit cutoffs
automatically; inline values always win.

Covers Anthropic (incl. legacy 3.x), OpenAI, Google Gemini/Gemma, xAI Grok,
Meta Llama, Amazon Nova, and Cohere. DeepSeek/Qwen/GLM/Kimi/MiniMax/Mistral
publish no official cutoffs and are intentionally absent. Anthropic inline
PoC entries migrate into the map (single source of truth).

Cross-checked against the batch-1 inline backfill: 0 value mismatches.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(model-bank): correct Claude Sonnet 4.6 cutoff

*  test(model-bank): sync metadata expectations

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:59:36 +08:00
Rdmclin2 09e6f02e45 🔨 chore: modify workspace sidebar (#15658)
* chore: change back to user style sidebar panel

* chore: optimize personal menu

* chore: update i18n files
2026-06-10 22:21:27 +08:00
Arvin Xu a2ea314cd8 feat(codex): refine Codex tool renders (#15651)
* 💄 style(codex): refine file change tool render

*  feat(codex): add web search tool render

*  feat(codex): add mcp tool render

*  feat(codex): improve tool command display

* 💄 style(files): refine explorer tree icons

*  test: fix local file link render props
2026-06-10 22:13:56 +08:00
Arvin Xu e2be720726 🐛 fix(agent-runtime): keep async sub-agent stream alive (#15646)
* 🐛 fix: keep async sub-agent stream alive

* 🐛 fix: preserve async tool resume parent chain
2026-06-10 22:12:22 +08:00
Arvin Xu 8b6905ec7e 💄 style(desktop): tighten tab close button right padding (#15636)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 22:12:02 +08:00
Arvin Xu e4830943cf 🔨 chore(model-bank): add knowledgeCutoff field to model cards (#15640)
*  feat(model-bank): add knowledgeCutoff field with Anthropic models as PoC

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

*  feat(model-bank): add family/generation fields to model card types

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 20:02:34 +08:00
Arvin Xu 5dfb6fc288 chore: clean [LOBE-XXX] code annotations (2026-06-10) (#15623)
chore: clean up [LOBE-XXX] code annotations (2026-06-10)

Remove LOBE-XXX markers from comments and URLs across 7 files:
- apps/cli/hetero.ts & hetero.test.ts: Remove LOBE-10157 markers, keep context
- apps/server/ModelRuntime: Remove LOBE-10056, keep PK migration note
- packages/database/rbac.ts: Remove LOBE-9193, keep API doc
- scripts/codemodWorkspaceNav.ts: Remove LOBE-9024 from description
- parse.ts & parse.test.ts: Replace LOBE-10141/LOBE-123 with generic IDs

Co-authored-by: lobehub-bot <lobehub-bot@users.noreply.github.com>
2026-06-10 19:59:54 +08:00
lobehubbot 553d3d8fc7 🔖 chore(release): release version v2.2.3 [skip ci] 2026-06-10 11:38:19 +00:00
Arvin Xu 94ea3f6a34 🚀 release: 20260610 (#15647)
# 🚀 LobeHub Release (20260610)

**Release Date:** June 10, 2026  
**Since v2.2.2:** 131 merged PRs · 13 contributors

> This weekly release strengthens agent collaboration across cloud,
desktop, CLI, and workspace flows, with steadier runtime behavior and a
broader foundation for workspace-scoped data.

---

##  Highlights

- **Agent execution across devices** — Unifies per-device working
directories, project skill discovery, and sub-agent suspend/resume
behavior across server, QStash, and device RPC flows. (#15543, #15566,
#15481, #15620, #15591)
- **Connector and sandbox platform** — Expands connector permissions,
custom OAuth MCP connector onboarding, sandbox provider support, and
user-uploaded file sync into cloud sandbox runs. (#15463, #15546,
#15184, #15550)
- **Desktop and CLI reliability** — Fixes desktop cold-start,
auto-update, Windows build, CLI skill discovery, and `lh connect` agent
dispatch paths. (#15547, #15525, #15527, #15562, #15632, #15634)
- **Pages and sharing** — Refreshes topic sharing, improves Page Editor
layout behavior, and routes Page Agent tool execution through the
server-side editor path. (#15581, #15556, #15588, #15023, #15610)
- **Model availability and provider updates** — Adds user-scoped LobeHub
model availability, Claude Fable 5, Qwen thinking preservation, and
MiniMax M3 updates. (#15590, #15639, #13494, #15376)

---

## 🏗️ Core Product & Architecture

### Agent Runtime & Heterogeneous Agents

- Improves sub-agent lifecycle handling, including async suspend/resume,
queue-mode QStash resume delivery, and blocking nested sub-agent calls.
(#15481, #15620, #15575)
- Stabilizes heterogeneous agent ingestion and streaming with raw stream
dumps, per-turn usage, image forwarding on regenerate, and
duplicate-text fixes. (#15602, #15577, #15592, #15585)
- Adds execution-device and working-directory controls across device
RPC, legacy defaults, and remote-spawned Claude Code sessions. (#15543,
#15566, #15591, #15572)
- Improves runtime diagnostics and compatibility, including Gemini
multimodal output capture, abort stream semantics, and trace quality
analysis. (#15535, #13677, #15508)

---

## 📱 Platforms, Integrations & UX

### Connectors, Sandbox & Tools

- Ships API-level connector tool permissions, custom OAuth MCP connector
onboarding, and connector-first runtime execution. (#15463, #15546)
- Adds sandbox provider support, cloud sandbox file sync, and safer
external URL file input handling with SSRF validation. (#15184, #15550,
#12657)
- Improves tool visibility and execution with pinned app-fixed tools,
ANSI output rendering, gateway-tunneled MCP calls, and automatic
headless tool runs. (#15509, #15516, #15469, #15492)

### Desktop, CLI & Web UX

- Restores desktop startup and reload behavior, preserves IPC error
causes, and keeps the tab bar new-tab action visible across routes.
(#15547, #15597, #15638)
- Fixes desktop update and build stability for browser quit guards,
macOS update signing, and Windows Visual Studio detection. (#15525,
#15527, #15562)
- Shows the plan-limit upgrade UI on desktop builds. (#15628)
- Adds the Agent Run delivery checker and fixes CLI device dispatch plus
skill list/search output. (#15489, #15634, #15632)
- Refreshes onboarding, auth source preservation, topic UI states,
referral/Fable campaign copy, and chat-input control bar behavior.
(#15629, #15544, #15573, #15614, #15616, #15617, #15622, #15643)

---

## 🔒 Security, Reliability & Rollout Notes

- External URL file input now includes SSRF validation for safer Google
file handling. (#12657)
- Database workspace-scope migrations are part of this release;
self-hosted operators should run the normal migration path before
serving the updated app. (#15446, #15465, #15468, #15472)
- The release branch was re-cut from `canary` and includes the latest
`main` release-version commit so `v2.2.2` is the verified compare base.

---

## 👥 Contributors

@ONLY-yours, @sxjeru, @hardy-one, @xujingli, @hezhijie0327, @Coooolfan,
@arvinxx, @tjx666, @Innei, @rivertwilight, @rdmclin2, @cy948,
@AmAzing129

**Full Changelog**:
https://github.com/lobehub/lobehub/compare/v2.2.2...release/weekly-20260610-recut-3
2026-06-10 19:35:47 +08:00
YuTengjing b8339abc76 🐛 fix: show plan limit upgrade UI on desktop builds (#15628) 2026-06-10 18:19:25 +08:00
Innei c037609b8b 💄 style(chat-input): fix control bar height jump when TokenTag appears (#15643) 2026-06-10 17:43:13 +08:00
René Wang b8b37cffa3 feat: refresh topic sharing experience (share page + popover) (#15581) 2026-06-10 17:43:02 +08:00
Rdmclin2 e8e4b2e822 feat: support workspace lobehub (#13977)
feat: support workspace (full) — store→business-hook + workspace router
2026-06-10 17:34:12 +08:00
Arvin Xu c02e5720c2 feat(model-bank): add claude-fable-5 to Anthropic models (#15639)
*  feat(model-bank): add claude-fable-5 to Anthropic models

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): allow adding directory topics on web when agent targets a bound device

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 16:57:57 +08:00
Arvin Xu 3fb732da66 💄 style(desktop): keep tab bar new-tab button visible on every route (#15638)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 16:01:38 +08:00
Arvin Xu fdb529d598 🐛 fix(agent): deliver sub-agent resume bridge via QStash webhook in queue mode (#15620)
* 🐛 fix(agent): deliver sub-agent resume bridge via QStash webhook in queue mode

The callSubAgent completion bridge was a handler-only hook, which lives in
process memory: in queue mode (AGENT_RUNTIME_MODE=queue) HookDispatcher only
delivers webhook-configured hooks, so the bridge never fired — the parent op
stayed parked in waiting_for_async_tool forever after all sub-agents finished.

- Give the bridge hook a webhook config (delivery: qstash) targeting the new
  /api/agent/webhooks/subagent-callback endpoint; local mode keeps the
  in-process handler. Both paths converge on
  AgentRuntimeService.completeSubAgentBridge (backfill + barrier/CAS resume).
- Park-time self-check: after the parked state and operation row are
  persisted, re-run the resume barrier once to recover children that
  completed before the parent finished parking.
- One-shot verify watchdog: when a completion finds the parent not yet
  resumable, schedule a delayed verifyAsyncToolBarrier re-check (no step
  lock, CAS-idempotent, never re-arms).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(agent): correct verify-watchdog rationale comment

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 📝 docs(agent): clarify eventFields trimming rationale

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* ♻️ refactor(agent): align subagent-callback with workspace-scoped step worker

Post-rebase adaptation to canary's runtime restructure (#15609):

- Route the webhook bridge through AiAgentService (like the /run step
  worker) so the runtime's models stay workspace-scoped — a bare
  AgentRuntimeService would be personal-scoped and the tool-message
  backfill / resume barrier could miss workspace-scoped rows.
- Extract SubAgentBridgeParams into agentRuntime/types and add the
  completeSubAgentBridge passthrough next to executeStep.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* 🐛 fix(agent): fail sub-agent callback loudly on backfill or delivery failure

Address two review findings on the resume bridge:

- completeSubAgentBridge now checks updateToolMessage's { success } result
  (it swallows transaction errors instead of throwing) and propagates all
  infrastructure failures. The webhook endpoint then returns non-2xx so
  QStash redelivers the whole bridge — previously a failed backfill was
  acked with 200 and the parent stayed parked forever, since the verify
  recheck only re-reads the barrier and cannot retry the backfill.
- New AgentHookWebhook.fallback: 'none' opts a qstash-delivered hook out of
  the unsigned plain-fetch fallback, which can never authenticate against a
  QStash-signed endpoint and only masked publish failures as silently
  dropped 401s. The bridge hook uses it; dispatch escalates such delivery
  failures to console.error instead of the debug namespace.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 16:00:17 +08:00
Arvin Xu 4c5c8795ef 🐛 fix(model-runtime): emit stop:abort instead of error when stream is aborted (#13677)
* 🐛 fix(model-runtime): emit stop:abort instead of error when stream request is aborted

When user cancels a streaming request, the provider SDK throws abort errors
(e.g. "Request was aborted"). Previously these were propagated as error chunks,
causing the client to display a provider error message. Now abort errors emit
a stop:abort event through the SSE pipeline, allowing the client to handle
cancellation gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* 🐛 fix(model-runtime): fix type error in abort pipeline test

Use `as const` for type literal to satisfy StreamProtocolChunk union type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

*  test(fetch-sse): add planUpgradeAfterFinish to onFinish expectations

#15616 added planUpgradeAfterFinish to the onFinish context but missed
updating fetchSSE.test.ts, breaking 13 tests on canary.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(model-runtime): harden abort detection against non-Error throws

isAbortError assumed error.message is always a string, but catch
clauses receive unknown — a non-Error throw (string, object without
message) would make the abort check itself throw inside the stream
error handler, swallowing both ABORT_CHUNK and the first-chunk error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-06-10 15:56:39 +08:00
YuTengjing 8b342c600f feat: land new signups directly on onboarding (#15629) 2026-06-10 15:31:32 +08:00
LiJian 723c4d6daa 🐛 fix(cli): handle agent_run_request in lh connect so device dispatch doesn't time out (#15634)
* 🐛 fix(cli): handle agent_run_request in `lh connect` so device dispatch doesn't time out

`lh connect` auto-registers the CLI as a device, so the gateway can pick it
as the dispatch target for a heterogeneous agent run (`agent_run_request`).
But the connect daemon only listened for `system_info_request` and
`tool_call_request` — it never handled `agent_run_request`, so it never sent
`agent_run_ack`. The gateway waited out its ack window and returned
`{error:'TIMEOUT',success:false}`, surfaced server-side as "Hetero agent
device dispatch failed".

Add an `agent_run_request` handler mirroring the desktop app: spawn
`lh hetero exec` fire-and-forget and ack `accepted` immediately. The spawned
process owns the full execution + server-ingest pipeline. It re-invokes the
current CLI entry (process.execPath + argv[1]) rather than relying on `lh`
being on PATH, so it works inside the detached daemon.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix: bump the cli version

* chore: bump the cli manifest

* 🐛 fix(cli): ack agent run only after spawn succeeds, reject on spawn error

`child_process.spawn` reports a missing/inaccessible cwd asynchronously via
the child's `error` event, after the handler had already sent an `accepted`
ack. The gateway/server then recorded dispatch success while no `lh hetero
exec` process existed to emit `heteroFinish`, leaving the assistant message
stuck instead of surfacing a failure.

`spawnHeteroAgentRun` now resolves on the child's outcome: `accepted` on the
`spawn` event (stdin is written only then), `rejected` on an early `error`. A
rejected ack returns the gateway 422 → execAgent writes a ServerAgentRuntimeError
onto the assistant message, so a failed dispatch is visible. Still resolves in
milliseconds, well within the gateway's 10s ack window.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-10 15:19:01 +08:00
LiJian 5b02563659 🐛 fix(cli): skill list/search commands returning empty results (#15632)
🐛 fix: skill list/search commands returning empty results

tRPC endpoints return { data, total } but CLI was treating the result as
an array; switch to result?.data ?? [] and update mocks to match.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-10 14:20:35 +08:00
YuTengjing a5f16c1184 🐛 fix: import button from ui root (#15599) 2026-06-10 14:19:04 +08:00
YuTengjing 7641cda958 💄 style: update i18n locales (#15630) 2026-06-10 14:02:02 +08:00
Arvin Xu 9ef76475c2 💄 style: add fable promo locale keys for plans page (#15622) 2026-06-10 07:59:15 +08:00
YuTengjing 1ed93b6a24 🐛 fix: type fable starter config (#15618) 2026-06-10 06:05:49 +08:00
Arvin Xu 004027ffdd 💄 style: update free credit badge copy and add cta/dismiss keys (#15617)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 06:05:28 +08:00
Arvin Xu 0434953053 chore: add home free credit badge business slot (#15615)
 feat: add home free credit badge business slot

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 05:47:37 +08:00
YuTengjing 4b7ef28e46 🐛 fix: support fable campaign UI (#15616) 2026-06-10 05:46:31 +08:00
Arvin Xu 437b4c8968 💄 style: update referral copy for pay-to-unlock reward (#15614)
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 05:14:49 +08:00
Arvin Xu fdb4f37053 ♻️ refactor(hetero-agent): shared subagent-run coordinator + fix device-mode subagent streaming (#15613)
* ♻️ refactor(hetero-agent): shared subagent-run coordinator + fix device-mode subagent streaming

Remote-device (gateway) hetero runs corrupted SubAgent text on the wire: the
CLI `SerialServerIngester`'s main-agent text-snapshot coalescing was subagent-
unaware, so subagent full-block text got mixed into the main accumulator and
re-`append`ed as `replace` snapshots server-side. Fix: exclude `data.subagent`
text from the coalescer so it forwards raw (the server appends it once).

The deeper cause was duplication: the renderer executor and the server
persistence handler each hand-wrote the SAME subagent-run state machine (lazy
thread create, turn-boundary cut, finalize, orphan drain, chain parenting) —
the epicenter of past hetero subagent bugs. Extract it into ONE pure,
transactional reducer (`reduceSubagentRuns`) in `@lobechat/heterogeneous-agents`
that emits declarative intents; each engine keeps a thin interpreter for its
own I/O (renderer: messageService + live store dispatch; server: messageModel).

The reducer pre-allocates ids so intents carry parentId chains with no
create→backfill round-trip; this needs `messageService.createMessage` to accept
a caller id (threaded through; the model already supported it). Also widened the
message nanoid 14→18 for the higher per-run id volume.

Behavior unifications (vs the two old copies):
- transactional commit-on-success subsumes the renderer's `pendingFlushTarget`
  (a failed flush leaves the run intact for the onComplete-drain retry; the
  renderer keeps a local pending-flush map pinned to the original assistant).
- finalize DELETES the run (server-style); a second finalize / orphan drain is
  a clean no-op with the same DB end-state.

Scoped to subagent runs only; main-agent persistence stays per-engine. A future
pass can absorb the main-agent path into a unified agent-event reducer.

Tests: reducer 13, CLI hetero 22, server hetero 84, renderer executor 58.

Refs: LOBE-10175

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(hetero-agent): strengthen subagent flush-retry assertion

The earlier rewrite of this assertion (caused by ids moving from server-
generated to caller-pre-allocated) weakened it to "all streamed writes share
one id", which would also pass if they all wrongly hit the terminal row. Pin it
back to the test's real intent: resolve the FIRST streaming-turn assistant by
its create payload and assert every streamed write targets it AND that it
differs from the terminal assistant's id — so `resultContent` is never clobbered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(hetero-agent): honor commit-on-success for renderer subagent intents + fix stale id-length tests

- renderer interpreter: createThread / createMessage failures now rethrow so
  reduceAndApplySubagent skips the state commit — the next event retries the
  lazy create / turn boundary instead of orphaning the run (review P2)
- catch around the intent loop so a failed intent can't poison persistQueue
- regression test: transient createThread failure retries on next event
- update message id length assertions 18 → 22 (nanoid widened 14→18 + msg_)
- update messageService.createMessage spy assertions for the new (params, id) call

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 05:09:43 +08:00
Arvin Xu 1260756246 feat(agent): block nested sub-agent calls (#15575)
*  feat(agent): block nested sub-agent calls

Sub-agents must not recursively spawn further sub-agents. Plumb an
`isSubAgent` flag from the spawning thread through the conversation /
operation / tool-call metadata, and refuse nested dispatch at every layer:

- streamingExecutor marks the spawned sub-agent context with `isSubAgent`
- aiAgent strips the LobeAgent tool from a sub-agent's plugin config
- client builtin-tool executor + server tool runtime return a clear error
- RuntimeExecutors blocks both single and batch sub-agent dispatch

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(test): align execSubAgentTask expectation with isSubAgent appContext

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(agent): don't mark group sub-agent tasks as isSubAgent

Group sub-agents are real agent dispatches and must keep the ability to
spawn their own sub-agents; only the LobeAgent-tool virtual sub-agent
path should carry isSubAgent. Drop the flag from execSubAgentTask.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 04:00:23 +08:00
YuTengjing cb769534d3 ♻️ refactor: parse Claude model ids for runtime checks (#15601) 2026-06-10 02:55:34 +08:00
Arvin Xu de1a5c88e4 test(database): cover more model/repository gaps (client-db 95.4%→95.7%) (#15612)
Extend tests toward full coverage of PGlite-reachable code:
- agentEval/runTopic (batchMarkAborted, deleteByRunAndTestCase) → 100%
- agentEval/run (benchmarkId filter branch) → 100%
- verifyCheckResult (createMany empty, findById, update, backfillTracingId) → 100%
- asyncTask, document, systemBotProvider, dataImporter — additional branches

Remaining client-db gaps are BM25/pg_search paths (run only in server-db/CI)
and real-Postgres-error / defensive fallbacks not reachable under PGlite.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 02:48:58 +08:00
Arvin Xu 5b4b50e050 🐛 fix(page-agent): inject active documentId into context on send (#15610)
* 🐛 fix(page-agent): inject active documentId into context on send

Page-scoped conversations never carried the open document id to the
agent runtime. At send time `operationContext` only had agentId/scope/
topicId, so the gateway's `appContext.documentId` was undefined and the
server-side PageAgent runtime threw "received a tool call without
documentId in context".

Inject the live document id from the page editor runtime
(`pageAgentRuntime.getCurrentDocId()`) into `operationContext` when
scope is `page`, so it flows through `execAgentTask` → server
`state.metadata.documentId` → tool execution context.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(page-agent): pass new document id explicitly in sendAsWrite to avoid stale injection

The page-scoped documentId fallback reads the page editor runtime
singleton, which is only authoritative once the active page's editor has
mounted. `sendAsWrite` creates a document, navigates, and sends
immediately — before the new editor mounts — so the singleton may still
be bound to the previously open page, scoping server-side PageAgent
tools to the wrong document.

Thread the freshly created `newDoc.id` through the conversation context;
the existing `!context.documentId` guard then skips the singleton
fallback entirely. Document the constraint at the fallback site.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 02:30:33 +08:00
YuTengjing 1d619ad507 feat: add user-scoped LobeHub model availability (#15590) 2026-06-10 02:19:14 +08:00
Arvin Xu 3ce3b5388f test(database): raise model/repository coverage to 95%+ and document DB test conventions (#15611)
*  test(database): raise model/repository coverage to 95%+ and document DB test conventions

Raise @lobechat/database client-db coverage 89.11% -> 95.36%:
- New integration tests for connector, connectorTool, workspaceMember (were 0%)
- Extend task, workspace, rbac, notification, userMemory/query, file,
  agentSignal/reviewContext, verifyRubric, brief, taskTopic, dataImporter,
  messengerAccountLink, home

Fix client-db (PGlite) test failures: BM25 search lacks the pg_search
extension under PGlite, so wrap session.queryByKeyword and home.searchAgents
in describe.skipIf(!isServerDB), matching the existing convention.

Document DB model/repository testing conventions so new models ship with tests:
- Rewrite testing skill's db-model-test.md (getTestDB integration pattern,
  client-vs-server-db split, BM25 skipIf guard, schema gotchas, user isolation)
- Surface the rule in testing/SKILL.md, cross-link from drizzle/SKILL.md,
  review-checklist/SKILL.md, and models/_template.ts

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(database): make verifyRubric/brief ordering tests deterministic

These models order by `updatedAt`/`createdAt` desc with no id tiebreaker, and
the tests created rows back-to-back relying on default `now()` — when two rows
land in the same millisecond the order is non-deterministic, causing flaky CI
failures. Set explicit, well-separated timestamps instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 01:42:08 +08:00
Innei 991c2f79e8 🐛 fix(desktop): trace Session Expired cause and resume onboarding at Login (#15604)
- Carry a `reason` payload on the `authorizationRequired` IPC event so the
  cause behind the Session Expired modal (proxy 401, refresh non-retryable,
  startup proactive refresh exception, etc.) lands in `electron-log` and the
  renderer debug namespace for postmortem.
- On 401 + `X-Auth-Required`, enrich the reason with `hadToken`, the upstream
  `www-authenticate` header and a truncated body snippet so OAuth/tRPC error
  details are captured without consuming the forwarded stream.
- Fix returning users (token refresh failed -> active=false -> relaunch)
  landing on the Welcome screen of desktop onboarding. Persist an
  `everCompleted` flag in localStorage and resume at the Login screen for
  anyone who has already completed onboarding once.
- Extract the screen-resolution logic into a pure `resolveInitialScreen`
  helper with unit tests; cover the new storage flag and reason payload in
  AuthCtr / BackendProxy tests.
2026-06-10 01:06:00 +08:00
Arvin Xu c329696dc2 🐛 fix(hetero): chain step boundary off tool row when tools[] backfill is unseen (#15607)
* 🐛 fix(hetero): chain step boundary off tool row when tools[] backfill is unseen

On a warm replica that did not drain the prior step's `tools_calling` (or
before the assistant's `tools[]` JSONB has its `result_msg_id` backfilled),
the in-memory tool state is empty, so the step boundary falls back to the
previous assistant and forks the wire into two disconnected bubbles.

Fall back to the authoritative anchor — the `role:'tool'` rows themselves,
committed in Phase 2 independently of the JSONB mirror's Phase-3 backfill —
via a new `MessageModel.getLastChildToolMessageId`. Excludes subagent tool
rows (threadId set) so they never anchor the main-agent wire.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(hetero): write per-device cwd when adding topic from project group

The sidebar "+ new topic in this directory" action wrote the working
directory to the legacy per-agent slot (localAgentWorkingDirectoryMap),
which sits below agencyConfig.workingDirByDevice in the resolution
precedence. Once a directory had been picked via the ControlBar (which
writes workingDirByDevice), the "+" action was silently shadowed and the
new topic was created with the previously-picked directory instead.

Route the action through useCommitWorkingDirectory.commitAgentDefault so
it writes the same high-precedence per-device slot the picker uses,
keeping the two write paths from drifting again.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(hetero): cover MessageModel.getLastChildToolMessageId

The fallback anchor query added in 599eea5bda had no DB-level test — the
persistence handler mocks it, so its real SQL was never exercised and
patch coverage dropped. Add direct PGlite tests covering all branches:
latest-tool ordering, no-tool → undefined (ignoring non-tool children),
subagent thread exclusion (threadId IS NULL), and ownership isolation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 00:42:37 +08:00
Arvin Xu 4b5e001934 🐛 fix(server): restore sub-agent forking in QStash step worker (#15609)
* 🐛 fix(server): restore sub-agent forking in QStash step worker

In QStash mode every agent step runs in a fresh HTTP request via the
hono `runStep` handler, which built a bare AgentRuntimeService without
the `execSubAgent` fork callback. As a result `lobe-agent.callSubAgent`
failed with SUB_AGENT_UNAVAILABLE in cloud (the in-process callback
never survives the queue boundary).

Step through AiAgentService.executeStep instead, reusing its internal
runtime that is already wired with the fork callback — no second runtime,
no manual rebinding.

Also rename the internal `execSubAgentTask` → `execSubAgent` (method,
runtime/tool context fields, options, ExecSubAgent{Params,Result} types)
to separate the "task" concept from "sub-agent", and make the method an
auto-bound arrow field so it no longer needs `.bind(this)`. The external
lambda procedure name (`execSubAgentTask`) and the client service are
left unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(server): group runtime upward-calls into an AgentRuntimeDelegate

`execSubAgent` was a loose top-level option on AgentRuntimeService, which
hid that it is not ordinary config but an upward call: the low-level
runtime, mid-step, triggering a high-level pipeline that lives in
AiAgentService (the layer above it).

Introduce `AgentRuntimeDelegate` as the single named home for these
upward-call capabilities, and inject it as `delegate: { execSubAgent }`.
The interface doc states the convention so future "runtime must trigger a
higher-layer pipeline" capabilities land in the same place instead of
sprawling as ad-hoc options.

Scope is deliberately the injection surface (options + service field +
AiAgentService wiring). The downstream executor/tool context keeps its
flat `execSubAgent` field — the tool runner wants the unpacked capability,
not the whole delegate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 00:41:01 +08:00
Arvin Xu aa46864df6 ♻️ refactor(lobe-agent): remove callSubAgents in favor of parallel callSubAgent calls (#15608)
The lobe-agent manifest exposed `callSubAgents` (parallel multi-task
dispatch), but the server runtime only implemented `callSubAgent`. When an
agent run executed server-side and the model invoked `callSubAgents`, the
builtin executor threw "Builtin tool lobe-agent's callSubAgents is not
implemented".

The server already supports parallel sub-agents natively: a batch parks on
all deferred tools (`pendingToolsCalling`) and `tryResumeParentFromAsyncTool`
enforces a K=N barrier, resuming the parent only once every pending
tool_result is fulfilled. So emitting multiple `callSubAgent` calls in one
turn is equivalent to the old `callSubAgents` — making the plural API
redundant and the source of a server/client inconsistency.

Remove `callSubAgents` end to end (manifest, types, client executor,
Inspector/Render/Streaming components + registries, locale keys, display-name
map, dev fixture) and update the system prompt to guide the model to fan out
via multiple `callSubAgent` calls.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 23:59:20 +08:00
Arvin Xu af3f0ea171 🐛 fix(desktop): preserve Error cause across IPC so renderer sees real failure reason (#15597)
* 🐛 fix(desktop): preserve Error cause across IPC so renderer sees real failure reason

Electron's IPC error serialization carries an Error's message/stack/name plus
its enumerable own properties, but a standard `cause` (set via
`new Error(msg, { cause })`) is non-enumerable — so the real failure reason
(e.g. undici wrapping ENOTFOUND/ECONNREFUSED under a generic
`TypeError: fetch failed`) was dropped on the way to the renderer.

- IPC base: re-expose `cause` as an enumerable, clone-safe field in the central
  handler catch (nested Errors flattened to { name, message, code }) so every
  IPC method's error carries it.
- Heterogeneous agent executor: include `cause` in the ChatMessageError body so
  the surfaced error structure exposes the underlying reason alongside message.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(desktop): ferry IPC error cause via a serializable envelope

Making `cause` enumerable before rethrowing didn't actually reach the renderer:
Electron's `ipcRenderer.invoke` rebuilds a thrown handler error from its *string*
form (`Error invoking remote method '<channel>': <String(error)>`), so the
original error object — and any `cause` — never crosses the boundary.

Switch to an explicit serializable envelope:
- `~common/ipcError`: `toIpcErrorEnvelope` (clone-safe plain object, recursively
  captures name/message/stack/code/cause) + `isIpcErrorEnvelope` /
  `fromIpcErrorEnvelope` to rebuild a real Error.
- IPC base handler: return the envelope instead of throwing.
- preload `invoke`: detect the envelope and re-throw a rebuilt Error (with
  `cause`), preserving the "promise rejects on failure" contract.
- hetero executor: flatten the Error cause to a plain object for the
  DB-persisted `ChatMessageError.body`.

Adds unit tests for the envelope round-trip and the preload unwrap.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 22:58:50 +08:00
Arvin Xu 84a7b5c7c8 📝 docs(agents): replace device-gateway with server in apps tree (#15606) 2026-06-09 22:55:32 +08:00
Arvin Xu e01cadb779 feat(hetero): add --raw-dump to persist agent raw stream-json for debugging (#15602)
*  feat(hetero): add --raw-dump to persist agent raw stream-json for debugging

The remote-device path (`spawnLhHeteroExec`) leaves no local execution
record: `lh hetero exec` consumes the agent's stdout internally and only
POSTs adapted events to the server, so a misbehaving remote run can't be
inspected. The adapted/ingested view also can't distinguish a CC-side
empty `tool_result` from an adapter extraction bug.

Add `lh hetero exec --raw-dump <dir>`: spawnAgent gains an `onRawStdout`
tee that captures the child's untouched stdout BEFORE the adapter; the
CLI writes it (plus stderr + a meta.json) to
`<dir>/<timestamp>-<operationId>/`, one file pair per spawn attempt.
Fully best-effort — a dump failure never affects the run or exit code.

Wire the desktop device path to pass `--raw-dump` (gated by the existing
`shouldTraceCliOutput` toggle, into `resolveTraceRootDir`), so remote-device
CC runs now leave a raw stream on the device — the same toggle/location the
local trace path already uses. Reusable later for the server sandbox path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🔖 chore(cli): bump version to 0.0.27

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 22:16:05 +08:00
Arvin Xu ce5833cb67 feat(file): persist image dimensions into file metadata (#15594)
*  feat(file): persist image dimensions into file metadata

Record intrinsic width/height for uploaded images so consumers can
reserve layout space (avoid CLS) without loading the file first.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  test(file): assert persisted dimensions in upload createFile payload

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🔖 chore(cli): bump version to 0.0.26 and regenerate man page

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  feat(file): record image aspect ratio alongside width/height

Compute intrinsic aspect ratio (width / height, rounded) at extraction
time and persist it into file metadata so consumers can group/reserve
layout by orientation without recomputing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 22:11:15 +08:00
Arvin Xu 5b534f45d1 ♻️ refactor(chat-input): rename RuntimeConfig→ControlBar, WorkingDirectoryBar→HeteroControlBar (#15545)
* ♻️ refactor(chat-input): rename RuntimeConfig to ControlBar

The bar below the chat input now composes mode switcher, execution
device + working directory, approval mode and context window — "runtime
config" no longer matches. Rename the directory, component, and the
showRuntimeConfig / runtimeConfigSlot props (→ showControlBar /
controlBarSlot) across all call sites. Reads as a sibling of ActionBar.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(agent): rename WorkingDirectoryBar to HeteroControlBar

Make the heterogeneous chat-input bar a symmetric sibling of ControlBar:
both compose the shared WorkspaceControls, so naming should match. Rename
the file, component and displayName, and update the controlBarSlot usage.
2026-06-09 20:21:05 +08:00
Innei e692448346 🔨 chore(deps): pin @lobehub/editor to stable ^4.17.1 (#15600)
Switch from the pkg.pr.new preview snapshot back to the published 4.17.1 release.
2026-06-09 20:09:27 +08:00
Rylan Cai 3fe5b62cbe 🐛 fix: relax clear todo intervention (#15598)
🔒 Relax clear todo intervention
2026-06-09 19:55:20 +08:00
Arvin Xu 6c6c8698d3 🐛 fix(hetero): forward user images on regenerate so vision input isn't dropped (#15592)
* 🐛 fix(agent): resolve working directory by target device instead of legacy-only

The chat-input directory picker writes the selection to
`agencyConfig.workingDirByDevice[deviceId]`, but the send / regenerate /
streaming / placeholder paths resolved the agent working directory via
selectors that only read the legacy `localAgentWorkingDirectoryMap`. So a
freshly picked directory was silently dropped and execution fell back to a
default cwd (the app's own repo), losing the user's project and `--resume`.

Make both `getAgentWorkingDirectoryById` and `currentAgentWorkingDirectory`
device-aware: per-device choice > legacy > desktop/home, with the target
device resolved from a passed-in `currentDeviceId` (kept out of the selector
so hook callers stay reactive). Update all call sites to supply the device id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(hetero): forward user images on regenerate so vision input isn't dropped

The hetero regenerate/resend path (`runHeterogeneousFromExistingMessage`)
only forwarded the text prompt to `executeHeterogeneousAgent`, never the
original user message's `imageList`. The send path reads imageList off the
persisted user message and passes it along; this path must too. Without it,
regenerating an image turn re-ran the CLI with no attachments (fully lost
when the session couldn't be resumed, e.g. cwd changed).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 19:50:55 +08:00
Arvin Xu cdbef3f72e 🐛 fix(agent): resolve working directory by target device instead of legacy-only (#15591)
The chat-input directory picker writes the selection to
`agencyConfig.workingDirByDevice[deviceId]`, but the send / regenerate /
streaming / placeholder paths resolved the agent working directory via
selectors that only read the legacy `localAgentWorkingDirectoryMap`. So a
freshly picked directory was silently dropped and execution fell back to a
default cwd (the app's own repo), losing the user's project and `--resume`.

Make both `getAgentWorkingDirectoryById` and `currentAgentWorkingDirectory`
device-aware: per-device choice > legacy > desktop/home, with the target
device resolved from a passed-in `currentDeviceId` (kept out of the selector
so hook callers stay reactive). Update all call sites to supply the device id.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 19:45:36 +08:00
YuTengjing 71030c6e21 ♻️ refactor(auth): remove email harmony plugin (#15589) 2026-06-09 19:18:56 +08:00
LiJian adf49db7c4 🐛 fix: activator tool discovery for cloud-sandbox and local-system (#15586)
* 🐛 fix: activator tool discovery for cloud-sandbox and local-system

- P0: Explicitly inject LocalSystemManifest when device gateway is configured
  (discoverable: isDesktop is always false on server, so it never enters
  the discovery loop. The explicit injection mirrors the canUseDevice guard.)

- P1: Skip CloudSandboxManifest when runtimeMode is not 'cloud'
  (resolveRuntimeMode unifies executionTarget='sandbox' and legacy
  chatConfig.runtimeEnv.runtimeMode paths, so agents with sandbox
  disabled correctly exclude the cloud-sandbox tool.)

Both fixes operate at the manifest-map build stage, consistently affecting
all downstream consumers (activator discovery, availableTools, etc.)

* 🐛 fix: remove cloud-sandbox manifest when runtime is not sandbox

The initial manifest seed via getEnabledPluginManifests includes
defaultToolIds (which contains lobe-cloud-sandbox), so the manifest
was already in toolManifestMap before the allowedBuiltinTools loop's
continue guard. This made lobe-cloud-sandbox activatable even when
sandbox was disabled.

Add a delete right after resolveRuntimeMode to cover both the
manifestMap seed and the allowedBuiltinTools loop in one place.

Co-authored-by: chatgpt-codex-connector[bot]

* 🐛 fix: gate local-system injection by runtimeMode === 'local'
2026-06-09 19:03:25 +08:00
Innei 69cefce3d9 🐛 fix(page-editor): align table bleed with controllers (#15588) 2026-06-09 19:02:47 +08:00
Arvin Xu b295265f25 🐛 fix(hetero): stop cross-message text duplication in server-ingest mode (#15585)
🐛 fix(hetero): reset per-message text accumulator at message boundaries

In server-ingest mode (remote-device CC and cloud sandbox both run
`lh hetero exec`), SerialServerIngester's `accumulatedText` spanned the
whole run and never reset across assistant-message boundaries. Combined
with `snapshotMode: 'replace'`, every later message's snapshot re-emitted
all prior messages' text verbatim, which the server persisted into the
new DB message — producing cross-message text duplication.

Reset `accumulatedText` on `stream_start` / `stream_end` (emitted by the
adapter's `openMainMessage`) after flushing the just-ended message's
snapshot, so each message snapshots only its own text.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 18:27:47 +08:00
Innei 1a4005c7b9 ♻️ refactor: extract server into apps/server + root namespaces into packages (#14949)
* ♻️ refactor(server-deps): extract envs/trpc/config/locales/business-server into packages

* ♻️ refactor: relocate src/server backend modules to apps/server package

Rebuilt on current canary: git mv the 8 server subtrees (services, routers,
modules, globalConfig, utils, runtimeConfig, workflows, featureFlags) into
@lobechat/server, with @/server/* dual-path alias, database vitest aliases,
and instrumentation import fixup.

* 📝 docs(skills): update src/server path refs to apps/server/src after relocation
2026-06-09 18:09:26 +08:00
sxjeru 64d3bdb978 💄 style: add preserve thinking feature for Qwen3.7 Max model (#13494)
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: YuTengjing <ytj2713151713@gmail.com>
2026-06-09 17:21:39 +08:00
Arvin Xu 434532ce36 🐛 fix(heterogeneous-agents): emit per-turn usage for batch-mode Claude Code (#15577)
* 🐛 fix(heterogeneous-agents): emit per-turn usage for batch-mode Claude Code

Device + sandbox runs spawn Claude Code via the `lh hetero exec` CLI in BATCH
mode (no `--include-partial-messages`), unlike the desktop driver which always
streams partial messages. In batch mode CC emits no `message_delta`, and the
adapter deliberately skipped usage on `assistant` events (assuming the stale
`message_start` echo that only exists in partial mode). The grand-total
`result_usage` is intentionally ignored to avoid double-counting, so batch runs
ended up persisting NO usage at all — the model tag showed no token count.

Track whether any `stream_event` was seen (partial mode); when none has been
(batch mode), emit per-turn usage from the `assistant` event as turn_metadata.
The assistant event's usage is authoritative in batch mode, not a stale echo.

This also fixes the model tag showing `claude-opus-4-8[1m]`: the `[1m]` 1M-context
beta marker only appears in the `system init` model field, while `assistant`
events report the canonical `claude-opus-4-8`. The new turn_metadata carries the
clean id, which supersedes the init-captured one (and matches the id ModelIcon /
pricing lookups expect).

Partial mode (desktop/local) is unchanged — `message_delta` still owns usage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(heterogeneous-agents): update batch-mode E2E for assistant usage

The multi-step E2E fixture has no `stream_event` records (batch mode) and 5
assistant events with `message.usage`, so the new batch-mode path now emits 5
turn_metadata events. Update the expectation from 0 — this validates the fix on
a realistic device/sandbox session: per-turn usage lands with the canonical
model id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(heterogeneous-agents): stop leaking host Anthropic creds into spawned CLI

The local CLI spawn forwarded the entire `process.env` to `claude`, so a
developer with `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_BASE_URL`
exported in their shell had it inherited by the CLI — overriding its own
subscription login and surfacing as a baffling "Invalid API key" + non-zero
exit on every message.

Strip those three vars from the inherited env via `buildInheritedSpawnEnv`.
`session.env` is still spread last, so an agent that explicitly configures an
API key continues to win. Adds regression tests for both the strip and the
override.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 16:51:33 +08:00
YuTengjing 23120f26e4 💄 style: update referral backfill copy (#15583) 2026-06-09 16:40:35 +08:00
sxjeru 77dbe4b7b3 🔨 chore(google): Support External URL file input with SSRF validation to optimize transmission (#12657)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: yutengjing <ytj2713151713@gmail.com>
2026-06-09 16:13:54 +08:00
LiJian 1ccc86e589 🐛 fix(skill): consolidate add-skill button into header dropdown (#15578)
* 🐛 fix(skill): consolidate add-skill button into header dropdown

Move the standalone 'AddSkillButton' from SkillList sidebar into the
header '+' dropdown, providing a unified entry point for all add-skill
actions (import from URL/GitHub, upload zip, custom connector).
Replace legacy 'Add Custom MCP' with the new Connector flow.

* 🐛 fix(skill): fix lint - remove unused ChevronDown import, sort imports
2026-06-09 16:07:36 +08:00
Rdmclin2 ccb33fa48c feat: workspace backend service slice (#15560)
Backend-only slice of the workspace feature (server routers/services, database models with workspaceId threading, openapi middleware, business/server stubs, const/types). Excludes all UI (features/routes/store/hooks). Deploys dark behind the workspace feature flag.

Includes open-source stub fixes: workspaceCreds router stub, ChargeParams workspaceId, usage.ts null-coalesce, DBMessageItem.workspaceId.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:54:26 +08:00
YuTengjing 082481c35d 🔇 chore: silence noisy dev console logs (#15548) 2026-06-09 14:55:37 +08:00
Arvin Xu 441e0c5b7c 🐛 fix(heterogeneous-agents): refine execution target + topic sidebar attention grouping (#15574)
* 🐛 fix(heterogeneous-agents): hide "no device" execution target for hetero agents

Heterogeneous agents (Claude Code / Codex) bring their own toolchain and must
execute somewhere, so the 'none' (plain chat) execution target is invalid for
them. Hide the option in the device switcher and never resolve/display 'none'
for hetero agents — fall back to local (desktop) or sandbox (web) instead.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(topic): use colorText for titles and move "Needs attention" below favorites

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(chat-input): improve runtime config bar layout on narrow screens

Keep chips on a single line (no per-character wrapping), truncate long
labels (working dir / branch / device name) with ellipsis, and let the
workspace cluster scroll horizontally instead of wrapping. On a narrow
bar the hetero "full access" badge collapses to its icon (hover tooltip
still explains it) via a container query.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(topic): show project directory under topic items in by-status mode

Surface each topic's working directory as a muted second line in the
by-status grouping, where rows otherwise carry no project context. Data
is already on the topic metadata, so no extra fetch.

- NavItem: add opt-in `description` slot (single-line layout unchanged)
- DirIcon: convert `renderDirIcon` function into a memo component, add
  `size` prop, rename file to PascalCase, migrate all call sites

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 14:18:18 +08:00
Arvin Xu 0a6b02ccb5 💄 style(topic): show error alert icon with tooltip on failed topics (#15573)
* 💄 style(topic): show error alert icon with tooltip on failed topics

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(topic): merge attention-needing topics into one "Needs attention" group

Collapse the unread-completion, failed, and waitingForHuman states into a single
top "pending" status bucket (待处理 / Needs attention) so the sidebar surfaces
everything that needs the user's attention in one place.

- groupTopicsByStatus now buckets those three states into `pending`, taking a new
  `unreadTopicIds` set (unread completions are a client-only state).
- Server STATUS_SORT_RANK floats `failed` to the top alongside `waitingForHuman`
  so failed topics stay on the first page and don't drop out of the group.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(topic): pin the "Needs attention" group above favorites

The pending bucket already sorts above running, but the synthetic favorite group
was prepended ahead of it. Hoist pending to index 0 so attention-needing topics
sit at the very top of the sidebar, above both favorites and running.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(heterogeneous-agents): pin resolved cwd onto remote-CC new topics

Remote CC dispatched the run with the correct working directory (the
precedence chain falls back to the agent's per-device pick), but a
brand-new topic was created without `metadata.workingDirectory`, so the
sidebar grouped it under "No directory" / 无目录.

Unify the three drifting server-side cwd-precedence sites behind one
pure helper (`resolveDeviceWorkingDirectory`) and persist the resolved
cwd back onto a freshly-created topic so grouping, next-turn reuse, and
workspace-init scan all agree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 13:24:42 +08:00
LiJian 5dd0f0c0c9 feat: specialize Market auth modal copy per capability scene (#15569)
Introduce a MarketAuthScene ('default' | 'sandbox' | 'mcp' | 'publish') so the
Market authorization modal can show capability-specific copy instead of the
generic "Create Community Profile" wording, while falling back to the generic
copy for unknown scenes.

- Reactive (401) path: infer scene from the tRPC procedure path in the error
  link and carry it on the market-unauthorized event.
- Proactive path: callers pass the scene to signIn() (publish buttons, MCP/skill
  install, in-chat market tool auth).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 12:39:33 +08:00
LiJian dfb70c1e87 🐛 fix(skills): inject pinned skill content into the system prompt (#15568)
* 🐛 fix(skills): inject pinned skill content into the system prompt

Pinned skills (ids in agentConfig.plugins) were marked activated by
SkillResolver but never carried their content, because resolveClientSkills
dropped the `content` field when mapping store skills to metas. As a result
SkillContextProvider's `s.activated && s.content` filter skipped them, so the
agent had to call activateSkill to use a pinned skill instead of it being
force-injected.

- builtin skill content is already in the store: carry it through.
- pinned DB skill content is fetched on demand (store cache first), only for
  pinned ids to avoid bulk network calls when auto mode exposes every skill;
  a failed fetch degrades gracefully to a content-less listing.
- resolveClientSkills becomes async; contextEngineering awaits it.
- add skillEngineering tests covering both paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(skills): mark pinned skills activated and fix test types

The MessagesEngine path passes skillsConfig.enabledSkills straight to
SkillContextProvider without running SkillResolver, so the metas must carry
`activated` themselves — content alone is not enough (the provider only injects
`s.activated && s.content`). Mark pinned skills activated in resolveClientSkills,
guarded by content presence so a content-less pinned skill still falls back to
the <available_skills> list instead of disappearing.

Also widen the test helper's param type so `content`/`activated` are accessible
(fixes TS2339 in CI).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(skills): don't pre-activate ZIP-bundled pinned skills

Server-side bundle mounting for execScript / readReference is keyed off
stepContext.activatedSkills, which is populated only by the activateSkill tool
call — operation-level pinning never seeds it. So pre-injecting the content of a
ZIP-bundled DB skill would tell the model to run scripts from an unmounted bundle.

Gate the content pre-injection on the absence of a zipFileHash: bundled skills
stay in <available_skills> and are activated via the tool (which mounts the
bundle), while pure-content skills (builtin Artifacts, bundle-free DB skills)
are still force-injected when pinned.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 12:38:59 +08:00
Arvin Xu 7ad6e2aa25 🐛 fix(agent): make working-directory Clear actually clear legacy / default-sourced cwd (#15571)
* 🐛 fix(agent): make working-directory Clear actually clear legacy / default-sourced cwd

The "Clear" action in the working-directory picker was a no-op whenever the
shown directory came from a precedence level that clear() never touched:

- clear() only removed the topic override and the agent's per-device choice
  (workingDirByDevice), but the button's visibility was gated on selectedDir,
  which also resolves from legacyAgentWorkingDirectory (pre-migration
  localStorage pick) and deviceDefaultCwd (device-wide default). When the cwd
  came from either, clear() deleted an already-empty higher level → nothing
  changed.

Fixes:
- useCommitWorkingDirectory: when clearing at the agent-default scope, also drop
  the legacy per-agent value (localStorage-only, no network round-trip).
- WorkingDirectoryPicker: gate the Clear button on hasClearableSelection
  (topic / agent choice / legacy) instead of selectedDir, so it no longer
  renders as a dead button when the cwd comes solely from the device default
  (which isn't clearable from the agent picker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(claude-code): slow token count-up animation to 2000ms

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 12:22:40 +08:00
Arvin Xu 3986223b25 🐛 fix(heterogeneous-agents): show real CLI model on remote-spawned Claude Code (#15572)
Remote/device-spawned CC runs persist via the server-side
HeterogeneousPersistenceHandler (the executing device is not the viewing
client), and the assistant placeholder was created with the agent's
configured chat model/provider (e.g. deepseek-v4-pro). That value leaked
into the model tag and was re-applied at terminal, so the model tag showed
the wrong model instead of the real Claude Code model.

- Create the hetero placeholder with `provider: heteroType` for ALL hetero
  agents (not just remote openclaw/hermes) and no model, mirroring the
  client path. The real model is reported by the CLI and backfilled.
- Capture the CLI's authoritative model/provider from the first
  `stream_start` (CC system/init) and backfill the placeholder, so the real
  model lands from the first turn even without usage-bearing turn_metadata.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 12:08:00 +08:00
Arvin Xu ea246d6e17 feat(agent): list project skills over device RPC in the sidebar (#15566)
*  feat(agent): list project skills over device RPC in the sidebar

The right-sidebar 技能 (project skills) tab only read skills over local
Electron IPC, so in device mode (working dir on a bound remote device, or
the web client) the list was always empty — unlike the Files / Review tabs
which already branch on `deviceId`.

Add a `listProjectSkills` device RPC mirroring `getProjectFileIndex`:
- types: `DeviceProjectSkillItem` / `DeviceListProjectSkillsResult`
- `deviceGateway.listProjectSkills` via the generic `invokeRpc` relay
- TRPC `device.listProjectSkills` + `GatewayConnectionCtr` dispatch to
  `WorkspaceCtr.listProjectSkills`
- renderer chokepoint `projectSkillService` branches on `deviceId`
- `useProjectSkills(dir, deviceId?)`; remote mode lists but doesn't open
  previews (parity with the Files tab)
- thread `remoteDeviceId` through `SkillsGroup`

No device-gateway repo change needed — the RPC relay is method-agnostic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): list project skills over device RPC for homogeneous agents too

Thread `deviceId` through the homogeneous resources path
(`AgentDocumentsGroup` → `ProjectLevelSkills`) so a device-bound homogeneous
agent's 技能 tab populates over RPC, matching the heterogeneous `SkillsGroup`.
`useProjectSkills` already accepts `deviceId`; this just wires it in and
OR-s `deviceId` into the `showProjectSkills` gate.

(The large AgentDocumentsGroup diff is prettier re-indentation from wrapping
the outer memo() once the param list crossed the print width.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent): resolve per-device cwd in ResourcesSection so device-mode skills load

ResourcesSection computed its working directory with the legacy
`topicCwd || agentCwd` selector, which misses `workingDirByDevice[deviceId]`
and `device.defaultCwd`. For a device-bound agent the cwd lives in that
per-device map, so it resolved to `undefined` — the project-skills SWR key
was null and the fetch never fired even though `deviceId` was set (the 技能
tab showed "暂无可用技能"). Switch to `useEffectiveWorkingDirectory`, the
same resolver the runtime bar / WorkingSidebar use. Fixes both the hetero
SkillsGroup and the homogeneous AgentDocumentsGroup paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(agent): show loading state for project skills while switching path

On a working-directory switch the project-skills SWR key changes, so items
go empty while the new scan is in flight. The homogeneous skills panel was
flashing the empty placeholder instead of a loader. Surface
`useProjectSkills().isLoading` and render NeuralNetworkLoading when project
skills are the only source and still loading. (The hetero SkillsGroup already
shows it via SkillSection's isLoading.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 10:58:55 +08:00
Arvin Xu f5458e1ad9 ♻️ chore: replace LOBE-XXX markers with inline migration context (#15567)
♻️ chore: replace LOBE-XXX markers with inline migration context in 0110 SQL
2026-06-09 10:54:22 +08:00
LiJian 251e2ede5e feat(sandbox): sync user-uploaded files into the cloud sandbox (#15550)
*  feat(sandbox): sync user-uploaded files into the cloud sandbox

Pre-load the files a user attached in a conversation (topic message files +
session files) into the cloud sandbox the first time it is used, and tell the
agent they are available.

- FileModel.findFilesToInitInSandbox: merge messages_files (by topic) and
  files_to_sessions (by the topic's session), de-duped by file id
- SandboxMiddlewareService.ensureFilesInitialized: on first tool call, presign
  download URLs and run an idempotent curl bootstrap into /mnt/data; guarded by
  an in-sandbox marker and a short-lived Redis hint, best-effort so it never
  blocks the actual tool call (caps: 50 files / 100MB / 120s)
- Agent awareness via {{sandbox_uploaded_files}} in the cloud-sandbox systemRole,
  populated by both the server (RuntimeExecutors) and client (contextEngineering)
  placeholder generators

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(sandbox): make file sync work on all server runtimes & keep prompt consistent

Address review feedback on the uploaded-files sync:

1. (high) The sync was a no-op on the cloudSandbox server runtime and the skills
   runtime because createSandboxService() was called without serverDB, so
   ensureFilesInitialized() returned early. Thread serverDB through both.
   (heterogeneous sandboxRunner is intentionally left out: it runs a coding agent
   in /workspace and does not use the cloud-sandbox systemRole.)

2. (medium) Drop the Redis "already initialized" hint. The in-sandbox marker is
   now the single source of truth for idempotency, so a recycled sandbox always
   re-syncs instead of being skipped by a stale 5-min Redis key.

3. (medium) Apply the 50-file / 100MB caps inside formatUploadedFilesPrompt (via
   the shared selectSandboxInitFiles), so the files the prompt advertises match
   exactly what the bootstrap downloads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 10:40:34 +08:00
Innei 337e7f244c 🐛 fix(market-auth): skip auth flow when LobeChat session is missing (#15532)
Guard `signIn()` and the market.* 401 handlers on `isSignedIn` so the
Create Community Profile modal no longer pops up for unauthenticated
users. Routing the user back to LobeChat sign-in is not MarketAuth's
responsibility — callers handle that.
2026-06-09 10:16:44 +08:00
Arvin Xu eae47f527c feat(markdown): render GitHub / Linear / external links as rich chips (#15561)
*  feat(heterogeneous-agents): default Codex exec to bypass approvals/sandbox

Switch the default Codex execution mode from --full-auto to
--dangerously-bypass-approvals-and-sandbox, and share the execution-mode
constants from @lobechat/heterogeneous-agents/spawn so the desktop driver
and spawnAgent stay in sync. An explicit execution flag in extraArgs still
wins. Also fix the Codex adapter step tracking so consecutive agent_message
items stay in one step, stale tool completions don't start a new step, and
turn completion drains pending tools before emitting stream_end +
agent_runtime_end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  feat(shared-tool-ui): unwrap shell-wrapper commands in RunCommand UI

Codex execs commands wrapped as `/bin/zsh -lc '...'`; surface the inner
command in the RunCommand inspector and render. Also switch Unix glob
fallback from `find` to `fast-glob` to preserve globstar semantics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  feat(markdown): render GitHub / Linear / external links as rich chips

Add a markdown Link plugin that rewrites anchor elements into rich inline
chips: GitHub repo/PR/issue/commit/user, Linear issues, npm packages, Figma
files, mailto, and any other external link (favicon + full URL). Citation,
footnote, anchor and relative links keep the default renderer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ⬆️ chore(deps): bump @lobehub/editor to 4.17.0 and @lobehub/ui to 5.15.10

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 09:22:35 +08:00
Arvin Xu dfdf844761 🐛 fix(desktop): bump node-gyp to 12.x so Windows build finds Visual Studio 2026 (#15562)
GitHub redirects the `windows-2025` runner to the new `windows-2025-vs2026`
image, which ships Visual Studio 2026. node-gyp 11.5.0 only recognizes VS
2019/2022, so `electron-builder install-app-deps` fails to rebuild the native
`get-windows` module with "Could not find any Visual Studio installation".
node-gyp 12.x adds VS 2026 detection. Override it in both the root workspace
and the isolated apps/desktop install.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 02:36:10 +08:00
Arvin Xu cca01451f9 feat(heterogeneous-agents): default Codex exec to bypass approvals/sandbox (#15557)
*  feat(heterogeneous-agents): default Codex exec to bypass approvals/sandbox

Switch the default Codex execution mode from --full-auto to
--dangerously-bypass-approvals-and-sandbox, and share the execution-mode
constants from @lobechat/heterogeneous-agents/spawn so the desktop driver
and spawnAgent stay in sync. An explicit execution flag in extraArgs still
wins. Also fix the Codex adapter step tracking so consecutive agent_message
items stay in one step, stale tool completions don't start a new step, and
turn completion drains pending tools before emitting stream_end +
agent_runtime_end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  feat(shared-tool-ui): unwrap shell-wrapper commands in RunCommand UI

Codex execs commands wrapped as `/bin/zsh -lc '...'`; surface the inner
command in the RunCommand inspector and render. Also switch Unix glob
fallback from `find` to `fast-glob` to preserve globstar semantics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-09 02:03:35 +08:00
Innei d2cd9ef023 feat(page-editor): enable block plugin with shared inline padding (#15556)
*  feat(page-editor): enable block plugin with shared inline padding

Mount `ReactBlockPlugin` on the page editor with `anchorPadding={0}` so
the editor root no longer reserves its default 54 px gutters, and apply
`DEFAULT_BLOCK_ANCHOR_PADDING` as `paddingInline` on the `Flexbox`
wrapping `TitleSection` + `EditorCanvas`. This keeps the title and
editor content aligned while leaving the same 54 px of room for the
floating block menu / drag handle to render in.

Requires `@lobehub/editor` with `anchorPadding` support and the
exported `DEFAULT_BLOCK_ANCHOR_PADDING` constant.

* 🐛 fix(page-editor): drop redundant overflowY on editor content wrapper

`editorContent` previously declared `overflowY: 'auto'`, which created
a second scroll container nested inside `.contentWrapper` (already
`overflowY: 'auto'`). With the new inline padding from
`DEFAULT_BLOCK_ANCHOR_PADDING`, the nested scroller clipped the
floating block menu / drag handle that the editor renders in the
inline-padding gutter. Let the outer wrapper own scrolling so the
gutter overflow stays visible.
2026-06-09 01:04:10 +08:00
Arvin Xu ea3ae583d6 feat(agent): unified per-device working directory + execution-device UI (#15543)
*  feat(agent): unified per-device working directory + execution-device UI

Client UI consuming the backend contract (#15542). User-facing — validate
before merge.

- New `src/store/device` (SWR fetch + cwd writes) — single source of device data;
  `deviceCwd` helper moves here from the chat-input feature layer.
- One `WorkingDirectoryPicker` for local + remote (native dialog vs manual path).
- Shared `WorkspaceControls` strip composed by both chat-input bars.
- GitStatus reads remote git via `useDeviceGitInfo` (read-only).
- Execution-device switcher graduates out of labs → writes only executionTarget.
- One-time migration of legacy localStorage recents into device.workingDirs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): wire executionTarget→runtimeMode + workingDirByDevice cwd

The runtime-decision wiring, kept out of the backend contract PR so it's
reviewed/validated together with the UI that drives it.

- `helpers/executionTarget`: resolveRuntimeMode / executionTarget resolvers.
- server tool gate (AgentToolsEngine) derives runtimeMode from
  `agencyConfig.executionTarget`, with a no-regression fallback to the legacy
  per-platform runtimeMode.
- server cwd precedence (aiAgent resolveWorkspaceInit + hetero dispatch) now
  consumes `workingDirByDevice[targetDeviceId]`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(agent): cover executionTarget + workingDir helpers; drop dead lab key

- Unit-test resolveRuntimeMode / resolveExecutionTarget and the working-dir
  precedence (locks the web default→cloud graduation + legacy fallback)
- Remove the now-unused `executionDeviceSwitcher` lab i18n keys (toggle deleted)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): guide web users to the desktop app in the device switcher

On web with no remote device, replace the muted "no devices" dead-end with a
prominent, clickable download-desktop card (and drop the now-duplicate header
link). Desktop keeps the muted hint since local execution is already available.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): fix execution-device copy for desktop + web

- Desktop "no devices" hint no longer tells an already-on-desktop user to
  "install the desktop app" — just points at `lh connect`.
- Tighten the web download-card description to the desktop's real benefit
  (run on your computer with local file access).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): flatten the web download card to a plain row

Drop the outer border/background so it reads as a normal menu row (like the
sandbox option), and shorten the description to a single line so the row stops
being taller than its neighbours.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): reword download-card desc to "access to your computer"

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): add "no device" execution target (plain chat, no run tools)

Restores the option to run an agent with no execution environment, lost when
the per-platform runtimeMode was unified into executionTarget. Adds `none` to
HeteroExecutionTarget (→ runtimeMode `none`), surfaces it at the top of the
switcher on both web + desktop, and flips the web default back to `none` so an
unconfigured web agent is plain chat again (desktop still defaults to local).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): rename HeteroExecutionTarget→DeviceExecutionTarget, reorder switcher

- Rename the type (it now carries `none`, so "device" target fits better than
  "hetero") across types + helpers + dispatcher + switcher.
- Move "no device" to the bottom of the list (real targets first, opt-out last).
- Reword the download card to "let agents connect directly to your computer".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): move "no device" back to top, restore EN download copy

"No device" sits above the dynamic device rows; keep the EN download-card
wording as "Run agents with access to your computer".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): swap switcher icons — MonitorOff for "no device", Box for sandbox

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): clarify execution-device info tooltip + "no device" desc

- Info tooltip now explains the cloud sandbox is provided by the centralized
  LobeHub Marketplace, and that picking a device makes it the agent's runtime
  for reading/writing files and operating the computer.
- "No device" description now conveys "no device enabled, can't operate a
  computer" instead of "plain chat".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): move info icon beside the title, shorten "no device" desc

- Info tooltip trigger now sits next to the "Execution Device" title instead of
  right-aligned; the download link stays on the right.
- "No device" description trimmed to just "No device enabled".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): zh tooltip wording — "提供服务"

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): reorder tooltip — device runtime first, marketplace last

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): trim tooltip — drop "设备"/devices and trailing period

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): tag the current machine's device row, drop duplicate "This device"

When the desktop's own machine appears in the device list, badge that real row
with a "This device" tag and hide the generic "This device" (local) option —
no more two entries for the same machine. The local option still shows as a
fallback when the machine isn't enrolled in the list yet.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(agent): hoist this-machine device above sandbox + auto-bind on first run

Switcher-only (no routing/dispatch changes):
- Order is now: no device → this device → cloud sandbox → other devices.
- On desktop, when this machine is enrolled and online and the agent has no
  explicit target yet, default to it and persist the binding once.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): widen gap between execution-device rows

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): hide "Get Desktop App" link on desktop

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): capitalize "Cloud Sandbox" label

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(agent): web working-dir entry via "Add folder" modal instead of inline input

The browser folder picker can't yield an absolute path (sandboxed handle), so
on web / a remote device the working directory is entered manually. Replace the
inline input with an "Add folder…" row that opens a modal for absolute-path
entry; the local desktop machine still opens the native folder dialog.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(agent): split working-dir footer into local/remote row components

Replace the scattered `isLocalDevice ?` forks (icon, label, handler) with one
branch that picks between two self-contained rows: ChooseLocalFolderRow (native
dialog) and AddRemoteFolderRow (absolute-path modal).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): use the device default cwd as the add-folder placeholder

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): validate manually-entered working dir via device statPath RPC

Web / remote clients can't browse the target device's filesystem, so the
"Add folder" modal now checks the typed path on the device before binding it.
New `statPath` device RPC mirrors gitInfo end-to-end:
- desktop WorkspaceCtr.statPath (fs.stat → exists / isDirectory) + RPC dispatch
- server deviceGateway.statPath + device.statPath tRPC (invokeRpc relay)
- modal blocks on a definitive negative (not found / not a directory); an
  unreachable device is treated as "can't verify" and allowed through

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(agent): route statPath through deviceService, not lambdaClient

Components shouldn't import lambdaClient directly — add a thin deviceService
wrapping device.statPath, and call it from the working-dir picker.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(i18n): move working-directory strings from plugin to a device ns

The working-directory / git control-bar strings (53 keys) were lumped under the
`plugin` namespace. Move them to a dedicated `device` namespace and drop the
now-redundant `localSystem.` prefix (`plugin:localSystem.workingDirectory.X` →
`device:workingDirectory.X`). Updates the 4 consumer components; the `device`
ns auto-registers via defaultResources.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(agent): route all device TRPC calls through deviceService

Components/hooks/stores shouldn't reach into lambdaClient.device.* directly.
Expand deviceService with listDevices/updateDevice/listGitBranches/
checkoutGitBranch/checkCapability/getAgentProfile and migrate every imperative
call site (device store, BranchSwitcher, CreatePlatformAgent, the remote-agent
guard, RemoteAgentConfigCard) + the DeviceListItem type. lambdaQuery.device.*
React-Query hooks are left as-is (a different pattern).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): pull/push a remote device's branch over RPC

Wire git pull/push through the device's pullGitBranch/pushGitBranch RPC so the
web/remote GitStatus bar can sync, not just the local desktop over IPC. Shows
the pull/push affordances for remote devices too.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(agent): route git pull/push through deviceService too

Add pullGitBranch/pushGitBranch to deviceService and switch GitStatus off the
direct lambdaClient.device.* calls, so no component reaches the device router
directly anymore.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent): detect repoType for manually-added working dirs

A directory added via the "Add folder" modal committed without a repoType, so a
GitHub repo showed a plain folder icon. statPath now also returns the git repo
type (detected on the target device); the modal threads it into the committed
entry. Collapses the modal's separate validate+submit into one onSubmit that
validates and enriches in a single round-trip.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(agent): create new branch via a modal instead of inline footer

"Checkout new branch…" now opens a focused modal (branch-name input + create)
rather than expanding an inline footer inside the branch dropdown. Always
creates + checks out the branch — no checkout/overwrite options. Errors show
inline in the modal; drops the dead inline-create state/styles.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(agent): route all git ops through a unified gitService

Pick Electron IPC vs device RPC inside the service so UI / store / hooks
stay transport-agnostic. Replace the bundled `gitInfo` device RPC with
granular reads (branch / linked PR / working-tree / ahead-behind) that
mirror the local IPC methods one-to-one, and move the git read SWR hooks
into the device store (useFetchGitInfo / WorkingTreeStatus / AheadBehind).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): route Review git ops through device RPC (remote-capable)

Extend the device-RPC git pipeline to the 4 ops the Review panel needs
(getGitWorkingTreePatches / getGitBranchDiff / listGitRemoteBranches /
revertGitFile), mirroring the listGitBranches pattern end-to-end: desktop RPC
dispatch → deviceGateway → device.* tRPC → gitService. Adds minimal DeviceGit*
mirror types to @lobechat/types. Review (useReviewPatches / useGitRemoteBranches
/ FileItem) now goes through gitService with a deviceId, dropping the isDesktop
gate so web/remote devices get the diff + revert too.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent): resolve repoType from device store so remote Review tab shows

useRepoType now reads the persisted workingDirs[].repoType from the device
store (keyed by deviceId), so a remote device's git/github type — and thus the
Review tab visibility — resolves without a local-only IPC probe. The IPC probe
+ localStorage fallback are kept only when the target is the local machine.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(agent): optimistic branch switch in the branch switcher

Flip the displayed branch the instant a checkout is clicked (or a new branch
created) instead of waiting for the IPC/RPC round-trip + gitInfo refetch. The
git-info SWR cache is optimistically updated and reconciled on completion — a
failed checkout rolls the label back and toasts the error.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat: support remote device files panel

* 💄 style: restore desktop this-device option

* 🐛 fix: keep files panel local for this device

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 23:27:52 +08:00
Arvin Xu a75eba5a4f 💄 style(chat-input): use compact stats footer for skill tools popover (#15552)
* 💄 style(chat-input): use compact stats footer for skill tools popover

- Replace the two full-width footer rows (store / management) with a
  compact stats footer: pinned / auto counts on the left, an
  "Add Skills / Connector" store button (icon + label) and a settings
  icon button on the right.
- Right-align each item's type tag (MCP / Skills / builtin) so badges sit
  flush next to the row action instead of trailing the name.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(aiAgent): mock deviceGateway in connectorOverlap exec test

execAgent reads `deviceGateway.isConfigured`, which under the happy-dom
test environment hits real t3-env and throws "server-side env var on the
client". Mock `@/server/services/deviceGateway` like the sibling device
tests do so the connector/plugin overlap cases run in isolation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 22:23:38 +08:00
Innei 9eff025787 💄 style(modal): use base-ui Button in custom modal footers (#15444)
💄 style(modal): use @lobehub/ui/base-ui Button in modal footers

Align custom-footer button padding/font with base-ui Modal's built-in
OkBtn/CancelBtn (32h / 14 / 13) for consistent visual rhythm. Affects
AuthRequiredModal footer and TaskTemplateDetailModal content button.
2026-06-08 21:31:34 +08:00
Innei 9b19ebb2c6 🐛 fix(desktop): unbreak dev cold-start + restore UI language across reloads (#15547)
* 🐛 fix(desktop): unbreak dev cold-start on non-default UI languages

`ViteRendererFallback` now proxies via globalThis `fetch` (Node undici) instead
of Electron `net.fetch`, and Vite dev server is pinned to IPv4 listen. The
main-process Chromium `net` pool is small and surfaces `ERR_INSUFFICIENT_RESOURCES`
under cold-start module bursts + ~50 i18n namespace fan-out under non-en-US
locales. undici queues internally and avoids that pool entirely; v4 listen avoids
happy-eyeballs dual-stack connect storms. A Semaphore(64) still caps in-flight
fetches so the OS socket layer never gets buried.

Fixes LOBE-10086

* 🐛 fix(desktop): restore persisted UI language across renderer reloads

The renderer's `<html lang>` was being computed from `?lng=` (injected by the
main process at `loadURL` time) with `navigator.language` as fallback. On
`Cmd+R` the webContents reload reuses the prior URL without rebuilding it
against `storeManager.locale`, so users who changed their language after
launch got dropped back to the OS locale on every reload (white screen, then
English). Read the i18next localStorage cache first — that's the actual
persisted user setting written by the language switcher — and fall back to the
URL param + navigator as before.

*  test: mock device gateway in connector overlap spec
2026-06-08 21:21:24 +08:00
YuTengjing a2fd98a2d1 🐛 fix: restore file URLs in context prompts (#15549) 2026-06-08 19:26:16 +08:00
Arvin Xu 235a16fc11 chore(agent): agencyConfig contract + git-over-RPC backend (#15542)
*  feat(agent): agencyConfig contract — workingDirByDevice + executionTarget

Type-only contract for the unified per-device working-directory work. Adds
`workingDirByDevice` (per-device cwd) and `executionTarget` to agencyConfig.
No runtime logic consumes them yet — the server/client wiring lands in the UI
PR so it can be validated as one unit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): device gitInfo over RPC + shared local-file-shell git impl

Backend/RPC capability for "git branch / changes / PR for remote devices".
Dormant — no client caller yet; merging changes no existing behavior.

- `@lobechat/local-file-shell/git`: repoType + branch / linked-PR / working-tree
  / ahead-behind + `gitInfo` aggregate + `DeviceGitInfo` type (desktop + CLI).
- desktop `GitCtr.gitInfo()` (@IpcMethod) delegates to it; registered in
  GatewayConnectionCtr's RPC dispatch. `utils/git` re-exports the helpers.
- server: `deviceGateway.gitInfo()` wrapper + `device.gitInfo` TRPC query.
- `@lobechat/types`: `DeviceGitInfo` shape.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  test(desktop): fix stale mocks after git impl moved to local-file-shell

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(server): extract DeviceGateway into its own service dir

deviceGateway is a device-scoped gateway client (status/list/tool-call/git/
workspace RPC), not tool-execution-specific. Move it out of toolExecution/
into its own services/deviceGateway/ and update all import sites.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 18:09:09 +08:00
LiJian ee65cf2a0f feat(connector): custom OAuth MCP connectors — onboarding, runtime execution & connector-first (LOBE-9983) (#15546)
*  feat(connector): wire custom MCP OAuth — Pre-registration & DCR (LOBE-9983)

Connect the two OIDC schemes designed in LOBE-9736 (oidcConfig) end-to-end so
users can add a custom OAuth MCP server from /settings/skill. Until now the DB
schema, models, and tool-permission UI existed, but nothing ran the OAuth
authorization flow — syncTools only worked when a token already existed.

Flow (shared pipeline, branches only on where client_id comes from):
- Add modal (client_id present → Pre-registration; absent → DCR/RFC 7591)
- startOAuth: probe MCP URL → RFC 9728 protected-resource metadata → RFC 8414
  AS metadata; DCR-register the client when no client_id; persist resolved
  oidcConfig; build PKCE authorize URL, stash verifier in Redis keyed by state
- /oauth/connector/callback: consume state → exchange code → store encrypted
  tokens (KeyVaultsGateKeeper) + tokenExpiresAt + status=connected → postMessage
- syncTools lazily refreshes the access token before connecting

Built on @modelcontextprotocol/sdk OAuth helpers (discover/register/start/
exchange/refresh) — no hand-rolled protocol code.

Security:
- Wire KeyVaultsGateKeeper into ConnectorModel so OAuth tokens are encrypted at
  rest (previously the router passed no gatekeeper → plaintext)
- Strip decrypted credentials and oidcConfig.clientSecret from the list response

UI:
- "+" button in /settings/skill Connectors tab opens the Add modal
- SkillList surfaces custom connectors from the connector store
- Modal wires the client secret field, infers the scheme, and shows the
  redirect URI to register

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(connector): request server-advertised scopes in OAuth flow

The authorize request sent an empty scope list, so providers that require a
scope (e.g. Linear MCP advertises scopes_supported ["read","write"]) issued a
useless token or rejected the flow. Default to the authorization server's
advertised scopes_supported when the user did not specify any, and use them for
both DCR registration and the authorize request.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(connector): let OAuth callback bypass SPA rewrite and auth gate

/oauth/connector/callback is a backend route handler reached via a cross-site
redirect from the OAuth provider, so the proxy middleware broke it two ways:

1. It was not in the backend passthrough list, so it got rewritten to the SPA /
   locale shell instead of running the route handler (307 → blank).
2. It was not in isPublicRoute, so BetterAuth treated it as protected; the
   cross-site top-level navigation doesn't reliably carry the SameSite session
   cookie, so it redirected to sign-in (307).

Add /oauth/connector to backendApiEndpoints and /oauth/connector/callback to
isPublicRoute (the handler validates its own single-use state, so it must not be
session-gated). Scoped so /oauth/callback/success|error SPA pages are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  feat(connector): execute connector tools server-side + agent-runtime wiring

Make custom OAuth MCP connectors actually callable, and sync their tools as
soon as authorization completes.

- callback: after token exchange, sync the tool list server-side via a shared
  syncConnectorToolsById — the connector is usable without a client round-trip
- sync.ts: extract buildConnectorMcpParams (http+auth / stdio), shared by
  syncTools and the new callTool
- connector router: add `callTool` (resolve connector, hard-block disabled
  tools, refresh token, call the remote MCP with decrypted credentials)
- aiAgent runtime: pass a KeyVaultsGateKeeper when resolving connectors so OAuth
  tokens decrypt (otherwise tool calls 401); surface connectors in the
  agent-management availablePlugins as a new 'connector' type
- AgentManagementContextInjector: render a <connector_plugins> section

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  feat(connector): wire connectors into the classic client chat path

The front-end chat orchestrates tools client-side (via /webapi/chat proxy),
separate from the server agent runtime. Connectors were invisible and
unexecutable there. Wire them in, connector-first.

- toolEngineering: build connector manifests from the store and inject them into
  createToolsEngine; drop plugins sharing a connector identifier (connector wins)
- buildClientConnectorManifests: store rows → type 'mcp' manifests (no token; the
  client has none) with permission → humanIntervention mapping
- mcpService.invokeMcpToolCall: route connector tool calls to connector.callTool
  before the plugin path (only connectors with a real MCP endpoint, so
  Lobehub/Klavis skills keep their executor)
- DeferredStoreInitialization: fetch connectors post-login so chat sees them
- AddConnectorModal: refresh after OAuth regardless of popup outcome
- chat-input skills picker: surface custom connectors in the auto group

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(connector): open OAuth popup synchronously + escape callback HTML (codex P1)

- AddConnectorModal: open the OAuth popup synchronously inside the click handler
  (before any await), then navigate it to the authorize URL. Browsers block
  window.open once an async boundary is crossed, which left popup=null and the
  poll loop never resolving — the Add modal hung. Null popup now fails fast with
  a "allow popups" message.
- callback route: escape the postMessage payload for `<script>` context
  (`<`, `>`, `&`, U+2028/U+2029 → \uXXXX). A malicious OAuth server could put
  `</script>...` in the error param and execute script on the app origin.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(connector): tighten execution boundary + surface OAuth failures + tests

Address review: enforce the same constraints at the call site that the manifest
layer enforces, and stop swallowing OAuth failures.

- isEnabled on BOTH sides: invokeMcpToolCall only routes enabled connectors
  (a disabled connector no longer steals a same-name plugin's call), and the
  server rejects calls to a disabled connector. Matches buildClientConnectorManifests
  which only exposes enabled connectors.
- callTool requires the toolName to exist in the synced user_connector_tools
  list — unsynced / hand-crafted tool names are rejected instead of being
  forwarded blindly to the remote MCP.
- extract callConnectorToolById (typed ConnectorToolCallError → tRPC codes) so
  the gates are unit-testable.
- AddConnectorModal: distinguish success / provider-error (show the reason) /
  user-dismissed instead of collapsing every failure into a silent close.
- tests: exec gates (not-found / disabled connector / unknown tool / disabled
  tool / success / token-refresh) + buildClientConnectorManifests mapping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(connector): align redirect URI, connector-override & partial-failure UX

Second review round.

- redirect URI: the modal showed a client-origin URI while the server sent an
  APP_URL one — register-vs-use mismatch broke the callback. Add a
  `connector.getRedirectUri` query (server source of truth) and show exactly
  that in the modal.
- execAgent: derive the plugin-override set from the connectors that ACTUALLY
  produce a manifest (enabled + with tools), not the raw endpoint-having set —
  a disabled / not-yet-synced same-named connector no longer evicts the plugin
  and leaves the runtime with no tools. Matches the client-chat behaviour.
- partial failure: when code exchange succeeds but the tool sync fails, the
  callback now reports `synced: false`; the modal shows "authorized but tools
  could not be synced" instead of a false "connected".

Tests: execAgent overlap regression (disabled / 0-tool keeps the plugin; real
tools replace it) + callback partial-failure (synced:false on sync error).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ♻️ refactor(connector): name the availablePlugins source 'custom' not 'connector'

The agent-management availablePlugins types describe a tool's SOURCE
(builtin / klavis / lobehub-skill); 'connector' named the storage system
instead. Once plugins migrate to the connector table everything is a connector,
so the source-based label is what matters. Rename to 'custom' to align with
ConnectorSourceType.custom (single source of truth); section is <custom_plugins>.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(connector): enforce connector permissions for community MCP plugins

Community MCPs execute via the plugin path (not connector.callTool), so the
per-tool permissions a user sets in the new Connectors UI weren't surfaced:
needs_approval didn't trigger the approval prompt on either runtime. (disabled
was already hard-blocked at execution by ToolExecutionService and the mcp
router.)

- extract patchManifestWithPermissions into a pure, client-safe module
  (patchManifestPermissions.ts); connectorPermissionCheck.ts re-exports it.
- execAgent: also patch community-plugin manifests (pluginsWithoutConnectors)
  with their connector permissions, alongside lobehub/klavis.
- client createToolsEngine: patch community-plugin manifests with connector
  permissions from the store so needs_approval surfaces as humanIntervention
  in the classic chat path too.
- unit tests for the shared patch function.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  fix(connector): tolerate uninitialized connectors slice in selectors

createToolsEngine now reads connectorSelectors.{customConnectors,connectorList};
toolEngineering/index.test.ts mocks getToolStoreState without `connectors`, so
the selectors hit `undefined.filter`. Guard with `?? []` (the real store always
seeds connectors:[] via initialState) and add connectors:[] to the test mock.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  fix(connector): guard every connector selector against an uninitialized slice

mcp.test.ts mocks the tool store without `connectors`, and invokeMcpToolCall
calls connectorByIdentifier → `s.connectors.find` threw. The previous fix only
guarded connectorList/customConnectors; harden all of them (find/filter) so any
partial-store mock is safe. The real store always seeds connectors:[].

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 18:01:41 +08:00
Arvin Xu 0ac53b4e80 🐛 fix(agent-runtime): capture Gemini multimodal content_part/reasoning_part output (#15535)
Gemini 2.5+/3 thinking streams deliver assistant text and reasoning as
content_part/reasoning_part events instead of plain text/reasoning. The
runtime registered no onContentPart/onReasoningPart handlers, so the text
was silently dropped: onCompletion still reported usage tokens, the
empty-completion guard saw outputTokens > 0, and the turn finalized to a
blank `done` (lost in DB, client stream and trace alike).

Add the two handlers, mirroring onText/onThinking for text parts so
streaming, persistence and tracing all capture the content. Image parts
are uploaded to object storage and serialized as multimodal content
(text + image URLs, in order) — never persisting raw base64.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 14:55:37 +08:00
René Wang 91588bfdf8 📝 docs: add June 8 weekly changelog (#15537)
* 📝 docs: add June 8 weekly changelog

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 📝 docs: add June 8 changelog cover and register index entry

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 13:57:37 +08:00
LiJian 927a79c3fb feat(auth): preserve utm_source through the OIDC sign-in/sign-up flow (#15544)
When Market kicks off OIDC against LobeHub, unauthenticated users are
redirected by the auth middleware to /signin (and onward to /signup).
The utm_source param sent on the original /oidc/auth request was only
buried inside callbackUrl and never surfaced on the sign-up page.

Carry utm_source as a first-class query param through the auth detour,
mirroring how the `hl` locale param is already preserved:
- middleware lifts utm_source from the request onto the /signin URL
- sign-in forwards utm_source to /signup in both navigation paths

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 13:24:53 +08:00
Arvin Xu c5c047e4b5 🐛 fix(desktop): misc independent fixes (vite fetch cap, gateway loading, token animation) (#15541)
* 🐛 fix(desktop): bound concurrent Vite dev-server fetches

Since #15304 unified dev under app://, every renderer asset round-trips
through the main-process net stack. A cold start (thousands of module
requests) or a non-default UI language (~50 i18n namespaces over HTTP at
once) could exhaust the net request pool and surface as
ERR_INSUFFICIENT_RESOURCES. Gate Vite dev-server fetches behind a FIFO
semaphore (cap 64), holding each slot until the response body is fully
drained so streaming responses count for their whole lifetime.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(desktop): add trailing inset to tab title

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix: eliminate blank loading state during Gateway/ServerRuntime execution

When sending a message in Gateway (ServerRuntime) mode, the UI showed
a blank state between 'Sending message' and 'Task is running in server'
because the new execServerAgentRuntime operation was associated with the
server-created message ID, while the UI was still rendering the temp
message ID. The temp ID had no running operation, so ContentLoading
returned null.

Fix: pass temp message IDs to executeGatewayAgent and associate them
with the gateway operation alongside the server message ID. This ensures
ContentLoading finds a running operation regardless of which message ID
the UI is currently rendering.

*  feat(agent): animate subagent token count with count-up effect

Promote a shared AnimatedNumber into @lobechat/shared-tool-ui/components and
use it for the subagent metrics token total so it rolls up smoothly while
streaming instead of jumping.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 13:09:45 +08:00
LiJian 419aca2c59 🐛 fix(skill): stop OAuth connectors duplicating into the Skills tab (#15510)
The unified /settings/skill manager renders the Connectors and Skills
sub-tabs from one SkillList via viewMode. Lobehub/Klavis OAuth connectors
(type 'lobehub' | 'klavis') belong only in the Connectors view, but the
Skills view's "Community Skill" section still mapped them alongside the
market agent skills — so Gmail, Notion, Google Drive, etc. showed up in
both tabs.

Render only market agent skills in the Skills view; OAuth connectors stay
exclusively under the Connectors view's "OAuth Connectors" group.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 11:26:14 +08:00
Arvin Xu f0f8ecd64d 🧹 chore: clean LOBE marker comments from aiInfra schema (2026-06-08) (#15536)
🧹 chore: replace LOBE-10056 markers with inline context in aiInfra schema comments
2026-06-08 11:07:18 +08:00
René Wang b19008ed24 💄 style: bring various details for better experience (#15486) 2026-06-08 10:55:46 +08:00
Arvin Xu dbf743cc12 feat(verify): Agent Run delivery checker system (#15489)
* 🗃️ feat(database): add verify system tables for agent run delivery checker

Implement the database layer for the Agent Run delivery checker (Verify System).

Reuse / definition layer:
- verify_criteria: a single reusable pass/fail standard (atomic unit), carrying
  its verifier config + onFail default and bound to a document for judging
  guidance (iteration history reuses document_history; no version columns)
- verify_rubrics: a named group that aggregates criteria — the reusable unit
- verify_rubric_criteria: junction, which criteria a rubric aggregates
  (criteria are reusable across rubrics)

Mounted onto an agent via the existing agency config jsonb:
- agencyConfig.verifyRubricId: a reusable rubric (criteria template)
- agencyConfig.verifyCriteriaIds: ad-hoc one-off criteria
A run's plan instantiates the union of both. No dedicated bindings table.

Snapshot + result layer:
- agent_operations.verify_plan (jsonb) + verify_plan_confirmed_at: the per-run
  immutable check-item snapshot lives ON the operation (1:1 — auto-repair spawns
  a new operation), instead of a separate plans table
- agent_operations.verify_status: denormalized rollup for list-page badges
- verify_check_results: per-criterion result with the Toulmin model
  (verdict/confidence as columns, narrative in a typed toulmin jsonb), N:1
  verifier_tracing_id for batch judging, FP/FN flags for the data flywheel;
  relates to the plan via operation_id + stable check_item_id

Ref: LOBE-10019

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  feat(verify): add Agent Run delivery checker backend + frontend module

Implements the verify system on top of the schema (PR #15480):
- models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult;
  agentOperation verify plan/status methods
- services/verify: AI plan generation (auto-create criteria), executor with
  LLM Toulmin judge (per-criterion + batch), program placeholder, agent &
  auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion
  lifecycle bridge
- lambda verify router (criteria/rubric CRUD, plan, results, feedback)
- frontend feature module: service, SWR hooks, CheckerDock state machine,
  RunArtifact, verify i18n namespace
- tracing scenarios: VerifyPlanGen / VerifyJudge

Live UI mount (dock/artifact into chat) pending server operationId source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(verify): persist delivery-checker verdicts via async tracing backfill

The LLM judge produced valid verdicts but they were never persisted, leaving
every run stuck at `verifying`. Two root causes:

1. FK ordering: `writeVerdict` stamped `verifier_tracing_id` synchronously, but
   the `llm_generation_tracing` row is written asynchronously (best-effort,
   after the response) — so the hard FK was violated every time and the verdict
   write was rolled back. Now the verdict is written with a null link, and the
   tracing id is backfilled by an `onPersisted` callback that fires only after
   the tracing row commits (still non-blocking). If tracing is disabled the link
   simply stays null.

2. Verdict parse: the judge JSON schema is non-strict, so the provider returns
   optional Toulmin fields as explicit `null`. The Zod validator used
   `.optional()` (accepts undefined, not null), so any null failed the whole
   `safeParse` and discarded the batch. Switched to `.nullish()`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(cli): add `verify` command for the delivery checker

Adds `lh verify` covering the full delivery-checker chain — criteria & rubric
CRUD, per-run plan (generate/state/confirm/skip), execute (LLM judge), results,
and feedback — calling the `verify` lambda router. Enables end-to-end backend
testing of the verify system.

Also adds the missing `tool-runtime` / `prompts` / `const` workspace entries to
the CLI's `pnpm-workspace.yaml` so the standalone package installs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(verify): add verify message role + delivery-checker card UI

Make the delivery-checker renderable in chat:

- Fix the `features/Verify` components so they compile: flatten the `verify`
  locale to the repo's flat-dotted-key convention (keySeparator: false), import
  `Flexbox`/`TextArea` from `@lobehub/ui` (react-layout-kit is no longer a dep),
  and the token cast.
- Add a `verify` UI message role + a `VerifyMessage` card that renders the
  Run Artifact + checker dock from `metadata.verifyOperationId`, wired into the
  message renderer switch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): add lobe-agent `generateVerifyPlan` tool (server runtime)

Lets an agent set up the delivery checker for its run: the agent calls
`generateVerifyPlan` early (per the new `<delivery_checker>` system-role
guidance), which instantiates the rubric / ad-hoc criteria into a frozen plan on
the current `agent_operations` row. Executed server-side only — the executor is
dispatched via `runtime[apiName]` with `operationId` threaded through the tool
execution context; the client `BaseExecutor` gracefully no-ops it.

Also registers the metadata fields (`verifyOperationId`/`verifyRound`) on the
message metadata zod schema so the role='verify' card can carry its operation id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): surface role=verify card on run completion (LOBE-10051)

Connect the delivery checker to the conversation: when an Agent Run with a
verify plan completes, `CompletionLifecycle` inserts a persisted `role='verify'`
message (parented to the assistant, carrying `metadata.verifyOperationId`) that
renders the checker card. Self-guarded — no plan → no card, failures never
affect the run.

`role='verify'` behaves like a `user` leaf message everywhere it flows
(persistence + conversation-flow pass it through unchanged); only the
context-engine treats it specially: a new `VerifyMessageProcessor` drops it from
the model context (UI-only card, not a valid model role). Adds `verify` to
`CreateMessageRoleType`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(verify): merge run-artifact + checker into one card

The role=verify message rendered two stacked cards (Run Artifact summary +
Delivery Checker) that duplicated the check-item list. Merge into a single card:
the `Run Artifact · Round N` header, then the checker results + actions, then the
snapshot note. RunArtifact/CheckerDock gain an `embedded` prop (header-only /
body-only, no card chrome) and VerifyMessage composes them under one border.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): derive generateVerifyPlan rubric from agencyConfig

A real agent calls `generateVerifyPlan` with just a `goal` and doesn't know
rubric ids. When `rubricId`/`criteriaIds` params are absent, derive the mounted
rubric + ad-hoc criteria from the executing agent's
`agencyConfig.verifyRubricId / verifyCriteriaIds`. Params still win when given.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(cli): surface agent gateway WebSocket close code + reason

The `onclose` handler logged `String(event)` → the useless "[object
CloseEvent]". Surface `event.code` (+ `event.reason` when present) so a gateway
disconnect before completion is actually diagnosable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 fix(verify): rename "Run Artifact" → "Verification", drop failed red border

- The kicker said "Run Artifact" — it's automated verification, not an artifact.
  Renamed to "Verification · Round N".
- Removed the red error border on a failed check — a normal card reads better.
- Fixes a render crash (`useVerifyState is not defined`): the border removal left
  a dangling reference after the import was dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(cli): poll run status when the agent stream drops

When the live stream (gateway WebSocket / SSE) closes before the run finishes,
the run is still executing server-side — so instead of hard-exiting, fall back to
polling `aiAgent.getOperationStatus` every 10s until the run reaches a terminal
state (or is no longer tracked). Pairs with surfacing the WS close code/reason.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(verify): add Render for generateVerifyPlan tool call

The generateVerifyPlan tool call rendered as the default param/result dump. Add a
Render that lists the generated delivery checks (title + gate/auto-fill tag), and
surface the items on the tool state so the Render can read them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): auto-confirm generated plan so checks run on completion

The agent generated a plan but it stayed `planned`/unconfirmed, so the completion
hook (which gates on a confirmed plan) never ran the checks — the card was stuck
at "awaiting confirmation" with no pass/fail. In the headless agent flow there's
no one to click Confirm, so `generateVerifyPlan` now auto-confirms the plan it
generates; the checks then run automatically on completion. (An interactive
"review before run" gate is a future enhancement.)

Also: the verify card header disappeared in the draft/planned phase
(`phaseToArtifact.draft` was null). Give it a header so the card always shows its
"Verification · Round N" heading.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent-tracing): only count opaque/presentational attrs as structural noise

The first structuralNoiseRatio charged ALL markup (every <...> tag) as noise,
which over-penalized legitimately structured results 3x. Grounding against real
web-search output (`<item title="…" url="…">snippet</item>`) showed the tags and
the title=/url= attributes ARE the signal the model reads.

Now only opaque/presentational attribute names (id, class, style, data-*, aria-*,
role, on*) count as noise; semantic element tags and content-bearing attributes
(title, url, href, name…) are kept. On a 57-op user-interrupted sample this drops
web-search noise 42%→0% and overall estimated waste 16%→5%, leaving large-payload
(readDocument) and high error-rate tools as the real signal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): model-authored criteria with name/description/instruction-in-document + agent verifier

Restructure the generateVerifyPlan tool to a createDocument-style full-create flow
and wire up the agent verifier path:

- criteria now = title + description (required one-liner) + instruction (required
  detailed rubric); instruction lives in a linked document (verify_criteria.documentId),
  description is a new verify_criteria column (migration 0111). verifierConfig no
  longer holds description/instruction.
- generateVerifyPlan creates verify_criteria + a rubric, snapshots the plan onto
  the operation and confirms it; judge resolves the instruction from the document.
- agent-type checks run as verifier sub-agents (execAgent + isolated thread) whose
  onComplete hook parses a VERDICT and writes it back to verify_check_results
  (renamed AgentVerifierSpawner → VerifierAgentRunner).
- UI: custom Inspector for the tool header; check list shows per-verifier-type icons
  (llm/agent/program) + description + required/optional tag; i18n en/zh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ️ perf(verify): run program/llm/agent checks concurrently on completion

The three verifier kinds are independent; previously the agent spawn waited for
the batched LLM judge to finish. Run them via Promise.all so agent sub-agents
start immediately alongside the LLM batch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): dedicated builtin verify-agent + writeback tool, role=verify message, portal check editor

- Add `@lobechat/builtin-tool-verify` (submitVerifyResult) + builtin `verify-agent`;
  agent-type checks now run as the dedicated verify agent (not the user's agent),
  which investigates and writes its verdict back via the tool during its run.
- Verifier inherits the parent run's model/provider (builtin default may be
  unconfigured locally).
- role=verify completion message no longer requires an assistantMessageId, so the
  delivery-checker card always surfaces when a plan exists.
- Portal editor for verify checks (title/description/instruction/verifier/onFail).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(verify): restrict verify-agent to its writeback tool; fix running loader icon

Root cause of stuck `running` agent checks: the verify-agent ran in agent mode and
inherited all default tools (web-browsing, cloud-sandbox, skills, activator), so it
went off web-searching/crawling to "investigate" and never called submitVerifyResult.

- Run the verify-agent in chat mode (enableAgentMode: false, searchMode: off) — the
  strict whitelist — and whitelist `lobe-verify` for chat mode so the verifier gets
  ONLY its writeback tool.
- Sharpen the verify systemRole: judge from the provided deliverable/instruction
  (no external tools), always reach a verdict, and always call submitVerifyResult.
- CheckerDock: running check now uses the standard RingLoadingIcon (warning ring),
  matching the app's loader instead of a blue spinner.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): auto-repair loop — re-run the agent with failure feedback on failed checks

When required checks fail with onFail=auto_repair, automatically run a second
iteration instead of ending at `failed`:

- createRepairRunner: re-runs the SAME agent in the same topic with the failure
  feedback as the prompt, re-snapshots the plan onto the repair operation and
  confirms it so it re-verifies on completion (the next round). Capped at
  MAX_REPAIR_ROUNDS via parent-chain depth to prevent runaway loops.
- maybeAutoRepair: fires only once every required check has a terminal result, so
  it works for inline LLM checks (triggered from lifecycle) and async agent checks
  (triggered from the verify tool's writeback path).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): open check result detail in portal & rename artifact→result

- add a VerifyResult portal view: clicking any check row opens that result's
  detail (verdict, confidence, Toulmin sections, suggestion) on the right; agent
  checks expose their execution trace from inside the panel
- CheckerDock rows are all clickable now (chevron affordance), status shown by
  icon only; verify card uses colorBgElevated
- rename the run-result surface from "artifact" to "result" everywhere: RunArtifact
  → RunResult, phaseToArtifact → phaseToResult, and all `artifact.*` i18n keys →
  `result.*`
- ship verify namespace zh-CN / en-US locales

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): enrich check result portal — criterion stepper, richer detail view

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): rubric run-policy config + repair feedback on the verify card

Auto-repair feedback now lives on the failed round's role=verify message
(content), and the VerifyMessageProcessor surfaces it into the repair run's
context as a tagged user turn — so the repair op runs off history via a new
execAgent `suppressUserMessage` path instead of injecting a synthetic user
message. createVerifyMessage is awaited before verification to avoid a race.

maxRepairRounds becomes a rubric-level config: new `verify_rubrics.config`
jsonb column, read live at repair time via the plan's sourceRubricId. Adds a
RubricConfig portal panel (reachable from the plan card's settings affordance)
to view/edit it, wired through the verify store + TRPC.

Verify domain types/vocab/config are extracted from the DB schema into
@lobechat/types as the single source of truth; schema and consumers import
from there.

Tests: VerifyMessageProcessor dual behavior; VerifyRubricModel config
round-trip; MessageModel.findVerifyMessageByOperationId.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🗃️ refactor(verify): squash the 3 verify migrations into one

Collapse 0110 (tables) + 0111 (criteria.description) + 0112 (rubrics.config)
into a single regenerated 0110_add_verify_tables so the PR ships one clean,
idempotent migration. No schema change vs the three combined.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(cli): verify rubric run-policy config commands + shrink judging-rule editor font

CLI: `verify rubric create --max-repair-rounds`, `verify rubric view`, and
`verify rubric update` exercise the rubric config endpoints end-to-end; adds a
mocked command test. UI: judging-rule editor font 16px → 14px.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(verify): editable rubric name in the config panel + default 3 repair rounds

Add a name (title) field to the RubricConfig portal, persisted via a new
updateRubricTitle store action + service (optimistic + debounced, alongside
the config write-back). Bump DEFAULT_MAX_REPAIR_ROUNDS 2 → 3.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(verify): extract generateVerifyPlan into installable lobe-delivery-checker tool

Move the delivery-checker plan-creation flow out of the always-on lobe-agent
tool into a new standalone, installable builtin tool `lobe-delivery-checker`
(Skill Store, opt-in per agent — not loaded by default). lobe-agent no longer
ships generateVerifyPlan.

- new packages/builtin-tool-lobe-delivery-checker (manifest/types/systemRole +
  client Render/Inspector/Portal moved wholesale from lobe-agent)
- new serverRuntimes/lobeDeliveryChecker.ts (generateVerifyPlan moved out of
  lobeAgent.ts), registered alongside verifyResult
- registered installable in builtin-tools (no hidden/discoverable:false, not in
  defaultToolIds/alwaysOnToolIds/runtimeManagedToolIds); renders/inspectors/
  portals/identifiers wired; lobe-agent portal entries removed
- i18n keys moved builtins.lobe-agent.verifyPlan.* → builtins.lobe-delivery-checker.*

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(agent): add `custom` tool mode; verify agent uses it instead of chat-mode

Chat mode's contract is to strip ALL user/agent plugins (strict KB/memory/web
allow-list) — so the verify sub-agent couldn't get its writeback tool without a
leaky blanket rule. Introduce a third tool mode `custom` where the toolset is
EXACTLY the agent's declared plugins (no always-on, no defaults, no activator),
for focused builtin sub-agents.

- chatConfig.toolMode: 'agent' | 'chat' | 'custom' (overrides enableAgentMode)
- AgentToolsEngine: custom branch (defaultToolIds = plugins, rules = plugins-on,
  allowExplicitActivation only in agent mode); chatModeRules restored to strict
- verify agent → toolMode: 'custom'; lobe-verify dropped from chatModeAllowedToolIds
- test: custom mode enables exactly the declared plugin, no always-on / defaults

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-08 09:16:35 +08:00
Arvin Xu fc0daa7604 💄 style(conversation): show running indicator after a settled inline tool while generating (#15528)
 feat(conversation): show running indicator after a settled inline tool while generating

Heterogeneous agent turns render a single tool call inline (no
WorkflowCollapse chrome). Once that tool settles but the run is still
generating the next step, the inline path showed nothing below it — a
blank gap that reads as "stuck". Render the same turn-start "running"
indicator at the segment tail for this case. Multi-tool segments keep
WorkflowCollapse's own streaming header; a tool still executing is
already covered by its loading placeholder.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 02:03:50 +08:00
Arvin Xu df72bc335e 🎨 refactor(local-system): preserve ANSI escape codes in command output (#15529)
* 🎨 refactor(local-system): preserve ANSI escape codes in command output

The client now renders ANSI sequences, so stripping color codes from
shell command output is no longer needed. Drop the stripAnsi helper and
let truncateOutput keep the raw colored output intact.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(local-system): drop dangling ANSI escape and reset open SGR state before truncation notice

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 01:37:34 +08:00
Innei e855fcc0b8 ♻️ refactor(desktop): move backend URL rewrite into main process (#15304)
* ♻️ refactor(desktop): move backend URL rewrite into main process

Renderer code no longer needs `withElectronProtocolIfElectron` to rewrite
backend URLs to `lobe-backend://`. The Electron main process now diverts
backend-prefixed paths (`/trpc`, `/webapi`, `/api/auth`, `/market`) to the
remote LobeHub server in two places:

- prod: `RendererProtocolManager` (`app://` handler) delegates to
  `BackendProxyProtocolManager.proxy(request, session)` after the existing
  hostname guard.
- dev: `Browser.setupRemoteServerRequestHook` registers a
  `webRequest.onBeforeRequest` listener that redirects
  `http://localhost(:*)/<backend-prefix>...` to `lobe-backend://lobe<path>`.

`BackendProxyProtocolManager` keeps a per-session `WeakMap<Session, Context>`
and exposes `proxy(request, session)` so the same OIDC token / Vercel cookie
/ 401 debounce / `X-Auth-Required` pipeline serves both entry points.

The helper and ~35 call sites in `src/services/_url.ts` and the three tRPC
clients are removed. `ELECTRON_BE_PROTOCOL_SCHEME` stays for the main
process; new `BACKEND_PATH_PREFIXES` + `isBackendPath` predicate live in
`apps/desktop/src/main/const/protocol.ts`.

* ♻️ refactor(desktop): decouple renderer protocol from backend proxy via interceptor pipeline

`RendererProtocolManager` no longer imports `BackendProxyProtocolManager` or
`isBackendPath`. It exposes a generic `addRequestInterceptor(fn)` hook and
runs interceptors in order inside the `app://` handler — first non-null
Response short-circuits the file pipeline.

`BackendProxyProtocolManager.createAppRequestInterceptor()` owns the
"what counts as a backend path" knowledge and returns a 502 for backend
prefixes when no proxy context is wired up (must not fall through to SPA
HTML).

Wiring happens in `App.ts` after `RendererUrlManager` construction —
composition root knows both modules so neither has to know the other.

* ♻️ refactor(desktop): unify dev/prod renderer under app:// and drop lobe-backend://

Dev mode no longer uses `http://localhost:<port>` as the renderer origin; the
BrowserWindow now loads `app://renderer/` in both dev and prod. Non-backend
requests fall through to a strategy:

- prod: `StaticRendererFallback` serves the static export from `rendererDir`
  (Range support, SPA HTML fallback, 404 handling)
- dev:  `ViteRendererFallback` proxies to the electron-vite dev server via
  `net.fetch('http://localhost:5173/<path>')`; HMR WebSocket connects
  directly (configured via `server.hmr.{host,clientPort}` + `strictPort`)

`lobe-backend://` is gone — the scheme, its privileged registration, the
`session.protocol.handle('lobe-backend', ...)` call, and the dev
`webRequest.onBeforeRequest` trampoline are all removed.
`BackendProxyProtocolManager` now only stores per-session context and
exposes `createAppRequestInterceptor()` for the `app://` pipeline.

Dev userData is pinned to `<appData>/lobehub-desktop-dev` via a new
`pre-app-init.ts` that runs before `@/const/dir` captures
`app.getPath('userData')` — necessary because dev and prod now share the
`app://renderer` origin and would otherwise collide on localStorage /
cookies / IndexedDB.

Also adds `stream: true` to the `app` scheme privilege so dev media Range
requests survive forwarding.
2026-06-08 00:49:33 +08:00
Arvin Xu ee6a74ba06 🗃️ feat(db): verify delivery-checker schema + ai_providers/ai_models _id column (#15526)
🗃️ feat(db): delivery-checker schema + ai_providers/ai_models surrogate `_id`

The DB layer, split out so it merges ahead of its callers (services / TRPC /
store / UI ship in a follow-up stacked PR). One consolidated, idempotent
migration (0110_add_verify_tables_and_ai_infra_id):

- verify delivery-checker: verify_criteria / verify_rubrics (+ config) /
  verify_rubric_criteria / verify_check_results tables + verify_status /
  verify_plan / verify_plan_confirmed_at columns on agent_operations; plus the
  verify domain types/vocab/config in @lobechat/types the schema imports.
  All four verify tables carry a workspace_id FK + index (cascade on workspace
  delete), matching documents / agent_operations. verify_check_results has a
  UNIQUE (operation_id, check_item_id) index — one lifecycle row per plan item
  per run, so a retry / concurrent worker can't create conflicting duplicates.
- ai-infra (LOBE-10072): nullable `_id uuid DEFAULT gen_random_uuid()` on
  ai_providers / ai_models, written as the safe two-step form (ADD nullable,
  then SET DEFAULT) to avoid a full-table rewrite + ACCESS EXCLUSIVE lock;
  backfill + NOT NULL are later manual steps (LOBE-10073 / LOBE-10074)

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 23:03:50 +08:00
Arvin Xu 20cea3a6bf feat(page-agent): execute tools server-side via HeadlessEditor (#15023)
*  feat(page-agent): execute tools server-side via HeadlessEditor

Page-agent tools (initPage / editTitle / getPageContent / modifyNodes /
replaceText) now run on the server against a `@lobehub/editor/headless`
instance and persist through `DocumentService.updateDocument`, instead
of executing inside the renderer's Lexical instance. The renderer
applies the resulting snapshot via the builtin-tool `onAfterCall` hook,
so the document store stays in sync without an extra fetch.

This makes page-agent execution independent of the client lifecycle
(editor unmount, tab switch, network blip), gives us full server-side
tracing for free (OTel gen-ai + agent-signal + documentHistories), and
exposes a `silent-no-op` / `unexpected-mutation` invariant when the
exported editorData hash diverges from what the handler reported.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 🐛 fix(page-agent): decouple EditorRuntime from @lobehub/editor side-effecting bundle

EditorRuntime statically imported LITEXML_*_COMMAND from @lobehub/editor,
which pulls ReactSlashPlugin and crashes Node (`document is not defined`)
in any server-side test that transitively touched the runtime. The same
import also dispatched the wrong command identity on HeadlessEditor's
kernel — pnpm resolves @lobehub/editor to a different module copy than
the headless bundle, so dispatchCommand would silently no-op server-side.

Introduce a LiteXMLAdapter strategy: renderer wires command dispatch
against the live editor; server wires HeadlessEditor.applyLiteXMLBatch
/ applyLiteXML so the correct headless-bundle symbols are used.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 🐛 fix(page-agent): restore client-side mutate handlers on PageEditor mount

The main commit dropped `setBeforeMutateHandler`/`setAfterMutateHandler`
under the assumption that page-agent tools always execute server-side.
But the chat-store path (`invokeBuiltinTool` → `PageAgentExecutor.modifyNodes`
→ `EditorRuntime.modifyNodes`) still routes through the client-bound
runtime whenever the LLM dispatcher is the chat slice — it does not
consult `manifest.executors`. Without the handlers, that path mutates
the live editor but skips both `documentHistoryQueueService.enqueueEditorSnapshot`
(loses undo baseline) and `commitEditorMutation(saveSource: 'llm_call')`
(row never persists).

Re-wire both handlers. Server-runtime path is unaffected: it instantiates
its own `EditorRuntime` against `HeadlessEditor` and never sees the
client's StoreUpdater wiring, so the two paths can coexist without
double-writing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(editor-runtime): split client / server entries so renderer gets adapter for free

Renderer call sites shouldn't have to opt in to the obvious default
(dispatch LITEXML_*_COMMAND on the live editor). Split the package into
two entries:

- `@lobechat/editor-runtime` — renderer entry; constructor auto-wires
  the LiteXML adapter from `@lobehub/editor`. Static-importing this
  from Node still crashes (ReactSlashPlugin), so it's the right shape
  for the browser only.
- `@lobechat/editor-runtime/server` — server-safe entry; exports the
  bare class without touching `@lobehub/editor`. Callers (currently
  only the page-agent server runtime) supply their own HeadlessEditor-
  backed adapter.

Drops the renderer-side setLiteXMLAdapter patch and a stale comment
block in StoreUpdater.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(page-agent): drop LiteXMLAdapter, dispatch commands directly

`@lobehub/editor` 4.16.1 ships the LiteXML command identities through the
side-effect-free `@lobehub/editor/litexml-commands` subpath, so a single command
object is shared across the browser and node bundles and can be imported in Node
without pulling the DOM-dependent editor bundle.

`EditorRuntime` now imports `LITEXML_MODIFY_COMMAND` / `LITEXML_APPLY_COMMAND`
from that subpath and dispatches them straight onto the editor kernel. This
removes the `LiteXMLAdapter` strategy object (`setLiteXMLAdapter` /
`getLiteXMLAdapter`) — a leaky abstraction whose only purpose was to keep the
crash-on-Node command import out of the shared base.

- editor-runtime: dispatch `LITEXML_*_COMMAND` directly; delete the adapter
  interface, field, setter and runtime-throw guard.
- Collapse the client/server entry split (its sole reason — isolating the
  DOM-crashing import — is gone); both entries now re-export the isomorphic base.
- pageAgent server runtime: drop the HeadlessEditor-backed adapter wiring.
- Bump `@lobehub/editor` to ^4.16.1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(editor-runtime): drop redundant /server entry

Now that `EditorRuntime` is isomorphic (LiteXML commands come from the DOM-free
`@lobehub/editor/litexml-commands` subpath), the `./server` entry is byte-for-byte
identical to the root `.` entry. Remove it and point the only consumer
(pageAgent server runtime) at the root entry.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 22:33:41 +08:00
Arvin Xu 78657d496e 🐛 fix(desktop): pin electron-builder to 26.14.0 to fix broken macOS update signing (#15527)
electron-builder was floating on `^26.8.1` and the repo commits no lockfile,
so each CI build resolved a fresh version. The canary.12 build (2026-06-07)
picked up 26.15.0, which regressed macOS .app bundle signing: codesign reports
"bundle format is ambiguous (could be app or framework)" and Squirrel.Mac
rejects the update during code-signature validation, so the app never quits
to install — surfacing as "auto-update does nothing".

26.15.0 introduced the two suspect changes (mac signing rework #9822 and the
full app-builder-bin Go→TS replacement #9829). 26.14.0 predates both and does
not touch macOS app-bundle signing/layout. Pinning the exact version cascades
to app-builder-lib / dmg-builder / builder-util (electron-builder pins those
exactly), stopping the toolchain from floating across CI installs.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:20:26 +08:00
Arvin Xu 2453fc3515 🐛 fix(desktop): skip browser beforeunload guard so auto-update can quit (#15525)
On desktop the chat-loading beforeunload guard (preventLeavingFn) blocks
window.close() during quitAndInstall, so the app fails to quit & install
the update. The main process already manages close/quit via keepAlive +
isQuiting, so short-circuit the guard on desktop.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 17:42:33 +08:00
Coooolfan a28fd30719 feat: suppport sandbox provider (#15184)
*  feat(cloud-sandbox): add Onlyboxes provider support for self-hosted sandbox (#15136)

- Add `SANDBOX_PROVIDER` env var (market | onlyboxes) to select sandbox backend
- Add Onlyboxes-specific env vars: `ONLYBOXES_BASE_URL`, `ONLYBOXES_API_TOKEN`, `ONLYBOXES_LEASE_TTL_SEC`
- Create `SandboxService` abstraction layer with `MarketSandboxService` and `OnlyboxesSandboxService` implementations
- Add `createSandboxService` factory that routes to configured provider
- Migrate `execInSandbox` and `exportFile` t

*  feat(sandbox): improve Onlyboxes export flow

* 🐛 fix(sandbox): pass presigned upload headers to Onlyboxes

*  test(sandbox): import tool runtime package

* 🐛 fix(sandbox): preserve Market export errors

* 🐛 fix(sandbox): allow empty docker env defaults

* 🔒 fix: redact sandbox auth params in logs

* 🐛 fix: address sandbox provider review comments

* 🔐 feat: use onlyboxes jit tokens

* 📝 docs: clarify cloud sandbox provider config

* 🐛 fix: align cloud sandbox timeout defaults

* 🐛 fix(sandbox): lower default Onlyboxes lease TTL to 15 minutes

* 🐛 fix(sandbox): cap Onlyboxes task wait time

* ♻️ refactor: split sandbox env config
2026-06-07 12:18:39 +08:00
Arvin Xu c711279edf feat(tools): show app-fixed tools in the chat-input Pinned section (#15509)
*  feat(tools): show app-fixed tools in the chat-input Pinned section

Surface always-on, runtime-owned tools (lobe-agent + always-on infra) read-only
at the top of the Tools popover "Pinned" group, so users can see what the app
keeps active for every conversation. These have no toggle — a Pin indicator with
a hint replaces the per-tool policy menu.

- builtin-tools: add `fixedDisplayToolIds` ([lobe-agent, ...alwaysOnToolIds])
- builtin selectors: add `fixedDisplayMetaList` (reads hidden tools by id)
- useControls: render read-only fixed items, prepend to Pinned, fold into counts
- i18n: add `tools.activation.fixed.hint` + `tools.builtins.lobe-agent.*`

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(tools): make lobe-agent actually always-on; gate fixed display to runtime

The Pinned section was rendering tools that aren't enabled every turn:
- lobe-agent was only enabled when injected into plugins/runtime ids (it has no
  rule in the engine, so it defaulted to disabled) — showing it as "always on"
  was a UI lie.
- manual skill-activate mode strips manualModeExcludeToolIds (activator,
  skill-store) from the defaults, so they're off — but they still showed as fixed.

Fixes:
- Add lobe-agent to alwaysOnToolIds so its core capabilities (plan/todo, sub-agent
  dispatch, visual-media fallback) are genuinely on every agent-mode turn. Chat
  mode still drops alwaysOn entirely.
- Derive fixedDisplayToolIds from alwaysOnToolIds (single source of truth, no drift).
- Make fixedDisplayMetaList mode-aware: drop manualModeExcludeToolIds in manual mode
  so the Pinned list matches what the engine actually enables.
- Update engine tests that asserted the old "lobe-agent off by default" behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ♻️ refactor(tools): drop fixedDisplayToolIds alias, use alwaysOnToolIds directly

fixedDisplayToolIds was just `= alwaysOnToolIds`; collapse it. The selector now
reads alwaysOnToolIds directly and still applies the manual-mode exclusion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 12:10:32 +08:00
Arvin Xu e7c73bd4ce 💄 style: support show CC subagent metrics chip (#15217)
*  feat(cc): show tool count + token + model metrics on Agent inspector chip

Surface per-subagent progress on the inline Agent inspector row so users can
see how much work has happened without expanding the thread:

- Inspector chip renders `[count] tools · [tokens]` after the description
  chip, with the model name in a Tooltip. Tool count = count of `role==='tool'`
  child messages; tokens = LAST subagent assistant's `metadata.usage.totalTokens`
  (CC's per-turn `message.usage` already includes the full prior context,
  so summing would double-count the shared history — the final turn's value
  matches the main-agent message-footer convention).
- New `threadSelectors.getThreadDbMessages` reads the raw DB-shape child
  messages from `dbMessagesMap[thread_*]` (the display-bound `messagesMap`
  bucket only holds the parent + a virtual `assistantGroup`).
- `BuiltinInspectorProps` carries `toolCallId` so the chip can join to its
  subagent Thread via `metadata.sourceToolCallId`; propagated from both the
  chat Inspector caller and the DevPanel `ToolInspectorSlot`.

Adapter / executor changes so subagent token usage actually flows in:
- `claudeCode.ts` `handleSubagentAssistant` emits a
  `step_complete{phase:turn_metadata, subagent}` event when
  `raw.message.usage` is present. Subagent assistant events are not
  partial-streamed (unlike main-agent), so `message.usage` is
  authoritative — no de-stale logic needed. The subagent ctx tag lets
  the executor route the usage write onto the in-thread assistant
  instead of the main agent's, so CC's `result_usage` grand-total
  semantics aren't double-counted.
- Renderer + server `step_complete{turn_metadata}` branches check for
  `event.data.subagent` and route to the run's `currentAssistantMsgId`.
  Renderer mirrors the write into `dbMessagesMap` via `run.stream.update`
  so the chip's selector picks up usage as it lands.

Server-side finalize rolls totals onto `thread.metadata` for the
historical-view cold-load path: tool count from `lifetimeToolCallIds.size`,
tokens from the last in-thread assistant's `metadata.usage.totalTokens`,
plus `completedAt` / `duration`. Done via the existing `threadModel.update`
with an inline metadata read-merge — no new `ThreadModel.updateMetadata`
method or `threadRouter.updateThreadMetadata` endpoint introduced.

i18n: 5 keys under `chat.thread.subagentMetrics.*` in `chat.ts` + zh-CN +
en-US.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* 🐛 fix(cc): persist subagent metrics so the inspector chip survives cold-load

The metrics chip (tool count · tokens, model in tooltip) only rendered while
the run streamed — after a reload it vanished on desktop. Two gaps:

- The renderer `heterogeneousAgentExecutor.finalizeSubagentRun` never rolled
  totals onto `thread.metadata` (only the server `HeterogeneousPersistenceHandler`
  did). On cold-load the child messages aren't hydrated, so the live selector
  had nothing to read and the chip's `hasAny` went false. Added the symmetric
  rollup (`totalToolCalls` / `totalTokens` / `completedAt` / `duration`),
  re-sending the create-time `sourceToolCallId` / `subagentType` / `startedAt`
  since `updateThread` replaces the whole metadata column.
- Subagent assistant messages carried no `model`, so the tooltip's model line
  never showed. The subagent `turn_metadata` branch now writes `model` /
  `provider` onto the in-thread assistant (live tooltip) and persists `model`
  onto `thread.metadata.model` (cold-load tooltip); the chip selector falls
  back to `thread.metadata.model`.

Also fixes a latent bug both paths shared: finalize read `totalTokens` off
`currentAssistantMsgId`, which by then points at the freshly-created terminal
assistant (no usage), so it always resolved `undefined`. Now tracks the last
non-zero per-turn `totalTokens` on the run — matching the live selector's
"last turn, not a sum" convention.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(cc): derive subagent chip metrics on read, drop run-state tracking

The chip's tool-count / token / model metrics were captured incrementally on
the subagent run (`lastTurnTokens` / `subagentModel`) and denormalized onto
`thread.metadata` at finalize — in BOTH the renderer executor and the server
handler, so the rule lived in three places and the two finalize paths had to
be kept in sync by hand.

Derive them on read instead, from the child messages (the single source of
truth):

- `aggregateSubagentMetrics(messages)` (new, `src/utils`) is the one rule:
  COUNT `role='tool'`, SUM every assistant turn's `usage.totalTokens`, pin the
  model. SUM (not last-turn) matches the project's token-usage heatmap
  convention — "total tokens processed".
- The chip selector aggregates the in-memory child messages live, falling back
  to `thread.metadata.*` on cold-load.
- `threadModel.queryByTopicId` computes the SAME projection in SQL (LEFT JOIN +
  GROUP BY, reusing the `usage->totalTokens` index, with a legacy
  `metadata.usage` fallback) and folds it onto `metadata`, so cold-load reads a
  server-derived value without hydrating the child messages.

Both finalize paths drop the metadata rollup and now only flip thread status
Active; `lastTurnTokens` / `subagentModel` run-state fields are gone. Each
subagent turn still writes its `usage` + `model` onto the in-thread assistant —
those rows are what the read-time aggregation sums over.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-07 02:21:53 +08:00
Arvin Xu 28f0117932 💄 style(tool-ui): render ANSI escape codes in RunCommand output (#15516)
 feat(tool-ui): render ANSI escape codes in RunCommand output

Parse ANSI SGR sequences in shell stdout/stderr with anser and emit
styled spans for fg/bg colors, dim, bold, italic, underline, strikethrough.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-07 01:34:16 +08:00
Arvin Xu 573cc5b798 💄 style(desktop): move panel toggle into titlebar top-left (#15515)
*  feat(desktop): move panel toggle into titlebar top-left

Place a persistent collapse/expand toggle at the titlebar's top-left
corner on desktop, to the right of the macOS traffic lights. The
NavigationBar now splits into a left group (toggle) and a right group
(back / forward / clock) with space-between: expanded, the right group
hugs the sidebar's right edge; collapsed, the controls cluster at the
left edge like codex.

ToggleLeftPanelButton gains an optional `id` prop so the titlebar
instance can opt out of the shared TOGGLE_BUTTON_ID, avoiding a
duplicate DOM id and NavPanelDraggable's hover-reveal CSS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(desktop): expand untracked directories in git status

`git status --porcelain` defaults to `--untracked-files=normal`, which
collapses whole untracked directories into a single `?? path/` entry.
That trailing-slash path then flowed into `readUntrackedAsPatch` as if
it were a file — `stat()` reported `isFile()=false`, an empty patch was
returned, and the Review panel rendered "无法加载该文件的 diff" against
a directory row. Pass `-u` so git expands those directories into their
individual files; each file then produces a real synthetic patch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 💄 style(desktop): scope titlebar toggle to macOS, hide in-page toggles there

The persistent titlebar toggle now renders only on macOS; Windows/Linux
keep the original right-aligned navigation controls and their in-page
toggles.

On macOS desktop, ToggleLeftPanelButton instances hide themselves (the
titlebar owns the control) unless `forceVisible` is set, removing the
now-redundant sidebar-header and content-header toggles. NavHeader also
skips rendering its empty toggle-only bar in this case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:42:57 +08:00
Rdmclin2 7b54edc665 🐛 fix(database): scope ai-infra upsert conflict targets to workspace (precursor for 0110) (#15507)
🐛 fix(database): scope ai-infra upsert conflict targets to personal partial index

The 0110 migration replaces the (id, user_id) / (id, provider_id, user_id)
primary keys with partial unique indexes (WHERE workspace_id IS NULL). A bare
ON CONFLICT target can no longer infer a partial index, so add
`targetWhere: isNull(workspaceId)` (and `where` for onConflictDoNothing) to
every personal-scope upsert. Keeps existing provider/model toggling, ordering
and batch upserts working after the migration.
2026-06-07 00:40:08 +08:00
Arvin Xu b6ae130c97 feat(agent): auto-scan project workspace (skills + AGENTS.md) for server agents (#15512)
*  feat(agent): auto-scan project workspace (skills + AGENTS.md) for server agents

When a server agent runs against a bound project directory, scan it server-side
at run start for project skills (.agents/skills + .claude/skills) and root
AGENTS.md/CLAUDE.md, cache the result on devices.workingDirs[].workspace (1h TTL),
surface skills in <available_skills>, and inject instructions into the system role.
Replaces the desktop-only client pre-scan so it works for any run initiator.

- Generic device RPC channel (invokeRpc / rpc_request) for server-internal device
  methods, separate from the LLM-facing tool-call path
- New desktop WorkspaceCtr owns project-skill / workspace scanning

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent): preserve workspace-init cache on device cwd save

device.updateDevice validates workingDirs as { path, repoType } only, so zod
strips the server-written workspace / workspaceScannedAt cache — an ordinary cwd
pick wiped the 1h workspace-init cache (and web reuse), forcing every later run
to rescan. The cache is server-owned, so re-attach it by path from the stored
row instead of trusting the client to round-trip it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:26:48 +08:00
Arvin Xu 5b5794baa4 ♻️ refactor(server): rename deviceProxy → deviceGateway (#15513)
Pure mechanical rename of the server device-relay module/class/singleton
(deviceProxy → deviceGateway, file included) to match the underlying
GatewayHttpClient naming. No behavior change. Split out of the workspace-init
feature PR (lobehub/lobehub#15512) to keep that diff reviewable.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:07:30 +08:00
Arvin Xu 04700bed52 feat(agent-runtime): server callSubAgent async suspend/resume (#15481)
*  feat(agent-runtime): add waiting_for_async_tool parked state for deferred tools

Add a dedicated `waiting_for_async_tool` operation status that mirrors
`waiting_for_human` as a non-terminal, resumable pause, and migrate the
client-tool execution pause off `interrupted` onto it — so `interrupted`
once again means only user-initiated cancellation.

Also add the AgentOperationModel primitives the upcoming server sub-agent
bridge needs: queryByParentOperationId (reconcile child ops) and
tryResumeFromAsyncTool (atomic single-fire CAS).

Foundation for the server sub-agent suspend/resume mechanism (LOBE-9763).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(agent-runtime): extract isParkedStatus / isBlockedStatus predicates

Replace the repeated `status === 'waiting_for_human' || ... === 'waiting_for_async_tool' || ... === 'interrupted'`
chains with named predicates so the parked/blocked semantics live in one place
(runtime step-loop break, completion lifecycle completedAt, executeSync pause,
operation isActive).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(aiAgent): rename execSubAgentTask -> execSubAgent

Full rename of the service method, its `ExecSubAgentTaskParams`/`ExecSubAgentTaskResult`
types, the tRPC endpoint, the injected `RuntimeExecutorContext`/`AgentRuntimeServiceOptions`
callback, and tests. Group-mode `execGroupSubAgent*` identifiers are intentionally left
untouched. Prep for the server sub-agent suspend/resume work (LOBE-9763).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Revert "♻️ refactor(aiAgent): rename execSubAgentTask -> execSubAgent"

This reverts commit f1ea407d74.

*  feat(agent-runtime): add deferred-tool park infrastructure

Introduce a generic `deferred` result flag (BuiltinServerRuntimeOutput /
ToolExecutionResult). When a tool returns deferred, call_tool parks the
operation (waiting_for_async_tool + pendingToolsCalling) without writing a
tool_result — mirroring the client-tool pause — so the result can be
delivered out-of-band later by a completion bridge. Thread the existing
execSubAgentTask DI seam into ToolExecutionContext so async tools can spawn
a child op without a circular import.

Part of the server sub-agent suspend/resume mechanism (LOBE-9763).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

*  feat(agent-runtime): park call_tools_batch on deferred tools

Mirror the call_tool deferred-park on the parallel path: deferred (async)
tools are collected during the concurrent batch and, once server tools
settle, the operation parks (waiting_for_async_tool + pendingToolsCalling)
alongside any client tools — so K parallel sub-agents in one round all
resolve before the parent resumes.

Part of the server sub-agent suspend/resume mechanism (LOBE-9763).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

*  feat(agent-runtime): server callSubAgent async suspend/resume bridge

Turn the server `callSubAgent` path from fire-and-forget into a real
deferred-tool suspend/resume loop (LOBE-9763 Phase 2):

- lobeAgent server runtime: add `callSubAgent` executor returning a
  `deferred` result via an injected `ctx.subAgent` runner
- RuntimeExecutors: build a per-tool-call server sub-agent runner that
  creates the pending placeholder tool message (anchoring the isolation
  thread) and kicks off the child op
- aiAgent.execSubAgentTask: register an onComplete bridge hook that
  backfills the placeholder and resumes the parent
- AgentRuntimeService: `tryResumeParentFromAsyncTool` (barrier over
  pendingToolsCalling + single-fire CAS + schedule), `refreshMessagesFromDB`,
  and the `resumeAsyncTool` branch in executeStep
- queue/local: forward `payload` to the execution callback so local/in-memory
  resumes (and human-approval) no longer drop their signal

Tests: callSubAgent executor unit tests, tryResumeParentFromAsyncTool
barrier/CAS unit tests, and a server suspend/resume integration test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent-runtime): keep hooks across waiting_for_async_tool park

The async sub-agent resume reuses the SAME operationId, but dispatchHooks
fired onComplete and unregistered all hooks on every non-continue step —
including the waiting_for_async_tool park. That made completion consumers
(webhooks, bot promises, eval snapshots) fire prematurely on the park and
miss the real terminal state after resume.

For waiting_for_async_tool, persist the parked status (the resume CAS reads
it) but skip onComplete and keep hooks registered, so the eventual resume
under the same op still notifies consumers. waiting_for_human is unchanged
(its resume runs under a new operationId).

Found via the server-subagent agent-eval (real LLM, in-memory runtime):
parent now correctly reaches `done` after the sub-op completes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent-runtime): unwrap QStash body.payload in runStep handler

QStashQueueServiceImpl nests resume/intervention fields under `body.payload`
(operationId/stepIndex/context stay top-level), but the runStep handler
destructured them from the top level. In production/QStash the resumed step
therefore saw `resumeAsyncTool` (and approvedToolCall/toolMessageId/…) as
undefined and never ran the waiting_for_async_tool DB-refresh/clear-pending
branch — the parent op would stay parked forever. The local queue spreads
payload itself, which masked this in local/eval runs.

Merge `body.payload` over the top-level body so both shapes work. Adds a
handler test asserting the QStash-nested payload reaches executeStep.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent-runtime): unpark parent when callSubAgent fails to start

When a server callSubAgent child op fails to start, no completion bridge
ever fires, so the parent stayed parked in `waiting_for_async_tool`
forever. The runner now drops the placeholder and signals `started:false`
so callSubAgent surfaces an inline tool error instead of parking the
parent — the batch continues (or parks only for genuinely-deferred
siblings, whose barrier already counts this error result).

Also:
- add isParkedStatus/isBlockedStatus to the @lobechat/agent-runtime test
  mock — persistCompletion/getOperationStatus call isParkedStatus, so the
  missing export crashed dispatchHooks (swallowing onComplete) and
  getOperationStatus, failing 3 AgentRuntimeService tests.
- fix completion-bridge totalToolCalls path (finalState.session.toolCalls
  → finalState.usage.tools.totalCalls; the former never existed).
- remove dead AgentOperationModel.queryByParentOperationId (zero callers).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 22:46:10 +08:00
Arvin Xu ad87e43b2e feat(agent-tracing): tool-result feedback quality analysis (tq command) (#15508)
*  feat(agent-tracing): add tool-result feedback quality analysis (tq command)

Adds a shared, no-LLM analyzer that scores how "clean / LLM-friendly" the
environment feedback (tool return content) is, plus an `agent-tracing tq`
CLI command to preview it over a snapshot corpus.

- src/analysis/toolFeedback.ts: pure analysis lib (reusable core) — per
  tool-result metrics (tokens, self-redundancy, structural-noise ratio,
  error flag/size, format) + op-level and corpus-level rollups.
- src/cli/tool-quality.ts: `tq` (alias `tool-quality`) — token-size
  histogram, dirty leaderboard ranked by token-weighted waste, single-op
  drill-down, and --json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent-tracing): guard against undefined histogram bucket in buildCorpusReport

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 18:31:06 +08:00
Arvin Xu 32c293f8c0 feat(claude-code): add per-question custom input to askUserQuestion (#15506)
*  feat(claude-code): add per-question custom input to askUserQuestion

Let users write their own answer as the trailing item in each question's
option list, beside picking a numbered choice. Single-select treats the two
as mutually exclusive; multi-select appends the custom text as an extra
entry. Merged into the question's answer at submit, so the bridge formatter
and completed Render need no changes. Draft round-trips via a __custom__:
prefix on the existing askUserDraft map.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(claude-code): split askUserQuestion form & drop draft key prefix

Break the single ~530-line AskUserQuestion.tsx into a folder:
- draft.ts        pure helpers (read/buildSubmitPayload/isQuestionAnswered)
- useAskUserForm.ts  all state + handlers + draft persistence
- OptionCard.tsx / QuestionPanel.tsx  presentational pieces
- index.tsx       thin view

Also drop the `__custom__:<question>` draft-key prefix: persist the draft as
a typed object { picks, custom, escapeText, escapeActive } instead of a flat
string-keyed map. The picks/custom split now lives in named fields, so the
only sentinel left is `__freeform__` — and only in the submit payload, which
is the actual bridge contract. No behaviour change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(claude-code): make AskUserDraft assignable to setInterventionDraft

`setInterventionDraft` takes `Record<string, unknown>`; an `interface` isn't
assignable to it (open to declaration merging, so no implicit index
signature). Switch `AskUserDraft` to a `type` alias, which is closed and
satisfies the index signature. Fixes the tsgo TS2345 in CI.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 17:30:19 +08:00
LiJian 6f5a633c9f feat(connector): Connectors system — API-level tool permissions with plugin fallback (#15463)
*  feat(connector): add ConnectorModel, ConnectorToolModel, tRPC router, and inferCrudType util (LOBE-9984, LOBE-9985)

- packages/database/src/models/connector.ts: ConnectorModel with create/delete/query/queryByIdentifiers/findById/update/updateStatus
- packages/database/src/models/connectorTool.ts: ConnectorToolModel with upsertMany (preserves user permission on sync), updatePermission, queryByConnector, queryByConnectorIds
- src/libs/mcp/utils.ts: inferCrudType() — name-based CRUD type inference (delete > update > read > write)
- src/server/routers/lambda/connector.ts: tRPC router with list/create/update/delete/syncTools/updateToolPermission
- src/server/routers/lambda/index.ts: register connectorRouter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): runtime integration — connector-first tool resolution with plugin fallback (LOBE-9986)

- src/libs/mcp/buildConnectorManifests.ts: converts user_connector_tools rows into LobeToolManifest entries; maps permission → humanIntervention ('needs_approval' → 'required', 'disabled' → excluded)
- src/server/services/aiAgent/index.ts:
  - queryByIdentifiers(agentPlugins) to find matching connectors first
  - filter installedPlugins to exclude connector-covered identifiers
  - inject connectorManifests as additionalManifests into createServerAgentToolsEngine
  - add connector stdio tools to client executor map

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): add connector Zustand store slice (LOBE-9987)

- src/store/tool/slices/connector/: new slice with ConnectorState, ConnectorAction, connectorSelectors
  - fetchConnectors, createConnector, deleteConnector, syncConnectorTools, disconnectConnector
  - updateToolPermission with optimistic update + rollback
  - connectorToolsGrouped selector splits tools into read / write groups
- Wired into ToolStore (initialState + store.ts)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): add Connectors UI feature — list, detail, tool permission editor (LOBE-9988)

- src/features/Connectors/: new feature with two-panel layout (list + detail)
  - ConnectorList: groups connectors by Connected / Not connected, Add button
  - ConnectorDetail: sync button, disconnect, tool permission groups (read/write)
  - ToolPermissionGroup: collapsible with batch set (auto/approval/disable all)
  - ToolPermissionRow: three-state toggle auto(✓) / needs_approval() / disabled(🚫)
  - AddConnectorModal: name + MCP URL input via @lobehub/ui/base-ui Modal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): add Connectors tab to Agent customization panel (LOBE-9989)

- src/store/global/initialState.ts: add ChatSettingsTabs.Connector = 'connector'
- src/features/AgentSetting/AgentCategory/useCategory.tsx: add Connectors tab with LinkIcon
- src/features/AgentSetting/AgentConnectors/: new component listing user connectors with toggle
  - toggle calls toggleAgentPlugin(connector.identifier) — reuses agents.plugins[] field
  - shows per-connector tool count
- src/features/AgentSetting/AgentSettingsContent.tsx: render AgentConnectors for Connector tab

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): wire Connectors feature to /settings/connector route

- src/store/global/initialState.ts: add SettingsTabs.Connector = 'connector'
- src/routes/(main)/settings/hooks/useCategory.tsx: add Connectors item (LinkIcon) after Skills in AI config group
- src/routes/(main)/settings/features/componentMap.ts: map SettingsTabs.Connector → '../connector'
- src/routes/(main)/settings/features/SettingsContent.tsx: render Connector tab full-width (no SettingContainer), same as Provider
- src/routes/(main)/settings/connector/index.tsx: route page rendering the Connectors feature

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): use cssVar.property syntax in createStaticStyles (not function call)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): refactor /settings/skill to unified master-detail tool manager

## Backend
- connector.ts: add syncBuiltinTool — bootstraps user_connectors from builtin manifest api[]
- connector.ts: add syncPluginTools — bootstraps user_connectors from user_installed_plugins manifest
- connector.ts: upsertConnectorEntry helper + resolveDefaultPermission (maps humanIntervention → permission)
- connectorTool.ts: SyncToolInput.defaultPermission — per-tool default for new rows, existing rows preserved

## Store
- connector/selectors.ts: add connectorByIdentifier, connectorToolsGroupedByIdentifier, isSyncingByIdentifier
- connector/action.ts: add syncBuiltinTool, syncPluginTools (idempotent — safe to call on panel open)

## /settings/skill refactor
- index.tsx: two-panel master-detail layout (left: 300px skill list, right: detail + permissions)
- SkillList: add onSelect + selectedIdentifier props, pass through to builtin/mcp items
- BuiltinSkillItem: add onSelect + isSelected (selection highlight, click triggers right panel)
- McpSkillItem: add onSelect + isSelected
- SkillDetail (new): auto-syncs connector entry on mount, then renders ConnectorDetail permission editor
- SettingsContent: Skill tab now renders full-width (same as Provider/Connector)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(skill): createStaticStyles returns static object, not a hook

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(skill): wire onSelect to all skill item types — LobehubSkillItem, KlavisSkillItem + error handling in SkillDetail

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): use createStaticStyles correctly — static object, not hook; use string concat instead of cx()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): whole row clickable in list mode, hide action buttons when onSelect provided

All 5 item types (Builtin/Mcp/Lobehub/Klavis/AgentSkill):
- When onSelect is provided (list mode): entire row is clickable, action buttons hidden
- When onSelect is not provided (other usages): original behavior preserved
- Added onSelect/isSelected to AgentSkillItem + wired in SkillList for all agent skill types
- SkillDetail: show friendly message instead of error when skill has no tool permissions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): route sync action by sourceType; improve no-tools skill UI

ConnectorDetail:
- builtin → Reset (syncBuiltinTool from local manifest, resets permissions to defaults)
- marketplace → Refresh (syncPluginTools from installed plugin manifest)
- custom MCP → Sync (syncTools via remote MCP server, existing behavior)
- Hide Disconnect button for builtin/marketplace (only MCP connectors can disconnect)
- Show 'No tool permissions' message when connector has 0 tools
- Fix hooks-rules violation: move useCallback before early return

SkillDetail:
- Catch sync failure cleanly — shows graceful 'no tool permissions' panel
- Show skill identifier as title even when no tools available

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): inline AgentSkillDetail for agent skills; clean ConnectorDetail layout

SkillDetail:
- Add 'agent-skill' ToolDetailType — renders AgentSkillDetail inline (no modal, no connector sync)
- All hooks called before conditional returns (fixes rules-of-hooks)

SkillList:
- Pass type='agent-skill' for market/user agent skills (UUID identifiers, not plugin identifiers)

ConnectorDetail:
- Remove 'Tool permissions / Choose when AI...' subheader — tool groups render directly
- Cleaner layout: name → sync/disconnect buttons → tool groups

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): description in ConnectorDetail header + builtin-skill detail panel

Backend (connector.ts):
- syncBuiltinTool: store manifest meta.description + meta.avatar in connector.metadata
- syncPluginTools: same for plugin manifest meta
- upsertConnectorEntry: always update metadata on re-sync (keeps description fresh)

ConnectorDetail:
- Show connector.metadata.description below name in header

SkillDetail:
- Add 'builtin-skill' ToolDetailType for builtinSkills (Artifacts, Task, AgentBrowser)
  → Shows avatar + name + description panel; no connector sync needed (prompt-based)
- Add 'builtin-skill' type: reads from store builtinSkills array by identifier

SkillList:
- builtinAgent items → pass type='builtin-skill' (not 'builtin') to SkillDetail

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): fix crudType for camelCase, show skill content, compact items + categorized groups

inferCrudType (utils.ts):
- Fix: use prefix ^ anchoring instead of \b word boundary
- getReactions/listPins/searchMessages now correctly → 'read' (not 'write')
- \b fails on camelCase: 'getreactions' has no boundary after 'get' (both \w chars)

SkillDetail:
- builtin-skill type: render builtinSkill.content via <Markdown variant='chat'>
- Artifacts/Task/LobeHub skills now show their full markdown content in right panel

style.ts:
- Compact skill items: icon 48→36px, padding-block 12→6px

SkillList:
- Remove old flat renderIntegrations() + Divider
- Add categorized sections with headers:
  LobeHub 内置 Tools | 内置 Skill | 社区 Skill | 社区 Tools | 自定义
- Add sectionHeader style

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): collapsible sections, compact items matching reference design

style.ts:
- icon: 28→24px, no background (reference style: plain icon, no container bg)
- padding-block: 4→3px, font-size: 13px
- sectionHeader: collapsible with hover state

SkillList:
- Sections are collapsible — click header to toggle
- ChevronDown/ChevronRight icons on section headers
- All renderSection calls now pass a unique key

All item components (Builtin/Mcp/Lobehub/Klavis/AgentSkill):
- gap: 16→8px (tighter horizontal spacing)
- avatar/icon: 32→22px (matches reference ~24px icon)
- In list mode (onSelect): tag moves to RIGHT side of row
- In list mode: remove tag from title area, status text below title

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): default select first item; + button opens Add custom connector modal

index.tsx:
- Auto-select first installed builtin tool (or first builtin skill) on page load
- + button → opens AddConnectorModal (add custom MCP connector)
- 技能商店 button → still opens skill store (unchanged)

AddConnectorModal:
- Add Advanced settings section (collapsible chevron)
- OAuth Client ID field → stored in oidcConfig.clientId
- OAuth Client Secret field (UI only, encryption path TBD)
- Clear all fields on cancel/submit

Connectors/index.ts: export AddConnectorModal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): reference-quality UI polish + Connectors/Skills tab switcher

Style polish (matching linear-tool-permissions demo):
- style.ts: icon 20px, padding-block 6px, font-size 14px (no bold)
- All item avatars: 16px
- ToolPermissionRow: py-10px px-12px, font-mono tool names, 15px icons, hover bg
- ToolPermissionGroup: rounded badge for count, outline 'Custom ▾' batch button
- ConnectorDetail: restore 'Tool permissions' h3 + subtitle

Connectors/Skills tab switcher:
- Top of left panel: Connectors tab | Skills tab
- Connectors: builtin tools + OAuth connectors + community/custom MCPs
- Skills: builtin agent skills + community/user agent skills
- Switching tabs resets selection and auto-selects first item in new view
- + button only shown in Connectors view

SkillList: add viewMode='connector'|'skill' prop with filtered section display

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(skill): active permission state + Lobehub OAuth skill tools sync

ToolPermissionRow:
- btnActive: use primary color + primaryBg background (clearly visible selected state)

connector router:
- Add syncToolsFromClient: accepts client-provided tool list for skills that already
  have their tool list fetched (Lobehub OAuth skills, etc.)

Store action:
- Add syncToolsFromClient action

SkillDetail:
- Add 'lobehub-connector' ToolDetailType
- For lobehub-connector: reads server.tools from lobehubSkillStore (already populated
  after OAuth connect) and syncs via syncToolsFromClient — no remote MCP call needed

SkillList:
- Pass type='lobehub-connector' for Lobehub OAuth items (was 'plugin', wrong path)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* ♻️ refactor(connector): replace 'Tool permissions' header with connector description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): show disabled tools in settings UI (only filter at runtime)

connectorToolsGrouped: remove permission !== disabled filter — all tools should
be visible in ConnectorDetail so users can re-enable them. Disabled filtering
already happens at runtime in buildConnectorManifests and queryByConnectorIds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(skill): section lowercase, 4-group tools, remove tags in list mode

SkillList: remove text-transform: uppercase from sectionHeader
ConnectorDetail: split tools into 4 groups — Read / Create / Update / Delete
  (maps to crudType: read / write / update / delete)
connectorToolsGrouped selector: return { readTools, createTools, updateTools, deleteTools }
All item components: remove SkillSourceTag in list mode (onSelect provided)
  — tags are redundant when section headers already provide categorization

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): add Reset permissions button — restore all tools to auto

connector router: resetPermissions endpoint — sets all connector's tools to 'auto'
store: resetConnectorPermissions action
ConnectorDetail:
- Add 'Reset permissions' button — resets ALL tools back to auto (fully open)
- Rename 'Reset'/'Refresh' button to 'Refresh' — clarifies it syncs tool list only
- Two separate concerns: Refresh (tool list) vs Reset permissions (all → auto)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): use excluded.* in onConflictDoUpdate to ensure crudType updates + add description to tool rows

connectorTool.ts:
- Use sql`excluded.crud_type` etc. instead of table.column refs in onConflictDoUpdate
- table.column in set generates self-reference (no-op) in some Drizzle versions
- Now correctly updates crudType when Refresh is clicked (read/update/delete groups will show correctly)

ToolPermissionRow:
- Add description below tool name: 11px, tertiary color, single-line truncate with ellipsis
- Tooltip shows full description on hover (mouseEnterDelay: 0.5s)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): createStaticStyles returns static object not hook in ConnectorItem

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🗑️ chore(settings): remove /settings/connector route — Connectors are in /settings/skill

- Remove src/routes/(main)/settings/connector/index.tsx
- Remove SettingsTabs.Connector from enum and componentMap
- Remove Connectors item from settings sidebar useCategory
- Remove Connector from full-width list in SettingsContent
- Remove unused LinkIcon import from useCategory

ChatSettingsTabs.Connector (agent panel) is separate and unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): disabled tools stay in manifest with blocking description + hard-block at callTool

buildConnectorManifests:
- Disabled tools are now INCLUDED in the manifest (not excluded)
- Description replaced with: '[TOOL DISABLED] The user has disabled this tool and it cannot be executed...'
- humanIntervention: 'required' set for disabled tools so AI is explicitly warned
- AI can inform user the tool is disabled instead of silently not knowing it exists

mcp.callTool:
- Pre-call permission gate: query ConnectorModel + ConnectorToolModel by connector identifier
- If tool.permission === 'disabled': return immediately with "disabled by user" message
- MCP server is never called — the block is enforced server-side regardless of what AI attempts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): add permission gate to klavis.callTool for disabled tools

Gmail (and other Klavis-sourced connectors) use tools.klavis.callTool,
not tools.mcp.callTool, so the previous MCP permission gate didn't apply.

Fix: Add serverDatabase to klavisProcedure, extract connector identifier from
toolName prefix, query user_connector_tools, hard-block if permission=disabled.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🗑️ chore(skill): hide + button (custom MCP connector creation — OAuth flow TBD)

Remove AddConnectorModal entry point from /settings/skill header.
Custom HTTP MCP connectors require OAuth (Pre-registration / DCR) which
is not yet fully implemented. Will be re-added in a future PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): only replace plugins with connectors that have a real MCP endpoint

Root cause: Lobehub/Klavis OAuth skills are synced into user_connectors via
syncToolsFromClient with mcpServerUrl=null. buildConnectorManifests generates
mcpParams={url:''} for them. After humanIntervention approval, the runtime calls
tools.mcp.callTool({url:''}) → fails silently → empty result.

Fix: only use connectorsMcp (connectors with mcpServerUrl or stdio config) to
replace installedPlugins and build connector manifests. Connectors without a real
MCP endpoint (Lobehub/Klavis) fall back to their original plugin executor path,
preserving the Klavis callTool execution chain and fixing needs_approval flow.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

*  feat(connector): centralized tool permission enforcement across all execution paths

connectorPermissionCheck.ts (new shared utility):
- getConnectorToolPermission(): look up permission by identifier + toolName
- buildBlockedToolResponse(): standardized "disabled by user" response
- patchManifestWithPermissions(): patch manifest api[] with DB permissions

ToolExecutionService.executeTool() — centralized disabled gate:
- Queries DB at execution entry for ALL tool types (Lobehub skills, Klavis,
  MCP connectors, builtin plugins, and qstash/execAgent async path)
- Hard-blocks 'disabled' tools before any executor runs
- needs_approval handled by manifest humanIntervention (not blocked here)

aiAgent/index.ts — manifest patching for Lobehub/Klavis:
- After fetching lobehubSkillManifests + klavisManifests, query connector tools
- Patch manifests: needs_approval → humanIntervention:'required' (pauses for approval)
- Patch manifests: disabled → blocking description (AI informed, executor blocks)
- humanIntervention system already handles headless auto-reject for qstash

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): invokeBuiltinTool falls back to store lookup when payload.source is undefined

Root cause: when a tool call is re-invoked after humanIntervention approval,
the payload comes from the DB-stored message which does NOT persist the `source`
field. `internal_transformToolCalls` sets source correctly but it only runs for
LLM-generated tool calls, not for the approval re-invocation path.

Fix: in `invokeBuiltinTool`, if `payload.source` is undefined, do a live lookup
from the tool store (klavisAsLobeTools / lobehubSkillAsLobeTools) to determine
the correct executor. Applies to Klavis (Gmail, etc) and LobeHub Skills alike.

Also: remove all temporary [DEBUG] console.log statements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🔨 chore: fix TypeScript errors and test failures after canary rebase

- buildConnectorManifests: LobeToolManifest → ToolManifest (correct export name)
- connectorPermissionCheck: cast permission string to ConnectorToolPermission
- connector.ts model: guard encryptCredentials against null credentials
- ConnectorDetail: String() cast for unknown metadata.description
- AddConnectorModal: move loading to Modal.confirmLoading (correct prop)
- connector/action.ts: break circular ToolStore type reference with Pick<Impl>
- execAgent.disableTools.test.ts: mock ConnectorModel/ConnectorToolModel DB deps

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🐛 fix(connector): P1/P3 fixes + test mock coverage after code review

P1 — real MCP disabled tools now appear in manifest:
- ConnectorToolModel.queryAllByConnectorIds: new method without disabled filter
- aiAgent.ts: uses queryAllByConnectorIds for manifest building so buildConnectorManifests
  receives ALL tools (including disabled) and can emit blocking descriptions
- queryByConnectorIds (non-disabled filter) retained for runtime hot-path

P1 — Klavis gate works for hyphenated identifiers (google-calendar, etc):
- klavis.ts: replace split('_')[0] prefix hack with direct findByToolName DB lookup
- ConnectorToolModel.findByToolName: query user_connector_tools by userId + toolName

P3 — queryByConnector adds userId filter:
- Prevents leaking tool metadata to wrong user if connector UUID is known

Tests — mock ConnectorModel/ConnectorToolModel in all execAgent test files:
- execAgent.builtinRuntime.test.ts
- execAgent.deviceToolPipeline.test.ts
- execAgent.disableTools.test.ts (queryAllByConnectorIds added to mock)

TypeScript — ConnectorDetail metadata.description:
- Use typeof === 'string' type guard to narrow unknown → string for JSX render

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 🔨 fix(connector): precise Klavis permission gate + update stale disabled comments

Klavis gate — identifier + toolName (precise, no same-name collision risk):
- CallKlavisToolParams: add identifier? field
- klavisExecutor: pass identifier to callKlavisTool
- callKlavisTool store action: thread identifier through to tRPC mutate
- klavis.callTool router: accept optional identifier in input schema
- Permission gate: when identifier present, do queryByIdentifiers + queryByConnector
  + find by toolName for a precise 2-field lookup; fall back to findByToolName for
  legacy callers without identifier

Comments updated to reflect current disabled behavior:
- buildConnectorManifests.ts: disabled → injected with blocking description
- connector.ts schema: same correction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-06 12:03:56 +08:00
AmAzing- 485d664589 💬 style: rebrand platform agent copy to Connect Agent (#15498) 2026-06-06 09:55:34 +08:00
Arvin Xu b1ada9e5fc 🐛 fix(conversation): hide Usage extra for local hetero agents until model arrives (#15501)
Local CLI hetero agents (claude-code, codex) only report `model` after
turn_metadata lands mid-stream. The previous `showUsage` check used the
broad `HETEROGENEOUS_TYPE_LABELS` lookup which matches both local and
remote types, so it returned true with an empty model. Usage then fell
through to the `ModelIcon` path (Usage uses the narrower
`isRemoteHeterogeneousType` for the brand-label branch) and rendered a
lone empty-model placeholder icon under the message.

Align the gate with Usage's internal branching: only bypass `!!model`
for remote hetero (openclaw, hermes) which never expose a real model id.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 01:49:49 +08:00
Arvin Xu 5dc769d135 🐛 fix(agent-signal): attribute self-iteration run trace to reviewed agent & isolate memory runs (#15479)
Background Agent Signal runs (memory / skill / self-reflection) execute under a
builtin agent slug. Two attribution gaps caused their traces to surface in the
wrong place:

- execAgent persisted the run's user + assistant message rows under the builtin
  slug's agent id, while the operation row, isolated thread, and receipts all
  attribute to the reviewed user agent on `marker.agentId`. The trace therefore
  "hung" under the builtin reflection/skill agent. Persist messages under
  `marker.agentId` when present, falling back to the executing agent otherwise.

- The memory run only created its isolated thread when an `assistantMessageId`
  could be extracted from a `clientRuntimeComplete` source id
  (`${assistantMessageId}:completion:${parentMessageId}`). Any other source left
  it undefined, skipping thread creation so the memory-agent messages leaked
  into the active conversation. Fall back to the triggering user `messageId` so
  a child thread is still created.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 01:42:48 +08:00
Arvin Xu 64b7ab2f17 💄 style(topic): one-click collapse/expand all topic groups (#15484)
*  feat(topic): add one-click collapse/expand all groups in topic sidebar

Add a toggle button in the topic sidebar header (next to Filter and the
more-actions menu) that collapses or expands all topic groups at once.
It reuses the existing `expandTopicGroupKeys` global status, so it stays
in sync with manual per-group toggling, and hides itself when there are
fewer than two groups (e.g. flat mode).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(topic): hide group toggle in flat mode

In flat mode, groupedTopicsForSidebar falls through to time grouping so
the computed group count can exceed one, but List renders FlatMode with
no accordion for the toggle to affect. Hide the control explicitly when
topicGroupMode === 'flat' instead of relying on the group count.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 style(topic): use 2-corner minimize/maximize icons for group toggle

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 01:33:32 +08:00
Arvin Xu 9c4dadda4c feat(task-detail): replace inline comment input with ChatInput that triggers a new run (#14873)
*  feat(task-detail): split task panel comment from topic-thread reply

CommentInput in TaskActivities stays as-is on canary — avatar + EditorCanvas
+ attachment + send button, posting a plain task-level comment.

TopicChatDrawer footer becomes a FeedbackInput that calls the in-scope
ConversationProvider's sendMessage, continuing the existing topic
conversation instead of attaching a comment + restarting the run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

*  feat(task-detail): keep FeedbackInput visible while topic is running

Drop the canLeaveFeedback gate so the in-thread reply box renders even
when the topic is pending/running. ConversationStore.sendMessage already
queues messages during an in-flight stream, so this just exposes the
queue affordance to the user — letting them steer the next step
without waiting for the current run to terminate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 💄 style(task-detail): collapse FeedbackInput behind a follow-up button + add attach action

FeedbackInput now starts collapsed as a full-width "Send follow up message"
button. Click expands a ChatInput shell with EditorCanvas inside and a footer
that carries an AttachmentUploadButton on the left (+ icon) and the send
button on the right. Files are inserted inline into the editor (same
pattern as CommentInput) so they ride along on sendMessage's editorData.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 💄 style(task-detail): tighten CommentInput card & switch follow-up button to filled

- CommentInput card: padding-block 8px → 4px, editor placeholder fontSize 14px
- FeedbackInput collapsed button: default size + variant="filled" for a less
  obtrusive look that sits flush in the chat footer

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 💄 style(task-detail): drop top padding above FeedbackInput in topic drawer

Use paddingBlock="0 12px" so the follow-up button hugs the last message
instead of floating with a 12px gap above.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 🐛 fix(task-detail): clear FeedbackInput editor before awaiting sendMessage

Previously the editor cleanup ran after the awaited sendMessage call, so
the box kept the just-sent text on screen until the entire send + stream
lifecycle resolved. Move clearContent / collapse before the await so the
input feels responsive (sendMessage already snapshots markdown and
editorData for its optimistic update).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 🐛 fix(task-detail): keep FeedbackInput expanded after sending

Drop the setExpanded(false) call in handleSubmit so the ChatInput
remains open once the user has opened it. Collapsing it back to the
"Send follow up message" button right after every reply was disruptive
mid-conversation; the button only makes sense as the initial resting
state of the drawer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

*  feat(chat): add forceRuntime override to SendMessageParams

Plumb a new optional forceRuntime field through SendMessageParams →
ConversationLifecycle.sendMessage → selectRuntimeType(parentRuntime).
parentRuntime already wins over every other signal in the dispatcher,
so callers can pin a send to 'gateway' / 'client' / 'hetero' regardless
of the agent's local/cloud config.

Also propagate forceRuntime through the message queue (QueuedMessage +
MergedQueuedMessage + mergeQueuedMessages + both drain sites in the
client and hetero executors) so a follow-up queued during an in-flight
run keeps its runtime pin when it eventually fires.

FeedbackInput in TopicChatDrawer passes forceRuntime: 'gateway' so
task-topic follow-ups stay on the server-side path that runTask
originally used, even if the user's global runtime preference is local.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-06 01:26:25 +08:00
AmAzing- ab7cb07ae5 🐛 fix: type errors in oidc http-adapter test breaking CI lint (#15499) 2026-06-06 01:24:12 +08:00
Rylan Cai 596440901d 🐛 fix: auto-run required tools in headless mode (#15492) 2026-06-06 00:40:24 +08:00
YuTengjing 2b9f08a43b 🐛 fix: timeout Market connection listing (#15487) 2026-06-05 13:27:08 +08:00
YuTengjing 95a0cf1264 🐛 fix: handle runtime request errors (#15478) 2026-06-05 13:13:56 +08:00
Innei 65ba086685 🐛 fix(agent-documents): render system docs in editor (#15462)
* 🐛 fix(agent-documents): render system docs in editor

*  feat(agent-documents): autosave highlight editor with safe unmount flush

Add debounced autosave to the non-markdown highlight editor and a StrictMode-safe
unmount flush via queueMicrotask, plus a beforeunload guard against dirty buffers.

*  test: fix agent document PR type checks
2026-06-05 10:22:31 +08:00
Zhijie He 25635ddb38 feat(task): auto-ensure qstash schedule for task system (#14771)
*  feat(task): auto-ensure qstash schedule

chore: cleanup code

chore: cleanup code

chore: cleanup code

* chore: migrate qstash init workflow to startServer

chore: migrate qstash init workflow to startServer

* fix: set default QSTASH_URL to eu region, same as SDK

fix: set default QSTASH_URL to eu region, same as SDK
2026-06-05 02:07:03 +08:00
Arvin Xu f5d78d3d28 feat(device): switch device cwd handling to structured workingDirs (#15353)
Consume the `working_dirs` column: model `updateDevice`, tRPC `updateDevice`
input + `listDevices` output, and the client cwd pickers now operate on
`WorkingDirEntry[]` instead of the flat `recentCwds: string[]`.

- model / tRPC: `workingDirs` (input capped at 20, validated `{ path, repoType? }`)
- client `deviceCwd`: `nextRecentCwds` → `nextWorkingDirs`
- UI: DeviceWorkingDirectory / WorkingDirectory / DeviceDetailPanel / DeviceItem
  render the detected repo type via the shared `renderDirIcon`

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:27:03 +08:00
Hardy f7c46a30a4 feat(opencode-go) add MiniMax M3, remove deprecated models, rework model fetch logic (#15376)
* 🗑️ chore(opencode-go): remove MiMo V2 Omni and MiMo V2 Pro models

*  feat(opencode-go): fetch model list from API with models.dev enrichment

- Try API /models first for real-time available models
- Enrich with models.dev data (pricing, abilities, SDK routing)
- Fallback to models.dev + model-bank if API fails
- Dynamic Anthropic SDK routing via provider.npm field

* 💰 fix(opencode-go): update MiMo pricing to match models.dev

- mimo-v2.5: input $0.14, output $0.28, cache_read $0.0028
- mimo-v2.5-pro: input $1.74, output $3.48, cache_read $0.0145

*  feat(opencode-go): add MiniMax M3 and remove deprecated Qwen3.5 Plus

- Add minimax-m3: 512K context, vision support (image+video), 131K output,
  pricing 0.6/2.4/0.12 USD per M tokens, released 2026-05-31
- Remove qwen3.5-plus: marked deprecated in models.dev

* 🐛 fix(opencode-go): restore Anthropic routing fallback when models.dev is unreachable

Codex P2 review on #15376:
- `routers` is called with `ClientOptions` (no `client` field), so
  `options.client?.models.list?.()` silently returned `undefined` via
  optional chaining; the `catch` never ran and `modelIds` stayed `[]`.
- In API + models.dev double-failure scenarios, `getAnthropicModels([])`
  returned an empty list, regressing Anthropic SDK routing for MiniMax /
  Qwen models.

Fix:
- Make `getAnthropicModels` self-contained: takes no parameters.
- Fallback chain: models.dev → static model-bank prefix match → `[]`.
- `routers` no longer touches `options.client`.

*  feat(opencode-go): enrich model list with models.dev metadata

The model list pipeline previously forwarded only `{ id }` from the API
and models.dev, so displayName / pricing / context / modalities all came
from the static model-bank. When models.dev disagrees with model-bank
(e.g. a price update or new model), the runtime would show stale data.

Map models.dev fields into the flat shape that `processModelCard`
understands, so each card is enriched with:
  - displayName (dev.name)
  - contextWindowTokens / maxOutput (dev.limit)
  - releasedAt (dev.release_date)
  - functionCall / reasoning / vision / structuredOutput (dev.flags +
    dev.modalities.input)
  - pricing (dev.cost → flat input/output/cachedInput/writeCacheInput;
    processModelCard's formatPricing converts it to units)

Fields models.dev doesn't have (description, organization, settings
.extendParams, etc.) still fall back to the model-bank entry via
processModelCard's knownModel lookup, keeping the static config as the
source of truth for UX-only fields.

*  feat(opencode-go): drive reasoning_content handling from models.dev

The `reasoningInterleavedModels` list was hardcoded and drifted from
models.dev:
  - Missing: kimi-k2.5, kimi-k2.6, mimo-v2-omni, mimo-v2-pro
  - Stale: qwen3.7-max (no longer has `interleaved` in models.dev)

Move the source of truth into the models.dev cache. `fetchModelsDevData`
now also builds an `interleavedIds: Set<string>` from `m.interleaved.field`
alongside `anthropicModels`, so every derived field stays in sync with
a single fetch.

The new `getInterleavedModelIds` sync accessor lets `buildOpenAIPayload`
keep its sync signature; it returns the cached set when populated and
falls back to a hardcoded snapshot of the last-known models.dev state on
the very first chat request before any fetch has run.
2026-06-05 01:11:40 +08:00
Arvin Xu f77f31efc0 🔨 chore(database): re-tighten getBuiltinAgent onConflict after 0109 (#15475)
🔨 chore(database): re-tighten getBuiltinAgent onConflict to the 0109 partial index

Now that migration 0109 has flipped agents_slug_user_id_unique to a partial
index (WHERE workspace_id IS NULL) in all environments, restore the precise
conflict arbiter { target: [slug, userId], where: isNull(workspaceId) } so
unexpected unique violations surface instead of being silently swallowed by the
bare onConflictDoNothing() transition form.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:07:57 +08:00
Rylan Cai cd171d3510 🐛 fix: bypass audits for headless tool calls (#15406)
* 🐛 fix: bypass audits for headless tool calls

* 🐛 fix: block high-risk headless tools at execution

* Revert "🐛 fix: block high-risk headless tools at execution"

This reverts commit 1d4b534e7a36757bfea0ab229b45a7da647898a3.

* 🐛 fix: restore headless audit bypass

* 🐛 fix: resolve headless blocked tools

* 🐛 fix: simplify blocked tool results

* 🧹 chore: remove unrelated prompt diff

* 🐛 fix: narrow blocked tool instruction type

* 🐛 fix: split security blacklist policies

* 🐛 fix: simplify security blacklist policy rules

* 💄 style: tighten security blacklist diff

* 💄 style: reduce agent config doc diff

* 💄 style: tighten headless audit diff

* 💄 style: minimize audit policy diff

* 💄 style: clarify global audit match naming

* 🐛 fix: auto-run required global audits in headless

* 💄 style: clarify headless intervention comments

* 💄 style: clarify headless global audit comment

* 💄 style: use blocked tool instruction type

* 💄 style: clarify headless audit tests

* 💄 style: annotate headless blocked tool tests

* 🐛 fix: type security blacklist policy filter

* 💄 style: clarify local system 403 guidance

* 🐛 fix: use current persist error helper
2026-06-04 23:42:21 +08:00
YuTengjing b7e2663079 ♻️ refactor: expose email harmony options slot (#15477) 2026-06-04 23:06:14 +08:00
René Wang 537c39f771 💄 style(chat-input): rework Plus menu with toggle switches and grouped submenus (#15433) 2026-06-04 21:24:28 +08:00
Arvin Xu ed47d9ece5 🗃️ build(database): migrate unique constraints to workspace scope (#15472)
* 🗃️ db(database): migrate unique constraints to workspace scope (migration 0109)

Replace the legacy user-scoped UNIQUE constraints with workspace-scoped
partial unique indexes across agents, agent evals, agent skills,
documents, sessions, tasks, and rbac roles/user-roles. Adds migration
0109_migrate_unique_constraints and updates the affected schemas.

* 🐛 fix(database): match partial unique index in getBuiltinAgent upsert

Migration 0109 turned `agents_slug_user_id_unique` into a partial index
(WHERE workspace_id IS NULL). A plain `ON CONFLICT (slug, user_id)` no longer
matches it (Postgres 42P10), breaking getBuiltinAgent. Add the same predicate
via onConflictDoNothing's `where` option; builtin agents are always
workspace-less so the predicate always holds.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🔨 chore(database): use bare onConflictDoNothing in getBuiltinAgent for 0109 transition

Index-shape-agnostic upsert so the builtin-agent path works whether
agents_slug_user_id_unique is the legacy full unique or the 0109 partial,
removing the deploy-ordering coupling. Re-tighten to { target, where } in a
follow-up once 0109 has flipped the index everywhere.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 21:08:36 +08:00
Arvin Xu 2bb39f470a feat(gateway): add explicit type discriminator to tunneled tool calls (#15473)
*  feat(gateway): add explicit type discriminator to tunneled tool calls

The device-gateway relays builtin local-system calls and tunneled stdio MCP
calls over one `tool-call` channel. The device was meant to tell them apart by
sniffing whether `toolCall.params` exists — fragile: any future builtin tool
that grows a `params` field would be misrouted to the MCP client.

Add an explicit `toolCall.type` discriminator (`'builtin' | 'mcp'`). The HTTP
client stamps it: `executeToolCall` → `'builtin'`, `executeMcpCall` → `'mcp'`.
The device routes on `type`, never on payload shape. Optional + back-compatible:
an older server that omits it is treated as `'builtin'`.

The desktop receiver switches to this discriminator in a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

*  feat(desktop): execute tunneled stdio MCP calls from the gateway (#15470)

Receiving half of the gateway stdio-MCP work. When the cloud server tunnels a
stdio MCP tool call to this device (a `tool_call_request` carrying
`mcpParams`), run it locally instead of falling through to the builtin
local-system tool switch (which keys on apiName and has no MCP context, so it
rejected these as "not available on this device").

- `gatewayConnectionSrv`: add a dedicated `mcpCallHandler` + `setMcpCallHandler`;
  `handleToolCallRequest` routes on the presence of `toolCall.mcpParams`,
  sharing the existing response-envelope path.
- `GatewayConnectionCtr`: wire `setMcpCallHandler` → `executeMcpCall`, which
  maps the wire payload to `McpCtr.runStdioMcpTool`.
- `McpCtr`: extract `runStdioMcpTool` core from the `callTool` IPC method so
  both the renderer and the gateway tunnel share one stdio execution path
  (no SuperJSON round-trip for the in-process caller).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 21:04:09 +08:00
Johnny 92ec067718 fix: prefer INTERNAL_APP_URL for ComfyUI server calls (#15387)
🐛 fix: prefer internal app url for comfyui calls
2026-06-04 19:37:39 +08:00
Arvin Xu 8f19fde3e7 🗃️ build(database): add workspace_id indexes (#15468)
* 🗃️ db(database): add workspace_id indexes (migration 0108)

Phase 3 of the workspace DB migration (LOBE-9961). Adds a btree index on
workspace_id to 70 tenant tables, plus 7 workspace-scoped partial unique
indexes (WHERE workspace_id IS NOT NULL) that pre-build the "new" side of the
Phase 4 (0109) unique-constraint cutover.

A separate production-safe runbook (0108_concurrent.sql, CREATE INDEX
CONCURRENTLY, ordered smallest->largest) is intentionally NOT committed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🗃️ db(database): make 0108 index migration idempotent

Add IF NOT EXISTS to all 70 CREATE INDEX + 7 CREATE UNIQUE INDEX statements,
per the db-migrations standard flow (defensive/idempotent SQL), matching how
0107 used DROP CONSTRAINT IF EXISTS. Safe to re-run and safe if the concurrent
runbook already built the indexes before the auto-migrator reaches 0108.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 19:03:00 +08:00
Arvin Xu f35f984268 feat(gateway): tunnel stdio MCP tool calls to the device (#15469)
Stdio MCP servers live on the user's machine, but in gateway (cloud) mode
the agent runs server-side and `executeMCPTool` tried to spawn the stdio
binary on the cloud server — which has neither the binary nor access to the
user's machine, so local MCP tools (e.g. tasks calling a local kimi-datasource
MCP) always failed.

Add a dedicated `executeMcpCall` path that forwards the stdio connection
params (command/args/env) to a connected device, which spawns the MCP server
and runs the call locally. It rides the existing `/api/device/tool-call`
relay — the gateway forwards `toolCall` opaquely — so the device-gateway
worker needs no changes; the device routes on the presence of
`toolCall.mcpParams`.

Server-side only: when no device is connected, behavior is unchanged
(standalone Electron still spawns in-process). The desktop-side receiver that
runs the forwarded call lands in a follow-up.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 19:02:11 +08:00
YuTengjing b9fbad7f02 ♻️ refactor(ai-chat): remove simple turn fast path (#15471) 2026-06-04 17:58:57 +08:00
YuTengjing e165b6424b 📝 docs: clarify drizzle raw sql guidance (#15467) 2026-06-04 17:00:42 +08:00
YuTengjing bab3ff4a7a 🐛 fix: reduce agent document context latency (#15436) 2026-06-04 16:23:51 +08:00
Arvin Xu 1e2c1aacd5 🗃️ build(database): add workspace_id FK constraints (#15465)
* 🗃️ db(database): add workspace_id FK constraints (migration 0107)

Phase 2 of workspace_id rollout: add the FK constraint on the 70 tables
that gained a bare `workspace_id` column in Phase 1 (0106), referencing
workspaces(id) ON DELETE CASCADE.

- schema: add `.references(() => workspaces.id, { onDelete: 'cascade' })`
  to all 70 nullable workspace_id columns
- 0107_add_workspace_id_fk.sql: idempotent drizzle migration
  (DROP CONSTRAINT IF EXISTS + ADD), runs in CI / dev / self-host
- 0107_concurrent.sql: production-safe out-of-band runbook
  (NOT VALID + VALIDATE) to avoid write-blocking locks on large tables;
  NOT run by drizzle

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🔥 db(database): remove stray 0107_concurrent migration file

* 🐛 fix(database): break user/workspace schema circular dependency

Move userInstalledPlugins from user.ts into connector.ts to break the
user.ts <-> workspace.ts import cycle flagged by dpdm. connector.ts
already imports both users and workspaces, and consumers import the
table from the schemas barrel, so no call sites change.

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 16:23:14 +08:00
Arvin Xu 475f391d97 ♻️ refactor(message): prefer dedicated usage column over metadata.usage (#15457)
* ♻️ refactor(message): prefer dedicated usage column over metadata.usage

Token usage was promoted out of metadata.usage into a dedicated messages.usage
column, but nothing populated it and all reads still went through metadata.usage.

- Centralize write-side promotion in the DB model (update / updateMetadata /
  create), so all executor callers populate the usage column from a top-level
  usage payload, falling back to metadata.usage. metadata.usage stays dual-written
  for backward-compatible reads.
- Reads prefer the usage column and fall back to metadata.usage: message queries,
  getTokenHeatmaps, recomputeTopicUsage, the usage record service, and context
  token accounting.
- Add top-level usage to UpdateMessageParams + DBMessageItem types.
- Mark metadata.usage and the legacy flat token fields as @deprecated, pointing
  to the top-level usage field.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(message): dual-write metadata.usage for top-level usage updates

When a caller passed the new top-level `usage` param without also sending
`metadata.usage`, the update wrote only `messages.usage` and left
`metadata.usage` stale/absent — legacy readers and rollback paths still consume
it during the dual-write transition. Fold the resolved usage into the metadata
patch so `metadata.usage` stays in sync regardless of how usage was passed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 14:14:11 +08:00
Arvin Xu 133675adda 🗃️ db(database): add workspace_id columns to existing tables (#15446)
* 🗃️ feat(database): add workspace_id columns to existing tables

Add a nullable `workspace_id text` column to user-owned business tables
(agents, sessions, topics, messages, files, tasks, RAG/eval, RBAC, devices,
connectors, etc.) so records can later be scoped to a workspace. Workspace
tables themselves already landed on canary via 0105_add_usage_agent_share_workspace.

Also folds in the additive device schema from #15356: the structured
`working_dirs` jsonb column + `WorkingDirEntry` type (recent_cwds kept,
now @deprecated).

Scope is deliberately column-only — the lowest-risk slice:
- migration 0106 is pure `ADD COLUMN IF NOT EXISTS` (metadata-only, ~ms locks
  per table, online-safe, no app code change since columns are all NULL).
- FKs, btree indexes, and the per-user→workspace-scoped unique-constraint
  conversions are intentionally deferred to follow-up PRs so each can use the
  production-safe execution path Drizzle can't express (NOT VALID + VALIDATE,
  CREATE INDEX CONCURRENTLY, atomic unique swap).

Scoping notes:
- devices / user_connectors / user_connector_tools: scoped (user-owned resources).
- push_tokens: left user/device-level — an Expo token is one per app install and
  receives a person's notifications across all their workspaces.
- agent_shares: no workspace_id — scoped transitively via agent_id → agents.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(database): satisfy inferred row types after adding workspace_id

Adding workspace_id made it a required key in the Drizzle-inferred row types
($inferSelect), breaking call sites that build those shapes by hand:
- rbac.getUserRoles: include workspace_id in the explicit select projection
- session action: add workspaceId to the constructed chat-group literal
- test mocks (apiKey / generation / generationBatch / generationTopic): add
  workspaceId: null

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

*  test(database): use toMatchObject for topic.create row assertions

The two `expect(createdTopic).toEqual({ ...full literal })` snapshots broke
on every new column (here: workspace_id). Switch them to toMatchObject so the
returned row may carry extra columns without churning the expected literal.
The dbTopic↔createdTopic strict comparisons are left as toEqual.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 13:18:16 +08:00
Arvin Xu e8b914feef ♻️ refactor(agent-signal): S6 — migrate skillManagement to execAgent builtin agent (#15443)
Move the self-iteration skill-management action off the inline policy
implementation onto an execAgent-dispatched builtin agent (slug
`skill-management`), mirroring the S3/S4 memoryWriter + self-iteration
migration. Adds the `agentSignalSkillManagement` serverRuntime, the
builtin-tool-agent-signal skill-management manifest/systemRole, and the
builtin-agents skill-management agent; strips the ~3.5k-line inline
skillManagement policy down to the dispatch shim.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-04 12:49:26 +08:00
Hardy 7f3f1278e4 feat(prompts): use XML format for topic title generation to improve DeepSeek compatibility (#15413) 2026-06-04 12:42:11 +08:00
Arvin Xu 951561f685 ️ perf(database): add optional statement_timeout to server DB connections (#15445)
Long-running queries (e.g. an insert stuck for 700s on lock contention)
could block indefinitely because Postgres' statement_timeout defaults to
0 (no limit) and neither the node nor neon pool configured one.

Add an optional DATABASE_STATEMENT_TIMEOUT env (milliseconds, no default)
applied to both NodePool and NeonPool as statement_timeout and
idle_in_transaction_session_timeout, so Postgres aborts a stuck statement
or idle transaction on the server side. Unset keeps the previous behavior.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 12:34:23 +08:00
lobehubbot d3eebd3994 Merge remote-tracking branch 'origin/main' into canary 2026-06-04 03:57:53 +00:00
AmAzing- 54e1b59ce6 feat(agent-management): paginate searchAgent with real totals + wire 8 packages into CI (#15448)
*  feat(agent-management): paginate searchAgent with real totals and cap notice

The searchAgent tool silently clamped limit to 20 with no pagination and
reported totalCount as the returned page size, so models (and users) could
never discover agents beyond the 20 most recently updated ones.

- AgentModel: extract shared where builder, add countAgents (same
  conditions as queryAgents)
- lambda router + client agent service: expose countAgents
- server tool runtime & AgentManagerRuntime: pass offset through, report
  real totals (workspace + marketplace), emit explicit notes when the
  requested limit is capped and when more pages exist, explain
  out-of-range offsets instead of claiming no matches
- manifest: add offset param, document pagination
- agent-manager-runtime: add vitest config + test scripts (suite was
  previously unrunnable), repair stale store mocks

* 👷 build(ci): wire 8 tested packages into the package test workflow

An audit found 8 packages carrying test:coverage scripts that were never
added to the CI PACKAGES allowlist, so their suites never ran:

- agent-gateway-client, device-gateway-client, device-identity,
  eval-dataset-parser: already green, added as-is
- eval-rubric, fetch-sse: had no package-level vitest config, so vitest
  fell back to the root config whose setup/aliases break outside src/ —
  added minimal configs
- heterogeneous-agents: one assertion drifted (labels registry gained
  amp/hermes/openclaw/opencode) with nobody noticing — updated
- agent-manager-runtime: wired in the previous commit

All 8 verified locally with the exact CI command
(bun run --filter <pkg> test:coverage).

*  test(agent-management): cover searchAgent error path and market totalCount fallback

Codecov flagged 3 uncovered lines in the patch: the searchAgents catch
block (2 misses) and the totalCount ?? items.length fallback (1 partial).
Add the missing failure-path and fallback tests on both execution paths
(client AgentManagerRuntime + server tool runtime).
2026-06-04 10:52:25 +08:00
4273 changed files with 277589 additions and 46726 deletions
+2 -2
View File
@@ -51,7 +51,7 @@ export interface GlobalServerConfig {
### 3. Assemble Server Config (if new domain)
In `src/server/globalConfig/index.ts`:
In `apps/server/src/globalConfig/index.ts`:
```typescript
import { <domain>Env } from '@/envs/<domain>';
@@ -97,7 +97,7 @@ AI_IMAGE_DEFAULT_IMAGE_NUM: z.coerce.number().min(1).max(20).optional(),
// packages/types/src/serverConfig.ts
image?: PartialDeep<UserImageConfig>;
// src/server/globalConfig/index.ts
// apps/server/src/globalConfig/index.ts
image: cleanObject({ defaultImageNum: imageEnv.AI_IMAGE_DEFAULT_IMAGE_NUM }),
// src/store/user/slices/common/action.ts
+8 -8
View File
@@ -50,14 +50,14 @@ execAgent({ hooks })
## Key Files
| File | Role |
| ---------------------------------------------------------- | ------------------------------------------------------ |
| `packages/agent-runtime/src/types/hooks.ts` | Type definitions (AgentHookType, all event interfaces) |
| `src/server/services/agentRuntime/hooks/types.ts` | Server-side types (AgentHook, re-exports) |
| `src/server/services/agentRuntime/hooks/HookDispatcher.ts` | Registration, dispatch, dispatchBeforeToolCall |
| `src/server/modules/AgentRuntime/RuntimeExecutors.ts` | Tool/Compact/HumanIntervention hook dispatch |
| `src/server/services/agentRuntime/AgentRuntimeService.ts` | Step hooks + HumanIntervention resume/reject |
| `src/server/services/aiAgent/index.ts` | CallAgent hook dispatch |
| File | Role |
| --------------------------------------------------------------- | ------------------------------------------------------ |
| `packages/agent-runtime/src/types/hooks.ts` | Type definitions (AgentHookType, all event interfaces) |
| `apps/server/src/services/agentRuntime/hooks/types.ts` | Server-side types (AgentHook, re-exports) |
| `apps/server/src/services/agentRuntime/hooks/HookDispatcher.ts` | Registration, dispatch, dispatchBeforeToolCall |
| `apps/server/src/modules/AgentRuntime/RuntimeExecutors.ts` | Tool/Compact/HumanIntervention hook dispatch |
| `apps/server/src/services/agentRuntime/AgentRuntimeService.ts` | Step hooks + HumanIntervention resume/reject |
| `apps/server/src/services/aiAgent/index.ts` | CallAgent hook dispatch |
## Registration Flow
+18 -18
View File
@@ -26,9 +26,9 @@ Agent Signal has one consistent shape:
Read:
- `src/server/services/agentSignal/index.ts`
- `src/server/workflows/agentSignal/index.ts`
- `src/server/workflows/agentSignal/run.ts`
- `apps/server/src/services/agentSignal/index.ts`
- `apps/server/src/workflows/agentSignal/index.ts`
- `apps/server/src/workflows/agentSignal/run.ts`
## Core Model
@@ -48,11 +48,11 @@ Keep the boundaries strict:
## Implementation Workflow
1. Decide whether the use case is synchronous or quiet background work.
2. Define or reuse a source type in `src/server/services/agentSignal/sourceTypes.ts`.
3. Define or reuse signal and action types in `src/server/services/agentSignal/policies/types.ts`.
2. Define or reuse a source type in `apps/server/src/services/agentSignal/sourceTypes.ts`.
3. Define or reuse signal and action types in `apps/server/src/services/agentSignal/policies/types.ts`.
4. Implement handlers with `defineSourceHandler`, `defineSignalHandler`, or `defineActionHandler`.
5. Bundle handlers with `defineAgentSignalHandlers(...)`.
6. Register the policy in `src/server/services/agentSignal/policies/index.ts` and pass it into the runtime factory if needed.
6. Register the policy in `apps/server/src/services/agentSignal/policies/index.ts` and pass it into the runtime factory if needed.
7. Add or update ingress code that emits or enqueues the source event.
8. Add observability and tests before considering the flow complete.
@@ -63,19 +63,19 @@ Keep the boundaries strict:
`packages/agent-signal/src/base/builders.ts`
`packages/agent-signal/src/base/types.ts`
- Server-owned runtime and middleware:
`src/server/services/agentSignal/runtime/AgentSignalRuntime.ts`
`src/server/services/agentSignal/runtime/AgentSignalScheduler.ts`
`src/server/services/agentSignal/runtime/middleware.ts`
`src/server/services/agentSignal/runtime/context.ts`
`apps/server/src/services/agentSignal/runtime/AgentSignalRuntime.ts`
`apps/server/src/services/agentSignal/runtime/AgentSignalScheduler.ts`
`apps/server/src/services/agentSignal/runtime/middleware.ts`
`apps/server/src/services/agentSignal/runtime/context.ts`
- Existing policy example:
`src/server/services/agentSignal/policies/analyzeIntent/index.ts`
`src/server/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
`src/server/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
`src/server/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
`src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
`apps/server/src/services/agentSignal/policies/analyzeIntent/index.ts`
`apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
`apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
`apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
`apps/server/src/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
- Observability:
`src/server/services/agentSignal/observability/projector.ts`
`src/server/services/agentSignal/observability/traceEvents.ts`
`apps/server/src/services/agentSignal/observability/projector.ts`
`apps/server/src/services/agentSignal/observability/traceEvents.ts`
`packages/observability-otel/src/modules/agent-signal/index.ts`
## Implementation Rules
@@ -86,7 +86,7 @@ Keep the boundaries strict:
- Use stable ids and idempotency keys when the same source can arrive more than once.
- Preserve scope discipline. The runtime uses `scopeKey` to serialize related background work.
- Prefer the dedicated shared package types and builders from `@lobechat/agent-signal` for normalized nodes and result contracts.
- Add focused tests near the touched runtime, policy, or store module. Existing tests under `src/server/services/agentSignal/**/__tests__` are the reference pattern.
- Add focused tests near the touched runtime, policy, or store module. Existing tests under `apps/server/src/services/agentSignal/**/__tests__` are the reference pattern.
## References
@@ -32,9 +32,9 @@ source node
Read:
- `src/server/services/agentSignal/index.ts`
- `src/server/services/agentSignal/sources/index.ts`
- `src/server/services/agentSignal/runtime/AgentSignalScheduler.ts`
- `apps/server/src/services/agentSignal/index.ts`
- `apps/server/src/services/agentSignal/sources/index.ts`
- `apps/server/src/services/agentSignal/runtime/AgentSignalScheduler.ts`
## Package Boundaries
@@ -56,7 +56,7 @@ Read:
- `packages/agent-signal/src/types/events.ts`
- `packages/agent-signal/src/types/builtin.ts`
### `src/server/services/agentSignal`
### `apps/server/src/services/agentSignal`
Treat this as the server-owned implementation layer.
@@ -89,11 +89,11 @@ Examples:
Define source payloads in:
- `src/server/services/agentSignal/sourceTypes.ts`
- `apps/server/src/services/agentSignal/sourceTypes.ts`
Build normalized sources in:
- `src/server/services/agentSignal/sources/buildSource.ts`
- `apps/server/src/services/agentSignal/sources/buildSource.ts`
- `packages/agent-signal/src/base/builders.ts`
### Signal
@@ -109,7 +109,7 @@ Examples from `analyzeIntent`:
Define server-owned signal types in:
- `src/server/services/agentSignal/policies/types.ts`
- `apps/server/src/services/agentSignal/policies/types.ts`
### Action
@@ -157,9 +157,9 @@ When a user asks for "the procedure", document the flow above and point to the e
Read:
- `src/server/services/agentSignal/sources/index.ts`
- `src/server/services/agentSignal/runtime/context.ts`
- `src/server/services/agentSignal/constants.ts`
- `apps/server/src/services/agentSignal/sources/index.ts`
- `apps/server/src/services/agentSignal/runtime/context.ts`
- `apps/server/src/services/agentSignal/constants.ts`
Use `enqueueAgentSignalSourceEvent(...)` when the work should stay quiet and out-of-band. That path:
@@ -172,8 +172,8 @@ This is the preferred path when the UI request should finish immediately and the
Read:
- `src/server/workflows/agentSignal/index.ts`
- `src/server/workflows/agentSignal/run.ts`
- `apps/server/src/workflows/agentSignal/index.ts`
- `apps/server/src/workflows/agentSignal/run.ts`
## Existing Example: `analyzeIntent`
@@ -192,8 +192,8 @@ agent.user.message
Read:
- `src/server/services/agentSignal/policies/analyzeIntent/index.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/index.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
@@ -2,7 +2,7 @@
## Fluent Registration API
Use the middleware helpers in `src/server/services/agentSignal/runtime/middleware.ts`.
Use the middleware helpers in `apps/server/src/services/agentSignal/runtime/middleware.ts`.
They provide:
@@ -32,7 +32,7 @@ The context gives you:
Read:
- `src/server/services/agentSignal/runtime/context.ts`
- `apps/server/src/services/agentSignal/runtime/context.ts`
## Return Contracts
@@ -48,7 +48,7 @@ Return one of these shapes:
Read:
- `packages/agent-signal/src/base/types.ts`
- `src/server/services/agentSignal/runtime/AgentSignalScheduler.ts`
- `apps/server/src/services/agentSignal/runtime/AgentSignalScheduler.ts`
## Policy Composition Pattern
@@ -72,8 +72,8 @@ That bundle is later passed into the runtime via:
Read:
- `src/server/services/agentSignal/policies/index.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/index.ts`
- `apps/server/src/services/agentSignal/policies/index.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/index.ts`
## Source Handler Pattern
@@ -81,7 +81,7 @@ Use a source handler when you are interpreting a producer event into semantic si
Reference:
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
Pattern:
@@ -114,8 +114,8 @@ Use a signal handler when one semantic state should branch into more semantic st
References:
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
Pattern:
@@ -148,7 +148,7 @@ Use an action handler when the runtime should do actual work.
Reference:
- `src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
Pattern:
@@ -186,9 +186,9 @@ Keep these rules:
Use this split:
- external event payloads:
`src/server/services/agentSignal/sourceTypes.ts`
`apps/server/src/services/agentSignal/sourceTypes.ts`
- policy-owned signal and action payloads:
`src/server/services/agentSignal/policies/types.ts`
`apps/server/src/services/agentSignal/policies/types.ts`
- normalized shared node contracts:
`packages/agent-signal/src/base/types.ts`
@@ -216,10 +216,10 @@ Prefer focused tests near the touched code.
Useful references:
- `src/server/services/agentSignal/runtime/__tests__/AgentSignalRuntime.test.ts`
- `src/server/services/agentSignal/__tests__/index.integration.test.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/__tests__/*`
- `src/server/services/agentSignal/policies/analyzeIntent/actions/__tests__/*`
- `apps/server/src/services/agentSignal/runtime/__tests__/AgentSignalRuntime.test.ts`
- `apps/server/src/services/agentSignal/__tests__/index.integration.test.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/__tests__/*`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/actions/__tests__/*`
Test at the smallest level that proves the behavior:
@@ -24,9 +24,9 @@ After runtime execution, the service projects one compact observability model fr
Read:
- `src/server/services/agentSignal/observability/projector.ts`
- `src/server/services/agentSignal/observability/traceEvents.ts`
- `src/server/services/agentSignal/observability/store.ts`
- `apps/server/src/services/agentSignal/observability/projector.ts`
- `apps/server/src/services/agentSignal/observability/traceEvents.ts`
- `apps/server/src/services/agentSignal/observability/store.ts`
Projection outputs:
@@ -58,7 +58,7 @@ Workflow-triggered runs do not naturally pass through the normal foreground runt
Read:
- `src/server/workflows/agentSignal/run.ts`
- `apps/server/src/workflows/agentSignal/run.ts`
Use that path when:
@@ -77,8 +77,8 @@ Check:
Read:
- `src/server/services/agentSignal/index.ts`
- `src/server/services/agentSignal/sources/index.ts`
- `apps/server/src/services/agentSignal/index.ts`
- `apps/server/src/services/agentSignal/sources/index.ts`
### The signal exists but no action runs
@@ -98,8 +98,8 @@ Check:
Reference:
- `src/server/services/agentSignal/policies/actionIdempotency.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
- `apps/server/src/services/agentSignal/policies/actionIdempotency.ts`
- `apps/server/src/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
### Background runs are hard to discover
+380
View File
@@ -0,0 +1,380 @@
---
name: agent-testing
description: >
Agentic end-to-end testing for LobeHub: backend verification via the CLI,
frontend verification via agent-browser (Electron), full-stack verification in
the browser, and bot-channel verification via osascript. Local-first today,
designed to extend to cloud automation. Triggers on 'cli test', 'test with cli',
'verify with cli', 'backend test with cli', 'local test', 'test in electron',
'test desktop', 'test bot', 'bot test', 'test in discord', 'test in telegram',
'test in slack', 'test in wechat', 'test in weixin', 'test in lark', 'test in feishu',
'test in qq', 'manual test', 'osascript', 'test report', or any local
end-to-end verification task.
---
# Agent Testing (Agentic End-to-End Verification)
One skill for all agentic end-to-end testing — local-first today, designed to
also run as full cloud automation. Every test session follows the same
four-step contract:
```
Step -1: Plan approval → Step 0: Env + Auth → Step 1: Pick surface → Step 2: Run → Step 3: Structured report
```
## Step -1 — Plan approval for non-trivial tests
Skip directly to Step 0 if: the test is a single re-run after a fix, the plan
was already agreed on, or the user gave exact commands.
Otherwise, propose a test plan (surface, cases, expected evidence, assumptions)
and use the runtime structured question tool (`request_user_input` /
ask-user-question equivalent) with two fixed choices:
1. `开始执行 (Recommended)` — 测试方案没问题,开始执行
2. `先讨论下` — 方案有问题,先讨论下
Wait for the user's choice before proceeding.
## Step 0 — Environment setup + auth check (mandatory)
Step 0 is about getting the environment ready: **dependencies are healthy**
and **auth is green**. A test run that dies halfway on a missing dependency or
a login wall wastes the whole session — clear both gates BEFORE writing a
single test step.
### 0.0 Resolve the current test environment
Before starting a dev server, checking auth, opening agent-browser, or writing
test steps, print and confirm the current local test environment:
```bash
./.agents/skills/agent-testing/scripts/test-env.sh
```
This command is the source of truth for local test ports. It reads the current
shell plus `.env` files using the same precedence as `scripts/runWithEnv.mts`,
then prints:
- `APP_URL`
- `PORT`
- `SERVER_URL`
- `AUTH_TRUSTED_ORIGINS`
- `SPA_PORT`
- `MOBILE_SPA_PORT`
- `DESKTOP_PORT`
For commands that need these values, export them from the same resolver:
```bash
eval "$(./.agents/skills/agent-testing/scripts/test-env.sh --exports)"
```
Do not rely on hard-coded port tables. If the printed values do not match the
running dev server, fix/export the env first, then continue.
### 0.1 Dependencies are installed — root AND standalone apps
The root pnpm workspace does **NOT** cover every app: `pnpm-workspace.yaml`
lists `packages/**`, `e2e`, `apps/server`, and only `apps/desktop/src/main`
**`apps/desktop` and `apps/cli` are standalone**, each keeping its own
`node_modules` with its own links into `packages/`. A root install does not
refresh them, so install in every app the test will touch:
```bash
pnpm install # root workspace
cd apps/desktop && pnpm install # Electron surface
cd apps/cli && pnpm install # CLI surface
```
Symptom of a stale standalone install: the build/launch fails to resolve a
recently added workspace package — `Rolldown failed to resolve import
"@lobechat/<pkg>"` (Electron) or `Cannot find module '@lobechat/<pkg>'` (CLI).
### 0.2 Run scripts from the repo root
All paths in this skill (`./.agents/skills/agent-testing/...`) are
repo-root-relative, and background commands inherit the current working
directory — a script launched while `cwd` is `apps/desktop` fails with
`No such file or directory`. Verify `pwd` is the repo root before launching
long-running scripts.
### 0.3 Init local dev env without `.env`
For Web smoke against local code, start a **normal local dev environment**.
First check the repo root for `.env`:
- If `.env` exists, use the existing local configuration and start the dev
server normally.
- If `.env` does not exist, use the agent-testing env bootstrap.
Do not start the standalone e2e server as the product under test.
Use `scripts/init-dev-env.sh`. It follows the e2e setup pattern — Postgres,
Redis, migrations, auth/key-vault/S3 test env, seed user — but it is owned by this
skill and starts the repo's dev server (`pnpm run dev:next` / `bun run dev`),
not `e2e/scripts/setup.ts --start`. The script hard-blocks when root `.env`
exists, so it cannot accidentally override a user's local config. When `.env`
exists, do not call any `init-dev-env.sh` subcommand.
Decision flow:
```bash
if [[ -f .env ]]; then
bun run dev
else
./.agents/skills/agent-testing/scripts/init-dev-env.sh setup-db
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
fi
```
Bootstrap flow when no `.env` exists:
```bash
# From repo root. Managed Postgres/Redis flow requires Docker Desktop.
./.agents/skills/agent-testing/scripts/init-dev-env.sh setup-db
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
```
If using an existing Postgres instead of the managed Docker DB, set
`DATABASE_URL` and `REDIS_URL`, then skip `setup-db`:
```bash
DATABASE_URL=postgresql://... REDIS_URL=redis://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh migrate
DATABASE_URL=postgresql://... REDIS_URL=redis://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
DATABASE_URL=postgresql://... REDIS_URL=redis://... ./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
```
For backend-only checks, `dev-next` is available, but Web smoke needs the
full-stack `dev` command so Next can proxy the SPA HTML from Vite:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
```
Useful subcommands:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh env # print exports
./.agents/skills/agent-testing/scripts/init-dev-env.sh write # write .records/env/agent-testing-dev.env
./.agents/skills/agent-testing/scripts/init-dev-env.sh migrate # migrations only
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user # seed user + CLI API key
./.agents/skills/agent-testing/scripts/init-dev-env.sh qstash # local QStash for workflow paths
./.agents/skills/agent-testing/scripts/init-dev-env.sh clean-db # remove managed DB container
```
Default script env:
- `APP_URL=http://localhost:3010`
- `DATABASE_URL=postgresql://postgres:postgres@localhost:5433/postgres`
- `DATABASE_DRIVER=node`
- `AGENT_RUNTIME_MODE=queue` so backend-only agent runtime checks use the
same queued execution path as production
- `REDIS_URL=redis://localhost:6380` for queue-mode agent runtime state
- `FEATURE_FLAGS=-agent_self_iteration` so local smoke does not require QStash
- Local QStash defaults (`QSTASH_URL`, `QSTASH_TOKEN`, signing keys) are exported;
run `init-dev-env.sh qstash` in a separate terminal when the path under test
triggers QStash/Workflow.
- `KEY_VAULTS_SECRET`, `AUTH_SECRET`, auth verification off
- S3 mock vars
- Managed DB container: `lobehub-agent-testing-postgres`
- Managed Redis container: `lobehub-agent-testing-redis`
`seed-user` creates `agent-testing@lobehub.com` / `TestPassword123!` with
onboarding already completed, plus a local API key in
`.records/env/agent-testing-cli.env` for CLI automation. When running Cucumber
against this dev server, pass the same script env into the test process too;
Cucumber has its own `BeforeAll` seed path and it must see `DATABASE_URL`
instead of silently skipping setup:
```bash
cd e2e
# Only in the no-.env branch.
eval "$(../.agents/skills/agent-testing/scripts/init-dev-env.sh env)"
BASE_URL=http://localhost:3010 HEADLESS=true bun run test:smoke
```
### 0.4 Auth is green for the selected surface
**Auth is the gate for automated testing, but the gate is surface-scoped.**
Pick the intended surface first when it is already clear from the task, then
check only that surface. Do not block a Web test on CLI device-code auth or an
Electron login state unless the test spans those surfaces.
```bash
./.agents/skills/agent-testing/scripts/setup-auth.sh status --surface web
```
Use `status` with no `--surface` only for cross-surface test plans.
| Surface | Mechanism | One-key path | Standard check |
| -------- | --------------------------------------------- | ------------------------ | ----------------------------------------- |
| CLI | Seeded API key, device-code fallback | `setup-auth.sh cli-seed` | `setup-auth.sh status --surface cli` |
| Web | Seeded better-auth login into `agent-browser` | `setup-auth.sh web-seed` | `setup-auth.sh status --surface web` |
| Electron | App's own persistent login state | Log in once in the app | `setup-auth.sh status --surface electron` |
| Bot | Native apps already logged in | — | per-platform screenshot |
Login-state checks are standardized — do NOT hand-roll `window.__LOBE_STORES`
eval snippets; use `scripts/app-probe.sh auth` (returns `{ isSignedIn, userId }`,
works for Electron CDP and web sessions via `AB_TARGET`).
For Web tests, the test surface is always `agent-browser --session lobehub-dev`.
Use `setup-auth.sh web-seed` first in the seeded local env. The user's normal
Chrome is only a source for copying the Cookie header when seed auth is not
available or `status --surface web` still fails. If Chrome is already logged in,
do not open a login page; verify agent-browser first, then request the Network
`Cookie:` header only if that verification fails. Full background and failure modes:
[references/auth.md](./references/auth.md).
## Step 1 — Pick the surface by change scope
| Change scope | Default surface | Why | Guide |
| ------------------------------------------------------- | ------------------------------------ | ----------------------------------------------------------------- | ---------------------------------- |
| **Backend** (TRPC router / service / model / migration) | **CLI** | Fastest loop, text-assertable output, zero UI flakiness | [cli/index.md](./cli/index.md) |
| **Pure frontend** (components, store, styles, UX) | **Electron** (agent-browser + CDP) | Primary product shape; `__LOBE_STORES` state introspection | [ui/electron.md](./ui/electron.md) |
| **Full-stack** (new API + UI consuming it) | **Web** (browser + local dev server) | One surface where network requests and UI are observable together | [ui/web.md](./ui/web.md) |
| **Bot channels** (Discord / WeChat / Lark / …) | Native app via osascript / bridge | Only way to exercise the real channel end-to-end | `bot/<platform>/index.md` |
Escalate, don't duplicate: verify a backend change with the CLI first; only add
a UI pass when the change actually affects the UI.
### Environment support (local macOS vs cloud Linux)
The decisive constraint per surface is **how evidence (screenshots) is
captured**: CDP-based capture (`agent-browser screenshot`) renders from the
browser engine and needs no real display; OS-level capture (`screencapture`,
osascript) is macOS-only.
| Surface | macOS (local) | Linux / cloud (headless) | Screenshot mechanism |
| -------- | ------------- | --------------------------------------------------------- | ------------------------------------------------------ |
| CLI | ✅ | ✅ | n/a — text output |
| Web | ✅ | ✅ headless Chromium works natively | CDP — no display needed |
| Electron | ✅ | ⚠️ runs, but needs a display server: wrap with `xvfb-run` | CDP works under Xvfb; `capture-app-window.sh` does NOT |
| Bot | ✅ | ❌ osascript + native apps are macOS-only | macOS `screencapture` only |
When a test must stay cloud-portable, prefer CDP-based evidence over
OS-level capture wherever both exist.
### Bot platforms
| Platform | Guide | Quick switcher |
| ------------- | ------------------------------------------------ | --------------------- |
| Discord | [bot/discord/index.md](./bot/discord/index.md) | `Cmd+K` |
| Slack | [bot/slack/index.md](./bot/slack/index.md) | `Cmd+K` |
| Telegram | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F` |
| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md) | `Cmd+F` |
| Lark / 飞书 | [bot/lark/index.md](./bot/lark/index.md) | `Cmd+K` |
| QQ | [bot/qq/index.md](./bot/qq/index.md) | `Cmd+F` |
| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | bridge (no osascript) |
Each platform folder contains an `index.md` (activation, navigation,
send-message, verification snippets) and a `test-<platform>-bot.sh` script
sharing the interface:
```bash
./.agents/skills/agent-testing/bot/<platform>/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
```
New to osascript automation? Read
[references/osascript.md](./references/osascript.md) first — it is a general
macOS-automation asset (activate, type, paste, screenshot, accessibility reads,
gotchas), not bot-specific.
## Step 2 — Run
Surface guides above carry the detailed workflows. Shared infrastructure:
| Need | Where |
| ------------------------------------ | -------------------------------------------------------------------- |
| Start / restart the local dev server | [references/dev-server.md](./references/dev-server.md) |
| `agent-browser` command reference | [references/agent-browser.md](./references/agent-browser.md) |
| osascript patterns (general macOS) | [references/osascript.md](./references/osascript.md) |
| Agent gateway probing | [references/agent-gateway.md](./references/agent-gateway.md) |
| Screen recording | [references/record-app-screen.md](./references/record-app-screen.md) |
### Scripts
All under `.agents/skills/agent-testing/scripts/`:
| Script | Usage |
| ------------------------- | ---------------------------------------------------------------------------- |
| `test-env.sh` | Print/export the resolved local test env and ports |
| `setup-auth.sh` | One-stop auth setup & status check (`status` / `cli` / `web`) |
| `init-dev-env.sh` | Self-contained local dev env (`setup-db` / `seed-user` / `dev-next` / `dev`) |
| `app-probe.sh` | LobeHub app probes: `auth` / `route` / `ops` / `goto <path>` / `errors` |
| `record-gif.sh` | Frame-sequence → GIF for time-based behavior (streaming, timers, animations) |
| `report-init.sh` | Scaffold a structured test report (Step 3) |
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart, CDP 9222) |
| `capture-app-window.sh` | Screenshot a specific app window (general; used by bot tests) |
| `record-app-screen.sh` | Record app screen (video + periodic screenshots) |
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
| `agent-gateway/` | Gateway probe / dump / analyze tools |
`app-probe.sh` is the LobeHub-specific fast path into app state — auth check,
current route, running operations, and `goto <path>` quick navigation
(`/agent/<agentId>/<topicId>`, `/task/<taskId>`, `/settings`, …) so a test can
jump straight to the state under test instead of clicking through the UI. See
[ui/electron.md](./ui/electron.md#lobehub-probes--quick-navigation) for usage.
## Step 3 — Structured report (mandatory deliverable)
Every automated test session ends with a structured, evidence-backed report —
not a chat-only summary. Scaffold it up front and fill it as you test:
```bash
DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh my-feature "Verify my feature")
# ... test, saving screenshots / CLI transcripts into $DIR/assets/ ...
# fill $DIR/report.md (scope, case table with inline evidence, verdict, score) and $DIR/result.json
```
Reports live in `.records/reports/<timestamp>-<slug>/` (gitignored): `report.md`
(human-readable, with screenshots/GIFs embedded directly in the case table),
`result.json` (machine-readable pass/fail + score), `assets/` (evidence).
Format spec and evidence rules:
[references/report.md](./references/report.md).
Two hard rules worth front-loading:
- **Report language = the user's conversation language.** Write the ENTIRE
`report.md` (headings included) in the language the user is conversing in —
no mixed English. `result.json` keys/status values stay English.
- **The case table is the main reading surface.** Prefer the compact
`# | case | result | key observation | evidence` shape and embed the
screenshot/GIF in the evidence cell. Use separate evidence sections only for
long CLI transcripts, HAR summaries, or supplemental detail.
- **Visual evidence must render inline.** Screenshots and GIFs in `report.md`
must use Markdown image syntax like `![case 1](assets/case1.png)`. Do not
use bare file paths, Markdown links, or local file links as the primary
visual evidence; those make the report unreadable without opening each asset.
- **Final replies must include visual evidence links.** When a run includes UI
screenshots or GIFs, include the report directory and the most important
visual artifacts in the final chat response. Each item must include a stable
label, an evidence caption describing the observed UI outcome, and a
repo-relative path, for example:
`[Image #1 - error toast shows provider auth failure](<report-dir>/assets/foo.png)`.
Use repo-relative paths, not absolute paths.
- **Time-based behavior needs a GIF, not a screenshot.** If a case asserts
change over time (streaming output, a ticking timer, loading states,
animations), record it with `scripts/record-gif.sh` and embed the GIF —
a static screenshot cannot prove the behavior.
## Directory map
```
agent-testing/
├── SKILL.md # this router
├── cli/index.md # backend verification via the LobeHub CLI
├── ui/electron.md # pure-frontend verification in the desktop app
├── ui/web.md # full-stack verification in the browser
├── bot/<platform>/ # bot-channel verification (osascript / bridge)
├── references/ # shared knowledge: auth, dev-server, agent-browser, osascript, report
└── scripts/ # setup-auth, report-init, electron-dev, capture, recording, gateway
```
## Gotchas
- agent-browser: see [references/agent-browser.md](./references/agent-browser.md#gotchas)
- Electron: see [ui/electron.md](./ui/electron.md#electron-gotchas)
- osascript: see [references/osascript.md](./references/osascript.md#gotchas)
@@ -2,7 +2,7 @@
**App name:** `Discord` | **Process name:** `Discord`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -92,6 +92,6 @@ echo "Screenshot saved to /tmp/discord-test-result.png"
## Script
```bash
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/agent-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -21,7 +21,7 @@ So the test surface is three layers:
curl -sS -m4 -o /dev/null -w '%{http_code}\n' \
"http://127.0.0.1:1234/api/v1/server/info?password=<PW>" # expect 200
```
- **Electron dev running with CDP**: `./.agents/skills/local-testing/scripts/electron-dev.sh start`
- **Electron dev running with CDP**: `./.agents/skills/agent-testing/scripts/electron-dev.sh start`
- The **iMessage Desktop branch** checked out (the `imessageBridge` IPC group
and `@lobechat/chat-adapter-imessage` must be compiled into the main bundle).
Run `pnpm install --ignore-scripts` at the repo root **and** in `apps/desktop/`
@@ -31,7 +31,7 @@ So the test surface is three layers:
## Fast path: automated script
```bash
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
./.agents/skills/agent-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
```
Asserts the whole flow and self-cleans (unique `applicationId` per run, removes
@@ -136,7 +136,7 @@ Verifies the leg the bridge uses to _reply_: `BlueBubblesApiClient.sendText`
`POST /api/v1/message/text`. Run the helper against your own number:
```bash
./.agents/skills/local-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
./.agents/skills/agent-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
```
**Gotcha that bites everyone:** with `method=apple-script` and a _new_
@@ -2,7 +2,7 @@
**App name:** `Lark` or `飞书` | **Process name:** `Lark` or `飞书`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -56,6 +56,6 @@ screencapture /tmp/lark-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/agent-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
```
@@ -80,5 +80,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `QQ` | **Process name:** `QQ`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -57,6 +57,6 @@ screencapture /tmp/qq-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/agent-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
```
@@ -72,5 +72,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `Slack` | **Process name:** `Slack`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -68,6 +68,6 @@ screencapture /tmp/slack-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/agent-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
```
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `Telegram` | **Process name:** `Telegram`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -75,6 +75,6 @@ curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/getUpdates?limit=5" | j
## Script
```bash
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/agent-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
```
@@ -75,5 +75,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -2,7 +2,7 @@
**App name:** `微信` or `WeChat` | **Process name:** `WeChat`
See [osascript-common.md](../osascript-common.md) for shared patterns.
See [references/osascript.md](../../references/osascript.md) for shared patterns.
## Activate & Navigate
@@ -76,6 +76,6 @@ screencapture /tmp/wechat-bot-response.png
## Script
```bash
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/agent-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
```
@@ -81,5 +81,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/../../scripts/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
+152
View File
@@ -0,0 +1,152 @@
# CLI Backend Verification
Default surface for verifying **backend changes** (TRPC routers, services,
models, migrations) end-to-end: fastest loop, text-assertable output, zero UI
flakiness.
## When to use
- Verifying TRPC router / service / model changes end-to-end
- Testing new API fields or response structure changes
- Validating CLI command output after backend modifications
- Debugging data flow issues between server and CLI
## Prerequisites
| Requirement | Details |
| ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| Dev server | `localhost:3010` — see [../references/dev-server.md](../references/dev-server.md) |
| CLI source | `apps/cli/` — runs from source, no rebuild; standalone `node_modules` — run `pnpm install` inside `apps/cli/` (root install does not cover it) |
| CLI dev mode | `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated settings |
| Auth | Seeded API key first; Device Code Flow only as fallback — see [../references/auth.md](../references/auth.md) |
All CLI dev commands run from `apps/cli/`. Subsequent examples use `$CLI`:
```bash
source ../../.records/env/agent-testing-cli.env
CLI="bun src/index.ts"
```
## Workflow
### Step 1 — Server up?
See [../references/dev-server.md](../references/dev-server.md) for the health
check, start, and restart commands. Server-side code changes require a restart.
### Step 2 — Auth ready?
```bash
./.agents/skills/agent-testing/scripts/setup-auth.sh status
```
If the CLI is not ready in the seeded local environment:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
source .records/env/agent-testing-cli.env
./.agents/skills/agent-testing/scripts/setup-auth.sh cli-seed
```
If the target environment is not seeded, use the interactive fallback:
```bash
cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
```
Seeded API-key auth does not store credentials. It writes local settings under
`$HOME/.lobehub-dev` and requires the generated env file to be sourced before
CLI commands. Details:
[../references/auth.md](../references/auth.md).
### Step 3 — Test with CLI commands
CLI runs from source, so CLI-side code changes take effect immediately without
rebuilding:
```bash
cd apps/cli
$CLI <command>
```
Capture output for the report as you go (e.g. `$CLI task list | tee "$DIR/assets/task-list.txt"`).
### Step 4 — Clean up test data
```bash
$CLI task delete < id > -y
$CLI agent delete < id > -y
```
### Step 5 — Report
Finish with a structured report —
[../references/report.md](../references/report.md). CLI evidence = exact
command + trimmed output.
## Common testing patterns
### Task system
```bash
$CLI task list
$CLI task create -n "Root Task" -i "Test instruction"
$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
$CLI task view T-1
$CLI task tree T-1
$CLI task edit T-1 --status running
$CLI task comment T-1 -m "Test comment"
$CLI task delete T-1 -y
```
### Agent system
```bash
$CLI agent list
$CLI agent view <agent-id>
$CLI agent run <agent-id> -m "Test prompt"
```
### Document & knowledge base
```bash
$CLI doc list
$CLI doc create -t "Test Doc" -c "Content here"
$CLI doc view <doc-id>
$CLI kb list
$CLI kb tree <kb-id>
```
### Model & provider
```bash
$CLI model list
$CLI provider list
$CLI provider test <provider-id>
```
## Dev-test cycle
```
1. Make code changes (service/model/router/type)
|
2. Run unit tests (fast feedback)
bunx vitest run --silent='passed-only' '<test-file>'
|
3. Restart dev server (if server-side changes — see dev-server.md)
|
4. CLI verification (end-to-end)
$CLI <command>
|
5. Clean up test data + write the report
```
## Troubleshooting
| Issue | Solution |
| --------------------------- | ------------------------------------------------------------------------------------------------------ |
| `No authentication found` | Source `.records/env/agent-testing-cli.env`, or run device-code `login --server http://localhost:3010` |
| `UNAUTHORIZED` on API calls | Re-run `init-dev-env.sh seed-user` and re-source the env file; for device-code fallback, re-run login |
| `ECONNREFUSED` | Dev server not running — see dev-server.md |
| CLI shows old data/behavior | Server needs restart to pick up code changes |
| Login opens wrong server | Must use `--server` flag (env var doesn't work) |
@@ -0,0 +1,257 @@
# agent-browser CLI Reference
Generic reference for the `agent-browser` CLI — automate Chromium-based apps (Electron, Chrome, web) via Chrome DevTools Protocol. LobeHub-specific patterns live in [../ui/electron.md](../ui/electron.md) and [../ui/web.md](../ui/web.md); authentication recipes live in [auth.md](./auth.md).
Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
## Core Workflow
Every browser automation follows this pattern:
1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
```
## Command Chaining
```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```
Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
## Essential Commands
```bash
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
agent-browser close --all # Close all active sessions
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network requests --type xhr,fetch # Filter by resource type
agent-browser network requests --method POST # Filter by HTTP method
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "text" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Dialogs (alert, confirm, prompt, beforeunload)
agent-browser dialog accept # Accept dialog
agent-browser dialog accept "input" # Accept prompt dialog with text
agent-browser dialog dismiss # Dismiss/cancel dialog
agent-browser dialog status # Check if dialog is open
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
# Streaming
agent-browser stream enable # Start WebSocket streaming
agent-browser stream status # Inspect streaming state
agent-browser stream disable # Stop streaming
```
## Batch Execution
```bash
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
```
## Authentication
```bash
# Option 1: Auth vault (credentials stored encrypted)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
# Option 2: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close # State auto-saved
agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
# Option 3: Persistent profile
agent-browser --profile ~/.myapp open https://app.example.com/login
# Option 4: State file
agent-browser state save auth.json
agent-browser state load auth.json
```
### LobeHub dev server — inject better-auth cookie
`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3010`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [auth.md](./auth.md) for the full recipe.
## Semantic Locators (Alternative to Refs)
```bash
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
```
## JavaScript Evaluation (eval)
```bash
# Simple expressions
agent-browser eval 'document.title'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin << 'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(echo -n 'document.title' | base64)"
```
## Ref Lifecycle
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
## Annotated Screenshots (Vision Mode)
```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
agent-browser click @e2 # Click using ref from annotated screenshot
```
## Parallel Sessions
```bash
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser session list
```
## Connect to Existing Chrome
```bash
agent-browser --auto-connect snapshot # Auto-discover running Chrome
agent-browser --cdp 9222 snapshot # Explicit CDP port
```
## iOS Simulator (Mobile Safari)
```bash
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
```
## Observability Dashboard
```bash
agent-browser dashboard install
agent-browser dashboard start # Background server on port 4848
agent-browser dashboard stop
```
## Cloud Providers
Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
## Browser Engine Selection
```bash
agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
```
## Gotchas
- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
@@ -19,13 +19,13 @@ works for any LobeHub streaming session.
```bash
# 1. Start Electron with CDP
./.agents/skills/local-testing/scripts/electron-dev.sh start
./.agents/skills/agent-testing/scripts/electron-dev.sh start
# 2. Navigate to a chat, switch runtime to Cloud Sandbox (gateway mode)
# 3. Install the probe + helpers
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/probe.js
< .agents/skills/agent-testing/scripts/agent-gateway/probe.js
# 4. Send a tool-call message — manually or via type+press
agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
@@ -34,15 +34,15 @@ agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
# rightmost inactive tab as AWAY — edit ROUND_TRIPS / DWELL_MS in the
# file if you want different timing)
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/tab-switch.js
< .agents/skills/agent-testing/scripts/agent-gateway/tab-switch.js
# 6. Wait for streaming to finish, then dump
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/probe-dump.js \
< .agents/skills/agent-testing/scripts/agent-gateway/probe-dump.js \
> /tmp/probe.json
# 7. Analyze
node .agents/skills/local-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
node .agents/skills/agent-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
```
The analyzer prints three sections: EVENTS, TIMELINE, REGRESSIONS. If
@@ -0,0 +1,166 @@
# Auth Setup for Local Agent Testing
**Auth is the gate for all automated testing.** Complete
[Step 0.0](../SKILL.md#00-resolve-the-current-test-environment) first so
`SERVER_URL` and ports are resolved, then verify auth before writing any test
step.
Initialize helpers first:
```bash
SCRIPT="./.agents/skills/agent-testing/scripts/setup-auth.sh"
TEST_ENV="./.agents/skills/agent-testing/scripts/test-env.sh"
eval "$($TEST_ENV --exports)"
```
Quick reference after initialization:
| Command | Purpose |
| ------------------------------ | -------------------------------------------------- |
| `$SCRIPT status` | Check all surfaces (server + CLI + web + Electron) |
| `$SCRIPT status --surface web` | Check only the Web surface gate |
| `$SCRIPT cli-seed` | Configure CLI API-key auth from the seeded key |
| `$SCRIPT cli` | Interactive CLI device-code login (user must run) |
| `$SCRIPT open-chrome` | Open Chrome at `SERVER_URL` with DevTools |
| `$SCRIPT web-seed` | Sign in the seeded user and inject cookies |
| `pbpaste \| $SCRIPT web` | Inject a copied Cookie header into agent-browser |
| `$SCRIPT web-verify` | Live-check agent-browser session auth |
Use `localhost` for Web auth; better-auth cookies are stored for `localhost`,
not `127.0.0.1`.
## Per-surface overview
| Surface | Mechanism | Persistence | Human interaction |
| -------- | ---------------------------------------- | ----------------------------------------------------------------- | ---------------------------------------------- |
| CLI | Seeded API key or OIDC Device Code Flow | `.records/env/agent-testing-cli.env` + `$HOME/.lobehub-dev` | No for seed path; yes for device-code fallback |
| Web | Seeded better-auth login or cookie copy | `~/.lobehub-agent-testing/web-state.json` + agent-browser session | No for seed path; copy cookie only as fallback |
| Electron | App's own login state | Electron user-data dir | Log in once manually in the app |
| Bot | Native apps (Discord/WeChat/…) logged in | Each app's own session | Once per app |
## CLI — Seeded API key
For the self-contained no-root-`.env` dev environment, seed the baseline user
and API key once:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
source .records/env/agent-testing-cli.env
./.agents/skills/agent-testing/scripts/setup-auth.sh cli-seed
```
The seed step writes `LOBE_API_KEY` for humans and maps it to the CLI's current
auth variable, `LOBEHUB_CLI_API_KEY`. It also sets `LOBEHUB_SERVER` so CLI
commands hit the local server without needing a stored device-code token.
Use this for automated CLI verification:
```bash
cd apps/cli
source ../../.records/env/agent-testing-cli.env
bun src/index.ts <command>
```
## CLI — Device Code Flow fallback
Use device-code login only when testing against a non-seeded environment.
Credentials are isolated from the user's real CLI config via
`LOBEHUB_CLI_HOME=.lobehub-dev`, which the current CLI stores under
`$HOME/.lobehub-dev`.
```bash
cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3010
```
- The `--server` flag is required — an env var does NOT work and login will hit
the wrong server without it.
- Check state without logging in: `setup-auth.sh status` (verifies
`LOBEHUB_CLI_API_KEY` when present, otherwise checks the stored server URL).
- `UNAUTHORIZED` on API calls means the token expired — re-run login.
## Web — seeded better-auth login
The Web test surface is `agent-browser --session lobehub-dev`. The user's
ordinary Chrome is only a cookie source; Chrome screenshots, Chrome Network
records, and Chrome logged-in state do not prove the agent-browser test session
is authenticated.
For the seeded local dev environment, use the automatic path:
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/setup-auth.sh web-seed
```
`web-seed` posts the seeded email/password to
`/api/auth/sign-in/email`, stores the returned cookie jar under
`~/.lobehub-agent-testing/`, converts it to Playwright `storageState`, loads it
into the `agent-browser` session, and verifies the session does not land on
`/signin`.
## Web — manual cookie injection fallback
`agent-browser --headed` on macOS often creates the Chromium window off-screen —
the user can't see or interact with it, so manual login inside the agent-browser
session fails. Instead, copy the **better-auth session cookie** out of the
user's own logged-in Chrome and inject it as a Playwright-style state file.
Do **not** use this on production URLs — only local dev. Treat the cookie as a
secret: don't paste it into shared logs, PRs, or commit it anywhere.
### Web — decision flow
1. `$SCRIPT status --surface web` — green? Start testing. Do not ask for a Cookie header.
2. Not green and using the seeded local env → `$SCRIPT web-seed`.
3. Still not green or not using the seed env → `$SCRIPT open-chrome` opens Chrome at `SERVER_URL` with DevTools.
4. User copies the `Cookie:` header from Network tab → any same-origin request → Request Headers → right-click `Cookie:`**Copy value**. Must be from Network, NOT `document.cookie` (HttpOnly cookies are invisible to `document.cookie`).
5. `pbpaste | $SCRIPT web` — filters to better-auth cookies (`session_token`, `session_data`, `state`), builds Playwright `storageState`, loads it into the `agent-browser` session (`lobehub-dev`), opens `SERVER_URL`, and asserts the URL is not `/signin`.
### Using the authenticated session
```bash
agent-browser --session lobehub-dev open "$SERVER_URL/"
agent-browser --session lobehub-dev snapshot -i | head -20
```
### Notes
- `storageState` doesn't enforce the HttpOnly flag on load — the script stores
cookies with `httpOnly: false`, which is fine for local dev and sidesteps a
CDP-context quirk where HttpOnly cookies sometimes fail to attach.
- The state file is kept at `~/.lobehub-agent-testing/web-state.json` so
`setup-auth.sh status` can report web-auth readiness across sessions.
### Common failure modes
| Symptom | Cause | Fix |
| --------------------------------------------- | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- |
| Still redirects to `/signin` after injection | User pasted from `document.cookie` → missed HttpOnly session | Re-pull from Network request Headers, not console |
| Script reports `no better-auth cookies found` | User pasted the wrong value, or the cookie parser regressed | Keep the raw `Cookie:` header as-is; run `scripts/setup-auth.test.sh` if the input looks valid |
| Login works briefly then expires | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-inject |
| Domain mismatch | Cookie domain must be `localhost` literally, no leading dot for local dev | — |
## Electron
The desktop app keeps its own persistent login state in its user-data
directory — log in once manually inside the app and it survives restarts of
`electron-dev.sh`. No injection needed. The standard check (do NOT hand-roll a
store eval) once Electron is up with CDP:
```bash
./.agents/skills/agent-testing/scripts/app-probe.sh auth
# → {"ok":true,"isSignedIn":true,"userId":"user_xxx"}
```
`setup-auth.sh status` runs this probe automatically when CDP 9222 is
reachable.
## Scope
These recipes only cover **local dev** authentication. They do not:
- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com`
and must be delivered over HTTPS.
- Replace real OAuth flows — tests that must exercise the login UI itself need a
real Chromium with `--remote-debugging-port` or a bot account.
- Flow cookies back to the user's Chrome — injection is one-way.
@@ -0,0 +1,101 @@
# Local Dev Server
Single source of truth for starting / restarting the backend that all test
surfaces (CLI, Electron, Web) hit.
## Resolve ports first
Run `test-env.sh` as described in
[SKILL.md Step 0.0](../SKILL.md#00-resolve-the-current-test-environment)
before starting or probing any local test surface.
## Ports & modes
| Command | What it runs | Port source |
| ------------------- | --------------------------------------------------------- | ------------------- |
| `pnpm run dev:next` | Next.js backend (API + auth) | `PORT` |
| `bun run dev` | Full-stack (Next.js + Vite SPA, via `devStartupSequence`) | `PORT` + `SPA_PORT` |
| `bun run dev:spa` | Vite SPA only, proxies API to `PORT` | `SPA_PORT` |
In the **cloud repo** (where this repo is the `lobehub/` submodule), local
worktree names map to fallback defaults only when `.env` and shell env do not
provide values:
| Workspace directory | Default `SERVER_URL` |
| ------------------- | -------------------------------- |
| `lobehub` | `http://localhost:3010` |
| `lobehub-cloud` | `http://localhost:3020` |
| `lobehub-cloud-1` | `http://localhost:3021` |
| `lobehub-cloud-N` | `http://localhost:$((3020 + N))` |
`test-env.sh` and `setup-auth.sh` both use the resolved env first and these
worktree defaults only as fallback. Treat the dev-server terminal output as the
final source of truth when testing a non-standard port, then export it for every
agent-testing command:
```bash
export SERVER_URL=http://localhost:<port-from-dev-output>
```
## Health check
```bash
curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/"
```
## Start / restart
```bash
# Start backend only.
# With root .env: use the existing local config.
# Agent runtime queue mode is required to mirror production async execution.
AGENT_RUNTIME_MODE=queue pnpm run dev:next
# Without root .env: use the self-contained agent-testing env.
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
# Full-stack SPA + backend. Required for Web smoke.
# With root .env:
AGENT_RUNTIME_MODE=queue bun run dev
# Without root .env:
./.agents/skills/agent-testing/scripts/init-dev-env.sh dev
# Local QStash. Run in a separate terminal only when testing workflow paths.
./.agents/skills/agent-testing/scripts/init-dev-env.sh qstash
# Restart — required to pick up server-side code changes
lsof -ti:"$PORT" | xargs kill
pnpm run dev:next
# or, when no root .env exists:
# ./.agents/skills/agent-testing/scripts/init-dev-env.sh dev-next
```
## When a server restart is needed
Next.js hot-reload may not pick up changes in workspace packages — restart when
in doubt.
| Change location | Restart? |
| ----------------------------------------------- | -------- |
| `apps/server/src/` (routers, services, modules) | Yes |
| `src/server/` (agent-hono, workflows-hono) | Yes |
| `packages/database/` (models) | Yes |
| `packages/types/` | Yes |
| `packages/prompts/` | Yes |
| `apps/cli/` (CLI runs from source) | No |
## Troubleshooting
| Issue | Solution |
| ------------------------- | --------------------------------------------------------------------------------------------- |
| `ECONNREFUSED` | Server not running — start it |
| `EADDRINUSE` on the port | Already running — `lsof -ti:<port> \| xargs kill` first |
| Stale data / old behavior | Server needs a restart to pick up code changes |
| Agent call runs inline | Set `AGENT_RUNTIME_MODE=queue`, make sure `REDIS_URL` is configured, then restart the server |
| Queue mode needs Redis | Run `init-dev-env.sh setup-db`, or provide `REDIS_URL=redis://...` for an existing Redis |
| QStash workflow failures | Start `init-dev-env.sh qstash` and make sure dev server inherited the script's `QSTASH_*` env |
Marketplace/community endpoints are not part of the local agent-testing auth
gate. Do not block local product-chain verification on marketplace API auth
unless the change explicitly targets marketplace behavior.
@@ -12,13 +12,13 @@ General-purpose screen recording tool for the Electron app. Captures CDP screens
```bash
# Start recording (Electron must be running with CDP)
.agents/skills/local-testing/scripts/record-app-screen.sh start [output_name]
.agents/skills/agent-testing/scripts/record-app-screen.sh start [output_name]
# Stop recording and assemble video
.agents/skills/local-testing/scripts/record-app-screen.sh stop
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
# Check if recording is active
.agents/skills/local-testing/scripts/record-app-screen.sh status
.agents/skills/agent-testing/scripts/record-app-screen.sh status
```
### Arguments
@@ -74,10 +74,10 @@ The `.records/` directory is at the project root and is gitignored.
```bash
# Start Electron
.agents/skills/local-testing/scripts/electron-dev.sh start
.agents/skills/agent-testing/scripts/electron-dev.sh start
# Start recording
.agents/skills/local-testing/scripts/record-app-screen.sh start my-test
.agents/skills/agent-testing/scripts/record-app-screen.sh start my-test
# Run automation
agent-browser --cdp 9222 click @e61
@@ -86,14 +86,14 @@ agent-browser --cdp 9222 press Enter
sleep 10
# Stop and get results
.agents/skills/local-testing/scripts/record-app-screen.sh stop
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
# → .records/my-test.mp4 + .records/my-test/*.png
```
### Gateway Streaming Demo
```bash
.agents/skills/local-testing/scripts/electron-dev.sh start
.agents/skills/agent-testing/scripts/electron-dev.sh start
# Inject gateway URL
agent-browser --cdp 9222 eval --stdin << 'EOF'
@@ -106,19 +106,19 @@ agent-browser --cdp 9222 eval --stdin << 'EOF'
EOF
# Record
.agents/skills/local-testing/scripts/record-app-screen.sh start gateway-demo
.agents/skills/agent-testing/scripts/record-app-screen.sh start gateway-demo
# Navigate to agent, send message, wait for completion...
# (automation commands here)
.agents/skills/local-testing/scripts/record-app-screen.sh stop
.agents/skills/agent-testing/scripts/record-app-screen.sh stop
open .records/gateway-demo.mp4
```
### Check Active Recording
```bash
.agents/skills/local-testing/scripts/record-app-screen.sh status
.agents/skills/agent-testing/scripts/record-app-screen.sh status
# [record] Active recording
# Frames: 42 captured (running: yes)
# Screenshots: 14 captured (running: yes)
@@ -0,0 +1,186 @@
# Structured Test Reports
Every automated test session ends with a structured, evidence-backed report.
A chat-only summary is not an acceptable deliverable: the report is what the
user (or a reviewer, or a later agent) audits without replaying the session.
## Location & layout
Reports live under `.records/reports/` (gitignored, like all `.records/`
output):
```
.records/reports/<YYYYMMDD-HHMMSS>-<slug>/
├── report.md # human-readable report (case table with inline screenshots, verdict)
├── result.json # machine-readable results (pass/fail counts, score)
└── assets/ # evidence: screenshots, HAR files, CLI transcripts
```
## Workflow
1. **Scaffold up front** — before running the first test step:
```bash
DIR=$(./.agents/skills/agent-testing/scripts/report-init.sh < slug > "<title>")
```
The script creates the directory, pre-fills branch / commit / date in both
files, and prints the directory path. The scaffold uses the compact report
shape below; translate its headings and table labels to the user's language
before delivery if needed.
2. **Collect evidence as you test** — every asserted behavior gets one evidence
item in `$DIR/assets/`:
- UI (static state): `agent-browser screenshot` or `capture-app-window.sh`,
then **verify the screenshot with the Read tool before citing it** —
never cite an image you haven't looked at.
- UI (time-based behavior): **screenshot vs GIF is a judgment you must
make per case.** If the assertion is about change over time — streaming
output, a ticking timer, loading/progress states, animations,
appear/disappear transitions — a static screenshot cannot prove it.
Record a frame sequence and synthesize a GIF:
```bash
# start recording (background), trigger the behavior, wait for it to finish
../scripts/record-gif.sh "$DIR/assets/case2-streaming.gif" 12 2 &
GIF_PID=$!
# ... drive the scenario ...
wait $GIF_PID
```
Embed it like an image: `![case 2](assets/case2-streaming.gif)`. Verify
at least the first/last frames visually (Read the GIF) before citing.
- CLI: exact command + trimmed output (`$CLI task list | tee "$DIR/assets/task-list.txt"`).
- Network: `agent-browser network requests` dumps or HAR files.
3. **Fill `report.md` as you go** — don't reconstruct from memory at the end.
The primary evidence belongs in the case table itself: each row should pair
the assertion with the screenshot/GIF or non-visual artifact that proves it,
so readers can scan the result without jumping between sections. UI evidence
must render inline with Markdown image syntax; a plain link or file path is
not acceptable as primary visual evidence.
4. **Set the verdict** in both `report.md` and `result.json`, then link the
report directory in your final answer to the user. If UI evidence exists,
list the key screenshot/GIF links in the final chat response. Use Markdown
link text as the evidence caption, for example:
`[Image #1 - observed outcome](<report-dir>/assets/case1.png)`.
## Report language (hard rule)
**`report.md` MUST be written in the language the user is conversing in** —
the whole file, headings included. If the conversation is in Chinese, the
report is in Chinese; do not mix English prose into it. The scaffold headings
are placeholders — translate them when filling if the user is not conversing in
the scaffold language. Exceptions that stay as-is: code/commands, identifiers,
log excerpts, and `result.json` (its keys and status values are machine-read
and stay English; the `title` and case `name` fields follow the user's
language).
## report.md sections
Default report shape:
| Section | Content |
| ---------------- | -------------------------------------------------------------------------------------------- |
| **Scope** | What changed / what is being verified; branch, commit, date, surface, entry URL/page, focus |
| **Cases** | Compact table: `# \| Case \| Result \| Key observation \| Evidence` |
| **Verdict** | Overall verdict first (`pass` / `partial` / `fail`), then the concise reasons and follow-ups |
| **Verification** | Commands or automated checks run in this session, with trimmed results |
| **Score** | Pass/fail/blocked counts, optional 0100 score |
The case table is the main reading surface. Prefer one clear row per user
scenario or regression assertion, and put the screenshot/GIF directly in the
`Evidence` cell:
```markdown
| # | Case | Result | Key observation | Evidence |
| --- | ------------------------ | ------ | ----------------------------------------------------------------- | ------------------------------------------------ |
| 1 | Create a new page | pass | Title and body persisted after refresh | ![created page](assets/new-page-created.png) |
| 2 | Respect requested length | fail | Requested about 600 Chinese characters; final body was about 1286 | ![final article](assets/write-article-final.png) |
```
## Inline visual evidence
Screenshots and GIFs must be embedded so the report shows the image inline:
```markdown
![case 1 result](assets/case1-result.png)
![streaming response](assets/case2-streaming.gif)
```
Do **not** use these as the primary evidence for UI cases:
```markdown
[case 1 result](assets/case1-result.png)
assets/case1-result.png
file:///tmp/case1-result.png
```
Links are acceptable for non-visual artifacts such as CLI transcripts, HAR
files, or long logs. For videos, embed a representative screenshot/GIF inline in
the case row and link the full video as supplemental evidence.
Avoid the old wide table with separate `steps`, `expected`, and `actual`
columns unless the test is purely non-visual and truly needs that breakdown.
For UI reports, those columns make screenshot-backed reading harder. Put
procedural detail in the row's key observation only when it changes the
interpretation of the result.
Use an extra evidence/detail section only when the inline table cannot carry
the material cleanly, such as long CLI transcripts, HAR summaries, or multiple
screenshots for one case. In that situation, keep the table evidence cell as an
inline visual proof for UI cases or a concise link for non-visual artifacts,
then put the longer material under `Verification` or a brief
`Additional Evidence` section.
Status values: `pass` / `fail` / `blocked` (couldn't run — e.g. auth or env
missing; a blocked case is not a pass).
## result.json schema
```json
{
"branch": "feat/task-tree",
"cases": [
{
"id": "1",
"name": "task tree returns nested children",
"surface": "cli",
"status": "pass",
"evidence": ["assets/task-tree.txt"]
}
],
"commit": "abc1234",
"createdAt": "2026-06-11T15:30:00+08:00",
"summary": {
"total": 1,
"passed": 1,
"failed": 0,
"blocked": 0,
"score": 100,
"verdict": "pass"
},
"surfaces": ["cli"],
"title": "Verify task tree API"
}
```
`score` is optional — use it when the verdict has a subjective component (UI
polish, copy quality); omit it for purely binary runs. `verdict` is the single
word the user reads first: `pass`, `fail`, or `partial`.
## Rules
- **No evidence, no claim** — every `pass`/`fail` in the case table must link
at least one asset. UI cases must inline-embed their primary screenshot/GIF;
non-visual CLI/network cases may link transcripts, HAR files, or logs.
- **Screenshots must be visually verified** with the Read tool before being
cited.
- **Report failures faithfully** — a failing case with clear evidence is a good
report; a vague green one is not.
- If coverage was cut (cases skipped, surfaces not exercised), say so in the
Verdict section — silent truncation reads as "covered everything".
@@ -11,7 +11,7 @@
// 6. ROLLBACKS — msgN / childN / role drops in the active-topic timeline
//
// Usage:
// bun run .agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
// bun run .agents/skills/agent-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
import { readFileSync } from 'node:fs';
@@ -5,16 +5,16 @@
// streaming-replay test fixtures.
//
// Commands:
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts install
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts install
// Bundle probe-events.ts and inject into the CDP-attached browser.
// Re-installing clears all buffers and re-patches WebSocket / fetch.
//
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts dump [name]
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts dump [name]
// Stop the timeline timer, fetch the capture as JSON, write it to
// `.agent-gateway/<name>-<YYYYMMDD-HHmmss>.json`. `name` defaults to
// `dump`. Prints the absolute path written.
//
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts analyze [path]
// bun run .agents/skills/agent-testing/scripts/agent-gateway/run.ts analyze [path]
// Run analyze-events.ts on the dump. `path` defaults to the most
// recently modified file in `.agent-gateway/`.
//
@@ -28,7 +28,7 @@ import path from 'node:path';
import { fileURLToPath } from 'node:url';
const SCRIPT_DIR = path.dirname(fileURLToPath(import.meta.url));
// .agents/skills/local-testing/scripts/agent-gateway/ → 5 levels up
// .agents/skills/agent-testing/scripts/agent-gateway/ → 5 levels up
const PROJECT_ROOT = path.resolve(SCRIPT_DIR, '../../../../..');
const DUMP_DIR = path.join(PROJECT_ROOT, '.agent-gateway');
+95
View File
@@ -0,0 +1,95 @@
#!/usr/bin/env bash
# app-probe.sh — standardized probes for a running LobeHub app (Electron via
# CDP, or a web agent-browser session). Use these instead of hand-rolling
# `window.__LOBE_STORES` eval snippets — especially the auth check.
#
# Usage:
# app-probe.sh auth # { isSignedIn, userId } from the user store
# app-probe.sh route # current SPA route
# app-probe.sh ops # running chat operations (type / status / startTime)
# app-probe.sh goto <path> # navigate the SPA to a route (full reload), e.g. goto /agent/agt_xxx
# app-probe.sh errors-install # install a console.error interceptor
# app-probe.sh errors # dump errors captured since errors-install
#
# Target selection (default: Electron over CDP 9222):
# AB_TARGET="--cdp 9222" # Electron (default; CDP_PORT also honored)
# AB_TARGET="--session lobehub-dev" # web agent-browser session
#
# Common routes (desktop SPA): / /agent/<agentId> /agent/<agentId>/<topicId>
# /task /task/<taskId> /page /settings /community
set -euo pipefail
AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
run_eval() {
# shellcheck disable=SC2086
agent-browser $AB_TARGET eval --stdin
}
case "${1:-}" in
auth)
run_eval << 'EVALEOF'
(function () {
var stores = window.__LOBE_STORES;
if (!stores || !stores.user) return JSON.stringify({ ok: false, reason: 'no user store — app not loaded yet?' });
var u = stores.user();
return JSON.stringify({ ok: !!u.isSignedIn, isSignedIn: !!u.isSignedIn, userId: (u.user && u.user.id) || null });
})()
EVALEOF
;;
route)
run_eval << 'EVALEOF'
location.pathname + location.search + location.hash
EVALEOF
;;
ops)
run_eval << 'EVALEOF'
(function () {
var stores = window.__LOBE_STORES;
if (!stores || !stores.chat) return JSON.stringify({ ok: false, reason: 'no chat store — open a conversation first' });
var ops = Object.values(stores.chat().operations || {});
var running = ops.filter(function (o) { return o.status === 'running'; });
return JSON.stringify({
ok: true,
running: running.map(function (o) { return { startTime: o.metadata && o.metadata.startTime, type: o.type }; }),
runningCount: running.length,
total: ops.length,
});
})()
EVALEOF
;;
goto)
TARGET_PATH="${2:?Usage: app-probe.sh goto <path>}"
# shellcheck disable=SC2086
agent-browser $AB_TARGET eval "location.href = '$TARGET_PATH'" > /dev/null
sleep 2
bash "${BASH_SOURCE[0]}" route
;;
errors-install)
run_eval << 'EVALEOF'
(function () {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function () {
var msg = Array.from(arguments).map(function (a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
;;
errors)
run_eval << 'EVALEOF'
JSON.stringify(window.__CAPTURED_ERRORS || 'interceptor not installed — run errors-install first')
EVALEOF
;;
*)
echo "Usage: $0 {auth|route|ops|goto <path>|errors-install|errors}" >&2
exit 2
;;
esac
+459
View File
@@ -0,0 +1,459 @@
#!/usr/bin/env bash
# init-dev-env.sh — self-contained local dev env for agent testing.
#
# This script initializes the env needed to run LobeHub's normal local dev
# server without depending on a root .env file. It follows the same shape as
# the e2e bootstrap (Postgres + migrations + auth/key-vault/S3 test env), but
# starts the repo's dev server, not the standalone e2e server.
#
# Guardrail: if repo-root .env exists, every non-help command exits immediately.
# Existing local config always wins.
#
# Usage:
# init-dev-env.sh env # print shell exports
# init-dev-env.sh write [file] # write a source-able env file
# init-dev-env.sh setup-db # start local Postgres/Redis and run migrations
# init-dev-env.sh migrate # run DB migrations against the configured DB
# init-dev-env.sh seed-user # seed the baseline test user + CLI API key
# init-dev-env.sh qstash # run local Upstash QStash dev server
# init-dev-env.sh dev-next # exec `pnpm run dev:next` with this env
# init-dev-env.sh dev # exec `bun run dev` with this env
# init-dev-env.sh clean-db # remove the managed Postgres/Redis containers
#
# Overrides:
# SERVER_PORT=3010 DB_PORT=5433 DB_CONTAINER=lobehub-agent-testing-postgres REDIS_PORT=6380 REDIS_CONTAINER=lobehub-agent-testing-redis QSTASH_DEV_PORT=8080
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
ROOT_ENV_FILE="$REPO_ROOT/.env"
SERVER_PORT="${SERVER_PORT:-3010}"
DB_PORT="${DB_PORT:-5433}"
DB_CONTAINER="${DB_CONTAINER:-lobehub-agent-testing-postgres}"
DATABASE_URL="${DATABASE_URL:-postgresql://postgres:postgres@localhost:${DB_PORT}/postgres}"
REDIS_PORT="${REDIS_PORT:-6380}"
REDIS_CONTAINER="${REDIS_CONTAINER:-lobehub-agent-testing-redis}"
REDIS_URL="${REDIS_URL:-redis://localhost:${REDIS_PORT}}"
ENV_FILE_DEFAULT="$REPO_ROOT/.records/env/agent-testing-dev.env"
CLI_ENV_FILE_DEFAULT="$REPO_ROOT/.records/env/agent-testing-cli.env"
AGENT_TESTING_API_KEY="${AGENT_TESTING_API_KEY:-sk-lh-agenttesting0001}"
QSTASH_DEV_PORT="${QSTASH_DEV_PORT:-8080}"
QSTASH_LOCAL_TOKEN="${QSTASH_LOCAL_TOKEN:-eyJVc2VySUQiOiJkZWZhdWx0VXNlciIsIlBhc3N3b3JkIjoiZGVmYXVsdFBhc3N3b3JkIn0=}"
QSTASH_LOCAL_CURRENT_SIGNING_KEY="${QSTASH_LOCAL_CURRENT_SIGNING_KEY:-sig_7kYjw48mhY7kAjqNGcy6cr29RJ6r}"
QSTASH_LOCAL_NEXT_SIGNING_KEY="${QSTASH_LOCAL_NEXT_SIGNING_KEY:-sig_5ZB6DVzB1wjE8S6rZ7eenA8Pdnhs}"
ok() { printf ' \033[32m✔\033[0m %s\n' "$1"; }
bad() { printf ' \033[31m✘\033[0m %s\n' "$1"; }
note() { printf ' %s\n' "$1"; }
guard_no_root_env() {
if [[ -f "$ROOT_ENV_FILE" ]]; then
bad "root .env exists: $ROOT_ENV_FILE"
note "Use the existing local configuration instead of init-dev-env.sh."
note "Start normally from repo root, e.g. pnpm run dev:next or bun run dev."
exit 1
fi
}
apply_env() {
export AGENT_RUNTIME_MODE="${AGENT_RUNTIME_MODE:-queue}"
export APP_URL="${APP_URL:-http://localhost:${SERVER_PORT}}"
export AUTH_EMAIL_VERIFICATION="${AUTH_EMAIL_VERIFICATION:-0}"
export AUTH_SECRET="${AUTH_SECRET:-agent-testing-local-auth-secret-32chars}"
export DATABASE_DRIVER="${DATABASE_DRIVER:-node}"
export DATABASE_URL
export FEATURE_FLAGS="${FEATURE_FLAGS:--agent_self_iteration}"
export KEY_VAULTS_SECRET="${KEY_VAULTS_SECRET:-r2gbBPKyJ8ZRKCLKt+I3DImfcL+wGxaQyRC56xtm9Uk=}"
export NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION="${NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION:-0}"
export NODE_OPTIONS="${NODE_OPTIONS:---max-old-space-size=6144}"
export PORT="${PORT:-$SERVER_PORT}"
export QSTASH_CURRENT_SIGNING_KEY="${QSTASH_CURRENT_SIGNING_KEY:-$QSTASH_LOCAL_CURRENT_SIGNING_KEY}"
export QSTASH_DEV_PORT
export QSTASH_NEXT_SIGNING_KEY="${QSTASH_NEXT_SIGNING_KEY:-$QSTASH_LOCAL_NEXT_SIGNING_KEY}"
export QSTASH_TOKEN="${QSTASH_TOKEN:-$QSTASH_LOCAL_TOKEN}"
export QSTASH_URL="${QSTASH_URL:-http://127.0.0.1:${QSTASH_DEV_PORT}}"
export REDIS_URL
export S3_ACCESS_KEY_ID="${S3_ACCESS_KEY_ID:-agent-testing-access-key}"
export S3_BUCKET="${S3_BUCKET:-agent-testing-bucket}"
export S3_ENDPOINT="${S3_ENDPOINT:-https://agent-testing-s3.localhost}"
export S3_SECRET_ACCESS_KEY="${S3_SECRET_ACCESS_KEY:-agent-testing-secret-key}"
}
env_keys() {
printf '%s\n' \
APP_URL \
AGENT_RUNTIME_MODE \
AUTH_EMAIL_VERIFICATION \
AUTH_SECRET \
DATABASE_DRIVER \
DATABASE_URL \
FEATURE_FLAGS \
KEY_VAULTS_SECRET \
NEXT_PUBLIC_AUTH_EMAIL_VERIFICATION \
NODE_OPTIONS \
PORT \
QSTASH_CURRENT_SIGNING_KEY \
QSTASH_DEV_PORT \
QSTASH_NEXT_SIGNING_KEY \
QSTASH_TOKEN \
QSTASH_URL \
REDIS_URL \
S3_ACCESS_KEY_ID \
S3_BUCKET \
S3_ENDPOINT \
S3_SECRET_ACCESS_KEY
}
print_env() {
apply_env
while IFS= read -r key; do
printf 'export %s=%q\n' "$key" "${!key}"
done < <(env_keys)
}
write_env() {
local file="${1:-$ENV_FILE_DEFAULT}"
apply_env
mkdir -p "$(dirname "$file")"
{
printf '# Source this file before starting LobeHub local dev server.\n'
printf '# Generated by %s\n' "$0"
while IFS= read -r key; do
printf 'export %s=%q\n' "$key" "${!key}"
done < <(env_keys)
} > "$file"
ok "wrote env file: $file"
note "source it with: source $file"
}
require_docker() {
if ! command -v docker > /dev/null 2>&1; then
bad "docker CLI is not available"
note "Install/start Docker Desktop, or provide DATABASE_URL for an existing Postgres."
return 1
fi
}
wait_for_db() {
printf ' waiting for Postgres'
until docker exec "$DB_CONTAINER" pg_isready -U postgres > /dev/null 2>&1; do
printf '.'
sleep 2
done
printf '\n'
}
wait_for_redis() {
printf ' waiting for Redis'
until docker exec "$REDIS_CONTAINER" redis-cli ping > /dev/null 2>&1; do
printf '.'
sleep 1
done
printf '\n'
}
start_db() {
require_docker
if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
ok "Postgres container already running: $DB_CONTAINER"
elif docker ps -a --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
docker start "$DB_CONTAINER" > /dev/null
ok "started existing Postgres container: $DB_CONTAINER"
else
docker run -d \
--name "$DB_CONTAINER" \
-e POSTGRES_PASSWORD=postgres \
-p "${DB_PORT}:5432" \
paradedb/paradedb:latest > /dev/null
ok "created Postgres container: $DB_CONTAINER"
fi
wait_for_db
}
start_redis() {
require_docker
if docker ps --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
ok "Redis container already running: $REDIS_CONTAINER"
elif docker ps -a --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
docker start "$REDIS_CONTAINER" > /dev/null
ok "started existing Redis container: $REDIS_CONTAINER"
else
docker run -d \
--name "$REDIS_CONTAINER" \
-p "${REDIS_PORT}:6379" \
redis:7-alpine > /dev/null
ok "created Redis container: $REDIS_CONTAINER"
fi
wait_for_redis
}
migrate_db() {
apply_env
cd "$REPO_ROOT"
bun run db:migrate
}
seed_user() {
apply_env
export AGENT_TESTING_API_KEY
export AGENT_TESTING_CLI_ENV_FILE="${AGENT_TESTING_CLI_ENV_FILE:-$CLI_ENV_FILE_DEFAULT}"
cd "$REPO_ROOT"
node <<'NODE'
const bcrypt = require('bcryptjs');
const crypto = require('node:crypto');
const fs = require('node:fs');
const path = require('node:path');
const pg = require('pg');
const databaseUrl = process.env.DATABASE_URL;
if (!databaseUrl) {
throw new Error('DATABASE_URL is required to seed the baseline test user.');
}
const TEST_USER = {
email: 'agent-testing@lobehub.com',
fullName: 'Agent Testing User',
id: 'user_agent_testing_001',
password: 'TestPassword123!',
username: 'agent_testing_user',
};
const TEST_API_KEY = {
id: 'api_key_agent_testing_001',
key: process.env.AGENT_TESTING_API_KEY || 'sk-lh-agenttesting0001',
name: 'Agent Testing CLI API Key',
};
const validateApiKeyFormat = (apiKey) => /^sk-lh-[\da-z]{16}$/.test(apiKey);
const hashApiKey = (apiKey) => {
const secret = process.env.KEY_VAULTS_SECRET;
if (!secret) throw new Error('KEY_VAULTS_SECRET is required to seed the baseline API key.');
return crypto.createHmac('sha256', secret).update(apiKey).digest('hex');
};
const encryptWithKeyVaultsSecret = (plaintext) => {
const secret = process.env.KEY_VAULTS_SECRET;
if (!secret) throw new Error('KEY_VAULTS_SECRET is required to seed the baseline API key.');
const rawKey = Buffer.from(secret, 'base64');
if (![16, 24, 32].includes(rawKey.length)) {
throw new Error(
`KEY_VAULTS_SECRET must decode to 16, 24, or 32 bytes, got ${rawKey.length} bytes.`,
);
}
const iv = crypto.randomBytes(12);
const cipher = crypto.createCipheriv(`aes-${rawKey.length * 8}-gcm`, rawKey, iv);
const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
const authTag = cipher.getAuthTag();
return `${iv.toString('hex')}:${authTag.toString('hex')}:${encrypted.toString('hex')}`;
};
const writeCliEnvFile = () => {
const file = process.env.AGENT_TESTING_CLI_ENV_FILE || '.records/env/agent-testing-cli.env';
fs.mkdirSync(path.dirname(file), { recursive: true });
fs.writeFileSync(
file,
[
'# Source this file before running LobeHub CLI agent tests.',
'# Generated by init-dev-env.sh seed-user',
`export LOBE_API_KEY=${TEST_API_KEY.key}`,
`export LOBEHUB_CLI_API_KEY="${'${LOBE_API_KEY}'}"`,
`export LOBEHUB_SERVER=${process.env.APP_URL}`,
'export LOBEHUB_CLI_HOME=.lobehub-dev',
'',
].join('\n'),
);
return file;
};
const client = new pg.Client({ connectionString: databaseUrl });
(async () => {
if (!validateApiKeyFormat(TEST_API_KEY.key)) {
throw new Error(`Invalid AGENT_TESTING_API_KEY format: ${TEST_API_KEY.key}`);
}
await client.connect();
const now = new Date().toISOString();
const onboarding = JSON.stringify({ finishedAt: now, version: 1 });
const passwordHash = await bcrypt.hash(TEST_USER.password, 10);
const encryptedApiKey = encryptWithKeyVaultsSecret(TEST_API_KEY.key);
const apiKeyHash = hashApiKey(TEST_API_KEY.key);
await client.query(
`INSERT INTO users (id, email, normalized_email, username, full_name, email_verified, onboarding, created_at, updated_at, last_active_at)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $8, $8)
ON CONFLICT (id) DO UPDATE SET onboarding = $7, updated_at = $8`,
[
TEST_USER.id,
TEST_USER.email,
TEST_USER.email.toLowerCase(),
TEST_USER.username,
TEST_USER.fullName,
true,
onboarding,
now,
],
);
await client.query(
`INSERT INTO accounts (id, user_id, account_id, provider_id, password, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, $6, $6)
ON CONFLICT DO NOTHING`,
[
'agent_testing_account_001',
TEST_USER.id,
TEST_USER.email,
'credential',
passwordHash,
now,
],
);
await client.query(
`INSERT INTO api_keys (id, name, key, key_hash, enabled, expires_at, user_id, workspace_id, created_at, updated_at)
VALUES ($1, $2, $3, $4, $5, NULL, $6, NULL, $7, $7)
ON CONFLICT (id) DO UPDATE
SET name = EXCLUDED.name,
key = EXCLUDED.key,
key_hash = EXCLUDED.key_hash,
enabled = EXCLUDED.enabled,
expires_at = NULL,
updated_at = EXCLUDED.updated_at`,
[
TEST_API_KEY.id,
TEST_API_KEY.name,
encryptedApiKey,
apiKeyHash,
true,
TEST_USER.id,
now,
],
);
const cliEnvFile = writeCliEnvFile();
console.log('seeded baseline user:');
console.log(` email: ${TEST_USER.email}`);
console.log(` password: ${TEST_USER.password}`);
console.log('seeded baseline API key:');
console.log(` LOBE_API_KEY: ${TEST_API_KEY.key}`);
console.log(` CLI env: ${cliEnvFile}`);
})()
.finally(() => client.end())
.catch((error) => {
console.error(error);
process.exit(1);
});
NODE
}
cmd_status() {
apply_env
echo "agent-testing local dev env:"
note "APP_URL=$APP_URL"
note "AGENT_RUNTIME_MODE=$AGENT_RUNTIME_MODE"
note "DATABASE_URL=$DATABASE_URL"
note "PORT=$PORT"
note "QSTASH_URL=$QSTASH_URL"
note "REDIS_URL=$REDIS_URL"
if command -v docker > /dev/null 2>&1; then
ok "docker CLI available"
if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
ok "managed Postgres running: $DB_CONTAINER"
else
note "managed Postgres is not running: $DB_CONTAINER"
fi
if docker ps --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
ok "managed Redis running: $REDIS_CONTAINER"
else
note "managed Redis is not running: $REDIS_CONTAINER"
fi
else
bad "docker CLI is not available"
fi
}
cmd_qstash() {
apply_env
cd "$REPO_ROOT"
note "starting local QStash dev server at $QSTASH_URL"
note "keep this process running while testing workflow paths"
exec pnpm run qstash -- -port "$QSTASH_DEV_PORT"
}
cmd_dev_next() {
apply_env
cd "$REPO_ROOT"
exec pnpm run dev:next
}
cmd_dev() {
apply_env
cd "$REPO_ROOT"
exec bun run dev
}
cmd_clean_db() {
require_docker
if docker ps --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
docker stop "$DB_CONTAINER" > /dev/null
fi
if docker ps -a --format '{{.Names}}' | grep -Fxq "$DB_CONTAINER"; then
docker rm "$DB_CONTAINER" > /dev/null
ok "removed Postgres container: $DB_CONTAINER"
else
note "Postgres container not found: $DB_CONTAINER"
fi
if docker ps --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
docker stop "$REDIS_CONTAINER" > /dev/null
fi
if docker ps -a --format '{{.Names}}' | grep -Fxq "$REDIS_CONTAINER"; then
docker rm "$REDIS_CONTAINER" > /dev/null
ok "removed Redis container: $REDIS_CONTAINER"
else
note "Redis container not found: $REDIS_CONTAINER"
fi
}
usage() {
sed -n '3,24p' "$0" >&2
}
COMMAND="${1:-status}"
case "$COMMAND" in
help|-h|--help) usage; exit 0 ;;
*) guard_no_root_env ;;
esac
case "$COMMAND" in
env) print_env ;;
write) shift; write_env "${1:-}" ;;
setup-db)
start_db
start_redis
migrate_db
;;
migrate) migrate_db ;;
seed-user) seed_user ;;
qstash) cmd_qstash ;;
dev-next) cmd_dev_next ;;
dev) cmd_dev ;;
clean-db) cmd_clean_db ;;
status) cmd_status ;;
*)
usage
exit 2
;;
esac
+61
View File
@@ -0,0 +1,61 @@
#!/usr/bin/env bash
# record-gif.sh — capture a frame sequence via agent-browser (CDP) and
# synthesize a GIF for embedding in a test report.
#
# Use this whenever the asserted behavior is about CHANGE OVER TIME —
# streaming output, a ticking timer, loading states, animations. A static
# screenshot cannot prove those; a GIF can. Cloud-portable: frames come from
# CDP rendering, no OS-level screen capture.
#
# Usage:
# record-gif.sh <output.gif> <duration_seconds> [fps]
#
# AB_TARGET="--cdp 9222" # Electron (default; CDP_PORT honored)
# AB_TARGET="--session lobehub-dev" # web agent-browser session
# GIF_WIDTH=960 # output width (px), default 960
#
# Requires ffmpeg (`brew install ffmpeg`). Effective fps is capped by
# screenshot latency (~0.3-0.5s per frame); 1-2 fps is the realistic range.
#
# Example — record a 12s run and embed it in the report:
# ./record-gif.sh "$DIR/assets/case2-tray-running.gif" 12 2 &
# GIF_PID=$!
# # ... trigger the streaming behavior ...
# wait $GIF_PID
set -euo pipefail
OUT="${1:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
DUR="${2:?Usage: record-gif.sh <output.gif> <duration_seconds> [fps]}"
FPS="${3:-2}"
AB_TARGET="${AB_TARGET:---cdp ${CDP_PORT:-9222}}"
GIF_WIDTH="${GIF_WIDTH:-960}"
command -v ffmpeg > /dev/null || {
echo "ffmpeg not found — install with: brew install ffmpeg" >&2
exit 1
}
TMP=$(mktemp -d)
trap 'rm -rf "$TMP"' EXIT
FRAMES=$((DUR * FPS))
INTERVAL=$(python3 -c "print(1 / $FPS)")
for i in $(seq -f '%04g' 1 "$FRAMES"); do
# shellcheck disable=SC2086
agent-browser $AB_TARGET screenshot "$TMP/frame-$i.png" > /dev/null 2>&1 || true
sleep "$INTERVAL"
done
CAPTURED=$(find "$TMP" -name 'frame-*.png' | wc -l | tr -d ' ')
[ "$CAPTURED" -gt 0 ] || {
echo "no frames captured — is the app reachable via $AB_TARGET?" >&2
exit 1
}
ffmpeg -y -loglevel error -framerate "$FPS" -pattern_type glob -i "$TMP/frame-*.png" \
-vf "fps=$FPS,scale=$GIF_WIDTH:-1:flags=lanczos,split[s0][s1];[s0]palettegen[p];[s1][p]paletteuse" \
"$OUT"
echo "$OUT ($CAPTURED frames @ ${FPS}fps)"
+88
View File
@@ -0,0 +1,88 @@
#!/usr/bin/env bash
# report-init.sh — scaffold a structured test report under .records/reports/.
#
# Format spec and evidence rules: ../references/report.md
#
# Usage:
# report-init.sh <slug> [title]
#
# Prints the report directory path (capture it: DIR=$(report-init.sh my-test)).
set -euo pipefail
SLUG="${1:?Usage: report-init.sh <slug> [title]}"
TITLE="${2:-$SLUG}"
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
TS="$(date +%Y%m%d-%H%M%S)"
DIR="$REPO_ROOT/.records/reports/$TS-$SLUG"
mkdir -p "$DIR/assets"
BRANCH=$(git -C "$REPO_ROOT" branch --show-current 2> /dev/null || echo "unknown")
COMMIT=$(git -C "$REPO_ROOT" rev-parse --short HEAD 2> /dev/null || echo "unknown")
DATE_HUMAN=$(date '+%Y-%m-%d %H:%M')
DATE_ISO=$(date '+%Y-%m-%dT%H:%M:%S%z')
cat > "$DIR/report.md" << EOF
# 测试报告:$TITLE
## 范围
<!-- 测试目标 / 变更范围 / 重点风险 -->
- 分支:\`$BRANCH\`
- 当前提交:\`$COMMIT\`
- 日期:$DATE_HUMAN
- 表面:<!-- CLI / Electron + CDP / Web / Bot:<platform> -->
- 测试页 / 入口:<!-- e.g. /settings or http://localhost:3010 -->
- 重点:<!-- 本轮最关心的体验、功能或回归点 -->
## 用例
| # | 用例 | 结果 | 关键现象 | 证据 |
| - | ---- | ---- | -------- | ---- |
| 1 | | 待测 | | ![用例 1](assets/case1.png) |
## 结论
整体结论:\`pending\`。
<!-- 用 1-2 段概括用户最需要知道的结果;失败和阻塞必须明确说明影响。 -->
仍需处理 / 跟进:
- <!-- TODO -->
## 本轮验证
<!-- 如有自动化或命令行验证,保留精简命令与结果;没有则写“未运行额外自动化验证”。 -->
\`\`\`bash
# command
\`\`\`
结果:
- <!-- TODO -->
## 评分
- 通过:0
- 失败:0
- 阻塞:0
- 评分:— / 100
EOF
cat > "$DIR/result.json" << EOF
{
"title": "$TITLE",
"createdAt": "$DATE_ISO",
"branch": "$BRANCH",
"commit": "$COMMIT",
"surfaces": [],
"cases": [],
"summary": { "total": 0, "passed": 0, "failed": 0, "blocked": 0, "verdict": "pending" }
}
EOF
echo "$DIR"
+553
View File
@@ -0,0 +1,553 @@
#!/usr/bin/env bash
# setup-auth.sh — one-stop auth setup & check for local agent testing.
#
# Auth is the gate for all automated testing: prepare it BEFORE writing any
# test step. Background and failure modes: ../references/auth.md
#
# Usage:
# setup-auth.sh status # check server + CLI + web + Electron readiness
# setup-auth.sh status --surface web # check only the Web surface gate
# setup-auth.sh cli-seed # configure CLI API-key auth from seeded local env
# setup-auth.sh cli # interactive CLI device-code login (run by a human)
# setup-auth.sh open-chrome # open SERVER_URL in Chrome and show DevTools
# setup-auth.sh web-seed # sign in seeded user and inject cookies automatically
# setup-auth.sh web # stdin = Cookie header -> inject into agent-browser session
# setup-auth.sh web-verify # live-check the agent-browser session is authenticated
#
# Env:
# SERVER_URL (default from test-env.sh) dev server under test
# SESSION (default lobehub-dev) agent-browser session name
# AUTH_DIR (default ~/.lobehub-agent-testing) where web state is persisted
# SEED_EMAIL / SEED_PASSWORD seeded better-auth login
set -euo pipefail
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../.." && pwd)"
workspace_root_for_port() {
local root="$REPO_ROOT"
local name
name="$(basename "$root")"
if [[ "$name" == "lobehub" ]]; then
local parent
parent="$(cd "$root/.." && pwd)"
local parent_name
parent_name="$(basename "$parent")"
if [[ "$parent_name" == lobehub-cloud* ]]; then
root="$parent"
fi
fi
printf '%s\n' "$root"
}
default_server_url() {
local env_resolver resolved
env_resolver="$(dirname "${BASH_SOURCE[0]}")/test-env.sh"
if [[ -x "$env_resolver" ]]; then
resolved="$("$env_resolver" --value SERVER_URL 2> /dev/null || true)"
if [[ -n "$resolved" ]]; then
printf '%s\n' "$resolved"
return 0
fi
fi
local root name suffix port
root="$(workspace_root_for_port)"
name="$(basename "$root")"
case "$name" in
lobehub-cloud)
port=3020
;;
lobehub-cloud-*)
suffix="${name#lobehub-cloud-}"
if [[ "$suffix" =~ ^[0-9]+$ ]]; then
port=$((3020 + 10#$suffix))
else
port=3010
fi
;;
*)
port=3010
;;
esac
printf 'http://localhost:%s\n' "$port"
}
SERVER_URL="${SERVER_URL:-$(default_server_url)}"
SESSION="${SESSION:-lobehub-dev}"
AUTH_DIR="${AUTH_DIR:-$HOME/.lobehub-agent-testing}"
STATE_FILE="$AUTH_DIR/web-state.json"
CLI_HOME_NAME="${LOBEHUB_CLI_HOME:-.lobehub-dev}"
CLI_HOME="$HOME/${CLI_HOME_NAME#/}"
CLI_CREDENTIALS_FILE="$CLI_HOME/credentials.json"
SEED_EMAIL="${SEED_EMAIL:-agent-testing@lobehub.com}"
SEED_PASSWORD="${SEED_PASSWORD:-TestPassword123!}"
SEED_API_KEY="${SEED_API_KEY:-${AGENT_TESTING_API_KEY:-sk-lh-agenttesting0001}}"
CLI_ENV_FILE="${CLI_ENV_FILE:-$REPO_ROOT/.records/env/agent-testing-cli.env}"
ok() { printf ' \033[32m✔\033[0m %s\n' "$1"; }
bad() { printf ' \033[31m✘\033[0m %s\n' "$1"; }
note() { printf ' %s\n' "$1"; }
usage() {
cat << EOF
Usage:
$0 status [--surface all|cli|web|electron]
$0 cli-seed
$0 cli
$0 open-chrome [--dry-run]
$0 web-seed
$0 web
$0 web-verify
Env:
SERVER_URL=$SERVER_URL
SESSION=$SESSION
AUTH_DIR=$AUTH_DIR
SEED_EMAIL=$SEED_EMAIL
CLI_HOME=$CLI_HOME
EOF
}
check_server() {
local code
code=$(curl -s -o /dev/null -w '%{http_code}' "$SERVER_URL/" 2> /dev/null || true)
if [[ "$code" =~ ^[23] ]]; then
ok "dev server reachable at $SERVER_URL"
else
bad "dev server NOT reachable at $SERVER_URL (http_code='$code')"
note "start it: pnpm run dev:next (see references/dev-server.md)"
return 1
fi
}
check_cli() {
local api_key="${LOBEHUB_CLI_API_KEY:-${LOBE_API_KEY:-}}"
if [[ -n "$api_key" ]]; then
local body_file code
body_file="$(mktemp)"
code=$(curl -sS -o "$body_file" -w '%{http_code}' \
-H "Authorization: Bearer $api_key" \
"$SERVER_URL/api/v1/users/me?includeCount=0" 2> /dev/null || true)
if [[ "$code" =~ ^[23] ]]; then
rm -f "$body_file"
ok "CLI API-key auth valid for $SERVER_URL"
return 0
fi
bad "CLI API-key auth failed for $SERVER_URL (http_code='$code')"
note "seed the local API key first:"
note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user"
note "source $CLI_ENV_FILE"
rm -f "$body_file"
return 1
fi
if [[ -f "$CLI_HOME/settings.json" ]] && grep -q "$SERVER_URL" "$CLI_HOME/settings.json" && [[ -f "$CLI_CREDENTIALS_FILE" ]]; then
ok "CLI device-code credentials configured for $SERVER_URL (creds: $CLI_HOME)"
else
bad "CLI not logged in to $SERVER_URL"
note "automated path:"
note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user && source $CLI_ENV_FILE && $0 cli-seed"
note "interactive fallback:"
note "cd apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server $SERVER_URL"
return 1
fi
}
check_web() {
if [[ -f "$STATE_FILE" ]]; then
ok "web auth state saved ($STATE_FILE)"
else
bad "no web auth state for agent-browser"
note "for the seeded local user, run: $0 web-seed"
note "or copy the Cookie header from Chrome DevTools (Network tab), then:"
note "pbpaste | $0 web (see references/auth.md)"
return 1
fi
cmd_web_verify --skip-server-check
}
check_agent_browser() {
if command -v agent-browser > /dev/null 2>&1; then
ok "agent-browser available"
else
bad "agent-browser command not found"
note "install or expose agent-browser before Web/Electron UI testing"
return 1
fi
}
check_electron() {
local cdp_port="${CDP_PORT:-9222}"
if ! curl -s -o /dev/null --max-time 2 "http://localhost:$cdp_port/json/version" 2> /dev/null; then
note "electron: not running (CDP $cdp_port unreachable) — start with electron-dev.sh; check skipped"
return 0
fi
local probe result
probe="$(dirname "${BASH_SOURCE[0]}")/app-probe.sh"
result=$(bash "$probe" auth 2> /dev/null || true)
# agent-browser eval returns the JSON string with escaped quotes — normalize.
result="${result//\\/}"
if [[ "$result" == *'"isSignedIn":true'* ]]; then
ok "electron app signed in ($result)"
else
bad "electron app NOT signed in ($result)"
note "log in once manually inside the app (state persists across restarts)"
return 1
fi
}
cmd_status() {
local surface="all"
while [[ $# -gt 0 ]]; do
case "$1" in
--surface)
if [[ $# -lt 2 ]]; then
echo "--surface requires one of: all, cli, web, electron" >&2
return 2
fi
surface="${2:-}"
shift 2
;;
--surface=*)
surface="${1#*=}"
shift
;;
all|cli|web|electron)
surface="$1"
shift
;;
-h|--help)
usage
return 0
;;
*)
echo "unknown status option: $1" >&2
usage >&2
return 2
;;
esac
done
case "$surface" in
all|cli|web|electron) ;;
"")
echo "--surface requires one of: all, cli, web, electron" >&2
return 2
;;
*)
echo "unknown surface: $surface" >&2
usage >&2
return 2
;;
esac
echo "agent-testing auth status (surface=$surface, SERVER_URL=$SERVER_URL):"
local rc=0
case "$surface" in
all)
check_server || rc=1
check_cli || rc=1
check_web || rc=1
check_electron || rc=1
;;
cli)
check_server || rc=1
check_cli || rc=1
;;
web)
check_server || rc=1
check_web || rc=1
;;
electron)
check_electron || rc=1
;;
esac
if [[ $rc -eq 0 ]]; then
echo "$surface auth green — safe to start automated testing on this surface."
else
echo "$surface auth NOT ready — fix the ✘ items before writing any test step."
fi
return $rc
}
cmd_cli() {
echo "Starting CLI device-code login against $SERVER_URL ..."
echo "(opens a browser authorization — must be run by a human in a terminal)"
cd "$REPO_ROOT/apps/cli"
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server "$SERVER_URL"
}
write_cli_seed_env() {
mkdir -p "$(dirname "$CLI_ENV_FILE")"
cat > "$CLI_ENV_FILE" << EOF
# Source this file before running LobeHub CLI agent tests.
# Generated by setup-auth.sh cli-seed
export LOBE_API_KEY=$SEED_API_KEY
export LOBEHUB_CLI_API_KEY="\${LOBE_API_KEY}"
export LOBEHUB_SERVER=$SERVER_URL
export LOBEHUB_CLI_HOME=.lobehub-dev
EOF
}
write_cli_settings() {
mkdir -p "$CLI_HOME"
python3 - "$CLI_HOME/settings.json" "$SERVER_URL" << 'PY'
import json
import os
import sys
path, server_url = sys.argv[1], sys.argv[2]
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
json.dump({"serverUrl": server_url}, f, indent=2)
f.write("\n")
os.chmod(path, 0o600)
PY
}
cmd_cli_seed() {
check_server || return 1
write_cli_seed_env
write_cli_settings
ok "wrote CLI seed env: $CLI_ENV_FILE"
note "source it before CLI commands: source $CLI_ENV_FILE"
note "settings saved at: $CLI_HOME/settings.json"
LOBE_API_KEY="$SEED_API_KEY" LOBEHUB_CLI_API_KEY="$SEED_API_KEY" check_cli
}
cmd_open_chrome() {
local mode="${1:-}"
if [[ "$mode" != "" && "$mode" != "--dry-run" ]]; then
echo "unknown open-chrome option: $mode" >&2
usage >&2
return 2
fi
if [[ "$mode" == "--dry-run" ]]; then
echo "would open Google Chrome at $SERVER_URL/"
echo "would press Cmd+Option+I to open DevTools"
echo "would open DevTools command menu and run 'Show Network'"
return 0
fi
if [[ "$(uname -s)" != "Darwin" ]]; then
bad "open-chrome is macOS-only"
note "open $SERVER_URL/ in your browser and open DevTools manually"
return 1
fi
if ! command -v osascript > /dev/null 2>&1; then
bad "osascript not found"
note "open $SERVER_URL/ in Chrome and press Cmd+Option+I manually"
return 1
fi
SERVER_URL="$SERVER_URL" osascript << 'OSA'
set targetUrl to (system attribute "SERVER_URL") & "/"
tell application "Google Chrome"
activate
if (count of windows) = 0 then
make new window
end if
tell front window to make new tab with properties {URL:targetUrl}
end tell
delay 1
tell application "System Events"
tell process "Google Chrome"
set frontmost to true
keystroke "i" using {command down, option down}
delay 1
keystroke "p" using {command down, shift down}
delay 0.2
keystroke "Show Network"
key code 36
end tell
end tell
OSA
ok "opened Chrome at $SERVER_URL/ and requested DevTools Network panel"
}
cookie_header_from_jar() {
local jar="$1"
awk '
BEGIN { first = 1 }
/^$/ { next }
/^#/ {
if ($0 !~ /^#HttpOnly_/) next
sub(/^#HttpOnly_/, "")
}
NF >= 7 {
if (!first) printf "; "
printf "%s=%s", $6, $7
first = 0
}
END {
if (!first) printf "\n"
}
' "$jar"
}
# Build a Playwright storageState file from a raw Cookie header on stdin,
# keeping only the better-auth cookies. See references/auth.md for why the
# header must come from a Network request (HttpOnly) and why httpOnly=false.
cmd_web() {
mkdir -p "$AUTH_DIR"
local raw
raw="$(cat)"
COOKIE_INPUT="$raw" python3 - "$STATE_FILE" << 'PY'
import json, os, sys, time
raw = os.environ.get("COOKIE_INPUT", "").strip()
cookie_lines = []
for line in raw.splitlines():
stripped = line.strip()
if not stripped:
continue
if stripped.lower().startswith("cookie:"):
cookie_lines.append(stripped.split(":", 1)[1].strip())
else:
cookie_lines.append(stripped)
raw = "; ".join(cookie_lines)
WANTED = {"better-auth.session_token", "better-auth.session_data", "better-auth.state"}
exp = int(time.time()) + 30 * 24 * 3600 # 30 days
cookies = []
for pair in raw.split(";"):
pair = pair.strip()
if "=" not in pair:
continue
name, _, value = pair.partition("=")
if name not in WANTED:
continue
cookies.append({
"name": name,
"value": value,
"domain": "localhost",
"path": "/",
"expires": exp,
"httpOnly": False,
"secure": False,
"sameSite": "Lax",
})
if not cookies:
sys.stderr.write("no better-auth cookies found in input — paste the raw Cookie header from a Network request\n")
sys.exit(1)
with open(sys.argv[1], "w") as f:
json.dump({"cookies": cookies, "origins": []}, f, indent=2)
print(f"wrote {len(cookies)} cookie(s) to {sys.argv[1]}")
PY
cmd_web_verify
}
cmd_web_seed() {
check_server || return 1
mkdir -p "$AUTH_DIR"
local cookie_jar="$AUTH_DIR/web-seed-cookie.jar"
local response_body="$AUTH_DIR/web-seed-response.json"
local payload code
payload="$(
SEED_EMAIL="$SEED_EMAIL" SEED_PASSWORD="$SEED_PASSWORD" python3 - << 'PY'
import json
import os
print(json.dumps({
"callbackURL": "/",
"email": os.environ["SEED_EMAIL"],
"password": os.environ["SEED_PASSWORD"],
}))
PY
)"
code=$(curl -sS -o "$response_body" -w '%{http_code}' \
-c "$cookie_jar" \
-H 'Content-Type: application/json' \
-X POST "$SERVER_URL/api/auth/sign-in/email" \
--data "$payload" 2> /dev/null || true)
if [[ ! "$code" =~ ^[23] ]]; then
bad "seed user sign-in failed at $SERVER_URL/api/auth/sign-in/email (http_code='$code')"
note "make sure the seed user exists:"
note "./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user"
return 1
fi
local cookie_header
cookie_header="$(cookie_header_from_jar "$cookie_jar")"
if [[ -z "$cookie_header" ]]; then
bad "seed sign-in succeeded but no cookies were written to $cookie_jar"
return 1
fi
printf '%s\n' "$cookie_header" | cmd_web
}
cmd_web_verify() {
local skip_server_check="${1:-}"
if [[ "$skip_server_check" != "--skip-server-check" ]]; then
check_server || return 1
fi
if [[ ! -f "$STATE_FILE" ]]; then
bad "no web auth state for agent-browser"
note "for the seeded local user, run: $0 web-seed"
note "or copy the Cookie header from Chrome DevTools (Network tab), then:"
note "pbpaste | $0 web"
return 1
fi
check_agent_browser || return 1
if ! agent-browser --session "$SESSION" state load "$STATE_FILE" > /dev/null; then
bad "failed to load web auth state into agent-browser session '$SESSION'"
return 1
fi
if ! agent-browser --session "$SESSION" open "$SERVER_URL/" > /dev/null; then
bad "failed to open $SERVER_URL in agent-browser session '$SESSION'"
return 1
fi
local url
url=$(agent-browser --session "$SESSION" get url 2> /dev/null || true)
if [[ -z "$url" ]]; then
bad "agent-browser session '$SESSION' did not report a current URL"
return 1
fi
if [[ "$url" == *"/signin"* || "$url" == *"/login"* ]]; then
bad "agent-browser session '$SESSION' NOT authenticated (landed on $url)"
note "re-copy the Cookie header and re-run: pbpaste | $0 web"
return 1
fi
ok "agent-browser session '$SESSION' authenticated (at $url)"
}
case "${1:-status}" in
status)
shift || true
cmd_status "$@"
;;
cli-seed) cmd_cli_seed ;;
cli) cmd_cli ;;
open-chrome)
shift || true
cmd_open_chrome "$@"
;;
web-seed) cmd_web_seed ;;
web) cmd_web ;;
web-verify) cmd_web_verify ;;
-h|--help) usage ;;
*)
echo "Usage: $0 {status|cli-seed|cli|open-chrome|web-seed|web|web-verify}" >&2
exit 2
;;
esac
+197
View File
@@ -0,0 +1,197 @@
#!/usr/bin/env bash
# Smoke tests for setup-auth.sh. Uses a temporary agent-browser stub and local
# HTTP server, so it does not need real browser auth.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT="$SCRIPT_DIR/setup-auth.sh"
fail() {
echo "FAIL: $*" >&2
exit 1
}
assert_contains() {
local file="$1"
local text="$2"
grep -Fq "$text" "$file" || fail "expected '$text' in $file"
}
tmp_dir="$(mktemp -d)"
server_pid=""
cleanup() {
if [[ -n "$server_pid" ]]; then
kill "$server_pid" > /dev/null 2>&1 || true
wait "$server_pid" > /dev/null 2>&1 || true
fi
rm -rf "$tmp_dir"
}
trap cleanup EXIT
export HOME="$tmp_dir/home"
port="$(python3 - << 'PY'
import socket
sock = socket.socket()
sock.bind(("127.0.0.1", 0))
print(sock.getsockname()[1])
sock.close()
PY
)"
python3 - "$port" << 'PY' > "$tmp_dir/http.log" 2>&1 &
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import sys
class Handler(BaseHTTPRequestHandler):
def do_GET(self):
if self.path.startswith("/api/v1/users/me"):
if self.headers.get("authorization") != "Bearer sk-lh-agenttesting0001":
self.send_response(401)
self.end_headers()
self.wfile.write(b'{"success":false}')
return
self.send_response(200)
self.send_header("Content-Type", "application/json")
self.end_headers()
self.wfile.write(b'{"success":true,"data":{"id":"user_agent_testing_001"}}')
return
self.send_response(200)
self.end_headers()
self.wfile.write(b"ok")
def do_POST(self):
length = int(self.headers.get("content-length") or "0")
if length:
self.rfile.read(length)
if self.path != "/api/auth/sign-in/email":
self.send_response(404)
self.end_headers()
return
self.send_response(200)
self.send_header(
"Set-Cookie",
"better-auth.session_token=seed.token; Path=/; HttpOnly; SameSite=Lax",
)
self.send_header(
"Set-Cookie",
"better-auth.session_data=seed.data; Path=/; HttpOnly; SameSite=Lax",
)
self.send_header("Content-Type", "application/json")
self.end_headers()
self.wfile.write(b'{"ok":true}')
def log_message(self, format, *args):
return
ThreadingHTTPServer(("localhost", int(sys.argv[1])), Handler).serve_forever()
PY
server_pid="$!"
server_url="http://localhost:$port"
for _ in {1..50}; do
if curl -s -o /dev/null "$server_url/"; then
break
fi
sleep 0.1
done
curl -s -o /dev/null "$server_url/" || fail "test HTTP server did not start"
mkdir -p "$tmp_dir/bin" "$tmp_dir/auth"
cat > "$tmp_dir/bin/agent-browser" << 'SH'
#!/usr/bin/env bash
set -euo pipefail
if [[ "${1:-}" == "--session" ]]; then
shift 2
fi
case "${1:-}" in
state)
[[ "${2:-}" == "load" ]] || exit 2
[[ -f "${3:-}" ]] || exit 1
;;
open)
printf '%s\n' "${2:-}" > "${AGENT_BROWSER_URL_FILE:?}"
;;
get)
[[ "${2:-}" == "url" ]] || exit 2
cat "${AGENT_BROWSER_URL_FILE:?}"
;;
*)
echo "unexpected agent-browser command: $*" >&2
exit 2
;;
esac
SH
chmod +x "$tmp_dir/bin/agent-browser"
export PATH="$tmp_dir/bin:$PATH"
export AUTH_DIR="$tmp_dir/auth"
export SESSION="setup-auth-test"
export SERVER_URL="$server_url"
export AGENT_BROWSER_URL_FILE="$tmp_dir/current-url"
cookie_header="Cookie: foo=bar; better-auth.session_token=test.token; better-auth.session_data=encoded%3D; theme=dark"
printf '%s\n' "$cookie_header" | "$SCRIPT" web > "$tmp_dir/web.out"
python3 - "$AUTH_DIR/web-state.json" << 'PY'
import json, sys
with open(sys.argv[1]) as f:
state = json.load(f)
names = {cookie["name"] for cookie in state["cookies"]}
expected = {"better-auth.session_token", "better-auth.session_data"}
if names != expected:
raise SystemExit(f"unexpected cookies: {sorted(names)}")
PY
"$SCRIPT" web-seed > "$tmp_dir/web-seed.out"
python3 - "$AUTH_DIR/web-state.json" << 'PY'
import json, sys
with open(sys.argv[1]) as f:
state = json.load(f)
values = {cookie["name"]: cookie["value"] for cookie in state["cookies"]}
expected = {
"better-auth.session_token": "seed.token",
"better-auth.session_data": "seed.data",
}
if values != expected:
raise SystemExit(f"unexpected seeded cookies: {values}")
PY
"$SCRIPT" status --surface web > "$tmp_dir/status.out"
assert_contains "$tmp_dir/status.out" "surface=web"
assert_contains "$tmp_dir/status.out" "web auth green"
"$SCRIPT" cli-seed > "$tmp_dir/cli-seed.out"
assert_contains "$tmp_dir/cli-seed.out" "CLI API-key auth valid"
assert_contains "$tmp_dir/cli-seed.out" "settings saved at: $HOME/.lobehub-dev/settings.json"
if "$SCRIPT" status --surface cli > "$tmp_dir/cli-no-env.out"; then
fail "cli status without API key unexpectedly passed"
fi
assert_contains "$tmp_dir/cli-no-env.out" "CLI not logged in"
LOBEHUB_CLI_API_KEY=sk-lh-agenttesting0001 "$SCRIPT" status --surface cli > "$tmp_dir/cli-status.out"
assert_contains "$tmp_dir/cli-status.out" "CLI API-key auth valid"
assert_contains "$tmp_dir/cli-status.out" "cli auth green"
if printf 'foo=bar\n' | "$SCRIPT" web > "$tmp_dir/invalid.out" 2> "$tmp_dir/invalid.err"; then
fail "invalid cookie unexpectedly passed"
fi
assert_contains "$tmp_dir/invalid.err" "no better-auth cookies found"
echo "setup-auth tests passed"
+377
View File
@@ -0,0 +1,377 @@
#!/usr/bin/env bash
# Print the resolved local test environment for agent-testing.
#
# This is intentionally read-only. It mirrors scripts/runWithEnv.mts precedence:
# .env -> .env.$NODE_ENV -> .env.local -> .env.$NODE_ENV.local, then shell env.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
NODE_ENV="${NODE_ENV:-development}"
VALUE_APP_URL=""
VALUE_PORT=""
VALUE_SERVER_URL=""
VALUE_AUTH_TRUSTED_ORIGINS=""
VALUE_SPA_PORT=""
VALUE_MOBILE_SPA_PORT=""
VALUE_DESKTOP_PORT=""
SOURCE_APP_URL=""
SOURCE_PORT=""
SOURCE_SERVER_URL=""
SOURCE_AUTH_TRUSTED_ORIGINS=""
SOURCE_SPA_PORT=""
SOURCE_MOBILE_SPA_PORT=""
SOURCE_DESKTOP_PORT=""
LOADED_ENV_FILES=""
keys() {
printf '%s\n' \
APP_URL \
PORT \
SERVER_URL \
AUTH_TRUSTED_ORIGINS \
SPA_PORT \
MOBILE_SPA_PORT \
DESKTOP_PORT
}
trim() {
local value="$1"
value="${value#"${value%%[![:space:]]*}"}"
value="${value%"${value##*[![:space:]]}"}"
printf '%s' "$value"
}
workspace_root() {
local root="$REPO_ROOT"
local name
name="$(basename "$root")"
if [[ "$name" == "lobehub" ]]; then
local parent parent_name
parent="$(cd "$root/.." && pwd)"
parent_name="$(basename "$parent")"
if [[ "$parent_name" == lobehub-cloud* ]]; then
root="$parent"
fi
fi
printf '%s\n' "$root"
}
workspace_offset() {
local name="$1"
case "$name" in
lobehub-cloud)
printf '0\n'
;;
lobehub-cloud-*)
local suffix="${name#lobehub-cloud-}"
if [[ "$suffix" =~ ^[0-9]+$ ]]; then
printf '%s\n' "$((10#$suffix))"
else
printf '\n'
fi
;;
*)
printf '\n'
;;
esac
}
default_port() {
local base="$1"
local fallback="$2"
local root name offset
root="$(workspace_root)"
name="$(basename "$root")"
offset="$(workspace_offset "$name")"
if [[ -n "$offset" ]]; then
printf '%s\n' "$((base + offset))"
else
printf '%s\n' "$fallback"
fi
}
url_port() {
local url="$1"
local hostport
hostport="${url#*://}"
hostport="${hostport%%/*}"
if [[ "$hostport" == *:* ]]; then
local port="${hostport##*:}"
if [[ "$port" =~ ^[0-9]+$ ]]; then
printf '%s\n' "$port"
return 0
fi
fi
return 1
}
url_origin() {
local url="$1"
local scheme rest hostport
if [[ "$url" == *"://"* ]]; then
scheme="${url%%://*}"
rest="${url#*://}"
hostport="${rest%%/*}"
printf '%s://%s\n' "$scheme" "$hostport"
else
printf '%s\n' "$url"
fi
}
set_value() {
local key="$1"
local value="$2"
local source="$3"
case "$key" in
APP_URL) VALUE_APP_URL="$value"; SOURCE_APP_URL="$source" ;;
PORT) VALUE_PORT="$value"; SOURCE_PORT="$source" ;;
SERVER_URL) VALUE_SERVER_URL="$value"; SOURCE_SERVER_URL="$source" ;;
AUTH_TRUSTED_ORIGINS) VALUE_AUTH_TRUSTED_ORIGINS="$value"; SOURCE_AUTH_TRUSTED_ORIGINS="$source" ;;
SPA_PORT) VALUE_SPA_PORT="$value"; SOURCE_SPA_PORT="$source" ;;
MOBILE_SPA_PORT) VALUE_MOBILE_SPA_PORT="$value"; SOURCE_MOBILE_SPA_PORT="$source" ;;
DESKTOP_PORT) VALUE_DESKTOP_PORT="$value"; SOURCE_DESKTOP_PORT="$source" ;;
esac
}
value_for() {
case "$1" in
APP_URL) printf '%s\n' "$VALUE_APP_URL" ;;
PORT) printf '%s\n' "$VALUE_PORT" ;;
SERVER_URL) printf '%s\n' "$VALUE_SERVER_URL" ;;
AUTH_TRUSTED_ORIGINS) printf '%s\n' "$VALUE_AUTH_TRUSTED_ORIGINS" ;;
SPA_PORT) printf '%s\n' "$VALUE_SPA_PORT" ;;
MOBILE_SPA_PORT) printf '%s\n' "$VALUE_MOBILE_SPA_PORT" ;;
DESKTOP_PORT) printf '%s\n' "$VALUE_DESKTOP_PORT" ;;
esac
}
source_for() {
case "$1" in
APP_URL) printf '%s\n' "$SOURCE_APP_URL" ;;
PORT) printf '%s\n' "$SOURCE_PORT" ;;
SERVER_URL) printf '%s\n' "$SOURCE_SERVER_URL" ;;
AUTH_TRUSTED_ORIGINS) printf '%s\n' "$SOURCE_AUTH_TRUSTED_ORIGINS" ;;
SPA_PORT) printf '%s\n' "$SOURCE_SPA_PORT" ;;
MOBILE_SPA_PORT) printf '%s\n' "$SOURCE_MOBILE_SPA_PORT" ;;
DESKTOP_PORT) printf '%s\n' "$SOURCE_DESKTOP_PORT" ;;
esac
}
is_tracked_key() {
case "$1" in
APP_URL|PORT|SERVER_URL|AUTH_TRUSTED_ORIGINS|SPA_PORT|MOBILE_SPA_PORT|DESKTOP_PORT) return 0 ;;
*) return 1 ;;
esac
}
parse_env_file() {
local file="$1"
local root="$2"
local label="${file#$root/}"
local line key value
[[ -f "$file" ]] || return 0
if [[ -z "$LOADED_ENV_FILES" ]]; then
LOADED_ENV_FILES="$label"
else
LOADED_ENV_FILES="$LOADED_ENV_FILES, $label"
fi
while IFS= read -r line || [[ -n "$line" ]]; do
line="$(trim "$line")"
[[ -z "$line" || "$line" == \#* ]] && continue
if [[ "$line" == export[[:space:]]* ]]; then
line="$(trim "${line#export}")"
fi
[[ "$line" == *=* ]] || continue
key="$(trim "${line%%=*}")"
value="$(trim "${line#*=}")"
is_tracked_key "$key" || continue
if [[ "$value" == \"*\" && "$value" == *\" && ${#value} -ge 2 ]]; then
value="${value:1:${#value}-2}"
elif [[ "$value" == \'* && "$value" == *\' && ${#value} -ge 2 ]]; then
value="${value:1:${#value}-2}"
fi
set_value "$key" "$value" "$label"
done < "$file"
}
apply_env_files() {
local root="$1"
parse_env_file "$root/.env" "$root"
parse_env_file "$root/.env.$NODE_ENV" "$root"
parse_env_file "$root/.env.local" "$root"
parse_env_file "$root/.env.$NODE_ENV.local" "$root"
}
apply_shell_overrides() {
local key value
while IFS= read -r key; do
if [[ -n "${!key+x}" ]]; then
value="${!key}"
set_value "$key" "$value" "shell"
fi
done < <(keys)
}
resolve_defaults() {
local app_port spa_port mobile_spa_port desktop_port
app_port="$(default_port 3020 3010)"
spa_port="$(default_port 9800 9876)"
mobile_spa_port="$(default_port 3810 3012)"
desktop_port="$(default_port 3030 3015)"
if [[ -z "$VALUE_APP_URL" ]]; then
set_value APP_URL "http://localhost:$app_port" "inferred"
fi
if [[ -z "$VALUE_PORT" ]]; then
if app_port="$(url_port "$VALUE_APP_URL")"; then
set_value PORT "$app_port" "inferred from APP_URL"
else
set_value PORT "$(default_port 3020 3010)" "inferred"
fi
fi
if [[ -z "$VALUE_SERVER_URL" ]]; then
set_value SERVER_URL "$VALUE_APP_URL" "from APP_URL"
fi
if [[ -z "$VALUE_SPA_PORT" ]]; then
set_value SPA_PORT "$spa_port" "inferred"
fi
if [[ -z "$VALUE_MOBILE_SPA_PORT" ]]; then
set_value MOBILE_SPA_PORT "$mobile_spa_port" "inferred"
fi
if [[ -z "$VALUE_DESKTOP_PORT" ]]; then
set_value DESKTOP_PORT "$desktop_port" "inferred"
fi
if [[ -z "$VALUE_AUTH_TRUSTED_ORIGINS" ]]; then
set_value AUTH_TRUSTED_ORIGINS "$(url_origin "$VALUE_APP_URL"),http://localhost:$VALUE_SPA_PORT" "inferred"
fi
}
contains_origin() {
local list="$1"
local expected="$2"
local item
IFS=',' read -r -a items <<< "$list"
for item in "${items[@]}"; do
item="$(trim "$item")"
[[ "$item" == "$expected" ]] && return 0
done
return 1
}
print_exports() {
local key value
while IFS= read -r key; do
value="$(value_for "$key")"
printf 'export %s=%q\n' "$key" "$value"
done < <(keys)
}
print_value() {
local key="$1"
if ! is_tracked_key "$key"; then
echo "unknown key: $key" >&2
exit 2
fi
value_for "$key"
}
print_human() {
local root="$1"
local key value source
echo "agent-testing test env:"
printf ' workspace: %s\n' "$root"
printf ' NODE_ENV: %s\n' "$NODE_ENV"
printf ' env files: %s\n' "${LOADED_ENV_FILES:-none}"
echo
echo "resolved values:"
while IFS= read -r key; do
value="$(value_for "$key")"
source="$(source_for "$key")"
printf ' %-22s %s (%s)\n' "$key=$value" "" "$source"
done < <(keys)
echo
echo "checks:"
local app_origin spa_origin app_port
app_origin="$(url_origin "$VALUE_APP_URL")"
spa_origin="http://localhost:$VALUE_SPA_PORT"
if app_port="$(url_port "$VALUE_APP_URL")" && [[ "$app_port" == "$VALUE_PORT" ]]; then
printf ' OK PORT matches APP_URL (%s)\n' "$VALUE_PORT"
else
printf ' WARN PORT (%s) does not match APP_URL (%s)\n' "$VALUE_PORT" "$VALUE_APP_URL"
fi
if contains_origin "$VALUE_AUTH_TRUSTED_ORIGINS" "$app_origin"; then
printf ' OK AUTH_TRUSTED_ORIGINS includes %s\n' "$app_origin"
else
printf ' WARN AUTH_TRUSTED_ORIGINS is missing %s\n' "$app_origin"
fi
if contains_origin "$VALUE_AUTH_TRUSTED_ORIGINS" "$spa_origin"; then
printf ' OK AUTH_TRUSTED_ORIGINS includes %s\n' "$spa_origin"
else
printf ' WARN AUTH_TRUSTED_ORIGINS is missing %s\n' "$spa_origin"
fi
}
usage() {
cat << EOF
Usage:
$0 # print resolved test environment
$0 --exports # print source-able export lines
$0 --value KEY # print one resolved value
Tracked keys:
APP_URL PORT SERVER_URL AUTH_TRUSTED_ORIGINS SPA_PORT MOBILE_SPA_PORT DESKTOP_PORT
EOF
}
ROOT="$(workspace_root)"
apply_env_files "$ROOT"
apply_shell_overrides
resolve_defaults
case "${1:-}" in
"")
print_human "$ROOT"
;;
--exports)
print_exports
;;
--value)
print_value "${2:-}"
;;
-h|--help)
usage
;;
*)
echo "unknown option: $1" >&2
usage >&2
exit 2
;;
esac
+57
View File
@@ -0,0 +1,57 @@
#!/usr/bin/env bash
# Smoke tests for test-env.sh.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
fail() {
echo "FAIL: $*" >&2
exit 1
}
assert_eq() {
local actual="$1"
local expected="$2"
[[ "$actual" == "$expected" ]] || fail "expected '$expected', got '$actual'"
}
assert_contains() {
local file="$1"
local text="$2"
grep -Fq "$text" "$file" || fail "expected '$text' in $file"
}
tmp_dir="$(mktemp -d)"
trap 'rm -rf "$tmp_dir"' EXIT
mkdir -p "$tmp_dir/lobehub-cloud-1/.agents/skills" "$tmp_dir/lobehub/.agents/skills"
ln -s "$SCRIPT_DIR/.." "$tmp_dir/lobehub-cloud-1/.agents/skills/agent-testing"
ln -s "$SCRIPT_DIR/.." "$tmp_dir/lobehub/.agents/skills/agent-testing"
cloud_script="$tmp_dir/lobehub-cloud-1/.agents/skills/agent-testing/scripts/test-env.sh"
oss_script="$tmp_dir/lobehub/.agents/skills/agent-testing/scripts/test-env.sh"
assert_eq "$("$cloud_script" --value SERVER_URL)" "http://localhost:3021"
assert_eq "$("$cloud_script" --value SPA_PORT)" "9801"
assert_eq "$("$cloud_script" --value MOBILE_SPA_PORT)" "3811"
assert_eq "$("$cloud_script" --value DESKTOP_PORT)" "3031"
assert_eq "$("$oss_script" --value SERVER_URL)" "http://localhost:3010"
cat > "$tmp_dir/lobehub-cloud-1/.env" << 'EOF'
APP_URL=http://localhost:4123
PORT=4123
AUTH_TRUSTED_ORIGINS=http://localhost:4123,http://localhost:9823
SPA_PORT=9823
MOBILE_SPA_PORT=3823
DESKTOP_PORT=3043
EOF
assert_eq "$("$cloud_script" --value SERVER_URL)" "http://localhost:4123"
assert_eq "$("$cloud_script" --value SPA_PORT)" "9823"
"$cloud_script" --exports > "$tmp_dir/exports.out"
assert_contains "$tmp_dir/exports.out" "export APP_URL=http://localhost:4123"
assert_contains "$tmp_dir/exports.out" "export SERVER_URL=http://localhost:4123"
assert_contains "$tmp_dir/exports.out" "export AUTH_TRUSTED_ORIGINS=http://localhost:4123\\,http://localhost:9823"
echo "test-env tests passed"
+154
View File
@@ -0,0 +1,154 @@
# Electron (LobeHub Desktop) UI Testing
Default surface for verifying **pure frontend changes** (components, store logic, styles, interactions) in the primary product shape. Drives the Electron renderer over CDP with `agent-browser` — see [../references/agent-browser.md](../references/agent-browser.md) for the full command reference.
**Auth**: the Electron app keeps its own persistent login state — log in once manually in the app; sessions survive restarts. Run `../scripts/setup-auth.sh status` before testing (see [../references/auth.md](../references/auth.md)).
**Linux / headless (cloud)**: Electron itself runs on Linux, but it has no true headless mode — it needs a display server. In a headless environment wrap the launch with `xvfb-run` (virtual framebuffer). Everything CDP-based keeps working under Xvfb: the `agent-browser --cdp 9222` connection, snapshots, eval, and `agent-browser screenshot` (captured from the renderer via CDP, not the OS screen). What does NOT work on Linux: `capture-app-window.sh` (macOS `screencapture`), osascript, and the ffmpeg recording scripts in their current form.
### Setup / Teardown
Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
```bash
SCRIPT=".agents/skills/agent-testing/scripts/electron-dev.sh"
# Start Electron dev with CDP (idempotent — skips if already running)
$SCRIPT start
# Check if Electron is running and CDP is reachable
$SCRIPT status
# Kill all Electron-related processes (main + helper + vite)
$SCRIPT stop
# Force fresh restart
$SCRIPT restart
```
After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
**Always run `$SCRIPT stop` when done testing**`pkill -f "Electron"` alone won't catch all helper processes.
#### Environment Variables
| Variable | Default | Description |
| ----------------- | ----------------------- | ---------------------------------------- |
| `CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `ELECTRON_LOG` | `/tmp/electron-dev.log` | Electron process log |
| `ELECTRON_WAIT_S` | `60` | Max seconds to wait for Electron process |
| `RENDERER_WAIT_S` | `60` | Max seconds to wait for SPA to load |
### LobeHub Probes & Quick Navigation
`scripts/app-probe.sh` is the standard fast path into app state — **use it
instead of hand-rolling `__LOBE_STORES` eval snippets** for these common needs:
```bash
PROBE=".agents/skills/agent-testing/scripts/app-probe.sh"
$PROBE auth # login check (Step 0.3) → { isSignedIn, userId }
$PROBE route # current SPA route
$PROBE ops # running chat operations (type / startTime)
$PROBE goto /settings # jump the SPA straight to a route (full reload)
$PROBE errors-install # install console.error interceptor
$PROBE errors # dump captured errors
```
`goto` lets a test enter the state under test directly instead of clicking
through the UI. Common desktop routes:
| Route | Where it lands |
| ----------------------------- | ------------------------------------ |
| `/` | Home (has a chat input) |
| `/agent/<agentId>` | Agent conversation (latest topic) |
| `/agent/<agentId>/<topicId>` | Specific topic in a conversation |
| `/task` · `/task/<taskId>` | Task list / task detail |
| `/page` | Documents (文稿) |
| `/settings` | Settings |
| `/community` | Discover / community |
Targets default to Electron (`--cdp 9222`); set `AB_TARGET="--session <name>"`
for web sessions. For deeper or one-off state inspection, fall back to raw
eval below.
### LobeHub-Specific Patterns
#### Access Zustand Store State
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
return JSON.stringify({
ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
activeAgent: chat.activeAgentId,
activeTopic: chat.activeTopicId,
});
})()
EVALEOF
```
#### Find and Use the Chat Input
```bash
# The chat input is contenteditable — must use -C flag
agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
agent-browser --cdp 9222 click @e48
agent-browser --cdp 9222 type @e48 "Hello world"
agent-browser --cdp 9222 press Enter
```
#### Wait for Agent to Complete
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
var running = ops.filter(function(o) { return o.status === 'running'; });
return running.length === 0 ? 'done' : 'running: ' + running.length;
})()
EVALEOF
```
#### Install Error Interceptor
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function() {
var msg = Array.from(arguments).map(function(a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
# Later, check captured errors:
agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
```
## Electron Gotchas
- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
- **Dev build auto-opens DevTools, which hijacks the CDP target** — `agent-browser --cdp 9222` may attach to the DevTools page (`devtools://…`) instead of the app (`app://renderer/`). Symptom: `get url` returns a `devtools://` URL. Fix: close the DevTools target and reconnect:
```bash
DT_ID=$(curl -s http://localhost:9222/json/list | python3 -c "import json,sys; ts=json.load(sys.stdin); print(next(t['id'] for t in ts if t['type']=='page' and t['url'].startswith('devtools://')))")
curl -s "http://localhost:9222/json/close/$DT_ID" > /dev/null
agent-browser close --all && agent-browser --cdp 9222 get url # expect app://renderer/
```
- **Don't resize the Electron window after load** — resizing triggers full SPA reload
- **Store is at `window.__LOBE_STORES`** not `window.__ZUSTAND_STORES__`
- **Streaming / ticking UI needs GIF evidence** — see `scripts/record-gif.sh`; a static screenshot cannot prove time-based behavior.
+78
View File
@@ -0,0 +1,78 @@
# Web (Full-Stack) Testing
Default surface for **full-stack changes** — a new/changed API plus the UI that
consumes it. The browser is the one surface where network requests and UI state
are observable together, so you can assert both sides of the contract in a
single run.
For pure-frontend changes prefer [electron.md](./electron.md); for
backend-only changes prefer [../cli/index.md](../cli/index.md).
## Prerequisites
- Complete [Step 0.0](../SKILL.md#00-resolve-the-current-test-environment) (resolve ports) and [Step -1](../SKILL.md#step--1--plan-approval-for-non-trivial-tests) (plan approval) first.
- Local dev server running — [../references/dev-server.md](../references/dev-server.md)
- Web auth verified in agent-browser — prefer `setup-auth.sh web-seed`, see [auth decision flow](../references/auth.md#web--decision-flow).
## Option A — agent-browser with seeded auth (recommended)
```bash
./.agents/skills/agent-testing/scripts/init-dev-env.sh seed-user
./.agents/skills/agent-testing/scripts/setup-auth.sh web-seed
```
Then drive the verified session:
```bash
SESSION=lobehub-dev
agent-browser --session $SESSION open "$SERVER_URL/"
agent-browser --session $SESSION snapshot -i
# interact via refs — full command reference: ../references/agent-browser.md
```
Use this session as the evidence source. Do not use ordinary Chrome screenshots
or Chrome Network records as proof for Web tests; ordinary Chrome is only a
fallback source for copying cookies into agent-browser when the seeded login is
not available.
### Watch the API while driving the UI
```bash
# After triggering the UI action under test:
agent-browser --session $SESSION network requests --type xhr,fetch
agent-browser --session $SESSION network requests --method POST
# Record a full HAR for the report
agent-browser --session $SESSION network har start
# ... drive the scenario ...
agent-browser --session $SESSION network har stop ./capture.har
```
Assert both layers: the request/response shape (network) and the rendered
result (snapshot/screenshot). Both belong in the report as evidence.
## Option B — real Chrome with remote debugging
For flows that need a real, visible browser (e.g. exercising the login UI
itself):
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-test-profile \
"<URL>" &
sleep 5
agent-browser --cdp 9222 snapshot -i
# Or auto-discover running Chrome with remote debugging
agent-browser --auto-connect snapshot -i
```
## Option C — Debug Proxy (local frontend, production backend)
`bun run dev:spa` prints a **Debug Proxy** URL
(`https://app.lobehub.com/_dangerous_local_dev_proxy?debug-host=…`) that loads
your local Vite SPA inside the online environment — HMR against real server
config. Useful for verifying frontend behavior against production data, **not**
for testing backend changes (the backend is production, not your branch).
+2 -2
View File
@@ -216,6 +216,6 @@ When using `--messages`, the output shows three sections (if context engine data
## Integration Points
- **Recording**: `src/server/services/agentRuntime/AgentRuntimeService.ts` — in the `executeStep()` method, after building `stepPresentationData`, writes partial snapshot in dev mode
- **Context engine capture**: `src/server/modules/AgentRuntime/RuntimeExecutors.ts` — in `call_llm` executor, after `serverMessagesEngine()` returns, calls `ctx.tracingContextEngine(input, output)`. `AgentRuntimeService.executeStep` buffers it per step and passes it to `traceRecorder.appendStep` as the typed `contextEngine` field (kept off the `events` array to stay out of Redis state).
- **Recording**: `apps/server/src/services/agentRuntime/AgentRuntimeService.ts` — in the `executeStep()` method, after building `stepPresentationData`, writes partial snapshot in dev mode
- **Context engine capture**: `apps/server/src/modules/AgentRuntime/RuntimeExecutors.ts` — in `call_llm` executor, after `serverMessagesEngine()` returns, calls `ctx.tracingContextEngine(input, output)`. `AgentRuntimeService.executeStep` buffers it per step and passes it to `traceRecorder.appendStep` as the typed `contextEngine` field (kept off the `events` array to stay out of Redis state).
- **Store**: `FileSnapshotStore` reads/writes to `.agent-tracing/` relative to `process.cwd()`
@@ -271,7 +271,7 @@ Lists in the same file you may need to touch:
- `defaultToolIds` — added to the agent's tool list by default
- `alwaysOnToolIds` — forced on regardless of user selection (use sparingly)
- `runtimeManagedToolIds` — enable state controlled by runtime, not user UI; **must mirror the rules map** in `src/server/modules/Mecha/AgentToolsEngine/index.ts` and `src/helpers/toolEngineering/index.ts`
- `runtimeManagedToolIds` — enable state controlled by runtime, not user UI; **must mirror the rules map** in `apps/server/src/modules/Mecha/AgentToolsEngine/index.ts` and `src/helpers/toolEngineering/index.ts`
---
-218
View File
@@ -1,218 +0,0 @@
---
name: cli-backend-testing
description: >
CLI + Backend integration testing workflow. Use when verifying backend API changes
(TRPC routers, services, models) via the LobeHub CLI against a local dev server.
Triggers on 'cli test', 'test with cli', 'verify with cli', 'local cli test',
'backend test with cli', or when needing to validate server-side changes end-to-end.
---
# CLI + Backend Integration Testing
Standard workflow for verifying backend changes using the LobeHub CLI (`lh`) against a local dev server.
## When to Use
- Verifying TRPC router / service / model changes end-to-end
- Testing new API fields or response structure changes
- Validating CLI command output after backend modifications
- Debugging data flow issues between server and CLI
## Prerequisites
| Requirement | Details |
| ------------ | ------------------------------------------------------------- |
| Dev server | `localhost:3011` (Next.js) |
| CLI source | `lobehub/apps/cli/` |
| CLI dev mode | Uses `LOBEHUB_CLI_HOME=.lobehub-dev` for isolated credentials |
| Auth | Device Code Flow login to local server |
## Quick Reference
All CLI dev commands run from `lobehub/apps/cli/`:
```bash
# Shorthand for all commands below
CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
```
## Workflow
### Step 1: Ensure Dev Server is Running
Check if the dev server is already running:
```bash
curl -s -o /dev/null -w '%{http_code}' http://localhost:3011/ 2> /dev/null
```
- **If reachable** (returns any HTTP status): server is running. Skip to Step 2.
- **If unreachable**: start the server:
```bash
# From cloud repo root
pnpm run dev:next
```
To **restart** (pick up server-side code changes):
```bash
lsof -ti:3011 | xargs kill
pnpm run dev:next
```
**Important:** Server-side code changes in the submodule (`lobehub/src/server/`, `lobehub/packages/`) require a server restart. Next.js hot-reload may not pick up changes in submodule packages.
### Step 2: Check CLI Authentication
Check if dev credentials already exist:
```bash
cat lobehub/apps/cli/.lobehub-dev/settings.json 2> /dev/null
```
- **If file exists and contains `"serverUrl": "http://localhost:3011"`**: already authenticated. Skip to Step 3.
- **If file missing or points to wrong server**: login is needed. Ask the user to run:
```bash
! cd lobehub/apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3011
```
> Login requires interactive browser authorization (OIDC Device Code Flow), so the user must run it themselves via `!` prefix. After login, credentials are saved to `lobehub/apps/cli/.lobehub-dev/` and persist across sessions.
### Step 3: Test with CLI Commands
CLI runs from source (`bun src/index.ts`), so CLI-side code changes take effect immediately without rebuilding.
```bash
cd lobehub/apps/cli
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts <command>
```
### Step 4: Clean Up Test Data
Delete any test data created during verification:
```bash
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts task delete < id > -y
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts agent delete < id > -y
```
## Common Testing Patterns
### Task System
```bash
# List tasks
$CLI task list
# Create test data with nesting
$CLI task create -n "Root Task" -i "Test instruction"
$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
# View task detail (tests getTaskDetail service)
$CLI task view T-1
# View task tree
$CLI task tree T-1
# Test lifecycle
$CLI task edit T-1 --status running
$CLI task comment T-1 -m "Test comment"
# Clean up
$CLI task delete T-1 -y
```
### Agent System
```bash
# List agents
$CLI agent list
# View agent detail
$CLI agent view <agent-id>
# Run agent (tests agent execution pipeline)
$CLI agent run <agent-id> -m "Test prompt"
```
### Document & Knowledge Base
```bash
# List documents
$CLI doc list
# Create and view
$CLI doc create -t "Test Doc" -c "Content here"
$CLI doc view <doc-id>
# Knowledge base
$CLI kb list
$CLI kb tree <kb-id>
```
### Model & Provider
```bash
# List models and providers
$CLI model list
$CLI provider list
# Test provider connectivity
$CLI provider test <provider-id>
```
## Dev-Test Cycle
The standard cycle for backend development:
```
1. Make code changes (service/model/router/type)
|
2. Run unit tests (fast feedback)
bunx vitest run --silent='passed-only' '<test-file>'
|
3. Restart dev server (if server-side changes)
lsof -ti:3011 | xargs kill && pnpm run dev:next
|
4. CLI verification (end-to-end)
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts <command>
|
5. Clean up test data
```
### When Server Restart is Needed
| Change Location | Restart? |
| ----------------------------------------- | -------- |
| `lobehub/src/server/` (routers, services) | Yes |
| `lobehub/packages/database/` (models) | Yes |
| `lobehub/packages/types/` | Yes |
| `lobehub/packages/prompts/` | Yes |
| `lobehub/apps/cli/` (CLI code) | No |
| `src/` (cloud overrides) | Yes |
### When Server Restart is NOT Needed
CLI runs from source via `bun src/index.ts`, so any changes to `lobehub/apps/cli/src/` take effect immediately on next command invocation.
## Troubleshooting
| Issue | Solution |
| --------------------------- | --------------------------------------------------------------------- |
| `No authentication found` | Run `login --server http://localhost:3011` |
| `UNAUTHORIZED` on API calls | Token expired; re-run login |
| `ECONNREFUSED` | Dev server not running; start with `pnpm run dev:next` |
| CLI shows old data/behavior | Server needs restart to pick up code changes |
| `EADDRINUSE` on port 3011 | Server already running; kill with `lsof -ti:3011 \| xargs kill` |
| Login opens wrong server | Must use `--server http://localhost:3011` flag (env var doesn't work) |
## Credential Isolation
| Mode | Credential Dir | Server |
| ---------- | -------------------------------- | ----------------- |
| Dev | `lobehub/apps/cli/.lobehub-dev/` | `localhost:3011` |
| Production | `~/.lobehub/` | `app.lobehub.com` |
The two environments are completely isolated. Dev mode credentials are gitignored.
+5 -5
View File
@@ -111,7 +111,7 @@ Generate video from text prompt. This is an async operation.
**Source**: `apps/cli/src/commands/generate/video.ts`
```bash
lh gen video "A cat playing piano" -m <model> -p <provider> [options]
lh gen video "A cat playing piano" -m < model > -p < provider > [options]
```
| Option | Description | Required |
@@ -259,13 +259,13 @@ Image and video generation use an async task pattern:
UUID from the `async_tasks` table, not `gen_xxx`
- Returns `{ status, error, generation }` (generation includes asset URLs on success)
- Before querying, calls `checkTimeoutTasks` which marks tasks as `error` if they have been
`pending` or `processing` for more than ~5 minutes (`ASYNC_TASK_TIMEOUT = 298s`)
`pending` or `processing` for more than \~5 minutes (`ASYNC_TASK_TIMEOUT = 298s`)
**Server routes**:
- `src/server/routers/lambda/image/index.ts` — image creation (uses `authedProcedure` + `serverDatabase`)
- `src/server/routers/lambda/video/index.ts` — video creation (uses `authedProcedure` + `serverDatabase`)
- `src/server/routers/lambda/generation.ts` — status checking
- `apps/server/src/routers/lambda/image/index.ts` — image creation (uses `authedProcedure` + `serverDatabase`)
- `apps/server/src/routers/lambda/video/index.ts` — video creation (uses `authedProcedure` + `serverDatabase`)
- `apps/server/src/routers/lambda/generation.ts` — status checking
- `packages/database/src/models/asyncTask.ts``AsyncTaskModel` including `checkTimeoutTasks`
**Note**: Image/video routes do NOT use the `keyVaults` middleware — they read API keys from the database via `initModelRuntimeFromDB` or `createAsyncCaller`.
+60
View File
@@ -6,6 +6,66 @@ user-invocable: false
# Database Migrations Guide
## Development-stage schema changes
Schema changes churn during feature development. When the schema changes before the migration has shipped, do not hand-edit the existing migration SQL to chase the new schema shape. Delete the draft migration artifacts added by this branch (SQL file, matching snapshot, and matching journal entry), then run the generator again and re-apply the normal migration review steps below.
For example, if this branch's draft migration is `0110_add_verify_tables_and_ai_infra_id`:
```bash
# 1. Delete the draft SQL and its snapshot
rm packages/database/migrations/0110_add_verify_tables_and_ai_infra_id.sql
rm packages/database/migrations/meta/0110_snapshot.json
# 2. Remove the matching 0110 entry from the journal's "entries" array
# packages/database/migrations/meta/_journal.json
# 3. Regenerate from the current schema
bun run db:generate
```
This keeps the generated SQL, snapshot, and journal aligned with the actual schema. Manual SQL edits are reserved for review-time hardening such as idempotent clauses, custom extension SQL, and meaningful filename/tag updates.
Before release, if a feature branch accumulated multiple development-only migrations, consolidate them into one migration when possible. Production does not need to replay every intermediate draft shape, and fewer migrations reduce deploy-time risk.
For example, if this branch added `0110`, `0111`, and `0112`, delete all three drafts and regenerate a single migration:
```bash
# 1. Delete every draft SQL and snapshot this branch added
rm packages/database/migrations/011{0,1,2}_*.sql
rm packages/database/migrations/meta/011{0,1,2}_snapshot.json
# 2. Remove the 0110/0111/0112 entries from the journal's "entries" array
# packages/database/migrations/meta/_journal.json
# 3. Regenerate one migration covering the full schema delta
bun run db:generate
```
Do not make a migration compatible with earlier development-only versions of the same branch. While the migration has not shipped, there is no production history to preserve. Fix local/dev databases directly with whatever SQL is simplest (drop the draft table, rename a column, delete draft rows), then regenerate the branch migration from the current schema.
For example, if an earlier draft on this branch created `signup_attempt_id` and you have since renamed it to `user_signup_log_id`, do not add a compatibility `ALTER ... RENAME` to the migration. Just fix the dev DB directly (see the `access-pg` skill for the `bun -e` + `pg` pattern), then regenerate:
```bash
# Fix the dev DB to match the new schema (simplest SQL wins)
set -a && source .env && set +a && bun -e '
import pg from "pg";
const client = new pg.Client({ connectionString: process.env.DATABASE_URL });
await client.connect();
await client.query("ALTER TABLE user_signup_logs DROP COLUMN signup_attempt_id");
await client.end();
'
# Regenerate so the migration reflects only the final shape
bun run db:generate
```
After a migration has reached production or the target default branch, treat it as immutable: add a follow-up migration instead of rewriting it.
## Rebase conflicts
When a rebase conflicts in migration files, keep the upstream/default-branch migrations and remove all migrations introduced by the current feature branch. Complete the rebase, then regenerate this branch's migration from the rebased schema. This avoids merging two independent snapshots or hand-splicing journal entries.
## Step 1: Generate Migrations
```bash
+1 -1
View File
@@ -57,7 +57,7 @@ process.env.DEBUG = 'lobe-*';
## Example
```typescript
// src/server/routers/edge/market/index.ts
// apps/server/src/routers/edge/market/index.ts
import debug from 'debug';
const log = debug('lobe-edge-router:market');
+183 -3
View File
@@ -6,16 +6,24 @@ user-invocable: false
# Drizzle ORM Schema Style Guide
> **Adding a Model or Repository?** Ship a sibling test in the same PR — every new
> file under `packages/database/src/models/**` or `src/repositories/**` needs a
> matching `__tests__/<name>.test.ts`. See the **testing** skill
> (`.agents/skills/testing/references/db-model-test.md`) for the `getTestDB()`
> integration pattern, user-isolation tests, the BM25 `describe.skipIf(!isServerDB)`
> guard, and schema gotchas. CI's coverage patch gate won't reliably catch a brand-new
> untested file, so this is on you.
## Configuration
- Config: `drizzle.config.ts`
- Schemas: `src/database/schemas/`
- Migrations: `src/database/migrations/`
- Schemas: `packages/database/src/schemas/`
- Migrations: `packages/database/migrations/`
- Dialect: `postgresql` with `strict: true`
## Helper Functions
Location: `src/database/schemas/_helpers.ts`
Location: `packages/database/src/schemas/_helpers.ts`
- `timestamptz(name)`: Timestamp with timezone
- `createdAt()`, `updatedAt()`, `accessedAt()`: Standard timestamp columns
@@ -25,16 +33,42 @@ Location: `src/database/schemas/_helpers.ts`
- **Tables**: Plural snake_case (`users`, `session_groups`)
- **Columns**: snake_case (`user_id`, `created_at`)
- **New tables**: Check nearby existing tables before naming a new one. Preserve
the established noun family and suffix. For example, if the user-scoped table
is `user_xxx_logs`, the workspace-scoped counterpart should be
`workspace_xxx_logs`, not `workspace_xxx_records` or another new synonym.
```typescript
// ✅ Good: follows the existing user/workspace table family.
export const userSignupLogs = pgTable('user_signup_logs', { ... });
export const workspaceSignupLogs = pgTable('workspace_signup_logs', { ... });
// ❌ Bad: introduces a new suffix for the same concept.
export const workspaceSignupRecords = pgTable('workspace_signup_records', { ... });
```
## Column Definitions
### Primary Keys
Do not use auto-incrementing primary keys (`serial`, `bigserial`, generated
identity columns). They create sequence-state problems during cross-database
migrations, restores, and data copy jobs. Prefer text IDs from application
generators (`idGenerator`, `createNanoId`) or `uuid` for internal tables.
Keep `$defaultFn(...)` when a table normally owns ID generation. Callers can
still pass an explicit `id`; the default only runs when the insert omits it. Do
not remove the default just because one flow needs to supply a request-scoped ID.
```typescript
// ✅ Good: app-generated text ID; explicit inserts can still override it.
id: text('id')
.primaryKey()
.$defaultFn(() => idGenerator('agents'))
.notNull(),
// ❌ Bad: sequence state is fragile across DB migrations and restores.
id: serial('id').primaryKey(),
```
ID prefixes make entity types distinguishable. For internal tables, use `uuid`.
@@ -53,6 +87,80 @@ userId: text('user_id')
...timestamps, // Spread from _helpers.ts
```
### Optional and Undefined Values
Do not introduce artificial sentinel strings for missing values, such as
`unknown`, unless the domain already has that explicit state and existing code
uses it consistently. Prefer nullable columns, optional TypeScript fields, or a
separate concrete status enum when the value is genuinely absent.
```typescript
// ✅ Good: absent until the final stage writes a real decision.
export type UserSignupLogFinalDecision = 'allow' | 'block' | 'error';
finalDecision: varchar('final_decision', { length: 32 }).$type<UserSignupLogFinalDecision>(),
// ❌ Bad: invents a new state that callers now need to handle everywhere.
export type UserSignupLogFinalDecision = 'allow' | 'block' | 'error' | 'unknown';
finalDecision: varchar('final_decision', { length: 32 })
.$type<UserSignupLogFinalDecision>()
.notNull()
.default('unknown');
```
### Field Descriptions
For columns whose meaning is not obvious from the name alone, add JSDoc on the
schema field. Include a concrete example when it clarifies the stored value or
the lifecycle moment that writes it. This is especially important for external
IDs, lifecycle statuses, denormalized snapshots, JSONB signals, and fields whose
name could mean either a request ID or a persisted row ID.
```typescript
// ✅ Good: explain the table's business object first, then only document
// non-obvious lifecycle or risk-control fields.
/**
* User signup logs - one row per signup flow, collecting stage-level
* risk-control decisions before and after the auth provider creates a user.
*/
export const userSignupLogs = pgTable('user_signup_logs', {
/** Final signup outcome reason, for example user_created, llm_block, or guard_error */
finalReason: text('final_reason'),
/** Aggregated risk level derived from stage decisions, for example block -> high */
riskLevel: varchar('risk_level', { length: 16 }).$type<UserSignupLogRiskLevel>(),
/** Ordered stage-level decisions and metadata grouped by signup review stage */
stageResults: jsonb('stage_results').$type<UserSignupLogStageResults>(),
});
// ❌ Bad: comments restate obvious column names without adding domain meaning.
/** User email */
email: text('email'),
```
### JSONB Types
Avoid `Record<string, unknown>` or similarly loose JSONB types for schema
columns. Define a concrete interface that describes the expected JSON shape, even
when most properties are optional. This keeps callers, migrations, and review
queries aligned on the same data contract.
```typescript
interface UserSignupLogMetadata {
payloadPath?: string;
requestPath?: string;
}
metadata: jsonb('metadata').$type<UserSignupLogMetadata>(),
```
```typescript
// ❌ Bad: hides the contract and makes downstream access untyped.
metadata: jsonb('metadata').$type<Record<string, unknown>>(),
```
### Indexes
```typescript
@@ -174,6 +282,78 @@ const rows = await this.db
.groupBy(agentEvalDatasets.id);
```
### Raw SQL and Advanced Queries
Prefer Drizzle builders whenever the query reads clearly with `select`,
`insert().select()`, `update().from()`, joins, CTEs, and `groupBy` — this keeps
table/column references tied to schema, so changes surface as TypeScript errors.
Within a builder, expression-level `sql<T>` is fine for features lacking a helper
(JSON path, casts, aggregates, `CASE`, `NOW()`). Row locks are clauses, not
expressions — use `.for('update')`, never raw `FOR UPDATE`.
Use `COALESCE` only when null-handling is part of required DB semantics (nullable
JSONB append/merge, "keep first non-null"). Don't scatter
`COALESCE(excluded.col, current.col)` across ordinary upsert scalars just to avoid
an update object — build `set` from defined values only, and hide any remaining
SQL behind named helpers (`appendJsonbArray`, `mergeJsonbObject`, `keepFirstValue`)
so the method reads as business intent, not SQL plumbing.
```typescript
// ✅ Scalars included only when present; SQL hidden behind a named helper.
const updateValues = compactUndefined({
email: record.email ?? undefined,
ip: record.ip ?? undefined,
});
await db.insert(userSignupLogs).values(values).onConflictDoUpdate({
set: { ...updateValues, stageResults: appendStageResult(stage, result), updatedAt: now },
target: userSignupLogs.id,
});
// ❌ Every scalar becomes SQL plumbing.
set: {
email: sql`COALESCE(excluded.email, ${userSignupLogs.email})`,
ip: sql`COALESCE(excluded.ip, ${userSignupLogs.ip})`,
}
```
When refactoring raw SQL:
- Preserve query shape on latency-sensitive paths. If raw SQL is one roundtrip,
don't split it into multiple depth-based queries just to drop `execute`.
- Use `$with(...)` + `insert().select()` / `update().from()` for multi-step
single-roundtrip writes Drizzle can express.
- Don't rely on `execute<MyRow>(sql...)` for safety — it types rows but doesn't keep
selected columns in sync with schema changes.
- If only a PostgreSQL feature Drizzle can't express works, keep the raw SQL and
tighten it: schema refs in interpolations, explicit user scope, a narrow row
interface, and regression tests.
Recursive CTEs are the canonical "keep raw" case — there's no clean `WITH RECURSIVE`
builder, and a rewrite would add depth-based roundtrips:
```typescript
interface TaskTreeRow {
id: string;
parent_task_id: string | null;
}
// execute<T> acceptable: no clean WITH RECURSIVE builder. Keep schema refs in the
// interpolations and scope every leg to the user.
const { rows } = await db.execute<TaskTreeRow>(sql`
WITH RECURSIVE task_tree AS (
SELECT ${tasks.id}, ${tasks.parentTaskId}
FROM ${tasks}
WHERE ${tasks.id} = ${rootTaskId} AND ${tasks.createdByUserId} = ${userId}
UNION ALL
SELECT ${tasks.id}, ${tasks.parentTaskId}
FROM ${tasks}
JOIN task_tree ON ${tasks.parentTaskId} = task_tree.id
WHERE ${tasks.createdByUserId} = ${userId}
)
SELECT * FROM task_tree
`);
```
### One-to-Many (Separate Queries)
When you need a parent record with its children, use two queries instead of relational `with:`:
@@ -241,6 +241,6 @@ When the bug comes from a real trace, distill it into the closest existing test
3. Add or update the narrowest failing test near the broken layer.
4. Fix the smallest layer that can explain the symptom.
5. Re-run focused tests.
6. Only then do an Electron smoke test with the `local-testing` skill if UI confirmation is still needed.
6. Only then do an Electron smoke test with the `agent-testing` skill if UI confirmation is still needed.
Do not start with a broad Electron repro if a raw trace or adapter test can prove the fault zone faster.
-561
View File
@@ -1,561 +0,0 @@
---
name: local-testing
description: >
Local app and bot testing. Uses agent-browser CLI for Electron/web app UI testing,
and osascript (AppleScript) for controlling native macOS apps (WeChat, Discord, Telegram, Slack, Lark/飞书, QQ)
to test bots. Triggers on 'local test', 'test in electron', 'test desktop', 'test bot',
'bot test', 'test in discord', 'test in telegram', 'test in slack', 'test in weixin',
'test in wechat', 'test in lark', 'test in feishu', 'test in qq',
'manual test', 'osascript', or UI/bot verification tasks.
---
# Local App & Bot Testing
Two approaches for local testing on macOS:
| Approach | Tool | Best For |
| --------------------------- | ------------------- | ---------------------------------------------------- |
| **agent-browser + CDP** | `agent-browser` CLI | Electron apps, web apps (DOM access, JS eval) |
| **osascript (AppleScript)** | `osascript -e` | Native macOS apps (WeChat, Discord, Telegram, Slack) |
---
# Part 1: agent-browser (Electron / Web Apps)
Use `agent-browser` to automate Chromium-based apps via Chrome DevTools Protocol.
Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome. Run `agent-browser upgrade` to update.
## Core Workflow
Every browser automation follows this pattern:
1. **Navigate**: `agent-browser open <url>`
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
3. **Interact**: Use refs to click, fill, select
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
```bash
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser snapshot -i # Check result
```
## Command Chaining
```bash
# Chain open + wait + snapshot in one call
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
```
Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
## Essential Commands
```bash
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
agent-browser close --all # Close all active sessions
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait --load networkidle # Wait for network idle
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Welcome" # Wait for text to appear
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network requests --type xhr,fetch # Filter by resource type
agent-browser network requests --method POST # Filter by HTTP method
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser pdf output.pdf # Save as PDF
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "text" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Dialogs (alert, confirm, prompt, beforeunload)
agent-browser dialog accept # Accept dialog
agent-browser dialog accept "input" # Accept prompt dialog with text
agent-browser dialog dismiss # Dismiss/cancel dialog
agent-browser dialog status # Check if dialog is open
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff screenshot --baseline before.png # Visual pixel diff
agent-browser diff url <url1> <url2> # Compare two pages
# Streaming
agent-browser stream enable # Start WebSocket streaming
agent-browser stream status # Inspect streaming state
agent-browser stream disable # Stop streaming
```
## Batch Execution
```bash
echo '[
["open", "https://example.com"],
["snapshot", "-i"],
["click", "@e1"],
["screenshot", "result.png"]
]' | agent-browser batch --json
```
## Authentication
```bash
# Option 1: Auth vault (credentials stored encrypted)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
# Option 2: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
agent-browser close # State auto-saved
agent-browser --session-name myapp open https://app.example.com/dashboard # Auto-restored
# Option 3: Persistent profile
agent-browser --profile ~/.myapp open https://app.example.com/login
# Option 4: State file
agent-browser state save auth.json
agent-browser state load auth.json
```
### LobeHub dev server — inject better-auth cookie
`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3011`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [references/agent-browser-login.md](./references/agent-browser-login.md) for the full recipe.
## Semantic Locators (Alternative to Refs)
```bash
agent-browser find text "Sign In" click
agent-browser find label "Email" fill "user@test.com"
agent-browser find role button click --name "Submit"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
```
## JavaScript Evaluation (eval)
```bash
# Simple expressions
agent-browser eval 'document.title'
# Complex JS: use --stdin with heredoc (RECOMMENDED)
agent-browser eval --stdin << 'EVALEOF'
JSON.stringify(
Array.from(document.querySelectorAll("img"))
.filter(i => !i.alt)
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
)
EVALEOF
# Base64 encoding (avoids all shell escaping issues)
agent-browser eval -b "$(echo -n 'document.title' | base64)"
```
## Ref Lifecycle
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after clicking links/buttons that navigate, form submissions, or dynamic content loading.
## Annotated Screenshots (Vision Mode)
```bash
agent-browser screenshot --annotate
# Output includes the image path and a legend:
# [1] @e1 button "Submit"
# [2] @e2 link "Home"
agent-browser click @e2 # Click using ref from annotated screenshot
```
## Parallel Sessions
```bash
agent-browser --session site1 open https://site-a.com
agent-browser --session site2 open https://site-b.com
agent-browser session list
```
## Connect to Existing Chrome
```bash
agent-browser --auto-connect snapshot # Auto-discover running Chrome
agent-browser --cdp 9222 snapshot # Explicit CDP port
```
## iOS Simulator (Mobile Safari)
```bash
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i
agent-browser -p ios tap @e1
agent-browser -p ios swipe up
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close
```
## Observability Dashboard
```bash
agent-browser dashboard install
agent-browser dashboard start # Background server on port 4848
agent-browser dashboard stop
```
## Cloud Providers
Use `-p <provider>` to run against cloud browsers: `agentcore`, `browserbase`, `browserless`, `browseruse`, `kernel`.
## Browser Engine Selection
```bash
agent-browser --engine lightpanda open example.com # 10x faster, 10x less memory
```
## Electron (LobeHub Desktop)
### Setup / Teardown
Use the `electron-dev.sh` script to manage the Electron dev environment. It handles process lifecycle, waits for SPA readiness, and reliably kills all child processes (main + helpers + vite).
```bash
SCRIPT=".agents/skills/local-testing/scripts/electron-dev.sh"
# Start Electron dev with CDP (idempotent — skips if already running)
$SCRIPT start
# Check if Electron is running and CDP is reachable
$SCRIPT status
# Kill all Electron-related processes (main + helper + vite)
$SCRIPT stop
# Force fresh restart
$SCRIPT restart
```
After `start` succeeds, connect with: `agent-browser --cdp 9222 snapshot -i`
**Always run `$SCRIPT stop` when done testing**`pkill -f "Electron"` alone won't catch all helper processes.
#### Environment Variables
| Variable | Default | Description |
| ----------------- | ----------------------- | ---------------------------------------- |
| `CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `ELECTRON_LOG` | `/tmp/electron-dev.log` | Electron process log |
| `ELECTRON_WAIT_S` | `60` | Max seconds to wait for Electron process |
| `RENDERER_WAIT_S` | `60` | Max seconds to wait for SPA to load |
### LobeHub-Specific Patterns
#### Access Zustand Store State
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
return JSON.stringify({
ops: ops.map(function(o) { return { type: o.type, status: o.status }; }),
activeAgent: chat.activeAgentId,
activeTopic: chat.activeTopicId,
});
})()
EVALEOF
```
#### Find and Use the Chat Input
```bash
# The chat input is contenteditable — must use -C flag
agent-browser --cdp 9222 snapshot -i -C 2>&1 | grep "editable"
agent-browser --cdp 9222 click @e48
agent-browser --cdp 9222 type @e48 "Hello world"
agent-browser --cdp 9222 press Enter
```
#### Wait for Agent to Complete
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
var chat = window.__LOBE_STORES.chat();
var ops = Object.values(chat.operations);
var running = ops.filter(function(o) { return o.status === 'running'; });
return running.length === 0 ? 'done' : 'running: ' + running.length;
})()
EVALEOF
```
#### Install Error Interceptor
```bash
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(function() {
window.__CAPTURED_ERRORS = [];
var orig = console.error;
console.error = function() {
var msg = Array.from(arguments).map(function(a) {
if (a instanceof Error) return a.message;
return typeof a === 'object' ? JSON.stringify(a) : String(a);
}).join(' ');
window.__CAPTURED_ERRORS.push(msg);
orig.apply(console, arguments);
};
return 'installed';
})()
EVALEOF
# Later, check captured errors:
agent-browser --cdp 9222 eval "JSON.stringify(window.__CAPTURED_ERRORS)"
```
## Chrome / Web Apps
```bash
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-test-profile \
"<URL>" &
sleep 5
agent-browser --cdp 9222 snapshot -i
# Or auto-discover running Chrome with remote debugging
agent-browser --auto-connect snapshot -i
```
---
# Part 2: osascript (Native macOS App Bot Testing)
Use AppleScript via `osascript` to control native macOS desktop apps for bot testing. Works with any app that supports macOS Accessibility, no CDP or Chromium needed.
The pattern is the same for every platform:
1. **Activate** the app (`tell application "X" to activate`)
2. **Navigate** to a channel/chat (Quick Switcher `Cmd+K` or Search `Cmd+F`)
3. **Send** a message (clipboard paste `Cmd+V` + Enter)
4. **Wait** for the bot response
5. **Screenshot** for verification (`screencapture` + `Read` tool)
## Per-Platform References
Pick the file for your target platform — each contains activation, navigation, send-message, and verification snippets specific to that app:
Each channel has its own folder under `bot/<channel>/` containing an `index.md`
(activation, navigation, send-message, and verification snippets specific to
that app) and its test script:
| Platform | Reference | Quick switcher |
| ------------- | ------------------------------------------------ | -------------- |
| Discord | [bot/discord/index.md](./bot/discord/index.md) | `Cmd+K` |
| Slack | [bot/slack/index.md](./bot/slack/index.md) | `Cmd+K` |
| Telegram | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F` |
| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md) | `Cmd+F` |
| Lark / 飞书 | [bot/lark/index.md](./bot/lark/index.md) | `Cmd+K` |
| QQ | [bot/qq/index.md](./bot/qq/index.md) | `Cmd+F` |
For **shared osascript patterns** (activate, type, paste, screenshot, read accessibility, common workflow template, gotchas), see [bot/osascript-common.md](./bot/osascript-common.md). Read this first if you're new to osascript automation.
## Bridge-based channels (no native app)
Some channels have no native app to drive with osascript — they connect through
a local bridge inside the Desktop app. These are tested with agent-browser
(IPC + UI) plus the bridge's own HTTP/REST endpoints, not osascript:
| Channel | Reference | What it drives |
| -------- | ------------------------------------------------ | -------------------------------------------------------- |
| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | `imessageBridge.*` IPC + local bridge + BlueBubbles REST |
For iMessage there is a one-shot regression script — see `test-imessage-bridge.sh` below.
---
# Scripts
**App / recording scripts** in `.agents/skills/local-testing/scripts/`:
| Script | Usage |
| ------------------------- | --------------------------------------------------- |
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart) |
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
| `record-app-screen.sh` | Record app screen (video + screenshots, start/stop) |
**Bot scripts** live under `.agents/skills/local-testing/bot/`, one folder per
channel (alongside that channel's `index.md`). The shared
`capture-app-window.sh` sits at the `bot/` root:
| Script | Usage |
| ---------------------------------- | ------------------------------------------------------------------- |
| `capture-app-window.sh` | Capture screenshot of a specific app window (used by bot tests) |
| `discord/test-discord-bot.sh` | Send message to Discord bot via osascript |
| `slack/test-slack-bot.sh` | Send message to Slack bot via osascript |
| `telegram/test-telegram-bot.sh` | Send message to Telegram bot via osascript |
| `wechat/test-wechat-bot.sh` | Send message to WeChat bot via osascript |
| `lark/test-lark-bot.sh` | Send message to Lark / 飞书 bot via osascript |
| `qq/test-qq-bot.sh` | Send message to QQ bot via osascript |
| `imessage/test-imessage-bridge.sh` | Regression-test the iMessage BlueBubbles bridge (IPC + HTTP) |
| `imessage/send-imessage-test.sh` | Send one real iMessage (desktop → BB → iMessage) and verify it sent |
### Window Screenshot Utility
`capture-app-window.sh` captures a screenshot of a specific app window using `screencapture -l <windowID>`. It uses Swift + CGWindowList to find the window by process name, so screenshots work correctly even when the window is on an external monitor or behind other windows.
```bash
# Standalone usage
./.agents/skills/local-testing/bot/capture-app-window.sh "Discord" /tmp/discord.png
./.agents/skills/local-testing/bot/capture-app-window.sh "Slack" /tmp/slack.png
./.agents/skills/local-testing/bot/capture-app-window.sh "WeChat" /tmp/wechat.png
```
All bot test scripts use this utility automatically for their screenshots.
### Bot Test Scripts
All bot test scripts share the same interface:
```bash
./scripts/test-<platform>-bot.sh <channel_or_contact> <message> [wait_seconds] [screenshot_path]
```
Examples:
```bash
# Discord — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
# Slack — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
# Telegram — test a bot by username
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
# WeChat — test a bot or send to a contact
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
# Lark/飞书 — test a bot in a group chat
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
# QQ — test a bot in a group or direct chat
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
```
Each script: activates the app, navigates to the channel/contact, pastes the message via clipboard, sends, waits, and takes a screenshot. Use the `Read` tool on the screenshot for visual verification.
### iMessage bridge regression script
`test-imessage-bridge.sh` does **not** follow the osascript bot interface — it
drives the Desktop bridge's IPC + HTTP layers and asserts the result, then
self-cleans. Needs BlueBubbles running and Electron up with CDP.
```bash
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
# defaults: bb_url=http://127.0.0.1:1234 cdp_port=9222 — exit 0 = all green
```
It guards the connect/configure flow (testConfig happy + reject paths, first-time
`upsertConfig` save, bridge running + webhook registered, local-server secret
enforcement). See [bot/imessage/index.md](./bot/imessage/index.md)
for the full manual UI flow and known bugs.
---
# Screen Recording
Record automated demos using `record-app-screen.sh` (start/stop lifecycle, CDP screenshots + ffmpeg assembly). See [references/record-app-screen.md](references/record-app-screen.md) for full documentation.
```bash
./.agents/skills/local-testing/scripts/electron-dev.sh start
./.agents/skills/local-testing/scripts/record-app-screen.sh start my-demo
# ... run automation ...
./.agents/skills/local-testing/scripts/record-app-screen.sh stop
```
Outputs to `.records/` directory (gitignored): `<name>.mp4` (video) + `<name>/` (screenshots every 3s).
---
# Gotchas
### agent-browser
- **Daemon can get stuck** — if commands hang, `agent-browser close --all` or `pkill -f agent-browser` to reset
- **HMR invalidates everything** — after code changes, refs break. Re-snapshot or restart
- **`snapshot -i` doesn't find contenteditable** — use `snapshot -i -C` for rich text editors
- **`fill` doesn't work on contenteditable** — use `type` for chat inputs
- **Screenshots go to `~/.agent-browser/tmp/screenshots/`** — read them with the `Read` tool
- **Dialogs block all commands** — if commands time out, check `agent-browser dialog status`
- **Default timeout is 25s** — override with `AGENT_BROWSER_DEFAULT_TIMEOUT` (ms) or use explicit waits
- **Shell quoting corrupts eval** — use `eval --stdin <<'EVALEOF'` for complex JS
### Electron-specific
- **Always use `electron-dev.sh stop` to clean up** — `pkill -f "Electron"` only kills the main process; helper processes (GPU, renderer, network) survive. The script finds and kills all of them via PID matching against the project's electron binary path.
- **`npx electron-vite dev` must run from `apps/desktop/`** — running from project root fails silently. The `electron-dev.sh` script handles this automatically.
- **Don't resize the Electron window after load** — resizing triggers full SPA reload
- **Store is at `window.__LOBE_STORES`** not `window.__ZUSTAND_STORES__`
### osascript
See [bot/osascript-common.md](./bot/osascript-common.md#gotchas) for the full osascript gotchas list (accessibility permissions, `keystroke` non-ASCII issues, locale-specific app names, rate limiting, etc.).
@@ -1,110 +0,0 @@
# Log `agent-browser` into a local LobeHub dev server
`agent-browser --headed` on macOS often creates the Chromium window off-screen — the user can't see or interact with it, so manual login inside the agent-browser session fails. Instead of sharing the user's real Chrome profile, copy the **better-auth session cookie** out of a request in DevTools and inject it into the agent-browser session as a Playwright-style state file.
## When to use
- You need `agent-browser` to reach an authenticated page on `http://localhost:<port>` (e.g. `localhost:3011`).
- The user already has a logged-in tab of the same dev server in their own Chrome.
- Spawning a headed Chromium to let the user log in manually is unreliable (window off-screen, no interaction).
Do **not** use this on production URLs — only local dev. Treat the cookie as a secret: don't paste it into shared logs, PRs, or commit it anywhere.
## Step 1 — Ask the user to copy the cookie from a Network request, NOT `document.cookie`
`document.cookie` will not return HttpOnly cookies, which is exactly where better-auth puts its session. Instruct the user:
1. Open the logged-in tab (`http://localhost:<port>/…`) in their own Chrome.
2. `Cmd+Option+I`**Network** tab.
3. Refresh, click any same-origin request (e.g. the top-level document request).
4. In the right pane under **Request Headers**, right-click the `Cookie:` line → **Copy value** (or copy the entire header).
5. Paste the string into chat.
You only need the better-auth pieces. Everything else (Clerk, `LOBE_LOCALE`, HMR hash, theme vars) is noise and can stay. The minimum viable set is:
```
better-auth.session_token=<value>; better-auth.state=<value>
```
## Step 2 — Build a Playwright-style state file
`agent-browser state load` expects Playwright's `storageState` format: a JSON with a `cookies` array and an `origins` array.
```bash
cat > /tmp/mkstate.py << 'PY'
import json, sys, time
# Read the Cookie header from stdin (allows optional "Cookie: " prefix).
raw = sys.stdin.read().strip()
if raw.lower().startswith("cookie:"):
raw = raw.split(":", 1)[1].strip()
# Keep only better-auth cookies. Extend this set if the app genuinely needs more.
WANTED = {"better-auth.session_token", "better-auth.state"}
cookies = []
exp = int(time.time()) + 30 * 24 * 3600 # 30 days
for pair in raw.split("; "):
if "=" not in pair:
continue
name, _, value = pair.partition("=")
if name not in WANTED:
continue
cookies.append({
"name": name,
"value": value,
"domain": "localhost",
"path": "/",
"expires": exp,
"httpOnly": False,
"secure": False,
"sameSite": "Lax",
})
if not cookies:
sys.stderr.write("no better-auth cookies found in input\n")
sys.exit(1)
print(json.dumps({"cookies": cookies, "origins": []}, indent=2))
PY
# Feed the copied Cookie header in via env var or heredoc.
printf '%s' "$COOKIE_HEADER" | python3 /tmp/mkstate.py > /tmp/state.json
```
**Note on `httpOnly`**: the real cookie in the user's browser is HttpOnly, but `storageState` doesn't enforce the flag on load — it just attaches the value. Storing with `httpOnly: false` is fine for local dev and sidesteps a CDP-context quirk where HttpOnly cookies sometimes fail to attach.
## Step 3 — Load state and navigate
```bash
SESSION="my-test" # any stable session name
agent-browser --session "$SESSION" state load /tmp/state.json
agent-browser --session "$SESSION" open "http://localhost:3011/"
agent-browser --session "$SESSION" get url
# Expect NOT /signin?callbackUrl=… — if you still see signin, cookie didn't apply.
```
## Step 4 — Verify
```bash
agent-browser --session "$SESSION" snapshot -i | head -20
# Look for the user's avatar/name in the sidebar, or absence of the signin form.
```
## Common failure modes
| Symptom | Cause | Fix |
| ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------- |
| Still redirects to `/signin` after `state load` | User pasted from `document.cookie` → missed HttpOnly session | Re-pull from Network request Headers, not console |
| `state load` reports 0 cookies | Separator wrong, or user pasted URL-decoded value | Keep the raw `Cookie:` header as-is; split on `"; "` |
| Login works briefly then expires | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-load |
| Domain mismatch | Use `domain: "localhost"` literally, no leading dot for local dev | — |
## Scope
Only covers authenticating an **agent-browser** session into a **local** LobeHub dev server. It does not:
- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com` and must be delivered over HTTPS.
- Replace real OAuth flows — tests that must exercise the login UI need a real Chromium with `--remote-debugging-port` or a bot account.
- Flow cookies back to the user's Chrome — injection is one-way (into agent-browser only).
@@ -0,0 +1,69 @@
---
name: model-bank-metadata
description: 'Backfill and maintain model-bank metadata (knowledgeCutoff, family, generation). Use when adding models, fixing cutoff/family data, running a metadata sweep across aiModels providers, or researching official knowledge cutoffs.'
user-invocable: false
---
# Model-Bank Metadata (knowledgeCutoff / family / generation)
How to populate and maintain the three structured metadata fields on `packages/model-bank/src/aiModels/*.ts` model cards, at single-model scale (new model PR) or repo-wide scale (sweep across \~80 provider files / \~1900 entries).
## Field semantics
| Field | Format | Meaning |
| ----------------- | ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `knowledgeCutoff` | `'YYYY-MM'` (or `'YYYY'` if only the year is published) | World-knowledge cutoff. When a vendor distinguishes a **"reliable knowledge cutoff"** from the broader training-data cutoff (Anthropic does), always use the **reliable** one. |
| `family` | lowercase slug (`claude`, `gpt`, `o-series`, `qwen`, `deepseek`, `llama`, `glm`, …) | Model lineage, finer than `organization`. Lets the UI group models and match the same model across aggregator providers. |
| `generation` | family slug + version (`claude-4.6`, `gpt-5.2`, `qwen3.5`, `llama-3.1`) | Generation within the family. Only set when confidently derivable from the model line's naming. Rolling aliases (`qwen-max`, `deepseek-chat`, `gemini-flash-latest`) get `family` only. |
All three are optional. **The cardinal rule: only fill what an authoritative source states or naming rules derive — never guess.** An empty field is correct for vendors that publish nothing.
No DB migration is ever needed for these: builtin models are merged from model-bank at read time (`repositories/aiInfra/index.ts` spreads the whole card), so new card fields flow to the client automatically.
## Sourcing rules for knowledgeCutoff
Accept only:
- Vendor official docs (platform.openai.com / developers.openai.com, docs.x.ai, ai.google.dev, docs.anthropic.com / platform.claude.com)
- Official Hugging Face org model cards (huggingface.co/meta-llama/..., etc.)
- Official tech reports / system cards / launch blog posts
Reject:
- **Third-party aggregator sites** (aiknowledgecutoff.com and similar) — proven to copy one model's value across a whole family. A Cohere sweep once claimed `2024-06` for four distinct base models; none of the cited Cohere pages said that, and the only cutoff Cohere actually publishes is Feb 2023 for the 08-2024 Command R/R+ refresh.
- **AWS Bedrock model cards as sole source** — proven to conflate launch date with knowledge cutoff (DeepSeek R1's card lists both as "Jan 2025"). If Bedrock is the only place a value appears, leave the field empty.
- Inference from `releasedAt` — a release date is not a cutoff.
Variant inheritance: dated snapshots (`-2024-08-06`), speed/price tiers of the same checkpoint, quantizations (`-fp8`, `-awq`), context-length variants (`-32k`), ollama `:NNb` tags, and cloud-prefixed ids (`anthropic.`/`us.`/`global.` Bedrock ids) share their base model's cutoff. **Distills do not inherit** from teacher or base — use the distill's own published value or leave empty. **Sizes within one generation can genuinely differ**: Llama 3 8B is Mar 2023 while 70B is Dec 2023 (per Meta's own card) — don't "fix" that to one family-wide value.
Vendors that publish no cutoffs (leave empty, don't chase): Qwen, DeepSeek, GLM/Zhipu, ERNIE, Doubao, Hunyuan, SenseNova, Spark, MiniMax, StepFun, Yi (mostly), Moonshot.
Known per-vendor footguns:
- **Anthropic**: Opus 4.6 reliable cutoff is `2025-05`, Sonnet 4.6 is `2025-08` — easy to swap. Claude 3.7 is `2024-10` (system card: trained through Nov 2024, knowledge cutoff end of Oct 2024). Cite system cards / the models overview, not the Help Center article (a living page that drops retired models — citation rot).
- **xAI**: docs.x.ai has one blanket sentence covering grok-3/grok-4; mini variants are not named there. Grok 4.20/4.3 have no official cutoff anywhere.
- **OpenAI**: per-model docs pages (developers.openai.com/api/docs/models/<id>) state cutoffs explicitly, including snapshot differences (gpt-4-1106-preview `2023-04` vs gpt-4-0125-preview `2023-12`).
## family/generation derivation
Rule-based, no research needed: `scripts/derive-family.ts` holds the per-family regex rules. Traps already encoded there — keep them when extending:
- Date suffixes are not versions: `claude-sonnet-4-20250514` is generation `claude-4`, not `claude-4.2`.
- Size suffixes are not versions: `llama-3-8b``llama-3` (not `llama-3.8`); `gemma-7b-it` is **gemma-1** (not gemma-7).
- Vendor spelling variants: `qwen2p5` = qwen2.5, `llama-v3p1` = llama-3.1, ollama `:NNb` tags, Bedrock `us.`/`global.`/`anthropic.` prefixes.
- `claude-X.0` normalizes to `claude-X`.
- Fable/Mythos-class ids (`claude-fable-5`) don't match the opus/sonnet/haiku regex — they are the Mythos class — `family: 'claude-mythos'`, `generation: 'mythos-5'` (set manually; the launch page calls Fable 5 "the generally available Mythos-class model").
## Repo-wide sweep workflow
1. **Extract ids**: `bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts` → unique normalized chat-model ids (normalization = last path segment, lowercased). Non-chat types (image/video/embedding/tts) have no knowledge cutoff — skip them.
2. **Research (multi-agent)**: chunk ids by family (≤50 per chunk) and fan out one research agent per chunk (Workflow tool), each returning `{id, cutoff, source}` with the sourcing rules above baked into the prompt, **plus** one adversarial verify agent per chunk that re-fetches cited sources and refutes unsupported claims. The verify pass is load-bearing: it caught the Cohere aggregator copy-paste and the AWS launch-date conflation.
3. **Policy filter**: before applying, drop entries whose only source is a rejected category (check the returned `sources` map — e.g. drop everything sourced to aws.amazon.com).
4. **Apply**: `bun scripts/apply-cutoffs.ts <map.json>` and `bun scripts/apply-family.ts <map.json>` (run from repo root). Both are idempotent codemods keyed on normalized id — aggregator providers get the same values automatically; entries that already have the field are skipped. They rely on the uniform prettier formatting of the data files (entries start ` {` / end ` },`, fields at 4-space indent).
5. **Verify**: `cd packages/model-bank && bunx vitest run src/aiModels/__tests__/index.test.ts && bunx tsc --noEmit`.
## Maintenance rules
- **New model PRs** should fill all three fields inline, citing the official source in the PR body (see the Anthropic entries in `anthropic.ts` for reference values).
- **After resolving merge conflicts** in model-bank data files, sanity-check that metadata didn't vanish: `git grep -c knowledgeCutoff -- 'packages/model-bank/src/aiModels/*.ts'` before vs after. A three-way stack of model PRs once silently dropped all 10 Anthropic cutoffs during conflict resolution.
- Dirty ids exist in aggregator data (a sambanova id once carried a trailing tab). The codemods match ids verbatim — if a map key won't apply, check for invisible characters before assuming the model is missing.
@@ -0,0 +1,73 @@
/**
* One-off codemod: apply a canonical { normalizedModelId: 'YYYY-MM' } map onto
* packages/model-bank/src/aiModels/*.ts, inserting `knowledgeCutoff` after the
* `id:` line of every chat-model entry that matches and doesn't already have one.
*
* Relies on the uniform prettier formatting of these files:
* - each model entry starts with ` {` and ends with ` },` (2-space indent)
* - fields are at 4-space indent: ` id: '...'`, ` type: 'chat'`
*
* Usage: bun /tmp/apply-cutoffs.ts /tmp/cutoff-map.json
*/
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
const mapPath = process.argv[2];
if (!mapPath) throw new Error('usage: bun apply-cutoffs.ts <map.json>');
const map: Record<string, string> = JSON.parse(readFileSync(mapPath, 'utf8'));
const dir = 'packages/model-bank/src/aiModels';
const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
let touchedFiles = 0;
let inserted = 0;
const matchedIds = new Set<string>();
for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const path = join(dir, file);
const lines = readFileSync(path, 'utf8').split('\n');
const out: string[] = [];
let changed = false;
let i = 0;
while (i < lines.length) {
if (lines[i] !== ' {') {
out.push(lines[i]);
i++;
continue;
}
// collect one model entry block
const start = i;
let end = i;
while (end < lines.length && lines[end] !== ' },') end++;
const block = lines.slice(start, end + 1);
const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
const hasCutoff = block.some((l) => /^ {4}knowledgeCutoff:/.test(l));
if (idLineIdx >= 0 && isChat && !hasCutoff) {
const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
const norm = rawId ? normalize(rawId) : undefined;
const cutoff = norm ? map[norm] : undefined;
if (cutoff && /^\d{4}(?:-\d{2})?$/.test(cutoff)) {
block.splice(idLineIdx + 1, 0, ` knowledgeCutoff: '${cutoff}',`);
inserted++;
changed = true;
matchedIds.add(norm!);
}
}
out.push(...block);
i = end + 1;
}
if (changed) {
writeFileSync(path, out.join('\n'));
touchedFiles++;
}
}
console.log(`inserted ${inserted} knowledgeCutoff fields across ${touchedFiles} files`);
console.log(`map ids used: ${matchedIds.size}/${Object.keys(map).length}`);
const unused = Object.keys(map).filter((k) => !matchedIds.has(k));
if (unused.length) console.log('unused map keys (first 20):', unused.slice(0, 20));
@@ -0,0 +1,49 @@
import { readdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';
const map: Record<string, { family: string; generation?: string }> = JSON.parse(
readFileSync('/tmp/family-map.json', 'utf8'),
);
const dir = 'packages/model-bank/src/aiModels';
const normalize = (id: string) => id.split('/').pop()!.toLowerCase();
let inserted = 0;
let touchedFiles = 0;
for (const file of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const path = join(dir, file);
const lines = readFileSync(path, 'utf8').split('\n');
const out: string[] = [];
let changed = false;
let i = 0;
while (i < lines.length) {
if (lines[i] !== ' {') {
out.push(lines[i]);
i++;
continue;
}
let end = i;
while (end < lines.length && lines[end] !== ' },') end++;
const block = lines.slice(i, end + 1);
const idLineIdx = block.findIndex((l) => /^ {4}id: '/.test(l));
const isChat = block.some((l) => /^ {4}type: 'chat',?$/.test(l));
const hasFamily = block.some((l) => /^ {4}family:/.test(l));
if (idLineIdx >= 0 && isChat && !hasFamily) {
const rawId = block[idLineIdx].match(/^ {4}id: '(.+)',$/)?.[1];
const r = rawId ? map[normalize(rawId)] : undefined;
if (r) {
const add = [` family: '${r.family}',`];
if (r.generation) add.push(` generation: '${r.generation}',`);
block.splice(idLineIdx, 0, ...add);
inserted++;
changed = true;
}
}
out.push(...block);
i = end + 1;
}
if (changed) {
writeFileSync(path, out.join('\n'));
touchedFiles++;
}
}
console.log(`annotated ${inserted} model entries across ${touchedFiles} files`);
@@ -0,0 +1,237 @@
/* eslint-disable regexp/no-unused-capturing-group */
/**
* Rule-based derivation of { family, generation } from normalized model ids.
* Principle: only fill what is confidently derivable; otherwise omit.
*
* Usage: bun /tmp/derive-family.ts # print distinct pairs for review
* bun /tmp/derive-family.ts --emit # write /tmp/family-map.json
*/
import { readFileSync, writeFileSync } from 'node:fs';
const ids: string[] = JSON.parse(readFileSync('/tmp/model-ids.json', 'utf8'));
type R = { family: string; generation?: string };
const derive = (id: string): R | undefined => {
// strip cloud/bedrock prefixes for matching
const m = id.replace(/^(us\.|global\.|eu\.|apac\.)?(anthropic\.|meta\.|cohere\.|azure-)/, '');
// ---- anthropic ----
if (m.startsWith('claude')) {
// family = product-line tier (claude-opus/sonnet/haiku/instant); bare claude-2.x has no tier
const tier = m.match(/(opus|sonnet|haiku|instant)/)?.[1];
const family = tier ? `claude-${tier}` : 'claude';
let g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)[.-](\d)(?!\d)/); // claude-opus-4-8 / claude-haiku-4.5
if (g) return { family, generation: `claude-${g[1]}.${g[2]}` };
g = m.match(/^claude-(?:opus|sonnet|haiku)-(\d)(?!\d)/); // claude-opus-4
if (g) return { family, generation: `claude-${g[1]}` };
g = m.match(/^claude-(\d)[.-](\d)(?!\d)/); // claude-3-5-haiku / claude-3.7-sonnet / claude-2.1
if (g) return { family, generation: g[2] === '0' ? `claude-${g[1]}` : `claude-${g[1]}.${g[2]}` };
g = m.match(/^claude-(\d)(?!\d)/); // claude-3-haiku
if (g) return { family, generation: `claude-${g[1]}` };
if (m.startsWith('claude-instant')) return { family: 'claude-instant' };
if (/^claude-v?2/.test(m)) return { family: 'claude', generation: 'claude-2' };
return { family };
}
// ---- openai ----
if (/^(gpt-oss|gpt_oss)/.test(m) || m.startsWith('gpt-oss:'))
return { family: 'gpt-oss', generation: 'gpt-oss' };
if (/^(chatgpt-4o|gpt-4o)/.test(m)) return { family: 'gpt', generation: 'gpt-4o' };
if (/^gpt-(3\.5|35)/.test(m)) return { family: 'gpt', generation: 'gpt-3.5' };
if (m.startsWith('gpt-audio')) return { family: 'gpt', generation: 'gpt-audio' };
{
const g = m.match(/^gpt-(\d)\.(\d)/); // gpt-4.1 / gpt-5.2
if (g) return { family: 'gpt', generation: `gpt-${g[1]}.${g[2]}` };
const g2 = m.match(/^gpt-(\d)(?!\d)/); // gpt-4 / gpt-5
if (g2) return { family: 'gpt', generation: `gpt-${g2[1]}` };
}
{
const g = m.match(/^o([134])(-|$)/); // o1 / o3 / o4
if (g) return { family: 'o-series', generation: `o${g[1]}` };
}
if (/^(codex|computer-use-preview)/.test(m)) return { family: 'gpt' };
// ---- google ----
{
const g = m.match(/^gemini-(\d+(?:\.\d+)?)/);
if (g) return { family: 'gemini', generation: `gemini-${g[1]}` };
if (/^gemini-(pro|flash)/.test(m)) return { family: 'gemini' }; // rolling aliases
if (m.startsWith('gemma')) {
if (/^gemma-?\db/.test(m)) return { family: 'gemma', generation: 'gemma-1' };
const v = m.match(/^gemma-?(\d)(?!b)/);
return { family: 'gemma', generation: v ? `gemma-${v[1]}` : undefined };
}
if (/^(codegemma|learnlm|palm)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
}
// ---- qwen ----
if (m.startsWith('qwq')) return { family: 'qwen', generation: 'qwq' };
if (m.startsWith('qvq')) return { family: 'qwen', generation: 'qvq' };
if (m.startsWith('codeqwen')) return { family: 'qwen' };
if (m.startsWith('qwen')) {
const g =
m.match(/^qwen-?([123](?:\.\d+)?)(?![0-9b])/) || // qwen3.5-plus / qwen-3-14b / qwen2-7b / qwen1.5
m.match(/^qwen([23](?:\.\d+)?):/) || // qwen2.5:72b
m.match(/^qwen([23])p(\d)/); // qwen2p5 -> handled below
if (/^qwen(\d)p(\d)/.test(m)) {
const p = m.match(/^qwen(\d)p(\d)/)!;
return { family: 'qwen', generation: `qwen${p[1]}.${p[2]}` };
}
if (g) return { family: 'qwen', generation: `qwen${g[1]}` };
return { family: 'qwen' }; // qwen-max/plus/turbo/vl rolling aliases
}
// ---- deepseek ----
if (/^(deepseek|azure-deepseek|pro-deepseek)/.test(m) || m.startsWith('deepseek_')) {
const s = m.replace(/^pro-/, '').replaceAll('_', '-');
if (s.startsWith('deepseek-r1-distill'))
return { family: 'deepseek', generation: 'deepseek-r1-distill' };
if (s.startsWith('deepseek-r1')) return { family: 'deepseek', generation: 'deepseek-r1' };
const g = s.match(/^deepseek-(?:chat-)?v(\d(?:\.\d)?)/);
if (g) return { family: 'deepseek', generation: `deepseek-v${g[1]}` };
if (/^deepseek-(coder-v2|coder)/.test(s))
return { family: 'deepseek', generation: 'deepseek-coder' };
return { family: 'deepseek' }; // deepseek-chat / reasoner rolling aliases
}
// ---- meta llama ----
if (m.startsWith('codellama')) return { family: 'llama', generation: 'codellama' };
if (/^(meta-)?llama|^l3(\d)?-|^llava/.test(m)) {
if (m.startsWith('llava')) return { family: 'llava' };
const s = m.replace(/^meta-/, '');
const g =
s.match(/^llama-?([234])(?:[.-](\d))?(?![0-9b])/) || // llama-3.1 / llama3.3 / llama-4
s.match(/^llama-?v([234])p?(\d)?/) || // llama-v3p1
s.match(/^llama([234])[.:-](\d)?/);
if (g) {
const gen = g[2] ? `llama-${g[1]}.${g[2]}` : `llama-${g[1]}`;
return { family: 'llama', generation: gen };
}
if (m.startsWith('l3-')) return { family: 'llama', generation: 'llama-3' };
if (m.startsWith('l31-')) return { family: 'llama', generation: 'llama-3.1' };
return { family: 'llama' };
}
// ---- zhipu ----
if (/^(zai-)?glm/.test(m)) {
const s = m.replace(/^zai-/, '');
if (s.startsWith('glm-z1')) return { family: 'glm', generation: 'glm-z1' };
if (s.startsWith('glm-zero')) return { family: 'glm', generation: 'glm-zero' };
const g = s.match(/^glm-(\d(?:\.\d)?)/);
if (g) return { family: 'glm', generation: `glm-${g[1]}` };
return { family: 'glm' };
}
if (/^(charglm|codegeex|emohaa)/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
// ---- mistral ----
if (
/^(open-)?(mistral|mixtral|ministral|codestral|devstral|magistral|pixtral|mathstral|labs-devstral|labs-leanstral|open-codestral)/.test(
m,
)
) {
const fam = m.replace(/^(open-|labs-)/, '').match(/^[a-z]+/)![0];
return { family: fam };
}
// ---- xai ----
if (m.startsWith('grok')) {
const g = m.match(/^grok-(\d(?:\.\d+)?)/);
return { family: 'grok', generation: g ? `grok-${g[1]}` : undefined };
}
// ---- moonshot ----
if (m.startsWith('kimi')) {
const g = m.match(/^kimi-k(\d(?:\.\d)?)/);
return { family: 'kimi', generation: g ? `kimi-k${g[1]}` : undefined };
}
if (m.startsWith('moonshot-kimi-k2')) return { family: 'kimi', generation: 'kimi-k2' };
if (m.startsWith('moonshot-v1')) return { family: 'kimi', generation: 'moonshot-v1' };
// ---- minimax ----
if (m.startsWith('minimax')) {
if (m.startsWith('minimax-text')) return { family: 'minimax', generation: 'minimax-text-01' };
const g = m.match(/^minimax-m(\d(?:\.\d)?)/);
return { family: 'minimax', generation: g ? `minimax-m${g[1]}` : undefined };
}
if (m.startsWith('abab')) return { family: 'minimax', generation: 'abab' };
// ---- baidu ----
if (m.startsWith('ernie')) {
if (m.startsWith('ernie-x1')) return { family: 'ernie', generation: 'ernie-x1' };
const g = m.match(/^ernie-(\d\.\d)/);
return { family: 'ernie', generation: g ? `ernie-${g[1]}` : undefined };
}
if (m.startsWith('qianfan')) return { family: 'qianfan' };
// ---- bytedance ----
if (m.startsWith('doubao')) {
const g = m.match(/^doubao-seed-(\d[.-]\d|\d)/) || m.match(/^doubao-(\d\.\d)/);
return { family: 'doubao', generation: g ? `doubao-${g[1].replace('-', '.')}` : undefined };
}
if (/^(seed-oss|skylark)/.test(m)) return { family: m.startsWith('seed') ? 'doubao' : 'skylark' };
// ---- tencent ----
if (m.startsWith('hunyuan')) {
const g = m.match(/^hunyuan-(\d\.\d)/);
return { family: 'hunyuan', generation: g ? `hunyuan-${g[1]}` : undefined };
}
if (m.startsWith('hy3')) return { family: 'hunyuan', generation: 'hunyuan-3' };
// ---- others (family only / simple version) ----
if (m.startsWith('yi-')) return { family: 'yi' };
if (/^(command|c4ai-command)/.test(m)) return { family: 'command' };
if (/^(aya|c4ai-aya)/.test(m)) return { family: 'aya' };
if (/^phi-?(\d)?/.test(m) && m.startsWith('phi')) {
const g = m.match(/^phi-?(\d(?:\.\d)?)/);
return { family: 'phi', generation: g ? `phi-${g[1]}` : undefined };
}
if (m.startsWith('wizardlm')) return { family: 'wizardlm' };
if (m.startsWith('step-')) {
const g = m.match(/^step-(?:r1|(\d(?:\.\d)?))/);
return { family: 'step', generation: g?.[1] ? `step-${g[1]}` : undefined };
}
if (/^(internlm|intern-)/.test(m)) return { family: 'intern' };
if (m.startsWith('internvl')) return { family: 'internvl' };
if (m.startsWith('baichuan')) {
const g = m.match(/^baichuan-?(m?\d)/);
return { family: 'baichuan', generation: g ? `baichuan-${g[1]}` : undefined };
}
if (/^(sensechat|sensenova)/.test(m)) return { family: 'sensenova' };
if (/^(spark|generalv|4\.0ultra)/.test(m)) return { family: 'spark' };
if (/^(360gpt|360zhinao)/.test(m)) return { family: '360zhinao' };
if (/^(jamba|ai21-jamba)/.test(m)) return { family: 'jamba' };
if (m.startsWith('sonar')) return { family: 'sonar' };
if (/^(nova-lite|nova-micro|nova-pro)/.test(m)) return { family: 'nova' };
if (/^(ling|ring)-/.test(m)) return { family: m.match(/^[a-z]+/)![0] };
if (m.startsWith('longcat')) return { family: 'longcat' };
if (m.startsWith('mimo')) return { family: 'mimo' };
if (m.startsWith('taichu')) return { family: 'taichu' };
if (/^(hermes|nous-hermes)/.test(m)) return { family: 'hermes' };
if (m.startsWith('solar')) return { family: 'solar' };
if (m.startsWith('kat-coder')) return { family: 'kat-coder' };
if (m.startsWith('dbrx')) return { family: 'dbrx' };
if (m.startsWith('morph')) return { family: 'morph' };
return undefined;
};
const map: Record<string, R> = {};
const pairs = new Map<string, number>();
let derived = 0;
for (const id of ids) {
const r = derive(id);
if (!r) continue;
derived++;
map[id] = r;
const key = `${r.family} :: ${r.generation ?? '—'}`;
pairs.set(key, (pairs.get(key) || 0) + 1);
}
console.log(`derived ${derived}/${ids.length}`);
for (const [k, n] of [...pairs.entries()].sort()) console.log(String(n).padStart(4), k);
if (process.argv.includes('--emit')) {
writeFileSync('/tmp/family-map.json', JSON.stringify(map, null, 1));
console.log('\nwritten /tmp/family-map.json');
}
@@ -0,0 +1,23 @@
/**
* Extract unique normalized chat-model ids from packages/model-bank/src/aiModels/*.ts.
* Normalization: last path segment, lowercased (matches the apply codemods).
*
* Usage (repo root): bun .agents/skills/model-bank-metadata/scripts/extract-model-ids.ts [out.json]
* Default output: /tmp/model-ids.json
*/
import { readdirSync, writeFileSync } from 'node:fs';
import { join, resolve } from 'node:path';
const dir = resolve('packages/model-bank/src/aiModels');
const out = process.argv[2] || '/tmp/model-ids.json';
const ids = new Set<string>();
for (const f of readdirSync(dir).filter((f) => f.endsWith('.ts'))) {
const mod = await import(join(dir, f));
for (const m of mod.default || []) {
if (!m?.id || m.type !== 'chat') continue;
ids.add(m.id.split('/').pop()!.toLowerCase());
}
}
writeFileSync(out, JSON.stringify([...ids].sort(), null, 1));
console.log(`${ids.size} unique normalized chat ids -> ${out}`);
+23 -22
View File
@@ -56,7 +56,8 @@ git submodules.
├── apps/
│ ├── cli/ # LobeHub CLI
│ ├── desktop/ # Electron desktop app
── device-gateway/ # Device gateway service
── device-gateway/ # Device gateway service
│ └── server/ # Next.js-backed server: featureFlags, globalConfig, modules, routers, services, utils, workflows (`@/server/*` alias)
├── docs/ # changelog, development, self-hosting, usage
├── locales/ # en-US, zh-CN, ...
├── packages/ # ~80 @lobechat/* workspace packages — `ls` for the full set. Key ones:
@@ -85,32 +86,32 @@ git submodules.
├── business/ # Open-source stubs (client/server) — cloud repo provides real impls
├── features/ # Domain business components
├── store/ # ~30 zustand stores — `ls` for the full set
├── server/ # featureFlags, globalConfig, modules, routers, services, workflows, agent-hono
├── server/ # standalone-Hono server pieces only: agent-hono, workflows-hono (main backend lives in `apps/server`)
└── ... # components, hooks, layout, libs, locales, services, types, utils
```
## Architecture Map
| Layer | Location |
| ---------------- | --------------------------------------------------- |
| UI Components | `src/components`, `src/features` |
| SPA Pages | `src/routes/` |
| React Router | `src/spa/router/` |
| Global Providers | `src/layout` |
| Zustand Stores | `src/store` |
| Client Services | `src/services/` |
| REST API | `src/app/(backend)/webapi` |
| tRPC Routers | `src/server/routers/{async\|lambda\|mobile\|tools}` |
| Server Services | `src/server/services` (can access DB) |
| Server Modules | `src/server/modules` (no DB access) |
| Feature Flags | `src/server/featureFlags` |
| Global Config | `src/server/globalConfig` |
| DB Schema | `packages/database/src/schemas` |
| DB Model | `packages/database/src/models` |
| DB Repository | `packages/database/src/repositories` |
| Third-party | `src/libs` (analytics, oidc, etc.) |
| Builtin Tools | `packages/builtin-tool-*`, `packages/builtin-tools` |
| Open-source stub | `src/business/*`, `packages/business/*` (this repo) |
| Layer | Location |
| ---------------- | -------------------------------------------------------- |
| UI Components | `src/components`, `src/features` |
| SPA Pages | `src/routes/` |
| React Router | `src/spa/router/` |
| Global Providers | `src/layout` |
| Zustand Stores | `src/store` |
| Client Services | `src/services/` |
| REST API | `src/app/(backend)/webapi` |
| tRPC Routers | `apps/server/src/routers/{async\|lambda\|mobile\|tools}` |
| Server Services | `apps/server/src/services` (can access DB) |
| Server Modules | `apps/server/src/modules` (no DB access) |
| Feature Flags | `apps/server/src/featureFlags` |
| Global Config | `apps/server/src/globalConfig` |
| DB Schema | `packages/database/src/schemas` |
| DB Model | `packages/database/src/models` |
| DB Repository | `packages/database/src/repositories` |
| Third-party | `src/libs` (analytics, oidc, etc.) |
| Builtin Tools | `packages/builtin-tool-*`, `packages/builtin-tools` |
| Open-source stub | `src/business/*`, `packages/business/*` (this repo) |
## Data Flow
+7
View File
@@ -53,6 +53,12 @@ For Modal specifically, see the dedicated **modal** skill — use the imperative
| Layout | Center, DraggablePanel, Flexbox, Grid, Header, MaskShadow |
| Navigation | Burger, Menu, SideNav, Tabs |
## Loading indicators
**Do NOT use antd `Spin` / `<Spin />`.** Use a project loader
(`NeuralNetworkLoading`, `DotsLoading`, …) — see the **ux** skill ("Loading
visuals") for the component table and when to use each.
## State
When a feature component manages more than 3 pieces of state (`useState`/`useReducer`/derived state), extract the logic into a custom hook (e.g. `useXxx`). Keep the component focused on rendering — the hook holds state and handlers, so logic can be unit-tested without rendering the component.
@@ -112,6 +118,7 @@ errorElement: <ErrorBoundary />;
| ------------------------------------------------------------------ | --------------------------------------------------------------------------- |
| Using `next/link` in SPA | Use `react-router-dom` `Link` |
| Using antd directly | Use `@lobehub/ui/base-ui` first, then `@lobehub/ui` |
| antd `Spin` / `<Spin />` for loading | Use `NeuralNetworkLoading` / project loaders (see the **ux** skill) |
| `import { Select } from '@lobehub/ui'` | `import { Select } from '@lobehub/ui/base-ui'` |
| `import { Modal } from '@lobehub/ui'` + `<Modal open>` declarative | `createModal` / `confirmModal` from `@lobehub/ui/base-ui` (see modal skill) |
| `import { DropdownMenu/Popover/Switch } from '@lobehub/ui'` | Import same name from `@lobehub/ui/base-ui` instead |
+1
View File
@@ -22,6 +22,7 @@ user-invocable: false
- Bug fixes must include tests covering the fixed scenario
- New logic (services, store actions, utilities) should have test coverage
- **New database Model/Repository** (`packages/database/src/models/**`, `src/repositories/**`) must ship a sibling `__tests__/<name>.test.ts` — incl. user-isolation tests; BM25 search guarded by `describe.skipIf(!isServerDB)` (see `/testing``db-model-test.md`)
- Existing tests still cover the changed behavior?
- Prefer `vi.spyOn` over `vi.mock` (see `/testing` skill)
+1 -1
View File
@@ -50,7 +50,7 @@ Common false positives (do NOT merge):
- `db-migrations` vs `drizzle` — distinct workflows (migration files vs schema authoring).
- `microcopy` vs `i18n` — content vs mechanics.
- `agent-runtime-hooks` vs `agent-tracing` vs `agent-signal` — different surfaces of the agent system.
- `testing` vs `local-testing` vs `cli-backend-testing` — different test types.
- `testing` vs `agent-testing` — different test types.
### 4 — Description format consistency
+11 -2
View File
@@ -14,15 +14,21 @@ user-invocable: false
# Run specific test file
bunx vitest run --silent='passed-only' '[file-path]'
# Database package (client)
# Database package (client-db, PGlite — default, skips BM25/pg_search)
cd packages/database && bunx vitest run --silent='passed-only' '[file]'
# Database package (server)
# Database package (server-db, Postgres — BM25/pgvector parity, what CI measures coverage in)
cd packages/database && TEST_SERVER_DB=1 bunx vitest run --silent='passed-only' '[file]'
```
**Never run** `bun run test` - it runs all 3000+ tests (\~10 minutes).
> **Database models/repositories:** every new file under `packages/database/src/models/**`
> or `src/repositories/**` ships with a sibling `__tests__/<name>.test.ts` in the same PR.
> Use the real DB via `getTestDB()` (integration style), guard BM25/full-text-search blocks
> with `describe.skipIf(!isServerDB)`, and always test user-isolation. See
> `references/db-model-test.md` for setup, schema gotchas, and the client-vs-server-db split.
## Test Categories
| Category | Location | Config |
@@ -37,6 +43,9 @@ cd packages/database && TEST_SERVER_DB=1 bunx vitest run --silent='passed-only'
2. **Tests must pass type check** - Run `bun run type-check` after writing tests
3. **After 1-2 failed fix attempts, stop and ask for help**
4. **Test behavior, not implementation details**
5. **Regression tests for bug fixes** - After fixing a bug, add a regression test that fails before the fix and passes after, to prevent recurrence
6. **No new component tests** - Only update existing React component tests. Complex logic should be extracted into hooks and tested there instead
7. **All source changes before any test changes** - Complete all source file edits first, then update tests in a separate pass. Interleaving disrupts reasoning about the source changes, especially across many files
## Basic Test Structure
@@ -1,95 +1,74 @@
# Database Model Testing Guide
Test `packages/database` Model layer.
Test the `packages/database` Model and Repository layers.
## Dual Environment Verification (Required)
> **Rule: every new Model or Repository ships with a sibling test in the same PR.**
> A new file under `src/models/**` or `src/repositories/**` must have a matching
> `__tests__/<name>.test.ts`. Coverage runs in server-db mode in CI and the patch
> gate will not always catch a brand-new untested file (a small new file barely
> moves the project total) — so this is a convention, not something CI guarantees.
> Start from the template: `packages/database/src/models/__tests__/_test_template.ts`.
## Two test environments: client-db vs server-db
`getTestDB()` (`src/core/getTestDB.ts`) returns different engines based on the
`TEST_SERVER_DB` env var:
| Mode | Engine | When | Notes |
| ----------------------- | ----------------------------------- | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **client-db** (default) | PGlite (in-memory) | `bunx vitest run` | Migration runner **skips any SQL containing `pg_search` / `bm25`** — the ParadeDB BM25 `@@@` operator does not exist here. |
| **server-db** | node-postgres → `DATABASE_TEST_URL` | `TEST_SERVER_DB=1` | CI uses the `paradedb/paradedb` image (has `pg_search`). **Coverage is measured in this mode** (`test:coverage``vitest.config.server.mts`, uploaded to Codecov). |
```bash
# 1. Client environment (fast)
cd packages/database && TEST_SERVER_DB=0 bunx vitest run --silent='passed-only' '[file]'
# 1. Client environment (fast, default — what most local runs use)
cd packages/database && bunx vitest run --silent='passed-only' '[file]'
# 2. Server environment (compatibility)
# 2. Server environment (BM25 / pg_search / pgvector parity, needs DATABASE_TEST_URL)
cd packages/database && TEST_SERVER_DB=1 bunx vitest run --silent='passed-only' '[file]'
```
## User Permission Check - Security First 🔒
Implication: client-db coverage **under-counts** any code that needs BM25 (e.g.
`repositories/search/index.ts` reads near-0% locally but is fully covered in CI).
Don't chase those lines locally — confirm via CI/Codecov.
**Critical security requirement**: All user data operations must include permission checks.
## BM25 / full-text search → `describe.skipIf(!isServerDB)`
Any method using the BM25 `@@@` operator or `sanitizeBm25` (keyword search:
`queryByKeyword`, `searchAgents`, userMemory lexical search, …) **throws under
PGlite** (often swallowed by a `catch` that returns `[]`, so the test silently
fails with empty results). Guard those blocks so they only run in server-db:
```typescript
// ❌ DANGEROUS: Missing permission check
update = async (id: string, data: Partial<MyModel>) => {
return this.db
.update(myTable)
.set(data)
.where(eq(myTable.id, id)) // Only checks ID
.returning();
};
// ✅ SECURE: Permission check included
update = async (id: string, data: Partial<MyModel>) => {
return this.db
.update(myTable)
.set(data)
.where(
and(
eq(myTable.id, id),
eq(myTable.userId, this.userId), // ✅ Permission check
),
)
.returning();
};
```
## Test File Structure
```typescript
// @vitest-environment node
describe('MyModel', () => {
describe('create', () => {
/* ... */
});
describe('queryAll', () => {
/* ... */
});
describe('update', () => {
it('should update own records');
it('should NOT update other users records'); // 🔒 Security
});
describe('delete', () => {
it('should delete own records');
it('should NOT delete other users records'); // 🔒 Security
});
describe('user isolation', () => {
it('should enforce user data isolation'); // 🔒 Core security
});
// BM25 search requires the pg_search extension (ParadeDB), not available in PGlite
const isServerDB = process.env.TEST_SERVER_DB === '1';
describe.skipIf(!isServerDB)('queryByKeyword', () => {
/* ... */
});
```
## Security Test Example
Convention already used in `session.test.ts`, `topic.query.test.ts`,
`message.query.test.ts`, `home/index.test.ts`, `repositories/search/index.test.ts`.
## Setup boilerplate
Top-of-file pattern (see `_test_template.ts` for the full version). Use real DB
integration via `getTestDB()`**not a mocked `vi.fn()` db**; the integration
style exercises real SQL and gives far deeper coverage.
```typescript
it('should not update records of other users', async () => {
const [otherUserRecord] = await serverDB
.insert(myTable)
.values({ userId: 'other-user', data: 'original' })
.returning();
import { eq } from 'drizzle-orm';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
const result = await myModel.update(otherUserRecord.id, { data: 'hacked' });
import { getTestDB } from '../../core/getTestDB';
import { users } from '../../schemas';
import type { LobeChatDatabase } from '../../type';
import { MyModel } from '../myModel';
expect(result).toBeUndefined();
const unchanged = await serverDB.query.myTable.findFirst({
where: eq(myTable.id, otherUserRecord.id),
});
expect(unchanged?.data).toBe('original');
});
```
const serverDB: LobeChatDatabase = await getTestDB(); // top-level await is fine
## Data Management
```typescript
const userId = 'test-user';
const userId = 'my-model-test-user';
const otherUserId = 'other-user';
const myModel = new MyModel(serverDB, userId);
beforeEach(async () => {
await serverDB.delete(users);
@@ -97,40 +76,99 @@ beforeEach(async () => {
});
afterEach(async () => {
await serverDB.delete(users);
await serverDB.delete(users); // cascades to user-scoped rows
});
```
## Foreign Key Handling
Some tests need the Node environment (pgvector, server-only deps) — add
`// @vitest-environment node` as the first line when required.
## User permission check — security first 🔒
**Every user-data operation must be ownership-scoped.** Always add a test proving
another user cannot read/update/delete the row.
```typescript
// ❌ Wrong: Invalid foreign key
// ✅ SECURE: ownership in the WHERE clause
update = async (id: string, data: Partial<MyModel>) =>
this.db
.update(myTable)
.set(data)
.where(and(eq(myTable.id, id), eq(myTable.userId, this.userId)))
.returning();
```
```typescript
it('should NOT update another user's record', async () => {
const otherModel = new MyModel(serverDB, otherUserId);
const [row] = await otherModel.create({ data: 'original' });
await myModel.update(row.id, { data: 'hacked' });
const unchanged = await serverDB.query.myTable.findFirst({
where: eq(myTable.id, row.id),
});
expect(unchanged?.data).toBe('original');
});
```
## What to cover
Aim each model/repository as close to 100% as practical (excluding BM25):
- Every public method
- Both branches of conditionals; empty-list / `if (!x) return []` early returns
- Error fallbacks (e.g. decrypt/JSON-parse failure → `null`)
- Filters, pagination, ordering branches
- Ownership / user isolation, and workspace scoping if the model takes a `workspaceId`
## Schema gotchas (real traps that fail inserts or types)
- **`workspaces`** requires `{ id, name, slug, primaryOwnerId }` and has **no
`userId` column** — `insert(workspaces).values({ id, name, slug, primaryOwnerId })`.
- **uuid columns**: a "not found" test must pass a _valid_ UUID
(`'00000000-0000-0000-0000-000000000000'`); a random string raises a `22P02`
DB error instead of returning `undefined`/`null`.
- **Enum / `$type` columns** are type-checked: e.g. `files.source` is a
`FileSource` enum (`image_generation` | `page-editor` | `video_generation`),
not free text — passing `'upload'` is a type error.
- Read the table's schema in `src/schemas/` for `notNull` columns **without
defaults**; you must supply those on insert.
## Foreign key handling
```typescript
// ❌ Wrong: invalid foreign key
const testData = { asyncTaskId: 'invalid-uuid', fileId: 'non-existent' };
// ✅ Correct: Use null
// ✅ Use null
const testData = { asyncTaskId: null, fileId: null };
// ✅ Or: Create referenced record first
beforeEach(async () => {
const [asyncTask] = await serverDB
.insert(asyncTasks)
.values({ id: 'valid-id', status: 'pending' })
.returning();
testData.asyncTaskId = asyncTask.id;
});
// ✅ … or create the referenced row first
const [asyncTask] = await serverDB.insert(asyncTasks).values({ status: 'pending' }).returning();
testData.asyncTaskId = asyncTask.id;
```
## Predictable Sorting
## Predictable sorting
```typescript
// ✅ Use explicit timestamps
const oldDate = new Date('2024-01-01T10:00:00Z');
const newDate = new Date('2024-01-02T10:00:00Z');
// ✅ Use explicit timestamps — never rely on insert order
await serverDB.insert(table).values([
{ ...data1, createdAt: oldDate },
{ ...data2, createdAt: newDate },
{ ...data1, createdAt: new Date('2024-01-01T10:00:00Z') },
{ ...data2, createdAt: new Date('2024-01-02T10:00:00Z') },
]);
// ❌ Don't rely on insert order
await serverDB.insert(table).values([data1, data2]); // Unpredictable
```
## Checking coverage of one file
```bash
# Per-file coverage; read the "Uncovered Line #s" column to find gaps
cd packages/database
bunx vitest run --coverage --silent='passed-only' '[test-file]' 2>&1 | grep '[sourceFile].ts'
```
## Before finishing
1. Tests pass: `bunx vitest run --silent='passed-only' '[file]'`
2. Types pass: `bun run type-check` (vitest uses esbuild and does **not**
type-check — a green test run can still have type errors).
+4 -4
View File
@@ -1,6 +1,6 @@
---
name: trpc-router
description: 'TRPC router development guide. Use when creating or modifying src/server/routers, adding procedures, or implementing server-side API endpoints.'
description: 'TRPC router development guide. Use when creating or modifying apps/server/src/routers, adding procedures, or implementing server-side API endpoints.'
user-invocable: false
---
@@ -8,9 +8,9 @@ user-invocable: false
## File Location
- Routers: `src/server/routers/lambda/<domain>.ts`
- Helpers: `src/server/routers/lambda/_helpers/`
- Schemas: `src/server/routers/lambda/_schema/`
- Routers: `apps/server/src/routers/lambda/<domain>.ts`
- Helpers: `apps/server/src/routers/lambda/_helpers/`
- Schemas: `apps/server/src/routers/lambda/_schema/`
## Router Structure
+1 -1
View File
@@ -186,4 +186,4 @@ QSTASH_URL=https://custom-qstash.com
- [Upstash Workflow Documentation](https://upstash.com/docs/workflow)
- [QStash Documentation](https://upstash.com/docs/qstash)
- [Example Workflows in Codebase](<../../src/app/(backend)/api/workflows/>)
- [Workflow Classes](../../src/server/workflows/)
- [Workflow Classes](../../apps/server/src/workflows/)
@@ -177,7 +177,7 @@ This allows cloud to override specific modules while using lobehub defaults.
Place workflow class in cloud:
```text
lobehub-cloud/src/server/workflows/featureName/index.ts
lobehub-cloud/apps/server/src/workflows/featureName/index.ts
```
### Shared Workflows
@@ -185,7 +185,7 @@ lobehub-cloud/src/server/workflows/featureName/index.ts
Place workflow class in lobehub, re-export in cloud if needed:
```text
lobehub/src/server/workflows/featureName/index.ts
lobehub/apps/server/src/workflows/featureName/index.ts
```
---
@@ -294,8 +294,8 @@ export { POST } from 'lobehub/src/app/(backend)/api/workflows/feature/*/route';
**Step 4**: Move workflow class to lobehub
```bash
mv lobehub-cloud/src/server/workflows/feature \
lobehub/src/server/workflows/
mv lobehub-cloud/apps/server/src/workflows/feature \
lobehub/apps/server/src/workflows/
```
**Step 5**: Update cloud imports
@@ -305,7 +305,7 @@ mv lobehub-cloud/src/server/workflows/feature \
import { Workflow } from '@/server/workflows/feature';
// To
import { Workflow } from 'lobehub/src/server/workflows/feature';
import { Workflow } from 'lobehub/apps/server/src/workflows/feature';
```
---
@@ -326,7 +326,7 @@ lobehub-cloud/
│ ├── process-users/route.ts
│ ├── paginate-users/route.ts
│ └── generate-user/route.ts
└── src/server/workflows/welcomePlaceholder/
└── apps/server/src/workflows/welcomePlaceholder/
└── index.ts
```
@@ -4,7 +4,7 @@ Full code templates for the 3-layer architecture. Read this when actually writin
## Table of Contents
1. [Workflow Class](#workflow-class) — `src/server/workflows/{workflowName}/index.ts`
1. [Workflow Class](#workflow-class) — `apps/server/src/workflows/{workflowName}/index.ts`
2. [Layer 1: Entry Point](#layer-1-entry-point-process-) — `process-*` route
3. [Layer 2: Pagination](#layer-2-pagination-paginate-) — `paginate-*` route
4. [Layer 3: Execution](#layer-3-execution-execute--generate-) — `execute-*` / `generate-*` route
@@ -13,7 +13,7 @@ Full code templates for the 3-layer architecture. Read this when actually writin
## Workflow Class
**Location:** `src/server/workflows/{workflowName}/index.ts`
**Location:** `apps/server/src/workflows/{workflowName}/index.ts`
```typescript
import { Client } from '@upstash/workflow';
+283
View File
@@ -0,0 +1,283 @@
---
name: ux
description: 'LobeHub product design values / principles / checklists. Load this skill whenever the work touches user-interface features or implementation — designing or building any user-facing flow — to get better UX results.'
user-invocable: false
---
# UX — Design Values & Execution Checklists
How LobeHub products should feel, and concrete rules to get there. Use this when
**building or reviewing** any user-facing flow. For component/styling choices see
**react**, for wording see **microcopy**, for imperative modal wiring see **modal**.
## Design values
LobeHub follows four product design values — **Natural・Meaningful・Certainty・
Growth**. Read them before designing:
**[references/design-values.md](references/design-values.md)** (definitions +
conflict priority).
> The checklists below are the execution layer. Each item is tagged with the
> value(s) it serves; for what those values mean, see the file above.
## How this is organized
The checklists are grouped by **interaction type** — the kind of thing the user
is doing. Jump to the module that matches the surface you're building (reading a
list, editing content, running an action, …); each module collects the rules
specific to that interaction. The same surface often spans several modules (an
editable list is Read + Edit + Act) — walk each that applies.
---
## 1. Read — viewing data & lists
Any surface that **displays** records, lists, or detail. Covers the states a data
view can be in, behavior at scale, and keeping the user's place visible.
### 1.1 Data states: empty / loading / error・Meaningful・Certainty
Every data surface has **four** states — design all of them, not just "has data".
- [ ] **Empty state is a purpose-built page, not a blank screen.** It explains what
this is, why it's empty, and gives a clear next action (CTA + value props).
✅ Devices: an empty "Connect your first device" page with primary/secondary
connect paths and "what you can do once connected" cards — ❌ not a bare title
over skeleton rows or a blank body. _(Meaningful)_
- [ ] **Distinguish the empty variants** — "no data yet" (onboarding CTA) vs
"no match for filters" (clear-filters affordance) are different screens. _(Certainty)_
- [ ] **Loading state** designed (skeleton / NeuralNetworkLoading), not a flash of
blank or layout shift. _(Natural)_
- [ ] **Error state** designed — surface the reason and a retry/back path. _(Meaningful)_
### 1.2 Lists at scale・Certainty・Natural
A list/data page must be designed for its **whole range of sizes**, not just the
demo data.
- [ ] **Walk the scale: 1 / 2 / 5 / 20 / 100 / 1k10k rows.** Pick the right
mechanism per range — plain render → load-more / pagination → virtual scroll;
add batch-select / bulk actions once counts get large. _(Certainty)_
- [ ] **Co-design empty / loading / error with the data state** (see §1.1). A list
isn't done until all four render well. _(Natural)_
### 1.3 Selection visibility in scrolled lists・Certainty・Natural
A capped / scrollable / virtualized list mounts at `scrollTop = 0`. If the
active item sits below the fold, the user lands on a valid selection that is
**off-screen** — and reads it as "nothing is selected" or a broken page. Any
list that can open with a pre-selected item must **scroll that item into view**.
This is an easy case to miss: it only shows up once the list is long enough and
the selection is restored rather than freshly clicked.
- [ ] **Scroll the active item into view on mount / restore.** When the selection
is restored from a URL query, deep link, or persisted state (not a fresh
click), bring it into view — the container starts at the top otherwise. ✅
The nested thread list is capped to \~9 rows; a thread restored from
`?thread=` below the fold is scrolled into view on mount. _(Certainty)_
- [ ] **Hardest when the selection has no other anchor.** If the parent/container
row isn't highlighted while a child is active (no breadcrumb, no header
echo), an off-screen active row means **zero** visible feedback — design
for exactly this case. _(Meaningful)_
- [ ] **Use `block: 'nearest'` (or equivalent).** Only scroll when the row is
actually off-screen; an already-visible selection must not jump. _(Natural)_
- [ ] **Re-run once async rows mount.** The active id is usually known before the
list finishes loading; key the scroll off a list-ready signal (e.g. row
count), not only off the id, so a restored selection still lands when the
data arrives. _(Certainty)_
- [ ] **Mirror it across duplicated list variants** so the behavior can't regress
in just one (e.g. parallel agent / group lists). _(Certainty)_
### 1.4 Option visibility in pickers・Certainty・Meaningful
- [ ] **Pickers list every valid target.** Watch for options dropped by backend
list queries (pagination, `virtual` flags, scope filters) and add them back.
✅ The default "LobeAI" (inbox) agent is `virtual` and excluded from the
sidebar list, so the move picker re-adds it. An empty picker must mean
"genuinely none", never "we filtered out the only option". _(Meaningful)_
---
## 2. Edit — entering & changing content
Any surface where the user **types or edits**. Input is expensive effort; the
overriding rule is **never lose it**.
### 2.1 Protect in-progress edits・Certainty・Meaningful
Typed / edited content is real user effort; losing it is one of the most
infuriating outcomes a product can produce. Whenever an editor holds unsaved
input, assume the exit can be **accidental** — a misclick, a refresh, a crash, a
navigation, a failed save — and build a safety net: back the draft up locally and
recover it.
- [ ] **Back up the draft locally as the user types.** Persist to
localStorage / IndexedDB / store so a refresh, crash, accidental close, or
navigation doesn't vaporize the content. _(Certainty)_
- [ ] **Restore on return.** Coming back to the same editing context auto-restores
(or offers to restore) the unsaved draft, rather than showing a blank field. _(Meaningful)_
- [ ] **Guard destructive exits.** Closing / navigating / switching items away
from a dirty editor warns or auto-saves — never silently discards. _(Certainty)_
- [ ] **Survive a failed save.** If the save errors, keep the user's content in
the field / draft and let them retry; never clear the input on failure. _(Meaningful)_
- [ ] **Scope the draft to its target** (per topic / message / item id) so drafts
don't bleed across entities or resurrect on the wrong item. _(Certainty)_
---
## 3. Act — operations, flows & buttons
Any surface where the user **performs an action** — a single op, a bulk op, or a
multi-step flow. Covers momentum, focus, and full entity lifecycle.
### 3.1 Flow & momentum・Natural・Meaningful
Every action chain must **push the user forward**, never dead-end or block the flow.
- [ ] **Forward momentum** — after any operation, lead the user to the next step,
don't just stop. _(Meaningful)_
- [ ] **Success state = primary "go to result", secondary "dismiss"** — the strong
button is the forward action (take me to the result); "Done" is the weak/
secondary button. ✅ After moving topics: primary = "Go to «target»", secondary
\= "Done". _(Meaningful・Natural)_
- [ ] **Bulk ⇄ single-item parity** — an action on a multi-select toolbar must also
be reachable on a single item (its context menu), and vice versa. _(Certainty)_
- [ ] **Confirm → in-progress → done, in one surface** — bulk/irreversible/async
ops use a modal state machine: a confirm step stating exactly what happens →
an in-progress view with **dismissal locked** → a done (or error) view in the
same modal. Never fire-and-forget with only a toast; never leave a dead
spinner. _(Certainty・Meaningful)_
### 3.2 One primary button per surface・Certainty
- [ ] **One primary button per surface.** The single primary CTA tells the user the
core action; everything else is secondary/tertiary. Never a pile of primary
buttons competing for attention. _(Certainty)_
### 3.3 Entity lifecycle completeness・Meaningful・Certainty
The recurring trap: a feature ships only the **display** of a list, but edit /
delete / management are never built — so the user can add something and then be
stuck with it. For every entity a user can see, design its **full lifecycle**:
create / read / update / delete, plus state transitions (enable/disable,
connect/disconnect, install/uninstall). A read-only list the user can't manage
breaks the flow.
**The allowed operation set depends on the entity's source / ownership** — decide
it explicitly _before_ building. Worked example, the tools/connectors list:
| Entity class | Add | Edit | Remove |
| ----------------------------------- | ------- | --------- | ------------------ |
| Official / built-in (skills, tools) | — | — | ✗ not removable |
| Community (installed MCP) | install | configure | uninstall / remove |
| User-custom (custom connector) | create | edit | delete |
- [ ] **No display-only features.** For every listed entity, enumerate CRUD +
lifecycle ops and build the ones that apply. _(Meaningful)_
- [ ] **Operation set per source/ownership class** — built-in may be read-only;
anything the user _installed_ must be removable; anything the user _created_
must be editable **and** deletable. _(Certainty)_
- [ ] **Each item exposes its allowed ops** (hover action / context menu / detail
page), and there's a clear entry point to add/create where applicable. _(Natural)_
- [ ] **An intentionally-absent op is a documented decision, not an oversight**
(e.g. official tools can't be deleted — by design). _(Certainty)_
---
## 4. Feedback — loading & system response
How the product **answers back** while and after the user acts — loading visuals
and proactive guardrails.
### 4.1 Loading visuals・Natural
**Never use antd `Spin`** — it doesn't match the product's loading visual. Use a
project loader:
| Need | Component |
| --------------------------- | ----------------------------------------------------------------------------- |
| Default loading (in-flight) | `NeuralNetworkLoading` from `@/components/NeuralNetworkLoading` (`size` prop) |
| Inline dots | `DotsLoading` / `BubblesLoading` from `@/components` |
| Branded full-page | `Loading` from `@/components/Loading/BrandTextLoading` |
| List / card placeholder | a skeleton (e.g. `SkeletonList`) |
When in doubt, reach for `NeuralNetworkLoading` — it's the default in-flight
indicator (e.g. modal "in progress" states).
### 4.2 Capability-gated features・Certainty・Meaningful
A feature can be fully built and still produce a broken result when the selected
model — or its still-loading config — **can't deliver the capability the feature
depends on** (for example, an agentic run on a model without tool calling). This
is usually the user's configuration choice, not a defect; but if the product stays
silent the user reads it as the product being broken. When a feature's success
depends on a capability the current config may lack, the product owes a
**proactive, non-blocking reminder** — a guardrail, not a gate.
- [ ] **Surface the mismatch, don't fail silently.** When a feature needs a model
capability (tool calling, vision, reasoning, long context) the current model
lacks, show a soft inline warning at the point of action — never a hard block
or a modal that stops the user. _(Meaningful)_
- [ ] **Stay reactive.** The reminder clears the moment the user switches to a
capable model — derive it from live state, not a one-shot check. _(Natural)_
- [ ] **Don't warn while config is loading.** A capability that hasn't resolved yet
looks "unsupported"; warning then is a false alarm — exactly the glitch users
mistake for a product bug. Warn only on a _resolved_ unsupported state. _(Certainty)_
- [ ] **Scope to the mode that needs it.** Show only when the capability-dependent
mode is on; one reminder per root cause, never a pile of overlapping notices. _(Natural・Certainty)_
- [ ] **State the problem and the remedy.** The copy says what's wrong _and_ what
the user should do about it. _(Meaningful)_
---
## 5. Grow — discoverability & progressive disclosure
How the product **deepens** as the user's needs deepen.
### 5.1 Progressive disclosure・Growth
The product should grow with the user — deeper power shows up as needs deepen.
- [ ] **Progressive disclosure** — keep the novice path clean; reveal advanced
capabilities as the user gets there, don't dump everything at once. _(Growth・Natural)_
- [ ] **Surface related actions at the moment of need** — make the next capability
discoverable in context (e.g. after the first item exists, offer what to do
with it), not buried in a far-off menu. _(Growth・Meaningful)_
---
## Quick review checklist
**Read — viewing data & lists**
- [ ] Empty / loading / error states are all designed; empty is a real page with a CTA.
- [ ] List designed across 1 → 10k rows (virtual scroll / pagination / batch as needed).
- [ ] Capped/scrollable/virtualized list scrolls the restored active item into view on mount (`block: 'nearest'`, re-run after async rows mount).
- [ ] Pickers show all valid targets (default/inbox included); empty = truly none.
**Edit — entering & changing content**
- [ ] Editors back up in-progress input locally and recover it after refresh/crash/failed-save; destructive exits warn, never silently discard.
**Act — operations, flows & buttons**
- [ ] Action leads the user forward; success offers a primary "go to result".
- [ ] Bulk action has a single-item entry (and vice versa).
- [ ] Async/bulk/irreversible action: confirm → in-progress (locked) → done/error.
- [ ] Exactly one primary button per surface.
- [ ] Listed entities have their full lifecycle (not display-only); ops match source (built-in / installed / custom).
**Feedback — loading & system response**
- [ ] No antd `Spin`; use `NeuralNetworkLoading` / project loaders.
- [ ] Capability-gated feature warns (soft, reactive, load-gated) when the model can't deliver it; copy gives the remedy.
**Grow — discoverability & progressive disclosure**
- [ ] Advanced capability is progressively disclosed / discoverable at the moment of need.
## Related skills
- **modal** — imperative `createModal` state-machine wiring for confirm/progress/done.
- **microcopy** — wording for confirm / done / empty / error states.
- **react** — component priority, `Button` usage, styling.
@@ -0,0 +1,51 @@
# LobeHub Design Values (设计价值观)
The philosophy behind every LobeHub interface. Read this before designing or
reviewing a flow; the per-aspect execution rules live in the parent
[SKILL.md](../SKILL.md) and each checklist item is tagged with the value(s) it serves.
Adapted from Ant Design's design values
(<https://ant.design/docs/spec/values-cn>, <https://zhuanlan.zhihu.com/p/44809866>).
LobeHub adopts all four.
## 自然 (Natural)
Minimise cognitive load. Digital products keep getting more complex while human
attention stays scarce — so the interface should feel as effortless as the
physical world. The next step should be obvious without thinking; the product
proactively carries the user forward (sensible defaults, AI-assisted decisions,
smooth transitions) rather than making them stop and figure things out.
## 意义感 (Meaningful)
Every screen is rooted in the user's real goal, not an isolated feature. Make the
objective clear, give immediate feedback on the result of each action, and always
point at the next meaningful step. Calibrate difficulty — neither a patronising
over-simplification nor an overwhelming wall — so the user keeps a sense of
progress and accomplishment.
## 确定性 (Certainty)
Low-entropy, predictable interactions. Reuse the same patterns, components, and
wording so behaviour is never surprising. Keep a single clear focus per surface,
and design **every** state (empty / loading / error / success) so nothing is left
undefined. Restraint over cleverness: fewer, consistent rules beat many bespoke
ones.
## 生长性 (Growth)
The product grows together with the user. As needs deepen and roles evolve,
surface advanced capabilities progressively and make related features
discoverable at the moment they become relevant — without crowding the novice
path. Bridge product value to the user's changing scenarios and aim for
humanmachine symbiosis (人机共生): the user and the agent co-evolve, each making
the other more capable over time.
## Priority when values conflict
For moment-to-moment interaction decisions: **意义感 ≳ 自然 > 确定性** — never
sacrifice the user's goal or forward momentum just to keep things uniform.
**生长性 (Growth)** is a longer-horizon lens: weigh it when shaping how a feature
is discovered and how it scales with the user, not when resolving a single-screen
layout trade-off.
@@ -49,4 +49,4 @@ Migration owner: @{pr-author}
The migration owner is responsible for rollout follow-up and incident handling for this schema change.
> **Note for Claude**: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'` or from commit metadata. Do not hardcode a username.
> \[!NOTE]: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'` or from commit metadata. Do not hardcode a username.
@@ -18,4 +18,4 @@
@{pr-author}
> **Note for Claude**: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'`. Do not hardcode a username.
> \[!NOTE]: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'`. Do not hardcode a username.
@@ -86,7 +86,7 @@ New AI model or provider support, typically contributed via community PRs.
- These PR title prefixes (`feat` / `style`) are in the auto-tag trigger list
- No special branch naming or manual release steps required — merging the PR triggers auto patch +1
### When Claude is involved
### When an agent is involved
If asked to add model support, just create a normal feature PR. The title prefix will trigger the release automatically.
+4 -40
View File
@@ -177,29 +177,12 @@ export const chatGroupAction: StateCreator<
### Slices That Don't Currently Need `set`
When a slice doesn't write local state at the moment — e.g. it reads context
from `#get()` and forwards calls to another store, or just runs hooks — drop
the `#set` field. Otherwise ESLint's `no-unused-vars` flags the unused private
field.
Mark the constructor's `set` param as `_set` and `void _set` it to keep the
`(set, get, api)` shape aligned with `StateCreator`. This is **a snapshot of
the current need, not a permanent contract** — if a later change needs `set`,
restore the `#set` field and use it; do not invent a workaround to keep the
"unused" form.
When a slice doesn't write local state (e.g. it delegates to another store or just runs hooks), drop `#set` and mark the constructor param as `_set` with `void _set` to keep the `(set, get, api)` shape:
```ts
type Setter = StoreSetter<ConversationStore>;
export const toolSlice = (set: Setter, get: () => ConversationStore, _api?: unknown) =>
new ToolActionImpl(set, get, _api);
export class ToolActionImpl {
readonly #get: () => ConversationStore;
// Mark unused params with `_` prefix and `void _x` so the constructor still
// matches StateCreator's `(set, get, api)` shape without triggering unused
// diagnostics.
constructor(_set: Setter, get: () => ConversationStore, _api?: unknown) {
void _set;
void _api;
@@ -212,27 +195,8 @@ export class ToolActionImpl {
hooks.onToolCallComplete?.(id, undefined);
};
}
export type ToolAction = Pick<ToolActionImpl, keyof ToolActionImpl>;
```
Rules of thumb:
- If a slice doesn't currently call `set`, drop `#set` (use `_set` + `void _set`
in the constructor). When a later edit needs `set`, restore `#set` and use it.
- Don't add `setNamespace` for slices that don't write state. Add it when the
slice starts writing state.
- Never leave `#set` declared but unused "for future use" — lint will fail and
re-adding it later costs nothing.
### Do / Don't
- **Do**: keep constructor signature aligned with `StateCreator` params `(set, get, api)`.
- **Do**: use `#private` to avoid `set/get` being exposed.
- **Do**: use `flattenActions` instead of spreading class instances.
- **Do**: drop `#set` (and use `_set` + `void _set` in the constructor) for
delegate-only slices that never write state — keeps lint green without
breaking the `(set, get, api)` shape.
- **Don't**: keep both old slice objects and class actions active at the same time.
- **Don't**: keep an unused `#set` field "for future use" — it fails ESLint and
re-adding it later costs nothing.
- Drop `#set` when unused; restore it when a later edit needs `set` — re-adding costs nothing.
- Don't add `setNamespace` for slices that don't write state.
- Don't keep both old slice objects and class actions active at the same time during migration.
+2 -1
View File
@@ -1,6 +1,7 @@
# Add directories or file patterns to ignore during indexing (e.g. foo/ or *.csv)
locales/
apps/desktop/resources/locales/
**/__snapshots__/
**/fixtures/
src/database/migrations/
packages/database/migrations/
+42 -5
View File
@@ -223,6 +223,29 @@ OPENAI_API_KEY=sk-xxxxxxxxx
# The LobeChat agents market index url
# AGENTS_INDEX_URL=https://chat-agents.lobehub.com
# #######################################
# ######### Cloud Sandbox Service #######
# #######################################
# Sandbox provider for built-in code execution, shell, file operations, and export.
# Supported values: market, onlyboxes
# SANDBOX_PROVIDER=market
# Required when SANDBOX_PROVIDER=onlyboxes. Base URL of the Onlyboxes console API, without /api/v1.
# ONLYBOXES_BASE_URL=https://onlyboxes.example.com
# Required when SANDBOX_PROVIDER=onlyboxes. Must match Onlyboxes CONSOLE_JIT_SIGNING_KEY.
# ONLYBOXES_JIT_SIGNING_KEY=onlyboxes-jit-signing-secret
# Optional JIT token issuer. Defaults to APP_URL.
# ONLYBOXES_JIT_ISSUER=https://lobehub.example.com
# Optional JIT token TTL in seconds.
# ONLYBOXES_JIT_TTL_SEC=1800
# Optional terminal session lease in seconds for the Onlyboxes provider.
# ONLYBOXES_LEASE_TTL_SEC=900
# #######################################
# ########### Plugin Service ############
# #######################################
@@ -376,6 +399,11 @@ OPENAI_API_KEY=sk-xxxxxxxxx
# Postgres database URL
# DATABASE_URL=postgres://username:password@host:port/database
# Optional: server-side timeout (in milliseconds) for a single SQL statement.
# When set, Postgres aborts any statement/idle transaction exceeding it, so a stuck
# query can't block indefinitely. Leave unset to keep Postgres' default of no timeout.
# DATABASE_STATEMENT_TIMEOUT=300000
# use `openssl rand -base64 32` to generate a key for the encryption of the database
# we use this key to encrypt the user api key and proxy url
# KEY_VAULTS_SECRET=xxxxx/xxxxxxxxxxxxxx=
@@ -397,14 +425,14 @@ OPENAI_API_KEY=sk-xxxxxxxxx
# MCP_TOOL_TIMEOUT=60000
# #######################################
# ######### Klavis Service ##############
# ######### Composio Service ############
# #######################################
# Klavis API Key for accessing Strata hosted MCP servers
# Get your API key from: https://klavis.io
# Composio API Key for accessing hosted integrations (Gmail, Slack, etc.)
# Get your API key from: https://composio.dev
# IMPORTANT: This key is stored server-side only and NEVER exposed to the client
# When this key is set, Klavis integration will be automatically enabled
# KLAVIS_API_KEY=your_klavis_api_key_here
# When this key is set, Composio integration will be automatically enabled
# COMPOSIO_API_KEY=your_composio_api_key_here
# #######################################
# #### Message Gateway (IM Integration) ##
@@ -417,6 +445,15 @@ OPENAI_API_KEY=sk-xxxxxxxxx
# MESSAGE_GATEWAY_URL=https://message-gateway.lobehub.com
# MESSAGE_GATEWAY_SERVICE_TOKEN=your_service_token_here
# #######################################
# ######### Agent Gateway Mode ##########
# #######################################
# Enable Gateway Mode for self-hosted deployments. Requires AGENT_GATEWAY_URL.
# ENABLE_AGENT_GATEWAY=1
# AGENT_GATEWAY_URL=https://agent-gateway.example.com
# AGENT_GATEWAY_SERVICE_TOKEN=your_service_token_here
# #######################################
# ########### Messenger Bot #############
# #######################################
+89 -2
View File
@@ -5,6 +5,18 @@ inputs:
node-version:
description: Node.js version
required: true
cloud-repository:
description: Cloud repository to overlay for commercial desktop builds
required: false
default: lobehub/lobehub-cloud
cloud-ref:
description: Optional Cloud repository ref
required: false
default: ''
cloud-token:
description: GitHub token with permission to read the Cloud repository
required: false
default: ''
runs:
using: composite
@@ -14,9 +26,77 @@ runs:
with:
node-version: ${{ inputs.node-version }}
- name: Overlay Cloud repository for desktop build
if: inputs.cloud-token != ''
shell: bash
env:
CLOUD_CHECKOUT: ${{ runner.temp }}/lobehub-cloud
CLOUD_REF: ${{ inputs.cloud-ref }}
CLOUD_REPOSITORY: ${{ inputs.cloud-repository }}
CLOUD_ROOT: ${{ github.workspace }}/..
CLOUD_TOKEN: ${{ inputs.cloud-token }}
run: |
set -euo pipefail
cloud_root="$(cd "$GITHUB_WORKSPACE/.." && pwd)"
cloud_checkout="$RUNNER_TEMP/lobehub-cloud"
rm -rf "$cloud_checkout"
clone_args=(--depth 1)
if [ -n "$CLOUD_REF" ]; then
clone_args+=(--branch "$CLOUD_REF")
fi
git clone "${clone_args[@]}" "https://x-access-token:${CLOUD_TOKEN}@github.com/${CLOUD_REPOSITORY}.git" "$cloud_checkout"
node <<'NODE'
const fs = require('node:fs');
const path = require('node:path');
const source = process.env.CLOUD_CHECKOUT;
const target = process.env.CLOUD_ROOT;
const skip = new Set(['.git', 'lobehub', 'node_modules']);
const copy = (from, to) => {
const stat = fs.lstatSync(from);
if (stat.isSymbolicLink()) {
const link = fs.readlinkSync(from);
fs.rmSync(to, { force: true, recursive: true });
fs.symlinkSync(link, to);
return;
}
if (stat.isDirectory()) {
fs.mkdirSync(to, { recursive: true });
for (const entry of fs.readdirSync(from)) {
if (skip.has(entry)) continue;
copy(path.join(from, entry), path.join(to, entry));
}
return;
}
fs.mkdirSync(path.dirname(to), { recursive: true });
fs.copyFileSync(from, to);
};
for (const entry of fs.readdirSync(source)) {
if (skip.has(entry)) continue;
copy(path.join(source, entry), path.join(target, entry));
}
NODE
echo "CLOUD_DESKTOP=1" >> "$GITHUB_ENV"
echo "✅ Cloud repository overlaid at $cloud_root"
- name: Install dependencies
shell: bash
run: pnpm install --node-linker=hoisted
run: |
set -euo pipefail
if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
cd ..
fi
pnpm install --node-linker=hoisted
# 移除国内 electron 镜像配置,GitHub Actions 使用官方源更快
- name: Remove China electron mirror from .npmrc
@@ -31,4 +111,11 @@ runs:
- name: Install deps on Desktop
shell: bash
run: npm run install-isolated --prefix=./apps/desktop
run: |
set -euo pipefail
if [ "${CLOUD_DESKTOP:-}" = "1" ]; then
cd ..
npm run install-isolated --prefix=./lobehub/apps/desktop
else
npm run install-isolated --prefix=./apps/desktop
fi
+1 -1
View File
@@ -30,7 +30,7 @@ jobs:
This issue is closed, If you have any questions, you can comment and reply.
- name: Checkout repository
if: github.event_name == 'pull_request_target' && github.event.pull_request.merged == true
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: Check if PR author is maintainer
if: github.event.pull_request.merged == true
@@ -104,6 +104,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -172,6 +173,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -216,6 +218,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+2 -1
View File
@@ -54,7 +54,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.11.1
node-version: 24.16.0
package-manager-cache: false
# 主要逻辑:确定构建版本号
@@ -92,6 +92,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: 24.11.1
# 设置 package.json 的版本号
@@ -87,6 +87,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+2 -1
View File
@@ -223,6 +223,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
@@ -409,7 +410,7 @@ jobs:
- uses: actions/checkout@v6
- name: Delete old canary GitHub releases
uses: actions/github-script@v7
uses: actions/github-script@v8
with:
script: |
const { data: releases } = await github.rest.repos.listReleases({
@@ -180,6 +180,7 @@ jobs:
- name: Setup build environment
uses: ./.github/actions/desktop-build-setup
with:
cloud-token: ${{ secrets.LOBEHUB_CLOUD_TOKEN }}
node-version: ${{ env.NODE_VERSION }}
- name: Set package version
+2 -2
View File
@@ -28,7 +28,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.11.1
node-version: 24.16.0
- name: Setup pnpm
uses: pnpm/action-setup@v4
@@ -51,7 +51,7 @@ jobs:
- name: Setup Node.js
uses: actions/setup-node@v6
with:
node-version: 24.11.1
node-version: 24.16.0
registry-url: https://registry.npmjs.org
- name: Setup pnpm
+1 -25
View File
@@ -19,12 +19,6 @@ jobs:
steps:
- uses: actions/checkout@v6
- name: Clean issue notice
uses: actions-cool/issues-helper@e361abf610221f09495ad510cb1e69328d839e1c # v3.7.6
with:
actions: 'close-issues'
labels: '🚨 Sync Fail'
- name: Sync upstream changes
id: sync
uses: aormsby/Fork-Sync-With-Upstream-action@v3.4
@@ -33,22 +27,4 @@ jobs:
upstream_sync_branch: main
target_sync_branch: main
target_repo_token: ${{ secrets.GITHUB_TOKEN }} # automatically generated, no need to set
test_mode: false
- name: Sync check
if: failure()
uses: actions-cool/issues-helper@e361abf610221f09495ad510cb1e69328d839e1c # v3.7.6
with:
actions: 'create-issue'
title: '🚨 同步失败 | Sync Fail'
labels: '🚨 Sync Fail'
body: |
Due to a change in the workflow file of the [LobeChat][lobechat] upstream repository, GitHub has automatically suspended the scheduled automatic update. You need to manually sync your fork. Please refer to the detailed [Tutorial][tutorial-en-US] for instructions.
由于 [LobeChat][lobechat] 上游仓库的 workflow 文件变更,导致 GitHub 自动暂停了本次自动更新,你需要手动 Sync Fork 一次,请查看 [详细教程][tutorial-zh-CN]
![](https://github-production-user-asset-6210df.s3.amazonaws.com/17870709/273954625-df80c890-0822-4ac2-95e6-c990785cbed5.png)
[lobechat]: https://github.com/lobehub/lobe-chat
[tutorial-zh-CN]: https://lobehub.com/zh/docs/self-hosting/advanced/upstream-sync
[tutorial-en-US]: https://lobehub.com/docs/self-hosting/advanced/upstream-sync
test_mode: false
+52 -6
View File
@@ -32,7 +32,7 @@ jobs:
runs-on: ubuntu-latest
name: Test Packages
env:
PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/python-interpreter @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/builtin-tool-lobe-agent model-bank'
PACKAGES: '@lobechat/file-loaders @lobechat/prompts @lobechat/model-runtime @lobechat/web-crawler @lobechat/electron-server-ipc @lobechat/utils @lobechat/context-engine @lobechat/agent-runtime @lobechat/conversation-flow @lobechat/ssrf-safe-fetch @lobechat/memory-user-memory @lobechat/types @lobechat/trpc @lobechat/app-config @lobechat/locales @lobechat/env @lobechat/builtin-tool-lobe-agent model-bank @lobechat/agent-gateway-client @lobechat/agent-manager-runtime @lobechat/device-gateway-client @lobechat/device-identity @lobechat/eval-dataset-parser @lobechat/eval-rubric @lobechat/fetch-sse @lobechat/heterogeneous-agents'
steps:
- name: Checkout
@@ -90,11 +90,23 @@ jobs:
for package in $PACKAGES; do
dir="${package#@lobechat/}"
if [ -f "./packages/$dir/coverage/lcov.info" ]; then
echo "Uploading coverage for $dir..."
flag="packages/$dir"
case "$dir" in
builtin-tool-*)
flag="builtin-tools"
;;
locales|env|device-gateway-client)
echo "Skipping Codecov upload for $dir."
continue
;;
esac
echo "Uploading coverage for $dir as $flag..."
./codecov upload-coverage \
$COMMON_ARGS \
--file ./packages/$dir/coverage/lcov.info \
--flag packages/$dir \
--flag "$flag" \
--disable-search
fi
done
@@ -105,8 +117,8 @@ jobs:
if: needs.check-duplicate-run.outputs.should_skip != 'true'
strategy:
matrix:
shard: [1, 2, 3]
name: Test App (shard ${{ matrix.shard }}/3)
shard: [1, 2]
name: Test App (shard ${{ matrix.shard }}/2)
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -126,7 +138,7 @@ jobs:
run: pnpm install
- name: Run tests
run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/3
run: bunx vitest --coverage --silent='passed-only' --reporter=default --reporter=blob --shard=${{ matrix.shard }}/2 --exclude '**/apps/server/**'
- name: Upload blob report
if: ${{ !cancelled() }}
@@ -219,6 +231,40 @@ jobs:
files: ./apps/desktop/coverage/lcov.info
flags: desktop
test-server:
needs: check-duplicate-run
if: needs.check-duplicate-run.outputs.should_skip != 'true'
name: Test Server
runs-on: ubuntu-latest
steps:
- name: Checkout
env:
REF_SHA: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
REPOSITORY: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name || github.repository }}
run: |
git init .
git remote add origin "https://github.com/${REPOSITORY}.git"
git fetch --no-tags --depth=1 origin "${REF_SHA}"
git checkout --force FETCH_HEAD
- name: Setup environment
uses: ./.github/actions/setup-env
- name: Install deps
run: pnpm install
- name: Test Server Coverage
run: bunx vitest --coverage --silent='passed-only' --reporter=default --coverage.reportsDirectory=./apps/server/coverage --dir apps/server
- name: Upload Server coverage to Codecov
uses: codecov/codecov-action@v5
with:
token: ${{ secrets.CODECOV_TOKEN }}
files: ./apps/server/coverage/lcov.info
flags: server
test-databsae:
needs: check-duplicate-run
if: needs.check-duplicate-run.outputs.should_skip != 'true'
+2 -3
View File
@@ -59,6 +59,7 @@ bun.lockb
# Build outputs
dist/
public/_spa/
public/_spa-auth/
public/spa/
es/
lib/
@@ -92,10 +93,8 @@ public/swe-worker*
# Generated files
src/app/spa/[variants]/[[...path]]/spaHtmlTemplates.ts
src/app/spa-auth/authHtmlTemplate.ts
public/*.js
public/sitemap.xml
public/sitemap-index.xml
sitemap*.xml
robots.txt
# Git hooks
+12 -5
View File
@@ -19,7 +19,7 @@ lobehub/
├── apps/
│ ├── desktop/ # Electron desktop app
│ ├── cli/ # LobeHub CLI
│ └── device-gateway/ # Device gateway service
│ └── server/ # Server service
├── packages/ # Shared packages (@lobechat/*)
│ ├── database/ # Database schemas, models, repositories
│ ├── agent-runtime/ # Agent runtime
@@ -115,14 +115,19 @@ cd packages/database && bunx vitest run --silent='passed-only' '[file]'
```
- Prefer `vi.spyOn` over `vi.mock`
- Tests must pass type check: `bun run type-check`
- After 2 failed fix attempts, stop and ask for help
### Type Checking
```bash
bun run type-check
```
### i18n
- Add keys to a namespace file under `src/locales/default/` (e.g. `agent.ts`, `auth.ts`)
- For dev preview: translate `locales/zh-CN/` and `locales/en-US/`
- `pnpm i18n` is slow; run it manually when locale keys need updating (e.g. before opening a PR).
- Ship en-US and zh-CN by hand in the same PR: write the English source in `src/locales/default/*.ts` and mirror it to `locales/en-US/`; hand-translate `locales/zh-CN/`. Leave all other locales to CI.
- Don't run `pnpm i18n` manually by default — a daily CI workflow (`auto-i18n.yml`) runs it and opens an automated translation PR for any missing keys.
- Run `pnpm i18n` manually only when your branch needs the translated locales immediately, instead of waiting for the daily job (slow; requires `OPENAI_API_KEY`). Note it only fills keys missing from other locales — value-only edits never need it.
### Code Style
@@ -131,3 +136,5 @@ cd packages/database && bunx vitest run --silent='passed-only' '[file]'
### Code Review
Before reviewing a PR / diff / branch change, read the **review-checklist** skill (`.agents/skills/review-checklist/SKILL.md`) — it lists the recurring mistakes specific to this codebase.
When designing or reviewing user-facing flows (empty/loading/error states, confirmations, async feedback, button hierarchy, lists at scale, pickers), follow the **ux** skill (`.agents/skills/ux/SKILL.md`) — LobeHub's design values (自然 / 意义感 / 确定性) plus per-aspect execution checklists.
+8
View File
@@ -210,6 +210,14 @@ ENV NEXT_PUBLIC_S3_DOMAIN="" \
S3_ENABLE_PATH_STYLE="" \
S3_SET_ACL=""
# Cloud Sandbox
ENV SANDBOX_PROVIDER="" \
ONLYBOXES_BASE_URL="" \
ONLYBOXES_JIT_ISSUER="" \
ONLYBOXES_JIT_SIGNING_KEY="" \
ONLYBOXES_JIT_TTL_SEC="" \
ONLYBOXES_LEASE_TTL_SEC=""
# Model Variables
ENV \
# AI21
+4 -1
View File
@@ -1,6 +1,6 @@
.\" Code generated by `npm run man:generate`; DO NOT EDIT.
.\" Manual command details come from the Commander command tree.
.TH LH 1 "" "@lobehub/cli 0.0.24" "User Commands"
.TH LH 1 "" "@lobehub/cli 0.0.29" "User Commands"
.SH NAME
lh \- LobeHub CLI \- manage and connect to LobeHub services
.SH SYNOPSIS
@@ -113,6 +113,9 @@ Manage plugins
.B user
Manage user account and settings
.TP
.B verify
Manage the Agent Run delivery checker (criteria, rubrics, plans, results)
.TP
.B whoami
Display current user information
.TP

Some files were not shown because too many files have changed in this diff Show More