Compare commits

..

1 Commits

Author SHA1 Message Date
Arvin Xu 0469464c37 feat: optimize summary title prompt with few-shot and language rules
Add few-shot examples, explicit locale-to-language mapping, anti-filler
constraints, emotion-vs-intent guidance, and technical term preservation
rules. Improves pass rate from 80% to 100% on gpt-5-mini across 10 eval cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 18:41:40 +08:00
5802 changed files with 77777 additions and 737985 deletions
+1 -3
View File
@@ -1,8 +1,6 @@
---
name: add-provider-doc
description: Add documentation for a new AI provider — usage docs, env vars, Docker config, image resources.
disable-model-invocation: true
argument-hint: '[provider-name]'
description: Guide for adding new AI provider documentation. Use when adding documentation for a new AI provider (like OpenAI, Anthropic, etc.), including usage docs, environment variables, Docker config, and image resources. Triggers on provider documentation tasks.
---
# Adding New AI Provider Documentation
+1 -3
View File
@@ -1,8 +1,6 @@
---
name: add-setting-env
description: Add server-side environment variables that control default values for user settings.
disable-model-invocation: true
argument-hint: '[setting-name]'
description: Guide for adding environment variables to configure user settings. Use when implementing server-side environment variables that control default values for user settings. Triggers on env var configuration or setting default value tasks.
---
# Adding Environment Variable for User Settings
-209
View File
@@ -1,209 +0,0 @@
---
name: agent-runtime-hooks
description: 'Agent runtime lifecycle hooks. Use for before/after tool or step hooks, tool mocks, human intervention, sub-agent calls, context compression, evals, tracing, callAgent, or lifecycle events.'
user-invocable: false
---
# Agent Runtime Hooks
Lifecycle hooks for observing and intercepting agent execution. Hooks are registered per-operation via `execAgent({ hooks })` and dispatched by `HookDispatcher`.
## Hook Types
16 hook types across 5 categories:
```
execAgent({ hooks })
├─ beforeStep ──────────── Before each step executes
│ │
│ ├─ [call_llm] LLM inference
│ │
│ ├─ [call_tool]
│ │ ├─ beforeToolCall ── Before tool executes (supports mocking)
│ │ ├─ (tool execution)
│ │ ├─ afterToolCall ─── After tool completes (observation only)
│ │ └─ onToolCallError ─ Tool threw an exception
│ │
│ ├─ [request_human_approve]
│ │ ├─ beforeHumanIntervention ── Before agent pauses
│ │ ├─ afterHumanIntervention ─── After approve/reject + resume
│ │ └─ onStopByHumanIntervention ── User rejected, agent halted
│ │
│ ├─ [compress_context]
│ │ ├─ beforeCompact ──── Before compression starts
│ │ ├─ afterCompact ───── After compression completes
│ │ └─ onCompactError ─── Compression failed
│ │
│ ├─ [callAgent] (via execSubAgentTask)
│ │ ├─ beforeCallAgent ── Before sub-agent starts
│ │ ├─ afterCallAgent ─── After sub-agent completes
│ │ └─ onCallAgentError ── Sub-agent failed
│ │
│ └─ afterStep ──────────── After step completes
├─ (next step...)
├─ onComplete ───────────── Operation reaches terminal state
└─ onError ──────────────── Error during execution
```
## Key Files
| File | Role |
| ---------------------------------------------------------- | ------------------------------------------------------ |
| `packages/agent-runtime/src/types/hooks.ts` | Type definitions (AgentHookType, all event interfaces) |
| `src/server/services/agentRuntime/hooks/types.ts` | Server-side types (AgentHook, re-exports) |
| `src/server/services/agentRuntime/hooks/HookDispatcher.ts` | Registration, dispatch, dispatchBeforeToolCall |
| `src/server/modules/AgentRuntime/RuntimeExecutors.ts` | Tool/Compact/HumanIntervention hook dispatch |
| `src/server/services/agentRuntime/AgentRuntimeService.ts` | Step hooks + HumanIntervention resume/reject |
| `src/server/services/aiAgent/index.ts` | CallAgent hook dispatch |
## Registration Flow
```ts
const hooks: AgentHook[] = [
{ id: 'my-hook', type: 'afterStep', handler: async (event) => { ... } },
];
await aiAgentService.execAgent({ agentId, prompt, hooks });
// Internally: hookDispatcher.register(operationId, hooks)
// Cleanup: hookDispatcher.unregister(operationId)
```
## Hook Reference
### Step Level
**`beforeStep`** — Before each step. `event: AgentHookEvent`
**`afterStep`** — After each step. `event: AgentHookEvent` (content, toolsCalling, totalCost, etc.)
**`onComplete`** — Terminal state. `event: AgentHookEvent` (reason: done/error/interrupted/max_steps/cost_limit)
**`onError`** — Error occurred. `event: AgentHookEvent` (errorMessage, errorDetail)
### Tool Call Level
**`beforeToolCall`** — Before tool executes. **Supports mocking** via `event.mock()`.
```ts
// event: ToolCallHookEvent
{
(identifier, apiName, args, callIndex, stepIndex, operationId, mock);
}
// Mock example:
event.mock({ content: '{"error":"rate limited"}' });
```
Dispatch method: `hookDispatcher.dispatchBeforeToolCall()` (returns mock result or null).
**`afterToolCall`** — After tool completes. Observation only.
```ts
// event: AfterToolCallHookEvent
{
(identifier, apiName, args, callIndex, content, success, mocked, executionTimeMs, stepIndex);
}
```
**`onToolCallError`** — Tool threw an exception (catch block, not just `success=false`).
```ts
// event: ToolCallErrorHookEvent
{
(identifier, apiName, args, callIndex, error, stepIndex);
}
```
### Human Intervention
**`beforeHumanIntervention`** — Before agent pauses for approval.
```ts
// event: BeforeHumanInterventionHookEvent
{ operationId, stepIndex, pendingTools: [{ identifier, apiName }] }
```
**`afterHumanIntervention`** — After approve/reject, agent resumes.
```ts
// event: AfterHumanInterventionHookEvent
{ operationId, action: 'approve' | 'reject' | 'rejectAndContinue', toolCallId?, rejectionReason? }
```
**`onStopByHumanIntervention`** — User rejected, agent halted.
```ts
// event: StopByHumanInterventionHookEvent
{ operationId, toolCallId?, rejectionReason? }
```
### Context Compression
**`beforeCompact`** — Before compression starts.
```ts
// event: BeforeCompactHookEvent
{
(operationId, stepIndex, messageCount, tokenCount);
}
```
**`afterCompact`** — After compression completes.
```ts
// event: AfterCompactHookEvent
{
(operationId, stepIndex, groupId, messagesBefore, messagesAfter, summary);
}
```
**`onCompactError`** — Compression failed.
```ts
// event: CompactErrorHookEvent
{
(operationId, stepIndex, tokenCount, error);
}
```
### Sub-Agent (CallAgent)
**`beforeCallAgent`** — Before calling sub-agent. Dispatched on **parent** operation.
```ts
// event: BeforeCallAgentHookEvent
{
(operationId, agentId, instruction);
}
```
**`afterCallAgent`** — Sub-agent completed. Dispatched on **parent** operation.
```ts
// event: AfterCallAgentHookEvent
{
(operationId, agentId, subOperationId, threadId, success);
}
```
**`onCallAgentError`** — Sub-agent failed. Dispatched on **parent** operation.
```ts
// event: CallAgentErrorHookEvent
{
(operationId, agentId, error);
}
```
Note: CallAgent hooks require `parentOperationId` in `ExecSubAgentTaskParams`.
## Design Notes
- **Fire-and-forget**: All handlers return `Promise<void>`. Errors are non-fatal.
- **Exception**: `beforeToolCall` supports mock via `event.mock()` — uses `dispatchBeforeToolCall()` which returns the mock result.
- **Sequential**: Same-type hooks run in registration order.
- **Local only**: `beforeToolCall` mock only works in local mode (in-memory hooks). Webhook mode does not support mocking.
- **Scoped per operation**: Auto-cleaned via `hookDispatcher.unregister()` on completion.
- **Sandbox/MCP**: No separate hooks — they go through `executeTool`, so `beforeToolCall`/`afterToolCall` cover them. Use `event.identifier` to filter.
## Real-World Example: agent-evals
See `devtools/agent-evals/helpers/runner.ts``createEvalHooks()` uses `afterStep`, `onComplete`, `afterToolCall`, and `beforeToolCall` (for mock).
-95
View File
@@ -1,95 +0,0 @@
---
name: agent-signal
description: 'Build or extend LobeHub Agent Signal pipelines. Use for signal sources, signal/action types, policies, middleware, workflow handoff, dedupe, scope behavior, or observability.'
---
# Agent Signal
Use this skill to implement event-driven background work for agents without coupling the work to the foreground chat request.
Agent Signal has one consistent shape:
`source event` -> `signal interpretation` -> `action execution` -> built-in result signals
## Start Here
1. Read `references/architecture.md` to map the package boundary, runtime queue, scope model, and async workflow handoff.
2. Read `references/handlers.md` before writing any new policy, source handler, signal handler, or action handler.
3. Read `references/observability.md` when you need tracing, metrics, debugging, or workflow snapshot visibility.
## Use The Right Entry Point
- Use `emitAgentSignalSourceEvent(...)` when a server-owned producer should execute the pipeline immediately.
- Use `executeAgentSignalSourceEvent(...)` when a worker or controlled backend path already owns execution timing and may inject a runtime guard backend.
- Use `enqueueAgentSignalSourceEvent(...)` when the caller should return quickly and let Upstash Workflow process the event out-of-band.
- Use `emitAgentSignalSourceEventWithStore(...)` for isolated tests or evals that should avoid ambient Redis state.
Read:
- `src/server/services/agentSignal/index.ts`
- `src/server/workflows/agentSignal/index.ts`
- `src/server/workflows/agentSignal/run.ts`
## Core Model
- `source`: A normalized fact that happened. Sources come from producers such as runtime lifecycle events, user messages, or bot ingress.
- `signal`: A semantic interpretation derived from one source or from another signal. Signals express meaning, routing, or policy state.
- `action`: A concrete side effect planned from one signal. Actions do the work.
- `policy`: An installable middleware bundle that registers source, signal, and action handlers.
- `procedure`: Not a distinct runtime node. Treat "procedure" as the end-to-end flow for one use case: ingress source, matching handlers, planned actions, execution result, and observability.
Keep the boundaries strict:
- Add a new `source` when the outside world produced a new event.
- Add a new `signal` when the system needs a reusable semantic interpretation.
- Add a new `action` when the runtime needs a concrete side effect.
- Add or update a `policy` when you are wiring those pieces together.
## Implementation Workflow
1. Decide whether the use case is synchronous or quiet background work.
2. Define or reuse a source type in `src/server/services/agentSignal/sourceTypes.ts`.
3. Define or reuse signal and action types in `src/server/services/agentSignal/policies/types.ts`.
4. Implement handlers with `defineSourceHandler`, `defineSignalHandler`, or `defineActionHandler`.
5. Bundle handlers with `defineAgentSignalHandlers(...)`.
6. Register the policy in `src/server/services/agentSignal/policies/index.ts` and pass it into the runtime factory if needed.
7. Add or update ingress code that emits or enqueues the source event.
8. Add observability and tests before considering the flow complete.
## Default Reading Set
- Shared semantic core:
`packages/agent-signal/src/index.ts`
`packages/agent-signal/src/base/builders.ts`
`packages/agent-signal/src/base/types.ts`
- Server-owned runtime and middleware:
`src/server/services/agentSignal/runtime/AgentSignalRuntime.ts`
`src/server/services/agentSignal/runtime/AgentSignalScheduler.ts`
`src/server/services/agentSignal/runtime/middleware.ts`
`src/server/services/agentSignal/runtime/context.ts`
- Existing policy example:
`src/server/services/agentSignal/policies/analyzeIntent/index.ts`
`src/server/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
`src/server/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
`src/server/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
`src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
- Observability:
`src/server/services/agentSignal/observability/projector.ts`
`src/server/services/agentSignal/observability/traceEvents.ts`
`packages/observability-otel/src/modules/agent-signal/index.ts`
## Implementation Rules
- Reuse existing source, signal, and action types before adding new ones.
- Keep source handlers focused on interpretation and fan-out, not heavy side effects.
- Keep action handlers responsible for side effects, idempotency, and executor-style result reporting.
- Use stable ids and idempotency keys when the same source can arrive more than once.
- Preserve scope discipline. The runtime uses `scopeKey` to serialize related background work.
- Prefer the dedicated shared package types and builders from `@lobechat/agent-signal` for normalized nodes and result contracts.
- Add focused tests near the touched runtime, policy, or store module. Existing tests under `src/server/services/agentSignal/**/__tests__` are the reference pattern.
## References
- Architecture and boundaries: `references/architecture.md`
- Writing handlers and policies: `references/handlers.md`
- Observability, metrics, and debugging: `references/observability.md`
@@ -1,4 +0,0 @@
interface:
display_name: 'Agent Signal'
short_description: 'Build AgentSignal sources, signals, actions, and policies.'
default_prompt: 'Use $agent-signal to add a new Agent Signal source, policy, handler, or observability flow.'
@@ -1,199 +0,0 @@
# Agent Signal Architecture
## Pipeline
Use this mental model first:
```text
producer
-> emitAgentSignalSourceEvent(...) or enqueueAgentSignalSourceEvent(...)
-> emitSourceEvent(...)
-> dedupe + scope lock + source normalization
-> runtime.emitNormalized(source)
-> source handlers
-> signal handlers
-> action handlers
-> built-in result signals
-> observability projection + persistence
```
The scheduler is queue-driven, not hard-coded for one policy:
```text
source node
-> matching source handlers
-> dispatch signals/actions
-> matching signal handlers
-> dispatch more signals/actions
-> matching action handlers
-> ExecutorResult
-> signal.action.applied | signal.action.skipped | signal.action.failed
```
Read:
- `src/server/services/agentSignal/index.ts`
- `src/server/services/agentSignal/sources/index.ts`
- `src/server/services/agentSignal/runtime/AgentSignalScheduler.ts`
## Package Boundaries
### `packages/agent-signal`
Treat this as the shared semantic core.
It provides:
- base node types: source, signal, action
- builders: `createSource`, `createSignal`, `createAction`
- built-in result signal types
- runtime result contracts such as `RuntimeProcessorResult` and `ExecutorResult`
Read:
- `packages/agent-signal/src/base/types.ts`
- `packages/agent-signal/src/base/builders.ts`
- `packages/agent-signal/src/types/events.ts`
- `packages/agent-signal/src/types/builtin.ts`
### `src/server/services/agentSignal`
Treat this as the server-owned implementation layer.
It owns:
- source catalogs and payload maps
- policy-specific signal and action catalogs
- middleware registration
- runtime scheduling and guard backends
- Redis-backed dedupe, waypoint, and policy state
- service entrypoints for synchronous and async execution
### `packages/observability-otel/src/modules/agent-signal`
Treat this as shared OTEL ownership for Agent Signal metrics and tracer instances.
## Core Vocabulary
### Source
A source is the normalized external fact that started the chain.
Examples:
- `agent.user.message`
- `runtime.before_step`
- `runtime.after_step`
- `client.runtime.start`
- `bot.message.merged`
Define source payloads in:
- `src/server/services/agentSignal/sourceTypes.ts`
Build normalized sources in:
- `src/server/services/agentSignal/sources/buildSource.ts`
- `packages/agent-signal/src/base/builders.ts`
### Signal
A signal is a semantic interpretation. Signals should be reusable and meaning-oriented.
Examples from `analyzeIntent`:
- `signal.feedback.satisfaction`
- `signal.feedback.domain.memory`
- `signal.feedback.domain.prompt`
- `signal.feedback.domain.skill`
Define server-owned signal types in:
- `src/server/services/agentSignal/policies/types.ts`
### Action
An action is a concrete side effect the runtime should execute.
Example:
- `action.user-memory.handle`
Action handlers usually:
- check idempotency
- call tools, models, or services
- return `ExecutorResult`
### Policy
A policy is an installable bundle of handlers. It is the composition unit that turns the generic runtime into a feature.
Example:
- `createAnalyzeIntentPolicy(...)`
### Procedure
"Procedure" is not a first-class type in this runtime. Use the word to describe one end-to-end use case:
1. define ingress source
2. emit or enqueue the source
3. interpret source into signals
4. plan actions from signals
5. execute actions
6. persist trace and metrics
When a user asks for "the procedure", document the flow above and point to the exact producer, handlers, and execution entrypoint.
## Scope, Deduping, And Quiet Background Work
`scopeKey` is the serialization boundary for related work. It is used for:
- source dedupe windows
- scope locks during source generation
- runtime guard state
- waypoint persistence for queued processing
Read:
- `src/server/services/agentSignal/sources/index.ts`
- `src/server/services/agentSignal/runtime/context.ts`
- `src/server/services/agentSignal/constants.ts`
Use `enqueueAgentSignalSourceEvent(...)` when the work should stay quiet and out-of-band. That path:
1. normalizes the source envelope
2. derives or reuses `scopeKey`
3. triggers `AgentSignalWorkflow`
4. executes later in `runAgentSignalWorkflow`
This is the preferred path when the UI request should finish immediately and the policy can run in the background.
Read:
- `src/server/workflows/agentSignal/index.ts`
- `src/server/workflows/agentSignal/run.ts`
## Existing Example: `analyzeIntent`
Use `analyzeIntent` as the reference chain:
```text
agent.user.message
-> feedback satisfaction source handler
-> signal.feedback.satisfaction
-> feedback domain signal handler
-> signal.feedback.domain.*
-> feedback action planner
-> action.user-memory.handle
-> signal.action.applied | skipped | failed
```
Read:
- `src/server/services/agentSignal/policies/analyzeIntent/index.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
@@ -1,228 +0,0 @@
# Writing Handlers And Policies
## Fluent Registration API
Use the middleware helpers in `src/server/services/agentSignal/runtime/middleware.ts`.
They provide:
- `defineSourceHandler(...)`
- `defineSignalHandler(...)`
- `defineActionHandler(...)`
- `defineAgentSignalHandlers(...)`
These helpers do two jobs:
1. keep handler registration terse
2. preserve strong typing when `listen` points at concrete source, signal, or action types
## Handler Shape
Each handler receives:
- the current runtime node
- `RuntimeProcessorContext`
The context gives you:
- `scopeKey`
- `now()`
- `runtimeState.getGuardState(lane)`
- `runtimeState.touchGuardState(lane, now?)`
Read:
- `src/server/services/agentSignal/runtime/context.ts`
## Return Contracts
Return one of these shapes:
- `void`: no fan-out, stop at this handler
- `{ status: 'dispatch', signals?, actions? }`: continue the chain
- `{ status: 'wait', pending? }`: pause for later host coordination
- `{ status: 'schedule', nextHop }`: schedule another hop
- `{ status: 'conclude', concluded? }`: stop with a terminal runtime result
- `ExecutorResult`: only for action handlers that performed a concrete side effect
Read:
- `packages/agent-signal/src/base/types.ts`
- `src/server/services/agentSignal/runtime/AgentSignalScheduler.ts`
## Policy Composition Pattern
Use `defineAgentSignalHandlers([...])` to bundle related handlers into one policy.
Example from `analyzeIntent`:
```ts
return defineAgentSignalHandlers([
createFeedbackSatisfactionJudgeProcessor(...),
createFeedbackDomainJudgeSignalHandler(...),
createFeedbackActionPlannerSignalHandler(),
defineUserMemoryActionHandler(...),
]);
```
That bundle is later passed into the runtime via:
- `createDefaultAgentSignalPolicies(...)`
- `createAgentSignalRuntime({ policies })`
Read:
- `src/server/services/agentSignal/policies/index.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/index.ts`
## Source Handler Pattern
Use a source handler when you are interpreting a producer event into semantic signals.
Reference:
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackSatisfaction.ts`
Pattern:
```ts
return defineSourceHandler(
AGENT_SIGNAL_SOURCE_TYPES.agentUserMessage,
'agent.user.message:my-handler',
async (source, ctx): Promise<RuntimeProcessorResult | void> => {
// interpret source payload
// optionally use ctx.runtimeState
return {
signals: [
/* one or more semantic signals */
],
status: 'dispatch',
};
},
);
```
Write source handlers when:
- a raw message, lifecycle event, or bot ingress needs interpretation
- the work is still semantic, not side-effectful
## Signal Handler Pattern
Use a signal handler when one semantic state should branch into more semantic states or planned actions.
References:
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackDomain.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/feedbackAction.ts`
Pattern:
```ts
return defineSignalHandler(
MY_SIGNAL_TYPE,
'signal.my-policy-router',
async (signal): Promise<RuntimeProcessorResult | void> => {
return {
actions: [
/* planned work */
],
status: 'dispatch',
};
},
);
```
Use signal handlers for:
- routing
- fan-out
- filtering
- conflict resolution
- converting interpretation into planned actions
## Action Handler Pattern
Use an action handler when the runtime should do actual work.
Reference:
- `src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
Pattern:
```ts
return defineActionHandler(
MY_ACTION_TYPE,
'action.my-policy-executor',
async (action, ctx): Promise<ExecutorResult> => {
// run service/tool/model side effect
// check idempotency if needed
return {
actionId: action.actionId,
attempt: {
completedAt: ctx.now(),
current: 1,
startedAt,
status: 'succeeded',
},
status: 'applied',
};
},
);
```
Keep these rules:
- perform idempotency checks here or immediately before side effects
- return stable `actionId`
- include failure detail in `error`
- let the scheduler turn the `ExecutorResult` into built-in result signals
## Source, Signal, And Action Type Placement
Use this split:
- external event payloads:
`src/server/services/agentSignal/sourceTypes.ts`
- policy-owned signal and action payloads:
`src/server/services/agentSignal/policies/types.ts`
- normalized shared node contracts:
`packages/agent-signal/src/base/types.ts`
Do not put app-specific signal catalogs into `packages/agent-signal`. That package should stay generic and reusable.
## Choosing The Right Node
Choose `source` when:
- the outside world emitted a new fact
Choose `signal` when:
- the system needs semantic meaning that downstream handlers can reuse
Choose `action` when:
- the runtime is ready for a concrete side effect
If a handler both interprets meaning and performs side effects, split it. That keeps chains inspectable and testable.
## Testing Strategy
Prefer focused tests near the touched code.
Useful references:
- `src/server/services/agentSignal/runtime/__tests__/AgentSignalRuntime.test.ts`
- `src/server/services/agentSignal/__tests__/index.integration.test.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/__tests__/*`
- `src/server/services/agentSignal/policies/analyzeIntent/actions/__tests__/*`
Test at the smallest level that proves the behavior:
- handler unit test for one routing rule
- runtime test for queue fan-out
- integration test for service ingress and observability persistence
@@ -1,118 +0,0 @@
# Observability And Debugging
## OTEL Ownership
Use `packages/observability-otel/src/modules/agent-signal/index.ts` for the shared tracer and metrics.
Available instruments:
- `tracer`
- `sourceCounter`
- `signalCounter`
- `actionCounter`
- `actionResultCounter`
- `chainCounter`
- `signalActionTransitionCounter`
- `chainDurationHistogram`
- `actionDurationHistogram`
Use this module when you need shared telemetry ownership instead of creating feature-local meters or tracers.
## Projection Pipeline
After runtime execution, the service projects one compact observability model from the full chain.
Read:
- `src/server/services/agentSignal/observability/projector.ts`
- `src/server/services/agentSignal/observability/traceEvents.ts`
- `src/server/services/agentSignal/observability/store.ts`
Projection outputs:
- a trace envelope with source, signals, actions, results, edges, and handler runs
- a compact telemetry record with dominant path, status breakdown, and chain metadata
This projection is built from:
- source node
- emitted signals
- planned actions
- executor results
## How To Inspect A Chain
Use this order:
1. Inspect the source type and payload.
2. Inspect emitted signals.
3. Inspect planned actions.
4. Inspect executor results.
5. Inspect projected edges and dominant path.
The helper `toAgentSignalTraceEvents(...)` flattens a chain into compact event records suitable for tracing snapshots.
## Workflow Snapshot Bridge
Workflow-triggered runs do not naturally pass through the normal foreground runtime snapshot path, so `runAgentSignalWorkflow` adds a development-only bridge into `.agent-tracing/`.
Read:
- `src/server/workflows/agentSignal/run.ts`
Use that path when:
- the source was enqueued with `enqueueAgentSignalSourceEvent(...)`
- you need local trace visibility for quiet background work
## Common Debug Questions
### The source emits but nothing happens
Check:
- feature gate enabled for the user
- source type matches a registered source handler
- dedupe or scope lock did not short-circuit generation
Read:
- `src/server/services/agentSignal/index.ts`
- `src/server/services/agentSignal/sources/index.ts`
### The signal exists but no action runs
Check:
- the signal type has a registered signal handler
- the signal handler returns `status: 'dispatch'`
- the handler actually returned actions
### The action runs twice
Check:
- source dedupe key stability
- action idempotency strategy
- scope key stability across retries and workflow handoff
Reference:
- `src/server/services/agentSignal/policies/actionIdempotency.ts`
- `src/server/services/agentSignal/policies/analyzeIntent/actions/userMemory.ts`
### Background runs are hard to discover
Check:
- workflow snapshot bridge in development
- projected telemetry record contents
- OTEL counters and histograms in the shared module
## Minimal Completion Checklist
- source ingress is testable
- handler registration is discoverable from the policy factory
- action executor returns structured results
- projection includes the new path cleanly
- tests cover at least one happy path and one no-op or failure path
+6 -7
View File
@@ -1,6 +1,6 @@
---
name: agent-tracing
description: 'Agent tracing CLI for execution snapshots. Use for agent-tracing, traces, snapshots, LLM call inspection, context engine data, agent step analysis, or execution debugging.'
description: "Agent tracing CLI for inspecting agent execution snapshots. Use when user mentions 'agent-tracing', 'trace', 'snapshot', wants to debug agent execution, inspect LLM calls, view context engine data, or analyze agent steps. Triggers on agent debugging, trace inspection, or execution analysis tasks."
user-invocable: false
---
@@ -14,7 +14,7 @@ In `NODE_ENV=development`, `AgentRuntimeService.executeStep()` automatically rec
**Data flow**: executeStep loop -> build `StepPresentationData` -> write partial snapshot to disk -> on completion, finalize to `.agent-tracing/{timestamp}_{traceId}.json`
**Context engine capture**: In `RuntimeExecutors.ts`, the `call_llm` executor calls `ctx.tracingContextEngine(input, output)` after `serverMessagesEngine()` processes messages. `AgentRuntimeService.executeStep` buffers the call per step and forwards it to `OperationTraceRecorder.appendStep` as the typed `contextEngine` field. CE flows through this side channel rather than the `events` array so its heavy payload (agentDocuments, systemRole, …) never enters the Redis state pipeline (LOBE-9110).
**Context engine capture**: In `RuntimeExecutors.ts`, the `call_llm` executor emits a `context_engine_result` event after `serverMessagesEngine()` processes messages. This event carries the full `contextEngineInput` (DB messages, systemRole, model, knowledge, tools, userMemory, etc.) and the processed `output` messages (the final LLM payload).
## Package Location
@@ -199,10 +199,9 @@ interface StepSnapshot {
messages?: any[]; // DB messages before step
context?: { phase: string; payload?: unknown; stepContext?: unknown };
events?: Array<{ type: string; [key: string]: unknown }>;
contextEngine?: {
input?: unknown; // contextEngineInput minus messages + toolsConfig (reconstructible from baseline)
output?: unknown; // processed messages array (final LLM payload)
};
// context_engine_result event contains:
// input: full contextEngineInput (messages, systemRole, model, knowledge, tools, userMemory, ...)
// output: processed messages array (final LLM payload)
}
```
@@ -217,5 +216,5 @@ When using `--messages`, the output shows three sections (if context engine data
## Integration Points
- **Recording**: `src/server/services/agentRuntime/AgentRuntimeService.ts` — in the `executeStep()` method, after building `stepPresentationData`, writes partial snapshot in dev mode
- **Context engine capture**: `src/server/modules/AgentRuntime/RuntimeExecutors.ts` — in `call_llm` executor, after `serverMessagesEngine()` returns, calls `ctx.tracingContextEngine(input, output)`. `AgentRuntimeService.executeStep` buffers it per step and passes it to `traceRecorder.appendStep` as the typed `contextEngine` field (kept off the `events` array to stay out of Redis state).
- **Context engine event**: `src/server/modules/AgentRuntime/RuntimeExecutors.ts` — in `call_llm` executor, after `serverMessagesEngine()` returns, emits `context_engine_result` event
- **Store**: `FileSnapshotStore` reads/writes to `.agent-tracing/` relative to `process.cwd()`
-130
View File
@@ -1,130 +0,0 @@
---
name: builtin-tool
description: 'Build LobeHub builtin tool packages. Use when adding agent-callable tools, manifests, executors, runtimes, inspectors, renders, placeholders, streaming, interventions, portals, or tool registries.'
---
# Builtin Tool Authoring Guide
A builtin tool is a package the agent runtime can call. It ships **five faces**:
| Face | Lives in | Audience |
| -------------------- | -------------------------------------------------------------------------------------- | ------------------------------------- |
| **Manifest + types** | `src/{manifest,types,systemRole}.ts` | The LLM (tool spec + system prompt) |
| **ExecutionRuntime** | `src/ExecutionRuntime/` | Server / desktop / any runtime caller |
| **Executor** | `src/client/executor/` | Frontend (wraps stores/services) |
| **Client UI** | `src/client/{Inspector,Render,…}/` | Chat UI |
| **Registry wiring** | `packages/builtin-tools/src/*.ts` + `src/store/tool/slices/builtin/executors/index.ts` | Framework |
---
## Read These First
| Question | Doc |
| ------------------------------------------------------------------------------------ | --------------------------------------------- |
| Where do files live? What does each face do? Wiring? | [architecture.md](references/architecture.md) |
| How do I name the tool, design APIs, write the manifest, executor, ExecutionRuntime? | [tool-design.md](references/tool-design.md) |
| How do I build Inspector / Render / Placeholder / Streaming / Intervention / Portal? | [ui/](references/ui/README.md) |
---
## When to Use This Skill
- Creating a new `packages/builtin-tool-<name>/` package
- Adding a new API method to an existing builtin tool
- Building or restyling any of the 6 client surfaces for a tool
- Wiring a tool into the central registries
- Debugging "tool not found / API not found / render not showing / placeholder stuck" errors
---
## Top-Level Design Principles
1. **`lobe-<domain>` identifier is permanent.** It's stored in message history. Renames need `@deprecated` aliases (see `packages/builtin-tools/src/inspectors.ts:88-89`). Get it right the first time.
2. **ApiName is an `as const` object**, not a TS enum. It doubles as the runtime list `BaseExecutor` iterates over.
3. **Three result fields, three audiences:**
- `content: string` → the LLM reads it
- `state: Record<…>` → the UI's `pluginState`; **result-domain only**, never echo all params back
- `error: { type, message, body? }` → both LLM and UI; `type` is a stable code
4. **Split execution from frontend wiring.**
- `src/ExecutionRuntime/` — pure runtime, no React, no Zustand, accepts services via constructor. **The default place for new logic.**
- `src/client/executor/``BaseExecutor` subclass that calls `ExecutionRuntime` (or stores/services directly when frontend-only).
5. **UI defaults to "do nothing".** Inspector is required (the header strip). Render/Placeholder/Streaming/Intervention/Portal are added **only when there's something specific to show** — empty registries are fine.
6. **Style with `createStaticStyles + cssVar.*`** (zero-runtime). Fall back to `createStyles + token` only when you genuinely need runtime values. Use `@lobehub/ui` components, not raw antd.
7. **i18n keys live in `src/locales/default/plugin.ts`.** Inspector titles must come from `t('builtins.<identifier>.apiName.<api>')` so something renders while args stream.
---
## Package Layout (preferred, post-2026 convention)
```
packages/builtin-tool-<name>/
├── package.json
└── src/
├── index.ts # exports manifest + types + systemRole + Identifier (no React, no stores)
├── manifest.ts # BuiltinToolManifest with JSON Schema for every API
├── types.ts # ApiName const + Params/State interfaces per API
├── systemRole.ts # System prompt teaching the model when/how to use the APIs
├── ExecutionRuntime/ # ✅ Default home for runtime logic (server- or anywhere-callable)
│ └── index.ts
└── client/
├── index.ts # Re-exports for the registries
├── executor/ # ✅ Frontend executor — extends BaseExecutor, often delegates to ExecutionRuntime
│ └── index.ts
├── Inspector/ # required — header chip per API
├── Render/ # optional — rich result card
├── Placeholder/ # optional — skeleton during streaming/execution
├── Streaming/ # optional — live output renderer (e.g. RunCommand, WriteFile)
├── Intervention/ # optional — approval / edit-before-run UI
├── Portal/ # optional — full-screen detail view
└── components/ # shared subcomponents used by the surfaces above
```
**Older packages** (`builtin-tool-task`, `builtin-tool-calculator`, etc.) still have `src/executor/` as a sibling of `src/client/`. That's grandfathered; **don't relocate without a deliberate refactor**. New packages and new APIs added to existing packages should follow the layout above.
`package.json` exports map:
```json
"exports": {
".": "./src/index.ts",
"./client": "./src/client/index.ts",
"./executor": "./src/client/executor/index.ts",
"./executionRuntime": "./src/ExecutionRuntime/index.ts"
}
```
---
## Authoring Checklist
Before opening the PR:
- [ ] Identifier follows `lobe-<domain>` and is **stable** (lives in message history).
- [ ] Every `<Name>ApiName` value has: a manifest `api[]` entry, an executor method, an Inspector, an i18n `apiName.*` key.
- [ ] `Params` interfaces match the JSON Schema; `State` interfaces match what the executor returns and what the UI surfaces read.
- [ ] System prompt disambiguates confusable APIs and points to batch variants.
- [ ] Runtime logic lives in `ExecutionRuntime/`; the `client/executor/` only wires stores/services and delegates.
- [ ] Executor returns `{ success, content, state, error? }` via a single `toResult()` funnel — `content` always non-empty (default to `error.message`).
- [ ] Inspector handles `isArgumentsStreaming`, `isLoading`, `partialArgs`, missing `pluginState`.
- [ ] Render returns `null` until it has data; only created for APIs with rich results.
- [ ] Placeholder added if the API has a perceivable execution lag (search, list, crawl).
- [ ] Streaming added for APIs that emit incremental output (run command, write file, code execution).
- [ ] Intervention added if `humanIntervention` is set in the manifest.
- [ ] All registry files updated (see [architecture.md → Registry wiring](references/architecture.md#registry-wiring)).
- [ ] i18n keys in `src/locales/default/plugin.ts` plus dev seeds in `en-US`/`zh-CN`.
- [ ] `bunx vitest run --silent='passed-only' 'packages/builtin-tool-<name>'` passes.
- [ ] `bun run type-check` passes.
---
## Reference Tools
Pick the closest neighbor and copy:
| If your tool is… | Read first |
| ----------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| Pure-compute, no UI state | `packages/builtin-tool-calculator/``ExecutionRuntime` reuses executor (mathjs/nerdamer work everywhere) |
| CRUD over a domain entity | `packages/builtin-tool-task/` — full Inspector + Render set, batch variants |
| Heavy UI (Inspector/Render/Placeholder/Portal) | `packages/builtin-tool-web-browsing/` — search-style result UI, Portal for detail view |
| Desktop / filesystem with all surfaces (incl. Streaming + Intervention) | `packages/builtin-tool-local-system/``ExecutionRuntime` injects an `ILocalSystemService`, executor calls it |
| Server-side pure (no client executor) | `packages/builtin-tool-web-browsing/` — only `ExecutionRuntime` is exported; the chat client doesn't run it |
| Needs human approval before running | `packages/builtin-tool-local-system/src/client/Intervention/` — per-API approval components |
@@ -1,315 +0,0 @@
# Builtin Tool Architecture
## The Five Faces
A builtin tool ships five distinct faces, each compiled into a different bundle:
```
┌─────────────────────────────────────────────────────────────────┐
│ ./ │
│ Manifest + Types + systemRole │
│ ─ Pure data, no React, no Node-only deps. │
│ ─ Imported by: server (LLM tool spec), client (registries), │
│ anyone who needs to know "what tools exist". │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ./executionRuntime │
│ src/ExecutionRuntime/index.ts │
│ ─ Pure runtime logic. Accepts services via constructor — │
│ never imports concrete services or stores directly. │
│ ─ Imported by: server (BuiltinServerRuntimeOutput), tests, │
│ and the client executor as a delegate. │
│ ─ Returns: BuiltinServerRuntimeOutput { content, state, … } │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ./executor │
│ src/client/executor/index.ts │
│ ─ BaseExecutor subclass. Wires Zustand stores and frontend │
│ services into ExecutionRuntime, then funnels through │
│ toResult() into BuiltinToolResult { content, state, error, │
│ success }. │
│ ─ Imported by: src/store/tool/slices/builtin/executors/ │
│ index.ts (registered as a singleton). │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ./client │
│ src/client/{Inspector,Render,Placeholder,Streaming, │
│ Intervention,Portal,components}/ │
│ ─ React 'use client' surfaces. Read args + pluginState. │
│ ─ Imported by: packages/builtin-tools/src/{inspectors, │
│ renders,placeholders,streamings,interventions,portals}.ts. │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Registry wiring │
│ packages/builtin-tools/src/*.ts │
│ src/store/tool/slices/builtin/executors/index.ts │
│ ─ Aggregator maps: identifier → { apiName → component }. │
└─────────────────────────────────────────────────────────────────┘
```
The split exists so:
- Server bundles import only `./` and `./executionRuntime` and never touch React.
- Frontend bundles import `./client` and never touch Node-only services.
- The runtime is testable without React or Electron present.
---
## Why ExecutionRuntime is the Default Home for Logic
**Old pattern (grandfathered):** business logic in `src/executor/` directly. Examples: `builtin-tool-task`, older tools. Works, but the executor mixes runtime logic with frontend service plumbing — hard to reuse on the server.
**New pattern (preferred):** business logic in `src/ExecutionRuntime/`, frontend wiring in `src/client/executor/`. Examples: `builtin-tool-local-system`, `builtin-tool-web-browsing`, `builtin-tool-calculator`.
```
ExecutionRuntime
├─ accepts services via constructor (or `static create(opts)`)
├─ returns BuiltinServerRuntimeOutput (content + state + success)
└─ no React, no Zustand, no `@/services/...` direct imports
client/executor
├─ extends BaseExecutor<typeof <Name>ApiName>
├─ holds a `runtime = new <Name>ExecutionRuntime(realService)` instance
├─ each ApiName method:
│ 1. resolve scope / pull defaults from BuiltinToolContext
│ 2. call runtime.<method>(args)
│ 3. funnel through toResult() → BuiltinToolResult
└─ exported singleton: export const <name>Executor = new <Name>Executor()
```
### Service injection
`ExecutionRuntime` should declare a TypeScript interface for the services it needs and accept the implementation via constructor. Server callers wire in real implementations; tests wire in mocks. Example from `local-system`:
```ts
export interface ILocalSystemService {
readLocalFile: (params: any) => Promise<any>;
writeFile: (params: any) => Promise<any>;
/* … */
}
export class LocalSystemExecutionRuntime extends ComputerRuntime {
constructor(private service: ILocalSystemService) {
super();
}
/* methods delegate to this.service.* */
}
```
The `client/executor` instantiates it once with the real service:
```ts
import { localFileService } from '@/services/electron/localFileService';
import { LocalSystemExecutionRuntime } from '../../ExecutionRuntime';
class LocalSystemExecutor extends BaseExecutor<typeof LocalSystemApiEnum> {
private runtime = new LocalSystemExecutionRuntime(localFileService);
/* … */
}
```
### When ExecutionRuntime is the only thing you ship
Some tools are server-only — there's no frontend executor. `builtin-tool-web-browsing` is the canonical example: only `./` and `./executionRuntime` are exported, no `./executor`, and the runtime is constructed by the server-side `ToolExecutionService`. Skip `client/executor/` entirely for those.
### When the executor reuses the runtime as-is
Pure-compute tools (`builtin-tool-calculator`) often have an executor whose ApiName methods call `executor.calculate(args)` and an `ExecutionRuntime` whose methods call `calculatorExecutor.calculate(args)` — same logic, two thin wrappers. That's fine; the duplication buys you the bundle split.
---
## The Result Contract
### `BuiltinServerRuntimeOutput` (what ExecutionRuntime returns)
```ts
{
content: string; // the LLM-facing text — never undefined; default to error message
state?: any; // result-domain object the UI reads as pluginState
success: boolean; // mandatory
error?: any; // raw error; the executor will repackage
}
```
### `BuiltinToolResult` (what the executor returns to the runtime)
```ts
{
success: boolean;
content?: string;
state?: any;
error?: { type: string; message: string; body?: any };
metadata?: Record<string, any>; // rare; e.g. { agentCouncil: true }
stop?: boolean; // rare; halt the orchestration step
}
```
### The `toResult` funnel (mandatory)
Every executor method returns through a single `toResult()` to enforce two invariants:
1. **`content` is never undefined.** A missing content collapses downstream into `''`, leaving the Debug pane blank while `pluginState` was already saved. See the `globLocalFiles` regression in `local-system/src/client/executor/index.ts:60-84`.
2. **`state` survives failures.** Renderers can keep showing partial output even when `success: false`.
```ts
private toResult(output: BuiltinServerRuntimeOutput): BuiltinToolResult {
const errorMessage = typeof output.error?.message === 'string' ? output.error.message : undefined;
const safeContent = output.content || errorMessage || 'Tool execution failed';
if (!output.success) {
return {
success: false,
content: safeContent,
state: output.state,
error: output.error
? { type: 'PluginServerError', message: errorMessage ?? safeContent, body: output.error }
: undefined,
};
}
return { success: true, content: safeContent, state: output.state };
}
```
---
## `BaseExecutor` — How Method Dispatch Works
`BaseExecutor.invoke(apiName, params, ctx)` does:
```ts
if (!this.hasApi(apiName)) return { error: { type: 'ApiNotFound', }, success: false };
return (this as any)[apiName](params, ctx); // method name MUST equal apiName value
```
So:
- **Method names must equal `<Name>ApiName` values, exactly.** A typo silently routes to "ApiNotFound".
- **Methods must be class fields, not class methods**, because `this` is lost when registry calls `executor.invoke(apiName, params, ctx)`. Always declare as `methodName = async (…) => { … }`.
- **Always destructure `apiEnum` and `identifier` as `readonly` instance fields**, not getters — `BaseExecutor.hasApi/getApiNames` reads them synchronously.
---
## `BuiltinToolContext` — What the Executor Receives
The runtime hands every executor method an optional `BuiltinToolContext` as the second argument:
| Field | Use |
| ----------------------------- | -------------------------------------------------------------- |
| `agentId` | Default agent for "current agent" semantics (e.g. `listTasks`) |
| `groupId` | Group chat scope |
| `topicId` | Current topic — needed when creating messages/operations |
| `taskId` | Current task identifier — fallback for "implicit" param |
| `documentId` | Current page/document scope |
| `messageId` | The tool message being created (for state attachments) |
| `sourceMessageId` | The user message that triggered this tool turn |
| `operationId` | Operation lineage (use for cancellation, tracing) |
| `scope` | `'task' \| 'agent' \| …` — toggles default behaviors |
| `signal: AbortSignal` | Honor for long-running ops |
| `stepContext` | Cross-message runtime state (lobe-agent todos, etc.) |
| `registerAfterCompletion(cb)` | Defer side-effects past message-update race |
| `groupOrchestration` | Group orchestration callbacks |
**Use rule:** read with `?.`, fall back to explicit params, **never silently override** an explicit param with a context value.
---
## i18n Integration
Source of truth: `src/locales/default/plugin.ts`. Keys follow `builtins.<identifier>.<topic>.<…>`:
| Key | Use |
| ------------------------------------- | ------------------------------------------------------------ |
| `builtins.<identifier>.title` | Display title (overrides `manifest.meta.title` when present) |
| `builtins.<identifier>.apiName.<api>` | Inspector header label (one per ApiName) |
| `builtins.<identifier>.inspector.<…>` | Extra Inspector strings ("no results", chips, counters) |
| `builtins.<identifier>.<feature>.<…>` | Render / Intervention strings, free-form per tool |
For dev preview, also seed `locales/zh-CN/plugin.json` and `locales/en-US/plugin.json`. Run `pnpm i18n` before opening a PR — it's slow, so do it once at the end. (See the **i18n** skill for the full workflow.)
---
## Registry Wiring
Five core files plus optional ones. Miss any and you'll see "tool not found", a missing chip, a blank result card, a stuck spinner, or an approval dialog that never appears.
| File | Add what |
| -------------------------------------------------- | ----------------------------------------------------------------------------------------- |
| **Required** | |
| `packages/builtin-tools/src/index.ts` | Import `<Name>Manifest`; push entry to `builtinTools`. Set `hidden`/`discoverable` flags. |
| `packages/builtin-tools/src/identifiers.ts` | Add `<Name>Manifest.identifier` to `builtinToolIdentifiers`. |
| `packages/builtin-tools/src/inspectors.ts` | Import `<Name>Inspectors, <Name>Manifest`; add to `BuiltinToolInspectors`. |
| `src/store/tool/slices/builtin/executors/index.ts` | Import `<name>Executor`; add to `registerExecutors([…])`. |
| **Conditional — add only if the surface exists** | |
| `packages/builtin-tools/src/renders.ts` | Add to `BuiltinToolsRenders` if any API has a Render. |
| `packages/builtin-tools/src/placeholders.ts` | Add to `BuiltinToolPlaceholders` if any API has a Placeholder. |
| `packages/builtin-tools/src/streamings.ts` | Add to `BuiltinToolStreamings` if any API has a Streaming renderer. |
| `packages/builtin-tools/src/interventions.ts` | Add to `BuiltinToolInterventions` if any API has an Intervention component. |
| `packages/builtin-tools/src/portals.ts` | Add to `BuiltinToolsPortals` if the tool has a Portal. |
| `packages/builtin-tools/src/displayControls.ts` | Add if Render must show/hide based on result content (rare; see ClaudeCode/Codex). |
### Optional flags in `packages/builtin-tools/src/index.ts`
```ts
{
identifier: TaskManifest.identifier,
manifest: TaskManifest,
type: 'builtin',
hidden: true, // hide from chat-input Tools popover
discoverable: false, // exclude from agent builder / skill discovery
}
```
Lists in the same file you may need to touch:
- `defaultToolIds` — added to the agent's tool list by default
- `alwaysOnToolIds` — forced on regardless of user selection (use sparingly)
- `runtimeManagedToolIds` — enable state controlled by runtime, not user UI; **must mirror the rules map** in `src/server/modules/Mecha/AgentToolsEngine/index.ts` and `src/helpers/toolEngineering/index.ts`
---
## File-Map at a Glance
```
packages/builtin-tool-<name>/
├── package.json # exports: ., ./client, ./executor, ./executionRuntime
└── src/
├── index.ts # export Manifest, Identifier, types, systemPrompt
├── manifest.ts # BuiltinToolManifest + Identifier const
├── types.ts # ApiName + Params/State per API
├── systemRole.ts # System prompt (multiple variants OK: systemRole.desktop.ts)
├── ExecutionRuntime/
│ └── index.ts # <Name>ExecutionRuntime — pure runtime, service injection
└── client/
├── index.ts # exports for the registries
├── executor/
│ └── index.ts # <Name>Executor extends BaseExecutor; export <name>Executor
├── Inspector/
│ ├── index.ts # <Name>Inspectors record
│ └── <ApiName>/index.tsx # one folder per API (or .tsx file when trivial)
├── Render/
│ ├── index.ts # <Name>Renders record
│ └── <ApiName>/ # rich renders → folder with subcomponents
├── Placeholder/
│ ├── index.ts
│ └── <ApiName>.tsx # usually a single skeleton file
├── Streaming/
│ ├── index.ts
│ └── <ApiName>/ # live-output renderer
├── Intervention/
│ ├── index.ts
│ └── <ApiName>/ # approval / edit-before-run UI
├── Portal/
│ ├── index.tsx # routing component (switch on apiName)
│ └── <ApiName>/ # full-screen detail view
└── components/ # FileItem, EngineAvatar, etc. — shared subcomponents
```
Skip every `client/<surface>/` directory you don't need — empty registries are fine.
@@ -1,478 +0,0 @@
# Tool Design (Naming, Manifest, Executor, Runtime)
This doc covers everything that **isn't UI**: the tool's identifier, API surface, manifest, types, system prompt, ExecutionRuntime, and the executor that wires it into the frontend.
For UI surfaces (Inspector / Render / Placeholder / Streaming / Intervention / Portal), see [ui/](ui/README.md).
For where files live and how registries work, see [architecture.md](architecture.md).
---
## 1. Naming
| Thing | Convention | Example |
| ----------------------- | -------------------------------------------------------------- | ------------------------------------------------------------ |
| Package directory | `packages/builtin-tool-<kebab>/` | `builtin-tool-task` |
| npm name | `@lobechat/builtin-tool-<kebab>` | `@lobechat/builtin-tool-task` |
| Tool `identifier` | `lobe-<kebab-domain>`**persisted in message history** | `lobe-task`, `lobe-calculator`, `lobe-knowledge-base` |
| Identifier const | `<Name>Identifier` exported from `manifest.ts` (or `types.ts`) | `export const TaskIdentifier = 'lobe-task'` |
| API name const | `<Name>ApiName``as const` object, **camelCase verbs** | `createTask`, `listTasks`, `runTask` |
| Executor class | `<Name>Executor extends BaseExecutor<typeof <Name>ApiName>` | `TaskExecutor` |
| Executor singleton | `<name>Executor` (camelCase) | `export const taskExecutor = new TaskExecutor()` |
| ExecutionRuntime class | `<Name>ExecutionRuntime` | `LocalSystemExecutionRuntime`, `WebBrowsingExecutionRuntime` |
| Inspector / Render etc. | `<ApiName>Inspector` / `<ApiName>Render` | `CreateTaskInspector`, `SearchInspector` |
### Identifier rules
- **`lobe-` prefix is mandatory** — many switches in the codebase key off it.
- Pick a **domain noun**, not a verb (`lobe-task`, not `lobe-task-manager`).
- The identifier is **persisted in message history** — renaming after release means the `@deprecated` alias trick (register the legacy identifier as a second key in `inspectors.ts` / `renders.ts` pointing at the new module). Get it right the first time.
### ApiName rules
- Verb + noun, camelCase: `createTask`, `viewTask`, `runTasks`.
- **Plural variant for batch** (`createTasks`, `runTasks`) — describe in the manifest description that it's preferred over multiple single calls. The system prompt should also push the batch form.
- Reserve **clear separation between mutating verbs** (`updateTaskStatus`, `editTask`) and **execution verbs** (`runTask`). The system prompt must warn the model when these are confusable — see `task` for the canonical "do NOT use updateTaskStatus(running) to start a task" warning.
- Read-only verbs: `list*`, `view*`, `get*`, `search*`. Mutating: `create*`, `edit*`, `update*`, `delete*`. Triggers/effects: `run*`, `execute*`, `submit*`.
---
## 2. `types.ts` — ApiName + Params/State
Define `<Name>ApiName` as `as const` so it doubles as a runtime enum (used by `BaseExecutor`) and a literal type. Then declare `Params` and `State` per API.
```ts
export const TaskIdentifier = 'lobe-task';
export const TaskApiName = {
createTask: 'createTask',
createTasks: 'createTasks',
listTasks: 'listTasks',
/* …one entry per API, group logically (CRUD then run-style) */
} as const;
export type TaskApiNameType = (typeof TaskApiName)[keyof typeof TaskApiName];
// One block per API
export interface CreateTaskParams {
name: string;
instruction: string; /* … */
}
export interface CreateTaskState {
identifier?: string;
success: boolean;
}
export interface CreateTasksParams {
tasks: CreateTaskParams[];
}
export interface CreateTasksItemResult {
error?: string;
identifier?: string;
name: string;
success: boolean;
}
export interface CreateTasksState {
failed: number;
results: CreateTasksItemResult[];
succeeded: number;
}
```
**The result-domain rule for `State`** (memory: "pluginState is result-domain, not call-domain"):
- Include only fields the UI **renders after the call returns** — ids the LLM didn't have when calling, counts, summary numbers, server-assigned status.
- **Don't echo all params.** The Inspector/Render gets `args` for free.
- Keep batch results as `{ succeeded, failed, results }` so the Render can show a one-line summary plus a detail list.
---
## 3. `manifest.ts` — JSON Schema for the LLM
```ts
import type { BuiltinToolManifest } from '@lobechat/types';
import { systemPrompt } from './systemRole';
import { TaskApiName, TaskIdentifier } from './types';
export const TaskManifest: BuiltinToolManifest = {
identifier: TaskIdentifier,
type: 'builtin',
systemRole: systemPrompt,
meta: {
avatar: '📋',
title: 'Task Tools',
description: 'Create, list, edit, delete tasks with dependencies',
readme: 'Optional long description shown in tool detail pages',
},
api: [
{
name: TaskApiName.createTask,
description:
'Create a new task. Optionally attach as a subtask via parentIdentifier. ' +
'Prefer createTasks when planning a batch.',
parameters: {
type: 'object',
required: ['name', 'instruction'],
properties: {
name: { type: 'string', description: 'Short, descriptive name.' },
instruction: {
type: 'string',
description: 'Detailed instruction for what the task should accomplish.',
},
parentIdentifier: {
type: 'string',
description:
'Identifier of the parent task (e.g. "TASK-1"). If provided, the new task becomes a subtask.',
},
priority: {
type: 'number',
description: 'Priority level: 0=none, 1=urgent, 2=high, 3=normal, 4=low. Default is 0.',
},
},
},
},
/* …one entry per ApiName */
],
};
```
### Manifest writing checklist
- **Every API in `<Name>ApiName` has exactly one entry in `api[]`.** Easy to drift after a refactor.
- **`description` on each API is the model's only docs.** Make it long enough for the LLM to pick the right tool. Mention edge cases ("If you provide any filter, omitted filters are not applied implicitly"), defaults, and the relationship to sibling APIs ("To START a task, use runTask — updateTaskStatus only flips a flag").
- **`parameters` is JSON Schema** (`LobeChatPluginApi`). Use `enum`, `required`, `items`, `oneOf`, `additionalProperties: false` etc. — these survive into the LLM's tool spec.
- **Use `additionalProperties: false`** on parameter objects so the model can't sneak unknown fields past validation.
- **Number parameters with semantic values** (`priority: 0=none, 1=urgent, …`) should describe the mapping in the description. Don't rely on `enum` alone for numbers — the model often fills the wrong one.
- **`enum` arrays for known string sets** (statuses, categories, engines). Spread from a constants module (`enum: [...TASK_STATUSES]`) so the manifest stays in sync.
### Optional manifest fields
```ts
{
/* Where this tool can run.
'client' → Agent Gateway dispatches to the desktop client (filesystem, Electron only)
'server' → ToolExecutionService runs it on the server
omitted → server only */
executors: ['client', 'server'],
/* Default human intervention policy for all APIs that don't specify one.
Pair with an Intervention component (see ui/intervention.md). */
humanIntervention: 'never' | 'always' | { /* extended config */ },
}
```
Per-API `humanIntervention` and `renderDisplayControl` go inside each `api[]` entry.
---
## 4. `systemRole.ts` — Operator Instructions for the Model
This is appended to the agent system prompt whenever the tool is enabled. Treat it as a **how-to-use guide for the LLM**, not marketing copy.
```ts
export const systemPrompt = `You have access to Task management tools. Use them to:
- **createTask**: Create a new task. Use parentIdentifier to make it a subtask.
- **createTasks**: Prefer this over multiple createTask calls when planning a batch
(e.g. all subtasks under one parent, or all chapters of an outline).
- **runTask**: Actually START a task — kicks off the agent in a new (or continued)
topic. Do NOT use updateTaskStatus(running) to start a task; that only flips a
flag without executing. The task must have an assigneeAgentId.
- **updateTaskStatus**: Change a task's status (completed/cancelled/paused/failed).
If you mark a task as failed, include an error message explaining why.
- ...
When planning work:
1. Create tasks for each major piece (use parentIdentifier to organize as subtasks).
2. Use editTask with addDependencies to control execution order.
3. Use updateTaskStatus to mark the current task completed when done.`;
```
### Patterns that work well
- **Bulleted list, bold the API name, one line per API.** The model picks tools by skimming.
- **Disambiguate confusable APIs explicitly** (`runTask` vs `updateTaskStatus`).
- **Push toward batched APIs** ("Prefer this when…").
- **End with a numbered workflow** if the tool has a typical sequence.
- **For tools with multiple environments** (e.g. desktop vs cloud), keep variants in `systemRole.ts` and `systemRole.desktop.ts` and pick at the manifest level. See `builtin-tool-local-system`.
### Dynamic system prompts
If the prompt depends on runtime state (current date, available models), export a function and call it in the manifest:
```ts
// systemRole.ts
export const systemPrompt = (today: string) => `Today is ${today}. You have web search tools…`;
// manifest.ts
import dayjs from 'dayjs';
systemRole: systemPrompt(dayjs(new Date()).format('YYYY-MM-DD')),
```
---
## 5. `ExecutionRuntime/index.ts` — Pure Runtime
This is **the default home for new tool logic** going forward. The runtime is a class that:
- Has no React, no Zustand, no `@/services/...` direct imports.
- Receives services as **constructor injection** (or as method args).
- Returns `BuiltinServerRuntimeOutput` from each method.
- Is unit-testable by passing in mocks.
### Pattern A: Inject a service interface
Use when the runtime calls out to IPC, network, or DB.
```ts
// ExecutionRuntime/index.ts
import type { BuiltinServerRuntimeOutput } from '@lobechat/types';
export interface IWebBrowsingService {
search: (q: SearchQuery) => Promise<UniformSearchResponse>;
crawlPages: (urls: string[]) => Promise<CrawlResults>;
}
export interface WebBrowsingRuntimeOptions {
searchService: IWebBrowsingService;
documentService?: WebBrowsingDocumentService;
agentId?: string;
topicId?: string;
}
export class WebBrowsingExecutionRuntime {
constructor(private opts: WebBrowsingRuntimeOptions) {}
async search(
args: SearchQuery,
options?: { signal?: AbortSignal },
): Promise<BuiltinServerRuntimeOutput> {
try {
const data = await this.opts.searchService.search(args, options);
if (data.errorDetail) {
return {
success: false,
content: data.errorDetail,
error: { message: data.errorDetail },
state: data,
};
}
return {
success: true,
content: searchResultsPrompt(data.results.slice(0, 10)),
state: data,
};
} catch (e) {
return { success: false, content: (e as Error).message, error: e };
}
}
}
```
### Pattern B: Reuse the executor
Use when the same logic runs in browser and Node (e.g. mathjs, nerdamer). The runtime is a thin wrapper that imports the executor and re-types the state per API. See `builtin-tool-calculator/src/ExecutionRuntime/index.ts` for the canonical example.
### Pattern C: Extend a shared base
When you're implementing a domain that already has a base runtime (file ops via `ComputerRuntime`), extend and only override `callService` + result normalization. See `builtin-tool-local-system/src/ExecutionRuntime/index.ts`.
### Runtime contract
Every method returns:
```ts
{
content: string; // LLM-facing — never undefined; default to error message
state?: any; // result-domain — what the UI's pluginState becomes
success: boolean; // mandatory
error?: any; // raw error object; the executor will repackage
}
```
Use `@lobechat/prompts` formatters (`searchResultsPrompt`, `crawlResultsPrompt`, `formatTaskCreated`, etc.) to produce structured `content`. They emit XML/markdown that's already tuned for token efficiency.
---
## 6. `client/executor/index.ts` — Frontend Wiring
The executor's job is to **resolve frontend defaults** (current agent, current task, scope) and **call the runtime**. It then funnels through `toResult()` into the `BuiltinToolResult` shape.
```ts
import { BaseExecutor, type BuiltinToolContext, type BuiltinToolResult } from '@lobechat/types';
import debug from 'debug';
import { taskService } from '@/services/task';
import { getTaskStoreState } from '@/store/task';
import { TaskIdentifier } from '../../manifest';
import { TaskApiName, type CreateTaskParams } from '../../types';
const log = debug('lobe-task:executor');
class TaskExecutor extends BaseExecutor<typeof TaskApiName> {
readonly identifier = TaskIdentifier;
protected readonly apiEnum = TaskApiName;
// ⚠ class FIELD, not a method — preserves `this` when invoked via registry
createTask = async (
params: CreateTaskParams,
ctx?: BuiltinToolContext,
): Promise<BuiltinToolResult> => {
try {
log('createTask params=%o', params);
const task = await getTaskStoreState().createTask({
name: params.name,
instruction: params.instruction,
// Default assignee from context — never silently override an explicit value
assigneeAgentId:
params.assigneeAgentId ?? (ctx?.scope === 'task' ? undefined : ctx?.agentId),
parentTaskId: params.parentIdentifier?.trim() || undefined,
priority: params.priority,
});
if (!task) return this.errorResult('Failed to create task', 'CreateFailed');
return {
success: true,
content: formatTaskCreated({ identifier: task.identifier, name: task.name /* … */ }),
state: { identifier: task.identifier, success: true },
};
} catch (error) {
return this.errorResult(error, 'CreateTaskFailed');
}
};
private errorResult(err: unknown, type: string): BuiltinToolResult {
const message = err instanceof Error ? err.message : String(err) || 'Unknown error';
return { success: false, content: `Failed: ${message}`, error: { type, message } };
}
}
export const taskExecutor = new TaskExecutor();
```
### Hard rules
1. **Methods are class fields** (`name = async (…) => {…}`), not class methods. The registry calls `(executor as any)[apiName](params, ctx)`; arrow-function fields keep `this` bound.
2. **`identifier` and `apiEnum` are `readonly` instance fields**, not getters — `BaseExecutor.hasApi/getApiNames` reads them synchronously at registration time.
3. **Default missing params from `ctx`**, but never silently override explicit values. Use `params.foo ?? ctx?.foo`, not `ctx?.foo ?? params.foo`.
4. **One funnel for all returns.** Either always return through `toResult(runtime.x())` (when delegating) or through `errorResult(…)` for the catch arm. Never inline `{ success: false, content: '' }``content: ''` collapses the Debug pane to blank.
5. **`debug('lobe-<name>:executor')`.** Match the namespace to the identifier minus `lobe-` when convenient.
6. **Singleton export.** `export const <name>Executor = new <Name>Executor()` — the registry imports the instance, not the class.
### When the executor delegates to ExecutionRuntime
```ts
class LocalSystemExecutor extends BaseExecutor<typeof LocalSystemApiEnum> {
readonly identifier = LocalSystemIdentifier;
protected readonly apiEnum = LocalSystemApiEnum;
private runtime = new LocalSystemExecutionRuntime(localFileService);
readLocalFile = async (params: LocalReadFileParams): Promise<BuiltinToolResult> => {
try {
const result = await this.runtime.readFile({
path: params.path,
startLine: params.loc?.[0],
endLine: params.loc?.[1],
});
return this.toResult(result);
} catch (error) {
return this.errorResult(error);
}
};
private toResult(out: BuiltinServerRuntimeOutput): BuiltinToolResult {
const errMsg = typeof out.error?.message === 'string' ? out.error.message : undefined;
const safe = out.content || errMsg || 'Tool execution failed';
if (!out.success) {
return {
success: false,
content: safe,
state: out.state, // ← preserve partial state on failure
error: out.error
? { type: 'PluginServerError', message: errMsg ?? safe, body: out.error }
: undefined,
};
}
return { success: true, content: safe, state: out.state };
}
}
```
The `toResult` funnel is **mandatory**: it enforces never-undefined `content` and partial-state preservation. Both invariants caught real production bugs (`globLocalFiles` Response empty, `editLocalFile` partial state lost).
---
## 7. `index.ts` — Package Entry Point
Keep it pure data + the manifest. **No React, no stores, no Node-only imports.**
```ts
export { TaskIdentifier, TaskManifest } from './manifest';
export { systemPrompt } from './systemRole';
export {
TaskApiName,
type TaskApiNameType,
type CreateTaskParams,
type CreateTaskState,
/* …all Params/State types */
} from './types';
// Optional helpers used by both the runtime and the UI
export { TASK_STATUSES, UNFINISHED_TASK_STATUSES } from './constants';
```
This entry is what `packages/builtin-tools/src/index.ts` and `identifiers.ts` import — it must be importable from server bundles.
---
## 8. `package.json`
```json
{
"dependencies": {
"@lobechat/prompts": "workspace:*"
},
"devDependencies": {
"@lobechat/types": "workspace:*"
},
"exports": {
".": "./src/index.ts",
"./client": "./src/client/index.ts",
"./executor": "./src/client/executor/index.ts",
"./executionRuntime": "./src/ExecutionRuntime/index.ts"
},
"main": "./src/index.ts",
"name": "@lobechat/builtin-tool-<name>",
"peerDependencies": {
"@lobehub/ui": "^5",
"antd": "^6",
"antd-style": "*",
"lucide-react": "*",
"react": "*",
"react-i18next": "*"
},
"private": true,
"version": "1.0.0"
}
```
**Why peer not direct deps for client libs:** the `./` and `./executionRuntime` entry points must be importable from server code. Listing React etc. as peer deps prevents bundlers from following them when only the runtime is consumed.
**Skip `./executor`** if the package has no frontend executor (server-only tools like `builtin-tool-web-browsing`).
---
## 9. Common Pitfalls
| Symptom | Likely cause |
| ------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- |
| "ApiNotFound" at runtime | Method name in executor doesn't match `ApiName` value (typo, wrong case) |
| Method works once, then "this is undefined" | Method declared as `async fn() {}` instead of `fn = async () => {}``this` lost when registry invokes |
| Debug "Response" pane blank but `pluginState` populated | Returning `content: ''` or letting `output.content` be undefined — use the `toResult` funnel |
| Partial result vanishes on failure | `toResult` discarded `state` when `success: false`; preserve it |
| Tool shows up but doesn't run on desktop | `executors` in manifest doesn't include `'client'` (or vice versa for server-only) |
| Same tool registered twice / legacy identifier ghost | Identifier collision; check `@deprecated` aliases in `inspectors.ts`/`renders.ts` |
| Manifest test fails after adding API | Forgot to add the corresponding i18n `apiName.<api>` key |
| TypeScript error on `BaseExecutor<typeof X>` | `X` declared with `enum` instead of `as const` object — must be the const-object form |
@@ -1,36 +0,0 @@
# Tool UI Surfaces
A builtin tool can ship up to **six client-side surfaces**, each with a different role in the chat UI. Only `Inspector` is required; the other five are added on demand and registered in their own central files.
| Surface | Required? | When the chat shows it | Registered in |
| ------------ | --------- | --------------------------------------------------------------------- | --------------------------------------------- |
| Inspector | ✅ Always | Header strip of every tool call (one-line chip) | `packages/builtin-tools/src/inspectors.ts` |
| Render | Optional | Rich result card below the header, after the call returns | `packages/builtin-tools/src/renders.ts` |
| Placeholder | Optional | Skeleton between "args streaming complete" and "result arrives" | `packages/builtin-tools/src/placeholders.ts` |
| Streaming | Optional | Live output during execution (e.g. command stdout) | `packages/builtin-tools/src/streamings.ts` |
| Intervention | Optional | Approval / edit-before-run dialog (when `humanIntervention` triggers) | `packages/builtin-tools/src/interventions.ts` |
| Portal | Optional | Full-screen detail view (right-side or modal) | `packages/builtin-tools/src/portals.ts` |
The two reference tools to read end-to-end:
- **`builtin-tool-web-browsing/src/client/`** — Inspector + Render + Placeholder + Portal (no Intervention/Streaming).
- **`builtin-tool-local-system/src/client/`** — all six surfaces, including `components/` for shared building blocks.
---
## Files in this folder
Read **principles** and **shared-rules** first — they apply to every surface. Then jump to the surface you're building.
| File | What it covers |
| ---------------------------------- | ----------------------------------------------------------------------- |
| [principles.md](principles.md) | Design principles — when each surface exists and how far to take it |
| [shared-rules.md](shared-rules.md) | Cross-surface rules: component skeleton, styling, single-layer surfaces |
| [inspector.md](inspector.md) | Inspector — header chip (required) |
| [render.md](render.md) | Render — rich result card |
| [placeholder.md](placeholder.md) | Placeholder — skeleton between args and result |
| [streaming.md](streaming.md) | Streaming — live output during execution |
| [intervention.md](intervention.md) | Intervention — approval / edit-before-run |
| [portal.md](portal.md) | Portal — full-screen detail view |
| [composition.md](composition.md) | Shared subcomponents (`client/components/`) + package public API |
| [diagnostics.md](diagnostics.md) | Symptom → surface quick-lookup |
@@ -1,51 +0,0 @@
# Composition — Shared Components & Package API
## `client/components/` — Shared Subcomponents
Cross-cutting building blocks used by multiple surfaces live here, not duplicated in each surface folder.
Examples from `web-browsing/src/client/components/`:
- `CategoryAvatar.tsx` — search category icon
- `EngineAvatar.tsx` — search engine logo (used in Inspector chip + Render list + Portal header)
- `SearchBar.tsx` — editable query bar (used in Render and Portal)
Examples from `local-system/src/client/components/`:
- `FileItem.tsx` — single file row (used in ListFiles Render, SearchFiles Render, MoveLocalFiles Render)
- `FilePathDisplay.tsx` — path with truncation (used everywhere)
### Rules
- Live under `client/components/`, exported via `client/components/index.ts`.
- Re-export from `client/index.ts` only if other packages need them; otherwise keep internal.
- Keep them dumb — props in, JSX out, no store reads. The store reads belong in the surface that composes them.
---
## `client/index.ts` — Package Public API
Re-exports everything the registries need plus useful types/manifest:
```ts
// Inspector — required
export { TaskInspectors } from './Inspector';
// Render — only if any API has one
export { TaskRenders, CreateTaskRender, RunTasksRender } from './Render';
// Placeholder / Streaming / Intervention — only if used
export { LocalSystemListFilesPlaceholder, LocalSystemSearchFilesPlaceholder } from './Placeholder';
export { LocalSystemStreamings } from './Streaming';
export { LocalSystemInterventions } from './Intervention';
// Portal — single export per tool
export { default as WebBrowsingPortal } from './Portal';
// Reusable components if other packages need them
export { CategoryAvatar, EngineAvatar, SearchBar } from './components';
// Re-export manifest, identifier, types for convenience
export { TaskManifest, TaskIdentifier } from '../manifest';
export * from '../types';
```
@@ -1,15 +0,0 @@
# Diagnostic Quick-Lookup
| Symptom | Surface to check |
| ----------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| No header at all on the tool call | Inspector missing from `client/Inspector/index.ts` registry |
| Header shows the API name but no chips | Inspector missing `args?.X \|\| partialArgs?.X` fallback |
| Header doesn't pulse during loading | Missing `shinyTextStyles.shinyText` on `isArgumentsStreaming \|\| isLoading` |
| Empty result card under header | Render returned `<div />` instead of `null` when no data |
| Render looks "complex" / card-in-card | Filled container (`colorFillQuaternary`) wrapping more filled boxes — flatten to single-layer, see [shared-rules.md](shared-rules.md) |
| Layout jump when result arrives | Placeholder dimensions don't match Render dimensions |
| Approval dialog never appears | Manifest missing `humanIntervention`, or Intervention not in registry |
| Approval click doesn't wait for inline edit | Missing `registerBeforeApprove(id, flushFn)` |
| Portal opens but blank | Switch in `Portal/index.tsx` doesn't cover the apiName |
| Strings show as `builtins.lobe-foo.apiName.bar` | Missing i18n key in `src/locales/default/plugin.ts` (or not seeded in dev locale files) |
| Wrong color shade on `<Text type="secondary">` | `type='secondary'` is lighter than `colorTextSecondary` — pass via `style={{ color: cssVar.colorTextSecondary }}` |
@@ -1,118 +0,0 @@
# Inspector — Header Chip (required)
**Lifecycle:** Inspector renders for **every phase** of a tool call: while args are streaming in, while the executor is running, and after results come back. It's the only surface that's always visible.
**Goal:** keep it to a single line. Show what's happening with as much context as is currently available.
## Props (`BuiltinInspectorProps<Args, State>`)
```ts
interface BuiltinInspectorProps<Arguments = any, State = any> {
apiName: string;
args: Arguments; // final args (only after the assistant stops streaming)
identifier: string;
isArgumentsStreaming?: boolean; // args still arriving
isLoading?: boolean; // args complete, executor running
partialArgs?: Arguments; // partial JSON during streaming
pluginState?: State; // executor's `state` after success
result?: { content: string | null; error?: any };
}
```
## State machine
| Phase | What's available | What to show |
| ----------------------------------- | ---------------------------------------------------------- | ---------------------------------------------------------- |
| Args streaming, no useful field yet | `isArgumentsStreaming === true`, `partialArgs.X` undefined | Just the API title with `shinyTextStyles.shinyText` |
| Args streaming, key field arrived | `partialArgs.X` populated | Title + key field chip, still pulse-animated |
| Args complete, executor running | `args` populated, `isLoading === true` | Same as above, still pulse-animated |
| Result arrived | `pluginState` populated, `isLoading === false` | Title + chips + result summary (count, identifier, status) |
## Canonical example — Search
`packages/builtin-tool-web-browsing/src/client/Inspector/Search/index.tsx`:
```tsx
'use client';
import type { BuiltinInspectorProps, SearchQuery, UniformSearchResponse } from '@lobechat/types';
import { Text } from '@lobehub/ui';
import { cssVar, cx } from 'antd-style';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';
import { highlightTextStyles, inspectorTextStyles, shinyTextStyles } from '@/styles';
export const SearchInspector = memo<BuiltinInspectorProps<SearchQuery, UniformSearchResponse>>(
({ args, partialArgs, isArgumentsStreaming, isLoading, pluginState }) => {
const { t } = useTranslation('plugin');
const query = args?.query || partialArgs?.query || '';
const resultCount = pluginState?.results?.length ?? 0;
const hasResults = resultCount > 0;
if (isArgumentsStreaming && !query) {
return (
<div className={cx(inspectorTextStyles.root, shinyTextStyles.shinyText)}>
<span>{t('builtins.lobe-web-browsing.apiName.search')}</span>
</div>
);
}
return (
<div
className={cx(
inspectorTextStyles.root,
(isArgumentsStreaming || isLoading) && shinyTextStyles.shinyText,
)}
>
<span>{t('builtins.lobe-web-browsing.apiName.search')}:&nbsp;</span>
{query && <span className={highlightTextStyles.primary}>{query}</span>}
{!isLoading &&
!isArgumentsStreaming &&
pluginState?.results &&
(hasResults ? (
<span style={{ marginInlineStart: 4 }}>({resultCount})</span>
) : (
<Text as="span" color={cssVar.colorTextDescription} fontSize={12}>
({t('builtins.lobe-web-browsing.inspector.noResults')})
</Text>
))}
</div>
);
},
);
SearchInspector.displayName = 'SearchInspector';
export default SearchInspector;
```
## Inspector rules
- Wrap the whole row with `inspectorTextStyles.root` (provides correct flex / line-height baseline).
- Pulse with `shinyTextStyles.shinyText` whenever `isArgumentsStreaming || isLoading`.
- Show the i18n title first so the row is non-empty during the earliest streaming phase.
- Read both `args?.X` and `partialArgs?.X` together — `args` is final, `partialArgs` is in-stream.
- Use chips/tags for distinct facets (identifier, name, parent, status, count). Each chip should clip with `text-overflow: ellipsis` and have a `max-width` so long values don't blow out the chat bubble.
- Append `pluginState`-derived suffixes only **after** loading finishes — count or "(no results)" should not appear while still searching.
- **Switch copy by phase.** If the verb implies an ongoing action ("Creating", "Searching", "Listing"), define `<api>.loading` and `<api>.completed` keys and select via `isArgumentsStreaming || isLoading ? loadingKey : completedKey`. Inspector chips persist in chat history — leaving "Creating task" frozen on a finished call reads as if the tool is still running. Read-only labels that are already noun-form ("View task") can keep a single key. See `CallSubAgentInspector` for the canonical two-key pattern.
## Inspector registry — `client/Inspector/index.ts`
```ts
import type { BuiltinInspector } from '@lobechat/types';
import { TaskApiName } from '../../types';
import { CreateTaskInspector } from './CreateTask';
import { ListTasksInspector } from './ListTasks';
/* … */
export const TaskInspectors: Record<string, BuiltinInspector> = {
[TaskApiName.createTask]: CreateTaskInspector as BuiltinInspector,
[TaskApiName.listTasks]: ListTasksInspector as BuiltinInspector,
/* one entry per ApiName */
};
export { CreateTaskInspector } from './CreateTask';
export { ListTasksInspector } from './ListTasks';
/* re-export each */
```
@@ -1,88 +0,0 @@
# Intervention — Approval / Edit-Before-Run (optional)
**Lifecycle:** rendered **before the executor runs** for APIs whose manifest sets `humanIntervention`. The user sees a preview of the args, can edit them, then approves or skips/cancels.
**Add for** destructive or sensitive ops: shell commands, file writes, file moves, payments, message broadcasts.
## Props (`BuiltinInterventionProps<Args>`)
```ts
interface BuiltinInterventionProps<Arguments = any> {
apiName?: string;
args: Arguments;
identifier?: string;
interactionMode?: 'approval' | 'custom';
messageId: string;
/** Called when the user edits the args; the approve action awaits this. */
onArgsChange?: (args: Arguments) => void | Promise<void>;
/** Called on approve / skip / cancel. */
onInteractionAction?: (
action:
| { type: 'submit'; payload: Record<string, unknown> }
| { type: 'skip'; payload?: Record<string, unknown>; reason?: string }
| { type: 'cancel'; payload?: Record<string, unknown> },
) => Promise<void>;
/** Register a callback to flush pending saves before approval. Returns cleanup. */
registerBeforeApprove?: (id: string, callback: () => void | Promise<void>) => () => void;
}
```
## Canonical example — RunCommand Intervention
`packages/builtin-tool-local-system/src/client/Intervention/RunCommand/index.tsx`:
```tsx
import type { RunCommandParams } from '@lobechat/electron-client-ipc';
import type { BuiltinInterventionProps } from '@lobechat/types';
import { Flexbox, Highlighter, Text } from '@lobehub/ui';
import { memo } from 'react';
const RunCommand = memo<BuiltinInterventionProps<RunCommandParams>>(({ args }) => {
const { description, command, timeout } = args;
return (
<Flexbox gap={8}>
<Flexbox horizontal justify="space-between">
{description && <Text>{description}</Text>}
{timeout && (
<Text style={{ fontSize: 12 }} type="secondary">
timeout: {formatTimeout(timeout)}
</Text>
)}
</Flexbox>
{command && (
<Highlighter wrap language="sh" showLanguage={false} variant="outlined">
{command}
</Highlighter>
)}
</Flexbox>
);
});
export default RunCommand;
```
## Intervention rules
- **Show a preview, not a form by default.** Editing UI is opt-in via `onArgsChange` and is usually inline (click to edit a code block, etc.).
- For args with debounced edit state (text fields), use `registerBeforeApprove(id, flushFn)` so the approve action waits for the debounce to flush. Always return the cleanup function.
- Call `onInteractionAction({ type: 'submit', payload })` when the user approves; `'skip'` if they skip with a reason; `'cancel'` if they cancel the whole turn.
- Add a corresponding `interventionAudit.ts` in the package root if the tool needs scope/path validation before approval (see `local-system/src/interventionAudit.ts`).
## Intervention registry — `client/Intervention/index.ts`
```ts
import { LocalSystemApiName } from '../..';
import EditLocalFile from './EditLocalFile';
import RunCommand from './RunCommand';
import WriteFile from './WriteFile';
/* … */
export const LocalSystemInterventions = {
[LocalSystemApiName.editLocalFile]: EditLocalFile,
[LocalSystemApiName.runCommand]: RunCommand,
[LocalSystemApiName.writeLocalFile]: WriteFile,
/* one entry per API that needs approval */
};
```
@@ -1,93 +0,0 @@
# Placeholder — Skeleton Between Args and Result (optional)
**Lifecycle:** rendered when the args have finished streaming but the executor hasn't returned yet. Disappears when `pluginState` arrives. Bridges the moment of perceived lag.
**Add for** APIs with noticeable execution time: web search, network crawl, file list, large grep. **Skip for** instant ops (status flips, calculator).
## Props (`BuiltinPlaceholderProps<Args>`)
```ts
interface BuiltinPlaceholderProps<T extends Record<string, any> = any> {
apiName: string;
args?: T;
identifier: string;
}
```
No `pluginState` — Placeholder lives entirely in the "executing" gap.
## Canonical example — Search Placeholder
`packages/builtin-tool-web-browsing/src/client/Placeholder/Search.tsx`:
```tsx
import type { BuiltinPlaceholderProps, SearchQuery } from '@lobechat/types';
import { Flexbox, Icon, Skeleton } from '@lobehub/ui';
import { createStaticStyles, cx } from 'antd-style';
import { SearchIcon } from 'lucide-react';
import { memo } from 'react';
import { useIsMobile } from '@/hooks/useIsMobile';
import { shinyTextStyles } from '@/styles';
const styles = createStaticStyles(({ css, cssVar }) => ({
query: cx(
css`
padding: 4px 8px;
border-radius: 8px;
font-size: 12px;
color: ${cssVar.colorTextSecondary};
&:hover {
background: ${cssVar.colorFillTertiary};
}
`,
shinyTextStyles.shinyText,
),
}));
export const Search = memo<BuiltinPlaceholderProps<SearchQuery>>(({ args }) => {
const { query } = args || {};
const isMobile = useIsMobile();
return (
<Flexbox gap={8}>
<Flexbox horizontal={!isMobile} gap={isMobile ? 8 : 40}>
<Flexbox horizontal align="center" className={styles.query} gap={8}>
<Icon icon={SearchIcon} />
{query ? query : <Skeleton.Block active style={{ height: 20, width: 40 }} />}
</Flexbox>
<Skeleton.Block active style={{ height: 20, width: 40 }} />
</Flexbox>
<Flexbox horizontal gap={12}>
{[1, 2, 3, 4, 5].map((id) => (
<Skeleton.Button active key={id} style={{ borderRadius: 8, height: 80, width: 160 }} />
))}
</Flexbox>
</Flexbox>
);
});
```
## Placeholder rules
- **Mirror the eventual Render's layout.** When the result arrives the Placeholder unmounts and the Render mounts; if they share dimensions, the chat doesn't jump.
- Use `Skeleton.Block` / `Skeleton.Button` from `@lobehub/ui` for placeholder shapes.
- Embed any args you have (e.g. the query text) — context helps the user know what's loading.
- Pulse with `shinyTextStyles.shinyText` if the Placeholder includes literal text.
## Placeholder registry — `client/Placeholder/index.ts`
```ts
import { WebBrowsingApiName } from '../../types';
import CrawlMultiPages from './CrawlMultiPages';
import CrawlSinglePage from './CrawlSinglePage';
import { Search } from './Search';
export const WebBrowsingPlaceholders = {
[WebBrowsingApiName.crawlMultiPages]: CrawlMultiPages,
[WebBrowsingApiName.crawlSinglePage]: CrawlSinglePage,
[WebBrowsingApiName.search]: Search,
};
export { CrawlMultiPages, CrawlSinglePage, Search };
```
@@ -1,71 +0,0 @@
# Portal — Full-Screen Detail View (optional)
**Lifecycle:** rendered when the user opens the tool message in a side panel or full-screen modal. One Portal per **tool**, not per API — the Portal switches on `apiName` internally.
**Add for** tools whose results deserve a deep-dive view: search results with editable filters, page content with reader mode, code interpreter sessions.
## Props (`BuiltinPortalProps<Args, State>`)
```ts
interface BuiltinPortalProps<Arguments = Record<string, any>, State = any> {
apiName?: string;
arguments: Arguments;
identifier: string;
messageId: string;
state: State;
}
```
## Canonical example — Web-Browsing Portal
`packages/builtin-tool-web-browsing/src/client/Portal/index.tsx`:
```tsx
import type { BuiltinPortalProps, CrawlPluginState, SearchQuery } from '@lobechat/types';
import { memo } from 'react';
import { WebBrowsingApiName } from '../../types';
import PageContent from './PageContent';
import PageContents from './PageContents';
import Search from './Search';
const Portal = memo<BuiltinPortalProps>(({ arguments: args, messageId, state, apiName }) => {
switch (apiName) {
case WebBrowsingApiName.search:
return <Search messageId={messageId} query={args as SearchQuery} response={state} />;
case WebBrowsingApiName.crawlSinglePage: {
const result = (state as CrawlPluginState).results.find((r) => r.originalUrl === args.url);
return <PageContent messageId={messageId} result={result} />;
}
case WebBrowsingApiName.crawlMultiPages:
return (
<PageContents
messageId={messageId}
results={(state as CrawlPluginState).results}
urls={args.urls}
/>
);
}
return null;
});
export default Portal;
```
## Portal rules
- One Portal per tool — the file is the routing layer, subcomponents implement each API's view.
- Portals can read the chat store directly to detect "still streaming" and render a Skeleton internally (see `Search/index.tsx:20-46`).
- Layout assumes more space than the Render — use `Flexbox` with `height={'100%'}` and structure for a side panel viewport.
## Portal registry — `packages/builtin-tools/src/portals.ts`
```ts
import { WebBrowsingManifest, WebBrowsingPortal } from '@lobechat/builtin-tool-web-browsing/client';
import { type BuiltinPortal } from '@lobechat/types';
export const BuiltinToolsPortals: Record<string, BuiltinPortal> = {
[WebBrowsingManifest.identifier]: WebBrowsingPortal as BuiltinPortal,
};
```
@@ -1,19 +0,0 @@
# Tool Render 设计原则(中文草案)
这些原则用于判断一个 builtin tool 的 Inspector / Render / Placeholder / Streaming / Intervention / Portal 应该做什么,以及做到什么程度。
1. **先保证折叠态可读。** 每个 API 都必须有 Inspector;用户不展开也应该能看懂 “正在做什么 / 对什么做 / 当前结果是什么”。Inspector 不应该只展示函数名和原始参数。
2. **Inspector 是一句话,不是详情页。** 优先表达动作、关键对象、数量、状态,例如 “分析图片 3 张”“搜索 12 个结果”“读取 config.json”。长文本、列表和结构化结果放到 Render 或 Portal。
3. **Inspector 要覆盖执行生命周期。** `args` 还在 streaming、工具执行中、执行完成、执行失败时都应该有稳定展示;必要时同时读取 `args``partialArgs``pluginState`,避免出现空白、跳变或只显示半截参数。
4. **文案要随状态切换时态。** 同一个动作在 loading 与 completed 两个阶段必须用不同的措辞:执行中用现在进行时(“正在创建任务 / Creating task / 正在搜索”),执行完成后切到完成态(“已创建任务 / Task created / 已找到 N 条”)。Inspector chip 会一直留在聊天记录里 —— 如果一直挂着 “正在 xxx”,几小时后回看历史时会读起来像还在跑。约定的 i18n 形式是 `<api>.loading` / `<api>.completed` 一对键(见 `lobe-agent.apiName.callSubAgent.{loading,completed}``lobe-claude-code.task.{create,list,update,get}.{loading,completed}`),渲染时按 `isArgumentsStreaming || isLoading` 决定取哪一个。只读 / 查询类(“查看任务” 这种本来就是名词性的)可以共用一个键。
5. **只有结构化结果才需要 Render。** 如果工具结果只是自然语言总结,通常不需要 Render;如果结果包含列表、媒体、文件、表格、代码、diff、地图、时间线、权限请求等结构,就应该提供 Render。
6. **Render 要帮助用户检查结果,而不是复述参数。** Render 的主体应该围绕工具产物组织:可预览、可比较、可筛选、可定位。参数只作为上下文辅助出现,不要把 Render 做成一块更大的 args dump。
7. **参数和结果要一起参与渲染。** 好的 Tool UI 通常同时用 `args` 解释意图,用 `pluginState` 展示真实执行结果;但 `pluginState` 只放结果域数据,不要反向塞入可以从 `args` 推导出的内容。
8. **慢操作要有 Placeholder。** 如果工具通常需要等待网络、文件系统、模型或外部进程,Placeholder 应该先占住最终 Render 的版式,让用户知道即将看到什么,而不是只显示一个泛化 loading。
9. **Streaming 只用于连续产物。** 搜索列表、日志、长文本、文件分析、分阶段计划适合 Streaming;一次性小结果不需要强行做 Streaming。Streaming UI 要能渐进追加,并且完成后自然过渡到最终 Render。
10. **有风险的动作必须 Intervention。** 写文件、删除、发送、安装、执行命令、外部可见操作、权限敏感操作,都应该在执行前给出可理解的确认界面;确认文案要说明影响范围,而不是只问 “是否继续”。
11. **错误、空态和截断都是正式状态。** Render 不能在失败、无结果、超长结果时退化成空白。错误要说明发生在哪一步;空态要告诉用户没有产物;超长内容要明确 “展示前 N 项 / 还有 N 项”。
12. **信息密度要克制。** 默认展示最有判断价值的部分:标题、来源、状态、摘要、少量关键字段。大对象、长列表、原文、调试数据放进可展开区域或 Portal,避免把聊天流撑成后台管理页。
13. **视觉上融入聊天流。** Tool UI 应该使用 `@lobehub/ui` / base-ui、`Flexbox``createStaticStyles``cssVar.*`,遵循现有间距、圆角、颜色、字号;不要为单个工具发明一套独立视觉语言。具体的样式约定见 [shared-rules.md](shared-rules.md)。
14. **Devtools fixture 是验收入口。** 新增或修改 Tool UI 时,应在 `/devtools` 里准备覆盖典型态、loading/streaming、空态、错误态、长内容态的 fixture;一个 API 如果在真实聊天里会出现,就不应该在 devtools 中缺席。
15. **先做用户会看的 UI,再做调试 UI。** Raw JSON、trace、schema、内部 id 可以存在,但应默认收起或放到调试区;主界面先回答用户最关心的问题:工具做了什么,结果值不值得信任,下一步能做什么。
@@ -1,101 +0,0 @@
# Render — Rich Result Card (optional)
**Lifecycle:** rendered **once the result arrives** (after Placeholder/Streaming hand off). Sits below the Inspector header.
**Skip if** the API is read-only or the result is just text — the framework already shows the executor's `content` string. Add a Render only when there's a structured artifact worth seeing: a card, a chart, a diff, a list of files.
## Props (`BuiltinRenderProps<Args, State, Content>`)
```ts
interface BuiltinRenderProps<Arguments = any, State = any, Content = any> {
apiName?: string;
args: Arguments; // final params from the LLM
content: Content; // executor's content string (or parsed)
identifier?: string;
messageId: string; // for store lookups
pluginError?: any; // from BuiltinToolResult.error
pluginState?: State; // executor's state
toolCallId?: string;
}
```
## Two patterns
**Pattern A — Single-file Render** (web-browsing CrawlSinglePage):
```tsx
// client/Render/CrawlSinglePage.tsx
import type { BuiltinRenderProps, CrawlPluginState, CrawlSinglePageQuery } from '@lobechat/types';
import { memo } from 'react';
import PageContent from './PageContent';
const CrawlSinglePage = memo<BuiltinRenderProps<CrawlSinglePageQuery, CrawlPluginState>>(
({ messageId, pluginState, args }) => (
<PageContent messageId={messageId} results={pluginState?.results} urls={[args?.url]} />
),
);
export default CrawlSinglePage;
```
**Pattern B — Folder with subcomponents** (web-browsing Search):
```
client/Render/Search/
├── index.tsx # composes the subcomponents, handles error states
├── ConfigForm.tsx # appears when pluginError.type === 'PluginSettingsInvalid'
├── SearchQuery.tsx # editable query header
└── SearchResult.tsx # result list
```
Use Pattern B when the Render has internal state (editing mode, expanded items), error variants, or is large enough to benefit from splitting.
## Error handling in Render
Renders are the canonical place to surface `pluginError` because the chat doesn't auto-render typed errors:
```tsx
if (pluginError) {
if (pluginError?.type === 'PluginSettingsInvalid') {
return <ConfigForm id={messageId} provider={pluginError.body?.provider} />;
}
return (
<Alert
title={pluginError?.message}
type="error"
extra={<Highlighter language="json">{JSON.stringify(pluginError.body, null, 2)}</Highlighter>}
/>
);
}
```
## Render rules
- **Return `null`** if there's nothing useful to draw yet (avoids empty cards during stream).
- Use `pluginState` for server-truth (ids, counts, server-assigned status) and `args` for what the LLM asked. **Combine — neither alone is enough.**
- For lists, summarize with a header line and show top N items with a "+N more" tail rather than rendering everything.
- **Keep the Render single-layer** — the tool card is already your surface, so don't open with your own filled container and then nest more filled boxes inside it. See [shared-rules.md](shared-rules.md) → "Stay single-layer".
- For modals from a Render, use `@lobehub/ui/base-ui` (`createModal`, `useModalContext`, `confirmModal`) — see the **modal** skill.
## Render registry — `client/Render/index.ts`
```ts
import type { BuiltinRender } from '@lobechat/types';
import { TaskApiName } from '../../types';
import CreateTaskRender from './CreateTask';
import RunTasksRender from './RunTasks';
export const TaskRenders: Record<string, BuiltinRender> = {
[TaskApiName.createTask]: CreateTaskRender as BuiltinRender,
[TaskApiName.runTasks]: RunTasksRender as BuiltinRender,
/* only the APIs with rich result UI — others fall back to text content */
};
export { default as CreateTaskRender } from './CreateTask';
export { default as RunTasksRender } from './RunTasks';
```
## Render display control (rare)
If the Render should hide for certain results (e.g. ClaudeCode's TodoWrite hides when the agent is mid-stream), add a `RenderDisplayControl` to `packages/builtin-tools/src/displayControls.ts`. See `ClaudeCodeRenderDisplayControls` for the pattern.
@@ -1,89 +0,0 @@
# Shared Style Rules
These apply across every surface.
## The component skeleton
Every surface file is the same shape, so internalize it once instead of re-deriving it per rule. The skeleton below bakes in five mechanical conventions — copy it and fill the body:
```tsx
'use client'; // (a) leaves of the chat tree must not block server rendering
import type { BuiltinInspectorProps, SearchQuery, UniformSearchResponse } from '@lobechat/types';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';
// (b) type with BuiltinXProps<Args, State> — never widen to `any`.
// Args = the JSON Schema params, State = the executor's `state` field;
// they should match <Name>Params / <Name>State from types.ts.
export const SearchInspector = memo<BuiltinInspectorProps<SearchQuery, UniformSearchResponse>>(
({ args, pluginState }) => {
const { t } = useTranslation('plugin'); // (c) all strings from the `plugin` namespace
// (d) cross-cutting state (loading, streaming buffer) comes from the store,
// not props — props only carry args/state/messageId.
// const buffer = useChatStore((s) => chatToolSelectors.streamingBuffer(messageId)(s));
return <span>{t('builtins.<identifier>.apiName.search')}</span>;
},
);
SearchInspector.displayName = 'SearchInspector'; // (e) always memo + displayName
export default SearchInspector;
```
- **(c)** Default an Inspector to `t('builtins.<identifier>.apiName.<api>')` so the row is non-empty while args stream in.
- **(d)** Read the store via Zustand selectors inside the component; see [streaming.md](streaming.md) for the buffer selector.
## Styling: `createStaticStyles + cssVar.*`, `@lobehub/ui` over `antd`
Zero-runtime CSS-in-JS — styles compile once and read CSS variables at runtime:
```tsx
import { createStaticStyles, cssVar } from 'antd-style';
const styles = createStaticStyles(({ css, cssVar }) => ({
chip: css`
padding-block: 2px;
padding-inline: 8px;
border-radius: 999px;
color: ${cssVar.colorText};
background: ${cssVar.colorFillTertiary};
`,
}));
```
- Fall back to `createStyles + token` only when you need runtime token computation (rare). Inline `style={{ color: cssVar.colorTextSecondary }}` is fine for one-off dynamic values.
- Components come from `@lobehub/ui` (`Block`, `Text`, `Flexbox`, `Highlighter`, `Alert`, `Tooltip`, `Skeleton`), not raw `antd`. Modals come from `@lobehub/ui/base-ui` (`createModal`, `useModalContext`, `confirmModal`) — see the **modal** skill.
- Note: `<Text type='secondary'>` is a lighter shade than `colorTextSecondary`. For that exact token color, write `<Text style={{ color: cssVar.colorTextSecondary }}>`.
## Stay single-layer — don't nest filled cards
The framework already wraps every Render / Intervention in a tool card, so that card **is** your surface. A Render that opens with its own `background: ${cssVar.colorFillQuaternary}` container is already one card deep; put another filled box inside it (`colorBgContainer` / `colorFillTertiary`) and you get the card-in-card look that reads as "complex" — two or three stacked fills for what is really a flat list of fields.
- **The outermost wrapper carries no fill.** Use a flat container with only `padding-block: 4px` for breathing room; let the tool card provide the card. (See `Agent/index.tsx`'s `container`.)
- **At most one filled box, and only to delineate real content** — a Markdown preview, a diff, a code/result block. Labels, keyvalue fields, question/answer text, chips: render flat on the surface, separated by spacing or a hairline divider (`height: 1px; background: ${cssVar.colorFillSecondary}`), not by wrapping each in its own box.
- **A box on a flat surface needs a visible fill.** Once the outer fill is gone, an inner `colorBgContainer` box can vanish against the tool card (same color). Use `colorFillTertiary` for the one content box so it still reads as delineated.
- Don't wrap a single value in a box just to give it padding — that's the redundant-nesting smell (a `detailCard` around a `value` box around one string).
```tsx
// ❌ card-in-card: filled container wrapping a filled preview box
container: css`
padding: 12px;
background: ${cssVar.colorFillQuaternary};
`,
previewBox: css`
background: ${cssVar.colorBgContainer};
`,
// ✅ single-layer: flat container, one visible content box
container: css`
padding-block: 4px;
`,
previewBox: css`
background: ${cssVar.colorFillTertiary};
`,
```
For the common "icon + file/title header, then one content box" shape, reuse `ToolResultCard` from `@lobechat/shared-tool-ui/components` instead of rebuilding it — it's already single-layer (flat wrapper, one `colorFillTertiary` content box) and is what CC `Read` / `Grep` / `Glob` / `Write` / `WebSearch` / `WebFetch` render through.
The exception is a deliberate **panel** pattern — an `<Block variant="outlined">` with a header bar + list rows (CC `TodoWrite` / `Task`). There the single outlined block is the panel and the header fill is a header bar, not a nested card. One structured panel is fine; stacked decorative fills are not.
@@ -1,83 +0,0 @@
# Streaming — Live Output During Execution (optional)
**Lifecycle:** rendered **while the executor is still running** for APIs that emit incremental output. The component is responsible for fetching the in-flight stream from the chat store and rendering it.
**Add for** long-running ops with continuous output: shell command execution (stdout/stderr), file write progress, code interpreter cells.
## Props (`BuiltinStreamingProps<Args>`)
```ts
interface BuiltinStreamingProps<Arguments = any> {
apiName: string;
args: Arguments;
identifier: string;
messageId: string; // use to fetch the streaming buffer from store
toolCallId: string;
}
```
Note there's **no `state` or `result` prop** — the Streaming component is for the in-flight phase. It pulls the live buffer from the store itself (typically via `chatToolSelectors.streamingContent(messageId)` or similar).
## Canonical example — RunCommandStreaming
`packages/builtin-tool-local-system/src/client/Streaming/RunCommand/index.tsx`:
```tsx
'use client';
import type { BuiltinStreamingProps } from '@lobechat/types';
import { Highlighter } from '@lobehub/ui';
import { memo } from 'react';
interface RunCommandParams {
command?: string;
description?: string;
timeout?: number;
}
export const RunCommandStreaming = memo<BuiltinStreamingProps<RunCommandParams>>(({ args }) => {
const { command } = args || {};
if (!command) return null;
return (
<Highlighter
animated
wrap
language="sh"
showLanguage={false}
style={{ padding: '4px 8px' }}
variant="outlined"
>
{command}
</Highlighter>
);
});
RunCommandStreaming.displayName = 'RunCommandStreaming';
```
For real-time output beyond just the command (stderr/stdout streaming), pull from the chat store:
```tsx
const buffer = useChatStore((state) =>
chatToolSelectors.streamingBuffer(messageId, toolCallId)(state),
);
```
## Streaming rules
- Render `null` until you have something to display (avoids flash).
- For terminal-style output, use `Highlighter` with `animated` to show typing-like effect.
- The Streaming component must **unmount cleanly** when execution ends — typically the framework swaps it out for the Render automatically.
## Streaming registry — `client/Streaming/index.ts`
```ts
import { LocalSystemApiName } from '../..';
import { RunCommandStreaming } from './RunCommand';
import { WriteFileStreaming } from './WriteFile';
export const LocalSystemStreamings = {
[LocalSystemApiName.runCommand]: RunCommandStreaming,
[LocalSystemApiName.writeLocalFile]: WriteFileStreaming,
};
```
+8 -2
View File
@@ -1,7 +1,13 @@
---
name: chat-sdk
description: 'Build multi-platform chat bots with the chat SDK. Use for Slack, Teams, Google Chat, Discord, GitHub, Linear bots, webhooks, mentions, slash commands, cards, modals, or streaming responses.'
user-invocable: false
description: >
Build multi-platform chat bots with Chat SDK (`chat` npm package). Use when developers want to
(1) Build a Slack, Teams, Google Chat, Discord, GitHub, or Linear bot,
(2) Use the Chat SDK to handle mentions, messages, reactions, slash commands, cards, modals, or streaming,
(3) Set up webhook handlers for chat platforms,
(4) Send interactive cards or stream AI responses to chat platforms.
Triggers on "chat sdk", "chat bot", "slack bot", "teams bot", "discord bot", "@chat-adapter",
building bots that work across multiple chat platforms.
---
# Chat SDK
+58 -11
View File
@@ -29,9 +29,10 @@ Standard workflow for verifying backend changes using the LobeHub CLI (`lh`) aga
## Quick Reference
All CLI dev commands run from `lobehub/apps/cli/`. Subsequent examples use `$CLI`:
All CLI dev commands run from `lobehub/apps/cli/`:
```bash
# Shorthand for all commands below
CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
```
@@ -39,14 +40,17 @@ CLI="LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts"
### Step 1: Ensure Dev Server is Running
Check if the dev server is already running:
```bash
curl -s -o /dev/null -w '%{http_code}' http://localhost:3011/ 2> /dev/null
```
- **If reachable**: skip to Step 2.
- **If unreachable**: start from cloud repo root:
- **If reachable** (returns any HTTP status): server is running. Skip to Step 2.
- **If unreachable**: start the server:
```bash
# From cloud repo root
pnpm run dev:next
```
@@ -61,33 +65,37 @@ pnpm run dev:next
### Step 2: Check CLI Authentication
Check if dev credentials already exist:
```bash
cat lobehub/apps/cli/.lobehub-dev/settings.json 2> /dev/null
```
- **If file exists and contains `"serverUrl": "http://localhost:3011"`**: skip to Step 3.
- **If missing or wrong server**: ask the user to run:
- **If file exists and contains `"serverUrl": "http://localhost:3011"`**: already authenticated. Skip to Step 3.
- **If file missing or points to wrong server**: login is needed. Ask the user to run:
```bash
! cd lobehub/apps/cli && LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts login --server http://localhost:3011
```
> Login requires interactive browser authorization (OIDC Device Code Flow), so the user must run it themselves via `!` prefix. Credentials persist in `lobehub/apps/cli/.lobehub-dev/`.
> Login requires interactive browser authorization (OIDC Device Code Flow), so the user must run it themselves via `!` prefix. After login, credentials are saved to `lobehub/apps/cli/.lobehub-dev/` and persist across sessions.
### Step 3: Test with CLI Commands
CLI runs from source, so CLI-side code changes take effect immediately without rebuilding.
CLI runs from source (`bun src/index.ts`), so CLI-side code changes take effect immediately without rebuilding.
```bash
cd lobehub/apps/cli
$CLI <command>
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts <command>
```
### Step 4: Clean Up Test Data
Delete any test data created during verification:
```bash
$CLI task delete < id > -y
$CLI agent delete < id > -y
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts task delete < id > -y
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts agent delete < id > -y
```
## Common Testing Patterns
@@ -95,30 +103,51 @@ $CLI agent delete < id > -y
### Task System
```bash
# List tasks
$CLI task list
# Create test data with nesting
$CLI task create -n "Root Task" -i "Test instruction"
$CLI task create -n "Child Task" -i "Sub instruction" --parent T-1
# View task detail (tests getTaskDetail service)
$CLI task view T-1
# View task tree
$CLI task tree T-1
# Test lifecycle
$CLI task edit T-1 --status running
$CLI task comment T-1 -m "Test comment"
# Clean up
$CLI task delete T-1 -y
```
### Agent System
```bash
# List agents
$CLI agent list
# View agent detail
$CLI agent view <agent-id>
# Run agent (tests agent execution pipeline)
$CLI agent run <agent-id> -m "Test prompt"
```
### Document & Knowledge Base
```bash
# List documents
$CLI doc list
# Create and view
$CLI doc create -t "Test Doc" -c "Content here"
$CLI doc view <doc-id>
# Knowledge base
$CLI kb list
$CLI kb tree <kb-id>
```
@@ -126,13 +155,18 @@ $CLI kb tree <kb-id>
### Model & Provider
```bash
# List models and providers
$CLI model list
$CLI provider list
# Test provider connectivity
$CLI provider test <provider-id>
```
## Dev-Test Cycle
The standard cycle for backend development:
```
1. Make code changes (service/model/router/type)
|
@@ -143,7 +177,7 @@ $CLI provider test <provider-id>
lsof -ti:3011 | xargs kill && pnpm run dev:next
|
4. CLI verification (end-to-end)
$CLI <command>
LOBEHUB_CLI_HOME=.lobehub-dev bun src/index.ts <command>
|
5. Clean up test data
```
@@ -159,6 +193,10 @@ $CLI provider test <provider-id>
| `lobehub/apps/cli/` (CLI code) | No |
| `src/` (cloud overrides) | Yes |
### When Server Restart is NOT Needed
CLI runs from source via `bun src/index.ts`, so any changes to `lobehub/apps/cli/src/` take effect immediately on next command invocation.
## Troubleshooting
| Issue | Solution |
@@ -169,3 +207,12 @@ $CLI provider test <provider-id>
| CLI shows old data/behavior | Server needs restart to pick up code changes |
| `EADDRINUSE` on port 3011 | Server already running; kill with `lsof -ti:3011 \| xargs kill` |
| Login opens wrong server | Must use `--server http://localhost:3011` flag (env var doesn't work) |
## Credential Isolation
| Mode | Credential Dir | Server |
| ---------- | -------------------------------- | ----------------- |
| Dev | `lobehub/apps/cli/.lobehub-dev/` | `localhost:3011` |
| Production | `~/.lobehub/` | `app.lobehub.com` |
The two environments are completely isolated. Dev mode credentials are gitignored.
+1 -1
View File
@@ -1,6 +1,6 @@
---
name: cli
description: LobeHub CLI (@lobehub/cli) development guide — commands, subcommands, architecture.
description: LobeHub CLI (@lobehub/cli) development guide. Use when working on CLI commands, adding new subcommands, fixing CLI bugs, or understanding CLI architecture. Triggers on CLI development, command implementation, or `lh` command questions.
disable-model-invocation: true
---
+35 -60
View File
@@ -8,20 +8,16 @@ Generate text, images, videos, speech, and transcriptions.
```
lh generate (alias: gen)
├── text <prompt> # Text generation
├── image <prompt> # Image generation
├── video <prompt> # Video generation
├── tts <text> # Text-to-speech
├── asr <audioFile> # Audio-to-text (speech recognition)
├── download <generationId> <asyncTaskId> # Wait & download generation result
├── status <generationId> <asyncTaskId> # Check async task status
└── list # List generation topics
├── text <prompt> # Text generation
├── image <prompt> # Image generation
├── video <prompt> # Video generation
├── tts <text> # Text-to-speech
├── asr <audioFile> # Audio-to-text (speech recognition)
├── download <genId> <taskId> # Wait & download generation result
├── status <genId> <taskId> # Check async task status
└── list # List generation topics
```
> ⚠️ **Important**: `status` and `download` require an `asyncTaskId` (UUID format, e.g.
> `7ad0eb13-e9a5-4403-8070-1f7fe95b2f95`), **not** the generation ID (`gen_xxx`).
> The asyncTaskId is printed after "→ Task" in the `video` / `image` command output.
---
## `lh generate text <prompt>` / `lh gen text <prompt>`
@@ -58,7 +54,7 @@ cat README.md | lh gen text "summarize this" --pipe
## `lh generate image <prompt>` / `lh gen image <prompt>`
Generate images from text prompt. This is an async operation — the command submits the task and returns a generation ID + async task ID for tracking.
Generate images from text prompt. This is an async operation — the command submits the task and returns a generation ID + task ID for tracking.
**Source**: `apps/cli/src/commands/generate/image.ts`
@@ -84,22 +80,17 @@ lh gen image "A cute cat" --model dall-e-3 --provider openai --json
✓ Image generation started
Batch ID: gb_xxx
1 image(s) queued
Generation gen_xxx → Task 7ad0eb13-xxxx-xxxx-xxxx-xxxxxxxxxxxx
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the asyncTaskId — use this for status/download
Generation gen_xxx → Task <taskId>
Use "lh generate status <generationId> <asyncTaskId>" to check progress.
Use "lh generate status <generationId> <taskId>" to check progress.
```
**Typical workflow**:
```bash
# 1. Submit generation — note down BOTH IDs from the output
# Generate image, then wait & download
lh gen image "A cute cat"
# Generation gen_abc123 → Task 7ad0eb13-e9a5-4403-8070-1f7fe95b2f95
# 2. Wait & download using generationId + asyncTaskId (the UUID)
lh gen download gen_abc123 7ad0eb13-e9a5-4403-8070-1f7fe95b2f95 -o cat.png
lh gen download <generationId> <taskId> -o cat.png
```
---
@@ -111,7 +102,7 @@ Generate video from text prompt. This is an async operation.
**Source**: `apps/cli/src/commands/generate/video.ts`
```bash
lh gen video "A cat playing piano" -m <model> -p <provider> [options]
lh gen video "A cat playing piano" -m < model > -p < provider > [options]
```
| Option | Description | Required |
@@ -131,26 +122,9 @@ lh gen video "A cat playing piano" -m <model> -p <provider> [options]
```
✓ Video generation started
Batch ID: gb_xxx
Generation gen_xxx → Task 7ad0eb13-xxxx-xxxx-xxxx-xxxxxxxxxxxx
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the asyncTaskId — use this for status/download
Generation gen_xxx → Task <taskId>
Use "lh generate status <generationId> <asyncTaskId>" to check progress.
```
**Typical workflow**:
```bash
# 1. Find available video models for a provider
lh model list volcengine --json | grep -i seedance
# 2. Submit generation — note down BOTH IDs from the output
lh gen video "A cat on a runway" -m doubao-seedance-2-0-260128 -p volcengine \
--aspect-ratio 9:16 --duration 5 --resolution 1080p
# Generation gen_abc123 → Task 7ad0eb13-e9a5-4403-8070-1f7fe95b2f95
# 3. Wait & download using generationId + asyncTaskId (the UUID)
lh gen download gen_abc123 7ad0eb13-e9a5-4403-8070-1f7fe95b2f95 -o result.mp4 --timeout 600
Use "lh generate status <generationId> <taskId>" to check progress.
```
---
@@ -179,18 +153,15 @@ lh gen asr recording.wav [options]
---
## `lh generate download <generationId> <asyncTaskId>`
## `lh generate download <generationId> <taskId>`
Wait for an async generation task to complete and download the result file.
**Source**: `apps/cli/src/commands/generate/index.ts`
> ⚠️ `<asyncTaskId>` is the UUID printed after "→ Task" in the video/image output.
> Do **not** pass the generation ID (`gen_xxx`) here — that will cause a server error.
```bash
lh gen download <generationId> <asyncTaskId> [-o output.png]
lh gen download gen_xxx 7ad0eb13-xxxx-xxxx-xxxx-xxxxxxxxxxxx -o ~/Desktop/result.mp4 --timeout 600
lh gen download <generationId> <taskId> [-o output.png]
lh gen download gen_xxx task_xxx -o ~/Desktop/result.mp4 --timeout 600
```
| Option | Description | Default |
@@ -204,21 +175,30 @@ lh gen download gen_xxx 7ad0eb13-xxxx-xxxx-xxxx-xxxxxxxxxxxx -o ~/Desktop/result
1. Polls `generation.getGenerationStatus` at the specified interval
2. Shows live progress: `⋯ Status: processing... (42s)`
3. On success: downloads asset URL to local file
4. On error / wrong ID: displays a clear message pointing to the correct ID format
4. On error: displays error message and exits
5. On timeout: suggests using `lh gen status` to check later
**Typical workflow**:
```bash
# One-shot: generate and download
lh gen image "A sunset"
# Copy the generation ID and task ID from output
lh gen download gen_xxx taskId_xxx -o sunset.png
# Video (longer timeout)
lh gen video "A cat running" -m model -p provider
lh gen download gen_xxx taskId_xxx -o cat.mp4 --timeout 600
```
---
## `lh generate status <generationId> <asyncTaskId>`
## `lh generate status <generationId> <taskId>`
Check the status of an async generation task.
> ⚠️ `<asyncTaskId>` is the UUID printed after "→ Task" in the video/image output.
> Do **not** pass the generation ID (`gen_xxx`) here — that will cause a server error.
```bash
lh gen status <generationId> <asyncTaskId> [--json]
lh gen status gen_xxx 7ad0eb13-xxxx-xxxx-xxxx-xxxxxxxxxxxx
lh gen status <generationId> <taskId> [--json]
```
| Option | Description |
@@ -255,17 +235,12 @@ Image and video generation use an async task pattern:
- Triggers async background task (image via `createAsyncCaller`, video via `initModelRuntimeFromDB`)
- Returns `{ data: { batch, generations }, success }` with `asyncTaskId` in each generation
3. **Poll status**`generation.getGenerationStatus`
- Input: `{ generationId, asyncTaskId }` — both are required, and `asyncTaskId` must be the
UUID from the `async_tasks` table, not `gen_xxx`
- Returns `{ status, error, generation }` (generation includes asset URLs on success)
- Before querying, calls `checkTimeoutTasks` which marks tasks as `error` if they have been
`pending` or `processing` for more than ~5 minutes (`ASYNC_TASK_TIMEOUT = 298s`)
**Server routes**:
- `src/server/routers/lambda/image/index.ts` — image creation (uses `authedProcedure` + `serverDatabase`)
- `src/server/routers/lambda/video/index.ts` — video creation (uses `authedProcedure` + `serverDatabase`)
- `src/server/routers/lambda/generation.ts` — status checking
- `packages/database/src/models/asyncTask.ts``AsyncTaskModel` including `checkTimeoutTasks`
**Note**: Image/video routes do NOT use the `keyVaults` middleware — they read API keys from the database via `initModelRuntimeFromDB` or `createAsyncCaller`.
@@ -1,52 +1,58 @@
---
name: review-checklist
description: 'LobeHub code review checklist. Use when reviewing a PR, diff, or branch for console leftovers, return await, secrets, i18n, desktop router drift, UI imports, migrations, or cloud impact.'
user-invocable: false
name: code-review
description: 'Code review checklist for LobeHub. Use when reviewing PRs, diffs, or code changes. Covers correctness, security, quality, and project-specific patterns.'
---
# Review Checklist
# Code Review Guide
## Correctness
## Before You Start
1. Read `/typescript` and `/testing` skills for code style and test conventions
2. Get the diff (skip if already in context, e.g., injected by GitHub review app): `git diff` or `git diff origin/canary..HEAD`
## Checklist
### Correctness
- Leftover `console.log` / `console.debug` — should use `debug` package or remove
- Missing `return await` in try/catch — see <https://typescript-eslint.io/rules/return-await/> (not in our ESLint config yet, requires type info)
- Can the fix/implementation be more concise, efficient, or have better compatibility?
## Security
### Security
- No sensitive data (API keys, tokens, credentials) in `console.*` or `debug()` output
- No base64 output to terminal — extremely long, freezes output
- No hardcoded secrets — use environment variables
## Testing
### Testing
- Bug fixes must include tests covering the fixed scenario
- New logic (services, store actions, utilities) should have test coverage
- Existing tests still cover the changed behavior?
- Prefer `vi.spyOn` over `vi.mock` (see `/testing` skill)
## i18n
### i18n
- New user-facing strings use i18n keys, not hardcoded text
- Keys added to `src/locales/default/{namespace}.ts` with `{feature}.{context}.{action|status}` naming
- For PRs: `locales/` translations for all languages updated (`pnpm i18n`)
## SPA / routing
### SPA / routing
- **`desktopRouter` pair:** If the diff touches `src/spa/router/desktopRouter.config.tsx`, does it also update `src/spa/router/desktopRouter.config.desktop.tsx` with the same route paths and nesting? Single-file edits often cause drift and blank screens.
## Reuse
### Reuse
- Newly written code duplicates existing utilities in `packages/utils` or shared modules?
- Copy-pasted blocks with slight variation — extract into shared function
- `antd` imports replaceable with `@lobehub/ui` wrapped components (`Input`, `Button`, `Modal`, `Avatar`, etc.)
- Use `antd-style` token system, not hardcoded colors; prefer `createStaticStyles` + `cssVar.*` over `createStyles` + `token` unless runtime computation is required
- Use `antd-style` token system, not hardcoded colors
## Database
### Database
- Migration scripts must be idempotent (`IF NOT EXISTS`, `IF EXISTS` guards)
## Cloud Impact
### Cloud Impact
A downstream cloud deployment depends on this repo. Flag changes that may require cloud-side updates:
@@ -55,3 +61,13 @@ A downstream cloud deployment depends on this repo. Flag changes that may requir
- **Dependency versions bumped** — e.g., upgrading `next` or `drizzle-orm` in `package.json`
- **`@lobechat/business-*` exports changed** — e.g., renaming a function in `src/business/` or changing type signatures in `packages/business/`
- `src/business/` and `packages/business/` must not expose cloud commercial logic in comments or code
## Output Format
For local CLI review only (GitHub review app posts inline PR comments instead):
- Number all findings sequentially
- Indicate priority: `[high]` / `[medium]` / `[low]`
- Include file path and line number for each finding
- Only list problems — no summary, no praise
- Re-read full source for each finding to verify it's real, then output "All findings verified."
@@ -1,614 +0,0 @@
---
name: data-fetching-architecture
description: 'LobeHub data-fetching pipeline guide. Use for service layer, Zustand store, SWR, lambdaClient, useClientDataSWR, useFetchXxx hooks, or migrating useEffect fetches.'
user-invocable: false
---
# LobeHub Data Fetching Architecture
> **Related:** `store-data-structures` covers List vs Detail data shape rationale (Map vs Array).
## Architecture Overview
```text
┌─────────────┐
│ Component │
└──────┬──────┘
│ 1. Call useFetchXxx hook from store
┌──────────────────┐
│ Zustand Store │
│ (State + Hook) │
└──────┬───────────┘
│ 2. useClientDataSWR calls service
┌──────────────────┐
│ Service Layer │
│ (xxxService) │
└──────┬───────────┘
│ 3. Call lambdaClient
┌──────────────────┐
│ lambdaClient │
│ (TRPC Client) │
└──────────────────┘
```
## Core Principles
### ✅ DO
1. **Use Service Layer** for all API calls
2. **Use Store SWR Hooks** for data fetching (not useEffect)
3. **Use proper data structures** — see `store-data-structures` skill for List vs Detail patterns
4. **Use lambdaClient.mutate** for write operations (create/update/delete)
5. **Use lambdaClient.query** only inside service methods
6. **Naming convention** — read hooks are `useFetchXxx`, cache invalidation helpers are `refreshXxx` (e.g. `useFetchBenchmarks` / `refreshBenchmarks`). Mutations then chain `refreshXxx()` after the service call.
### ❌ DON'T
1. **Never use useEffect** for data fetching
2. **Never call lambdaClient** directly in components or stores
3. **Never use useState** for server data
4. **Never mix data structure patterns** — follow `store-data-structures` skill
---
## Layer 1: Service Layer
### Purpose
- Encapsulate all API calls to lambdaClient
- Provide clean, typed interfaces
- Single source of truth for API operations
### Service Structure
```typescript
// src/services/agentEval.ts
class AgentEvalService {
// Query methods - READ operations
async listBenchmarks() {
return lambdaClient.agentEval.listBenchmarks.query();
}
async getBenchmark(id: string) {
return lambdaClient.agentEval.getBenchmark.query({ id });
}
// Mutation methods - WRITE operations
async createBenchmark(params: CreateBenchmarkParams) {
return lambdaClient.agentEval.createBenchmark.mutate(params);
}
async updateBenchmark(params: UpdateBenchmarkParams) {
return lambdaClient.agentEval.updateBenchmark.mutate(params);
}
async deleteBenchmark(id: string) {
return lambdaClient.agentEval.deleteBenchmark.mutate({ id });
}
}
export const agentEvalService = new AgentEvalService();
```
### Service Guidelines
1. **One service per domain** (e.g., agentEval, ragEval, aiAgent)
2. **Export singleton instance** (`export const xxxService = new XxxService()`)
3. **Method names match operations** (list, get, create, update, delete)
4. **Clear parameter types** (use interfaces for complex params)
---
## Layer 2: Store with SWR Hooks
### Purpose
- Manage client-side state
- Provide SWR hooks for data fetching
- Handle cache invalidation
### State Structure
```typescript
// src/store/eval/slices/benchmark/initialState.ts
export interface BenchmarkSliceState {
// List data - simple array
benchmarkList: AgentEvalBenchmarkListItem[];
benchmarkListInit: boolean;
// Detail data - map for caching
benchmarkDetailMap: Record<string, AgentEvalBenchmark>;
loadingBenchmarkDetailIds: string[];
// Mutation states
isCreatingBenchmark: boolean;
isUpdatingBenchmark: boolean;
isDeletingBenchmark: boolean;
}
```
> For complete initialState, reducer, and internal dispatch patterns, see the `store-data-structures` skill.
### Actions
```typescript
// src/store/eval/slices/benchmark/action.ts
const FETCH_BENCHMARKS_KEY = 'FETCH_BENCHMARKS';
const FETCH_BENCHMARK_DETAIL_KEY = 'FETCH_BENCHMARK_DETAIL';
export interface BenchmarkAction {
// SWR Hooks - for data fetching
useFetchBenchmarks: () => SWRResponse;
useFetchBenchmarkDetail: (id?: string) => SWRResponse;
// Refresh methods - for cache invalidation
refreshBenchmarks: () => Promise<void>;
refreshBenchmarkDetail: (id: string) => Promise<void>;
// Mutation actions
createBenchmark: (params: CreateParams) => Promise<any>;
updateBenchmark: (params: UpdateParams) => Promise<void>;
deleteBenchmark: (id: string) => Promise<void>;
// Internal methods - not for direct UI use
internal_dispatchBenchmarkDetail: (payload: BenchmarkDetailDispatch) => void;
internal_updateBenchmarkDetailLoading: (id: string, loading: boolean) => void;
}
export const createBenchmarkSlice: StateCreator<EvalStore, any, [], BenchmarkAction> = (
set,
get,
) => ({
// Fetch list — simple array stored in benchmarkList
useFetchBenchmarks: () =>
useClientDataSWR(FETCH_BENCHMARKS_KEY, () => agentEvalService.listBenchmarks(), {
onSuccess: (data) => {
set({ benchmarkList: data, benchmarkListInit: true }, false, 'useFetchBenchmarks/success');
},
}),
// Fetch detail — null key disables the request when id is missing
useFetchBenchmarkDetail: (id) =>
useClientDataSWR(
id ? [FETCH_BENCHMARK_DETAIL_KEY, id] : null,
() => agentEvalService.getBenchmark(id!),
{
onSuccess: (data) => {
get().internal_dispatchBenchmarkDetail({
type: 'setBenchmarkDetail',
id: id!,
value: data,
});
get().internal_updateBenchmarkDetailLoading(id!, false);
},
},
),
// Refresh methods
refreshBenchmarks: () => mutate(FETCH_BENCHMARKS_KEY),
refreshBenchmarkDetail: (id) => mutate([FETCH_BENCHMARK_DETAIL_KEY, id]),
// CREATE — refresh list after creation
createBenchmark: async (params) => {
set({ isCreatingBenchmark: true }, false, 'createBenchmark/start');
try {
const result = await agentEvalService.createBenchmark(params);
await get().refreshBenchmarks();
return result;
} finally {
set({ isCreatingBenchmark: false }, false, 'createBenchmark/end');
}
},
// UPDATE — optimistic update + refresh
updateBenchmark: async (params) => {
const { id } = params;
// 1. Optimistic update
get().internal_dispatchBenchmarkDetail({
type: 'updateBenchmarkDetail',
id,
value: params,
});
// 2. Set loading
get().internal_updateBenchmarkDetailLoading(id, true);
try {
// 3. Call service
await agentEvalService.updateBenchmark(params);
// 4. Refresh from server
await get().refreshBenchmarks();
await get().refreshBenchmarkDetail(id);
} finally {
get().internal_updateBenchmarkDetailLoading(id, false);
}
},
// DELETE — optimistic update + refresh
deleteBenchmark: async (id) => {
get().internal_dispatchBenchmarkDetail({ type: 'deleteBenchmarkDetail', id });
get().internal_updateBenchmarkDetailLoading(id, true);
try {
await agentEvalService.deleteBenchmark(id);
await get().refreshBenchmarks();
} finally {
get().internal_updateBenchmarkDetailLoading(id, false);
}
},
// Internal — dispatch to reducer (for detail map)
internal_dispatchBenchmarkDetail: (payload) => {
const currentMap = get().benchmarkDetailMap;
const nextMap = benchmarkDetailReducer(currentMap, payload);
// Skip set when nothing changed — avoids unnecessary re-renders
if (isEqual(nextMap, currentMap)) return;
set({ benchmarkDetailMap: nextMap }, false, `dispatchBenchmarkDetail/${payload.type}`);
},
// Internal — update loading state for specific detail
internal_updateBenchmarkDetailLoading: (id, loading) => {
set(
(state) => ({
loadingBenchmarkDetailIds: loading
? [...state.loadingBenchmarkDetailIds, id]
: state.loadingBenchmarkDetailIds.filter((i) => i !== id),
}),
false,
'updateBenchmarkDetailLoading',
);
},
});
```
### Store Guidelines
1. **SWR keys as constants** at top of file
2. **useClientDataSWR** for all data fetching (never useEffect)
3. **onSuccess callback** updates store state
4. **Refresh methods** use `mutate()` to invalidate cache
5. **Loading states** in initialState, updated in onSuccess
6. **Mutations** call service, then refresh relevant cache
---
## Layer 3: Component Usage
### Fetching List Data
```tsx
// ✅ CORRECT
const BenchmarkList = () => {
// 1. Get the hook from store
const useFetchBenchmarks = useEvalStore((s) => s.useFetchBenchmarks);
// 2. Get list data
const benchmarks = useEvalStore((s) => s.benchmarkList);
const isInit = useEvalStore((s) => s.benchmarkListInit);
// 3. Call the hook (SWR handles the data fetching)
useFetchBenchmarks();
// 4. Use the data
if (!isInit) return <Loading />;
return (
<div>
<h2>Total: {benchmarks.length}</h2>
{benchmarks.map((b) => (
<BenchmarkCard key={b.id} {...b} />
))}
</div>
);
};
```
### Fetching Detail Data
```tsx
// ✅ CORRECT
const BenchmarkDetail = () => {
const { benchmarkId } = useParams<{ benchmarkId: string }>();
const useFetchBenchmarkDetail = useEvalStore((s) => s.useFetchBenchmarkDetail);
// Detail from map
const benchmark = useEvalStore((s) =>
benchmarkId ? s.benchmarkDetailMap[benchmarkId] : undefined,
);
// Per-item loading
const isLoading = useEvalStore((s) =>
benchmarkId ? s.loadingBenchmarkDetailIds.includes(benchmarkId) : false,
);
useFetchBenchmarkDetail(benchmarkId);
if (!benchmark) return <Loading />;
return (
<div>
<h1>{benchmark.name}</h1>
<p>{benchmark.description}</p>
{isLoading && <Spinner />}
</div>
);
};
```
### Using Selectors (Recommended)
```typescript
// src/store/eval/slices/benchmark/selectors.ts
export const benchmarkSelectors = {
getBenchmarkDetail: (id: string) => (s: EvalStore) => s.benchmarkDetailMap[id],
isLoadingBenchmarkDetail: (id: string) => (s: EvalStore) =>
s.loadingBenchmarkDetailIds.includes(id),
};
// Component with selectors
const BenchmarkDetail = () => {
const { benchmarkId } = useParams();
const useFetchBenchmarkDetail = useEvalStore((s) => s.useFetchBenchmarkDetail);
const benchmark = useEvalStore(benchmarkSelectors.getBenchmarkDetail(benchmarkId!));
useFetchBenchmarkDetail(benchmarkId);
return <div>{benchmark && <h1>{benchmark.name}</h1>}</div>;
};
```
### Anti-pattern
```tsx
// ❌ WRONG — Don't use useEffect for data fetching
const BenchmarkList = () => {
const [data, setData] = useState([]);
const [loading, setLoading] = useState(false);
useEffect(() => {
setLoading(true);
lambdaClient.agentEval.listBenchmarks
.query()
.then(setData)
.finally(() => setLoading(false));
}, []);
return <div>...</div>;
};
```
### Mutations in Components
```tsx
// Create — global mutation flag drives form loading
const CreateBenchmarkModal = () => {
const createBenchmark = useEvalStore((s) => s.createBenchmark);
const isCreating = useEvalStore((s) => s.isCreatingBenchmark);
const handleSubmit = async (values) => {
try {
// Optimistic update + refresh happen inside createBenchmark
await createBenchmark(values);
message.success('Created successfully');
onClose();
} catch (error) {
message.error('Failed to create');
}
};
return (
<Form onSubmit={handleSubmit} loading={isCreating}>
...
</Form>
);
};
// Update / delete — per-item loading so only the row being mutated spins
const BenchmarkItem = ({ id }: { id: string }) => {
const updateBenchmark = useEvalStore((s) => s.updateBenchmark);
const deleteBenchmark = useEvalStore((s) => s.deleteBenchmark);
const isLoading = useEvalStore(benchmarkSelectors.isLoadingBenchmarkDetail(id));
const handleUpdate = async (data) => {
await updateBenchmark({ id, ...data });
};
const handleDelete = async () => {
await deleteBenchmark(id);
};
return (
<div>
{isLoading && <Spinner />}
<button onClick={handleUpdate}>Update</button>
<button onClick={handleDelete}>Delete</button>
</div>
);
};
```
**Why two patterns:** create has no id yet, so a single `isCreatingXxx` flag is enough. Update/delete target a specific row, so global flags would freeze unrelated rows — keep per-item state in `loadingXxxIds`.
---
## Need a fuller worked example?
The canonical `Benchmark` example above is the one to copy for a flat list + detail map. If you need to maintain a list **keyed by a parent id** (e.g. `datasetMap[benchmarkId]` because the same shape appears under multiple parents), read [`references/walkthrough.md`](./references/walkthrough.md) — it walks through the full 6 steps (service → reducer → slice → store wiring → selectors → component) for that variant.
---
## Common Patterns
### Pattern 1: Pagination
Cache key array must include every parameter that should trigger a refetch.
```typescript
useFetchTestCases: (params: { datasetId: string; limit: number; offset: number }) =>
useClientDataSWR(
params.datasetId ? [FETCH_TEST_CASES_KEY, params.datasetId, params.limit, params.offset] : null,
() => agentEvalService.listTestCases(params),
{
onSuccess: (data) =>
set({
testCaseList: data.data,
testCaseTotal: data.total,
isLoadingTestCases: false,
}),
},
);
```
### Pattern 2: Dependent Fetching
Both hooks run in parallel — SWR dedupes, no manual sequencing needed.
```tsx
const BenchmarkDetail = () => {
const { benchmarkId } = useParams();
const useFetchBenchmarkDetail = useEvalStore((s) => s.useFetchBenchmarkDetail);
const useFetchDatasets = useEvalStore((s) => s.useFetchDatasets);
useFetchBenchmarkDetail(benchmarkId);
useFetchDatasets(benchmarkId);
return <div>...</div>;
};
```
### Pattern 3: Conditional Fetching
Pass `undefined` to disable the hook entirely.
```tsx
// only fetch when modal is open AND id present
useFetchDatasetDetail(open && datasetId ? datasetId : undefined);
```
### Pattern 4: Cross-domain Refresh
```typescript
deleteBenchmark: async (id) => {
await agentEvalService.deleteBenchmark(id);
await get().refreshBenchmarks();
await get().refreshDatasets(id); // related cache invalidated too
};
```
---
## Migration Guide: useEffect → Store SWR
### Before (❌ Wrong)
```tsx
const TestCaseList = ({ datasetId }: Props) => {
const [data, setData] = useState<any[]>([]);
const [loading, setLoading] = useState(false);
useEffect(() => {
setLoading(true);
lambdaClient.agentEval.listTestCases
.query({ datasetId })
.then((r) => setData(r.data))
.finally(() => setLoading(false));
}, [datasetId]);
return <Table data={data} loading={loading} />;
};
```
### After (✅ Correct)
```typescript
// 1. Add service method
class AgentEvalService {
async listTestCases(params: { datasetId: string }) {
return lambdaClient.agentEval.listTestCases.query(params);
}
}
// 2. Add store slice hook
export const createTestCaseSlice: StateCreator<...> = (set) => ({
useFetchTestCases: (params) =>
useClientDataSWR(
params.datasetId ? [FETCH_TEST_CASES_KEY, params.datasetId] : null,
() => agentEvalService.listTestCases(params),
{
onSuccess: (data) =>
set({ testCaseList: data.data, isLoadingTestCases: false }),
},
),
});
// 3. Component reads from store
const TestCaseList = ({ datasetId }: Props) => {
const useFetchTestCases = useEvalStore((s) => s.useFetchTestCases);
const data = useEvalStore((s) => s.testCaseList);
const loading = useEvalStore((s) => s.isLoadingTestCases);
useFetchTestCases({ datasetId });
return <Table data={data} loading={loading} />;
};
```
---
## Troubleshooting
| Symptom | Check |
| --------------------------- | ------------------------------------------------------------------- |
| Data never loads | Hook called? Key not `null`/`undefined`? Network tab shows request? |
| Stale data after mutation | Did `refreshXxx` run? Cache key matches what the hook uses? |
| Loading state stuck `true` | `onSuccess` writes loading=false? Promise rejected silently? |
| Detail map missing an entry | Reducer dispatch ran? `isEqual` short-circuited on stale data? |
---
## Summary Checklist
When adding new data fetching:
### Step 1: Types & State
See `store-data-structures` for details.
- [ ] Define types in `@lobechat/types`: Detail type + List item type
- [ ] State structure: `xxxList: XxxListItem[]`, `xxxDetailMap: Record<string, Xxx>`, `loadingXxxDetailIds: string[]`
- [ ] Reducer if optimistic updates are needed
### Step 2: Service Layer
- [ ] Create service in `src/services/xxxService.ts`
- [ ] Methods: `listXxx()`, `getXxx(id)`, `createXxx()`, `updateXxx()`, `deleteXxx()`
### Step 3: Store Actions
- [ ] `initialState.ts` with state structure
- [ ] `action.ts` with:
- [ ] `useFetchXxxList()`, `useFetchXxxDetail(id)` — SWR hooks
- [ ] `refreshXxxList()`, `refreshXxxDetail(id)` — cache invalidation
- [ ] CRUD methods calling service
- [ ] `internal_dispatch`, `internal_updateLoading` if using reducer
- [ ] `selectors.ts` (optional but recommended)
- [ ] Integrate slice into main store + initialState
### Step 4: Component Usage
- [ ] Use store hooks (NOT useEffect)
- [ ] List pages: access `xxxList` array
- [ ] Detail pages: access `xxxDetailMap[id]`
- [ ] Use loading states for UI feedback
**Mental model:** Types → Service → Reducer → Slice → Component 🎯
---
## Related Skills
- **`store-data-structures`** — How to structure List and Detail data in stores
- **`zustand`** — General Zustand patterns and best practices
@@ -1,244 +0,0 @@
# Walkthrough: Adding a New Feature End-to-End
This is a worked example of the canonical 6-step recipe applied to a new entity (`Dataset`), showing a variant of the main skill's pattern: **a list keyed by a parent id** (`datasetMap[benchmarkId]`), useful when the same shape appears under different parents.
If you only need the canonical (single-array) pattern, the main `SKILL.md` already shows it for `Benchmark`. Read this file when you need the parent-keyed Map variant, or when you want a checklist-style walkthrough.
## Step 1: Add Service methods
```typescript
class AgentEvalService {
async listDatasets(benchmarkId: string) {
return lambdaClient.agentEval.listDatasets.query({ benchmarkId });
}
async getDataset(id: string) {
return lambdaClient.agentEval.getDataset.query({ id });
}
async createDataset(params: CreateDatasetParams) {
return lambdaClient.agentEval.createDataset.mutate(params);
}
// updateDataset / deleteDataset follow the same shape
}
```
## Step 2: Reducer (optimistic updates)
```typescript
// src/store/eval/slices/dataset/reducer.ts
export type DatasetDispatch =
| { type: 'addDataset'; value: Dataset }
| { type: 'updateDataset'; id: string; value: Partial<Dataset> }
| { type: 'deleteDataset'; id: string };
export const datasetReducer = (state: Dataset[] = [], payload: DatasetDispatch): Dataset[] =>
produce(state, (draft) => {
switch (payload.type) {
case 'addDataset':
draft.unshift(payload.value);
break;
case 'updateDataset': {
const i = draft.findIndex((item) => item.id === payload.id);
if (i !== -1) draft[i] = { ...draft[i], ...payload.value };
break;
}
case 'deleteDataset': {
const i = draft.findIndex((item) => item.id === payload.id);
if (i !== -1) draft.splice(i, 1);
break;
}
}
});
```
## Step 3: Store slice
```typescript
// src/store/eval/slices/dataset/initialState.ts
export interface DatasetData {
currentPage: number;
hasMore: boolean;
isLoading: boolean;
items: Dataset[];
pageSize: number;
total: number;
}
export interface DatasetSliceState {
// Map keyed by benchmarkId — multiple parent contexts share the slice
datasetMap: Record<string, DatasetData>;
// Single item for modal display
datasetDetail: Dataset | null;
isLoadingDatasetDetail: boolean;
loadingDatasetIds: string[];
}
export const datasetInitialState: DatasetSliceState = {
datasetMap: {},
datasetDetail: null,
isLoadingDatasetDetail: false,
loadingDatasetIds: [],
};
```
```typescript
// src/store/eval/slices/dataset/action.ts
const FETCH_DATASETS_KEY = 'FETCH_DATASETS';
const FETCH_DATASET_DETAIL_KEY = 'FETCH_DATASET_DETAIL';
export const createDatasetSlice: StateCreator<EvalStore, any, [], DatasetAction> = (set, get) => ({
// Cache key includes benchmarkId so each parent has its own SWR entry
useFetchDatasets: (benchmarkId) =>
useClientDataSWR(
benchmarkId ? [FETCH_DATASETS_KEY, benchmarkId] : null,
() => agentEvalService.listDatasets(benchmarkId!),
{
onSuccess: (data) => {
set({
datasetMap: {
...get().datasetMap,
[benchmarkId!]: {
currentPage: 1,
hasMore: false,
isLoading: false,
items: data,
pageSize: data.length,
total: data.length,
},
},
});
},
},
),
useFetchDatasetDetail: (id) =>
useClientDataSWR(
id ? [FETCH_DATASET_DETAIL_KEY, id] : null,
() => agentEvalService.getDataset(id!),
{
onSuccess: (data) => set({ datasetDetail: data, isLoadingDatasetDetail: false }),
},
),
refreshDatasets: (benchmarkId) => mutate([FETCH_DATASETS_KEY, benchmarkId]),
refreshDatasetDetail: (id) => mutate([FETCH_DATASET_DETAIL_KEY, id]),
// CREATE with optimistic update — note the temp id pattern
createDataset: async (params) => {
const tmpId = Date.now().toString();
const { benchmarkId } = params;
get().internal_dispatchDataset(
{ type: 'addDataset', value: { ...params, id: tmpId, createdAt: Date.now() } as any },
benchmarkId,
);
get().internal_updateDatasetLoading(tmpId, true);
try {
const result = await agentEvalService.createDataset(params);
await get().refreshDatasets(benchmarkId);
return result;
} finally {
get().internal_updateDatasetLoading(tmpId, false);
}
},
// UPDATE / DELETE follow the same optimistic + refresh pattern as BenchmarkSlice
// (see the main SKILL.md)
// Internal — dispatch reducer scoped to a parent
internal_dispatchDataset: (payload, benchmarkId) => {
const currentData = get().datasetMap[benchmarkId];
const nextItems = datasetReducer(currentData?.items, payload);
// Skip set when nothing changed — avoids unnecessary re-renders
if (isEqual(nextItems, currentData?.items)) return;
set({
datasetMap: {
...get().datasetMap,
[benchmarkId]: {
...currentData,
currentPage: currentData?.currentPage ?? 1,
hasMore: currentData?.hasMore ?? false,
isLoading: false,
items: nextItems,
pageSize: currentData?.pageSize ?? nextItems.length,
total: currentData?.total ?? nextItems.length,
},
},
});
},
internal_updateDatasetLoading: (id, loading) => {
set((state) => ({
loadingDatasetIds: loading
? [...state.loadingDatasetIds, id]
: state.loadingDatasetIds.filter((i) => i !== id),
}));
},
});
```
## Step 4: Wire into the store
```typescript
// src/store/eval/store.ts
export type EvalStore = EvalStoreState & BenchmarkAction & DatasetAction & RunAction;
const createStore: StateCreator<EvalStore, [['zustand/devtools', never]]> = (set, get, store) => ({
...initialState,
...createBenchmarkSlice(set, get, store),
...createDatasetSlice(set, get, store),
...createRunSlice(set, get, store),
});
// src/store/eval/initialState.ts
export const initialState: EvalStoreState = {
...benchmarkInitialState,
...datasetInitialState,
...runInitialState,
};
```
## Step 5: Selectors (optional but recommended)
```typescript
export const datasetSelectors = {
getDatasetData: (benchmarkId: string) => (s: EvalStore) => s.datasetMap[benchmarkId],
getDatasets: (benchmarkId: string) => (s: EvalStore) => s.datasetMap[benchmarkId]?.items ?? [],
isLoadingDataset: (id: string) => (s: EvalStore) => s.loadingDatasetIds.includes(id),
};
```
## Step 6: Use in component
```tsx
// List scoped to a parent
const DatasetList = ({ benchmarkId }: { benchmarkId: string }) => {
const useFetchDatasets = useEvalStore((s) => s.useFetchDatasets);
const datasets = useEvalStore(datasetSelectors.getDatasets(benchmarkId));
const datasetData = useEvalStore(datasetSelectors.getDatasetData(benchmarkId));
useFetchDatasets(benchmarkId);
if (datasetData?.isLoading) return <Loading />;
return (
<div>
<h2>Total: {datasetData?.total ?? 0}</h2>
<List data={datasets} />
</div>
);
};
// Single item for modal — conditional fetching pattern
const DatasetImportModal = ({ open, datasetId }: Props) => {
const useFetchDatasetDetail = useEvalStore((s) => s.useFetchDatasetDetail);
const dataset = useEvalStore((s) => s.datasetDetail);
const isLoading = useEvalStore((s) => s.isLoadingDatasetDetail);
// Only fetch when modal is open AND id present
useFetchDatasetDetail(open && datasetId ? datasetId : undefined);
return <Modal open={open}>{isLoading ? <Loading /> : <div>{dataset?.name}</div>}</Modal>;
};
```
File diff suppressed because it is too large Load Diff
+1 -2
View File
@@ -1,7 +1,6 @@
---
name: db-migrations
description: 'Use for Drizzle migrations: schema/table/column changes, migration generation or regeneration, sequence conflicts after rebase, idempotent SQL review, or migration renames.'
user-invocable: false
description: 'Use when generating or regenerating Drizzle migration files, changing database schema tables or columns, resolving migration sequence conflicts after rebase, reviewing migration SQL for idempotent patterns, or renaming migration files.'
---
# Database Migrations Guide
@@ -1,6 +1,6 @@
---
name: debug-package
description: 'LobeHub debug package and log namespace guide. Use when adding debug() logging, choosing lobe-* namespaces, troubleshooting DEBUG output, localStorage.debug, or log format specifiers.'
name: debug
description: Debug package usage guide. Use when adding debug logging, understanding log namespaces, or implementing debugging features. Triggers on debug logging requests or logging implementation.
user-invocable: false
---
+1 -1
View File
@@ -1,6 +1,6 @@
---
name: desktop
description: Electron desktop development guide IPC handlers, controllers, preload scripts, window/menu management.
description: Electron desktop development guide. Use when implementing desktop features, IPC handlers, controllers, preload scripts, window management, menu configuration, or Electron-specific functionality. Triggers on desktop app development, Electron IPC, or desktop local tools implementation.
disable-model-invocation: true
---
-155
View File
@@ -1,155 +0,0 @@
---
name: docs-changelog
description: 'Write website changelog pages under docs/changelog/*.mdx. Use for EN/ZH product update posts, changelog posts, update-log copy, or docs changelog edits; not GitHub Release notes.'
---
# Docs Changelog Writing Guide
## Scope Boundary (Important)
This skill is only for changelog pages in:
- `docs/changelog/*.mdx`
This skill is **not** for GitHub Releases.\
If the user asks for release PR body / GitHub Release notes, load `../version-release/SKILL.md`.
## Mandatory Companion Skills
For every docs changelog task, you MUST load:
- `../microcopy/SKILL.md`
- `../i18n/SKILL.md` (when EN/ZH pair is involved)
## File and Naming Convention
Use date-based file names:
- English: `docs/changelog/YYYY-MM-DD-topic.mdx`
- Chinese: `docs/changelog/YYYY-MM-DD-topic.zh-CN.mdx`
EN and ZH files must exist as a pair and describe the same release facts.
## Frontmatter Requirements
Each file should include:
```md
---
title: <Title>
description: <1 sentence summary>
tags:
- <Tag 1>
- <Tag 2>
---
```
Rules:
1. `title` should match the H1 title in meaning.
2. `description` should be concise and user-facing.
3. `tags` should be feature-oriented, not internal-team labels.
## Content Structure (Recommended)
Use this shape unless the user requests otherwise:
1. `# <Title>`
2. Opening paragraph (2-4 sentences): user-visible impact
3. 1-3 capability sections (optional `##` headings)
4. `## Improvements and fixes` / `## 体验优化与修复` with concise bullets
Keep heading count low and avoid heading-per-bullet structure.
## Writing Rules
1. Keep all claims factual and tied to actual shipped changes.
2. Explain user value first, implementation second.
3. Prefer natural narrative paragraphs over pure bullet dumps.
4. Avoid marketing exaggeration and vague adjectives.
5. Keep internal terms consistent across EN/ZH files.
6. Keep EN/ZH section order aligned and scope-aligned.
## EN/ZH Synchronization Rules
When generating bilingual changelogs:
1. Keep the same key facts in the same order.
2. Localize naturally; do not do literal sentence-by-sentence translation.
3. If one version has an `Improvements and fixes` bullet list, the other should have equivalent list intent.
4. Do not introduce capabilities in only one language unless explicitly requested.
## Length Guidance
- Small update: 3-5 short paragraphs total
- Medium update: 4-7 short paragraphs + concise fix bullets
- Large update: 6-10 short paragraphs split into 2-4 sections
Do not pad content when changes are limited.
## Authoring Workflow
1. Collect source facts from PRs/commits/issues.
2. Group changes by user workflow (not by internal module path).
3. Draft EN and ZH versions with aligned structure.
4. Verify terminology using `microcopy`/`i18n` guidance.
5. Final pass: remove AI-like filler and tighten sentences.
## Docs Changelog Template (English)
```md
---
title: <Feature title>
description: <One-sentence summary for users>
tags:
- <Tag A>
- <Tag B>
---
# <Feature title>
<Opening paragraph: what changed for users and why it matters.>
<Optional section paragraph for key capability 1.>
<Optional section paragraph for key capability 2.>
## Improvements and fixes
- <Fix or optimization 1>
- <Fix or optimization 2>
```
## Docs Changelog Template (Chinese)
```md
---
title: <功能标题>
description: <一句话说明>
tags:
- <标签 A>
- <标签 B>
---
# <功能标题>
<开场段:这次更新给用户带来的直接变化。>
<可选能力段 1。>
<可选能力段 2。>
## 体验优化与修复
- <优化或修复 1>
- <优化或修复 2>
```
## Quick Checklist
- [ ] File path matches `docs/changelog` naming convention
- [ ] EN and ZH versions both exist and match in facts
- [ ] Opening paragraph explains user-facing outcome
- [ ] Main body is narrative-first, not bullet-only
- [ ] `Improvements and fixes` section is concise and concrete
- [ ] No fabricated claims or unsupported scope
+9 -94
View File
@@ -1,7 +1,6 @@
---
name: drizzle
description: 'LobeHub Drizzle ORM schema and query style. Use for pgTable schemas, indexes, joins, inferred types, db.select/db.query, schema fields, foreign keys, junction tables, or postgres query patterns.'
user-invocable: false
description: Drizzle ORM schema and database guide. Use when working with database schemas (src/database/schemas/*), defining tables, creating migrations, or database model code. Triggers on Drizzle schema definition, database migrations, or ORM usage questions.
---
# Drizzle ORM Schema Style Guide
@@ -9,13 +8,13 @@ user-invocable: false
## Configuration
- Config: `drizzle.config.ts`
- Schemas: `packages/database/src/schemas/`
- Migrations: `packages/database/migrations/`
- Schemas: `src/database/schemas/`
- Migrations: `src/database/migrations/`
- Dialect: `postgresql` with `strict: true`
## Helper Functions
Location: `packages/database/src/schemas/_helpers.ts`
Location: `src/database/schemas/_helpers.ts`
- `timestamptz(name)`: Timestamp with timezone
- `createdAt()`, `updatedAt()`, `accessedAt()`: Standard timestamp columns
@@ -126,7 +125,11 @@ The relational API generates complex lateral joins with `json_build_array` that
```typescript
// ✅ Good
const [result] = await this.db.select().from(agents).where(eq(agents.id, id)).limit(1);
const [result] = await this.db
.select()
.from(agents)
.where(eq(agents.id, id))
.limit(1);
return result;
// ❌ Bad: relational API
@@ -174,94 +177,6 @@ const rows = await this.db
.groupBy(agentEvalDatasets.id);
```
### Raw SQL and Advanced Queries
Prefer Drizzle builders whenever the query can be expressed clearly with `select`,
`insert().select()`, `update().from()`, joins, CTEs, `groupBy`, and typed selected
columns. This keeps table and column references tied to schema definitions, so
schema changes are more likely to surface as TypeScript errors.
Expression-level `sql<T>` is fine inside a Drizzle builder for PostgreSQL features
that do not have a dedicated helper, such as JSON path extraction, casts, aggregate
expressions, `CASE`, `NOW()`, or advisory locks. Row locks are query clauses, not
expressions; use the select builder's `.for('update')` instead of raw
`FOR UPDATE` SQL fragments.
When refactoring raw SQL:
- Preserve the original query shape for latency-sensitive paths. If raw SQL is one
database roundtrip, do not replace it with multiple depth-based queries just to
remove `execute`.
- Use `$with(...)` plus `insert().select()` / `update().from()` for multi-step
single-roundtrip writes when Drizzle can express the data flow.
- Avoid generic `execute<MyRow>(sql...)` as the main safety mechanism. It types the
returned rows, but it does not keep selected columns in sync with schema changes.
- If the only clean implementation is a PostgreSQL feature that Drizzle cannot
express well, keep the raw SQL and tighten it instead: use schema references in
interpolations, explicit user scope, a narrow row interface, and regression tests.
Recursive CTEs are a special case: current Drizzle usage in this repo does not have
a clean `WITH RECURSIVE` builder pattern. Keep recursive CTE raw SQL when replacing
it would add extra database roundtrips or materially worsen performance.
Example: convert an aggregate query when Drizzle can preserve one roundtrip:
```typescript
// ✅ Good: builder owns table and column references; sql<T> stays expression-level.
const rows = await trx
.select({
model: messages.model,
provider: messages.provider,
totalCost: sql<string | null>`sum((${messages.metadata}->'usage'->>'cost')::numeric)`.as(
'totalCost',
),
})
.from(messages)
.where(
and(
eq(messages.topicId, topicId),
eq(messages.userId, userId),
eq(messages.role, 'assistant'),
sql`${messages.metadata} ? 'usage'`,
),
)
.groupBy(messages.provider, messages.model);
```
Example: use the select lock builder for row locks:
```typescript
const [user] = await trx.select().from(users).where(eq(users.id, userId)).for('update');
```
Example: keep a recursive CTE raw when replacing it would add depth-based DB
roundtrips:
```typescript
interface TaskTreeRow {
id: string;
parent_task_id: string | null;
}
// execute<T> is acceptable here only because Drizzle has no clean WITH RECURSIVE
// builder; a builder rewrite would add depth-based roundtrips. Keep schema refs in
// the interpolations and scope every leg to the user.
const { rows } = await db.execute<TaskTreeRow>(sql`
WITH RECURSIVE task_tree AS (
SELECT ${tasks.id}, ${tasks.parentTaskId}
FROM ${tasks}
WHERE ${tasks.id} = ${rootTaskId}
AND ${tasks.createdByUserId} = ${userId}
UNION ALL
SELECT ${tasks.id}, ${tasks.parentTaskId}
FROM ${tasks}
JOIN task_tree ON ${tasks.parentTaskId} = task_tree.id
WHERE ${tasks.createdByUserId} = ${userId}
)
SELECT * FROM task_tree
`);
```
### One-to-Many (Separate Queries)
When you need a parent record with its children, use two queries instead of relational `with:`:
@@ -1,83 +0,0 @@
---
name: heterogeneous-agent
description: 'Implement or debug LobeHub heterogeneous agents. Use for Claude Code/Codex adapters, external CLI agents, event mapping, IPC, persistence, tool-call chains, sessions, traces, or adapter bugs.'
---
# Heterogeneous Agent Development
Use this skill when the bug or feature lives in the external CLI agent pipeline, not the normal server-side agent runtime.
## Use This Skill For
- Adding or changing a driver under `apps/desktop/src/main/modules/heterogeneousAgent/drivers/`
- Editing an adapter under `packages/heterogeneous-agents/src/adapters/`
- Debugging `heteroAgentRawLine` transport, `window.__HETERO_AGENT_TRACE`, or `executeHeterogeneousAgent`
- Fixing Claude Code stream-json bugs such as duplicate partial/full chunks, broken `message.id` boundaries, missing `tool_result`, TodoWrite state drift, or subagent thread routing
- Fixing Codex JSONL bugs such as mixed multi-tool messages, broken turn boundaries, or missing tool-result mapping
- Fixing step-boundary, tool persistence, subagent thread, or resume bugs in Claude Code / Codex flows
- Reproducing multi-tool mixing, orphan tool messages, or stuck tool-result loading
## Pipeline Map
1. CLI raw stdout / JSONL
2. Electron main spawns the CLI and broadcasts `heteroAgentRawLine`
3. Adapter maps raw provider events into `HeterogeneousAgentEvent`
4. `executeHeterogeneousAgent` persists assistant/tool messages and forwards stream events
5. `createGatewayEventHandler` hydrates the UI
6. Only after this path looks correct should you move on to `agent-tracing` or context-engine debugging
## Read These Files First
- `apps/desktop/src/main/controllers/HeterogeneousAgentCtr.ts`
- `apps/desktop/src/main/modules/heterogeneousAgent/drivers/claudeCode.ts`
- `apps/desktop/src/main/modules/heterogeneousAgent/drivers/codex.ts`
- `packages/heterogeneous-agents/src/adapters/claudeCode.ts`
- `packages/heterogeneous-agents/src/adapters/codex.ts`
- `src/store/chat/slices/aiChat/actions/heterogeneousAgentExecutor.ts`
- `src/store/chat/slices/aiChat/actions/__tests__/heterogeneousAgentExecutor.test.ts`
## Default Debug Order
1. Prove whether the raw CLI output is correct before touching UI code.
2. If raw output is correct, compare it with adapter output. In dev, `executeHeterogeneousAgent` exposes `window.__HETERO_AGENT_TRACE`.
3. If adapted events look correct, inspect `persistToolBatch`, `persistToolResult`, step transitions, and subagent routing.
4. Turn the repro into a focused test before fixing.
5. Only after the transport/adapter/executor path looks sound should you debug later-stage message processing.
## Critical Invariants
- One raw tool item must map to one stable `ToolCallPayload.id`.
- A new main-agent step must emit a boundary signal before events are forwarded to the new assistant.
- In Claude Code, multiple assistant events with the same `message.id` are one turn, not multiple turns.
- In Claude Code, `tool_result` lives in `type: 'user'` events, not assistant events.
- In Claude Code partial mode, `message_delta.usage` is authoritative; do not trust echoed usage on every assistant block.
- `persistToolBatch` must pre-register assistant `tools[]` before creating tool messages.
- Every tool message must keep `parentId` equal to the owning assistant and `tool_call_id` equal to the tool id.
- `tool_result` must resolve an existing `toolMsgIdByCallId`.
- Subagent chunks must stay in thread scope and must not be forwarded into the main assistant stream.
- Never clear the global `toolMsgIdByCallId` map at main step boundaries.
## Common Bug Patterns
- Claude Code duplicates text or thinking:
check whether partial deltas and the later full assistant block are both being emitted.
- Claude Code opens too many assistant messages:
check whether the adapter is cutting steps on every assistant event instead of only on `message.id` changes.
- Claude Code tool results never land:
check whether `type: 'user'` `tool_result` blocks are being ignored because the code only inspects assistant events.
- Claude Code TodoWrite cards look stale:
check whether synthesized `pluginState.todos` is being attached at tool-result time.
- Claude Code subagent transcript leaks into the main bubble:
check `parent_tool_use_id` handling and whether subagent chunks are being forwarded to the main gateway handler.
- Multiple Codex tools collapse into one assistant message:
first check whether the adapter emits a usable step boundary such as `newStep` or an equivalent turn-change signal.
- Orphan tool messages:
first check step-transition ordering and whether `persistToolBatch` Phase 1 ran before tool message creation.
- Tool bubble stays loading:
look for `tool_result for unknown toolCallId` and missing `result_msg_id` backfill.
- Subagent tools show up in the main bubble:
check for subagent chunks reaching the main gateway handler.
## References
- For commands, trace capture, invariants, and focused test commands, read [references/debug-workflow.md](./references/debug-workflow.md).
@@ -1,246 +0,0 @@
# Heterogeneous Agent Debug Workflow
## Contents
1. Pipeline map
2. Capture raw CLI traces first
3. Compare raw and adapted events
4. Check step boundaries before persistence
5. Check tool persistence invariants
6. Focused tests
7. Repro-to-fix workflow
## 1. Pipeline Map
```
CLI raw stdout
-> HeterogeneousAgentCtr (Electron main)
-> heteroAgentRawLine broadcast
-> createAdapter(...)
-> executeHeterogeneousAgent(...)
-> persistToolBatch / persistToolResult
-> createGatewayEventHandler(...)
-> UI hydration
```
Start at the leftmost broken layer. Do not jump straight to UI rendering unless raw and adapted events already look correct.
## 2. Capture Raw CLI Traces First
### Codex raw JSONL
Use a read-only prompt and save traces under the repo-local scratch directory `.heerogeneous-tracing/`.
```bash
ts=$(date +%Y%m%d-%H%M%S)
out=".heerogeneous-tracing/codex-${ts}.jsonl"
last=".heerogeneous-tracing/codex-${ts}.last.txt"
cat << 'EOF' | codex exec --json --skip-git-repo-check --sandbox read-only -C "$PWD" -o "$last" - > "$out"
You are being run only to collect a raw Codex JSON event trace.
Do not modify any files.
Use at least 4 separate shell tool invocations, one invocation per command.
Run a short sequence of read-only repo checks and then reply with a one-sentence summary.
EOF
```
What to look for in the JSONL:
- `thread.started`
- `turn.started`
- `item.started` / `item.completed`
- `item.type === 'command_execution'`
- `item.type === 'agent_message'`
- `turn.completed`
If raw Codex already merges tools into one item, the adapter is innocent. If raw Codex emits independent items but UI collapses them, the bug is downstream.
If the repo already contains useful traces under `.heerogeneous-tracing/`, inspect them before reproducing.
### Claude Code raw NDJSON
Mirror the arguments from `apps/desktop/src/main/modules/heterogeneousAgent/drivers/claudeCode.ts`.
- `-p`
- `--input-format stream-json`
- `--output-format stream-json`
- `--verbose`
- `--include-partial-messages`
- `--permission-mode bypassPermissions`
You can capture a local raw trace like this:
```bash
ts=$(date +%Y%m%d-%H%M%S)
out=".heerogeneous-tracing/claude-${ts}.ndjson"
cat << 'EOF' | claude -p \
--input-format stream-json \
--output-format stream-json \
--verbose \
--include-partial-messages \
--permission-mode bypassPermissions \
> "$out"
{"type":"user","message":{"role":"user","content":[{"type":"text","text":"Do a few read-only repo checks, use several tool calls, and then summarize briefly."}]}}
EOF
```
What to look for in Claude Code raw traces:
- `type: 'system', subtype: 'init'`
- `type: 'assistant'` blocks for `thinking`, `tool_use`, and `text`
- `type: 'user'` blocks containing `tool_result`
- `type: 'stream_event'` with `message_start`, `content_block_delta`, and `message_delta`
- `type: 'result'`
- `type: 'rate_limit_event'`
Important Claude Code semantics:
- Each content block often arrives as its own assistant event.
- Multiple assistant events can share the same `message.id`; that is still one turn.
- `message.id` change is the main-step boundary.
- Partial deltas arrive before the later full assistant block.
- `message_delta.usage` is the authoritative per-turn usage.
- Subagent events are tagged with `parent_tool_use_id`.
If the repo already contains useful references, inspect these first:
- `.heerogeneous-tracing/cc-monitor-real-trace.jsonl`
- `.heerogeneous-tracing/cc-stream-chain-reference.md`
If you only need boundary semantics or tool persistence behavior, prefer existing adapter tests under:
- `packages/heterogeneous-agents/src/adapters/claudeCode.test.ts`
- `packages/heterogeneous-agents/src/adapters/claudeCode.e2e.test.ts`
## 3. Compare Raw And Adapted Events
In dev builds, `executeHeterogeneousAgent` stores raw lines plus adapted events on:
- `window.__HETERO_AGENT_TRACE`
Use that trace to compare:
- raw `item.started` / `item.completed`
- adapted `stream_chunk { chunkType: 'tools_calling' }`
- adapted `tool_result`
- adapted `tool_end`
For Codex, the usual mapping is:
- raw `item.started(command_execution)` -> `tools_calling` + `tool_start`
- raw `item.completed(command_execution)` -> `tool_result` + `tool_end`
- raw `item.completed(agent_message)` -> `stream_chunk(text)`
If the raw trace is right but adapted events are wrong, fix the adapter before touching persistence.
## 4. Check Step Boundaries Before Persistence
This is the first thing to verify for "mixed tools in one assistant" bugs.
### Claude Code
Claude Code step boundaries are keyed off assistant `message.id` changes. The adapter should emit:
- `stream_end`
- `stream_start { newStep: true }`
Also verify these Claude-specific invariants:
- the first assistant after init does not open a new step
- repeated assistant events with the same `message.id` do not open a new step
- partial `content_block_delta` text/thinking does not get duplicated by the later full assistant event
- `tool_result` from `type: 'user'` updates the matching tool row
- `parent_tool_use_id` creates thread-scoped subagent chunks instead of main-stream chunks
- TodoWrite `tool_use.input` is converted into synthesized `pluginState.todos` on `tool_result`
Good references:
- `packages/heterogeneous-agents/src/adapters/claudeCode.ts`
- `packages/heterogeneous-agents/src/adapters/claudeCode.test.ts`
### Codex
Codex raw traces usually provide turn-level boundaries through:
- `turn.started`
- `turn.completed`
The executor only cuts a new assistant message when it receives a step-boundary signal it understands. If the adapter emits `stream_start` without `newStep`, multiple Codex tools and text chunks can accumulate under the same assistant longer than intended.
Relevant files:
- `packages/heterogeneous-agents/src/adapters/codex.ts`
- `src/store/chat/slices/aiChat/actions/heterogeneousAgentExecutor.ts`
## 5. Check Tool Persistence Invariants
Read `persistToolBatch` and `persistToolResult` before changing UI code.
### `persistToolBatch`
The expected order is:
1. Pre-register assistant `tools[]`
2. Create `role: 'tool'` messages
3. Backfill `result_msg_id` onto assistant `tools[]`
If tool rows are created before assistant `tools[]` are registered, orphan tool messages are likely.
### `persistToolResult`
`tool_result` must resolve the tool row through `toolMsgIdByCallId`.
Warning signs:
- `tool_result for unknown toolCallId`
- tool rows with empty content forever
- missing `result_msg_id`
For Claude Code, remember that tool results originate from raw `type: 'user'` events.
### Main vs subagent scope
- Main-agent tool state is per-step.
- `toolMsgIdByCallId` is global across main and subagent scopes.
- Subagent chunks must not be forwarded into the main gateway handler.
If subagent events leak to the main handler, the main bubble can inherit the wrong `tools[]` and content.
## 6. Focused Tests
Run the smallest useful test set first.
```bash
bunx vitest run --silent='passed-only' 'packages/heterogeneous-agents/src/adapters/codex.test.ts'
bunx vitest run --silent='passed-only' 'packages/heterogeneous-agents/src/adapters/claudeCode.test.ts'
bunx vitest run --silent='passed-only' 'src/store/chat/slices/aiChat/actions/__tests__/heterogeneousAgentExecutor.test.ts'
```
Especially useful places:
- `packages/heterogeneous-agents/src/adapters/codex.test.ts`
- `packages/heterogeneous-agents/src/adapters/claudeCode.test.ts`
- `src/store/chat/slices/aiChat/actions/__tests__/heterogeneousAgentExecutor.test.ts`
Claude Code-specific assertions worth adding when fixing bugs:
- same `message.id` does not emit `newStep`
- changed `message.id` does emit `stream_end` plus `stream_start { newStep: true }`
- partial text/thinking is emitted once
- `tool_result` from `user` events reaches the right tool row
- subagent chunks carry `subagent.parentToolCallId`
- TodoWrite result synthesizes `pluginState.todos`
When the bug comes from a real trace, distill it into the closest existing test file instead of relying on manual UI-only repros.
## 7. Repro-To-Fix Workflow
1. Capture a raw trace and save it under `.heerogeneous-tracing/`.
2. Confirm whether the bug appears in raw events, adapted events, or persistence.
3. Add or update the narrowest failing test near the broken layer.
4. Fix the smallest layer that can explain the symptom.
5. Re-run focused tests.
6. Only then do an Electron smoke test with the `local-testing` skill if UI confirmation is still needed.
Do not start with a broad Electron repro if a raw trace or adapter test can prove the fault zone faster.
+1 -2
View File
@@ -1,7 +1,6 @@
---
name: hotkey
description: 'Add or edit LobeHub keyboard shortcuts. Use for HotkeyEnum, HOTKEYS_REGISTRATION, combineKeys, useHotkeyById, tooltip hotkeys, shortcut scope, conflicts, or Cmd/Ctrl key combos.'
user-invocable: false
description: Guide for adding keyboard shortcuts. Use when implementing new hotkeys, registering shortcuts, or working with keyboard interactions. Triggers on hotkey implementation or keyboard shortcut tasks.
---
# Adding Keyboard Shortcuts Guide
+2 -3
View File
@@ -1,12 +1,11 @@
---
name: i18n
description: 'LobeHub i18n with react-i18next. Use for user-facing strings, locale keys, namespaces, useTranslation, t(), interpolation, zh-CN/en-US previews, hardcoded UI copy, or pnpm i18n.'
user-invocable: false
description: Internationalization guide using react-i18next. Use when adding translations, creating i18n keys, or working with localized text in React components (.tsx files). Triggers on translation tasks, locale management, or i18n implementation.
---
# LobeHub Internationalization Guide
- Default language: English (en-US)
- Default language: Chinese (zh-CN)
- Framework: react-i18next
- **Only edit files in `src/locales/default/`** - Never edit JSON files in `locales/`
- Run `pnpm i18n` to generate translations (or manually translate zh-CN/en-US for dev preview)
+30 -92
View File
@@ -1,106 +1,38 @@
---
name: linear
description: 'Linear issue management. Use for LOBE-xxx issues, Linear links, PRs referencing Linear, retrieving issues, updating status, completion comments, or sub-issue trees.'
user-invocable: false
description: "Linear issue management. MUST USE when: (1) user mentions LOBE-xxx issue IDs (e.g. LOBE-4540), (2) user says 'linear', 'linear issue', 'link linear', (3) creating PRs that reference Linear issues. Provides workflows for retrieving issues, updating status, and adding comments."
---
# Linear Issue Management
Before using Linear workflows, search for `linear` MCP tools. If not found, treat as not installed.
## PR Creation with Linear Issues
## ⚠️ CRITICAL: PR Creation with Linear Issues
A PR that fixes a Linear issue has **two separate jobs to do**, and both matter:
**When creating a PR that references Linear issues (LOBE-xxx), you MUST:**
1. **`Fixes LOBE-xxx` in the PR body** — Linear watches GitHub for these magic keywords and auto-links the PR and auto-closes the issue on merge. This is the machine-readable side.
2. **A completion comment on the Linear issue** — gives the reviewer/PM/teammate landing in Linear a human-readable summary of what changed and why, without forcing them to click through to GitHub and read a diff.
1. Create the PR with magic keywords (`Fixes LOBE-xxx`)
2. **IMMEDIATELY after PR creation**, add completion comments to ALL referenced Linear issues
3. Do NOT consider the task complete until Linear comments are added
If you only do step 1, Linear watchers (often non-engineers) hit the issue and see no context. So pair PR creation with the Linear comment as part of the same task — finish both before considering the work done.
This is NON-NEGOTIABLE. Skipping Linear comments is a workflow violation.
## Workflow
1. **Retrieve issue details** before starting: `mcp__linear-server__get_issue`
2. **Read images** issue descriptions often contain screenshots with critical context (mockups, error states, before/after). Use `mcp__linear-server__extract_images` so you actually see them; reading raw markdown alone misses what the reporter was looking at.
3. **Check for sub-issues**: `mcp__linear-server__list_issues` with `parentId` filter
4. **Mark as In Progress** at the moment you start planning or implementing — this signals to teammates the issue is owned, so they don't double-pick it up.
2. **Read images**: If the issue description contains images, MUST use `mcp__linear-server__extract_images` to read image content for full context
3. **Check for sub-issues**: Use `mcp__linear-server__list_issues` with `parentId` filter
4. **Mark as In Progress**: When starting to plan or implement an issue, immediately update status to **"In Progress"** via `mcp__linear-server__update_issue`
5. **Update issue status** when completing: `mcp__linear-server__update_issue`
6. **Add completion comment** (see [format below](#completion-comment-format))
6. **Add completion comment** (REQUIRED): `mcp__linear-server__create_comment`
## Creating Issues
When creating issues with `mcp__linear-server__create_issue`, add the `claude code` label. Reason: the label is how the team filters/audits AI-generated issues; without it those issues vanish into the general backlog and the team loses visibility into AI contribution patterns.
## Language
Match the issue language to the conversation that produced it — if you're discussing in 中文,write the issue in 中文;if discussing in English, write it in English. Reason: the issue is a continuation of the conversation, and forcing a language switch creates translation friction for the collaborator who started the thread.
Specifics:
- 中文 conversation → 中文 body; technical terms (file paths, identifiers, library names, commands, error messages) stay in English.
- English conversation → English body.
- Code blocks, file paths, and quoted strings always stay in their original form regardless of surrounding language.
- This applies equally to **updates** — when editing an existing issue (description **and titles**), preserve the language of the conversation that triggered the edit; don't switch the issue language mid-refactor.
## Creating Sub-issue Trees
When breaking a parent issue into a tree of sub-issues (e.g., task decomposition for LOBE-xxx), follow these rules — they work around real limitations of the Linear MCP tools.
### 1. Prefix titles with an ordering index
The Linear Sub-issues panel orders children by `sortOrder`, which **defaults to newest-first** (most recently created appears on top). Neither parallel nor serial creation produces the intended top-to-bottom reading order, and the MCP `save_issue` tool does **not expose a `sortOrder` parameter** — you can't set order at create time.
Workaround: encode execution order in the title itself:
```plaintext
[1] [db] add schema fields
[2] [db] new table + repository
[3] [service] business logic layer
[4] [api] REST endpoints
[4.1] [sdk] client SDK wrapper
[4.1.1] [app] consumer integration
[4.1.2] [app] UI surface
[4.2] [ui] dashboard page
```
Even when the panel shuffles, the reader can mentally reconstruct the dependency graph at a glance. Dotted numbering `[n.m.k]` should mirror the parent-child nesting so the index and the tree agree.
### 2. Nest sub-issues by logical parent-child, not flat under the root
Linear supports **unlimited sub-issue depth**. A flat list of 8+ siblings under one root is hard to scan. Group by main-subordinate logic:
- Core service → its SDK → SDK consumers
- Don't create a sibling when a child is more accurate
Use `parentId: "LOBE-xxxx"` at creation (or `save_issue` to move). Moving an issue's parent does not disturb its `blockedBy` relations.
### 3. Sub-issue creation order is dictated by `blockedBy`
`blockedBy` requires the blocker to exist first (you need its LOBE-id). So:
1. **Topologically sort** the DAG — leaves (no deps) first, roots last
2. Create issues with zero deps in the first wave
3. Create dependent issues only after collecting the blocker IDs from prior responses
4. `blockedBy` is **append-only**; passing it again does not overwrite — safe to re-run
### 4. Don't waste rounds trying to parallelize
MCP tool calls in a single message look parallel but execute sequentially on the server, and you still need blocker IDs from earlier responses. Just issue calls in dependency order; optimizing for parallelism gains nothing here.
### 5. Keep each sub-issue description self-contained
Each sub-issue should state:
- Goal (12 lines)
- Key files to touch
- Concrete changes / acceptance criteria
- Dependencies (link to blocker issues by `LOBE-xxxx`)
- Validation steps
The implementer may open only the sub-issue, not the parent — don't rely on context that lives only in the parent description.
When creating issues with `mcp__linear-server__create_issue`, **MUST add the `claude code` label**.
## Completion Comment Format
Each completed issue gets a comment summarizing the work, so reviewers and future readers don't have to reconstruct it from the PR diff:
Every completed issue MUST have a comment summarizing work done:
```markdown
## Changes Summary
@@ -116,28 +48,34 @@ Each completed issue gets a comment summarizing the work, so reviewers and futur
- ...
```
This gives team visibility, code-review context, and a paper trail for future reference.
This is critical for:
## PR Association
- Team visibility
- Code review context
- Future reference
When creating PRs for Linear issues, include magic keywords in the PR body:
## PR Association (REQUIRED)
When creating PRs for Linear issues, include magic keywords in PR body:
- `Fixes LOBE-123`
- `Closes LOBE-123`
- `Resolves LOBE-123`
These trigger Linear's auto-link + auto-close on merge.
## Per-Issue Completion Rule
When working on multiple issues, close out **each one before starting the next** — don't batch all the Linear updates to the end. Batching is where comments get forgotten and issues stay stuck in "In Progress" days after the PR shipped.
For each issue:
When working on multiple issues, update EACH issue IMMEDIATELY after completing it:
1. Complete implementation
2. Run `bun run type-check`
3. Run related tests
4. Create PR if needed
5. Update status to **"In Review"** (not "Done" — "Done" is for after the PR merges)
6. Add the completion comment
7. Move to the next issue
5. Update status to **"In Review"** (NOT "Done")
6. **Add completion comment immediately**
7. Move to next issue
**Note:** Status → "In Review" when PR created. "Done" only after PR merged.
**❌ Wrong:** Complete all → Create PR → Forget Linear comments
**✅ Correct:** Complete → Create PR → Add Linear comments → Task done
+653 -89
View File
@@ -173,10 +173,6 @@ agent-browser state save auth.json
agent-browser state load auth.json
```
### LobeHub dev server — inject better-auth cookie
`agent-browser --headed` on macOS can create an off-screen Chromium window, blocking manual login. For a local LobeHub dev server (e.g. `localhost:3011`), copy the `better-auth.session_token` cookie out of a **Network request** in the user's own Chrome DevTools and load it via `state load`. See [references/agent-browser-login.md](./references/agent-browser-login.md) for the full recipe.
## Semantic Locators (Alternative to Refs)
```bash
@@ -383,74 +379,639 @@ agent-browser --auto-connect snapshot -i
# Part 2: osascript (Native macOS App Bot Testing)
Use AppleScript via `osascript` to control native macOS desktop apps for bot testing. Works with any app that supports macOS Accessibility, no CDP or Chromium needed.
Use AppleScript via `osascript` to control native macOS desktop apps for bot testing. This works with any app that supports macOS Accessibility, without needing CDP or Chromium.
The pattern is the same for every platform:
## Core osascript Patterns
1. **Activate** the app (`tell application "X" to activate`)
2. **Navigate** to a channel/chat (Quick Switcher `Cmd+K` or Search `Cmd+F`)
3. **Send** a message (clipboard paste `Cmd+V` + Enter)
4. **Wait** for the bot response
5. **Screenshot** for verification (`screencapture` + `Read` tool)
### Activate an App
## Per-Platform References
```bash
osascript -e 'tell application "Discord" to activate'
```
Pick the file for your target platform — each contains activation, navigation, send-message, and verification snippets specific to that app:
### Type Text
Each channel has its own folder under `bot/<channel>/` containing an `index.md`
(activation, navigation, send-message, and verification snippets specific to
that app) and its test script:
```bash
# Type character by character (reliable, but slow for long text)
osascript -e 'tell application "System Events" to keystroke "Hello world"'
| Platform | Reference | Quick switcher |
| ------------- | ------------------------------------------------ | -------------- |
| Discord | [bot/discord/index.md](./bot/discord/index.md) | `Cmd+K` |
| Slack | [bot/slack/index.md](./bot/slack/index.md) | `Cmd+K` |
| Telegram | [bot/telegram/index.md](./bot/telegram/index.md) | `Cmd+F` |
| WeChat / 微信 | [bot/wechat/index.md](./bot/wechat/index.md) | `Cmd+F` |
| Lark / 飞书 | [bot/lark/index.md](./bot/lark/index.md) | `Cmd+K` |
| QQ | [bot/qq/index.md](./bot/qq/index.md) | `Cmd+F` |
# Press Enter
osascript -e 'tell application "System Events" to key code 36'
For **shared osascript patterns** (activate, type, paste, screenshot, read accessibility, common workflow template, gotchas), see [bot/osascript-common.md](./bot/osascript-common.md). Read this first if you're new to osascript automation.
# Press Tab
osascript -e 'tell application "System Events" to key code 48'
## Bridge-based channels (no native app)
# Press Escape
osascript -e 'tell application "System Events" to key code 53'
```
Some channels have no native app to drive with osascript — they connect through
a local bridge inside the Desktop app. These are tested with agent-browser
(IPC + UI) plus the bridge's own HTTP/REST endpoints, not osascript:
### Paste from Clipboard (fast, for long text)
| Channel | Reference | What it drives |
| -------- | ------------------------------------------------ | -------------------------------------------------------- |
| iMessage | [bot/imessage/index.md](./bot/imessage/index.md) | `imessageBridge.*` IPC + local bridge + BlueBubbles REST |
```bash
# Set clipboard and paste — much faster than keystroke for long messages
osascript -e 'set the clipboard to "Your long message here"'
osascript -e 'tell application "System Events" to keystroke "v" using command down'
```
For iMessage there is a one-shot regression script — see `test-imessage-bridge.sh` below.
Or in one shot:
```bash
osascript -e '
set the clipboard to "Your long message here"
tell application "System Events" to keystroke "v" using command down
'
```
### Keyboard Shortcuts
```bash
# Cmd+K (quick switcher in Discord/Slack)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
# Cmd+F (search)
osascript -e 'tell application "System Events" to keystroke "f" using command down'
# Cmd+N (new message/chat)
osascript -e 'tell application "System Events" to keystroke "n" using command down'
# Cmd+Shift+K (example: multi-modifier)
osascript -e 'tell application "System Events" to keystroke "k" using {command down, shift down}'
```
### Click at Position
```bash
# Click at absolute screen coordinates
osascript -e '
tell application "System Events"
click at {500, 300}
end tell
'
```
### Get Window Info
```bash
# Get window position and size
osascript -e '
tell application "System Events"
tell process "Discord"
get {position, size} of window 1
end tell
end tell
'
```
### Screenshot
```bash
# Full screen
screencapture /tmp/screenshot.png
# Interactive region select
screencapture -i /tmp/screenshot.png
# Specific window (by window ID from CGWindowList)
screencapture -l < WINDOW_ID > /tmp/screenshot.png
```
To get window ID for a specific app:
```bash
osascript -e '
tell application "System Events"
tell process "Discord"
get id of window 1
end tell
end tell
'
```
### Read Accessibility Elements
```bash
# Get all UI elements of the frontmost window (can be slow/large)
osascript -e '
tell application "System Events"
tell process "Discord"
entire contents of window 1
end tell
end tell
'
# Get a specific element's value
osascript -e '
tell application "System Events"
tell process "Discord"
get value of text field 1 of window 1
end tell
end tell
'
```
> **Warning:** `entire contents` can be extremely slow on complex UIs. Prefer screenshots + `Read` tool for visual verification.
### Read Screen Text via Clipboard
For reading the latest message or response from an app:
```bash
# Select all text in the focused area and copy
osascript -e '
tell application "System Events"
keystroke "a" using command down
keystroke "c" using command down
end tell
'
sleep 0.5
# Read clipboard
pbpaste
```
---
## Client: Discord
**App name:** `Discord` | **Process name:** `Discord`
### Activate & Navigate
```bash
# Activate Discord
osascript -e 'tell application "Discord" to activate'
sleep 1
# Open Quick Switcher (Cmd+K) to navigate to a channel
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e 'tell application "System Events" to keystroke "bot-testing"'
sleep 1
osascript -e 'tell application "System Events" to key code 36' # Enter
sleep 2
```
### Send Message to Bot
```bash
# The message input is focused after navigating to a channel
# Type a message
osascript -e 'tell application "System Events" to keystroke "/hello"'
sleep 0.5
osascript -e 'tell application "System Events" to key code 36' # Enter
```
### Send Long Message (via clipboard)
```bash
osascript -e '
tell application "Discord" to activate
delay 0.5
set the clipboard to "Write a 3000 word essay about space exploration"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36 -- Enter
end tell
'
```
### Verify Bot Response
```bash
# Wait for bot to respond, then screenshot
sleep 10
screencapture /tmp/discord-bot-response.png
# Read with the Read tool for visual verification
```
### Full Bot Test Example
```bash
#!/usr/bin/env bash
# test-discord-bot.sh — Send message and verify bot response
# 1. Activate Discord and navigate to channel
osascript -e '
tell application "Discord" to activate
delay 1
-- Quick Switcher
tell application "System Events" to keystroke "k" using command down
delay 0.5
tell application "System Events" to keystroke "bot-testing"
delay 1
tell application "System Events" to key code 36
delay 2
'
# 2. Send test message
osascript -e '
set the clipboard to "!ping"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
# 3. Wait for response and capture
sleep 5
screencapture /tmp/discord-test-result.png
echo "Screenshot saved to /tmp/discord-test-result.png"
```
---
## Client: Slack
**App name:** `Slack` | **Process name:** `Slack`
### Activate & Navigate
```bash
# Activate Slack
osascript -e 'tell application "Slack" to activate'
sleep 1
# Quick Switcher (Cmd+K)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e 'tell application "System Events" to keystroke "bot-testing"'
sleep 1
osascript -e 'tell application "System Events" to key code 36' # Enter
sleep 2
```
### Send Message to Bot
```bash
# Direct message input (focused after channel nav)
osascript -e 'tell application "System Events" to keystroke "@mybot hello"'
sleep 0.3
osascript -e 'tell application "System Events" to key code 36'
```
### Send Long Message
```bash
osascript -e '
tell application "Slack" to activate
delay 0.5
set the clipboard to "A long test message for the bot..."
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
```
### Slash Command Test
```bash
osascript -e '
tell application "Slack" to activate
delay 0.5
tell application "System Events"
keystroke "/ask What is the meaning of life?"
delay 0.5
key code 36
end tell
'
```
### Verify Response
```bash
sleep 10
screencapture /tmp/slack-bot-response.png
```
---
## Client: Telegram
**App name:** `Telegram` | **Process name:** `Telegram`
### Activate & Navigate
```bash
# Activate Telegram
osascript -e 'tell application "Telegram" to activate'
sleep 1
# Search for a bot (Cmd+F or click search)
osascript -e '
tell application "System Events"
keystroke "f" using command down
delay 0.5
keystroke "MyTestBot"
delay 1
key code 36 -- Enter to select
end tell
'
sleep 2
```
### Send Message to Bot
```bash
# After navigating to bot chat, input is focused
osascript -e '
tell application "System Events"
keystroke "/start"
delay 0.3
key code 36
end tell
'
```
### Send Long Message
```bash
osascript -e '
tell application "Telegram" to activate
delay 0.5
set the clipboard to "Tell me about quantum computing in detail"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
```
### Verify Response
```bash
sleep 10
screencapture /tmp/telegram-bot-response.png
```
### Telegram Bot API (programmatic alternative)
For sending messages directly to the bot's chat without UI:
```bash
# Send message as the bot (for testing webhooks/responses)
curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
-d "chat_id=$CHAT_ID&text=test message"
# Get recent updates
curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/getUpdates?limit=5" | jq .
```
---
## Client: WeChat / 微信
**App name:** `微信` or `WeChat` | **Process name:** `WeChat`
### Activate & Navigate
```bash
# Activate WeChat
osascript -e 'tell application "微信" to activate'
sleep 1
# Search for a contact/bot (Cmd+F)
osascript -e '
tell application "System Events"
keystroke "f" using command down
delay 0.5
keystroke "TestBot"
delay 1
key code 36 -- Enter to select
end tell
'
sleep 2
```
### Send Message
```bash
# After navigating to a chat, the input is focused
osascript -e '
tell application "System Events"
keystroke "Hello bot!"
delay 0.3
key code 36
end tell
'
```
### Send Long Message (clipboard)
```bash
osascript -e '
tell application "微信" to activate
delay 0.5
set the clipboard to "Please help me with this task..."
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
```
### Verify Response
```bash
sleep 10
screencapture /tmp/wechat-bot-response.png
```
### WeChat-Specific Notes
- WeChat macOS app name can be `微信` or `WeChat` depending on system language. Try both:
```bash
osascript -e 'tell application "微信" to activate' 2> /dev/null \
|| osascript -e 'tell application "WeChat" to activate'
```
- WeChat uses **Enter** to send (not Cmd+Enter by default, but configurable)
- For multi-line messages without sending, use **Shift+Enter**:
```bash
osascript -e 'tell application "System Events" to key code 36 using shift down'
```
---
## Client: Lark / 飞书
**App name:** `Lark` or `飞书` | **Process name:** `Lark` or `飞书`
### Activate & Navigate
```bash
# Activate Lark (auto-detects Lark or 飞书)
osascript -e 'tell application "Lark" to activate' 2> /dev/null \
|| osascript -e 'tell application "飞书" to activate'
sleep 1
# Quick Switcher / Search (Cmd+K)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e '
set the clipboard to "bot-testing"
tell application "System Events"
keystroke "v" using command down
delay 1.5
key code 36 -- Enter
end tell
'
sleep 2
```
### Send Message to Bot
```bash
osascript -e '
set the clipboard to "@MyBot help me with this task"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36 -- Enter
end tell
'
```
### Verify Response
```bash
sleep 10
screencapture /tmp/lark-bot-response.png
```
### Lark-Specific Notes
- App name varies: `Lark` (international) vs `飞书` (China mainland) — the script auto-detects
- Uses `Cmd+K` for quick search (same as Discord/Slack)
- Enter sends message by default
---
## Client: QQ
**App name:** `QQ` | **Process name:** `QQ`
### Activate & Navigate
```bash
osascript -e 'tell application "QQ" to activate'
sleep 1
# Search for contact/group (Cmd+F)
osascript -e '
tell application "System Events"
keystroke "f" using command down
delay 0.8
end tell
'
osascript -e '
set the clipboard to "bot-testing"
tell application "System Events"
keystroke "v" using command down
delay 1.5
key code 36 -- Enter
end tell
'
sleep 2
```
### Send Message to Bot
```bash
osascript -e '
set the clipboard to "Hello bot!"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36 -- Enter
end tell
'
```
### Verify Response
```bash
sleep 10
screencapture /tmp/qq-bot-response.png
```
### QQ-Specific Notes
- Enter sends message by default; Shift+Enter for newlines
- Uses `Cmd+F` for search
- Always use clipboard paste for CJK characters
---
## Common Bot Testing Workflow (osascript)
Regardless of platform, the pattern is:
```bash
APP_NAME="Discord" # or "Slack", "Telegram", "微信"
CHANNEL="bot-testing"
MESSAGE="Hello bot!"
WAIT_SECONDS=10
# 1. Activate
osascript -e "tell application \"$APP_NAME\" to activate"
sleep 1
# 2. Navigate to channel/chat (via Quick Switcher or Search)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e "tell application \"System Events\" to keystroke \"$CHANNEL\""
sleep 1
osascript -e 'tell application "System Events" to key code 36'
sleep 2
# 3. Send message
osascript -e "set the clipboard to \"$MESSAGE\""
osascript -e '
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
# 4. Wait for bot response
sleep "$WAIT_SECONDS"
# 5. Screenshot for verification
screencapture /tmp/"${APP_NAME,,}"-bot-test.png
echo "Result saved to /tmp/${APP_NAME,,}-bot-test.png"
```
### Tips
- **Use clipboard paste** (`Cmd+V`) for messages containing special characters or long text — `keystroke` can mangle non-ASCII
- **Add `delay`** between actions — apps need time to process UI events
- **Screenshot for verification** — use `screencapture` + `Read` tool for visual checks
- **Use a dedicated test channel/chat** — avoid polluting real conversations
- **Check app name** — some apps have different names in different locales (e.g., `微信` vs `WeChat`)
- **Accessibility permissions required** — System Events automation requires granting Accessibility access in System Preferences > Privacy & Security > Accessibility
---
# Scripts
**App / recording scripts** in `.agents/skills/local-testing/scripts/`:
Ready-to-use scripts in `.agents/skills/local-testing/scripts/`:
| Script | Usage |
| ------------------------- | --------------------------------------------------- |
| `electron-dev.sh` | Manage Electron dev env (start/stop/status/restart) |
| `capture-app-window.sh` | Capture screenshot of a specific app window |
| `record-electron-demo.sh` | Record Electron app demo with ffmpeg |
| `record-app-screen.sh` | Record app screen (video + screenshots, start/stop) |
**Bot scripts** live under `.agents/skills/local-testing/bot/`, one folder per
channel (alongside that channel's `index.md`). The shared
`capture-app-window.sh` sits at the `bot/` root:
| Script | Usage |
| ---------------------------------- | ------------------------------------------------------------------- |
| `capture-app-window.sh` | Capture screenshot of a specific app window (used by bot tests) |
| `discord/test-discord-bot.sh` | Send message to Discord bot via osascript |
| `slack/test-slack-bot.sh` | Send message to Slack bot via osascript |
| `telegram/test-telegram-bot.sh` | Send message to Telegram bot via osascript |
| `wechat/test-wechat-bot.sh` | Send message to WeChat bot via osascript |
| `lark/test-lark-bot.sh` | Send message to Lark / 飞书 bot via osascript |
| `qq/test-qq-bot.sh` | Send message to QQ bot via osascript |
| `imessage/test-imessage-bridge.sh` | Regression-test the iMessage BlueBubbles bridge (IPC + HTTP) |
| `imessage/send-imessage-test.sh` | Send one real iMessage (desktop → BB → iMessage) and verify it sent |
| `test-discord-bot.sh` | Send message to Discord bot via osascript |
| `test-slack-bot.sh` | Send message to Slack bot via osascript |
| `test-telegram-bot.sh` | Send message to Telegram bot via osascript |
| `test-wechat-bot.sh` | Send message to WeChat bot via osascript |
| `test-lark-bot.sh` | Send message to Lark / 飞书 bot via osascript |
| `test-qq-bot.sh` | Send message to QQ bot via osascript |
### Window Screenshot Utility
@@ -458,9 +1019,9 @@ channel (alongside that channel's `index.md`). The shared
```bash
# Standalone usage
./.agents/skills/local-testing/bot/capture-app-window.sh "Discord" /tmp/discord.png
./.agents/skills/local-testing/bot/capture-app-window.sh "Slack" /tmp/slack.png
./.agents/skills/local-testing/bot/capture-app-window.sh "WeChat" /tmp/wechat.png
./.agents/skills/local-testing/scripts/capture-app-window.sh "Discord" /tmp/discord.png
./.agents/skills/local-testing/scripts/capture-app-window.sh "Slack" /tmp/slack.png
./.agents/skills/local-testing/scripts/capture-app-window.sh "WeChat" /tmp/wechat.png
```
All bot test scripts use this utility automatically for their screenshots.
@@ -477,62 +1038,55 @@ Examples:
```bash
# Discord — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
./.agents/skills/local-testing/scripts/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/scripts/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
# Slack — test a bot in #bot-testing channel
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
./.agents/skills/local-testing/scripts/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/scripts/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
# Telegram — test a bot by username
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
./.agents/skills/local-testing/scripts/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/scripts/test-telegram-bot.sh "GPTBot" "Hello" 60
# WeChat — test a bot or send to a contact
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
./.agents/skills/local-testing/scripts/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/scripts/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
# Lark/飞书 — test a bot in a group chat
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
./.agents/skills/local-testing/scripts/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/scripts/test-lark-bot.sh "bot-testing" "Help me with this" 30
# QQ — test a bot in a group or direct chat
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
./.agents/skills/local-testing/scripts/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/scripts/test-qq-bot.sh "MyBot" "/help" 10
```
Each script: activates the app, navigates to the channel/contact, pastes the message via clipboard, sends, waits, and takes a screenshot. Use the `Read` tool on the screenshot for visual verification.
### iMessage bridge regression script
`test-imessage-bridge.sh` does **not** follow the osascript bot interface — it
drives the Desktop bridge's IPC + HTTP layers and asserts the result, then
self-cleans. Needs BlueBubbles running and Electron up with CDP.
```bash
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
# defaults: bb_url=http://127.0.0.1:1234 cdp_port=9222 — exit 0 = all green
```
It guards the connect/configure flow (testConfig happy + reject paths, first-time
`upsertConfig` save, bridge running + webhook registered, local-server secret
enforcement). See [bot/imessage/index.md](./bot/imessage/index.md)
for the full manual UI flow and known bugs.
---
# Screen Recording
Record automated demos using `record-app-screen.sh` (start/stop lifecycle, CDP screenshots + ffmpeg assembly). See [references/record-app-screen.md](references/record-app-screen.md) for full documentation.
Record automated demos by combining `ffmpeg` screen capture with `agent-browser` automation. The script `.agents/skills/local-testing/scripts/record-electron-demo.sh` handles the full lifecycle for Electron.
### Usage
```bash
./.agents/skills/local-testing/scripts/electron-dev.sh start
./.agents/skills/local-testing/scripts/record-app-screen.sh start my-demo
# ... run automation ...
./.agents/skills/local-testing/scripts/record-app-screen.sh stop
# Run the built-in demo (queue-edit feature)
./.agents/skills/local-testing/scripts/record-electron-demo.sh
# Run a custom automation script
./.agents/skills/local-testing/scripts/record-electron-demo.sh ./my-demo.sh /tmp/my-demo.mp4
```
Outputs to `.records/` directory (gitignored): `<name>.mp4` (video) + `<name>/` (screenshots every 3s).
The script automatically:
1. Starts Electron with CDP and waits for SPA to load
2. Detects window position, screen, and Retina scale via Swift/CGWindowList
3. Records only the Electron window region using `ffmpeg -f avfoundation` with crop
4. Runs the demo (built-in or custom script receiving CDP port as `$1`)
5. Stops recording and cleans up
---
@@ -558,4 +1112,14 @@ Outputs to `.records/` directory (gitignored): `<name>.mp4` (video) + `<name>/`
### osascript
See [bot/osascript-common.md](./bot/osascript-common.md#gotchas) for the full osascript gotchas list (accessibility permissions, `keystroke` non-ASCII issues, locale-specific app names, rate limiting, etc.).
- **Accessibility permission required** — first run will prompt for access; grant it in System Preferences > Privacy & Security > Accessibility for Terminal / iTerm / Claude Code
- **`keystroke` is slow for long text** — always use clipboard paste (`Cmd+V`) for messages over \~20 characters
- **`keystroke` can mangle non-ASCII** — use clipboard paste for Chinese, emoji, or special characters
- **`key code 36` is Enter** — this is the hardware key code, works regardless of keyboard layout
- **`entire contents` is extremely slow** — avoid for complex UIs; use screenshots instead
- **App name varies by locale** — `微信` vs `WeChat`, `企业微信` vs `WeCom`; handle both
- **WeChat Enter sends immediately** — use `Shift+Enter` for newlines within a message
- **Rate limiting** — don't send messages too fast; platforms may throttle or flag automated input
- **Lark / 飞书 app name varies** — `Lark` (international) vs `飞书` (China mainland); scripts auto-detect
- **QQ uses `Cmd+F` for search** — not `Cmd+K` like Discord/Slack/Lark
- **Bot response times vary** — AI-powered bots may take 10-60s; use generous sleep values
@@ -1,97 +0,0 @@
# Discord Bot Testing
**App name:** `Discord` | **Process name:** `Discord`
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
```bash
# Activate Discord
osascript -e 'tell application "Discord" to activate'
sleep 1
# Open Quick Switcher (Cmd+K) to navigate to a channel
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e 'tell application "System Events" to keystroke "bot-testing"'
sleep 1
osascript -e 'tell application "System Events" to key code 36' # Enter
sleep 2
```
## Send Message to Bot
```bash
# The message input is focused after navigating to a channel
# Type a message
osascript -e 'tell application "System Events" to keystroke "/hello"'
sleep 0.5
osascript -e 'tell application "System Events" to key code 36' # Enter
```
## Send Long Message (via clipboard)
```bash
osascript -e '
tell application "Discord" to activate
delay 0.5
set the clipboard to "Write a 3000 word essay about space exploration"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36 -- Enter
end tell
'
```
## Verify Bot Response
```bash
# Wait for bot to respond, then screenshot
sleep 10
screencapture /tmp/discord-bot-response.png
# Read with the Read tool for visual verification
```
## Full Bot Test Example
```bash
#!/usr/bin/env bash
# test-discord-bot.sh — Send message and verify bot response
# 1. Activate Discord and navigate to channel
osascript -e '
tell application "Discord" to activate
delay 1
-- Quick Switcher
tell application "System Events" to keystroke "k" using command down
delay 0.5
tell application "System Events" to keystroke "bot-testing"
delay 1
tell application "System Events" to key code 36
delay 2
'
# 2. Send test message
osascript -e '
set the clipboard to "!ping"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
# 3. Wait for response and capture
sleep 5
screencapture /tmp/discord-test-result.png
echo "Screenshot saved to /tmp/discord-test-result.png"
```
## Script
```bash
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "!ping"
./.agents/skills/local-testing/bot/discord/test-discord-bot.sh "bot-testing" "/ask Tell me a joke" 30
```
@@ -1,232 +0,0 @@
# iMessage Desktop bridge regression test
The iMessage channel is different from the other bot platforms: there is **no
native app to drive with osascript**. Instead the Desktop app runs a local
**BlueBubbles bridge** — a small HTTP server in the Electron main process that
registers a webhook on a local [BlueBubbles](https://bluebubbles.app/) server,
receives iMessage events, and forwards them to LobeHub Cloud.
So the test surface is three layers:
1. **Electron main IPC**`imessageBridge.*` handlers (`getStatus`,
`testConfig`, `upsertConfig`, `removeConfig`, `start`, `stop`)
2. **Local bridge HTTP server**`http://127.0.0.1:<port>/webhooks/bluebubbles/<appId>?secret=<secret>`
3. **BlueBubbles REST API**`http://127.0.0.1:1234/api/v1/*` (webhook + server/info)
## Prerequisites
- A running **BlueBubbles server** (macOS, default `http://127.0.0.1:1234`) with
a known password. Sanity check:
```bash
curl -sS -m4 -o /dev/null -w '%{http_code}\n' \
"http://127.0.0.1:1234/api/v1/server/info?password=<PW>" # expect 200
```
- **Electron dev running with CDP**: `./.agents/skills/local-testing/scripts/electron-dev.sh start`
- The **iMessage Desktop branch** checked out (the `imessageBridge` IPC group
and `@lobechat/chat-adapter-imessage` must be compiled into the main bundle).
Run `pnpm install --ignore-scripts` at the repo root **and** in `apps/desktop/`
after switching branches — the new workspace package must be linked or the
main build fails to resolve `@lobechat/chat-adapter-imessage`.
## Fast path: automated script
```bash
./.agents/skills/local-testing/bot/imessage/test-imessage-bridge.sh '<bluebubbles_password>' [bb_url] [cdp_port]
```
Asserts the whole flow and self-cleans (unique `applicationId` per run, removes
its bridge config + BlueBubbles webhook on exit). Exit 0 = all green. It covers:
- BlueBubbles reachable + password valid; Electron CDP reachable; IPC available
- `testConfig` happy path → success
- `testConfig` wrong password → rejected; unreachable URL → rejected
- `upsertConfig` **first-time save → success** (Bug #1 regression guard, below)
- `getStatus` → `running:true`, config persisted, password redacted (`blueBubblesPasswordSet`)
- BlueBubbles webhook actually registered for the appId
- Local bridge HTTP server: wrong secret → 401; valid secret → past auth
The password is passed as argv (visible in `ps`) — local dev only, don't use a
real secret on a shared machine.
## Layer 1 — IPC probes (no UI)
The renderer exposes the main-process handlers via `window.electronAPI.invoke`.
This is the quickest way to exercise the bridge without clicking:
```bash
# baseline
agent-browser --cdp 9222 eval \
"(async()=>JSON.stringify(await window.electronAPI.invoke('imessageBridge.getStatus',{})))()"
# test a connection (note: password as a JS string)
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(async function () {
try {
var r = await window.electronAPI.invoke('imessageBridge.testConfig', {
applicationId: 'probe',
blueBubblesServerUrl: 'http://127.0.0.1:1234',
blueBubblesPassword: 'PASTE_PW',
enabled: true,
webhookSecret: 'probe-secret',
});
return JSON.stringify(r); // { success: true }
} catch (e) { return 'ERR: ' + (e.message || e); }
})()
EVALEOF
```
`upsertConfig` persists to the Electron store, starts the local HTTP server, and
registers the BlueBubbles webhook. `removeConfig` + `stop` reverse it.
## Layer 2 — full UI flow (agent-browser)
The bridge settings only render in Desktop (`isDesktop` guard) under the agent's
**Channel → iMessage** screen. The platform tile only appears as a real (non
"Coming Soon") entry once the server registers `imessage` **and** the frontend
drops it from `COMING_SOON_PLATFORMS` (`src/routes/(main)/agent/channel/const.ts`).
```bash
agent-browser --cdp 9222 open "http://localhost:5173/agent/<aid>/channel"
agent-browser --cdp 9222 wait --load networkidle && agent-browser --cdp 9222 wait 1500
# confirm the remote backend lists imessage (it must be registered + deployed)
agent-browser --cdp 9222 eval --stdin << 'EVALEOF'
(async function(){
var url='lobe-backend://lobe/trpc/lambda/agentBotProvider.listPlatforms?input='+encodeURIComponent('{"json":null,"meta":{"values":["undefined"],"v":1}}');
var d=await (await fetch(url,{credentials:'include'})).json();
var p=d.result?.data?.json||d;
return JSON.stringify(p.map(function(x){return x.id;}));
})()
EVALEOF
# click the iMessage tile, then fill the form by ref
agent-browser --cdp 9222 eval "(()=>{var b=[...document.querySelectorAll('aside button')].find(x=>/imessage/i.test(x.textContent));b&&b.click();})()"
agent-browser --cdp 9222 wait 1500
agent-browser --cdp 9222 snapshot -i | grep -iE "127.0.0.1:1234|Application ID|Webhook Secret|Test BlueBubbles|Save Bridge"
```
Field refs (from the snapshot): Application ID, Webhook Secret, BlueBubbles
Server URL (`placeholder="http://127.0.0.1:1234"`), and a **nested** textbox right
under the URL one is the BlueBubbles Password. Fill with `fill` (real input
events — `eval`-setting React inputs won't fire onChange), click **Test
BlueBubbles**, then **Save Bridge**. Read the antd toast immediately (it
auto-dismisses):
```bash
agent-browser --cdp 9222 eval \
"JSON.stringify([...new Set([...document.querySelectorAll('.ant-message-custom-content')].map(n=>n.textContent.trim()))])"
# Test → "BlueBubbles connection passed"
# Save → "iMessage Desktop bridge saved"
```
Verify the end state via BlueBubbles + IPC:
```bash
curl -sS "http://127.0.0.1:1234/api/v1/webhook?password=<PW>" # webhook for the appId present
agent-browser --cdp 9222 eval "(async()=>JSON.stringify(await window.electronAPI.invoke('imessageBridge.getStatus',{})))()"
# running:true, serverUrl: http://127.0.0.1:33270, configs[].blueBubblesPasswordSet:true
```
Cleanup: `removeConfig` + `stop` via IPC, then `DELETE /api/v1/webhook/<id>` on
BlueBubbles.
## Outbound send test (desktop → BlueBubbles → iMessage)
Verifies the leg the bridge uses to _reply_: `BlueBubblesApiClient.sendText`
→ `POST /api/v1/message/text`. Run the helper against your own number:
```bash
./.agents/skills/local-testing/bot/imessage/send-imessage-test.sh '<bb_password>' '+<E164>' # e.g. +15551234567
```
**Gotcha that bites everyone:** with `method=apple-script` and a _new_
conversation, the HTTP POST often **times out** even though the message is
sent. Never judge success by the HTTP response. Instead poll
`POST /api/v1/message/query` and read the matching `isFromMe:true` row's
`error` field:
- `error: 0` (or null) → sent OK
- non-zero `error` → real send failure
The script does exactly this: fires the send, ignores the timeout, then matches
its marker text in the message store and asserts `error == 0`.
Two more notes:
- Use a full E.164 handle (`iMessage;-;+<countrycode><number>`) or an Apple ID
email. Looking the chat up by guid afterwards may 404 if BB filed the message
under a differently-formatted guid — that's a lookup quirk, not a send failure.
- Sending to _your own_ number round-trips: BB records both the outgoing
(`fromMe:true`) and an incoming copy (`fromMe:false`).
## Inbound e2e test (iMessage → cloud agent → reply)
Full inbound chain: a message arrives → BlueBubbles fires its `new-message`
webhook → local bridge (`:33270`) → `forwardWebhook` POSTs to
`<remote>/api/agent/webhooks/imessage/<appId>?secret=…` → cloud agent → reply
flows back via Device Gateway → BB `sendText`.
Prerequisites:
- A cloud bot provider for the same `applicationId` exists and is **connected**
(Save Configuration + the device gateway connected — a _disconnected_ gateway
yields `DEVICE_NOT_FOUND` on connect and blocks the reply leg).
- The `imessage` Labs toggle is on (otherwise the channel is gated to "Coming
Soon"), and `webhookSecret` matches on both ends (auto-generated on save).
Two ways to drive it:
1. **Second device / Apple ID (recommended).** Have _another_ Apple ID message
the BB-hosted number (e.g. "please reply pong"). The bot replies; you see it
on the other device. **No loop risk** — the reply goes to the other party,
not back to itself.
2. **Send to your own number (quick, loop-aware).** `sendText` to the hosted
number; the loopback _incoming_ copy (`isFromMe:false`) triggers the bot.
Watch the reply land in `message/query` as a `fromMe:true` row.
**Loop guard — why a self-send doesn't spin forever:** the Chat SDK adapter
drops any `isFromMe` message before dispatch
(`packages/chat-adapter-imessage/src/adapter.ts`: `if (message.isFromMe) return`).
The bot's own reply (`isFromMe:true`) is never re-processed, so in the normal
case (someone else → bot → reply to them) there is no loop. The self-send case
is a **test-only edge**: the bot's reply also round-trips to your number, and
only the adapter's `isFromMe` check stops a second pass. Keep the prompt
conversational (so the bot doesn't keep finding something to answer), and
**turn the `imessage` lab off / remove the config when done** — never leave a
self-send bot running unattended.
Watch the chain live:
```bash
tail -f /tmp/electron-dev.log | grep -iE "imessage|bridge|forward|Message API"
# the agent reply shows up as a fromMe:true row with the bot's text:
curl -sS -X POST "http://127.0.0.1:1234/api/v1/message/query?password=<PW>" \
-H 'Content-Type: application/json' -d '{"limit":5,"sort":"DESC"}'
```
`startTyping` will log a Private-API error unless BlueBubbles has the Private
API helper set up (needs a jailbroken / SIP-disabled Mac) — it's logged and
ignored; text replies still work.
## Known bugs / gotchas
- **Bug #1 — first-time save (fixed; guarded by the script).** BlueBubbles'
`GET /api/v1/webhook?url=<unregistered>` returns **HTTP 500**
(`Cannot read properties of null (reading 'events')`). The bridge must list
**all** webhooks and match client-side, never pass the `?url=` filter. If you
see `upsertConfig` fail with "An unhandled error has occurred!" originating in
`listWebhooks`, this regressed.
- **Save leaves a half-state on webhook failure.** `upsertConfig` writes the
config + starts the HTTP server _before_ registering the webhook, so a webhook
failure still reports `running:true` with the config persisted but no
BlueBubbles webhook. Always assert the BlueBubbles webhook list, not just IPC
status.
- **Unknown appId / forward failure → 500.** Posting to the local bridge for an
unknown appId, or when no cloud bot is bound, returns 500 (BlueBubbles retries
on 5xx). Auth (wrong secret → 401) is enforced before that.
- **Backend deploy lag.** Desktop dev proxies tRPC through `lobe-backend://` to
the _remote_ server. iMessage only appears in `listPlatforms` once the server
registration is deployed there, regardless of local branch.
- **Restart to load main-process fixes.** Editing `imessageBridgeSrv.ts` /
`@lobechat/chat-adapter-imessage` needs `electron-dev.sh restart` — main isn't
hot-replaced. On restart, enabled configs auto-register their webhook again.
@@ -1,81 +0,0 @@
#!/usr/bin/env bash
#
# send-imessage-test.sh — Verify the outbound leg: desktop → BlueBubbles → iMessage
#
# Sends one real iMessage via the same REST call the Desktop bridge uses
# (`POST /api/v1/message/text`, which BlueBubblesApiClient.sendText wraps) and
# confirms it actually went out.
#
# KEY GOTCHA: with method=apple-script and a NEW conversation, the HTTP request
# often TIMES OUT even though the message is sent. Do NOT treat the timeout as a
# failure — instead poll `POST /api/v1/message/query` and check the message's
# `error` field (0 = sent OK). This script does that for you.
#
# This sends a REAL message, so it has side effects. Target your own number.
#
# Usage:
# ./send-imessage-test.sh <bb_password> <target_e164> [message] [bb_url]
#
# Example (send to your own phone, E.164 with country code):
# ./send-imessage-test.sh 'my-bb-pass' '+15551234567'
#
set -euo pipefail
BB_PASS="${1:?Usage: $0 <bb_password> <target_e164(+countrycode)> [message] [bb_url]}"
TARGET="${2:?Need a target handle in E.164, e.g. +15551234567 (or an Apple ID email)}"
MARKER="lobe-imsg-test-$(date +%s)"
MESSAGE="${3:-[${MARKER}] desktop bridge → BlueBubbles → iMessage outbound check}"
BB_URL="${4:-http://127.0.0.1:1234}"
CHAT_GUID="iMessage;-;${TARGET}"
echo "[send-test] target=${TARGET} marker=${MARKER}"
# 1) Fire the send. apple-script on a new chat may hang the HTTP response, so we
# cap it short and ignore a timeout — step 2 is the source of truth.
python3 - "$BB_PASS" "$BB_URL" "$CHAT_GUID" "$MESSAGE" <<'PY' || true
import json,sys,urllib.request,urllib.parse,uuid
pw,base,guid,msg=sys.argv[1:5]
url=base+"/api/v1/message/text?password="+urllib.parse.quote(pw)
body={"chatGuid":guid,"message":msg,"method":"apple-script","tempGuid":str(uuid.uuid4())}
req=urllib.request.Request(url,data=json.dumps(body).encode("utf-8"),
headers={"Content-Type":"application/json"},method="POST")
try:
r=urllib.request.urlopen(req,timeout=8)
print("[send-test] HTTP",r.status,"(immediate response)")
except urllib.error.HTTPError as e:
print("[send-test] HTTP",e.code,e.read().decode()[:200])
except Exception as e:
print("[send-test] HTTP request returned no body (likely apple-script delay):",type(e).__name__)
PY
# 2) Source of truth: find our marker in the message store and read its error.
echo "[send-test] verifying via message/query (the HTTP timeout above is expected)…"
sleep 3
python3 - "$BB_PASS" "$BB_URL" "$MARKER" <<'PY'
import json,sys,time,urllib.request,urllib.parse
pw,base,marker=sys.argv[1:4]
url=base+"/api/v1/message/query?password="+urllib.parse.quote(pw)
def query():
body={"limit":15,"offset":0,"with":["chats"],"sort":"DESC"}
req=urllib.request.Request(url,data=json.dumps(body).encode(),
headers={"Content-Type":"application/json"},method="POST")
return json.load(urllib.request.urlopen(req,timeout=12)).get("data") or []
hit=None
for _ in range(5):
for m in query():
if marker in (m.get("text") or "") and m.get("isFromMe"):
hit=m; break
if hit: break
time.sleep(2)
if not hit:
print("[send-test] ✗ outbound message not found in BB store — send likely failed")
sys.exit(1)
err=hit.get("error")
if err in (0,None):
print("[send-test] ✓ outbound message sent (fromMe=True, error=%s)"%err)
print("[send-test] → confirm it arrived in the Messages app on the target device")
else:
print("[send-test] ✗ BlueBubbles reported send error=%s"%err)
sys.exit(1)
PY
@@ -1,187 +0,0 @@
#!/usr/bin/env bash
#
# test-imessage-bridge.sh — Regression test for the iMessage Desktop bridge
#
# Drives the Electron main-process `imessageBridge.*` IPC handlers plus the
# local bridge HTTP server and the BlueBubbles server, asserting the full
# connect/configure flow. Use it to regression-test PR work on the iMessage
# channel (BlueBubbles bridge) without clicking through the UI every time.
#
# Prerequisites:
# 1. BlueBubbles server running and reachable (default http://127.0.0.1:1234)
# 2. Electron dev running with CDP — `electron-dev.sh start`
# 3. `agent-browser` on PATH, connected to the same CDP port
#
# Usage:
# ./test-imessage-bridge.sh <bluebubbles_password> [bb_url] [cdp_port]
#
# Example:
# ./test-imessage-bridge.sh 'my-bb-password'
# ./test-imessage-bridge.sh 'my-bb-password' http://127.0.0.1:1234 9222
#
# Notes:
# - The password is passed as an argv, so it is visible in `ps`. This is a
# local dev tool; do not run it on shared machines with a real secret.
# - It uses a unique applicationId per run (imsg-regression-$$) and cleans up
# its own bridge config + BlueBubbles webhook on exit, so it is safe to
# re-run and does not disturb real configs.
set -euo pipefail
BB_PASS="${1:?Usage: $0 <bluebubbles_password> [bb_url] [cdp_port]}"
BB_URL="${2:-http://127.0.0.1:1234}"
CDP_PORT="${3:-9222}"
APP_ID="imsg-regression-$$"
SECRET="regression-secret-$$"
PASS=0
FAIL=0
# ── Output helpers ───────────────────────────────────────────────────
ok() { echo "$1"; PASS=$((PASS + 1)); }
bad() { echo "$1$2"; FAIL=$((FAIL + 1)); }
note() { echo "[imsg-test] $1"; }
# ── BlueBubbles REST helpers ─────────────────────────────────────────
bb_get_webhooks() {
curl -sS -m 8 "${BB_URL}/api/v1/webhook?password=${BB_PASS}"
}
# Delete every webhook whose URL mentions our APP_ID (cleanup is idempotent).
bb_cleanup_webhooks() {
local ids
ids=$(bb_get_webhooks | python3 -c '
import json,sys
try: d=json.load(sys.stdin)
except Exception: sys.exit(0)
for w in (d.get("data") or []):
if "'"$APP_ID"'" in (w.get("url") or ""): print(w["id"])
' 2>/dev/null || true)
for id in $ids; do
curl -sS -m 8 -X DELETE "${BB_URL}/api/v1/webhook/${id}?password=${BB_PASS}" >/dev/null 2>&1 || true
done
}
# ── IPC helper (drives the Electron renderer's electronAPI bridge) ───
# Runs a JS snippet that returns a string token; prints the raw token.
# The BlueBubbles password is base64-injected (atob) so special chars in the
# secret never need shell/JS quoting.
ipc_eval() {
local js="$1"
agent-browser --cdp "$CDP_PORT" eval -b "$(printf '%s' "$js" | base64)" 2>/dev/null
}
PASS_B64=$(printf '%s' "$BB_PASS" | base64)
# Emit an inline JS object literal for the bridge config. $1 overrides the
# password expression (defaults to atob of the real password); pass a JS string
# literal like "'wrong'" to test the rejection path.
ipc_config_js() {
local pwexpr="${1:-atob('${PASS_B64}')}"
printf "{applicationId:'%s',blueBubblesServerUrl:'%s',blueBubblesPassword:%s,enabled:true,webhookSecret:'%s'}" \
"$APP_ID" "$BB_URL" "$pwexpr" "$SECRET"
}
# ── Preflight ────────────────────────────────────────────────────────
note "BlueBubbles: ${BB_URL} CDP: ${CDP_PORT} appId: ${APP_ID}"
code=$(curl -sS -m 6 -o /dev/null -w '%{http_code}' \
"${BB_URL}/api/v1/server/info?password=${BB_PASS}" || echo 000)
if [ "$code" = "200" ]; then ok "BlueBubbles reachable + password valid"; else
bad "BlueBubbles preflight" "HTTP $code (is BlueBubbles running on ${BB_URL}?)"
echo "Aborting — fix BlueBubbles first."; exit 1
fi
if ! curl -sf --max-time 3 "http://localhost:${CDP_PORT}/json/version" >/dev/null 2>&1; then
bad "Electron CDP preflight" "CDP ${CDP_PORT} unreachable — run electron-dev.sh start"
echo "Aborting."; exit 1
fi
ok "Electron CDP reachable"
# Bridge must expose the IPC group (built from this branch's code).
probe=$(ipc_eval "(async()=>{try{var s=await window.electronAPI.invoke('imessageBridge.getStatus',{});return 'OK:'+JSON.stringify(s);}catch(e){return 'ERR:'+(e.message||e);}})()")
case "$probe" in
*OK:*) ok "imessageBridge IPC available" ;;
*) bad "imessageBridge IPC" "got: $probe (is the iMessage Desktop branch checked out?)"; echo "Aborting."; exit 1 ;;
esac
# Start clean: remove any leftover config for this appId + BB webhooks.
ipc_eval "(async()=>{try{await window.electronAPI.invoke('imessageBridge.removeConfig',{applicationId:'${APP_ID}'});}catch(e){}return 'done';})()" >/dev/null
bb_cleanup_webhooks
# ── testConfig: happy path ───────────────────────────────────────────
r=$(ipc_eval "(async()=>{try{var c=$(ipc_config_js);var x=await window.electronAPI.invoke('imessageBridge.testConfig',c);return 'OK:'+JSON.stringify(x);}catch(e){return 'ERR:'+(e.message||e);}})()")
case "$r" in
*OK:*success*true*) ok "testConfig with valid password → success" ;;
*) bad "testConfig (valid)" "got: $r" ;;
esac
# ── testConfig: wrong password rejects ───────────────────────────────
r=$(ipc_eval "(async()=>{try{var c=$(ipc_config_js "'definitely-wrong-password'");var x=await window.electronAPI.invoke('imessageBridge.testConfig',c);return 'OK:'+JSON.stringify(x);}catch(e){return 'ERR:'+(e.message||e);}})()")
case "$r" in
*ERR:*) ok "testConfig with wrong password → rejected" ;;
*) bad "testConfig (wrong password)" "expected rejection, got: $r" ;;
esac
# ── testConfig: unreachable URL rejects ──────────────────────────────
r=$(ipc_eval "(async()=>{try{var x=await window.electronAPI.invoke('imessageBridge.testConfig',{applicationId:'${APP_ID}',blueBubblesServerUrl:'http://127.0.0.1:65530',blueBubblesPassword:atob('${PASS_B64}'),enabled:true,webhookSecret:'${SECRET}'});return 'OK:'+JSON.stringify(x);}catch(e){return 'ERR:'+(e.message||e);}})()")
case "$r" in
*ERR:*) ok "testConfig with unreachable URL → rejected" ;;
*) bad "testConfig (unreachable)" "expected rejection, got: $r" ;;
esac
# ── upsertConfig: FIRST-TIME registration (Bug #1 regression guard) ──
# BlueBubbles' GET /webhook?url=<unregistered> returns HTTP 500. The bridge
# must list ALL webhooks and match client-side, otherwise this first save
# fails. This assertion guards that fix.
r=$(ipc_eval "(async()=>{try{var c=$(ipc_config_js);var x=await window.electronAPI.invoke('imessageBridge.upsertConfig',c);return 'OK:'+JSON.stringify(x);}catch(e){return 'ERR:'+(e.message||e);}})()")
case "$r" in
*OK:*success*true*) ok "upsertConfig first-time save → success (Bug #1 guard)" ;;
*) bad "upsertConfig (first-time)" "got: $r" ;;
esac
# ── getStatus: bridge running + config persisted ─────────────────────
# Return a quote-free token so grep isn't tripped up by agent-browser's
# JSON-string escaping of the eval result.
r=$(ipc_eval "(async()=>{var s=await window.electronAPI.invoke('imessageBridge.getStatus',{});var c=(s.configs||[]).find(function(x){return x.applicationId==='${APP_ID}';});return 'RUN='+(s.running?'Y':'N')+' CFG='+(c?'Y':'N')+' PW='+((c&&c.blueBubblesPasswordSet)?'Y':'N');})()")
echo "$r" | grep -q 'RUN=Y' && ok "bridge running" || bad "bridge running" "got: $r"
echo "$r" | grep -q 'CFG=Y' && ok "config persisted" || bad "config persisted" "got: $r"
echo "$r" | grep -q 'PW=Y' && ok "password stored (redacted in status)" || bad "password stored" "got: $r"
# ── BlueBubbles webhook actually registered ──────────────────────────
if bb_get_webhooks | grep -q "${APP_ID}"; then
ok "BlueBubbles webhook registered for appId"
else
bad "BlueBubbles webhook" "no webhook URL containing ${APP_ID}"
fi
# ── Local bridge HTTP server: secret enforcement ─────────────────────
BRIDGE_URL=$(ipc_eval "(async()=>{var s=await window.electronAPI.invoke('imessageBridge.getStatus',{});return s.serverUrl||'';})()" | tr -d '"')
if [ -n "$BRIDGE_URL" ]; then
# wrong secret → 401
code=$(curl -sS -m 6 -o /dev/null -w '%{http_code}' -X POST \
-H 'Content-Type: application/json' \
"${BRIDGE_URL}/webhooks/bluebubbles/${APP_ID}?secret=WRONG" \
-d '{"type":"new-message","data":{"guid":"x"}}' || echo 000)
[ "$code" = "401" ] && ok "local bridge rejects wrong secret (401)" || bad "local bridge wrong secret" "expected 401, got $code"
# right secret → passes auth (reaches forward; without a bound cloud bot it
# returns 5xx — that's fine, we're only asserting auth + routing here)
code=$(curl -sS -m 6 -o /dev/null -w '%{http_code}' -X POST \
-H 'Content-Type: application/json' \
"${BRIDGE_URL}/webhooks/bluebubbles/${APP_ID}?secret=${SECRET}" \
-d '{"type":"new-message","data":{"guid":"x","text":"hi"}}' || echo 000)
[ "$code" != "401" ] && ok "local bridge accepts valid secret (HTTP $code, past auth)" || bad "local bridge valid secret" "got 401 with correct secret"
else
bad "local bridge URL" "getStatus returned no serverUrl"
fi
# ── Cleanup ──────────────────────────────────────────────────────────
ipc_eval "(async()=>{try{await window.electronAPI.invoke('imessageBridge.removeConfig',{applicationId:'${APP_ID}'});await window.electronAPI.invoke('imessageBridge.stop',{});}catch(e){}return 'cleaned';})()" >/dev/null
bb_cleanup_webhooks
note "cleaned up config + BlueBubbles webhook for ${APP_ID}"
# ── Summary ──────────────────────────────────────────────────────────
echo ""
echo "[imsg-test] PASS=${PASS} FAIL=${FAIL}"
[ "$FAIL" -eq 0 ] || exit 1
@@ -1,61 +0,0 @@
# Lark / 飞书 Bot Testing
**App name:** `Lark` or `飞书` | **Process name:** `Lark` or `飞书`
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
```bash
# Activate Lark (auto-detects Lark or 飞书)
osascript -e 'tell application "Lark" to activate' 2> /dev/null \
|| osascript -e 'tell application "飞书" to activate'
sleep 1
# Quick Switcher / Search (Cmd+K)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e '
set the clipboard to "bot-testing"
tell application "System Events"
keystroke "v" using command down
delay 1.5
key code 36 -- Enter
end tell
'
sleep 2
```
## Send Message to Bot
```bash
osascript -e '
set the clipboard to "@MyBot help me with this task"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36 -- Enter
end tell
'
```
## Verify Response
```bash
sleep 10
screencapture /tmp/lark-bot-response.png
```
## Lark-Specific Notes
- App name varies: `Lark` (international) vs `飞书` (China mainland) — the script auto-detects
- Uses `Cmd+K` for quick search (same as Discord/Slack)
- Enter sends message by default
- Always use clipboard paste for CJK characters
## Script
```bash
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "@MyBot hello"
./.agents/skills/local-testing/bot/lark/test-lark-bot.sh "bot-testing" "Help me with this" 30
```
@@ -1,217 +0,0 @@
# osascript Common Patterns
Shared AppleScript / `osascript` patterns used by all platform bot tests. Read this first, then refer to the per-platform file for app-specific quirks.
## Core Patterns
### Activate an App
```bash
osascript -e 'tell application "Discord" to activate'
```
### Type Text
```bash
# Type character by character (reliable, but slow for long text)
osascript -e 'tell application "System Events" to keystroke "Hello world"'
# Press Enter
osascript -e 'tell application "System Events" to key code 36'
# Press Tab
osascript -e 'tell application "System Events" to key code 48'
# Press Escape
osascript -e 'tell application "System Events" to key code 53'
```
### Paste from Clipboard (fast, for long text)
```bash
# Set clipboard and paste — much faster than keystroke for long messages
osascript -e 'set the clipboard to "Your long message here"'
osascript -e 'tell application "System Events" to keystroke "v" using command down'
```
Or in one shot:
```bash
osascript -e '
set the clipboard to "Your long message here"
tell application "System Events" to keystroke "v" using command down
'
```
### Keyboard Shortcuts
```bash
# Cmd+K (quick switcher in Discord/Slack)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
# Cmd+F (search)
osascript -e 'tell application "System Events" to keystroke "f" using command down'
# Cmd+N (new message/chat)
osascript -e 'tell application "System Events" to keystroke "n" using command down'
# Cmd+Shift+K (example: multi-modifier)
osascript -e 'tell application "System Events" to keystroke "k" using {command down, shift down}'
```
### Click at Position
```bash
# Click at absolute screen coordinates
osascript -e '
tell application "System Events"
click at {500, 300}
end tell
'
```
### Get Window Info
```bash
# Get window position and size
osascript -e '
tell application "System Events"
tell process "Discord"
get {position, size} of window 1
end tell
end tell
'
```
### Screenshot
```bash
# Full screen
screencapture /tmp/screenshot.png
# Interactive region select
screencapture -i /tmp/screenshot.png
# Specific window (by window ID from CGWindowList)
screencapture -l < WINDOW_ID > /tmp/screenshot.png
```
To get window ID for a specific app:
```bash
osascript -e '
tell application "System Events"
tell process "Discord"
get id of window 1
end tell
end tell
'
```
### Read Accessibility Elements
```bash
# Get all UI elements of the frontmost window (can be slow/large)
osascript -e '
tell application "System Events"
tell process "Discord"
entire contents of window 1
end tell
end tell
'
# Get a specific element's value
osascript -e '
tell application "System Events"
tell process "Discord"
get value of text field 1 of window 1
end tell
end tell
'
```
> **Warning:** `entire contents` can be extremely slow on complex UIs. Prefer screenshots + `Read` tool for visual verification.
### Read Screen Text via Clipboard
For reading the latest message or response from an app:
```bash
# Select all text in the focused area and copy
osascript -e '
tell application "System Events"
keystroke "a" using command down
keystroke "c" using command down
end tell
'
sleep 0.5
# Read clipboard
pbpaste
```
---
## Common Bot Testing Workflow
Regardless of platform, the pattern is:
```bash
APP_NAME="Discord" # or "Slack", "Telegram", "微信"
CHANNEL="bot-testing"
MESSAGE="Hello bot!"
WAIT_SECONDS=10
# 1. Activate
osascript -e "tell application \"$APP_NAME\" to activate"
sleep 1
# 2. Navigate to channel/chat (via Quick Switcher or Search)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e "tell application \"System Events\" to keystroke \"$CHANNEL\""
sleep 1
osascript -e 'tell application "System Events" to key code 36'
sleep 2
# 3. Send message
osascript -e "set the clipboard to \"$MESSAGE\""
osascript -e '
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
# 4. Wait for bot response
sleep "$WAIT_SECONDS"
# 5. Screenshot for verification
screencapture /tmp/"${APP_NAME,,}"-bot-test.png
echo "Result saved to /tmp/${APP_NAME,,}-bot-test.png"
```
### Tips
- **Use clipboard paste** (`Cmd+V`) for messages containing special characters or long text — `keystroke` can mangle non-ASCII
- **Add `delay`** between actions — apps need time to process UI events
- **Screenshot for verification** — use `screencapture` + `Read` tool for visual checks
- **Use a dedicated test channel/chat** — avoid polluting real conversations
- **Check app name** — some apps have different names in different locales (e.g., `微信` vs `WeChat`)
- **Accessibility permissions required** — System Events automation requires granting Accessibility access in System Preferences > Privacy & Security > Accessibility
---
## Gotchas
- **Accessibility permission required** — first run will prompt for access; grant it in System Preferences > Privacy & Security > Accessibility for Terminal / iTerm / Claude Code
- **`keystroke` is slow for long text** — always use clipboard paste (`Cmd+V`) for messages over \~20 characters
- **`keystroke` can mangle non-ASCII** — use clipboard paste for Chinese, emoji, or special characters
- **`key code 36` is Enter** — this is the hardware key code, works regardless of keyboard layout
- **`entire contents` is extremely slow** — avoid for complex UIs; use screenshots instead
- **App name varies by locale** — `微信` vs `WeChat`, `企业微信` vs `WeCom`; handle both
- **WeChat Enter sends immediately** — use `Shift+Enter` for newlines within a message
- **Rate limiting** — don't send messages too fast; platforms may throttle or flag automated input
- **Lark / 飞书 app name varies** — `Lark` (international) vs `飞书` (China mainland); scripts auto-detect
- **QQ uses `Cmd+F` for search** — not `Cmd+K` like Discord/Slack/Lark
- **Bot response times vary** — AI-powered bots may take 10-60s; use generous sleep values
@@ -1,62 +0,0 @@
# QQ Bot Testing
**App name:** `QQ` | **Process name:** `QQ`
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
```bash
osascript -e 'tell application "QQ" to activate'
sleep 1
# Search for contact/group (Cmd+F)
osascript -e '
tell application "System Events"
keystroke "f" using command down
delay 0.8
end tell
'
osascript -e '
set the clipboard to "bot-testing"
tell application "System Events"
keystroke "v" using command down
delay 1.5
key code 36 -- Enter
end tell
'
sleep 2
```
## Send Message to Bot
```bash
osascript -e '
set the clipboard to "Hello bot!"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36 -- Enter
end tell
'
```
## Verify Response
```bash
sleep 10
screencapture /tmp/qq-bot-response.png
```
## QQ-Specific Notes
- Enter sends message by default; Shift+Enter for newlines
- Uses `Cmd+F` for search (not `Cmd+K` like Discord/Slack/Lark)
- Always use clipboard paste for CJK characters
## Script
```bash
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "bot-testing" "Hello bot" 15
./.agents/skills/local-testing/bot/qq/test-qq-bot.sh "MyBot" "/help" 10
```
@@ -1,73 +0,0 @@
# Slack Bot Testing
**App name:** `Slack` | **Process name:** `Slack`
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
```bash
# Activate Slack
osascript -e 'tell application "Slack" to activate'
sleep 1
# Quick Switcher (Cmd+K)
osascript -e 'tell application "System Events" to keystroke "k" using command down'
sleep 0.5
osascript -e 'tell application "System Events" to keystroke "bot-testing"'
sleep 1
osascript -e 'tell application "System Events" to key code 36' # Enter
sleep 2
```
## Send Message to Bot
```bash
# Direct message input (focused after channel nav)
osascript -e 'tell application "System Events" to keystroke "@mybot hello"'
sleep 0.3
osascript -e 'tell application "System Events" to key code 36'
```
## Send Long Message
```bash
osascript -e '
tell application "Slack" to activate
delay 0.5
set the clipboard to "A long test message for the bot..."
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
```
## Slash Command Test
```bash
osascript -e '
tell application "Slack" to activate
delay 0.5
tell application "System Events"
keystroke "/ask What is the meaning of life?"
delay 0.5
key code 36
end tell
'
```
## Verify Response
```bash
sleep 10
screencapture /tmp/slack-bot-response.png
```
## Script
```bash
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "@mybot hello"
./.agents/skills/local-testing/bot/slack/test-slack-bot.sh "bot-testing" "/ask What is 2+2?" 20
```
@@ -1,80 +0,0 @@
# Telegram Bot Testing
**App name:** `Telegram` | **Process name:** `Telegram`
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
```bash
# Activate Telegram
osascript -e 'tell application "Telegram" to activate'
sleep 1
# Search for a bot (Cmd+F or click search)
osascript -e '
tell application "System Events"
keystroke "f" using command down
delay 0.5
keystroke "MyTestBot"
delay 1
key code 36 -- Enter to select
end tell
'
sleep 2
```
## Send Message to Bot
```bash
# After navigating to bot chat, input is focused
osascript -e '
tell application "System Events"
keystroke "/start"
delay 0.3
key code 36
end tell
'
```
## Send Long Message
```bash
osascript -e '
tell application "Telegram" to activate
delay 0.5
set the clipboard to "Tell me about quantum computing in detail"
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
```
## Verify Response
```bash
sleep 10
screencapture /tmp/telegram-bot-response.png
```
## Telegram Bot API (programmatic alternative)
For sending messages directly to the bot's chat without UI:
```bash
# Send message as the bot (for testing webhooks/responses)
curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
-d "chat_id=$CHAT_ID&text=test message"
# Get recent updates
curl -s "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/getUpdates?limit=5" | jq .
```
## Script
```bash
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "MyTestBot" "/start"
./.agents/skills/local-testing/bot/telegram/test-telegram-bot.sh "GPTBot" "Hello" 60
```
@@ -1,81 +0,0 @@
# WeChat / 微信 Bot Testing
**App name:** `微信` or `WeChat` | **Process name:** `WeChat`
See [osascript-common.md](../osascript-common.md) for shared patterns.
## Activate & Navigate
```bash
# Activate WeChat
osascript -e 'tell application "微信" to activate'
sleep 1
# Search for a contact/bot (Cmd+F)
osascript -e '
tell application "System Events"
keystroke "f" using command down
delay 0.5
keystroke "TestBot"
delay 1
key code 36 -- Enter to select
end tell
'
sleep 2
```
## Send Message
```bash
# After navigating to a chat, the input is focused
osascript -e '
tell application "System Events"
keystroke "Hello bot!"
delay 0.3
key code 36
end tell
'
```
## Send Long Message (clipboard)
```bash
osascript -e '
tell application "微信" to activate
delay 0.5
set the clipboard to "Please help me with this task..."
tell application "System Events"
keystroke "v" using command down
delay 0.3
key code 36
end tell
'
```
## Verify Response
```bash
sleep 10
screencapture /tmp/wechat-bot-response.png
```
## WeChat-Specific Notes
- WeChat macOS app name can be `微信` or `WeChat` depending on system language. Try both:
```bash
osascript -e 'tell application "微信" to activate' 2> /dev/null \
|| osascript -e 'tell application "WeChat" to activate'
```
- WeChat uses **Enter** to send (not Cmd+Enter by default, but configurable)
- For multi-line messages without sending, use **Shift+Enter**:
```bash
osascript -e 'tell application "System Events" to key code 36 using shift down'
```
- Always use clipboard paste for CJK characters — `keystroke` mangles non-ASCII
## Script
```bash
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "文件传输助手" "test message" 5
./.agents/skills/local-testing/bot/wechat/test-wechat-bot.sh "MyBot" "Tell me a joke" 30
```
@@ -1,110 +0,0 @@
# Log `agent-browser` into a local LobeHub dev server
`agent-browser --headed` on macOS often creates the Chromium window off-screen — the user can't see or interact with it, so manual login inside the agent-browser session fails. Instead of sharing the user's real Chrome profile, copy the **better-auth session cookie** out of a request in DevTools and inject it into the agent-browser session as a Playwright-style state file.
## When to use
- You need `agent-browser` to reach an authenticated page on `http://localhost:<port>` (e.g. `localhost:3011`).
- The user already has a logged-in tab of the same dev server in their own Chrome.
- Spawning a headed Chromium to let the user log in manually is unreliable (window off-screen, no interaction).
Do **not** use this on production URLs — only local dev. Treat the cookie as a secret: don't paste it into shared logs, PRs, or commit it anywhere.
## Step 1 — Ask the user to copy the cookie from a Network request, NOT `document.cookie`
`document.cookie` will not return HttpOnly cookies, which is exactly where better-auth puts its session. Instruct the user:
1. Open the logged-in tab (`http://localhost:<port>/…`) in their own Chrome.
2. `Cmd+Option+I`**Network** tab.
3. Refresh, click any same-origin request (e.g. the top-level document request).
4. In the right pane under **Request Headers**, right-click the `Cookie:` line → **Copy value** (or copy the entire header).
5. Paste the string into chat.
You only need the better-auth pieces. Everything else (Clerk, `LOBE_LOCALE`, HMR hash, theme vars) is noise and can stay. The minimum viable set is:
```
better-auth.session_token=<value>; better-auth.state=<value>
```
## Step 2 — Build a Playwright-style state file
`agent-browser state load` expects Playwright's `storageState` format: a JSON with a `cookies` array and an `origins` array.
```bash
cat > /tmp/mkstate.py << 'PY'
import json, sys, time
# Read the Cookie header from stdin (allows optional "Cookie: " prefix).
raw = sys.stdin.read().strip()
if raw.lower().startswith("cookie:"):
raw = raw.split(":", 1)[1].strip()
# Keep only better-auth cookies. Extend this set if the app genuinely needs more.
WANTED = {"better-auth.session_token", "better-auth.state"}
cookies = []
exp = int(time.time()) + 30 * 24 * 3600 # 30 days
for pair in raw.split("; "):
if "=" not in pair:
continue
name, _, value = pair.partition("=")
if name not in WANTED:
continue
cookies.append({
"name": name,
"value": value,
"domain": "localhost",
"path": "/",
"expires": exp,
"httpOnly": False,
"secure": False,
"sameSite": "Lax",
})
if not cookies:
sys.stderr.write("no better-auth cookies found in input\n")
sys.exit(1)
print(json.dumps({"cookies": cookies, "origins": []}, indent=2))
PY
# Feed the copied Cookie header in via env var or heredoc.
printf '%s' "$COOKIE_HEADER" | python3 /tmp/mkstate.py > /tmp/state.json
```
**Note on `httpOnly`**: the real cookie in the user's browser is HttpOnly, but `storageState` doesn't enforce the flag on load — it just attaches the value. Storing with `httpOnly: false` is fine for local dev and sidesteps a CDP-context quirk where HttpOnly cookies sometimes fail to attach.
## Step 3 — Load state and navigate
```bash
SESSION="my-test" # any stable session name
agent-browser --session "$SESSION" state load /tmp/state.json
agent-browser --session "$SESSION" open "http://localhost:3011/"
agent-browser --session "$SESSION" get url
# Expect NOT /signin?callbackUrl=… — if you still see signin, cookie didn't apply.
```
## Step 4 — Verify
```bash
agent-browser --session "$SESSION" snapshot -i | head -20
# Look for the user's avatar/name in the sidebar, or absence of the signin form.
```
## Common failure modes
| Symptom | Cause | Fix |
| ----------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------- |
| Still redirects to `/signin` after `state load` | User pasted from `document.cookie` → missed HttpOnly session | Re-pull from Network request Headers, not console |
| `state load` reports 0 cookies | Separator wrong, or user pasted URL-decoded value | Keep the raw `Cookie:` header as-is; split on `"; "` |
| Login works briefly then expires | `better-auth.session_token` rotated (user logged out / signed in again) | Re-copy and re-load |
| Domain mismatch | Use `domain: "localhost"` literally, no leading dot for local dev | — |
## Scope
Only covers authenticating an **agent-browser** session into a **local** LobeHub dev server. It does not:
- Work for production — production cookies are `Secure; HttpOnly; Domain=.lobehub.com` and must be delivered over HTTPS.
- Replace real OAuth flows — tests that must exercise the login UI need a real Chromium with `--remote-debugging-port` or a bot account.
- Flow cookies back to the user's Chrome — injection is one-way (into agent-browser only).
@@ -1,93 +0,0 @@
# LobeHub gateway streaming + tab-switch test harness
Captures store + DOM state at 200ms intervals so we can prove or disprove
claims like "切回 tab 后消息回到了很早以前". Built for gateway-mode chat but
works for any LobeHub streaming session.
## Files
`scripts/agent-gateway/`
| File | Role |
| --------------- | ---------------------------------------------------------------- |
| `probe.js` | Injects a 200ms sampler + `__PROBE_EVENT` marker + `__switchTab` |
| `probe-dump.js` | Stops the sampler and returns `{events, samples}` as JSON string |
| `tab-switch.js` | Runs N round-trip switches between two tabs, marks each step |
| `analyze.mjs` | Node post-processor: timeline + regression detection |
## Standard workflow
```bash
# 1. Start Electron with CDP
./.agents/skills/local-testing/scripts/electron-dev.sh start
# 2. Navigate to a chat, switch runtime to Cloud Sandbox (gateway mode)
# 3. Install the probe + helpers
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/probe.js
# 4. Send a tool-call message — manually or via type+press
agent-browser --cdp 9222 eval "window.__PROBE_EVENT('SENT')"
# 5. Run the multi-switch driver (auto-picks active tab as BACK and the
# rightmost inactive tab as AWAY — edit ROUND_TRIPS / DWELL_MS in the
# file if you want different timing)
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/tab-switch.js
# 6. Wait for streaming to finish, then dump
agent-browser --cdp 9222 eval --stdin \
< .agents/skills/local-testing/scripts/agent-gateway/probe-dump.js \
> /tmp/probe.json
# 7. Analyze
node .agents/skills/local-testing/scripts/agent-gateway/analyze.mjs /tmp/probe.json
```
The analyzer prints three sections: EVENTS, TIMELINE, REGRESSIONS. If
REGRESSIONS is non-empty it means content/reasoning/childN dropped on the
same topic — the symptom users describe.
## What the probe tracks (and why)
`chat.messagesMap` only stores the top-level `assistantGroup` shell. The
actual streamed content, reasoning, and tool calls live in
`assistantGroup.children: AssistantContentBlock[]`. Any probe that only
reads `m.content` / `m.reasoning` will see zeros throughout streaming and
miss everything that matters. probe.js walks both levels and sums:
- `cT` total content length
- `rT` total reasoning length
- `toolT` total tool-call count
- `childN` number of content blocks
Plus DOM-side signals (`domLen`, search/crawl indicator counts) so you can
tell store-side regressions apart from render-side regressions.
## Gotchas
- **Optimistic new-topic state.** Before the first chunk lands, messages
live under the `<scope>_new` key with `tmp_*` ids and no `topicId` field.
probe.js falls back to those when `activeTopicId` is null.
- **Reasoning resets to 0 are not bugs.** When the assistant finishes
thinking and starts tool-use or text, the streaming reasoning buffer
empties and the finalised reasoning gets sealed into a completed block.
Filter these out manually if needed.
- **DOM length jitters by a handful of chars** because counters like "(10)"
in tool-call labels change as results arrive. analyze.mjs only flags
`domLen` drops greater than 100 chars to ignore that noise.
- **Never identify tabs by innerText.** The active tab's text embeds a
` · <agent name>` suffix, so a search like `'LobeHub Growth'` matches the
active tab when the active agent happens to be LobeHub Growth — and you
end up clicking the tab you're already on. probe.js uses the stable
`data-contextmenu-trigger` attribute (a React `useId()` value that's set
per-tab and survives focus changes) plus `data-active="true"` to mark
the active one. Helpers exposed:
`__listTabs()` / `__clickTabByKey(key)` / `__clickTabByIndex(i)` /
`__activeTabKey()`.
- **`tab-switch.js` fires-and-forgets.** The IIFE kicks off an async loop
and returns immediately so the agent-browser CLI eval doesn't blow past
its default 25 s timeout. Wait on the `SWITCH_LOOP_DONE` event marker
before dumping. Re-running while a loop is in flight is refused — the
chaotic data from overlapping runs is not worth debugging.
@@ -1,142 +0,0 @@
# record-app-screen.sh
General-purpose screen recording tool for the Electron app. Captures CDP screenshots as video frames and gallery snapshots, then assembles into an MP4 on stop.
## Why CDP Screenshots Instead of ffmpeg Screen Capture
- **Works on any screen** — CDP screenshots capture the browser viewport directly, so external monitors, Retina scaling, and window positioning are all handled automatically
- **No signal handling issues** — ffmpeg-static (npm) produces corrupt MP4 files when killed (missing moov atom). CDP screenshots avoid this entirely
- **Consistent output** — Screenshots are resolution-independent and don't require crop coordinate calculations
## Commands
```bash
# Start recording (Electron must be running with CDP)
.agents/skills/local-testing/scripts/record-app-screen.sh start [output_name]
# Stop recording and assemble video
.agents/skills/local-testing/scripts/record-app-screen.sh stop
# Check if recording is active
.agents/skills/local-testing/scripts/record-app-screen.sh status
```
### Arguments
| Argument | Default | Description |
| ------------- | --------------------------- | -------------------------- |
| `output_name` | `recording-YYYYMMDD-HHMMSS` | Base name for output files |
### Environment Variables
| Variable | Default | Description |
| ---------------------- | ------- | -------------------------------------- |
| `CDP_PORT` | `9222` | Chrome DevTools Protocol port |
| `SCREENSHOT_INTERVAL` | `3` | Seconds between gallery screenshots |
| `VIDEO_FRAME_INTERVAL` | `0.5` | Seconds between video frames (\~2 fps) |
## Output Structure
```
.records/
<name>.mp4 # Video assembled from frames (~2 fps)
<name>/ # Gallery screenshots (every 3s)
0000.png
0001.png
0002.png
...
```
The `.records/` directory is at the project root and is gitignored.
## How It Works
### Start
1. Creates two background loops:
- **Video frames** — `agent-browser screenshot` every `VIDEO_FRAME_INTERVAL` seconds into a temp directory (`/tmp/record-frames-XXXXXX/`)
- **Gallery screenshots** — `agent-browser screenshot` every `SCREENSHOT_INTERVAL` seconds into `.records/<name>/`
2. Saves PIDs and paths to `/tmp/record-app-screen.pids` and `/tmp/record-app-screen.state`
### Stop
1. Kills both background loops
2. Assembles video frames into MP4 using ffmpeg:
```
ffmpeg -framerate 2 -i frame_%06d.png -c:v libx264 -crf 23 -pix_fmt yuv420p <output>.mp4
```
3. Cleans up temp frame directory
4. Reports file sizes and paths
## Usage Examples
### Basic Test Recording
```bash
# Start Electron
.agents/skills/local-testing/scripts/electron-dev.sh start
# Start recording
.agents/skills/local-testing/scripts/record-app-screen.sh start my-test
# Run automation
agent-browser --cdp 9222 click @e61
agent-browser --cdp 9222 type @e42 "hello"
agent-browser --cdp 9222 press Enter
sleep 10
# Stop and get results
.agents/skills/local-testing/scripts/record-app-screen.sh stop
# → .records/my-test.mp4 + .records/my-test/*.png
```
### Gateway Streaming Demo
```bash
.agents/skills/local-testing/scripts/electron-dev.sh start
# Inject gateway URL
agent-browser --cdp 9222 eval --stdin << 'EOF'
(function() {
var store = window.global_serverConfigStore;
store.setState({ serverConfig: { ...store.getState().serverConfig,
agentGatewayUrl: 'https://agent-gateway.lobehub.com' } });
return 'ready';
})()
EOF
# Record
.agents/skills/local-testing/scripts/record-app-screen.sh start gateway-demo
# Navigate to agent, send message, wait for completion...
# (automation commands here)
.agents/skills/local-testing/scripts/record-app-screen.sh stop
open .records/gateway-demo.mp4
```
### Check Active Recording
```bash
.agents/skills/local-testing/scripts/record-app-screen.sh status
# [record] Active recording
# Frames: 42 captured (running: yes)
# Screenshots: 14 captured (running: yes)
# Output: .records/my-test.mp4
```
## Prerequisites
- **ffmpeg** — For video assembly. Install via `bun add -g ffmpeg-static` or `brew install ffmpeg`
- **agent-browser** — For CDP screenshots. Install via `npm i -g agent-browser`
- **Electron app running** — With CDP enabled (use `electron-dev.sh start`)
## Troubleshooting
| Problem | Solution |
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| "No active recording found" on stop | PID file was cleaned up. Check if background processes are still running with `ps aux \| grep agent-browser` |
| "A recording is already active" | Run `stop` first, or manually clean: `rm /tmp/record-app-screen.pids /tmp/record-app-screen.state` |
| Video is 0 bytes | No frames were captured. Ensure Electron is running and CDP port is correct |
| Screenshots are blank/white | SPA may not have loaded yet. Wait for `electron-dev.sh` to report "Renderer ready" |
| ffmpeg assembly fails | Check `/tmp/ffmpeg-assemble.log`. Ensure ffmpeg is installed and frames exist |
@@ -1,243 +0,0 @@
// Analyzer for probe-events dumps. Reads a JSON file produced by `run.ts dump`
// and prints a layered breakdown:
//
// 1. STREAM EVENTS — every non-chunk WS/SSE event in receipt order
// 2. CHUNKS SUMMARY — collapsed per-step chunk counts (otherwise floods)
// 3. ACTION CALLS — replaceMessages / refreshMessages / MARK:* with stack
// 4. CORRELATION — calls ↔ nearest stream event within ±300ms
// 5. PER-KEY ASSISTANT GROWTH — for each messagesMap key, when the leading
// assistant message's cLen / rLen actually moves (this is what reveals
// "chunks arrived but the message never grew" regressions)
// 6. ROLLBACKS — msgN / childN / role drops in the active-topic timeline
//
// Usage:
// bun run .agents/skills/local-testing/scripts/agent-gateway/analyze-events.ts <dump.json>
import { readFileSync } from 'node:fs';
import type {
ProbeActionCall,
ProbeDump,
ProbeMessageSummary,
ProbeStreamEvent,
ProbeTimelineSample,
} from './types';
const file = process.argv[2];
if (!file) {
console.error('usage: bun run analyze-events.ts <dump.json>');
process.exit(1);
}
const raw = readFileSync(file, 'utf8');
// agent-browser eval --stdin wraps return values in quotes when the value is
// a string — so the JSON file may be double-encoded depending on how it was
// captured. Handle both.
const parsedOnce = JSON.parse(raw) as ProbeDump | string;
const dump: ProbeDump = typeof parsedOnce === 'string' ? JSON.parse(parsedOnce) : parsedOnce;
const { streamEvents = [], actionCalls = [], timeline = [] } = dump;
const pad = (v: unknown, n: number) => String(v).padStart(n);
// ── META ───────────────────────────────────────────────────────────
console.log('=== META ===');
console.log(` events: ${streamEvents.length}`);
console.log(` calls: ${actionCalls.length}`);
console.log(` timeline: ${timeline.length}`);
// ── 1. STREAM EVENTS (non-chunk) ───────────────────────────────────
const nonChunkEvents = streamEvents.filter((e) => e.type !== 'stream_chunk');
const chunkEvents = streamEvents.filter((e) => e.type === 'stream_chunk');
console.log(
`\n=== STREAM EVENTS (${nonChunkEvents.length} non-chunk + ${chunkEvents.length} chunks elided) ===`,
);
for (const e of nonChunkEvents) {
const dataStr = e.dataKeys?.length ? ` [${e.dataKeys.join(',')}]` : '';
const data = e.data as Record<string, unknown> | undefined;
const uiHint = data?.uiMessagesPreview
? ` uiPreview=${JSON.stringify(data.uiMessagesPreview)}`
: data?.uiMessagesTotal
? ` uiTotal=${data.uiMessagesTotal}`
: '';
const phaseHint = data?.phase ? ` phase=${data.phase}` : '';
const extra = e.serverType ? ` serverType=${e.serverType}` : '';
console.log(
` t=${pad(e.t, 7)} [${(e.transport ?? '?').padEnd(3)}] step=${pad(e.stepIndex ?? '-', 2)} ` +
`type=${(e.type ?? '').padEnd(22)} op=${e.opIdTail ?? '-'}${phaseHint}${uiHint}${extra}${dataStr}`,
);
}
// ── 2. CHUNK SUMMARY ───────────────────────────────────────────────
console.log('\n=== CHUNKS SUMMARY (per step / chunkType) ===');
const chunkBuckets = new Map<string, { count: number; firstT: number; lastT: number }>();
for (const c of chunkEvents) {
const data = c.data as Record<string, unknown> | undefined;
const ct = (data?.chunkType as string | undefined) ?? '?';
const key = `step=${c.stepIndex ?? '-'} chunkType=${ct.padEnd(8)} op=${c.opIdTail}`;
const slot = chunkBuckets.get(key);
if (slot) {
slot.count += 1;
slot.lastT = c.t;
} else {
chunkBuckets.set(key, { count: 1, firstT: c.t, lastT: c.t });
}
}
for (const [k, v] of chunkBuckets) {
console.log(` ${k} count=${pad(v.count, 4)} t=${pad(v.firstT, 7)}..${pad(v.lastT, 7)}`);
}
// ── 3. ACTION CALLS ───────────────────────────────────────────────
console.log('\n=== ACTION CALLS (replace/refresh/MARK) ===');
for (const c of actionCalls) {
if (c.name?.startsWith('MARK:')) {
console.log(` t=${pad(c.t, 7)} ${c.name}`);
continue;
}
const snapshot = (c.args as any)?.snapshot as
| Array<{ id: string; role: string; cLen: number; rLen: number }>
| undefined;
const snapStr = snapshot?.length
? ' snapshot=' + snapshot.map((m) => `${m.id}:${m.role}/c${m.cLen}/r${m.rLen}`).join(' | ')
: '';
const summary =
c.name === 'replaceMessages'
? `count=${c.args?.count} action=${(c.args?.params as any)?.action ?? '-'}${snapStr}`
: c.name === 'refreshMessages'
? `ctx=${JSON.stringify(c.args?.context)}`
: c.error
? `error=${c.error}`
: '';
console.log(` t=${pad(c.t, 7)} ${c.name.padEnd(20)} ${summary}`);
if (c.stack) {
const frames = c.stack
.split(' ← ')
.filter((f) => !!f && !f.includes('Object.<anonymous>'))
.slice(0, 3);
for (const f of frames) console.log(`${f}`);
}
}
// ── 4. CORRELATION ────────────────────────────────────────────────
function nearestEventForCall(
call: ProbeActionCall,
windowMs = 300,
): { event: ProbeStreamEvent; delta: number } | null {
let best: ProbeStreamEvent | null = null;
let bestDelta = Infinity;
for (const e of streamEvents) {
const d = Math.abs(e.t - call.t);
if (d < bestDelta && d <= windowMs) {
bestDelta = d;
best = e;
}
}
return best ? { event: best, delta: bestDelta } : null;
}
console.log('\n=== CORRELATION (replace/refresh ↔ nearest event within ±300ms) ===');
for (const c of actionCalls) {
if (c.name !== 'refreshMessages' && c.name !== 'replaceMessages') continue;
const hit = nearestEventForCall(c);
if (hit) {
const phase = (hit.event.data as Record<string, unknown> | undefined)?.phase;
console.log(
` t=${pad(c.t, 7)} ${c.name.padEnd(16)} ← Δ${pad(hit.delta, 4)}ms ${hit.event.type}` +
(phase ? ` phase=${phase}` : ''),
);
} else {
console.log(` t=${pad(c.t, 7)} ${c.name.padEnd(16)} ← (no event nearby — external trigger)`);
}
}
// ── 5. PER-KEY ASSISTANT GROWTH ───────────────────────────────────
// For each messagesMap key, find the trailing assistant message and report
// the points in time where its cLen / rLen actually changed. If the timeline
// shows chunks arriving but the assistant cLen never moves, that's the
// signature of "dispatch queue blocked / messageId mismatch".
console.log('\n=== PER-KEY ASSISTANT GROWTH ===');
const keysEverSeen = new Set<string>();
for (const s of timeline) for (const k of Object.keys(s.byKey ?? {})) keysEverSeen.add(k);
for (const key of keysEverSeen) {
console.log(`\n key=${key}`);
let lastSig: string | null = null;
for (const s of timeline) {
const slot = s.byKey?.[key];
if (!slot) continue;
const last = slot.msgs.at(-1) as ProbeMessageSummary | undefined;
if (!last) continue;
const sig = `${last.id}|c${last.cLen}|r${last.rLen}|n${slot.n}`;
if (sig === lastSig) continue;
lastSig = sig;
console.log(
` t=${pad(s.t, 7)} msgN=${pad(slot.n, 3)} ` +
`lastAssistant=${last.id} cLen=${pad(last.cLen, 5)} rLen=${pad(last.rLen, 5)}` +
` runOps=${s.runOps}`,
);
}
}
// ── 6. ROLLBACKS (active-topic msgN / childN / role drops) ─────────
console.log('\n=== ROLLBACKS (active-topic msgN / childN / role drops) ===');
let prev: ProbeTimelineSample | null = null;
const rollbacks: Array<{ t: number; topic: string | null; drops: string[] }> = [];
const flatten = (s: ProbeTimelineSample) => {
if (!s.activeTopic) return [];
return Object.entries(s.byKey ?? {})
.filter(([k]) => k.includes(s.activeTopic!))
.flatMap(([, v]) => v.msgs);
};
for (const s of timeline) {
if (s.err) {
prev = null;
continue;
}
if (!prev || prev.activeTopic !== s.activeTopic) {
prev = s;
continue;
}
const prevMsgs = flatten(prev);
const curMsgs = flatten(s);
const drops: string[] = [];
if (curMsgs.length < prevMsgs.length) drops.push(`msgN ${prevMsgs.length}${curMsgs.length}`);
let prevChild = 0;
let curChild = 0;
for (const m of prevMsgs) prevChild += m.chN ?? 0;
for (const m of curMsgs) curChild += m.chN ?? 0;
if (curChild < prevChild) drops.push(`childN ${prevChild}${curChild}`);
const prevById = new Map(prevMsgs.map((m) => [m.id, m]));
for (const m of curMsgs) {
const pr = prevById.get(m.id);
if (!pr) continue;
if (m.cLen < pr.cLen) drops.push(`cLen[${m.id}] ${pr.cLen}${m.cLen}`);
if (m.rLen < pr.rLen) drops.push(`rLen[${m.id}] ${pr.rLen}${m.rLen}`);
}
if (drops.length) rollbacks.push({ t: s.t, topic: s.activeTopic, drops });
prev = s;
}
if (rollbacks.length === 0) {
console.log(' (none)');
} else {
for (const r of rollbacks) {
const nearEvent = streamEvents
.filter((e) => Math.abs(e.t - r.t) <= 300)
.map((e) => `${e.type}${(e.data as any)?.phase ? ':' + (e.data as any).phase : ''}`);
const nearCall = actionCalls
.filter((c) => Math.abs(c.t - r.t) <= 300 && !c.name?.startsWith('MARK:'))
.map((c) => c.name);
console.log(
` t=${pad(r.t, 7)} topic=${r.topic} ${r.drops.join(' | ')}` +
(nearEvent.length ? ` near-event:[${nearEvent.join(',')}]` : '') +
(nearCall.length ? ` near-call:[${nearCall.join(',')}]` : ''),
);
}
}
@@ -1,119 +0,0 @@
#!/usr/bin/env node
// Analyze a probe dump captured by probe.js + probe-dump.js.
//
// node analyze.mjs /tmp/probe.json
//
// Prints:
// 1. EVENTS — user-action markers with their relative timestamps
// 2. TIMELINE — periodic samples (~1 per second + event-adjacent samples)
// showing every interesting field; columns:
// t(ms) | runOps | msgN | childN | content | reasoning | tools | domLen | search | crawl | topic | event
// 3. REGRESSIONS — every place a tracked counter *dropped* on the same
// topic between adjacent samples. A "true" UI rollback shows up as a
// drop in content/reasoning/tools/childN/domLen without a topic change.
//
// Whitelisted transitions (not flagged):
// - topic change → all drops expected (focus moved away)
// - reasoning length 0 after content starts → reasoning gets sealed into a
// completed sub-block; the parent's running reasoning resets to ''.
// - msgN drop when topic transitions from `_new` placeholder to a real id.
import fs from 'node:fs';
const file = process.argv[2];
if (!file) {
console.error('usage: node analyze.mjs <probe.json>');
process.exit(1);
}
const raw = JSON.parse(fs.readFileSync(file, 'utf8'));
// probe-dump.js wraps the payload in JSON.stringify so agent-browser returns
// it as a single quoted string. Unwrap.
const data = typeof raw === 'string' ? JSON.parse(raw) : raw;
const { events, samples } = data;
const fmt = {
pad(v, n) {
return String(v).padStart(n);
},
};
console.log('=== EVENTS ===');
for (const e of events) console.log(` t=${fmt.pad(e.t, 7)} ${e.name}`);
console.log(
'\n=== TIMELINE (~1s cadence, plus event-adjacent samples) ===\n' +
' t(ms) runOps msgN childN content reasoning tools domLen search crawl topic event',
);
let lastSampledAt = -1e9;
const eventBuckets = events.map((e) => e.t);
for (let i = 0; i < samples.length; i++) {
const s = samples[i];
const nearEvent = eventBuckets.some((et) => Math.abs(et - s.t) < 110);
if (!nearEvent && s.t - lastSampledAt < 1000) continue;
lastSampledAt = s.t;
const ev = events.find((e) => Math.abs(e.t - s.t) < 110);
const evMarker = ev ? `${ev.name}` : '';
const topicSuffix = s.topicId ? s.topicId.slice(-6) : '(none)';
const search = s.ind?.search ?? 0;
const crawl = s.ind?.crawl ?? 0;
console.log(
` ${fmt.pad(s.t, 6)} ` +
`${fmt.pad(s.runOps, 6)} ` +
`${fmt.pad(s.msgN, 4)} ` +
`${fmt.pad(s.childN ?? 0, 5)} ` +
`${fmt.pad(s.cT ?? 0, 8)} ` +
`${fmt.pad(s.rT ?? 0, 9)} ` +
`${fmt.pad(s.toolT ?? 0, 5)} ` +
`${fmt.pad(s.domLen ?? 0, 7)} ` +
`${fmt.pad(search, 6)} ` +
`${fmt.pad(crawl, 5)} ` +
`${topicSuffix.padEnd(8)}${evMarker}`,
);
}
console.log('\n=== REGRESSIONS (same topic, value dropped) ===');
const regressions = [];
for (let i = 1; i < samples.length; i++) {
const prev = samples[i - 1];
const cur = samples[i];
if (!cur.topicId || prev.topicId !== cur.topicId) continue;
const drops = [];
if (cur.msgN < prev.msgN) drops.push(`msgN: ${prev.msgN}${cur.msgN}`);
if ((cur.childN ?? 0) < (prev.childN ?? 0)) drops.push(`childN: ${prev.childN}${cur.childN}`);
if ((cur.cT ?? 0) < (prev.cT ?? 0)) drops.push(`content: ${prev.cT}${cur.cT}`);
if ((cur.rT ?? 0) < (prev.rT ?? 0)) drops.push(`reasoning: ${prev.rT}${cur.rT}`);
if ((cur.toolT ?? 0) < (prev.toolT ?? 0)) drops.push(`tools: ${prev.toolT}${cur.toolT}`);
// domLen jitters by a few chars from counter labels — only flag big drops.
if ((cur.domLen ?? 0) < (prev.domLen ?? 0) - 100) {
drops.push(`domLen: ${prev.domLen}${cur.domLen}`);
}
if (drops.length === 0) continue;
const nearbyEv = events.filter((e) => Math.abs(e.t - cur.t) < 600).map((e) => e.name);
regressions.push({ t: cur.t, topic: cur.topicId.slice(-6), drops, nearbyEv });
}
if (regressions.length === 0) {
console.log(' (none)');
} else {
for (const r of regressions) {
const evStr = r.nearbyEv.length ? ` near:[${r.nearbyEv.join(',')}]` : '';
console.log(` t=${fmt.pad(r.t, 7)} topic=${r.topic} ${r.drops.join(' | ')}${evStr}`);
}
}
console.log(`\n=== SUMMARY ===`);
console.log(` samples: ${samples.length}`);
console.log(` events: ${events.length}`);
console.log(` regressions: ${regressions.length}`);
if (samples.length) {
const last = samples.at(-1);
console.log(
` final: msgN=${last.msgN} childN=${last.childN ?? 0} content=${last.cT ?? 0} ` +
`reasoning=${last.rT ?? 0} tools=${last.toolT ?? 0} runOps=${last.runOps}`,
);
}
@@ -1,17 +0,0 @@
// Stop the probe and serialize collected data.
//
// agent-browser --cdp 9222 eval --stdin < probe-dump.js > /tmp/probe.json
//
// The whole thing is wrapped in a JSON.stringify so agent-browser returns it
// as a single quoted string — the analyzer double-parses to handle that.
(function () {
if (window.__PROBE_TIMER) {
clearInterval(window.__PROBE_TIMER);
window.__PROBE_TIMER = null;
}
return JSON.stringify({
events: window.__PROBE_EVENTS || [],
samples: window.__PROBE_SAMPLES || [],
});
})();
@@ -1,37 +0,0 @@
// Stops the events-probe timeline timer and stashes the full capture as a
// JSON string on `window.__PROBE_LAST_DUMP_JSON`. `run.ts` wraps the bundle
// in an IIFE that returns that global, which `agent-browser eval` prints to
// stdout — the runner then persists it under `.agent-gateway/`.
import type { ProbeDump } from './types';
declare global {
interface Window {
__PROBE_LAST_DUMP_JSON?: string;
}
}
const w = window;
if (w.__PROBE_TIMELINE_TIMER) {
clearInterval(w.__PROBE_TIMELINE_TIMER);
w.__PROBE_TIMELINE_TIMER = null;
}
const mutations = w.__PROBE_MUTATIONS ?? [];
const dump: ProbeDump & { mutations: typeof mutations } = {
meta: {
t0: w.__PROBE_T0 ?? 0,
collectedAt: Date.now(),
sampleCount: (w.__PROBE_MSG_TIMELINE ?? []).length,
eventCount: (w.__PROBE_STREAM_EVENTS ?? []).length,
callCount: (w.__PROBE_ACTION_CALLS ?? []).length,
},
streamEvents: w.__PROBE_STREAM_EVENTS ?? [],
actionCalls: w.__PROBE_ACTION_CALLS ?? [],
timeline: w.__PROBE_MSG_TIMELINE ?? [],
mutations,
};
w.__PROBE_LAST_DUMP_JSON = JSON.stringify(dump);
@@ -1,637 +0,0 @@
// LobeHub gateway raw-event-stream probe.
//
// Gateway-mode chats subscribe via WebSocket — NOT via the `/api/agent/stream`
// SSE endpoint (that one belongs to the direct/client durable-agent runtime).
// `AgentStreamClient` (`packages/agent-gateway-client/src/client.ts`) opens
// `new WebSocket('wss://.../ws?operationId=...')`, then parses JSON frames in
// its `onmessage` handler and re-emits `agent_event.event` objects to the
// chat store.
//
// To capture the RAW gateway events before the store touches them, we wrap
// `window.WebSocket` so that for any socket whose URL contains `operationId=`
// we intercept the `onmessage` handler / `addEventListener('message')` and
// log every `agent_event` frame.
//
// We *also* keep the `window.fetch` hook for `/api/agent/stream` so this
// probe still works for direct-mode runs — but gateway-mode events come
// through the WebSocket path.
//
// Buffers (read via `dump`):
// __PROBE_STREAM_EVENTS — raw events parsed off the wire
// __PROBE_ACTION_CALLS — replaceMessages / refreshMessages calls (best-effort)
// __PROBE_MSG_TIMELINE — 200ms snapshots of every messagesMap key
import type {
ProbeActionCall,
ProbeMessageSummary,
ProbeStreamEvent,
ProbeTimelineSample,
} from './types';
// Bundled by esbuild as an IIFE. Top-level code runs once on injection.
const w = window;
// ── Buffers ─────────────────────────────────────────────────────────
declare global {
interface Window {
__PROBE_MUTATIONS?: Array<{
t: number;
key: string;
n: number;
last?: { id: string; role: string; cLen: number; rLen: number; updatedAt?: unknown };
prevLast?: { id: string; role: string; cLen: number; rLen: number };
delta?: string;
}>;
__PROBE_STORE_UNSUB?: () => void;
}
}
const events: ProbeStreamEvent[] = (w.__PROBE_STREAM_EVENTS ??= []);
const calls: ProbeActionCall[] = (w.__PROBE_ACTION_CALLS ??= []);
const timeline: ProbeTimelineSample[] = (w.__PROBE_MSG_TIMELINE ??= []);
const mutations = (w.__PROBE_MUTATIONS ??= []);
events.length = 0;
calls.length = 0;
timeline.length = 0;
mutations.length = 0;
const t0 = Date.now();
w.__PROBE_T0 = t0;
const now = (): number => Date.now() - t0;
// ── Helpers ─────────────────────────────────────────────────────────
function summarizeData(data: unknown): Record<string, unknown> | unknown {
if (!data || typeof data !== 'object') return data;
const src = data as Record<string, unknown>;
const out: Record<string, unknown> = {};
for (const k of Object.keys(src)) {
const v = src[k];
if (v == null) {
out[k] = v;
} else if (Array.isArray(v)) {
out[k] = `Array(${v.length})`;
if (k === 'uiMessages') {
out.uiMessagesPreview = v.slice(0, 5).map((m: any) => ({
id: (m.id ?? '').slice(-8),
role: m.role,
cLen: (m.content ?? '').length,
children: (m.children ?? []).length,
tools: (m.tools ?? []).length,
reasoning: (m.reasoning?.content ?? '').length,
}));
out.uiMessagesTotal = v.length;
}
} else if (typeof v === 'object') {
const obj = v as Record<string, unknown>;
out[k] =
'Object{' +
Object.keys(obj)
.slice(0, 6)
.map((kk) => kk + (typeof obj[kk] === 'string' ? `=${(obj[kk] as string).length}ch` : ''))
.join(',') +
'}';
} else if (typeof v === 'string') {
out[k] = v.length > 100 ? v.slice(0, 100) + `…(${v.length})` : v;
} else {
out[k] = v;
}
}
return out;
}
function summarizeMessages(msgs: any[]): ProbeMessageSummary[] {
return (msgs ?? []).slice(0, 80).map((m) => ({
id: (m.id ?? '').slice(-8),
role: m.role,
cLen: (m.content ?? '').length,
rLen: (m.reasoning?.content ?? '').length,
tools: (m.tools ?? []).length,
chN: (m.children ?? []).length,
}));
}
function shortStack(): string {
const raw = new Error('probe-stack').stack ?? '';
return raw
.split('\n')
.slice(3)
.filter((l) => !l.includes('probe-events') && !l.includes('node_modules'))
.map((l) => l.trim().replace(/^at\s+/, ''))
.slice(0, 6)
.join(' ← ');
}
function recordAgentEvent(args: {
transport: 'ws' | 'sse';
opId: string | null;
agentEvent: any;
eventId?: string | null;
rawLen?: number;
}): void {
const { transport, opId, agentEvent, eventId, rawLen } = args;
if (!agentEvent || typeof agentEvent !== 'object') return;
events.push({
t: now(),
transport,
opIdTail: (opId ?? '').slice(-10),
eventId: eventId ?? null,
type: agentEvent.type,
stepIndex: agentEvent.stepIndex,
dataKeys: agentEvent.data ? Object.keys(agentEvent.data) : [],
data: summarizeData(agentEvent.data) as Record<string, unknown>,
rawLen,
});
}
// ── 1. Patch window.WebSocket for gateway WS events ────────────────
if (!w.__PROBE_ORIG_WEBSOCKET) w.__PROBE_ORIG_WEBSOCKET = w.WebSocket;
const OrigWS = w.__PROBE_ORIG_WEBSOCKET;
function extractOpIdFromWsUrl(url: string | URL): string | null {
const m = String(url ?? '').match(/operationId=([^&]+)/);
return m ? decodeURIComponent(m[1]) : null;
}
function isGatewayWs(url: string | URL): boolean {
return String(url ?? '').includes('operationId=');
}
function handleWsFrame(rawData: unknown, opId: string | null): void {
const rawLen = typeof rawData === 'string' ? rawData.length : -1;
let parsed: any;
try {
parsed = typeof rawData === 'string' ? JSON.parse(rawData) : null;
} catch {
events.push({
t: now(),
transport: 'ws',
opIdTail: (opId ?? '').slice(-10),
type: '_PARSE_ERROR_',
raw: typeof rawData === 'string' && rawData.length < 400 ? rawData : '(non-string or large)',
});
return;
}
if (!parsed) return;
if (parsed.type === 'agent_event') {
recordAgentEvent({
transport: 'ws',
opId,
agentEvent: parsed.event,
eventId: parsed.id,
rawLen,
});
} else {
events.push({
t: now(),
transport: 'ws',
opIdTail: (opId ?? '').slice(-10),
type: '_SERVER_MSG_',
serverType: parsed.type,
rawLen,
});
}
}
// Wrap the constructor. Instance `constructor` will still reflect OrigWS
// (we share prototypes), so use the `_WS_OPEN_` sentinel events to confirm
// the patch is firing.
function PatchedWebSocket(this: WebSocket, url: string | URL, protocols?: string | string[]) {
const ws: WebSocket = protocols == null ? new OrigWS(url) : new OrigWS(url, protocols);
const opId = extractOpIdFromWsUrl(url);
if (!isGatewayWs(url)) return ws;
events.push({
t: now(),
transport: 'ws',
opIdTail: (opId ?? '').slice(-10),
type: '_WS_OPEN_',
url: String(url),
});
// One observer listener that always fires, regardless of how the consumer
// (AgentStreamClient uses `ws.onmessage = …`) subscribes.
ws.addEventListener('message', (e) => {
try {
handleWsFrame((e as MessageEvent).data, opId);
} catch {
/* swallow */
}
});
ws.addEventListener('close', () => {
events.push({
t: now(),
transport: 'ws',
opIdTail: (opId ?? '').slice(-10),
type: '_WS_CLOSE_',
});
});
return ws;
}
// Preserve prototype + static fields so `instanceof WebSocket` and
// `WebSocket.OPEN` constants still work.
(PatchedWebSocket as unknown as { prototype: WebSocket }).prototype = OrigWS.prototype;
for (const k of Object.keys(OrigWS) as Array<keyof typeof OrigWS>) {
try {
(PatchedWebSocket as any)[k] = (OrigWS as any)[k];
} catch {
/* readonly */
}
}
(['CONNECTING', 'OPEN', 'CLOSING', 'CLOSED'] as const).forEach((k) => {
(PatchedWebSocket as any)[k] = (OrigWS as any)[k];
});
w.WebSocket = PatchedWebSocket as unknown as typeof WebSocket;
// ── 2. Patch window.fetch for `/api/agent/stream` (direct-mode SSE) ─
if (!w.__PROBE_ORIG_FETCH) w.__PROBE_ORIG_FETCH = w.fetch.bind(w);
const origFetch = w.__PROBE_ORIG_FETCH;
function isAgentStreamUrl(input: RequestInfo | URL): boolean {
let url = '';
if (typeof input === 'string') url = input;
else if (input instanceof URL) url = input.toString();
else if (input && typeof (input as Request).url === 'string') url = (input as Request).url;
return url.includes('/api/agent/stream');
}
function extractOpIdFromHttpUrl(input: RequestInfo | URL): string | null {
const url = typeof input === 'string' ? input : (input as Request | URL).toString();
const m = url.match(/operationId=([^&]+)/);
return m ? decodeURIComponent(m[1]) : null;
}
function pushFromSSEFrame(rawFrame: string, opId: string | null): void {
const lines = rawFrame.split('\n');
let dataJson = '';
let evtName = 'message';
for (const line of lines) {
if (line.startsWith('event:')) evtName = line.slice(6).trim();
else if (line.startsWith('data:')) dataJson += line.slice(5).trim();
}
if (!dataJson) return;
let parsed: any;
try {
parsed = JSON.parse(dataJson);
} catch {
events.push({
t: now(),
transport: 'sse',
opIdTail: (opId ?? '').slice(-10),
type: '_PARSE_ERROR_',
sseEvent: evtName,
raw: dataJson.length > 400 ? dataJson.slice(0, 400) + '…' : dataJson,
});
return;
}
recordAgentEvent({
transport: 'sse',
opId,
agentEvent: parsed,
eventId: null,
rawLen: dataJson.length,
});
}
async function teeAndDrain(response: Response, opId: string | null): Promise<Response> {
if (!response.body) return response;
const [a, b] = response.body.tee();
void (async () => {
const reader = b.getReader();
const decoder = new TextDecoder();
let buf = '';
try {
while (true) {
const { value, done } = await reader.read();
if (done) break;
buf += decoder.decode(value, { stream: true });
let idx: number;
while ((idx = buf.indexOf('\n\n')) !== -1) {
const frame = buf.slice(0, idx);
buf = buf.slice(idx + 2);
if (frame.trim()) pushFromSSEFrame(frame, opId);
}
}
if (buf.trim()) pushFromSSEFrame(buf, opId);
} catch (e: any) {
events.push({
t: now(),
transport: 'sse',
opIdTail: (opId ?? '').slice(-10),
type: '_TEE_ERROR_',
message: String(e?.message ?? e),
});
}
})();
return new Response(a, {
headers: response.headers,
status: response.status,
statusText: response.statusText,
});
}
w.fetch = async function patchedFetch(input: RequestInfo | URL, init?: RequestInit) {
const response = await origFetch(input as any, init);
if (!isAgentStreamUrl(input)) return response;
const opId = extractOpIdFromHttpUrl(input);
const url =
typeof input === 'string'
? input.split('?')[0]
: (input as Request | URL).toString().split('?')[0];
events.push({
t: now(),
transport: 'sse',
opIdTail: (opId ?? '').slice(-10),
type: '_CONNECTED_',
url,
status: response.status,
});
return teeAndDrain(response, opId);
} as typeof fetch;
// ── 3. Wrap store actions (best-effort for "who called replace") ────
// Side-global stash for the original chat-store actions. Re-installs ALWAYS
// rewrap from the originals so updates to the probe body take effect
// without a page reload — using only a `__probeWrapped` flag on the chat
// state object would freeze the first-installed wrapper across re-installs.
declare global {
interface Window {
__PROBE_ORIG_REFRESH_MESSAGES?: any;
__PROBE_ORIG_REPLACE_MESSAGES?: any;
}
}
try {
const chat = w.__LOBE_STORES?.chat?.();
if (chat) {
// First-time install: cache the originals. Re-install: restore from
// the cached originals before wrapping again.
if (!w.__PROBE_ORIG_REFRESH_MESSAGES) w.__PROBE_ORIG_REFRESH_MESSAGES = chat.refreshMessages;
if (!w.__PROBE_ORIG_REPLACE_MESSAGES) w.__PROBE_ORIG_REPLACE_MESSAGES = chat.replaceMessages;
const origRefresh = w.__PROBE_ORIG_REFRESH_MESSAGES;
const origReplace = w.__PROBE_ORIG_REPLACE_MESSAGES;
chat.refreshMessages = origRefresh;
chat.replaceMessages = origReplace;
chat.refreshMessages = async function probeRefresh(this: unknown, ...args: any[]) {
calls.push({
t: now(),
name: 'refreshMessages',
args: { context: args[0] ?? null },
stack: shortStack(),
});
return origRefresh.apply(this, args);
};
chat.replaceMessages = function probeReplace(this: unknown, ...args: any[]) {
const msgs = (args[0] as any[]) ?? [];
const snapshot = msgs.slice(-2).map((m) => ({
id: (m.id ?? '').slice(-8),
role: m.role,
cLen: (m.content ?? '').length,
rLen: (m.reasoning?.content ?? '').length,
updatedAt: m.updatedAt,
}));
calls.push({
t: now(),
name: 'replaceMessages',
args: { count: msgs.length, params: args[1] ?? null, snapshot } as any,
stack: shortStack(),
});
// Pair the call with a mutation row so the analyzer can build a
// single ordered timeline across replaceMessages + dispatchMessage.
const stackTop = shortStack().split(' ← ')[0]?.slice(0, 80);
const last = msgs.at(-1);
const lastSum = last
? {
id: (last.id ?? '').slice(-8),
role: last.role,
cLen: (last.content ?? '').length,
rLen: (last.reasoning?.content ?? '').length,
updatedAt: last.updatedAt,
}
: undefined;
const params: any = args[1] ?? {};
const ctxKey = params.context
? `main_${params.context.agentId ?? '?'}_${
params.context.topicId ? 'tpc_' + params.context.topicId : 'new'
}`.replace('main_tpc_', 'main_') // crude key inference
: '(no-ctx)';
mutations.push({
t: now(),
key: ctxKey,
n: msgs.length,
last: lastSum,
delta: `replaceMessages(action=${params.action ?? '-'}) src=${stackTop ?? '-'}`,
});
return origReplace.apply(this, args);
};
}
} catch (e: any) {
calls.push({ t: now(), name: '_WRAP_ERROR_', error: String(e?.message ?? e) });
}
// ── 3.5. Mutation log — wrap the TWO ChatStore writers (replaceMessages,
// internal_dispatchMessage) to record EVERY dbMessagesMap[key] reference
// change with a one-line "before/after last assistant message" delta. This
// reveals dispatchMessage-driven collapses that the replaceMessages wrap
// alone cannot see.
declare global {
interface Window {
__PROBE_ORIG_DISPATCH_MESSAGE?: any;
}
}
try {
const chat = w.__LOBE_STORES?.chat?.();
if (chat?.internal_dispatchMessage) {
if (!w.__PROBE_ORIG_DISPATCH_MESSAGE)
w.__PROBE_ORIG_DISPATCH_MESSAGE = chat.internal_dispatchMessage;
const origDispatch = w.__PROBE_ORIG_DISPATCH_MESSAGE;
chat.internal_dispatchMessage = origDispatch;
chat.internal_dispatchMessage = function probeDispatch(this: unknown, payload: any, ctx?: any) {
// Snapshot BEFORE — read the would-be target key + last message.
const before = (() => {
try {
const state = w.__LOBE_STORES?.chat?.();
if (!state) return null;
// Replicate state.internal_getConversationContext logic enough to
// resolve a key — but most callers pass operationId on ctx, and
// operationId-keyed lookup needs store internals. Easiest: snapshot
// ALL keys' last-assistant cLen and compare BEFORE vs AFTER below.
const map = state.dbMessagesMap ?? {};
const out: Record<string, any> = {};
for (const k of Object.keys(map)) {
const last = (map[k] ?? []).at(-1);
out[k] = last
? {
id: (last.id ?? '').slice(-8),
cLen: (last.content ?? '').length,
rLen: (last.reasoning?.content ?? '').length,
n: map[k].length,
}
: { n: 0 };
}
return out;
} catch {
return null;
}
})();
const result = origDispatch.apply(this, [payload, ctx]);
// Snapshot AFTER — find which key(s) actually changed.
try {
const state = w.__LOBE_STORES?.chat?.();
if (state && before) {
const map = state.dbMessagesMap ?? {};
for (const k of Object.keys(map)) {
const last = (map[k] ?? []).at(-1);
const beforeSnap = before[k];
const afterSnap = last
? {
id: (last.id ?? '').slice(-8),
cLen: (last.content ?? '').length,
rLen: (last.reasoning?.content ?? '').length,
n: map[k].length,
}
: { n: 0 };
const changed =
!beforeSnap ||
beforeSnap.n !== afterSnap.n ||
beforeSnap.id !== (afterSnap as any).id ||
beforeSnap.cLen !== (afterSnap as any).cLen ||
beforeSnap.rLen !== (afterSnap as any).rLen;
if (!changed) continue;
let delta = '';
if (beforeSnap?.id !== undefined && beforeSnap.id !== (afterSnap as any).id)
delta += `id:${beforeSnap.id}${(afterSnap as any).id};`;
if (
beforeSnap?.cLen !== undefined &&
(afterSnap as any).cLen !== undefined &&
(afterSnap as any).cLen < beforeSnap.cLen
)
delta += `cLen↓${beforeSnap.cLen}${(afterSnap as any).cLen};`;
if (
beforeSnap?.rLen !== undefined &&
(afterSnap as any).rLen !== undefined &&
(afterSnap as any).rLen < beforeSnap.rLen
)
delta += `rLen↓${beforeSnap.rLen}${(afterSnap as any).rLen};`;
if (beforeSnap?.n !== undefined && afterSnap.n < beforeSnap.n)
delta += `n↓${beforeSnap.n}${afterSnap.n};`;
mutations.push({
t: now(),
key: k,
n: afterSnap.n,
last: (afterSnap as any).id ? (afterSnap as any) : undefined,
prevLast: beforeSnap?.id ? beforeSnap : undefined,
delta: delta || `dispatch:${payload?.type}`,
});
}
}
} catch (e: any) {
mutations.push({
t: now(),
key: '_DISPATCH_PROBE_ERROR_',
n: -1,
delta: String(e?.message ?? e),
});
}
return result;
};
}
} catch (e: any) {
calls.push({ t: now(), name: '_DISPATCH_WRAP_ERROR_', error: String(e?.message ?? e) });
}
// ── 4. Periodic per-key timeline snapshots ─────────────────────────
function captureTimeline(): void {
try {
const c = w.__LOBE_STORES?.chat?.();
if (!c) return;
const msgsMap = (c.messagesMap ?? {}) as Record<string, any[]>;
const dbMap = (c.dbMessagesMap ?? {}) as Record<string, any[]>;
const byKey: ProbeTimelineSample['byKey'] = {};
for (const k of Object.keys(msgsMap)) {
const display = msgsMap[k] ?? [];
const db = dbMap[k] ?? [];
if (display.length === 0 && db.length === 0) continue;
byKey[k] = {
n: display.length,
dbN: db.length,
msgs: summarizeMessages(display),
};
}
const ops = Object.values((c.operations ?? {}) as Record<string, any>);
timeline.push({
t: now(),
activeTopic: ((c.activeTopicId as string | null) ?? '').slice(-10) || null,
keys: Object.keys(byKey),
byKey,
runOps: ops.filter((o: any) => o.status === 'running').length,
});
} catch (e: any) {
timeline.push({
t: now(),
activeTopic: null,
keys: [],
byKey: {},
runOps: 0,
err: e?.message ?? String(e),
});
}
}
captureTimeline();
if (w.__PROBE_TIMELINE_TIMER) clearInterval(w.__PROBE_TIMELINE_TIMER);
w.__PROBE_TIMELINE_TIMER = setInterval(captureTimeline, 200);
// ── 5. Tab-switch helpers ──────────────────────────────────────────
function listTopBarTabs(): HTMLElement[] {
return Array.from(
document.querySelectorAll<HTMLElement>(
'[data-insp-path*="TabItem.tsx"][data-contextmenu-trigger]',
),
).filter((t) => t.getBoundingClientRect().top < 30);
}
w.__listTabs = () =>
listTopBarTabs().map((t, i) => ({
i,
key: t.getAttribute('data-contextmenu-trigger'),
active: t.getAttribute('data-active') === 'true',
title: (t.innerText ?? '').slice(0, 60),
}));
w.__clickTabByKey = (key: string) => {
const tab = listTopBarTabs().find((t) => t.getAttribute('data-contextmenu-trigger') === key);
if (!tab) return 'not found: ' + key;
if (tab.getAttribute('data-active') === 'true') return 'already active: ' + key;
tab.click();
return 'clicked key=' + key;
};
w.__PROBE_EVENT = (name: string) => {
calls.push({ t: now(), name: 'MARK:' + name });
};
// `run.ts` wraps the bundle in an IIFE and appends a `return <confirmation>`
// after the bundle body — agent-browser then prints the confirmation back to
// the operator. Nothing to do here at the end of the module body.
@@ -1,204 +0,0 @@
// LobeHub chat streaming time-series probe.
//
// Inject into the renderer (via agent-browser eval) to record store + DOM
// snapshots every 200ms during a streaming session. Designed to surface
// "UI rolled back to an earlier state" symptoms — especially around
// gateway-mode tab switches that happen while the assistant is still writing.
//
// Usage:
// agent-browser --cdp 9222 eval --stdin < probe.js
// # ...do test interactions, call window.__PROBE_EVENT('LABEL') to mark moments...
// agent-browser --cdp 9222 eval --stdin < probe-dump.js > /tmp/probe.json
// node analyze.mjs /tmp/probe.json
//
// What it captures per sample:
// - activeTopicId
// - msgN: top-level messages in chat.messagesMap for this topic
// - childN: total assistantGroup.children blocks across all msgs (THIS is
// where streaming content actually lives — top-level assistantGroup stays empty)
// - cT / rT / toolT: totals across messages AND their children
// (content, reasoning, tool-call count)
// - perMsg: per-message breakdown so regressions can be located precisely
// - runOps: number of running operations (execServerAgentRuntime etc.)
// - domLen: total innerText length of the rendered chat list area
// - ind: visible UI indicators (Search pages, Crawled pages, Deeply Thought, Sending)
//
// Event markers: window.__PROBE_EVENT('NAME') records {t, name} into
// __PROBE_EVENTS, used by the analyzer to align state changes with
// user-driven actions (SENT, AWAY_1, BACK_1, ...).
(function () {
if (window.__PROBE_TIMER) clearInterval(window.__PROBE_TIMER);
window.__PROBE_SAMPLES = [];
window.__PROBE_EVENTS = [];
const t0 = Date.now();
function snapshot() {
try {
const chat = window.__LOBE_STORES.chat();
const topicId = chat.activeTopicId;
const idTail = topicId ? topicId.replace('tpc_', '') : null;
const keys = Object.keys(chat.messagesMap || {});
// Collect messages for the active topic. Before a topic is committed,
// optimistic messages live under the `<agentScope>_new` key — fall
// back to those when no topic is active yet.
let msgs = [];
if (idTail) {
keys.forEach((k) => {
if (k.includes(idTail)) msgs = msgs.concat(chat.messagesMap[k] || []);
});
} else {
keys
.filter((k) => k.endsWith('_new'))
.forEach((k) => {
msgs = msgs.concat(chat.messagesMap[k] || []);
});
}
// Walk top-level + assistantGroup.children. children carry the actual
// streamed content / reasoning / tool calls; the parent assistantGroup
// remains a placeholder (cLen=0, rLen=0) for its whole lifetime.
let totalContent = 0;
let totalReason = 0;
let totalTools = 0;
let childCount = 0;
const perMsg = msgs.map((m) => {
const cLen = (m.content || '').length;
const rLen = ((m.reasoning && m.reasoning.content) || '').length;
const tools = (m.tools || []).length;
totalContent += cLen;
totalReason += rLen;
totalTools += tools;
const children = m.children || [];
let chC = 0;
let chR = 0;
let chT = 0;
children.forEach((c) => {
chC += (c.content || '').length;
chR += ((c.reasoning && c.reasoning.content) || '').length;
chT += (c.tools || []).length;
});
totalContent += chC;
totalReason += chR;
totalTools += chT;
childCount += children.length;
return {
id: (m.id || '').slice(-8),
role: m.role,
cLen,
rLen,
tools,
chCount: children.length,
chC,
chR,
chT,
};
});
const ops = Object.values(chat.operations || {});
const runningOps = ops.filter((o) => o.status === 'running');
// DOM probe: total rendered text in the chat scroll area (proxy for
// "how much is actually visible to the user").
const convScroll =
document.querySelector(
'[data-chat-list], [class*="ChatList"], [class*="ConversationList"]',
) ||
document.querySelector('main [class*="scroll"]') ||
document.querySelector('main');
const domTxt = convScroll ? convScroll.innerText || '' : '';
const bodyTxt = document.body.innerText || '';
const searchMatches = (bodyTxt.match(/Search pages?:|Searched the web/g) || []).length;
const crawlMatches = (bodyTxt.match(/Crawl(ed|ing) pages?/g) || []).length;
window.__PROBE_SAMPLES.push({
t: Date.now() - t0,
topicId,
msgN: msgs.length,
childN: childCount,
cT: totalContent,
rT: totalReason,
toolT: totalTools,
perMsg,
runOps: runningOps.length,
runOpTypes: runningOps.map((o) => o.type),
domLen: domTxt.length,
ind: {
search: searchMatches,
crawl: crawlMatches,
sending: bodyTxt.includes('Sending message'),
deeplyThinking: bodyTxt.includes('Deeply Thinking'),
deeplyThought: bodyTxt.includes('Deeply Thought'),
},
});
} catch (e) {
window.__PROBE_SAMPLES.push({ t: Date.now() - t0, err: e.message });
}
}
snapshot();
window.__PROBE_TIMER = setInterval(snapshot, 200);
window.__PROBE_EVENT = function (name) {
window.__PROBE_EVENTS.push({ t: Date.now() - t0, name });
};
// Tab-switch helpers installed alongside the probe.
//
// The Electron tab bar mounts each tab as a div with data-insp-path
// ending in `TabItem.tsx:...`. The active tab is marked with
// data-active="true". DO NOT search by innerText — the active tab's text
// includes a ` · <agent name>` suffix that produces false matches when
// your search string happens to overlap with the agent name.
function listTabs() {
return Array.from(
document.querySelectorAll('[data-insp-path*="TabItem.tsx"][data-contextmenu-trigger]'),
).filter((t) => t.getBoundingClientRect().top < 30);
}
function tabKey(el) {
// Stable for the tab's lifetime; survives focus changes.
return el.getAttribute('data-contextmenu-trigger');
}
function findActiveTab() {
return listTabs().find((t) => t.getAttribute('data-active') === 'true') || null;
}
// Click by stable key captured earlier (preferred for round-trips).
window.__clickTabByKey = function (key) {
const tab = listTabs().find((t) => tabKey(t) === key);
if (!tab) return 'not found: key=' + key;
if (tab.getAttribute('data-active') === 'true') return 'already active: ' + key;
tab.click();
return 'clicked key=' + key;
};
// Click by index in the tab strip (0-based, left-to-right).
window.__clickTabByIndex = function (i) {
const tabs = listTabs();
if (i < 0 || i >= tabs.length) return 'index out of range: ' + i + '/' + tabs.length;
const t = tabs[i];
if (t.getAttribute('data-active') === 'true') return 'already active: i=' + i;
t.click();
return 'clicked i=' + i + ' key=' + tabKey(t);
};
// Snapshot all tabs in order: [{key, active, title (first 60 chars of innerText)}]
window.__listTabs = function () {
return listTabs().map((t, i) => ({
i,
key: tabKey(t),
active: t.getAttribute('data-active') === 'true',
title: (t.innerText || '').slice(0, 60),
}));
};
window.__activeTabKey = function () {
const a = findActiveTab();
return a ? tabKey(a) : null;
};
return 'probe installed';
})();
@@ -1,211 +0,0 @@
// CLI for the agent-gateway probe.
//
// Bundles the TS probes with esbuild, pipes them into `agent-browser eval`,
// and persists dumps under `.agent-gateway/` (gitignored) for later use as
// streaming-replay test fixtures.
//
// Commands:
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts install
// Bundle probe-events.ts and inject into the CDP-attached browser.
// Re-installing clears all buffers and re-patches WebSocket / fetch.
//
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts dump [name]
// Stop the timeline timer, fetch the capture as JSON, write it to
// `.agent-gateway/<name>-<YYYYMMDD-HHmmss>.json`. `name` defaults to
// `dump`. Prints the absolute path written.
//
// bun run .agents/skills/local-testing/scripts/agent-gateway/run.ts analyze [path]
// Run analyze-events.ts on the dump. `path` defaults to the most
// recently modified file in `.agent-gateway/`.
//
// Optional flags:
// --cdp <port> CDP port (default 9222)
// --browser <bin> agent-browser binary (default 'agent-browser')
import { spawn } from 'node:child_process';
import { mkdirSync, readdirSync, statSync, writeFileSync } from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
const SCRIPT_DIR = path.dirname(fileURLToPath(import.meta.url));
// .agents/skills/local-testing/scripts/agent-gateway/ → 5 levels up
const PROJECT_ROOT = path.resolve(SCRIPT_DIR, '../../../../..');
const DUMP_DIR = path.join(PROJECT_ROOT, '.agent-gateway');
interface Flags {
browser: string;
cdp: string;
positional: string[];
}
function parseFlags(argv: string[]): Flags {
const out: Flags = { cdp: '9222', browser: 'agent-browser', positional: [] };
for (let i = 0; i < argv.length; i++) {
const a = argv[i];
if (a === '--cdp') out.cdp = argv[++i] ?? out.cdp;
else if (a === '--browser') out.browser = argv[++i] ?? out.browser;
else out.positional.push(a);
}
return out;
}
async function bundle(entry: string): Promise<string> {
// Bun.build is built into the Bun runtime — no external dep needed.
const r = await Bun.build({
entrypoints: [path.join(SCRIPT_DIR, entry)],
target: 'browser',
format: 'esm',
minify: false,
});
if (!r.success) {
const msgs = r.logs.map((l) => `${l.level}: ${l.message}`).join('\n');
throw new Error(`bundle failed for ${entry}:\n${msgs}`);
}
return await r.outputs[0].text();
}
function wrapIife(body: string, returnExpr: string): string {
// Wrap as an IIFE that swallows the bundled top-level (top-level `const`
// declarations get scoped to the IIFE, so re-injection doesn't conflict)
// and returns the configured expression — which `agent-browser eval`
// captures and prints to stdout.
return `(() => {\n${body}\n;return ${returnExpr};\n})()`;
}
function runAgentBrowserEval(flags: Flags, script: string): Promise<string> {
return new Promise((resolveP, rejectP) => {
const child = spawn(flags.browser, ['--cdp', flags.cdp, 'eval', '--stdin'], {
stdio: ['pipe', 'pipe', 'inherit'],
});
let stdout = '';
child.stdout.on('data', (chunk: Buffer) => {
stdout += chunk.toString('utf8');
});
child.on('error', rejectP);
child.on('close', (code) => {
if (code === 0) resolveP(stdout);
else rejectP(new Error(`agent-browser exited ${code}`));
});
child.stdin.write(script);
child.stdin.end();
});
}
// agent-browser prints eval results as JSON (string values are quoted).
function unquoteAgentBrowserResult(raw: string): string {
const trimmed = raw.trim();
if (trimmed.startsWith('"') && trimmed.endsWith('"')) {
try {
return JSON.parse(trimmed) as string;
} catch {
/* fall through */
}
}
return trimmed;
}
function isoStamp(): string {
const d = new Date();
const yyyy = d.getFullYear();
const mm = String(d.getMonth() + 1).padStart(2, '0');
const dd = String(d.getDate()).padStart(2, '0');
const hh = String(d.getHours()).padStart(2, '0');
const mi = String(d.getMinutes()).padStart(2, '0');
const ss = String(d.getSeconds()).padStart(2, '0');
return `${yyyy}${mm}${dd}-${hh}${mi}${ss}`;
}
function ensureDumpDir(): void {
mkdirSync(DUMP_DIR, { recursive: true });
}
function latestDump(): string | null {
ensureDumpDir();
const entries = readdirSync(DUMP_DIR)
.filter((f) => f.endsWith('.json'))
.map((f) => ({ f, mtime: statSync(path.join(DUMP_DIR, f)).mtimeMs }))
.sort((a, b) => b.mtime - a.mtime);
return entries[0] ? path.join(DUMP_DIR, entries[0].f) : null;
}
// ── Commands ────────────────────────────────────────────────────────
async function cmdInstall(flags: Flags): Promise<void> {
const body = await bundle('probe-events.ts');
const installMsg = JSON.stringify(
'events probe installed: WebSocket+fetch interception. ' +
'WS captures operationId= sockets (gateway), fetch captures /api/agent/stream (direct).',
);
const script = wrapIife(body, installMsg);
const out = await runAgentBrowserEval(flags, script);
console.log(unquoteAgentBrowserResult(out));
}
async function cmdDump(flags: Flags): Promise<void> {
const name = flags.positional[1] ?? 'dump';
const body = await bundle('probe-dump.ts');
const script = wrapIife(body, 'window.__PROBE_LAST_DUMP_JSON');
const raw = await runAgentBrowserEval(flags, script);
const json = unquoteAgentBrowserResult(raw);
ensureDumpDir();
const filename = `${name}-${isoStamp()}.json`;
const dumpPath = path.join(DUMP_DIR, filename);
writeFileSync(dumpPath, json, 'utf8');
// Validate by parsing the meta header so we error early on bad capture
try {
const parsed = JSON.parse(json) as {
meta?: { eventCount?: number; callCount?: number; sampleCount?: number };
};
const meta = parsed.meta ?? {};
console.log(
`wrote ${dumpPath} (${json.length} bytes events=${meta.eventCount ?? '?'} ` +
`calls=${meta.callCount ?? '?'} samples=${meta.sampleCount ?? '?'})`,
);
} catch {
console.log(`wrote ${dumpPath} (${json.length} bytes — JSON.parse failed; see file)`);
}
}
async function cmdAnalyze(flags: Flags): Promise<void> {
const target = flags.positional[1] ?? latestDump();
if (!target) {
console.error('no dump file found. run `dump` first or pass a path.');
process.exit(1);
}
const child = spawn('bun', ['run', path.join(SCRIPT_DIR, 'analyze-events.ts'), target], {
stdio: 'inherit',
});
await new Promise<void>((resolveP, rejectP) => {
child.on('error', rejectP);
child.on('close', (code) => (code === 0 ? resolveP() : rejectP(new Error(`exit ${code}`))));
});
}
// ── Entry point ─────────────────────────────────────────────────────
const flags = parseFlags(process.argv.slice(2));
const cmd = flags.positional[0];
const usage = `usage:
bun run run.ts install [--cdp 9222]
bun run run.ts dump [name] [--cdp 9222]
bun run run.ts analyze [path]
`;
if (!cmd) {
console.error(usage);
process.exit(1);
}
try {
if (cmd === 'install') await cmdInstall(flags);
else if (cmd === 'dump') await cmdDump(flags);
else if (cmd === 'analyze') await cmdAnalyze(flags);
else {
console.error(`unknown command: ${cmd}\n\n${usage}`);
process.exit(1);
}
} catch (e: any) {
console.error(e?.stack ?? e);
process.exit(1);
}
@@ -1,72 +0,0 @@
// Run N round-trip tab switches with event markers timed against the probe.
//
// agent-browser --cdp 9222 eval --stdin < tab-switch.js
//
// Captures the currently-active tab as the BACK target and the rightmost
// inactive tab as the AWAY target. Both are addressed by their stable
// data-contextmenu-trigger key (NOT by visible title — the active tab's
// innerText embeds a ` · <agent name>` suffix that breaks text matching).
//
// Fires the loop in the background and returns immediately so the
// agent-browser eval doesn't have to await the full ROUND_TRIPS × DWELL_MS
// duration. Wait on the `SWITCH_LOOP_DONE` event before dumping.
//
// Refuses to launch if a previous loop is still in flight.
//
// Requires probe.js to have been installed first (provides
// window.__PROBE_EVENT / __listTabs / __clickTabByKey / __activeTabKey).
(function () {
const ROUND_TRIPS = 4;
const DWELL_MS = 10_000;
if (!window.__PROBE_EVENT || !window.__listTabs || !window.__clickTabByKey) {
return 'probe not installed — eval probe.js first';
}
if (window.__SWITCH_LOOP_RUNNING) {
return 'switch loop already running — wait for SWITCH_LOOP_DONE first';
}
const tabs = window.__listTabs();
const activeTab = tabs.find((t) => t.active);
if (!activeTab) return 'no active tab — abort';
// Pick the first inactive tab as AWAY target. With multiple inactive tabs
// you'll usually want the one that's stable across the test — feel free
// to swap to tabs[tabs.length-1] if you want the rightmost.
const inactives = tabs.filter((t) => !t.active);
if (inactives.length === 0) return 'no inactive tab to switch to — abort';
const awayTab = inactives.at(-1); // rightmost inactive
const BACK_KEY = activeTab.key;
const AWAY_KEY = awayTab.key;
window.__SWITCH_LOOP_RUNNING = true;
window.__PROBE_EVENT('SWITCH_LOOP_CONFIG:back=' + BACK_KEY + ',away=' + AWAY_KEY);
(async function () {
function sleep(ms) {
return new Promise((r) => setTimeout(r, ms));
}
try {
window.__PROBE_EVENT('SWITCH_LOOP_START');
for (let i = 1; i <= ROUND_TRIPS; i++) {
window.__PROBE_EVENT('AWAY_' + i);
const awayResult = window.__clickTabByKey(AWAY_KEY);
window.__PROBE_EVENT('AWAY_' + i + '_RES:' + awayResult.slice(0, 50));
await sleep(DWELL_MS);
window.__PROBE_EVENT('BACK_' + i);
const backResult = window.__clickTabByKey(BACK_KEY);
window.__PROBE_EVENT('BACK_' + i + '_RES:' + backResult.slice(0, 50));
await sleep(DWELL_MS);
}
window.__PROBE_EVENT('SWITCH_LOOP_DONE');
} finally {
window.__SWITCH_LOOP_RUNNING = false;
}
})();
return 'switch loop kicked off (BACK=' + BACK_KEY + ', AWAY=' + AWAY_KEY + ')';
})();
@@ -1,113 +0,0 @@
// Shared types between the in-browser probe and the Node-side analyzer.
// Kept tiny on purpose — anything the analyzer can re-derive is left off.
export interface ProbeStreamEvent {
/** Summarized payload — long strings truncated, arrays printed as Array(N) */
data?: Record<string, unknown>;
/** Keys present on the event's `data` payload — useful at a glance */
dataKeys?: string[];
/** ServerMessage.id — gateway WS frames carry an event-id we may resume from */
eventId?: string | null;
message?: string;
/** Last 10 chars of the operationId (full id is excessively long) */
opIdTail: string;
raw?: string;
/** Raw frame byte length, when applicable */
rawLen?: number;
/** For non-agent_event server frames (auth_success, heartbeat_ack, …) */
serverType?: string;
sseEvent?: string;
status?: number;
stepIndex?: number;
/** Milliseconds since the probe's t0 (install time). */
t: number;
/** 'ws' for gateway WebSocket frames, 'sse' for direct /api/agent/stream */
transport: 'ws' | 'sse';
/** Either the AgentStreamEvent.type, or a probe sentinel like `_WS_OPEN_` */
type: string;
url?: string;
}
export interface ProbeActionCall {
args?: {
count?: number;
context?: unknown;
params?: unknown;
};
error?: string;
/** `replaceMessages` / `refreshMessages` / `MARK:<label>` / `_WRAP_ERROR_` */
name: string;
stack?: string;
t: number;
}
export interface ProbeMessageSummary {
/** children.length */
chN: number;
/** content.length */
cLen: number;
/** Last 8 chars of the message id */
id: string;
/** reasoning.content.length */
rLen: number;
role: string;
/** tools.length */
tools: number;
}
export interface ProbeTimelineSample {
/** Last 10 chars of activeTopicId, or null */
activeTopic: string | null;
/** Per-key breakdown: display count, db count, message summaries */
byKey: Record<
string,
{
n: number;
dbN: number;
msgs: ProbeMessageSummary[];
}
>;
err?: string;
/** All messagesMap keys that have content at this moment */
keys: string[];
/** Number of operations in 'running' status */
runOps: number;
t: number;
}
export interface ProbeDumpMeta {
callCount: number;
/** Date.now() at dump call */
collectedAt: number;
eventCount: number;
sampleCount: number;
/** Date.now() at probe install */
t0: number;
}
export interface ProbeDump {
actionCalls: ProbeActionCall[];
meta: ProbeDumpMeta;
streamEvents: ProbeStreamEvent[];
timeline: ProbeTimelineSample[];
}
/**
* Globals the probe attaches to `window`. Keeps `as any` casts at the boundary
* instead of sprinkling them through the probe body.
*/
declare global {
interface Window {
__clickTabByKey?: (key: string) => string;
__listTabs?: () => Array<{ i: number; key: string | null; active: boolean; title: string }>;
__LOBE_STORES?: Record<string, () => any>;
__PROBE_ACTION_CALLS?: ProbeActionCall[];
__PROBE_EVENT?: (label: string) => void;
__PROBE_MSG_TIMELINE?: ProbeTimelineSample[];
__PROBE_ORIG_FETCH?: typeof fetch;
__PROBE_ORIG_WEBSOCKET?: typeof WebSocket;
__PROBE_STREAM_EVENTS?: ProbeStreamEvent[];
__PROBE_T0?: number;
__PROBE_TIMELINE_TIMER?: ReturnType<typeof setInterval> | null;
}
}
@@ -11,169 +11,86 @@
# Environment variables:
# CDP_PORT — Chrome DevTools Protocol port (default: 9222)
# ELECTRON_LOG — Log file path (default: /tmp/electron-dev.log)
# ELECTRON_WAIT_S — Max seconds to wait for CDP to become reachable (default: 90)
# RENDERER_WAIT_S — Max seconds to wait for SPA after CDP is up (default: 60)
# FORCE_KILL_USER — When set to 1, silently kill the user's `bun run dev`
# Electron without confirmation (default: always confirm-by-action)
# ELECTRON_WAIT_S — Max seconds to wait for Electron process (default: 60)
# RENDERER_WAIT_S — Max seconds to wait for renderer/SPA (default: 60)
#
set -euo pipefail
CDP_PORT="${CDP_PORT:-9222}"
ELECTRON_LOG="${ELECTRON_LOG:-/tmp/electron-dev.log}"
ELECTRON_WAIT_S="${ELECTRON_WAIT_S:-90}"
ELECTRON_WAIT_S="${ELECTRON_WAIT_S:-60}"
RENDERER_WAIT_S="${RENDERER_WAIT_S:-60}"
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
PIDFILE="/tmp/electron-dev-cdp-${CDP_PORT}.pid"
# Project-scoped electron path prefix used for pgrep matching. Any Electron
# binary from this project (main + helpers, with or without --remote-debugging-port)
# starts with this string in its argv[0], so a single substring match catches all.
PROJECT_ELECTRON_PATH="${PROJECT_ROOT}/apps/desktop/node_modules/.pnpm/electron@"
# ── Helpers ──────────────────────────────────────────────────────────
# Print pid + every descendant pid (DFS via pgrep -P).
expand_descendants() {
local pid="$1"
echo "$pid"
local children
children=$(pgrep -P "$pid" 2>/dev/null || true)
for c in $children; do
expand_descendants "$c"
done
# Get the Electron binary path used by this project
electron_bin_pattern() {
echo "${PROJECT_ROOT}/apps/desktop/node_modules/.pnpm/electron@*/node_modules/electron/dist/Electron.app"
}
# Find seed PIDs related to this project's Electron dev session.
# Matches REGARDLESS of whether --remote-debugging-port was passed, so it also
# catches a plain `bun run dev` session the user started outside this script.
find_project_pids() {
# Find all PIDs related to the project's Electron dev session
find_electron_pids() {
local pids=""
# 1. Any process whose command line mentions this project's electron path
# (covers the main Electron binary AND every Helper subprocess)
local electron_pids
electron_pids=$(pgrep -f "$PROJECT_ELECTRON_PATH" 2>/dev/null || true)
pids="$pids $electron_pids"
# 1. Main Electron process (launched with --remote-debugging-port)
local main_pids
main_pids=$(pgrep -f "Electron\.app.*--remote-debugging-port=${CDP_PORT}" 2>/dev/null || true)
[ -n "$main_pids" ] && pids="$pids $main_pids"
# 2. electron-vite dev server (narrow match to avoid catching unrelated Vite invocations)
# 2. Electron Helper processes (gpu, renderer, utility) spawned from the project's electron binary
local helper_pids
helper_pids=$(pgrep -f "${PROJECT_ROOT}/apps/desktop/node_modules/.*Electron Helper" 2>/dev/null || true)
[ -n "$helper_pids" ] && pids="$pids $helper_pids"
# 3. electron-vite dev server
local vite_pids
vite_pids=$(pgrep -f "electron-vite[/.].*\\bdev\\b" 2>/dev/null || true)
pids="$pids $vite_pids"
vite_pids=$(pgrep -f "electron-vite.*dev" 2>/dev/null || true)
[ -n "$vite_pids" ] && pids="$pids $vite_pids"
# 3. The launcher subshell from a previous `start` (saved to pidfile)
# 4. PID from pidfile (fallback)
if [ -f "$PIDFILE" ]; then
local saved_pid
saved_pid=$(cat "$PIDFILE" 2>/dev/null || true)
if [ -n "$saved_pid" ] && kill -0 "$saved_pid" 2>/dev/null; then
saved_pid=$(cat "$PIDFILE")
if kill -0 "$saved_pid" 2>/dev/null; then
pids="$pids $saved_pid"
fi
fi
# 4. Whatever is currently bound to the CDP port — catches strays whose
# binary path doesn't match (e.g. orphaned from a crashed restart)
local port_pid
port_pid=$(lsof -ti tcp:"$CDP_PORT" -sTCP:LISTEN 2>/dev/null || true)
pids="$pids $port_pid"
# `|| true` because `grep -v '^$'` exits 1 when input has no non-empty
# lines, which (with pipefail + set -e) silently kills the caller.
# Deduplicate
echo "$pids" | tr ' ' '\n' | sort -u | grep -v '^$' | tr '\n' ' ' || true
}
# Wait for the CDP HTTP endpoint to respond, with a deadline + early bail-out
# if the launcher process died (no point waiting if Electron crashed).
wait_for_cdp() {
local deadline=$(( $(date +%s) + ELECTRON_WAIT_S ))
echo "[electron-dev] Waiting for CDP on port ${CDP_PORT} (up to ${ELECTRON_WAIT_S}s)..."
while [ "$(date +%s)" -lt "$deadline" ]; do
if curl -sf --max-time 2 "http://localhost:${CDP_PORT}/json/version" >/dev/null 2>&1; then
echo "[electron-dev] CDP is reachable."
return 0
fi
# If our launcher subshell died, abort early so we don't hang the full timeout
if [ -f "$PIDFILE" ]; then
local saved_pid
saved_pid=$(cat "$PIDFILE" 2>/dev/null || true)
if [ -n "$saved_pid" ] && ! kill -0 "$saved_pid" 2>/dev/null; then
echo "[electron-dev] Launcher PID $saved_pid is gone before CDP came up."
echo "[electron-dev] Last 30 lines of $ELECTRON_LOG:"
tail -30 "$ELECTRON_LOG" 2>/dev/null || true
return 1
fi
fi
sleep 2
done
echo "[electron-dev] ERROR: CDP did not respond within ${ELECTRON_WAIT_S}s"
echo "[electron-dev] Last 30 lines of $ELECTRON_LOG:"
tail -30 "$ELECTRON_LOG" 2>/dev/null || true
return 1
}
# After CDP is up, wait until the SPA renders interactive elements.
wait_for_renderer() {
local deadline=$(( $(date +%s) + RENDERER_WAIT_S ))
echo "[electron-dev] Waiting for SPA to load (up to ${RENDERER_WAIT_S}s)..."
while [ "$(date +%s)" -lt "$deadline" ]; do
local snap
snap=$(agent-browser --cdp "$CDP_PORT" snapshot -i 2>&1 || true)
if echo "$snap" | grep -qE '\b(link|button)\b'; then
echo "[electron-dev] Renderer ready."
return 0
fi
sleep 2
done
echo "[electron-dev] WARNING: Renderer not interactive within ${RENDERER_WAIT_S}s — proceeding anyway."
return 0
}
# ── Commands ─────────────────────────────────────────────────────────
do_stop() {
echo "[electron-dev] Stopping Electron dev environment..."
local seed_pids
seed_pids=$(find_project_pids)
local pids
pids=$(find_electron_pids)
# Expand to include all descendants — catches helpers spawned by the main
# process AFTER our pgrep snapshot, and the launcher's child node/electron-vite
# process tree.
local all_pids=""
for pid in $seed_pids; do
all_pids="$all_pids $(expand_descendants "$pid")"
done
all_pids=$(echo "$all_pids" | tr ' ' '\n' | sort -u | grep -v '^$' | tr '\n' ' ' || true)
if [ -z "$all_pids" ]; then
echo "[electron-dev] No project Electron/vite processes found."
if [ -z "$pids" ]; then
echo "[electron-dev] No Electron processes found."
else
local count
count=$(echo "$all_pids" | tr ' ' '\n' | grep -c .)
echo "[electron-dev] Sending SIGTERM to $count process(es): $all_pids"
for pid in $all_pids; do
echo "[electron-dev] Killing PIDs: $pids"
for pid in $pids; do
kill "$pid" 2>/dev/null || true
done
# Wait up to 5s for graceful exit
# Wait up to 5s for graceful exit, then force-kill survivors
local waited=0
while [ $waited -lt 5 ]; do
local any_alive=0
for pid in $all_pids; do
if kill -0 "$pid" 2>/dev/null; then any_alive=1; break; fi
local alive=""
for pid in $pids; do
kill -0 "$pid" 2>/dev/null && alive="$alive $pid"
done
[ "$any_alive" = "0" ] && break
[ -z "$alive" ] && break
sleep 1
waited=$((waited + 1))
done
# SIGKILL anyone still alive
for pid in $all_pids; do
# Force-kill any remaining
for pid in $pids; do
if kill -0 "$pid" 2>/dev/null; then
echo "[electron-dev] Force-killing PID $pid"
kill -9 "$pid" 2>/dev/null || true
@@ -181,27 +98,7 @@ do_stop() {
done
fi
# Belt-and-suspenders: anything still bound to the CDP port goes away
local port_pid
port_pid=$(lsof -ti tcp:"$CDP_PORT" -sTCP:LISTEN 2>/dev/null || true)
if [ -n "$port_pid" ]; then
echo "[electron-dev] Port $CDP_PORT still bound by PID $port_pid; force-killing"
# shellcheck disable=SC2086
kill -9 $port_pid 2>/dev/null || true
fi
# Also re-sweep the project's electron processes — sometimes the OS spawns
# new helpers during shutdown that didn't exist when we first enumerated.
local stragglers
stragglers=$(pgrep -f "$PROJECT_ELECTRON_PATH" 2>/dev/null || true)
if [ -n "$stragglers" ]; then
echo "[electron-dev] Cleaning up stragglers: $stragglers"
for pid in $stragglers; do
kill -9 "$pid" 2>/dev/null || true
done
fi
# Close any agent-browser sessions connected to this port
# Also close any agent-browser sessions connected to this port
agent-browser --cdp "$CDP_PORT" close --all 2>/dev/null || true
rm -f "$PIDFILE"
@@ -210,91 +107,113 @@ do_stop() {
do_status() {
local pids
pids=$(find_project_pids)
pids=$(find_electron_pids)
if [ -z "$pids" ]; then
echo "[electron-dev] No project Electron processes found."
echo "[electron-dev] Electron is NOT running."
return 1
fi
echo "[electron-dev] Project processes: $pids"
echo "[electron-dev] Electron is running (PIDs: $pids)"
if curl -sf --max-time 2 "http://localhost:${CDP_PORT}/json/version" >/dev/null 2>&1; then
# Check CDP connectivity
if agent-browser --cdp "$CDP_PORT" get url >/dev/null 2>&1; then
local url
url=$(agent-browser --cdp "$CDP_PORT" get url 2>&1 | tail -1 || echo "?")
url=$(agent-browser --cdp "$CDP_PORT" get url 2>&1 | tail -1)
echo "[electron-dev] CDP port ${CDP_PORT} is reachable. URL: $url"
return 0
else
echo "[electron-dev] CDP port ${CDP_PORT} is NOT reachable (no --remote-debugging-port, or still loading)."
echo "[electron-dev] CDP port ${CDP_PORT} is NOT reachable (Electron may still be loading)."
return 2
fi
}
wait_for_electron() {
echo "[electron-dev] Waiting for Electron process (up to ${ELECTRON_WAIT_S}s)..."
local elapsed=0
local interval=3
while [ $elapsed -lt "$ELECTRON_WAIT_S" ]; do
if strings "$ELECTRON_LOG" 2>/dev/null | grep -q "starting electron"; then
echo "[electron-dev] Electron process started."
return 0
fi
sleep "$interval"
elapsed=$((elapsed + interval))
echo "[electron-dev] Still waiting... (${elapsed}/${ELECTRON_WAIT_S}s)"
done
echo "[electron-dev] ERROR: Electron did not start within ${ELECTRON_WAIT_S}s"
echo "[electron-dev] Last 20 lines of log:"
tail -20 "$ELECTRON_LOG" 2>/dev/null || true
return 1
}
wait_for_renderer() {
echo "[electron-dev] Waiting for renderer/SPA to load (up to ${RENDERER_WAIT_S}s)..."
# Initial delay — renderer needs time to bootstrap
sleep 10
local elapsed=10
local interval=5
while [ $elapsed -lt "$RENDERER_WAIT_S" ]; do
if agent-browser --cdp "$CDP_PORT" wait 2000 >/dev/null 2>&1; then
# Check if interactive elements are present (SPA loaded)
local snap
snap=$(agent-browser --cdp "$CDP_PORT" snapshot -i 2>&1 || true)
if echo "$snap" | grep -qE 'link |button '; then
echo "[electron-dev] Renderer ready (interactive elements found)."
return 0
fi
fi
sleep "$interval"
elapsed=$((elapsed + interval))
echo "[electron-dev] SPA still loading... (${elapsed}/${RENDERER_WAIT_S}s)"
done
echo "[electron-dev] WARNING: Timed out waiting for renderer, proceeding anyway."
return 0
}
do_start() {
# Already up and CDP is reachable → nothing to do
if curl -sf --max-time 2 "http://localhost:${CDP_PORT}/json/version" >/dev/null 2>&1; then
echo "[electron-dev] CDP already reachable on port $CDP_PORT. Skipping start."
echo "[electron-dev] Use 'restart' to force a fresh session."
# If already running and healthy, skip
local status_ok=0
do_status >/dev/null 2>&1 || status_ok=$?
if [ "$status_ok" -eq 0 ]; then
echo "[electron-dev] Electron is already running and CDP is reachable. Skipping start."
echo "[electron-dev] Use 'restart' to force a fresh session, or 'stop' to tear down."
return 0
fi
# Detect the user's existing dev session (or stale processes) BEFORE killing
local existing
existing=$(find_project_pids)
if [ -n "$existing" ]; then
echo "[electron-dev] Existing project Electron/vite processes detected:"
echo "$existing" | tr ' ' '\n' | sed 's/^/[electron-dev] PID /'
echo "[electron-dev] Tearing them down so we can start a CDP-enabled session..."
fi
# Clean up any stale processes
do_stop
# Wait for port + user-data-dir locks to release. Without this, the new
# Electron may fail with "user data directory in use" or fail to bind CDP.
local waited=0
while [ $waited -lt 10 ]; do
if ! lsof -i tcp:"$CDP_PORT" >/dev/null 2>&1 \
&& ! pgrep -f "$PROJECT_ELECTRON_PATH" >/dev/null 2>&1; then
break
fi
[ $waited -eq 0 ] && echo "[electron-dev] Waiting for port + Electron locks to release..."
sleep 1
waited=$((waited + 1))
done
# Start fresh
echo "[electron-dev] Starting Electron dev server..."
echo "[electron-dev] Project: $PROJECT_ROOT"
echo "[electron-dev] Project: $PROJECT_ROOT"
echo "[electron-dev] CDP port: $CDP_PORT"
echo "[electron-dev] Log: $ELECTRON_LOG"
echo "[electron-dev] Log: $ELECTRON_LOG"
: > "$ELECTRON_LOG" # Truncate log
# Launch in a new session (setsid) so the whole process tree shares a PGID
# we can later signal in one shot. `setsid bash -c '... exec ...' &` keeps
# the bash shell as the session leader; its PID is what we save.
# macOS doesn't ship setsid by default — fall back to plain bash; cleanup
# still works via `expand_descendants` walking the process tree.
local launch_cmd="
cd '$PROJECT_ROOT/apps/desktop'
exec npx electron-vite dev -- --remote-debugging-port=$CDP_PORT
"
if command -v setsid >/dev/null 2>&1; then
setsid bash -c "$launch_cmd" >> "$ELECTRON_LOG" 2>&1 < /dev/null &
else
bash -c "$launch_cmd" >> "$ELECTRON_LOG" 2>&1 < /dev/null &
fi
local launcher_pid=$!
echo "$launcher_pid" > "$PIDFILE"
echo "[electron-dev] Launcher PID (session leader): $launcher_pid"
(
cd "$PROJECT_ROOT/apps/desktop" && \
ELECTRON_ENABLE_LOGGING=1 npx electron-vite dev -- --remote-debugging-port="$CDP_PORT" \
>> "$ELECTRON_LOG" 2>&1
) &
local bg_pid=$!
echo "$bg_pid" > "$PIDFILE"
echo "[electron-dev] Background PID: $bg_pid"
if ! wait_for_cdp; then
echo "[electron-dev] Failed to bring up CDP. Cleaning up..."
# Wait for Electron process to start
if ! wait_for_electron; then
echo "[electron-dev] Failed to start. Cleaning up..."
do_stop
return 1
fi
# Wait for renderer to be interactive
if ! wait_for_renderer; then
echo "[electron-dev] Renderer not interactive — you may need to wait more."
echo "[electron-dev] Renderer not ready, but Electron is running. You may need to wait more."
fi
echo "[electron-dev] Ready! Use: agent-browser --cdp $CDP_PORT snapshot -i"
@@ -302,7 +221,7 @@ do_start() {
do_restart() {
do_stop
sleep 1
sleep 2
do_start
}
@@ -316,12 +235,10 @@ case "${1:-help}" in
*)
echo "Usage: $0 {start|stop|status|restart}"
echo ""
echo " start — Start Electron dev with CDP. Detects + tears down any"
echo " existing project Electron (e.g. \`bun run dev\`) first."
echo " stop — Kill all project Electron/vite processes (main + helpers"
echo " + descendants), with SIGTERM → 5s wait → SIGKILL fallback."
echo " status — Check if Electron is running and CDP is reachable."
echo " restart — Stop then start."
echo " start — Start Electron dev with CDP (idempotent, skips if already running)"
echo " stop — Kill all Electron dev processes (main + helpers + vite)"
echo " status — Check if Electron is running and CDP is reachable"
echo " restart — Stop then start"
exit 1
;;
esac
@@ -1,189 +0,0 @@
#!/usr/bin/env bash
#
# record-app-screen.sh — Record the Electron app window (video + screenshots)
#
# Captures screenshots via agent-browser (CDP), then assembles into video on stop.
# Works on any screen (including external monitors) since it uses CDP, not screen capture.
#
# Usage:
# ./record-app-screen.sh start [output_name] # Begin recording
# ./record-app-screen.sh stop # Stop and save
# ./record-app-screen.sh status # Check recording state
#
# Outputs to .records/ directory:
# .records/<name>.mp4 — Video assembled from screenshots (~2 fps)
# .records/<name>/ — Screenshots every SCREENSHOT_INTERVAL seconds
#
# Prerequisites:
# - ffmpeg installed (bun add -g ffmpeg-static, or brew install ffmpeg)
# - agent-browser CLI installed
# - Electron app already running with CDP enabled
#
# Environment variables:
# CDP_PORT — Chrome DevTools Protocol port (default: 9222)
# SCREENSHOT_INTERVAL — Seconds between gallery screenshots (default: 3)
# VIDEO_FRAME_INTERVAL — Seconds between video frames (default: 0.5)
#
# Examples:
# ./electron-dev.sh start
# ./record-app-screen.sh start gateway-demo
# # ... run automation via agent-browser ...
# ./record-app-screen.sh stop
#
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
PROJECT_DIR="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
RECORDS_DIR="$PROJECT_DIR/.records"
PID_FILE="/tmp/record-app-screen.pids"
STATE_FILE="/tmp/record-app-screen.state"
CDP_PORT="${CDP_PORT:-9222}"
SCREENSHOT_INTERVAL="${SCREENSHOT_INTERVAL:-3}"
VIDEO_FRAME_INTERVAL="${VIDEO_FRAME_INTERVAL:-0.5}"
AB="agent-browser --cdp $CDP_PORT"
# ─── Commands ───
cmd_start() {
local output_name="${1:-recording-$(date +%Y%m%d-%H%M%S)}"
local output_video="$RECORDS_DIR/${output_name}.mp4"
local screenshot_dir="$RECORDS_DIR/${output_name}"
local frames_dir
frames_dir=$(mktemp -d /tmp/record-frames-XXXXXX)
if [ -f "$PID_FILE" ]; then
echo "[record] A recording is already active. Run '$0 stop' first."
exit 1
fi
mkdir -p "$RECORDS_DIR" "$screenshot_dir"
# Video frames loop (~2 fps via agent-browser CDP screenshots)
(
local idx=0
while true; do
local fname
fname=$(printf "%s/frame_%06d.png" "$frames_dir" "$idx")
$AB screenshot "$fname" 2>/dev/null || true
idx=$((idx + 1))
sleep "$VIDEO_FRAME_INTERVAL"
done
) &
local frames_pid=$!
# Gallery screenshots loop (every N seconds for human review)
(
local idx=0
while true; do
local fname
fname=$(printf "%s/%04d.png" "$screenshot_dir" "$idx")
$AB screenshot "$fname" 2>/dev/null || true
idx=$((idx + 1))
sleep "$SCREENSHOT_INTERVAL"
done
) &
local screenshot_pid=$!
# Save state
echo "$frames_pid $screenshot_pid" > "$PID_FILE"
echo "$output_video $frames_dir $screenshot_dir" > "$STATE_FILE"
echo "[record] Started!"
echo " Video frames: every ${VIDEO_FRAME_INTERVAL}s (PID $frames_pid)"
echo " Screenshots: every ${SCREENSHOT_INTERVAL}s → $screenshot_dir/"
echo " Stop with: $0 stop"
}
cmd_stop() {
if [ ! -f "$PID_FILE" ] || [ ! -f "$STATE_FILE" ]; then
echo "[record] No active recording found."
return 0
fi
local frames_pid screenshot_pid
read -r frames_pid screenshot_pid < "$PID_FILE"
local output_video frames_dir screenshot_dir
read -r output_video frames_dir screenshot_dir < "$STATE_FILE"
# Stop both capture loops
kill "$frames_pid" 2>/dev/null || true
kill "$screenshot_pid" 2>/dev/null || true
wait "$frames_pid" 2>/dev/null || true
wait "$screenshot_pid" 2>/dev/null || true
# Assemble frames into video
local frame_count
frame_count=$(ls -1 "$frames_dir"/frame_*.png 2>/dev/null | wc -l | tr -d ' ')
if [ "$frame_count" -gt 0 ]; then
echo "[record] Assembling $frame_count frames into video..."
ffmpeg -y -framerate 2 -i "$frames_dir/frame_%06d.png" \
-c:v libx264 -crf 23 -pix_fmt yuv420p -an \
"$output_video" > /tmp/ffmpeg-assemble.log 2>&1
if [ ! -s "$output_video" ]; then
echo " [warn] Video assembly failed. Check /tmp/ffmpeg-assemble.log"
echo " Frames preserved in: $frames_dir/"
fi
else
echo " [warn] No frames captured."
fi
rm -rf "$frames_dir" 2>/dev/null
rm -f "$PID_FILE" "$STATE_FILE"
local video_size screenshot_count
video_size=$(ls -lh "$output_video" 2>/dev/null | awk '{print $5}' || echo "?")
screenshot_count=$(ls -1 "$screenshot_dir"/*.png 2>/dev/null | wc -l | tr -d ' ' || echo "0")
echo "[record] Stopped!"
echo " Video: $output_video ($video_size)"
echo " Screenshots: ${screenshot_count} files in $screenshot_dir/"
echo " Play: open $output_video"
}
cmd_status() {
if [ ! -f "$PID_FILE" ]; then
echo "[record] No active recording."
return 0
fi
local frames_pid screenshot_pid
read -r frames_pid screenshot_pid < "$PID_FILE"
local frames_ok="no" screenshot_ok="no"
kill -0 "$frames_pid" 2>/dev/null && frames_ok="yes"
kill -0 "$screenshot_pid" 2>/dev/null && screenshot_ok="yes"
if [ -f "$STATE_FILE" ]; then
local output_video frames_dir screenshot_dir
read -r output_video frames_dir screenshot_dir < "$STATE_FILE"
local frame_count ss_count
frame_count=$(ls -1 "$frames_dir"/frame_*.png 2>/dev/null | wc -l | tr -d ' ' || echo "0")
ss_count=$(ls -1 "$screenshot_dir"/*.png 2>/dev/null | wc -l | tr -d ' ' || echo "0")
echo "[record] Active recording"
echo " Frames: $frame_count captured (running: $frames_ok)"
echo " Screenshots: $ss_count captured (running: $screenshot_ok)"
echo " Output: $output_video"
fi
}
# ─── Main ───
case "${1:-}" in
start) shift; cmd_start "$@" ;;
stop) cmd_stop ;;
status) cmd_status ;;
*)
echo "Usage: $0 {start [name] | stop | status}"
echo ""
echo " start [name] Start recording (default: recording-YYYYMMDD-HHMMSS)"
echo " stop Stop recording and save outputs"
echo " status Check if recording is active"
exit 1
;;
esac
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -80,5 +80,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -72,5 +72,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -60,5 +60,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -75,5 +75,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
@@ -81,5 +81,5 @@ echo "[$APP] Waiting ${WAIT}s for bot response..."
sleep "$WAIT"
echo "[$APP] Capturing screenshot..."
"$SCRIPT_DIR/../capture-app-window.sh" "$APP" "$SCREENSHOT"
"$SCRIPT_DIR/capture-app-window.sh" "$APP" "$SCREENSHOT"
echo "[$APP] Done! Screenshot saved to $SCREENSHOT"
+1 -7
View File
@@ -1,16 +1,10 @@
---
name: microcopy
description: 'UI copy and microcopy guidelines. Use for user-facing copy, buttons, errors, empty states, onboarding, i18n wording, translation, or copy improvements in Chinese or English.'
user-invocable: false
description: UI copy and microcopy guidelines. Use when writing UI text, buttons, error messages, empty states, onboarding, or any user-facing copy. Triggers on i18n translation, UI text writing, or copy improvement tasks. Supports both Chinese and English.
---
# LobeHub UI Microcopy Guidelines
This file is the quick-reference summary. For full prompt-style guidelines with extensive examples (anti-patterns, tone matrices, scenario walk-throughs), load the language-specific reference:
- **中文文案** — [`references/zh.md`](./references/zh.md)
- **English copy** — [`references/en.md`](./references/en.md)
Brand: **Where Agents Collaborate** - Focus on collaborative agent system, not just "generation".
## Fixed Terminology
+36 -73
View File
@@ -1,76 +1,64 @@
---
name: modal
description: 'LobeHub imperative modal conventions. Use when creating or migrating modals, dialogs, popups, confirm flows, ModalHost wiring, createModal, confirmModal, useModalContext, or base-ui modal APIs.'
description: Modal imperative API guide. Use when creating modal dialogs using createModal from @lobehub/ui. Triggers on modal component implementation or dialog creation tasks.
user-invocable: false
---
# Modal Imperative API Guide
## Recommended: `@lobehub/ui/base-ui`
Use `createModal` from `@lobehub/ui` for imperative modal dialogs.
New code should use the **base-ui** modal stack (headless primitives, not antd `Modal`):
## Why Imperative?
- `createModal`, `confirmModal`, `ModalHost` from `@lobehub/ui/base-ui`
- `useModalContext` from `@lobehub/ui/base-ui` inside modal **content**
| Mode | Characteristics | Recommended |
| ----------- | ------------------------------------- | ----------- |
| Declarative | Need `open` state, render `<Modal />` | ❌ |
| Imperative | Call function directly, no state | ✅ |
Body slot: pass **`content`** (or `children`; runtime uses `content ?? children`).
### Global `ModalHost` (required)
Base-ui `createModal` renders through a **separate** host from the root package. The app must mount **`ModalHost`** from `@lobehub/ui/base-ui` once near the root (e.g. next to other global hosts). Without it, `createModal` calls will not appear.
If the project only mounts `ModalHost` from `@lobehub/ui`, add a second lazy `ModalHost` from `@lobehub/ui/base-ui` until all imperative modals are migrated.
### Why imperative?
| Mode | Characteristics | Recommended |
| ----------- | ------------------------------------ | ----------- |
| Declarative | `open` state + `<Modal />` | ❌ |
| Imperative | Call `createModal()`, no local state | ✅ |
### File structure
## File Structure
```
features/
└── MyFeatureModal/
├── index.tsx # export createXxxModal
└── MyFeatureContent.tsx # modal body
├── index.tsx # Export createXxxModal
└── MyFeatureContent.tsx # Modal content
```
### 1. Content (`MyFeatureContent.tsx`)
## Implementation
### 1. Content Component (`MyFeatureContent.tsx`)
```tsx
'use client';
import { useModalContext } from '@lobehub/ui/base-ui';
import { useModalContext } from '@lobehub/ui';
import { useTranslation } from 'react-i18next';
export const MyFeatureContent = () => {
const { t } = useTranslation('namespace');
const { close } = useModalContext();
const { close } = useModalContext(); // Optional: get close method
return <div>{/* ... */}</div>;
return <div>{/* Modal content */}</div>;
};
```
### 2. `createModal` (`index.tsx`)
### 2. Export createModal (`index.tsx`)
```tsx
'use client';
import { createModal } from '@lobehub/ui/base-ui';
import { t } from 'i18next';
import { createModal } from '@lobehub/ui';
import { t } from 'i18next'; // Note: use i18next, not react-i18next
import { MyFeatureContent } from './MyFeatureContent';
export const createMyFeatureModal = () =>
createModal({
content: <MyFeatureContent />,
allowFullscreen: true,
children: <MyFeatureContent />,
destroyOnHidden: false,
footer: null,
maskClosable: true,
styles: {
content: { overflow: 'hidden', padding: 0 },
},
styles: { body: { overflow: 'hidden', padding: 0 } },
title: t('myFeature.title', { ns: 'setting' }),
width: 'min(80%, 800px)',
});
@@ -88,52 +76,27 @@ const handleOpen = useCallback(() => {
return <Button onClick={handleOpen}>Open</Button>;
```
### i18n
## i18n Handling
- **Content**: `useTranslation` in components.
- **`createModal` options**: `import { t } from 'i18next'` where hooks are unavailable.
- **Content component**: `useTranslation` hook (React context)
- **createModal params**: `import { t } from 'i18next'` (non-hook, imperative)
### `useModalContext`
## useModalContext Hook
```tsx
const { close, setCanDismissByClickOutside } = useModalContext();
```
### Common options (base-ui)
## Common Config
`ImperativeModalProps` builds on `BaseModalProps`: `title`, `width`, `maskClosable`, `open`, `onOpenChange`, `footer`, `styles` / `classNames` (keys: `backdrop`, `popup`, `header`, `title`, `close`, `content`, …).
| Property | Notes |
| -------------- | ---------------------------------------- |
| `content` | Main body (preferred name vs `children`) |
| `maskClosable` | Click outside to dismiss |
| `styles.*` | Semantic regions, not antd `styles.body` |
### Confirm
```tsx
import { confirmModal } from '@lobehub/ui/base-ui';
confirmModal({
title: '…',
content: '…',
okText: '…',
cancelText: '…',
onOk: async () => {},
});
```
---
## Legacy: `@lobehub/ui` (root)
Older call sites use **`createModal` from `@lobehub/ui`**, which is typed as **antd `Modal` props** (`children`, `allowFullscreen`, `getContainer`, `destroyOnHidden`, `styles.body`, etc.). Prefer migrating new work to **`@lobehub/ui/base-ui`**.
Examples (legacy): `src/features/SkillStore/index.tsx`, `src/features/LibraryModal/CreateNew/index.tsx`.
---
| Property | Type | Description |
| ----------------- | ------------------- | ------------------------ |
| `allowFullscreen` | `boolean` | Allow fullscreen mode |
| `destroyOnHidden` | `boolean` | Destroy content on close |
| `footer` | `ReactNode \| null` | Footer content |
| `width` | `string \| number` | Modal width |
## Examples
- Base-ui (preferred): follow sections above; ensure **base-ui `ModalHost`** is mounted.
- Legacy: `src/features/SkillStore/index.tsx`, `src/features/LibraryModal/CreateNew/index.tsx`
- `src/features/SkillStore/index.tsx`
- `src/features/LibraryModal/CreateNew/index.tsx`
+2 -81
View File
@@ -1,7 +1,7 @@
---
name: pr
description: "Create a PR for the current branch (targets `canary` by default), including splitting one cross-layer branch into ordered stacked PRs so a lower layer (db / shared package / server TRPC) merges before its callers (desktop / CLI / UI). Use when the user asks to create / submit a PR, or to split a branch because clients call a server contract that isn't on the trunk yet. Triggers on 'pr', 'create pr', 'submit pr', 'open a PR', 'pull request', 'split this PR', 'stacked PR', 'backend should merge first', '提 PR', '提个 PR', '新建 PR', '拆 PR', '后端先合', '分层合并'."
user-invocable: true
description: "Create a PR for the current branch. Use when the user asks to create a pull request, submit PR, or says 'pr'."
user_invocable: true
---
# Create Pull Request
@@ -71,82 +71,3 @@ Use `.github/PULL_REQUEST_TEMPLATE.md` as the body structure. Key sections:
- **Language**: All PR content must be in English
- If a PR already exists for the branch, inform the user instead of creating a duplicate
---
# Stacked PRs (cross-layer feature)
The steps above create **one** PR for the current branch. When a single branch lands across layers — `packages/database` schema/model → a shared `packages/*` lib → `src/server` TRPC → `apps/desktop` + `apps/cli` callers → `src/features` UI — shipping it as one PR can't merge safely: the clients call an endpoint that doesn't exist on the trunk until the same PR merges, so any partial/rollback or independent review breaks. Split it into **ordered PRs**, lower layer first.
## The ordering rule
A PR may only merge **after** every layer it calls is already on the trunk.
- The **server contract** (new TRPC procedure, changed return shape, new table/model) merges first.
- The **callers** (desktop, CLI, UI) merge after — they invoke that contract.
- Tie-break with one question: _"if this merged alone to `canary` right now, would it build and behave?"_ If no, it belongs in a later PR.
## Which file goes in which PR
The non-obvious calls:
- **Frontend that adapts to a contract change goes WITH the server PR.** If you widen a TRPC return shape (e.g. `listDevices` now returns `platform: string | null`), the component consuming it must change in the _same_ PR — otherwise the server PR breaks the build on its own. Contract + its in-repo consumers ship together.
- **A new shared package goes with its consumer**, not the server, unless the server imports it too. A `@lobechat/*` package imported only by desktop/CLI ships in the client PR. Don't carry an unused package in the lower PR.
- **Workspace dep declarations** (`package.json` `workspace:*`, `pnpm-workspace.yaml`) travel with the code that imports the package.
## The git recipe — split an existing full branch
Starting point: one branch (`feat/x`) with a single commit `<FULL>` containing everything, already pushed (so it's also safe on the remote).
```bash
# 1. Safety nets — make the full work unloseable before rewriting anything
git branch backup/x-full <FULL> # local ref to the full commit
git branch feat/x-clients <FULL> # the higher-layer branch starts here
# 2. Rewrite the lower-layer branch to lower-layer files only
git checkout feat/x # this becomes the SERVER PR
git reset --hard origin/canary
git checkout <FULL> -- <server/db files…> # stages just those paths
git commit -m "✨ feat(...): <server half>"
git push --force-with-lease origin feat/x # never --force; never push to canary
# 3. Build the higher-layer branch STACKED on the lower branch
git checkout feat/x-clients
git reset --hard feat/x # base = the just-rewritten server HEAD
git checkout backup/x-full -- <client/ui files…> # only the remaining paths
git commit -m "✨ feat(...): <client half>"
git push -u origin feat/x-clients
```
Then open the higher PR **based on the lower branch**, not the trunk:
```bash
gh pr create --base feat/x --head feat/x-clients --title "…" --body "…"
```
`--base feat/x` keeps the diff client-only (no server files leak in) and makes it physically impossible to merge the clients before the server. **After the server PR merges to `canary`, retarget the client PR's base to `canary`** (GitHub usually auto-retargets when the base branch merges; note it in the PR body so a human confirms).
## Verify the dependency actually holds
The whole point is the higher layer needs the lower one. Prove it: on the stacked higher branch, type-check the caller and confirm the symbol the lower layer introduced resolves.
```bash
cd apps/cli && bun run type-check 2>&1 | grep -iE "connect\.ts|device\.register"
# empty (re: your change) = the stacked base supplies device.register ✓
```
Filter to your touched files — this repo's standalone type-check emits pre-existing env noise (`__ELECTRON__`, `@/types/llm`, unbuilt `@lobechat/types`) that isn't yours.
## PR + Linear bookkeeping
- **Each PR closes only its own layer's issues.** Server PR: `Closes LOBE-<server>`. Client PR: `Closes LOBE-<pkg> / <desktop> / <cli>`. Don't let one PR's body claim another layer's issue.
- Both PRs are `Part of LOBE-<parent>`.
- On PR creation, move each closed sub-issue to **In Review** (not Done) and add a completion comment — see the `linear` skill.
## Gotchas
- **Never push to `canary`.** A split branch cut with `git checkout -b feat/x origin/canary` _tracks_ `origin/canary`, so a bare `git push` targets canary. Always `git push origin feat/x` with the explicit branch name.
- **`--force-with-lease`, not `--force`** when rewriting the lower branch — it aborts if the remote moved under you.
- **Back up before `reset --hard`.** Step 1's `backup/x-full` + the pushed remote branch mean the full commit is referenced by ≥3 refs before you rewrite anything. Verify with `git branch --contains <FULL>`.
- **Lockfiles:** this monorepo commits no root `pnpm-lock.yaml`, so a new `workspace:*` dep needs no lockfile churn. In a repo that _does_ commit one, regenerate it on each branch after the split.
- **Don't over-split.** Two PRs (contract / callers) is usually enough. A UI page that only reads an existing endpoint can be its own later PR, but don't fragment a single layer across PRs for its own sake.
+115 -63
View File
@@ -1,25 +1,19 @@
---
name: project-overview
description: 'LobeHub open-source monorepo architecture map. Use when locating code layers, understanding apps/packages/src layout, business stubs, project structure, or onboarding to the repository.'
user-invocable: false
description: Complete project architecture and structure guide. Use when exploring the codebase, understanding project organization, finding files, or needing comprehensive architectural context. Triggers on architecture questions, directory navigation, or project overview needs.
---
# LobeHub Project Overview
> The directory listings below are a **curated map of key locations**, not an
> exhaustive tree. `packages/`, `src/store/`, route groups etc. grow over time —
> run `ls` against the real directory for the current set.
## Project Description
Open-source, modern-design AI Agent Workspace: **LobeHub** (previously LobeChat).
This repo is the **open-source root** (`github.com/lobehub/lobehub`, package `@lobehub/lobehub`).
**Supported platforms:**
- Web desktop/mobile
- Desktop (Electron)`apps/desktop`
- Mobile app (React Native) **separate repo, already launched** (not in this monorepo)
- Desktop (Electron)
- Mobile app (React Native) - coming soon
**Logo emoji:** 🤯
@@ -44,49 +38,123 @@ This repo is the **open-source root** (`github.com/lobehub/lobehub`, package `@l
| Database | Neon PostgreSQL + Drizzle ORM |
| Testing | Vitest |
> Exact versions live in the root `package.json` — check there, not here.
## Complete Project Structure
## Monorepo Layout
Flat layout — `apps/`, `packages/`, and `src/` all sit at the repo root. No
git submodules.
Monorepo using `@lobechat/` namespace for workspace packages.
```
(repo root)
lobehub/
├── apps/
── cli/ # LobeHub CLI
├── desktop/ # Electron desktop app
── device-gateway/ # Device gateway service
├── docs/ # changelog, development, self-hosting, usage
├── locales/ # en-US, zh-CN, ...
├── packages/ # ~80 @lobechat/* workspace packages — `ls` for the full set. Key ones:
│ ├── agent-runtime/ # Agent runtime core
│ ├── agent-signal/ # Agent Signal pipeline
── agent-tracing/ # Tracing / snapshots
│ ├── builtin-tool-*/ # Per-tool packages (calculator, web-browsing, claude-code, ...)
│ ├── builtin-tools/ # Central registries that compose builtin-tool-*
── desktop/ # Electron desktop app
├── docs/
── changelog/
├── development/
│ ├── self-hosting/
│ └── usage/
├── locales/
│ ├── en-US/
── zh-CN/
├── packages/
│ ├── agent-runtime/ # Agent runtime
│ ├── builtin-agents/
│ ├── builtin-tool-*/ # Builtin tool packages
│ ├── business/ # Cloud-only business logic
│ │ ├── config/
│ │ ├── const/
│ │ └── model-runtime/
│ ├── config/
│ ├── const/
│ ├── context-engine/
│ ├── database/ # src/{models,schemas,repositories}
│ ├── model-bank/ # Model definitions & provider cards
├── model-runtime/ # src/{core,providers}
├── business/ # Open-source stubs (config, const, model-bank, model-runtime) — overridden by cloud
│ ├── conversation-flow/
│ ├── database/
│ └── src/
│ ├── models/
│ │ ├── schemas/
│ │ └── repositories/
│ ├── desktop-bridge/
│ ├── edge-config/
│ ├── editor-runtime/
│ ├── electron-client-ipc/
│ ├── electron-server-ipc/
│ ├── fetch-sse/
│ ├── file-loaders/
│ ├── memory-user-memory/
│ ├── model-bank/
│ ├── model-runtime/
│ │ └── src/
│ │ ├── core/
│ │ └── providers/
│ ├── observability-otel/
│ ├── prompts/
│ ├── python-interpreter/
│ ├── ssrf-safe-fetch/
│ ├── types/
│ ├── utils/
│ └── web-crawler/
├── src/
│ ├── app/
│ │ ├── (backend)/
│ │ │ ├── api/
│ │ │ ├── f/
│ │ │ ├── market/
│ │ │ ├── middleware/
│ │ │ ├── oidc/
│ │ │ ├── trpc/
│ │ │ └── webapi/
│ │ ├── spa/ # SPA HTML template service
│ │ └── [variants]/
│ │ └── (auth)/ # Auth pages (SSR required)
│ ├── routes/ # SPA page components (Vite)
│ │ ├── (main)/
│ │ ├── (mobile)/
│ │ ├── (desktop)/
│ │ ├── onboarding/
│ │ └── share/
│ ├── spa/ # SPA entry points and router config
│ │ ├── entry.web.tsx
│ │ ├── entry.mobile.tsx
│ │ ├── entry.desktop.tsx
│ │ └── router/
│ ├── business/ # Cloud-only (client/server)
│ │ ├── client/
│ │ ├── locales/
│ │ └── server/
│ ├── components/
│ ├── config/
│ ├── const/
│ ├── envs/
│ ├── features/
│ ├── helpers/
│ ├── hooks/
│ ├── layout/
│ │ ├── AuthProvider/
│ │ └── GlobalProvider/
│ ├── libs/
│ │ ├── better-auth/
│ │ ├── oidc-provider/
│ │ └── trpc/
│ ├── locales/
│ │ └── default/
│ ├── server/
│ │ ├── featureFlags/
│ │ ├── globalConfig/
│ │ ├── modules/
│ │ ├── routers/
│ │ │ ├── async/
│ │ │ ├── lambda/
│ │ │ ├── mobile/
│ │ │ └── tools/
│ │ └── services/
│ ├── services/
│ ├── store/
│ │ ├── agent/
│ │ ├── chat/
│ │ └── user/
│ ├── styles/
│ ├── tools/
│ ├── types/
│ └── utils/
└── src/
├── app/
│ ├── (backend)/ # api, f, market, middleware, oidc, trpc, webapi
│ ├── spa/ # SPA HTML template service
│ └── [variants]/(auth)/ # Auth pages (SSR required)
├── routes/ # SPA page segments (thin — delegate to features/)
│ └── (main)/ (mobile)/ (desktop)/ (popup)/ onboarding/ share/
├── spa/ # SPA entries + router config
│ ├── entry.{web,mobile,desktop,popup}.tsx
│ └── router/
├── business/ # Open-source stubs (client/server) — cloud repo provides real impls
├── features/ # Domain business components
├── store/ # ~30 zustand stores — `ls` for the full set
├── server/ # featureFlags, globalConfig, modules, routers, services, workflows, agent-hono
└── ... # components, hooks, layout, libs, locales, services, types, utils
└── e2e/ # E2E tests (Cucumber + Playwright)
```
## Architecture Map
@@ -109,27 +177,11 @@ git submodules.
| DB Model | `packages/database/src/models` |
| DB Repository | `packages/database/src/repositories` |
| Third-party | `src/libs` (analytics, oidc, etc.) |
| Builtin Tools | `packages/builtin-tool-*`, `packages/builtin-tools` |
| Open-source stub | `src/business/*`, `packages/business/*` (this repo) |
| Builtin Tools | `src/tools`, `packages/builtin-tool-*` |
| Cloud-only | `src/business/*`, `packages/business/*` |
## Data Flow
```
React UI → Store Actions → Client Service → TRPC Lambda → Server Services → DB Model → PostgreSQL
```
## Note: Relationship to the Cloud Repo
This open-source repo is consumed by a **separate, private cloud (SaaS) repo**
as a git submodule mounted at `lobehub/`. The cloud repo provides:
- **`src/business/{client,server}`** and **`packages/business/*`** implementations
that override the stubs shipped here.
- Cloud-only routes (e.g. `(cloud)/`, `embed/`), cloud-only stores (e.g.
`subscription/`), cloud-only TRPC routers (billing, budget, risk control, …),
and Vercel cron routes under `src/app/(backend)/cron/`.
- File-resolution order in cloud: `@/store/x` → cloud `src/store/x` first, then
`lobehub/packages/store/src/x`, then `lobehub/src/store/x`. **Cloud override wins.**
When working in this repo alone, ignore the cloud layer — the stubs in
`src/business/` and `packages/business/` are the source of truth here.
+61 -92
View File
@@ -1,122 +1,91 @@
---
name: react
description: 'LobeHub React component conventions. Use when editing TSX UI, choosing base-ui vs @lobehub/ui vs antd, styling with antd-style, routing, desktop variants, layouts, or component state.'
user-invocable: false
description: React component development guide. Use when working with React components (.tsx files), creating UI, using @lobehub/ui components, implementing routing, or building frontend features. Triggers on React component creation, modification, layout implementation, or navigation tasks.
---
# React Component Writing Guide
## Styling
- Use antd-style for complex styles; for simple cases, use inline `style` attribute
- Use `Flexbox` and `Center` from `@lobehub/ui` for layouts (see `references/layout-kit.md`)
- Component priority: `src/components` > `@lobehub/ui/base-ui` > `@lobehub/ui` > custom implementation
- Always prefer `@lobehub/ui/base-ui` primitives (Select, Modal, DropdownMenu, Popover, Switch, ScrollArea…) over antd equivalents
- Fall back to `@lobehub/ui` higher-level components when base-ui has no match
- Only implement a custom component as a last resort — never reach for antd directly
- Use selectors to access zustand store data
| Scenario | Approach |
| ---------------------------------------------------------- | -------------------------------------------------------------- |
| Most cases | `createStaticStyles` + `cssVar.*` (zero-runtime, module-level) |
| Simple one-off | Inline `style` attribute |
| Truly dynamic (JS color fns like `readableColor`/`chroma`) | `createStyles` + `token`**last resort** |
## @lobehub/ui Components
## Component Priority
If unsure about component usage, search existing code in this project. Most components extend antd with additional props.
1. **`src/components`** — project-specific reusable components
2. **`@lobehub/ui/base-ui`** — headless primitives. **If the component lives here, use it. Do NOT import the same-named root export.**
3. **`@lobehub/ui`** — higher-level / antd-wrapping components (only when no base-ui equivalent)
4. **antd** — only when neither base-ui nor `@lobehub/ui` root provides it
5. **Custom implementation** — true last resort
Reference: `node_modules/@lobehub/ui/es/index.mjs` for all available components.
If unsure about available components, search existing code or check `node_modules/@lobehub/ui/es/index.mjs` and `node_modules/@lobehub/ui/es/base-ui/`.
**Common Components:**
### `@lobehub/ui/base-ui` — always prefer for these
- General: ActionIcon, ActionIconGroup, Block, Button, Icon
- Data Display: Avatar, Collapse, Empty, Highlighter, Markdown, Tag, Tooltip
- Data Entry: CodeEditor, CopyButton, EditableText, Form, FormModal, Input, SearchBar, Select
- Feedback: Alert, Drawer, Modal
- Layout: Center, DraggablePanel, Flexbox, Grid, Header, MaskShadow
- Navigation: Burger, Dropdown, Menu, SideNav, Tabs
| Component | Import |
| ------------------------------------------ | ------------------------------------------------------------------------------------------------------- |
| `Select` (+ `SelectProps`, `SelectOption`) | `import { Select } from '@lobehub/ui/base-ui';` |
| `Modal` (imperative API) | `import { createModal, confirmModal, useModalContext, type ModalInstance } from '@lobehub/ui/base-ui';` |
| `DropdownMenu` | `import { DropdownMenu } from '@lobehub/ui/base-ui';` |
| `ContextMenu` | `import { ContextMenu } from '@lobehub/ui/base-ui';` |
| `Popover` | `import { Popover } from '@lobehub/ui/base-ui';` |
| `ScrollArea` | `import { ScrollArea } from '@lobehub/ui/base-ui';` |
| `Switch` | `import { Switch } from '@lobehub/ui/base-ui';` |
| `Toast` | `import { Toast } from '@lobehub/ui/base-ui';` |
| `FloatingSheet` | `import { FloatingSheet } from '@lobehub/ui/base-ui';` |
## Routing Architecture
For Modal specifically, see the dedicated **modal** skill — use the imperative `createModal({ content: … })` pattern over the legacy `<Modal open … />` declarative pattern. base-ui has its own `ModalHost` already mounted in `SPAGlobalProvider`.
Hybrid routing: Next.js App Router (static pages) + React Router DOM (main SPA).
> Common slip: `import { Select } from '@lobehub/ui'` looks fine but it's the antd-backed Select. Use base-ui Select. Same for `Modal`, `DropdownMenu`, etc.
| Route Type | Use Case | Implementation |
| ------------------ | --------------------------------- | ---------------------------------------------------------------------------- |
| Next.js App Router | Auth pages (login, signup, oauth) | `src/app/[variants]/(auth)/` |
| React Router DOM | Main SPA (chat, settings) | `desktopRouter.config.tsx` + `desktopRouter.config.desktop.tsx` (must match) |
### `@lobehub/ui` root — use when base-ui has no equivalent
### Key Files
| Category | Components |
| ------------ | ------------------------------------------------------------------------------------- |
| General | ActionIcon, ActionIconGroup, Block, Button, Icon |
| Data Display | Avatar, Collapse, Empty, Highlighter, Markdown, Tag, Tooltip |
| Data Entry | CodeEditor, CopyButton, EditableText, Form, Input, InputPassword, SearchBar, TextArea |
| Feedback | Alert, Drawer |
| Layout | Center, DraggablePanel, Flexbox, Grid, Header, MaskShadow |
| Navigation | Burger, Menu, SideNav, Tabs |
- Entry: `src/spa/entry.web.tsx` (web), `src/spa/entry.mobile.tsx`, `src/spa/entry.desktop.tsx`
- Desktop router (pair — **always edit both** when changing routes): `src/spa/router/desktopRouter.config.tsx` (dynamic imports) and `src/spa/router/desktopRouter.config.desktop.tsx` (sync imports). Drift can cause unregistered routes / blank screen.
- Mobile router: `src/spa/router/mobileRouter.config.tsx`
- Router utilities: `src/utils/router.tsx`
## State
### `.desktop.{ts,tsx}` File Sync Rule
When a feature component manages more than 3 pieces of state (`useState`/`useReducer`/derived state), extract the logic into a custom hook (e.g. `useXxx`). Keep the component focused on rendering — the hook holds state and handlers, so logic can be unit-tested without rendering the component.
**CRITICAL**: Some files have a `.desktop.ts(x)` variant that Electron uses instead of the base file. When editing a base file, **always check** if a `.desktop` counterpart exists and update it in sync. Drift causes blank pages or missing features in Electron.
## Layout
Known pairs that must stay in sync:
Use `Flexbox` and `Center` from `@lobehub/ui`. See `references/layout-kit.md` for full props and examples.
| Base file (web, dynamic imports) | Desktop file (Electron, sync imports) |
| ----------------------------------------------------- | ------------------------------------------------------------- |
| `src/spa/router/desktopRouter.config.tsx` | `src/spa/router/desktopRouter.config.desktop.tsx` |
| `src/routes/(main)/settings/features/componentMap.ts` | `src/routes/(main)/settings/features/componentMap.desktop.ts` |
- Use `gap` instead of `margin` for spacing between flex children
- Use `flex={1}` to fill available space
- Nest Flexbox for complex layouts; set `overflow: 'auto'` for scrollable regions
**How to check**: After editing any `.ts` / `.tsx` file, run `Glob` for `<filename>.desktop.{ts,tsx}` in the same directory. If a match exists, update it with the equivalent sync-import change.
## Navigation
### Router Utilities
**For SPA pages, use `react-router-dom`, NOT `next/link`.**
```tsx
import { dynamicElement, redirectElement, ErrorBoundary } from '@/utils/router';
element: dynamicElement(() => import('./chat'), 'Desktop > Chat');
element: redirectElement('/settings/profile');
errorElement: <ErrorBoundary resetPath="/chat" />;
```
### Navigation
**Important**: For SPA pages, use `Link` from `react-router-dom`, NOT `next/link`.
```tsx
// ❌ Wrong
import Link from 'next/link';
<Link href="/">Home</Link>;
// ✅ Correct
import { Link, useNavigate } from 'react-router-dom';
import { Link } from 'react-router-dom';
<Link to="/">Home</Link>;
// In components
import { useNavigate } from 'react-router-dom';
const navigate = useNavigate();
navigate('/chat');
// From stores
const navigate = useGlobalStore.getState().navigate;
navigate?.('/settings');
```
Access navigate from stores: `useGlobalStore.getState().navigate?.('/settings');`
## Desktop File Sync Rule
Files with a `.desktop.ts(x)` variant must be edited **in sync**. Drift causes blank pages in Electron.
| Base file (web) | Desktop file (Electron) |
| -------------------------- | ---------------------------------- |
| `desktopRouter.config.tsx` | `desktopRouter.config.desktop.tsx` |
| `componentMap.ts` | `componentMap.desktop.ts` |
**After editing any `.ts`/`.tsx`:** glob for `<filename>.desktop.{ts,tsx}` in the same directory. If found, apply the equivalent sync-import change.
## Routing Architecture
| Route Type | Use Case | Implementation |
| ------------------ | ---------- | -------------------------------------------------- |
| Next.js App Router | Auth pages | `src/app/[variants]/(auth)/` |
| React Router DOM | Main SPA | `desktopRouter.config.tsx` + `.desktop.tsx` (pair) |
Router utilities:
```tsx
import { dynamicElement, redirectElement, ErrorBoundary } from '@/utils/router';
element: dynamicElement(() => import('./chat'), 'Desktop > Chat');
element: redirectElement('/settings/profile');
errorElement: <ErrorBoundary />;
```
## Common Mistakes
| Mistake | Fix |
| ------------------------------------------------------------------ | --------------------------------------------------------------------------- |
| Using `next/link` in SPA | Use `react-router-dom` `Link` |
| Using antd directly | Use `@lobehub/ui/base-ui` first, then `@lobehub/ui` |
| `import { Select } from '@lobehub/ui'` | `import { Select } from '@lobehub/ui/base-ui'` |
| `import { Modal } from '@lobehub/ui'` + `<Modal open>` declarative | `createModal` / `confirmModal` from `@lobehub/ui/base-ui` (see modal skill) |
| `import { DropdownMenu/Popover/Switch } from '@lobehub/ui'` | Import same name from `@lobehub/ui/base-ui` instead |
| `createStyles` for static styles | Use `createStaticStyles` + `cssVar` |
| Editing only `desktopRouter.config.tsx` | Must edit both `.tsx` and `.desktop.tsx` |
| Using `margin` for flex spacing | Use `gap` prop on Flexbox |
| Accessing zustand store without selector | Use selectors to access store data (see zustand skill) |
| Text or icon-text actions built with `Flexbox`/`Text` + `onClick` | Use `Button type={'text'} size={'small'}` with `icon` when needed |
+114
View File
@@ -0,0 +1,114 @@
---
name: recent-data
description: Guide for using Recent Data (topics, resources, pages). Use when working with recently accessed items, implementing recent lists, or accessing session store recent data. Triggers on recent data usage or implementation tasks.
user-invocable: false
---
# Recent Data Usage Guide
Recent data (recentTopics, recentResources, recentPages) is stored in session store.
## Initialization
In app top-level (e.g., `RecentHydration.tsx`):
```tsx
import { useInitRecentTopic } from '@/hooks/useInitRecentTopic';
import { useInitRecentResource } from '@/hooks/useInitRecentResource';
import { useInitRecentPage } from '@/hooks/useInitRecentPage';
const App = () => {
useInitRecentTopic();
useInitRecentResource();
useInitRecentPage();
return <YourComponents />;
};
```
## Usage
### Method 1: Read from Store (Recommended)
```tsx
import { useSessionStore } from '@/store/session';
import { recentSelectors } from '@/store/session/selectors';
const Component = () => {
const recentTopics = useSessionStore(recentSelectors.recentTopics);
const isInit = useSessionStore(recentSelectors.isRecentTopicsInit);
if (!isInit) return <div>Loading...</div>;
return (
<div>
{recentTopics.map((topic) => (
<div key={topic.id}>{topic.title}</div>
))}
</div>
);
};
```
### Method 2: Use Hook Return (Single component)
```tsx
const { data: recentTopics, isLoading } = useInitRecentTopic();
```
## Available Selectors
### Recent Topics
```tsx
const recentTopics = useSessionStore(recentSelectors.recentTopics);
// Type: RecentTopic[]
const isInit = useSessionStore(recentSelectors.isRecentTopicsInit);
// Type: boolean
```
**RecentTopic type:**
```typescript
interface RecentTopic {
agent: {
avatar: string | null;
backgroundColor: string | null;
id: string;
title: string | null;
} | null;
id: string;
title: string | null;
updatedAt: Date;
}
```
### Recent Resources
```tsx
const recentResources = useSessionStore(recentSelectors.recentResources);
// Type: FileListItem[]
const isInit = useSessionStore(recentSelectors.isRecentResourcesInit);
```
### Recent Pages
```tsx
const recentPages = useSessionStore(recentSelectors.recentPages);
const isInit = useSessionStore(recentSelectors.isRecentPagesInit);
```
## Features
1. **Auto login detection**: Only loads when user is logged in
2. **Data caching**: Stored in store, no repeated loading
3. **Auto refresh**: SWR refreshes on focus (5-minute interval)
4. **Type safe**: Full TypeScript types
## Best Practices
1. Initialize all recent data at app top-level
2. Use selectors to read from store
3. For multi-component use, prefer Method 1
4. Use selectors for render optimization
+1 -1
View File
@@ -1,6 +1,6 @@
---
name: response-compliance
description: 'OpenResponses API compliance testing. Use for Response API endpoint tests, compliance runs, schema debugging, response api test, or openresponses test tasks.'
description: OpenResponses API compliance testing. Use when testing the Response API endpoint, running compliance tests, or debugging Response API schema issues. Triggers on 'compliance', 'response api test', 'openresponses test'.
---
# OpenResponses Compliance Test
-142
View File
@@ -1,142 +0,0 @@
---
name: skills-audit
description: 'Audit .agents/skills SKILL.md files. Use for recurring checks of duplicate, overlapping, stale, inconsistent, or broken skills and merge/delete candidates.'
disable-model-invocation: true
argument-hint: '[--verbose | --apply]'
---
# Skills Audit
Periodic review of the project-local skill set under `.agents/skills/`. The goal is to catch drift before the catalog becomes confusing — too many skills, overlapping triggers, descriptions that no longer match the body, references to skills that were renamed/deleted.
**Recommended cadence:** weekly, or after any week where >1 skill was added/renamed.
## Procedure
### 1 — Inventory
Build a fresh census of all SKILL.md files. Do NOT trust any prior cached list.
```bash
find .agents/skills -name SKILL.md | wc -l # total count
find .agents/skills -name SKILL.md -exec wc -l {} \; | sort -rn # by body length
```
Group by domain in a mental table (DB / state / UI / agent / testing / workflow / docs / etc.). Note new arrivals since last audit (`git log --since="1 week ago" -- .agents/skills/`).
### 2 — Pull frontmatter for all skills
```bash
# Extract name + description for each SKILL.md
for f in .agents/skills/*/SKILL.md; do
echo "=== $(basename $(dirname $f)) ==="
awk '/^---$/{c++; next} c==1' "$f" | head -20
done
```
Read the description block of every skill. The body can stay unread unless step 4 flags it.
### 3 — Detect overlap / redundancy
For each pair within the same domain, ask:
- **Same description**? → likely duplicate (one is probably a stale rename leftover, or a global-vs-local collision).
- **Trigger keywords substantially overlap**? → either merge, OR tighten one description so the model can choose unambiguously.
- **One skill's body says "see also: foo"**? → confirm `foo` still exists, AND confirm the cross-reference is still meaningful (the referenced skill may have absorbed the referrer's concerns).
- **Skill duplicates content from `AGENTS.md`**? → fold into AGENTS.md or slim the skill to just the delta.
Common false positives (do NOT merge):
- `db-migrations` vs `drizzle` — distinct workflows (migration files vs schema authoring).
- `microcopy` vs `i18n` — content vs mechanics.
- `agent-runtime-hooks` vs `agent-tracing` vs `agent-signal` — different surfaces of the agent system.
- `testing` vs `local-testing` vs `cli-backend-testing` — different test types.
### 4 — Description format consistency
Apply the **standard template**:
```
{Topic + key conventions or scope}. Use when {scenarios — verbs + nouns}. Triggers on {`code-symbols`, 'natural phrases', '中文'}.
```
Skills with `disable-model-invocation: true` (user-invoked only, slash commands) don't need `Triggers on` — they're never auto-routed.
Flag descriptions that:
- ❌ Have NO `Use when` clause (model can't decide when to load it).
- ❌ Have NO `Triggers on` clause (and aren't `disable-model-invocation`).
- ❌ Use weird formats (numbered lists `(1)(2)(3)`, `Triggers:` colon instead of `Triggers on`, `MUST use when ...` as opening word).
- ❌ Are dramatically terse for a 200+ line body, or dramatically verbose for a 60-line body.
- ❌ Reference deleted/renamed skills.
### 5 — Stale-skill check
For narrow domain skills (e.g. `response-compliance`, one-off CLI workflows):
```bash
# Confirm the referenced code surface still exists
rg -l "response-compliance|openresponses" packages/ src/ # adjust per skill
git log --since="3 months ago" -- .agents/skills/ < skill > /SKILL.md # is it being maintained?
```
If the underlying surface is gone and the skill hasn't been edited in 3+ months → flag for archival.
### 6 — Cross-reference integrity
Any skill body mentioning another skill by name:
```bash
# Scan all skill bodies for skill-name references
rg -o '`[a-z][a-z0-9-]+`' .agents/skills/*/SKILL.md | grep -v ':\s*$' | sort -u
```
For each name extracted, confirm `.agents/skills/<name>/SKILL.md` exists. Broken references happen after renames — fix them in the same audit pass.
### 7 — Output report
Produce a markdown summary back to the user with the same structure as the original audit (this skill was created during one):
```markdown
## 📊 Inventory
{count, domain breakdown}
## 🎯 Recommendations
### 🔴 High confidence
- {action} — {reason}
### 🟡 Medium confidence
- {action} — {reason needs verification}
### 🟢 Low confidence / no-op
- {item considered but skipping because ...}
## 📋 Suggested order
{table of actions with risk + LOC estimate}
```
End by asking the user which actions to apply — do NOT auto-apply unless the user passed `--apply` and even then confirm destructive deletes individually.
## Output rules
- Be specific. "Skill X overlaps with Y" is useless without naming the overlapping triggers.
- Cite line numbers when flagging description / body issues.
- Don't recommend merges unless the call sites would actually load the merged skill in the same context.
- Don't recommend deletes for skills that haven't been touched recently — "unused" can mean "stable", not "dead".
## What NOT to do
- ❌ Don't rename skill directories without checking for cross-references AND user memory entries that name the old slug.
- ❌ Don't normalize a description by removing trigger keywords just to fit the template — the keywords are the routing signal.
- ❌ Don't fold a heavy 200+ line skill into another just because they share a domain — large skills get loaded selectively and merging makes everything load.
- ❌ Don't propose `.agents/skills/INDEX.md` or `<domain>-<skill>` prefix renames unless the user explicitly asks — costs > benefits for cosmetic reorgs.
## Related history
- First audit: `chore/skills-audit` branch (2026-05-25) — deleted `source-command-dedupe`, renamed `data-fetching``data-fetching-architecture`, normalized 9 descriptions, created this skill.
+5 -27
View File
@@ -1,7 +1,6 @@
---
name: spa-routes
description: 'LobeHub SPA route architecture. Use when editing src/routes, src/features delegation, desktop/mobile/popup router configs, .desktop variants, route segments, redirects, or new pages.'
user-invocable: false
description: MUST use when editing src/routes/ segments, src/spa/router/desktopRouter.config.tsx or desktopRouter.config.desktop.tsx (always change both together), mobileRouter.config.tsx, or when moving UI/logic between routes and src/features/.
---
# SPA Routes and Features Guide
@@ -85,36 +84,15 @@ Each feature should:
## 3a. Desktop router pair (`desktopRouter.config` × 2)
| File | Role |
| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| `desktopRouter.config.tsx` | Dynamic imports via `dynamicElement` / `dynamicLayout` — code-splitting; used by `entry.web.tsx` and `entry.desktop.tsx`. |
| `desktopRouter.config.desktop.tsx` | Same route tree with **synchronous** imports — kept for Electron / local parity and predictable bundling. |
| File | Role |
|------|------|
| `desktopRouter.config.tsx` | Dynamic imports via `dynamicElement` / `dynamicLayout` — code-splitting; used by `entry.web.tsx` and `entry.desktop.tsx`. |
| `desktopRouter.config.desktop.tsx` | Same route tree with **synchronous** imports — kept for Electron / local parity and predictable bundling. |
Anything that changes the tree (new segment, renamed `path`, moved layout, new child route) must be reflected in **both** files in one PR or commit. Remove routes from both when deleting.
---
## 3b. Other `.desktop.{ts,tsx}` variants inside `src/routes/`
The router pair is **not** the only `.desktop` variant pattern in this repo. Some route trees colocate a `<name>.desktop.{ts,tsx}` next to its base `<name>.{ts,tsx}` — Vite's resolver swaps in the `.desktop` file for Electron builds. Same drift risk as the router pair: editing only one side can break Electron silently.
Known variants today:
| Base file (web) | Desktop file (Electron) | Purpose |
| ----------------------------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `src/routes/(main)/settings/features/componentMap.ts` | `src/routes/(main)/settings/features/componentMap.desktop.ts` | Settings tab → component map. Web uses dynamic `import()`; desktop uses sync imports. `componentMap.sync.test.ts` enforces identical keys. |
| `src/routes/(main)/agent/index.tsx` | `src/routes/(main)/agent/index.desktop.tsx` | Page entry. Desktop variant overrides the web page wholesale (e.g. extra popup guards). |
| `src/routes/(main)/group/index.tsx` | `src/routes/(main)/group/index.desktop.tsx` | Same pattern as agent. |
**Rules:**
1. After editing **any** `.ts`/`.tsx` under `src/routes/`, glob the same directory for a `<filename>.desktop.{ts,tsx}` sibling. If one exists, apply the equivalent change there in the same commit.
2. When adding a new SettingsTab, register it in **both** `componentMap.ts` (with `dynamic(...)`) and `componentMap.desktop.ts` (with a sync `import`). `componentMap.sync.test.ts` will fail the build otherwise.
3. When adding a new desktop-only page wholesale-override, prefer a single base file with platform-aware code over introducing a new `.desktop.tsx` variant — only add a new variant when the two trees genuinely diverge (different store wiring, different popup guards, etc.).
4. When deleting, remove **both** files together.
---
## 4. How to Divide Files (route vs feature)
| Question | Put in `src/routes/` | Put in `src/features/` |
+381 -71
View File
@@ -1,91 +1,257 @@
---
name: store-data-structures
description: 'LobeHub Zustand store data-shape patterns. Use when designing store state, list/detail splits, normalized maps, reducers, messagesMap, topicsMap, or choosing shared type sources.'
user-invocable: false
description: Zustand store data structure patterns for LobeHub. Covers List vs Detail data structures, Map + Reducer patterns, type definitions, and when to use each pattern. Use when designing store state, choosing data structures, or implementing list/detail pages.
---
# LobeHub Store Data Structures
How to structure data in Zustand stores for fast list rendering, multi-detail caching, and ergonomic optimistic updates.
This guide covers how to structure data in Zustand stores for optimal performance and user experience.
## Core Principles
### ✅ DO
1. **Separate List and Detail** different structures for list pages and detail pages
2. **Use Map for Details** — cache multiple detail pages with `Record<string, Detail>`
3. **Use Array for Lists** — simple arrays for list display
4. **Types from `@lobechat/types`** — never use `@lobechat/database` types in stores
5. **Distinguish List and Detail types** List types may have computed UI fields
1. **Separate List and Detail** - Use different structures for list pages and detail pages
2. **Use Map for Details** - Cache multiple detail pages with `Record<string, Detail>`
3. **Use Array for Lists** - Simple arrays for list display
4. **Types from @lobechat/types** - Never use `@lobechat/database` types in stores
5. **Distinguish List and Detail types** - List types may have computed UI fields
### ❌ DON'T
1. **Don't use a single detail object** — can't cache multiple pages
2. **Don't mix List and Detail types** — they have different purposes
3. **Don't use database types** — use types from `@lobechat/types`
4. **Don't use Map for lists** — simple arrays are sufficient
1. **Don't use single detail object** - Can't cache multiple pages
2. **Don't mix List and Detail types** - They have different purposes
3. **Don't use database types** - Use types from `@lobechat/types`
4. **Don't use Map for lists** - Simple arrays are sufficient
---
## Type Definitions
Each entity gets its own file under `@lobechat/types/`. Each file exports two types:
Types should be organized by entity in separate files:
- **Detail type** — full entity, including heavy fields (rubrics, content, editor state, …)
- **List item type** — a **subset** that excludes heavy fields, may add computed UI fields (counts, timestamps formatted for display)
```
@lobechat/types/src/eval/
├── benchmark.ts # Benchmark types
├── agentEvalDataset.ts # Dataset types
├── agentEvalRun.ts # Run types
└── index.ts # Re-exports
```
**Important:** the List type is a **subset**, not an `extends` of Detail. Extending pulls the heavy fields right back in.
### Example: Benchmark Types
> See [`references/types.md`](./references/types.md) for full worked examples (Benchmark, Document) and the heavy-field exclusion checklist.
```typescript
// packages/types/src/eval/benchmark.ts
import type { EvalBenchmarkRubric } from './rubric';
// ============================================
// Detail Type - Full entity (for detail pages)
// ============================================
/**
* Full benchmark entity with all fields including heavy data
*/
export interface AgentEvalBenchmark {
createdAt: Date;
description?: string | null;
id: string;
identifier: string;
isSystem: boolean;
metadata?: Record<string, unknown> | null;
name: string;
referenceUrl?: string | null;
rubrics: EvalBenchmarkRubric[]; // Heavy field
updatedAt: Date;
}
// ============================================
// List Type - Lightweight (for list display)
// ============================================
/**
* Lightweight benchmark item - excludes heavy fields
* May include computed statistics for UI
*/
export interface AgentEvalBenchmarkListItem {
createdAt: Date;
description?: string | null;
id: string;
identifier: string;
isSystem: boolean;
name: string;
// Note: rubrics NOT included (heavy field)
// Computed statistics for UI display
datasetCount?: number;
runCount?: number;
testCaseCount?: number;
}
```
### Example: Document Types (with heavy content)
```typescript
// packages/types/src/document.ts
/**
* Full document entity - includes heavy content fields
*/
export interface Document {
id: string;
title: string;
description?: string;
content: string; // Heavy field - full markdown content
editorData: any; // Heavy field - editor state
metadata?: Record<string, unknown>;
createdAt: Date;
updatedAt: Date;
}
/**
* Lightweight document item - excludes heavy content
*/
export interface DocumentListItem {
id: string;
title: string;
description?: string;
// Note: content and editorData NOT included
createdAt: Date;
updatedAt: Date;
// Computed statistics
wordCount?: number;
lastEditedBy?: string;
}
```
**Key Points:**
- **Detail types** include ALL fields from database (full entity)
- **List types** are **subsets** that exclude heavy/large fields
- List types may add computed statistics for UI (e.g., `testCaseCount`)
- **Each entity gets its own file** (not mixed together)
- **All types** exported from `@lobechat/types`, NOT `@lobechat/database`
**Heavy fields to exclude from List:**
- Large text content (`content`, `editorData`, `fullDescription`)
- Complex objects (`rubrics`, `config`, `metrics`)
- Binary data (`image`, `file`)
- Large arrays (`messages`, `items`)
---
## When to Use Map vs Array
### Use Map + Reducer for Detail Data
### Use Map + Reducer (for Detail Data)
✅ Detail page data caching multiple detail pages cached simultaneously
✅ Optimistic updates — update UI before API responds
✅ Per-item loading states — track which items are being updated
✅ Multi-page navigation — user can switch between details without refetching
**Detail page data caching** - Cache multiple detail pages simultaneously
**Optimistic updates** - Update UI before API responds
**Per-item loading states** - Track which items are being updated
**Multiple pages open** - User can navigate between details without refetching
**Structure:**
```typescript
benchmarkDetailMap: Record<string, AgentEvalBenchmark>;
```
Examples: benchmark detail pages, dataset detail pages, user profiles.
**Example:** Benchmark detail pages, Dataset detail pages, User profiles
### Use Simple Array for List Data
### Use Simple Array (for List Data)
✅ List display — lists, tables, cards
Refresh as a whole — entire list refreshes together
✅ No per-item updates — no need to mutate individual rows in place
✅ Simple data flow — fewer moving parts
**List display** - Lists, tables, cards
**Read-only or refresh-as-whole** - Entire list refreshes together
**No per-item updates** - No need to update individual items
**Simple data flow** - Easier to understand and maintain
**Structure:**
```typescript
benchmarkList: AgentEvalBenchmarkListItem[];
benchmarkList: AgentEvalBenchmarkListItem[]
```
Examples: benchmark list, dataset list, user list.
**Example:** Benchmark list, Dataset list, User list
---
## State Structure Pattern
### Complete Example
```typescript
// packages/types/src/eval/benchmark.ts
import type { EvalBenchmarkRubric } from './rubric';
/**
* Full benchmark entity (for detail pages)
*/
export interface AgentEvalBenchmark {
id: string;
name: string;
description?: string | null;
identifier: string;
rubrics: EvalBenchmarkRubric[]; // Heavy field
metadata?: Record<string, unknown> | null;
isSystem: boolean;
createdAt: Date;
updatedAt: Date;
}
/**
* Lightweight benchmark (for list display)
* Excludes heavy fields like rubrics
*/
export interface AgentEvalBenchmarkListItem {
id: string;
name: string;
description?: string | null;
identifier: string;
isSystem: boolean;
createdAt: Date;
// Note: rubrics excluded
// Computed statistics
testCaseCount?: number;
datasetCount?: number;
runCount?: number;
}
```
```typescript
// src/store/eval/slices/benchmark/initialState.ts
import type { AgentEvalBenchmark, AgentEvalBenchmarkListItem } from '@lobechat/types';
export interface BenchmarkSliceState {
// List — simple array
// ============================================
// List Data - Simple Array
// ============================================
/**
* List of benchmarks for list page display
* May include computed fields like testCaseCount
*/
benchmarkList: AgentEvalBenchmarkListItem[];
benchmarkListInit: boolean;
// Detail — map for multi-entity caching
// ============================================
// Detail Data - Map for Caching
// ============================================
/**
* Map of benchmark details keyed by ID
* Caches detail page data for multiple benchmarks
* Enables optimistic updates and per-item loading
*/
benchmarkDetailMap: Record<string, AgentEvalBenchmark>;
loadingBenchmarkDetailIds: string[]; // per-item loading
// Mutation states (drive form-level UI)
/**
* Track which benchmark details are being loaded/updated
* For showing spinners on specific items
*/
loadingBenchmarkDetailIds: string[];
// ============================================
// Mutation States
// ============================================
isCreatingBenchmark: boolean;
isUpdatingBenchmark: boolean;
isDeletingBenchmark: boolean;
@@ -106,51 +272,180 @@ export const benchmarkInitialState: BenchmarkSliceState = {
## Reducer Pattern (for Detail Map)
When the Detail Map needs optimistic updates (i.e. the user edits a row and the UI should reflect it before the server confirms), wire a typed reducer instead of inlining `set` calls. This keeps mutations testable and the dispatch surface small.
### Why Use Reducer?
> See [`references/reducer.md`](./references/reducer.md) for the full discriminated-union action types, the `produce`-based reducer, and the `internal_dispatch*` slice methods that connect them to Zustand.
- **Immutable updates** - Immer ensures immutability
- **Type-safe actions** - TypeScript discriminated unions
- **Testable** - Pure functions easy to test
- **Reusable** - Same reducer for optimistic updates and server data
### Reducer Structure
```typescript
// src/store/eval/slices/benchmark/reducer.ts
import { produce } from 'immer';
import type { AgentEvalBenchmark } from '@lobechat/types';
// ============================================
// Action Types
// ============================================
type SetBenchmarkDetailAction = {
id: string;
type: 'setBenchmarkDetail';
value: AgentEvalBenchmark;
};
type UpdateBenchmarkDetailAction = {
id: string;
type: 'updateBenchmarkDetail';
value: Partial<AgentEvalBenchmark>;
};
type DeleteBenchmarkDetailAction = {
id: string;
type: 'deleteBenchmarkDetail';
};
export type BenchmarkDetailDispatch =
| SetBenchmarkDetailAction
| UpdateBenchmarkDetailAction
| DeleteBenchmarkDetailAction;
// ============================================
// Reducer Function
// ============================================
export const benchmarkDetailReducer = (
state: Record<string, AgentEvalBenchmark> = {},
payload: BenchmarkDetailDispatch,
): Record<string, AgentEvalBenchmark> => {
switch (payload.type) {
case 'setBenchmarkDetail': {
return produce(state, (draft) => {
draft[payload.id] = payload.value;
});
}
case 'updateBenchmarkDetail': {
return produce(state, (draft) => {
if (draft[payload.id]) {
draft[payload.id] = { ...draft[payload.id], ...payload.value };
}
});
}
case 'deleteBenchmarkDetail': {
return produce(state, (draft) => {
delete draft[payload.id];
});
}
default:
return state;
}
};
```
### Internal Dispatch Methods
```typescript
// In action.ts
export interface BenchmarkAction {
// ... other methods ...
// Internal methods - not for direct UI use
internal_dispatchBenchmarkDetail: (payload: BenchmarkDetailDispatch) => void;
internal_updateBenchmarkDetailLoading: (id: string, loading: boolean) => void;
}
export const createBenchmarkSlice: StateCreator<...> = (set, get) => ({
// ... other methods ...
// Internal - Dispatch to reducer
internal_dispatchBenchmarkDetail: (payload) => {
const currentMap = get().benchmarkDetailMap;
const nextMap = benchmarkDetailReducer(currentMap, payload);
// Only update if changed
if (isEqual(nextMap, currentMap)) return;
set(
{ benchmarkDetailMap: nextMap },
false,
`dispatchBenchmarkDetail/${payload.type}`,
);
},
// Internal - Update loading state
internal_updateBenchmarkDetailLoading: (id, loading) => {
set(
(state) => {
if (loading) {
return { loadingBenchmarkDetailIds: [...state.loadingBenchmarkDetailIds, id] };
}
return {
loadingBenchmarkDetailIds: state.loadingBenchmarkDetailIds.filter((i) => i !== id),
};
},
false,
'updateBenchmarkDetailLoading',
);
},
});
```
---
## Data Structure Comparison
### ❌ WRONG Single Detail Object
### ❌ WRONG - Single Detail Object
```typescript
interface BenchmarkSliceState {
// ❌ Can only cache one detail
benchmarkDetail: AgentEvalBenchmark | null;
// ❌ Global loading state
isLoadingBenchmarkDetail: boolean;
}
```
Problems:
**Problems:**
- Can only cache one detail page at a time
- Switching between details forces refetch
- Switching between details causes unnecessary refetches
- No optimistic updates
- No per-item loading states
### ✅ CORRECT Separate List and Detail
### ✅ CORRECT - Separate List and Detail
```typescript
import type { AgentEvalBenchmark, AgentEvalBenchmarkListItem } from '@lobechat/types';
interface BenchmarkSliceState {
// ✅ List data - simple array
benchmarkList: AgentEvalBenchmarkListItem[];
benchmarkListInit: boolean;
// ✅ Detail data - map for caching
benchmarkDetailMap: Record<string, AgentEvalBenchmark>;
// ✅ Per-item loading
loadingBenchmarkDetailIds: string[];
// ✅ Mutation states
isCreatingBenchmark: boolean;
isUpdatingBenchmark: boolean;
isDeletingBenchmark: boolean;
}
```
Benefits:
**Benefits:**
- Cache multiple detail pages
- Fast navigation between cached details
- Optimistic updates via reducer
- Optimistic updates with reducer
- Per-item loading states
- Clear separation of concerns
@@ -160,16 +455,22 @@ Benefits:
### Accessing List Data
```tsx
```typescript
const BenchmarkList = () => {
// Simple array access
const benchmarks = useEvalStore((s) => s.benchmarkList);
const isInit = useEvalStore((s) => s.benchmarkListInit);
if (!isInit) return <Loading />;
return (
<div>
{benchmarks.map((b) => (
<BenchmarkCard key={b.id} name={b.name} testCaseCount={b.testCaseCount} />
{benchmarks.map(b => (
<BenchmarkCard
key={b.id}
name={b.name}
testCaseCount={b.testCaseCount} // Computed field
/>
))}
</div>
);
@@ -178,18 +479,22 @@ const BenchmarkList = () => {
### Accessing Detail Data
```tsx
```typescript
const BenchmarkDetail = () => {
const { benchmarkId } = useParams<{ benchmarkId: string }>();
// Get from map
const benchmark = useEvalStore((s) =>
benchmarkId ? s.benchmarkDetailMap[benchmarkId] : undefined,
);
// Check loading
const isLoading = useEvalStore((s) =>
benchmarkId ? s.loadingBenchmarkDetailIds.includes(benchmarkId) : false,
);
if (!benchmark) return <Loading />;
return (
<div>
<h1>{benchmark.name}</h1>
@@ -205,6 +510,7 @@ const BenchmarkDetail = () => {
// src/store/eval/slices/benchmark/selectors.ts
export const benchmarkSelectors = {
getBenchmarkDetail: (id: string) => (s: EvalStore) => s.benchmarkDetailMap[id],
isLoadingBenchmarkDetail: (id: string) => (s: EvalStore) =>
s.loadingBenchmarkDetailIds.includes(id),
};
@@ -218,7 +524,7 @@ const isLoading = useEvalStore(benchmarkSelectors.isLoadingBenchmarkDetail(bench
## Decision Tree
```text
```
Need to store data?
├─ Is it a LIST for display?
@@ -241,40 +547,43 @@ Need to store data?
When designing store state structure:
- [ ] **Organize types by entity** in separate files (e.g. `benchmark.ts`, `agentEvalDataset.ts`)
- [ ] **Organize types by entity** in separate files (e.g., `benchmark.ts`, `agentEvalDataset.ts`)
- [ ] Create **Detail** type (full entity with all fields including heavy ones)
- [ ] Create **ListItem** type:
- [ ] Subset of Detail (exclude heavy fields)
- [ ] Subset of Detail type (exclude heavy fields)
- [ ] May include computed statistics for UI
- [ ] **NOT** `extends` Detail
- [ ] **NOT** extending Detail type (it's a subset, not extension)
- [ ] Use **array** for list data: `xxxList: XxxListItem[]`
- [ ] Use **Map** for detail data: `xxxDetailMap: Record<string, Xxx>`
- [ ] Per-item loading: `loadingXxxDetailIds: string[]`
- [ ] **Reducer** for detail map if optimistic updates needed (see [`references/reducer.md`](./references/reducer.md))
- [ ] **Internal dispatch** and **loading** methods
- [ ] **Selectors** for clean access (optional but recommended)
- [ ] Document in comments which fields are excluded from List and why
- [ ] Add per-item loading: `loadingXxxDetailIds: string[]`
- [ ] Create **reducer** for detail map if optimistic updates needed
- [ ] Add **internal dispatch** and **loading** methods
- [ ] Create **selectors** for clean access (optional but recommended)
- [ ] Document in comments:
- [ ] What fields are excluded from List and why
- [ ] What computed fields mean
- [ ] What each Map is for
---
## Best Practices
1. **File organization** — one entity per file, not mixed
2. **List is a subset** ListItem excludes heavy fields, does not `extends` Detail
3. **Clear naming** `xxxList` for arrays, `xxxDetailMap` for maps
4. **Consistent patterns** — all detail maps follow the same shape
5. **Type safety** — never use `any`, always use proper types
6. **Document exclusions** — comment which fields are excluded and why
7. **Selectors** — encapsulate access patterns
8. **Loading states** — per-item for details, global for mutations
9. **Immutability** — use Immer in reducers
1. **File organization** - One entity per file, not mixed together
2. **List is subset** - ListItem excludes heavy fields, not extends Detail
3. **Clear naming** - `xxxList` for arrays, `xxxDetailMap` for maps
4. **Consistent patterns** - All detail maps follow same structure
5. **Type safety** - Never use `any`, always use proper types
6. **Document exclusions** - Comment which fields are excluded from List and why
7. **Selectors** - Encapsulate access patterns
8. **Loading states** - Per-item for details, global for lists
9. **Immutability** - Use Immer in reducers
### Common Mistakes to Avoid
**DON'T extend Detail in List:**
```typescript
// Wrong — pulls heavy fields back in
// Wrong - List should not extend Detail
export interface BenchmarkListItem extends Benchmark {
testCaseCount?: number;
}
@@ -283,6 +592,7 @@ export interface BenchmarkListItem extends Benchmark {
**DO create separate subset:**
```typescript
// Correct - List is a subset with computed fields
export interface BenchmarkListItem {
id: string;
name: string;
@@ -293,14 +603,14 @@ export interface BenchmarkListItem {
**DON'T mix entities in one file:**
```text
// Wrong all entities in agentEvalEntities.ts
```typescript
// Wrong - all entities in agentEvalEntities.ts
```
**DO separate by entity:**
```text
// Correct separate files
```typescript
// Correct - separate files
// benchmark.ts
// agentEvalDataset.ts
// agentEvalRun.ts
@@ -310,5 +620,5 @@ export interface BenchmarkListItem {
## Related Skills
- `data-fetching-architecture` — how to fetch and update this data
- `zustand` — general Zustand patterns
- `data-fetching` - How to fetch and update this data
- `zustand` - General Zustand patterns
@@ -1,118 +0,0 @@
# Reducer Pattern (for Detail Map)
## Why Use a Reducer?
- **Immutable updates** — Immer makes immutability easy
- **Type-safe actions** — discriminated union of action types prevents typos
- **Testable** — pure function, easy to unit test
- **Reusable** — same reducer powers optimistic updates and server-data writes
## Reducer Structure
```typescript
// src/store/eval/slices/benchmark/reducer.ts
import { produce } from 'immer';
import type { AgentEvalBenchmark } from '@lobechat/types';
// Action types — discriminated union
type SetBenchmarkDetailAction = {
id: string;
type: 'setBenchmarkDetail';
value: AgentEvalBenchmark;
};
type UpdateBenchmarkDetailAction = {
id: string;
type: 'updateBenchmarkDetail';
value: Partial<AgentEvalBenchmark>;
};
type DeleteBenchmarkDetailAction = {
id: string;
type: 'deleteBenchmarkDetail';
};
export type BenchmarkDetailDispatch =
| SetBenchmarkDetailAction
| UpdateBenchmarkDetailAction
| DeleteBenchmarkDetailAction;
export const benchmarkDetailReducer = (
state: Record<string, AgentEvalBenchmark> = {},
payload: BenchmarkDetailDispatch,
): Record<string, AgentEvalBenchmark> => {
switch (payload.type) {
case 'setBenchmarkDetail': {
return produce(state, (draft) => {
draft[payload.id] = payload.value;
});
}
case 'updateBenchmarkDetail': {
return produce(state, (draft) => {
if (draft[payload.id]) {
draft[payload.id] = { ...draft[payload.id], ...payload.value };
}
});
}
case 'deleteBenchmarkDetail': {
return produce(state, (draft) => {
delete draft[payload.id];
});
}
default:
return state;
}
};
```
## Internal Dispatch Methods
The slice exposes two `internal_*` methods so the reducer and the loading state stay encapsulated behind a stable contract:
```typescript
// In action.ts
export interface BenchmarkAction {
// ... other methods ...
// Internal — not for direct UI use
internal_dispatchBenchmarkDetail: (payload: BenchmarkDetailDispatch) => void;
internal_updateBenchmarkDetailLoading: (id: string, loading: boolean) => void;
}
export const createBenchmarkSlice: StateCreator<...> = (set, get) => ({
// ... other methods ...
// Dispatch to reducer
internal_dispatchBenchmarkDetail: (payload) => {
const currentMap = get().benchmarkDetailMap;
const nextMap = benchmarkDetailReducer(currentMap, payload);
// Skip set when nothing changed — avoids unnecessary re-renders
if (isEqual(nextMap, currentMap)) return;
set(
{ benchmarkDetailMap: nextMap },
false,
`dispatchBenchmarkDetail/${payload.type}`,
);
},
// Update loading state for a specific id
internal_updateBenchmarkDetailLoading: (id, loading) => {
set(
(state) => ({
loadingBenchmarkDetailIds: loading
? [...state.loadingBenchmarkDetailIds, id]
: state.loadingBenchmarkDetailIds.filter((i) => i !== id),
}),
false,
'updateBenchmarkDetailLoading',
);
},
});
```
The `internal_` prefix is a convention — UI components should call the public mutation methods (e.g. `updateBenchmark`), which in turn call `internal_dispatch*`. This keeps reducer dispatch shapes out of the component layer.
@@ -1,101 +0,0 @@
# Type Definitions in Detail
The skill body's Type Definitions section covers the rules; this file holds the full worked examples to keep SKILL.md lean.
## Organization
Types should be organized by entity in separate files (not mixed):
```text
@lobechat/types/src/eval/
├── benchmark.ts # Benchmark types
├── agentEvalDataset.ts # Dataset types
├── agentEvalRun.ts # Run types
└── index.ts # Re-exports
```
## Example: Benchmark Types
```typescript
// packages/types/src/eval/benchmark.ts
import type { EvalBenchmarkRubric } from './rubric';
/**
* Full benchmark entity with all fields including heavy data.
*/
export interface AgentEvalBenchmark {
createdAt: Date;
description?: string | null;
id: string;
identifier: string;
isSystem: boolean;
metadata?: Record<string, unknown> | null;
name: string;
referenceUrl?: string | null;
rubrics: EvalBenchmarkRubric[]; // Heavy field
updatedAt: Date;
}
/**
* Lightweight benchmark item — excludes heavy fields, may add computed stats.
*/
export interface AgentEvalBenchmarkListItem {
createdAt: Date;
description?: string | null;
id: string;
identifier: string;
isSystem: boolean;
name: string;
// Note: rubrics NOT included (heavy field)
// Computed statistics for UI display
datasetCount?: number;
runCount?: number;
testCaseCount?: number;
}
```
## Example: Document Types (with heavy content)
```typescript
// packages/types/src/document.ts
/**
* Full document entity — includes heavy content fields.
*/
export interface Document {
id: string;
title: string;
description?: string;
content: string; // Heavy field — full markdown content
editorData: any; // Heavy field — editor state
metadata?: Record<string, unknown>;
createdAt: Date;
updatedAt: Date;
}
/**
* Lightweight document item — excludes heavy content.
*/
export interface DocumentListItem {
id: string;
title: string;
description?: string;
// Note: content and editorData NOT included
createdAt: Date;
updatedAt: Date;
// Computed statistics
wordCount?: number;
lastEditedBy?: string;
}
```
## Heavy Fields to Exclude from List
- Large text content (`content`, `editorData`, `fullDescription`)
- Complex objects (`rubrics`, `config`, `metrics`)
- Binary data (`image`, `file`)
- Large arrays (`messages`, `items`)
The reason these belong only on Detail: list pages render many rows, so pulling heavy fields blows up payload size and slows render. Detail pages render one entity, so the full payload is fine.
+1 -2
View File
@@ -1,7 +1,6 @@
---
name: testing
description: 'Vitest testing guide. Use when writing or updating tests, fixing failing tests, improving coverage, debugging test issues, or setting up mocks.'
user-invocable: false
description: Testing guide using Vitest. Use when writing tests (.test.ts, .test.tsx), fixing failing tests, improving test coverage, or debugging test issues. Triggers on test creation, test debugging, mock setup, or test-related questions.
---
# LobeHub Testing Guide
@@ -117,7 +117,7 @@ it('should handle tool calls', async () => {
toolCalls: [
{
id: 'call_123',
name: 'lobe-web-browsing____search',
name: 'lobe-web-browsing____search____builtin',
arguments: JSON.stringify({ query: 'weather' }),
},
],
+1 -2
View File
@@ -1,7 +1,6 @@
---
name: trpc-router
description: 'TRPC router development guide. Use when creating or modifying src/server/routers, adding procedures, or implementing server-side API endpoints.'
user-invocable: false
description: TRPC router development guide. Use when creating or modifying TRPC routers (src/server/routers/**), adding procedures, or working with server-side API endpoints. Triggers on TRPC router creation, procedure implementation, or API endpoint tasks.
---
# TRPC Router Guide
+2 -8
View File
@@ -1,7 +1,6 @@
---
name: typescript
description: 'LobeHub TypeScript style and type-safety guide. Use when editing TS/TSX/MTS, fixing types, choosing interface vs type, avoiding any/object, import type, async flow, or ts-expect-error.'
user-invocable: false
description: TypeScript code style and optimization guidelines. MUST READ before writing or modifying any TypeScript code (.ts, .tsx, .mts files). Also use when reviewing code quality or implementing type-safe patterns. Triggers on any TypeScript file edit, code style discussions, or type safety questions.
---
# TypeScript Code Style Guide
@@ -29,16 +28,12 @@ user-invocable: false
## Imports
- This project uses `simple-import-sort/imports` and `consistent-type-imports` (`fixStyle: 'separate-type-imports'`)
- **Separate type imports**: always use `import type { ... }` for type-only imports, NOT `import { type ... }` inline syntax
- When a file already has `import type { ... }` from a package and you need to add a value import, keep them as **two separate statements**:
```ts
import type { ChatTopicBotContext } from '@lobechat/types';
import { RequestTrigger } from '@lobechat/types';
```
- Within each import statement, specifiers are sorted **alphabetically by name**
## Code Structure
@@ -47,8 +42,6 @@ user-invocable: false
- Use consistent, descriptive naming; avoid obscure abbreviations
- Replace magic numbers/strings with well-named constants
- Defer formatting to tooling
- Prefer **named exports** over `export default` — keeps refactor renames and IDE auto-import in sync, and avoids the `default` re-naming drift you get with `import Foo from './foo'`. Reserve `export default` for files where the framework requires it (Next.js page/route/layout, React.lazy targets, config files like `vitest.config.ts`)
- Before adding local helpers for common guards/parsing/normalization (record checks, string extraction, empty-string handling, timing helpers, JSON-safe utilities, etc.), search `packages/utils` first. If the helper already exists or clearly belongs there, import it from `@lobechat/utils` (or the relevant `@lobechat/utils/*` subpath) instead of duplicating tiny helpers across feature files.
## UI and Theming
@@ -58,6 +51,7 @@ user-invocable: false
## Performance
- Prefer `for…of` loops over index-based `for` loops
- Reuse existing utils in `packages/utils` or installed npm packages
- Query only required columns from database
File diff suppressed because it is too large Load Diff
@@ -1,20 +1,6 @@
# Cloud Project Workflow Configuration
Cloud-specific workflow configurations and patterns for the lobehub-cloud project.
## Table of Contents
1. [Overview](#overview)
2. [Directory Structure](#directory-structure) — submodule + cloud layout
3. [Cloud-Specific Patterns](#cloud-specific-patterns) — cloud-only workflows + re-export pattern
4. [TypeScript Path Mappings](#typescript-path-mappings)
5. [Workflow Class Location](#workflow-class-location) — cloud-only vs shared
6. [Environment Variables](#environment-variables)
7. [Best Practices](#best-practices) — decide cloud vs OSS, re-export rules, naming
8. [Migration Guide](#migration-guide) — moving workflows from cloud to lobehub
9. [Examples](#examples) — `welcome-placeholder`, `agent-eval-run`
10. [Troubleshooting](#troubleshooting) — circular imports, 404s, type errors
11. [Related Documentation](#related-documentation)
This document covers cloud-specific workflow configurations and patterns for the lobehub-cloud project.
## Overview
@@ -29,7 +15,7 @@ The lobehub-cloud project extends the open-source lobehub codebase with cloud-sp
### Lobehub Submodule (Open-source)
```text
```
lobehub/
└── src/
├── app/(backend)/api/workflows/
@@ -42,7 +28,7 @@ lobehub/
### Lobehub-cloud (Proprietary)
```text
```
lobehub-cloud/
└── src/
├── app/(backend)/api/workflows/
@@ -74,7 +60,7 @@ lobehub-cloud/
**Structure**:
```text
```
lobehub-cloud/src/
├── app/(backend)/api/workflows/
│ └── feature-name/
@@ -176,7 +162,7 @@ This allows cloud to override specific modules while using lobehub defaults.
Place workflow class in cloud:
```text
```
lobehub-cloud/src/server/workflows/featureName/index.ts
```
@@ -184,7 +170,7 @@ lobehub-cloud/src/server/workflows/featureName/index.ts
Place workflow class in lobehub, re-export in cloud if needed:
```text
```
lobehub/src/server/workflows/featureName/index.ts
```
@@ -259,7 +245,7 @@ For shared features:
Follow consistent naming across lobehub and cloud:
```text
```
# Both should use same structure
lobehub/src/app/(backend)/api/workflows/feature-name/
lobehub-cloud/src/app/(backend)/api/workflows/feature-name/
@@ -320,7 +306,7 @@ import { Workflow } from 'lobehub/src/server/workflows/feature';
**Structure**:
```text
```
lobehub-cloud/
├── src/app/(backend)/api/workflows/welcome-placeholder/
│ ├── process-users/route.ts
@@ -1,226 +0,0 @@
# Best Practices & Common Pitfalls
Apply these once your scaffold from `implementation.md` is in place.
## Table of Contents
1. [Error Handling](#1-error-handling)
2. [Logging](#2-logging)
3. [Return Values](#3-return-values)
4. [flowControl Configuration](#4-flowcontrol-configuration)
5. [context.run() Best Practices](#5-contextrun-best-practices)
6. [Payload Validation](#6-payload-validation)
7. [Database Connection](#7-database-connection)
8. [Testing](#8-testing)
9. [Common Pitfalls](#common-pitfalls)
---
## 1. Error Handling
```typescript
export const { POST } = serve<Payload>(
async (context) => {
const { itemId } = context.requestPayload ?? {};
if (!itemId) {
return { success: false, error: 'Missing itemId in payload' };
}
try {
const result = await context.run('step-name', () => doWork(itemId));
return { success: true, itemId, result };
} catch (error) {
console.error('[workflow:error]', error);
return {
success: false,
error: error instanceof Error ? error.message : 'Unknown error',
};
}
},
{ flowControl: { ... } },
);
```
## 2. Logging
Consistent prefixes make debugging much easier across QStash dashboards and grep:
```typescript
console.log('[{workflow}:{layer}] Starting with payload:', payload);
console.log('[{workflow}:{layer}] Processing items:', { count: items.length });
console.log('[{workflow}:{layer}] Completed:', result);
console.error('[{workflow}:{layer}:error]', error);
```
## 3. Return Values
Pick the shape that matches the layer's purpose — entry points return statistics, execution layers return per-item results.
```typescript
// Success
return { success: true, itemId, result, message: 'Optional success message' };
// Error
return { success: false, error: 'Error description', itemId };
// Statistics (entry point)
return {
success: true,
totalEligible: 100,
toProcess: 80,
alreadyProcessed: 20,
dryRun: true, // if applicable
message: 'Summary message',
};
```
## 4. flowControl Configuration
Tune concurrency by layer — entry points are singletons, execution layers fan out.
```typescript
// Layer 1: Entry — single instance to avoid duplicate processing
flowControl: { key: '{workflow}.process', parallelism: 1, ratePerSecond: 1 }
// Layer 2: Pagination — moderate concurrency
flowControl: { key: '{workflow}.paginate', parallelism: 20, ratePerSecond: 5 }
// Layer 3: Execution — higher concurrency for parallel item work
flowControl: { key: '{workflow}.execute', parallelism: 10, ratePerSecond: 5 }
```
**Why these defaults:**
- **Layer 1** always uses `parallelism: 1` so concurrent triggers don't both start the same batch.
- **Layer 2** can fan out widely (10-20) since pagination is cheap.
- **Layer 3** caps at 5-10 by default; raise/lower based on external API rate limits.
## 5. context.run() Best Practices
- Use descriptive step names with prefixes: `{workflow}:step-name`
- Each step should be idempotent (safe to retry)
- Don't nest `context.run()` calls — keep them flat
- Use unique step names when processing multiple items:
```typescript
// ✅ Unique step names
await Promise.all(
items.map((item) => context.run(`{workflow}:execute:${item.id}`, () => processItem(item))),
);
// ❌ Same step name — Upstash de-dupes by step name and you'll lose data
await Promise.all(items.map((item) => context.run(`{workflow}:execute`, () => processItem(item))));
```
## 6. Payload Validation
Validate at the top so failures are explicit, not silent `undefined` cascades:
```typescript
export const { POST } = serve<Payload>(
async (context) => {
const { itemId, configId } = context.requestPayload ?? {};
if (!itemId) return { success: false, error: 'Missing itemId in payload' };
if (!configId) return { success: false, error: 'Missing configId in payload' };
// Proceed with work...
},
{ flowControl: { ... } },
);
```
## 7. Database Connection
Get the connection once per workflow — `getServerDB()` is async, repeating it inside each step adds latency:
```typescript
export const { POST } = serve<Payload>(
async (context) => {
const db = await getServerDB();
const item = await context.run('get-item', () => itemModel.findById(db, itemId));
const result = await context.run('save-result', () => resultModel.create(db, result));
},
{ flowControl: { ... } },
);
```
## 8. Testing
Integration tests should exercise both the dry-run statistics path and the full execution path:
```typescript
describe('WorkflowName', () => {
it('should process items successfully', async () => {
const items = await createTestItems();
await WorkflowClass.triggerProcessItems({ dryRun: false });
await waitForCompletion();
const results = await getResults();
expect(results).toHaveLength(items.length);
});
it('should support dryRun mode', async () => {
const result = await WorkflowClass.triggerProcessItems({ dryRun: true });
expect(result).toMatchObject({
success: true,
dryRun: true,
totalEligible: expect.any(Number),
toProcess: expect.any(Number),
});
});
});
```
---
## Common Pitfalls
### ❌ Reusing `context.run()` step names
```typescript
// Bad — Upstash dedupes by step name
await Promise.all(items.map((item) => context.run('process', () => process(item))));
// Good
await Promise.all(items.map((item) => context.run(`process:${item.id}`, () => process(item))));
```
### ❌ Skipping payload validation
```typescript
// Bad — undefined cascades into a confusing failure later
const { itemId } = context.requestPayload ?? {};
const result = await process(itemId);
// Good — fail fast with a clear error
if (!itemId) return { success: false, error: 'Missing itemId' };
```
### ❌ Skipping the filter step
```typescript
// Bad — duplicates work for items that were already processed
const allItems = await getAllItems();
await Promise.all(allItems.map((item) => triggerExecute(item)));
// Good — keeps the pipeline idempotent
const allItems = await getAllItems();
const itemsNeedingProcessing = await filterExisting(allItems);
await Promise.all(itemsNeedingProcessing.map((item) => triggerExecute(item)));
```
### ❌ Inconsistent logging
```typescript
// Bad — different prefixes, mixed formats
console.log('Starting workflow');
log.info('Processing item:', itemId);
console.log(`Done with ${itemId}`);
// Good — uniform prefix lets you grep by workflow+layer
console.log('[workflow:layer] Starting with payload:', payload);
console.log('[workflow:layer] Processing item:', { itemId });
console.log('[workflow:layer] Completed:', { itemId, result });
```
@@ -1,91 +0,0 @@
# Worked Examples
Two real workflows already in the codebase that follow this skill's pattern verbatim. Skim them when you want to see the pattern applied to concrete entities.
## Example 1: Welcome Placeholder
**Use case:** Generate AI-powered welcome placeholders for users.
**Structure:**
- Layer 1: `process-users` — entry point, checks eligible users
- Layer 2: `paginate-users` — paginates through active users
- Layer 3: `generate-user` — generates placeholders for ONE user
**Key features:**
- Filters users who already have cached placeholders in Redis
- `paidOnly` flag to scope to subscribed users
- `dryRun` mode for statistics
- Fan-out for large user batches (`CHUNK_SIZE=20`)
**Layer 3 shape:**
```typescript
export const { POST } = serve<GenerateUserPlaceholderPayload>(async (context) => {
const { userId } = context.requestPayload ?? {};
const workflow = new WelcomePlaceholderWorkflow(db, userId);
const placeholders = await context.run('generate', () => workflow.generate());
return { success: true, userId, placeholdersCount: placeholders.length };
});
```
**Files:**
- `/api/workflows/welcome-placeholder/process-users/route.ts`
- `/api/workflows/welcome-placeholder/paginate-users/route.ts`
- `/api/workflows/welcome-placeholder/generate-user/route.ts`
- `/server/workflows/welcomePlaceholder/index.ts`
---
## Example 2: Agent Welcome
**Use case:** Generate welcome messages and open questions for AI agents.
**Structure:**
- Layer 1: `process-agents` — entry point, checks eligible agents
- Layer 2: `paginate-agents` — paginates through active agents
- Layer 3: `generate-agent` — generates welcome data for ONE agent
**Key features:**
- Filters agents who already have cached data in Redis
- `paidOnly` flag for subscribed users' agents only
- `dryRun` mode for statistics
- Fan-out for large agent batches (`CHUNK_SIZE=20`)
**Layer 3 shape:**
```typescript
export const { POST } = serve<GenerateAgentWelcomePayload>(async (context) => {
const { agentId } = context.requestPayload ?? {};
const workflow = new AgentWelcomeWorkflow(db, agentId);
const data = await context.run('generate', () => workflow.generate());
return { success: true, agentId, data };
});
```
**Files:**
- `/api/workflows/agent-welcome/process-agents/route.ts`
- `/api/workflows/agent-welcome/paginate-agents/route.ts`
- `/api/workflows/agent-welcome/generate-agent/route.ts`
- `/server/workflows/agentWelcome/index.ts`
---
## What's identical, what differs
Both workflows are the **same pattern** — they only differ in:
- Entity type (users vs agents)
- Business logic (placeholder generation vs welcome generation)
- Data source (different database queries)
Everything else — the 3-layer split, dry-run handling, fan-out, filter-existing, flowControl tuning — is identical. That's the whole point: once you internalize the pattern, adding a new workflow is mostly entity-substitution.
@@ -1,333 +0,0 @@
# Implementation Patterns
Full code templates for the 3-layer architecture. Read this when actually writing workflow files.
## Table of Contents
1. [Workflow Class](#workflow-class) — `src/server/workflows/{workflowName}/index.ts`
2. [Layer 1: Entry Point](#layer-1-entry-point-process-) — `process-*` route
3. [Layer 2: Pagination](#layer-2-pagination-paginate-) — `paginate-*` route
4. [Layer 3: Execution](#layer-3-execution-execute--generate-) — `execute-*` / `generate-*` route
---
## Workflow Class
**Location:** `src/server/workflows/{workflowName}/index.ts`
```typescript
import { Client } from '@upstash/workflow';
import debug from 'debug';
const log = debug('lobe-server:workflows:{workflow-name}');
// Workflow paths
const WORKFLOW_PATHS = {
processItems: '/api/workflows/{workflow-name}/process-items',
paginateItems: '/api/workflows/{workflow-name}/paginate-items',
executeItem: '/api/workflows/{workflow-name}/execute-item',
} as const;
// Payload types
export interface ProcessItemsPayload {
dryRun?: boolean;
force?: boolean;
}
export interface PaginateItemsPayload {
cursor?: string;
itemIds?: string[]; // For fanout chunks
}
export interface ExecuteItemPayload {
itemId: string;
}
const getWorkflowUrl = (path: string): string => {
const baseUrl = process.env.APP_URL;
if (!baseUrl) throw new Error('APP_URL is required to trigger workflows');
return new URL(path, baseUrl).toString();
};
const getWorkflowClient = (): Client => {
const token = process.env.QSTASH_TOKEN;
if (!token) throw new Error('QSTASH_TOKEN is required to trigger workflows');
const config: ConstructorParameters<typeof Client>[0] = { token };
if (process.env.QSTASH_URL) {
(config as Record<string, unknown>).url = process.env.QSTASH_URL;
}
return new Client(config);
};
export class {WorkflowName}Workflow {
private static client: Client;
private static getClient(): Client {
if (!this.client) this.client = getWorkflowClient();
return this.client;
}
static triggerProcessItems(payload: ProcessItemsPayload) {
const url = getWorkflowUrl(WORKFLOW_PATHS.processItems);
log('Triggering process-items workflow');
return this.getClient().trigger({ body: payload, url });
}
static triggerPaginateItems(payload: PaginateItemsPayload) {
const url = getWorkflowUrl(WORKFLOW_PATHS.paginateItems);
log('Triggering paginate-items workflow');
return this.getClient().trigger({ body: payload, url });
}
static triggerExecuteItem(payload: ExecuteItemPayload) {
const url = getWorkflowUrl(WORKFLOW_PATHS.executeItem);
log('Triggering execute-item workflow: %s', payload.itemId);
return this.getClient().trigger({ body: payload, url });
}
/**
* Filter items that need processing (e.g. check Redis cache, database state).
* Return only the ones that actually need work — keeps the pipeline idempotent.
*/
static async filterItemsNeedingProcessing(itemIds: string[]): Promise<string[]> {
if (itemIds.length === 0) return [];
// Check existing state and return items that need processing
return itemIds;
}
}
```
---
## Layer 1: Entry Point (process-\*)
**Purpose:** Validates prerequisites, calculates statistics, supports dry-run mode.
```typescript
import { serve } from '@upstash/workflow/nextjs';
import { getServerDB } from '@/database/server';
import { WorkflowClass, type ProcessPayload } from '@/server/workflows/{workflowName}';
export const { POST } = serve<ProcessPayload>(
async (context) => {
const { dryRun, force } = context.requestPayload ?? {};
console.log('[{workflow}:process] Starting with payload:', { dryRun, force });
const allItemIds = await context.run('{workflow}:get-all-items', async () => {
const db = await getServerDB();
// Query database for eligible items
return items.map((item) => item.id);
});
console.log('[{workflow}:process] Total eligible items:', allItemIds.length);
if (allItemIds.length === 0) {
return { success: true, totalEligible: 0, message: 'No eligible items found' };
}
const itemsNeedingProcessing = await context.run('{workflow}:filter-existing', () =>
WorkflowClass.filterItemsNeedingProcessing(allItemIds),
);
const result = {
success: true,
totalEligible: allItemIds.length,
toProcess: itemsNeedingProcessing.length,
alreadyProcessed: allItemIds.length - itemsNeedingProcessing.length,
};
// Dry-run short-circuits before any side effects
if (dryRun) {
console.log('[{workflow}:process] Dry run mode, returning statistics only');
return {
...result,
dryRun: true,
message: `[DryRun] Would process ${itemsNeedingProcessing.length} items`,
};
}
if (itemsNeedingProcessing.length === 0) {
return { ...result, message: 'All items already processed' };
}
await context.run('{workflow}:trigger-paginate', () => WorkflowClass.triggerPaginateItems({}));
return {
...result,
message: `Triggered pagination for ${itemsNeedingProcessing.length} items`,
};
},
{
flowControl: {
key: '{workflow}.process',
parallelism: 1, // single instance — avoids duplicate processing
ratePerSecond: 1,
},
},
);
```
---
## Layer 2: Pagination (paginate-\*)
**Purpose:** Handles cursor-based pagination, implements fan-out for large batches.
```typescript
import { serve } from '@upstash/workflow/nextjs';
import { chunk } from 'es-toolkit/compat';
import { getServerDB } from '@/database/server';
import { WorkflowClass, type PaginatePayload } from '@/server/workflows/{workflowName}';
const PAGE_SIZE = 50;
const CHUNK_SIZE = 20;
export const { POST } = serve<PaginatePayload>(
async (context) => {
const { cursor, itemIds: payloadItemIds } = context.requestPayload ?? {};
console.log('[{workflow}:paginate] Starting:', {
cursor,
itemIdsCount: payloadItemIds?.length ?? 0,
});
// If specific itemIds were passed in (from a fanout chunk), process them directly
if (payloadItemIds && payloadItemIds.length > 0) {
await Promise.all(
payloadItemIds.map((itemId) =>
context.run(`{workflow}:execute:${itemId}`, () =>
WorkflowClass.triggerExecuteItem({ itemId }),
),
),
);
return { success: true, processedItems: payloadItemIds.length };
}
// Paginate through all items
const itemBatch = await context.run('{workflow}:get-batch', async () => {
const db = await getServerDB();
const items = await db.query(...);
if (!items.length) return { ids: [] };
const last = items.at(-1);
return {
ids: items.map((item) => item.id),
cursor: last ? last.id : undefined,
};
});
const batchItemIds = itemBatch.ids;
const nextCursor = 'cursor' in itemBatch ? itemBatch.cursor : undefined;
if (batchItemIds.length === 0) {
return { success: true, message: 'Pagination complete' };
}
const itemIds = await context.run('{workflow}:filter-existing', () =>
WorkflowClass.filterItemsNeedingProcessing(batchItemIds),
);
if (itemIds.length > 0) {
if (itemIds.length > CHUNK_SIZE) {
// Fan out — recursively re-enter pagination with each chunk
const chunks = chunk(itemIds, CHUNK_SIZE);
console.log('[{workflow}:paginate] Fanout mode:', {
chunks: chunks.length,
chunkSize: CHUNK_SIZE,
});
await Promise.all(
chunks.map((ids, idx) =>
context.run(`{workflow}:fanout:${idx + 1}/${chunks.length}`, () =>
WorkflowClass.triggerPaginateItems({ itemIds: ids }),
),
),
);
} else {
// Process this page directly
await Promise.all(
itemIds.map((itemId) =>
context.run(`{workflow}:execute:${itemId}`, () =>
WorkflowClass.triggerExecuteItem({ itemId }),
),
),
);
}
}
// Tail-call into the next page
if (nextCursor) {
await context.run('{workflow}:next-page', () =>
WorkflowClass.triggerPaginateItems({ cursor: nextCursor }),
);
}
return {
success: true,
processedItems: itemIds.length,
skippedItems: batchItemIds.length - itemIds.length,
nextCursor: nextCursor ?? null,
};
},
{
flowControl: {
key: '{workflow}.paginate',
parallelism: 20,
ratePerSecond: 5,
},
},
);
```
---
## Layer 3: Execution (execute-\* / generate-\*)
**Purpose:** Performs the actual business logic for exactly ONE item.
```typescript
import { serve } from '@upstash/workflow/nextjs';
import { getServerDB } from '@/database/server';
import { WorkflowClass, type ExecutePayload } from '@/server/workflows/{workflowName}';
export const { POST } = serve<ExecutePayload>(
async (context) => {
const { itemId } = context.requestPayload ?? {};
if (!itemId) {
return { success: false, error: 'Missing itemId' };
}
const db = await getServerDB();
const item = await context.run('{workflow}:get-item', async () => {
// Query database for item
return item;
});
if (!item) {
return { success: false, error: 'Item not found' };
}
const result = await context.run('{workflow}:process-item', async () => {
const workflow = new WorkflowClass(db, itemId);
return workflow.generate(); // or process(), execute(), etc.
});
await context.run('{workflow}:save-result', async () => {
const workflow = new WorkflowClass(db, itemId);
return workflow.saveToRedis(result); // or saveToDatabase(), etc.
});
return { success: true, itemId, result };
},
{
flowControl: {
key: '{workflow}.execute',
parallelism: 10,
ratePerSecond: 5,
},
},
);
```
+107 -48
View File
@@ -1,57 +1,91 @@
---
name: version-release
description: 'Version release workflow — release process and GitHub Release notes (not docs/changelog pages).'
disable-model-invocation: true
argument-hint: '[minor|patch] [version?]'
description: "Version release workflow. Use when the user mentions 'release', 'hotfix', 'version upgrade', 'weekly release', or '发版'/'发布'/'小班车'. Provides guides for Minor Release and Patch Release workflows."
---
# Version Release Workflow
This skill is a router. The detailed steps live in `references/`.
## Scope Boundary (Important)
This skill is only for:
1. Release branch / PR workflow
2. CI trigger constraints (`auto-tag-release.yml`)
3. GitHub Release note writing
This skill is **not** for writing `docs/changelog/*.mdx`.\
If the user asks for website changelog pages, load `../docs-changelog/SKILL.md`.
## Mandatory Companion Skill
For every `/version-release` execution, you MUST load and apply:
- `../microcopy/SKILL.md`
## Overview
The primary development branch is **canary**. All day-to-day development happens on canary. When releasing, canary is merged into main. After merge, `auto-tag-release.yml` automatically handles tagging, version bumping, creating a GitHub Release, and syncing back to the canary branch.
Only two release types are used in practice (major releases are extremely rare and can be ignored):
| Type | Use Case | Frequency | Source Branch | PR Title Format | Version | Reference |
| ----- | ---------------------------------------------- | --------------------- | -------------- | ------------------------------------ | ------------- | --------------------------------------- |
| Minor | Feature iteration release | \~Every 4 weeks | canary | `🚀 release: v{x.y.0}` | Manually set | `references/minor-release.md` |
| Patch | Weekly release / hotfix / model / DB migration | \~Weekly or as needed | canary or main | Custom (e.g. `🚀 release: 20260222`) | Auto patch +1 | `references/patch-release-scenarios.md` |
| Type | Use Case | Frequency | Source Branch | PR Title Format | Version |
| ----- | ---------------------------------------------- | --------------------- | -------------- | ------------------------------------ | ------------- |
| Minor | Feature iteration release | \~Every 4 weeks | canary | `🚀 release: v{x.y.0}` | Manually set |
| Patch | Weekly release / hotfix / model / DB migration | \~Weekly or as needed | canary or main | Custom (e.g. `🚀 release: 20260222`) | Auto patch +1 |
For writing the release-note body (any release type), see `references/release-notes-style.md`.
## Minor Release Workflow
## Auto-Release Trigger Rules (`auto-tag-release.yml`)
Used to publish a new minor version (e.g. v2.2.0), roughly every 4 weeks.
### Steps
1. **Create a release branch from canary**
```bash
git checkout canary
git pull origin canary
git checkout -b release/v{version}
git push -u origin release/v{version}
```
2. **Determine the version number** — Read the current version from `package.json` and compute the next minor version (e.g. 2.1.x → 2.2.0)
3. **Create a PR to main**
```bash
gh pr create \
--title "🚀 release: v{version}" \
--base main \
--head release/v{version} \
--body "## 📦 Release v{version} ..."
```
> \[!IMPORTANT]: The PR title must strictly match the `🚀 release: v{x.y.z}` format. CI uses a regex on this title to determine the exact version number.
4. **Automatic trigger after merge**: auto-tag-release detects the title format and uses the version number from the title to complete the release.
### Scripts
```bash
bun run release:branch # Interactive
bun run release:branch --minor # Directly specify minor
```
## Patch Release Workflow
Version number is automatically bumped by patch +1. There are 4 common scenarios:
| Scenario | Source Branch | Branch Naming | Description |
| ------------------- | ------------- | ----------------------------- | ------------------------------------------------ |
| Weekly Release | canary | `release/weekly-{YYYYMMDD}` | Weekly release train, canary → main |
| Bug Hotfix | main | `hotfix/v{version}-{hash}` | Emergency bug fix |
| New Model Launch | canary | Community PR merged directly | New model launch, triggered by PR title prefix |
| DB Schema Migration | main | `release/db-migration-{name}` | Database migration, requires dedicated changelog |
All scenarios auto-bump patch +1. Patch PR titles do not need a version number. See `reference/patch-release-scenarios.md` for detailed steps per scenario.
### Scripts
```bash
bun run hotfix:branch # Hotfix scenario
```
## Auto-Release Trigger Rules (auto-tag-release.yml)
After a PR is merged into main, CI determines whether to release based on the following priority:
### 1. Minor Release (Exact Version)
PR title matches `🚀 release: v{x.y.z}` -> uses the version number from the title.
PR title matches `🚀 release: v{x.y.z}` uses the version number from the title.
### 2. Patch Release (Auto patch +1)
Triggered by the following priority:
- **Branch name match**: `hotfix/*` or `release/*` -> triggers directly (skips title detection)
- **Branch name match**: `hotfix/*` or `release/*` triggers directly (skips title detection)
- **Title prefix match**: PRs with the following title prefixes will trigger:
- `style` / `💄 style`
- `feat` / `✨ feat`
@@ -62,39 +96,64 @@ Triggered by the following priority:
### 3. No Trigger
PRs that don't match any conditions above (e.g. `docs`, `chore`, `ci`, `test`) will not trigger a release when merged into main.
PRs that don't match any of the above conditions (e.g. `docs`, `chore`, `ci`, `test` prefixes) will not trigger a release when merged into main.
## Post-Release Automated Actions
1. **Bump `package.json`** — commits `🔖 chore(release): release version v{x.y.z} [skip ci]`
1. **Bump package.json** — commits `🔖 chore(release): release version v{x.y.z} [skip ci]`
2. **Create annotated tag**`v{x.y.z}`
3. **Create GitHub Release**
4. **Dispatch `sync-main-to-canary`** — syncs main back to canary
4. **Dispatch sync-main-to-canary** — syncs main back to the canary branch
## Agent Action Guide
## Claude Action Guide
When the user requests a release:
### Precheck (applies to all release types)
### Minor Release
1. Read `package.json` to get the current version and compute the next minor version
2. Create a `release/v{version}` branch from canary
3. Push and create a PR — **title must be `🚀 release: v{version}`**
4. Inform the user that merging the PR will automatically trigger the release
### Precheck
Before creating the release branch, verify the source branch:
- **Weekly Release** (`release/weekly-*`): must branch from `canary`
- **All other release/hotfix branches**: must branch from `main`; run `git merge-base --is-ancestor main <branch> && echo OK`
- If the branch is based on the wrong source, recreate from the correct base
- **All other release/hotfix branches**: must branch from `main` run `git merge-base --is-ancestor main <branch> && echo OK` to confirm
- If the branch is based on the wrong source, delete and recreate from the correct base
### Routing
### Patch Release
Pick the right reference and follow it end-to-end:
Choose the appropriate workflow based on the scenario (see `reference/patch-release-scenarios.md`):
- **Minor release** → `references/minor-release.md`
- **Patch release** (weekly / hotfix / model launch / DB migration) → `references/patch-release-scenarios.md`
- **Writing the PR body / release notes** (any release type) → `references/release-notes-style.md`
- **Weekly Release**: Create a `release/weekly-{YYYYMMDD}` branch from canary, scan `git log main..canary` to write the changelog, title like `🚀 release: 20260222`
- **Bug Hotfix**: Create a `hotfix/` branch from main, use a gitmoji prefix title (e.g. `🐛 fix: ...`)
- **New Model Launch**: Community PRs trigger automatically via title prefix (`feat` / `style`), no extra steps needed
- **DB Migration**: Create a `release/db-migration-{name}` branch from main, cherry-pick migration commits, write a dedicated migration changelog
### Hard Rules (apply to every release type)
### Important Notes
- **Do NOT** manually modify `package.json` version — CI handles it.
- **Do NOT** manually create tags — CI handles them.
- Minor PR title format is strict (`🚀 release: v{x.y.z}`).
- Patch PRs do not need an explicit version number.
- Keep release facts accurate; do not invent metrics or availability statements. Release-note inputs (compare base, PR refs, contributor list) **must be derived from `git`** per `references/release-notes-style.md` § Computing Inputs — never from memory or descriptions.
- **Do NOT manually modify the version in package.json** — CI will auto-bump it
- **Do NOT manually create tags** — CI will create them automatically
- The Minor Release PR title format is a hard requirement — incorrect format will not use the specified version number
- Patch PRs do not need a version number — CI auto-bumps patch +1
- All release PRs must include a user-facing changelog
## Changelog Writing Guidelines
All release PR bodies (both Minor and Patch) must include a user-facing changelog. Scan changes via `git log main..canary --oneline` or `git diff main...canary --stat`, then write following the format below.
### Format Reference
- Weekly Release: See `reference/changelog-example/weekly-release.md`
- DB Migration: See `reference/changelog-example/db-migration.md`
### Writing Tips
- **User-facing**: Describe changes that users can perceive, not internal implementation details
- **Clear categories**: Group by features, models/providers, desktop, stability/fixes, etc.
- **Highlight key items**: Use `**bold**` for important feature names
- **Credit contributors**: Collect all committers via `git log` and list alphabetically
- **Flexible categories**: Choose categories based on actual changes — no need to force-fit all categories
@@ -0,0 +1,20 @@
# DB Schema Migration Changelog Example
A changelog reference for database migration release PR bodies.
---
This release includes a **database schema migration** involving **5 new tables** for the Agent Evaluation Benchmark system.
### Migration: Add Agent Evaluation Benchmark Tables
- Added 5 new tables: `agent_eval_benchmarks`, `agent_eval_datasets`, `agent_eval_records`, `agent_eval_runs`, `agent_eval_run_topics`
### Notes for Self-hosted Users
- The migration runs automatically on application startup
- No manual intervention required
The migration owner: @{pr-author} — responsible for this database schema change, reach out for any migration-related issues.
> **Note for Claude**: Replace `{pr-author}` with the actual PR author. Retrieve via `gh pr view <number> --json author --jq '.author.login'` or `git log` commit author. Do NOT hardcode a username.
@@ -0,0 +1,46 @@
# Patch Release (Weekly) Changelog Example
A real-world changelog reference for weekly patch release PR bodies.
---
This release includes **82 commits** , Key updates are below.
### New Features and Enhancements
- Added **Agent Benchmark** support for more systematic agent performance evaluation.
- Introduced the **video generation** feature end-to-end, including entry points, sidebar "new" badge support, and skeleton loading for topic switching.
- Expanded memory capabilities: support for memory effort/tool permission configuration and improved timeout calculation for memory analysis tasks.
- Added desktop editor support for image upload via file picker.
### Models and Provider Expansion
- Added a new provider: **Straico**.
- Added/updated support for:
- Claude Sonnet 4.6
- Gemini 3.1 Pro Preview
- Qwen3.5 series
- Grok Imagine (`grok-imagine-image`)
- MiniMax 2.5
- Added related i18n copy and model parameter adaptations.
### Desktop Improvements
- Integrated `electron-liquid-glass` (macOS Tahoe).
- Improved DMG background assets and desktop release workflow.
### Stability, Security, and UX Fixes
- Fixed multiple video generation pipeline issues: precharge refund handling, webhook token verification, pricing parameter usage, asset cleanup, and type safety.
- Fixed `sanitizeFileName` path traversal risks and added unit tests.
- Fixed MCP media URL generation with duplicated `APP_URL` prefix.
- Fixed Qwen3 embedding failures caused by batch-size limits.
- Fixed multiple UI/interaction issues, including mobile header agent selector/topic count, ChatInput scrolling behavior, and tooltip stacking context.
- Fixed missing `@napi-rs/canvas` native bindings in Docker standalone builds.
- Improved GitHub Copilot authentication retry behavior and response error handling in edge cases.
### Credits
Huge thanks to these contributors (alphabetical):
@AmAzing129 @Coooolfan @Innei @ONLY-yours @Zhouguanyang @arvinxx @eaten-cake @hezhijie0327 @nekomeowww @rdmclin2 @rivertwilight @sxjeru @tjx666
@@ -21,16 +21,12 @@ git push -u origin release/weekly-{YYYYMMDD}
2. **Scan changes and write changelog**
Compute the previous tag from main first — never reuse the last weekly's tag, since hotfixes published in between will be missed:
```bash
git fetch origin main canary --tags
PREV_TAG=$(git describe --tags --abbrev=0 origin/main --match 'v*.*.*' --exclude '*-canary*' --exclude '*-nightly*')
git log "$PREV_TAG..origin/release/weekly-{YYYYMMDD}" --oneline --no-merges
git diff "$PREV_TAG...origin/release/weekly-{YYYYMMDD}" --stat
git log main..canary --oneline
git diff main...canary --stat
```
Then follow `./release-notes-style.md` § **Computing Inputs (Hard Rules)** to derive PR refs, metrics, and contributors. Every `(#XXXX)` in the body must come from actual commit subjects in this range — never inferred from descriptions.
Write a user-facing changelog following the format in `patch-release-changelog-example.md`.
3. **Create PR to main** with the changelog as the PR body
@@ -63,10 +59,7 @@ git push -u origin hotfix/v{version}-{short-hash}
2. **Create PR to main** with a gitmoji prefix title (e.g. `🐛 fix: description`)
3. **Write a short hotfix changelog** — See `changelog-example/hotfix.md`. Keep it minimal: scope line, 1-3 fix bullets (symptom + fix in one sentence), upgrade note, owner. No long root-cause section — that lives in the commit message.
- **Hotfix owner**: Use the actual PR author (retrieve via `gh pr view <number> --json author --jq '.author.login'`), never hardcode a username.
4. **After merge**: auto-tag-release detects `hotfix/*` branch → auto patch +1.
3. **After merge**: auto-tag-release detects `hotfix/*` branch → auto patch +1.
### Script

Some files were not shown because too many files have changed in this diff Show More