mirror of
https://github.com/lobehub/lobe-chat.git
synced 2026-06-13 19:20:04 +00:00
✨ feat(verify): Agent Run delivery checker system (#15489)
* 🗃️ feat(database): add verify system tables for agent run delivery checker

Implement the database layer for the Agent Run delivery checker (Verify System).

Reuse / definition layer:
- verify_criteria: a single reusable pass/fail standard (atomic unit), carrying its verifier config + onFail default and bound to a document for judging guidance (iteration history reuses document_history; no version columns)
- verify_rubrics: a named group that aggregates criteria — the reusable unit
- verify_rubric_criteria: junction, which criteria a rubric aggregates (criteria are reusable across rubrics)

Mounted onto an agent via the existing agency config jsonb:
- agencyConfig.verifyRubricId: a reusable rubric (criteria template)
- agencyConfig.verifyCriteriaIds: ad-hoc one-off criteria

A run's plan instantiates the union of both. No dedicated bindings table.

Snapshot + result layer:
- agent_operations.verify_plan (jsonb) + verify_plan_confirmed_at: the per-run immutable check-item snapshot lives ON the operation (1:1 — auto-repair spawns a new operation), instead of a separate plans table
- agent_operations.verify_status: denormalized rollup for list-page badges
- verify_check_results: per-criterion result with the Toulmin model (verdict/confidence as columns, narrative in a typed toulmin jsonb), N:1 verifier_tracing_id for batch judging, FP/FN flags for the data flywheel; relates to the plan via operation_id + stable check_item_id

Ref: LOBE-10019

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* ✨ feat(verify): add Agent Run delivery checker backend + frontend module

Implements the verify system on top of the schema (PR #15480):
- models: verifyCriterion / verifyRubric (+junction) / verifyCheckResult; agentOperation verify plan/status methods
- services/verify: AI plan generation (auto-create criteria), executor with LLM Toulmin judge (per-criterion + batch), program placeholder, agent & auto-repair spawner seams, rollup chokepoint, feedback fp/fn, completion lifecycle bridge
- lambda verify router (criteria/rubric CRUD, plan, results, feedback)
- frontend feature module: service, SWR hooks, CheckerDock state machine, RunArtifact, verify i18n namespace
- tracing scenarios: VerifyPlanGen / VerifyJudge

Live UI mount (dock/artifact into chat) pending server operationId source.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* 🐛 fix(verify): persist delivery-checker verdicts via async tracing backfill

The LLM judge produced valid verdicts but they were never persisted, leaving every run stuck at `verifying`. Two root causes:

1. FK ordering: `writeVerdict` stamped `verifier_tracing_id` synchronously, but the `llm_generation_tracing` row is written asynchronously (best-effort, after the response) — so the hard FK was violated every time and the verdict write was rolled back. Now the verdict is written with a null link, and the tracing id is backfilled by an `onPersisted` callback that fires only after the tracing row commits (still non-blocking). If tracing is disabled the link simply stays null.

2. Verdict parse: the judge JSON schema is non-strict, so the provider returns optional Toulmin fields as explicit `null`. The Zod validator used `.optional()` (accepts undefined, not null), so any null failed the whole `safeParse` and discarded the batch. Switched to `.nullish()`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(cli): add `verify` command for the delivery checker

Adds `lh verify` covering the full delivery-checker chain — criteria & rubric CRUD, per-run plan (generate/state/confirm/skip), execute (LLM judge), results, and feedback — calling the `verify` lambda router. Enables end-to-end backend testing of the verify system.

Also adds the missing `tool-runtime` / `prompts` / `const` workspace entries to the CLI's `pnpm-workspace.yaml` so the standalone package installs.
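The FK-ordering fix in the verdict-persistence entry above comes down to: insert the verdict row without its foreign-key link, and set the link only from a callback that runs after the referenced tracing row exists. The sketch below models that ordering in memory. `TracingStore`, `VerdictStore`, `linkTracing`, and this `writeVerdict` signature are hypothetical stand-ins, not the project's models, and the FK check is simulated by hand rather than enforced by a database.

```typescript
// Hypothetical in-memory stand-ins for the tracing table and the verdict table.
// The foreign key verdict.verifierTracingId -> tracing row id is checked by hand.
type VerdictValue = 'pass' | 'fail';

interface VerdictRow {
  checkItemId: string;
  verdict: VerdictValue;
  verifierTracingId: string | null;
}

class TracingStore {
  private committed = new Set<string>();
  private waiting = new Map<string, Array<() => void>>();

  has(id: string): boolean {
    return this.committed.has(id);
  }

  // Run `cb` once tracing row `id` has committed (immediately if it already has).
  onPersisted(id: string, cb: () => void): void {
    if (this.committed.has(id)) {
      cb();
      return;
    }
    const list = this.waiting.get(id) ?? [];
    list.push(cb);
    this.waiting.set(id, list);
  }

  // Simulates the asynchronous, best-effort tracing write landing after the response.
  commit(id: string): void {
    this.committed.add(id);
    for (const cb of this.waiting.get(id) ?? []) cb();
    this.waiting.delete(id);
  }
}

class VerdictStore {
  readonly rows = new Map<string, VerdictRow>();
  private tracing: TracingStore;

  constructor(tracing: TracingStore) {
    this.tracing = tracing;
  }

  private checkFk(tracingId: string | null): void {
    if (tracingId !== null && !this.tracing.has(tracingId)) {
      throw new Error(`FK violation: tracing row ${tracingId} does not exist yet`);
    }
  }

  insert(row: VerdictRow): void {
    this.checkFk(row.verifierTracingId);
    this.rows.set(row.checkItemId, { ...row });
  }

  linkTracing(checkItemId: string, tracingId: string): void {
    this.checkFk(tracingId);
    const row = this.rows.get(checkItemId);
    if (row) row.verifierTracingId = tracingId;
  }
}

// Fixed ordering: persist the verdict with a null link first, then backfill the
// link from the tracing row's commit callback.
function writeVerdict(
  verdicts: VerdictStore,
  tracing: TracingStore,
  checkItemId: string,
  verdict: VerdictValue,
  tracingId?: string,
): void {
  verdicts.insert({ checkItemId, verdict, verifierTracingId: null });
  if (tracingId === undefined) return; // tracing disabled: the link stays null
  const id = tracingId;
  tracing.onPersisted(id, () => verdicts.linkTracing(checkItemId, id));
}
```

Stamping the id synchronously corresponds to calling `insert` with a non-null `verifierTracingId` before `commit`, which throws here just as the hard FK rejected the real write.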
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(verify): add verify message role + delivery-checker card UI

Make the delivery-checker renderable in chat:
- Fix the `features/Verify` components so they compile: flatten the `verify` locale to the repo's flat-dotted-key convention (keySeparator: false), import `Flexbox`/`TextArea` from `@lobehub/ui` (react-layout-kit is no longer a dep), and the token cast.
- Add a `verify` UI message role + a `VerifyMessage` card that renders the Run Artifact + checker dock from `metadata.verifyOperationId`, wired into the message renderer switch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): add lobe-agent `generateVerifyPlan` tool (server runtime)

Lets an agent set up the delivery checker for its run: the agent calls `generateVerifyPlan` early (per the new `<delivery_checker>` system-role guidance), which instantiates the rubric / ad-hoc criteria into a frozen plan on the current `agent_operations` row. Executed server-side only — the executor is dispatched via `runtime[apiName]` with `operationId` threaded through the tool execution context; the client `BaseExecutor` gracefully no-ops it.

Also registers the metadata fields (`verifyOperationId`/`verifyRound`) on the message metadata zod schema so the role='verify' card can carry its operation id.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): surface role=verify card on run completion (LOBE-10051)

Connect the delivery checker to the conversation: when an Agent Run with a verify plan completes, `CompletionLifecycle` inserts a persisted `role='verify'` message (parented to the assistant, carrying `metadata.verifyOperationId`) that renders the checker card. Self-guarded — no plan → no card, failures never affect the run.
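The frozen plan that `generateVerifyPlan` writes onto the operation is, per the schema entry at the top, the union of the mounted rubric's criteria and the ad-hoc `verifyCriteriaIds`. A minimal sketch of building such a snapshot follows. The `Criterion` and `PlanItem` shapes, the `checkItemId` scheme, and de-duplicating by criterion id with rubric order first are all assumptions made for illustration.

```typescript
// Hypothetical shapes; the real criterion and plan records carry more fields.
interface Criterion {
  id: string;
  title: string;
  required: boolean;
  verifierType: 'agent' | 'llm' | 'program';
}

interface PlanItem {
  readonly checkItemId: string; // stable id that check results point back to
  readonly criterionId: string;
  readonly title: string;
  readonly required: boolean;
  readonly verifierType: Criterion['verifierType'];
}

// Union of the rubric's criteria and the ad-hoc criteria, de-duplicated by
// criterion id (rubric order first), copied into a frozen snapshot.
function buildVerifyPlan(rubricCriteria: Criterion[], adHocCriteria: Criterion[]): readonly PlanItem[] {
  const seen = new Set<string>();
  const items: PlanItem[] = [];
  for (const c of [...rubricCriteria, ...adHocCriteria]) {
    if (seen.has(c.id)) continue;
    seen.add(c.id);
    // Copy the fields instead of keeping a reference, so editing the criterion
    // later does not change this run's plan.
    items.push(
      Object.freeze({
        checkItemId: `${c.id}#${items.length}`,
        criterionId: c.id,
        required: c.required,
        title: c.title,
        verifierType: c.verifierType,
      }),
    );
  }
  return Object.freeze(items);
}
```

Copying fields into the snapshot is what lets later edits to a reusable criterion leave already-running plans alone.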
`role='verify'` behaves like a `user` leaf message everywhere it flows (persistence + conversation-flow pass it through unchanged); only the context-engine treats it specially: a new `VerifyMessageProcessor` drops it from the model context (UI-only card, not a valid model role). Adds `verify` to `CreateMessageRoleType`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(verify): merge run-artifact + checker into one card

The role=verify message rendered two stacked cards (Run Artifact summary + Delivery Checker) that duplicated the check-item list. Merge into a single card: the `Run Artifact · Round N` header, then the checker results + actions, then the snapshot note. RunArtifact/CheckerDock gain an `embedded` prop (header-only / body-only, no card chrome) and VerifyMessage composes them under one border.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): derive generateVerifyPlan rubric from agencyConfig

A real agent calls `generateVerifyPlan` with just a `goal` and doesn't know rubric ids. When `rubricId`/`criteriaIds` params are absent, derive the mounted rubric + ad-hoc criteria from the executing agent's `agencyConfig.verifyRubricId / verifyCriteriaIds`. Params still win when given.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(cli): surface agent gateway WebSocket close code + reason

The `onclose` handler logged `String(event)` → the useless "[object CloseEvent]". Surface `event.code` (+ `event.reason` when present) so a gateway disconnect before completion is actually diagnosable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 fix(verify): rename "Run Artifact" → "Verification", drop failed red border

- The kicker said "Run Artifact" — it's automated verification, not an artifact. Renamed to "Verification · Round N".
- Removed the red error border on a failed check — a normal card reads better.
- Fixes a render crash (`useVerifyState is not defined`): the border removal left a dangling reference after the import was dropped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(cli): poll run status when the agent stream drops

When the live stream (gateway WebSocket / SSE) closes before the run finishes, the run is still executing server-side — so instead of hard-exiting, fall back to polling `aiAgent.getOperationStatus` every 10s until the run reaches a terminal state (or is no longer tracked). Pairs with surfacing the WS close code/reason.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 💄 feat(verify): add Render for generateVerifyPlan tool call

The generateVerifyPlan tool call rendered as the default param/result dump. Add a Render that lists the generated delivery checks (title + gate/auto-fill tag), and surface the items on the tool state so the Render can read them.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): auto-confirm generated plan so checks run on completion

The agent generated a plan but it stayed `planned`/unconfirmed, so the completion hook (which gates on a confirmed plan) never ran the checks — the card was stuck at "awaiting confirmation" with no pass/fail. In the headless agent flow there's no one to click Confirm, so `generateVerifyPlan` now auto-confirms the plan it generates; the checks then run automatically on completion. (An interactive "review before run" gate is a future enhancement.)

Also: the verify card header disappeared in the draft/planned phase (`phaseToArtifact.draft` was null). Give it a header so the card always shows its "Verification · Round N" heading.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(agent-tracing): only count opaque/presentational attrs as structural noise

The first structuralNoiseRatio charged ALL markup (every <...> tag) as noise, which over-penalized legitimately structured results 3x. Grounding against real web-search output (`<item title="…" url="…">snippet</item>`) showed the tags and the title=/url= attributes ARE the signal the model reads. Now only opaque/presentational attribute names (id, class, style, data-*, aria-*, role, on*) count as noise; semantic element tags and content-bearing attributes (title, url, href, name…) are kept.

On a 57-op user-interrupted sample this drops web-search noise 42%→0% and overall estimated waste 16%→5%, leaving large-payload (readDocument) and high error-rate tools as the real signal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): model-authored criteria with name/description/instruction-in-document + agent verifier

Restructure the generateVerifyPlan tool to a createDocument-style full-create flow and wire up the agent verifier path:
- criteria now = title + description (required one-liner) + instruction (required detailed rubric); instruction lives in a linked document (verify_criteria.documentId), description is a new verify_criteria column (migration 0111). verifierConfig no longer holds description/instruction.
- generateVerifyPlan creates verify_criteria + a rubric, snapshots the plan onto the operation and confirms it; judge resolves the instruction from the document.
- agent-type checks run as verifier sub-agents (execAgent + isolated thread) whose onComplete hook parses a VERDICT and writes it back to verify_check_results (renamed AgentVerifierSpawner → VerifierAgentRunner).
- UI: custom Inspector for the tool header; check list shows per-verifier-type icons (llm/agent/program) + description + required/optional tag; i18n en/zh.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ⚡️ perf(verify): run program/llm/agent checks concurrently on completion

The three verifier kinds are independent; previously the agent spawn waited for the batched LLM judge to finish. Run them via Promise.all so agent sub-agents start immediately alongside the LLM batch.
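The `perf(verify)` change above amounts to grouping the plan's checks by verifier kind and starting all three runners before awaiting any of them. A minimal sketch follows; the `Runner` signature and the `Check`/`CheckResult` shapes are hypothetical, chosen only to show the `Promise.all` pattern.

```typescript
type VerifierType = 'agent' | 'llm' | 'program';

interface Check {
  id: string;
  verifierType: VerifierType;
}

interface CheckResult {
  checkId: string;
  verdict: 'pass' | 'fail';
}

// Hypothetical runner signature: one runner per verifier kind, each handling a batch.
type Runner = (checks: Check[]) => Promise<CheckResult[]>;

async function runChecksConcurrently(
  checks: Check[],
  runners: Record<VerifierType, Runner>,
): Promise<CheckResult[]> {
  const ofType = (t: VerifierType) => checks.filter((c) => c.verifierType === t);
  // All three kinds are started before any is awaited, so agent sub-agents no
  // longer wait for the batched LLM judge to finish.
  const [program, llm, agent] = await Promise.all([
    runners.program(ofType('program')),
    runners.llm(ofType('llm')),
    runners.agent(ofType('agent')),
  ]);
  return [...program, ...llm, ...agent];
}
```

`Promise.all` rejects as soon as any runner rejects. If one verifier kind failing should not discard the others' results, `Promise.allSettled` is the usual alternative.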
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): dedicated builtin verify-agent + writeback tool, role=verify message, portal check editor

- Add `@lobechat/builtin-tool-verify` (submitVerifyResult) + builtin `verify-agent`; agent-type checks now run as the dedicated verify agent (not the user's agent), which investigates and writes its verdict back via the tool during its run.
- Verifier inherits the parent run's model/provider (builtin default may be unconfigured locally).
- role=verify completion message no longer requires an assistantMessageId, so the delivery-checker card always surfaces when a plan exists.
- Portal editor for verify checks (title/description/instruction/verifier/onFail).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🐛 fix(verify): restrict verify-agent to its writeback tool; fix running loader icon

Root cause of stuck `running` agent checks: the verify-agent ran in agent mode and inherited all default tools (web-browsing, cloud-sandbox, skills, activator), so it went off web-searching/crawling to "investigate" and never called submitVerifyResult.

- Run the verify-agent in chat mode (enableAgentMode: false, searchMode: off) — the strict whitelist — and whitelist `lobe-verify` for chat mode so the verifier gets ONLY its writeback tool.
- Sharpen the verify systemRole: judge from the provided deliverable/instruction (no external tools), always reach a verdict, and always call submitVerifyResult.
- CheckerDock: running check now uses the standard RingLoadingIcon (warning ring), matching the app's loader instead of a blue spinner.
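The verify agent ends its run by calling `submitVerifyResult` with a verdict. The entries above do not spell out that payload, so the following is a hypothetical shape combining the schema's verdict/confidence columns with the standard Toulmin components (claim, grounds, warrant, backing, qualifier, rebuttal), validated without a schema library. Field names are illustrative. Optional Toulmin fields accept both `undefined` and `null`, mirroring the `.nullish()` fix in the verdict-persistence entry.

```typescript
// Hypothetical payload: verdict/confidence mirror the result columns, and the
// narrative uses the standard Toulmin components.
interface Toulmin {
  claim: string;
  grounds?: string | null;
  warrant?: string | null;
  backing?: string | null;
  qualifier?: string | null;
  rebuttal?: string | null;
}

interface VerifyResultPayload {
  checkItemId: string;
  verdict: 'pass' | 'fail';
  confidence: number; // 0..1
  toulmin: Toulmin;
}

const OPTIONAL_TOULMIN_KEYS = ['grounds', 'warrant', 'backing', 'qualifier', 'rebuttal'] as const;

const isRecord = (v: unknown): v is Record<string, unknown> => typeof v === 'object' && v !== null;
const isVerdict = (v: unknown): v is 'pass' | 'fail' => v === 'pass' || v === 'fail';

// Returns the typed payload, or null when the tool arguments are malformed.
function parseVerifyResult(raw: unknown): VerifyResultPayload | null {
  if (!isRecord(raw)) return null;
  const { checkItemId, verdict, confidence, toulmin } = raw;
  if (typeof checkItemId !== 'string' || !isVerdict(verdict)) return null;
  if (typeof confidence !== 'number' || !(confidence >= 0 && confidence <= 1)) return null;
  if (!isRecord(toulmin) || typeof toulmin.claim !== 'string') return null;
  // Optional fields: absent, undefined and explicit null are all accepted,
  // because non-strict structured output may send null for fields it skips.
  for (const key of OPTIONAL_TOULMIN_KEYS) {
    const value = toulmin[key];
    if (value !== undefined && value !== null && typeof value !== 'string') return null;
  }
  // Shape verified field by field above.
  return { checkItemId, confidence, toulmin: toulmin as unknown as Toulmin, verdict };
}
```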
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): auto-repair loop — re-run the agent with failure feedback on failed checks

When required checks fail with onFail=auto_repair, automatically run a second iteration instead of ending at `failed`:
- createRepairRunner: re-runs the SAME agent in the same topic with the failure feedback as the prompt, re-snapshots the plan onto the repair operation and confirms it so it re-verifies on completion (the next round). Capped at MAX_REPAIR_ROUNDS via parent-chain depth to prevent runaway loops.
- maybeAutoRepair: fires only once every required check has a terminal result, so it works for inline LLM checks (triggered from lifecycle) and async agent checks (triggered from the verify tool's writeback path).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): open check result detail in portal & rename artifact→result

- add a VerifyResult portal view: clicking any check row opens that result's detail (verdict, confidence, Toulmin sections, suggestion) on the right; agent checks expose their execution trace from inside the panel
- CheckerDock rows are all clickable now (chevron affordance), status shown by icon only; verify card uses colorBgElevated
- rename the run-result surface from "artifact" to "result" everywhere: RunArtifact → RunResult, phaseToArtifact → phaseToResult, and all `artifact.*` i18n keys → `result.*`
- ship verify namespace zh-CN / en-US locales

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): enrich check result portal — criterion stepper, richer detail view

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): rubric run-policy config + repair feedback on the verify card

Auto-repair feedback now lives on the failed round's role=verify message (content), and the VerifyMessageProcessor surfaces it into the repair run's context as a tagged user turn — so the repair op runs off history via a new execAgent `suppressUserMessage` path instead of injecting a synthetic user message. createVerifyMessage is awaited before verification to avoid a race.

maxRepairRounds becomes a rubric-level config: new `verify_rubrics.config` jsonb column, read live at repair time via the plan's sourceRubricId. Adds a RubricConfig portal panel (reachable from the plan card's settings affordance) to view/edit it, wired through the verify store + TRPC.

Verify domain types/vocab/config are extracted from the DB schema into @lobechat/types as the single source of truth; schema and consumers import from there.

Tests: VerifyMessageProcessor dual behavior; VerifyRubricModel config round-trip; MessageModel.findVerifyMessageByOperationId.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* 🗃️ refactor(verify): squash the 3 verify migrations into one

Collapse 0110 (tables) + 0111 (criteria.description) + 0112 (rubrics.config) into a single regenerated 0110_add_verify_tables so the PR ships one clean, idempotent migration. No schema change vs the three combined.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(cli): verify rubric run-policy config commands + shrink judging-rule editor font

CLI: `verify rubric create --max-repair-rounds`, `verify rubric view`, and `verify rubric update` exercise the rubric config endpoints end-to-end; adds a mocked command test.

UI: judging-rule editor font 16px → 14px.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(verify): editable rubric name in the config panel + default 3 repair rounds

Add a name (title) field to the RubricConfig portal, persisted via a new updateRubricTitle store action + service (optimistic + debounced, alongside the config write-back). Bump DEFAULT_MAX_REPAIR_ROUNDS 2 → 3.
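The auto-repair entries above combine three conditions: repair only once every required check has a terminal result, only if some required check failed with `onFail=auto_repair`, and only while the repair depth counted along the operation's parent chain is below the rubric's `maxRepairRounds`, falling back to `DEFAULT_MAX_REPAIR_ROUNDS` (3). A minimal sketch of that decision follows. The `Operation`/`CheckState` shapes and the `repairDepth`/`shouldAutoRepair` helpers are hypothetical; treating only `pass`/`fail` as terminal and counting depth as the number of ancestors are assumptions.

```typescript
const DEFAULT_MAX_REPAIR_ROUNDS = 3;

type CheckStatus = 'pending' | 'running' | 'pass' | 'fail';

interface CheckState {
  required: boolean;
  onFail: 'auto_repair' | 'manual';
  status: CheckStatus;
}

interface Operation {
  id: string;
  parentOperationId: string | null; // a repair op points at the op it repairs (assumption)
}

// Repair rounds already taken = number of ancestors along the parent chain.
function repairDepth(opId: string, ops: Map<string, Operation>): number {
  let depth = 0;
  let current = ops.get(opId);
  const visited = new Set<string>(); // guard against an accidental cycle
  while (current?.parentOperationId && !visited.has(current.id)) {
    visited.add(current.id);
    depth += 1;
    current = ops.get(current.parentOperationId);
  }
  return depth;
}

function shouldAutoRepair(
  opId: string,
  ops: Map<string, Operation>,
  checks: CheckState[],
  rubricConfig: { maxRepairRounds?: number } | null,
): boolean {
  const required = checks.filter((c) => c.required);
  // Wait until every required check (inline LLM or async agent) is terminal.
  if (required.some((c) => c.status !== 'pass' && c.status !== 'fail')) return false;
  if (!required.some((c) => c.status === 'fail' && c.onFail === 'auto_repair')) return false;
  const cap = rubricConfig?.maxRepairRounds ?? DEFAULT_MAX_REPAIR_ROUNDS;
  return repairDepth(opId, ops) < cap;
}
```

Reading the cap at decision time, rather than copying it into the plan, matches the entry's note that `maxRepairRounds` is read live at repair time.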
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ♻️ refactor(verify): extract generateVerifyPlan into installable lobe-delivery-checker tool

Move the delivery-checker plan-creation flow out of the always-on lobe-agent tool into a new standalone, installable builtin tool `lobe-delivery-checker` (Skill Store, opt-in per agent — not loaded by default). lobe-agent no longer ships generateVerifyPlan.

- new packages/builtin-tool-lobe-delivery-checker (manifest/types/systemRole + client Render/Inspector/Portal moved wholesale from lobe-agent)
- new serverRuntimes/lobeDeliveryChecker.ts (generateVerifyPlan moved out of lobeAgent.ts), registered alongside verifyResult
- registered installable in builtin-tools (no hidden/discoverable:false, not in defaultToolIds/alwaysOnToolIds/runtimeManagedToolIds); renders/inspectors/portals/identifiers wired; lobe-agent portal entries removed
- i18n keys moved builtins.lobe-agent.verifyPlan.* → builtins.lobe-delivery-checker.*

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* ✨ feat(agent): add `custom` tool mode; verify agent uses it instead of chat-mode

Chat mode's contract is to strip ALL user/agent plugins (strict KB/memory/web allow-list) — so the verify sub-agent couldn't get its writeback tool without a leaky blanket rule. Introduce a third tool mode `custom` where the toolset is EXACTLY the agent's declared plugins (no always-on, no defaults, no activator), for focused builtin sub-agents.
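The three tool modes differ in where the enabled toolset comes from. A minimal sketch of that resolution follows. `resolveToolMode`, `resolveToolIds`, and the `ToolContext` shape are hypothetical simplifications (the real `AgentToolsEngine` also applies rules and explicit activation), `toolMode` taking precedence over `enableAgentMode` follows this entry's `chatConfig.toolMode` note, and defaulting to agent mode when neither flag is set is an assumption.

```typescript
type ToolMode = 'agent' | 'chat' | 'custom';

interface ChatConfig {
  toolMode?: ToolMode;
  enableAgentMode?: boolean;
}

interface ToolContext {
  alwaysOnToolIds: string[];
  defaultToolIds: string[];
  chatModeAllowedToolIds: string[];
}

// toolMode, when set, overrides the older enableAgentMode flag.
// Falling back to 'agent' when enableAgentMode is unset is an assumption.
function resolveToolMode(config: ChatConfig): ToolMode {
  if (config.toolMode) return config.toolMode;
  return config.enableAgentMode === false ? 'chat' : 'agent';
}

function resolveToolIds(config: ChatConfig, plugins: string[], ctx: ToolContext): string[] {
  switch (resolveToolMode(config)) {
    case 'custom': {
      // Exactly the declared plugins: no always-on tools, no defaults.
      return Array.from(new Set(plugins));
    }
    case 'chat': {
      // Strict allow-list: the agent's own plugins are stripped.
      return ctx.chatModeAllowedToolIds.slice();
    }
    default: {
      // Agent mode: always-on + defaults + the agent's plugins.
      return Array.from(new Set([...ctx.alwaysOnToolIds, ...ctx.defaultToolIds, ...plugins]));
    }
  }
}
```

Keeping `custom` as its own branch is what lets chat mode stay strict: the verify agent no longer needs a chat-mode exception to receive its writeback tool.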
- chatConfig.toolMode: 'agent' | 'chat' | 'custom' (overrides enableAgentMode)
- AgentToolsEngine: custom branch (defaultToolIds = plugins, rules = plugins-on, allowExplicitActivation only in agent mode); chatModeRules restored to strict
- verify agent → toolMode: 'custom'; lobe-verify dropped from chatModeAllowedToolIds
- test: custom mode enables exactly the declared plugin, no always-on / defaults

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
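In `verify rubric create` in the diff below, `--max-repair-rounds` is converted with `Number(options.maxRepairRounds)`, and the "(0-5)" range from its help text is not checked on the client: `abc` becomes `NaN` and `7` passes through unchanged. A hedged sketch of a stricter client-side parser follows. `parseRepairRounds` is a hypothetical helper, and the 0-5 bounds are taken from the option's help text. Commander accepts a custom parsing function as the third argument to `.option()`, so a function like this could be plugged in there.

```typescript
const MIN_REPAIR_ROUNDS = 0;
const MAX_REPAIR_ROUNDS = 5; // bounds taken from the option's "(0-5)" help text

// Hypothetical strict parser for --max-repair-rounds.
function parseRepairRounds(raw: string): number {
  const trimmed = raw.trim();
  // Reject '', '1.5', '1e1', '0x2' and the like; Number() would accept all of these.
  if (!/^\d+$/.test(trimmed)) {
    throw new RangeError(`--max-repair-rounds must be an integer, got "${raw}"`);
  }
  const n = Number(trimmed);
  if (n < MIN_REPAIR_ROUNDS || n > MAX_REPAIR_ROUNDS) {
    throw new RangeError(
      `--max-repair-rounds must be between ${MIN_REPAIR_ROUNDS} and ${MAX_REPAIR_ROUNDS}, got ${n}`,
    );
  }
  return n;
}
```

With the parser attached via `.option('--max-repair-rounds <n>', 'Cap on automatic repair rounds (0-5)', parseRepairRounds)`, the action would receive a number and would no longer need its own `Number(...)` conversion. Commander's documentation suggests throwing its `InvalidArgumentError` from such parsers to get its standard error output.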
@@ -4,6 +4,9 @@ packages:
   - '../../packages/device-identity'
   - '../../packages/heterogeneous-agents'
   - '../../packages/local-file-shell'
+  - '../../packages/tool-runtime'
+  - '../../packages/prompts'
+  - '../../packages/const'
   - '../../packages/types'
   - '../../packages/model-bank'
   - '../../packages/business/const'
@@ -347,22 +347,33 @@ export function registerAgentCommand(program: Command) {
         const { serverUrl, headers, token, tokenType } = await getAgentStreamAuthInfo();
         const agentGatewayUrl = options.sse ? undefined : resolveAgentGatewayUrl();
 
-        if (agentGatewayUrl) {
-          await streamAgentEventsViaWebSocket({
-            gatewayUrl: agentGatewayUrl,
-            json: options.json,
-            operationId,
-            serverUrl,
-            token,
-            tokenType,
-            verbose: options.verbose,
-          });
-        } else {
-          const streamUrl = `${serverUrl}/api/agent/stream?operationId=${encodeURIComponent(operationId)}`;
-          await streamAgentEvents(streamUrl, headers, {
-            json: options.json,
-            verbose: options.verbose,
-          });
+        try {
+          if (agentGatewayUrl) {
+            await streamAgentEventsViaWebSocket({
+              gatewayUrl: agentGatewayUrl,
+              json: options.json,
+              operationId,
+              serverUrl,
+              token,
+              tokenType,
+              verbose: options.verbose,
+            });
+          } else {
+            const streamUrl = `${serverUrl}/api/agent/stream?operationId=${encodeURIComponent(operationId)}`;
+            await streamAgentEvents(streamUrl, headers, {
+              json: options.json,
+              verbose: options.verbose,
+            });
+          }
+        } catch (error) {
+          // The live stream (gateway WS / SSE) dropped before the run finished —
+          // the run is still executing server-side. Instead of failing, fall back
+          // to polling the run status until it reaches a terminal state.
+          if (options.json) throw error;
+          log.warn(
+            `Live stream unavailable (${(error as Error).message}). Polling run status every 10s…`,
+          );
+          await pollAgentRunStatus(client, operationId);
         }
       },
     );
@@ -626,3 +637,56 @@ function colorStatus(status: string): string {
     }
   }
 }
+
+const TERMINAL_RUN_STATUSES = new Set([
+  'completed',
+  'done',
+  'success',
+  'failed',
+  'error',
+  'cancelled',
+  'canceled',
+  'aborted',
+]);
+
+/**
+ * Fallback when the live stream (gateway WebSocket / SSE) drops before the run
+ * finishes: the run is still executing server-side, so poll its status every 10s
+ * until it reaches a terminal state (or is no longer tracked, which also means it
+ * has finished). Avoids hard-exiting on a transient gateway disconnect.
+ */
+async function pollAgentRunStatus(
+  client: Awaited<ReturnType<typeof getTrpcClient>>,
+  operationId: string,
+): Promise<void> {
+  const POLL_MS = 10_000;
+  let lastStatus = '';
+  for (let i = 0; ; i++) {
+    if (i > 0) await new Promise((resolve) => setTimeout(resolve, POLL_MS));
+
+    let r: any;
+    try {
+      r = await client.aiAgent.getOperationStatus.query({ operationId } as any);
+    } catch (error) {
+      log.error(`Status poll failed: ${(error as Error).message}`);
+      process.exit(1);
+    }
+
+    if (!r) {
+      log.info('Run is no longer tracked — finished (or expired).');
+      return;
+    }
+
+    const status = r.status || r.state || 'unknown';
+    if (status !== lastStatus) {
+      lastStatus = status;
+      const steps = r.stepCount !== undefined ? ` · ${r.stepCount} step(s)` : '';
+      log.info(`Run status: ${colorStatus(status)}${steps}`);
+    }
+
+    if (TERMINAL_RUN_STATUSES.has(status)) {
+      if (r.error) log.error(`Run error: ${r.error}`);
+      return;
+    }
+  }
+}
@@ -0,0 +1,90 @@
import { Command } from 'commander';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';

import { registerVerifyCommand } from './verify';

const { mockTrpcClient } = vi.hoisted(() => ({
  mockTrpcClient: {
    verify: {
      createRubric: { mutate: vi.fn() },
      getRubric: { query: vi.fn() },
      updateRubric: { mutate: vi.fn() },
    },
  },
}));

const { getTrpcClient: mockGetTrpcClient } = vi.hoisted(() => ({
  getTrpcClient: vi.fn(),
}));

vi.mock('../api/client', () => ({ getTrpcClient: mockGetTrpcClient }));
vi.mock('../utils/logger', () => ({
  log: { debug: vi.fn(), error: vi.fn(), info: vi.fn(), warn: vi.fn() },
  setVerbose: vi.fn(),
}));

describe('verify rubric config commands', () => {
  let consoleSpy: ReturnType<typeof vi.spyOn>;

  beforeEach(() => {
    consoleSpy = vi.spyOn(console, 'log').mockImplementation(() => {});
    mockGetTrpcClient.mockResolvedValue(mockTrpcClient);
    mockTrpcClient.verify.createRubric.mutate.mockReset().mockResolvedValue({ id: 'rub-1' });
    mockTrpcClient.verify.updateRubric.mutate.mockReset().mockResolvedValue(undefined);
    mockTrpcClient.verify.getRubric.query.mockReset();
  });

  afterEach(() => consoleSpy.mockRestore());

  const run = async (args: string[]) => {
    const program = new Command();
    program.exitOverride();
    registerVerifyCommand(program);
    await program.parseAsync(['node', 'lh', 'verify', ...args]);
  };

  it('passes maxRepairRounds config when creating a rubric', async () => {
    await run(['rubric', 'create', '-t', 'Standard', '--max-repair-rounds', '3']);

    expect(mockTrpcClient.verify.createRubric.mutate).toHaveBeenCalledWith({
      config: { maxRepairRounds: 3 },
      description: undefined,
      title: 'Standard',
    });
  });

  it('omits config when no max-repair-rounds flag is given', async () => {
    await run(['rubric', 'create', '-t', 'Standard']);

    expect(mockTrpcClient.verify.createRubric.mutate).toHaveBeenCalledWith({
      config: undefined,
      description: undefined,
      title: 'Standard',
    });
  });

  it('updates only the config when max-repair-rounds is passed', async () => {
    await run(['rubric', 'update', 'rub-1', '--max-repair-rounds', '0']);

    expect(mockTrpcClient.verify.updateRubric.mutate).toHaveBeenCalledWith({
      id: 'rub-1',
      value: { config: { maxRepairRounds: 0 } },
    });
  });

  it('views a rubric and prints its repair-round config', async () => {
    mockTrpcClient.verify.getRubric.query.mockResolvedValue({
      config: { maxRepairRounds: 4 },
      description: 'desc',
      id: 'rub-1',
      title: 'Standard',
    });

    await run(['rubric', 'view', 'rub-1']);

    expect(mockTrpcClient.verify.getRubric.query).toHaveBeenCalledWith({ id: 'rub-1' });
    const printed = consoleSpy.mock.calls.map((c) => String(c[0])).join('\n');
    expect(printed).toContain('Standard');
    expect(printed).toContain('4');
  });
});
@@ -0,0 +1,455 @@
import type { Command } from 'commander';
import pc from 'picocolors';

import { getTrpcClient } from '../api/client';
import { confirm, outputJson, printTable, timeAgo, truncate } from '../utils/format';
import { log } from '../utils/logger';

// ── Helpers ────────────────────────────────────────────────

type VerifierType = 'agent' | 'llm' | 'program';
type OnFail = 'auto_repair' | 'manual';
type Decision = 'accepted' | 'overridden' | 'rejected';

const VERIFIER_TYPES: VerifierType[] = ['program', 'agent', 'llm'];
const ON_FAIL: OnFail[] = ['manual', 'auto_repair'];
const DECISIONS: Decision[] = ['accepted', 'rejected', 'overridden'];

function parseConfig(raw?: string): Record<string, unknown> | undefined {
  if (!raw) return undefined;
  try {
    return JSON.parse(raw);
  } catch {
    log.error('--config must be valid JSON');
    process.exit(1);
  }
}

function assertEnum<T extends string>(value: T | undefined, allowed: T[], flag: string): void {
  if (value !== undefined && !allowed.includes(value)) {
    log.error(`${flag} must be one of: ${allowed.join(', ')}`);
    process.exit(1);
  }
}

// ── Command Registration ───────────────────────────────────

export function registerVerifyCommand(program: Command) {
  const verify = program
    .command('verify')
    .description('Manage the Agent Run delivery checker (criteria, rubrics, plans, results)');

  // ════════════ criteria ════════════
  const criterion = verify.command('criterion').description('Reusable pass/fail standards');

  criterion
    .command('list')
    .description('List criteria')
    .option('--json [fields]', 'Output JSON, optionally specify fields (comma-separated)')
    .action(async (options: { json?: boolean | string }) => {
      const client = await getTrpcClient();
      const items = await client.verify.listCriteria.query();

      if (options.json !== undefined) {
        outputJson(items, typeof options.json === 'string' ? options.json : undefined);
        return;
      }
      if (items.length === 0) return void console.log('No criteria found.');
      printTable(
        items.map((c) => [
          c.id,
          truncate(c.title, 60),
          c.verifierType,
          c.required ? 'gate' : 'soft',
          c.onFail,
          c.updatedAt ? timeAgo(c.updatedAt) : '',
        ]),
        ['ID', 'TITLE', 'TYPE', 'BLOCK', 'ON-FAIL', 'UPDATED'],
      );
    });

  criterion
    .command('create')
    .description('Create a criterion')
    .requiredOption('-t, --title <title>', 'Criterion title')
    .requiredOption('--type <type>', `Verifier type (${VERIFIER_TYPES.join('|')})`)
    .option('--on-fail <strategy>', `Action on failure (${ON_FAIL.join('|')})`)
    .option('--soft', 'Non-blocking (required=false); defaults to blocking')
    .option('--config <json>', 'Verifier config as JSON')
    .option('--doc <id>', 'Linked guidance document id')
    .action(
      async (options: {
        config?: string;
        doc?: string;
        onFail?: OnFail;
        soft?: boolean;
        title: string;
        type: VerifierType;
      }) => {
        assertEnum(options.type, VERIFIER_TYPES, '--type');
        assertEnum(options.onFail, ON_FAIL, '--on-fail');
        const client = await getTrpcClient();
        const result = await client.verify.createCriterion.mutate({
          documentId: options.doc,
          onFail: options.onFail,
          required: options.soft ? false : undefined,
          title: options.title,
          verifierConfig: parseConfig(options.config),
          verifierType: options.type,
        });
        console.log(`${pc.green('✓')} Created criterion ${pc.bold((result as any).id)}`);
      },
    );

  criterion
    .command('delete <id>')
    .description('Delete a criterion')
    .option('--yes', 'Skip confirmation')
    .action(async (id: string, options: { yes?: boolean }) => {
      if (!options.yes && !(await confirm(`Delete criterion ${id}?`)))
        return void console.log('Cancelled.');
      const client = await getTrpcClient();
      await client.verify.deleteCriterion.mutate({ id });
      console.log(`${pc.green('✓')} Deleted criterion ${pc.bold(id)}`);
    });

  // ════════════ rubrics ════════════
  const rubric = verify.command('rubric').description('Named groups of criteria');

  rubric
    .command('list')
    .description('List rubrics')
    .option('--json [fields]', 'Output JSON, optionally specify fields (comma-separated)')
    .action(async (options: { json?: boolean | string }) => {
      const client = await getTrpcClient();
      const items = await client.verify.listRubrics.query();
      if (options.json !== undefined) {
        outputJson(items, typeof options.json === 'string' ? options.json : undefined);
        return;
      }
      if (items.length === 0) return void console.log('No rubrics found.');
      printTable(
        items.map((r) => [
          r.id,
          truncate(r.title, 60),
          truncate(r.description || '', 60),
          r.updatedAt ? timeAgo(r.updatedAt) : '',
        ]),
        ['ID', 'TITLE', 'DESCRIPTION', 'UPDATED'],
      );
    });

  rubric
    .command('create')
    .description('Create a rubric')
    .requiredOption('-t, --title <title>', 'Rubric title')
    .option('-d, --description <text>', 'Rubric description')
    .option('--max-repair-rounds <n>', 'Cap on automatic repair rounds (0-5)')
    .action(async (options: { description?: string; maxRepairRounds?: string; title: string }) => {
      const client = await getTrpcClient();
      const result = await client.verify.createRubric.mutate({
        config:
          options.maxRepairRounds !== undefined
            ? { maxRepairRounds: Number(options.maxRepairRounds) }
            : undefined,
        description: options.description,
        title: options.title,
      });
      console.log(`${pc.green('✓')} Created rubric ${pc.bold((result as any).id)}`);
    });

  rubric
    .command('view <id>')
    .description('Show a rubric and its run-policy config')
    .option('--json [fields]', 'Output JSON')
    .action(async (id: string, options: { json?: boolean | string }) => {
      const client = await getTrpcClient();
      const item = await client.verify.getRubric.query({ id });
      if (!item) return void log.error('Rubric not found.');
      if (options.json !== undefined) {
        outputJson(item, typeof options.json === 'string' ? options.json : undefined);
        return;
      }
      console.log(`${pc.bold('ID')} ${item.id}`);
      console.log(`${pc.bold('Title')} ${item.title}`);
      if (item.description) console.log(`${pc.bold('Description')} ${item.description}`);
      const maxRepairRounds = (item.config as { maxRepairRounds?: number } | null)?.maxRepairRounds;
      console.log(`${pc.bold('Repair rounds')} ${maxRepairRounds ?? pc.dim('default')}`);
    });

  rubric
    .command('update <id>')
    .description('Update a rubric (title / description / run-policy config)')
    .option('-t, --title <title>', 'New title')
    .option('-d, --description <text>', 'New description')
    .option('--max-repair-rounds <n>', 'Cap on automatic repair rounds (0-5)')
    .action(
      async (
|
||||
id: string,
|
||||
options: { description?: string; maxRepairRounds?: string; title?: string },
|
||||
) => {
|
||||
const client = await getTrpcClient();
|
||||
const value: {
|
||||
config?: { maxRepairRounds?: number };
|
||||
description?: string;
|
||||
title?: string;
|
||||
} = {};
|
||||
if (options.title !== undefined) value.title = options.title;
|
||||
if (options.description !== undefined) value.description = options.description;
|
||||
if (options.maxRepairRounds !== undefined)
|
||||
value.config = { maxRepairRounds: Number(options.maxRepairRounds) };
|
||||
await client.verify.updateRubric.mutate({ id, value });
|
||||
console.log(`${pc.green('✓')} Updated rubric ${pc.bold(id)}`);
|
||||
},
|
||||
);
|
||||
|
||||
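The `create` and `update` commands advertise `--max-repair-rounds` as `(0-5)` but forward `Number(options.maxRepairRounds)` unchecked, so `abc` becomes `NaN` and an empty string becomes `0`; whether the server rejects these values is not visible in this diff. A minimal sketch of a client-side guard, assuming the 0 to 5 bounds from the help text. `parseRepairRounds` is a hypothetical helper, not part of the PR:

```typescript
// Hypothetical helper (not in the PR): enforces the "(0-5)" range that the
// --max-repair-rounds help text advertises, instead of a bare Number() cast.
const MIN_REPAIR_ROUNDS = 0;
const MAX_REPAIR_ROUNDS = 5;

function parseRepairRounds(raw: string): number {
  const trimmed = raw.trim();
  // Reject empty input, signs, decimals and anything that is not a base-10 integer.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`--max-repair-rounds must be an integer, got "${raw}"`);
  }
  const n = Number(trimmed);
  if (n < MIN_REPAIR_ROUNDS || n > MAX_REPAIR_ROUNDS) {
    throw new Error(
      `--max-repair-rounds must be between ${MIN_REPAIR_ROUNDS} and ${MAX_REPAIR_ROUNDS}, got ${n}`,
    );
  }
  return n;
}
```

Commander accepts a custom parser as the third argument to `.option()`, so the helper could be wired as `.option('--max-repair-rounds <n>', 'Cap on automatic repair rounds (0-5)', parseRepairRounds)`. The value would then arrive as a number, so the `Number(...)` calls and the `maxRepairRounds?: string` types would change accordingly; throwing commander's `InvalidArgumentError` instead of a plain `Error` would let commander print a formatted usage error.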
  rubric
    .command('delete <id>')
    .description('Delete a rubric')
    .option('--yes', 'Skip confirmation')
    .action(async (id: string, options: { yes?: boolean }) => {
      if (!options.yes && !(await confirm(`Delete rubric ${id}?`)))
        return void console.log('Cancelled.');
      const client = await getTrpcClient();
      await client.verify.deleteRubric.mutate({ id });
      console.log(`${pc.green('✓')} Deleted rubric ${pc.bold(id)}`);
    });

  rubric
    .command('criteria <rubricId>')
    .description('List criteria in a rubric')
    .option('--json [fields]', 'Output JSON')
    .action(async (rubricId: string, options: { json?: boolean | string }) => {
      const client = await getTrpcClient();
      const items = await client.verify.getRubricCriteria.query({ rubricId });
      if (options.json !== undefined) {
        outputJson(items, typeof options.json === 'string' ? options.json : undefined);
        return;
      }
      if (items.length === 0) return void console.log('No criteria in this rubric.');
      printTable(
        items.map((c: any) => [
          c.id,
          truncate(c.title, 60),
          c.verifierType,
          c.required ? 'gate' : 'soft',
        ]),
        ['ID', 'TITLE', 'TYPE', 'BLOCK'],
      );
    });

  rubric
    .command('set-criteria <rubricId> <criterionIds...>')
    .description('Set the criteria a rubric aggregates (order preserved)')
    .action(async (rubricId: string, criterionIds: string[]) => {
      const client = await getTrpcClient();
      await client.verify.setRubricCriteria.mutate({
        criteria: criterionIds.map((criterionId, i) => ({ criterionId, sortOrder: i })),
        rubricId,
      });
      console.log(
        `${pc.green('✓')} Rubric ${pc.bold(rubricId)} now has ${criterionIds.length} criterion(s)`,
      );
    });

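`set-criteria` derives `sortOrder` from argument position, so `set-criteria r1 c2 c1` stores `c2` ahead of `c1`. Repeating an id sends two junction entries for the same criterion; whether the server dedupes them is not shown in this diff. A minimal sketch of a payload builder that keeps only the first occurrence of each id (and trims whitespace), assuming the `{ criterionId, sortOrder }` shape used above. `toRubricCriteria` is hypothetical:

```typescript
interface RubricCriterionEntry {
  criterionId: string;
  sortOrder: number;
}

// Hypothetical: same positional sortOrder as the inline map above, but a repeated
// id is dropped (first occurrence wins) so sortOrder stays dense: 0, 1, 2, ...
function toRubricCriteria(criterionIds: string[]): RubricCriterionEntry[] {
  const seen = new Set<string>();
  const entries: RubricCriterionEntry[] = [];
  for (const raw of criterionIds) {
    const criterionId = raw.trim();
    if (!criterionId || seen.has(criterionId)) continue;
    seen.add(criterionId);
    // entries.length is the position this entry is about to take.
    entries.push({ criterionId, sortOrder: entries.length });
  }
  return entries;
}
```

With this builder, the success message would report `entries.length` rather than `criterionIds.length`, since the two can differ once duplicates are dropped.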
  // ════════════ per-run plan ════════════
  const plan = verify.command('plan').description('Per-run check plan lifecycle');

  plan
    .command('generate <operationId>')
    .description('Generate a draft check plan for a run')
    .requiredOption('--goal <goal>', "The run's task/instruction the plan must satisfy")
    .option('--rubric <id>', 'Mounted rubric id')
    .option('--criteria <ids>', 'Ad-hoc criterion ids (comma-separated)')
    .option('--ai', 'Let the LLM propose additional criteria')
    .option('--max-ai <n>', 'Max AI-proposed criteria')
    .option('--model <model>', 'Model (required with --ai)')
    .option('--provider <provider>', 'Provider (required with --ai)')
    .option('--context <text>', 'Extra context for the AI prompt')
    .option('--json [fields]', 'Output JSON')
    .action(
      async (
        operationId: string,
        options: {
          ai?: boolean;
          context?: string;
          criteria?: string;
          goal: string;
          json?: boolean | string;
          maxAi?: string;
          model?: string;
          provider?: string;
          rubric?: string;
        },
      ) => {
        if (options.ai && (!options.model || !options.provider)) {
          log.error('--ai requires --model and --provider');
          process.exit(1);
        }
        const client = await getTrpcClient();
        const items = await client.verify.generateDraftPlan.mutate({
          context: options.context,
          enableAiGeneration: options.ai,
          goal: options.goal,
          maxAiCriteria: options.maxAi ? Number.parseInt(options.maxAi, 10) : undefined,
          modelConfig:
            options.model && options.provider
              ? { model: options.model, provider: options.provider }
              : undefined,
          operationId,
          verifyCriteriaIds: options.criteria
            ?.split(',')
            .map((s) => s.trim())
            .filter(Boolean),
          verifyRubricId: options.rubric ?? null,
        });
        if (options.json !== undefined) {
          outputJson(items, typeof options.json === 'string' ? options.json : undefined);
          return;
        }
        console.log(`${pc.green('✓')} Draft plan: ${pc.bold(String(items.length))} item(s)`);
        printTable(
          items.map((i: any) => [
            String(i.index),
            truncate(i.title, 60),
            i.verifierType,
            i.required ? 'gate' : 'soft',
          ]),
          ['#', 'TITLE', 'TYPE', 'BLOCK'],
        );
      },
    );

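The `--criteria` value is split on commas, trimmed, and filtered for empty strings. Because optional chaining short-circuits the whole `?.split(',').map(...).filter(...)` chain, omitting the flag sends `undefined`, while `--criteria ','` sends an empty array; the server may treat those two cases differently. A standalone sketch of the same parsing, extracted into a hypothetical `parseIdList` helper:

```typescript
// Hypothetical helper mirroring the inline parsing of --criteria in `plan generate`.
function parseIdList(raw?: string): string[] | undefined {
  // `?.` short-circuits the entire chain when the flag was not passed.
  return raw
    ?.split(',')
    .map((s) => s.trim())
    .filter(Boolean);
}

parseIdList(undefined); // returns undefined
parseIdList(' c1, ,c2 ,'); // returns ['c1', 'c2']
parseIdList(','); // returns []
```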
  plan
    .command('state <operationId>')
    .description('Show the verify state (status + frozen plan) of a run')
    .option('--json [fields]', 'Output JSON')
    .action(async (operationId: string, options: { json?: boolean | string }) => {
      const client = await getTrpcClient();
      const state = await client.verify.getVerifyState.query({ operationId });
      if (options.json !== undefined) {
        outputJson(state, typeof options.json === 'string' ? options.json : undefined);
        return;
      }
      if (!state) return void console.log('No verify state for this run.');
      console.log(`${pc.bold('status')}: ${state.verifyStatus ?? pc.dim('(none)')}`);
      console.log(
        `${pc.bold('confirmed')}: ${state.verifyPlanConfirmedAt ? timeAgo(state.verifyPlanConfirmedAt) : pc.dim('no')}`,
      );
      const items = (state.verifyPlan ?? []) as any[];
      console.log(`${pc.bold('plan')}: ${items.length} item(s)`);
      if (items.length > 0)
        printTable(
          items.map((i) => [
            String(i.index),
            truncate(i.title, 60),
            i.verifierType,
            i.required ? 'gate' : 'soft',
          ]),
          ['#', 'TITLE', 'TYPE', 'BLOCK'],
        );
    });

  plan
    .command('confirm <operationId>')
    .description('Freeze (confirm) the draft plan')
    .action(async (operationId: string) => {
      const client = await getTrpcClient();
      await client.verify.confirmPlan.mutate({ operationId });
      console.log(`${pc.green('✓')} Confirmed plan for run ${pc.bold(operationId)}`);
    });

  plan
    .command('skip <operationId>')
    .description('Skip verification for a run')
    .action(async (operationId: string) => {
      const client = await getTrpcClient();
      await client.verify.skipPlan.mutate({ operationId });
      console.log(`${pc.green('✓')} Skipped verification for run ${pc.bold(operationId)}`);
    });

  // ════════════ run / results ════════════
  verify
    .command('run <operationId>')
    .description('Execute the confirmed plan against a deliverable (LLM judge)')
    .requiredOption('--goal <goal>', "The run's task")
    .requiredOption('--deliverable <text>', 'The output to judge')
    .requiredOption('--model <model>', 'Judge model')
    .requiredOption('--provider <provider>', 'Judge provider')
    .option('--no-batch', 'Judge each item separately instead of one batched call')
    .option('--json [fields]', 'Output JSON')
    .action(
      async (
        operationId: string,
        options: {
          batch?: boolean;
          deliverable: string;
          goal: string;
          json?: boolean | string;
          model: string;
          provider: string;
        },
      ) => {
        const client = await getTrpcClient();
        const results = await client.verify.executeVerify.mutate({
          batchLlm: options.batch,
          deliverable: options.deliverable,
          goal: options.goal,
          modelConfig: { model: options.model, provider: options.provider },
          operationId,
        });
        if (options.json !== undefined) {
          outputJson(results, typeof options.json === 'string' ? options.json : undefined);
          return;
        }
        printResults(results);
      },
    );

  verify
    .command('results <operationId>')
    .description('List check results for a run')
    .option('--json [fields]', 'Output JSON')
    .action(async (operationId: string, options: { json?: boolean | string }) => {
      const client = await getTrpcClient();
      const results = await client.verify.listResults.query({ operationId });
      if (options.json !== undefined) {
        outputJson(results, typeof options.json === 'string' ? options.json : undefined);
        return;
      }
      if (results.length === 0) return void console.log('No results yet.');
      printResults(results);
    });

  // ════════════ feedback ════════════
  verify
    .command('decision <resultId> <decision>')
    .description(`Record human feedback on a result (${DECISIONS.join('|')})`)
    .action(async (resultId: string, decision: Decision) => {
      assertEnum(decision, DECISIONS, 'decision');
      const client = await getTrpcClient();
      await client.verify.submitDecision.mutate({ decision, resultId });
      console.log(`${pc.green('✓')} Recorded ${pc.bold(decision)} on result ${pc.bold(resultId)}`);
    });
}

function printResults(results: any[]): void {
  printTable(
    results.map((r) => [
      truncate(r.checkItemTitle || r.checkItemId, 50),
      statusColor(r.status),
      r.verdict ?? '',
      r.confidence != null ? String(r.confidence) : '',
      r.required ? 'gate' : 'soft',
      truncate(r.suggestion || '', 40),
    ]),
    ['CHECK', 'STATUS', 'VERDICT', 'CONF', 'BLOCK', 'SUGGESTION'],
  );
}

function statusColor(status: string): string {
  if (status === 'passed') return pc.green(status);
  if (status === 'failed') return pc.red(status);
  if (status === 'running') return pc.yellow(status);
  return pc.dim(status);
}
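`printResults` shows each check's `status` and whether it is a `gate` (required) or `soft` check. Per the portal help text, only a failed required check blocks delivery. A minimal sketch of a client-side summary over the same rows, assuming only the `status` and `required` fields printed above. `summarizeResults` is hypothetical and is not the server-side `verify_status` rollup the PR describes:

```typescript
// Minimal row shape: only the fields printResults already reads.
interface ResultRow {
  required?: boolean;
  status: string; // e.g. 'passed' | 'failed' | 'running' | other
}

interface ResultSummary {
  blocked: boolean; // true when at least one required (gate) check failed
  passed: number;
  pending: number; // rows that are neither passed nor failed yet
  total: number;
}

// Hypothetical client-side summary; not the PR's server-side rollup.
function summarizeResults(rows: ResultRow[]): ResultSummary {
  let passed = 0;
  let pending = 0;
  let blocked = false;
  for (const row of rows) {
    if (row.status === 'passed') {
      passed += 1;
    } else if (row.status === 'failed') {
      // A failed soft check is reported but does not block delivery.
      if (row.required) blocked = true;
    } else {
      pending += 1;
    }
  }
  return { blocked, passed, pending, total: rows.length };
}
```

The `passed` and `total` counts line up with the `{{passed}}/{{total}}` placeholders in the `status.passed` locale string.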
@@ -34,6 +34,7 @@ import { registerTaskCommand } from './commands/task';
import { registerThreadCommand } from './commands/thread';
import { registerTopicCommand } from './commands/topic';
import { registerUserCommand } from './commands/user';
import { registerVerifyCommand } from './commands/verify';

const require = createRequire(import.meta.url);
const { version } = require('../package.json');
@@ -75,6 +76,7 @@ export function createProgram() {
  registerProviderCommand(program);
  registerPluginCommand(program);
  registerUserCommand(program);
  registerVerifyCommand(program);
  registerConfigCommand(program);
  registerEvalCommand(program);
  registerMigrateCommand(program);

@@ -296,7 +296,11 @@ export async function streamAgentEventsViaWebSocket(
        console.log(JSON.stringify(jsonEvents, null, 2));
      }
      isSettled = true;
      reject(new Error(`Agent gateway WebSocket closed before completion: ${String(event)}`));
      // Surface the close code + reason — `String(event)` is just "[object CloseEvent]".
      const reason = event.reason ? `: ${event.reason}` : '';
      reject(
        new Error(`Agent gateway WebSocket closed before completion (code ${event.code}${reason})`),
      );
    };
  });
}

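The new `onclose` branch reports `event.code` and `event.reason` instead of `String(event)`. Close codes are defined by RFC 6455 (section 7.4.1), so the message could also name the well-known ones; 1006 in particular is never sent on the wire and means the connection dropped without a close frame. A sketch of that formatting as a hypothetical `describeClose` helper; the label table is an illustrative subset, and `CloseLike` stands in for the runtime's `CloseEvent`:

```typescript
// Stand-in for the runtime's CloseEvent: only the two fields the message uses.
interface CloseLike {
  code: number;
  reason?: string;
}

// Illustrative subset of RFC 6455 section 7.4.1 status codes.
const CLOSE_CODE_LABELS: Record<number, string> = {
  1000: 'normal closure',
  1001: 'going away',
  1006: 'abnormal closure',
  1008: 'policy violation',
  1011: 'internal error',
};

// Hypothetical helper: same message shape as the hunk above, plus a code label.
function describeClose(event: CloseLike): string {
  const label = CLOSE_CODE_LABELS[event.code];
  const code = label ? `${event.code} ${label}` : String(event.code);
  const reason = event.reason ? `: ${event.reason}` : '';
  return `Agent gateway WebSocket closed before completion (code ${code}${reason})`;
}

describeClose({ code: 1006 });
// returns 'Agent gateway WebSocket closed before completion (code 1006 abnormal closure)'
```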
@@ -130,6 +130,31 @@
  "builtins.lobe-cloud-sandbox.apiName.writeLocalFile": "Write file",
  "builtins.lobe-cloud-sandbox.inspector.noResults": "No results",
  "builtins.lobe-cloud-sandbox.title": "Cloud Sandbox",
  "builtins.lobe-delivery-checker.apiName.generateVerifyPlan": "Create automated checks",
  "builtins.lobe-delivery-checker.verifyPlan.optional": "Optional",
  "builtins.lobe-delivery-checker.verifyPlan.portal.fields.description": "Summary",
  "builtins.lobe-delivery-checker.verifyPlan.portal.fields.instruction": "Judging rule",
  "builtins.lobe-delivery-checker.verifyPlan.portal.fields.title": "Check title",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.auto_repair": "Auto repair",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.auto_repairDesc": "On failure, automatically start a repair round and re-run the check.",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.manual": "Handle manually",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.manualDesc": "On failure, stop and leave the next step to you.",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.title": "On failure",
  "builtins.lobe-delivery-checker.verifyPlan.portal.required.desc": "When on, a failure on this check blocks the run from being delivered.",
  "builtins.lobe-delivery-checker.verifyPlan.portal.required.title": "Required",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.desc": "How many times a failing run is automatically re-run with the failure feedback before it stops. Set to 0 to disable auto-repair.",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.title": "Max repair rounds",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.name": "Standard name",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.title": "Standard settings",
  "builtins.lobe-delivery-checker.verifyPlan.portal.title": "Check configuration",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.agent.desc": "A dedicated agent reads the trace, files, diff and PR before judging.",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.agent.title": "Agent check",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.llm.desc": "Judge against the text result and the run context.",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.llm.title": "LLM judgment",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.program.desc": "Validate via commands, APIs or status results. Good for tests, type-check, PR existence.",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.program.title": "Program check",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.title": "Verification method",
  "builtins.lobe-delivery-checker.verifyPlan.required": "Required",
  "builtins.lobe-group-agent-builder.apiName.batchCreateAgents": "Batch create agents",
  "builtins.lobe-group-agent-builder.apiName.createAgent": "Create agent",
  "builtins.lobe-group-agent-builder.apiName.createGroup": "Create group",

@@ -0,0 +1,60 @@
{
  "badge.failed": "Check failed",
  "badge.passed": "Check passed",
  "badge.pending": "Awaiting check",
  "badge.repairing": "Repair triggered",
  "behavior.auto_improve": "Auto-fill",
  "behavior.auto_improveDesc": "Filled in automatically; does not block delivery",
  "behavior.gate": "Delivery gate",
  "behavior.gateDesc": "Blocks delivery on failure and triggers a repair round",
  "detail.checkedAt": "Checked at",
  "detail.confidence": "Confidence",
  "detail.counterEvidence": "Counter-evidence",
  "detail.duration": "Duration",
  "detail.evidence": "Evidence",
  "detail.instruction": "Judging rule",
  "detail.limitation": "Limitation",
  "detail.method": "Method",
  "detail.methodAgent": "Agent",
  "detail.methodLlm": "LLM",
  "detail.methodProgram": "Program",
  "detail.model": "Model",
  "detail.openTrace": "View agent trace",
  "detail.pending": "This check has not run yet.",
  "detail.reasoning": "Reasoning",
  "detail.suggestion": "Suggested fix",
  "detail.summary": "Summary",
  "detail.tokens": "Tokens",
  "dock.confirm": "Confirm & run",
  "dock.edit": "Adjust checks",
  "dock.forceDeliver": "Ignore & deliver",
  "dock.repairHint": "The next round is fixing the failed checks. A new result is produced and the checker re-runs when it finishes.",
  "dock.saveAndRepair": "Save input & repair now",
  "dock.skip": "Skip checks",
  "dock.title": "Delivery Checker",
  "editor.add": "+ Add check",
  "editor.cancel": "Cancel",
  "editor.placeholder": "Check title",
  "editor.save": "Save",
  "input.hint": "This goes to the next repair round as checker input — it will not appear as a chat message.",
  "input.label": "Extra input for the next repair round",
  "input.placeholder": "e.g. run type-check first; if it still fails, just add a risk note.",
  "result.failed.sub": "This result is held back. The delivery checker found verification insufficient and triggered a repair.",
  "result.failed.title": "Draft result",
  "result.foot": "A snapshot of this run’s result — not an assistant or user message.",
  "result.kicker": "Verification · Round {{round}}",
  "result.passed.sub": "The delivery checker passed {{passed}}/{{total}}. This result is ready to deliver.",
  "result.passed.title": "Result",
  "result.pending.sub": "The result is generated but not yet delivered — waiting for the delivery checker.",
  "result.pending.title": "Draft result",
  "result.repairing.sub": "Checks did not pass. A repair round has started.",
  "result.repairing.title": "Draft result",
  "result.title": "Verification #{{round}}",
  "status.checking": "Delivery Checker: checking {{passed}}/{{total}}",
  "status.draft": "Delivery Checker: awaiting confirmation · {{total}} checks",
  "status.failed": "Delivery Checker: failed · repair triggered",
  "status.idle": "Delivery Checker: not generated",
  "status.passed": "Delivery Checker: passed {{passed}}/{{total}}",
  "status.repairing": "Delivery Checker: repairing",
  "status.verifying": "Delivery Checker: waiting for run to finish"
}
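Strings such as `status.passed` use `{{passed}}` and `{{total}}` placeholders, which are the default interpolation delimiters in i18next. The sketch below is not i18next; it is a hypothetical `interpolate` helper that shows how such a template resolves, and it leaves placeholders without a value untouched:

```typescript
// Hypothetical stand-in for i18next-style {{name}} interpolation, for illustration only.
function interpolate(template: string, values: Record<string, string | number>): string {
  // Match {{name}}, tolerating spaces inside the braces.
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, key: string) =>
    key in values ? String(values[key]) : match,
  );
}

const enStatusPassed = 'Delivery Checker: passed {{passed}}/{{total}}';
interpolate(enStatusPassed, { passed: 3, total: 4 });
// returns 'Delivery Checker: passed 3/4'
```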
@@ -130,6 +130,31 @@
  "builtins.lobe-cloud-sandbox.apiName.writeLocalFile": "写入文件",
  "builtins.lobe-cloud-sandbox.inspector.noResults": "无结果",
  "builtins.lobe-cloud-sandbox.title": "云端沙盒",
  "builtins.lobe-delivery-checker.apiName.generateVerifyPlan": "创建自动化检查",
  "builtins.lobe-delivery-checker.verifyPlan.optional": "可选",
  "builtins.lobe-delivery-checker.verifyPlan.portal.fields.description": "检查说明",
  "builtins.lobe-delivery-checker.verifyPlan.portal.fields.instruction": "判定规则",
  "builtins.lobe-delivery-checker.verifyPlan.portal.fields.title": "检查项标题",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.auto_repair": "自动执行修复",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.auto_repairDesc": "未通过时自动触发修复轮次并重新运行检查。",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.manual": "手动处理",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.manualDesc": "未通过时停下,后续交由你手动处理。",
  "builtins.lobe-delivery-checker.verifyPlan.portal.onFail.title": "没通过时",
  "builtins.lobe-delivery-checker.verifyPlan.portal.required.desc": "开启后,该项未通过会阻止当前交付正式完成。",
  "builtins.lobe-delivery-checker.verifyPlan.portal.required.title": "必须完成",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.desc": "未通过时,最多带着失败反馈自动重跑几轮后停止。设为 0 则关闭自动修复。",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.title": "最大修复轮数",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.name": "验收标准名称",
  "builtins.lobe-delivery-checker.verifyPlan.portal.rubric.title": "验收标准设置",
  "builtins.lobe-delivery-checker.verifyPlan.portal.title": "检查项配置",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.agent.desc": "独立检查 Agent 读取轨迹、文件、diff、PR 后判断。",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.agent.title": "Agent 检查",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.llm.desc": "基于文本结果和上下文判断是否满足要求。",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.llm.title": "LLM 判断",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.program.desc": "用命令、接口、状态结果验证。适合测试、type-check、PR 是否存在。",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.program.title": "程序化检查",
  "builtins.lobe-delivery-checker.verifyPlan.portal.verifier.title": "检查方式",
  "builtins.lobe-delivery-checker.verifyPlan.required": "必选",
  "builtins.lobe-group-agent-builder.apiName.batchCreateAgents": "批量创建 Agent",
  "builtins.lobe-group-agent-builder.apiName.createAgent": "创建助理",
  "builtins.lobe-group-agent-builder.apiName.createGroup": "创建群组",

@@ -0,0 +1,60 @@
{
  "badge.failed": "检查未通过",
  "badge.passed": "检查通过",
  "badge.pending": "待检查",
  "badge.repairing": "已触发修复",
  "behavior.auto_improve": "自动补全",
  "behavior.auto_improveDesc": "自动补全,不阻断交付",
  "behavior.gate": "交付门禁",
  "behavior.gateDesc": "未通过则阻断交付并触发修复轮次",
  "detail.checkedAt": "判断时间",
  "detail.confidence": "置信度",
  "detail.counterEvidence": "反证",
  "detail.duration": "耗时",
  "detail.evidence": "证据",
  "detail.instruction": "判定规则",
  "detail.limitation": "局限",
  "detail.method": "判断方式",
  "detail.methodAgent": "Agent 检查",
  "detail.methodLlm": "LLM 判断",
  "detail.methodProgram": "程序化检查",
  "detail.model": "判断模型",
  "detail.openTrace": "查看 Agent 执行轨迹",
  "detail.pending": "该检查项尚未运行。",
  "detail.reasoning": "判定依据",
  "detail.suggestion": "修复建议",
  "detail.summary": "检查说明",
  "detail.tokens": "消耗 Token",
  "dock.confirm": "确认并运行",
  "dock.edit": "调整检查项",
  "dock.forceDeliver": "忽略并交付",
  "dock.repairHint": "下一轮正在修复未通过的检查项。完成后会生成新一轮结果,并重新运行检查器。",
  "dock.saveAndRepair": "保存输入并立即修复",
  "dock.skip": "跳过检查",
  "dock.title": "交付检查器",
  "editor.add": "+ 添加检查项",
  "editor.cancel": "取消",
  "editor.placeholder": "检查项标题",
  "editor.save": "保存",
  "input.hint": "这会作为检查器输入进入下一轮修复 —— 不会作为聊天消息显示。",
  "input.label": "下一轮修复的补充输入",
  "input.placeholder": "例如:先跑 type-check;若仍失败,补一条风险说明即可。",
  "result.failed.sub": "本轮结果暂不交付。交付检查器判定验证不充分,已触发修复。",
  "result.failed.title": "结果草稿",
  "result.foot": "这是本轮运行的结果快照,不属于 assistant / user 消息。",
  "result.kicker": "验证 · 第 {{round}} 轮",
  "result.passed.sub": "交付检查器已通过 {{passed}}/{{total}},当前结果可正式交付。",
  "result.passed.title": "结果",
  "result.pending.sub": "结果已生成,但尚未交付 —— 正在等待交付检查器。",
  "result.pending.title": "结果草稿",
  "result.repairing.sub": "检查未通过,已开始修复轮次。",
  "result.repairing.title": "结果草稿",
  "result.title": "验证 #{{round}}",
  "status.checking": "交付检查器:检查中 {{passed}}/{{total}}",
  "status.draft": "交付检查器:待确认 · {{total}} 项检查",
  "status.failed": "交付检查器:未通过 · 已触发修复",
  "status.idle": "交付检查器:未生成",
  "status.passed": "交付检查器:已通过 {{passed}}/{{total}}",
  "status.repairing": "交付检查器:修复中",
  "status.verifying": "交付检查器:等待运行完成"
}
@@ -225,6 +225,7 @@
    "@lobechat/builtin-tool-group-management": "workspace:*",
    "@lobechat/builtin-tool-knowledge-base": "workspace:*",
    "@lobechat/builtin-tool-lobe-agent": "workspace:*",
    "@lobechat/builtin-tool-lobe-delivery-checker": "workspace:*",
    "@lobechat/builtin-tool-local-system": "workspace:*",
    "@lobechat/builtin-tool-memory": "workspace:*",
    "@lobechat/builtin-tool-message": "workspace:*",
@@ -238,6 +239,7 @@
    "@lobechat/builtin-tool-task": "workspace:*",
    "@lobechat/builtin-tool-topic-reference": "workspace:*",
    "@lobechat/builtin-tool-user-interaction": "workspace:*",
    "@lobechat/builtin-tool-verify": "workspace:*",
    "@lobechat/builtin-tool-web-browsing": "workspace:*",
    "@lobechat/builtin-tool-web-onboarding": "workspace:*",
    "@lobechat/builtin-tools": "workspace:*",

@@ -21,7 +21,11 @@ export interface ToolResultMetrics {
  /** 0..1 — fraction of fixed-size shingles that are exact repeats (degenerate dumps). */
  selfRedundancy: number;
  stepIndex: number;
  /** 0..1 — for xml/html, share of chars living inside `<...>` tags (node ids, markup). */
  /**
   * 0..1 — for xml/html, share of chars spent on markup ATTRIBUTES (`id="3tx0"`,
   * class/style/data-*). Semantic element tags (<title>, <result>) are treated as useful
   * structure, not noise — so clean semantic XML scores ~0, id-laden DOM dumps score high.
   */
  structuralNoiseRatio: number;
  /** gpt-tokenizer count of the unwrapped content. */
  tokens: number;
@@ -63,12 +67,19 @@ function selfRedundancy(s: string): number {
  return 1 - new Set(shingles).size / shingles.length;
}

// Attribute NAMES that carry no signal for the model — opaque identifiers and presentational
// markup. Everything else (title=, url=, href=, name=, lang=…) labels real content and is kept.
const NOISE_ATTR_RE =
  /\b(?:id|class|style|role|rel|target|width|height|aria-[\w-]+|data-[\w-]+|on\w+)\s*=\s*(?:"[^"]*"|'[^']*')/gi;

function structuralNoiseRatio(s: string, format: ToolResultMetrics['format']): number {
  if (format !== 'xml' || s.length === 0) return 0;
  let inside = 0;
  const tags = s.match(/<[^>]*>/g);
  if (tags) for (const tag of tags) inside += tag.length;
  return inside / s.length;
  // Semantic structure is NOT noise: element tags (<item>, <result>) and signal-bearing
  // attributes (title="…", url="…") are exactly what the model reads. Only opaque/presentational
  // attributes (id="3tx0", class, style, data-*) are dead bytes the model never references.
  let attrChars = 0;
  for (const m of s.matchAll(NOISE_ATTR_RE)) attrChars += m[0].length;
  return attrChars / s.length;
}

/** Pure: score one raw tool output string. The shared core reused by CLI / DC ingestion / backfill. */

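The new `structuralNoiseRatio` counts only the characters matched by `NOISE_ATTR_RE`, so semantic tags and attributes such as `title=` or `url=` cost nothing, while `id=`, `class=` and `data-*` do. A standalone sketch that copies the regex verbatim to show the effect on two inputs; `attributeNoiseRatio` and the sample strings are illustrative, and `format` is widened to a plain string because `ToolResultMetrics['format']` is not shown in this diff:

```typescript
// Copied verbatim from the hunk above.
const NOISE_ATTR_RE =
  /\b(?:id|class|style|role|rel|target|width|height|aria-[\w-]+|data-[\w-]+|on\w+)\s*=\s*(?:"[^"]*"|'[^']*')/gi;

// Illustrative restatement of the new structuralNoiseRatio body.
function attributeNoiseRatio(s: string, format: string): number {
  if (format !== 'xml' || s.length === 0) return 0;
  let attrChars = 0;
  // matchAll iterates a clone of the regex, so it does not advance NOISE_ATTR_RE.lastIndex.
  for (const m of s.matchAll(NOISE_ATTR_RE)) attrChars += m[0].length;
  return attrChars / s.length;
}

const semantic = '<result title="Docs" url="https://example.com">Summary</result>';
const domDump = '<div id="3tx0" class="x">hi</div>';

attributeNoiseRatio(semantic, 'xml'); // returns 0
attributeNoiseRatio(domDump, 'xml'); // returns 18 / 33 (id="3tx0" and class="x" are 9 chars each)
```

Under the old tag-based formula, the semantic sample would have scored well above zero, because every character inside `<...>` counted as noise.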
@@ -19,6 +19,7 @@
    "@lobechat/builtin-tool-notebook": "workspace:*",
    "@lobechat/builtin-tool-task": "workspace:*",
    "@lobechat/builtin-tool-user-interaction": "workspace:*",
    "@lobechat/builtin-tool-verify": "workspace:*",
    "@lobechat/business-const": "workspace:*",
    "@lobechat/const": "workspace:*"
  },

@@ -0,0 +1,33 @@
import { VerifyToolIdentifier } from '@lobechat/builtin-tool-verify';
import { DEFAULT_PROVIDER } from '@lobechat/business-const';
import { DEFAULT_MODEL } from '@lobechat/const';

import type { BuiltinAgentDefinition } from '../../types';
import { BUILTIN_AGENT_SLUGS } from '../../types';
import { systemRoleTemplate } from './systemRole';

export const VERIFY_AGENT: BuiltinAgentDefinition = {
  avatar: '/avatars/lobe-ai.png',
  persist: {
    // Custom tool mode: the verifier's toolset is EXACTLY its declared plugins
    // (its writeback tool + any investigation tools the run injects), with no
    // default agent toolset (web/sandbox/skills/always-on) so it judges and
    // submits instead of wandering off. `enableAgentMode: false` keeps the
    // chat-style minimal injectors (no skill discovery / agent-management).
    // Search off for the same reason.
    chatConfig: {
      enableAgentMode: false,
      searchMode: 'off',
      toolMode: 'custom',
    },
    model: DEFAULT_MODEL,
    provider: DEFAULT_PROVIDER,
  },
  runtime: (ctx) => ({
    // Only the verify-result tool — plus any investigation tools the run injects
    // (e.g. file/search tools). No document/plan tools by default.
    plugins: [VerifyToolIdentifier, ...(ctx.plugins || [])],
    systemRole: systemRoleTemplate,
  }),
  slug: BUILTIN_AGENT_SLUGS.verifyAgent,
};
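With `toolMode: 'custom'`, the verifier's toolset is exactly the `plugins` array returned by `runtime`: the writeback tool first, then whatever the run injects. If a caller also injects the writeback identifier, the array as written lists it twice; whether downstream code dedupes plugin ids is not visible here. A minimal sketch of an order-preserving merge, where `composeVerifierPlugins` and the identifier strings are hypothetical:

```typescript
// Hypothetical helper: writeback tool first, then run-injected tools, duplicates dropped.
function composeVerifierPlugins(writebackTool: string, injected?: string[]): string[] {
  // A Set iterates in insertion order, so the first occurrence of each id wins.
  return [...new Set([writebackTool, ...(injected ?? [])])];
}

composeVerifierPlugins('verify-tool', ['file-tool', 'verify-tool', 'search-tool']);
// returns ['verify-tool', 'file-tool', 'search-tool']
```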
@@ -0,0 +1,7 @@
|
||||
import { systemPrompt } from '@lobechat/builtin-tool-verify';
|
||||
|
||||
export const systemRoleTemplate = `You are a dedicated delivery-check verifier agent. Each run, you are asked to judge exactly ONE delivery check against the work a previous agent produced. Your instructions contain the check's title, description, the detailed judging instruction, the original goal, and the deliverable, along with the \`checkItemId\` to report against.
|
||||
|
||||
Investigate rigorously, follow the judging instruction, and then submit your verdict. Submitting the result via the tool is mandatory — it is the only way your judgement is recorded.
|
||||
|
||||
${systemPrompt}`;
|
||||
@@ -8,6 +8,7 @@ import { SELF_FEEDBACK_INTENT } from './agents/self-feedback-intent';
|
||||
import { SELF_REFLECTION } from './agents/self-reflection';
|
||||
import { SKILL_MANAGEMENT } from './agents/skill-management';
|
||||
import { TASK_AGENT } from './agents/task-agent';
|
||||
import { VERIFY_AGENT } from './agents/verify-agent';
|
||||
import { WEB_ONBOARDING } from './agents/web-onboarding';
|
||||
import type { BuiltinAgentDefinition, BuiltinAgentSlug, RuntimeContext } from './types';
|
||||
import { BUILTIN_AGENT_SLUGS } from './types';
|
||||
@@ -25,6 +26,7 @@ export { SELF_FEEDBACK_INTENT } from './agents/self-feedback-intent';
export { SELF_REFLECTION } from './agents/self-reflection';
export { SKILL_MANAGEMENT } from './agents/skill-management';
export { TASK_AGENT } from './agents/task-agent';
export { VERIFY_AGENT } from './agents/verify-agent';
export { WEB_ONBOARDING } from './agents/web-onboarding';

/**
@@ -41,6 +43,7 @@ export const BUILTIN_AGENTS: Record<BuiltinAgentSlug, BuiltinAgentDefinition> =
  [BUILTIN_AGENT_SLUGS.selfReflection]: SELF_REFLECTION,
  [BUILTIN_AGENT_SLUGS.skillManagement]: SKILL_MANAGEMENT,
  [BUILTIN_AGENT_SLUGS.taskAgent]: TASK_AGENT,
  [BUILTIN_AGENT_SLUGS.verifyAgent]: VERIFY_AGENT,
  [BUILTIN_AGENT_SLUGS.webOnboarding]: WEB_ONBOARDING,
};
@@ -16,6 +16,7 @@ export const BUILTIN_AGENT_SLUGS = {
  selfReflection: 'self-reflection',
  skillManagement: 'skill-management',
  taskAgent: 'task-agent',
  verifyAgent: 'verify-agent',
  webOnboarding: 'web-onboarding',
} as const;
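The agent registry is typed `Record<BuiltinAgentSlug, BuiltinAgentDefinition>`, and the slug map is declared `as const`. This diff does not show how `BuiltinAgentSlug` is derived. A common pattern, assumed here, is to take the union of the map's values. With that union, the compiler rejects a registry that forgets a slug. The trimmed slug map, `BuiltinAgentDefinitionSketch` and `isBuiltinAgentSlug` below are illustrative.

```typescript
// Trimmed copy of the slug map from the diff.
const BUILTIN_AGENT_SLUGS = {
  taskAgent: 'task-agent',
  verifyAgent: 'verify-agent',
  webOnboarding: 'web-onboarding',
} as const;

// Assumed derivation: the union of the map's values,
// i.e. 'task-agent' | 'verify-agent' | 'web-onboarding'.
type BuiltinAgentSlug = (typeof BUILTIN_AGENT_SLUGS)[keyof typeof BUILTIN_AGENT_SLUGS];

// Hypothetical, minimal stand-in for BuiltinAgentDefinition.
interface BuiltinAgentDefinitionSketch {
  slug: BuiltinAgentSlug;
}

// Removing any entry here is a compile error: Record requires every slug.
const BUILTIN_AGENTS: Record<BuiltinAgentSlug, BuiltinAgentDefinitionSketch> = {
  [BUILTIN_AGENT_SLUGS.taskAgent]: { slug: BUILTIN_AGENT_SLUGS.taskAgent },
  [BUILTIN_AGENT_SLUGS.verifyAgent]: { slug: BUILTIN_AGENT_SLUGS.verifyAgent },
  [BUILTIN_AGENT_SLUGS.webOnboarding]: { slug: BUILTIN_AGENT_SLUGS.webOnboarding },
};

// Runtime guard for slugs arriving as plain strings (e.g. from a URL or API).
const isBuiltinAgentSlug = (value: string): value is BuiltinAgentSlug =>
  (Object.values(BUILTIN_AGENT_SLUGS) as string[]).includes(value);
```

Because the slug is a literal union, adding `verifyAgent` to the map without registering `VERIFY_AGENT` fails type-checking instead of failing at runtime.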
@@ -0,0 +1,31 @@
{
  "name": "@lobechat/builtin-tool-lobe-delivery-checker",
  "version": "1.0.0",
  "private": true,
  "exports": {
    ".": "./src/index.ts",
    "./client": "./src/client/index.ts"
  },
  "main": "./src/index.ts",
  "scripts": {
    "test": "vitest",
    "test:coverage": "vitest --coverage --silent='passed-only'",
    "test:update": "vitest -u"
  },
  "dependencies": {
    "@lobechat/const": "workspace:*",
    "@lobechat/prompts": "workspace:*",
    "@lobechat/utils": "workspace:*"
  },
  "devDependencies": {
    "@lobechat/types": "workspace:*"
  },
  "peerDependencies": {
    "@lobehub/editor": "^4",
    "@lobehub/ui": "^5",
    "antd": "^6",
    "antd-style": "*",
    "lucide-react": "*",
    "react": "*"
  }
}
@@ -0,0 +1,66 @@
'use client';

import type { BuiltinInspectorProps } from '@lobechat/types';
import { Icon } from '@lobehub/ui';
import { createStaticStyles, cx } from 'antd-style';
import { ListChecks } from 'lucide-react';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';

import { inspectorTextStyles, shinyTextStyles } from '@/styles';

import type { GenerateVerifyPlanParams, GenerateVerifyPlanState } from '../../../types';

const styles = createStaticStyles(({ css, cssVar }) => ({
  chip: css`
    overflow: hidden;
    display: inline-flex;
    flex: none;
    gap: 4px;
    align-items: center;

    max-width: 260px;
    margin-inline-start: 6px;
    padding-block: 1px;
    padding-inline: 6px 8px;
    border: 1px solid ${cssVar.colorBorderSecondary};
    border-radius: 12px;

    color: ${cssVar.colorText};
  `,
  chipIcon: css`
    flex: none;
    color: ${cssVar.colorTextSecondary};
  `,
  chipLabel: css`
    overflow: hidden;
    text-overflow: ellipsis;
    white-space: nowrap;
  `,
}));

export const GenerateVerifyPlanInspector = memo<
  BuiltinInspectorProps<GenerateVerifyPlanParams, GenerateVerifyPlanState>
>(({ args, partialArgs, pluginState, isArgumentsStreaming }) => {
  const { t } = useTranslation('plugin');

  const title = pluginState?.title || args?.title || partialArgs?.title;

  return (
    <div
      className={cx(inspectorTextStyles.root, isArgumentsStreaming && shinyTextStyles.shinyText)}
    >
      <span>{t('builtins.lobe-delivery-checker.apiName.generateVerifyPlan')}</span>
      {title && (
        <span className={styles.chip}>
          <Icon className={styles.chipIcon} icon={ListChecks} size={13} />
          <span className={styles.chipLabel}>{title}</span>
        </span>
      )}
    </div>
  );
});

GenerateVerifyPlanInspector.displayName = 'GenerateVerifyPlanInspector';

export default GenerateVerifyPlanInspector;
@@ -0,0 +1,14 @@
import type { BuiltinInspector } from '@lobechat/types';

import { LobeDeliveryCheckerApiName } from '../../types';
import { GenerateVerifyPlanInspector } from './GenerateVerifyPlan';

/**
 * Delivery Checker Inspector Components Registry
 *
 * Inspector components customize the title/header area
 * of tool calls in the conversation UI.
 */
export const LobeDeliveryCheckerInspectors: Record<string, BuiltinInspector> = {
  [LobeDeliveryCheckerApiName.generateVerifyPlan]: GenerateVerifyPlanInspector as BuiltinInspector,
};
@@ -0,0 +1,54 @@
'use client';

import { ActionIcon, Flexbox } from '@lobehub/ui';
import isEqual from 'fast-deep-equal';
import { ChevronDown, ChevronUp } from 'lucide-react';
import { memo } from 'react';

import { useChatStore } from '@/store/chat';
import { chatPortalSelectors, dbMessageSelectors } from '@/store/chat/selectors';

import { LobeDeliveryCheckerIdentifier } from '../../types';

/**
 * Portal header right-actions for the delivery-check config: step to the prev /
 * next criterion. Reads the focused index straight from the portal store (not via
 * props) so every click re-renders with the up-to-date index — otherwise the nav
 * would only fire once.
 */
const PortalActions = memo(() => {
  const openToolUI = useChatStore((s) => s.openToolUI);
  const messageId = useChatStore(chatPortalSelectors.toolMessageId);
  const params = useChatStore(chatPortalSelectors.toolUIParams, isEqual);
  const message = useChatStore(dbMessageSelectors.getDbMessageById(messageId || ''), isEqual);

  const index = typeof params?.index === 'number' ? params.index : 0;
  const total = (message?.pluginState as { items?: unknown[] } | undefined)?.items?.length ?? 0;

  // The rubric-config view has no per-criterion stepper.
  if (!messageId || params?.view === 'rubric' || total <= 1) return null;

  const go = (next: number) =>
    openToolUI(messageId, LobeDeliveryCheckerIdentifier, { index: next });

  return (
    <Flexbox horizontal gap={2}>
      <ActionIcon
        disabled={index <= 0}
        icon={ChevronUp}
        size={'small'}
        onClick={() => go(index - 1)}
      />
      <ActionIcon
        disabled={index >= total - 1}
        icon={ChevronDown}
        size={'small'}
        onClick={() => go(index + 1)}
      />
    </Flexbox>
  );
});

PortalActions.displayName = 'LobeDeliveryCheckerPortalActions';

export default PortalActions;
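`PortalActions` hides itself on the rubric view or when there is at most one criterion. It also disables each chevron at the edges of the list. The sketch below pulls those guards into a pure function so the boundary cases can be unit-tested outside React. The `getStepperState` name and its return shape are hypothetical. The real component also returns `null` when no tool message is open.

```typescript
interface StepperState {
  canNext: boolean;
  canPrev: boolean;
  visible: boolean;
}

// Mirrors PortalActions: hidden on the rubric view or with fewer than two
// criteria; the up chevron is disabled at index 0, the down one at the last index.
const getStepperState = (index: number, total: number, view?: string): StepperState => ({
  canNext: index < total - 1,
  canPrev: index > 0,
  visible: view !== 'rubric' && total > 1,
});
```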
@@ -0,0 +1,332 @@
'use client';

import {
  ReactCodeblockPlugin,
  ReactCodePlugin,
  ReactHRPlugin,
  ReactLinkPlugin,
  ReactListPlugin,
  ReactMathPlugin,
  ReactTablePlugin,
} from '@lobehub/editor';
import { Editor, useEditor } from '@lobehub/editor/react';
import { Flexbox, Icon, TextArea } from '@lobehub/ui';
import { Switch } from '@lobehub/ui/base-ui';
import { createStaticStyles, cx } from 'antd-style';
import type { LucideIcon } from 'lucide-react';
import { Bot, Hand, ListChecks, RefreshCw, RotateCcw, Scale, ShieldCheck } from 'lucide-react';
import { memo, useEffect } from 'react';
import { useTranslation } from 'react-i18next';

import { useVerifyStore, verifySelectors } from '@/store/verify';

import type { VerifyOnFailStrategy, VerifyVerifierType } from '../../types';

/** The shape this panel needs — assembled from the tool args / state. */
export interface CriterionView {
  /** `verify_criteria.id`; absent on legacy plans → edits can't persist. */
  criterionId?: string;
  description?: string;
  /** Instruction document id; absent → the rubric can't persist. */
  documentId?: string;
  instruction?: string;
  onFail: VerifyOnFailStrategy;
  required: boolean;
  title: string;
  verifierType: VerifyVerifierType;
}

// `program` checks aren't executed in v1, so the picker only offers agent / llm.
const VERIFIERS: { icon: LucideIcon; type: VerifyVerifierType }[] = [
  { icon: Bot, type: 'agent' },
  { icon: Scale, type: 'llm' },
];

const ON_FAILS: { icon: LucideIcon; type: VerifyOnFailStrategy }[] = [
  { icon: RefreshCw, type: 'auto_repair' },
  { icon: Hand, type: 'manual' },
];

const styles = createStaticStyles(({ css, cssVar }) => ({
  cardActive: css`
    border-color: ${cssVar.colorPrimary};
  `,
  cardDesc: css`
    font-size: 12px;
    line-height: 1.5;
    color: ${cssVar.colorTextTertiary};
  `,
  cardTitle: css`
    font-weight: 600;
    color: ${cssVar.colorText};
  `,
  description: css`
    resize: none;
    padding: 0;
    color: ${cssVar.colorTextSecondary};
  `,
  divider: css`
    height: 1px;
    margin-block: 4px;
    background: ${cssVar.colorBorderSecondary};
  `,
  editorBlock: css`
    overflow: auto;

    max-height: 320px;
    padding-block: 8px;
    padding-inline: 12px;
    border: 1px solid ${cssVar.colorBorderSecondary};
    border-radius: ${cssVar.borderRadiusLG};

    font-size: 14px;
    line-height: 1.6;
  `,
  fieldIcon: css`
    color: ${cssVar.colorTextTertiary};
  `,
  fieldLabel: css`
    font-size: 13px;
    font-weight: 500;
    color: ${cssVar.colorTextSecondary};
  `,
  title: css`
    resize: none;

    padding: 0;

    font-size: 18px;
    font-weight: 600;
    line-height: 1.4;
    color: ${cssVar.colorText};
  `,
  verifierCard: css`
    cursor: pointer;

    flex: 1;

    padding: 12px;
    border: 1px solid ${cssVar.colorBorderSecondary};
    border-radius: ${cssVar.borderRadiusLG};

    transition:
      border-color 150ms ${cssVar.motionEaseOut},
      background 150ms ${cssVar.motionEaseOut};

    &:hover {
      border-color: ${cssVar.colorBorder};
    }
  `,
  verifierIcon: css`
    color: ${cssVar.colorTextSecondary};
  `,
  switchCard: css`
    padding: 12px;
    border: 1px solid ${cssVar.colorBorderSecondary};
    border-radius: ${cssVar.borderRadiusLG};
  `,
  switchDesc: css`
    font-size: 12px;
    line-height: 1.5;
    color: ${cssVar.colorTextTertiary};
  `,
  switchTitle: css`
    font-weight: 600;
    color: ${cssVar.colorText};
  `,
}));

const EDITOR_PLUGINS = [
  ReactListPlugin,
  ReactCodePlugin,
  ReactCodeblockPlugin,
  ReactHRPlugin,
  ReactLinkPlugin,
  ReactTablePlugin,
  ReactMathPlugin,
];

interface FieldProps {
  children: React.ReactNode;
  icon: LucideIcon;
  label: string;
}

const Field = memo<FieldProps>(({ icon, label, children }) => (
  <Flexbox gap={8}>
    <Flexbox horizontal align="center" gap={6}>
      <Icon className={styles.fieldIcon} icon={icon} size={14} />
      <span className={styles.fieldLabel}>{label}</span>
    </Flexbox>
    {children}
  </Flexbox>
));

interface CriterionDetailProps {
  criterion: CriterionView;
}

/**
 * The right-side config panel for a single delivery check. Every control writes
 * through the verify store, which optimistically overlays the edit and
 * debounce-persists it to the criterion row / instruction document.
 */
const CriterionDetail = memo<CriterionDetailProps>(({ criterion }) => {
  const { t } = useTranslation('plugin');
  const { criterionId, documentId } = criterion;

  const updateCriterion = useVerifyStore((s) => s.updateCriterion);
  const updateInstruction = useVerifyStore((s) => s.updateInstruction);
  const edit = useVerifyStore(verifySelectors.criterionEdit(criterionId));

  const editor = useEditor();

  // The panel updates in place (no remount) when switching criteria, so push the
  // new rubric into the editor imperatively — keeps the title/description from
  // re-measuring and jittering on nav.
  useEffect(() => {
    if (editor) editor.setDocument('text', criterion.instruction ?? '');
    // eslint-disable-next-line react-hooks/exhaustive-deps
  }, [editor, criterionId, documentId]);

  const editable = !!criterionId;
  const title = edit.title ?? criterion.title;
  const description = edit.description ?? criterion.description ?? '';
  const required = edit.required ?? criterion.required;
  const verifierType = edit.verifierType ?? criterion.verifierType;
  const onFail = edit.onFail ?? criterion.onFail;

  const patch = (value: Parameters<typeof updateCriterion>[1]) => {
    if (criterionId) updateCriterion(criterionId, value);
  };

  const handleInstructionChange = () => {
    if (!documentId || !editor) return;
    const content = (editor.getDocument('text') as unknown as string) || '';
    updateInstruction(documentId, content);
  };

  return (
    <Flexbox gap={16} paddingBlock={16} style={{ height: '100%' }}>
      {/* Lightweight title + description at the top */}
      <Flexbox gap={4}>
        <TextArea
          autoSize={{ minRows: 1 }}
          className={styles.title}
          placeholder={t('builtins.lobe-delivery-checker.verifyPlan.portal.fields.title')}
          readOnly={!editable}
          value={title}
          variant="borderless"
          onChange={(e) => patch({ title: e.target.value })}
        />
        <TextArea
          autoSize={{ minRows: 1 }}
          className={styles.description}
          placeholder={t('builtins.lobe-delivery-checker.verifyPlan.portal.fields.description')}
          readOnly={!editable}
          value={description}
          variant="borderless"
          onChange={(e) => patch({ description: e.target.value })}
        />
      </Flexbox>

      {/* Judging rubric — rich editor */}
      <Field
        icon={ListChecks}
        label={t('builtins.lobe-delivery-checker.verifyPlan.portal.fields.instruction')}
      >
        <div className={styles.editorBlock}>
          <Editor
            content={criterion.instruction}
            editable={!!documentId}
            editor={editor}
            plugins={EDITOR_PLUGINS}
            type={'text'}
            onTextChange={handleInstructionChange}
          />
        </div>
      </Field>

      <div className={styles.divider} />

      {/* Required */}
      <Flexbox
        horizontal
        align="center"
        className={styles.switchCard}
        gap={12}
        justify="space-between"
      >
        <Flexbox gap={2} style={{ minWidth: 0 }}>
          <span className={styles.switchTitle}>
            {t('builtins.lobe-delivery-checker.verifyPlan.portal.required.title')}
          </span>
          <span className={styles.switchDesc}>
            {t('builtins.lobe-delivery-checker.verifyPlan.portal.required.desc')}
          </span>
        </Flexbox>
        <Switch checked={required} disabled={!editable} onChange={(c) => patch({ required: c })} />
      </Flexbox>

      {/* Verifier type */}
      <Field
        icon={ShieldCheck}
        label={t('builtins.lobe-delivery-checker.verifyPlan.portal.verifier.title')}
      >
        <Flexbox horizontal gap={8}>
          {VERIFIERS.map(({ type, icon }) => (
            <Flexbox
              className={cx(styles.verifierCard, verifierType === type && styles.cardActive)}
              gap={6}
              key={type}
              onClick={() => patch({ verifierType: type })}
            >
              <Flexbox horizontal align="center" gap={6}>
                <Icon className={styles.verifierIcon} icon={icon} size={15} />
                <span className={styles.cardTitle}>
                  {t(
                    `builtins.lobe-delivery-checker.verifyPlan.portal.verifier.${type}.title` as any,
                  )}
                </span>
              </Flexbox>
              <span className={styles.cardDesc}>
                {t(`builtins.lobe-delivery-checker.verifyPlan.portal.verifier.${type}.desc` as any)}
              </span>
            </Flexbox>
          ))}
        </Flexbox>
      </Field>

      {/* On failure */}
      <Field
        icon={RotateCcw}
        label={t('builtins.lobe-delivery-checker.verifyPlan.portal.onFail.title')}
      >
        <Flexbox horizontal gap={8}>
          {ON_FAILS.map(({ type, icon }) => (
            <Flexbox
              className={cx(styles.verifierCard, onFail === type && styles.cardActive)}
              gap={6}
              key={type}
              onClick={() => patch({ onFail: type })}
            >
              <Flexbox horizontal align="center" gap={6}>
                <Icon className={styles.verifierIcon} icon={icon} size={15} />
                <span className={styles.cardTitle}>
                  {t(`builtins.lobe-delivery-checker.verifyPlan.portal.onFail.${type}` as any)}
                </span>
              </Flexbox>
              <span className={styles.cardDesc}>
                {t(`builtins.lobe-delivery-checker.verifyPlan.portal.onFail.${type}Desc` as any)}
              </span>
            </Flexbox>
          ))}
        </Flexbox>
      </Field>
    </Flexbox>
  );
});

CriterionDetail.displayName = 'CriterionDetail';

export default CriterionDetail;
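`CriterionDetail` reads every field as `edit.x ?? criterion.x`. It relies on the verify store to overlay edits optimistically and to persist them after a debounce. The store is not part of this excerpt, so the code below is only a minimal sketch of that pattern. The names `createCriterionEditor` and `CriterionPatch`, and the 500 ms default delay, are assumptions. Edits merge into an in-memory overlay at once, and a trailing timer later hands the accumulated patch to `persist`.

```typescript
// Hypothetical patch shape; the real store's types are not shown in the diff.
interface CriterionPatch {
  description?: string;
  onFail?: 'auto_repair' | 'manual';
  required?: boolean;
  title?: string;
  verifierType?: 'agent' | 'llm' | 'program';
}

const createCriterionEditor = (
  persist: (id: string, patch: CriterionPatch) => void,
  delayMs = 500,
) => {
  const edits = new Map<string, CriterionPatch>();
  const timers = new Map<string, ReturnType<typeof setTimeout>>();

  // Cancels any pending timer and persists the accumulated patch right away.
  const flush = (id: string) => {
    const timer = timers.get(id);
    if (timer) clearTimeout(timer);
    timers.delete(id);
    const patch = edits.get(id);
    if (patch) persist(id, patch);
  };

  return {
    edit(id: string, patch: CriterionPatch) {
      // Overlay immediately so the next render already shows the change.
      edits.set(id, { ...edits.get(id), ...patch });
      // Trailing debounce: each edit restarts the timer.
      const timer = timers.get(id);
      if (timer) clearTimeout(timer);
      timers.set(id, setTimeout(() => flush(id), delayMs));
    },
    flush,
    // Pending overlay for a criterion; callers resolve `overlay.x ?? base.x`.
    get(id: string): CriterionPatch {
      return edits.get(id) ?? {};
    },
  };
};
```

The timer restarts on every edit, so typing a title issues one `persist` call per pause rather than one per keystroke.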
@@ -0,0 +1,103 @@
'use client';

import { DEFAULT_MAX_REPAIR_ROUNDS } from '@lobechat/types';
import { Flexbox, Icon, Input, InputNumber } from '@lobehub/ui';
import { createStaticStyles } from 'antd-style';
import { RefreshCw, Type } from 'lucide-react';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';

import { useRubric } from '@/features/Verify/hooks';
import { useVerifyStore, verifySelectors } from '@/store/verify';

const styles = createStaticStyles(({ css, cssVar }) => ({
  desc: css`
    font-size: 12px;
    line-height: 1.5;
    color: ${cssVar.colorTextTertiary};
  `,
  fieldIcon: css`
    color: ${cssVar.colorTextTertiary};
  `,
  fieldLabel: css`
    font-size: 13px;
    font-weight: 500;
    color: ${cssVar.colorTextSecondary};
  `,
  row: css`
    padding: 12px;
    border: 1px solid ${cssVar.colorBorderSecondary};
    border-radius: ${cssVar.borderRadiusLG};
  `,
  rowTitle: css`
    font-weight: 600;
    color: ${cssVar.colorText};
  `,
}));

interface RubricConfigProps {
  rubricId: string;
}

/**
 * The right-side config panel for the rubric (delivery-standard): its name plus
 * the run policy (`maxRepairRounds`). Writes through the verify store (optimistic
 * + debounced) so edits reflect immediately and persist to the rubric.
 */
const RubricConfig = memo<RubricConfigProps>(({ rubricId }) => {
  const { t } = useTranslation('plugin');

  const { data: rubric } = useRubric(rubricId);
  const updateRubricConfig = useVerifyStore((s) => s.updateRubricConfig);
  const updateRubricTitle = useVerifyStore((s) => s.updateRubricTitle);
  const configEdit = useVerifyStore(verifySelectors.rubricConfigEdit(rubricId));
  const titleEdit = useVerifyStore(verifySelectors.rubricTitleEdit(rubricId));

  const title = titleEdit ?? rubric?.title ?? '';
  const maxRepairRounds =
    configEdit.maxRepairRounds ?? rubric?.config?.maxRepairRounds ?? DEFAULT_MAX_REPAIR_ROUNDS;

  return (
    <Flexbox gap={16} paddingBlock={16} style={{ height: '100%' }}>
      {/* Standard name */}
      <Flexbox gap={8}>
        <Flexbox horizontal align="center" gap={6}>
          <Icon className={styles.fieldIcon} icon={Type} size={14} />
          <span className={styles.fieldLabel}>
            {t('builtins.lobe-delivery-checker.verifyPlan.portal.rubric.name')}
          </span>
        </Flexbox>
        <Input value={title} onChange={(e) => updateRubricTitle(rubricId, e.target.value)} />
      </Flexbox>

      {/* Max repair rounds */}
      <Flexbox horizontal align="center" className={styles.row} gap={12} justify="space-between">
        <Flexbox gap={2} style={{ minWidth: 0 }}>
          <Flexbox horizontal align="center" gap={6}>
            <Icon className={styles.fieldIcon} icon={RefreshCw} size={14} />
            <span className={styles.rowTitle}>
              {t('builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.title')}
            </span>
          </Flexbox>
          <span className={styles.desc}>
            {t('builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.desc')}
          </span>
        </Flexbox>
        <InputNumber
          max={5}
          min={0}
          step={1}
          style={{ flex: 'none', width: 80 }}
          value={maxRepairRounds}
          onChange={(value) => {
            if (typeof value === 'number') updateRubricConfig(rubricId, { maxRepairRounds: value });
          }}
        />
      </Flexbox>
    </Flexbox>
  );
});

RubricConfig.displayName = 'RubricConfig';

export default RubricConfig;
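`RubricConfig` resolves the displayed repair budget in this order: the pending edit, then the persisted `rubric.config.maxRepairRounds`, then `DEFAULT_MAX_REPAIR_ROUNDS`. Using `??` rather than `||` matters here, because `0` (no repair rounds) is a valid setting and must not fall through to the default. The sketch below takes the default as a parameter, since the diff does not show its value. It also clamps to the same 0 to 5 range as the `InputNumber`. Applying that clamp outside the input is an assumption, not something the diff shows.

```typescript
interface RepairBounds {
  max: number;
  min: number;
}

/**
 * Precedence as in RubricConfig: pending edit, then persisted config, then the
 * default. `??` (not `||`) keeps an explicit 0. The clamp mirrors the
 * InputNumber's min/max; enforcing it here is an assumption.
 */
const resolveMaxRepairRounds = (
  edit: number | undefined,
  persisted: number | undefined,
  fallback: number,
  bounds: RepairBounds = { max: 5, min: 0 },
): number => {
  const raw = edit ?? persisted ?? fallback;
  return Math.min(bounds.max, Math.max(bounds.min, Math.trunc(raw)));
};
```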
@@ -0,0 +1,48 @@
'use client';

import { Flexbox, Icon, Text } from '@lobehub/ui';
import isEqual from 'fast-deep-equal';
import { SlidersHorizontal } from 'lucide-react';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';

import { useChatStore } from '@/store/chat';
import { chatPortalSelectors, dbMessageSelectors } from '@/store/chat/selectors';

/**
 * Portal header for the lobe-agent delivery-check config. Owns its own name and
 * the focused-item `#N / total` badge; prev/next nav lives in the header's right
 * slot (see Actions) so the framework title slot stays tool-agnostic. Reads the
 * focused index from the portal store so it stays in sync while navigating.
 */
const PortalTitle = memo(() => {
  const { t } = useTranslation('plugin');
  const messageId = useChatStore(chatPortalSelectors.toolMessageId);
  const params = useChatStore(chatPortalSelectors.toolUIParams, isEqual);
  const message = useChatStore(dbMessageSelectors.getDbMessageById(messageId || ''), isEqual);

  const isRubricView = params?.view === 'rubric';
  const index = typeof params?.index === 'number' ? params.index : 0;
  const total = (message?.pluginState as { items?: unknown[] } | undefined)?.items?.length ?? 0;

  return (
    <Flexbox horizontal align={'center'} gap={8}>
      <Icon icon={SlidersHorizontal} size={16} />
      <Text style={{ fontSize: 16 }} type={'secondary'}>
        {isRubricView
          ? t('builtins.lobe-delivery-checker.verifyPlan.portal.rubric.title')
          : t('builtins.lobe-delivery-checker.verifyPlan.portal.title')}
      </Text>
      {!isRubricView && total > 0 && (
        <Text style={{ fontSize: 13 }} type={'secondary'}>
          #{index + 1}
          {total > 1 && ` / ${total}`}
        </Text>
      )}
    </Flexbox>
  );
});

PortalTitle.displayName = 'LobeDeliveryCheckerPortalTitle';

export default PortalTitle;
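The title badge shows `#N` for a single check and `#N / total` for several. It shows nothing on the rubric view or when the plan has no items. The same logic, pulled out as a hypothetical pure helper named `formatCriterionBadge`:

```typescript
/**
 * `#N / total` badge as rendered by PortalTitle (index is 0-based).
 * Returns null when the badge is not shown.
 */
const formatCriterionBadge = (
  index: number,
  total: number,
  isRubricView = false,
): string | null => {
  if (isRubricView || total <= 0) return null;
  return total > 1 ? `#${index + 1} / ${total}` : `#${index + 1}`;
};
```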
@@ -0,0 +1,60 @@
'use client';

import type { BuiltinPortalProps } from '@lobechat/types';
import { Center } from '@lobehub/ui';
import { memo } from 'react';

import type { GenerateVerifyPlanParams, GenerateVerifyPlanState } from '../../types';
import { LobeDeliveryCheckerApiName } from '../../types';
import CriterionDetail, { type CriterionView } from './CriterionDetail';
import RubricConfig from './RubricConfig';

/**
 * One Portal per tool, routed on `apiName`. Currently only `generateVerifyPlan`
 * has a deep-dive view: clicking a check row in the Render opens the criterion's
 * full configuration here, focused via `params.index`.
 */
const Portal = memo<BuiltinPortalProps>(({ apiName, arguments: args, params, state }) => {
  switch (apiName) {
    case LobeDeliveryCheckerApiName.generateVerifyPlan: {
      const index = typeof params?.index === 'number' ? params.index : 0;
      const planArgs = args as GenerateVerifyPlanParams | undefined;
      const planState = state as GenerateVerifyPlanState | undefined;

      // Rubric-level run-policy config (maxRepairRounds, …) — opened from the
      // Render card's settings affordance.
      if (params?.view === 'rubric') {
        return planState?.rubricId ? <RubricConfig rubricId={planState.rubricId} /> : null;
      }

      const input = planArgs?.criteria?.[index];
      const item = planState?.items?.[index];

      // Prefer the model's full input (it carries `instruction`); the persisted
      // item carries the ids needed to write edits back.
      const criterion: CriterionView | undefined =
        input || item
          ? {
              criterionId: item?.criterionId,
              description: input?.description ?? item?.description,
              documentId: item?.documentId,
              instruction: input?.instruction,
              onFail: input?.onFail ?? item?.onFail ?? 'auto_repair',
              required: input?.required ?? item?.required ?? true,
              title: input?.title ?? item?.title ?? '',
              verifierType: input?.verifierType ?? item?.verifierType ?? 'llm',
            }
          : undefined;

      if (!criterion) return null;

      return <CriterionDetail criterion={criterion} />;
    }
  }

  return <Center height={'100%'} />;
});

Portal.displayName = 'LobeDeliveryCheckerPortal';

export default Portal;
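The Portal builds a `CriterionView` by preferring the model's tool arguments, which are the only source of `instruction`. It takes the ids from the persisted plan item and falls back to `'auto_repair'`, `true` and `'llm'`. The standalone version below reproduces that merge with trimmed input types (`CriterionInput`, `PersistedItem`, both hypothetical), so it can be tested without the tool runtime.

```typescript
type OnFail = 'auto_repair' | 'manual';
type VerifierType = 'agent' | 'llm' | 'program';

// Hypothetical trimmed shapes: one `args.criteria` entry and one `state.items` entry.
interface CriterionInput {
  description?: string;
  instruction?: string;
  onFail?: OnFail;
  required?: boolean;
  title?: string;
  verifierType?: VerifierType;
}

interface PersistedItem extends Omit<CriterionInput, 'instruction'> {
  criterionId?: string;
  documentId?: string;
}

interface CriterionView {
  criterionId?: string;
  description?: string;
  documentId?: string;
  instruction?: string;
  onFail: OnFail;
  required: boolean;
  title: string;
  verifierType: VerifierType;
}

const toCriterionView = (
  input?: CriterionInput,
  item?: PersistedItem,
): CriterionView | undefined => {
  if (!input && !item) return undefined;
  return {
    // Ids come only from the persisted item; without them edits cannot be saved.
    criterionId: item?.criterionId,
    description: input?.description ?? item?.description,
    documentId: item?.documentId,
    // Only the model's arguments carry the judging instruction.
    instruction: input?.instruction,
    onFail: input?.onFail ?? item?.onFail ?? 'auto_repair',
    required: input?.required ?? item?.required ?? true,
    title: input?.title ?? item?.title ?? '',
    verifierType: input?.verifierType ?? item?.verifierType ?? 'llm',
  };
};
```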
@@ -0,0 +1,168 @@
'use client';

import type { BuiltinRenderProps } from '@lobechat/types';
import { ActionIcon, Flexbox, Icon } from '@lobehub/ui';
import { createStaticStyles } from 'antd-style';
import type { LucideIcon } from 'lucide-react';
import { Bot, Scale, SlidersHorizontal, SquareTerminal } from 'lucide-react';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';

import { useChatStore } from '@/store/chat';

import type {
  GeneratedVerifyCheck,
  GenerateVerifyPlanParams,
  GenerateVerifyPlanState,
  VerifyVerifierType,
} from '../../types';
import { LobeDeliveryCheckerIdentifier } from '../../types';

/** Verifier-type icon, matching the config panel (agent → bot, llm → scale); program → terminal. */
const VERIFIER_ICON: Record<VerifyVerifierType, LucideIcon> = {
  agent: Bot,
  llm: Scale,
  program: SquareTerminal,
};

const styles = createStaticStyles(({ css, cssVar }) => ({
  icon: css`
    margin-block-start: 1px;
    color: ${cssVar.colorTextSecondary};
  `,
  description: css`
    margin-block-start: 2px;
    font-size: 12px;
    line-height: 1.5;
    color: ${cssVar.colorTextTertiary};
  `,
  card: css`
    overflow: hidden;
    border: 1px solid ${cssVar.colorBorderSecondary};
    border-radius: 12px;
    background: ${cssVar.colorBgElevated};
  `,
  kicker: css`
    font-size: 12px;
    font-weight: 600;
    color: ${cssVar.colorTextTertiary};
  `,
  row: css`
    cursor: pointer;
    padding-block: 10px;
    padding-inline: 12px;
    transition: background 150ms ${cssVar.motionEaseOut};

    &:not(:last-child) {
      border-block-end: 1px solid ${cssVar.colorBorderSecondary};
    }

    &:hover {
      background: ${cssVar.colorFillQuaternary};
    }
  `,
  tag: css`
    flex: none;

    padding-block: 2px;
    padding-inline: 8px;
    border-radius: 6px;

    font-size: 11px;
    font-weight: 500;
    color: ${cssVar.colorTextTertiary};

    background: ${cssVar.colorFillTertiary};
  `,
  tagRequired: css`
    color: ${cssVar.colorText};
    background: ${cssVar.colorFillSecondary};
  `,
  title: css`
    font-weight: 500;
    line-height: 1.5;
    color: ${cssVar.colorText};
  `,
}));

/**
 * Renders the `generateVerifyPlan` tool call: the delivery standard title plus
 * the checks the deliverable must satisfy. Each row shows a verifier-type icon,
 * the check title, its description, and a required/optional tag. Reads the
 * created plan from `pluginState` once executed, and falls back to the proposed
 * `args` while the call awaits confirmation.
 */
const GenerateVerifyPlanRender = memo<
  BuiltinRenderProps<GenerateVerifyPlanParams, GenerateVerifyPlanState>
>(({ args, pluginState, messageId }) => {
  const { t } = useTranslation('plugin');
  const openToolUI = useChatStore((s) => s.openToolUI);

  const items: GeneratedVerifyCheck[] =
    pluginState?.items ??
    (args?.criteria ?? []).map((c) => ({
      description: c.description,
      onFail: c.onFail ?? 'manual',
      required: c.required ?? true,
      title: c.title,
      verifierType: c.verifierType ?? 'llm',
    }));
  const title = pluginState?.title ?? args?.title;
  const rubricId = pluginState?.rubricId;

  if (!items.length) return null;

  return (
    <Flexbox gap={8} paddingBlock={4}>
      {(title || rubricId) && (
        <Flexbox horizontal align="center" gap={8} justify="space-between">
          {title ? <span className={styles.kicker}>{title}</span> : <span />}
          {rubricId && (
            <ActionIcon
              icon={SlidersHorizontal}
              size="small"
              title={t('builtins.lobe-delivery-checker.verifyPlan.portal.rubric.title')}
              onClick={() =>
                openToolUI(messageId, LobeDeliveryCheckerIdentifier, { view: 'rubric' })
              }
            />
          )}
        </Flexbox>
      )}
      <div className={styles.card}>
        {items.map((item, index) => (
          <Flexbox
            horizontal
            align="flex-start"
            className={styles.row}
            gap={8}
            justify="space-between"
            key={index}
            onClick={() => openToolUI(messageId, LobeDeliveryCheckerIdentifier, { index })}
          >
            <Flexbox horizontal align="flex-start" gap={8} style={{ minWidth: 0 }}>
              <Icon
                className={styles.icon}
                icon={VERIFIER_ICON[item.verifierType] ?? Scale}
                size={15}
              />
              <Flexbox gap={0} style={{ minWidth: 0 }}>
                <span className={styles.title}>{item.title}</span>
                {item.description && <span className={styles.description}>{item.description}</span>}
              </Flexbox>
            </Flexbox>
            <span className={`${styles.tag} ${item.required ? styles.tagRequired : ''}`}>
              {item.required
                ? t('builtins.lobe-delivery-checker.verifyPlan.required')
                : t('builtins.lobe-delivery-checker.verifyPlan.optional')}
            </span>
          </Flexbox>
        ))}
      </div>
    </Flexbox>
  );
});

GenerateVerifyPlanRender.displayName = 'GenerateVerifyPlanRender';

export default GenerateVerifyPlanRender;
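Before the call is confirmed there is no `pluginState`, so the Render maps the proposed `args.criteria` into display rows. This fallback defaults `onFail` to `'manual'`, whereas the Portal defaults it to `'auto_repair'`. The sketch below reproduces the fallback with hypothetical trimmed types (`ProposedCriterion`, `CheckRow`, `toCheckRows`). It also shows that an executed plan with an empty `items` array does not fall back, because `??` only replaces `null` and `undefined`.

```typescript
type OnFail = 'auto_repair' | 'manual';
type VerifierType = 'agent' | 'llm' | 'program';

// Hypothetical trimmed shapes of a proposed criterion and a rendered row.
interface ProposedCriterion {
  description?: string;
  onFail?: OnFail;
  required?: boolean;
  title: string;
  verifierType?: VerifierType;
}

interface CheckRow {
  description?: string;
  onFail: OnFail;
  required: boolean;
  title: string;
  verifierType: VerifierType;
}

// Persisted items win; otherwise map the proposed args with the Render's defaults.
const toCheckRows = (persisted?: CheckRow[], proposed?: ProposedCriterion[]): CheckRow[] =>
  persisted ??
  (proposed ?? []).map((c) => ({
    description: c.description,
    onFail: c.onFail ?? 'manual',
    required: c.required ?? true,
    title: c.title,
    verifierType: c.verifierType ?? 'llm',
  }));
```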
@@ -0,0 +1,13 @@
import { LobeDeliveryCheckerApiName } from '../../types';
import GenerateVerifyPlanRender from './GenerateVerifyPlan';

/**
 * Delivery Checker Tool Render Components Registry
 *
 * `generateVerifyPlan` renders the generated delivery checks.
 */
export const LobeDeliveryCheckerRenders = {
  [LobeDeliveryCheckerApiName.generateVerifyPlan]: GenerateVerifyPlanRender,
};

export { default as GenerateVerifyPlanRender } from './GenerateVerifyPlan';
@@ -0,0 +1,14 @@
// Inspector components (customized tool call headers)
export { LobeDeliveryCheckerInspectors } from './Inspector';

// Portal component (detailed view in the side panel)
export { default as LobeDeliveryCheckerPortal } from './Portal';
export { default as LobeDeliveryCheckerPortalActions } from './Portal/Actions';
export { default as LobeDeliveryCheckerPortalTitle } from './Portal/Title';

// Render components (read-only snapshots)
export { GenerateVerifyPlanRender, LobeDeliveryCheckerRenders } from './Render';

// Re-export types and manifest for convenience
export { LobeDeliveryCheckerManifest } from '../manifest';
export * from '../types';
@@ -0,0 +1,3 @@
export { LobeDeliveryCheckerManifest } from './manifest';
export { systemPrompt } from './systemRole';
export * from './types';
@@ -0,0 +1,76 @@
import type { BuiltinToolManifest } from '@lobechat/types';

import { systemPrompt } from './systemRole';
import { LobeDeliveryCheckerApiName, LobeDeliveryCheckerIdentifier } from './types';

export const LobeDeliveryCheckerManifest: BuiltinToolManifest = {
  api: [
    {
      description:
        "Define the delivery checks for the current Agent Run. Call this BEFORE doing substantive work, once you understand the task. Enumerate the concrete checks the deliverable must satisfy — one `criteria` entry per check — and set `title` to the user's task/goal. On confirmation the criteria and a reusable rubric are created and snapshotted onto the run; the checks run automatically when the run completes — you do NOT run them yourself.",
      name: LobeDeliveryCheckerApiName.generateVerifyPlan,
      humanIntervention: 'required',
      renderDisplayControl: 'expand',
      parameters: {
        properties: {
          title: {
            description:
              "The delivery standard's title — typically the user's task / goal in one line.",
            type: 'string',
          },
          criteria: {
            description: 'The checks the deliverable must satisfy — one entry per check.',
            items: {
              properties: {
                title: {
                  description: 'The short title of this check.',
                  type: 'string',
                },
                description: {
                  description: 'A one-sentence summary of what this check verifies.',
                  type: 'string',
                },
                instruction: {
                  description:
                    'A detailed, fine-grained judging rubric for this check: the exact pass conditions, what counts as a fail, the concrete evidence the judge must find, and edge cases to check. Write it thoroughly (multiple sentences / bullet points), not a one-liner — the judge relies on it.',
                  type: 'string',
                },
                required: {
                  description:
                    'Whether this check is required (must pass to deliver) vs optional. Default true.',
                  type: 'boolean',
                },
                verifierType: {
                  description:
                    "How it is judged. 'llm' (default) judges the deliverable with an LLM; 'agent' spawns a sub-agent that actively investigates (reads files, runs checks); 'program' runs a command (not executed in v1).",
                  enum: ['llm', 'agent', 'program'],
                  type: 'string',
                },
                onFail: {
                  description:
                    "Action on failure: 'manual' (default) or 'auto_repair' (attempt an automatic fix).",
                  enum: ['manual', 'auto_repair'],
                  type: 'string',
                },
              },
              required: ['title', 'description', 'instruction'],
              type: 'object',
            },
            type: 'array',
          },
        },
        required: ['title', 'criteria'],
        type: 'object',
      },
    },
  ],
  identifier: LobeDeliveryCheckerIdentifier,
  meta: {
    avatar: '✅',
    description:
      'Define delivery checks the agent run must satisfy; they run automatically on completion.',
    title: 'Delivery Checker',
  },
  systemRole: systemPrompt,
  type: 'builtin',
};
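The schema above only marks `title`, `description`, and `instruction` as required per criterion. A caller that handles the raw tool arguments still has to check the optional fields' types and `enum` values itself. The following is a minimal sketch of such a check, assuming the arguments arrive as parsed JSON. `validateGenerateVerifyPlanArgs` is a hypothetical helper and is not part of the package. It copies the `required` lists and `enum` values from the manifest, and it treats empty strings as missing, which is stricter than the JSON schema.

```typescript
// Enum values copied from the generateVerifyPlan schema above.
const VERIFIER_TYPES: readonly string[] = ['llm', 'agent', 'program'];
const ON_FAIL_STRATEGIES: readonly string[] = ['manual', 'auto_repair'];

const isNonEmptyString = (value: unknown): value is string =>
  typeof value === 'string' && value.trim().length > 0;

/**
 * Hypothetical validator for raw generateVerifyPlan arguments (parsed JSON).
 * Returns a list of problems; an empty list means the arguments look valid.
 */
function validateGenerateVerifyPlanArgs(args: unknown): string[] {
  if (typeof args !== 'object' || args === null) return ['arguments must be an object'];

  const errors: string[] = [];
  const { criteria, title } = args as { criteria?: unknown; title?: unknown };

  if (!isNonEmptyString(title)) errors.push('title is required');
  if (!Array.isArray(criteria)) return [...errors, 'criteria must be an array'];

  criteria.forEach((raw: unknown, i: number) => {
    const c = (typeof raw === 'object' && raw !== null ? raw : {}) as Record<string, unknown>;

    // The per-criterion `required` list from the schema.
    for (const key of ['title', 'description', 'instruction']) {
      if (!isNonEmptyString(c[key])) errors.push(`criteria[${i}].${key} is required`);
    }
    // Optional fields: check type / enum only when present.
    if (c.required !== undefined && typeof c.required !== 'boolean') {
      errors.push(`criteria[${i}].required must be a boolean`);
    }
    if (c.verifierType !== undefined && !VERIFIER_TYPES.includes(c.verifierType as string)) {
      errors.push(`criteria[${i}].verifierType must be one of: ${VERIFIER_TYPES.join(', ')}`);
    }
    if (c.onFail !== undefined && !ON_FAIL_STRATEGIES.includes(c.onFail as string)) {
      errors.push(`criteria[${i}].onFail must be one of: ${ON_FAIL_STRATEGIES.join(', ')}`);
    }
  });

  return errors;
}
```

The helper returns every problem at once rather than throwing on the first one. That suits a tool-call loop, where the full list can be handed back to the model in a single retry.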
@@ -0,0 +1,15 @@
export const systemPrompt = `
<delivery_checker>
For a task that produces a **deliverable** (writing code, editing files, producing a document or a multi-step result), call \`generateVerifyPlan\` **once at the start**, after you understand the request and before doing the substantive work.

Enumerate the concrete checks the deliverable must satisfy and pass them as \`criteria\` (one entry per check), with \`title\` set to the user's task/goal. Derive the criteria from the user's explicit requirements — **each requirement becomes one criterion**. For each criterion:
- \`title\`: the single, concrete pass/fail standard (short).
- \`instruction\`: a **detailed, fine-grained judging rubric** — the exact pass conditions, what counts as a fail, the concrete evidence the judge must find, and edge cases. Write it thoroughly, not a one-liner; the judge relies entirely on it.
- \`verifierType\`: \`llm\` (default) for qualitative judgement from the output; \`agent\` when the check needs active investigation (reading files, running checks).
- \`required\`: \`true\` when the check must pass to deliver, \`false\` when it is optional/advisory.

- The user reviews and confirms the proposed checks; on confirmation they are persisted and the checks run **automatically** when the operation completes. You do **not** run the checks yourself.
- **Skip it** for simple questions, lookups, or chit-chat with no deliverable, or when you cannot identify any concrete check.
- After calling it, continue with the task normally — do not wait or ask the user about the checks.
</delivery_checker>
`;
@@ -0,0 +1,81 @@
export const LobeDeliveryCheckerIdentifier = 'lobe-delivery-checker';

export const LobeDeliveryCheckerApiName = {
  generateVerifyPlan: 'generateVerifyPlan',
} as const;

export type LobeDeliveryCheckerApiNameType =
  (typeof LobeDeliveryCheckerApiName)[keyof typeof LobeDeliveryCheckerApiName];

// ==================== Verify (delivery checker) ====================

/** How a single delivery check is judged. */
export type VerifyVerifierType = 'program' | 'agent' | 'llm';
/** What to do when a delivery check fails. */
export type VerifyOnFailStrategy = 'manual' | 'auto_repair';

/**
 * One delivery check the agent defines for the run. Fully specified by the
 * model (like `createDocument` writes a whole document) — on confirmation each
 * becomes a `verify_criteria` row aggregated under the run's rubric.
 */
export interface VerifyCriterionInput {
  /** A one-sentence summary of what this check verifies (required). */
  description: string;
  /**
   * The detailed, fine-grained judging rubric (required): the exact pass
   * conditions, what counts as a fail, the concrete evidence the judge must find,
   * and edge cases. Written thoroughly — the judge relies on it.
   */
  instruction: string;
  /** Action on failure. Defaults to 'manual'. */
  onFail?: VerifyOnFailStrategy;
  /** Whether this check is required (must pass to deliver) vs optional. Defaults to true. */
  required?: boolean;
  /** The short title of this check. */
  title: string;
  /** How this check is judged. Defaults to 'llm'. */
  verifierType?: VerifyVerifierType;
}

/**
 * Define the delivery-checker plan for the current Agent Run. The agent calls
 * this before doing substantive work, enumerating the checks the deliverable
 * must satisfy. On confirmation the criteria + a rubric are created in the DB
 * and snapshotted onto the operation.
 */
export interface GenerateVerifyPlanParams {
  /** The checks the deliverable must satisfy — one entry per check. */
  criteria: VerifyCriterionInput[];
  /** The delivery standard's title — typically the user's task / goal. */
  title: string;
}

/**
 * A created check item, surfaced on the tool message for the Render. The
 * detailed `instruction` is not carried here — it lives in the criterion's
 * linked document; only the concise `description` is shown in the check list.
 */
export interface GeneratedVerifyCheck {
  /** The persisted `verify_criteria.id` — lets the client write edits back. */
  criterionId?: string;
  /** One-sentence summary shown under the title. */
  description?: string;
  /** The instruction document id — lets the client edit the detailed rubric. */
  documentId?: string;
  onFail: VerifyOnFailStrategy;
  /** Whether this check is required (must pass) vs optional. */
  required: boolean;
  title: string;
  verifierType: VerifyVerifierType;
}

/** State persisted on the generateVerifyPlan tool message (drives the Render). */
export interface GenerateVerifyPlanState {
  /** The created check items, in plan order. */
  items: GeneratedVerifyCheck[];
  /** The created rubric id (`verify_rubrics.id`). */
  rubricId?: string;
  /** The rubric / delivery-standard title. */
  title: string;
}
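`VerifyCriterionInput` leaves `required`, `verifierType`, and `onFail` optional and documents their defaults (`true`, `'llm'`, `'manual'`). `GeneratedVerifyCheck` makes all three mandatory. The mapping between the two might look like the sketch below. It illustrates the documented defaults and is not the service code:

- `toGeneratedCheck` is a hypothetical name.
- The types are local copies, so the block compiles on its own.
- Persistence is left out, so `criterionId` and `documentId` stay unset.
- `instruction` is dropped, matching the note that the detailed rubric lives in the criterion's linked document.

```typescript
// Local copies of the types above, so the sketch is self-contained.
type VerifyVerifierType = 'program' | 'agent' | 'llm';
type VerifyOnFailStrategy = 'manual' | 'auto_repair';

interface VerifyCriterionInput {
  description: string;
  instruction: string;
  onFail?: VerifyOnFailStrategy;
  required?: boolean;
  title: string;
  verifierType?: VerifyVerifierType;
}

interface GeneratedVerifyCheck {
  criterionId?: string;
  description?: string;
  documentId?: string;
  onFail: VerifyOnFailStrategy;
  required: boolean;
  title: string;
  verifierType: VerifyVerifierType;
}

// Hypothetical helper: applies the documented defaults and omits `instruction`,
// which would be stored in the criterion's linked document instead.
const toGeneratedCheck = (input: VerifyCriterionInput): GeneratedVerifyCheck => ({
  description: input.description,
  onFail: input.onFail ?? 'manual',
  // `??` keeps an explicit `false`; only a missing value falls back to true.
  required: input.required ?? true,
  title: input.title,
  verifierType: input.verifierType ?? 'llm',
});
```

Using `??` rather than `||` matters for `required`. An explicit `false` from the model must survive, and only an omitted value should fall back to `true`.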
@@ -0,0 +1,12 @@
{
  "name": "@lobechat/builtin-tool-verify",
  "version": "1.0.0",
  "private": true,
  "exports": {
    ".": "./src/index.ts"
  },
  "main": "./src/index.ts",
  "devDependencies": {
    "@lobechat/types": "workspace:*"
  }
}
@@ -0,0 +1,8 @@
export { VerifyToolIdentifier, VerifyToolManifest } from './manifest';
export { systemPrompt } from './systemRole';
export {
  type SubmitVerifyResultParams,
  VerifyToolApiName,
  type VerifyToolApiNameType,
  type VerifyToolVerdict,
} from './types';
@@ -0,0 +1,60 @@
import type { BuiltinToolManifest } from '@lobechat/types';

import { systemPrompt } from './systemRole';
import { VerifyToolApiName } from './types';

export const VerifyToolIdentifier = 'lobe-verify';

export const VerifyToolManifest: BuiltinToolManifest = {
  api: [
    {
      description:
        'Record the verdict for the delivery check you were asked to judge. Call this exactly once, after investigating, with the checkItemId you were given and your verdict. This is the only way to submit your judgement.',
      name: VerifyToolApiName.submitVerifyResult,
      parameters: {
        properties: {
          checkItemId: {
            description: 'The id of the check being judged (given to you in your instructions).',
            type: 'string',
          },
          verdict: {
            description:
              "Your judgement: 'passed' (concrete evidence the check is met), 'failed' (clearly not met), or 'uncertain' (cannot determine).",
            enum: ['passed', 'failed', 'uncertain'],
            type: 'string',
          },
          evidence: {
            description: 'The concrete evidence from the work supporting your verdict.',
            type: 'string',
          },
          reasoning: {
            description: 'Why that evidence supports your verdict.',
            type: 'string',
          },
          counterEvidence: {
            description: 'Evidence pointing the other way, if any.',
            type: 'string',
          },
          limitation: {
            description: 'What you could not verify and why.',
            type: 'string',
          },
          suggestion: {
            description: 'A concrete fix when the verdict is failed or uncertain.',
            type: 'string',
          },
        },
        required: ['checkItemId', 'verdict'],
        type: 'object',
      },
    },
  ],
  identifier: VerifyToolIdentifier,
  meta: {
    avatar: '✅',
    description: 'Submit the verdict for a single delivery check',
    title: 'Delivery Check Verifier',
  },
  systemRole: systemPrompt,
  type: 'builtin',
};
@@ -0,0 +1,7 @@
export const systemPrompt = `You are a delivery-check verifier. You are given ONE delivery check to judge against the work that was produced: a check title, a one-line description, and a detailed judging instruction, plus the goal and the deliverable.

Your job:
- Judge whether the DELIVERABLE provided to you satisfies the check, following the judging instruction precisely. Base your judgement on the deliverable and the judging instruction in front of you — reason it through directly.
- You do NOT have web search, sandbox, file, or other investigation tools, and you do not need them. Do not try to look things up externally; decide from the provided evidence.
- Be skeptical but decisive: return "passed" when the deliverable clearly meets the check, "failed" when it clearly does not, and "uncertain" only when the provided material genuinely cannot settle it. Always reach one of these verdicts — never leave the check unresolved.
- You MUST finish by calling \`submitVerifyResult\` exactly once, passing the given \`checkItemId\`, your \`verdict\`, and the supporting \`evidence\` / \`reasoning\` (and a \`suggestion\` when failed/uncertain). Calling the tool is the ONLY way to record your judgement — a text answer alone does nothing. Do not create documents or any other side effects.`;
@@ -0,0 +1,27 @@
export const VerifyToolApiName = {
  /** Submit the verdict for the check this verifier sub-agent was asked to judge. */
  submitVerifyResult: 'submitVerifyResult',
} as const;

export type VerifyToolApiNameType = (typeof VerifyToolApiName)[keyof typeof VerifyToolApiName];

/** The verdict a verifier sub-agent reaches for a single delivery check. */
export type VerifyToolVerdict = 'passed' | 'failed' | 'uncertain';

/** Arguments the verifier sub-agent passes to `submitVerifyResult`. */
export interface SubmitVerifyResultParams {
  /** The id of the check being judged (given to the agent in its instructions). */
  checkItemId: string;
  /** Counter-evidence pointing the other way, if any. */
  counterEvidence?: string;
  /** The concrete evidence from the work supporting the verdict. */
  evidence?: string;
  /** What could not be verified and why. */
  limitation?: string;
  /** Why the evidence supports the verdict. */
  reasoning?: string;
  /** A concrete fix when the verdict is failed/uncertain. */
  suggestion?: string;
  /** The judgement for this check. */
  verdict: VerifyToolVerdict;
}
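The `submitVerifyResult` schema requires only `checkItemId` and `verdict`. The verifier prompt also asks for a `suggestion` whenever the verdict is `failed` or `uncertain`. A caller that receives the raw arguments might narrow them into `SubmitVerifyResultParams` roughly as follows. `parseSubmitVerifyResult` is hypothetical. Treating a missing suggestion as a soft warning rather than a rejection is an assumption of this sketch, not something the source specifies.

```typescript
// Local copies of the types above, so the sketch is self-contained.
type VerifyToolVerdict = 'passed' | 'failed' | 'uncertain';

interface SubmitVerifyResultParams {
  checkItemId: string;
  counterEvidence?: string;
  evidence?: string;
  limitation?: string;
  reasoning?: string;
  suggestion?: string;
  verdict: VerifyToolVerdict;
}

interface ParsedSubmitVerifyResult {
  params: SubmitVerifyResultParams;
  /** Soft issues that do not invalidate the call. */
  warnings: string[];
}

const VERDICTS: readonly string[] = ['passed', 'failed', 'uncertain'];
const OPTIONAL_TEXT_FIELDS = [
  'counterEvidence',
  'evidence',
  'limitation',
  'reasoning',
  'suggestion',
] as const;

// Hypothetical parser: returns null when a required field is missing or invalid.
function parseSubmitVerifyResult(raw: unknown): ParsedSubmitVerifyResult | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const args = raw as Record<string, unknown>;
  const { checkItemId, verdict } = args;

  // The schema's `required` list: both must be present and well-formed.
  if (typeof checkItemId !== 'string' || checkItemId.length === 0) return null;
  if (typeof verdict !== 'string' || !VERDICTS.includes(verdict)) return null;

  const params: SubmitVerifyResultParams = {
    checkItemId,
    verdict: verdict as VerifyToolVerdict,
  };
  // Copy the optional narrative fields only when they are strings.
  for (const key of OPTIONAL_TEXT_FIELDS) {
    const value = args[key];
    if (typeof value === 'string') params[key] = value;
  }

  // The verifier prompt asks for a suggestion on failed/uncertain verdicts.
  const warnings: string[] = [];
  if (params.verdict !== 'passed' && !params.suggestion?.trim()) {
    warnings.push(`verdict "${params.verdict}" has no suggestion`);
  }

  return { params, warnings };
}
```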
@@ -0,0 +1,25 @@
'use client';

import type { BuiltinPortalTitleProps } from '@lobechat/types';
import { Flexbox, Icon, Text } from '@lobehub/ui';
import { Globe } from 'lucide-react';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';

/** Portal header for the web-browsing tool. */
const PortalTitle = memo<BuiltinPortalTitleProps>(() => {
  const { t } = useTranslation('plugin');

  return (
    <Flexbox horizontal align={'center'} gap={8}>
      <Icon icon={Globe} size={16} />
      <Text style={{ fontSize: 16 }} type={'secondary'}>
        {t('search.title')}
      </Text>
    </Flexbox>
  );
});

PortalTitle.displayName = 'WebBrowsingPortalTitle';

export default PortalTitle;
@@ -9,6 +9,7 @@ export { WebBrowsingPlaceholders } from './Placeholder';
// Portal component (detailed view in portal)
export { default as WebBrowsingPortal } from './Portal';
export { default as WebBrowsingPortalTitle } from './Portal/Title';

// Reusable components
export { CategoryAvatar, EngineAvatar, EngineAvatarGroup, SearchBar } from './components';
@@ -30,6 +30,7 @@
    "@lobechat/builtin-tool-group-management": "workspace:*",
    "@lobechat/builtin-tool-knowledge-base": "workspace:*",
    "@lobechat/builtin-tool-lobe-agent": "workspace:*",
    "@lobechat/builtin-tool-lobe-delivery-checker": "workspace:*",
    "@lobechat/builtin-tool-local-system": "workspace:*",
    "@lobechat/builtin-tool-memory": "workspace:*",
    "@lobechat/builtin-tool-message": "workspace:*",
@@ -43,6 +44,7 @@
    "@lobechat/builtin-tool-task": "workspace:*",
    "@lobechat/builtin-tool-topic-reference": "workspace:*",
    "@lobechat/builtin-tool-user-interaction": "workspace:*",
    "@lobechat/builtin-tool-verify": "workspace:*",
    "@lobechat/builtin-tool-web-browsing": "workspace:*",
    "@lobechat/builtin-tool-web-onboarding": "workspace:*",
    "@lobechat/const": "workspace:*",
@@ -15,6 +15,7 @@ import { GroupAgentBuilderManifest } from '@lobechat/builtin-tool-group-agent-bu
import { GroupManagementManifest } from '@lobechat/builtin-tool-group-management';
import { KnowledgeBaseManifest } from '@lobechat/builtin-tool-knowledge-base';
import { LobeAgentManifest } from '@lobechat/builtin-tool-lobe-agent';
import { LobeDeliveryCheckerManifest } from '@lobechat/builtin-tool-lobe-delivery-checker';
import { LocalSystemManifest } from '@lobechat/builtin-tool-local-system';
import { MemoryManifest } from '@lobechat/builtin-tool-memory';
import { NotebookManifest } from '@lobechat/builtin-tool-notebook';
@@ -24,6 +25,7 @@ import { SkillStoreManifest } from '@lobechat/builtin-tool-skill-store';
import { SkillsManifest } from '@lobechat/builtin-tool-skills';
import { TopicReferenceManifest } from '@lobechat/builtin-tool-topic-reference';
import { UserInteractionManifest } from '@lobechat/builtin-tool-user-interaction';
import { VerifyToolManifest } from '@lobechat/builtin-tool-verify';
import { WebBrowsingManifest } from '@lobechat/builtin-tool-web-browsing';
import { WebOnboardingManifest } from '@lobechat/builtin-tool-web-onboarding';
@@ -54,4 +56,6 @@ export const builtinToolIdentifiers: string[] = [
  UserInteractionManifest.identifier,
  LobeAgentManifest.identifier,
  WebOnboardingManifest.identifier,
  VerifyToolManifest.identifier,
  LobeDeliveryCheckerManifest.identifier,
];
@@ -16,6 +16,7 @@ import { GroupAgentBuilderManifest } from '@lobechat/builtin-tool-group-agent-bu
import { GroupManagementManifest } from '@lobechat/builtin-tool-group-management';
import { KnowledgeBaseManifest } from '@lobechat/builtin-tool-knowledge-base';
import { LobeAgentManifest } from '@lobechat/builtin-tool-lobe-agent';
import { LobeDeliveryCheckerManifest } from '@lobechat/builtin-tool-lobe-delivery-checker';
import { LocalSystemManifest } from '@lobechat/builtin-tool-local-system';
import { MemoryManifest } from '@lobechat/builtin-tool-memory';
import { MessageManifest } from '@lobechat/builtin-tool-message';
@@ -28,6 +29,7 @@ import { SkillsManifest } from '@lobechat/builtin-tool-skills';
import { TaskManifest } from '@lobechat/builtin-tool-task';
import { TopicReferenceManifest } from '@lobechat/builtin-tool-topic-reference';
import { UserInteractionManifest } from '@lobechat/builtin-tool-user-interaction';
import { VerifyToolManifest } from '@lobechat/builtin-tool-verify';
import { WebBrowsingManifest } from '@lobechat/builtin-tool-web-browsing';
import { WebOnboardingManifest } from '@lobechat/builtin-tool-web-onboarding';
import { isDesktop, RECOMMENDED_SKILLS, RecommendedSkillType } from '@lobechat/const';
@@ -127,6 +129,13 @@ export const runtimeManagedToolIds = [
];

export const builtinTools: LobeBuiltinTool[] = [
  {
    discoverable: false,
    hidden: true,
    identifier: VerifyToolManifest.identifier,
    manifest: VerifyToolManifest,
    type: 'builtin',
  },
  {
    discoverable: false,
    hidden: true,
@@ -319,6 +328,11 @@ export const builtinTools: LobeBuiltinTool[] = [
    manifest: LobeAgentManifest,
    type: 'builtin',
  },
  {
    identifier: LobeDeliveryCheckerManifest.identifier,
    manifest: LobeDeliveryCheckerManifest,
    type: 'builtin',
  },
];

const recommendedBuiltinIds = new Set(
@@ -35,6 +35,10 @@ import {
  KnowledgeBaseManifest,
} from '@lobechat/builtin-tool-knowledge-base/client';
import { LobeAgentInspectors, LobeAgentManifest } from '@lobechat/builtin-tool-lobe-agent/client';
import {
  LobeDeliveryCheckerInspectors,
  LobeDeliveryCheckerManifest,
} from '@lobechat/builtin-tool-lobe-delivery-checker/client';
import {
  LocalSystemInspectors,
  LocalSystemManifest,
@@ -94,6 +98,10 @@ const BuiltinToolInspectors: Record<string, Record<string, BuiltinInspector>> =
  >,
  [KnowledgeBaseManifest.identifier]: KnowledgeBaseInspectors as Record<string, BuiltinInspector>,
  [LobeAgentManifest.identifier]: LobeAgentInspectors as Record<string, BuiltinInspector>,
  [LobeDeliveryCheckerManifest.identifier]: LobeDeliveryCheckerInspectors as Record<
    string,
    BuiltinInspector
  >,
  [LocalSystemManifest.identifier]: LocalSystemInspectors as Record<string, BuiltinInspector>,
  [MemoryManifest.identifier]: MemoryInspectors as Record<string, BuiltinInspector>,
  [MessageManifest.identifier]: MessageInspectors as Record<string, BuiltinInspector>,
@@ -1,6 +1,28 @@
import { WebBrowsingManifest, WebBrowsingPortal } from '@lobechat/builtin-tool-web-browsing/client';
import { type BuiltinPortal } from '@lobechat/types';
import {
  LobeDeliveryCheckerManifest,
  LobeDeliveryCheckerPortal,
  LobeDeliveryCheckerPortalActions,
  LobeDeliveryCheckerPortalTitle,
} from '@lobechat/builtin-tool-lobe-delivery-checker/client';
import {
  WebBrowsingManifest,
  WebBrowsingPortal,
  WebBrowsingPortalTitle,
} from '@lobechat/builtin-tool-web-browsing/client';
import { type BuiltinPortal, type BuiltinPortalTitle } from '@lobechat/types';

export const BuiltinToolsPortals: Record<string, BuiltinPortal> = {
  [LobeDeliveryCheckerManifest.identifier]: LobeDeliveryCheckerPortal as BuiltinPortal,
  [WebBrowsingManifest.identifier]: WebBrowsingPortal as BuiltinPortal,
};

/** Optional custom header content per tool, rendered in the portal title slot. */
export const BuiltinToolsPortalTitles: Record<string, BuiltinPortalTitle> = {
  [LobeDeliveryCheckerManifest.identifier]: LobeDeliveryCheckerPortalTitle,
  [WebBrowsingManifest.identifier]: WebBrowsingPortalTitle,
};

/** Optional header right-actions per tool, rendered next to the portal close. */
export const BuiltinToolsPortalActions: Record<string, BuiltinPortalTitle> = {
  [LobeDeliveryCheckerManifest.identifier]: LobeDeliveryCheckerPortalActions,
};
@@ -20,6 +20,10 @@ import {
  KnowledgeBaseRenders,
} from '@lobechat/builtin-tool-knowledge-base/client';
import { LobeAgentManifest, LobeAgentRenders } from '@lobechat/builtin-tool-lobe-agent/client';
import {
  LobeDeliveryCheckerManifest,
  LobeDeliveryCheckerRenders,
} from '@lobechat/builtin-tool-lobe-delivery-checker/client';
import {
  LocalSystemManifest,
  LocalSystemRenders,
@@ -69,6 +73,10 @@ const BuiltinToolsRenders: Record<string, Record<string, BuiltinRender>> = {
  [GroupManagementManifest.identifier]: GroupManagementRenders as Record<string, BuiltinRender>,
  [KnowledgeBaseManifest.identifier]: KnowledgeBaseRenders as Record<string, BuiltinRender>,
  [LobeAgentManifest.identifier]: LobeAgentRenders as Record<string, BuiltinRender>,
  [LobeDeliveryCheckerManifest.identifier]: LobeDeliveryCheckerRenders as Record<
    string,
    BuiltinRender
  >,
  [LocalSystemManifest.identifier]: LocalSystemRenders as Record<string, BuiltinRender>,
  [MemoryManifest.identifier]: MemoryRenders as Record<string, BuiltinRender>,
  [MessageManifest.identifier]: MessageRenders as Record<string, BuiltinRender>,
@@ -9,6 +9,8 @@ export const WEB_DOCUMENT_SOURCE_TYPE = 'web';
export const AGENT_DOCUMENT_FILE_TYPE = 'agent/document';
export const AGENT_PLAN_FILE_TYPE = 'agent/plan';
/** A verify criterion's detailed judging instruction / rule body. */
export const VERIFY_INSTRUCTION_FILE_TYPE = 'verify/instruction';
export const CUSTOM_DOCUMENT_FILE_TYPE = 'custom/document';
export const CUSTOM_FOLDER_FILE_TYPE = 'custom/folder';
@@ -23,6 +23,8 @@ export const TRACING_SCENARIOS = {
  TaskHandoff: 'task_handoff',
  TopicTitle: 'topic_title',
  Unknown: 'unknown',
  VerifyJudge: 'verify_judge',
  VerifyPlanGen: 'verify_plan_gen',
} as const;

export type TracingScenario = (typeof TRACING_SCENARIOS)[keyof typeof TRACING_SCENARIOS];
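This hunk adds `VerifyJudge` and `VerifyPlanGen` to `TRACING_SCENARIOS`. `TracingScenario` is derived from the object's values with the `(typeof X)[keyof typeof X]` pattern, so it is the union of the string values (`'verify_judge'`), not the keys (`'VerifyJudge'`). When a scenario string comes from outside the type system, such as a stored trace row, it needs a runtime guard to get back into that union. The sketch below uses a trimmed copy of the object containing only the entries visible in this hunk. `isTracingScenario` and `toTracingScenario` are hypothetical, and falling back to `Unknown` is an assumption.

```typescript
// Trimmed copy of TRACING_SCENARIOS: only the entries visible in this hunk.
const TRACING_SCENARIOS = {
  TaskHandoff: 'task_handoff',
  TopicTitle: 'topic_title',
  Unknown: 'unknown',
  VerifyJudge: 'verify_judge',
  VerifyPlanGen: 'verify_plan_gen',
} as const;

// Same derivation as the source: the union of the object's literal values.
type TracingScenario = (typeof TRACING_SCENARIOS)[keyof typeof TRACING_SCENARIOS];

const KNOWN_SCENARIOS = new Set<string>(Object.values(TRACING_SCENARIOS));

// Hypothetical guard: narrows an arbitrary string to TracingScenario.
const isTracingScenario = (value: string): value is TracingScenario => KNOWN_SCENARIOS.has(value);

// Hypothetical normalizer: unrecognized values map to the Unknown scenario.
const toTracingScenario = (value: string): TracingScenario =>
  isTracingScenario(value) ? value : TRACING_SCENARIOS.Unknown;
```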
@@ -21,6 +21,7 @@ import {
  TasksFlattenProcessor,
  ToolCallProcessor,
  ToolMessageReorder,
  VerifyMessageProcessor,
} from '../../processors';
import {
  ActiveTopicDocumentContextInjector,
@@ -403,6 +404,9 @@ export class MessagesEngine {
      new TasksFlattenProcessor(),
      // Task message processing
      new TaskMessageProcessor(),
      // Verify (delivery-checker) cards: drop empty UI-only ones; surface
      // auto-repair failure feedback as a user turn for the repair run
      new VerifyMessageProcessor(),
      // Supervisor role restore
      new SupervisorRoleRestoreProcessor(),
      // Compressed group role transform
@@ -0,0 +1,62 @@
import debug from 'debug';

import { BaseProcessor } from '../base/BaseProcessor';
import type { PipelineContext } from '../types';

declare module '../types' {
  interface PipelineContextMetadataOverrides {
    verifyFeedbackSurfaced?: number;
    verifyMessagesRemoved?: number;
  }
}

const log = debug('context-engine:processor:VerifyMessageProcessor');

/**
 * Verify Message Processor
 *
 * `role='verify'` messages are the Agent Run delivery-checker cards. `verify` is
 * not a valid model role, so they never reach the model as-is. Two cases:
 *
 * - **UI-only card** (empty content): a plain pass/fail card — removed from the
 *   model context (still persisted + rendered in the conversation).
 * - **Repair feedback** (non-empty content): auto-repair persisted the failure
 *   feedback onto the card (see VerifyRepairService). Surface it as a `user`
 *   turn — wrapped in a `<delivery_check_feedback>` tag so the model reads it as
 *   the checker's instruction, not human input — so the repair run acts on it.
 */
export class VerifyMessageProcessor extends BaseProcessor {
  readonly name = 'VerifyMessageProcessor';

  protected async doProcess(context: PipelineContext): Promise<PipelineContext> {
    const clonedContext = this.cloneContext(context);

    const before = clonedContext.messages.length;
    let surfaced = 0;

    clonedContext.messages = clonedContext.messages.flatMap((message) => {
      if (message.role !== 'verify') return [message];

      const content = typeof message.content === 'string' ? message.content.trim() : '';
      // UI-only card with no repair feedback — drop it from the model context.
      if (!content) return [];

      surfaced += 1;
      return [
        {
          ...message,
          content: `<delivery_check_feedback>\n${content}\n</delivery_check_feedback>`,
          role: 'user' as const,
        },
      ];
    });

    const removed = before - clonedContext.messages.length;
    clonedContext.metadata.verifyMessagesRemoved = removed;
    clonedContext.metadata.verifyFeedbackSurfaced = surfaced;
    if (removed > 0) log(`Removed ${removed} empty verify message(s) from model context`);
    if (surfaced > 0) log(`Surfaced ${surfaced} verify feedback message(s) as user turn(s)`);

    return this.markAsExecuted(clonedContext);
  }
}
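The processor's core is a single `flatMap` over the message list. Without the `BaseProcessor` plumbing (context cloning, metadata, `markAsExecuted`), the same rule can be written as a pure function, which makes the transformation easy to unit-test in isolation. This is a sketch, not the shipped code. `ChatMessage` and `transformVerifyMessages` are hypothetical names, and the message shape is reduced to `role` and `content`.

```typescript
// Minimal message shape for the sketch; real pipeline messages carry more fields.
interface ChatMessage {
  content: unknown;
  role: string;
}

interface VerifyTransformResult {
  feedbackSurfaced: number;
  messages: ChatMessage[];
  removed: number;
}

// Hypothetical standalone version of VerifyMessageProcessor.doProcess.
function transformVerifyMessages(messages: ChatMessage[]): VerifyTransformResult {
  let feedbackSurfaced = 0;

  const out = messages.flatMap((message): ChatMessage[] => {
    if (message.role !== 'verify') return [message];

    // Non-string content is treated the same as an empty card.
    const content = typeof message.content === 'string' ? message.content.trim() : '';
    // Empty card: UI-only, never sent to the model.
    if (!content) return [];

    feedbackSurfaced += 1;
    // Non-empty card: repair feedback, re-cast as a tagged user turn.
    return [
      {
        ...message,
        content: `<delivery_check_feedback>\n${content}\n</delivery_check_feedback>`,
        role: 'user',
      },
    ];
  });

  return { feedbackSurfaced, messages: out, removed: messages.length - out.length };
}
```

A surfaced card replaces its `verify` message one-for-one, so the length difference counts only the dropped empty cards. That is why the processor can compute `verifyMessagesRemoved` as `before - after`.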
@@ -0,0 +1,58 @@
import { describe, expect, it } from 'vitest';

import type { PipelineContext } from '../../types';
import { VerifyMessageProcessor } from '../VerifyMessage';

describe('VerifyMessageProcessor', () => {
  const createContext = (messages: any[]): PipelineContext => ({
    initialState: { messages: [] },
    isAborted: false,
    messages,
    metadata: {},
  });

  it('drops empty UI-only verify cards from the model context', async () => {
    const processor = new VerifyMessageProcessor();
    const context = createContext([
      { content: 'Hello', role: 'user' },
      { content: '', role: 'verify' },
      { content: 'Hi there', role: 'assistant' },
    ]);

    const result = await processor.process(context);

    expect(result.messages).toHaveLength(2);
    expect(result.messages.map((m) => m.role)).toEqual(['user', 'assistant']);
    expect(result.metadata.verifyMessagesRemoved).toBe(1);
    expect(result.metadata.verifyFeedbackSurfaced).toBe(0);
  });

  it('surfaces repair feedback as a tagged user turn', async () => {
    const processor = new VerifyMessageProcessor();
    const context = createContext([
      { content: 'Write a paragraph', role: 'user' },
      { content: 'Done', role: 'assistant' },
      { content: '1. No letter e — the body still contains "e"', role: 'verify' },
    ]);

    const result = await processor.process(context);

    expect(result.messages).toHaveLength(3);
    const last = result.messages[2];
    expect(last.role).toBe('user');
    expect(last.content).toContain('<delivery_check_feedback>');
    expect(last.content).toContain('No letter e');
    expect(result.metadata.verifyFeedbackSurfaced).toBe(1);
    expect(result.metadata.verifyMessagesRemoved).toBe(0);
  });

  it('treats whitespace-only content as an empty card', async () => {
    const processor = new VerifyMessageProcessor();
    const context = createContext([{ content: ' \n ', role: 'verify' }]);

    const result = await processor.process(context);

    expect(result.messages).toHaveLength(0);
    expect(result.metadata.verifyMessagesRemoved).toBe(1);
  });
});
@@ -25,6 +25,7 @@ export { TaskMessageProcessor } from './TaskMessage';
export { TasksFlattenProcessor } from './TasksFlatten';
export { ToolCallProcessor } from './ToolCall';
export { ToolMessageReorder } from './ToolMessageReorder';
export { VerifyMessageProcessor } from './VerifyMessage';

// Re-export types
export type { AgentInfo, GroupRoleTransformConfig } from './GroupRoleTransform';
@@ -0,0 +1,74 @@
// @vitest-environment node
import { eq } from 'drizzle-orm';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';

import { getTestDB } from '../../../core/getTestDB';
import { users } from '../../../schemas';
import type { LobeChatDatabase } from '../../../type';
import { MessageModel } from '../../message';

const serverDB: LobeChatDatabase = await getTestDB();

const userId = 'verify-msg-lookup-user';
const otherUserId = 'verify-msg-lookup-other';
const messageModel = new MessageModel(serverDB, userId);

beforeEach(async () => {
  await serverDB.delete(users).where(eq(users.id, userId));
  await serverDB.delete(users).where(eq(users.id, otherUserId));
  await serverDB.insert(users).values([{ id: userId }, { id: otherUserId }]);
});

afterEach(async () => {
  await serverDB.delete(users).where(eq(users.id, userId));
  await serverDB.delete(users).where(eq(users.id, otherUserId));
});

describe('MessageModel.findVerifyMessageByOperationId', () => {
  it('resolves the verify card by its verifyOperationId metadata', async () => {
    const created = await messageModel.create({
      content: '',
      metadata: { verifyOperationId: 'op-1' },
      role: 'verify',
    });

    const found = await messageModel.findVerifyMessageByOperationId('op-1');
    expect(found?.id).toBe(created.id);
    expect(found?.role).toBe('verify');
  });

  it('returns undefined for an unknown operation id', async () => {
    await messageModel.create({
      content: '',
      metadata: { verifyOperationId: 'op-1' },
      role: 'verify',
    });

    const found = await messageModel.findVerifyMessageByOperationId('op-unknown');
    expect(found).toBeUndefined();
  });

  it('does not match non-verify messages with the same metadata', async () => {
    await messageModel.create({
      content: 'hi',
      metadata: { verifyOperationId: 'op-2' },
      role: 'user',
    });

    const found = await messageModel.findVerifyMessageByOperationId('op-2');
    expect(found).toBeUndefined();
  });

  it('is scoped to the owning user', async () => {
    await messageModel.create({
      content: '',
      metadata: { verifyOperationId: 'op-3' },
      role: 'verify',
    });

    const asOther = await new MessageModel(serverDB, otherUserId).findVerifyMessageByOperationId(
      'op-3',
    );
    expect(asOther).toBeUndefined();
  });
});
@@ -0,0 +1,98 @@
// @vitest-environment node
import type { VerifyCheckItem } from '@lobechat/types';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';

import { getTestDB } from '../../core/getTestDB';
import { agentOperations, users, verifyCheckResults } from '../../schemas';
import type { LobeChatDatabase } from '../../type';
import { AgentOperationModel } from '../agentOperation';
import { VerifyCheckResultModel } from '../verifyCheckResult';

const serverDB: LobeChatDatabase = await getTestDB();

const userId = 'verify-result-test-user';
const operationId = 'verify-result-test-op';

const buildItem = (overrides: Partial<VerifyCheckItem> = {}): VerifyCheckItem => ({
  id: 'item-1',
  index: 0,
  onFail: 'manual',
  required: true,
  title: 'goal met',
  verifierConfig: {},
  verifierType: 'llm',
  ...overrides,
});

beforeEach(async () => {
  await serverDB.delete(users);
  await serverDB.insert(users).values([{ id: userId }]);
  await new AgentOperationModel(serverDB, userId).recordStart({ operationId });
});

afterEach(async () => {
  await serverDB.delete(verifyCheckResults);
  await serverDB.delete(agentOperations);
  await serverDB.delete(users);
});

describe('VerifyCheckResultModel', () => {
  it('batch-inserts pending rows and lists by operation in index order', async () => {
    const model = new VerifyCheckResultModel(serverDB, userId);
    await model.createMany([
      { checkItemId: 'b', checkItemIndex: 1, operationId, verifierType: 'llm' },
      { checkItemId: 'a', checkItemIndex: 0, operationId, verifierType: 'llm' },
    ]);

    const rows = await model.listByOperation(operationId);
    expect(rows.map((r) => r.checkItemId)).toEqual(['a', 'b']);
    expect(rows[0].status).toBe('pending');
  });

  it('updates a result by its stable (operationId, checkItemId) key', async () => {
    const model = new VerifyCheckResultModel(serverDB, userId);
    await model.create({ checkItemId: 'a', checkItemIndex: 0, operationId, verifierType: 'llm' });

    await model.updateByCheckItem(operationId, 'a', {
      confidence: 0.9,
      status: 'passed',
      toulmin: { evidence: 'tests passed', reasoning: 'covers the goal' },
      verdict: 'passed',
    });

    const [row] = await model.listByOperation(operationId);
    expect(row).toMatchObject({ confidence: 0.9, status: 'passed', verdict: 'passed' });
    expect(row.toulmin).toEqual({ evidence: 'tests passed', reasoning: 'covers the goal' });
  });
});

describe('AgentOperationModel verify plan', () => {
  it('sets a draft plan and flips rollup to planned', async () => {
    const model = new AgentOperationModel(serverDB, userId);
    await model.setVerifyPlan(operationId, [buildItem()]);

    const state = await model.getVerifyState(operationId);
    expect(state?.verifyStatus).toBe('planned');
    expect(state?.verifyPlan).toHaveLength(1);
    expect(state?.verifyPlanConfirmedAt).toBeNull();
  });

  it('allows editing a draft plan but not after confirmation', async () => {
    const model = new AgentOperationModel(serverDB, userId);
    await model.setVerifyPlan(operationId, [buildItem({ title: 'draft' })]);

    await model.replaceVerifyPlanItems(operationId, [buildItem({ title: 'edited' })]);
    expect((await model.getVerifyState(operationId))?.verifyPlan?.[0].title).toBe('edited');

    await model.confirmVerifyPlan(operationId);
    await model.replaceVerifyPlanItems(operationId, [buildItem({ title: 'too late' })]);
    expect((await model.getVerifyState(operationId))?.verifyPlan?.[0].title).toBe('edited');
  });

  it('updates the rollup status', async () => {
    const model = new AgentOperationModel(serverDB, userId);
    await model.setVerifyPlan(operationId, [buildItem()]);
    await model.updateVerifyStatus(operationId, 'passed');
    expect((await model.getVerifyState(operationId))?.verifyStatus).toBe('passed');
  });
});
@@ -0,0 +1,75 @@
// @vitest-environment node
import { afterEach, beforeEach, describe, expect, it } from 'vitest';

import { getTestDB } from '../../core/getTestDB';
import { users, verifyCriteria } from '../../schemas';
import type { LobeChatDatabase } from '../../type';
import { VerifyCriterionModel } from '../verifyCriterion';

const serverDB: LobeChatDatabase = await getTestDB();

const userId = 'verify-criterion-test-user';
const otherUserId = 'verify-criterion-test-other-user';

beforeEach(async () => {
  await serverDB.delete(users);
  await serverDB.insert(users).values([{ id: userId }, { id: otherUserId }]);
});

afterEach(async () => {
  await serverDB.delete(verifyCriteria);
  await serverDB.delete(users);
});

describe('VerifyCriterionModel', () => {
  it('creates a criterion scoped to the user', async () => {
    const model = new VerifyCriterionModel(serverDB, userId);
    const created = await model.create({
      title: 'type-check passes',
      verifierConfig: { command: 'pnpm type-check' },
      verifierType: 'program',
    });

    expect(created).toMatchObject({
      onFail: 'manual',
      required: true,
      title: 'type-check passes',
      userId,
      verifierType: 'program',
    });
    expect(created.id).toBeDefined();
  });

  it('lists only the current user criteria', async () => {
    const mine = new VerifyCriterionModel(serverDB, userId);
    const other = new VerifyCriterionModel(serverDB, otherUserId);
    await mine.create({ title: 'a', verifierType: 'llm' });
    await other.create({ title: 'b', verifierType: 'llm' });

    const list = await mine.query();
    expect(list).toHaveLength(1);
    expect(list[0].title).toBe('a');
  });

  it('resolves a set of ids via findByIds (user-scoped)', async () => {
    const mine = new VerifyCriterionModel(serverDB, userId);
    const other = new VerifyCriterionModel(serverDB, otherUserId);
    const a = await mine.create({ title: 'a', verifierType: 'llm' });
    const b = await mine.create({ title: 'b', verifierType: 'agent' });
    const leaked = await other.create({ title: 'leaked', verifierType: 'llm' });

    const resolved = await mine.findByIds([a.id, b.id, leaked.id]);
    expect(resolved.map((r) => r.id).sort()).toEqual([a.id, b.id].sort());
  });

  it('updates and deletes', async () => {
    const model = new VerifyCriterionModel(serverDB, userId);
    const c = await model.create({ title: 'old', verifierType: 'llm' });

    await model.update(c.id, { required: false, title: 'new' });
    expect(await model.findById(c.id)).toMatchObject({ required: false, title: 'new' });

    await model.delete(c.id);
    expect(await model.findById(c.id)).toBeUndefined();
  });
});
@@ -0,0 +1,62 @@
// @vitest-environment node
import { afterEach, beforeEach, describe, expect, it } from 'vitest';

import { getTestDB } from '../../core/getTestDB';
import { users, verifyRubrics } from '../../schemas';
import type { LobeChatDatabase } from '../../type';
import { VerifyRubricModel } from '../verifyRubric';

const serverDB: LobeChatDatabase = await getTestDB();

const userId = 'verify-rubric-test-user';
const otherUserId = 'verify-rubric-test-other';

beforeEach(async () => {
  await serverDB.delete(users);
  await serverDB.insert(users).values([{ id: userId }, { id: otherUserId }]);
});

afterEach(async () => {
  await serverDB.delete(verifyRubrics);
  await serverDB.delete(users);
});

describe('VerifyRubricModel config', () => {
  it('persists run-policy config on create and reads it back', async () => {
    const model = new VerifyRubricModel(serverDB, userId);
    const created = await model.create({ config: { maxRepairRounds: 3 }, title: 'standard' });

    const found = await model.findById(created.id);
    expect(found?.title).toBe('standard');
    expect(found?.config).toEqual({ maxRepairRounds: 3 });
  });

  it('defaults config to an empty object when omitted', async () => {
    const model = new VerifyRubricModel(serverDB, userId);
    const created = await model.create({ title: 'no config' });

    const found = await model.findById(created.id);
    expect(found?.config).toEqual({});
  });

  it('updates the config independently of other fields', async () => {
    const model = new VerifyRubricModel(serverDB, userId);
    const created = await model.create({ config: { maxRepairRounds: 1 }, title: 'standard' });

    await model.update(created.id, { config: { maxRepairRounds: 0 } });

    const found = await model.findById(created.id);
    expect(found?.config).toEqual({ maxRepairRounds: 0 });
    expect(found?.title).toBe('standard');
  });

  it('scopes reads to the owning user', async () => {
    const created = await new VerifyRubricModel(serverDB, userId).create({
      config: { maxRepairRounds: 2 },
      title: 'mine',
    });

    const asOther = await new VerifyRubricModel(serverDB, otherUserId).findById(created.id);
    expect(asOther).toBeUndefined();
  });
});
@@ -1,4 +1,5 @@
import { and, eq, gte, isNotNull, sql } from 'drizzle-orm';
import type { VerifyCheckItem } from '@lobechat/types';
import { and, eq, gte, isNotNull, isNull, sql } from 'drizzle-orm';

import { today } from '@/utils/time';

@@ -11,6 +12,16 @@ import type {
import { agentOperations } from '../schemas/agentOperations';
import type { LobeChatDatabase } from '../type';

/** Verify rollup states, mirrors the `verify_status` enum column. */
export type VerifyStatus =
  | 'unverified'
  | 'planned'
  | 'verifying'
  | 'passed'
  | 'failed'
  | 'repairing'
  | 'delivered';

export interface RecordOperationStartParams {
  agentId?: string | null;
  appContext?: AgentOperationAppContext;
@@ -201,4 +212,64 @@ export class AgentOperationModel {
      .returning({ id: agentOperations.id });
    return rows.length === 1;
  }

  // ============================================
  // Verify (delivery checker) — plan snapshot lives on this row
  // ============================================

  /**
   * Write a draft check plan onto the operation and flip the rollup to `planned`.
   * The plan is mutable while a draft; it is frozen on `confirmVerifyPlan`.
   */
  async setVerifyPlan(operationId: string, items: VerifyCheckItem[]): Promise<void> {
    await this.db
      .update(agentOperations)
      .set({ verifyPlan: items, verifyStatus: 'planned' })
      .where(and(eq(agentOperations.id, operationId), eq(agentOperations.userId, this.userId)));
  }

  /** Replace the draft plan items (user edited the plan before confirming). */
  async replaceVerifyPlanItems(operationId: string, items: VerifyCheckItem[]): Promise<void> {
    await this.db
      .update(agentOperations)
      .set({ verifyPlan: items })
      .where(
        and(
          eq(agentOperations.id, operationId),
          eq(agentOperations.userId, this.userId),
          // only a not-yet-confirmed plan may be edited
          isNull(agentOperations.verifyPlanConfirmedAt),
        ),
      );
  }

  /** Freeze the plan (records confirmation time). Results relate to frozen items. */
  async confirmVerifyPlan(operationId: string, confirmedAt: Date = new Date()): Promise<void> {
    await this.db
      .update(agentOperations)
      .set({ verifyPlanConfirmedAt: confirmedAt })
      .where(and(eq(agentOperations.id, operationId), eq(agentOperations.userId, this.userId)));
  }

  /** Update the denormalized rollup. Always go through the service-layer chokepoint. */
  async updateVerifyStatus(operationId: string, verifyStatus: VerifyStatus | null): Promise<void> {
    await this.db
      .update(agentOperations)
      .set({ verifyStatus })
      .where(and(eq(agentOperations.id, operationId), eq(agentOperations.userId, this.userId)));
  }

  /** Read just the verify-related fields for an operation. */
  async getVerifyState(operationId: string) {
    const [row] = await this.db
      .select({
        verifyPlan: agentOperations.verifyPlan,
        verifyPlanConfirmedAt: agentOperations.verifyPlanConfirmedAt,
        verifyStatus: agentOperations.verifyStatus,
      })
      .from(agentOperations)
      .where(and(eq(agentOperations.id, operationId), eq(agentOperations.userId, this.userId)))
      .limit(1);
    return row ?? null;
  }
}

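`setVerifyPlan`, `replaceVerifyPlanItems` and `confirmVerifyPlan` form a small state machine. A draft can be rewritten any number of times; once `verifyPlanConfirmedAt` is set, the `isNull` guard makes `replaceVerifyPlanItems` match zero rows, so later edits are silently ignored rather than rejected (which is what the "too late" test asserts). The in-memory sketch below mirrors those rules. `PlanRow`, the helper names and the boolean return value are illustrative assumptions; the real methods return `void`.

```typescript
interface CheckItem {
  id: string;
  title: string;
}

// Hypothetical in-memory stand-in for the verify columns on an agent_operations row.
interface PlanRow {
  verifyPlan: CheckItem[] | null;
  verifyPlanConfirmedAt: Date | null;
  verifyStatus: string | null;
}

// Mirrors setVerifyPlan: write the draft and flip the rollup to 'planned'.
function setPlan(row: PlanRow, items: CheckItem[]): void {
  row.verifyPlan = items;
  row.verifyStatus = 'planned';
}

// Mirrors the isNull(verifyPlanConfirmedAt) guard: a confirmed plan is frozen.
function replacePlanItems(row: PlanRow, items: CheckItem[]): boolean {
  if (row.verifyPlanConfirmedAt !== null) return false;
  row.verifyPlan = items;
  return true;
}

// Mirrors confirmVerifyPlan, which sets the timestamp unconditionally.
function confirmPlan(row: PlanRow, confirmedAt: Date = new Date()): void {
  row.verifyPlanConfirmedAt = confirmedAt;
}

const row: PlanRow = { verifyPlan: null, verifyPlanConfirmedAt: null, verifyStatus: null };
setPlan(row, [{ id: 'item-1', title: 'draft' }]);
replacePlanItems(row, [{ id: 'item-1', title: 'edited' }]); // true: still a draft
confirmPlan(row);
replacePlanItems(row, [{ id: 'item-1', title: 'too late' }]); // false: frozen
```

As written, `confirmVerifyPlan` has no guard of its own, so calling it a second time moves the confirmation timestamp forward; the sketch reproduces that behavior as well.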
@@ -1297,6 +1297,22 @@ export class MessageModel {
    });
  };

  /**
   * Resolve the `role='verify'` delivery-checker card for an Agent Run (created
   * with `metadata.verifyOperationId = operationId`). Used by auto-repair to
   * persist the failure feedback onto the card it belongs to.
   */
  findVerifyMessageByOperationId = async (operationId: string) => {
    return this.db.query.messages.findFirst({
      where: and(
        eq(messages.userId, this.userId),
        eq(messages.role, 'verify'),
        sql`${messages.metadata}->>'verifyOperationId' = ${operationId}`,
      ),
      orderBy: [desc(messages.createdAt)],
    });
  };

  /**
   * Get parent messages for a thread
   *

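`findVerifyMessageByOperationId` combines three filters (owner, `role = 'verify'`, and the text value that Postgres's `->>` operator extracts from `metadata`) and returns the newest match. An in-memory equivalent makes the tie-breaking explicit when more than one card carries the same operation id. The `StoredMessage` type and the `findVerifyCard` name are hypothetical.

```typescript
interface StoredMessage {
  createdAt: Date;
  id: string;
  metadata?: { verifyOperationId?: string };
  role: string;
  userId: string;
}

// In-memory analogue of findVerifyMessageByOperationId (hypothetical helper).
function findVerifyCard(
  rows: StoredMessage[],
  userId: string,
  operationId: string,
): StoredMessage | undefined {
  return (
    rows
      .filter(
        (m) =>
          m.userId === userId &&
          m.role === 'verify' &&
          m.metadata?.verifyOperationId === operationId,
      )
      // Newest first, like orderBy: [desc(messages.createdAt)] followed by findFirst.
      .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())[0]
  );
}
```

A `user` message that happens to carry the same `verifyOperationId`, or a card owned by another user, never matches, which is what the last two tests in the lookup suite check.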
@@ -0,0 +1,104 @@
import { and, asc, eq, inArray, isNull } from 'drizzle-orm';

import type { NewVerifyCheckResult, VerifyCheckResultItem } from '../schemas/verify';
import { verifyCheckResults } from '../schemas/verify';
import type { LobeChatDatabase } from '../type';

export class VerifyCheckResultModel {
  private readonly db: LobeChatDatabase;
  private readonly userId: string;

  constructor(db: LobeChatDatabase, userId: string) {
    this.db = db;
    this.userId = userId;
  }

  create = async (params: Omit<NewVerifyCheckResult, 'userId'>) => {
    const [result] = await this.db
      .insert(verifyCheckResults)
      .values({ ...params, userId: this.userId })
      .returning();

    return result;
  };

  /** Batch-insert the initial `pending` rows when verify execution starts. */
  createMany = async (rows: Omit<NewVerifyCheckResult, 'userId'>[]) => {
    if (rows.length === 0) return [];
    return this.db
      .insert(verifyCheckResults)
      .values(rows.map((r) => ({ ...r, userId: this.userId })))
      .returning();
  };

  findById = async (id: string) => {
    return this.db.query.verifyCheckResults.findFirst({
      where: and(eq(verifyCheckResults.id, id), eq(verifyCheckResults.userId, this.userId)),
    });
  };

  /** All results for one Agent Run, ordered by display index. */
  listByOperation = async (operationId: string): Promise<VerifyCheckResultItem[]> => {
    return this.db
      .select()
      .from(verifyCheckResults)
      .where(
        and(
          eq(verifyCheckResults.operationId, operationId),
          eq(verifyCheckResults.userId, this.userId),
        ),
      )
      .orderBy(asc(verifyCheckResults.checkItemIndex));
  };

  update = async (id: string, value: Partial<Omit<VerifyCheckResultItem, 'id' | 'userId'>>) => {
    return this.db
      .update(verifyCheckResults)
      .set(value)
      .where(and(eq(verifyCheckResults.id, id), eq(verifyCheckResults.userId, this.userId)));
  };

  /**
   * Update a result by its stable `(operationId, checkItemId)` key rather than
   * the row id — used by the executor / batch judge which produces verdicts keyed
   * by check item id, never by array position.
   */
  updateByCheckItem = async (
    operationId: string,
    checkItemId: string,
    value: Partial<Omit<VerifyCheckResultItem, 'id' | 'userId'>>,
  ) => {
    return this.db
      .update(verifyCheckResults)
      .set(value)
      .where(
        and(
          eq(verifyCheckResults.operationId, operationId),
          eq(verifyCheckResults.checkItemId, checkItemId),
          eq(verifyCheckResults.userId, this.userId),
        ),
      );
  };

  /**
   * Late-bind the LLM tracing row onto already-written verdicts. The verdict is
   * persisted synchronously with `verifier_tracing_id = null`; the tracing row
   * lands asynchronously (best-effort, after the response), so the FK link is
   * backfilled only once that row exists. Idempotent — only fills `NULL`s and is
   * scoped to the items judged in this call (a batch shares one tracing id).
   */
  backfillTracingId = async (operationId: string, checkItemIds: string[], tracingId: string) => {
    if (checkItemIds.length === 0) return;
    return this.db
      .update(verifyCheckResults)
      .set({ verifierTracingId: tracingId })
      .where(
        and(
          eq(verifyCheckResults.operationId, operationId),
          eq(verifyCheckResults.userId, this.userId),
          inArray(verifyCheckResults.checkItemId, checkItemIds),
          isNull(verifyCheckResults.verifierTracingId),
        ),
      );
  };
}
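The doc comment on `backfillTracingId` describes a write-then-link pattern: verdicts are saved immediately with a `NULL` tracing id, and the FK is filled in once the asynchronous tracing row exists. Because the update only touches rows whose `verifierTracingId` is still `NULL`, replaying it is harmless and it never overwrites an earlier link. The sketch below reproduces that filter in memory; the `ResultRow` type is hypothetical and the changed-row count it returns is added for illustration only.

```typescript
interface ResultRow {
  checkItemId: string;
  operationId: string;
  userId: string;
  verifierTracingId: string | null;
}

// Mirrors backfillTracingId: fill NULL tracing ids only, and only for the listed
// check items of this operation and user. Returns how many rows were changed.
function backfillTracingIdInMemory(
  rows: ResultRow[],
  userId: string,
  operationId: string,
  checkItemIds: string[],
  tracingId: string,
): number {
  if (checkItemIds.length === 0) return 0;
  const wanted = new Set(checkItemIds);
  let changed = 0;
  for (const r of rows) {
    if (
      r.operationId === operationId &&
      r.userId === userId &&
      wanted.has(r.checkItemId) &&
      r.verifierTracingId === null
    ) {
      r.verifierTracingId = tracingId;
      changed += 1;
    }
  }
  return changed;
}
```

A batch judge that scored items `a` and `b` in one call would pass both ids with the single shared tracing id; a retry with a different id then changes nothing.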
@@ -0,0 +1,63 @@
import { and, desc, eq, inArray } from 'drizzle-orm';

import type { NewVerifyCriterion, VerifyCriterionItem } from '../schemas/verify';
import { verifyCriteria } from '../schemas/verify';
import type { LobeChatDatabase } from '../type';

export class VerifyCriterionModel {
  private readonly db: LobeChatDatabase;
  private readonly userId: string;

  constructor(db: LobeChatDatabase, userId: string) {
    this.db = db;
    this.userId = userId;
  }

  create = async (params: Omit<NewVerifyCriterion, 'userId'>) => {
    const [result] = await this.db
      .insert(verifyCriteria)
      .values({ ...params, userId: this.userId })
      .returning();

    return result;
  };

  delete = async (id: string) => {
    return this.db
      .delete(verifyCriteria)
      .where(and(eq(verifyCriteria.id, id), eq(verifyCriteria.userId, this.userId)));
  };

  query = async () => {
    return this.db.query.verifyCriteria.findMany({
      orderBy: [desc(verifyCriteria.updatedAt)],
      where: eq(verifyCriteria.userId, this.userId),
    });
  };

  findById = async (id: string) => {
    return this.db.query.verifyCriteria.findFirst({
      where: and(eq(verifyCriteria.id, id), eq(verifyCriteria.userId, this.userId)),
    });
  };

  /**
   * Resolve a set of criterion ids into their current definitions. Used by the
   * plan generator to instantiate ad-hoc `verifyCriteriaIds` mounted on an agent.
   * Always scoped by `userId` so a leaked id can't pull another user's criterion.
   */
  findByIds = async (ids: string[]): Promise<VerifyCriterionItem[]> => {
    if (ids.length === 0) return [];
    return this.db
      .select()
      .from(verifyCriteria)
      .where(and(inArray(verifyCriteria.id, ids), eq(verifyCriteria.userId, this.userId)));
  };

  update = async (id: string, value: Partial<Omit<VerifyCriterionItem, 'id' | 'userId'>>) => {
    return this.db
      .update(verifyCriteria)
      .set({ ...value, updatedAt: new Date() })
      .where(and(eq(verifyCriteria.id, id), eq(verifyCriteria.userId, this.userId)));
  };
}
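`findByIds` filters with SQL `IN`, which silently drops ids the caller does not own but guarantees no particular row order; that is why the test sorts both sides before comparing. If a caller needs the criteria back in the order the agent lists them in `verifyCriteriaIds`, it has to reorder in application code. `orderByRequestedIds` below is a hypothetical helper for that, not part of the model.

```typescript
// Reorder rows returned by an unordered `IN (...)` query to match the requested
// id order. Ids with no matching row (unknown, or owned by another user) are
// skipped, and a repeated id is emitted only once.
function orderByRequestedIds<T extends { id: string }>(ids: string[], rows: T[]): T[] {
  const byId = new Map<string, T>();
  for (const r of rows) byId.set(r.id, r);

  const seen = new Set<string>();
  const out: T[] = [];
  for (const id of ids) {
    const row = byId.get(id);
    if (row && !seen.has(id)) {
      seen.add(id);
      out.push(row);
    }
  }
  return out;
}
```

For example, requesting `['a', 'leaked', 'b']` against rows returned as `[b, a]` yields `[a, b]`.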
@@ -0,0 +1,103 @@
import { and, asc, desc, eq } from 'drizzle-orm';

import type { NewVerifyRubric, VerifyCriterionItem, VerifyRubricItem } from '../schemas/verify';
import { verifyCriteria, verifyRubricCriteria, verifyRubrics } from '../schemas/verify';
import type { LobeChatDatabase } from '../type';

export interface RubricCriterionInput {
  criterionId: string;
  sortOrder?: number | null;
}

export class VerifyRubricModel {
  private readonly db: LobeChatDatabase;
  private readonly userId: string;

  constructor(db: LobeChatDatabase, userId: string) {
    this.db = db;
    this.userId = userId;
  }

  create = async (params: Omit<NewVerifyRubric, 'userId'>) => {
    const [result] = await this.db
      .insert(verifyRubrics)
      .values({ ...params, userId: this.userId })
      .returning();

    return result;
  };

  delete = async (id: string) => {
    // verify_rubric_criteria rows cascade via FK onDelete: 'cascade'.
    return this.db
      .delete(verifyRubrics)
      .where(and(eq(verifyRubrics.id, id), eq(verifyRubrics.userId, this.userId)));
  };

  query = async () => {
    return this.db.query.verifyRubrics.findMany({
      orderBy: [desc(verifyRubrics.updatedAt)],
      where: eq(verifyRubrics.userId, this.userId),
    });
  };

  findById = async (id: string) => {
    return this.db.query.verifyRubrics.findFirst({
      where: and(eq(verifyRubrics.id, id), eq(verifyRubrics.userId, this.userId)),
    });
  };

  update = async (id: string, value: Partial<Omit<VerifyRubricItem, 'id' | 'userId'>>) => {
    return this.db
      .update(verifyRubrics)
      .set({ ...value, updatedAt: new Date() })
      .where(and(eq(verifyRubrics.id, id), eq(verifyRubrics.userId, this.userId)));
  };

  /**
   * Resolve a rubric into its current criterion definitions, ordered by the
   * junction `sortOrder`. Used by the plan generator to instantiate the rubric
   * mounted on an agent. Scoped by `userId`.
   */
  getCriteria = async (rubricId: string): Promise<VerifyCriterionItem[]> => {
    const rows = await this.db
      .select({ criterion: verifyCriteria })
      .from(verifyRubricCriteria)
      .innerJoin(verifyCriteria, eq(verifyRubricCriteria.criterionId, verifyCriteria.id))
      .where(
        and(
          eq(verifyRubricCriteria.rubricId, rubricId),
          eq(verifyRubricCriteria.userId, this.userId),
        ),
      )
      .orderBy(asc(verifyRubricCriteria.sortOrder));

    return rows.map((r) => r.criterion);
  };

  /**
   * Replace the full set of criteria attached to a rubric. Idempotent: clears
   * existing junction rows then inserts the provided set with their sort order.
   */
  setCriteria = async (rubricId: string, criteria: RubricCriterionInput[]) => {
    await this.db
      .delete(verifyRubricCriteria)
      .where(
        and(
          eq(verifyRubricCriteria.rubricId, rubricId),
          eq(verifyRubricCriteria.userId, this.userId),
        ),
      );

    if (criteria.length === 0) return;

    await this.db.insert(verifyRubricCriteria).values(
      criteria.map((c, index) => ({
        criterionId: c.criterionId,
        rubricId,
        sortOrder: c.sortOrder ?? index,
        userId: this.userId,
      })),
    );
  };
}
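`setCriteria` replaces the whole junction set, and `getCriteria` reads it back by ascending `sortOrder`. Because the fallback uses `??`, an explicit `sortOrder: 0` is kept while `null` or `undefined` falls back to the array index, so mixing explicit and implicit orders can produce ties or unexpected positions. The in-memory sketch below shows both halves; the `JunctionRow` type and both helper names are hypothetical.

```typescript
interface RubricCriterionInput {
  criterionId: string;
  sortOrder?: number | null;
}

interface JunctionRow {
  criterionId: string;
  rubricId: string;
  sortOrder: number;
}

// Mirrors setCriteria: drop this rubric's junction rows, then insert the new set.
// `??` keeps an explicit 0 but falls back to the array index for null/undefined.
function setRubricCriteria(
  rows: JunctionRow[],
  rubricId: string,
  criteria: RubricCriterionInput[],
): JunctionRow[] {
  const kept = rows.filter((r) => r.rubricId !== rubricId);
  const added = criteria.map((c, index) => ({
    criterionId: c.criterionId,
    rubricId,
    sortOrder: c.sortOrder ?? index,
  }));
  return [...kept, ...added];
}

// Mirrors getCriteria's ORDER BY sortOrder ASC (criterion ids only, no join).
function getRubricCriterionIds(rows: JunctionRow[], rubricId: string): string[] {
  return rows
    .filter((r) => r.rubricId === rubricId)
    .sort((a, b) => a.sortOrder - b.sortOrder)
    .map((r) => r.criterionId);
}
```

The real method issues the delete and the insert as two separate statements. Wrapping them in a Drizzle transaction (`this.db.transaction(async (tx) => { ... })`) is one way to keep a failed insert from leaving the rubric with no criteria; that is a suggestion, not something the diff does.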
@@ -63,4 +63,16 @@ export interface LobeAgentAgencyConfig {
   */
  executionTarget?: HeteroExecutionTarget;
  heterogeneousProvider?: HeterogeneousProviderConfig;
  /**
   * Ad-hoc verify criteria mounted directly on this agent, in addition to any
   * `verifyRubricId`. Use for one-off checks that don't warrant a reusable
   * rubric. References `verify_criteria.id`.
   */
  verifyCriteriaIds?: string[];
  /**
   * Verify (delivery checker) rubric (reusable criteria template) mounted on
   * this agent. Every run instantiates this rubric's criteria — together with
   * any `verifyCriteriaIds` — into its check plan. References `verify_rubrics.id`.
   */
  verifyRubricId?: string;
}

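Per the doc comments, a run's check plan instantiates the rubric's criteria together with any `verifyCriteriaIds`. One way a plan generator could merge the two lists is sketched below, assuming a criterion mounted both ways should be checked only once and that rubric order comes first. The `CriterionDef` / `PlanItem` shapes, the `instantiatePlan` name and the `item-N` id scheme are hypothetical; in the real system the inputs would come from `VerifyRubricModel.getCriteria` and `VerifyCriterionModel.findByIds`.

```typescript
interface CriterionDef {
  id: string;
  onFail: string;
  required: boolean;
  title: string;
  verifierType: string;
}

// Hypothetical plan item: a frozen copy of a criterion plus its plan position.
interface PlanItem extends Omit<CriterionDef, 'id'> {
  criterionId: string;
  id: string;
  index: number;
}

function instantiatePlan(rubricCriteria: CriterionDef[], adHocCriteria: CriterionDef[]): PlanItem[] {
  const seen = new Set<string>();
  const items: PlanItem[] = [];
  for (const c of [...rubricCriteria, ...adHocCriteria]) {
    // Union semantics: a criterion mounted via both paths is instantiated once.
    if (seen.has(c.id)) continue;
    seen.add(c.id);
    const { id, ...rest } = c;
    items.push({
      ...rest,
      criterionId: id,
      id: `item-${items.length + 1}`, // hypothetical check-item id scheme
      index: items.length,
    });
  }
  return items;
}
```

Copying the criterion fields into each item, rather than referencing them, matches the schema's intent that the plan on `agent_operations` is an immutable snapshot once confirmed.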
@@ -127,10 +127,9 @@ export interface LobeAgentChatConfig extends AgentMemoryChatConfig, AgentSelfIte
   * Runtime environment configuration (desktop only)
   */
  runtimeEnv?: RuntimeEnvConfig;

  searchFCModel?: WorkingModel;
  searchMode?: SearchMode;

  searchMode?: SearchMode;
  /**
   * Skill activate mode:
   * - 'auto': Default tools (LobeTools, Skills, SkillStore, etc.) are always active,
@@ -147,11 +146,20 @@ export interface LobeAgentChatConfig extends AgentMemoryChatConfig, AgentSelfIte
  textVerbosity?: 'low' | 'medium' | 'high';

  thinking?: 'disabled' | 'auto' | 'enabled';

  thinkingBudget?: number;
  thinkingLevel?: 'minimal' | 'low' | 'medium' | 'high';
  thinkingLevel2?: 'low' | 'high';
  thinkingLevel3?: 'low' | 'medium' | 'high';
  thinkingLevel4?: 'minimal' | 'high';
  /**
   * Tool-resolution mode. When set it overrides the `enableAgentMode` derivation:
   * - `agent` full default toolset + plugins + always-on tools
   * - `chat` strict runtime-managed allow-list (KB / memory / web-browsing)
   * - `custom` the toolset is EXACTLY the agent's declared plugins — nothing
   *   auto-injected. For focused builtin sub-agents (e.g. the verifier).
   */
  toolMode?: 'agent' | 'chat' | 'custom';
  /**
   * Maximum length for tool execution result content (in characters)
   * This prevents context overflow when sending tool results back to LLM
@@ -206,6 +214,7 @@ export const AgentChatConfigSchema = z
    effort: z.enum(['low', 'medium', 'high', 'max']).optional(),
    enableAdaptiveThinking: z.boolean().optional(),
    enableAgentMode: z.boolean().optional(),
    toolMode: z.enum(['agent', 'chat', 'custom']).optional(),
    enableAutoScrollOnStreaming: z.boolean().optional(),
    enableCompressHistory: z.boolean().optional(),
    enableContextCompression: z.boolean().optional(),

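The `toolMode` doc comment describes three resolution rules plus a fallback to the older `enableAgentMode` flag. A sketch of that decision follows. The `ToolInputs` shape, the helper names, and the mapping of `enableAgentMode` to `agent` / `chat` when `toolMode` is unset are assumptions made for illustration, not the runtime's actual resolver.

```typescript
type ToolMode = 'agent' | 'chat' | 'custom';

// Hypothetical inputs a resolver might have on hand (tool ids as plain strings).
interface ToolInputs {
  alwaysOnTools: string[];
  chatAllowList: string[]; // e.g. KB / memory / web-browsing
  declaredPlugins: string[];
  defaultTools: string[];
}

// An explicit toolMode wins; otherwise fall back to the assumed legacy derivation.
function effectiveToolMode(
  toolMode: ToolMode | undefined,
  enableAgentMode: boolean | undefined,
): ToolMode {
  if (toolMode) return toolMode;
  return enableAgentMode ? 'agent' : 'chat';
}

function resolveTools(mode: ToolMode, input: ToolInputs): string[] {
  // 'custom': exactly the declared plugins, nothing auto-injected.
  if (mode === 'custom') return [...input.declaredPlugins];
  // 'chat': only the runtime-managed allow-list.
  if (mode === 'chat') return [...input.chatAllowList];
  // 'agent': default toolset + plugins + always-on tools, de-duplicated in order.
  return Array.from(
    new Set([...input.defaultTools, ...input.declaredPlugins, ...input.alwaysOnTools]),
  );
}
```

The `custom` branch is what lets a focused sub-agent such as the verifier run with a closed toolset: nothing outside its declared plugins can appear in the resolved list.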
@@ -185,6 +185,9 @@ export const MessageMetadataSchema = ModelUsageSchema.merge(ModelPerformanceSche
|
||||
subAgentId: z.string().optional(),
|
||||
toolExecutionTimeMs: z.number().optional(),
|
||||
trigger: z.nativeEnum(RequestTrigger).optional(),
|
||||
// role='verify' card: which Agent Run (agent_operations.id) it renders.
|
||||
verifyOperationId: z.string().optional(),
|
||||
verifyRound: z.number().optional(),
|
||||
// @deprecated token usage moved to the top-level `usage` column. Still listed
|
||||
// so zod doesn't strip `metadata.usage` from legacy writes during migration.
|
||||
usage: ModelUsageSchema.optional(),
|
||||
@@ -280,7 +283,6 @@ export interface MessageMetadata {
|
||||
* Flag indicating if message content is multimodal (serialized MessageContentPart[])
|
||||
*/
|
||||
isMultimodal?: boolean;
|
||||
|
||||
/**
|
||||
* Flag indicating if message is from the Supervisor agent in group orchestration
|
||||
* Used by conversation-flow to transform role to 'supervisor' for UI rendering
|
||||
@@ -288,6 +290,7 @@ export interface MessageMetadata {
|
||||
isSupervisor?: boolean;
|
||||
/** @deprecated use `metadata.performance` instead */
|
||||
latency?: number;
|
||||
|
||||
/**
|
||||
* Local-system tool snapshots materialized when the user sent @file mentions.
|
||||
*/
|
||||
@@ -374,6 +377,13 @@ export interface MessageMetadata {
|
||||
* but new writers should target the top-level `usage` instead.
|
||||
*/
|
||||
usage?: ModelUsage;
|
||||
/**
|
||||
* Agent Run operation id this verify card belongs to (for role='verify' messages).
|
||||
* References `agent_operations.id`; the card reads the verify plan + results off it.
|
||||
*/
|
||||
verifyOperationId?: string;
|
||||
/** Display round number for the verify card (1-based; repair rounds are separate). */
|
||||
verifyRound?: number;
|
||||
}
|
||||
|
||||
/**
|
||||
|
||||
@@ -30,7 +30,8 @@ export type UIMessageRoleType =
|
||||
| 'assistantGroup'
|
||||
| 'agentCouncil'
|
||||
| 'compressedGroup'
|
||||
| 'compareGroup';
|
||||
| 'compareGroup'
|
||||
| 'verify';
|
||||
|
||||
export interface ChatFileItem {
|
||||
content?: string;
|
||||
|
||||
@@ -12,7 +12,13 @@ import { ToolInterventionSchema } from '../common/tools';
 import type { UIChatMessage } from './chat';
 import { SemanticSearchChunkSchema } from './rag';
 
-export type CreateMessageRoleType = 'user' | 'assistant' | 'tool' | 'task' | 'supervisor';
+export type CreateMessageRoleType =
+  | 'user'
+  | 'assistant'
+  | 'tool'
+  | 'task'
+  | 'supervisor'
+  | 'verify';
 
 export interface CreateMessageParams extends Partial<
   Omit<UIChatMessage, 'content' | 'role' | 'topicId' | 'chunksList'>
@@ -275,11 +275,31 @@ export interface BuiltinPortalProps<Arguments = Record<string, any>, State = any
  arguments: Arguments;
  identifier: string;
  messageId: string;
  /**
   * Extra params the opener passed to `openToolUI` — e.g. which list item the
   * user clicked. Optional; portals that don't need a focused target ignore it.
   */
  params?: Record<string, any>;
  state: State;
}

export type BuiltinPortal = <T = any>(props: BuiltinPortalProps<T>) => ReactNode;

/**
 * Props for a tool's optional portal header content. The framework owns the
 * back/close chrome and renders this in the title slot, so a tool can name and
 * decorate its own portal without the framework hard-coding tool knowledge.
 */
export interface BuiltinPortalTitleProps {
  apiName?: string;
  identifier: string;
  messageId: string;
  /** Extra params the opener passed to `openToolUI` (e.g. focused item index). */
  params?: Record<string, any>;
}

export type BuiltinPortalTitle = (props: BuiltinPortalTitleProps) => ReactNode;

export interface BuiltinPlaceholderProps<T extends Record<string, any> = any> {
  apiName: string;
  args?: T;
@@ -0,0 +1,59 @@
'use client';

import { Flexbox } from '@lobehub/ui';
import { createStyles } from 'antd-style';
import isEqual from 'fast-deep-equal';
import { memo } from 'react';

import { CheckerDock, RunResult } from '@/features/Verify';
import { useVerifyState } from '@/features/Verify/hooks';
import { phaseCardBackground, phaseFromStatus } from '@/features/Verify/utils';

import { dataSelectors, useConversationStore } from '../../store';

const useStyles = createStyles(({ css, token }) => ({
  card: css`
    overflow: hidden;
    border: 1px solid ${token.colorBorderSecondary};
    border-radius: 16px;
    background: ${token.colorBgElevated};
  `,
}));

interface VerifyMessageProps {
  id: string;
  index: number;
}

/**
 * Renders a `role='verify'` message — the Agent Run delivery-checker card. The
 * run's `operationId` is carried on `metadata.verifyOperationId`. Renders as a
 * single card: the run result header (round + status) on top, then the checker
 * results + actions. Unlike assistant/user messages this is a standalone card
 * group (no avatar bubble).
 */
const VerifyMessage = memo<VerifyMessageProps>(({ id }) => {
  const { styles, theme } = useStyles();
  const item = useConversationStore(dataSelectors.getDisplayMessageById(id), isEqual);
  const operationId = item?.metadata?.verifyOperationId;
  // Sequence number among all verify messages in the thread (not the repair round).
  const ordinal = useConversationStore(dataSelectors.getVerifyOrdinal(id));

  const { data: state } = useVerifyState(operationId ?? null);
  const phase = phaseFromStatus(state?.verifyStatus);

  if (!operationId) return null;

  return (
    <Flexbox paddingBlock={8}>
      <div className={styles.card} style={{ background: phaseCardBackground(phase, theme) }}>
        <RunResult embedded operationId={operationId} round={ordinal} />
        <CheckerDock embedded operationId={operationId} />
      </div>
    </Flexbox>
  );
});

VerifyMessage.displayName = 'VerifyMessage';

export default VerifyMessage;
@@ -24,6 +24,7 @@ import TaskMessage from './Task';
import TasksMessage from './Tasks';
import ToolMessage from './Tool';
import UserMessage from './User';
import VerifyMessage from './Verify';

const prefixCls = 'ant';
@@ -186,6 +187,10 @@ const MessageItem = memo<MessageItemProps>(
      case 'tool': {
        return <ToolMessage disableEditing={disableEditing} id={id} index={index} />;
      }

      case 'verify': {
        return <VerifyMessage id={id} index={index} />;
      }
    }

    return null;
@@ -208,9 +208,22 @@ const getBlockHasTools =
    return !!tools && tools.length > 0;
  };

/** 1-based position of a verify message among all verify messages in the thread. */
const getVerifyOrdinal = (id: string) => (s: State) => {
  let ordinal = 0;
  for (const message of s.displayMessages) {
    if (message.role === 'verify') {
      ordinal += 1;
      if (message.id === id) return ordinal;
    }
  }
  return ordinal || 1;
};

export const dataSelectors = {
  currentTopicSummary,
  dbMessages,
  getVerifyOrdinal,
  displayMessageIds,
  displayMessages,
  findLastMessageId,
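`getVerifyOrdinal` numbers verify cards by their position among the `role === 'verify'` messages. `VerifyMessage` passes that number to `RunResult` as `round`. The same logic over a plain array is easy to check in isolation. `DisplayMessageLike` and `verifyOrdinal` are hypothetical names for this sketch. Note the fallback: when `id` is not found, the selector returns the total number of verify messages (or `1` when there are none), not `1`.

```typescript
interface DisplayMessageLike {
  id: string;
  role: string;
}

// Same counting logic as the getVerifyOrdinal selector, applied to a plain
// array instead of the conversation store state.
const verifyOrdinal = (messages: DisplayMessageLike[], id: string): number => {
  let ordinal = 0;
  for (const message of messages) {
    if (message.role === 'verify') {
      ordinal += 1;
      if (message.id === id) return ordinal;
    }
  }
  // Not found: total verify count, or 1 when the thread has none.
  return ordinal || 1;
};

const thread: DisplayMessageLike[] = [
  { id: 'u1', role: 'user' },
  { id: 'v1', role: 'verify' },
  { id: 'a1', role: 'assistant' },
  { id: 'v2', role: 'verify' },
];

console.log(verifyOrdinal(thread, 'v2')); // 2
console.log(verifyOrdinal(thread, 'missing')); // 2
```

Each card runs its own scan, so rendering k verify cards over n display messages costs O(k·n). Precomputing an id-to-ordinal map would avoid that for very long threads, at the cost of the selector's simplicity.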
@@ -8,6 +8,7 @@ import { safeParseJSON } from '@/utils/safeParseJSON';

const ToolRender = memo(() => {
  const messageId = useChatStore(chatPortalSelectors.toolMessageId);
  const params = useChatStore(chatPortalSelectors.toolUIParams, isEqual);
  const message = useChatStore(dbMessageSelectors.getDbMessageById(messageId || ''), isEqual);

  // make sure the message and id is valid
@@ -32,6 +33,7 @@ const ToolRender = memo(() => {
      arguments={args}
      identifier={plugin.identifier}
      messageId={messageId}
      params={params}
      state={pluginState}
    />
  );
@@ -0,0 +1,42 @@
import { BuiltinToolsPortalActions } from '@lobechat/builtin-tools/portals';
import type { BuiltinPortalTitle } from '@lobechat/types';
import isEqual from 'fast-deep-equal';

import { useChatStore } from '@/store/chat';
import { chatPortalSelectors, dbMessageSelectors } from '@/store/chat/selectors';

import HeaderChrome from '../components/Header';
import Title from './Title';

/**
 * ToolUI portal header: the generic back/close chrome plus the tool's title and,
 * when a tool registers them, header right-actions (e.g. prev/next nav).
 */
const Header = () => {
  const [toolUIIdentifier = '', messageId] = useChatStore((s) => [
    chatPortalSelectors.toolUIIdentifier(s),
    chatPortalSelectors.toolMessageId(s),
  ]);
  const params = useChatStore(chatPortalSelectors.toolUIParams, isEqual);
  const message = useChatStore(dbMessageSelectors.getDbMessageById(messageId || ''), isEqual);

  const Actions = BuiltinToolsPortalActions[toolUIIdentifier] as BuiltinPortalTitle | undefined;

  return (
    <HeaderChrome
      title={<Title />}
      rightExtra={
        Actions ? (
          <Actions
            apiName={message?.plugin?.apiName}
            identifier={toolUIIdentifier}
            messageId={messageId || ''}
            params={params}
          />
        ) : undefined
      }
    />
  );
};

export default Header;
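`Header` looks up an optional right-actions component by tool identifier and renders it only when one is registered. The sketch below shows that lookup with a `Partial<Record<…>>` registry. This makes the possibly-missing entry explicit in the type, so no `as … | undefined` cast is needed. The sketch is deliberately framework-free: components return strings rather than React nodes. `TitlePropsLike`, `portalActions`, `renderRightExtra`, and the `'example-tool'` identifier are all hypothetical.

```typescript
// Hypothetical stand-in for BuiltinPortalTitleProps.
interface TitlePropsLike {
  apiName?: string;
  identifier: string;
  messageId: string;
  params?: Record<string, unknown>;
}

// Components return strings here so the sketch runs without React.
type TitleComponentLike = (props: TitlePropsLike) => string;

// Partial<> records in the type that most identifiers have no registered actions.
const portalActions: Partial<Record<string, TitleComponentLike>> = {
  'example-tool': ({ params }) => `item ${String(params?.index ?? 0)}`,
};

const renderRightExtra = (
  identifier: string,
  messageId: string | undefined,
  params?: Record<string, unknown>,
): string | undefined => {
  const Actions = portalActions[identifier];
  // Mirrors Header: no registered actions means nothing in the right-extra slot.
  return Actions ? Actions({ identifier, messageId: messageId || '', params }) : undefined;
};

console.log(renderRightExtra('example-tool', 'msg_1', { index: 2 })); // item 2
console.log(renderRightExtra('other-tool', 'msg_1')); // undefined
```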
@@ -1,39 +1,43 @@
-import { WebBrowsingManifest } from '@lobechat/builtin-tool-web-browsing';
-import { ActionIcon, Flexbox, Icon, Text } from '@lobehub/ui';
+import { BuiltinToolsPortalTitles } from '@lobechat/builtin-tools/portals';
+import type { BuiltinPortalTitle } from '@lobechat/types';
+import { Flexbox, Text } from '@lobehub/ui';
 import isEqual from 'fast-deep-equal';
-import { ArrowLeft, Globe } from 'lucide-react';
 import { useTranslation } from 'react-i18next';
 
 import PluginAvatar from '@/features/PluginAvatar';
 import { useChatStore } from '@/store/chat';
-import { chatPortalSelectors } from '@/store/chat/selectors';
+import { chatPortalSelectors, dbMessageSelectors } from '@/store/chat/selectors';
 import { pluginHelpers, useToolStore } from '@/store/tool';
 import { toolSelectors } from '@/store/tool/selectors';
 
 const Title = () => {
-  const [closeToolUI, toolUIIdentifier = ''] = useChatStore((s) => [
-    s.closeToolUI,
+  const [toolUIIdentifier = '', messageId] = useChatStore((s) => [
     chatPortalSelectors.toolUIIdentifier(s),
+    chatPortalSelectors.toolMessageId(s),
   ]);
+  const toolUIParams = useChatStore(chatPortalSelectors.toolUIParams, isEqual);
+  const message = useChatStore(dbMessageSelectors.getDbMessageById(messageId || ''), isEqual);
 
   const { t } = useTranslation('plugin');
   const pluginMeta = useToolStore(toolSelectors.getMetaById(toolUIIdentifier), isEqual);
   const pluginTitle = pluginHelpers.getPluginTitle(pluginMeta) ?? toolUIIdentifier;
 
-  if (toolUIIdentifier === WebBrowsingManifest.identifier) {
+  // A tool may ship its own portal header content; otherwise fall back to the
+  // generic plugin avatar + title. The back/close chrome is owned by the header
+  // wrapper (HeaderChrome), so the title slot must not add its own back arrow.
+  const CustomTitle = BuiltinToolsPortalTitles[toolUIIdentifier] as BuiltinPortalTitle | undefined;
+
+  if (CustomTitle) {
     return (
-      <Flexbox horizontal align={'center'} gap={8}>
-        <ActionIcon icon={ArrowLeft} size={'small'} onClick={() => closeToolUI()} />
-        <Icon icon={Globe} size={16} />
-        <Text style={{ fontSize: 16 }} type={'secondary'}>
-          {t('search.title')}
-        </Text>
-      </Flexbox>
+      <CustomTitle
+        apiName={message?.plugin?.apiName}
+        identifier={toolUIIdentifier}
+        messageId={messageId || ''}
+        params={toolUIParams}
+      />
     );
   }
 
   return (
-    <Flexbox horizontal align={'center'} gap={4}>
-      <ActionIcon icon={ArrowLeft} size={'small'} onClick={() => closeToolUI()} />
       <Flexbox horizontal align={'center'} gap={8}>
         <PluginAvatar identifier={toolUIIdentifier} size={28} />
         <Text style={{ fontSize: 16 }} type={'secondary'}>
           {pluginTitle}
@@ -1,8 +1,10 @@
import { type PortalImpl } from '../type';
import Body from './Body';
import Header from './Header';
import Title from './Title';

export const Plugins: PortalImpl = {
  Body,
  Header,
  Title,
};
@@ -0,0 +1,248 @@
import type { VerifierType } from '@lobechat/types';
import { Button, Flexbox, Markdown, Text } from '@lobehub/ui';
import { createStyles } from 'antd-style';
import { ListTree } from 'lucide-react';
import type { ReactNode } from 'react';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';

import {
  useVerifierTracing,
  useVerifyInstruction,
  useVerifyResults,
  useVerifyState,
} from '@/features/Verify/hooks';
import { verifyService } from '@/services/verify';
import { useChatStore } from '@/store/chat';
import { chatPortalSelectors, threadSelectors } from '@/store/chat/selectors';

const useStyles = createStyles(({ css, token }) => ({
  confidenceCard: css`
    padding: 12px;
    border: 1px solid ${token.colorBorderSecondary};
    border-radius: ${token.borderRadiusLG}px;
    background: ${token.colorFillQuaternary};
  `,
  confidenceValue: css`
    font-size: 20px;
    font-weight: 700;
    font-variant-numeric: tabular-nums;
  `,
  fill: css`
    height: 100%;
    border-radius: 999px;
    transition: width 300ms ${token.motionEaseOut};
  `,
  label: css`
    margin-block-end: 6px;
    font-size: 12px;
    font-weight: 600;
    color: ${token.colorTextSecondary};
  `,
  metaKey: css`
    color: ${token.colorTextTertiary};
  `,
  metaRow: css`
    display: flex;
    gap: 12px;
    align-items: center;
    justify-content: space-between;

    font-size: 13px;
  `,
  metaValue: css`
    overflow: hidden;
    color: ${token.colorText};
    text-overflow: ellipsis;
    white-space: nowrap;
  `,
  text: css`
    font-size: 13px;
    line-height: 1.7;
    color: ${token.colorText};
  `,
  track: css`
    overflow: hidden;

    width: 100%;
    height: 8px;
    border-radius: 999px;

    background: ${token.colorFillSecondary};
  `,
}));

/** Score zone → theme color token, mirroring the A/B/F grade bands. */
const confidenceColor = (ratio: number) => {
  if (ratio >= 0.8) return 'colorSuccess';
  if (ratio >= 0.6) return 'colorWarning';
  return 'colorError';
};

const methodKey = (type: VerifierType) =>
  `detail.method${type.charAt(0).toUpperCase()}${type.slice(1)}`;

const formatDuration = (started?: Date | string | null, completed?: Date | string | null) => {
  if (!started || !completed) return null;
  const ms = +new Date(completed) - +new Date(started);
  if (!Number.isFinite(ms) || ms <= 0) return null;
  return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${ms}ms`;
};

const Field = memo<{ children: ReactNode; label: string }>(({ label, children }) => {
  const { styles } = useStyles();
  return (
    <Flexbox>
      <div className={styles.label}>{label}</div>
      {children}
    </Flexbox>
  );
});

const Body = () => {
  const { styles, theme } = useStyles();
  const { t } = useTranslation('verify');
  const operationId = useChatStore(chatPortalSelectors.verifyResultOperationId);
  const checkItemId = useChatStore(chatPortalSelectors.verifyResultCheckItemId);
  const { data: state } = useVerifyState(operationId ?? null);
  const { data: results } = useVerifyResults(operationId ?? null);

  const item = (state?.verifyPlan ?? []).find((i) => i.id === checkItemId);
  const result = (results ?? []).find((r) => r.checkItemId === checkItemId);
  const { data: tracing } = useVerifierTracing(result?.verifierTracingId);
  // The criterion's original judging rule, so the panel shows what was checked,
  // not only the judgment outcome.
  const { data: instructionDoc } = useVerifyInstruction(item?.documentId);

  if (!item) return null;

  const colorOf = (key: string) => (theme as unknown as Record<string, string>)[key];

  const ratio = typeof result?.confidence === 'number' ? result.confidence : undefined;
  const duration = formatDuration(result?.startedAt, result?.completedAt);
  const instruction = instructionDoc?.content;
  const tokens =
    tracing && (tracing.inputTokens != null || tracing.outputTokens != null)
      ? (tracing.inputTokens ?? 0) + (tracing.outputTokens ?? 0)
      : undefined;

  const metaItems: { key: string; value: string }[] = [
    { key: t('detail.method'), value: t(methodKey(item.verifierType) as any) },
    result?.completedAt && {
      key: t('detail.checkedAt'),
      value: new Date(result.completedAt).toLocaleString(),
    },
    duration && { key: t('detail.duration'), value: duration },
    tracing?.model && {
      key: t('detail.model'),
      value: tracing.provider ? `${tracing.provider} / ${tracing.model}` : tracing.model,
    },
    tokens != null && { key: t('detail.tokens'), value: tokens.toLocaleString() },
  ].filter(Boolean) as { key: string; value: string }[];

  const sections: { key: string; value?: string | null }[] = [
    { key: 'reasoning', value: result?.toulmin?.reasoning },
    { key: 'evidence', value: result?.toulmin?.evidence },
    { key: 'counterEvidence', value: result?.toulmin?.counterEvidence },
    { key: 'limitation', value: result?.toulmin?.limitation },
    { key: 'suggestion', value: result?.suggestion },
  ].filter((s) => !!s.value);

  const canOpenTrace = item.verifierType === 'agent' && !!result?.verifierOperationId;

  const openTrace = async () => {
    if (!result?.verifierOperationId) return;
    const resolved = await verifyService.getVerifierThread(result.verifierOperationId);
    const threadId = resolved?.threadId;
    if (!threadId) return;
    const thread = (threadSelectors.currentTopicThreads(useChatStore.getState()) ?? []).find(
      (th) => th.id === threadId,
    );
    useChatStore.getState().openThreadInPortal(threadId, thread?.sourceMessageId);
  };

  return (
    <Flexbox
      gap={16}
      height={'100%'}
      paddingBlock={'4px 16px'}
      paddingInline={8}
      style={{ overflow: 'auto' }}
    >
      {ratio !== undefined && (
        <div className={styles.confidenceCard}>
          <Flexbox
            horizontal
            align={'baseline'}
            justify={'space-between'}
            style={{ marginBlockEnd: 8 }}
          >
            <span className={styles.label} style={{ marginBlockEnd: 0 }}>
              {t('detail.confidence')}
            </span>
            <span
              className={styles.confidenceValue}
              style={{ color: colorOf(confidenceColor(ratio)) }}
            >
              {Math.round(ratio * 100)}%
            </span>
          </Flexbox>
          <div className={styles.track}>
            <div
              className={styles.fill}
              style={{
                background: colorOf(confidenceColor(ratio)),
                width: `${Math.round(ratio * 100)}%`,
              }}
            />
          </div>
        </div>
      )}

      {metaItems.length > 0 && (
        <Flexbox gap={8}>
          {metaItems.map((m) => (
            <div className={styles.metaRow} key={m.key}>
              <span className={styles.metaKey}>{m.key}</span>
              <span className={styles.metaValue}>{m.value}</span>
            </div>
          ))}
        </Flexbox>
      )}

      {/* Original criteria — what this check verifies */}
      {item.description && (
        <Field label={t('detail.summary')}>
          <div className={styles.text}>{item.description}</div>
        </Field>
      )}

      {instruction && (
        <Field label={t('detail.instruction')}>
          <Markdown className={styles.text} variant={'chat'}>
            {instruction}
          </Markdown>
        </Field>
      )}

      {canOpenTrace && (
        <Button block icon={ListTree} onClick={openTrace}>
          {t('detail.openTrace')}
        </Button>
      )}

      {!result && <Text type={'secondary'}>{t('detail.pending')}</Text>}

      {/* Judgment outcome */}
      {sections.map((s) => (
        <Field key={s.key} label={t(`detail.${s.key}` as any)}>
          <Markdown className={styles.text} variant={'chat'}>
            {s.value!}
          </Markdown>
        </Field>
      ))}
    </Flexbox>
  );
};

export default Body;
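The panel's pure helpers decide what the confidence bar and metadata rows show. The sketch below reproduces `confidenceColor` and `formatDuration` as written, so their edge cases can be checked directly. It also adds `totalTokens`, a hypothetical name for the token-total expression that `Body` computes inline; `TracingLike` is a hypothetical name for its input.

```typescript
type ConfidenceToken = 'colorError' | 'colorSuccess' | 'colorWarning';

// Same bands as confidenceColor in the VerifyResult Body.
const confidenceColor = (ratio: number): ConfidenceToken => {
  if (ratio >= 0.8) return 'colorSuccess';
  if (ratio >= 0.6) return 'colorWarning';
  return 'colorError';
};

// Same logic as formatDuration: null for missing, invalid, or non-positive spans.
const formatDuration = (
  started?: Date | string | null,
  completed?: Date | string | null,
): string | null => {
  if (!started || !completed) return null;
  const ms = +new Date(completed) - +new Date(started);
  if (!Number.isFinite(ms) || ms <= 0) return null;
  return ms >= 1000 ? `${(ms / 1000).toFixed(1)}s` : `${ms}ms`;
};

interface TracingLike {
  inputTokens?: number | null;
  outputTokens?: number | null;
}

// The Body's inline token total: undefined only when both counts are missing.
const totalTokens = (tracing?: TracingLike | null): number | undefined => {
  if (!tracing) return undefined;
  if (tracing.inputTokens == null && tracing.outputTokens == null) return undefined;
  return (tracing.inputTokens ?? 0) + (tracing.outputTokens ?? 0);
};

console.log(confidenceColor(0.72)); // colorWarning
console.log(formatDuration('2024-01-01T00:00:00.000Z', '2024-01-01T00:00:01.500Z')); // 1.5s
console.log(totalTokens({ inputTokens: 120, outputTokens: null })); // 120
```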
@@ -0,0 +1,82 @@
import { Flexbox, Icon, Text } from '@lobehub/ui';
import { createStyles } from 'antd-style';
import { CheckCircle2, Circle, CircleAlert, LoaderCircle, XCircle } from 'lucide-react';

import type { VerifyCheckResultItem } from '@/database/schemas/verify';
import { useVerifyResults, useVerifyState } from '@/features/Verify/hooks';
import { useChatStore } from '@/store/chat';
import { chatPortalSelectors } from '@/store/chat/selectors';
import { oneLineEllipsis } from '@/styles';

const useStyles = createStyles(({ css }) => ({
  badge: css`
    display: inline-flex;
    flex: none;
    gap: 4px;
    align-items: center;

    padding-block: 2px;
    padding-inline: 8px;
    border-radius: 999px;

    font-size: 12px;
    font-weight: 600;
  `,
}));

const statusMeta = (status: VerifyCheckResultItem['status'] | undefined) => {
  switch (status) {
    case 'passed': {
      return { bg: 'colorSuccess', icon: CheckCircle2, text: 'colorSuccessTextActive' } as const;
    }
    case 'running': {
      return { bg: 'colorInfo', icon: LoaderCircle, text: 'colorInfoTextActive' } as const;
    }
    case 'failed': {
      return { bg: 'colorError', icon: XCircle, text: 'colorErrorTextActive' } as const;
    }
    case 'skipped': {
      return { bg: 'colorTextQuaternary', icon: CircleAlert, text: 'colorTextSecondary' } as const;
    }
    default: {
      return { bg: 'colorTextQuaternary', icon: Circle, text: 'colorTextSecondary' } as const;
    }
  }
};

const Title = () => {
  const { styles, theme } = useStyles();
  const operationId = useChatStore(chatPortalSelectors.verifyResultOperationId);
  const checkItemId = useChatStore(chatPortalSelectors.verifyResultCheckItemId);
  const { data: state } = useVerifyState(operationId ?? null);
  const { data: results } = useVerifyResults(operationId ?? null);

  const item = (state?.verifyPlan ?? []).find((i) => i.id === checkItemId);
  const result = (results ?? []).find((r) => r.checkItemId === checkItemId);

  const sIcon = statusMeta(result?.status);
  const colorOf = (key: string) => (theme as unknown as Record<string, string>)[key];
  const label = result?.verdict ?? result?.status;

  return (
    <Flexbox horizontal align={'center'} gap={8} style={{ minWidth: 0 }}>
      <Text className={oneLineEllipsis} style={{ fontSize: 16 }} type={'secondary'}>
        {item?.title}
      </Text>
      {label && (
        <span
          className={styles.badge}
          style={{
            background: `color-mix(in srgb, ${colorOf(sIcon.bg)} 12%, transparent)`,
            color: colorOf(sIcon.text),
          }}
        >
          <Icon icon={sIcon.icon} size={13} spin={result?.status === 'running'} />
          {label}
        </span>
      )}
    </Flexbox>
  );
};

export default Title;
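`statusMeta` maps a check result's status to a background token, an icon, and a text token. Missing or unknown statuses fall back to a neutral style. The badge then tints its background with `color-mix()` at 12%. The same mapping can also be written table-driven, which some readers find easier to scan. This is an alternative style, not what the PR ships.

The sketch makes two assumptions. First, the status union is a guess: the real `VerifyCheckResultItem['status']` may include more values, which is why the lookup keeps a fallback. Second, icons are plain strings so the sketch runs without `lucide-react`.

```typescript
// Assumed subset of VerifyCheckResultItem['status'].
type CheckStatus = 'failed' | 'passed' | 'running' | 'skipped';

interface StatusStyle {
  bg: string;
  icon: string;
  text: string;
}

const STATUS_STYLES: Record<CheckStatus, StatusStyle> = {
  failed: { bg: 'colorError', icon: 'XCircle', text: 'colorErrorTextActive' },
  passed: { bg: 'colorSuccess', icon: 'CheckCircle2', text: 'colorSuccessTextActive' },
  running: { bg: 'colorInfo', icon: 'LoaderCircle', text: 'colorInfoTextActive' },
  skipped: { bg: 'colorTextQuaternary', icon: 'CircleAlert', text: 'colorTextSecondary' },
};

// Matches the switch's default branch (pending or unknown status).
const FALLBACK_STYLE: StatusStyle = {
  bg: 'colorTextQuaternary',
  icon: 'Circle',
  text: 'colorTextSecondary',
};

// Own-property check so inherited keys such as 'toString' hit the fallback.
const statusStyle = (status: string | undefined): StatusStyle =>
  status && Object.prototype.hasOwnProperty.call(STATUS_STYLES, status)
    ? STATUS_STYLES[status as CheckStatus]
    : FALLBACK_STYLE;

// The badge background: the status color mixed 12% into transparent.
const badgeBackground = (color: string): string =>
  `color-mix(in srgb, ${color} 12%, transparent)`;

console.log(statusStyle('failed').icon); // XCircle
console.log(badgeBackground('#ff4d4f')); // color-mix(in srgb, #ff4d4f 12%, transparent)
```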
@@ -0,0 +1,8 @@
import { type PortalImpl } from '../type';
import Body from './Body';
import Title from './Title';

export const VerifyResult: PortalImpl = {
  Body,
  Title,
};
@@ -18,6 +18,7 @@ import { Notebook } from './Notebook';
import { Plugins } from './Plugins';
import { Thread } from './Thread';
import { type PortalImpl } from './type';
import { VerifyResult } from './VerifyResult';

// View type to component mapping
const VIEW_COMPONENTS: Record<PortalViewType, PortalImpl> = {
@@ -34,6 +35,7 @@ const VIEW_COMPONENTS: Record<PortalViewType, PortalImpl> = {
  [PortalViewType.ToolUI]: Plugins,
  [PortalViewType.Thread]: Thread,
  [PortalViewType.GroupThread]: GroupThread,
  [PortalViewType.VerifyResult]: VerifyResult,
};

// Default Home component
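`VIEW_COMPONENTS` is typed `Record<PortalViewType, PortalImpl>`, so adding `PortalViewType.VerifyResult` without a matching entry would be a type error. Note also that `Plugins` now supplies its own `Header`, while `VerifyResult` supplies only `Body` and `Title`. The sketch below illustrates both points with hypothetical stand-ins:

- a `const`-object view enum whose string values are invented;
- string-returning components;
- a `renderHeader` helper that *assumes* the portal shell prefers a custom `Header` and otherwise wraps `Title` in default chrome. The real shell's behavior is not shown in this diff.

```typescript
// Hypothetical view ids; the real PortalViewType values are not shown here.
const ViewType = {
  Thread: 'thread',
  ToolUI: 'toolUI',
  VerifyResult: 'verifyResult',
} as const;
type ViewType = (typeof ViewType)[keyof typeof ViewType];

interface PortalImplLike {
  Body: () => string;
  Header?: () => string;
  Title?: () => string;
}

// Omitting any ViewType key from this object is a compile-time error.
const VIEWS: Record<ViewType, PortalImplLike> = {
  [ViewType.Thread]: { Body: () => 'thread-body', Title: () => 'thread-title' },
  [ViewType.ToolUI]: { Body: () => 'tool-body', Header: () => 'tool-header' },
  [ViewType.VerifyResult]: { Body: () => 'verify-body', Title: () => 'verify-title' },
};

// Assumption: a custom Header replaces the whole header; otherwise the Title
// is placed inside default back/close chrome.
const renderHeader = (view: ViewType): string => {
  const impl = VIEWS[view];
  if (impl.Header) return impl.Header();
  return `chrome(${impl.Title ? impl.Title() : ''})`;
};

console.log(renderHeader(ViewType.VerifyResult)); // chrome(verify-title)
console.log(renderHeader(ViewType.ToolUI)); // tool-header
```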
@@ -0,0 +1,393 @@
import type { VerifyCheckItem } from '@lobechat/types';
import { ActionIcon, Button, Flexbox, Icon, Input, TextArea } from '@lobehub/ui';
import { createStyles } from 'antd-style';
import {
  Check,
  CheckCircle2,
  ChevronDown,
  ChevronRight,
  ChevronUp,
  Circle,
  CircleAlert,
  LoaderCircle,
  Plus,
  ShieldAlert,
  ShieldCheck,
  Trash2,
  XCircle,
} from 'lucide-react';
import { memo, useState } from 'react';
import { useTranslation } from 'react-i18next';

import RingLoadingIcon from '@/components/RingLoading';
import type { VerifyCheckResultItem } from '@/database/schemas/verify';
import { verifyService } from '@/services/verify';
import { useChatStore } from '@/store/chat';

import { useVerifyResults, useVerifyState } from './hooks';
import { countResults, phaseFromStatus } from './utils';

const useStyles = createStyles(({ css, token }) => ({
  actions: css`
    margin-block-start: 12px;
    border-block-start: 1px solid ${token.colorBorderSecondary};
  `,
  body: css`
    padding-block: 0 12px;
    padding-inline: 12px;
    border-block-start: 1px solid ${token.colorBorderSecondary};
  `,
  /* In the merged verify card the RunResult header already draws the divider —
     drop our own top border so they don't stack into a 2px line. */
  bodyEmbedded: css`
    border-block-start: none;
  `,
  checkRow: css`
    display: grid;
    grid-template-columns: 20px minmax(0, 1fr) auto;
    gap: 8px;
    align-items: start;

    padding-block: 12px;

    &:not(:last-child) {
      border-block-end: 1px solid ${token.colorBorderSecondary};
    }
  `,
  chevron: css`
    flex: none;
    color: ${token.colorTextQuaternary};
  `,
  clickable: css`
    cursor: pointer;
    transition: background 150ms ${token.motionEaseOut};

    &:hover {
      background: ${token.colorFillQuaternary};
    }
  `,
  desc: css`
    margin-block-start: 3px;
    font-size: 12px;
    line-height: 1.45;
    color: ${token.colorTextTertiary};
  `,
  dock: css`
    overflow: hidden;
    border: 1px solid ${token.colorBorder};
    border-radius: 16px;
    background: ${token.colorBgElevated};
  `,
  head: css`
    cursor: pointer;

    display: flex;
    gap: 12px;
    align-items: center;
    justify-content: space-between;

    padding-block: 11px;
    padding-inline: 12px;
  `,
  inputPanel: css`
    margin-block-start: 10px;
    padding: 10px;
    border: 1px solid ${token.colorBorderSecondary};
    border-radius: 12px;

    background: ${token.colorFillQuaternary};
  `,
  sub: css`
    overflow: hidden;

    font-size: 12px;
    color: ${token.colorTextTertiary};
    text-overflow: ellipsis;
    white-space: nowrap;
  `,
  title: css`
    font-size: 13px;
    font-weight: 700;
    color: ${token.colorText};
  `,
}));

const statusIcon = (status: VerifyCheckResultItem['status'] | undefined) => {
  switch (status) {
    case 'passed': {
      return { color: 'colorSuccess', icon: CheckCircle2, spin: false } as const;
    }
    case 'running': {
      return { color: 'colorInfo', icon: LoaderCircle, spin: true } as const;
    }
    case 'failed': {
      return { color: 'colorError', icon: XCircle, spin: false } as const;
    }
    case 'skipped': {
      return { color: 'colorTextQuaternary', icon: CircleAlert, spin: false } as const;
    }
    default: {
      return { color: 'colorTextQuaternary', icon: Circle, spin: false } as const;
    }
  }
};

interface CheckerDockProps {
  /** Render only the checker body (items + actions), no dock chrome / header — for the merged verify card. */
  embedded?: boolean;
  operationId: string;
}

/**
 * The delivery checker dock — replaces the chat composer during a run. Mirrors
 * the reference mock: a collapsible card driving the plan state machine
 * (draft → verifying → failed/repairing → passed) with confirm / edit / skip
 * and a failure feedback panel.
 */
const CheckerDock = memo<CheckerDockProps>(({ operationId, embedded }) => {
  const { styles, cx, theme } = useStyles();
  const { t } = useTranslation('verify');
  const { data: state, mutate: mutateState } = useVerifyState(operationId);
  const { data: results, mutate: mutateResults } = useVerifyResults(operationId);
  const openVerifyResult = useChatStore((s) => s.openVerifyResult);

  const [expanded, setExpanded] = useState(true);
  const [editing, setEditing] = useState(false);
  const [draftItems, setDraftItems] = useState<VerifyCheckItem[]>([]);
  const [inputText, setInputText] = useState('');
  const [busy, setBusy] = useState(false);

  const plan = state?.verifyPlan ?? [];
  const phase = phaseFromStatus(state?.verifyStatus);
  const counts = countResults(results ?? []);
  const resultByItem = new Map((results ?? []).map((r) => [r.checkItemId, r]));

  if (phase === 'idle' || plan.length === 0) return null;

  const colorOf = (key: string) => (theme as unknown as Record<string, string>)[key];

  const run = async (fn: () => Promise<unknown>) => {
    setBusy(true);
    try {
      await fn();
      await Promise.all([mutateState(), mutateResults()]);
    } finally {
      setBusy(false);
    }
  };

  const onConfirm = () => run(() => verifyService.confirmPlan(operationId));
  const onSkip = () => run(() => verifyService.skipPlan(operationId));
  const onForceDeliver = () => run(() => verifyService.skipPlan(operationId));

  const startEdit = () => {
    setDraftItems(plan.map((i) => ({ ...i })));
    setEditing(true);
    setExpanded(true);
  };
  const saveEdit = () =>
    run(async () => {
      await verifyService.updateDraftItems(
        operationId,
        draftItems.map((item, index) => ({ ...item, index })),
      );
      setEditing(false);
    });

  const subText = (() => {
    const map: Record<string, string> = {
      draft: t('status.draft', { total: plan.length }),
      failed: t('status.failed'),
      passed: t('status.passed', { passed: counts.passed, total: counts.total }),
      repairing: t('status.repairing'),
      verifying: t('status.checking', { passed: counts.passed, total: counts.total }),
    };
    return map[phase] ?? t('status.idle');
  })();

  const renderCheckRow = (item: VerifyCheckItem) => {
    const result = resultByItem.get(item.id);
    const sIcon = statusIcon(result?.status);
    const evidence = result?.toulmin?.reasoning || result?.suggestion;
    return (
      <div
        className={cx(styles.checkRow, styles.clickable)}
        key={item.id}
        onClick={() => openVerifyResult(operationId, item.id)}
      >
        {result?.status === 'running' ? (
          <RingLoadingIcon
            size={16}
            style={{ color: theme.colorWarning }}
            ringColor={
              theme.isDarkMode
                ? theme.colorWarningBorder
                : `color-mix(in srgb, ${theme.colorWarning} 45%, transparent)`
            }
          />
        ) : (
          <Icon color={colorOf(sIcon.color)} icon={sIcon.icon} size={16} spin={sIcon.spin} />
        )}
        <Flexbox style={{ minWidth: 0 }}>
          <span className={styles.title} style={{ fontWeight: 600 }}>
            {item.title}
          </span>
          {evidence && <span className={styles.desc}>{evidence}</span>}
        </Flexbox>
        <Icon
          className={styles.chevron}
          icon={ChevronRight}
          size={16}
          style={{ marginBlockStart: 2 }}
        />
      </div>
    );
  };

  const renderEditor = () => (
    <Flexbox gap={8}>
      {draftItems.map((item, index) => (
        <Flexbox horizontal align="center" gap={7} key={item.id}>
          <Input
            placeholder={t('editor.placeholder')}
            value={item.title}
            onChange={(e) => {
              const next = [...draftItems];
              next[index] = { ...item, title: e.target.value };
              setDraftItems(next);
            }}
          />
          <ActionIcon
            icon={Trash2}
            size="small"
            onClick={() => setDraftItems(draftItems.filter((_, i) => i !== index))}
          />
        </Flexbox>
      ))}
      <Button
        block
        icon={Plus}
        size="small"
        type="dashed"
        onClick={() =>
          setDraftItems([
            ...draftItems,
            {
              id: `tmp-${draftItems.length}-${Date.now()}`,
              index: draftItems.length,
              onFail: 'manual',
              required: false,
              sourceCriterionId: null,
              sourceRubricId: null,
              title: '',
              verifierConfig: {},
              verifierType: 'llm',
            },
          ])
        }
      >
        {t('editor.add')}
      </Button>
      <Flexbox horizontal gap={8}>
        <Button loading={busy} size="small" type="primary" onClick={saveEdit}>
          {t('editor.save')}
        </Button>
        <Button size="small" onClick={() => setEditing(false)}>
          {t('editor.cancel')}
        </Button>
      </Flexbox>
    </Flexbox>
  );

  const renderActions = () => {
    if (phase === 'draft')
      return (
        <Flexbox horizontal gap={8} style={{ flexWrap: 'wrap', marginTop: 12 }}>
          <Button loading={busy} size="small" type="primary" onClick={onConfirm}>
            {t('dock.confirm')}
          </Button>
          <Button size="small" onClick={startEdit}>
            {t('dock.edit')}
          </Button>
          <Button size="small" onClick={onSkip}>
            {t('dock.skip')}
          </Button>
        </Flexbox>
      );
    if (phase === 'failed')
      return (
        <>
          <div className={styles.inputPanel}>
            <div style={{ fontSize: 12, fontWeight: 600, marginBottom: 6 }}>{t('input.label')}</div>
            <TextArea
              placeholder={t('input.placeholder')}
              rows={2}
              value={inputText}
              onChange={(e) => setInputText(e.target.value)}
            />
            <div className={styles.desc} style={{ marginTop: 6 }}>
              {t('input.hint')}
            </div>
          </div>
          <Flexbox horizontal gap={8} style={{ flexWrap: 'wrap', marginTop: 12 }}>
            <Button size="small" onClick={startEdit}>
              {t('dock.edit')}
            </Button>
            <Button danger loading={busy} size="small" onClick={onForceDeliver}>
              {t('dock.forceDeliver')}
            </Button>
          </Flexbox>
|
||||
</>
|
||||
);
|
||||
if (phase === 'repairing')
|
||||
return (
|
||||
<div className={styles.inputPanel} style={{ marginTop: 10 }}>
|
||||
<div className={styles.desc}>{t('dock.repairHint')}</div>
|
||||
</div>
|
||||
);
|
||||
return null;
|
||||
};
|
||||
|
||||
const body = (
|
||||
<div className={cx(styles.body, embedded && styles.bodyEmbedded)}>
|
||||
{editing ? (
|
||||
renderEditor()
|
||||
) : (
|
||||
<>
|
||||
<Flexbox gap={0}>{plan.map((item) => renderCheckRow(item))}</Flexbox>
|
||||
{(() => {
|
||||
const actions = renderActions();
|
||||
return actions ? <div className={styles.actions}>{actions}</div> : null;
|
||||
})()}
|
||||
</>
|
||||
)}
|
||||
</div>
|
||||
);
|
||||
|
||||
// Merged verify card: just the checker body (items + actions), no dock chrome.
|
||||
if (embedded) return body;
|
||||
|
||||
const headIcon = phase === 'passed' ? Check : phase === 'failed' ? ShieldAlert : ShieldCheck;
|
||||
|
||||
return (
|
||||
<div className={styles.dock}>
|
||||
<div className={styles.head} onClick={() => setExpanded((v) => !v)}>
|
||||
<Flexbox horizontal align="center" gap={10} style={{ minWidth: 0 }}>
|
||||
<Icon icon={headIcon} size={18} />
|
||||
<Flexbox style={{ minWidth: 0 }}>
|
||||
<Flexbox horizontal align="center" gap={8}>
|
||||
<span className={styles.title}>{t('dock.title')}</span>
|
||||
</Flexbox>
|
||||
<span className={styles.sub}>{subText}</span>
|
||||
</Flexbox>
|
||||
</Flexbox>
|
||||
<Icon color={theme.colorTextTertiary} icon={expanded ? ChevronDown : ChevronUp} size={16} />
|
||||
</div>
|
||||
{expanded && body}
|
||||
</div>
|
||||
);
|
||||
});
|
||||
|
||||
CheckerDock.displayName = 'CheckerDock';
|
||||
|
||||
export default CheckerDock;
|
||||
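The draft editor in `CheckerDock` builds new items inline and deletes them with `filter`, which leaves each remaining item's `index` unchanged. A minimal sketch of the same three operations as pure helpers, assuming `index` is meant to track display order (whether `saveEdit` renumbers items is not shown in this diff). `DraftCheckItem`, `createDraftItem`, `addDraftItem`, `removeDraftItem` and `renameDraftItem` are hypothetical names, not part of the module.

```typescript
// Hypothetical local mirror of the fields the editor sets on a new item;
// the real type is `VerifyCheckItem` from `@lobechat/types`.
interface DraftCheckItem {
  id: string;
  index: number;
  onFail: 'manual' | 'auto_repair';
  required: boolean;
  sourceCriterionId: string | null;
  sourceRubricId: string | null;
  title: string;
  verifierConfig: Record<string, unknown>;
  verifierType: 'program' | 'agent' | 'llm';
}

// Same defaults as the inline literal in the "Add check" button handler.
export const createDraftItem = (items: DraftCheckItem[], now = Date.now()): DraftCheckItem => ({
  id: `tmp-${items.length}-${now}`,
  index: items.length,
  onFail: 'manual',
  required: false,
  sourceCriterionId: null,
  sourceRubricId: null,
  title: '',
  verifierConfig: {},
  verifierType: 'llm',
});

// Append a blank item without mutating the input array.
export const addDraftItem = (items: DraftCheckItem[], now?: number): DraftCheckItem[] => [
  ...items,
  createDraftItem(items, now),
];

// Remove by position, then renumber `index` so it stays 0..n-1
// (assumption: `index` should mirror display order).
export const removeDraftItem = (items: DraftCheckItem[], position: number): DraftCheckItem[] =>
  items.filter((_, i) => i !== position).map((item, i) => ({ ...item, index: i }));

// Replace the title at `position`, copying only the edited item.
export const renameDraftItem = (
  items: DraftCheckItem[],
  position: number,
  title: string,
): DraftCheckItem[] => items.map((item, i) => (i === position ? { ...item, title } : item));
```

Keeping these as pure functions lets the `onClick`/`onChange` handlers shrink to `setDraftItems(removeDraftItem(draftItems, index))` and similar, and makes the renumbering rule testable without rendering the dock.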
@@ -0,0 +1,208 @@
import { Flexbox, Icon } from '@lobehub/ui';
import { createStyles } from 'antd-style';
import { Check, Info, RefreshCw, Shield, ShieldCheck, X } from 'lucide-react';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';

import { useVerifyResults, useVerifyState } from './hooks';
import { countResults, type DockPhase, phaseFromStatus } from './utils';

const useStyles = createStyles(({ css, token }) => ({
  badge: css`
    display: inline-flex;
    flex: none;
    gap: 5px;
    align-items: center;

    padding-block: 4px;
    padding-inline: 10px;
    border-radius: 999px;

    font-size: 12px;
    font-weight: 600;
  `,
  body: css`
    padding-block: 12px;
    padding-inline: 16px;

    font-size: 13px;
    line-height: 1.7;
    color: ${token.colorTextSecondary};
  `,
  card: css`
    overflow: hidden;
    border: 1px solid ${token.colorBorderSecondary};
    border-radius: 16px;
    background: ${token.colorBgContainer};
  `,
  cardFailed: css`
    border-color: ${token.colorErrorBorder};
  `,
  foot: css`
    display: flex;
    gap: 8px;
    align-items: center;

    padding-block: 10px;
    padding-inline: 16px;
    border-block-start: 1px solid ${token.colorBorderSecondary};

    font-size: 12px;
    color: ${token.colorTextTertiary};
  `,
  head: css`
    display: flex;
    gap: 14px;
    align-items: flex-start;
    justify-content: space-between;

    padding-block: 14px;
    padding-inline: 16px;
    border-block-end: 1px solid ${token.colorBorderSecondary};
  `,
  sub: css`
    margin-block-start: 4px;
    font-size: 12px;
    line-height: 1.5;
    color: ${token.colorTextTertiary};
  `,
  title: css`
    font-size: 15px;
    font-weight: 700;
    color: ${token.colorText};
  `,
}));

interface BadgeMeta {
  color: 'default' | 'success' | 'error' | 'warning';
  icon: typeof Check;
  key: 'pending' | 'failed' | 'repairing' | 'passed';
}

const phaseToResult: Record<DockPhase, { badge: BadgeMeta; subKey: string } | null> = {
  draft: {
    badge: { color: 'default', icon: Shield, key: 'pending' },
    subKey: 'result.pending.sub',
  },
  failed: {
    badge: { color: 'error', icon: X, key: 'failed' },
    subKey: 'result.failed.sub',
  },
  idle: {
    badge: { color: 'default', icon: Shield, key: 'pending' },
    subKey: 'result.pending.sub',
  },
  passed: {
    badge: { color: 'success', icon: Check, key: 'passed' },
    subKey: 'result.passed.sub',
  },
  repairing: {
    badge: { color: 'warning', icon: RefreshCw, key: 'repairing' },
    subKey: 'result.repairing.sub',
  },
  verifying: {
    badge: { color: 'default', icon: Shield, key: 'pending' },
    subKey: 'result.pending.sub',
  },
};

interface RunResultProps {
  /** Render only the header (title, summary line and status badge) without card chrome; used by the merged verify card. */
  embedded?: boolean;
  operationId: string;
  /** Display round number (1-based); repair rounds are separate operations. */
  round?: number;
}

/**
 * Inline snapshot card for one Agent Run's verify outcome, rendered in the chat
 * thread below the assistant message group. Each round (operation) keeps its own
 * result snapshot, so a failed round is never overwritten by a later successful one.
 */
const RunResult = memo<RunResultProps>(({ operationId, round = 1, embedded }) => {
  const { styles, cx, theme } = useStyles();
  const { t } = useTranslation('verify');
  const { data: state } = useVerifyState(operationId);
  const { data: results } = useVerifyResults(operationId);

  const phase = phaseFromStatus(state?.verifyStatus);
  const meta = phaseToResult[phase];
  if (!state?.verifyPlan?.length || !meta) return null;

  const counts = countResults(results ?? []);
  const badgeColorMap = {
    default: theme.colorTextTertiary,
    error: theme.colorError,
    success: theme.colorSuccess,
    warning: theme.colorWarning,
  } as const;
  // Deeper, more readable text color over the tinted badge fill.
  const badgeTextMap = {
    default: theme.colorTextSecondary,
    error: theme.colorErrorTextActive,
    success: theme.colorSuccessTextActive,
    warning: theme.colorWarningTextActive,
  } as const;

  const header = (
    <div className={styles.head}>
      <Flexbox>
        <Flexbox horizontal align="center" gap={7}>
          <Icon icon={ShieldCheck} size={16} />
          <span className={styles.title}>{t('result.title', { round })}</span>
        </Flexbox>
        <div className={styles.sub}>
          {t(meta.subKey as any, { passed: counts.passed, total: counts.total } as any)}
        </div>
      </Flexbox>
      <span
        className={styles.badge}
        style={{
          background: `color-mix(in srgb, ${badgeColorMap[meta.badge.color]} 12%, transparent)`,
          color: badgeTextMap[meta.badge.color],
        }}
      >
        <Icon icon={meta.badge.icon} size={14} />
        {t(`badge.${meta.badge.key}` as any)}
      </span>
    </div>
  );

  // Merged verify card: header only, no card chrome / summary list / footer.
  if (embedded) return header;

  return (
    <div className={cx(styles.card, phase === 'failed' && styles.cardFailed)}>
      {header}
      <div className={styles.body}>
        <Flexbox gap={4}>
          {(state.verifyPlan ?? []).map((item) => {
            const result = (results ?? []).find((r) => r.checkItemId === item.id);
            return (
              <Flexbox horizontal align="center" gap={8} key={item.id}>
                <span>{item.title}</span>
                {result?.verdict && (
                  <span
                    style={{
                      color: badgeColorMap[result.verdict === 'passed' ? 'success' : 'error'],
                    }}
                  >
                    · {result.verdict}
                  </span>
                )}
              </Flexbox>
            );
          })}
        </Flexbox>
      </div>
      <div className={styles.foot}>
        <Icon icon={Info} size={14} />
        <span>{t('result.foot')}</span>
      </div>
    </div>
  );
});

RunResult.displayName = 'RunResult';

export default RunResult;
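Inside `RunResult`, the header is a pure function of the dock phase and the results list: the phase picks the badge and summary keys from `phaseToResult`, and the results supply the `passed`/`total` interpolation params. A sketch of that derivation pulled out of the component so it can be unit-tested on its own; `describeRunResult`, `badgeFill` and `ResultLike` are hypothetical names, and the union types are redeclared locally so the block stands alone.

```typescript
// Local copies of the unions from `utils.ts` / `BadgeMeta`, for self-containment.
type DockPhase = 'idle' | 'draft' | 'verifying' | 'failed' | 'repairing' | 'passed';
type BadgeKey = 'pending' | 'failed' | 'repairing' | 'passed';

// Assumed minimal shape of a check result; the real row type is VerifyCheckResultItem.
interface ResultLike {
  status: string;
  verdict: string | null;
}

// Same phase → badge mapping as `phaseToResult`: draft, idle and verifying all read "pending".
const badgeKeyByPhase: Record<DockPhase, BadgeKey> = {
  draft: 'pending',
  failed: 'failed',
  idle: 'pending',
  passed: 'passed',
  repairing: 'repairing',
  verifying: 'pending',
};

// Hypothetical helper: the i18n keys and params the header hands to `t()`.
export const describeRunResult = (phase: DockPhase, results: ResultLike[], round = 1) => {
  const badge = badgeKeyByPhase[phase];
  return {
    badgeKey: `badge.${badge}`,
    // In `phaseToResult` the summary key follows the badge key one-to-one.
    subKey: `result.${badge}.sub`,
    subParams: {
      // Same "status or verdict" rule as `countResults`.
      passed: results.filter((r) => r.status === 'passed' || r.verdict === 'passed').length,
      total: results.length,
    },
    titleParams: { round },
  };
};

// Badge fill: the status color at 12% over transparent, matching the inline style.
export const badgeFill = (color: string): string =>
  `color-mix(in srgb, ${color} 12%, transparent)`;
```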
@@ -0,0 +1,39 @@
import { useClientDataSWR } from '@/libs/swr';
import { documentService } from '@/services/document';
import { verifyService } from '@/services/verify';

export const VERIFY_STATE_KEY = 'verify-state';
export const VERIFY_RESULTS_KEY = 'verify-results';
export const VERIFY_TRACING_KEY = 'verify-tracing';
export const VERIFY_INSTRUCTION_KEY = 'verify-instruction';
export const VERIFY_RUBRIC_KEY = 'verify-rubric';

/** Plan + rollup status for one Agent Run. Pass null operationId to skip. */
export const useVerifyState = (operationId: string | null) =>
  useClientDataSWR(operationId ? [VERIFY_STATE_KEY, operationId] : null, () =>
    verifyService.getVerifyState(operationId!),
  );

/** Per-item check results for one Agent Run. Pass null operationId to skip. */
export const useVerifyResults = (operationId: string | null) =>
  useClientDataSWR(operationId ? [VERIFY_RESULTS_KEY, operationId] : null, () =>
    verifyService.listResults(operationId!),
  );

/** Model / token / latency for an LLM verifier judgment. Pass null to skip. */
export const useVerifierTracing = (tracingId: string | null | undefined) =>
  useClientDataSWR(tracingId ? [VERIFY_TRACING_KEY, tracingId] : null, () =>
    verifyService.getVerifierTracing(tracingId!),
  );

/** The criterion's original judging rule, stored in its instruction document. */
export const useVerifyInstruction = (documentId: string | null | undefined) =>
  useClientDataSWR(documentId ? [VERIFY_INSTRUCTION_KEY, documentId] : null, () =>
    documentService.getDocumentById(documentId!),
  );

/** A rubric and its run-policy config (e.g. maxRepairRounds). Pass null to skip. */
export const useRubric = (rubricId: string | null | undefined) =>
  useClientDataSWR(rubricId ? [VERIFY_RUBRIC_KEY, rubricId] : null, () =>
    verifyService.getRubric(rubricId!),
  );
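Each hook passes a `[prefix, id]` tuple key, or `null` when the id is missing. SWR treats a `null` key as "do not fetch" (its documented conditional-fetching behavior), which is what makes the `operationId!` non-null assertions inside the fetchers safe. A small sketch of that contract without SWR; `verifyKey` and `fetchIfKeyed` are hypothetical helpers, and `fetchIfKeyed` only stands in for SWR's skip rule, not its caching.

```typescript
// Hypothetical helper: the tuple key the hooks build inline, or null to skip.
export const verifyKey = <P extends string>(
  prefix: P,
  id: string | null | undefined,
): readonly [P, string] | null => (id ? ([prefix, id] as const) : null);

// Stand-in for SWR's rule: with a null key the fetcher is never called,
// so the fetcher may assume the id is present.
export const fetchIfKeyed = async <T>(
  key: readonly [string, string] | null,
  fetcher: (id: string) => Promise<T>,
): Promise<T | undefined> => (key ? fetcher(key[1]) : undefined);
```

Note that any falsy id, including an empty string, produces a `null` key, matching the `operationId ? … : null` checks in the hooks.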
@@ -0,0 +1,4 @@
export { default as CheckerDock } from './CheckerDock';
export { useVerifyResults, useVerifyState } from './hooks';
export { default as RunResult } from './RunResult';
export { countResults, isDraftUnconfirmed, phaseFromStatus } from './utils';
@@ -0,0 +1,43 @@
import { describe, expect, it } from 'vitest';

import { countResults, isDraftUnconfirmed, itemBehavior, phaseFromStatus } from './utils';

describe('phaseFromStatus', () => {
  it('maps rollup statuses to dock phases', () => {
    expect(phaseFromStatus('planned')).toBe('draft');
    expect(phaseFromStatus('verifying')).toBe('verifying');
    expect(phaseFromStatus('failed')).toBe('failed');
    expect(phaseFromStatus('repairing')).toBe('repairing');
    expect(phaseFromStatus('passed')).toBe('passed');
    expect(phaseFromStatus('delivered')).toBe('passed');
    expect(phaseFromStatus(null)).toBe('idle');
    expect(phaseFromStatus('unverified')).toBe('idle');
  });
});

describe('isDraftUnconfirmed', () => {
  it('is true only for a planned, not-yet-confirmed plan', () => {
    expect(isDraftUnconfirmed('planned', null)).toBe(true);
    expect(isDraftUnconfirmed('planned', new Date())).toBe(false);
    expect(isDraftUnconfirmed('verifying', null)).toBe(false);
  });
});

describe('itemBehavior', () => {
  it('maps required → gate, optional → auto_improve', () => {
    expect(itemBehavior({ required: true })).toBe('gate');
    expect(itemBehavior({ required: false })).toBe('auto_improve');
  });
});

describe('countResults', () => {
  it('counts passed/failed by status or verdict', () => {
    expect(
      countResults([
        { status: 'passed', verdict: 'passed' } as any,
        { status: 'failed', verdict: 'failed' } as any,
        { status: 'skipped', verdict: null } as any,
      ]),
    ).toEqual({ failed: 1, passed: 1, total: 3 });
  });
});
@@ -0,0 +1,94 @@
import type { VerifyCheckItem } from '@lobechat/types';

import type { VerifyStatus } from '@/database/models/agentOperation';
import type { VerifyCheckResultItem } from '@/database/schemas/verify';

export type DockPhase = 'idle' | 'draft' | 'verifying' | 'failed' | 'repairing' | 'passed';

/** Map the persisted rollup status to the dock's phase state machine. */
export const phaseFromStatus = (status: VerifyStatus | null | undefined): DockPhase => {
  switch (status) {
    case 'planned': {
      return 'draft';
    }
    case 'verifying': {
      return 'verifying';
    }
    case 'failed': {
      return 'failed';
    }
    case 'repairing': {
      return 'repairing';
    }
    case 'passed':
    case 'delivered': {
      return 'passed';
    }
    default: {
      return 'idle';
    }
  }
};

/** Whether a draft plan exists but hasn't been confirmed yet. */
export const isDraftUnconfirmed = (
  status: VerifyStatus | null | undefined,
  confirmedAt: Date | null | undefined,
): boolean => status === 'planned' && !confirmedAt;

/** Display behavior of a check item, mirroring the mock's gate / auto_improve. */
export const itemBehavior = (item: Pick<VerifyCheckItem, 'required'>): 'gate' | 'auto_improve' =>
  item.required ? 'gate' : 'auto_improve';

export interface CheckCounts {
  failed: number;
  passed: number;
  total: number;
}

export const countResults = (results: VerifyCheckResultItem[] = []): CheckCounts => ({
  failed: results.filter((r) => r.status === 'failed' || r.verdict === 'failed').length,
  passed: results.filter((r) => r.status === 'passed' || r.verdict === 'passed').length,
  total: results.length,
});

/** The subset of theme color tokens the verify card tint reads. */
export interface VerifyTintTheme {
  colorBgElevated: string;
  colorError: string;
  colorSuccess: string;
  colorWarning: string;
}

const mix = (color: string, percent: number) =>
  `color-mix(in srgb, ${color} ${percent}%, transparent)`;

/**
 * State-tinted background for the whole verify card, keyed by phase: a soft
 * radial glow anchored to the status corner (top-right, behind the badge) over
 * the container fill, giving a gentle halo rather than a full-width banner.
 * Returns undefined for the neutral `idle` phase.
 */
export const phaseCardBackground = (
  phase: DockPhase,
  theme: VerifyTintTheme,
): string | undefined => {
  const glow = (color: string) =>
    `radial-gradient(60% 90% at 100% 0%, ${mix(color, 8)} 0%, ${mix(color, 0)} 52%), ${theme.colorBgElevated}`;
  switch (phase) {
    case 'passed': {
      return glow(theme.colorSuccess);
    }
    case 'failed': {
      return glow(theme.colorError);
    }
    case 'draft':
    case 'verifying':
    case 'repairing': {
      return glow(theme.colorWarning);
    }
    default: {
      return undefined;
    }
  }
};
@@ -45,6 +45,7 @@ import thread from './thread';
 import tool from './tool';
 import topic from './topic';
 import ui from './ui';
+import verify from './verify';
 import video from './video';
 import welcome from './welcome';

@@ -96,6 +97,7 @@ const resources = {
   tool,
   topic,
   ui,
+  verify,
   video,
   welcome,
 } as const;

@@ -158,6 +158,39 @@ export default {
   'builtins.lobe-agent.apiName.updatePlan.completed': 'Completed',
   'builtins.lobe-agent.apiName.updatePlan.modified': 'Modified',
   'builtins.lobe-agent.apiName.updateTodos': 'Update todos',
+  'builtins.lobe-delivery-checker.apiName.generateVerifyPlan': 'Create automated checks',
+  'builtins.lobe-delivery-checker.verifyPlan.optional': 'Optional',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.fields.description': 'Summary',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.fields.instruction': 'Judging rubric',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.fields.title': 'Check title',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.onFail.auto_repair': 'Auto repair',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.onFail.auto_repairDesc':
+    'On failure, automatically start a repair round and re-run the check.',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.onFail.manual': 'Handle manually',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.onFail.manualDesc':
+    'On failure, stop and leave the next step to you.',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.onFail.title': 'On failure',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.required.desc':
+    'When on, a failure on this check blocks the run from being delivered.',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.required.title': 'Required',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.desc':
+    'How many times a failing run is automatically re-run with the failure feedback before it stops. Set to 0 to disable auto-repair.',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.rubric.maxRepairRounds.title':
+    'Max repair rounds',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.rubric.name': 'Standard name',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.rubric.title': 'Standard settings',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.title': 'Check configuration',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.verifier.agent.desc':
+    'A dedicated agent reads the trace, files, diff and PR before judging.',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.verifier.agent.title': 'Agent check',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.verifier.llm.desc':
+    'Judge against the text result and the run context.',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.verifier.llm.title': 'LLM judgment',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.verifier.program.desc':
+    'Validate via commands, APIs or status results. Good for tests, type-check, PR existence.',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.verifier.program.title': 'Program check',
+  'builtins.lobe-delivery-checker.verifyPlan.portal.verifier.title': 'Verification method',
+  'builtins.lobe-delivery-checker.verifyPlan.required': 'Required',
   'builtins.lobe-knowledge-base.apiName.readKnowledge': 'Read Library content',
   'builtins.lobe-knowledge-base.apiName.searchKnowledgeBase': 'Search Library',
   'builtins.lobe-knowledge-base.inspector.andMoreFiles': 'and {{count}} more',

@@ -0,0 +1,72 @@
export default {
  'badge.failed': 'Check failed',
  'badge.passed': 'Check passed',
  'badge.pending': 'Awaiting check',
  'badge.repairing': 'Repair triggered',

  'behavior.auto_improve': 'Auto-fill',
  'behavior.auto_improveDesc': 'Filled in automatically; does not block delivery',
  'behavior.gate': 'Delivery gate',
  'behavior.gateDesc': 'Blocks delivery on failure and triggers a repair round',

  'detail.checkedAt': 'Checked at',
  'detail.confidence': 'Confidence',
  'detail.counterEvidence': 'Counter-evidence',
  'detail.duration': 'Duration',
  'detail.evidence': 'Evidence',
  'detail.instruction': 'Judging rule',
  'detail.limitation': 'Limitation',
  'detail.method': 'Method',
  'detail.methodAgent': 'Agent',
  'detail.methodLlm': 'LLM',
  'detail.methodProgram': 'Program',
  'detail.model': 'Model',
  'detail.openTrace': 'View agent trace',
  'detail.pending': 'This check has not run yet.',
  'detail.reasoning': 'Reasoning',
  'detail.suggestion': 'Suggested fix',
  'detail.summary': 'Summary',
  'detail.tokens': 'Tokens',

  'dock.confirm': 'Confirm & run',
  'dock.edit': 'Adjust checks',
  'dock.forceDeliver': 'Ignore & deliver',
  'dock.repairHint':
    'The next round is fixing the failed checks. A new result is produced and the checker re-runs when it finishes.',
  'dock.saveAndRepair': 'Save input & repair now',
  'dock.skip': 'Skip checks',
  'dock.title': 'Delivery Checker',

  'editor.add': '+ Add check',
  'editor.cancel': 'Cancel',
  'editor.placeholder': 'Check title',
  'editor.save': 'Save',

  'input.hint':
    'This goes to the next repair round as checker input — it will not appear as a chat message.',
  'input.label': 'Extra input for the next repair round',
  'input.placeholder': 'e.g. run type-check first; if it still fails, just add a risk note.',

  'result.failed.sub':
    'This result is held back. The delivery checker found verification insufficient and triggered a repair.',
  'result.failed.title': 'Draft result',
  'result.foot': 'A snapshot of this run’s result — not an assistant or user message.',
  'result.kicker': 'Verification · Round {{round}}',
  'result.passed.sub':
    'The delivery checker passed {{passed}}/{{total}}. This result is ready to deliver.',
  'result.passed.title': 'Result',
  'result.pending.sub':
    'The result is generated but not yet delivered — waiting for the delivery checker.',
  'result.pending.title': 'Draft result',
  'result.repairing.sub': 'Checks did not pass. A repair round has started.',
  'result.repairing.title': 'Draft result',
  'result.title': 'Verification #{{round}}',

  'status.checking': 'Delivery Checker: checking {{passed}}/{{total}}',
  'status.draft': 'Delivery Checker: awaiting confirmation · {{total}} checks',
  'status.failed': 'Delivery Checker: failed · repair triggered',
  'status.idle': 'Delivery Checker: not generated',
  'status.passed': 'Delivery Checker: passed {{passed}}/{{total}}',
  'status.repairing': 'Delivery Checker: repairing',
  'status.verifying': 'Delivery Checker: waiting for run to finish',
};
@@ -347,6 +347,31 @@ describe('createServerAgentToolsEngine', () => {
     expect(result.enabledToolIds).toContain(KnowledgeBaseManifest.identifier);
   });

+  it('custom mode: enables exactly the declared plugins, no always-on / defaults', () => {
+    const context = createMockContext();
+    const engine = createServerAgentToolsEngine(context, {
+      agentConfig: {
+        plugins: ['test-plugin'],
+        chatConfig: { searchMode: 'on', toolMode: 'custom' },
+      },
+      hasEnabledKnowledgeBases: true,
+      model: 'gpt-4',
+      provider: 'openai',
+    });
+
+    const result = engine.generateToolsDetailed({
+      model: 'gpt-4',
+      provider: 'openai',
+      toolIds: ['test-plugin', LobeAgentManifest.identifier, WebBrowsingManifest.identifier],
+    });
+
+    // Exactly the declared plugin — no always-on lobe-agent, no default web/KB.
+    expect(result.enabledToolIds).toContain('test-plugin');
+    expect(result.enabledToolIds).not.toContain(LobeAgentManifest.identifier);
+    expect(result.enabledToolIds).not.toContain(WebBrowsingManifest.identifier);
+    expect(result.enabledToolIds).not.toContain(KnowledgeBaseManifest.identifier);
+  });
+
   it('should return undefined tools when model does not support function calling', () => {
     const context = createMockContext({
       isModelSupportToolUse: () => false,

@@ -156,7 +156,13 @@ export const createServerAgentToolsEngine = (

   const searchMode = agentConfig.chatConfig?.searchMode ?? 'auto';
   const isSearchEnabled = searchMode !== 'off';
-  const isChatMode = agentConfig.chatConfig?.enableAgentMode === false;
+  // Tool mode: explicit `toolMode` wins; otherwise derive from `enableAgentMode`
+  // (undefined = agent). `custom` = toolset is exactly the agent's plugins.
+  const toolMode: 'agent' | 'chat' | 'custom' =
+    agentConfig.chatConfig?.toolMode ??
+    (agentConfig.chatConfig?.enableAgentMode === false ? 'chat' : 'agent');
+  const isChatMode = toolMode === 'chat';
+  const isCustomMode = toolMode === 'custom';

   log(
     'Creating agent tools engine model=%s provider=%s searchMode=%s platform=%s runtimeMode=%s additionalManifests=%d hasDeviceProxy=%s canUseDevice=%s isChatMode=%s',
@@ -182,6 +188,12 @@ export const createServerAgentToolsEngine = (
     [WebBrowsingManifest.identifier]: isSearchEnabled,
   };

+  // Custom mode: the tool set is EXACTLY the agent's declared plugins — no
+  // alwaysOn tools, no default/runtime-managed injection, no activator. Used by
+  // focused builtin sub-agents (e.g. the verify agent, which mounts only its
+  // writeback tool) that need a precise, self-configured toolset.
+  const customModeRules = Object.fromEntries((agentConfig.plugins ?? []).map((id) => [id, true]));
+
   const agentModeRules = {
     // User-selected plugins
     ...Object.fromEntries((agentConfig.plugins ?? []).map((id) => [id, true])),
@@ -227,8 +239,13 @@ export const createServerAgentToolsEngine = (
     // activation could resolve the manifest and bypass the rule-layer
     // gates below ().
     builtinTools: buildAllowedBuiltinTools({ canUseDevice, disableLocalSystem }),
-    // Add default tools based on configuration
-    defaultToolIds: isChatMode ? chatModeAllowedToolIds : defaultToolIds,
+    // Add default tools based on configuration. Custom mode = exactly the
+    // agent's plugins; chat mode = strict allow-list; agent mode = full defaults.
+    defaultToolIds: isCustomMode
+      ? (agentConfig.plugins ?? [])
+      : isChatMode
+        ? chatModeAllowedToolIds
+        : defaultToolIds,
     // Post-merge wall: a plugin or Skill/Klavis manifest claiming a
     // device identifier survives `buildAllowedBuiltinTools` (which only
     // filters the builtin source). Excluding the identifiers here drops
@@ -237,9 +254,9 @@ export const createServerAgentToolsEngine = (
     excludeIdentifiers: canUseDevice ? undefined : DEVICE_TOOL_IDENTIFIERS,
     enableChecker: createEnableChecker({
-      // Allow lobe-activator to dynamically enable tools at runtime (e.g., lobe-creds, lobe-cron).
-      // Disabled in chat mode so the activator can't bypass the whitelist.
-      allowExplicitActivation: !isChatMode,
-      rules: isChatMode ? chatModeRules : agentModeRules,
+      // Only in agent mode; chat/custom modes can't let the activator bypass their fixed set.
+      allowExplicitActivation: toolMode === 'agent',
+      rules: isCustomMode ? customModeRules : isChatMode ? chatModeRules : agentModeRules,
     }),
   });
 };

@@ -62,6 +62,11 @@ export interface ServerCreateAgentToolsEngineParams {
       enableAgentMode?: boolean;
       runtimeEnv?: RuntimeEnvConfig;
       searchMode?: 'off' | 'on' | 'auto';
+      /**
+       * Overrides the `enableAgentMode` derivation. `custom` = the toolset is
+       * exactly the agent's declared plugins (focused builtin sub-agents).
+       */
+      toolMode?: 'agent' | 'chat' | 'custom';
     };
     /** Plugin IDs enabled for this agent */
     plugins?: string[];

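The engine changes above resolve a single `toolMode` once and branch on it for the default tool list, the activator switch and the enable-checker rules. A condensed, self-contained sketch of the precedence and of the first two branches, assuming only the fields shown in this diff; `resolveToolMode`, `toolPolicy` and `ChatConfigLike` are hypothetical names, and the `rules` branch is left out.

```typescript
type ToolMode = 'agent' | 'chat' | 'custom';

// Local mirror of the `chatConfig` fields this logic reads.
interface ChatConfigLike {
  enableAgentMode?: boolean;
  toolMode?: ToolMode;
}

// Precedence from the diff: explicit toolMode, else enableAgentMode === false → chat,
// else agent (an undefined enableAgentMode means agent).
export const resolveToolMode = (chatConfig?: ChatConfigLike): ToolMode =>
  chatConfig?.toolMode ?? (chatConfig?.enableAgentMode === false ? 'chat' : 'agent');

// Hypothetical summary of the `defaultToolIds` and `allowExplicitActivation` choices.
export const toolPolicy = (
  mode: ToolMode,
  plugins: string[],
  chatAllowList: string[],
  agentDefaults: string[],
) => ({
  // Only agent mode lets the activator enable tools outside the fixed set.
  allowExplicitActivation: mode === 'agent',
  defaultToolIds: mode === 'custom' ? plugins : mode === 'chat' ? chatAllowList : agentDefaults,
});
```

With `toolMode: 'custom'`, the defaults collapse to the plugin list and explicit activation is off, which is the behavior the new `custom mode` test asserts through the real engine.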
@@ -69,6 +69,7 @@ import { usageRouter } from './usage';
 import { userRouter } from './user';
 import { userMemoriesRouter } from './userMemories';
 import { userMemoryRouter } from './userMemory';
+import { verifyRouter } from './verify';
 import { videoRouter } from './video';
 import { webBrowsingRouter } from './webBrowsing';

@@ -132,6 +133,7 @@ export const lambdaRouter = router({
   user: userRouter,
   userMemories: userMemoriesRouter,
   userMemory: userMemoryRouter,
+  verify: verifyRouter,
   video: videoRouter,
   webBrowsing: webBrowsingRouter,
   accountDeletion: accountDeletionRouter,

@@ -0,0 +1,229 @@
+import { z } from 'zod';
+
+import { AgentOperationModel } from '@/database/models/agentOperation';
+import { LlmGenerationTracingModel } from '@/database/models/llmGenerationTracing';
+import { VerifyCheckResultModel } from '@/database/models/verifyCheckResult';
+import { VerifyCriterionModel } from '@/database/models/verifyCriterion';
+import { VerifyRubricModel } from '@/database/models/verifyRubric';
+import { authedProcedure, router } from '@/libs/trpc/lambda';
+import { serverDatabase } from '@/libs/trpc/lambda/middleware';
+import {
+  VerifyExecutorService,
+  VerifyFeedbackService,
+  VerifyPlanGeneratorService,
+} from '@/server/services/verify';
+
+const verifierTypeSchema = z.enum(['program', 'agent', 'llm']);
+const onFailSchema = z.enum(['manual', 'auto_repair']);
+const decisionSchema = z.enum(['accepted', 'rejected', 'overridden']);
+const modelConfigSchema = z.object({ model: z.string(), provider: z.string() });
+
+/** Run-policy knobs persisted on a rubric (see VerifyRubricConfig). */
+const rubricConfigSchema = z.object({
+  maxRepairRounds: z.number().int().min(0).max(5).optional(),
+});
+
+const checkItemSchema = z.object({
+  id: z.string(),
+  index: z.number(),
+  onFail: onFailSchema,
+  required: z.boolean(),
+  sourceCriterionId: z.string().nullable().optional(),
+  sourceRubricId: z.string().nullable().optional(),
+  title: z.string(),
+  verifierConfig: z.record(z.unknown()),
+  verifierType: verifierTypeSchema,
+});
+
+const verifyProcedure = authedProcedure.use(serverDatabase).use(async (opts) => {
+  const { ctx } = opts;
+  return opts.next({
+    ctx: {
+      criterionModel: new VerifyCriterionModel(ctx.serverDB, ctx.userId),
+      executorService: new VerifyExecutorService(ctx.serverDB, ctx.userId),
+      tracingModel: new LlmGenerationTracingModel(ctx.serverDB, ctx.userId),
+      feedbackService: new VerifyFeedbackService(ctx.serverDB, ctx.userId),
+      operationModel: new AgentOperationModel(ctx.serverDB, ctx.userId),
+      planGenerator: new VerifyPlanGeneratorService(ctx.serverDB, ctx.userId),
+      resultModel: new VerifyCheckResultModel(ctx.serverDB, ctx.userId),
+      rubricModel: new VerifyRubricModel(ctx.serverDB, ctx.userId),
+    },
+  });
+});
+
+export const verifyRouter = router({
+  // ---- criteria (reusable atomic standards) ----
+  createCriterion: verifyProcedure
+    .input(
+      z.object({
+        documentId: z.string().optional(),
+        onFail: onFailSchema.optional(),
+        required: z.boolean().optional(),
+        title: z.string(),
+        verifierConfig: z.record(z.unknown()).optional(),
+        verifierType: verifierTypeSchema,
+      }),
+    )
+    .mutation(async ({ ctx, input }) => ctx.criterionModel.create(input)),
+
+  deleteCriterion: verifyProcedure
+    .input(z.object({ id: z.string() }))
+    .mutation(async ({ ctx, input }) => ctx.criterionModel.delete(input.id)),
+
+  listCriteria: verifyProcedure.query(async ({ ctx }) => ctx.criterionModel.query()),
+
+  updateCriterion: verifyProcedure
+    .input(
+      z.object({
+        id: z.string(),
+        value: z.object({
+          description: z.string().nullable().optional(),
+          documentId: z.string().nullable().optional(),
+          onFail: onFailSchema.optional(),
+          required: z.boolean().optional(),
+          title: z.string().optional(),
+          verifierConfig: z.record(z.unknown()).optional(),
+          verifierType: verifierTypeSchema.optional(),
+        }),
+      }),
+    )
+    .mutation(async ({ ctx, input }) => ctx.criterionModel.update(input.id, input.value)),
+
+  // ---- rubrics (named criteria groups) ----
+  createRubric: verifyProcedure
+    .input(
+      z.object({
+        config: rubricConfigSchema.optional(),
+        description: z.string().optional(),
+        title: z.string(),
+      }),
+    )
+    .mutation(async ({ ctx, input }) => ctx.rubricModel.create(input)),
+
+  deleteRubric: verifyProcedure
+    .input(z.object({ id: z.string() }))
+    .mutation(async ({ ctx, input }) => ctx.rubricModel.delete(input.id)),
+
+  getRubric: verifyProcedure
+    .input(z.object({ id: z.string() }))
+    .query(async ({ ctx, input }) => ctx.rubricModel.findById(input.id)),
+
+  getRubricCriteria: verifyProcedure
+    .input(z.object({ rubricId: z.string() }))
+    .query(async ({ ctx, input }) => ctx.rubricModel.getCriteria(input.rubricId)),
+
+  listRubrics: verifyProcedure.query(async ({ ctx }) => ctx.rubricModel.query()),
+
+  setRubricCriteria: verifyProcedure
+    .input(
+      z.object({
+        criteria: z.array(z.object({ criterionId: z.string(), sortOrder: z.number().optional() })),
+        rubricId: z.string(),
+      }),
+    )
+    .mutation(async ({ ctx, input }) =>
+      ctx.rubricModel.setCriteria(input.rubricId, input.criteria),
+    ),
+
+  updateRubric: verifyProcedure
+    .input(
+      z.object({
+        id: z.string(),
+        value: z.object({
+          config: rubricConfigSchema.optional(),
+          description: z.string().nullable().optional(),
+          title: z.string().optional(),
+        }),
+      }),
+    )
+    .mutation(async ({ ctx, input }) => ctx.rubricModel.update(input.id, input.value)),
+
+  // ---- per-run plan ----
+  confirmPlan: verifyProcedure
+    .input(z.object({ operationId: z.string() }))
+    .mutation(async ({ ctx, input }) => ctx.operationModel.confirmVerifyPlan(input.operationId)),
+
+  generateDraftPlan: verifyProcedure
+    .input(
+      z.object({
+        context: z.string().optional(),
+        enableAiGeneration: z.boolean().optional(),
+        goal: z.string(),
+        maxAiCriteria: z.number().optional(),
+        modelConfig: modelConfigSchema.optional(),
+        operationId: z.string(),
+        verifyCriteriaIds: z.array(z.string()).optional(),
+        verifyRubricId: z.string().nullable().optional(),
+      }),
+    )
+    .mutation(async ({ ctx, input }) => ctx.planGenerator.generateDraftPlan(input)),
+
+  getVerifierThread: verifyProcedure
+    .input(z.object({ operationId: z.string() }))
+    .query(async ({ ctx, input }) => {
+      // Resolve an agent verifier's sub-run to the thread it ran in, so the
+      // client can open that execution trace in the portal.
+      const op = await ctx.operationModel.findById(input.operationId);
+      if (!op) return null;
+      return { threadId: op.threadId ?? null, topicId: op.topicId ?? null };
+    }),
+
+  getVerifierTracing: verifyProcedure
+    .input(z.object({ tracingId: z.string() }))
+    .query(async ({ ctx, input }) => {
+      // The model / token / latency of an LLM verifier's judgment, surfaced in
+      // the result detail panel.
+      const row = await ctx.tracingModel.findById(input.tracingId);
+      if (!row) return null;
+      return {
+        inputTokens: row.inputTokens ?? null,
+        latencyMs: row.latencyMs ?? null,
+        model: row.model ?? null,
+        outputTokens: row.outputTokens ?? null,
+        provider: row.provider ?? null,
+      };
+    }),
+
+  getVerifyState: verifyProcedure
+    .input(z.object({ operationId: z.string() }))
+    .query(async ({ ctx, input }) => ctx.operationModel.getVerifyState(input.operationId)),
+
+  skipPlan: verifyProcedure
+    .input(z.object({ operationId: z.string() }))
+    .mutation(async ({ ctx, input }) =>
+      ctx.operationModel.updateVerifyStatus(input.operationId, null),
+    ),
+
+  updateDraftItems: verifyProcedure
+    .input(z.object({ items: z.array(checkItemSchema), operationId: z.string() }))
+    .mutation(async ({ ctx, input }) =>
+      ctx.operationModel.replaceVerifyPlanItems(input.operationId, input.items),
+    ),
+
+  // ---- results / execution ----
+  executeVerify: verifyProcedure
+    .input(
+      z.object({
+        batchLlm: z.boolean().optional(),
+        deliverable: z.string(),
+        goal: z.string(),
+        modelConfig: modelConfigSchema,
+        operationId: z.string(),
+      }),
+    )
+    .mutation(async ({ ctx, input }) => {
+      await ctx.executorService.execute(input);
+      return ctx.resultModel.listByOperation(input.operationId);
+    }),
+
+  listResults: verifyProcedure
+    .input(z.object({ operationId: z.string() }))
+    .query(async ({ ctx, input }) => ctx.resultModel.listByOperation(input.operationId)),
+
+  // ---- feedback (data flywheel) ----
+  submitDecision: verifyProcedure
+    .input(z.object({ decision: decisionSchema, resultId: z.string() }))
+    .mutation(async ({ ctx, input }) =>
+      ctx.feedbackService.submitDecision(input.resultId, input.decision),
+    ),
+});

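The router validates plan edits (`updateDraftItems`) against `checkItemSchema`. For a quick reading of what that schema accepts without zod at hand, here is a minimal dependency-free sketch of an equivalent type guard. `CheckItem` and `isCheckItem` are hypothetical names used only for illustration. They are not part of this PR, and the zod schema remains the source of truth.

```typescript
// Hypothetical, dependency-free mirror of the router's `checkItemSchema`.
type VerifierType = 'program' | 'agent' | 'llm';
type OnFail = 'manual' | 'auto_repair';

interface CheckItem {
  id: string;
  index: number;
  onFail: OnFail;
  required: boolean;
  sourceCriterionId?: string | null;
  sourceRubricId?: string | null;
  title: string;
  verifierConfig: Record<string, unknown>;
  verifierType: VerifierType;
}

const VERIFIER_TYPES: readonly string[] = ['program', 'agent', 'llm'];
const ON_FAIL: readonly string[] = ['manual', 'auto_repair'];

// `.nullable().optional()` accepts a string, null, or an absent key.
const isNullishOrString = (v: unknown): boolean =>
  v === undefined || v === null || typeof v === 'string';

// `z.record(...)` accepts plain objects but not arrays or null.
const isRecord = (v: unknown): boolean =>
  typeof v === 'object' && v !== null && !Array.isArray(v);

function isCheckItem(value: unknown): value is CheckItem {
  if (!isRecord(value)) return false;
  const { id, index, onFail, required, sourceCriterionId, sourceRubricId, title, verifierConfig, verifierType } =
    value as Record<string, unknown>;
  return (
    typeof id === 'string' &&
    typeof index === 'number' &&
    !Number.isNaN(index) &&
    typeof onFail === 'string' &&
    ON_FAIL.includes(onFail) &&
    typeof required === 'boolean' &&
    isNullishOrString(sourceCriterionId) &&
    isNullishOrString(sourceRubricId) &&
    typeof title === 'string' &&
    isRecord(verifierConfig) &&
    typeof verifierType === 'string' &&
    VERIFIER_TYPES.includes(verifierType)
  );
}
```

Note that `verifierConfig` is required on a plan item but optional when creating a criterion. The draft plan is a frozen snapshot, so every field is present by the time it is edited.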
@@ -11,6 +11,7 @@ import { buildFinalSnapshotKey } from '@/server/modules/AgentTracing';
 import { emitAgentSignalSourceEvent } from '@/server/services/agentSignal';
 import { toAgentSignalTraceEvents } from '@/server/services/agentSignal/observability/traceEvents';
 import { extractSelfIterationCompletionPayload } from '@/server/services/agentSignal/services/selfIteration/completion';
+import { runVerifyOnCompletion } from '@/server/services/verify';
 
 import { hookDispatcher } from './hooks';
 
@@ -259,6 +260,40 @@ export class CompletionLifecycle {
     }
   }
 
+  /**
+   * Insert a `role='verify'` message that renders the Agent Run delivery-checker
+   * card (plan + results, read off `metadata.verifyOperationId`). Only created
+   * when the run actually has a verify plan. Self-guarded — failures never affect
+   * the run; the card is purely additive UI.
+   */
+  private async createVerifyMessage(
+    operationId: string,
+    assistantMessageId: string | undefined,
+    userId: string,
+  ): Promise<void> {
+    try {
+      const operationModel = new AgentOperationModel(this.serverDB, userId);
+      const state = await operationModel.getVerifyState(operationId);
+      if (!state?.verifyPlan?.length) return;
+
+      const op = await operationModel.findById(operationId);
+      if (!op?.topicId) return;
+
+      const messageModel = new MessageModel(this.serverDB, userId);
+      await messageModel.create({
+        agentId: op.agentId ?? undefined,
+        content: '',
+        metadata: { verifyOperationId: operationId },
+        parentId: assistantMessageId,
+        role: 'verify',
+        threadId: op.threadId ?? undefined,
+        topicId: op.topicId,
+      });
+    } catch (error) {
+      log('createVerifyMessage failed for op %s (non-fatal): %O', operationId, error);
+    }
+  }
+
   /**
    * Dispatch `onComplete` (and `onError` for `reason='error'`) hooks via
    * the global `hookDispatcher`. On the error path, also writes the error
@@ -285,6 +320,32 @@ export class CompletionLifecycle {
 
     await hookDispatcher.dispatch(operationId, 'onComplete', event, metadata._hooks);
 
+    // Delivery checker: on a successful completion, run the confirmed verify
+    // plan against the deliverable. Fire-and-forget and self-guarded — a run
+    // without an opted-in plan is a no-op, and failures never affect the run.
+    if (reason === 'done') {
+      const messages: any[] = Array.isArray(state?.messages) ? state.messages : [];
+      const firstUserMessage = messages.find((m) => m?.role === 'user');
+      const goal = firstUserMessage
+        ? (extractTextFromMessageContent(firstUserMessage.content) ?? '')
+        : '';
+      // Surface the delivery-checker card first (a role='verify' message that
+      // renders the run's plan + results). Awaited before verification so
+      // auto-repair can persist its failure feedback onto this card (the
+      // VerifyMessageProcessor then surfaces it into the repair run's context).
+      // Self-guarded — failures never affect the run.
+      await this.createVerifyMessage(
+        operationId,
+        metadata?.assistantMessageId,
+        metadata?.userId || this.userId,
+      );
+      void runVerifyOnCompletion(this.serverDB, metadata?.userId || this.userId, {
+        deliverable: event.lastAssistantContent ?? '',
+        goal,
+        operationId,
+      });
+    }
+
     if (reason === 'error') {
       await hookDispatcher.dispatch(operationId, 'onError', event, metadata._hooks);
 

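On a `done` completion, the hook takes the verification `goal` from the first user message in the run state and falls back to an empty string. Below is a minimal sketch of that lookup. It assumes message content is either a plain string or an array of `{ type: 'text', text }` parts. The real `extractTextFromMessageContent` helper is not shown in this diff and may handle more shapes. `extractText` and `extractGoal` are hypothetical names.

```typescript
type ContentPart = { type: string; text?: string };

interface RunMessage {
  content: string | ContentPart[] | null | undefined;
  role: string;
}

// Hypothetical stand-in for `extractTextFromMessageContent`:
// joins the text parts of multimodal content and ignores everything else.
function extractText(content: RunMessage['content']): string | undefined {
  if (typeof content === 'string') return content;
  if (Array.isArray(content)) {
    return content
      .filter((p) => p.type === 'text' && typeof p.text === 'string')
      .map((p) => p.text as string)
      .join('\n');
  }
  return undefined;
}

// Mirrors the hook: the first user message wins, and a missing one yields ''.
function extractGoal(messages: unknown): string {
  const list: RunMessage[] = Array.isArray(messages) ? messages : [];
  const firstUser = list.find((m) => m?.role === 'user');
  return firstUser ? (extractText(firstUser.content) ?? '') : '';
}
```

Using the first user message rather than the latest one keeps the goal stable across follow-up turns. The tradeoff is that a run whose objective changed mid-conversation is still judged against its opening request.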
@@ -219,6 +219,15 @@ interface InternalExecAgentParams extends ExecAgentParams {
    * Defaults to true. Set to false for non-streaming scenarios (e.g., bot integrations).
    */
   stream?: boolean;
+  /**
+   * Run the turn off existing topic history without injecting a new user message
+   * (no user-message row, no Agent Signal source event). The agent responds to
+   * whatever the context engine surfaces as the latest turn. Used by auto-repair,
+   * where the failure feedback already lives on the verify card in history.
+   * `prompt` is still used for the operation title / logs. Unlike `resume`, this
+   * starts a fresh operation and skips the resume-specific validation.
+   */
+  suppressUserMessage?: boolean;
   /** Task ID that triggered this execution (if trigger is 'task') */
   taskId?: string;
   /**
@@ -409,6 +418,7 @@ export class AiAgentService {
       parentOperationId,
       resume,
       resumeApproval,
+      suppressUserMessage,
     } = params;
 
     // Validate that either agentId or slug is provided
@@ -592,6 +602,12 @@ export class AiAgentService {
     // flag so downstream resume branches don't need to know about approval.
     const effectiveResume = resume || !!resumeApproval;
 
+    // Both resume and suppressUserMessage run the turn off existing history
+    // instead of appending a new user message — share the message-construction
+    // branches below. Resume-specific validation/approval stays gated on
+    // `effectiveResume` only.
+    const runFromHistory = effectiveResume || !!suppressUserMessage;
+
     if (effectiveResume) {
       if (!parentMessageId) {
         throw new Error('parentMessageId is required when resume is true');
@@ -768,7 +784,7 @@ export class AiAgentService {
     const operationId = nanoid();
 
     // Create user message so the conversation is visible in the UI immediately.
-    const userMsg = effectiveResume
+    const userMsg = runFromHistory
       ? undefined
       : await this.messageModel.create({
           agentId: resolvedAgentId,
@@ -2013,7 +2029,7 @@ export class AiAgentService {
 
     // 13. Create user message in database
     // Include threadId if provided (for SubAgent task execution in isolated Thread)
-    const userMessageRecord = effectiveResume
+    const userMessageRecord = runFromHistory
       ? undefined
       : await this.messageModel.create({
           agentId: persistAgentId,
@@ -2095,7 +2111,7 @@ export class AiAgentService {
     };
 
     // Combine history messages with user message
-    const allMessages = effectiveResume ? historyMessages : [...historyMessages, userMessage];
+    const allMessages = runFromHistory ? historyMessages : [...historyMessages, userMessage];
 
     log('execAgent: prepared evalContext for executor');
 
@@ -2111,7 +2127,7 @@ export class AiAgentService {
       // Pass assistant message ID so agent runtime knows which message to update
       assistantMessageId: assistantMessageRecord.id,
       isFirstMessage: true,
-      message: effectiveResume ? [{ content: '' }] : [{ content: prompt }],
+      message: runFromHistory ? [{ content: '' }] : [{ content: prompt }],
       // Pass user message ID as parentMessageId for reference
       parentMessageId: parentMessageId ?? userMessageRecord?.id ?? '',
       // Include tools for initial LLM call

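The `execAgent` changes fold `resume`, `resumeApproval`, and the new `suppressUserMessage` flag into a single `runFromHistory` switch. That switch decides whether a user message is persisted and appended. Resume-only validation still keys off `effectiveResume`. The sketch below isolates that decision as a pure function. `buildTurnMessages` and the `Msg` shape are hypothetical simplifications of the service internals.

```typescript
interface Msg {
  content: string;
  role: 'user' | 'assistant' | 'verify';
}

interface TurnFlags {
  resume?: boolean;
  resumeApproval?: object;
  suppressUserMessage?: boolean;
}

// Returns the messages the turn runs on and whether a user-message row would be created.
function buildTurnMessages(history: Msg[], prompt: string, flags: TurnFlags) {
  const effectiveResume = !!flags.resume || !!flags.resumeApproval;
  // Either flag means "respond to the existing history" instead of a new prompt.
  const runFromHistory = effectiveResume || !!flags.suppressUserMessage;
  const userMessage: Msg = { content: prompt, role: 'user' };
  return {
    createsUserRow: !runFromHistory,
    // Resume-specific validation (e.g. requiring parentMessageId) keys off this.
    effectiveResume,
    messages: runFromHistory ? history : [...history, userMessage],
  };
}
```

An auto-repair run therefore sees the verify card as its latest turn without going through resume validation. Its `prompt` only labels the operation.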
@@ -100,9 +100,20 @@ export const createLLMGenerationTracingHook = (
       const validationFailed =
         !data.success && typeof errorMessage === 'string' && /zod|validation/i.test(errorMessage);
 
+      // In-process completion callback (not persisted): lets a caller late-bind
+      // a hard FK onto the tracing row, which only exists after this deferred
+      // record() commits. Carried on the raw `tracing` bag, so it survives
+      // `parseTracingOptions` (which keeps only serializable string fields).
+      const rawTracing = (context.options?.tracing ?? {}) as Record<string, unknown>;
+      const onPersisted =
+        typeof rawTracing.onPersisted === 'function'
+          ? (rawTracing.onPersisted as (tracingId: string | null) => void | Promise<void>)
+          : undefined;
+
       tryScheduleAfter(async () => {
+        let persistedTracingId: string | null = null;
         try {
-          await service.record({
+          const result = await service.record({
             agentId: tracing.agentId,
             costUsd: (data.usage as { cost?: number } | undefined)?.cost,
             errorCode: data.error?.code,
@@ -134,9 +145,20 @@ export const createLLMGenerationTracingHook = (
             userId,
             validationFailed,
           });
+          persistedTracingId = result?.tracingId ?? null;
         } catch (err) {
           log('Tracing service threw: %O', err);
         }
+
+        // Signal completion after the row is committed (or null if it wasn't),
+        // so the caller's backfill never references a non-existent row.
+        if (onPersisted) {
+          try {
+            await onPersisted(persistedTracingId);
+          } catch (err) {
+            log('onPersisted callback threw: %O', err);
+          }
+        }
       });
     },
   };

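The tracing-hook change threads an in-process `onPersisted` callback through the deferred write. A caller can then backfill a foreign key (for example a verifier tracing id) only after the tracing row exists. The following is a minimal sketch of that ordering. It assumes a `record` function that resolves to `{ tracingId }` on success. `recordThenNotify` is a hypothetical name, and the real hook schedules this work through `tryScheduleAfter` instead of awaiting it inline.

```typescript
type OnPersisted = (tracingId: string | null) => void | Promise<void>;

async function recordThenNotify(
  record: () => Promise<{ tracingId: string } | undefined>,
  onPersisted: OnPersisted | undefined,
  log: (msg: string, err: unknown) => void = () => {},
): Promise<void> {
  let persistedTracingId: string | null = null;
  try {
    const result = await record();
    persistedTracingId = result?.tracingId ?? null;
  } catch (err) {
    // A failed write leaves the id as null instead of propagating.
    log('Tracing service threw', err);
  }
  // Only invoked after the write has settled; null means "no row to reference".
  if (onPersisted) {
    try {
      await onPersisted(persistedTracingId);
    } catch (err) {
      // A misbehaving callback must not break the tracing path either.
      log('onPersisted callback threw', err);
    }
  }
}
```

Passing `null` instead of skipping the callback lets the caller tell "write failed" apart from "still pending". It never has to guess whether a row will appear later.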
@@ -21,6 +21,7 @@ import { cloudSandboxRuntime } from './cloudSandbox';
 import { credsRuntime } from './creds';
 import { knowledgeBaseRuntime } from './knowledgeBase';
 import { lobeAgentRuntime } from './lobeAgent';
+import { lobeDeliveryCheckerRuntime } from './lobeDeliveryChecker';
 import { localSystemRuntime } from './localSystem';
 import { memoryRuntime } from './memory';
 import { messageRuntime } from './message';
@@ -35,6 +36,7 @@ import { taskRuntime } from './task';
 import { topicReferenceRuntime } from './topicReference';
 import type { ServerRuntimeFactory, ServerRuntimeRegistration } from './types';
 import { userInteractionRuntime } from './userInteraction';
+import { verifyResultRuntime } from './verifyResult';
 import { webBrowsingRuntime } from './webBrowsing';
 import { webOnboardingRuntime } from './webOnboarding';
 
@@ -83,6 +85,8 @@ registerRuntimes([
   agentSignalReflectionRuntime,
   agentSignalFeedbackIntentRuntime,
   pageAgentRuntime,
+  verifyResultRuntime,
+  lobeDeliveryCheckerRuntime,
 ]);
 
 // ==================== Registry API ====================

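The two new runtimes are wired in by adding them to the `registerRuntimes([...])` list. Each factory validates its required context eagerly and forwards the optional `operationId`. The sketch below models that pattern with a plain `Map`. `registerAll`, `createRuntime`, and the `demo-verify-result` identifier are hypothetical. They are not meant to describe how the real registry is implemented.

```typescript
// Hypothetical, simplified model of an identifier-keyed runtime registry.
interface RuntimeContext {
  operationId?: string;
  serverDB?: object;
  userId?: string;
}

interface RuntimeRegistration {
  factory: (context: RuntimeContext) => unknown;
  identifier: string;
}

const factories = new Map<string, RuntimeRegistration['factory']>();

function registerAll(registrations: RuntimeRegistration[]): void {
  for (const r of registrations) factories.set(r.identifier, r.factory);
}

function createRuntime(identifier: string, context: RuntimeContext): unknown {
  const factory = factories.get(identifier);
  if (!factory) throw new Error(`No runtime registered for "${identifier}"`);
  return factory(context);
}

// Shaped like the verify-result factory: hard requirements throw up front,
// while operationId stays optional and is checked later by the tool method.
const demoVerifyResult: RuntimeRegistration = {
  factory: (context) => {
    if (!context.userId || !context.serverDB) {
      throw new Error('userId and serverDB are required');
    }
    return { operationId: context.operationId, userId: context.userId };
  },
  identifier: 'demo-verify-result', // hypothetical identifier
};

registerAll([demoVerifyResult]);
```

Keeping `operationId` optional at construction time matches the runtimes in this diff. They return a `NO_OPERATION` tool error from the method itself instead of failing the factory.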
@@ -40,6 +40,8 @@ interface LobeAgentRuntimeContext {
   agentId?: string | null;
   groupId?: string | null;
   messageId: string;
+  /** The current Agent Run (`agent_operations.id`). */
+  operationId?: string;
   serverDB: LobeChatDatabase;
   threadId?: string | null;
   topicId?: string;
@@ -76,6 +78,7 @@ class LobeAgentExecutionRuntime {
   private groupId?: string | null;
   private userId: string;
   private messageId: string;
+  private operationId?: string;
   private threadId?: string | null;
   private topicId?: string;
   private planRuntime: PlanExecutionRuntime;
@@ -85,6 +88,7 @@ class LobeAgentExecutionRuntime {
     this.db = context.serverDB;
     this.groupId = context.groupId;
     this.messageId = context.messageId;
+    this.operationId = context.operationId;
     this.threadId = context.threadId;
     this.topicId = context.topicId;
     this.userId = context.userId;
@@ -374,6 +378,7 @@ export const lobeAgentRuntime: ServerRuntimeRegistration = {
       agentId: context.agentId,
       groupId: context.groupId,
       messageId: context.messageId,
+      operationId: context.operationId,
       serverDB: context.serverDB,
       threadId: context.threadId,
       topicId: context.topicId,

@@ -0,0 +1,116 @@
+import { LobeDeliveryCheckerIdentifier } from '@lobechat/builtin-tool-lobe-delivery-checker';
+import type { BuiltinServerRuntimeOutput } from '@lobechat/types';
+
+import type { LobeChatDatabase } from '@/database/type';
+
+import type { ServerRuntimeRegistration } from './types';
+
+interface LobeDeliveryCheckerRuntimeContext {
+  /** The current Agent Run (`agent_operations.id`) — the verify plan attaches to it. */
+  operationId?: string;
+  serverDB: LobeChatDatabase;
+  userId: string;
+}
+
+const buildError = (content: string, code: string): BuiltinServerRuntimeOutput => ({
+  content,
+  error: { code, message: content },
+  success: false,
+});
+
+/**
+ * Server runtime for the delivery-checker tool. The agent calls
+ * `generateVerifyPlan` (post-approval) enumerating the checks the deliverable
+ * must satisfy; this creates the criteria + a reusable rubric and snapshots them
+ * onto the current Agent Run so the checks run automatically when it completes.
+ */
+class LobeDeliveryCheckerExecutionRuntime {
+  private operationId?: string;
+  private db: LobeChatDatabase;
+  private userId: string;
+
+  constructor(context: LobeDeliveryCheckerRuntimeContext) {
+    this.operationId = context.operationId;
+    this.db = context.serverDB;
+    this.userId = context.userId;
+  }
+
+  generateVerifyPlan = async (params: {
+    criteria?: {
+      description?: string;
+      instruction?: string;
+      onFail?: 'manual' | 'auto_repair';
+      required?: boolean;
+      title: string;
+      verifierType?: 'program' | 'agent' | 'llm';
+    }[];
+    title: string;
+  }): Promise<BuiltinServerRuntimeOutput> => {
+    if (!this.operationId) {
+      return buildError(
+        'Verify plan generation requires an active Agent Run operation.',
+        'NO_OPERATION',
+      );
+    }
+    if (!params.title || typeof params.title !== 'string' || !params.title.trim()) {
+      return buildError('title is required.', 'INVALID_ARGUMENTS');
+    }
+    const criteria = (params.criteria ?? []).filter((c) => c?.title?.trim());
+    if (criteria.length === 0) {
+      return buildError('At least one criterion with a title is required.', 'INVALID_ARGUMENTS');
+    }
+
+    // Agent-authored path: the model enumerated the checks, so create the
+    // criteria + a rubric, snapshot it onto this operation, and confirm it. The
+    // tool call is human-reviewed (humanIntervention); this runs post-approval.
+    const { VerifyPlanGeneratorService } = await import('@/server/services/verify');
+    const planGenerator = new VerifyPlanGeneratorService(this.db, this.userId);
+    const { items, rubricId } = await planGenerator.createPlanFromCriteria({
+      criteria,
+      operationId: this.operationId,
+      title: params.title,
+    });
+
+    return {
+      content: `Created delivery standard "${params.title}" with ${items.length} check(s): ${items
+        .map((i) => `${i.title}${i.required ? ' (gate)' : ''}`)
+        .join(
+          '; ',
+        )}. The checks run automatically when this operation completes — do not run them yourself.`,
+      state: {
+        items: items.map((i) => ({
+          // Surface the persisted ids so the client can write edits back to the
+          // criterion row (and its instruction document) from the portal.
+          criterionId: i.sourceCriterionId ?? undefined,
+          description: i.description,
+          documentId: i.documentId,
+          onFail: i.onFail,
+          required: i.required,
+          title: i.title,
+          verifierType: i.verifierType,
+        })),
+        rubricId,
+        title: params.title,
+      },
+      success: true,
+    };
+  };
+}
+
+export const lobeDeliveryCheckerRuntime: ServerRuntimeRegistration = {
+  factory: (context) => {
+    if (!context.serverDB) {
+      throw new Error('serverDB is required for Delivery Checker execution');
+    }
+    if (!context.userId) {
+      throw new Error('userId is required for Delivery Checker execution');
+    }
+
+    return new LobeDeliveryCheckerExecutionRuntime({
+      operationId: context.operationId,
+      serverDB: context.serverDB,
+      userId: context.userId,
+    });
+  },
+  identifier: LobeDeliveryCheckerIdentifier,
+};

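Before any database work, `generateVerifyPlan` checks its arguments in a fixed order. It rejects calls that have no active operation or no title. It silently drops criteria whose title is blank, and fails only when none remain. It then summarizes the created checks with required ones tagged `(gate)`. Below is a pure-function sketch of that validation and summary. `validatePlanArgs` and `summarizeChecks` are hypothetical helpers, not exports of the runtime.

```typescript
interface CriterionInput {
  required?: boolean;
  title: string;
}

type Validation =
  | { criteria: CriterionInput[]; ok: true }
  | { code: 'NO_OPERATION' | 'INVALID_ARGUMENTS'; message: string; ok: false };

function validatePlanArgs(
  operationId: string | undefined,
  params: { criteria?: CriterionInput[]; title: string },
): Validation {
  if (!operationId) {
    return { code: 'NO_OPERATION', message: 'Verify plan generation requires an active Agent Run operation.', ok: false };
  }
  if (!params.title || typeof params.title !== 'string' || !params.title.trim()) {
    return { code: 'INVALID_ARGUMENTS', message: 'title is required.', ok: false };
  }
  // Blank-titled criteria are filtered out rather than rejected individually.
  const criteria = (params.criteria ?? []).filter((c) => c?.title?.trim());
  if (criteria.length === 0) {
    return { code: 'INVALID_ARGUMENTS', message: 'At least one criterion with a title is required.', ok: false };
  }
  return { criteria, ok: true };
}

// Same "title (gate)" list joined with "; " as the tool's content string.
const summarizeChecks = (items: CriterionInput[]): string =>
  items.map((i) => `${i.title}${i.required ? ' (gate)' : ''}`).join('; ');
```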
@@ -0,0 +1,100 @@
+import type { SubmitVerifyResultParams } from '@lobechat/builtin-tool-verify';
+import { VerifyToolIdentifier } from '@lobechat/builtin-tool-verify';
+import debug from 'debug';
+
+import { AgentOperationModel } from '@/database/models/agentOperation';
+import { VerifyCheckResultModel } from '@/database/models/verifyCheckResult';
+import type { LobeChatDatabase } from '@/database/type';
+import { maybeAutoRepair, VerifyStatusService } from '@/server/services/verify';
+
+import type { ServerRuntimeRegistration } from './types';
+
+const log = debug('lobe-server:verify-result-runtime');
+
+interface VerifyResultRuntimeContext {
+  operationId?: string;
+  serverDB: LobeChatDatabase;
+  userId: string;
+}
+
+/**
+ * Server runtime for the verify-result tool. The verifier sub-agent calls
+ * `submitVerifyResult` once it has judged its check; this writes the verdict back
+ * to the PARENT run's `verify_check_results` row (resolved from the sub-op's
+ * `parentOperationId`) and recomputes the parent's rollup status.
+ */
+class VerifyResultExecutionRuntime {
+  private operationId?: string;
+  private db: LobeChatDatabase;
+  private userId: string;
+
+  constructor(context: VerifyResultRuntimeContext) {
+    this.operationId = context.operationId;
+    this.db = context.serverDB;
+    this.userId = context.userId;
+  }
+
+  submitVerifyResult = async (params: SubmitVerifyResultParams) => {
+    if (!this.operationId) {
+      return { content: 'No operation context.', error: 'NO_OPERATION', success: false };
+    }
+    if (!params?.checkItemId || !params?.verdict) {
+      return {
+        content: 'checkItemId and verdict are required.',
+        error: 'INVALID_ARGUMENTS',
+        success: false,
+      };
+    }
+
+    // The verifier runs as a sub-agent; the row to update belongs to the parent run.
+    const op = await new AgentOperationModel(this.db, this.userId).findById(this.operationId);
+    const targetOperationId = op?.parentOperationId ?? this.operationId;
+
+    const status = params.verdict === 'passed' ? 'passed' : 'failed';
+    await new VerifyCheckResultModel(this.db, this.userId).updateByCheckItem(
+      targetOperationId,
+      params.checkItemId,
+      {
+        completedAt: new Date(),
+        status,
+        toulmin: {
+          counterEvidence: params.counterEvidence,
+          evidence: params.evidence,
+          limitation: params.limitation,
+          reasoning: params.reasoning,
+        },
+        verdict: params.verdict,
+      },
+    );
+    await new VerifyStatusService(this.db, this.userId).recompute(targetOperationId);
+    // This may be the last check to resolve — kick auto-repair if the run failed
+    // with auto_repair checks (no-op until everything has a terminal result).
+    await maybeAutoRepair(this.db, this.userId, targetOperationId);
+
+    log(
+      'submitted verdict %s for check %s (op %s)',
+      params.verdict,
+      params.checkItemId,
+      targetOperationId,
+    );
+
+    return {
+      content: `Recorded verdict "${params.verdict}" for the check. Verification complete.`,
+      success: true,
+    };
+  };
+}
+
+export const verifyResultRuntime: ServerRuntimeRegistration = {
+  factory: (context) => {
+    if (!context.userId || !context.serverDB) {
+      throw new Error('userId and serverDB are required for verify-result tool execution');
+    }
+    return new VerifyResultExecutionRuntime({
+      operationId: context.operationId,
+      serverDB: context.serverDB,
+      userId: context.userId,
+    });
+  },
+  identifier: VerifyToolIdentifier,
+};

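`submitVerifyResult` runs inside the verifier's sub-operation but writes to the parent run's result row. It also collapses the three-way verdict into a two-way row status: `uncertain` is stored with status `failed`, while the `verdict` column keeps `uncertain`. Below is a minimal sketch of those two decisions. `resolveTargetOperation` and `toRowUpdate` are hypothetical names.

```typescript
type Verdict = 'passed' | 'failed' | 'uncertain';

interface OperationRow {
  id: string;
  parentOperationId?: string | null;
}

// Sub-operation resolves to its parent; a top-level op (or a missing row) resolves to itself.
function resolveTargetOperation(currentId: string, op: OperationRow | undefined): string {
  return op?.parentOperationId ?? currentId;
}

function toRowUpdate(verdict: Verdict, now: Date = new Date()) {
  return {
    completedAt: now,
    // Only an explicit pass counts as passed; "uncertain" does not clear the check.
    status: verdict === 'passed' ? ('passed' as const) : ('failed' as const),
    verdict,
  };
}
```

Treating `uncertain` as not passed errs on the side of flagging the deliverable. Because the raw verdict is kept, the UI and the feedback flywheel can still tell a confident failure from an uncertain one.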
@@ -0,0 +1,41 @@
+import { describe, expect, it } from 'vitest';
+
+import { computeFalseFlags } from '../feedbackService';
+
+describe('computeFalseFlags', () => {
+  it('marks a false positive when verifier failed but user rejected/overrode', () => {
+    expect(computeFalseFlags('failed', 'rejected')).toEqual({
+      isFalseNegative: false,
+      isFalsePositive: true,
+    });
+    expect(computeFalseFlags('failed', 'overridden')).toEqual({
+      isFalseNegative: false,
+      isFalsePositive: true,
+    });
+  });
+
+  it('marks a false negative when verifier passed but user rejected', () => {
+    expect(computeFalseFlags('passed', 'rejected')).toEqual({
+      isFalseNegative: true,
+      isFalsePositive: false,
+    });
+  });
+
+  it('marks neither when the user accepts the verdict', () => {
+    expect(computeFalseFlags('failed', 'accepted')).toEqual({
+      isFalseNegative: false,
+      isFalsePositive: false,
+    });
+    expect(computeFalseFlags('passed', 'accepted')).toEqual({
+      isFalseNegative: false,
+      isFalsePositive: false,
+    });
+  });
+
+  it('treats uncertain verdicts as neither FP nor FN', () => {
+    expect(computeFalseFlags('uncertain', 'rejected')).toEqual({
+      isFalseNegative: false,
+      isFalsePositive: false,
+    });
+  });
+});

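These tests pin down the feedback semantics. A `failed` verdict that the user rejects or overrides is a false positive. A `passed` verdict that the user rejects is a false negative. An `accepted` decision, or an `uncertain` verdict, flags nothing. The sketch below shows one implementation consistent with those cases. It is an illustration, not the contents of `feedbackService`. The tests do not cover `passed` + `overridden`, which this sketch treats as neither.

```typescript
type Verdict = 'passed' | 'failed' | 'uncertain';
type Decision = 'accepted' | 'rejected' | 'overridden';

// Hypothetical sketch; only the cases exercised by the test file are grounded.
function computeFalseFlagsSketch(verdict: Verdict, decision: Decision) {
  return {
    // Verifier passed the check, but the user rejected that verdict.
    isFalseNegative: verdict === 'passed' && decision === 'rejected',
    // Verifier flagged a failure, but the user rejected or overrode it.
    isFalsePositive: verdict === 'failed' && decision !== 'accepted',
  };
}
```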
@@ -0,0 +1,103 @@
|
||||
import { BUILTIN_AGENT_SLUGS } from '@lobechat/builtin-agents';
|
||||
import type { VerifyCheckItem } from '@lobechat/types';
|
||||
import { ThreadType } from '@lobechat/types';
|
||||
import debug from 'debug';
|
||||
|
||||
import { AgentModel } from '@/database/models/agent';
|
||||
import { DocumentModel } from '@/database/models/document';
|
||||
import { ThreadModel } from '@/database/models/thread';
|
||||
import type { LobeChatDatabase } from '@/database/type';
|
||||
|
||||
import type { VerifierAgentRunner } from './executor';
|
||||
|
||||
const log = debug('lobe-server:verify-agent-verifier');
|
||||
|
||||
/**
|
||||
* Build the instruction for a verifier sub-agent investigating one check. The
|
||||
* sub-agent reports its verdict by calling the `submitVerifyResult` tool with the
|
||||
* `checkItemId` injected here — it does not write to the DB directly.
|
||||
*/
|
||||
export const buildVerifierPrompt = (params: {
|
||||
checkItem: VerifyCheckItem;
|
||||
deliverable: string;
|
||||
goal: string;
|
||||
instruction?: string;
|
||||
}): string => {
|
||||
const { checkItem, deliverable, goal, instruction } = params;
|
||||
return [
|
||||
`## Check to verify\ncheckItemId: ${checkItem.id}\nTitle: ${checkItem.title}`,
|
||||
checkItem.description ? `Summary: ${checkItem.description}` : '',
|
||||
instruction ? `\n## Judging instruction\n${instruction}` : '',
|
||||
`\n## Run goal\n${goal}`,
|
||||
deliverable ? `\n## Deliverable / final output\n${deliverable}` : '',
|
||||
`\n## Your task\nInvestigate whether the deliverable satisfies this check, following the judging instruction. Gather concrete evidence. When done, call \`submitVerifyResult\` exactly once with checkItemId="${checkItem.id}" and your verdict (passed / failed / uncertain) plus evidence and reasoning.`,
|
||||
]
|
||||
.filter(Boolean)
|
||||
.join('\n');
|
||||
};
|
||||
|
||||
/**
 * Build a {@link VerifierAgentRunner} that runs each `agent`-type check as the
 * dedicated builtin **verify agent**: it materializes the verify agent, opens an
 * isolated thread, and `execAgent`s (headless) with the check context (incl.
 * `checkItemId`) injected into the prompt. The verify agent investigates and
 * writes its verdict back via the `submitVerifyResult` tool during its run, with
 * no document creation, no output parsing, and no external completion hook.
 */
export const createVerifierAgentRunner = (params: {
  db: LobeChatDatabase;
  deliverable: string;
  /** Inherit the parent run's model so the verifier uses a configured provider. */
  model?: string | null;
  provider?: string | null;
  topicId?: string | null;
  userId: string;
}): VerifierAgentRunner | undefined => {
  const { db, deliverable, model, provider, topicId, userId } = params;
  if (!topicId) return undefined;

  return async ({ checkItem, goal, operationId }) => {
    // The detailed instruction is the criterion's rule body, stored in a document.
    const instruction = checkItem.documentId
      ? ((await new DocumentModel(db, userId).findById(checkItem.documentId))?.content ?? undefined)
      : undefined;

    // Materialize the builtin verify agent (idempotent) to get an id for the thread.
    const verifyAgent = await new AgentModel(db, userId).getBuiltinAgent(
      BUILTIN_AGENT_SLUGS.verifyAgent,
    );
    if (!verifyAgent) {
      log('verify agent unavailable, cannot run agent verifier for check %s', checkItem.id);
      return null;
    }

    const thread = await new ThreadModel(db, userId).create({
      agentId: verifyAgent.id,
      title: `Verify: ${checkItem.title}`,
      topicId,
      type: ThreadType.Isolation,
    });
    if (!thread) {
      log('failed to create verifier thread for check %s', checkItem.id);
      return null;
    }

    // Dynamic import breaks the static cycle: aiAgent → agentRuntime completion
    // → verify lifecycle → this runner → aiAgent.
    const { AiAgentService } = await import('@/server/services/aiAgent');
    const result = await new AiAgentService(db, userId).execAgent({
      appContext: { threadId: thread.id, topicId },
      autoStart: true,
      // Inherit the parent run's model/provider so the verifier uses a provider
      // that's actually configured (the builtin agent's default may not be).
      ...(model ? { model } : {}),
      parentOperationId: operationId,
      prompt: buildVerifierPrompt({ checkItem, deliverable, goal, instruction }),
      ...(provider ? { provider } : {}),
      slug: BUILTIN_AGENT_SLUGS.verifyAgent,
      userInterventionConfig: { approvalMode: 'headless' },
    });

    return { verifierOperationId: result.operationId };
  };
};
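The `buildVerifierPrompt` comment describes a reporting contract: the sub-agent must call `submitVerifyResult` exactly once, with the injected `checkItemId`, and with a verdict of passed / failed / uncertain. The receiving side is not part of this excerpt, so the following is only a minimal sketch, under assumptions, of how such a contract could be enforced per check. The argument shape, `createSubmissionGuard`, and `GuardResult` are hypothetical and are not the tool's actual handler or schema.

```typescript
// Hypothetical argument shape inferred from the prompt's wording; the real
// `submitVerifyResult` tool schema may differ.
type Verdict = 'passed' | 'failed' | 'uncertain';

interface SubmitVerifyResultArgs {
  checkItemId: string;
  evidence?: string;
  reasoning?: string;
  // Typed loosely on purpose: tool arguments arrive as model-generated JSON.
  verdict: string;
}

type GuardResult = { ok: true; verdict: Verdict } | { error: string; ok: false };

const VERDICTS: readonly string[] = ['passed', 'failed', 'uncertain'];

/**
 * Hypothetical receiver-side guard for a single check: accepts one submission,
 * only for the checkItemId injected into the prompt, and only with a known verdict.
 */
const createSubmissionGuard = (expectedCheckItemId: string) => {
  let submitted = false;

  return (args: SubmitVerifyResultArgs): GuardResult => {
    if (args.checkItemId !== expectedCheckItemId) {
      return { error: `unexpected checkItemId: ${args.checkItemId}`, ok: false };
    }
    if (!VERDICTS.includes(args.verdict)) {
      return { error: `unknown verdict: ${args.verdict}`, ok: false };
    }
    if (submitted) {
      return { error: 'result already submitted for this check', ok: false };
    }
    // Only a fully valid call consumes the single allowed submission.
    submitted = true;
    return { ok: true, verdict: args.verdict as Verdict };
  };
};
```

In this sketch, rejected calls do not consume the one allowed submission, so a verifier that first sends a malformed call can retry; whether the real tool behaves the same way is not shown in this diff.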
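`createVerifierAgentRunner` has three ways of not running a verifier: it returns no runner at all when `topicId` is missing, and the runner resolves to `null` when the builtin verify agent or the isolated thread cannot be obtained. It also forwards `model` and `provider` only when they are set. Below is a simplified, dependency-injected sketch of that control flow so it can run without a database. `RunnerDeps`, `makeRunner`, `CheckItem`, and the `ExecInput` shape are hypothetical stand-ins for `AgentModel`, `ThreadModel`, and `AiAgentService.execAgent`, not their real signatures, and the prompt string is a placeholder.

```typescript
// Hypothetical, simplified stand-ins; these are NOT LobeChat's real signatures.
interface CheckItem {
  id: string;
  title: string;
}

interface ExecInput {
  appContext: { threadId: string; topicId: string };
  model?: string;
  parentOperationId: string;
  prompt: string;
  provider?: string;
}

interface RunnerDeps {
  createThread: (input: {
    agentId: string;
    title: string;
    topicId: string;
  }) => Promise<{ id: string } | undefined>;
  execAgent: (input: ExecInput) => Promise<{ operationId: string }>;
  getVerifyAgent: () => Promise<{ id: string } | undefined>;
}

type Runner = (args: {
  checkItem: CheckItem;
  operationId: string;
}) => Promise<{ verifierOperationId: string } | null>;

const makeRunner = (
  deps: RunnerDeps,
  opts: { model?: string | null; provider?: string | null; topicId?: string | null },
): Runner | undefined => {
  // Without a topic there is nowhere to attach the isolated thread: no runner.
  if (!opts.topicId) return undefined;
  const topicId: string = opts.topicId;
  const { model, provider } = opts;

  return async ({ checkItem, operationId }) => {
    const agent = await deps.getVerifyAgent();
    if (!agent) return null; // verify agent unavailable: skip this check

    const thread = await deps.createThread({
      agentId: agent.id,
      title: `Verify: ${checkItem.title}`,
      topicId,
    });
    if (!thread) return null; // thread creation failed: skip this check

    const result = await deps.execAgent({
      appContext: { threadId: thread.id, topicId },
      // Conditional spreads omit the key entirely when the value is unset,
      // so the callee's own default applies instead of an explicit null.
      ...(model ? { model } : {}),
      parentOperationId: operationId,
      // Placeholder; the original builds this with buildVerifierPrompt.
      prompt: `Verify check ${checkItem.id}: ${checkItem.title}`,
      ...(provider ? { provider } : {}),
    });

    return { verifierOperationId: result.operationId };
  };
};
```

Injecting the collaborators keeps each early-exit path testable with plain stubs. The original's dynamic `import('@/server/services/aiAgent')` serves a different purpose (breaking the module import cycle), which this sketch does not model.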