feat: per-call llm_generation_tracing observability (#15124)

* feat(database): add llm_generation_tracing schema + tracing package (LOBE-9462)

Foundation layer for per-call observability of `generateObject` calls.

- New Drizzle table `llm_generation_tracing` with identity / context / model /
  result / usage / storage / feedback / audit columns and full single-column
  index coverage (Postgres bitmap-scan friendly). Migration 0103 is idempotent
  (CREATE TABLE/INDEX IF NOT EXISTS) for safe re-runs.
- `LlmGenerationTracingModel` with `record` / `updateFeedback` / `findById` /
  `listRecent`, all userId-scoped to prevent cross-user leaks.
- New package `@lobechat/llm-generation-tracing` mirroring agent-tracing's
  shape: `ITracingStore` interface, `FileTracingStore` (local/dev, scenario
  subfolders + latest.json symlink), `computePromptHash` (first 6 hex chars
  of the sha256 of systemPrompt + schema), and `TRACING_SCENARIO_REGISTRY` +
  `resolveScenario` with explicit scenario override.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(model-runtime): wire llm_generation_tracing into ModelRuntime.generateObject (LOBE-9462)

Per-call interception layer — one hook covers all generateObject callers.

- New `onGenerateObjectComplete` hook on `ModelRuntimeHooks`: always fires
  (success or failure) with latency, usage, output/error. Fixes the gap where
  `onGenerateObjectFinal` only fires when the runtime invokes `onUsage`.
- `S3TracingStore` (zstd level 3, key
  `llm-generation-tracing/{scenario}/{v}-{hash}/{date}/{id}.json.zst`) and
  `LLMGenerationTracingService` that does DB insert → store.save → patch
  storage_key. Store failures preserve the row with `metadata.store_error`.
- `createLLMGenerationTracingHook` + `mergeModelRuntimeHooks` wired into
  `initModelRuntimeFromDB`; tracing runs alongside business (billing) hooks
  via `next/server.after()` when available, microtask fallback otherwise.
  Unknown metadata keys (e.g. `parent_memory_trace_key`) pass through.
- Memory extractor accepts `parentMemoryTraceKey` option for the job-level
  backlink. Follow-up-action caller given an explicit `scenario: 'follow_up'`
  metadata override — it was the only OSS caller missing trigger metadata.
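
The insert → save → patch flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual `LLMGenerationTracingService`: `InMemoryRows`, `TracingStore`, `TracingRow` and `recordTracing` are hypothetical stand-ins, and skipping the patch when the store returns a null key is an assumption.

```typescript
// Hypothetical shapes: the real service, model and store types are not shown here.
interface SaveResult {
  key: string | null;
}

interface TracingStore {
  save(payload: { tracingId: string }): Promise<SaveResult>;
}

interface TracingRow {
  id: string;
  metadata: Record<string, unknown>;
  storageKey: string | null;
}

// In-memory stand-in for the llm_generation_tracing table.
class InMemoryRows {
  readonly rows = new Map<string, TracingRow>();
  private nextId = 1;

  insert(): TracingRow {
    const row: TracingRow = { id: `row-${this.nextId}`, metadata: {}, storageKey: null };
    this.nextId += 1;
    this.rows.set(row.id, row);
    return row;
  }

  patch(id: string, patch: Partial<TracingRow>): void {
    const row = this.rows.get(id);
    if (row) Object.assign(row, patch);
  }
}

// Step 1: insert the row. Step 2: save the payload. Step 3: patch storage_key.
// A store failure never drops the row; the error is kept under metadata.store_error.
async function recordTracing(db: InMemoryRows, store: TracingStore): Promise<TracingRow> {
  const row = db.insert();
  try {
    const { key } = await store.save({ tracingId: row.id });
    // Assumption: a null key (local-only store) leaves storage_key empty.
    if (key !== null) db.patch(row.id, { storageKey: key });
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    db.patch(row.id, { metadata: { ...row.metadata, store_error: message } });
  }
  return row;
}
```

Inserting first means a row exists even when the blob upload fails, so the failure itself stays queryable.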

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(llm-generation-tracing): type vi.fn mocks so tsgo accepts mock.calls indexing

The hook + service tests destructured `mock.calls[0][0]` and accessed nested
fields, which tsgo flagged as TS2493 / TS18046 because `vi.fn()` defaults to a
zero-arg signature. Add explicit type parameters to the mocks so tsgo can
infer the call tuple, and cast `call.payload` at the access point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(model-runtime): move mergeModelRuntimeHooks into the package

It's a generic utility for composing `ModelRuntimeHooks` instances — same
import surface as `ModelRuntime` and the hooks interface — so it belongs
alongside them rather than tucked under a server-side consumer.

- New `packages/model-runtime/src/core/mergeHooks.ts` exports
  `mergeModelRuntimeHooks` and is re-exported from the package index.
- Move the unit tests to `packages/model-runtime/src/core/mergeHooks.test.ts`,
  including a new case covering the "a throws → b is skipped" load-bearing
  semantics.
- `src/server/services/llmGenerationTracing/hook.ts` drops the local copy and
  the consumer (`src/server/modules/ModelRuntime/index.ts`) imports from
  `@lobechat/model-runtime`.
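
The "a throws → b is skipped" semantics can be illustrated with a trimmed-down sketch. `Hooks` and `mergeHooks` below are simplified, hypothetical stand-ins for `ModelRuntimeHooks` and `mergeModelRuntimeHooks`; the real interface has more members and the real implementation may differ.

```typescript
// Hypothetical, trimmed-down hook shape with a single member.
type CompleteHook = (event: { success: boolean }) => Promise<void> | void;

interface Hooks {
  onGenerateObjectComplete?: CompleteHook;
}

// Compose two hook sets so that `a` runs before `b`.
// Because `a` is awaited first, an error thrown by `a` propagates and `b` never runs.
function mergeHooks(a: Hooks, b: Hooks): Hooks {
  const first = a.onGenerateObjectComplete;
  const second = b.onGenerateObjectComplete;
  const merged: Hooks = {};

  if (first && second) {
    merged.onGenerateObjectComplete = async (event) => {
      await first(event);
      await second(event);
    };
  } else if (first) {
    merged.onGenerateObjectComplete = first;
  } else if (second) {
    merged.onGenerateObjectComplete = second;
  }

  return merged;
}
```

Under this sketch, ordering matters: whichever hook is passed first can veto the second by throwing.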

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(llm-generation-tracing): version lives with the prompt, not in a central table

`promptVersion` was baked into `TRACING_SCENARIO_REGISTRY`, far from any
prompt definition — editing a prompt + forgetting to bump the entry in a
completely different file was an obvious foot-gun.

- Registry is now `Record<string, string>` mapping trigger → scenario only;
  it's the stable concern that rarely changes.
- `resolveScenario` always passes `promptVersion` through from the caller,
  defaulting to `UNKNOWN_PROMPT_VERSION` ('v0') when absent.
- Each call site declares its own `*_PROMPT_VERSION` constant next to the
  prompt it describes. `followUpAction` ships the first one:
  `FOLLOW_UP_PROMPT_VERSION` in `prompts/index.ts`, threaded through
  `metadata.promptVersion` at the `generateObject` call. Other callers can
  add the same constant when they next touch their prompts.

The 6-char prompt hash on the row still catches forgotten bumps.
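
A small sketch of how the hash catches a forgotten bump. `computePromptHash` follows the construction used in this package (sha256 over `systemPrompt`, a `\n---\n` separator and the JSON-stringified schema, truncated to 6 hex chars); the prompt texts, schema and `PROMPT_VERSION` constant are made up for illustration.

```typescript
import { createHash } from 'node:crypto';

// Same construction as the package's computePromptHash.
const computePromptHash = (systemPrompt: string, schema: unknown): string => {
  const schemaPart = schema === undefined ? '' : JSON.stringify(schema);
  return createHash('sha256')
    .update(systemPrompt)
    .update('\n---\n')
    .update(schemaPart)
    .digest('hex')
    .slice(0, 6);
};

// Hypothetical prompt edit where the author forgot to bump the version constant.
const PROMPT_VERSION = 'v1.0';
const schema = { properties: { title: { type: 'string' } }, type: 'object' };

const before = {
  hash: computePromptHash('Summarize the topic.', schema),
  version: PROMPT_VERSION,
};
const after = {
  hash: computePromptHash('Summarize the topic in five words.', schema),
  version: PROMPT_VERSION,
};

// Same version, different hash: the two prompt bodies remain distinguishable on the row.
const forgottenBump = before.version === after.version && before.hash !== after.hash;
console.info({ after, before, forgottenBump });
```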

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(input-completion): wire prompt-version metadata at the auto-complete call site

Aligns input auto-complete with the FOLLOW_UP_PROMPT_VERSION convention so
each prompt iteration is recordable as the chat-side tracing lands.

- `INPUT_COMPLETION_PROMPT_VERSION = 'v1.0'` declared next to
  `chainInputCompletion` — bump together with the prompt body.
- `fetchPresetTaskResult` accepts optional `metadata` and forwards it to
  `getChatCompletion`; the existing chat path already plumbs metadata to
  `ModelRuntime.chat` options.
- `InputEditor` call site passes
  `{ scenario: 'input_completion', promptVersion }`.

Note: `llm_generation_tracing` currently only fires from
`onGenerateObjectComplete`. Input completion is a `chat` call, so this
metadata is forward-looking until a chat-side tracing hook lands.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 🐛 fix(llm-generation-tracing): collapse bucketDir path.join args to silence turbopack glob warning

Turbopack's static analyzer treats `path.join(root, dyn1, dyn2)` as a
multi-segment glob pattern and warned that it could match ~12k files in
the project. Compose the relative subdir as a single string first, so
`path.join` only sees one dynamic segment.

Behavior unchanged — the resulting path is identical.
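
A before/after sketch of the change, assuming hypothetical values for the root directory and the two dynamic segments (the actual `bucketDir` helper is not reproduced here):

```typescript
import { join } from 'node:path';

// Hypothetical values for illustration only.
const root = '/tmp/.llm-generation-tracing';
const scenario = 'home_brief';
const bucket = 'v1.0-abcdef';

// Before: two dynamic segments, which the analyzer reads as a multi-segment glob.
const beforePath = join(root, scenario, bucket);

// After: compose the relative subdir first, so join only sees one dynamic segment.
const subdir = `${scenario}/${bucket}`;
const afterPath = join(root, subdir);

console.info(beforePath === afterPath);
```

`join` normalizes the embedded `/` separator, so both forms resolve to the same path.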

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(input-completion): route auto-complete through generateObject for tracing

Auto-complete is the first preset-task caller migrated to the structured-
output path so it lands in `llm_generation_tracing` via the existing
`onGenerateObjectComplete` hook. No new server hook, no global chat-side
tracing.

- `chainInputCompletion` now returns `{ messages, schema }` with a minimal
  `{ completion: string }` schema and a stable `INPUT_COMPLETION_SCHEMA_NAME`
  constant. JSON wrapping costs ~15-30 tokens against a 100-token completion
  budget — negligible for the observability win.
- `StructureOutputSchema` / `StructureOutputParams` accept optional
  `metadata`; `aiChatRouter.outputJSON` merges caller metadata over the
  default trigger so `{ scenario, promptVersion, schemaName }` reach
  `ModelRuntime.generateObject` options unchanged.
- `IStructureSchema.description` is now optional to match the zod schema —
  previously the TS type was stricter than runtime validation accepted.
- `InputEditor` switches from `chatService.fetchPresetTaskResult` to
  `aiChatService.generateJSON`, reading `response.completion`. Streaming
  is dropped because auto-complete already buffers the full result before
  inserting; no UX change.
- Reverts the unused `metadata` field that was added to
  `fetchPresetTaskResult` in the previous commit — no current caller needs
  it now that input completion uses the generateObject path.

Bumps `INPUT_COMPLETION_PROMPT_VERSION` to v2.0 because the system prompt
gained an "output the completion field" instruction.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(aiGeneration): extract the runtime-init + generateObject dance into a service

Every server-side caller that produces structured output was repeating the
same two-step ritual: `initModelRuntimeFromDB(...)` → `runtime.generateObject(payload, { metadata })`.
`AiGenerationService` collapses it into one call so future cross-cutting
concerns (default metadata, retry, observability hooks) have one place to
land.

- New `src/server/services/aiGeneration/index.ts` exposes
  `generateObject<T>(input, options)` and is unit-tested for provider
  resolution + payload/metadata pass-through.
- `aiChatRouter.outputJSON` and `FollowUpActionService.extract` migrated to
  the service (other callers move organically when next touched).
- Drops the unused `keyVaultsPayload` field from `StructureOutputParams`
  and the placeholder at the InputEditor call site — key vaults are
  server-resolved from DB, the client never supplies them.
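
As a rough sketch of the shape such a service might take: the class below collapses the two steps into one call. All names and signatures (`AiGenerationServiceSketch`, `Runtime`, `InitRuntime`, the `input` fields) are hypothetical simplifications, and the runtime initializer is injected so the sketch stays self-contained.

```typescript
// Hypothetical, simplified types; the real runtime and service signatures are not shown here.
interface GenerateOptions {
  metadata?: Record<string, unknown>;
}

interface Runtime {
  generateObject(payload: unknown, options: GenerateOptions): Promise<unknown>;
}

// Stand-in for initModelRuntimeFromDB, injected to keep the sketch testable.
type InitRuntime = (userId: string, provider: string) => Promise<Runtime>;

class AiGenerationServiceSketch {
  private readonly userId: string;
  private readonly initRuntime: InitRuntime;

  constructor(userId: string, initRuntime: InitRuntime) {
    this.userId = userId;
    this.initRuntime = initRuntime;
  }

  // One call replaces the init + generateObject pair at every call site.
  async generateObject<T>(
    input: { payload: unknown; provider: string },
    options: GenerateOptions = {},
  ): Promise<T> {
    const runtime = await this.initRuntime(this.userId, input.provider);
    // Payload and options are forwarded unchanged; the cast is unchecked in this sketch.
    return (await runtime.generateObject(input.payload, options)) as T;
  }
}
```

Callers then depend on the service alone, which gives cross-cutting concerns such as default metadata or retries a single seam.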

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(tracing): centralize TRACING_SCENARIOS const + inject AiGenerationService via trpc ctx

- New `packages/const/src/llmGenerationTracing.ts` exports `TRACING_SCENARIOS`
  + `TracingScenario` type — the single directory where every known scenario
  name lives. Adds `@lobechat/const` as a workspace dep on llm-generation-
  tracing so `TRACING_SCENARIO_REGISTRY` can reference the same literals.
- Callers (FollowUpActionService, InputEditor) replace `'follow_up'` /
  `'input_completion'` string literals with `TRACING_SCENARIOS.FollowUp` /
  `.InputCompletion`, so a typo or a rename fails the type-check instead of
  silently drifting on the row.
- `AiGenerationService` is now injected into the `aiChatProcedure` ctx
  middleware alongside `aiChatService`; `outputJSON` consumes it via
  `ctx.aiGenerationService` instead of new-ing it inside the handler.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(llm-generation-tracing): add lt/llm-tracing CLI + drop local-only storage_key

- Add `lt` / `llm-tracing` CLI under @lobechat/llm-generation-tracing with
  `list` (recent records, --scenario filter, --json) and `inspect` (by
  tracing_id prefix or latest, --full, --json).
- `FileTracingStore.save` now returns `{ key: null }` so dev DB rows leave
  `storage_key` empty instead of recording a non-resolvable local path; S3
  store remains the source of truth for the real key. Add helpers
  `findByTracingId` / `getLatest` used by the CLI.
- Wire `agentId` and `topicId` into `input_completion` tracing metadata
  from the chat input auto-complete call site.
- Default `FileTracingStore` whenever NODE_ENV=development (drop the
  ENABLE_LLM_GENERATION_TRACING_LOCAL opt-in env var).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* 💄 style(llm-generation-tracing): prettier CLI output (tree + colors)

Mirror the @lobechat/agent-tracing viewer style:

- Inline ANSI color helpers (dim/bold/cyan/magenta/green/yellow/red).
- Compact single-line header with id, scenario, version, model, status,
  time — replaces the multi-line bullet list.
- Tree structure with `├─`/`└─` connectors instead of `── section ──`
  banners.
- input arrays render per-message (role + char count + preview) rather
  than dumping raw JSON.
- Small single-key outputs (e.g. `{ completion: "怎么样" }`) collapse
  to inline `key: "value"`.
- `lt list` switches to a colored, properly padded table.

Default view stays compact; --full expands system_prompt / input /
schema bodies.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ♻️ refactor(llm-generation-tracing): split `tracing` config out of `metadata`

`options.metadata` was overloaded — half tracing-specific structured fields
(scenario / promptVersion / schemaName / agentId / topicId / ...), half
free-form jsonb passthrough. Callers couldn't tell which was which, and the
inputHint was always auto-extracted (useless when the prompt wraps the user's
text in a template).

This commit introduces a dedicated `tracing` option:

- Add `TracingOptions` to @lobechat/llm-generation-tracing — the typed shape
  callers import (agentId / topicId / inputHint / scenario / promptVersion /
  schemaName / systemPrompt / parentTracingId / metadata).
- Add loose `tracing?: Record<string, unknown>` to GenerateObjectOptions and
  StructureOutputParams / StructureOutputSchema so the field flows through
  the runtime + TRPC.
- Tracing hook now reads `context.options.tracing` for structured fields; it
  still falls back to `metadata.trigger` for the cross-cutting trigger string
  (ModelRuntime itself uses metadata.trigger for timing logs, so trigger
  stays on metadata).
- Service `record()` accepts an explicit `inputHint`; otherwise falls back
  to auto-extraction from the first user message. Always truncated.
- Free-form jsonb fields move to `tracing.metadata` (was unknown-key passthrough
  on `metadata`).
- Call sites updated:
  - FollowUpAction now passes `tracing: { scenario, promptVersion, schemaName,
    topicId }` (previously `metadata`).
  - InputCompletion now passes `tracing: { agentId, topicId, inputHint: input,
    scenario, promptVersion, schemaName }` — `inputHint` is the user's actual
    typed text, not the wrapper prompt's first user message.
  - `aiChat.outputJSON` router forwards both metadata and tracing.
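
The `inputHint` resolution described above (explicit value first, otherwise the first user message, always truncated) might look roughly like this. `resolveInputHint`, `Message` and the 200-character limit are assumptions for illustration; the actual truncation length is not stated here.

```typescript
interface Message {
  content: string;
  role: 'assistant' | 'system' | 'user';
}

// Hypothetical limit; the real truncation length is not given in this message.
const INPUT_HINT_MAX_LENGTH = 200;

// Prefer the caller-supplied hint; otherwise fall back to the first user message.
// The result is truncated in both cases.
function resolveInputHint(messages: Message[], explicit?: string): string | null {
  const raw = explicit ?? messages.find((message) => message.role === 'user')?.content;
  if (raw === undefined) return null;
  return raw.slice(0, INPUT_HINT_MAX_LENGTH);
}

// A template-wrapped prompt: the auto-extracted hint would be the wrapper text,
// while the explicit hint carries what the user actually typed.
const wrapped: Message[] = [
  { content: 'You complete user input.', role: 'system' },
  { content: 'Complete the following text: "how is the wea"', role: 'user' },
];

console.info(resolveInputHint(wrapped));
console.info(resolveInputHint(wrapped, 'how is the wea'));
```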

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Update inputCompletion.ts

* 🐛 fix(llm-generation-tracing): stop duplicating provider into the row's metadata jsonb

`provider` is already a first-class column on the `llm_generation_tracing`
row, so auto-stamping it into the `metadata` jsonb column on every call was
pure noise. The hook now writes the caller-supplied `tracing.metadata`
verbatim — empty/undefined when the caller had nothing to add.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Arvin Xu
2026-05-23 18:14:23 +08:00
committed by GitHub
parent ddb5794826
commit cce14911d1
49 changed files with 2926 additions and 71 deletions
@@ -106,6 +106,7 @@ vertex-ai-key.json
# Agent tracing snapshots
.agent-tracing/
.llm-generation-tracing/
# AI coding tools
.local/
@@ -264,6 +264,7 @@
"@lobechat/fetch-sse": "workspace:*",
"@lobechat/file-loaders": "workspace:*",
"@lobechat/heterogeneous-agents": "workspace:*",
"@lobechat/llm-generation-tracing": "workspace:*",
"@lobechat/local-file-shell": "workspace:*",
"@lobechat/markdown-patch": "workspace:*",
"@lobechat/memory-user-memory": "workspace:*",
@@ -10,6 +10,7 @@ export * from './file';
export * from './interests';
export * from './klavis';
export * from './layoutTokens';
export * from './llmGenerationTracing';
export * from './lobehubSkill';
export * from './message';
export * from './meta';
@@ -0,0 +1,25 @@
/**
* Canonical directory of every `llm_generation_tracing` scenario value.
*
* Add to this map whenever a new caller pipes through the tracing path so
* there's one place to scan for all known scenarios. Values are the literal
* strings persisted on the row's `scenario` column — keep them stable, they
* are dashboard / partition keys.
*/
export const TRACING_SCENARIOS = {
AgentSignal: 'agent_signal',
AgentWelcome: 'agent_welcome',
FollowUp: 'follow_up',
HomeBrief: 'home_brief',
InputCompletion: 'input_completion',
MemoryExtract: 'memory_extract',
SignalFeedbackDomain: 'signal_feedback_domain',
SignalFeedbackSatisfaction: 'signal_feedback_satisfaction',
SignalSkillIntent: 'signal_skill_intent',
SignalSkillManagement: 'signal_skill_management',
SignupEmailReview: 'signup_email_review',
TopicTitle: 'topic_title',
Unknown: 'unknown',
} as const;
export type TracingScenario = (typeof TRACING_SCENARIOS)[keyof typeof TRACING_SCENARIOS];
@@ -0,0 +1,162 @@
// @vitest-environment node
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import { getTestDB } from '../../core/getTestDB';
import { llmGenerationTracing, users } from '../../schemas';
import type { LobeChatDatabase } from '../../type';
import { LlmGenerationTracingModel } from '../llmGenerationTracing';
const serverDB: LobeChatDatabase = await getTestDB();
const userId = 'llm-gen-trace-test-user';
const otherUserId = 'llm-gen-trace-other-user';
beforeEach(async () => {
await serverDB.delete(users);
await serverDB.insert(users).values([{ id: userId }, { id: otherUserId }]);
});
afterEach(async () => {
await serverDB.delete(llmGenerationTracing);
await serverDB.delete(users);
});
describe('LlmGenerationTracingModel', () => {
describe('record', () => {
it('inserts a row and returns the generated uuid', async () => {
const model = new LlmGenerationTracingModel(serverDB, userId);
const { id } = await model.record({
inputHint: 'hello world',
inputTokens: 120,
latencyMs: 850,
model: 'gpt-4o-mini',
outputTokens: 40,
promptHash: 'ab1fc3',
promptVersion: 'v1.0',
provider: 'openai',
scenario: 'home_brief',
schemaName: 'HomeBriefOutputSchema',
success: true,
trigger: 'home_brief',
});
expect(id).toMatch(/^[0-9a-f-]{36}$/);
const row = await model.findById(id);
expect(row).toMatchObject({
id,
inputHint: 'hello world',
inputTokens: 120,
latencyMs: 850,
metadata: {},
model: 'gpt-4o-mini',
outputTokens: 40,
promptHash: 'ab1fc3',
promptVersion: 'v1.0',
provider: 'openai',
scenario: 'home_brief',
schemaName: 'HomeBriefOutputSchema',
success: true,
trigger: 'home_brief',
userId,
validationFailed: false,
});
expect(row?.createdAt).toBeInstanceOf(Date);
});
it('records a failure with error fields and validation flag', async () => {
const model = new LlmGenerationTracingModel(serverDB, userId);
const { id } = await model.record({
errorCode: 'validation_failed',
errorDetail: 'output missing required field "summary"',
latencyMs: 1200,
model: 'gpt-4o',
promptHash: 'cccccc',
promptVersion: 'v1.0',
provider: 'openai',
scenario: 'topic_title',
success: false,
validationFailed: true,
});
const row = await model.findById(id);
expect(row).toMatchObject({
errorCode: 'validation_failed',
errorDetail: 'output missing required field "summary"',
success: false,
validationFailed: true,
});
});
});
describe('updateFeedback', () => {
it('writes feedback columns and the updated timestamp', async () => {
const model = new LlmGenerationTracingModel(serverDB, userId);
const { id } = await model.record({
promptHash: 'aaaaaa',
promptVersion: 'v1.0',
scenario: 'agent_welcome',
success: true,
});
await model.updateFeedback(id, {
data: { clicked_question_index: 1 },
score: 1,
signal: 'positive',
source: 'explicit_thumbs',
});
const row = await model.findById(id);
expect(row).toMatchObject({
feedbackData: { clicked_question_index: 1 },
feedbackScore: 1,
feedbackSignal: 'positive',
feedbackSource: 'explicit_thumbs',
});
expect(row?.feedbackUpdatedAt).toBeInstanceOf(Date);
});
it("does not touch another user's row", async () => {
const owner = new LlmGenerationTracingModel(serverDB, userId);
const intruder = new LlmGenerationTracingModel(serverDB, otherUserId);
const { id } = await owner.record({
promptHash: 'aaaaaa',
promptVersion: 'v1.0',
scenario: 'follow_up',
success: true,
});
await intruder.updateFeedback(id, {
signal: 'negative',
source: 'manual_edit',
});
const row = await owner.findById(id);
expect(row?.feedbackSignal).toBeNull();
});
});
describe('findById / listRecent', () => {
it('only returns rows owned by the caller', async () => {
const owner = new LlmGenerationTracingModel(serverDB, userId);
const stranger = new LlmGenerationTracingModel(serverDB, otherUserId);
const { id } = await owner.record({
promptHash: 'aaaaaa',
promptVersion: 'v1.0',
scenario: 'memory_extract',
success: true,
});
expect(await stranger.findById(id)).toBeNull();
expect(await stranger.listRecent()).toHaveLength(0);
const rows = await owner.listRecent();
expect(rows).toHaveLength(1);
expect(rows[0].id).toBe(id);
});
});
});
@@ -0,0 +1,121 @@
import { and, desc, eq } from 'drizzle-orm';
import type {
LlmGenerationFeedbackSignal,
LlmGenerationFeedbackSource,
NewLlmGenerationTracing,
} from '../schemas/llmGenerationTracing';
import { llmGenerationTracing } from '../schemas/llmGenerationTracing';
import type { LobeChatDatabase } from '../type';
export interface RecordLlmGenerationParams {
agentId?: string | null;
costUsd?: number | null;
errorCode?: string | null;
errorDetail?: string | null;
inputHash?: string | null;
inputHint?: string | null;
inputTokens?: number | null;
latencyMs?: number | null;
metadata?: Record<string, unknown>;
model?: string | null;
outputTokens?: number | null;
parentTracingId?: string | null;
promptHash: string;
promptVersion: string;
provider?: string | null;
scenario: string;
schemaName?: string | null;
spanId?: string | null;
storageKey?: string | null;
success: boolean;
topicId?: string | null;
traceId?: string | null;
trigger?: string | null;
validationFailed?: boolean;
}
export interface UpdateLlmGenerationFeedbackParams {
data?: Record<string, unknown>;
score?: number | null;
signal: LlmGenerationFeedbackSignal;
source: LlmGenerationFeedbackSource;
}
export class LlmGenerationTracingModel {
private readonly db: LobeChatDatabase;
private readonly userId: string;
constructor(db: LobeChatDatabase, userId: string) {
this.db = db;
this.userId = userId;
}
async record(params: RecordLlmGenerationParams): Promise<{ id: string }> {
const values: NewLlmGenerationTracing = {
agentId: params.agentId ?? null,
costUsd: params.costUsd ?? null,
errorCode: params.errorCode ?? null,
errorDetail: params.errorDetail ?? null,
inputHash: params.inputHash ?? null,
inputHint: params.inputHint ?? null,
inputTokens: params.inputTokens ?? null,
latencyMs: params.latencyMs ?? null,
metadata: params.metadata ?? {},
model: params.model ?? null,
outputTokens: params.outputTokens ?? null,
parentTracingId: params.parentTracingId ?? null,
promptHash: params.promptHash,
promptVersion: params.promptVersion,
provider: params.provider ?? null,
scenario: params.scenario,
schemaName: params.schemaName ?? null,
spanId: params.spanId ?? null,
storageKey: params.storageKey ?? null,
success: params.success,
topicId: params.topicId ?? null,
traceId: params.traceId ?? null,
trigger: params.trigger ?? null,
userId: this.userId,
validationFailed: params.validationFailed ?? false,
};
const [row] = await this.db
.insert(llmGenerationTracing)
.values(values)
.returning({ id: llmGenerationTracing.id });
return { id: row.id };
}
async updateFeedback(id: string, params: UpdateLlmGenerationFeedbackParams): Promise<void> {
await this.db
.update(llmGenerationTracing)
.set({
feedbackData: params.data,
feedbackScore: params.score ?? null,
feedbackSignal: params.signal,
feedbackSource: params.source,
feedbackUpdatedAt: new Date(),
})
.where(and(eq(llmGenerationTracing.id, id), eq(llmGenerationTracing.userId, this.userId)));
}
async findById(id: string) {
const [row] = await this.db
.select()
.from(llmGenerationTracing)
.where(and(eq(llmGenerationTracing.id, id), eq(llmGenerationTracing.userId, this.userId)))
.limit(1);
return row ?? null;
}
async listRecent(limit = 50) {
return this.db
.select()
.from(llmGenerationTracing)
.where(eq(llmGenerationTracing.userId, this.userId))
.orderBy(desc(llmGenerationTracing.createdAt))
.limit(limit);
}
}
@@ -0,0 +1,18 @@
{
"name": "@lobechat/llm-generation-tracing",
"version": "1.0.0",
"private": true,
"exports": {
".": "./src/index.ts",
"./store": "./src/store/types.ts"
},
"main": "./src/index.ts",
"bin": {
"llm-tracing": "./src/cli/index.ts",
"lt": "./src/cli/index.ts"
},
"dependencies": {
"@lobechat/const": "workspace:*",
"commander": "^13.1.0"
}
}
@@ -0,0 +1,18 @@
#!/usr/bin/env bun
import { Command } from 'commander';
import { registerInspectCommand } from './inspect';
import { registerListCommand } from './list';
const program = new Command();
program
.name('llm-tracing')
.description('Inspect local llm-generation-tracing records under .llm-generation-tracing/')
.version('1.0.0');
registerInspectCommand(program);
registerListCommand(program);
program.parse();
@@ -0,0 +1,35 @@
import type { Command } from 'commander';
import { FileTracingStore } from '../store/file-store';
import type { TracingPayload } from '../types';
import { renderPayloadDetail } from '../viewer';
export function registerInspectCommand(program: Command) {
program
.command('inspect', { isDefault: true })
.alias('i')
.description('Inspect a tracing record by tracing_id prefix (defaults to latest)')
.argument('[tracingId]', 'tracing_id or prefix; omit to inspect the latest record')
.option('-j, --json', 'Output the raw JSON payload')
.option('-f, --full', 'Show full system_prompt / input / output (no truncation)')
.action(async (tracingId: string | undefined, opts: { full?: boolean; json?: boolean }) => {
const store = new FileTracingStore();
let record: TracingPayload | null;
if (tracingId) {
record = await store.findByTracingId(tracingId);
} else {
record = await store.getLatest();
}
if (!record) {
console.error(
tracingId
? `No tracing record matched id prefix: ${tracingId}`
: 'No tracing records found. Run a generateObject call first (NODE_ENV=development).',
);
process.exit(1);
}
console.info(opts.json ? JSON.stringify(record, null, 2) : renderPayloadDetail(record, opts));
});
}
@@ -0,0 +1,21 @@
import type { Command } from 'commander';
import { FileTracingStore } from '../store/file-store';
import { renderSummaryTable } from '../viewer';
export function registerListCommand(program: Command) {
program
.command('list')
.alias('ls')
.description('List recent llm-generation-tracing records (newest first)')
.option('-l, --limit <n>', 'Max number of records to show', '20')
.option('-s, --scenario <name>', 'Filter by scenario (e.g. input_completion, topic_title)')
.option('-j, --json', 'Output as JSON instead of a table')
.action(async (opts: { json?: boolean; limit: string; scenario?: string }) => {
const store = new FileTracingStore();
let limit = Number.parseInt(opts.limit, 10);
if (Number.isNaN(limit) || limit < 1) limit = 20;
const summaries = await store.list({ limit, scenario: opts.scenario });
console.info(opts.json ? JSON.stringify(summaries, null, 2) : renderSummaryTable(summaries));
});
}
@@ -0,0 +1,19 @@
export { computeInputHash, computePromptHash } from './promptHash';
export type { ResolveScenarioInput } from './registry';
export {
resolveScenario,
TRACING_SCENARIO_REGISTRY,
UNKNOWN_PROMPT_VERSION,
UNKNOWN_SCENARIO,
} from './registry';
export { DEFAULT_DIR, FileTracingStore } from './store/file-store';
export type { ITracingStore, SaveResult } from './store/types';
export type {
LlmGenerationFeedbackSignal,
ScenarioDefinition,
TracingErrorPayload,
TracingModelMetadata,
TracingOptions,
TracingPayload,
TracingSummary,
} from './types';
@@ -0,0 +1,42 @@
import { describe, expect, it } from 'vitest';
import { computeInputHash, computePromptHash } from './promptHash';
describe('computePromptHash', () => {
it('returns a 6-char hex digest', () => {
const hash = computePromptHash('you are a helpful agent', { type: 'object' });
expect(hash).toHaveLength(6);
expect(hash).toMatch(/^[0-9a-f]{6}$/);
});
it('is stable across calls with the same input', () => {
const a = computePromptHash('prompt-A', { foo: 1 });
const b = computePromptHash('prompt-A', { foo: 1 });
expect(a).toBe(b);
});
it('changes when system prompt changes', () => {
const a = computePromptHash('prompt-A', { foo: 1 });
const b = computePromptHash('prompt-B', { foo: 1 });
expect(a).not.toBe(b);
});
it('changes when schema changes', () => {
const a = computePromptHash('prompt', { foo: 1 });
const b = computePromptHash('prompt', { foo: 2 });
expect(a).not.toBe(b);
});
it('treats missing schema and empty schema differently', () => {
const undef = computePromptHash('prompt', undefined);
const empty = computePromptHash('prompt', {});
expect(undef).not.toBe(empty);
});
});
describe('computeInputHash', () => {
it('returns a full-length sha256 hex', () => {
expect(computeInputHash('hello')).toHaveLength(64);
expect(computeInputHash({ a: 1 })).toMatch(/^[0-9a-f]{64}$/);
});
});
@@ -0,0 +1,29 @@
import { createHash } from 'node:crypto';
const SHORT_LENGTH = 6;
/**
* Compute the 6-char prompt hash used to detect silently-mutated prompts.
*
* Hash input: `systemPrompt + '\n---\n' + JSON.stringify(schema)` — schema MUST
* be a deterministic JSON form (e.g. zod-to-json-schema). Keys in objects are
* stringified in insertion order; the caller is responsible for normalising.
*/
export const computePromptHash = (systemPrompt: string, schema: unknown): string => {
const schemaPart = schema === undefined ? '' : JSON.stringify(schema);
const hash = createHash('sha256');
hash.update(systemPrompt);
hash.update('\n---\n');
hash.update(schemaPart);
return hash.digest('hex').slice(0, SHORT_LENGTH);
};
/**
* sha256 of normalized input — for dedup / cache-hit analysis. Returned as the
* full hex digest; truncate at the caller if storage size matters.
*/
export const computeInputHash = (input: unknown): string => {
const hash = createHash('sha256');
hash.update(typeof input === 'string' ? input : JSON.stringify(input));
return hash.digest('hex');
};
@@ -0,0 +1,52 @@
import { describe, expect, it } from 'vitest';
import {
resolveScenario,
TRACING_SCENARIO_REGISTRY,
UNKNOWN_PROMPT_VERSION,
UNKNOWN_SCENARIO,
} from './registry';
describe('TRACING_SCENARIO_REGISTRY', () => {
it('maps known triggers to scenario names (no versions)', () => {
expect(TRACING_SCENARIO_REGISTRY.topic).toBe('topic_title');
expect(TRACING_SCENARIO_REGISTRY.memory).toBe('memory_extract');
});
});
describe('resolveScenario', () => {
it('looks the scenario up by trigger and uses the caller-supplied promptVersion', () => {
expect(resolveScenario({ promptVersion: 'v3.1', trigger: 'topic' })).toEqual({
promptVersion: 'v3.1',
scenario: 'topic_title',
});
});
it('honours an explicit scenario override even when trigger has a registry mapping', () => {
expect(
resolveScenario({
promptVersion: 'v2.1',
scenario: 'signal_skill_intent',
trigger: 'agent_signal',
}),
).toEqual({ promptVersion: 'v2.1', scenario: 'signal_skill_intent' });
});
it('falls back to UNKNOWN_PROMPT_VERSION when no version is provided', () => {
expect(resolveScenario({ scenario: 'custom_thing' })).toEqual({
promptVersion: UNKNOWN_PROMPT_VERSION,
scenario: 'custom_thing',
});
});
it('falls back to the unknown scenario sentinel when neither matches', () => {
expect(resolveScenario({ trigger: 'does_not_exist' })).toEqual({
promptVersion: UNKNOWN_PROMPT_VERSION,
scenario: UNKNOWN_SCENARIO,
});
expect(resolveScenario({})).toEqual({
promptVersion: UNKNOWN_PROMPT_VERSION,
scenario: UNKNOWN_SCENARIO,
});
});
});
@@ -0,0 +1,67 @@
import { TRACING_SCENARIOS, type TracingScenario } from '@lobechat/const';
import type { ScenarioDefinition } from './types';
/**
* Stable `trigger → scenario` mapping. Maps a `RequestTrigger` value to the
* default scenario name used for tracing.
*
* Triggers that fan out into multiple scenarios (e.g. `agent_signal` →
* `signal_skill_intent` / `signal_feedback_satisfaction` / ...) map only to a
* generic default here; those callers should pass an explicit
* `tracing.scenario` override instead.
*
* **Note on prompt versions**: version intentionally lives next to the prompt
* it describes (see `tracing.ts` files / `*_PROMPT_VERSION` constants near the
* `generateObject` call site). When the prompt or schema changes, bump that
* local constant — keeping the version next to the thing it versions avoids
* the drift you'd get from a central table that nobody remembers to update.
*
* For the full directory of scenario *names*, see `@lobechat/const`
* `TRACING_SCENARIOS`.
*/
export const TRACING_SCENARIO_REGISTRY: Record<string, TracingScenario> = {
agent_signal: TRACING_SCENARIOS.AgentSignal,
memory: TRACING_SCENARIOS.MemoryExtract,
signup_email_llm_review: TRACING_SCENARIOS.SignupEmailReview,
topic: TRACING_SCENARIOS.TopicTitle,
};
export const UNKNOWN_SCENARIO = TRACING_SCENARIOS.Unknown;
export const UNKNOWN_PROMPT_VERSION = 'v0';
export interface ResolveScenarioInput {
/**
* Prompt version supplied by the caller. Conventionally a `v<major>.<minor>`
* constant declared next to the prompt definition. Missing values resolve to
* `UNKNOWN_PROMPT_VERSION` so tracing still records the row.
*/
promptVersion?: string;
/** Override scenario name (e.g. `signal_skill_intent`); takes precedence over registry. */
scenario?: string;
/** RequestTrigger value (string form). */
trigger?: string;
}
/**
* Pick the `{ scenario, promptVersion }` for a tracing record.
*
* Resolution order:
* 1. `input.scenario` if provided
* 2. registry lookup by `input.trigger`
* 3. `UNKNOWN_SCENARIO` sentinel
*
* `promptVersion` is always passed through from the caller (or
* `UNKNOWN_PROMPT_VERSION` if absent). The registry never assigns versions —
* they live with the prompt.
*/
export const resolveScenario = (input: ResolveScenarioInput): ScenarioDefinition => {
const scenario =
input.scenario ??
(input.trigger ? TRACING_SCENARIO_REGISTRY[input.trigger] : undefined) ??
UNKNOWN_SCENARIO;
return {
promptVersion: input.promptVersion ?? UNKNOWN_PROMPT_VERSION,
scenario,
};
};
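The resolution order documented above can be exercised with a small standalone sketch. The registry contents and scenario strings below are illustrative stand-ins (the real values come from `TRACING_SCENARIOS` in `@lobechat/const`, which this diff does not show); only the three-step fallback is meant to mirror `resolveScenario`.

```typescript
// Hypothetical stand-ins: the real scenario names live in `@lobechat/const`.
const DEMO_REGISTRY: Record<string, string> = {
  memory: 'memory_extract',
  topic: 'topic_title',
};
const DEMO_UNKNOWN_SCENARIO = 'unknown';
const DEMO_UNKNOWN_VERSION = 'v0';

interface DemoInput {
  promptVersion?: string;
  scenario?: string;
  trigger?: string;
}

// Same fallback chain as `resolveScenario`: explicit scenario, then registry, then sentinel.
const demoResolve = (input: DemoInput): { promptVersion: string; scenario: string } => ({
  promptVersion: input.promptVersion ?? DEMO_UNKNOWN_VERSION,
  scenario:
    input.scenario ??
    (input.trigger ? DEMO_REGISTRY[input.trigger] : undefined) ??
    DEMO_UNKNOWN_SCENARIO,
});

// 1. An explicit scenario wins even when the trigger is registered.
const explicit = demoResolve({ scenario: 'signal_skill_intent', trigger: 'memory' });
// 2. Without an override, the trigger is looked up in the registry.
const viaRegistry = demoResolve({ promptVersion: 'v1.0', trigger: 'topic' });
// 3. An unregistered trigger falls through to the sentinel.
const fallback = demoResolve({ trigger: 'unmapped_trigger' });
```

Note that the version never comes from the registry: `explicit` and `fallback` both carry the placeholder version because the caller supplied none.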
@@ -0,0 +1,117 @@
import fs from 'node:fs/promises';
import os from 'node:os';
import path from 'node:path';
import { afterEach, beforeEach, describe, expect, it } from 'vitest';
import type { TracingPayload } from '../types';
import { DEFAULT_DIR, FileTracingStore } from './file-store';
let tmpRoot: string;
const makePayload = (overrides: Partial<TracingPayload> = {}): TracingPayload => ({
created_at: new Date('2026-05-22T11:22:33.444Z').getTime(),
prompt_hash: 'abcdef',
prompt_version: 'v1.0',
scenario: 'home_brief',
tracing_id: '00000000-0000-0000-0000-000000000001',
version: '1.0',
...overrides,
});
beforeEach(async () => {
tmpRoot = await fs.mkdtemp(path.join(os.tmpdir(), 'llm-gen-trace-test-'));
});
afterEach(async () => {
await fs.rm(tmpRoot, { force: true, recursive: true });
});
describe('FileTracingStore', () => {
it('writes payloads under {scenario}/{promptVersion}-{promptHash}/ and returns a null key (local-only)', async () => {
const store = new FileTracingStore(tmpRoot);
const payload = makePayload();
const { key } = await store.save(payload);
// Local store is non-shareable — DB should leave `storage_key` empty.
expect(key).toBeNull();
const dir = path.join(tmpRoot, DEFAULT_DIR, 'home_brief', 'v1.0-abcdef');
const entries = await fs.readdir(dir);
const jsonFiles = entries.filter((f) => f.endsWith('.json'));
expect(jsonFiles).toHaveLength(1);
const raw = await fs.readFile(path.join(dir, jsonFiles[0]), 'utf8');
expect(JSON.parse(raw)).toMatchObject({
prompt_hash: 'abcdef',
scenario: 'home_brief',
tracing_id: payload.tracing_id,
});
});
it('updates the latest.json symlink to point at the freshest record', async () => {
const store = new FileTracingStore(tmpRoot);
await store.save(makePayload({ tracing_id: 'aaaa-1' }));
await store.save(
makePayload({
created_at: new Date('2026-05-22T11:30:00.000Z').getTime(),
scenario: 'topic_title',
tracing_id: 'bbbb-2',
}),
);
const latestPath = path.join(tmpRoot, DEFAULT_DIR, 'latest.json');
const target = await fs.realpath(latestPath);
const content = await fs.readFile(target, 'utf8');
expect(JSON.parse(content)).toMatchObject({
scenario: 'topic_title',
tracing_id: 'bbbb-2',
});
});
it('lists recent records as flat summaries newest-first', async () => {
const store = new FileTracingStore(tmpRoot);
await store.save(
makePayload({
created_at: new Date('2026-05-22T11:00:00.000Z').getTime(),
scenario: 'home_brief',
tracing_id: 'aaaa',
}),
);
await store.save(
makePayload({
created_at: new Date('2026-05-22T12:00:00.000Z').getTime(),
scenario: 'memory_extract',
tracing_id: 'bbbb',
}),
);
const summaries = await store.list();
expect(summaries.map((s) => s.tracing_id)).toEqual(['bbbb', 'aaaa']);
});
it('round-trips a payload via get() using the on-disk file path', async () => {
const store = new FileTracingStore(tmpRoot);
const payload = makePayload({
input: { messages: [{ content: 'hi', role: 'user' }] },
output: { topic: 'greeting' },
});
await store.save(payload);
// save() returns a null key, so locate the file on disk and read via its path.
const dir = path.join(tmpRoot, DEFAULT_DIR, 'home_brief', 'v1.0-abcdef');
const jsonFile = (await fs.readdir(dir)).find((f) => f.endsWith('.json'));
if (!jsonFile) throw new Error('expected a saved tracing file to exist');
const loaded = await store.get(path.join(dir, jsonFile));
expect(loaded).toMatchObject({
input: { messages: [{ content: 'hi', role: 'user' }] },
output: { topic: 'greeting' },
tracing_id: payload.tracing_id,
});
});
it('returns null when get() targets a missing key', async () => {
const store = new FileTracingStore(tmpRoot);
expect(await store.get('not/a/real/key.json')).toBeNull();
});
});
@@ -0,0 +1,167 @@
import type { Dirent } from 'node:fs';
import fs from 'node:fs/promises';
import path from 'node:path';
import type { TracingPayload, TracingSummary } from '../types';
import type { ITracingStore, SaveResult } from './types';
export const DEFAULT_DIR = '.llm-generation-tracing';
const safeSegment = (value: string): string => value.replaceAll(/[^\w.-]+/g, '_') || 'unknown';
/**
* Local / dev / desktop store. Writes plain JSON (no compression) so contents
* can be inspected with `cat`. Layout mirrors the S3 key pattern:
*
* .llm-generation-tracing/{scenario}/{promptVersion}-{promptHash}/{file}.json
*
* Keeps a top-level `latest.json` symlink pointing at the most recent record.
*/
export class FileTracingStore implements ITracingStore {
private readonly root: string;
constructor(rootDir?: string) {
this.root = path.resolve(rootDir ?? process.cwd(), DEFAULT_DIR);
}
async save(record: TracingPayload): Promise<SaveResult> {
const dir = this.bucketDir(record);
await fs.mkdir(dir, { recursive: true });
const ts = new Date(record.created_at).toISOString().replaceAll(':', '-');
const shortId = safeSegment(record.tracing_id.slice(0, 12));
const filename = `${ts}_${shortId}.json`;
const filePath = path.join(dir, filename);
await fs.writeFile(filePath, JSON.stringify(record, null, 2), 'utf8');
await this.updateLatestSymlink(filePath);
// Local-only path — return null so the DB row's `storage_key` stays empty.
// The CLI rediscovers files by walking `.llm-generation-tracing/`.
return { key: null };
}
async get(key: string): Promise<TracingPayload | null> {
const target = path.isAbsolute(key) ? key : path.join(this.root, key);
try {
const content = await fs.readFile(target, 'utf8');
return JSON.parse(content) as TracingPayload;
} catch {
return null;
}
}
async list(options?: { limit?: number; scenario?: string }): Promise<TracingSummary[]> {
const limit = options?.limit ?? 20;
const files = await this.collectFiles();
files.sort((a, b) => (a.filename < b.filename ? 1 : -1));
const summaries: TracingSummary[] = [];
for (const file of files) {
if (summaries.length >= limit) break;
try {
const content = await fs.readFile(file.fullPath, 'utf8');
const record = JSON.parse(content) as TracingPayload;
if (options?.scenario && record.scenario !== options.scenario) continue;
summaries.push({
created_at: record.created_at,
model: record.model_metadata?.model,
prompt_version: record.prompt_version,
scenario: record.scenario,
success: !record.error,
tracing_id: record.tracing_id,
validation_failed: record.validation_failed,
});
} catch {
// skip corrupted files
}
}
return summaries;
}
/**
* CLI helper: find a payload by tracing_id prefix. Returns the most-recent
* match when several rows share the same prefix (e.g. truncated short id).
*/
async findByTracingId(prefix: string): Promise<TracingPayload | null> {
const files = await this.collectFiles();
files.sort((a, b) => (a.filename < b.filename ? 1 : -1));
for (const file of files) {
try {
const content = await fs.readFile(file.fullPath, 'utf8');
const record = JSON.parse(content) as TracingPayload;
if (record.tracing_id.startsWith(prefix)) return record;
} catch {
// skip corrupted files
}
}
return null;
}
/** CLI helper: resolve the `latest.json` symlink (or fall back to the newest file). */
async getLatest(): Promise<TracingPayload | null> {
const latestPath = path.join(this.root, 'latest.json');
try {
const real = await fs.realpath(latestPath);
const content = await fs.readFile(real, 'utf8');
return JSON.parse(content) as TracingPayload;
} catch {
// symlink missing or unreadable — fall back to newest by filename order
}
const files = await this.collectFiles();
if (files.length === 0) return null;
files.sort((a, b) => (a.filename < b.filename ? 1 : -1));
try {
const content = await fs.readFile(files[0].fullPath, 'utf8');
return JSON.parse(content) as TracingPayload;
} catch {
return null;
}
}
private bucketDir(record: TracingPayload): string {
// Compose the relative segment as a single string so Turbopack / Webpack
// static analyzers don't try to enumerate path.join's multi-arg pattern
// (which fans out into a glob match against the project).
const sub = `${safeSegment(record.scenario)}/${safeSegment(record.prompt_version)}-${safeSegment(record.prompt_hash)}`;
return path.join(this.root, sub);
}
private async updateLatestSymlink(filePath: string): Promise<void> {
const latestPath = path.join(this.root, 'latest.json');
try {
await fs.unlink(latestPath);
} catch {
// ignore — no previous symlink
}
try {
await fs.symlink(path.relative(this.root, filePath), latestPath);
} catch {
// file systems without symlink support (e.g. Windows w/o dev mode) — silently skip
}
}
private async collectFiles(): Promise<{ filename: string; fullPath: string }[]> {
const results: { filename: string; fullPath: string }[] = [];
const walk = async (dir: string): Promise<void> => {
let entries: Dirent[];
try {
entries = await fs.readdir(dir, { withFileTypes: true });
} catch {
return;
}
for (const entry of entries) {
const full = path.join(dir, entry.name);
if (entry.isDirectory()) {
await walk(full);
} else if (entry.name.endsWith('.json') && entry.name !== 'latest.json') {
results.push({ filename: entry.name, fullPath: full });
}
}
};
await walk(this.root);
return results;
}
}
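To make the on-disk layout concrete, here is a minimal sketch of how a record maps to its relative path. It re-states the `safeSegment` / `bucketDir` / filename logic from `save()` as pure functions with no filesystem access; the `demo*` names and the sample record are hypothetical.

```typescript
// Same sanitizer shape as `safeSegment`: collapse anything outside [A-Za-z0-9_.-] to "_".
const demoSafeSegment = (value: string): string => value.replace(/[^\w.-]+/g, '_') || 'unknown';

interface DemoRecord {
  created_at: number;
  prompt_hash: string;
  prompt_version: string;
  scenario: string;
  tracing_id: string;
}

// Builds the path relative to the `.llm-generation-tracing/` root.
const demoRelativePath = (record: DemoRecord): string => {
  const bucket = `${demoSafeSegment(record.scenario)}/${demoSafeSegment(record.prompt_version)}-${demoSafeSegment(record.prompt_hash)}`;
  // ISO timestamps contain ":" which is not portable in filenames, so swap it for "-".
  const ts = new Date(record.created_at).toISOString().replace(/:/g, '-');
  const shortId = demoSafeSegment(record.tracing_id.slice(0, 12));
  return `${bucket}/${ts}_${shortId}.json`;
};

const demoPath = demoRelativePath({
  created_at: Date.UTC(2026, 4, 22, 11, 22, 33, 444), // month index 4 is May
  prompt_hash: 'abcdef',
  prompt_version: 'v1.0',
  scenario: 'home brief', // the space is sanitized to "_"
  tracing_id: '00000000-0000-0000-0000-000000000001',
});
// demoPath === 'home_brief/v1.0-abcdef/2026-05-22T11-22-33.444Z_00000000-000.json'
```

Because every filename starts with a fixed-width ISO timestamp, the plain string comparison used by `list()`, `findByTracingId()` and `getLatest()` orders files chronologically across scenario folders.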
@@ -0,0 +1,20 @@
import type { TracingPayload, TracingSummary } from '../types';
export interface SaveResult {
/**
* Canonical, globally addressable key for the saved payload (e.g. an S3
* object key). `null` when the payload was persisted only to a local /
* non-shareable location — the service should then leave `storage_key`
* empty in the DB rather than record a path no other process can resolve.
*/
key: string | null;
}
export interface ITracingStore {
/** Optional retrieval — used by CLI / debug tooling only. */
get?: (key: string) => Promise<TracingPayload | null>;
/** Optional listing — used by CLI / debug tooling only. */
list?: (options?: { limit?: number }) => Promise<TracingSummary[]>;
/** Persist a tracing payload; returns the storage key for cross-reference, or `null` for local-only stores. */
save: (record: TracingPayload) => Promise<SaveResult>;
}
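A minimal sketch of the contract, assuming a hypothetical in-memory store: the types are trimmed local copies of the shapes in this diff, and `InMemoryTracingStore` / `persistAndDescribe` are illustrative names, not part of the package. The point is the caller-side branch on a nullable `key`.

```typescript
// Local copies of the shapes from the diff, trimmed to what the sketch needs.
interface DemoPayload {
  created_at: number;
  scenario: string;
  tracing_id: string;
}
interface DemoSaveResult {
  key: string | null;
}
interface DemoStore {
  get?: (key: string) => Promise<DemoPayload | null>;
  save: (record: DemoPayload) => Promise<DemoSaveResult>;
}

// Hypothetical shareable store: returns a real key, so a caller could persist it.
class InMemoryTracingStore implements DemoStore {
  private readonly blobs = new Map<string, DemoPayload>();

  async save(record: DemoPayload): Promise<DemoSaveResult> {
    const key = `${record.scenario}/${record.tracing_id}.json`;
    this.blobs.set(key, record);
    return { key };
  }

  async get(key: string): Promise<DemoPayload | null> {
    return this.blobs.get(key) ?? null;
  }
}

// Hypothetical local-only store: persists nothing addressable, so it reports `null`.
const localOnlyStore: DemoStore = { save: async () => ({ key: null }) };

// Caller-side handling of the nullable key: only record it when the store produced one.
const persistAndDescribe = async (store: DemoStore, record: DemoPayload): Promise<string> => {
  const { key } = await store.save(record);
  return key === null ? 'storage_key left empty' : `storage_key = ${key}`;
};
```

Keeping `get` and `list` optional lets a write-only backend satisfy the interface, while CLI tooling can feature-detect retrieval before calling it.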
@@ -0,0 +1,100 @@
export type LlmGenerationFeedbackSignal = 'positive' | 'negative' | 'neutral';
export interface TracingErrorPayload {
code?: string;
message?: string;
stack?: string;
}
export interface TracingModelMetadata {
[key: string]: unknown;
finish_reason?: string;
model?: string;
provider?: string;
}
/**
* Blob payload written to the store. Mirrors the design's Blob schema —
* the DB row stores indexable summary columns; this carries the full prompt /
* input / output detail for offline analysis.
*
* The `version` field identifies the payload layout so readers can handle future schema changes.
*/
export interface TracingPayload {
created_at: number;
error?: TracingErrorPayload;
input?: unknown;
model_metadata?: TracingModelMetadata;
output?: unknown;
prompt_hash: string;
prompt_version: string;
raw_output?: string;
scenario: string;
schema?: unknown;
system_prompt?: string;
/** Unique id of the tracing row in the DB. Used by the store to build the key. */
tracing_id: string;
validation_failed?: boolean;
version: '1.0';
}
export interface TracingSummary {
created_at: number;
latency_ms?: number;
model?: string;
prompt_version: string;
scenario: string;
success: boolean;
tracing_id: string;
validation_failed?: boolean;
}
export interface ScenarioDefinition {
/** Human-bumped prompt version (e.g. `v1.0`). */
promptVersion: string;
/** Symbolic scenario name, used for grouping and partitioning storage. */
scenario: string;
}
/**
* Caller-facing tracing config for a single `generateObject` call. Passed
* through `GenerateObjectOptions.tracing` and consumed by the tracing hook
* to populate the `llm_generation_tracing` DB row + off-DB blob.
*
* Every field is optional — the hook fills sensible defaults (auto-extracted
* `inputHint`, registry-resolved `scenario`, `messages[0]` as system prompt).
* Supply the fields explicitly to keep the DB row scannable.
*/
export interface TracingOptions {
/** Owning agent ID; persisted to `agent_id`. */
agentId?: string;
/**
* Short snippet stored on `input_hint`. Pass the user's actual typed text
* when the prompt wraps it in a template — otherwise the auto-extracted
* hint ends up being the wrapper's first user message (e.g.
* `Before cursor: "…" After cursor: "…"`) instead of what the user wrote.
*/
inputHint?: string;
/**
* Free-form context written to the row's `metadata` jsonb column. Use this
* for ad-hoc fields that don't deserve a typed slot (e.g. correlation IDs).
*/
metadata?: Record<string, unknown>;
/** Parent tracing row for chained generations. */
parentTracingId?: string;
/** Semantic prompt version (e.g. `v1.0`). */
promptVersion?: string;
/** Scenario name; falls back to registry lookup by `trigger`. */
scenario?: string;
/** Structured-output schema identifier. */
schemaName?: string;
/**
* Override for the prompt-hash system text. Defaults to `messages[0]`
* when it's a system message.
*/
systemPrompt?: string;
/** Topic / conversation ID. */
topicId?: string;
/** RequestTrigger string. */
trigger?: string;
}
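As an illustration of the `inputHint` guidance, the sketch below shows a call site whose prompt wraps the user's text in a template. Everything here is hypothetical (the wrapper, the scenario and schema names, the `demoCall` shape); it only shows which value belongs in `inputHint` versus in the message content.

```typescript
// Trimmed local copy of the caller-facing options; field names follow `TracingOptions`.
interface DemoTracingOptions {
  inputHint?: string;
  promptVersion?: string;
  scenario?: string;
  schemaName?: string;
}

// Hypothetical prompt wrapper, modeled on the "Before cursor / After cursor" template.
const wrapForAutocomplete = (before: string, after: string): string =>
  `Before cursor: "${before}" After cursor: "${after}"`;

const typedText = 'How can I ';

const demoTracing: DemoTracingOptions = {
  // What the user typed, not the wrapped message an auto-extracted hint would pick up.
  inputHint: typedText,
  promptVersion: 'v1.0',
  scenario: 'input_completion', // assumed name, for illustration only
  schemaName: 'input_completion', // assumed name, for illustration only
};

// Shape of a hypothetical `generateObject(payload, options)` call.
const demoCall = {
  options: { tracing: demoTracing },
  payload: { messages: [{ content: wrapForAutocomplete(typedText, ''), role: 'user' }] },
};
```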
@@ -0,0 +1,232 @@
import type { TracingPayload, TracingSummary } from '../types';
// ANSI color helpers — keep parity with @lobechat/agent-tracing's viewer.
const dim = (s: string) => `\x1B[2m${s}\x1B[22m`;
const bold = (s: string) => `\x1B[1m${s}\x1B[22m`;
const green = (s: string) => `\x1B[32m${s}\x1B[39m`;
const red = (s: string) => `\x1B[31m${s}\x1B[39m`;
const yellow = (s: string) => `\x1B[33m${s}\x1B[39m`;
const cyan = (s: string) => `\x1B[36m${s}\x1B[39m`;
const magenta = (s: string) => `\x1B[35m${s}\x1B[39m`;
const PREVIEW_CHARS = 120;
const FULL_PREVIEW_CHARS = 4000;
const padEnd = (text: string, width: number): string =>
text.length >= width ? text : text + ' '.repeat(width - text.length);
const formatTime = (timestamp: number): string => {
const d = new Date(timestamp);
const pad = (n: number) => String(n).padStart(2, '0');
return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} ${pad(d.getHours())}:${pad(d.getMinutes())}:${pad(d.getSeconds())}`;
};
const previewLine = (text: string, maxLen: number): string => {
const single = text.replaceAll(/\s+/g, ' ').trim();
if (single.length <= maxLen) return single;
return `${single.slice(0, maxLen - 1)}…`;
};
const stringify = (value: unknown): string =>
typeof value === 'string' ? value : JSON.stringify(value, null, 2);
const statusOf = (record: TracingPayload): string => {
if (record.error) return red('error');
if (record.validation_failed) return yellow('validation-fail');
return green('ok');
};
export const renderSummaryTable = (summaries: TracingSummary[]): string => {
if (summaries.length === 0) return dim('No tracing records found.');
const rows = summaries.map((s) => ({
created: formatTime(s.created_at),
id: s.tracing_id.slice(0, 12),
model: s.model ?? '-',
scenario: s.scenario,
statusRaw: s.success ? (s.validation_failed ? 'validation-fail' : 'ok') : 'error',
version: s.prompt_version,
}));
// Column widths include a 2-space right gutter so the next column never
// butts up against this one.
const widths = {
created: 19,
id: 14,
model: Math.max(8, 'MODEL'.length, ...rows.map((r) => r.model.length)) + 2,
scenario: Math.max(10, 'SCENARIO'.length, ...rows.map((r) => r.scenario.length)) + 2,
status: Math.max(8, 'STATUS'.length, 'validation-fail'.length) + 2,
version: Math.max(7, 'VERSION'.length, ...rows.map((r) => r.version.length)) + 2,
};
const colorStatus = (status: string): string =>
status === 'ok' ? green(status) : status === 'error' ? red(status) : yellow(status);
// Pad first (using raw text length), then colorize — keeps column alignment
// independent of ANSI escape codes.
const header =
bold(padEnd('ID', widths.id)) +
bold(padEnd('SCENARIO', widths.scenario)) +
bold(padEnd('VERSION', widths.version)) +
bold(padEnd('MODEL', widths.model)) +
bold(padEnd('STATUS', widths.status)) +
bold('CREATED');
const padCell = (text: string, width: number): string =>
' '.repeat(Math.max(0, width - text.length));
const body = rows.map(
(r) =>
cyan(r.id) +
padCell(r.id, widths.id) +
r.scenario +
padCell(r.scenario, widths.scenario) +
r.version +
padCell(r.version, widths.version) +
magenta(r.model) +
padCell(r.model, widths.model) +
colorStatus(r.statusRaw) +
padCell(r.statusRaw, widths.status) +
dim(r.created),
);
const ruleWidth =
widths.id + widths.scenario + widths.version + widths.model + widths.status + widths.created;
return [header, dim('─'.repeat(ruleWidth)), ...body].join('\n');
};
const roleColor = (role: string): ((s: string) => string) => {
if (role === 'user') return green;
if (role === 'assistant') return cyan;
if (role === 'system') return magenta;
return yellow;
};
const renderInputMessages = (input: unknown, full: boolean): string[] => {
if (!Array.isArray(input))
return [` ${dim(previewLine(stringify(input), full ? FULL_PREVIEW_CHARS : PREVIEW_CHARS))}`];
const lines: string[] = [];
for (let i = 0; i < input.length; i++) {
const msg = (input[i] ?? {}) as { content?: unknown; role?: string };
const role = msg.role ?? 'unknown';
const rawContent =
typeof msg.content === 'string' ? msg.content : JSON.stringify(msg.content ?? '');
const charCount = rawContent.length;
const charLabel = charCount > 0 ? dim(` ${charCount} chars`) : '';
const connector = i === input.length - 1 ? '└─' : '├─';
lines.push(` ${dim(connector)} ${dim(`[${i}]`)} ${roleColor(role)(role)}${charLabel}`);
if (rawContent) {
const preview = full ? rawContent : previewLine(rawContent, PREVIEW_CHARS);
lines.push(` ${dim(preview)}`);
}
}
return lines;
};
const renderOutput = (output: unknown, full: boolean): string => {
// Inline tiny single-key objects: `{ completion: "怎么样" }` → `completion: "怎么样"`
if (
output &&
typeof output === 'object' &&
!Array.isArray(output) &&
Object.keys(output).length === 1
) {
const [key, value] = Object.entries(output)[0];
const rendered = typeof value === 'string' ? `"${value}"` : JSON.stringify(value);
if (rendered.length <= PREVIEW_CHARS) return `${cyan(key)}: ${rendered}`;
}
const text = stringify(output);
if (full || text.length <= PREVIEW_CHARS * 2) return text;
return previewLine(text, PREVIEW_CHARS);
};
export const renderPayloadDetail = (
record: TracingPayload,
options: { full?: boolean },
): string => {
const full = !!options.full;
const lines: string[] = [];
// Header — single compact line.
const modelLabel =
record.model_metadata?.provider || record.model_metadata?.model
? ` ${magenta(`${record.model_metadata?.provider ?? '-'} / ${record.model_metadata?.model ?? '-'}`)}`
: '';
lines.push(
bold('LLM Generation') +
` ${cyan(record.tracing_id.slice(0, 12))}` +
` scenario:${record.scenario}` +
` ${dim(record.prompt_version)}` +
modelLabel +
` ${statusOf(record)}` +
` ${dim(formatTime(record.created_at))}`,
);
if (record.error) {
lines.push(`${red('Error:')} ${record.error.code ?? '-'}: ${record.error.message ?? '-'}`);
}
// Build sections as a tree. Each section is rendered as `├─ label meta` then optional indented body.
type Section = { body?: string[]; label: string; meta?: string };
const sections: Section[] = [];
if (record.system_prompt) {
sections.push({
body: full ? [` ${dim(record.system_prompt)}`] : undefined,
label: 'system_prompt',
meta: full
? dim(`${record.system_prompt.length} chars`)
: dim(`${record.system_prompt.length} chars (use --full to expand)`),
});
}
if (record.input !== undefined) {
const isArr = Array.isArray(record.input);
const count = isArr ? (record.input as unknown[]).length : 1;
sections.push({
body: renderInputMessages(record.input, full),
label: 'input',
meta: isArr ? dim(`${count} message${count === 1 ? '' : 's'}`) : undefined,
});
}
if (record.output !== undefined) {
sections.push({
body: [` ${renderOutput(record.output, full)}`],
label: 'output',
});
}
if (record.raw_output) {
sections.push({
body: [` ${dim(full ? record.raw_output : previewLine(record.raw_output, PREVIEW_CHARS))}`],
label: 'raw_output',
meta: yellow('validation_failed'),
});
}
if (record.schema !== undefined) {
const schemaText = stringify(record.schema);
sections.push({
body: full ? [` ${dim(schemaText)}`] : undefined,
label: 'schema',
meta: dim(
full ? `${schemaText.length} chars` : `${schemaText.length} chars (use --full to expand)`,
),
});
}
for (let i = 0; i < sections.length; i++) {
const s = sections[i];
const isLast = i === sections.length - 1;
const connector = isLast ? '└─' : '├─';
lines.push(`${dim(connector)} ${bold(s.label)}${s.meta ? ` ${s.meta}` : ''}`);
if (s.body) {
for (const line of s.body) lines.push(line);
}
}
return lines.join('\n');
};
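The "pad first, then colorize" rule used by `renderSummaryTable` can be shown in isolation. This is a small sketch with hypothetical helper names; `stripAnsi` exists only so the alignment can be checked without a terminal.

```typescript
const demoCyan = (s: string): string => `\x1B[36m${s}\x1B[39m`;

// Padding is computed from the raw text, so escape codes never count toward the width.
const demoCell = (text: string, width: number, color: (s: string) => string): string =>
  color(text) + ' '.repeat(Math.max(0, width - text.length));

// Strips SGR color sequences to recover what a terminal would display.
const stripAnsi = (s: string): string => s.replace(/\x1B\[[0-9;]*m/g, '');

const demoRow = demoCell('abc', 6, demoCyan) + demoCell('ok', 4, demoCyan) + '|';
const visible = stripAnsi(demoRow);
// visible === 'abc   ok  |'
```

Padding the already-colored string instead would measure the escape bytes as visible characters, so colored columns would come out narrower than plain ones.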
@@ -0,0 +1,7 @@
{
"compilerOptions": {
"outDir": "./dist"
},
"extends": "../../tsconfig.json",
"include": ["src/**/*.ts"]
}
@@ -0,0 +1,11 @@
import { defineConfig } from 'vitest/config';
export default defineConfig({
test: {
coverage: {
provider: 'v8',
reporter: ['text', 'json', 'lcov', 'text-summary'],
},
environment: 'node',
},
});
@@ -157,7 +157,15 @@ export abstract class BaseMemoryExtractor<
span.addEvent('gen_ai.request.send');
const result = await this.runtime.generateObject(payload, {
- metadata: { trigger: RequestTrigger.Memory },
+ metadata: {
+ // Optional backlink to the job-level memory trace blob; the
+ // llm_generation_tracing hook persists it under metadata so a
+ // per-call row can be traced back to the job-level dump.
+ ...(options?.parentMemoryTraceKey
+ ? { parent_memory_trace_key: options.parentMemoryTraceKey }
+ : {}),
+ trigger: RequestTrigger.Memory,
+ },
});
span.addEvent('gen_ai.response.receive');
@@ -43,6 +43,13 @@ export interface ExtractorOptions extends ExtractorTemplateProps {
) => Promise<void> | void;
};
messageIds?: string[];
/**
* S3 key of the parent memory job trace. When provided, propagated into
* the per-call `llm_generation_tracing` row as `metadata.parent_memory_trace_key`,
* giving offline analysis a backlink from a single generateObject call to the
* job-level memory trace that spawned it.
*/
parentMemoryTraceKey?: string;
sourceId?: string;
userId?: string;
}
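The conditional spread used at the extractor call site keeps the backlink key out of the metadata entirely when no parent trace exists, rather than writing `undefined`. A minimal sketch follows; `buildMemoryMetadata`, the sample key, and the literal `'memory'` trigger value are assumptions made for illustration.

```typescript
// Hypothetical helper: builds hook metadata, omitting the backlink key when it is absent.
const buildMemoryMetadata = (parentMemoryTraceKey?: string): Record<string, unknown> => ({
  ...(parentMemoryTraceKey ? { parent_memory_trace_key: parentMemoryTraceKey } : {}),
  trigger: 'memory', // assumed string form of `RequestTrigger.Memory`
});

const withKey = buildMemoryMetadata('memory-traces/job-123.json');
const withoutKey = buildMemoryMetadata();
```

Omitting the key matters once the object is serialized into a jsonb column: rows without a parent trace stay free of a null-valued field that queries would otherwise have to filter out.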
@@ -716,6 +716,46 @@ describe('ModelRuntime', () => {
await expect(runtime.generateObject(genObjPayload)).resolves.toEqual({ result: 'ok' });
});
it('onGenerateObjectComplete fires on success with output, latency and usage', async () => {
const onGenerateObjectComplete = vi.fn();
const { runtime, mockRuntimeAI } = createMockRuntime({ onGenerateObjectComplete });
const usage = { totalInputTokens: 50, totalOutputTokens: 20, cost: 0.001 };
mockRuntimeAI.generateObject.mockImplementation(async (_p: any, opts: any) => {
await opts?.onUsage?.(usage);
return { result: 'ok' };
});
await runtime.generateObject(genObjPayload);
expect(onGenerateObjectComplete).toHaveBeenCalledTimes(1);
const [data, context] = onGenerateObjectComplete.mock.calls[0];
expect(data).toMatchObject({ output: { result: 'ok' }, success: true, usage });
expect(data.latencyMs).toBeGreaterThanOrEqual(0);
expect(context.payload).toBe(genObjPayload);
});
it('onGenerateObjectComplete fires on failure with structured error and is awaited before throw', async () => {
const onGenerateObjectComplete = vi.fn();
const { runtime, mockRuntimeAI } = createMockRuntime({ onGenerateObjectComplete });
const cause = new Error('boom');
mockRuntimeAI.generateObject.mockRejectedValue(cause);
await expect(runtime.generateObject(genObjPayload)).rejects.toBe(cause);
expect(onGenerateObjectComplete).toHaveBeenCalledTimes(1);
const [data] = onGenerateObjectComplete.mock.calls[0];
expect(data.success).toBe(false);
expect(data.error?.message).toBe('boom');
});
it('hook errors thrown from onGenerateObjectComplete are swallowed and do not surface', async () => {
const onGenerateObjectComplete = vi.fn().mockRejectedValue(new Error('hook broke'));
const { runtime, mockRuntimeAI } = createMockRuntime({ onGenerateObjectComplete });
mockRuntimeAI.generateObject.mockResolvedValue({ result: 'ok' });
await expect(runtime.generateObject(genObjPayload)).resolves.toEqual({ result: 'ok' });
expect(onGenerateObjectComplete).toHaveBeenCalledTimes(1);
});
});
describe('embeddings hooks', () => {
@@ -84,6 +84,25 @@ export interface ModelRuntimeHooks {
context: { options?: EmbeddingsOptions; payload: EmbeddingsPayload },
) => void | Promise<void>;
/**
* Always fires after `generateObject` returns or throws — success or failure.
* Use this for full-lifecycle observability (per-call tracing, prompt analytics).
* Unlike `onGenerateObjectFinal`, this fires regardless of whether the runtime
* surfaces a `usage` callback, so the gap of "succeeded but no usage" is covered.
*
* Hook failures are swallowed and logged — they must not interfere with the response.
*/
onGenerateObjectComplete?: (
data: {
error?: { code?: string; message?: string; stack?: string };
latencyMs: number;
output?: unknown;
success: boolean;
usage?: ModelUsage;
},
context: { options?: GenerateObjectOptions; payload: GenerateObjectPayload },
) => void | Promise<void>;
onGenerateObjectError?: (
error: ChatCompletionErrorPayload,
context: { options?: GenerateObjectOptions; payload: GenerateObjectPayload },
@@ -286,16 +305,46 @@ export class ModelRuntime {
}
async generateObject(payload: GenerateObjectPayload, options?: GenerateObjectOptions) {
const startedAt = Date.now();
let usageCapture: ModelUsage | undefined;
const fireComplete = async (data: {
error?: { code?: string; message?: string; stack?: string };
output?: unknown;
success: boolean;
}) => {
if (!this._hooks?.onGenerateObjectComplete) return;
try {
await this._hooks.onGenerateObjectComplete(
{
error: data.error,
latencyMs: Date.now() - startedAt,
output: data.output,
success: data.success,
usage: usageCapture,
},
{ options, payload },
);
} catch (e) {
// Hook failures must not affect the caller — log and move on.
console.error('[ModelRuntime] onGenerateObjectComplete hook error:', e);
}
};
try {
await this._hooks?.beforeGenerateObject?.(payload, options);
- const finalOptions = this._hooks?.onGenerateObjectFinal
+ const needsUsageCapture =
+ this._hooks?.onGenerateObjectFinal || this._hooks?.onGenerateObjectComplete;
+ const finalOptions = needsUsageCapture
? {
...options,
onUsage: async (usage: ModelUsage) => {
usageCapture = usage;
await options?.onUsage?.(usage);
try {
- await this._hooks!.onGenerateObjectFinal!({ usage }, { options, payload });
+ await this._hooks?.onGenerateObjectFinal?.({ usage }, { options, payload });
} catch (e) {
// Hook failures (billing, tracing) must not interfere with response completion
console.error('[ModelRuntime] onGenerateObjectFinal hook error:', e);
@@ -304,7 +353,9 @@ export class ModelRuntime {
}
: options;
- return await this._runtime.generateObject!(payload, finalOptions);
+ const output = await this._runtime.generateObject!(payload, finalOptions);
+ await fireComplete({ output, success: true });
+ return output;
} catch (error) {
if (this._hooks?.onGenerateObjectError) {
await this._hooks.onGenerateObjectError(error as ChatCompletionErrorPayload, {
@@ -312,6 +363,11 @@ export class ModelRuntime {
payload,
});
}
const err = error as Error & { code?: string };
await fireComplete({
error: { code: err?.code, message: err?.message, stack: err?.stack },
success: false,
});
throw error;
}
}
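The control flow of the completion hook (fire on both paths, swallow hook errors, rethrow the original failure) can be reduced to a standalone sketch. `withCompletionHook` is a hypothetical wrapper, not the `ModelRuntime` API, and it collects hook errors in an array where the real code logs them with `console.error`.

```typescript
type DemoCompletion = { error?: { message?: string }; latencyMs: number; success: boolean };
type DemoHook = (data: DemoCompletion) => void | Promise<void>;

// Collected instead of logged so the sketch stays free of console output.
const swallowedHookErrors: unknown[] = [];

// Runs `call`, then reports the outcome to `hook` whether it resolved or rejected.
async function withCompletionHook<T>(call: () => Promise<T>, hook?: DemoHook): Promise<T> {
  const startedAt = Date.now();
  const fire = async (data: Omit<DemoCompletion, 'latencyMs'>): Promise<void> => {
    if (!hook) return;
    try {
      await hook({ ...data, latencyMs: Date.now() - startedAt });
    } catch (e) {
      // Observability must not change the caller's result.
      swallowedHookErrors.push(e);
    }
  };
  try {
    const output = await call();
    await fire({ success: true });
    return output;
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    await fire({ error: { message }, success: false });
    // The original error is rethrown unchanged after the hook has been awaited.
    throw error;
  }
}
```

Because the hook is awaited before the rethrow, a slow hook delays error propagation; that is presumably why the tracing hook defers its own work (the commit message mentions `next/server.after()` with a microtask fallback).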
@@ -0,0 +1,67 @@
import { describe, expect, it, vi } from 'vitest';
import { mergeModelRuntimeHooks } from './mergeHooks';
import type { ModelRuntimeHooks } from './ModelRuntime';
describe('mergeModelRuntimeHooks', () => {
it('returns undefined when both hooks are empty', () => {
expect(mergeModelRuntimeHooks(undefined, undefined)).toBeUndefined();
});
it('returns the only present hook untouched', () => {
const fn = vi.fn();
const merged = mergeModelRuntimeHooks({ beforeChat: fn }, undefined);
expect(merged?.beforeChat).toBe(fn);
});
it('chains hooks of the same name in a → b order', async () => {
const order: string[] = [];
const a: ModelRuntimeHooks = {
onGenerateObjectComplete: vi.fn(async () => {
order.push('a');
}),
};
const b: ModelRuntimeHooks = {
onGenerateObjectComplete: vi.fn(async () => {
order.push('b');
}),
};
const merged = mergeModelRuntimeHooks(a, b);
await merged?.onGenerateObjectComplete?.(
{ latencyMs: 0, success: true },
{} as Parameters<NonNullable<ModelRuntimeHooks['onGenerateObjectComplete']>>[1],
);
expect(order).toEqual(['a', 'b']);
expect(a.onGenerateObjectComplete).toHaveBeenCalledTimes(1);
expect(b.onGenerateObjectComplete).toHaveBeenCalledTimes(1);
});
it('does not run b when a throws (a is load-bearing)', async () => {
const bSpy = vi.fn();
const merged = mergeModelRuntimeHooks(
{
onGenerateObjectComplete: async () => {
throw new Error('billing failed');
},
},
{ onGenerateObjectComplete: bSpy },
);
await expect(
merged?.onGenerateObjectComplete?.(
{ latencyMs: 0, success: true },
{} as Parameters<NonNullable<ModelRuntimeHooks['onGenerateObjectComplete']>>[1],
),
).rejects.toThrow('billing failed');
expect(bSpy).not.toHaveBeenCalled();
});
it('keeps hooks that exist in only one side without wrapping', () => {
const onlyInA = vi.fn();
const onlyInB = vi.fn();
const merged = mergeModelRuntimeHooks({ beforeChat: onlyInA }, { onChatFinal: onlyInB });
expect(merged?.beforeChat).toBe(onlyInA);
expect(merged?.onChatFinal).toBe(onlyInB);
});
});
@@ -0,0 +1,37 @@
import type { ModelRuntimeHooks } from './ModelRuntime';
/**
* Merge two `ModelRuntimeHooks` instances, chaining handlers that share a key
* so both fire in `a → b` order. Designed for composing layered hooks at the
* `ModelRuntime` construction site (e.g. billing hooks + tracing hooks).
*
* - Returns `undefined` only when both inputs are empty.
* - Chained hooks run sequentially (`a` first, then `b`); the second hook only
* runs if the first resolves. Place load-bearing hooks (the ones whose
* failure should abort the call) in `a`.
*/
export const mergeModelRuntimeHooks = (
a?: ModelRuntimeHooks,
b?: ModelRuntimeHooks,
): ModelRuntimeHooks | undefined => {
if (!a && !b) return undefined;
if (!a) return b;
if (!b) return a;
const merged: ModelRuntimeHooks = { ...a };
for (const key of Object.keys(b) as (keyof ModelRuntimeHooks)[]) {
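const existing = merged[key];
const next = b[key];
// Skip keys explicitly set to `undefined` so the chain never calls a missing handler.
if (!next) continue;
if (!existing) {
(merged[key] as unknown) = next;
continue;
}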
(merged[key] as unknown) = async (...args: unknown[]) => {
await (existing as (...args: unknown[]) => Promise<unknown>)(...args);
await (next as (...args: unknown[]) => Promise<unknown>)(...args);
};
}
return merged;
};
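The chaining rule can be illustrated with a simplified re-statement restricted to two hook names. `DemoHooks` and `demoMerge` are hypothetical stand-ins for `ModelRuntimeHooks` and `mergeModelRuntimeHooks`; the sketch shows the `a → b` ordering and that one-sided hooks are passed through unwrapped.

```typescript
type DemoHooks = {
  onComplete?: (tag: string) => void | Promise<void>;
  onStart?: (tag: string) => void | Promise<void>;
};

// Simplified chaining: shared keys run `a` then `b`; one-sided keys are kept as-is.
const demoMerge = (a: DemoHooks, b: DemoHooks): DemoHooks => {
  const merged: DemoHooks = { ...a };
  for (const key of Object.keys(b) as (keyof DemoHooks)[]) {
    const first = merged[key];
    const second = b[key];
    if (!second) continue;
    merged[key] = first
      ? async (tag: string) => {
          await first(tag);
          await second(tag); // only reached when `first` resolved
        }
      : second;
  }
  return merged;
};

const calls: string[] = [];
const billing: DemoHooks = {
  onComplete: (tag) => {
    calls.push(`billing:${tag}`);
  },
};
const tracingHooks: DemoHooks = {
  onComplete: (tag) => {
    calls.push(`tracing:${tag}`);
  },
  onStart: (tag) => {
    calls.push(`start:${tag}`);
  },
};
const demoMerged = demoMerge(billing, tracingHooks);
```

Putting billing in the `a` slot follows the doc comment's advice: if it throws, the tracing handler in `b` is skipped and the rejection reaches whoever awaits the merged hook.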
@@ -1,6 +1,7 @@
export * from './const/models';
export * from './core/BaseAI';
export { pruneReasoningPayload } from './core/contextBuilders/openai';
export { mergeModelRuntimeHooks } from './core/mergeHooks';
export type { ModelRuntimeHooks } from './core/ModelRuntime';
export { ModelRuntime } from './core/ModelRuntime';
export { createOpenAICompatibleRuntime } from './core/openaiCompatibleFactory';
@@ -36,12 +36,20 @@ export interface GenerateObjectOptions {
*/
headers?: Record<string, any>;
- /** Metadata passed to hooks (billing, tracing, etc.) */
+ /** Free-form context passed to hooks (e.g. billing, routing). */
metadata?: Record<string, unknown>;
onUsage?: (usage: ModelUsage) => void | Promise<void>;
signal?: AbortSignal;
/**
* Structured tracing config consumed by tracing hooks (e.g.
* `llm_generation_tracing`). Loosely typed here so the runtime stays
* tracing-agnostic; callers should import `TracingOptions` from
* `@lobechat/llm-generation-tracing` for the strongly-typed shape.
*/
tracing?: Record<string, unknown>;
/**
* userId for the GenerateObject
*/
@@ -0,0 +1,42 @@
import { describe, expect, it } from 'vitest';
import {
chainInputCompletion,
INPUT_COMPLETION_PROMPT_VERSION,
INPUT_COMPLETION_SCHEMA_NAME,
} from './inputCompletion';
describe('chainInputCompletion', () => {
it('returns a system + user message pair', () => {
const { messages } = chainInputCompletion('How can I ', '');
expect(messages).toHaveLength(2);
expect(messages[0].role).toBe('system');
expect(messages[1].role).toBe('user');
expect(messages[1].content).toContain('Before cursor: "How can I "');
expect(messages[1].content).toContain('After cursor: ""');
});
it('attaches a minimal `{ completion: string }` schema for generateObject', () => {
const { schema } = chainInputCompletion('hi', '');
expect(schema.name).toBe(INPUT_COMPLETION_SCHEMA_NAME);
expect(schema.strict).toBe(true);
expect(schema.schema.required).toEqual(['completion']);
expect(schema.schema.additionalProperties).toBe(false);
expect(schema.schema.properties.completion.type).toBe('string');
});
it('appends conversation context to the system prompt when provided', () => {
const { messages } = chainInputCompletion('write ', '', [
{ content: 'previous response', role: 'assistant' },
{ content: 'previous question', role: 'user' },
]);
const sys = messages[0].content as string;
expect(sys).toContain('Current conversation context');
expect(sys).toContain('assistant: previous response');
expect(sys).toContain('user: previous question');
});
it('exports a version constant the call site can pin to metadata', () => {
expect(INPUT_COMPLETION_PROMPT_VERSION).toMatch(/^v\d+\.\d+$/);
});
});
+74 -23
@@ -1,21 +1,56 @@
import type { ChatStreamPayload, OpenAIChatMessage } from '@lobechat/types';
import type { OpenAIChatMessage } from '@lobechat/types';
export const chainInputCompletion = (
beforeCursor: string,
afterCursor: string,
context?: OpenAIChatMessage[],
): Partial<ChatStreamPayload> => {
let contextBlock = '';
if (context?.length) {
contextBlock = `\n\nCurrent conversation context:
${context.map((m) => `${m.role}: ${m.content}`).join('\n')}`;
}
/**
* Bump when editing the autocomplete system prompt or schema below. Plumbed
* through `metadata.promptVersion` at the call site so per-call tracing
* groups runs by prompt iteration. The 6-char prompt hash on the row catches
* forgotten bumps.
*/
export const INPUT_COMPLETION_PROMPT_VERSION = 'v1.0';
return {
max_tokens: 100,
messages: [
{
content: `You are an autocomplete engine for a chat input box. The user is composing a message to send to an AI assistant. Predict and complete what the USER is typing. Output ONLY the missing text to insert at the cursor.
/**
* Symbolic schema name — also recorded on the tracing row's `schemaName`
* column so prompt iterations and schema renames can be reasoned about
* together.
*/
export const INPUT_COMPLETION_SCHEMA_NAME = 'InputCompletion';
/**
* Minimal `generateObject` schema: a single `completion` string. The JSON
* wrapping overhead is ~15-30 tokens, which is negligible against the model's
* ~100-token completion budget but unlocks per-call tracing via the existing
* `ModelRuntime.generateObject` hook.
*/
export interface InputCompletionSchema {
name: typeof INPUT_COMPLETION_SCHEMA_NAME;
schema: {
additionalProperties: false;
properties: {
completion: { description: string; type: 'string' };
};
required: ['completion'];
type: 'object';
};
strict: true;
}
const INPUT_COMPLETION_SCHEMA: InputCompletionSchema = {
name: INPUT_COMPLETION_SCHEMA_NAME,
schema: {
additionalProperties: false,
properties: {
completion: {
description: 'The missing text to insert at the cursor. Empty string for no suggestion.',
type: 'string',
},
},
required: ['completion'],
type: 'object',
},
strict: true,
};
const SYSTEM_PROMPT = `You are an autocomplete engine for a chat input box. The user is composing a message to send to an AI assistant. Predict and complete what the USER is typing. Return only the missing text to insert at the cursor in the JSON object's \`completion\` field.
CRITICAL RULES:
- You are completing the USER's message, NOT the AI assistant's response
@@ -23,6 +58,7 @@ CRITICAL RULES:
- NEVER generate text that sounds like an AI assistant responding (e.g., "help you", "assist you", "I can help")
- Keep it short and natural, under 15 words
- Match the user's language
- If no completion would be useful, return an empty string
GOOD examples (user perspective):
"How can I " → "optimize my React component's performance?"
@@ -35,13 +71,28 @@ GOOD examples (user perspective):
BAD examples (assistant perspective — NEVER do this):
"How can I " → "help you today?" ← WRONG: this is what an AI assistant says
"Hi" → ", how can I help you?" ← WRONG: assistant greeting
"Let me " → "explain that for you" ← WRONG: assistant offering to explain${contextBlock}`,
role: 'system',
},
{
content: `Before cursor: "${beforeCursor}"\nAfter cursor: "${afterCursor}"`,
role: 'user',
},
"Let me " → "explain that for you" ← WRONG: assistant offering to explain`;
export interface InputCompletionChainResult {
messages: OpenAIChatMessage[];
schema: InputCompletionSchema;
}
export const chainInputCompletion = (
beforeCursor: string,
afterCursor: string,
context?: OpenAIChatMessage[],
): InputCompletionChainResult => {
let contextBlock = '';
if (context?.length) {
contextBlock = `\n\nCurrent conversation context:\n${context.map((m) => `${m.role}: ${m.content}`).join('\n')}`;
}
return {
messages: [
{ content: `${SYSTEM_PROMPT}${contextBlock}`, role: 'system' },
{ content: `Before cursor: "${beforeCursor}"\nAfter cursor: "${afterCursor}"`, role: 'user' },
],
schema: INPUT_COMPLETION_SCHEMA,
};
};
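Since the schema permits an empty `completion` (meaning "no suggestion"), a consumer of this chain will likely want to collapse empty or whitespace-only results to `null`. A minimal sketch of that normalization, assuming the response arrives as a parsed object or `null`; `toSuggestion` and `DemoCompletionResponse` are hypothetical names used only for illustration.

```typescript
// Shape of the object the `{ completion: string }` schema is expected to yield.
interface DemoCompletionResponse {
  completion?: string;
}

// Trim trailing whitespace and treat an empty result as "no suggestion".
const toSuggestion = (response: DemoCompletionResponse | null | undefined): string | null => {
  const completion = response?.completion?.trimEnd();
  return completion || null;
};
```

Only trailing whitespace is trimmed, so a completion that begins with a space (for example to continue after a word) keeps its leading space.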
+23 -2
@@ -194,6 +194,11 @@ export const StructureSchema = z.object({
});
export const StructureOutputSchema = z.object({
/**
* Free-form context forwarded to non-tracing hooks (e.g. billing). Use
* `tracing` for `llm_generation_tracing` config.
*/
metadata: z.record(z.string(), z.unknown()).optional(),
messages: z.array(z.any()),
model: z.string(),
provider: z.string(),
@@ -201,10 +206,16 @@ export const StructureOutputSchema = z.object({
tools: z
.array(z.object({ function: LobeUniformToolSchema, type: z.literal('function') }))
.optional(),
/**
* Structured tracing config (scenario / promptVersion / schemaName /
* agentId / topicId / inputHint / ...). See `TracingOptions` from
* `@lobechat/llm-generation-tracing` for the typed shape.
*/
tracing: z.record(z.string(), z.unknown()).optional(),
});
interface IStructureSchema {
description: string;
description?: string;
name: string;
schema: {
additionalProperties?: boolean;
@@ -216,8 +227,12 @@ interface IStructureSchema {
}
export interface StructureOutputParams {
keyVaultsPayload: string;
messages: OpenAIChatMessage[];
/**
* Free-form context forwarded to non-tracing hooks (e.g. billing). Use
* `tracing` for `llm_generation_tracing` config.
*/
metadata?: Record<string, unknown>;
model: string;
provider: string;
schema?: IStructureSchema;
@@ -226,4 +241,10 @@ export interface StructureOutputParams {
function: LobeUniformTool;
type: 'function';
}[];
/**
* Structured tracing config (scenario / promptVersion / schemaName /
* agentId / topicId / inputHint / ...). See `TracingOptions` from
* `@lobechat/llm-generation-tracing` for the typed shape.
*/
tracing?: Record<string, unknown>;
}
+36 -19
@@ -1,8 +1,13 @@
import { isDesktop } from '@lobechat/const';
import { isDesktop, TRACING_SCENARIOS } from '@lobechat/const';
import { HotkeyEnum, KeyEnum } from '@lobechat/const/hotkeys';
import { HETEROGENEOUS_TYPE_LABELS } from '@lobechat/heterogeneous-agents';
import { chainInputCompletion, escapeXmlAttr } from '@lobechat/prompts';
import { isCommandPressed, merge } from '@lobechat/utils';
import {
chainInputCompletion,
escapeXmlAttr,
INPUT_COMPLETION_PROMPT_VERSION,
INPUT_COMPLETION_SCHEMA_NAME,
} from '@lobechat/prompts';
import { isCommandPressed } from '@lobechat/utils';
import type { IEditor } from '@lobehub/editor';
import { INSERT_MENTION_COMMAND, ReactAutoCompletePlugin, ReactMathPlugin } from '@lobehub/editor';
import { Editor, FloatMenu, useEditorState } from '@lobehub/editor/react';
@@ -16,9 +21,10 @@ import { useHotkeysContext } from 'react-hotkeys-hook';
import { usePasteFile, useUploadFiles } from '@/components/DragUploadZone';
import { useEnterToSend } from '@/hooks/useEnterToSend';
import { useIMECompositionEvent } from '@/hooks/useIMECompositionEvent';
import { chatService } from '@/services/chat';
import { aiChatService } from '@/services/aiChat';
import { useAgentStore } from '@/store/agent';
import { agentByIdSelectors } from '@/store/agent/selectors';
import { useChatStore } from '@/store/chat';
import { useUserStore } from '@/store/user';
import {
labPreferSelectors,
@@ -213,36 +219,47 @@ const InputEditor = memo<{
// mid-text causes nested editor updates that freeze the input
if (afterText.trim()) return null;
const { enabled: _, ...config } = systemAgentSelectors.inputCompletion(
useUserStore.getState(),
);
const config = systemAgentSelectors.inputCompletion(useUserStore.getState());
const context = getMessagesRef.current?.();
const chainParams = chainInputCompletion(input, afterText, context);
const { messages, schema } = chainInputCompletion(input, afterText, context);
const abortController = new AbortController();
abortSignal.addEventListener('abort', () => abortController.abort());
let result = '';
const currentTopicId = useChatStore.getState().activeTopicId;
let response: { completion?: string } | null;
try {
await chatService.fetchPresetTaskResult({
abortController,
onMessageHandle: (chunk) => {
if (chunk.type === 'text') {
result += chunk.text;
}
response = (await aiChatService.generateJSON(
{
messages,
model: config.model,
provider: config.provider,
schema,
tracing: {
agentId,
// Use the user's actual typed text as the row's `input_hint`
// — the wrapped prompt's first user message is templated and
// not human-scannable.
inputHint: input,
promptVersion: INPUT_COMPLETION_PROMPT_VERSION,
scenario: TRACING_SCENARIOS.InputCompletion,
schemaName: INPUT_COMPLETION_SCHEMA_NAME,
topicId: currentTopicId,
},
},
params: merge(config, chainParams),
});
abortController,
)) as { completion?: string } | null;
} catch {
return null;
}
if (abortSignal.aborted) return null;
return result.trimEnd() || null;
const completion = response?.completion?.trimEnd();
return completion || null;
},
[isComposingRef],
[isComposingRef, agentId],
);
const autoCompletePlugin = useMemo(
@@ -0,0 +1,90 @@
// @vitest-environment node
import { promisify } from 'node:util';
import { zstdCompress, zstdDecompress } from 'node:zlib';
import type { TracingPayload } from '@lobechat/llm-generation-tracing';
import { beforeEach, describe, expect, it, vi } from 'vitest';
const compressZstd = promisify(zstdCompress);
const decompressZstd = promisify(zstdDecompress);
const uploadBuffer = vi.fn();
const getFileByteArray = vi.fn();
vi.mock('@/server/modules/S3', () => ({
FileS3: vi.fn(() => ({ getFileByteArray, uploadBuffer })),
}));
const { S3TracingStore, buildTracingKey } = await import('./S3TracingStore');
const samplePayload = (overrides: Partial<TracingPayload> = {}): TracingPayload => ({
created_at: new Date('2026-05-22T11:22:33.444Z').getTime(),
prompt_hash: 'ab1fc3',
prompt_version: 'v1.0',
scenario: 'home_brief',
tracing_id: '00000000-0000-0000-0000-000000000001',
version: '1.0',
...overrides,
});
beforeEach(() => {
uploadBuffer.mockReset().mockResolvedValue(undefined);
getFileByteArray.mockReset();
});
describe('buildTracingKey', () => {
it('lays out scenario / version-hash / date / id with the .json.zst suffix', () => {
const key = buildTracingKey(samplePayload());
expect(key).toBe(
'llm-generation-tracing/home_brief/v1.0-ab1fc3/2026-05-22/00000000-0000-0000-0000-000000000001.json.zst',
);
});
it('sanitises path-unsafe characters in scenario and version segments', () => {
const key = buildTracingKey(samplePayload({ prompt_version: 'v 2/0', scenario: 'odd name!' }));
expect(key).toMatch(
/llm-generation-tracing\/odd_name_\/v_2_0-ab1fc3\/2026-05-22\/00000000-0000-0000-0000-000000000001\.json\.zst/,
);
});
});
describe('S3TracingStore.save', () => {
it('uploads zstd-compressed JSON with the canonical key and content-type', async () => {
const store = new S3TracingStore();
const payload = samplePayload({ input: { messages: [{ role: 'user' }] } });
const { key } = await store.save(payload);
expect(key).toBe(
'llm-generation-tracing/home_brief/v1.0-ab1fc3/2026-05-22/00000000-0000-0000-0000-000000000001.json.zst',
);
expect(uploadBuffer).toHaveBeenCalledTimes(1);
const [callKey, body, contentType] = uploadBuffer.mock.calls[0];
expect(callKey).toBe(key);
expect(contentType).toBe('application/zstd');
expect(Buffer.isBuffer(body)).toBe(true);
expect([body[0], body[1], body[2], body[3]]).toEqual([0x28, 0xb5, 0x2f, 0xfd]);
const roundtripped = JSON.parse((await decompressZstd(body)).toString('utf8'));
expect(roundtripped).toEqual(payload);
});
});
describe('S3TracingStore.get', () => {
it('decompresses a stored payload by key', async () => {
const store = new S3TracingStore();
const payload = samplePayload();
const buf = await compressZstd(Buffer.from(JSON.stringify(payload)));
getFileByteArray.mockResolvedValueOnce(new Uint8Array(buf));
const loaded = await store.get('some/key.json.zst');
expect(loaded).toEqual(payload);
});
it('returns null when the key is missing', async () => {
const store = new S3TracingStore();
getFileByteArray.mockRejectedValueOnce(new Error('NoSuchKey'));
expect(await store.get('missing')).toBeNull();
});
});
@@ -0,0 +1,88 @@
import { promisify } from 'node:util';
import { zstdCompress, zstdDecompress } from 'node:zlib';
import type {
ITracingStore,
SaveResult,
TracingPayload,
TracingSummary,
} from '@lobechat/llm-generation-tracing';
import debug from 'debug';
import { FileS3 } from '@/server/modules/S3';
const compressZstd = promisify(zstdCompress);
const decompressZstd = promisify(zstdDecompress);
const log = debug('lobe-server:llm-generation-tracing:s3');
const TRACE_PREFIX = 'llm-generation-tracing';
const PAYLOAD_SUFFIX = '.json.zst';
const ZSTD_CONTENT_TYPE = 'application/zstd';
const sanitize = (value: string): string => value.replaceAll(/[^\w.-]+/g, '_') || 'unknown';
const dateSegment = (createdAt: number): string => new Date(createdAt).toISOString().slice(0, 10);
/**
* Canonical S3 key for a tracing payload. Same source of truth used by both
* the store's `save()` and the DB row's `storage_key` so the value persisted
* in `llm_generation_tracing.storage_key` always matches the object in S3.
*
* Layout:
* llm-generation-tracing/{scenario}/{promptVersion}-{promptHash}/{yyyy-mm-dd}/{tracingId}.json.zst
*/
export const buildTracingKey = (record: {
created_at: number;
prompt_hash: string;
prompt_version: string;
scenario: string;
tracing_id: string;
}): string =>
[
TRACE_PREFIX,
sanitize(record.scenario),
`${sanitize(record.prompt_version)}-${sanitize(record.prompt_hash)}`,
dateSegment(record.created_at),
`${sanitize(record.tracing_id)}${PAYLOAD_SUFFIX}`,
].join('/');
/**
* S3-backed store for per-call llm_generation_tracing payloads.
*
* Payload is zstd-compressed (level 3) prior to upload; the `.zst` suffix
* advertises the format but Content-Encoding is intentionally omitted to keep
* the object opaque to HTTP middleware (callers decompress explicitly).
*
* Query (`get` / `list`) is left intentionally minimal — analytics queries go
* against the DB row; the S3 blob is the cold artefact for offline review.
*/
export class S3TracingStore implements ITracingStore {
private readonly s3: FileS3;
constructor() {
this.s3 = new FileS3();
}
async save(record: TracingPayload): Promise<SaveResult> {
const key = buildTracingKey(record);
log('Saving tracing payload to S3: %s', key);
const compressed = await compressZstd(Buffer.from(JSON.stringify(record)));
await this.s3.uploadBuffer(key, compressed, ZSTD_CONTENT_TYPE);
return { key };
}
async get(key: string): Promise<TracingPayload | null> {
try {
const bytes = await this.s3.getFileByteArray(key);
const buf = await decompressZstd(Buffer.from(bytes));
return JSON.parse(buf.toString('utf8')) as TracingPayload;
} catch {
return null;
}
}
async list(_options?: { limit?: number }): Promise<TracingSummary[]> {
return [];
}
}
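To make the key layout concrete, here is a standalone restatement of the key-building helpers with a worked input. The `demo*` names are local to this sketch, and it uses `String.prototype.replace` with a global regex in place of `replaceAll`, which behaves the same for this pattern.

```typescript
const DEMO_TRACE_PREFIX = 'llm-generation-tracing';
const DEMO_PAYLOAD_SUFFIX = '.json.zst';

// Collapse every run of characters outside [A-Za-z0-9_.-] into one underscore.
const demoSanitize = (value: string): string => value.replace(/[^\w.-]+/g, '_') || 'unknown';

// toISOString() is always UTC, so the date folder is the UTC calendar day.
const demoDateSegment = (createdAt: number): string =>
  new Date(createdAt).toISOString().slice(0, 10);

interface DemoKeyInput {
  created_at: number;
  prompt_hash: string;
  prompt_version: string;
  scenario: string;
  tracing_id: string;
}

const demoTracingKey = (record: DemoKeyInput): string =>
  [
    DEMO_TRACE_PREFIX,
    demoSanitize(record.scenario),
    `${demoSanitize(record.prompt_version)}-${demoSanitize(record.prompt_hash)}`,
    demoDateSegment(record.created_at),
    `${demoSanitize(record.tracing_id)}${DEMO_PAYLOAD_SUFFIX}`,
  ].join('/');

const demoKey = demoTracingKey({
  created_at: new Date('2026-05-22T11:22:33.444Z').getTime(),
  prompt_hash: 'ab1fc3',
  prompt_version: 'v1.0',
  scenario: 'home_brief',
  tracing_id: '00000000-0000-0000-0000-000000000001',
});
// demoKey:
// llm-generation-tracing/home_brief/v1.0-ab1fc3/2026-05-22/00000000-0000-0000-0000-000000000001.json.zst
```

One consequence of the UTC date segment: a call made late in the evening in a timezone behind UTC lands in the next day's folder, which matters when browsing blobs by date.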
@@ -0,0 +1 @@
export { buildTracingKey, S3TracingStore } from './S3TracingStore';
+13 -3
@@ -1,5 +1,9 @@
import { type GoogleGenAIOptions } from '@google/genai';
import { ModelRuntime, type ModelRuntimeHooks } from '@lobechat/model-runtime';
import {
mergeModelRuntimeHooks,
ModelRuntime,
type ModelRuntimeHooks,
} from '@lobechat/model-runtime';
import { LobeVertexAI } from '@lobechat/model-runtime/vertexai';
import {
type AWSBedrockKeyVault,
@@ -18,6 +22,7 @@ import { getBusinessModelRuntimeHooks } from '@/business/server/model-runtime';
import { AiProviderModel } from '@/database/models/aiProvider';
import { type LobeChatDatabase } from '@/database/type';
import { getLLMConfig } from '@/envs/llm';
import { createLLMGenerationTracingHook } from '@/server/services/llmGenerationTracing/hook';
import { KeyVaultsGateKeeper } from '../KeyVaultsEncrypt';
import apiKeyManager from './apiKeyManager';
@@ -420,8 +425,13 @@ export const initModelRuntimeFromDB = async (
const payload = buildPayloadFromKeyVaults(keyVaults, runtimeProvider);
// 4. Get business hooks (billing in cloud, undefined in OSS)
const hooks = getBusinessModelRuntimeHooks(userId, provider);
const businessHooks = getBusinessModelRuntimeHooks(userId, provider);
// 5. Initialize ModelRuntime with the payload and hooks
// 5. Compose with the per-call llm_generation_tracing hook (no-op when the
// service is unconfigured, so OSS / self-hosted setups pay nothing for it).
const tracingHooks = createLLMGenerationTracingHook(userId, provider);
const hooks = mergeModelRuntimeHooks(businessHooks, tracingHooks);
// 6. Initialize ModelRuntime with the payload and hooks
return initModelRuntimeWithUserPayload(provider, payload, { userId }, hooks);
};
@@ -1013,5 +1013,43 @@ describe('aiChatRouter', () => {
{ metadata: { trigger: 'chat' } },
);
});
it('merges caller metadata over the default trigger', async () => {
const { initModelRuntimeFromDB } = await import('@/server/modules/ModelRuntime');
const mockGenerateObject = vi.fn().mockResolvedValue({ completion: 'hi there' });
vi.mocked(initModelRuntimeFromDB).mockResolvedValue({
generateObject: mockGenerateObject,
} as any);
const caller = aiChatRouter.createCaller({ ...mockCtx, serverDB: {} } as any);
await caller.outputJSON({
messages: [{ content: 'be helpful', role: 'system' }],
metadata: {
promptVersion: 'v2.0',
scenario: 'input_completion',
schemaName: 'InputCompletion',
},
model: 'gpt-4o-mini',
provider: 'openai',
schema: {
name: 'InputCompletion',
schema: {
additionalProperties: false,
properties: { completion: { type: 'string' } },
required: ['completion'],
type: 'object' as const,
},
},
});
expect(mockGenerateObject.mock.calls[0][1]).toEqual({
metadata: {
promptVersion: 'v2.0',
scenario: 'input_completion',
schemaName: 'InputCompletion',
trigger: 'chat',
},
});
});
});
});
+12 -8
@@ -11,9 +11,9 @@ import { ThreadModel } from '@/database/models/thread';
import { TopicModel } from '@/database/models/topic';
import { authedProcedure, router } from '@/libs/trpc/lambda';
import { serverDatabase } from '@/libs/trpc/lambda/middleware';
import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
import { resolveContext } from '@/server/routers/lambda/_helpers/resolveContext';
import { AiChatService } from '@/server/services/aiChat';
import { AiGenerationService } from '@/server/services/aiGeneration';
import { FileService } from '@/server/services/file';
import { archiveToolResultIfNeeded } from '@/server/services/toolExecution/archiveToolResult';
@@ -29,6 +29,7 @@ const aiChatProcedure = authedProcedure.use(serverDatabase).use(async (opts) =>
ctx: {
agentModel: new AgentModel(ctx.serverDB, ctx.userId),
aiChatService: new AiChatService(ctx.serverDB, ctx.userId),
aiGenerationService: new AiGenerationService(ctx.serverDB, ctx.userId),
fileService: new FileService(ctx.serverDB, ctx.userId),
messageModel: new MessageModel(ctx.serverDB, ctx.userId),
threadModel: new ThreadModel(ctx.serverDB, ctx.userId),
@@ -43,19 +44,22 @@ export const aiChatRouter = router({
log('messages count: %d', input.messages.length);
log('schema: %O', input.schema);
log('initializing model runtime from DB with provider: %s', input.provider);
// Read user's provider config from database
const modelRuntime = await initModelRuntimeFromDB(ctx.serverDB, ctx.userId, input.provider);
log('calling generateObject');
const result = await modelRuntime.generateObject(
// Always stamp a trigger on metadata so cross-cutting hooks (timing,
// routing) and the tracing registry have a fallback when the caller
// forgets to set one. `tracing` carries the structured tracing config
// (scenario / promptVersion / schemaName / inputHint / ...).
const result = await ctx.aiGenerationService.generateObject(
{
messages: input.messages,
model: input.model,
provider: input.provider,
schema: input.schema,
tools: input.tools,
},
{ metadata: { trigger: RequestTrigger.Chat } },
{
metadata: { trigger: RequestTrigger.Chat, ...input.metadata },
tracing: input.tracing,
},
);
log('generateObject completed, result: %O', result);
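The precedence in `{ trigger: RequestTrigger.Chat, ...input.metadata }` comes from object-spread ordering: later properties win, so a caller-supplied `trigger` replaces the default while other caller keys are added alongside it. A small sketch, assuming the default trigger value is the string `'chat'` (as the router tests suggest); `buildDemoMetadata` is a hypothetical helper name.

```typescript
// Stand-in for RequestTrigger.Chat; the router tests show it serializes as 'chat'.
const DEMO_DEFAULT_TRIGGER = 'chat';

// Default first, caller metadata second: caller keys override on conflict.
// Spreading `undefined` is a no-op, so a missing metadata bag is safe.
const buildDemoMetadata = (
  callerMetadata?: Record<string, unknown>,
): Record<string, unknown> => ({
  trigger: DEMO_DEFAULT_TRIGGER,
  ...callerMetadata,
});

const withDefault = buildDemoMetadata();
const withExtras = buildDemoMetadata({ promptVersion: 'v2.0' });
const withOverride = buildDemoMetadata({ trigger: 'topic' });
```

Reversing the order (`{ ...callerMetadata, trigger }`) would instead force the default and silently drop a caller's trigger.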
@@ -0,0 +1,94 @@
// @vitest-environment node
import { beforeEach, describe, expect, it, vi } from 'vitest';
import * as ModelRuntimeModule from '@/server/modules/ModelRuntime';
import { AiGenerationService } from './index';
describe('AiGenerationService.generateObject', () => {
const generateObject = vi.fn();
const initSpy = vi.spyOn(ModelRuntimeModule, 'initModelRuntimeFromDB');
beforeEach(() => {
generateObject.mockReset();
initSpy.mockReset();
initSpy.mockResolvedValue({ generateObject } as any);
});
it('initialises the runtime from DB with the caller-supplied provider', async () => {
generateObject.mockResolvedValue({ ok: true });
const ai = new AiGenerationService({} as any, 'user-1');
await ai.generateObject({
messages: [{ content: 'hi', role: 'user' }],
model: 'gpt-4o',
provider: 'openai',
});
expect(initSpy).toHaveBeenCalledWith({}, 'user-1', 'openai');
});
it('forwards messages / model / schema / tools verbatim to the runtime', async () => {
generateObject.mockResolvedValue({ name: 'Atlas' });
const schema = {
name: 'Person',
schema: {
properties: { name: { type: 'string' } },
required: ['name'],
type: 'object' as const,
},
};
const ai = new AiGenerationService({} as any, 'user-1');
await ai.generateObject({
messages: [{ content: 'pick a name', role: 'user' }],
model: 'gpt-4o',
provider: 'openai',
schema,
});
const [payload] = generateObject.mock.calls[0];
expect(payload).toEqual({
messages: [{ content: 'pick a name', role: 'user' }],
model: 'gpt-4o',
schema,
tools: undefined,
});
});
it('forwards both options.metadata and options.tracing through to ModelRuntime.generateObject', async () => {
generateObject.mockResolvedValue({});
const ai = new AiGenerationService({} as any, 'user-1');
await ai.generateObject(
{
messages: [],
model: 'gpt-4o',
provider: 'openai',
},
{
metadata: { trigger: 'chat' },
tracing: {
promptVersion: 'v1.0',
scenario: 'input_completion',
},
},
);
const [, options] = generateObject.mock.calls[0];
expect(options).toMatchObject({
metadata: { trigger: 'chat' },
tracing: {
promptVersion: 'v1.0',
scenario: 'input_completion',
},
});
});
it('returns the runtime result with the typed cast applied', async () => {
generateObject.mockResolvedValue({ completion: 'hello world' });
const ai = new AiGenerationService({} as any, 'user-1');
const result = await ai.generateObject<{ completion: string }>({
messages: [],
model: 'gpt-4o',
provider: 'openai',
});
expect(result.completion).toBe('hello world');
});
});
+71
@@ -0,0 +1,71 @@
import type {
ChatCompletionTool,
GenerateObjectPayload,
GenerateObjectSchema,
} from '@lobechat/model-runtime';
import type { OpenAIChatMessage } from '@lobechat/types';
import type { LobeChatDatabase } from '@/database/type';
import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
export interface AiGenerationObjectInput {
messages: OpenAIChatMessage[] | GenerateObjectPayload['messages'];
model: string;
provider: string;
schema?: GenerateObjectSchema;
tools?: ChatCompletionTool[];
}
export interface AiGenerationObjectOptions {
/**
* Free-form context forwarded to non-tracing hooks (billing, routing). Use
* `tracing` instead for `llm_generation_tracing` config.
*/
metadata?: Record<string, unknown>;
signal?: AbortSignal;
/**
* Structured tracing config (scenario / promptVersion / schemaName /
* agentId / topicId / inputHint / ...). Forwarded to the
* `llm_generation_tracing` hook. Strongly typed by `TracingOptions` from
* `@lobechat/llm-generation-tracing` at call sites.
*/
tracing?: Record<string, unknown>;
}
/**
* Thin wrapper around `initModelRuntimeFromDB` + `ModelRuntime.generateObject`.
*
* Almost every server-side caller that produces structured output goes through
* the same two-step dance: resolve the user's provider config from the DB,
* then call generateObject with caller-specific metadata. This service exists
* so those call sites don't repeat the init wiring, and so adding a future
* cross-cutting concern (default metadata, retries, observability defaults)
* has one place to land.
*
* Construct one per request — `db` and `userId` come from the request context.
*/
export class AiGenerationService {
private readonly db: LobeChatDatabase;
private readonly userId: string;
constructor(db: LobeChatDatabase, userId: string) {
this.db = db;
this.userId = userId;
}
async generateObject<T = unknown>(
input: AiGenerationObjectInput,
options: AiGenerationObjectOptions = {},
): Promise<T> {
const runtime = await initModelRuntimeFromDB(this.db, this.userId, input.provider);
return (await runtime.generateObject(
{
messages: input.messages as GenerateObjectPayload['messages'],
model: input.model,
schema: input.schema,
tools: input.tools,
},
{ metadata: options.metadata, signal: options.signal, tracing: options.tracing },
)) as T;
}
}
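The two-step pattern the service wraps (resolve a runtime for the user and provider, then call `generateObject` with `provider` stripped from the payload) can be sketched without the database by injecting the resolver. Everything named `Demo*` below is hypothetical and exists only for illustration; the real service calls `initModelRuntimeFromDB` directly.

```typescript
interface DemoPayload {
  messages: { content: string; role: string }[];
  model: string;
}

interface DemoOptions {
  metadata?: Record<string, unknown>;
  tracing?: Record<string, unknown>;
}

interface DemoRuntime {
  generateObject(payload: DemoPayload, options?: DemoOptions): Promise<unknown>;
}

// Hypothetical seam standing in for initModelRuntimeFromDB(db, userId, provider).
type DemoResolver = (userId: string, provider: string) => Promise<DemoRuntime>;

class DemoGenerationService {
  private readonly resolveRuntime: DemoResolver;
  private readonly userId: string;

  constructor(resolveRuntime: DemoResolver, userId: string) {
    this.resolveRuntime = resolveRuntime;
    this.userId = userId;
  }

  async generateObject<T = unknown>(
    input: DemoPayload & { provider: string },
    options: DemoOptions = {},
  ): Promise<T> {
    // Step 1: resolve the runtime for this user + provider.
    const runtime = await this.resolveRuntime(this.userId, input.provider);
    // Step 2: forward the payload without `provider`; options pass through as-is.
    const result = await runtime.generateObject(
      { messages: input.messages, model: input.model },
      options,
    );
    // The cast is unchecked: callers still need to validate the shape.
    return result as T;
  }
}
```

As in the real service, the generic parameter is only a cast, so a caller such as the follow-up extractor still validates the raw result against its own schema.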
@@ -92,6 +92,12 @@ describe('FollowUpActionService.extract', () => {
expect.objectContaining({
model: 'custom-scene-model',
}),
expect.objectContaining({
tracing: expect.objectContaining({
scenario: 'follow_up',
topicId: TEST_TOPIC,
}),
}),
);
});
+24 -11
@@ -1,10 +1,12 @@
import { TRACING_SCENARIOS } from '@lobechat/const';
import type { TracingOptions } from '@lobechat/llm-generation-tracing';
import type { FollowUpChip, FollowUpExtractInput, FollowUpExtractResult } from '@lobechat/types';
import debug from 'debug';
import type { LobeChatDatabase } from '@/database/type';
import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
import { AiGenerationService } from '@/server/services/aiGeneration';
import { buildSuggestionPrompt } from './prompts';
import { buildSuggestionPrompt, FOLLOW_UP_PROMPT_VERSION } from './prompts';
import { RawResponseSchema, SUGGESTION_RESPONSE_JSON_SCHEMA } from './schema';
const log = debug('lobe-server:follow-up-action-service');
@@ -52,17 +54,28 @@ export class FollowUpActionService {
const { system, user } = buildSuggestionPrompt({ assistantText: text, hint });
const { model, provider } = modelConfig;
const ai = new AiGenerationService(this.db, this.userId);
let raw: unknown;
try {
const modelRuntime = await initModelRuntimeFromDB(this.db, this.userId, provider);
raw = await modelRuntime.generateObject({
messages: [
{ content: system, role: 'system' as const },
{ content: user, role: 'user' as const },
],
model,
schema: SUGGESTION_RESPONSE_JSON_SCHEMA,
});
raw = await ai.generateObject(
{
messages: [
{ content: system, role: 'system' as const },
{ content: user, role: 'user' as const },
],
model,
provider,
schema: SUGGESTION_RESPONSE_JSON_SCHEMA,
},
{
tracing: {
promptVersion: FOLLOW_UP_PROMPT_VERSION,
scenario: TRACING_SCENARIOS.FollowUp,
schemaName: 'FollowUpSuggestionResponse',
topicId,
} satisfies TracingOptions,
},
);
} catch (error) {
log('LLM call failed: %O', error);
return EMPTY_RESULT(row.id);
@@ -3,6 +3,13 @@ import type { FollowUpHint } from '@lobechat/types';
import { BASE_SYSTEM_PROMPT } from './base';
import { buildOnboardingAddendum } from './onboarding';
/**
* Bump when editing BASE_SYSTEM_PROMPT, the onboarding addendum, or the
* suggestion response schema. The 6-char prompt hash in the tracing row
* catches forgotten bumps.
*/
export const FOLLOW_UP_PROMPT_VERSION = 'v1.0';
export interface BuiltPrompt {
system: string;
user: string;
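The "6-char prompt hash" referred to in the comment above is described in the commit message as a 6-character sha256 of the system prompt plus schema. The exact input encoding used by `computePromptHash` is not shown in this diff, so the following is only a sketch under an assumed encoding (prompt, a newline separator, then `JSON.stringify` of the schema); `demoPromptHash` is a hypothetical name.

```typescript
import { createHash } from 'node:crypto';

// Assumed encoding: systemPrompt + '\n' + JSON of the schema. The real
// computePromptHash may serialize its inputs differently.
const demoPromptHash = (systemPrompt: string, schema: unknown): string =>
  createHash('sha256')
    .update(systemPrompt)
    .update('\n')
    .update(JSON.stringify(schema ?? null))
    .digest('hex')
    .slice(0, 6);

const demoSchema = { required: ['completion'], type: 'object' };
const hashBefore = demoPromptHash('You are an autocomplete engine.', demoSchema);
const hashAfter = demoPromptHash('You are an autocomplete engine!', demoSchema);
// An edited prompt yields a different hash even if the version constant
// was not bumped, which is how a forgotten bump becomes visible.
```

Note that `JSON.stringify` is sensitive to key order, so under this assumed encoding a schema with reordered keys would hash differently even though it is semantically identical.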
@@ -0,0 +1,186 @@
// @vitest-environment node
import { beforeEach, describe, expect, it, vi } from 'vitest';
const isEnabled = vi.fn<() => boolean>(() => true);
const record = vi.fn<(params: Record<string, unknown>) => Promise<{ tracingId: string }>>(
async () => ({ tracingId: 'trace-1' }),
);
vi.mock('./index', () => ({
getLLMGenerationTracingService: () => ({ isEnabled, record }),
}));
// next/server is optional at runtime; default to "not available" so the hook
// falls back to its microtask path which is straightforward to test.
vi.mock('next/server', () => ({}));
const { createLLMGenerationTracingHook } = await import('./hook');
const flushMicrotasks = async () => {
await new Promise((resolve) => setImmediate(resolve));
};
beforeEach(() => {
isEnabled.mockReturnValue(true);
record.mockClear();
});
describe('createLLMGenerationTracingHook', () => {
it('returns an empty object when the service is disabled', () => {
isEnabled.mockReturnValue(false);
const hooks = createLLMGenerationTracingHook('user-1', 'openai');
expect(hooks).toEqual({});
});
it('schedules a service.record call on success, reading structured tracing config from options.tracing', async () => {
const hooks = createLLMGenerationTracingHook('user-1', 'openai');
expect(hooks.onGenerateObjectComplete).toBeDefined();
hooks.onGenerateObjectComplete!(
{
latencyMs: 250,
output: { topic: 'greeting' },
success: true,
usage: { cost: 0.001, totalInputTokens: 100, totalOutputTokens: 30 } as any,
},
{
options: {
metadata: { trigger: 'agent_signal' },
tracing: {
agentId: 'agt-1',
promptVersion: 'v2.0',
scenario: 'signal_skill_intent',
topicId: 'tpc-1',
},
},
payload: {
messages: [
{ content: 'be helpful', role: 'system' },
{ content: 'hi there', role: 'user' },
],
model: 'gpt-4o',
schema: { type: 'object' },
} as any,
},
);
await flushMicrotasks();
expect(record).toHaveBeenCalledTimes(1);
const call = record.mock.calls[0][0];
expect(call).toMatchObject({
agentId: 'agt-1',
costUsd: 0.001,
inputTokens: 100,
latencyMs: 250,
model: 'gpt-4o',
outputTokens: 30,
promptVersion: 'v2.0',
provider: 'openai',
scenario: 'signal_skill_intent',
success: true,
topicId: 'tpc-1',
trigger: 'agent_signal',
userId: 'user-1',
});
expect((call.payload as { systemPrompt?: string }).systemPrompt).toBe('be helpful');
expect(call.promptHash).toHaveLength(6);
});
it('forwards caller-supplied inputHint through to the service', async () => {
const hooks = createLLMGenerationTracingHook('user-1', 'openai');
hooks.onGenerateObjectComplete!(
{ latencyMs: 10, success: true },
{
options: {
tracing: {
inputHint: '杭州天气',
scenario: 'input_completion',
schemaName: 'InputCompletion',
},
},
payload: { messages: [], model: 'gpt-4o', schema: {} } as any,
},
);
await flushMicrotasks();
expect(record.mock.calls[0][0]).toMatchObject({
inputHint: '杭州天气',
scenario: 'input_completion',
schemaName: 'InputCompletion',
});
});
it('flags validation failures using the error message heuristic and resolves scenario from metadata.trigger fallback', async () => {
const hooks = createLLMGenerationTracingHook('user-1', 'openai');
hooks.onGenerateObjectComplete!(
{
error: { message: 'ZodError: required field missing' },
latencyMs: 100,
success: false,
},
{
options: { metadata: { trigger: 'topic' } },
payload: { messages: [], model: 'gpt-4o', schema: { type: 'object' } } as any,
},
);
await flushMicrotasks();
expect(record.mock.calls[0][0]).toMatchObject({
errorDetail: 'ZodError: required field missing',
scenario: 'topic_title',
success: false,
trigger: 'topic',
validationFailed: true,
});
});
it('writes caller-supplied tracing.metadata verbatim to the DB jsonb column (no auto-stamped provider)', async () => {
const hooks = createLLMGenerationTracingHook('user-1', 'openai');
hooks.onGenerateObjectComplete!(
{ latencyMs: 5, success: true },
{
options: {
metadata: { trigger: 'memory' },
tracing: {
agentId: 'agt-known',
metadata: {
parent_memory_trace_key: 'memory-extraction/user-1/topic/abc/trace/2026-05-22.json',
},
},
},
payload: { messages: [], model: 'gpt-4o', schema: {} } as any,
},
);
await flushMicrotasks();
// `provider` is a first-class column — must NOT be duplicated into metadata.
expect(record.mock.calls[0][0].metadata).toEqual({
parent_memory_trace_key: 'memory-extraction/user-1/topic/abc/trace/2026-05-22.json',
});
});
it('omits the metadata field when the caller passes no tracing.metadata', async () => {
const hooks = createLLMGenerationTracingHook('user-1', 'openai');
hooks.onGenerateObjectComplete!(
{ latencyMs: 5, success: true },
{
options: { tracing: { scenario: 'input_completion' } },
payload: { messages: [], model: 'gpt-4o', schema: {} } as any,
},
);
await flushMicrotasks();
expect(record.mock.calls[0][0].metadata).toBeUndefined();
});
it('falls back to the unknown scenario when no trigger / scenario is provided anywhere', async () => {
const hooks = createLLMGenerationTracingHook('user-1', 'openai');
hooks.onGenerateObjectComplete!(
{ latencyMs: 100, success: true },
{ options: {}, payload: { messages: [], model: 'gpt-4o' } as any },
);
await flushMicrotasks();
expect(record.mock.calls[0][0]).toMatchObject({
promptVersion: 'v0',
scenario: 'unknown',
});
});
});
@@ -0,0 +1,141 @@
import {
computePromptHash,
resolveScenario,
type TracingOptions,
} from '@lobechat/llm-generation-tracing';
import type { ModelRuntimeHooks } from '@lobechat/model-runtime';
import debug from 'debug';
import { getLLMGenerationTracingService } from './index';
const log = debug('lobe-server:llm-generation-tracing:hook');
const pickString = (value: unknown): string | undefined =>
typeof value === 'string' ? value : undefined;
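/**
* Validate the loose `options.tracing` bag (the runtime declares it as
* `Record<string, unknown>`) into the strongly-typed `TracingOptions` shape
* the hook works with. Free-form context belongs under `tracing.metadata`,
* which passes through untouched to the DB jsonb column; unrecognised
* top-level keys and non-string values for known keys are dropped.
*/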
const parseTracingOptions = (raw: Record<string, unknown> | undefined): TracingOptions => {
if (!raw) return {};
return {
agentId: pickString(raw.agentId),
inputHint: pickString(raw.inputHint),
metadata:
raw.metadata && typeof raw.metadata === 'object' && !Array.isArray(raw.metadata)
? (raw.metadata as Record<string, unknown>)
: undefined,
parentTracingId: pickString(raw.parentTracingId),
promptVersion: pickString(raw.promptVersion),
scenario: pickString(raw.scenario),
schemaName: pickString(raw.schemaName),
systemPrompt: pickString(raw.systemPrompt),
topicId: pickString(raw.topicId),
trigger: pickString(raw.trigger),
};
};
const extractSystemPrompt = (messages: unknown): string => {
if (!Array.isArray(messages)) return '';
const first = messages[0] as { content?: unknown; role?: unknown } | undefined;
if (first?.role === 'system' && typeof first.content === 'string') return first.content;
return '';
};
const tryScheduleAfter = (work: () => Promise<void> | void): void => {
let scheduled = false;
try {
const nextServer = require('next/server') as { after?: (fn: () => unknown) => void };
if (typeof nextServer.after === 'function') {
nextServer.after(work);
scheduled = true;
}
} catch {
// next/server not available — fall through to fire-and-forget
}
if (!scheduled) {
Promise.resolve()
.then(work)
.catch((err) => log('Deferred tracing work threw: %O', err));
}
};
/**
* Build a `ModelRuntimeHooks` slice that records every `generateObject` call to
* the `llm_generation_tracing` DB table + blob store. Designed to be merged
* with any business hooks at the ModelRuntime construction site.
*/
export const createLLMGenerationTracingHook = (
userId: string,
provider: string,
): Pick<ModelRuntimeHooks, 'onGenerateObjectComplete'> => {
const service = getLLMGenerationTracingService();
if (!service.isEnabled()) return {};
return {
onGenerateObjectComplete: (data, context) => {
const tracing = parseTracingOptions(
context.options?.tracing as Record<string, unknown> | undefined,
);
// `trigger` is also read by ModelRuntime itself (timing logs) so it
// legitimately lives on `metadata`. Honour the explicit `tracing.trigger`
// override but fall back to the cross-cutting `metadata.trigger`.
const metadataTrigger = pickString(
(context.options?.metadata as Record<string, unknown> | undefined)?.trigger,
);
const trigger = tracing.trigger ?? metadataTrigger;
const { scenario, promptVersion } = resolveScenario({
promptVersion: tracing.promptVersion,
scenario: tracing.scenario,
trigger,
});
const systemPrompt = tracing.systemPrompt ?? extractSystemPrompt(context.payload.messages);
const promptHash = computePromptHash(systemPrompt, context.payload.schema);
// Heuristic: treat a failed call as a validation failure when its error
// message mentions "zod" or "validation" anywhere (case-insensitive).
const errorMessage = data.error?.message;
const validationFailed =
!data.success && typeof errorMessage === 'string' && /zod|validation/i.test(errorMessage);
tryScheduleAfter(async () => {
try {
await service.record({
agentId: tracing.agentId,
costUsd: (data.usage as { cost?: number } | undefined)?.cost,
errorCode: data.error?.code,
errorDetail: data.error?.message ?? data.error?.stack,
inputHint: tracing.inputHint,
inputTokens: data.usage?.totalInputTokens ?? data.usage?.inputTextTokens,
latencyMs: data.latencyMs,
// Caller-supplied jsonb context only. `provider` is already a
// first-class column on the row — no need to duplicate it here.
metadata: tracing.metadata,
model: context.payload.model,
outputTokens: data.usage?.totalOutputTokens ?? data.usage?.outputTextTokens,
parentTracingId: tracing.parentTracingId,
payload: {
input: context.payload.messages,
output: data.output,
schema: context.payload.schema,
systemPrompt,
},
promptHash,
promptVersion,
provider,
scenario,
schemaName: tracing.schemaName,
success: data.success,
topicId: tracing.topicId,
trigger,
userId,
validationFailed,
});
} catch (err) {
log('Tracing service threw: %O', err);
}
});
},
};
};
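The validation-failure heuristic in the hook above is a plain case-insensitive substring match, so it can be exercised in isolation. A minimal sketch, assuming a hypothetical helper name `looksLikeValidationFailure` (illustrative only, not part of this PR):

```ts
// Hypothetical helper mirroring the hook's heuristic; the name is an assumption.
const looksLikeValidationFailure = (success: boolean, message: unknown): boolean =>
  !success && typeof message === 'string' && /zod|validation/i.test(message);

console.log(looksLikeValidationFailure(false, 'ZodError: required field missing')); // true
console.log(looksLikeValidationFailure(false, 'Request timed out')); // false
console.log(looksLikeValidationFailure(true, 'ZodError: ignored on success')); // false

// Caveat: the match is broad, so an unrelated upstream message that happens to
// contain "validation" is flagged as well.
console.log(looksLikeValidationFailure(false, 'API key validation failed')); // true
```

Because the match is this broad, `validationFailed` is probably best read as a coarse filter for triage rather than an exact classification.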
@@ -0,0 +1,227 @@
// @vitest-environment node
import type { ITracingStore, TracingPayload } from '@lobechat/llm-generation-tracing';
import { eq } from 'drizzle-orm';
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
import { getTestDB } from '@/database/core/getTestDB';
import { llmGenerationTracing, users } from '@/database/schemas';
import type { LobeChatDatabase } from '@/database/type';
import { LLMGenerationTracingService } from './index';
const serverDB: LobeChatDatabase = await getTestDB();
// The service resolves the DB via getServerDB at call time. Point it at our
// test DB so the integration covers the real insert/update path.
vi.mock('@/database/server', () => ({ getServerDB: async () => serverDB }));
const userId = 'llm-gen-trace-svc-user';
const stubStore: ITracingStore & {
save: ReturnType<typeof vi.fn<(record: TracingPayload) => Promise<{ key: string | null }>>>;
} = {
save: vi.fn<(record: TracingPayload) => Promise<{ key: string | null }>>(async () => ({
key: 'memo://saved',
})),
};
beforeEach(async () => {
await serverDB.delete(users);
await serverDB.insert(users).values([{ id: userId }]);
stubStore.save.mockClear();
stubStore.save.mockResolvedValue({ key: 'memo://saved' });
});
afterEach(async () => {
await serverDB.delete(llmGenerationTracing);
await serverDB.delete(users);
});
describe('LLMGenerationTracingService.record', () => {
it('inserts a row, calls the store, and patches the returned storage_key', async () => {
const service = new LLMGenerationTracingService(stubStore);
const result = await service.record({
latencyMs: 420,
model: 'gpt-4o',
payload: {
input: [{ content: 'hello world from the user', role: 'user' }],
output: { topic: 'greeting' },
schema: { type: 'object' },
systemPrompt: 'be helpful',
},
promptHash: 'aaaaaa',
promptVersion: 'v1.0',
provider: 'openai',
scenario: 'home_brief',
success: true,
trigger: 'home_brief',
userId,
});
expect(result?.tracingId).toMatch(/^[0-9a-f-]{36}$/);
expect(stubStore.save).toHaveBeenCalledTimes(1);
const payload = stubStore.save.mock.calls[0][0];
expect(payload).toMatchObject({
input: [{ content: 'hello world from the user', role: 'user' }],
model_metadata: { model: 'gpt-4o', provider: 'openai' },
output: { topic: 'greeting' },
prompt_hash: 'aaaaaa',
scenario: 'home_brief',
tracing_id: result?.tracingId,
version: '1.0',
});
const [row] = await serverDB
.select()
.from(llmGenerationTracing)
.where(eq(llmGenerationTracing.id, result!.tracingId));
expect(row).toMatchObject({
inputHint: 'hello world from the user',
latencyMs: 420,
model: 'gpt-4o',
promptHash: 'aaaaaa',
provider: 'openai',
scenario: 'home_brief',
storageKey: 'memo://saved',
success: true,
trigger: 'home_brief',
userId,
});
expect(row?.inputHash).toMatch(/^[0-9a-f]{64}$/);
});
it('preserves the row with storage_key=null and metadata.store_error when the store throws', async () => {
stubStore.save.mockRejectedValueOnce(new Error('S3 5xx'));
const service = new LLMGenerationTracingService(stubStore);
const result = await service.record({
metadata: { caller: 'home_brief_handler' },
promptHash: 'bbbbbb',
promptVersion: 'v1.0',
scenario: 'home_brief',
success: true,
userId,
});
const [row] = await serverDB
.select()
.from(llmGenerationTracing)
.where(eq(llmGenerationTracing.id, result!.tracingId));
expect(row?.storageKey).toBeNull();
expect(row?.metadata).toMatchObject({
caller: 'home_brief_handler',
store_error: 'S3 5xx',
});
});
it('honours an explicit inputHint override instead of auto-extracting from the first user message', async () => {
const service = new LLMGenerationTracingService(stubStore);
const result = await service.record({
inputHint: '杭州天气',
payload: {
// Wrapper prompt — first user message is a template, not the real input.
input: [
{ content: 'be helpful', role: 'system' },
{ content: 'Before cursor: "杭州天气" After cursor: ""', role: 'user' },
],
},
promptHash: 'ffffff',
promptVersion: 'v1.0',
scenario: 'input_completion',
success: true,
userId,
});
const [row] = await serverDB
.select()
.from(llmGenerationTracing)
.where(eq(llmGenerationTracing.id, result!.tracingId));
expect(row?.inputHint).toBe('杭州天气');
});
it('truncates an excessively long inputHint override to INPUT_HINT_MAX', async () => {
const service = new LLMGenerationTracingService(stubStore);
const long = 'x'.repeat(500);
const result = await service.record({
inputHint: long,
promptHash: 'ffffff',
promptVersion: 'v1.0',
scenario: 'input_completion',
success: true,
userId,
});
const [row] = await serverDB
.select()
.from(llmGenerationTracing)
.where(eq(llmGenerationTracing.id, result!.tracingId));
expect(row?.inputHint?.length).toBe(200);
});
it('leaves storage_key null when the store reports a local-only save (key=null)', async () => {
stubStore.save.mockResolvedValueOnce({ key: null });
const service = new LLMGenerationTracingService(stubStore);
const result = await service.record({
promptHash: 'eeeeee',
promptVersion: 'v1.0',
scenario: 'home_brief',
success: true,
userId,
});
const [row] = await serverDB
.select()
.from(llmGenerationTracing)
.where(eq(llmGenerationTracing.id, result!.tracingId));
expect(row?.storageKey).toBeNull();
});
it('returns null and skips everything when no store is configured', async () => {
const service = new LLMGenerationTracingService(null);
const result = await service.record({
promptHash: 'cccccc',
promptVersion: 'v1.0',
scenario: 'home_brief',
success: true,
userId,
});
expect(result).toBeNull();
expect(stubStore.save).not.toHaveBeenCalled();
const rows = await serverDB.select().from(llmGenerationTracing);
expect(rows).toHaveLength(0);
});
});
describe('LLMGenerationTracingService.recordFeedback', () => {
it('writes feedback columns onto the row owned by the caller', async () => {
const service = new LLMGenerationTracingService(stubStore);
const { tracingId } = (await service.record({
promptHash: 'dddddd',
promptVersion: 'v1.0',
scenario: 'agent_welcome',
success: true,
userId,
}))!;
await service.recordFeedback(userId, tracingId, {
data: { clicked_question_index: 0 },
score: 1,
signal: 'positive',
source: 'explicit_thumbs',
});
const [row] = await serverDB
.select()
.from(llmGenerationTracing)
.where(eq(llmGenerationTracing.id, tracingId));
expect(row).toMatchObject({
feedbackData: { clicked_question_index: 0 },
feedbackScore: 1,
feedbackSignal: 'positive',
feedbackSource: 'explicit_thumbs',
});
});
});
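The `inputHint` tests above pin down two rules: a caller-supplied override wins over the first user message, and the stored value is capped at 200 characters. A standalone sketch of that rule, assuming a hypothetical `sketchInputHint` helper and a simplified `Message` type (neither is part of this PR):

```ts
const INPUT_HINT_MAX = 200; // same cap the truncation test asserts on

// Simplified message shape, assumed for illustration only.
type Message = { content?: unknown; role?: unknown };

// Hypothetical helper: override first, then the first user message, then truncate.
const sketchInputHint = (
  override: string | null | undefined,
  messages: Message[],
): string | null => {
  const content = messages.find((m) => m.role === 'user')?.content;
  const auto = typeof content === 'string' ? content : null;
  const raw = override ?? auto;
  return raw == null ? null : raw.slice(0, INPUT_HINT_MAX);
};

const wrapped: Message[] = [
  { content: 'be helpful', role: 'system' },
  { content: 'Before cursor: "杭州天气" After cursor: ""', role: 'user' },
];

console.log(sketchInputHint('杭州天气', wrapped)); // 杭州天气
console.log(sketchInputHint(undefined, wrapped)); // Before cursor: "杭州天气" After cursor: ""
console.log(sketchInputHint('x'.repeat(500), [])?.length); // 200
```

Note that `String.prototype.slice` counts UTF-16 code units, so a cap applied this way can split a surrogate pair (for example an emoji) at the boundary.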
@@ -0,0 +1,258 @@
import {
computeInputHash,
FileTracingStore,
type ITracingStore,
type TracingPayload,
} from '@lobechat/llm-generation-tracing';
import debug from 'debug';
import { eq } from 'drizzle-orm';
import {
LlmGenerationTracingModel,
type RecordLlmGenerationParams,
type UpdateLlmGenerationFeedbackParams,
} from '@/database/models/llmGenerationTracing';
import { llmGenerationTracing } from '@/database/schemas/llmGenerationTracing';
import { getServerDB } from '@/database/server';
const log = debug('lobe-server:llm-generation-tracing:service');
const INPUT_HINT_MAX = 200;
export interface GenerationCallPayload {
input?: unknown;
output?: unknown;
rawOutput?: string;
schema?: unknown;
systemPrompt?: string;
}
export interface RecordLLMGenerationCallParams {
agentId?: string | null;
costUsd?: number | null;
errorCode?: string | null;
errorDetail?: string | null;
/**
* Caller-supplied snippet stored on `input_hint`. When omitted, the service
* auto-extracts a hint from the first user message in `payload.input`.
* Callers wrapping the user's text in a template should pass the raw input
* here so the DB row stays human-scannable.
*/
inputHint?: string | null;
inputTokens?: number | null;
latencyMs?: number | null;
metadata?: Record<string, unknown>;
model?: string | null;
outputTokens?: number | null;
parentTracingId?: string | null;
payload?: GenerationCallPayload;
promptHash: string;
promptVersion: string;
provider?: string | null;
scenario: string;
schemaName?: string | null;
spanId?: string | null;
success: boolean;
topicId?: string | null;
traceId?: string | null;
trigger?: string | null;
userId: string;
validationFailed?: boolean;
}
/**
* Per-call observability for `generateObject`. Persists a structured summary
* row to `llm_generation_tracing` and the full prompt/input/output blob to the
* configured store (S3 when `ENABLE_LLM_GENERATION_TRACING_S3=1`, local file
* when `NODE_ENV=development`, disabled otherwise).
*
* Intended to run from `after()` (or a deferred microtask) so it never blocks
* the user response. Both store and DB failures are swallowed and logged: the
* DB row is the source of truth for analytics, and the blob is a cold artefact
* for offline review.
*/
export class LLMGenerationTracingService {
private readonly store: ITracingStore | null;
constructor(store?: ITracingStore | null) {
this.store = store === undefined ? createDefaultStore() : store;
}
isEnabled(): boolean {
return this.store !== null;
}
async record(params: RecordLLMGenerationCallParams): Promise<{ tracingId: string } | null> {
if (!this.store) return null;
let db: Awaited<ReturnType<typeof getServerDB>>;
try {
db = await getServerDB();
} catch (err) {
log('Skipping tracing — getServerDB failed: %O', err);
return null;
}
const model = new LlmGenerationTracingModel(db, params.userId);
const dbValues: RecordLlmGenerationParams = {
agentId: params.agentId,
costUsd: params.costUsd,
errorCode: params.errorCode,
errorDetail: params.errorDetail,
inputHash: params.payload?.input ? computeInputHash(params.payload.input) : null,
inputHint: resolveInputHint(params.inputHint, params.payload?.input),
inputTokens: params.inputTokens,
latencyMs: params.latencyMs,
metadata: params.metadata,
model: params.model,
outputTokens: params.outputTokens,
parentTracingId: params.parentTracingId,
promptHash: params.promptHash,
promptVersion: params.promptVersion,
provider: params.provider,
scenario: params.scenario,
schemaName: params.schemaName,
spanId: params.spanId,
success: params.success,
topicId: params.topicId,
traceId: params.traceId,
trigger: params.trigger,
validationFailed: params.validationFailed,
};
// Insert first so the storage key can embed the row's id — every blob then
// points at exactly one row.
let tracingId: string;
try {
const row = await model.record(dbValues);
tracingId = row.id;
} catch (err) {
log('DB insert failed: %O', err);
return null;
}
const payload: TracingPayload = {
created_at: Date.now(),
error: params.success
? undefined
: {
code: params.errorCode ?? undefined,
message: params.errorDetail ?? undefined,
},
input: params.payload?.input,
model_metadata: {
model: params.model ?? undefined,
provider: params.provider ?? undefined,
},
output: params.payload?.output,
prompt_hash: params.promptHash,
prompt_version: params.promptVersion,
raw_output: params.validationFailed ? params.payload?.rawOutput : undefined,
scenario: params.scenario,
schema: params.payload?.schema,
system_prompt: params.payload?.systemPrompt,
tracing_id: tracingId,
validation_failed: params.validationFailed,
version: '1.0',
};
let storageKey: string | null = null;
let storeError: string | undefined;
try {
const result = await this.store.save(payload);
storageKey = result.key;
} catch (err) {
storeError = err instanceof Error ? err.message : String(err);
log('Store save failed (DB row kept): %O', err);
}
try {
await db
.update(llmGenerationTracing)
.set({
metadata: storeError
? { ...params.metadata, store_error: storeError }
: (params.metadata ?? {}),
storageKey,
})
.where(eq(llmGenerationTracing.id, tracingId));
} catch (err) {
log('Failed to patch storage_key onto row: %O', err);
}
return { tracingId };
}
async recordFeedback(
userId: string,
tracingId: string,
params: UpdateLlmGenerationFeedbackParams,
): Promise<void> {
let db: Awaited<ReturnType<typeof getServerDB>>;
try {
db = await getServerDB();
} catch (err) {
log('Skipping feedback — getServerDB failed: %O', err);
return;
}
const model = new LlmGenerationTracingModel(db, userId);
try {
await model.updateFeedback(tracingId, params);
} catch (err) {
log('Feedback update failed: %O', err);
}
}
}
const createDefaultStore = (): ITracingStore | null => {
if (process.env.ENABLE_LLM_GENERATION_TRACING_S3 === '1') {
try {
// Require at call time so test environments without S3 wiring don't break.
const { S3TracingStore } = require('@/server/modules/LLMGenerationTracing');
return new S3TracingStore();
} catch {
// S3 wiring not available — fall through to file store / null.
}
}
if (process.env.NODE_ENV === 'development') {
try {
return new FileTracingStore();
} catch {
// Filesystem unavailable — fall through to null.
}
}
return null;
};
const autoExtractHint = (input: unknown): string | null => {
if (input == null) return null;
if (typeof input === 'string') return input;
if (!Array.isArray(input)) return null;
const firstUser = input.find(
(m): m is { content: unknown; role: string } =>
typeof m === 'object' &&
m !== null &&
'role' in m &&
(m as { role: unknown }).role === 'user',
);
return firstUser && typeof firstUser.content === 'string' ? firstUser.content : null;
};
/**
* Pick the `input_hint` value: caller-supplied override wins; otherwise fall
* back to a best-effort auto-extraction from the first user message. Always
* truncated to `INPUT_HINT_MAX` so the column stays scannable.
*/
const resolveInputHint = (override: string | null | undefined, input: unknown): string | null => {
const raw = override ?? autoExtractHint(input);
if (raw == null) return null;
return raw.slice(0, INPUT_HINT_MAX);
};
let cachedInstance: LLMGenerationTracingService | null = null;
export const getLLMGenerationTracingService = (): LLMGenerationTracingService => {
if (!cachedInstance) cachedInstance = new LLMGenerationTracingService();
return cachedInstance;
};
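The final patch step in `record` can be read as a pure function of the caller's metadata and the store outcome. A minimal sketch under that assumption; `buildStoragePatch`, `StoreOutcome` and `StoragePatch` are hypothetical names used only for illustration:

```ts
// Either the store returned a key (possibly null for local-only saves) or it threw.
type StoreOutcome = { key: string | null } | { error: string };

type StoragePatch = { metadata: Record<string, unknown>; storageKey: string | null };

// Hypothetical helper mirroring the values passed to the row update.
const buildStoragePatch = (
  metadata: Record<string, unknown> | undefined,
  outcome: StoreOutcome,
): StoragePatch => {
  if ('error' in outcome) {
    // Store failed: keep the row, leave storage_key null, record the reason.
    return { metadata: { ...metadata, store_error: outcome.error }, storageKey: null };
  }
  return { metadata: metadata ?? {}, storageKey: outcome.key };
};

console.log(JSON.stringify(buildStoragePatch({ caller: 'home_brief_handler' }, { error: 'S3 5xx' })));
// {"metadata":{"caller":"home_brief_handler","store_error":"S3 5xx"},"storageKey":null}
console.log(JSON.stringify(buildStoragePatch(undefined, { key: 'memo://saved' })));
// {"metadata":{},"storageKey":"memo://saved"}
```

One consequence visible in the sketch: when the caller passes no metadata and the save succeeds, the row's `metadata` is written as an empty object rather than left unset.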