✨ feat: per-call llm_generation_tracing observability (#15124)

* ✨ feat(database): add llm_generation_tracing schema + tracing package (LOBE-9462) Foundation layer for per-call observability of `generateObject` calls. - New Drizzle table `llm_generation_tracing` with identity / context / model / result / usage / storage / feedback / audit columns and full single-column index coverage (Postgres bitmap-scan friendly). Migration 0103 is idempotent (CREATE TABLE/INDEX IF NOT EXISTS) for safe re-runs. - `LlmGenerationTracingModel` with `record` / `updateFeedback` / `findById` / `listRecent`, all userId-scoped to prevent cross-user leaks. - New package `@lobechat/llm-generation-tracing` mirroring agent-tracing's shape: `ITracingStore` interface, `FileTracingStore` (local/dev, scenario subfolders + latest.json symlink), `computePromptHash` (6-char sha256 of systemPrompt + schema), and `TRACING_SCENARIO_REGISTRY` + `resolveScenario` with explicit scenario override. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ✨ feat(model-runtime): wire llm_generation_tracing into ModelRuntime.generateObject (LOBE-9462) Per-call interception layer — one hook covers all generateObject callers. - New `onGenerateObjectComplete` hook on `ModelRuntimeHooks`: always fires (success or failure) with latency, usage, output/error. Fixes the gap where `onGenerateObjectFinal` only fires when the runtime invokes `onUsage`. - `S3TracingStore` (zstd level 3, key `llm-generation-tracing/{scenario}/{v}-{hash}/{date}/{id}.json.zst`) and `LLMGenerationTracingService` that does DB insert → store.save → patch storage_key. Store failures preserve the row with `metadata.store_error`. - `createLLMGenerationTracingHook` + `mergeModelRuntimeHooks` wired into `initModelRuntimeFromDB`; tracing runs alongside business (billing) hooks via `next/server.after()` when available, microtask fallback otherwise. Unknown metadata keys (e.g. `parent_memory_trace_key`) pass through. - Memory extractor accepts `parentMemoryTraceKey` option for the job-level backlink. Follow-up-action caller given an explicit `scenario: 'follow_up'` metadata override — it was the only OSS caller missing trigger metadata. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ✅ test(llm-generation-tracing): type vi.fn mocks so tsgo accepts mock.calls indexing The hook + service tests destructured `mock.calls[0][0]` and accessed nested fields, which tsgo flagged as TS2493 / TS18046 because `vi.fn()` defaults to a zero-arg signature. Add explicit type parameters to the mocks so tsgo can infer the call tuple, and cast `call.payload` at the access point. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ♻️ refactor(model-runtime): move mergeModelRuntimeHooks into the package It's a generic utility for composing `ModelRuntimeHooks` instances — same import surface as `ModelRuntime` and the hooks interface — so it belongs alongside them rather than tucked under a server-side consumer. - New `packages/model-runtime/src/core/mergeHooks.ts` exports `mergeModelRuntimeHooks` and is re-exported from the package index. - Move the unit tests to `packages/model-runtime/src/core/mergeHooks.test.ts`, including a new case covering the "a throws → b is skipped" load-bearing semantics. - `src/server/services/llmGenerationTracing/hook.ts` drops the local copy and the consumer (`src/server/modules/ModelRuntime/index.ts`) imports from `@lobechat/model-runtime`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ♻️ refactor(llm-generation-tracing): version lives with the prompt, not in a central table `promptVersion` was baked into `TRACING_SCENARIO_REGISTRY`, far from any prompt definition — editing a prompt + forgetting to bump the entry in a completely different file was an obvious foot-gun. - Registry is now `Record<string, string>` mapping trigger → scenario only; it's the stable concern that rarely changes. - `resolveScenario` always passes `promptVersion` through from the caller, defaulting to `UNKNOWN_PROMPT_VERSION` ('v0') when absent. - Each call site declares its own `*_PROMPT_VERSION` constant next to the prompt it describes. `followUpAction` ships the first one: `FOLLOW_UP_PROMPT_VERSION` in `prompts/index.ts`, threaded through `metadata.promptVersion` at the `generateObject` call. Other callers can add the same constant when they next touch their prompts. The 6-char prompt hash on the row still catches forgotten bumps. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ✨ feat(input-completion): wire prompt-version metadata at the auto-complete call site Aligns input auto-complete with the FOLLOW_UP_PROMPT_VERSION convention so each prompt iteration is recordable as the chat-side tracing lands. - `INPUT_COMPLETION_PROMPT_VERSION = 'v1.0'` declared next to `chainInputCompletion` — bump together with the prompt body. - `fetchPresetTaskResult` accepts optional `metadata` and forwards it to `getChatCompletion`; the existing chat path already plumbs metadata to `ModelRuntime.chat` options. - `InputEditor` call site passes `{ scenario: 'input_completion', promptVersion }`. Note: `llm_generation_tracing` currently only fires from `onGenerateObjectComplete`. Input completion is a `chat` call, so this metadata is forward-looking until a chat-side tracing hook lands. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * 🐛 fix(llm-generation-tracing): collapse bucketDir path.join args to silence turbopack glob warning Turbopack's static analyzer treats `path.join(root, dyn1, dyn2)` as a multi-segment glob pattern and warned that it could match ~12k files in the project. Compose the relative subdir as a single string first, so `path.join` only sees one dynamic segment. Behavior unchanged — the resulting path is identical. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ✨ feat(input-completion): route auto-complete through generateObject for tracing Auto-complete is the first preset-task caller migrated to the structured- output path so it lands in `llm_generation_tracing` via the existing `onGenerateObjectComplete` hook. No new server hook, no global chat-side tracing. - `chainInputCompletion` now returns `{ messages, schema }` with a minimal `{ completion: string }` schema and a stable `INPUT_COMPLETION_SCHEMA_NAME` constant. JSON wrapping costs ~15-30 tokens against a 100-token completion budget — negligible for the observability win. - `StructureOutputSchema` / `StructureOutputParams` accept optional `metadata`; `aiChatRouter.outputJSON` merges caller metadata over the default trigger so `{ scenario, promptVersion, schemaName }` reach `ModelRuntime.generateObject` options unchanged. - `IStructureSchema.description` is now optional to match the zod schema — previously the TS type was stricter than runtime validation accepted. - `InputEditor` switches from `chatService.fetchPresetTaskResult` to `aiChatService.generateJSON`, reading `response.completion`. Streaming is dropped because auto-complete already buffers the full result before inserting; no UX change. - Reverts the unused `metadata` field that was added to `fetchPresetTaskResult` in the previous commit — no current caller needs it now that input completion uses the generateObject path. Bumps `INPUT_COMPLETION_PROMPT_VERSION` to v2.0 because the system prompt gained an "output the completion field" instruction. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ♻️ refactor(aiGeneration): extract the runtime-init + generateObject dance into a service Every server-side caller that produces structured output was repeating the same two-step ritual: `initModelRuntimeFromDB(...)` → `runtime.generateObject(payload, { metadata })`. `AiGenerationService` collapses it into one call so future cross-cutting concerns (default metadata, retry, observability hooks) have one place to land. - New `src/server/services/aiGeneration/index.ts` exposes `generateObject<T>(input, options)` and is unit-tested for provider resolution + payload/metadata pass-through. - `aiChatRouter.outputJSON` and `FollowUpActionService.extract` migrated to the service (other callers move organically when next touched). - Drops the unused `keyVaultsPayload` field from `StructureOutputParams` and the placeholder at the InputEditor call site — key vaults are server-resolved from DB, the client never supplies them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ♻️ refactor(tracing): centralize TRACING_SCENARIOS const + inject AiGenerationService via trpc ctx - New `packages/const/src/llmGenerationTracing.ts` exports `TRACING_SCENARIOS` + `TracingScenario` type — the single directory where every known scenario name lives. Adds `@lobechat/const` as a workspace dep on llm-generation- tracing so `TRACING_SCENARIO_REGISTRY` can reference the same literals. - Callers (FollowUpActionService, InputEditor) replace `'follow_up'` / `'input_completion'` string literals with `TRACING_SCENARIOS.FollowUp` / `.InputCompletion`, so a typo or a rename fails the type-check instead of silently drifting on the row. - `AiGenerationService` is now injected into the `aiChatProcedure` ctx middleware alongside `aiChatService`; `outputJSON` consumes it via `ctx.aiGenerationService` instead of new-ing it inside the handler. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ✨ feat(llm-generation-tracing): add lt/llm-tracing CLI + drop local-only storage_key - Add `lt` / `llm-tracing` CLI under @lobechat/llm-generation-tracing with `list` (recent records, --scenario filter, --json) and `inspect` (by tracing_id prefix or latest, --full, --json). - `FileTracingStore.save` now returns `{ key: null }` so dev DB rows leave `storage_key` empty instead of recording a non-resolvable local path; S3 store remains the source of truth for the real key. Add helpers `findByTracingId` / `getLatest` used by the CLI. - Wire `agentId` and `topicId` into `input_completion` tracing metadata from the chat input auto-complete call site. - Default `FileTracingStore` whenever NODE_ENV=development (drop the ENABLE_LLM_GENERATION_TRACING_LOCAL opt-in env var). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * 💄 style(llm-generation-tracing): prettier CLI output (tree + colors) Mirror the @lobechat/agent-tracing viewer style: - Inline ANSI color helpers (dim/bold/cyan/magenta/green/yellow/red). - Compact single-line header with id, scenario, version, model, status, time — replaces the multi-line bullet list. - Tree structure with `├─`/`└─` connectors instead of `── section ──` banners. - input arrays render per-message (role + char count + preview) rather than dumping raw JSON. - Small single-key outputs (e.g. `{ completion: "怎么样" }`) collapse to inline `key: "value"`. - `lt list` switches to a colored, properly padded table. Default view stays compact; --full expands system_prompt / input / schema bodies. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * ♻️ refactor(llm-generation-tracing): split `tracing` config out of `metadata` `options.metadata` was overloaded — half tracing-specific structured fields (scenario / promptVersion / schemaName / agentId / topicId / ...), half free-form jsonb passthrough. Callers couldn't tell which was which, and the inputHint was always auto-extracted (useless when the prompt wraps the user's text in a template). This commit introduces a dedicated `tracing` option: - Add `TracingOptions` to @lobechat/llm-generation-tracing — the typed shape callers import (agentId / topicId / inputHint / scenario / promptVersion / schemaName / systemPrompt / parentTracingId / metadata). - Add loose `tracing?: Record<string, unknown>` to GenerateObjectOptions and StructureOutputParams / StructureOutputSchema so the field flows through the runtime + TRPC. - Tracing hook now reads `context.options.tracing` for structured fields; it still falls back to `metadata.trigger` for the cross-cutting trigger string (ModelRuntime itself uses metadata.trigger for timing logs, so trigger stays on metadata). - Service `record()` accepts an explicit `inputHint`; otherwise falls back to auto-extraction from the first user message. Always truncated. - Free-form jsonb fields move to `tracing.metadata` (was unknown-key passthrough on `metadata`). - Call sites updated: - FollowUpAction now passes `tracing: { scenario, promptVersion, schemaName, topicId }` (previously `metadata`). - InputCompletion now passes `tracing: { agentId, topicId, inputHint: input, scenario, promptVersion, schemaName }` — `inputHint` is the user's actual typed text, not the wrapper prompt's first user message. - `aiChat.outputJSON` router forwards both metadata and tracing. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * Update inputCompletion.ts * 🐛 fix(llm-generation-tracing): stop duplicating provider into the row's metadata jsonb `provider` is already a first-class column on the `llm_generation_tracing` row, so auto-stamping it into the `metadata` jsonb column on every call was pure noise. The hook now writes the caller-supplied `tracing.metadata` verbatim — empty/undefined when the caller had nothing to add. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-14 03:30:19 +00:00 · 2026-05-23 18:14:23 +08:00
parent ddb5794826
commit cce14911d1
49 changed files with 2926 additions and 71 deletions
@@ -106,6 +106,7 @@ vertex-ai-key.json

 # Agent tracing snapshots
 .agent-tracing/
+.llm-generation-tracing/

 # AI coding tools
 .local/
@@ -264,6 +264,7 @@
    "@lobechat/fetch-sse": "workspace:*",
    "@lobechat/file-loaders": "workspace:*",
    "@lobechat/heterogeneous-agents": "workspace:*",
+    "@lobechat/llm-generation-tracing": "workspace:*",
    "@lobechat/local-file-shell": "workspace:*",
    "@lobechat/markdown-patch": "workspace:*",
    "@lobechat/memory-user-memory": "workspace:*",
@@ -10,6 +10,7 @@ export * from './file';
 export * from './interests';
 export * from './klavis';
 export * from './layoutTokens';
+export * from './llmGenerationTracing';
 export * from './lobehubSkill';
 export * from './message';
 export * from './meta';
@@ -0,0 +1,25 @@
+/**
+ * Canonical directory of every `llm_generation_tracing` scenario value.
+ *
+ * Add to this map whenever a new caller pipes through the tracing path so
+ * there's one place to scan for all known scenarios. Values are the literal
+ * strings persisted on the row's `scenario` column — keep them stable, they
+ * are dashboard / partition keys.
+ */
+export const TRACING_SCENARIOS = {
+  AgentSignal: 'agent_signal',
+  AgentWelcome: 'agent_welcome',
+  FollowUp: 'follow_up',
+  HomeBrief: 'home_brief',
+  InputCompletion: 'input_completion',
+  MemoryExtract: 'memory_extract',
+  SignalFeedbackDomain: 'signal_feedback_domain',
+  SignalFeedbackSatisfaction: 'signal_feedback_satisfaction',
+  SignalSkillIntent: 'signal_skill_intent',
+  SignalSkillManagement: 'signal_skill_management',
+  SignupEmailReview: 'signup_email_review',
+  TopicTitle: 'topic_title',
+  Unknown: 'unknown',
+} as const;
+
+export type TracingScenario = (typeof TRACING_SCENARIOS)[keyof typeof TRACING_SCENARIOS];
@@ -0,0 +1,162 @@
+// @vitest-environment node
+import { afterEach, beforeEach, describe, expect, it } from 'vitest';
+
+import { getTestDB } from '../../core/getTestDB';
+import { llmGenerationTracing, users } from '../../schemas';
+import type { LobeChatDatabase } from '../../type';
+import { LlmGenerationTracingModel } from '../llmGenerationTracing';
+
+const serverDB: LobeChatDatabase = await getTestDB();
+
+const userId = 'llm-gen-trace-test-user';
+const otherUserId = 'llm-gen-trace-other-user';
+
+beforeEach(async () => {
+  await serverDB.delete(users);
+  await serverDB.insert(users).values([{ id: userId }, { id: otherUserId }]);
+});
+
+afterEach(async () => {
+  await serverDB.delete(llmGenerationTracing);
+  await serverDB.delete(users);
+});
+
+describe('LlmGenerationTracingModel', () => {
+  describe('record', () => {
+    it('inserts a row and returns the generated uuid', async () => {
+      const model = new LlmGenerationTracingModel(serverDB, userId);
+
+      const { id } = await model.record({
+        inputHint: 'hello world',
+        inputTokens: 120,
+        latencyMs: 850,
+        model: 'gpt-4o-mini',
+        outputTokens: 40,
+        promptHash: 'ab1fc3',
+        promptVersion: 'v1.0',
+        provider: 'openai',
+        scenario: 'home_brief',
+        schemaName: 'HomeBriefOutputSchema',
+        success: true,
+        trigger: 'home_brief',
+      });
+
+      expect(id).toMatch(/^[0-9a-f-]{36}$/);
+
+      const row = await model.findById(id);
+      expect(row).toMatchObject({
+        id,
+        inputHint: 'hello world',
+        inputTokens: 120,
+        latencyMs: 850,
+        metadata: {},
+        model: 'gpt-4o-mini',
+        outputTokens: 40,
+        promptHash: 'ab1fc3',
+        promptVersion: 'v1.0',
+        provider: 'openai',
+        scenario: 'home_brief',
+        schemaName: 'HomeBriefOutputSchema',
+        success: true,
+        trigger: 'home_brief',
+        userId,
+        validationFailed: false,
+      });
+      expect(row?.createdAt).toBeInstanceOf(Date);
+    });
+
+    it('records a failure with error fields and validation flag', async () => {
+      const model = new LlmGenerationTracingModel(serverDB, userId);
+
+      const { id } = await model.record({
+        errorCode: 'validation_failed',
+        errorDetail: 'output missing required field "summary"',
+        latencyMs: 1200,
+        model: 'gpt-4o',
+        promptHash: 'cccccc',
+        promptVersion: 'v1.0',
+        provider: 'openai',
+        scenario: 'topic_title',
+        success: false,
+        validationFailed: true,
+      });
+
+      const row = await model.findById(id);
+      expect(row).toMatchObject({
+        errorCode: 'validation_failed',
+        errorDetail: 'output missing required field "summary"',
+        success: false,
+        validationFailed: true,
+      });
+    });
+  });
+
+  describe('updateFeedback', () => {
+    it('writes feedback columns and the updated timestamp', async () => {
+      const model = new LlmGenerationTracingModel(serverDB, userId);
+      const { id } = await model.record({
+        promptHash: 'aaaaaa',
+        promptVersion: 'v1.0',
+        scenario: 'agent_welcome',
+        success: true,
+      });
+
+      await model.updateFeedback(id, {
+        data: { clicked_question_index: 1 },
+        score: 1,
+        signal: 'positive',
+        source: 'explicit_thumbs',
+      });
+
+      const row = await model.findById(id);
+      expect(row).toMatchObject({
+        feedbackData: { clicked_question_index: 1 },
+        feedbackScore: 1,
+        feedbackSignal: 'positive',
+        feedbackSource: 'explicit_thumbs',
+      });
+      expect(row?.feedbackUpdatedAt).toBeInstanceOf(Date);
+    });
+
+    it("does not touch another user's row", async () => {
+      const owner = new LlmGenerationTracingModel(serverDB, userId);
+      const intruder = new LlmGenerationTracingModel(serverDB, otherUserId);
+
+      const { id } = await owner.record({
+        promptHash: 'aaaaaa',
+        promptVersion: 'v1.0',
+        scenario: 'follow_up',
+        success: true,
+      });
+
+      await intruder.updateFeedback(id, {
+        signal: 'negative',
+        source: 'manual_edit',
+      });
+
+      const row = await owner.findById(id);
+      expect(row?.feedbackSignal).toBeNull();
+    });
+  });
+
+  describe('findById / listRecent', () => {
+    it('only returns rows owned by the caller', async () => {
+      const owner = new LlmGenerationTracingModel(serverDB, userId);
+      const stranger = new LlmGenerationTracingModel(serverDB, otherUserId);
+
+      const { id } = await owner.record({
+        promptHash: 'aaaaaa',
+        promptVersion: 'v1.0',
+        scenario: 'memory_extract',
+        success: true,
+      });
+
+      expect(await stranger.findById(id)).toBeNull();
+      expect(await stranger.listRecent()).toHaveLength(0);
+
+      const rows = await owner.listRecent();
+      expect(rows).toHaveLength(1);
+      expect(rows[0].id).toBe(id);
+    });
+  });
+});
@@ -0,0 +1,121 @@
+import { and, desc, eq } from 'drizzle-orm';
+
+import type {
+  LlmGenerationFeedbackSignal,
+  LlmGenerationFeedbackSource,
+  NewLlmGenerationTracing,
+} from '../schemas/llmGenerationTracing';
+import { llmGenerationTracing } from '../schemas/llmGenerationTracing';
+import type { LobeChatDatabase } from '../type';
+
+export interface RecordLlmGenerationParams {
+  agentId?: string | null;
+  costUsd?: number | null;
+  errorCode?: string | null;
+  errorDetail?: string | null;
+  inputHash?: string | null;
+  inputHint?: string | null;
+  inputTokens?: number | null;
+  latencyMs?: number | null;
+  metadata?: Record<string, unknown>;
+  model?: string | null;
+  outputTokens?: number | null;
+  parentTracingId?: string | null;
+  promptHash: string;
+  promptVersion: string;
+  provider?: string | null;
+  scenario: string;
+  schemaName?: string | null;
+  spanId?: string | null;
+  storageKey?: string | null;
+  success: boolean;
+  topicId?: string | null;
+  traceId?: string | null;
+  trigger?: string | null;
+  validationFailed?: boolean;
+}
+
+export interface UpdateLlmGenerationFeedbackParams {
+  data?: Record<string, unknown>;
+  score?: number | null;
+  signal: LlmGenerationFeedbackSignal;
+  source: LlmGenerationFeedbackSource;
+}
+
+export class LlmGenerationTracingModel {
+  private readonly db: LobeChatDatabase;
+  private readonly userId: string;
+
+  constructor(db: LobeChatDatabase, userId: string) {
+    this.db = db;
+    this.userId = userId;
+  }
+
+  async record(params: RecordLlmGenerationParams): Promise<{ id: string }> {
+    const values: NewLlmGenerationTracing = {
+      agentId: params.agentId ?? null,
+      costUsd: params.costUsd ?? null,
+      errorCode: params.errorCode ?? null,
+      errorDetail: params.errorDetail ?? null,
+      inputHash: params.inputHash ?? null,
+      inputHint: params.inputHint ?? null,
+      inputTokens: params.inputTokens ?? null,
+      latencyMs: params.latencyMs ?? null,
+      metadata: params.metadata ?? {},
+      model: params.model ?? null,
+      outputTokens: params.outputTokens ?? null,
+      parentTracingId: params.parentTracingId ?? null,
+      promptHash: params.promptHash,
+      promptVersion: params.promptVersion,
+      provider: params.provider ?? null,
+      scenario: params.scenario,
+      schemaName: params.schemaName ?? null,
+      spanId: params.spanId ?? null,
+      storageKey: params.storageKey ?? null,
+      success: params.success,
+      topicId: params.topicId ?? null,
+      traceId: params.traceId ?? null,
+      trigger: params.trigger ?? null,
+      userId: this.userId,
+      validationFailed: params.validationFailed ?? false,
+    };
+
+    const [row] = await this.db
+      .insert(llmGenerationTracing)
+      .values(values)
+      .returning({ id: llmGenerationTracing.id });
+
+    return { id: row.id };
+  }
+
+  async updateFeedback(id: string, params: UpdateLlmGenerationFeedbackParams): Promise<void> {
+    await this.db
+      .update(llmGenerationTracing)
+      .set({
+        feedbackData: params.data,
+        feedbackScore: params.score ?? null,
+        feedbackSignal: params.signal,
+        feedbackSource: params.source,
+        feedbackUpdatedAt: new Date(),
+      })
+      .where(and(eq(llmGenerationTracing.id, id), eq(llmGenerationTracing.userId, this.userId)));
+  }
+
+  async findById(id: string) {
+    const [row] = await this.db
+      .select()
+      .from(llmGenerationTracing)
+      .where(and(eq(llmGenerationTracing.id, id), eq(llmGenerationTracing.userId, this.userId)))
+      .limit(1);
+    return row ?? null;
+  }
+
+  async listRecent(limit = 50) {
+    return this.db
+      .select()
+      .from(llmGenerationTracing)
+      .where(eq(llmGenerationTracing.userId, this.userId))
+      .orderBy(desc(llmGenerationTracing.createdAt))
+      .limit(limit);
+  }
+}
@@ -0,0 +1,18 @@
+{
+  "name": "@lobechat/llm-generation-tracing",
+  "version": "1.0.0",
+  "private": true,
+  "exports": {
+    ".": "./src/index.ts",
+    "./store": "./src/store/types.ts"
+  },
+  "main": "./src/index.ts",
+  "bin": {
+    "llm-tracing": "./src/cli/index.ts",
+    "lt": "./src/cli/index.ts"
+  },
+  "dependencies": {
+    "@lobechat/const": "workspace:*",
+    "commander": "^13.1.0"
+  }
+}
@@ -0,0 +1,18 @@
+#!/usr/bin/env bun
+
+import { Command } from 'commander';
+
+import { registerInspectCommand } from './inspect';
+import { registerListCommand } from './list';
+
+const program = new Command();
+
+program
+  .name('llm-tracing')
+  .description('Inspect local llm-generation-tracing records under .llm-generation-tracing/')
+  .version('1.0.0');
+
+registerInspectCommand(program);
+registerListCommand(program);
+
+program.parse();
@@ -0,0 +1,35 @@
+import type { Command } from 'commander';
+
+import { FileTracingStore } from '../store/file-store';
+import type { TracingPayload } from '../types';
+import { renderPayloadDetail } from '../viewer';
+
+export function registerInspectCommand(program: Command) {
+  program
+    .command('inspect', { isDefault: true })
+    .alias('i')
+    .description('Inspect a tracing record by tracing_id prefix (defaults to latest)')
+    .argument('[tracingId]', 'tracing_id or prefix; omit to inspect the latest record')
+    .option('-j, --json', 'Output the raw JSON payload')
+    .option('-f, --full', 'Show full system_prompt / input / output (no truncation)')
+    .action(async (tracingId: string | undefined, opts: { full?: boolean; json?: boolean }) => {
+      const store = new FileTracingStore();
+      let record: TracingPayload | null;
+      if (tracingId) {
+        record = await store.findByTracingId(tracingId);
+      } else {
+        record = await store.getLatest();
+      }
+
+      if (!record) {
+        console.error(
+          tracingId
+            ? `No tracing record matched id prefix: ${tracingId}`
+            : 'No tracing records found. Run a generateObject call first (NODE_ENV=development).',
+        );
+        process.exit(1);
+      }
+
+      console.info(opts.json ? JSON.stringify(record, null, 2) : renderPayloadDetail(record, opts));
+    });
+}
@@ -0,0 +1,21 @@
+import type { Command } from 'commander';
+
+import { FileTracingStore } from '../store/file-store';
+import { renderSummaryTable } from '../viewer';
+
+export function registerListCommand(program: Command) {
+  program
+    .command('list')
+    .alias('ls')
+    .description('List recent llm-generation-tracing records (newest first)')
+    .option('-l, --limit <n>', 'Max number of records to show', '20')
+    .option('-s, --scenario <name>', 'Filter by scenario (e.g. input_completion, topic_title)')
+    .option('-j, --json', 'Output as JSON instead of a table')
+    .action(async (opts: { json?: boolean; limit: string; scenario?: string }) => {
+      const store = new FileTracingStore();
+      let limit = Number.parseInt(opts.limit, 10);
+      if (Number.isNaN(limit) || limit < 1) limit = 20;
+      const summaries = await store.list({ limit, scenario: opts.scenario });
+      console.info(opts.json ? JSON.stringify(summaries, null, 2) : renderSummaryTable(summaries));
+    });
+}
@@ -0,0 +1,19 @@
+export { computeInputHash, computePromptHash } from './promptHash';
+export type { ResolveScenarioInput } from './registry';
+export {
+  resolveScenario,
+  TRACING_SCENARIO_REGISTRY,
+  UNKNOWN_PROMPT_VERSION,
+  UNKNOWN_SCENARIO,
+} from './registry';
+export { DEFAULT_DIR, FileTracingStore } from './store/file-store';
+export type { ITracingStore, SaveResult } from './store/types';
+export type {
+  LlmGenerationFeedbackSignal,
+  ScenarioDefinition,
+  TracingErrorPayload,
+  TracingModelMetadata,
+  TracingOptions,
+  TracingPayload,
+  TracingSummary,
+} from './types';
@@ -0,0 +1,42 @@
+import { describe, expect, it } from 'vitest';
+
+import { computeInputHash, computePromptHash } from './promptHash';
+
+describe('computePromptHash', () => {
+  it('returns a 6-char hex digest', () => {
+    const hash = computePromptHash('you are a helpful agent', { type: 'object' });
+    expect(hash).toHaveLength(6);
+    expect(hash).toMatch(/^[0-9a-f]{6}$/);
+  });
+
+  it('is stable across calls with the same input', () => {
+    const a = computePromptHash('prompt-A', { foo: 1 });
+    const b = computePromptHash('prompt-A', { foo: 1 });
+    expect(a).toBe(b);
+  });
+
+  it('changes when system prompt changes', () => {
+    const a = computePromptHash('prompt-A', { foo: 1 });
+    const b = computePromptHash('prompt-B', { foo: 1 });
+    expect(a).not.toBe(b);
+  });
+
+  it('changes when schema changes', () => {
+    const a = computePromptHash('prompt', { foo: 1 });
+    const b = computePromptHash('prompt', { foo: 2 });
+    expect(a).not.toBe(b);
+  });
+
+  it('treats missing schema and empty schema differently', () => {
+    const undef = computePromptHash('prompt', undefined);
+    const empty = computePromptHash('prompt', {});
+    expect(undef).not.toBe(empty);
+  });
+});
+
+describe('computeInputHash', () => {
+  it('returns a full-length sha256 hex', () => {
+    expect(computeInputHash('hello')).toHaveLength(64);
+    expect(computeInputHash({ a: 1 })).toMatch(/^[0-9a-f]{64}$/);
+  });
+});
@@ -0,0 +1,29 @@
+import { createHash } from 'node:crypto';
+
+const SHORT_LENGTH = 6;
+
+/**
+ * Compute the 6-char prompt hash used to detect silently-mutated prompts.
+ *
+ * Hash input: `systemPrompt + '\n---\n' + JSON.stringify(schema)` — schema MUST
+ * be a deterministic JSON form (e.g. zod-to-json-schema). Keys in objects are
+ * stringified in insertion order; the caller is responsible for normalising.
+ */
+export const computePromptHash = (systemPrompt: string, schema: unknown): string => {
+  const schemaPart = schema === undefined ? '' : JSON.stringify(schema);
+  const hash = createHash('sha256');
+  hash.update(systemPrompt);
+  hash.update('\n---\n');
+  hash.update(schemaPart);
+  return hash.digest('hex').slice(0, SHORT_LENGTH);
+};
+
+/**
+ * sha256 of normalized input — for dedup / cache-hit analysis. Returned as the
+ * full hex digest; truncate at the caller if storage size matters.
+ */
+export const computeInputHash = (input: unknown): string => {
+  const hash = createHash('sha256');
+  hash.update(typeof input === 'string' ? input : JSON.stringify(input));
+  return hash.digest('hex');
+};
@@ -0,0 +1,52 @@
+import { describe, expect, it } from 'vitest';
+
+import {
+  resolveScenario,
+  TRACING_SCENARIO_REGISTRY,
+  UNKNOWN_PROMPT_VERSION,
+  UNKNOWN_SCENARIO,
+} from './registry';
+
+describe('TRACING_SCENARIO_REGISTRY', () => {
+  it('maps known triggers to scenario names (no versions)', () => {
+    expect(TRACING_SCENARIO_REGISTRY.topic).toBe('topic_title');
+    expect(TRACING_SCENARIO_REGISTRY.memory).toBe('memory_extract');
+  });
+});
+
+describe('resolveScenario', () => {
+  it('looks the scenario up by trigger and uses the caller-supplied promptVersion', () => {
+    expect(resolveScenario({ promptVersion: 'v3.1', trigger: 'topic' })).toEqual({
+      promptVersion: 'v3.1',
+      scenario: 'topic_title',
+    });
+  });
+
+  it('honours an explicit scenario override even when trigger has a registry mapping', () => {
+    expect(
+      resolveScenario({
+        promptVersion: 'v2.1',
+        scenario: 'signal_skill_intent',
+        trigger: 'agent_signal',
+      }),
+    ).toEqual({ promptVersion: 'v2.1', scenario: 'signal_skill_intent' });
+  });
+
+  it('falls back to UNKNOWN_PROMPT_VERSION when no version is provided', () => {
+    expect(resolveScenario({ scenario: 'custom_thing' })).toEqual({
+      promptVersion: UNKNOWN_PROMPT_VERSION,
+      scenario: 'custom_thing',
+    });
+  });
+
+  it('falls back to the unknown scenario sentinel when neither matches', () => {
+    expect(resolveScenario({ trigger: 'does_not_exist' })).toEqual({
+      promptVersion: UNKNOWN_PROMPT_VERSION,
+      scenario: UNKNOWN_SCENARIO,
+    });
+    expect(resolveScenario({})).toEqual({
+      promptVersion: UNKNOWN_PROMPT_VERSION,
+      scenario: UNKNOWN_SCENARIO,
+    });
+  });
+});
@@ -0,0 +1,67 @@
+import { TRACING_SCENARIOS, type TracingScenario } from '@lobechat/const';
+
+import type { ScenarioDefinition } from './types';
+
+/**
+ * Stable `trigger → scenario` mapping. Maps a `RequestTrigger` value to the
+ * default scenario name used for tracing.
+ *
+ * Triggers that fan out into multiple scenarios (e.g. `agent_signal` →
+ * `signal_skill_intent` / `signal_feedback_satisfaction` / ...) deliberately
+ * have no default entry here; those callers pass an explicit
+ * `metadata.scenario` instead.
+ *
+ * **Note on prompt versions**: version intentionally lives next to the prompt
+ * it describes (see `tracing.ts` files / `*_PROMPT_VERSION` constants near the
+ * `generateObject` call site). When the prompt or schema changes, bump that
+ * local constant — keeping the version next to the thing it versions avoids
+ * the drift you'd get from a central table that nobody remembers to update.
+ *
+ * For the full directory of scenario *names*, see `@lobechat/const`
+ * `TRACING_SCENARIOS`.
+ */
+export const TRACING_SCENARIO_REGISTRY: Record<string, TracingScenario> = {
+  agent_signal: TRACING_SCENARIOS.AgentSignal,
+  memory: TRACING_SCENARIOS.MemoryExtract,
+  signup_email_llm_review: TRACING_SCENARIOS.SignupEmailReview,
+  topic: TRACING_SCENARIOS.TopicTitle,
+};
+
+export const UNKNOWN_SCENARIO = TRACING_SCENARIOS.Unknown;
+export const UNKNOWN_PROMPT_VERSION = 'v0';
+
+export interface ResolveScenarioInput {
+  /**
+   * Prompt version supplied by the caller. Conventionally a `v<major>.<minor>`
+   * constant declared next to the prompt definition. Missing values resolve to
+   * `UNKNOWN_PROMPT_VERSION` so tracing still records the row.
+   */
+  promptVersion?: string;
+  /** Override scenario name (e.g. `signal_skill_intent`); takes precedence over registry. */
+  scenario?: string;
+  /** RequestTrigger value (string form). */
+  trigger?: string;
+}
+
+/**
+ * Pick the `{ scenario, promptVersion }` for a tracing record.
+ *
+ * Resolution order:
+ *   1. `input.scenario` if provided
+ *   2. registry lookup by `input.trigger`
+ *   3. `UNKNOWN_SCENARIO` sentinel
+ *
+ * `promptVersion` is always passed through from the caller (or
+ * `UNKNOWN_PROMPT_VERSION` if absent). The registry never assigns versions —
+ * they live with the prompt.
+ */
+export const resolveScenario = (input: ResolveScenarioInput): ScenarioDefinition => {
+  const scenario =
+    input.scenario ??
+    (input.trigger ? TRACING_SCENARIO_REGISTRY[input.trigger] : undefined) ??
+    UNKNOWN_SCENARIO;
+  return {
+    promptVersion: input.promptVersion ?? UNKNOWN_PROMPT_VERSION,
+    scenario,
+  };
+};
@@ -0,0 +1,117 @@
+import fs from 'node:fs/promises';
+import os from 'node:os';
+import path from 'node:path';
+
+import { afterEach, beforeEach, describe, expect, it } from 'vitest';
+
+import type { TracingPayload } from '../types';
+import { DEFAULT_DIR, FileTracingStore } from './file-store';
+
+let tmpRoot: string;
+
+const makePayload = (overrides: Partial<TracingPayload> = {}): TracingPayload => ({
+  created_at: new Date('2026-05-22T11:22:33.444Z').getTime(),
+  prompt_hash: 'abcdef',
+  prompt_version: 'v1.0',
+  scenario: 'home_brief',
+  tracing_id: '00000000-0000-0000-0000-000000000001',
+  version: '1.0',
+  ...overrides,
+});
+
+beforeEach(async () => {
+  tmpRoot = await fs.mkdtemp(path.join(os.tmpdir(), 'llm-gen-trace-test-'));
+});
+
+afterEach(async () => {
+  await fs.rm(tmpRoot, { force: true, recursive: true });
+});
+
+describe('FileTracingStore', () => {
+  it('writes payloads under {scenario}/{promptVersion}-{promptHash}/ and returns a null key (local-only)', async () => {
+    const store = new FileTracingStore(tmpRoot);
+    const payload = makePayload();
+
+    const { key } = await store.save(payload);
+    // Local store is non-shareable — DB should leave `storage_key` empty.
+    expect(key).toBeNull();
+
+    const dir = path.join(tmpRoot, DEFAULT_DIR, 'home_brief', 'v1.0-abcdef');
+    const entries = await fs.readdir(dir);
+    const jsonFiles = entries.filter((f) => f.endsWith('.json'));
+    expect(jsonFiles).toHaveLength(1);
+
+    const raw = await fs.readFile(path.join(dir, jsonFiles[0]), 'utf8');
+    expect(JSON.parse(raw)).toMatchObject({
+      prompt_hash: 'abcdef',
+      scenario: 'home_brief',
+      tracing_id: payload.tracing_id,
+    });
+  });
+
+  it('updates the latest.json symlink to point at the freshest record', async () => {
+    const store = new FileTracingStore(tmpRoot);
+    await store.save(makePayload({ tracing_id: 'aaaa-1' }));
+    await store.save(
+      makePayload({
+        created_at: new Date('2026-05-22T11:30:00.000Z').getTime(),
+        scenario: 'topic_title',
+        tracing_id: 'bbbb-2',
+      }),
+    );
+
+    const latestPath = path.join(tmpRoot, DEFAULT_DIR, 'latest.json');
+    const target = await fs.realpath(latestPath);
+    const content = await fs.readFile(target, 'utf8');
+    expect(JSON.parse(content)).toMatchObject({
+      scenario: 'topic_title',
+      tracing_id: 'bbbb-2',
+    });
+  });
+
+  it('lists recent records as flat summaries newest-first', async () => {
+    const store = new FileTracingStore(tmpRoot);
+    await store.save(
+      makePayload({
+        created_at: new Date('2026-05-22T11:00:00.000Z').getTime(),
+        scenario: 'home_brief',
+        tracing_id: 'aaaa',
+      }),
+    );
+    await store.save(
+      makePayload({
+        created_at: new Date('2026-05-22T12:00:00.000Z').getTime(),
+        scenario: 'memory_extract',
+        tracing_id: 'bbbb',
+      }),
+    );
+
+    const summaries = await store.list();
+    expect(summaries.map((s) => s.tracing_id)).toEqual(['bbbb', 'aaaa']);
+  });
+
+  it('round-trips a payload via get() using the on-disk file path', async () => {
+    const store = new FileTracingStore(tmpRoot);
+    const payload = makePayload({
+      input: { messages: [{ content: 'hi', role: 'user' }] },
+      output: { topic: 'greeting' },
+    });
+    await store.save(payload);
+
+    // save() returns a null key, so locate the file on disk and read via its path.
+    const dir = path.join(tmpRoot, DEFAULT_DIR, 'home_brief', 'v1.0-abcdef');
+    const jsonFile = (await fs.readdir(dir)).find((f) => f.endsWith('.json'));
+    if (!jsonFile) throw new Error('expected a saved tracing file to exist');
+    const loaded = await store.get(path.join(dir, jsonFile));
+    expect(loaded).toMatchObject({
+      input: { messages: [{ content: 'hi', role: 'user' }] },
+      output: { topic: 'greeting' },
+      tracing_id: payload.tracing_id,
+    });
+  });
+
+  it('returns null when get() targets a missing key', async () => {
+    const store = new FileTracingStore(tmpRoot);
+    expect(await store.get('not/a/real/key.json')).toBeNull();
+  });
+});
@@ -0,0 +1,167 @@
+import type { Dirent } from 'node:fs';
+import fs from 'node:fs/promises';
+import path from 'node:path';
+
+import type { TracingPayload, TracingSummary } from '../types';
+import type { ITracingStore, SaveResult } from './types';
+
+export const DEFAULT_DIR = '.llm-generation-tracing';
+
+const safeSegment = (value: string): string => value.replaceAll(/[^\w.-]+/g, '_') || 'unknown';
+
+/**
+ * Local / dev / desktop store. Writes plain JSON (no compression) so contents
+ * can be inspected with `cat`. Layout mirrors the S3 key pattern:
+ *
+ *   .llm-generation-tracing/{scenario}/{promptVersion}-{promptHash}/{file}.json
+ *
+ * Keeps a top-level `latest.json` symlink pointing at the most recent record.
+ */
+export class FileTracingStore implements ITracingStore {
+  private readonly root: string;
+
+  constructor(rootDir?: string) {
+    this.root = path.resolve(rootDir ?? process.cwd(), DEFAULT_DIR);
+  }
+
+  async save(record: TracingPayload): Promise<SaveResult> {
+    const dir = this.bucketDir(record);
+    await fs.mkdir(dir, { recursive: true });
+
+    const ts = new Date(record.created_at).toISOString().replaceAll(':', '-');
+    const shortId = safeSegment(record.tracing_id.slice(0, 12));
+    const filename = `${ts}_${shortId}.json`;
+    const filePath = path.join(dir, filename);
+
+    await fs.writeFile(filePath, JSON.stringify(record, null, 2), 'utf8');
+    await this.updateLatestSymlink(filePath);
+
+    // Local-only path — return null so the DB row's `storage_key` stays empty.
+    // The CLI rediscovers files by walking `.llm-generation-tracing/`.
+    return { key: null };
+  }
+
+  async get(key: string): Promise<TracingPayload | null> {
+    const target = path.isAbsolute(key) ? key : path.join(this.root, key);
+    try {
+      const content = await fs.readFile(target, 'utf8');
+      return JSON.parse(content) as TracingPayload;
+    } catch {
+      return null;
+    }
+  }
+
+  async list(options?: { limit?: number; scenario?: string }): Promise<TracingSummary[]> {
+    const limit = options?.limit ?? 20;
+    const files = await this.collectFiles();
+    files.sort((a, b) => (a.filename < b.filename ? 1 : -1));
+
+    const summaries: TracingSummary[] = [];
+    for (const file of files) {
+      if (summaries.length >= limit) break;
+      try {
+        const content = await fs.readFile(file.fullPath, 'utf8');
+        const record = JSON.parse(content) as TracingPayload;
+        if (options?.scenario && record.scenario !== options.scenario) continue;
+        summaries.push({
+          created_at: record.created_at,
+          model: record.model_metadata?.model,
+          prompt_version: record.prompt_version,
+          scenario: record.scenario,
+          success: !record.error,
+          tracing_id: record.tracing_id,
+          validation_failed: record.validation_failed,
+        });
+      } catch {
+        // skip corrupted files
+      }
+    }
+    return summaries;
+  }
+
+  /**
+   * CLI helper: find a payload by tracing_id prefix. Returns the most-recent
+   * match when several rows share the same prefix (e.g. truncated short id).
+   */
+  async findByTracingId(prefix: string): Promise<TracingPayload | null> {
+    const files = await this.collectFiles();
+    files.sort((a, b) => (a.filename < b.filename ? 1 : -1));
+    for (const file of files) {
+      try {
+        const content = await fs.readFile(file.fullPath, 'utf8');
+        const record = JSON.parse(content) as TracingPayload;
+        if (record.tracing_id.startsWith(prefix)) return record;
+      } catch {
+        // skip corrupted files
+      }
+    }
+    return null;
+  }
+
+  /** CLI helper: resolve the `latest.json` symlink (or fall back to the newest file). */
+  async getLatest(): Promise<TracingPayload | null> {
+    const latestPath = path.join(this.root, 'latest.json');
+    try {
+      const real = await fs.realpath(latestPath);
+      const content = await fs.readFile(real, 'utf8');
+      return JSON.parse(content) as TracingPayload;
+    } catch {
+      // symlink missing or unreadable — fall back to newest by filename order
+    }
+    const files = await this.collectFiles();
+    if (files.length === 0) return null;
+    files.sort((a, b) => (a.filename < b.filename ? 1 : -1));
+    try {
+      const content = await fs.readFile(files[0].fullPath, 'utf8');
+      return JSON.parse(content) as TracingPayload;
+    } catch {
+      return null;
+    }
+  }
+
+  private bucketDir(record: TracingPayload): string {
+    // Compose the relative segment as a single string so Turbopack / Webpack
+    // static analyzers don't try to enumerate path.join's multi-arg pattern
+    // (which fans out into a glob match against the project).
+    const sub = `${safeSegment(record.scenario)}/${safeSegment(record.prompt_version)}-${safeSegment(record.prompt_hash)}`;
+    return path.join(this.root, sub);
+  }
+
+  private async updateLatestSymlink(filePath: string): Promise<void> {
+    const latestPath = path.join(this.root, 'latest.json');
+    try {
+      await fs.unlink(latestPath);
+    } catch {
+      // ignore — no previous symlink
+    }
+    try {
+      await fs.symlink(path.relative(this.root, filePath), latestPath);
+    } catch {
+      // file systems without symlink support (e.g. Windows w/o dev mode) — silently skip
+    }
+  }
+
+  private async collectFiles(): Promise<{ filename: string; fullPath: string }[]> {
+    const results: { filename: string; fullPath: string }[] = [];
+
+    const walk = async (dir: string): Promise<void> => {
+      let entries: Dirent[];
+      try {
+        entries = await fs.readdir(dir, { withFileTypes: true });
+      } catch {
+        return;
+      }
+      for (const entry of entries) {
+        const full = path.join(dir, entry.name);
+        if (entry.isDirectory()) {
+          await walk(full);
+        } else if (entry.name.endsWith('.json') && entry.name !== 'latest.json') {
+          results.push({ filename: entry.name, fullPath: full });
+        }
+      }
+    };
+
+    await walk(this.root);
+    return results;
+  }
+}
@@ -0,0 +1,20 @@
+import type { TracingPayload, TracingSummary } from '../types';
+
+export interface SaveResult {
+  /**
+   * Canonical, globally addressable key for the saved payload (e.g. an S3
+   * object key). `null` when the payload was persisted only to a local /
+   * non-shareable location — the service should then leave `storage_key`
+   * empty in the DB rather than record a path no other process can resolve.
+   */
+  key: string | null;
+}
+
+export interface ITracingStore {
+  /** Optional retrieval — used by CLI / debug tooling only. */
+  get?: (key: string) => Promise<TracingPayload | null>;
+  /** Optional listing — used by CLI / debug tooling only. */
+  list?: (options?: { limit?: number }) => Promise<TracingSummary[]>;
+  /** Persist a tracing payload; returns the storage key for cross-reference. */
+  save: (record: TracingPayload) => Promise<SaveResult>;
+}
@@ -0,0 +1,100 @@
+export type LlmGenerationFeedbackSignal = 'positive' | 'negative' | 'neutral';
+
+export interface TracingErrorPayload {
+  code?: string;
+  message?: string;
+  stack?: string;
+}
+
+export interface TracingModelMetadata {
+  [key: string]: unknown;
+  finish_reason?: string;
+  model?: string;
+  provider?: string;
+}
+
+/**
+ * Blob payload written to the store. Mirrors the design's Blob schema —
+ * the DB row stores indexable summary columns; this carries the full prompt /
+ * input / output detail for offline analysis.
+ *
+ * Version field guards future schema evolution.
+ */
+export interface TracingPayload {
+  created_at: number;
+  error?: TracingErrorPayload;
+  input?: unknown;
+  model_metadata?: TracingModelMetadata;
+  output?: unknown;
+  prompt_hash: string;
+  prompt_version: string;
+  raw_output?: string;
+  scenario: string;
+  schema?: unknown;
+  system_prompt?: string;
+  /** Unique id of the tracing row in the DB. Used by the store to build the key. */
+  tracing_id: string;
+  validation_failed?: boolean;
+  version: '1.0';
+}
+
+export interface TracingSummary {
+  created_at: number;
+  latency_ms?: number;
+  model?: string;
+  prompt_version: string;
+  scenario: string;
+  success: boolean;
+  tracing_id: string;
+  validation_failed?: boolean;
+}
+
+export interface ScenarioDefinition {
+  /** Human-bumped prompt version (e.g. `v1.0`). */
+  promptVersion: string;
+  /** Symbolic scenario name, used for grouping and partitioning storage. */
+  scenario: string;
+}
+
+/**
+ * Caller-facing tracing config for a single `generateObject` call. Passed
+ * through `GenerateObjectOptions.tracing` and consumed by the tracing hook
+ * to populate the `llm_generation_tracing` DB row + off-DB blob.
+ *
+ * Every field is optional — the hook fills sensible defaults (auto-extracted
+ * `inputHint`, registry-resolved `scenario`, `messages[0]` as system prompt).
+ * Supply the fields explicitly to keep the DB row scannable.
+ */
+export interface TracingOptions {
+  /** Owning agent ID; persisted to `agent_id`. */
+  agentId?: string;
+  /**
+   * Short snippet stored on `input_hint`. Pass the user's actual typed text
+   * when the prompt wraps it in a template — otherwise the auto-extracted
+   * hint ends up being the wrapper's first user message (e.g.
+   * `Before cursor: "…" After cursor: "…"`) instead of what the user wrote.
+   */
+  inputHint?: string;
+  /**
+   * Free-form context written to the row's `metadata` jsonb column. Use this
+   * for ad-hoc fields that don't deserve a typed slot (e.g. correlation IDs).
+   */
+  metadata?: Record<string, unknown>;
+  /** Parent tracing row for chained generations. */
+  parentTracingId?: string;
+  /** Semantic prompt version (e.g. `v1.0`). */
+  promptVersion?: string;
+  /** Scenario name; falls back to registry lookup by `trigger`. */
+  scenario?: string;
+  /** Structured-output schema identifier. */
+  schemaName?: string;
+  /**
+   * Override for the prompt-hash system text. Defaults to `messages[0]`
+   * when it's a system message.
+   */
+  systemPrompt?: string;
+  /** Topic / conversation ID. */
+  topicId?: string;
+  /** RequestTrigger string. */
+  trigger?: string;
+}
@@ -0,0 +1,232 @@
+import type { TracingPayload, TracingSummary } from '../types';
+
+// ANSI color helpers — keep parity with @lobechat/agent-tracing's viewer.
+const dim = (s: string) => `\x1B[2m${s}\x1B[22m`;
+const bold = (s: string) => `\x1B[1m${s}\x1B[22m`;
+const green = (s: string) => `\x1B[32m${s}\x1B[39m`;
+const red = (s: string) => `\x1B[31m${s}\x1B[39m`;
+const yellow = (s: string) => `\x1B[33m${s}\x1B[39m`;
+const cyan = (s: string) => `\x1B[36m${s}\x1B[39m`;
+const magenta = (s: string) => `\x1B[35m${s}\x1B[39m`;
+
+const PREVIEW_CHARS = 120;
+const FULL_PREVIEW_CHARS = 4000;
+
+const padEnd = (text: string, width: number): string =>
+  text.length >= width ? text : text + ' '.repeat(width - text.length);
+
+const formatTime = (timestamp: number): string => {
+  const d = new Date(timestamp);
+  const pad = (n: number) => String(n).padStart(2, '0');
+  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())} ${pad(d.getHours())}:${pad(d.getMinutes())}:${pad(d.getSeconds())}`;
+};
+
+const previewLine = (text: string, maxLen: number): string => {
+  const single = text.replaceAll(/\s+/g, ' ').trim();
+  if (single.length <= maxLen) return single;
+  return `${single.slice(0, maxLen - 1)}…`;
+};
+
+const stringify = (value: unknown): string =>
+  typeof value === 'string' ? value : JSON.stringify(value, null, 2);
+
+const statusOf = (record: TracingPayload): string => {
+  if (record.error) return red('error');
+  if (record.validation_failed) return yellow('validation-fail');
+  return green('ok');
+};
+
+export const renderSummaryTable = (summaries: TracingSummary[]): string => {
+  if (summaries.length === 0) return dim('No tracing records found.');
+
+  const rows = summaries.map((s) => ({
+    created: formatTime(s.created_at),
+    id: s.tracing_id.slice(0, 12),
+    model: s.model ?? '-',
+    scenario: s.scenario,
+    statusRaw: s.success ? (s.validation_failed ? 'validation-fail' : 'ok') : 'error',
+    version: s.prompt_version,
+  }));
+
+  // Column widths include a 2-space right gutter so the next column never
+  // butts up against this one.
+  const widths = {
+    created: 19,
+    id: 14,
+    model: Math.max(8, 'MODEL'.length, ...rows.map((r) => r.model.length)) + 2,
+    scenario: Math.max(10, 'SCENARIO'.length, ...rows.map((r) => r.scenario.length)) + 2,
+    status: Math.max(8, 'STATUS'.length, 'validation-fail'.length) + 2,
+    version: Math.max(7, 'VERSION'.length, ...rows.map((r) => r.version.length)) + 2,
+  };
+
+  const colorStatus = (status: string): string =>
+    status === 'ok' ? green(status) : status === 'error' ? red(status) : yellow(status);
+
+  // Pad first (using raw text length), then colorize — keeps column alignment
+  // independent of ANSI escape codes.
+  const header =
+    bold(padEnd('ID', widths.id)) +
+    bold(padEnd('SCENARIO', widths.scenario)) +
+    bold(padEnd('VERSION', widths.version)) +
+    bold(padEnd('MODEL', widths.model)) +
+    bold(padEnd('STATUS', widths.status)) +
+    bold('CREATED');
+
+  const padCell = (text: string, width: number): string =>
+    ' '.repeat(Math.max(0, width - text.length));
+
+  const body = rows.map(
+    (r) =>
+      cyan(r.id) +
+      padCell(r.id, widths.id) +
+      r.scenario +
+      padCell(r.scenario, widths.scenario) +
+      r.version +
+      padCell(r.version, widths.version) +
+      magenta(r.model) +
+      padCell(r.model, widths.model) +
+      colorStatus(r.statusRaw) +
+      padCell(r.statusRaw, widths.status) +
+      dim(r.created),
+  );
+
+  const ruleWidth =
+    widths.id + widths.scenario + widths.version + widths.model + widths.status + widths.created;
+  return [header, dim('─'.repeat(ruleWidth)), ...body].join('\n');
+};
+
+const roleColor = (role: string): ((s: string) => string) => {
+  if (role === 'user') return green;
+  if (role === 'assistant') return cyan;
+  if (role === 'system') return magenta;
+  return yellow;
+};
+
+const renderInputMessages = (input: unknown, full: boolean): string[] => {
+  if (!Array.isArray(input))
+    return [`  ${dim(previewLine(stringify(input), full ? FULL_PREVIEW_CHARS : PREVIEW_CHARS))}`];
+
+  const lines: string[] = [];
+  for (let i = 0; i < input.length; i++) {
+    const msg = (input[i] ?? {}) as { content?: unknown; role?: string };
+    const role = msg.role ?? 'unknown';
+    const rawContent =
+      typeof msg.content === 'string' ? msg.content : JSON.stringify(msg.content ?? '');
+    const charCount = rawContent.length;
+    const charLabel = charCount > 0 ? dim(`  ${charCount} chars`) : '';
+    const connector = i === input.length - 1 ? '└─' : '├─';
+    lines.push(`  ${dim(connector)} ${dim(`[${i}]`)} ${roleColor(role)(role)}${charLabel}`);
+    if (rawContent) {
+      const preview = full ? rawContent : previewLine(rawContent, PREVIEW_CHARS);
+      lines.push(`     ${dim(preview)}`);
+    }
+  }
+  return lines;
+};
+
+const renderOutput = (output: unknown, full: boolean): string => {
+  // Inline tiny single-key objects: `{ completion: "怎么样" }` → `completion: "怎么样"`
+  if (
+    output &&
+    typeof output === 'object' &&
+    !Array.isArray(output) &&
+    Object.keys(output).length === 1
+  ) {
+    const [key, value] = Object.entries(output)[0];
+    const rendered = typeof value === 'string' ? `"${value}"` : JSON.stringify(value);
+    if (rendered.length <= PREVIEW_CHARS) return `${cyan(key)}: ${rendered}`;
+  }
+
+  const text = stringify(output);
+  if (full || text.length <= PREVIEW_CHARS * 2) return text;
+  return previewLine(text, PREVIEW_CHARS);
+};
+
+export const renderPayloadDetail = (
+  record: TracingPayload,
+  options: { full?: boolean },
+): string => {
+  const full = !!options.full;
+  const lines: string[] = [];
+
+  // Header — single compact line.
+  const modelLabel =
+    record.model_metadata?.provider || record.model_metadata?.model
+      ? `  ${magenta(`${record.model_metadata?.provider ?? '-'} / ${record.model_metadata?.model ?? '-'}`)}`
+      : '';
+  lines.push(
+    bold('LLM Generation') +
+      `  ${cyan(record.tracing_id.slice(0, 12))}` +
+      `  scenario:${record.scenario}` +
+      `  ${dim(record.prompt_version)}` +
+      modelLabel +
+      `  ${statusOf(record)}` +
+      `  ${dim(formatTime(record.created_at))}`,
+  );
+
+  if (record.error) {
+    lines.push(`${red('Error:')} ${record.error.code ?? '-'} — ${record.error.message ?? '-'}`);
+  }
+
+  // Build sections as a tree. Each section is rendered as `├─ label  meta` then optional indented body.
+  type Section = { body?: string[]; label: string; meta?: string };
+  const sections: Section[] = [];
+
+  if (record.system_prompt) {
+    sections.push({
+      body: full ? [`  ${dim(record.system_prompt)}`] : undefined,
+      label: 'system_prompt',
+      meta: full
+        ? dim(`${record.system_prompt.length} chars`)
+        : dim(`${record.system_prompt.length} chars  (use --full to expand)`),
+    });
+  }
+
+  if (record.input !== undefined) {
+    const isArr = Array.isArray(record.input);
+    const count = isArr ? (record.input as unknown[]).length : 1;
+    sections.push({
+      body: renderInputMessages(record.input, full),
+      label: 'input',
+      meta: isArr ? dim(`${count} message${count === 1 ? '' : 's'}`) : undefined,
+    });
+  }
+
+  if (record.output !== undefined) {
+    sections.push({
+      body: [`  ${renderOutput(record.output, full)}`],
+      label: 'output',
+    });
+  }
+
+  if (record.raw_output) {
+    sections.push({
+      body: [`  ${dim(full ? record.raw_output : previewLine(record.raw_output, PREVIEW_CHARS))}`],
+      label: 'raw_output',
+      meta: yellow('validation_failed'),
+    });
+  }
+
+  if (record.schema !== undefined) {
+    const schemaText = stringify(record.schema);
+    sections.push({
+      body: full ? [`  ${dim(schemaText)}`] : undefined,
+      label: 'schema',
+      meta: dim(
+        full ? `${schemaText.length} chars` : `${schemaText.length} chars  (use --full to expand)`,
+      ),
+    });
+  }
+
+  for (let i = 0; i < sections.length; i++) {
+    const s = sections[i];
+    const isLast = i === sections.length - 1;
+    const connector = isLast ? '└─' : '├─';
+    lines.push(`${dim(connector)} ${bold(s.label)}${s.meta ? `  ${s.meta}` : ''}`);
+    if (s.body) {
+      for (const line of s.body) lines.push(line);
+    }
+  }
+
+  return lines.join('\n');
+};
@@ -0,0 +1,7 @@
+{
+  "compilerOptions": {
+    "outDir": "./dist"
+  },
+  "extends": "../../tsconfig.json",
+  "include": ["src/**/*.ts"]
+}
@@ -0,0 +1,11 @@
+import { defineConfig } from 'vitest/config';
+
+export default defineConfig({
+  test: {
+    coverage: {
+      provider: 'v8',
+      reporter: ['text', 'json', 'lcov', 'text-summary'],
+    },
+    environment: 'node',
+  },
+});
@@ -157,7 +157,15 @@ export abstract class BaseMemoryExtractor<

              span.addEvent('gen_ai.request.send');
              const result = await this.runtime.generateObject(payload, {
-                metadata: { trigger: RequestTrigger.Memory },
+                metadata: {
+                  // Optional backlink to the job-level memory trace blob; the
+                  // llm_generation_tracing hook persists it under metadata so a
+                  // per-call row can be traced back to the job-level dump.
+                  ...(options?.parentMemoryTraceKey
+                    ? { parent_memory_trace_key: options.parentMemoryTraceKey }
+                    : {}),
+                  trigger: RequestTrigger.Memory,
+                },
              });
              span.addEvent('gen_ai.response.receive');

@@ -43,6 +43,13 @@ export interface ExtractorOptions extends ExtractorTemplateProps {
    ) => Promise<void> | void;
  };
  messageIds?: string[];
+  /**
+   * S3 key of the parent memory job trace. When provided, propagated into
+   * the per-call `llm_generation_tracing` row as `metadata.parent_memory_trace_key`,
+   * giving offline analysis a backlink from a single generateObject call to the
+   * job-level memory trace that spawned it.
+   */
+  parentMemoryTraceKey?: string;
  sourceId?: string;
  userId?: string;
 }
@@ -716,6 +716,46 @@ describe('ModelRuntime', () => {

        await expect(runtime.generateObject(genObjPayload)).resolves.toEqual({ result: 'ok' });
      });
+
+      it('onGenerateObjectComplete fires on success with output, latency and usage', async () => {
+        const onGenerateObjectComplete = vi.fn();
+        const { runtime, mockRuntimeAI } = createMockRuntime({ onGenerateObjectComplete });
+        const usage = { totalInputTokens: 50, totalOutputTokens: 20, cost: 0.001 };
+        mockRuntimeAI.generateObject.mockImplementation(async (_p: any, opts: any) => {
+          await opts?.onUsage?.(usage);
+          return { result: 'ok' };
+        });
+
+        await runtime.generateObject(genObjPayload);
+
+        expect(onGenerateObjectComplete).toHaveBeenCalledTimes(1);
+        const [data, context] = onGenerateObjectComplete.mock.calls[0];
+        expect(data).toMatchObject({ output: { result: 'ok' }, success: true, usage });
+        expect(data.latencyMs).toBeGreaterThanOrEqual(0);
+        expect(context.payload).toBe(genObjPayload);
+      });
+
+      it('onGenerateObjectComplete fires on failure with structured error and is awaited before throw', async () => {
+        const onGenerateObjectComplete = vi.fn();
+        const { runtime, mockRuntimeAI } = createMockRuntime({ onGenerateObjectComplete });
+        const cause = new Error('boom');
+        mockRuntimeAI.generateObject.mockRejectedValue(cause);
+
+        await expect(runtime.generateObject(genObjPayload)).rejects.toBe(cause);
+        expect(onGenerateObjectComplete).toHaveBeenCalledTimes(1);
+        const [data] = onGenerateObjectComplete.mock.calls[0];
+        expect(data.success).toBe(false);
+        expect(data.error?.message).toBe('boom');
+      });
+
+      it('hook errors thrown from onGenerateObjectComplete are swallowed and do not surface', async () => {
+        const onGenerateObjectComplete = vi.fn().mockRejectedValue(new Error('hook broke'));
+        const { runtime, mockRuntimeAI } = createMockRuntime({ onGenerateObjectComplete });
+        mockRuntimeAI.generateObject.mockResolvedValue({ result: 'ok' });
+
+        await expect(runtime.generateObject(genObjPayload)).resolves.toEqual({ result: 'ok' });
+        expect(onGenerateObjectComplete).toHaveBeenCalledTimes(1);
+      });
    });

    describe('embeddings hooks', () => {
@@ -84,6 +84,25 @@ export interface ModelRuntimeHooks {
    context: { options?: EmbeddingsOptions; payload: EmbeddingsPayload },
  ) => void | Promise<void>;

+  /**
+   * Always fires after `generateObject` returns or throws — success or failure.
+   * Use this for full-lifecycle observability (per-call tracing, prompt analytics).
+   * Unlike `onGenerateObjectFinal`, this fires regardless of whether the runtime
+   * surfaces a `usage` callback, so the gap of "succeeded but no usage" is covered.
+   *
+   * Hook failures are swallowed and logged — they must not interfere with the response.
+   */
+  onGenerateObjectComplete?: (
+    data: {
+      error?: { code?: string; message?: string; stack?: string };
+      latencyMs: number;
+      output?: unknown;
+      success: boolean;
+      usage?: ModelUsage;
+    },
+    context: { options?: GenerateObjectOptions; payload: GenerateObjectPayload },
+  ) => void | Promise<void>;
+
  onGenerateObjectError?: (
    error: ChatCompletionErrorPayload,
    context: { options?: GenerateObjectOptions; payload: GenerateObjectPayload },
@@ -286,16 +305,46 @@ export class ModelRuntime {
  }

  async generateObject(payload: GenerateObjectPayload, options?: GenerateObjectOptions) {
+    const startedAt = Date.now();
+    let usageCapture: ModelUsage | undefined;
+
+    const fireComplete = async (data: {
+      error?: { code?: string; message?: string; stack?: string };
+      output?: unknown;
+      success: boolean;
+    }) => {
+      if (!this._hooks?.onGenerateObjectComplete) return;
+      try {
+        await this._hooks.onGenerateObjectComplete(
+          {
+            error: data.error,
+            latencyMs: Date.now() - startedAt,
+            output: data.output,
+            success: data.success,
+            usage: usageCapture,
+          },
+          { options, payload },
+        );
+      } catch (e) {
+        // Hook failures must not affect the caller — log and move on.
+        console.error('[ModelRuntime] onGenerateObjectComplete hook error:', e);
+      }
+    };
+
    try {
      await this._hooks?.beforeGenerateObject?.(payload, options);

-      const finalOptions = this._hooks?.onGenerateObjectFinal
+      const needsUsageCapture =
+        this._hooks?.onGenerateObjectFinal || this._hooks?.onGenerateObjectComplete;
+
+      const finalOptions = needsUsageCapture
        ? {
            ...options,
            onUsage: async (usage: ModelUsage) => {
+              usageCapture = usage;
              await options?.onUsage?.(usage);
              try {
-                await this._hooks!.onGenerateObjectFinal!({ usage }, { options, payload });
+                await this._hooks?.onGenerateObjectFinal?.({ usage }, { options, payload });
              } catch (e) {
                // Hook failures (billing, tracing) must not interfere with response completion
                console.error('[ModelRuntime] onGenerateObjectFinal hook error:', e);
@@ -304,7 +353,9 @@ export class ModelRuntime {
          }
        : options;

-      return await this._runtime.generateObject!(payload, finalOptions);
+      const output = await this._runtime.generateObject!(payload, finalOptions);
+      await fireComplete({ output, success: true });
+      return output;
    } catch (error) {
      if (this._hooks?.onGenerateObjectError) {
        await this._hooks.onGenerateObjectError(error as ChatCompletionErrorPayload, {
@@ -312,6 +363,11 @@ export class ModelRuntime {
          payload,
        });
      }
+      const err = error as Error & { code?: string };
+      await fireComplete({
+        error: { code: err?.code, message: err?.message, stack: err?.stack },
+        success: false,
+      });
      throw error;
    }
  }
@@ -0,0 +1,67 @@
+import { describe, expect, it, vi } from 'vitest';
+
+import { mergeModelRuntimeHooks } from './mergeHooks';
+import type { ModelRuntimeHooks } from './ModelRuntime';
+
+describe('mergeModelRuntimeHooks', () => {
+  it('returns undefined when both hooks are empty', () => {
+    expect(mergeModelRuntimeHooks(undefined, undefined)).toBeUndefined();
+  });
+
+  it('returns the only present hook untouched', () => {
+    const fn = vi.fn();
+    const merged = mergeModelRuntimeHooks({ beforeChat: fn }, undefined);
+    expect(merged?.beforeChat).toBe(fn);
+  });
+
+  it('chains hooks of the same name in a → b order', async () => {
+    const order: string[] = [];
+    const a: ModelRuntimeHooks = {
+      onGenerateObjectComplete: vi.fn(async () => {
+        order.push('a');
+      }),
+    };
+    const b: ModelRuntimeHooks = {
+      onGenerateObjectComplete: vi.fn(async () => {
+        order.push('b');
+      }),
+    };
+
+    const merged = mergeModelRuntimeHooks(a, b);
+    await merged?.onGenerateObjectComplete?.(
+      { latencyMs: 0, success: true },
+      {} as Parameters<NonNullable<ModelRuntimeHooks['onGenerateObjectComplete']>>[1],
+    );
+    expect(order).toEqual(['a', 'b']);
+    expect(a.onGenerateObjectComplete).toHaveBeenCalledTimes(1);
+    expect(b.onGenerateObjectComplete).toHaveBeenCalledTimes(1);
+  });
+
+  it('does not run b when a throws (a is load-bearing)', async () => {
+    const bSpy = vi.fn();
+    const merged = mergeModelRuntimeHooks(
+      {
+        onGenerateObjectComplete: async () => {
+          throw new Error('billing failed');
+        },
+      },
+      { onGenerateObjectComplete: bSpy },
+    );
+
+    await expect(
+      merged?.onGenerateObjectComplete?.(
+        { latencyMs: 0, success: true },
+        {} as Parameters<NonNullable<ModelRuntimeHooks['onGenerateObjectComplete']>>[1],
+      ),
+    ).rejects.toThrow('billing failed');
+    expect(bSpy).not.toHaveBeenCalled();
+  });
+
+  it('keeps hooks that exist in only one side without wrapping', () => {
+    const onlyInA = vi.fn();
+    const onlyInB = vi.fn();
+    const merged = mergeModelRuntimeHooks({ beforeChat: onlyInA }, { onChatFinal: onlyInB });
+    expect(merged?.beforeChat).toBe(onlyInA);
+    expect(merged?.onChatFinal).toBe(onlyInB);
+  });
+});
@@ -0,0 +1,37 @@
+import type { ModelRuntimeHooks } from './ModelRuntime';
+
+/**
+ * Merge two `ModelRuntimeHooks` instances, chaining handlers that share a key
+ * so both fire in `a → b` order. Designed for composing layered hooks at the
+ * `ModelRuntime` construction site (e.g. billing hooks + tracing hooks).
+ *
+ * - Returns `undefined` only when both inputs are empty.
+ * - Chained hooks run sequentially (`a` first, then `b`); the second hook only
+ *   runs if the first resolves. Place load-bearing hooks (the ones whose
+ *   failure should abort the call) in `a`.
+ */
+export const mergeModelRuntimeHooks = (
+  a?: ModelRuntimeHooks,
+  b?: ModelRuntimeHooks,
+): ModelRuntimeHooks | undefined => {
+  if (!a && !b) return undefined;
+  if (!a) return b;
+  if (!b) return a;
+
+  const merged: ModelRuntimeHooks = { ...a };
+
+  for (const key of Object.keys(b) as (keyof ModelRuntimeHooks)[]) {
+    const existing = merged[key];
+    const next = b[key];
+    if (!existing) {
+      (merged[key] as unknown) = next;
+      continue;
+    }
+    (merged[key] as unknown) = async (...args: unknown[]) => {
+      await (existing as (...args: unknown[]) => Promise<unknown>)(...args);
+      await (next as (...args: unknown[]) => Promise<unknown>)(...args);
+    };
+  }
+
+  return merged;
+};
@@ -1,6 +1,7 @@
 export * from './const/models';
 export * from './core/BaseAI';
 export { pruneReasoningPayload } from './core/contextBuilders/openai';
+export { mergeModelRuntimeHooks } from './core/mergeHooks';
 export type { ModelRuntimeHooks } from './core/ModelRuntime';
 export { ModelRuntime } from './core/ModelRuntime';
 export { createOpenAICompatibleRuntime } from './core/openaiCompatibleFactory';
@@ -36,12 +36,20 @@ export interface GenerateObjectOptions {
   */
  headers?: Record<string, any>;

-  /** Metadata passed to hooks (billing, tracing, etc.) */
+  /** Free-form context passed to hooks (e.g. billing, routing). */
  metadata?: Record<string, unknown>;

  onUsage?: (usage: ModelUsage) => void | Promise<void>;

  signal?: AbortSignal;
+  /**
+   * Structured tracing config consumed by tracing hooks (e.g.
+   * `llm_generation_tracing`). Loosely typed here so the runtime stays
+   * tracing-agnostic; callers should import `TracingOptions` from
+   * `@lobechat/llm-generation-tracing` for the strongly-typed shape.
+   */
+  tracing?: Record<string, unknown>;
+
  /**
   * userId for the GenerateObject
   */
@@ -0,0 +1,42 @@
+import { describe, expect, it } from 'vitest';
+
+import {
+  chainInputCompletion,
+  INPUT_COMPLETION_PROMPT_VERSION,
+  INPUT_COMPLETION_SCHEMA_NAME,
+} from './inputCompletion';
+
+describe('chainInputCompletion', () => {
+  it('returns a system + user message pair', () => {
+    const { messages } = chainInputCompletion('How can I ', '');
+    expect(messages).toHaveLength(2);
+    expect(messages[0].role).toBe('system');
+    expect(messages[1].role).toBe('user');
+    expect(messages[1].content).toContain('Before cursor: "How can I "');
+    expect(messages[1].content).toContain('After cursor: ""');
+  });
+
+  it('attaches a minimal `{ completion: string }` schema for generateObject', () => {
+    const { schema } = chainInputCompletion('hi', '');
+    expect(schema.name).toBe(INPUT_COMPLETION_SCHEMA_NAME);
+    expect(schema.strict).toBe(true);
+    expect(schema.schema.required).toEqual(['completion']);
+    expect(schema.schema.additionalProperties).toBe(false);
+    expect(schema.schema.properties.completion.type).toBe('string');
+  });
+
+  it('appends conversation context to the system prompt when provided', () => {
+    const { messages } = chainInputCompletion('write ', '', [
+      { content: 'previous response', role: 'assistant' },
+      { content: 'previous question', role: 'user' },
+    ]);
+    const sys = messages[0].content as string;
+    expect(sys).toContain('Current conversation context');
+    expect(sys).toContain('assistant: previous response');
+    expect(sys).toContain('user: previous question');
+  });
+
+  it('exports a version constant the call site can pin to metadata', () => {
+    expect(INPUT_COMPLETION_PROMPT_VERSION).toMatch(/^v\d+\.\d+$/);
+  });
+});
@@ -1,21 +1,56 @@
-import type { ChatStreamPayload, OpenAIChatMessage } from '@lobechat/types';
+import type { OpenAIChatMessage } from '@lobechat/types';

-export const chainInputCompletion = (
-  beforeCursor: string,
-  afterCursor: string,
-  context?: OpenAIChatMessage[],
-): Partial<ChatStreamPayload> => {
-  let contextBlock = '';
-  if (context?.length) {
-    contextBlock = `\n\nCurrent conversation context:
-${context.map((m) => `${m.role}: ${m.content}`).join('\n')}`;
-  }
+/**
+ * Bump when editing the autocomplete system prompt or schema below. Plumbed
+ * through `metadata.promptVersion` at the call site so per-call tracing
+ * groups runs by prompt iteration. The 6-char prompt hash on the row catches
+ * forgotten bumps.
+ */
+export const INPUT_COMPLETION_PROMPT_VERSION = 'v1.0';

-  return {
-    max_tokens: 100,
-    messages: [
-      {
-        content: `You are an autocomplete engine for a chat input box. The user is composing a message to send to an AI assistant. Predict and complete what the USER is typing. Output ONLY the missing text to insert at the cursor.
+/**
+ * Symbolic schema name — also recorded on the tracing row's `schemaName`
+ * column so prompt iterations and schema renames can be reasoned about
+ * together.
+ */
+export const INPUT_COMPLETION_SCHEMA_NAME = 'InputCompletion';
+
+/**
+ * Minimal `generateObject` schema: a single `completion` string. The JSON
+ * wrapping overhead is ~15-30 tokens, which is negligible against the model's
+ * ~100-token completion budget but unlocks per-call tracing via the existing
+ * `ModelRuntime.generateObject` hook.
+ */
+export interface InputCompletionSchema {
+  name: typeof INPUT_COMPLETION_SCHEMA_NAME;
+  schema: {
+    additionalProperties: false;
+    properties: {
+      completion: { description: string; type: 'string' };
+    };
+    required: ['completion'];
+    type: 'object';
+  };
+  strict: true;
+}
+
+const INPUT_COMPLETION_SCHEMA: InputCompletionSchema = {
+  name: INPUT_COMPLETION_SCHEMA_NAME,
+  schema: {
+    additionalProperties: false,
+    properties: {
+      completion: {
+        description: 'The missing text to insert at the cursor. Empty string for no suggestion.',
+        type: 'string',
+      },
+    },
+    required: ['completion'],
+    type: 'object',
+  },
+  strict: true,
+};
+
+const SYSTEM_PROMPT = `You are an autocomplete engine for a chat input box. The user is composing a message to send to an AI assistant. Predict and complete what the USER is typing. Return only the missing text to insert at the cursor in the JSON object's \`completion\` field.

 CRITICAL RULES:
 - You are completing the USER's message, NOT the AI assistant's response
@@ -23,6 +58,7 @@ CRITICAL RULES:
 - NEVER generate text that sounds like an AI assistant responding (e.g., "help you", "assist you", "I can help")
 - Keep it short and natural, under 15 words
 - Match the user's language
+- If no completion would be useful, return an empty string

 GOOD examples (user perspective):
 "How can I " → "optimize my React component's performance?"
@@ -35,13 +71,28 @@ GOOD examples (user perspective):
 BAD examples (assistant perspective — NEVER do this):
 "How can I " → "help you today?" ← WRONG: this is what an AI assistant says
 "Hi" → ", how can I help you?" ← WRONG: assistant greeting
-"Let me " → "explain that for you" ← WRONG: assistant offering to explain${contextBlock}`,
-        role: 'system',
-      },
-      {
-        content: `Before cursor: "${beforeCursor}"\nAfter cursor: "${afterCursor}"`,
-        role: 'user',
-      },
+"Let me " → "explain that for you" ← WRONG: assistant offering to explain`;
+
+export interface InputCompletionChainResult {
+  messages: OpenAIChatMessage[];
+  schema: InputCompletionSchema;
+}
+
+export const chainInputCompletion = (
+  beforeCursor: string,
+  afterCursor: string,
+  context?: OpenAIChatMessage[],
+): InputCompletionChainResult => {
+  let contextBlock = '';
+  if (context?.length) {
+    contextBlock = `\n\nCurrent conversation context:\n${context.map((m) => `${m.role}: ${m.content}`).join('\n')}`;
+  }
+
+  return {
+    messages: [
+      { content: `${SYSTEM_PROMPT}${contextBlock}`, role: 'system' },
+      { content: `Before cursor: "${beforeCursor}"\nAfter cursor: "${afterCursor}"`, role: 'user' },
    ],
+    schema: INPUT_COMPLETION_SCHEMA,
  };
 };
@@ -194,6 +194,11 @@ export const StructureSchema = z.object({
 });

 export const StructureOutputSchema = z.object({
+  /**
+   * Free-form context forwarded to non-tracing hooks (e.g. billing). Use
+   * `tracing` for `llm_generation_tracing` config.
+   */
+  metadata: z.record(z.string(), z.unknown()).optional(),
  messages: z.array(z.any()),
  model: z.string(),
  provider: z.string(),
@@ -201,10 +206,16 @@ export const StructureOutputSchema = z.object({
  tools: z
    .array(z.object({ function: LobeUniformToolSchema, type: z.literal('function') }))
    .optional(),
+  /**
+   * Structured tracing config (scenario / promptVersion / schemaName /
+   * agentId / topicId / inputHint / ...). See `TracingOptions` from
+   * `@lobechat/llm-generation-tracing` for the typed shape.
+   */
+  tracing: z.record(z.string(), z.unknown()).optional(),
 });

 interface IStructureSchema {
-  description: string;
+  description?: string;
  name: string;
  schema: {
    additionalProperties?: boolean;
@@ -216,8 +227,12 @@ interface IStructureSchema {
 }

 export interface StructureOutputParams {
-  keyVaultsPayload: string;
  messages: OpenAIChatMessage[];
+  /**
+   * Free-form context forwarded to non-tracing hooks (e.g. billing). Use
+   * `tracing` for `llm_generation_tracing` config.
+   */
+  metadata?: Record<string, unknown>;
  model: string;
  provider: string;
  schema?: IStructureSchema;
@@ -226,4 +241,10 @@ export interface StructureOutputParams {
    function: LobeUniformTool;
    type: 'function';
  }[];
+  /**
+   * Structured tracing config (scenario / promptVersion / schemaName /
+   * agentId / topicId / inputHint / ...). See `TracingOptions` from
+   * `@lobechat/llm-generation-tracing` for the typed shape.
+   */
+  tracing?: Record<string, unknown>;
 }
@@ -1,8 +1,13 @@
-import { isDesktop } from '@lobechat/const';
+import { isDesktop, TRACING_SCENARIOS } from '@lobechat/const';
 import { HotkeyEnum, KeyEnum } from '@lobechat/const/hotkeys';
 import { HETEROGENEOUS_TYPE_LABELS } from '@lobechat/heterogeneous-agents';
-import { chainInputCompletion, escapeXmlAttr } from '@lobechat/prompts';
-import { isCommandPressed, merge } from '@lobechat/utils';
+import {
+  chainInputCompletion,
+  escapeXmlAttr,
+  INPUT_COMPLETION_PROMPT_VERSION,
+  INPUT_COMPLETION_SCHEMA_NAME,
+} from '@lobechat/prompts';
+import { isCommandPressed } from '@lobechat/utils';
 import type { IEditor } from '@lobehub/editor';
 import { INSERT_MENTION_COMMAND, ReactAutoCompletePlugin, ReactMathPlugin } from '@lobehub/editor';
 import { Editor, FloatMenu, useEditorState } from '@lobehub/editor/react';
@@ -16,9 +21,10 @@ import { useHotkeysContext } from 'react-hotkeys-hook';
 import { usePasteFile, useUploadFiles } from '@/components/DragUploadZone';
 import { useEnterToSend } from '@/hooks/useEnterToSend';
 import { useIMECompositionEvent } from '@/hooks/useIMECompositionEvent';
-import { chatService } from '@/services/chat';
+import { aiChatService } from '@/services/aiChat';
 import { useAgentStore } from '@/store/agent';
 import { agentByIdSelectors } from '@/store/agent/selectors';
+import { useChatStore } from '@/store/chat';
 import { useUserStore } from '@/store/user';
 import {
  labPreferSelectors,
@@ -213,36 +219,47 @@ const InputEditor = memo<{
      // mid-text causes nested editor updates that freeze the input
      if (afterText.trim()) return null;

-      const { enabled: _, ...config } = systemAgentSelectors.inputCompletion(
-        useUserStore.getState(),
-      );
+      const config = systemAgentSelectors.inputCompletion(useUserStore.getState());
      const context = getMessagesRef.current?.();
-      const chainParams = chainInputCompletion(input, afterText, context);
+      const { messages, schema } = chainInputCompletion(input, afterText, context);

      const abortController = new AbortController();
      abortSignal.addEventListener('abort', () => abortController.abort());

-      let result = '';
+      const currentTopicId = useChatStore.getState().activeTopicId;

+      let response: { completion?: string } | null;
      try {
-        await chatService.fetchPresetTaskResult({
-          abortController,
-          onMessageHandle: (chunk) => {
-            if (chunk.type === 'text') {
-              result += chunk.text;
-            }
+        response = (await aiChatService.generateJSON(
+          {
+            messages,
+            model: config.model,
+            provider: config.provider,
+            schema,
+            tracing: {
+              agentId,
+              // Use the user's actual typed text as the row's `input_hint`
+              // — the wrapped prompt's first user message is templated and
+              // not human-scannable.
+              inputHint: input,
+              promptVersion: INPUT_COMPLETION_PROMPT_VERSION,
+              scenario: TRACING_SCENARIOS.InputCompletion,
+              schemaName: INPUT_COMPLETION_SCHEMA_NAME,
+              topicId: currentTopicId,
+            },
          },
-          params: merge(config, chainParams),
-        });
+          abortController,
+        )) as { completion?: string } | null;
      } catch {
        return null;
      }

      if (abortSignal.aborted) return null;

-      return result.trimEnd() || null;
+      const completion = response?.completion?.trimEnd();
+      return completion || null;
    },
-    [isComposingRef],
+    [isComposingRef, agentId],
  );

  const autoCompletePlugin = useMemo(
@@ -0,0 +1,90 @@
+// @vitest-environment node
+import { promisify } from 'node:util';
+import { zstdCompress, zstdDecompress } from 'node:zlib';
+
+import type { TracingPayload } from '@lobechat/llm-generation-tracing';
+import { beforeEach, describe, expect, it, vi } from 'vitest';
+
+const compressZstd = promisify(zstdCompress);
+const decompressZstd = promisify(zstdDecompress);
+
+const uploadBuffer = vi.fn();
+const getFileByteArray = vi.fn();
+
+vi.mock('@/server/modules/S3', () => ({
+  FileS3: vi.fn(() => ({ getFileByteArray, uploadBuffer })),
+}));
+
+const { S3TracingStore, buildTracingKey } = await import('./S3TracingStore');
+
+const samplePayload = (overrides: Partial<TracingPayload> = {}): TracingPayload => ({
+  created_at: new Date('2026-05-22T11:22:33.444Z').getTime(),
+  prompt_hash: 'ab1fc3',
+  prompt_version: 'v1.0',
+  scenario: 'home_brief',
+  tracing_id: '00000000-0000-0000-0000-000000000001',
+  version: '1.0',
+  ...overrides,
+});
+
+beforeEach(() => {
+  uploadBuffer.mockReset().mockResolvedValue(undefined);
+  getFileByteArray.mockReset();
+});
+
+describe('buildTracingKey', () => {
+  it('lays out scenario / version-hash / date / id with the .json.zst suffix', () => {
+    const key = buildTracingKey(samplePayload());
+    expect(key).toBe(
+      'llm-generation-tracing/home_brief/v1.0-ab1fc3/2026-05-22/00000000-0000-0000-0000-000000000001.json.zst',
+    );
+  });
+
+  it('sanitises path-unsafe characters in scenario and version segments', () => {
+    const key = buildTracingKey(samplePayload({ prompt_version: 'v 2/0', scenario: 'odd name!' }));
+    expect(key).toMatch(
+      /llm-generation-tracing\/odd_name_\/v_2_0-ab1fc3\/2026-05-22\/00000000-0000-0000-0000-000000000001\.json\.zst/,
+    );
+  });
+});
+
+describe('S3TracingStore.save', () => {
+  it('uploads zstd-compressed JSON with the canonical key and content-type', async () => {
+    const store = new S3TracingStore();
+    const payload = samplePayload({ input: { messages: [{ role: 'user' }] } });
+
+    const { key } = await store.save(payload);
+
+    expect(key).toBe(
+      'llm-generation-tracing/home_brief/v1.0-ab1fc3/2026-05-22/00000000-0000-0000-0000-000000000001.json.zst',
+    );
+    expect(uploadBuffer).toHaveBeenCalledTimes(1);
+
+    const [callKey, body, contentType] = uploadBuffer.mock.calls[0];
+    expect(callKey).toBe(key);
+    expect(contentType).toBe('application/zstd');
+    expect(Buffer.isBuffer(body)).toBe(true);
+    expect([body[0], body[1], body[2], body[3]]).toEqual([0x28, 0xb5, 0x2f, 0xfd]);
+
+    const roundtripped = JSON.parse((await decompressZstd(body)).toString('utf8'));
+    expect(roundtripped).toEqual(payload);
+  });
+});
+
+describe('S3TracingStore.get', () => {
+  it('decompresses a stored payload by key', async () => {
+    const store = new S3TracingStore();
+    const payload = samplePayload();
+    const buf = await compressZstd(Buffer.from(JSON.stringify(payload)));
+    getFileByteArray.mockResolvedValueOnce(new Uint8Array(buf));
+
+    const loaded = await store.get('some/key.json.zst');
+    expect(loaded).toEqual(payload);
+  });
+
+  it('returns null when the key is missing', async () => {
+    const store = new S3TracingStore();
+    getFileByteArray.mockRejectedValueOnce(new Error('NoSuchKey'));
+    expect(await store.get('missing')).toBeNull();
+  });
+});
@@ -0,0 +1,88 @@
+import { promisify } from 'node:util';
+import { zstdCompress, zstdDecompress } from 'node:zlib';
+
+import type {
+  ITracingStore,
+  SaveResult,
+  TracingPayload,
+  TracingSummary,
+} from '@lobechat/llm-generation-tracing';
+import debug from 'debug';
+
+import { FileS3 } from '@/server/modules/S3';
+
+const compressZstd = promisify(zstdCompress);
+const decompressZstd = promisify(zstdDecompress);
+
+const log = debug('lobe-server:llm-generation-tracing:s3');
+
+const TRACE_PREFIX = 'llm-generation-tracing';
+const PAYLOAD_SUFFIX = '.json.zst';
+const ZSTD_CONTENT_TYPE = 'application/zstd';
+
+const sanitize = (value: string): string => value.replaceAll(/[^\w.-]+/g, '_') || 'unknown';
+
+const dateSegment = (createdAt: number): string => new Date(createdAt).toISOString().slice(0, 10);
+
+/**
+ * Canonical S3 key for a tracing payload. Same source of truth used by both
+ * the store's `save()` and the DB row's `storage_key` so the value persisted
+ * in `llm_generation_tracing.storage_key` always matches the object in S3.
+ *
+ * Layout:
+ *   llm-generation-tracing/{scenario}/{promptVersion}-{promptHash}/{yyyy-mm-dd}/{tracingId}.json.zst
+ */
+export const buildTracingKey = (record: {
+  created_at: number;
+  prompt_hash: string;
+  prompt_version: string;
+  scenario: string;
+  tracing_id: string;
+}): string =>
+  [
+    TRACE_PREFIX,
+    sanitize(record.scenario),
+    `${sanitize(record.prompt_version)}-${sanitize(record.prompt_hash)}`,
+    dateSegment(record.created_at),
+    `${sanitize(record.tracing_id)}${PAYLOAD_SUFFIX}`,
+  ].join('/');
+
+/**
+ * S3-backed store for per-call llm_generation_tracing payloads.
+ *
+ * Payload is zstd-compressed (level 3) prior to upload; the `.zst` suffix
+ * advertises the format but Content-Encoding is intentionally omitted to keep
+ * the object opaque to HTTP middleware (callers decompress explicitly).
+ *
+ * Query (`get` / `list`) is left intentionally minimal — analytics queries go
+ * against the DB row; the S3 blob is the cold artefact for offline review.
+ */
+export class S3TracingStore implements ITracingStore {
+  private readonly s3: FileS3;
+
+  constructor() {
+    this.s3 = new FileS3();
+  }
+
+  async save(record: TracingPayload): Promise<SaveResult> {
+    const key = buildTracingKey(record);
+    log('Saving tracing payload to S3: %s', key);
+    const compressed = await compressZstd(Buffer.from(JSON.stringify(record)));
+    await this.s3.uploadBuffer(key, compressed, ZSTD_CONTENT_TYPE);
+    return { key };
+  }
+
+  async get(key: string): Promise<TracingPayload | null> {
+    try {
+      const bytes = await this.s3.getFileByteArray(key);
+      const buf = await decompressZstd(Buffer.from(bytes));
+      return JSON.parse(buf.toString('utf8')) as TracingPayload;
+    } catch {
+      return null;
+    }
+  }
+
+  async list(_options?: { limit?: number }): Promise<TracingSummary[]> {
+    return [];
+  }
+}
@@ -0,0 +1 @@
+export { buildTracingKey, S3TracingStore } from './S3TracingStore';
@@ -1,5 +1,9 @@
 import { type GoogleGenAIOptions } from '@google/genai';
-import { ModelRuntime, type ModelRuntimeHooks } from '@lobechat/model-runtime';
+import {
+  mergeModelRuntimeHooks,
+  ModelRuntime,
+  type ModelRuntimeHooks,
+} from '@lobechat/model-runtime';
 import { LobeVertexAI } from '@lobechat/model-runtime/vertexai';
 import {
  type AWSBedrockKeyVault,
@@ -18,6 +22,7 @@ import { getBusinessModelRuntimeHooks } from '@/business/server/model-runtime';
 import { AiProviderModel } from '@/database/models/aiProvider';
 import { type LobeChatDatabase } from '@/database/type';
 import { getLLMConfig } from '@/envs/llm';
+import { createLLMGenerationTracingHook } from '@/server/services/llmGenerationTracing/hook';

 import { KeyVaultsGateKeeper } from '../KeyVaultsEncrypt';
 import apiKeyManager from './apiKeyManager';
@@ -420,8 +425,13 @@ export const initModelRuntimeFromDB = async (
  const payload = buildPayloadFromKeyVaults(keyVaults, runtimeProvider);

  // 4. Get business hooks (billing in cloud, undefined in OSS)
-  const hooks = getBusinessModelRuntimeHooks(userId, provider);
+  const businessHooks = getBusinessModelRuntimeHooks(userId, provider);

-  // 5. Initialize ModelRuntime with the payload and hooks
+  // 5. Compose with the per-call llm_generation_tracing hook (no-op when the
+  //    service is unconfigured, so OSS / self-hosted setups pay nothing for it).
+  const tracingHooks = createLLMGenerationTracingHook(userId, provider);
+  const hooks = mergeModelRuntimeHooks(businessHooks, tracingHooks);
+
+  // 6. Initialize ModelRuntime with the payload and hooks
  return initModelRuntimeWithUserPayload(provider, payload, { userId }, hooks);
 };
@@ -1013,5 +1013,43 @@ describe('aiChatRouter', () => {
        { metadata: { trigger: 'chat' } },
      );
    });
+
+    it('merges caller metadata over the default trigger', async () => {
+      const { initModelRuntimeFromDB } = await import('@/server/modules/ModelRuntime');
+      const mockGenerateObject = vi.fn().mockResolvedValue({ completion: 'hi there' });
+      vi.mocked(initModelRuntimeFromDB).mockResolvedValue({
+        generateObject: mockGenerateObject,
+      } as any);
+
+      const caller = aiChatRouter.createCaller({ ...mockCtx, serverDB: {} } as any);
+      await caller.outputJSON({
+        messages: [{ content: 'be helpful', role: 'system' }],
+        metadata: {
+          promptVersion: 'v2.0',
+          scenario: 'input_completion',
+          schemaName: 'InputCompletion',
+        },
+        model: 'gpt-4o-mini',
+        provider: 'openai',
+        schema: {
+          name: 'InputCompletion',
+          schema: {
+            additionalProperties: false,
+            properties: { completion: { type: 'string' } },
+            required: ['completion'],
+            type: 'object' as const,
+          },
+        },
+      });
+
+      expect(mockGenerateObject.mock.calls[0][1]).toEqual({
+        metadata: {
+          promptVersion: 'v2.0',
+          scenario: 'input_completion',
+          schemaName: 'InputCompletion',
+          trigger: 'chat',
+        },
+      });
+    });
  });
 });
@@ -11,9 +11,9 @@ import { ThreadModel } from '@/database/models/thread';
 import { TopicModel } from '@/database/models/topic';
 import { authedProcedure, router } from '@/libs/trpc/lambda';
 import { serverDatabase } from '@/libs/trpc/lambda/middleware';
-import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
 import { resolveContext } from '@/server/routers/lambda/_helpers/resolveContext';
 import { AiChatService } from '@/server/services/aiChat';
+import { AiGenerationService } from '@/server/services/aiGeneration';
 import { FileService } from '@/server/services/file';
 import { archiveToolResultIfNeeded } from '@/server/services/toolExecution/archiveToolResult';

@@ -29,6 +29,7 @@ const aiChatProcedure = authedProcedure.use(serverDatabase).use(async (opts) =>
    ctx: {
      agentModel: new AgentModel(ctx.serverDB, ctx.userId),
      aiChatService: new AiChatService(ctx.serverDB, ctx.userId),
+      aiGenerationService: new AiGenerationService(ctx.serverDB, ctx.userId),
      fileService: new FileService(ctx.serverDB, ctx.userId),
      messageModel: new MessageModel(ctx.serverDB, ctx.userId),
      threadModel: new ThreadModel(ctx.serverDB, ctx.userId),
@@ -43,19 +44,22 @@ export const aiChatRouter = router({
    log('messages count: %d', input.messages.length);
    log('schema: %O', input.schema);

-    log('initializing model runtime from DB with provider: %s', input.provider);
-    // Read user's provider config from database
-    const modelRuntime = await initModelRuntimeFromDB(ctx.serverDB, ctx.userId, input.provider);
-
-    log('calling generateObject');
-    const result = await modelRuntime.generateObject(
+    // Always stamp a trigger on metadata so cross-cutting hooks (timing,
+    // routing) and the tracing registry have a fallback when the caller
+    // forgets to set one. `tracing` carries the structured tracing config
+    // (scenario / promptVersion / schemaName / inputHint / ...).
+    const result = await ctx.aiGenerationService.generateObject(
      {
        messages: input.messages,
        model: input.model,
+        provider: input.provider,
        schema: input.schema,
        tools: input.tools,
      },
-      { metadata: { trigger: RequestTrigger.Chat } },
+      {
+        metadata: { trigger: RequestTrigger.Chat, ...input.metadata },
+        tracing: input.tracing,
+      },
    );

    log('generateObject completed, result: %O', result);
@@ -0,0 +1,94 @@
+// @vitest-environment node
+import { beforeEach, describe, expect, it, vi } from 'vitest';
+
+import * as ModelRuntimeModule from '@/server/modules/ModelRuntime';
+
+import { AiGenerationService } from './index';
+
+describe('AiGenerationService.generateObject', () => {
+  const generateObject = vi.fn();
+  const initSpy = vi.spyOn(ModelRuntimeModule, 'initModelRuntimeFromDB');
+
+  beforeEach(() => {
+    generateObject.mockReset();
+    initSpy.mockReset();
+    initSpy.mockResolvedValue({ generateObject } as any);
+  });
+
+  it('initialises the runtime from DB with the caller-supplied provider', async () => {
+    generateObject.mockResolvedValue({ ok: true });
+    const ai = new AiGenerationService({} as any, 'user-1');
+    await ai.generateObject({
+      messages: [{ content: 'hi', role: 'user' }],
+      model: 'gpt-4o',
+      provider: 'openai',
+    });
+    expect(initSpy).toHaveBeenCalledWith({}, 'user-1', 'openai');
+  });
+
+  it('forwards messages / model / schema / tools verbatim to the runtime', async () => {
+    generateObject.mockResolvedValue({ name: 'Atlas' });
+    const schema = {
+      name: 'Person',
+      schema: {
+        properties: { name: { type: 'string' } },
+        required: ['name'],
+        type: 'object' as const,
+      },
+    };
+
+    const ai = new AiGenerationService({} as any, 'user-1');
+    await ai.generateObject({
+      messages: [{ content: 'pick a name', role: 'user' }],
+      model: 'gpt-4o',
+      provider: 'openai',
+      schema,
+    });
+
+    const [payload] = generateObject.mock.calls[0];
+    expect(payload).toEqual({
+      messages: [{ content: 'pick a name', role: 'user' }],
+      model: 'gpt-4o',
+      schema,
+      tools: undefined,
+    });
+  });
+
+  it('forwards both options.metadata and options.tracing through to ModelRuntime.generateObject', async () => {
+    generateObject.mockResolvedValue({});
+    const ai = new AiGenerationService({} as any, 'user-1');
+    await ai.generateObject(
+      {
+        messages: [],
+        model: 'gpt-4o',
+        provider: 'openai',
+      },
+      {
+        metadata: { trigger: 'chat' },
+        tracing: {
+          promptVersion: 'v1.0',
+          scenario: 'input_completion',
+        },
+      },
+    );
+    const [, options] = generateObject.mock.calls[0];
+    expect(options).toMatchObject({
+      metadata: { trigger: 'chat' },
+      tracing: {
+        promptVersion: 'v1.0',
+        scenario: 'input_completion',
+      },
+    });
+  });
+
+  it('returns the runtime result with the typed cast applied', async () => {
+    generateObject.mockResolvedValue({ completion: 'hello world' });
+    const ai = new AiGenerationService({} as any, 'user-1');
+    const result = await ai.generateObject<{ completion: string }>({
+      messages: [],
+      model: 'gpt-4o',
+      provider: 'openai',
+    });
+    expect(result.completion).toBe('hello world');
+  });
+});
@@ -0,0 +1,71 @@
+import type {
+  ChatCompletionTool,
+  GenerateObjectPayload,
+  GenerateObjectSchema,
+} from '@lobechat/model-runtime';
+import type { OpenAIChatMessage } from '@lobechat/types';
+
+import type { LobeChatDatabase } from '@/database/type';
+import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
+
+export interface AiGenerationObjectInput {
+  messages: OpenAIChatMessage[] | GenerateObjectPayload['messages'];
+  model: string;
+  provider: string;
+  schema?: GenerateObjectSchema;
+  tools?: ChatCompletionTool[];
+}
+
+export interface AiGenerationObjectOptions {
+  /**
+   * Free-form context forwarded to non-tracing hooks (billing, routing). Use
+   * `tracing` instead for `llm_generation_tracing` config.
+   */
+  metadata?: Record<string, unknown>;
+  signal?: AbortSignal;
+  /**
+   * Structured tracing config (scenario / promptVersion / schemaName /
+   * agentId / topicId / inputHint / ...). Forwarded to the
+   * `llm_generation_tracing` hook. Strongly typed by `TracingOptions` from
+   * `@lobechat/llm-generation-tracing` at call sites.
+   */
+  tracing?: Record<string, unknown>;
+}
+
+/**
+ * Thin wrapper around `initModelRuntimeFromDB` + `ModelRuntime.generateObject`.
+ *
+ * Almost every server-side caller that produces structured output goes through
+ * the same two-step dance: resolve the user's provider config from the DB,
+ * then call generateObject with caller-specific metadata. This service exists
+ * so those call sites don't repeat the init wiring, and so adding a future
+ * cross-cutting concern (default metadata, retries, observability defaults)
+ * has one place to land.
+ *
+ * Construct one per request — `db` and `userId` come from the request context.
+ */
+export class AiGenerationService {
+  private readonly db: LobeChatDatabase;
+  private readonly userId: string;
+
+  constructor(db: LobeChatDatabase, userId: string) {
+    this.db = db;
+    this.userId = userId;
+  }
+
+  async generateObject<T = unknown>(
+    input: AiGenerationObjectInput,
+    options: AiGenerationObjectOptions = {},
+  ): Promise<T> {
+    const runtime = await initModelRuntimeFromDB(this.db, this.userId, input.provider);
+    return (await runtime.generateObject(
+      {
+        messages: input.messages as GenerateObjectPayload['messages'],
+        model: input.model,
+        schema: input.schema,
+        tools: input.tools,
+      },
+      { metadata: options.metadata, signal: options.signal, tracing: options.tracing },
+    )) as T;
+  }
+}
@@ -92,6 +92,12 @@ describe('FollowUpActionService.extract', () => {
      expect.objectContaining({
        model: 'custom-scene-model',
      }),
+      expect.objectContaining({
+        tracing: expect.objectContaining({
+          scenario: 'follow_up',
+          topicId: TEST_TOPIC,
+        }),
+      }),
    );
  });

@@ -1,10 +1,12 @@
+import { TRACING_SCENARIOS } from '@lobechat/const';
+import type { TracingOptions } from '@lobechat/llm-generation-tracing';
 import type { FollowUpChip, FollowUpExtractInput, FollowUpExtractResult } from '@lobechat/types';
 import debug from 'debug';

 import type { LobeChatDatabase } from '@/database/type';
-import { initModelRuntimeFromDB } from '@/server/modules/ModelRuntime';
+import { AiGenerationService } from '@/server/services/aiGeneration';

-import { buildSuggestionPrompt } from './prompts';
+import { buildSuggestionPrompt, FOLLOW_UP_PROMPT_VERSION } from './prompts';
 import { RawResponseSchema, SUGGESTION_RESPONSE_JSON_SCHEMA } from './schema';

 const log = debug('lobe-server:follow-up-action-service');
@@ -52,17 +54,28 @@ export class FollowUpActionService {
    const { system, user } = buildSuggestionPrompt({ assistantText: text, hint });
    const { model, provider } = modelConfig;

+    const ai = new AiGenerationService(this.db, this.userId);
    let raw: unknown;
    try {
-      const modelRuntime = await initModelRuntimeFromDB(this.db, this.userId, provider);
-      raw = await modelRuntime.generateObject({
-        messages: [
-          { content: system, role: 'system' as const },
-          { content: user, role: 'user' as const },
-        ],
-        model,
-        schema: SUGGESTION_RESPONSE_JSON_SCHEMA,
-      });
+      raw = await ai.generateObject(
+        {
+          messages: [
+            { content: system, role: 'system' as const },
+            { content: user, role: 'user' as const },
+          ],
+          model,
+          provider,
+          schema: SUGGESTION_RESPONSE_JSON_SCHEMA,
+        },
+        {
+          tracing: {
+            promptVersion: FOLLOW_UP_PROMPT_VERSION,
+            scenario: TRACING_SCENARIOS.FollowUp,
+            schemaName: 'FollowUpSuggestionResponse',
+            topicId,
+          } satisfies TracingOptions,
+        },
+      );
    } catch (error) {
      log('LLM call failed: %O', error);
      return EMPTY_RESULT(row.id);
@@ -3,6 +3,13 @@ import type { FollowUpHint } from '@lobechat/types';
 import { BASE_SYSTEM_PROMPT } from './base';
 import { buildOnboardingAddendum } from './onboarding';

+/**
+ * Bump when editing BASE_SYSTEM_PROMPT, the onboarding addendum, or the
+ * suggestion response schema. The 6-char prompt hash in the tracing row
+ * catches forgotten bumps.
+ */
+export const FOLLOW_UP_PROMPT_VERSION = 'v1.0';
+
 export interface BuiltPrompt {
  system: string;
  user: string;
@@ -0,0 +1,186 @@
+// @vitest-environment node
+import { beforeEach, describe, expect, it, vi } from 'vitest';
+
+const isEnabled = vi.fn<() => boolean>(() => true);
+const record = vi.fn<(params: Record<string, unknown>) => Promise<{ tracingId: string }>>(
+  async () => ({ tracingId: 'trace-1' }),
+);
+
+vi.mock('./index', () => ({
+  getLLMGenerationTracingService: () => ({ isEnabled, record }),
+}));
+
+// next/server is optional at runtime; default to "not available" so the hook
+// falls back to its microtask path which is straightforward to test.
+vi.mock('next/server', () => ({}));
+
+const { createLLMGenerationTracingHook } = await import('./hook');
+
+const flushMicrotasks = async () => {
+  await new Promise((resolve) => setImmediate(resolve));
+};
+
+beforeEach(() => {
+  isEnabled.mockReturnValue(true);
+  record.mockClear();
+});
+
+describe('createLLMGenerationTracingHook', () => {
+  it('returns an empty object when the service is disabled', () => {
+    isEnabled.mockReturnValue(false);
+    const hooks = createLLMGenerationTracingHook('user-1', 'openai');
+    expect(hooks).toEqual({});
+  });
+
+  it('schedules a service.record call on success, reading structured tracing config from options.tracing', async () => {
+    const hooks = createLLMGenerationTracingHook('user-1', 'openai');
+    expect(hooks.onGenerateObjectComplete).toBeDefined();
+
+    hooks.onGenerateObjectComplete!(
+      {
+        latencyMs: 250,
+        output: { topic: 'greeting' },
+        success: true,
+        usage: { cost: 0.001, totalInputTokens: 100, totalOutputTokens: 30 } as any,
+      },
+      {
+        options: {
+          metadata: { trigger: 'agent_signal' },
+          tracing: {
+            agentId: 'agt-1',
+            promptVersion: 'v2.0',
+            scenario: 'signal_skill_intent',
+            topicId: 'tpc-1',
+          },
+        },
+        payload: {
+          messages: [
+            { content: 'be helpful', role: 'system' },
+            { content: 'hi there', role: 'user' },
+          ],
+          model: 'gpt-4o',
+          schema: { type: 'object' },
+        } as any,
+      },
+    );
+
+    await flushMicrotasks();
+    expect(record).toHaveBeenCalledTimes(1);
+    const call = record.mock.calls[0][0];
+    expect(call).toMatchObject({
+      agentId: 'agt-1',
+      costUsd: 0.001,
+      inputTokens: 100,
+      latencyMs: 250,
+      model: 'gpt-4o',
+      outputTokens: 30,
+      promptVersion: 'v2.0',
+      provider: 'openai',
+      scenario: 'signal_skill_intent',
+      success: true,
+      topicId: 'tpc-1',
+      trigger: 'agent_signal',
+      userId: 'user-1',
+    });
+    expect((call.payload as { systemPrompt?: string }).systemPrompt).toBe('be helpful');
+    expect(call.promptHash).toHaveLength(6);
+  });
+
+  it('forwards caller-supplied inputHint through to the service', async () => {
+    const hooks = createLLMGenerationTracingHook('user-1', 'openai');
+    hooks.onGenerateObjectComplete!(
+      { latencyMs: 10, success: true },
+      {
+        options: {
+          tracing: {
+            inputHint: '杭州天气',
+            scenario: 'input_completion',
+            schemaName: 'InputCompletion',
+          },
+        },
+        payload: { messages: [], model: 'gpt-4o', schema: {} } as any,
+      },
+    );
+    await flushMicrotasks();
+    expect(record.mock.calls[0][0]).toMatchObject({
+      inputHint: '杭州天气',
+      scenario: 'input_completion',
+      schemaName: 'InputCompletion',
+    });
+  });
+
+  it('flags validation failures using the error message heuristic and resolves scenario from metadata.trigger fallback', async () => {
+    const hooks = createLLMGenerationTracingHook('user-1', 'openai');
+    hooks.onGenerateObjectComplete!(
+      {
+        error: { message: 'ZodError: required field missing' },
+        latencyMs: 100,
+        success: false,
+      },
+      {
+        options: { metadata: { trigger: 'topic' } },
+        payload: { messages: [], model: 'gpt-4o', schema: { type: 'object' } } as any,
+      },
+    );
+
+    await flushMicrotasks();
+    expect(record.mock.calls[0][0]).toMatchObject({
+      errorDetail: 'ZodError: required field missing',
+      scenario: 'topic_title',
+      success: false,
+      trigger: 'topic',
+      validationFailed: true,
+    });
+  });
+
+  it('writes caller-supplied tracing.metadata verbatim to the DB jsonb column (no auto-stamped provider)', async () => {
+    const hooks = createLLMGenerationTracingHook('user-1', 'openai');
+    hooks.onGenerateObjectComplete!(
+      { latencyMs: 5, success: true },
+      {
+        options: {
+          metadata: { trigger: 'memory' },
+          tracing: {
+            agentId: 'agt-known',
+            metadata: {
+              parent_memory_trace_key: 'memory-extraction/user-1/topic/abc/trace/2026-05-22.json',
+            },
+          },
+        },
+        payload: { messages: [], model: 'gpt-4o', schema: {} } as any,
+      },
+    );
+    await flushMicrotasks();
+    // `provider` is a first-class column — must NOT be duplicated into metadata.
+    expect(record.mock.calls[0][0].metadata).toEqual({
+      parent_memory_trace_key: 'memory-extraction/user-1/topic/abc/trace/2026-05-22.json',
+    });
+  });
+
+  it('omits the metadata field when the caller passes no tracing.metadata', async () => {
+    const hooks = createLLMGenerationTracingHook('user-1', 'openai');
+    hooks.onGenerateObjectComplete!(
+      { latencyMs: 5, success: true },
+      {
+        options: { tracing: { scenario: 'input_completion' } },
+        payload: { messages: [], model: 'gpt-4o', schema: {} } as any,
+      },
+    );
+    await flushMicrotasks();
+    expect(record.mock.calls[0][0].metadata).toBeUndefined();
+  });
+
+  it('falls back to the unknown scenario when no trigger / scenario is provided anywhere', async () => {
+    const hooks = createLLMGenerationTracingHook('user-1', 'openai');
+    hooks.onGenerateObjectComplete!(
+      { latencyMs: 100, success: true },
+      { options: {}, payload: { messages: [], model: 'gpt-4o' } as any },
+    );
+
+    await flushMicrotasks();
+    expect(record.mock.calls[0][0]).toMatchObject({
+      promptVersion: 'v0',
+      scenario: 'unknown',
+    });
+  });
+});
@@ -0,0 +1,141 @@
+import {
+  computePromptHash,
+  resolveScenario,
+  type TracingOptions,
+} from '@lobechat/llm-generation-tracing';
+import type { ModelRuntimeHooks } from '@lobechat/model-runtime';
+import debug from 'debug';
+
+import { getLLMGenerationTracingService } from './index';
+
+const log = debug('lobe-server:llm-generation-tracing:hook');
+
+const pickString = (value: unknown): string | undefined =>
+  typeof value === 'string' ? value : undefined;
+
+/**
+ * Validate the loose `options.tracing` bag (the runtime declares it as
+ * `Record<string, unknown>`) into the strongly-typed `TracingOptions` shape
+ * the hook works with. Unknown keys flow through `metadata` for the DB jsonb
+ * column.
+ */
+const parseTracingOptions = (raw: Record<string, unknown> | undefined): TracingOptions => {
+  if (!raw) return {};
+  return {
+    agentId: pickString(raw.agentId),
+    inputHint: pickString(raw.inputHint),
+    metadata:
+      raw.metadata && typeof raw.metadata === 'object' && !Array.isArray(raw.metadata)
+        ? (raw.metadata as Record<string, unknown>)
+        : undefined,
+    parentTracingId: pickString(raw.parentTracingId),
+    promptVersion: pickString(raw.promptVersion),
+    scenario: pickString(raw.scenario),
+    schemaName: pickString(raw.schemaName),
+    systemPrompt: pickString(raw.systemPrompt),
+    topicId: pickString(raw.topicId),
+    trigger: pickString(raw.trigger),
+  };
+};
+
+const extractSystemPrompt = (messages: unknown): string => {
+  if (!Array.isArray(messages)) return '';
+  const first = messages[0] as { content?: unknown; role?: unknown } | undefined;
+  if (first?.role === 'system' && typeof first.content === 'string') return first.content;
+  return '';
+};
+
+const tryScheduleAfter = (work: () => Promise<void> | void): void => {
+  let scheduled = false;
+  try {
+    const nextServer = require('next/server') as { after?: (fn: () => unknown) => void };
+    if (typeof nextServer.after === 'function') {
+      nextServer.after(work);
+      scheduled = true;
+    }
+  } catch {
+    // next/server not available — fall through to fire-and-forget
+  }
+  if (!scheduled) {
+    Promise.resolve()
+      .then(work)
+      .catch((err) => log('Deferred tracing work threw: %O', err));
+  }
+};
+
+/**
+ * Build a `ModelRuntimeHooks` slice that records every `generateObject` call to
+ * the `llm_generation_tracing` DB table + blob store. Designed to be merged
+ * with any business hooks at the ModelRuntime construction site.
+ */
+export const createLLMGenerationTracingHook = (
+  userId: string,
+  provider: string,
+): Pick<ModelRuntimeHooks, 'onGenerateObjectComplete'> => {
+  const service = getLLMGenerationTracingService();
+  if (!service.isEnabled()) return {};
+
+  return {
+    onGenerateObjectComplete: (data, context) => {
+      const tracing = parseTracingOptions(context.options?.tracing as Record<string, unknown>);
+      // `trigger` is also read by ModelRuntime itself (timing logs) so it
+      // legitimately lives on `metadata`. Honour the explicit `tracing.trigger`
+      // override but fall back to the cross-cutting `metadata.trigger`.
+      const metadataTrigger = pickString(
+        (context.options?.metadata as Record<string, unknown> | undefined)?.trigger,
+      );
+      const trigger = tracing.trigger ?? metadataTrigger;
+      const { scenario, promptVersion } = resolveScenario({
+        promptVersion: tracing.promptVersion,
+        scenario: tracing.scenario,
+        trigger,
+      });
+
+      const systemPrompt = tracing.systemPrompt ?? extractSystemPrompt(context.payload.messages);
+      const promptHash = computePromptHash(systemPrompt, context.payload.schema);
+
+      // Heuristic: a Zod validation error message starts with the Zod marker.
+      const errorMessage = data.error?.message;
+      const validationFailed =
+        !data.success && typeof errorMessage === 'string' && /zod|validation/i.test(errorMessage);
+
+      tryScheduleAfter(async () => {
+        try {
+          await service.record({
+            agentId: tracing.agentId,
+            costUsd: (data.usage as { cost?: number } | undefined)?.cost,
+            errorCode: data.error?.code,
+            errorDetail: data.error?.message ?? data.error?.stack,
+            inputHint: tracing.inputHint,
+            inputTokens: data.usage?.totalInputTokens ?? data.usage?.inputTextTokens,
+            latencyMs: data.latencyMs,
+            // Caller-supplied jsonb context only. `provider` is already a
+            // first-class column on the row — no need to duplicate it here.
+            metadata: tracing.metadata,
+            model: context.payload.model,
+            outputTokens: data.usage?.totalOutputTokens ?? data.usage?.outputTextTokens,
+            parentTracingId: tracing.parentTracingId,
+            payload: {
+              input: context.payload.messages,
+              output: data.output,
+              schema: context.payload.schema,
+              systemPrompt,
+            },
+            promptHash,
+            promptVersion,
+            provider,
+            scenario,
+            schemaName: tracing.schemaName,
+            success: data.success,
+            topicId: tracing.topicId,
+            trigger,
+            userId,
+            validationFailed,
+          });
+        } catch (err) {
+          log('Tracing service threw: %O', err);
+        }
+      });
+    },
+  };
+};
@@ -0,0 +1,227 @@
+// @vitest-environment node
+import type { ITracingStore, TracingPayload } from '@lobechat/llm-generation-tracing';
+import { eq } from 'drizzle-orm';
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
+
+import { getTestDB } from '@/database/core/getTestDB';
+import { llmGenerationTracing, users } from '@/database/schemas';
+import type { LobeChatDatabase } from '@/database/type';
+
+import { LLMGenerationTracingService } from './index';
+
+const serverDB: LobeChatDatabase = await getTestDB();
+
+// The service resolves the DB via getServerDB at call time. Point it at our
+// test DB so the integration covers the real insert/update path.
+vi.mock('@/database/server', () => ({ getServerDB: async () => serverDB }));
+
+const userId = 'llm-gen-trace-svc-user';
+
+const stubStore: ITracingStore & {
+  save: ReturnType<typeof vi.fn<(record: TracingPayload) => Promise<{ key: string | null }>>>;
+} = {
+  save: vi.fn<(record: TracingPayload) => Promise<{ key: string | null }>>(async () => ({
+    key: 'memo://saved',
+  })),
+};
+
+beforeEach(async () => {
+  await serverDB.delete(users);
+  await serverDB.insert(users).values([{ id: userId }]);
+  stubStore.save.mockClear();
+  stubStore.save.mockResolvedValue({ key: 'memo://saved' });
+});
+
+afterEach(async () => {
+  await serverDB.delete(llmGenerationTracing);
+  await serverDB.delete(users);
+});
+
+describe('LLMGenerationTracingService.record', () => {
+  it('inserts a row, calls the store, and patches the returned storage_key', async () => {
+    const service = new LLMGenerationTracingService(stubStore);
+
+    const result = await service.record({
+      latencyMs: 420,
+      model: 'gpt-4o',
+      payload: {
+        input: [{ content: 'hello world from the user', role: 'user' }],
+        output: { topic: 'greeting' },
+        schema: { type: 'object' },
+        systemPrompt: 'be helpful',
+      },
+      promptHash: 'aaaaaa',
+      promptVersion: 'v1.0',
+      provider: 'openai',
+      scenario: 'home_brief',
+      success: true,
+      trigger: 'home_brief',
+      userId,
+    });
+
+    expect(result?.tracingId).toMatch(/^[0-9a-f-]{36}$/);
+    expect(stubStore.save).toHaveBeenCalledTimes(1);
+
+    const payload = stubStore.save.mock.calls[0][0];
+    expect(payload).toMatchObject({
+      input: [{ content: 'hello world from the user', role: 'user' }],
+      model_metadata: { model: 'gpt-4o', provider: 'openai' },
+      output: { topic: 'greeting' },
+      prompt_hash: 'aaaaaa',
+      scenario: 'home_brief',
+      tracing_id: result?.tracingId,
+      version: '1.0',
+    });
+
+    const [row] = await serverDB
+      .select()
+      .from(llmGenerationTracing)
+      .where(eq(llmGenerationTracing.id, result!.tracingId));
+    expect(row).toMatchObject({
+      inputHint: 'hello world from the user',
+      latencyMs: 420,
+      model: 'gpt-4o',
+      promptHash: 'aaaaaa',
+      provider: 'openai',
+      scenario: 'home_brief',
+      storageKey: 'memo://saved',
+      success: true,
+      trigger: 'home_brief',
+      userId,
+    });
+    expect(row?.inputHash).toMatch(/^[0-9a-f]{64}$/);
+  });
+
+  it('preserves the row with storage_key=null and metadata.store_error when the store throws', async () => {
+    stubStore.save.mockRejectedValueOnce(new Error('S3 5xx'));
+    const service = new LLMGenerationTracingService(stubStore);
+
+    const result = await service.record({
+      metadata: { caller: 'home_brief_handler' },
+      promptHash: 'bbbbbb',
+      promptVersion: 'v1.0',
+      scenario: 'home_brief',
+      success: true,
+      userId,
+    });
+
+    const [row] = await serverDB
+      .select()
+      .from(llmGenerationTracing)
+      .where(eq(llmGenerationTracing.id, result!.tracingId));
+    expect(row?.storageKey).toBeNull();
+    expect(row?.metadata).toMatchObject({
+      caller: 'home_brief_handler',
+      store_error: 'S3 5xx',
+    });
+  });
+
+  it('honours an explicit inputHint override instead of auto-extracting from the first user message', async () => {
+    const service = new LLMGenerationTracingService(stubStore);
+    const result = await service.record({
+      inputHint: '杭州天气',
+      payload: {
+        // Wrapper prompt — first user message is a template, not the real input.
+        input: [
+          { content: 'be helpful', role: 'system' },
+          { content: 'Before cursor: "杭州天气" After cursor: ""', role: 'user' },
+        ],
+      },
+      promptHash: 'ffffff',
+      promptVersion: 'v1.0',
+      scenario: 'input_completion',
+      success: true,
+      userId,
+    });
+
+    const [row] = await serverDB
+      .select()
+      .from(llmGenerationTracing)
+      .where(eq(llmGenerationTracing.id, result!.tracingId));
+    expect(row?.inputHint).toBe('杭州天气');
+  });
+
+  it('truncates an excessively long inputHint override to INPUT_HINT_MAX', async () => {
+    const service = new LLMGenerationTracingService(stubStore);
+    const long = 'x'.repeat(500);
+    const result = await service.record({
+      inputHint: long,
+      promptHash: 'ffffff',
+      promptVersion: 'v1.0',
+      scenario: 'input_completion',
+      success: true,
+      userId,
+    });
+    const [row] = await serverDB
+      .select()
+      .from(llmGenerationTracing)
+      .where(eq(llmGenerationTracing.id, result!.tracingId));
+    expect(row?.inputHint?.length).toBe(200);
+  });
+
+  it('leaves storage_key null when the store reports a local-only save (key=null)', async () => {
+    stubStore.save.mockResolvedValueOnce({ key: null });
+    const service = new LLMGenerationTracingService(stubStore);
+
+    const result = await service.record({
+      promptHash: 'eeeeee',
+      promptVersion: 'v1.0',
+      scenario: 'home_brief',
+      success: true,
+      userId,
+    });
+
+    const [row] = await serverDB
+      .select()
+      .from(llmGenerationTracing)
+      .where(eq(llmGenerationTracing.id, result!.tracingId));
+    expect(row?.storageKey).toBeNull();
+  });
+
+  it('returns null and skips everything when no store is configured', async () => {
+    const service = new LLMGenerationTracingService(null);
+    const result = await service.record({
+      promptHash: 'cccccc',
+      promptVersion: 'v1.0',
+      scenario: 'home_brief',
+      success: true,
+      userId,
+    });
+    expect(result).toBeNull();
+    expect(stubStore.save).not.toHaveBeenCalled();
+    const rows = await serverDB.select().from(llmGenerationTracing);
+    expect(rows).toHaveLength(0);
+  });
+});
+
+describe('LLMGenerationTracingService.recordFeedback', () => {
+  it('writes feedback columns onto the row owned by the caller', async () => {
+    const service = new LLMGenerationTracingService(stubStore);
+
+    const { tracingId } = (await service.record({
+      promptHash: 'dddddd',
+      promptVersion: 'v1.0',
+      scenario: 'agent_welcome',
+      success: true,
+      userId,
+    }))!;
+
+    await service.recordFeedback(userId, tracingId, {
+      data: { clicked_question_index: 0 },
+      score: 1,
+      signal: 'positive',
+      source: 'explicit_thumbs',
+    });
+
+    const [row] = await serverDB
+      .select()
+      .from(llmGenerationTracing)
+      .where(eq(llmGenerationTracing.id, tracingId));
+    expect(row).toMatchObject({
+      feedbackData: { clicked_question_index: 0 },
+      feedbackScore: 1,
+      feedbackSignal: 'positive',
+      feedbackSource: 'explicit_thumbs',
+    });
+  });
+});
@@ -0,0 +1,258 @@
+import {
+  computeInputHash,
+  FileTracingStore,
+  type ITracingStore,
+  type TracingPayload,
+} from '@lobechat/llm-generation-tracing';
+import debug from 'debug';
+import { eq } from 'drizzle-orm';
+
+import {
+  LlmGenerationTracingModel,
+  type RecordLlmGenerationParams,
+  type UpdateLlmGenerationFeedbackParams,
+} from '@/database/models/llmGenerationTracing';
+import { llmGenerationTracing } from '@/database/schemas/llmGenerationTracing';
+import { getServerDB } from '@/database/server';
+
+const log = debug('lobe-server:llm-generation-tracing:service');
+
+const INPUT_HINT_MAX = 200;
+
+export interface GenerationCallPayload {
+  input?: unknown;
+  output?: unknown;
+  rawOutput?: string;
+  schema?: unknown;
+  systemPrompt?: string;
+}
+
+export interface RecordLLMGenerationCallParams {
+  agentId?: string | null;
+  costUsd?: number | null;
+  errorCode?: string | null;
+  errorDetail?: string | null;
+  /**
+   * Caller-supplied snippet stored on `input_hint`. When omitted, the service
+   * auto-extracts a hint from the first user message in `payload.input`.
+   * Callers wrapping the user's text in a template should pass the raw input
+   * here so the DB row stays human-scannable.
+   */
+  inputHint?: string | null;
+  inputTokens?: number | null;
+  latencyMs?: number | null;
+  metadata?: Record<string, unknown>;
+  model?: string | null;
+  outputTokens?: number | null;
+  parentTracingId?: string | null;
+  payload?: GenerationCallPayload;
+  promptHash: string;
+  promptVersion: string;
+  provider?: string | null;
+  scenario: string;
+  schemaName?: string | null;
+  spanId?: string | null;
+  success: boolean;
+  topicId?: string | null;
+  traceId?: string | null;
+  trigger?: string | null;
+  userId: string;
+  validationFailed?: boolean;
+}
+
+/**
+ * Per-call observability for `generateObject`. Persists a structured summary
+ * row to `llm_generation_tracing` and the full prompt/input/output blob to the
+ * configured store (S3 in prod, local file in dev, no-op otherwise).
+ *
+ * Always invoked from `after()` so it never blocks the user response. Both
+ * store and DB failures are swallowed and logged — the DB row is the source
+ * of truth for analytics, the blob is a cold artefact for offline review.
+ */
+export class LLMGenerationTracingService {
+  private readonly store: ITracingStore | null;
+
+  constructor(store?: ITracingStore | null) {
+    this.store = store === undefined ? createDefaultStore() : store;
+  }
+
+  isEnabled(): boolean {
+    return this.store !== null;
+  }
+
+  async record(params: RecordLLMGenerationCallParams): Promise<{ tracingId: string } | null> {
+    if (!this.store) return null;
+
+    let db: Awaited<ReturnType<typeof getServerDB>>;
+    try {
+      db = await getServerDB();
+    } catch (err) {
+      log('Skipping tracing — getServerDB failed: %O', err);
+      return null;
+    }
+
+    const model = new LlmGenerationTracingModel(db, params.userId);
+
+    const dbValues: RecordLlmGenerationParams = {
+      agentId: params.agentId,
+      costUsd: params.costUsd,
+      errorCode: params.errorCode,
+      errorDetail: params.errorDetail,
+      inputHash: params.payload?.input ? computeInputHash(params.payload.input) : null,
+      inputHint: resolveInputHint(params.inputHint, params.payload?.input),
+      inputTokens: params.inputTokens,
+      latencyMs: params.latencyMs,
+      metadata: params.metadata,
+      model: params.model,
+      outputTokens: params.outputTokens,
+      parentTracingId: params.parentTracingId,
+      promptHash: params.promptHash,
+      promptVersion: params.promptVersion,
+      provider: params.provider,
+      scenario: params.scenario,
+      schemaName: params.schemaName,
+      spanId: params.spanId,
+      success: params.success,
+      topicId: params.topicId,
+      traceId: params.traceId,
+      trigger: params.trigger,
+      validationFailed: params.validationFailed,
+    };
+
+    // Insert first so the storage key can embed the row's id — every blob then
+    // points at exactly one row.
+    let tracingId: string;
+    try {
+      const row = await model.record(dbValues);
+      tracingId = row.id;
+    } catch (err) {
+      log('DB insert failed: %O', err);
+      return null;
+    }
+
+    const payload: TracingPayload = {
+      created_at: Date.now(),
+      error: params.success
+        ? undefined
+        : {
+            code: params.errorCode ?? undefined,
+            message: params.errorDetail ?? undefined,
+          },
+      input: params.payload?.input,
+      model_metadata: {
+        model: params.model ?? undefined,
+        provider: params.provider ?? undefined,
+      },
+      output: params.payload?.output,
+      prompt_hash: params.promptHash,
+      prompt_version: params.promptVersion,
+      raw_output: params.validationFailed ? params.payload?.rawOutput : undefined,
+      scenario: params.scenario,
+      schema: params.payload?.schema,
+      system_prompt: params.payload?.systemPrompt,
+      tracing_id: tracingId,
+      validation_failed: params.validationFailed,
+      version: '1.0',
+    };
+
+    let storageKey: string | null = null;
+    let storeError: string | undefined;
+    try {
+      const result = await this.store.save(payload);
+      storageKey = result.key;
+    } catch (err) {
+      storeError = err instanceof Error ? err.message : String(err);
+      log('Store save failed (DB row kept): %O', err);
+    }
+
+    try {
+      await db
+        .update(llmGenerationTracing)
+        .set({
+          metadata: storeError
+            ? { ...params.metadata, store_error: storeError }
+            : (params.metadata ?? {}),
+          storageKey,
+        })
+        .where(eq(llmGenerationTracing.id, tracingId));
+    } catch (err) {
+      log('Failed to patch storage_key onto row: %O', err);
+    }
+
+    return { tracingId };
+  }
+
+  async recordFeedback(
+    userId: string,
+    tracingId: string,
+    params: UpdateLlmGenerationFeedbackParams,
+  ): Promise<void> {
+    let db: Awaited<ReturnType<typeof getServerDB>>;
+    try {
+      db = await getServerDB();
+    } catch (err) {
+      log('Skipping feedback — getServerDB failed: %O', err);
+      return;
+    }
+    const model = new LlmGenerationTracingModel(db, userId);
+    try {
+      await model.updateFeedback(tracingId, params);
+    } catch (err) {
+      log('Feedback update failed: %O', err);
+    }
+  }
+}
+
+const createDefaultStore = (): ITracingStore | null => {
+  if (process.env.ENABLE_LLM_GENERATION_TRACING_S3 === '1') {
+    try {
+      // Require at call time so test environments without S3 wiring don't break.
+
+      const { S3TracingStore } = require('@/server/modules/LLMGenerationTracing');
+      return new S3TracingStore();
+    } catch {
+      // S3 wiring not available — fall through to file store / null.
+    }
+  }
+
+  if (process.env.NODE_ENV === 'development') {
+    try {
+      return new FileTracingStore();
+    } catch {
+      // Filesystem unavailable — fall through to null.
+    }
+  }
+
+  return null;
+};
+
+const autoExtractHint = (input: unknown): string | null => {
+  if (input == null) return null;
+  if (typeof input === 'string') return input;
+  if (!Array.isArray(input)) return null;
+  const firstUser = input.find(
+    (m): m is { content: unknown; role: string } =>
+      typeof m === 'object' &&
+      m !== null &&
+      'role' in m &&
+      (m as { role: unknown }).role === 'user',
+  );
+  return firstUser && typeof firstUser.content === 'string' ? firstUser.content : null;
+};
+
+/**
+ * Pick the `input_hint` value: caller-supplied override wins; otherwise fall
+ * back to a best-effort auto-extraction from the first user message. Always
+ * truncated to `INPUT_HINT_MAX` so the column stays scannable.
+ */
+const resolveInputHint = (override: string | null | undefined, input: unknown): string | null => {
+  const raw = override ?? autoExtractHint(input);
+  if (raw == null) return null;
+  return raw.slice(0, INPUT_HINT_MAX);
+};
+
+let cachedInstance: LLMGenerationTracingService | null = null;
+export const getLLMGenerationTracingService = (): LLMGenerationTracingService => {
+  if (!cachedInstance) cachedInstance = new LLMGenerationTracingService();
+  return cachedInstance;
+};
				`@@ -0,0 +1 @@`
				`export { buildTracingKey, S3TracingStore } from './S3TracingStore';`