📝 docs: clarify drizzle raw sql guidance (#15467)

2026-06-14 03:30:19 +00:00 · 2026-06-04 17:00:42 +08:00
parent bab3ff4a7a
commit e165b6424b
1 changed files with 88 additions and 0 deletions
@@ -174,6 +174,94 @@ const rows = await this.db
  .groupBy(agentEvalDatasets.id);
 ```

+### Raw SQL and Advanced Queries
+
+Prefer Drizzle builders whenever the query can be expressed clearly with `select`,
+`insert().select()`, `update().from()`, joins, CTEs, `groupBy`, and typed selected
+columns. This keeps table and column references tied to schema definitions, so
+schema changes are more likely to surface as TypeScript errors.
+
+Expression-level `sql<T>` is fine inside a Drizzle builder for PostgreSQL features
+that do not have a dedicated helper, such as JSON path extraction, casts, aggregate
+expressions, `CASE`, `NOW()`, or advisory locks. Row locks are query clauses, not
+expressions; use the select builder's `.for('update')` instead of raw
+`FOR UPDATE` SQL fragments.
+
+When refactoring raw SQL:
+
+- Preserve the original query shape for latency-sensitive paths. If raw SQL is one
+  database roundtrip, do not replace it with multiple depth-based queries just to
+  remove `execute`.
+- Use `$with(...)` plus `insert().select()` / `update().from()` for multi-step
+  single-roundtrip writes when Drizzle can express the data flow.
+- Avoid generic `execute<MyRow>(sql...)` as the main safety mechanism. It types the
+  returned rows, but it does not keep selected columns in sync with schema changes.
+- If the only clean implementation is a PostgreSQL feature that Drizzle cannot
+  express well, keep the raw SQL and tighten it instead: use schema references in
+  interpolations, explicit user scope, a narrow row interface, and regression tests.
+
+Recursive CTEs are a special case: current Drizzle usage in this repo does not have
+a clean `WITH RECURSIVE` builder pattern. Keep recursive CTE raw SQL when replacing
+it would add extra database roundtrips or materially worsen performance.
+
+Example: convert an aggregate query when Drizzle can preserve one roundtrip:
+
+```typescript
+// ✅ Good: builder owns table and column references; sql<T> stays expression-level.
+const rows = await trx
+  .select({
+    model: messages.model,
+    provider: messages.provider,
+    totalCost: sql<string | null>`sum((${messages.metadata}->'usage'->>'cost')::numeric)`.as(
+      'totalCost',
+    ),
+  })
+  .from(messages)
+  .where(
+    and(
+      eq(messages.topicId, topicId),
+      eq(messages.userId, userId),
+      eq(messages.role, 'assistant'),
+      sql`${messages.metadata} ? 'usage'`,
+    ),
+  )
+  .groupBy(messages.provider, messages.model);
+```
+
+Example: use the select lock builder for row locks:
+
+```typescript
+const [user] = await trx.select().from(users).where(eq(users.id, userId)).for('update');
+```
+
+Example: keep a recursive CTE raw when replacing it would add depth-based DB
+roundtrips:
+
+```typescript
+interface TaskTreeRow {
+  id: string;
+  parent_task_id: string | null;
+}
+
+// execute<T> is acceptable here only because Drizzle has no clean WITH RECURSIVE
+// builder; a builder rewrite would add depth-based roundtrips. Keep schema refs in
+// the interpolations and scope every leg to the user.
+const { rows } = await db.execute<TaskTreeRow>(sql`
+  WITH RECURSIVE task_tree AS (
+    SELECT ${tasks.id}, ${tasks.parentTaskId}
+    FROM ${tasks}
+    WHERE ${tasks.id} = ${rootTaskId}
+      AND ${tasks.createdByUserId} = ${userId}
+    UNION ALL
+    SELECT ${tasks.id}, ${tasks.parentTaskId}
+    FROM ${tasks}
+    JOIN task_tree ON ${tasks.parentTaskId} = task_tree.id
+    WHERE ${tasks.createdByUserId} = ${userId}
+  )
+  SELECT * FROM task_tree
+`);
+```
+
 ### One-to-Many (Separate Queries)

 When you need a parent record with its children, use two queries instead of relational `with:`: