Tejas Kumar
2026-04-02 10:30:35 +02:00
commit b29905a3ac
20 changed files with 2941 additions and 0 deletions
+1
@@ -0,0 +1 @@
OPENROUTER_API_KEY=your_api_key_here
+73
@@ -0,0 +1,73 @@
# Created by https://www.toptal.com/developers/gitignore/api/macos,nodejs,nextjs
# Edit at https://www.toptal.com/developers/gitignore?templates=macos,nodejs,nextjs
### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride
# Icon must end with two \r
Icon
# Thumbnails
._*
# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
### macOS Patch ###
# iCloud generated files
*.icloud
### NextJS ###
# dependencies
/node_modules
/.pnp
.pnp.js
# testing
/coverage
# next.js
/.next/
/out/
# production
/build
# misc
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.pnpm-debug.log*
# local env files
.env*.local
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts
+180
@@ -0,0 +1,180 @@
# mini-ai-harness
A minimal TypeScript implementation of two kinds of AI harness, built for a talk on **harness engineering** at AI Engineer World's Fair.
---
## What is an AI harness?
An AI harness is the infrastructure that gives an AI model tools and manages input/output behind the scenes, ensuring the model has the tools, context, and environment to do what's asked. It's the scaffolding that wraps around an LLM to make it useful for real-world tasks — not just answering one prompt, but doing actual work in a loop.
The clearest one-liner: **an AI harness is everything except the model weights.**
In practice that means: tool interfaces, context/memory handling, guardrails, verification steps, approval gates, logging, and recovery loops. Anthropic refers to their Claude Agent SDK as a "general-purpose agent harness" that provides built-in context management and tool use so Claude can function as a long-running assistant. OpenAI describes the same idea as orchestration. Anthropic calls the context layer context engineering.
---
## What is harness engineering?
The term crystallized in February 2026 when Mitchell Hashimoto — co-founder of HashiCorp, creator of Terraform — published a blog post giving the practice a name:
> Whenever an agent makes a mistake, you engineer the environment so it won't make that mistake again.
Days later, OpenAI used the same phrase when describing how they built an internal beta product: roughly one million lines of code, written entirely by agents, shipped in five months, with no manually written source code. Their key insight:
> When something failed, the fix was almost never "try harder." Human engineers always stepped in and asked: what capability is missing, and how do we make it both legible and enforceable for the agent?
Harness engineering shifts the engineer's job from writing code to designing environments, specifying intent, and providing structured feedback. The harness is the moat. The model is rented.
According to Thoughtworks and OpenAI, a harness has three core components:
1. **Context engineering** — deciding what information to include or exclude at each model call: isolation (keep subtasks separate), reduction (drop stale data to avoid context rot), retrieval (inject fresh docs or search results at the right time).
2. **Architectural constraints** — enforced not just by the model, but by deterministic linters, structural tests, and guardrails the model cannot bypass.
3. **Verification and feedback loops** — the harness checks outputs, runs eval steps, and if something is wrong, surfaces it so the agent or the engineer can fix it.
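
As a rough illustration of the three components above, here is a minimal TypeScript sketch. Every name in it (`reduceContext`, `allowTool`, `verifyAnswer`) is hypothetical and chosen for this example only; it is not how Thoughtworks, OpenAI, or this repo implements them.

```ts
// Hypothetical types and helpers for illustration only.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };

// 1. Context engineering (reduction): keep the first two messages
//    (system prompt and task) plus the most recent ones.
function reduceContext(messages: Message[], max: number): Message[] {
  if (messages.length <= max) return messages;
  const keepTail = Math.max(max - 2, 0);
  return [...messages.slice(0, 2), ...messages.slice(messages.length - keepTail)];
}

// 2. Architectural constraint: a deterministic allowlist check that runs
//    in the harness, outside the model's control.
function allowTool(name: string, allowed: ReadonlySet<string>): boolean {
  return allowed.has(name);
}

// 3. Verification and feedback: check the output and return feedback
//    that the agent or the engineer can act on.
type Verdict = { passed: boolean; feedback: string };

function verifyAnswer(answer: string): Verdict {
  return answer.trim().length > 0
    ? { passed: true, feedback: "Answer is non-empty" }
    : { passed: false, feedback: "Empty answer: retry or inspect the trace" };
}
```

Real verification would check something task-specific; the non-empty check only marks where that logic would go.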
---
## The two meanings of "harness" — and why both are in this repo
The word has two distinct usages and conflating them causes real confusion.
| | Eval harness | Agent harness |
|---|---|---|
| **Origin** | ML research, 2021 | Agentic engineering, 2026 |
| **Example** | EleutherAI's LM Evaluation Harness | Claude Agent SDK, this repo |
| **Purpose** | Measure model quality against known answers | Enable a model to act in the real world |
| **Input** | Fixed dataset | Open-ended task |
| **Output** | Scores and pass/fail | Answer + tool call log |
| **Loop** | One call per test case | Iterates until done or guardrail fires |
| **Tools** | None | Yes — the whole point |
| **Guardrails** | Not needed | Essential |
| **State** | Stateless | Conversation history across turns |
EleutherAI's LM Evaluation Harness (2021) described itself as "a framework for few-shot evaluation of autoregressive language models." That's the older meaning. The agent harness is newer and fundamentally different in purpose.
Both are in this repo so you can see them side by side.
---
## What's in this repo
```
eval/ ← the eval harness (older meaning)
agent/ ← the agent harness (newer meaning)
```
### `eval/` — test a model against known answers
```
dataset → model → scorer → pass/fail → summary
```
| File | Part | What it does |
|---|---|---|
| `1-dataset.ts` | Dataset | Fixed test cases with known expected outputs. Designed to trigger common hallucinations — the "obvious" answer is usually wrong. |
| `2-model.ts` | Model | Calls any OpenRouter model with a prompt, returns a string. |
| `3-scorers.ts` | Scorers | `scoreExactMatch`, `scoreContains`, `scoreKeywords`. Each normalizes number words ("Three" → "3") before comparing. |
| `4-runner.ts` | Runner | Loops over cases, scores each, tracks whether the model fell for the trap answer. |
| `5-index.ts` | Output | Runs multiple models against the same dataset, prints side-by-side comparison. |
```sh
npm run eval
```
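
To make the scorer row concrete, here is a condensed, self-contained sketch of the normalize-then-compare idea. It is modeled on `eval/3-scorers.ts` but simplified (the number-word table is shortened), so treat it as an illustration rather than the file's exact contents.

```ts
// Shortened number-word table; the repo's version goes up to twelve.
const NUMBER_WORDS: Record<string, string> = {
  one: "1", two: "2", three: "3", eight: "8",
};

// Lowercase, trim, and replace known number words with digits.
function normalize(text: string): string {
  return text
    .trim()
    .toLowerCase()
    .replace(/\b[a-z]+\b/g, (word) => NUMBER_WORDS[word] ?? word);
}

// 1 only if the normalized output equals the normalized expected answer.
function scoreExactMatch(actual: string, expected: string): number {
  return normalize(actual) === normalize(expected) ? 1 : 0;
}

// 1 if the normalized expected answer appears anywhere in the output.
function scoreContains(actual: string, expected: string): number {
  return normalize(actual).includes(normalize(expected)) ? 1 : 0;
}
```

One caveat of the substring approach: `scoreContains("13", "3")` also scores 1, so exact match is the safer choice when the model is asked for a bare number.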
### `agent/` — give a model a task and an environment
```
task → [tools + context + guardrails + loop + verify] → result
```
| File | Part | What it does |
|---|---|---|
| `1-tools.ts` | Tool registry | `createTools(session)` — tools are bound to the environment the harness provides, not a global they reach into. |
| `2-model.ts` | Model client | OpenRouter via the OpenAI SDK. Swap models by changing one string. |
| `3-context.ts` | Context / state | Builds initial context, trims old messages to prevent context rot. |
| `4-guardrails.ts` | Guardrails | Composable safety checks (max iterations, max messages) that run before every loop iteration. |
| `5-loop.ts` | Agent loop | Call model → use tools → feed result back → repeat. Stops when model answers or guardrail fires. |
| `6-harness.ts` | The harness | Owns the full lifecycle: opens environment, creates tools, runs loop, verifies answer, closes environment. |
| `browser.ts` | Environment | A `BrowserSession` — one isolated browser page per harness run, managed entirely by the harness. |
```sh
npm run agent
```
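
The "composable safety checks" idea from the guardrails row can be sketched in a few lines. This is a simplified stand-in for `agent/4-guardrails.ts`: the input here carries a plain `messageCount` instead of the full message array, which is an assumption made to keep the example self-contained.

```ts
// Simplified stand-ins for the types in agent/4-guardrails.ts.
type GuardrailInput = { iterations: number; messageCount: number };
type GuardrailResult = { ok: true } | { ok: false; reason: string };
type GuardrailFn = (input: GuardrailInput) => GuardrailResult;

// Each guardrail is a small factory that returns a check function.
const maxIterations = (limit: number): GuardrailFn => ({ iterations }) =>
  iterations >= limit
    ? { ok: false, reason: `iteration limit (${limit}) reached` }
    : { ok: true };

const maxMessages = (limit: number): GuardrailFn => ({ messageCount }) =>
  messageCount > limit
    ? { ok: false, reason: `context too large (${messageCount} messages)` }
    : { ok: true };

// Run the checks in order; the first failure wins.
function combineGuardrails(...fns: GuardrailFn[]): GuardrailFn {
  return (input) => {
    for (const check of fns) {
      const result = check(input);
      if (!result.ok) return result;
    }
    return { ok: true };
  };
}

const guardrail = combineGuardrails(maxIterations(3), maxMessages(10));
```

Because every guardrail has the same signature, adding a new stop condition means writing one more small function and passing it to `combineGuardrails`.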
---
## How the agent demo works
The task requires live data from the web:
> "Go to https://news.ycombinator.com and tell me the exact title and current point score of the #1 story right now."
The demo runs this against two models sequentially. Each gets its own browser session, opened and closed by the harness.
**A model with good tool use** (e.g. `gpt-4o-mini`):
```
[iter 1] called 2 tool(s) [ctx: 2 msgs]
→ browser_navigate({"url":"https://news.ycombinator.com"})
→ browser_get_text({})
...Hacker News | #1: "Some Title" | 847 points...
[iter 2] answered [ctx: 6 msgs]
Answer: The #1 story is "Some Title" with 847 points.
Verify: ✓ PASS — Answer contains a point score
```
**A model that skips tools** (e.g. `stepfun/step-3.5-flash:free`):
```
[iter 1] answered [ctx: 2 msgs]
Answer: The top story on Hacker News is "Some Made-Up Title" with 312 points.
Verify: ✗ FAIL — No point score found in answer
```
The contrast makes three things visible at once:
- **Tools**: one model opens a real browser, the other hallucinates
- **Context**: message count grows with each tool call — you can watch it
- **Verify**: both models stopped without hitting a guardrail. The harness looked successful either way. Only the verify step caught the semantic failure.
That last point is the key insight: **guardrails catch structural failures. Verify catches wrong answers. You need both.**
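
One possible shape for such a verify step is sketched below. It assumes the verifier can see the tool results from the trace, and it only passes when the point score in the answer also appears in something a tool returned. The function name `verifyScoreIsGrounded` and the regular expressions are hypothetical; this is not the verifier shipped in the repo.

```ts
// Hypothetical, simplified shapes for this example.
type ToolEvent = { tool: string; result: string };
type VerifyResult = { passed: boolean; reason: string };

function verifyScoreIsGrounded(answer: string, toolEvents: ToolEvent[]): VerifyResult {
  // Step 1: the answer must contain something like "847 points".
  const match = answer.match(/\b(\d+)\s+points?\b/i);
  if (!match) return { passed: false, reason: "No point score found in answer" };

  // Step 2: the same score must appear in text that a tool actually returned.
  const score = match[1];
  const pattern = new RegExp(`\\b${score}\\s+points?\\b`, "i");
  const grounded = toolEvents.some((event) => pattern.test(event.result));

  return grounded
    ? { passed: true, reason: `Score ${score} appears in tool output` }
    : { passed: false, reason: `Score ${score} was never seen in any tool output` };
}
```

A check like this rejects a well-formed but invented answer, which a guardrail on iteration count or context size cannot do.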
---
## The harness owns the environment
The architectural decision that makes this a real harness rather than just a loop with tools:
```
runHarness()
├── session = new BrowserSession() ← harness opens the environment
├── tools = createTools(session) ← tools are bound to this session
├── messages = createContext(task) ← fresh context for this task
├── result = await runLoop(...) ← loop runs inside the environment
└── session.close() ← always, even on error
```
Tools don't manage the browser. They don't know about the browser lifecycle. The harness opens it, the harness closes it, and the process exits cleanly. That's what "managing input/output behind the scenes" means in practice.
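
The lifecycle above boils down to a `try`/`finally` around the loop. The sketch below uses a hypothetical `FakeSession` that only records calls, so it runs without a browser; `withSession` is likewise a name invented for this example.

```ts
// A stand-in environment that records what happened to it.
class FakeSession {
  events: string[] = [];
  async open(): Promise<void> {
    this.events.push("open");
  }
  async close(): Promise<void> {
    this.events.push("close");
  }
}

// The harness opens the environment, runs the work, and always closes it.
async function withSession<T>(
  session: FakeSession,
  run: (session: FakeSession) => Promise<T>
): Promise<T> {
  await session.open();
  try {
    return await run(session);
  } finally {
    // Runs on success and on error, so the environment never leaks.
    await session.close();
  }
}
```

The work passed in as `run` never calls `open` or `close` itself, which is the same separation the repo draws between tools and the harness.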
---
## Setup
```sh
cp .env.example .env
# add your OPENROUTER_API_KEY
npm install
npx playwright install chromium
npm run eval # or
npm run agent
```
Get an OpenRouter key at [openrouter.ai](https://openrouter.ai).
---
## Sources
- Mitchell Hashimoto, [My AI Adoption Journey](https://mitchellh.com/writing/my-ai-adoption-journey) (February 2026) — coined "harness engineering" in its current agentic meaning
- Anthropic, [Effective context engineering for AI agents](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) — context engineering as a core harness component
- EleutherAI, [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) — the older eval harness meaning (2021)
+163
@@ -0,0 +1,163 @@
import type { ChatCompletionTool } from "openai/resources/chat/completions";
import type { BrowserSession } from "./browser.js";
export type Tool = {
definition: ChatCompletionTool;
execute: (args: Record<string, unknown>) => Promise<string>;
};
export type ToolRegistry = {
definitions: ChatCompletionTool[];
byName: Map<string, Tool>;
};
export type ToolHooks = {
onUpvoteSuccess?: (storyId: string) => void;
onStoriesLoaded?: (stories: any[]) => void;
};
export function createTools(session: BrowserSession, hooks?: ToolHooks): ToolRegistry {
const tools: Tool[] = [
{
definition: {
type: "function",
function: {
name: "browser_navigate",
description: "Navigate the browser to a URL.",
parameters: {
type: "object",
properties: {
url: { type: "string" },
},
required: ["url"],
},
},
},
execute: async ({ url }) => session.navigate(url as string),
},
{
definition: {
type: "function",
function: {
name: "browser_url",
description: "Get the URL of the current page. Use this to detect redirects (e.g. being sent to a login page).",
parameters: { type: "object", properties: {}, required: [] },
},
},
execute: async () => session.getUrl(),
},
{
definition: {
type: "function",
function: {
name: "browser_get_text",
description: "Get the visible text content of the current page.",
parameters: { type: "object", properties: {}, required: [] },
},
},
execute: async () => session.getText(),
},
{
definition: {
type: "function",
function: {
name: "browser_fill",
description: "Fill in an input field on the current page.",
parameters: {
type: "object",
properties: {
selector: { type: "string", description: 'CSS selector for the input, e.g. "input[name=\'acct\']"' },
value: { type: "string", description: "The value to type into the field." },
},
required: ["selector", "value"],
},
},
},
execute: async ({ selector, value }) =>
session.fill(selector as string, value as string),
},
{
definition: {
type: "function",
function: {
name: "browser_click",
description: "Click an element on the current page. Also waits for any navigation that results from the click.",
parameters: {
type: "object",
properties: {
selector: { type: "string", description: 'CSS selector, e.g. "input[type=\'submit\']"' },
},
required: ["selector"],
},
},
},
execute: async ({ selector }) => {
const result = await session.click(selector as string);
// Check if this was a successful upvote click
if (
hooks?.onUpvoteSuccess &&
/up_/.test(JSON.stringify(selector)) &&
/news\.ycombinator\.com\/(news)?$/.test(result)
) {
const match = (selector as string).match(/up_(\d+)/);
if (match) {
hooks.onUpvoteSuccess(match[1]);
}
}
return result;
},
},
{
definition: {
type: "function",
function: {
name: "browser_get_stories",
description: "Get a structured list of Hacker News stories on the current page — rank, story ID, title, and whether you've already voted. Use this instead of browser_get_text to accurately identify which story to upvote.",
parameters: { type: "object", properties: {}, required: [] },
},
},
execute: async () => {
const result = await session.getStories();
if (hooks?.onStoriesLoaded) {
try {
const stories = JSON.parse(result);
hooks.onStoriesLoaded(stories);
} catch {
  // Ignore: the result was not valid JSON, so there is no story data to record
}
}
return result;
},
},
{
definition: {
type: "function",
function: {
name: "browser_has_class",
description: "Check whether the first element matching a selector has a specific CSS class. Use this to verify upvote state: check if a[id='up_12345'] has class 'nosee' before and after clicking.",
parameters: {
type: "object",
properties: {
selector: { type: "string", description: "CSS selector for the element to check." },
className: { type: "string", description: "The CSS class name to look for." },
},
required: ["selector", "className"],
},
},
},
execute: async ({ selector, className }) =>
session.hasClass(selector as string, className as string),
},
];
return {
definitions: tools.map((t) => t.definition),
byName: new Map(tools.map((t) => [t.definition.function.name, t])),
};
}
+8
@@ -0,0 +1,8 @@
import OpenAI from "openai";
import "dotenv/config";
// OpenRouter is OpenAI-compatible. We just swap the baseURL.
export const client = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY,
});
+29
@@ -0,0 +1,29 @@
import type { ChatCompletionMessageParam } from "openai/resources/chat/completions";
const SYSTEM_PROMPT = `
You are a helpful assistant with access to tools.
Use tools whenever they help you give a more accurate answer.
When you have enough information, respond directly and concisely.
`.trim();
// Build the initial context for a new task
export function createContext(task: string): ChatCompletionMessageParam[] {
return [
{ role: "system", content: SYSTEM_PROMPT },
{ role: "user", content: task },
];
}
// Drop the oldest messages if context grows too large.
// Always keep: the system prompt and the original user task.
export function trimContext(
  messages: ChatCompletionMessageParam[],
  maxMessages: number
): ChatCompletionMessageParam[] {
  if (messages.length <= maxMessages) return messages;
  const [system, user] = messages;
  const rest = messages.slice(2);
  const trimmed = rest.slice(rest.length - Math.max(maxMessages - 2, 0));
  // A "tool" message is only valid right after the assistant message that
  // requested it, so drop tool results orphaned by the cut.
  while (trimmed[0]?.role === "tool") trimmed.shift();
  return [system, user, ...trimmed];
}
+58
@@ -0,0 +1,58 @@
import type { ChatCompletionMessageParam } from "openai/resources/chat/completions";
type GuardrailInput = {
iterations: number;
messages: ChatCompletionMessageParam[];
};
export type GuardrailResult = { ok: true } | { ok: false; reason: string };
export type GuardrailFn = (input: GuardrailInput) => GuardrailResult;
// ── Individual guardrails ─────────────────────
// Stop after too many iterations — prevents infinite loops
const maxIterations =
(limit: number): GuardrailFn =>
({ iterations }) =>
iterations >= limit
? { ok: false, reason: `Guardrail: reached iteration limit (${limit})` }
: { ok: true };
// Stop if context has ballooned unexpectedly
const maxMessages =
(limit: number): GuardrailFn =>
({ messages }) =>
messages.length > limit
? { ok: false, reason: `Guardrail: context too large (${messages.length} messages)` }
: { ok: true };
// ── Compose into one fn ───────────────────────
export function combineGuardrails(...fns: GuardrailFn[]): GuardrailFn {
return (input) => {
for (const check of fns) {
const result = check(input);
if (!result.ok) return result;
}
return { ok: true };
};
}
// Stop after successful upvote
export const stopAfterUpvote =
(getUpvotedStory: () => { id: string; title?: string; rank?: number } | null): GuardrailFn =>
() => {
const story = getUpvotedStory();
if (story) {
const storyInfo = story.title && story.rank
? `"${story.title}" (rank ${story.rank})`
: `story ID ${story.id}`;
return { ok: false, reason: `Successfully upvoted ${storyInfo}` };
}
return { ok: true };
};
export const defaultGuardrails = combineGuardrails(
maxIterations(15),
maxMessages(50)
);
+123
@@ -0,0 +1,123 @@
import type { ChatCompletionMessageParam } from "openai/resources/chat/completions";
import { client } from "./2-model.js";
import { trimContext } from "./3-context.js";
import type { GuardrailFn } from "./4-guardrails.js";
import type { ToolRegistry } from "./1-tools.js";
const MAX_CONTEXT_MESSAGES = 20;
// A single tool call + its result, captured for the trace
export type ToolEvent = {
tool: string;
args: Record<string, unknown>;
result: string;
};
// One loop iteration: the model either called tools or gave a final answer
export type LoopIteration = {
index: number;
outcome: "tool_calls" | "answer";
toolEvents: ToolEvent[]; // empty if outcome is "answer"
contextSize: number; // how many messages were in context for this call
contextTrimmed: boolean; // true if we dropped old messages before this call
};
export type LoopResult = {
answer: string;
iterations: number;
trace: LoopIteration[];
stoppedBy: "model" | "guardrail" | "success";
};
export type LoginHandler = () => Promise<ToolEvent | null>;
export async function runLoop(
model: string,
messages: ChatCompletionMessageParam[],
guardrail: GuardrailFn,
tools: ToolRegistry, // injected by the harness, not imported globally
loginHandler?: LoginHandler // optional callback to handle login redirects
): Promise<LoopResult> {
const trace: LoopIteration[] = [];
while (true) {
const iterationIndex = trace.length + 1;
// ── Context management ────────────────────
const beforeTrim = messages.length;
messages = trimContext(messages, MAX_CONTEXT_MESSAGES);
const contextTrimmed = messages.length < beforeTrim;
// ── Guardrails check ──────────────────────
const check = guardrail({ iterations: trace.length, messages });
if (!check.ok) {
// Check if this is a success completion (reason starts with "Successfully")
const stoppedBy = check.reason.startsWith("Successfully") ? "success" : "guardrail";
return { answer: check.reason, iterations: trace.length, trace, stoppedBy };
}
// ── Model call ────────────────────────────
process.stdout.write(`[iter ${iterationIndex}] calling model... `);
const response = await client.chat.completions.create({
model,
messages,
tools: tools.definitions,
});
const choice = response.choices[0];
const contextSize = messages.length;
console.log(`${choice.finish_reason}`);
messages.push(choice.message as ChatCompletionMessageParam);
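// ── Final answer ──────────────────────────
// Treat every finish reason other than "tool_calls" (e.g. "length") as final.
// Otherwise the loop would repeat without adding to the trace, and the
// iteration guardrail would never fire.
if (choice.finish_reason !== "tool_calls") {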
trace.push({ index: iterationIndex, outcome: "answer", toolEvents: [], contextSize, contextTrimmed });
return {
answer: choice.message.content ?? "(no response)",
iterations: trace.length,
trace,
stoppedBy: "model",
};
}
// ── Tool calls → execute → loop ───────────
if (choice.finish_reason === "tool_calls") {
const toolEvents: ToolEvent[] = [];
for (const call of choice.message.tool_calls ?? []) {
  const name = call.function.name;
  const tool = tools.byName.get(name);
  let args: Record<string, unknown> = {};
  let result: string;
  try {
    // Parse inside the try block: a model can emit malformed or empty JSON
    // arguments, and that should become a tool error, not crash the loop.
    args = JSON.parse(call.function.arguments || "{}") as Record<string, unknown>;
    process.stdout.write(`${name}(${JSON.stringify(args)}) ... `);
    result = tool ? await tool.execute(args) : `Unknown tool: "${name}"`;
    console.log(`done`);
  } catch (err) {
    result = `Error: ${err instanceof Error ? err.message : String(err)}`;
    console.log(`error`);
  }
  toolEvents.push({ tool: name, args, result });
  messages.push({ role: "tool", tool_call_id: call.id, content: result });
}
// ── Check for login redirect after tool execution ───
if (loginHandler) {
const loginEvent = await loginHandler();
if (loginEvent) {
toolEvents.push(loginEvent);
// Add a user-role message to inform the agent that login was handled
messages.push({
role: "user",
content: "Authentication completed by harness. You are now logged in. Navigate back to https://news.ycombinator.com and complete your upvote task.",
});
}
}
trace.push({ index: iterationIndex, outcome: "tool_calls", toolEvents, contextSize, contextTrimmed });
}
}
}
+169
@@ -0,0 +1,169 @@
import { BrowserSession } from "./browser.js";
import { createTools } from "./1-tools.js";
import { createContext } from "./3-context.js";
import { combineGuardrails, defaultGuardrails, stopAfterUpvote } from "./4-guardrails.js";
import { runLoop } from "./5-loop.js";
import type { LoopResult, ToolEvent } from "./5-loop.js";
export type VerifyResult = {
passed: boolean;
reason: string;
};
export type HarnessExecutionResult = LoopResult & {
task: string;
model: string;
};
export type HarnessOptions = {
verify?: (result: HarnessExecutionResult) => VerifyResult;
maxAttempts?: number;
};
export type HarnessResult = HarnessExecutionResult & {
attempts: number;
verification: VerifyResult | null;
};
export async function runHarness(
task: string,
model: string,
options: HarnessOptions = {}
): Promise<HarnessResult> {
const maxAttempts = options.maxAttempts ?? 1;
let latestResult: HarnessResult | null = null;
for (let attempt = 1; attempt <= maxAttempts; attempt++) {
const result = await runHarnessAttempt(task, model);
const verification = options.verify ? options.verify(result) : null;
latestResult = { ...result, attempts: attempt, verification };
// Without a verify function there is nothing to retry on, so return immediately
if (!verification || verification.passed || attempt === maxAttempts) {
return latestResult;
}
console.log(`\nAttempt ${attempt} failed — retrying (${attempt + 1}/${maxAttempts})...\n`);
}
throw new Error("Harness finished without producing a result");
}
export function verifySuccessfulUpvote(result: HarnessExecutionResult): VerifyResult {
const successfulUpvote = result.trace
.flatMap((iter) => iter.toolEvents)
.find(
(e) =>
e.tool === "browser_click" &&
/up_/.test(JSON.stringify(e.args)) &&
/news\.ycombinator\.com\/(news)?$/.test(e.result.split("now at ")[1]?.trim() ?? "")
);
return {
passed: !!successfulUpvote,
reason: successfulUpvote
? `Upvote click confirmed — landed on ${successfulUpvote.result.split("now at ")[1]}`
: "No successful upvote click found in trace (all arrows may be hidden, or login failed)",
};
}
export function printHarnessResult(result: HarnessResult): void {
console.log("\n─── Agent trace ───────────────────────────\n");
for (const iteration of result.trace) {
const trimNote = iteration.contextTrimmed ? " ✂ context trimmed" : "";
const ctx = `[ctx: ${iteration.contextSize} msgs${trimNote}]`;
if (iteration.outcome === "tool_calls") {
console.log(`[iter ${iteration.index}] ${iteration.toolEvents.length} tool call(s) ${ctx}`);
for (const event of iteration.toolEvents) {
console.log(`${event.tool}(${JSON.stringify(event.args)})`);
console.log(` ${event.result.slice(0, 120)}${event.result.length > 120 ? "…" : ""}`);
}
} else {
console.log(`[iter ${iteration.index}] answered ${ctx}`);
}
console.log();
}
console.log("─── Result ────────────────────────────────\n");
console.log(result.answer);
console.log(`\nStopped by: ${result.stoppedBy} after ${result.iterations} iteration(s)`);
console.log(`Attempts: ${result.attempts}`);
if (result.verification) {
const { passed, reason } = result.verification;
console.log(`Verify: ${passed ? "✓ PASS" : "✗ FAIL"} - ${reason}`);
}
}
async function runHarnessAttempt(
task: string,
model: string
): Promise<HarnessExecutionResult> {
// Open the environment — each run gets its own isolated browser page
const session = new BrowserSession();
await session.open();
try {
const messages = createContext(task); // fresh context for this task
// Track upvoted story
let upvotedStory: { id: string; title?: string; rank?: number } | null = null;
let storiesData: any[] = [];
// Create tools with hooks to track upvote success and story data
const tools = createTools(session, {
onUpvoteSuccess: (storyId) => {
const story = storiesData.find(s => s.id === storyId);
upvotedStory = story
? { id: storyId, title: story.title, rank: story.rank }
: { id: storyId };
console.log(`\n[harness] Upvote successful for story ID ${storyId} — forcing completion\n`);
},
onStoriesLoaded: (stories) => {
storiesData = stories;
},
});
// Login handler checks for redirects after each tool execution
const loginHandler = async (): Promise<ToolEvent | null> => {
const currentUrl = await session.getUrl();
const isLoginPage = currentUrl.includes("login") || currentUrl.includes("vote");
if (!isLoginPage) return null;
console.log("\n[harness] Login redirect detected — handling automatically...");
try {
await session.fill("input[name='acct']", "tejasthrowaway");
await session.fill("input[name='pw']", "tejasthrowaway");
await session.click("input[type='submit']");
console.log("[harness] Login completed — agent can continue\n");
return {
tool: "harness_auto_login",
args: {},
result: `Harness automatically handled login at ${currentUrl}. You are now authenticated and back at ${await session.getUrl()}.`,
};
} catch (err) {
console.log(`[harness] Login failed: ${err instanceof Error ? err.message : String(err)}\n`);
return null;
}
};
// Combine default guardrails with upvote completion check
const guardrails = combineGuardrails(
stopAfterUpvote(() => upvotedStory),
defaultGuardrails
);
const result = await runLoop(model, messages, guardrails, tools, loginHandler);
return { task, model, ...result };
} finally {
// Always close the environment — even if the loop threw
await session.close();
}
}
+19
@@ -0,0 +1,19 @@
import { printHarnessResult, runHarness, verifySuccessfulUpvote } from "./6-harness.js";
// Try a weak model
const MODEL = "openai/gpt-3.5-turbo-0613";
const TASK = `
Upvote a story on Hacker News.
Go to https://news.ycombinator.com.
Call browser_get_stories to see ranked stories with their IDs and voted status.
Find the highest-ranked story where alreadyVoted is false.
Click its upvote arrow using the exact selector: a[id="up_STORYID"] (replace STORYID with the actual id).
`.trim();
console.log(`Model: ${MODEL}`);
console.log(`Task: upvote on Hacker News\n`);
const result = await runHarness(TASK, MODEL, { verify: verifySuccessfulUpvote, maxAttempts: 3 });
printHarnessResult(result);
+71
@@ -0,0 +1,71 @@
import { chromium } from "playwright";
import type { Browser, Page } from "playwright";
export class BrowserSession {
private browser: Browser | null = null;
private page: Page | null = null;
async open(): Promise<void> {
this.browser = await chromium.launch({ headless: false });
const context = await this.browser.newContext();
this.page = await context.newPage();
}
async navigate(url: string): Promise<string> {
await this.page!.goto(url, { waitUntil: "domcontentloaded", timeout: 15000 });
return `Navigated to ${url}`;
}
async getUrl(): Promise<string> {
return this.page!.url();
}
async getText(): Promise<string> {
const text = await this.page!.innerText("body");
return text.slice(0, 4000);
}
async fill(selector: string, value: string): Promise<string> {
await this.page!.fill(selector, value);
return `Filled "${selector}"`;
}
async click(selector: string): Promise<string> {
// Capture the id of the element before clicking (navigation may change the page)
const elementId = await this.page!.locator(selector).first().getAttribute("id");
await this.page!.click(selector, { timeout: 10000 });
await this.page!.waitForLoadState("domcontentloaded", { timeout: 10000 });
const clicked = elementId ? `element id="${elementId}"` : `"${selector}"`;
return `Clicked ${clicked} — now at ${this.page!.url()}`;
}
// Returns a structured list of HN front-page stories so the agent can
// correlate story IDs, titles, ranks, and voted status precisely.
async getStories(): Promise<string> {
const stories = await this.page!.evaluate(() => {
return Array.from(document.querySelectorAll(".athing")).map((row, i) => {
const id = row.id;
const title = row.querySelector(".titleline a")?.textContent?.trim() ?? "(no title)";
const upvoteEl = document.querySelector(`#up_${id}`);
const alreadyVoted = upvoteEl?.classList.contains("nosee") ?? true;
return { rank: i + 1, id, title, alreadyVoted };
});
});
return JSON.stringify(stories, null, 2);
}
async hasClass(selector: string, className: string): Promise<string> {
const el = this.page!.locator(selector).first();
const classes = await el.getAttribute("class") ?? "";
const has = classes.split(/\s+/).includes(className);
return has ? `"${selector}" has class "${className}"` : `"${selector}" does not have class "${className}"`;
}
async close(): Promise<void> {
await this.browser?.close();
this.browser = null;
this.page = null;
}
}
+78
@@ -0,0 +1,78 @@
// ─────────────────────────────────────────────
// PART 1: The dataset
//
// A fixed set of test cases.
// Each one has an input we send to the model
// and an expected output we judge it against.
//
// These cases are designed to trigger common
// hallucinations — the "obvious" answer is often
// wrong, which exposes weaker models quickly.
// ─────────────────────────────────────────────
export type TestCase = {
id: string;
input: string;
expected: string; // the correct answer
trap?: string; // the wrong answer most models give
tags?: string[];
};
export const dataset: TestCase[] = [
{
id: "geo-australia-capital",
input: "What is the capital of Australia?",
expected: "Canberra",
trap: "Sydney", // most models confidently say Sydney
tags: ["geography"],
},
{
id: "geo-brazil-capital",
input: "What is the capital of Brazil?",
expected: "Brasília",
trap: "Rio de Janeiro", // or São Paulo
tags: ["geography"],
},
{
id: "geo-most-lakes",
input: "Which country has the most natural lakes in the world?",
expected: "Canada",
trap: "Russia", // or USA — both common wrong answers
tags: ["geography"],
},
{
id: "bio-octopus-hearts",
input: "How many hearts does an octopus have?",
expected: "3",
trap: "1", // models assume one heart like most animals
tags: ["biology"],
},
{
id: "bio-spider-legs",
input: "How many legs does a spider have?",
expected: "8",
trap: "6", // models sometimes confuse spiders with insects
tags: ["biology"],
},
{
id: "astro-mars-moons",
input: "How many moons does Mars have?",
expected: "2",
trap: "1", // or 0 — models often guess wrong
tags: ["astronomy"],
},
{
id: "geo-populous-2024",
input: "What is the most populous country in the world as of 2024?",
expected: "India",
trap: "China", // India surpassed China in 2023 — tests recency
tags: ["geography", "recency"],
},
{
id: "sci-salt-boiling",
input: "Does adding salt to water raise or lower its boiling point?",
expected: "raise",
trap: "lower", // counterintuitive — boiling point elevation
tags: ["science"],
},
];
+35
@@ -0,0 +1,35 @@
// ─────────────────────────────────────────────
// PART 2: The model
//
// One function. Takes a prompt, returns a string.
// The harness doesn't care what's inside —
// swap the model string to test a different one.
// ─────────────────────────────────────────────
import OpenAI from "openai";
import "dotenv/config";
const client = new OpenAI({
baseURL: "https://openrouter.ai/api/v1",
apiKey: process.env.OPENROUTER_API_KEY,
});
export async function callModel(
model: string,
prompt: string
): Promise<string> {
const response = await client.chat.completions.create({
model,
max_tokens: 64,
messages: [
{
role: "system",
content:
"Answer as briefly as possible. One word or number if you can. No punctuation.",
},
{ role: "user", content: prompt },
],
});
return response.choices[0].message.content?.trim() ?? "";
}
+46
@@ -0,0 +1,46 @@
// ─────────────────────────────────────────────
// PART 3: The scorers
//
// A scorer takes the model's output and the
// expected output, and returns a score from
// 0 (wrong) to 1 (correct). Fractional scores
// express partial credit.
//
// Pick the scorer that fits your task.
// Exact match is too strict for most real output.
// ─────────────────────────────────────────────
export type ScorerFn = (actual: string, expected: string) => number;
// Map number words to digits so "Three" and "3" are treated as equal.
// Models answer the same question differently — this shouldn't count as wrong.
const NUMBER_WORDS: Record<string, string> = {
zero: "0", one: "1", two: "2", three: "3", four: "4",
five: "5", six: "6", seven: "7", eight: "8", nine: "9",
ten: "10", eleven: "11", twelve: "12",
};
function normalize(text: string): string {
return text
.trim()
.toLowerCase()
.replace(/\b(zero|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve)\b/g,
(word) => NUMBER_WORDS[word]!);
}
// Pass only if output equals expected after normalization
// (trimmed, lowercased, number words mapped to digits)
export function scoreExactMatch(actual: string, expected: string): number {
return normalize(actual) === normalize(expected) ? 1 : 0;
}
// Pass if output contains expected anywhere — more forgiving
export function scoreContains(actual: string, expected: string): number {
return normalize(actual).includes(normalize(expected)) ? 1 : 0;
}
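// Partial credit: what fraction of keywords appear in the output?
// Takes a keyword list rather than a single expected string, so it does
// not satisfy ScorerFn; wrap it before passing it to the runner.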
export function scoreKeywords(actual: string, keywords: string[]): number {
if (keywords.length === 0) return 0;
const text = normalize(actual);
const hits = keywords.filter((k) => text.includes(normalize(k))).length;
return hits / keywords.length;
}
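`scoreContains` matches substrings, so an expected answer of `3` also passes for an output of `13` or `30`. A minimal sketch of a stricter variant that compares whole tokens. `tokenize` and `scoreContainsToken` are hypothetical names, and the tokenizer (split on anything other than ASCII letters and digits) is an assumption that suits short answers, not a general solution. The sketch leaves out the number-word mapping from `normalize` to stay self-contained.

```typescript
type ScorerFn = (actual: string, expected: string) => number;

// Split into lowercase ASCII alphanumeric tokens:
// "It has 13 legs." becomes ["it", "has", "13", "legs"].
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

// Pass only if the expected tokens appear as a contiguous run of whole tokens.
const scoreContainsToken: ScorerFn = (actual, expected) => {
  const haystack = tokenize(actual);
  const needle = tokenize(expected);
  if (needle.length === 0) return 0;

  for (let i = 0; i + needle.length <= haystack.length; i++) {
    if (needle.every((token, j) => haystack[i + j] === token)) return 1;
  }
  return 0;
};

console.log(scoreContainsToken("An octopus has 3 hearts.", "3"));
console.log(scoreContainsToken("13", "3"));
```

The trade-off is that inflected forms stop matching: `raises` no longer passes for an expected `raise`, so this variant fits numeric answers better than free-text ones.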
+70
View File
@@ -0,0 +1,70 @@
// ─────────────────────────────────────────────
// PART 4: The runner
//
// Loops over every test case, calls the model,
// scores the result, and collects the numbers.
//
// The model and scorer are passed in as args —
// swap either one without touching this file.
// ─────────────────────────────────────────────
import type { TestCase } from "./1-dataset.js";
import type { ScorerFn } from "./3-scorers.js";
import { callModel } from "./2-model.js";
export type RunResult = {
id: string;
expected: string;
actual: string;
trap: string | undefined; // the wrong answer we expected it to give
felForTrap: boolean; // true if it said the trap answer instead
score: number;
passed: boolean;
latencyMs: number;
};
export type EvalRun = {
model: string;
results: RunResult[];
passed: number;
total: number;
avgScore: number;
avgLatencyMs: number;
};
export async function runEval(
cases: TestCase[],
model: string,
scorer: ScorerFn
): Promise<EvalRun> {
const results: RunResult[] = [];
for (const testCase of cases) {
const start = Date.now();
const actual = await callModel(model, testCase.input);
const latencyMs = Date.now() - start;
const score = scorer(actual, testCase.expected);
const felForTrap =
testCase.trap !== undefined &&
actual.toLowerCase().includes(testCase.trap.toLowerCase());
results.push({
id: testCase.id,
expected: testCase.expected,
actual,
trap: testCase.trap,
felForTrap,
score,
passed: score >= 1,
latencyMs,
});
}
const passed = results.filter((r) => r.passed).length;
const total = results.length;
// Guard against an empty dataset: 0 / 0 would produce NaN.
const avgScore =
total === 0 ? 0 : results.reduce((sum, r) => sum + r.score, 0) / total;
const avgLatencyMs =
total === 0 ? 0 : results.reduce((sum, r) => sum + r.latencyMs, 0) / total;
return { model, results, passed, total, avgScore, avgLatencyMs };
}
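Because `runEval` imports `callModel` directly, exercising the loop requires a live API key. A sketch of the same loop with the model passed in as a function, which allows a deterministic stub in tests. `ModelFn`, `runEvalWith`, `Summary`, the trimmed-down `TestCase`, and the `insect-legs` case are hypothetical and exist only for this illustration.

```typescript
// Trimmed-down, assumed shapes; the real types live in 1-dataset.ts and 3-scorers.ts.
type TestCase = { id: string; input: string; expected: string; trap?: string };
type ScorerFn = (actual: string, expected: string) => number;
type ModelFn = (prompt: string) => Promise<string>;

type Summary = { passed: number; total: number; avgScore: number; trapped: number };

async function runEvalWith(
  cases: TestCase[],
  model: ModelFn,
  scorer: ScorerFn
): Promise<Summary> {
  let passed = 0;
  let trapped = 0;
  let scoreSum = 0;

  for (const testCase of cases) {
    const actual = await model(testCase.input);
    const score = scorer(actual, testCase.expected);
    scoreSum += score;
    if (score >= 1) passed += 1;
    // Same substring check the runner uses for trap detection.
    if (
      testCase.trap !== undefined &&
      actual.toLowerCase().includes(testCase.trap.toLowerCase())
    ) {
      trapped += 1;
    }
  }

  const total = cases.length;
  return { passed, total, avgScore: total === 0 ? 0 : scoreSum / total, trapped };
}

// Deterministic stand-in for a real model: always answers "6".
const alwaysSix: ModelFn = async () => "6";
const containsScorer: ScorerFn = (a, e) =>
  a.toLowerCase().includes(e.toLowerCase()) ? 1 : 0;

const demoCases: TestCase[] = [
  { id: "bio-spider-legs", input: "How many legs does a spider have?", expected: "8", trap: "6" },
  { id: "insect-legs", input: "How many legs does an insect have?", expected: "6" },
];

runEvalWith(demoCases, alwaysSix, containsScorer).then((summary) => console.log(summary));
```

With the model injected, the loop, the scoring, and the trap detection can all be tested offline, and the real `callModel` is bound at the call site with something like `(prompt) => callModel(model, prompt)`.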
+57
View File
@@ -0,0 +1,57 @@
// ─────────────────────────────────────────────
// PART 5: The output
//
// Run the same dataset across multiple models
// and print a side-by-side comparison.
//
// This is the key value of an eval harness:
// the same test, run consistently, so you can
// compare models or catch regressions over time.
// ─────────────────────────────────────────────
import { dataset } from "./1-dataset.js";
import { scoreContains } from "./3-scorers.js";
import { runEval } from "./4-runner.js";
import type { EvalRun } from "./4-runner.js";
// Swap any OpenRouter model ID here
const MODELS = [
"ibm-granite/granite-4.0-h-micro",
"anthropic/claude-haiku-4-5",
"arcee-ai/trinity-large-preview:free",
];
function printRun(run: EvalRun): void {
console.log(`\n=== ${run.model} ===\n`);
console.table(
run.results.map((r) => ({
id: r.id,
passed: r.passed ? "✓" : "✗",
felForTrap: r.felForTrap ? "🪤" : "",
expected: r.expected,
actual: r.actual,
latencyMs: r.latencyMs,
}))
);
}
function printComparison(runs: EvalRun[]): void {
console.log("\n=== COMPARISON ===\n");
console.table(
runs.map((run) => ({
model: run.model,
passed: `${run.passed} / ${run.total}`,
traps: run.results.filter((r) => r.felForTrap).length,
avgScore: run.avgScore.toFixed(2),
avgLatencyMs: run.avgLatencyMs.toFixed(0),
}))
);
}
// Run all models in parallel against the same dataset
const runs = await Promise.all(
MODELS.map((model) => runEval(dataset, model, scoreContains))
);
for (const run of runs) printRun(run);
printComparison(runs);
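`Promise.all` rejects as soon as any one model run throws, which discards the results of the models that finished. A sketch using `Promise.allSettled` to keep the successful runs and report the failures separately. `runAll`, `Settled`, `fakeRun`, and the `example/...` model IDs are hypothetical stand-ins; in the harness the `run` callback would be the call to `runEval`.

```typescript
type Settled<T> = {
  ok: { model: string; value: T }[];
  failed: { model: string; reason: string }[];
};

async function runAll<T>(
  models: string[],
  run: (model: string) => Promise<T>
): Promise<Settled<T>> {
  // allSettled never rejects; each outcome records its own success or failure.
  const outcomes = await Promise.allSettled(models.map((m) => run(m)));
  const settled: Settled<T> = { ok: [], failed: [] };

  outcomes.forEach((outcome, i) => {
    const model = models[i]!; // outcomes are returned in input order
    if (outcome.status === "fulfilled") {
      settled.ok.push({ model, value: outcome.value });
    } else {
      const reason =
        outcome.reason instanceof Error ? outcome.reason.message : String(outcome.reason);
      settled.failed.push({ model, reason });
    }
  });
  return settled;
}

// Stand-in runner: one made-up model ID fails, the other succeeds.
async function fakeRun(model: string): Promise<number> {
  if (model === "example/unavailable") throw new Error("model not found");
  return 7;
}

runAll(["example/working", "example/unavailable"], fakeRun).then((r) => console.log(r));
```

This matters most with free or preview model IDs, which can disappear or rate-limit without warning; the comparison table can then be printed from `ok` while `failed` is logged.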
+18
View File
@@ -0,0 +1,18 @@
{
"name": "mini-ai-harness",
"type": "module",
"scripts": {
"eval": "tsx eval/5-index.ts",
"agent": "tsx agent/7-index.ts"
},
"devDependencies": {
"@types/node": "^25.3.5",
"tsx": "^4.19.0",
"typescript": "^5.6.0"
},
"dependencies": {
"dotenv": "^17.3.1",
"openai": "^4.77.0",
"playwright": "^1.50.0"
}
}
+651
View File
@@ -0,0 +1,651 @@
lockfileVersion: '9.0'
settings:
autoInstallPeers: true
excludeLinksFromLockfile: false
importers:
.:
dependencies:
dotenv:
specifier: ^17.3.1
version: 17.3.1
openai:
specifier: ^4.77.0
version: 4.104.0
devDependencies:
'@types/node':
specifier: ^25.3.5
version: 25.3.5
tsx:
specifier: ^4.19.0
version: 4.21.0
typescript:
specifier: ^5.6.0
version: 5.9.3
packages:
'@esbuild/aix-ppc64@0.27.3':
resolution: {integrity: sha512-9fJMTNFTWZMh5qwrBItuziu834eOCUcEqymSH7pY+zoMVEZg3gcPuBNxH1EvfVYe9h0x/Ptw8KBzv7qxb7l8dg==}
engines: {node: '>=18'}
cpu: [ppc64]
os: [aix]
'@esbuild/android-arm64@0.27.3':
resolution: {integrity: sha512-YdghPYUmj/FX2SYKJ0OZxf+iaKgMsKHVPF1MAq/P8WirnSpCStzKJFjOjzsW0QQ7oIAiccHdcqjbHmJxRb/dmg==}
engines: {node: '>=18'}
cpu: [arm64]
os: [android]
'@esbuild/android-arm@0.27.3':
resolution: {integrity: sha512-i5D1hPY7GIQmXlXhs2w8AWHhenb00+GxjxRncS2ZM7YNVGNfaMxgzSGuO8o8SJzRc/oZwU2bcScvVERk03QhzA==}
engines: {node: '>=18'}
cpu: [arm]
os: [android]
'@esbuild/android-x64@0.27.3':
resolution: {integrity: sha512-IN/0BNTkHtk8lkOM8JWAYFg4ORxBkZQf9zXiEOfERX/CzxW3Vg1ewAhU7QSWQpVIzTW+b8Xy+lGzdYXV6UZObQ==}
engines: {node: '>=18'}
cpu: [x64]
os: [android]
'@esbuild/darwin-arm64@0.27.3':
resolution: {integrity: sha512-Re491k7ByTVRy0t3EKWajdLIr0gz2kKKfzafkth4Q8A5n1xTHrkqZgLLjFEHVD+AXdUGgQMq+Godfq45mGpCKg==}
engines: {node: '>=18'}
cpu: [arm64]
os: [darwin]
'@esbuild/darwin-x64@0.27.3':
resolution: {integrity: sha512-vHk/hA7/1AckjGzRqi6wbo+jaShzRowYip6rt6q7VYEDX4LEy1pZfDpdxCBnGtl+A5zq8iXDcyuxwtv3hNtHFg==}
engines: {node: '>=18'}
cpu: [x64]
os: [darwin]
'@esbuild/freebsd-arm64@0.27.3':
resolution: {integrity: sha512-ipTYM2fjt3kQAYOvo6vcxJx3nBYAzPjgTCk7QEgZG8AUO3ydUhvelmhrbOheMnGOlaSFUoHXB6un+A7q4ygY9w==}
engines: {node: '>=18'}
cpu: [arm64]
os: [freebsd]
'@esbuild/freebsd-x64@0.27.3':
resolution: {integrity: sha512-dDk0X87T7mI6U3K9VjWtHOXqwAMJBNN2r7bejDsc+j03SEjtD9HrOl8gVFByeM0aJksoUuUVU9TBaZa2rgj0oA==}
engines: {node: '>=18'}
cpu: [x64]
os: [freebsd]
'@esbuild/linux-arm64@0.27.3':
resolution: {integrity: sha512-sZOuFz/xWnZ4KH3YfFrKCf1WyPZHakVzTiqji3WDc0BCl2kBwiJLCXpzLzUBLgmp4veFZdvN5ChW4Eq/8Fc2Fg==}
engines: {node: '>=18'}
cpu: [arm64]
os: [linux]
'@esbuild/linux-arm@0.27.3':
resolution: {integrity: sha512-s6nPv2QkSupJwLYyfS+gwdirm0ukyTFNl3KTgZEAiJDd+iHZcbTPPcWCcRYH+WlNbwChgH2QkE9NSlNrMT8Gfw==}
engines: {node: '>=18'}
cpu: [arm]
os: [linux]
'@esbuild/linux-ia32@0.27.3':
resolution: {integrity: sha512-yGlQYjdxtLdh0a3jHjuwOrxQjOZYD/C9PfdbgJJF3TIZWnm/tMd/RcNiLngiu4iwcBAOezdnSLAwQDPqTmtTYg==}
engines: {node: '>=18'}
cpu: [ia32]
os: [linux]
'@esbuild/linux-loong64@0.27.3':
resolution: {integrity: sha512-WO60Sn8ly3gtzhyjATDgieJNet/KqsDlX5nRC5Y3oTFcS1l0KWba+SEa9Ja1GfDqSF1z6hif/SkpQJbL63cgOA==}
engines: {node: '>=18'}
cpu: [loong64]
os: [linux]
'@esbuild/linux-mips64el@0.27.3':
resolution: {integrity: sha512-APsymYA6sGcZ4pD6k+UxbDjOFSvPWyZhjaiPyl/f79xKxwTnrn5QUnXR5prvetuaSMsb4jgeHewIDCIWljrSxw==}
engines: {node: '>=18'}
cpu: [mips64el]
os: [linux]
'@esbuild/linux-ppc64@0.27.3':
resolution: {integrity: sha512-eizBnTeBefojtDb9nSh4vvVQ3V9Qf9Df01PfawPcRzJH4gFSgrObw+LveUyDoKU3kxi5+9RJTCWlj4FjYXVPEA==}
engines: {node: '>=18'}
cpu: [ppc64]
os: [linux]
'@esbuild/linux-riscv64@0.27.3':
resolution: {integrity: sha512-3Emwh0r5wmfm3ssTWRQSyVhbOHvqegUDRd0WhmXKX2mkHJe1SFCMJhagUleMq+Uci34wLSipf8Lagt4LlpRFWQ==}
engines: {node: '>=18'}
cpu: [riscv64]
os: [linux]
'@esbuild/linux-s390x@0.27.3':
resolution: {integrity: sha512-pBHUx9LzXWBc7MFIEEL0yD/ZVtNgLytvx60gES28GcWMqil8ElCYR4kvbV2BDqsHOvVDRrOxGySBM9Fcv744hw==}
engines: {node: '>=18'}
cpu: [s390x]
os: [linux]
'@esbuild/linux-x64@0.27.3':
resolution: {integrity: sha512-Czi8yzXUWIQYAtL/2y6vogER8pvcsOsk5cpwL4Gk5nJqH5UZiVByIY8Eorm5R13gq+DQKYg0+JyQoytLQas4dA==}
engines: {node: '>=18'}
cpu: [x64]
os: [linux]
'@esbuild/netbsd-arm64@0.27.3':
resolution: {integrity: sha512-sDpk0RgmTCR/5HguIZa9n9u+HVKf40fbEUt+iTzSnCaGvY9kFP0YKBWZtJaraonFnqef5SlJ8/TiPAxzyS+UoA==}
engines: {node: '>=18'}
cpu: [arm64]
os: [netbsd]
'@esbuild/netbsd-x64@0.27.3':
resolution: {integrity: sha512-P14lFKJl/DdaE00LItAukUdZO5iqNH7+PjoBm+fLQjtxfcfFE20Xf5CrLsmZdq5LFFZzb5JMZ9grUwvtVYzjiA==}
engines: {node: '>=18'}
cpu: [x64]
os: [netbsd]
'@esbuild/openbsd-arm64@0.27.3':
resolution: {integrity: sha512-AIcMP77AvirGbRl/UZFTq5hjXK+2wC7qFRGoHSDrZ5v5b8DK/GYpXW3CPRL53NkvDqb9D+alBiC/dV0Fb7eJcw==}
engines: {node: '>=18'}
cpu: [arm64]
os: [openbsd]
'@esbuild/openbsd-x64@0.27.3':
resolution: {integrity: sha512-DnW2sRrBzA+YnE70LKqnM3P+z8vehfJWHXECbwBmH/CU51z6FiqTQTHFenPlHmo3a8UgpLyH3PT+87OViOh1AQ==}
engines: {node: '>=18'}
cpu: [x64]
os: [openbsd]
'@esbuild/openharmony-arm64@0.27.3':
resolution: {integrity: sha512-NinAEgr/etERPTsZJ7aEZQvvg/A6IsZG/LgZy+81wON2huV7SrK3e63dU0XhyZP4RKGyTm7aOgmQk0bGp0fy2g==}
engines: {node: '>=18'}
cpu: [arm64]
os: [openharmony]
'@esbuild/sunos-x64@0.27.3':
resolution: {integrity: sha512-PanZ+nEz+eWoBJ8/f8HKxTTD172SKwdXebZ0ndd953gt1HRBbhMsaNqjTyYLGLPdoWHy4zLU7bDVJztF5f3BHA==}
engines: {node: '>=18'}
cpu: [x64]
os: [sunos]
'@esbuild/win32-arm64@0.27.3':
resolution: {integrity: sha512-B2t59lWWYrbRDw/tjiWOuzSsFh1Y/E95ofKz7rIVYSQkUYBjfSgf6oeYPNWHToFRr2zx52JKApIcAS/D5TUBnA==}
engines: {node: '>=18'}
cpu: [arm64]
os: [win32]
'@esbuild/win32-ia32@0.27.3':
resolution: {integrity: sha512-QLKSFeXNS8+tHW7tZpMtjlNb7HKau0QDpwm49u0vUp9y1WOF+PEzkU84y9GqYaAVW8aH8f3GcBck26jh54cX4Q==}
engines: {node: '>=18'}
cpu: [ia32]
os: [win32]
'@esbuild/win32-x64@0.27.3':
resolution: {integrity: sha512-4uJGhsxuptu3OcpVAzli+/gWusVGwZZHTlS63hh++ehExkVT8SgiEf7/uC/PclrPPkLhZqGgCTjd0VWLo6xMqA==}
engines: {node: '>=18'}
cpu: [x64]
os: [win32]
'@types/node-fetch@2.6.13':
resolution: {integrity: sha512-QGpRVpzSaUs30JBSGPjOg4Uveu384erbHBoT1zeONvyCfwQxIkUshLAOqN/k9EjGviPRmWTTe6aH2qySWKTVSw==}
'@types/node@18.19.130':
resolution: {integrity: sha512-GRaXQx6jGfL8sKfaIDD6OupbIHBr9jv7Jnaml9tB7l4v068PAOXqfcujMMo5PhbIs6ggR1XODELqahT2R8v0fg==}
'@types/node@25.3.5':
resolution: {integrity: sha512-oX8xrhvpiyRCQkG1MFchB09f+cXftgIXb3a7UUa4Y3wpmZPw5tyZGTLWhlESOLq1Rq6oDlc8npVU2/9xiCuXMA==}
abort-controller@3.0.0:
resolution: {integrity: sha512-h8lQ8tacZYnR3vNQTgibj+tODHI5/+l06Au2Pcriv/Gmet0eaj4TwWH41sO9wnHDiQsEj19q0drzdWdeAHtweg==}
engines: {node: '>=6.5'}
agentkeepalive@4.6.0:
resolution: {integrity: sha512-kja8j7PjmncONqaTsB8fQ+wE2mSU2DJ9D4XKoJ5PFWIdRMa6SLSN1ff4mOr4jCbfRSsxR4keIiySJU0N9T5hIQ==}
engines: {node: '>= 8.0.0'}
asynckit@0.4.0:
resolution: {integrity: sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==}
call-bind-apply-helpers@1.0.2:
resolution: {integrity: sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==}
engines: {node: '>= 0.4'}
combined-stream@1.0.8:
resolution: {integrity: sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==}
engines: {node: '>= 0.8'}
delayed-stream@1.0.0:
resolution: {integrity: sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==}
engines: {node: '>=0.4.0'}
dotenv@17.3.1:
resolution: {integrity: sha512-IO8C/dzEb6O3F9/twg6ZLXz164a2fhTnEWb95H23Dm4OuN+92NmEAlTrupP9VW6Jm3sO26tQlqyvyi4CsnY9GA==}
engines: {node: '>=12'}
dunder-proto@1.0.1:
resolution: {integrity: sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==}
engines: {node: '>= 0.4'}
es-define-property@1.0.1:
resolution: {integrity: sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==}
engines: {node: '>= 0.4'}
es-errors@1.3.0:
resolution: {integrity: sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==}
engines: {node: '>= 0.4'}
es-object-atoms@1.1.1:
resolution: {integrity: sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==}
engines: {node: '>= 0.4'}
es-set-tostringtag@2.1.0:
resolution: {integrity: sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA==}
engines: {node: '>= 0.4'}
esbuild@0.27.3:
resolution: {integrity: sha512-8VwMnyGCONIs6cWue2IdpHxHnAjzxnw2Zr7MkVxB2vjmQ2ivqGFb4LEG3SMnv0Gb2F/G/2yA8zUaiL1gywDCCg==}
engines: {node: '>=18'}
hasBin: true
event-target-shim@5.0.1:
resolution: {integrity: sha512-i/2XbnSz/uxRCU6+NdVJgKWDTM427+MqYbkQzD321DuCQJUqOuJKIA0IM2+W2xtYHdKOmZ4dR6fExsd4SXL+WQ==}
engines: {node: '>=6'}
form-data-encoder@1.7.2:
resolution: {integrity: sha512-qfqtYan3rxrnCk1VYaA4H+Ms9xdpPqvLZa6xmMgFvhO32x7/3J/ExcTd6qpxM0vH2GdMI+poehyBZvqfMTto8A==}
form-data@4.0.5:
resolution: {integrity: sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==}
engines: {node: '>= 6'}
formdata-node@4.4.1:
resolution: {integrity: sha512-0iirZp3uVDjVGt9p49aTaqjk84TrglENEDuqfdlZQ1roC9CWlPk6Avf8EEnZNcAqPonwkG35x4n3ww/1THYAeQ==}
engines: {node: '>= 12.20'}
fsevents@2.3.3:
resolution: {integrity: sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==}
engines: {node: ^8.16.0 || ^10.6.0 || >=11.0.0}
os: [darwin]
function-bind@1.1.2:
resolution: {integrity: sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==}
get-intrinsic@1.3.0:
resolution: {integrity: sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==}
engines: {node: '>= 0.4'}
get-proto@1.0.1:
resolution: {integrity: sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==}
engines: {node: '>= 0.4'}
get-tsconfig@4.13.6:
resolution: {integrity: sha512-shZT/QMiSHc/YBLxxOkMtgSid5HFoauqCE3/exfsEcwg1WkeqjG+V40yBbBrsD+jW2HDXcs28xOfcbm2jI8Ddw==}
gopd@1.2.0:
resolution: {integrity: sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==}
engines: {node: '>= 0.4'}
has-symbols@1.1.0:
resolution: {integrity: sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==}
engines: {node: '>= 0.4'}
has-tostringtag@1.0.2:
resolution: {integrity: sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw==}
engines: {node: '>= 0.4'}
hasown@2.0.2:
resolution: {integrity: sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==}
engines: {node: '>= 0.4'}
humanize-ms@1.2.1:
resolution: {integrity: sha512-Fl70vYtsAFb/C06PTS9dZBo7ihau+Tu/DNCk/OyHhea07S+aeMWpFFkUaXRa8fI+ScZbEI8dfSxwY7gxZ9SAVQ==}
math-intrinsics@1.1.0:
resolution: {integrity: sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==}
engines: {node: '>= 0.4'}
mime-db@1.52.0:
resolution: {integrity: sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==}
engines: {node: '>= 0.6'}
mime-types@2.1.35:
resolution: {integrity: sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==}
engines: {node: '>= 0.6'}
ms@2.1.3:
resolution: {integrity: sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==}
node-domexception@1.0.0:
resolution: {integrity: sha512-/jKZoMpw0F8GRwl4/eLROPA3cfcXtLApP0QzLmUT/HuPCZWyB7IY9ZrMeKw2O/nFIqPQB3PVM9aYm0F312AXDQ==}
engines: {node: '>=10.5.0'}
deprecated: Use your platform's native DOMException instead
node-fetch@2.7.0:
resolution: {integrity: sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==}
engines: {node: 4.x || >=6.0.0}
peerDependencies:
encoding: ^0.1.0
peerDependenciesMeta:
encoding:
optional: true
openai@4.104.0:
resolution: {integrity: sha512-p99EFNsA/yX6UhVO93f5kJsDRLAg+CTA2RBqdHK4RtK8u5IJw32Hyb2dTGKbnnFmnuoBv5r7Z2CURI9sGZpSuA==}
hasBin: true
peerDependencies:
ws: ^8.18.0
zod: ^3.23.8
peerDependenciesMeta:
ws:
optional: true
zod:
optional: true
resolve-pkg-maps@1.0.0:
resolution: {integrity: sha512-seS2Tj26TBVOC2NIc2rOe2y2ZO7efxITtLZcGSOnHHNOQ7CkiUBfw0Iw2ck6xkIhPwLhKNLS8BO+hEpngQlqzw==}
tr46@0.0.3:
resolution: {integrity: sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==}
tsx@4.21.0:
resolution: {integrity: sha512-5C1sg4USs1lfG0GFb2RLXsdpXqBSEhAaA/0kPL01wxzpMqLILNxIxIOKiILz+cdg/pLnOUxFYOR5yhHU666wbw==}
engines: {node: '>=18.0.0'}
hasBin: true
typescript@5.9.3:
resolution: {integrity: sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==}
engines: {node: '>=14.17'}
hasBin: true
undici-types@5.26.5:
resolution: {integrity: sha512-JlCMO+ehdEIKqlFxk6IfVoAUVmgz7cU7zD/h9XZ0qzeosSHmUJVOzSQvvYSYWXkFXC+IfLKSIffhv0sVZup6pA==}
undici-types@7.18.2:
resolution: {integrity: sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w==}
web-streams-polyfill@4.0.0-beta.3:
resolution: {integrity: sha512-QW95TCTaHmsYfHDybGMwO5IJIM93I/6vTRk+daHTWFPhwh+C8Cg7j7XyKrwrj8Ib6vYXe0ocYNrmzY4xAAN6ug==}
engines: {node: '>= 14'}
webidl-conversions@3.0.1:
resolution: {integrity: sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==}
whatwg-url@5.0.0:
resolution: {integrity: sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==}
snapshots:
'@esbuild/aix-ppc64@0.27.3':
optional: true
'@esbuild/android-arm64@0.27.3':
optional: true
'@esbuild/android-arm@0.27.3':
optional: true
'@esbuild/android-x64@0.27.3':
optional: true
'@esbuild/darwin-arm64@0.27.3':
optional: true
'@esbuild/darwin-x64@0.27.3':
optional: true
'@esbuild/freebsd-arm64@0.27.3':
optional: true
'@esbuild/freebsd-x64@0.27.3':
optional: true
'@esbuild/linux-arm64@0.27.3':
optional: true
'@esbuild/linux-arm@0.27.3':
optional: true
'@esbuild/linux-ia32@0.27.3':
optional: true
'@esbuild/linux-loong64@0.27.3':
optional: true
'@esbuild/linux-mips64el@0.27.3':
optional: true
'@esbuild/linux-ppc64@0.27.3':
optional: true
'@esbuild/linux-riscv64@0.27.3':
optional: true
'@esbuild/linux-s390x@0.27.3':
optional: true
'@esbuild/linux-x64@0.27.3':
optional: true
'@esbuild/netbsd-arm64@0.27.3':
optional: true
'@esbuild/netbsd-x64@0.27.3':
optional: true
'@esbuild/openbsd-arm64@0.27.3':
optional: true
'@esbuild/openbsd-x64@0.27.3':
optional: true
'@esbuild/openharmony-arm64@0.27.3':
optional: true
'@esbuild/sunos-x64@0.27.3':
optional: true
'@esbuild/win32-arm64@0.27.3':
optional: true
'@esbuild/win32-ia32@0.27.3':
optional: true
'@esbuild/win32-x64@0.27.3':
optional: true
'@types/node-fetch@2.6.13':
dependencies:
'@types/node': 25.3.5
form-data: 4.0.5
'@types/node@18.19.130':
dependencies:
undici-types: 5.26.5
'@types/node@25.3.5':
dependencies:
undici-types: 7.18.2
abort-controller@3.0.0:
dependencies:
event-target-shim: 5.0.1
agentkeepalive@4.6.0:
dependencies:
humanize-ms: 1.2.1
asynckit@0.4.0: {}
call-bind-apply-helpers@1.0.2:
dependencies:
es-errors: 1.3.0
function-bind: 1.1.2
combined-stream@1.0.8:
dependencies:
delayed-stream: 1.0.0
delayed-stream@1.0.0: {}
dotenv@17.3.1: {}
dunder-proto@1.0.1:
dependencies:
call-bind-apply-helpers: 1.0.2
es-errors: 1.3.0
gopd: 1.2.0
es-define-property@1.0.1: {}
es-errors@1.3.0: {}
es-object-atoms@1.1.1:
dependencies:
es-errors: 1.3.0
es-set-tostringtag@2.1.0:
dependencies:
es-errors: 1.3.0
get-intrinsic: 1.3.0
has-tostringtag: 1.0.2
hasown: 2.0.2
esbuild@0.27.3:
optionalDependencies:
'@esbuild/aix-ppc64': 0.27.3
'@esbuild/android-arm': 0.27.3
'@esbuild/android-arm64': 0.27.3
'@esbuild/android-x64': 0.27.3
'@esbuild/darwin-arm64': 0.27.3
'@esbuild/darwin-x64': 0.27.3
'@esbuild/freebsd-arm64': 0.27.3
'@esbuild/freebsd-x64': 0.27.3
'@esbuild/linux-arm': 0.27.3
'@esbuild/linux-arm64': 0.27.3
'@esbuild/linux-ia32': 0.27.3
'@esbuild/linux-loong64': 0.27.3
'@esbuild/linux-mips64el': 0.27.3
'@esbuild/linux-ppc64': 0.27.3
'@esbuild/linux-riscv64': 0.27.3
'@esbuild/linux-s390x': 0.27.3
'@esbuild/linux-x64': 0.27.3
'@esbuild/netbsd-arm64': 0.27.3
'@esbuild/netbsd-x64': 0.27.3
'@esbuild/openbsd-arm64': 0.27.3
'@esbuild/openbsd-x64': 0.27.3
'@esbuild/openharmony-arm64': 0.27.3
'@esbuild/sunos-x64': 0.27.3
'@esbuild/win32-arm64': 0.27.3
'@esbuild/win32-ia32': 0.27.3
'@esbuild/win32-x64': 0.27.3
event-target-shim@5.0.1: {}
form-data-encoder@1.7.2: {}
form-data@4.0.5:
dependencies:
asynckit: 0.4.0
combined-stream: 1.0.8
es-set-tostringtag: 2.1.0
hasown: 2.0.2
mime-types: 2.1.35
formdata-node@4.4.1:
dependencies:
node-domexception: 1.0.0
web-streams-polyfill: 4.0.0-beta.3
fsevents@2.3.3:
optional: true
function-bind@1.1.2: {}
get-intrinsic@1.3.0:
dependencies:
call-bind-apply-helpers: 1.0.2
es-define-property: 1.0.1
es-errors: 1.3.0
es-object-atoms: 1.1.1
function-bind: 1.1.2
get-proto: 1.0.1
gopd: 1.2.0
has-symbols: 1.1.0
hasown: 2.0.2
math-intrinsics: 1.1.0
get-proto@1.0.1:
dependencies:
dunder-proto: 1.0.1
es-object-atoms: 1.1.1
get-tsconfig@4.13.6:
dependencies:
resolve-pkg-maps: 1.0.0
gopd@1.2.0: {}
has-symbols@1.1.0: {}
has-tostringtag@1.0.2:
dependencies:
has-symbols: 1.1.0
hasown@2.0.2:
dependencies:
function-bind: 1.1.2
humanize-ms@1.2.1:
dependencies:
ms: 2.1.3
math-intrinsics@1.1.0: {}
mime-db@1.52.0: {}
mime-types@2.1.35:
dependencies:
mime-db: 1.52.0
ms@2.1.3: {}
node-domexception@1.0.0: {}
node-fetch@2.7.0:
dependencies:
whatwg-url: 5.0.0
openai@4.104.0:
dependencies:
'@types/node': 18.19.130
'@types/node-fetch': 2.6.13
abort-controller: 3.0.0
agentkeepalive: 4.6.0
form-data-encoder: 1.7.2
formdata-node: 4.4.1
node-fetch: 2.7.0
transitivePeerDependencies:
- encoding
resolve-pkg-maps@1.0.0: {}
tr46@0.0.3: {}
tsx@4.21.0:
dependencies:
esbuild: 0.27.3
get-tsconfig: 4.13.6
optionalDependencies:
fsevents: 2.3.3
typescript@5.9.3: {}
undici-types@5.26.5: {}
undici-types@7.18.2: {}
web-streams-polyfill@4.0.0-beta.3: {}
webidl-conversions@3.0.1: {}
whatwg-url@5.0.0:
dependencies:
tr46: 0.0.3
webidl-conversions: 3.0.1
+10
View File
@@ -0,0 +1,10 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"strict": true,
"outDir": "dist"
},
"include": ["eval", "agent"]
}