feat: update token usage after each step in multi-step turns

Previously, token usage and costs were only updated at the end of a complete turn. For long-running multi-step tool-calling conversations, this meant the status bar showed stale (or zero) costs during the entire interaction.

Now, after each complete step (tool call + result), the usage tracker is updated with the actual token counts from that step. This provides real-time cost accumulation visible in the status bar.

Changes:
- Add StepUsageHandler type and onStepUsage parameter to agent
- Emit StepUsageEvent from kit layer after each step completes
- Handle StepUsageEvent in app layer to update UsageTracker
- Add EventStepUsage constant and StepUsageEvent struct to events

The step usage is additive - each step's tokens are added to the running session totals, just like the final turn usage was before.
fix: update token counting when switching models mid-session
2026-06-21 14:39:38 +00:00 · 2026-03-25 18:17:48 +03:00 · 2026-03-25 18:09:36 +03:00 · 2026-03-25 18:02:50 +03:00 · 2026-03-25 17:48:37 +03:00 · 2026-03-25 17:41:37 +03:00
39 changed files with 2126 additions and 101 deletions
@@ -107,13 +107,8 @@ func resolveGoFilePath(inputJSON, cwd string) (string, bool) {
 }

 func runGoDiagnostics(cwd, absPath string) string {
-	target := absPath
-	if rel, err := filepath.Rel(cwd, absPath); err == nil && !strings.HasPrefix(rel, "..") {
-		target = rel
-	}
-
 	gopls := runGopls(cwd, absPath)
-	lint := runGolangCILint(cwd, target)
+	lint := runGolangCILint(cwd, "./...")

 	return fmt.Sprintf(
 		"<go_diagnostics file=%q>\n[gopls]\n%s\n\n[golangci-lint]\n%s\n</go_diagnostics>",
@@ -287,7 +287,7 @@ kit -e examples/extensions/minimal.go

 ### Extension Capabilities

-**Lifecycle Events**: OnSessionStart, OnSessionShutdown, OnBeforeAgentStart, OnAgentStart, OnAgentEnd, OnToolCall, OnToolExecutionStart, OnToolExecutionEnd, OnToolResult, OnInput, OnMessageStart, OnMessageUpdate, OnMessageEnd, OnModelChange, OnContextPrepare, OnBeforeFork, OnBeforeSessionSwitch, OnBeforeCompact
+**Lifecycle Events**: OnSessionStart, OnSessionShutdown, OnBeforeAgentStart, OnAgentStart, OnAgentEnd, OnToolCall, OnToolExecutionStart, OnToolOutput, OnToolExecutionEnd, OnToolResult, OnInput, OnMessageStart, OnMessageUpdate, OnMessageEnd, OnModelChange, OnContextPrepare, OnBeforeFork, OnBeforeSessionSwitch, OnBeforeCompact, OnCustomEvent

 **Custom Components**:
 - **Tools**: Add new tools the LLM can invoke
@@ -336,12 +336,13 @@ See the `examples/extensions/` directory:
 - `subagent-widget.go` - Multi-agent orchestration with status widget
 - `subagent-test.go` - Subagent testing utilities
 - `summarize.go` - Conversation summarization
-- `go-edit-lint.go` - LSP diagnostic integration with TUI visibility
 - `tool-logger.go` - Log all tool calls
 - `neon-theme.go` - Custom theme registration and switching
 - `tool-renderer-demo.go` - Custom tool call rendering
 - `widget-status.go` - Persistent status widgets

+Also see `.kit/extensions/go-edit-lint.go` (in this repo) for a project-local extension example that runs gopls and golangci-lint on Go file edits.
+
 ### Loading Extensions

 **Auto-discovery** (loads automatically):
@@ -703,8 +704,24 @@ npm/                 - NPM package wrapper for distribution
 - **Google Vertex** - Claude on Vertex AI
 - **OpenRouter** - Multi-provider router
 - **Vercel AI** - Vercel AI SDK models
+- **Custom** - Any OpenAI-compatible endpoint via `--provider-url`
 - **Auto-routed** - Any provider from models.dev database

+### Custom Provider
+
+Use `custom/custom` when pointing Kit at any OpenAI-compatible endpoint with `--provider-url`:
+
+```bash
+kit --provider-url "http://localhost:8080/v1" "Hello"
+```
+
+This automatically defaults to `custom/custom` without needing to specify a model. The custom provider routes through fantasy's `openaicompat` provider and supports:
+
+- Zero cost tracking (input/output = 0)
+- 262K context window, 65K output limit
+- Reasoning and temperature support
+- Optional `CUSTOM_API_KEY` environment variable or `--provider-api-key` flag
+
 ### Model String Format

 ```bash
@@ -4,6 +4,7 @@ import (
 	"fmt"
 	"sort"

+	"github.com/mark3labs/kit/internal/models"
 	kit "github.com/mark3labs/kit/pkg/kit"
 	"github.com/spf13/cobra"
 )
@@ -47,6 +48,9 @@ func runModels(_ *cobra.Command, args []string) error {
 }

 func printAllProviders(showAll bool) error {
+	// Reload the registry to pick up any custom models from config
+	models.ReloadGlobalRegistry()
+
 	var providerIDs []string
 	if showAll {
 		providerIDs = kit.GetSupportedProviders()
@@ -98,6 +102,9 @@ func printAllProviders(showAll bool) error {
 }

 func printProvider(provider string) error {
+	// Reload the registry to pick up any custom models from config
+	models.ReloadGlobalRegistry()
+
 	m, err := kit.GetModelsForProvider(provider)
 	if err != nil {
 		return fmt.Errorf("unknown provider %q. Run 'kit models' to see all providers", provider)
@@ -13,6 +13,7 @@ import (
 	"charm.land/fantasy"
 	"charm.land/lipgloss/v2"
 	"github.com/mark3labs/kit/internal/app"
+	"github.com/mark3labs/kit/internal/auth"
 	"github.com/mark3labs/kit/internal/config"
 	"github.com/mark3labs/kit/internal/extensions"
 	"github.com/mark3labs/kit/internal/models"
@@ -689,6 +690,16 @@ func runNormalMode(ctx context.Context) error {
 		}
 	}

+	// When --provider-url is set but no explicit --model was provided,
+	// default to "custom/custom" so the user doesn't need to remember a
+	// provider/model pair for custom OpenAI-compatible endpoints.
+	// This intentionally overrides saved preferences but respects config-file
+	// models — if you specify a model in ~/.kit.yml, it will be used with
+	// custom/custom's provider routing.
+	if viper.GetString("provider-url") != "" && !modelFlagChanged && !viper.InConfig("model") {
+		viper.Set("model", "custom/custom")
+	}
+
 	// Load MCP configuration.
 	mcpConfig, err := config.LoadAndValidateConfig()
 	if err != nil {
@@ -945,6 +956,24 @@ func runNormalMode(ctx context.Context) error {
 				kitInstance.UpdateExtensionContextModel(modelString)
 				// Fire OnModelChange event to extensions.
 				kitInstance.EmitModelChange(modelString, previousModel, "extension")
+				// Update usage tracker with new model info for correct token counting.
+				if usageTracker != nil {
+					newProvider, newModel, _ := models.ParseModelString(modelString)
+					if newProvider != "unknown" && newModel != "unknown" && newProvider != "ollama" {
+						registry := models.GetGlobalRegistry()
+						if modelInfo := registry.LookupModel(newProvider, newModel); modelInfo != nil {
+							// Check OAuth status for Anthropic models
+							isOAuth := false
+							if newProvider == "anthropic" {
+								_, source, err := auth.GetAnthropicAPIKey(viper.GetString("provider-api-key"))
+								if err == nil && strings.HasPrefix(source, "stored OAuth") {
+									isOAuth = true
+								}
+							}
+							usageTracker.UpdateModelInfo(modelInfo, newProvider, isOAuth)
+						}
+					}
+				}
 				return nil
 			},
 			GetAvailableModels: func() []extensions.ModelInfoEntry {
@@ -1142,6 +1171,24 @@ func runNormalMode(ctx context.Context) error {
 		// this callback runs synchronously inside BubbleTea's Update(), and
 		// NotifyModelChanged calls prog.Send() which deadlocks. The UI layer
 		// updates m.providerName and m.modelName directly after setModel returns.
+		// Update usage tracker with new model info for correct token counting.
+		if usageTracker != nil {
+			newProvider, newModel, _ := models.ParseModelString(modelString)
+			if newProvider != "unknown" && newModel != "unknown" && newProvider != "ollama" {
+				registry := models.GetGlobalRegistry()
+				if modelInfo := registry.LookupModel(newProvider, newModel); modelInfo != nil {
+					// Check OAuth status for Anthropic models
+					isOAuth := false
+					if newProvider == "anthropic" {
+						_, source, err := auth.GetAnthropicAPIKey(viper.GetString("provider-api-key"))
+						if err == nil && strings.HasPrefix(source, "stored OAuth") {
+							isOAuth = true
+						}
+					}
+					usageTracker.UpdateModelInfo(modelInfo, newProvider, isOAuth)
+				}
+			}
+		}
 		return nil
 	}
 	emitModelChangeForUI := func(newModel, previousModel, source string) {
@@ -8,19 +8,21 @@ import (
 	"github.com/spf13/cobra"
 )

-// skillCmd installs the kit-extensions skill via the skills.sh CLI (npx skills).
-// This teaches AI agents how to create Kit extensions with full knowledge of
-// the extension API, lifecycle events, widgets, tools, commands, and Yaegi constraints.
+// skillCmd installs Kit skills via the skills.sh CLI (npx skills).
 var skillCmd = &cobra.Command{
 	Use:   "skill",
-	Short: "Install the Kit extensions skill via skills.sh",
-	Long: `Install the kit-extensions skill that teaches AI agents how to create
-Kit extensions. Uses the skills.sh CLI (npx skills) to install the skill
-from the Kit repository.
+	Short: "Install Kit skills via skills.sh",
+	Long: `Install Kit skills that teach AI agents how to build with Kit.
+Uses the skills.sh CLI (npx skills) to install all skills from the Kit repository.

-The skill provides comprehensive documentation of Kit's extension API including
-lifecycle events, custom tools, slash commands, widgets, editor interceptors,
-tool renderers, and critical Yaegi interpreter constraints.
+Two skills are provided:
+
+  1. Extensions — creating Kit extensions with full knowledge of the extension
+     API, lifecycle events, widgets, tools, commands, editor interceptors,
+     tool renderers, and Yaegi interpreter constraints.
+
+  2. SDK — building AI-powered applications with the Kit Go SDK, including
+     providers, agents, tools, and MCP integration.

 Example:
  kit skill`,
@@ -41,8 +43,6 @@ func runSkill(_ *cobra.Command, _ []string) error {
 		"skills",
 		"add",
 		"mark3labs/kit",
-		"--skill",
-		"kit-extensions",
 	}

 	cmd := exec.Command(npx, args...)
@@ -0,0 +1,45 @@
+# SDK Examples
+
+These examples demonstrate how to use the Kit SDK (`pkg/kit`) to build agents programmatically in Go.
+
+## Examples
+
+### [basic](basic/)
+
+Shows core SDK usage: creating a Kit instance, sending prompts, overriding the model, subscribing to events (tool calls, streaming), and session management.
+
+```bash
+go run ./examples/sdk/basic
+```
+
+### [scripting](scripting/)
+
+A minimal script-friendly wrapper that takes a prompt from the command line and prints the response — useful for piping and automation.
+
+```bash
+go run ./examples/sdk/scripting "Explain what this repo does"
+```
+
+### [crypto-monitor](crypto-monitor/)
+
+A background agent that checks Bitcoin and Ethereum prices every 30 minutes and sends desktop notifications via `notify-send` (dbus). Demonstrates using the SDK for a long-running autonomous task with a single tool.
+
+```bash
+go run ./examples/sdk/crypto-monitor
+
+# Override the check interval:
+CRYPTO_INTERVAL=5m go run ./examples/sdk/crypto-monitor
+```
+
+## Getting Started
+
+```go
+import kit "github.com/mark3labs/kit/pkg/kit"
+
+host, err := kit.New(ctx, nil)        // uses ~/.kit.yml defaults
+defer host.Close()
+
+response, err := host.Prompt(ctx, "Hello!")
+```
+
+See the [SDK README](../../pkg/kit/README.md) for the full API reference.
@@ -0,0 +1,85 @@
+package main
+
+import (
+	"context"
+	"fmt"
+	"log"
+	"os"
+	"os/signal"
+	"time"
+
+	kit "github.com/mark3labs/kit/pkg/kit"
+)
+
+const systemPrompt = `You are a cryptocurrency price monitor. Your job is to:
+
+1. Fetch the current prices of Bitcoin and Ethereum using bash with curl
+2. Send a desktop notification with the results using notify-send
+
+To fetch prices, use this CoinGecko API endpoint (no API key needed):
+  curl -s 'https://api.coingecko.com/api/v3/simple/price?ids=bitcoin,ethereum&vs_currencies=usd&include_24hr_change=true'
+
+To send a desktop notification:
+  notify-send -i dialog-information "Crypto Prices" "BTC: $XX,XXX (+X.X%)\nETH: $X,XXX (+X.X%)"
+
+Include the 24h percentage change in the notification. Use a green arrow (▲) for
+positive changes and a red arrow (▼) for negative. Format prices with commas.
+
+If the API call fails, send a notification about the failure instead.
+
+Always complete both steps: fetch then notify. Be concise — no commentary needed.`
+
+func main() {
+	interval := 30 * time.Minute
+	if os.Getenv("CRYPTO_INTERVAL") != "" {
+		d, err := time.ParseDuration(os.Getenv("CRYPTO_INTERVAL"))
+		if err == nil {
+			interval = d
+		}
+	}
+
+	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt)
+	defer cancel()
+
+	host, err := kit.New(ctx, &kit.Options{
+		SystemPrompt: systemPrompt,
+		Tools:        []kit.Tool{kit.NewBashTool()},
+		NoSession:    true,
+		Quiet:        true,
+	})
+	if err != nil {
+		log.Fatalf("Failed to create kit instance: %v", err)
+	}
+	defer func() { _ = host.Close() }()
+
+	fmt.Printf("Crypto price monitor started (every %s)\n", interval)
+	fmt.Println("Press Ctrl+C to stop")
+
+	// Run immediately on startup, then on each tick.
+	check(ctx, host)
+
+	ticker := time.NewTicker(interval)
+	defer ticker.Stop()
+
+	for {
+		select {
+		case <-ticker.C:
+			check(ctx, host)
+		case <-ctx.Done():
+			fmt.Println("\nStopping price monitor")
+			return
+		}
+	}
+}
+
+func check(ctx context.Context, host *kit.Kit) {
+	fmt.Printf("[%s] Checking prices...\n", time.Now().Format("15:04:05"))
+
+	// Clear session so each check is independent.
+	host.ClearSession()
+
+	_, err := host.Prompt(ctx, "Fetch current Bitcoin and Ethereum prices and send a desktop notification.")
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
+	}
+}
@@ -63,6 +63,18 @@ type ToolCallContentHandler func(content string)
 // ReasoningDeltaHandler is a function type for handling streaming reasoning/thinking deltas.
 type ReasoningDeltaHandler func(delta string)

+// ToolOutputHandler is a function type for handling streaming tool output chunks.
+// Used by tools like bash to stream output as it arrives rather than waiting
+// for the command to complete. The isStderr flag indicates if the chunk
+// contains stderr output.
+// Note: This is an alias for core.ToolOutputCallback to avoid import cycles.
+type ToolOutputHandler = core.ToolOutputCallback
+
+// StepUsageHandler is a function type for handling token usage after each
+// complete step in a multi-step agent turn. This enables real-time cost
+// tracking during long-running tool-calling conversations.
+type StepUsageHandler func(inputTokens, outputTokens, cacheReadTokens, cacheCreationTokens int64)
+
 // Agent represents an AI agent with core tool integration using the fantasy library.
 // Core tools (bash, read, write, edit, grep, find, ls) are registered as direct
 // fantasy.AgentTool implementations — no MCP layer, no serialization overhead.
@@ -218,7 +230,7 @@ func (a *Agent) GenerateWithLoop(ctx context.Context, messages []fantasy.Message
 	onResponse ResponseHandler, onToolCallContent ToolCallContentHandler,
 ) (*GenerateWithLoopResult, error) {
 	return a.GenerateWithLoopAndStreaming(ctx, messages, onToolCall, onToolExecution, onToolResult,
-		onResponse, onToolCallContent, nil, nil)
+		onResponse, onToolCallContent, nil, nil, nil, nil)
 }

 // GenerateWithLoopAndStreaming processes messages using the fantasy agent with streaming and callbacks.
@@ -229,8 +241,15 @@ func (a *Agent) GenerateWithLoopAndStreaming(ctx context.Context, messages []fan
 	onResponse ResponseHandler, onToolCallContent ToolCallContentHandler,
 	onStreamingResponse StreamingResponseHandler,
 	onReasoningDelta ReasoningDeltaHandler,
+	onToolOutput ToolOutputHandler,
+	onStepUsage StepUsageHandler,
 ) (*GenerateWithLoopResult, error) {

+	// Inject tool output handler into context for use by core tools (e.g., bash).
+	if onToolOutput != nil {
+		ctx = core.ContextWithToolOutputCallback(ctx, onToolOutput)
+	}
+
 	// Fantasy requires the current user input as Prompt, with prior messages as history.
 	// Extract the last user message text and files as the prompt, and pass everything
 	// before it as Messages. Files (e.g. clipboard images) are passed via the Files
@@ -338,6 +357,11 @@ func (a *Agent) GenerateWithLoopAndStreaming(ctx context.Context, messages []fan
 				if text != "" && len(toolCalls) > 0 && onToolCallContent != nil {
 					onToolCallContent(text)
 				}
+				// Emit step usage for real-time cost tracking
+				if onStepUsage != nil {
+					onStepUsage(step.Usage.InputTokens, step.Usage.OutputTokens,
+						step.Usage.CacheReadTokens, step.Usage.CacheCreationTokens)
+				}
 				return nil
 			},
 		})
@@ -486,11 +486,10 @@ func (a *App) runQueueBatch(items []queueItem) {
 	result, err := a.executeBatch(stepCtx, items, eventFn)
 	if err != nil {
 		if stepCtx.Err() != nil {
-			// Step was cancelled by the user (double-ESC). The SDK's
-			// runTurn has rolled the tree session back to the pre-turn
-			// state, discarding the user message and any tool call/result
-			// pairs from the cancelled turn. Sync the in-memory store
-			// to match the rolled-back tree session.
+			// Step was cancelled by the user (double-ESC). The SDK
+			// preserves the user message and any completed tool
+			// call/result pairs; only the in-progress message or tool
+			// call is discarded. Sync the in-memory store to match.
 			if ts := a.opts.TreeSession; ts != nil {
 				a.store.Replace(ts.GetFantasyMessages())
 			}
@@ -672,6 +671,22 @@ func (a *App) subscribeSDKEvents(sendFn func(tea.Msg)) func() {
 			sendFn(StreamChunkEvent{Content: ev.Chunk})
 		case kit.ReasoningDeltaEvent:
 			sendFn(ReasoningChunkEvent{Delta: ev.Delta})
+		case kit.ToolOutputEvent:
+			sendFn(ToolOutputEvent{
+				ToolCallID: ev.ToolCallID,
+				ToolName:   ev.ToolName,
+				Chunk:      ev.Chunk,
+				IsStderr:   ev.IsStderr,
+			})
+		case kit.StepUsageEvent:
+			if a.opts.UsageTracker != nil {
+				a.opts.UsageTracker.UpdateUsage(
+					int(ev.InputTokens),
+					int(ev.OutputTokens),
+					int(ev.CacheReadTokens),
+					int(ev.CacheWriteTokens),
+				)
+			}
 		}
 	}))

@@ -54,6 +54,19 @@ type ToolResultEvent struct {
 	IsError bool
 }

+// ToolOutputEvent is sent when a tool produces streaming output chunks (e.g., bash output).
+// This allows the TUI to display tool output as it arrives, before the tool completes.
+type ToolOutputEvent struct {
+	// ToolCallID is the stable identifier for the tool call producing output.
+	ToolCallID string
+	// ToolName is the name of the tool producing output.
+	ToolName string
+	// Chunk is a piece of the tool's output text.
+	Chunk string
+	// IsStderr indicates whether this chunk came from stderr.
+	IsStderr bool
+}
+
 // ToolCallContentEvent is sent when a step includes text content alongside tool calls.
 // This allows the TUI to display assistant commentary that accompanies tool usage.
 type ToolCallContentEvent struct {
@@ -157,6 +157,32 @@ type Theme struct {
 	Markdown MarkdownThemeConfig `json:"markdown,omitzero" yaml:"markdown,omitempty"`
 }

+// CustomModelConfig defines a custom model that can be used with custom/custom
+// or other custom/ prefixed models. These models are loaded from the config file
+// and merged into the custom provider in the model registry.
+type CustomModelConfig struct {
+	Name        string      `json:"name" yaml:"name"`
+	Family      string      `json:"family,omitempty" yaml:"family,omitempty"`
+	Attachment  bool        `json:"attachment,omitempty" yaml:"attachment,omitempty"`
+	Reasoning   bool        `json:"reasoning,omitempty" yaml:"reasoning,omitempty"`
+	Temperature bool        `json:"temperature,omitempty" yaml:"temperature,omitempty"`
+	Knowledge   string      `json:"knowledge,omitempty" yaml:"knowledge,omitempty"`
+	Cost        CostConfig  `json:"cost" yaml:"cost"`
+	Limit       LimitConfig `json:"limit" yaml:"limit"`
+}
+
+// CostConfig defines the pricing for a custom model.
+type CostConfig struct {
+	Input  float64 `json:"input" yaml:"input"`
+	Output float64 `json:"output" yaml:"output"`
+}
+
+// LimitConfig defines context and output limits for a custom model.
+type LimitConfig struct {
+	Context int `json:"context" yaml:"context"`
+	Output  int `json:"output" yaml:"output"`
+}
+
 // Config represents the complete application configuration including MCP servers,
 // model settings, UI preferences, and API credentials. It supports both command-line
 // flags and configuration file settings.
@@ -187,6 +213,9 @@ type Config struct {
 	// Prompt templates configuration
 	Prompts           []string `json:"prompts,omitempty" yaml:"prompts,omitempty"`
 	NoPromptTemplates bool     `json:"no-prompt-templates,omitempty" yaml:"no-prompt-templates,omitempty"`
+
+	// Custom model definitions (under custom/ provider)
+	CustomModels map[string]CustomModelConfig `json:"customModels,omitempty" yaml:"customModels,omitempty"`
 }

 // GetTransportType returns the transport type for the server config, mapping
@@ -1,17 +1,41 @@
 package core

 import (
-	"bytes"
+	"bufio"
 	"context"
 	"fmt"
+	"io"
 	"os"
 	"os/exec"
 	"strings"
+	"sync"
 	"time"

 	"charm.land/fantasy"
 )

+// ToolOutputCallback is the signature for streaming tool output.
+// It receives tool call ID, tool name, output chunk, and whether it's stderr.
+type ToolOutputCallback func(toolCallID, toolName, chunk string, isStderr bool)
+
+// contextKey is a custom type for context keys to avoid collisions.
+type contextKey string
+
+const toolOutputCallbackKey contextKey = "toolOutputCallback"
+
+// ContextWithToolOutputCallback returns a new context with the tool output callback set.
+func ContextWithToolOutputCallback(ctx context.Context, callback ToolOutputCallback) context.Context {
+	return context.WithValue(ctx, toolOutputCallbackKey, callback)
+}
+
+// toolOutputCallbackFromContext retrieves the tool output callback from context.
+func toolOutputCallbackFromContext(ctx context.Context) ToolOutputCallback {
+	if cb, ok := ctx.Value(toolOutputCallbackKey).(ToolOutputCallback); ok {
+		return cb
+	}
+	return nil
+}
+
 const defaultBashTimeout = 120 * time.Second
 const maxBashTimeout = 600 * time.Second

@@ -99,32 +123,157 @@ func executeBash(ctx context.Context, call fantasy.ToolCall, workDir string) (fa
 	}
 	cmd.Env = append(os.Environ(), "SHELL="+bashPath)

-	var stdout, stderr bytes.Buffer
-	cmd.Stdout = &stdout
-	cmd.Stderr = &stderr
+	// Get the output callback if present (for streaming support)
+	outputCallback := toolOutputCallbackFromContext(ctx)

-	err = cmd.Run()
+	if outputCallback != nil {
+		// Streaming mode: use pipes to capture output as it arrives
+		return executeBashStreaming(cmdCtx, call, cmd, outputCallback)
+	}
+
+	// Non-streaming mode: collect all output at once (original behavior)
+	return executeBashBuffered(cmdCtx, call, cmd)
+}
+
+// executeBashBuffered collects all output before returning (original behavior).
+// It uses explicit pipes (not cmd.Stdout) so that cmd.WaitDelay can forcibly
+// close them when grandchild processes hold pipe handles open after the
+// direct child exits.
+func executeBashBuffered(cmdCtx context.Context, call fantasy.ToolCall, cmd *exec.Cmd) (fantasy.ToolResponse, error) {
+	stdoutPipe, err := cmd.StdoutPipe()
+	if err != nil {
+		return fantasy.NewTextErrorResponse("failed to create stdout pipe"), nil
+	}
+	stderrPipe, err := cmd.StderrPipe()
+	if err != nil {
+		return fantasy.NewTextErrorResponse("failed to create stderr pipe"), nil
+	}
+
+	if err := cmd.Start(); err != nil {
+		return fantasy.NewTextErrorResponse(fmt.Sprintf("failed to start command: %v", err)), nil
+	}
+
+	// Read pipes concurrently
+	var wg sync.WaitGroup
+	var stdout, stderr strings.Builder
+	var stdoutErr, stderrErr error
+
+	wg.Add(2)
+	go func() {
+		defer wg.Done()
+		_, stdoutErr = io.Copy(&stdout, stdoutPipe)
+	}()
+	go func() {
+		defer wg.Done()
+		_, stderrErr = io.Copy(&stderr, stderrPipe)
+	}()
+
+	// Wait for the process to exit first. cmd.WaitDelay ensures that if
+	// pipes remain open (held by grandchild processes), they'll be forcibly
+	// closed after the grace period, which unblocks the io.Copy goroutines.
+	waitErr := cmd.Wait()
+
+	// Wait for pipe readers to finish draining.
+	wg.Wait()
+
+	// Ignore pipe read errors caused by WaitDelay force-closing —
+	// we still have whatever was read before the close.
+	_ = stdoutErr
+	_ = stderrErr
+
+	exitCode := 0
+	if waitErr != nil {
+		if exitErr, ok := waitErr.(*exec.ExitError); ok {
+			exitCode = exitErr.ExitCode()
+		} else if cmdCtx.Err() == context.DeadlineExceeded {
+			return fantasy.NewTextErrorResponse("command timed out"), nil
+		}
+	}
+
+	return buildBashResponse(stdout.String(), stderr.String(), exitCode)
+}
+
+// executeBashStreaming streams output as it arrives via the callback.
+func executeBashStreaming(cmdCtx context.Context, call fantasy.ToolCall, cmd *exec.Cmd, outputCallback ToolOutputCallback) (fantasy.ToolResponse, error) {
+	stdoutPipe, err := cmd.StdoutPipe()
+	if err != nil {
+		return fantasy.NewTextErrorResponse("failed to create stdout pipe"), nil
+	}
+	stderrPipe, err := cmd.StderrPipe()
+	if err != nil {
+		return fantasy.NewTextErrorResponse("failed to create stderr pipe"), nil
+	}
+
+	// Start command execution
+	if err := cmd.Start(); err != nil {
+		return fantasy.NewTextErrorResponse(fmt.Sprintf("failed to start command: %v", err)), nil
+	}
+
+	// Stream stdout and stderr concurrently
+	var wg sync.WaitGroup
+	var mu sync.Mutex
+	var stdoutChunks, stderrChunks []string
+
+	streamOutput := func(reader io.Reader, isStderr bool) {
+		defer wg.Done()
+		scanner := bufio.NewScanner(reader)
+		// Use larger buffer for long lines
+		buf := make([]byte, 0, 64*1024)
+		scanner.Buffer(buf, 1024*1024)
+
+		for scanner.Scan() {
+			chunk := scanner.Text()
+			// Send chunk to UI
+			outputCallback(call.ID, "bash", chunk, isStderr)
+			// Collect for final result
+			mu.Lock()
+			if isStderr {
+				stderrChunks = append(stderrChunks, chunk)
+			} else {
+				stdoutChunks = append(stdoutChunks, chunk)
+			}
+			mu.Unlock()
+		}
+	}
+
+	wg.Add(2)
+	go streamOutput(stdoutPipe, false)
+	go streamOutput(stderrPipe, true)
+
+	// Wait for the process to exit. cmd.WaitDelay ensures that if pipes
+	// remain open (held by grandchild processes), they'll be forcibly closed
+	// after the grace period, which unblocks the scanners above.
+	err = cmd.Wait()
+
+	// Wait for the pipe readers to finish draining. This will complete
+	// quickly since cmd.Wait() (with WaitDelay) has already ensured
+	// the pipes are closed.
+	wg.Wait()

 	exitCode := 0
 	if err != nil {
 		if exitErr, ok := err.(*exec.ExitError); ok {
 			exitCode = exitErr.ExitCode()
 		} else if cmdCtx.Err() == context.DeadlineExceeded {
-			return fantasy.NewTextErrorResponse(fmt.Sprintf("command timed out after %v", timeout)), nil
+			return fantasy.NewTextErrorResponse("command timed out"), nil
 		}
 	}

-	// Build result
+	return buildBashResponse(strings.Join(stdoutChunks, "\n"), strings.Join(stderrChunks, "\n"), exitCode)
+}
+
+// buildBashResponse constructs the final tool response from stdout/stderr.
+func buildBashResponse(stdout, stderr string, exitCode int) (fantasy.ToolResponse, error) {
 	var result strings.Builder
-	if stdout.Len() > 0 {
-		result.WriteString(stdout.String())
+	if stdout != "" {
+		result.WriteString(stdout)
 	}
-	if stderr.Len() > 0 {
+	if stderr != "" {
 		if result.Len() > 0 {
 			result.WriteString("\n")
 		}
 		result.WriteString("STDERR:\n")
-		result.WriteString(stderr.String())
+		result.WriteString(stderr)
 	}
 	if exitCode != 0 {
 		if result.Len() > 0 {
@@ -0,0 +1,129 @@
+package core
+
+import (
+	"context"
+	"encoding/json"
+	"testing"
+	"time"
+
+	"charm.land/fantasy"
+)
+
+// helper to create a bash tool call with the given command and optional timeout.
+func bashCall(command string, timeout float64) fantasy.ToolCall {
+	args := map[string]any{"command": command}
+	if timeout > 0 {
+		args["timeout"] = timeout
+	}
+	input, _ := json.Marshal(args)
+	return fantasy.ToolCall{
+		ID:    "test-call",
+		Name:  "bash",
+		Input: string(input),
+	}
+}
+
+func TestBash_SimpleCommand(t *testing.T) {
+	resp, err := executeBash(context.Background(), bashCall("echo hello", 0), "")
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+	if resp.IsError {
+		t.Fatalf("expected success, got error: %s", resp.Content)
+	}
+	if resp.Content != "hello\n" {
+		t.Errorf("expected 'hello\\n', got %q", resp.Content)
+	}
+}
+
+func TestBash_TimeoutKillsProcess(t *testing.T) {
+	start := time.Now()
+	resp, err := executeBash(context.Background(), bashCall("sleep 60", 2), "")
+	elapsed := time.Since(start)
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+	if !resp.IsError {
+		t.Fatal("expected error response for timed-out command")
+	}
+	if elapsed > 10*time.Second {
+		t.Errorf("command took %v, expected ~2s timeout", elapsed)
+	}
+}
+
+func TestBash_BackgroundProcessDoesNotHang(t *testing.T) {
+	// This command spawns a background sleep that would hold pipes open
+	// forever if we didn't have process group killing + WaitDelay.
+	start := time.Now()
+	resp, err := executeBash(context.Background(), bashCall("echo done; sleep 3600 &", 5), "")
+	elapsed := time.Since(start)
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+	// The foreground command (echo) should complete quickly
+	if elapsed > 5*time.Second {
+		t.Errorf("command took %v, should complete in <5s (background process should not block)", elapsed)
+	}
+	if resp.IsError {
+		t.Fatalf("expected success, got error: %s", resp.Content)
+	}
+}
+
+func TestBash_BackgroundProcessDoesNotHang_Streaming(t *testing.T) {
+	// Same test but in streaming mode (with output callback).
+	ctx := ContextWithToolOutputCallback(context.Background(), func(_, _, _ string, _ bool) {})
+	start := time.Now()
+	resp, err := executeBash(ctx, bashCall("echo streaming; sleep 3600 &", 5), "")
+	elapsed := time.Since(start)
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+	if elapsed > 5*time.Second {
+		t.Errorf("streaming command took %v, should complete in <5s", elapsed)
+	}
+	if resp.IsError {
+		t.Fatalf("expected success, got error: %s", resp.Content)
+	}
+}
+
+func TestBash_ContextCancellation(t *testing.T) {
+	ctx, cancel := context.WithCancel(context.Background())
+
+	done := make(chan struct{})
+	go func() {
+		defer close(done)
+		_, _ = executeBash(ctx, bashCall("sleep 60", 0), "")
+	}()
+
+	// Cancel after a short delay
+	time.Sleep(500 * time.Millisecond)
+	cancel()
+
+	// Should return promptly after cancellation
+	select {
+	case <-done:
+		// success
+	case <-time.After(5 * time.Second):
+		t.Fatal("executeBash did not return after context cancellation")
+	}
+}
+
+func TestBash_BannedCommand(t *testing.T) {
+	resp, err := executeBash(context.Background(), bashCall("alias foo=bar", 0), "")
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+	if !resp.IsError {
+		t.Fatal("expected error for banned command")
+	}
+}
+
+func TestBash_EmptyCommand(t *testing.T) {
+	resp, err := executeBash(context.Background(), bashCall("", 0), "")
+	if err != nil {
+		t.Fatalf("unexpected error: %v", err)
+	}
+	if !resp.IsError {
+		t.Fatal("expected error for empty command")
+	}
+}
@@ -727,6 +727,7 @@ type API struct {
 	onToolCall                func(func(ToolCallEvent, Context) *ToolCallResult)
 	onToolExecStart           func(func(ToolExecutionStartEvent, Context))
 	onToolExecEnd             func(func(ToolExecutionEndEvent, Context))
+	onToolOutput              func(func(ToolOutputEvent, Context))
 	onToolResult              func(func(ToolResultEvent, Context) *ToolResultResult)
 	onInput                   func(func(InputEvent, Context) *InputResult)
 	onBeforeAgentStart        func(func(BeforeAgentStartEvent, Context) *BeforeAgentStartResult)
@@ -767,6 +768,13 @@ func (a *API) OnToolExecutionEnd(handler func(ToolExecutionEndEvent, Context)) {
 	a.onToolExecEnd(handler)
 }

+// OnToolOutput registers a handler for streaming tool output chunks.
+// This fires for each output line as it arrives from tools like bash,
+// allowing extensions to observe or process output in real-time.
+func (a *API) OnToolOutput(handler func(ToolOutputEvent, Context)) {
+	a.onToolOutput(handler)
+}
+
 // OnToolResult registers a handler that fires after tool execution.
 // Return a non-nil ToolResultResult to modify the output.
 func (a *API) OnToolResult(handler func(ToolResultEvent, Context) *ToolResultResult) {
@@ -1538,6 +1546,19 @@ type ToolExecutionEndEvent struct {

 func (e ToolExecutionEndEvent) Type() EventType { return ToolExecutionEnd }

+// ToolOutputEvent fires when a tool produces streaming output chunks.
+// This is primarily used for long-running tools like bash to show output
+// in real-time as it arrives, before the tool completes.
+type ToolOutputEvent struct {
+	ToolCallID string
+	ToolName   string
+	ToolKind   string
+	Chunk      string // Output text chunk
+	IsStderr   bool   // Whether this chunk came from stderr
+}
+
+func (e ToolOutputEvent) Type() EventType { return ToolOutput }
+
 // ToolResultEvent fires after tool execution with the output.
 type ToolResultEvent struct {
 	ToolCallID string
@@ -19,6 +19,9 @@ const (
 	// ToolExecutionEnd fires when a tool finishes executing.
 	ToolExecutionEnd EventType = "tool_execution_end"

+	// ToolOutput fires when a tool produces streaming output chunks.
+	ToolOutput EventType = "tool_output"
+
 	// ToolResult fires after a tool executes. Handlers can modify the result.
 	ToolResult EventType = "tool_result"

@@ -439,6 +439,12 @@ func loadSingleExtension(path string) (*LoadedExtension, error) {
 				return nil
 			})
 		},
+		onToolOutput: func(h func(ToolOutputEvent, Context)) {
+			reg(ToolOutput, func(e Event, c Context) Result {
+				h(e.(ToolOutputEvent), c)
+				return nil
+			})
+		},
 		onToolResult: func(h func(ToolResultEvent, Context) *ToolResultResult) {
 			reg(ToolResult, func(e Event, c Context) Result {
 				r := h(e.(ToolResultEvent), c)
@@ -128,6 +128,7 @@ func Symbols() interp.Exports {
 			"ToolCallResult":          reflect.ValueOf((*ToolCallResult)(nil)),
 			"ToolExecutionStartEvent": reflect.ValueOf((*ToolExecutionStartEvent)(nil)),
 			"ToolExecutionEndEvent":   reflect.ValueOf((*ToolExecutionEndEvent)(nil)),
+			"ToolOutputEvent":         reflect.ValueOf((*ToolOutputEvent)(nil)),
 			"ToolResultEvent":         reflect.ValueOf((*ToolResultEvent)(nil)),
 			"ToolResultResult":        reflect.ValueOf((*ToolResultResult)(nil)),
 			"InputEvent":              reflect.ValueOf((*InputEvent)(nil)),
@@ -30,6 +30,12 @@ func NewTestAPI(ext *LoadedExtension) API {
 				return nil
 			})
 		},
+		onToolOutput: func(h func(ToolOutputEvent, Context)) {
+			reg(ToolOutput, func(e Event, c Context) Result {
+				h(e.(ToolOutputEvent), c)
+				return nil
+			})
+		},
 		onToolResult: func(h func(ToolResultEvent, Context) *ToolResultResult) {
 			reg(ToolResult, func(e Event, c Context) Result {
 				r := h(e.(ToolResultEvent), c)
@@ -0,0 +1,74 @@
+package models
+
+import (
+	"log"
+
+	"github.com/spf13/viper"
+)
+
+// loadCustomModelsFromConfig loads custom model definitions from the config file
+// and returns them as a map of model ID -> ModelInfo. Returns nil if no custom
+// models are configured.
+func loadCustomModelsFromConfig() map[string]ModelInfo {
+	if !viper.IsSet("customModels") {
+		return nil
+	}
+
+	var customModels map[string]CustomModelConfig
+	if err := viper.UnmarshalKey("customModels", &customModels); err != nil {
+		log.Printf("Warning: Failed to parse customModels: %v", err)
+		return nil
+	}
+
+	result := make(map[string]ModelInfo, len(customModels))
+	for modelID, cfg := range customModels {
+		info := modelConfigToModelInfo(modelID, cfg)
+		result[modelID] = info
+	}
+
+	return result
+}
+
+// modelConfigToModelInfo converts a CustomModelConfig to a ModelInfo.
+func modelConfigToModelInfo(modelID string, cfg CustomModelConfig) ModelInfo {
+	return ModelInfo{
+		ID:          modelID,
+		Name:        cfg.Name,
+		Attachment:  cfg.Attachment,
+		Reasoning:   cfg.Reasoning,
+		Temperature: cfg.Temperature,
+		Cost: Cost{
+			Input:  cfg.Cost.Input,
+			Output: cfg.Cost.Output,
+		},
+		Limit: Limit{
+			Context: cfg.Limit.Context,
+			Output:  cfg.Limit.Output,
+		},
+	}
+}
+
+// CustomModelConfig defines a custom model configuration loaded from the config file.
+// It is duplicated here to avoid a circular dependency on internal/config.
+type CustomModelConfig struct {
+	Name        string      `json:"name" yaml:"name"`
+	Family      string      `json:"family,omitempty" yaml:"family,omitempty"`
+	Attachment  bool        `json:"attachment,omitempty" yaml:"attachment,omitempty"`
+	Reasoning   bool        `json:"reasoning,omitempty" yaml:"reasoning,omitempty"`
+	Temperature bool        `json:"temperature,omitempty" yaml:"temperature,omitempty"`
+	Knowledge   string      `json:"knowledge,omitempty" yaml:"knowledge,omitempty"`
+	Cost        CostConfig  `json:"cost" yaml:"cost"`
+	Limit       LimitConfig `json:"limit" yaml:"limit"`
+}
+
+// CostConfig defines the pricing for a custom model.
+type CostConfig struct {
+	Input  float64 `json:"input" yaml:"input"`
+	Output float64 `json:"output" yaml:"output"`
+}
+
+// LimitConfig defines context and output limits for a custom model.
+type LimitConfig struct {
+	Context int `json:"context" yaml:"context"`
+	Output  int `json:"output" yaml:"output"`
+}
@@ -253,6 +253,8 @@ func CreateProvider(ctx context.Context, config *ProviderConfig) (*ProviderResul
 		return createBedrockProvider(ctx, config, modelName)
 	case "vercel":
 		return createVercelProvider(ctx, config, modelName)
+	case "custom":
+		return createCustomProvider(ctx, config, modelName)
 	default:
 		return autoRouteProvider(ctx, config, provider, modelName, registry)
 	}
@@ -779,6 +781,42 @@ func createVercelProvider(ctx context.Context, config *ProviderConfig, modelName
 	return &ProviderResult{Model: model}, nil
 }

+func createCustomProvider(ctx context.Context, config *ProviderConfig, modelName string) (*ProviderResult, error) {
+	if config.ProviderURL == "" {
+		return nil, fmt.Errorf("custom provider requires --provider-url")
+	}
+
+	apiKey := config.ProviderAPIKey
+	if apiKey == "" {
+		apiKey = os.Getenv("CUSTOM_API_KEY")
+	}
+	if apiKey == "" {
+		// Many local/custom endpoints don't require a key; use a placeholder.
+		apiKey = "custom"
+	}
+
+	var opts []openaicompat.Option
+	opts = append(opts, openaicompat.WithBaseURL(config.ProviderURL))
+	opts = append(opts, openaicompat.WithAPIKey(apiKey))
+	opts = append(opts, openaicompat.WithName("custom"))
+
+	if config.TLSSkipVerify {
+		opts = append(opts, openaicompat.WithHTTPClient(createHTTPClientWithTLSConfig(true)))
+	}
+
+	p, err := openaicompat.New(opts...)
+	if err != nil {
+		return nil, fmt.Errorf("failed to create custom provider: %w", err)
+	}
+
+	model, err := p.LanguageModel(ctx, modelName)
+	if err != nil {
+		return nil, fmt.Errorf("failed to create custom model: %w", err)
+	}
+
+	return &ProviderResult{Model: model}, nil
+}
+
 func createOllamaProvider(ctx context.Context, config *ProviderConfig, modelName string) (*ProviderResult, error) {
 	baseURL := "http://localhost:11434"
 	if host := os.Getenv("OLLAMA_HOST"); host != "" {
@@ -116,6 +116,47 @@ func buildFromModelsDB() map[string]ProviderInfo {
 		}
 	}

+	// Register the "custom" provider stub for --provider-url without --model.
+	// This allows users to point kit at any OpenAI-compatible endpoint without
+	// needing to specify a model from the database.
+	providers["custom"] = ProviderInfo{
+		ID:   "custom",
+		Name: "Custom",
+		Models: map[string]ModelInfo{
+			"custom": {
+				ID:          "custom",
+				Name:        "Custom",
+				Attachment:  false,
+				Reasoning:   true,
+				Temperature: true,
+				Cost: Cost{
+					Input:  0,
+					Output: 0,
+				},
+				Limit: Limit{
+					Context: 262_144,
+					Output:  65_536,
+				},
+			},
+		},
+	}
+
+	// Load custom models from the config file and merge them into the custom
+	// provider. Config entries take precedence: if a model ID is already
+	// registered under "custom", the config version replaces it.
+	if customModels := loadCustomModelsFromConfig(); customModels != nil {
+		for modelID, info := range customModels {
+			// Warn about non-positive limits; the model is still registered.
+			if info.Limit.Context <= 0 {
+				fmt.Fprintf(os.Stderr, "Warning: custom model %q has invalid context limit: %d\n", modelID, info.Limit.Context)
+			}
+			if info.Limit.Output <= 0 {
+				fmt.Fprintf(os.Stderr, "Warning: custom model %q has invalid output limit: %d\n", modelID, info.Limit.Output)
+			}
+			providers["custom"].Models[modelID] = info
+		}
+	}
+
 	return providers
 }

@@ -111,14 +111,25 @@ func formatToolParams(toolArgs string, maxWidth int) string {
 		result.WriteString(primaryVal)
 	}

-	// Collect remaining parameters (skip large values like file content)
+	// Collect remaining parameters, skipping body-content keys (already
+	// rendered in the tool body) and any values that are too large.
+	bodyKeys := map[string]bool{
+		"content":  true,
+		"old_text": true,
+		"new_text": true,
+		"oldText":  true,
+		"newText":  true,
+		"todos":    true,
+	}
 	var remaining []string
 	for key, val := range params {
 		if key == primaryKey {
 			continue
 		}
+		if bodyKeys[key] {
+			continue
+		}
 		valStr := fmt.Sprintf("%v", val)
-		// Skip very large values (e.g., oldString, newString, content, todos)
 		if len(valStr) > 100 {
 			continue
 		}
@@ -7,6 +7,7 @@ import (
 	"os"
 	"os/exec"
 	"strings"
+	"sync"
 	"time"

 	tea "charm.land/bubbletea/v2"
@@ -560,6 +561,16 @@ type AppModel struct {
 	// width and height track the terminal dimensions.
 	width  int
 	height int
+
+	// streamingBashOutput holds the current streaming bash output lines.
+	// Lines are accumulated as they arrive and displayed in the stream region.
+	streamingBashOutput []string
+	// streamingBashStderr holds stderr lines separately (rendered differently).
+	streamingBashStderr []string
+	// streamingBashMaxLines caps how many lines to accumulate to prevent memory issues.
+	streamingBashMaxLines int
+	// streamingMu protects the streaming bash output fields from concurrent access.
+	streamingMu sync.RWMutex
 }

 // --------------------------------------------------------------------------
@@ -670,6 +681,9 @@ func NewAppModel(appCtrl AppController, opts AppModelOptions) *AppModel {
 	m.mcpToolCount = opts.MCPToolCount
 	m.extensionToolCount = opts.ExtensionToolCount

+	// Initialize streaming bash output buffer.
+	m.streamingBashMaxLines = 50 // cap to prevent memory issues
+
 	// Wire up child components now that we have the concrete implementations.
 	m.input = NewInputComponent(width, "Enter your prompt (Type /help for commands, Ctrl+C to quit)", appCtrl)

@@ -1312,12 +1326,35 @@ func (m *AppModel) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
 	case app.ToolResultEvent:
 		// Buffer tool result for scrollback.
 		m.printToolResult(msg)
+		// Clear streaming bash output since tool completed.
+		m.streamingMu.Lock()
+		m.streamingBashOutput = nil
+		m.streamingBashStderr = nil
+		m.streamingMu.Unlock()
 		// Start spinner again while waiting for the next LLM response.
 		if m.stream != nil {
 			_, cmd := m.stream.Update(app.SpinnerEvent{Show: true})
 			cmds = append(cmds, cmd)
 		}

+	case app.ToolOutputEvent:
+		// Accumulate streaming bash output for display.
+		m.streamingMu.Lock()
+		if msg.IsStderr {
+			m.streamingBashStderr = append(m.streamingBashStderr, msg.Chunk)
+			// Cap stderr lines to prevent memory issues.
+			if len(m.streamingBashStderr) > m.streamingBashMaxLines {
+				m.streamingBashStderr = m.streamingBashStderr[len(m.streamingBashStderr)-m.streamingBashMaxLines:]
+			}
+		} else {
+			m.streamingBashOutput = append(m.streamingBashOutput, msg.Chunk)
+			// Cap stdout lines to prevent memory issues.
+			if len(m.streamingBashOutput) > m.streamingBashMaxLines {
+				m.streamingBashOutput = m.streamingBashOutput[len(m.streamingBashOutput)-m.streamingBashMaxLines:]
+			}
+		}
+		m.streamingMu.Unlock()
+
 	case app.ToolCallContentEvent:
 		// In streaming mode this text was already delivered via StreamChunkEvents
 		// and will be flushed before the next tool call. Ignore to avoid
@@ -1670,24 +1707,114 @@ func (m *AppModel) View() tea.View {

 // renderStream returns the stream region content.
 func (m *AppModel) renderStream() string {
-	if m.stream == nil {
+	theme := GetTheme()
+
+	var parts []string
+
+	// Stream component content (LLM streaming text, reasoning, spinner placeholder).
+	if m.stream != nil {
+		if content := m.stream.View().Content; content != "" {
+			parts = append(parts, content)
+		}
+	}
+
+	// Streaming bash output section (if any).
+	bashView := m.renderStreamingBashOutput(theme)
+	if bashView != "" {
+		parts = append(parts, bashView)
+	}
+
+	if len(parts) == 0 {
 		return ""
 	}

 	// Show canceling warning if set.
 	if m.canceling {
-		theme := GetTheme()
 		warning := lipgloss.NewStyle().
 			Foreground(theme.Warning).
 			Bold(true).
 			Render("  ⚠ Press ESC again to cancel")
-		return lipgloss.JoinVertical(lipgloss.Left,
-			m.stream.View().Content,
-			warning,
-		)
+		parts = append(parts, warning)
 	}

-	return m.stream.View().Content
+	return lipgloss.JoinVertical(lipgloss.Left, parts...)
+}
+
+// renderStreamingBashOutput renders accumulated streaming bash output (stdout + stderr)
+// below the LLM streaming text. Returns empty string if no bash output is present.
+// Lines are truncated to the terminal width and capped to maxBashLines to prevent
+// long-running commands from blowing up the TUI layout.
+func (m *AppModel) renderStreamingBashOutput(theme Theme) string {
+	m.streamingMu.RLock()
+	stdoutLines := make([]string, len(m.streamingBashOutput))
+	copy(stdoutLines, m.streamingBashOutput)
+	stderrLines := make([]string, len(m.streamingBashStderr))
+	copy(stderrLines, m.streamingBashStderr)
+	m.streamingMu.RUnlock()
+
+	if len(stdoutLines) == 0 && len(stderrLines) == 0 {
+		return ""
+	}
+
+	const lineIndent = "  "
+	lineWidth := max(m.width-2-len(lineIndent), 20)
+	// Account for PaddingLeft(1) on the output/stderr styles.
+	maxLineChars := lineWidth - 1
+
+	outputStyle := lipgloss.NewStyle().
+		Background(theme.CodeBg).
+		PaddingLeft(1)
+
+	stderrStyle := lipgloss.NewStyle().
+		Foreground(theme.Error).
+		Background(theme.CodeBg).
+		PaddingLeft(1)
+
+	// Cap displayed lines to maxBashLines (show the tail, since streaming
+	// output is most useful at the end). The buffer itself is larger to
+	// preserve context, but we only render the last N lines.
+	totalLines := len(stdoutLines) + len(stderrLines)
+	var hiddenCount int
+	if totalLines > maxBashLines {
+		hiddenCount = totalLines - maxBashLines
+		// Trim from stdout first (older output), then stderr.
+		remaining := maxBashLines
+		if len(stderrLines) >= remaining {
+			stdoutLines = nil
+			stderrLines = stderrLines[len(stderrLines)-remaining:]
+		} else {
+			remaining -= len(stderrLines)
+			if len(stdoutLines) > remaining {
+				stdoutLines = stdoutLines[len(stdoutLines)-remaining:]
+			}
+		}
+	}
+
+	var lines []string
+
+	// Truncation hint at the top.
+	if hiddenCount > 0 {
+		hint := fmt.Sprintf("...(%d more lines above)", hiddenCount)
+		hintContent := outputStyle.Width(lineWidth).
+			Foreground(theme.Muted).Italic(true).Render(hint)
+		lines = append(lines, lineIndent+hintContent)
+	}
+
+	// Render stdout lines.
+	for _, line := range stdoutLines {
+		line = truncateLine(strings.TrimRight(line, "\n"), maxLineChars)
+		styled := outputStyle.Width(lineWidth).Render(line)
+		lines = append(lines, lineIndent+styled)
+	}
+
+	// Render stderr lines with error styling.
+	for _, line := range stderrLines {
+		line = truncateLine(strings.TrimRight(line, "\n"), maxLineChars)
+		styled := stderrStyle.Width(lineWidth).Render(line)
+		lines = append(lines, lineIndent+styled)
+	}
+
+	return strings.Join(lines, "\n")
 }

 // renderStatusBar renders a persistent single-line status bar below the input.
@@ -112,15 +112,16 @@ func newTestAppModel(ctrl AppController) (*AppModel, *stubStreamComponent, *stub
 	stream := &stubStreamComponent{}
 	input := &stubInputComponent{}
 	m := &AppModel{
-		state:       stateInput,
-		appCtrl:     ctrl,
-		stream:      stream,
-		input:       input,
-		renderer:    newMessageRenderer(80, false),
-		compactMode: false,
-		modelName:   "test-model",
-		width:       80,
-		height:      24,
+		state:                 stateInput,
+		appCtrl:               ctrl,
+		stream:                stream,
+		input:                 input,
+		renderer:              newMessageRenderer(80, false),
+		compactMode:           false,
+		modelName:             "test-model",
+		width:                 80,
+		height:                24,
+		streamingBashMaxLines: 50, // Initialize buffer cap like NewAppModel does
 	}
 	return m, stream, input
 }
@@ -602,6 +603,82 @@ func TestToolResult_printsAndStartsSpinner(t *testing.T) {
 	}
 }

+// TestToolOutputEvent_accumulatesBashOutput verifies that ToolOutputEvent
+// accumulates stdout and stderr lines into the streaming bash output buffers.
+func TestToolOutputEvent_accumulatesBashOutput(t *testing.T) {
+	ctrl := &stubAppController{}
+	m, _, _ := newTestAppModel(ctrl)
+	m.state = stateWorking
+
+	// Send stdout chunk.
+	m = sendMsg(m, app.ToolOutputEvent{
+		ToolCallID: "call-1",
+		ToolName:   "bash",
+		Chunk:      "line one\n",
+		IsStderr:   false,
+	})
+
+	if len(m.streamingBashOutput) != 1 || m.streamingBashOutput[0] != "line one\n" {
+		t.Fatalf("expected streamingBashOutput=['line one\\n'], got %v", m.streamingBashOutput)
+	}
+	if len(m.streamingBashStderr) != 0 {
+		t.Fatalf("expected empty streamingBashStderr, got %v", m.streamingBashStderr)
+	}
+
+	// Send another stdout chunk.
+	m = sendMsg(m, app.ToolOutputEvent{
+		ToolCallID: "call-1",
+		ToolName:   "bash",
+		Chunk:      "line two\n",
+		IsStderr:   false,
+	})
+
+	if len(m.streamingBashOutput) != 2 {
+		t.Fatalf("expected 2 stdout lines, got %d", len(m.streamingBashOutput))
+	}
+
+	// Send stderr chunk.
+	m = sendMsg(m, app.ToolOutputEvent{
+		ToolCallID: "call-1",
+		ToolName:   "bash",
+		Chunk:      "error: something failed\n",
+		IsStderr:   true,
+	})
+
+	if len(m.streamingBashStderr) != 1 {
+		t.Fatalf("expected 1 stderr line, got %d", len(m.streamingBashStderr))
+	}
+	if m.streamingBashStderr[0] != "error: something failed\n" {
+		t.Fatalf("expected stderr 'error: something failed\\n', got %q", m.streamingBashStderr[0])
+	}
+}
+
+// TestToolResult_clearsStreamingBashOutput verifies that ToolResultEvent clears
+// the streaming bash output buffers since the final result will be printed.
+func TestToolResult_clearsStreamingBashOutput(t *testing.T) {
+	ctrl := &stubAppController{}
+	m, _, _ := newTestAppModel(ctrl)
+	m.state = stateWorking
+
+	// Accumulate some bash output.
+	m.streamingBashOutput = []string{"output line"}
+	m.streamingBashStderr = []string{"error line"}
+
+	_, _ = m.Update(app.ToolResultEvent{
+		ToolName: "bash",
+		ToolArgs: `{"cmd":"ls"}`,
+		Result:   "output line\nerror line\n",
+		IsError:  false,
+	})
+
+	if len(m.streamingBashOutput) != 0 {
+		t.Fatalf("expected streamingBashOutput cleared, got %v", m.streamingBashOutput)
+	}
+	if len(m.streamingBashStderr) != 0 {
+		t.Fatalf("expected streamingBashStderr cleared, got %v", m.streamingBashStderr)
+	}
+}
+
 // TestStepError_printCmd verifies that StepErrorEvent with a non-nil error
 // produces a non-nil cmd (the tea.Println call for the error message).
 func TestStepError_printCmd(t *testing.T) {
@@ -484,6 +484,7 @@ func (s *StreamComponent) renderReasoningBlock(reasoning string) string {

 	contentStyle := lipgloss.NewStyle().
 		Foreground(theme.Muted).
+		Background(theme.MutedBorder).
 		Italic(true)

 	var parts []string
@@ -495,6 +496,7 @@ func (s *StreamComponent) renderReasoningBlock(reasoning string) string {
 		hidden := len(lines) - maxCollapsedLines
 		hintStyle := lipgloss.NewStyle().
 			Foreground(theme.VeryMuted).
+			Background(theme.MutedBorder).
 			Italic(true)
 		parts = append(parts, hintStyle.Render(fmt.Sprintf("... (%d lines hidden)", hidden)))
 		lines = lines[len(lines)-maxCollapsedLines:]
@@ -517,8 +519,8 @@ func (s *StreamComponent) renderReasoningBlock(reasoning string) string {
 		} else {
 			durationStr = fmt.Sprintf("%.1fs", duration.Seconds())
 		}
-		footer := lipgloss.NewStyle().Foreground(theme.VeryMuted).Render("Thought for ") +
-			lipgloss.NewStyle().Foreground(theme.Info).Render(durationStr)
+		footer := lipgloss.NewStyle().Foreground(theme.VeryMuted).Background(theme.MutedBorder).Render("Thought for ") +
+			lipgloss.NewStyle().Foreground(theme.Info).Background(theme.MutedBorder).Render(durationStr)
 		parts = append(parts, footer)
 	}

@@ -7,11 +7,86 @@ import (
 	"os"
 	"path/filepath"
 	"sort"
+	"strconv"
 	"strings"

 	"gopkg.in/yaml.v3"
 )

+// ---------------------------------------------------------------------------
+// Color derivation helpers
+// ---------------------------------------------------------------------------
+
+// parseHexColor parses a "#RRGGBB" hex string into r, g, b components (0-255).
+func parseHexColor(hex string) (r, g, b int) {
+	hex = strings.TrimPrefix(hex, "#")
+	if len(hex) == 6 {
+		if v, err := strconv.ParseUint(hex[0:2], 16, 8); err == nil {
+			r = int(v)
+		}
+		if v, err := strconv.ParseUint(hex[2:4], 16, 8); err == nil {
+			g = int(v)
+		}
+		if v, err := strconv.ParseUint(hex[4:6], 16, 8); err == nil {
+			b = int(v)
+		}
+	}
+	return
+}
+
+// blendHex linearly interpolates between two hex colors by amount (0.0–1.0).
+func blendHex(base, tint string, amount float64) string {
+	br, bg, bb := parseHexColor(base)
+	tr, tg, tb := parseHexColor(tint)
+	clamp := func(v int) int {
+		if v < 0 {
+			return 0
+		}
+		if v > 255 {
+			return 255
+		}
+		return v
+	}
+	r := clamp(int(float64(br)*(1-amount) + float64(tr)*amount))
+	g := clamp(int(float64(bg)*(1-amount) + float64(tg)*amount))
+	b := clamp(int(float64(bb)*(1-amount) + float64(tb)*amount))
+	return fmt.Sprintf("#%02x%02x%02x", r, g, b)
+}
+
+// deriveDiffBg computes diff / code background colors from the theme's
+// background, success, and error hex pairs. Returns an adaptive color for each
+// diff element. The tint amounts are tuned for subtle differentiation.
+func deriveDiffBg(bgPair, successPair, errorPair [2]string) (diffInsert, diffDelete, diffEqual, diffMissing, codeBg, gutterBg, writeBg color.Color) {
+	derive := func(idx int) (color.Color, color.Color, color.Color, color.Color) {
+		bg := bgPair[idx]
+		// Contrast target: darken for light mode (idx 0), lighten for dark (idx 1).
+		contrast := "#000000"
+		if idx == 1 {
+			contrast = "#ffffff"
+		}
+		ins := blendHex(bg, successPair[idx], 0.13)
+		del := blendHex(bg, errorPair[idx], 0.13)
+		eq := blendHex(bg, contrast, 0.05)
+		miss := blendHex(bg, contrast, 0.03)
+		return AdaptiveColor(ins, ins), AdaptiveColor(del, del), AdaptiveColor(eq, eq), AdaptiveColor(miss, miss)
+	}
+
+	// Pick the correct index based on detected background.
+	idx := 0
+	if isDarkBg {
+		idx = 1
+	}
+	ins, del, eq, miss := derive(idx)
+	diffInsert = ins
+	diffDelete = del
+	diffEqual = eq
+	diffMissing = miss
+	codeBg = eq
+	gutterBg = miss
+	writeBg = ins
+	return
+}
+
 // ThemeEntry is a named, loadable theme — either built-in or discovered from disk.
 type ThemeEntry struct {
 	Name   string // Display name (filename stem or preset name)
@@ -80,14 +155,9 @@ func makeTheme(p presetColors) Theme {
 		Accent:      acOr(p.accent, ac(p.primary)),
 		Highlight:   acOr(p.highlight, def.Highlight),
 	}
-	// Derive diff/code backgrounds from the base background.
-	t.DiffInsertBg = def.DiffInsertBg
-	t.DiffDeleteBg = def.DiffDeleteBg
-	t.DiffEqualBg = def.DiffEqualBg
-	t.DiffMissingBg = def.DiffMissingBg
-	t.CodeBg = def.CodeBg
-	t.GutterBg = def.GutterBg
-	t.WriteBg = def.WriteBg
+	// Derive diff/code backgrounds from the theme's own palette.
+	t.DiffInsertBg, t.DiffDeleteBg, t.DiffEqualBg, t.DiffMissingBg,
+		t.CodeBg, t.GutterBg, t.WriteBg = deriveDiffBg(p.background, p.success, p.error_)
 	// Markdown colors.
 	t.Markdown = MarkdownThemeColors{
 		Text:    t.Text,
@@ -609,6 +679,17 @@ func loadThemeFile(path string) (Theme, error) {

 func fileConfigToTheme(cfg themeFileConfig) Theme {
 	def := DefaultTheme()
+
+	// Resolve the base background/success/error hex pairs for diff derivation.
+	// We need the raw hex strings to feed deriveDiffBg.
+	bgPair := resolveHexPair(cfg.Background, [2]string{"#F0F0F0", "#0D0D0D"})
+	successPair := resolveHexPair(cfg.Success, [2]string{"#998800", "#CCAA00"})
+	errorPair := resolveHexPair(cfg.Error, [2]string{"#CC0000", "#FF3333"})
+
+	// Derive diff backgrounds from the theme's own palette.
+	derivedInsert, derivedDelete, derivedEqual, derivedMissing,
+		derivedCodeBg, derivedGutterBg, derivedWriteBg := deriveDiffBg(bgPair, successPair, errorPair)
+
 	return Theme{
 		Primary:     cfg.Primary.resolve(def.Primary),
 		Secondary:   cfg.Secondary.resolve(def.Secondary),
@@ -627,13 +708,13 @@ func fileConfigToTheme(cfg themeFileConfig) Theme {
 		Accent:      cfg.Accent.resolve(def.Accent),
 		Highlight:   cfg.Highlight.resolve(def.Highlight),

-		DiffInsertBg:  cfg.DiffInsertBg.resolve(def.DiffInsertBg),
-		DiffDeleteBg:  cfg.DiffDeleteBg.resolve(def.DiffDeleteBg),
-		DiffEqualBg:   cfg.DiffEqualBg.resolve(def.DiffEqualBg),
-		DiffMissingBg: cfg.DiffMissingBg.resolve(def.DiffMissingBg),
-		CodeBg:        cfg.CodeBg.resolve(def.CodeBg),
-		GutterBg:      cfg.GutterBg.resolve(def.GutterBg),
-		WriteBg:       cfg.WriteBg.resolve(def.WriteBg),
+		DiffInsertBg:  cfg.DiffInsertBg.resolve(derivedInsert),
+		DiffDeleteBg:  cfg.DiffDeleteBg.resolve(derivedDelete),
+		DiffEqualBg:   cfg.DiffEqualBg.resolve(derivedEqual),
+		DiffMissingBg: cfg.DiffMissingBg.resolve(derivedMissing),
+		CodeBg:        cfg.CodeBg.resolve(derivedCodeBg),
+		GutterBg:      cfg.GutterBg.resolve(derivedGutterBg),
+		WriteBg:       cfg.WriteBg.resolve(derivedWriteBg),

 		Markdown: MarkdownThemeColors{
 			Text:    cfg.Markdown.Text.resolve(def.Markdown.Text),
@@ -651,3 +732,17 @@ func fileConfigToTheme(cfg themeFileConfig) Theme {
 		},
 	}
 }
+
+// resolveHexPair returns the hex pair from an adaptiveColorPair, falling back
+// to defaults when the pair is empty.
+func resolveHexPair(a adaptiveColorPair, fallback [2]string) [2]string {
+	light := a.Light
+	if light == "" {
+		light = fallback[0]
+	}
+	dark := a.Dark
+	if dark == "" {
+		dark = fallback[1]
+	}
+	return [2]string{light, dark}
+}
@@ -0,0 +1,85 @@
+package ui
+
+import (
+	"testing"
+)
+
+func TestParseHexColor(t *testing.T) {
+	tests := []struct {
+		hex     string
+		r, g, b int
+	}{
+		{"#000000", 0, 0, 0},
+		{"#ffffff", 255, 255, 255},
+		{"#1e1e2e", 0x1e, 0x1e, 0x2e},
+		{"#a6e3a1", 0xa6, 0xe3, 0xa1},
+		{"#f38ba8", 0xf3, 0x8b, 0xa8},
+	}
+	for _, tt := range tests {
+		r, g, b := parseHexColor(tt.hex)
+		if r != tt.r || g != tt.g || b != tt.b {
+			t.Errorf("parseHexColor(%q) = (%d,%d,%d), want (%d,%d,%d)",
+				tt.hex, r, g, b, tt.r, tt.g, tt.b)
+		}
+	}
+}
+
+func TestBlendHex(t *testing.T) {
+	// Blending with 0 amount should return the base color.
+	got := blendHex("#1e1e2e", "#a6e3a1", 0.0)
+	if got != "#1e1e2e" {
+		t.Errorf("blendHex with 0.0 = %q, want #1e1e2e", got)
+	}
+
+	// Blending with 1.0 amount should return the tint color.
+	got = blendHex("#1e1e2e", "#a6e3a1", 1.0)
+	if got != "#a6e3a1" {
+		t.Errorf("blendHex with 1.0 = %q, want #a6e3a1", got)
+	}
+
+	// Blending black and white at 0.5 should give mid gray.
+	got = blendHex("#000000", "#ffffff", 0.5)
+	// int(0*0.5 + 255*0.5) = int(127.5) truncates to 127, so #7f7f7f.
+	if got != "#7f7f7f" {
+		t.Errorf("blendHex black/white at 0.5 = %q, want #7f7f7f", got)
+	}
+}
+
+func TestDeriveDiffBgProducesDifferentColorsPerTheme(t *testing.T) {
+	// Catppuccin palette
+	catBg := [2]string{"#eff1f5", "#1e1e2e"}
+	catSuccess := [2]string{"#40a02b", "#a6e3a1"}
+	catError := [2]string{"#d20f39", "#f38ba8"}
+
+	// KITT palette
+	kittBg := [2]string{"#F0F0F0", "#0D0D0D"}
+	kittSuccess := [2]string{"#998800", "#CCAA00"}
+	kittError := [2]string{"#CC0000", "#FF3333"}
+
+	catInsert, catDelete, _, _, _, _, _ := deriveDiffBg(catBg, catSuccess, catError)
+	kittInsert, kittDelete, _, _, _, _, _ := deriveDiffBg(kittBg, kittSuccess, kittError)
+
+	if catInsert == kittInsert {
+		t.Error("catppuccin DiffInsertBg should differ from kitt DiffInsertBg")
+	}
+	if catDelete == kittDelete {
+		t.Error("catppuccin DiffDeleteBg should differ from kitt DiffDeleteBg")
+	}
+}
+
+func TestMakeThemeDerivesUniqueDiffColors(t *testing.T) {
+	themes := builtinThemes()
+	kitt := themes["kitt"]
+	cat := themes["catppuccin"]
+
+	// The catppuccin diff backgrounds should NOT equal the kitt defaults.
+	if cat.DiffInsertBg == kitt.DiffInsertBg {
+		t.Error("catppuccin DiffInsertBg should differ from kitt default")
+	}
+	if cat.DiffDeleteBg == kitt.DiffDeleteBg {
+		t.Error("catppuccin DiffDeleteBg should differ from kitt default")
+	}
+	if cat.DiffEqualBg == kitt.DiffEqualBg {
+		t.Error("catppuccin DiffEqualBg should differ from kitt default")
+	}
+}
@@ -23,6 +23,7 @@ const (
 	maxCodeLines  = 20 // lines for Read / code blocks
 	maxWriteLines = 10 // lines for Write blocks
 	maxBashLines  = 20 // lines for Bash output (matches Read)
+	maxLsLines    = 20 // lines for Ls directory listings
 )

 // renderToolBody dispatches to tool-specific body renderers based on tool name.
@@ -229,7 +230,7 @@ func renderDiffBlock(before, after string, startLine int, width int) string {
 	gutterMissing := lipgloss.NewStyle().Background(theme.DiffMissingBg)

 	contentInsert := lipgloss.NewStyle().Background(theme.DiffInsertBg)
-	contentDelete := lipgloss.NewStyle().Background(theme.DiffDeleteBg).Strikethrough(true)
+	contentDelete := lipgloss.NewStyle().Background(theme.DiffDeleteBg)
 	contentEqual := lipgloss.NewStyle().Foreground(theme.Muted).Background(theme.DiffEqualBg)
 	contentMissing := lipgloss.NewStyle().Background(theme.DiffMissingBg)

@@ -315,6 +316,13 @@ func renderLsBody(toolResult string, width int) string {

 	lines := strings.Split(content, "\n")

+	// Truncate to maxLsLines for display
+	var hiddenCount int
+	if len(lines) > maxLsLines {
+		hiddenCount = len(lines) - maxLsLines
+		lines = lines[:maxLsLines]
+	}
+
 	const indent = "  "
 	codeWidth := max(width-len(indent), 20)

@@ -329,6 +337,13 @@ func renderLsBody(toolResult string, width int) string {
 		result = append(result, indent+styled)
 	}

+	if hiddenCount > 0 {
+		hint := fmt.Sprintf("...(%d more entries)", hiddenCount)
+		hintContent := codeStyle.Width(codeWidth).
+			Foreground(theme.Muted).Italic(true).Render(hint)
+		result = append(result, indent+hintContent)
+	}
+
 	return strings.Join(result, "\n")
 }

@@ -266,3 +266,14 @@ func (ut *UsageTracker) SetWidth(width int) {
 	defer ut.mu.Unlock()
 	ut.width = width
 }
+
+// UpdateModelInfo updates the model information and OAuth status when the model
+// is switched mid-session. This ensures token costs and context limits are
+// calculated correctly for the new model.
+func (ut *UsageTracker) UpdateModelInfo(modelInfo *models.ModelInfo, provider string, isOAuth bool) {
+	ut.mu.Lock()
+	defer ut.mu.Unlock()
+	ut.modelInfo = modelInfo
+	ut.provider = provider
+	ut.isOAuth = isOAuth
+}
@@ -39,6 +39,9 @@ const (
 	EventCompaction EventType = "compaction"
 	// EventReasoningDelta fires for each streaming reasoning/thinking chunk.
 	EventReasoningDelta EventType = "reasoning_delta"
+	// EventToolOutput fires when a tool produces streaming output chunks.
+	EventToolOutput EventType = "tool_output"
+	EventStepUsage  EventType = "step_usage" // fires after each completed step with that step's token usage
 )

 // ---------------------------------------------------------------------------
@@ -143,6 +146,17 @@ type ReasoningDeltaEvent struct {
 // EventType implements Event.
 func (e ReasoningDeltaEvent) EventType() EventType { return EventReasoningDelta }

+// ToolOutputEvent fires when a tool produces streaming output chunks (e.g., bash output).
+type ToolOutputEvent struct {
+	ToolCallID string
+	ToolName   string
+	Chunk      string
+	IsStderr   bool
+}
+
+// EventType implements Event.
+func (e ToolOutputEvent) EventType() EventType { return EventToolOutput }
+
 // MessageEndEvent fires when the assistant message is complete.
 type MessageEndEvent struct {
 	Content string
@@ -236,6 +250,19 @@ type ResponseEvent struct {
 // EventType implements Event.
 func (e ResponseEvent) EventType() EventType { return EventResponse }

+// StepUsageEvent fires after each complete step in a multi-step agent turn,
+// carrying the token usage for that specific step. This enables real-time
+// cost tracking during long-running tool-calling conversations.
+type StepUsageEvent struct {
+	InputTokens      uint64
+	OutputTokens     uint64
+	CacheReadTokens  uint64
+	CacheWriteTokens uint64
+}
+
+// EventType implements Event.
+func (e StepUsageEvent) EventType() EventType { return EventStepUsage }
+
 // CompactionEvent fires after a successful compaction.
 type CompactionEvent struct {
 	Summary         string
@@ -322,6 +349,16 @@ func (m *Kit) OnToolResult(handler func(ToolResultEvent)) func() {
 	})
 }

+// OnToolOutput registers a handler that fires only for ToolOutputEvent
+// (streaming tool output chunks, e.g., from bash). Returns an unsubscribe function.
+func (m *Kit) OnToolOutput(handler func(ToolOutputEvent)) func() {
+	return m.Subscribe(func(e Event) {
+		if to, ok := e.(ToolOutputEvent); ok {
+			handler(to)
+		}
+	})
+}
+
 // OnStreaming registers a handler that fires only for MessageUpdateEvent
 // (streaming text chunks). Returns an unsubscribe function.
 func (m *Kit) OnStreaming(handler func(MessageUpdateEvent)) func() {
@@ -86,6 +86,20 @@ func (m *Kit) bridgeExtensions(runner *extensions.Runner) {
 		})
 	}

+	// Tool output streaming events (observation only).
+	if runner.HasHandlers(extensions.ToolOutput) {
+		m.Subscribe(func(e Event) {
+			if ev, ok := e.(ToolOutputEvent); ok {
+				_, _ = runner.Emit(extensions.ToolOutputEvent{
+					ToolCallID: ev.ToolCallID,
+					ToolName:   ev.ToolName,
+					Chunk:      ev.Chunk,
+					IsStderr:   ev.IsStderr,
+				})
+			}
+		})
+	}
+
 	if runner.HasHandlers(extensions.AgentEnd) {
 		m.Subscribe(func(e Event) {
 			if ev, ok := e.(TurnEndEvent); ok {
@@ -917,8 +917,12 @@ func New(ctx context.Context, opts *Options) (*Kit, error) {
 	setSDKDefaults()

 	// Initialize config (loads config files and env vars).
-	if err := InitConfig(opts.ConfigFile, false); err != nil {
-		return nil, fmt.Errorf("failed to initialize config: %w", err)
+	// Only initialize if not already done (e.g., by CLI's cobra.OnInitialize).
+	// Check if model is already set, which indicates config was loaded.
+	if viper.GetString("model") == "" {
+		if err := InitConfig(opts.ConfigFile, false); err != nil {
+			return nil, fmt.Errorf("failed to initialize config: %w", err)
+		}
 	}

 	// Handle CLI debug mode.
@@ -1478,6 +1482,24 @@ func (m *Kit) generate(ctx context.Context, messages []fantasy.Message) (*agent.
 		func(delta string) {
 			m.events.emit(ReasoningDeltaEvent{Delta: delta})
 		},
+		func(toolCallID, toolName, chunk string, isStderr bool) {
+			// Emit tool output chunk event for streaming bash output
+			m.events.emit(ToolOutputEvent{
+				ToolCallID: toolCallID,
+				ToolName:   toolName,
+				Chunk:      chunk,
+				IsStderr:   isStderr,
+			})
+		},
+		func(inputTokens, outputTokens, cacheReadTokens, cacheCreationTokens int64) {
+			// Emit step usage event for real-time cost tracking
+			m.events.emit(StepUsageEvent{
+				InputTokens:      uint64(inputTokens),
+				OutputTokens:     uint64(outputTokens),
+				CacheReadTokens:  uint64(cacheReadTokens),
+				CacheWriteTokens: uint64(cacheCreationTokens),
+			})
+		},
 	)
 }
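
Review note: the step usage callback converts `int64` counts to `uint64` directly. In Go, converting a negative signed value to an unsigned type wraps around rather than failing, so a provider that ever reported a negative count would produce a huge total. One possible safeguard, shown as a sketch, is to clamp before converting. `clampToUint64` is a hypothetical helper and not part of this diff.

```go
package main

import "fmt"

// clampToUint64 converts a signed token count to uint64, treating negative
// values as zero instead of letting the conversion wrap around.
// Hypothetical helper, not part of the Kit codebase.
func clampToUint64(n int64) uint64 {
	if n < 0 {
		return 0
	}
	return uint64(n)
}

func main() {
	bad := int64(-1)
	fmt.Println(uint64(bad))        // wraps to 18446744073709551615
	fmt.Println(clampToUint64(bad)) // 0
	fmt.Println(clampToUint64(42))  // 42
}
```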

@@ -1537,13 +1559,6 @@ func (m *Kit) runTurn(ctx context.Context, promptLabel string, prompt string, pr
 		}
 	}

-	// Save the leaf position before appending anything so we can roll back
-	// to this point if the turn is cancelled (double-ESC). Rolling back
-	// discards the user message and any tool call / tool result pairs that
-	// were generated, which avoids leaving orphaned tool_use messages
-	// without matching tool_result (APIs require them in pairs).
-	preLeafID := m.treeSession.GetLeafID()
-
 	// Persist pre-generation messages to tree session.
 	for _, msg := range preMessages {
 		_, _ = m.treeSession.AppendFantasyMessage(msg)
@@ -1571,23 +1586,16 @@ func (m *Kit) runTurn(ctx context.Context, promptLabel string, prompt string, pr

 	result, err := m.generate(ctx, messages)
 	if err != nil {
-		if ctx.Err() != nil {
-			// Context was cancelled (e.g. user pressed ESC twice). Roll
-			// the tree session back to the pre-turn leaf so that the
-			// user message, any tool_use messages, and any tool_result
-			// messages from this turn are all discarded. APIs require
-			// tool calls and tool results to appear in matched pairs;
-			// persisting a partial turn would leave orphaned entries
-			// that break subsequent requests.
-			_ = m.treeSession.Branch(preLeafID)
-		} else {
-			// Non-cancellation error (e.g. API failure). Persist any
-			// messages that were generated during this turn (completed
-			// tool call/result pairs) so partial progress is not lost.
-			if result != nil && len(result.ConversationMessages) > sentCount {
-				for _, msg := range result.ConversationMessages[sentCount:] {
-					_, _ = m.treeSession.AppendFantasyMessage(msg)
-				}
+		// Persist any messages from completed steps (tool call/result
+		// pairs) so partial progress is not lost. The agent layer only
+		// includes fully-paired tool_use + tool_result messages in
+		// completedStepMessages, so there are no orphaned entries that
+		// would break subsequent API requests. The user message and any
+		// completed work remain in the session; only the in-progress
+		// (pending) message or tool call is discarded.
+		if result != nil && len(result.ConversationMessages) > sentCount {
+			for _, msg := range result.ConversationMessages[sentCount:] {
+				_, _ = m.treeSession.AppendFantasyMessage(msg)
 			}
 		}
 		m.events.emit(TurnEndEvent{Error: err})
@@ -16,7 +16,10 @@ func TestNew(t *testing.T) {
 	ctx := context.Background()

 	// Test default initialization
-	host, err := kit.New(ctx, nil)
+	opts := &kit.Options{
+		Model: "anthropic/claude-sonnet-4-5-20250929",
+	}
+	host, err := kit.New(ctx, opts)
 	if err != nil {
 		t.Fatalf("Failed to create Kit with defaults: %v", err)
 	}
@@ -0,0 +1,772 @@
+---
+name: kit-sdk
+description: Guide for building Go applications with the Kit SDK. Use when the user asks to create a program, service, script, or application that uses Kit programmatically as a Go library — e.g. embedding LLM interactions, building agents, creating CLI tools powered by Kit, or integrating Kit into backend services. Do NOT use for Kit extensions (use kit-extensions skill instead).
+---
+
+# Kit SDK Development Guide
+
+The Kit SDK (`pkg/kit`) lets you embed Kit's full agent capabilities — LLM interactions, tool execution, session management, streaming, hooks — into any Go application. Unlike extensions (which are interpreted scripts running inside Kit's TUI), SDK programs are standalone compiled Go binaries.
+
+## Installation
+
+```bash
+go get github.com/mark3labs/kit
+```
+
+Import path (alias recommended):
+
+```go
+import kit "github.com/mark3labs/kit/pkg/kit"
+```
+
+## Quick Start
+
+```go
+package main
+
+import (
+    "context"
+    "fmt"
+    "log"
+
+    kit "github.com/mark3labs/kit/pkg/kit"
+)
+
+func main() {
+    ctx := context.Background()
+
+    host, err := kit.New(ctx, nil) // nil = load ~/.kit.yml defaults
+    if err != nil {
+        log.Fatal(err)
+    }
+    defer func() { _ = host.Close() }()
+
+    response, err := host.Prompt(ctx, "What is 2+2?")
+    if err != nil {
+        log.Fatal(err)
+    }
+    fmt.Println(response)
+}
+```
+
+## Core Lifecycle
+
+1. **Create**: `kit.New(ctx, opts)` — loads config, initializes MCP servers, creates LLM provider, sets up agent
+2. **Interact**: `host.Prompt(ctx, msg)` — send messages, agent uses tools as needed
+3. **Close**: `host.Close()` — cleans up MCP connections, model resources, session file handle
+
+Always defer `Close()`:
+
+```go
+defer func() { _ = host.Close() }()
+```
+
+---
+
+## Options Reference
+
+All fields are optional. Zero values use CLI defaults.
+
+```go
+host, err := kit.New(ctx, &kit.Options{
+    // Model
+    Model:        "anthropic/claude-sonnet-4-5-20250929", // "provider/model" format
+    SystemPrompt: "You are a helpful assistant",
+    ConfigFile:   "/path/to/config.yml",                  // default: ~/.kit.yml
+
+    // Behavior
+    MaxSteps:  10,   // 0 = unlimited tool-calling steps
+    Streaming: true, // stream LLM output (default from config)
+    Quiet:     true, // suppress debug output
+    Debug:     true, // enable debug logging
+
+    // Session
+    SessionDir:  "/path/to/project",  // base dir for session discovery (default: cwd)
+    SessionPath: "/path/to/session.jsonl", // open specific session file
+    Continue:    true,                // resume most recent session for SessionDir
+    NoSession:   true,                // ephemeral in-memory session, no disk persistence
+
+    // Tools
+    Tools:      []kit.Tool{kit.NewBashTool()}, // REPLACES entire default tool set
+    ExtraTools: []kit.Tool{myTool},            // ADDS alongside core/MCP/extension tools
+
+    // Skills
+    Skills:    []string{"/path/to/skill.md"}, // explicit skill files (empty = auto-discover)
+    SkillsDir: "/path/to/skills",             // override project-local skills dir
+
+    // Compaction
+    AutoCompact:       true,                        // auto-compact near context limit
+    CompactionOptions: &kit.CompactionOptions{...}, // nil = defaults
+})
+```
+
+**Critical distinction**: `Tools` replaces ALL default tools (core + MCP + extension). `ExtraTools` adds tools alongside the defaults. Use `Tools` to restrict the agent's capabilities; use `ExtraTools` to extend them.
+
+---
+
+## Prompt Methods
+
+### Simple prompt — string in, string out
+
+```go
+response, err := host.Prompt(ctx, "Explain this code")
+```
+
+### Full result with usage stats
+
+```go
+result, err := host.PromptResult(ctx, "Analyze this file")
+// result.Response     — assistant's text
+// result.StopReason   — "stop", "length", "tool-calls", "error", etc.
+// result.SessionID    — session UUID
+// result.TotalUsage   — aggregate tokens across all steps (*kit.FantasyUsage)
+// result.FinalUsage   — tokens from last API call only
+// result.Messages     — full updated conversation ([]kit.FantasyMessage)
+```
+
+### Multimodal with file attachments
+
+```go
+import "charm.land/fantasy"
+
+files := []fantasy.FilePart{{
+    Name:      "screenshot.png",
+    MediaType: "image/png",
+    Data:      imageBytes,
+}}
+result, err := host.PromptResultWithFiles(ctx, "What's in this image?", files)
+```
+
+### Per-call system message injection
+
+```go
+response, err := host.PromptWithOptions(ctx, "Review this PR", kit.PromptOptions{
+    SystemMessage: "Focus on security vulnerabilities only.",
+})
+```
+
+### System-level steering (no visible user message)
+
+```go
+response, err := host.Steer(ctx, "Switch to a more formal tone")
+```
+
+### Continue without new input
+
+```go
+response, err := host.FollowUp(ctx, "") // empty = "Continue."
+```
+
+### Multiple user messages in one turn
+
+```go
+result, err := host.PromptResultWithMessages(ctx, []string{
+    "Here is the code:",
+    "@file.go", // content from earlier
+    "Please review it.",
+})
+```
+
+### Legacy inline callbacks (deprecated — use event subscribers instead)
+
+```go
+response, err := host.PromptWithCallbacks(ctx, "List files",
+    func(name, args string) { fmt.Printf("Tool: %s\n", name) },
+    func(name, args, result string, isError bool) { /* tool result */ },
+    func(chunk string) { fmt.Print(chunk) }, // streaming
+)
+```
+
+---
+
+## Event System
+
+Events are read-only observations of the agent lifecycle. Register subscribers before calling `Prompt` so that no events from the turn are missed.
+
+### Typed convenience subscribers
+
+```go
+// Each returns an unsubscribe function.
+unsub := host.OnToolCall(func(e kit.ToolCallEvent) {
+    // e.ToolCallID, e.ToolName, e.ToolKind, e.ToolArgs, e.ParsedArgs
+})
+defer unsub()
+
+host.OnToolResult(func(e kit.ToolResultEvent) {
+    // e.ToolCallID, e.ToolName, e.ToolKind, e.ToolArgs, e.ParsedArgs
+    // e.Result, e.IsError, e.Metadata (*ToolResultMetadata)
+})
+
+host.OnToolOutput(func(e kit.ToolOutputEvent) {
+    // e.ToolCallID, e.ToolName, e.Chunk, e.IsStderr
+    // Streaming bash output chunks
+})
+
+host.OnStreaming(func(e kit.MessageUpdateEvent) {
+    fmt.Print(e.Chunk) // real-time text streaming
+})
+
+host.OnResponse(func(e kit.ResponseEvent) {
+    // e.Content — final response text
+})
+
+host.OnTurnStart(func(e kit.TurnStartEvent) {
+    // e.Prompt
+})
+
+host.OnTurnEnd(func(e kit.TurnEndEvent) {
+    // e.Response, e.Error, e.StopReason
+})
+```
+
+### Generic subscriber (receives all events)
+
+```go
+unsub := host.Subscribe(func(e kit.Event) {
+    switch ev := e.(type) {
+    case kit.ToolCallEvent:
+        // ...
+    case kit.MessageUpdateEvent:
+        // ...
+    case kit.CompactionEvent:
+        // ev.Summary, ev.OriginalTokens, ev.CompactedTokens
+    }
+})
+```
+
+### All event types
+
+| Event Type | Struct | Key Fields |
+|------------|--------|------------|
+| `turn_start` | `TurnStartEvent` | `Prompt` |
+| `turn_end` | `TurnEndEvent` | `Response`, `Error`, `StopReason` |
+| `message_start` | `MessageStartEvent` | *(none)* |
+| `message_update` | `MessageUpdateEvent` | `Chunk` |
+| `message_end` | `MessageEndEvent` | `Content` |
+| `tool_call` | `ToolCallEvent` | `ToolCallID`, `ToolName`, `ToolKind`, `ToolArgs`, `ParsedArgs` |
+| `tool_execution_start` | `ToolExecutionStartEvent` | `ToolCallID`, `ToolName`, `ToolKind`, `ToolArgs` |
+| `tool_execution_end` | `ToolExecutionEndEvent` | `ToolCallID`, `ToolName`, `ToolKind` |
+| `tool_result` | `ToolResultEvent` | `ToolCallID`, `ToolName`, `ToolKind`, `ToolArgs`, `ParsedArgs`, `Result`, `IsError`, `Metadata` |
+| `tool_call_content` | `ToolCallContentEvent` | `Content` |
+| `tool_output` | `ToolOutputEvent` | `ToolCallID`, `ToolName`, `Chunk`, `IsStderr` |
+| `response` | `ResponseEvent` | `Content` |
+| `compaction` | `CompactionEvent` | `Summary`, `OriginalTokens`, `CompactedTokens`, `MessagesRemoved`, `ReadFiles`, `ModifiedFiles` |
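+| `reasoning_delta` | `ReasoningDeltaEvent` | `Delta` |
+| `step_usage` | `StepUsageEvent` | `InputTokens`, `OutputTokens`, `CacheReadTokens`, `CacheWriteTokens` |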
+
+### Tool kind constants
+
+Tools are classified by kind for UI rendering:
+
+- `ToolKindExecute` = `"execute"` — bash
+- `ToolKindEdit` = `"edit"` — edit, write
+- `ToolKindRead` = `"read"` — read, ls
+- `ToolKindSearch` = `"search"` — grep, find
+- `ToolKindSubagent` = `"agent"` — spawn_subagent
+
+---
+
+## Hook System (Interceptors)
+
+Hooks can **modify or cancel** operations. Events are read-only; hooks are read-write.
+
+### BeforeToolCall — block tool execution
+
+```go
+unsub := host.OnBeforeToolCall(kit.HookPriorityNormal, func(h kit.BeforeToolCallHook) *kit.BeforeToolCallResult {
+    // h.ToolCallID, h.ToolName, h.ToolArgs
+    if h.ToolName == "bash" {
+        return &kit.BeforeToolCallResult{Block: true, Reason: "bash disabled"}
+    }
+    return nil // allow
+})
+```
+
+### AfterToolResult — modify tool output
+
+```go
+host.OnAfterToolResult(kit.HookPriorityNormal, func(h kit.AfterToolResultHook) *kit.AfterToolResultResult {
+    // h.ToolCallID, h.ToolName, h.ToolArgs, h.Result, h.IsError
+    if h.ToolName == "read" {
+        filtered := redactSecrets(h.Result)
+        return &kit.AfterToolResultResult{Result: &filtered}
+    }
+    return nil
+})
+```
+
+### BeforeTurn — modify prompt, inject messages
+
+```go
+host.OnBeforeTurn(kit.HookPriorityNormal, func(h kit.BeforeTurnHook) *kit.BeforeTurnResult {
+    // h.Prompt
+    newPrompt := h.Prompt + "\nAlways respond in JSON."
+    return &kit.BeforeTurnResult{Prompt: &newPrompt}
+    // Also available: SystemPrompt *string, InjectText *string
+})
+```
+
+### AfterTurn — observation only
+
+```go
+host.OnAfterTurn(kit.HookPriorityNormal, func(h kit.AfterTurnHook) {
+    // h.Response, h.Error
+    log.Printf("Turn completed: %d chars", len(h.Response))
+})
+```
+
+### ContextPrepare — filter/inject context window
+
+```go
+host.OnContextPrepare(kit.HookPriorityNormal, func(h kit.ContextPrepareHook) *kit.ContextPrepareResult {
+    // h.Messages — []fantasy.Message (the full context being sent to the LLM)
+    // Return nil to pass through, or replace entire context:
+    return &kit.ContextPrepareResult{Messages: filteredMessages}
+})
+```
+
+### BeforeCompact — cancel or customize compaction
+
+```go
+host.OnBeforeCompact(kit.HookPriorityNormal, func(h kit.BeforeCompactHook) *kit.BeforeCompactResult {
+    // h.EstimatedTokens, h.ContextLimit, h.UsagePercent, h.MessageCount, h.IsAutomatic
+    if h.IsAutomatic && h.UsagePercent < 0.9 {
+        return &kit.BeforeCompactResult{Cancel: true, Reason: "not yet"}
+    }
+    return nil
+})
+```
+
+### Hook priorities
+
+```go
+kit.HookPriorityHigh   = 0   // runs first
+kit.HookPriorityNormal = 50  // default
+kit.HookPriorityLow    = 100 // runs last
+```
+
+Lower values run first. Within the same priority, registration order applies. First non-nil result wins.
+
+---
+
+## Tools
+
+### Built-in tool constructors
+
+```go
+kit.NewReadTool(opts...)  // file reading
+kit.NewWriteTool(opts...) // file writing
+kit.NewEditTool(opts...)  // surgical text editing
+kit.NewBashTool(opts...)  // bash command execution
+kit.NewGrepTool(opts...) // content search (uses ripgrep when available)
+kit.NewFindTool(opts...) // file search (uses fd when available)
+kit.NewLsTool(opts...)   // directory listing
+```
+
+### Tool bundles
+
+```go
+kit.AllTools(opts...)       // all 7 core tools
+kit.CodingTools(opts...)    // bash, read, write, edit
+kit.ReadOnlyTools(opts...)  // read, grep, find, ls
+kit.SubagentTools(opts...)  // all except spawn_subagent (prevents recursion)
+```
+
+### Tool options
+
+```go
+kit.WithWorkDir("/path/to/dir") // override working directory for file-based tools
+```
+
+### Using tools in Options
+
+```go
+// Restricted: agent can ONLY run bash
+host, _ := kit.New(ctx, &kit.Options{
+    Tools: []kit.Tool{kit.NewBashTool()},
+})
+
+// Extended: all defaults PLUS a custom tool
+host, _ := kit.New(ctx, &kit.Options{
+    ExtraTools: []kit.Tool{myCustomTool},
+})
+```
+
+### Querying tools at runtime
+
+```go
+names := host.GetToolNames()       // []string of all tool names
+tools := host.GetTools()           // []kit.Tool (full tool objects)
+mcpCount := host.GetMCPToolCount() // tools from MCP servers
+extCount := host.GetExtensionToolCount() // tools from extensions
+```
+
+---
+
+## Session Management
+
+Sessions automatically persist as JSONL tree files. No explicit save needed.
+
+### Session modes (via Options)
+
+| Mode | Options | Behavior |
+|------|---------|----------|
+| Default | `{}` | New session file for cwd |
+| Specific file | `{SessionPath: "path.jsonl"}` | Open existing session |
+| Continue | `{Continue: true}` | Resume most recent session for cwd |
+| Ephemeral | `{NoSession: true}` | In-memory only, no disk persistence |
+| Custom dir | `{SessionDir: "/path"}` | Base directory for session discovery |
+
+### Instance methods
+
+```go
+host.GetSessionPath() // file path of active session
+host.GetSessionID()   // UUID of active session
+host.ClearSession()   // reset to fresh branch (doesn't delete file)
+host.Branch("entry-id") // branch from a specific entry
+host.SetSessionName("my session") // set display name
+
+// Get conversation messages
+msgs := host.GetSessionMessages()       // []extensions.SessionMessage (flattened text)
+msgs := host.GetStructuredMessages()     // []kit.StructuredMessage (typed content parts)
+```
+
+### Package-level session operations (no Kit instance needed)
+
+```go
+sessions, _ := kit.ListSessions("/path/to/project") // sessions for a directory
+sessions, _ := kit.ListAllSessions()                  // all sessions everywhere
+kit.DeleteSession("/path/to/session.jsonl")
+tm, _ := kit.OpenTreeSession("/path/to/session.jsonl") // open for direct access
+```
+
+---
+
+## Model Management
+
+### At creation time
+
+```go
+host, _ := kit.New(ctx, &kit.Options{
+    Model: "openai/gpt-4o",
+})
+```
+
+### At runtime
+
+```go
+err := host.SetModel(ctx, "anthropic/claude-sonnet-4-5-20250929")
+modelStr := host.GetModelString()   // "provider/model"
+info := host.GetModelInfo()          // *kit.ModelInfo (capabilities, limits, pricing) or nil
+isReasoning := host.IsReasoningModel()
+level := host.GetThinkingLevel()
+err = host.SetThinkingLevel(ctx, "medium") // recreates agent with new thinking budget
+```
+
+### Model registry
+
+```go
+models := host.GetAvailableModels()      // []extensions.ModelInfoEntry
+providers := kit.GetSupportedProviders() // []string
+providers := kit.GetFantasyProviders()   // providers usable with fantasy
+models, _ := kit.GetModelsForProvider("anthropic") // map[string]kit.ModelInfo
+info := kit.LookupModel("anthropic", "claude-sonnet-4-5-20250929") // *kit.ModelInfo
+info := kit.GetProviderInfo("openai")    // *kit.ProviderInfo (env vars, API URL)
+err := kit.ValidateEnvironment("anthropic", "") // check API keys
+suggestions := kit.SuggestModels("anthropic", "claudee") // fuzzy match
+kit.RefreshModelRegistry() // reload model database
+```
+
+### Model string format
+
+Always `"provider/model"`: `"anthropic/claude-sonnet-4-5-20250929"`, `"openai/gpt-4o"`, `"ollama/qwen3:8b"`.
+
+```go
+provider, modelID, err := kit.ParseModelString("anthropic/claude-sonnet-4-5-20250929")
+```
+
+---
+
+## Context & Compaction
+
+```go
+tokens := host.EstimateContextTokens()  // heuristic token count
+shouldCompact := host.ShouldCompact()    // true if near context limit
+
+stats := host.GetContextStats()
+// stats.EstimatedTokens — uses API-reported count when available (more accurate)
+// stats.ContextLimit    — model's context window size
+// stats.UsagePercent    — fraction used (0.0–1.0)
+// stats.MessageCount    — number of messages
+
+// Manual compaction
+result, err := host.Compact(ctx, nil, "") // nil opts = defaults, "" = default prompt
+// result.Summary, result.OriginalTokens, result.CompactedTokens, result.MessagesRemoved
+
+// Auto-compaction via Options
+host, _ := kit.New(ctx, &kit.Options{
+    AutoCompact: true,
+    CompactionOptions: &kit.CompactionOptions{
+        ReserveTokens:   16384,
+        KeepRecentTokens: 4096,
+        ContextWindow:   200000,
+    },
+})
+```
+
+---
+
+## In-Process Subagents
+
+Spawn child Kit instances without subprocess overhead:
+
+```go
+result, err := host.Subagent(ctx, kit.SubagentConfig{
+    Prompt:       "Analyze the test files and summarize coverage",
+    Model:        "anthropic/claude-3-5-haiku-20241022", // empty = parent's model
+    SystemPrompt: "You are a test analysis expert.",
+    Tools:        nil,           // nil = SubagentTools() (all except spawn_subagent)
+    NoSession:    true,          // ephemeral
+    Timeout:      2 * time.Minute, // 0 = 5 minute default
+    OnEvent: func(e kit.Event) {
+        // Real-time events from the child agent
+        if chunk, ok := e.(kit.MessageUpdateEvent); ok {
+            fmt.Print(chunk.Chunk)
+        }
+    },
+})
+// result.Response, result.Error, result.SessionID, result.StopReason
+// result.Usage (*kit.FantasyUsage), result.Elapsed (time.Duration)
+```
+
+### Subscribing to subagent events from parent
+
+```go
+host.OnToolCall(func(e kit.ToolCallEvent) {
+    if e.ToolName == "spawn_subagent" {
+        host.SubscribeSubagent(e.ToolCallID, func(child kit.Event) {
+            // Real-time events scoped to this subagent
+        })
+    }
+})
+```
+
+---
+
+## Authentication
+
+```go
+cm, _ := kit.NewCredentialManager()
+hasKey := kit.HasAnthropicCredentials()
+apiKey := kit.GetAnthropicAPIKey() // stored credentials, falling back to the ANTHROPIC_API_KEY env var
+```
+
+---
+
+## Skills
+
+```go
+// Load a single skill file
+skill, _ := kit.LoadSkill("/path/to/SKILL.md")
+// skill.Name, skill.Description, skill.Content, skill.Path
+
+// Load from directory
+skills, _ := kit.LoadSkillsFromDir("/path/to/skills")
+
+// Auto-discover (global + project-local)
+skills, _ := kit.LoadSkills("/path/to/project")
+
+// Prompt building with skills
+pb := kit.NewPromptBuilder("You are an assistant")
+pb.WithSkills(skills)
+pb.WithSection("", "Extra context here")
+systemPrompt := pb.Build()
+```
+
+---
+
+## Re-exported Types
+
+The SDK re-exports internal types so you don't need direct internal imports:
+
+```go
+// Message types
+kit.Message, kit.MessageRole, kit.ContentPart
+kit.TextContent, kit.ReasoningContent, kit.ToolCall, kit.ToolResult, kit.Finish
+kit.RoleUser, kit.RoleAssistant, kit.RoleTool, kit.RoleSystem
+
+// Session types
+kit.SessionInfo, kit.TreeManager, kit.SessionHeader, kit.MessageEntry
+
+// Config types
+kit.Config, kit.MCPServerConfig
+
+// Provider types
+kit.ProviderConfig, kit.ProviderResult, kit.ModelInfo, kit.ModelCost, kit.ModelLimit
+
+// Fantasy types (from charm.land/fantasy)
+kit.FantasyMessage, kit.FantasyUsage, kit.FantasyResponse
+
+// Compaction types
+kit.CompactionResult, kit.CompactionOptions
+
+// Conversion helpers
+msgs := kit.ConvertToFantasyMessages(&msg)   // SDK message → fantasy messages
+msg := kit.ConvertFromFantasyMessage(fMsg)    // fantasy message → SDK message
+```
+
+---
+
+## Common Patterns
+
+### Pattern: Scripting / CLI pipe
+
+Minimal program for automation — stdout-only output:
+
+```go
+host, _ := kit.New(ctx, &kit.Options{Quiet: true})
+defer func() { _ = host.Close() }()
+
+response, _ := host.Prompt(ctx, os.Args[1])
+fmt.Println(response)
+```
+
+### Pattern: Long-running autonomous agent
+
+Daemon that performs repeated independent tasks:
+
+```go
+host, _ := kit.New(ctx, &kit.Options{
+    SystemPrompt: taskPrompt,
+    Tools:        []kit.Tool{kit.NewBashTool()},
+    NoSession:    true,
+    Quiet:        true,
+})
+defer func() { _ = host.Close() }()
+
+ticker := time.NewTicker(30 * time.Minute)
+for {
+    select {
+    case <-ticker.C:
+        host.ClearSession() // fresh context each iteration
+        host.Prompt(ctx, "Perform the monitoring task")
+    case <-ctx.Done():
+        return
+    }
+}
+```
+
+### Pattern: Streaming output to terminal
+
+```go
+host.OnStreaming(func(e kit.MessageUpdateEvent) {
+    fmt.Print(e.Chunk)
+})
+response, _ := host.Prompt(ctx, "Write a poem")
+```
+
+### Pattern: Multi-turn conversation with memory
+
+```go
+host.Prompt(ctx, "My name is Alice")
+response, _ := host.Prompt(ctx, "What's my name?")
+// Session automatically maintains context across calls
+fmt.Printf("Session: %s\n", host.GetSessionPath())
+```
+
+### Pattern: Tool execution monitoring
+
+```go
+host.OnToolCall(func(e kit.ToolCallEvent) {
+    fmt.Printf("[%s] %s(%s)\n", e.ToolKind, e.ToolName, e.ToolArgs)
+})
+host.OnToolResult(func(e kit.ToolResultEvent) {
+    status := "✓"
+    if e.IsError { status = "✗" }
+    fmt.Printf("[%s] %s %s\n", e.ToolKind, status, e.ToolName)
+})
+```
+
+### Pattern: Guard rails with hooks
+
+```go
+// Block dangerous commands
+host.OnBeforeToolCall(kit.HookPriorityHigh, func(h kit.BeforeToolCallHook) *kit.BeforeToolCallResult {
+    if h.ToolName == "bash" && strings.Contains(h.ToolArgs, "rm -rf") {
+        return &kit.BeforeToolCallResult{Block: true, Reason: "dangerous command"}
+    }
+    return nil
+})
+
+// Inject context before every turn
+host.OnBeforeTurn(kit.HookPriorityNormal, func(h kit.BeforeTurnHook) *kit.BeforeTurnResult {
+    extra := "Current user: admin\nEnvironment: production" // avoid shadowing the context package
+    return &kit.BeforeTurnResult{InjectText: &extra}
+})
+```
+
+### Pattern: Parallel subagents
+
+```go
+var wg sync.WaitGroup
+results := make([]*kit.SubagentResult, 3)
+
+tasks := []string{"Analyze auth module", "Analyze database layer", "Analyze API routes"}
+for i, task := range tasks {
+    wg.Add(1)
+    go func(idx int, t string) {
+        defer wg.Done()
+        results[idx], _ = host.Subagent(ctx, kit.SubagentConfig{
+            Prompt:    t,
+            NoSession: true,
+            Timeout:   3 * time.Minute,
+        })
+    }(i, task)
+}
+wg.Wait()
+```
+
+### Pattern: Read-only analysis agent
+
+```go
+host, _ := kit.New(ctx, &kit.Options{
+    SystemPrompt: "You are a code reviewer. Only read and analyze, never modify files.",
+    Tools:        kit.ReadOnlyTools(),
+})
+```
+
+---
+
+## Configuration
+
+The SDK loads config identically to the CLI:
+
+1. Explicit `ConfigFile` in Options (highest priority)
+2. `.kit.yml` in current directory
+3. `~/.kit.yml` in home directory
+4. Environment variables with `KIT_` prefix (`KIT_MODEL`, etc.)
+5. Provider-specific env vars (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.)
+
+Config files support `${ENV_VAR}` expansion.
+
+```go
+// Initialize config manually (usually not needed — kit.New handles this)
+kit.InitConfig("/path/to/config.yml", false)
+kit.LoadConfigWithEnvSubstitution("/path/to/config.yml")
+```
+
+---
+
+## Key Files for Reference
+
+- [`pkg/kit/kit.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/kit.go) — Kit struct, New(), Prompt methods, Subagent, Close
+- [`pkg/kit/types.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/types.go) — Re-exported types from internal packages
+- [`pkg/kit/tools.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/tools.go) — Tool constructors and bundles
+- [`pkg/kit/events.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/events.go) — Event types, EventBus, typed subscribers
+- [`pkg/kit/hooks.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/hooks.go) — Hook system (BeforeToolCall, AfterToolResult, etc.)
+- [`pkg/kit/sessions.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/sessions.go) — Session management
+- [`pkg/kit/compaction.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/compaction.go) — Context compaction
+- [`pkg/kit/models.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/models.go) — Model registry lookups
+- [`pkg/kit/config.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/config.go) — Config initialization and defaults
+- [`pkg/kit/skills.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/skills.go) — Skills loading and prompt building
+- [`pkg/kit/auth.go`](https://github.com/mark3labs/kit/blob/main/pkg/kit/auth.go) — Credential management
+- [`examples/sdk/`](https://github.com/mark3labs/kit/tree/main/examples/sdk) — Working example programs
@@ -7,7 +7,7 @@ description: All extension capabilities — lifecycle events, tools, commands, w

 ## Lifecycle events

-Extensions can hook into 18 lifecycle events:
+Extensions can hook into 20 lifecycle events:

 | Event | Description |
 |-------|-------------|
@@ -18,6 +18,7 @@ Extensions can hook into 18 lifecycle events:
 | `OnAgentEnd` | Agent loop completed |
 | `OnToolCall` | Tool call requested by the model |
 | `OnToolExecutionStart` | Tool execution beginning |
+| `OnToolOutput` | Streaming tool output chunk (for long-running tools) |
 | `OnToolExecutionEnd` | Tool execution completed |
 | `OnToolResult` | Tool result returned |
 | `OnInput` | User input received |
@@ -29,6 +30,7 @@ Extensions can hook into 18 lifecycle events:
 | `OnBeforeFork` | Before forking a conversation branch |
 | `OnBeforeSessionSwitch` | Before switching sessions |
 | `OnBeforeCompact` | Before conversation compaction |
+| `OnCustomEvent` | Custom inter-extension event received |

 ### Example

@@ -51,6 +51,12 @@ Kit ships with a rich set of example extensions in the `examples/extensions/` di
 | [`summarize.go`](https://github.com/mark3labs/kit/blob/master/examples/extensions/summarize.go) | Conversation summarization |
 | [`lsp-diagnostics.go`](https://github.com/mark3labs/kit/blob/master/examples/extensions/lsp-diagnostics.go) | LSP diagnostic integration |

+## Themes
+
+| Extension | Description |
+|-----------|-------------|
+| [`neon-theme.go`](https://github.com/mark3labs/kit/blob/master/examples/extensions/neon-theme.go) | Custom theme registration and switching |
+
 ## Multi-agent

 | Extension | Description |
@@ -74,3 +80,7 @@ Kit ships with a rich set of example extensions in the `examples/extensions/` di
 | [`kit-kit-agents/`](https://github.com/mark3labs/kit/tree/master/examples/extensions/kit-kit-agents) | Multi-agent orchestration example |
 | [`kit-telegram/`](https://github.com/mark3labs/kit/tree/master/examples/extensions/kit-telegram) | Telegram bot integration |
 | [`status-tools/`](https://github.com/mark3labs/kit/tree/master/examples/extensions/status-tools) | Status bar tool examples |
+
+## Project-local example
+
+The Kit repository also includes a project-local extension at `.kit/extensions/go-edit-lint.go` that runs `gopls` and `golangci-lint` on Go file edits. It shows how to create project-specific extensions by placing them in the `.kit/extensions/` directory.
@@ -20,6 +20,7 @@ Kit supports a wide range of LLM providers through a unified `provider/model` st
 | **Google Vertex** | `google-vertex-anthropic/` | Claude on Vertex AI |
 | **OpenRouter** | `openrouter/` | Multi-provider router |
 | **Vercel AI** | `vercel/` | Vercel AI SDK models |
+| **Custom** | `custom/` | Any OpenAI-compatible endpoint |
 | **Auto-routed** | any | Any provider from the models.dev database |

 ## Model string format
@@ -132,6 +133,16 @@ For self-hosted or proxy endpoints:
 kit --provider-url "https://my-proxy.example.com/v1" --model openai/gpt-4o
 ```

+When `--provider-url` is provided without `--model`, Kit automatically defaults to `custom/custom`:
+
+```bash
+kit --provider-url "http://localhost:8080/v1" "Hello"
+```
+
+The `custom/custom` model has zero cost, a 262K context window, and reasoning support. It routes through fantasy's `openaicompat` provider and works with any OpenAI-compatible API endpoint.
+
+Optionally, set the `CUSTOM_API_KEY` environment variable or pass `--provider-api-key` for endpoints that require authentication.
+
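The defaulting rule described above can be sketched in a few lines of Go. This is only an illustration of the precedence, not Kit's actual code: `resolveModel` and its parameters are hypothetical names, and the saved-preference fallback is taken from the commit message for `fc054f50e8` ("instead of the saved model preference").

```go
package main

import "fmt"

// resolveModel is a hypothetical helper (not part of Kit's API).
// Precedence, as described in the docs and commit message:
//  1. an explicit --model value always wins;
//  2. otherwise, a --provider-url selects the custom/custom stub model;
//  3. otherwise, the saved model preference is used.
func resolveModel(modelFlag, providerURL, savedModel string) string {
	if modelFlag != "" {
		return modelFlag
	}
	if providerURL != "" {
		return "custom/custom"
	}
	return savedModel
}

func main() {
	// --provider-url without --model: falls back to custom/custom.
	fmt.Println(resolveModel("", "http://localhost:8080/v1", "openai/gpt-4o"))
	// --provider-url with --model: the explicit model is kept.
	fmt.Println(resolveModel("openai/gpt-4o", "https://my-proxy.example.com/v1", ""))
}
```

The point of the ordering is that passing `--provider-url` never overrides a model the user asked for explicitly; it only replaces the saved default.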
 ## Model database

 Kit ships with a local model database that maps provider names to API configurations. You can manage it with:
Author	SHA1	Message	Date
Ed Zynda	09919b6307	feat: update token usage after each step in multi-step turns Previously, token usage and costs were only updated at the end of a complete turn. For long-running multi-step tool-calling conversations, this meant the status bar showed stale (or zero) costs during the entire interaction. Now, after each complete step (tool call + result), the usage tracker is updated with the actual token counts from that step. This provides real-time cost accumulation visible in the status bar. Changes: - Add StepUsageHandler type and onStepUsage parameter to agent - Emit StepUsageEvent from kit layer after each step completes - Handle StepUsageEvent in app layer to update UsageTracker - Add EventStepUsage constant and StepUsageEvent struct to events The step usage is additive - each step's tokens are added to the running session totals, just like the final turn usage was before.	2026-03-25 18:17:48 +03:00
Ed Zynda	7a2de4cc3c	fix: update token counting when switching models mid-session When switching models (e.g., via /model command or ctx.SetModel), the usage tracker now updates its model info to reflect the new model's: - Pricing for cost calculations - Context limits for percentage display - OAuth status (to show bash costs when using OAuth creds) Previously, token costs and context percentages continued using the old model's settings after a switch, causing incorrect display for: - Users switching from paid to free/OAuth models - Users switching between models with different pricing Changes: - Add UpdateModelInfo() method to UsageTracker - Call UpdateModelInfo() in both SetModel callbacks (extension and UI) - Add auth import for OAuth detection in root.go	2026-03-25 18:09:36 +03:00
Ed Zynda	acd7fd7f45	feat(ui): add line truncation to bash streaming output Add width and count truncation to renderStreamingBashOutput to prevent long-running commands from blowing up the TUI layout: - Per-line width truncation via truncateLine() (ANSI-aware, matches final bash tool renderer behavior) - Display cap at maxBashLines (20) showing the tail (latest output) - Truncation hint '...(N more lines above)' when lines are hidden The buffer still accumulates up to 50 lines for context, but only the last 20 are rendered during streaming. This is consistent with how the final bash tool result is displayed.	2026-03-25 18:02:50 +03:00
Ed Zynda	3446f38516	feat(ui): add line truncation to Ls tool renderer Add maxLsLines (20) constant and truncate Ls output in the TUI to prevent large directory listings from blowing up the layout. Shows a '...(N more entries)' hint when truncated, consistent with all other core tool renderers (Edit, Read, Write, Bash, Subagent).	2026-03-25 17:48:37 +03:00
Ed Zynda	db4bb19bac	fix: derive diff/code bg colors from active theme instead of hardcoding KITT defaults - makeTheme() and fileConfigToTheme() now compute DiffInsertBg, DiffDeleteBg, DiffEqualBg, DiffMissingBg, CodeBg, GutterBg, and WriteBg by blending the theme's own Background with its Success/Error colors, so every theme gets properly tinted diff backgrounds. - Added color derivation helpers: parseHexColor, blendHex, deriveDiffBg. - File-based themes still allow explicit diff color overrides; derived colors are used only as fallbacks. - formatToolParams() now skips body-content keys (content, old_text, new_text, etc.) from the header line regardless of value length, preventing raw unformatted code from appearing above the formatted body.	2026-03-25 17:41:37 +03:00
Ed Zynda	d1cffb85ef	fix: prevent bash tool from hanging on long-running/background processes - Use process group isolation (Setpgid) so the entire process tree is killed on timeout/cancellation, not just the direct child - Set cmd.Cancel to kill the process group (-pgid) with SIGKILL - Set cmd.WaitDelay (500ms grace period) to force-close pipes when grandchild processes hold them open after the direct child exits - Convert buffered path from cmd.Run() to explicit pipes + cmd.Start() + cmd.Wait() so WaitDelay can properly force-close pipe handles - Reorder streaming path: cmd.Wait() before wg.Wait() so the WaitDelay timer starts when the child exits, not after pipes close - Add mutex for thread-safe chunk collection in streaming mode - Add comprehensive tests for timeout, background processes, context cancellation, and both buffered/streaming paths	2026-03-24 15:13:35 +03:00
Ed Zynda	329cd4ea4a	feat: Add custom models via config file Allow users to define custom models in ~/.kit.yml under the customModels section. These models are automatically merged into the custom provider. Example config: customModels: my-model: name: "My Custom Model" reasoning: true temperature: true cost: input: 0.002 output: 0.004 limit: context: 128000 output: 32000 Usage: kit --model custom/my-model "Hello" kit --provider-url "http://localhost:8080" --model custom/my-model "Hello" Note: When --provider-url is specified without --model, kit defaults to custom/custom. When --provider-url is specified WITH a custom model from config, that model is used. Bug fixes: - Fixed kit.New() re-loading config file and overriding CLI-specified config - Fixed models command to reload registry for custom models	2026-03-24 14:19:49 +03:00
Ed Zynda	4e779d576f	docs: Add custom provider documentation Update README.md and www/pages/providers.md to document the new custom/custom model that auto-loads when --provider-url is specified.	2026-03-24 13:38:52 +03:00
Ed Zynda	fc054f50e8	Add custom/custom stub model for --provider-url When users pass --provider-url without --model, automatically default to custom/custom instead of the saved model preference. This lets users point kit at any OpenAI-compatible endpoint without needing a provider/model pair from the database. The custom/custom model has: - Zero cost (input/output = 0) - 262K context window, 65K output limit - Reasoning and temperature support - Routes through openaicompat fantasy provider	2026-03-24 13:28:23 +03:00
Ed Zynda	d8f1b32885	Remove --skill flag from skill subcommand to install all skills The skill command now runs 'npx skills add mark3labs/kit' without filtering to a single skill, installing both available skills: 1. Extensions - creating Kit extensions 2. SDK - building with the Kit Go SDK	2026-03-23 17:54:53 +03:00
Ed Zynda	1e2a3e2589	fix: preserve completed messages in session on ESC cancellation Previously, pressing ESC twice to cancel rolled back the entire tree session to the pre-turn state, discarding the user message, completed tool call/result pairs, and any streamed response. Content that had already rendered in the TUI would vanish from the session history. Now the cancellation path uses the same logic as the non-cancellation error path: the user message (already persisted before generation) and any completed step messages (fully-paired tool_use + tool_result from OnStepFinish) are preserved. Only the in-progress pending message or tool call is discarded. This ensures that if a message has rendered in the TUI, it stays in the history and session.	2026-03-23 17:51:22 +03:00
Ed Zynda	c7f43917b1	Add kit-sdk skill for building Go applications with the Kit SDK Comprehensive reference covering the full pkg/kit API surface: - Core lifecycle (New, Prompt, Close) - All 8 prompt method variants - Event system with 14 event types - Hook system with 6 interceptor types and priorities - Tool constructors, bundles, and runtime querying - Session management (5 modes, branching, discovery) - Model management and registry lookups - Context estimation and compaction - In-process subagents with event streaming - Authentication, skills, config resolution - 7 common patterns (scripting, daemon, streaming, etc.) - Re-exported types reference	2026-03-23 17:27:53 +03:00
Ed Zynda	6a8833a7b1	add crypto-monitor SDK example Background agent that checks BTC/ETH prices every 30 minutes via the CoinGecko API and sends desktop notifications through notify-send. Demonstrates long-running autonomous agents with the Kit SDK.	2026-03-23 16:08:25 +03:00
Ed Zynda	82cbf1d457	move SDK examples from pkg/kit/examples to examples/sdk Relocate SDK usage examples to the top-level examples directory alongside extension examples for better discoverability. Add README for the SDK examples directory.	2026-03-23 16:02:54 +03:00
Ed Zynda	ab09d5c9e4	docs: update README and www docs for recent features - Update lifecycle events count from 18 to 20 - Add OnToolOutput event documentation for streaming tool output - Add OnCustomEvent to lifecycle events list - Fix go-edit-lint.go reference (it's project-local, not in examples/) - Add neon-theme.go to examples list - Add project-local extension example section	2026-03-22 21:11:18 +03:00
Ed Zynda	2347e0e506	fix: uniform background for thinking output blocks Add Background(theme.MutedBorder) to all text elements in reasoning blocks: contentStyle, hintStyle, and footer styles. Previously these only specified foreground colors, causing them to inherit the terminal's default background instead of matching the box background.	2026-03-22 20:50:14 +03:00
Ed Zynda	3e1c19442b	feat(extensions): expose ToolOutputEvent to extensions API Extensions can now subscribe to streaming tool output events using OnToolOutput(), giving them the same power as the internal TUI to observe and react to tool execution in real-time. Changes: - Add ToolOutputEvent struct to extensions API - Add ToolOutput constant to EventType - Add OnToolOutput() handler registration method - Add event bridging from kit to extensions runner - Export ToolOutputEvent in Yaegi symbols - Add OnToolOutput() to public SDK (pkg/kit) Example usage in an extension: api.OnToolOutput(func(e ext.ToolOutputEvent, ctx ext.Context) { ctx.PrintInfo(fmt.Sprintf("%s: %s", e.ToolName, e.Chunk)) })	2026-03-22 20:28:30 +03:00
Ed Zynda	3fc0ad906e	feat(ui): streaming bash output in TUI Display streaming bash output in the TUI stream region as it arrives. Changes: - Add streaming bash output rendering to renderStream() - Style stdout with CodeBg, stderr with Error color - Add streamingMu mutex for thread-safe buffer access - Clear buffers on ToolResultEvent - Add ToolOutputEvent to event system (pkg/kit, internal/app) - Add ToolOutputHandler callback in agent - Implement streaming mode in bash tool with pipes - Add tests for accumulation and clearing The streaming output appears in real-time below the LLM streaming text while bash commands are executing, with proper synchronization to prevent race conditions between Update and Render methods.	2026-03-22 20:23:19 +03:00
Ed Zynda	f373c34f54	ui: remove strikethrough from diff delete lines for better readability	2026-03-22 20:21:39 +03:00
Ed Zynda	1206837af4	fix(go-edit-lint): run golangci-lint against ./... instead of single file Running golangci-lint on a single file caused false positives due to missing package context. Now it analyzes the entire package (./...) while gopls still provides fast, targeted feedback on the edited file.	2026-03-22 20:19:24 +03:00