Infrastructure | 8 min read | December 10, 2024

Cut AI Costs by 70% with Intelligent Routing

Smart query routing avoids expensive web searches when context is sufficient. Save money without sacrificing quality.

UnforgeAPI Team

The Cost Problem

AI APIs are expensive. A single web search can cost $0.01-0.05, and LLM inference adds more. For agents making thousands of requests, costs spiral quickly.

Where Money Goes

Operation | Cost
Web Search | $0.01-0.05 per query
LLM Inference | $0.0001-0.01 per 1K tokens
Total per Agent Request | $0.02-0.10

At 10,000 requests/month: $200-1,000/month
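
The monthly figure is the per-request range multiplied by request volume. A minimal sketch of that arithmetic follows; `monthlyCostRange` and `CostRange` are hypothetical names used for illustration, not part of any API.

```typescript
// Hypothetical helper: scales a per-request cost range to a monthly range.
interface CostRange {
  low: number
  high: number
}

function monthlyCostRange(requestsPerMonth: number, perRequest: CostRange): CostRange {
  return {
    low: requestsPerMonth * perRequest.low,
    high: requestsPerMonth * perRequest.high
  }
}

// Per-request range taken from the "Total per Agent Request" row above.
const monthly = monthlyCostRange(10_000, { low: 0.02, high: 0.10 })
console.log(`$${monthly.low.toFixed(0)}-${monthly.high.toFixed(0)}/month`)
```

Plugging in your own request volume and measured per-request cost gives a baseline to compare against once routing is enabled.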

The Routing Solution

Deep Research API's intelligent router analyzes each query and sends it down the cheapest of three paths that can still answer it:

CHAT Path (Free)

For casual queries that don't need web search:

// Query: "What's 2+2?"
// Router: No search needed
// Cost: $0.0001 (LLM only)

const response = await deepResearch({
  query: "What's 2+2?"
})

// Uses cached LLM, no web search
// Cost: ~$0.0001

CONTEXT Path (Free)

For queries answerable from provided context:

// Query: "What's in my document?"
// Router: Answer from context
// Cost: $0.0001 (LLM only)

const response = await deepResearch({
  query: "What's in my document?",
  context: userDocument
})

// Uses provided context, no web search
// Cost: ~$0.0001

RESEARCH Path (Paid)

Only when web search is actually needed:

// Query: "What's Tesla's current stock price?"
// Router: Web search required
// Cost: $0.01-0.05 (search + LLM)

const response = await deepResearch({
  query: "What's Tesla's current stock price?"
})

// Performs web search + LLM
// Cost: $0.01-0.05

How Routing Works

Intent Classification

The router analyzes the query and any provided context, then labels the request with an intent, a confidence score, and a reason:

interface RouterAnalysis {
  intent: "CHAT" | "CONTEXT" | "RESEARCH"
  confidence: number
  reason: string
}

const analysis: RouterAnalysis = {
  intent: "CONTEXT",
  confidence: 0.95,
  reason: "Answerable from provided context"
}
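
The article does not describe how the router reaches its classification, so the following is only a toy heuristic that produces a `RouterAnalysis`. The keyword list, the confidence values, and the `classifyQuery` function are all assumptions made for illustration; a production router would more likely rely on a trained classifier or an LLM call.

```typescript
type Intent = "CHAT" | "CONTEXT" | "RESEARCH"

interface RouterAnalysis {
  intent: Intent
  confidence: number
  reason: string
}

// Illustrative keyword list (assumption): words that suggest time-sensitive data.
const FRESHNESS_HINTS = ["current", "latest", "today", "news", "price"]

// Hypothetical classifier. Confidence values are arbitrary placeholders.
function classifyQuery(query: string, context?: string): RouterAnalysis {
  // Supplied context takes priority in this sketch, so search is skipped.
  if (context !== undefined && context.trim().length > 0) {
    return { intent: "CONTEXT", confidence: 0.7, reason: "Answerable from provided context" }
  }

  const normalized = query.toLowerCase()
  const needsFreshData = FRESHNESS_HINTS.some((hint) => normalized.includes(hint))
  if (needsFreshData) {
    return { intent: "RESEARCH", confidence: 0.8, reason: "Query asks for time-sensitive information" }
  }

  return { intent: "CHAT", confidence: 0.6, reason: "No context and no freshness signal" }
}

console.log(classifyQuery("What's Tesla's current stock price?").intent)
```

A keyword check like this misclassifies easily (for example, context that is present but irrelevant to the question), which is one reason the analysis carries a confidence score.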

Decision Logic

if (analysis.intent === "CHAT") {
  // Casual query - use fast LLM
  routeTo("chat_path")
} else if (analysis.intent === "CONTEXT") {
  // Context available - skip search
  routeTo("context_path")
} else if (analysis.intent === "RESEARCH") {
  // Web search needed
  routeTo("research_path")
}
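
The decision chain above ignores the `confidence` field. One possible refinement, sketched here under assumptions, is to fall back to the research path when the classifier is unsure and to let the compiler verify that every intent is handled. `choosePath` and the `MIN_CONFIDENCE` threshold are hypothetical and not documented router behavior.

```typescript
type Intent = "CHAT" | "CONTEXT" | "RESEARCH"
type Path = "chat_path" | "context_path" | "research_path"

interface RouterAnalysis {
  intent: Intent
  confidence: number
  reason: string
}

// Assumed threshold; tune it against real traffic.
const MIN_CONFIDENCE = 0.5

function choosePath(analysis: RouterAnalysis): Path {
  // Assumption: when unsure, prefer the path that can search,
  // trading cost for answer quality.
  if (analysis.confidence < MIN_CONFIDENCE) {
    return "research_path"
  }

  const intent = analysis.intent
  switch (intent) {
    case "CHAT":
      return "chat_path"
    case "CONTEXT":
      return "context_path"
    case "RESEARCH":
      return "research_path"
    default: {
      // Exhaustiveness check: stops type-checking if a new intent is added.
      const unreachable: never = intent
      throw new Error(`Unhandled intent: ${String(unreachable)}`)
    }
  }
}

console.log(choosePath({ intent: "CONTEXT", confidence: 0.95, reason: "Answerable from provided context" }))
```

Returning the path name instead of calling `routeTo` directly keeps the decision pure and easy to unit test.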

Real-World Savings

Example 1: Customer Support Bot

// Without routing: Every query = $0.02
100 queries/day × $0.02 = $2/day

// With routing:
- 60 queries × $0.0001 (chat) = $0.006
- 40 queries × $0.0001 (context) = $0.004
- 0 queries × $0.02 (research) = $0
Total: $0.01/day

// Savings: 99.5%

Example 2: Document QA System

// Without routing: Every query = $0.02
1000 queries/day × $0.02 = $20/day

// With routing:
- 950 queries × $0.0001 (context) = $0.095
- 50 queries × $0.02 (research) = $1
Total: $1.095/day

// Savings: 94.5%

Example 3: Research Assistant

// Without routing: Every query = $0.02
500 queries/day × $0.02 = $10/day

// With routing:
- 200 queries × $0.0001 (chat) = $0.02
- 300 queries × $0.02 (research) = $6
Total: $6.02/day

// Savings: 39.8%
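
The three examples share one formula: compare the cost of sending every query through search with the cost of the routed traffic mix. The sketch below reproduces that arithmetic using the per-query costs from the examples; `savingsPercent` and `TrafficMix` are illustrative names, and the constants should be replaced with your own measured costs.

```typescript
// Per-query costs used in the examples above.
const COST_WITHOUT_ROUTING = 0.02
const COST_BY_PATH = { chat: 0.0001, context: 0.0001, research: 0.02 }

// Number of queries per day that land on each path.
interface TrafficMix {
  chat: number
  context: number
  research: number
}

function savingsPercent(mix: TrafficMix): number {
  const totalQueries = mix.chat + mix.context + mix.research
  if (totalQueries === 0) return 0 // no traffic, nothing to save

  const baseline = totalQueries * COST_WITHOUT_ROUTING
  const routed =
    mix.chat * COST_BY_PATH.chat +
    mix.context * COST_BY_PATH.context +
    mix.research * COST_BY_PATH.research

  return ((baseline - routed) / baseline) * 100
}

console.log(savingsPercent({ chat: 60, context: 40, research: 0 }).toFixed(1))   // support bot
console.log(savingsPercent({ chat: 0, context: 950, research: 50 }).toFixed(1))  // document QA
console.log(savingsPercent({ chat: 200, context: 0, research: 300 }).toFixed(1)) // research assistant
```

Savings depend almost entirely on the share of queries that still need the research path, which is why the research assistant saves far less than the support bot.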

Best Practices

Provide Context

When you already have the relevant data, pass it as context so the router can skip web search:

const response = await deepResearch({
  query: "What's the status of my order #12345?",
  context: orderHistory
})

// Routes to CONTEXT (free) instead of RESEARCH (paid)

Batch Queries

Combine multiple queries when possible:

const results = await deepResearch({
  queries: [
    "Order #12345 status",
    "Order #12346 status",
    "Order #12347 status"
  ]
})

// Single API call, shared context

Monitor Routing

Check which paths your queries use:

const response = await deepResearch({ query: "..." })

console.log("Routed to:", response.meta?.routed_to)
console.log("Search skipped:", response.meta?.search_skipped)

// Optimize based on routing patterns
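
To spot patterns across many requests, the two meta fields logged above can be tallied. This is a sketch under assumptions: the `RoutedResponse` shape is inferred only from those two fields, the values of `routed_to` are treated as opaque strings, and `summarizeRouting` is a hypothetical helper.

```typescript
// Assumed response shape, inferred from the two meta fields logged above.
interface RoutedResponse {
  meta?: {
    routed_to?: string
    search_skipped?: boolean
  }
}

interface RoutingStats {
  byPath: Record<string, number>
  searchSkippedRate: number
}

// Hypothetical helper: counts responses per path and the share that skipped search.
function summarizeRouting(responses: RoutedResponse[]): RoutingStats {
  const byPath: Record<string, number> = {}
  let skipped = 0

  for (const response of responses) {
    const path = response.meta?.routed_to ?? "unknown"
    byPath[path] = (byPath[path] ?? 0) + 1
    if (response.meta?.search_skipped === true) skipped += 1
  }

  const searchSkippedRate = responses.length === 0 ? 0 : skipped / responses.length
  return { byPath, searchSkippedRate }
}
```

A low skip rate on traffic you expected to be answerable from context suggests the context is missing or incomplete in those requests.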

Get Started

Start saving on AI costs today.

Get Your API Key

Read Routing Documentation

Tags: Infrastructure, AI Agents, Deep Research
