Engineering · 10 min read · December 28, 2024

Building a 30-Second Deep Research Pipeline

Multi-stage AI architecture optimized for speed. Web search, reasoning, and JSON rendering in under 40 seconds.

UnforgeAPI Team


The Speed Challenge

Traditional research takes minutes or hours:

  • Manual research: 30+ minutes
  • Standard LLM: 60+ seconds (single pass)
  • Multi-step agents: 2+ minutes (sequential)

For AI agents, speed matters: every second of latency adds cost and degrades the user experience.

Our 30-Second Pipeline

Deep Research API delivers comprehensive research in 30-40 seconds through a multi-stage architecture:

Stage 1: Parallel Web Search (5-8s)

We don't search sequentially. We search 12+ sources in parallel:

// Parallel search execution — allSettled, so one failing
// source doesn't sink the whole stage
const settled = await Promise.allSettled([
  searchGoogle(query),
  searchBing(query),
  searchNewsAPI(query),
  searchAcademic(query),
  // ... 8 more sources
])
const sources = settled
  .filter((s) => s.status === "fulfilled")
  .map((s) => s.value)

Benefits:

  • Faster total time (parallel vs sequential)
  • Comprehensive coverage (diverse sources)
  • Redundancy (if one fails, others succeed)
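Redundancy also needs a per-source timeout, or one slow provider can stall the whole stage. A minimal sketch (the search functions here are stubs standing in for real providers, and the timeout values are illustrative):

```javascript
// Race each search against a timeout so one slow provider can't stall the stage.
const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)
    ),
  ]);

// Stub search functions standing in for real providers.
const searchFast = async (query) => [`result for ${query}`];
const searchSlow = (query) =>
  new Promise((resolve) => setTimeout(() => resolve(["late result"]), 200));

// allSettled keeps successful sources even when others fail or time out.
async function searchAll(query) {
  const settled = await Promise.allSettled([
    withTimeout(searchFast(query), 100),
    withTimeout(searchSlow(query), 50),
  ]);
  return settled
    .filter((s) => s.status === "fulfilled")
    .flatMap((s) => s.value);
}
```

The slow source's timeout rejection is absorbed by `allSettled`, so the stage still returns whatever succeeded.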

Stage 2: Intelligent Filtering (2-3s)

Not all sources are useful. We filter:

  • Relevance scoring: Rank by query match
  • Quality filtering: Remove low-quality sites
  • Deduplication: Eliminate duplicate content
  • Freshness: Prioritize recent content

Result: 5-8 high-quality, unique sources.
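These four filters can be composed in a few lines. A sketch assuming a hypothetical source shape of `{ url, text, publishedAt }` and a crude term-overlap relevance score (the production scoring is more involved):

```javascript
// Filter Stage 1 output down to the top N relevant, unique sources.
function filterSources(sources, query, limit = 8) {
  const terms = query.toLowerCase().split(/\s+/);

  // Relevance: fraction of query terms appearing in the source text.
  const score = (s) =>
    terms.filter((t) => s.text.toLowerCase().includes(t)).length / terms.length;

  // Deduplicate by URL, drop zero-relevance sources, then rank by
  // relevance with freshness as the tiebreaker.
  const seen = new Set();
  return sources
    .filter((s) => !seen.has(s.url) && seen.add(s.url))
    .map((s) => ({ ...s, score: score(s) }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score || b.publishedAt - a.publishedAt)
    .slice(0, limit);
}
```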

Stage 3: AI Reasoning (10-15s)

We use fast LLMs (Llama 3.1 8B) for initial analysis:

const summary = await fastLLM({
  model: "llama-3.1-8b",
  prompt: `Analyze these sources:\n${sources.join("\n\n")}`,
  max_tokens: 2000
})

Why fast models?

  • Lower latency (< 2s per request)
  • Sufficient for summarization
  • Cheaper per token
  • Can be parallelized
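Because each source is summarized independently, the fast-model calls can themselves fan out. A sketch with `fastLLM` stubbed out (a real client would hit an inference API):

```javascript
// Stand-in for the fast-model call.
const fastLLM = async ({ prompt }) => `summary of: ${prompt.slice(11)}`;

// Summarize each filtered source concurrently, then join for Stage 4.
async function summarizeSources(sourceTexts) {
  const summaries = await Promise.all(
    sourceTexts.map((text) =>
      fastLLM({
        model: "llama-3.1-8b",
        prompt: `Summarize: ${text}`,
        max_tokens: 500,
      })
    )
  );
  return summaries.join("\n");
}
```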

Stage 4: Structured Output (8-12s)

Final synthesis with a quality model (Llama 3.3 70B):

const report = await qualityLLM({
  model: "llama-3.3-70b",
  prompt: `Create JSON report from: ${summary}`,
  schema: userSchema,
  max_tokens: 4000
})

Why quality model here?

  • Better reasoning and synthesis
  • Follows schema precisely
  • Produces clean JSON
  • Handles complex structures
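Even a quality model occasionally returns malformed JSON, so it's worth parsing defensively and retrying. A sketch with `qualityLLM` stubbed out; the retry loop is our illustration of the pattern, not necessarily what the API does internally:

```javascript
// Stand-in for the quality-model call returning a JSON string.
const qualityLLM = async () => '{"title": "Report", "findings": []}';

// Parse the model output and retry once if it isn't valid JSON.
async function structuredReport(summary, schema, retries = 1) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const raw = await qualityLLM({
      model: "llama-3.3-70b",
      prompt: `Create JSON report from: ${summary}`,
      schema,
      max_tokens: 4000,
    });
    try {
      return JSON.parse(raw); // rejects prose-wrapped or truncated output
    } catch {
      if (attempt === retries) throw new Error("model did not return valid JSON");
    }
  }
}
```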

Optimization Techniques

Webhook Delivery

Results are pushed to your endpoint as soon as they're ready:

const response = await deepResearch({
  query: "Your query",
  webhook: "https://your-endpoint.com/callback"
})

// Results POSTed as soon as ready
// No polling needed

Caching

Identical queries return cached results:

// First call: 35s
await deepResearch({ query: "Tesla stock price" })

// Second call: 0.5s (cached)
await deepResearch({ query: "Tesla stock price" })

Batch Processing

Process multiple queries in one call:

const results = await deepResearch({
  queries: [
    "Tesla stock price",
    "Rivian stock price",
    "Lucid stock price"
  ]
})
// Single API call, multiple results
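Server-side, batching is a fan-out over the single-query pipeline. A sketch with the pipeline stubbed out:

```javascript
// Stand-in for the single-query pipeline described above.
const researchOne = async (query) => ({ query, report: `report for ${query}` });

// Run every query concurrently; Promise.all preserves input order.
const researchBatch = (queries) => Promise.all(queries.map(researchOne));
```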

Performance Metrics

Our pipeline achieves:

  • Median Latency: 35 seconds
  • P95 Latency: 40 seconds
  • P99 Latency: 45 seconds
  • Success Rate: 99.2%
  • Sources per Query: 12+

Get Started

Experience 30-second deep research.

Get Your API Key

Read Performance Docs

Tags: Engineering, AI Agents, Deep Research
