Engineering · 10 min read · December 28, 2024

Building a 30-Second Deep Research Pipeline

Multi-stage AI architecture optimized for speed. Web search, reasoning, and JSON rendering in under 40 seconds.

UnforgeAPI Team


The Speed Challenge

Traditional research takes minutes or hours:

  • Manual research: 30+ minutes
  • Standard LLM: 60+ seconds (single pass)
  • Multi-step agents: 2+ minutes (sequential)

For AI agents, speed matters: every second of latency adds cost and degrades the user experience.

Our 30-Second Pipeline

Deep Research API delivers comprehensive research in 30-40 seconds through a multi-stage architecture:

Stage 1: Parallel Web Search (5-8s)

We don't search sequentially. We search 12+ sources in parallel:

// Parallel search execution — allSettled, so one failing
// source doesn't sink the whole stage
const settled = await Promise.allSettled([
  searchGoogle(query),
  searchBing(query),
  searchNewsAPI(query),
  searchAcademic(query),
  // ... 8 more sources
])
const sources = settled
  .filter((s) => s.status === "fulfilled")
  .map((s) => s.value)

Benefits:

  • Faster total time (parallel vs sequential)
  • Comprehensive coverage (diverse sources)
  • Redundancy (if one fails, others succeed)
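Redundancy also needs a per-source timeout, or one slow provider can stall the whole stage. A minimal sketch (the search functions here are stubs standing in for real providers, and the timeout values are illustrative):

```javascript
// Race each search against a timeout so one slow provider can't stall the stage.
const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((_, reject) =>
      setTimeout(() => reject(new Error("timeout")), ms)
    ),
  ]);

// Stub search functions standing in for real providers.
const searchFast = async (query) => [`result for ${query}`];
const searchSlow = (query) =>
  new Promise((resolve) => setTimeout(() => resolve(["late result"]), 200));

// allSettled keeps successful sources even when others fail or time out.
async function searchAll(query) {
  const settled = await Promise.allSettled([
    withTimeout(searchFast(query), 100),
    withTimeout(searchSlow(query), 50),
  ]);
  return settled
    .filter((s) => s.status === "fulfilled")
    .flatMap((s) => s.value);
}
```

The slow source's timeout rejection is absorbed by `allSettled`, so the stage still returns whatever succeeded.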

Stage 2: Intelligent Filtering (2-3s)

Not all sources are useful. We filter:

  • Relevance scoring: Rank by query match
  • Quality filtering: Remove low-quality sites
  • Deduplication: Eliminate duplicate content
  • Freshness: Prioritize recent content

Result: 5-8 high-quality, unique sources.
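These four filters can be composed in a few lines. A sketch assuming a hypothetical source shape of `{ url, text, publishedAt }` and a crude term-overlap relevance score (the production scoring is more involved):

```javascript
// Filter Stage 1 output down to the top N relevant, unique sources.
function filterSources(sources, query, limit = 8) {
  const terms = query.toLowerCase().split(/\s+/);

  // Relevance: fraction of query terms appearing in the source text.
  const score = (s) =>
    terms.filter((t) => s.text.toLowerCase().includes(t)).length / terms.length;

  // Deduplicate by URL, drop zero-relevance sources, then rank by
  // relevance with freshness as the tiebreaker.
  const seen = new Set();
  return sources
    .filter((s) => !seen.has(s.url) && seen.add(s.url))
    .map((s) => ({ ...s, score: score(s) }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score || b.publishedAt - a.publishedAt)
    .slice(0, limit);
}
```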

Stage 3: AI Reasoning (10-15s)

We use fast LLMs (Llama 3.1 8B) for initial analysis:

const summary = await fastLLM({
  model: "llama-3.1-8b",
  prompt: `Analyze these sources:\n${sources.join("\n\n")}`,
  max_tokens: 2000
})

Why fast models?

  • Lower latency (< 2s per request)
  • Sufficient for summarization
  • Cheaper per token
  • Can be parallelized
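Because each source is summarized independently, the fast-model calls can themselves fan out. A sketch with `fastLLM` stubbed out (a real client would hit an inference API):

```javascript
// Stand-in for the fast-model call.
const fastLLM = async ({ prompt }) => `summary of: ${prompt.slice(11)}`;

// Summarize each filtered source concurrently, then join for Stage 4.
async function summarizeSources(sourceTexts) {
  const summaries = await Promise.all(
    sourceTexts.map((text) =>
      fastLLM({
        model: "llama-3.1-8b",
        prompt: `Summarize: ${text}`,
        max_tokens: 500,
      })
    )
  );
  return summaries.join("\n");
}
```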

Stage 4: Structured Output (8-12s)

Final synthesis with a quality model (Llama 3.3 70B):

const report = await qualityLLM({
  model: "llama-3.3-70b",
  prompt: `Create JSON report from: ${summary}`,
  schema: userSchema,
  max_tokens: 4000
})

Why quality model here?

  • Better reasoning and synthesis
  • Follows schema precisely
  • Produces clean JSON
  • Handles complex structures
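Even a quality model occasionally returns malformed JSON, so it's worth parsing defensively and retrying. A sketch with `qualityLLM` stubbed out; the retry loop is our illustration of the pattern, not necessarily what the API does internally:

```javascript
// Stand-in for the quality-model call returning a JSON string.
const qualityLLM = async () => '{"title": "Report", "findings": []}';

// Parse the model output and retry once if it isn't valid JSON.
async function structuredReport(summary, schema, retries = 1) {
  for (let attempt = 0; attempt <= retries; attempt++) {
    const raw = await qualityLLM({
      model: "llama-3.3-70b",
      prompt: `Create JSON report from: ${summary}`,
      schema,
      max_tokens: 4000,
    });
    try {
      return JSON.parse(raw); // rejects prose-wrapped or truncated output
    } catch {
      if (attempt === retries) throw new Error("model did not return valid JSON");
    }
  }
}
```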

Optimization Techniques

Webhook Delivery

Results are pushed to your endpoint as soon as they're ready:

const response = await deepResearch({
  query: "Your query",
  webhook: "https://your-endpoint.com/callback"
})

// Results POSTed as soon as ready
// No polling needed

Caching

Identical queries return cached results:

// First call: 35s
await deepResearch({ query: "Tesla stock price" })

// Second call: 0.5s (cached)
await deepResearch({ query: "Tesla stock price" })

Batch Processing

Process multiple queries in one call:

const results = await deepResearch({
  queries: [
    "Tesla stock price",
    "Rivian stock price",
    "Lucid stock price"
  ]
})
// Single API call, multiple results
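Server-side, batching is a fan-out over the single-query pipeline. A sketch with the pipeline stubbed out:

```javascript
// Stand-in for the single-query pipeline described above.
const researchOne = async (query) => ({ query, report: `report for ${query}` });

// Run every query concurrently; Promise.all preserves input order.
const researchBatch = (queries) => Promise.all(queries.map(researchOne));
```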

Performance Metrics

Our pipeline achieves:

  • Median Latency: 35 seconds
  • P95 Latency: 40 seconds
  • P99 Latency: 45 seconds
  • Success Rate: 99.2%
  • Sources per Query: 12+

Get Started

Experience 30-second deep research.

Get Your API Key

Read Performance Docs

Tags: Engineering, AI Agents, Deep Research
