LLM-based Agents
Converge-Provider and the model landscape
The Problem
The LLM landscape is a moving target. New models every month. Different APIs, different capabilities, different pricing. Building on one provider means rebuilding when you need another.
The agent problem is harder than the model problem.
Calling an LLM API is easy. Building reliable agents—systems that reason, plan, use tools, and recover from errors—is not. Most agent frameworks are toys: they work in demos but fail in production.
Production agents need:
- Model abstraction: Switch providers without rewriting code
- Structured outputs: Guaranteed schema compliance, not "please format as JSON"
- Tool orchestration: Reliable function calling with validation
- Context management: Handle conversation history, RAG, and memory
- Observability: Trace every decision, every tool call, every token
We built Converge-Provider to solve these problems once, correctly.
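The model-abstraction requirement can be sketched with a plain Rust trait. This is an illustrative toy, not the real Converge-Provider API; the names `ChatModel`, `ClaudeBackend`, and `OpenAiBackend` are invented for the example:

```rust
// A hypothetical sketch of provider abstraction: one trait, many backends.
trait ChatModel {
    fn complete(&self, prompt: &str) -> String;
}

struct ClaudeBackend;
struct OpenAiBackend;

impl ChatModel for ClaudeBackend {
    fn complete(&self, prompt: &str) -> String {
        // Stand-in for a real API call to Anthropic.
        format!("[claude] {prompt}")
    }
}

impl ChatModel for OpenAiBackend {
    fn complete(&self, prompt: &str) -> String {
        // Stand-in for a real API call to OpenAI.
        format!("[gpt-4o] {prompt}")
    }
}

// Agent code depends only on the trait; which backend runs is configuration.
fn run_agent(model: &dyn ChatModel) -> String {
    model.complete("summarize the incident report")
}

fn main() {
    println!("{}", run_agent(&ClaudeBackend));
    println!("{}", run_agent(&OpenAiBackend));
}
```

Swapping providers then touches only the value passed to `run_agent`, never the agent logic itself.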
Current Options
| Option | Pros | Cons |
|---|---|---|
| Claude (Anthropic) | Best-in-class reasoning and instruction following. | Closed weights; API-only. |
| GPT-4o / GPT-4 Turbo (OpenAI) | Industry standard with broad capabilities. | Closed weights; premium pricing at the high end. |
| Gemini 2.0 (Google) | Multimodal native with massive context. | Closed weights. |
| Llama 3.3 (Meta) | Open-weight model for self-hosting. | Requires your own serving infrastructure; text-only. |
| Mistral Large / Mixtral | European alternative with strong performance. | Smaller ecosystem than OpenAI or Anthropic. |
| DeepSeek V3 | Cost-effective reasoning model. | Newer and less battle-tested in production. |
Future Outlook
The model landscape will continue fragmenting before it consolidates.
Specialization is coming.
General-purpose models will give way to specialized ones: coding models, reasoning models, multimodal models, domain-specific models. The best agent architectures will route to the right model for each subtask.
Open weights are catching up.
Llama 3.3 70B matches GPT-4 on many benchmarks. The gap between open and closed models shrinks every release. In two years, the best model for many tasks may be open-weight.
Agents are the product, not models.
Models are commoditizing. The value is in the agent layer: orchestration, tool use, memory, reasoning chains. Converge-Provider builds this layer once, uses any model.
Multi-model architectures will dominate.
Production systems will use multiple models: fast/cheap for classification, powerful for reasoning, specialized for code. The abstraction layer that enables this wins.
Our Decision
✓ Why we chose this
- Model abstraction: Unified interface across Claude, GPT-4, Gemini, Llama, and Mistral. Switch providers with config, not code.
- Structured outputs: Schema-enforced responses using constrained decoding. Not "please format as JSON"—guaranteed valid output.
- Tool orchestration: Type-safe tool definitions, automatic validation, retry logic, parallel execution.
- Context management: Sliding window, summarization, RAG integration. Handles conversations longer than any context window.
- Observability: Every prompt, completion, tool call, and decision traced with OpenTelemetry. Debug agents in production.
- Streaming: First-class streaming support. Stream structured outputs, tool calls, and reasoning chains.
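The sliding-window half of context management reduces to a small algorithm: keep the most recent turns that fit a token budget, dropping the oldest first. The sketch below is a toy illustration, not the library's implementation; token counts are approximated by word counts, where a real system would use the model's tokenizer:

```rust
// Keep the newest conversation turns that fit within `budget` "tokens"
// (approximated here as whitespace-separated words), oldest dropped first.
fn sliding_window(history: &[String], budget: usize) -> Vec<String> {
    let mut kept = Vec::new();
    let mut used = 0;
    for turn in history.iter().rev() {
        let cost = turn.split_whitespace().count();
        if used + cost > budget {
            break; // this turn (and everything older) no longer fits
        }
        used += cost;
        kept.push(turn.clone());
    }
    kept.reverse(); // restore chronological order
    kept
}

fn main() {
    let history = vec![
        "user: hello there".to_string(),
        "assistant: hi, how can I help".to_string(),
        "user: summarize this doc".to_string(),
    ];
    // A budget of 8 words keeps only the turns that fit, newest first.
    println!("{:?}", sliding_window(&history, 8));
}
```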
× Trade-offs we accept
- Abstraction cost: Provider-specific features may be harder to access. Some capability differences are papered over.
- Complexity: More moving parts than direct API calls. Useful complexity, but complexity nonetheless.
- Maintenance: Must track API changes across all supported providers. New models require integration work.
Motivation
We started with direct API calls to Claude. Then we needed GPT-4 for comparison. Then Gemini for long context. Then local models for privacy-sensitive workloads.
Each addition meant new code paths, new error handling, new retry logic. The agent logic was drowning in provider-specific details.
Converge-Provider extracts what agents actually need:
Completion: Give me a response to this prompt, with this schema, using tools if needed.
Streaming: Same, but stream the response as it generates.
Context: Manage history that exceeds any single context window.
Tools: Call functions with validated inputs, handle errors, retry if needed.
The provider handles the rest. Claude's tool_use format differs from OpenAI's function_calling? Converge-Provider translates. Gemini's streaming chunking is different? Abstracted away.
Our agent code doesn't know which model it's using. That's the point.
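Schematically, that translation is a normalization step: each provider's tool-call shape maps into one common type. The `ProviderToolCall` variants below are simplified stand-ins for the real Anthropic and OpenAI payloads, and the names are invented for this sketch:

```rust
// The common shape agent code sees, regardless of provider.
#[derive(Debug, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String, // JSON payload as a string
}

// Simplified stand-ins for provider-specific response fragments.
enum ProviderToolCall {
    // Anthropic-style `tool_use` block: `name` + `input`
    Claude { name: String, input: String },
    // OpenAI-style function call: `function_name` + `arguments`
    OpenAi { function_name: String, arguments: String },
}

// Normalize either provider format into the common ToolCall.
fn normalize(raw: ProviderToolCall) -> ToolCall {
    match raw {
        ProviderToolCall::Claude { name, input } => ToolCall { name, arguments: input },
        ProviderToolCall::OpenAi { function_name, arguments } => {
            ToolCall { name: function_name, arguments }
        }
    }
}

fn main() {
    let a = normalize(ProviderToolCall::Claude {
        name: "search".into(),
        input: r#"{"query":"outage"}"#.into(),
    });
    let b = normalize(ProviderToolCall::OpenAi {
        function_name: "search".into(),
        arguments: r#"{"query":"outage"}"#.into(),
    });
    assert_eq!(a, b); // agent code sees one format regardless of provider
    println!("{:?}", a);
}
```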
Recommendation
Model selection by task:
| Task | Recommended | Why |
|---|---|---|
| Complex reasoning | Claude Opus | Best at multi-step reasoning |
| General tasks | Claude Sonnet / GPT-4o | Good balance of capability and cost |
| Code generation | Claude / GPT-4 / DeepSeek | All strong, test for your domain |
| Long documents | Gemini 2.0 Pro | 1M+ context handles full codebases |
| Cost-sensitive | GPT-4o-mini / Llama 3.3 | 10-100x cheaper than frontier |
| Privacy-critical | Llama 3.3 (local) | Data never leaves your infrastructure |
| Latency-critical | GPT-4o / Claude Haiku | Optimized for speed |
Agent architecture recommendations:
Start with one model. Get the agent working before adding complexity.
Add fallbacks. When Claude is down, fall back to GPT-4. Converge-Provider handles this.
Specialize gradually. Use fast/cheap models for classification and routing. Reserve frontier models for reasoning.
Always trace. Agent debugging without observability is impossible. Every call, every decision, logged.
Structure everything. Never parse free-form text. Define schemas. Let the provider enforce them.
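The fallback recommendation reduces to a simple loop: try providers in order and return the first success. This standalone sketch uses a stubbed `call_provider` with a hardcoded health flag, not a real client:

```rust
// Stand-in for a real provider call; `healthy` simulates an outage.
fn call_provider(name: &str, healthy: bool) -> Result<String, String> {
    if healthy {
        Ok(format!("response from {name}"))
    } else {
        Err(format!("{name} unavailable"))
    }
}

// Walk the fallback chain, returning the first successful response.
fn complete_with_fallback(chain: &[(&str, bool)]) -> Result<String, String> {
    let mut last_err = String::from("no providers configured");
    for (name, healthy) in chain {
        match call_provider(name, *healthy) {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = e, // record the failure and try the next provider
        }
    }
    Err(last_err)
}

fn main() {
    // Claude is "down"; the chain falls through to gpt-4o.
    let chain = [("claude-sonnet-4", false), ("gpt-4o", true)];
    println!("{:?}", complete_with_fallback(&chain));
}
```

A production version would add per-provider retry budgets and emit a trace event on every fallback, per the observability recommendation above.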
Converge-Provider usage:
```
converge-provider.model = "claude-sonnet-4"
converge-provider.fallback = ["gpt-4o", "gemini-2.0-pro"]
converge-provider.structured_output = true
converge-provider.tools = [search, calculate, lookup]
```
The abstraction pays for itself on the first provider outage, the first model upgrade, the first time you need to compare models.
Examples
```rust
use converge_provider::{Provider, Model, Tool, Schema, StreamChunk, Error};
use futures::StreamExt;
use serde::{Deserialize, Serialize};

// Define structured output schema
#[derive(Debug, Serialize, Deserialize, Schema)]
struct AnalysisResult {
    sentiment: Sentiment,
    confidence: f32,
    key_points: Vec<String>,
    suggested_actions: Vec<Action>,
}

#[derive(Debug, Serialize, Deserialize, Schema)]
enum Sentiment { Positive, Negative, Neutral, Mixed }

#[derive(Debug, Serialize, Deserialize, Schema)]
struct Action {
    description: String,
    priority: u8,
}

// Define tools the agent can use
fn search_tool() -> Tool {
    Tool::new("search")
        .description("Search the knowledge base")
        .param("query", "string", "Search query")
        .param("limit", "integer", "Max results (default 10)")
}

async fn analyze_document(provider: &Provider, document: &str) -> Result<AnalysisResult> {
    // Provider handles model selection, retries, structured output
    let result = provider
        .completion()
        .model(Model::ClaudeSonnet)
        .fallback(Model::Gpt4o)
        .system("You are a document analyst. Analyze the provided document.")
        .user(document)
        .tools([search_tool()])
        .structured_output::<AnalysisResult>()
        .send()
        .await?;
    Ok(result)
}

// Streaming with structured output
async fn stream_analysis(provider: &Provider, document: &str) -> Result<AnalysisResult> {
    let mut stream = provider
        .completion()
        .model(Model::ClaudeSonnet)
        .user(document)
        .structured_output::<AnalysisResult>()
        .stream()
        .await?;

    while let Some(chunk) = stream.next().await {
        match chunk? {
            StreamChunk::Partial(json) => println!("Progress: {}", json),
            StreamChunk::ToolCall(call) => println!("Tool: {} ({})", call.name, call.id),
            StreamChunk::Complete(result) => return Ok(result),
        }
    }
    Err(Error::StreamEnded)
}
```

Converge-Provider abstracts model differences. Structured outputs are schema-enforced. Fallbacks handle provider outages. Streaming works with tools and structured output.
```rust
use converge_provider::{Provider, Model, Router, Comparison};

// Route to different models based on task
fn create_router() -> Router {
    Router::new()
        // Fast/cheap for classification
        .route("classify", Model::Gpt4oMini)
        .route("extract_entities", Model::Gpt4oMini)
        // Frontier for reasoning
        .route("analyze", Model::ClaudeSonnet)
        .route("plan", Model::ClaudeOpus)
        // Long context for documents
        .route("summarize_long", Model::Gemini2Pro)
        // Local for privacy
        .route("process_pii", Model::Llama3_70b_Local)
        // Default
        .default(Model::ClaudeSonnet)
}

async fn process_request(provider: &Provider, task: &str, input: &str) -> Result<String> {
    let router = create_router();
    let model = router.select(task);
    provider
        .completion()
        .model(model)
        .user(input)
        .send()
        .await
}

// Parallel model comparison for evaluation
async fn compare_models(provider: &Provider, prompt: &str) -> Result<Comparison> {
    let models = [Model::ClaudeSonnet, Model::Gpt4o, Model::Gemini2Pro];
    let results = futures::future::join_all(
        models.iter().map(|model| {
            provider.completion().model(*model).user(prompt).send()
        })
    ).await;
    Ok(Comparison::new(models.iter().zip(results).collect()))
}
```

Router directs tasks to appropriate models. Parallel execution enables A/B testing and model comparison. Privacy-sensitive data routes to local models automatically.