Converge-Provider

LLM-based Agents

Converge-Provider and the model landscape

v1.1·14 min read·Kenneth Pernyér
agents · llm · converge-provider · claude · gpt-4 · gemini · llama · orchestration

The Problem

The LLM landscape is a moving target. New models every month. Different APIs, different capabilities, different pricing. Building on one provider means rebuilding when you need another.

The agent problem is harder than the model problem.

Calling an LLM API is easy. Building reliable agents—systems that reason, plan, use tools, and recover from errors—is not. Most agent frameworks are toys: they work in demos but fail in production.

Production agents need:

  • Model abstraction: Switch providers without rewriting code
  • Structured outputs: Guaranteed schema compliance, not "please format as JSON"
  • Tool orchestration: Reliable function calling with validation
  • Context management: Handle conversation history, RAG, and memory
  • Observability: Trace every decision, every tool call, every token

We built Converge-Provider to solve these problems once, correctly.

Current Options

Claude (Anthropic): Best-in-class reasoning and instruction following.

  Pros:
  • Excellent at complex reasoning tasks
  • Long context (200K tokens)
  • Strong instruction following
  • Computer use and tool calling
  • Constitutional AI safety

  Cons:
  • Higher latency than some competitors
  • Smaller ecosystem than OpenAI
  • Can be overly cautious

GPT-4o / GPT-4 Turbo (OpenAI): Industry standard with broad capabilities.

  Pros:
  • Largest ecosystem and tooling
  • Strong multimodal (vision, audio)
  • Fast inference (GPT-4o)
  • Mature function calling
  • Fine-tuning available

  Cons:
  • Inconsistent instruction following
  • Can hallucinate confidently
  • Expensive at scale

Gemini 2.0 (Google): Multimodal native with massive context.

  Pros:
  • 1M+ token context window
  • Native multimodal (text, image, video, audio)
  • Strong code understanding
  • Competitive pricing
  • Google Cloud integration

  Cons:
  • Younger than GPT-4/Claude
  • Reasoning can be inconsistent
  • Safety filters can over-trigger

Llama 3.3 (Meta): Open-weight model for self-hosting.

  Pros:
  • Open weights: full control
  • No API costs at scale
  • Privacy: data stays local
  • Fine-tuning freedom
  • Active community

  Cons:
  • Lower capability than frontier models
  • Requires GPU infrastructure
  • No official support

Mistral Large / Mixtral: European alternative with strong performance.

  Pros:
  • Competitive with GPT-4
  • Good multilingual support
  • Open-weight options (Mixtral)
  • EU data residency

  Cons:
  • Smaller ecosystem
  • Less tooling support
  • Fewer integrations

DeepSeek V3: Cost-effective reasoning model.

  Pros:
  • Strong reasoning capabilities
  • Very competitive pricing
  • Good code generation
  • MoE architecture efficiency

  Cons:
  • Newer, less proven
  • China-based (data residency concerns)
  • Limited ecosystem

Future Outlook

The model landscape will continue fragmenting before it consolidates.

Specialization is coming.

General-purpose models will give way to specialized ones: coding models, reasoning models, multimodal models, domain-specific models. The best agent architectures will route to the right model for each subtask.

Open weights are catching up.

Llama 3.3 70B matches GPT-4 on many benchmarks. The gap between open and closed models shrinks every release. In two years, the best model for many tasks may be open-weight.

Agents are the product, not models.

Models are commoditizing. The value is in the agent layer: orchestration, tool use, memory, reasoning chains. Converge-Provider builds this layer once, uses any model.

Multi-model architectures will dominate.

Production systems will use multiple models: fast/cheap for classification, powerful for reasoning, specialized for code. The abstraction layer that enables this wins.

Our Decision

Why we chose this

  • Model abstraction: Unified interface across Claude, GPT-4, Gemini, Llama, and Mistral. Switch providers with config, not code.
  • Structured outputs: Schema-enforced responses using constrained decoding. No "please format as JSON"; output is guaranteed valid.
  • Tool orchestration: Type-safe tool definitions, automatic validation, retry logic, parallel execution.
  • Context management: Sliding window, summarization, RAG integration. Handles conversations longer than any context window.
  • Observability: Every prompt, completion, tool call, and decision traced with OpenTelemetry. Debug agents in production.
  • Streaming: First-class streaming support. Stream structured outputs, tool calls, and reasoning chains.
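To make the context-management point concrete, here is a minimal standalone sketch of a sliding window over conversation history. The `Message` type and the 4-characters-per-token heuristic are illustrative assumptions, not Converge-Provider's actual API:

```rust
// Illustrative sliding-window context management. The Message type and
// the token heuristic below are assumptions for this sketch.
struct Message {
    role: &'static str,
    text: String,
}

// Crude token estimate: roughly 4 characters per token.
fn estimate_tokens(m: &Message) -> usize {
    (m.text.len() / 4).max(1)
}

// Keep the most recent messages whose combined estimate fits the budget.
fn sliding_window(history: &[Message], budget: usize) -> Vec<&Message> {
    let mut kept = Vec::new();
    let mut used = 0;
    for m in history.iter().rev() {
        let cost = estimate_tokens(m);
        if used + cost > budget {
            break; // older messages no longer fit
        }
        used += cost;
        kept.push(m);
    }
    kept.reverse(); // restore chronological order
    kept
}

fn main() {
    let history = vec![
        Message { role: "user", text: "a".repeat(400) },      // ~100 tokens
        Message { role: "assistant", text: "b".repeat(400) }, // ~100 tokens
        Message { role: "user", text: "c".repeat(40) },       // ~10 tokens
    ];
    let window = sliding_window(&history, 120);
    println!("kept {} of {} messages", window.len(), history.len());
}
```

A production system would combine this with summarization of the dropped prefix rather than discarding it outright.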

Trade-offs we accept

  • Abstraction cost: Provider-specific features may be harder to access. Some capability differences are papered over.
  • Complexity: More moving parts than direct API calls. Useful complexity, but complexity nonetheless.
  • Maintenance: Must track API changes across all supported providers. New models require integration work.

Motivation

We started with direct API calls to Claude. Then we needed GPT-4 for comparison. Then Gemini for long context. Then local models for privacy-sensitive workloads.

Each addition meant new code paths, new error handling, new retry logic. The agent logic was drowning in provider-specific details.

Converge-Provider extracts what agents actually need:

Completion: Give me a response to this prompt, with this schema, using tools if needed.

Streaming: Same, but stream the response as it generates.

Context: Manage history that exceeds any single context window.

Tools: Call functions with validated inputs, handle errors, retry if needed.

The provider handles the rest. Claude's tool_use format differs from OpenAI's function_calling? Converge-Provider translates. Gemini's streaming chunking is different? Abstracted away.

Our agent code doesn't know which model it's using. That's the point.
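As a standalone illustration of what that translation involves, the sketch below maps one neutral tool definition onto two simplified provider wire formats. The shapes are abbreviated (real payloads also carry JSON Schemas for parameters), and the function names are ours, not Converge-Provider's:

```rust
// Simplified sketch: one neutral tool definition, two provider wire formats.
// Real payloads also include parameter schemas; omitted here for brevity.
struct ToolDef {
    name: &'static str,
    description: &'static str,
}

// OpenAI-style function calling wraps the tool under a "function" key.
fn to_openai(t: &ToolDef) -> String {
    format!(
        r#"{{"type":"function","function":{{"name":"{}","description":"{}"}}}}"#,
        t.name, t.description
    )
}

// Anthropic-style tool_use keeps name and description at the top level.
fn to_anthropic(t: &ToolDef) -> String {
    format!(r#"{{"name":"{}","description":"{}"}}"#, t.name, t.description)
}

fn main() {
    let search = ToolDef {
        name: "search",
        description: "Search the knowledge base",
    };
    println!("{}", to_openai(&search));
    println!("{}", to_anthropic(&search));
}
```

Multiply this by streaming chunk formats, error shapes, and retry semantics, and the case for centralizing the translation becomes clear.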

Recommendation

Model selection by task:

  • Complex reasoning: Claude Opus (best at multi-step reasoning)
  • General tasks: Claude Sonnet / GPT-4o (good balance of capability and cost)
  • Code generation: Claude / GPT-4 / DeepSeek (all strong; test for your domain)
  • Long documents: Gemini 2.0 Pro (1M+ context handles full codebases)
  • Cost-sensitive: GPT-4o-mini / Llama 3.3 (10-100x cheaper than frontier models)
  • Privacy-critical: Llama 3.3, self-hosted (data never leaves your infrastructure)
  • Latency-critical: GPT-4o / Claude Haiku (optimized for speed)

Agent architecture recommendations:

  1. Start with one model. Get the agent working before adding complexity.

  2. Add fallbacks. When Claude is down, fall back to GPT-4. Converge-Provider handles this.

  3. Specialize gradually. Use fast/cheap models for classification and routing. Reserve frontier models for reasoning.

  4. Always trace. Agent debugging without observability is impossible. Every call, every decision, logged.

  5. Structure everything. Never parse free-form text. Define schemas. Let the provider enforce them.
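Recommendation 2 can be sketched as a simple fallback chain. The `complete` stub below stands in for a real provider call, and the hard-coded outage condition exists only to demonstrate the fallback path:

```rust
// Illustrative fallback chain: try the primary model, then each fallback
// in order, returning the first success. complete() is a stub simulating
// a provider call; the hard-coded outage is for demonstration only.
fn complete(model: &str, prompt: &str) -> Result<String, String> {
    if model == "claude-sonnet-4" {
        Err("provider outage".to_string())
    } else {
        Ok(format!("[{model}] response to: {prompt}"))
    }
}

fn complete_with_fallback(models: &[&str], prompt: &str) -> Result<String, String> {
    let mut last_err = String::from("no models configured");
    for model in models {
        match complete(model, prompt) {
            Ok(resp) => return Ok(resp),
            Err(e) => last_err = e, // record the failure, try the next model
        }
    }
    Err(last_err)
}

fn main() {
    let chain = ["claude-sonnet-4", "gpt-4o", "gemini-2.0-pro"];
    match complete_with_fallback(&chain, "hello") {
        Ok(resp) => println!("{resp}"),
        Err(e) => println!("all models failed: {e}"),
    }
}
```

A real implementation would also distinguish retryable errors (rate limits, timeouts) from permanent ones before moving down the chain.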

Converge-Provider usage:

converge-provider.model = "claude-sonnet-4"
converge-provider.fallback = ["gpt-4o", "gemini-2.0-pro"]
converge-provider.structured_output = true
converge-provider.tools = [search, calculate, lookup]

The abstraction pays for itself on the first provider outage, the first model upgrade, the first time you need to compare models.

Examples

src/agent/provider.rs (Rust)
use converge_provider::{Provider, Model, Tool, Schema, StreamChunk, Error, Result};
use futures::StreamExt; // for stream.next()
use serde::{Deserialize, Serialize};

// Define structured output schema
#[derive(Debug, Serialize, Deserialize, Schema)]
struct AnalysisResult {
    sentiment: Sentiment,
    confidence: f32,
    key_points: Vec<String>,
    suggested_actions: Vec<Action>,
}

#[derive(Debug, Serialize, Deserialize, Schema)]
enum Sentiment { Positive, Negative, Neutral, Mixed }

#[derive(Debug, Serialize, Deserialize, Schema)]
struct Action {
    description: String,
    priority: u8,
}

// Define tools the agent can use
fn search_tool() -> Tool {
    Tool::new("search")
        .description("Search the knowledge base")
        .param("query", "string", "Search query")
        .param("limit", "integer", "Max results (default 10)")
}

async fn analyze_document(provider: &Provider, document: &str) -> Result<AnalysisResult> {
    // Provider handles model selection, retries, structured output
    let result = provider
        .completion()
        .model(Model::ClaudeSonnet)
        .fallback(Model::Gpt4o)
        .system("You are a document analyst. Analyze the provided document.")
        .user(document)
        .tools([search_tool()])
        .structured_output::<AnalysisResult>()
        .send()
        .await?;

    Ok(result)
}

// Streaming with structured output
async fn stream_analysis(provider: &Provider, document: &str) -> Result<AnalysisResult> {
    let mut stream = provider
        .completion()
        .model(Model::ClaudeSonnet)
        .user(document)
        .structured_output::<AnalysisResult>()
        .stream()
        .await?;

    while let Some(chunk) = stream.next().await {
        match chunk? {
            StreamChunk::Partial(json) => println!("Progress: {}", json),
            StreamChunk::ToolCall(call) => println!("Tool: {} ({})", call.name, call.id),
            StreamChunk::Complete(result) => return Ok(result),
        }
    }
    Err(Error::StreamEnded)
}

Converge-Provider abstracts model differences. Structured outputs are schema-enforced. Fallbacks handle provider outages. Streaming works with tools and structured output.

src/agent/multi_model.rs (Rust)
use converge_provider::{Provider, Model, Router, Comparison, Result};

// Route to different models based on task
fn create_router() -> Router {
    Router::new()
        // Fast/cheap for classification
        .route("classify", Model::Gpt4oMini)
        .route("extract_entities", Model::Gpt4oMini)

        // Frontier for reasoning
        .route("analyze", Model::ClaudeSonnet)
        .route("plan", Model::ClaudeOpus)

        // Long context for documents
        .route("summarize_long", Model::Gemini2Pro)

        // Local for privacy
        .route("process_pii", Model::Llama3_70b_Local)

        // Default
        .default(Model::ClaudeSonnet)
}

async fn process_request(provider: &Provider, task: &str, input: &str) -> Result<String> {
    let router = create_router();
    let model = router.select(task);

    let response = provider
        .completion()
        .model(model)
        .user(input)
        .send()
        .await?;

    Ok(response.text())
}

// Parallel model comparison for evaluation
async fn compare_models(provider: &Provider, prompt: &str) -> Result<Comparison> {
    let models = [Model::ClaudeSonnet, Model::Gpt4o, Model::Gemini2Pro];

    let results = futures::future::join_all(
        models.iter().map(|model| {
            provider.completion().model(*model).user(prompt).send()
        })
    ).await;

    Ok(Comparison::new(models.iter().zip(results).collect()))
}

Router directs tasks to appropriate models. Parallel execution enables A/B testing and model comparison. Privacy-sensitive data routes to local models automatically.


Stockholm, Sweden

Version 1.1

Kenneth Pernyér