
Why Claude

AI that reasons, not just generates

v1.1 · 10 min read · Kenneth Pernyér

Tags: claude, anthropic, llm, ai, reasoning

The Problem

Large language models generate plausible text. That's not the same as generating correct text.

Plausibility is dangerous for business systems.

An LLM that confidently generates wrong answers is worse than no LLM at all. When the output is code, a contract clause, or a financial calculation, "sounds right" isn't good enough.

We needed an AI that:

  • Reasons through problems, not just pattern matches
  • Acknowledges uncertainty rather than confabulating
  • Follows instructions precisely when given
  • Can be steered toward specific behaviors reliably

Current Options

GPT-4 (OpenAI) · Industry standard. Broad capabilities, massive scale.

  Pros:
  • Largest ecosystem and tooling
  • Strong general capabilities
  • Multimodal (vision, audio)
  • Function calling support

  Cons:
  • Can confidently generate incorrect outputs
  • Instruction following can be inconsistent
  • Less transparent about uncertainty
  • Safety measures sometimes over-trigger

Claude (Anthropic) · Focused on reasoning and instruction following.

  Pros:
  • Excellent instruction following
  • Acknowledges uncertainty
  • Long context window (200K tokens)
  • Constitutional AI approach

  Cons:
  • Smaller ecosystem than OpenAI
  • Can be overly cautious
  • Limited multimodal capabilities
  • Fewer fine-tuning options

Open Source (Llama, Mistral) · Self-hosted models for control and privacy.

  Pros:
  • Full control over deployment
  • No data leaves your infrastructure
  • No per-token costs at scale
  • Customization via fine-tuning

  Cons:
  • Lower capability than frontier models
  • Significant infrastructure required
  • Expertise needed for deployment
  • Slower iteration on capabilities

Future Outlook

The LLM landscape is consolidating around a few leaders while open source catches up.

Reasoning is the differentiator.

As models become commoditized on basic tasks, the frontier moves to reasoning—multi-step problem solving, self-correction, uncertainty quantification. Claude's focus on these capabilities positions it well.

Anthropic's Constitutional AI approach provides a path to reliability that pure scale doesn't. When you can specify desired behaviors and train toward them, you get more predictable outputs.

The future is likely hybrid: frontier models for reasoning, smaller models for routine tasks, local models for latency-sensitive or privacy-critical operations.

Our Decision

Why we chose this

  • Instruction following: does what you ask, not what it thinks you meant
  • Honest uncertainty: says "I don't know" rather than confabulating
  • Long context: the 200K token window fits an entire codebase in context
  • Constitutional AI: trained for specific behaviors, not just capabilities

Trade-offs we accept

  • Ecosystem size: fewer integrations and tools than OpenAI
  • Caution: can refuse valid requests due to safety measures
  • Cost: premium pricing for a frontier model

Motivation

For AI-assisted development, instruction following is everything. When we ask Claude to implement a function with specific constraints, we need it to follow those constraints, not improvise.

Claude's willingness to say "I'm not sure" is equally valuable. For business-critical systems, confident wrong answers are dangerous. We'd rather have the AI flag uncertainty and ask for clarification.

The long context window enables patterns that other models can't support. We can include entire modules, multiple files, and extensive documentation in a single prompt. This context awareness produces better outputs than stitching together multiple short-context calls.
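The multi-file pattern above can be sketched as a small prompt-assembly helper. This is a minimal illustration, not code from our module; the helper name, file format, and delimiter convention are assumptions.

```typescript
// Sketch: pack several source files into one long-context prompt.
// The "=== path ===" delimiter is an arbitrary convention so the model
// can refer to files by name; any unambiguous separator works.
interface SourceFile {
  path: string;
  content: string;
}

export function buildCodebasePrompt(files: SourceFile[], question: string): string {
  // Tag each file with its path, then join everything into one prompt.
  const fileSections = files
    .map(f => `=== ${f.path} ===\n${f.content}`)
    .join('\n\n');

  return `Here is the relevant part of the codebase:\n\n${fileSections}\n\nQuestion: ${question}`;
}
```

Because the whole prompt is one string, the model sees cross-file relationships directly instead of inferring them across separate calls.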

Recommendation

Use Claude for:

  • Code generation where instruction following matters
  • Document analysis leveraging long context
  • Reasoning tasks requiring multi-step logic
  • Anything requiring reliability over creativity

Pair Claude with a smaller, faster model for routine tasks where speed matters more than reasoning depth.
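The pairing can be as simple as a routing function. A minimal sketch: the task categories are illustrative, and the model IDs are assumptions that should be checked against Anthropic's current model list.

```typescript
// Sketch of a simple router: frontier model for reasoning-heavy work,
// a smaller model for routine tasks. Model IDs are assumptions.
type TaskKind = 'reasoning' | 'routine';

export function pickModel(task: TaskKind): string {
  switch (task) {
    case 'reasoning':
      // Code generation, document analysis, multi-step logic.
      return 'claude-sonnet-4-20250514';
    case 'routine':
      // Classification, extraction, short summaries where speed matters.
      return 'claude-3-5-haiku-20241022';
  }
}
```

Centralizing the choice in one function makes it easy to swap model IDs as new versions ship, without touching call sites.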

Use the API with structured outputs when possible. Claude's instruction following makes it excellent at producing JSON, TypeScript, or other structured formats reliably.

Examples

lib/ai/claude.ts (TypeScript)
import Anthropic from '@anthropic-ai/sdk';

// Reads ANTHROPIC_API_KEY from the environment by default.
const anthropic = new Anthropic();

export async function generateWithSchema<T>(
  prompt: string,
  schema: string,
  examples: { input: string; output: T }[]
): Promise<T> {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 4096,
    messages: [
      {
        role: 'user',
        // Few-shot prompt: schema first, worked examples, then the task.
        content: `You are a JSON generator. Output only valid JSON matching this schema:
${schema}

Examples:
${examples.map(e => `Input: ${e.input}\nOutput: ${JSON.stringify(e.output)}`).join('\n\n')}

Now generate JSON for:
${prompt}

Output only the JSON, no explanation.`,
      },
    ],
  });

  // The response is a list of content blocks; take the first text block.
  const text = response.content[0].type === 'text'
    ? response.content[0].text
    : '';

  // Throws if the model returned anything other than valid JSON.
  return JSON.parse(text) as T;
}

Claude excels at structured output generation. Clear instructions and examples produce reliable, parseable JSON.
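Even with "output only the JSON" instructions, models occasionally wrap their answer in a Markdown fence, which would make a bare `JSON.parse` throw. A defensive parse step guards against that; this is a sketch of the idea, not part of the module above, and the helper name is illustrative.

```typescript
// Strip an optional ```json ... ``` wrapper before parsing.
// Defensive companion to a function like generateWithSchema.
export function parseModelJson<T>(raw: string): T {
  const trimmed = raw.trim();
  // Remove a leading ```json (or bare ```) fence and a trailing ``` fence.
  const unfenced = trimmed
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/```\s*$/, '');
  return JSON.parse(unfenced) as T;
}
```

Plain JSON passes through unchanged, so the helper is safe to apply unconditionally to model output.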


Stockholm, Sweden

