Building the 'Last Piece of Software' Means Building an Intent Codec
Why the decoder for trust is the real product
Lovable's tagline—"Building the last piece of software"—is provocative in exactly the right way: it forces a hard question.
If software creation is becoming cheap, what does it really mean to be the last piece—and what problem must that last piece actually solve?
Here is the claim:
The last piece of software is an intent codec: it minimizes what humans must specify and maximizes what systems can safely reconstruct.
And a corollary that matters more than any demo:
The winners won't be whoever renders the prettiest app from a prompt. They'll be whoever owns the decoder for trust.
Lovable is one of a small set of teams positioned to matter here. They have a brand, a community, and a channel to the market that most infra teams will never touch. Anthropic and others have similarly strong positions on the model side. The argument below is written in the spirit of: here's how to make sure "last piece of software" is more than a slogan.
1) Software is turning into compression
The right analogy isn't "AI writes code." It's compression.
Compression is never just about the encoder. It is about what the encoder can assume the decoder already knows how to reconstruct.
The evolution of codecs tells the story:
- MJPEG: every frame is a full JPEG. The encoder does all the work; the decoder is dumb and just renders pixels. Predictable, but expensive—you ship everything.
- H.264/AVC: the encoder ships keyframes plus deltas (motion vectors, residuals, predictions) and the decoder reconstructs intermediate frames. The smarter the motion model in the decoder, the more aggressively the encoder can compress.
- H.265/HEVC, AV1, and beyond: larger block sizes, better prediction, film grain synthesis, neural post-processing. As the decoder's world model improves, you no longer ship frames; you ship just enough signal for the decoder, under shared assumptions, to rebuild them.
The breakthrough now is decoders that understand context, not just blocks and motion.
NVIDIA's Maxine is a canonical example. Instead of sending compressed pixels, it extracts facial keypoints on the sender side and uses a generative model on the receiver side to render realistic faces from those sparse keypoints, achieving roughly an order-of-magnitude bandwidth reduction versus H.264 for video conferencing while preserving perceived quality. The decoder doesn't just decompress; it understands faces well enough to regenerate them from a semantic representation.
This generalizes:
- Video conferencing: ship facial keypoints; the decoder reconstructs talking heads because it understands facial structure and motion.
- Sports and live video: ship semantic or physical structure—player positions, ball trajectories, event labels—and let a learned decoder fill in visually plausible frames.
- Neural codecs: ship a compact latent or structured representation and let a powerful decoder reconstruct a high-quality video stream on commodity devices in real time.
Every generation of compression follows the same rule:
Invest in the decoder's understanding of the domain, and you can strip the payload down to what the decoder cannot infer.
When the decoder is dumb, you ship pixels. When it understands motion, you ship motion vectors. When it understands faces, you ship keypoints. When it understands the domain, you ship intent.
Software is following the same trajectory.
Traditional software is MJPEG: you ship everything—code, workflows, integration logic, infrastructure, tests, edge-case handling—frame by frame. The runtime is a dumb decoder: it executes exactly what you gave it.
AI-native software wants to be Maxine: you ship a semantic representation of what you want, and a reconstruction engine that understands your domain fills in the rest.
That's where the analogy breaks—and why Lovable, Anthropic, and others need a stronger core than "prompt → app."
2) Where the analogy breaks: business has no physics
Maxine has a safety net: perceptual fidelity. The reconstructed face just needs to look right. If a few pixels are wrong, no one notices.
Business systems have no such luxury.
A video codec that hallucinates a few pixels is fine. A business system that hallucinates an approval, a payment, or a compliance state is a liability.
Faces are constrained by physics and anatomy. The decoder will not produce a three‑eyed face because its training data and architecture enforce natural invariants.
Business logic has no natural invariants. A confident model can generate any plausible-sounding outcome. There is no "physics of invoices" or "anatomy of approvals." Plausible and correct are different properties.
So the question is not just "can we compress intent?"
The question is: what guarantees that reconstruction doesn't hallucinate a business outcome?
The answer is not "a better model." It is a decoder architecture that knows the difference between "this looks right" and "this is allowed to be true." One that can enforce constraints, verify invariants, and refuse to proceed when it cannot guarantee correctness.
That's the decoder for trust.
This is exactly where UX‑first "prompt → app" products risk drifting: creation gets magical, but the trust boundary becomes implicit, ad‑hoc, or bolted on later.
3) Writing as a re-rendered medium
If video compression shows where the technology is heading, writing shows where the experience is heading.
Historically, writing was asymmetric:
- The writer does heavy work—research, structuring, drafting, revising—to produce a fixed artifact.
- The reader receives that artifact and extracts meaning through effort.
The payload is the document. Heavy, complete, the same for everyone. This is MJPEG for ideas.
Now both encoder and decoder are becoming intelligent:
As a writer, AI can compress messy thoughts into structured arguments, translate between registers, explore variations, highlight logical gaps, and expand terse notes into full prose. The job shifts from "produce the final artifact" to "specify intent clearly enough that an intelligent system can help render it."
As a reader, AI can summarize long documents, answer follow-up questions, translate jargon into your domain, adapt presentation to your background, extract action items, and cross-check claims against other sources. The job shifts from "decode the fixed artifact" to "reconstruct understanding from a semantic payload."
When both sides become intelligent, the artifact in the middle matters less than the intent that produced it.
Maxine ships keypoints instead of pixels, trusting the decoder to reconstruct your face. Writers begin to ship structured intent instead of fixed prose, trusting the reader's AI to reconstruct the presentation.
The document becomes a waypoint—a serialization of meaning that gets re-rendered at read time.
That raises questions for writing—authorship, canonical versions, accountability when AIs transform text—but for software the stakes are far higher.
When a user says "build me a CRM" and an AI generates one, who is accountable for what it does?
- The user who prompted it?
- The model that generated it?
- The platform that hosted it?
- The runtime that executed it?
In a world where anyone can "generate an app," the scarce asset is not the app. It is the system that ensures generated apps respect reality.
That means:
- Constraints that must hold regardless of what was generated.
- Authority boundaries that cannot be crossed by confident inference.
- Audit trails that explain why this outcome, not another.
- Reproducibility so Tuesday's behavior can be debugged on Wednesday.
Maxine can reconstruct faces because faces have physics. Writing can be re-rendered because language has conventions and social norms. Business software needs explicit constraints because business logic has neither.
The decoder for trust is not optional. It is the product.
4) Intent compression needs an "intent packet"
Video and writing both point to the same structural truth:
As decoders get more capable, what you ship approaches pure intent.
Maxine ships keypoints. Writers ship semantic structure. Software wants to ship jobs to be done.
But they also expose the trap:
Intent without constraints is a wish. Reconstruction without guarantees is guessing.
In video, the constraint is perceptual fidelity. In writing, semantic fidelity and social accountability. In software, especially business software, the constraint is operational correctness—with no external ground truth. The system is the truth-maker.
"Prompt → app" demos showcase a better encoder (natural language → code), but they hide the decoder problem: what guarantees that the generated system does what was actually needed, not what the model inferred?
To fix this, you need an explicit intent packet. Not a vibe, not "build me a CRM," but a minimum viable specification of what must be true:
**Jobs-to-be-done.** Outcomes, not features.
- Track every customer interaction so nothing falls through the cracks.
- Surface deals at risk before they stall.
- Hand off qualified leads to sales within 4 hours.
**Constraints and invariants.** Business physics: what must always be true or never happen.
- Every contact has exactly one owner.
- No outreach without consent verification.
- Revenue recognition follows contract terms, not invoice dates.
**Authority model.** What can be automated vs. what requires explicit approval.
- AI drafts emails; humans approve sends.
- AI scores leads; humans route key accounts.
- AI proposes discounts up to 10%; higher requires escalation.
**Contracts for outputs.** The valid shape and semantics of results.
- A qualified lead must have: budget confirmed, authority identified, timeline under 90 days.
- A completed task must include assignee sign-off, artifact link, timestamp.
**Budgets and stop conditions.** When and how the system stops honestly.
- Retry up to N times, then escalate with full context.
- If confidence drops below threshold, pause and ask.
- Never cross a declared boundary without human confirmation.
**Traceability requirements.** What must be recorded to explain and reproduce behavior.
- Every state change logged with actor, timestamp, and reason.
- Every AI inference tagged with model version and input hash.
- Every decision reversible or at least auditable.
This is the intent packet. It is what the encoder (human, app builder, or upstream system) must specify, and what the decoder (reconstruction engine) uses to safely fill in everything else.
Maxine needs keypoints to reconstruct faces. The decoder for trust needs the intent packet to reconstruct business behavior.
If you cannot express this, you cannot safely reconstruct. You're shipping vibes and hoping the decoder guesses correctly.
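To make the packet concrete, here is a minimal sketch of an intent packet as a typed record in Python. Every class, field, and value here is illustrative, an assumption for this example rather than a proposed schema:

```python
from dataclasses import dataclass, field

# Illustrative only: a hypothetical IntentPacket type showing how the six
# components above could be captured as structured data instead of prose.
@dataclass(frozen=True)
class IntentPacket:
    jobs: list[str]                  # outcomes, not features
    invariants: list[str]            # must always hold, regardless of generation
    authority: dict[str, str]        # action -> "auto" | "human_approval" | "escalate"
    contracts: dict[str, list[str]]  # output type -> required fields/conditions
    budgets: dict[str, int]          # e.g. retry limits, confidence thresholds
    traceability: list[str] = field(default_factory=list)

packet = IntentPacket(
    jobs=["Hand off qualified leads to sales within 4 hours"],
    invariants=["Every contact has exactly one owner"],
    authority={"send_email": "human_approval", "score_lead": "auto"},
    contracts={"qualified_lead": ["budget_confirmed", "authority_identified"]},
    budgets={"max_retries": 3},
    traceability=["log actor, timestamp, reason for every state change"],
)
```

The point of the sketch is that each component is machine-checkable: a kernel can look up whether `send_email` is allowed to run without approval, rather than inferring it from a prompt.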
For Lovable and peers, the opportunity is obvious: turn "prompting" into structured intent packet authoring—without killing the UX.
5) Multi-agent reconstruction, single kernel
A single giant model is not "the last software." It is a powerful guesser.
Real jobs-to-be-done need:
- Specialized agents (research, planning, execution, validation, monitoring).
- Collaboration patterns (fan-out exploration, fan-in synthesis, pipelines with dependencies).
- A single kernel that enforces the boundary between suggestion and authority.
Most agent frameworks treat "multi-agent" as "more autonomy in more places," and it shows up as:
- Unbounded loops of agents refining forever.
- Hidden retries masking failure as "iteration."
- Self-reflection theater with no proofs.
- Silent state mutation where agents change facts without authorization.
- Confident answers with no traceable provenance.
Compression gives the right pattern.
In Maxine, the incoming keypoints are only a proposal; the GAN decoder is the authority on what a face looks like. Corrupted keypoints do not produce a three‑eyed human, because the decoder's learned constraints block it.
In multi-agent systems, you need the same: agents as encoders, kernel as decoder.
- Agents compress their specialized work into proposals.
- The kernel evaluates proposals against the intent packet.
- Only validated proposals become facts in an append-only log.
- Everything else stays as context.
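The proposal/fact boundary above can be sketched in a few lines of Python. All names here are hypothetical, a sketch of the pattern rather than any real kernel: agents submit proposals, the kernel checks authority and invariants, and only validated proposals reach the append-only fact log.

```python
from dataclasses import dataclass

@dataclass
class Proposal:
    actor: str
    action: str
    payload: dict

class Kernel:
    """Illustrative kernel: agents encode proposals, the kernel decides facts."""

    def __init__(self, invariants, authority):
        self.invariants = invariants   # predicates over (action, payload)
        self.authority = authority     # action -> set of actors allowed to propose it
        self.fact_log = []             # append-only: validated proposals only

    def submit(self, p: Proposal) -> bool:
        if p.actor not in self.authority.get(p.action, set()):
            return False               # authority boundary: reject, never infer scope
        if not all(check(p.action, p.payload) for check in self.invariants):
            return False               # invariant violated: proposal stays context
        self.fact_log.append(p)        # only now does the proposal become a fact
        return True

kernel = Kernel(
    invariants=[lambda action, payload: payload.get("owner") is not None],
    authority={"create_contact": {"crm_agent"}},
)
ok = kernel.submit(Proposal("crm_agent", "create_contact", {"owner": "alice"}))
rejected = kernel.submit(Proposal("rogue_agent", "create_contact", {"owner": "bob"}))
```

Here `ok` is accepted and logged; `rejected` fails the authority check and never touches state. The rejection is a return value, not an exception: an unauthorized proposal is ordinary, expected input.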
This is the architecture that makes "last piece of software" coherent:
- Multiple specialized agents contribute proposals.
- Proposals are suggestions, not actions.
- A single kernel enforces jobs, constraints, authority, contracts, budgets, traceability.
- Only kernel‑approved state transitions become truth.
Just as Maxine will not render a three‑eyed face, the kernel must not commit a business outcome that violates the intent packet.
6) What the decoder for trust must do
Given this framing, the decoder requirements become crisp:
Enforce intent, don't infer it. Jobs, constraints, authority, contracts, budgets, traceability are inputs, not hints. The decoder does not guess what was meant; it upholds what was declared.
Separate proposals from facts. Agents propose; the kernel validates. Only validated proposals become truth. That's not bureaucracy—it is how you avoid authority leaks in multi-agent environments.
Make convergence explicit. You have an answer not when "the model stops talking," but when proposals converge to a state that satisfies contracts. Convergence should be detectable, not assumed.
Make stopping honest. When constraints cannot be satisfied, budgets are exhausted, or confidence is insufficient, the system must stop with a clear explanation, not fill the gap with confident prose. In video you can degrade gracefully; in business you must fail explicitly.
Make replay possible. If Tuesday's behavior cannot be reproduced on Wednesday, it cannot be debugged, audited, or trusted. Determinism where it matters is non‑negotiable.
Make authority explicit. Every action has an actor; every actor has a scope; scope is declared and enforced, not inferred from capability.
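Honest stopping and explicit convergence can be sketched as a loop in which every exit path is labeled. The function, thresholds, and return shape below are assumptions made for illustration, not a real API:

```python
def run_until_converged(propose, satisfies_contract, confidence,
                        max_attempts=3, min_confidence=0.8):
    """Return ("converged", result), ("paused", reason), or ("escalate", reason).

    Illustrative only: convergence is a contract check, stopping is explicit,
    and budget exhaustion escalates instead of masking failure as iteration.
    """
    for attempt in range(1, max_attempts + 1):
        result = propose()
        if confidence(result) < min_confidence:
            return ("paused", f"confidence below {min_confidence} on attempt {attempt}")
        if satisfies_contract(result):
            # Convergence is detected, not assumed from the model going quiet.
            return ("converged", result)
    return ("escalate", f"budget of {max_attempts} attempts exhausted")

# Toy usage: the contract passes on the second proposal.
attempts = []
def propose():
    attempts.append(1)
    return {"value": len(attempts)}

status, info = run_until_converged(
    propose,
    satisfies_contract=lambda r: r["value"] >= 2,
    confidence=lambda r: 0.9,
)
# status is "converged" after two attempts; a failing contract would
# instead surface ("escalate", ...) with the budget spelled out.
```

The design choice worth noticing is that "paused" and "escalate" are first-class outcomes with reasons attached, so a caller can audit why the system stopped rather than receiving confident prose in place of an answer.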
That is the decoder for trust.
Not a model. Not a prompt library. Not a workflow builder with AI steps.
A kernel that enforces the boundary between suggestion and authority, inference and fact, plausible and proven.
7) Who is actually positioned to build the "last piece"?
If "last software" just means "prompt → UI + backend," UX‑first teams like Lovable have a real advantage. They make the first 80% feel effortless.
But business software doesn't break in the first 80%. It breaks in the last 20%:
- Hidden authority decisions.
- Edge cases and compliance.
- Drift over time.
- "Who approved this?"
- "Why did it do that on Tuesday at 09:00?"
That's why the landscape looks like three overlapping lanes:
| Lane | Focus | Strength | Risk |
|---|---|---|---|
| UX-first (Lovable, Bolt, others) | Creation, speed, accessibility | Own the market's imagination and the front door | Governance becomes a retrofit; beautiful demos crumble under real-world pressure |
| Model-first (Anthropic, OpenAI, Google, etc.) | General decoders (frontier models) | Best raw generation, broad tooling ecosystems | Governance is "best effort," and best effort quietly becomes authority |
| Governance-kernel-first (trust substrate) | Correctness boundaries, traceability, explicit authority | Defensible trust layer under any generator | Harder to explain, less flashy, slower to demo—until production demands answers |
If we talk seriously about the "last piece," it is not the prettiest generator. It is the trust kernel underneath.
This is where Lovable, Anthropic, and similar players can either reinforce their core—or drift into "amazing demos, fragile production." The goal of this argument is to help them choose the former.
At Converge, the bet is explicit: AI as infrastructure, not charisma. Multi-agent freedom at the edges, mathematically constrained authority at the core.
8) The bet
Most debates about who will "build the last piece of software"—Lovable, Anthropic, OpenAI, Google, others—focus on encoder quality:
Who generates the most impressive output from the smallest prompt?
That's the wrong question.
The right question is:
Who will ship the decoder that makes reconstruction trustworthy?
Encoder-side capabilities are getting commoditized. Models will get better. Generation will get cheaper. The gap between "pretty demo" and "runs in production" will keep widening.
The decoder side is where defensibility lives:
- Governance as architecture, not an afterthought.
- Contracts as enforcement, not documentation.
- Authority as explicit grants, not accumulated convenience.
- Convergence as a mathematical property, not vibes.
The last piece of software is an intent codec.
- The encoder is the human or system specifying intent packets: jobs, constraints, authority, contracts, budgets, traceability.
- The decoder is the kernel that reconstructs correct behavior from that specification, using whatever agents, models, and tools are appropriate, while guaranteeing that reconstruction stays within bounds.
NVIDIA built Maxine by investing in a decoder that understands faces. The "last piece of software" requires a decoder that understands trust.
That is the problem Converge is designed to solve—exactly that, nothing more.
AI that can talk is not a system that can run a business. The gap is governance. Autonomy is cheap. Trust is engineered.
Sources: NVIDIA Maxine | Neural Talking-Head Synthesis | CVPR 2025 Real-Time Neural Codecs