---
title: "Agentic vs Workflow-based AI"
pubDate: 2026-04-19T00:00:00.000Z
updatedDate: 2026-04-19T00:00:00.000Z
tags:
  - AI
image: /images/blog/workflow-vs-agent/featured.png
---

Progress in frontier AI models has allowed coding agents to tackle ever more complex and long-running tasks. Benchmarks are saturating fast, and the trajectory is genuinely impressive.

What makes coding agents so effective is a combination of three things. First, software is largely composed of patterns LLMs have seen countless times during training — syntax, idioms, libraries, algorithms. Second, code gives you *automated evaluation for free*. You run it, and you get a compiler error, a failing test, or a stack trace. The agent doesn't need to know whether it's right — the environment tells it. Third, and most importantly, agents can iterate cheaply. A bad attempt costs a few seconds of compute, and nothing irreversible happens. In many benchmarks — such as [tbench](https://www.tbench.ai/) — coding agents are given nothing but terminal access and are left to try things, observe results, and loop until the task is done. That tight feedback loop is what allows them to crack problems that would have seemed out of reach just a year ago.

But this combination — abundant training data, automated feedback, and cheap retries — is specific to code. Most business tasks don't have any of it. There is no compiler for procurement policy. LLMs haven't seen your internal approval chains or vendor contracts during training. And placing a wrong order isn't something you can just undo and retry. To understand what this means in practice, let's walk through a concrete example.

---

## The scenario

Bob from OilVentures sends the following request to his procurement department:

> I need a new laptop for my work. I would like:
> - MacBook Pro
> - M4 Max Chip
> - 16 inch
> - 32GB RAM
> - 1TB Storage
>
> I am working in the finance department. Can you please find the best options for me and make the purchase?

The procurement department handles requests like this regularly and wants to automate it. Two constraints are relevant:

- **AwesomeMarketplace** (their vendor) requires brand, model, chip, RAM, and storage to filter products.
- **OilVentures policy** caps laptop spend at **$3,000** for the finance department.

We'll look at three ways to automate this.

---

## 1. The fully agentic approach

A single Procurement Agent gets access to all three tools — marketplace search, company policy, and order placement — and figures out the rest on its own.

```mermaid
flowchart TD
    INPUT([Bob's Laptop Request])
    INPUT --> A1[🤖 Procurement Agent\nreasoning loop]

    A1 -- Search products --> A3[🔍 AwesomeMarketplace\nSearch Tool]
    A1 -- Check policy --> A4[📋 Company Policy\nTool]
    A1 -- Place order --> A5[🛒 AwesomeMarketplace\nOrder Tool]
    A1 -- Not compliant --> OUT2([❌ Order not compliant,\nnotify Bob])

    A3 -- Results --> A1
    A4 -- Policy data --> A1
    A5 --> OUT1([✅ Order placed,\nnotify Bob])
```

This looks elegant, but it has a serious problem: there is no guarantee the agent will check the policy *before* placing the order. It might. It might not. And if it places a non-compliant order, you only find out after the fact. Unlike coding tasks — where a failed test is cheap and reversible — a wrongly submitted purchase order has real consequences. As the ServiceNow CEO [put it](https://open.spotify.com/episode/11xeL3QCAF77z1vxOp0am0?si=fc70f33f0dd64ab4): people forgive other people for making mistakes. They don't forgive software.
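To make the failure mode concrete, here is a schematic sketch of the single-agent loop in plain Python. Nothing here is a real LLM or framework: `fake_model_decision` stands in for the model's choice of next action, and the tools, product, and prices are all made up. The loop itself places no constraint on tool order, so a run that never calls the policy tool is entirely possible:

```python
# Schematic single-agent loop: the model, not our code, decides which tool
# runs next. All tools below are stubs with invented data.

def search_marketplace(spec):
    return {"name": "MacBook Pro 16 M4 Max", "price": 3499}

def check_policy(department):
    return {"laptop_cap_usd": 3000}

def place_order(product):
    return f"ORDER PLACED: {product['name']} at ${product['price']}"

def fake_model_decision(step):
    # Stand-in for the LLM's next-action choice. Nothing forces "policy"
    # to come before "order"; this particular run skips it entirely.
    return ["search", "order", "done"][step]

def agent_loop(request):
    product = None
    for step in range(10):
        action = fake_model_decision(step)
        if action == "done":
            break
        if action == "search":
            product = search_marketplace(request)
        elif action == "policy":
            check_policy("finance")
        elif action == "order":
            # May fire before any policy check has happened.
            return place_order(product)

result = agent_loop({"brand": "Apple"})
print(result)  # a $3,499 order sails through, $499 over the cap
```

The only thing standing between this loop and a compliant order is the model's judgment on that run.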

---

## 2. The specialised multi-agent approach

Rather than one agent doing everything, we give each task its own dedicated agent with a single tool. Regular code orchestrates the handoffs between them, and the Search and Policy agents run in parallel.

```mermaid
flowchart TD
    INPUT([Bob's Laptop Request])

    INPUT -->|code handoff| A1
    INPUT -->|code handoff| A2

    subgraph A1 ["🤖 Search Agent"]
        S1[Parse specs & query] -- Search --> T1[🔍 AwesomeMarketplace\nSearch Tool]
        T1 -- Ranked options --> S1
    end

    subgraph A2 ["🤖 Policy Agent"]
        P1[Retrieve budget cap] -- Lookup --> T2[📋 Company Policy\nTool]
        T2 -- Budget cap $3,000 --> P1
    end

    A1 -->|code handoff| A3
    A2 -->|code handoff| A3

    subgraph A3 ["🤖 Order Agent"]
        O1[Check best match against budget]
        O1 --> O2{Price within\n$3,000 cap?}
        O2 -- Yes --> T3[🛒 AwesomeMarketplace\nOrder Tool]
    end

    T3 --> OUT1([✅ Order placed,\nnotify Bob])
    O2 -- No --> OUT2([❌ Order not compliant,\nnotify Bob])
```

This is better. Each agent has a narrow, well-defined job, and code controls the sequencing — so we know the policy check always happens before the order. But three LLM calls for what is largely a structured task still feels like overkill. The Search and Policy agents are essentially doing retrieval with a thin reasoning wrapper. We're paying the cost of LLM unpredictability where we don't need to.
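The handoffs above can be sketched in plain Python, with stub functions standing in for the three LLM-backed agents (all names, products, and prices are illustrative). The point is that the sequencing lives in ordinary code: Search and Policy run concurrently, and the policy result is in hand before the Order agent can act:

```python
# Code-orchestrated handoffs: the *_agent functions stand in for LLM calls;
# everything around them is deterministic sequencing.
from concurrent.futures import ThreadPoolExecutor

def search_agent(request):        # would wrap an LLM + the search tool
    return [{"name": "MacBook Pro 16 M4 Max", "price": 3499},
            {"name": "MacBook Pro 16 M4 Pro", "price": 2899}]

def policy_agent(department):     # would wrap an LLM + the policy tool
    return 3000                   # budget cap in USD

def order_agent(option):          # would wrap an LLM + the order tool
    return f"ordered {option['name']}"

def procure(request, department):
    # Search and Policy agents run in parallel, as in the diagram.
    with ThreadPoolExecutor() as pool:
        options_f = pool.submit(search_agent, request)
        cap_f = pool.submit(policy_agent, department)
        options, cap = options_f.result(), cap_f.result()
    # Code, not a model, guarantees the policy check precedes the order.
    best = min(options, key=lambda o: o["price"])  # cheapest match, for illustration
    if best["price"] > cap:
        return "not compliant, notify Bob"
    return order_agent(best)

print(procure("Bob's laptop request", "finance"))
```

Note that the `if` gate sits in `procure`, outside any agent, which is exactly what the fully agentic version couldn't promise.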

---

## 3. The hybrid approach: extraction agent + code workflow

The key insight is that there is only *one* genuinely hard part of Bob's request: parsing his free-text message into a structured spec the marketplace API can consume. Everything after that — the search, the policy check, the order — can be handled deterministically by regular code.

```mermaid
flowchart TD
    INPUT([Bob's Laptop Request])

    INPUT -->|code handoff| A1

    subgraph A1 ["🤖 Extraction Agent  —  only LLM step"]
        E1[Parse unstructured request]
        E1 --> E2[Structured spec\nbrand, model, chip, RAM, storage]
    end

    A1 -->|code handoff| W1
    A1 -->|code handoff| W2

    subgraph W1 ["⚙️ Marketplace Search"]
        S1[Query AwesomeMarketplace\nwith exact spec filters]
    end

    subgraph W2 ["⚙️ Policy Lookup"]
        P1[Read Finance dept.\nbudget cap from policy DB]
    end

    W1 -->|code handoff| W3
    W2 -->|code handoff| W3

    subgraph W3 ["⚙️ Order Placement"]
        O1[Check best match against budget]
        O1 --> O2{Price within\n$3,000 cap?}
    end

    O2 -- Yes --> OUT1([✅ Order placed,\nnotify Bob])
    O2 -- No --> OUT2([❌ Order not compliant,\nnotify Bob])
```

Why does this work better than the agentic alternatives?

**Use code where code is enough.** The Marketplace Search and Policy Lookup endpoints are unlikely to change often. Implementing a robust integration with exponential backoff, error handling, and retries is a one-time investment — and it'll be more reliable than an agent figuring out how to interact with them on the fly, every time.

**Scope the LLM to what only an LLM can do.** Extracting structured information from unstructured text is a well-defined task. You can build targeted evals, iterate on prompts, use a cheaper model — or even swap the LLM out for a Named Entity Recognition model if your input vocabulary is bounded enough. None of that is possible when the LLM is entangled with search, policy, and ordering logic.
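As a sketch of what such a targeted eval might look like: a handful of gold text-to-spec pairs scored against whatever extractor you plug in. The extractor and cases below are toy stand-ins, not a real harness:

```python
# Toy eval harness for the extraction step. `extract` is a stand-in for
# an LLM or NER model; the gold cases are invented for illustration.
def extract(text: str) -> dict:
    spec = {}
    if "MacBook Pro" in text:
        spec["model"] = "MacBook Pro"
    if "32GB" in text:
        spec["ram_gb"] = 32
    return spec

EVAL_CASES = [
    ("MacBook Pro, 32GB RAM please", {"model": "MacBook Pro", "ram_gb": 32}),
    ("any laptop with 32GB", {"ram_gb": 32}),
]

def run_evals():
    # Exact-match scoring; real evals might score per field instead.
    passed = sum(extract(text) == gold for text, gold in EVAL_CASES)
    return passed / len(EVAL_CASES)

print(run_evals())  # 1.0
```

Because the task boundary is this crisp, swapping in a cheaper model just means re-running the harness and comparing scores.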

**Don't ask an LLM to enforce rules it hasn't seen.** LLMs have seen code countless times in their training data. They have *not* seen your specific business requirements, procurement policies, or vendor contracts. A deterministic policy check is a hard `if` statement. It cannot be reasoned around, hallucinated, or skipped.

**Auditability comes for free.** Every step after extraction is regular code. You can log, test, and trace it exactly like any other software. When something goes wrong — and it will — you know exactly where to look.
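Put together, the whole hybrid pipeline fits in a screenful of Python. Everything below is illustrative (made-up catalogue, caps, and field names), and `extract_spec` is stubbed where the single LLM call would go:

```python
# Minimal sketch of the hybrid pipeline: one LLM step (stubbed), then
# deterministic search, policy lookup, and order placement.
from dataclasses import dataclass

@dataclass
class LaptopSpec:
    brand: str
    model: str
    chip: str
    ram_gb: int
    storage_tb: int

def extract_spec(free_text: str) -> LaptopSpec:
    # The only LLM step: unstructured request -> structured spec.
    return LaptopSpec("Apple", "MacBook Pro 16", "M4 Max", 32, 1)

CATALOGUE = [  # invented marketplace data
    {"model": "MacBook Pro 16", "chip": "M4 Max", "ram_gb": 32, "price": 3499},
    {"model": "MacBook Pro 16", "chip": "M4 Max", "ram_gb": 32, "price": 2950},
]
BUDGET_CAPS = {"finance": 3000}  # the policy "lookup" is a dict read

def search(spec: LaptopSpec) -> list:
    # Deterministic filter on exact spec fields, no LLM involved.
    return [p for p in CATALOGUE
            if p["chip"] == spec.chip and p["ram_gb"] == spec.ram_gb]

def place_order(request: str, department: str) -> str:
    spec = extract_spec(request)
    best = min(search(spec), key=lambda p: p["price"])
    if best["price"] > BUDGET_CAPS[department]:   # the hard `if`: it cannot
        return "rejected: over budget"            # be reasoned around
    return f"ordered at ${best['price']}"

print(place_order("I need a new laptop ...", "finance"))  # ordered at $2950
```

Every line after `extract_spec` is ordinary code you can unit-test, log, and step through in a debugger.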

This is the pattern I'd reach for first in any process automation task: use an LLM to bridge the gap between human language and structured data, then let code do the rest. For a deeper dive into how to implement this cleanly with PydanticAI, see [this post](https://danielfridljand.de/post/pydantic-ai-type-safe-hybrid-workflows).