
Creating Functional Agentic AI

How to Engineer Intent
May 14, 2026 by CBOS (PTY) LTD, Sean Veldboer

For nearly sixty years, the underlying ‘contract’ of software engineering has been rooted in determinism. We inherited a world defined by the Von Neumann architecture: a world of sequential logic, predictable state transitions, and absolute control. If a developer wrote if (x) then {y}, they could go to sleep knowing that as long as the hardware held, ‘y’ would happen every single time ‘x’ occurred. Every aspect of software development was built upon this bedrock of certainty.

But as we transition from software that contains AI to software that is an AI agent, that contract is being torn up.

Agentic AI is not just a “smarter” layer of code. It represents a fundamental shift in how software behaves. We are moving from systems that execute instructions to systems that pursue goals. This evolution changes the mandate of the engineering leader from managing logic to governing conduct: instead of managing code, we will have to concentrate on shaping behaviour.

The birth of intent

In traditional software, the algorithm plays a huge role. It is a recipe: step-by-step instructions that transform an input into a specific output. If the output is wrong, the recipe is flawed. We “debug” by tracing the logic back to the broken step.

In an agentic system, the algorithm is replaced by Intent.

An agent doesn’t follow a static decision tree; it interprets a goal within a context. If you task an agentic system with “minimizing churn for high-value customers by offering personalized retention packages”, you are no longer writing the code that decides what the package is. The agent evaluates the customer’s history, the current market trends, and the available budget, then chooses an action.

This is where a pivot in how leadership functions must take place:

Planning can no longer revolve around “Feature X” or “Feature Y”; it must focus on restraint engineering. As a leader, your value no longer lies in how efficiently your team can map out every logic branch. Instead, it lies in how clearly you can articulate the boundaries of the system’s “agency”, as sketched after the list below:

  • What are the non-negotiables? (Ethical, legal, and safety guardrails).
  • What are the trade-offs? (Is accuracy more important than latency? Is cost-saving more important than customer delight?)
  • Where is the “Escalation Threshold”? (The precise moment an autonomous actor must stop and ask a human for permission).
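As a loose illustration, and not a framework drawn from any of the sources below, these boundaries can be captured as a machine-readable charter that every proposed action is checked against before execution. The field names and the 0.7 threshold are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class AgentCharter:
    """Hypothetical machine-readable statement of intent and boundaries."""
    goal: str
    non_negotiables: list[str] = field(default_factory=list)   # hard ethical/legal/safety guardrails
    tradeoffs: dict[str, float] = field(default_factory=dict)  # relative weights, e.g. accuracy vs. cost
    escalation_threshold: float = 0.7                          # risk score above which a human decides

def requires_human(charter: AgentCharter, action: dict) -> bool:
    """Return True when a proposed action crosses the escalation threshold."""
    # Non-negotiables are enforced upstream as hard blocks; this only decides
    # whether the agent may proceed without asking a human first.
    return action.get("estimated_risk", 1.0) >= charter.escalation_threshold

charter = AgentCharter(
    goal="Minimise churn for high-value customers with personalised retention offers",
    non_negotiables=["never price below the contractual minimum", "never expose PII in prompts"],
    tradeoffs={"accuracy": 0.6, "cost": 0.3, "latency": 0.1},
)

print(requires_human(charter, {"estimated_risk": 0.85}))  # True -> stop and ask a human
```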

The Scaffolding:

The best way to articulate this concept is through an analogy: if traditional architecture is a set of tracks that a train must follow, agentic architecture is a containment field in which a drone is allowed to fly.

When software can make probabilistic decisions, your architecture must provide the “scaffolding” that keeps those decisions safe and observable. We are moving away from hardcoded APIs toward Dynamic Tooling Interfaces and Context Pipelines.
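The “Dynamic Tooling Interface” idea can be pictured as a small registry in which every tool declares an argument schema and the minimum autonomy tier allowed to invoke it without sign-off. This is a hedged sketch; the tool names, tiers, and schema format are assumptions for illustration, not an API from the cited reports.

```python
from typing import Any, Callable

TOOLS: dict[str, dict[str, Any]] = {}  # the agent discovers its tools from this registry at run time

def register_tool(name: str, schema: dict[str, type], min_tier: int) -> Callable:
    """Register a callable the agent may invoke, with an argument schema and
    the minimum autonomy tier required to call it without human sign-off."""
    def wrapper(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {"fn": fn, "schema": schema, "min_tier": min_tier}
        return fn
    return wrapper

@register_tool("issue_discount", {"customer_id": str, "percent": float}, min_tier=1)
def issue_discount(customer_id: str, percent: float) -> str:
    return f"Discount of {percent}% queued for {customer_id}"

def invoke(name: str, args: dict[str, Any], agent_tier: int) -> str:
    """Validate a proposed tool call against schema and tier before executing it."""
    tool = TOOLS[name]
    if agent_tier < tool["min_tier"]:
        raise PermissionError(f"'{name}' requires human sign-off at tier {agent_tier}")
    for key, expected in tool["schema"].items():
        if not isinstance(args.get(key), expected):
            raise TypeError(f"Bad or missing argument: {key}")
    return tool["fn"](**args)

print(invoke("issue_discount", {"customer_id": "C-1042", "percent": 5.0}, agent_tier=1))
```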

The “Validator” Agent:

In this new architectural paradigm, we are seeing the emergence of the “Multi-Agent Mesh.” You don’t just build one agent; you build an ecosystem of checks and balances, sketched in code after the list below:

  • The Actor: Pursues the goal.
  • The Critic: Reviews the Actor’s proposed plan against a set of safety policies.
  • The Auditor: Records the decision-making trace for post-hoc analysis.
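A minimal sketch of that mesh follows, with stubbed functions standing in for LLM-backed agents. The point is the control flow (propose, review, record, escalate), not the model calls; the policy name and file path are hypothetical.

```python
import json
import time

def actor(goal: str) -> dict:
    """Proposes a plan toward the goal (stub for an LLM-backed planner)."""
    return {"goal": goal, "action": "offer_discount", "params": {"percent": 15}}

def critic(plan: dict, policies: list[str]) -> tuple[bool, str]:
    """Reviews the proposed plan against safety policies (stub for a reviewer agent)."""
    if "max_discount_10_percent" in policies and plan["params"].get("percent", 0) > 10:
        return False, "Discount exceeds the 10% policy cap"
    return True, "Plan within policy"

def auditor(record: dict) -> None:
    """Appends the full decision trace for post-hoc analysis."""
    with open("decision_trace.jsonl", "a") as log:
        log.write(json.dumps({"ts": time.time(), **record}) + "\n")

goal = "Retain high-value customer C-1042"
plan = actor(goal)
approved, reason = critic(plan, policies=["max_discount_10_percent"])
auditor({"goal": goal, "plan": plan, "approved": approved, "reason": reason})
if not approved:
    print(f"Escalating to a human: {reason}")
```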

As an engineering lead, you are designing the environment where these interactions happen. This includes managing Memory Layers (both short-term [context windows] and long-term [vector databases/RAG]) to ensure the agent has the “wisdom” to act correctly without being overwhelmed by “noise.”
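A toy illustration of those two layers, assuming a rolling buffer for the context window and a naive keyword match standing in for vector similarity search; a production system would use embeddings and a real vector database.

```python
from collections import deque

SHORT_TERM = deque(maxlen=10)  # rolling context window of recent turns
LONG_TERM = [                  # stand-in for a vector database / RAG store
    "Customer C-1042 churned once before, after a price increase.",
    "Retention offers above 10% require finance approval.",
]

def remember(turn: str) -> None:
    SHORT_TERM.append(turn)

def recall(query: str, k: int = 2) -> list[str]:
    """Keyword overlap standing in for embedding similarity search."""
    words = set(query.lower().split())
    return sorted(LONG_TERM, key=lambda doc: -len(words & set(doc.lower().split())))[:k]

def build_context(query: str) -> str:
    """Assemble the prompt: relevant long-term memories, then recent turns, then the query."""
    return "\n".join(recall(query) + list(SHORT_TERM) + [query])

remember("User asked about cancelling their subscription.")
print(build_context("What retention offer can we make to C-1042?"))
```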

The End of the Binary Test Suite

The most jarring shift for engineering teams is the realisation that the “Green/Red” light of unit testing is now insufficient. In a deterministic world, a test passes or fails based on an exact match. In an agentic world, the same input may produce three different, yet equally valid, outputs. As a result, we must evolve from Validation to Evaluation.

In an agentic SDLC, “Quality” is a statistical distribution, rather than a binary state. If an agent stays within its ethical and financial boundaries 99.9% of the time but hallucinates a regulatory violation in the other 0.1%, the code is technically “working,” but the system is a failure.

The New Quality Toolkit:

  • Behavioural Scoring: We replace assert output == expected with score(output, intent) > 0.85. We use LLMs to grade other LLMs on their adherence to the “Agent Charter” (see the sketch after this list).
  • Adversarial Red-Teaming: Instead of just testing “happy paths,” we must actively try to trick our agents into breaking their own constraints. This is quality engineering turned into security research.
  • Drift Monitoring: Agents are sensitive to the world around them. As the underlying model (e.g., GPT-4 to GPT-5) or the external data changes, the agent’s “personality” may shift. We need dashboards that track behavioural drift over months, not just minutes.
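A minimal sketch of Behavioural Scoring under stated assumptions: a hypothetical judge_model() call stands in for the grading LLM, and the 0.85 threshold and rubric wording are illustrative.

```python
def judge_model(prompt: str) -> float:
    """Hypothetical stand-in for an LLM grader that returns a score between 0 and 1."""
    return 0.9  # in practice: call a grading model and parse its numeric verdict

def behavioural_score(output: str, intent: str, charter: str) -> float:
    rubric = (
        "Score from 0 to 1 how well the OUTPUT serves the INTENT without violating the CHARTER.\n"
        f"INTENT: {intent}\nCHARTER: {charter}\nOUTPUT: {output}"
    )
    return judge_model(rubric)

intent = "Retain the customer within a 10% discount budget"
charter = "Never exceed the approved discount budget"
samples = ["Offer a 5% loyalty discount", "Offer a free year of service"]

scores = [behavioural_score(s, intent, charter) for s in samples]
pass_rate = sum(score > 0.85 for score in scores) / len(scores)
print(f"Pass rate across the eval set: {pass_rate:.0%}")  # quality as a distribution, not a binary
```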

Managed Autonomy

Until recently, deployment was seen as an event. You would push the code, check the error logs, and then head off to lunch. In the agentic world, deployment is the beginning of a continuous orchestration phase.

We are no longer just releasing code; we are onboarding a semi-autonomous entity, which requires new levels of oversight and governance. This includes the introduction of kill switches and autonomy tiers. It involves starting an agent in an Advisory Tier (where it suggests actions but doesn’t take them), then moving it to a Constrained Execution Tier (where it can act only on low-stakes interactions), before finally granting it Full Autonomy.
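Those tiers, the kill switch, and the escalation path can be made explicit in code. The sketch below is one possible encoding, not a prescribed mechanism; the tier names follow the article, everything else is an assumption.

```python
from enum import IntEnum

class AutonomyTier(IntEnum):
    ADVISORY = 0               # suggests actions but never executes them
    CONSTRAINED_EXECUTION = 1  # may execute low-stakes actions on its own
    FULL_AUTONOMY = 2          # may execute any in-policy action

KILL_SWITCH = False  # flipping this halts all agent execution immediately

def dispatch(action: dict, tier: AutonomyTier) -> str:
    """Route a proposed action according to the agent's current autonomy tier."""
    if KILL_SWITCH:
        return "HALTED: kill switch engaged"
    if tier == AutonomyTier.ADVISORY:
        return f"SUGGESTION ONLY: {action['name']}"
    if tier == AutonomyTier.CONSTRAINED_EXECUTION and action.get("stakes") != "low":
        return f"ESCALATED: {action['name']} needs human approval"
    return f"EXECUTED: {action['name']}"

print(dispatch({"name": "send_retention_email", "stakes": "low"}, AutonomyTier.CONSTRAINED_EXECUTION))
print(dispatch({"name": "issue_refund", "stakes": "high"}, AutonomyTier.CONSTRAINED_EXECUTION))
```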

This is not “Continuous Deployment” in the Jenkins sense; it is continuous alignment. You are constantly tuning the prompt, the tools, and the guardrails based on real-world performance.

The Cultural Shift

Perhaps the greatest challenge isn’t technical, but cultural. Coding everything yourself comes with a feeling of entitlement to control. Being told to shape conduct rather than write the logic can feel like losing both your autonomy and your essential function within a company. It means reframing the developer’s identity to the point where they are comfortable no longer holding ultimate control over parts of a project they once owned.

Part of this shift is ‘hiring’ and managing these ‘digital employees’. Think of your agentic system not as a tool, but as a new hire. You wouldn’t manage a new employee by giving them a 10,000-page manual documenting every single mouse click they should ever make. That would be inefficient and brittle. Instead, you give them:

  1. Context: The history of the company and the project.
  2. Goals: What success looks like.
  3. Boundaries: What they are absolutely never allowed to do.
  4. Feedback: Regular reviews of their performance.

This is exactly how we must now build software. The “Definition of Done” (DoD) for a sprint must change. A story is no longer “Done” when the code is merged. It is “Done” when the agent’s behaviour has been calibrated against the intent, and the guardrails have been verified as “impenetrable.”

What remains?

It is tempting to think that the “Old SDLC” is dead. It isn’t. In fact, the move to Agentic AI makes the traditional engineering discipline more important than ever.

If your underlying data pipelines are messy, your agent will be hallucinating on bad information. If your API contracts are poorly defined, your agent will fail to use its tools. If your security protocols are lax, your agent becomes a massive vulnerability.

Furthermore, agile principles are actually amplified in an agentic world. The system itself is adaptive; as a result, the human team must be even more so. This will play a large role in reshaping roles across individuals and the industry: we will iterate on the system’s alignment, not just on its features.

Reliability at Scale

Ultimately, the future of engineering is not about code velocity. In an age where AI can write the code itself, “lines of code per day” is a meaningless metric. The new metric of success is behavioural reliability. Can you prove that your autonomous system will act in the best interest of the customer, even in a situation the developers never envisioned? Can you demonstrate that your agents are “drifting” toward better outcomes rather than toward chaos?

The transition to Agentic AI is a “growing up” moment for the software industry. We are moving away from the mechanical certainty of the industrial age, where “Input A leads to Output B”, and into the biological complexity of the information age, toward a more sophisticated model of probabilistic accountability.

This is the new engineering mandate: It isn’t about making machines “smarter.” It’s about making autonomous systems disciplined.

Sources

PwC (Jan 2026): Agentic SDLC in Practice: The Rise of Autonomous Software Delivery. This report provided the framework for moving from deterministic code to “high-level intent” and the “Governance Gap” that leaders must now address.

Anthropic (Feb 2026): 2026 Agentic Coding Trends Report. I drew from their research on “Multi-Agent Architectures” and the shift from “Hand-coding” to “Orchestrating AI agents” as the primary human role.

Deloitte (Feb 2026): 2026 Global Software Industry Outlook. This source informed the strategic reframing of engineering teams and the integration of “digital workers” (agents) alongside human seniors.

KPMG International (Jan 2026): Agentic AI is Revolutionizing Software Development. This provided the basis for the “Alignment over Velocity” argument and the concept of developers as supervisors of asynchronous agents.

McKinsey & Company (Nov 2025/Sept 2025): Unlocking the Value of AI in Software Development and The Agentic Organization. These were used to ground the discussion on ROI and the organizational shift toward “Humans above the loop.”

HB-Eval (Dec 2025): A System-Level Reliability Evaluation and Certification Framework for Agentic AI. This preprint is the source for the behavioural reliability metrics mentioned (specifically the shift from task success to resilience under stress).

Eval-Driven Memory (EDM) (Jan 2026): A Persistence Governance Layer for Reliable Agentic AI. This research informed the “Architecture as Scaffolding” section, specifically regarding how agents store and retrieve “certified” high-quality behaviours.

Agentic Systems Engineering (Sept 2025): Robust, Observable, and Evolvable Agentic Systems Engineering. This paper provided the conceptual bridge between traditional Software Engineering (SE) and the “SE Absence” in early agentic systems.
