Every organization that wants AI to actually work (not just demo well, but run real business operations) faces the same obstacle. The AI doesn’t understand the business. It can process text, generate responses, and call APIs. But it doesn’t know that your biggest client has a custom pricing agreement. It doesn’t know that Q4 invoicing follows different rules than Q1. It doesn’t know that “urgent” means something different to your Portland team than to your Austin team.

This isn’t a technology problem. The models are good. The tools are mature. The problem is structural: your organization’s knowledge lives in people’s heads, in tribal expertise passed between tenured employees, in exception-handling logic that has never been written down. AI agents can’t operate on unstructured knowledge. They need context, structured representations of how your business works, what your entities look like, what rules govern decisions, and what “good” means in specific situations.

Business-as-Code is the methodology that solves this. You define business entities as JSON schemas, encode domain expertise as markdown skills, and structure organizational context so any AI agent can operate on your business from day one. It reframes the 95% AI pilot failure rate as a structural problem with a structural fix.

This guide is the complete implementation manual: the actual process we run at NimbleBrain on every engagement and use internally to run our own operations. By the end, you’ll have a clear path from “our knowledge lives in people’s heads” to “our agents handle real work autonomously.”

What You’ll Learn

This guide covers the end-to-end Business-as-Code implementation in six steps, plus common mistakes and how to start immediately.

Step 1: The Knowledge Audit. How to systematically find the tribal knowledge in your organization, the decision rules, exception handling, and domain expertise that currently lives only in people’s heads. This is where every implementation begins.

Step 2: Designing Your Schema Layer. How to define your business entities as JSON schemas, the structured data model that tells AI agents what your business IS. Customers, orders, workflows, products, approval chains, with their attributes, relationships, and constraints spelled out in machine-readable format.

Step 3: Encoding Skills. How to turn domain expertise, the judgment calls, the “it depends” decisions, the things a new hire needs six months to learn, into structured documents that AI agents can follow consistently. This is the heart of Business-as-Code.

Step 4: Building the Context Layer. How to create the glue that ties schemas and skills together, the background knowledge that makes individual artifacts coherent and gives agents the organizational awareness to operate intelligently.

Step 5: Your First Agent. How to connect schemas, skills, and context to a working AI agent that handles real business processes. Not a demo. A production system operating on real data.

Step 6: The Recursive Loop. How to set up the continuous improvement cycle (BUILD, OPERATE, LEARN) that makes the system compound over time. Each iteration makes every agent smarter without retraining a single model.

After the six steps, we cover the five most common mistakes that kill implementations and concrete actions you can take tomorrow morning.

Step 1: The Knowledge Audit

Before you write a single schema or skill, you need to understand what your organization knows and where that knowledge lives. This is the knowledge audit, the foundation that everything else builds on.

Most organizations vastly underestimate how much critical knowledge is unwritten. They have process documentation, sure. Maybe a wiki, a Confluence space, a SharePoint site with SOPs. But the documentation describes the happy path. The real operational knowledge, the exceptions, the judgment calls, the “when this happens, call Janet because she’s the only one who knows the workaround”. That knowledge exists nowhere except inside specific people’s heads.

This is tribal knowledge. It’s the institutional memory that makes experienced employees invaluable and new hires slow. It’s the reason a 10-year veteran can handle a complex customer issue in 5 minutes while a new hire takes an hour and still gets it wrong. And it’s the reason your AI pilot failed: the AI had the same information as the new hire, minus the ability to walk over to Janet’s desk and ask.

How to Run the Audit

The knowledge audit is a structured interview process. You’re not asking people “what do you want AI to do?”. That question produces wish lists, not actionable knowledge. You’re asking “how do you make decisions?”

Identify your knowledge holders. These are the people who carry the most tribal knowledge. Operations leads. Customer service veterans. The finance controller who has been doing month-end close for eight years. The sales director who knows every client’s quirks. Start with 5-8 people who represent your core business functions.

Map the decision points. For each process area, identify the decisions that require judgment. When a support ticket comes in, how does it get prioritized? When an invoice has a discrepancy, what’s the escalation path? When a new lead enters the pipeline, how is it scored? Each decision point is a candidate for encoding.

Document the exceptions. This is where the real value lives. Every process has a standard path and a set of exceptions. The standard path is usually documented. The exceptions rarely are. The customer who gets special pricing. The vendor who requires manual approval above a certain threshold. The regulatory check that only applies to certain jurisdictions. The product return policy that has three tiers depending on customer segment.

Capture the “it depends” logic. When a knowledge holder says “it depends,” that’s a signal. Stop and unpack it. Depends on what? What are the conditions? What changes based on those conditions? “It depends” is tribal knowledge in its most concentrated form, a decision tree that exists only in one person’s head.

Quantify the frequency and impact. For each process and decision point, record how often it occurs and what it costs in time or money. A decision that happens 50 times a day and takes 10 minutes each time consumes over 2,000 hours per year. That’s your priority list.
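The arithmetic behind this prioritization is worth automating across the whole decision inventory. A minimal sketch in Python (assuming roughly 250 working days per year):

```python
def annual_hours(times_per_day: int, minutes_each: float, working_days: int = 250) -> float:
    """Annual human-hours consumed by a recurring decision."""
    return times_per_day * minutes_each * working_days / 60

# 50 decisions/day at 10 minutes each: roughly 2,083 hours/year
print(round(annual_hours(50, 10)))
```

Run this over every decision point from the audit and sort descending; the sorted list is your priority list.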

Audit Outputs

A completed knowledge audit produces three things:

A knowledge map. A structured inventory of every process, decision point, exception, and knowledge holder in the areas you audited. This map shows you not just what the organization knows, but who knows it and how critical they are. If one person is the sole holder of knowledge that governs a high-volume process, that’s a risk, and a high-priority encoding target.

A prioritized process list. Every candidate process ranked on two axes: impact (time saved, error reduction, revenue affected) and structurability (how well-defined are the rules, how many exceptions exist, how much judgment is required). The top 8-12 become your initial encoding targets.

A gap inventory. Places where the organization has no clear rules, where decisions are made inconsistently because there’s no shared standard. These gaps are valuable. They’re not just encoding targets; they’re process improvement opportunities. Building a skill for an inconsistent process forces the organization to define what “right” looks like.

The knowledge audit typically takes 3-5 days for one business domain. You don’t audit the entire organization at once. Start with the highest-impact area and expand.

Step 2: Designing Your Schema Layer

Schemas define what your business IS. They’re JSON definitions of your entities, the nouns of your business, with their attributes, relationships, and constraints spelled out in structured data that both humans and AI agents can read.

A schema doesn’t describe a customer in prose. It defines a customer as a data structure: required fields (name, email, segment), optional fields (phone, address, custom pricing tier), valid states (active, churned, prospect), and relationships to other entities (a customer has orders, an order has line items, a line item references a product).

This is the structural approach that replaces strategy decks with executable assets. A strategy deck describes your customer segments in a 2x2 matrix. A schema defines them in a format that an AI agent can validate, query, and act on.

Start With Five Core Entities

Don’t try to model your entire business on day one. Start with five core entities that your highest-priority processes operate on. For most organizations, these are:

Customer. Who you serve. Segments, tiers, preferences, special agreements, communication history.

Order / Transaction. What customers buy or request. Line items, pricing, status, approval requirements.

Product / Service. What you deliver. Attributes, pricing tiers, availability, constraints.

Process / Workflow. How work moves through your organization. Steps, approvals, handoffs, SLAs.

Employee / Role. Who does what. Capabilities, authorities, escalation paths.

Your specific entities will vary. A logistics company has shipments and routes. A healthcare provider has patients and treatment plans. A financial services firm has portfolios and risk assessments. The pattern is the same: identify the core nouns and define them structurally.

Schema Design Principles

Required fields are a commitment. Every field you mark as required means every instance of that entity must have it. Be conservative. Start with 5-8 truly required fields and let the rest be optional. You can always tighten constraints later; loosening them after agents depend on them is harder.

Enums encode business rules. When a field has a finite set of valid values (customer segments, order statuses, priority levels), define them as enums. This isn’t just data validation. It’s encoding business logic. When an agent sees that a ticket’s priority can only be “critical,” “high,” “medium,” or “low,” it knows the vocabulary of your organization.

Relationships are first-class. A customer schema that doesn’t reference orders is incomplete. A product schema that doesn’t reference pricing tiers is missing context. Define relationships explicitly. When an agent processes an order, it needs to traverse from order to customer to pricing tier to determine the correct price. If those relationships aren’t in the schemas, the agent can’t make that traversal.

Descriptions carry context. JSON Schema supports a description field on every property. Use it. Don’t just say "status": "string". Say "status": { "type": "string", "enum": ["active", "churned", "prospect"], "description": "Customer lifecycle stage. Active means paying and engaged. Churned means previously active, now inactive 90+ days. Prospect means qualified lead, not yet converted." }. Those descriptions are what agents read to understand the semantics, not just the structure.

Version everything. Schemas change as the business evolves. A new product tier gets added. An approval threshold changes. A new field becomes required. Version your schemas in git. Every change is tracked, reversible, and auditable. When an agent behaves unexpectedly, you can diff the schema to see what changed.

A Practical Example

Here’s a simplified customer schema that demonstrates the principles:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Customer",
  "description": "A customer entity in the CRM. Represents any organization or individual with an active or historical business relationship.",
  "type": "object",
  "required": ["id", "name", "segment", "status"],
  "properties": {
    "id": {
      "type": "string",
      "description": "Unique identifier. Format: CUS-XXXXX."
    },
    "name": {
      "type": "string",
      "description": "Legal business name or individual's full name."
    },
    "segment": {
      "type": "string",
      "enum": ["enterprise", "mid-market", "smb"],
      "description": "Revenue-based segmentation. Enterprise: >$1M ARR. Mid-market: $100K-$1M. SMB: <$100K."
    },
    "status": {
      "type": "string",
      "enum": ["active", "churned", "prospect"],
      "description": "Lifecycle stage. Determines available actions and SLA tier."
    },
    "customPricing": {
      "type": "boolean",
      "default": false,
      "description": "If true, standard pricing does not apply. Check the pricing_override field for this customer's specific rates."
    },
    "primaryContact": {
      "type": "object",
      "description": "Main point of contact for this customer.",
      "properties": {
        "name": { "type": "string" },
        "email": { "type": "string", "format": "email" },
        "role": { "type": "string" }
      }
    }
  }
}

This schema is deliberately simple. Five required fields. A few optional fields that capture critical business logic (custom pricing, primary contact). Clear descriptions that tell an agent not just what the data looks like, but what it means.

A real production schema would have more fields, more relationships, and more constraints. But this is where you start. Get this into production, let agents operate on it, and the gaps will surface quickly. That’s the Recursive Loop at work.
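Once the schema is in git, agents or a CI step can validate records against it. In production you would reach for a full JSON Schema validator (such as the jsonschema Python package); as a minimal, dependency-free sketch of the same idea, here is a hand-rolled check of the required fields and enums defined above:

```python
# Required fields and enums taken from the Customer schema above.
CUSTOMER_REQUIRED = ["id", "name", "segment", "status"]
CUSTOMER_ENUMS = {
    "segment": {"enterprise", "mid-market", "smb"},
    "status": {"active", "churned", "prospect"},
}

def validate_customer(record: dict) -> list[str]:
    """Return validation errors for a customer record; an empty list means valid."""
    errors = [f"missing required field: {f}" for f in CUSTOMER_REQUIRED if f not in record]
    for field, allowed in CUSTOMER_ENUMS.items():
        if field in record and record[field] not in allowed:
            errors.append(f"{field}: {record[field]!r} is not one of {sorted(allowed)}")
    return errors

assert validate_customer({"id": "CUS-00001", "name": "Acme Corp",
                          "segment": "enterprise", "status": "active"}) == []
assert len(validate_customer({"id": "CUS-00002", "name": "Beta LLC",
                              "segment": "gold"})) == 2  # missing status, bad segment
```

The point is not the validator itself but the feedback loop: every record an agent touches gets checked against the same definition humans maintain in git.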

Step 3: Encoding Skills

Skills are the heart of Business-as-Code. If schemas define what your business IS, skills encode what your business KNOWS, the domain expertise, the judgment calls, the decision logic that makes experienced employees effective.

A skill is not a prompt. This distinction matters. A prompt is an instruction you write once for one interaction. A skill is a structured document, written in natural language with embedded constraints, that any AI agent can reference repeatedly to make consistent decisions. A prompt says “qualify this lead.” A skill defines what a qualified lead looks like, lists the disqualification criteria, specifies the data sources to check, describes the scoring algorithm, and defines the output format.

Skills-as-Documents is the concept. The practice is straightforward: encode the tribal knowledge surfaced in your knowledge audit into markdown files with a consistent structure.

Skill Structure

Every production skill follows the same pattern:

Purpose. One sentence describing what this skill does. “Score and qualify inbound leads based on fit criteria and engagement signals.”

When to use. The trigger conditions. “Apply when a new lead enters the pipeline from any source, web form, referral, event, or cold outbound.”

Input. What data the agent needs. “Lead record (from lead schema), company data (from company schema), engagement history (from CRM).”

Decision logic. The core of the skill. Step-by-step instructions for how to make the decision, written in natural language. This is where the tribal knowledge lives. “Check if the lead’s company size falls within our target range (50-500 employees). Check if the industry matches our ICP. Check if the lead has engaged with at least two content assets in the last 30 days…”

Exceptions. The edge cases. “If the lead is a referral from an existing enterprise customer, skip the company size filter, these convert at 3x the standard rate regardless of size.” Every exception you encode is one less situation where the agent will make the wrong call.

Output. What the skill produces. “A qualification score (1-100), a qualification status (qualified, nurture, disqualified), and a one-paragraph rationale explaining the decision.”

Examples. Two to three worked examples showing the skill applied to real scenarios. Agents use these as calibration references. If the skill’s logic is ambiguous, the examples resolve the ambiguity.

Writing Your First Skill

Pick the highest-volume decision from your knowledge audit. The one that happens dozens or hundreds of times a day. The one where an experienced employee makes the call in seconds and a new hire takes ten minutes. That’s your first skill.

Writing a skill takes 15-30 minutes for a well-understood process. Sit with the domain expert. Walk through 5 real examples of the decision. For each example, ask: what did you look at? What was the deciding factor? What would have changed your decision? Write down the answers in the skill structure above.

Then validate. Run the skill against 10 historical cases where you know the right answer. If the agent matches the expert’s judgment on 8 out of 10, the skill is ready for production. Use the two mismatches as iteration targets, each gap you close makes the skill stronger.
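That 8-out-of-10 check can be a small harness rather than a manual exercise. A sketch, where run_skill is a hypothetical stand-in for however you invoke the agent with the skill:

```python
def agreement_rate(cases: list[dict], run_skill) -> float:
    """Fraction of labeled historical cases where the skill matches the expert."""
    matches = sum(1 for c in cases if run_skill(c["input"]) == c["expert_label"])
    return matches / len(cases)

# Toy calibration set; in practice, 10 real cases with known-correct outcomes.
cases = [
    {"input": {"score": 82}, "expert_label": "qualified"},
    {"input": {"score": 55}, "expert_label": "nurture"},
    {"input": {"score": 74}, "expert_label": "qualified"},
]
stub_skill = lambda lead: "qualified" if lead["score"] >= 70 else "nurture"
assert agreement_rate(cases, stub_skill) == 1.0  # ship at >= 0.8 agreement
```

Keep the labeled cases around: they double as regression tests every time the skill is refined.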

Here’s a simplified lead qualification skill to illustrate the format:

# Lead Qualification

## Purpose
Score and qualify inbound leads based on fit, engagement, and timing signals.

## When to Use
Apply when a new lead enters the pipeline from any channel.

## Input
- Lead record (lead schema)
- Company data (company schema)
- Engagement history (last 90 days from CRM)

## Decision Logic
1. **Fit score (0-40 points)**
   - Company size 50-500 employees: +20
   - Company size 501-2000: +15
   - Industry matches ICP list: +10
   - Revenue >$10M: +10

2. **Engagement score (0-40 points)**
   - Downloaded a resource: +10 per resource (max 20)
   - Attended webinar: +15
   - Visited pricing page: +10
   - Requested demo: +20 (cap total at 40)

3. **Timing score (0-20 points)**
   - Active evaluation (mentioned timeline): +20
   - Budget cycle Q4: +10
   - Recent technology change: +10

## Exceptions
- Referral from enterprise customer: auto-qualify regardless of fit score
- Competitor employee: auto-disqualify
- Existing customer upsell: route to account management, skip qualification

## Output
- Score: sum of fit + engagement + timing (0-100)
- Status: qualified (70+), nurture (40-69), disqualified (<40)
- Rationale: one paragraph explaining the primary factors

## Examples
[Include 2-3 worked examples with real-ish data]

This skill is specific enough to produce consistent results and general enough to handle the normal range of leads. The exceptions section handles the edge cases that would trip up a prompt-based approach. The examples give the agent calibration data for ambiguous situations.
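Notice that the decision logic above is already concrete enough to execute directly. A Python sketch of the same scoring bands and exceptions (field names are illustrative, and capping the timing band at 20 is an assumption, since its listed items sum past 20):

```python
def qualify_lead(lead: dict) -> dict:
    """Apply the lead qualification skill: exceptions first, then fit + engagement + timing."""
    if lead.get("enterprise_referral"):
        return {"score": None, "status": "qualified",
                "rationale": "Referral from enterprise customer: auto-qualify."}
    if lead.get("competitor_employee"):
        return {"score": None, "status": "disqualified",
                "rationale": "Competitor employee: auto-disqualify."}

    fit = 0
    size = lead.get("company_size", 0)
    if 50 <= size <= 500:
        fit += 20
    elif 501 <= size <= 2000:
        fit += 15
    if lead.get("icp_industry"):
        fit += 10
    if lead.get("revenue", 0) > 10_000_000:
        fit += 10

    engagement = min(10 * lead.get("resources_downloaded", 0), 20)
    if lead.get("attended_webinar"):
        engagement += 15
    if lead.get("visited_pricing"):
        engagement += 10
    if lead.get("requested_demo"):
        engagement += 20
    engagement = min(engagement, 40)  # skill caps the engagement band at 40

    timing = 0
    if lead.get("active_evaluation"):
        timing += 20
    if lead.get("q4_budget_cycle"):
        timing += 10
    if lead.get("recent_tech_change"):
        timing += 10
    timing = min(timing, 20)  # assumption: band is 0-20, so cap the sum

    score = fit + engagement + timing
    status = "qualified" if score >= 70 else "nurture" if score >= 40 else "disqualified"
    return {"score": score, "status": status}
```

If the skill can be reduced to code like this, run it as code and let the agent handle only the genuinely ambiguous residue; either way, the markdown skill remains the source of truth the code mirrors.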

How Many Skills Do You Need?

Start with 10-15 skills covering your highest-priority processes. A typical initial set includes:

  • 3-4 skills for your primary revenue process (lead qualification, proposal generation, pricing decisions, deal approval)
  • 3-4 skills for your primary operational process (ticket routing, escalation logic, SLA management, exception handling)
  • 2-3 skills for internal operations (invoice processing, reporting, compliance checks)
  • 1-2 cross-cutting skills (communication tone by audience, data quality validation)

This gives your first agents enough capability to handle real work. The Recursive Loop will surface what’s missing. Every human override is a signal that a skill needs refinement or a new skill needs to be written.

Step 4: Building the Context Layer

Schemas define your entities. Skills encode your decisions. Context is the glue that ties them together, the background knowledge that makes individual schemas and skills coherent.

Without context, an agent has a customer schema and a lead qualification skill but doesn’t know which industry you’re in, what your strategic priorities are, how your teams are structured, or what your customers actually expect. The agent can follow the skill mechanically, but it can’t exercise the kind of judgment that comes from understanding the bigger picture.

Context is what separates an agent that follows rules from an agent that understands the business.

What Context Includes

Organizational context. Who you are, what you do, how you’re structured. Your industry, your market position, your competitive environment, your team structure, your strategic priorities. This is the knowledge that every employee absorbs in their first few weeks and that shapes every decision they make afterward.

Domain context. Industry-specific knowledge that your business operates within. Regulatory requirements. Industry standards. Market conventions. Terminology. A healthcare agent needs to know HIPAA constraints. A financial services agent needs to know SOX requirements. A logistics agent needs to know INCOTERMS.

Relationship context. How your entities relate to each other in ways that go beyond schema relationships. Your biggest customer is also your most demanding. Your newest product has the highest margin but the longest implementation cycle. Your Austin office handles West Coast clients while Portland handles the Northwest. These patterns inform judgment.

Historical context. What has worked and what hasn’t. Past decisions and their outcomes. Seasonal patterns. Growth trends. Known issues and workarounds. An agent that knows “last time we offered a 15% discount to retain a churning enterprise customer, it worked in 3 out of 4 cases” makes better retention decisions than one operating without that pattern.

How to Structure Context

Context lives in markdown files, organized by domain. A typical context layer includes:

context/
├── company.md          # Who we are, what we do, team structure
├── industry.md         # Industry-specific knowledge, regulations
├── customers.md        # Customer segments, key accounts, patterns
├── products.md         # Product details, positioning, constraints
├── operations.md       # How we work, SLAs, escalation paths
└── strategy.md         # Current priorities, quarterly goals, focus areas

Each context file is a living document. It gets updated as the business evolves. When the company enters a new market, industry.md gets updated. When a major customer’s relationship changes, customers.md gets updated. When strategic priorities shift quarterly, strategy.md gets updated.

The key principle: context files describe the “why” that schemas and skills don’t capture. A schema defines the structure of a customer. A skill defines how to qualify a lead. A context file explains why enterprise customers in healthcare get white-glove onboarding, the strategic decision behind the rule.
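Because context is just files, wiring it into an agent is plain file I/O. A minimal sketch that assembles every context file into one briefing string for a system prompt (directory layout as above):

```python
from pathlib import Path

def load_context(context_dir: str) -> str:
    """Concatenate every context file into one briefing string for the agent."""
    sections = []
    for path in sorted(Path(context_dir).glob("*.md")):
        sections.append(f"## {path.stem}\n\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(sections)
```

Because the agent re-reads the files on each run, updating strategy.md immediately shifts every agent’s behavior, with no redeploy required.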

Context + Schemas + Skills = Intelligent Operation

Here’s how the three components work together in practice.

An agent receives a task: “Process this inbound lead.”

  1. The agent reads the lead schema to understand the data structure: what fields exist, what values are valid, what relationships matter.
  2. The agent reads the lead qualification skill to understand the decision logic: how to score, what thresholds to apply, what exceptions to watch for.
  3. The agent reads the customer context to understand the bigger picture. That enterprise healthcare leads are a strategic priority this quarter, that referrals from existing clients convert at 3x the standard rate, that the sales team is at capacity for mid-market deals.

With all three layers, the agent doesn’t just mechanically score the lead. It understands that a 65-point mid-market lead might be deprioritized because the team is at capacity, while a 55-point enterprise healthcare lead might be fast-tracked because it aligns with the quarterly strategic priority. That’s judgment. And it comes from context.

Without the context layer, the agent would score both leads by the numbers and miss the strategic nuance. It would make the technically correct decision but the operationally wrong one. Context is what closes that gap.

Step 5: Your First Agent

You have schemas that define your business entities. You have skills that encode your decision logic. You have context that provides organizational awareness. Now you connect them to an agent and let it handle real work.

Not a demo. Not a proof of concept. A production system operating on real data, making real decisions, handling real business processes. The distinction matters. Demos operate on curated data in sandboxes. Production systems encounter messy data, unexpected inputs, edge cases, and failures. If your first agent only works on clean data, you haven’t built an agent; you’ve built a demo that will disappoint you later.

Choosing Your First Process

Pick a process that meets three criteria:

High volume. The process happens frequently enough to generate meaningful data for the Recursive Loop. A process that happens twice a month won’t give you enough signal. A process that happens 50 times a day will show you skill gaps within hours.

Well-defined decision logic. Your first agent should operate on a process where the rules are clear and the knowledge audit surfaced strong patterns. Save the ambiguous, judgment-heavy processes for your second or third agent, after the team has experience with the methodology.

Manageable error cost. The consequences of a wrong decision should be correctable. Ticket routing, lead qualification, data entry validation, report generation: these are good first candidates. Contract approval, financial transactions, customer-facing communications: save these for later, when the agent has proven itself and governance is in place.

For most organizations, the best first agent handles operations or customer service, domains where the volume is high, the rules are structured, and the cost of a correctable error is low.

The Human-in-the-Loop Phase

Your first agent runs with human approval for the first iteration. Every decision the agent makes gets flagged for human review before execution. This isn’t permanent. It’s a calibration mechanism.

You’re measuring two things during this phase:

Accuracy. How often does the agent make the right call? Compare the agent’s decisions to what the domain expert would have done. For a well-built agent with solid schemas and skills, accuracy starts around 80-85% and climbs to 90%+ within the first week.

Intervention rate. How often does a human override the agent’s decision? This metric tells you where skills have gaps. Every override is a data point: the agent encountered a situation the skill didn’t cover, or the skill’s logic led to the wrong conclusion. Track every override, categorize the reason, and feed it back into skill refinement.

A typical first agent starts with a 15-20% intervention rate in the first few days. By the end of two weeks, intervention drops to under 5%. The curve flattens as the easy gaps get filled and the remaining overrides represent genuinely novel situations.

When the intervention rate stabilizes below 5%, the agent graduates from human-in-the-loop to autonomous operation with monitoring. It still logs every decision. Anomalies still trigger alerts. But it no longer waits for human approval on routine decisions.
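The graduation check can be computed from the decision log the agent already writes. A sketch, with an illustrative log format where each entry records whether a human overrode the call:

```python
def intervention_rate(decision_log: list[dict]) -> float:
    """Share of logged agent decisions that a human overrode."""
    overridden = sum(1 for d in decision_log if d.get("human_override"))
    return overridden / len(decision_log)

def ready_for_autonomy(decision_log: list[dict], threshold: float = 0.05,
                       min_decisions: int = 100) -> bool:
    """Graduate from human-in-the-loop once enough decisions sit below the threshold."""
    return len(decision_log) >= min_decisions and intervention_rate(decision_log) < threshold

log = [{"human_override": i % 50 == 0} for i in range(200)]  # 4 overrides in 200: 2%
assert ready_for_autonomy(log)
```

The min_decisions floor matters: a 0% intervention rate over a dozen decisions is noise, not stability.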

Connecting to Enterprise Tools

Agents need to interact with your existing systems. CRM, email, calendars, project management tools, databases. MCP (Model Context Protocol) is the standard for these connections. Instead of building custom API integrations from scratch, you use MCP servers that provide a standard interface between agents and tools.

The mpak registry has 21+ pre-built MCP servers for common enterprise tools. A CRM integration that would take weeks to build custom takes hours with an existing MCP server. Each server handles authentication, rate limiting, error recovery, and data formatting, the plumbing that consumes most of the effort in traditional integration work.

Start with 2-3 integrations for your first agent. The data source (CRM, database, or internal tool that holds the information the agent needs), the action target (the system where the agent’s decisions get executed), and a communication channel (email, Slack, or the tool where results are delivered).

What Success Looks Like

After two weeks with your first agent in production, you should see:

  • Intervention rate below 5% on routine decisions
  • 3-4x efficiency gain on the automated process (measured in human hours saved)
  • A list of 10-15 skill refinements identified through override analysis
  • Clear data on which processes are ready for the next agent

If you’re not seeing this, the problem is almost always in the skills or schemas, not the agent infrastructure. Go back to the knowledge audit outputs. Compare what the agent is doing wrong to what the domain expert would do differently. The gap between those two things is a missing skill or an incomplete schema.

Step 6: The Recursive Loop

Business-as-Code is not a one-time project. It’s a continuous improvement cycle (BUILD, OPERATE, LEARN) that compounds over time.

BUILD. Create schemas, skills, and context. Deploy agents. Connect integrations. This is where you start, and where you return after every learning cycle.

OPERATE. Agents run in production, handling real work. Every interaction generates data. Every decision creates a record. Every edge case surfaces a pattern.

LEARN. Analyze the operational data. Where do agents struggle? Which skills produce the most overrides? Which schemas are missing attributes? What new situations have emerged that no skill covers? This analysis produces a prioritized list of improvements.

Then back to BUILD. Encode the learnings. Refine the skills. Extend the schemas. Deploy the improvements. And operate again.

The Compound Effect

Each cycle through the loop makes every agent in the system smarter, not through model retraining, but through richer context.

A skill that gets refined after 1,000 real interactions handles edge cases the initial version couldn’t anticipate. A schema that gets extended to capture a new entity relationship gives every agent that references that entity deeper context. A context file that gets updated with a new strategic priority shifts every agent’s judgment to align with the new direction.

The compounding is multiplicative, not additive. A new skill doesn’t just help the agent it was written for. It helps every agent that operates on the same schemas, because the skill enriches the overall knowledge layer. A schema refinement doesn’t just fix one agent’s blind spot. It gives every agent better data to work with.

Month 1: 8-12 automations running with occasional human oversight. Month 3: intervention rate near zero on established processes, 15-20 new skills added, 2-3x the process coverage of the initial deployment. Month 6: the system operates at a level that would have taken 18+ months with a traditional implementation.

Reaching Escape Velocity

Escape Velocity is the point where your internal team adds new capabilities faster than the business generates new requirements. At that point, the system is self-sustaining.

New processes get automated as a matter of course. New hires get skills written for their domain in their first week. The knowledge audit isn’t a special project anymore. It’s how the organization captures expertise. Someone figures out a better way to handle a customer situation? They write a skill. A new regulation takes effect? The compliance context gets updated and every agent immediately follows the new rules.

Most organizations reach Escape Velocity 2-3 months after the initial implementation. The variable isn’t the technology. It’s how aggressively the team adopts the practice of encoding knowledge as schemas and skills. Teams that treat encoding as a weekly habit reach Escape Velocity fast. Teams that treat it as an occasional project take longer.

The metric that tracks progress toward Escape Velocity is pattern velocity: how fast new patterns (schemas, skills, context updates) are being added to the system. A healthy post-deployment team adds 2-5 new skills per week. When pattern velocity exceeds the rate of new business requirements, you’ve reached Escape Velocity.
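If schemas, skills, and context live in git, pattern velocity reduces to counting additions over a recent window. A sketch over a list of dated pattern additions (in practice you would pull the dates from git history):

```python
from datetime import date, timedelta

def pattern_velocity(addition_dates: list[date], weeks: int = 4) -> float:
    """Average new patterns (schemas, skills, context updates) per week, recent window."""
    cutoff = max(addition_dates) - timedelta(weeks=weeks)
    recent = [d for d in addition_dates if d > cutoff]
    return len(recent) / weeks

# One new pattern every two days over four weeks: 3.5/week, inside the healthy 2-5 band.
adds = [date(2025, 1, 6) + timedelta(days=2 * i) for i in range(14)]
assert pattern_velocity(adds) == 3.5
```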

Common Mistakes

We’ve seen every way a Business-as-Code implementation can go wrong. These five kill the most projects.

1. Starting With Technology Instead of Context

The most common and most fatal mistake. A team picks a model, picks a framework, builds agent infrastructure, and three months later realizes the agents don’t understand the business. The technology is 10% of the problem. Context is 90%. An agent without context is a chatbot with API access.

The fix: Start with the knowledge audit. Understand what agents need to know before you decide how to build them. If you haven’t structured your business knowledge into schemas and skills, it doesn’t matter how good your model is.

2. Over-Engineering Schemas on the First Pass

Teams try to capture every edge case and optional field before deploying anything. They spend weeks building the “perfect” schema: 50 fields, nested objects four levels deep, validation rules for every conceivable scenario. The result is a schema that is complete on paper and never validated by agents in production.

The fix: Start with the minimum viable schema. 5-8 required fields. Clear descriptions. Basic relationships. Get it into production and let the Recursive Loop tell you what’s missing. You’ll learn more from one week of agent operation than from a month of theoretical schema design.
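For a sense of scale, a minimum viable schema might look like the sketch below, an order entity with six required fields. The field names and enum values are illustrative, not a prescribed standard; your audit determines what actually belongs here.

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Order",
  "type": "object",
  "required": ["id", "customer_id", "status", "total", "currency", "created_at"],
  "properties": {
    "id": { "type": "string", "description": "Internal order ID" },
    "customer_id": { "type": "string", "description": "References the customer entity" },
    "status": { "type": "string", "enum": ["draft", "confirmed", "fulfilled", "cancelled"] },
    "total": { "type": "number", "description": "Order total in the stated currency" },
    "currency": { "type": "string", "description": "ISO 4217 code, e.g. USD" },
    "created_at": { "type": "string", "description": "ISO 8601 timestamp" }
  }
}
```

Everything else, discounts, shipping exceptions, custom terms, gets added when the Recursive Loop shows agents actually need it.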

3. Writing Skills Like Prompts

A skill that says “Qualify this lead based on our criteria” is a prompt, not a skill. It doesn’t define the criteria. It doesn’t specify the scoring logic. It doesn’t handle exceptions. It produces inconsistent results because it relies on the model’s general knowledge instead of encoded business knowledge.

The fix: Every skill should be specific enough that a new hire could follow it and produce the right result. If a human can’t execute the skill by reading it, an agent can’t either. Include the decision logic, the exceptions, the output format, and worked examples.
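Here is a hedged sketch of what that specificity looks like in practice, following the structure from Step 3. The thresholds, roles, and scoring weights are invented for illustration; your own skill encodes whatever your experts actually do.

```markdown
# Skill: Qualify Inbound Lead

## Purpose
Decide whether an inbound lead is sales-ready.

## When to Use
Any new lead created in the CRM with source "inbound".

## Input
Lead record: company_size, industry, budget_signal, contact_role.

## Decision Logic
Score the lead:
- company_size >= 50 employees: +2
- contact_role is Director or above: +2
- budget_signal present: +1
Score >= 4: qualified. Score 2-3: nurture. Score < 2: disqualify.

## Exceptions
- Existing customers expanding: route to the account owner, skip scoring.
- Competitors requesting demos: disqualify regardless of score.

## Output
JSON: { "status": "qualified" | "nurture" | "disqualified",
        "score": <int>, "reason": "<one sentence>" }

## Example
Input: 120 employees, VP of Operations, budget mentioned.
Output: { "status": "qualified", "score": 5,
          "reason": "Large company, decision-maker contact, explicit budget." }
```

Notice that a new hire could execute this with no additional context. That is the bar.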

4. Skipping the Human-in-the-Loop Phase

Excitement about automation leads teams to deploy agents with full autonomy from day one. The agents make mistakes that reach end users. Trust erodes. The project gets killed.

The fix: Every new agent starts with human-in-the-loop approval. No exceptions. The calibration phase costs 1-2 weeks of human review time. Skipping it costs months of rebuilt trust when the agent makes a visible mistake.

5. Not Planning for Iteration

A team builds schemas and skills, deploys an agent, and considers the project done. Nobody is monitoring the intervention rate. Nobody is refining skills based on overrides. Nobody is extending schemas when new patterns emerge. The system stagnates while the business evolves.

The fix: Assign ownership. Someone on the team is responsible for the weekly review cycle, analyzing overrides, prioritizing skill refinements, extending schemas, updating context. The Recursive Loop only works if someone is turning the crank. Make it part of someone’s job, not an afterthought.

Getting Started Tomorrow

You don’t need a consulting engagement, a new platform, or a six-month roadmap to start. You can begin implementing Business-as-Code tomorrow morning with nothing more than a text editor and an hour of focused work.

Morning: Pick one process. Choose the business process you understand best and that happens most frequently. Not the most complex. Not the most valuable. The one where you can clearly articulate “here’s how an expert handles this.”

First 30 minutes: Write the schema. Define the core entity involved in that process. 5-8 fields. Required fields only. Clear descriptions. Use the customer schema example from Step 2 as your template. Save it as JSON.

Next 30 minutes: Write one skill. Encode the primary decision in that process. Follow the skill structure from Step 3: purpose, when to use, input, decision logic, exceptions, output, examples. Write it in markdown. Be specific enough that a new hire could follow it.

This week: Test it. Feed the schema and skill to any LLM-based agent (Claude, GPT-4, or whatever you have access to). Give it a real example from your process. Compare the agent’s output to what an expert would do. Note where it gets it right and where it misses. The misses are your first skill refinements.
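Before handing a real example to the agent, it's worth a quick sanity check that the record actually satisfies the schema's required fields, so a miss in the agent's output reflects the skill, not a gap in the input. A minimal stdlib-only sketch (the schema and sample record here are hypothetical):

```python
import json

# Hypothetical minimal customer schema in the style of Step 2.
schema = json.loads("""
{
  "type": "object",
  "required": ["id", "name", "tier", "contract_start"],
  "properties": {
    "id": {"type": "string", "description": "Internal customer ID"},
    "name": {"type": "string", "description": "Legal company name"},
    "tier": {"type": "string", "enum": ["standard", "premium", "enterprise"]},
    "contract_start": {"type": "string", "description": "ISO 8601 date"}
  }
}
""")

def missing_fields(record, schema):
    """Return required fields absent from the record."""
    return [f for f in schema["required"] if f not in record]

sample = {"id": "C-1042", "name": "Acme Corp", "tier": "premium"}
gaps = missing_fields(sample, schema)  # ["contract_start"]
```

For anything beyond a smoke test, a full JSON Schema validator gives you type and enum checks too; this sketch only covers presence of required fields.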

Next week: Add two more. Pick two more processes from the same domain. Write their schemas and skills. You now have a small but functional context layer. Notice how the second and third skills are faster to write. They reference the same schemas, the same context patterns, the same organizational knowledge. That’s the compounding effect starting.

By the end of the month: You’ll have 10-15 schemas and skills covering your primary business domain. Your agents will handle routine decisions with 90%+ accuracy. You’ll have a clear view of which processes to encode next. And you’ll understand, from direct experience rather than theory, why context is the real skill, not prompt engineering.

The organizations that win with AI aren’t the ones with the best models or the biggest budgets. They’re the ones that structured their knowledge first. Business-as-Code is how you do that. The guide you just read is the playbook. The only thing left is to start.


Frequently Asked Questions

How long does it take to implement Business-as-Code?

A basic implementation (core entity schemas and 5-10 skills) can be done in 1-2 weeks. A full implementation covering all business functions typically takes 4-6 weeks with an embedded team. Most organizations reach Escape Velocity (self-sustaining improvement) in 2-3 months.

Do I need AI expertise to implement this?

You need someone who understands your business processes and can work with structured data formats like JSON and markdown. You don't need ML expertise. The best Business-as-Code implementers are operations leaders who learn the technical format, not technologists who try to learn the business.

What's the first step?

A knowledge audit. Identify the tribal knowledge, business rules, and decision processes that currently live in people's heads. Map who knows what, how decisions get made, and where exceptions live. Then prioritize which processes to encode first based on frequency, impact, and structurability.

Can I use Business-as-Code with any AI model?

Yes. Schemas are JSON, skills are markdown, context is structured text. Any LLM-based agent can consume these formats. The methodology is model-agnostic. It works with GPT-4, Claude, Gemini, open-source models, or whatever comes next. The value is in the structured knowledge, not the model.

What tools do I need?

A way to define schemas (JSON Schema), a way to write skills (structured markdown), and a way to version them (git). That's it. NimbleBrain uses Upjack for declarative app definitions and JSON schemas hosted at schemas.nimblebrain.ai, but the principles work with any structured format. The hard part is the knowledge work, not the tooling.

How is this different from documenting our processes?

Process documentation describes how things work for humans. Business-as-Code encodes how things work in formats that AI agents can act on. A process doc says “qualified leads meet these criteria.” A skill defines the criteria, specifies the data sources to check, describes the scoring logic, and defines the output format, so an agent can actually do the qualification.

What if our business processes change frequently?

That's exactly why Business-as-Code uses version-controlled files instead of hardcoded logic. When a process changes, you update the skill. Every agent that references that skill immediately operates on the new rules. Compare this to changing hardcoded business logic scattered across multiple codebases.

How do I know if a process is a good candidate for encoding?

Good candidates have three properties: the decision logic can be described in natural language, the inputs and outputs are structured or can be structured, and a human currently spends meaningful time on the process. The knowledge audit evaluates every candidate against these criteria.
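One simple way to turn those three properties into a ranking is a multiplicative score, so a process that is weak on any one criterion drops fast. The candidate names and 1-5 scores below are hypothetical; this is a sketch of the prioritization logic, not a prescribed formula.

```python
# Hypothetical 1-5 audit scores for each candidate process.
candidates = {
    "lead qualification":   {"frequency": 5, "impact": 4, "structurability": 5},
    "contract negotiation": {"frequency": 2, "impact": 5, "structurability": 2},
    "invoice triage":       {"frequency": 4, "impact": 3, "structurability": 4},
}

def encoding_priority(scores):
    # Multiplicative, not additive: one weak criterion sinks the score.
    return scores["frequency"] * scores["impact"] * scores["structurability"]

ranked = sorted(candidates,
                key=lambda name: encoding_priority(candidates[name]),
                reverse=True)
# ranked[0] == "lead qualification"
```

Contract negotiation scores high on impact but low on frequency and structurability, so it ranks last, which matches the intuition that rare, unstructured judgment calls are poor first candidates for encoding.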

What's the ROI of Business-as-Code?

A typical implementation costs 80-120 hours of knowledge work upfront. The context layer is durable across model updates, reusable across every agent and use case, and compounds as you add to it. Within 6 months, organizations see 3-4x efficiency gains on automated processes. One engagement reduced a 4.2-hour daily process to 18 minutes.

Can I start small and scale later?

That's the recommended approach. Start with 5 core entity schemas and 10 skills covering your highest-volume workflow. Deploy one agent. Let the Recursive Loop surface what to encode next. Each cycle builds on the previous one, new skills reference existing schemas, new agents inherit context from the layer you've already built. This is how compound improvement works.


Ready to put this into practice?

Email us directly: hello@nimblebrain.ai