Tag: concept

  • Contract-First Prompting

    Transcript of a chat with Perplexity AI on 4 August 2025. Scope:

    • What is contract-first prompting?
    • How can we achieve compliance with the contract?
    • Glossary
    • What does the Prompt Contract comprise?

    The AI cited a range of sources which are not listed here.


    Alec

    I’m reading about contract-first prompting. Is this a well-known idea?

    Perplexity

    Contract-first prompting is an emerging approach in the field of prompt engineering for large language models (LLMs), but it is not yet a broadly established or widely recognized term in the AI community. The concept centers on treating the prompt as a contract: before the LLM begins generating output, both the human and the model arrive at a clear, mutual understanding of the task, success criteria, requirements, and constraints—essentially simulating the way engineers or teams agree on specifications before work begins.

    The main advocate for this method, as of August 2025, appears to be Nate B. Jones, who describes contract-first prompting as a way to overcome mismatches in intent that often lead to prompt failures. His process involves explicitly communicating the mission, listing out unknowns, having the LLM ask clarifying questions until it is highly confident it understands, and then “locking” the agreement before any generative work starts. This ensures a precision in the model’s output that typical, single-shot instructions often lack.

    Key features of contract-first prompting include:

    • Explicit Agreement: Agreed-upon goals, constraints, and deliverables set before generation begins.
    • Clarification Loop: The LLM systematically asks questions about any gaps or ambiguities until it reaches a high confidence threshold (often cited as 95%); a code sketch of this loop follows the list.
    • Echo Check: Before proceeding, the model summarizes (in a single crisp sentence) the mission to confirm everyone is aligned.
    • Token Efficiency: The approach aims for clarity without unnecessarily long prompts, relying instead on a structured negotiation for understanding.
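
    By way of illustration, here is a minimal Python sketch of that clarification-and-lock loop. The ask_llm helper, the CONFIDENT marker, the console dialogue, and the round budget are assumptions made for the sketch, not details of Jones’ method; ask_llm would be wired to whichever model API you use.

      # Minimal sketch of the contract-first clarification loop (assumptions noted above).

      def ask_llm(messages: list[dict]) -> str:
          """Hypothetical helper: send chat messages to an LLM and return its reply text."""
          raise NotImplementedError("wire this to your chosen LLM provider")

      def negotiate_contract(mission: str, threshold: int = 95, max_rounds: int = 5) -> str:
          """Run the clarification loop, then the echo check; return the locked contract summary."""
          messages = [
              {"role": "system", "content": (
                  "Before doing any work, ask clarifying questions about the mission. "
                  f"When you are at least {threshold}% confident you understand it, "
                  "reply with the word CONFIDENT followed by a one-sentence echo of the mission."
              )},
              {"role": "user", "content": mission},
          ]
          for _ in range(max_rounds):  # cap the loop so negotiation cannot run forever
              reply = ask_llm(messages)
              messages.append({"role": "assistant", "content": reply})
              if reply.startswith("CONFIDENT"):
                  echo = reply.removeprefix("CONFIDENT").strip()
                  if input(f"Echo check: {echo}\nLock this contract? [y/n] ").lower() == "y":
                      return echo  # contract locked; generation may begin
                  messages.append({"role": "user", "content": "Not quite: " + input("Correction: ")})
              else:
                  messages.append({"role": "user", "content": input(f"{reply}\nYour answer: ")})
          raise RuntimeError("could not reach agreement within the round budget")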

    While the term “contract-first” has broader use in software (such as API design) and business contract processes, its specific application to prompting LLMs is recent and is only beginning to gain attention outside of Jones’ blog, newsletter, and talks. There are no signs (as of August 2025) that this is a standard technique in mainstream AI literature, though it is being applied practically and seems to be gaining traction among advanced prompt engineers seeking more reliable, aligned outputs from AI models.

    In summary: Contract-first prompting is a new, not-yet-mainstream idea in LLM prompting, but its principles are resonating with users frustrated by intent failures and iterative prompt rewriting. Its adoption and recognition may grow as more practitioners experiment with the method and share results.


    Alec

    Having locked the agreement – the contract – how is compliance achieved?

    Perplexity

    Compliance in contract-first prompting—after “locking” the agreement—relies on both process and mechanisms designed to minimize misunderstanding and enforce the original intent as the model generates outputs.

    Here’s how compliance is typically achieved:

    1. Mutually Verified Understanding—Before Execution

    • Before any generative task begins, both the human user and the LLM go through a structured clarification phase. The LLM actively interrogates gaps or ambiguities, asking targeted questions until a high confidence threshold (such as 95% clarity) is met.
    • An “echo check” is performed, where the model summarizes the mission and requirements in a single definitive sentence. If this summary matches the human’s expectations, the contract is “locked,” making intent explicit and verifiable.

    2. Process-Driven Output Generation

    • Once agreement is locked, the LLM proceeds to generate according to the contract. The steps and constraints defined upfront become the rubric against which the output will be compared.
    • The contract includes not just the end goals, but also deliverables, must-hit facts, hardest constraints, and even “guardrails” for creativity, tone, or style—allowing for precise post-generation comparison of what was promised versus what was delivered.

    3. Internal Self-Testing and Critique

    • Advanced workflows employ “self-testing” or “self-critique” phases, where the LLM is asked to evaluate its output against the agreed contract. This can involve the model itself (or a second critique pass) systematically checking for alignment with requirements, completeness, and accuracy—much as code is tested against a written specification.
    • This pass may be repeated, with a cap on retries to avoid an endless critique loop (see the sketch below).
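
    As a rough illustration, here is one way such a capped self-critique pass might be scripted, reusing the hypothetical ask_llm helper from the earlier sketch. The PASS reply convention and the retry budget are likewise assumptions, not a prescribed interface.

      def generate_with_self_critique(contract: str, max_retries: int = 2) -> str:
          """Draft against the locked contract, grade the draft, and retry a bounded number of times."""
          draft = ask_llm([{"role": "user", "content": f"Fulfil this contract:\n{contract}"}])
          for _ in range(max_retries):  # retry budget prevents an endless critique loop
              critique = ask_llm([{"role": "user", "content": (
                  "Check the draft below against every requirement in the contract. "
                  "Reply PASS if all are met, otherwise list each breach.\n\n"
                  f"CONTRACT:\n{contract}\n\nDRAFT:\n{draft}"
              )}])
              if critique.strip().startswith("PASS"):
                  return draft  # the model judges its own output compliant with the contract
              # Feed the listed breaches back in and revise, rather than starting from scratch.
              draft = ask_llm([{"role": "user", "content": (
                  f"Revise the draft to fix these breaches:\n{critique}\n\n"
                  f"CONTRACT:\n{contract}\n\nDRAFT:\n{draft}"
              )}])
          return draft  # budget exhausted; hand the draft back for human review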

    4. Human Oversight and Feedback

    • If the output fails to satisfy the locked contract, a feedback stage allows the user to flag deficiencies, inconsistencies, or missed constraints for another iteration—always referencing the specific terms of the locked agreement as the benchmark for compliance.

    5. Traceability and Amendments

    • The locked contract (“prompt contract”) and any conversation history serve as a transparent audit trail. If outputs diverge or intent drifts over time, the contract can be revisited, amended with new information, and relocked to renew mutual understanding.

    Key Principles for Enforcing Compliance:

    • Explicit rubrics: Every required element and forbidden omission is made explicit.
    • Model-enabled self-verification: The LLM is prompted to “mark its own homework” before delivering.
    • External criticism: Human or model-based critics can review the output, pinpointing contract breaches.
    • Iterative negotiation: When output falls short, the process resumes from the clarification stage rather than starting from scratch.

    In sum: Contract-first prompting doesn’t just minimize ambiguity upfront—it supports compliance by converting vague expectations into actionable, checkable commitments, and by making the verification of those commitments part of the generation and review process itself.

    This “contract discipline” borrows best practices from software engineering and project management, tailored to the specifics of LLM interaction.


    Alec

    Please provide a glossary for the content of this chat.

    Perplexity

    Here’s a glossary for key terms and concepts related to contract-first prompting, as discussed in this conversation:

    Contract-First Prompting
    An approach to prompt engineering where both the user and the language model establish an explicit, mutual understanding of the task, requirements, constraints, and deliverables before generating any outputs. The prompt acts as a “contract” guiding model behavior and evaluation.

    Locked Contract / Agreement
    The state achieved when both the human user and the language model have fully clarified, summarized, and agreed upon the task and its criteria. Generation begins only after this “contract” is accepted by both parties.

    Clarification Loop
    A process in which the language model asks targeted questions about the prompt or assignment until all gaps or ambiguities are resolved. This loop continues until a certain confidence threshold (e.g., 95%) is achieved, ensuring both parties understand exactly what is required.

    Echo Check
    A final summary step, where the language model restates the mission or main goal in a single, explicit sentence to confirm mutual understanding. This is usually the step before locking the contract.

    Compliance
    The act of ensuring that the model’s outputs adhere to the requirements, constraints, and expectations set out in the locked contract. Compliance involves explicit rubrics, self-testing, and human or model review to verify that the “contract” is fully honored.

    Self-Testing / Self-Critique
    A process where the language model evaluates its own output versus the locked contract, checking for alignment on deliverables, content, tone, and constraints. This may involve specific prompt instructions for the model to “mark its own homework”.

    Rubric
    A set of explicit, measurable criteria used to judge whether the model’s output meets the contract. Rubrics can cover required facts, tone, structure, style, and any “must-not” constraints.

    Guardrails
    Rules or boundaries in the contract that keep output within the intended scope, such as stylistic requirements, forbidden content, or strict adherence to facts.

    Iterative Negotiation
    The process of returning to the clarification loop or contract stage when outputs fail to comply, allowing for further refinement and amendments rather than starting over from scratch.

    Prompt Contract
    Another term for the locked prompt; the full, mutually agreed prompt specification defining the work, constraints, and criteria.

    Audit Trail
    The conversation history, including contract negotiations, clarifications, and amendments. This record provides transparency for how compliance and intent alignment were achieved.

    Confidence Threshold
    A percentage (such as 95%) representing the degree of certainty the model must reach before locking the contract. Achieved through questions and summarization steps.

    Token Efficiency
    The principle of using prompt and process structure (such as contract-first negotiation) to achieve clarity and compliance without relying on excessively long or verbose prompts.

    These terms form the core vocabulary for understanding and using contract-first prompting as a discipline in language model interactions.


    Alec

    What does the Prompt Contract comprise, in this approach to working with an LLM?

    Perplexity

    A Prompt Contract, in contract-first prompting for large language models (LLMs), is a structured, mutually agreed specification between the user and the model that defines exactly what is expected—turning vague intentions into an explicit, checkable work order before any output is generated. Its main components include:

    • Mission Statement/Task Description:
      A concise summary of the specific task or objective the model is asked to accomplish. This is the “headline” of the work assignment and sets the frame for all further instructions.
    • Deliverables:
      Explicit details about the output format, length, style, structure, and any other properties the response must have. For example, “a 500-word summary,” “Python code with comments,” or “a list of five strategic recommendations.”
    • Must-Include Facts and Requirements:
      Any key pieces of information, facts, or arguments that must appear in the output. These are non-negotiable and serve as the backbone for measuring compliance.
    • Constraints (Hardest Constraints and Guardrails):
      Limitations or boundaries the model should not cross—such as “do not speculate,” “no first-person language,” “avoid repetition,” or “never reference X.” Guardrails may also cover tone, audience, or domain-specific restrictions.
    • Success Criteria/Rubric:
      Criteria or checklist for what constitutes a “successful” output. This could involve accuracy, completeness, style, tone, or inclusion of certain sections. These criteria become the basis for self-testing or human review of the model’s output.
    • Clarified Unknowns and Assumptions:
      List of any remaining ambiguities, potential edge cases, or unstated assumptions. These are surfaced and explicitly agreed upon through a clarification loop before locking the contract.
    • Echo Check/Summary Statement:
      The language model returns a single, crisp summary statement of the agreed task and constraints. The user confirms this matches their intent—at which point, the “contract” is locked and execution (output generation) can begin.

    In practice, the Prompt Contract acts as both a plan and a testable checklist. If model output deviates or is incomplete, parties return to the contract to clarify or amend terms, ensuring high accuracy and alignment. This disciplined, engineering-inspired approach is especially useful for high-stakes, complex, or collaborative tasks, reducing intent gaps and making model behavior more predictable and auditable.
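
    For a concrete picture, the components above can be captured in a small data structure that doubles as the checklist the self-testing pass grades against. This is an illustrative sketch only; the field names are assumptions, not a published schema.

      from dataclasses import dataclass, field

      @dataclass
      class PromptContract:
          """Illustrative shape of a locked Prompt Contract (field names are assumptions)."""
          mission: str                                                # mission statement / task description
          deliverables: list[str]                                     # output format, length, style, structure
          must_include: list[str] = field(default_factory=list)      # non-negotiable facts and requirements
          constraints: list[str] = field(default_factory=list)       # hard constraints and guardrails
          success_criteria: list[str] = field(default_factory=list)  # rubric for self-testing and review
          assumptions: list[str] = field(default_factory=list)       # clarified unknowns and assumptions
          echo_summary: str = ""                                      # the one-sentence echo check
          locked: bool = False                                        # set True once the user confirms the echo

    Serialised (as JSON, say) and placed at the top of a conversation, such a structure gives both parties a single explicit reference point for the echo check and for later compliance review.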


  • The concept

    Purpose: To model and investigate how non-technical people can make good use of general-purpose AI in their work, using experimentation to understand the strengths and limitations of current AI tools.

    Why does this matter? AI is now widely available, but there’s a credibility gap between hype and reality. Many people are unsure how to use AI effectively.

    What is Anapoly AI Labs? Not a research lab, nor a tech incubator. A collection of small, hands-on labs simulating real-world contexts to explore the practical use of general-purpose AI tools.

    How it works

    A lab is a simulated workspace: a model of an office or home environment, set up to reflect the tasks and tools typical of a real working situation. It is equipped with one or more PCs and other internet-connected devices.

    For some labs, the devices are physically co-located in one office, together with a large, touchscreen display. This setup is designed for when we want better interaction through face-to-face contact and shared viewing of experiments. In other labs, the devices may be distributed over two or more locations for remote working.

    For all labs, digital files are held in cloud storage. Standard software such as Microsoft Office is used to create and edit documents, manage data, communicate by email, and support typical workflows. General-purpose AI tools like ChatGPT, Perplexity, and NotebookLM are accessed online.

    The participants in a lab carry out realistic tasks in a simulated working context – researching a topic, drafting a proposal, analysing correspondence, writing a report – just as they might in their professional life.

    To create a lab, we configure the physical and digital parts to suit its purpose. This involves connecting the equipment to a dedicated area of file storage whose content is tailored to the work context being modelled by that lab. Thus all documents, data, and outputs in a lab are context-specific and separate from those in other labs.

    What Makes It Different? This isn’t a course, a product demo, or a sales pitch. It’s a testbed. The emphasis is practical: hands-on exploration of what general-purpose AI tools can and can’t do when pointed at everyday work.

    Intended audience: curious professionals, small business owners, writers, and community actors – anyone who works with words, data, or decisions.

    Mode of Operation: Small, in-person, hands-on sessions. Sometimes co-located, otherwise working remotely.

    Outcomes: Better understanding of what AI can and cannot do in everyday contexts. A growing library of real examples and honest reflections. A trusted local presence in the AI literacy landscape.

    Founders’ position: Experienced, local professionals not selling AI services but exploring their use. Not trying to be experts, but honest, curious testers of what’s actually useful. Hoping to pass on the baton to a younger team.

  • A pivot

    Our initial idea, prompted by Kamil Banc’s writing on practical AI use, was to run a small, local club. Somewhere people like us could meet in person, experiment with ChatGPT, and see what we could actually do with it. A “non-threatening, friendly environment,” we called it at the time.

    But the concept developed, and the name seemed too cosy. A reference to Google Labs brought up the idea of a lab as a place to experiment with tools and ideas. This resonated, so we pivoted to thinking of ourselves not as conveners of a club but as facilitators of a sandbox: a safe space to try things out and see what works.

    Our sandbox would be friendly and exploratory, but with a clear purpose: to model the use of general-purpose AI tools in everyday working environments. It would enable a number of labs, each modelling a different working situation, where we could try things out, see what helps and what doesn’t, and work out how to get better results.

    Hence Anapoly AI Labs: one sandbox, many lab setups.


    sandbox: a safe play area where computer programs can be used without affecting the operational system; useful for experimenting with or testing new software.

  • The idea

    The idea for Anapoly AI Labs began with a newsletter.

    Kamil Banc wrote a thought-provoking piece on Substack under the title “My Top 10 ChatGPT Features That Actually Matter At Work”. Early in the piece he wrote:

    “Most professionals approach ChatGPT like tourists at an all-you-can-eat buffet. They sample everything, master nothing, and walk away wondering why they feel unsatisfied. The harsh truth? Not all ChatGPT features deliver equal career value. While everyone else is busy playing with voice demos and testing the latest gimmick, the quiet power users are building workflows that make their bosses take notice.”

    I was in danger of falling into the “buffet” approach, too. But I had some points of focus. I am writing stories from my family history, and from time to time I have needed to research how to produce specialist documents, for example a RAMS: a Risk Assessment and Method Statement for some building maintenance work. These tasks needed more than “browsing the buffet”, so I began to think seriously about, and to research, how to get the best value out of ChatGPT.

    Dennis had been experimenting too, and Kamil Banc’s assertion resonated with us both. We felt sure that many people must be like us: interested in using AI, but uncertain how to do so effectively.

    We also felt that many people, rather than take one of the many online courses or get bogged down in YouTube, would prefer to try AI in a more social environment.

    Hence the thought: what if we ran a small club – get a few like-minded people in a room from time to time, talk about the real tasks AI can help us with, try out a small number of AI tools, and compare notes? That was the seed of the idea.