Purpose of experiment
Use ChatGPT as a thinking partner to create a first draft of an acclimatisation lab setup for Anapoly AI Labs and to compare output quality between two models (o3 and 4o).
Author and date
Alec Fearon, 24 June 2025
Participants
- Alec Fearon – experiment lead
- ChatGPT model o3 – reasoning model
- ChatGPT model 4o – comparison model
Lab configuration and setup
Interaction took place entirely in the ChatGPT Document Workbench. The initial prompt was duplicated into two branches, each tied to a specific model. All reference files (conceptual framework, lab note structure) were pre‑uploaded to the project space.
Preamble
Alec wanted a concise, critical first draft to stimulate team discussion. The exercise also served as a live test of whether o3’s “reasoning” advantage produced materially better drafts than the general‑purpose 4o model.
Procedure
- Alec issued the same initial prompt in two branches, one running on o3 and the other on 4o.
- Each model produced a single draft lab setup; model 4o was not used beyond this initial output.
- Alec compared the drafts and judged o3’s version clearer and more usable.
- Alec then asked o3 to write a lab note using the standard structure; o3 misinterpreted the brief and drafted a note describing the future lab rather than the chat session.
- Alec clarified the requirement: the lab note should document this chat session.
- This note was created to fulfil the clarified brief.
Findings
Model o3 delivered a structured, audience‑appropriate draft that mapped cleanly to the conceptual framework.
Model 4o’s output was markedly inferior: it was longer, less focused, and ignored some of the constraints (tone, brevity).
Branching is a quick way to compare model behaviour without leaving the chat environment.
Discussion of findings
The reasoning bias in o3 appears helpful for tasks needing structure and adherence to user tone; 4o may still suit other contexts but underperformed here. Clear instructions and a shared reference framework improved both models’ relevance by sharply narrowing the space of acceptable answers, reducing guesswork, and aligning the generated structure with Anapoly’s nine‑component conceptual model. In practice, both drafts mirrored the framework’s headings and language: o3 reproduced them cleanly, while 4o, though weaker overall, broadly followed the required sections.
While editing the discussion of findings under Alec’s guidance, the AI (ChatGPT o3) referred to earlier ad‑hoc tests that were an invention on its part; no such tests had taken place.
Conclusions
o3’s structured reasoning wins – For tasks demanding tight alignment with a predefined framework and disciplined tone, o3 delivered a coherent nine‑section outline with under 5 % irrelevant content. 4o missed two framework elements and introduced roughly 20 % filler.
Branch testing is low‑cost, high‑yield – Running the same prompt in parallel added about three minutes but produced decisive evidence for model selection. This side‑by‑side method is worth standardising as a quick QA step.
Scaffolding curbs hallucination – The explicit conceptual framework and tone constraints kept both models on track, suggesting that well‑built prompt scaffolding is a primary driver of reliability regardless of model choice, although the invented ad‑hoc tests noted above show it does not eliminate hallucination entirely.
Productivity impact – The refined o3 draft is immediately usable for team critique, saving a substantial amount of time on manual outline work and letting facilitators focus on higher‑order thinking.
Recommendations
Keep using o3 for first‑pass structured drafts until 4o catches up in tone control.
Continue branch testing when working on different types of task, so that the best model can be chosen for the task in hand; a scripted version of this comparison is sketched after this list.
Log model choice and outcome in future lab notes for transparency.
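For teams that want to make branch testing a routine QA step outside the chat interface, the sketch below shows one possible way to script a side‑by‑side comparison. It was not part of this session: the model identifiers, prompt text, and output filenames are assumptions, and it presumes access to the OpenAI Python API with an API key set in the environment.

```python
# Hypothetical sketch (not part of the session): send the same prompt to two
# models via the OpenAI API and save both drafts for side-by-side review.
# Model identifiers, prompt text, and filenames are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Draft a first-pass acclimatisation lab setup for Anapoly AI Labs, "
    "following the attached conceptual framework; keep the tone concise and critical."
)
MODELS = ["o3", "gpt-4o"]  # assumed API names for the two models compared

for model in MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    draft = response.choices[0].message.content
    # Save each draft to its own file for manual comparison.
    with open(f"draft_{model}.md", "w", encoding="utf-8") as f:
        f.write(draft)
    print(f"{model}: {len(draft.split())} words")
```

Reviewing the two saved drafts side by side would give the same evidence base for model selection as the in‑chat branching used in this session.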
Tags
lab‑setup, model‑comparison, chat‑session, AI‑tools
Glossary
o3 – OpenAI reasoning model used in this session.
4o – OpenAI general‑purpose model (GPT‑4o) used for comparison.
Branching – duplicating a prompt to test different AI models side by side.
Document Workbench – ChatGPT interface where canvas documents are edited.