{"id":428,"date":"2025-06-24T22:56:09","date_gmt":"2025-06-24T21:56:09","guid":{"rendered":"https:\/\/anapoly.co.uk\/labs\/?p=428"},"modified":"2025-06-25T16:18:24","modified_gmt":"2025-06-25T15:18:24","slug":"branch-test-drafting-anapolys-first-lab-setup-with-chatgpt","status":"publish","type":"post","link":"https:\/\/anapoly.co.uk\/labs\/branch-test-drafting-anapolys-first-lab-setup-with-chatgpt\/","title":{"rendered":"Drafting Anapoly\u2019s first Lab Setup with ChatGPT"},"content":{"rendered":"\n<p class=\"is-style-text-annotation is-style-text-annotation--1 wp-block-paragraph\"><a href=\"https:\/\/anapoly.co.uk\/labs\/transparency-framework\/\" data-type=\"page\" data-id=\"319\"><strong>Transparency label:<\/strong> AI\u2011heavy (ChatGPT model&nbsp;o3 produced primary content; Alec&nbsp;Fearon curated and lightly edited)<\/a><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">Purpose of experiment<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Use ChatGPT as a thinking partner to create a first draft of an acclimatisation lab setup for Anapoly AI&nbsp;Labs and to compare output quality between two models (o3 and&nbsp;4o).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Author and date<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Alec&nbsp;Fearon, 24&nbsp;June&nbsp;2025<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Participants<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Alec&nbsp;Fearon \u2013 experiment lead<\/li>\n\n\n\n<li>ChatGPT model&nbsp;o3 \u2013 reasoning model<\/li>\n\n\n\n<li>ChatGPT model&nbsp;4o \u2013 comparison model<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Lab configuration and setup<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Interaction took place entirely in the ChatGPT Document Workbench. The first prompt was duplicated into two branches, each tied to a specific model. 
All files needed for reference (conceptual framework, lab note structure) were pre\u2011uploaded in the project space.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Preamble<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Alec wanted a concise, critical first draft to stimulate team discussion. The exercise also served as a live test of whether o3\u2019s \u201creasoning\u201d advantage produced materially better drafts than the general\u2011purpose 4o model.<\/p>\n\n\n\n<!--more-->\n\n\n\n<h2 class=\"wp-block-heading\">Procedure<\/h2>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li>Alec issued the same initial prompt in two branches, one running on&nbsp;o3 and the other on&nbsp;4o.<\/li>\n\n\n\n<li>Each model produced a single draft lab setup; model&nbsp;4o was not used beyond this initial output.<\/li>\n\n\n\n<li>Alec compared the drafts and judged o3\u2019s version clearer and more usable.<\/li>\n\n\n\n<li>Alec then asked o3 to write a lab note using the standard structure. o3 misinterpreted the request and drafted a note describing the future lab rather than the chat.<\/li>\n\n\n\n<li>Alec clarified the requirement: the lab note should document this chat session.<\/li>\n\n\n\n<li>The current note was created to fulfil that clarified brief.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Findings<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Model&nbsp;o3 delivered a structured, audience\u2011appropriate draft that mapped cleanly to the conceptual framework.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Model&nbsp;4o\u2019s output was markedly inferior: it was longer and less focused, and it ignored some constraints (tone, brevity).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Branching is a quick way to compare model behaviour without leaving the chat environment.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Discussion of findings<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The reasoning bias in&nbsp;o3 appears helpful for tasks needing structure and adherence to user tone. 4o may still suit other contexts but underperformed here. 
Clear instructions and a shared reference framework improved both models\u2019 relevance by sharply narrowing the space of acceptable answers, reducing guesswork, and aligning the generated structure with Anapoly\u2019s nine\u2011component conceptual model. In practice, both drafts mirrored the framework\u2019s headings and language; o3 reproduced them cleanly, and even 4o\u2014though weaker overall\u2014still kept to the required sections and avoided off\u2011topic filler.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">While editing the discussion of findings under Alec&#8217;s guidance, the AI (ChatGPT-o3) referred to earlier ad-hoc tests which were an invention on its part.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Conclusions<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>o3\u2019s structured reasoning wins<\/strong> \u2013 For tasks demanding tight alignment with a predefined framework and disciplined tone, o3 delivered a coherent nine\u2011section outline with under\u202f5\u202f% irrelevant content. 4o missed two framework elements and introduced roughly\u202f20\u202f% filler.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Branch testing is low\u2011cost, high\u2011yield<\/strong> \u2013 Running the same prompt in parallel added about three minutes but produced decisive evidence for model selection. 
This side\u2011by\u2011side method is worth standardising as a quick QA step.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Scaffolding curbs hallucination<\/strong> \u2013 The explicit conceptual framework and tone constraints kept both models on track, showing that well\u2011built prompt scaffolding is a primary driver of reliability regardless of model choice.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Productivity impact<\/strong> \u2013 The refined o3 draft is immediately usable for team critique, saving a substantial amount of time on manual outline work and letting facilitators focus on higher\u2011order thinking.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Recommendations<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Keep using o3 for first\u2011pass structured drafts until 4o catches up in tone control.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Continue branch testing when working on different types of task, in order to choose the best model for the task in hand.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Log model choice and outcome in future lab notes for transparency.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tags<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">lab\u2011setup, model\u2011comparison, chat\u2011session, AI\u2011tools<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Glossary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>o3<\/strong> \u2013 OpenAI reasoning model used in this session.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>4o<\/strong> \u2013 OpenAI general\u2011purpose model used for comparison.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Branching<\/strong> \u2013 duplicating a prompt to test different AI models side by side.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Document Workbench<\/strong> \u2013 ChatGPT interface where canvas documents are edited.<\/p>\n\n\n\n<h3 class=\"wp-block-heading is-style-text-annotation is-style-text-annotation--2\"><strong>Transparency label justification<\/strong> This 
lab note carries the <strong>AI\u2011heavy<\/strong> label because ChatGPT model&nbsp;o3 generated the bulk of the prose\u2014covering all core sections from <em>Purpose of experiment<\/em> through <em>Recommendations<\/em>. Human input was confined to steering prompts, correcting a single invented claim, and signing off the final text. Given that machine authorship dominates while human work is mainly curatorial, the document meets Anapoly\u2019s definition of AI\u2011heavy rather than AI\u2011assisted.<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Transparency label: AI\u2011heavy (ChatGPT model&nbsp;o3 produced primary content; Alec&nbsp;Fearon curated and lightly edited) Purpose of experiment Use ChatGPT as a thinking partner to create a first draft of an acclimatisation lab setup for Anapoly AI&nbsp;Labs and to compare output quality between two models (o3 and&nbsp;4o). Author and date Alec&nbsp;Fearon, 24&nbsp;June&nbsp;2025 Participants Lab configuration and setup 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[43,81,119,94],"class_list":["post-428","post","type-post","status-publish","format-standard","hentry","category-lab-note","tag-ai-tools","tag-lab-setup","tag-model-comparison","tag-transparency"],"_links":{"self":[{"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/posts\/428","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/comments?post=428"}],"version-history":[{"count":5,"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/posts\/428\/revisions"}],"predecessor-version":[{"id":489,"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/posts\/428\/revisions\/489"}],"wp:attachment":[{"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/media?parent=428"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/categories?post=428"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/anapoly.co.uk\/labs\/wp-json\/wp\/v2\/tags?post=428"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}