
Research · April 2026 · 6 min read

How prompt structure changes model behaviour.

Three findings from the prompting literature, plus one recent survey that catalogues everything since. We summarise what each measured, what it implies for the prompts you write, and where the evidence stops.

Pergamum Research · 29 April 2026

01 · The cost of vague

Vague prompts make the model guess.

Every round-trip to a language model costs time and tokens. When the input is ambiguous, the model fills in the gaps with whatever the average request looks like — not yours. The fix isn't a longer back-and-forth. It's a better first message.

Here's the difference, made visible. Same model, same task. Each prompt was used as the entire input, and the output shown after it is what came back.

Vague prompt
Write a marketing email for our new feature.

Model output

Subject: Exciting new feature!
Hi there,
We're thrilled to announce a powerful new feature designed to help you achieve more, faster. With our latest update, you'll unlock new capabilities that streamline your workflow and elevate your results. Whether you're a seasoned pro or just getting started, this feature is built with you in mind. Try it today and let us know what you think!
Best,
The team

Generic, padded, ends with a non-CTA. Twenty minutes from a usable email.

Structured prompt
You are a B2B SaaS copywriter who specialises in product launches.

Context: We are launching a collaborative prompt library for teams.
The audience is growth marketers at 50–500 person software companies.
They care about saving time and repeatable results.

Task: Write a launch announcement email for our new shared prompt library.

Constraints:
- Subject line under 50 characters
- Body under 180 words
- No jargon ("synergy", "leverage", "game-changer")
- End with exactly one CTA linking to the product page

Output format: Plain text. Subject line first, then body.

Model output

Subject: Stop pasting prompts. Save them.
For growth marketers running 50–500 person teams: we just shipped shared prompt libraries. If your team copy-pastes the same twelve prompts into ChatGPT every week, you can now save them once, run them with variables, and share them with anyone you invite. No more Notion-doc maintenance. No more Slack-pin archaeology. Five minutes to set up. Free for the first three seats on your team.
[Try shared prompts →]

Specific subject, clear audience, ends with one CTA. Send-ready.

Outputs are illustrative — produced by Claude Sonnet on default settings. Yours will differ.
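Constraints written this concretely can also be checked mechanically before anyone reads the draft. The following is a minimal Python sketch, not part of any tool: `check_email` is a hypothetical helper, and it assumes the output is plain text with a `Subject:` line first and a CTA written in square brackets, as in the illustrative output above.

```python
import re

BANNED = ("synergy", "leverage", "game-changer")  # jargon list from the prompt


def check_email(text: str) -> list[str]:
    """Return a list of constraint violations (empty means the draft passes)."""
    subject_line, _, body = text.partition("\n")
    problems = []

    if not subject_line.startswith("Subject:"):
        problems.append("subject line must come first")
    subject = subject_line.removeprefix("Subject:").strip()
    if len(subject) >= 50:
        problems.append("subject is 50+ characters")

    if len(body.split()) >= 180:
        problems.append("body is 180+ words")

    lowered = text.lower()
    for word in BANNED:
        if word in lowered:
            problems.append(f"jargon: {word}")

    # Assumption: a CTA is any [bracketed] span, as in the sample output.
    ctas = re.findall(r"\[[^\]]+\]", body)
    if len(ctas) != 1:
        problems.append(f"expected 1 CTA, found {len(ctas)}")
    elif not body.rstrip().endswith(ctas[0]):
        problems.append("CTA is not at the end")

    return problems


draft = (
    "Subject: Stop pasting prompts. Save them.\n"
    "Save prompts once and share them with your team.\n"
    "[Try shared prompts →]"
)
print(check_email(draft))  # prints []
```

A check like this only covers the constraints that are countable. Tone and audience fit still need a human read.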

The structured version takes about thirty seconds longer to write. It typically saves two to four follow-up messages. On any task you do more than once (every task, eventually), that math gets very good very fast.

02 · Anatomy

Five things every good prompt has.

Structure isn't about being long for its own sake. It's about removing whole categories of failure before the model ever sees the question. You don't need every block every time — but you should know which one you're leaving out, and why.

  1. Role / persona

    Sets the model's frame of reference and the knowledge it draws on.

    "You are a senior TypeScript engineer reviewing a pull request."

    Without a role, the model defaults to a generic helpful assistant. With one, it tends to adopt that role's vocabulary, priorities, and level of detail.

  2. Context

    The facts the model needs to avoid guessing them itself.

    "The codebase uses strict ESLint and targets Node 20."

    Missing context is the most common cause of wrong-but-confident answers. Spell out the facts so the model doesn't fill in plausible-sounding ones.

  3. Task

    A single unambiguous instruction. One task per prompt.

    "Refactor the function below to eliminate the nested conditionals."

    Two tasks in one prompt usually means one of them gets a half-effort answer. Split them. Ask one thing at a time, well.

  4. Constraints

    Boundaries that prevent unwanted outputs before they happen.

    "Do not change the function signature or return type."

    Constraints upfront beat corrections after the fact. Every "don't" you list saves one round-trip you would have spent fixing it.

  5. Output format

    Tells the model exactly what shape the answer should take.

    "Return only the refactored function — no explanation."

    If you don't say what you want back, you'll get markdown when you wanted JSON, paragraphs when you wanted bullets, and a polite preface either way.
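One way to make "know which block you're leaving out" concrete is to hold the five blocks as fields and have the code report the empty ones. This is a minimal Python sketch under our own assumptions: the `Prompt` class, its field names, and the plain-text labels are illustrative, not a standard format.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    task: str  # the one block that is always required
    role: str = ""
    context: str = ""
    constraints: list[str] = field(default_factory=list)
    output_format: str = ""

    def missing(self) -> list[str]:
        """Names of the optional blocks left empty, so omissions are deliberate."""
        blocks = {
            "role": self.role,
            "context": self.context,
            "constraints": self.constraints,
            "output_format": self.output_format,
        }
        return [name for name, value in blocks.items() if not value]

    def render(self) -> str:
        """Join the filled-in blocks in a fixed order, separated by blank lines."""
        parts = []
        if self.role:
            parts.append(self.role)
        if self.context:
            parts.append(f"Context: {self.context}")
        parts.append(f"Task: {self.task}")
        if self.constraints:
            bullets = "\n".join(f"- {c}" for c in self.constraints)
            parts.append(f"Constraints:\n{bullets}")
        if self.output_format:
            parts.append(f"Output format: {self.output_format}")
        return "\n\n".join(parts)


review = Prompt(
    role="You are a senior TypeScript engineer reviewing a pull request.",
    context="The codebase uses strict ESLint and targets Node 20.",
    task="Refactor the function below to eliminate the nested conditionals.",
    constraints=["Do not change the function signature or return type."],
)
print(review.missing())  # prints ['output_format']
print(review.render())
```

Here the output-format block was skipped, and `missing()` says so, which turns an accidental omission into a visible choice.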


03 · The research

The numbers don't lie.

Three findings, spanning the foundational papers (2022) through the more recent reasoning techniques (2023). The 2024 Prompt Report [4], a systematic survey from a coalition of researchers across Maryland, OpenAI, Stanford, Microsoft, and others, catalogues 58 text-based prompting techniques across the literature, and structure-sensitive results like the three below recur throughout it.

Chain-of-thought

GSM8K accuracy on PaLM 540B

Adding worked, step-by-step reasoning to the few-shot examples raised accuracy from 17.9% to 56.9% on grade-school math.

Wei et al., 2022 [1]
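In practice, the difference between standard and chain-of-thought few-shot prompting is what each worked example shows before its answer. The Python sketch below is illustrative only: `few_shot_prompt` is a hypothetical helper, the `Q:`/`A:` layout is an assumption, and the exemplar is made up for this article rather than taken from the paper.

```python
def few_shot_prompt(exemplars, question, with_reasoning=True):
    """Build a few-shot prompt from (question, reasoning, answer) tuples.

    with_reasoning=True shows the worked steps before each answer
    (chain-of-thought); False shows the answer alone (standard prompting).
    """
    blocks = []
    for q, reasoning, answer in exemplars:
        if with_reasoning:
            shown = f"{reasoning} The answer is {answer}."
        else:
            shown = f"The answer is {answer}."
        blocks.append(f"Q: {q}\nA: {shown}")
    # The new question ends with an open "A:" for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Made-up exemplar for illustration only.
EXEMPLARS = [
    (
        "A shelf holds 4 boxes with 6 pens each. 5 pens are removed. How many pens remain?",
        "4 boxes of 6 pens is 24 pens. 24 - 5 = 19.",
        "19",
    ),
]
question = "A bus has 12 rows of 4 seats. 9 seats are empty. How many seats are taken?"
print(few_shot_prompt(EXEMPLARS, question))
```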

Tree of Thoughts

Game of 24 success rate, GPT-4

Letting the model explore and prune multiple reasoning paths takes a task GPT-4 solves 4% of the time with chain-of-thought prompting and pushes it to 74%.

Yao et al., 2023 [2]
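The explore-and-prune loop can be sketched as a breadth-first search that keeps only the best few partial solutions at each depth. This Python sketch is a simplification under stated assumptions: in the paper the model itself proposes and rates thoughts, whereas here `propose`, `score`, and `is_solution` are plain stand-in functions, and the toy task (reach 10 from 1 using "+3" or "*2") is ours, not the Game of 24.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def tree_search(
    root: T,
    propose: Callable[[T], List[T]],  # stand-in for a model call suggesting next thoughts
    score: Callable[[T], float],      # stand-in for a model call rating a partial solution
    is_solution: Callable[[T], bool],
    beam_width: int = 5,
    max_depth: int = 3,
) -> Optional[T]:
    """Breadth-first search that keeps the top `beam_width` states per level."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Prune: keep only the most promising partial solutions.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
        if not frontier:
            return None
    return None


# Toy task: a state is (current value, steps taken so far).
def propose(state):
    value, path = state
    return [(value + 3, path + ("+3",)), (value * 2, path + ("*2",))]


def score(state):
    return -abs(10 - state[0])  # closer to 10 is better


def is_solution(state):
    return state[0] == 10


print(tree_search((1, ()), propose, score, is_solution, beam_width=2))
# prints (10, ('+3', '+3', '+3'))
print(tree_search((1, ()), propose, score, is_solution, beam_width=1))
# prints None
```

With a beam of one, the search commits to the path that looks best early (1, 4, 8) and never finds the answer. Keeping two paths alive is enough to solve it, which is the intuition behind exploring several reasoning paths at once.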

Self-consistency

GSM8K — CoT vs. CoT + sampling

Sampling several chain-of-thought attempts and taking the majority answer adds roughly another 18 points on top (56.5% to 74.4% on PaLM 540B).

Wang et al., 2022 [3]
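The voting step is simple enough to sketch in a few lines of Python. The assumptions here are ours: `sample` is a hypothetical callable standing in for one model call at non-zero temperature, and each completion is assumed to end with "The answer is N".

```python
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(completion: str) -> Optional[str]:
    """Pull the final number out of a reasoning chain (assumes 'The answer is N')."""
    match = re.search(r"answer is\s*(-?\d+(?:\.\d+)?)", completion, re.IGNORECASE)
    return match.group(1) if match else None


def self_consistent_answer(sample: Callable[[], str], n: int = 10) -> Optional[str]:
    """Draw n reasoning chains and return the most common final answer."""
    answers = [extract_answer(sample()) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)
    return votes.most_common(1)[0][0] if votes else None


# Canned completions stand in for three sampled model outputs.
canned = iter([
    "24 - 5 = 19. The answer is 19.",
    "4 * 6 = 24, minus 5 is 18. The answer is 18.",
    "Four boxes hold 24 pens; removing 5 leaves 19. The answer is 19.",
])
print(self_consistent_answer(lambda: next(canned), n=3))  # prints 19
```

One chain makes an arithmetic slip, but the vote still lands on the answer that two independent chains agree on.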

These aren't toy effects, and they aren't old news. Similar gains from structure keep appearing in the reasoning, coding, and knowledge evaluations that Anthropic, OpenAI, and DeepMind have continued to publish through 2024 and into 2025, and increasingly in structured-prompting techniques such as the XML-tag prompting Anthropic recommends for Claude, DSPy-style prompt programs, and ReAct-style tool use. Structure remains the cheapest performance lever you have access to: a minute of writing can buy the kind of improvement you would otherwise look for in a larger model.

Sources

  [1] Wei et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arxiv.org
  [2] Yao et al. (2023). Tree of Thoughts: Deliberate Problem Solving with Large Language Models. arxiv.org
  [3] Wang et al. (2022). Self-Consistency Improves Chain of Thought Reasoning in Language Models. arxiv.org
  [4] Schulhoff et al. (2024). The Prompt Report: A Systematic Survey of Prompt Engineering Techniques. arxiv.org

04 · A real prompt, dissected

What it looks like in the wild.

Every block from the previous section, in a single prompt you could drop into your workflow today. The callouts after the prompt map each block to the lines it covers.

You are a UX researcher who specialises in B2B software.

# Context
I conducted 6 customer interviews this week. Each transcript is pasted below,
separated by "---". Participants were asked about their current prompt
workflows and pain points.

# Task
Summarise the interviews into a concise research brief.

# Constraints
- Max 400 words total
- Group findings by theme, not by participant
- Exclude anything mentioned only once (not a pattern)
- Flag any direct quotes worth keeping verbatim

# Output format
Markdown. Top-level heading per theme. Bullet points for findings.
Quotes in > blockquotes. End with a "Key tensions" section.

---
[TRANSCRIPT 1] …

Role: line 1
Context: lines 3–6
Task: lines 8–9
Constraints: lines 11–15
Output format: lines 17–19

Line numbers count every line of the prompt above, blank lines included.

The task itself is the shortest part. That's usually a sign you got the rest right.

What this means for you

Design the prompt once. Run it forever.

The prompts worth designing aren't the one-offs; those are faster to type fresh than to engineer. The ones worth designing are the ones you'll run again next Tuesday, and the Tuesday after that. Two minutes of structure today saves thirty seconds on each of the next fifty runs, about 25 minutes in all. You also don't have to remember the few-shot examples in week three, and your teammate doesn't have to reinvent the prompt.
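The arithmetic behind that claim, as a small Python sketch. The two-minute and thirty-second figures come from the paragraph above; the function names are ours.

```python
import math


def net_seconds_saved(design_s: float, saved_per_run_s: float, runs: int) -> float:
    """Total time saved across all runs, minus the one-off design cost."""
    return saved_per_run_s * runs - design_s


def break_even_runs(design_s: float, saved_per_run_s: float) -> int:
    """Smallest number of runs at which the design time has paid for itself."""
    return math.ceil(design_s / saved_per_run_s)


print(net_seconds_saved(120, 30, 50) / 60)  # prints 23.0 (minutes, net of design time)
print(break_even_runs(120, 30))             # prints 4
```

On these numbers the prompt pays for itself by the fourth run, and everything after that is saved time.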

That's what Pergamum is for. It's a free, open library for the prompts that earn their keep. Every prompt has its variables broken out as fillable inputs, every one is tagged for the model it was tuned on, and every one is yours to copy, fork, and remix.
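To show what "variables broken out as fillable inputs" can look like, here is a minimal sketch using Python's standard `string.Template`. It is not Pergamum's implementation, and the variable names (`domain`, `count`, `max_words`) are made up for illustration.

```python
from string import Template

# A reusable prompt with three fillable variables.
template = Template(
    "You are a UX researcher who specialises in $domain.\n\n"
    "# Task\n"
    "Summarise the $count interviews below into a research brief "
    "of at most $max_words words."
)

# substitute() raises KeyError if any variable is left unfilled,
# so a half-completed prompt never reaches the model.
prompt = template.substitute(domain="B2B software", count=6, max_words=400)
print(prompt)
```

The same template can be rerun next week with a different `count` or `max_words` without retyping the rest of the prompt.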
