
The Difference Between "The Auditor" and "The Teacher" (and Why It Matters for Your Students)

If you've ever pasted an essay into ChatGPT and got harsh, pedantic marking, you've met 'The Auditor'. The fix isn't a better model — it's using the right kind of AI brain for the right job, with a rubric and best-fit judgement.

3 November 2025 • 5 min read
AI in Education • Marking • Feedback • Exam Boards • Teacher Workload • TeachEdge.ai

Quick Summary

  • Raw 'mark this' prompts often produce harsh, confidence-damaging results because the AI behaves like an auditor.
  • Reasoning-style models are great for building rubrics and interpreting rules, but often poor at best-fit marking.
  • Pattern-matching chat models feel more like examiners, but need tight rubric guidance to avoid drifting.
  • TeachEdge combines both: rigour in rubric construction, then human-like marking against that structure.

Summary: If you've ever tried "Mark this out of 25" in ChatGPT and got a brutal score with cold, pedantic feedback, you haven't proved AI can't mark. You've just handed your marking pile to the wrong kind of AI. There are (broadly) two different "brains" under the bonnet, and they behave like two very different members of staff.

The Sunday night experiment we've all tried

We've all been there.

It's Sunday night, you're staring at a pile of Year 12 essays, and you decide to experiment. You copy a student's response, paste it into ChatGPT, and type:

"Mark this out of 25 and give feedback."

The result comes back instantly. The feedback looks clever. The grammar checks are spot on.

But the mark is brutal.

It gives a solid B-grade student a D. It picks holes in arguments that a human examiner would accept as reasonable. It feels pedantic. It feels cold.

You close the tab and think, "Well, AI is not ready for this yet."

In my view, the problem isn't that the models are "bad". The problem is that raw ChatGPT marking is the digital equivalent of handing your marking pile to the wrong member of staff.

At TeachEdge, we've spent over two years living inside this problem. The lesson has been clear:

You can't just ask AI to "mark." You have to assign the right brain to the right job.

The tale of two brains

Under the bonnet, there are broadly two families of AI "brains" you can use. I find it easiest to think of them as staff roles.

1) The Auditor (the ultra-strict Head of Department)

This is the reasoning-style model.

It's trained to work through problems step-by-step. It can be brilliant at maths, coding, and logic puzzles. It's also brilliant at interpreting rules.

But when you use it to mark essays, it often behaves like an ultra-strict, detail-obsessed Head of Department who treats the mark scheme like a legal contract.

If a student implies a point but doesn't spell it out in neat textbook language, The Auditor counts it as "not there".

Why raw AI marking fails here

If you use this "auditor brain" to mark History, English, or Business essays, it can destroy confidence.

It struggles with:

  • best-fit marking (judging the script as a whole)
  • credit for implied understanding
  • nuance in borderline scripts that capture the spirit but aren't perfectly expressed

2) The Teacher (the experienced classroom examiner)

The second family is the general chat/pattern-matching model.

It's extremely good at recognising patterns in how humans write, argue, and respond. It often feels closer to the judgement a real examiner makes.

Think of it as a tired, experienced classroom teacher who's marked hundreds of scripts and knows what a "Level 3 but messy" answer looks like.

When you ask this model to mark, it naturally leans towards:

  • best-fit judgement
  • positive marking (crediting what's there)
  • holistic evaluation (weighing several features together)

Why raw AI marking fails here too

This one feels more human, but without strong guidance it can drift.

It can become:

  • too generous
  • vague or generic
  • inconsistent on assessment objectives
  • wobbly on exam-board nuance

So the "teacher brain" needs a rubric to keep it anchored.

How we use both brains at TeachEdge

The mistake most teachers make is pasting an essay into ChatGPT and hoping it figures out which brain to use.

It usually guesses wrong.

Our solution is to separate the marking process into two distinct steps, assigning each to the specialist best suited for it.

Step 1: The Auditor builds the rubric

We use the strict, reasoning-style model behind the scenes — but we don't let it mark the essay.

Instead, we ask it to interpret the rules:

  • map the exam board criteria
  • encode assessment objectives
  • stress-test the mark scheme
  • define what evidence looks like at each level

It does the heavy lifting of a Head of Department preparing for the academic year.

It ensures rigour.
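To make that concrete, here is a minimal sketch of the kind of structured rubric an "auditor brain" can be asked to produce. The levels, mark bands and descriptors below are invented for illustration; they aren't any exam board's real criteria or TeachEdge's actual schema.

    # A hypothetical rubric an "auditor brain" might produce from a mark scheme.
    # Every level, band and descriptor here is invented for illustration.
    rubric = {
        "question": "Evaluate whether the business should expand overseas. (25 marks)",
        "assessment_objectives": {
            "AO1": "Knowledge of relevant terms and concepts",
            "AO2": "Application to the case study context",
            "AO3": "Analysis of causes and consequences",
            "AO4": "Evaluation leading to a supported judgement",
        },
        "levels": [
            {"level": 1, "marks": "1-6",
             "evidence": "Isolated points with little use of the case material"},
            {"level": 2, "marks": "7-13",
             "evidence": "Developed chains of reasoning applied to the context"},
            {"level": 3, "marks": "14-19",
             "evidence": "Balanced analysis that weighs both sides of the argument"},
            {"level": 4, "marks": "20-25",
             "evidence": "Sustained evaluation and a clearly supported judgement"},
        ],
    }

Once the rules live in a structure like this, the marking step can be pointed at the same criteria every time, which is what keeps it consistent across a class set.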

Step 2: The Teacher marks the essay

Once the rules are set, we hand the student's work to the "teacher brain".

We give it the rigorous rubric created in Step 1, but we instruct it to behave like an experienced examiner:

  • assign a level that reflects overall quality
  • highlight what the student has done well
  • identify the most important improvements for next time

This hybrid approach gives the balance we need:

  • rigour from the mark scheme
  • human judgement in how that scheme is applied under exam conditions
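For the technically curious, here is a rough sketch of that two-step split. The call_model helper, the model names and the prompt wording are placeholders standing in for whichever AI provider you use; this isn't TeachEdge's actual code.

    # Rough sketch of the two-step split described above.
    # call_model() is a stand-in for whichever AI provider/SDK you use;
    # model names and prompt wording are illustrative only.

    def call_model(model: str, prompt: str) -> str:
        """Placeholder: swap in your provider's own API call here."""
        raise NotImplementedError

    def build_rubric(mark_scheme: str, exam_board: str) -> str:
        # Step 1: the "auditor brain" interprets the rules, not the essay.
        prompt = (
            f"You are preparing marking guidance for {exam_board}.\n"
            "Map the assessment objectives in this mark scheme and describe, "
            "level by level, what acceptable evidence looks like:\n\n" + mark_scheme
        )
        return call_model(model="reasoning-model", prompt=prompt)

    def mark_essay(essay: str, rubric: str) -> str:
        # Step 2: the "teacher brain" applies that rubric with best-fit judgement.
        prompt = (
            "You are an experienced examiner. Mark the essay below against the "
            "rubric using best-fit, positive marking: credit what is there, "
            "assign a level that reflects overall quality, and suggest the most "
            "important improvements.\n\nRUBRIC:\n" + rubric + "\n\nESSAY:\n" + essay
        )
        return call_model(model="chat-model", prompt=prompt)

The point of the split is that the essay never goes to the strict, rules-first model; it only ever sees the rubric that model produced.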

What this means for you (even if you never use TeachEdge)

Understanding this distinction will help you get better results with any AI tool.

If your AI experiments feel harsh and nit-picky, you're probably trapping an "auditor brain" in a creative, best-fit task.

If they feel vague, you're using a "teacher brain" with no rubric.

To get closer to a human mark yourself (a rough prompt sketch follows this list):

  • Context is king: specify exam board, qualification, and question type
  • Paste the rules: include a simplified version of the mark scheme or AO descriptors
  • Set the tone: explicitly ask for positive marking and best-fit judgement
  • Do the human check: treat the mark as a starting suggestion, not a verdict
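Put together, a do-it-yourself prompt might look something like the sketch below. Everything in brackets is a placeholder for your own board, question and mark scheme; the wording is only a starting point, not an official prompt from any tool or exam board.

    # A rough do-it-yourself marking prompt that bakes in the four tips above.
    # Everything in [brackets] is a placeholder for your own board, question
    # and mark scheme; the wording is illustrative, not an official prompt.
    essay_text = "..."  # paste the student's answer here

    prompt = (
        # Context is king
        "You are an experienced examiner for [exam board] [qualification] [subject].\n"
        "Question: '[paste the question]' ([max] marks)\n\n"
        # Paste the rules
        "Simplified mark scheme / AO descriptors:\n[paste them here]\n\n"
        # Set the tone
        "Mark using best-fit, positive marking: credit what the student implies "
        "as well as what they state, and judge the answer as a whole rather "
        "than as a checklist.\n\n"
        # Do the human check
        "Give a suggested mark, three strengths and the two most important "
        "improvements. I will review and adjust the mark before it reaches "
        "the student.\n\n"
        "STUDENT ANSWER:\n" + essay_text
    )

Even with a careful prompt, the last tip is the one that matters most: the mark you record is still your decision.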

Summary

When I started TeachEdge, I didn't want a "magic examiner" that would replace me.

I wanted a trusted colleague who could take a first pass through a pile of essays, give broadly fair marks, and leave me to make the final call.

To get there, we had to stop treating ChatGPT like a magic box and start treating it like a team of specialist tools.

You absolutely can use AI to make marking faster, fairer, and less exhausting — but only if you assign the right brain to the right job.

Gary Roebuck is Head of Economics at Holy Cross School, New Malden and the creator of TeachEdge.ai.

