The Self-Learning Paradox: How to Actually Learn with AI

The same AI tool that helps one person learn more than twice as much in half the time can leave another person unable to quote the essay they just “wrote.” This path explains why, and how to make sure you end up in the first group.


Who this is for

You are a knowledge worker teaching yourself something new — a new stack, a new domain, a new language, a new discipline. You already use ChatGPT, Claude, Gemini, or NotebookLM. You suspect you are going faster than you used to. You also suspect, quietly, that you are remembering less.

This path is for you if:

  • You feel fluent in conversations with AI but blank the moment the AI is not in the room
  • You have watched a 30-minute AI explanation and realised two days later you cannot reconstruct the core argument
  • You want the acceleration AI offers without the cognitive atrophy the research is starting to document
  • You are building a learning system for yourself and need to know which AI habits to adopt and which to avoid

What this path covers

This is not a tool round-up. It is a working model of AI-assisted learning: why the same tool produces opposite outcomes in different learners, what “cognitive debt” actually is, and the five learning loops that turn an LLM into a tutor instead of a crutch. Once you understand the underlying mechanism, the tool choice becomes obvious.


Part 1 — The paradox: same tool, opposite outcomes

Start with two studies from the past eighteen months.

graph LR
    T[The Same LLM] --> H[Harvard 2025<br/>Fine-tuned tutor]
    T --> M[MIT 2024-25<br/>Free-form chat]
    H --> HR[Learned 2x as much<br/>Engagement doubled]
    M --> MR[Weakest brain connectivity<br/>Could not quote own essays]

    style T fill:#4a9ede,color:#fff
    style HR fill:#5cb85c,color:#fff
    style MR fill:#d9534f,color:#fff

Harvard, 2025. Kestin et al. ran a randomised trial in the Harvard physics course PS2. One hundred and ninety-four students were split between a standard active-learning classroom and a group that learned the same material using an AI tutor. The AI group learned more than twice as much in less time, with effect sizes between 0.73 and 1.3 standard deviations. Self-reported engagement roughly doubled.1

The critical detail, buried in the methods section: the tutor was not a raw LLM. It was fine-tuned with pedagogical constraints — brief replies, one step at a time, no full solutions, a forced Socratic posture. The students did the thinking. The AI did the scaffolding.

MIT Media Lab, 2024–25. Kosmyna and colleagues ran a very different study. Fifty-four participants wrote essays while EEG recorded their brain activity, under three conditions: LLM-assisted, search-assisted, and brain-only. The LLM group showed the weakest neural connectivity of the three. They struggled to quote their own essays. They reported the lowest sense of ownership over the work. The researchers coined a phrase for what they were measuring: cognitive debt.2

Same technology class. Opposite outcomes. The difference is the loop the learner built around it.

The core question

Everyone is talking about whether AI is good or bad for learning. That is the wrong question. The right question is: under what conditions does AI accelerate learning, and under what conditions does it atrophy it? The paradox dissolves once you understand that the tool is neutral; the harness around it is not.


Part 2 — The promise: Bloom’s two-sigma, finally within reach

To understand what the Harvard result means, you have to go back to 1984.

Benjamin Bloom, one of the most cited educational psychologists of the twentieth century, published a now-famous paper titled The 2 Sigma Problem. Bloom compared students in conventional group classrooms against students receiving one-on-one tutoring on the same material. The result was staggering: the tutored students performed roughly two standard deviations better. The average tutored student scored at the 98th percentile of the conventional group.3
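A quick check on the arithmetic connecting those two numbers: under a normal distribution, a student scoring two standard deviations above the conventional-group mean sits at

$$\Phi(2) \approx 0.977,$$

which is roughly the 98th percentile. The two figures are the same claim stated twice.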

Bloom then posed the problem that became his legacy: how do we find group instruction methods that are as effective as one-on-one tutoring? For forty years, the answer was “we cannot — not affordably, not at scale.” One human tutor per learner is a luxury reserved for a handful of students with rich parents.

graph LR
    A[Conventional Class<br/>50th percentile] --> B[Mastery Learning<br/>84th percentile]
    B --> C[1-on-1 Tutoring<br/>98th percentile]
    C --> D[?<br/>AI tutor at scale]

    style C fill:#4a9ede,color:#fff
    style D fill:#5cb85c,color:#fff

The Harvard study is the strongest empirical claim yet that large language models, properly scaffolded, can approach the two-sigma result. Not because the LLM is smarter than a human tutor, but because it can do three things a classroom cannot:

  1. Always available, infinitely patient — a student can attempt the same problem at 2am as easily as at 2pm
  2. Pedagogical constraints built in — the tutor refuses to give full solutions, forcing retrieval-practice by design
  3. Socratic posture by default — the tutor asks questions rather than delivering monologues, triggering the effortful retrieval that produces durable memory

This is why the Khan Academy Khanmigo user base grew 731% year-over-year to more than two million learners in 2024-25.4 And it is why edX’s 2025 workforce survey found that 42% of workers now say their employer expects them to learn AI on their own time.5 Self-directed learning with AI is no longer a trend. It is the default modality for an entire generation of knowledge workers.

The promise in plain terms

For the first time in forty years, the economics of one-on-one tutoring may be solvable. A learner with $20 a month, the right loop, and enough discipline can approximate what used to require a private tutor. That is a genuinely new thing in the history of education, and the research is beginning to back it up.

But only if you build the loop.


Part 3 — The catch: cognitive debt is real

Now the other side.

The same body of 2024–26 research, read honestly, shows that unguided AI use produces the opposite of learning. Not “less learning.” Actively negative outcomes: better artifacts, worse learners.

Metacognitive laziness

Fan and colleagues, publishing in the British Journal of Educational Technology (December 2024), ran a 117-student study. One group wrote essays with ChatGPT. Another wrote without. A third wrote first and used ChatGPT only to revise.

The ChatGPT-from-the-start group produced the best essays and learned the least about the topic. They were no more motivated. They retained less. The authors named the pattern metacognitive laziness: offloading the thinking about thinking, which is the part of learning that actually makes knowledge stick.6

The write-first-then-revise group preserved their learning. Same tool, different loop, completely different outcome.

Cognitive offloading and critical thinking

A 2025 study by Gerlich, with 666 participants, found a significant negative correlation between frequent AI tool use and critical thinking ability, mediated by cognitive offloading.7 A related arXiv paper from July 2025 is bluntly titled “ChatGPT produces more ‘lazy’ thinkers: Evidence of cognitive offloading.”8

The illusion of explanatory depth

When you read a fluent AI explanation of a hard concept, your brain mistakes the fluency of the explanation for depth of your own understanding. This is the classic illusion of explanatory depth (IOED), documented by Rozenblit and Keil in 2002 — and it is massively amplified by AI. A 2025 UC study had students rate their comprehension of a concept before and after explaining a ChatGPT answer out loud. Self-ratings dropped sharply once the AI was no longer in the room. Merely reading a warning about AI limitations did not reduce the illusion.9

Hallucination in self-learning contexts

A Deakin University study examined GPT-4o’s citations in a mental-health literature review. 56% of all generated citations were fake or contained errors.10 In a classroom or a research team, an advisor catches that. In a self-directed learning context, nobody does. You incorporate the fabrication into your mental model and move on.
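In a self-directed loop you have to be your own advisor. One cheap defence is to check every AI-supplied reference against a bibliographic index before it enters your notes. A minimal sketch in Python using the public Crossref REST API (the endpoint and parameters are real; the matching heuristic is deliberately naive, and the example query is just an illustration):

```python
import requests

def crossref_lookup(citation: str, n: int = 3) -> list[dict]:
    """Search Crossref for published works matching a free-text citation."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": n},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [
        {
            "title": (item.get("title") or ["?"])[0],
            "doi": item.get("DOI"),
            "year": item.get("issued", {}).get("date-parts", [[None]])[0][0],
        }
        for item in items
    ]

# If none of the top matches resembles the citation the AI gave you,
# treat it as fabricated until you have opened the paper yourself.
for match in crossref_lookup("Bloom 1984 The 2 Sigma Problem Educational Researcher"):
    print(match)
```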

graph TD
    U[Unguided AI Use] --> B[Better Artifact]
    U --> W[Worse Learner]
    B --> F[Fluency Illusion]
    W --> F
    F --> D[Cognitive Debt]

    style U fill:#d9534f,color:#fff
    style D fill:#d9534f,color:#fff

EDUCAUSE Review summarised the whole 2025 literature in four words: “Better results, worse thinking.”11

The uncomfortable truth

The default LLM experience — open chat, paste problem, drink output — is optimised for artifact quality, not for learning. The artifact gets better. The learner gets worse. The two outcomes move in opposite directions, which is exactly what “debt” means. You are borrowing understanding you will eventually have to pay back.


Part 4 — The reframe: a harness, not a hose

Here is the honest synthesis of the research.

If you treat an LLM like a fire hose — point it at your problem, drink the output, move on — you will produce better work and build cognitive debt. That is what the MIT EEG study measured. That is what the metacognitive-laziness paper measured. That is what EDUCAUSE is describing.

If you treat an LLM like a harness — a structure that forces you to do the generative, effortful, error-catching work while the AI handles scaffolding, interrogation, and retrieval — you get the Harvard result. Engagement doubles. Learning doubles.

graph LR
    L[Learner] --> H[Harness<br/>Loop structure]
    H --> AI[LLM]
    AI --> H
    H --> L

    style H fill:#4a9ede,color:#fff

The difference is not the model. It is the loop you build around it.

This idea has now been tested directly. A 2025 cross-country experiment (MDPI Data, n=150, participants in Germany, Switzerland, and the UK) found that structured prompting significantly improved critical reasoning and reflective engagement, while unguided AI use produced cognitive offloading without any reasoning gains.12

Same tool. Same task. Loop presence or absence flipped the outcome. This is the cleanest empirical support we have for the harness principle, and it tells you where to put your attention. Not on the model. Not on the prompt. On the loop the prompt lives inside.

The harness principle

You do not “learn with AI.” You learn inside a loop that includes AI. The loop is what you are designing. The LLM is one component of it. Swap the loop, and the same LLM produces opposite outcomes. That is the only thing you need to remember from this path.

A harness has four jobs:

  1. Force the learner to generate first — no reading the answer before attempting the problem
  2. Put the AI in an interrogator role, not an answerer role — the AI asks, the learner does the work
  3. Catch errors via structure, not vibes — explicit review steps, cross-checking against primary sources, spaced recall
  4. Measure whether learning actually happened — can you explain it without the AI in the room?

Parts 5 and 6 give you the concrete loops and the concrete tools.


Part 5 — Five learning loops that actually work

These are the interventions the 2024–26 research supports. Each one is a specific answer to the same question: how do I keep the cognitive load on my brain instead of on the model?

graph TD
    H[Harness Principle] --> L1[1. Write First,<br/>Revise with AI]
    H --> L2[2. Inverted<br/>Feynman]
    H --> L3[3. Structured<br/>Prompting]
    H --> L4[4. Socratic /<br/>Study Modes]
    H --> L5[5. Concept Graph<br/>PKM + AI]

    style H fill:#4a9ede,color:#fff

Loop 1 — Write first, revise with AI

This is Fan et al.’s clearest finding, and the easiest rule to apply: draft independently, then use AI to critique and revise.6

Students who wrote independently and used AI only in the revision stage preserved their learning. Students who used AI from the first sentence did not. Same essay quality; only one group actually learned the material.

The rule generalises to almost any knowledge task:

  • Coding: attempt the function yourself. Only after you have a working (or broken) first draft do you paste it into Claude or GPT for review.
  • Writing: draft the argument in your own words. Then ask the AI to find holes, counter-arguments, unclear transitions.
  • Strategy: make a decision and write down your reasoning. Then ask the AI to stress-test it.
  • Language learning: attempt the translation yourself. Then ask the AI to correct you and explain what you got wrong.

Generate before you converse. The act of attempting first is what creates the retrieval effort that makes learning stick.
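The rule can even be enforced mechanically. A minimal sketch using the official openai Python SDK, assuming OPENAI_API_KEY is set in the environment; the model name, word-count threshold, and prompt wording are arbitrary illustrations, not a prescription:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REVIEWER = (
    "You are a reviewer, not a ghostwriter. Critique the draft below: "
    "find holes in the argument, unclear transitions, and missing "
    "counter-arguments. Do NOT rewrite it or add new content."
)

def review_draft(draft: str) -> str:
    # The harness: refuse to call the model until a real attempt exists.
    if len(draft.split()) < 100:
        raise ValueError("Write a full draft first. The loop starts with you.")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat model works
        messages=[
            {"role": "system", "content": REVIEWER},
            {"role": "user", "content": draft},
        ],
    )
    return response.choices[0].message.content
```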

Loop 2 — The inverted Feynman technique

The classic Feynman technique: explain a concept as if teaching a child, notice your gaps, return to the source, repeat.

The AI-adapted version reverses the roles: you explain the concept to the LLM, and the LLM plays the inquisitive novice — probing your clarity, your accuracy, your examples, your edge cases.

graph LR
    U[You: explain] --> A[AI: ask]
    A --> U2[You: defend]
    U2 --> A2[AI: probe deeper]
    A2 --> U3[You: find the gap]
    U3 --> S[Source: verify]

    style U fill:#4a9ede,color:#fff
    style U2 fill:#4a9ede,color:#fff
    style U3 fill:#4a9ede,color:#fff

The prompt is something like: “I am going to explain [concept] to you. Play the role of a curious beginner. Do not teach me. Ask me questions to find the weak spots in my understanding. When I give a vague answer, ask me to be more specific. When I give a confident answer, ask me for an example. Your job is to find what I do not know. Ready?”

This inversion is the single most important move in AI-assisted learning. The model asks. You do the thinking. The cognitive load stays on you, which is exactly where it has to be for learning to happen. You also get the benefit of an infinitely patient questioner who will not get bored, embarrassed, or distracted.
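If you want to run this loop outside a chat UI, the whole technique is one system prompt plus conversation state. A sketch under the same openai SDK assumptions as the Loop 1 example (the stop word and model name are placeholders):

```python
from openai import OpenAI

client = OpenAI()

INTERROGATOR = (
    "Play a curious beginner. Never teach. Ask one question at a time to "
    "find the weak spots in my understanding. If my answer is vague, ask "
    "for specifics. If my answer is confident, ask for an example."
)

def feynman_session(concept: str) -> None:
    messages = [
        {"role": "system", "content": INTERROGATOR},
        {"role": "user", "content": f"I am going to explain {concept}. Ready?"},
    ]
    while True:
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        question = reply.choices[0].message.content
        print(f"\nAI: {question}")
        answer = input("You: ")
        if answer.strip().lower() == "done":  # say "done" to end the session
            break
        messages += [
            {"role": "assistant", "content": question},
            {"role": "user", "content": answer},
        ]
```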

Loop 3 — Structured prompting scaffolds

The MDPI 2025 study found that explicit prompting scaffolds — plan → draft → critique → revise — produced significantly more critical reasoning than free-form chat.12

The mechanism is simple: each scaffold step forces a separate cognitive operation. Planning is a different mental act from drafting. Critiquing is a different act from revising. When you collapse them into a single “write me X” prompt, the AI does all four steps at once and you do zero. When you split them, you are forced to engage with each output before moving to the next step.

A working scaffold for learning a new concept:

  1. Plan — “Before I learn about X, help me draft a list of questions I should be able to answer by the end. Do not answer them. Just help me write good questions.”
  2. Attempt — close the chat. Try to answer your questions from scratch. Write down what you do not know.
  3. Draft — re-open the chat. “Here is my current understanding of X. Here are the gaps I hit.” Ask for targeted explanations of the gaps only.
  4. Critique — “Quiz me on the gaps I just filled. Do not give me the answers until I have tried. Mark me.”
  5. Revise — write a one-page summary from memory. Only then paste it into the AI and ask it to catch errors.

Five steps. Each one puts the cognitive work somewhere specific. The AI never writes the final summary. You do. That is what makes the summary stick.
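The scaffold is small enough to encode directly, which keeps you honest about which steps the AI is allowed into. A sketch; the Step dataclass and the condensed wording are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    instruction: str
    ai_allowed: bool  # this flag is what makes it a harness, not a hose

SCAFFOLD = [
    Step("plan", "Draft the questions you should be able to answer.", True),
    Step("attempt", "Close the chat. Answer your questions from scratch.", False),
    Step("draft", "Share your understanding; ask about the gaps only.", True),
    Step("critique", "Get quizzed on the gaps. No answers before you try.", True),
    Step("revise", "Write the one-page summary from memory first.", False),
]

def run(scaffold: list[Step]) -> None:
    for step in scaffold:
        mode = "AI allowed" if step.ai_allowed else "NO AI"
        input(f"[{step.name} | {mode}] {step.instruction} (enter when done) ")

run(SCAFFOLD)
```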

Loop 4 — Socratic and study modes

The 2025 wave of dedicated study modes is the first time the major labs have shipped harness logic by default.

  • ChatGPT Study Mode (OpenAI, July 2025) — refuses to give direct answers; forces guiding questions, quizzes, and self-reflection prompts.13
  • Claude Learning Mode (Anthropic, April 2025) — Socratic posture built in for Claude for Education users.
  • Gemini Guided Learning (Google, 2025) — Google’s equivalent guided-tutor mode, focused on step-by-step reasoning rather than answers.

On a published academic teaching benchmark, SocraticLM — a dedicated Socratic model — outperformed vanilla GPT-4 by 12 percentage points at actually teaching, not just answering.14

When you catch yourself chasing answers instead of chasing understanding, switch to one of these modes. They take the “I’ll just ask the AI” temptation off the table.

Loop 5 — Concept-graph PKM with AI over your own notes

This is the one I use most, and it is the closest to the Yiuno-style approach.

A knowledge graph — Obsidian, Notion, a custom vault — where your notes are the source of truth, and an AI layer that queries only those sources. The AI can only cite your own prior work. The citation is grounded. The fluency illusion collapses because the AI cannot pretend to know things you have not written down.

graph TD
    S[Primary Sources<br/>Books, papers, docs] --> N[Your Notes<br/>The vault]
    N --> Q[AI Query Layer<br/>NotebookLM, Copilot]
    Q --> A[Answer grounded<br/>in your notes]
    A -->|feeds back into| N

    style N fill:#4a9ede,color:#fff

The cleanest implementations today:

  • NotebookLM — upload your sources; every AI answer cites a specific passage. Source-grounded by design.
  • Copilot for Obsidian — “vault QA” chat over your own notes; graph view plus AI clustering.
  • Claude Projects — upload a reference set; the AI answers only from that set.

This approach inverts the default LLM experience. Instead of asking the AI what it knows, you ask the AI what you know — and use it as a retrieval engine for your own mind rather than an external crutch. The notes have to exist first. The AI cannot generate knowledge you have not already metabolised.
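The grounding constraint itself fits in a few lines. A deliberately naive sketch: keyword retrieval over a folder of markdown notes that returns only passages that actually exist in the vault (real tools use embeddings, but the rule is the same: no vault match, no answer):

```python
from pathlib import Path

def vault_search(vault: Path, query: str, k: int = 3) -> list[tuple[str, str]]:
    """Return the k passages from your own notes that best match the query.
    Nothing outside the vault can ever be returned."""
    terms = set(query.lower().split())
    scored = []
    for note in vault.rglob("*.md"):
        for paragraph in note.read_text(encoding="utf-8").split("\n\n"):
            overlap = len(terms & set(paragraph.lower().split()))
            if overlap:
                scored.append((overlap, note.name, paragraph.strip()))
    scored.sort(reverse=True)
    return [(name, text) for _, name, text in scored[:k]]

for name, passage in vault_search(Path("~/vault").expanduser(), "spaced repetition"):
    print(f"[{name}] {passage[:120]}")
```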

Which loop to start with

If you are picking only one to adopt this week, pick Loop 1 (write first, revise with AI). It is the easiest to apply and has the strongest empirical support. If you adopt a second one, pick Loop 2 (inverted Feynman) — it will surface every gap in your understanding in about ten minutes.


Part 6 — The 2026 self-learner’s stack

A working toolkit, grouped by the job it does. One line each. Pick one tool per category; more than that is procrastination dressed up as preparation.

graph TD
    subgraph Tutor
        T1[ChatGPT Study Mode]
        T2[Claude Learning Mode]
        T3[Gemini Guided Learning]
        T4[Khanmigo]
    end
    subgraph Research
        R1[NotebookLM]
        R2[Perplexity]
    end
    subgraph Papers
        P1[Elicit]
        P2[Consensus]
        P3[SciSpace]
    end
    subgraph Retrieval
        RT1[Anki + FSRS5]
        RT2[RemNote]
        RT3[NotebookLM Flashcards]
    end
    subgraph Graph
        G1[Obsidian + Copilot]
        G2[Claude Projects]
    end

    style R1 fill:#4a9ede,color:#fff
    style G1 fill:#5cb85c,color:#fff

Tutoring modes (for Socratic scaffolding)

| Tool | What it is good at |
| --- | --- |
| ChatGPT Study Mode | Guided questions, quizzes, self-reflection prompts. The most mature study mode shipped so far. |
| Claude Learning Mode | Socratic posture, built into Claude for Education. Strong at long-form reasoning. |
| Gemini Guided Learning | Step-by-step guided tutor; integrates well with Google Docs and YouTube. |
| Khanmigo | K-12 leaning but excellent for fundamentals; refuses to give answers by design.4 |

Source-grounded research (for reading and summarising)

| Tool | What it is good at |
| --- | --- |
| NotebookLM | Upload your own PDFs; every answer cites a specific passage. Audio overviews, study guides, flashcards (Sept 2025). The best source-grounded tool for learners. |
| Perplexity | Cited web search. Good for quick factual research with inline sources. |

Academic depth (for peer-reviewed literature)

| Tool | What it is good at |
| --- | --- |
| Elicit | Literature review across 125M+ papers; extracts data tables automatically. |
| Consensus | Searches peer-reviewed papers; shows a “consensus meter” for how many studies support a claim. |
| SciSpace | Inline “explain this passage” copilot on any PDF. 280M+ papers. |

Retrieval practice (for long-term memory)

| Tool | What it is good at |
| --- | --- |
| Anki + FSRS5 | The gold standard for spaced repetition. LLM add-ons can generate cards; FSRS5 handles scheduling. |
| RemNote | Notes plus spaced repetition native. Good for integrated workflows. |
| NotebookLM Flashcards | Source-grounded cards generated from your own uploads (launched Sept 2025). |
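If your notes already live in plain text, card generation is scriptable end to end. A minimal sketch using the open-source genanki library (pip install genanki); the IDs, deck name, and week_cards.tsv layout are arbitrary choices for illustration:

```python
import random
import genanki

model = genanki.Model(
    1607392319,  # arbitrary but stable model ID
    "Simple QA",
    fields=[{"name": "Question"}, {"name": "Answer"}],
    templates=[{
        "name": "Card 1",
        "qfmt": "{{Question}}",
        "afmt": "{{FrontSide}}<hr id='answer'>{{Answer}}",
    }],
)

deck = genanki.Deck(random.randrange(1 << 30, 1 << 31), "This Week's Concepts")

# One "question<TAB>answer" pair per line, written by you, from memory.
with open("week_cards.tsv", encoding="utf-8") as f:
    for line in f:
        question, answer = line.rstrip("\n").split("\t")
        deck.add_note(genanki.Note(model=model, fields=[question, answer]))

genanki.Package(deck).write_to_file("week.apkg")  # import this file into Anki
```

The automation is not the point: you still write the question-answer pairs yourself, from memory, which is where the retrieval effort lives.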

Concept graph + AI (for grounded long-term knowledge)

| Tool | What it is good at |
| --- | --- |
| Obsidian + Copilot for Obsidian | Vault-grounded chat over your own markdown notes. Graph view plus AI clustering. |
| Claude Projects | Upload a reference set; Claude answers only from that set. Cleanest “personal corpus” setup. |

Tool proliferation is a trap

Every tool in this list is capable. None of them will save you if the loop is wrong. Pick one per category, use it for a month, then decide. A learner using Anki + NotebookLM + Claude Projects with good loops will out-learn a learner with twelve subscriptions and no harness every time.

Part 7 — Designing your learning system

Putting it all together. A self-learning system is six components operating in a cycle: a goal, a primary source, a loop, your notes, a review habit, and a no-AI check.

graph TD
    G[Goal: what to learn] --> S[Source: primary material]
    S --> L[Loop: write first,<br/>invert Feynman,<br/>scaffold prompts]
    L --> N[Notes: your vault]
    N --> R[Review: retrieval practice<br/>spaced repetition]
    R --> C[Check: can you explain<br/>without the AI?]
    C -->|no| L
    C -->|yes| G

    style L fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff

A weekly rhythm that works

| When | What | How long |
| --- | --- | --- |
| Monday | Pick the concept for the week. Write down what you already think you know, before touching AI. | 20 min |
| Tuesday–Thursday | Work the material: read a primary source, use the inverted Feynman loop to find gaps, fill them with a scaffolded prompt. Take notes in your vault. | 30-45 min/day |
| Friday | Retrieval day. Close all AI. Write a one-page summary from memory. Only then ask the AI to catch errors. | 30 min |
| Weekend | Spaced recall. Review Anki or equivalent. Add this week’s cards to the rotation. | 10-15 min |

The shape matters more than the schedule. Four principles:

  1. Primary sources before AI. Read the paper, the docs, the book, the code. The AI helps you digest them — it does not replace them.
  2. Your vault is the memory, not the chat. Chat history disappears; notes you wrote yourself do not. Put the knowledge in your own words in a place you own.
  3. Measure with a no-AI checkpoint. If you can explain it without the AI in the room, you learned it. If you cannot, you rented it.
  4. Spaced recall is non-negotiable. Without it, everything fades in a week. With it, knowledge compounds.
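The no-AI checkpoint in particular is easy to operationalise: a dated summary file written from memory, so you can compare your understanding week over week. A sketch; the folder layout and EOF sentinel are arbitrary:

```python
from datetime import date
from pathlib import Path

def checkpoint(concept: str, folder: str = "checkpoints") -> Path:
    """Capture a from-memory summary as a dated markdown file.
    House rule: no AI tab open while this is being written."""
    out = Path(folder) / f"{date.today()}-{concept.replace(' ', '-')}.md"
    out.parent.mkdir(exist_ok=True)
    print(f"Write your one-page summary of '{concept}' from memory.")
    print("Finish with a line containing only EOF.")
    lines = []
    while (line := input()) != "EOF":
        lines.append(line)
    out.write_text("\n".join(lines), encoding="utf-8")
    return out

checkpoint("cognitive debt")
```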

The one rule that captures everything

If I had to compress twenty studies into a single rule for anyone self-teaching with AI in 2026, it would be this:

The self-learner's rule

If you feel fluent but cannot explain it without the AI in the room, you did not learn it. You rented it.

That is the difference between the Harvard result and the MIT result. It is not the model. It is not the prompt. It is whether you built a harness or just opened a tap.

Build the harness.


The full map

graph TD
    subgraph Stakes
        PR[Promise:<br/>2-sigma within reach]
        CA[Catch:<br/>cognitive debt]
    end

    subgraph Principle
        HA[Harness not hose:<br/>the loop is everything]
    end

    subgraph Loops
        L1[Write first,<br/>revise with AI]
        L2[Inverted Feynman]
        L3[Structured prompting]
        L4[Socratic / study modes]
        L5[Concept graph PKM]
    end

    subgraph Stack
        TK[Tutor + Research +<br/>Papers + Retrieval +<br/>Graph]
    end

    subgraph System
        SY[Weekly rhythm +<br/>no-AI checkpoint +<br/>spaced recall]
    end

    PR --> HA
    CA --> HA
    HA --> L1
    HA --> L2
    HA --> L3
    HA --> L4
    HA --> L5
    L1 --> TK
    L2 --> TK
    L3 --> TK
    L4 --> TK
    L5 --> TK
    TK --> SY
    SY -->|iterate| HA

    style HA fill:#4a9ede,color:#fff
    style SY fill:#5cb85c,color:#fff

What you should understand now

After reading this path, you should be able to:

  • Explain the Harvard/MIT paradox and why the same LLM produces opposite learning outcomes depending on the loop
  • Describe Bloom’s two-sigma problem and what would have to be true for AI tutors to approach it
  • Name the four documented risks of unguided AI use: metacognitive laziness, cognitive offloading, illusion of explanatory depth, and hallucination in self-learning
  • State the harness principle in your own words and explain why it is a loop-level intervention, not a prompt-level one
  • Apply the five learning loops to a concept you are currently trying to learn
  • Choose one tool per category from the 2026 stack and justify the choice against the loop you are building
  • Run a weekly self-learning rhythm with a no-AI checkpoint and spaced recall


Where to go next

Path A — Go deeper on learning science

Read How Humans Learn for the full cognitive-psychology foundations behind retrieval practice, spaced repetition, desirable difficulties, and the six evidence-based learning strategies. Most of the loops in this path are AI-adapted versions of those strategies.

Path B — Go deeper on AI mechanics

Read How to Talk to AI for the fundamentals of prompts, context, and harnesses. Then explore context-engineering and harness-engineering — the same harness principle that governs self-learning also governs how reliable AI systems are built.

Path C — Build a personal knowledge system

Read What Gives Knowledge Meaning to understand how knowledge is represented and why a personal concept graph is more than just a note-taking habit. Then set up Obsidian, Copilot for Obsidian, and Anki, and run the Part 7 weekly rhythm for four weeks before adding any other tools.

Path D — Apply the loops today

Pick one concept you are currently trying to learn. Spend 20 minutes with the inverted Feynman loop (Part 5, Loop 2). Then close the AI and write a one-page summary from memory. That is your baseline. Do this once a week for a month and compare your summaries.



Footnotes

  1. Kestin, G., Miller, K., Klales, A., Milbourne, T., & Ponti, G. (2025). AI Tutoring Outperforms In-Class Active Learning: An RCT Introducing a Novel Research-Based Design in an Authentic Educational Setting. Scientific Reports, 15. The Harvard PS2 physics RCT showing effect sizes of 0.73–1.3 SD for a fine-tuned AI tutor vs. active-learning classroom.

  2. Kosmyna, N. et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing. MIT Media Lab preprint. 54-participant EEG study showing weakest neural connectivity in the LLM condition and the coining of “cognitive debt.”

  3. Bloom, B. S. (1984). The 2 Sigma Problem: The Search for Methods of Group Instruction as Effective as One-to-One Tutoring. Educational Researcher, 13(6), 4-16. The canonical paper establishing the two-sigma result of one-on-one tutoring.

  4. Khan Academy. (2025). Khan Academy Annual Report 2025. 731% YoY growth in Khanmigo users during SY24-25.

  5. edX. (2025). Economic Pressures Propel Workers Towards Upskilling. 2025 workforce survey finding 42% of workers say their employer expects them to learn AI on their own time.

  6. Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., Shen, Y., Li, X., & Gasevic, D. (2024). Beware of Metacognitive Laziness: Effects of Generative Artificial Intelligence on Learning Motivation, Processes, and Performance. British Journal of Educational Technology. 117-student study coining “metacognitive laziness.”

  7. Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. UCL summary of a ~666-participant study finding a significant negative correlation between frequent AI tool use and critical thinking.

  8. Yang, C. et al. (2025). ChatGPT Produces More “Lazy” Thinkers: Evidence of Cognitive Offloading. arXiv preprint.

  9. Hamilton, L. & Lombrozo, T. (2025). Reliance on ChatGPT Amplifies the Illusion of Explanatory Depth. UC eScholarship. Experimental demonstration that LLM explanations amplify the IOED effect.

  10. Maples, B. et al. (2025). Fabricated References in AI-Assisted Mental Health Literature Reviews. Deakin University. Finding that 56% of GPT-4o-generated citations were fake or contained errors.

  11. Watson, C. E. & Bowen, J. (2025). The Paradox of AI Assistance: Better Results, Worse Thinking. EDUCAUSE Review, December 2025.

  12. Möller, S. et al. (2025). Structured Prompting and Critical Reasoning: A Cross-Country Experiment on AI-Assisted Learning. Data, 10(11), 172. MDPI. 150-participant experiment showing structured prompting significantly reduces cognitive offloading and improves reasoning.

  13. OpenAI. (2025). Introducing Study Mode in ChatGPT. July 2025 launch announcement for Socratic-scaffolded study mode.

  14. Liu, J. et al. (2024). SocraticLM: Teaching Mathematics with a Socratic Method. OpenReview. Socratic-tuned LLM outperforming GPT-4 by 12 points on a teaching benchmark.