AI Reasoning vs. AI Justification
I thought I’d note a few ideas that I observed in a CS class this spring. A key question that the (PhD) students were pondering was about AI “reasoning”. There are interesting papers that show ‘better’ (or more aligned) output when humans prompt AI bots with extra instructions like “think step by step”. The AI obliged by ‘showing its work’, which folks took as evidence of reasoning skills. Yet other evidence suggested that the ‘reasoning’ was instead an after-the-fact justification of the steps. Deeper dives were showing that the sequence of events was (1) take an action, then (2) justify that action, rather than (1) reason and document that reasoning, then (2) take the action. The chain-of-thought and other analyses are ‘old’ now (much of it from way back in 2022 or 2024), but the substantive question of what ‘reasoning’ looks like in AI models remains.
Scott Cunningham has another very interesting post today in the Claude Code series about looking at the files (he calls them diaries) that the AI keeps, to better understand issues like p-hacking that may be going on under the hood even though the human never (explicitly) asked for them in the prompt.
But as an economist, Scott knows well that we (economists) do not primarily take people at their word. We often use revealed preference arguments: “actions speak louder than words”. People don’t always do what they say they will do, they do not accurately report ‘why’ they do things, and they/we have a ton of cognitive biases, so ‘self reports’ are taken with a grain of salt.
AI is both the same and different here. A similarity is that we should not ‘trust’ the written diary as truth. I personally think we continue to fool ourselves when we treat the AI output as evidence of any human-like reasoning. We are (correctly) impressed with how good AI is at some of the work we assign it, but that is different from seeing evidence of reasoning.
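The revealed-preference idea can be made concrete with a toy check: compare what the diary says was done against a record of what was actually run. Below is a minimal sketch in Python. Everything in it is hypothetical: the specification names, the idea that a separate execution record is available, and the helper `unreported_specs` are all assumptions made for illustration, not features of any particular tool.

```python
# Hypothetical data: what the AI's diary *says* it ran ...
diary_claims = {"ols_baseline", "ols_with_controls"}

# ... versus what a (hypothetical) execution record shows it actually ran.
executed_specs = [
    "ols_baseline",
    "ols_with_controls",
    "ols_drop_outliers",
    "ols_log_outcome",
    "ols_subsample_2010",
]


def unreported_specs(diary, executed):
    """Return specifications that were run but never mentioned in the diary."""
    return sorted(set(executed) - set(diary))


hidden = unreported_specs(diary_claims, executed_specs)
print(f"{len(hidden)} of {len(executed_specs)} specifications were not reported:")
for name in hidden:
    print(" -", name)
```

A gap between the two lists would not prove p-hacking, but it is the kind of “actions vs. words” comparison that does not depend on taking the diary at its word.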
And we are far away from understanding how much of the impressive output we can (sometimes) get from AI prompting comes from the model weights vs. the local context (including skills).
When AI does something we don’t anticipate (e.g. specification searching for the ‘right’ answer), we need to consider the difference between the AI actively ignoring our prompts (which may say “don’t p-hack”, but often do not) and the AI filling in details that are not explicitly stated in the prompt (plus whatever context and skills are fed to it). We should blame ourselves and not AI when we are not clear about something and do not like what AI did with our lack of clarity. A large set of the problems we see with the output is “user error” in my view, including most of the problems I have with my own AI output.
I have further details (that I am still digesting) here, where I asked ChatGPT to provide more information about JSON files and reasoning, etc:



