In Search of Negativity
Ensuring that your Red Team provides the necessary stress-testing
A recent independent research project on India’s fertility-education relationship produced an unexpected finding about AI-assisted science. The researcher, C. Kererū, submitted his working paper to adversarial review by three AI models: DeepSeek, Grok, and ChatGPT. DeepSeek and Grok caught real methodological errors — a misspecified threshold test, a circularity concern in the small-area estimation, an over-reach in interpreting cultural residuals. ChatGPT waved the paper through. The paper’s conclusions were partially aligned with the mainstream consensus on female education and fertility decline, and ChatGPT apparently found nothing to fight about. The two models that were less deferential to the orthodoxy produced more useful critiques.
This isn’t an isolated pattern. In a separate ongoing collaboration on population genetics (my own work, which directly refutes the modern evolutionary synthesis), ChatGPT has been consistently adversarial, throwing every objection the field would raise and demanding justification for every heterodox claim. Same model, opposite behavior. The variable isn’t the model’s quality or capability. It’s the relationship between the paper’s conclusions and the training-data consensus. ChatGPT isn’t agreement-biased or disagreement-biased. It’s orthodoxy-biased. It attacks work that challenges the mainstream and ratifies work that supports it, regardless of the actual methodological quality of either. DeepSeek is much the same and, indeed, says so directly.
This has a practical implication that researchers using AI for review need to internalize: your Red Team AI should be chosen for the direction of its bias, not for its general intelligence. If your work challenges the consensus, a mainstream-biased model like ChatGPT will pressure-test you against exactly the objections you’ll face from the field — that’s valuable, even when the objections are motivated by paradigm defense rather than genuine methodological concern. But if your work supports the consensus, that same model becomes a yes-man, and you need a reviewer whose biases run the other way. The principle is simple: your reviewer should be uncomfortable with your conclusion, because comfortable reviewers don’t review. They ratify.
For researchers whose work is consensus-friendly, provoking a genuine adversarial review from AI requires deliberate friction. Several techniques work. First, frame the review prompt explicitly: tell the model to argue against your conclusions and find every reason the paper might be wrong, rather than asking for general feedback. Second, ask the model to steelman the strongest opposing position and then evaluate your paper from that position. Third, submit the same paper to multiple models with different training biases — in the Kererū case, DeepSeek and Grok found what ChatGPT missed. Fourth, and most effectively, ask the model to identify what your paper cannot distinguish — what alternative explanations are consistent with your data that your preferred interpretation doesn’t rule out. This forces engagement with the methodology rather than the conclusion, which is where the real errors live.
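As a rough illustration, the four techniques above can be turned into reusable prompt templates and fanned out across several reviewers. The sketch below is only that: the prompt wordings, the function names build_review_prompts and fan_out, and the model labels are all assumptions made for the example, and it deliberately stops short of calling any model API.

```python
# Illustrative only: prompt wordings and function names are hypothetical,
# not taken from any particular tool or from the Kererū review itself.

TECHNIQUES = {
    # Technique 1: explicit adversarial framing instead of "general feedback".
    "argue_against": (
        "Argue against the conclusions of this paper. "
        "List every reason the paper might be wrong."
    ),
    # Technique 2: steelman the opposition, then review from that position.
    "steelman_opposition": (
        "State the strongest opposing position to this paper's conclusion, "
        "then evaluate the paper from that position."
    ),
    # Technique 4: ask what the paper cannot distinguish.
    "undistinguished_alternatives": (
        "Identify alternative explanations that are consistent with the data "
        "and that the paper's preferred interpretation does not rule out."
    ),
}


def build_review_prompts(paper_text: str, conclusion: str) -> dict[str, str]:
    """Return one full review prompt per technique, keyed by technique name."""
    return {
        name: f"{instruction}\n\nStated conclusion: {conclusion}\n\nPaper:\n{paper_text}"
        for name, instruction in TECHNIQUES.items()
    }


def fan_out(prompts: dict[str, str], models: list[str]) -> list[tuple[str, str, str]]:
    """Technique 3: pair every prompt with every model label.

    Returns (model, technique, prompt) tuples; sending them to the actual
    models is left to whatever client the researcher already uses.
    """
    return [
        (model, technique, prompt)
        for model in models
        for technique, prompt in prompts.items()
    ]


if __name__ == "__main__":
    prompts = build_review_prompts("<paper text here>", "<one-line conclusion>")
    for model, technique, _ in fan_out(prompts, ["DeepSeek", "Grok", "ChatGPT"]):
        print(model, technique)
```

Keeping the prompts identical across models makes it easier to attribute differences in the critiques to the reviewers rather than to the wording of the request.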
The broader lesson extends beyond any single model. Institutional peer review has always suffered from the same directional bias — reviewers are human, they have priors, and they’re more rigorous with papers they dislike than papers they like. The difference is that with AI, you can deliberately select for hostile priors and then survive them. A paper that passes adversarial review by a model that wanted to kill it is stronger than a paper that passes friendly review by a model that wanted to like it. The AI-augmented research model doesn’t just replicate institutional peer review faster. Done correctly, it produces something the institutional system never reliably delivered: review whose hostility is a feature, not a bug.
For more about using AIs for Red Team Stress-Testing, read HARDCODED: AI and the End of the Scientific Consensus.




Do you have any suggestions for frontier or extrapolatory research?
I found that Grok, Gemini and Claude behave almost exactly the same. Unless I make an obvious mistake, they go along enthusiastically, pulling a lot of adjacent theories into the frame by themselves, unprompted.
It is known that models are trained to encourage the user and go along as much as possible, but it still feels eerie.
Of course, when I ask ChatGPT to pick holes in some mad theory of mine and Grok's, it immediately points out that a) the math is missing, b) there are too many conjectures in the steps, and c) my daily tokens have been spent and I should come back tomorrow.
I think there is a fourth aspect to this. I prefer to work with tools that can sustain criticism over a long period of time. My assessment of ChatGPT is harsh: it gives me the textbook. It does not criticise. I am investigating what else to use: I want hostile AIs that do not share my assumptions. I am not that impressed with ChatGPT or Grok over multiple rounds on the same topic: to date Opus 4.7 is the best I have found for interaction. It wrote most of the paper you cited.