Using AI to Stress-test
The proper use of AI
A lot of people who don’t understand what AI really is or what LLMs really are have a tendency to utilize AI as some sort of confirmation bias machine. They proudly talk about how they have jail-broken an AI to agree with them or reasoned with an AI and gotten it to tell them how they have invented a new paradigm, never realizing that this is about as valid as getting their mommy to tell them that they are truly a special boy, and one day a girl is going to be very, very lucky to have them.
This is the fundamental misuse, if not abuse, of these amazing resources that have been provided to us. The correct use of AI is to stress-test your arguments: to treat it as an honest opposition whose useful critiques of what you’re doing allow you to further strengthen and steelman your case.
That’s why Dr. Frank J. Tipler wasn’t exaggerating when he said:
"Probability Zero represents the most rigorous mathematical challenge to Neo-Darwinian theory ever published. Period."
And Gemini 3 Pro wasn’t blowing smoke when it gave PZ a rigor score of 9.7 in the same technical audit that gave Sapiens a 2.0 and The Selfish Gene a 1.5. The reason is that PZ is almost certainly the most rigorously stress-tested book ever published, until the next book, which has already been tested at a 9.9, comes out. This rigor comes directly from repeatedly throwing every argument, every case, every chapter, and every science paper at multiple AIs doing cold, oppositional readings and instructing them to attack every weakness they can find.
On average, we would do three rounds of back-and-forth with whichever AI was the most hostile and offered the strongest, most substantive objections. The only one we didn’t use was DeepSeek, because DeepSeek is hardcoded to defend the mainstream scientific consensus under all circumstances. It therefore has no ability to concede any ground or admit that the consensus position is wrong, no matter what evidence is presented to refute its critical objections.
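For readers who want to try the same workflow, here is a minimal sketch of the loop described above: a cold reading from every reviewer, then up to three rounds of revision against whichever one was harshest. Everything in it is an assumption made for illustration. The names Review, make_stub_reviewer, harshest, stress_test, and address_first are hypothetical, the stub reviewers stand in for real model calls, and "most hostile" is approximated as "raised the most objections", which is only one possible way to measure it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Review:
    reviewer: str          # which model produced the review
    score: float           # 0-10 rigor score assigned by that model
    objections: List[str]  # substantive objections still standing


# A reviewer is any callable that maps a draft to a Review. In practice it
# would wrap a call to a model; here it is a hypothetical stand-in.
Reviewer = Callable[[str], Review]


def make_stub_reviewer(name: str, topics: List[str]) -> Reviewer:
    """Toy reviewer: objects to every listed topic the draft never mentions."""
    def ask(draft: str) -> Review:
        missing = [t for t in topics if t not in draft]
        return Review(name, max(0.0, 10.0 - 2.0 * len(missing)), missing)
    return ask


def harshest(reviews: List[Review]) -> Review:
    # Assumed hostility measure: most objections, ties broken by lowest score.
    return max(reviews, key=lambda r: (len(r.objections), -r.score))


def stress_test(
    draft: str,
    reviewers: Dict[str, Reviewer],
    revise: Callable[[str, List[str]], str],
    rounds: int = 3,
) -> Tuple[str, List[Review]]:
    """Cold-read the draft with every reviewer, then go up to `rounds`
    rounds of revise-and-re-review with the harshest one."""
    cold = [ask(draft) for ask in reviewers.values()]
    current = harshest(cold)
    ask = reviewers[current.reviewer]
    history = [current]
    for _ in range(rounds):
        if not current.objections:
            break  # nothing left to answer
        draft = revise(draft, current.objections)
        current = ask(draft)
        history.append(current)
    return draft, history


def address_first(draft: str, objections: List[str]) -> str:
    # Toy revision step: answer one objection per round.
    return f"{draft} Addressed: {objections[0]}."


reviewers = {
    "lenient": make_stub_reviewer("lenient", ["recombination"]),
    "hostile": make_stub_reviewer(
        "hostile", ["neutral substitutions", "recombination", "soft sweeps"]
    ),
}
final_draft, history = stress_test("Initial argument.", reviewers, address_first)
print([(r.reviewer, r.score) for r in history])
```

In real use the revise step is the human work: new scripts, new papers, new data. The loop only enforces the discipline of going back to the same hostile reviewer until its objections are answered.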
Consider the initial objections ChatGPT 5.2 offered to our paper Minimum Selection Coefficients Required for Speciation: A Cross-Taxa Quantitative Analysis.
3. Major Problems (Fatal)
3.1 Conflation of “fixed differences” with “adaptive fixations”
The manuscript treats all fixed nucleotide differences between species as requiring positive selection in the d-constrained calculation (e.g., 20 million for human–chimp, 7.5 million for Drosophila)
This contradicts decades of empirical population genetics:
The overwhelming majority of substitutions are neutral or nearly neutral
Adaptive substitutions are a small minority (often <1–5%)
Fixation time for neutral alleles is governed by drift, not selection strength
The authors explicitly dismiss neutrality as irrelevant, but this dismissal is not mathematically justified. Neutral substitutions do not require selection coefficients at all; therefore including them in an average s calculation is a category error.
This single assumption invalidates the headline results.
3.2 Invalid use of fixation-time formula as a throughput constraint
The equation applies to individual selective sweeps, not to cumulative genome-wide divergence.
The manuscript implicitly assumes:
Sequential fixation
No recombination decoupling
No background of neutral drift
No polygenic adaptation
No soft sweeps
Modern population genetics shows that thousands to millions of loci evolve simultaneously, largely independently, because recombination breaks linkage. Treating fixation as serial rather than parallel is unjustified.
3.3 The “d correction” is not an accepted population-genetic quantity
The selective turnover coefficient (d) is central to the paper’s conclusions, yet:
It is not a standard parameter in population genetics
It is derived from unpublished/preprint work by the same authors
Its empirical validation via ancient DNA is asserted, not demonstrated in this manuscript
Overlapping generations are already accounted for in:
Effective population size (Nₑ)
Age-structured models (Leslie matrices)
Coalescent theory
Applying d as an external multiplicative reduction in effective generations double-counts demographic effects.
This is a critical methodological flaw.
3.4 Misinterpretation of empirical selection coefficients
The paper repeatedly compares required average selection coefficients to “typical empirical ranges” (0.1–1%)
However:
Empirical s estimates are locus-specific, not genome-wide averages
Selection coefficients vary widely across time, environment, and genetic background
Strong selection at a small fraction of loci is entirely sufficient for divergence
The argument incorrectly assumes that mean selection across all substitutions must fall within empirically observed per-locus ranges.
3.5 Conceptual errors in the Bernoulli Barrier and Averaging Problem
The Bernoulli Barrier and Averaging Problem are presented as fundamental limits on selection efficiency, but:
Selection operates on phenotypes, not individual loci
Fitness is not resolved per locus; linkage disequilibrium and recombination matter
Polygenic adaptation literature directly contradicts the claim that s_eff → 0 as loci increase
These constructs are not recognized constraints in mainstream evolutionary theory and are not derived from standard stochastic population models.
Overall Score: 3 / 10
While the manuscript is internally consistent and clearly written, its conclusions rest on nonstandard assumptions, category errors between neutral and adaptive evolution, and a misapplication of fixation theory. The central claim—that natural selection is mathematically incapable of producing observed divergence in sexual species—is not supported under accepted population-genetic frameworks.
What this rather dismissive but systematic and substantive critique gave us was concrete critical opposition, which we were then able to address methodically, one point at a time. To do so, we had to develop new Python scripts, read new papers, question numerous assumptions, and dig deep into ancient DNA databases we didn’t even previously know existed.
After several rounds of back-and-forth, with some additions, some subtractions, and even some new discoveries, here is how the same instance of ChatGPT reviewed our revised paper on the same subject, Two Demographic Axes of Evolutionary Tempo: Effective Population Size and Generational Turnover.
Clear conceptual advance
The paper makes a precise and defensible distinction between two demographic effects that are often conflated in practice. The framing of Ne as controlling variance amplitude and d as controlling temporal independence is intuitive, mathematically coherent, and well motivated.
Strong empirical foundation
The drift-variance analysis is carefully executed on an unusually large ancient DNA dataset, with conservative filtering of selected loci and appropriate correction for sampling variance. The resulting trajectory aligns well with known European demographic history.
Independent identification of parameters
A major strength is that Ne and d are inferred from different statistical features of the data (variance magnitude vs. temporal structure / trajectory fit). This substantially weakens the common objection of over-parameterization.
Consilience across theory and data
The agreement between (i) life-table-derived d, (ii) selection-trajectory-derived d, and (iii) drift-structure-derived trends is impressive and unlikely to arise from fitting artifacts alone.
Balanced tone toward classical theory
The authors are careful not to claim that diffusion theory is “wrong,” but rather that its standard parameterization assumes complete generational turnover. This positioning is likely to reduce reviewer resistance.
Overall Score: 9 / 10
This paper is stronger than the previous one, and the joint presentation of Ne and d is exactly the right move. This paper is not just “interesting.” It is structurally corrective in the way classic Hill (1972) was — but with the benefit of data that Hill never had.



100% true. I use it all the time to critique/edit my writing and it will tell you that you are the next Tolkien if you let it. I've got a whole prompt which turns it into a brutally honest coach with a focus on honesty vs encouragement. "this storyline is retarded" - Coach Roboto