The accuracy problem is worse than you'd think
Let's start with the numbers. ChatGPT's accuracy on medical questions varies wildly by specialty -- from as high as 95% in some areas to as low as 20% in others. For gynecologic oncology, the accuracy rate sits at about 45.6%. That's worse than a coin flip on questions about cancer.
But accuracy isn't even the biggest concern. It's the failure mode.
When AI gets a medical answer wrong, it doesn't say "I'm not sure." It delivers the wrong answer with the same confident tone as the right one. There's no hedging, no uncertainty signal, no clinical judgment about when to defer. Without that signal, a reader has no way to separate the reliable answers from the fabricated ones -- which is why a sub-50% accuracy rate delivered with 100% confidence is worse than no answer at all.
Fabricated references and invented science
The hallucination problem in medical AI is genuinely alarming. A study evaluating AI responses to cancer-related queries found a 39% hallucination rate. That means roughly four in ten responses contained information that was simply made up.
It gets more specific than that. One systematic evaluation found 592 hallucination instances across bibliographic fields in AI-generated medical content. These weren't misattributed citations. They were completely invented -- fake authors, fake journals, fake DOIs, fake findings. A physician checking the reference list would find papers that don't exist, cited in support of claims that may or may not be accurate.
In another case, ChatGPT invented gene identifiers when asked about BRCA genes -- the very genes involved in hereditary breast and ovarian cancer risk. It didn't misinterpret existing research. It fabricated genetic nomenclature that doesn't correspond to anything in the human genome.
These aren't edge cases. They're what happens when a language model optimizes for fluent, plausible-sounding output without any mechanism for verifying whether the content is true.
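To make that mechanism concrete, here is a deliberately crude sketch in Python -- nothing like how a real language model works internally, and every name, journal, and DOI below is invented for illustration. What it captures is the incentive structure: a generator rewarded only for producing output that matches the shape of a citation, with no step that checks the result against reality.

```python
import random

# Toy illustration only -- not a real model. Each component is sampled
# from patterns that "look right," which is all a fluency-optimizing
# generator is rewarded for. All names and journals are fictional.
random.seed(7)

SURNAMES = ["Chen", "Okafor", "Lindqvist", "Moreau", "Tanaka"]
JOURNALS = [
    "Journal of Clinical Gynecology",
    "Reproductive Science Reports",
    "Annals of Oncologic Medicine",
]

def plausible_citation() -> str:
    """Assemble a reference that matches the *format* of a real one.

    Nothing here checks whether the paper exists: the output is
    fluent by construction and true only by accident.
    """
    authors = ", ".join(random.sample(SURNAMES, 2))
    journal = random.choice(JOURNALS)
    year = random.randint(2015, 2023)
    volume = random.randint(10, 90)
    first_page = random.randint(100, 900)
    doi = f"10.{random.randint(1000, 9999)}/jcg.{year}.{random.randint(100, 999)}"
    return (f"{authors}, et al. ({year}). {journal}, "
            f"{volume}, {first_page}-{first_page + 12}. doi:{doi}")

print(plausible_citation())
# Prints a perfectly formatted reference to a paper that does not exist.
```

A real model is vastly more sophisticated, but the incentive is the same: training rewards plausible form, and nothing in the generation step itself rewards truth.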
Patients are already acting on AI health advice
Here's the part that should concern every practitioner: people aren't just reading AI health responses out of curiosity. They're making decisions based on them.
Research shows users have changed their birth control method based on AI advice. Others have adjusted medication dosages. Some have delayed seeking care because an AI tool told them their symptoms were likely benign. These aren't hypotheticals. They're documented behaviors.
For reproductive health specifically, AI tools have misrepresented the safety profile of medication abortion and provided inconsistent guidance on fertility treatment options. Only 1.6% of published studies on AI in reproductive health are experimental -- meaning the vast majority of what we know about AI's performance in this space comes from observational or descriptive studies, not rigorous testing.
We're deploying these tools at consumer scale without having adequately tested them in the domain where patients are using them most.
The one-size-fits-all problem
Even when AI gets the facts right, there's a more fundamental mismatch with how restorative reproductive medicine works.
AI gives population-level answers. Ask it about endometriosis treatment, and you'll get the standard options ranked by how commonly they're prescribed. Ask about irregular cycles, and you'll get the textbook differential. The responses are generalized, aggregated, averaged across millions of data points.
RRM's entire model runs in the opposite direction. Each woman's Creighton chart is unique. Treatment protocols are built from individual charting patterns, hormone profiles, surgical findings, and the patient's own goals. A NaProTechnology evaluation for two patients presenting with the same chief complaint might lead to completely different treatment plans based on what their charts reveal over three to six months of observation.
That level of individualization is exactly what AI can't do. Not because the technology isn't sophisticated enough -- but because the methodology requires sustained observation of a specific person over time, interpreted by a trained clinician who knows that patient's history and goals. There's no shortcut to that, and a chatbot can't approximate it.
AI tools aren't useless. But the limits matter.
This isn't an argument that AI has no role anywhere in healthcare. For certain applications -- literature search, administrative tasks, population-level research analysis -- these tools can be genuinely useful. They're fast, they can process enormous volumes of material, and they don't get tired.
But for reproductive health, where the stakes are high, the medicine is individualized, and patients are making real decisions about fertility, pregnancy, and surgical intervention, the current accuracy isn't close to adequate. The technology gives confident-sounding answers to questions it doesn't understand well enough.
When your patient comes in and says "I asked ChatGPT about my treatment options," the right response isn't alarm. It's an opportunity to explain what individualized care actually means -- why her specific chart matters more than a generalized answer, and why the time you spend reviewing her cycles isn't something an algorithm can replace.
Your patients deserve better than a coin-flip accuracy rate on questions about their reproductive health. And they already have it. That's the point of the medicine you practice.
Frequently asked questions
How accurate is ChatGPT for reproductive health questions?
Accuracy varies significantly by topic, but studies show ChatGPT achieves only about 45.6% accuracy for gynecologic oncology questions -- worse than a coin flip. For reproductive health more broadly, accuracy is inconsistent, and the tool has been found to misrepresent safety information about medications and provide unreliable guidance on fertility treatment options.
What are AI hallucinations in medical contexts?
AI hallucinations occur when a model generates information that sounds plausible but is factually incorrect or entirely fabricated. In medical contexts, this has included made-up bibliographic references (592 fabricated citations in one study), invented gene identifiers for BRCA genes, and inaccurate safety profiles for medications. The model presents these fabrications with the same confidence as accurate information.
Are patients making health decisions based on AI advice?
Yes. Documented cases include patients changing birth control methods, adjusting medication dosages, and delaying care based on AI-generated health information. For reproductive health specifically, this is concerning because the guidance AI provides is generalized and may not account for individual clinical circumstances.
Why can't AI replicate the NaProTechnology approach to reproductive health?
NaProTechnology depends on individualized evaluation built from months of Creighton Model charting, hormonal testing, and clinical observation specific to each patient. AI provides population-level answers based on aggregated data. The methodology requires a trained clinician interpreting a specific patient's patterns over time, which is fundamentally different from generating a generalized response to a text query.
Should RRM practitioners be concerned about patients using AI for health information?
Practitioners should be aware that patients are using these tools and sometimes acting on the results. Rather than dismissing AI outright, it's an opportunity to explain the value of individualized care and why a patient's specific Creighton chart and clinical history provide better guidance than a generalized AI response. The conversation reinforces the core value of the restorative medicine approach.