AI fuels surge in questionable health research, study warns

Image by Pexels from Pixabay

03 July 2025

A sharp rise in potentially misleading health research articles could be due to the use of artificial intelligence tools, a new study suggests.

Researchers from Aberystwyth University and the University of Surrey analysed 341 studies published over the past decade that used a large publicly available US dataset to link individual predictors - such as diet, lifestyle, or environmental exposures - to specific health outcomes.

The investigation found that many of these studies followed an almost identical structure: isolating one variable, testing its association with a health condition, and publishing the result, often without accounting for confounding factors or correcting for multiple comparisons.

This approach, while easy to automate, increases the risk of false positives and misleading conclusions, and fails to meet basic standards of scientific rigour.
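To illustrate why uncorrected multiple comparisons inflate false positives, the sketch below computes the chance of at least one spurious "significant" result across many independent tests, then applies the Benjamini-Hochberg procedure, one common correction. It is an illustrative example only, not the analysis used in the study: the function names and the p-values are hypothetical, and the tests are assumed to be independent.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests
    when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """For each p-value, report whether it survives Benjamini-Hochberg
    false discovery rate control at level alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every hypothesis ranked at or below the cutoff is kept as a discovery.
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep


if __name__ == "__main__":
    # Hypothetical p-values from four single-variable association tests.
    p_values = [0.001, 0.04, 0.03, 0.5]

    naive = [p < 0.05 for p in p_values]
    corrected = benjamini_hochberg(p_values)

    print("P(at least one false positive in 20 tests):",
          round(familywise_error_rate(20), 3))
    print("Significant without correction:", naive)
    print("Significant after correction:  ", corrected)
```

With these made-up p-values, three of the four associations pass an uncorrected 0.05 threshold, but only the strongest survives the correction.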

The researchers noted a surge in publications based on the US National Health and Nutrition Examination Survey (NHANES) dataset in 2024, with 190 papers published in the first nine months of the year alone, compared with only four papers in 2014.

Professor Reyer Zwiggelaar from the Department of Computer Science at Aberystwyth University, and a Senior Research Leader for Health and Care Research Wales, said:

“We’re seeing a troubling surge in AI-assisted papers that prioritise quantity over quality. These studies often adopt a formulaic and simple methodology and ignore statistical best practices. This leads to oversimplified conclusions and increases the risk of false discoveries. Our findings raise serious concerns about the misuse of AI in scientific publishing.”

The study highlights how AI tools are being used by ‘paper mills’ - organisations that mass-produce academic papers for profit. Such companies can rapidly generate manuscripts by plugging variables into pre-written templates, often with minimal human oversight or scientific justification.

Another concern is the selective use of data from NHANES, a survey that publishes health, nutrition and behaviour data from people across the United States. Many studies analysed only narrow timeframes or specific subgroups without clear justification. Known as ‘data dredging’, this practice can lead to cherry-picked results that appear statistically significant but lack real-world relevance.

Co-author Charlie Harrison from the Department of Computer Science at Aberystwyth University said:

“AI can be a powerful tool, but when misused it can undermine scientific integrity and flood the literature with unreliable findings. This isn’t just an academic issue - when flawed research enters the literature, it can mislead clinicians, confuse policymakers, and ultimately harm public trust in science. It will also be used to train the next generation of AI models, so the problem is baked in.”

Publishing their findings in PLOS Biology, the authors propose guidelines for researchers, data custodians, publishers, and peer reviewers to improve statistical practices, ensure transparency, and guard against unethical publication practices. These include encouraging multifactorial analyses, transparent data selection, and stronger editorial oversight.

Professor Reyer Zwiggelaar from Aberystwyth University added:

“As research using large datasets and AI tools becomes more common, it is vital that we safeguard research integrity. Reviewers play a key role in spotting weak or misleading research before it gets published. To help protect the quality of science, we suggest a few important steps: data providers should track how their data is used, journals should reject low-quality papers early, and experts in statistics should help review complex studies. We also encourage open discussion after papers are published, so mistakes or problems can be addressed quickly. These measures are not just safeguards - they are essential steps toward preserving the credibility and value of scientific discovery in the age of big data.”