What Is Data Fabrication in Research? Definition and Famous Cases

Data fabrication is among the most clearly defined forms of academic misconduct, and among the most consequential once it enters the research literature. The Academic Misconduct Index (AMI) tracks it in its D6 dimension. This guide covers what it is, how it is detected, and the cases that shaped modern research integrity.

TL;DR

Data fabrication is making up research data. Data falsification is altering real data. Both are research misconduct. The AMI's D6 dimension scores them using misconduct-linked retractions from the Retraction Watch database, normalised by publication volume. China scores 100 (highest), Russia 78, India 70, and Iran and Pakistan 65 each. Famous cases include Diederik Stapel, Hwang Woo-suk, STAP cells (Haruko Obokata), Marc Hauser, and Paolo Macchiarini.

Definition

Data fabrication is the creation of research data that was not actually collected or measured. The researcher reports results from experiments that did not occur, observations that were not made, or measurements that were not taken.

Data falsification is altering or selectively reporting real data — manipulating images, omitting inconvenient measurements, or changing values to produce a more favourable result.

Both are research misconduct. The US Office of Research Integrity (ORI) groups them together with plagiarism under the umbrella of research misconduct (FFP: Fabrication, Falsification, Plagiarism).

Why it matters

Data fabrication affects the scientific record. Unlike student plagiarism — which damages credentialing but does not propagate into ongoing research — fabricated data enters the literature, gets cited, and shapes subsequent research:

  • Fabricated medical research can mislead clinical practice
  • Fabricated psychology research can shape policy decisions
  • Fabricated engineering research can affect engineering standards
  • Fabricated biology research can lead other researchers to chase non-existent phenomena

The consequences extend far beyond the individual misconduct case.

Detection methods

Statistical analysis

Real data shows expected variance patterns; fabricated data often does not. Forensic statistics has exposed several major fraud cases by identifying implausible patterns: distributions that are too clean, variance that is too small, and reported values that are mathematically impossible for the stated sample.
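One simple check of this kind is a terminal-digit test. The sketch below is illustrative only and is not the AMI's or any investigator's documented method. It assumes integer-valued measurements recorded with enough precision that last digits should be roughly uniform, and the function names are hypothetical. It computes a chi-square statistic over the last digits and compares it with the 5% critical value for 9 degrees of freedom.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at alpha = 0.05.
CHI2_CRIT_DF9_05 = 16.919


def terminal_digit_chi2(values):
    """Chi-square statistic for uniformity of the last digit of integer values."""
    digits = [abs(int(v)) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10  # uniform expectation per digit
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def last_digits_look_nonuniform(values):
    """True when the last-digit distribution departs from uniform at the 5% level."""
    return terminal_digit_chi2(values) > CHI2_CRIT_DF9_05


# Hypothetical data: every last digit appears equally often.
even_spread = list(range(100, 200))
# Hypothetical data: every value ends in 0 or 5.
rounded_to_five = [10 * i + (5 if i % 2 else 0) for i in range(100)]

print(last_digits_look_nonuniform(even_spread))
print(last_digits_look_nonuniform(rounded_to_five))
```

A high statistic is a reason to look closer, not proof of fabrication: rounding habits and instrument resolution also skew last digits.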

Replication failures

Other researchers attempting to replicate findings discover that they cannot reproduce the results. This is the most direct detection method, though it is slow and expensive.

Image forensics

Manipulation of images in microscopy, gel electrophoresis, and similar techniques can be detected through pixel-level analysis. Specialised tools (PaperWatcher, Imagetwin) check for duplicated or altered images.
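As a rough illustration of pixel-level duplicate detection, the sketch below looks for identical tiles inside a single grayscale image represented as a list of rows. It is a toy under stated assumptions, not a description of how the tools named above work: it only finds exact copies aligned to a fixed grid, and the function name and tile size are hypothetical. Production tools also have to handle rotation, scaling, and compression artefacts.

```python
from collections import defaultdict


def find_duplicate_tiles(image, tile=4):
    """Group top-left (row, col) coordinates of grid tiles with identical pixels."""
    rows, cols = len(image), len(image[0])
    seen = defaultdict(list)
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            block = tuple(tuple(image[r + i][c:c + tile]) for i in range(tile))
            # Ignore flat tiles (for example blank background), which match trivially.
            if len({pixel for row in block for pixel in row}) > 1:
                seen[block].append((r, c))
    return [coords for coords in seen.values() if len(coords) > 1]


# Hypothetical 8x8 image in which every pixel value is unique.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
print(find_duplicate_tiles(image))

# Copy the top-left 4x4 tile over the bottom-right tile to simulate a cloned region.
for i in range(4):
    image[4 + i][4:8] = image[i][0:4]
print(find_duplicate_tiles(image))
```

Matching tiles indicate where to look; whether a repeat is a legitimate reuse or a manipulation still needs human judgement.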

Peer review

Reviewers can identify impossible claims or internal inconsistencies before publication. Peer review is limited, however, in catching fabrication that produces plausible-seeming results.

Post-publication review

PubPeer and similar platforms allow post-publication comment on potential misconduct. Such commentary has led to the detection of major cases.

Whistleblower reports

Co-authors, lab members, or institutional colleagues may report suspected misconduct. Many famous cases were initiated by whistleblowers.

What the AMI data shows

D6 scores on a 0–100 scale across the 39-country set:

Top D6 scores

Country        Score
China          100
Russia         78
India          70
Iran           65
Pakistan       65
Egypt          60
Nigeria        55
South Korea    55

Lowest D6 scores

Country        Score
New Zealand    12
Ireland        15
Sweden         15
Norway         15
Netherlands    15
Singapore      20
Kenya          20
Vietnam        22
Canada         22

The D6 dimension is built directly from the Retraction Watch database, filtered to misconduct-linked retractions and normalised by publication volume. China's D6=100 reflects the highest misconduct-linked retraction rate per 10,000 publications in the dataset.
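The arithmetic behind a volume-normalised score can be sketched as follows. This is an assumption-laden illustration, not the AMI's published formula: the country labels and counts are hypothetical, and scaling each rate against the highest rate so that the top country scores 100 is one plausible reading of the description above. The AMI v1.5 methodology document is the authority on the actual calculation.

```python
def retraction_rate_per_10k(retractions, publications):
    """Misconduct-linked retractions per 10,000 publications."""
    return 10_000 * retractions / publications


def scale_to_100(rates):
    """Scale rates so the highest rate maps to 100 (assumed scaling, see lead-in)."""
    top = max(rates.values())
    return {country: round(100 * rate / top) for country, rate in rates.items()}


# Hypothetical inputs: (misconduct-linked retractions, total publications).
counts = {
    "Country A": (50, 100_000),
    "Country B": (10, 40_000),
}

rates = {c: retraction_rate_per_10k(r, p) for c, (r, p) in counts.items()}
scores = scale_to_100(rates)
print(rates)
print(scores)
```

Normalising by publication volume is what keeps a large research system from scoring high purely because it publishes more.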

Famous cases

Diederik Stapel (Netherlands, 2011)

Dutch social psychologist who fabricated data in dozens of papers over years. He voluntarily returned his PhD title, and the case prompted broader Dutch reform of social psychology research practice. One of the largest fabrication cases by paper count.

Hwang Woo-suk (South Korea, 2005–2006)

Korean stem cell researcher who claimed to have produced patient-specific stem cell lines through somatic cell nuclear transfer. The results were fabricated; the cloning claims could not be replicated. The case prompted establishment of the Korea Research Integrity (KRI) framework.

STAP cells / Haruko Obokata (Japan, 2014)

RIKEN researcher who claimed a novel method of inducing stem cells by applying stress (stimulus-triggered acquisition of pluripotency, STAP). The Nature papers were retracted after replication failures and the identification of image manipulation. The case led to JSPS and MEXT integrity reforms.

Marc Hauser (US, 2010)

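Harvard primatologist working in cognitive psychology who was found responsible for research misconduct by a Harvard investigation in 2010. He resigned from Harvard in 2011, and the ORI issued its own findings of fabrication and falsification in 2012.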

Paolo Macchiarini (Sweden, 2014–2016)

Karolinska surgeon whose synthetic trachea transplant research was found to involve fabricated patient outcomes and missing ethical approvals. Multiple patients died. The case contributed to the establishment of Sweden's NPOF national misconduct board.

The detection-incidence challenge

Detected cases are not the same as actual incidence. The retraction rate measures what gets caught, not what occurs. Countries with stronger detection infrastructure (peer review, replication culture, post-publication review) tend to report more cases, so a higher recorded rate can reflect better detection rather than more misconduct. The AMI applies a detection correction factor, but the fundamental challenge remains.
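A toy model makes the gap between incidence and detection concrete. The source does not specify the AMI's correction factor, so everything below is a hypothetical illustration: it assumes the observed rate equals the true rate multiplied by a detection probability, and all numbers and function names are invented for the example.

```python
def observed_rate(true_rate, detection_prob):
    """Rate that shows up in retraction data under the assumed model."""
    return true_rate * detection_prob


def corrected_rate(observed, assumed_detection_prob):
    """Estimate the true rate by dividing out an assumed detection probability."""
    if not 0 < assumed_detection_prob <= 1:
        raise ValueError("detection probability must be in (0, 1]")
    return observed / assumed_detection_prob


# Two hypothetical countries with the same true rate but different detection.
strong_detection = observed_rate(true_rate=4.0, detection_prob=0.5)
weak_detection = observed_rate(true_rate=4.0, detection_prob=0.25)
print(strong_detection, weak_detection)

# The correction only recovers the true rate if the assumed probability is right.
print(corrected_rate(strong_detection, 0.5))
print(corrected_rate(weak_detection, 0.5))
```

The last line shows why the challenge persists: a correction is only as good as the estimate of detection probability, which cannot be observed directly.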

Sources

  • Retraction Watch Database, Crossref/GitLab (2026)
  • Fang, Steen & Casadevall (2012), PNAS: "Misconduct accounts for the majority of retracted scientific publications"
  • ORI (US Office of Research Integrity) case reports
  • AMI v1.5 methodology document

Frequently asked questions

What is data fabrication?

Data fabrication is making up research data that was not actually collected or measured. It is distinct from data falsification, which is altering or selectively reporting real data. Both are research misconduct. Data fabrication damages the scientific literature by introducing false results that may be cited and built upon by subsequent researchers.

How is data fabrication detected?

Detection methods include statistical analysis of reported data (real data shows expected variance patterns; fabricated data often does not), replication failures, image forensics (image manipulation in microscopy and gel electrophoresis), peer review, post-publication review on platforms like PubPeer, and whistleblower reports. The Retraction Watch database catalogues confirmed cases.

What are famous cases of data fabrication?

Major cases include: Diederik Stapel (Dutch social psychologist, dozens of fabricated papers, 2011); Hwang Woo-suk (Korean stem cell researcher, fabricated cloning results, 2005–2006); Haruko Obokata / STAP cells (RIKEN, 2014); Marc Hauser (Harvard primatologist, 2010); the Macchiarini case (Karolinska, 2014–2016).

How to cite this article

APA: Booth, F. (2026). What Is Data Fabrication in Research? Definition and Famous Cases. Academic Misconduct Index. https://academicmisconductindex.com/blog/what-is-data-fabrication-research

BibTeX: @misc{booth2026what, author={Booth, Francisco}, title={What Is Data Fabrication in Research? Definition and Famous Cases}, year={2026}, url={https://academicmisconductindex.com/blog/what-is-data-fabrication-research}}

Francisco Booth

Independent researcher, founder of the Academic Misconduct Index