Version 1.3 data sources explained
A plain-English explanation of where the AMI data comes from: Retraction Watch, Google Trends, FOI disclosures, the ICAI/McCabe surveys, and literature-derived estimates.
Version 1.3 of the AMI draws on five categories of data source across six misconduct dimensions. Each is described below.
Retraction Watch (D6 — data fabrication)
The Retraction Watch database, now hosted publicly on GitLab via Crossref, contains 69,911 retraction records as of April 2026. We filtered these to 5,390 records where the retraction reason indicated misconduct (fabrication, falsification, fraud, or manipulation of images), then normalised by each country's publication volume using the OpenAlex API.
This gives us a rate of misconduct retractions per 10,000 publications for each country. Because the measure accounts for research output size, high-output countries are not penalised simply for having more retractions in absolute terms.
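The filter-then-normalise step described above can be sketched as follows. This is a minimal illustration, not the AMI pipeline itself: the keyword list, the record layout, the country names, and the publication counts are all hypothetical, and in practice the publication totals would come from OpenAlex.

```python
from collections import Counter

# Assumed keyword list, based on the misconduct reasons named in the text.
MISCONDUCT_KEYWORDS = ("fabrication", "falsification", "fraud", "manipulation of images")


def is_misconduct(reason: str) -> bool:
    """Return True if the retraction reason mentions any misconduct keyword."""
    text = reason.lower()
    return any(keyword in text for keyword in MISCONDUCT_KEYWORDS)


def rate_per_10k(retractions: int, publications: int) -> float:
    """Misconduct retractions per 10,000 publications."""
    if publications <= 0:
        raise ValueError("publications must be positive")
    return 10_000 * retractions / publications


# Hypothetical records; the real database has many more fields.
records = [
    {"country": "Country A", "reason": "Falsification/Fabrication of Data"},
    {"country": "Country A", "reason": "Error in Analyses"},
    {"country": "Country A", "reason": "Manipulation of Images"},
    {"country": "Country B", "reason": "Fraud"},
    {"country": "Country B", "reason": "Withdrawn by Author"},
]

# Hypothetical publication totals per country.
publications = {"Country A": 20_000, "Country B": 40_000}

# Count only the records whose reason indicates misconduct.
counts = Counter(r["country"] for r in records if is_misconduct(r["reason"]))

# Normalise each count by that country's publication volume.
rates = {country: rate_per_10k(counts[country], total)
         for country, total in publications.items()}
print(rates)
```

In this toy data Country A has twice as many misconduct retractions as Country B but half the publication volume, so its rate comes out four times higher.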
Google Trends (D1, D2)
We run two sets of Google Trends queries. The first uses generic contract cheating terms ("buy essay online", "essay writing service"). The second uses specific essay mill brand names ("ukessays", "edubirdie", "papersowl"), which are more precise signals of demand.
Both are pulled at country resolution across a 4-year timeframe (2022–2026) and normalised so the top country scores 100.
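The rescaling so that the top country scores 100 might look like the sketch below. The function name and the raw interest values are hypothetical, and the sketch assumes the raw values are already comparable across countries.

```python
def normalise_to_100(raw: dict[str, float]) -> dict[str, float]:
    """Rescale values so the highest-scoring country gets exactly 100."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one value must be positive")
    return {country: 100 * value / top for country, value in raw.items()}


# Hypothetical raw search-interest values per country.
raw_interest = {"Country A": 40, "Country B": 80, "Country C": 20}

scores = normalise_to_100(raw_interest)
print(scores)
```

With these inputs Country B becomes the reference point at 100, and the other countries are expressed as a share of it.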
FOI-derived AI misconduct rates (D2)
The Guardian published a Freedom of Information investigation in June 2025 showing that nearly 7,000 UK university students were formally caught using AI tools in 2023–24, equivalent to 5.1 per 1,000 students. Times Higher Education published similar FOI data for Russell Group universities.
We take the confirmed case rate, apply a detection-ratio correction (Scarfe et al. 2024 found that 94% of AI submissions went undetected at the University of Reading), and derive an estimated true rate.
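One plausible reading of the correction is sketched below. The text does not state the exact formula, so this assumes the confirmed rate reflects only the detected share of cases and divides by that share; the function name is hypothetical.

```python
def estimated_true_rate(confirmed_per_1000: float, undetected_share: float) -> float:
    """Scale a confirmed rate up by the share of cases assumed to be detected.

    Assumption: confirmed cases = true cases * (1 - undetected_share).
    """
    if not 0 <= undetected_share < 1:
        raise ValueError("undetected_share must be in [0, 1)")
    detected_share = 1 - undetected_share
    return confirmed_per_1000 / detected_share


# Figures quoted in the text: 5.1 confirmed cases per 1,000 students,
# and 94% of AI submissions undetected (Scarfe et al. 2024).
estimate = estimated_true_rate(5.1, 0.94)
print(round(estimate, 1))
```

Under this assumption the estimate comes out at about 85 per 1,000 students. The result is very sensitive to the undetected share, which here comes from a single-institution study.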
ICAI / McCabe survey data (D4, D5)
Donald McCabe's surveys, conducted between 2002 and 2015 and covering more than 70,000 students, remain the gold standard for self-reported plagiarism and collusion rates. The International Center for Academic Integrity (ICAI) has published country-level breakdowns. We use these for 20 countries on plagiarism and 15 countries on collusion, replacing regional extrapolation with actual survey data where it exists.
Literature-derived estimates
For dimensions and countries where no live data exists, we use country-adjusted estimates grounded in the peer-reviewed literature. Regional multipliers are applied to global base rates, with country-specific overrides where national studies exist (e.g. Bretag 2018 for Australia, Curtis et al. 2021 for the UK, Eret & Ok 2014 for Turkey).
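The fallback logic described here (a country-specific override if a national study exists, otherwise a regional multiplier applied to the global base rate) can be sketched as follows. Every number, region name, and identifier below is hypothetical and chosen only to show the lookup order.

```python
def estimate_rate(
    country: str,
    region: str,
    global_base_rate: float,
    regional_multipliers: dict[str, float],
    country_overrides: dict[str, float],
) -> float:
    """Return a country estimate, preferring a national-study override."""
    # A national study, where one exists, takes precedence.
    if country in country_overrides:
        return country_overrides[country]
    # Otherwise scale the global base rate by the region's multiplier,
    # falling back to 1.0 (no adjustment) for regions without one.
    return global_base_rate * regional_multipliers.get(region, 1.0)


# Hypothetical inputs for illustration only.
GLOBAL_BASE = 0.10
MULTIPLIERS = {"Region X": 1.5, "Region Y": 0.8}
OVERRIDES = {"Country A": 0.06}

with_override = estimate_rate("Country A", "Region X", GLOBAL_BASE, MULTIPLIERS, OVERRIDES)
from_region = estimate_rate("Country B", "Region X", GLOBAL_BASE, MULTIPLIERS, OVERRIDES)
no_multiplier = estimate_rate("Country C", "Region Z", GLOBAL_BASE, MULTIPLIERS, OVERRIDES)
print(with_override, from_region, no_multiplier)
```

Country A uses its override even though its region has a multiplier, Country B gets the regionally adjusted base rate, and Country C falls back to the unadjusted global base rate.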
Written by Francisco Booth, independent researcher.