Technical documentation
Methodology
The AMI uses a literature-derived weighting methodology for version 1.x, modelled on the approach used by the Corruption Perceptions Index in its first releases (1995–2001), before expert-panel perception surveys were added. All weights are transparent and independently verifiable from the cited sources.
The two-axis framework
Axis 1
Prevalence Score (P)
A weighted composite of estimated misconduct rates across six dimensions. Draws on live data (Google Trends, Retraction Watch, FOI disclosures), supplemented by country-adjusted literature estimates where live data are unavailable. Expressed on a 0–100 scale; higher values indicate higher estimated prevalence of misconduct.
Axis 2
Response Quality Score (R)
Measures how robustly a country detects, investigates, and deters academic misconduct. Built from four policy sub-components: legislation, detection tool adoption, disclosure transparency, and penalty severity. Derived from policy research, not self-reported data.
The enforcement-detection correction: Institutions with strong enforcement report more misconduct because they catch more — they should not be penalised for this. The AMI applies a bounded ±10 point correction to the P-Score based on the R-Score, adjusting upward where enforcement is weak (under-detection likely) and slightly downward where enforcement is strong (reporting is more complete). Both the raw and corrected P-Score are published.
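The bounded correction can be sketched as follows. This is an illustrative implementation only: the AMI publishes the ±10 point bound but not the exact mapping from R-Score to adjustment, so a simple linear form centred on R = 50 is assumed here.

```python
def corrected_p_score(p_raw: float, r_score: float, max_shift: float = 10.0) -> float:
    """Apply a bounded enforcement-detection correction to a raw P-Score.

    Illustrative sketch: maps the R-Score (0-100) linearly to a shift in
    [-max_shift, +max_shift]. Weak enforcement (low R) adjusts the P-Score
    upward (under-detection likely); strong enforcement (high R) adjusts it
    slightly downward (reporting is more complete). R = 50 is treated as
    neutral -- an assumption, not the AMI's published mapping.
    """
    shift = max_shift * (50.0 - r_score) / 50.0
    # Keep the corrected score inside the 0-100 range.
    return max(0.0, min(100.0, p_raw + shift))
```

Under this sketch, a country with a raw P-Score of 60 and the weakest possible enforcement (R = 0) would be corrected to 70, while the same raw score with the strongest enforcement (R = 100) would be corrected to 50.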
Score floor and ceiling: P-Scores are rescaled so the lowest-scoring country in this set receives 5 and the highest receives 95. A score near 5 does not mean zero cheating — it means the lowest estimated prevalence among the 28 countries currently scored. Both anchors will shift as coverage expands to more countries.
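The rescaling described above is a standard min-max transform onto the [5, 95] interval. A minimal sketch:

```python
def rescale_p_scores(scores: list[float]) -> list[float]:
    """Min-max rescale raw P-Scores so the lowest-scoring country maps to 5
    and the highest maps to 95. Both anchors shift whenever the set of
    scored countries changes."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: all countries tied; place them at the midpoint.
        return [50.0 for _ in scores]
    return [5.0 + 90.0 * (s - lo) / (hi - lo) for s in scores]
```

Note that a score of 5 marks the lowest estimated prevalence within the current 28-country set, not an absence of misconduct.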
Six misconduct dimensions
Weights are derived from two factors: estimated prevalence (how common is this behaviour?) and severity (how much harm does it cause per incident?). Each factor is scored 1–5 from the literature; the per-dimension averages are then normalised across all six dimensions so the weights sum to 100%.
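The normalisation step can be sketched as below. The factor scores used here are placeholders, not the AMI's published values; only the averaging-and-normalising procedure follows the description above.

```python
def dimension_weights(factor_scores: list[tuple[float, float]]) -> list[float]:
    """Compute dimension weights from (prevalence, severity) pairs, each 1-5.

    Each dimension's weight is the average of its two factor scores,
    normalised so the six weights sum to 100%.
    """
    averages = [(prevalence + severity) / 2.0 for prevalence, severity in factor_scores]
    total = sum(averages)
    return [100.0 * avg / total for avg in averages]

# Placeholder factor scores for six dimensions (NOT the AMI's actual values):
example = [(4, 3), (4, 3), (2, 4), (4, 2), (3, 2), (2, 5)]
weights = dimension_weights(example)
```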
Contract cheating / essay mills
Google Trends brand keywords; essay mill domain presence; Newton 2018 systematic review; Bretag et al. 2018
AI-generated submissions
Guardian FOI 2025 (UK); Turnitin 2024 global data; Stanford Survey 2023; Feedough 2025
Exam impersonation
ICAI surveys; Springer 2023 systematic review; country-adjusted literature estimates
Plagiarism
ICAI / McCabe country-level data (20 countries); Pupovac & Fanelli 2015 meta-analysis; Curtis et al. 2021
Collusion
ICAI / McCabe country-level data (15 countries); Bretag 2018; McCabe graduate surveys
Data fabrication / falsification
Retraction Watch database (69,911 records); OpenAlex publication counts; Fanelli 2009; Fang et al. 2012 (PNAS)
Data quality flags
Each country score carries a data quality flag indicating what proportion of dimensions used live data versus literature estimates.
Majority live
3 or more dimensions scored from live data sources. High confidence.
Mixed
At least 1 live dimension; remaining from literature. Moderate confidence.
Literature only
All dimensions from literature estimates. Lower confidence; treat with caution.
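The flag assignment is a simple threshold on the count of live-data dimensions (out of six), as defined above:

```python
def data_quality_flag(live_dimensions: int) -> str:
    """Map the number of dimensions scored from live data (0-6) to a flag."""
    if live_dimensions >= 3:
        return "Majority live"   # high confidence
    if live_dimensions >= 1:
        return "Mixed"           # moderate confidence
    return "Literature only"     # lower confidence; treat with caution
```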
Limitations
D3 impersonation has no live data. Exam impersonation is the only dimension where all 28 countries use literature-derived estimates. No scalable live data source exists for this dimension — confirmed impersonation cases are either unreported or buried within general "exam cheating" categories. D3 contributes 14.6% of the P-Score entirely from estimates. Country scores on this dimension should be treated with particular caution.
Self-report bias. All literature base rates derive from self-report surveys, which systematically underestimate true prevalence. The AMI therefore treats all prevalence estimates as lower bounds.
Coverage gaps. Most large-scale studies are conducted in English-speaking countries. Asian, African, and Latin American institutional data are underrepresented. Coverage gaps are flagged per country.
Reporting paradox. Institutions with strong detection look worse on raw misconduct rates. The enforcement-detection correction addresses this but is itself an analytical judgement.
AI dimension immaturity. D2 data is less than three years old. Estimates will be revised with each annual update as the evidence base matures.
How to cite
Data are released under the Creative Commons Attribution 4.0 (CC BY 4.0) licence. You are free to share and adapt the data with attribution.