AI-Created Exam Papers Go Undetected: New Study Highlights the Challenges Facing Academic Integrity

Researchers reveal that 94% of AI-generated exam submissions were not identified as artificial, raising concerns over academic integrity in universities.

A new study has demonstrated the alarming ease with which artificial intelligence can infiltrate and excel in university examinations, potentially threatening the integrity of higher education assessments.

Conducted by researchers from the University of Reading and the University of Essex, the study covertly injected entirely AI-written submissions into the examinations system for a BSc degree in Psychology at a reputable UK university.

The findings reveal that a staggering 94% of these AI submissions went undetected, often scoring higher than those submitted by real students.

The study, by researchers Peter Scarfe, Kelly Watcham and Etienne Roesch of the UK’s University of Reading and Alasdair Clarke of the University of Essex, was published in the journal PLOS One on June 26, 2024.

Study Participants and Methodology

The study involved five undergraduate modules across all years of study for a BSc degree in Psychology.

Researchers used the AI model ChatGPT (specifically, GPT-4) to generate answers for short-answer questions (SAQs) and essay-based questions.
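
As a rough illustration of what this generation step involves, the sketch below uses the OpenAI Python client to ask GPT-4 for an essay-style answer. The prompt wording and the example question are assumptions made for illustration; they are not the study's actual protocol.

    # Illustrative sketch only: the prompt wording, question, and settings
    # below are assumptions, not the study's actual protocol.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    exam_question = (  # hypothetical question, for illustration
        "Critically evaluate the role of working memory in reading comprehension."
    )

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                "Write an undergraduate psychology exam essay answering the "
                "following question, without headings or a reference list:\n\n"
                + exam_question
            ),
        }],
    )

    # The returned text would then be submitted verbatim as an exam answer.
    print(response.choices[0].message.content)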

The AI submissions were then included anonymously in the pool of exam papers for each module.

In total, 63 AI-written answers were submitted alongside 1,134 real student submissions.

Demographics and Recruitment

The study was carried out at a UK university, focusing on undergraduate psychology students.

The participants included students from all years of the BSc Psychology program.

The AI-generated answers were designed to mimic typical student responses, ensuring a realistic comparison between AI and human-generated exam papers.

Results

The results showed that 94% of AI submissions were not detected as being written by artificial intelligence.

Moreover, these AI-generated papers typically received higher grades than those of real students.

Specifically, AI submissions were, on average, awarded grades half a grade boundary higher than those achieved by real students.

Overall, AI submissions outperformed real student submissions in 83.4% of instances.

The probability that AI submissions would achieve higher grades than real student submissions was nearly 100% for all modules except one.
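
A probability of this kind can be estimated by resampling: repeatedly drawing random groups of student grades the same size as the AI cohort and counting how often the AI group's average comes out ahead. The sketch below demonstrates the idea with synthetic grades; the numbers are invented for illustration and are not the study's data.

    # Resampling sketch with synthetic grades; all numbers here are invented
    # for illustration and do not come from the study.
    import random

    random.seed(0)

    student_grades = [random.gauss(58, 8) for _ in range(1134)]  # hypothetical cohort
    ai_grades = [random.gauss(63, 5) for _ in range(63)]         # hypothetical AI scripts

    def prob_ai_outperforms(ai, students, n_resamples=10_000):
        """Estimate the probability that the AI group's mean grade beats the
        mean of an equally sized random sample of student grades."""
        ai_mean = sum(ai) / len(ai)
        wins = 0
        for _ in range(n_resamples):
            sample = random.sample(students, len(ai))
            if ai_mean > sum(sample) / len(sample):
                wins += 1
        return wins / n_resamples

    print(f"P(AI group outperforms) ~ {prob_ai_outperforms(ai_grades, student_grades):.3f}")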

Only 6% of AI submissions were flagged for any reason, and under the stricter criterion of a flag specifically mentioning AI, the detection rate fell to just 3%.

Implications for Academic Integrity

The findings raise significant concerns about the integrity of unsupervised assessments in higher education.

The fact that AI submissions not only passed undetected but also achieved higher grades highlights the urgent need for universities to reconsider their assessment methods.

Future Directions and Recommendations

Given the study’s findings, the researchers recommend several measures to address the issue.

First, implementing more robust AI detection tools is essential.

Additionally, designing assessments that are less susceptible to AI-generated answers will help maintain academic integrity.

Increasing awareness and training for educators on how to identify AI-written work is also crucial.

The researchers also suggest that future studies should explore the performance of AI in more complex and abstract reasoning tasks, which may be less vulnerable to AI infiltration.

Study Details

  • Title: A real-world test of artificial intelligence infiltration of a university examinations system: A “Turing Test” case study
  • Authors: Peter Scarfe, Kelly Watcham, Alasdair Clarke, Etienne Roesch
  • Publication Date: June 26, 2024
  • DOI: 10.1371/journal.pone.0305354