AI-Assisted Devices in University Written Examinations: A Structural Threat to the Degree as a Warrant of Competence

Demetrios T. Venetsanos

doi:10.20944/preprints202606.2014.v1

Submitted: 25 June 2026

Posted: 29 June 2026


Abstract

This paper examines the threat to academic integrity posed by students who may use contemporary technology to cheat undetectably in traditional written university examinations. The threat is a cheating pipeline formed by the convergence of three technologies: miniaturised cameras and earbuds; consumer AI smart glasses with near-invisible Heads-Up Displays; and Large Language Models capable of answering university-level questions to a passing standard or higher. The paper argues that, if used, this pipeline can compromise the integrity of closed-book, supervised examinations. Once covert assistance of this kind is feasible, the format can no longer reliably distinguish a candidate's own work from a machine's. It can therefore no longer substantiate the claim to competence that a degree makes on its holder's behalf. This is a structural problem: it concerns what the examination can certify for the cohort as a whole, not only the conduct of those who cheat. The evidence establishes that the threat is already real. Drawing on a narrative review of the academic-integrity literature (1966-2025), UK Ofqual malpractice statistics (2019-2025), and a documentary scan of commercially available AI-assisted devices (May 2026), the paper shows that device use in invigilated examinations is established and increasing in secondary education, with the conditions driving it plausibly extending to higher education, although direct sector-level evidence remains limited. It develops the ethical case that examination design and certification must be reconsidered as a matter of institutional responsibility. It concludes that what a degree certifies in the age of ambient AI cannot be left to detection technology to settle.

Keywords: academic integrity; AI-assisted cheating; examination malpractice; large language models; smart glasses; wearable technology; higher education; assessment planning

Subject: Social Sciences - Education

1. Introduction

Academic integrity is a basic principle of higher education. It supports the credibility of qualifications, the fairness of assessment, and the social trust that employers, professional bodies, and the public place in university awards. The International Center for Academic Integrity (2021) defines it as a commitment to honesty, trust, fairness, respect, responsibility, and courage. When operationalised, these values distinguish authentic learning from its simulation. However, when they are systematically undermined, the consequences extend beyond the individual student. The degree certificate becomes an unreliable signal of competence, and the awarding institution becomes complicit in misrepresentation that may carry professional, social, and, within safety-critical disciplines, legal consequences.

Academic misconduct is neither new nor rare. Sociological studies dating to the mid-twentieth century document its prevalence across educational systems (Harp & Taietz, 1966). Six decades of subsequent research confirm that a non-trivial minority of students engage in dishonest behaviour in every jurisdiction studied (Christensen Hughes & McCabe, 2006a; Christensen Hughes & McCabe, 2006b; Khalid, 2015; Makarova, 2019). What has changed, and changed rapidly, is the technical infrastructure available to a student who chooses to cheat. The progression from concealed crib notes to pre-programmed calculators to smartphone-based cheating is well documented (Burgason et al., 2019; Khalid, 2015), and each transition is characterised by increased capability, reduced cost, and reduced detectability of misconduct.

This paper argues that a further, qualitatively different transition is now underway, driven by the convergence of three developments: (a) the miniaturisation of earbuds and the embedding of high-definition cameras in objects indistinguishable from everyday items; (b) consumer AI smart glasses with near-invisible Heads-Up Displays (HUDs); and (c) Large Language Models (LLMs) capable of answering university-level examination questions to a passing or higher standard. Together, these produce a cheating pipeline that, for the first time in the history of invigilated examinations, is genuinely difficult to detect under non-invasive countermeasures. Every hardware component described here is commercially available, some for under £20, and can be purchased without restriction from mainstream platforms such as Amazon and eBay.

The closest existing contributions on the topic are by Susnjak and McIntosh (2024) and by Birks and Clare (2023). The former demonstrates LLM proficiency on examination questions but does not address the hardware pipeline through which an LLM might be accessed covertly during an invigilated examination. The latter applies Situational Crime Prevention theory to AI-facilitated misconduct, yet focuses on take-home and online assessment rather than invigilated written examinations. The originality of the present contribution rests primarily on this gap in the peer-reviewed record.

The paper makes three contributions:

To set the present threat in context. It places today's technology within the long history of academic misconduct and documents the specific consumer devices that constitute the current threat landscape.

To advance four arguments, one structural, two empirical, and one normative, as follows:

○ The structural argument is that the closed-book invigilated examination can no longer serve as a reliable warrant of individual competence, because covert AI-assisted devices make it infeasible, under non-invasive invigilation, to distinguish a candidate's own work from machine assistance. Consequently, the warrant fails for the cohort, not only for those who cheat.

○ The first empirical argument is that the use of electronic devices in invigilated examinations is established and increasing at the secondary education level. This argument is made with stated caution about the inferential gaps it involves.

○ The second empirical argument is that the conditions driving that use plausibly extend to higher education, although direct sector-level evidence remains limited.

○ The normative argument is that an institution which certifies competence is responsible for ensuring that its certificates mean what they claim, and that, since the examination can no longer establish competence, this responsibility now requires institutions to reconsider examination design and the meaning of certification.

To turn the normative argument into a proportionate institutional response, working through the questions of certification, fairness, and responsibility that it raises.

The relationship among these arguments is deliberate. The structural argument depends on the empirical ones: the examination loses its power to vouch for competence precisely because covert assistance has become feasible and is no longer reliably separable from independent work. The empirical evidence shows that this is not a distant prospect but a present and worsening reality, since the enabling hardware is already on sale, cheap, and improving. The normative argument rests on the structural one in turn. It joins the structural claim, that the examination can no longer secure what it certifies, to the premise that an institution certifying competence must ensure its certificates are truthful. From these, it follows that institutions ought to act, and the empirical picture fixes that obligation as immediate rather than deferred. The three strands thus do different but connected work: the empirical establishes that the threat is real and present, the structural establishes that it defeats the format's warranting function, and the normative establishes whose duty it is to respond.

To advance these arguments, the paper considers four Research Questions (RQ):

RQ1: What do malpractice statistics show about the scale and trajectory of electronic-device cheating in examinations, and to what extent is the established secondary-level trend a valid leading indicator for higher education?
RQ2: What cheating configurations and specific commercially available devices does current consumer technology enable in invigilated written university examinations?
RQ3: How effective are available institutional countermeasures against these devices, and what are their principal limitations?
RQ4: In light of RQ1-RQ3, can the traditional closed-book invigilated written examination still function as a reliable instrument of individual competence assessment, and what institutional response does the evidence, empirical and ethical, warrant?

These questions are addressed in Section 4, Section 5, Section 6, and Sections 7-8, respectively. The rest of the paper is structured as follows: Section 2 describes the methodology. Section 3 reviews the historical and theoretical literature, and Section 4 presents the empirical evidence on prevalence and trends. Section 5 documents the devices and cheating configurations, and Section 6 assesses countermeasures and their limits. Section 7 develops the ethical and institutional analysis, Section 8 presents recommendations, and Section 9 concludes.

2. Methodology

This paper employs a mixed-methods design integrating three evidence streams: a narrative review, a secondary analysis of malpractice statistics, and a documentary scan of commercially available devices. Each is described below, with its rationale, its analytical approach, and, importantly, the limits on what it can establish.

2.1. Narrative Review

A narrative review was chosen in preference to a systematic one for two reasons. Firstly, the argument needs both a historical perspective spanning six decades and a theoretical synthesis across several disciplines: sociology, psychology, criminology, educational technology, and ethics. Secondly, the phenomenon is new enough that the relevant work is scattered across adjacent fields and uses inconsistent terminology, rather than forming a single, searchable body of literature. A systematic protocol designed to aggregate quantitative findings would risk excluding theoretically important work and would not serve a paper whose aim is to build an argument rather than estimate an effect size (Macfarlane et al., 2014; Snyder, 2019).

To make the basis of the synthesis transparent, a search was carried out across Web of Science, Scopus, Google Scholar, and ERIC between December 2025 and May 2026. The search combined terms covering academic integrity and misconduct, examination malpractice, electronic and wearable devices, large language models, AI-assisted and contract cheating, and detection. No date restriction was applied. The search returned 847 records. After duplicates were removed, the remainder were screened for direct relevance to examination-based assessment and for peer-reviewed or otherwise verifiable provenance, leaving 35 sources for close synthesis. Their reference lists were then hand-searched for further material. This procedure is reported for transparency, not as a claim to systematic-review completeness. The retained corpus was read thematically rather than aggregated, which yielded seven themes: historical persistence; electronic devices as vectors of cheating; AI capabilities as examination threats; conceptual models; detection and the arms-race dynamic; institutional policy responses; and the ethical dimensions of AI in assessment.

2.2. Secondary Analysis of Malpractice Statistics

A secondary analysis of publicly available malpractice statistics was conducted to compile a set of empirical indicators bearing on the use of devices during examinations. Rather than establishing a single baseline, the analysis arranges these indicators along a chain of concern: from a general willingness to subvert assessment, through self-reported dishonesty, to documented device offences during examinations. It draws on three jurisdictions, chosen for the quality and frequency of their data and for the range of measurement approaches they represent.

The documented device offences sit at the end of the chain closest to the phenomenon under investigation. For these, the main source was the reports released by the Office of Qualifications and Examinations Regulation (Ofqual) on annual malpractice statistics for GCSE, AS, and A-Level examinations (2019-2025). In these reports, mobile telephones and other communication devices are treated as a separate category of offence (Ofqual, 2025). University-level context came from a Guardian Freedom of Information (FOI) investigation covering 131 UK institutions for the period 2023-24 (Goodier, 2025). These data come from a media investigation rather than a regulatory return, and more than a quarter of the institutions did not record AI misuse separately. They are, thus, used only as contextual evidence, with that limitation stated explicitly. Further along the chain, the Australian data came from large-scale studies by Bretag et al. (2019), with some results later corrected (Corrigendum, 2020), and by Awdry (2021). These measure self-reported outsourcing rather than device use during examinations and are treated only as evidence of a willingness to subvert assessment. The US data came from the McCabe/ICAI survey programme, which has reached more than 250,000 students and records self-reported academic dishonesty of all kinds, including examination cheating, plagiarism, and unauthorised collaboration.

Because these figures come from three different collection regimes, they are not directly commensurable. Each is therefore treated as an independent indicator, bearing on one link in the chain of concern. They are neither aggregated nor compared as if they were equivalent magnitudes. This principle governs both what each source may establish and the caveats attached to it.

2.3. Documentary Product Scan

A documentary scan of commercially available AI-assisted devices was conducted in May 2026 to identify the specific consumer products that constitute the current threat by directly inspecting mainstream retail platforms. Search terms included ‘camera pen’, ‘AI smart glasses’, ‘HUD glasses’, ‘nano earbud’, ‘inductive earpiece’, and ‘GSM earpiece’. Products were included if they were available without restriction, could plausibly facilitate cheating in an invigilated written examination, and were priced within reach of most students (taken as under £700).

This stream is a documentary scan, not a tested empirical method, and its standing should be understood accordingly. No products were purchased, tested, or operated. Capabilities were taken from manufacturer specifications and, where available, independent consumer reviews. Moreover, claims that certain devices are difficult to detect under non-invasive invigilation are inferences from their stated technical characteristics rather than findings from controlled testing. Detection difficulty is reported on a four-point scale (Low, Moderate, High, Very High), anchored to three factors, namely the device’s physical concealability, its wireless transmission profile, and the countermeasures required to detect it. The rating reflects the author’s judgment rather than a measured value. Finally, product names, prices, and availability are volatile. Many listings are ephemeral and may not be retrievable for verification, making the catalogue a snapshot of a product class rather than a stable inventory.

2.4. Integration of Evidence Streams

The three streams were integrated through convergent synthesis (Creswell & Plano Clark, 2017). Each stream was analysed independently before being brought together, on the logic that a conclusion supported by independent methods, sources, and analytical perspectives is more robust than one resting on any single stream. Table 1 presents the triangulation matrix, showing how each of the paper’s central claims is supported and, equally, where a given stream is silent. The weight any claim can bear is set by its weakest necessary link, not its strongest. This is made explicit in the empirical discussion of Section 4.
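
The weakest-link principle stated above can be illustrated with a short sketch. It is offered only as an illustration: the ordinal scale, the link names, and the ratings are hypothetical placeholders introduced here, not values taken from Table 1. The sketch shows only that the support for a claim is taken as the minimum over its necessary links.

```python
from enum import IntEnum


class Support(IntEnum):
    """Hypothetical ordinal scale for how strongly a link is evidenced."""
    SILENT = 0
    INDIRECT = 1
    DIRECT = 2


def claim_weight(links: dict[str, Support]) -> Support:
    """Return the support level of the weakest necessary link."""
    return min(links.values())


# Placeholder ratings for illustration only; not the paper's assessments.
example_claim = {
    "link_a": Support.DIRECT,
    "link_b": Support.INDIRECT,
    "link_c": Support.DIRECT,
}

# The claim is rated by its weakest link, however strong the others are.
print(claim_weight(example_claim).name)
```

Taking the minimum rather than an average reflects the point made in the text: strong evidence on one link cannot compensate for a weak necessary link elsewhere.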

2.5. Ethical Issues

This research reports on commercial products, legally available on mainstream platforms without restriction, solely to help interested parties understand and address a genuine and growing threat to examination integrity. The author acknowledges that this material has a dual-use character. The same documentation used to raise institutions’ awareness could also serve as a guide for a learner seeking to cheat. This concern was weighed against the benefits of publication, including institutional awareness, evidence-based policy, countermeasure design, and sectoral debate, which the author judges to outweigh the risk for two reasons. Firstly, every product described is already discoverable through an ordinary retail search, so the paper adds analytical context rather than novel access. Secondly, the literature consistently finds that the primary barriers to cheating are motivation and opportunity rather than knowledge of methods (Birks & Clare, 2023; Stone et al., 2010). Consequently, a student motivated to cheat will find these products without this paper. Therefore, the present work is offered in the tradition of responsible disclosure, in which describing a vulnerability serves defence rather than attack. Regardless, and as a more conservative approach, all products mentioned in this paper (e.g. Table 3) are genericised.

3. Literature Review: Historical Foundations and Conceptual Models

3.1. The Historical Persistence of Academic Misconduct

Academic misconduct predates the information age. Harp and Taietz (1966) found that half of all students surveyed reported cheating on term papers, with the rate rising to 70.9% among engineering juniors. They argued that cheating is a form of deviance in which students accept institutionalised goals (academic success) while rejecting institutionalised means (legitimate study). This is a structural explanation that has proved durable. As reported by Christensen Hughes and McCabe (2006a; 2006b), Bowers found in 1964 that three in four students were engaging in at least one questionable academic behaviour, with 39% reporting serious test cheating. A replication by McCabe and Trevino (1993), reported in the same review, found that roughly two in three students (64%) reported serious test cheating. Misconduct has, thus, been a majority or near-majority behaviour for as long as it has been measured, and moral awareness does not reliably translate into honest behaviour. The same review reports that 80% of respondents in the earlier survey agreed they were morally obliged not to cheat. This moral-behavioural gap is among the most consistent findings in the literature (Christensen Hughes & McCabe, 2006a; Christensen Hughes & McCabe, 2006b; Khalid, 2015). International work also confirms the pattern. Khalid (2015) reports a prevalence of roughly 90% in New Zealand, 81% in Australia, and 53% in Canada, while Makarova (2019), studying 2,336 students across four countries, found significant cross-national variation she attributes to the maturity of institutional integrity systems rather than to differences in moral values.

3.2. Electronic Devices as a Primary Cheating Vector

Electronic-device cheating predates the AI era by decades. Khalid (2015) identifies the two most common techniques in supervised assessment as unauthorised cellular devices and prohibited crib notes (41% and 38% of respondents, respectively). The cellular-device behaviours documented in Khalid (2015), i.e. searching, texting answers, photographing papers, and storing answers, are direct precursors to the AI-assisted configurations discussed in this paper. Makarova (2019) found that device use during tests was perceived as the most extensive form of misconduct across all four countries in her study. Burgason et al. (2019) document the prior generation of this problem. A decade ago, students could already describe in detail how to evade a computer-based test, whether by hiding browser windows on screen or by retrieving material from a phone or smartwatch. Lenient attitudes toward such online-exam cheating were already widespread, with many students not regarding it as a serious violation. Makarova (2019) also found that 71% of distance-learning students used notes during an online exam, which they considered trivial or not cheating. This illustrates the definitional gap that AI-assisted devices widen further. The progression from calculators to smartphones to AI-assisted wearables follows a consistent logic in which each generation is smaller, more capable, and harder to detect. The current generation is a step change rather than a linear extension, combining covert capture, wireless transmission, AI-generated answers, and covert delivery into a single pipeline that requires no visible interaction.

3.3. Conceptual Models

The literature offers several conceptual models that explain why students cheat and how they rationalise it, each carrying consequences for the device threat.

Rational choice and perceived behavioural control. The Theory of Planned Behaviour, when applied to misconduct, showed that perceived behavioural control, i.e. the perceived ease of cheating, was the strongest predictor of both intentions and behaviour (Ajzen, 1985; Ajzen, 1991; Stone et al., 2010). The highest-loading item was that cheating on exams would be easy if one wished to do so. Cheap, small, unrestricted devices reduce precisely this apparent difficulty and are predicted to increase both intention and behaviour among students already inclined to cheat. Zhang et al. (2025) extend the analysis to the AI era, finding that the low detectability of LLM-based plagiarism reduces the expected cost of being caught, thereby weakening the deterrent logic on which rational-choice accounts rely. This concern is more pronounced when detection is more uncertain.

Moral disengagement. Zhang et al. (2025) integrate Bandura’s (2002) moral disengagement theory with rational choice to explain 48% of the variance in LLM-plagiarism intention. They found that moral disengagement is a significant predictor and, notably, that informal sanctions, such as shame, embarrassment, and guilt, deter more powerfully than formal penalties. This suggests that the social meaning attached to a practice matters more than institutional sanction alone, a point that becomes central to the normative argument below.

Situational crime prevention. Birks and Clare (2023) provide the first application of Clarke’s (1995) situational crime prevention to AI-facilitated misconduct, establishing that misconduct is non-randomly distributed across assessment types. In particular, it is higher in take-home, untimed, and unsupervised settings than in invigilated, closed-book ones. The invigilated examination has historically served as a situational deterrent, increasing the effort required and the risk of cheating. The devices documented here attack that function directly, collapsing the effort and, in their most advanced forms, the possibility of detection.

Cost-benefit calculation. Bin-Nashwan et al. (2023), surveying 702 students, found that time-saving and perceived stress are the strongest positive drivers of AI adoption in academic contexts, with academic integrity a significant negative predictor. Applied to examinations, the calculation is stark: when the benefit is a passing grade, the cost of a device is as low as a few tens of pounds, and the detection probability is low, the balance favours use for any student with low integrity.
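
The calculation described above can be written as a simple expected-value comparison. The sketch below is a stylised illustration, not a model taken from Bin-Nashwan et al. (2023) or any other cited source. The function name, its parameters, and every numeric value are hypothetical, chosen only to show how a low device cost combined with a low detection probability tips the balance.

```python
def expected_net_gain(benefit: float, device_cost: float,
                      p_detect: float, penalty: float) -> float:
    """Stylised expected payoff of cheating, in arbitrary utility units.

    With probability (1 - p_detect) the student keeps the benefit; with
    probability p_detect the student incurs the penalty instead. The
    device cost is paid either way. All inputs are hypothetical.
    """
    return (1 - p_detect) * benefit - p_detect * penalty - device_cost


# Hypothetical values: benefit, cost, and penalty are held fixed, and only
# the perceived detection probability differs between the two scenarios.
low_risk = expected_net_gain(benefit=100, device_cost=2,
                             p_detect=0.05, penalty=300)
high_risk = expected_net_gain(benefit=100, device_cost=2,
                              p_detect=0.50, penalty=300)

print(low_risk > 0)   # positive expected gain when detection is unlikely
print(high_risk > 0)  # negative expected gain when detection is likely
```

Under these assumed values the sign of the expected gain is driven almost entirely by the detection probability, which is the lever that covert devices are designed to lower.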

3.4. AI Capabilities as an Examination Threat

The specific capabilities of LLMs as examination threats are well evidenced, though they should be stated precisely. Susnjak and McIntosh (2024) evaluated a multimodal model across 600 questions in 12 subjects and found mean proficiency ranging from about 70% to 83%. This is a substantial percentage across every discipline tested, though not uniform across subjects or questions. Independent evidence corroborates both the breadth of this ability and its limits. Sun et al. (2025), evaluating four LLMs on medical and engineering examination content, found that, after structured prompt engineering, the best-performing model achieved 95% accuracy across question types, reaching 97% on short-answer questions. Torrecilla et al. (2025), applying LLMs to 450 examinations from three technical courses (2022-2025), reported that AI grading aligned very closely with expert human marking, indicating that present models engage with technical examination content at a level approaching that of human assessors. However, the capability is not uniform, and its unevenness cuts in this paper’s favour rather than against it. Anandababu et al. (2025), testing four LLMs across Calculus I, Calculus II, Linear Algebra, and Differential Equations, found that the best-performing model produced correct solutions and explanations in only about 60% of cases, confirming that advanced multi-step mathematical reasoning remains a relative weakness. The composite picture of high mastery across most disciplines, yet lower reliability on the most demanding quantitative reasoning, matters for the argument that follows, since the recall-and-procedure formats this paper identifies as most vulnerable fall squarely within the range the models handle best. Urban et al. (2024) provide experimental evidence that AI assistance improves student performance on complex problem-solving tasks, with moderate-to-large effect sizes for quality, elaboration, and originality. 
LLMs have also passed components of professional examinations, including the United States Medical Licensing Examination (USMLE), the Wharton MBA final, and parts of the US licensing examination for lawyers (Sun & Hoelscher, 2023). Currie (2023) catalogues characteristic AI failure modes, namely hallucination, confabulation, and miscalculation, relevant to reliability. However, as Susnjak and McIntosh (2024) note, cheating typically aims to satisfy minimal requirements rather than achieve perfection, and LLMs reliably generate plausible, superficially correct output sufficient to pass. Schiff’s (2021) concept of multi-stability, that technology built for one purpose is inevitably repurposed, applies directly to consumer devices designed for legitimate use and turned to examination fraud.

3.5. Detection Difficulties and the Arms-Race Dynamic

One approach to identifying possible AI-assisted cheating in university examinations is to scan students’ scripts after submission for AI-generated text. However, the inadequacy of current detection is among the most consistent findings in recent literature and is stated here as the basis for a later argument. Birks and Clare (2023) describe an explicit arms race between students using LLMs to evade detection and institutions building detectors. Zhang et al. (2025) tested 16 AI-detection platforms and found only three with high accuracy, with most biased toward classifying AI-generated text as human. Epaphras and Mtenzi (2026) tested three commercially available AI text humanising tools and, for one of them, recorded consistent success in masking AI text origins, with an Average Detection Rate (ADR) of 1.98%. Consequently, the fast evolution of AI tools equipped with humanisation features makes post-submission detection, let alone confirmation, of AI-assisted cheating in written examinations very challenging, if possible at all.

Furthermore, Perkins and Roe (2024), analysing 142 institutional ethics policies, found that, at the time of ChatGPT’s release, only one mentioned AI or automated paraphrasing. This translates to a 99.3% policy gap. Milano et al. (2023) conclude that any attempt to upgrade detection software is likely to fail against fast-evolving LLMs. The structural implication that a detection-first strategy is a losing one recurs throughout the paper and grounds the structural argument of Section 7.

3.6. Institutional Policy Responses and Gaps

The literature documents a consistent gap between the rate of technological change and that of institutional response. Perkins and Roe (2024) found that none of the 142 institutions surveyed had updated their ethical policies within 6 months of ChatGPT’s release. Gulumbe et al. (2025), analysing AI guidelines from 17 major publishers and universities, found universal consensus that AI cannot be an author but wide variation in scope, specificity, and enforcement. Tauginiene et al. (2019) confirmed that existing taxonomies do not yet capture AI-assisted misconduct. In contrast, Benson and Enstroem (2023) provided the strongest evidence in the literature for intervention: mandatory, institution-wide integrity tutorials produced a 40% reduction in misconduct in year one and a 17% reduction in year two, with the largest reductions among early-year students. This is evidence that mandatory, universal programming substantially outperforms voluntary, small-scale efforts, carrying clear implications for Section 8.

4. The Empirical Picture: Prevalence, Trends, and the Limits of the Evidence

4.1. The Methodological Challenge

This section addresses RQ1. The available statistics are uneven in quality and scope. The three jurisdictions that produce the most-cited figures measure fundamentally different things, and reading them as a single comparable series would be an error. Their value lies in what each can legitimately establish about one link in the chain of concern, not in head-to-head comparison.

4.2. United Kingdom: The Strongest Direct Evidence and Its Limits

The United Kingdom provides the only large-scale, regulator-collected data on the use of electronic devices in invigilated rooms during written examinations. The Office of Qualifications and Examinations Regulation (Ofqual) records mobile and communication devices as a distinct offence category for GCSE, AS, and A-Level examinations (Ofqual, 2025). These are proven, penalised cases that capture only detected and substantiated incidents. The trend is consistent and upward: penalties for bringing a device into the examination room rose from 1,385 in 2019 to 1,845 in 2022, and instances of students found with such devices roughly doubled between 2018 and 2023, reaching 2,180. In summer 2025, phone and smart-device offences accounted for 2,225 cases (i.e. 44.3% of all student malpractice, up from 41.5% the previous year), within a total of 5,025 cases affecting approximately 0.3% of the cohort (Ofqual, 2025). Device offences have been the single most common category in every summer series since 2018.
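
The proportions quoted above can be re-derived from the reported counts. The following sketch is illustrative only: it uses the figures cited in this section, and the variable and function names are assumptions introduced here rather than part of the Ofqual release. Growth is computed only within the 2019-2022 penalties series, because the text reports the other counts on a different basis.

```python
# Counts as quoted in this section (Ofqual, 2025); names are illustrative.
device_penalties_2019 = 1385
device_penalties_2022 = 1845
device_cases_2025 = 2225
all_student_cases_2025 = 5025


def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Share of summer 2025 student malpractice cases that involved devices.
device_share_2025 = pct(device_cases_2025, all_student_cases_2025)

# Relative growth in device penalties between 2019 and 2022.
growth_2019_2022 = pct(device_penalties_2022 - device_penalties_2019,
                       device_penalties_2019)

print(f"Device share of 2025 cases: {device_share_2025}%")
print(f"Growth in penalties, 2019 to 2022: {growth_2019_2022}%")
```

The first result reproduces the 44.3% share stated in the text. The second, about 33%, is a derived figure that the section does not state directly.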

Two limitations are decisive for how far these figures can be pressed. Firstly, they are counts of detected offences involving, on the evidence, overwhelmingly conventional smartphones rather than the purpose-built, covert hardware documented in this paper. As such, they show that device cheating is established and rising, but not that covert AI-device cheating specifically is. Secondly, the data are from secondary-level qualifications, and there is no equivalent national regulator that collects standardised in-examination device statistics for UK higher education. Therefore, the paper’s argument depends on a double extrapolation, i.e. from secondary to higher education, and from smartphones to covert AI devices. This extrapolation should be understood as a structured plausibility argument rather than as direct evidence.

Four considerations support it, and none of them is dispositive. Firstly, the conceptual models in Section 3.3 are not level-specific. They predict cheating whenever the perceived benefit exceeds the perceived cost and risk, regardless of the student's age. Secondly, the motivational drivers identified in the literature, i.e. grade pressure, time constraints, peer norms, and low perceived detection risk, are present in higher education at least as strongly, and university examinations carry higher stakes (Christensen Hughes & McCabe, 2006a; Christensen Hughes & McCabe, 2006b; Patrzek et al., 2015). Thirdly, self-report data from Australia, as well as the long-running McCabe/ICAI survey programme in the US, confirm that a non-trivial minority of university students will subvert assessment when the opportunity arises (Bretag et al., 2019; Corrigendum, 2020). The open question is not whether such students exist but whether the enabling devices are available to them, and Section 5 confirms that they are. Fourthly, the absence of reliable university-level detection data is itself consistent with, though it does not prove, a real but largely undetected phenomenon, for the reasons given in Section 4.4. Taken together, these considerations make the extrapolation reasonable, though not certain.

4.3. Australia and the United States: Associated but Different Measures

The Australian evidence measures the related yet distinct phenomenon of self-reported contract cheating, in which students outsource assessed work, and converges on a prevalence of roughly 6%-8% (Bretag et al., 2019; Awdry, 2021). These figures provide evidence of willingness to subvert assessment but describe outsourcing rather than in-examination device use and rest on self-report. The most-cited US data, from the McCabe/ICAI surveys of more than 250,000 students, consistently find that most undergraduates, typically 60% to 70%, admit some form of academic dishonesty, with a lower figure around 43% at the postgraduate level. This is the broadest measure, capturing any self-reported breach over the survey window and conflating examination cheating with plagiarism and group work. Table 2 presents these sources together while preserving their distinctions.

The Guardian Freedom-of-Information (FOI) investigation of 131 UK institutions for 2023-24 warrants separate treatment because it speaks most directly to higher education despite being the weakest of the sources methodologically (Goodier, 2025). The roughly 7,000 recorded AI-misuse cases overwhelmingly reflect coursework rather than invigilated examinations, and more than a quarter of institutions did not record AI misuse as a distinct category. Consequently, the true figure is almost certainly higher than the recorded one, but it cannot be taken as a measure of in-examination device use. It is treated here not as a prevalence estimate but as evidence that AI misuse in higher education has already moved from the hypothetical to the administratively recorded, and as an early sign of the trajectory predicted by the frameworks in Section 3.3. Students adopt a transgressive practice once its estimated gain exceeds its estimated cost and detection risk, and the moral friction attaching to it diminishes as it becomes normalised (Stone et al., 2010; Zhang et al., 2025). Both conditions are now moving together: as LLMs have become capable and routine, extensive AI use has become a normalised feature of student work. The next displacement, i.e. from coursework, where AI use is already normalised, to the examination hall, where the devices of Section 5 make such use feasible and hard to detect, is best read as a predictable extension of that trajectory rather than a speculative leap.

4.4. Why the Gaps Are Informative

This patchwork could be read as a reason for inaction: since a clear prevalence figure for the use of covert devices in university examinations is absent, the concern might seem premature. This paper takes the opposite view, while accepting the limits of the inference. The one dataset that directly measures in-examination device use shows that the behaviour is established and rising where it is counted. There is no principled reason to expect higher education, which counts it far less well, to be exempt. Because covert devices are, by design, hard to detect under non-invasive invigilation, low university-level detection counts cannot be taken as evidence of low use. This should not be inflated: an absence of detection is equally consistent with a genuinely low prevalence, and the data cannot by themselves distinguish the two. What can be said is that the evidentiary gap is precisely what one would observe if the threat were real and largely undetected, and that this possibility is sufficient, given the stakes, to warrant a proportionate response rather than to await a clean figure, which the nature of the threat may prevent from ever being collected.
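The point that detection counts alone cannot separate low prevalence from low detectability can be illustrated with a short arithmetic sketch. The function `expected_detections` and every figure below are hypothetical and chosen only to show the ambiguity. None is drawn from the Ofqual data or from any other source used in this paper.

```python
def expected_detections(candidates: int, prevalence: float, detection_prob: float) -> float:
    """Expected number of recorded cases under a simple multiplicative model.

    Assumption: each candidate cheats with probability `prevalence`, and each
    instance of cheating is detected with probability `detection_prob`.
    """
    return candidates * prevalence * detection_prob


# Two hypothetical worlds for the same cohort of 10,000 candidates.
rare_but_caught = expected_detections(10_000, prevalence=0.001, detection_prob=0.5)
common_but_missed = expected_detections(10_000, prevalence=0.05, detection_prob=0.01)

print(f"{rare_but_caught:.1f}")    # 5.0
print(f"{common_but_missed:.1f}")  # 5.0
```

Both scenarios yield the same expected count, although prevalence differs by a factor of fifty. Only the product of prevalence and detection probability is observable in the count, so neither factor can be recovered from the count alone.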

4.5. Engaging the Counterargument

The argument must be tested against its strongest objection: that the base rate of sophisticated AI-assisted device use in university examinations is very low, that most students who cheat apply simpler means, and that the financial, administrative, relational, and ethical costs of the recommended countermeasures may outweigh the benefit of addressing a threat affecting few determined students.

The base-rate objection is plausible but insufficient as a reason for inaction, for three reasons. Firstly, as argued above, the base rate is unknown rather than known to be low. Secondly, even a low base rate has disproportionate consequences for validity. If a small number of students receive grades that do not reflect their independent competence, then the cohort’s classification is compromised. Thirdly, trajectory matters more than current magnitude: the smart glasses described here were not commercially available three years ago, and the next generation of devices will arrive within 12 to 18 months. Waiting for the base rate to become demonstrably high is equivalent to waiting for a breach to become widespread before patching.

The cost-benefit objection is more substantive, and the paper concedes much of it. It does not argue for simultaneous implementation of all countermeasures, nor for the most invasive measures, which Section 6 identifies as disproportionate and does not recommend. Its narrower claim is that the lowest-cost, highest-value measures are justified, even if the base rate is low, because their cost is negligible, their benefit is certain, and, as Section 7 argues, they point to a greater problem that is significantly more difficult to resolve with other countermeasures.

In response to RQ1, the Ofqual data show that the use of electronic devices in invigilated examinations is established and increasing at the secondary level in England, and that it has been the single most common category of student malpractice in every summer series since 2018. The available evidence cannot establish whether this is also a valid leading indicator for higher education. However, the four considerations set out in Section 4.2 make the extrapolation reasonable and sufficient to warrant a proportionate response.

5. The Current Threat Landscape: Devices and Cheating Configurations

5.1. The Technological Convergence

This section addresses RQ2. The threat is not theoretical: every component mentioned in this section was identified through the documentary scan of Section 2.3. High-definition cameras now exist in formats indistinguishable from everyday objects, such as pens, shirt buttons, and glasses frames, all sold openly at student-accessible prices. For example, six spy-camera pens were identified here, all recording at 1080p, with the cheapest priced at £16. Consumer AI smart glasses (roughly £470–£640) display text within the wearer’s field of vision while remaining near-invisible to observers. Some embed live LLM access and a HUD that surfaces AI prompts without specific interaction, while others add a 12-megapixel camera. Combined with LLMs capable of answering university-level questions to passing-or-higher standard within seconds (Section 3.4), these create a capture–transmission–AI–delivery pipeline qualitatively different from anything previously available.

5.2. Cheating Configurations

The four configurations below represent the principal ways current consumer technology can be used to cheat during an invigilated examination, in order of increasing sophistication and decreasing detectability.

Configuration 1: pen camera with phone relay and audio return. A spy-camera pen on the desk captures the paper and relays it via Bluetooth to a phone concealed in the student’s bag. The phone streams over 4G/5G to an accomplice, human or AI, who returns answers through a nano earpiece that sits inside the student’s ear canal, close to the eardrum, and remains hidden from view. The student never touches or looks at the phone, so a visual sweep showing no phone in use gives false reassurance. A fully automated variant removes the need for an accomplice: the phone runs Optical Character Recognition (OCR) on the feed, queries an LLM, and returns the answer within seconds.

Configuration 2: button camera with GSM module (no phone required). A camera the size of a shirt button is sewn into clothing. A matchbox-sized GSM module (i.e. a device that enables electronic systems to communicate over the Global System for Mobile Communications network) with its own SIM (i.e. Subscriber Identity Module) is concealed in the lining. A paper-thin inductive coil worn at the neck transmits to a nano-earbud in the ear canal. The accomplice calls the module, which auto-answers, and then views the feed. Because there is no phone in this loop, asking students to leave their mobile phones at visible, out-of-reach locations is ineffective. The only signature is the module’s cellular transmission, which a roving Radio Frequency (RF) detector registers as a cellular signal with no visible phone.

Configuration 3: AI smart glasses (no accomplice required). The student pre-loads notes and model answers onto the glasses’ HUD before the examination and reads them by natural upward glances during it. The HUD is visible only to the wearer. A more sophisticated variant queries an LLM in real time. The undetectability of HUD reading is illustrated, though not established, by a case widely discussed online: Palmer Luckey delivered a complete TED Talk (TED2025, 8 April, Vancouver) read entirely from smart glasses, defeating TED's teleprompter ban without any observer noticing. The setting of that case, i.e. a public talk without RF screening or trained invigilators, differs significantly from an examination hall, and the anecdote should be read as suggestive rather than probative. However, Corbin et al. (2026) argue that the use of AI smart glasses compromises invigilated exams and interactive orals, which many higher education providers still consider a secure way to physically exclude AI from assessment.

Configuration 4: AI smart glasses combined with a button camera. A button camera films the paper and transmits the footage to an accomplice or an AI pipeline. Answers return as text on the glasses’ HUD, navigated by sub-millimetre taps on a companion smart ring. This is the most complete system currently available: capture, transmission, processing, and display operate simultaneously via devices that collectively resemble a shirt, a pair of glasses, and a ring. It is emphasised that no effective non-invasive countermeasure currently exists to detect the pre-loaded offline version of this configuration, as it transmits no detectable wireless signal.

5.3. Commercially Available Products

Table 3 summarises the products identified, with prices listed on platforms in May 2026. As noted in Section 2.3, these are snapshot listings: prices and availability change quickly, and the detection-difficulty ratings are the author’s indicative estimates rather than tested values. The minimum cost of a working system, i.e. a spy-camera pen combined with a phone relay, is under £100. The most sophisticated autonomous system requires no accomplice, no phone in the room, and no visible interaction.

In response to RQ2, current consumer technology enables four cheating configurations in invigilated written examinations, ranging from a spy-camera pen relaying questions to an external AI system (Configuration 1) to a fully integrated button-camera, smart-glasses, and companion-ring system requiring no phone in the room and no visible interaction (Configuration 4). The enabling devices are commercially available without restriction, priced at £16 to £640. Renting such devices seems to be a less costly and more feasible option, starting at about £5 per day (Zhou, 2026). Critically, the most dangerous configuration, i.e. AI smart glasses running offline on pre-loaded content, emits no detectable wireless signal and has no effective non-invasive countermeasure. This finding directly motivates the structural argument of Section 7.

6. Detection Difficulties and the Inadequacy of Current Countermeasures

6.1. The Limits of Software Detection

This section addresses RQ3. As established in Section 3.5, the inadequacy of detection for AI-assisted misconduct is among the most consistent findings in the literature. Detection software is unreliable and unlikely to keep pace with evolving LLMs (Milano et al., 2023). Even multimodal questions, once considered the last defence, are now compromised, leading Susnjak and McIntosh (2024) to conclude that unproctored examinations can no longer be regarded as valid. The discussion here shifts from software-based detection of AI text to physical countermeasures against the devices described in Section 5.

6.2. Physical Countermeasures and Their Limits

No single countermeasure is sufficient against a sophisticated, determined student using the full range of available devices. The realistic objective is to raise the cost and detectability of cheating enough that the risk outweighs the expected benefit to the average student.

Provision of examination stationery eliminates the entire spy-camera-pen category at essentially zero cost and with no disruption: the six spy-camera pens identified are rendered inert if personal pens are barred and institutional pens supplied. This is the single highest-value, lowest-cost measure available, though it addresses only that category.

Radio-Frequency (RF) signal detection is the primary active countermeasure. An RF detector identifies active wireless transmissions, such as Bluetooth, Wi-Fi, 4G, 5G, and GSM, regardless of the device generating them. Nevertheless, it detects the transmission, not the hardware. Suitable devices range from inexpensive supplementary tools to professional units (roughly £300–£500), with frequency ranges and Bluetooth sensitivities appropriate to high-stakes settings. The major limitation is that smart glasses with preloaded notes operate autonomously without a live connection and, thus, emit no detectable signal. This is the single most dangerous configuration currently available, and it is the empirical fact that most directly motivates the structural argument of Section 7. Table 4 summarises the effectiveness of the countermeasures against each configuration.

Measures which are not recommended, because they are disproportionate, legally problematic, discriminatory, or ineffective, include mandatory glasses-prescription verification, compelled contact-lens wear, physical wand sweeps in standard university settings, signal jamming (illegal in the UK without authorisation), oral follow-up after submission (defeated by live AI glasses), and AI-detection software used as disciplinary evidence (unreliable, with high false-positive rates, and legally inadvisable). It is significant in itself that several of the few fully effective measures are precisely those that are disproportionate or unlawful. This significance is set out in Section 7.

In response to RQ3, no single countermeasure is sufficient to cover the full range of available AI-assisted devices. Provision of examination stationery is fully effective against the spy-camera-pen category at negligible cost. RF detection is effective against configurations that require live transmission but useless against offline-preloaded devices. The most technically effective measures, such as wand sweeps, prescription verification, and signal jamming, are disproportionate, discriminatory, or unlawful in a standard university setting. Therefore, the most dangerous configuration has no effective non-invasive countermeasure. This is not simply a practical limitation but the empirical fact that most directly motivates the structural argument of Section 7: a detection-first response is both ethically costly and structurally inadequate against a threat which the current assessment format invites.
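The coverage logic summarised above can be sketched as a small lookup. The sketch is reconstructed from the prose of Sections 5.2 and 6.2 only and is not a reproduction of Table 4. The class `Configuration`, the function `covering_measures`, and the two boolean attributes are simplifying assumptions introduced for illustration, and "covers" means detectable or preventable in principle rather than reliably in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Configuration:
    name: str
    uses_personal_pen: bool   # relies on a pen the student brings in
    live_transmission: bool   # emits a wireless signal during the examination


# Attribute values follow the prose descriptions; they are a simplification.
CONFIGURATIONS = [
    Configuration("1: pen camera + phone relay", True, True),
    Configuration("2: button camera + GSM module", False, True),
    Configuration("3: smart glasses, pre-loaded offline", False, False),
    Configuration("3: smart glasses, live LLM", False, True),
    Configuration("4: smart glasses + button camera", False, True),
]


def covering_measures(config: Configuration) -> list[str]:
    """Return the non-invasive measures that address a configuration in principle."""
    measures = []
    if config.uses_personal_pen:
        measures.append("institutional stationery")
    if config.live_transmission:
        measures.append("RF detection")
    return measures


for config in CONFIGURATIONS:
    print(config.name, "->", covering_measures(config) or "no non-invasive measure")

# Configurations left without any non-invasive countermeasure in this sketch.
uncovered = [c.name for c in CONFIGURATIONS if not covering_measures(c)]
print(uncovered)
```

Under these assumptions the only uncovered entry is the offline, pre-loaded smart-glasses case, which matches the conclusion drawn in the text. The pre-loaded offline variant of Configuration 4 noted in Section 5.2 would fall into the same category.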

7. Ethical and Institutional Dimensions

7.1. Two Distinguished Arguments

This section, with Section 8, addresses RQ4. The paper articulates two related but analytically distinct arguments that must be separated before ethics can properly be addressed.

The first is structural. Commercially available AI-assisted devices are not merely a new way for individuals to cheat. They remove the condition on which a closed-book invigilated examination's evidential value depends. That value rests on the single assumption that the work produced in the room is the candidate's own. The devices documented in Section 5 and Section 6 break that assumption for the format, not only for the individual who exploits it. Once an examination cannot, under feasible non-invasive conditions, distinguish a candidate's own work from machine assistance, it can no longer warrant individual competence for the cohort as a whole. This argument is empirical in its premises, resting on the convergent evidence of the preceding sections. However, its force is structural: it concerns what the format can still certify, not how many students misuse it.

The second is normative. An institution that issues a qualification warrants the holder's competence to parties who cannot verify it for themselves, and is responsible for ensuring that the warrant is truthful. This responsibility requires the institution to act by reconsidering how it assesses and what it certifies, rather than by defending a format whose evidential basis has gone. This is a claim about what institutions owe, and it requires a different justification from the structural claim it rests on.

A third consideration is often conflated with the two aforementioned arguments but is weaker, and the paper keeps it separate. Because an LLM can now answer recall-and-procedure questions to a passing standard, it is sometimes argued that certifying a human's ability to answer them is worth less even when assistance is reliably prevented. As Section 7.3 sets out, this concern does not bear on validity, as an examination secure against cheating still certifies whether a student has mastered what it tests. It raises, though, the separate question of whether those skills remain worth assessing, which concerns curricular priority rather than exam validity. This third consideration is registered here only so that it is not mistaken for the structural argument, which stands whatever one concludes about it.

The first two arguments are each necessary, and neither suffices alone: the structural establishes that the format's warranting capacity has failed, the normative establishes whose duty it is to respond. The remainder of this section develops the normative argument and the ethical questions that follow from both.

7.2. What Constitutes Misconduct in the AI Era?

The literature has long recognised that students’ definitions of misconduct diverge from institutional ones (Burgason et al., 2019; Chan, 2025; Fyfe, 2023; Perkins, 2023), and this difference is widening as norms of AI use form faster than institutions can codify them. Chan (2025) introduces the term ‘AI-giarism’ and finds that students show significantly stronger awareness of traditional plagiarism than of unacknowledged AI use. Emerging work on student perceptions similarly finds that acceptable-use boundaries remain in flux and are frequently at odds with institutional positions (Lund et al., 2025). Tauginiene et al. (2019) confirm that existing taxonomies do not yet capture AI-assisted misconduct.

The devices in Section 5 heighten the difficulty because they do not fit existing categories. A student wearing smart glasses pre-loaded with course notes is not using a ‘mobile phone’ or a ‘communication device’ in the conventional sense. A student receiving AI-generated answers through an inductive earbud is not ‘copying from another student’. Ethics policies that, as Perkins and Roe (2024) found, focus overwhelmingly on plagiarism, with no mention of AI tools, are not simply out of date but fail to name the conduct at all, leaving students genuinely uncertain where the line falls and institutions unable to enforce it. This is an ethical problem before it is an administrative one: fair sanction presupposes a clearly stated and intelligible rule.

7.3. The Question of Certification

The most consequential ethical question raised by AI-assisted examination cheating is one that universities have been slow to confront: in an era when an AI can answer a factual examination question more quickly and at least as accurately as a competent student, what can a pass still certify when testing factual and procedural knowledge under examination conditions?

A degree is, in essence, a warrant. It is a statement by an institution to parties, such as employers, professional bodies, colleagues, patients, and the public, who cannot themselves verify the graduate’s competence, that the holder possesses defined capabilities to a defined standard. Its entire social value lies in that warranting function: it allows strangers to rely on a competence they have not witnessed. The ethical force of academic integrity, as the ICAI (2021) frames it, derives from this. Integrity is owed primarily to the relying parties whose trust the warrant invites, not to the institution as a matter of rule-following. A qualification that no longer tracks the competence it certifies not only breaks an institutional rule but also misleads everyone downstream who acts on it.

Two distinct failures follow, and they should not be confused. The first is the wrongdoing of the individual who cheats: a student who passes by relaying questions to an LLM obtains a warrant they have not earned, to the detriment of honest peers and of those who later rely on it. The second, which is the proper subject of this paper and the more serious, is structural and does not depend on any single student's dishonesty. A closed-book invigilated examination derives its evidential value from the assumption that the work produced in the room is the candidate's own. The devices documented in Section 5 and Section 6 challenge that assumption, not just for the individual cheat, but for the exam format. Once an examination cannot, under feasible non-invasive conditions, distinguish AI-assisted from independent performance, the pass it confers loses its evidential meaning for the whole cohort. The honest student is harmed not by the cheat beside them but by a format that can no longer certify what their honesty has earned. This is a problem of separability, hence, at root, a problem of detection.

A further concern is that, because an LLM can now answer recall-and-procedure questions to a passing standard, certifying a human's ability to answer them is worth less even when cheating is reliably prevented. This does not follow in general. The calculator has outperformed every student at arithmetic for half a century without making arithmetic competence pointless to assess. A foundational skill's value to the learner, as scaffolding for later reasoning, and for supervising and error-checking the very tools that automate it, does not depend on whether a machine can also perform it. Democratised AI tutoring points in the same direction: by extending to all students a support once confined to those who could afford private tutoring, it makes a well-controlled recall-and-application examination a cleaner signal of who has internalised the material. A narrower version of the concern does survive: for a skill whose only value was ever the ability to perform the operation itself, i.e. one that builds nothing further, a pass tells employers less than it once did, because they can now have AI perform that operation for them. However, that is a concern about what is worth certifying, not about whether the examination still certifies it validly.

Within safety-critical disciplines, this ceases to be an academic concern. Where a qualification in medicine, engineering, or law warrants competence on which lives or liberty depend, a qualification that may certify a graduate’s facility with a covert device rather than their independent judgement is not simply an integrity matter but a public-safety one. The institutional obligation here is correspondingly stronger: if it can be shown, as the preceding sections argue, that current assessment cannot accurately distinguish assisted from genuine performance, then institutions owe it to their graduates, to those who employ and depend on them, and to the public, to revisit the assessment rather than to defend it. The duty is not principally to catch cheats but to ensure that the warrant means what it claims.

This reframes the institutional task. A detection-first response treats the problem as one of policing individual transgression. However, if the analysis above is correct, the deeper failure is not that some students pass the examination dishonestly, but that the examination, in its current form, has lost the capacity to certify what it exists to certify. No RF detector addresses that. The response that ethics demands is not better surveillance but an assessment rebuilt around what can still be reliably warranted: capabilities that are exercised, observed, and difficult to outsource in real time.

7.4. The Ethics of Surveillance as a Response

Every countermeasure will, in time, be circumvented, and consumer AI hardware is improving faster than institutional policy can adapt (Biagioli et al., 2019; Birks & Clare, 2023). If detection is in any case a losing strategy, an institution that responds chiefly by intensifying surveillance incurs ethical costs with diminishing returns. Examinations are already among the more carceral experiences universities impose. Layering RF sweeps, device inspections, and frame-profile checks on top of them treats every candidate as a suspect and erodes the relationship of trust on which integrity depends. There is a self-defeating quality to addressing a crisis of trust with measures that presuppose its absence. The moral-disengagement evidence (Zhang et al., 2025) points the same way: if informal social sanctions deter more effectively than formal penalties, a culture in which integrity is held as a shared value does work which a detector cannot, and a heavily surveillant regime may weaken the former in pursuit of the latter. None of this argues against proportionate, low-cost measures, such as barring personal pens, naming devices in policy, and holding a modest detection capability in reserve for high-stakes settings. It does, however, argue against treating surveillance as the primary response and in favour of measures that rebuild the warrant rather than police it.

7.5. Equity and the Fairness of Purchasable Advantage

AI-assisted cheating is not equally available to all students, as the more capable configurations carry a real cost. For example, smart glasses with HUD and a companion ring run to several hundred pounds. This raises a fairness concern distinct from integrity as such. A grade or a degree classification is a positional good: its value depends on its relation to others', and it functions as a signal of relative merit in competition for further study and employment. When a positional good of this kind can be purchased, the assessment not only admits some cheating but also practically converts an instrument of meritocratic signalling into one that partly tracks ability to pay.

This concern has a second, less visible dimension. The students most likely to be driven toward cheating by financial stress, time pressure, and inadequate academic support, i.e. the conditions identified in Section 7.6 as the institutional drivers of misconduct, are simultaneously the students least able to afford the most sophisticated devices. The result is a compounding disadvantage: those under the greatest pressure to cheat have access to the least capable means of doing so, while wealthier peers who cheat face lower pressure and higher capability. An institutional response focused on detecting and punishing cheating will, in this environment, fall disproportionately upon students who are already disadvantaged.

A further tension deserves attention. Restricting AI tools in the name of integrity may itself carry equity costs. Milano et al. (2023) and Reiss (2021) note that foreign-language students and educationally disadvantaged students regularly rely on AI assistance for legitimate writing support. Blanket restriction penalises legitimate use alongside misconduct and may widen rather than narrow current educational inequalities. The boundary between legitimate AI assistance and AI-assisted cheating is contested, contextually variable, and, as Chan (2025) documents, poorly understood by students. An institutional response that draws that boundary in ways that are opaque or inconsistently enforced will compound the equity problem rather than address it. Fair management of students calls not only for the rule to be stated clearly, as Section 7.2 highlights, but also for it to be applied in ways that do not systematically disadvantage those who already face the greatest barriers. This equity concern directly connects to the assessment structure and institutional support conditions identified in Section 7.6 as the deeper drivers of the problem.

7.6. The Conditions That Drive Cheating, and Institutional Responsibility

It would be incomplete to discuss the ethics of AI cheating without considering the conditions under which students contemplate it. Academic procrastination predicts every measured form of misconduct, including the use of forbidden means in examinations (Patrzek et al., 2015). Perceived stress is a significant driver of AI adoption (Bin-Nashwan et al., 2023). High-stakes single-sitting examinations, intense grade competition, family and funding pressure, and thin support structures form environments in which some students feel driven to cheat. This does not excuse academic fraud. The student who cheats is responsible, yet responsibility is not borne solely by that student. An institution that constructs high-stakes, easily defeated assessments, declines to state intelligible rules about AI use, and then meets the predictable result with surveillance and sanctions alone, bears its own share of the outcome. A purely punitive response tackles symptoms while leaving the conditions and the format, whose vulnerability is this paper's subject, untouched. The ethically serious response attends to both: students' conduct and the design choices of institutions that make that conduct attractive and easy.

8. Recommendations

The following recommendations, structured by time horizon and level of responsibility, follow from both arguments of this paper. They are ordered by the two principles that emerged in Section 6 and Section 7:

prefer proportionate, low-cost measures that protect integrity without converting examinations to surveillance operations; and
treat assessment redesign, not detection, as the substantive response.

8.1. Immediate Recommendations (Module and Programme Level)

Four recommendations for immediate application at module and programme level are to:

Provide examination stationery and bar personal pens from the desks. The cost is negligible and the effect high, as it eliminates the entire spy-camera-pen category. Its specific function as a countermeasure against camera pens does not appear to have been identified in the prior literature, though the provision of materials is itself routine.
Procure RF detection capable of registering Bluetooth and cellular signals (a frequency range of roughly 50 MHz to 8 GHz), with at least one unit per venue. This approach should primarily be reserved for high-stakes examinations, and invigilators should be trained accordingly. RF detection as a countermeasure against AI-assisted devices in university examinations has not been addressed in the literature. Critically, establish in advance a clear protocol for what an invigilator does when an anomalous signal is detected, i.e. what to record, what immediate action to take, and how the matter passes to a disciplinary process, so that detection does not produce arbitrary or unfair treatment at the point of invigilation.
Update the academic integrity policy to name AI wearable devices, smart glasses, HUD displays, AI-capable audio glasses, and GSM earpiece systems explicitly as prohibited, and require students to acknowledge the policy before entry. Perkins and Roe’s (2024) principle of technological explicitness provides the rationale, while the specific device categories are drawn from this paper’s scan. As Section 7.2 argued, naming the conduct is an ethical precondition of fairly sanctioning it.
Introduce question formats that reward physical reasoning, awareness of model limitations, and self-reflection. These elements are substantially harder for AI to generate convincingly than factual or procedural answers (consistent with Miles et al. (2022) and Susnjak & McIntosh (2024)). Questions requiring engagement with a student’s own prior work or genuinely unanticipated scenarios are markedly more resistant to AI assistance than standardised recall.
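As a rough sanity check on the detector range quoted in the second recommendation, the sketch below tests whether some common consumer radio bands fall inside a 50 MHz to 8 GHz window. The band edges are illustrative assumptions introduced here (the 2.4 GHz ISM band used by Bluetooth, and the GSM 900 and GSM 1800 bands); they are not taken from this paper's product scan, and the names `DETECTOR_RANGE_MHZ`, `BANDS_MHZ`, and `is_covered` are hypothetical.

```python
# Detector range quoted in the text: roughly 50 MHz to 8 GHz.
DETECTOR_RANGE_MHZ = (50.0, 8000.0)

# Illustrative bands (assumptions for this sketch, not data from the paper).
BANDS_MHZ = {
    "Bluetooth (2.4 GHz ISM)": (2400.0, 2483.5),
    "GSM 900": (880.0, 960.0),
    "GSM 1800": (1710.0, 1880.0),
}


def is_covered(band, detector=DETECTOR_RANGE_MHZ):
    """Return True if the whole band lies inside the detector's range."""
    low, high = band
    return detector[0] <= low and high <= detector[1]


coverage = {name: is_covered(band) for name, band in BANDS_MHZ.items()}
for name, ok in coverage.items():
    print(f"{name}: {'within' if ok else 'outside'} detector range")
```

Falling inside the frequency range is a necessary condition only. It says nothing about sensitivity, and a device that is not transmitting during the examination produces no signal to register, which is the limitation Table 4 records for pre-loaded configurations.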

8.2. Medium-Term Recommendations (Faculty Level)

Three recommendations for medium-term application at faculty level are to:

Develop continuous assessment portfolios with version history as a considerable proportion of module grades. Sustained falsification across a semester is far harder than cheating in a single sitting, and this measure is consistent with Birks & Clare (2023), as well as Miles et al. (2022). The evidence that mandatory, institution-wide interventions outperform voluntary, small-scale ones (Benson & Enstroem, 2023) implies portfolios should be adopted at the programme level rather than left to individual modules.
Increase the proportion of assessments carried out through practical, laboratory, simulation, or demonstration activities that cannot be outsourced to a wearable device. Such a measure is consistent with Miles et al. (2022) and Perkins et al. (2020).
Implement mandatory, institution-wide academic integrity education that explicitly addresses the use of AI-assisted devices. The 40% first-year reduction following mandatory tutorials (Benson & Enstroem, 2023) is the strongest available evidence for the effectiveness of an intervention. The new element is the explicit inclusion of device scenarios, product categories, and countermeasures, and the framing of integrity as a shared value rather than only a policed prohibition, as discussed in Section 7.4.

8.3. Longer-Term Recommendations (Institutional and Sectoral Level)

Four recommendations for longer-term application at institutional and sectoral level are to:

Commission a structured review of assessment formats across all programmes, with the explicit objective of identifying and redesigning any assessment that a student could pass through AI assistance without genuine learning. This measure is consistent with Susnjak & McIntosh (2024) and Milano et al. (2023). This paper’s specific contribution to that review is the identification of closed-book examinations that test recall or require structured problem-solving as the formats most vulnerable to AI-assisted device use.
Engage accreditation and professional bodies on what competence assurance means when AI can pass examinations and covert devices can be used in invigilated ones. This is a sectoral question that requires resolution beyond the individual institution, since it is not only an academic issue when safety-critical disciplines are involved.
Collaborate with peer institutions, through the Quality Assurance Agency and sector bodies, to develop shared approaches. Gulumbe et al. (2025) propose an international body for AI ethics in academia, indicating the scale of the governance challenge.
Develop AI literacy curricula that rigorously engage with AI capabilities, limits, and ethics. Students who have a sincere commitment to integrity and understand what AI can and cannot do are more resistant to misuse than those who have received only punitive warnings. This measure is consistent with Kurtz et al. (2024), Yusuf et al. (2024), and Chan (2025).

9. Conclusion

This paper has focused on the threat to academic integrity posed by students who may use contemporary technology to cheat undetectably in traditional written university examinations. The case that this threat is real rests on three elements: first, the convergence of miniaturised cameras, AI smart glasses, and capable LLMs, which has produced a cheating pipeline that, in its most advanced form, is practically undetectable under non-invasive invigilation; second, the apparently established and rising use of electronic devices in invigilated examinations in secondary education, together with indirect evidence of such malpractice in tertiary education; and third, the plausibility that the conditions driving such malpractice in secondary education extend to higher education as well. If used, this pipeline can generate output that, under feasible conditions, cannot safely be determined to be the student's own work rather than the product of machine assistance. At root, this is a separability problem, which can jeopardise the credibility of closed-book examinations and, hence, of university certificates. Escalating detection cannot solve the problem. Providing examination stationery can remove the spy-camera-pen category, but it addresses only one part of the pipeline. What a university degree certifies in the age of ambient AI cannot be deferred to detection technology. What is required is a holistic reconsideration of examination design and certification as a matter of institutional responsibility.

References

Ajzen, I. (1985). From intentions to actions: A theory of planned behavior. In J. Kuhl & J. Beckmann (Eds.), Action Control: From Cognition to Behavior (pp. 11–39). Springer Berlin Heidelberg. [CrossRef]
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211. [CrossRef]
Anandababu, T., Thind, R., Gupta, A., Korada, G., & Kramarczuk, K. (2025). Evaluating large language models in undergraduate mathematics: Balancing potentials and pitfalls. ACM Journal on Responsible Computing, 2(3), 1–17. [CrossRef]
Awdry, R. (2021). Assignment outsourcing: Moving beyond contract cheating. Assessment & Evaluation in Higher Education, 46(2), 220–235. [CrossRef]
Bandura, A. (2002). Social foundations of thought and action. In D. F. Marks (Ed.), The Health Psychology Reader (pp. 94–106). SAGE Publications Ltd. [CrossRef]
Benson, L., & Enstroem, R. (2023). A model for preventing academic misconduct: Evidence from a large-scale intervention. International Journal for Educational Integrity, 19(1), 25. [CrossRef]
Biagioli, M., Kenney, M., Martin, B. R., & Walsh, J. P. (2019). Academic misconduct, misrepresentation and gaming: A reassessment. Research Policy, 48(2), 401–413. [CrossRef]
Bin-Nashwan, S. A., Sadallah, M., & Bouteraa, M. (2023). Use of ChatGPT in academia: Academic integrity hangs in the balance. Technology in Society, 75, 102370. [CrossRef]
Birks, D., & Clare, J. (2023). Linking artificial intelligence facilitated academic misconduct to existing prevention frameworks. International Journal for Educational Integrity, 19(1), 20. [CrossRef]
Bretag, T., Harper, R., Burton, M., Ellis, C., Newton, P., Rozenberg, P., Saddiqui, S., & Van Haeringen, K. (2019). Contract cheating: A survey of Australian university students. Studies in Higher Education, 44(11), 1837–1856. [CrossRef]
Burgason, K. A., Sefiha, O., & Briggs, L. (2019). Cheating is in the eye of the beholder: An evolving understanding of academic misconduct. Innovative Higher Education, 44(3), 203–218. [CrossRef]
Chan, C. K. Y. (2025). Students’ perceptions of ‘AI-giarism’: Investigating changes in understandings of academic misconduct. Education and Information Technologies, 30(6), 8087–8108. [CrossRef]
Christensen Hughes, J. M., & McCabe, D. L. (2006a). Academic misconduct within higher education in Canada. Canadian Journal of Higher Education, 36(2), 1–21. [CrossRef]
Christensen Hughes, J. M., & McCabe, D. L. (2006b). Understanding academic misconduct. Canadian Journal of Higher Education, 36(1), 49–63. [CrossRef]
Clarke, R. V. (1995). Situational crime prevention. Crime and Justice, 19, 91–150. http://www.jstor.org/stable/1147596.
Corbin, T., Sharpe, S., & Dawson, P. (2026). On AI glasses and wearable AI in assessment. Assessment & Evaluation in Higher Education, Advance online publication. [CrossRef]
Corrigendum. (2020). Studies in Higher Education, 45(1), 232–233. [CrossRef]
Creswell, J. W., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research (3rd ed.). SAGE Publications.
Currie, G. M. (2023). Academic integrity and artificial intelligence: Is ChatGPT hype, hero or heresy? Seminars in Nuclear Medicine, 53(5), 719–730. [CrossRef]
Epaphras, S. N., & Mtenzi, F. (2026). Evaluating the effectiveness of AI text humanising tools in reducing AI detection in AI-generated texts by AI detectors. International Journal of Advanced Research, 9(1), 107–125. [CrossRef]
Fyfe, P. (2023). How to cheat on your final paper: Assigning AI for student writing. AI & Society, 38(4), 1395–1405. [CrossRef]
Goodier, M. (2025, June 15). Revealed: Thousands of UK university students caught cheating using AI. The Guardian. https://www.theguardian.com/education/2025/jun/15/thousands-of-uk-university-students-caught-cheating-using-ai-artificial-intelligence-survey.
Gulumbe, B. H., Audu, S. M., & Hashim, A. M. (2025). Balancing AI and academic integrity: What are the positions of academic publishers and universities? AI & Society, 40(3), 1775–1784. [CrossRef]
Harp, J., & Taietz, P. (1966). Academic integrity and social structure: A study of cheating among college students. Social Problems, 13(4), 365–373. [CrossRef]
International Center for Academic Integrity. (2021). The fundamental values of academic integrity (3rd ed.). https://www.academicintegrity.org/aws/ICAI/asset_manager/get_file/911282?ver=1.
Khalid, A. (2015). Comparison of academic misconduct across disciplines – faculty and student perspectives. Universal Journal of Educational Research, 3(4), 258–268. [CrossRef]
Kurtz, G., Amzalag, M., Shaked, N., Zaguri, Y., Kohen-Vacs, D., Gal, E., Zailer, G., & Barak-Medina, E. (2024). Strategies for integrating generative AI into higher education: Navigating challenges and leveraging opportunities. Education Sciences, 14(5), 503. [CrossRef]
Lund, B. D., Lee, T. H., Mannuru, N. R., & Arutla, N. (2025). AI and academic integrity: Exploring student perceptions and implications for higher education. Journal of Academic Ethics, 23(3), 1545–1565. [CrossRef]
Macfarlane, B., Zhang, J., & Pun, A. (2014). Academic integrity: A review of the literature. Studies in Higher Education, 39(2), 339–358. [CrossRef]
Makarova, M. (2019). Factors of academic misconduct in a cross-cultural perspective and the role of integrity systems. Journal of Academic Ethics, 17(1), 51–71. [CrossRef]
McCabe, D. L., & Trevino, L. K. (1993). Academic dishonesty: Honor codes and other contextual influences. The Journal of Higher Education, 64(5), 522–538. [CrossRef]
Milano, S., McGrane, J. A., & Leonelli, S. (2023). Large language models challenge the future of higher education. Nature Machine Intelligence, 5(4), 333–334. [CrossRef]
Miles, P. J., Campbell, M., & Ruxton, G. D. (2022). Why students cheat and how understanding this can help reduce the frequency of academic misconduct in higher education: A literature review. The Journal of Undergraduate Neuroscience Education, 20(2), A150–A160. [CrossRef]
Ofqual. (2019). Malpractice for GCSE, AS and A level: Summer 2019 exam series. GOV.UK. https://www.gov.uk/government/statistics/malpractice-in-gcse-as-and-a-level-summer-2019-exam-series.
Ofqual. (2022). Malpractice in GCSE, AS and A level: Summer 2022 exam series. GOV.UK. https://www.gov.uk/government/statistics/malpractice-in-gcse-as-and-a-level-summer-2022-exam-series.
Ofqual. (2023). Malpractice in GCSE, AS and A level: Summer 2023 exam series. GOV.UK. https://www.gov.uk/government/statistics/malpractice-in-gcse-as-and-a-level-summer-2023-exam-series.
Ofqual. (2024). Malpractice in GCSE, AS and A level: Summer 2024 exam series. GOV.UK. https://www.gov.uk/government/statistics/malpractice-in-gcse-as-and-a-level-summer-2024-exam-series.
Ofqual. (2025). Malpractice in GCSE, AS and A level: Summer 2025 exam series. GOV.UK. https://www.gov.uk/government/statistics/malpractice-in-gcse-as-and-a-level-summer-2025-exam-series.
Patrzek, J., Sattler, S., Van Veen, F., Grunschel, C., & Fries, S. (2015). Investigating the effect of academic procrastination on the frequency and variety of academic misconduct: A panel study. Studies in Higher Education, 40(6), 1014–1029. [CrossRef]
Perkins, M. (2023). Academic integrity considerations of AI large language models in the post-pandemic era: ChatGPT and beyond. Journal of University Teaching and Learning Practice, 20(2). [CrossRef]
Perkins, M., Gezgin, U. B., & Roe, J. (2020). Reducing plagiarism through academic misconduct education. International Journal for Educational Integrity, 16(1), 3. [CrossRef]
Perkins, M., & Roe, J. (2024). Decoding academic integrity policies: A corpus linguistics investigation of AI and other technological threats. Higher Education Policy, 37, 633–653. [CrossRef]
Reiss, M. J. (2021). The use of AI in education: Practicalities and ethical considerations. London Review of Education, 19(1), 5, 1–14. [CrossRef]
Schiff, D. (2021). Out of the laboratory and into the classroom: The future of artificial intelligence in education. AI & Society, 36(1), 331–348. [CrossRef]
Snyder, H. (2019). Literature review as a research methodology: An overview and guidelines. Journal of Business Research, 104, 333–339. [CrossRef]
Stone, T. H., Jawahar, I. M., & Kisamore, J. L. (2010). Predicting academic misconduct intentions and behavior using the theory of planned behavior and personality. Basic and Applied Social Psychology, 32(1), 35–45. [CrossRef]
Sun, G. H., & Hoelscher, S. H. (2023). The ChatGPT storm and what faculty can do. Nurse Educator, 48(3), 119–124. [CrossRef]
Sun, L., Li, Y., Kan, H., Shu, J., Xu, H., Li, C., Shi, G., Wang, Z., Wang, X., & Jin, L. (2025). Open- and closed-source LLMs in medical and engineering education. Frontiers in Medicine, 12, 1751813. [CrossRef]
Susnjak, T., & McIntosh, T. (2024). ChatGPT: The end of online exam integrity? Education Sciences, 14(6), 656. [CrossRef]
Tauginienė, L., Gaižauskaitė, I., Razi, S., Glendinning, I., Sivasubramaniam, S., Marino, F., Cosentino, M., Anohina-Naumeca, A., & Kravjar, J. (2019). Enhancing the taxonomies relating to academic integrity and misconduct. Journal of Academic Ethics, 17(4), 345–361. [CrossRef]
Torrecilla, J. S., López Pingarrón, C., & Beleño Saenz, K. de J. (2025). Automated formative assessment with large language models: Design, validation, and empirical application in higher education. Journal of Higher Education Theory and Practice, 25(6), 54–82. [CrossRef]
Urban, M., Děchtěrenko, F., Lukavský, J., Hrabalová, V., Svacha, F., Brom, C., & Urban, K. (2024). ChatGPT improves creative problem-solving performance in university students: An experimental study. Computers & Education, 215, 105031. [CrossRef]
Yusuf, A., Pervin, N., & Román-González, M. (2024). Generative AI and the future of higher education: A threat to academic integrity or reformation? Evidence from multicultural perspectives. International Journal of Educational Technology in Higher Education, 21(1), 21. [CrossRef]
Zhang, L., Amos, C., & Pentina, I. (2025). Interplay of rationality and morality in using ChatGPT for academic misconduct. Behaviour & Information Technology, 44(3), 491–507. [CrossRef]
Zhou, V. (2026, March 27). AI glasses are catching on in China, from shopping to cheating. Rest of World. https://restofworld.org/2026/china-ai-glasses-cheating-privacy-boom/.

Table 1. Triangulation matrix: convergence of evidence streams across the paper’s central claims.

Paper’s central claim	Stream 1: Narrative review	Stream 2: Malpractice statistics	Stream 3: Documentary product scan
Electronic device use in exams is established and rising	Khalid (2015): cellular devices most common technique (41%); Makarova (2019): device use most extensive form of misconduct cross-culturally; Burgason et al. (2019): detailed student knowledge of device-based cheating	UK Ofqual data: device offences doubled 2018–2023; 44.3% of all malpractice in 2025; single most common category every year since 2018	N/A (scan addresses current capability, not historical trend)
Current AI-assisted devices create a qualitatively new threat	Susnjak & McIntosh (2024): LLMs achieve roughly 70–83% proficiency across 12 subjects; Urban et al. (2024): AI improves problem-solving quality; Schiff (2021): multi-stability, technology repurposed from legitimate to fraudulent use	N/A (statistics predate current device generation)	Multiple devices identified (£16–£640), all legally available; cheating configurations documented; most advanced (pre-loaded AI glasses) has no effective non-invasive countermeasure
Existing detection mechanisms are inadequate	Birks & Clare (2023): arms race between AI and detection; Zhang et al. (2025): only 3 of 16 detection platforms accurate; Milano et al. (2023): detection software likely to fail; Perkins & Roe (2024): 99.3% policy gap at ChatGPT’s release	Low detection counts at university level are consistent with low detection capability, not necessarily low use	Detection difficulty rated Very High for several products; pre-loaded smart glasses emit no detectable signal; no effective non-invasive countermeasure for most sophisticated configuration
A fundamental reassessment of examination design is required	Biagioli et al. (2019): Goodhart’s Law, any metric attracts gaming; Birks & Clare (2023): detection-based response structurally inadequate; Susnjak & McIntosh (2024): unproctored exams can no longer be regarded as possessing validity	Rising trend despite existing countermeasures suggests detection-based response is insufficient	No single countermeasure is complete; the only fully effective measures are disproportionate or unlawful; arms-race logic predicts circumvention

Table 2. Malpractice and academic-dishonesty indicators across three jurisdictions. Sources are neither commensurable nor aggregated.

Region	Education level	Year	Figure	What is measured
UK (England)	GCSE / AS / A-Level	2019	1,385 device penalties (Ofqual, 2019)	Mobile/communication-device offences in exams (proven, penalised)
UK (England)	GCSE / AS / A-Level	2022	1,845 penalties (43% of student penalties) (Ofqual, 2022)	Mobile/communication-device offences in exams
UK (England)	GCSE / AS / A-Level	2023	2,180 penalties (≈ double the 2018 figure) (Ofqual, 2023)	Device-in-exam offences
UK (England)	GCSE / AS / A-Level	2024	2,140 cases (41.5% of student malpractice) (Ofqual, 2024)	Phone/smart-device offences in exams
UK (England)	GCSE / AS / A-Level	2025	2,225 cases (44.3%); 5,025 total cases (0.3% of cohort) (Ofqual, 2025)	Phone/smart-device offences in exams
UK	University	2023–24	≈ 7,000 proven AI-misuse cases (5.1 per 1,000, up from 1.6)	AI misuse, predominantly in coursework (not in-exam device use)
UK	University	2021–22	16% admitted to cheating in online assessments	Self-reported, small non-random sample (n = 900)
Australia	University	2017	≈ 6% engaged in contract cheating (>15,000 students)	Self-reported contract cheating
Australia	University	2018	5.78% prevalence	Self-reported contract cheating and assignment-sharing
Australia	University	2020	7.53% (up to ≈ 8%)	Self-reported outsourcing (formal and informal)
USA	University (undergraduate)	2002–15	60–70% admit some form of cheating	Self-reported any academic dishonesty (n > 70,000)
USA	University (postgraduate)	Rolling	≈ 43% admit to cheating	Self-reported any cheating, graduate level
USA	High school	Rolling	≈ 59–64% admit to test cheating	Self-reported test cheating in prior year
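The Ofqual rows of Table 2 are the only ones drawn from a single series, so they are the only ones for which simple arithmetic is meaningful. The sketch below recomputes the 2025 device share and the change between consecutive listed years from the counts in the table. It makes two assumptions that should be read as caveats: that the rows labelled "penalties" and those labelled "cases" are comparable counts, and that steps between non-adjacent years (2019 to 2022) can be read alongside year-on-year steps. The names `DEVICE_COUNTS`, `TOTAL_CASES_2025`, and `pct_change` are hypothetical.

```python
# Ofqual device-offence counts as listed in Table 2 (England; GCSE, AS and A level).
# Assumption: "penalties" and "cases" rows are treated as comparable counts.
DEVICE_COUNTS = {2019: 1385, 2022: 1845, 2023: 2180, 2024: 2140, 2025: 2225}
TOTAL_CASES_2025 = 5025  # total malpractice cases reported for 2025 in Table 2


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Device offences as a share of all 2025 cases.
share_2025 = 100.0 * DEVICE_COUNTS[2025] / TOTAL_CASES_2025

# Change between consecutive years that appear in the table.
years = sorted(DEVICE_COUNTS)
step_changes = {
    (a, b): pct_change(DEVICE_COUNTS[a], DEVICE_COUNTS[b])
    for a, b in zip(years, years[1:])
}
overall_change = pct_change(DEVICE_COUNTS[2019], DEVICE_COUNTS[2025])

print(f"2025 device share: {share_2025:.1f}%")
for (a, b), change in step_changes.items():
    print(f"{a} -> {b}: {change:+.1f}%")
print(f"2019 -> 2025: {overall_change:+.1f}%")
```

The recomputed 2025 share rounds to the 44.3% reported in the table. The listed counts rise overall but not monotonically: the 2024 figure is slightly below the 2023 figure. None of this arithmetic should be extended across the university, Australian, or US rows, which measure different things.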

Table 3. Genericised, representative, commercially available AI-assisted devices (May 2026).

Product	Price	Platform	Primary exam threat	Detection difficulty
HD 1080p camera pen	£16	Mainstream marketplace	Desk camera	Very High
HD 1080p camera pen (upgraded)	£22	Mainstream marketplace	Desk/clip camera	Very High
HD camera pen with onboard storage	£25	Mainstream marketplace	Desk camera	Very High
HD 1080p spy-camera pen	£44	Mainstream marketplace	Desk camera	Very High
HD 1080p camera pen with 32GB storage	£35	Mainstream marketplace	Desk camera	Very High
Clip-on HD 1080p camera pen	£54	Mainstream marketplace	Clip-on camera	Very High
Audio smart glasses with speakers	£19	Mainstream marketplace	Audio answer relay	High
AI audio smart glasses	£30–40	Mainstream marketplace	AI audio answers	High
AI translation/audio smart glasses	£35	Mainstream marketplace	Translation/audio AI	High
Bluetooth audio smart glasses	£46	Mainstream marketplace	Translation/audio AI	High
HUD smart glasses (display only, no camera)	~£470	Specialist vendor	HUD answers + AI prompts	Very High (partial RF only)
HUD smart glasses with integrated camera	£640	Mainstream marketplace	Full autonomous AI system	Very High (partial RF only)
Nano GSM earpiece system	£50–150	Specialist vendor	Invisible audio relay	High

Table 4. Best available countermeasure against each cheating configuration. Effectiveness ratings are indicative and based on the author's judgments (see Section 2.3).

Cheating configuration	Best available countermeasure	Effectiveness
Spy camera pen on desk	Provide examination pens; bar personal stationery	Complete
Button camera on clothing	RF detection of streaming signal	Partial (pre-recorded scenarios not detected)
GSM module + nano earbud	RF detection of GSM signal; physical ear inspection at entry	Partial (nano earbud itself not RF-detectable)
AI smart glasses (live AI)	RF detection of Bluetooth/4G; phone confiscation	Partial (offline pre-loaded content not detected)
AI smart glasses (pre-loaded)	No effective non-invasive countermeasure currently exists	Minimal
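Table 4 can also be read as a simple lookup from cheating configuration to countermeasure and effectiveness rating. The sketch below encodes the table in that form and lists the configurations for which the best available countermeasure is rated less than complete. The ratings are copied from the table and remain the author's indicative judgments; the names `Effectiveness`, `COUNTERMEASURES`, and `residual_risk` are hypothetical and introduced only for illustration.

```python
from enum import Enum


class Effectiveness(Enum):
    COMPLETE = "Complete"
    PARTIAL = "Partial"
    MINIMAL = "Minimal"


# Configuration -> (best available countermeasure, rating), copied from Table 4.
COUNTERMEASURES = {
    "Spy camera pen on desk": (
        "Provide examination pens; bar personal stationery",
        Effectiveness.COMPLETE,
    ),
    "Button camera on clothing": (
        "RF detection of streaming signal",
        Effectiveness.PARTIAL,
    ),
    "GSM module + nano earbud": (
        "RF detection of GSM signal; physical ear inspection at entry",
        Effectiveness.PARTIAL,
    ),
    "AI smart glasses (live AI)": (
        "RF detection of Bluetooth/4G; phone confiscation",
        Effectiveness.PARTIAL,
    ),
    "AI smart glasses (pre-loaded)": (
        "No effective non-invasive countermeasure currently exists",
        Effectiveness.MINIMAL,
    ),
}


def residual_risk(table):
    """Return configurations whose best countermeasure is not rated Complete."""
    return [
        config
        for config, (_, rating) in table.items()
        if rating is not Effectiveness.COMPLETE
    ]


residual = residual_risk(COUNTERMEASURES)
for config in residual:
    print(config)
```

On the table's own ratings, four of the five configurations retain some residual risk, which is consistent with the paper's argument that detection alone cannot close the gap.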

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.