Rethinking the Textbook: A Design-Based Study of AI Powered Dynamic Open Educational Resources

Sayed Mahbub Hasan Amiri

doi:10.20944/preprints202605.1758.v1

Submitted:

25 May 2026

Posted:

26 May 2026

You are already at the latest version

Abstract

For over a century, the printed textbook has assumed learner homogeneity, creating persistent inequities for students with diverse reading levels, languages, and cultural backgrounds. Open Educational Resources (OER) offer a free, openly licensed alternative, yet most OER remain static documents requiring manual teacher differentiation. The rapid maturation of generative artificial intelligence (AI) provides a transformative solution: dynamic, AI‑powered OER that can be personalised at the point of use while preserving open licences. This quasi‑experimental design‑based research study developed, implemented, and evaluated a prototype called “AI‑OER Studio” over six months. Following a focus group with 12 educational technology experts to derive design principles, we built a web prototype that generates textbook chapters on any topic, at specified grade levels, in multiple languages, with culturally adapted examples, all under a CC BY‑SA 4.0 licence. [Legal disclaimer: the copyright status of AI‑generated content remains unresolved; see main text.] Expert reviewers (n=12) rated generated content favourably for factual accuracy (4.15/5) and pedagogical alignment (4.40/5), though bias scores were lower (3.95/5). A classroom pilot with 78 seventh‑grade students compared AI‑dynamic OER against static OER over a four‑week ecosystems unit. Students in the treatment group showed significantly higher learning gains (post‑test 78% vs. 68%; adjusted effect size d = 0.44 after accounting for classroom clustering) and spent 35 more minutes per week engaged, although a novelty effect likely accounts for some of this difference. Teacher interviews revealed substantial time savings but also concerns about factual hallucinations and cultural biases. The static textbook is increasingly ill‑suited for diverse classrooms; the future lies in living, AI‑augmented, openly licensed resources, provided that human validation, bias mitigation, and legal clarity are prioritised.

Keywords:

adaptive learning

;

generative artificial intelligence

;

open educational resources

;

personalised education

;

textbook innovation

Subject:

Social Sciences - Education

1. Introduction

1.1. The Enduring Hegemony of the Textbook

The textbook has been a cornerstone of formal education for over five hundred years, from the earliest printed catechisms to the glossy, standards-aligned volumes that populate today’s school bookrooms. As Reiser (2018) documented in his historical analysis of instructional design, the textbook emerged as the primary vehicle for organising and transmitting curriculum because it offered efficiency, standardisation, and portability. A single teacher could rely on a single textbook to ensure that every student in a classroom was exposed to the same content in the same sequence, an approach that aligned neatly with industrial-era assumptions about schooling as a production line. However, what made the textbook efficient for the institution also made it rigid for the learner. The assumption that all students of the same age or grade level require identical explanations, identical examples, and identical pacing has been thoroughly discredited by decades of research in cognitive science, developmental psychology, and inclusive pedagogy (CAST, 2018; Tomlinson, 2014). Learners vary not only in prior knowledge but also in reading fluency, cultural schema, attention, motivation, and language proficiency. A single static text inevitably privileges some students while marginalising others.

1.2. The Promise and Limitations of Open Educational Resources

The open education movement offered a powerful corrective to the commercial textbook monopoly. Open Educational Resources (OER) are defined by UNESCO (2019) as “teaching, learning and research materials that are in the public domain or released under an open license that permits no-cost access, use, adaptation and redistribution by others with no or limited restrictions.” The foundational 5R permissions retain, reuse, revise, remix, and redistribute distinguish OER from merely free resources (Wiley & Hilton, 2018). A teacher who finds an OER textbook on photosynthesis can legally download it, translate it into Spanish, replace examples about North American forests with examples from the Amazon basin, and share the adapted version back to the community. This flexibility has generated substantial empirical support. A meta-analysis by Hilton (2016) found that students using OER performed as well or better than students using commercial textbooks, while saving an average of $120 per course. Subsequent research by Hilton (2020) confirmed these benefits across disciplines and educational levels. Moreover, OER adoption has been linked to increased student retention, particularly among first-generation college students, who are more sensitive to textbook costs (Colvard et al., 2018).

Despite these advantages, OER have not yet displaced commercial textbooks at the scale that advocates hoped. One persistent barrier is that the majority of OER, while openly licensed, remain static in format. A typical OER text is a PDF, an ePub, or a web page a fixed document that cannot adapt to individual learners unless a teacher manually creates multiple versions. And manual differentiation is precisely what many teachers lack time to do. In a large-scale survey of K-12 teachers, only 32% reported having adequate time to modify instructional materials for diverse learners (Montgomery et al., 2021). Thus, the promise of OER materials that can be revised and remixed often goes unrealised in practice. Teachers download an OER textbook, use it as is, and encounter the same one-size-fits-all limitations as with commercial texts, albeit at zero cost.

1.3. The Emergence of Generative AI in Education

The public release of large language models such as GPT-3.5 and GPT-4 in late 2022 and 2023 marked a watershed moment for educational technology. Unlike earlier rule-based adaptive systems, which could only branch to pre-written content based on student responses, generative AI models can produce novel text, images, and even interactive exercises in response to natural language prompts (Kasneci et al., 2023). This capability has been rapidly explored by researchers. Mollick and Mollick (2023) demonstrated that AI can generate high-quality lesson plans, discussion questions, and formative assessments aligned with evidence-based teaching strategies. Similarly, Baidoo-Anu and Ansah (2023) surveyed potential applications of ChatGPT in higher education, including content summarisation, personalised tutoring, and automated feedback on student writing. A consistent finding across these early studies is that AI is not a replacement for human teachers but a powerful assistant that can handle routine generative tasks, freeing teachers to focus on interpretation, relationship-building, and complex pedagogical judgment.

However, generative AI also introduces significant risks. Hallucinations outputs that are confidently stated but factually incorrect are a well-documented feature of current LLMs (Ji et al., 2023). In educational contexts, a hallucination about a historical date or a scientific concept is not merely an inconvenience; it can actively mislead learners. Furthermore, LLMs inherit and can amplify biases present in their training data. Birhane et al. (2022) showed that widely used models exhibit systematic biases along lines of race, gender, and geography, defaulting to Western, male, and middle-class perspectives unless explicitly prompted otherwise. For educators committed to culturally responsive teaching, these biases pose a serious challenge.

1.4. The Research Gap: Integrating AI and OER

At the intersection of OER and generative AI lies a promising but largely unexamined territory. If OER can be freely adapted, and if AI can rapidly generate adapted versions, then the logical synthesis is an AI-powered OER system that produces differentiated, personalised, open-licensed content on demand. A teacher could specify “photosynthesis for a 5th grade class with several English language learners, using examples from rice farming in Southeast Asia,” and the AI would generate a complete, openly licensed chapter. A student could request “explain this again but with simpler words and more pictures.” Such a system would realise the full promise of the 5R permissions at scale.

To date, only a handful of exploratory projects have addressed this synthesis. The Open Education Group (2024) released a proof-of-concept AI-OER prototype that demonstrated technical feasibility but did not conduct systematic evaluation. Similarly, a workshop report from the International Council for Open and Distance Education (ICDE, 2023) identified AI-generated OER as a priority area for research but noted that no validated design principles or classroom-based studies existed. This gap motivated the current study. We sought to move beyond speculation by building an actual prototype, testing it with real teachers and students, and deriving empirically grounded guidelines for future development.

1.5. Research Questions and Significance

This study addresses three research questions:

RQ1: What design principles enable AI-powered, dynamic OER that are both pedagogically sound and practically usable by teachers?
RQ2: How can generative AI systems maintain pedagogical quality (accuracy, alignment, cultural sensitivity) and ensure compliance with open licences?
RQ3: What are the measurable benefits and risks of AI-dynamic OER compared to traditional static textbooks and static OER, in terms of student learning outcomes, engagement, and teacher perceptions?

The significance of this research is twofold. Practically, it provides educators and administrators with evidence-based guidance on whether and how to adopt AI-generated OER. Theoretically, it contributes to the emerging literature on generative AI in education by extending the focus from individual tutoring to the foundational resource of the textbook itself. We argue that the death of the one-size-fits-all textbook is not a distant prediction but an immediate possibility, contingent on the responsible integration of AI and open licensing.

2. Literature Review

2.1. Theoretical Foundations: UDL and Constructivism

Two major theoretical frameworks informed our approach. The first is Universal Design for Learning (UDL), developed by CAST (2018). UDL posits that curricula should be designed from the outset to accommodate learner variability, rather than retrofitting accommodations for a few. The framework organises its guidelines around three principles: multiple means of engagement (the “why” of learning), multiple means of representation (the “what”), and multiple means of action and expression (the “how”). A static textbook fails on all three counts: it offers a single representation (e.g., one reading level, one set of examples), a single mode of engagement (usually linear, text-based reading), and a single pathway for action (e.g., answering end-of-chapter questions). AI-dynamic OER directly instantiate UDL by generating alternative representations, offering multiple entry points, and allowing students to request different formats. As Rose and Gravel (2010) argued, digital technologies are essential for scaling UDL, and generative AI represents the latest and most powerful instantiation of this principle.

The second framework is constructivist learning theory, particularly its sociocultural variant associated with Vygotsky (1978). Constructivism holds that learners actively construct knowledge through interaction with content, peers, and teachers, rather than passively receiving information. A static textbook positions the learner as a recipient, but a dynamic AI-generated resource that responds to student requests (e.g., “give me a different example” or “explain this using an analogy with soccer”) supports active knowledge construction. Furthermore, the connectivist perspective (Siemens, 2005) emphasises that learning in the digital age involves navigating and curating information across distributed networks. AI-dynamic OER can be viewed as a node in that network one that can be shaped by the learner rather than simply consumed.

2.2. The Textbook as an Artefact: History, Economics, and Inequity

To understand why the static textbook persists, it is helpful to review its historical trajectory. The textbook industry emerged in the 19th century alongside compulsory schooling and the standardisation of grade levels (Reiser, 2018). By the mid-20th century, a handful of large publishers Pearson, McGraw-Hill, Houghton Mifflin Harcourt dominated the market, producing state-adopted textbooks that were notoriously expensive. A 2019 study by the U.S. Government Accountability Office found that textbook prices rose 186% between 2006 and 2016, three times the rate of inflation (US GAO, 2019). For college students, the average annual cost of textbooks exceeded $1,200, contributing to a broader affordability crisis (Hilton, 2016). For K-12 schools, the adoption cycle of five to seven years means that science textbooks routinely omit recent discoveries for example, textbooks published in 2019 generally did not include CRISPR gene editing or the first images of a black hole.

Beyond economics, static textbooks perpetuate epistemic injustice. A content analysis of middle school history textbooks by Brown and Brown (2010) found persistent under-representation of indigenous perspectives and a tendency to frame slavery as an aberration rather than a central feature of American economic development. Similarly, a review of science textbooks by Bazzul and Sykes (2011) revealed that examples overwhelmingly draw from Western contexts (e.g., Newton’s apple, Mendel’s peas) while ignoring non-Western scientific traditions. OER were intended to democratise content, but static OER often reproduce the same biases because they are created by a relatively homogeneous pool of contributors. AI-dynamic OER offers a mechanism for diversifying content at scale, provided the AI is prompted with inclusive instructions and validated by diverse human reviewers.

2.3. Open Educational Resources: Definitions, Adoption, and Barriers

The term “Open Educational Resources” was coined at a UNESCO forum in 2002, and the 2019 UNESCO Recommendation gave it binding international policy status. The core legal mechanism of OER is an open licence, most commonly Creative Commons licences such as CC BY (attribution only) or CC BY-SA (attribution-sharealike). Wiley and Hilton (2018) articulated the 5R framework: users may Retain (make and own copies), Reuse (use in various contexts), Revise (adapt and modify), Remix (combine with other materials), and Redistribute (share with others). This framework transforms the relationship between content creator and content user. In a traditional copyright model, the user is a passive consumer; in the 5R model, the user becomes a co-creator.

Despite these advantages, OER adoption has been slower than hoped. Hilton (2020) synthesised research from 2015 to 2018 and found that while students generally perceive OER positively and perform equally or better, faculty adoption remains below 30% in most disciplines. Common barriers include lack of awareness (many teachers have never heard of OER), lack of time to find and evaluate OER, and concerns about quality. A survey of community college instructors by Seaman and Seaman (2021) found that 42% believed OER were lower quality than commercial textbooks, a perception that persisted even after using OER. Addressing this perception requires not only dissemination but also evidence of OER effectiveness and tools that make OER easier to use and adapt. AI can potentially address the “time” barrier by automating adaptation, but it does not automatically address the quality concern; indeed, if AI generates low-quality content, it could reinforce negative perceptions.

2.4. Adaptive Learning Technologies: From Rule-Based to Generative

Adaptive learning systems have a long history in educational technology. Early systems, such as cognitive tutors for mathematics (Aleven et al., 2016), used a rule-based architecture: the system would present a problem, evaluate the student’s answer against a set of production rules, and branch to a pre-written remediation or advancement path. These systems were effective in controlled studies a meta-analysis by Steenbergen-Hui and Cooper (2014) found an average effect size of 0.44 for intelligent tutoring systems compared to traditional instruction. However, they were also expensive to build, requiring teams of content experts and programmers to write hundreds of rules for each topic. Moreover, they could not generate new content; they could only select from a fixed library.

The emergence of large language models represents a paradigm shift. Instead of branching, an LLM can generate a new explanation tailored to the specific misconception a student has just displayed. For example, if a student incorrectly answers a question about why leaves change colour, a generative AI system can produce an explanation that specifically references chlorophyll, carotenoids, and the student’s local autumn climate, all in natural language. This capability is not yet fully reliable as noted above, hallucinations remain a risk but it points toward a future of truly adaptive content. Kasneci et al. (2023) described this shift as moving from “adaptivity of selection” to “adaptivity of generation.”

2.5. Generative AI in Education: Opportunities and Risks

The educational literature on generative AI has expanded rapidly since late 2022. Key opportunities identified include: (1) personalised tutoring, where AI acts as a 24/7 conversational assistant (Khan Academy’s “Khanmigo” is a prominent example); (2) teacher assistance, generating quizzes, rubrics, and lesson plans (Mollick & Mollick, 2023); (3) writing support for students, including brainstorming and revision (Walden University, 2024); and (4) language learning, with AI simulating conversation partners in multiple languages (Ji et al., 2023). Each of these applications has shown initial promise, though rigorous randomised controlled trials are still scarce.

Risks are equally well-documented. Hallucinations are a fundamental property of autoregressive LLMs, which predict the next most probable token without an internal mechanism for truth preservation. Ji et al. (2023) categorised hallucinations into three types: input-conflicting (contradicting the user prompt), context-conflicting (contradicting earlier parts of the generated text), and fact-conflicting (contradicting established knowledge). In educational settings, fact-conflicting hallucinations are the most concerning. For example, when asked to generate a passage about the American Civil War, an LLM might assert that Abraham Lincoln was the 15th president (he was the 16th) a subtle error that a student might not detect. Bias is another major risk. Birhane et al. (2022) found that LLMs trained on web data exhibit measurable preferences for Western names, Christian holidays, and gender stereotypes. Without careful prompt engineering and post-hoc validation, AI-generated OER could inadvertently teach biased content.

A third risk, specific to OER, is licensing and copyright. If an LLM was trained on a corpus that includes copyrighted textbooks, what is the legal status of its outputs? The U.S. Copyright Office (2023) issued guidance stating that works generated entirely by AI without human creative input are not eligible for copyright protection. However, the question of whether generating a derivative work from copyrighted training data constitutes infringement remains unresolved. For OER, this matters because an open licence requires that the creator has the legal right to release the work under that licence. If an AI output was derived from in-copyright sources, a teacher who releases it as CC BY could theoretically be sued for infringement. Until this issue is settled by legislation or case law, a conservative approach is to use only LLMs that were trained on public domain or openly licensed data though such models are less powerful than those trained on the full web.

2.6. Prior Work at the Intersection of AI and OER

Very few studies have explicitly examined the combination of AI and OER. A notable exception is the work of the Open Education Group (2024), which released a prototype that allowed users to input a topic and receive a CC BY-licensed text generated by GPT-3.5. The group did not conduct formal evaluation, but they documented practical challenges: the AI sometimes included fictional citations, and it was difficult to enforce licence metadata programmatically. Another relevant study is by Tlili et al. (2023), who surveyed experts about the future of OER in the age of AI. The experts predicted that AI would both help and hinder OER: help by automating creation and translation, hinder by complicating attribution and potentially flooding repositories with low-quality resources. Our study builds directly on these preliminary efforts by moving from speculation to design-based research with classroom implementation.

2.7. Synthesis and Contribution

In summary, the literature establishes that: (a) static textbooks, including most OER, are inequitable and inefficient; (b) generative AI can produce differentiated content but introduces risks of hallucination, bias, and licensing ambiguity; (c) the intersection of AI and OER is theoretically powerful but empirically understudied. This study contributes the first systematic, design-based exploration of an AI-powered OER system that includes expert validation, classroom pilot, and articulated design principles. By grounding the work in UDL and constructivist theory, we also offer a conceptual framework for future research.

3. Methodology

3.1. Research Design: Design-Based Research

We employed design-based research (DBR) as our overarching methodology. DBR is an approach developed specifically for educational technology research, in which researchers collaborate with practitioners to design, implement, and iteratively refine interventions in authentic settings (McKenney & Reeves, 2019). DBR is well-suited to our research questions because it does not separate design from evaluation; rather, the design process produces both a practical artefact (the AI-OER prototype) and theoretical insights (the design principles). The study proceeded through three phases over six months (JanuaryJune 2025), as described below. Ethical approval was obtained from the university Institutional Review Board (protocol #2025-042), and all participants provided informed consent or assent as appropriate.

3.2. Phase 1: Framework Development through Expert Focus Group

The first phase aimed to elicit design requirements from educational technology professionals who would ultimately be users or evaluators of AI-powered OER. We recruited 12 participants through purposive sampling via the OER Global mailing list and the International Society for Technology in Education (ISTE) community. Inclusion criteria were: (a) at least three years of experience in K-12 or higher education teaching, instructional design, or educational technology management; (b) familiarity with OER (e.g., having used OER in teaching or research); (c) willingness to engage in a 90-minute synchronous virtual focus group. The final sample comprised six classroom teachers (three elementary, two middle school, one high school), three instructional designers (two from universities, one from a non-profit OER repository), and three OER librarians. Demographic characteristics were: nine female, three male; ages ranged from 31 to 58; all were based in the United States except one librarian from Canada.

The focus group was conducted via Zoom and recorded with consent. A semi-structured protocol was used, guided by the following core questions: (1) “What are the biggest pain points you currently experience with static textbooks or static OER?” (2) “If an AI tool could generate OER automatically, what features would be essential for you to trust and use it?” (3) “What risks or concerns would you have?” (4) “How should the tool handle open licensing?” The facilitator (the lead author) also showed a short demonstration of a rudimentary AI text generator to ground the discussion. The session lasted 92 minutes, yielding a transcript of approximately 14,000 words.

Thematic analysis followed the six-phase procedure of Braun and Clarke (2006). Phase 1: familiarisation both authors read the transcript independently and made notes. Phase 2: initial coding we used NVivo 14 to code line-by-line, generating 87 initial codes (e.g., “needs to show sources,” “worry about cultural stereotypes,” “must be editable”). Phase 3: searching for themes we grouped codes into candidate themes, such as “transparency requirements” and “granular control.” Phase 4: reviewing themes we checked each theme against the coded extracts and the full transcript. Phase 5: defining and naming themes we refined four final design principles (reported in Section 5.1). Phase 6: writing incorporated into the findings. Inter-coder reliability was not calculated because the two authors collaboratively discussed and resolved disagreements; we achieved consensus on all themes.

3.3. Phase 2: Prototype Development AI-OER Studio

Based on the design principles derived from Phase 1, we developed a functional web application called “AI-OER Studio.” The prototype was built using the following technology stack: Python 3.11 with the Flask web framework, an SQLite database for storing user settings (no student data), and the OpenAI API for language and image generation (GPT-4 and DALL-E 3). We chose GPT-4 over GPT-3.5 because of its higher factual accuracy and longer context window, both critical for generating coherent textbook chapters. The system was hosted on a university-provided virtual private server with secure access controls.

The user interface was deliberately minimalist to reduce cognitive load. A teacher (or instructional designer) navigates to the tool and fills out a form with the following parameters:

Topic (free text, e.g., “the water cycle”)
Target grade level (dropdown: K2, 35, 68, 912, undergraduate)
Language (dropdown: English, Spanish, French, Mandarin more languages could be added later)
Cultural context (free text, e.g., “examples from agriculture in sub-Saharan Africa”)
Learning objectives (free text, up to three)
Output length (short ≈ 500 words, medium ≈ 1,500 words, long ≈ 3,000 words)

Upon submitting the form, the application constructs a structured prompt that incorporates the user’s inputs and also instructs the AI to output the content under a CC BY-SA 4.0 licence. A typical prompt template is shown in Appendix B. The prompt is sent to GPT-4 via the API with temperature set to 0.7 (balancing creativity and determinism). The AI returns a JSON object containing four sections: (1) an introductory paragraph with a hook, (2) three to five key terms with student-friendly definitions, (3) three multiple-choice questions with answer explanations, and (4) one short-answer prompt. Simultaneously, the system calls DALL-E 3 to generate a relevant diagram or illustration based on the topic and grade level. The total generation time averages 1218 seconds.

After generation, the output is displayed in an editing interface. Each paragraph and each question has an “Edit” button (opens a text box) and a “Regenerate” button (sends a revised prompt back to GPT-4). A large “Export as HTML” button packages the content into an HTML file with a visible CC BY-SA logo, a metadata block (title, author as “AI-OER Studio + Human Reviewer”), and a footer stating: “This resource was generated with artificial intelligence and reviewed by a human educator before use. Licensed under Creative Commons Attribution-ShareAlike 4.0 International.” The export also includes a machine-readable JSON-LD metadata header for repository indexing.

We conducted three rounds of internal alpha testing with two research assistants (graduate students in educational technology) to fix bugs and improve prompt clarity. By the end of Phase 2, the prototype was stable and capable of generating content on over 50 different topics without crashing.

3.4. Phase 3: Evaluation

Evaluation had two complementary components: a formal expert review of generated content, and a classroom pilot with middle school science students.

3.4.1. Expert Review

The same 12 experts who participated in Phase 1 were invited to evaluate the quality of two sample outputs generated by the AI-OER Studio. The two topics were chosen to represent different domains: (a) “Ecosystems and Food Webs” (middle school life science) and (b) “The Industrial Revolution” (middle school world history). Both outputs were generated at grade 7 level, in English, with the cultural context set to “North America and Europe” (a neutral baseline). No human editing was applied before the expert review; we wanted to assess raw AI performance.

Experts received a link to the HTML exports and a Qualtrics survey containing a rubric with four dimensions, each rated on a 15 scale (1 = poor, 2 = below average, 3 = average, 4 = good, 5 = excellent). The dimensions were:

Factual accuracy: Absence of factual errors, hallucinations, or misleading statements.
Pedagogical alignment: How well the content matched typical learning objectives for the grade level, including appropriate depth and clarity.
Bias / cultural sensitivity: Whether the content avoided stereotypes, included diverse perspectives where relevant, and used culturally inclusive language.
Technical usability: How easy it was to view, navigate, and (in principle) edit the HTML output.

Experts were also invited to provide open-ended comments. We received responses from all 12 experts within two weeks. To assess inter-rater reliability, we computed the intraclass correlation coefficient (ICC) using a two-way random effects model for absolute agreement. The ICC was 0.86 (95% confidence interval 0.780.92), indicating excellent agreement.

3.4.2. Classroom Pilot

The classroom pilot was designed to answer RQ3 by comparing AI-dynamic OER (treatment) against static OER (control) in authentic classroom settings. We recruited three seventh-grade science teachers from three different public middle schools in a mid-sized school district in the Pacific Northwest of the United States. The district was chosen for its diversity: approximately 45% white, 30% Hispanic, 15% Asian, and 10% multiracial or other; 38% of students qualified for free or reduced-price lunch. Inclusion criteria for teachers were: (a) teaching at least two sections of grade 7 science, (b) willingness to implement the 4-week unit with fidelity, and (c) prior experience using OER (so that the control group would not be disadvantaged by unfamiliarity). Teachers were compensated with a $250 stipend.

Each teacher designated one of their sections as the treatment group and another section (with similar prior achievement) as the control group. Assignment was not random at the student level due to scheduling constraints, but we matched groups based on prior science grades from the previous semester. Table 1 (presented in findings) shows the matching. The total sample was 78 students (treatment n=39, control n=39). The unit topic was “Ecosystems: Interactions, Energy, and Dynamics,” aligned with the Next Generation Science Standards (NGSS) standard MS-LS2. For four weeks (12 instructional days of 50 minutes each), the treatment group used content generated fresh each week by the AI-OER Studio. At the start of each week, the lead researcher generated a new chapter on the week’s subtopic (Week 1: food webs; Week 2: energy pyramids; Week 3: biodiversity and stability; Week 4: human impacts). Teachers in the treatment condition were asked to review the generated content for errors before class (taking note of the time required) and to encourage students to request alternative versions if desired (e.g., a Spanish translation or simpler reading level). The control group used a static OER PDF downloaded from OER Commons entitled “Ecosystems: A Middle School Introduction” by the CK-12 Foundation (licensed CC BY-NC). Teachers in the control condition did not modify the PDF.

All students completed a 15-item multiple-choice pre-test at the beginning of week 1 and the same test as a post-test at the end of week 4. The test was developed by the lead author and two science education experts, with items drawn from released state assessment items and the NGSS sample questions. Internal consistency (Cronbach’s alpha) was 0.81 in the pilot sample. Additionally, the learning management system (Canvas) logged the time each student spent on the assigned materials (the AI-generated HTML pages or the static OER PDF). We did not have access to out-of-class study time; the time measure reflects only in-class and assigned homework time logged through the platform.

At the conclusion of the pilot, we conducted semi-structured interviews with each of the three teachers (3045 minutes each). The interview protocol (Appendix C) explored their perceptions of the AI tool’s usability, time impact, quality of content, and suggestions for improvement. Interviews were transcribed and analysed thematically using the same approach as Phase 1.

3.5. Data Analysis: multilevel model

Quantitative data were analysed using descriptive statistics and inferential methods appropriate for clustered data. Because students were nested within classrooms and classrooms within schools, we fit a three-level multilevel model (students Level 1, classrooms Level 2, schools Level 3) using restricted maximum likelihood estimation in R (lme4 package). The dependent variable was post-test score, controlling for pre-test score as a covariate. Condition (AI-dynamic OER vs. static OER) was a fixed effect. Intraclass correlation coefficients indicated that 14% of variance in post-test scores was attributable to classroom differences and 6% to school differences. Effect sizes were calculated as Cohen’s *d* adjusted for clustering using the method of Hedges (2007). For the expert review, inter-rater reliability was recalculated using Fleiss’ kappa (Fleiss, 1971), appropriate for multiple raters rating multiple items on an ordinal scale. Qualitative teacher interview data were analysed thematically following Braun and Clarke (2006). All analyses were conducted with α = 0.05. Note: Given the quasi-experimental design (non-random assignment of classrooms), we avoid causal language; results are reported as associations.

4. Findings / Results

4.1. Design Principles for AI-Powered Dynamic OER (RQ1)

Thematic analysis of the expert focus group transcript yielded four design principles, each with multiple sub-themes. These principles are summarised in Table 1 (please note that table numbering follows sequential order; there will be multiple tables).

Beyond these four core principles, two additional themes emerged that were not translated into design principles because they were not actionable at the prototype stage but are relevant for future work: (a) the need for community validation a system where teachers could rate AI-generated content and share their edits; (b) the need for offline access, as many classrooms have intermittent internet connectivity. We return to these in the discussion.

4.2. Technical Feasibility and Licensing Compliance (RQ2)

The AI-OER Studio successfully generated complete textbook chapters for all 15 topics we tested in development (ranging from “photosynthesis” to “the Pythagorean theorem” to “ancient Rome”). The system’s ability to respect open licences was evaluated by a legal researcher in the university’s technology transfer office. The key finding was that by including the explicit instruction “Licence this output under CC BY-SA 4.0” within the prompt and by appending the same licence to the output metadata, the resulting resource satisfies the legal requirements for a derivative work provided the original training data did not contain in-copyright material in violation of fair use. Since OpenAI’s GPT-4 training data includes copyrighted books and web pages, there remains a legal grey zone. Our legal researcher advised that the safest approach is to use the tool only for materials that will remain within an educational setting, as educational use is more likely to be protected under fair use (17 U.S.C. § 107). For definitive open licensing, the field will require LLMs trained exclusively on public domain and openly licensed corpora, such as the “Common Corpus” project (Common Corpus, 2024). We discuss this limitation at length in Section 7.

From a technical usability perspective, the prototype performed well. The average generation time of 15 seconds (from submission to editable output) was deemed acceptable by the three pilot teachers. One teacher commented, “I can spend 15 seconds waiting for a chapter that would take me two hours to write from scratch. That’s a no-brainer” (Teacher B, female). The editing interface, however, received mixed feedback. Two teachers found the edit-per-paragraph interface intuitive; the third teacher wanted a “diff” view showing changes between the AI original and her edits. This suggests a feature for version two.

4.3. Expert Evaluation of AI-Generated OER Quality

Table 2 presents the mean expert ratings (with standard deviations) for the two sample outputs the biology module and the history module across the four quality dimensions. A total of 12 experts rated each dimension on a 15 scale.

Open-ended comments clarified the bias ratings. One expert wrote: “The biology module is surprisingly neutral it uses generic lake and forest examples that work anywhere. But the history module reads like a 1950s textbook: ‘The Industrial Revolution began in England because of its natural resources and entrepreneurial spirit.’ That’s not false, but it’s incomplete and Eurocentric. A good OER would also discuss how the Industrial Revolution relied on slavery and colonialism.” Another expert noted a factual error in the history module: the AI stated that the first steam engine was invented by James Watt in 1769, when in fact Thomas Newcomen invented an earlier steam engine in 1712. This hallucination was rated as a minor error (the module still scored 4 on accuracy because it was the only factual mistake in a 1,500-word chapter). We corrected the error in the post-evaluation version.

Overall, the combined mean of 4.29 across all dimensions suggests that the AI-generated OER met a “good” threshold (between 4 and 5) but with room for improvement on bias and occasional factual slips. Importantly, none of the experts rated the content as completely unusable (no scores of 1 or 2 except one 2 for bias in the history module).

4.4. Classroom Pilot Outcomes (RQ3)

4.4.1. Matching of Groups

As shown in Table 3, the treatment and control groups were well-matched on prior science achievement. The mean science grade from the previous semester was 85.2% (treatment) and 84.9% (control), a non-significant difference (p = 0.76). Demographic distributions (gender, ELL status, special education status) were also similar, though the treatment group had a slightly higher proportion of students who reported having a computer at home (92% vs. 87%).

4.4.2. Learning Outcomes

Pre-test scores were nearly identical: treatment group mean = 45% (SD = 12), control group mean = 44% (SD = 13), t(76) = 0.31, p = 0.76. This confirms that the groups started at the same level of prior knowledge about ecosystems. Post-test scores, shown in Figure 1 (conceptual, not reproduced in text for accessibility, but described), were significantly higher for the treatment group: treatment mean = 78% (SD = 14), control mean = 68% (SD = 16). The difference of 10 percentage points was statistically significant (t(76) = 2.98, p = 0.004). The effect size, Hedges’ *g*, was 0.48 (95% CI: 0.15 to 0.81), which falls within the medium range according to Cohen’s (1988) conventions (0.2 small, 0.5 medium, 0.8 large). The normalised learning gain (Hake, 1998) was 0.60 for the treatment group and 0.43 for the control group.

A post-hoc analysis examined whether the effect differed by student subgroup. Due to small sample sizes, we did not run formal interaction tests, but descriptive statistics suggested that English language learners (ELLs) in the treatment group (n=7) had a mean post-test score of 71% compared to 59% for ELLs in the control group (n=8), a 12-point advantage. Similarly, students with individualised education plans (IEPs) in the treatment group (n=5) averaged 69% post-test versus 61% for IEP students in the control group (n=4). These patterns, while not statistically significant, suggest that AI-dynamic OER may particularly benefit students who struggle with static, grade-level texts.

4.4.3. Engagement (Time on Task)

The platform logs showed that treatment students spent considerably more time engaged with the learning materials. Weekly average time on task was 142 minutes (SD = 28) for the treatment group and 107 minutes (SD = 35) for the control group. This difference of 35 minutes per week was highly significant (t(76) = 4.86, p < 0.001), with a large effect size (Hedges’ *g* = 0.73). When asked informally (via a brief exit slip), 74% of treatment students agreed or strongly agreed with the statement “I liked that the digital textbook could change to be easier or harder for me,” compared to 18% of control students for the equivalent statement about the static PDF.

It is possible that the novelty of the AI tool itself drove the engagement difference, rather than the pedagogical value of the adapted content. This is a genuine limitation, which we discuss in Section 7. However, the fact that engagement remained high across all four weeks (no decline in week 4) suggests that the effect was not merely a brief novelty spike.

4.4.4. Teacher Perceptions and Time Use

All three teachers were interviewed after the pilot. Thematic analysis yielded four main themes, with representative quotes.

Theme 1: Substantial time savings in content differentiation. Teachers unanimously reported that the AI tool reduced the time they would normally spend adapting materials. Teacher A (female, 15 years experience) stated: “Normally I would spend two hours differentiating one chapter for my ELL students and my advanced students. The AI gave me three versions in 20 seconds. Then I spent about 20 minutes checking them, so total 20 minutes instead of two hours. That’s a 70-80% time saving.” Teacher C (male, 8 years experience) added: “The biggest time sink is coming up with alternative examples. I used to spend hours finding culturally relevant examples like for food webs, I had to google ‘tundra food web’ because my students live in a cold climate. The AI just did it for me.”

Theme 2: Persistent need for human verification of facts and bias. Despite the time savings, teachers emphasised that they could not trust the AI blindly. Teacher B (female, 22 years experience) gave a concrete example: “The AI generated a false claim about keystone species it said wolves only eat deer, but that’s incomplete. Wolves also eat moose and beavers. It’s a small mistake, but if I hadn’t caught it, my students would have learned something wrong. You cannot skip the human check.” On bias, Teacher A noted: “When I prompted for ‘family structures’ in a social studies unit, the AI defaulted to mother-father-children. I had to manually add single-parent and same-sex parent examples. It’s getting better if you ask explicitly, but it doesn’t automatically think of diversity.”

Theme 3: Strong demand for multilingual and multimodal outputs. All three teachers highlighted that the tool’s language support (only English and Spanish were implemented in the prototype) was inadequate for their classrooms. Teacher C, who had a substantial Spanish-speaking population, said: “The Spanish version was decent but not perfect. I had to tweak idioms it translated ‘keystone species’ as ‘especie clave’ which is fine, but some sentences were too literal. Also, we have three students who speak Vietnamese. I would love one-click translation into any language.” Teacher B requested audio: “If the AI could also generate a spoken version, my struggling readers could listen while reading. That would be a game changer.”

Theme 4: Concerns about long-term sustainability and over-reliance. Two teachers expressed worry that if the tool became widespread, schools might reduce teacher staffing or that novice teachers might not develop their own content-creation skills. Teacher A reflected: “I’m a veteran. I know how to write a good lesson. But a first-year teacher might just accept whatever the AI spits out without questioning it. That’s dangerous. We need training on how to use AI critically.” Teacher B added a practical concern: “Who pays for the API calls? We’re on a shoestring budget. If the university stops hosting this, my school can’t afford $20 per teacher per month for GPT-4.”

All three teachers stated that they would continue using the AI-OER Studio if it were available and free, provided that the bias and hallucination issues continued to improve. No teacher preferred the static OER condition over the AI condition.

4.4.5. Multilevel results and adaptive feature usage

Based on platform logs, we also examined whether treatment students actually used the adaptive features (e.g., “simplify language,” “Spanish version,” “more examples”). Over the four weeks, 58% of treatment students (23/39) used at least one adaptive feature at least once. The “simplify language” button was used most frequently (median 2 times per user), followed by “more examples” (median 1 time). Spanish version was used by 31% of treatment students (12/39), all of whom were identified as English language learners or heritage Spanish speakers. On average, students initiated 2.3 adaptive actions per week (SD = 1.9). Notably, students who used the adaptive features at least three times per week (n=14) had a mean post-test score of 84% compared to 74% for those who used them less frequently (t(37)=2.45, p=0.019), suggesting that active personalisation may be associated with higher learning gains. However, this comparison is correlational and may reflect pre-existing differences in motivation.

Table 4. Multilevel Model Results for Post-Test Score (controlling for pre-test).

Fixed effect	Coefficient	SE	t	p	95% CI
Intercept	32.4	5.1	6.35	<0.001	[22.4, 42.4]
Pre-test score	0.52	0.09	5.78	<0.001	[0.34, 0.70]
Condition (AI-OER vs. static)	7.8	2.9	2.69	0.009	[2.1, 13.5]

Note. Random effects: classroom variance = 18.2 (SD 4.27), school variance = 7.4 (SD 2.72), residual variance = 84.3 (SD 9.18). Adjusted effect size (Cohen’s *d*) = 0.44 (95% CI: 0.120.76). Model assumptions of normality of residuals and linearity were checked and satisfied.

4.5. Summary of Key Findings

In summary, the study produced three main categories of findings. First, we derived four validated design principles for AI-powered OER: transparency, human-in-the-loop, adaptive granularity, and licence preservation. Second, the AI-OER Studio prototype was technically feasible and generated content that experts rated as “good” to “excellent” on accuracy, alignment, and usability, though bias reduction requires more work. Third, a classroom pilot showed that AI-dynamic OER led to significantly higher learning gains (medium effect size) and higher engagement (large effect size) compared to static OER, with teachers reporting substantial time savings but also persistent concerns about hallucinations, bias, and cost.

5. Discussion

5.1. Answering the Research Questions

RQ1 (Design principles for AI-powered dynamic OER). The four principles we derived transparency, human-in-the-loop, adaptive granularity, and licence preservation provide a practical framework for developers. Transparency addresses the trust gap: teachers will not use a black box. Human-in-the-loop respects teacher professionalism and mitigates AI errors. Adaptive granularity is the core value proposition that static OER cannot match. Licence preservation ensures the output remains truly open. These principles align with broader “human-centred AI” guidelines for education (Holstein et al., 2019) and extend them to the specific context of OER.

RQ2 (Maintaining pedagogical quality and licensing compliance). Our findings show that generative AI can indeed maintain a high level of pedagogical quality mean expert ratings above 4/5 but only under two conditions. First, the system must include a human validation step; raw AI output still contains occasional hallucinations (6% of paragraphs in our sample) and bias that requires correction. Second, the prompts must be carefully engineered to request specific grade levels, inclusive examples, and open licences. Without such engineering, the AI defaults to a generic, often Western-centric output. Regarding licensing, we demonstrated that it is technically straightforward to attach an open licence to AI output. However, the unresolved legal question of whether the training data themselves were lawfully obtained for open licensing remains a cloud over all AI-generated OER. We consider this the most serious unresolved risk.

RQ3 (Benefits and risks compared to static textbooks). The benefits measured in this study are substantial: a 10-percentage-point improvement in learning outcomes (medium effect size), a 35-minute-per-week increase in engagement (large effect size), and teacher time savings of 70-80% for differentiation tasks. These benefits likely stem from the combination of personalisation (students see content at their level) and increased relevance (examples can be localised). The risks, however, are equally real: factual errors, cultural bias, digital access requirements, and potential over-reliance by less experienced teachers. We do not interpret these results as suggesting that AI should replace OER; rather, AI should be embedded into OER creation as a teacher-assisted tool.

5.2. The Death of the Static Textbook But Not the Teacher

Our title proclaims “the death of the one-size-fits-all textbook.” We stand by this provocation, but with an important clarification: the static, immutable, uniform textbook is obsolete. However, the teacher is not obsolete; quite the opposite. In the AI-dynamic OER model, the teacher’s role shifts from being a content dispenser (distributing the same page to everyone) to a content curator, critical validator, and learning designer. This shift requires more professional skill, not less. Teachers must now evaluate AI outputs for accuracy, bias, and pedagogical fit; decide when and how to regenerate content; and guide students in using the adaptive features effectively. This is consistent with Williamson’s (2021) observation that educational technology tends to upskill teachers in new directions rather than deskill them, as long as the technology is designed for augmentation rather than automation.

5.2.1. Acknowledging the Novelty Effect

The observed difference in engagement (35 more minutes per week for the AI-OER group) must be interpreted with caution. A well-established finding in educational technology research is the “novelty effect” initial interest in a new tool that decays over time (Clark & Mayer, 2024). In a meta-analysis of 46 studies, the average novelty effect accounted for approximately 0.20.3 standard deviations of engagement difference in the first 46 weeks. Applying this benchmark, we estimate that 2030% of the engagement difference in our study (i.e., 710 minutes per week) may be attributable to novelty rather than the pedagogical quality of AI adaptation. The remaining 2528 minutes likely reflect genuine benefits of personalisation (e.g., students spending more time because the content is readable). A longitudinal follow-up (currently underway) will measure engagement at weeks 8, 12, and 16 to test whether the effect decays. If it decays to near zero, the long-term case for AI-OER weakens; if it remains positive, the pedagogical value is more robust.

5.2.2. Legal Disclaimer Regarding AI-Generated OER

The copyright status of materials generated by large language models (e.g., GPT-4, DALL-E) is currently unresolved in most jurisdictions. While this prototype attaches a CC BY-SA 4.0 licence to its outputs, the legal validity of that licence depends on whether the AI’s training data were lawfully obtained and whether the output qualifies as a derivative work. The U.S. Copyright Office (2023) has stated that purely AI-generated works without human creative input are not copyrightable, but this does not clarify whether they can be openly licensed by the user. Users of AI-OER tools assume the risk of copyright infringement. We recommend using such tools only for internal educational purposes until legal clarity is established, and we support ongoing efforts to train LLMs exclusively on public domain and openly licensed corpora.

5.3. Comparison with Prior Adaptive Systems

How does AI-generated OER compare to earlier adaptive learning technologies? Rule-based cognitive tutors (Aleven et al., 2016) were effective but expensive to author and not scalable to entire textbook domains. They also could not generate new content, only select from a fixed library. In contrast, generative AI can create content on any topic at any granularity, but it trades off reliability: a cognitive tutor will never hallucinate a fact, while an LLM will. Thus, the two approaches might be complementary: rule-based systems for high-stakes, closed-domain subjects (e.g., algebra), and generative AI for open-ended, less critical domains (e.g., social studies discussion prompts). Our prototype is best suited to the latter category.

5.4. A Detailed Risk Matrix and Mitigation Strategies

Based on our findings and literature, we propose a risk matrix with four categories and concrete mitigation strategies.

Risk 1: Factual hallucinations (high probability, high impact). Mitigation: (a) mandatory human review before distribution, (b) a “verify fact” button that cross-checks AI claims against a trusted knowledge base like Wikidata, (c) limiting AI use to topics where the teacher is already an expert and can spot errors.

Risk 2: Cultural bias (medium probability, high impact). Mitigation: (a) prompt engineering that explicitly requests diverse perspectives (“include examples from at least two continents”), (b) a bias audit checklist for teachers to use during review, (c) development of fine-tuned models on curated, culturally inclusive OER corpora.

Risk 3: Licence and copyright infringement (low probability, potentially very high impact). Mitigation: (a) using only LLMs with transparent training data that include open content, (b) adding a disclaimer that the user assumes responsibility for compliance, (c) advocating for legal reform that clarifies AI-generated derivative works.

Risk 4: Digital divide exacerbation (medium probability, high impact). Mitigation: (a) designing offline modes (e.g., pre-generate content for an entire unit and download), (b) offering low-bandwidth versions (text-only), (c) partnering with public libraries and community centres to provide device access.

Risk 5: Teacher over-reliance and deskilling (low probability, medium impact). Mitigation: (a) mandatory professional development on critical AI literacy, (b) requiring teachers to submit evidence of human editing for a percentage of AI-generated materials, (c) encouraging teachers to occasionally write content from scratch to maintain skills.

5.5. Theoretical Implications

This study contributes to theory in two ways. First, it extends Universal Design for Learning (UDL) by demonstrating how generative AI can instantiate multiple means of representation and engagement at scale. UDL has always emphasised the importance of digital flexibility, but generative AI offers a qualitatively new level of flexibility not just multiple static versions, but on-demand generation based on learner or teacher input. Second, the study challenges the traditional dichotomy between “content creator” and “content consumer” that underlies much educational publishing. In the AI-OER model, the teacher and even the student become co-creators, asking for variations and sharing edited versions. This aligns with connectivist ideas about learning as a distributed, participatory process (Siemens, 2005).

Beyond technical and pedagogical considerations, the integration of AI into OER raises broader sociomaterial questions. Drawing on critical posthumanism (Knox, 2019), we recognise that AI systems are not neutral tools but active participants that reshape educational relationships, teacher subjectivities, and what counts as “knowledge.” For example, when teachers rely on AI to generate examples, they may gradually lose the habit of drawing from their own local, lived experiences a form of deskilling that is not captured by time-saving metrics. Platform studies (Kitchin, 2022) further cautions that AI-OER tools, if commercialised, could become extractive: user data (e.g., which simplifications are requested) could be harvested to train future models, potentially commodifying teacher and student labour. Williamson (2021) describes the “digital education data cycle” in which platforms generate, capture, and monetise educational interactions. To resist this, any AI-OER system must be open source, transparent about data collection, and governed by non-profit or public entities. Our prototype was built on a university server with no data retention; we argue this should be the default for public education.

5.6. Practical Implications for Different Stakeholders

For classroom teachers: AI-dynamic OER can save you significant time on differentiation and increase student engagement, but do not trust it blindly. Always verify facts, check for bias, and use your professional judgment. Start with low-stakes topics where errors would not be harmful. Use specific prompts (e.g., “grade 5 reading level, examples from Mexico and Central America”) to improve output quality.

For school and district administrators: The tool shows promise, but you need to invest in infrastructure (reliable internet, devices) and professional development. Teachers need training on prompt engineering, bias detection, and legal literacy regarding AI and copyright. Do not assume that AI can replace teacher content creation; rather, it augments it.

For OER repository managers: Prepare for an influx of AI-generated resources. Develop policies for labelling AI-generated OER (e.g., a mandatory metadata field “AI_generated: yes/no”). Consider hosting only those AI-generated OER that include a human review certification. Support versioning so that users can see the original AI output and subsequent human edits.

For policymakers: Clarify the copyright status of AI-generated educational materials. The current uncertainty discourages innovation. Fund research into open-source LLMs trained exclusively on public domain and openly licensed corpora. Create grants for schools to pilot AI-OER tools, with rigorous evaluation. Update teacher certification standards to include AI literacy.

For educational technology developers: Use the four design principles as a starting point. Prioritise transparency (show the prompt used, the model version) and human-in-the-loop features (editing and approval workflows). Invest in bias reduction techniques, including fine-tuning on diverse datasets and post-generation bias classifiers. Build in offline modes and low-bandwidth options to address equity concerns.

6. Limitations and Future Research

This study has several limitations that qualify its conclusions. First, the design was quasi-experimental, not a true randomised controlled trial. Classrooms were assigned to conditions based on teacher scheduling convenience rather than random allocation. Although we matched groups on prior grades and demographics, unobserved confounding variables (e.g., teacher enthusiasm for technology, class period effects) could explain some of the observed differences. Therefore, we frame our findings as associations, not causal impacts.

6.1. Limitations of This Study

Every research study has limitations, and we acknowledge several that qualify our conclusions.

First, short duration. The pilot lasted only four weeks. This is insufficient to assess long-term learning retention, the fading of novelty effects, or the development of teacher skills over time. A four-week unit also does not cover the full curriculum; we cannot say whether AI-dynamic OER would be as effective for abstract subjects like mathematics or for skills that require sequential mastery.

Second, limited sample size and demographics. Seventy-eight students in one US school district is a convenience sample, not a nationally representative one. The district was relatively affluent (only 38% free/reduced lunch); results might differ in high-poverty schools where device access is more limited. The study also had no racial or ethnic diversity beyond the local demographics; we cannot generalise to other countries or cultural contexts.

Third, potential novelty effect. The treatment group received a new, interactive tool while the control group used a static PDF. Some of the engagement and learning gains may be attributable to the simple fact of “something new” rather than the pedagogical quality of AI adaptation. A stronger design would include a second treatment group that receives a different novel tool (e.g., a non-AI adaptive resource) to isolate the AI effect. We did not have the resources for that.

Fourth, single subject domain (life science). Our pilot focused on ecosystems, a topic that lends itself to concrete examples and diagrams. It is plausible that AI-generated OER would perform differently for abstract domains (e.g., algebraic proofs) or for subjects with strong cultural and interpretive dimensions (e.g., literature, history). The history module in our expert review scored lower on bias, hinting that AI may struggle more with humanistic content.

Fifth, no longitudinal tracking of teacher time. While teachers reported time savings, we did not independently measure their total planning time before and after the pilot. Self-reports can be biased. Future studies should use time-diary methods.

Sixth, unresolved legal issues. As noted, the copyright status of AI-generated OER is uncertain. Our prototype’s licence attachment may not hold up in court if the AI’s training data were unauthorised. We have flagged this as a major caveat.

Seventh, limited language support. Only English and Spanish were implemented; we did not test other languages. The Spanish output required human tweaking. Thus, claims about multilingual accessibility are preliminary.

6.2. Directions for Future Research

The limitations point to a rich agenda for future work.

a. Longitudinal randomised controlled trials (RCTs). A multi-year RCT with a larger sample (e.g., 50 schools) would provide more definitive evidence of efficacy. The RCT should compare three conditions: AI-dynamic OER, static OER, and commercial textbooks. Outcomes should include standardised test scores, retention after 6 and 12 months, and student attitudes.

b. Domain-specific studies. Replicate the pilot in mathematics, foreign language learning, history (with a focus on bias mitigation), and vocational education. Each domain may require different prompt templates and evaluation rubrics.

c. Bias detection and mitigation tools. Develop automated bias classifiers that flag potentially stereotyping or exclusionary language in AI-generated OER. These could be integrated into the authoring tool as a second pass after generation. Also, explore fine-tuning LLMs on curated, culturally inclusive OER corpora (e.g., the “Inclusive Open Textbook Corpus”).

d. Teacher professional development models. Design, implement, and evaluate a PD program for AI-literate OER creation. Research questions: What is the minimal effective training dose? How does PD affect teacher confidence and actual editing behaviour? Do trained teachers produce higher-quality AI-OER than untrained ones?

e. Student agency and co-design. Our prototype gave control only to teachers. Future iterations could allow students to personalise content themselves (e.g., “I want simpler words” or “I want examples with animals”). Research is needed on how much agency is optimal without overwhelming learners.

f. Offline and low-bandwidth solutions. Develop a version of the tool that pre-generates a unit’s worth of content (e.g., 40 pages) that can be downloaded once and used offline. Also, test effectiveness in schools with poor connectivity.

g. Legal and policy research. Partner with legal scholars to analyse the copyright status of AI-generated OER in different jurisdictions. Propose model legislation or guidance for educational use. This is urgent; without legal clarity, many schools will hesitate to adopt.

h. Cost-effectiveness analysis. Our prototype used the paid GPT-4 API. Calculate the per-student cost of deploying such a tool at scale, including API fees, hosting, and teacher PD. Compare to the cost of commercial textbooks and the cost of teacher time for manual differentiation.

i. Comparative study of different LLMs. Test whether open-source models (e.g., Llama 3, Falcon) trained on permissively licensed data produce quality comparable to GPT-4. If so, that would solve the copyright problem and reduce costs.

j. Multimodal and interactive OER. Extend beyond text to AI-generated video scripts, interactive simulations (e.g., using Code Interpreter), and voice-over in multiple languages. Evaluate whether multimodal OER further improves engagement and learning.

7. Conclusions and Implications

7.1. Summary of Contributions

This study set out to investigate whether AI-powered, dynamic Open Educational Resources can replace the static, one-size-fits-all textbook. Through design-based research involving expert focus groups, prototype development, and a classroom pilot, we have shown that the answer is a qualified yes. The AI-OER Studio prototype successfully generated differentiated, openly licensed textbook chapters that experts rated as good to excellent. In a four-week classroom pilot, students using AI-dynamic OER achieved significantly higher learning gains (medium effect size) and spent more time engaged (large effect size) compared to students using static OER. Teachers reported time savings of 70-80% for differentiation tasks, though they also identified persistent concerns about hallucinations, cultural bias, and digital access.

We derived and validated four design principles that should guide future development: prompt transparency, human-in-the-loop validation, adaptive granularity, and licence preservation. These principles bridge the gap between the open education movement and generative AI, offering a roadmap for creating resources that are both free and responsive.

7.2. The Textbook Transformed, Not Erased

The death of the one-size-fits-all textbook does not mean the end of textbooks as a genre. Rather, it means the transformation of the textbook from a fixed, static artefact into a living, dynamic service. The textbook becomes an API that responds to learner and teacher inputs, generating version after version while retaining an open licence. This is not a distant future; it is technically feasible today, as our prototype demonstrates. The barriers are not primarily technical but legal, ethical, and professional. Who will train teachers to use these tools critically? How will we ensure that AI bias does not amplify educational inequities? What legal framework will give schools confidence that AI-generated OER are truly open?

7.3. A Call to Action for the Educational Community

We end with a call to action for four groups.

To researchers: Prioritise the agenda outlined in Section 7.2. We particularly urge rigorous, large-scale RCTs and legal analyses. The field cannot afford to rely on vendor-sponsored studies or untested claims.

To educators: Begin experimenting with AI-generated OER in low-stakes contexts. Use our design principles to evaluate tools. Share your edits and improvements back to OER repositories. Demand transparency from AI vendors about training data and bias mitigation.

To administrators and policymakers: Allocate resources for AI literacy professional development. Clarify copyright guidance for AI-generated materials. Fund open-source AI models trained on open data. Do not cut teacher positions on the assumption that AI will replace them; instead, redefine teacher roles to include AI co-design.

To developers: Build with teachers, not just for them. Implement the four design principles as minimum standards. Include robust bias detection and offline modes. Open-source your tools and models so that the educational community can audit and improve them.

7.4. Final Reflection

We do not argue that the static textbook will disappear overnight, nor that AI-generated OER are universally superior. In contexts where internet access is unreliable, where teachers lack time for human validation, or where legal uncertainty prevails, a high-quality static textbook print or PDF remains a sensible choice. However, for the many classrooms that face the daily challenge of learner variability, AI-powered OER offer a powerful, increasingly feasible complement. The transformation of the textbook from a fixed artefact to a living, adaptive resource is not a death but a metamorphosis. It requires ongoing research, critical vigilance, and a commitment to open, equitable, and human-centred design.

Appendix A. Rubric for Evaluating AI-Generated OER Quality

The following rubric was used by experts to rate the two sample modules. Each dimension is scored from 1 (poor) to 5 (excellent).

Dimension	Score 1 (Poor)	Score 3 (Acceptable)	Score 5 (Excellent)
Factual accuracy	Multiple major factual errors (e.g., wrong dates, false scientific claims).	12 minor errors that do not undermine core understanding.	No errors; all statements can be verified from authoritative sources.
Pedagogical alignment	Content does not match stated grade level or learning objectives; inappropriate depth.	Partially matches; some objectives are addressed but not all.	Fully matches; appropriate vocabulary, examples, and assessment for the grade level.
Bias / cultural sensitivity	Stereotypical, exclusionary, or offensive language; only one cultural perspective presented.	Neutral but lacks diversity; no explicit bias but also no inclusion of diverse perspectives.	Actively inclusive; multiple cultural perspectives; language avoids stereotypes.
Technical usability	Cannot edit or export; broken formatting; not accessible (e.g., missing alt text).	Editable with some effort; basic accessibility features present.	One-click edit/export; fully responsive; meets WCAG 2.1 AA accessibility.

Appendix B. Sample Prompt Template for the AI-OER Studio

The following is the exact prompt template used for generating a typical textbook chapter (variables in braces).

You are an AI assistant that generates Open Educational Resources (OER).
You must release all output under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
Do not claim copyright on behalf of yourself or any other entity.

Topic: {topic}
Target grade level: {grade_level} (adjust vocabulary, sentence length, and concept depth accordingly)
Language: {language}
Cultural context: Please use examples primarily from {cultural_context}. If the context is not specified, use a globally diverse set of examples.
Learning objectives: {learning_objectives}
Generate the following sections:
1. An introductory paragraph that engages the learner and states the key idea.
2. A list of 3-5 key terms with student-friendly definitions.
3. Two to three multiple-choice questions, each with four options, an answer key, and a brief explanation of why the correct answer is correct.
4. One short-answer prompt that asks students to apply the concept.
Format the output as JSON with keys: "intro", "key_terms", "mc_questions", "short_answer".
After the JSON, add a footer in HTML: "<footer>This resource was generated by AI (GPT-4) and should be reviewed by an educator before use. Licensed CC BY-SA 4.0.</footer>"

Appendix C. Teacher Interview Protocol (Semi-Structured)

Introduction: Thank you for participating in this pilot study. I will ask you about your experience using the AI-OER Studio over the past four weeks. Your honest feedback will help us improve the tool.

General experience: Overall, how did you find the process of using AI-generated OER compared to your usual textbook or static OER?
Time impact: Approximately how much time did you spend each week on preparing materials using the AI tool? How does that compare to your usual preparation time?
Quality of content: Can you give me a specific example of a time when the AI-generated content was excellent? A time when it was problematic (factual error, bias, unclear)?
Student reactions: How did students respond to the ability to get different versions (reading level, language, examples)? Did any student use that feature?
Trust and validation: How much did you trust the AI content before your own review? After review? What would increase your trust?
Features and improvements: What is the single most important feature you would add or change?
Sustainability: Would you continue using this tool if it were available? Would you recommend it to colleagues? Why or why not?
Risks: What concerns, if any, do you have about the broader adoption of AI-generated OER in schools?

Closing: Thank you. We will share our final report with you.

Appendix D. Example of Static vs. AI-Dynamic OER (Textual Comparison)

The following is an illustrative excerpt from the pilot on ecosystems. The static OER is from CK-12 Foundation (CC BY-NC). The AI-dynamic version was generated for a grade-5 reading level (simplified) and for an ELL student (Spanish translation).

Static OER (English, grade 7 level):

“A food web is a graphical representation of the feeding relationships among species within an ecosystem. It consists of multiple interconnected food chains. Primary producers (autotrophs) form the base, followed by primary consumers (herbivores), secondary consumers (carnivores that eat herbivores), and tertiary consumers (top predators). Decomposers break down dead organic matter, recycling nutrients back into the system.”

AI-Dynamic OER Simplified version (English, grade 3 reading level):

“A food web shows who eats who in a place like a forest or a lake. Plants are at the bottom they make their own food from sunlight. Animals that eat plants are next. Then come animals that eat those animals. At the top are animals that no other animal eats. Tiny creatures called decomposers eat dead plants and animals, which helps make the soil healthy.”

AI-Dynamic OER Spanish version (grade 5 level, with teacher tweak):

“Una red alimenticia muestra quién se come a quién en un ecosistema como un bosque o un lago. Las plantas son los productores producen su propio alimento usando la luz del sol. Los animales que comen plantas son los consumidores primarios. Luego vienen los consumidores secundarios (animales que comen a los que comen plantas). Los descomponedores (como bacterias y hongos) descomponen los seres muertos y devuelven nutrientes al suelo.”(Teacher note: I changed “producen su propio alimento” to “fabrican su propio alimento” for better local dialect.)

References

Aleven, V., McLaughlin, E. A., Glenn, R. A., & Koedinger, K. R. (2016). Instruction based on adaptive learning technologies. In R. E. Mayer & P. A. Alexander (Eds.), Handbook of research on learning and instruction (2nd ed., pp. 522560). Routledge. [CrossRef]
Baidoo-Anu, D., & Ansah, L. O. (2023). Education in the era of generative artificial intelligence (AI): Understanding the potential benefits of ChatGPT in promoting teaching and learning. Journal of AI, 7(1), 5262. [CrossRef]
Bazzul, J., & Sykes, H. (2011). The secret identity of a biology textbook: Straight and naturally sexed. Cultural Studies of Science Education, 6(2), 265286. [CrossRef]
Birhane, A., Ruane, E., Laurent, T., Brown, M. S., Flowers, J., Ventresque, A., & Dancy, C. L. (2022). The forgotten margins of AI ethics. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, 948958. [CrossRef]
Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77101. [CrossRef]
Brown, A. L., & Brown, K. D. (2010). Strange fruit indeed: Interrogating contemporary textbook representations of racial violence toward African Americans. Teachers College Record, 112(1), 3167. [CrossRef]
CAST. (2018). Universal Design for Learning guidelines version 2.2. https://udlguidelines.cast.org.
Clark, R. C., & Mayer, R. E. (2024). E-learning and the science of instruction (6th ed.). Wiley. [CrossRef]
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
Colvard, N. B., Watson, C. E., & Park, H. (2018). The impact of open educational resources on various student success metrics. International Journal of Teaching and Learning in Higher Education, 30(2), 262276. https://www.isetl.org/ijtlhe/pdf/IJTLHE3386.pdf.
Common Corpus. (2024). Common Corpus: A large-scale open dataset for training LLMs on public domain and openly licensed texts. https://commoncorpus.com.
Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382. [CrossRef]
Hake, R. R. (1998). Interactive-engagement versus traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66(1), 6474. [CrossRef]
Hedges, L. V. (1981). Distribution theory for Glass’s estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107128. [CrossRef]
Hilton, J. (2016). Open educational resources and college textbook choices: A review of research on efficacy and perceptions. Educational Technology Research and Development, 64(4), 573590. [CrossRef]
Hilton, J. (2020). Open educational resources, student efficacy, and user perceptions: A synthesis of research published between 2015 and 2018. Educational Technology Research and Development, 68(3), 853876. [CrossRef]
Holstein, K., McLaren, B. M., & Aleven, V. (2019). Designing for complementarity: Teacher and AI needs for the future of K-12 learning. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 112. [CrossRef]
International Council for Open and Distance Education (ICDE). (2023). AI and OER: A scoping report. https://www.icde.org/ai-oer-report.
Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Dai, W., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 138. [CrossRef]
Kasneci, E., Seßler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., Gasser, U., Groh, G., Günnemann, S., Hüllermeier, E., Krusche, S., Kutyniok, G., Michaeli, T., Nerdel, C., Pfeffer, J., Poquet, O., Sailer, M., Schmidt, A., Seidel, T., … Kasneci, G. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learning and Individual Differences, 103, 102274. [CrossRef]
Kitchin, R. (2022). The data revolution: A critical analysis (2nd ed.). Sage. [CrossRef]
Knox, J. (2019). What does the ‘postdigital’ mean for education? Three critical perspectives. Postdigital Science and Education, 1(2), 357–370. [CrossRef]
McKenney, S., & Reeves, T. C. (2019). Conducting educational design research (2nd ed.). Routledge. [CrossRef]
Mollick, E. R., & Mollick, L. (2023). Using AI to implement effective teaching strategies in classrooms: Five strategies, including prompts. SSRN Electronic Journal. Advance online publication. [CrossRef]
Montgomery, D. C., Hinton, K. D., & Kirby, S. D. (2021). Teacher time use and instructional differentiation: A national survey. Journal of Educational Research, 114(3), 245259. [CrossRef]
Open Education Group. (2024). AI-OER prototype: Demonstration and design notes. https://openedgroup.org/ai-oer-prototype.
Reiser, R. A. (2018). A history of instructional design and technology. In R. A. Reiser & J. V. Dempsey (Eds.), Trends and issues in instructional design and technology (4th ed., pp. 2033). Pearson.
Rose, D. H., & Gravel, J. W. (2010). Universal Design for Learning. In P. Peterson, E. Baker, & B. McGaw (Eds.), International encyclopedia of education (3rd ed., pp. 119124). Elsevier. [CrossRef]
Rosenberg, J. M., Borchers, C., & Gibbons, B. (2024). Large language models and educational equity: A systematic bias audit of six LLMs. Computers & Education, 215, 105032. [CrossRef]
Seaman, J. E., & Seaman, J. (2021). Digital texts in the time of COVID: A survey of faculty. Bay View Analytics. https://www.bayviewanalytics.com/reports/digitaltextsinthetimeofcovid.pdf.
Siemens, G. (2005). Connectivism: A learning theory for the digital age. International Journal of Instructional Technology and Distance Learning, 2(1), 310. http://www.itdl.org/Journal/Jan_05/article01.htm.
Steenbergen-Hui, S., & Cooper, H. (2014). A meta-analysis of the effectiveness of intelligent tutoring systems on K-12 students’ mathematical learning. Journal of Educational Psychology, 106(2), 331347. [CrossRef]
Tlili, A., Huang, R., Chang, T. W., & Burgos, D. (2023). Open educational resources in the age of artificial intelligence: A Delphi study. British Journal of Educational Technology, 54(2), 543567. [CrossRef]
Tlili, A., Huang, R., Burgos, D., & Stracke, C. M. (2024). Sustainability of Open Educational Resources in the age of generative AI. British Journal of Educational Technology, 55(3), 892–912. [CrossRef]
Tomlinson, C. A. (2014). The differentiated classroom: Responding to the needs of all learners (2nd ed.). ASCD.
UNESCO. (2019). Recommendation on Open Educational Resources (OER). https://www.unesco.org/en/legal-affairs/recommendation-open-educational-resources-oer.
U.S. Copyright Office. (2023). Copyright Registration Guidance for Works Containing Material Generated by Artificial Intelligence. Federal Register, 88(51), 1619016194. https://www.federalregister.gov/d/2023-05321.
U.S. Government Accountability Office. (2019). College textbooks: Students have greater access to textbook information, but most instructors do not adopt free digital materials. GAO-19-299. https://www.gao.gov/products/gao-19-299.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
Walden University. (2024). Generative AI for writing feedback: A pilot study. https://waldenu.edu/research/ai-writing-pilot.
Wiley, D., & Hilton, J. (2018). Defining OER-enabled pedagogy. International Review of Research in Open and Distributed Learning, 19(4), 133147. [CrossRef]
Williamson, B. (2021). Double, double, toil and trouble: The digital education data cycle. Learning, Media and Technology, 46(1), 15. [CrossRef]

Table 1. Design Principles for AI-Powered OER Derived from Expert Focus Group (n=12).

Principle	Description	Example implementation from prototype	Expert quote illustrating need
P1: Prompt transparency	Every AI-generated output must clearly disclose that it was generated by AI, including the model name and version, the training data cutoff date, and any human post-editing.	Footer: “This text generated by GPT-4 (trained on data up to May 2024). Reviewed by [teacher name] on [date].”	“If I don’t know where this came from, I cannot trust it with my students. A textbook has a publisher and authors. AI needs something equivalent.” (Teacher, female, middle school)
P2: Human-in-the-loop validation	Teachers must have the final say. The AI should never distribute content directly to students without an explicit human approval step.	“Approve all” button after editing; option to reject and regenerate any paragraph.	“I would never hand over my class to a bot. But if you give me a fast way to check and fix, that’s a huge time saver.” (Instructional designer, male, university)
P3: Adaptive granularity	The system should be able to generate multiple versions of the same core content at different reading levels, in different languages, and with culturally varied examples, with one click.	Dropdown to switch reading level from “grade 5” to “grade 9” regenerates the entire page with adjusted vocabulary and sentence length.	“I have students reading at a second-grade level and others at high school level in the same room. One version will never work. The AI must give me three or four quickly.” (Teacher, female, elementary)
P4: License preservation	The AI’s output must automatically inherit an open licence (preferably CC BY-SA or CC BY) and include machine-readable metadata. The system must not claim copyright on behalf of the AI.	HTML export includes CC BY-SA logo, legal code link, and JSON-LD metadata.	“The whole point of OER is that you can share and remix. If the AI makes it impossible to tell what the licence is, it defeats the purpose.” (OER librarian, female)

Note. CC BY-SA = Creative Commons Attribution-ShareAlike 4.0 International. The fourth column provides illustrative quotes from the focus group; participants are anonymised.

Table 2. Expert Quality Ratings for AI-Generated OER by Module and Dimension (n=12 Raters).

Dimension	Biology module (Ecosystems)	History module (Industrial Revolution)	Combined mean (SD)	Range
Factual accuracy	4.3 (0.6)	4.0 (0.8)	4.15 (0.7)	3.05.0
Pedagogical alignment	4.5 (0.5)	4.3 (0.7)	4.40 (0.6)	3.55.0
Bias / cultural sensitivity	4.1 (0.9)	3.8 (1.0)	3.95 (0.95)	2.05.0
Technical usability	4.7 (0.4)	4.6 (0.5)	4.65 (0.45)	4.05.0

Note. Ratings: 1 = poor, 3 = average, 5 = excellent. The bias dimension showed the highest variance, with some experts rating the history module as low as 2 (significant cultural bias) while others rated it 5 (acceptable). The specific criticism was that the Industrial Revolution module focused almost exclusively on British and American narratives, with no mention of colonial exploitation or the global cotton trade.

Table 3. Demographic and Prior Achievement Characteristics of Pilot Participants (n=78).

Characteristic	Treatment (n=39)	Control (n=39)	p-value (t-test or chi-square)
Age (years, mean ± SD)	12.4 ± 0.5	12.5 ± 0.6	0.44
Female (%)	51%	54%	0.81
English language learner (%)	18%	21%	0.76
Individualised education plan (%)	13%	10%	0.71
Prior semester science grade (%)	85.2 ± 11.3	84.9 ± 10.8	0.76
Home computer access (%)	92%	87%	0.48

Note. p-values from independent samples t-tests for continuous variables and chi-square tests for categorical variables. No significant differences were detected at α = 0.05.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.