Who Is AI in Education Actually Built For?

A critical look at adaptive learning systems, language bias, and what it will take to build EdTech that works for everyone.
By 2026, the numbers look impressive on the surface. Global student AI usage climbed from 66% in 2024 to 92% in 2025 [1]. Adaptive learning systems report up to 42% improvement in learning outcomes. AI tutors outperform traditional in-class instruction in randomized controlled trials. The narrative is optimistic: AI is democratizing access to world-class education.
But aggregate numbers obscure who is actually benefiting — and who is being left further behind.
A 2025 rapid review published in Education Sciences put it plainly: while AI technologies such as adaptive learning platforms, intelligent tutoring systems, and predictive analytics are increasingly adopted, their primary aim remains institutional efficiency rather than fostering equity [2]. Initiatives explicitly designed to support underrepresented students are rare.
This post examines why that gap exists, what the technical root causes are, and what researchers and developers can do about it.
The Three Layers of the Problem
1. The Data Representation Problem
Most large-scale educational AI systems are trained on data generated by students in high-income, English-speaking countries. This creates a compounding disadvantage for learners from underrepresented backgrounds — not because the models are explicitly biased, but because they have never encountered their linguistic and cultural context in training.
Consider automatic speech recognition (ASR) in education. Tools built on Wav2Vec2 or Whisper show significantly higher word error rates for African-accented English compared to American or British English — often two to three times higher on the same model. For a student in Lagos or Nairobi using a voice-based tutoring system, this is not a minor inconvenience. It is a fundamental barrier to access.
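The accent gap described above is measurable with a standard metric: word error rate (WER), reported per accent group rather than averaged across all users. The sketch below is a minimal, self-contained WER implementation; the accent labels and transcripts are illustrative placeholders, not real ASR output.

```python
# Minimal word error rate (WER) computation for comparing ASR
# performance across accent groups. Transcripts and accent labels
# below are illustrative placeholders, not real model output.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Group transcripts by accent and report per-group WER, not one average.
samples = {
    "en-US": [("the cell divides into two", "the cell divides into two")],
    "en-NG": [("the cell divides into two", "the sell divide into two")],
}
for accent, pairs in samples.items():
    scores = [wer(r, h) for r, h in pairs]
    print(accent, round(sum(scores) / len(scores), 2))
```

Reporting the metric this way makes a two-to-three-times WER gap between accent groups visible immediately, where a single pooled number would hide it.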
The same pattern extends to NLP broadly. Despite advances in multilingual NLP and machine translation, African languages remain underrepresented due to data scarcity, tokenization inefficiencies, and structural bias in model training [3]. Standard tokenization pipelines developed for English often fragment African-language morphology incorrectly, significantly degrading downstream task performance.
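One common way to quantify this fragmentation is tokenizer "fertility": the average number of subword tokens produced per word. The sketch below uses a toy greedy tokenizer as a stand-in for a real subword tokenizer such as BPE; the vocabulary and the isiZulu-style example sentence are illustrative only.

```python
# Tokenizer "fertility": average subword tokens per whitespace word.
# Markedly higher fertility for one language than another on the same
# tokenizer is a quick signal of representational imbalance.
# `toy_tokenize` is a stand-in for a real subword tokenizer (e.g. BPE);
# the vocabulary and sentences are illustrative only.

def toy_tokenize(word: str, vocab: set) -> list:
    """Greedy longest-match segmentation against a subword vocabulary,
    falling back to single characters when nothing matches."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

def fertility(sentence: str, vocab: set) -> float:
    words = sentence.split()
    total = sum(len(toy_tokenize(w, vocab)) for w in words)
    return total / len(words)

# A vocabulary skewed toward English whole words shatters
# morphologically rich words into many tiny pieces.
vocab = {"the", "school", "children", "learn", "ing"}
print(fertility("the children learn", vocab))   # low fertility
print(fertility("abantwana bayafunda", vocab))  # high fertility
```

A real audit would run an actual production tokenizer over parallel corpora, but the shape of the measurement is the same: per-language fertility, compared side by side.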
Masakhane — a community-driven research initiative bringing together over 400 researchers from 30 African countries — has been one of the most significant efforts to address this. Their MasakhaNER project introduced the first large-scale named entity recognition dataset covering 10 African languages [4]. What distinguishes Masakhane is its participatory research model: native speakers and community stakeholders are embedded in the development process, not consulted as an afterthought.
A 2025 paper presented at WiML @ NeurIPS formalized this further, introducing the African Entity Recognition Bias Index (AERBI) — a reusable framework for quantifying NER fairness across African languages. The findings revealed severe performance gaps and systematic bias patterns in widely used models including mBERT, XLM-R, and AfroXLM-R.
2. The Infrastructure Gap
Even when good AI tools exist, access is unevenly distributed. A 2026 Frontiers paper on AI and the digital divide in education argues that the gap between those who do and do not have access to AI technologies is measurably wider between rich and poor regions, and that AI systems often widen this gap further through language barriers and cultural mismatches between developers and end users [5].
This matters for EdTech specifically because adaptive learning systems require persistent, low-latency connectivity to function well. Many rural schools across sub-Saharan Africa, Southeast Asia, and Latin America cannot meet that requirement reliably. The result is that the tools marketed as "personalized learning for every student" are, in practice, personalized learning for students with reliable broadband.
One technical response worth watching is the shift toward smaller, more efficient models. Lelapa AI's InkubaLM, Africa's first multilingual small language model, demonstrates that high-quality language AI can be built to run in low-resource environments without requiring cloud infrastructure [6]. This architectural direction matters enormously for educational deployment in contexts where compute and connectivity are constrained.
3. The Evaluation Metric Problem
Perhaps the most underappreciated issue is how we measure success in educational AI. Benchmark datasets and evaluation metrics in the field are almost entirely derived from Western educational contexts. A model that achieves state-of-the-art performance on standard reading comprehension benchmarks may perform poorly on texts that reference non-Western cultural knowledge, use code-switching, or draw on oral storytelling traditions.
This is not hypothetical. Studies on bias in AI-based grading systems have found that automated essay scoring tools systematically underrate writing from students whose first language is not English — not because their writing is lower quality, but because stylistic conventions differ from the training distribution.
When performance is only measured against the dominant cultural norm, underrepresented learners are invisible in the evaluation — which means they are invisible in the improvement cycle.
What Good Research Looks Like Here
The most promising work in this space shares a few characteristics:
Community-first data collection. The best datasets for underrepresented learners are built with communities, not for them. This means compensating annotators fairly, ensuring native speaker involvement at every stage, and treating local linguistic knowledge as expertise rather than raw labor.
Transfer learning with cultural grounding. Techniques like domain-adaptive pre-training (DAPT) and task-adaptive pre-training (TAPT) can significantly improve model performance in low-resource settings. A recent analysis found that focused adaptation strategies yield up to a 35% performance improvement in resource-limited settings [7], but only when the adaptation data is genuinely representative of the target community.
Fairness-aware evaluation. Frameworks like AERBI and fairness metrics from Fairlearn and AIF360 need to become standard in EdTech evaluation pipelines, not optional audits. If a reading comprehension model performs 20% worse for students from a particular linguistic background, that is a product failure — not a "demographic note."
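The evaluation gate described above can be made concrete in a few lines. This sketch is in the spirit of grouped metrics in Fairlearn and AIF360 but does not use either library's API; the group labels, predictions, and 10-point threshold are all illustrative assumptions.

```python
# Per-group accuracy report with a hard fairness gate: if the
# worst-performing linguistic group falls more than `max_gap` below the
# best, the evaluation fails. Labels and scores below are illustrative.

def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    totals, correct = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (t == p)
    return {g: correct[g] / totals[g] for g in totals}

def fairness_gate(accs, max_gap=0.10):
    """Pass only if the worst group is within max_gap of the best."""
    return max(accs.values()) - min(accs.values()) <= max_gap

y_true = [1, 0, 1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
groups = ["L1-English"] * 4 + ["L2-English"] * 4

accs = group_accuracies(y_true, y_pred, groups)
print(accs)                  # accuracy per linguistic group
print(fairness_gate(accs))   # False here: the gap exceeds 10 points
```

Wiring a gate like this into a CI pipeline is what turns "audit" into "release criterion": a model that fails the gate does not ship, regardless of its average score.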
Lightweight, deployable architectures. Researchers building for underrepresented learners need to design with infrastructure constraints in mind from the beginning. A model that requires a GPU cluster to run is not a model for rural Nigeria or rural Alabama. The InkubaLM approach — small, efficient, culturally grounded — is worth studying carefully.
The Honest Assessment
AI in education has genuine potential to be one of the most equalizing technologies ever developed. Adaptive systems that meet students where they are, in their language, at their pace, without the resource constraints of human tutors — this is a real and achievable vision.
But right now, most of the field is building for the students who need help the least. The benchmark datasets are wrong. The tokenizers are wrong. The evaluation metrics are wrong. And the business incentives often push toward scaling systems that already work for already-served populations.
Fixing this requires researchers who treat equity as a technical problem, not just an ethical footnote. It requires dataset work that is unglamorous but essential. It requires evaluation frameworks that make underperformance on marginalized groups visible rather than averaged away.
UNESCO's 2025 report on AI and the right to education [8] asks the right question: how can digital tools be made truly inclusive, especially for speakers of underrepresented languages? The honest answer is that we do not yet know — but the research directions are becoming clearer, and the community doing this work is growing.
That is worth paying attention to.
References & Further Reading
[1] DemandSage. AI in Education Statistics 2026. https://www.demandsage.com/ai-in-education-statistics/
[2] Rapid Review: The Impact of AI on Inclusivity in Higher Education. Education Sciences, 2025. https://www.mdpi.com/2227-7102/15/9/1255
[3] AfricaNLP 2025 Workshop. https://sites.google.com/view/africanlp2025/home
[4] Masakhane NLP Initiative. https://www.masakhane.io
[5] AI and the Digital Divide in Education. Frontiers in Computer Science, 2026. https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2026.1759027/full
[6] Lelapa AI. Top African Language AI Trends 2025. https://lelapa.ai/top-african-language-ai-trends-to-watch-in-2025/
[7] MoldStud. NLP and Low-Resource Languages: Innovative Solutions for 2025. https://moldstud.com/articles/p-nlp-and-low-resource-languages-innovative-solutions-for-2025
[8] UNESCO. AI and Education: Protecting the Rights of Learners, 2025. https://www.unesco.org/en/articles/what-you-need-know-about-ai-and-right-education
Ridwan Bello is a PhD student at the University of Alabama researching accented speech recognition, bias in AI systems, and machine learning for underrepresented languages.