Exploring Natural Language Processing in Education and Education Studies


Natural language processing (NLP) helps computers interpret human language. Humans can then use these interpretations to create tools and conduct research. This lets researchers work through large quantities of text far faster than manual reading allows, and provides new ways to quantify language content, syntax, and emotion. NLP can therefore enable education research that would otherwise be infeasible due to time, resource, or measurability constraints.

I consider two specific NLP techniques for education research: topic modeling and word embeddings. I provide overviews of these techniques in the following section. I group this education research into three categories: Text as Observational Data, Automated Evaluation, and Adaptive Pedagogy. I also consider whether this work uses NLP methods to replace or supplement other existing techniques.

I do not describe the technical implementations of these techniques or the specific NLP tools that perform them. State-of-the-art implementations are evolving rapidly, and selecting a particular tool depends on the researcher's experience, objectives, and data. Instead, I focus on how the general principles of these techniques can be applied to different types of education and education studies work.

Natural Language Processing Techniques

Topic Models

Topic models are a type of NLP technique aimed at discovering hidden patterns in text (Boyd-Graber, Hu, and Mimno 2017). Words that frequently appear together in documents are grouped together as a topic. Documents can then be characterized by the presence or absence of different topics. While some groupings may be obviously recognizable by humans (e.g., baseball, hockey, and basketball could be grouped into a category for “sports”), topic modeling can also systematically generate less intuitive groupings of words.
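To make the general principle concrete, here is a minimal sketch of one common topic model, latent Dirichlet allocation (LDA), fit with a toy collapsed Gibbs sampler. It is an illustration only, not a recommended tool: the function name `fit_lda`, the hyperparameter values, and the four tiny example documents are all assumptions made for this sketch, and real research would rely on a mature library and far more text.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # topic counts per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # token counts per topic
    assignments = []

    # Start by assigning every token to a random topic.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1
                # A topic is more likely if it is common in this document
                # and if this word is common within the topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical example documents, already split into words.
docs = [
    "baseball hockey basketball game team score".split(),
    "hockey team game score season playoffs".split(),
    "essay grading rubric student teacher feedback".split(),
    "student teacher essay feedback classroom rubric".split(),
]

doc_topic, topic_word = fit_lda(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = [word for word, count in counts.most_common(4) if count > 0]
    print(f"topic {k}: {top_words}")
for d, counts in enumerate(doc_topic):
    print(f"document {d}: topic counts {counts}")
```

The per-document topic counts are what a researcher would carry forward as quantitative variables. The topics themselves are unlabeled word groupings, so interpreting them remains a human task.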

Word Embeddings

Instead of grouping words into topics, word embeddings create vector-based (numerical) representations for words in text. These vectors can then be compared to one another so that the distance between words indicates relationship strength. Recent word embedding work such as BERT (Devlin et al. 2018) has improved generalizability and support for words that have different meanings in different contexts (e.g., bat when discussing baseball vs. animals).
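As a rough illustration of comparing word vectors, the sketch below measures cosine similarity, one widely used way of scoring how closely two vectors point in the same direction. The three-dimensional vectors in `toy_vectors` are invented by hand for this example; real embeddings are learned from large text collections and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical, hand-made vectors: not the output of any trained model.
toy_vectors = {
    "baseball": [0.9, 0.1, 0.0],
    "hockey": [0.8, 0.2, 0.1],
    "photosynthesis": [0.0, 0.1, 0.9],
}

sports_similarity = cosine_similarity(toy_vectors["baseball"], toy_vectors["hockey"])
unrelated_similarity = cosine_similarity(
    toy_vectors["baseball"], toy_vectors["photosynthesis"]
)

print(f"baseball vs. hockey: {sports_similarity:.2f}")
print(f"baseball vs. photosynthesis: {unrelated_similarity:.2f}")
```

A lookup table like this assigns each word a single vector regardless of context. Contextual models such as BERT instead produce a vector for each occurrence of a word, which is how they can separate the two senses of bat.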

Research Categories

Text as Observational Data

This category refers to research using NLP techniques to transform existing language data into quantitative values that can be evaluated in the context of other variables. Nelson et al. (2021) evaluate the strengths and weaknesses of different natural language processing methods for supporting the hand-coding of documents. While initial NLP coding is typically imperfect, this supportive approach improves the overall speed of coding documents.

Alvero et al. (2021) demonstrate a replacement approach when evaluating how essay content and style correlate with household income and SAT scores. The authors develop quantitative representations of essay content using topic modeling and other NLP techniques that are then used to predict family income. While both essays and SAT scores correlate with income, the authors find that essays are more strongly correlated with income than SAT scores are. This suggests that equity efforts may place undue emphasis on scrutinizing quantitative measures such as SAT scores simply because they are easier to measure, while harder-to-quantify components such as essays reproduce similar inequality. Operationalizing non-numerical parts of educational processes such as college applications to draw these conclusions may be infeasible without NLP.

Automated Evaluation

One of the earliest uses of computers in education was for grading essays (Page 1968). Existing systems use NLP techniques to evaluate how closely student essays match keywords and their synonyms. Recent work in this field (Rokade et al. 2018; Wang, Chang, and Li 2008) aims to better evaluate essay structure and capture the semantic meaning of student essays using more sophisticated natural language processing tools. While this work largely focuses on developing fully automated replacements for human grading, popular commercial tools such as Grammarly and Turnitin.com use NLP to provide a supportive tool where humans have the final say. For a deeper review of natural language processing in grading systems, see Rokade et al. (2018).
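The keyword-and-synonym matching described above can be sketched in a few lines. This is a simplified illustration and not the method of any cited or commercial system: the function `keyword_coverage`, the rubric, and the sample essay are all hypothetical.

```python
import re


def keyword_coverage(essay, rubric):
    """Return the share of rubric concepts mentioned, plus the matched concepts.

    rubric maps each concept keyword to a list of accepted synonyms.
    """
    tokens = set(re.findall(r"[a-z']+", essay.lower()))
    matched = {
        concept
        for concept, synonyms in rubric.items()
        if tokens & ({concept} | set(synonyms))
    }
    return len(matched) / len(rubric), matched


# Hypothetical rubric and student answer.
rubric = {
    "photosynthesis": ["photosynthesize"],
    "sunlight": ["light", "sun"],
    "chlorophyll": [],
}
essay = "Plants use light from the sun to photosynthesize."

score, matched = keyword_coverage(essay, rubric)
print(f"covered {len(matched)} of {len(rubric)} concepts")
```

Exact word matching of this kind rewards vocabulary rather than argument, which is one reason more recent work tries to capture structure and semantic meaning instead.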

Adaptive Pedagogy

I am working with researchers at the University of Maryland to develop a flashcard recommendation system using word embeddings to establish semantic relations between flashcard content: KAR³L (Shu, Feng, and Boyd-Graber 2021). We believe incorporating these methods allows the system to better infer student knowledge on related topics, as well as better model the behaviors of human memory observed in psychology research (Ebbinghaus 1913; Erdelyi 2010). Research like ours suggests that NLP could improve learning efficiency by using better memory models to implement study methods recommended by pedagogical research (Dunlosky et al. 2013).

Natural language processing systems for adaptive pedagogy may prioritize aims other than learning efficiency. Ruan et al. (2019) develop an adaptive chatbot named QuizBot that teaches and tests factual knowledge. Students learn more material through this medium than through a traditional flashcard app using the same scheduling algorithm for items. While students take more time to learn with this system, they are also more likely to spend time using the app. This work highlights how NLP can be employed not just for efficiency but also for greater engagement.

Work in adaptive pedagogy could be used both to support and to replace existing forms of education. These two highlighted works focus on helping students acquire and retain fact-based information, which teachers could use in support of other strategies to apply what is learned. However, students may also use these applications to independently learn information suited to their interests.


There is ample room for NLP in education work. While I’ve focused on word embeddings and topic models, other NLP techniques like sentiment analysis and summarization can also be useful for work in all three of these broad categories: 1) Text as Observational Data, 2) Automated Evaluation, and 3) Adaptive Pedagogy. As demonstrated by this existing research, NLP acts as another methodological tool for achieving educational goals.

In projects bridging NLP with education, however, we should consider how our methods help us answer our research questions. Do our data actually help us answer our questions? What is the data source, and are there ethical concerns about how to handle or interpret the data? Whom are we including, and whom are we excluding in our work? If you’re interested in conducting this research or learning more about how to handle these practical and ethical questions, I recommend Matthew Salganik’s online textbook Bit by Bit: Social Research in the Digital Age.


Abebe, Rediet, Solon Barocas, Jon Kleinberg, Karen Levy, Manish Raghavan, and David G. Robinson. 2020. “Roles for Computing in Social Change.” Pp. 252–60 in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. Barcelona, Spain: ACM.

Alvero, AJ, Sonia Giebel, Ben Gebre-Medhin, anthony lising antonio, Mitchell L. Stevens, and Benjamin W. Domingue. 2021. “Essay Content and Style Are Strongly Related to Household Income and SAT Scores: Evidence from 60,000 Undergraduate Applications.” Science Advances 7(42):eabi9031. doi: 10.1126/sciadv.abi9031.

Ebbinghaus, Hermann. 1913. Memory: A Contribution to Experimental Psychology. New York: Teachers College Press.

Erdelyi, Matthew Hugh, and Jeff Kleinbard. 1978. “Has Ebbinghaus Decayed with Time? The Growth of Recall (Hypermnesia) over Days.” Journal of Experimental Psychology: Human Learning and Memory 4(4):275–89. doi: 10.1037/0278-7393.4.4.275.

Nelson, Laura K., Derek Burk, Marcel Knudsen, and Leslie McCall. 2021. “The Future of Coding: A Comparison of Hand-Coding and Three Types of Computer-Assisted Text Analysis Methods.” Sociological Methods & Research 50(1):202–37. doi: 10.1177/0049124118769114.

Ruan, Sherry, Liwei Jiang, Justin Xu, Bryce Joe-Kun Tham, Zhengneng Qiu, Yeshuang Zhu, Elizabeth L. Murnane, Emma Brunskill, and James A. Landay. 2019. “QuizBot: A Dialogue-Based Adaptive Learning System for Factual Knowledge.” Pp. 1–13 in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. Glasgow, Scotland, UK: ACM.

Salganik, Matthew J. 2017. Bit by Bit: Social Research in the Digital Age. Illustrated Edition. Princeton: Princeton University Press.

Shu, Matthew, Shi Feng, and Jordan Boyd-Graber. 2021. “Spaced Repetition Meets Representation Learning.” EACL 2021 HCI-NLP Workshop.