A Case Study Using Large Language Models to Generate Metadata for Math Questions

Katie Bainbridge, Candace Walkington, Armon Ibrahim, Iris Zhong, Debshila Basu Mallick, Julianna Washington, Rich Baraniuk

Research output: Contribution to journal › Conference article › peer-review

Abstract

Creating labels for assessment items, such as the concept used, difficulty, or vocabulary used, can improve the quality and depth of research insights and help target the right kinds of questions to students depending on their needs. However, traditional processes for metadata tagging are resource intensive in terms of labor, time, and cost, and the resulting metadata quickly become outdated with any changes to the question content. Given thoughtful prompts, Large Language Models (LLMs) like GPT-3.5 and GPT-4 can efficiently automate the generation of assessment metadata, helping to scale the process to larger volumes of questions and to accommodate updates to question content that would otherwise be tedious to reanalyze. With a human subject matter expert in the loop, recall and precision were analyzed for LLM-generated tags for two metadata variables: problem context and math vocabulary. We conclude that LLMs like GPT-3.5 and GPT-4 are highly reliable at generating assessment metadata, and we make actionable recommendations for others intending to apply the technology to their own assessment items.
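
The abstract does not reproduce the authors' prompts or scoring code, but the workflow it describes can be illustrated with a minimal sketch: prompt an LLM for the two metadata variables (problem context and math vocabulary), then compare its tags against a subject matter expert's labels using precision and recall. The OpenAI Python client usage, prompt wording, tag schema, and example data below are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (not the authors' pipeline): ask an LLM for two metadata
# tags per math question, then score the tags against a human subject matter
# expert's labels with precision and recall.
# Assumes the OpenAI Python client (v1+) with OPENAI_API_KEY set; the prompt
# wording, tag schema, and example data are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "You are tagging a math assessment item with metadata.\n"
    "Return only JSON with two keys:\n"
    '  "problem_context": a short phrase for the real-world context (or "none"),\n'
    '  "math_vocabulary": a list of math terms that appear in the item.\n\n'
    "Question: {question}"
)

def tag_question(question: str, model: str = "gpt-4") -> dict:
    """Ask the LLM for metadata tags and parse its reply (assumes bare JSON)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(question=question)}],
        temperature=0,
    )
    return json.loads(response.choices[0].message.content)

def precision_recall(predicted: set, expert: set) -> tuple:
    """Precision = |pred ∩ expert| / |pred|; recall = |pred ∩ expert| / |expert|."""
    true_positives = len(predicted & expert)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expert) if expert else 0.0
    return precision, recall

if __name__ == "__main__":
    question = ("A recipe calls for 3/4 cup of sugar. "
                "How much sugar is needed for half the recipe?")
    expert_vocab = {"fraction", "multiplication"}  # hypothetical expert labels
    tags = tag_question(question)
    llm_vocab = {term.lower() for term in tags.get("math_vocabulary", [])}
    p, r = precision_recall(llm_vocab, expert_vocab)
    print(f"context: {tags.get('problem_context')}  precision: {p:.2f}  recall: {r:.2f}")
```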

Original language: English (US)
Pages (from-to): 34-42
Number of pages: 9
Journal: CEUR Workshop Proceedings
Volume: 3487
State: Published - 2023
Event: 1st Annual Workshop on Empowering Education with LLMs - the Next-Gen Interface and Content Generation, AIEDLLM 2023 - Tokyo, Japan
Duration: Jul 7 2023 → …

Keywords

  • Assessments
  • Human-in-the-loop
  • Large Language Models
  • Metadata

ASJC Scopus subject areas

  • Computer Science (all)
