TY - GEN
T1 - Code Soliloquies for Accurate Calculations in Large Language Models
AU - Sonkar, Shashank
AU - Chen, Xinghe
AU - Le, Myco
AU - Liu, Naiming
AU - Basu Mallick, Debshila
AU - Baraniuk, Richard
N1 - Publisher Copyright:
© 2024 ACM.
PY - 2024/3/18
Y1 - 2024/3/18
N2 - High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems (ITS) that utilize a Large Language Model (LLM) backend. Synthetic student-teacher dialogues, generated using advanced GPT-4 models, are a common strategy for creating these datasets. However, subjects like physics that entail complex calculations pose a challenge. While GPT-4 demonstrates impressive language processing capabilities, its limitations in fundamental mathematical reasoning curtail its efficacy for such subjects. To tackle this limitation, we introduce in this paper an innovative stateful prompt design. Our design orchestrates a mock conversation where both student and tutorbot roles are simulated by GPT-4. Each student response triggers an internal monologue, or 'code soliloquy', in the GPT-tutorbot, which assesses whether its subsequent response would necessitate calculations. If a calculation is deemed necessary, it scripts the relevant Python code and uses the Python output to construct a response to the student. Our approach notably enhances the quality of synthetic conversation datasets, especially for subjects that are calculation-intensive. The preliminary Subject Matter Expert evaluations reveal that our Higgs model, a fine-tuned LLaMA model, effectively uses Python for computations, which significantly enhances the accuracy and computational reliability of Higgs' responses.
AB - High-quality conversational datasets are crucial for the successful development of Intelligent Tutoring Systems (ITS) that utilize a Large Language Model (LLM) backend. Synthetic student-teacher dialogues, generated using advanced GPT-4 models, are a common strategy for creating these datasets. However, subjects like physics that entail complex calculations pose a challenge. While GPT-4 demonstrates impressive language processing capabilities, its limitations in fundamental mathematical reasoning curtail its efficacy for such subjects. To tackle this limitation, we introduce in this paper an innovative stateful prompt design. Our design orchestrates a mock conversation where both student and tutorbot roles are simulated by GPT-4. Each student response triggers an internal monologue, or 'code soliloquy', in the GPT-tutorbot, which assesses whether its subsequent response would necessitate calculations. If a calculation is deemed necessary, it scripts the relevant Python code and uses the Python output to construct a response to the student. Our approach notably enhances the quality of synthetic conversation datasets, especially for subjects that are calculation-intensive. The preliminary Subject Matter Expert evaluations reveal that our Higgs model, a fine-tuned LLaMA model, effectively uses Python for computations, which significantly enhances the accuracy and computational reliability of Higgs' responses.
UR - http://www.scopus.com/inward/record.url?scp=85187555630&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85187555630&partnerID=8YFLogxK
U2 - 10.1145/3636555.3636889
DO - 10.1145/3636555.3636889
M3 - Conference contribution
AN - SCOPUS:85187555630
T3 - ACM International Conference Proceeding Series
SP - 828
EP - 835
BT - LAK 2024 Conference Proceedings - 14th International Conference on Learning Analytics and Knowledge
PB - Association for Computing Machinery
T2 - 14th International Conference on Learning Analytics and Knowledge, LAK 2024
Y2 - 18 March 2024 through 22 March 2024
ER -