PERC 2024 Abstract Detail Page

Abstract Title: Achieving Human Level Partial Credit Grading of Written Responses to Physics Conceptual Question using GPT-3.5 with Only Prompt Engineering
Abstract Type: Contributed Poster Presentation
Abstract: Large language models (LLMs) have great potential for auto-grading students' written responses to physics problems because of their capacity to process and generate natural language. In this exploratory study, we use a prompt engineering technique, which we name "scaffolded chain of thought (COT)", to instruct GPT-3.5 to grade students' written responses to a physics conceptual question. Unlike common COT prompting, scaffolded COT prompts GPT-3.5 to explicitly compare each student response to a detailed, well-explained rubric before generating the grading outcome. We show that, measured against human raters, the grading accuracy of GPT-3.5 with scaffolded COT is 20% to 30% higher than with conventional COT. The level of agreement between the AI and human raters can reach 70% to 80%, comparable to the level of agreement between two human raters. This suggests that an LLM-based AI grader can achieve human-level grading accuracy on a physics conceptual problem using prompt engineering techniques alone.
Session Time: Poster Session 2
Poster Number: B86
Contributed Paper Record: Contributed Paper Information
Contributed Paper Download: Download Contributed Paper
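
The abstract reports the level of agreement between AI and human raters as a percentage. As a rough illustration only, the sketch below computes simple percent agreement between two sets of scores. The abstract does not say which agreement statistic the authors used, so exact-match agreement is an assumption here, and the function name `percent_agreement` and the scores are made up for this example, not data from the study.

```python
def percent_agreement(scores_a, scores_b):
    """Return the fraction of responses on which two raters gave the same score.

    Assumption: agreement means an exact match of the partial-credit score.
    The study may have used a different statistic.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("Both raters must score the same set of responses")
    if not scores_a:
        raise ValueError("At least one scored response is required")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical partial-credit scores (0 to 3) for ten responses.
# These numbers are invented for illustration; they are not study data.
human_scores = [3, 2, 0, 1, 3, 2, 1, 0, 3, 2]
ai_scores = [3, 2, 1, 1, 3, 2, 0, 0, 3, 1]

agreement = percent_agreement(human_scores, ai_scores)
print(f"AI-human agreement: {agreement:.0%}")
```

The same function can be applied to two human raters' scores to obtain the human-human baseline that the AI-human figure is compared against.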

Author/Organizer Information

Primary Contact: Zhongzhou Chen
University of Central Florida
Co-Author(s) and Co-Presenter(s): Tong Wan
University of Central Florida