PERC 2024 Abstract Detail Page
| Abstract Title: | Achieving Human Level Partial Credit Grading of Written Responses to Physics Conceptual Question using GPT-3.5 with Only Prompt Engineering |
|---|---|
| Abstract Type: | Contributed Poster Presentation |
| Abstract: | Large language models (LLMs) have great potential for auto-grading students' written responses to physics problems because of their capacity to process and generate natural language. In this exploratory study, we use a prompt engineering technique, which we name "scaffolded chain of thought (COT)", to instruct GPT-3.5 to grade student written responses to a physics conceptual question. In contrast to common COT prompting, scaffolded COT prompts GPT-3.5 to explicitly compare each student response to a detailed, well-explained rubric before generating the grading outcome. We show that, measured against human raters, the grading accuracy of GPT-3.5 with scaffolded COT is 20% to 30% higher than with conventional COT. The level of agreement between AI and human raters can reach 70% to 80%, comparable to the level of agreement between two human raters. This suggests that an LLM-based AI grader can achieve human-level grading accuracy on a physics conceptual problem using prompt engineering techniques alone. |
| Session Time: | Poster Session 2 |
| Poster Number: | B86 |
| Author/Organizer Information | |
| Primary Contact: | Zhongzhou Chen, University of Central Florida |
| Co-Author(s) and Co-Presenter(s): | Tong Wan, University of Central Florida |




