PERC 2025 Abstract Detail Page
| Abstract Title: | Investigation of the Inter-Rater Reliability between Large Language Models and Human Raters in Qualitative Analysis |
|---|---|
| Abstract Type: | Contributed Poster Presentation |
| Abstract: | Qualitative analysis is typically limited to small datasets because it is time-intensive. Moreover, a second human rater is required to ensure reliable findings. Artificial intelligence tools may be able to replace the second human rater if they demonstrate high agreement with human ratings. We investigated the inter-rater reliability of two state-of-the-art Large Language Models (LLMs), ChatGPT-4o and ChatGPT-4.5-preview, in rating audio transcripts that had been coded manually. We explored prompts and hyperparameters to optimize model performance. The participants were 14 undergraduate student groups from a university in the midwestern U.S. who discussed problem-solving strategies for a project. We prompted an LLM to replicate the manual coding and calculated Cohen's kappa (κ) to measure inter-rater reliability. After optimizing model hyperparameters and prompts, the results showed substantial agreement (κ > 0.6) for three themes and moderate agreement for one. Our findings demonstrate the potential of GPT-4o and GPT-4.5 for efficient, scalable qualitative analysis in physics education and identify their limitations in rating subjective constructs. |
| Footnote: | This work is supported in part by U.S. National Foundation Grant 23000645. Opinions expressed are of the authors and not the Foundation. |
| Session Time: | Poster Session A |
| Poster Number: | A-4 |
| Contributed Paper Record: | Contributed Paper Information |
| Contributed Paper Download: | Download Contributed Paper |
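
The abstract reports agreement between LLM and human codes as Cohen's kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch of that calculation, not the authors' analysis code. The function name `cohens_kappa` and the binary code lists are hypothetical and made up for illustration; they are not data from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same items.

    rater_a, rater_b: equal-length sequences of categorical codes.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must code the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: fraction of items given the same code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical code throughout; kappa is undefined.
        return float("nan")

    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical codes (1 = theme present, 0 = theme absent) for ten segments.
human_codes = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
llm_codes = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]

kappa = cohens_kappa(human_codes, llm_codes)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this made-up example the raters disagree on one of ten segments, so p_o = 0.9 and p_e = 0.5. Computing one kappa per theme, as the abstract describes, would mean calling the function once per theme on that theme's codes.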
Author/Organizer Information | |
| Primary Contact: | Nikhil Sanjay Borse, Purdue University |
| Co-Author(s) and Co-Presenter(s): | Ravishankar Chatta Subramaniam, Purdue University; N. Sanjay Rebello, Purdue University |
Contributed Poster | |
| Contributed Poster: | Download the Contributed Poster |




