PERC 2024 Abstract Detail Page
| Abstract Title: | Applications, Opportunities and Challenges of Large Language Models in Physics Education |
|---|---|
| Abstract Type: | Talk Symposium |
| Abstract: | Artificial intelligence (AI) has found its way into more and more areas of our lives, including education. Until the end of 2022, AI played at most a minor role in physics education, mostly focused on evaluating the extensive or heterogeneous datasets that arise in learning with digital technologies. Since ChatGPT became publicly available at the end of 2022, generative and other AI tools have moved into the focus of physics education research. AI has the potential to advance physics education research in a variety of ways, such as enhancing personalized learning, improving accessibility and inclusivity, and enabling natural language interaction. In this symposium, we present research on the opportunities and challenges of AI-enhanced physics education across institutional settings, with a particular focus on AI-assisted assessment and on learner assistance that bridges educational divides. |
| Session Time: | Parallel Sessions Cluster 1 |
| Room: | Burroughs |
Author/Organizer Information
| Primary Contact: | Stefan Kuechemann and Jochen Kuhn, Ludwig-Maximilians-Universität München, München, Germany |
| Co-Author(s) and Co-Presenter(s): | Gerd Kortemeyer, ETH Zürich; Xiaoming Zhai, University of Georgia |
Parallel Session Information
| Moderator: | Ludwig-Maximilians-Universität München |
Symposium Specific Information
| Presentation 1 Title: | Strategies to learn with Large Language Models in Physics Education |
| Presentation 1 Authors: | Stefan Küchemann and Jochen Kuhn |
| Presentation 1 Abstract: | Large language models (LLMs) such as ChatGPT have significantly advanced various fields, producing automated, coherent responses to complex inputs in tasks such as essay writing, programming, and zero-shot learning. These models, trained on vast datasets, promise substantial benefits for physics education. However, they also pose several challenges, such as limited controllability and interpretability of the output, as well as ethical concerns, including bias and potential misuse. In physics education, it is crucial for students to critically evaluate ChatGPT's output for correctness and bias, possibly using efficient prompting strategies, such as chain-of-thought prompting or few-shot learning, to improve the accuracy of the output. This study explores how LLM-based chatbots can be brought into 9th- and 10th-grade classrooms. To this end, we investigated how students (N=114) from these grades tackle physics problems using ChatGPT. Using a pre-post design, we compared the problem-solving process of students taught a prompting strategy (intervention group 1) with that of students who received no such instruction (intervention group 2) and of a control group learning from worked examples. The study aimed to determine whether and how instructional strategies could mitigate problems that arise when ChatGPT generates incorrect responses. We specifically designed physics problems to challenge ChatGPT, giving students an opportunity to verify answers and learn from the model's outputs. The findings reveal significant challenges in using ChatGPT for physics problem-solving when immediate corrections are needed. The results are analyzed within a theoretical framework of generative artificial intelligence, highlighting the importance of strategic prompting techniques for improving students' learning outcomes with AI tools. |
| Presentation 2 Title: | Large Language Models for Assessment in Physics |
| Presentation 2 Authors: | Gerd Kortemeyer |
| Presentation 2 Abstract: | Recent studies have demonstrated that large language models (LLMs) such as GPT-4 can solve approximately 80% of the common homework and exam problems in introductory physics courses. This raises the question of whether LLMs can also assist in evaluating student solutions to these problems. The talk presents initial findings from studies conducted in large-enrollment physics courses at ETH Zurich, a technical university in Europe. These studies include students' evaluations of AI-generated feedback on the derivations in their handwritten homework solutions, and comparisons between teaching-assistant and AI grading of high-stakes handwritten thermodynamics exams. The findings suggest a scalable approach to formative and summative assessment that emphasizes reasoning, modeling, and solution strategies rather than focusing solely on the final result. |
| Presentation 3 Title: | PhysicsLlama: Bridging the Educational Divide with a Contextualized Large Language Model |
| Presentation 3 Authors: | Ehsan Latif & Xiaoming Zhai |
| Presentation 3 Abstract: | Large language models (LLMs) are adept at generating high-quality text for a wide range of queries. Despite their linguistic prowess, these models often fall short when responding to scientific queries, particularly those in physics education that demand scientific reasoning and deep contextual understanding from students. Such limitations make them less effective for tasks requiring higher-order physics problem-solving skills, diminishing their usefulness in educational contexts where students seek help with physics problems. To address these challenges, we develop PhysicalLlama, a novel contextualized foundation LLM pretrained on an extensive dataset compiled from over 1,000 high-impact physics education articles and thousands of student-written responses in physics. This approach equips PhysicalLlama with a rich foundation of factual and procedural knowledge tailored to the demands of physics education. To evaluate PhysicalLlama's performance rigorously, we fine-tune the model on student-written responses to ten constructed-response physics assessment items. This fine-tuning is intended to further enhance the model's understanding of, and responsiveness to, the nuances of physics education. We then test PhysicalLlama on a diverse set of 2,000 student responses, comparing its performance with and without fine-tuning. The comparison also extends to two commercial LLMs (GPT-4 and Claude 3) and one public LLM (Mistral 7B-Instruct), offering a comprehensive perspective on PhysicalLlama's capabilities relative to the current state of the art. Our findings will shed light on PhysicalLlama's potential as a transformative tool in physics education. By bridging the significant institutional gaps in this field, PhysicalLlama stands poised to redefine the landscape of educational assistance in physics, offering students unprecedented support and insight. |
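Presentation 1 names chain-of-thought prompting and few-shot learning as strategies that may improve the accuracy of ChatGPT's output. The abstract does not describe the prompts students were actually taught, so the following is only a minimal Python sketch of the general technique, under stated assumptions: a few worked examples with explicit intermediate reasoning are placed before a new question, and the new question ends with a cue asking for step-by-step reasoning. The class `WorkedExample`, the function `build_cot_prompt`, and the sample physics problems are hypothetical illustrations, not taken from the study; the sketch only assembles prompt text and does not call any model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorkedExample:
    """Hypothetical container for one few-shot example."""
    question: str
    reasoning: str  # explicit intermediate steps shown to the model
    answer: str


def build_cot_prompt(examples: list[WorkedExample], new_question: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt as plain text.

    Each worked example shows a question, its step-by-step reasoning, and
    a final answer. The new question ends with a cue that invites the model
    to reason step by step before giving its answer.
    """
    parts = []
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"Example {i}\n"
            f"Question: {ex.question}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Answer: {ex.answer}\n"
        )
    # The open "Reasoning:" line is where the model continues the pattern.
    parts.append(
        f"Question: {new_question}\n"
        "Reasoning: Let's think step by step."
    )
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical worked example (free fall from rest, g = 9.8 m/s^2).
    examples = [
        WorkedExample(
            question=("A ball is dropped from rest and falls for 2 s. Taking "
                      "g = 9.8 m/s^2 and ignoring air resistance, how far does it fall?"),
            reasoning="For free fall from rest, d = (1/2) g t^2 = 0.5 * 9.8 * 2^2 = 19.6 m.",
            answer="19.6 m",
        )
    ]
    prompt = build_cot_prompt(
        examples,
        "A cart starts from rest and accelerates uniformly at 3 m/s^2. "
        "What is its speed after 4 s?",
    )
    print(prompt)
```

Keeping the examples as structured data rather than a hand-written string makes it easy to vary the number of worked examples (zero, one, or several) when comparing prompting strategies; whether students should also check the model's reasoning against their own, as the abstract emphasizes, is independent of how the prompt is built.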




