Designing a Useful Problem Solving Coach: Usage and Usability Studies

Designing useful computer coaches for problem solving in introductory physics requires an iterative process to develop both the software framework and the content of the coaches. Research is necessary to determine which students use the coaches, how those students use them, and whether the coaches are effective for those students. We report results of a study of prototype coaches that addresses the first two questions and examines whether those students believe the coaches are effective. We also discuss how these data will guide the next iteration of the coaches.


INTRODUCTION
The ability to solve problems is highly valued and critically important to scientists and engineers [1]. Teaching problem solving based on the cognitive apprenticeship model [2] requires exhibiting all the requisite mental processes and having students practice those processes while receiving real-time guidance and feedback (coaching). Providing this comprehensive coaching can be difficult because problem solving requires many decisions that students, and even instructors, do not recognize. Because students differ in their learning styles and rates of learning, it is helpful if coaching is available at students' convenience. One approach to expanding students' access to effective coaching is the development of web-based computer coaches that could simulate the guidance provided by a moderately skilled instructor.
Designing a software framework for computer coaches is an iterative process. After small-scale laboratory testing of computer coaches [3], we built a complete set of prototype coaches for introductory mechanics based on research in problem solving and computer tutoring. Here we report the results of testing these prototypes with students in large introductory physics classes to determine their potential usage and student perception of their utility. These results are being used to guide further development of both the software framework and the content of the coaches. Several cycles of implementation, assessment, and development will likely be necessary to achieve a useful and effective software framework.
Our goal is to determine whether web-based computer coaches have a place in a multi-faceted toolkit to help diverse students develop problem-solving skills in an introductory physics class. So far, we have built and tested 35 coached physics problems spanning six major topics in an introductory mechanics class. The structure of these coaches is described elsewhere, along with the results of a pilot study in Fall 2011 [4].
In the Fall 2011 study, students could choose to do their homework either by submitting the correct answer to WebAssign (http://www.webassign.com/) within three tries or by completing the computer coach for the same problem. In that situation, students found the coaches very attractive, attempting (that is, finishing at least the first section of) an average of 28 coaches (80%) and completing an average of 23 (66%). In Spring 2013, we conducted a second study designed to make use of the coaches less attractive, so as to obtain populations of both users and non-users in the same class. Below, we describe this study and discuss two questions: (1) What subpopulation of a class tends to use such coaches? and (2) How did they use the coaches? We also discuss how these data will influence the development of the software framework.

THE STUDY
In Spring 2013, two sections of the calculus-based introductory mechanics course (249 students total) used the coaches. The results for the two sections were similar, so we have combined all of the data.
The two sections had different lecturers, but both focused on problem solving facilitated by Cooperative Groups in the labs and discussion sections [5]. Students submitted weekly homework assignments (10% of the course grade) through WebAssign and were allowed 5 tries to earn credit. Roughly one third of these problems were Context-Rich problems [6] on which students could get help from a computer coach. Students received no direct credit for using the coaches. The WebAssign and coached versions of a problem differed only in the symbols used to represent quantities in the problem.
Students took four midterm tests, each with two free-response Context-Rich problems, and a final exam with five Context-Rich problems. The five final exam problems were identical for both sections. In addition to the scores and written solutions to these problems, other collected data included pre- and post-test scores on the Force Concept Inventory (FCI) [7], a Math skills test, the Colorado Learning Attitudes about Science Survey (CLASS) [8], and a survey about the students' background. Students' use of the coaches was monitored by recording their keystrokes. They were also surveyed at the middle and the end of the semester about their use of the computer coaches.

USER CHARACTERISTICS
In any course, some students will tend to use resources such as the computer coaches, while others will find them unnecessary or incompatible with their personal preferences. To design effective coaches, one needs to know the relative sizes and characteristics of each group.
One difference among the three groups is in their gender composition. The proportion of females in the L group is about half that of the class as a whole. This is consistent with research showing that females with the same performance as males are more willing to seek assistance [9].
The three groups also differed in their physics preparation as measured by their scores on standardized assessments. Table 1 shows the pre-test scores of the three groups, broken down by gender. The number of students (N) differs from that of the entire class because only students who took all three pre-tests are included. A higher FCI pre-test score is correlated with lower use of the computer coaches. There is some indication that this may also be true for the Math skills test. One might infer that the more poorly prepared students recognize this and choose to use easily accessible help.
Footnote 1: Although there were 35 coaches, only 29 coaches were considered for the data analysis because a database error made it impossible to track the usage of the first 6 coaches.
Another difference among the groups is their expectation of the effort required for the class. Table 2 shows the results from a beginning-of-the-semester survey. Students in the L group expected to spend less time studying and to earn a higher grade in the class. No student expected to earn less than a B.
One might thus infer that students in the L group have high confidence in their ability to perform well. Students in the M group similarly expect to do well, but also expect to spend more time doing so. Students in the H group expect to spend more time and are less confident of their success.
Finally, we compare the performance of the three groups on the final exam. Table 3 shows averaged scores from 4 of the 5 free-response Context-Rich problems on the final exam (one problem that was graded anomalously in one section was dropped from the analysis). The scores of the three groups are not significantly different despite differences in FCI and Math skills pre-test scores. In similar classes, a combination of those scores accounts for about 25% of the difference in problem-solving performance on the final exam.

USAGE CHARACTERISTICS
In addition to characterizing the students using the coaches, it is also important to examine how they use the coaches. There are two main differences in how the three groups of students used the coaches.
The first difference is the pattern of usage. Figure 1 shows the fraction of the coaches preceding each midterm test used by each group. The L group used only 20% of the coaches associated with the topic of the second midterm before the second midterm. Their usage then dropped (see footnote 2). In contrast, students in the H group used the coaches consistently throughout. The M group started out using a high fraction of the coaches, but their usage dropped steadily throughout the semester. One possible reason is that the M students became more confident problem solvers and believed that they no longer needed the coaches. A second is that the M group decided that the coaches were no longer useful or valuable.
A second difference is how students reported using the coaches. On the end-of-semester survey, students were asked to select one of several choices describing how they used the coaches, or to write their own answer. The most popular choice was "I tried to solve the problems on my own and used the computer coaches for help if I got stuck," selected by 42% of the H group, 70% of the M group, and 48% of the L group. The frequency of the next most popular selection, "I worked through the computer coaches before trying to solve the problems on my own," also differed depending on the group. While only 4% of the M group and 3% of the L group selected this option, 37% of students in the H group did so.
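As a rough illustration of the quantity plotted in Figure 1, the following Python sketch computes, for each usage group, the mean fraction of the coaches offered before a given test that students in that group used. The data layout, the function name `usage_fraction_by_group`, and the sample records are hypothetical; they are not taken from the study's database or analysis code.

```python
from collections import defaultdict

def usage_fraction_by_group(records, coaches_per_test):
    """Return {group: {test: mean fraction of that test's coaches used}}.

    records: iterable of (group, test, coaches_used) tuples, one per
             student per test block (hypothetical layout).
    coaches_per_test: {test: number of coaches offered before that test}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, test, used in records:
        # Per-student fraction of the coaches available for this test block.
        sums[(group, test)] += used / coaches_per_test[test]
        counts[(group, test)] += 1

    result = defaultdict(dict)
    for (group, test), total in sums.items():
        result[group][test] = total / counts[(group, test)]
    return dict(result)

# Made-up example: two students per group, 8 coaches before test 2.
sample = [("H", 2, 8), ("H", 2, 6), ("L", 2, 2), ("L", 2, 0)]
fractions = usage_fraction_by_group(sample, {2: 8})
print(fractions)
```

Averaging per-student fractions, rather than pooling all coach attempts, weights every student in a group equally; whether the study used this weighting is an assumption.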

DISCUSSION
One of the goals of the current study was to assess students' perceptions of the utility of the computer coaches. In improving the usability of the coaches, one could choose to focus on improving the user experience and effectiveness for students who tend to use the coaches, or on trying to make the coaches more attractive to a larger fraction of students. Because most of the students in the physics course chose to use a substantial fraction of the coaches, the next iteration will focus on the former, which may, as a byproduct, lead to a larger user base.
Footnote 2: Data on coaches used before the first midterm is not available because of a database error.
The population that tended to use the coaches from the beginning, even though they took time and gave no direct grade benefit, consisted of 71% of the students: the H and M groups. On an end-of-semester survey, students were asked to respond to the statement "The computer coaches did not help improve my problem solving in this class" on a 5-point Likert scale. 74% of the M group and 67% of the H group responded "Disagree" or "Strongly disagree." Interestingly, 46% of the L group did so as well.
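The percentages quoted for the Likert items combine two adjacent response categories. A minimal Python sketch of that tally is given below; the function name `percent_in` and the sample responses are hypothetical and do not reproduce the survey data.

```python
def percent_in(responses, categories):
    """Percentage of responses that fall in the given set of categories."""
    if not responses:
        raise ValueError("no responses")
    hits = sum(1 for r in responses if r in categories)
    return 100.0 * hits / len(responses)

# The survey statement is negatively worded, so disagreement
# indicates that a student perceived the coaches as helpful.
DISAGREE = {"Disagree", "Strongly disagree"}

# Made-up responses from four students.
sample = ["Disagree", "Strongly disagree", "Neutral", "Agree"]
print(percent_in(sample, DISAGREE))
```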
Furthermore, on a question that asked students to rank (with no ties) 10 different components of the class from most (10) to least (1) useful, both the H and M groups ranked the coaches among the top 3 useful components (7.1±0.5), roughly on par with lectures (7.3±0.8) and doing the homework (7.7±0.5), and ahead of other course components such as the textbook, labs, and problem-solving discussion sections. In contrast, the L group ranked the coaches 7th most useful (4.9±0.5), while lectures (8.3±0.3), the discussion sections (7.1±0.3), and homework (6.7±0.5) were the 3 most useful components. All three groups ranked the computer coaches as more useful than the tutor room staffed by graduate teaching assistants (ranked either 8th (L and H) or 10th (M) out of 10).
Likewise, 63% of the M group and 70% of the H group responded "Agree" or "Strongly agree" on a 5-point Likert scale to the statement "The computer coaches helped improve my conceptual knowledge of physics." Indeed, the absolute FCI gain for the H (21%±5%) and M (19%±5%) groups was markedly higher than that of the L group (12%±5%), with no significant difference between the gains of male and female students.
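For readers unfamiliar with the term, the sketch below shows one common reading of "absolute gain": the mean difference between each student's post-test and pre-test percentage score, expressed in percentage points. That this is the definition used here is an assumption, and the function name `absolute_gain` and the sample scores are hypothetical.

```python
from statistics import mean

def absolute_gain(pre, post):
    """Mean of (post - pre) over matched students, in percentage points.

    pre, post: equal-length sequences of percentage scores, where
    pre[i] and post[i] belong to the same student (assumed layout).
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must be matched per student")
    return mean([q - p for p, q in zip(pre, post)])

# Made-up scores for three students.
pre_scores = [40, 60, 50]
post_scores = [60, 75, 70]
print(absolute_gain(pre_scores, post_scores))
```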
From these data, we conclude that, at the end of the course, the target population believed that the coaches were beneficial to their learning. However, this population consisted of two subgroups, the H and M groups, which had dramatically different usage patterns.
Ideally, as students become more competent as well as confident problem solvers, one might expect to see a decrease in the use of the coaches. The fact that the H group not only continued to use almost all the coaches but also, in a large fraction of cases, reported that they "…worked through the computer coaches before trying to solve the problems on my own," indicates that some mechanism is necessary to wean these students from the detailed help provided by the coaches. On the end-of-semester survey, 59% of the M users and 53% of the H users responded "Agree" or "Strongly agree" to the statement "Using the coaches improved my confidence in solving non-coached problems," but this still leaves almost half of the H users in a dependent state.
Other changes to the coaches could address the user interface and the time necessary to complete them. Of the 167 students completing a mid-semester survey, 75% of the H group, 78% of the M group, and 65% of the L group responded "Strongly agree" or "Agree" to the statement "When using the computer coaches, it was usually clear how to proceed." Thus, we conclude that overall, the interface of the coaches was reasonably clear and self-explanatory.
The keystroke data show that the average time to complete a single problem using a coach was less than 31 minutes, comparable to the time spent by students interacting with a human coach in office hours. However, many students thought that the computer coaches took too long. On the mid-semester survey, 49% of the respondents answered "Agree" or "Strongly agree" to the statement "Using the computer coaches for homework made the homework take too long." Furthermore, 37% of the answers to the free-response question "What do you like least about the computer coaches?" mentioned that the computer coaches were either too long or too repetitive. In designing the next iteration, allowing more flexibility in the student pathway through the coaches could address both the time and repetition issues.
To produce the first set of prototype coaches to test with students, the decision structure and its software framework were basically procedural in nature. There were three different types of coaches with differing amounts of flexibility [3] and emphases, but only one type of coach was available for each problem. The most popular and numerous type of coach had limited flexibility in that it would follow reasonable student choices in the problem solution but was rigid in the order of decisions that constituted those choices. Instructors wishing to modify the coaching pattern or build coaches for different problems needed knowledge of the underlying software language, with more significant changes in procedure requiring more sophisticated software knowledge.
The second round of prototype coaches, now under construction, will address these issues of flexibility for the students and ease of modification for instructors. Indeed, the new software framework will allow instructors to build a new coach with no knowledge of the underlying software, using only a graphical user interface (GUI). Although not part of the current study, this requirement was identified in a workshop on the coaches for physics instructors.
The new prototypes will be designed to better address the needs of the high and medium user population by having a decision grain size that is adjustable by instructors or students. This adjustability will allow students to jump to sections of the problem-solving framework that address their issues without repeatedly going through coaching they do not need. For the H group, it will allow step-by-step coaching from the beginning to the end of a problem if desired. However, it will encourage bypassing detailed coaching. This flexibility should also reduce the time spent on the coaches for the students who perceive them as onerous or repetitious. The additional flexibility should also engage students with different learning priorities at different times in the course. They would be able to use the same coach differently at the beginning and at the end of the course. This should be more useful to the M group, whose decrease in usage could be due to a perceived inefficiency of the coaches toward the end of the semester.
These computer coaches will never replace a good human coach in the ability to help students identify and remediate their difficulties. However, when completed, the computer coaches should provide a helpful approximation of the office hour experience, available on demand and with whatever repetition is desired by the student. We expect that most students will still need human intervention provided by the instructor and other students to make significant learning gains. Nevertheless, we hope that computer coaches interacting with students on the internet can be a flexible tool to support the learning of a diverse set of students in the introductory physics course.
This work was supported by the National Science Foundation under DUE-0715615 and DUE-1226197.

TABLE 1. Pre-test scores (as percentages) of the 3 groups.

TABLE 2. Expectations of the 3 groups.
FIGURE 1. Fraction of coaches associated with each test used by students in each group. The tests were given at the end of weeks 4, 7, 11, and 15. The lines are drawn to guide the eye.