Student belonging in STEM courses that use group work

Students’ sense of belonging predicts their success and persistence in STEM courses. Collaborative, small-group activities form the foundation of many research-based instructional strategies. Our broader project seeks to understand the role of small groups in students’ sense of belonging to support instructors in the formation of equitable groups in active engagement classrooms. In this article, we focus on the construct and discrimination validity of a belonging measure. To assess the belonging measure’s ability to discriminate across time, courses, and demographic groups, we administered a short belonging survey as a pre- and post-course assessment in a variety of STEM courses that used group work. We analyzed the results using structural equation modeling to inform the validity of the survey and identify possible differences of interest. The results provided evidence for both construct and discrimination validity. Belonging varied across the courses and changed from pre to post in two of the four courses: one course saw a decrease and the other saw an increase. Men tended to have a higher sense of belonging than women, and the changes in belonging increased these gender differences. One possibility is that the differences observed across courses result from the different practices used to support group work within each course. The validity evidence for the belonging measure indicates it will support our ongoing research to establish the statistical relationships between the practices instructors use to implement and support small groups and students’ sense of belonging.

I. INTRODUCTION
Student-centered, collaborative instruction can lead to more learning and higher grades than lecture-based instruction [19-22]. Such instruction may [23,24] or may not [25] achieve improved equity. These practices often have students work in groups of three to five students. Best practices for small group work are not well established. For example, some studies support the formation of groups with heterogeneous prior achievement [24, 26-28], others argue for creating groups with homogeneous prior achievement [29-32], and a few suggest it does not matter [33-35]. Another common grouping strategy is the avoidance of solo status for minoritized students [36]. Laboratory research and stereotype threat [37-39] both indicate the harm that solo status can cause. We are not, however, aware of any classroom studies that show whether this harm occurs or how large it is.
Our overall project investigates the effectiveness of various grouping strategies instructors can use to form intentional groups that contribute to student outcomes and classroom equity. While our broader project investigates a variety of student outcomes, including self-efficacy, conceptual understanding, and course grade, this article focuses on measuring students' sense of belonging in their class. Belonging is correlated with critical metrics like students' persistence in STEM [1,2]. Specifically, this article focuses on substantiating the construct and discrimination validity of a belonging measure [10] in a variety of STEM courses. Our specific research questions are:
1. To what extent did the measures of belonging show evidence for construct validity?
2. Did the belonging measures differ across courses in four STEM disciplines before and after instruction?
3. Did the belonging measures differ across gender before and after instruction?

II. THEORETICAL FRAMEWORK
We use Connell & Wellborn's self-system model of motivational development, which they based on the fundamental human needs for competence, autonomy, and belonging [40]. This model supposes that the classroom impacts students' self-perception, including their sense of belonging, which in turn impacts their engagement and achievement [3, 40-42]. We further draw on an anti-deficit approach with the perspective that students enter a course with abilities, talents, and capital, and that the institution is then responsible for structuring a course to support all students [43-46].

III. METHODS
We studied instructional practice and students' sense of belonging in four courses. In each course, instructors randomly assigned groups of 3-5 students. All courses used frequent active learning in the classroom and encouraged student-student interactions around content. We surveyed students during the first and last 2 weeks of the course. Table I summarizes group work practices in each course, including the use of undergraduate learning assistants (LAs), who support instructors by co-facilitating small group work [48]. The use of LAs in the classroom has been shown to strengthen students' sense of belonging in STEM [49,50].
Table II gives the course demographics for students who consented to participate. Data collection included many social identity groups, which we grouped in the table due to small sample sizes. The minoritized group included Black, Hispanic, Indigenous, Pacific Islander, and Middle Eastern students. We received responses from non-binary and trans students, but we were unable to perform statistical analysis for these groups due to small sample sizes. The demographics for each class were typical for the corresponding field.
Course A was algebra-based physics I, taken mainly by students in their final two years, with 690 students in health-related fields, at a large 4-year public university that is an emerging Hispanic-Serving Institution (HSI). Each week the course met for two 50-minute lectures (including conceptual questions, worked examples, demonstrations, and think-pair-share iClicker questions) and two discussion section meetings with a TA (including homework help, quizzes, and group work). Students received participation points for working together on problem-solving worksheets (during about 25% of contact time) in assigned groups of 5, with 6 groups per TA.
Course B was a core, upper-level biology course for majors with 62 students at a small, private, 4-year liberal-arts HSI. The course consisted of three 55-minute lectures and a three-hour lab each week. The instructor lectured with frequent short discussions, with about 20% of contact time spent on group work, and there was an out-of-class group assignment [47].

TABLE I. Group work practices in each course.

                                                  Course
Practice                                          A   B   C   D
Learning Assistants (LAs) support group work      -   -   X   X
Groups discuss expectations and draft contract    -   X   -   -
Groups work on structured assignments in class    X   X   X   X
Group members assigned changing roles             -   -   -   X
Groups work together outside of class             -   X   X   X
Peer evaluation                                   -   -   X   -

Course C was a sophomore-level engineering course with sections of about 30 students each (177 total students), at a mid-sized, 4-year, public, emerging HSI. Classes met three times a week for 100 minutes. Instructors presented the topic through an interactive lesson on the whiteboard with example problems. Students used 50% of contact time to work on problems at their tables in groups of about 4 as the instructor and one to two LAs circulated to facilitate group work and engage students in discussion regarding their problem solving. There were about 3-4 groups per LA. In addition to these ungraded group activities, students worked on a graded, out-of-class project.
Course D was the first course in the general chemistry series for life science majors at a 4-year, research-intensive, public, emerging HSI. The course was taught by an instructor, four TAs, and approximately 20 LAs, with around 130 students enrolled in a section (266 total students). The course consisted of three 50-minute lectures and a 110-minute discussion each week during a ten-week quarter. The lecture used think-pair-share and clickers. The discussion section was based on a blended model of Process Oriented Guided Inquiry Learning (POGIL) and Peer-Led Team Learning (PLTL). Students worked in small groups on structured worksheets designed around the learning cycle. LAs supported about two groups each. Teams also completed the second stage of the two midterms together.
We used a six-item instrument to measure social belonging on a six-point Likert scale (strongly disagree, mostly disagree, somewhat disagree, somewhat agree, mostly agree, strongly agree), drawn from Fink et al. [10]. The instrument (see Table III) includes two factors: perceived belonging (first four items) and belonging uncertainty (last two items). Perceived belonging relates to the general feeling of belonging in a course, while belonging uncertainty relates to the stability of that belonging. We administered the pre- and post-course belonging surveys through the LASSO Platform [51] for Courses A, B, and C; Course D administered the surveys through its learning management system.
We used structural equation modeling (SEM) [52] to investigate the three research questions. For RQ1, we used confirmatory factor analysis (CFA) [52] to test the construct validity of the belonging measures for our study population. We first tested the model based on Fink et al. [10] using the lavaan [53] package. When necessary to improve model fit, we used the modindices command. We repeated these steps until the factor loadings and the fit indices passed the cutoffs discussed in the next paragraph.
The root mean square error of approximation (RMSEA), comparative fit index (CFI), Tucker-Lewis index (TLI), and factor loadings informed how well the model fit the data. RMSEA is an absolute fit index that addresses parsimony in the model by accounting for the model's degrees of freedom. High RMSEA indicates an over-constrained model with too few degrees of freedom. RMSEA has several proposed cutoffs, < 0.05, 0.06, 0.08, and 0.10, indicating good to acceptable fit [54]. CFI and TLI are relative fit indices. They compare the test model to a baseline model with all covariances set to zero and all variances freely estimated. Both indices range from 0, no fit, to 1, best possible fit. We used a cutoff of CFI and TLI > 0.95 [54]. Balancing absolute and relative fit indices can lead to a model that is simple and fits well. We report scaled fit indices. Hair et al. [55] propose factor loadings of 0.5, explaining 25% of the variance in the item, as an absolute minimum and 0.7 as a preferred minimum, explaining 50% of the variance.
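To make the fit indices concrete, the following minimal sketch computes RMSEA, CFI, and TLI from model and baseline chi-square statistics using their common textbook definitions, and shows that the share of item variance explained is the squared standardized loading. It is an illustration only, not the analysis code used in this study: the function names and the chi-square values are hypothetical, the sketch uses the N - 1 convention for RMSEA (some software uses N), and it ignores the scaling corrections applied to the reported indices.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA from the model chi-square, its degrees of freedom, and sample size n."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """CFI: model misfit relative to the baseline (zero-covariance) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2_model, df_model, chi2_base, df_base):
    """TLI: compares chi-square/df ratios of the baseline and tested models."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def variance_explained(loading):
    """Share of item variance explained by the factor (squared standardized loading)."""
    return loading ** 2


# Hypothetical values for illustration only (not results from this study).
example = {
    "RMSEA": rmsea(chi2=20.0, df=8, n=500),
    "CFI": cfi(20.0, 8, 1500.0, 15),
    "TLI": tli(20.0, 8, 1500.0, 15),
}
for name, value in example.items():
    print(f"{name}: {value:.3f}")

# Loadings of 0.5 and 0.7 correspond to 25% and about 50% of item variance.
print(variance_explained(0.5), round(variance_explained(0.7), 2))
```

Because RMSEA divides the excess misfit by the degrees of freedom, the same chi-square value yields a lower RMSEA for a model with more degrees of freedom, which is how the index rewards parsimony.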
To test RQ2 and RQ3, we used multiple group SEM. This method built separate SEM models for each course (i.e., multiple groups) to investigate the shift from pre- to post-instruction. To determine if differences existed across courses, we compared these SEMs to a set of SEMs that constrained the intercepts and regression coefficients to the same value across all of the courses. We used the RMSEA, CFI, and TLI to assess the model fit. To determine if the unconstrained multigroup SEMs provided unique information, we compared them to the constrained multigroup SEMs with an ANOVA. We set the intercept for the models to the largest group: pretest for RQ2 and women's pretest for RQ3. The regression coefficients represent differences from the intercept in units of standard deviation (SD). The SEM builds the latent variables as normal continuous distributions. The SEM calculates thresholds for the points on this continuous distribution at which a response would shift (e.g., going from strongly agree to mostly agree), such that our six-response Likert scale had five thresholds. The thresholds tended to cover a span of 3 SD, which means going from strongly disagree to strongly agree is approximately a 3 SD shift. Because of this scale and the spread of the results in the data, we adopted Cohen's [56] rules of thumb for effect sizes: < 0.2 is very small, 0.2 to 0.4 is small, 0.4 to 0.8 is moderate, and > 0.8 is large.
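The threshold idea described above can be sketched as follows. The five threshold values are hypothetical: they are evenly spaced over 3 SD purely for illustration and are not the thresholds estimated in this study. The sketch maps a latent score to one of the six response categories and shows how a shift in the latent mean changes the expected share of each response.

```python
from bisect import bisect_right
from statistics import NormalDist

LABELS = [
    "strongly disagree", "mostly disagree", "somewhat disagree",
    "somewhat agree", "mostly agree", "strongly agree",
]

# Hypothetical thresholds in SD units, spanning 3 SD (an assumption for illustration).
THRESHOLDS = [-1.5, -0.75, 0.0, 0.75, 1.5]


def response_category(latent_value, thresholds=THRESHOLDS):
    """Return the response label for a score on the continuous latent scale."""
    return LABELS[bisect_right(thresholds, latent_value)]


def category_probabilities(latent_mean, thresholds=THRESHOLDS):
    """Expected share of each response for a unit-variance normal latent variable."""
    dist = NormalDist(mu=latent_mean, sigma=1.0)
    cuts = [0.0] + [dist.cdf(t) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cuts, cuts[1:])]


# A 0.5 SD increase in the latent mean moves responses toward agreement.
before = category_probabilities(0.0)
after = category_probabilities(0.5)
for label, p0, p1 in zip(LABELS, before, after):
    print(f"{label:18s} {p0:.2f} -> {p1:.2f}")
```

With five thresholds there are six response regions, so a shift in the latent mean changes the proportions of all six responses at once rather than moving every student by one category.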

IV. RESULTS
Figure 1 presents the perceived belonging and belonging uncertainty (pre and post) for the four courses, constructed by averaging the questions for each factor. Perceived belonging tended towards 'mostly agree' in all courses. Perceived belonging was lower in Course A and highest on the posttest in Course D. The only notable shift in perceived belonging was for Course D, where both the median and interquartile range increased. Belonging uncertainty tended to be between 'somewhat agree' and 'somewhat disagree'. Consistent with perceived belonging, the noticeable shift in belonging uncertainty was for Course D, where both the median and interquartile range shifted down, indicating higher belonging.
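The factor averages plotted in Fig. 1 can be illustrated with a short sketch. It assumes the six responses are coded 1 (strongly disagree) to 6 (strongly agree) and arrive in survey order, with the first four items forming perceived belonging and the last two forming belonging uncertainty. The numeric coding, the function name, and the example responses are assumptions for illustration, not data from the study.

```python
from statistics import mean

# Assumed numeric coding of the six-point scale.
SCALE = {
    "strongly disagree": 1, "mostly disagree": 2, "somewhat disagree": 3,
    "somewhat agree": 4, "mostly agree": 5, "strongly agree": 6,
}


def factor_scores(responses):
    """Average the coded items within each factor for one student.

    responses: six response labels in survey order (items 1-4 perceived
    belonging, items 5-6 belonging uncertainty).
    """
    if len(responses) != 6:
        raise ValueError("expected six item responses")
    codes = [SCALE[r] for r in responses]
    return {
        "perceived_belonging": mean(codes[:4]),
        "belonging_uncertainty": mean(codes[4:]),
    }


# Hypothetical student, for illustration only.
student = [
    "mostly agree", "mostly agree", "strongly agree", "somewhat agree",
    "somewhat disagree", "mostly disagree",
]
print(factor_scores(student))
```

Under this coding, a higher perceived belonging average and a lower belonging uncertainty average both indicate a stronger sense of belonging.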
We tested the two-factor structure of the sense of belonging scale using CFA for the pretest and posttest data. All indices indicated a good fit: CFI pre = 0.997, CFI post = 0.978, TLI pre = 0.982, and TLI post = 0.953 (> 0.950 indicates good fit); RMSEA pre = 0.039 and RMSEA post = 0.059 (< 0.060 indicates good fit). The factor loadings, shown in Table III, were adequate for all items, with the lowest loading for the question about performance in the belonging uncertainty factor.
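The comparison of the reported CFA indices against the cutoffs can be written out explicitly. The values below are the ones reported in the preceding paragraph; the dictionary layout and function name are illustrative assumptions.

```python
# Cutoffs stated in the text: CFI and TLI > 0.95, RMSEA < 0.06.
CUTOFFS = {"CFI": (">", 0.95), "TLI": (">", 0.95), "RMSEA": ("<", 0.06)}

# Fit indices reported for the pretest and posttest CFA.
REPORTED = {
    "pre": {"CFI": 0.997, "TLI": 0.982, "RMSEA": 0.039},
    "post": {"CFI": 0.978, "TLI": 0.953, "RMSEA": 0.059},
}


def meets_cutoff(index, value):
    """Return True if the value is on the good-fit side of its cutoff."""
    direction, cutoff = CUTOFFS[index]
    return value > cutoff if direction == ">" else value < cutoff


results = {
    wave: {index: meets_cutoff(index, value) for index, value in indices.items()}
    for wave, indices in REPORTED.items()
}
for wave, checks in results.items():
    print(wave, checks)
```

All six comparisons pass, although the posttest TLI (0.953) and RMSEA (0.059) sit close to their cutoffs.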
We investigated the change over time in the courses (RQ2) using multi-group SEM, and identified several relationships that also appear in the average scores in Fig. 1. We did not include Course B in the analysis because no students responded 'strongly disagree' or 'disagree' for several questions on belonging. The ANOVA between the constrained and unconstrained SEMs was statistically significant (p < 0.001), indicating variation across the courses. The fit indices for the unconstrained SEM indicated adequate fit: TLI = 1.00, CFI = 1.00, RMSEA = 0.096. The regression coefficients in Table IV indicated small to moderate improvements in Course D, little to no change in Course C, and very small decreases in belonging in Course A.
Due to small sample sizes, we restricted our analysis for RQ3 to investigate gender differences in Courses A and D. The multigroup (course) SEM created standardized regression coefficients, shown in Fig. 2 and Table IV, for men's pretest and posttest scores and women's posttest scores. The intercept was women's pretest scores. In both courses, men tended to report higher belonging and lower belonging uncertainty than women both before and after instruction, in agreement with previous findings [11-13, 57, 58].
In Course A, students' perceived belonging decreased and belonging uncertainty increased. Perceived belonging for men decreased moderately (0.3 SD), and this decrease was statistically significant. The other shifts, while all consistent with this decrease, were much smaller. The shifts in Course D were also consistent, in that perceived belonging increased and belonging uncertainty decreased, and the shifts and differences across gender were much larger than in Course A. Women started (0.3 SD) and ended (0.5 SD) with higher belonging uncertainty than men, and belonging uncertainty decreased for both men (0.4 SD) and women (0.2 SD). Women started (0.2 SD) and ended (0.5 SD) with lower perceived belonging than men, and perceived belonging increased for both women (0.6 SD) and men (0.9 SD). Both the improvements in belonging and the greater improvements for men were consistent across perceived belonging and belonging uncertainty. The shifts from pre to post were small to large in size. While belonging improved in Course D, these improvements were larger for the men in the course than for the women.
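The Course D values reported above can be checked with simple arithmetic: the posttest gender gap equals the pretest gap plus the difference between the two groups' shifts. The sketch below performs that check and labels each shift with the effect-size rules of thumb from Sec. III. The function names are illustrative, and assigning a value of exactly 0.4 to "moderate" is an assumption, since the stated ranges share that endpoint.

```python
def effect_size_label(shift_sd):
    """Label a shift in SD units using the rules of thumb stated in Sec. III."""
    size = abs(shift_sd)
    if size < 0.2:
        return "very small"
    if size < 0.4:
        return "small"
    if size <= 0.8:  # assumption: the shared 0.4 endpoint counts as moderate
        return "moderate"
    return "large"


def post_gap(pre_gap, shift_higher_group, shift_lower_group):
    """Posttest gap between the group that starts higher and the group that starts lower."""
    return pre_gap + shift_higher_group - shift_lower_group


# Course D belonging uncertainty: women start 0.3 SD above men;
# women shift by -0.2 SD and men by -0.4 SD.
uncertainty_gap = post_gap(0.3, -0.2, -0.4)

# Course D perceived belonging: men start 0.2 SD above women;
# men shift by +0.9 SD and women by +0.6 SD.
perceived_gap = post_gap(0.2, 0.9, 0.6)

print(round(uncertainty_gap, 1), round(perceived_gap, 1))
for shift in (-0.2, -0.4, 0.6, 0.9):
    print(shift, effect_size_label(shift))
```

Both gaps come out at 0.5 SD, matching the reported posttest differences, and the four shifts span the small to large categories.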

V. DISCUSSION AND CONCLUSIONS
The CFA provided evidence for the construct validity of the belonging survey. All of the indicators of model fit met or exceeded the minimum values. The belonging measure also captured differences across courses, shifts in belonging for groups of students, and differences across demographic groups. The two latent factors, perceived belonging and belonging uncertainty, had very different average responses. Perceived belonging was mostly positive across courses, with students tending to 'mostly agree' that they belonged in the course. Responses to belonging uncertainty tended towards 'somewhat disagree', with a larger spread in all of the courses than the perceived belonging responses. The changes also varied across the courses, but were consistent for both perceived belonging and belonging uncertainty. In Courses B and C, there was little shift in either perceived belonging or belonging uncertainty. A small negative shift occurred in Course A. The largest changes occurred in Course D, with an increase in perceived belonging and a decrease in belonging uncertainty.
The most noteworthy shifts in student belonging occurred in Course D, in which perceived belonging increased and belonging uncertainty decreased for both men and women. On the other hand, Course A experienced shifts in the opposite direction, though they were not as large. We suspect the use of instructional practices oriented toward supporting group work (shown in Table I) may contribute to the students' sense of belonging in these courses, especially given that Course A made only limited use of structured group assignments in class. Prior research suggests that one of these practices, the use of LAs, may be particularly effective at improving students' sense of belonging [49,50]. However, Course C experienced at most small shifts, despite its use of LAs. Because there are many other differences between the courses, including institution and field of study, we hope that expanding the courses in our dataset will help identify the factors contributing to belonging. The gender differences present in the chemistry course (D) and not the physics course (A) stand out from prior work showing larger gender differences on many affective outcomes in physics than in chemistry [59,60]. This finding also contrasts with recent work showing that rotating roles can benefit women in particular [61,62], as only Course D used this practice. While these results indicate discrimination validity in the belonging measure, the small sample in this study limits our ability to investigate the causes of these differences. The results do point to the need for the larger study across the intersections of discipline, gender, and race/ethnicity that we are pursuing.
These findings support our use of this belonging measure in our larger study of instructor practices to support group work. The construct and discrimination validity indicate the instrument can identify differences across courses and likely could identify differences between groups or students randomly assigned to different conditions. Ongoing data collection will support several investigations. Larger samples will allow modeling the relationships between the instructor practices in Table I and student belonging, which we can only speculate upon here based on prior research. Ongoing research will also look at differences in belonging and learning outcomes for groups. We will investigate whether groups with homogeneous or heterogeneous prior performance lead to different student outcomes, and we will examine outcomes for students in solo-status conditions due to either their race or gender.

FIG. 1. Belonging across the four institutions constructed from the averages of the questions in the two factors, which are shown in Table III. The boxplots show the distribution of the data with outliers as large black dots. The light grey dots represent individual students with a slight jitter to spread out the data.

FIG. 2. Belonging for Courses A and D for men and women. The scores are standardized within each course and compared to the pretest value for women, which is constrained to zero. Error bars represent 2 SE. If the error bars for two coefficients do not contain the point estimates, the difference is likely statistically significant.

TABLE II. Demographic distribution by course for participants.

TABLE III. Questions on the belonging survey with their factor loadings for the pretest and posttest.

TABLE IV. SEM regression coefficients for both models. The intercept was set to either the pretest or the pretest for women.