Preliminary results for the development and deployment of the Conceptual Learning Assessment Instruments Methodology Survey (CLAIMS)

Following the creation of the Force Concept Inventory (FCI), many STEM discipline-based education researchers developed their own versions of concept inventories. To encompass all types of concept inventories created, we introduce the term Conceptual Learning Assessment Instruments (CLAIs). Previous research shows much variation in what is considered a CLAI and in the type of evidence used to support the inferences made from the results of CLAIs. As the first part of our study, we created the Conceptual Learning Assessment Instrument Methodology Survey (CLAIMS). The CLAIMS was sent to the developers of over 100 CLAIs identified via a systematic, structured literature review. This paper discusses the research behind the CLAIMS as well as preliminary results for the different CLAIs, specifically differences in the field-testing of the CLAIs.
PACS: 01.40.Fk, 01.40.gf, 01.50.Kw


I. INTRODUCTION
The creation of the Force Concept Inventory in 1992 and its revision in 1995 [1] revolutionized physics education research. With the ability to quantitatively measure student conceptual understanding, Hake's groundbreaking study showed that interactive engagement methods were more effective for teaching mechanics than traditional teaching methods [2]. Seeing this as a powerful tool for evaluating student learning, discipline-based education researchers across the country began creating similar instruments. Almost twenty-five years after the FCI, there are now over a hundred instruments in the different STEM fields. The necessity of quantitatively measuring student success in a number of contexts has led to a diverse group of instruments across fields.
Previous research [3] shows that developers use different methodologies and have different definitions of a concept inventory. While this variety may not be problematic, it may affect the information that can be validly gained from a particular instrument. Validity must be considered as "nothing less than an evaluative summary of both the evidence for, and the actual - as well as potential - consequences of score interpretation and use" [4]. For this reason, we bring attention to the development methodologies and to the evidence gathered to establish validity and reliability, in order to compare a wide variety of instruments. For the purposes of this analysis, we adopt a general term to encompass all definitions of conceptual assessments: Conceptual Learning Assessment Instruments (CLAIs).
This study begins to analyze the methodologies used to create different STEM CLAIs and the evidence supporting inferences made from their results. In Section II, we present the methodologies used to identify appropriate CLAIs and to develop the Conceptual Learning Assessment Instrument Methodology Survey (CLAIMS). Section III presents and discusses the preliminary results of the CLAIMS.

II. METHODOLOGY
A. Identification of CLAIs
We used a detailed literature review to identify as many conceptual instruments as possible and then determined the inclusion criteria for an instrument to be considered a CLAI (see Table 1).

TABLE 1. Inclusion criteria for CLAIs. To be included, an instrument must:
Assess content knowledge
Be a coherent instrument*
Be created for a STEM field
Be published or completed
Be distracter-driven
Be intended for use at the post-secondary level
* We define a coherent instrument to be one whose items were created with the intent that they be used together, in the order in which they appear.
Starting with instruments listed on PhysPort, an AAPT-supported website that has previously collected assessment instruments in Physics and Astronomy, researchers identified 37 different CLAIs in Physics, Math and Engineering. By searching through the references of these collected publications, more potential publications were identified. Researchers repeated this process until it yielded no unrecorded publications.
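The iterative reference search described above can be sketched as a traversal of a citation graph that stops once a pass turns up nothing new. The following is only a minimal illustration under stated assumptions: the function names, the toy citation data, and the publication labels are all hypothetical, and the screening of each publication against the Table 1 inclusion criteria is not modeled.

```python
from collections import deque


def snowball_search(seed_publications, get_references):
    """Follow reference lists outward from the seeds until no
    unrecorded publication remains."""
    found = set(seed_publications)
    queue = deque(seed_publications)
    while queue:
        publication = queue.popleft()
        for reference in get_references(publication):
            if reference not in found:  # an unrecorded publication
                found.add(reference)
                queue.append(reference)
    return found


# Hypothetical toy citation data, for illustration only.
TOY_REFERENCES = {
    "seed_A": ["pub_1", "pub_2"],
    "seed_B": ["pub_2"],
    "pub_1": ["pub_3"],
    "pub_2": [],
    "pub_3": ["seed_A"],  # a cycle back to a seed is handled by the `found` set
}


def toy_get_references(publication):
    """Return the (hypothetical) reference list of a publication."""
    return TOY_REFERENCES.get(publication, [])


all_publications = snowball_search(["seed_A", "seed_B"], toy_get_references)
print(sorted(all_publications))
```

Because every publication is recorded the first time it is seen, the loop ends exactly when a pass through the remaining reference lists yields no unrecorded publications, which mirrors the stopping rule stated above.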

B. Creation of the CLAIMS
The CLAIMS is a 28-item survey designed to probe the methodologies used to develop the CLAIs we collected during our search. A concept inventory expert (Lindell) and a psychometrics expert (Douglas) developed the CLAIMS instrument based on relevant psychometric theory [4,5]. It was sent to the CLAI developers previously identified. The CLAIMS asks developers to provide information on the instrument development methodology, the evidence for the instrument's validity, and any evidence gathered that was not published at the time of development.

A. Preliminary Analysis of CLAIMS
Researchers identified 108 CLAIs (shown in blue in Fig. 1). Developers of the identified CLAIs were contacted via email and asked to fill out the CLAIMS instrument within 10 days. The developers of only 29 CLAIs completed the CLAIMS prior to the deadline (shown in gray in Fig. 1). See Table 2 for the individual CLAIs.
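For reference, the response rate implied by these two counts can be computed directly. The short sketch below uses only the numbers reported above; the variable names are our own.

```python
# Counts reported in the text above.
identified_clais = 108  # CLAIs whose developers were contacted
completed_clais = 29    # CLAIs whose developers completed the CLAIMS by the deadline

response_rate = completed_clais / identified_clais
print(f"Response rate: {response_rate:.1%}")  # prints "Response rate: 26.9%"
```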

Differences in field-testing
Because of the limited scope of this paper, we focus on one piece of the CLAI development methodology, the field-testing of the instruments, as this is a key component of validity arguments. Table 3 shows, at the instrument level, the responses on field-testing for the different CLAIs. The first section shows the level at which a CLAI should be administered according to its developers, while the second section shows the level at which the CLAI was field-tested.
For some instruments there is a discrepancy between these two categories. As discussed in previous work [35], it is not enough to consider classical notions of validity and claim a valid instrument. Rather, it is the evidence collected for the use of a CLAI and the interpretation of its results that supports valid uses of that CLAI. Collecting a sufficient amount of this evidence in field-testing involves both the intended use of the CLAI and the size and demographics of the field-testing population.
Please note the differences in sections 3, 4, and 5 of Table 3. Considering the location of field-testing, 28 CLAIs were field-tested at the developing institution, but only 18 CLAIs were field-tested at a location other than the developing institution. For population size, most instruments were field-tested on a population of 500 or fewer (23 CLAIs), with almost half of those field-tested on fewer than 100 individuals (11 CLAIs). Only 12 CLAIs were field-tested on more than 500 individuals. We recognize that two of the CLAIs were field-tested in upper-level courses, which often have smaller populations.
With the initial analysis of the CLAIMS, we see a discrepancy for some CLAIs between the populations for which the instrument was intended and those on which the instrument was field-tested. Often we see field-testing at the intro-undergraduate level only, likely because of the large, easily accessible population in those courses, but the developers then claim that the instrument was developed for use with other age groups. Only 52% of the reported CLAIs were field-tested on all of the age groups for which their developers say the instrument is appropriate.
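The comparison between intended and field-tested populations can be expressed as a subset check: a CLAI is fully covered only if every intended level also appears among its field-tested levels. The sketch below illustrates the idea with hypothetical responses; the instrument names, the level labels other than "intro-undergraduate", and the data themselves are invented for illustration and are not the CLAIMS responses.

```python
def covers_intended(intended_levels, field_tested_levels):
    """True if every intended level was also field-tested."""
    return set(intended_levels) <= set(field_tested_levels)


# Hypothetical survey responses, NOT the actual CLAIMS data.
responses = [
    {"name": "CLAI-1",
     "intended": {"intro-undergraduate"},
     "field_tested": {"intro-undergraduate"}},
    {"name": "CLAI-2",
     "intended": {"high school", "intro-undergraduate"},
     "field_tested": {"intro-undergraduate"}},
    {"name": "CLAI-3",
     "intended": {"intro-undergraduate", "upper-level undergraduate"},
     "field_tested": {"intro-undergraduate", "upper-level undergraduate"}},
    {"name": "CLAI-4",
     "intended": {"graduate"},
     "field_tested": {"intro-undergraduate"}},
]

fully_covered = [
    r["name"] for r in responses
    if covers_intended(r["intended"], r["field_tested"])
]
coverage_fraction = len(fully_covered) / len(responses)

print(fully_covered)
print(f"{coverage_fraction:.0%} of instruments field-tested on all intended levels")
```

Note that the check is one-directional: field-testing on additional levels beyond those intended does not count against an instrument, whereas a single intended level without field-testing does.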

IV. CONCLUSIONS
This study reveals an important issue with the development and use of CLAIs: any use of a CLAI that cannot be supported by appropriate field-testing is not a valid use. Creators of CLAIs should be aware that they cannot simply establish validity for one population and then claim the instrument is valid for other populations. It is important that evidence for validity and reliability is neither bypassed by developers nor overlooked by users, so that informed decisions can be made about the reliable and valid uses of the CLAIs.

TABLE 2. Instruments whose developers completed the CLAIMS.

TABLE 3. Field-testing comparison for CLAIMS responses.