Standardized Testing in Physics via the World Wide Web

 

Dan MacIsaac

Rebecca Pollard Cole

David M. Cole

 

 

Northern Arizona University

 

This is a preprint of a manuscript submitted for publication in the Journal of Research in Science Teaching ©1999 by the National Association for Research in Science Teaching.

 

Running Head:  Paper- vs. Web-based Testing in Physics

 

 

Reprint requests should be sent to Dan MacIsaac, Department of Physics and Astronomy, Northern Arizona University, Flagstaff, AZ, 86011-6010. This research was partially supported by funding from the Arizona Collaborative for Excellence in the Preparation of Teachers (ACEPT) and NAU Organized Research Grant funds. The authors wish to acknowledge the helpful contributions of Nate Davis and Brian Nance, who assisted with HTML and data coding and were funded by an NAU Hooper Undergraduate Research Fellowship. Valuable comments and suggestions regarding the statistical comparisons of item response patterns were provided by Professor Philip Sadler of Harvard University.


Abstract

On-line web-based technologies provide students with the opportunity to complete assessment instruments from personal computers with internet access. The purpose of this study was to examine the differences in paper-based and web-based administrations of a commonly used assessment instrument, the Force Concept Inventory (FCI). Results demonstrated no appreciable difference in FCI scores or FCI items based on the type of administration. A four-way ANOVA (N = 376) demonstrated differences in FCI scores due to different sections of the same course, different courses, and gender. However, none of these differences was influenced by the type of test administration. Similarly, FCI student scores were comparable with respect to both test reliability and predictive validity. For individual FCI items, paper-based and web-based comparisons were made by examining potential differences in item means and by examining potential differences in response patterns. Chi-square tests demonstrated no differences in response patterns, and t tests demonstrated no differences in item means between paper-based and web-based administrations. In summary, the web-based administration of the Force Concept Inventory appears to be as efficacious as the paper-based administration.


Since the late 1970s, science educators have been experimenting with the use of microcomputers for the conceptual and attitudinal assessment of their students (Arons, 1984, 1986; Bork, 1981; Waugh, 1985). Since the late 1980s, multiple-choice, machine-scored, standardized instruments have been developed to assess the conceptual and attitudinal state of introductory physics students. The Force Concept Inventory (FCI), perhaps the best known of these standardized instruments, assesses students' conceptual knowledge of physics (see Hestenes, Wells & Swackhamer, 1992). Recently, Redish, Saul, and Steinberg (1998) developed the Maryland Physics Expectations Survey (MPEX), a standardized instrument which assesses the attitudinal state of physics students. Both the FCI and the MPEX are widely used in the physics education research community (Hake, 1998).

Data from these instruments can provide valuable information for both research and teaching. For example, the instruments can be used to assess physics learning, to justify and guide interventions in physics teaching practices, to evaluate introductory physics programs, and to compare student learning and attitudinal outcomes. However, each of these instruments requires approximately thirty minutes to complete. Additionally, in both research and teaching situations, the instruments are typically given both pre- and post-instruction. Each instrument can therefore consume a full hour of valuable instructional time. Further, additional resources are required to score, collate, record, and analyze the instrument data. Both the loss of instructional time and the administrative overhead may discourage the regular use of these instruments by many introductory physics instructors (Hake, 1998).

Recently available “on-line” or web-based technologies provide students with the opportunity to complete assessment instruments from personal computers with internet access (Titus, Martin & Beichner, 1998). While not as sophisticated as advanced computer-adaptive testing (CAT) of the sort recently adopted for tests like the Graduate Record Examinations (Straetmans & Eggen, 1998), web testing could still greatly reduce the administrative and class time burden required for the application of standardized instruments. Furthermore, new kinds of data (such as question latency data) could be collected for improving the instruments themselves, and data from on-line instruments could be readily collated in databases for long-term studies of student learning.

To be widely used, the web-based administration of these instruments must be characterized in terms of reliability, and results from the web-based administration of these instruments must be statistically compared to results from standard paper administration. Once measurements from web-based administrations have been characterized in this way, they can be corrected or calibrated to paper-based administrations where necessary. Therefore, the purpose of this study is to begin this process by examining the differences in paper-based and web-based administrations of the Force Concept Inventory.

 

Method

Participants

The participants in the study were students from three introductory physics courses taught at a medium-sized university in the southwest during the Spring of 1998 and the Fall of 1998. The first two courses, General College Physics I (Physics 111) and General College Physics II (Physics 112), comprise the two-semester algebra-based sequence for non-science majors. Students in these two courses were mostly pre-health professions, biology, and education majors. The third course, University Physics I (Physics 161), is part of the three-semester calculus-based sequence for science majors. Students in this course were mostly science (e.g., physics, chemistry) and engineering majors.

The participants made up a sample of 376 students, 235 (62.5%) women and 141 (37.5%) men. As the majority of the students were Caucasian and in the age range of 18 to 22, age and ethnicity were not considered further.

Instruments

The Force Concept Inventory (FCI) is a 30-item multiple-choice test which "requires a forced choice between Newtonian concepts and common-sense alternatives" (Hestenes, Wells, & Swackhamer, 1992, p. 142). The concepts tested include kinematics, Newton's First, Second and Third Laws, the superposition principle, and forces. Data from the FCI and related instruments have now been collected and published for thousands of students (Hake, 1998). The Maryland Physics Expectations Survey (MPEX) is a 34-item Likert instrument with 5 attitudinal subscales (Redish, Saul, and Steinberg, 1998). In this study it was used as a filler task and was not analyzed further.

 

Procedure

This study used a quasi-random, quasi-experimental design. During the Spring of 1998, one section of Physics 112 and one section of Physics 161 participated in the study. During the Fall of 1998, one section each of Physics 111, Physics 112, and Physics 161 participated. In total, five sections of three different courses participated. For simplicity, these will be referred to as classes. Each class section was divided into two equal (within one student) half-class groups by selecting every second name in alphabetical order from the roster. During the first week of each semester, thirty minutes of class time were devoted to testing. In each class, one half-class group completed a paper-based FCI and was then asked to complete the web-based MPEX in the next seven days. The other half-class group completed a paper-based MPEX and was then asked to complete the web-based FCI in the next seven days.
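The half-class assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the roster names, and the choice of which half receives the paper-based FCI are hypothetical, and the paper does not say how ties or name casing were handled.

```python
def split_half_class(roster):
    """Split a class roster into two half-class groups by taking
    every second name in alphabetical order."""
    ordered = sorted(roster, key=str.casefold)  # case-insensitive alphabetical order
    group_a = ordered[0::2]  # 1st, 3rd, 5th, ... names
    group_b = ordered[1::2]  # 2nd, 4th, 6th, ... names
    return group_a, group_b


# Hypothetical roster, for illustration only.
roster = ["Nguyen", "Alvarez", "Smith", "Begay", "Chen"]
paper_fci_group, web_fci_group = split_half_class(roster)

# The two groups never differ in size by more than one student.
print(len(paper_fci_group), len(web_fci_group))
```

Because the split alternates through a sorted list, the group sizes are equal for an even roster and differ by exactly one for an odd roster, which matches the "equal (within one student)" description.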

Each student was supplied with the web address for the test appropriate to their assigned half-class group. No training was provided to the students for taking either the FCI or the MPEX on the web. Further, there was no attempt to authenticate the web users. Each student's work was accepted as their own. Overall completion times, submission times and dates were recorded. This information was used to ensure that students took no longer than 30 minutes to complete the test and that they took the test within the seven day period. It should be noted that the web-based format allowed students to retake the test after they received on-line feedback regarding their first submission. The date and time information ensured that the test data used as part of the study was their first submission.
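The screening rules in this paragraph (first submission only, no more than 30 minutes, within the seven-day period) could be applied to logged web submissions roughly as follows. The record layout, field names, and the example start date are assumptions made for this sketch; the paper does not describe how the logs were stored. The sketch also assumes that a student whose first submission is unusable is dropped rather than replaced by a later retake.

```python
from datetime import datetime, timedelta

MAX_DURATION = timedelta(minutes=30)  # time limit stated in the text
WINDOW = timedelta(days=7)            # completion period stated in the text


def first_usable_submissions(submissions, window_start):
    """Keep each student's first submission, and only if it is usable.

    `submissions` is a list of dicts with the hypothetical keys
    'student_id', 'submitted_at' (datetime) and 'duration' (timedelta).
    """
    first = {}
    for sub in sorted(submissions, key=lambda s: s["submitted_at"]):
        # setdefault keeps the earliest record and ignores later retakes
        first.setdefault(sub["student_id"], sub)

    usable = {}
    for student_id, sub in first.items():
        on_time = sub["submitted_at"] <= window_start + WINDOW
        within_limit = sub["duration"] <= MAX_DURATION
        if on_time and within_limit:
            usable[student_id] = sub
    return usable


# Hypothetical log entries, for illustration only.
start = datetime(1998, 8, 31, 9, 0)
submissions = [
    {"student_id": "s1", "submitted_at": start + timedelta(days=2), "duration": timedelta(minutes=24)},
    {"student_id": "s1", "submitted_at": start + timedelta(days=3), "duration": timedelta(minutes=10)},  # retake
    {"student_id": "s2", "submitted_at": start + timedelta(days=9), "duration": timedelta(minutes=20)},  # late
    {"student_id": "s3", "submitted_at": start + timedelta(days=1), "duration": timedelta(minutes=41)},  # too long
]
usable = first_usable_submissions(submissions, start)
print(sorted(usable))
```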

All of the tests were graded for completeness and counted as the equivalent of one homework or quiz assignment. With respect to final class grades, students’ participation comprised about 3 points out of one thousand total points, so that completion or non-completion had negligible impact.

 

Results

As a result of the paper-based and web-based administrations, 376 usable tests were collected. Tests that were turned in after the seven-day period or that took longer than 30 minutes to complete were deemed unusable. Student scores on the FCI were calculated as the total number of correct answers, with a maximum possible FCI score of 30. For the entire data set (N = 376), the mean FCI score was M = 13.71 (SD = 6.08). Table 1 presents the means and standard deviations of the Force Concept Inventory for all sections of all of the introductory physics classes tested.
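The scoring rule described above is a simple count of correct answers. The following sketch shows that count together with a mean and standard deviation. The five-item answer key and the responses are invented for illustration (they are not the FCI key), and the use of the sample standard deviation is an assumption, since the text does not say which form was reported.

```python
from statistics import mean, stdev


def fci_score(responses, answer_key):
    """Return the number of responses that match the answer key."""
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)


# Hypothetical 5-item key and responses; the real FCI has 30 items.
answer_key = "ABCDE"
student_responses = ["ABCDE", "ABCAA", "EEEEE"]

scores = [fci_score(r, answer_key) for r in student_responses]
print(scores)        # one total score per student
print(mean(scores))  # M
print(stdev(scores)) # SD (sample form, n - 1 in the denominator)
```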

 

Table 1

Means and Standard Deviations of FCI student scores in all sections of all physics classes.

 

 

                  Spring 1998                 Fall 1998
Course            N      Mean     SD          N      Mean     SD
Physics 111       na     na       na          109     9.11    4.19
Physics 112       38     15.37    6.09         38    13.71    4.16
Physics 161       90     18.17    5.64        101    14.09    5.41

 

 

The purpose of the study was to examine differences in paper-based and web-based administrations of the Force Concept Inventory.  Therefore, several different analyses were conducted. First, total FCI scores were calculated and differences between paper and web were examined. Second, differences in individual items between paper and web were explored. Third, patterns of responses in the individual items were examined to determine if differences existed between paper and web-based administrations. Finally, the predictive validity of the two different FCI administrations on students' course grades was examined. The results of these analyses are reported in the sections which follow.

Paper-based Versus Web-based FCI Student Scores

Data for this study were collected in different sections of three different physics courses (see Table 1). In addition, previous research has indicated differences in FCI scores due to gender. Therefore, to examine differences in paper-based and web-based FCI student scores, a 5 X 3 X 2 X 2 ANOVA was used (5 sections, 3 courses, 2 genders, 2 types of FCI administration). An alpha level of .01 was used for all statistical tests. Significant differences were found for the main effects of section, course, and gender. No significant difference was found for the main effect of FCI administration. Among the first-order interactions, none involving type of FCI administration was significant. Table 2 presents the results of the ANOVA.

 

Table 2

Four-Way ANOVA summary table for section, course, gender, and type of FCI administration

 

Source                       df        MSe          F
course                        2     1684.72      68.09 *
section                       2      421.75      17.05 *
gender                        1      499.79      20.20 *
administration                1       29.06       1.17
course x administration       2       26.79       1.08
section x administration      2       41.45       1.68
gender x administration       1         .14        .01

 

*p < .01
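As a reading aid for Table 2, each F statistic in a fixed-effects ANOVA is the ratio of the effect mean square to the error mean square. The error mean square is not listed in the table. Assuming the column labeled MSe holds the effect mean squares, the value implied by the tabled entries can be recovered as follows (an inference from the reported numbers, not a figure given by the authors):

```latex
F_{\text{effect}} = \frac{MS_{\text{effect}}}{MS_{\text{error}}}
\quad\Longrightarrow\quad
MS_{\text{error}} \approx \frac{1684.72}{68.09} \approx \frac{499.79}{20.20} \approx 24.7
```

With that value, the administration row is consistent: 29.06 / 24.7 is approximately 1.17, well below the significant F ratios for course, section, and gender.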

 

 

 

 

To further examine potential differences in the student scores, Cronbach's alpha was calculated separately for the paper and web administrations. For the entire sample α = .86 (N = 376), for the paper-based administration α = .86 (N = 212), and for the web-based administration α = .85 (N = 164). These reliability coefficients appear to be comparable.
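For readers unfamiliar with the reliability coefficient used here, the following is a minimal sketch of the standard Cronbach's alpha formula applied to a matrix of item scores (1 = correct, 0 = incorrect). The function name and the tiny data set are hypothetical. The sketch is not the authors' analysis code, and it does not reproduce the values reported above.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-student lists of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    The same variance form is used in numerator and denominator, so the
    choice between population and sample variance does not change alpha.
    """
    k = len(item_scores[0])  # number of items
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from four students on three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

The function assumes that total scores vary across students; if every student has the same total, the denominator is zero and alpha is undefined.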

Paper-Based Versus Web-based Individual FCI Items

Differences in the paper-based and web-based administrations of the FCI for individual items were explored using t Tests. A probability level of .01 was used for all statistical tests. The F statistic was used to determine whether the variances of the paper- and web-based administrations of each item were equal. No significant differences were found for any of the 30 items. Table 3 presents the results of the t Tests.
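The per-item comparison described above can be sketched as follows: a variance-ratio F statistic to check equality of variances, followed by a two-sample t statistic on the item means. The function name and the data are hypothetical, and the sketch stops at the test statistics because the paper does not say which software or which t test variant was used. Both the pooled and the unequal-variance (Welch) forms are shown.

```python
from math import sqrt
from statistics import mean, variance


def compare_item(paper, web):
    """Return (F ratio, pooled t, Welch t) for one item scored 0/1.

    Assumes both groups have nonzero variance, i.e. at least one
    correct and one incorrect response in each group.
    """
    n1, n2 = len(paper), len(web)
    v1, v2 = variance(paper), variance(web)  # sample variances
    f_ratio = max(v1, v2) / min(v1, v2)      # larger over smaller, so F >= 1
    diff = mean(paper) - mean(web)

    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_welch = diff / sqrt(v1 / n1 + v2 / n2)
    return f_ratio, t_pooled, t_welch


# Hypothetical item responses (1 = correct) for the two administrations.
paper_item = [1, 1, 0, 1, 0, 1]
web_item = [1, 0, 0, 1, 0, 1]
f_ratio, t_pooled, t_welch = compare_item(paper_item, web_item)
print(f_ratio, t_pooled, t_welch)
```

When the two groups are the same size, as in this toy example, the pooled and Welch statistics coincide; they differ only in how the degrees of freedom and unequal group sizes are handled.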

 

Table 3

Results of t Tests for paper-based and web-based administrations of FCI items

 

Item        F       prob<F        Item        F       prob<F
Item 1      1.29    .08           Item 16     1.00    .98
Item 2      1.08    .60           Item 17     1.04    .79
Item 3      1.02    .91           Item 18     1.12    .45
Item 4      1.05    .71           Item 19     1.07    .66
Item 5      1.04    .77           Item 20     1.08    .62
Item 6      1.05    .75           Item 21     1.04    .80
Item 7      1.13    .41           Item 22     1.01    .96
Item 8      1.03    .86           Item 23     1.00    .98
Item 9      1.01    .98           Item 24     1.00    .98
Item 10     1.04    .81           Item 25     1.06    .67
Item 11     1.12    .45           Item 26     1.01    .93
Item 12     1.02    .90           Item 27     1.00    .98

Item 13

 

1.04

 

.80

 

 

Item 28