				ANALYZE 1.00
     This program is for teachers who use machine-graded multiple-choice 
tests.  It analyzes the output from test scanners.  Although developed for 
OptiScan 5 and Sentry 7001 scanners (both products of National Computer 
Systems), it is designed to read the output from other scanners as well.  It 
requires an EGA monitor and a PC.

FEATURES
--------
     The program is user-friendly, flexible, fast, and fun.  Output options
that are selected from the screen include: 

     1)  Basic Printout              Alphabetically and numerically sorted
                                     scores, statistics, and a bar graph.

     2)  Item Analysis               Percent correct, distribution of
                                     results, and discrimination for each
                                     item.

     3)  Student Responses           A compact record of each student's 
                                     responses, so that the original score
                                     sheets can be thrown out.

     4)  Personal Reports            Designed to be cut out and given to 
                                     students.  Shows answer key, number and
                                     percent they got right, and their 
                                     responses.

     5)  Discrimination Explanation  A one page explanation of what 
                                     discrimination is and how it is 
                                     calculated.

     6)  Wide-output                 This format will fit on "green-bar"
                                     computer paper used by mainframe
                                     computer printers.  If desired, the 
                                     program will start a terminal emulator,
                                     such as KERMIT, when analysis is done.

     7)  Choose Answer Key File      If the answer key contains errors, 
                                     just correct it and re-scan it.  No
                                     need to re-scan the students' sheets.

A "Format Finder" program determines the file structure of your input data 
files, and writes an initialization file.  This initialization file lets 
ANALYZE know how to interpret your data files.  (Without an initialization 
file, ANALYZE will recognize two formats used with our 7001 and two used 
with our OptiScan 5.)  The initialization file also lets you set defaults, 
such as whether to always print a one-page explanation of how discrimination 
is calculated.  The defaults can be easily overridden by pressing F1, F2, 
etc., as prompted.  The program has been written by a teacher for teachers.  
It has been used for over a year at our school.

INPUT
-----
     The input is limited to 400 score sheets of 200 questions each.  The 
input file consists of records.  The contents of each score sheet may be on 
either one or two records.  The contents of the answer key sheet must be at
the start of a file.  A record for a 20-item test might look like this:

    000245#455 0001  JOHN DOE     12254 434*2552112543       T0014
                    |-Name field-|-----Response field-------|

Only the information in the Name and Response fields is used; the other
information is ignored.  The response field can contain integers from 1 to 
5, asterisks, and blanks.  The asterisk indicates that more than one answer 
was marked (question 10).  A blank means no answer was given (question 6).  
Individual responses are not separated by spaces, tabs or quotation marks.
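     As a rough illustration of these rules, the Python sketch below (not 
part of ANALYZE, which is written in QuickBasic) labels each character of a 
response field.  The function name and the labels it returns are made up 
for this example.

```python
def classify_responses(response_field):
    """Label each character of a response field.

    Returns one entry per question: an integer 1-5 for a single marked
    answer, "multiple" for an asterisk, or "blank" for a space.
    """
    marks = []
    for question, ch in enumerate(response_field, start=1):
        if ch in "12345":
            marks.append(int(ch))
        elif ch == "*":
            marks.append("multiple")
        elif ch == " ":
            marks.append("blank")
        else:
            raise ValueError(f"unexpected character {ch!r} at question {question}")
    return marks


# Response field from the 20-item example record above.
marks = classify_responses("12254 434*2552112543")
print(marks[5], marks[9])  # entries for question 6 and question 10
```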
     I have tried to make the input format flexible.  If your scanner is 
incompatible with the above restrictions, let me know.  I have some ideas 
for making the input more flexible, but I don't know what other scanners are 
out there, so I will wait until I learn of a need.

OUTPUT
------
     The following is an actual printout with the blank lines at the end of 
pages removed.  (Note that Reliability = -.2 for this test, rather than the 
usual .8 to .95; the input file was generated with random numbers.)


******************************************************************************
Name on Answer Key:  MUSCLES IN BACK       Number of Sheets Graded:  7 

Arranged by Score                        Arranged Alphabetically
                    CORRECT ANSWERS                          CORRECT ANSWERS
Name                number  percent      Name                number  percent
      
Fackler John            3    30.0        Fackler John            3    30.0
Wilkinson G Rhodium     3    30.0        Matthews Virgil         3    30.0
Matthews Virgil         3    30.0        Reibenspies Joe         5    50.0
Roundhill David M       4    40.0        Rook Susan              6    60.0
Reibenspies Joe         5    50.0        Roundhill David M       4    40.0
Swartz Gary             5    50.0        Swartz Gary             5    50.0
Rook Susan              6    60.0        Wilkinson G Rhodium     3    30.0

           Average:    4.1  41.4%
     Highest Score:    6    60.0%
      Lowest Score:    3    30.0%
Standard Deviation:    1.2  12.1%
Number of Problems:   10

                    Relative Distribution of Scores
  0      0 -  10% 
  0     10 -  20% 
  3     20 -  30% ******************************************
  1     30 -  40% **************
  2     40 -  50% ****************************
  1     50 -  60% **************
  0     60 -  70% 
  0     70 -  80% 
  0     80 -  90% 
  0     90 - 100% 

Page   2  MADEUP   MUSCLES IN BACK                                    12-18-1993
                                ITEM ANALYSIS

       Correct               Distribution        Answer      Discrimination
            on 
Item   %     #        A    B    C    D    E Blank Key NetD   A   B   C   D   E
  1   57.1    4       -    1    1    1    4        E  .01       .1 -.2 -.0  .0
  2  100.0    7       -    -    2    2    3        *   0           -.0 -.5  .5
  3*# 14.3    1       2    1    1    1    1    1   D -.20   .7  .1 -.2 -.2 -.0
  4   28.6    2       2    1    3    1    -        A  .28   .3 -.2  .0 -.2    
  5   42.9    3       3    1    1    2    -        A  .03   .0 -.2  .1 -.0    
  6*   0.0    0       -    -    4    1    2        B   0            .5 -.0 -.5
  7   57.1    4       2    -    4    -    -        C  .16  -.0      .2        
  8   42.9    3       -    1    3    3    -        C  .19      -.0  .2 -.2    
  9 # 28.6    2       -    2    1    2    2        D  .05       .7 -.2  .0 -.5
 10   42.9    3       1    -    -    3    3        E  1    -.2         -.5  1.
*The most difficult problems.
#A wrong answer was chosen significantly more often by high-scoring students
than by low-scoring students.

                                     A's    B's    C's    D's    E's   Blank
Distribution of Answers on Key:     20.0%  10.0%  20.0%  20.0%  20.0%    -
Distribution of Student Responses:  14.5%  10.1%  29.0%  23.2%  21.7%   1.45%  
Student responses with more than one answer marked (indicated by "?"):   1 

95% Confidence interval for a student's "true ability":  +/- 3 (or +/-26%) 
from the current score, based on Reliability (Kuder-Richardson Method):  -.21

                            -1<0     0<.2   .2<.4   .4<.6   .6<.8   .8<1
      NetD Distribution:     10%     70%     10%      0%      0%     10%
      Average Net Discrimination:  0.15
      90% Confidence interval for a discrimination of zero:  -.53 to .53

Item discrimination, NetD, is the square of the correlation coefficient between
test scores and an item score.  It is scaled to range from -1 to +1.  A negative
discrimination means more low-scoring students chose a response than did high-
scoring students.

INFO:  The answer key can contain blanks, which are not scored, and asterisks,
which cause all responses to be scored as correct.  (An item is marked with an
asterisk by marking more than one response.)  If the answer key contains a
mistake, it is not necessary to rescan the students' response sheets.  Just
correct the key and rescan it.

Page   3  MADEUP   MUSCLES IN BACK                                    12-18-1993
                                        STUDENTS' RESPONSES
                              Dash if Same Response as on Answer Key
                                           
                                        1 1
     Problem Number:  1 2 3 4 5 6 7 8 9 0 1
                      
         Answer Key:  5 * 4 1 1 2 3 3 4 5  

Fackler John          3 4 - 3 2 5 - 4 5 4  
Wilkinson G Rhodium   - 4 3 2 4 5 ? - 5 1  
Matthews Virgil       - 3   4 - 3 1 4 3 4 5
Roundhill David M     4 5 5 3 - 4 - 2 - 4  
Reibenspies Joe       2 5 1 - 4 3 1 - - -  
Swartz Gary           - 3 2 - 3 3 - 4 2 -  
Rook Susan            - 5 1 3 - 3 - - 2 -  

Page   4  MADEUP   MUSCLES IN BACK                                    12-18-1993

Fackler John              Score:  3 out of 10 (30.0%)
                        5   10
      Your answer:  34-325-454 
    Answer on key:  5*41123345 

Matthews Virgil           Score:  3 out of 10 (30.0%)
                        5   10
      Your answer:  -3 4-314345
    Answer on key:  5*41123345 

Reibenspies Joe           Score:  5 out of 10 (50.0%)
                        5   10
      Your answer:  251-431--- 
    Answer on key:  5*41123345 

Rook Susan                Score:  6 out of 10 (60.0%)
                        5   10
      Your answer:  -513-3--2- 
    Answer on key:  5*41123345 

Roundhill David M         Score:  4 out of 10 (40.0%)
                        5   10
      Your answer:  4553-4-2-4 
    Answer on key:  5*41123345 

Swartz Gary               Score:  5 out of 10 (50.0%)
                        5   10
      Your answer:  -32-33-42- 
    Answer on key:  5*41123345 

Wilkinson G Rhodium       Score:  3 out of 10 (30.0%)
                        5   10
      Your answer:  -43245?-51 
    Answer on key:  5*41123345 
******************************************************************************
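
     The summary figures in the printout above can be recomputed from the 
"Your answer" lines of the personal reports.  The Python sketch below is not 
taken from ANALYZE.BAS.  It assumes the standard deviation uses n - 1 in the 
denominator, that "Kuder-Richardson Method" means the KR-20 formula, and 
that the confidence interval is 1.96 standard errors of measurement.  These 
are assumptions, chosen because they reproduce the printed values.

```python
from math import sqrt
from statistics import mean, stdev

KEY = "5*41123345"

# "Your answer" lines from the personal reports above; "-" means the
# response matched the key.  Matthews' unscored 11th mark is left off.
ANSWERS = {
    "Fackler John":        "34-325-454",
    "Matthews Virgil":     "-3 4-31434",
    "Reibenspies Joe":     "251-431---",
    "Rook Susan":          "-513-3--2-",
    "Roundhill David M":   "4553-4-2-4",
    "Swartz Gary":         "-32-33-42-",
    "Wilkinson G Rhodium": "-43245?-51",
}


def item_results(key, answers):
    """Return 1 or 0 for each scored item; blank key positions are skipped."""
    results = []
    for key_mark, mark in zip(key, answers):
        if key_mark == " ":
            continue                      # blank on key: not scored
        if key_mark == "*":
            results.append(1)             # asterisk on key: all correct
        else:
            results.append(int(mark in ("-", key_mark)))
    return results


rows = [item_results(KEY, a) for a in ANSWERS.values()]
totals = [sum(row) for row in rows]
n_items = len(rows[0])

average = mean(totals)
sd = stdev(totals)                        # n - 1 in the denominator

# KR-20: k/(k-1) * (1 - sum(p*q) / variance of the total scores)
p = [sum(column) / len(rows) for column in zip(*rows)]
sum_pq = sum(pi * (1 - pi) for pi in p)
kr20 = n_items / (n_items - 1) * (1 - sum_pq / sd ** 2)

# 1.96 standard errors of measurement, as a percent of the test length
half_width_pct = 1.96 * sd * sqrt(1 - kr20) / n_items * 100

print(totals)
print(round(average, 1), round(sd, 1), round(kr20, 2), round(half_width_pct))
```

Under these assumptions the sketch gives scores of 3, 3, 5, 6, 4, 5, 3 (in 
alphabetical order), an average of 4.1, a standard deviation of 1.2, a 
reliability of -0.21, and an interval of +/-26%, in agreement with the 
printout.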


CONTENTS OF THE ANALYZE.ZIP FILE
--------------------------------
     ANALYZE.EXE     The ANALYZE program
     ANALYZE.BAS     Source code in QuickBasic
     ANALYZE.TXT     This file
     ANALYZFF.EXE    ANALYZE's FORMAT FINDER.  Helps make ".ini" files.
     ANALYZFF.BAS    Source code in QuickBasic
     DATA1           Sample input data files in four different formats
     DATA2             "
     DATA3             "
     DATA4             "
     SCAN.BAT        A program to run a scanner, then run ANALYZE
     ZERORCRD.INI    An initialization file with zero record lengths

RUNNING ANALYZE
---------------
     Just type ANALYZE at the DOS prompt.  The program will prompt for an 
input file.  It will propose an output file name, then allow options to be
set.  One option is to specify a different filename for the answer key file.
Press enter after choosing this option, then enter the key filename.  
Filenames (and paths) can also be entered on the command line:
       ANALYZE InputFile OutputFile AnswerKeyFile InitializationFile.ini
The command ANALYZE /? gives a little help information.

THE INITIALIZATION FILE
-----------------------
     Each line of the .ini file (except blank lines and comment-only lines) 
has the format: 

              PARAMETER = VALUE   ' Comments can follow an apostrophe.

This can be seen in the first part of the ANALYZE.INI file:

F1Default = 1  ' 0 = no item analysis    1 = analysis     2 = analysis only
F2Default = 1  ' 0 = no personal reports 1 = make reports 2 = reports only
F3Default = 1  ' 0 = no responses        1 = responses    2 = responses only
F4Default = 0  ' 0 = no text             1 = text         2 = text only
F5DefaultPrintWide = false        ' if true then wide output is default
LinesPerNormalPage = 66           ' valid numbers are 25 to 200
NoGraphicsWhenNormalWidth = false ' true if can't print extended characters.
LinesPerWidePage = 80             ' valid numbers are 25 to 200
ColumnsPerWidePage = 132          ' valid numbers are 80 to 300
CommandLinesIfWideOutput = true   ' If false, prints wide on default printer
CommandLine1 = cd\kermit
CommandLine2 = kermit
Info1 = INFO:  The answer key can contain blanks, which are not scored, and asterisks,
Info2 = which cause all responses to be scored as correct.  (An item is marked with an 
Info3 = asterisk by marking more than one response.)  If the answer key contains a
Info4 = mistake, it is not necessary to rescan the students' response sheets.  Just
Info5 = correct the key and rescan it.
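
     A line of this form can be split into its parts as in the Python sketch 
below.  This is an illustration only, not the parsing code in ANALYZE.BAS.  
The function name is made up, and the special handling of Info and 
CommandLine values is an assumption based on the apostrophe in the Info4 
line ("students'"), which would otherwise be read as the start of a comment.

```python
def parse_ini_line(line):
    """Return (parameter, value), or None for blank and comment-only lines."""
    text = line.strip()
    if not text or text.startswith("'"):
        return None
    name, sep, value = text.partition("=")
    if not sep:
        return None
    name = name.strip()
    # Assumption: Info and CommandLine values are free text, so an
    # apostrophe in them (as in "students'") is not a comment marker.
    if not name.startswith(("Info", "CommandLine")):
        value = value.split("'", 1)[0]
    return name, value.strip()


print(parse_ini_line("LinesPerNormalPage = 66           ' valid numbers are 25 to 200"))
print(parse_ini_line("CommandLine1 = cd\\kermit"))
```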

     The default start-up settings can be changed.  For example, the default
is to print a 66-line explanation of discrimination.  Teachers who are new 
to the program should get this explanation, but after a while I would set 
F4Default = 0 (no text).
     LinesPerNormalPage could be changed to, say, 60, which is the default 
setting for our HP IIP+ Laserjet printer.  Without a font cartridge, the 
Laserjet can't handle extended characters; if your printout shows stray 
characters in place of graphics, set NoGraphicsWhenNormalWidth = true.  Lines 
and columns for 
wide output could be adjusted to match, say, a condensed-mode printer 
setting.  The wide output format is double-spaced in places to make it more 
readable, and does not use any extended characters.  The wide output uses 
the entire width of the page for the listing of student responses.  The 
score tabs will switch from 60 to 100 problems per line at 
ColumnsPerWidePage = 120.  If you want the wide output to print on the same 
printer as normal output, instead of executing CommandLine1 and CommandLine2, 
set CommandLinesIfWideOutput = false.
     The five lines of text that begin with INFO (at the end of Item 
Analysis in the sample output) tell teachers about the options 
available with their scanner.  The content of these lines can be changed by 
editing the lines Info1 through Info5 in the .ini file.  The rest of the
.INI file is used to define the file structure of input files.

ANALYZE'S FORMAT FINDER
-----------------------
     The program ANALYZFF will help you figure out where name and response 
fields begin and how long they are, and it will write an initialization file
to disk for you.  When it starts it gives an explanation and suggests a 
strategy to use.  FORMAT FINDER makes initial estimates of parameter values.
These estimates are correct for the four data files that are included.  The 
next section does not need to be understood to use FORMAT FINDER.

DEFINING DATA FORMATS
---------------------
     If your scanner's output does not meet the restrictions mentioned 
earlier--perhaps using spaces, tabs, or quotes in the response field, or 
using A, B, C, D, and E instead of 1, 2, 3, 4, and 5--I recommend you check 
your software program.  Your test scanner may have many output formats 
available to it; ours does.  You may be able to change your output format.  
If not, send me a message.  The following explains the parameters used 
in .ini files to describe the file structure.
	
     The first time the program runs, it asks to create an initialization
file, ANALYZE.INI, in the root directory.  (Unfortunately, the software does 
not use the DOS PATH setting.)  The part of ANALYZE.INI that defines the 
third data format is shown below.  This file structure is present in the 
file DATA3.

   Format3Length1 = 221
   Format3Length2 = 221
   Format3NameStart = 1
   Format3NameLength = 21
   Format3ResponseStart = 22
   Format3ResponseLength = 200
   Format3ShortFirst = 0  ' set to 1 if each record ends with lf lf
   Format3Response2Start = 0 
   Format3Response2Length = 0   ' if not 0, causes a second record to be read.

     The format of an input file is determined from the length of the first 
two records in the file.  In this case the first record has a length of 221; 
the second 221.  (This format will also be identified if the length of the 
second record is zero, which might occur if only an answer key is in a 
file.)  Name starts with the 1st character in the first record, and is 21 
characters long (21 characters is also the maximum that will be printed).  
Responses start with the 22nd character; there are 200 characters in the 
response field (200 is the maximum possible responses).  The last three 
parameters are discussed later.  
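     The two ideas in this paragraph, recognizing a format from the first 
two record lengths and cutting fields out by start and length, can be 
sketched in Python as follows.  The class and function names are invented 
for the example, and the test record is made up; only the Format3 numbers 
come from the listing above.

```python
from dataclasses import dataclass


@dataclass
class RecordFormat:
    length1: int          # FormatNLength1
    length2: int          # FormatNLength2
    name_start: int       # 1-based, as in the .ini file
    name_length: int
    response_start: int   # 1-based
    response_length: int


FORMAT3 = RecordFormat(221, 221, 1, 21, 22, 200)


def matches(fmt, first_length, second_length):
    """True if the first two record lengths identify this format.

    A second record of length zero (a file holding only an answer key)
    is also accepted.
    """
    return first_length == fmt.length1 and second_length in (fmt.length2, 0)


def split_fields(fmt, record):
    """Cut the name and response fields out of one record."""
    name_from = fmt.name_start - 1            # convert to 0-based
    resp_from = fmt.response_start - 1
    name = record[name_from:name_from + fmt.name_length]
    responses = record[resp_from:resp_from + fmt.response_length]
    return name.rstrip(), responses


# Made-up record: a 21-character name field followed by 200 response columns.
record = "MUSCLES IN BACK".ljust(21) + "5341123345".ljust(200)
print(matches(FORMAT3, len(record), 0))
name, responses = split_fields(FORMAT3, record)
print(name, len(responses))
```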
     It happens that each record in DATA3 ends with a carriage return.  How 
can you find this out?  How can you get the record length and positions?  
Run ANALYZE!  If it does not recognize the input file data format, it will 
display the length and first 80 characters of each of the first two records, 
and show what the characters are at the end of the first record.  The screen 
will look something like this:
____________________________________________________________________________
                           Unrecognized input file format.
MUSCLES IN BACK     53543413133543343112
1---5---10----5---20----5---30----5---40----5---50----5---60----5---70----5---80
                              First record length:  270
                      ASCII value of character 270 is 52 (4)
                      ASCII value of character 271 is 13 ()
                      ASCII value of character 272 is 10 ()
                      ASCII value of character 273 is 10 ()
                      ASCII value of character 274 is 68 (D)
             (ASCII values:  carriage return = 13; line feed = 10)
DOE JOHNNY          53142413133523343132 
1---5---10----5---20----5---30----5---40----5---50----5---60----5---70----5---80
                             Second record length:  271
____________________________________________________________________________
(Of course, the format finder program is an easier way to set parameters.)
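
     The same kind of report can be produced outside of ANALYZE.  The Python 
sketch below is only an approximation of the screen above: the function name 
is made up, the record is assumed to end just before the first carriage 
return, and the sample bytes are invented to resemble the example.

```python
def describe_first_record(data, context=4):
    """Report the first record's length and the character codes around its end.

    `data` is the raw file contents as bytes.  The record is taken to end
    just before the first carriage return (an assumption); a file with no
    carriage return raises ValueError.
    """
    length = data.index(b"\r")
    lines = [f"First record length:  {length}"]
    # Last character of the record, then the `context` bytes that follow it.
    for position in range(length, min(length + context, len(data)) + 1):
        code = data[position - 1]             # positions are 1-based
        shown = chr(code) if 32 <= code < 127 else ""
        lines.append(f"ASCII value of character {position} is {code} ({shown})")
    return lines


# Invented data: a 270-character record ending in "4", then cr lf lf.
data = b"A" * 269 + b"4\r\n\nDOE JOHNNY"
report = describe_first_record(data)
print("\n".join(report))
```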

     The file DATA1 has each record terminated with 3 characters:  a 
carriage return (cr) and two linefeeds (lf).  Its format is defined as 
follows:

   Format1Length1 = 135
   Format1Length2 = 136
   Format1NameStart = 7
   Format1NameLength = 20
   Format1ResponseStart = 27
   Format1ResponseLength = 100
   Format1ShortFirst = 1  ' if each record ends with lf lf
   Format1Response2Start = 0 
   Format1Response2Length = 0   ' if not 0, causes a second record to be read.

This termination makes the definition of 
record length a bit more complex, because the second linefeed is considered 
the first character of the second record.  As you can see in the format 
definition, the second record is 1 longer than the first record.  This makes 
the position of Name and Response different in the first record than in all 
the rest.  To handle this, the parameter ShortFirst is set to 1, WHICH 
SUBTRACTS 1 FROM NameStart AND FROM ResponseStart FOR THE FIRST RECORD ONLY.  
NameStart AND ResponseStart REFER TO POSITIONS IN THE SECOND (or fourth) 
RECORD, NOT THE FIRST.  (ShortFirst = -12, say, could be used if the first 
record contained 12 characters more than the following records.  ShortFirst 
cannot be used to skip past an entire first record.)  
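
     The effect of ShortFirst can be illustrated with a short Python sketch.  
It is not the code used in ANALYZE.BAS: the function name is made up, the 
two-sheet file is invented, and records are assumed to be split at each 
cr lf pair so that the second linefeed leads the next record.

```python
def field_positions(record_number, name_start, response_start, short_first):
    """Return 1-based (name, response) start positions for a record.

    record_number counts from 1.  NameStart and ResponseStart describe the
    later records; ShortFirst is subtracted for the first record only.
    """
    shift = short_first if record_number == 1 else 0
    return name_start - shift, response_start - shift


# Invented two-sheet file in the DATA1 layout: records end with cr lf lf.
body1 = "00001" + "KEY".ljust(20) + "1" * 100 + "0" * 10
body2 = "00002" + "DOE JOHN".ljust(20) + "2" * 100 + "0" * 10
data = body1 + "\r\n\n" + body2 + "\r\n\n"

# Assumption: records are split at cr lf, so the second lf leads the next record.
records = data.split("\r\n")[:2]
names = []
for number, record in enumerate(records, start=1):
    name_at, resp_at = field_positions(number, 7, 27, 1)
    names.append(record[name_at - 1:name_at - 1 + 20].rstrip())

print([len(r) for r in records], names)
```

With ShortFirst = -12 the same function moves both positions 12 characters 
to the right for the first record, for example from 7 and 27 to 19 and 39.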

     The format below defines the data format for DATA2:

   Format2Length1 = 145
   Format2Length2 = 86
   Format2NameStart = 2
   Format2NameLength = 20
   Format2ResponseStart = 22
   Format2ResponseLength = 125
   Format2ShortFirst = 1  ' if each record ends with lf lf
   Format2Response2Start = 2 
   Format2Response2Length = 75  ' if not 0, causes a second record to be read.

This time Response2Length is not zero, which causes more responses to be read 
from a 
second record.  To the program, the input file appears as follows:

 First record:    "Name on key         1243RESPONSES ARE HERE3321..."
Second record:    "1432425123214323432412321414241213214421413221..."
 Third record:    "First Student      124533223411523211134222334..."
Fourth record:    "1432424123233123243212321314234133213441413422..."

The second linefeed becomes the first character in the next record (the 
carriage return and linefeed characters do not print, so they are not 
visible above).  Response2Start = 2 
indicates that the second set of responses begins with the 2nd character; 
Response2Length indicates that this set of responses is 75 characters long.  
The total number of responses is 200.  If Response2Length = 0 then the 
second record is expected to contain a name and responses from the next 
score sheet.
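
     Joining the two parts of a score sheet might look like the Python 
sketch below.  The function name and the sample records are made up; the 
positions are the Format2 values shown above.  The ShortFirst adjustment for 
the first record of a file is left out to keep the example short.

```python
def read_sheet(first, second, name_start, name_length,
               response_start, response_length,
               response2_start, response2_length):
    """Combine the two records that hold one score sheet.

    All start positions are 1-based, as in the .ini file.
    """
    name = first[name_start - 1:name_start - 1 + name_length].rstrip()
    part1 = first[response_start - 1:response_start - 1 + response_length]
    part2 = second[response2_start - 1:response2_start - 1 + response2_length]
    return name, part1 + part2


# Invented sheet laid out like a later (not first) pair of records in DATA2;
# each record begins with the linefeed left over from the previous one.
first = "\n" + "First Student".ljust(20) + "1" * 125
second = "\n" + "2" * 75 + "0" * 10
name, responses = read_sheet(first, second, 2, 20, 22, 125, 2, 75)
print(name, len(responses))
```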

     Format4 defines the format used for DATA4.  Each record in this file 
ends with cr,lf.  You can view the parameters defining it in ANALYZE.INI.
Be careful if you edit these files.  EDIT in DOS 6.0 changes the last
characters in records that don't end in cr,lf.  Notepad in Windows 3.1 is 
safe to use.

USING .INI FILES
----------------
     It might occur that two different data formats have the same record 
lengths.  These could be distinguished by using different initialization 
files, say SCANNER1.INI and SCANNER2.INI.  If these files were in the 
directory TOOLS, then the commands
                           ANALYZE \TOOLS\SCANNER1.INI
                                      and
                           ANALYZE \TOOLS\SCANNER2.INI
could be used to select which data format to use.  
	
     The initialization file only needs to include lines that are different
from the default values.  (In fact, the initialization file can be empty.)  
An example of a short initialization file is ZERORCRD.INI, which is 
included.  This defines the record length as zero for all data formats.  It 
can be used to get the file structure of the included data files.  For 
example, the record length of DATA1 is normally recognized by ANALYZE, but 
ZERORCRD.INI sets the defaults to zero, so if the command 
                        ANALYZE ZERORCRD.INI DATA1 
is given, the file format is not recognized, and instead the file structure
of DATA1 is shown.

SCAN.BAT
--------
     This file is an example of how ANALYZE can be run automatically after
scanning.  The command 
                              SCAN KING1203
starts the scanning software on our OptiScan 5.  We use the teacher's first 
four letters, then the month and day as names for scored test files.  The 
output file has .DAT appended, giving in this case KING1203.DAT as the 
scanner output file, and ANALYZE input file.
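
     The naming convention can be written out as a small Python sketch.  
This is not part of SCAN.BAT; the function name is made up, and the year in 
the example is arbitrary because only the month and day appear in the name.

```python
from datetime import date


def scan_file_name(teacher, test_date):
    """First four letters of the teacher's name, then month and day, then .DAT."""
    return f"{teacher[:4].upper()}{test_date:%m%d}.DAT"


print(scan_file_name("King", date(1993, 12, 3)))  # KING1203.DAT
```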

ANALYZE.BAS
-----------
     This is the source code for the program.  It was written using 
Microsoft's QuickBasic 4.5, which ran me $75 in 1990.  On my 386, the 
program takes 75 seconds to analyze 400 tests of 200 questions each.  This 
includes all the options, and generates 196 pages of output (99 pages using 
the wide-output format).  


     If you get this program to work, or if you don't, I would be happy to 
hear from you.  If you create a .ini file, I would appreciate getting a copy 
of the .ini file, a data input file, and the brand of scanner.  It could be 
included with this package.  (Set file type to binary if you send a data 
input file.)  If you have a suggestion, or find an input format that won't 
work, let me know; perhaps I can update the program.   
					
                        Christopher King
                        West Virginia State College
                        Department of Chemistry
                        Institute, WV 25112-1000
                        Internet Address:  IN%"KING@WVSVAX.WVNET.EDU"