    

















                                 EPISTAT
                           Statistical Package
                      for the IBM Personal Computer

                               Version 3.0




                      Written by:
   
                         Tracy L. Gustafson, M.D.

                             Copyright 1984
                                            





                              INTRODUCTION

   
        EPISTAT is a collection of programs written in BASICA for
   statistical analysis of small to medium-sized data samples (up to 28
   samples or variables and up to 2000 total data entries per file).
   The 25 programs in EPISTAT perform more than 40 common statistical
   tests or functions and provide utilities for data entry, editing,
   printing, graphing, sorting, selecting, transforming and crosstabs.

        The programs are intended to be as self-explanatory and user-
   friendly as possible.  You do not need to memorize this guide
   before using the programs.  On the other hand, neither the programs
   nor this manual purport to TEACH the proper use or interpretation
   of statistics.  The user must have some familiarity with the kinds
   of data required and the underlying assumptions appropriate to each
   statistical test.


   For further explanations of tests, refer to:

   1.  Colton, Theodore. Statistics in Medicine. Little, Brown and Co.
         Boston, 1974.
   2.  Fleiss, Joseph.  Statistical Methods for Rates and Proportions.
         John Wiley and Sons. New York, 1981.
   3.  Snedecor, George W. and Cochran, William G. Statistical Methods.
         Iowa State Univ. Press. Ames, Iowa, 1978.
   4.  Schlesselman, James. Case-Control Studies. Oxford Univ. Press.
         New York, 1982.










   CAVEAT:
        These programs have been tested extensively, but I cannot 
   guarantee that they will work correctly with every possible data set.
   Incorrect results are usually due to errors in format or type of
   data entered.  If you believe you have discovered an error in the
   programs, please write me.  I intend to correct any bugs that are
   brought to my attention.
        It is good practice to regularly compare the results obtained
   by programs in EPISTAT with results obtained by your previous method
   of calculation.  ANY unexpected result should be questioned and
   double-checked by reference to tables or another method of
   calculation.








                        INDEX TO EPISTAT

   The following statistical tests and functions are available:
                                    
   TEST or FUNCTION                                  PROGRAM NAME
   ----------------                                  ------------
   Analysis of variance (1 and 2-way)...................ANOVA
   Bayes' theorem.......................................BAYES
   Binomial distribution................................BINOMIAL
   Chi-square test and distribution.....................CHISQR
   Correlation coefficients.............................CORRELAT
   F distribution.......................................ANOVA
   Fisher's exact test..................................FISHERS
   Linear regression analysis...........................LNREGRES
   Mantel-Haenszel Chi-square test......................MHCHISQR
   Mantel-Haenszel for multiple controls................MHCHIMLT
   McNemar's test.......................................MCNEMAR
   Mean, median and standard deviation..................DATA-ONE
   Normal distribution..................................NORMAL
   Poisson distribution.................................POISSON
   Random sample generator..............................RANDOMIZ
   Rank sum test........................................RANKTEST
   Rates adjusted (direct and indirect).................RATEADJ
   Sample size calculations.............................SAMPLSIZ
   Signed rank test.....................................RANKTEST
   Student's T-test and T distribution..................T-TEST






   The following data-handling capabilities are provided:

   DATA MANIPULATION                                  PROGRAM NAME
   -----------------                                  ------------
   Determine best test and program names................EPISTAT
   Graph histograms.....................................HISTOGRM
   Graph scattergrams...................................SCATRGRM
   Perform data transformations.........................LNREGRES
   Print data (sorted or input order)...................DATA-ONE
   Print crosstab reports...............................XTAB
   Select specific records..............................SELECT
   Transfer data between EPISTAT files..................FILETRAN
   Transfer data from FORTRAN to EPISTAT files..........FORTRANS




                     SYSTEM REQUIREMENTS FOR EPISTAT

               MINIMUM                            OPTIMAL
         IBM PC with 64K RAM                IBM PC with 96K RAM
         One 160K disk drive                Two 320K disk drives
         Monochrome monitor                 Color graphics adapter
         BASICA                             Hi-res color monitor
                                            BASICA
                                            IBM, Epson, Okidata, or
                                            C. Itoh Prowriter printer
                                            with graphics capability




                       OVERALL PROGRAM DESCRIPTION
   

        All calculations in EPISTAT are performed using single precision.
   Although it may first appear that double precision would be more 
   appropriate for statistical tests, "double" precision makes little or
   no real improvement in precision in these programs.  Many of the
   algorithms used to evaluate p values use trigonometric functions which
   are calculated in single precision anyway.  For best results, data
   entries should be numbers between 1E+7 and 1E-7.  Larger or smaller
   numbers should be multiplied by an appropriate power of 10 before
   entry and analysis in EPISTAT.


        All EPISTAT programs are written so that as much pertinent
   information about the test as possible can fit on the final screen.
   This feature allows a summary printed copy to be produced simply by
   pressing <Shift-PrtSc>.  This will work any time there is a pause in
   the program display.  Six programs, "DATA-ONE", "HISTOGRM",
   "RANDOMIZ", "SCATRGRM", "SELECT", and "XTAB" produce printed reports
   without using <Shift-PrtSc>.  In these, follow program instructions
   to route output to your printer.
   

        EPISTAT is the introductory program in the EPISTAT package.
   DATA-ONE is the major data entry, editing, and printing program.
   Most of the programs in EPISTAT can evaluate data entered and saved
   using DATA-ONE.  Many of the programs can, in addition, evaluate
   summary data.  The programs marked with a star (*) below can
   evaluate data entered in DATA-ONE.  Non-starred programs provide
   their own data entry routines.



        The EPISTAT disk should be placed in drive A (or other default
   drive) when loading any program because "EPIMRG" and "EPISETUP.DAT"
   are used by every program.  Once a program is running, EPISTAT can
   be removed from drive A if necessary.



                    INDIVIDUAL PROGRAM DESCRIPTIONS


    (1)                        "EPISTAT"
        This introductory program lists the available programs and aids
   the user in selecting the best statistical test.  It also allows one
   to specify hardware configuration and colors for a color monitor.
   Choose colors 7,0,0 if you have a monochrome monitor connected to
   the color/graphics adapter.  If yours is not one of the listed printers,
   check your printer's codes for the typeface you want.  For example,
   the code for elite type on the Prowriter is ESC "E".  If you press
   Escape then E, the display will show the decimal ASCII codes: 27 69.
   An alternate method is to press <Alt> and enter the decimal code on
   the numeric keypad.  Press <Enter> when the complete code is entered.

    (2)                        "DATA-ONE" *

   DATA ENTRY:
        This is the central keyboard data entry program for the EPISTAT
   package (for non-keyboard data entry, see FILETRAN and FORTRANS).
   Initial data entry (Option 1) first asks you to name your samples or
   variables.  Then type in the data, pressing <Enter> twice after each
   entry.  The maximum number of samples or variables (S) allowed is
   28 with a color adapter and 7 with a monochrome adapter.  The maximum
   number of records in each sample is 2000/S.  A blank record can be
   entered by pressing <Enter> then key F2.  To exit, press <Enter> then
   key F10.  The mean, median and (n-1) standard deviation are then
   displayed.  When you return to the main menu, SAVE your datafile to 
   disk (Option 5) for future modification or use by other programs
   in the EPISTAT package.
        Although all entries in a datafile are treated as numbers by 
   DATA-ONE, it is possible to enter characters (names) in a record.
   Characters will be treated as zeros in calculations.  Nevertheless,
   it improves data readability to use the "Sample 1" column for record
   or case names.  Thus, DATA-ONE allows one to specify a name for each
   column (variable) and each row (case) in the datafile.
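
   The summary figures shown after data entry can be checked by hand.
   The following is a minimal Python sketch, not EPISTAT's own BASICA
   code; the sample values are hypothetical.

```python
import statistics

# Hypothetical sample values; any list of numbers would do.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)
median = statistics.median(sample)
# statistics.stdev divides by (n - 1), matching the "(n-1)" standard
# deviation described above.
sd = statistics.stdev(sample)

print(mean, median, round(sd, 4))
```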

   DATA MODIFICATION:  
        APPEND (Option 2) allows one to add more observations to a sample
   at a later session.  EDIT (Option 3) allows one to delete or replace
   incorrect data entries and to change sample or variable names.  When
   you return to the main menu, SAVE modified data to disk again.

   
   PRINTING DATA:
        To view or review a datafile, a printout to screen or printer can
   be selected (Option 4).  To print a datafile exactly as it was keyed in,
   request the printout in INPUT order.  DATA-ONE can also print the
   data SORTED by any selected sample.  Only numeric data is sorted by 
   DATA-ONE, so it will not alphabetize a character field.  Blank records
   are not sorted, either.

   SAVING DATAFILES and LOADING DATAFILES:
        SAVING data (Option 5) writes your data to disk in a sequential
   file for later editing, review, or use by another program.  DATA MUST
   BE SAVED TO DISK before it can be used by other programs in EPISTAT.
   Since EPISTAT must be in drive A: (or other default drive) to begin,
   you will probably want to SAVE datafiles on drive B.  To do so,
   precede each datafile name with B: (e.g. B:TESTDATA).  Do not enclose
   filenames in quotation marks.

 

    (3)                        "ANOVA" *

        Provides ONE-way and TWO-way analysis of variance.  One-way ANOVA
   compares the means of 3 or more samples.  Two-way ANOVA compares the
   combined effects of 2 variables on a third (ROW and COLUMN effects).
   All samples in two-way ANOVA must have the same number of elements.
   ANOVA prints sample means, (n-1) variances and sums of squares.
   It also evaluates a known F value. (Snedecor, pp. 258-338)
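
   As an illustration of the one-way calculation, the following Python
   sketch computes the F ratio from the between-sample and within-sample
   sums of squares.  It is an assumption about the standard textbook
   method, not a transcription of the ANOVA program; the function name
   and data are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-sample sum of squares: spread of sample means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-sample sum of squares: spread of values around their own sample mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df_b, df_w)
```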

    (4)                        "BAYES"

        Using Bayes' theorem, this program calculates the rates of false
   positive and false negative tests given different sensitivities,
   specificities and outcome incidences.  Using the formula in a different
   way, it calculates the posterior probability of several outcomes given a
   positive test. (Fleiss, p. 5)
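
   A minimal Python sketch of the underlying arithmetic follows.  It
   applies Bayes' theorem to one test and one outcome; the function
   name and the input values are hypothetical, and BAYES itself may
   present its results differently.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV): probability of the outcome given a positive
    test, and probability of no outcome given a negative test."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


ppv, npv = predictive_values(0.90, 0.95, 0.01)
# 1 - ppv is the share of positive tests that are false positives.
print(round(ppv, 4), round(1 - ppv, 4))
```

   With a rare outcome (1%), most positive tests are false positives
   even when the test is quite sensitive and specific.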

    (5)                       "BINOMIAL"

        The binomial distribution allows calculation of the probability
   of an observed number compared to the expected.  It assumes the variable
   is dichotomous and has the same probability of occurring in every trial.
   It calculates the ONE-tailed probability of the observed number and all
   more extreme situations.  For example, the ONE-tailed probability of 2
   heads in 10 tosses of a coin is the sum of the probabilities for 0, 1 and
   2 heads.
   (Colton, p. 151)
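
   The coin-toss example can be reproduced with a few lines of Python.
   This is a sketch of the lower-tail sum only; the function name is
   hypothetical.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """Probability of k or fewer successes in n trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# 0, 1 or 2 heads in 10 tosses of a fair coin: (1 + 10 + 45) / 1024
prob = binomial_lower_tail(2, 10, 0.5)
print(prob)
```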

    (6)                        "CHISQR"

        The Chi-square program evaluates a table of data or a known
   chi-square value.  2 by 2 tables are evaluated using Yates' correction
   and the odds ratio and its confidence limits are calculated using
   Cornfield's method (Schlesselman, pp. 175, 177).  A Chi-square test
   for trend can also be performed. (Schlesselman, p. 201)
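
   For a 2 by 2 table with cells a, b (first row) and c, d (second row),
   the Yates-corrected statistic and the odds ratio can be sketched in
   Python as below.  Cornfield's confidence limits require an iterative
   solution and are not shown.  The function names and the table are
   hypothetical.

```python
from math import erfc, sqrt


def yates_chi_square(a, b, c, d):
    """Chi-square for a 2 by 2 table with Yates' continuity correction."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi_square_p_1df(chi2):
    """Upper-tail probability of chi-square with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


chi2 = yates_chi_square(10, 20, 30, 40)
p_value = chi_square_p_1df(chi2)
ratio = odds_ratio(10, 20, 30, 40)
print(round(chi2, 4), round(p_value, 3), round(ratio, 4))
```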

    (7)                       "CORRELAT" *

        Pearson's correlation coefficient and Spearman's rank correlation
   assess the relationship between paired variables.  The probability
   of a given Pearson R value is evaluated using the T distribution.
   (Colton, p. 212)
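
   A Python sketch of Pearson's R and the T statistic used to evaluate
   it (with n-2 degrees of freedom) follows.  The data and function
   names are hypothetical; Spearman's rank correlation is not shown.

```python
from math import sqrt


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_from_r(r, n):
    """T statistic for testing R against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
t = t_from_r(r, len(x))
print(round(r, 4), round(t, 4))
```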

    (8)                       "FILETRAN" *

        On occasion, you may need to compare 2 samples that are in separate 
   datafiles.  Or you may have a data set with more than 28 variables that
   you split between two or more datafiles.  EPISTAT programs will only
   compare samples that are in the same datafile, so FILETRAN allows you to
   transfer samples between two datafiles.  You may create a new datafile by
   selecting one sample from DATAFILE #1 and another from DATAFILE #2.
   FILETRAN can also combine two samples by APPENDING one to the other.
   
    (9)                       "FISHERS"

        Fisher's exact test evaluates 2 by 2 tables of discrete variables.
   It is particularly valuable when the Chi-square test is inappropriate
   because the expected value for a cell is < 5.  The program is not
   limited to small tables, however: it can evaluate some tables where
   A+B+C+D > 200.
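
   The exact probability of a 2 by 2 table with fixed margins comes
   from the hypergeometric distribution.  The Python sketch below sums
   the observed table and the more extreme tables in one direction
   (smaller A).  Which tail FISHERS reports is not stated here, so
   treat the direction, the function names and the data as assumptions.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2 by 2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_lower_tail(a, b, c, d):
    """Sum the probabilities of the observed table and all tables with smaller A."""
    total = 0.0
    while a >= 0 and d >= 0:
        total += table_probability(a, b, c, d)
        # Move one count out of A and D into B and C; margins stay fixed.
        a, b, c, d = a - 1, b + 1, c + 1, d - 1
    return total


p_exact = fisher_lower_tail(1, 9, 11, 3)
print(p_exact)
```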


    (10)                       "FORTRANS"

        If your data was previously entered into a FORTRAN or any other  
   sequential card image file, FORTRANS may be able to transform it into an 
   EPISTAT datafile.  You must know the record length (including spaces, 
   carriage return and line feeds), the columns that contain data you want to 
   transfer, the number of decimal places, and missing value codes.  FORTRANS 
   can also extract selected data items from DBASE(R) "SDF" type files and 
   from LOTUS(R) "PRN" print files and place them in an EPISTAT datafile.
   Follow the on-screen directions to transfer your data.
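
   To show what the record layout information is used for, here is a
   Python sketch that pulls fields out of one fixed-width record.  The
   record, column positions, implied decimal places and missing value
   code are all hypothetical, and FORTRANS may handle them differently.

```python
def parse_field(record, start, end, decimals=0, missing=None):
    """Extract columns start..end (1-based, inclusive) as a number.

    decimals is the number of implied decimal places when the field has
    no explicit decimal point; missing is the code for a missing value.
    """
    text = record[start - 1:end].strip()
    if text == "" or (missing is not None and text == missing):
        return None
    if "." in text:
        return float(text)
    return int(text) / 10 ** decimals


# Hypothetical layout: columns 1-4 age, 6-9 pressure (1 implied decimal),
# columns 11-14 weight with 999 meaning "missing".
record = "  42 1375  999"
age = parse_field(record, 1, 4)
pressure = parse_field(record, 6, 9, decimals=1)
weight = parse_field(record, 11, 14, missing="999")
print(age, pressure, weight)
```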
      
    (11)                      "HISTOGRM" *

        The histogram program graphs a data sample according to user
   specifications on the high resolution graphics screen.  To obtain
   a printed copy on the IBM, Epson, Okidata or Prowriter (specified in
   "EPISTAT") press key F1.  Press F10 to return to the program.

    (12)                      "LNREGRES" *
   
        Linear regression analysis calculates the least-squares regression
   line for paired samples.  It then uses the T distribution to determine
   if the calculated slope is significantly different from zero. (Colton,
   p. 199)  LNREGRES also provides a variety of data transformations.
   Transformed data can be saved to disk for future use or printout.
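
   The least-squares line and the T statistic for its slope can be
   sketched in Python as follows.  This is the standard textbook
   calculation under hypothetical names and data, not the LNREGRES
   source.

```python
from math import sqrt


def least_squares(x, y):
    """Return (slope, intercept, t) where t tests the slope against zero
    with n - 2 degrees of freedom."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance about the fitted line.
    residual_var = (syy - slope * sxy) / (n - 2)
    t = slope / sqrt(residual_var / sxx)
    return slope, intercept, t


slope, intercept, t_slope = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(slope, 4), round(intercept, 4), round(t_slope, 4))
```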

    (13)                      "MHCHISQR"

        The Mantel-Haenszel Chi-square test evaluates the relationship
   between two discrete variables while controlling for the effect of
   a third variable.  It also calculates an odds ratio and 95% confidence
   limits. (Schlesselman, pp. 183,206)

    (14)                      "MHCHIMLT" *

        The Mantel-Haenszel Chi-square test for multiple controls compares
   a case sample with 2 or more matched control samples, and calculates
   a probability and an odds ratio. (Fleiss, p. 125)  MHCHIMLT can
   evaluate summary data or raw data entered using DATA-ONE.  If using
   DATA-ONE, data should be coded as "1" for factor present, and "0" for
   factor absent in each case and control sample.

    (15)                      "MCNEMAR"

        McNemar's test (paired Chi-square test) evaluates 2 by 2 tables
   of paired discrete variables using Yates' correction and calculates
   an odds ratio and 95% confidence limits. (Schlesselman, p. 210)
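
   Only the two discordant cells of the paired table enter the
   calculation.  A Python sketch, with hypothetical counts b and c for
   the two kinds of discordant pairs (confidence limits not shown):

```python
def mcnemar(b, c):
    """Return (chi-square with Yates' correction, odds ratio) from the
    two discordant-pair counts of a paired 2 by 2 table."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    odds_ratio = b / c
    return chi2, odds_ratio


mc_chi2, mc_or = mcnemar(15, 5)
print(mc_chi2, mc_or)
```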

    (16)                       "NORMAL" *

        The normal distribution has innumerable uses in statistics.  This
   program specifically addresses three situations: (1) It compares
   a sample mean to a population mean. (2) It calculates the proportion
   of samples that would be expected to fall in any given range under
   the normal curve.  (3) It calculates the two-tailed probability
   associated with any given value of z.
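
   Situations (1) and (3) can be sketched in Python with the
   complementary error function.  The function names and the example
   figures are hypothetical.

```python
from math import erfc, sqrt


def z_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """z score of a sample mean against a known population mean and SD."""
    return (sample_mean - pop_mean) / (pop_sd / sqrt(n))


def two_tailed_p(z):
    """Two-tailed probability associated with a given value of z."""
    return erfc(abs(z) / sqrt(2))


z = z_for_sample_mean(105, 100, 15, 36)
p_two = two_tailed_p(z)
print(z, round(p_two, 4))
```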



    (17)                      "POISSON"

        The Poisson distribution applies to dichotomous variables when
   the number of successes can be counted, but the number of failures
   cannot.  This program calculates a ONE-tailed probability.
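
   A Python sketch of a one-tailed Poisson probability follows.  It
   assumes the tail of interest is "the observed count or more" given
   an expected count; the function name and figures are hypothetical.

```python
from math import exp, factorial


def poisson_upper_tail(observed, expected):
    """Probability of `observed` or more events when `expected` are expected."""
    lower = sum(exp(-expected) * expected**i / factorial(i) for i in range(observed))
    return 1 - lower


p_poisson = poisson_upper_tail(5, 2.0)
print(round(p_poisson, 5))
```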

    (18)                      "RANDOMIZ"

        This random sample generator aids in the selection of random
   samples for several purposes.  It can provide a random subset of a 
   larger population, or it can assign cases randomly to independent or
   paired groups for case-control studies.

    (19)                      "RANKTEST" *

        Two non-parametric tests of significance are performed by this
   program.  They are appropriate for small samples which are clearly NOT
   normally distributed.  They also specifically apply when quantitative
   variables are not available but qualitative ranks are.  The RANK SUM
   TEST compares 2 independent samples.  The SIGNED RANK TEST compares the
   medians of paired samples.  RANKTEST calculates the TWO-tailed
   exact probability associated with the various rank sums.  Note that
   for samples larger than 20 observations, the latter calculation can
   take several minutes. (Colton, pp. 219-222)


    (20)                      "RATEADJ" *

        The rate adjustment program will adjust sample rates by either
   the direct or indirect methods. (Colton, pp. 47-51)  For the direct
   method, the datafile must include the study sample rates and the
   standard population figures.  For indirect adjustment, the datafile
   used must include the study population figures and the standard
   population rates.  For indirect rate adjustment, RATEADJ evaluates
   the probability of the observed number of cases using the ONE-tailed
   Poisson distribution for small numbers, or the Chi-square 
   distribution for large numbers.
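
   The two adjustment methods can be sketched in Python as below.  The
   strata, rates and population figures are hypothetical, and the
   significance test applied by RATEADJ is not shown.

```python
def direct_adjusted_rate(study_rates, standard_pops):
    """Apply study sample rates to the standard population, stratum by stratum."""
    expected = sum(r * p for r, p in zip(study_rates, standard_pops))
    return expected / sum(standard_pops)


def indirect_adjustment(observed_cases, standard_rates, study_pops):
    """Apply standard rates to the study population.

    Returns (expected cases, observed/expected ratio)."""
    expected = sum(r * p for r, p in zip(standard_rates, study_pops))
    return expected, observed_cases / expected


# Two hypothetical age strata.
adjusted = direct_adjusted_rate([0.001, 0.005], [8000, 2000])
expected_cases, ratio_oe = indirect_adjustment(14, [0.002, 0.01], [1000, 500])
print(round(adjusted, 5), round(expected_cases, 3), round(ratio_oe, 3))
```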

    (21)                       "SAMPLSIZ" 

        The sample size program calculates the approximate sample sizes
   required to achieve statistical significance given certain specified
   levels of certainty.  Adjustments are made if the user desires more
   than one control per case. (Schlesselman, p. 168)

   For a survey:   TP = total population    pi = population proportion
                   d = maximum acceptable error in sample proportion

                     n = [ z(a)*SQR(pi*(1-pi)) / d ] squared
                            N = n / (1+n/TP)

   For a paired case-control study:  (Colton, p. 161)

    N = [(z(a)*SQR(pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT))) / (PT-pi)] squared

   For an unpaired case-control study: (Fleiss, p. 41)

       [ z(a)*SQR(2*pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT)+PC*(1-PC)) ]
   N = [-----------------------------------------------------------] squared
       [                        (PT - PC)                          ]
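
   As a worked example of the survey formula, the Python sketch below
   computes n and then the adjusted size N for a finite population.
   The function name and the input values are hypothetical; z(a) is
   taken as 1.96.

```python
from math import sqrt


def survey_sample_size(z_alpha, pi, d, total_population):
    """Return (n, N): the unadjusted sample size and the size adjusted
    for a finite total population TP."""
    n = (z_alpha * sqrt(pi * (1 - pi)) / d) ** 2
    adjusted = n / (1 + n / total_population)
    return n, adjusted


n_raw, n_adjusted = survey_sample_size(1.96, 0.5, 0.05, 10000)
print(round(n_raw, 2), round(n_adjusted, 2))
```

   In practice the result would be rounded up to the next whole number.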





    (22)                       "SCATRGRM" *

        The scattergram program graphs paired variables according to 
   user specifications on the hi-res graphics screen.  To add the linear
   regression line, press key F5.  To obtain a printed copy on the IBM,
   Epson, Okidata or Prowriter (specified in "EPISTAT"), press key F1.
   Press key F10 to return to the program.

    (23)                       "SELECT" *

        This program allows the user to select any combination of 
   records for printout.  It can also create a new disk datafile that
   is a select subset of the original.  One can select on any variable
   with "AND" and "OR" specifications.  As many as 10 selection criteria
   can be set at one time.  SELECT assumes that "AND"s are in parentheses.
   For example:
     "SELECT IF Sample #1>10 AND Sample #2=1 OR Sample #1<Sample #3"
   is interpreted as meaning:
     "SELECT IF (Sample #1>10 AND Sample #2=1) OR Sample #1<Sample #3"

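   The same precedence rule holds in many programming languages.  As
   an illustration only (SELECT is not written in Python), the sketch
   below evaluates the example criteria for one hypothetical record and
   shows that the other possible grouping gives a different answer.

```python
# One hypothetical record: Sample #1 = 5, Sample #2 = 1, Sample #3 = 8.
s1, s2, s3 = 5, 1, 8

# "and" binds more tightly than "or", as in SELECT.
as_written = s1 > 10 and s2 == 1 or s1 < s3
and_grouped = (s1 > 10 and s2 == 1) or s1 < s3
# The alternative reading, which SELECT does NOT use.
or_grouped = s1 > 10 and (s2 == 1 or s1 < s3)

print(as_written, and_grouped, or_grouped)
```

   The record is selected under the first two readings and rejected
   under the third.
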
    (24)                         "T-TEST" *

        The Student's T-test compares the means of two samples.  The
   program provides both paired and unpaired T-test calculations.
   Variances (n-1) are displayed and, for independent samples, the
   equality of variances is tested to be sure that the assumptions
   of the T-test are met. (Snedecor, p. 116)  T-TEST will also
   evaluate a known T value.
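
   The paired and unpaired statistics can be sketched in Python as
   follows.  The unpaired version assumes equal variances and pools
   them; the function names and data are hypothetical, and the test
   for equality of variances is not shown.

```python
from math import sqrt
from statistics import mean, variance  # variance() divides by (n - 1)


def unpaired_t(a, b):
    """Return (t, degrees of freedom) using the pooled (n-1) variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def paired_t(a, b):
    """Return (t, degrees of freedom) from the pairwise differences."""
    diffs = [x - y for x, y in zip(a, b)]
    t = mean(diffs) / sqrt(variance(diffs) / len(diffs))
    return t, len(diffs) - 1


t_unpaired, df_unpaired = unpaired_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
t_paired, df_paired = paired_t([5, 7, 9, 6], [4, 5, 8, 3])
print(round(t_unpaired, 4), df_unpaired, round(t_paired, 4), df_paired)
```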

    (25)                          "XTAB" *

        The crosstab program generates 1-, 2- or 3-way crosstab reports.
   It allows the user to specify the crosstab criteria as well as a name
   for each row and column so that the report will be readable and
   easily interpreted.

   TRY THE EXAMPLE:                                
        An example datafile, named "EXAMPLE", showing a sample of people,       
   their ages and their systolic blood pressures, is included on the EPISTAT   
   disk.  To gain some familiarity with the appearance of an EPISTAT datafile,  
   follow these steps:
   1.) Press <Ctrl> and <Alt> and <Del> at the same time (or load BASICA, then  
       type RUN "EPISTAT") to run the introductory program.  Do not change the
       default configuration for now, but move on to the main menu.  
   2.) Choose Menu option 3 to run specific programs in the EPISTAT package.
   3.) Choose program number 2 to run "DATA-ONE", the main data entry and       
       printing program in EPISTAT.
   4.) Choose Menu option 6 to load data from disk.  Then enter the filename 
       EXAMPLE without any quotation marks.  
   5.) Return to the main DATA-ONE menu and choose option 4 to print this 
       datafile on your screen or printer.   Print it once in input order, 
       then try printing it sorted by Sample 2 or 3.
   6.) Choose menu option 7 to exit DATA-ONE, then enter Y because EXAMPLE
       was already saved to disk.  Choose other EPISTAT program numbers to    
       run ANOVA, HISTOGRM, LNREGRES, SCATRGRM, or XTAB with this datafile.
   7.) Return to DATA-ONE to enter your own data for analysis.



                               NOTICE

   ---------------------------------------------------------------------
   Users may copy EPISTAT and distribute it to others on the following
   conditions:
     1.  The programs are not modified in any way.
     2.  Individual programs are not distributed separately.
     3.  No fee is charged for copying or distribution.
   ---------------------------------------------------------------------


                     ====USER-SUPPORTED SOFTWARE====                  

        The concept of user-supported software is based on three
   principles:

     1.  The value and utility of a software package is best assessed
         by each user on his or her own system with his or her own data.
         Only after using a program can one determine whether it serves
         one's personal applications, needs, and tastes.
   
     2.  The creation of independent personal computer software requires
         a substantial commitment of time and effort.  Rather than
         replicate this effort time after time, the computing community
         can and should support individual creative efforts.

     3.  By encouraging users to copy programs, rather than spending
         large sums on copy-protection, authors can supply quality
         software at reduced cost.  Users will support useful programs.
                               

        If, after using EPISTAT, you find it of value, your contribution
            in any amount will be appreciated ( $25 suggested ).

   Send contributions to:

                          Tracy L. Gustafson, M.D.
                          1705 Gattis School Road
                          Round Rock, Texas    78664



                                 Thank you.
                                 

