        An assembly language tutor in MAX syntax by Sherman M Barney
        
        This file is not intended to teach everything about assembly 
        language.  It is just a brief introduction.  I suggest lots 
        of reading in books.  

INTRODUCTION
------------
        Assembly language is a way of communicating with a computer
        in its "native tongue".  Each assembly language instruction is
        translated directly into machine language (the binary codes
        which instruct the CPU on what to do) by a program called an
        assembler.  It has been called a "low level" language because
        it "speaks" to the computer at its level.  In contrast, high
        level languages like BASIC attempt to speak to the computer in
        YOUR language.  Some languages are too cryptic to be considered
        anybody's language!

        High level languages and assembly language each have their
        advantages.  Since most of my programming experience is in
        various forms of BASIC, I will make comparisons to it in this
        tutor.  BASIC programs are easier to understand by looking at
        the source code, although they still need comments.  A BASIC
        program will have far fewer lines, but will run slower than
        an assembly language program.  That is because each BASIC line
        is interpreted into many machine instructions every time it
        runs.  Compiled BASIC programs run faster, but still don't
        achieve the fastest or most compact code possible.  Some DOS
        functions can't be called at all from BASIC.

BINARY AND HEXADECIMAL NUMBERS
------------------------------
        Assembly language requires some basic knowledge of how the
        computer hardware is arranged and operates.  All CPU data and
        instructions are in the form of voltage levels which represent
        Binary numbers.  For example, 0 volts may represent a Binary  
        0, and 5 volts may represent a Binary 1.  Binary is the base
        two number system.  It has only the numbers 0 and 1.  The 
        number 10 in Binary has a value of 2 in Decimal.  The number
        100 in Binary has a value of 4 in Decimal.  Each Binary 0 or
        1 is a Binary digit (a bit).  Four bits are called a nibble.
        Eight bits are called a byte.  Sixteen bits are called a word.
        
        The original IBM PC used an Intel 8088 for the CPU (central 
        processing unit).  An instruction in Binary may look like 
        this: 1011001000010000.  In assembly language a Binary number
        is usually followed by a B: 1011001000010000B.

        This is a very difficult form for Humans to work with.  We can
        convert it to base 16 (Hexadecimal or HEX for short) to make 
        it easier.  First separate it into groups of 4 digits (right 
        to left): 1011 0010 0001 0000.  Now assign the Binary value to 
        each column of each group (8 to left column, 4 to the next, 2 
        to the 3rd, and 1 to the right).  This gives us the result:
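        11 2 1 0.  In HEX the numbers 10 through 15 are written as the
        letters A through F, so 11 is B and we have B210.  In BASIC
        this would be written as &HB210.  In assembly language it is
        often written as 0B210H: the H marks it as HEX, and the leading
        0 keeps the assembler from mistaking it for a name.  Notice
        that 4 bits = 1 HEX digit, and 1 byte = two HEX digits.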
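        The grouping method above can be checked with a short Python
        sketch.  Python is used here only as a calculator; it is not
        part of MAX or DEBUG, and the function name binary_to_hex is
        made up for illustration.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a string of 0s and 1s to HEX, one HEX digit per 4 bits."""
    bits = bits.replace(" ", "")
    # Pad on the left so the groups of 4 line up from right to left.
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    digits = "0123456789ABCDEF"
    result = ""
    for i in range(0, len(bits), 4):
        group = bits[i:i + 4]
        # Column values 8, 4, 2, 1 from left to right within each group.
        value = sum(weight for weight, bit in zip((8, 4, 2, 1), group) if bit == "1")
        result += digits[value]
    return result


print(binary_to_hex("1011 0010 0001 0000"))        # B210
print(binary_to_hex("10"), binary_to_hex("100"))   # 2 4
```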

        The HEX code B210 is easier to recognize than the Binary form
        of the number, but it is still not easy enough for efficient
        programming.  The assembly language equivalent is MOV DL,10H.
        This says move the number 10H (16 decimal) into the DL
        register.  Now it is understandable by Humans.  You can convert
        one form to the other using DEBUG.  Use the A (assemble)
        feature to enter the instruction.  Then use the U (unassemble)
        feature to see both forms.  DEBUG treats all numbers as HEX, so
        it shows this instruction as MOV DL,10.  See your DOS manual or
        the section "Using DEBUG" in the file MAX.DOC.
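        To see how the two bytes B2 10 become MOV DL,10H, here is a
        small Python sketch that decodes only this one instruction
        form (MOV 8-bit register, immediate number).  It is an
        illustration, not how DEBUG works internally; the names REG8
        and decode_mov_reg8_imm are made up.  The opcode numbering
        (B0H through B7H, with the low 3 bits picking the register) is
        the real 8088 encoding.

```python
# MOV 8-bit register,immediate uses opcodes B0H..B7H; the low 3 bits of the
# opcode select the register in this order.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def decode_mov_reg8_imm(code: bytes) -> str:
    """Disassemble one 2-byte MOV reg8,imm8 instruction (no other forms)."""
    opcode, immediate = code[0], code[1]
    if (opcode & 0xF8) != 0xB0:
        raise ValueError("not a MOV reg8,imm8 instruction")
    register = REG8[opcode & 0x07]
    number = f"{immediate:02X}"
    if number[0] in "ABCDEF":
        number = "0" + number      # leading 0 so it is not read as a name
    return f"MOV {register},{number}H"


print(decode_mov_reg8_imm(bytes.fromhex("B210")))  # MOV DL,10H
print(decode_mov_reg8_imm(bytes.fromhex("B7FF")))  # MOV BH,0FFH
```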

WHY USE MAX?
------------
        If DEBUG can assemble programs, why do I need MAX?  Yes, you
        can assemble short programs using DEBUG, but for more than a
        few lines there are some problems.  First, you can't use any
        comments.  These are essential if you want to understand what
        you wrote later on.  Second, you can't use labels.  All jumps
        and calls are to addresses.  The number of bytes that a jump
        or call requires depends on how far it is going, so choosing
        the size of one jump can shift the addresses of everything
        after it!  Third, if you want to modify the program later, it
        can be nearly impossible.  Suppose you replace a 2-byte
        instruction with a 3-byte one.  The extra byte will overwrite
        the next instruction, forcing you to type it in again.  This
        problem will ripple down to the end of the program!  Fourth,
        in addition to the CPU instructions, MAX has some extra things
        called assembler directives or pseudo-ops.  See MAX.DOC for
        the "rules" and details regarding these.  Fifth, if you misuse
        DEBUG's W (write) command, it can write to absolute disk
        sectors, likely destroying other programs or data.

INSTRUCTIONS AND DIRECTIVES
---------------------------
        The designers of the CPU gave it many functions.  Each one is
        accessed by a unique combination of 1s and 0s on a certain set
        of pins at a certain time.  These unique inputs are the CPU
        instructions.  In assembly language you use a mnemonic to 
        represent the instruction.  A mnemonic may be followed by 0,
        1, or 2 operands.  See the file 8088.TXT for a list of all 
        8088 instructions with a brief description and one correct 
        example of how to use each with MAX.  Many of them will be 
        discussed in this document.  
        
        Assemblers use additional words and mnemonics to process the 
        source code before it is translated into CPU instructions.
        They are called assembler directives or pseudo-ops.  Some are
        only one character.  MAX uses the DOS program DEBUG as a
        go-between.  If the code received by DEBUG is not a correct CPU
        instruction, an error indication will result, and assembly
        will stop.

        MAX has a limited number of directives, making it easy to learn.
        The semicolon (;) specifies that the text following it is a
        comment (unless it is inside quotes).  MAX removes comments.
        The comments are for you (or someone else) to remember how your
        program works.  You should have several lines before each piece
        of code as an explanation, and a comment after most of the 
        lines to explain each one.  I will use the (;) in this document
        for comments.  I will enclose instructions with their operands
        in { } for clarity in text.
        
THE 8088 REGISTER SET
---------------------
        The 8088 has fourteen 16-bit registers.  Four of them are the
        data registers AX,BX,CX, & DX.  They are the most flexible and
        most often used.  Each one can be used as a 16-bit register or
        two 8-bit registers.  For example, the 8 high bits of the AX
        register are the AH register.  The low 8 bits are the AL 
        register.  The AX (or AL) register is called the accumulator.  
        Use it for fastest execution of some instructions and for all
        I/O (input or output) instructions.  BX has a special use in  
        addressing memory.  It is called the base register.  CX is
        often used as a counter for repetitive instructions like LOOP.
        It is called the count register.  DX, the data register, holds
        the port address for IN and OUT instructions that take their
        port number from a register, and holds the high 16 bits of the
        number for 16-bit MUL, DIV, and CWD.
                
        The next 4 registers are the pointer and index registers.  SP
        is the stack pointer.  BP is the base pointer.  These two are
        used for reading and writing from a stack (explained later).
        The source index (SI) and the destination index (DI) are used 
        for addressing memory.  All 8 of these, but none of the others,  
        can be used for arithmetic.

        Four of the registers are called segment registers.  CS is the
        code segment, DS is the data segment, SS is the stack segment,
        and ES is the extra segment.  A 16-bit register can address
        65,536 (64K) memory locations.  Each 64K block is called a
        segment of memory.  The 8088 addresses 1 megabyte of memory.
        To do this a 20-bit address is required.  It is formed by
        shifting a segment register left one HEX digit (4 bits) and
        adding a 16-bit offset to it.  For instructions, the segment
        is CS and the offset is the IP register (the instruction
        pointer).  This gives addresses the form CS:IP, which looks
        like this in DEBUG: C609:0100.  The part left of the colon (:)
        is the segment portion.  The part right of the (:) is the
        offset within the segment.  The above address is C6090 + 0100
        = C6190.  A segment can start at any 16-byte boundary, so
        segments can be anywhere, and they may overlap or be separated
        by gaps.
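        The segment:offset arithmetic can be sketched in Python (as a
        calculator only; the function name physical_address is made
        up).  It also shows that two different segment:offset pairs
        can name the same byte, which is what overlapping segments
        means.  The wrap at 1 megabyte reflects the 8088's 20 address
        lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """Form an 8088 20-bit address: segment shifted left one HEX digit plus offset."""
    # Shifting left 4 bits (one HEX digit) multiplies the segment by 16.
    # The 8088 has only 20 address lines, so the sum wraps at 1 megabyte.
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{physical_address(0xC609, 0x0100):05X}")   # C6190
# Two different segment:offset pairs that name the same byte:
print(physical_address(0xC619, 0x0000) == physical_address(0xC609, 0x0100))  # True
```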

        The IP register points to (contains the address of) the next
        instruction to be executed by the CPU.  As each instruction is
        fetched, IP is increased by the number of bytes in that
        instruction (2 for the B210 example above), so it points to
        the next instruction.  Jump or call instructions change where
        IP points, so programs can skip around.

        The last register, the flags register, is a collection of 
        separate bits used as individual flags.  The flags are
        flip-flops, and most of them are set to 1 or 0 depending on
        the result of the last arithmetic or logical operation.
        
THE FLAGS
---------
        There are 9 flags in the 8088.  Three are special purpose.  
        The Interrupt Flag indicates whether interrupts (discussed
        later) are enabled or disabled.  The Direction Flag is used
        only with string primitive instructions (discussed later).
        The Trap Flag allows debuggers (such as DEBUG) to single step
        the CPU.  When it is set, an interrupt occurs after each
        instruction.  The other 6 flags deal with the results of
        arithmetic and logical operations.

        The Carry Flag is set to 1 if an addition causes a carry out
        of the high order bit, or a subtraction needs a borrow into
        it.  It can be changed by the shift and rotate instructions,
        is set by STC, cleared by CLC, and complemented by CMC.

        The Overflow Flag is set to 1 if there is a carry out of the 
        high order bit, but not a carry in, or vice-versa; in other
        words, the result is wrong when read as a signed number.  The
        Zero Flag is set to 1 if an operation produces 0.  The Sign
        Flag is set (to 1) if an operation produces a 1 for the high
        order bit.  The Parity Flag is set if an operation produces a
        result whose low-order 8 bits have an even number of 1s.  The
        Auxiliary Flag is set if there is a carry out of, or a borrow
        into, the low-order 4 bits.  It is used mostly for BCD
        (decimal) arithmetic.  Each of these flags is cleared (reset
        to 0) when an instruction that affects it does not produce its
        condition.
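        The rules for the six arithmetic flags can be made concrete
        with a Python model of ADD.  This is a sketch under the
        assumption that only ADD is modeled (other instructions, such
        as INC, leave some flags alone); add_flags is a made-up name.
        The overflow test checks that both inputs have the same sign
        but the result's sign differs, which is equivalent to the
        carry in/carry out rule.

```python
def add_flags(a: int, b: int, bits: int = 16):
    """Return (result, flags) for an 8088-style ADD of two unsigned values."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    full = a + b
    result = full & mask                          # only 'bits' bits are kept
    flags = {
        "CF": int(full > mask),                   # carry out of the high order bit
        "ZF": int(result == 0),                   # result is zero
        "SF": int((result & sign) != 0),          # high order bit of the result
        # overflow: inputs have the same sign, but the result's sign differs
        "OF": int(((a ^ result) & (b ^ result) & sign) != 0),
        # parity: even number of 1s in the low-order 8 bits
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),   # carry out of the low 4 bits
    }
    return result, flags


result, flags = add_flags(0xF910, 0x0800)
print(f"{result:04X}", flags)
# 0110 {'CF': 1, 'ZF': 0, 'SF': 0, 'OF': 0, 'PF': 0, 'AF': 0}
```

        For an 8-bit example, add_flags(0x7F, 0x01, bits=8) gives 80H
        with SF, OF and AF set: 127 + 1 does not fit in a signed byte.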

CONDITIONAL JUMPS
-----------------
        The conditional jumps work with the flags.  They allow your
        programs to branch to other lines depending on the outcome
        of operations preceding them.  They can only reach 128 bytes
        backward or 127 bytes forward (counted from the end of the
        jump instruction), so I call them the "short jumps".  One set
        is used with unsigned binary numbers:

          JA  (jump if above)
          JAE (jump if above or equal)  or  JNC (jump if no carry)
          JE  (jump if equal)           or  JZ  (jump if zero)
          JNE (jump if not equal)       or  JNZ (jump if not zero)
          JBE (jump if below or equal)
          JB  (jump if below)           or  JC  (jump if carry)

        The mnemonics in the right column are synonyms for those in 
        the left column.  Be aware of this when you disassemble a
        program using DEBUG.  You may see a different one than you
        expected to see.  The ones in the right column refer directly
        to the state of the zero or carry flags.  The ones in the left
        column are intended for use after the CMP (compare) instruction.
        Example (for unsigned numbers; use JL instead of JB if the
        numbers are signed):
                assembly language                       BASIC
                -----------------                       -----
                CMP AX,BX                       10  IF A < B THEN 1000
                JB SECTION_1                            ...
                    ...                         1000 'branch point       
                SECTION_1: ;branch point

        The LOOP instruction acts as a DEC (decrement) of CX and a JNZ
        in one instruction.  It works only with the CX register and
        does not change any flags.  LOOP keeps jumping until CX is
        zero.
        Example:
                assembly language                       BASIC
                -----------------                       -----
                MOV CX,10                       10  FOR I = 1 TO 10
                L1:                             20  NEXT I
                LOOP L1
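        The body of the loop goes between L1: and LOOP L1.  A Python
        model of LOOP (the function name is made up) shows how many
        times that body runs.  Because LOOP decrements CX before
        testing it, starting with CX = 0 runs the body 65,536 times,
        not zero times; JCXZ can be used to skip the loop in that
        case.

```python
def count_loop_passes(cx: int) -> int:
    """Count how many times a loop body runs when it ends with LOOP."""
    passes = 0
    while True:
        passes += 1                # the loop body (between L1: and LOOP L1) runs
        cx = (cx - 1) & 0xFFFF     # LOOP first decrements CX (16 bits, wraps)...
        if cx == 0:                # ...then jumps back only if CX is not zero
            break
    return passes


print(count_loop_passes(10))   # 10, like FOR I = 1 TO 10
print(count_loop_passes(0))    # 65536: CX wraps from 0 to FFFFH first
```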

        There are two more loop instructions:
                LOOPZ  (loop if CX is not zero and ZF is set)
                LOOPNZ (loop if CX is not zero and ZF is not set)

        The following conditional jumps are for signed binary numbers:
                        JG  (jump if greater than)
                        JGE (jump if greater than or equal to)
                        JLE (jump if less than or equal to)
                        JL  (jump if less than)
                        JS  (jump if sign)
                        JNS (jump if not sign)
                        JO  (jump if overflow)
                        JNO (jump if not overflow)
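        The difference between the unsigned and signed jumps shows up
        when the same bits mean different numbers.  The Python sketch
        below (made-up names cmp_flags and taken) models a 16-bit CMP
        and the flag tests behind several jumps.  With AX = FFFFH and
        BX = 0001H, AX is "above" BX as unsigned (65535 > 1) but "less"
        than BX as signed (-1 < 1).

```python
def cmp_flags(a: int, b: int) -> dict:
    """Flags set by a 16-bit CMP a,b (a subtraction whose result is thrown away)."""
    result = (a - b) & 0xFFFF
    return {
        "CF": a < b,                                   # unsigned borrow
        "ZF": result == 0,
        "SF": (result & 0x8000) != 0,
        "OF": ((a ^ b) & (a ^ result) & 0x8000) != 0,  # signed overflow
    }


def taken(jump: str, f: dict) -> bool:
    """Would this conditional jump be taken, given flags f?"""
    conditions = {
        "JA": not f["CF"] and not f["ZF"], "JAE": not f["CF"],
        "JB": f["CF"], "JBE": f["CF"] or f["ZF"],
        "JE": f["ZF"], "JNE": not f["ZF"],
        "JG": not f["ZF"] and f["SF"] == f["OF"], "JGE": f["SF"] == f["OF"],
        "JL": f["SF"] != f["OF"], "JLE": f["ZF"] or f["SF"] != f["OF"],
    }
    return conditions[jump]


f = cmp_flags(0xFFFF, 0x0001)           # CMP AX,BX with AX=FFFFH, BX=0001H
print(taken("JB", f), taken("JL", f))   # False True
```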

        Three other conditional jumps are:
                        JCXZ (jump if CX is zero)
                        JPO  (jump if parity odd)
                        JPE  (jump if parity even)

DATA MOVEMENT INSTRUCTIONS
--------------------------
        The next CPU instruction you need to learn is the MOV 
        instruction.  You will use it a lot.  It takes the form: 
        MOV destination,source.  Imagine a curved arrow starting above 
        the word source, and pointing to the word destination.  All 
        8088 instructions with two operands use this right-to-left
        notation.  The MOV instruction allows you to move data between 
        registers, from a register to memory, from memory to a 
        register, or move a constant into a register or memory location.  
        To move a constant (a number) into memory, you need to use the
        byte or word form of MOV (MOVBY or MOVWO), because the number
        alone does not tell whether to store a byte or a word.
        Examples:

                MOV AX,BX       ; Move the contents of BX into AX
                                ; This is like A = B in BASIC. 
                MOV VARBL,DX    ; Move the contents of DX into memory.
                MOV AX,[BX]     ; Move the contents of the memory location
                                ; specified by BX into AX.

                MOV BX,5        ; Move the number 5 into BX.
                                ; This is like B = 5 in BASIC.
                MOVBY VARBL,5   ; Move byte 05 into memory (also like B = 5).
                MOVWO VARBL,5   ; Move word 0005 into memory.

        The XCHG instruction allows you to exchange the contents of
        one register with another or with a memory location.  The LEA
        instruction loads the effective address of a variable in memory
        (not the contents) into a register.  XLAT replaces the value
        in AL with a new value from a table in memory.  BX must point 
        to the beginning of the table, and the value of AL determines
        which item of the table is used.  The IN and OUT instructions
        read data from an input port and send data to an output port.
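        XLAT is easiest to understand with a lookup table.  This
        Python model is only a sketch: it ignores the segment (the real
        CPU reads from DS:BX+AL), and the table of HEX digit characters
        and the names xlat and HEX_DIGITS are made up for illustration.

```python
def xlat(table_memory: bytes, bx: int, al: int) -> int:
    """XLAT model: return the byte at address BX + AL (segment ignored here)."""
    return table_memory[(bx + al) & 0xFFFF]


# Hypothetical lookup table: the ASCII codes of the 16 HEX digits at offset 0.
HEX_DIGITS = b"0123456789ABCDEF"
al = 0x0B                            # value to translate (0..15)
al = xlat(HEX_DIGITS, bx=0, al=al)   # like LEA BX,HEX_DIGITS then XLAT
print(chr(al))                       # B
```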

CREATING DATA SPACE
-------------------
        Unlike BASIC, you have control over where in memory your 
        variables are going to be.  The DB (define byte) and DW 
        (define word) directives allow you to set aside space for byte 
        or word variables (or larger variables or strings).  Another 
        directive DUP allows duplicating bytes or words to easily set 
        aside large data blocks.  It is similar to the DIMension 
        statement in BASIC.  The DB directive is used with quotes for 
        text, since an ASCII character takes up one byte of memory.  
        Variables have names so that they can be located by the 
        assembler.  Examples:

            string  db  "This text can be printed," 
            db  'and this will follow it.'
            db 4 dup (0)                    ; same as  db 0,0,0,0
            variable_1  dw  256 dup (0)     ; define space for 256 words
            variable_2  db  512 dup (0)     ; same 512 bytes as above,
                                            ; except it uses more lines.
            varbl2:  db   0                 ; the (:) is optional   
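        The bytes these directives reserve can be modeled in Python.
        This sketch (made-up helpers db, dw and dup) does not reproduce
        MAX itself; it only shows the resulting memory contents.  Note
        that the 8088 stores a word low byte first, so DW 256 puts 00H
        in memory, then 01H.

```python
import struct


def db(*items) -> bytes:
    """DB: each number is one byte; each quoted string is one byte per character."""
    out = b""
    for item in items:
        out += item.encode("ascii") if isinstance(item, str) else bytes([item])
    return out


def dw(*words: int) -> bytes:
    """DW: each number is a 16-bit word, stored low byte first (little-endian)."""
    return b"".join(struct.pack("<H", w) for w in words)


def dup(count: int, data: bytes) -> bytes:
    """count DUP (...): repeat the data 'count' times."""
    return data * count


print(dw(256).hex())                                # 0001
print(len(dup(256, dw(0))), len(dup(512, db(0))))   # 512 512
print(len(db("This text can be printed,")))         # 25
```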
                
ADDRESSING MODES
----------------
        There are many ways of specifying a memory address.  This gives
        variety to the CPU instructions.  You can't refer to memory by
        an explicit address because absolute addresses are not known
        until a program is loaded into memory.  There would be no need
        to keep track of them anyway.  Memory locations are referred to
        by variable names or by items within square brackets: [].  The
        simplest way is called direct addressing because the address 
        of the data source or destination is given directly in the 
        machine language instruction.  Sometimes you want to load the
        address of a variable instead of the contents of the memory at
        that location (the contents is the value of the variable).  
        For that you can use the OFFSET directive or the LEA 
        instruction.  Using OFFSET is often called immediate
        addressing, because the address is stored in the instruction
        as a constant number.  A powerful mode is called indirect or
        indexed addressing.  In the simplest form of this mode a
        register contains the address of the source or destination.
        Examples:

                MOV AL,VARIABLE_1     ;Direct. variable contents to AL.
                MOVBY VARIABLE,"A"    ;  "     hex code of A to memory.
                MOV DX,OFFSET VAR     ; Immediate. address of var to DX. 
                MOV AL,[BX]           ; Indirect. BX specifies address  
                                      ; of variable to move.
                MOVWO [BX+1],3        ; Indirect. moves 0003 to the next
                                      ; address past where BX is pointing.
                MOV [BP+SI-6],AX      ; More complex indirect.
                MOV VARIABLE[BX],AL   ; Indexed. this is like dimensioned
                                      ; variables in BASIC:  V(B)=L
                MOV [BX+VAR],AL       ; Same as above
                MOV VARIABLE,VAR      ; WRONG! memory to memory not allowed.
                MOVBY [VARBL],3       ; WRONG! square brackets will be  
                                      ; ignored in this case.
                MOV DX,VAR +1   ; This doesn't add 1 to the value of VAR
                                ; as you might expect. It moves the word
                                ; that starts 1 byte past VAR into DX.
                                ; It is equal to { MOV DX,[OFFSET VAR +1] }

        Only the registers BX,BP,SI,DI can be used in [].  You can't
        use BX with BP, or SI with DI.  Addresses that use BP are in
        the stack segment (SS); the others are in the data segment
        (DS).  Notice the use of (+) and (-) in the examples above.
        They allow you to choose a value from a group or table, like
        an array variable in BASIC.
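        The way the CPU adds up the parts inside [] can be sketched in
        Python.  The function effective_address and the register values
        are made up; the rules it enforces (BX or BP as the base, SI or
        DI as the index, BP defaulting to the stack segment, offsets
        wrapping at 64K) are the 8088's.

```python
def effective_address(regs: dict, base=None, index=None, disp: int = 0):
    """Return (segment register, 16-bit offset) for an operand like [BP+SI-6]."""
    if base not in (None, "BX", "BP") or index not in (None, "SI", "DI"):
        # Only BX or BP may be the base, and only SI or DI the index.
        raise ValueError("illegal register combination inside []")
    offset = disp
    if base:
        offset += regs[base]
    if index:
        offset += regs[index]
    segment = "SS" if base == "BP" else "DS"    # BP addresses the stack segment
    return segment, offset & 0xFFFF             # offsets wrap at 64K


regs = {"BX": 0x0200, "BP": 0x0100, "SI": 0x0020, "DI": 0x0004}
seg, off = effective_address(regs, "BP", "SI", -6)   # [BP+SI-6]
print(seg, f"{off:04X}")                             # SS 011A
```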

THE STACK
---------
        The stack is an area of memory (anywhere you want) with a 
        consecutive list of data.  It is a last-in-first-out buffer
        like a stack of dinner plates with data written on them.  The 
        last item put on the stack is the first one taken off.  The 
        instruction PUSH adds an item to the stack.  The POP 
        instruction removes it.  You can PUSH and POP the contents
        of any 16-bit register except IP or Flags, or a memory 
        location.  The SP (stack pointer) register, together with SS,
        keeps track of where in memory the top of the stack is
        located.  When you do a PUSH AX, SP is decremented by 2, and
        then AX is stored at the new SS:SP.  POP does the reverse.
        The top of the stack is the lowest numbered memory address in
        the stack.  In other words, the stack is upside-down in
        memory.  PUSHF and POPF push and pop the contents of the flags
        register.  The stack is useful for temporarily storing register
        contents.  At the beginning of a procedure (subroutine) you
        PUSH the registers used in the procedure that are not used for
        sending data to, or receiving data from the procedure, then POP
        them at the end (in reverse order).  This returns their values
        to what they were before, preventing unexpected results.  High
        level languages make extensive use of the stack, even multiple
        stacks.
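        A Python model of PUSH and POP shows SP moving down by 2 on
        each PUSH and the last-in-first-out order.  The class Stack8088
        and the starting SP of 0100H are made up for illustration; the
        model keeps one 64K segment and stores each word low byte
        first, as the 8088 does.

```python
class Stack8088:
    """Tiny model of the 8088 stack: SP moves down on PUSH, up on POP."""

    def __init__(self, sp: int = 0x0100):
        self.memory = bytearray(0x10000)   # one 64K stack segment
        self.sp = sp

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF              # decrement SP by 2 first
        self.memory[self.sp] = word & 0xFF            # low byte at lower address
        self.memory[self.sp + 1] = (word >> 8) & 0xFF

    def pop(self) -> int:
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF              # then increment SP by 2
        return word


s = Stack8088()
s.push(0x1234)                        # PUSH AX (AX = 1234H)
s.push(0x5678)                        # PUSH BX (BX = 5678H)
print(hex(s.sp))                      # 0xfc
print(hex(s.pop()), hex(s.pop()))     # 0x5678 0x1234  (last in, first out)
```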

JUMPS AND CALLS
---------------
        { JMP label } is an unconditional jump like GOTO in BASIC.     
        It puts the address of the label in IP (instruction pointer) 
        so that it is the next instruction.  JMP is not limited to 
        128 bytes like the conditional jumps.  With MAX the JMP takes
        2, 3, or 5 bytes depending on how far the jump goes.  You can
        force 3 bytes with JMPNEAR, or 5 bytes with JMPFAR if you 
        want to.  To make a long conditional jump, use the opposite
        short jump to skip over a JMP (for example, a JNE around a JMP
        makes a long "jump if equal").  Another form of JMP is
        { JMP reg or mem } where reg is any 16-bit general register
        and mem is an "effective address": [BX] or [BP] + [SI] or
        [DI] + number.  This form is useful for making multiway jumps
        (the equivalent of ON...GOTO in BASIC).  Examples:

                JMP END         ; jumps to label
                JMP BX          ; jumps to address in BX
                JMP [BX+SI+4]   ; jumps to address contained in memory
                END:            ; a label
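        Why a jump is 2 or 3 bytes can be shown with a little
        arithmetic.  A short jump stores a signed 8-bit distance,
        measured from the address just after the 2-byte jump, so it
        reaches -128 to +127 bytes; otherwise a near JMP with a 16-bit
        distance (3 bytes) is needed.  This Python sketch assumes the
        shortest form that fits is chosen and covers only jumps within
        one segment; the function names are made up.

```python
def jmp_size(jump_address: int, target: int) -> int:
    """Bytes needed for a JMP within the same segment (a far JMP would be 5)."""
    # A short JMP is 2 bytes: opcode + 8-bit signed displacement, measured
    # from the address just after the 2-byte instruction.
    displacement = target - (jump_address + 2)
    if -128 <= displacement <= 127:
        return 2
    return 3   # near JMP: opcode + 16-bit displacement


def fits_short_jump(jump_address: int, target: int) -> bool:
    """Conditional jumps on the 8088 exist only in the 2-byte short form."""
    return jmp_size(jump_address, target) == 2


print(jmp_size(0x0100, 0x0110))   # 2
print(jmp_size(0x0100, 0x0200))   # 3
print(fits_short_jump(0x0100, 0x0181), fits_short_jump(0x0100, 0x0182))  # True False
```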

        CALL is the equivalent of GOSUB in BASIC.  Instead of RETURN
        you use RET at the end of the procedure.  The form { CALL reg
        or mem } is allowed.  That gives you the equivalent of BASIC's
        ON...GOSUB statement.  CALL pushes the contents of IP (the
        return address) onto the stack, then puts the address of the
        subroutine in IP (a far CALL also pushes CS).  RET pops the
        saved contents from the stack back into IP.  { RET n } can also
        be used.  After popping the return address, it adds the number
        n to the stack pointer, discarding n bytes of parameters that
        the caller pushed before the CALL.  It is used when passing
        variables between assembly language subroutines and high level
        languages.

HARDWARE & SOFTWARE INTERRUPTS
------------------------------
        The PC hardware has some lines called hardware interrupts.  
        They handle communications to and from peripheral devices.  A
        hardware interrupt occurs when a device like a disk drive or
        keyboard sends a signal to the CPU.  The CPU stops what it is
        doing (is interrupted) and jumps to a subroutine that handles
        the device.  The INT instruction provides a way to simulate a
        hardware interrupt in software.  It allows programs to access
        routines within the operating system and I/O device drivers.
        Subroutines that control keyboards, video screens, disk drives
        and printers can be accessed. 
        
        The INT instruction is a special type of CALL.  It causes the 
        Flags, CS and IP to be pushed onto the stack.  IP and CS are
        then loaded from the interrupt vector table in low memory,
        which tells where the subroutine is located: for { INT n },
        the new IP is read from address 0000:(n*4) and the new CS from
        the next word.  The subroutine ends with IRET (interrupt
        return).  IRET restores IP, CS and the Flags.  Of the many
        interrupts, you will mostly use INT 10H, the video I/O calls,
        and INT 21H, the DOS function calls.  It is beyond the scope
        of this document to go into detail about these.  See the
        example files in this package.
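        The interrupt vector table holds 4 bytes per interrupt, offset
        (IP) first and segment (CS) second, each stored low byte first.
        This Python sketch reads one entry; the vector contents
        0123:0456 are invented for the example, and the function name
        interrupt_vector is made up.

```python
def interrupt_vector(memory: bytes, n: int):
    """Return (CS, IP) of the handler for INT n from the table at 0000:0000."""
    entry = n * 4                                       # 4 bytes per interrupt
    ip = memory[entry] | (memory[entry + 1] << 8)       # first word: new IP
    cs = memory[entry + 2] | (memory[entry + 3] << 8)   # second word: new CS
    return cs, ip


# Hypothetical low memory in which INT 21H points to 0123:0456.
low_memory = bytearray(1024)                            # 256 vectors x 4 bytes
low_memory[0x21 * 4:0x21 * 4 + 4] = bytes([0x56, 0x04, 0x23, 0x01])
cs, ip = interrupt_vector(low_memory, 0x21)
print(f"{cs:04X}:{ip:04X}")                             # 0123:0456
```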

        The INTO instruction causes an interrupt if the overflow flag
        is set.  CLI clears the interrupt flag (IF).  STI sets it.  
        When IF is cleared, hardware interrupts will be ignored.  
        Software interrupts are not affected.  Example:                

                assembly language                 GWBASIC
                -----------------                 -------
                   INT 20H                        SYSTEM
                                                              
ARITHMETIC INSTRUCTIONS
-----------------------
        Skilled use of these will require further reading, but I will
        introduce them.  ADD can be used to add one register to 
        another or to a memory location, or to add a number to a 
        register or memory location.  ADC (add with carry) has the 
        same forms, as does SUB (subtract), SBB (subtract with borrow),
        and CMP (compare).  You can work with either 8-bit or 16-bit
        quantities.  For example, { ADD BL,CH } adds CH to BL, leaving 
        the result in BL, while { SUB SI,1234 } subtracts the number 
        1234 from SI, leaving the result in SI.  The destination for 
        the result can be any register or memory location (but only
        one of the two operands can be in memory).  This makes the
        8088 considerably more powerful than previous microprocessors
        such as the Z80.  

        Since the operand lengths are fixed at 8 or 16 bits, the 
        result of an ADD, for instance, can exceed the capacity of
        the register or memory location.  For example, if AX=F910h,
        and you perform { ADD AX,800h }, the sum 10110h is too large
        to fit in AX.  In this case, the carry flag will be set, and
        only the least significant 16 bits (0110h) will be left in AX.
        You must decide what you want to do in a case like this.  If
        the numbers being added are being interpreted as signed binary
        numbers, you have to be even more careful since the sum of two
        positive 16-bit numbers that exceeds 7FFFh will look negative
        (and the overflow flag will be set).
        
        Similar care must be used with subtraction.  In short, you 
        must either know in advance that only an appropriate range of
        numbers will be encountered, or make explicit tests for wrong
        results.  For unsigned numbers you can do this by checking the
        carry flag (use JC to jump to an error-handling routine).  For
        signed numbers, use JO (jump if overflow) or use INTO 
        (interrupt if overflow).  With INTO you don't need a separate
        error-handling routine for every program.  You can have a 
        single overflow handler that stays memory resident.
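        One common answer to a carry out of 16 bits is to keep a
        bigger number in two registers and add the halves with ADD
        then ADC.  The Python sketch below models that; the choice of
        DX:AX and CX:BX in the comments is only an example, and add32
        is a made-up name.  Adding F910h and 800h this way gives the
        full sum 10110h instead of losing the carry.

```python
def add32(a_high: int, a_low: int, b_high: int, b_low: int):
    """Add two 32-bit numbers held as 16-bit halves, like ADD then ADC."""
    low = a_low + b_low                    # ADD AX,BX  (low words)
    carry = 1 if low > 0xFFFF else 0       # carry flag after ADD
    low &= 0xFFFF
    high = a_high + b_high + carry         # ADC DX,CX  (adds the carry too)
    carry = 1 if high > 0xFFFF else 0      # CF after ADC: JC would catch this
    high &= 0xFFFF
    return high, low, carry


print([hex(x) for x in add32(0x0000, 0xF910, 0x0000, 0x0800)])  # ['0x1', '0x110', '0x0']
```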

        The INC and DEC instructions add or subtract 1 to or from a
        register or memory location.  For a 16-bit register, INC and
        DEC are 1-byte instructions, compared with 3 bytes for
        { ADD AX,1 }, and they execute about twice as fast.  Unlike ADD
        and SUB, they do not change the carry flag.  The NEG (negate)
        instruction changes the sign of a number.

        The MUL instruction multiplies AL or AX by a register or memory
        variable.  Using AL, the result is in AX.  Using AX, the result
        is in DX:AX.  DIV (divide) works in reverse: an 8-bit divisor
        divides AX, leaving the quotient in AL and the remainder in AH,
        and a 16-bit divisor divides DX:AX, leaving the quotient in AX
        and the remainder in DX.  If the quotient does not fit, the CPU
        generates a divide error interrupt (INT 0).  IMUL and IDIV have
        the same forms, but are for signed numbers rather than unsigned
        ones.
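        Where the halves of a 16-bit MUL and DIV end up can be shown
        with a Python model (made-up names mul16 and div16).  The model
        raises an exception where the real CPU would take the divide
        error interrupt, which happens when the divisor is 0 or the
        quotient is larger than FFFFH.

```python
def mul16(ax: int, operand: int):
    """MUL with a 16-bit operand: the 32-bit product goes to DX:AX."""
    product = ax * operand
    return product >> 16, product & 0xFFFF           # (DX, AX)


def div16(dx: int, ax: int, divisor: int):
    """DIV with a 16-bit divisor: DX:AX / divisor -> quotient AX, remainder DX."""
    dividend = (dx << 16) | ax
    if divisor == 0 or dividend // divisor > 0xFFFF:
        raise ZeroDivisionError("divide error: the CPU would do INT 0")
    return dividend // divisor, dividend % divisor   # (AX, DX)


dx, ax = mul16(0x1234, 0x0100)
print(f"DX={dx:04X} AX={ax:04X}")     # DX=0012 AX=3400
print(div16(dx, ax, 0x0100))          # (4660, 0), that is AX=1234H and DX=0
```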
        
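        You can multiply any register (not just AX) by a power of 2
        with the SHL instruction.  Shifting left by one bit multiplies
        by 2, two bits by 4, etc.  Use SHR to divide unsigned numbers.
        Use SAR to divide signed numbers; note that SAR rounds negative
        results down (-7 shifted right once gives -4), while IDIV
        rounds toward zero (-3).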
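        The three shifts can be modeled in Python on 16-bit values
        (the function names are made up).  SAR copies the sign bit into
        the vacated bits, which is why it works for signed numbers and
        rounds down; SHR shifts in zeros, so it treats the same bits as
        a large unsigned number.

```python
def sar(value: int, count: int, bits: int = 16) -> int:
    """Arithmetic shift right: the sign bit is copied into the vacated bits."""
    sign = 1 << (bits - 1)
    signed = value - (1 << bits) if value & sign else value   # read as signed
    return (signed >> count) & ((1 << bits) - 1)              # Python's >> keeps the sign


def shr(value: int, count: int) -> int:
    """Logical shift right: zeros are shifted in (unsigned divide by 2**count)."""
    return value >> count


def shl(value: int, count: int, bits: int = 16) -> int:
    """Shift left: multiply by 2**count, keeping only 'bits' bits."""
    return (value << count) & ((1 << bits) - 1)


minus_7 = 0xFFF9                  # -7 as a 16-bit number
print(hex(sar(minus_7, 1)))       # 0xfffc, which is -4 (rounded down)
print(hex(shr(minus_7, 1)))       # 0x7ffc: SHR treats the bits as 65529
print(int(-7 / 2))                # -3, the quotient IDIV gives (toward zero)
print(shl(5, 2))                  # 20
```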
        
        The CBW instruction converts a byte (8-bits) in AL to a word
        (16-bits) in AX.  CWD converts a word in AX to a double-word
        in DX:AX.  These are usually used with IDIV.  Examples:

                ASSEMBLY LANGUAGE                  BASIC
                -----------------                ---------
                   ADD AX,BX                     A = A + B
                   SUB AX,BX                     A = A - B
                   MUL BX                        A = A * B  (high word to DX)
                   DIV BX                        A = A \ B  (clear DX first)
                   INC AX                        A = A + 1
                   DEC BX                        B = B - 1
                   SHL DX,1                      D = D * 2
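        CBW and CWD matter before IDIV because IDIV divides the whole
        of AX (or DX:AX), so the upper half must hold copies of the
        sign bit.  A Python model of both (made-up names cbw and cwd):

```python
def cbw(al: int) -> int:
    """CBW: copy bit 7 of AL into all of AH, giving AX."""
    return al | 0xFF00 if al & 0x80 else al


def cwd(ax: int):
    """CWD: copy bit 15 of AX into all of DX, giving DX:AX. Returns (DX, AX)."""
    return (0xFFFF if ax & 0x8000 else 0x0000), ax


print(hex(cbw(0xFB)))      # 0xfffb: -5 as a byte is still -5 as a word
print(hex(cbw(0x05)))      # 0x5
dx, ax = cwd(0xFFF9)       # -7 as a word
print(hex(dx), hex(ax))    # 0xffff 0xfff9
```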

LOGIC INSTRUCTIONS
------------------
        For this section, I must assume that you know something about
        digital electronics or Boolean Algebra.  The AND, OR, and XOR
        instructions are the software equivalents of the hardware 
        gates of the same name.  They take the same forms as the ADD
        instruction.  The operations are bit-wise (each bit is handled
        separately).  Thus, when you AND the register AL with the  
        register BL, bit 0 of AL is ANDed to bit 0 of BL, etc.  For 
        example, if AL=11000011B, and BL=01000110B, then { AND AL,BL } 
        yields 01000010B, which is left in AL.  

        AND is used to mask off (set to 0) selected bits.  For example,
        suppose you want to see what number is set in the three least
        significant bits in AL, regardless of what the other bits are.
        { AND AL,7 } does this since 7=00000111 in binary.  The upper
        5 bits of AL will be set to 0, and the lower 3 remain unchanged.
        The OR instruction works like AND, but ORs quantities together.
        It is used to set bits, rather than clear them.  XOR is used 
        to toggle bits (change their state, like a toggle switch).
        TEST does an AND but, like CMP, only sets the flags and throws
        away the result.  TEST can be used to look at a single bit from
        an input port, or test for the sign of a number, etc.
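        Python's bitwise operators work the same way bit by bit, so
        the examples above can be checked directly.  The helper
        zero_flag_after_test is a made-up name that models the zero
        flag TEST would leave.

```python
al = 0b11000011
bl = 0b01000110
print(format(al & bl, "08b"))            # 01000010  (AND AL,BL)
print(format(0b10110101 & 7, "08b"))     # 00000101  (AND AL,7 keeps the low 3 bits)
print(format(al | 0b00000100, "08b"))    # 11000111  (OR sets bit 2)
print(format(al ^ 0b00000001, "08b"))    # 11000010  (XOR toggles bit 0)


def zero_flag_after_test(value: int, mask: int) -> bool:
    """TEST: AND without keeping the result; True (ZF=1) if all masked bits are 0."""
    return (value & mask) == 0


print(zero_flag_after_test(0b10000000, 0x80))   # False: bit 7 is set (negative byte)
```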

        The rest of the logic instructions are the shift and rotate
        instructions.  They shift or rotate all the bits in a register
        or memory location right or left.  The descriptions are in the
        file 8088.TXT.  I will not go into detail about them.  Example:

                ; check a bit from an input port until it goes low                        
                MOV DX,PORT     ;point DX at the port
                GET: IN AL,DX   ;input from the port
                TEST AL,1       ;is bit 0 still high?
                JNZ GET         ;keep checking if so
                LOW:            ;get here if not

STRING PRIMITIVE INSTRUCTIONS
-----------------------------
        These are one of the most powerful groups of instructions.  
        They allow rapidly moving blocks of data from one place to
        another.  They are called primitive because they provide basic
        (primitive) instructions that can be combined to perform
        complicated operations on strings of data.  In this context a
        string just means a block of data with consecutive addresses.

        Use MOVSB to move one byte at a time, and MOVSW to move one
        word at a time from one place in memory to another.  They 
        require that the SI register is pointing to the source of the
        data, and that DI is pointing to the destination.  Source and
        destination can be in different segments if desired.  Actually,
        the source is always DS:SI, and the destination is always ES:DI.
        For each execution, SI and DI are both incremented (by 1 for
        bytes, 2 for words) if DF (direction flag) is cleared, and
        decremented if DF is set.  Use CLD to clear DF, and STD to set
        it.
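        The effect of DF on a repeated MOVSB can be modeled in Python.
        This sketch assumes DS and ES are the same segment, so source
        and destination share one memory array; rep_movsb is a made-up
        name.

```python
def rep_movsb(memory: bytearray, si: int, di: int, cx: int, df: int = 0):
    """Model REP MOVSB within one segment: copy CX bytes from [SI] to [DI].

    DF = 0 (after CLD): SI and DI go up.  DF = 1 (after STD): they go down.
    Returns the final (SI, DI, CX).
    """
    step = -1 if df else 1
    while cx != 0:
        memory[di] = memory[si]      # one MOVSB
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1                      # REP counts CX down to zero
    return si, di, cx


mem = bytearray(b"HELLO.....")
print(rep_movsb(mem, si=0, di=5, cx=5))   # (5, 10, 0)
print(mem)                                # bytearray(b'HELLOHELLO')
```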
        
        STOSB (store string byte) moves the value in AL into a memory
        byte pointed to by ES:DI, and then increments or decrements DI
        as above.  STOSW does the same thing using AX, and changing DI
        by two bytes (one word).  The SI register is left unchanged.
        These are handy for filling a block of memory with the same
        value.  LODSB (load string byte) and LODSW are the reverse:
        they load AL or AX from DS:SI and then advance SI, leaving DI
        unchanged.

        CMPSB (compare string byte) compares the byte pointed to by SI
        with the byte pointed to by DI.  The comparison is done by 
        subtracting [SI]-[DI], setting the flags, then throwing away
        the result; SI and DI are then stepped as above.  CMPSW is the
        word equivalent of CMPSB.  The scan string instructions SCASB
        and SCASW are used to look for a particular value in a single
        string.  They compare AL (or AX) with the byte (or word) at
        ES:DI, set the flags the same way, and step DI.  They are
        roughly like INSTR in BASIC.  
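        The flags left behind by CMPSB can be sketched in C as below.
        cmpsb_flags() is a hypothetical helper: it subtracts the
        destination byte from the source byte in 8 bits, derives ZF,
        CF, SF and OF from that, and discards the difference.  CF
        answers "below" for unsigned bytes, while a signed "less than"
        shows up as SF being different from OF.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    bool zf;  /* set if the two bytes are equal */
    bool cf;  /* borrow: [SI] is below [DI] as unsigned numbers */
    bool sf;  /* bit 7 of the 8-bit difference */
    bool of;  /* signed overflow in the subtraction */
} CmpFlags;

/* Flags from [SI]-[DI]; the difference itself is thrown away. */
CmpFlags cmpsb_flags(uint8_t src, uint8_t dst)
{
    uint8_t diff = (uint8_t)(src - dst);
    CmpFlags f;
    f.zf = diff == 0;
    f.cf = src < dst;
    f.sf = (diff & 0x80) != 0;
    /* overflow: operands differ in sign and the result's sign differs
       from the source's sign */
    f.of = ((src ^ dst) & (src ^ diff) & 0x80) != 0;
    return f;
}
```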

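        REP in front of a string primitive makes it repeat a number of
        times.  The number of repeats is the number in the CX register;
        CX is decremented after each repeat, and the repeating stops
        when CX reaches 0.  REPE (repeat while equal) and REPNE (repeat
        while not equal) are used with CMPS and SCAS.  They also count
        CX down, but stop early when a comparison fails the test: REPE
        stops at the first mismatch, and REPNE at the first match.
        REPZ and REPNZ are other names for REPE and REPNE.  These
        repeats give string primitives their power.  Examples: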

                CLD                     ; clear direction flag
                LEA SI,SOURCE           ; point to source 
                LEA DI,DESTINATION      ; & destination     
                MOV CX,100              ; want to move 100 bytes
                REP MOVSB               ; do it

                ; the following program finds the absolute values of
                ; 10 signed 8-bit integers that are stored starting
                ; at memory location NUMBERS
                CLD
                LEA SI,NUMBERS          ; point to numbers in memory
                MOV DI,SI               ; put back in same place
                MOV CX,10               ; 10 numbers to do
                NEXT: LODSB             ; get a signed number
                TEST AL,80H             ; is it (+)? (bit 7=0 if pos)
                JZ POSITIVE             ; jump to label if it is
                NEG AL                  ; else make it positive
                POSITIVE: STOSB         ; store absolute (+) number
                LOOP NEXT               ; get another if CX is not 0
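        The way CX and ZF end a REPNE SCASB search can be modelled in C
        as below.  repne_scasb() is a hypothetical helper that assumes
        DF is clear and that buf is the segment ES points to.  Note
        that when a match is found, DI has already moved one byte past
        it, and CX holds the number of bytes not yet examined.

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of REPNE SCASB with DF clear: compare AL with the byte at [DI],
   step DI, decrement CX, and stop when CX reaches 0 or a match sets ZF.
   Returns ZF.  If CX starts at 0 nothing is compared; this model then
   returns false, whereas the real ZF would simply be left unchanged. */
bool repne_scasb(const uint8_t *buf, uint8_t al, uint16_t *di, uint16_t *cx)
{
    bool zf = false;
    while (*cx != 0) {
        zf = (al == buf[*di]);  /* flags from AL - [DI] */
        (*di)++;                /* DI steps even on the matching byte */
        (*cx)--;
        if (zf)
            break;              /* REPNE stops at the first match */
    }
    return zf;
}
```

        Searching "HELLO, WORLD" (12 bytes) for a comma from DI=0 stops
        with ZF set, DI=6 and CX=6; the comma itself is at offset 5.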

MISCELLANEOUS INSTRUCTIONS
--------------------------
        Some instructions allow you to do simple arithmetic on BCD
        (binary coded decimal) numbers and on numbers stored as ASCII
        digits.  BCD numbers are stored with 2 digits in each byte, 
        each digit being a number from 0 to 9.  If you add 2 bytes of
        2 BCD digits each with ADD, the result may be wrong, because ADD
        works in binary.  For example, 19h + 28h gives 41h, but the BCD
        answer is 47.  DAA (decimal adjust for addition), executed right
        after the ADD, corrects AL (here to 47h).  DAS is the subtraction
        equivalent.  AAA (ASCII adjust for addition), AAS, AAM, and
        AAD are for ASCII digits.  I will not cover their use here.
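        Here is a C sketch of an ADD followed by DAA, following the DAA
        rules as Intel documents them.  add_then_daa() is a
        hypothetical name.  The low digit is fixed by adding 6 when it
        is above 9 or when the ADD carried out of bit 3 (AF); the high
        digit is fixed by adding 60h when the original sum was above
        99h or the ADD carried out of bit 7 (CF).

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of ADD AL,BL followed by DAA.  Returns the adjusted AL;
   *cf_out receives the final carry (a hundreds digit of 1 when set). */
uint8_t add_then_daa(uint8_t al, uint8_t bl, bool *cf_out)
{
    /* ADD AL,BL in binary, recording CF (carry out of bit 7) and
       AF (carry out of bit 3, i.e. out of the low BCD digit). */
    unsigned sum = (unsigned)al + bl;
    bool cf = sum > 0xFF;
    bool af = ((al & 0x0F) + (bl & 0x0F)) > 0x0F;
    al = (uint8_t)sum;

    /* DAA: adjust the low digit, then the high digit. */
    uint8_t old_al = al;
    if ((al & 0x0F) > 9 || af)
        al = (uint8_t)(al + 0x06);
    if (old_al > 0x99 || cf) {
        al = (uint8_t)(al + 0x60);
        cf = true;
    } else {
        cf = false;
    }
    *cf_out = cf;
    return al;
}
```

        With valid BCD inputs, 58h + 67h returns 25h with the carry
        set, which reads as 125.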

        The NOP (no operation) instruction does nothing except take up
        space and time.  It is used as a placeholder while debugging or
        to waste time in delay loops.  MAX inserts a NOP in the program
        code when a label is on a separate line.  If you put only a
        procedure's first label on a separate line, this makes it easy
        to find the start of the procedure when you disassemble a .COM
        file that was assembled using MAX.  The label list also has the
        addresses of all of the labels and variables.  

        The HLT instruction halts the CPU until a reset or interrupt
        occurs.  After an interrupt is serviced, the program resumes
        with the next instruction.  SAHF stores the contents of the AH
        register into the low byte of the flags register (the SF, ZF,
        AF, PF, and CF flags).  LAHF loads AH from that same byte of
        the flags.  These two were provided for compatibility with the
        earlier 8080/8085 CPUs.  The ESC, WAIT, and LOCK instructions
        are used with math coprocessors or a second CPU.  The
        unregistered version of MAX does not support the math
        coprocessor instructions.
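        On the 8086, the byte that LAHF copies into AH has SF in bit 7,
        ZF in bit 6, AF in bit 4, PF in bit 2 and CF in bit 0; bit 1
        always reads as 1, and bits 3 and 5 as 0.  The C sketch below
        packs and unpacks that byte.  LowFlags, lahf() and sahf() are
        hypothetical names for this model, not library calls.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool sf, zf, af, pf, cf; } LowFlags;

/* LAHF: AH = SF ZF 0 AF 0 PF 1 CF  (bit 7 down to bit 0) */
uint8_t lahf(LowFlags f)
{
    return (uint8_t)((f.sf << 7) | (f.zf << 6) | (f.af << 4) |
                     (f.pf << 2) | 0x02 | f.cf);
}

/* SAHF: copy bits 7, 6, 4, 2 and 0 of AH back into the flags. */
LowFlags sahf(uint8_t ah)
{
    LowFlags f = {
        .sf = (ah >> 7) & 1, .zf = (ah >> 6) & 1, .af = (ah >> 4) & 1,
        .pf = (ah >> 2) & 1, .cf = ah & 1
    };
    return f;
}
```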

MULTIPLE SEGMENT & LARGE PROGRAMS
---------------------------------
        There are occasions when you will want to change the segment
        registers in the body of your program.  You may want to 
        directly access the video memory at B0000h or the interrupt
        vectors at 0 to 3ffh.  You may also want to deal with a large 
        amount of data (more than 64K).  In these cases you need to set 
        the segment registers and make sure they always point to the 
        right place in memory.  
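        When choosing values for the segment registers, it helps to
        remember how the 8086 forms an address: the 16-bit segment
        value is multiplied by 16 and the 16-bit offset is added,
        giving a 20-bit address.  The C function phys_addr() below is a
        hypothetical helper that does this arithmetic, wrapping above
        FFFFFh as an 8086 does.  It shows, for instance, that segment
        B000h, offset 0 is the video memory at B0000h, and that many
        segment:offset pairs name the same byte.

```c
#include <stdint.h>

/* 8086 real mode: physical address = segment * 16 + offset, kept to
   20 bits (the 8086 addresses 1MB and wraps above FFFFFh). */
uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```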

        With a single segment program, the segment registers (CS, DS,
        SS, ES) all point to the default segment.  The stack defaults
        to offset FFEE, near the top of the segment, and grows
        downward.  If it reaches the program code, there will be a
        problem such as a system crash.  You can give the stack a
        different segment by changing SS.  The same applies to the Data
        and Extra segments.  Only 4 segments can be addressed at one
        time, but a program can use more than 4 by reloading the
        segment registers as it runs.

        Here is a good way to handle the stack in a large program:        
                        JMP START       ;first line in program
                        DB 15 DUP ("stack   ")
                        START: MOV SP,OFFSET START -1
        This puts the stack near the bottom of the segment: SP starts
        just below START, and the stack grows down into the space
        reserved by the DB line.  You can use DEBUG to run the program,
        then use D (dump) 0100 to see how many copies of "stack   "
        have been overwritten.  You can adjust the number in front of
        DUP accordingly.
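        The check you do by eye in DEBUG can be sketched in C.  Assuming
        the reserved area (15 x 8 = 120 bytes) has been copied out of a
        dump, the hypothetical unused_stack_bytes() counts how many
        bytes at its low end still hold the "stack   " pattern.  Since
        the stack grows down toward the start of the area, those
        untouched bytes are stack space the program never used.

```c
#include <stddef.h>

/* Count the leading bytes of the reserved area that still match the
   repeating 8-character filler written by DB 15 DUP ("stack   "). */
size_t unused_stack_bytes(const char *area, size_t len)
{
    static const char pattern[] = "stack   ";   /* 8 characters */
    size_t n = 0;
    while (n < len && area[n] == pattern[n % 8])
        n++;
    return n;
}
```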

        Here are the "rules" regarding segments:
          1. Addresses pointed to by IP (code) are relative to CS.
          2. Addresses pointed to by SP (stack) are relative to SS.
          3. Addresses using modes involving BP are relative to SS.
          4. Addresses pointed to by all other modes are relative to DS.
          5. For string primitives, the source is relative to DS, and
             the destination is relative to ES.
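        The five rules can be written as a small decision function.
        This C sketch is only a summary of the list; the Access and
        Segment types and default_segment() are hypothetical names
        chosen for illustration.

```c
#include <stdbool.h>

typedef enum { SEG_CS, SEG_DS, SEG_SS, SEG_ES } Segment;

/* Hypothetical classification of a memory access. */
typedef enum {
    ACCESS_CODE,        /* instruction fetch at IP */
    ACCESS_STACK,       /* PUSH, POP, CALL, RET through SP */
    ACCESS_STRING_SRC,  /* string primitive source (SI) */
    ACCESS_STRING_DST,  /* string primitive destination (DI) */
    ACCESS_DATA         /* any other memory operand */
} Access;

/* Default segment for an access; uses_bp is true when the operand
   address is formed with BP, e.g. [BP+3] or [BP+SI]. */
Segment default_segment(Access kind, bool uses_bp)
{
    switch (kind) {
    case ACCESS_CODE:       return SEG_CS;                     /* rule 1 */
    case ACCESS_STACK:      return SEG_SS;                     /* rule 2 */
    case ACCESS_STRING_SRC: return SEG_DS;                     /* rule 5 */
    case ACCESS_STRING_DST: return SEG_ES;                     /* rule 5 */
    case ACCESS_DATA:
    default:                return uses_bp ? SEG_SS : SEG_DS;  /* 3, 4 */
    }
}
```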

        To change a segment register, you can just MOV a new value
        into it.  There are restrictions on what can be MOVed, so it
        often takes two MOVs.  You can't move an immediate value into
        any segment register, and you can't move one segment register
        into another.  You can't MOV anything into CS at all; CS is
        changed by far jumps, far calls, far returns, and interrupts.
        For example, to move the number 4000h into DS, use:
                                        MOV AX,4000h
                                        MOV DS,AX
        To move the contents of CS into DS, replace 4000h above with 
        CS.  Since you can PUSH and POP segment registers, another 
        way is to PUSH CS, then POP DS.

        The LDS (load pointer using DS) instruction allows you to load
        any 16-bit general register and DS with one instruction.  For
        example,
        { LDS SI,DWORD [BX] } loads SI with the contents of the memory
        locations pointed to by DS:BX and DS:BX+1, then loads DS with
        the contents of DS:BX+2 and DS:BX+3.  LES does the same thing
        using the ES register instead of DS.  These are commonly used
        with the string primitive instructions.
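        The four bytes that LDS reads form a "far pointer", stored low
        byte first: the first word is the offset and the second word is
        the segment.  The C sketch below decodes such a doubleword; the
        FarPtr type and lds_load() are hypothetical names for this
        illustration.

```c
#include <stdint.h>

typedef struct { uint16_t offset, segment; } FarPtr;

/* What LDS SI,DWORD [BX] reads: bytes at DS:BX..DS:BX+3. */
FarPtr lds_load(const uint8_t bytes[4])
{
    FarPtr p;
    p.offset  = (uint16_t)(bytes[0] | (bytes[1] << 8)); /* goes into SI */
    p.segment = (uint16_t)(bytes[2] | (bytes[3] << 8)); /* goes into DS */
    return p;
}
```

        For example, the bytes 34h 12h 00h B8h give offset 1234h and
        segment B800h, so after the LDS, DS:SI points to B800:1234.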

        Sometimes you want to access memory using a segment register
        other than the one given in the "rules" above.  This can be
        done using the segment override prefix.  The segment override 
        prefix is just the name of the register followed by a colon. 
        With MAX it is placed at the beginning of the line, but after
        the label, if one is used.  Examples:

                DS:MOV CL,[BP+3]        ;Load CL with the contents of
                                        ;DS:BP+3 instead of SS:BP+3.
                LABL_4:ES:MOV BH,VARBL  ;Load BH with the contents of
                                        ;VARBL, which is in the Extra
                                        ;Segment, not the Data Segment.

FURTHER READING        
---------------
        Here is a list of possible books to increase your understanding
        of assembly language:
          
          1. Peter Norton's Assembly Language Book for the IBM PC.
             by Peter Norton & John Socha.  This is a good beginner's
             book with idea headings highlighted in blue.
          
          2. The IBM PC from the Inside Out.  by Murray Sargent III and
             Richard L. Shoemaker.  Three chapters on assembly 
             language cover all instructions.  Much hardware info, 
             memory map, I/O port addresses, ASCII table, etc.
          
          3. DOS Programmer's Reference.  by Terry Dettmann.  Some 
             basics and a complete reference of the BIOS and DOS 
             interrupts & function calls.
