BitSound.DOC --- Documentation for BITSOUND.TPU
Daniel B. Singer, June 1989.

BitSound is a Turbo Pascal Unit for use with TP 4.0 and beyond.  It
enables Turbo Pascal programs to effortlessly play back bit-stream
digitized sound data created by RECORD.EXE (by Alan D. Jones).

System Requirements:

  None that you don't already have... an IBM PC (4.77 MHz) 8088 or
  better.  Anything will do.  This unit was developed on a Compaq
  Portable II, using Turbo Pascal v5.50, and has been tested on an
  IBM PC (4.77/8088) and a Compaq Portable II (8.00/80286).  Math
  coprocessors are not supported, because there's no math to be done.

Usage

  Playback

    In your "uses" statement, include "BITSOUND."  Declare sound
    variables of type "SOUNDPTR".

    Procedures:

      LoadSound (fileName, soundPtrVar);  {loads and allocates on the heap}
      PlaySound (soundPtrVar);            {plays whole sound file}
      PurgeSound (soundPtrVar);           {deallocates heap}

    An example program appears at the end of this file.

  Recording

    Use Jones' original RECORD program.  An HP graphics dump of his
    original circuit schematic (prints on HP LaserJet II and DeskJet
    type printers) is included in the archive.  A schematic of Daniel
    Singer's circuit (which was used to create SINGER.VOI) is
    available upon request from the address given below.  Jones'
    demonstration sound file is called 'NUMBERS.VOI'.

Limitations

  Note that each sound file may be NO GREATER than 64k.  That
  limitation is not too bad, given that a full 64k file yields about
  31 seconds of sound.

  The computer is completely commandeered by PlaySound.  No disk,
  crt, comm, timer, or other interrupt activity is permitted while
  PlaySound is running; any such activity will jeopardize the
  operation of the procedure.

Credits

  Alan D. Jones did the original theory and design work for the play
  and record programs as well as the input circuitry.  Daniel B.
  Singer modified the play program to work under Turbo Pascal (4.0+)
  and modified the circuitry for use with line-level (connect to your
  stereo) audio and battery power.

  Daniel B. Singer
  Stripêd Tiger Software
  2245 Iroquois Road
  Wilmette, IL  60091-1409

Jones' original documentation follows:

     VOICE DIGITIZATION AND REPRODUCTION ON THE
     IBM PC/XT AND PC/AT BUILT-IN SPEAKER
     --------------------------------------------
     Alan D. Jones                       July 1988

The speaker on the PC and its associated driver circuitry is quite
simple and crude, having been designed primarily for creating single
square-wave tones of various audio frequencies.  This speaker is
typically driven by a pair of transistors used as a current
amplifier, which is in turn driven directly by the output of a TTL
gate.  This results in only two possible voltages across the voice
coil: 0 volts and 5 volts.  Any sound to be reproduced by this system
must be reduced to an approximation in the form of a stream of
constant-amplitude, variable-width rectangular pulses.

Examination of a speech waveform on an oscilloscope display quickly
tells us that it is not going to be possible to even remotely mimic
this waveform under the above restrictions.  Much of the information
contained in the waveform is in the form of amplitude variations, and
this is the one attribute we cannot reproduce.

It is initially tempting to try to use the technique of the "class D"
amplifier to create the waveform, using high-speed pulse width
modulation and depending on the mechanical characteristics of the
speaker and those of the human ear to provide the missing low-pass
filtering.
Assuming the sampling rate to be 8 KHz (based on the Nyquist
criterion) and, to conserve memory, assuming the samples to contain
only 4 bits of amplitude information (16 levels), we can see that
data accumulates at a rate of 4k bytes per second, which is certainly
acceptable.  The problem comes when we try to play back the sound.
Pulses occur at intervals of 125 microseconds, which doesn't seem too
bad, but since each pulse can have 16 possible widths, it is
necessary to time the pulses with a resolution of well under 8
microseconds.  This is only a couple of instruction times on a 4.77
MHz XT, and even on a fast 80386 it doesn't give the CPU much time
between pulses to shift bits, read and increment a pointer, check the
pointer to see if it's done yet, and so on -- not to mention the
difficulty of servicing unrelated interrupts.

The search for simpler (but still usable) and less CPU-intensive
methods of reproducing speech leads to the question of what
information in the waveform we can discard without an unacceptable
loss of intelligibility.  My experiments with running speech signals
through a graphic equalizer revealed that the lower-frequency
components, those which are most visible to the eye on the
oscilloscope, are actually of minimal importance in understanding
speech.  This is also demonstrated by the fact that a whisper is just
as understandable as normal speech, but does not make use of
vibrating vocal cords, which are the primary source of low-frequency
components in the voice.

The schematic created by printing the file SCHEMATC.PRT arose partly
from the above observations and partly from trial and error.  The
circuit consists of two stages of voltage amplification with some
high-pass filtering built into the coupling capacitors, followed by a
differentiator.  The output of the differentiator is fed to a voltage
comparator, producing an output which has approximately the following
relationship to the input from the microphone:

    If the derivative of the speech waveform is positive, the output
    is logic zero; if the derivative of the speech waveform is
    negative, the output is logic one.

The transition timing at the output is entirely analog in nature;
there is no synchronizing clock signal anywhere in the circuit.  If
the output of this circuit is connected directly to a speaker, the
resulting sound will still be an understandable version of the input.
Since the output consists of nothing but a digital bit stream, the
job of the computer becomes that of simply recording and accurately
reproducing this bit stream.

**** NOTE: The following two paragraphs apply to Jones' circuit only.
     If you are using Singer's circuit, these are the wrong
     directions and specifications.  --DBS

The trimpot at the input of amplifier U3 is used to set the DC idle
voltage output from the differentiator to somewhere near the
threshold of comparator U4.  There will be a considerable amount of
noise at the output of U3, originating at the microphone and within
the input circuitry of U1, and highly amplified by U1 and U2.  The
trimpot should be adjusted so that the comparator threshold is just
outside the normal excursion of the noise signal ("off to one side");
otherwise "silence" at the microphone will become, at the speaker
output from the computer, a loud hiss with a strong component at half
the sampling frequency.

I used LF356's for U1, U2, and U3, and an LM393 for U4.  Everything
is powered by +12 volts and ground.  All amplifiers should have power
supply bypass capacitors (not shown).
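**** NOTE: The relationship described above (one output bit, set by
     the sign of the derivative of the input) can be modeled in a few
     lines of Pascal.  The sketch below is purely illustrative: it is
     not part of RECORD or BITSOUND, the input values are made up,
     and the real circuit is analog, with no sampling clock at all.
     --DBS

        program SignDemo;
        { Illustrative model of the comparator's decision: one bit   }
        { per input sample, 1 while the signal is falling and 0      }
        { while it is rising.                                        }
        const
          NSamples = 8;
          { Made-up stand-in for the microphone signal. }
          Input: array[1..NSamples] of Real =
            (0.0, 0.2, 0.5, 0.4, 0.1, 0.3, 0.6, 0.2);
        var
          i: Integer;
        begin
          for i := 2 to NSamples do
            if Input[i] < Input[i - 1] then  { derivative negative }
              Write('1')
            else                             { derivative positive (or zero) }
              Write('0');
          WriteLn;                           { prints 0011001 for this input }
        end.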
The microphone is a 600 ohm dynamic type.  The 12 volt power supply
should be quiet and well regulated; the one in the PC is too noisy
unless you use heavy filtering.

The two programs, RECORD and PLAY, are used as follows.  Attach the
circuit to the CTS input on one of the PC's COM ports.  Then type:

    RECORD <port> <filename>

where <port> is the COM port number and <filename> is the name of the
disk file to contain the voice data.  RECORD will respond with "Press
a key to start and stop."  Press the space bar and start talking.
Press the space bar again to end recording and write the data to
disk.  Play it back with PLAY (or use the BITSOUND unit -- DBS).

The sampling rate is about 16.5k bits per second.  This means that
about 30 seconds of voice will make a 64k disk file.  This is a
simple program; it runs out of steam at 64k.

The programs both operate by reprogramming the 8253 timer chip to
produce hardware interrupts at the 16.5 KHz rate.  The interrupt
service routine then manipulates the NAND gate driving the speaker
based on bits read from the file.  The 16.5 KHz rate was chosen by
trial and error; this is the audible "point of diminishing returns",
where a further increase in sampling rate didn't produce enough of an
improvement to warrant the increased memory usage.

This technique is somewhat limited in its usefulness.  It
necessitates writing a "badly behaved" program which not only
reprograms the timer chip but also totally hogs the CPU for the
duration of the voice output.  Nevertheless, it demonstrates a few
interesting things about how humans hear speech.

I first developed this circuit over a year ago as a rebuttal to
someone who said "it couldn't be done".  Not only can it be done, it
is actually quite simple.  Certainly the circuit could be improved,
at the possible expense of increased complexity.  I'm waiting to hear
from some of you.

If anyone has questions, especially about my sloppy code, I check for
messages on CIS every three or four days.

                                        - Alan  74030,554

( end of documentation )
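The playback mechanism Jones describes above -- reprogramming the
8253 and driving the speaker from the timer interrupt handler -- can
be sketched in Turbo Pascal roughly as follows.  This is NOT the
source of PLAY or BITSOUND: the buffer and its dummy contents, the
bit order (least significant bit first), and the divisor of 72
(roughly 1193182 / 16500) are assumptions made for illustration, and
like the real thing the sketch is deliberately "badly behaved" (the
BIOS clock is ignored while it runs).

        program PlaySketch;
        { Rough illustration only -- not the PLAY or BITSOUND source. }
        uses Dos;

        const
          Divisor = 72;                  { about 1193182 / 16500       }
          BufSize = 1024;                { bytes of made-up bit data   }

        var
          OldInt8  : Pointer;            { original INT 8 vector       }
          Buffer   : array[0..1023] of Byte;
          BitIndex : Word;               { current bit position        }
          Done     : Boolean;

        procedure TimerISR; interrupt;
        begin
          if BitIndex < BufSize * 8 then
          begin
            { Read the next bit and drive the speaker data line, }
            { bit 1 of port $61.                                 }
            if ((Buffer[BitIndex shr 3] shr (BitIndex and 7)) and 1) = 1 then
              Port[$61] := Port[$61] or $02
            else
              Port[$61] := Port[$61] and $FD;
            Inc(BitIndex);
          end
          else
            Done := True;
          Port[$20] := $20;              { acknowledge the interrupt (EOI) }
        end;

        begin
          BitIndex := 0;
          Done := False;
          FillChar(Buffer, SizeOf(Buffer), $AA);  { dummy data }

          GetIntVec($08, OldInt8);       { hook the timer interrupt    }
          SetIntVec($08, @TimerISR);

          Port[$43] := $36;              { 8253 channel 0, mode 3      }
          Port[$40] := Lo(Divisor);      { raise the tick rate to      }
          Port[$40] := Hi(Divisor);      { about 16.5 KHz              }

          repeat until Done;             { the CPU is "hogged" here    }

          Port[$43] := $36;              { restore the normal 18.2 Hz  }
          Port[$40] := 0;                { rate (divisor 0 = 65536)    }
          Port[$40] := 0;
          SetIntVec($08, OldInt8);       { restore the old handler     }
        end.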
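Finally, for reference, here is a minimal Turbo Pascal program using
the three BITSOUND procedures listed under "Usage" at the top of this
file.  The program name is arbitrary, NUMBERS.VOI is Jones'
demonstration sound file, and error handling (a missing file, not
enough heap space) is omitted for brevity.

        program VoiceDemo;
        uses BitSound;

        var
          Voice    : SoundPtr;           { the unit's SOUNDPTR type      }
          FileName : string;
        begin
          FileName := 'NUMBERS.VOI';
          LoadSound(FileName, Voice);    { load and allocate on the heap }
          PlaySound(Voice);              { play the whole file; the CPU  }
                                         { is tied up until it finishes  }
          PurgeSound(Voice);             { release the heap space        }
        end.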