BitSound.DOC --- Documentation for BITSOUND.TPU
Daniel B. Singer, June 1989.

BitSound is a Turbo Pascal Unit for use with TP 4.0 and beyond.  It
enables Turbo Pascal programs to effortlessly play back bit-stream
digitized sound data created by RECORD.EXE (by Alan D. Jones).

System Requirements:

  None that you don't already have... an IBM PC (4.77 MHz) 8088 or
  better.  Anything will do.  This unit was developed on a Compaq
  Portable II, using Turbo Pascal v5.50, and has been tested on an
  IBM PC (4.77/8088) and a Compaq Portable II (8.00/80286).  Math
  coprocessors are not supported, because there's no math to be done.

Usage

  Playback

    In your "uses" statement, include "BITSOUND."  Declare sound
    variables of type "SOUNDPTR".

    Procedures:

      LoadSound (fileName, soundPtrVar);  {loads and allocates on the heap}
      PlaySound (soundPtrVar);            {plays whole sound file}
      PurgeSound (soundPtrVar);           {deallocates heap}

    An example program appears at the end of this file.

  Recording

    Use Jones' original RECORD program.  An HP graphics dump of his
    original circuit schematic (prints on HP LaserJet II and DeskJet
    type printers) is included in the archive.  A schematic of Daniel
    Singer's circuit (which was used to create SINGER.VOI) is
    available upon request from the address given below.  Jones'
    demonstration sound file is called 'NUMBERS.VOI'.

Limitations

  Note that each sound file may be NO GREATER than 64k.  That
  limitation is not too bad, given that a full 64k file yields about
  31 seconds of sound.

  The computer is completely commandeered by PlaySound.  No disk,
  crt, comm, timer, or other interrupt activity is permitted while
  PlaySound is running; any such activity will jeopardize the
  operation of the procedure.

Credits

  Alan D. Jones did the original theory and design work for the play
  and record programs as well as the input circuitry.  Daniel B.
  Singer modified the play program to work under Turbo Pascal (4.0+)
  and modified the circuitry for use with line-level (connect to your
  stereo) audio and battery power.

  Daniel B. Singer
  Stripêd Tiger Software
  2245 Iroquois Road
  Wilmette, IL  60091-1409

Jones' original documentation follows:

     VOICE DIGITIZATION AND REPRODUCTION ON THE
     IBM PC/XT AND PC/AT BUILT-IN SPEAKER
     --------------------------------------------
     Alan D. Jones                       July 1988

The speaker on the PC and its associated driver circuitry is quite
simple and crude, having been designed primarily for creating single
square-wave tones of various audio frequencies.  This speaker is
typically driven by a pair of transistors used as a current
amplifier, which is in turn driven directly by the output of a TTL
gate.  This results in only two possible voltages across the voice
coil: 0 volts and 5 volts.  Any sound to be reproduced by this system
must be reduced to an approximation in the form of a stream of
constant-amplitude, variable-width rectangular pulses.

Examination of a speech waveform on an oscilloscope display quickly
tells us that it is not going to be possible to even remotely mimic
this waveform under the above restrictions.  Much of the information
contained in the waveform is in the form of amplitude variations, and
this is the one attribute we cannot reproduce.

It is initially tempting to try to use the technique of the "class D"
amplifier to create the waveform, using high-speed pulse width
modulation and depending on the mechanical characteristics of the
speaker and those of the human ear to provide the missing low-pass
filtering.
Assuming the sampling rate to be 8 KHz (based on the Nyquist
criterion) and, to conserve memory, assuming the samples to contain
only 4 bits of amplitude information (16 levels), we can see that
data accumulates at a rate of 4k bytes per second, which is certainly
acceptable.  The problem comes when we try to play back the sound.
Pulses occur at intervals of 125 microseconds, which doesn't seem too
bad, but since each pulse can have 16 possible widths, it is
necessary to time the pulses with a resolution of well under 8
microseconds.  This is only a couple of instruction times on a 4.77
MHz XT, and even on a fast 80386 it doesn't give the CPU much time
between pulses to shift bits, read and increment a pointer, check the
pointer to see if it's done yet, and so on -- not to mention the
difficulty of servicing unrelated interrupts.

The search for simpler (but still usable) and less CPU-intensive
methods of reproducing speech leads to the question of what
information in the waveform we can discard without an unacceptable
loss of intelligibility.  My experiments with running speech signals
through a graphic equalizer revealed that the lower-frequency
components, those which are most visible to the eye on the
oscilloscope, are actually of minimal importance in understanding
speech.  This is also demonstrated by the fact that a whisper is just
as understandable as normal speech, but does not make use of
vibrating vocal cords, which are the primary source of low-frequency
components in the voice.

The schematic created by printing the file SCHEMATC.PRT arose partly
from the above observations and partly from trial and error.  The
circuit consists of two stages of voltage amplification with some
high-pass filtering built into the coupling capacitors, followed by a
differentiator.  The output of the differentiator is fed to a voltage
comparator, producing an output which has approximately the following
relationship to the input from the microphone:

    If the derivative of the speech waveform is positive, the output
    is logic zero; if the derivative of the speech waveform is
    negative, the output is logic one.

The transition timing at the output is entirely analog in nature;
there is no synchronizing clock signal anywhere in the circuit.  If
the output of this circuit is connected directly to a speaker, the
resulting sound will still be an understandable version of the input.
Since the output consists of nothing but a digital bit stream, the
job of the computer becomes that of simply recording and accurately
reproducing this bit stream.

**** NOTE: The following two paragraphs apply to Jones' circuit only.
     If you are using Singer's circuit, these are the wrong
     directions and specifications.  --DBS

The trimpot at the input of amplifier U3 is used to set the DC idle
voltage output from the differentiator to somewhere near the
threshold of comparator U4.  There will be a considerable amount of
noise at the output of U3, originating at the microphone and within
the input circuitry of U1, and highly amplified by U1 and U2.  The
trimpot should be adjusted so that the comparator threshold is just
outside the normal excursion of the noise signal ("off to one side");
otherwise "silence" at the microphone will become, at the speaker
output from the computer, a loud hiss with a strong component at half
the sampling frequency.

I used LF356's for U1, U2, and U3, and an LM393 for U4.  Everything
is powered by +12 volts and ground.  All amplifiers should have power
supply bypass capacitors (not shown).
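**** NOTE: The relationship described above (one output bit, set by
     the sign of the derivative of the input) can be modeled in a few
     lines of Pascal.  The sketch below is purely illustrative: it is
     not part of RECORD or BITSOUND, the input values are made up,
     and the real circuit is analog, with no sampling clock at all.
     --DBS

        program SignDemo;
        { Illustrative model of the comparator's decision: one bit   }
        { per input sample, 1 while the signal is falling and 0      }
        { while it is rising.                                        }
        const
          NSamples = 8;
          { Made-up stand-in for the microphone signal. }
          Input: array[1..NSamples] of Real =
            (0.0, 0.2, 0.5, 0.4, 0.1, 0.3, 0.6, 0.2);
        var
          i: Integer;
        begin
          for i := 2 to NSamples do
            if Input[i] < Input[i - 1] then  { derivative negative }
              Write('1')
            else                             { derivative positive (or zero) }
              Write('0');
          WriteLn;                           { prints 0011001 for this input }
        end.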
The microphone is a 600 ohm dynamic type.  The 12 volt power supply
should be quiet and well regulated; the one in the PC is too noisy
unless you use heavy filtering.

The two programs, RECORD and PLAY, are used as follows.  Attach the
circuit to the CTS input on one of the PC's COM ports.  Then type:

    RECORD <port> <filename>

where <port> is the COM port number and <filename> is the name of the
disk file to contain the voice data.  RECORD will respond with "Press
a key to start and stop."  Press the space bar and start talking.
Press the space bar again to end recording and write the data to
disk.  Play it back with PLAY (or use the BITSOUND unit -- DBS).

The sampling rate is about 16.5k bits per second.  This means that
about 30 seconds of voice will make a 64k disk file.  This is a
simple program; it runs out of steam at 64k.

The programs both operate by reprogramming the 8253 timer chip to
produce hardware interrupts at the 16.5 KHz rate.  The interrupt
service routine then manipulates the NAND gate driving the speaker
based on bits read from the file.  The 16.5 KHz rate was chosen by
trial and error; this is the audible "point of diminishing returns",
where a further increase in sampling rate didn't produce enough of an
improvement to warrant the increased memory usage.

This technique is somewhat limited in its usefulness.  It
necessitates writing a "badly behaved" program which not only
reprograms the timer chip but also totally hogs the CPU for the
duration of the voice output.  Nevertheless, it demonstrates a few
interesting things about how humans hear speech.

I first developed this circuit over a year ago as a rebuttal to
someone who said "it couldn't be done".  Not only can it be done, it
is actually quite simple.  Certainly the circuit could be improved,
at the possible expense of increased complexity.  I'm waiting to hear
from some of you.

If anyone has questions, especially about my sloppy code, I check for
messages on CIS every three or four days.

                                        - Alan  74030,554

( end of documentation )
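The playback mechanism Jones describes above -- reprogramming the
8253 and driving the speaker from the timer interrupt handler -- can
be sketched in Turbo Pascal roughly as follows.  This is NOT the
source of PLAY or BITSOUND: the buffer and its dummy contents, the
bit order (least significant bit first), and the divisor of 72
(roughly 1193182 / 16500) are assumptions made for illustration, and
like the real thing the sketch is deliberately "badly behaved" (the
BIOS clock is ignored while it runs).

        program PlaySketch;
        { Rough illustration only -- not the PLAY or BITSOUND source. }
        uses Dos;

        const
          Divisor = 72;                  { about 1193182 / 16500       }
          BufSize = 1024;                { bytes of made-up bit data   }

        var
          OldInt8  : Pointer;            { original INT 8 vector       }
          Buffer   : array[0..1023] of Byte;
          BitIndex : Word;               { current bit position        }
          Done     : Boolean;

        procedure TimerISR; interrupt;
        begin
          if BitIndex < BufSize * 8 then
          begin
            { Read the next bit and drive the speaker data line, }
            { bit 1 of port $61.                                 }
            if ((Buffer[BitIndex shr 3] shr (BitIndex and 7)) and 1) = 1 then
              Port[$61] := Port[$61] or $02
            else
              Port[$61] := Port[$61] and $FD;
            Inc(BitIndex);
          end
          else
            Done := True;
          Port[$20] := $20;              { acknowledge the interrupt (EOI) }
        end;

        begin
          BitIndex := 0;
          Done := False;
          FillChar(Buffer, SizeOf(Buffer), $AA);  { dummy data }

          GetIntVec($08, OldInt8);       { hook the timer interrupt    }
          SetIntVec($08, @TimerISR);

          Port[$43] := $36;              { 8253 channel 0, mode 3      }
          Port[$40] := Lo(Divisor);      { raise the tick rate to      }
          Port[$40] := Hi(Divisor);      { about 16.5 KHz              }

          repeat until Done;             { the CPU is "hogged" here    }

          Port[$43] := $36;              { restore the normal 18.2 Hz  }
          Port[$40] := 0;                { rate (divisor 0 = 65536)    }
          Port[$40] := 0;
          SetIntVec($08, OldInt8);       { restore the old handler     }
        end.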
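Finally, for reference, here is a minimal Turbo Pascal program using
the three BITSOUND procedures listed under "Usage" at the top of this
file.  The program name is arbitrary, NUMBERS.VOI is Jones'
demonstration sound file, and error handling (a missing file, not
enough heap space) is omitted for brevity.

        program VoiceDemo;
        uses BitSound;

        var
          Voice    : SoundPtr;           { the unit's SOUNDPTR type      }
          FileName : string;
        begin
          FileName := 'NUMBERS.VOI';
          LoadSound(FileName, Voice);    { load and allocate on the heap }
          PlaySound(Voice);              { play the whole file; the CPU  }
                                         { is tied up until it finishes  }
          PurgeSound(Voice);             { release the heap space        }
        end.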