Version 1.0:
 - high-resolution time measurement ok.
 - detection of L1 cache ok.
 - detection of L2 cache ok.
 - CPU type detection ok.
 - partial implementation of video performance tests.

Version 1.1:
 - L1 cache code improved.
 - L2 cache code improved.
 - AT cache detection added.
 - CPU type detection reworked.
 - full implementation of video performance tests.

Version 1.2:
 - L2 cache code improved.
 - DRAM page size detection added.
 - cache line size detection added.
 - video info added.

Version 1.3:
 - switched to a different device driver to be WARP compatible

Version 1.3a:
 - fixed nasty bug in preloading code pages that caused a total system lockup

Version 1.4:
 - L2 cache code improved.
 - CPU type detection improved.
 - DRAM page size detection improved.
 - DRAM interleave detection added.
 - results layout depends on presence of L1/L2 cache.

Version 1.4a:
 - CPU-ID bug fixed

Version 1.5:
 - CPU type detection reworked (more CPU types, more reliable)
 - more robust to different environments:
   works without device driver -> fewer details
   works from a boot floppy -> no video testing
 - much smaller critical code section with interrupts disabled, which should
   remove any chance of lockups due to page faults
 - fixed bug in L2 cache code (affected Pentium only)

Version 1.6:
 - another CPU-ID bug fixed
 - minor code reordering to improve L1 cache detection on Pentiums
 - added testing of opcode fetch from L2 cache
 - added capability of opcode fetch testing on CPUs with Harvard architecture
   (sorry, this needs much bigger code)
 - added 'more' option

Future:
 - overall improvement :-)
 - fewer bugs :-)
 - fancy GUI
 - any other good and implementable idea
