              Disk Cache Request Sorting
   The dangers of changing the order of disk writes

Introduction

At least three leading disk caches on the commercial software
market attempt to enhance a PC's performance by changing the
order that information is written to a disk.  Instead of
processing disk writes on a first-in-first-out (FIFO) basis,
these caches use request sorting.  Request sorting can reduce
the elapsed time required to complete a series of disk writes,
but it can also create periods of jeopardy that result in the
corruption of a disk's file allocation information as well as
a loss of data.

PC-Kwik Corporation has developed a diagnostic program that
monitors a specific directory for these periods of jeopardy.
When the directory and the file-allocation information become
at risk, the user is alerted to the problem.

PC-Kwik's disk cache, Super PC-Kwik, provides request sorting
as an option but does not use this technique as a default.
Instead, it processes write requests on a FIFO basis,
minimizing risk while still providing a dramatic performance
gain.  Other caches expose the user's disk to the risk of
corruption by defaulting to request sorting.  Often when the
request sorting feature is turned off, the performance gain
provided by these other caches decreases dramatically.


History

Sorting the order of requests is a common technique for
enhancing the throughput of a disk subsystem.  Time can be
saved if the data is written to the desired sectors with less
overall movement of the disk drive's heads.  Request sorting
algorithms were originally designed for mainframes under the
assumption that all requested input-output (I/O) would be
completed.  This conclusion was based on three assumptions:

1. I/O is under the control of a protected operating system
   (i.e., not subject to failure due to bugs in application code).
2. The hardware operates with an uninterruptable power source.
3. The system is being operated by trained technicians.


Request Sorting in the PC Environment

While these assumptions are typically valid for mainframe
systems, they are not usually valid for DOS-based PCs.  Instead,
we are all too familiar with frequent system crashes due to
application bugs.  Most of us have accidentally rebooted or
powered down our systems by hitting the reset button or the
power switch at the wrong time.  And most of us have also
experienced system failures due to power fluctuations.  When
request sorting is not used, the impact of these problems is
limited to the potential loss of the new data being written,
instead of threatening the integrity of entire disk structures
and databases.

Impact of Request Sorting on Database Systems

The designers of sophisticated PC database systems understand
that their database update algorithms will sometimes be subject
to system crashes.  To minimize the damage that these crashes
produce, they carefully order the sequence of disk writes
used to update index and data records (or their equivalent).
Because of this careful ordering, the database software or
utilities provided with it can usually recover from system
failures by detecting how far database transactions proceeded
before the failure, and limit the damage to the records not
yet written.  When the order of modifications to the database
is changed by a disk cache, the recovery procedure cannot
accurately determine which transactions and substeps were
completed, compromising the integrity of the entire database.
Due to this unrecoverable damage that request sorting can cause
here, most database users do not feel the performance increment
is worth the risk.


Risk vs. Reward

A sophisticated user will occasionally choose to accept the
risks of request sorting to obtain the advantage of additional
performance based on their knowledge that the application
environment is largely bug free, an uninterruptable power
supply is being provided, and the system will not be turned
off or rebooted during the database transactions.  A careful
user will also take the precaution of making regular backups
of systems if they employ request sorting.

Unfortunately, though, there are many unsophisticated users
using request sorting because most disk caches use request
sorting algorithms by default.  The user is given little or
no information about this risky technology and what the
trade-offs are between performance and data integrity.


The Impact of Request Sorting on Disk Structures

The DOS File Allocation Table (FAT) and directory structures
can be thought of as a special type of database.  When DOS
creates a file, it carefully postpones pointing the new
directory entry at the newly allocated disk clusters until
the FAT contains the required information about these clusters.

If a request sorting disk cache is used, there may be periods
of time when directory entries point to disk clusters that are
marked as unallocated in the FAT.  If the system is rebooted
due to an application crash or a power failure during one of
these jeopardy periods, the directory entry will continue to
point to unallocated space on the disk.  If the system continues
to be used, the same space will be allocated to another file,
and cross-linked files will result.  Cross-linked files are
almost impossible to correct, because two files end up sharing
the same space on the disk.  One or both of the files will be
corrupted, and there is no reliable technique for recovering
the lost data.  A hard drive with many cross-linked files is
typically referred to as "crashed."

Unfortunately, some disk caches reorder the sequence of disk
writes so radically that directory/FAT structure are exposed
to this type of jeopardy for long periods at a time.  Most
users have no knowledge that they are incurring this type of
risk, because the request sorting is the default mode.  Those
who are aware of the situation are faced with either accepting
the risk or disabling the caching of writes all together.


Jeopardy Detection

To determine how often jeopardy periods are generated by
various disk cache programs, PC-Kwik Corporation developed a
Jeopardy Detection program.   This program continuously
monitors the writing of sectors to a particular disk.  It
compares the directory entries of a specific directory to
the space shown as allocated in the FAT.  When it detects a
discrepancy it alerts the user by changing the screen's border
color to red.  The resulting flashing red border gives the
user a sense of how often jeopardy periods occur for a
particular directory.


Disk Caches that Impose Jeopardy Periods

Most commercially available disk products default to using
some form of request sorting.  According to our analysis,
some of those which create periods of jeopardy include:

-- SMARTDrive which is included with Microsoft Windows 3.1
   and DOS 6.0

-- NCache2 which is included with Norton Utilities 7.0

-- PC-Cache which is included with PC-Tools 8.0

We expect that many other disk cache programs generate the
same sort of risk.


Summary

The Super PC-Kwik disk cache defaults to strict FIFO processing
of disk writes.  It provides a dramatic improvement in
throughput without incurring the risks of request sorting.
Sophisticated users have the option of reading documentation
about advanced options like request sorting, time-delayed
writes, and quick return of the DOS prompt.  They can then
make an informed decision about which of these options are
appropriate for their situation.


                       FIGURE 1

  File Allocation Table               Directory
+------------------------+       +---------------------+
|                        |       |                     |
|                        |       +---------------------+
|                        |       | 1STFILE.DAT     +   |
|        [Allocated]     |       +-----------------|---+
|             |          |       |                 |   |
+-------------|----------+       +-----------------|---+
              |                                    |
              |     +------------------------------+
              |     |
              |     V
              |    +------------------------------+
              +--->|  disk space                  |
                   +------------------------------+

Figure 1.  File 1STFILE.DAT occupies space on the disk.  Space is
allocated in the File Allocation Table and the file name has a
directory which points to this space.


                         FIGURE 2

  File Allocation Table               Directory
+------------------------+       +---------------------+
|                        |       |                     |
|                        |       +---------------------+
|                        |       | 1STFILE.DAT     +   |
|        [Unallocated]   |       +-----------------|---+
|                        |       |                 |   |
+------------------------+       +-----------------|---+
                                                   |
                    +------------------------------+
                    |
                    V
                   +------------------------------+
                   |  disk space                  |
                   +------------------------------+

Figure 2.  During a jeopardy period, file 1STFILE.DAT has a
directory entry, but is not allocated in the FILE Allocation
Table.  If a crash occurs during this jeopardy period, the disk
space wil remain unallocated.


                         FIGURE 3

  File Allocation Table               Directory
+------------------------+       +---------------------+
|                        |       |                     |
|                        |       +---------------------+
|                        |       | 1STFILE.DAT     +   |
|        [Allocated]     |       +-----------------|---+
|             |          |       |                 |   |
+-------------|----------+       +-----------------|---+
              |      +-----------------------------+
              |      |
              |      |           +---------------------+
              |      |           |                     |
              |      |           +---------------------+
              |      |           | 2NDFILE.DAT     +   |
              |      |           +-----------------|---+
              |      |           |                 |   |
              |      |           +-----------------|---+
              |      |                             +
              |      | +---------------------------+
              |      | |
              |      V V
              |     +------------------------------+
              +---->|  disk space                  |
                    +------------------------------+

Figure 3.  Crosslinked files.  After system recovery, new file
2NDFILE.DAT is created and unllocated space is assigned to it.
1STFILE.DAT and 2NDFILE.DAT now occupy the same physical location
and both have directory entries.


PC-Kwik is a registered trademark of PC-Kwik Corporation.

