Microsoft Foundation Classes                            Microsoft Corporation
Technical Notes            

#2 : Persistent Data Format

This note describes the MFC routines that support persistent
C++ objects and the format of those objects in a persistent store.

=============================================================================

The Problem
===========

The MFC implementation of persistent data relies on a
compact binary format for saving data to disk.  This format
is distinct from the format used for diagnostic output
of class objects for two reasons: (1) diagnostic output must be
human readable, and (2) maximum space efficiency is desired
when saving to a persistent store (usually a disk).  For
these reasons, MFC does not provide a polymorphic
interface for storing objects, as is common in "pure"
object-oriented languages such as Smalltalk-80.

MFC solves this problem by using the class CArchive.  A
CArchive class object provides a context for persistence that
lasts from the time the archive is created until the CArchive::Close
member function is called, either explicitly by the programmer, or
implicitly by the destructor when the scope containing the CArchive
is exited.

This note describes the implementation of the CArchive protected 
members WriteObject and ReadObject.  ReadObject and WriteObject
are never called directly by end users; these member functions
are used to implement persistent objects.  Remember that end users
should use the class-specific type-safe insertion and extraction
operators, enabled by including the DECLARE_SERIAL and IMPLEMENT_SERIAL
macros in a CObject-derived class.  Similarly, the end user rarely
calls the virtual member function CObject::Serialize directly, unless the
object being stored is embedded in another class object, in which case
the exact type of the object is known.


NOTE: This note describes code located in the MFC
source file ARCHIVE.CPP.

=============================================================================
Saving objects to the store (CArchive::WriteObject)
===================================================

The protected member function void CArchive::WriteObject(const CObject*)
is responsible for writing out enough data so that the object
can be correctly reconstructed.  This data consists of two parts:
the type of the object and the state of the object.  This member
function is also responsible for maintaining the identity of the
object being written out, so that only a single copy is 
saved, regardless of the number of pointers to that object 
(including circular pointers).

The saving (inserting) and restoring (extracting) of objects 
relies on several manifest constants.  These are values that 
are stored in binary and provide important information to the 
archive (note the "w" prefix indicates 16-bit quantities).

    wNullTag        // used for NULL object pointers (0)
    wNewClassTag    // indicates class description that follows is new
                    // to this archive context (-1)
    wOldClassTag    // indicates class of the object being read
                    // has been seen in this context (0x8000)
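
As a minimal sketch of how a reader might tell these values apart,
the fragment below classifies one 16-bit word.  The constant values
are the ones listed above; TagKind, ClassifyTag and ClassIndexOf are
hypothetical names used only for illustration and are not part of
MFC.

```cpp
#include <cassert>
#include <cstdint>

// Values as listed in this note (-1 is 0xFFFF as a 16-bit quantity).
const std::uint16_t wNullTag     = 0x0000;  // NULL object pointer
const std::uint16_t wNewClassTag = 0xFFFF;  // new class description follows
const std::uint16_t wOldClassTag = 0x8000;  // flag bit: class already seen

// Hypothetical classification of one 16-bit word from the stream.
enum class TagKind { Null, NewClass, OldClass, ObjectPid };

TagKind ClassifyTag(std::uint16_t w)
{
    if (w == wNullTag)     return TagKind::Null;
    // Test for 0xFFFF before the flag bit, since 0xFFFF also has
    // the 0x8000 bit set.
    if (w == wNewClassTag) return TagKind::NewClass;
    if (w & wOldClassTag)  return TagKind::OldClass;
    return TagKind::ObjectPid;   // plain PID of an object already seen
}

// Assumption: the low 15 bits of an old-class tag carry the class index.
std::uint16_t ClassIndexOf(std::uint16_t tag)
{
    assert(ClassifyTag(tag) == TagKind::OldClass);
    return static_cast<std::uint16_t>(tag & 0x7FFF);
}
```

The order of the tests matters in this sketch because wNewClassTag
also has the wOldClassTag bit set.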

When storing objects, the archive maintains a CMapPtrToWord
(the m_pStoreMap), which is a mapping from a stored object to a
16-bit persistent identifier (PID).  A PID is assigned to every
unique object and every unique class name that is saved in
the context of the archive.  These PIDs are handed out sequentially
starting at 1.  It is important to note that these PIDs have
no significance outside the scope of the archive and, in
particular, are not to be confused with a "record number" or
other identity concepts.

When a request is made to save an object to an archive
(usually through the global insertion operator), a check is made
for a NULL CObject pointer; if the pointer is NULL the wNullTag is
inserted into the archive stream.

If the pointer refers to a real object that is capable of being
serialized (the class is a DECLARE_SERIAL class), WriteObject checks
the m_pStoreMap to see whether the object has been saved already.
If it has, WriteObject inserts the 16-bit PID associated with that
object.

If the object has not been saved before, there are two possibilities
to consider: either both the object and its exact type (i.e. class)
are new to this archive context, or the object is of an exact type
already seen.  To determine whether the type has been seen,
WriteObject queries the m_pStoreMap for a CRuntimeClass
object (formally, CRuntimeClass is a structure, to avoid problems
associated with meta-classes) that matches the CRuntimeClass
object associated with the object being saved.

If the class has been seen before, WriteObject inserts a 16-bit tag
that is the bit-wise OR of wOldClassTag and the PID of the class.
Note that this operation imposes a hard limit of 32766 indices per
archive context.  This number is the maximum number of
unique objects and classes that can be saved in a single archive,
but a single disk file can hold an unlimited number
of archive contexts.

If the CRuntimeClass is new to this archive context, WriteObject
assigns a new PID to that class and inserts it into the archive,
preceded by the wNewClassTag value.  The descriptor for this class
is then inserted into the archive using the CRuntimeClass member
function Store.  CRuntimeClass::Store inserts the schema number of
the class (see "Schema numbers" below) and the ASCII text name of
the class.  Note that the ASCII text name does not guarantee that
a class is uniquely identified across applications, so it is
advisable to tag your data files to prevent corruption (imagine
distinct applications that both define the class CWordStack, for
example).

Following the insertion of the class information, the archive
places the object into the m_pStoreMap and then calls the Serialize
member function to insert class-specific data into the archive.
Placing the object into the m_pStoreMap before calling Serialize
prevents multiple copies of the object from being saved to the store,
even when the object is reachable through a circular pointer.

When control returns to the initial caller (usually the code that
saved the root of the network of objects), it is important to call
Close on the archive.  If other CFile operations are going to be
done on the same file, the CArchive member function Flush MUST be
called first.  Failure to do so will result in a corrupt archive.
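
The following sketch illustrates why the flush matters, assuming
(as is typical for an archive layered on a file) that the archive
buffers its output before handing it to the file.  SketchFile and
BufferedArchive are hypothetical stand-ins and do not reproduce the
CFile or CArchive interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a file: just the bytes written so far.
struct SketchFile {
    std::vector<std::uint8_t> bytes;
};

// Hypothetical stand-in for an archive that buffers its output.
class BufferedArchive {
public:
    explicit BufferedArchive(SketchFile& file) : file_(file) {}

    // Loosely mirrors the implicit close performed on scope exit.
    ~BufferedArchive() { Flush(); }

    void Write(std::uint8_t b) { buffer_.push_back(b); }

    // Move everything buffered so far into the underlying file.
    void Flush()
    {
        file_.bytes.insert(file_.bytes.end(),
                           buffer_.begin(), buffer_.end());
        buffer_.clear();
    }

private:
    SketchFile&               file_;
    std::vector<std::uint8_t> buffer_;
};
```

In this model, bytes written through the archive do not reach the
file until Flush runs, so a direct write to the file made before
the flush would land ahead of archive data that logically precedes
it.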


=============================================================================
Loading objects from the store (CArchive::ReadObject)
=====================================================

Loading (extracting) objects uses the protected
CArchive::ReadObject function, and is the converse of WriteObject.
As with WriteObject, ReadObject is not called directly by user code;
user code should call the type-safe extraction operator (enabled by
DECLARE_SERIAL/IMPLEMENT_SERIAL), which then calls ReadObject.
This extraction operator will ensure the type integrity of the extract
operation.

Since the WriteObject implementation discussed above assigned
increasing PIDs, starting with 1 (0 is predefined as the NULL object),
the ReadObject implementation can use an array to maintain
the state of the archive context.  When a PID is read from
the store, if the PID is greater than the current upper
bound of the m_pLoadArray, then ReadObject knows that a
"new" object (or class description) follows.

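The bookkeeping described above can be sketched with a growable
array.  LoadTable is a hypothetical stand-in for m_pLoadArray, and
its member names are chosen for illustration only; slot 0 is
reserved for the NULL object, matching the PID numbering used when
writing.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for m_pLoadArray: slot N holds whatever was
// loaded with PID N (an object or a class description).
class LoadTable {
public:
    LoadTable() : slots_(1, nullptr) {}   // slot 0 is the NULL object

    // Highest PID loaded so far.
    std::uint16_t UpperBound() const
    {
        return static_cast<std::uint16_t>(slots_.size() - 1);
    }

    // A PID beyond the upper bound means a new item follows.
    bool IsNew(std::uint16_t pid) const { return pid > UpperBound(); }

    // Record a newly loaded item and return the PID it received.
    std::uint16_t Add(const void* item)
    {
        slots_.push_back(item);
        return UpperBound();
    }

    // Resolve a PID that refers to something already loaded.
    const void* Lookup(std::uint16_t pid) const
    {
        assert(!IsNew(pid));
        return slots_[pid];
    }

private:
    std::vector<const void*> slots_;
};
```

Because PIDs were assigned in increasing order on the way out, an
array index is enough on the way in and no map is needed.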

=============================================================================
Schema numbers
==============

The schema number, which is assigned to the class when the class'
IMPLEMENT_SERIAL is encountered, is the "version" of the
class implementation.  The schema refers to the implementation
of the class, not to the number of times a given object has been
made persistent.  Properly, the latter is usually referred to as the
object version.  If you intend to maintain several different
implementations of the same class over time, incrementing the schema
as you revise your object's Serialize member function implementation
will enable you to write code that can load objects stored using older
iterations of the implementation.

The CArchive::ReadObject member function will throw a CArchiveException
when it encounters a schema number in the persistent store that differs
from the schema number of the class description in memory.  If your
implementation of Serialize for a class with multiple schemas catches this
exception, you will be able to continue the extraction operation taking
into account the differences in the implementation of the Serialize
member function.
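
A minimal sketch of this catch-and-continue pattern follows.  It
does not use the MFC types: SchemaMismatch stands in for
CArchiveException, CheckSchema for the check made while reading,
and Point3 with its two stored layouts is an invented example
class.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the exception thrown on a mismatch.
struct SchemaMismatch : std::runtime_error {
    std::uint16_t stored;
    explicit SchemaMismatch(std::uint16_t s)
        : std::runtime_error("schema mismatch"), stored(s) {}
};

// Schema of the class implementation currently in memory (invented).
const std::uint16_t kCurrentSchema = 2;

// Stand-in for the check performed while reading a class description.
void CheckSchema(std::uint16_t stored)
{
    if (stored != kCurrentSchema) throw SchemaMismatch(stored);
}

struct Point3 { int x, y, z; };

// Invented history: schema 1 stored x and y; schema 2 adds z.
Point3 LoadPoint(std::uint16_t storedSchema, const int* fields)
{
    try {
        CheckSchema(storedSchema);
        return Point3{fields[0], fields[1], fields[2]};
    } catch (const SchemaMismatch& e) {
        if (e.stored == 1) {
            // Older layout: no z was stored, so supply a default.
            return Point3{fields[0], fields[1], 0};
        }
        throw;   // unknown schema: let the caller deal with it
    }
}
```

Schemas the loader does not recognize are rethrown, so unknown
data is never read with the wrong layout.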


=============================================================================
CRuntimeClass
=============

The persistence mechanism uses the CRuntimeClass data
structure to uniquely identify classes.  MFC associates one
structure of this type with each dynamic and/or serializable class in
the application.  These structures are initialized at application
startup time using a special static object of type CClassInit.  You
need not concern yourself with the implementation of this information,
as it is likely to change between revisions of MFC.

The current implementation of CRuntimeClass does not support
multiple inheritance (MI).  This does not mean you cannot use MI
in your MFC application, but it does imply that you will have
certain responsibilities when working with objects that have more than
one base class.  The CObject::IsKindOf member function
will not correctly determine the type of an object if it
has multiple base classes.  Therefore, you cannot use CObject
as a virtual base class, and all calls to CObject member functions
such as Serialize and operator new will need to have scope qualifiers
so that C++ can disambiguate the function call.  If you do find
the need to use MI within MFC, then you should be sure to make the
class containing the CObject base class the leftmost class in the
list of base classes.
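
The need for scope qualifiers can be seen in plain C++ without any
MFC types.  In the sketch below, Base stands in for CObject, and
Document, Observer and ObservedDocument are invented classes; the
common base appears twice in ObservedDocument because it is not a
virtual base.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for CObject.
struct Base {
    virtual ~Base() {}
    virtual std::string Describe() const { return "Base"; }
};

struct Document : Base {
    std::string Describe() const override { return "Document"; }
};

struct Observer : Base {
    std::string Describe() const override { return "Observer"; }
};

// Two copies of Base are present: one via Document, one via Observer.
struct ObservedDocument : Document, Observer {};

std::string DescribeAsDocument(const ObservedDocument& od)
{
    // od.Describe() would not compile here: the call is ambiguous.
    return od.Document::Describe();
}

const Base* AsBaseViaDocument(const ObservedDocument& od)
{
    // A direct conversion to const Base* is ambiguous as well, so
    // the path through Document is named explicitly.
    return static_cast<const Document*>(&od);
}
```

Each call names the path it takes through the hierarchy, which is
the kind of disambiguation the paragraph above asks for.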

For advice on the uses and abuses of MI, a good reference is
"Advanced C++ Programming Styles and Idioms" by James O. Coplien
(Addison-Wesley, 1992).
