WinHelp File Format -- Additional Internal Files
------------------------------------------------

The September 1993 and October 1993 issues of Dr. Dobb's Journal
contain (in the "Undocumented Corner") a detailed description by Pete
Davis of the WinHelp file format, used in the Windows .HLP and .MVB
files.  Unfortunately, for space reasons only a limited number
(though hopefully the most important) of the internal files that make
up a .HLP file were discussed.

All internal files are shown in WHSTRUCT.H and in HELPDUMP.C.  Here
are detailed descriptions of the internal files not discussed in
the article.  This probably won't make much sense if you haven't
read the two-part DDJ article.


|FONT
-----

    The |FONT file has three parts: a header, a list of available
fonts, and a list of font descriptors.

    The FONTHEADER record (see WHSTRUCT.H) follows the file header of
the |FONT file. It consists of four words. The first word is the
number of fonts available to the help file. The second word is the
number of font descriptors actually used in the help file. The third
is the default font descriptor, and the last is the offset to the
descriptor list.
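
    As a rough sketch of the four words just described, the header
could be decoded as shown below. The structure and field names are
illustrative only (they are not copied from WHSTRUCT.H), and the code
assumes 16-bit little-endian words.

```c
#include <stdint.h>

/* Hypothetical mirror of the four-word FONTHEADER. The field names are
   made up for this sketch and are not taken from WHSTRUCT.H. */
typedef struct {
    uint16_t NumFonts;        /* number of fonts available */
    uint16_t NumDescriptors;  /* number of font descriptors in use */
    uint16_t DefDescriptor;   /* default font descriptor */
    uint16_t DescOffset;      /* offset to the descriptor list */
} FontHeaderSketch;

/* Read one 16-bit word, assuming little-endian byte order. */
uint16_t ReadWord(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the 8 bytes that follow the internal file header. */
FontHeaderSketch ParseFontHeader(const unsigned char *p) {
    FontHeaderSketch h;
    h.NumFonts       = ReadWord(p);
    h.NumDescriptors = ReadWord(p + 2);
    h.DefDescriptor  = ReadWord(p + 4);
    h.DescOffset     = ReadWord(p + 6);
    return h;
}
```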

    Immediately following the FONTHEADER is a list of font names. These
are fixed-length records of 20 characters each. Each font name is null
terminated, so font names can be up to 19 characters long.
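
    A minimal sketch of pulling one name out of that list is shown
below. GetFontName and FONT_NAME_LEN are names invented for this
example; the only facts used are the 20-byte record length and the
null terminator.

```c
#include <string.h>

#define FONT_NAME_LEN 20  /* fixed record length of one font name */

/* Copy the index-th font name into dst, which must hold at least
   FONT_NAME_LEN bytes. names points at the first 20-byte record. */
void GetFontName(const unsigned char *names, unsigned index, char *dst) {
    memcpy(dst, names + (size_t)index * FONT_NAME_LEN, FONT_NAME_LEN);
    dst[FONT_NAME_LEN - 1] = '\0';  /* guard against a missing terminator */
}
```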

    Immediately following the fonts is the descriptor list. The font
descriptors are individual instances of fonts that are actually used. For
example, if you use 10 pt Helvetica, then a descriptor is created for that.
If later you use 12 pt Helvetica, or Bold 10 pt Helvetica, different 
descriptors are created. Different descriptors are created for the following:

1) Using a different font
2) Using a different point size
3) Using a different attribute (Bold, Underline, Italics, etc)
4) Using a different color

    The first byte of the font descriptor holds the attributes, such
as Underline and Bold.

    The second byte is the size of the font in half points. Therefore
an 8 pt font has a halfpoint size of 0x10. The third byte is the
family of the font.

    The fourth byte identifies the font name. It is an index into
the font list preceding the font descriptor list.

    The last 6 bytes are the colors for foreground and background.
Actually, the background color is just a guess. Changing these values
has no effect on the font as WinHelp displays it. I'm guessing it was
a planned enhancement.
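
    Putting the descriptor bytes together, a sketch of a decoder
might look like this. The structure and function names are
hypothetical, the attribute bits are left undecoded because their
values are not given here, and each color is kept as three raw bytes
since the byte order within a color is not stated.

```c
/* Hypothetical view of the 10-byte font descriptor described above.
   Field names are illustrative, not taken from WHSTRUCT.H. */
typedef struct {
    unsigned char Attributes;     /* Bold, Underline, etc. (bits undecoded) */
    unsigned char HalfPoints;     /* font size in half points */
    unsigned char Family;         /* font family */
    unsigned char NameIndex;      /* index into the font name list */
    unsigned char Foreground[3];  /* raw color bytes */
    unsigned char Background[3];  /* raw color bytes (a guess, per the text) */
} FontDescSketch;

/* Fill a FontDescSketch from 10 raw descriptor bytes. */
FontDescSketch ParseFontDesc(const unsigned char *p) {
    FontDescSketch d;
    int i;
    d.Attributes = p[0];
    d.HalfPoints = p[1];
    d.Family     = p[2];
    d.NameIndex  = p[3];
    for (i = 0; i < 3; i++) {
        d.Foreground[i] = p[4 + i];
        d.Background[i] = p[7 + i];
    }
    return d;
}

/* Convert the stored half-point size to points (0x10 gives 8.0). */
double PointSize(const FontDescSketch *d) {
    return d->HalfPoints / 2.0;
}
```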


|CONTEXT Hash values
--------------------

    The |CONTEXT file contains hash values for all the keywords and
context strings. This makes it easy to search on keywords and context
strings. Simply calculate the hash value of the string and search
the |CONTEXT file for a matching hash value. 

    Since the hash values can't be reversed, I have included a simple
program called MAKEHASH.C. This will simply take a string from the
command-line and convert it to a hash value. The hash algorithm uses
a conversion table to remove case-sensitivity and reduce the number
of characters involved in the hash. 

/********************************************
  MakeHash.C
  Pete Davis 
  Calculates and outputs the hash value
of a string. These hash values are used in
the |CONTEXT file of a WinHelp .HLP file.
*********************************************/

#include <stdio.h>

   char  MapTable[256];

/* Function prototypes */
void BuildMap(void);
long Hash (char *);

/********************************************
  Builds character set map for hash function.
*********************************************/
void BuildMap() {
   char c;
   int  counter;

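   /* Map A-Z and a-z to 17-42 (0x11-0x2A), so case is ignored. */
   for (counter = 'A', c = 17; counter <= 'Z'; counter++, c++)
      MapTable[counter] = MapTable[counter + 32] = c;
   /* Map the digits 1-9 to 1-9 and 0 to 10. */
   for (counter = '1', c = 1; counter <= '9'; counter++, c++)
      MapTable[counter] = c;
   MapTable['0'] = 0x0A;
   /* Period and underscore get their own codes. Every other
      character stays 0, because MapTable is a zeroed global. */
   MapTable['.'] = 0x0C;
   MapTable['_'] = 0x0D;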
}

/********************************************
   Hash function by Ron Burk
*********************************************/
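long Hash (char *p) {
   /* Written for compilers where long is 32 bits; elsewhere the
      result may need truncating to 32 bits for long strings. */
   long h = 0;
   while(*p) {
      /* Index as unsigned char so characters above 127 stay in range. */
      char c = MapTable[(unsigned char)*p++];
      h = h * 0x2B + c;
   }
   return h;
}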

int main(int argc, char *argv[]) {
  long HashVal;
  if (argc < 2) {
    fprintf(stderr, "Usage: %s string\n", argv[0]);
    return 1;
  }
  BuildMap();
  HashVal = Hash(argv[1]);
  printf(" Hash value = %ld\n", HashVal);
  return 0;
}
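
    As a worked example of the hash above, using the table built by
BuildMap, the string "AB" (or "ab", since both cases map to the same
codes) hashes as follows:

        start:       h = 0
        'A' -> 17:   h = 0  * 0x2B + 17 = 17
        'B' -> 18:   h = 17 * 0x2B + 18 = 749

    so running the program with the argument AB should print a hash
value of 749.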

|KWMAP, |KWBTREE, and |KWDATA
-----------------------------

    These three files are used together to get the keywords and their
offsets to topics. These are the default keyword files. The default
letter associated with Keywords in WinHelp is 'K'. Using the MULTIKEY
option in the .HPJ file, though, you can have multiple keyword files
based on different letters. If, for example, you use the
MULTIKEY=V option, you will have |VWMAP, |VWBTREE, and |VWDATA files
associated with the 'V' keywords.

    We're going to stick with the 'K' keywords in our discussion,
since these are the only ones handled by the HELPDUMP program. The
other keyword files are handled in exactly the same way as the 'K'
keywords, so everything here applies to them as well.

    The |KWMAP file is the simplest. It starts with a single long
that gives the number of KWMAPREC records. This is followed by a list
of KWMAPREC records (see WHSTRUCT.H), one per leaf page of the
|KWBTREE file. The FirstRec field is the number of the first keyword
to appear on the given leaf page, counting from 0. The PageNum field
is the number of that leaf page. For example, if you have 3 leaf
pages in the |KWBTREE file, then there will be 3 KWMAPREC records in
|KWMAP. If there are, say, 60 keywords on page 0, 45 keywords on
page 1, and 52 keywords on page 2, then the three records in |KWMAP
would look like the ones in Figure 1.

------------- Figure 1 ----------------

         |  Rec 1  |  Rec 2  |  Rec 3
---------+---------+---------+--------
FirstRec |     0   |    60   |   105
PageNum  |     0   |     1   |     2
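
    The map in Figure 1 can be used to find which leaf page holds a
given keyword number. The sketch below is one way to do that. The
structure is a hypothetical stand-in for KWMAPREC (the field types are
an assumption), and FindKeywordPage is a name invented for this
example.

```c
/* Hypothetical stand-in for KWMAPREC. The field names follow the text;
   the types are an assumption. */
typedef struct {
    long FirstRec;  /* number of the first keyword on the leaf page */
    int  PageNum;   /* leaf page number in |KWBTREE */
} KwMapRecSketch;

/* Return the leaf page holding keyword number kw, or -1 if the map is
   empty or kw comes before the first record. Assumes the records are
   in ascending FirstRec order, as in Figure 1. */
int FindKeywordPage(const KwMapRecSketch *map, long count, long kw) {
    int page = -1;
    long i;
    for (i = 0; i < count && map[i].FirstRec <= kw; i++)
        page = map[i].PageNum;
    return page;
}
```

    With the three records from Figure 1, keyword 59 is found on
page 0, keyword 60 on page 1, and keyword 105 on page 2.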

    The |KWBTREE file has a list of all the keywords in the help
file. Each keyword has a count and an offset associated with it. The
count is the number of occurrences of the keyword in the help file.
The offset is relative to the beginning of the |KWDATA file.

    If you have a keyword with a count of 3 and an offset of 16
(decimal), then you would seek to offset 16 in the |KWDATA file and
read the next 3 longs. Each of these longs is an offset to a
location in the |TOPIC file where the keyword occurs.
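
    The count-and-offset lookup just described can be sketched as
follows, assuming the |KWDATA contents are already in memory and that
each long is a 32-bit little-endian value. ReadLong and
ReadKeywordTopics are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 32-bit value, assuming little-endian byte order. */
uint32_t ReadLong(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy count topic offsets, starting at byte offset of the |KWDATA
   image, into out. Returns the number copied, or 0 if the requested
   range does not fit inside the buffer. */
size_t ReadKeywordTopics(const unsigned char *kwdata, size_t size,
                         uint32_t offset, uint16_t count, uint32_t *out) {
    size_t i;
    if (offset > size || (size - offset) / 4 < count)
        return 0;
    for (i = 0; i < count; i++)
        out[i] = ReadLong(kwdata + offset + 4 * i);
    return count;
}
```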

    WinHelp uses this information as follows. When you select the
"Search" button, you are given a list of keywords. If you
double-click on a keyword, all the topics with occurrences of that
keyword are listed.  The topic titles are actually pulled from the
|TTLBTREE file, which has the topics and offsets. You would simply
match the offsets from |KWDATA with the offsets in |TTLBTREE to get
the topic titles.

|CONTEXT
--------

    The |CONTEXT file is a b-tree like the |TTLBTREE and |KWBTREE
files. It uses 2K pages and the same structure for the header, index
nodes, and leaf nodes.

    The |CONTEXT file's data is simply a list of CONTEXTRECs which
consist of a hash value and a topic offset (see WHSTRUCT.H).
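
    Once the CONTEXTRECs from a leaf page are in memory, finding the
topic for a hash value is a simple search. The sketch below uses a
plain linear scan and makes no assumption about record order. The
structure is a hypothetical stand-in for CONTEXTREC, with both fields
assumed to be 32 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONTEXTREC: a hash value and a topic
   offset, both assumed to be 32 bits wide. */
typedef struct {
    int32_t HashValue;
    int32_t TopicOffset;
} ContextRecSketch;

/* Return the record whose hash matches, or NULL if there is none. */
const ContextRecSketch *FindContext(const ContextRecSketch *recs,
                                    size_t count, int32_t hash) {
    size_t i;
    for (i = 0; i < count; i++)
        if (recs[i].HashValue == hash)
            return &recs[i];
    return NULL;
}
```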

    The hash values are for context strings used in the help file.
The context strings are basically the hot links in the text. I
question the need for this table since each context string, in the
text, has the hash value. It would have been easier to just have the
Topic offset associated with the hot-links instead of the hash value.
You don't even have to actually calculate the hash value because it
is provided with the hot-link itself.

|CTXOMAP
--------

    In the .HPJ file you can set up context-sensitive points in your
help file by adding a section titled [MAP]. Under the [MAP] section
you list Topic Titles and assign unique IDs to each one. A sample
[MAP] section could look like this:

[MAP]
TableOfContents   0x0010
Introduction      0x0020
Chapter1          0x0030
Chapter2          0x0040
...
Chapter10         0x0120
Glossary          0x0130

    You can then use these numbers in the WinHelp API function to
jump to a specific topic. This information is listed in the |CTXOMAP
file. The first WORD of the file is the number of entries in the
Context Map table.

    This is followed by the individual CTXOMAPREC records (See
WHSTRUCT.H).  The CTXOMAP records simply have the unique ID provided
in the [MAP] section followed by the offset to the topic specified.
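
    A sketch of looking up a [MAP] ID in the raw |CTXOMAP data is
given below. It assumes the count is a 16-bit little-endian word and
that each record is a 32-bit ID followed by a 32-bit topic offset.
The field widths are an assumption, so check them against CTXOMAPREC
in WHSTRUCT.H. GetLE32 and LookupMapId are names invented for this
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one 32-bit value, assuming little-endian byte order. */
uint32_t GetLE32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Search the |CTXOMAP image for id. On success store the topic offset
   in *topic and return 1; return 0 if the ID is absent or the data is
   too short for the record count it claims. */
int LookupMapId(const unsigned char *ctxomap, size_t size,
                uint32_t id, uint32_t *topic) {
    size_t i, n;
    if (size < 2)
        return 0;
    n = (size_t)(ctxomap[0] | (ctxomap[1] << 8));  /* entry count */
    if ((size - 2) / 8 < n)
        return 0;
    for (i = 0; i < n; i++) {
        const unsigned char *rec = ctxomap + 2 + 8 * i;
        if (GetLE32(rec) == id) {
            *topic = GetLE32(rec + 4);
            return 1;
        }
    }
    return 0;
}
```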

-- Pete Davis, September 1993
   CIS 71644,3570
   WPJ BBS 703-503-3021

