OS/2 Upload Information Template for ftp-os2.nmsu.edu

Archive Name: UNH204.ZIP
Program Description: a command line utility to strip HTML codes
Operating System Versions: OS/2 2.x and later
Program Source: Don Hawkinson, author
Replaces: UNH202.ZIP UNH175.ZIP UNH150.ZIP 
          NOTE: UNHTMLxx.zip is a different utility

Your name: Don Hawkinson
Your email address: dwhawk@southwind.net 

Proposed directory for placement:  ./os2/textutil

This is an OS/2 command line utility to strip HTML codes from
files saved from the WebX or other web browsers. 

UNH 2.04  HTML stripper by Don Hawkinson  dwhawk@southwind.net

usage:  ..\unh  file1 file2 <file3>

	file1 == html file
	file2 == stripped text output file
	file3 == URLs from html source file - optional
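
As an illustration of what the optional file3 output could contain, here
is a minimal Python sketch that collects URLs from an HTML source. It is
not UNH's code. The names UrlCollector and extract_urls are hypothetical,
and the choice of the href and src attributes is an assumption, since
this text does not say which attributes UNH reads.

```python
from html.parser import HTMLParser

# Assumption: URLs are taken from href and src attributes only.
URL_ATTRS = {"href", "src"}


class UrlCollector(HTMLParser):
    """Collects attribute values that look like link targets."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        for name, value in attrs:
            if name in URL_ATTRS and value:
                self.urls.append(value)


def extract_urls(source):
    """Return the URLs found in source, in document order."""
    collector = UrlCollector()
    collector.feed(source)
    collector.close()
    return collector.urls
```

Writing one URL per line to a third file would then mirror the file3
argument shown above.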


UNH does not check for the existence of the output
file, and will overwrite any existing file.  UNH
is HPFS aware.

UNH does not attempt to recreate the format of the Web page. UNH does
not attempt to force any format on the output text, nor does it attempt
to remove any existing text format. While the layout of tables and lists
is lost during stripping, each data item is placed on a separate line
for legibility.
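
The behaviour described above can be sketched in Python as follows. This
is only an approximation under stated assumptions, not UNH's actual
logic: the set BREAK_TAGS and the names Stripper and strip_html are
hypothetical, and entity tags are passed through untouched here.

```python
from html.parser import HTMLParser

# Assumption: these tags start a new output line so that list and
# table data end up on separate lines.
BREAK_TAGS = {"br", "p", "li", "tr", "td", "th", "dt", "dd"}


class Stripper(HTMLParser):
    """Drops tags, keeps text, and breaks lines at BREAK_TAGS."""

    def __init__(self):
        # convert_charrefs=False keeps &xxxx; and &#nnn; tags separate
        # from ordinary text so they can be copied through unchanged.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        # Existing whitespace and line breaks are kept as they are.
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&" + name + ";")

    def handle_charref(self, name):
        self.parts.append("&#" + name + ";")


def strip_html(source):
    """Return source with the HTML tags removed."""
    stripper = Stripper()
    stripper.feed(source)
    stripper.close()
    return "".join(stripper.parts)
```

For example, strip_html("<ul><li>one<li>two</ul>") puts "one" and "two"
on separate lines, while the list layout itself is lost.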

The HTML specification defines Character Entity Sets or tags to
represent particular graphic characters which have special meanings
in places in the markup, or may not be part of the character set
available to the writer. UNH does not attempt to scan for all of the
possible tags, but does try to resolve the most common tags.

This version of UNH supports codepages 437 and 850; if codepage 850
is in use, the 850 character set is used. The codepages only make a
difference when &xxxx; or &#nnn; tags are present in the file. If
neither the correct character nor an acceptable alternate is
available, a space is used. If an unrecognized tag is encountered,
it is left in the output text.
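
The rules above can be illustrated with a short Python sketch. It is not
taken from UNH. The NAMED table is a small hypothetical subset of the
entity names, resolve_entities is a hypothetical name, and the sketch
skips the "acceptable alternate" step because this text does not list
the alternates UNH uses.

```python
import re

# Hypothetical subset of named tags; UNH's real table is not listed here.
NAMED = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"',
    "copy": "\u00a9", "eacute": "\u00e9",
}

ENTITY = re.compile(r"&(#\d+|[A-Za-z][A-Za-z0-9]*);")


def resolve_entities(text, codepage="cp437"):
    """Replace &xxxx; and &#nnn; tags with characters from codepage."""

    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            code = int(body[1:])
            if code > 0x10FFFF:
                return match.group(0)   # not a valid character number
            char = chr(code)
        elif body in NAMED:
            char = NAMED[body]
        else:
            return match.group(0)       # unrecognized tag: left as is
        try:
            char.encode(codepage)       # is it in the active codepage?
        except UnicodeEncodeError:
            return " "                  # not available: use a space
        return char

    return ENTITY.sub(replace, text)
```

With "cp437" the copyright sign is not available and becomes a space;
with "cp850" it is kept. A tag such as &foo; is left unchanged in both
cases.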

This version should be usable under OS/2 2.1, but it has not been
tested.  The special compression option for OS/2 Warp was not used
when linking the executable.

This program is free, but the author retains all rights. See the file
license.txt for further information.

The command line utility UNH.EXE uses the same logic as the shareware
PMStripper to strip the HTML codes from files. PMStripper is a PM
utility that loads the stripped file into an MLE window to allow
simple editing functions.  PMStripper is distributed as PMS_xxx.ZIP
with the version number replacing the xxx.  For information on the
current PMStripper version, send email to dwhawk@southwind.net .
                                                          
