======================================================================
HTML2TXT (c) 1995-96       by Danny Attema                    Ver 2.40
======================================================================
                       aattema@dove.mtx.net.au


What is HTML2TXT
----------------

  It's what you got it for.  It takes an HTML file and tries to turn it
  into a text file.  It usually does it very well.  HTML2TXT is written
  so that the text file is easy to import into a word processor:
  each paragraph is one line.  This might be a problem for programs
  that cannot do a word wrap.  If all else fails, Windows Notepad can
  do a word wrap.
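
  If a program cannot wrap the long lines itself, the text file can be
  re-wrapped afterwards.  The following is a minimal Python sketch and
  is not part of HTML2TXT; the function name wrap_paragraphs and the
  70-column width are assumptions chosen for illustration.

```python
import textwrap


def wrap_paragraphs(text, width=70):
    """Wrap each one-line paragraph to at most `width` columns.

    Blank lines are kept so that paragraph breaks survive.
    """
    wrapped = []
    for line in text.split("\n"):
        if line.strip():
            wrapped.append(textwrap.fill(line, width=width))
        else:
            wrapped.append("")
    return "\n".join(wrapped)
```

  Note that this wraps every long line, so long lines that came from
  preformatted (<PRE>) text would be wrapped as well.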

  ** HTML2TXT is FreeWare and may be freely distributed if in an
     unmodified form.
  


How to run HTML2TXT
-------------------

  On the command line type:

      HTML2TXT [name-of.htm]

  and press <enter>.  When HTML2TXT has finished, the .txt file will
  be in the same directory as the original HTML file.  If you do not
  type in a file name then a small help message will appear.  Only one
  HTML file can be converted at a time.  If you want to convert using
  wildcards you could write a batch file like this:

    @For %%F in (%1) do HTML2TXT.exe %%F

  Then just call up the batch file like:  H2T *.htm  where H2T is the
  name of the batch file.  If you don't know what this all means don't
  worry about it.  I don't know why it works either, but it does.
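
  For readers who wonder what the batch file does: FOR expands the
  wildcard passed as %1 and runs HTML2TXT once for each matching file.
  The same loop can be sketched in Python.  This is only an
  illustration; the function names, and the assumption that
  HTML2TXT.exe can be found on the PATH, are made up for this sketch.

```python
import glob
import subprocess


def build_commands(filenames, program="HTML2TXT.exe"):
    """Return one command (as an argument list) per HTML file."""
    return [[program, name] for name in filenames]


def convert_all(pattern):
    """Run the converter once for every file matching `pattern`."""
    for command in build_commands(sorted(glob.glob(pattern))):
        # check=False: carry on even if one file fails to convert.
        subprocess.run(command, check=False)
```

  Calling convert_all("*.htm") would then do the same job as
  H2T *.htm above.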



The Config File
---------------

  The Config file looks something like this:

  --------------------------------------------------------------------
  ; HTML2TXT Config file.
  ; If this file is not found in the current directory, then the
  ; defaults are assumed.
  ; --------------------------------------------------------------

  ; When a <B> or </B> is found it is replaced with the character
  ; corresponding with this ascii value.  This also works for
  ; <STRONG> and </STRONG>.  [Default = 0]
  ;  e.g.  42 = * , 60 = < , 62 = >
  <B>, 42
  </B>, 42

  ; When a <I> or </I> is found it is replaced with the character
  ; corresponding with this ascii value.  This also works for <EM>
  ; and </EM>.  [Default = 0]
  ;  e.g.  95 = _ , 40 = ( , 41 = )
  <I>, 95
  </I>, 95

  ; Use Alt. Picture Text when found.  [Default = Y]
  ;  Y for yes, N for no.
  Alt. Picture Text, Y
  --------------------------------------------------------------------

  Everything starting with a ; is a comment and is ignored. Everything
  else is self-explanatory.  If the config file is not found in the
  current directory then the defaults are assumed.  Usually the
  defaults are what you want anyway.
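
  To make the format concrete, here is a small Python sketch of how
  such a file could be read.  It is not HTML2TXT's own code; the
  function name parse_config is hypothetical, and treating the Y/N
  value as case-insensitive is an assumption.  The defaults are the
  ones listed in the config file comments.

```python
DEFAULTS = {
    "<B>": 0,
    "</B>": 0,
    "<I>": 0,
    "</I>": 0,
    "Alt. Picture Text": "Y",
}


def parse_config(text):
    """Parse config text into a dict, starting from the documented defaults."""
    settings = dict(DEFAULTS)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        # Split on the last comma: the name comes first, the value last.
        name, sep, value = line.rpartition(",")
        name, value = name.strip(), value.strip()
        if not sep or name not in DEFAULTS:
            continue  # not a setting this sketch knows about
        if name == "Alt. Picture Text":
            settings[name] = value.upper()  # assumption: case is ignored
        elif value.isdigit():
            settings[name] = int(value)  # ASCII value of the marker
    return settings
```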



What Tags does it recognise?
----------------------------

  HTML2TXT knows what <P>, <BR>, <B>, </B>, <STRONG>, </STRONG>, <I>,
  </I>, <EM>, </EM>, <DT>, <DD>, <LI>, <DL>, </DL>, <UL>, </UL>, <OL>,
  </OL>, <MENU>, </MENU>, <DIR>, </DIR>, <PRE>, </PRE>, <TITLE>,
  </TITLE>, <HR>, and parts of <IMG> mean and it uses these to make
  the text file look right.
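
  As an illustration of how the bold and italic tags might be turned
  into marker characters using the config values, here is a Python
  sketch.  It is not the program's actual code; the name mark_emphasis
  is hypothetical, and treating the value 0 as "insert nothing" is an
  assumption.

```python
import re

# Tags that share a config entry, as the config file comments describe.
ALIASES = {"STRONG": "B", "EM": "I"}


def mark_emphasis(html, settings):
    """Replace <B>, <I>, <STRONG>, <EM> and their closers with markers.

    `settings` maps "<B>", "</B>", "<I>", "</I>" to ASCII values.
    A value of 0 (or a missing entry) inserts nothing (an assumption).
    """
    def replace(match):
        slash, name = match.group(1), match.group(2).upper()
        key = "<" + slash + ALIASES.get(name, name) + ">"
        code = settings.get(key, 0)
        return chr(code) if code else ""

    return re.sub(r"<(/?)(B|STRONG|I|EM)>", replace, html,
                  flags=re.IGNORECASE)
```

  With <B> and </B> set to 42, "<B>bold</B>" would come out as
  "*bold*".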



What do all those funny characters, that come up on the screen, mean?
---------------------------------------------------------------------

  +  means that there is something on the line that is being
     processed.  Writing, HTML code(s) or something.  It is really
     just a progress indicator.

  -  means that a blank line is being processed (it is really just
     skipped).

  *  means that some preformatted text (<PRE> & </PRE>) has been found.
     When this is found HTML2TXT does not alter the spacing or make
     long lines for paragraphs.

  &  means that an escape character (eg. &lt;) was converted to its
     real character.  If an 'X' appears next to a '&' then the escape
     character was unknown and ignored.  When HTML2TXT has finished,
     it will tell you this has happened.

  |  means that a horizontal rule (<HR>) was found.  It is replaced by
     a dashed line.

    means a picture's "alt" text has been used.  Sometimes handy,
     sometimes not.  The alternate text shows up in the text file in
     between { and }.

  ?  means that a "<" (less than symbol) was found that did not have
     an HTML code following it.  This is usually because it appears
     in a preformatted text area.  When HTML2TXT has finished, it will
     tell you this has happened.

  c  means that a "©" [#169] character was found and turned into a
     copyright symbol.

  R  means that a "®" [#174] character was found and turned into a
     registered symbol.

  T  means that a tab code [#9] was found and removed.  Tab codes used
     to make a mess of the txt files.

  N  means that a nesting error was found. This means that there were
     more </DL>s than <DL>s.  Very rare.  When HTML2TXT has finished,
     it will tell you this has happened.

  L  means a long line (>255 characters) was found.  This now makes no
     difference to HTML2TXT. It used to mess up the .txt file.
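
  The '&' indicator can be illustrated with a short Python sketch of
  escape character handling.  This is not HTML2TXT's code: the name
  decode_escapes and the small table of escapes are assumptions, and
  "ignored" is taken here to mean that the unknown escape is dropped
  from the output.

```python
import re

# A small illustrative table; HTML2TXT's real list is not given here.
KNOWN = {"lt": "<", "gt": ">", "amp": "&", "quot": '"'}


def decode_escapes(line):
    """Return (decoded_line, indicators) for one line of HTML.

    Each known escape is replaced and reported as "&"; each unknown
    one is dropped and reported as "&X".
    """
    indicators = []

    def replace(match):
        name = match.group(1)
        if name in KNOWN:
            indicators.append("&")
            return KNOWN[name]
        indicators.append("&X")
        return ""  # assumption: an unknown escape is removed

    decoded = re.sub(r"&([A-Za-z]+);", replace, line)
    return decoded, "".join(indicators)
```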



Known Problems (Don't judge this program by what it can't do)
-------------------------------------------------------------

  Tables.  I have yet to work out a way of turning the tables in HTML
     files to something that looks like a table in a text file.

  Centre codes.  HTML2TXT does not centre the text for you. (Cannot be
     bothered because it would take too much code.) It is not really a
     problem because if you put this into a word processor then you
     don't really want centring.

  Forms.  Some forms will look very odd with hidden text appearing.
     Who would want to convert a form anyway?

  Some codes.  If HTML2TXT does not know what a code is it just
     deletes it.  This is usually good enough.
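
  The "just deletes it" behaviour for unknown codes can be sketched in
  Python as below.  This is an illustration only, not the program's
  own code; the name strip_unknown_tags is hypothetical, and the set
  of known tags is copied from the list of recognised tags earlier in
  this file.

```python
import re

# Tag names from the list of recognised tags in this file.
KNOWN_TAGS = {
    "P", "BR", "B", "STRONG", "I", "EM", "DT", "DD", "LI", "DL", "UL",
    "OL", "MENU", "DIR", "PRE", "TITLE", "HR", "IMG",
}


def strip_unknown_tags(html):
    """Delete every tag whose name is not in KNOWN_TAGS.

    Known tags are left in place for later processing.
    """
    def replace(match):
        name = match.group(1).upper()
        return match.group(0) if name in KNOWN_TAGS else ""

    return re.sub(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>", replace, html)
```

  For example, a <CENTER> code is simply removed, which is why centred
  text comes out as ordinary text.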


  If you find any other problems with HTML2TXT make sure to write to
  me at "aattema@dove.mtx.net.au" and send a copy of the HTML file
  that causes the problem.



Other HTML to Text converters
-----------------------------

  QSTRIP11.zip - This program just gets rid of the codes (between the
     < and >) and does not do anything else.  Not very good at all.
     Use HTML2TXT instead.

  HTMLBU11.zip - I cannot give you any details about this program
     because I deleted it when it did not do what I wanted it to do.
     But it is not as good as HTML2TXT.

  HTMLCO20.zip - This has a big nag screen which comes up and makes you
     wait 20 seconds.  It also does some strange things to the text
     file as default, but it can be changed.  It recognises lots of
     the special character codes (e.g. &lt; and &#034;).  It does not
     use the <DL> and </DL> codes (and others like them) to do
     indenting, like HTML2TXT does.  HTML2TXT is better.

  UNHTML10.zip - This program just gets rid of the codes (between the
     < and >) and does not do anything else.  Not very good at all.
     Use HTML2TXT instead.

  HTMST512.zip - This program gets rid of the HTML codes and
     understands some of them (like centre, some tables).  It
     recognises lots of the special character codes (e.g. &lt; and
     &#034;).  It does not use the <DL> and </DL> codes (and others
     like them) to do indenting, like HTML2TXT does.  Keep it in mind
     if you find a file that HTML2TXT does not like.



Changes
-------

  Version
         2.40  Netscape ignores <P> and other codes sometimes, so that
               you don't get huge spaces between paragraphs.  I have
               tried to copy this.
         2.36  <STRONG> and </STRONG> are changed like <B> and </B>
               and <EM> and </EM> are changed like <I> and </I>
               through the config file.  Added bit in HTML2TXT.TXT
               (this file) about the config file.
         2.35  Added config file, so you can change some options.
         2.34  More escape characters are now recognised.
         2.32  Added more error messages.  (So you know what's going
               on)
         2.31  Stops you doing dangerous things like writing over the
               original file.
         2.30  Now reads in long lines (>255 characters) correctly. (I
               hope!  It seems to, but that doesn't mean anything.)
  1.00 - 2.26  Lots.  Way too many to mention.



Disclaimer
----------

  (I put this disclaimer at the bottom of the file because who
   actually reads this anyway?)

  I take no responsibility for anything this program does to you, your
  data or your view on life.  Use it at your own risk.


                           <<  The End  >>
