README for Backprop 0.9
-----------------------

Thanks for downloading Backprop. I hope that you will find it to 
be a useful program.

Updates
-------

The latest update of backprop will always be available at the following URL:

http://www.users.cts.com/crash/s/slogan/backprop.html

Distribution Rights
-------------------

Backprop is Copyright 2003 Syd Logan. You have permission to distribute at will, 
for any legal purpose, but you must include this README, unmodified, along with
the software.

Bug Reports and Suggestions
---------------------------

Please send bug reports to me, Syd Logan, at slogan@cts.com. Your input will
help to make Backprop a better, more stable program for you and others.

When reporting a bug, include the following information:

-- Version of Backprop. The version is included in the dialog displayed by selecting
About Backprop... in the Help menu.

-- Operating system (e.g., Windows 95, 98, NT, XP)

-- Steps to reproduce the problem. Attach a copy of the data files you are using (training
exemplars, weight files, test data) so that I can duplicate the problem.

Thanks. If I can't understand the problem, I probably can't help, so the more information,
the better.

How to Use Backprop
------------------- 

If you are new to neural networks, or to backpropagation in particular, you should spend 
some time reading about it before using backprop. There are numerous books, journals, and
web sites that contain information about backpropagation neural networks, and their use.
The following should be enough to get you started, however.

A neural network is a program that can be trained to perform a task, usually pattern 
recognition, classification, or function approximation. For example, you might train
a neural network to classify an input as belonging to a certain class, or to recognize
a series of pen strokes read on an input device as a letter of the alphabet. In order to 
train the neural network to perform its intended task, you must do the following:

-- Come up with a neural network architecture. A neural network consists of a set of 
layers, each containing a number of nodes. The number of layers in backpropagation
nets is usually 3 or larger. The first layer is called the input layer, and it has one
node for each input. The last layer is called the output layer, and it has one node for
each output. The remaining layers are called hidden layers, and the number of nodes in
these layers is harder to specify. More on that later.

As an example, consider a neural network that is designed to classify patterns based on
the following input data:

Has Fins     Has Gills    Is a Fish
-----------------------------------
Yes          No           No
No           No           No
Yes          Yes          Yes

The first row of the table represents the fact that an animal that has fins, but not gills,
is not a fish. In converting this data for use with a neural network, we can simply replace
Yes with the value 1 and No with the value 0 (or perhaps -1), and come up with the following:

Has Fins     Has Gills    Is a Fish
-----------------------------------
1            0            0
0            0            0
1            1            1

A neural network with two input nodes, one corresponding to Has Fins and one corresponding
to Has Gills, and one output node corresponding to Is a Fish can be used to solve this
problem. Setting input layer node one to 1 and node two to 0 will, in a properly trained
network, result in the output node firing 0. The output node should also, in a properly
trained network, fire 0 if nodes one and two in the input layer are presented the value 0.
The number of hidden layers, and the number of nodes in each of the hidden layers,
is more difficult to specify. Many claim that coming up with the hidden layer architecture
is more of an "art" than a "science". I won't argue that. One of the nice things about
backprop is you can easily add or remove hidden layers, or change the number of nodes in
the hidden layers, and see the effects on training.
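For illustration only (Backprop itself has no scripting interface; this code is not part of the program), the Yes/No conversion described above can be sketched in a few lines of Python. The rows and the encode helper here are hypothetical:

```python
# The "Has Fins, Has Gills, Is a Fish" table from above, as Yes/No strings.
rows = [
    ("Yes", "No",  "No"),
    ("No",  "No",  "No"),
    ("Yes", "Yes", "Yes"),
]

def encode(value):
    # Map the table's Yes/No entries to 1/0, as described above.
    return 1 if value == "Yes" else 0

exemplars = [tuple(encode(v) for v in row) for row in rows]
print(exemplars)  # [(1, 0, 0), (0, 0, 0), (1, 1, 1)]
```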

-- Once you have an architecture for the neural network in hand, you need to train the 
neural network. This is done by presenting the neural network with examples that it can
use to learn the problem you want it to solve. These examples, also known as exemplars,
are repeatedly shown to the network until the network learns them, or some maximum number
of tries has been performed. It can, and often does, take tens of thousands of presentations
of a set of exemplars before a network becomes trained. How long it takes is a function of
the network architecture, the initial state of the network, nuances of the training 
algorithm, and the set of exemplars. Changing one or more of these is all it sometimes takes
for a network that won't train to turn into a network that will. One of the design goals of
backprop is to give you the tools you need to visualize how changes in the exemplar set, 
training algorithm, or architecture affect the ability of the neural network to train
successfully.
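To make the repeat-until-learned loop above concrete, here is a minimal gradient-descent sketch in Python. This is NOT Backprop's actual algorithm: a single weight is trained so that output = w * x learns y = x, whereas real backpropagation adjusts weights across every layer in the same spirit. The exemplars, learning rate, and error threshold here are made up for illustration:

```python
import random

random.seed(0)
w = random.random()                      # initial state: a random weight
exemplars = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
learning_rate = 0.1
for iteration in range(10000):           # maximum number of tries
    total_error = 0.0
    for x, target in exemplars:          # present each exemplar to the "network"
        output = w * x
        error = target - output
        total_error += error * error
        w += learning_rate * error * x   # adjust the weight toward the target
    if total_error < 1e-9:               # the network has learned the exemplars
        break
print(round(w, 4))  # -> 1.0
```

Changing the learning rate, the initial weight, or the exemplar set changes how many iterations this takes, which is the same effect the paragraph above describes for full networks.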

Once the network is trained, you can then use it to solve problems. This is done by presenting
data to the input layer nodes, and observing the values that result in the output layer.

Launching Backprop
------------------

To launch backprop, simply double click on the backprop icon. 

Loading a Network Architecture File
-----------------------------------

The first thing you must do after launching backprop is to load an architecture file that 
describes the architecture of the network. The architecture file is a file that you create
in a text editor (like notepad). The first line of the file specifies the number of layers
in the network, including the input and output layers. The remaining lines in the file 
contain a number which specifies the number of nodes in each layer. The following is an
example of a file that specifies a network with 4 layers. Layer one (the input layer) contains 
2 nodes, layer two contains 3 nodes, layer three contains 7 nodes, and layer four (the output 
layer) contains 1 node:

4
2
3
7
1

By convention, architecture files are stored on disk in files with a ".net" suffix, for example,
"mynet.net" is a backprop architecture file.

To load an architecture file, select Open... from the File menu. All the files in the current
directory with a suffix of ".net" will be displayed. Click on the file and hit OK. Backprop
will load the architecture file and display a graphical representation of the network. Note that
lines connect each node in the input layer to the nodes in the first hidden layer, each node in
the first hidden layer to the nodes in the second hidden layer, and so forth. 

Training the Network
--------------------

The next step is to train the neural network to solve a problem. This is done by selecting 
"Train..." from the Network menu. A dialog will display, asking you to specify
an exemplar file. Type in the path of the exemplar file, or click the "Browse" button to 
navigate the file system in search of one. By convention, exemplar files are given the same name 
as the architecture file, but have a ".exm" suffix. For example, "mynet.exm" would be the exemplar 
file for the network defined in the architecture file named "mynet.net".

The exemplar file contains a count of the number of exemplars stored in the file, followed
by the exemplars themselves. An exemplar consists of two lines, one containing a value for
each of the input nodes, and the other containing a value for each of the output layer nodes.
An exemplar file corresponding to the "Has Fins, Has Gills, Is a Fish" problem described 
above might look like this:

3

1            0            
0

0            0            
0

1            1            
1
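
The exemplar format can likewise be parsed with a short Python sketch (again, not Backprop's code). The string below stands in for the contents of the example exemplar file above:

```python
# Sketch of parsing a ".exm" exemplar file: a count, then for each exemplar
# one line of input values and one line of output values.
exm_text = """3

1 0
0

0 0
0

1 1
1
"""
lines = [line for line in exm_text.splitlines() if line.strip()]
count = int(lines[0])
exemplars = []
for i in range(count):
    inputs = [float(v) for v in lines[1 + 2 * i].split()]
    outputs = [float(v) for v in lines[2 + 2 * i].split()]
    exemplars.append((inputs, outputs))
print(exemplars[2])  # ([1.0, 1.0], [1.0])
```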

You also must specify a weight file, which by convention is the name of the architecture
file with a ".wgt" suffix, for example, "mynet.wgt". A weights file represents a learned
neural network. As I mentioned above, each node in the input layer is connected to each
node in the first hidden layer, and so forth. Each of these connections has an associated
weight. Training a neural network amounts to changing these weights in such a 
way that the network correctly learns the problem it is being trained for. If the neural
network successfully trains, the weights will be saved to this file when training completes.

Once you have specified the exemplar and weight files, click OK to start training. A dialog
will display that shows the number of iterations executed, and a graph of the cumulative
error. A properly training network will, over time, exhibit a decrease in the cumulative
error, but at times the error may increase as the network stabilizes. 

The architecture window will display the firing values of each node in the network during 
training, using a 255-level grayscale colormap. The range of values mapped to this grayscale 
colormap is [0.0, 1.0] by default, with 0.0 displaying as black, 1.0 displaying as white, and 
0.5 displaying as a middle gray. Values above the range display as green, and values below the
range display as red. You can change the range by selecting Options... from the View menu, and
changing the values in the Neuron Output Range text fields. For example, to set the lower value
to -1, type -1 in the "Low" text field, and click OK. You can change this or any other data in 
the View->Options... dialog during a training session, and it will take effect immediately upon
clicking OK. You can also display a color for each weight in the network by selecting the
"View Node Outputs and Weights" radio button. The colormap corresponding to this display will
be shown at the top of the architecture window. You can widen or narrow the range of the 
colormap at any time during a training session by modifying the Low and High text edit fields
in the Weight Output Range portion of the View->Options... dialog.

The Edit->Training Settings... dialog can be used to change parameters of the training algorithm
used by backprop. It is outside the scope of this document to give detailed descriptions of each
parameter, but here are some hints and observations:

-- Use "Maximum training iterations" to control how many training iterations are executed before
backprop gives up. The default value is probably too high for most cases; I recommend a 
lower value, and perhaps changing other parameters or the network architecture, before
giving the network a long time to converge on a solution.

-- Per-exemplar threshold defines how large the output error must be before the network adjusts
the weights. For example, if the threshold is 0.5 and the error is 0.6, then the error is greater
than the threshold and the weights in the network will be adjusted in an attempt to improve the
accuracy of the network. If the error were 0.4, then the network would not be adjusted for this
exemplar. If all of the errors for the exemplars are below the threshold, the network has learned
the exemplars and training successfully halts.
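
The threshold rule above can be shown with the README's own example values (threshold 0.5, errors 0.6 and 0.4) in a short Python illustration:

```python
# Per-exemplar threshold rule: weights are adjusted only when an exemplar's
# error exceeds the threshold; training halts when no exemplar exceeds it.
threshold = 0.5
errors = [0.6, 0.4]
adjusted = [e > threshold for e in errors]     # only the first triggers an update
trained = all(e <= threshold for e in errors)  # not all errors are below yet
print(adjusted, trained)  # [True, False] False
```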

-- Momentum and learning rate are parameters that affect the backpropagation training
algorithm. Momentum causes the network to consider earlier behavior of the network in computing
new weight values. Learning rate affects how rapidly weights are adjusted, and may or may not
affect the ability of the network to train successfully. Usually, you will want to set the 
learning rate high and the momentum low, but this is only a starting point. By turning on and off
these options, and changing the values, you can experiment with what works best for your network
architecture and training data.

-- Bias adds a trainable input to each hidden and output node in the network. The value of this
input is always 1, and the weight on this input is always adjusted during training. In some cases,
a bias is needed in order for the network to converge, but this is not always the case. Again,
refer to the literature for more guidance on the uses of bias, and experiment with backprop to 
see what effect it has.
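
As a sketch of what a bias input means (the node function and all values here are made up for illustration, not taken from Backprop's internals): a node sums its weighted inputs plus a bias weight times the constant input 1, then applies its activation function.

```python
import math

def node_output(inputs, weights, bias_weight):
    # The bias input is always 1; its weight is trained like any other weight.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias_weight * 1.0
    return 1.0 / (1.0 + math.exp(-total))    # sigmoid activation

print(round(node_output([1.0, 0.0], [0.5, 0.5], -0.25), 3))  # 0.562
```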

-- Backprop also allows you to select from two activation functions. The first, and the default,
is sigmoid. This is by far the most popular activation function, and results in an output that is
in the range of 0.0 to 1.0. If your exemplars include outputs in the range -1.0 to 1.0, then 
hyperbolic tangent may be a better choice, since it fires in the range of -1 to 1. 
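For reference, here is what the two activation functions compute (standard definitions, written in Python for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # fires in the range (0, 1)

def hyperbolic_tangent(x):
    return math.tanh(x)                 # fires in the range (-1, 1)

print(sigmoid(0.0), hyperbolic_tangent(0.0))  # 0.5 0.0
```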

Finally, backprop training starts by initializing the weights to random values. By default, this
range is 0.0 to 1.0, but you can change the range to, say, -1.0 to 1.0 by using the Initial
Weight Range settings in the Edit->Training Settings... dialog.

Executing the Network
---------------------

Once you have a trained network, you can use it to solve problems. Select Execute... from the
Network menu. If you just finished training the network, the weight file will be prefilled for
you. If you wish to use another weight file, type in its path or use the Browse... button to 
find it. You also need to specify an input file that contains the data you want the network to
process. These files are named with a ".run" suffix, for example, "mynet1.run". The file is 
very simple, just a single line of text with input values. For example,

0 1

will cause the network to set the value of the first input layer node to 0, and the second
input layer node to 1. Clicking OK in the Execute... dialog will cause the network to process
the specified file, and the graphical architecture window will display the results. Note that
the input nodes will display the values read from the input file, and the output nodes will
display an answer that should be correct for the data that was processed. If it is not, you
might consider adding the data to the exemplar file and retraining the network. Assuming it
then trains, the network should have no problem processing the input.
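
The ".run" file format is simple enough to parse in one line; this Python sketch (not Backprop's code) reads the "0 1" example above, with the string standing in for the contents of "mynet1.run":

```python
# A ".run" file is a single line of input values, one per input layer node.
run_text = "0 1\n"                    # what mynet1.run would contain
inputs = [float(v) for v in run_text.split()]
print(inputs)  # [0.0, 1.0]
```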

Books About Backpropagation
---------------------------

These are a few books I've found useful in understanding backpropagation, and neural nets in
general.

Author                     Title                                    Publisher     

James A. Anderson          An Introduction to Neural Networks       MIT Press
Reed, Marks                Neural Smithing                          MIT Press
Rumelhart, McClelland      Parallel Distributed Processing, Vol 1   MIT Press

Known Problems
--------------

None as of 0.9. Please e-mail requests and bug reports to me at slogan@cts.com.

Planned Enhancements
--------------------

-- Add support for controlling the update frequency of the graphical representation of the
network, and the training error strip chart. 

-- Add support for XML-based architecture files, and possibly XML-based weight, exemplar, and
data files.

-- A toolbar. I'm looking for a talented graphics artist who can do the artwork, if you know
of someone who can volunteer their time, please send me e-mail at slogan@cts.com.

Modification History
--------------------

6/10/2003 Initial version 0.9 released.

