------------------------------------
CPT Dictionary 1.1.4

Win32 version, MS JVM (jview) or Sun's
Java JDK/JRE 1.1, 1.2 or 1.3 required.

Shareware, free for non-commercial use.
Freely distributable.

Updated: 15-August-2001
------------------------------------

DESCRIPTION
-----------
CPT Dictionary is a browser for dictionary files (CTrees),
created by the program CPT Word Lists 1.0/1.1.

Features:
- browsing/searching in any standard encoding, including
  Unicode decomposition and bidi support;
- creating a display list of all words or clues (definitions);
- support for inverted indexes;
- filtering words and clues by any tag;
- options for keyboard input, searching,
  and the information to be shown;
- searching for variants of words, such as anagrams
  and palindromes;
- localization by the user.

The distribution contains two sample dictionaries,
just to test the installation:
- "The Unofficial Smiley Dictionary" is a CTree with
clues (smile.dic).
- "2000K" is an artificial word list of two million words,
stored in a 9 KB file (2000K.wlz).
You can check the web page
http://www.geocities.com/cptshareware
for more word lists and dictionaries.
(We plan to put some there.)

For now, this file is the only documentation. For
details and the terms used, see the description
of CPT Word Lists.

SYSTEM REQUIREMENTS
-------------------
- Supported OS: Windows 95/98/ME/NT/2K
  (for Linux there is a separate distribution);
- Requires 511 KB of disk space and 32 MB RAM;
- This is a Java program, and Sun's JDK/JRE 1.1.3 or
  greater (or a compatible JVM) is needed. Sun's Java
  Runtime Environment is available for download at:
    http://java.sun.com/products/
  (version 1.1.8 is 5 MB, 1.2.2 is 12 MB, 1.3.1 is 7.8 MB)
  For Java 1.1, please see our JavaFonts.txt for tuning
  your JDK/JRE installation.
NOTE: Java 1.2 on Win 95/98 has problems with non-ANSI
characters. If you want to display international
characters, use 1.1 or 1.3.

  If you prefer MS JVM (JView is included in every Windows
  version before XP), and your version does not support JNI
  (it gives "...UnsatisfiedLinkError"), get a newer one from:
    http://www.microsoft.com/java/vm/
  (the recent versions are about 6 MB)
NOTE: MS JVM has a faster GUI but supports a limited number
of character encodings, so the CPT programs might show an
error message like "...Converter not found!".

INSTALL
-------
1. Extract this zip file into a temporary directory.

2. Edit "install.bat" to reflect your Java VM
   (there are guiding comments in the file).
   Run it from the temporary directory.
   This starts the wizard, which installs
   CPT Dictionary according to your choices.

3. The installation program (install.class) is a
   self-extracting class file. Its contents are
   extracted during the installation, and two
   directories are created:
   - the target one chosen for the installation;
   - the <user-home>\ITJ directory for the uninstall
     program (see UNINSTALL below).
   Note that CPT Dictionary is a 'single user application',
   and the user running the program should have
   full access to the installation directory.

4. If the installation fails, you can still extract
   the program from the 'data.zip' file left in the
   temporary directory.

5. If you have a problem running CPT Dictionary,
   check/modify the generated cpt_dc11.bat file to
   reflect your JDK/JRE environment, especially
   if you installed a new version of JRE after
   installing this program.

UNINSTALL
---------
To uninstall, do one of the following:

1. Click on the uninstall icon added to the
   start menu or desktop folder.

2. Go to the Add/Remove Programs dialog in the
   Windows Control Panel and remove the program.

3. Run from the command line:
     <user-home>\ITJ\juninst <CPT-home>\UnInst
   where <CPT-home> is the installation directory,
   and <user-home> for Win 2K is:
     c:\Documents and Settings\<user-name>
   for NT it is:
     c:\winnt\Profiles\<user-name>
   and for ME or single user Win 95 it is:
     c:\windows

If you have installed a new version of JRE after
the installation of this program, check/modify
juninst.bat in your <user-home>\ITJ subdirectory.
After the uninstallation, the ITJ directory
will not be removed because it serves all CPT
packages. If you don't have any other CPT
program, you can delete it.


DOCUMENTATION
-------------

Changes in version 1.1:
- bug fixes;
- filtering words and clues for all tags;
- creating own inverted index files;
- searching variants of words;
- drawing in color in the text area;
- support for IPA8 encoded clues;
- additional sorting of the display lists.

CONTENTS
A. Introduction
B. Select Dictionary
C. Display Options
D. Search Text Field
E. Search Options
F. Select Font
G. Quit
H. Localization


A. Introduction

The program can do extremely fast and incredibly slow
searches, depending on the settings. The rules of thumb are:
- if you don't see the search results within a second, some
'heavy-weight' option has been set (read below);
- when using regular expressions, do not put excessive '*' or '?'
at the beginning of the search pattern; the search
is optimized if the pattern starts with a real letter;
- do not set 'Unicode Normalization' if you don't know
what it means in the specific case (usually, it switches off
many of the optimizations);
- when the main search list is clues, choose 'Browse Style';
- open the dictionaries in 'Low' memory/speed mode; the other
modes are for users who know what they are doing
(see the documentation of CPT Word Lists);
- when searching 'Variants', avoid all additional flags
like 'Ignore Case', 'Ignore Spaces', etc.

The rules above matter for big dictionaries,
having thousands or millions of words. To be clear,
'extremely fast' means 'less than a second', e.g. searching
for a word in a CTree of 5 million words; 'incredibly slow'
means 'more than 10 minutes', e.g. searching for a clue
pattern in a packed dictionary of 150000 words with 150000
clues in 'Search Clues Style' (a mode supported just for
completeness).

After the 'special notes' above, here is
a short description of the program.

B. Select Dictionary

After starting the program, click on the leftmost
button to open a dictionary and/or to add a new one to the list.
For now, you can search in only one open dictionary file at a time.

The radio button group 'Open selected on start up'
lets you choose one dictionary to be opened on start up,
so that you can forget about this dialog. 'None' clears
any selection made, without browsing the whole list.

The radio button group 'RAM used and search speed'
is almost obsolete. In most cases you should select
'Low' (the packed CTrees now have reasonable speed,
and the inverted indexes will force 'Low').
If you select 'High' or 'Medium' for a big CTree with clues,
you will gain speed for multiple searches in
'Search Style', but opening the dictionary
will be slower.

When you click on the "OK" button, the selected dictionary will
be opened and available for searching and browsing.

C. Display Options

The second button on the bar opens a dialog
with the following options:

C.1. Format Tab

- 'Right Alignment' should be set for right-to-left scripts.
- 'Shaping' should be set if you need Arabic shaping or if
  the dictionary is in Thai composed form.
- 'Wrap Tags/Clues Lines': if not set, the program shows each
  clue/tag on one line, even if it is hundreds of characters long.
Notes: There is no special support for the Thai language.
If the clues are in an RTL script stored in visual order (the
usual case for Hebrew) and you select 'Right Alignment',
the program will use the logical order for the wrapping
(with all consequences of the double conversion);
otherwise, the text will be wrapped as an LTR script.

- 'Search Style' means no display list is created, and all
matches from the search are shown.
- 'Browse Style' means a display list is created and only the
first match is selected. When you click on a word, you will see
the tags and clues linked to this word, or the variants
of the word if in 'Variants' mode.
- 'Sort' forces the 'natural' sorting of the display list.
It is applied if the display list contains upper and
lower case letters, and if the dictionary has been created
using 'Strict Alphabet' and 'Locale Sorting'.

- 'Words' switches the main search/browse list
to the normal mode (words -> clues);
- 'Clues' switches the main search list
to the clues, if available. Note that searching in clues
using 'Search Style' means scanning all CTrees in the
dictionary sequentially, which is the slowest mode we can
imagine. For normal speed, use 'Browse Style'.
- 'Browse with Inverted Index' creates/uses a supporting
inverted index when searching in clues. In this mode, when
you select a clue (by click or search), you will see all
words that have links to this clue.
If the flag is not set, you will not see the words, but you
will save considerable resources (time and memory).

If the dictionary file does not contain an inverted index,
the program will build its own and save it in a new file
(appending ".ii" to the name).
The main idea behind the inverted indexes is to use the
dictionary in both directions, e.g. if it is 'de_en'
(German to English), you can browse it as 'en_de'.
This makes sense if the dictionary has been created in
'word list' style mode. If the clues are big paragraphs,
the usefulness of inverted indexes is questionable.
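
As an illustration only, the idea of an inverted index can be
sketched in a few lines of Python. The names used here
(build_inverted_index, de_en, en_de) are hypothetical, and this
is not the CTree/".ii" file format or the program's own code;
it only shows how a word -> clues mapping can be turned into
a clue -> words mapping.

```python
from collections import defaultdict

def build_inverted_index(dictionary):
    """Turn {word: [clues]} into {clue: [words]} (hypothetical helper)."""
    inverted = defaultdict(list)
    for word, clues in dictionary.items():
        for clue in clues:
            inverted[clue].append(word)
    # Sort the words so that browsing the result is predictable.
    return {clue: sorted(words) for clue, words in inverted.items()}

# A tiny made-up 'de_en' word list, browsed as 'en_de'.
de_en = {"Haus": ["house", "home"], "Heim": ["home"]}
en_de = build_inverted_index(de_en)
print(en_de["home"])
```

With short clues each clue points back to a handful of words;
with paragraph-sized clues almost every clue is unique, which
is why the index helps little in that case.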
 
C.2. Tags Text
Use this tab to select the text of the tags to be shown.
Some tag display texts are of little interest, and you can
switch them off.

C.3. Filters
This tab shows all tags included in the dictionary and lets
you select any of them as filters for the words and clues.
The tags are shown with their codes and display text
(again, for the details, see the CPT Word Lists documentation).
Since there are 4 groups of tags ('Morpho', 'User', 'Topic',
and 'Clues'), the filtering is done in two steps:
the first within each group of tags and the second in total
(for all groups).
In a group: if a tag is not selected, the word/clue having
this tag will be marked as 'bad'.
In total: if a word/clue has at least one 'bad' mark, it will be
excluded from the search/browse list.

Put simply, you can do global queries via the tag selection.
For example, open the 'smile.dic' dictionary. In the dialog
'Display Options', select 'Browse Style' and 'Words' in the
'Format' tab, and only the tag 'miniatures' from 'Morphology
Tags' in the 'Filters' tab, and you will get only the words
of one or two characters in the display list.
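
The two-step rule described above can be sketched as follows.
This is one possible reading of the rule, not the program's
code: the function names and the data layout (a dict of tag
sets per group) are assumptions, and an entry without any tags
is kept under this reading, which the program may treat
differently.

```python
GROUPS = ("Morpho", "User", "Topic", "Clues")

def bad_marks(entry_tags, selected):
    """Step 1: per group, mark the entry 'bad' if it has an unselected tag."""
    marks = {}
    for group in GROUPS:
        tags = entry_tags.get(group, set())
        chosen = selected.get(group, set())
        marks[group] = any(tag not in chosen for tag in tags)
    return marks

def is_listed(entry_tags, selected):
    """Step 2: exclude the entry if it has at least one 'bad' mark."""
    return not any(bad_marks(entry_tags, selected).values())

# Only 'miniatures' is selected, as in the smile.dic example.
selected = {"Morpho": {"miniatures"}}
print(is_listed({"Morpho": {"miniatures"}}, selected))
print(is_listed({"Morpho": {"some-other-tag"}}, selected))
```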

D. Search Text Field

This is the field in the center of the top window bar.
Here you can enter a word to search for. Simple regular
expressions, bidi, and Unicode notation are supported.
Note that the search is for words, not strings.
For example, in 'Search Clues' mode, to find an entry
containing "word", you have to enter the regular
expression "*word*", while in 'Search Words' mode the
pattern "word" is enough to find the word "word".
To switch off the regular expressions, use the 'Input' tab
in the 'Search Options' dialog.
For the complete description of the regular expressions,
see the documentation of CPT Word Lists.
Here we mention only the most used special symbols:
*   matches zero or more characters;
?   matches exactly one character.
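
To make the whole-entry matching concrete, here is a minimal
Python sketch of the two symbols above. It is an illustration
under our own assumptions (the helper names are hypothetical,
and the full pattern syntax of CPT Word Lists is richer); it
only shows why "word" does not match an entry that merely
contains it, while "*word*" does.

```python
import re

def wildcard_to_regex(pattern):
    """Translate '*' and '?' into a regular expression, escaping the rest."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def matches(pattern, entry):
    """True if the pattern matches the whole entry, not just a part of it."""
    return wildcard_to_regex(pattern).fullmatch(entry) is not None

print(matches("word", "a word here"))
print(matches("*word*", "a word here"))
```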

You can also see the whole dictionary in the text area
by selecting 'Search Style' and 'Words' in the 'Display Options'
dialog and then entering the search pattern "*".
This is no problem for small dictionaries, but it all
depends on the available memory: the program has to
unpack everything and format the lines in the text area.
If the dictionary is protected in some way, this kind
of search will be ignored, or you will not be able to
copy from the text area via the clipboard.

The communication with the clipboard is always in Unicode
and in logical order when the dictionary is stored
in logical order (RTL scripts).

You can use the 'Search' button instead of the <Return> key
to start the search. This button means 'Find Next'
when working in 'Browse Style': the search
starts from the list item following the last selected one.
The <Return> key always starts from the beginning.
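
The difference between the two ways of starting a search can
be sketched like this. The function and the sample list are
hypothetical and only mirror the behaviour described above:
<Return> scans from the top of the display list, 'Find Next'
from the item after the last selected one.

```python
def find_match(display_list, is_match, start=0):
    """Return the index of the first match at or after 'start', else None."""
    for index in range(start, len(display_list)):
        if is_match(display_list[index]):
            return index
    return None

words = ["read", "real", "red", "rode"]

def starts_with_re(word):
    return word.startswith("re")

first = find_match(words, starts_with_re)               # like <Return>
following = find_match(words, starts_with_re, first + 1)  # like 'Find Next'
print(words[first], words[following])
```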

E. Search Options

The first button on the right of the text field will
start a dialog for the keyboard input and search options.

E.1. Input Tab
- 'Allow \uxxxx notation' will transparently convert
the \uxxxx encoded characters to Unicode.
- 'Regular expressions' will switch on this processing.
- 'Keyboard converter' is an option for Linux only. If set,
the selected encoding from the 'Select Font' dialog will
be used to convert the typed 8-bit characters to Unicode.
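
As a rough illustration of the \uxxxx notation, the Python
sketch below replaces each backslash-u followed by four
hexadecimal digits with the corresponding character. The
function name is hypothetical, and details such as the
handling of malformed escapes in the program itself are not
documented here.

```python
import re

# A backslash, the letter 'u', and exactly four hexadecimal digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def expand_unicode_escapes(text):
    """Replace every \\uxxxx sequence with the character it encodes."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

typed = r"caf\u00e9"   # what the user types in the search field
print(expand_unicode_escapes(typed))
```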

E.2. Search Tab
- 'Ignore case' will switch on the caseless searching.
- 'Special casing' will switch on the special Unicode
casing when changing the letter case.
- 'Stop on first match' is valid for 'Search Style' mode.
- 'Unicode Normalization' means to apply the selected
normalization to the source text and to the search pattern.
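
To show why caseless searching may need special casing rules,
here is a small Python comparison. It uses Python's own
lower() and casefold() methods as stand-ins and assumes that
'Special casing' refers to Unicode mappings that change the
length of a word (such as the German sharp s); how the program
implements these options is not described in this file.

```python
def equal_simple(a, b):
    """Compare after plain lowercasing."""
    return a.lower() == b.lower()

def equal_full(a, b):
    """Compare after full Unicode case folding (sharp s folds to 'ss')."""
    return a.casefold() == b.casefold()

print(equal_simple("Word", "WORD"))
print(equal_simple("Straße", "STRASSE"))
print(equal_full("Straße", "STRASSE"))
```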

E.3. Unicode Tab
Use the radio buttons to select the desired
Unicode normalization.
The standard forms are described in
"Unicode Technical Report #15"
(http://www.unicode.org/unicode/reports/tr15/).
The processing for the other normalizations
is described in the documentation of CPT Word Lists.
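
The effect of applying a standard normalization form to both
the source text and the search pattern can be sketched with
Python's unicodedata module. This covers only the standard
forms of Technical Report #15; the function name is
hypothetical, and the other normalizations offered by the
program are not modelled here.

```python
import unicodedata

def normalized_equal(source, pattern, form="NFD"):
    """Compare two strings after applying the same normalization form."""
    return (unicodedata.normalize(form, source)
            == unicodedata.normalize(form, pattern))

composed = "caf\u00e9"      # 'e' with acute as one code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)
print(normalized_equal(composed, decomposed))
```

Because every entry has to be normalized before it can be
compared, such a search cannot rely on a plain comparison of
the stored text, which fits the earlier warning that this
option switches off many optimizations.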

E.4. Variants Tab
To switch to the combinatorial ('Variants') mode,
the check box 'Search Variants of Words' should be set.

The radio button 'Palindromes' means to search for
words which read the same forward and backward
and are symmetrical (e.g. "malayalam").

The radio button 'Reversions' means to search for
words which, when read backward, are the same as
the search pattern and need not be symmetrical
(e.g. "doom" and "mood").
The palindromes are a subset of this class, and if the
search pattern is itself a palindrome, it will appear
in the result list as well.

The radio button 'Anagrams' means to search for all
words which have the same characters as the search
pattern (e.g. "acre", "care", "race").

The radio button 'Similar' means to search for all
words which have 'almost' the same characters as
the search pattern. The minimum percentage for the match
can be given in the text field on the right (as an integer
between 1 and 99). This is the slowest search in
the Variants group.
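
This file does not say how the percentage is computed. Purely
as an assumption, the sketch below measures similarity as the
share of characters two words have in common (counting
repeats), relative to the longer word. The function names are
hypothetical and the program's real measure may differ.

```python
from collections import Counter

def similarity_percent(word, pattern):
    """Assumed measure: shared characters as a percentage of the longer word."""
    if not word or not pattern:
        return 0.0
    shared = sum((Counter(word) & Counter(pattern)).values())
    return 100.0 * shared / max(len(word), len(pattern))

def is_similar(word, pattern, min_percent):
    """True if the assumed similarity reaches the given minimum percentage."""
    return similarity_percent(word, pattern) >= min_percent

print(similarity_percent("care", "cart"))
print(is_similar("care", "cart", 70))
```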

The check box 'Ignore Spaces' will slow down the process,
but makes it possible to find palindromes like "pull up"
or anagrams like "backset" and "set back".
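
The definitions of palindromes, reversions, and anagrams given
above, together with the effect of 'Ignore Spaces', can be
written down as simple Python checks. These helpers are
hypothetical and test one pair of strings at a time; they say
nothing about how the program searches a whole CTree.

```python
def prepare(text, ignore_spaces=False):
    """Optionally drop spaces before comparing."""
    return text.replace(" ", "") if ignore_spaces else text

def is_palindrome(word, ignore_spaces=False):
    s = prepare(word, ignore_spaces)
    return s == s[::-1]

def is_reversion(word, pattern, ignore_spaces=False):
    """True if 'word' is 'pattern' read backward."""
    return (prepare(word, ignore_spaces)
            == prepare(pattern, ignore_spaces)[::-1])

def is_anagram(word, pattern, ignore_spaces=False):
    """True if both strings consist of exactly the same characters."""
    return (sorted(prepare(word, ignore_spaces))
            == sorted(prepare(pattern, ignore_spaces)))

print(is_palindrome("malayalam"))
print(is_palindrome("pull up"), is_palindrome("pull up", ignore_spaces=True))
print(is_reversion("doom", "mood"))
print(is_anagram("backset", "set back", ignore_spaces=True))
```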

Notes:
Searching for variants is supported in 'Browse Style' mode
(the words are searched in the current display list).
You don't need to enter a search pattern. Just click on
'Find Next', and each following word from the list will be
tried as a search pattern until a variant is found.
If you have entered a regular expression, the next match
from the list will be the search pattern.
For example, to find palindromes starting with "re",
enter the pattern "re*". To find all variants in the
dictionary, click repeatedly on the 'Find Next' button.
Searching for anagrams is not supported
if 'Unicode Normalization' is set or
in 'Browse Clues' mode (you will see an error message).

F. Select Font

The 'a' button will start the dialog for selecting
the display font characteristics. For Sun's Java 1.1 the
font list is limited to several fonts. For any other
JVM the list will contain most of the installed
fonts on your OS. Some of the problems with Java 1.1
fonts could be solved if you set the 'Encoding'
to 'Unicode'.
You can type or paste in the text field any
sample text to see how it will be shown using the
selected font.

If the dictionary contains IPA8 encoded clues,
you should select a font supporting the
Unicode IPA block (like "Arial Unicode MS" or
"Lucida Sans Unicode").


G. Quit

Finally, to stop the program, click on the rightmost
button. The current settings for the dictionaries in
the list will be saved.

H. Localization

If you want the program to talk to you in your language,
you have to do the following:

H.1. In cpt_dc11.pr, replace the line
ProgramLocale=<locale>
where <locale> is an ISO-639 language code,
optionally followed by "_" plus an ISO-3166 country code.
For example, 'el' or 'el_GR' is for Greek, 'en' for English,
'ru' for Russian, etc. This is the easy part.

H.2. Put a file named '<locale>.msg' in the 'locale'
directory, containing the messages in
your language. Use the 'default.msg' file as
a template to translate the text.
There is another 'Readme.txt' file with instructions
in the 'locale' directory.
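
If you want to double-check a locale string before editing
the files, the naming rule from H.1 and H.2 can be expressed
as a small Python helper. It is only a convenience sketch: the
function name is hypothetical, the pattern checks the shape of
the code (not whether the language or country exists), and it
is not part of the CPT distribution.

```python
import re

# Lowercase language code, optionally "_" plus an uppercase country code.
LOCALE_RE = re.compile(r"[a-z]{2,3}(?:_[A-Z]{2})?")

def message_file_name(locale):
    """Return the expected '<locale>.msg' name for a well-formed locale."""
    if LOCALE_RE.fullmatch(locale) is None:
        raise ValueError("not a <language>[_<COUNTRY>] code: " + locale)
    return locale + ".msg"

print(message_file_name("el_GR"))
print(message_file_name("ru"))
```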

H.3. Ensure that in Java's 'font.properties' file,
the 'dialog.plain.' and 'dialog.bold.' fonts are
assigned to your locale font. This step is not
possible for MS JVM (and is actually not needed if the
proper language is set in your Windows),
but for any other JVM (including Java 2) you have
to do it. E.g., if your locale is 'be' or 'bg',
do the following in the jre\lib directory:
  ren font.properties font.properties.ANSI
  copy font.properties.ru font.properties
or, if you have font.properties.Cp1251, use it instead.

CONTACT
-------
We are very interested in receiving your comments,
suggestions, and bug reports at our email:
cpt.software@usa.net  
