

               WORDSURV:

               A Program for Analyzing

               Language Survey Word Lists



               by John S. Wimbish
                                
                                


                                


                                
                                


                                
Occasional Publications in Academic Computing

Number 13


                                
                                
                                
                                
                                
                                
                                
                                
                                
                                
                                
                 Summer Institute of Linguistics
                          Dallas, Texas

This book is sold with the software it describes.  That software,
too, is the copyrighted property of the Summer Institute of
Linguistics.  However, in the interest of sharing the fruit of
our research with the larger academic community, the registered
owner of the WORDSURV software is granted the right to share
copies of the distribution diskette with friends and associates,
provided this is not done for commercial gain.  Such recipients
of the software, if they decide to use it in their research,
should in turn become registered owners by buying this book with
its latest version of the software.















                                  
                                
                                
      Copyright 1989 by the Summer Institute of Linguistics
                       All rights reserved



                             CONTENTS

1. Introduction

  1.1  An overview of WORDSURV
  1.2  Program limits and hardware requirements
  1.3  An overview of the book
  1.4  Installing the program
  1.5  Acknowledgements

2. A walk through the program

  2.1  Starting WORDSURV
  2.2  Data storage
  2.3  Data analysis
  2.4  Output
  2.5  Settings
  2.6  Concluding the tour

3. General conventions

  3.1  Some definitions
  3.2  Modules and function keys
  3.3  Menu conventions
  3.4  The response editor
  3.5  The text editor
      3.5.1  Entering characters
      3.5.2  The IBM extended character set
      3.5.3  Cursor movement
      3.5.4  Delete and undelete
      3.5.5  The editor's exit menu

4. Data entry

  4.1  The F1-CATALOG module
      4.1.1  Language symbols and titles
      4.1.2  Reliability codes
      4.1.3  Background information
      4.1.4  The menu commands
      4.1.5  The catalog file format
  4.2  The F8-TITLES module
  4.3  The F3-DATABASE module
      4.3.1  Initializing the word list database
      4.3.2  On collecting good data
      4.3.3  Presentation format of word list data
      4.3.4  The menu commands
      4.3.5  Express database editing
      4.3.6  Word list database file format

5. Data analysis

  5.1  The goals of word list analysis
  5.2  The USE SCHEME parameters
  5.3  The F4-INTEGRITY module
  5.4  The F7-SHARED module
      5.4.1  Concerning shared vocabulary counting
      5.4.2  Concerning phonostatistics
      5.4.3  The presentation of matrices
      5.4.4  The menu commands
  5.5  The F2-COMPASS module
      5.5.1  What COMPASS does
      5.5.2  The menu commands
  5.6  Further reading in word list analysis

6. Output, settings, and help

  6.1  The F5-OUTPUT module
      6.1.1  Output file handling
      6.1.2  The menu commands
  6.2  The F6-SETTINGS module
      6.2.1  Database and Catalog settings
      6.2.2  Module and Use settings
      6.2.3  Phonostatistics settings
      6.2.4  Confidence value settings
  6.3  The F9-HELP module
  6.4  Exiting WORDSURV

Appendixes

  A.  Error messages
  B.  Sample data
  C.  Alt key codes

References

Index


                             FIGURES

1.1  The SETTINGS module display

2.1  The display upon program startup

4.1  The CATALOG module display
4.2  Word list reliability codes
4.3  The Korafe catalog entry
4.4  The catalog file format
4.5  The TITLES module display
4.6  TITLES module display for the sample data
4.7  Sample word list record
4.8  Use of # to disqualify a database record
4.9  Word list record demonstrating missing data
4.10  Properly aligned word forms
4.11  The gloss editor
4.12  The database editor with sample data
4.13  The word list database file format

5.1  The INTEGRITY module display
5.2  The SHARED module display
5.3  COMPASS module display of phoneme correspondences
5.4  COMPASS module display of cognate strengths
5.5  COMPASS module summary display

6.1  The OUTPUT module display
6.2  The SETTINGS module display
6.3  The Database submenu of the SETTINGS module
6.4  The Module submenu of the SETTINGS module
6.5  User specified titles
6.6  The Shared submenu of the SETTINGS module
6.7  The HELP module display


_________________________________________________________________

Chapter 1


INTRODUCTION


      1.1  An overview of WORDSURV

      1.2  Program limits and hardware requirements

      1.3  An overview of the book

      1.4  Installing the program

      1.5  Acknowledgements


1.1  An overview of WORDSURV

  A  typical   language  survey   may  involve   activities  like
determining linguistic  relationships through  the comparison  of
word lists, testing dialect intelligibility by playing back tape-
recorded texts,  and studying sociolinguistic aspects of language
use and  language attitudes in multilingual situations.  WORDSURV
(for Word  Survey) is  a computer  program designed to aid in the
first of these areas: the collection and analysis of word lists.

  The potential  of WORDSURV  can best  be seen  through contrast
with a  language survey  done without  the aid of a computer.  In
such a  survey, the  analyst typically  collects  word  lists  on
preprinted forms  having only  national or trade language glosses
from which  to elicit.   Once  these lists  have been  collected,
each pair  of lists  is placed  side  by  side  and  compared  to
determine the  percentage of  shared vocabulary between them.  If
there are ten such word lists, there are 45 possible combinations
to compare.[1]  If, on the  average, it took an hour to compare a
pair of  word  lists,  it  would  take  a  week  to  perform  the
comparisons for  all 45  possible pairs  of word lists.  A survey
the size  of the  Zambales survey  (Wimbish 1986),  with 50  word
lists, would  require seven  months to  compare the 1225 possible
pairs of  word lists  (if anyone  would actually  attempt such  a
thing).
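
The pair counts quoted above follow from the formula n(n-1)/2 given in note 1. The short Python sketch below is only an illustration and is not part of WORDSURV; the list labels and function names are hypothetical. It enumerates the pairs for ten lists and applies the formula to the Zambales figures.

```python
from itertools import combinations


def pair_count(n_lists):
    """Number of distinct pairs among n word lists: n(n-1)/2."""
    return n_lists * (n_lists - 1) // 2


def forms_compared(n_lists, n_glosses):
    """Pairs of lists multiplied by the number of glosses per list (note 1)."""
    return pair_count(n_lists) * n_glosses


# Hypothetical labels standing in for ten word lists.
labels = [f"list{i}" for i in range(1, 11)]
pairs = list(combinations(labels, 2))

print(len(pairs), pair_count(10))   # enumeration and formula agree: 45 45
print(pair_count(50))               # 1225
print(forms_compared(50, 372))      # 455700
```

Enumerating the pairs with itertools.combinations confirms the closed formula for a small case; the formula alone is enough for larger surveys.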

  WORDSURV  has  several  advantages  over  the  above  scenario.
First, it  provides a special, printable form for the elicitation
of additional  word lists.   Using this printout, the analyst may
see all  of the  language forms that have previously been entered
for each  gloss as  new lists  are  collected.    By  having  all
previously elicited forms readily available, the analyst is in a
position to question suspicious new forms on the spot.  For
investigators with  a diachronic  interest, this  also  makes  it
easier to  "fish" for  forms related  to  those  that  have  been
collected in  other dialects but which may have undergone a shift
in meaning.

  Second, WORDSURV  reduces the  time spent  in shared vocabulary
comparisons.   The time  cost is  only that of entering each word
list into  the computer,   a  process that  typically averages an
hour per  list.   Cognate decisions  are made  only once for each
word  list,  at  the  time of data  entry.  Thus the hypothetical
survey with ten word lists  would  require  ten, rather  than 45,
hours  to  process.[2]  A survey with  50  word  lists  could  be
analyzed in about seven days rather than seven months.
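
The saving can be made concrete with a small calculation. The Python sketch below is illustrative only: the function names are hypothetical, and the one-hour figures are the averages assumed in the text. It does not count the later time spent refining cognate decisions.

```python
def manual_hours(n_lists, hours_per_pair=1.0):
    """Hours to compare every pair of lists by hand."""
    return n_lists * (n_lists - 1) / 2 * hours_per_pair


def wordsurv_hours(n_lists, hours_per_list=1.0):
    """Hours to enter each list once, making cognate decisions at entry."""
    return n_lists * hours_per_list


for n in (10, 50):
    print(n, manual_hours(n), wordsurv_hours(n))
# 10 45.0 10.0
# 50 1225.0 50.0
```

The manual cost grows with the square of the number of lists, while the data entry cost grows only in proportion to it, which is why the gap widens so quickly for larger surveys.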

  Third, WORDSURV takes advantage of the speed of the computer to
conduct  a   more  in-depth   comparative  analysis   than  basic
lexicostatistic analysis.   A  phonostatistic  analysis  gives  a
measure of phonological divergence between dialects.  The COMPASS
analysis (Frantz  1970) measures the strength of proposed phoneme
correspondences and  gives an  indication of  the likelihood that
words grouped in cognate sets are actually cognates.

  Finally, WORDSURV  allows word  list data  to be  output  in  a
format useful  for reports.  If one considers that in a situation
where a  computer analysis program is not available, the surveyor
would still  need to  type the  word lists for the survey report,
then the  additional time  required to enter the data in WORDSURV
becomes almost negligible.

  In a nutshell, WORDSURV functions in three main areas: (1) data
entry and  maintenance, (2) data  analysis, and  (3) data output.
WORDSURV works  in the  first area  through  two  databases,  one
containing  the   word  list   data,  and  the  other  containing
demographic information  about each  list.   In the  second area,
data analysis,  WORDSURV provides  a  simple  count  of  apparent
cognates between  lists, provides  a phonostatistic  analysis  of
these cognates,  and performs  the COMPASS  analysis.   The third
area, data  output, provides  a number  of options  by which  the
elements of the two databases and the results of the analyses can
be printed out.


1.2  Program limits and hardware requirements

  WORDSURV   allows you to enter a maximum of 90 word lists, with
each list  containing up  to 999  items.   The program  was first
tested with  Zambales data  from the  Philippines (Wimbish 1986),
which involved 50 word lists that were 372 items in length.

  WORDSURV is  written in  the  C  programming  language  and  is
compiled for  IBM PC-compatible  computers (which  use the MS-DOS
operating system).   Because  of its screen-intensive nature, the
program  requires  a  full  display of 80 characters by 25 lines.
For this  reason it  will not run on early battery-powered models
like the Sharp PC-5000.

  The program supports special characters only through the
standard extended character set available on IBM PC-compatible
computers.  This extended set includes the 94 printable
characters of the regular ASCII[3] character  set plus  128  more
printable characters, which include (among other things) accented
vowels, Greek letters, and mathematical symbols.  (See appendix C
for a complete list.)  With certain kinds of video adapters, such
as the  EGA and  the Hercules  Plus, it  is possible  to download
user-defined shapes  for the  characters.   WORDSURV will support
such user-defined  character sets,  though it is beyond the scope
of this  book to  describe how  such character  sets are defined.
Even if  you are  not able to get the desired character shapes on
the screen,  there are many word processors which can take a file
that uses the extended character set (such as an output file from
WORDSURV) and  replace the program's characters with user-defined
ones at print time.  In this way special characters such as those
used in  phonetics may  be displayed  on the  printed page,  even
though they cannot be displayed on the screen.

  A hard  disk minimizes  the time required for loading WORDSURV,
but the  program works  fine with  floppy disk  drives.   In that
case,  however,   it  is  best  to  have  two  drives  to  handle
potentially lengthy data files.

  WORDSURV begins  by loading its databases into memory.  Because
it does  not access  the disk  during regular  program execution,
computers powered by batteries will operate much longer than they
might with a disk-intensive program.  This strategy of minimizing
disk access  is employed  to facilitate carrying a computer along
on the survey.  It has the disadvantage of potentially requiring
large amounts of RAM with longer word lists (for instance, the
Zambales survey data filled 120K of memory when loaded).  Because
the program itself requires up to 250K of memory to run, a system
with 512K of memory should be sufficient for all but the
largest surveys.   The  user with  a small  system  should  avoid
memory resident  programs such  as RAM disks and printer spoolers
when running WORDSURV.
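
A rough memory budget can be sketched from the figures above. The Python fragment below is an illustration under a loud assumption: that data size scales linearly with the number of forms, at the density observed for the Zambales data. The function names are hypothetical, and the memory taken by MS-DOS itself is ignored.

```python
ZAMBALES_LISTS = 50       # word lists in the Zambales survey
ZAMBALES_ITEMS = 372      # items per list
ZAMBALES_DATA_K = 120     # memory the Zambales data filled when loaded
PROGRAM_K = 250           # upper bound on memory the program itself needs


def estimated_data_k(n_lists, n_items):
    """Scale the Zambales figure linearly by number of forms (an assumption)."""
    forms = n_lists * n_items
    return ZAMBALES_DATA_K * forms / (ZAMBALES_LISTS * ZAMBALES_ITEMS)


def fits(n_lists, n_items, system_k=512):
    """True if program plus estimated data fit in the given memory."""
    return PROGRAM_K + estimated_data_k(n_lists, n_items) <= system_k


print(fits(50, 372))   # Zambales survey: 250K + 120K
print(fits(90, 999))   # a survey at the program limits
```

Under this assumption a survey at the program limits (90 lists of 999 items) would need several times the memory of the Zambales data, which is consistent with the remark that 512K covers all but the largest surveys.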


1.3  An overview of the book

  This book  assumes that  the user  has no previous knowledge of
WORDSURV, but  is familiar with basic computer operation and with
the MS-DOS operating system.

  The WORDSURV  program disk  includes sample data files based on
the Zambales  survey conducted  by the  author in the Philippines
(Wimbish 1986).   Chapter 2 uses these data files to give a quick
tour of  the most  prominent features  of WORDSURV.    This  tour
provides an  overview of  the  entire  program  so  that  in  the
chapters  which   follow  its  individual  parts  may  be  better
understood.

  Chapter 3 discusses  some of  the terms and general features of
WORDSURV  that  are  employed  throughout  the  program.    These
features include  menu conventions, the use of function keys, and
the built-in editors.

  The remaining chapters present the details of program operation
under the  three main  headings given  in the program overview of
section 1.1.   Chapter  4 deals  with data entry; chapter 5 deals
with data analysis; and chapter 6 deals with data output (as well
as parameter settings and the on-line help system).

  Appendix B provides an abbreviated set of sample data for seven
languages of  Papua New  Guinea (Farr  and  Larsen  n.d.).    The
illustrations in  chapters 4  through 6  are based on this set of
data rather  than the  samples provided on the release disk.  The
user is  encouraged to  follow the  instructions in  the tutorial
sections throughout  these chapters to enter, analyze, and output
these data  and thereby gain first-hand experience in using every
aspect of the program.

  Once  learned,  WORDSURV  is  relatively  easy  to  use;    its
difficulties are  mainly those  inherent in  understanding    the
principles of word list analysis.


1.4  Installing the program

  The WORDSURV  program disk  contains a  program file  and three
sample data files, as follows:

  WORDSURV.EXE   the executable program

  WORDSURV.DB    sample Zambales word list database

  WORDSURV.CAT   sample Zambales catalog file

  WORDSURV.PM    parameter settings  for the sample data  (The
                 .PM file must always be in the directory from
                 which WORDSURV is called.)

Be sure  to consult  the README  file  also,  which  may  contain
updates or corrections to the information printed in this book.

  The following steps are recommended for first-time use:

  *  As with  any new  software, make  a copy  of the WORDSURV
     release disk  and store the original in a safe place.  If
     you have  a hard  disk, make  a subdirectory and copy the
     whole release  disk into  it.   If you  have only  floppy
     disks, copy the release disk to another floppy.

  *  Set the  current drive  and directory  to the one holding
     the WORDSURV  files.   Type dir  and press Enter to see a
     directory of  the WORDSURV  files and  confirm  that  the
     current default directory is the correct one.

  *  To start the program, type wordsurv and then press the
     Enter key.  A startup message containing the version
     number is briefly displayed.  Then the display for the
     SETTINGS module appears; see figure 1.1.

  *  In the  upper right  hand corner of the display is a list
     of function  key assignments.   Adjust the brightness and
     contrast  controls   on  the   monitor  until   the  line
     "F6 SETTINGS" is  brighter than  the other lines.  If you
     have difficulty achieving this, see below.

  *  Once the  tour through  the program in chapter 2 has been
     completed,  the   sample  files   (namely,  WORDSURV.CAT,
     WORDSURV.DB, and  WORDSURV.PM) may  be deleted  from  the
     directory.


             Figure 1.1  The SETTINGS module display

--------------------------------------------------------
| DATABASE:                               PHONETIC     |  Function Keys:
|   Filename:         WORDSURV.db         DEGREES of   |  F1  CATALOG
|   Backup Filename:  WORDSURV.dbk        DIFFERENCE:  |  F2  COMPASS
|   Automatic backup: N                                |  F3  DATABASE
|   In memory <N>   Needs backup <N>      SHARED       |  F4  INTEGRITY
|                                         CONF's       |  F5  OUTPUT
| CATALOG:                                A: 1.010     |  F6  SETTINGS
|   Filename:         WORDSURV.cat        B: 1.320     |  F7  SHARED
|   Backup Filename:  WORDSURV.cbk        C: 1.645     |  F8  TITLES
|   Automatic backup: N                   D: 1.960     |  F9  HELP
|   In memory <N>   Needs backup <N>      E: 2.575     |  F10 Exit
|                                                      |
| MODULE on program entry: F6                          |
|                                                      |
| USE SCHEME                                           |
|   Current use scheme:    All initialized titles      |
|   User specified titles:                             |
|                                                      |
| Available memory: 507                                |
------------------------------ WORDSURV Version 3.2 ----
 Database Catalog Module Use Titles Phonetic Shared
 Database file settings submenu.


  WORDSURV uses  a  scheme  in  its  video  display  that  writes
directly to the screen's memory.  The result is a fast display of
data;   the problem  is that  not all computer displays behave in
the same  manner.   To allow  for the incompatibility of computer
displays, WORDSURV  has an internal setup routine that mimics the
standard ANSI[4] device driver.

  In normal operation, WORDSURV uses a best guess as to the video
attributes to  display on the screen.  The current menu selection
appears in  inverse video.   Most  of the  display is  in  normal
video.   Characters typed by the user (including parameter values
and data)  are in  bold video.   In  figure 1.1  above,  and  the
remaining figures  throughout the  book, underlining  is used  to
show portions  of the  screen that  are highlighted by inverse or
bold video.

  If the  screen attributes  do  not  appear  in  a  satisfactory
manner, try  installing the  ANSI screen  driver.   Upon startup,
WORDSURV checks for the presence of the ANSI driver and, if it
finds it, mimics its use of the video for normal, bold, and inverse
attributes.
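
For readers unfamiliar with the ANSI driver, the three attributes correspond to standard ANSI escape sequences. The Python sketch below shows only those sequences as an ANSI-capable display interprets them; it makes no claim about how WORDSURV itself writes to the screen, and the helper name styled is hypothetical.

```python
ESC = "\x1b"

# Standard ANSI "select graphic rendition" sequences.
NORMAL = ESC + "[0m"    # reset all attributes
BOLD = ESC + "[1m"      # bold (high intensity)
INVERSE = ESC + "[7m"   # inverse video


def styled(text, attribute):
    """Wrap text in an attribute and reset to normal afterwards."""
    return attribute + text + NORMAL


# A highlighted menu choice, a plain one, and a user-typed value.
print(styled("Database", INVERSE), "Catalog", styled("WORDSURV.db", BOLD))
```

On a display without ANSI support the escape characters appear as literal text rather than changing the video attributes.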

  Installing the ANSI driver for use with WORDSURV requires the
following steps:

  *  Make sure  that the file ANSI.SYS is located in your boot
     disk's  root   directory.    ANSI.SYS  should  have  been
     included with  the system  software when  you bought your
     computer.

  *  Edit  (or   create)  the  file  CONFIG.SYS  in  the  root
     directory of your boot disk and insert the line:

          device=ansi.sys

  *  Save  the   newly  edited  CONFIG.SYS  file  and  restart
     (reboot) the computer.

  *  Type wordsurv  and then  press the Enter key to start the
     program.

  If there  is still  a problem, either the above process was not
followed correctly  or there  is some  incompatibility with  your
particular computer.   Please  send a  note  to  the  address  in
appendix A,  including your  brand of  computer and  observations
concerning the problem.


1.5  Acknowledgements

   WORDSURV  was a  product of necessity, born of a survey in the
Philippines requiring  the comparison  of some  fifty word lists,
each with  372 words  (Wimbish 1986).   Early  efforts to compare
these word  lists took  one and  a half  hours for  each pair  of
lists, potentially  requiring  a  year's  time  to  complete  the
analysis of that one survey.

  Credit for WORDSURV's approach to lining up language forms into
cognate sets  must be  given to  Ken Smith  and Chuck Walton, who
devised a  program in  the early 1970s on a mainframe computer to
compare Philippine languages.  Walton (1979) derived a lexically-
based language  tree from  the output  of this program.  In those
days the  data were  manually organized  into cognate sets as the
individual forms  were copied  onto large  filing cards,  one for
each gloss  in the  word list.   The  cognate decisions were then
entered on  key-punched cards  and fed into the computer.  It was
through a conversation with Chuck Walton that I began thinking of
replacing cards with a computer display.

  I am  indebted to  Sandra Wimbish,  Frank Blair,  Chuck Grimes,
Joseph Grimes,  Ted Bergman,  Gary Simons,  Linda  Simons,  Susan
Hochstetler, Wyn  Laidig, Karen  Buseman,  Alan  Buseman,  Eugene
Casad, and  members of  the May 1987 Language Use in Multilingual
Societies class  of the  University of  Texas  at  Arlington  for
comments on earlier versions of the program and documentation.  I
am indebted to Darryl Wilson for supplying the word lists used as
sample data for the tutorials.

  In addition, thanks are due to the many government officials in
the Republic  of the Philippines and in the Republic of Indonesia
who have  sponsored both  the work  of the  Summer  Institute  of
Linguistics and  my own  personal studies in those countries.  In
Indonesia, thanks  are also  due to Pattimura University in Ambon
for  supporting  the  work  of  SIL  in  Maluku.    Without  such
sponsorship the  ideas  and  opportunities  for  writing  such  a
computer program would not have existed.

  
____________________

  1 The number  of comparisons  reflects the  fact that each word
list must be compared with every other word list.  Where n is the
number of word lists, the total number of pairwise comparisons of
lists is  n(n-1)/2.   The number  of language  forms compared  is
computed by multiplying this number by the number of glosses in a
typical word list.

  2 To be  fair, it must be added that further time in data entry
will probably be spent during later stages of analysis as earlier
cognate decisions  are refined on the basis of more experience in
the language  family and  the results  from running  the  COMPASS
analysis.
  
  3 ASCII  is   an  acronym   for  American   Standard  Code  for
Information Interchange.

  4 ANSI is  an  acronym  for  the  American  National  Standards
Institute.  Among the many standards adopted by this organization
is one for the behavior of computer displays.  The ANSI.SYS file,
which comes  with the MS-DOS operating system, is a device driver
which causes the computer's screen to emulate the ANSI standard.


_________________________________________________________________

Chapter 2


A WALK THROUGH THE PROGRAM


      2.1  Starting WORDSURV

      2.2  Data storage

      2.3  Data analysis

      2.4  Output

      2.5  Settings

      2.6  Concluding the tour

  
  This chapter  gives a tour of the main features of WORDSURV,
  using the  sample Zambales  data for demonstration. The user
  has the  opportunity to  see the manner in which survey data
  are handled  by WORDSURV,  and the  results of  the  various
  analyses.

  *  Because this is a sightseeing tour and not a tutorial
     (the tutorial begins in chapter 4), do only the things
     directed in paragraphs that are preceded by a bullet
     (like this one).  This avoids the risk of getting into
     situations from which the escape has not been covered.


2.1  Starting WORDSURV

  This tour  is based  on sample data (provided with the program)
which come  from a  survey  of  the  languages  of  the  Zambales
mountains in  the Philippines  (Wimbish 1986).   The tour assumes
that WORDSURV  is up  and running  on your computer, and that the
sample data  are available  for loading.  To reach this point, do
the following:

  *  Check  that   the  sample  data  files  WORDSURV.CAT  and
     WORDSURV.DB are  in your  default directory.  These files
     are included on the release disk.

  *  Type wordsurv  and then  press the Enter key to start the
     program.

  *  If this is the first time you have used WORDSURV, the
     program will lead you through a brief setup procedure.  Follow
     the directions on the screen.  Refer to the discussion on
     installation  in   chapter  1   if  you   experience  any
     difficulties.

  If, during this tour, you find the program behaving differently
from what you expected, it is possible that you typed a menu
command by mistake.  Pressing the Esc (or Escape) key is a means
of backing  out of  such difficulties;  you may need to press Esc
several times  to return  to the appropriate part of the program.
If you wish to exit the program, press F10 (you may need to first
press Esc as explained above).

  Once WORDSURV  has started  up,  the  SETTINGS  module  display
should appear,  as in  figure 2.1.  Note the menu of ten function
keys in  the upper  right hand  corner of the display.  The first
nine of  these, spelled  in all  capital letters,  represent  the
modules of  WORDSURV.   Each  module  presents  a  unique  screen
display and  a set  of commands  designed to support a particular
major function  of the program.  These modules may be accessed at
most points in the program by pressing the corresponding function
key.  The name of the module which is currently running is always
highlighted in  the function-key  menu; in  this case  the  words
"F6 SETTINGS" should appear brightly on your screen.


          Figure 2.1  The display upon program startup

--------------------------------------------------------
| DATABASE:                               PHONETIC     |  Function Keys:
|   Filename:         WORDSURV.db         DEGREES of   |  F1  CATALOG
|   Backup Filename:  WORDSURV.dbk        DIFFERENCE:  |  F2  COMPASS
|   Automatic backup: N                                |  F3  DATABASE
|   In memory <N>   Needs backup <N>      SHARED       |  F4  INTEGRITY
|                                         CONF's       |  F5  OUTPUT
| CATALOG:                                A: 1.010     |  F6  SETTINGS
|   Filename:         WORDSURV.cat        B: 1.320     |  F7  SHARED
|   Backup Filename:  WORDSURV.cbk        C: 1.645     |  F8  TITLES
|   Automatic backup: N                   D: 1.960     |  F9  HELP
|   In memory <N>   Needs backup <N>      E: 2.575     |  F10 Exit
|                                                      |
| MODULE on program entry: F6                          |
|                                                      |
| USE SCHEME                                           |
|   Current use scheme:    All initialized titles      |
|   User specified titles:                             |
|                                                      |
| Available memory: 507                                |
------------------------------ WORDSURV Version 3.2 ----
 Database Catalog Module Use Titles Phonetic Shared
 Database file settings submenu.


  Most of  the modules  require that  the word list database, the
catalog, or  both be  resident in  memory.   Therefore,  pressing
certain function keys for the first time results in a delay while
files are read.

  *  Press function  keys F1,  F2, F3,  F4, F5, F7, and F8, in
     this order,  for a  sample of  movement between  modules.
     When you press F2, WORDSURV will take 30 seconds or so to
     load the database file.

  *  Press function key F6 to return to the SETTINGS module.

  Note the  command menu at the bottom of the screen, immediately
under the  box.   Each module  has its own menu through which you
tell WORDSURV what you want it to do.  The words in the menu line
(Database, Catalog,  and so  on) are  the names  of the  commands
supported in  the current  module.   Note that while module names
are in  all capital  letters, command  names have  only the first
letter capitalized.  (This convention is followed throughout this
book, as  well as  on the  screen displays.)   One of the command
names is  always highlighted;  this is  the command that would be
executed if you were to hit the Enter key.  The very last line of
the screen  is a  one-line explanation  of what  the  highlighted
command does.

  *  Press the  right and  left arrow  keys to  move the  menu
     highlight, and  note  how  the  bottom  line  changes  to
     describe each menu selection.

   You  can invoke  a menu  command by  either typing  the  first
letter of  the command's  name (upper or lower case may be used),
or by  moving the  highlight around  with the  arrow keys  to the
desired choice and then pressing Enter.

  *  Press U  and note  how the message following "Current use
     scheme" on the screen changes from All initialized titles
     to User specified titles.

  *  Press U  again until  the message  following "Current use
     scheme" is All initialized titles.
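
The two ways of invoking a menu command can be modelled in a few lines. The Python sketch below is a hypothetical model and not WORDSURV's own code: the command names are those of the SETTINGS menu line, while the function select and its key names are assumptions made for illustration.

```python
# Command names as they appear on the SETTINGS module menu line.
COMMANDS = ["Database", "Catalog", "Module", "Use", "Titles",
            "Phonetic", "Shared"]


def select(key, highlighted=0):
    """Return the command chosen by a keypress, or None if there is none.

    "enter" picks the highlighted command; a single letter picks the
    command whose name starts with that letter, in upper or lower case.
    """
    if key == "enter":
        return COMMANDS[highlighted]
    for name in COMMANDS:
        if name[0].lower() == key.lower():
            return name
    return None


print(select("u"))          # Use
print(select("U"))          # Use
print(select("enter", 1))   # Catalog
```

First-letter selection works here because no two commands in this menu begin with the same letter.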


2.2  Data storage

  WORDSURV uses  the term  title to  mean the language or dialect
name for a word list.  Furthermore, it assigns a single-character
symbol to each title to serve as an abbreviation.

  *  Press function key F8 to move to the TITLES module.

  You will see two columns on the screen.  The column on the left
is for  the one-character symbol; the one on the right is for the
title (or  name) of  the word  list.   When a symbol has no title
assigned to  it, the  title field is simply a row of dots.  There
are 90  symbols available  for assignment  to word  lists.  Since
only 13  of these  can appear  at a time on the screen, the whole
list is  displayed in  seven groups  called "blocks."   The  menu
commands Next  and Prev  allow you  to see  the other blocks with
their symbol-to-title assignments.

  *  Type N (for Next) and see the next block of titles.  Note
     that the  block number changes from 1/7 to 2/7.  Continue
     to type N until you return to the first block.

  *  Type P  (for Prev)  to see  the previous block of titles.
     Note that when the first block is displayed, the previous
     block is the seventh one.
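
  The block arithmetic above can be checked with a short Python
sketch.  The figures 90 and 13 come from the text; the function
names are hypothetical and the code is an illustration of the
wrap-around you have just seen, not WORDSURV's own routine.

```python
import math

SYMBOLS = 90     # symbols available for word lists (from the text)
PER_BLOCK = 13   # symbols that fit on the screen at one time

def block_count(symbols=SYMBOLS, per_block=PER_BLOCK):
    """Number of screen blocks needed to show every symbol."""
    return math.ceil(symbols / per_block)

def next_block(current, total):
    """Block shown after Next; the last block wraps to block 1."""
    return current % total + 1

def prev_block(current, total):
    """Block shown after Prev; block 1 wraps to the last block."""
    return (current - 2) % total + 1

total = block_count()
print(total)                 # 7
print(next_block(1, total))  # 2
print(next_block(7, total))  # 1
print(prev_block(1, total))  # 7
```

  Six full blocks hold 78 symbols, so the seventh block shows the
remaining 12.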

  The TITLES  module (which  you are  now in) provides a means of
viewing the  title assignments  you have  made.  Elsewhere in the
program you  will refer to word lists only by their symbols;  the
TITLES  module  is  a  ready  reference  of  what  these  symbols
represent.

  The CATALOG  module is  where you  assign titles to symbols and
enter further information about each word list.

  *  Press F1.   The  CATALOG module  should come into view on
     the screen.

  The display has two small windows.  The top window contains the
title of  the word  list, its  symbol, and  its reliability code.
The reliability  code is  an indicator of the quality of the word
list used in the analysis computations.

  *  Press C  for menu selection Choose.  At the prompt on the
     right of  the screen,  type i  (do not  capitalize).  The
     catalog  should  now  be  positioned  at  the  entry  for
     Ilocano.

  The bottom window is used to enter background information about
the word  lists.  Note the types of information entered about the
Ilocano language.

  *  As you did in the TITLES module, move through the catalog
     by  pressing   N  (Next)  and  P  (Prev).    Notice  that
     background information  is not  required for  every  word
     list, though it is always a good idea to include it.

  The DATABASE module is used to manage the word list database.

  *  Press F3 to go to the DATABASE module.

  The DATABASE module's display appears complicated at first
glance, as the window divides into three columns with a
horizontal box below.  It is these columns that provide for the
formatting of word list data into a form understood by the
analysis algorithms.

  The horizontal  box at the bottom of the window  identifies the
gloss for  the word  list item.   The  narrow column  on the left
holds cognate  set labels.   The  middle column  holds the actual
forms elicited  for the  gloss.  The column on the right contains
the symbols  representing the  word lists  which  have  the  form
entered in the middle column.

  Thus, in  the first  record of the sample, you can see that the
gloss is  `sky'.   There are many word lists with the form laNit;
these languages  all fall into cognate set A.  (These cognate set
letters are  arbitrarily assigned  by you  as the linguist.)  You
also see  three other forms, taNataN, banwa, and taw+n, which are
found in  only one  word list  each, and are assigned to separate
cognate sets.
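
  One way to picture such a record is as a small data structure.
The Python sketch below is an assumption made for illustration
and says nothing about how WORDSURV stores its data on disk.  The
class names, the word list symbols a through h, and the labels B,
C, and D are all hypothetical; only the gloss, the four forms,
and the label A come from the sample record.

```python
from dataclasses import dataclass, field

@dataclass
class FormEntry:
    cognate_set: str  # left column: cognate set label
    form: str         # middle column: form elicited for the gloss
    symbols: str      # right column: one character per word list

@dataclass
class Record:
    gloss: str                                   # bottom box
    entries: list = field(default_factory=list)  # rows of the columns

    def cognate_set_of(self, symbol):
        """Label of the cognate set holding this word list, or None."""
        for entry in self.entries:
            if symbol in entry.symbols:
                return entry.cognate_set
        return None

sky = Record("sky", [
    FormEntry("A", "laNit", "abcde"),  # symbols here are hypothetical
    FormEntry("B", "taNataN", "f"),
    FormEntry("C", "banwa", "g"),
    FormEntry("D", "taw+n", "h"),
])

print(sky.cognate_set_of("c"))  # A
print(sky.cognate_set_of("g"))  # C
print(sky.cognate_set_of("z"))  # None
```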

  *  Move through  a  few  records  of  the  database  by  the
     familiar menu  commands Next and Prev.  In the second and
     following records  you will  see cognate  sets which have
     more than one form as members.

  Don't be  concerned,  at  this  point,  if  you  do  not  fully
understand the  grouping of  forms into cognate sets;  there is a
large section devoted to that subject in chapter 4.

  The database is organized in both a semantic order and an
alphabetical order.  The commands Next and Prev access the
database in semantic order; 1stA, Alph, and Last use alphabetical
order.

  *  Press 1  (for the  1stA command) to position the database
     at the  first gloss  in alphabetical order, in this case,
     `a tear (from crying)'.

  *  Press A  (for Alph)  a few  times to  advance through the
     word list by alphabetical order of glosses.  Then press L
     (for Last) a few times to move back in reverse order.
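
  The two orderings can be modeled as one list of records plus an
alphabetical index into it.  The Python sketch below rests on
assumptions: the five glosses form a made-up list (only `sky',
`father', and `a tear (from crying)' occur in this chapter), the
function names are hypothetical, and stopping at either end of
the alphabetical order is a guess about behavior that the text
does not describe.

```python
# Records in semantic order; positions here are 0-based list indexes.
glosses = ["sky", "sun", "father", "a tear (from crying)", "water"]

# Alphabetical index: record positions sorted by gloss.
alpha_index = sorted(range(len(glosses)), key=lambda i: glosses[i].lower())

def first_alpha():
    """Position of the first gloss in alphabetical order (1stA)."""
    return alpha_index[0]

def alph(pos):
    """Position that follows pos alphabetically (Alph)."""
    rank = alpha_index.index(pos)
    return alpha_index[min(rank + 1, len(alpha_index) - 1)]

def last(pos):
    """Position that precedes pos alphabetically (Last)."""
    rank = alpha_index.index(pos)
    return alpha_index[max(rank - 1, 0)]

start = first_alpha()
print(glosses[start])        # a tear (from crying)
print(glosses[alph(start)])  # father
print(glosses[last(0)])      # father (the gloss before 'sky')
```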

  There are two ways in which the database can be positioned at a
particular record (that is, gloss) without using either of the
two ordering systems:

  *  Press C  (for Choose).   A  prompt at  the right  of  the
     screen will  ask you to type a number.  Type 45, and then
     press Enter.   The  database will be positioned at record
     45.

  *  Press S  (for Search).   A  prompt will ask you to type a
     gloss, and  the cursor  will appear  at the  bottom  left
     corner of  the screen  where the  glosses are  displayed.
     For instance,  type father,  and then  press Enter.   The
     database will  move to  the appropriate  record.   If the
     gloss you  type does  not occur  in  the  word  list,  no
     message  is   displayed;  the   database  simply  remains
     positioned where it was.
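
  The behavior of Choose and Search can be summarized in a short
Python sketch.  It is an illustration under stated assumptions,
not the program's code: the function names are hypothetical, the
match is assumed to be on the whole gloss, and the handling of an
out-of-range record number is a guess.

```python
def choose(record_number, total, current):
    """Position at a record number counted from 1.

    Staying put for an out-of-range number is an assumption.
    """
    if 1 <= record_number <= total:
        return record_number
    return current

def search(glosses, wanted, current):
    """Position at the record whose gloss equals `wanted`.

    When no gloss matches, the position is left unchanged and
    nothing is reported, as the text describes.
    """
    for number, gloss in enumerate(glosses, start=1):
        if gloss == wanted:
            return number
    return current

sample = ["sky", "sun", "father"]    # made-up three-record list
print(choose(2, len(sample), 1))     # 2
print(search(sample, "father", 1))   # 3
print(search(sample, "mother", 1))   # 1 (not found, position kept)
```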


2.3  Data analysis

  Once WORDSURV has access to the word list data through the
above modules, the data are ready for analysis.  As a first step,
WORDSURV can  check that  each gloss has an entry from every word
list.   This is  done through  the  INTEGRITY  module,  which  is
accessed by  pressing the  F4 key.  At this point, we will assume
that the  database is  complete and  will save  discussion of the
integrity checker  for later.   Now we will move on to the SHARED
module,  which   performs   counts   of   shared   cognates   and
phonostatistic comparisons.
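
  The completeness check mentioned above amounts to asking, for
each gloss, which word lists have no form entered.  The Python
sketch below shows that idea only; the INTEGRITY module itself is
described later and may check more than this.  The function name,
the symbols a through d, and the form ama are hypothetical.

```python
def missing_entries(records, symbols):
    """Map each gloss to the word list symbols with no form for it."""
    report = {}
    for gloss, entries in records.items():
        present = set()
        for _label, _form, lists in entries:
            present.update(lists)
        missing = sorted(set(symbols) - present)
        if missing:
            report[gloss] = missing
    return report

# Each entry is (cognate set label, form, word list symbols).
records = {
    "sky": [("A", "laNit", "abc"), ("B", "banwa", "d")],
    "father": [("A", "ama", "abd")],  # word list c has no entry
}

print(missing_entries(records, "abcd"))  # {'father': ['c']}
```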

  *  Press F7 to enter the SHARED module.

  *  Press G (for Go) to start the computations.

  WORDSURV should  now be  performing the  various analyses.  You
will see  the "Current  record" counter on the screen counting up
from one.   This  number reports  the database  record  currently
under consideration  by the  computer.  When it reaches 98, which
is the  total number  of records  in the  database, a  matrix  of
results will be displayed on the screen.

  Two counts  are performed  at this  stage:  a count of apparent
cognates and  a phonostatistic count.  These counts are different
measures of  the relationships  between the  word lists.    Their
results can  be viewed  on the  screen in seven matrices when the
counts are finished.  The first matrix is the Percent matrix, for
percentage of shared cognates.

  *  When WORDSURV has finished counting, select menu commands
     Tally and N(Total) to view some of the other matrices.
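
  As a rough picture of what a shared-cognate count involves, the
Python sketch below compares every pair of word lists record by
record.  It is an assumption made for illustration, not the
algorithm WORDSURV uses: here "tally" counts the records in which
both lists fall in the same cognate set, "total" counts the
records in which both lists have an entry, and the percentage is
their ratio.  The sketch also assumes that a word list appears in
only one cognate set per record.  The function name and the
sample symbols are hypothetical.

```python
from itertools import combinations

def shared_cognates(records, symbols):
    """Return {(x, y): (tally, total, percent)} for each pair of lists."""
    results = {}
    for x, y in combinations(symbols, 2):
        tally = total = 0
        for entries in records:
            # Look up the cognate set label of every list in this record.
            set_of = {}
            for label, lists in entries:
                for symbol in lists:
                    set_of[symbol] = label
            if x in set_of and y in set_of:
                total += 1
                if set_of[x] == set_of[y]:
                    tally += 1
        percent = round(100 * tally / total) if total else None
        results[(x, y)] = (tally, total, percent)
    return results

# Two made-up records; each entry is (cognate set label, symbols).
records = [
    [("A", "abc"), ("B", "d")],
    [("A", "ab"), ("B", "cd")],
]

result = shared_cognates(records, "abcd")
print(result[("a", "b")])  # (2, 2, 100)
print(result[("a", "c")])  # (1, 2, 50)
print(result[("a", "d")])  # (0, 2, 0)
```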

  The meanings of the different matrices are discussed in chapter
5.   Some features of the SHARED module are worth noting.  First,
the Exchange command allows permutation of the matrices so you
can reorder their rows and columns.  Second, the display is
actually a  small window on the entire matrix, which in this case
is 40 rows by 40 columns.  The menu uses the number keys to move
around in the matrix.  Each number stands for the cursor movement
command found on the corresponding key of the numeric keypad.  If
you want to use the numeric keypad to generate these commands, be
sure to press the NumLock key first.  The upper
right portion  of the display shows the row and column number for
the upper  left corner  of the  displayed portion  of the matrix.
The display  initially shows the upper left corner of the matrix,
which has coordinates (0,0).
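
  The idea of a window sliding over a larger matrix can be
sketched in Python.  Much of this is assumed for illustration:
the window size of 10, the step of one row or column per
keypress, and the stop at the matrix edge are guesses, and the
function name is hypothetical.  Only the 40 by 40 size, the
starting corner (0,0), and the use of keypad digits come from the
text; the four directions are those printed on the 8, 2, 4, and 6
keys of a standard PC numeric keypad.

```python
SIZE = 40   # rows and columns in the sample matrix (from the text)
VIEW = 10   # rows and columns visible at once (hypothetical)

# Keypad digit -> (row change, column change).
MOVES = {"8": (-1, 0), "2": (1, 0), "4": (0, -1), "6": (0, 1)}

def move(corner, key, size=SIZE, view=VIEW):
    """Return the new upper left corner of the window after a key."""
    row, col = corner
    d_row, d_col = MOVES.get(key, (0, 0))
    limit = size - view  # last corner that keeps the window inside
    new_row = min(max(row + d_row, 0), limit)
    new_col = min(max(col + d_col, 0), limit)
    return (new_row, new_col)

corner = (0, 0)
corner = move(corner, "6")
print(corner)  # (0, 1)
corner = move(corner, "8")
print(corner)  # (0, 1), already at the top row
```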

  WORDSURV provides  the COMPASS analysis in the F2 module.  This
algorithm  approximates   the  comparative   method  to  help  in
determining cognates.   The  discussion of this module is lengthy
and is postponed until chapter 5.


2.4  Output

  Each of the analysis modules is able to output its results to a
disk file.   WORDSURV  also  has  provision  for  outputting  the
information stored in the word list database and the catalog.

  *  Press F5 to enter the OUTPUT module.

  The OUTPUT  module has seven options, as listed on the top half
of the screen.  These include outputting the database in the same
form as  it appears  on the screen, outputting one word list at a
time, and outputting various portions of the catalog.

  WORDSURV is  not sophisticated  in either its editing functions
or its  output formatting.   To compensate, its output is a plain
ASCII file  that is suitable for input to a word processor, which
can then  be used  to perform  special editing  operations or  to
generate the  nicely formatted  printouts needed  for reports  or
other applications.


2.5  Settings

  We return now to the SETTINGS module, where we began this short
tour of WORDSURV.

  *  Press F6 to move to the SETTINGS module.

  This module  allows you  to customize the various settings used
by WORDSURV, such as the names of the files, the symbols from the
catalog to  use in  analysis, and  some of the parameters used in
the analysis algorithms.


2.6  Concluding the tour

  WORDSURV is  exited by pressing F10.  The program automatically
saves any  files in  which data  have been  altered,  and  prints
messages on the screen to tell you which files have been saved.

  *  Press F10 and exit WORDSURV.  Since you have moved around
     in the  catalog and  database, you  will  see  a  message
     stating that  the settings  file is being saved.  Besides
     the settings  of all  the WORDSURV  parameters, this file
     also saves  your current  position in the catalog and the
     database.

  This has  been a  short tour  of WORDSURV.   For  the  tutorial
sections in the following chapters you will need a disk that does
not contain  the Zambales  files.   However, you may want to keep
them elsewhere in order to be able to try out the various options
on their larger, more developed data, since the tutorial data
have word lists of only 25 glosses.

