Appendix     by Fumio Imai
--------------------------------
International patent draft    As of April 13, 2002
--------------------------------
Title:
Pronunciation grouping method
--------------------------------
Abstract:
In English dictionaries today, words are listed alphabetically by their spelling. Because this ordering is not based on pronunciation, it is hard to find a word when only its sound is known. This invention proposes a sound-based index for the dictionary so that anyone can retrieve a word quickly. For this purpose, I introduce a small number (e.g., 12) of grouping phonetic symbols (hereafter called "AsifSound") that together cover all the existing phonetic symbols. Given the sound of a word, its AsifSound string is determined uniquely.
Further, according to statistics measured on the CMU (Carnegie Mellon University) Pronouncing Dictionary (containing over 125,000 words), the largest number of words sharing one AsifSound string was 252. Such cases were very few, and no string had more than 252 words. In general, the longer the AsifSound string, the fewer words it contains; hence, the longer the string, the easier it is to find the word. This indicates that the AsifSound grouping technique is suited to quick word retrieval by anyone.
--------------------------------
Claim  ---independent page
--------------------------------
1. What I claim as my invention is: A definition of 12 grouping phonetic characters (=AsifSound), comprising:

 ----Using only: A, B, D, F, G, H, K, L, N, P, S, T (12 characters)
  A = { x ; x = vowels, W, Y, or any combination of the preceding }
  B = { b, v }
  D = { d, dg (as in "bridge"), th (as in "the"), z, dz, su (as in "leisure") }
  F = { f }
  G = { g }
  H = { h }
  K = { k }
  L = { l, r }
  N = { m, n, ng (, nk) }
  P = { p }
  S = { s, sh, th }
  T = { ch, t, th (, ts) }
  (, W = { w }, Y = { j })
Here, in the above definition, written in mathematical set notation, each AsifSound character is a set name, and the elements of each set are the corresponding traditional phonetic symbols or sounds. That is, AsifSound is a superset of the traditional phonetic notation.

Then, by sound, each word is assigned to an AsifSound string. For example: "help" is the only element of "HaLP", "dictionary" belongs to "DaKSaNaLa", "vocabulary" is in "BaKaBaLaLa", ...
-------------------------------
FIELD OF THE INVENTION

This invention relates to the classification of words in a language; for example, an English dictionary classifies words by their spelling.

BACKGROUND OF THE INVENTION

Up to now, in English-speaking culture, the only input to a dictionary has been spelling. One exception is the "SOUNDEX" index system, which classifies family names by their spelling in the national census. A SOUNDEX code has the format "Xnnn", where X is the first character of the name and nnn is a 3-digit key derived from the spelling of the subsequent consonants. In SOUNDEX, consonant letters with similar sounds are assigned the same number, and the vowels {a,e,i,o,u} and the letters {h,w,y} are all omitted. The mapping is 1={b,p,f,v}, 2={c,s,k,g,j,q,x,z}, 3={d,t}, 4={l}, 5={m,n}, 6={r}. The SOUNDEX system is systematic and consistent, but it lacks a human touch. I learned of SOUNDEX only after I had submitted this invention to the Japanese Patent Office.
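The SOUNDEX mapping quoted above can be sketched in a few lines of Python. This is only an illustration, not part of the invention. The letter-to-digit table is the one given in the text; the handling of adjacent letters (collapsing repeated codes, treating "h" and "w" as transparent, padding with zeros) follows the commonly documented American Soundex variant and is an assumption, since the text above does not spell it out. The function name soundex is hypothetical.

```python
# Letter-to-digit table as quoted in the text.
SOUNDEX_GROUPS = {
    "1": "bpfv",
    "2": "cskgjqxz",
    "3": "dt",
    "4": "l",
    "5": "mn",
    "6": "r",
}
SOUNDEX_CODES = {ch: digit for digit, letters in SOUNDEX_GROUPS.items() for ch in letters}


def soundex(name):
    """Return the 4-character "Xnnn" code for a name (sketch, ASCII letters assumed)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")  # code of the previous significant letter
    for c in letters[1:]:
        if c in "hw":
            continue  # assumption: h and w do not separate letters with equal codes
        code = SOUNDEX_CODES.get(c, "")  # vowels and y have no code
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    # Pad with zeros and keep the first letter plus three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for name in ["Robert", "Rupert", "Tymczak", "Pfister"]:
        print(name, soundex(name))
```

Note that "Robert" and "Rupert" receive the same code even though their vowels differ, which is the vowel-omission behavior discussed in this section.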
The principle of omitting vowels is also frequently adopted in various abbreviation and stenographic systems. I dislike the custom of vowel omission, especially in the computer world; it disturbs the sense of humanity.
I must also mention the correction function of computer spelling checkers. The computer is very powerful, but its intelligence still depends on spelling alone; for example, it cannot correct the input "telolist" to the output "terrorist". This example is a good benchmark; if you have time, please try it on various word processors and online dictionaries. On this point, I do not say that the computer should be cleverer; on the contrary, I think it would be better to impose more rules on the human side. If a sound-based input mode or rule is defined between man and computer, the man-machine interface and the software procedure itself become simpler, and the machine becomes more efficient. I think it is acceptable to impose such a rule on the human; the rule is easy for the human, and it brings consistency (or comprehensibility) to human thinking.
The present invention may owe much to Japanese culture. I am thoroughly Japanese. In Japan, the most common dictionaries take phonetic symbols as input. These symbols are called "KANA", and we use them as letters in ordinary writing. Japanese pronunciation is so simple that there are no confusing or vague vowels or consonants. Therefore, KANA is not only a spelling symbol but also a phonetic symbol. Yet no Japanese person (except me?!) seems to have seriously thought that the input of the English dictionary is somewhat odd.
This invention was born during my solitary training for the TOEIC listening test. I listened to the American military's radio broadcast every day and every night, and I felt a real need for a glossary indexed by pronunciation. I searched for one, but I found that no such glossary existed anywhere in the world.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is the definition of AsifSound in English.
FIG. 2 is an example mapping between AsifSound strings and real English words.
FIG. 3 is a census of AsifSound strings over the CMU Pronouncing Dictionary.

DETAILED DESCRIPTION OF THE DRAWINGS

FIG. 1 gives the definition of AsifSound. Each AsifSound character is a set of similar sounds. I assigned the single character "A" (or "a") to all vowels, to W and Y, and to any combination of these, because I cannot distinguish their fine differences and it is convenient and flexible to treat them collectively as one symbol. W and Y are both close to vowels, for example "ear" / "year" / "we're", "U" / "you", "no" / "know", "worship" / "warship", "Y" / "eye" / "I", "wait" / "eight", "us" / "use", ... As for the consonants, I pruned the branches on purpose for the sake of simplicity. As a matter of fact, I would have liked to include "F" in "H", but I refrained because it felt wrong (a culture gap, perhaps?!). For example, if someone said "You are hired." or "You are fired." at high speed, wouldn't you feel anxious? In the end, 12 (one dozen) characters remained as AsifSound.
This grouping differs from the SOUNDEX rule described above, because SOUNDEX depends on sounds derived from the spelling, whereas AsifSound depends on the sound only. Further, AsifSound does not omit vowels, because vowels are indispensable for preserving the nuance of a word's sound. The difference between SOUNDEX and AsifSound comes from their different purposes. SOUNDEX was developed for tracing family ancestry, so it attends to the spelling. AsifSound is not bound by spelling; it is free from it.
The AsifSound rule maps each traditional phonetic symbol to exactly one AsifSound symbol, except for the "th" pronunciation. The "clear" (voiceless) "th" belongs to both "S" and "T", since "thousand", for instance, can be heard as "tauzand". Lastly, "m" is placed in "N" because only the spellings "mb" / "mp" are allowed, not "nb" / "np". Placing "m" and "n" in the same group is the same as in SOUNDEX.
You may say that AsifSound is one of the exclusive least common multiples of the traditional phonetic symbols.
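As an illustration of the FIG. 1 definition, the following Python sketch converts a phoneme sequence into an AsifSound string. It is a minimal sketch under several assumptions that are not part of the text above: the input is written in ARPAbet symbols (the notation of the CMU Pronouncing Dictionary), the voiceless "th" (TH) is sent to "S" rather than "T", every R is mapped to "L", and the optional clusters (dz, ts, nk) are not treated specially. The names PHONEME_TO_GROUP and asifsound are hypothetical.

```python
# ARPAbet vowels; together with W and Y they all collapse into the single symbol "a".
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
          "EY", "IH", "IY", "OW", "OY", "UH", "UW"}

# Consonant groups following FIG. 1 (ARPAbet spelling of each sound is an assumption).
CONSONANT_GROUPS = {
    "B": ["B", "V"],
    "D": ["D", "JH", "DH", "Z", "ZH"],  # d, dg (bridge), th (the), z, su (leisure)
    "F": ["F"],
    "G": ["G"],
    "H": ["HH"],
    "K": ["K"],
    "L": ["L", "R"],
    "N": ["M", "N", "NG"],
    "P": ["P"],
    "S": ["S", "SH", "TH"],  # assumption: voiceless "th" goes to S, not T
    "T": ["CH", "T"],
}
PHONEME_TO_GROUP = {ph: group
                    for group, phonemes in CONSONANT_GROUPS.items()
                    for ph in phonemes}


def asifsound(phonemes):
    """Map a list of ARPAbet phonemes to an AsifSound string (sketch)."""
    out = []
    for ph in phonemes:
        ph = ph.rstrip("012")  # drop CMU-style stress digits
        if ph in VOWELS or ph in ("W", "Y"):
            # Any run of vowels, W and Y becomes one "a".
            if not out or out[-1] != "a":
                out.append("a")
        else:
            out.append(PHONEME_TO_GROUP[ph])  # KeyError for unknown symbols
    return "".join(out)


if __name__ == "__main__":
    print(asifsound("HH EH1 L P".split()))
    print(asifsound("D IH1 K SH AH0 N EH2 R IY0".split()))
    print(asifsound("V OW0 K AE1 B Y AH0 L EH2 R IY0".split()))
```

With the hand-written transcriptions above for "help", "dictionary" and "vocabulary", the sketch reproduces the strings "HaLP", "DaKSaNaLa" and "BaKaBaLaLa" given in the claim. Other words may come out differently from FIG. 2, depending on the transcription used and on the assumptions listed above.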


FIG. 2 shows examples. I have nothing to add about these examples; they are easy to understand.
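The word retrieval that FIG. 2 illustrates can be sketched as a simple index from AsifSound string to candidate words. The (word, string) pairs below are copied from FIG. 2; the names FIG2_PAIRS, build_index and lookup are hypothetical and only serve this illustration.

```python
from collections import defaultdict

# (word, AsifSound string) pairs copied from FIG. 2.
FIG2_PAIRS = [
    ("consul", "KaNSaL"), ("cancel", "KaNSaL"),
    ("council", "KaNSaL"), ("counsel", "KaNSaL"),
    ("science", "SaNS"), ("sense", "SaNS"), ("since", "SaNS"),
    ("conscience", "KaNSaNS"),
    ("thesaurus", "SaSaLaS"),
]


def build_index(pairs):
    """Group words under their AsifSound string, keeping the input order."""
    index = defaultdict(list)
    for word, key in pairs:
        index[key].append(word)
    return dict(index)


def lookup(index, key):
    """Return the candidate words for an AsifSound string (empty list if none)."""
    return index.get(key, [])


if __name__ == "__main__":
    index = build_index(FIG2_PAIRS)
    print(lookup(index, "KaNSaL"))   # four candidates to choose from
    print(lookup(index, "SaSaLaS"))  # a single candidate
```

The user then picks the wanted word from the short candidate list, in the same way as choosing among homonyms.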

FIG. 3 shows the distribution statistics for raw English words. According to statistics measured on the CMU (Carnegie Mellon University) Pronouncing Dictionary (containing over 125,000 words), the largest number of words sharing one AsifSound string was 252. Such cases were very few, and no string had more than 252 words. In general, the longer the AsifSound string, the fewer words it contains; hence, the longer the string, the easier it is to find the word. This indicates that the AsifSound grouping technique is suited to quick word retrieval by anyone who has a reasonably precise pronunciation of the word. To put it another way, the population density of AsifSound space is very thin in the suburbs and the countryside, that is, for strings longer than 5 characters; there you only have to select one word among at most 81. For strings longer than 8 characters, it is hardly an exaggeration to say that a string typically has only one member (the average is 1.3 words per string or fewer).
In summary, the AsifSound classification behaves like a good hashing algorithm in that each group has a moderate number of members, so everyone can find the wanted word quickly. Moreover, not omitting vowels suits the human sense and the educational spirit.
If you cannot find a word in the AsifSound dictionary, the reason is either that your memory of the pronunciation is corrupted (collapsed) or that the AsifSound dictionary lacks the word.
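The kind of census reported in FIG. 3 can be sketched as follows. This is not the program that produced FIG. 3; it is a minimal illustration that assumes a mapping from each word to its AsifSound string is already available and that "length" means the number of characters in the string, including each "a". The function name census and the toy data are hypothetical (the toy words and strings are taken from FIG. 2 and the claim).

```python
from collections import Counter, defaultdict


def census(word_keys):
    """Summarize a {word: AsifSound string} mapping by string length.

    Returns {length: (n_strings, n_words, avg_words_per_string, max_words_per_string)}.
    """
    group_sizes = Counter(word_keys.values())  # words per AsifSound string
    sizes_by_length = defaultdict(list)
    for key, size in group_sizes.items():
        sizes_by_length[len(key)].append(size)

    rows = {}
    for length in sorted(sizes_by_length):
        sizes = sizes_by_length[length]
        n_strings = len(sizes)
        n_words = sum(sizes)
        rows[length] = (n_strings, n_words, round(n_words / n_strings, 1), max(sizes))
    return rows


if __name__ == "__main__":
    toy = {
        "I": "a", "you": "a", "we": "a",
        "help": "HaLP",
        "science": "SaNS", "sense": "SaNS", "since": "SaNS",
    }
    for length, row in census(toy).items():
        print(length, *row, sep="\t")
```

The average column is simply the number of words divided by the number of distinct strings of that length, which is also how the "av w/s" column of FIG. 3 relates to its "#words" and "#sounds" columns (for example, 216 / 2 = 108.0 in the first row).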

--------------------------------------
FIG. 1

A definition of 12 grouping phonetic characters (=AsifSound), comprising:

 ----Using only: A, B, D, F, G, H, K, L, N, P, S, T (12 characters)
  A = { x ; x = vowels, W, Y, or any combination of the preceding }
  B = { b, v }
  D = { d, dg (as in "bridge"), th (as in "the"), z, dz, su (as in "leisure") }
  F = { f }
  G = { g }
  H = { h }
  K = { k }
  L = { l, r }
  N = { m, n, ng (, nk) }
  P = { p }
  S = { s, sh, th }
  T = { ch, t, th (, ts) }
  (, W = { w }, Y = { j })
Here, in the above definition, written in mathematical set notation, each AsifSound character is a set name, and the elements of each set are the corresponding traditional phonetic symbols or sounds. That is, AsifSound is a superset of the traditional phonetic notation.
Then, by sound, each word is assigned to an AsifSound string. For example: "help" is the only element of "HaLP", "dictionary" belongs to "DaKSaNaLa", "vocabulary" is in "BaKaBaLaLa", ...

-------------------------------------
FIG. 2

(I, you, we, are, were, air) < a,
(bath, verse, boss, voice) < BaS,
(flee, flea, flow, free) < FLa,
(son, sing, same, thong) < SaN,
(cash, case, chaos, kiss) < KaS,
(cat, catch, kit, caught) < KaT,
(thesaurus) < SaSaLaS,
(bathe, birds, voyage, void) < BaD,
(Babylon) < BaBaLaN,
(volunteer, Valente) < BaLaNTa,
(barricada, barracuda) < BaLaKaDa,
(grandma, glamor, grammar) < GLaNa,
(sanctuary) < SaNKTaLa,
(science, sense, since) < SaNS,
(conscience) < KaNSaNS,
(discipline) < DaSaPLaN,
(larger, rather, leisure, lazy) < LaDa,
(carvers, cabbage, cupboard) < KaBaD,
(underweight, indict, indite) < aNDaT,
(lamb, ram, rhyme, lion, lyon, rain, reign) < LaN,
(consul, cancel, council, counsel) < KaNSaL,
(redirection, resurrection) < LaDaLaKSaN,
(morning, mourning, moaning) < NaNaN,
(Giacometti) < DaKaNaTa,
(Euler) < aLa,
(Einstein) < aNSTaN,
(Lincoln) < LaNKaN,
(Beethoven) < BaTaBaN,
(Tchaikovsky) < TaKaFSKa

Tongue twister [TaN  TaSTa]
Peter Piper picked a peck of pickled pepper. ----
[PaTa  PaPa  PaKT  a  PaK  aB  PaKLD  PaPa]
She sells seashells on the seashore. -----
[Sa  SaLD  SaSaLD  aN  Da  SaSa]

-------------------------------------
FIG. 3

CMU Pronouncing Dictionary statistics:
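<Census of AsifSound families>
(length = AsifSound string length; #sounds = number of distinct AsifSound strings; w/s = words per string)
length	#sounds	#words	avg w/s	max w/s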
1	2	216	108.0 	215
2	28	2,223	79.4 	226
3	184	8,524	46.3 	252
4	788	18,270	23.2 	248
5	2,780	26,584	9.6 	163
6	6,703	25,063	3.7 	81
7	9,498	18,745	2.0 	31
8	8,735	12,927	1.5 	16
9	6,029	7,755	1.3 	12
10	3,681	4,377	1.2 	8
11	2,116	2,430	1.1 	6
12	1,154	1,304	1.1 	5
13	563	617	1.1 	6
14	239	254	1.1 	4
15	109	116	1.1 	3
...				
Total	42,609	129,405	3.0 	252