(aspell.info.gz) Working With Affix Info in Word Lists
5.6 Working With Affix Info in Word Lists
=========================================
5.6.1 The Munch Command
-----------------------
The `munch' command takes a list of words from standard input and
outputs a list of possible root words and affixes. The roots may,
however, be invalid, as the command does not check them against the
existing dictionary. For example, the command:
echo brother | aspell -l en munch
produces
brother broth/R brothe/R
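Because the candidate roots are not checked, it can help to separate
each candidate into its root and its flags before inspecting them. The
following is only a sketch: the helper name `split_candidates' is
hypothetical, and the sample line is copied from the output shown above
rather than produced by running Aspell.

```shell
# Sample line copied from the munch example; in practice it would come
# from:  echo brother | aspell -l en munch
sample='brother broth/R brothe/R'

# split_candidates (hypothetical helper): print one "root<TAB>flags"
# pair per candidate. A candidate without a slash gets an empty flags
# column.
split_candidates() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        awk -F/ 'NF { printf "%s\t%s\n", $1, $2 }'
}

split_candidates "$sample"
```

The first column could then be piped through `aspell -l en list', which
prints the words Aspell does not recognize, to spot roots such as
`brothe' that are not real words.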
5.6.2 The Expand Command
------------------------
The `expand' command is the reverse of `munch': it expands affix flags
to produce a list of words. For example:
echo both/R | aspell -l en expand
produces
both bother
The formal usage is:
aspell expand [LEVEL] [LIMIT]
where LEVEL is the expansion level. Valid values are 1 through 3;
level 1 is the default if not otherwise specified. Level 2 causes
the original root/affix to be included, for example:
both/R both bother
Level 3 causes multiple lines to be printed, one for each generated
word, with the original root/affix combination followed by the word it
creates:
both/R both
both/R bother
Levels larger than 3 may also be supported, but should not be used as
they may eventually be removed.
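The level 2 and level 3 formats carry the same information. As an
illustration, the sketch below folds level 3 style lines back into one
level 2 style line per root/affix entry. The function name
`level3_to_level2' is hypothetical, and the input is the sample output
shown above, not the result of running Aspell.

```shell
# level3_to_level2 (hypothetical helper): read "root/flags word" lines
# on standard input and print "root/flags word word ..." once per entry,
# in the order the entries were first seen.
level3_to_level2() {
    awk '
        # First time this root/affix entry is seen: remember its position.
        !($1 in words) { order[++n] = $1 }
        # Append the generated word to the entry.
        { words[$1] = words[$1] " " $2 }
        END { for (i = 1; i <= n; i++) print order[i] words[order[i]] }
    '
}

printf 'both/R both\nboth/R bother\n' | level3_to_level2
```

For the sample input this prints `both/R both bother', matching the
level 2 example.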
If a LIMIT parameter is given, then only expansions which affect the
first LIMIT letters will be performed. If a base word is not completely
expanded for a given affix flag, that flag will be left on the word.
Note that prefixes are always expanded.
5.6.3 The Munch-list Command
----------------------------
The `munch-list' command will reduce the size of a word list via affix
compression. It will reduce a list of words to a minimal (or close to
it) set of roots and affixes that will match the same list of words.
The list of words is read from standard input and the result, the
"munched" list, is written to standard output. Its usage is:
aspell munch-list [keep] [single|multi] [simple] < INFILE > OUTFILE
where `simple', `single', `multi', and `keep' are literal values.
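Since the munched list should match the same words as the input, one
way to sanity-check a run is to expand the result and compare word
sets. The sketch below assumes an input file named words.txt (a
hypothetical name), passes `-l en' as in the earlier examples, and uses
two hypothetical helpers, `words_of' and `same_word_set'. The Aspell
part only runs when Aspell and the input file are present.

```shell
# words_of (hypothetical helper): print the distinct words in a file,
# one per line, sorted.
words_of() {
    tr -s ' \t' '\n\n' < "$1" | sed '/^$/d' | LC_ALL=C sort -u
}

# same_word_set (hypothetical helper): succeed if both files contain
# the same set of words, regardless of order or line layout.
same_word_set() {
    [ "$(words_of "$1")" = "$(words_of "$2")" ]
}

# Round trip, attempted only when aspell is installed and the assumed
# input file words.txt exists.
if command -v aspell >/dev/null 2>&1 && [ -f words.txt ]; then
    aspell -l en munch-list < words.txt > munched.txt
    aspell -l en expand < munched.txt > expanded.txt
    if same_word_set words.txt expanded.txt; then
        echo "round trip OK"
    else
        echo "word sets differ"
    fi
fi
```

The comparison ignores ordering and duplicates on purpose, because
`expand' at its default level prints several words per line.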
The default algorithm used should give near optimum results. In some
cases the set of words returned is, provably, the minimum number
possible. In the typical case the number of words returned is within
1% of the optimum number.
By default Aspell will remove redundant affix flags. The `keep'
flag will avoid removing them, which can be useful if you want to
include all possible expansions for each base word.
When cross products are involved it may be beneficial to list a base
word more than once. Unfortunately, the current version of Aspell
cannot correctly handle multiple base words in a dictionary. Therefore,
the current default behavior is to include only the one with the most
expansions. All of them can be included via the `multi' flag. Once
Aspell is able to handle multiple base words the default will be to
include them all. The `single' flag can be used to include only one of
them.
The `simple' flag will select an alternate, faster algorithm. This
algorithm is very similar to the `munch' command distributed with
MySpell (the OpenOffice spell checker); however, it doesn't give
nearly as good results. It does okay for the English word list but not
for some other languages such as German: the normal algorithm reduced a
list of 312,002 German words to 79,420 base words, while the simple
algorithm only reduced it to 115,927 words. This algorithm may
disappear in a future version of Aspell.
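To put the German figures in proportion, the remaining base words can
be expressed as a percentage of the original list. This is plain
arithmetic on the numbers quoted above; the function name `ratio' is
hypothetical.

```shell
# ratio (hypothetical helper): print base words as a percentage of the
# original word count, to one decimal place.
ratio() {
    LC_ALL=C awk -v base="$1" -v total="$2" \
        'BEGIN { printf "%.1f\n", 100 * base / total }'
}

ratio 79420 312002    # normal algorithm
ratio 115927 312002   # simple algorithm
```

This prints 25.5 and 37.2: the normal algorithm leaves roughly a
quarter of the original entries, the simple one more than a third.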