(aspell.info.gz) Unsupported
B.2 Unsupported
===============
These languages, when written in the given script, are currently
unsupported by Aspell for one reason or another.
Code  Language Name  Script
ja    Japanese       Japanese
km    Khmer          Khmer
ko    Korean         Han, Hangul
lo    Lao            Lao
th    Thai           Thai
zh    Chinese        Han
B.2.1 The Thai, Khmer, and Lao Scripts
--------------------------------------
The Thai, Khmer, and Lao scripts present a different problem for
Aspell. The problem is not that there are more than 210 unique symbols,
but that there are no spaces between words. This means that there is no
easy way to split a sentence into individual words. However, it is
still possible to spell check these scripts; it is just a lot more
difficult. I will be happy to work with someone who is interested in
adding Thai, Khmer, or Lao support to Aspell, but it is not likely
something I will do on my own in the foreseeable future.
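As a rough illustration of what splitting unspaced text into words
involves, here is a minimal sketch of dictionary-based segmentation in
Python. It is not part of Aspell; the function name `segment` and the
three-word Thai list are made up for this example. It uses dynamic
programming to prefer the split that leaves the fewest characters
unmatched, and marks whatever it cannot match so that a checker could
treat those spans as possible misspellings.

```python
def segment(text, words):
    """Split unspaced text into (token, is_known) pairs.

    Dynamic programming over prefixes: prefer segmentations that leave
    the fewest characters unmatched, then those with the fewest tokens.
    """
    max_len = max(map(len, words), default=1)
    # best[i] = (unmatched_chars, token_count, start_of_last_token, last_is_known)
    best = [(0, 0, 0, True)]
    for i in range(1, len(text) + 1):
        # Fallback: treat text[i-1] as a single unmatched character.
        unk, cnt, _, _ = best[i - 1]
        choice = (unk + 1, cnt + 1, i - 1, False)
        # Try every dictionary word that ends at position i.
        for j in range(max(0, i - max_len), i):
            if text[j:i] in words:
                unk, cnt, _, _ = best[j]
                choice = min(choice, (unk, cnt + 1, j, True))
        best.append(choice)

    # Walk back through the table to recover the chosen tokens.
    tokens = []
    i = len(text)
    while i > 0:
        _, _, j, known = best[i]
        tokens.append((text[j:i], known))
        i = j
    tokens.reverse()

    # Merge neighbouring unmatched characters into one unknown span.
    merged = []
    for tok, known in tokens:
        if not known and merged and not merged[-1][1]:
            merged[-1] = (merged[-1][0] + tok, False)
        else:
            merged.append((tok, known))
    return merged


# Hypothetical toy word list; a real one would hold many thousands of words.
WORDS = {"ฉัน", "รัก", "แมว"}
print(segment("ฉันรักแมว", WORDS))
```

Even this toy version shows the difficulty: the result depends entirely
on the word list, and text with several valid splits needs some way to
choose between them.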
B.2.2 Languages which use Hànzi Characters
------------------------------------------
Hànzi characters are used to write Chinese, Japanese, and Korean, and
were once used to write Vietnamese. Each hànzi character represents a
syllable of a spoken word and also has a meaning. Since there are
around 3,000 of them in common usage, it is unlikely that Aspell will
be able to support spell checking languages written using hànzi
until full Unicode support is implemented. However, I am not even sure
whether these languages need spell checking, since hànzi characters are
generally not entered directly. Furthermore, even if Aspell could
spell check hànzi, the existing suggestion strategy would not work well
at all, and thus a completely new strategy would need to be developed.
However, if hànzi does need to be spell checked and you know something
about the issues involved, please feel free to contact me.
B.2.3 Japanese
--------------
Modern Japanese is written in a mixture of "hiragana", "katakana",
"kanji", and sometimes "romaji". "Hiragana" and "katakana" are both
syllabaries unique to Japan, "kanji" is a modified form of hànzi, and
"romaji" uses the Latin alphabet. With some work, Aspell should be
able to check the non-kanji part of Japanese text. However, based on
my limited understanding of Japanese, hiragana is often used at the end
of kanji. Thus, if Aspell were to simply separate out the hiragana from
the kanji, it would end up with a lot of word endings which are not
proper words and would thus be flagged as misspellings. However, this
can be fairly easily rectified, as text is tokenized into words before
it is converted into Aspell's internal encoding. In fact, some Japanese
text is written entirely in one script. For example, books for children
and foreigners are sometimes written entirely in hiragana. Thus,
Aspell, in its current state, could prove at least somewhat useful for
spell checking Japanese.
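The idea of handling word endings at the tokenization stage could be
sketched roughly as follows, in Python. This is only an illustration
and not how Aspell tokenizes text: the names `script_of` and
`checkable_runs` are made up, the script test uses only the basic
Unicode Hiragana, Katakana, and CJK Unified Ideographs blocks, and the
rule "skip hiragana that directly follows kanji" is a deliberate
oversimplification.

```python
from itertools import groupby


def script_of(ch):
    """Classify one character by Unicode block (basic blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def checkable_runs(text):
    """Return the kana runs a checker might look at.

    Text is split into runs of a single script. Katakana runs are kept.
    A hiragana run is kept unless it directly follows kanji, in which
    case it is assumed to be a word ending and is skipped.
    """
    kept = []
    previous = None
    for script, chars in groupby(text, key=script_of):
        run = "".join(chars)
        if script == "katakana":
            kept.append(run)
        elif script == "hiragana" and previous != "kanji":
            kept.append(run)
        previous = script
    return kept


print(checkable_runs("わたしはパンを食べる"))
```

Note that the kept runs are not yet words. A hiragana run can contain
several words with no spaces between them, so some form of word
segmentation would still be needed before dictionary lookup.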
B.2.4 Hangul
------------
Korean is generally written in hangul or a mixture of han and hangul.
In Hangul, individual letters, known as jamo, are grouped together in
syllable blocks. Unicode allows Hangul to be stored in one of three
ways: (A) individual jamo letters (Hangul Compatibility Jamo,
U+3130 - U+318F), (D) decomposed jamo (Hangul Jamo, U+1100 - U+11FF),
and (C) precomposed syllable blocks (Hangul Syllables, U+AC00 - U+D7AF).
In order for Aspell to work with Hangul, it needs to be in form A.
Unfortunately, the existing normalization code in Aspell will not be
able to adequately deal with converting Hangul from forms D and C to
form A and back again. However, once this code is written, Aspell
should be able to spell check Hangul without any problem.
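To make the three forms concrete, here is a minimal sketch in Python of
the conversion from forms C and D to form A. It is not Aspell code, and
the name `to_compat_jamo` is made up for this example. It relies on two
facts from the Unicode Standard: NFC normalization composes decomposed
jamo into precomposed syllables, and a precomposed syllable can be
split arithmetically into a leading consonant, a vowel, and an optional
trailing consonant.

```python
import unicodedata

# Compatibility jamo (form A), listed in the order the Unicode Hangul
# syllable arithmetic uses for leading consonants, vowels, and trailing
# consonants. TAIL[0] is empty because a syllable may have no trailing
# consonant.
LEAD = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWEL = "".join(chr(0x314F + i) for i in range(21))
TAIL = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

S_BASE = 0xAC00    # first precomposed syllable
S_COUNT = 11172    # number of precomposed syllables (19 * 21 * 28)
N_COUNT = 588      # syllables per leading consonant (21 * 28)
T_COUNT = 28       # trailing-consonant slots per vowel


def to_compat_jamo(text):
    """Convert Hangul in form C or D to individual jamo letters (form A)."""
    out = []
    # NFC first, so that decomposed jamo (form D) become syllables (form C).
    for ch in unicodedata.normalize("NFC", text):
        index = ord(ch) - S_BASE
        if 0 <= index < S_COUNT:
            lead, rest = divmod(index, N_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            out.append(LEAD[lead] + VOWEL[vowel] + TAIL[tail])
        else:
            # Not a precomposed syllable: pass through unchanged.
            out.append(ch)
    return "".join(out)


print(to_compat_jamo("한글"))
```

This sketch only goes one way. Converting form A back into syllable
blocks is harder, since the jamo sequence alone does not say where one
syllable ends and the next begins.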