From 0777f846453af092d01cd1551972b45b52b88b5e Mon Sep 17 00:00:00 2001
From: Jean-Philippe Barrette-LaPierre
Date: Thu, 26 Feb 2015 18:27:48 +0800
Subject: [PATCH] Update README.md

Pretty close now from the original webpage
---
 README.md | 67 ++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 59 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index 9be287f..c2e2182 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,15 @@
 # Moman
 ## Description
-This is will eventually be a suite of tools to be used by an orthographic/grammatical checker and the checker itself. The tools are currently coded in Python, but I began to rewrite it in Lisp. Moman, the suite itself, will be consisted of the following tools:
-* *FineNight* is the FSA library.
+This was meant to be a suite of tools for an orthographic/grammatical checker, along with the checker itself. However, the project is mostly dead now, but I encourage you to look through the code and use it as inspiration or reference. The tools are currently written in Python; a while back I started rewriting them in Lisp, but that rewrite will never be finished. Moman, the suite itself, consists of the following tools:
+
+* [FineNight](#finenight) is the FSA library.
 * A FST library. (Not yet implemented)
-* *ZSpell* is the orthographic checker.
+* [ZSpell](#zspell) is the orthographic checker.
+
+The only part of the suite really worth mentioning is the "Fast String Correction" implementation, which is used by [Lucene's](https://lucene.apache.org/) FuzzyQuery. Michael McCandless's [article](http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html) describes how this project came to be included in Lucene.
 
 ## FineNight
-The *FineNight* library contains many algorithms for Finite State Automatons. That includes:
+The *FineNight* library contains many algorithms for finite-state automata (FSAs), including:
 * Union of two FSAs
 * Intersection of two FSAs
 * Complement of a FSAs
@@ -19,9 +22,57 @@ The *FineNight* library contains many algorithms for Finite State Automatons. Th
 * Minimization algorithm
 * Construction of an IADFA from a sorted dictionary
 * Graphviz support
-* Error-Tolerant IADFA
+* Error-Tolerant IADFA (featured in Michael McCandless's [article](http://blog.mikemccandless.com/2011/03/lucenes-fuzzyquery-is-100-times-faster.html))
+
+Almost all algorithms were taken from the book [Introduction to Automata Theory, Languages, and Computation](#hopcroft01). The minimization algorithm is an implementation of [Brzozowski's method](#brzozowski). In this method, the (possibly non-deterministic) automaton is reversed, determinized, reversed again and determinized again. I'll eventually add [Hopcroft's n log(n) minimization algorithm](#hopcroft).
+
+## ZSpell
+ZSpell is meant to be a competitor to Kevin Atkinson's aspell. At this time, ZSpell can suggest words within a Levenshtein distance of one. We previously used [Kemal Oflazer's algorithm](#oflazer96errortolerant), which is very slow; we now use a faster one ([Schulz and Mihov's algorithm](#schulz02fast)). However, the faster algorithm only handles substitution, removal and insertion. This means that transposition errors, like "ehllo" -> "hello", count as two operations.
 
-Almost all algorithms were taken from the book [Introduction to Automata Theory, Languages, and Computation](#hopcroft01). The minimization algorithm is an implementation of Brzozowski's method [2]. In this method, the (possibly non-deterministic) automaton is reversed, determinized, reversed and determinized. I'll eventually add the Hopcroft's nlog(n) minimization algorithm [3]
+TODOs include:
+* Add transposition errors to the Levenshtein-distance algorithm.
+* Add phonetic errors (spelling by sound).
+* Add derivation errors.
 
-[John E. Hopcroft](http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Hopcroft.html), Rajeev Motwani and
- Jefferey D. Ullman, Introduction to Automata Theory, Languages and Computation, 2nd edition, Adison-Wesley, 2001.
+## References
+* [John E. Hopcroft](http://www.cs.cornell.edu/Info/Department/Annual95/Faculty/Hopcroft.html), Rajeev Motwani and Jeffrey D. Ullman, Introduction to Automata Theory, Languages and Computation, 2nd edition, Addison-Wesley, 2001.
+* J. A. Brzozowski, Canonical regular expressions and minimal state graphs for definite events, in Mathematical Theory of Automata, Volume 12 of MRI Symposia Series, pp. 529-561, Polytechnic Press, Polytechnic Institute of Brooklyn, N.Y., 1962.
+* John E. Hopcroft, An n log n algorithm for minimizing the states in a finite automaton, in The Theory of Machines and Computations, Z. Kohavi (ed.), pp. 189-196, Academic Press, 1971.
+* Kemal Oflazer, Error-tolerant Finite State Recognition with Applications to Morphological Analysis and Spelling Correction, Computational Linguistics, 22(1), pp. 73-89, March 1996.
+* Klaus U. Schulz and Stoyan Mihov, Fast String Correction with Levenshtein-Automata, International Journal of Document Analysis and Recognition, 5(1):67-85, 2002.
+* Zbigniew J. Czech, George Havas and Bohdan S. Majewski, An Optimal Algorithm for Generating Minimal Perfect Hash Functions, Information Processing Letters, 43(5):257-264, 1992.
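
As a note on the README text in this patch: Brzozowski's method (reverse, determinize, reverse, determinize) can be sketched in a few lines of Python. This is only an illustrative sketch, not FineNight's actual code. The automaton representation (a tuple of states, alphabet, transition dict, start set and final set) and the function names `reverse`, `determinize`, `brzozowski` and `accepts` are all assumptions made for the example.

```python
from itertools import chain

# Hypothetical representation (not FineNight's API):
# (states, alphabet, delta, starts, finals), where
# delta maps (state, symbol) -> set of target states.

def reverse(states, alphabet, delta, starts, finals):
    """Flip every transition and swap the start and final sets."""
    rdelta = {}
    for (src, sym), dsts in delta.items():
        for dst in dsts:
            rdelta.setdefault((dst, sym), set()).add(src)
    return states, alphabet, rdelta, set(finals), set(starts)

def determinize(states, alphabet, delta, starts, finals):
    """Subset construction over reachable subsets only (no dead state)."""
    start = frozenset(starts)
    seen = {start}
    todo = [start]
    ddelta = {}
    while todo:
        cur = todo.pop()
        for sym in alphabet:
            nxt = frozenset(
                chain.from_iterable(delta.get((q, sym), ()) for q in cur)
            )
            if not nxt:
                continue  # leave the transition undefined (partial DFA)
            ddelta[(cur, sym)] = {nxt}
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    dfinals = {subset for subset in seen if subset & finals}
    return seen, alphabet, ddelta, {start}, dfinals

def brzozowski(automaton):
    """Reverse, determinize, reverse, determinize."""
    return determinize(*reverse(*determinize(*reverse(*automaton))))

def accepts(dfa, word):
    """Run a deterministic automaton (as built above) on a word."""
    _, _, delta, starts, finals = dfa
    (cur,) = starts
    for sym in word:
        nxt = delta.get((cur, sym))
        if nxt is None:
            return False
        (cur,) = nxt
    return cur in finals

# A 3-state DFA for "words over {a, b} ending in a"; states 0 and 2
# are equivalent, so the minimal automaton needs only two states.
dfa = (
    {0, 1, 2},
    ("a", "b"),
    {(0, "a"): {1}, (0, "b"): {2},
     (1, "a"): {1}, (1, "b"): {2},
     (2, "a"): {1}, (2, "b"): {2}},
    {0},
    {1},
)
minimal = brzozowski(dfa)
```

Each state of the result is a frozenset of subsets of the original states. A real library would probably relabel them, which this sketch skips for brevity.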