-
Notifications
You must be signed in to change notification settings - Fork 7
Readme
On Unix-like systems, please follow the basic installation process below:
./configure
make
make install
This process installs 2 commands, dawgdic-build and dawgdic-find, and headers <dawgdic/*.h> as a C++ header library. Outline of the commands and headers are given as follows:
- Commands
-
make-dawg
- Builds a dictionary from a lexicon.
- Keys must be divided by line separaters (LF).
- Keys must be sorted in dictionary order.
-
dawgdic
- Finds keys in prefixes of each input line.
-
make-dawg
- Headers
-
<dawgdic/*.h>
- Define classes for building a dictionary from a lexicon.
- Define classes for finding keys in a dictionary.
- Define classes for completing keys from a given prefix.
-
<dawgdic/*.h>
If you need to use dawgdic without installation, e.g., on Windows. Please copy headers in src/dawgdic to your project directory or include directory.
The construction process is as follows:
- Dawg construction
- Builds a simple dawg from a lexicon.
- Keys must be inserted in dictionary order.
- dawgdic::DawgBuilder is a class for building a simple dawg.
- dawgdic::Dawg is a simple dawg class.
- Dawg conversion
- Transforms a simple dawg into a dictionary.
- dawgdic::DictionaryBuilder is a class for building a dictionary.
- dawgdic::Dictionary is a dictionary class.
The following example builds a dictionary from a lexicon which consists of 2 keys "apple" and "orange" without records. In fact, the default value 0 is attached to the keys. If you need records for keys other than the default value, please pass a record to a member function Insert() of dawgdic::DawgBuilder as the 2nd or 3rd argument.
#include <ofstream>
#include <dawgdic/dawg-builder.h>
#include <dawgdic/dictionary-builder.h>
// Inserts keys into a simple dawg.
dawgdic::DawgBuilder dawg_builder;
dawg_builder.Insert("apple");
dawg_builder.Insert("orange");
// Finishes building a simple dawg.
dawgdic::Dawg dawg;
dawg_builder.Finish(&dawg);
// Builds a dictionary from a simple dawg.
dawgdic::Dictionary dic;
dawgdic::DictionaryBuilder::Build(dawg, &dic);
// Writes a dictionary into a file "test.dic".
std::ofstream dic_file("test.dic", ios::binary);
dic.Write(&dic_file);
The following sample code gives the way of simple dictionary lookups. If you need to find keys in prefixes of an input string, please use a member function Follow() of dawgdic::Dictionary.
#include <ifstream>
#include <iostream>
#include <dawgdic/dictionary.h>
// Reads a dictionary from a file "test.dic".
std::ifstream dic_file("test.dic", ios::binary);
dawgdic::Dictionary dic;
dic.Read(&dic_file);
// Checks if a key exists or not.
if (dic.Contains("apple"))
std::cout << "apple: found" << std::endl;
// Finds a key and gets its associated record.
if (dic.Find("banana") < 0)
std::cout << "banana: not found" << std::endl;
- Aoe, J.: An Efficient Digital Search Algorithm by Using a Double-Array Structure.
- IEEE Transactions on Software Engineering 15(9) (September 1989) 1066-1077
- Aoe, J., Morimoto, K., Sato, T.: An Efficient Implementation of Trie Structures.
- Software: Practice and Experience, 22(9) (September 1992) 695-721
- Daciuk, J., Watson, B.W., Mihov, S., Watson, R.E.: Incremental Construction of Minimal Acyclic Finite-State Automata.
- Computational Linguistics 26(1) (March 2000) 3-16
- Morita, K., Fuketa, M., Yamakawa, Y. and Aoe, J.: Fast insertion methods of a double-array structure.
- Software: Practice and Experience 31(1) (January 2001) 43-65
- Ciura, M.G., Deorowicz, S.: How to squeeze a lexicon.
- Software: Practice and Experience 31(11) (September 2001) 1077-1090
- Yata, S., Oono, M., Morita, K., Fuketa, M., Sumitomo, T. and Aoe, J.: A compact static double-array keeping character codes.
- Information Processing and Management 43(1) (January 2007) 237-247
- Yata, S., Morita, K., Fuketa, M., Aoe, J.: Fast String Matching with Space-efficient Word Graphs.
- Innovations in Information Technology (Innovations '08) Al Ain, United Arab Emirates (December 2008) 79-83
- Jan Daciuk: Jan Daciuk, Dept of Knowledge Engineering, Gdansk University of Technology
- Sam Allen: C# Directed Acyclic Word Graph
- Wutka Consulting, Inc.: Directed Acyclic Word Graphs
- Taku Kudo: Darts: Double-ARray Trie System (Japanese)
- Wikipedia: Directed acyclic word graph - Wikipedia, the free encyclopedia
- JohnPaul Adamovsky: Directed Acyclic Word Graph or DAWG