MeTA v2.1.0

@smassung smassung released this 13 Feb 03:15
· 365 commits to master since this release

New features

  • Add the GloVe algorithm for training word embeddings and a library class
    word_embeddings for loading and querying trained embeddings. To facilitate
    returning word embeddings, a simple util::array_view class was added.
  • Add simple vector math library (and move fastapprox into the math
    namespace).
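
To illustrate why a view type helps when returning embeddings, here is a minimal sketch of a non-owning array view over a flat embedding buffer. It is not MeTA's util::array_view or word_embeddings API: simple_array_view, embedding_table, and dot are hypothetical names chosen for this example, and the flat storage layout is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical minimal non-owning view over contiguous elements.
// This is NOT MeTA's util::array_view; it only illustrates the idea.
template <class T>
class simple_array_view
{
  public:
    simple_array_view(const T* start, std::size_t len)
        : start_{start}, len_{len}
    {
    }

    const T* begin() const { return start_; }
    const T* end() const { return start_ + len_; }
    std::size_t size() const { return len_; }

    const T& operator[](std::size_t idx) const
    {
        assert(idx < len_);
        return start_[idx];
    }

  private:
    const T* start_;
    std::size_t len_;
};

// Hypothetical embedding table: every vector lives in one flat buffer
// (an assumed layout), so a lookup can hand back a view instead of a copy.
class embedding_table
{
  public:
    embedding_table(std::size_t dim, std::vector<double> data)
        : dim_{dim}, data_(std::move(data))
    {
        assert(dim_ > 0 && data_.size() % dim_ == 0);
    }

    // Returns a view of row `id`; valid only while the table is alive.
    simple_array_view<double> at(std::size_t id) const
    {
        assert((id + 1) * dim_ <= data_.size());
        return {data_.data() + id * dim_, dim_};
    }

  private:
    std::size_t dim_;
    std::vector<double> data_;
};

// Dot product of two equal-length views.
inline double dot(simple_array_view<double> a, simple_array_view<double> b)
{
    assert(a.size() == b.size());
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```

Because the view only stores a pointer and a length, it must not outlive the table it points into.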

Bug fixes

  • Fix probe_map::extract() for inline_key_value_storage type; old
    implementation forgot to delete all sentinel values before returning the
    vector.
  • Fix incorrect definition of l1norm() in sgd_model.
  • Fix gmap calculation where 0 average precision was ignored.
  • Fix progress output in multiway_merge.
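
For context on the gmap fix: geometric mean average precision multiplies per-query scores, so skipping a zero score inflates the result. The sketch below shows one way to keep zeros in the calculation by flooring them at a small value before taking logs. The function name, signature, and the floor value 1e-5 are assumptions for illustration and may differ from what MeTA does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-query average precision values, computed in log
// space. Scores below `floor_value` (an assumed convention, not taken from
// MeTA) are raised to it so that zero-precision queries pull the mean down
// instead of being skipped.
inline double gmap(const std::vector<double>& avg_precisions,
                   double floor_value = 1e-5)
{
    assert(floor_value > 0.0);
    if (avg_precisions.empty())
        return 0.0;

    double log_sum = 0.0;
    for (double ap : avg_precisions)
        log_sum += std::log(std::max(ap, floor_value));

    return std::exp(log_sum / static_cast<double>(avg_precisions.size()));
}
```

With this convention, a query with zero average precision lowers the overall score sharply, which is the behavior the geometric mean is meant to provide.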

Enhancements

  • Improve performance of printing::progress. Before, progress::operator() in
    tight loops could dramatically hurt performance, particularly due to frequent
    calls to std::chrono::steady_clock::now(). Now, progress::operator()
    simply sets an atomic iteration counter and a background thread periodically
    wakes to update the progress output.
  • Allow full text storage in index as metadata field. If
    store-full-text = true (default false) is set in the corpus config, the
    string metadata field "content" will be added. This simplifies the creation
    of full text metadata: the user doesn't have to duplicate their dataset in
    metadata.dat, and metadata.dat will still be somewhat human-readable
    without large strings of full text added.
  • Allow make_index to take a user-supplied corpus object.
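
The printing::progress change above follows a common pattern: the hot loop only writes an atomic counter, and a background thread does the clock reads and output. Below is a minimal sketch of that pattern. The class simple_progress, its members, and its output format are hypothetical and do not reproduce MeTA's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <thread>

// Hypothetical progress reporter (not MeTA's printing::progress).
class simple_progress
{
  public:
    simple_progress(std::uint64_t total, std::chrono::milliseconds interval)
        : total_{total}, interval_{interval}, thread_{[this] { run(); }}
    {
        assert(interval.count() > 0);
    }

    ~simple_progress() { end(); }

    // Hot path: one relaxed atomic store, no clock reads and no I/O.
    void operator()(std::uint64_t iter)
    {
        iter_.store(iter, std::memory_order_relaxed);
    }

    std::uint64_t current() const
    {
        return iter_.load(std::memory_order_relaxed);
    }

    // Stops the background thread; calling it more than once is harmless.
    void end()
    {
        {
            std::lock_guard<std::mutex> lock{mutex_};
            if (done_)
                return;
            done_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

  private:
    // Background loop: wakes every `interval_` (or on end()) and prints.
    void run()
    {
        std::unique_lock<std::mutex> lock{mutex_};
        while (!done_)
        {
            print();
            cv_.wait_for(lock, interval_, [this] { return done_; });
        }
        print();
        std::cerr << '\n';
    }

    void print() const
    {
        std::cerr << '\r' << current() << " / " << total_ << std::flush;
    }

    std::uint64_t total_;
    std::chrono::milliseconds interval_;
    std::atomic<std::uint64_t> iter_{0};
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread thread_; // declared last so it starts after the other members
};
```

The reporting thread is declared as the last member so that the counter, mutex, and condition variable are fully constructed before it starts running.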

Miscellaneous

  • ZLIB is now a required dependency.
  • Switch to using the standalone ./unit-test executable instead of ctest.
    With the new unit test framework, CTest no longer offers us many
    advantages, so we just use our own unit test executable.