Elinor: Evaluation Library in INfOrmation Retrieval

News: The CLI tools are now available in the elinor-cli directory!

Elinor is a Rust library for evaluating information retrieval (IR) systems. It provides a comprehensive set of metrics and statistical tests for evaluating and comparing IR systems.

Key features

IR-specific design: Elinor is tailored specifically for evaluating IR systems, with an intuitive interface designed for IR engineers. It offers a streamlined workflow that simplifies common IR evaluation tasks.
Comprehensive evaluation metrics: Elinor supports a wide range of standard evaluation metrics, such as Precision, MAP, MRR, and nDCG. The full list of supported metrics is documented in Metric. The evaluation results are validated against trec_eval to ensure accuracy and reliability.
In-depth statistical testing: Elinor includes several statistical tests, such as Student's t-test, Bootstrap test, and Randomized Tukey HSD test. Beyond p-values, it reports other important statistics, such as effect sizes and confidence intervals, for thorough reporting. See the statistical_tests module for more details.
Command-line tools: elinor-cli provides command-line tools for evaluating and comparing IR systems. The tools support various metrics and statistical tests, facilitating comprehensive evaluations and in-depth analyses.
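
To make the metric list above more concrete, here is a minimal from-scratch sketch of nDCG@k in plain Rust. It does not use Elinor's API: the function names dcg_at_k and ndcg_at_k, the sample grades, and the choice of linear gain with a log2(rank + 1) discount are assumptions made for illustration. Consult the Metric documentation for the definitions Elinor actually implements.

```rust
/// DCG@k with linear gain and a log2(rank + 1) discount (an assumed variant).
/// `gains` holds relevance grades in rank order, best rank first.
fn dcg_at_k(gains: &[f64], k: usize) -> f64 {
    gains
        .iter()
        .take(k)
        .enumerate()
        // `i` is 0-based, so rank = i + 1 and the discount is log2(i + 2).
        .map(|(i, g)| g / ((i + 2) as f64).log2())
        .sum()
}

/// nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.
/// `ranked` holds the grades of the retrieved documents in rank order;
/// `judged` holds the grades of all judged documents for the topic.
fn ndcg_at_k(ranked: &[f64], judged: &[f64], k: usize) -> f64 {
    let mut ideal = judged.to_vec();
    // Sort grades in descending order to obtain the ideal ranking.
    ideal.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let idcg = dcg_at_k(&ideal, k);
    if idcg == 0.0 {
        0.0
    } else {
        dcg_at_k(ranked, k) / idcg
    }
}

fn main() {
    // Hypothetical topic: two relevant documents with grades 2 and 1.
    let judged = [2.0, 1.0];
    // Hypothetical run: grade-1 document at rank 1, grade-2 document at rank 3.
    let ranked = [1.0, 0.0, 2.0];
    println!("ndcg@3 = {:.4}", ndcg_at_k(&ranked, &judged, 3));
}
```

In this toy case the run scores below 1.0 because the most relevant document is retrieved at rank 3 rather than rank 1.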

API documentation

See https://docs.rs/elinor/.

Alternatively, build and open the documentation locally by running the following command:

RUSTDOCFLAGS="--html-in-header katex.html" cargo doc --no-deps --features serde --open

Command-line tools

elinor-cli provides command-line tools for evaluating and comparing IR systems.

For example, the tools report the following statistics for two-system and multi-system comparisons:

Two-system comparison

# Means
+--------+----------+----------+
| Metric | System_1 | System_2 |
+--------+----------+----------+
| ndcg@5 | 0.3450   | 0.2700   |
+--------+----------+----------+

# Two-sided paired Student's t-test for (System_1 - System_2)
+--------+--------+--------+--------+--------+---------+---------+
| Metric | Mean   | Var    | ES     | t-stat | p-value | 95% MOE |
+--------+--------+--------+--------+--------+---------+---------+
| ndcg@5 | 0.0750 | 0.0251 | 0.4731 | 2.1158 | 0.0478  | 0.0742  |
+--------+--------+--------+--------+--------+---------+---------+

# Two-sided paired Bootstrap test (n_resamples = 10000)
+--------+---------+
| Metric | p-value |
+--------+---------+
| ndcg@5 | 0.0511  |
+--------+---------+

# Fisher's randomized test (n_iters = 10000)
+--------+---------+
| Metric | p-value |
+--------+---------+
| ndcg@5 | 0.0498  |
+--------+---------+
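
The columns of the paired t-test table are related by simple arithmetic. The sketch below computes them from per-topic scores in plain Rust. It is not Elinor's implementation: the PairedStats struct, the paired_stats function, and the sample scores are hypothetical, and the effect size is assumed to be the mean difference divided by the sample standard deviation of the differences. The p-value and margin of error need the t-distribution and are left out.

```rust
/// Summary statistics of a paired t-test on per-topic score differences.
struct PairedStats {
    mean: f64,        // mean of the differences
    var: f64,         // unbiased sample variance of the differences
    effect_size: f64, // mean / sqrt(var)
    t_stat: f64,      // mean / sqrt(var / n)
}

fn paired_stats(a: &[f64], b: &[f64]) -> PairedStats {
    assert_eq!(a.len(), b.len(), "both systems must cover the same topics");
    assert!(a.len() >= 2, "at least two topics are required");
    let n = a.len() as f64;
    // Per-topic differences (system A minus system B).
    let diffs: Vec<f64> = a.iter().zip(b).map(|(x, y)| x - y).collect();
    let mean = diffs.iter().sum::<f64>() / n;
    let var = diffs.iter().map(|d| (d - mean).powi(2)).sum::<f64>() / (n - 1.0);
    PairedStats {
        mean,
        var,
        effect_size: mean / var.sqrt(),
        t_stat: mean / (var / n).sqrt(),
    }
}

fn main() {
    // Hypothetical per-topic scores for two systems over four topics.
    let sys_a = [0.5, 0.4, 0.3, 0.6];
    let sys_b = [0.4, 0.2, 0.3, 0.3];
    let s = paired_stats(&sys_a, &sys_b);
    println!(
        "mean={:.4} var={:.4} ES={:.4} t={:.4}",
        s.mean, s.var, s.effect_size, s.t_stat
    );
}
```

Under these definitions the t-statistic equals the effect size times sqrt(n). The table above appears consistent with this: 0.0750 / sqrt(0.0251) is about 0.473, and 0.473 * sqrt(20) is about 2.116, assuming the run covers 20 topics (inferred from the 19 degrees of freedom in the multi-system output, not stated in the two-system output).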

Multi-system comparison

# ndcg@5
## System means
+----------+--------+---------+
| System   | Mean   | 95% MOE |
+----------+--------+---------+
| System_1 | 0.3450 | 0.0670  |
| System_2 | 0.2700 | 0.0670  |
| System_3 | 0.2450 | 0.0670  |
+----------+--------+---------+
## Two-way ANOVA without replication
+-----------------+------------+----+----------+--------+---------+
| Factor          | Variation  | DF | Variance | F-stat | p-value |
+-----------------+------------+----+----------+--------+---------+
| Between-systems | 0.1083     | 2  | 0.0542   | 2.4749 | 0.0976  |
| Between-topics  | 1.0293     | 19 | 0.0542   | 2.4754 | 0.0086  |
| Residual        | 0.8317     | 38 | 0.0219   |        |         |
+-----------------+------------+----+----------+--------+---------+
## Effect sizes for Tukey HSD test
+----------+----------+----------+----------+
| ES       | System_1 | System_2 | System_3 |
+----------+----------+----------+----------+
| System_1 | 0.0000   | 0.5070   | 0.6760   |
| System_2 | -0.5070  | 0.0000   | 0.1690   |
| System_3 | -0.6760  | -0.1690  | 0.0000   |
+----------+----------+----------+----------+
## p-values for randomized Tukey HSD test (n_iters = 10000)
+----------+----------+----------+----------+
| p-value  | System_1 | System_2 | System_3 |
+----------+----------+----------+----------+
| System_1 | 1.0000   | 0.2561   | 0.1040   |
| System_2 | 0.2561   | 1.0000   | 0.8926   |
| System_3 | 0.1040   | 0.8926   | 1.0000   |
+----------+----------+----------+----------+
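
The ANOVA and effect-size tables above can be read as follows: each Variance is Variation divided by DF, each F-stat is a factor's Variance divided by the residual Variance, and each effect size is a difference of system means divided by the square root of the residual Variance. The sketch below reproduces that arithmetic on a tiny topic-by-system score matrix in plain Rust. It is an illustration under those assumptions, not Elinor's code: the Row and Anova types, the function names, and the data are hypothetical, and p-values are omitted because they require the F-distribution.

```rust
/// One row of the ANOVA table.
struct Row {
    variation: f64, // sum of squares
    df: f64,        // degrees of freedom
    variance: f64,  // mean square = variation / df
}

struct Anova {
    system_means: Vec<f64>,
    between_systems: Row,
    between_topics: Row,
    residual: Row,
}

fn row(variation: f64, df: f64) -> Row {
    Row { variation, df, variance: variation / df }
}

/// Two-way ANOVA without replication; `scores[t][s]` is system `s` on topic `t`.
fn two_way_anova(scores: &[Vec<f64>]) -> Anova {
    let n = scores.len() as f64; // number of topics
    let m = scores[0].len() as f64; // number of systems
    let grand = scores.iter().flatten().sum::<f64>() / (n * m);
    let system_means: Vec<f64> = (0..scores[0].len())
        .map(|s| scores.iter().map(|r| r[s]).sum::<f64>() / n)
        .collect();
    let topic_means: Vec<f64> = scores.iter().map(|r| r.iter().sum::<f64>() / m).collect();
    let ss_sys = n * system_means.iter().map(|x| (x - grand).powi(2)).sum::<f64>();
    let ss_top = m * topic_means.iter().map(|x| (x - grand).powi(2)).sum::<f64>();
    let ss_tot = scores.iter().flatten().map(|x| (x - grand).powi(2)).sum::<f64>();
    Anova {
        system_means,
        between_systems: row(ss_sys, m - 1.0),
        between_topics: row(ss_top, n - 1.0),
        // Residual variation is what the two factors leave unexplained.
        residual: row(ss_tot - ss_sys - ss_top, (m - 1.0) * (n - 1.0)),
    }
}

/// Effect size between systems `i` and `j`, scaled by the residual variance.
fn effect_size(a: &Anova, i: usize, j: usize) -> f64 {
    (a.system_means[i] - a.system_means[j]) / a.residual.variance.sqrt()
}

fn main() {
    // Hypothetical scores: 3 topics (rows) by 3 systems (columns).
    let scores = vec![
        vec![0.5, 0.4, 0.3],
        vec![0.7, 0.5, 0.6],
        vec![0.3, 0.3, 0.0],
    ];
    let a = two_way_anova(&scores);
    for (name, r) in [("systems", &a.between_systems), ("topics", &a.between_topics)] {
        let f = r.variance / a.residual.variance;
        println!("{name}: S={:.4} DF={} V={:.4} F={:.4}", r.variation, r.df, r.variance, f);
    }
    println!("residual: S={:.4} DF={}", a.residual.variation, a.residual.df);
    println!("ES(0,1) = {:.4}", effect_size(&a, 0, 1));
}
```

Applied to the printed output, the same arithmetic gives 0.8317 / 38, about 0.0219, for the residual variance, and (0.3450 - 0.2700) / sqrt(0.0219), about 0.507, for the System_1 versus System_2 effect size.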

Correctness verification

In addition to simple unit tests, Elinor's evaluation results are validated to ensure accuracy and reliability:

The metrics are validated against trec_eval using its test data.
The statistical tests are validated against the results in Sakai's book using its sample data.

Acknowledgments

This library is inspired by Sakai's books on IR evaluation and statistical testing:

Tetsuya Sakai (酒井哲也). 情報アクセス評価方法論 (Information Access Evaluation Methodology). コロナ社 (Corona Publishing), 2015. In Japanese.
Tetsuya Sakai. Laboratory Experiments in Information Retrieval: Sample Sizes, Effect Sizes, and Statistical Power. Springer, 2018.

I recommend reading these books before using this library.

Licensing

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.
