CheapER

CheapER is a tool for performing Entity Resolution tasks with few labeled training samples.

CheapER uses large language models within a noisy training framework, combined with adaptive fine-tuning, consistency training, adaptive softmax, and Monte Carlo dropout.
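To make one of these ingredients concrete, here is a minimal sketch of Monte Carlo dropout in plain Python. It is not CheapER's implementation: the toy logistic scorer, the feature values, the weights, and the function names (`stochastic_forward`, `mc_dropout_predict`) are all assumptions chosen for illustration. The idea shown is only the general technique: keep dropout active at prediction time, run several stochastic forward passes, and use the spread of the outputs as an uncertainty estimate.

```python
import math
import random
from statistics import mean, pstdev


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def stochastic_forward(features, weights, bias, drop_prob, rng):
    """One forward pass of a toy logistic scorer with dropout left ON."""
    keep_prob = 1.0 - drop_prob
    total = bias
    for x, w in zip(features, weights):
        if rng.random() < keep_prob:
            # Inverted dropout: rescale kept inputs so the expected sum is unchanged.
            total += (x / keep_prob) * w
    return sigmoid(total)


def mc_dropout_predict(features, weights, bias, drop_prob=0.2, passes=100, seed=0):
    """Average several stochastic passes; return (mean probability, std deviation)."""
    rng = random.Random(seed)
    probs = [
        stochastic_forward(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    return mean(probs), pstdev(probs)


# Hypothetical similarity features for one candidate record pair (assumed values).
pair_features = [0.9, 0.8, 0.1]
toy_weights = [2.0, 1.5, -1.0]

match_prob, uncertainty = mc_dropout_predict(pair_features, toy_weights, bias=-1.0)
print(f"match probability ~ {match_prob:.3f}, uncertainty ~ {uncertainty:.3f}")
```

A high standard deviation flags pairs on which the model is unsure. How CheapER actually uses this signal is not described here.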

Experiments

CheapER requires less labeled training data than state-of-the-art (SotA) systems (as of early 2023) to reach the same F1 score.
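For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall on the match class. The snippet below is a minimal sketch of that computation, not the logic of the evaluation script; the function name `f1_score` and the example labels are assumptions for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 on the positive (match = 1) class for two equal-length label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for eight candidate record pairs (1 = match, 0 = non-match).
gold = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

score = f1_score(gold, predicted)
print(f"F1 = {score:.3f}")
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.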

Experiments on the DeepMatcher datasets can be reproduced using the eval.py script.

Notebooks

Effectiveness of adaptive fine-tuning for the ER task.
CheapER training on 5% of the BeerAdvo-RateBeer dataset with a DistilBERT model.

Citing CheapER

If you extend or use this work, please cite:

@article{teofili2023cheaper,
  title={CheapER: Low Cost Entity Resolution},
  author={Teofili, Tommaso and Firmani, Donatella and Merialdo, Paolo},
  year={2023}
}
