CheapER is a tool for performing Entity Resolution (ER) tasks with few labeled training samples.
CheapER uses large language models within a noisy training framework, combined with adaptive fine-tuning, consistency training, adaptive softmax, and Monte Carlo dropout.
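To illustrate one of the techniques named above, here is a minimal sketch of Monte Carlo dropout on a toy logistic matcher. It is not CheapER's implementation: the function names, the similarity features, the weights, and the default dropout rate are all hypothetical and chosen only for illustration. The idea is to keep dropout active at prediction time, run several stochastic forward passes, and treat the spread of the outputs as a rough uncertainty estimate.

```python
import math
import random
from statistics import mean, pstdev


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def stochastic_forward(features, weights, bias, drop_prob, rng):
    """One forward pass with dropout left ON (inverted-dropout scaling)."""
    keep = 1.0 - drop_prob
    logit = bias
    for x, w in zip(features, weights):
        # Each input is kept with probability `keep` and rescaled by 1/keep.
        if rng.random() < keep:
            logit += (x / keep) * w
    return sigmoid(logit)


def mc_dropout_predict(features, weights, bias, drop_prob=0.2, passes=50, seed=0):
    """Average several stochastic passes; return (mean probability, std dev)."""
    rng = random.Random(seed)
    probs = [
        stochastic_forward(features, weights, bias, drop_prob, rng)
        for _ in range(passes)
    ]
    return mean(probs), pstdev(probs)


# Hypothetical per-attribute similarity scores for one candidate record pair.
pair_features = [0.9, 0.7, 0.2]
# Hypothetical weights and bias, standing in for a trained model.
toy_weights = [2.0, 1.5, -0.5]
toy_bias = -1.0

match_prob, uncertainty = mc_dropout_predict(pair_features, toy_weights, toy_bias)
print(f"match probability: {match_prob:.3f}, uncertainty: {uncertainty:.3f}")
```

A larger standard deviation means the passes disagree more about the pair. In a low-label setting, such a signal could be used, for example, to decide which automatically labeled pairs to trust; whether CheapER uses it this way is not stated here.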
CheapER requires less labeled training data than state-of-the-art systems (as of early 2023) to reach the same F1 score.
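For reference, the F1 score on the match class is the harmonic mean of precision and recall. The sketch below computes it from binary labels (1 = match, 0 = non-match); the helper name `match_f1` and the example labels are hypothetical and are not taken from the CheapER code base.

```python
def match_f1(y_true, y_pred):
    """F1 on the positive (match) class for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No correct matches: define F1 as 0 to avoid division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for six candidate record pairs.
gold = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]
print(f"F1 = {match_f1(gold, predicted):.3f}")
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and so is F1.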
Experiments on the DeepMatcher datasets can be reproduced using the eval.py script.
- Effectiveness of adaptive fine-tuning for the ER task.
- CheapER training using 5% of the BeerAdvo-RateBeer dataset (using a DistilBERT model).
If you extend or use this work, please cite:
@article{teofili2023cheaper,
title={CheapER: Low Cost Entity Resolution},
author={Teofili, Tommaso and Firmani, Donatella and Merialdo, Paolo},
year={2023}
}