A Catalan corpus built on the concepts of MLSUM (https://github.com/recitalAI/MLSUM).
Original content is from Vilaweb, licensed under Creative Commons Attribution-NonCommercial-NoDerivs, which allows sharing.
Files:
- URLs used: urls/train.ca.txt.urls
- Text and summaries: processed/ca_train.txt (2678 entries)
The text and summaries are in the same format as the MLSUM corpus (tab-separated).
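
A minimal sketch of loading the tab-separated file in Python. The function names `parse_tsv_lines` and `load_corpus`, the UTF-8 encoding, and the one-entry-per-line layout are assumptions for illustration. The column order is not documented here, so the sketch returns the raw fields of each line rather than naming them.

```python
from pathlib import Path


def parse_tsv_lines(lines):
    """Split each non-blank line on tabs and return a list of field lists."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")  # drop the line terminator only
        if not line.strip():
            continue  # skip blank lines
        rows.append(line.split("\t"))
    return rows


def load_corpus(path="processed/ca_train.txt"):
    """Read the corpus file (assumed UTF-8, one entry per line)."""
    with Path(path).open(encoding="utf-8") as handle:
        return parse_tsv_lines(handle)
```

If the one-entry-per-line assumption holds, `len(load_corpus())` should match the 2678 entries listed above.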