Nepal Bhasa text corpus for NLP.
Text is in JSON Lines format; each line is one post.
- _raw.jsonl = original scraped text in Devanagari script
- _clean.jsonl = cleaned-up version in Devanagari script
- _newa.jsonl = cleaned-up version converted to Prachalit (Newa) script
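
As a minimal sketch of how one of these files could be read, the Python snippet below parses a JSON Lines file one post at a time. The helper name `read_jsonl` is hypothetical and not part of this corpus, and the code makes no assumption about which fields each post contains, since those are not described here.

```python
import json
from typing import Any, Iterator


def read_jsonl(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file.

    Hypothetical helper for illustration; not shipped with the corpus.
    """
    # UTF-8 is assumed here so Devanagari and Newa text decode correctly.
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, such as a trailing newline
                yield json.loads(line)
```

Calling `read_jsonl` on any of the three file variants and looping over the result yields the posts in file order without loading the whole file into memory.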
Scraped from these Nepal Bhasa news portals
Scraped without explicit permission. Intended for the betterment of Nepal Bhasa linguistic models and tools.