Nepal Bhasa text corpus for NLP.
The text is stored in JSON Lines format, with one post per line.
- _raw.jsonl = original scraped text in Devanagari script
- _clean.jsonl = cleaned-up version in Devanagari script
- _newa.jsonl = cleaned-up version converted to Prachalit (Newa) script
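Because each line of a JSON Lines file is a standalone JSON value, the files can be streamed one post at a time instead of being loaded whole. The sketch below is a minimal reader under that assumption only. This README does not document the fields inside each post, so the code makes no assumptions about them, and the function name read_posts is hypothetical.

```python
import json
from typing import Any, Iterator


def read_posts(path: str) -> Iterator[Any]:
    """Yield one parsed post per non-empty line of a JSON Lines file.

    Hypothetical helper: the structure of each post is not documented,
    so values are returned exactly as json.loads parses them.
    """
    # UTF-8 is needed to decode Devanagari and Prachalit (Newa) characters.
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)
```

For example, `for post in read_posts("example_clean.jsonl"): ...` iterates over every post in a file. The file name here is a placeholder, so substitute the actual _raw, _clean, or _newa file.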
Scraped from the following Nepal Bhasa news portals:
- nepalbhasatimes.com
- nepalmandal.com
The text was scraped without explicit permission. It is intended to be used for the betterment of Nepal Bhasa linguistic models and tools.