sapradhan/nepalbhasa-corpus

nepalbhasa-corpus

Nepal Bhasa text corpus for NLP.

Text is in JSON Lines format; each line is one post.

File key

  • _raw.jsonl = original scraped text in Devanagari script
  • _clean.jsonl = cleaned-up version in Devanagari script
  • _newa.jsonl = cleaned-up version converted to Prachalit (Newa) script
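
Reading a file

A minimal sketch of loading one of these files in Python, assuming only what is stated above: each line holds one post as a standalone JSON value. The helper name `read_posts` is hypothetical, and no field names are assumed because the schema of a post is not documented here.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def read_posts(path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of a JSON Lines file."""
    # UTF-8 is assumed, since the text is in Devanagari or Newa script.
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)
```

Call it with the path to any of the `_raw.jsonl`, `_clean.jsonl`, or `_newa.jsonl` files, for example `for post in read_posts(path): print(post)`. Posts are yielded lazily, so a large file does not have to fit in memory.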

Source

Scraped from these Nepal Bhasa news portals:

  • nepalbhasatimes.com
  • nepalmandal.com

Scraped without explicit permission. To be used for the betterment of Nepal Bhasa linguistic models and tools.
