A (multi-threaded) web crawler that is limited to a single domain. For instance, if the origin (starting point) is www.amazon.com, it crawls all pages within amazon.com but does not follow external links, for example to Facebook or Twitter.
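The core of the domain restriction is a check that compares each discovered link's host against the origin's domain. The sketch below (in Java) illustrates one way such a check could look; the class and method names are hypothetical and not taken from this repository's code.

import java.net.URI;

// Hypothetical sketch of a same-domain check: a link is followed only if its
// host is the origin's domain or a subdomain of it (assumed behavior).
public class DomainFilter {
    private final String baseDomain;

    public DomainFilter(String originUrl) {
        String host = URI.create(originUrl).getHost().toLowerCase();
        // Treat "www.amazon.com" and "amazon.com" as the same domain (assumption).
        this.baseDomain = host.startsWith("www.") ? host.substring(4) : host;
    }

    // Returns true when the link stays within the origin's domain.
    public boolean isSameDomain(String link) {
        String host = URI.create(link).getHost();
        if (host == null) {
            return false;
        }
        host = host.toLowerCase();
        return host.equals(baseDomain) || host.endsWith("." + baseDomain);
    }

    public static void main(String[] args) {
        DomainFilter filter = new DomainFilter("https://www.amazon.com");
        System.out.println(filter.isSameDomain("https://www.amazon.com/gp/help")); // true
        System.out.println(filter.isSameDomain("https://smile.amazon.com/"));      // true
        System.out.println(filter.isSameDomain("https://www.facebook.com/amazon")); // false
    }
}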
To get the source code and run the crawler, clone the GitHub repository and execute the launch script:

$ git clone https://github.com/shalomRachapudi/web-crawler.git
$ cd web-crawler
$ ./run.sh