This Python script extracts emails and links from a given website URL. It crawls the domain for emails and retrieves all valid links on the web pages within the same domain.
- Extract Emails: Finds all email addresses from a webpage and other pages within the same domain.
- Crawl Links: Collects and follows all the links within the domain to continue email extraction.
- Domain Validation: The script checks if the provided URL is valid and belongs to the same domain before crawling.
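The three features above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. The function names, the simplified email pattern, and the `fetch` callable are all assumptions made for this example. It uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra packages.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Simplified email pattern (an assumption, not the script's actual regex).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def is_valid_url(url):
    """Domain validation: accept only http(s) URLs that include a host."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def extract_emails(html):
    """Return the set of email-like strings found in the page source."""
    return set(EMAIL_RE.findall(html))


def same_domain_links(html, base_url):
    """Return absolute http(s) links whose host matches base_url's host."""
    base_host = urlparse(base_url).netloc
    collector = LinkCollector()
    collector.feed(html)
    links = set()
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        if is_valid_url(absolute) and urlparse(absolute).netloc == base_host:
            links.add(absolute)
    return links


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl of one domain.

    fetch is a hypothetical caller-supplied function: fetch(url) -> HTML string.
    """
    seen = {start_url}
    queue = deque([start_url])
    emails = set()
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        html = fetch(url)
        emails |= extract_emails(html)
        for link in same_domain_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return emails
```

With Requests installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`. Passing the fetcher in as an argument keeps the crawl logic easy to try out against in-memory pages.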
Before running this script, ensure you have the following installed:
- Python 3.x
- Requests: To make HTTP requests.
- BeautifulSoup (bs4): For parsing HTML content.
- re: For regular expression matching (part of the Python standard library, so no separate installation is needed).
- Clone this repository and change into its directory:
git clone https://github.com/cambridgeitcollege/EmailScraperScript.git
cd EmailScraperScript
- Create a virtual environment (optional but recommended) and activate it.
# For Linux users
python3 -m virtualenv venv
source venv/bin/activate
- Install the requirements
pip install -r requirements.txt
- Run the main.py file
We welcome contributions! If you'd like to contribute to this Email Scraper Script, please check out our Contribution Guidelines.
Please review our Code of Conduct before participating in this project.