Web-Page-Data-Extraction-Script

This Python script reads a list of URLs from a spreadsheet, fetches the source code for each page, extracts multiple matches using a regular expression (regex) pattern, and writes the results to a new spreadsheet. Each match found on a page is written as a row in the output file.

Prerequisites

Before you run the script, ensure you have the following installed:

Python 3.x
pandas library
requests library
openpyxl library

You can install the required libraries using pip:

pip install pandas requests openpyxl
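To confirm that the three libraries are installed before running the script, a small check like the one below can help. It is a hypothetical helper and not part of the repository. It relies on importlib.metadata, which requires Python 3.8 or newer, and it assumes the installed distribution names match the names used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list mirroring the pip command above.
REQUIRED = ["pandas", "requests", "openpyxl"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be added with the pip command above.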

Script Explanation

The script performs the following steps:

Read URLs from the spreadsheet: It uses the pandas library to read the URLs from the input Excel file (urls.xlsx).
Fetch the web pages: It uses the requests library to retrieve the HTML source of each URL.
Extract data using regex: It uses Python's built-in re module to find all matches of the regex pattern in each page's source.
Write the extracted data to a new spreadsheet: It uses the pandas library to write the results to a new Excel file (extracted_data.xlsx), one row per match.
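The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's actual script: the input column name URL, the output columns URL and Match, the example title pattern, the helper function names, and the 10-second request timeout are all hypothetical choices.

```python
from __future__ import annotations

import re

import pandas as pd
import requests

# Hypothetical settings: rename to match your own files and data.
INPUT_FILE = "urls.xlsx"
OUTPUT_FILE = "extracted_data.xlsx"
URL_COLUMN = "URL"  # assumed header of the column that holds the URLs
pattern = r"<title>(.*?)</title>"  # example only; one capture group gives one string per match


def read_urls(path: str, column: str = URL_COLUMN) -> list[str]:
    """Step 1: read the URLs from the input spreadsheet, skipping empty cells."""
    df = pd.read_excel(path)
    return df[column].dropna().astype(str).tolist()


def fetch_source(url: str, timeout: float = 10.0) -> str | None:
    """Step 2: download the page source, or return None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as failures
        return response.text
    except requests.RequestException as exc:
        print(f"Skipping {url}: {exc}")
        return None


def extract_rows(url: str, source: str, regex: re.Pattern) -> list[dict]:
    """Step 3: build one row for every regex match found in the page source."""
    return [{"URL": url, "Match": match} for match in regex.findall(source)]


def main() -> None:
    regex = re.compile(pattern)
    rows: list[dict] = []
    for url in read_urls(INPUT_FILE):
        source = fetch_source(url)
        if source is not None:
            rows.extend(extract_rows(url, source, regex))
    # Step 4: write every match as its own row in the output spreadsheet.
    pd.DataFrame(rows, columns=["URL", "Match"]).to_excel(OUTPUT_FILE, index=False)
```

Calling main() runs the whole pipeline. Failed requests are reported and skipped rather than stopping the run, so one unreachable URL does not lose the results from the others.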

Customization

Regex Pattern: Update the pattern variable with your desired regex pattern to extract specific data from the web pages.
Input/Output Files: Adjust the file paths and names for the input (urls.xlsx) and output (extracted_data.xlsx) files as needed.
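Assuming the script collects matches with re.findall (the steps above say it uses re to find all matches), what each output row contains depends on how the pattern is written. The sample HTML below is made up for illustration.

```python
import re

# Hypothetical page source, used only for illustration.
html = '<a href="/a">A</a> <a href="/b">B</a>'

# No capture group: each match is the whole matched text.
full = re.findall(r'href="[^"]*"', html)
# One capture group: each match is only the captured text.
links = re.findall(r'href="([^"]*)"', html)
# Two or more capture groups: each match is a tuple of the groups.
pairs = re.findall(r'href="([^"]*)">([^<]*)<', html)

print(full, links, pairs, sep="\n")
```

Writing patterns as raw strings (r"...") avoids having to double every backslash. A pattern with a single capture group is usually the simplest choice, since each match is then a plain string that fits in one spreadsheet cell.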

Contact

If you encounter any issues or have questions, feel free to open an issue or contact Darnelle Melvin.

License

Source code is made available under the BSD 3-Clause License.
