Merge CSVs - Union Join

Description

This script will merge your CSVs into one file by performing a union operation, eliminating duplicate records.

An illustration of a union join can be found below:

Key Features

Duplicate Row Handling: This script handles duplicate rows by storing previously processed rows in a set, which allows fast duplicate checks (O(1) average lookup time) before each new row is written.
Handles Varying Column Orders: This script can process CSV files with columns in different sequences, further ensuring accurate duplicate checks.
Overwrite File Warning: If your specified output file already exists, the script will warn you and ask you to confirm before overwriting it.
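
The duplicate check and column-order handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in merge_csv.py: the function name `merge_rows` and its file-object interface are hypothetical, and it assumes every row has exactly as many fields as its header.

```python
import csv
import io


def merge_rows(sources, delimiter=","):
    """Union the rows of several CSV sources, dropping duplicates.

    `sources` is an iterable of open text file objects (hypothetical
    interface, chosen only for this sketch).
    """
    columns = []  # union of column names, in first-seen order
    seen = set()  # canonical form of every row already kept
    merged = []   # de-duplicated rows as dicts

    for source in sources:
        reader = csv.DictReader(source, delimiter=delimiter)
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        for row in reader:
            # Sorting the (column, value) pairs by column name makes the
            # key independent of the column order in the source file.
            key = tuple(sorted(row.items()))
            if key not in seen:  # set membership test, O(1) on average
                seen.add(key)
                merged.append(row)
    return columns, merged


if __name__ == "__main__":
    first = io.StringIO("id,name\n1,Ann\n2,Bob\n")
    second = io.StringIO("name,id\nBob,2\nCy,3\n")  # same columns, other order
    columns, rows = merge_rows([first, second])
    print(columns, len(rows))
```

In the demo, the row for Bob appears in both inputs with its columns in a different order, so it is kept only once and three rows remain.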

Getting Started

Dependencies

Python 3

Executing script

Step One - Clone Repo

git clone https://github.com/kirby-jack/merge-csvs-union-join.git

cd merge-csvs-union-join

Step Two - Launch Script

python3 main.py -d '/PATH/TO/CSV/FILES' -o 'OUTPUT_NAME' -s 'SEPARATOR'

Parameter Usage

Replace '/PATH/TO/CSV/FILES' with the full path to the directory containing the CSV files you wish to merge.

Replace OUTPUT_NAME with your desired output file name, without an extension. OUTPUT_NAME should not include a period (.).

Replace SEPARATOR with a valid separator value. Available SEPARATOR values include:

comma: ','
semicolon: ';'
space: ' '
tab: '\t'
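
For readers curious how the three parameters might be parsed, here is a minimal argparse sketch. Only the flags -d, -o and -s come from the command above; the function names, the `dest` names, the choice to make every flag required, and the validation rules are assumptions, and the real args.py may differ.

```python
import argparse

# The four separators listed above.
SEPARATORS = (",", ";", " ", "\t")


def output_name(value):
    """Reject names with a period, assuming '.csv' is appended later."""
    if "." in value:
        raise argparse.ArgumentTypeError("OUTPUT_NAME should not include a period")
    return value


def separator(value):
    """Validate the separator, accepting the two-character text \\t for tab."""
    # A shell passes '\t' in single quotes as a backslash followed by 't',
    # so translate that text into a real tab character.
    if value == "\\t":
        value = "\t"
    if value not in SEPARATORS:
        raise argparse.ArgumentTypeError("unsupported separator")
    return value


def build_parser():
    parser = argparse.ArgumentParser(description="Merge CSVs with a union join")
    parser.add_argument("-d", dest="directory", required=True,
                        help="directory containing the CSV files to merge")
    parser.add_argument("-o", dest="output", required=True, type=output_name,
                        help="output file name without extension")
    parser.add_argument("-s", dest="separator", required=True, type=separator,
                        help="field separator: comma, semicolon, space or tab")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.directory, args.output, repr(args.separator))
```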

Step Three - Finished

Your specified directory will now contain a merged CSV named OUTPUT_NAME.csv, where OUTPUT_NAME is the name you chose.
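
The output path and the overwrite warning listed under Key Features could be handled along these lines. This is a sketch under assumptions: `resolve_output` and its `ask` parameter are hypothetical names, and the prompt wording is invented for illustration.

```python
from pathlib import Path


def resolve_output(directory, output_name, ask=input):
    """Return the output path, or None if the user declines to overwrite.

    `ask` defaults to the built-in input() and can be swapped out in tests.
    """
    path = Path(directory) / f"{output_name}.csv"
    if path.exists():
        answer = ask(f"{path} already exists. Overwrite? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            return None
    return path
```

Passing the prompt function as a parameter keeps the check easy to exercise without a terminal.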

Help

Run

python3 main.py -h

If you encounter any issues, please contact me on LinkedIn - https://www.linkedin.com/in/kirby-jack/

Authors

Jack Kirby - https://www.linkedin.com/in/kirby-jack/

Credit

The script includes a piece of code for handling very large field values. Credit for this technique goes to Stack Overflow user 'user1251007': https://stackoverflow.com/questions/15063936/csv-error-field-larger-than-field-limit-131072
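
As a rough sketch of that technique: Python's csv module rejects fields longer than 131072 characters by default, and `csv.field_size_limit` can raise that cap. The wrapper function name below is hypothetical, and the code in this repository may differ in detail.

```python
import csv
import sys


def raise_field_size_limit():
    """Set the csv field size limit as high as the platform allows."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # On some platforms sys.maxsize does not fit in a C long,
            # so shrink the value and try again.
            limit = limit // 10
```

Calling this once before any reader is created should be enough, since the limit is a module-wide setting.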
