
This repository contains a script that removes the angle brackets surrounding the value in the WARC-Target-URI field.

PACKED-vzw/wget_warc_converter

wget_warc_converter

This script takes a WARC file and makes a copy of it, removing all angle brackets surrounding the URL in the WARC-Target-URI headers.
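The core transformation is a substitution on a single header line. Below is a minimal Python sketch. It assumes that the header occupies one line and that its value is wrapped in exactly one pair of angle brackets. The function name strip_target_uri_brackets is hypothetical and is not taken from wget_warc_converter.py.

```python
import re

# Hypothetical helper, not the script's own code.
# Matches a WARC-Target-URI header whose value is wrapped in <...>.
# Only the matched prefix is rewritten, so the line ending is kept.
_TARGET_URI = re.compile(rb"^(WARC-Target-URI:[ \t]*)<([^>\r\n]*)>", re.IGNORECASE)


def strip_target_uri_brackets(line: bytes) -> bytes:
    """Return the header line with the angle brackets around the URI removed.

    Lines that are not bracketed WARC-Target-URI headers are returned unchanged.
    """
    return _TARGET_URI.sub(rb"\1\2", line, count=1)
```

Lines without a bracketed WARC-Target-URI value pass through unchanged, so the function can safely be applied to every header line of a record.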

Requirements

Usage

Base command:

python3 wget_warc_converter.py --input $input_warc --output $output_warc
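The command above might be implemented along the lines of the following sketch. It is not the code of wget_warc_converter.py. Only the --input and --output options come from the command above, and the names convert_stream, _open and main are hypothetical. The sketch makes two further assumptions: a .gz suffix marks a gzip-compressed file, and each header field fits on one line. It rewrites header lines only and copies each record block verbatim. This leaves payloads that happen to contain the same text untouched.

```python
import argparse
import gzip
import re
from typing import BinaryIO, Optional, Sequence

# Bracketed WARC-Target-URI value; the line ending is left in place.
_TARGET_URI = re.compile(rb"^(WARC-Target-URI:[ \t]*)<([^>\r\n]*)>", re.IGNORECASE)
_BLANK = (b"\r\n", b"\n")


def convert_stream(src: BinaryIO, dst: BinaryIO) -> int:
    """Copy WARC records from src to dst, unbracketing WARC-Target-URI values.

    Only header lines are rewritten; each block is copied using its
    Content-Length. Returns the number of headers changed.
    """
    changed = 0
    while True:
        line = src.readline()
        if not line:
            return changed  # end of input
        if line in _BLANK:
            dst.write(line)  # CRLF pair that terminates the previous record
            continue
        content_length = 0
        # Header block: every line up to the empty line that ends it.
        while line and line not in _BLANK:
            fixed = _TARGET_URI.sub(rb"\1\2", line, count=1)
            if fixed != line:
                changed += 1
            if fixed.lower().startswith(b"content-length:"):
                content_length = int(fixed.split(b":", 1)[1].strip().decode("ascii"))
            dst.write(fixed)
            line = src.readline()
        dst.write(line)  # the empty line after the header (b"" at EOF)
        # Content-Length counts block octets only, so editing the header
        # does not change it.
        dst.write(src.read(content_length))


def _open(path: str, mode: str) -> BinaryIO:
    # Assumption: a ".gz" suffix means a gzip-compressed WARC file.
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def main(argv: Optional[Sequence[str]] = None) -> None:
    parser = argparse.ArgumentParser(
        description="Remove angle brackets around WARC-Target-URI values."
    )
    parser.add_argument("--input", required=True, help="WARC file to read")
    parser.add_argument("--output", required=True, help="converted WARC file to write")
    args = parser.parse_args(argv)
    with _open(args.input, "rb") as src, _open(args.output, "wb") as dst:
        count = convert_stream(src, dst)
    print(f"Rewrote {count} WARC-Target-URI header(s)")
```

To run the conversion, call main(["--input", "wget-warc.warc.gz", "--output", "converted.warc.gz"]), or add an if __name__ == "__main__": main() guard and run the file. Content-Length and WARC-Block-Digest cover only the record block, so editing the header does not invalidate them. One caveat applies: gzip.open writes the whole output as a single gzip member, but tools that seek to records by offset usually expect one gzip member per record.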

For example:

python3 wget_warc_converter.py --input /Users/nvanderperren/Desktop/wget-warc.warc.gz --output /Users/nvanderperren/Desktop/converted-warc.gz
