See the releases page to see new additions/modifications for each release!
See this comparison page to see new additions/modifications that will be available in the NEXT release!
See sister YouTube-Channels repository for a list of interesting channels!
- The YouTube-Channels sister repository is a separate repository that uses this package to create a list of videos uploaded by every channel supported by the repository.
- The sister repository will update the lists of videos once a week.
- NOTE: In order to minimize the size of the sister repo, the repo contains the list of videos in ONLY the csv format, and not in txt or md format.
Python 3.6+ setup (required if not already installed)
This package uses f-strings (more here), and so requires Python 3.6+.
If you have an older version of Python, you can download Python 3.9.1 (follow links below) and follow the instructions to set up Python for your machine. If you want to install a different version, visit the Python Downloads page and select the version you want.
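If you are not sure which Python version you have, a quick check along these lines may help. This is a minimal sketch; the helper name `meets_minimum` is made up for illustration and is not part of this package.

```python
import sys

# f-strings were introduced in Python 3.6, so that is the minimum version.
MINIMUM_VERSION = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Report whether the interpreter running this snippet is new enough.
current = ".".join(str(part) for part in sys.version_info[:3])
if meets_minimum():
    print(f"Python {current} is new enough")
else:
    print(f"Python {current} is too old, please install Python 3.6+")
```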
Permissions for first run
This is required to make sure you can download and install the required Selenium binary dependencies.
On Windows: make sure you open Command Prompt or Powershell (both work) in "Run as Administrator" mode
- shortcut: ⊞ Win + X + A
On Unix based machines (MacOS, Linux): make sure you have read and write access to /usr/local/bin/
- if you're not sure, open terminal and run
sudo chown $USER /usr/local/bin/
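Before changing ownership, you may want to check whether you already have the access you need. The following is a rough sketch using only the standard library; the function name `can_install_binaries` is hypothetical and not something this package provides.

```python
import os


def can_install_binaries(directory="/usr/local/bin/"):
    """Return True if the current user can read, write, and enter `directory`."""
    # os.X_OK on a directory means "may traverse it", which is also needed
    # in order to create files inside the directory.
    needed = os.R_OK | os.W_OK | os.X_OK
    return os.path.isdir(directory) and os.access(directory, needed)


if can_install_binaries():
    print("You already have the access needed for /usr/local/bin/")
else:
    print("No read/write access to /usr/local/bin/ (or it does not exist)")
```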
Installing the package
After you install Python 3.6+ and ensure you have the required permissions as needed, enter the following in your command line:
# if something isn't working properly, try rerunning this
# the problem may have been fixed with a newer version
pip3 install -U yt-videos-list # MacOS/Linux
pip install -U yt-videos-list # Windows
# if that doesn't work:
python3 -m pip install -U yt-videos-list # MacOS/Linux
python -m pip install -U yt-videos-list # Windows
If you're on Windows: make sure you always open Command Prompt or Powershell (both work) in "Run as Administrator" mode!
- shortcut: ⊞ Win + X + A
- this allows yt_videos_list to update selenium webdriver binaries to be compatible with newer browser versions as browsers are updated (e.g. your Firefox browser updates from version 77 to version 82)
  - to see the commands being run, see the yt_videos_list/docs/dependencies.json file
Running the package from the python interpreter
python3 # MacOS/Linux
python # Windows
from yt_videos_list import ListCreator
my_driver = 'firefox' # SUBSTITUTE DRIVER YOU WANT (options below)
lc = ListCreator(driver=my_driver, scroll_pause_time=0.8)
lc.create_list_for(url='https://www.youtube.com/user/schafer5')
lc.create_list_for(url='https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ', log_silently=True)
# Set `log_silently` to `True` to mute program logging to the console.
# The program will log the program status and any program information
# to only the log file for the channel being scraped
# (this is useful when scraping multiple channels at once with multi-threading).
# By default, the program logs to both the log file for the channel being scraped AND the console.
# see the new files that were just created:
import os
os.system('ls -lt | head') # MacOS/Linux
os.system('dir /O-D | find "_videos_list"') # Windows
# for more information on using the module:
help(lc)
- driver options include:
  - 'firefox'
  - 'opera'
  - 'safari' (MacOS only)
  - 'chrome'
  - 'brave'
  - 'edge' (Windows only!)
- increase scroll_pause_time for laggy internet and decrease scroll_pause_time for fast internet
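Because 'safari' and 'edge' are limited to one operating system each, you could pick a driver based on the machine you are running on. The sketch below only encodes the restrictions listed above; the `supported_drivers` helper is hypothetical and not part of yt_videos_list.

```python
import platform

# Drivers listed above without an operating system restriction.
CROSS_PLATFORM_DRIVERS = ["firefox", "opera", "chrome", "brave"]


def supported_drivers(system=None):
    """Return the driver names usable on `system` (defaults to this machine)."""
    system = system or platform.system()  # 'Darwin', 'Windows', 'Linux', ...
    drivers = list(CROSS_PLATFORM_DRIVERS)
    if system == "Darwin":    # MacOS reports itself as 'Darwin'
        drivers.append("safari")
    if system == "Windows":
        drivers.append("edge")
    return drivers


print(supported_drivers())
```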
If you already scraped a channel and the channel uploaded a new video, simply rerun this program on that channel and this package updates your files to include the newer video(s)!
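To confirm that a rerun actually touched your files, you can list the output files by modification time without relying on OS-specific shell commands. This sketch assumes the output filenames contain `_videos_list`, which is inferred from the Windows `find "_videos_list"` command shown earlier; the helper name is made up.

```python
from pathlib import Path


def newest_output_files(directory=".", limit=10):
    """Return up to `limit` matching file names, most recently modified first."""
    matches = [
        path for path in Path(directory).iterdir()
        if path.is_file() and "_videos_list" in path.name  # assumed naming pattern
    ]
    matches.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return [path.name for path in matches[:limit]]


for name in newest_output_files():
    print(name)
```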
Scraping multiple channels from a file simultaneously with multi-threading
Add the url to every channel you want to extract information from in a txt file with every url placed on a new line.
- example: channels.txt (NOTE this is a relative link, so this might not link properly on non-GitHub hosted sites!)
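If you already have the channel urls in a Python list, one way to produce such a file is sketched below. The `write_channels_file` helper is hypothetical; the only requirement taken from this document is one url per line.

```python
from pathlib import Path


def write_channels_file(urls, path="channels.txt"):
    """Write one url per line to `path` and return how many urls were written."""
    cleaned = [url.strip() for url in urls if url.strip()]  # drop blank entries
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


# Example usage with the two channel urls used elsewhere in this README:
# write_channels_file([
#     "https://www.youtube.com/user/schafer5",
#     "https://www.youtube.com/channel/UC8butISFwT-Wl7EV0hUK0BQ",
# ])
```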
Enter the python interpreter:
python3 # MacOS/Linux
python # Windows
from yt_videos_list import ListCreator
lc = ListCreator(driver='firefox', scroll_pause_time=1.2)
lc.create_list_from(path_to_channel_urls_file='channels.txt', number_of_threads=4)
# configuring settings:
lc.create_list_from(
path_to_channel_urls_file='channels.txt',
number_of_threads=4,
min_sleep=1,
max_sleep=5,
after_n_channels_pause_for_s=(20, 10),
log_subthread_status_silently=False,
log_subthread_info_silently=False
) # defaults (keyword argument form)
lc.create_list_from('channels.txt', 4, 1, 5, (20, 10), False, False) # defaults (positional argument form)
lc.create_list_from('channels.txt', min_sleep=3, max_sleep=10) # modifying only min_sleep and max_sleep
help(lc.create_list_from) # see API method details
- See Thread about multi-threading with yt_videos_list for more information!
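For readers new to thread pools, here is a generic illustration of running a job for several urls at once with a random pause after each one. This is NOT how yt_videos_list is implemented: `scrape_all` and the `scrape` callback are hypothetical, and treating min_sleep and max_sleep as the bounds of a random pause is an assumption based only on the argument names.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, scrape, number_of_threads=4, min_sleep=1, max_sleep=5):
    """Run `scrape(url)` for every url on a pool of worker threads."""

    def worker(url):
        result = scrape(url)
        # Pause for a random duration so requests are not sent back to back.
        time.sleep(random.uniform(min_sleep, max_sleep))
        return url, result

    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        # pool.map yields results in the same order as `urls`.
        return dict(pool.map(worker, urls))


# Stand-in job that just measures the url length instead of scraping anything.
print(scrape_all(["https://a.example", "https://bb.example"], len, min_sleep=0, max_sleep=0.01))
```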
Explicitly downloading all Selenium dependencies
Ideal if you use Selenium for other projects 😎
- Make sure you already have the yt-videos-list package installed (follow directions above for getting set up), then run the following:
pip3 install -U yt-videos-list # MacOS/Linux: ensure latest package
python3 # MacOS/Linux: enter python interpreter
pip install -U yt-videos-list # Windows: ensure latest package
python # Windows: enter python interpreter
from yt_videos_list.download import selenium_webdriver_dependencies
selenium_webdriver_dependencies.download_all()
That's all! 🤓
More API information
NOTE that you can also access all the information below from the Python interpreter by entering
import yt_videos_list
help(yt_videos_list)
# default options for the ListCreator object
ListCreator(
txt=True,
csv=True,
md=True,
reverse_chronological=True,
headless=False,
scroll_pause_time=0.8,
driver='firefox',
cookie_consent=False
)
There are a number of optional arguments you can specify when instantiating the ListCreator object. The values shown above are the defaults, but if you want more flexibility, you can specify the:
- driver argument:
  - Firefox (default)
  - Opera
  - Safari (MacOS only)
  - Chrome
  - Brave
  - Edge (Windows only)
  driver='firefox'
  driver='opera'
  driver='safari'
  driver='chrome'
  driver='brave'
  driver='edge'
- cookie_consent argument:
  - False (default) - block all cookie options if prompted by YouTube (at consent.youtube.com)
  - True - accept all cookie options if prompted by YouTube (also at consent.youtube.com)
  cookie_consent=False (default) OR cookie_consent=True
- txt, csv, md file type argument:
  - True (default) - create a file for the specified type
  - False - do not create a file for the specified type.
  txt=True (default) OR txt=False
  csv=True (default) OR csv=False
  md=True (default) OR md=False
- reverse_chronological argument:
  - True (default) - write the files in order from most recent video to the oldest video
  - False - write the files in order from oldest video to the most recent video
  reverse_chronological=True (default) OR reverse_chronological=False
- headless argument:
  - False (default) - run the driver with an open Selenium instance for viewing
  - True - run the driver in "invisible" mode.
  headless=False (default) OR headless=True
- scroll_pause_time argument:
  - any float values greater than 0 (default 0.8).
  - The value you provide will be how long the program waits before trying to scroll the videos list page down for the channel you want to scrape. For fast internet connections, you may want to reduce the value, and for slow connections you may want to increase the value.
  scroll_pause_time=0.8 (default)
  - CAUTION: reducing this value too much will result in the program not capturing all the videos, so be careful! Experiment :)
- verify_page_bottom_n_times argument:
  - any int values greater than 0 (defaults to 3)
  - NOTE: this argument is only used when CREATING a new file for a new channel, and is unused when UPDATING an existing file for an already scraped channel.
  - The value you provide will be how many times the program needs to verify it actually reached the bottom of the page before accepting it is the bottom of the page, and starting to write the information to the output file(s).
  - For channels that have uploaded THOUSANDS of videos, increase this value to a large number that you think should be sufficient to verify the program reached the bottom of the page.
    - To determine HOW large of a value you should provide, determine the length of time you'd like to wait before being reasonably sure that you reached the bottom of the page and it's not just YouTube's server trying to fetch the response from an old database entry, and divide the time you decided to wait by the scroll_pause_time argument.
      - For example, if you want to wait 45 seconds and you set the scroll_pause_time value to 1.0:
        -> your_time / scroll_pause_time
        -> 45 / 1.0
        -> 45
        -> therefore: verify_page_bottom_n_times=45
  - For channels with only a couple hundred videos (or less), the default value of verify_page_bottom_n_times=3 should be sufficient.
  - See commit a68f8f62e5c343cbb0641125e271bb96cc4f0750 for more details.
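The division described for verify_page_bottom_n_times can be written as a small helper. This is only a sketch: the function name is made up, and rounding up and never going below the default of 3 are my own choices rather than documented package behavior.

```python
import math


def suggested_verify_count(wait_seconds, scroll_pause_time=0.8, minimum=3):
    """Return wait_seconds / scroll_pause_time, rounded up, but at least `minimum`."""
    if scroll_pause_time <= 0:
        raise ValueError("scroll_pause_time must be greater than 0")
    return max(minimum, math.ceil(wait_seconds / scroll_pause_time))


# The worked example from above: wait 45 seconds with scroll_pause_time=1.0.
print(suggested_verify_count(45, scroll_pause_time=1.0))  # 45
```

The result can then be passed along as verify_page_bottom_n_times together with the same scroll_pause_time value.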
- file_buffering argument:
  - any int values greater than 0 (default -1, which uses the default OS setting)
  - LEAVE THIS ALONE IF YOU'RE UNSURE!
  - Documentation:
  - Deep dive:
    - https://stackoverflow.com/questions/3167494/how-often-does-python-flush-to-a-file
    - https://stackoverflow.com/questions/10019456/usage-of-sys-stdout-flush-method
    - https://www.quora.com/What-does-flushing-files-or-Stdin-do-in-Python
    - https://www.quora.com/Whats-the-difference-between-buffered-I-O-and-unbuffered-I-O
    - https://stackoverflow.com/questions/8409050/unix-buffered-vs-unbuffered-i-o
    - https://medium.com/@bramblexu/three-ways-to-close-buffer-for-stdout-stdin-stderr-in-python-8be694bd2737
    - https://www.quora.com/In-C-what-does-buffering-I-O-or-buffered-I-O-mean
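To get a feel for what buffering changes, the sketch below uses the `buffering` parameter of Python's built-in `open()`. It assumes file_buffering is handed to `open()` in this way, which this README does not confirm; the helper name is hypothetical.

```python
import os
import tempfile


def visible_before_close(buffering):
    """Write one short line, then return what a second reader sees before close()."""
    file_descriptor, path = tempfile.mkstemp(suffix=".txt")
    os.close(file_descriptor)
    try:
        with open(path, "w", buffering=buffering) as writer:
            writer.write("first line\n")
            # Read through a separate handle while the writer is still open.
            with open(path) as reader:
                return reader.read()
    finally:
        os.remove(path)


# buffering=1 is line buffering in text mode: each full line is flushed at once.
print(repr(visible_before_close(1)))
# buffering=-1 is the default: a short write stays in memory until flush or close.
print(repr(visible_before_close(-1)))
```

Smaller buffers make progress visible sooner if the program is interrupted, at the cost of more frequent writes to disk.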
Cloning and running locally
To clone the repository and install the most updated version of the package that may not yet be available on the latest release through PyPI, run:
git clone https://github.com/Shail-Shouryya/yt-videos-list.git
# git clone creates a directory named after the repository: yt-videos-list
cd yt-videos-list/python # MacOS/Linux
pip3 install . # MacOS/Linux
# if that doesn't work:
python3 -m pip install . # MacOS/Linux
cd yt-videos-list\python # Windows
pip install . # Windows
# if that doesn't work:
python -m pip install . # Windows
To make your own changes to the yt_videos_list python package and run the changes locally:
# make changes to the codebase in the
# ===> /dev <=== directory
python3 minifier.py # MacOS/Linux
pip3 install . # MacOS/Linux
python minifier.py # Windows
pip install . # Windows
NOTE that the changes you make to the codebase SHOULD BE MADE in the yt_videos_list/python/dev directory!!
- the code in the yt_videos_list/python/yt-videos-list directory is minified with
  - leading indents stripped to the minimum (1 space for each nested scope)
  - whitespace for padding (e.g. extra spaces to align variable assignments) stripped
  - comments stripped
- as a result, the code in the yt_videos_list/python/yt-videos-list directory is NOT human readable, and the yt_videos_list/python/dev directory should be used for development instead!
  - the minifier.py module performs all the code preprocessing and packages the code from yt_videos_list/python/dev into the final version seen in the yt_videos_list/python/yt-videos-list directory
  - so running minifier.py before installing the local package with pip install . (Windows) or pip3 install . (MacOS/Linux) is essential!
Running tests
The tests use the custom ThreadWithResult subclass of threading.Thread provided by the save-thread-result package, so make sure you install that module using:
pip3 install -U save-thread-result # MacOS/Linux
pip install -U save-thread-result # Windows
# if that doesn't work:
python3 -m pip install -U save-thread-result # MacOS/Linux
python -m pip install -U save-thread-result # Windows
Then make sure you're in the yt_videos_list/python directory and run:
tests\run_tests.bat # Windows
#### Any shell on MacOS/Linux
bash tests/run_tests.sh # this works
csh tests/run_tests.sh # this works
dash tests/run_tests.sh # this works
ksh tests/run_tests.sh # this also works
tcsh tests/run_tests.sh # this works too
zsh tests/run_tests.sh # this works as well
# you can try other shells and
# they should work too, since
# there's no special syntax in
# the run_tests.sh file
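For context on why the tests need a thread class that keeps its result: a plain threading.Thread discards the return value of its target. The sketch below shows the general idea only. `ResultThread` is a hypothetical stand-in written for this example and is not the implementation shipped in save-thread-result.

```python
import threading


class ResultThread(threading.Thread):
    """Thread that stores the return value of its target in `self.result`."""

    def __init__(self, target, args=(), kwargs=None):
        super().__init__()
        # Use our own attribute names to avoid clashing with Thread internals.
        self._function = target
        self._function_args = args
        self._function_kwargs = kwargs or {}
        self.result = None

    def run(self):
        self.result = self._function(*self._function_args, **self._function_kwargs)


thread = ResultThread(target=pow, args=(2, 5))
thread.start()
thread.join()          # wait for the thread to finish before reading the result
print(thread.result)   # 32
```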
Usage Statistics
If you found this interesting or useful, please consider starring this repo so other people can more easily find and use this. Thanks!