Feature/fix url and add better ordering with numbers #3
base: master
…t using playlists
While testing the full range, I ran into a problem trying to download MP3 files from 2016. I'm going to investigate and try to find a fix for this.
OK, it should be fixed now. I'm waiting to see how it does on older talks.
Any update on this?
I haven't heard anything from the original author, but I have been told that this branch works great if you use it.
Hi, I'm experiencing a problem with your branch:
I'm running Python 3.8.0 on macOS 10.14.6.
Thank you for reaching out. I did a quick Google search on your error and found the following: some of the root certificates on your computer are missing, so the SSL connection to the church's website cannot be validated. Installing these root certificates will enable you to run the program.
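To check whether missing root certificates are the likely cause, a small diagnostic like the one below can help. This is a minimal sketch using only Python's standard `ssl` module; `describe_trust_store` is a hypothetical helper name and is not part of the downloader.

```python
import ssl


def describe_trust_store():
    """Report where this Python looks for root certificates and how many it has loaded."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()
    stats = context.cert_store_stats()
    return {
        "cafile": paths.cafile,    # bundle file in use, or None if not found
        "capath": paths.capath,    # certificate directory in use, or None if not found
        # Certificates in a capath directory are loaded lazily, so this count
        # can be 0 on a healthy system that relies on capath only.
        "loaded_ca_certs": stats["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in describe_trust_store().items():
        print(f"{key}: {value}")
```

If both `cafile` and `capath` come back as `None` and no CA certificates are loaded, that is consistent with the missing root certificates described above.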
Worked like a charm! Thanks!
Hello,
I performed steps 4 through 6 both before and after modifying the files, with the same results. Any help?
Hello!
Wow... it worked! Just as you suspected, I didn't check the "Add Python 3.8 to PATH" option, so I uninstalled Python and installed it again. It worked perfectly. Thank you very much for your help. I'm impressed with the result. One question: what will happen to the current files the next time I download new audio? For instance, I downloaded only 2018 and 2019; what if I want to download 2016? Will the script skip the files already downloaded?
During the download, the Python script usually caches all the HTML pages it downloads, which lets it avoid downloading those pages again. As long as you don't remove the cache directory, I think it should work as you expect. I believe it does recreate the playlist files, though, since those are usually affected. If I recall correctly, playlist files are created by topic, speaker, and session.
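The caching behaviour described above can be sketched roughly as follows. This is an illustration of the general idea, not the script's actual code: `cached_fetch` and `fetch_from_network` are hypothetical names, and naming cache files by a hash of the URL is an assumption.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def fetch_from_network(url):
    """Download the raw bytes at url."""
    with urlopen(url) as response:
        return response.read()


def cached_fetch(url, cache_dir, fetch=fetch_from_network):
    """Return the content for url, downloading only if it is not cached yet."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: one cache file per URL, named by the URL's SHA-256 digest.
    cache_file = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if cache_file.exists():
        return cache_file.read_bytes()   # cache hit: no network access
    data = fetch(url)                    # cache miss: download once
    cache_file.write_bytes(data)
    return data
```

With a scheme like this, deleting the cache directory forces every page to be downloaded again, which matches the advice to leave that directory in place.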
There have been a few changes to the church website. Is there any chance this gets an update?
@Jacobobber1087, I have been keeping this tool updated under my GitHub fork of this project. Have you given that a try?
@GatorQue Oh ok, thank you! It seems to work, but the destination folder is empty after it completes. Do you know what could cause this?
@Jacobobber1087, I see the same results. Let me look into what is causing this and post a new version. Something must have changed in the format of the HTML to prevent the program from working right.
@Jacobobber1087 - It seems that the church has hidden the MP3 download link behind the "Options" side panel, which only loads when you click the "Options" button (3 dots) and then click the Download arrow. JavaScript code loads the Options side panel, and the Download arrow loads the link somehow. I haven't found a good way to do that with my current approach. To fix this, I will need to find a Python-based web browser capable of running the JavaScript needed to make the MP3 media link appear. I will keep looking into this, but it isn't going to be the easy fix I was hoping for.
@GatorQue Yeah, I was very curious how you were getting around the JavaScript in previous versions of this haha... I ended up writing an automation in Microsoft Power Automate Desktop that uses Firefox to iterate through the sites and manually click through to the download link. It technically worked, but it took forever and was super clunky. Is there any way to interact with JavaScript through a script that you know of?
@Jacobobber1087, As far as tools are concerned, I initially looked at splash, a Docker image with Qt5 WebKit and an HTTP API for performing queries (usually paired with the Python scrapy-splash package). I also discovered requests-html, which uses a headless Chromium install downloaded via the pyppeteer Python package (but since that package has been abandoned, the download fails). There is also the Python package Selenium, which likewise uses a headless Chromium to perform web scrapes; I haven't done anything with it yet. I think if we can somehow combine the above JavaScript lines with a headless install of Chromium, we might be able to retrieve the information we need. Another approach would be to identify what the JavaScript downloads and how it modifies the DOM to create the "This Page (MP3)" download reference element when we click the Options and Download arrows. Yet another approach might be to "predict" the media URL by guessing the filename from the information in the initial HTML, but I suspect that would be a less stable (though certainly faster) approach.
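As a rough sketch of the headless-browser idea (not what the tool currently does), the code below loads a page in headless Chrome through Selenium and scans the resulting HTML for anything that looks like an MP3 URL. It assumes the `selenium` package and a Chrome install are available, and it assumes the link is already present in the rendered page; as noted above, the real site may require clicking the Options and Download controls first, which this sketch does not attempt. `find_mp3_urls` and `render_page` are hypothetical helper names.

```python
import re

# Matches absolute URLs ending in ".mp3"; any query string is left out.
MP3_URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+?\.mp3\b")


def find_mp3_urls(html):
    """Return the unique MP3-looking URLs found in html, in first-seen order."""
    return list(dict.fromkeys(MP3_URL_PATTERN.findall(html)))


def render_page(url):
    """Load url in headless Chrome and return the page source after loading."""
    # Imported here so find_mp3_urls stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # run without a visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even on errors
```

Because the browser runs headless, no window opens and nothing needs to be in the foreground. Calling `find_mp3_urls(render_page(talk_url))` would return an empty list if the media link only appears after a click, in which case the click would have to be scripted as well.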
@GatorQue Ok cool. I hadn't heard of a headless browser before; that seems like a really good solution. Would the browser need to be in the foreground? I assume not, if you're sending requests through JS?
@Jacobobber1087,
@Jacobobber1087,
@GatorQue Sorry for the late reply. I am currently serving as a missionary for the church, so I do not have reliable access to a computer. I will look forward to the next release.
Thank you for creating the General Conference Downloader tool. I fixed a few issues and added a new feature. Please accept this pull request or comment on what you would like me to change.