Replies: 2 comments 3 replies
-
We don’t care what you do with the data. If you want to build a database, knock yourself out; even commercial use is fine. The biggest issue we have is API scraping, which is what we combat constantly. I’m fine with people building databases, but I don’t want them using the API to do it (meaning issuing 100 million requests just to get all the episodes). I’d rather provide a download. It’s just that the full download (with episodes) is 150 gigs, which is why I currently only provide the feeds table.

I’m not sure what you mean by the weekly dump “not being at a predictable URL”. The download URL for that hasn’t changed in a long time. If there is a bug, let me know.

What @mitchdowney does for Podverse is track the /recent/data endpoint and just follow along with all the feed updates. There are actually a bunch of ways to stay up to date, and I’m glad to share them.

Sorry for the late reply. This should probably be in the “database” repo instead of in the namespace repo.
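For reference, the "/recent/data" tracking approach could look roughly like this. A minimal Python sketch, not Podverse's actual code: the credentials are placeholders, and the auth scheme shown (an `Authorization` header set to the SHA-1 of key + secret + unix timestamp) is the one the Podcast Index API documents.

```python
# Hedged sketch: follow Podcast Index feed updates by polling /recent/data.
# API key/secret are placeholders; get real ones from api.podcastindex.org.
import hashlib
import json
import time
import urllib.request

BASE = "https://api.podcastindex.org/api/1.0"


def auth_headers(key: str, secret: str, now: int = None) -> dict:
    """Build the headers the Podcast Index API expects:
    Authorization = sha1(key + secret + unix-timestamp)."""
    ts = str(int(now if now is not None else time.time()))
    digest = hashlib.sha1((key + secret + ts).encode()).hexdigest()
    return {
        "User-Agent": "example-sync/0.1",
        "X-Auth-Key": key,
        "X-Auth-Date": ts,
        "Authorization": digest,
    }


def poll_recent_data(key: str, secret: str, since: int = 0, max_items: int = 500):
    """Fetch one page of recent feed/item updates since the given timestamp."""
    url = f"{BASE}/recent/data?since={since}&max={max_items}"
    req = urllib.request.Request(url, headers=auth_headers(key, secret))
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A consumer would call `poll_recent_data` in a loop, advancing `since` from each response, and apply the updates to its local copy.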
-
I understand the frustration with the API terms. Your workaround with the weekly data dumps sounds smart! Have you considered publishing differentials, or using podping for updates?
-
Reading Section 5.e. of the Podcast Index APIs Terms of Service (March 2, 2021), [you will not]:
It seems from this that the way I would want to use the API is actually not permitted, and that's why I have instead built a sync tool to import the weekly data dumps, compute a diff of what's changed, and then replicate all of the APIs that I need. I understand that the weekly data dump is there to prevent scraping the API, but it's not clear to me why someone who already has a copy of the data dump shouldn't also be able to make occasional API calls and integrate those responses into a local copy of the database.
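The diff step of the sync tool described above could be sketched as follows. This is a minimal illustration, not the actual tool: the schema (a `feeds` table with `id` and `lastUpdate` columns) is assumed for the example and is not necessarily the dump's real layout.

```python
# Hedged sketch: diff two weekly dump snapshots by attaching both SQLite
# files and comparing rows on an id plus a last-update column.
# The table/column names here are illustrative assumptions.
import sqlite3


def diff_feeds(old_db: str, new_db: str):
    """Return (added_ids, removed_ids, changed_ids) between two dumps."""
    con = sqlite3.connect(new_db)
    con.execute("ATTACH DATABASE ? AS old", (old_db,))
    added = [r[0] for r in con.execute(
        "SELECT id FROM feeds WHERE id NOT IN (SELECT id FROM old.feeds)")]
    removed = [r[0] for r in con.execute(
        "SELECT id FROM old.feeds WHERE id NOT IN (SELECT id FROM feeds)")]
    changed = [r[0] for r in con.execute(
        "SELECT f.id FROM feeds f JOIN old.feeds o ON f.id = o.id "
        "WHERE f.lastUpdate <> o.lastUpdate")]
    con.close()
    return added, removed, changed
```

The returned id lists would then drive the import: insert the added feeds, delete the removed ones, and re-fetch or update the changed ones in the local replica.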
Prohibition 2 is also a concern to the extent that it relates to prohibition 1:
I am also curious to ask, what do the cache headers referenced in prohibition 1 actually say?
As for what I'm currently doing (downloading the whole database every week, computing a diff, then integrating that), I feel this could be streamlined. The file is large to download, the diff/import process takes time to complete, and I don't think the dump is at a predictable URL, so it's a process I have to do manually. It would perhaps be more convenient and efficient to publish diffs. Or, and maybe you have reasons against this, since the Podcast Index already knows when every podcast gets updated, it could just send podping notifications on behalf of the feeds that don't already use podping.