This repository has been archived by the owner on Aug 16, 2023. It is now read-only.

Add more sites to the scrapy start_urls #6

Open
groovecoder opened this issue Mar 13, 2016 · 6 comments

Comments

@groovecoder
Member

With just nextcity.org and govtech.com, I was able to crawl and index 18.8k articles/pages. @chimchim237 knows the bigger list of real sites to crawl.

@chimchim237
Collaborator

@groovecoder I made a REALLY good list ... and I thought I emailed it to you? But now I can't find it. :(

@groovecoder
Member Author

Yeah, I thought I saw that somewhere too ... but I can't find it in email. 😢 Re-creating it on this GH issue will keep it public and permanent, at least.

@chimchim237
Collaborator

chimchim237 commented May 5, 2016

nextcity.org
strongtowns.org
citylab.com
iqc.ou.edu
urbanland.uli.org
planetizen.com
streetsblog.net
governing.com

cnt.org/blog
thehappycity.com/blog/
smartgrowthamerica.org/blog
smartgrowthtulsa.com/blog
100resilientcities.org/blog
sunlightfoundation.com/blog
brookings.edu/about/programs/metro/research
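For anyone picking this up, here is a rough sketch of how the list above could be turned into values for a Scrapy spider's `start_urls` and `allowed_domains` attributes. It uses only the standard library. The names `SITES`, `to_start_url`, and `to_allowed_domain` are made up for illustration and are not from this repo, and the `http` default scheme is an assumption (some of these sites may need `https`).

```python
from urllib.parse import urlsplit

# Entries copied from the list above: bare hosts and host/path pairs.
SITES = [
    "nextcity.org",
    "strongtowns.org",
    "citylab.com",
    "iqc.ou.edu",
    "urbanland.uli.org",
    "planetizen.com",
    "streetsblog.net",
    "governing.com",
    "cnt.org/blog",
    "thehappycity.com/blog/",
    "smartgrowthamerica.org/blog",
    "smartgrowthtulsa.com/blog",
    "100resilientcities.org/blog",
    "sunlightfoundation.com/blog",
    "brookings.edu/about/programs/metro/research",
]


def to_start_url(site: str, scheme: str = "http") -> str:
    """Add a scheme when the entry has none; leave full URLs untouched.

    The default scheme is an assumption, not something the list specifies.
    """
    site = site.strip()
    return site if "://" in site else f"{scheme}://{site}"


def to_allowed_domain(site: str) -> str:
    """Return only the host part (no scheme, no path)."""
    return urlsplit(to_start_url(site)).netloc


# Hypothetical values to paste into the spider class.
START_URLS = [to_start_url(s) for s in SITES]
ALLOWED_DOMAINS = sorted({to_allowed_domain(s) for s in SITES})
```

In the spider these would be assigned as `start_urls = START_URLS` and `allowed_domains = ALLOWED_DOMAINS`. Entries that already include a scheme are passed through unchanged, so full URLs can be appended to `SITES` as-is.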

@chimchim237
Collaborator

I found the file to edit, and can easily do that if you want me to, this weekend. Lemme know.

@chimchim237
Collaborator

chimchim237 commented Mar 24, 2017

@chimchim237
Collaborator

Another one:
http://datasmart.ash.harvard.edu/
