
docs clarification #23

Open

Ostapp opened this issue May 2, 2019 · 4 comments

Comments

Ostapp commented May 2, 2019

Does it assign a different UA to each request?
Does it assign a different UA to each request retry?

vortexkd commented Apr 4, 2020

For anyone who still wants the answer to this:
Yes, it assigns a new user agent to each request.
You can see exactly how here: https://pypi.org/project/fake-useragent/

tl;dr: you can use the RANDOM_UA_TYPE setting (which defaults to random), and the middleware will generate a new user-agent string for each request based on that setting.
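
As a rough illustration, here is a settings.py sketch along the lines of the project README (verify the exact middleware paths and priorities against your installed version):

# settings.py (sketch, following the scrapy-fake-useragent README)

# Disable the built-in UA and retry middlewares and enable the
# random-UA middlewares, so a fresh UA is picked for every request
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
    'scrapy_fake_useragent.middleware.RetryUserAgentMiddleware': 401,
}

# 'random' is the default; a browser family such as 'chrome' or
# 'firefox' can be pinned instead
RANDOM_UA_TYPE = 'random'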

i-chaochen commented Sep 1, 2020

Thanks @alecxe for providing this great project.

For scrapy-proxies, I wonder what you mean by setting RANDOM_UA_PER_PROXY to True?

Usage with scrapy-proxies

To use this with random-proxy middlewares such as scrapy-proxies, you need to:

set RANDOM_UA_PER_PROXY to True to allow switching the UA per proxy
set the priority of RandomUserAgentMiddleware to be greater than that of scrapy-proxies, so that the proxy is set before the UA is handled

Do I need to first pip install scrapy_proxies and then add RANDOM_UA_PER_PROXY = True in my settings.py? Or is it already included, so I can add RANDOM_UA_PER_PROXY = True directly?

Also, for the scrapy_proxies priority, do I need to add another DOWNLOADER_MIDDLEWARES entry for scrapy-proxies? I mean, there would be two sets of entries in DOWNLOADER_MIDDLEWARES, and I would then set the priorities of the fake-useragent middlewares to be larger than those of scrapy-proxies, so I can have proxy + fake user agent together?

Because you mentioned that fake-useragent needs the built-in UserAgentMiddleware and RetryMiddleware turned off, while scrapy-proxies uses RetryMiddleware, I am confused about whether I should keep RetryMiddleware in DOWNLOADER_MIDDLEWARES or not. Thanks in advance!

# Retry many times since proxies often fail
RETRY_TIMES = 10
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Proxy list containing entries like
# http://host1:port
# http://username:password@host2:port
# http://host3:port
# ...
PROXY_LIST = '/path/to/proxy/list.txt'

# Proxy mode
# 0 = Every request has a different proxy
# 1 = Take only one proxy from the list and assign it to every request
# 2 = Use a custom proxy specified in the settings
PROXY_MODE = 0

# If proxy mode is 2, uncomment this line:
#CUSTOM_PROXY = "http://host1:port"

alecxe (Owner) commented Sep 6, 2020

@i-chaochen thank you for the kind words and the questions. Docs could definitely be better for this project, I agree.

Do I need to first pip install scrapy_proxies and then add RANDOM_UA_PER_PROXY = True in my settings.py? Or is it already included, so I can add RANDOM_UA_PER_PROXY = True directly?

Yeah, scrapy_proxies is not listed in the project requirements, so you would need to install it separately.

Also, for the scrapy_proxies priority, do I need to add another DOWNLOADER_MIDDLEWARES entry for scrapy-proxies? I mean, there would be two sets of entries in DOWNLOADER_MIDDLEWARES, and I would then set the priorities of the fake-useragent middlewares to be larger than those of scrapy-proxies, so I can have proxy + fake user agent together?

It seems so, though I have not used this combination of scrapy-fake-useragent and scrapy-proxies myself. I'd say do some experimentation with the middleware setup while logging proxies and headers.
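
Untested, but based on the README guidance above (the priorities only need to keep RandomUserAgentMiddleware's number greater than RandomProxy's, so the proxy is chosen before the UA is assigned), a starting point might look like:

# settings.py — untested sketch combining scrapy-proxies and
# scrapy-fake-useragent
RANDOM_UA_PER_PROXY = True

DOWNLOADER_MIDDLEWARES = {
    # built-in UA middleware off, as the fake-useragent docs require
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # the proxy is chosen first (lower numbers run first on requests)...
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    # ...then a UA is assigned for that proxy
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}

As for the retry question: whether you keep the built-in RetryMiddleware (as the scrapy-proxies snippet does) or swap in this project's RetryUserAgentMiddleware is exactly the kind of thing to verify by experiment.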

Hope that helps.

i-chaochen commented Sep 12, 2020

@alecxe Thanks. After reading your code and a couple of tries, I think I figured it out and tested it OK:

  1. Set RANDOM_UA_PER_PROXY = True in settings.py
  2. In the spider file, pass the proxy explicitly: scrapy.Request(url, meta={'proxy': 'your_proxy_address'})

But just remember: if we set RANDOM_UA_PER_PROXY = True, the UA is fixed per proxy and only varies across proxy addresses, not across individual requests.
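
For reference, roughly what this looks like in the spider (the spider name and proxy URL below are placeholders):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = 'example'

    def start_requests(self):
        # set the proxy explicitly via meta, per step 2 above
        yield scrapy.Request(
            'https://example.com',
            meta={'proxy': 'http://host1:8080'},
        )

    def parse(self, response):
        # log the UA that was actually sent, to check the
        # per-proxy behaviour described above
        self.logger.info(response.request.headers.get('User-Agent'))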
