A suggestion for obtaining the weibo.cn cookies #51

Open
LichMscy opened this issue Mar 31, 2017 · 1 comment
LichMscy commented Mar 31, 2017

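You can actually skip the captcha step altogether. Inspired by the author's approach: first log in through weibo.com's captcha-free entry point (in Weibo's account security settings, marking a place as a frequent login location exempts it from the captcha), then open weibo.cn directly in PhantomJS. weibo.cn will already be in a logged-in state, and you can read the cookies at that point.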

I have implemented this myself; the code is below, for reference only:

# Written against Selenium 3.x; Selenium 4 has since dropped the PhantomJS
# driver and the find_element_by_* helpers.
import logging
import time

from selenium import webdriver


def init_phantomjs_driver():
    headers = {
        'Cookie': 'YF-Ugrow-G0=b02489d329584fca03ad6347fc915997; SUB=_2AkMvgPj2dcPxrAFYnPgWyGvkZYpH-jycVZEAAn7uJhMyOhgv7nBSqSVOKynW2PbhU4768kfRGZgNPwXeRA..; SUBP=0033WrSXqPxfM72wWs9jqgMF55529P9D9WWEFXHsNpvgJdQjr1GM.e765JpVF020SKM7e0571hMc',  # weibo.com cookie captured while logged out
    }
    # Attach the custom headers to every request PhantomJS sends.
    for key, value in headers.items():
        webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.{}'.format(key)] = value
    useragent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.110 Safari/537.36'
    webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.settings.userAgent'] = useragent

    # Replace the placeholder with the local path to the phantomjs executable.
    driver = webdriver.PhantomJS(executable_path='xxxxxxx phantomjs path xxxxxxx')
    driver.set_window_size(1366, 768)
    return driver


# The original post omitted the enclosing def line, so the function name
# below is an assumed placeholder. `account` is a dict with 'usn' (username)
# and 'pwd' (password) keys.
def get_weibo_cn_cookies(account):
    # Called as weibo_auto_handle.init_phantomjs_driver() in the original post.
    browser = init_phantomjs_driver()
    browser.get("http://weibo.com")
    time.sleep(3)
    failure = 0
    # This exact title is the logged-out landing page; retry the login up to 5 times.
    while "微博-随时随地发现新鲜事" == browser.title and failure < 5:
        failure += 1
        username = browser.find_element_by_name("username")
        pwd = browser.find_element_by_name("password")
        login_submit = browser.find_element_by_class_name('W_btn_a')
        username.clear()
        username.send_keys(account['usn'])
        pwd.clear()
        pwd.send_keys(account['pwd'])
        login_submit.click()
        time.sleep(5)

        # if browser.find_element_by_class_name('verify').is_displayed():
        #     logging.error("Verify code is needed! (Account: %s)" % account)

    # "我的首页" ("My Home") in the title means the weibo.com login succeeded.
    if "我的首页 微博-随时随地发现新鲜事" in browser.title:
        browser.get('http://weibo.cn/')
        cookie = dict()
        # Only collect cookies if weibo.cn also shows the logged-in home page.
        if "我的首页" in browser.title:
            for elem in browser.get_cookies():
                cookie[elem["name"]] = elem["value"]
        # p2 = persist_iics.Persist()
        # p2.save_account_cookies(accounts[0][0], cookie, datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
        logging.error('Account cookies updated! (Account_id: %s)' % account['usn'])
        return cookie
    # Login did not succeed within 5 attempts.
    return None
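
Once the cookie dict has been collected, it can be replayed on plain HTTP requests to weibo.cn without keeping the browser open. The following is only a minimal sketch using the standard library: the helper names `cookie_header` and `build_request` are hypothetical, and the cookie values are placeholders rather than real session data.

```python
import urllib.request


def cookie_header(cookies):
    """Serialize a {name: value} dict into a Cookie request-header value."""
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


def build_request(url, cookies, user_agent="Mozilla/5.0"):
    """Build (but do not send) a request that carries the saved cookies."""
    return urllib.request.Request(
        url,
        headers={"Cookie": cookie_header(cookies), "User-Agent": user_agent},
    )


if __name__ == "__main__":
    # Placeholder values; real ones would come from browser.get_cookies().
    req = build_request("http://weibo.cn/", {"SUB": "xxx", "SUBP": "yyy"})
    print(req.get_header("Cookie"))
```

Passing the resulting request to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.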
LiuXingMing (Owner) commented

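Nice idea. It works well for small jobs.
For large-scale crawling, though, you need to log in with many accounts, and configuring each one by hand is not feasible. Weibo also limits requests per IP, so fast crawling needs proxies, and this approach does not fit that case either.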
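
For context on the proxy point: PhantomJS accepts a proxy through its `--proxy` and `--proxy-type` command-line flags, which Selenium 3.x forwards via the `service_args` argument of `webdriver.PhantomJS`. The sketch below only builds those arguments and pairs accounts with proxies in round-robin order. The helper names and the sample account and proxy values are hypothetical, and nothing in this thread establishes that the captcha-free login still works from a proxy IP.

```python
import itertools


def phantomjs_proxy_args(host, port, proxy_type="http"):
    """Build the PhantomJS command-line flags that route traffic via a proxy."""
    return [
        "--proxy={}:{}".format(host, port),
        "--proxy-type={}".format(proxy_type),
    ]


def assign_proxies(accounts, proxies):
    """Pair each account with a proxy, cycling through the proxy list."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return list(zip(accounts, itertools.cycle(proxies)))


if __name__ == "__main__":
    # Placeholder accounts and proxy addresses, for illustration only.
    accounts = ["account_a", "account_b", "account_c"]
    proxies = [("127.0.0.1", 8080), ("127.0.0.1", 8081)]
    for account, (host, port) in assign_proxies(accounts, proxies):
        args = phantomjs_proxy_args(host, port)
        # With Selenium 3.x these would be passed as:
        # webdriver.PhantomJS(executable_path=..., service_args=args)
        print(account, args)
```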
