403 Error when attempting to use spy = ETF('SPY') #12

Open
Cerebex opened this issue Sep 15, 2024 · 5 comments


Cerebex commented Sep 15, 2024

Describe the bug
403 Error when attempting to use spy = ETF('SPY')

To Reproduce
Steps to reproduce the behavior:

```python
from pyetfdb_scraper.etf import ETF, load_etfs
spy = ETF('SPY')
```

Expected behavior
The ETF information is pulled without errors.

Additional context
It times out and waits 15 minutes, but that does not fix the issue.

Owner

lvxhnat commented Sep 16, 2024

Hi @Cerebex, I took a deeper look into this issue, and it seems that VettaFi is now using JavaScript-based checks to verify that requests are not coming from bots. This can probably be solved by using Selenium to retrieve the page source, but I am quite busy these few days, so it will take me a while to get to it.

I will keep you posted when a fix is pushed.

lvxhnat added the bug label Sep 16, 2024
Author

Cerebex commented Sep 16, 2024

Really appreciate it. I found I could get it to work with Selenium, as you stated, but only when I physically opened the browser, which is not ideal.

Owner

lvxhnat commented Sep 16, 2024

@Cerebex If I am not wrong, you can run Selenium in headless mode. Are you saying it doesn't work when you do that? Either way, it would be great if you could share that code to help fix this issue. It would be a great help and take a load off my back! :)
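
For reference, a minimal sketch of what headless retrieval might look like, assuming Selenium 4 and a local Chrome install. The helper names `build_etf_url` and `fetch_page_source_headless` are hypothetical and not part of the package, and whether VettaFi's JavaScript checks let a headless browser through is untested here.

```python
ETFDB_BASE_URL = "https://etfdb.com/etf"  # same base URL the scraper uses


def build_etf_url(ticker: str) -> str:
    """Build the ETF page URL for a ticker (hypothetical helper)."""
    return f"{ETFDB_BASE_URL}/{ticker.upper()}"


def fetch_page_source_headless(ticker: str) -> str:
    """Load the ETF page in headless Chrome and return the rendered HTML."""
    # Imported here so the module still loads when Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(build_etf_url(ticker))
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned HTML could then be passed to BeautifulSoup in place of the response body from a plain HTTP request.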

lvxhnat self-assigned this Oct 15, 2024
Owner

lvxhnat commented Oct 15, 2024

Update: I will get this solved sometime in November/December. Apologies to everyone using this package, but I do not have the time to work on this right now.


GentlemanXR commented Oct 19, 2024

I used this as a guide: https://www.zenrows.com/blog/403-web-scraping#set-fake-user-agent
This is not my area of expertise, so I'm not sure whether it's a permanent fix, but it seems to work for me so far.
```python
# etf_scraper.py
class ETFScraper(object):

    def __init__(
        self,
        ticker: str,
        user_agent: str = None,
    ):

        self.ticker = ticker
        self.base_url: str = "https://etfdb.com/etf"

        self.user_agents = load_user_agents()
        self.request_headers: dict = {
            # Replace the original line (commented out here) with the fixed User-Agent below it.
            # "User-Agent": user_agent if not user_agent else random.choice(self.user_agents),
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
            "Referer": "https://etfdb.com/etfs/QQQ",
        }
        self.scrape_url: str = f"{self.base_url}/{ticker}"

        soup = self.__request_ticker()

        self.etf_ticker_body_soup = soup.find("div", {"id": "etf-ticker-body"})
```
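
To check whether the header change alone gets past the 403 without patching the package, something like the following standard-library sketch might help. The names `build_request` and `fetch_status` are hypothetical helpers, not part of pyetfdb_scraper, and the User-Agent and Referer values are simply the ones from the snippet above.

```python
import urllib.error
import urllib.request

# Browser-like User-Agent string taken from the snippet above.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
)


def build_request(ticker: str) -> urllib.request.Request:
    """Build a request for the ETF page with browser-like headers."""
    headers = {
        "User-Agent": BROWSER_USER_AGENT,
        "Referer": "https://etfdb.com/etfs/QQQ",
    }
    return urllib.request.Request(f"https://etfdb.com/etf/{ticker}", headers=headers)


def fetch_status(ticker: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for the ETF page (403 means still blocked)."""
    try:
        with urllib.request.urlopen(build_request(ticker), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # error statuses such as 403 arrive as exceptions
```

Calling `fetch_status("SPY")` performs a real network request; a result of 200 would suggest the headers are enough for now, while 403 would point back to the JavaScript checks mentioned earlier in the thread.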
