403 Error when attempting to use spy = ETF('SPY') #12

Cerebex · 2024-09-15T15:02:31Z

Describe the bug
403 Error when attempting to use spy = ETF('SPY')

To Reproduce
Steps to reproduce the behavior:

from pyetfdb_scraper.etf import ETF, load_etfs
spy = ETF('SPY')

Expected behavior
Pull information properly

Additional context
The request times out and the scraper waits 15 minutes before retrying, but the retry does not fix the issue.

lvxhnat · 2024-09-16T06:56:36Z

Hi @Cerebex, I had a deeper look into this issue, and it seems that VettaFi now uses JavaScript-based checks to verify that requests are not coming from bots. This can probably be solved by using Selenium to retrieve the page source, but I am quite busy these few days, so it will take me a while to get to it.

I will keep you posted when a fix is pushed.

Cerebex · 2024-09-16T12:59:12Z

Really appreciate it. I found I could get it to work with Selenium, as you stated, but only when I physically opened the browser, which is not ideal.

lvxhnat · 2024-09-16T14:26:30Z

@Cerebex If I am not wrong, you can run Selenium in headless mode. Are you saying it doesn't work when you do that? Either way, it would be great if you could share that code to help fix this issue. It would be a great help and take a load off my back! :)
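For reference, a minimal sketch of what retrieving the page source with headless Selenium could look like. This is not the package's code: `build_etf_url` and `fetch_page_source` are hypothetical helper names, the URL pattern `https://etfdb.com/etf/<ticker>` is an assumption, and it expects Selenium 4 with a local Chrome install. Whether headless mode passes the site's JavaScript check is not confirmed anywhere in this thread; the only report so far is that it worked with a visible browser.

```python
def build_etf_url(ticker: str, base_url: str = "https://etfdb.com/etf") -> str:
    """Build the ETF page URL (assumed pattern: <base_url>/<ticker>)."""
    return f"{base_url}/{ticker}"


def fetch_page_source(url: str, headless: bool = True) -> str:
    """Load `url` in Chrome through Selenium and return the rendered HTML."""
    # Imported lazily so this module still loads when Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    if headless:
        # Chrome's newer headless mode; drop this flag to open a visible window.
        options.add_argument("--headless=new")

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if the page load raises.
        driver.quit()


# Example call (starts a real browser, so it is left commented out):
# html = fetch_page_source(build_etf_url("SPY"))
```

Passing `headless=False` reproduces the visible-browser case, which makes it easy to compare the two modes against the same URL.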

lvxhnat · 2024-10-15T06:28:45Z

Update: I will get this solved sometime in November/December. Apologies to whoever is using this package, but I do not have the time now to work on this.

GentlemanXR · 2024-10-19T18:59:03Z

Using this as a guide: https://www.zenrows.com/blog/403-web-scraping#set-fake-user-agent
This is not my area of expertise, so I'm not sure if it's a permanent fix. Seems to work for me thus far.
```python
# etf_scraper.py
class ETFScraper(object):

    def __init__(
        self,
        ticker: str,
        user_agent: str = None,
    ):
        self.ticker = ticker
        self.base_url: str = "https://etfdb.com/etf"

        self.user_agents = load_user_agents()
        self.request_headers: dict = {
            # Replace the original line:
            #   "User-Agent": user_agent if not user_agent else random.choice(self.user_agents),
            # with the fixed browser User-Agent below.
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36",
            "Referer": "https://etfdb.com/etfs/QQQ",
        }
        self.scrape_url: str = f"{self.base_url}/{ticker}"

        soup = self.__request_ticker()

        self.etf_ticker_body_soup = soup.find("div", {"id": "etf-ticker-body"})
```
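A side note on the line being replaced: as written, `user_agent if not user_agent else random.choice(self.user_agents)` keeps `user_agent` only when it is empty and ignores it when the caller supplies one, so the condition looks inverted. Below is a rough stdlib-only sketch of the selection order that was presumably intended, with the fixed browser string as a last resort. `pick_user_agent`, `build_request`, and `DEFAULT_USER_AGENT` are hypothetical names for illustration and are not part of the package.

```python
import random
from typing import Optional, Sequence
from urllib.request import Request

# The fixed browser User-Agent from the workaround above.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36"
)


def pick_user_agent(user_agent: Optional[str], pool: Sequence[str]) -> str:
    """Prefer the caller's value, then a random pool entry, then the default."""
    if user_agent:
        return user_agent
    if pool:
        return random.choice(pool)
    return DEFAULT_USER_AGENT


def build_request(
    ticker: str,
    user_agent: Optional[str] = None,
    pool: Sequence[str] = (),
) -> Request:
    """Build (but do not send) a request carrying browser-like headers."""
    headers = {
        "User-Agent": pick_user_agent(user_agent, pool),
        "Referer": "https://etfdb.com/etfs/QQQ",
    }
    return Request(f"https://etfdb.com/etf/{ticker}", headers=headers)


request = build_request("SPY")
print(request.full_url)
print(request.get_header("User-agent"))
```

Building the `Request` without sending it keeps the sketch side-effect free; passing it to `urllib.request.urlopen` would perform the actual fetch, and a 403 would surface there as `urllib.error.HTTPError`.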

lvxhnat added the "bug" (Something isn't working) label on Sep 16, 2024

lvxhnat self-assigned this Oct 15, 2024
