This is a project to analyze the 10-K filings of companies using LLMs

kompy99/Stock-Ticker-10K-Analyzer

Stock-Ticker-10K-Analyzer

This project uses LLMs to analyze the 10-K filings of companies and create visualizations of the results.

Stocks Dropdown List

Stock Visualizer sample

Technology Stack

Vector Database: ChromaDB

ChromaDB was selected because it is a popular open-source vector database that supports Collections and stores documents alongside their embeddings. These two features were the main reasons for choosing it.
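To make the two features concrete, here is a minimal sketch of how one filing's chunks could be mapped onto a collection. The helper names (`build_records`, `store_chunks`), the id scheme, and the metadata fields are assumptions for illustration, not the project's actual code. The `collection` argument is expected to behave like a ChromaDB collection, whose `add` method accepts `ids`, `documents`, and `metadatas`.

```python
from typing import Any


def build_records(ticker: str, year: int, chunks: list[str]) -> dict[str, list[Any]]:
    """Build ids, documents, and metadata for one filing's chunks.

    The id format and metadata keys are hypothetical choices.
    """
    ids = [f"{ticker}-{year}-{i}" for i in range(len(chunks))]
    metadatas = [{"ticker": ticker, "year": year, "chunk": i} for i in range(len(chunks))]
    return {"ids": ids, "documents": list(chunks), "metadatas": metadatas}


def store_chunks(collection: Any, ticker: str, year: int, chunks: list[str]) -> int:
    """Add one filing's chunks to a ChromaDB-style collection.

    When no embeddings are passed, ChromaDB computes them with the
    collection's embedding function and stores them with the documents.
    Returns the number of chunks stored.
    """
    records = build_records(ticker, year, chunks)
    if records["ids"]:  # skip the call entirely for an empty filing
        collection.add(**records)
    return len(records["ids"])
```

With the `chromadb` package installed, a collection could be obtained with `chromadb.PersistentClient(path="chroma_store").get_or_create_collection("AAPL")`; the path and collection name here are placeholders.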

LLM Provider: Azure OpenAI

Azure OpenAI was chosen as the LLM provider because free Azure credits are available to students. These credits can be used to deploy and run an LLM of choice on Azure.

LLM: GPT-35-turbo-16k

Taking into account both cost and context length, the GPT-3.5 model with a 16k-token context window was chosen for this project.
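The context length matters because the retrieved filing excerpts, the prompt, and the answer must all fit in the window. The sketch below shows one way to keep retrieved chunks within a budget. The 4-characters-per-token estimate is a rough rule of thumb for English text, the 16,000 figure is an approximation of the window, and all names are hypothetical; a real tokenizer would give exact counts.

```python
CONTEXT_WINDOW = 16_000   # approximate window size in tokens (assumption)
CHARS_PER_TOKEN = 4       # rough heuristic, not an exact tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def select_chunks(
    chunks: list[str],
    prompt_tokens: int,
    reserved_for_answer: int = 500,
    window: int = CONTEXT_WINDOW,
) -> list[str]:
    """Keep chunks, in ranked order, until the token budget is used up."""
    budget = window - prompt_tokens - reserved_for_answer
    selected = []
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if cost > budget:
            break  # stop at the first chunk that no longer fits
        selected.append(chunk)
        budget -= cost
    return selected
```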

Backend: Flask in Python

The data was downloaded using the sec_edgar_downloader library, and the ChromaDB and OpenAI SDKs were used for data processing and analysis, all in Python. Since everything else was already in Python, it was easy to build a simple web application with Flask.

Code Structure

Front end:

Back end:

Workflow

index.html and app.py

  1. A new ticker symbol is entered into the textbox on the dashboard, and submitted.

Adding new ticker

Submitted new ticker

ingest.py

  1. The 10-K filings for the company from 1995 to 2023 (whichever are available, given the company's IPO date) are downloaded.
  2. The filings are pre-processed.
  3. Each filing is chunked, converted into embeddings, and stored in ChromaDB.
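The chunking in step 3 can be sketched as a fixed-size split with overlap. This is an illustrative assumption: the README does not say how ingest.py splits filings, and the function name, chunk size, and overlap below are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character chunks that overlap by `overlap` characters.

    Overlap keeps a sentence that straddles a boundary visible in both chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks
```

Each returned chunk would then be embedded and stored, for example one ChromaDB document per chunk.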

analytics.py

  1. For each year, the required metrics (e.g., revenue, income, earnings per share) are extracted through Retrieval-Augmented Generation (RAG).
  2. The metrics are stored as a JSON file.
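The two steps above can be sketched as a loop over years and metrics. This is a sketch under stated assumptions, not the contents of analytics.py: the function names, prompt wording, and JSON layout are hypothetical, and the retriever and LLM are passed in as plain callables so the example does not depend on any particular SDK.

```python
import json
from pathlib import Path
from typing import Callable, Iterable

METRICS = ("revenue", "income", "earnings per share")


def extract_metrics(
    ticker: str,
    years: Iterable[int],
    retrieve: Callable[[str, int], list[str]],  # e.g. a vector-store query (assumption)
    ask_llm: Callable[[str], str],              # e.g. a chat-completion call (assumption)
) -> dict[str, dict[str, str]]:
    """For each year and metric: retrieve context, then ask the LLM."""
    results: dict[str, dict[str, str]] = {}
    for year in years:
        results[str(year)] = {}
        for metric in METRICS:
            question = f"What was the {metric} of {ticker} in fiscal year {year}?"
            context = retrieve(question, year)
            prompt = (
                "Answer using only the context below.\n\n"
                + "\n---\n".join(context)
                + f"\n\nQuestion: {question}"
            )
            results[str(year)][metric] = ask_llm(prompt)
    return results


def save_metrics(ticker: str, metrics: dict, out_dir: str) -> Path:
    """Write the metrics for one ticker to <out_dir>/<ticker>.json."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{ticker}.json"
    path.write_text(json.dumps(metrics, indent=2), encoding="utf-8")
    return path
```

Passing the retriever and the model as callables also makes the loop easy to test with stand-ins before spending any API credits.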

The ticker can then be selected from the drop-down on the dashboard, and the metrics can be visualized.

New ticker added

The metrics for the new ticker can be visualized only after the data has been downloaded and processed.
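One simple way to enforce this is to list only the tickers whose metrics file already exists. The sketch below assumes one non-empty `<TICKER>.json` file per processed ticker in a single directory; that layout and the function name are hypothetical and not confirmed by the repository.

```python
from pathlib import Path


def available_tickers(metrics_dir: str) -> list[str]:
    """Return the tickers that have a non-empty metrics JSON file, sorted."""
    directory = Path(metrics_dir)
    if not directory.is_dir():
        return []  # nothing has been processed yet
    return sorted(
        path.stem
        for path in directory.glob("*.json")
        if path.stat().st_size > 0
    )
```

A ticker that is still being downloaded or analyzed has no finished file yet, so it would not appear in the drop-down.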

(Tested on a Mac M2: generating the metrics for a new ticker takes about 2 hours.)
