Load External Data to Vector Database #2

Open
ASahu16 opened this issue Apr 16, 2024 · 1 comment

ASahu16 commented Apr 16, 2024

Description: Implement functionality to load external data into the vector database. This involves developing scripts or tools to import data from various sources such as DOCX or PDF files and store them in the vector database.

Tasks:
- Develop a script/tool to parse data from DOCX/PDF files.
- Design a mechanism to transform the parsed data into vector representations.
- Implement logic to store the vectorized data in the database.
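The three tasks above can be prototyped with the standard library alone before committing to specific tools. The sketch below is an illustration under stated assumptions, not a proposed final design: `read_docx_paragraphs`, `toy_embed`, and `InMemoryVectorStore` are hypothetical names, the hashed bag-of-words vector is only a stand-in for a real embedding model, and the in-memory list is only a stand-in for a real vector database. It relies on the fact that a DOCX file is a ZIP archive whose body text lives in `word/document.xml`.

```python
import hashlib
import math
import re
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the tags inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_paragraphs(path):
    """Task 1: return the non-empty paragraphs of a DOCX file as strings."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text.strip())
    return paragraphs


def toy_embed(text, dim=64):
    """Task 2 (stand-in): hashed bag-of-words vector, L2-normalised.

    A real pipeline would call an embedding model here instead.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Task 3 (stand-in): keeps (vector, text) pairs in a Python list."""

    def __init__(self):
        self._rows = []

    def add(self, text):
        self._rows.append((toy_embed(text), text))

    def search(self, query, k=3):
        """Return the k best (score, text) pairs by cosine similarity."""
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), text)
            for vec, text in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

Usage would look like `store = InMemoryVectorStore()`, then `store.add(p)` for each paragraph returned by `read_docx_paragraphs("report.docx")` (a hypothetical file name), then `store.search("some question")`. Swapping in a real embedding model and database should only require replacing `toy_embed` and the store class.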

@aarushiksk

The steps that can be taken to solve this are:

Step 1) Parse the PDF/DOCX files using PyMuPDF (for text), OCR (for images), or similar Python libraries.
Step 2) Choose an embedding model to convert the parsed text into embeddings.
Step 3) Connect to ChromaDB or FAISS through their APIs, following their documentation.
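A rough sketch of how these steps might fit together for PDFs, assuming PyMuPDF (`pip install pymupdf`) and ChromaDB (`pip install chromadb`) are installed. The function names, chunk sizes, collection name, and database directory are placeholders I made up for illustration. For Step 2 this version does not pick a model explicitly: when a collection is created without an embedding function, ChromaDB falls back to its built-in default, so a chosen model would need to be plugged in there. OCR for scanned pages is not covered.

```python
def chunk_text(text, max_chars=500, overlap=50):
    """Split text into overlapping character windows.

    Whitespace is collapsed first so PDF line breaks do not
    fragment sentences. Returns [] for blank input.
    """
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    text = " ".join(text.split())
    if not text:
        return []
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks


def ingest_pdf(pdf_path, collection_name="documents", db_dir="./chroma_db"):
    """Parse a PDF page by page and store its text chunks in ChromaDB."""
    # Imported lazily so chunk_text stays usable without these packages.
    import chromadb
    import fitz  # PyMuPDF

    source = str(pdf_path)
    client = chromadb.PersistentClient(path=db_dir)
    # No embedding function is passed, so ChromaDB's default is used.
    collection = client.get_or_create_collection(name=collection_name)

    with fitz.open(source) as doc:
        for page_number, page in enumerate(doc, start=1):
            chunks = chunk_text(page.get_text())
            if not chunks:
                continue  # e.g. image-only pages, which would need OCR
            collection.add(
                ids=[f"{source}-p{page_number}-c{i}" for i in range(len(chunks))],
                documents=chunks,
                metadatas=[{"source": source, "page": page_number}] * len(chunks),
            )
    return collection
```

After ingestion, something like `ingest_pdf("sample.pdf").query(query_texts=["some question"], n_results=3)` should return the closest chunks (`sample.pdf` is a hypothetical file). FAISS would need the embeddings computed separately, since it stores only vectors and not the documents.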

Please assign this to me.
