ingest fails with Cannot read properties of undefined (reading 'text')
#472
Comments
Hey @dmotxilla! 👋 I'm here to help you with any bugs, questions, or contributions you have. Let's squash those bugs together!
Here is the corrected environment configuration:
OPENAI_API_KEY=
# Update these with your pinecone details from your dashboard.
# PINECONE_INDEX_NAME is in the indexes tab under "index name" in blue
# PINECONE_ENVIRONMENT is in indexes tab under "Environment". Example: "us-east1-gcp"
PINECONE_API_KEY=
PINECONE_ENVIRONMENT=us-east1-gcp
PINECONE_INDEX_NAME=index1
Ensure that all the required environment variables (OPENAI_API_KEY, PINECONE_API_KEY, PINECONE_ENVIRONMENT, PINECONE_INDEX_NAME) are set. Additionally, verify the structure of the documents being processed. The script below logs the documents loaded by DirectoryLoader and the split documents so that their structure can be inspected:
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

const filePath = 'docs';

export const run = async () => {
  try {
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new PDFLoader(path),
    });
    const rawDocs = await directoryLoader.load();
    console.log('rawDocs', rawDocs); // Log rawDocs to inspect structure

    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const docs = await textSplitter.splitDocuments(rawDocs);
    console.log('split docs', docs); // Log split docs to inspect structure

    console.log('creating vector store...');
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME);
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
By following these steps, you should be able to identify and resolve the issue [2].
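
The advice above to make sure every required environment variable is set could be checked before ingestion starts. The following is a minimal sketch, not code from the repository: the helper name `missingEnvVars` and the constant `REQUIRED_ENV_VARS` are hypothetical, and the variable list is simply the four names shown in the environment configuration above.

```typescript
// Hypothetical helper, not part of gpt4-pdf-chatbot-langchain.
// The list mirrors the variables named in the environment configuration above.
const REQUIRED_ENV_VARS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
] as const;

// Returns the names of required variables that are unset or blank,
// in the same order as REQUIRED_ENV_VARS.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => !env[name]?.trim());
}
```

In the ingest script this could be called as `missingEnvVars(process.env)`, stopping with a clear message when the returned list is not empty, so a blank key surfaces before any Pinecone call is made.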

Thanks for the answer. Using the value "us-east1-gcp" as PINECONE_ENVIRONMENT works a bit better, but it still fails: the documents returned by textSplitter.splitDocuments(rawDocs) do not have a 'text' property. They have a

Here’s the updated code snippet:
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { PDFLoader } from 'langchain/document_loaders/fs/pdf';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

const filePath = 'docs';

export const run = async () => {
  try {
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new PDFLoader(path),
    });
    const rawDocs = await directoryLoader.load();
    console.log('rawDocs', rawDocs); // Log rawDocs to inspect structure

    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const docs = await textSplitter.splitDocuments(rawDocs);
    console.log('split docs', docs); // Log split docs to inspect structure

    console.log('creating vector store...');
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME);
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'pageContent', // Use 'pageContent' instead of 'text'
    });
  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
By following these steps, you should be able to resolve the issues and successfully ingest your data [1].
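
Rather than reading the logged documents by eye, the structure check suggested earlier in the thread could be automated. This is a rough sketch under stated assumptions: `DocLike` is a local stand-in for the shape of a LangChain document (a `pageContent` string plus `metadata`), and `findInvalidDocs` is a hypothetical helper, not a LangChain or repository function.

```typescript
// Local stand-in for the shape of a LangChain document (assumption:
// only the pageContent and metadata fields matter for this check).
interface DocLike {
  pageContent?: unknown;
  metadata?: Record<string, unknown>;
}

// Hypothetical helper: returns the indexes of documents whose pageContent
// is missing, not a string, or contains only whitespace.
function findInvalidDocs(docs: DocLike[]): number[] {
  const invalid: number[] = [];
  docs.forEach((doc, i) => {
    if (typeof doc.pageContent !== 'string' || doc.pageContent.trim() === '') {
      invalid.push(i);
    }
  });
  return invalid;
}
```

Running it on the split documents before calling PineconeStore.fromDocuments would show whether any chunk arrives without usable text; it does not by itself say anything about errors raised inside the Pinecone client.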

I asked Pinecone about it. Their reply: gpt4-pdf-chatbot-langchain needs to be updated to use the latest version of the Node.js SDK.

I am closing this issue, as the original subject was solved, and I have created a new ticket for the current issue.

The README says:
but I do not see an "environment" variable.
I created a Pinecone account, got the API key (which I use for OPENAI_API_KEY), and created an index named "index1" (which I use for PINECONE_INDEX_NAME) with default parameters. I see a "Region" parameter with value "us-east-1". Is that what I need to use for PINECONE_ENVIRONMENT?
But then the ingest fails with: Cannot read properties of undefined (reading 'text').