Building Generative AI Applications Using StarRocks and Open Source Large Language Models #33603
Closed
Generative AI is a rapidly evolving field of artificial intelligence with the potential to change the way we live and work. One of its most promising areas is large language models (LLMs). LLMs are trained on massive datasets of text and code, and they can generate text, translate languages, write creative content, and answer questions.
A well-known example is ChatGPT, which became a global phenomenon thanks to its ability to generate human-quality text. It is only one of many ways that LLMs can be used to build AI-powered applications.
In this article, we discuss how you can apply LLMs to your private data to build AI-powered applications using StarRocks. We also walk through an example of a semantic search application built with Python, LangChain, OpenAI models, and StarRocks as a vector store, which answers natural language questions about StarRocks.
Semantic search is a type of search that understands the meaning and intent behind a query in order to retrieve relevant results. This is in contrast to traditional keyword search, which simply matches keywords in the query to keywords in the dataset.
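To make the contrast concrete, here is a minimal sketch of the keyword side only. The function name `keyword_search` and the two sample document titles are made up for illustration and are not taken from StarRocks or LangChain. It shows how a document on the same topic is missed when it shares no words with the query.

```python
import re

def keyword_search(query, documents):
    """Return the documents that share at least one word with the query."""
    query_words = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = []
    for doc in documents:
        doc_words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        if query_words & doc_words:
            hits.append(doc)
    return hits

# Hypothetical document titles; both are about getting data into a table.
documents = [
    "How to load data into a table",
    "Steps for ingesting files from object storage",
]

results = keyword_search("load data", documents)
print(results)
```

Only the first title is returned. The second covers the same topic but uses "ingesting" instead of "load", so a word-overlap match cannot find it. That is the gap semantic search is meant to close.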
To build a semantic search application using StarRocks, we can use the following steps:
Once the semantic search application is deployed, users can submit natural language queries to it. The application uses the machine learning model to generate a vector embedding for each query and compares it with the vector embeddings of the data stored in StarRocks. It then returns the results whose embeddings are closest to the query's embedding.
Semantic search is just one example of how LLMs can be used to build transformative AI-powered applications using StarRocks. Other use cases include recommendation systems, anomaly detection, and customer support chatbots.
What are vector embeddings?
Vector embeddings are numerical representations of data that capture the meaning of data points and the relationships between them. They are produced by a machine learning model trained on a large dataset of text or code. The model learns to represent each data point as a vector of numbers, so that similar data points are represented by vectors that are close to each other in the vector space.
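As a small illustration of "close to each other", the sketch below compares hand-written three-dimensional vectors using cosine similarity. The vectors are assumptions made up for this example. Real embeddings come from a model and typically have hundreds or thousands of dimensions, and cosine similarity is only one common choice of closeness measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: values near 1.0 mean similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy, hand-written vectors (NOT produced by a real embedding model).
embeddings = {
    "kitten": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.2, 0.9],
}

sim_close = cosine_similarity(embeddings["kitten"], embeddings["cat"])
sim_far = cosine_similarity(embeddings["kitten"], embeddings["invoice"])
print(f"kitten vs cat:     {sim_close:.3f}")
print(f"kitten vs invoice: {sim_far:.3f}")
```

With these toy values, the "kitten" and "cat" vectors score much higher than "kitten" and "invoice", which is the property a semantic search relies on.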
Here is an example of how vector embeddings can be used for semantic search:
Imagine that we have a dataset of product reviews. We can use an embedding model to generate a vector embedding for each review, so that the meaning and context of the review are captured as a vector of numbers.
Now, let's say that a user submits the following query to our semantic search application: "What is the best smartphone for photography?" The application generates a vector embedding for the query with the same model and compares it with the vector embeddings of the product reviews. It then returns the reviews whose embeddings are closest to the query's, which are the most relevant ones.
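The matching step in this example can be sketched as a nearest-neighbor ranking. Everything below is an assumption for illustration: the reviews, their three-dimensional vectors, the query vector, and the helper `top_k` are all made up. In the real application the vectors would come from the embedding model and the search would run against the vector store. Euclidean distance is used here as one possible closeness measure.

```python
import math

# (review text, toy embedding) pairs. The three dimensions loosely stand for
# "camera", "battery", and "price"; real embeddings are not interpretable this way.
reviews = [
    ("The camera takes stunning low-light photos", [0.9, 0.1, 0.1]),
    ("Battery lasts two full days", [0.1, 0.9, 0.1]),
    ("Great value for the price", [0.1, 0.1, 0.9]),
    ("Sharp pictures and a solid zoom lens", [0.8, 0.2, 0.2]),
]

def top_k(query_vector, rows, k):
    """Return the k review texts whose vectors are nearest to the query vector."""
    ranked = sorted(rows, key=lambda row: math.dist(query_vector, row[1]))
    return [text for text, _ in ranked[:k]]

# Hypothetical embedding of "What is the best smartphone for photography?"
query_vector = [0.85, 0.1, 0.15]

best = top_k(query_vector, reviews, k=2)
for text in best:
    print(text)
```

Both returned reviews are about the camera even though neither contains the words "smartphone" or "photography". The ranking depends only on vector distance, not on shared keywords.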
Semantic search for docs.starrocks.io
https://python.langchain.com/docs/integrations/vectorstores/starrocks credit to @dirtysalt
The example is meant to be run in a Python notebook environment. I used the notebook at this URL: https://raw.githubusercontent.com/langchain-ai/langchain/933655b4acd74d5d158271151be3def0b909db98/docs/extras/integrations/vectorstores/starrocks.ipynb
Additional pre-requirements:
Conclusion
In this article, we demonstrated how to use OpenAI APIs with help from LangChain, how to generate embeddings, and how to use StarRocks to query vector data. We also learned how to build a semantic search application that finds the answers that most closely match the intent behind a natural language query, rather than matching keywords in the dataset. Finally, we showed how StarRocks lets you bring machine learning models to your data.