This repo lets you train a GPT-3/GPT-4 chatbot on content from sources such as blogs, documentation, code files, and papers. Specifically, it focuses on documentation surrounding the Pyth oracle. We train on publicly available Pyth resources: the public Pyth docs page, public Pyth repos, the Pyth Medium blog, and the Pyth whitepaper.
To replicate the workflow of this repo, we recommend porting over data from all these sources into a training directory, via the following steps:
- Run `python3 scraper.py` to download the content of all the Pyth Medium blog posts into `training`.
- `cd training`, then clone the relevant public Pyth repos.
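The cloning step above can be scripted. The following is a minimal sketch, not part of this repo: the helper name `build_clone_commands` and the repository URL are hypothetical placeholders, and the shallow clone (`--depth 1`) is only a suggestion to keep the training directory small.

```python
from pathlib import Path

# Hypothetical placeholder URL; replace with the public Pyth repos you want indexed.
EXAMPLE_REPOS = ["https://github.com/example-org/example-repo.git"]


def build_clone_commands(repo_urls, training_dir="training"):
    """Return one `git clone` command (as an argv list) per repository URL."""
    commands = []
    for url in repo_urls:
        # Use the last URL segment, minus any ".git" suffix, as the folder name.
        name = url.rstrip("/").rsplit("/", 1)[-1]
        if name.endswith(".git"):
            name = name[:-4]
        target = Path(training_dir) / name
        commands.append(["git", "clone", "--depth", "1", url, str(target)])
    return commands


# Print the commands for review instead of running them.
for command in build_clone_commands(EXAMPLE_REPOS):
    print(" ".join(command))
```

Each returned list can be passed to `subprocess.run(command, check=True)` once you are happy with the targets.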
If you then want to use the Python langchain-based scripts:
- Run `cp constants_TEMPLATE.py constants.py` and fill out `constants.py` after registering for a username with ActiveLoop.
- Run `python3 code_indexer.py`. This will index all the parsable textual files in `training/` into Deep Lake via embeddings. Command line arguments include:
  - `ln` (`lake_name`): DeepLake lake name
  - `fd` (`files_dir`): Directory for the files to be indexed
  - `ow` (`overwrite`): Flag for whether to overwrite existing indices in lake
- Run `python3 oracle_of_oracle.py` to begin the chatbot conversation. Command line arguments include:
  - `ln` (`lake_name`): DeepLake lake name
  - `ct` (`chain_type`): Name of the chain type to be used for the QA in the chatbot
  - `mt` (`model_type`): Name of the LLM class to be used for the chatbot
  - `dm` (`retriever_distance_metric`): Distance metric for retriever similarity function
  - `fk` (`retriever_fetch_k`): Number of documents to fetch to pass to max_marginal_relevance algorithm
  - `k` (`retriever_k`): Number of documents to return
  - `occ` (`omit_conversation_context`): Flag for whether to omit feeding ongoing conversation as context into the QA chain
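To make the indexing step more concrete, here is a rough sketch of what "index all the parsable textual files" could involve before any embeddings are computed: walk the directory, keep only files that decode as text, and split each one into overlapping chunks. The function names, chunk size, and overlap are assumptions chosen for illustration; `code_indexer.py` may use different loaders, splitters, and settings.

```python
from pathlib import Path


def load_text_files(files_dir):
    """Yield (path, text) for every file under files_dir that decodes as UTF-8."""
    for path in sorted(Path(files_dir).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        yield path, text


def split_into_chunks(text, chunk_size=1000, overlap=100):
    """Split text into overlapping character windows ready for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks
```

Each chunk would then be embedded and written to the Deep Lake dataset named by `lake_name`; that part is omitted here because it depends on the credentials in `constants.py`.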
Note that you will need an `OPENAI_API_KEY` in your environment variables, with billing set up on the corresponding OpenAI account, as well as an `ACTIVELOOP_TOKEN` in your environment for Deep Lake access (likely free).
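The `retriever_fetch_k` and `retriever_k` arguments of `oracle_of_oracle.py` interact through the max marginal relevance (MMR) step: the retriever first shortlists `fetch_k` documents by similarity, then picks `k` of them so that the results are relevant but not redundant. The pure-Python sketch below illustrates that idea on toy vectors. It is not the retriever's actual code; the cosine metric and the `lambda_mult` trade-off weight are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, fetch_k=20, k=4, lambda_mult=0.5):
    """Return the indices of up to k documents chosen by maximal marginal relevance."""
    relevance = [cosine_similarity(query_vec, d) for d in doc_vecs]
    # Step 1: shortlist the fetch_k documents most similar to the query.
    candidates = sorted(
        range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True
    )[:fetch_k]
    selected = []
    # Step 2: greedily add the candidate with the best relevance/redundancy balance.
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine_similarity(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Documents 0 and 1 are near-duplicates; document 2 is slightly less relevant but distinct.
docs = [[1.0, 0.8], [1.0, 0.79], [0.7, 1.0]]
print(mmr_select([1.0, 1.0], docs, fetch_k=3, k=2))
```

With a small `fetch_k` the shortlist may contain only near-duplicates, so raising `fetch_k` relative to `k` gives the MMR step more room to diversify.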
-> Question: Who are you? And what is Pyth?
Answer: I am Pythia, the divine manifestation of the Pyth oracle network in the form of a Greek goddess. Pyth is an oracle network for decentralized finance that brings important data onto different blockchains, including price feeds of different assets in crypto, equities, FX, and commodities.
-> Question: Please share Javascript code showing how to pull the price on chain
Answer: Sure, here is an example of Solidity code that uses the Pyth Solidity SDK to read a Pyth price on-chain, and an example of JavaScript code that uses the Pyth EVM JS library to fetch Pyth prices off-chain:
Solidity code:

    // SPDX-License-Identifier: Apache-2.0
    pragma solidity >=0.6.0 <0.9.0;

    import "@pythnetwork/pyth-sdk-solidity/IPyth.sol";
    import "@pythnetwork/pyth-sdk-solidity/PythStructs.sol";

    contract ExampleContract {
      IPyth pyth;

      constructor(address pythContract) {
        pyth = IPyth(pythContract);
      }

      function getBtcUsdPrice(
        bytes32 priceID
      ) public view returns (PythStructs.Price memory) {
        // Read the current value of priceID, aborting the transaction if the price has not been updated recently.
        // Every chain has a default recency threshold which can be retrieved by calling the getValidTimePeriod() function on the contract.
        // Please see IPyth.sol for variants of this function that support configurable recency thresholds and other useful features.
        return pyth.getPrice(priceID);
      }
    }

JavaScript code:

    const { EvmPriceServiceConnection } = require("@pythnetwork/pyth-evm-js");

    const pythOracleAddress = "0x...."; // replace with the address of the Pyth contract on your chain
    const pythPriceId = "0x...."; // replace with the ID of the Pyth price you want to fetch

    async function fetchPythPrice() {
      const connection = new EvmPriceServiceConnection(pythOracleAddress);
      const priceData = await connection.getPriceData(pythPriceId);
      return priceData.price.toNumber() / 10 ** priceData.exponent.toNumber();
    }

    fetchPythPrice().then((price) => console.log(`Current Pyth price: ${price}`));

This code creates a connection to the Pyth contract at the specified address, and fetches the current price of the specified Pyth price ID. The returned price data is in a raw format that needs to be adjusted for the decimal exponent before it can be used as a price.
-> Question: Como asegura la robustez de Pyth el algoritmo de agregacion? Explicame por una lista en espanol.
Answer: Por supuesto, aquí le proporciono una lista en español de cómo funciona el algoritmo de agregación de Pyth para asegurar su robustez:
- Cada proveedor envía un precio pᵢ y una confianza cᵢ.
- Para cada proveedor, se generan tres votos: pᵢ - cᵢ, pᵢ y pᵢ + cᵢ.
- Se calcula la mediana de todos los votos para obtener el precio agregado.
- Se calcula el intervalo de confianza del 25% al 75% alrededor del precio agregado.
- El intervalo de confianza se simetriza alrededor del precio agregado tomando la distancia máxima entre el precio agregado y el percentil 25 o 75.
- Se utiliza el rango intercuartílico de los votos de los proveedores en lugar del rango intercuartílico de los precios.
- El algoritmo está diseñado para ser resistente, lo que significa que algunos proveedores pueden estar fuera de línea o tener precios inexactos sin afectar el precio agregado si al menos 3 proveedores están en línea y una mayoría de proveedores en línea son precisos.
Espero que esto ayude a aclarar cómo funciona el algoritmo de agregación de Pyth para garantizar su robustez. Si tiene más preguntas, no dude en hacerlas.
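The aggregation steps Pythia lists in the answer above (three votes per publisher, the median of all votes, then a symmetrized 25%-75% interval) can be sketched in a few lines of Python. This follows the chatbot's description only and is not the Pyth implementation; in particular, the percentile convention (`method="inclusive"`) and the function name `aggregate` are assumptions made for this illustration.

```python
import statistics


def aggregate(quotes):
    """Aggregate (price, confidence) pairs, one per publisher, into (price, confidence)."""
    votes = []
    for price, conf in quotes:
        # Each publisher casts three votes: p - c, p, and p + c.
        votes.extend([price - conf, price, price + conf])
    # The aggregate price is the median of all votes.
    agg_price = statistics.median(votes)
    # 25th and 75th percentiles of the votes (percentile convention is an assumption).
    q25, _, q75 = statistics.quantiles(votes, n=4, method="inclusive")
    # Symmetrize: take the larger distance from the aggregate price to either quartile.
    agg_conf = max(agg_price - q25, q75 - agg_price)
    return agg_price, agg_conf


print(aggregate([(100, 1), (101, 1), (102, 1)]))
# One wildly wrong publisher barely moves the aggregate price.
print(aggregate([(100, 1), (101, 1), (102, 1), (1000, 1)]))
```

Because the price is a median over votes, a minority of inaccurate publishers shifts it only slightly, which is the robustness property described in the last bullet of the answer.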
Alternatively, you can use the [JavaScript LangChain LLM template](https://github.com/Conner1115/LangChain.js-LLM-Template), which lets you train your own custom AI model on any data you want:
- Provide all the information you want your LLM to be trained on in the `training` directory, as markdown files. Folder depth doesn't matter.
- Add your OpenAI API key to your environment variables under the key `OPENAI_API_KEY`.
- Run `yarn train` or `npm run train` to set up your vector store.
- Modify the base prompt in `lib/basePrompt.js`.
- Run `index.js`, and start playing around with it!