openAIScientist is an R package that generates comprehensive scientific analysis using OpenAI's API. The analysis is output in markdown format, making it easy to integrate into various documentation workflows.
openAIScientist was designed to provide users with a quick overview and analysis of their datasets, helping them to understand and interpret their data more efficiently.
- Installation
- Other Dependencies
- Usage
- Setting Up Your API Key
- Documentation
- Disclaimer
- Contributing
- License
First, install the package along with its dependencies if you haven't already:
# install.packages("devtools")  # run once if devtools is not installed yet
library(devtools)
install_github("noluyorAbi/openAIScientist")
The openAIScientist package relies on the following R packages, which will be installed automatically:
- httr: For HTTP requests.
- utils: For utility functions like capturing output.
- readr: For reading and writing data.
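To check which of these dependencies are already available on a machine, a minimal sketch using base R's requireNamespace() could look like the following. The variable names deps, installed, and missing_deps are illustrative and not part of the package.

```r
# Packages named in the dependency list
deps <- c("httr", "utils", "readr")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_deps <- names(installed)[!installed]
if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
} else {
  message("All dependencies are available.")
}
```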
To use the openAIScientist package, follow these steps:
- Load the package.
- Load your API key from .Renviron.
- Use the openAIScientist_generate_scientific_analysis or openAIScientist_generate_visualization_rmd function to generate the analysis.
Parameters of openAIScientist_generate_scientific_analysis:
- data (mandatory): A data frame containing the dataset to analyze.
- api_key (mandatory): Your OpenAI API key as a string.
- output_name (optional): The name of the output markdown file (default is "Analysis").
- additional_prompt (optional): Additional instructions for the OpenAI API.
Parameters of openAIScientist_generate_visualization_rmd:
- data (mandatory): A data frame containing the dataset to analyze.
- api_key (mandatory): Your OpenAI API key as a string.
- output_name (optional): The name of the output RMarkdown file (default is "Visualization").
- additional_prompt (optional): Additional instructions for the OpenAI API.
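Since both functions take the same four arguments, it can be convenient to check them before spending an API call. The sketch below uses a hypothetical helper, check_analysis_args, that is not part of the package; the package may well perform its own checks, and treating a missing additional_prompt as NULL is an assumption made here for illustration.

```r
# Hypothetical pre-flight check for the arguments described above
check_analysis_args <- function(data, api_key, output_name = "Analysis",
                                additional_prompt = NULL) {
  # TRUE only for a single, non-missing character value
  is_string <- function(x) is.character(x) && length(x) == 1 && !is.na(x)

  if (!is.data.frame(data)) {
    stop("`data` must be a data frame.")
  }
  if (!is_string(api_key) || !nzchar(api_key)) {
    stop("`api_key` must be a non-empty string.")
  }
  if (!is_string(output_name)) {
    stop("`output_name` must be a single string.")
  }
  if (!is.null(additional_prompt) && !is_string(additional_prompt)) {
    stop("`additional_prompt` must be a single string or NULL.")
  }
  invisible(TRUE)
}

# Passes silently for well-formed input
check_analysis_args(data.frame(x = 1:3), "your_openai_api_key")
```

An empty key is the most likely failure in practice, because Sys.getenv() returns an empty string when a variable is not set.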
# Load the package
library(openAIScientist)
# Load environment variables from the .Renviron file
readRenviron(".Renviron")
# Example data
data <- data.frame(
  var1 = rnorm(100),
  var2 = rnorm(100),
  outcome = sample(c(0, 1), 100, replace = TRUE)
)
# Retrieve the API key from environment variables
api_key <- Sys.getenv("OPENAI_API_KEY")
# Generate scientific analysis
analysis <- openAIScientist_generate_scientific_analysis(data, api_key, "Analysis")
# Generate scientific analysis with additional prompt
analysis <- openAIScientist_generate_scientific_analysis(data, api_key, "Analysis-ADDITIONAL-PROMPT","Write the analysis in German")
# Generate visualization RMarkdown
visualization <- openAIScientist_generate_visualization_rmd(data, api_key, "Visualization")
# Generate visualization RMarkdown with additional prompt
visualization <- openAIScientist_generate_visualization_rmd(data, api_key, "Visualization-ADDITIONAL-PROMPT", "make the visualizations for red-green colorblind")
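The example data above is drawn at random, so every run sends a different dataset to the API. When comparing repeated analyses, it may help to fix the random seed first. The following is a minimal sketch; the helper name make_example_data and the seed value 42 are arbitrary choices for illustration.

```r
# Build the same example data on every call by fixing the random seed
make_example_data <- function(seed = 42) {  # seed value is arbitrary
  set.seed(seed)
  data.frame(
    var1 = rnorm(100),
    var2 = rnorm(100),
    outcome = sample(c(0, 1), 100, replace = TRUE)
  )
}

data <- make_example_data()

# Inspect the data before passing it to the analysis functions
str(data)
summary(data)
```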
To securely store and load your OpenAI API key, you should use the .Renviron file. This file allows you to set environment variables that R can access.
- Create/Edit .Renviron File:
  - Open your .Renviron file. If it doesn't exist, create it in your home directory or in the root of your project folder.
  - Add your OpenAI API key in the following format:
    OPENAI_API_KEY=your_openai_api_key
- Save and Reload Environment Variables:
  - Save the .Renviron file.
  - In R, use the following command to reload the environment variables (use ".Renviron" as the path instead if the file is in your project folder):
    readRenviron("~/.Renviron")
- Access the API Key in Your R Script:
  - Retrieve the API key using Sys.getenv as shown in the usage example above.
    api_key <- Sys.getenv("OPENAI_API_KEY")
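The steps above can be tried end to end without touching a real .Renviron file. The sketch below writes a temporary file in the same KEY=value format, loads it with readRenviron(), and reads the value back. The helper get_api_key and the variable name OPENAI_DEMO_KEY are hypothetical and used only for this demonstration.

```r
# Hypothetical helper: read a key from the environment and fail early if it is absent
get_api_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable ", var,
         " is not set; add it to .Renviron and call readRenviron().")
  }
  key
}

# A temporary file stands in for .Renviron so no real settings are changed
demo_file <- tempfile(fileext = ".Renviron")
writeLines("OPENAI_DEMO_KEY=your_openai_api_key", demo_file)

# Load the variables from the file into the current R session
readRenviron(demo_file)

# Retrieve the value that was just loaded
demo_key <- get_api_key("OPENAI_DEMO_KEY")
```

Failing early with a clear message avoids sending a request with an empty key, which is what Sys.getenv() silently returns when the variable is missing.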
While you can pass your API key directly as an argument to the openAIScientist_generate_scientific_analysis function, this is considered bad practice and results in “smelly” code, since a hard-coded key is easily leaked when the script is shared or committed to version control. Using environment variables via .Renviron is a more secure and clean approach.
# Directly pasting the API key as an argument (not recommended)
analysis <- openAIScientist_generate_scientific_analysis(data, "your_openai_api_key", "Analysis")
Using environment variables as demonstrated in the previous examples is the recommended approach.
For detailed documentation, please refer to the function documentation generated by Roxygen2. You can access the documentation within R:
?openAIScientist_generate_scientific_analysis
?openAIScientist_generate_visualization_rmd
The analysis is generated with GPT-4. The output can still contain inaccuracies and formatting issues, because the model's responses vary from run to run. If the formatting is off, try rerunning the analysis on the same dataset.
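Rerunning an analysis by hand can be wrapped in a small retry loop. The sketch below is generic base R and does not use any package internals: retry_until is a hypothetical helper, and the acceptance check is supplied by the caller. Each attempt would cost one API call, so max_tries is kept small.

```r
# Hypothetical helper: call generate() until is_ok() accepts the result
retry_until <- function(generate, is_ok, max_tries = 3) {
  stopifnot(max_tries >= 1)
  for (i in seq_len(max_tries)) {
    result <- generate()
    if (isTRUE(is_ok(result))) {
      return(result)
    }
  }
  warning("No acceptable result after ", max_tries, " tries; returning the last one.")
  result
}

# Possible usage, assuming the function returns the generated text
# (the return value is not stated in this README):
# analysis <- retry_until(
#   function() openAIScientist_generate_scientific_analysis(data, api_key, "Analysis"),
#   function(x) is.character(x) && length(x) > 0
# )
```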
If you find any issues or have suggestions for improvements, please create an issue or a pull request on GitHub.
This package is licensed under the GPL-3 License.
Made with ♥ by noluyorAbi for FortStaSoft @ LMU Munich, July 2024.