Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge remote-tracking branch 'origin/main' into whoisjones/improved_naming_for_sampling
# Conflicts:
# tests/test_dataset_generator.py
Showing 30 changed files with 77 additions and 74 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,6 @@ | ||
<h1 align="center">Dataset Generator</h1> | ||
![Fabricator Logo](resources/logo_fabricator.drawio_dark.png#gh-dark-mode-only) | ||
![Fabricator Logo](resources/logo_fabricator.drawio_white.png#gh-light-mode-only) | ||
|
||
<p align="center">A flexible open-source framework to generate datasets with large language models.</p> | ||
<p align="center"> | ||
<img alt="version" src="https://img.shields.io/badge/version-0.1-green"> | ||
|
@@ -8,8 +10,8 @@ | |
<div align="center"> | ||
<hr> | ||
|
||
[Installation](#installation) - [Basic Concepts](#basic-concepts) - [Examples](#examples) - [Tutorials](tutorials/TUTORIAL-1_OVERVIEW.md) - | ||
Paper - [Citation](#citation) | ||
[Installation](#installation) | [Basic Concepts](#basic-concepts) | [Examples](#examples) | [Tutorials](tutorials/TUTORIAL-1_OVERVIEW.md) | | ||
Paper | [Citation](#citation) | ||
|
||
<hr> | ||
</div> | ||
|
@@ -30,10 +32,10 @@ prompt customization, integration and sampling of fewshot examples or annotation | |
## Installation | ||
Using conda: | ||
``` | ||
git clone git@github.com:flairNLP/ai-dataset-generator.git | ||
cd ai-dataset-generator | ||
conda create -y -n aidatasetgenerator python=3.10 | ||
conda activate aidatasetgenerator | ||
git clone git@github.com:flairNLP/fabricator.git | ||
cd fabricator | ||
conda create -y -n fabricator python=3.10 | ||
conda activate fabricator | ||
pip install -e . | ||
``` | ||
|
||
|
@@ -45,10 +47,9 @@ we need four basic modules: a dataset, a prompt, a language model and a generato | |
unlabeled datasets and store the generated or annotated datasets with their `Dataset` class. Once | ||
created, you can share the dataset with others via the hub or use it for your model training. | ||
- <b>Prompt</b>: A prompt is the instruction made to the language model. It can be a simple sentence or a more complex | ||
template with placeholders. We utilize [langchain](https://github.com/langchain-ai/langchain) `PromptTemplate` classes | ||
and provide an easy interface for custom dataset generation prompts in which you can specify label options | ||
for the LLM to choose from, provide fewshot examples to support the prompt with or annotate an unlabeled dataset | ||
in a specific way. | ||
template with placeholders. We provide an easy interface for custom dataset generation prompts in which you can specify | ||
label options for the LLM to choose from, provide fewshot examples to support the prompt with or annotate an unlabeled | ||
dataset in a specific way. | ||
- <b>LLM</b>: We use [deepset's haystack library](https://github.com/deepset-ai/haystack) as our LLM interface. deepset | ||
supports a wide range of LLMs including OpenAI, all models from the HuggingFace model hub and many more. | ||
- <b>Generator</b>: The generator is the core of this framework. It takes a dataset, a prompt and a LLM and generates a | ||
|
@@ -64,8 +65,8 @@ as that: | |
```python | ||
import os | ||
from haystack.nodes import PromptNode | ||
from ai_dataset_generator import DatasetGenerator | ||
from ai_dataset_generator.prompts import BasePrompt | ||
from fabricator import DatasetGenerator | ||
from fabricator.prompts import BasePrompt | ||
|
||
prompt = BasePrompt( | ||
task_description="Generate a short movie review.", | ||
|
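The README hunks above replace the package name `ai_dataset_generator` with `fabricator` in the import statements. For code that still uses the old name, the same substitution could be scripted. The following is a minimal sketch under that assumption: `rename_imports` is a hypothetical helper and not part of the repository, and it only touches `import` and `from` lines.

```python
import re

# Hypothetical helper, not part of the repository: rewrites import statements
# that reference the old package name so they point at the new one.
OLD_PACKAGE = "ai_dataset_generator"
NEW_PACKAGE = "fabricator"

# Match the old package name only when it directly follows `import` or `from`
# at the start of a line and is not the prefix of a longer identifier.
_IMPORT_PATTERN = re.compile(
    rf"^(\s*(?:from|import)\s+){re.escape(OLD_PACKAGE)}(?=[\s.]|$)",
    flags=re.MULTILINE,
)


def rename_imports(source: str) -> str:
    """Return `source` with old-package imports pointing at the new package."""
    return _IMPORT_PATTERN.sub(rf"\g<1>{NEW_PACKAGE}", source)


# The two import lines changed in the README diff.
old_snippet = (
    "from ai_dataset_generator import DatasetGenerator\n"
    "from ai_dataset_generator.prompts import BasePrompt\n"
)
new_snippet = rename_imports(old_snippet)
print(new_snippet)
```

String literals and comments that mention the old name are deliberately left alone, so any remaining references would still need a manual review.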
File renamed without changes.