Skip to content

Commit

Permalink
Merge from main
Browse files Browse the repository at this point in the history
  • Loading branch information
Kostis-S-Z committed Dec 3, 2024
2 parents be82900 + ab1c6c8 commit 0dd8b78
Show file tree
Hide file tree
Showing 27 changed files with 500 additions and 40 deletions.
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -163,3 +163,6 @@ cython_debug/

# Generated audio files
*.wav

# VS files
.vscode
100 changes: 85 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,30 +1,100 @@
<img src="./images/Blueprints-logo.png" alt="Project Logo" style="width:25%;">
[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
<p align="center"><img src="./images/Blueprints-logo.png" width="25%" alt="Project logo"/></p>

# Blueprint Title
# Document-to-podcast: a Blueprint by Mozilla.ai for generating podcasts from documents using local AI

This blueprint guides you to ...
This blueprint demonstrate how you can use open-source models & tools to convert input documents into a podcast featuring two speakers.
It is designed to work on most local setups or with [GitHub Codespaces](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb), meaning no external API calls or GPU access is required. This makes it more accessible and privacy-friendly by keeping everything local.

![Blueprint Diagram](./images/blueprint-diagram.png)
### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/document-to-podcast/).

## Quick-start

Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.

---

### **Option 1: GitHub Codespaces**

The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)

Once the Codespaces environment launches, follow these steps:

1. **Install Dependencies**
Inside the Codespaces terminal, run:
```bash
pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
2. **Run the Demo**
Inside the Codespaces terminal, start the Streamlit demo by running:
```bash
python -m streamlit run demo/app.py
```

### **Option 2: Local Installation**

1. **Clone the Repository**
Inside the Codespaces terminal, run:
```bash
git clone https://github.com/mozilla-ai/document-to-podcast.git
cd document-to-podcast
```

2. **Install Dependencies**
Inside the terminal, run:
```bash
pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
3. **Run the Demo**
Inside the terminal, start the Streamlit demo by running:
```bash
python -m streamlit run demo/app.py
```

## How it Works

<img src="./images/document-to-podcast-diagram.png" width="1200" />


1. **Document Upload**
Start by uploading a document in a supported format (e.g., PDF, .txt, or .docx).

2. **Document Pre-Processing**
The uploaded document is processed to extract and clean the text. This involves:
- Extracting readable text from the document.
- Removing noise such as URLs, email addresses, and special characters to ensure the text is clean and structured.

3. **Script Generation**
The cleaned text is passed to a language model to generate a podcast transcript in the form of a conversation between two speakers.
- **Model Loading**: The system selects and loads a pre-trained LLM optimized for running locally, using the llama_cpp library. This enables the model to run efficiently on CPUs, making them more accessible and suitable for local setups.
- **Customizable Prompt**: A user-defined "system prompt" guides the LLM in shaping the conversation, specifying tone, content, speaker interaction, and format.
- **Output Transcript**: The model generates a podcast script in structured format, with each speaker's dialogue clearly labeled.
Example output:
```json
{
"Speaker 1": "Welcome to the podcast on AI advancements.",
"Speaker 2": "Thank you! So what's new this week for the latest AI trends?",
"Speaker 1": "Where should I start.. Lots has been happening!",
...
}
```
This step ensures that the podcast script is engaging, relevant, and ready for audio conversion.

4. **Audio Generation**
- The generated transcript is converted into audio using a Text-to-Speech (TTS) model.
- Each speaker is assigned a distinct voice.
- The final output is saved as an audio file in formats like MP3 or WAV.

## Pre-requisites

- **System requirements**:
- OS: Windows, macOS, or Linux
- Python 3.10 or higher
- Minimum RAM: 4 GB
- Disk space: 1 GB minimum
- Minimum RAM: 16 GB
- Disk space: 32 GB minimum

- **Dependencies**:
- Dependencies listed in `requirements.txt`

## Installation

---

## Quick-start

---
- Dependencies listed in `pyproject.toml`

## License

Expand Down
10 changes: 5 additions & 5 deletions demo/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,13 +3,13 @@

import streamlit as st

from opennotebookllm.preprocessing import DATA_LOADERS, DATA_CLEANERS
from opennotebookllm.inference.model_loaders import (
from document_to_podcast.preprocessing import DATA_LOADERS, DATA_CLEANERS
from document_to_podcast.inference.model_loaders import (
load_llama_cpp_model,
load_parler_tts_model_and_tokenizer,
)
from opennotebookllm.inference.text_to_speech import _speech_generation_parler
from opennotebookllm.inference.text_to_text import text_to_text_stream
from document_to_podcast.inference.text_to_speech import text_to_speech
from document_to_podcast.inference.text_to_text import text_to_text_stream


PODCAST_PROMPT = """
Expand Down Expand Up @@ -112,7 +112,7 @@ def load_text_to_speech_model_and_tokenizer():
st.write(text)
speaker_id = re.search(r"Speaker (\d+)", text).group(1)
with st.spinner("Generating Audio..."):
speech = _speech_generation_parler(
speech = text_to_speech(
text.split(f'"Speaker {speaker_id}":')[-1],
speech_model,
speech_tokenizer,
Expand Down
6 changes: 3 additions & 3 deletions docs/api.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# API Reference

::: opennotebookllm.preprocessing.data_cleaners
::: document_to_podcast.preprocessing.data_cleaners

::: opennotebookllm.inference.model_loaders
::: document_to_podcast.inference.model_loaders

::: opennotebookllm.inference.text_to_text
::: document_to_podcast.inference.text_to_text
7 changes: 7 additions & 0 deletions docs/assets/custom.css
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
@import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');

:root {
--md-default-font: "Inter", sans-serif;
--md-code-font: "Fira Code", monospace;
--md-primary-font: "Inter", sans-serif;
}
89 changes: 89 additions & 0 deletions docs/customization.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
# 🎨 **Customization Guide**

The Document-to-Podcast Blueprint is designed to be flexible and easily adaptable to your specific needs. This guide will walk you through some key areas you can customize to make the Blueprint your own.

---

## 🧠 **Changing the Text-to-Text Model**
You can swap the language model used for generating podcast scripts to suit your needs, such as using a smaller model for faster processing or a larger one for higher quality outputs.

Customizing the app:

1. Open the `app.py` file.
2. Locate the `load_text_to_text_model` function.
3. Replace the `model_id` with the ID of your desired model from a supported repository (e.g., Hugging Face). Note: The model repository must be in GGFUF format, for example: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`

Example:

```python
@st.cache_resource
def load_text_to_text_model():
return load_llama_cpp_model(
model_id="Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
```


## 📝 **Modifying the Text Generation Prompt**
The system prompt defines the structure and tone of the generated script. Customizing this can allow you to generate conversations that align with your project’s needs.

Customizing the app:

1. Open the `app.py` file.
2. Locate the PODCAST_PROMPT variable.
3. Edit the instructions to suit your desired conversation style.

Example:

```python
PODCAST_PROMPT = """
You are a radio show scriptwriter generating lively and humorous dialogues.
Speaker 1: A comedian who is interested in learning new things.
Speaker 2: A scientist explaining concepts in a fun way.
"""
```


## 🎙️ **Customizing Speaker Descriptions**
Adjusting the speaker profiles allows you to create distinct and engaging voices for your podcast.

Customizing the app:

1. Open the `app.py` file.
2. Locate the SPEAKER_DESCRIPTIONS dictionary.
3. Update the descriptions to define new voice characteristics for each speaker
Example:

```python
PODCAST_PROMPT = """
SPEAKER_DESCRIPTIONS = {
"1": "A cheerful and animated voice with a fast-paced delivery.",
"2": "A calm and deep voice, speaking with authority and warmth."
}
"""
```


## 🧠 **Changing the Text-to-Speech Model**
You can use a different TTS model to achieve specific voice styles or improve performance.

Customizing the app:

1. Open the `app.py` file.
2. Locate the `load_text_to_speech_model_and_tokenizer` function.
3. Replace the model_id with your preferred TTS model.

Example:
```python
@st.cache_resource
def load_text_to_speech_model_and_tokenizer():
return load_parler_tts_model_and_tokenizer(
"parler-tts/parler-tts-mini-expresso", "cpu")

## 💡 Other Customization Ideas

- Add Multiple Speakers: Modify `script_to_audio.py` to include additional speakers in your podcast.


## 🤝 **Contributing to the Blueprint**

Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](../future-features-contributions)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
30 changes: 30 additions & 0 deletions docs/future-features-contributions.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# 🚀 **Future Features & Contributions**

The Document-to-Podcast Blueprint is an evolving project designed to grow with the help of the open-source community. Whether you’re an experienced developer or just starting, there are many ways you can contribute and help shape the future of this tool.

---
## 🛠️ **This Page is Evolving**
As the community grows, we’ll use this space to highlight contributions, showcase new ideas, and share guidance on expanding the Blueprint ecosystem.

We have some ideas of how this Blueprint can be extend and improved, will be sharing these ideas and request for contributions shortly.

---

## 🌟 **How You Can Contribute**

### 💡 **Share Your Ideas**
Got a vision for how this Blueprint could be improved? Share your suggestions through [GitHub Discussions](https://github.com/mozilla-ai/document-to-podcast/discussions). Your insights can help inspire new directions for the project.

### 🛠️ **Enhance the Code**
Dive into the codebase and contribute enhancements, optimizations, or bug fixes. Whether it's a small tweak or a big feature, every contribution helps! Start by checking our Contribution Guide (coming soon).


### 🌍 **Build New Blueprints**
This project is part of a larger initiative to create a collection of reusable starter code solutions that use open-source AI tools. If you’re inspired to create your own Blueprint, we’d love to see it!

---

## 🤝 **Get Involved**
- Visit our [GitHub Discussions](https://github.com/mozilla-ai/document-to-podcast/discussions) to explore ongoing conversations and share your thoughts.

Your contributions help make this Blueprint better for everyone. Thank you for being part of the journey! 🎉
49 changes: 49 additions & 0 deletions docs/getting-started.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.

---

### ☁️ **Option 1: GitHub Codespaces**

The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)

Once the Codespaces environment launches, follow these steps:

1. **Install Dependencies**
Inside the Codespaces terminal, run:
```bash
pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```
2. **Run the Demo**
Inside the Codespaces terminal, start the Streamlit demo by running:
```bash
python -m streamlit run demo/app.py
```


### 💻 **Option 2: Local Installation**

1. **Clone the Repository**
Inside the Codespaces terminal, run:

```bash
git clone https://github.com/mozilla-ai/document-to-podcast.git
cd document-to-podcast
```


2. **Install Dependencies**
Inside the terminal, run:


```bash
pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```

3. **Run the Demo**
Inside the terminal, start the Streamlit demo by running:

```bash
python -m streamlit run demo/app.py
```
Binary file added docs/images/Blueprints-logo.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/images/document-to-podcast-diagram.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
38 changes: 37 additions & 1 deletion docs/index.md
Original file line number Diff line number Diff line change
@@ -1 +1,37 @@
# Wellcome to Blueprint docs
# **Document-to-Podcast Blueprint**

<div style="text-align: center;">
<img src="../images/document-to-podcast-diagram.png" alt="Project Logo" style="width: 100%; margin-bottom: 1px; margin-top: 1px;">
</div>

Blueprints empower developers to easily integrate AI capabilities into their projects using open-source models and tools.

These docs are your your companion to mastering the **Document-to-Podcast Blueprint**—a local-first approach for transforming documents into engaging podcasts.

---

### 🚀 **Get Started Quickly**
#### _Start building your own Document-to-Podcast pipeline in minutes:_
- **[Getting Started](getting-started.md):** Quick setup and installation instructions.

### 🔍 **Understand the System**
#### _Dive deeper into how the Blueprint works:_
- **[Step-by-Step Guide](step-by-step-guide.md):** A detailed breakdown of the system’s design and workflow.
- **[API Reference](api.md):** Explore the technical details of the core modules.

### 🎨 **Make It Yours**
#### _Customize the Blueprint to fit your needs:_
- **[Customization Guide](customization.md):** Tailor prompts, voices, and settings to create unique podcasts.

### 🌟 **Join the Community**
#### _Help shape the future of Blueprints:_
- **[Future Features & Contributions](future-features-contributions.md):** Learn about exciting upcoming features and how to contribute to the project.


Have more questions? Reach out to us on Discord and we'll see how we can help:

---

## **Why Blueprints?**

Blueprints are more than starter code—they’re your gateway to building AI-powered solutions with confidence. With step-by-step guidance, modular design, and open-source tools, we make AI accessible for developers of all skill levels.
Loading

0 comments on commit 0dd8b78

Please sign in to comment.