Merge from main

mozilla-ai · Dec 3, 2024 · 0dd8b78 · 0dd8b78
2 parents be82900 + ab1c6c8
commit 0dd8b78
Showing 27 changed files with 500 additions and 40 deletions.
diff --git a/.gitignore b/.gitignore
@@ -163,3 +163,6 @@ cython_debug/
 
 # Generated audio files
 *.wav
+
+# VS files
+.vscode
diff --git a/README.md b/README.md
@@ -1,30 +1,100 @@
-<img src="./images/Blueprints-logo.png" alt="Project Logo" style="width:25%;">
+[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+<p align="center"><img src="./images/Blueprints-logo.png" width="25%" alt="Project logo"/></p>
 
-# Blueprint Title
+# Document-to-podcast: a Blueprint by Mozilla.ai for generating podcasts from documents using local AI
 
-This blueprint guides you to ...
+This blueprint demonstrates how you can use open-source models & tools to convert input documents into a podcast featuring two speakers.
+It is designed to work on most local setups or with [GitHub Codespaces](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb), meaning no external API calls or GPU access is required. This makes it more accessible and privacy-friendly by keeping everything local.
 
-![Blueprint Diagram](./images/blueprint-diagram.png)
+### 👉 📖 For more detailed guidance on using this project, please visit our [Docs here](https://mozilla-ai.github.io/document-to-podcast/).
 
+## Quick-start
+
+Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.
+
+---
+
+### **Option 1: GitHub Codespaces**
+
+The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:
+
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)
+
+Once the Codespaces environment launches, follow these steps:
+
+1. **Install Dependencies**
+   Inside the Codespaces terminal, run:
+   ```bash
+   pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+   ```
+2. **Run the Demo**
+   Inside the Codespaces terminal, start the Streamlit demo by running:
+   ```bash
+   python -m streamlit run demo/app.py
+   ```
+
+### **Option 2: Local Installation**
+
+1. **Clone the Repository**
+   Inside your terminal, run:
+   ```bash
+   git clone https://github.com/mozilla-ai/document-to-podcast.git
+   cd document-to-podcast
+   ```
+
+2. **Install Dependencies**
+   Inside the terminal, run:
+   ```bash
+   pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+   ```
+3. **Run the Demo**
+   Inside the terminal, start the Streamlit demo by running:
+   ```bash
+   python -m streamlit run demo/app.py
+   ```
+
+## How it Works
+
+<img src="./images/document-to-podcast-diagram.png" width="1200" />
+
+
+1. **Document Upload**
+   Start by uploading a document in a supported format (e.g., PDF, .txt, or .docx).
+
+2. **Document Pre-Processing**
+   The uploaded document is processed to extract and clean the text. This involves:
+   - Extracting readable text from the document.
+   - Removing noise such as URLs, email addresses, and special characters to ensure the text is clean and structured.
+
+3. **Script Generation**
+   The cleaned text is passed to a language model to generate a podcast transcript in the form of a conversation between two speakers.
+   - **Model Loading**: The system selects and loads a pre-trained LLM optimized for running locally, using the llama_cpp library. This enables the model to run efficiently on CPUs, making it more accessible and suitable for local setups.
+   - **Customizable Prompt**: A user-defined "system prompt" guides the LLM in shaping the conversation, specifying tone, content, speaker interaction, and format.
+   - **Output Transcript**: The model generates a podcast script in structured format, with each speaker's dialogue clearly labeled.
+     Example output:
+     ```json
+     {
+         "Speaker 1": "Welcome to the podcast on AI advancements.",
+         "Speaker 2": "Thank you! So what's new this week for the latest AI trends?",
+         "Speaker 1": "Where should I start.. Lots has been happening!",
+         ...
+     }
+     ```
+   This step ensures that the podcast script is engaging, relevant, and ready for audio conversion.
+
+4. **Audio Generation**
+   - The generated transcript is converted into audio using a Text-to-Speech (TTS) model.
+   - Each speaker is assigned a distinct voice.
+   - The final output is saved as an audio file in formats like MP3 or WAV.
 
 ## Pre-requisites
 
 - **System requirements**:
   - OS: Windows, macOS, or Linux
   - Python 3.10 or higher
-  - Minimum RAM: 4 GB
-  - Disk space: 1 GB minimum
+  - Minimum RAM: 16 GB
+  - Disk space: 32 GB minimum
 
 - **Dependencies**:
-  - Dependencies listed in `requirements.txt`
-
-## Installation
-
----
-
-## Quick-start
-
----
+  - Dependencies listed in `pyproject.toml`
 
 ## License
 

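Note on the README's "Document Pre-Processing" step: the cleaning it describes (removing URLs, email addresses, and special characters) could look roughly like the sketch below. The function name `clean_text` and the three patterns are hypothetical and chosen only for illustration; the project's actual cleaners in `document_to_podcast.preprocessing` may work differently.

```python
import re

# Hypothetical patterns for illustration; not taken from the project.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Keep ASCII letters, digits, whitespace, and basic punctuation; drop the rest.
SPECIAL_RE = re.compile(r"[^A-Za-z0-9\s.,;:!?'\"()-]")


def clean_text(text: str) -> str:
    """Strip URLs, email addresses, and special characters, then tidy whitespace."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    # Collapse the gaps left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

URLs are removed before email addresses so that an address embedded in a link is not left half-stripped. The character allow-list is deliberately narrow and would drop non-ASCII letters, so it would need widening for non-English documents.
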
diff --git a/demo/app.py b/demo/app.py
@@ -3,13 +3,13 @@
 
 import streamlit as st
 
-from opennotebookllm.preprocessing import DATA_LOADERS, DATA_CLEANERS
-from opennotebookllm.inference.model_loaders import (
+from document_to_podcast.preprocessing import DATA_LOADERS, DATA_CLEANERS
+from document_to_podcast.inference.model_loaders import (
     load_llama_cpp_model,
     load_parler_tts_model_and_tokenizer,
 )
-from opennotebookllm.inference.text_to_speech import _speech_generation_parler
-from opennotebookllm.inference.text_to_text import text_to_text_stream
+from document_to_podcast.inference.text_to_speech import text_to_speech
+from document_to_podcast.inference.text_to_text import text_to_text_stream
 
 
 PODCAST_PROMPT = """
@@ -112,7 +112,7 @@ def load_text_to_speech_model_and_tokenizer():
                     st.write(text)
                     speaker_id = re.search(r"Speaker (\d+)", text).group(1)
                     with st.spinner("Generating Audio..."):
-                        speech = _speech_generation_parler(
+                        speech = text_to_speech(
                             text.split(f'"Speaker {speaker_id}":')[-1],
                             speech_model,
                             speech_tokenizer,
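
Note on the hunk above: the demo reads the speaker id with `re.search(r"Speaker (\d+)", text).group(1)`, which raises `AttributeError` when a streamed line carries no speaker label (for example an opening `{`). A minimal sketch of the same parsing with a guard follows; the helper name `split_speaker_line` is hypothetical and not part of the project.

```python
import re
from typing import Optional, Tuple

SPEAKER_RE = re.compile(r"Speaker (\d+)")


def split_speaker_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (speaker_id, raw_text) for a script line, or None if it has no speaker label."""
    match = SPEAKER_RE.search(line)
    if match is None:
        return None
    speaker_id = match.group(1)
    # Same split as the demo: keep everything after the '"Speaker N":' label.
    raw_text = line.split(f'"Speaker {speaker_id}":')[-1]
    return speaker_id, raw_text
```

A caller could skip lines for which the helper returns `None` before passing the text on to `text_to_speech`.
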

diff --git a/docs/api.md b/docs/api.md
@@ -1,7 +1,7 @@
 # API Reference
 
-::: opennotebookllm.preprocessing.data_cleaners
+::: document_to_podcast.preprocessing.data_cleaners
 
-::: opennotebookllm.inference.model_loaders
+::: document_to_podcast.inference.model_loaders
 
-::: opennotebookllm.inference.text_to_text
+::: document_to_podcast.inference.text_to_text
diff --git a/docs/assets/custom.css b/docs/assets/custom.css
@@ -0,0 +1,7 @@
+@import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');
+
+:root {
+    --md-default-font: "Inter", sans-serif;
+    --md-code-font: "Fira Code", monospace;
+    --md-primary-font: "Inter", sans-serif;
+}
diff --git a/docs/customization.md b/docs/customization.md
@@ -0,0 +1,89 @@
+# 🎨 **Customization Guide**
+
+The Document-to-Podcast Blueprint is designed to be flexible and easily adaptable to your specific needs. This guide will walk you through some key areas you can customize to make the Blueprint your own.
+
+---
+
+## 🧠 **Changing the Text-to-Text Model**
+You can swap the language model used for generating podcast scripts to suit your needs, such as using a smaller model for faster processing or a larger one for higher quality outputs.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `load_text_to_text_model` function.
+3. Replace the `model_id` with the ID of your desired model from a supported repository (e.g., Hugging Face). Note: The model repository must be in GGUF format, for example: `Qwen/Qwen2.5-1.5B-Instruct-GGUF`
+
+Example:
+
+```python
+@st.cache_resource
+def load_text_to_text_model():
+    return load_llama_cpp_model(
+        model_id="Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf"
+    )
+```
+
+
+## 📝 **Modifying the Text Generation Prompt**
+The system prompt defines the structure and tone of the generated script. Customizing this can allow you to generate conversations that align with your project’s needs.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `PODCAST_PROMPT` variable.
+3. Edit the instructions to suit your desired conversation style.
+
+Example:
+
+```python
+PODCAST_PROMPT = """
+You are a radio show scriptwriter generating lively and humorous dialogues.
+Speaker 1: A comedian who is interested in learning new things.
+Speaker 2: A scientist explaining concepts in a fun way.
+"""
+```
+
+
+## 🎙️ **Customizing Speaker Descriptions**
+Adjusting the speaker profiles allows you to create distinct and engaging voices for your podcast.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `SPEAKER_DESCRIPTIONS` dictionary.
+3. Update the descriptions to define new voice characteristics for each speaker.
+
+Example:
+
+```python
+SPEAKER_DESCRIPTIONS = {
+    "1": "A cheerful and animated voice with a fast-paced delivery.",
+    "2": "A calm and deep voice, speaking with authority and warmth."
+}
+```
+
+
+## 🧠 **Changing the Text-to-Speech Model**
+You can use a different TTS model to achieve specific voice styles or improve performance.
+
+Customizing the app:
+
+1. Open the `app.py` file.
+2. Locate the `load_text_to_speech_model_and_tokenizer` function.
+3. Replace the `model_id` with your preferred TTS model.
+
+Example:
+```python
+@st.cache_resource
+def load_text_to_speech_model_and_tokenizer():
+    return load_parler_tts_model_and_tokenizer(
+        "parler-tts/parler-tts-mini-expresso", "cpu")
+```
+
+## 💡 Other Customization Ideas
+
+- Add Multiple Speakers: Modify `script_to_audio.py` to include additional speakers in your podcast.
+
+
+## 🤝 **Contributing to the Blueprint**
+
+Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](../future-features-contributions)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
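
Note on the customization guide's `model_id` example: the value `Qwen/Qwen2.5-1.5B-Instruct-GGUF/qwen2.5-1.5b-instruct-q8_0.gguf` appears to combine a repository (`org/repo`) with a GGUF filename. Assuming that three-part layout holds, which this diff does not confirm, a small check like the sketch below could catch a malformed id before any download starts. The helper `split_model_id` is hypothetical and not part of the project.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split 'org/repo/filename.gguf' into ('org/repo', 'filename.gguf').

    Assumes the three-part layout used in the guide's example.
    """
    parts = model_id.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'org/repo/filename', got {model_id!r}")
    org, repo, filename = parts
    if not filename.endswith(".gguf"):
        raise ValueError(f"expected a .gguf file, got {filename!r}")
    return f"{org}/{repo}", filename
```

With the guide's example id, this returns the repository `Qwen/Qwen2.5-1.5B-Instruct-GGUF` and the file `qwen2.5-1.5b-instruct-q8_0.gguf`.
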
diff --git a/docs/future-features-contributions.md b/docs/future-features-contributions.md
@@ -0,0 +1,30 @@
+# 🚀 **Future Features & Contributions**
+
+The Document-to-Podcast Blueprint is an evolving project designed to grow with the help of the open-source community. Whether you’re an experienced developer or just starting, there are many ways you can contribute and help shape the future of this tool.
+
+---
+## 🛠️ **This Page is Evolving**
+As the community grows, we’ll use this space to highlight contributions, showcase new ideas, and share guidance on expanding the Blueprint ecosystem.
+
+We have some ideas for how this Blueprint can be extended and improved, and we will share them, along with requests for contributions, shortly.
+
+---
+
+## 🌟 **How You Can Contribute**
+
+### 💡 **Share Your Ideas**
+Got a vision for how this Blueprint could be improved? Share your suggestions through [GitHub Discussions](https://github.com/mozilla-ai/document-to-podcast/discussions). Your insights can help inspire new directions for the project.
+
+### 🛠️ **Enhance the Code**
+Dive into the codebase and contribute enhancements, optimizations, or bug fixes. Whether it's a small tweak or a big feature, every contribution helps! Start by checking our Contribution Guide (coming soon).
+
+
+### 🌍 **Build New Blueprints**
+This project is part of a larger initiative to create a collection of reusable starter code solutions that use open-source AI tools. If you’re inspired to create your own Blueprint, we’d love to see it!
+
+---
+
+## 🤝 **Get Involved**
+- Visit our [GitHub Discussions](https://github.com/mozilla-ai/document-to-podcast/discussions) to explore ongoing conversations and share your thoughts.
+
+Your contributions help make this Blueprint better for everyone. Thank you for being part of the journey! 🎉
diff --git a/docs/getting-started.md b/docs/getting-started.md
@@ -0,0 +1,49 @@
+Get started with Document-to-Podcast using one of the two options below: **GitHub Codespaces** for a hassle-free setup or **Local Installation** for running on your own machine.
+
+---
+
+### ☁️ **Option 1: GitHub Codespaces**
+
+The fastest way to get started. Click the button below to launch the project directly in GitHub Codespaces:
+
+[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://github.com/codespaces/new?hide_repo_select=true&ref=main&repo=888426876&skip_quickstart=true&machine=standardLinux32gb)
+
+Once the Codespaces environment launches, follow these steps:
+
+1. **Install Dependencies**
+   Inside the Codespaces terminal, run:
+```bash
+pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+2. **Run the Demo**
+   Inside the Codespaces terminal, start the Streamlit demo by running:
+```bash
+python -m streamlit run demo/app.py
+```
+
+
+### 💻  **Option 2: Local Installation**
+
+1. **Clone the Repository**
+   Inside your terminal, run:
+
+```bash
+git clone https://github.com/mozilla-ai/document-to-podcast.git
+cd document-to-podcast
+```
+
+
+2. **Install Dependencies**
+   Inside the terminal, run:
+
+
+```bash
+pip install -e . --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
+```
+
+3. **Run the Demo**
+   Inside the terminal, start the Streamlit demo by running:
+
+```bash
+python -m streamlit run demo/app.py
+```
diff --git a/docs/images/Blueprints-logo.png b/docs/images/Blueprints-logo.png
diff --git a/docs/images/document-to-podcast-diagram.png b/docs/images/document-to-podcast-diagram.png
diff --git a/docs/index.md b/docs/index.md
@@ -1 +1,37 @@
-# Wellcome to Blueprint docs
+# **Document-to-Podcast Blueprint**
+
+<div style="text-align: center;">
+  <img src="../images/document-to-podcast-diagram.png" alt="Project Logo" style="width: 100%; margin-bottom: 1px; margin-top: 1px;">
+</div>
+
+Blueprints empower developers to easily integrate AI capabilities into their projects using open-source models and tools.
+
+These docs are your companion to mastering the **Document-to-Podcast Blueprint**, a local-first approach for transforming documents into engaging podcasts.
+
+---
+
+### 🚀 **Get Started Quickly**
+#### _Start building your own Document-to-Podcast pipeline in minutes:_
+- **[Getting Started](getting-started.md):** Quick setup and installation instructions.
+
+### 🔍 **Understand the System**
+#### _Dive deeper into how the Blueprint works:_
+- **[Step-by-Step Guide](step-by-step-guide.md):** A detailed breakdown of the system’s design and workflow.
+- **[API Reference](api.md):** Explore the technical details of the core modules.
+
+### 🎨 **Make It Yours**
+#### _Customize the Blueprint to fit your needs:_
+- **[Customization Guide](customization.md):** Tailor prompts, voices, and settings to create unique podcasts.
+
+### 🌟 **Join the Community**
+#### _Help shape the future of Blueprints:_
+- **[Future Features & Contributions](future-features-contributions.md):** Learn about exciting upcoming features and how to contribute to the project.
+
+
+Have more questions? Reach out to us on Discord and we'll see how we can help:
+
+---
+
+## **Why Blueprints?**
+
+Blueprints are more than starter code—they’re your gateway to building AI-powered solutions with confidence. With step-by-step guidance, modular design, and open-source tools, we make AI accessible for developers of all skill levels.