Commit

fix(docs): Update broken links. (#36)
* fix(docs): Update broken links.

* Add search function in docs

* Add basic docstring to config.py

* Add docstring to save_waveform_as_file

* Add missing references and fix a typo

---------

Co-authored-by: Kostis-S-Z <[email protected]>
daavoo and Kostis-S-Z authored Dec 4, 2024
1 parent dccd551 commit 5791a95
Showing 7 changed files with 34 additions and 11 deletions.
4 changes: 4 additions & 0 deletions docs/api.md
Original file line number Diff line number Diff line change
@@ -7,3 +7,7 @@
::: document_to_podcast.inference.text_to_text

::: document_to_podcast.inference.text_to_speech

::: document_to_podcast.podcast_maker.script_to_audio

::: document_to_podcast.podcast_maker.config
2 changes: 1 addition & 1 deletion docs/customization.md
@@ -86,4 +86,4 @@ def load_text_to_speech_model_and_tokenizer():

## 🤝 **Contributing to the Blueprint**

Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](../future-features-contributions)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](future-features-contributions.md)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
2 changes: 1 addition & 1 deletion docs/index.md
@@ -1,7 +1,7 @@
# **Document-to-Podcast Blueprint**

<div style="text-align: center;">
<img src="../images/document-to-podcast-diagram.png" alt="Project Logo" style="width: 100%; margin-bottom: 1px; margin-top: 1px;">
<img src="images/document-to-podcast-diagram.png" alt="Project Logo" style="width: 100%; margin-bottom: 1px; margin-top: 1px;">
</div>

Blueprints empower developers to easily integrate AI capabilities into their projects using open-source models and tools.
18 changes: 9 additions & 9 deletions docs/step-by-step-guide.md
@@ -1,6 +1,6 @@
# **Step-by-Step Guide: How the Document-to-Podcast Blueprint Works**

Transforming static documents into engaging podcast episodes involves a integration of pre-processing, LLM-powered transcript generation, and text-to-speech generation. Here's how it all works under the hood:
Transforming static documents into engaging podcast episodes involves an integration of pre-processing, LLM-powered transcript generation, and text-to-speech generation. Here's how it all works under the hood:

---

@@ -33,15 +33,15 @@ Cleaner input data ensures that the model works with reliable and consistent inf
### ⚙️ **Key Components in this Doc Pre-Processing**
**1 - File Loading**

- Uses functions defined in `data_loaders.py`
- Uses functions defined in [`data_loaders.py`](api.md/#document_to_podcast.preprocessing.data_loaders)

- Supports `.html`, `.pdf`, `.txt`, and `.docx` formats.

- Extracts readable text from uploaded files using specialized loaders.

**2 - Text Cleaning**

- Uses functions defined in [`data_cleaners.py`](../api/#document_to_podcast.inference.data_cleaners)
- Uses functions defined in [`data_cleaners.py`](api.md/#document_to_podcast.preprocessing.data_cleaners)

- Removes unwanted elements like URLs, email addresses, and special characters using Python's `re` library, which leverages **Regular Expressions** (regex) to identify and manipulate specific patterns in text.
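
The regex-based cleaning described above could look roughly like the following. This is a minimal sketch, not the contents of `data_cleaners.py`: the function name `clean_text` and all four patterns are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for illustration; the real data_cleaners.py may differ.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Anything that is not a letter, digit, whitespace, or common punctuation.
SPECIAL_CHARS_PATTERN = re.compile(r"[^A-Za-z0-9\s.,!?;:'\"()-]")
WHITESPACE_PATTERN = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs, email addresses, and special characters, then tidy whitespace."""
    text = URL_PATTERN.sub("", text)
    text = EMAIL_PATTERN.sub("", text)
    text = SPECIAL_CHARS_PATTERN.sub("", text)
    # Collapse the gaps left behind by the removals above.
    return WHITESPACE_PATTERN.sub(" ", text).strip()
```

URLs are removed before the special-character pass so that their `/` and `:` characters do not leave fragments behind.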

@@ -55,15 +55,15 @@ In this step, the pre-processed text is transformed into a conversational podcas

**1 - Model Loading**

- The [`model_loader.py`](../api/#document_to_podcast.inference.model_loaders) script is responsible for loading GGUF-type models using the `llama_cpp` library.
- The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) script is responsible for loading GGUF-type models using the `llama_cpp` library.

- The function `load_llama_cpp_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model.

- This approach of using the `llama_cpp` library supports efficient CPU-based inference, making language models accessible even on machines without GPUs.
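
To make the `{org}/{repo}/{filename}` format concrete, here is a hedged sketch of how such an ID might be split and handed to `llama_cpp`. The helper names `split_model_id` and `load_model` are hypothetical, and the actual `load_llama_cpp_model` may pass additional arguments; only `Llama.from_pretrained(repo_id=..., filename=...)` is taken from the `llama-cpp-python` package.

```python
from typing import Tuple


def split_model_id(model_id: str) -> Tuple[str, str]:
    """Split '{org}/{repo}/{filename}' into ('{org}/{repo}', '{filename}')."""
    parts = model_id.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"Expected '{{org}}/{{repo}}/{{filename}}', got {model_id!r}"
        )
    org, repo, filename = parts
    return f"{org}/{repo}", filename


def load_model(model_id: str):
    """Hypothetical loader sketch; requires the llama-cpp-python package."""
    # Imported lazily so split_model_id stays usable without llama_cpp installed.
    from llama_cpp import Llama

    repo_id, filename = split_model_id(model_id)
    # Downloads the GGUF file from the Hugging Face Hub (if needed) and loads it.
    return Llama.from_pretrained(repo_id=repo_id, filename=filename)
```

Keeping the parsing separate from the loading means a malformed ID fails fast, before any download is attempted.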

**2 - Text-to-Text Generation**

- The [`text_to_text.py`](../api/#document_to_podcast.inference.text_to_text) script manages the interaction with the language model, converting input text into a structured conversational podcast script.
- The [`text_to_text.py`](api.md/#document_to_podcast.inference.text_to_text) script manages the interaction with the language model, converting input text into a structured conversational podcast script.

- It uses the `chat_completion` function to process the input text and a customizable system prompt, guiding the language model to generate a text output (e.g., a coherent podcast script between speakers).

@@ -80,15 +80,15 @@ In this final step, the generated podcast transcript is brought to life as an au

**1 - Text-to-Speech Audio Generation**

- The `text_to_speech.py` script converts text into audio using a specified TTS model and tokenizer.
- The [`text_to_speech.py`](api.md/#document_to_podcast.inference.text_to_speech) script converts text into audio using a specified TTS model and tokenizer.

- A **speaker profile** defines the voice characteristics (e.g., tone, speed, clarity) for each speaker.

- The function `text_to_speech` takes the input text (e.g., a podcast script) and a speaker profile, generating a waveform (audio data) that represents the spoken version of the text.

**2 - Parsing and Combining Voices**

- The `script_to_audio.py` script ensures each speaker’s dialogue is spoken in their unique voice.
- The [`script_to_audio.py`](api.md/#document_to_podcast.podcast_maker.script_to_audio) script ensures each speaker’s dialogue is spoken in their unique voice.

- The function `parse_script_to_waveform` splits the dialogue script by speakers and uses `text_to_speech` to generate audio for each speaker, stitching them together into a full podcast.
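
The split-then-stitch idea can be illustrated with a small standalone sketch. It is not the code in `script_to_audio.py`: the `Name: text` line format, the helper names `parse_script` and `script_to_waveform`, and the use of plain lists in place of numpy waveforms are all assumptions made so the example runs without a TTS model.

```python
from typing import Callable, Dict, List, Tuple

Waveform = List[float]  # stand-in for the numpy array a real TTS model returns


def parse_script(script: str) -> List[Tuple[str, str]]:
    """Split a script into (speaker, text) turns, assuming 'Name: text' lines."""
    turns = []
    for line in script.splitlines():
        if ":" not in line:
            continue  # skip blank lines and anything that is not a speaker turn
        speaker, text = line.split(":", 1)
        if speaker.strip() and text.strip():
            turns.append((speaker.strip(), text.strip()))
    return turns


def script_to_waveform(
    script: str, voices: Dict[str, Callable[[str], Waveform]]
) -> Waveform:
    """Generate audio for each turn with that speaker's voice and concatenate it."""
    podcast: Waveform = []
    for speaker, text in parse_script(script):
        podcast.extend(voices[speaker](text))
    return podcast
```

Here `voices` maps each speaker name to a callable that plays the role of `text_to_speech` with that speaker's profile already bound.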

@@ -145,8 +145,8 @@ This demo uses [Streamlit](https://streamlit.io/), an open-source Python framewo

## 🎨 **Customizing the Blueprint**

To better understand how you can tailor this Blueprint to suit your specific needs, please visit the **[Customization Guide](../customization)**.
To better understand how you can tailor this Blueprint to suit your specific needs, please visit the **[Customization Guide](customization.md)**.

## 🤝 **Contributing to the Blueprint**

Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](../future-features-contributions)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
Want to help improve or extend this Blueprint? Check out the **[Future Features & Contributions Guide](future-features-contributions.md)** to see how you can contribute your ideas, code, or feedback to make this Blueprint even better!
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -34,6 +34,7 @@ markdown_extensions:
- pymdownx.superfences

plugins:
- search
- mkdocstrings:
handlers:
python:
9 changes: 9 additions & 0 deletions src/document_to_podcast/podcast_maker/config.py
@@ -4,6 +4,10 @@


class SpeakerConfig(BaseModel):
"""
Pydantic model that stores configuration of an individual speaker for the TTS model.
"""

model_config = ConfigDict(arbitrary_types_allowed=True)

model: PreTrainedModel
@@ -16,5 +20,10 @@ class SpeakerConfig(BaseModel):


class PodcastConfig(BaseModel):
"""
Pydantic model that stores configuration of all the speakers for the TTS model. This allows different speakers to
use different models and configurations.
"""

speakers: Dict[str, SpeakerConfig]
sampling_rate: int = 44_100
9 changes: 9 additions & 0 deletions src/document_to_podcast/podcast_maker/script_to_audio.py
@@ -40,6 +40,15 @@ def parse_script_to_waveform(script: str, podcast_config: PodcastConfig):
def save_waveform_as_file(
waveform: np.ndarray, sampling_rate: int, filename: str
) -> None:
"""
Save the output of the TTS (a numpy waveform) to a .wav file using the soundfile library.
Args:
waveform: 2D numpy array of a waveform
sampling_rate: Usually 44,100 Hz, but check the specifications of the TTS model you are using.
filename: the destination filename to save the audio
"""
sf.write(filename, waveform, sampling_rate)
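
For readers who want to see what writing a waveform to a `.wav` file involves without installing `soundfile`, the sketch below swaps in Python's standard-library `wave` module. It is not what `save_waveform_as_file` does (that calls `sf.write`); the function name `save_mono_wav`, the 16-bit mono format, and the 440 Hz test tone are assumptions for illustration.

```python
import math
import os
import struct
import tempfile
import wave
from typing import Iterable


def save_mono_wav(samples: Iterable[float], sampling_rate: int, filename: str) -> None:
    """Write float samples in [-1.0, 1.0] as 16-bit mono PCM using the stdlib."""
    frames = b"".join(
        # Clamp each sample, scale it to the int16 range, and pack little-endian.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(filename, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(frames)


# Usage: one second of a 440 Hz tone at the default 44,100 Hz sampling rate.
sampling_rate = 44_100
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / sampling_rate)
    for n in range(sampling_rate)
]
path = os.path.join(tempfile.gettempdir(), "tone.wav")
save_mono_wav(tone, sampling_rate, path)
```

The sampling rate written to the file header must match the rate the TTS model produced, otherwise playback will sound too fast or too slow.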

