Release 0.2.0

@tchaton tchaton released this 26 Feb 13:33
· 280 commits to main since this release
a05495e

⚡ Welcome to Lightning Data

We developed StreamingDataset to optimize training on large datasets stored in the cloud, prioritizing speed, affordability, and scalability.

Built for multi-GPU and multi-node distributed training of large models (with DDP, FSDP, etc.), it improves accuracy, performance, and ease of use. You can now train efficiently regardless of where the data lives: simply stream in the data you need, when you need it.

The StreamingDataset is compatible with any data type, including images, text, video, audio, geo-spatial, and multimodal data, and it is a drop-in replacement for your PyTorch IterableDataset class. For example, Lit-GPT uses it to pretrain LLMs.
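To make the streaming idea concrete, here is a minimal sketch in plain Python of an iterable dataset that fetches chunks only when iteration reaches them and gives each rank a disjoint subset of chunks. This is not litdata's implementation or API: `ToyStreamingDataset`, `fetch_chunk`, `FAKE_BUCKET`, and the strided chunk assignment are all hypothetical names and choices made for illustration.

```python
from typing import Callable, Iterator, List, Sequence


class ToyStreamingDataset:
    """Illustrative only (not litdata): yields samples chunk by chunk."""

    def __init__(
        self,
        chunk_ids: Sequence[str],
        fetch_chunk: Callable[[str], List[int]],
        rank: int = 0,
        world_size: int = 1,
    ) -> None:
        self.chunk_ids = list(chunk_ids)
        self.fetch_chunk = fetch_chunk  # hypothetical loader, e.g. a cloud download
        self.rank = rank
        self.world_size = world_size

    def __iter__(self) -> Iterator[int]:
        # Assumption: each rank reads a disjoint, strided subset of the chunks.
        for chunk_id in self.chunk_ids[self.rank :: self.world_size]:
            # The chunk is fetched only when iteration reaches it.
            for sample in self.fetch_chunk(chunk_id):
                yield sample


# In-memory stand-in for remote storage (hypothetical data).
FAKE_BUCKET = {
    "chunk-0": [0, 1, 2],
    "chunk-1": [3, 4, 5],
    "chunk-2": [6, 7, 8],
    "chunk-3": [9, 10, 11],
}


def fetch_from_fake_bucket(chunk_id: str) -> List[int]:
    return list(FAKE_BUCKET[chunk_id])


chunks = sorted(FAKE_BUCKET)
rank0 = list(ToyStreamingDataset(chunks, fetch_from_fake_bucket, rank=0, world_size=2))
rank1 = list(ToyStreamingDataset(chunks, fetch_from_fake_bucket, rank=1, world_size=2))
```

Because the class only implements `__iter__`, it is consumed the same way as any iterable dataset, which is the sense in which a streaming dataset can stand in for an IterableDataset.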

This is the first release of litdata. From now on, we will track all changes in a CHANGELOG.md file.

Thanks to all contributors.