docs: Update README to use LaserEncoderPipeline
Paulooh007 committed Oct 19, 2023
1 parent 8175af9 commit 82d400a
Showing 1 changed file with 15 additions and 9 deletions: laser_encoders/README.md
@@ -27,27 +27,33 @@ You can install laser_encoders using pip:

Here's a simple example of how you can download and initialize the tokenizer and encoder in a single step.

**Note:** By default, the models are downloaded to the `~/.cache/laser_encoders` directory. To specify a different download location, provide the argument `model_dir=path/to/model/directory` to the `initialize_tokenizer` and `LaserEncoderPipeline` functions.

```py
from laser_encoders import LaserEncoderPipeline

# Initialize the LASER encoder with the specified language
encoder = LaserEncoderPipeline(lang="igbo")

# Encode a list of sentences into embeddings
embeddings = encoder.encode_sentences(list_of_strings)
```
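If the default cache location is not suitable, the download directory can be overridden when constructing the pipeline, as described in the note above. A minimal sketch; the directory path here is only a placeholder:

```py
from laser_encoders import LaserEncoderPipeline

# Download (or reuse) the model files under a custom directory
# instead of the default ~/.cache/laser_encoders.
encoder = LaserEncoderPipeline(lang="igbo", model_dir="path/to/model/directory")
```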

You also have the option to initialize the tokenizer and encoder separately. If you choose to go this route, be sure to set the `tokenize` argument to `False` when initializing `LaserEncoderPipeline`, so that the already-tokenized input is not tokenized a second time.

```py
from laser_encoders import LaserEncoderPipeline, initialize_tokenizer

# Initialize the LASER tokenizer
tokenizer = initialize_tokenizer(lang="igbo")
tokenized_sentence = tokenizer.tokenize("nnọọ, kedu ka ị mere")

# Initialize the LASER sentence encoder
encoder = LaserEncoderPipeline(lang="igbo", tokenize=False)

# Encode sentences into embeddings
embeddings = encoder.encode_sentences([tokenized_sentence])
```
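The vectors returned by `encode_sentences` are plain NumPy arrays, so standard similarity measures apply directly. Below is a self-contained sketch of cosine similarity that uses small stand-in vectors in place of real LASER embeddings (which are higher-dimensional):

```py
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice these would come from encoder.encode_sentences(...)
emb_a = np.array([1.0, 0.0, 1.0, 0.0])
emb_b = np.array([1.0, 0.0, 1.0, 0.0])
emb_c = np.array([0.0, 1.0, 0.0, 1.0])

print(cosine_similarity(emb_a, emb_b))  # identical direction -> 1.0
print(cosine_similarity(emb_a, emb_c))  # orthogonal -> 0.0
```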

**Supported Languages:** You can specify any language from the [FLORES200](https://github.com/facebookresearch/flores/blob/main/flores200/README.md#languages-in-flores-200) dataset. This includes both languages identified by their full codes (like "ibo_Latn") and simpler alternatives (like "igbo").
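Because all supported languages share one embedding space, cross-lingual sentence matching reduces to nearest-neighbor search over an embedding matrix. A self-contained sketch with stand-in vectors; in practice the two matrices would be the outputs of `encode_sentences` for each language:

```py
import numpy as np

def nearest_neighbors(queries: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Index of the most similar corpus row for each query row.

    Both matrices are L2-normalized row-wise, so the dot product
    equals cosine similarity."""
    queries = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    corpus = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return np.argmax(queries @ corpus.T, axis=1)

# Stand-in embeddings for 2 query sentences and 3 corpus sentences.
queries = np.array([[0.9, 0.1, 0.0],
                    [0.0, 0.2, 0.8]])
corpus = np.array([[0.0, 0.1, 0.9],   # closest to query 1
                   [1.0, 0.0, 0.1],   # closest to query 0
                   [0.1, 0.9, 0.0]])

print(nearest_neighbors(queries, corpus))  # -> [1 0]
```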
