Fastembed API provides a straightforward way to generate vector embeddings for paragraphs of text, ready for downstream tasks like semantic search or text similarity.
It was developed as a companion to Boc·ajarro, which, being written in PHP, needed some Python help to create the text embeddings that feed its vector database.
As it's meant to run on a server without a graphics card, it doesn't rely on computationally expensive PyTorch setups. Instead, it uses FastEmbed, so it can run on the CPUs of a standard Virtual Private Server.
Currently, the API supports multilingual embeddings using the `sentence-transformers/paraphrase-multilingual-mpnet-base-v2` model and accepts text input one paragraph at a time.
As it's meant to run inside a private network, it does not implement any authentication or authorization mechanism.
To get started, clone the repository:
```shell
git clone https://github.com/estudio-hawara/fastembed-api
cd fastembed-api
```
For a local setup using a virtual environment:
```shell
# 1. Create a Python virtual environment
python -m venv .venv

# 2. Activate the Python virtual environment
source .venv/bin/activate

# 3. Install the dependencies
pip install -r requirements.txt

# 4. Start the service
uvicorn main:app
```
The API should now be available at http://localhost:8000.
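Once the service is up, you can call it from Python using only the standard library. The sketch below builds the POST request described later in this README; note that the exact embedding route is defined in `main.py`, so the bare root URL used here is a placeholder, not the documented path:

```python
import json
import urllib.request

def build_embed_request(url: str, paragraph: str) -> urllib.request.Request:
    """Build a POST request carrying a {"paragraph": ...} JSON body."""
    body = json.dumps({"paragraph": paragraph}).encode("utf-8")
    return urllib.request.Request(
        url,  # placeholder: append the actual embedding route from main.py
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it (requires the service to be running):
# with urllib.request.urlopen(build_embed_request("http://localhost:8000/", "Hello")) as resp:
#     embeddings = json.loads(resp.read())["embeddings"]
```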
To use Docker, follow these steps:
```shell
# 1. Build the Docker image
docker compose build

# 2. Start the service as a background container
docker compose up -d
```
The API should now be available at http://localhost:8000.
- Method: `POST`
- Description: Accepts a paragraph of text and returns vector embeddings for that text.
- Request:
  - Content-Type: `application/json`
  - JSON Body: `{ "paragraph": "Your paragraph here" }`
- Response:
  - JSON Body: `{ "embeddings": [ ... ] }`
  - The `embeddings` attribute contains an array with the embedding vectors.
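The returned vectors can be compared directly for text-similarity tasks; cosine similarity is a common choice. A minimal stdlib-only sketch (the input vectors would come from the `embeddings` array in the response above):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0, orthogonal ones 0.0:
# cosine_similarity([1.0, 0.0], [2.0, 0.0])  -> 1.0
```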
- Method: `GET`
- Description: Returns a 200 status if the service is up and ready.
The `docker-compose.yml` file includes the following `healthcheck` configuration to ensure the service is fully initialized:
```yaml
healthcheck:
  test: curl --fail http://localhost:8000/health || exit 1
  interval: 2s
  timeout: 5s
  retries: 3
  start_period: 5s
```
This configuration ensures that Docker waits until the service is ready before marking it as "healthy."
This project caches FastEmbed files in a temporary directory for quicker container restarts. The `.onnx` model and related files are stored in a Docker volume, as specified in `docker-compose.yml`:
```yaml
volumes:
  - fastembed_cache:/tmp/fastembed_cache
```
Using this volume prevents repeated downloads of model files, speeding up restarts and preserving cache data between runs.
- Python: 3.12 (>=3.12 and <3.13)
- Docker: No specific version required.
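The interpreter constraint (>=3.12 and <3.13) can be checked programmatically before installing dependencies; a small sketch with a hypothetical helper name:

```python
import sys

def python_version_supported(version: tuple = sys.version_info) -> bool:
    """Check an interpreter version against the >=3.12, <3.13 requirement."""
    return (version[0], version[1]) == (3, 12)

# python_version_supported((3, 12, 4))  -> True
# python_version_supported((3, 13, 0))  -> False
```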
To run unit tests:
```shell
pytest
```
These tests ensure basic functionality. Benchmark tests will be added in future updates.
This project is licensed under the MIT License.