embeddy


A lightweight, embeddings-only model runtime with CLI and HTTP API. Built in Rust for performance and efficiency, embeddy allows you to download, manage, and run text embedding models from HuggingFace without heavy dependencies.

Features

  • Lightweight Runtime: Pure Rust implementation using the Candle ML framework
  • HuggingFace Integration: Download and cache models directly from HuggingFace Hub
  • Dynamic Model Loading: Load multiple models on-demand via API without restart
  • Model Preloading: Optionally preload models at startup for zero cold-start latency
  • Embedding Cache: LRU cache for repeated queries (configurable size; see the sketch after this list)
  • Auto-Download: Automatically download models on first use
  • API Documentation: Built-in Swagger UI at /docs
  • Hardware Support: CPU and CUDA GPU acceleration
  • Model Management: Built-in registry for tracking, aliasing, and removing models
  • Docker Ready: Includes Dockerfile and docker-compose configurations
  • Multiple Formats: Supports both SafeTensors and PyTorch model formats (F16/F32)
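
The embedding cache keys on the (model, input text) pair, so a repeated request skips inference entirely. Below is a minimal Rust sketch of the idea, assuming the lru crate as a dependency; it illustrates the mechanism, not embeddy's actual implementation.

// Sketch of an LRU embedding cache keyed by (model, text).
// Assumes the `lru` crate; not embeddy's actual code.
use lru::LruCache;
use std::num::NonZeroUsize;

type Key = (String, String); // (model_name, input_text)

fn main() {
    // Capacity mirrors the default EMBEDDY_CACHE_SIZE of 10000 entries.
    let mut cache: LruCache<Key, Vec<f32>> =
        LruCache::new(NonZeroUsize::new(10_000).unwrap());

    let key = ("minilm".to_string(), "Hello, world!".to_string());

    // First request misses: the server would run inference and store the result.
    if cache.get(&key).is_none() {
        let embedding = vec![0.123, -0.456]; // stand-in for real model output
        cache.put(key.clone(), embedding);
    }

    // A repeated request for the same (model, text) pair is now a cache hit.
    assert!(cache.get(&key).is_some());
}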

Quick Start

Using Docker (Recommended)

# Pull and run from GitHub Container Registry
docker pull ghcr.io/cedrugs/embeddy:latest

# Or use docker-compose for production
curl -O https://raw.githubusercontent.com/cedrugs/embeddy/main/docker-compose.prod.yml
docker-compose -f docker-compose.prod.yml up -d

# Test the API
curl -X POST http://localhost:8080/api/embed \
  -H "Content-Type: application/json" \
  -d '{"model": "sentence-transformers/all-MiniLM-L6-v2", "input": ["Hello, world!"]}'

# View API docs
open http://localhost:8080/docs

From Source

# Clone and build
git clone https://github.com/cedrugs/embeddy.git
cd embeddy
cargo build --release

# Pull a model and run
./target/release/embeddy pull sentence-transformers/all-MiniLM-L6-v2 --alias minilm
./target/release/embeddy serve --model minilm

Installation

From Source

Requires Rust 1.91 or higher.

git clone https://github.com/cedrugs/embeddy.git
cd embeddy
cargo build --release

Using Docker

# From GitHub Container Registry
docker pull ghcr.io/cedrugs/embeddy:latest

# Or build locally
docker build -t embeddy:latest .

CLI Commands

Pull a Model

embeddy pull <MODEL_REPO_ID> [--alias <ALIAS>]

# Examples
embeddy pull sentence-transformers/all-MiniLM-L6-v2
embeddy pull sentence-transformers/all-mpnet-base-v2 --alias mpnet

List Models

embeddy list

Remove a Model

embeddy remove <MODEL_NAME>

# Example
embeddy remove minilm

Run Embeddings (CLI)

embeddy run <MODEL_NAME> --text <TEXT> [--text <TEXT>...] [--device <DEVICE>]

# Examples
embeddy run minilm --text "Hello world"
embeddy run minilm --text "First" --text "Second" --device cuda:0

Serve HTTP API

embeddy serve [OPTIONS]

Options:

Flag          Env Variable        Default  Description
--host        EMBEDDY_HOST        0.0.0.0  Host to bind to
--port        EMBEDDY_PORT        8080     Port to listen on
--device      EMBEDDY_DEVICE      cpu      Device (cpu, cuda:0, etc.)
--model       EMBEDDY_MODEL       -        Model to preload (auto-downloads if needed)
--cache-size  EMBEDDY_CACHE_SIZE  10000    Embedding cache size

Examples:

# Basic server (models loaded on-demand)
embeddy serve

# Preload a model at startup
embeddy serve --model minilm

# With GPU and larger cache
embeddy serve --model minilm --device cuda:0 --cache-size 50000

# Using environment variables
EMBEDDY_MODEL=minilm EMBEDDY_CACHE_SIZE=50000 embeddy serve

HTTP API

Endpoints

Method  Endpoint            Description
GET     /docs               Swagger UI documentation
GET     /openapi.json       OpenAPI specification
GET     /api/health         Health check and status
POST    /api/embed          Generate embeddings
GET     /api/models         List installed models
DELETE  /api/models/{name}  Remove a model

Generate Embeddings

curl -X POST http://localhost:8080/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minilm",
    "input": ["Hello, world!", "How are you?"]
  }'

Response:

{
  "model": "minilm",
  "dimension": 384,
  "embeddings": [[0.123, -0.456, ...], [0.321, -0.654, ...]],
  "cache_hits": 0
}
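
For programmatic use, here is a minimal Rust client for this endpoint, assuming reqwest (with the blocking and json features) and serde_json as dependencies. The request and response fields match the examples above; the client compares the two returned vectors by cosine similarity.

// Minimal client sketch for POST /api/embed; assumes reqwest + serde_json.
use serde_json::{json, Value};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let resp: Value = reqwest::blocking::Client::new()
        .post("http://localhost:8080/api/embed")
        .json(&json!({
            "model": "minilm",
            "input": ["Hello, world!", "How are you?"]
        }))
        .send()?
        .error_for_status()?
        .json()?;

    // Pull the two embedding vectors out of the response.
    let embs: Vec<Vec<f32>> = serde_json::from_value(resp["embeddings"].clone())?;

    // Cosine similarity between the two inputs.
    let dot: f32 = embs[0].iter().zip(&embs[1]).map(|(a, b)| a * b).sum();
    let norm = |v: &[f32]| v.iter().map(|x| x * x).sum::<f32>().sqrt();
    println!("cosine similarity: {}", dot / (norm(&embs[0]) * norm(&embs[1])));
    Ok(())
}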

Health Check

curl http://localhost:8080/api/health

Response:

{
  "status": "ok",
  "loaded_models": ["minilm"],
  "cache_entries": 42,
  "device": "Cpu"
}

List Models

curl http://localhost:8080/api/models

Remove a Model

curl -X DELETE http://localhost:8080/api/models/minilm

Docker Deployment

Development (build from source)

docker-compose up -d

Production (from ghcr.io)

docker-compose -f docker-compose.prod.yml up -d

Environment Variables

Configure via the environment section in docker-compose.prod.yml:

environment:
  - EMBEDDY_MODEL=sentence-transformers/all-MiniLM-L6-v2  # Auto-downloads and preloads
  - EMBEDDY_CACHE_SIZE=10000
  - EMBEDDY_DEVICE=cpu
  - RUST_LOG=info

If EMBEDDY_MODEL is set to a model that isn't installed, it will be automatically downloaded from HuggingFace on startup.
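
A sketch of that startup path in Rust, using the hf-hub crate's sync API (common in Candle-based projects). The is_installed registry check is a hypothetical stand-in for embeddy's registry lookup, and the file list mirrors the Requirements section below.

// Sketch of preload-with-auto-download; assumes the `hf-hub` crate.
use hf_hub::api::sync::Api;

fn is_installed(_name: &str) -> bool {
    false // stand-in: consult the local model registry here
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    if let Ok(model) = std::env::var("EMBEDDY_MODEL") {
        if !is_installed(&model) {
            // Fetch the files a model needs (see "Requirements" below).
            let repo = Api::new()?.model(model.clone());
            for file in ["config.json", "tokenizer.json", "model.safetensors"] {
                let path = repo.get(file)?; // downloaded and cached locally
                println!("downloaded {} -> {}", file, path.display());
            }
        }
        // ...then load `model` so the first request has no cold start.
    }
    Ok(())
}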

Persistent Storage

Models are stored in a Docker volume (embeddy-data:/data) and persist across container restarts.

Configuration

Environment Variables

Variable            Default         Description
EMBEDDY_DATA_DIR    System default  Data directory for models
EMBEDDY_HOST        0.0.0.0         Server host
EMBEDDY_PORT        8080            Server port
EMBEDDY_DEVICE      cpu             Compute device
EMBEDDY_MODEL       -               Model to preload
EMBEDDY_CACHE_SIZE  10000           LRU cache size
RUST_LOG            info            Log level

Data Directory

Default locations (a short resolution sketch follows the list):

  • Linux: ~/.local/share/embeddy/
  • macOS: ~/Library/Application Support/embeddy/
  • Windows: C:\Users\<USER>\AppData\Roaming\embeddy\
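
As a sketch of how those defaults can be resolved, assuming the dirs crate (embeddy's actual lookup may differ); EMBEDDY_DATA_DIR taking precedence matches the Configuration table above.

// Sketch of data-directory resolution; assumes the `dirs` crate.
use std::path::PathBuf;

fn data_dir() -> Option<PathBuf> {
    // Explicit override wins...
    if let Ok(dir) = std::env::var("EMBEDDY_DATA_DIR") {
        return Some(PathBuf::from(dir));
    }
    // ...otherwise fall back to the platform data directory:
    // Linux ~/.local/share, macOS ~/Library/Application Support,
    // Windows %APPDATA% (Roaming).
    dirs::data_dir().map(|d| d.join("embeddy"))
}

fn main() {
    println!("{:?}", data_dir());
}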

Supported Models

Model                                                         Dimension  Description
sentence-transformers/all-MiniLM-L6-v2                        384        Fast, good for general use
sentence-transformers/all-mpnet-base-v2                       768        Higher quality
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2   384        Multilingual
intfloat/multilingual-e5-large-instruct                       1024       High quality multilingual
BAAI/bge-small-en-v1.5                                        384        Retrieval optimized
BAAI/bge-base-en-v1.5                                         768        Retrieval optimized

Requirements

  • Must include config.json, tokenizer.json, and a weights file (see the validation sketch after this list)
  • Supported formats: SafeTensors (.safetensors) or PyTorch (.bin)
  • Supported dtypes: F16, F32 (auto-converted)
  • Architecture: BERT-based (BERT, RoBERTa, DistilBERT, etc.)
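
As a quick way to check a local model directory against this list, here is an illustrative Rust helper. The accepted file names are taken from the requirements above and are an assumption about embeddy's exact expectations.

// Illustrative check that a model directory has the required files.
use std::path::Path;

fn validate_model_dir(dir: &Path) -> Result<(), String> {
    for required in ["config.json", "tokenizer.json"] {
        if !dir.join(required).is_file() {
            return Err(format!("missing {}", required));
        }
    }
    // Either SafeTensors or PyTorch weights must be present.
    let has_weights = dir.join("model.safetensors").is_file()
        || dir.join("pytorch_model.bin").is_file();
    if !has_weights {
        return Err("missing model.safetensors or pytorch_model.bin".into());
    }
    Ok(())
}

fn main() {
    println!("{:?}", validate_model_dir(Path::new("./my-model")));
}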

Development

# Debug build
cargo build

# Release build
cargo build --release

# Run tests
cargo test

# Run with debug logging
RUST_LOG=debug cargo run -- serve --model minilm

Project Structure

embeddy/
├── src/
│   ├── cli/          # CLI argument parsing
│   ├── config.rs     # Configuration
│   ├── embedder/     # Model loading and inference
│   ├── error.rs      # Error types
│   ├── model/        # Model downloading and registry
│   ├── server/       # HTTP API server
│   └── main.rs       # Entry point
├── Cargo.toml
├── Dockerfile
├── docker-compose.yml       # Development
└── docker-compose.prod.yml  # Production

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues, questions, or contributions, please visit the GitHub repository.
