DATASPIRES

Inference Engine
Models
Datasets

Inference Engine

Hosted Inference Endpoints

Click Create Endpoint on any model to get a persistent, OpenAI-API-compatible URL serving that model. Works with cURL, the OpenAI Python SDK, and any OpenAI client unchanged. Billed per hour while the endpoint is running.

3BChat

Llama 3.2 3B Instruct

Meta's small, fast Llama 3.2 in instruct mode. Best for low-latency chat, RAG-style completions, and lightweight tool use. 128K context.

$2.00/hr
1024-dimEmbeddings

BGE-M3 Embeddings

BAAI's multilingual embedding model. Returns 1024-dim vectors usable as drop-in replacements for OpenAI embeddings. Common pick for retrieval (RAG) over African-language corpora.

$1.00/hr
7BChat

Qwen 2.5 7B Instruct

Alibaba's mid-size Qwen 2.5 in instruct mode. Strong reasoning + multilingual performance, including African languages via the broader Qwen pretraining mix. 32K context.

$2.50/hr