Llama 3.2 3B Instruct
Meta's small, fast Llama 3.2 in instruct mode. Best for low-latency chat, RAG-style completions, and lightweight tool use. 128K context.
Indaba 2026 · Deep Learning Indaba, Nigeria · August 2026 — When Research Meets Reality: AI Engineering in Africa. Deep Learning Indaba · Nigeria · August 2026. Learn more
Inference Engine
Click Create Endpoint on any model to get a persistent, OpenAI-API-compatible URL serving that model. Works with cURL, the OpenAI Python SDK, and any OpenAI client unchanged. Billed per hour while the endpoint is running.
Meta's small, fast Llama 3.2 in instruct mode. Best for low-latency chat, RAG-style completions, and lightweight tool use. 128K context.
BAAI's multilingual embedding model. Returns 1024-dim vectors usable as drop-in replacements for OpenAI embeddings. Common pick for retrieval (RAG) over African-language corpora.
Alibaba's mid-size Qwen 2.5 in instruct mode. Strong reasoning + multilingual performance, including African languages via the broader Qwen pretraining mix. 32K context.