
Open-Source AI Models

A suite of high-quality, open-source models developed for African languages, data, and use cases.

Text-to-Speech (TTS)

The highest-quality synthetic voice currently available for Kinyarwanda, enabling natural-sounding interactions in voice-based applications.

View Model

KinyaColBERT (Free)

The first Kinyarwanda embedding and information retrieval model, suited to building retrieval-augmented generation (RAG) chatbots such as Tunga.

View Model

KinyaBERT

A foundational BERT-style model serving as the backbone for various Kinyarwanda NLP tasks, from classification to entity recognition.

View Model

UlizaLlama3

An 8B-parameter model by Jacaranda, enhanced for processing and generating text in Swahili.

View Model

YorubaLlama

An 8B-parameter model by Jacaranda, enhanced for processing and generating text in Yoruba.

View Model

HausaLlama

An 8B-parameter model by Jacaranda, enhanced for processing and generating text in Hausa.

View Model

Xhosa_ZuluLlama3

An 8B-parameter model by Jacaranda, enhanced for processing and generating text in isiXhosa and isiZulu.

View Model

AfroLlama_V1

An 8B-parameter multilingual model by Jacaranda, enhanced for Swahili, isiXhosa, isiZulu, Yoruba, and Hausa.

View Model

UlizaLlama

A 7B-parameter Swahili language model by Jacaranda, continually pre-trained on Swahili instructions.

View Model

Kiswallama Pretrained

A Swahili foundation model, continually pre-trained to extend the capabilities of Llama 2.

View Model

Masakhane Models

Browse open-source models from the Masakhane community spanning multiple African languages and NLP tasks.

View Models

Open-Source Datasets

Curated open datasets used to train and evaluate foundational models across Kinyarwanda and other African languages.

Voice Dataset

A comprehensive dataset designed for training high-fidelity Text-to-Speech (TTS) and Speech-to-Text (STT) models in Kinyarwanda.

Access Data

Information Retrieval Dataset

The specialized dataset used to train the KinyaColBERT model, optimized for semantic search and RAG applications.

Access Data

Fikira (Vambo AI)

A multilingual reasoning dataset containing 50,000 synthetic reasoning examples across 10 African languages.

Access Data

Masakhane Datasets

Explore multilingual datasets from Masakhane covering translation, speech, and NLP resources for African languages.

Access Data