kyutai ORG

@kyutai-labs · France · kyutai.org

Kyutai - Open Science AI Lab

29 repos
1.7k followers
0 following

Python 50%
Rust 25%
C++ 11%
TypeScript 7%
Jupyter Notebook 4%

All public repos (29)

moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

An open AI model for real-time, full-duplex voice conversation (both sides can speak at once, as on a phone call), with implementations for research, Apple Silicon devices, and production deployments.

Python ★ 10k 1mo ago
pocket-tts

A TTS that fits in your CPU (and pocket)

Python ★ 4.6k 18d ago
delayed-streams-modeling

Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.

Python ★ 3.0k 4mo ago
hibiki

Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance to start translating, Hibiki adapts its flow to accumulate just enough context to produce a correct translation in real time, chunk by chunk.

Rust ★ 1.5k 1y ago
unmute

Make text LLMs listen and speak

Python ★ 1.3k 16d ago
moshi-finetune

No description.

Python ★ 465 8mo ago
hibiki-zero

A real-time and multilingual speech translation model

Python ★ 256 4mo ago
moshivis

Kyutai with an "eye"

Python ★ 251 1y ago
nanoGPTaudio ⑂

Code for the blog "Neural audio codecs: how to get audio into LLMs"

Python ★ 174 8mo ago
moshi-swift

No description.

Swift ★ 139 1y ago
moshi-rag

MoshiRAG is a compact full-duplex speech language model augmented with asynchronous knowledge retrieval to improve factuality without sacrificing real-time interactivity.

Rust ★ 110 1mo ago
invincible-voice

To bring back voice to those who lost it

TypeScript ★ 93 2mo ago
sphn

Python bindings for symphonia/opus: read various audio formats from Python and write Opus files

Rust ★ 79 5mo ago
ovie

Official implementation and models for OVIE (One View Is Enough! Monocular Training for In-the-Wild Novel View Generation)

Jupyter Notebook ★ 70 2mo ago
dactory

No description.

Python ★ 52 12d ago
yomikomi

A small Rust-based data loader

Rust ★ 37 4mo ago
casa

A vision-language model with an improved cross-attention mechanism for scalable streaming inference

Python ★ 30 3mo ago
tts_longeval

No description.

Python ★ 30 1mo ago
ARC-Encoder

No description.

Python ★ 29 5mo ago
kaudio

Rust crate for some audio utilities

Rust ★ 28 4d ago
flash-attn3-jax

JAX bindings for the FlashAttention 3 kernels

C++ ★ 26 3mo ago
jax-flash-attn3

JAX bindings for the flash-attention3 kernels

C++ ★ 23 5mo ago
moshi-webrtc

Proof of concept for running Moshi/Hibiki using WebRTC

Rust ★ 21 1y ago
jax-flash-attn2

JAX bindings for the flash-attention2 kernels

C++ ★ 12 1y ago
ogg-table

Ogg Vorbis reader with fast random access

Rust ★ 8 1y ago
kairos

No description.

Python ★ 7 25d ago
flashy

Framework for writing deep learning training loops. Lightweight, and retaining full freedom to design as you see fit. It handles checkpointing, logging, distributed training, compatibility with Dora, and more!

Python ★ 6 1mo ago
dora

Dora is an experiment management framework. It expresses grid searches as pure Python files as part of your repo. It identifies experiments with a unique hash signature. Scale up to hundreds of experiments without losing your sanity.

Python ★ 5 5mo ago
neural-audio-codecs-anims

Animations for the blog "Neural audio codecs: how to get audio into LLMs"

TypeScript ★ 4 8mo ago