moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Python ★ 10k 1mo ago
pocket-tts
A TTS that fits in your CPU (and pocket)
Python ★ 4.6k 18d ago
delayed-streams-modeling
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
Python ★ 3.0k 4mo ago
hibiki
Hibiki is a model for streaming speech translation (also known as simultaneous translation). Unlike offline translation, where one waits for the end of the source utterance before starting to translate, Hibiki adapts its flow to accumulate just enough context to produce a correct translation in real time, chunk by chunk.
Rust ★ 1.5k 1y ago
unmute
Make text LLMs listen and speak
Python ★ 1.3k 16d ago
moshi-finetune
No description.
Python ★ 465 8mo ago
hibiki-zero
A real-time and multilingual speech translation model
Python ★ 256 4mo ago
moshivis
Kyutai with an "eye"
Python ★ 251 1y ago
nanoGPTaudio ⑂
Code for the blog "Neural audio codecs: how to get audio into LLMs"
Python ★ 174 8mo ago
moshi-swift
No description.
Swift ★ 139 1y ago
moshi-rag
MoshiRAG is a compact full-duplex speech language model augmented with asynchronous knowledge retrieval to improve factuality without sacrificing real-time interactivity.
Rust ★ 110 1mo ago
invincible-voice
To bring back voice to those who lost it
TypeScript ★ 93 2mo ago
sphn
Python bindings for symphonia/opus: read various audio formats from Python and write Opus files
Rust ★ 79 5mo ago
ovie
Official implementation and models for OVIE (One View Is Enough! Monocular Training for In-the-Wild Novel View Generation)
Jupyter Notebook ★ 70 2mo ago
dactory
No description.
Python ★ 52 12d ago
yomikomi
A small Rust-based data loader
Rust ★ 37 4mo ago
casa
A vision-language model with an improved cross-attention mechanism for scalable streaming inference
Python ★ 30 3mo ago
tts_longeval
No description.
Python ★ 30 1mo ago
ARC-Encoder
No description.
Python ★ 29 5mo ago
kaudio
Rust crate for some audio utilities
Rust ★ 28 4d ago
flash-attn3-jax
JAX bindings for the FlashAttention 3 kernels
C++ ★ 26 3mo ago
jax-flash-attn3
JAX bindings for the flash-attention3 kernels
C++ ★ 23 5mo ago
moshi-webrtc
Proof of concept for running Moshi/Hibiki using WebRTC
Rust ★ 21 1y ago
jax-flash-attn2
JAX bindings for the flash-attention2 kernels
C++ ★ 12 1y ago
ogg-table
Ogg-vorbis reader with fast random access
Rust ★ 8 1y ago
kairos
No description.
Python ★ 7 25d ago
flashy
Framework for writing deep learning training loops. Lightweight, and retaining full freedom to design as you see fit. It handles checkpointing, logging, distributed training, compatibility with Dora, and more!
Python ★ 6 1mo ago
dora
Dora is an experiment management framework. It expresses grid searches as pure Python files as part of your repo. It identifies experiments with a unique hash signature. Scale up to hundreds of experiments without losing your sanity.
Python ★ 5 5mo ago
neural-audio-codecs-anims
Animations for the blog "Neural audio codecs: how to get audio into LLMs"
TypeScript ★ 4 8mo ago