Hi there 👋
I am Rongjie Huang (黄融杰). I did my graduate study at the College of Computer Science and Software, Zhejiang University, supervised by Prof. Zhou Zhao. I also obtained my Bachelor's degree at Zhejiang University. During my graduate study, I was lucky to collaborate with the CMU Speech Team led by Prof. Shinji Watanabe, and the Audio Research Team at Zhejiang University. I am grateful to have interned or collaborated at TikTok, Shanghai AI Lab (OpenGV Lab), Tencent Seattle Lab, and Alibaba DAMO Academy, with Yi Ren, Jinglin Liu, Chunlei Zhang, and Dong Yu.
My research interests include Multi-Modal Generative AI, Multi-Modal Language Processing, and AI4Science. I have published first-author papers at top international AI conferences such as NeurIPS, ICLR, ICML, ACL, and IJCAI.
I am actively looking for academic collaboration. Feel free to drop me an email.
📎 Homepages
- Personal Pages: https://rongjiehuang.github.io (updated recently🔥)
- LinkedIn: https://www.linkedin.com/in/rongjie-huang-a362541b2
- Google Scholar: https://scholar.google.com/citations?user=iRHBUsgAAAAJ
💻 Selected Research Papers
- Generative AI for Speech, Singing, and Audio: Spoken Large Language Model, Text-to-Audio Synthesis, Text-to-Speech Synthesis, Singing Voice Synthesis
- Audio-Visual Language Processing: Audio-Visual Speech-to-Speech Translation, Self-Supervised Learning
My full paper list is available on my personal homepage.
Spoken Large Language Model
- InstructSpeech: Following Speech Editing Instructions via Large Language Models. Rongjie Huang, Ruofan Hu, Yongqi Wang, Zehan Wang, Xize Cheng, Ziyue Jiang, Zhenhui Ye, Dongchao Yang, Luping Liu, Peng Gao, Zhou Zhao. ICML, 2024
- AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head. Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe. AAAI, 2024
- UniAudio: An Audio Foundation Model Toward Universal Audio Generation. Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Xixin Wu, Zhou Zhao, Shinji Watanabe, Helen Meng. ICML, 2024
Video-to-Audio Synthesis
- Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models. Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao. ICML, 2023
- Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT. Le Zhuo*, Ruoyi Du*, Han Xiao*, Yangguang Li*, Dongyang Liu*, Rongjie Huang*, Wenze Liu, Lirui Zhao, Fu-Yun Wang, Zhanyu Ma, Xu Luo, Zehan Wang, Kaipeng Zhang, Xiangyang Zhu, Si Liu, Xiangyu Yue, Dingning Liu, Wanli Ouyang, Ziwei Liu, Yu Qiao, Hongsheng Li, Peng Gao. ICLR, 2024
- Make-An-Audio 2: Improving Text-to-Audio with Dual Text Information Representation. Jiawei Huang, Yi Ren, Rongjie Huang, Dongchao Yang, Zhenhui Ye, Chen Zhang, Jinglin Liu, Xiang Yin, Zejun Ma, Zhou Zhao. arXiv, 2023
Audio-Visual Language Processing
- Seamless Interaction: Dyadic Audiovisual Motion Modeling and Large-Scale Dataset. FAIR at Meta. Core contributor
- TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. Rongjie Huang, Jinglin Liu, Huadai Liu, Yi Ren, Lichao Zhang, Jinzheng He, and Zhou Zhao. ICLR, 2023
- AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation. Rongjie Huang, Huadai Liu, Xize Cheng, Yi Ren, Linjun Li, Zhenhui Ye, Jinzheng He, Lichao Zhang, Jinglin Liu, Xiang Yin, and Zhou Zhao. ACL, 2023
Text-to-Speech Synthesis
- GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech. Rongjie Huang, Yi Ren, Jinglin Liu, Chenye Cui, and Zhou Zhao. NeurIPS, 2022
- FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. Rongjie Huang, Max W.Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, and Zhou Zhao. IJCAI, 2022 (oral)
- Multi-Singer: Fast multi-singer singing voice vocoder with a large-scale corpus. Rongjie Huang, Feiyang Chen, Yi Ren, Jinglin Liu, Chenye Cui, and Zhou Zhao. ACM MM, 2021 (oral)
AudioGPT ★ PINNED ⑂
No description.
Python ★ 3 3y ago
TranSpeech ★ PINNED
PyTorch Implementation of TranSpeech (ICLR'23): Textless NAR Speech-to-Speech Translation with Bilateral Perturbation
Python ★ 183 2y ago
FastDiff ★ PINNED
PyTorch Implementation of FastDiff (IJCAI'22)
Python ★ 423 2y ago
ProDiff
PyTorch Implementation of ProDiff (ACM-MM'22) with an extremely fast diffusion speech synthesis pipeline
Python ★ 432 3y ago
GenerSpeech
PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice.
Python ★ 332 2y ago
Multi-Singer
PyTorch Implementation of Multi-Singer (ACM-MM'21)
Python ★ 139 4y ago
awesome-speech-to-speech-translation
List of direct speech-to-speech translation papers.
★ 39 3y ago
Multiband-WaveRNN
An unofficial implementation of the autoregressive vocoder Multiband-WaveRNN. Audio samples at https://rongjiehuang.github.io/Multiband-WaveRNN/
Python ★ 28 5y ago
WaterCo
Underwater trash detection based on Mask R-CNN
Jupyter Notebook ★ 9 5y ago
SingGAN ⑂
Project page for SingGAN (ACM-MM'22): Generative Adversarial Network For High-Fidelity Singing Voice Generation
★ 6 4y ago
Awesome-Diffusion-Models ⑂
A collection of resources and papers on Diffusion Models
★ 4 3y ago
Rongjiehuang
No description.
★ 3 11mo ago
UniAudio ⑂
The Open Source Code of UniAudio
★ 2 2y ago
LeetCodeAnimation ⑂
Demonstrate all the questions on LeetCode in the form of animation.
★ 2 5y ago
955.WLB ⑂
List of "955" companies with no overtime: working 955 for work-life balance
★ 1 5y ago
leetcode ⑂
LeetCode solutions, complete edition covering 151 problems
★ 1 6y ago
tips_for_interview ⑂
Some of my interview insights; my journey of self-studying CS; job-hunting experience
★ 1 6y ago
daily-paper-computer-vision ⑂
A daily record of curated papers on computer vision, deep learning, and machine learning
★ 1 6y ago
awesome-courses ⑂
:books: List of awesome university courses for learning Computer Science!
★ 1 6y ago
2019_algorithm_intern_information ⑂
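Information sheet of 2020 algorithm internship positions and campus-recruitment companies, partly including referral codes, along with common deep learning algorithm interview questions and answers, and summer computer vision internship interview experiences and summaries
★ 1 5y ago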
pumpkin-book ⑂
Formula derivations and explanations for "Machine Learning" (the "Watermelon Book"); read online at https://datawhalechina.github.io/pumpkin-book
★ 1 5y ago
zju-icicles ⑂
Zhejiang University course guide sharing project
★ 1 5y ago
Deep-Learning-Papers-Reading-Roadmap ⑂
Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!
★ 1 5y ago
996.ICU ⑂
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
★ 1 5y ago
PaddleSpeech ⑂
An Easy-to-use Speech Toolkit including SOTA ASR pipeline, influential TTS with text frontend and End-to-End Speech Simultaneous Translation.
★ 1 4y ago
wait_rongjiehuang.github.io ⑂
A beautiful, simple, clean, and responsive Jekyll theme for academics
★ 1 4y ago
fairseq ⑂
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Python ★ 0 1y ago
code-of-learn-deep-learning-with-pytorch ⑂
This is the code for the book "Learn Deep Learning with PyTorch"
★ 0 7y ago
Awesome-algorithm-interview ⑂
Interview questions and related materials for algorithm engineers (AI, computer vision track)
★ 0 6y ago