xmu-xiaoma666

@xmu-xiaoma666 · Hangzhou, China · xmu-xiaoma666.github.io

Algorithm Engineer at Xiaohongshu (dots) | Ph.D. from MAC Lab, Xiamen University | Multimodal LLMs & Text-to-Image

37 repos
1.1k followers
48 following

Python 90%
JavaScript 5%
Shell 5%

Hi there, I'm Yiwei Ma (马祎炜) 👋

Algorithm Engineer @ dots, Xiaohongshu (RED)  ·  Ph.D. from MAC Lab, Xiamen University

Multimodal Large Language Models 🤖  ·  Text-to-Image Pretraining 🎨

---

👨‍💻 About Me

🔬 I'm an Algorithm Engineer on the dots team at Xiaohongshu (RED), working on Multimodal Large Language Models and Text-to-Image Pretraining.
🎓 I received my Ph.D. from the Department of Artificial Intelligence, Xiamen University (MAC Lab), advised by Prof. Rongrong Ji and Prof. Xiaoshuai Sun.
📚 27 papers in CCF-A/B venues (17 as first/co-first author, 3 Orals), with 1500+ Google Scholar citations.
⭐ Core developer of External-Attention-pytorch (12k+ stars).
📫 Reach me at [email protected] — feel free to chat!

🔥 Latest News

2026 — Two papers accepted by IJCV; one by ACL 2026 (Findings); one by Pattern Recognition.
2025 — One paper accepted by IEEE TPAMI; one by ACM MM 2025.

🏆 Selected Honors

🥇 2026 Top-Talent Program Offers (9): Xiaohongshu Red Star · Tencent Qingyun · Tongyi Alibaba Star · ByteDance Jindouyun · Ant Star · Huawei Genius Youth · Meituan Beidou · Xiaomi Top Talent · JD TGT
🧪 NSFC Youth Student Basic Research Project — *Principal Investigator* (国自然青基), 2024
🚀 CAST Young Talent Support Project for Ph.D. Students (青托), 2025
🎖️ Baidu Scholarship — Global Top 40, 2024
🏅 National Scholarship ×3 (2019 · 2022 · 2024)

📝 Selected Publications

> Full list on my homepage →

An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing — *IJCV 2026* [Code]
CoP: Chain of Perception for Referring 3D Instance Segmentation — *IJCV 2026* [Code]
Boosting Multi-Modal Large Language Model with Enhanced Visual Features — *TPAMI 2025* [Code]
I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing — *NeurIPS 2024* [Code]
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation — *ICML 2024* [Project]
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance — *ICCV 2023* [Project]
Towards Local Visual Modeling for Image Captioning — *Pattern Recognition 2023* 🏆 *ESI Highly Cited* [Code]
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval — *ACM MM 2022* 🔥 *500+ citations* [Code]

🚀 Open-Source Projects

🤖 dots.vlm1.inst — Instruction-tuned multimodal LLM from the dots series *(Xiaohongshu · dots)*
📄 dots.mocr — Multilingual document layout parsing & OCR model *(Xiaohongshu · dots)*
⭐ External-Attention-pytorch — PyTorch implementations of Attention / MLP / Re-param / Conv modules *(12k+ stars)*

✍️ Writing & Community

I share paper-reading notes and tutorials on 知乎 (Zhihu) and on my WeChat official account, FightingCV.
