Hi there, I'm Yiwei Ma (马祎炜) 👋
Algorithm Engineer @ dots, Xiaohongshu (RED) · Ph.D. from MAC Lab, Xiamen University
Multimodal Large Language Models 🤖 · Text-to-Image Pretraining 🎨
---
👨‍💻 About Me
- 🔬 I'm an Algorithm Engineer at the dots team of Xiaohongshu (RED), working on Multimodal Large Language Models and Text-to-Image Pretraining.
- 🎓 I received my Ph.D. from the Department of Artificial Intelligence, Xiamen University (MAC Lab), advised by Prof. Rongrong Ji and Prof. Xiaoshuai Sun.
- 📚 27 papers in CCF-A/B venues (17 as first/co-first author, 3 Orals), with 1500+ Google Scholar citations.
- ⭐ Core developer of External-Attention-pytorch (12k+ stars).
- 📫 Reach me at [email protected] — feel free to chat!
🔥 Latest News
- 2026 — Two papers accepted by IJCV; one by ACL 2026 (Findings); one by Pattern Recognition.
- 2025 — One paper accepted by IEEE TPAMI; one by ACM MM 2025.
🏆 Selected Honors
- 🥇 2026 Top-Talent Program Offers (9): Xiaohongshu Red Star · Tencent Qingyun · Tongyi Alibaba Star · ByteDance Jindouyun · Ant Star · Huawei Genius Youth · Meituan Beidou · Xiaomi Top Talent · JD TGT
- 🧪 NSFC Youth Student Basic Research Project — *Principal Investigator* (国自然青基), 2024
- 🚀 CAST Young Talent Support Project for Ph.D. Students (青托), 2025
- 🎖️ Baidu Scholarship — Global Top 40, 2024
- 🏅 National Scholarship ×3 (2019 · 2022 · 2024)
📝 Selected Publications
> Full list on my homepage →
- An Extensive Benchmark for Single-Round and Multi-Round Instruction-Based Image Editing — *IJCV 2026* [Code]
- CoP: Chain of Perception for Referring 3D Instance Segmentation — *IJCV 2026* [Code]
- Boosting Multi-Modal Large Language Model with Enhanced Visual Features — *TPAMI 2025* [Code]
- I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing — *NeurIPS 2024* [Code]
- X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation — *ICML 2024* [Project]
- X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance — *ICCV 2023* [Project]
- Towards Local Visual Modeling for Image Captioning — *Pattern Recognition 2023* 🏆 *ESI Highly Cited* [Code]
- X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval — *ACM MM 2022* 🔥 *500+ citations* [Code]
🚀 Open-Source Projects
- 🤖 dots.vlm1.inst — Instruction-tuned multimodal LLM from the dots series *(Xiaohongshu · dots)*
- 📄 dots.mocr — Multilingual document layout parsing & OCR model *(Xiaohongshu · dots)*
- ⭐ External-Attention-pytorch — PyTorch implementations of Attention / MLP / Re-param / Conv modules *(12k+ stars)*
✍️ Writing & Community
I share paper reading notes and tutorials on 知乎 (Zhihu) and my WeChat public account FightingCV.
📖 Selected articles
- External-Attention-pytorch: 🍀 PyTorch implementations of various attention mechanisms, MLP, re-parameterization, and convolution modules, helpful for understanding the papers in more depth. ⭐⭐⭐ *(Python · ★ 12k · 3mo ago)*
- FightingCV-Paper-Reading: ⭐⭐⭐ FightingCV Paper Reading, which helps you understand the most advanced research work in an easier way 🍀🍀🍀 *(Shell · ★ 820 · 3y ago)*
- X-Dreamer: A PyTorch implementation of “X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation” *(Python · ★ 75 · 2y ago)*
- xmu-xiaoma666: No description. *(★ 35 · 13d ago)*
- LSTNet: Towards Local Visual Modeling for Image Captioning *(Python · ★ 30 · 3y ago)*
- RepMLP-pytorch: PyTorch implementation of RepMLP *(Python · ★ 30 · 5y ago)*
- X-Mesh: A PyTorch implementation of “X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance” *(Python · ★ 29 · 2y ago)*
- Multimodal-Open-O1: Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool works locally and aims to create inference chains akin to those used by OpenAI-o1, but with localized processing power. *(Python · ★ 28 · 1y ago)*
- ECCV2022-Paper-List: ECCV2022-Paper-List *(★ 19 · 3y ago)*
- SDATR: Official code for "Knowing what it is: Semantic-enhanced Dual Attention Transformer" (TMM2022) *(Python · ★ 19 · 3y ago)*
- ImageCaptionMetrics: This repository contains two tools: a Python 3 library for NLP and image-caption metrics, and code for a two-tailed t-test with paired samples, which reveals whether the difference between two results is significant. It also includes evaluation code for the SPICE details (*i.e.*, Object, Relation, Attribute, Color, Count, and Size). *(Python · ★ 18 · 5y ago)*
- vMLLM: The official repository for “vMLLM: Boosting Multi-modal Large Language Model with Enhanced Visual Features”. *(Python · ★ 13 · 1y ago)*
- CVAlgorithm: Common algorithms in CV interviews *(Python · ★ 8 · 4y ago)*
- Visualizer ⑂: Helper tools for attention visualization in deep learning *(★ 7 · 4y ago)*
- yoloair ⑂: 🔥🔥🔥 YOLOAir: including YOLOv5, YOLOv7, Transformer, YOLOX, YOLOR and other networks... Supports improving the backbone, head, loss, IoU, NMS... The original version was created based on YOLOv5 *(★ 7 · 3y ago)*
- MFM: An official implementation for "Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning" *(Python · ★ 6 · 3y ago)*
- MLP-Mixer-pytorch: Unofficial implementation of MLP-Mixer: An all-MLP Architecture for Vision *(Python · ★ 6 · 5y ago)*
- Leetcode_diary: Leetcode is all you need *(Python · ★ 5 · 4y ago)*
- Pytorch-Image-Classification: Pytorch-Image-Classification *(Python · ★ 4 · 4y ago)*
- ECCV2022-Papers-with-Code ⑂: A collection of open-source projects for ECCV 2022 papers; issues sharing ECCV 2020 open-source projects are also welcome *(★ 4 · 4y ago)*
- DTNet: The official repository for “Image Captioning via Dynamic Path Customization”. *(Python · ★ 3 · 1y ago)*
- ECCV2022-Papers-with-Code-Demo ⑂: Collects the latest ECCV results, including papers, code, and demo videos; recommendations are welcome! *(★ 3 · 3y ago)*
- ECCV2022-Paper-Code-Interpretation ⑂: A collection of ECCV 2022 papers, code, and interpretations, compiled by the 极市 team *(★ 2 · 3y ago)*
- Awesome-Model-Pytorch: PyTorch implementation of deep learning models *(★ 2 · 5y ago)*
- LLM-MPI: No description. *(Python · ★ 1 · 1y ago)*
- CoP: No description. *(Python · ★ 1 · 1y ago)*
- Beat: No description. *(Python · ★ 1 · 2y ago)*
- xmu-xiaoma666.github.io: No description. *(JavaScript · ★ 0 · 13d ago)*
- TGNN ⑂: No description. *(★ 0 · 4y ago)*
- X-RefSeg3D ⑂: The official implementation of the paper "X-RefSeg3D: Enhancing Referring 3D Instance Segmentation via Structured Cross-Modal Graph Neural Networks" (AAAI2024) *(★ 0 · 2y ago)*
- MLLM-Selector: The official repository for “MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning”. *(★ 0 · 1y ago)*
- X-LLM ⑂: X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages *(★ 0 · 2y ago)*
- Doragd: A ✨special✨ repository to show myself on my homepage. *(★ 0 · 4y ago)*
- X-CLIP ⑂: An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval" *(★ 0 · 3y ago)*
- Swin-Transformer ⑂: This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows". *(★ 0 · 3y ago)*
- CLIP4Clip ⑂: An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval" *(★ 0 · 4y ago)*
- rentainhe ⑂: No description. *(★ 0 · 5y ago)*