GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
GLM-4 is a family of open-source AI chat models built by the team at Zhipu AI. The models are designed to understand and generate text in multiple languages and can also work with images and other non-text inputs. The project contains several model variants released in April 2025, each aimed at different use cases and hardware constraints.
The main model, GLM-4-32B-0414, has 32 billion parameters and was trained on 15 trillion tokens of text data. According to the README, its performance on coding and question-answering benchmarks is comparable to GPT-4o and DeepSeek-V3, both of which are substantially larger. It handles tasks like writing and running code, calling external tools or APIs, answering questions from web search, and generating structured documents.
A second variant, GLM-Z1-32B-0414, adds extended reasoning. It was trained with extra steps focused on mathematics, logic, and code so it can work through harder multi-step problems before giving an answer. A third variant, GLM-Z1-Rumination-32B-0414, goes further: it is designed for longer, more open-ended tasks like writing detailed research reports. It can search the web during its thinking process to gather information before composing a response.
For users with limited computing resources, there is a smaller 9-billion-parameter option called GLM-Z1-9B-0414. The README notes it ranks near the top of open-source models at that size, particularly for math reasoning, making it a practical choice when running a full 32B model is not feasible.
The repository includes Python code for running inference, notebooks demonstrating specific capabilities, and instructions for using vLLM to serve the models in production. Deployment guides for Ollama and llama.cpp are also provided. The models are hosted on Hugging Face and can be downloaded freely. A commercial API version is available at bigmodel.cn, and the models can be tested without downloading anything at chat.z.ai.