ChatGLM2-6B
ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
ChatGLM2-6B is a free, open-source bilingual (Chinese and English) AI chat model with 6 billion parameters that runs on a single consumer GPU and can be fine-tuned on your own data.
ChatGLM2-6B is the second-generation version of ChatGLM-6B, an open-source bilingual (Chinese and English) conversational large language model. "6B" refers to its size — roughly six billion parameters — which is small enough to run on a single consumer GPU while still being capable enough for general chat. The repository contains the model code and supporting scripts you need to download weights, run inference, and fine-tune the model on your own data.
Compared with the first generation, ChatGLM2-6B improves in three main areas. Performance: the base model was retrained on 1.4T Chinese and English tokens using GLM's hybrid objective function and then aligned to human preferences, which produced large relative gains on MMLU, C-Eval, GSM8K, and BBH (the README quotes +23% on MMLU and +571% on GSM8K). Context length: FlashAttention allowed the base model's context to be extended from 2K to 32K tokens; the chat model was trained with an 8K window, and a separate ChatGLM2-6B-32K variant targets longer documents. Inference efficiency: Multi-Query Attention makes generation roughly 42% faster than the first generation and lowers memory use, so a 6GB GPU running INT4 quantization can sustain conversations up to 8K in length (up from 1K). INT8 and INT4 quantization reduce memory further at the cost of a modest accuracy loss.
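The memory claims above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement. It assumes exactly 6e9 parameters and counts weights only. The layer and head counts (28 layers, 32 query heads, 2 shared key/value groups, head dimension 128) are assumptions about the model configuration and should be checked against the published config. The function names are hypothetical.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Lower bound on weight storage; ignores activations and the KV cache."""
    return n_params * bits_per_param / 8 / GIB


def kv_cache_gib(n_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_value: int = 2) -> float:
    """KV cache size: one key and one value vector per layer, KV head, and token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * n_tokens / GIB


N_PARAMS = 6e9  # assumption: "roughly six billion" taken as exactly 6e9
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gib(N_PARAMS, bits):.2f} GiB of weights")

# Assumed configuration values (not taken from this document).
LAYERS, Q_HEADS, KV_GROUPS, HEAD_DIM = 28, 32, 2, 128
TOKENS = 8 * 1024

# Standard multi-head attention keeps one K/V pair per query head.
mha = kv_cache_gib(TOKENS, LAYERS, Q_HEADS, HEAD_DIM)
# Multi-Query Attention shares K/V across query heads, so only KV_GROUPS are cached.
mqa = kv_cache_gib(TOKENS, LAYERS, KV_GROUPS, HEAD_DIM)
print(f"KV cache at {TOKENS} tokens: {mha:.2f} GiB (per-head) vs {mqa:.2f} GiB (shared)")
```

Under these assumptions, INT4 weights need about 2.8 GiB, and sharing key/value heads shrinks an 8K-token FP16 cache from 3.5 GiB to about 0.22 GiB, a 16x reduction. Real quantized checkpoints may keep some layers at higher precision, so actual usage will likely be higher than this lower bound.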
You would use ChatGLM2-6B if you want a freely available chatbot model that is strong in both Chinese and English, runs on a single GPU, and can be fine-tuned locally. It suits research and prototyping, and commercial use is free after registering through a form. It is written in Python on top of PyTorch and Hugging Face Transformers; after cloning the repository, you install its dependencies with pip.
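As an illustration of what running inference through Hugging Face Transformers might look like, here is a minimal sketch. The hub ID `THUDM/chatglm2-6b` and the `quantize` and `chat` methods are assumptions based on how the model's remote code is commonly used, not details confirmed in this summary. `load_chatglm2` and `chat_once` are hypothetical helper names. Check the repository README before relying on any of this.

```python
def load_chatglm2(model_id: str = "THUDM/chatglm2-6b", quant_bits=None):
    """Load tokenizer and model, optionally quantized to 4 or 8 bits.

    Requires a CUDA GPU, the `transformers` package, and a weight download.
    """
    if quant_bits not in (None, 4, 8):
        raise ValueError("quant_bits must be None, 4, or 8")

    # Imported lazily so the helpers can be defined without transformers installed.
    from transformers import AutoModel, AutoTokenizer

    # trust_remote_code=True runs Python code shipped with the checkpoint,
    # so only use it with a source you trust.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)

    if quant_bits is not None:
        # Assumed method provided by the model's remote code.
        model = model.quantize(quant_bits)
    else:
        model = model.half()  # FP16 weights
    return tokenizer, model.cuda().eval()


def chat_once(tokenizer, model, prompt: str, history=None):
    """Send one prompt and return (response, updated_history).

    Assumes the model object exposes chat(tokenizer, prompt, history=...).
    """
    response, new_history = model.chat(tokenizer, prompt, history=history or [])
    return response, new_history


# Example (not executed here, since it downloads several GB of weights):
#   tokenizer, model = load_chatglm2(quant_bits=4)
#   reply, history = chat_once(tokenizer, model, "Hello")
```

Passing the returned `history` back into the next `chat_once` call is what would carry the conversation forward across turns.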
Where it fits
- Deploy a bilingual Chinese-English chatbot on a single consumer GPU with as little as 6GB of VRAM (using INT4 quantization)
- Fine-tune the model on your own dataset for a domain-specific assistant or research task
- Run a local AI chat assistant with 32K context for long documents using the ChatGLM2-6B-32K variant
- Use the model as a research base for studying bilingual language understanding and alignment
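Choosing between the standard chat model and the 32K variant comes down to how many tokens the prompt and reply need. The sketch below shows one way to make that choice. The hub IDs, the `reply_budget` default, and the function name are illustrative assumptions. The 8K and 32K limits are taken as 8192 and 32768 tokens.

```python
CHAT_WINDOW = 8 * 1024    # window used during chat training
LONG_WINDOW = 32 * 1024   # window of the long-context variant


def pick_variant(prompt_tokens: int, reply_budget: int = 512) -> str:
    """Return an assumed hub ID for the smallest variant that fits the request."""
    if prompt_tokens < 0 or reply_budget < 0:
        raise ValueError("token counts must be non-negative")
    total = prompt_tokens + reply_budget  # leave room for the generated reply
    if total <= CHAT_WINDOW:
        return "THUDM/chatglm2-6b"
    if total <= LONG_WINDOW:
        return "THUDM/chatglm2-6b-32k"
    raise ValueError(
        f"{total} tokens exceed the {LONG_WINDOW}-token window; "
        "split or summarize the document first"
    )


print(pick_variant(1_000))    # short chat turn
print(pick_variant(20_000))   # long document
```

In practice `prompt_tokens` would come from the model's own tokenizer, since token counts differ noticeably between Chinese and English text.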