WizardLM

Python ★ 9.5k updated 1y ago

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

A Microsoft research project releasing three AI models, one for general instructions, one for coding, and one for math, trained to handle complex tasks better than earlier models of similar size.

Python · PyTorch · Transformers · setup: hard · complexity 5/5

WizardLM is a research project from Microsoft that produced a family of AI language models trained to follow complex instructions more reliably than earlier models of similar size. The project contains three distinct models: WizardLM for general conversation and instruction following, WizardCoder for writing and understanding code, and WizardMath for solving math problems. All three are built using a method the team calls Evol-Instruct, where a simpler set of training examples is automatically expanded into a larger, more varied and challenging set by having an AI generate progressively harder versions of each example.
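The expansion loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's training code: `HARDER_PROMPT`, `evolve_instructions`, and `fake_rewrite` are hypothetical names, the prompt wording is invented for the example, and `fake_rewrite` stands in for a real language-model call.

```python
from typing import Callable, List

# Hypothetical prompt template; the project's actual prompts are not shown here.
HARDER_PROMPT = (
    "Rewrite the following instruction so it is more challenging, "
    "keeping it answerable:\n{instruction}"
)


def evolve_instructions(
    seeds: List[str],
    rewrite: Callable[[str], str],
    rounds: int = 2,
) -> List[str]:
    """Grow a pool of instructions by asking `rewrite` for harder versions.

    Each round rewrites only the instructions produced in the previous
    round, so difficulty increases step by step.
    """
    pool = list(seeds)      # everything collected so far, seeds included
    current = list(seeds)   # the generation to rewrite in this round
    for _ in range(rounds):
        next_generation = []
        for instruction in current:
            prompt = HARDER_PROMPT.format(instruction=instruction)
            harder = rewrite(prompt).strip()
            # Keep only non-empty results that are not already in the pool.
            if harder and harder not in pool:
                pool.append(harder)
                next_generation.append(harder)
        current = next_generation
    return pool


def fake_rewrite(prompt: str) -> str:
    """Stand-in for a model call: appends an extra requirement."""
    instruction = prompt.split("\n", 1)[1]
    return instruction + " Explain each step."


if __name__ == "__main__":
    for item in evolve_instructions(["Add 2 and 3."], fake_rewrite, rounds=2):
        print(item)
```

Passing the model call in as a plain callable keeps the loop testable without any model; a real setup would replace `fake_rewrite` with a function that queries a language model and would likely add stronger filtering of low-quality rewrites.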

WizardCoder is the most prominent part of the repository in terms of benchmark results. As of early 2024, the team reported that the 33-billion-parameter version scored on par with or above GPT-3.5-Turbo and Gemini Pro on standard coding benchmarks. WizardMath focuses on grade-school and competition-style math problems; at the time of release, its 70-billion-parameter version outperformed GPT-3.5 on the GSM8K benchmark. WizardLM itself targets general complex instructions, and its paper was accepted at ICLR 2024.

All three model families are available for download from Hugging Face. The models come in several sizes, ranging from 1 billion to 70 billion parameters, so users with different hardware can choose a version that fits their available memory and compute. Depending on the version, the underlying base model is Llama, Mistral, or DeepSeek-Coder.
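A rough way to judge which size fits a given machine is to estimate the memory taken by the weights alone: parameter count times bytes per parameter. The sketch below is a back-of-the-envelope helper, not something shipped with the repository. The function names, the list of sizes, and the 20% headroom factor are assumptions made for illustration, and real usage also depends on context length, batch size, and runtime overhead.

```python
from typing import Iterable, Optional


def weight_memory_gib(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory for the weights alone, in GiB.

    bytes_per_param is 2.0 for 16-bit weights, 4.0 for 32-bit,
    and roughly 0.5 for 4-bit quantized weights.
    """
    return params_billion * 1e9 * bytes_per_param / 2**30


def largest_fitting(
    sizes_billion: Iterable[float],
    available_gib: float,
    bytes_per_param: float = 2.0,
    headroom: float = 1.2,  # assumed 20% margin for activations and overhead
) -> Optional[float]:
    """Return the largest size whose estimated footprint fits, or None."""
    fitting = [
        size
        for size in sizes_billion
        if weight_memory_gib(size, bytes_per_param) * headroom <= available_gib
    ]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    sizes = [1, 7, 13, 33, 70]  # illustrative sizes within the 1B to 70B range
    for memory in (8, 24, 80):
        print(memory, "GiB ->", largest_fitting(sizes, memory), "B parameters")
```

Lowering `bytes_per_param` models quantized weights, which is the usual way to fit a larger model into the same memory at some cost in output quality.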

The code in the repository covers training scripts for reproducing the Evol-Instruct process and evaluation scripts for running the benchmarks. It requires Python 3.9 or later. Data produced by the project is licensed under Creative Commons BY-NC 4.0, meaning it can be used for research and non-commercial purposes. The code itself is Apache 2.0 licensed.

The project has a Discord community and a homepage with additional details. Development appears to have been most active between 2023 and early 2024, corresponding to the period when these benchmarks were published.

Where it fits