gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
OpenAI's original code and model weights for GPT-2, the 2019 language model that could generate fluent text, answer questions, and summarize without task-specific training, released as an archived research artifact.
This repository contains the original code and model weights released by OpenAI for GPT-2, the AI language model described in their 2019 research paper "Language Models are Unsupervised Multitask Learners." GPT-2 is a neural network trained to predict the next word (more precisely, the next token) in a passage of text. Trained on that one objective over a very large dataset of internet text, it became capable of generating coherent, fluent paragraphs, answering questions, summarizing text, and performing other language tasks without being explicitly trained for each one. The paper's key finding was that a single model trained on a single objective could handle all of these tasks.
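The next-word objective described above can be illustrated with a toy sketch. This is not GPT-2 and not code from this repository: it swaps the neural network for a simple bigram count table, and the names `train_bigram`, `next_word_distribution`, and `generate` are hypothetical. It shows only the general shape of the idea: estimate a distribution over the next word, sample from it, append the result, and repeat.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, prev):
    """Turn raw follow-up counts into probabilities for the next word."""
    followers = counts.get(prev)
    if not followers:
        return {}
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}

def generate(counts, start, max_words=10, seed=0):
    """Autoregressive loop: sample a next word, append it, repeat."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(max_words - 1):
        dist = next_word_distribution(counts, output[-1])
        if not dist:
            break  # no known continuation for the last word
        words = list(dist)
        weights = [dist[w] for w in words]
        output.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(output)

# Toy corpus, invented for illustration only.
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)
print(next_word_distribution(model, "the"))  # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(generate(model, "the", max_words=8))
```

GPT-2 replaces the count table with a large neural network that conditions on the whole preceding context instead of only the previous word, but generation follows the same sample-and-append loop.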
The repository is an archived research artifact: the code is provided as-is, with no further updates expected. It is intended as a starting point for researchers and engineers who want to study or experiment with GPT-2's behavior, fine-tune it for specific tasks, or investigate its biases and failure modes. The code is written in Python. OpenAI notes two important caveats. First, the model can produce inaccurate or biased outputs, because its training data contains biases and factual errors. Second, generated text should always be clearly labeled as synthetic so that it is not mistaken for human writing.
Where it fits
- Study how the original GPT-2 model architecture, itself a transformer, and its training setup worked before large transformer language models became widespread.
- Fine-tune GPT-2 on a small custom text dataset to experiment with domain-specific language generation.
- Investigate GPT-2's biases and failure modes as a research project on early large language model behavior.
- Use GPT-2 as a lightweight text generation baseline to compare against modern models in an NLP experiment.