linguist

Ruby ★ 7 updated 8y ago ⑂ fork

Language Savant. If your repository's language is being reported incorrectly, send us a pull request!

Linguist Explanation

Linguist is a tool that automatically figures out what programming languages are used in a code repository. When you push code to GitHub, this library runs behind the scenes to analyze all your files and generate that colored language breakdown bar you see on a repository's main page—the one that says something like "80% Python, 20% JavaScript." It makes sure the percentages are accurate by ignoring files you didn't write (like third-party libraries), files that are auto-generated, and documentation, so your stats reflect only the code you actually maintain.
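
As a rough illustration of how such a breakdown can be computed, here is a minimal Python sketch. It is not Linguist's own code (Linguist is written in Ruby). The function name language_breakdown, the dictionary keys, and the sample files are all hypothetical, and weighting each language by file size in bytes is an assumption of this sketch.

```python
from collections import defaultdict


def language_breakdown(files):
    """Return {language: percent} for the files that should count.

    `files` is an iterable of dicts with hypothetical keys:
    'path', 'language', 'size' (bytes), plus optional boolean flags
    'vendored', 'generated', and 'documentation'.
    """
    totals = defaultdict(int)
    for f in files:
        # Skip code the maintainers did not write by hand, and docs.
        if f.get("vendored") or f.get("generated") or f.get("documentation"):
            continue
        # Skip files whose language could not be determined.
        if f.get("language") is None:
            continue
        totals[f["language"]] += f["size"]

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        language: round(100 * size / grand_total, 1)
        for language, size in totals.items()
    }


# Illustrative input: the vendored library and the docs are large,
# but they do not influence the result.
SAMPLE_FILES = [
    {"path": "app.py", "language": "Python", "size": 8000},
    {"path": "ui.js", "language": "JavaScript", "size": 2000},
    {"path": "vendor/jquery.js", "language": "JavaScript", "size": 90000,
     "vendored": True},
    {"path": "docs/guide.md", "language": "Markdown", "size": 5000,
     "documentation": True},
]

print(language_breakdown(SAMPLE_FILES))
```

With the sample data above, only app.py and ui.js are counted, so the result is 80% Python and 20% JavaScript even though the vendored file is by far the largest.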

The library examines each file and applies a series of detection strategies in order. It first looks for editor modelines (Vim or Emacs markers that declare the file type), then checks the filename itself, then any shebang (a first line such as #!/usr/bin/env python), then the file extension, and, if the language is still ambiguous, falls back to content-based pattern matching. This layered approach means it rarely gets confused, and when it does, you can tell it the correct answer.
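
The layered order described above can be sketched in Python as follows. This is a toy model, not Linguist's implementation: the lookup tables, regular expressions, and function names are all made up for illustration, the modeline patterns cover only the simplest forms, and the final fallback simply picks the first candidate where a real tool would do something more careful.

```python
import re
from posixpath import basename, splitext

# Tiny illustrative tables; a real tool ships a far larger language database.
MODES = {"ruby": "Ruby", "python": "Python"}
FILENAMES = {"Rakefile": "Ruby", "Makefile": "Makefile"}
INTERPRETERS = {"python": "Python", "python3": "Python", "ruby": "Ruby",
                "sh": "Shell", "bash": "Shell"}
EXTENSIONS = {".py": ["Python"], ".rb": ["Ruby"], ".js": ["JavaScript"],
              ".h": ["C", "C++", "Objective-C"]}

# Simplified modeline patterns, e.g. "# vim: set ft=ruby:" and
# "# -*- mode: python -*-".
VIM_MODELINE = re.compile(r"\bvim?:.*?\b(?:ft|filetype)=(\w+)")
EMACS_MODELINE = re.compile(r"-\*-.*?\bmode:\s*([\w+-]+).*?-\*-")


def from_modeline(lines):
    for line in lines:
        match = VIM_MODELINE.search(line) or EMACS_MODELINE.search(line)
        if match:
            return MODES.get(match.group(1).lower())
    return None


def from_shebang(first_line):
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    if not parts:
        return None
    interpreter = basename(parts[0])
    # "#!/usr/bin/env python" names the interpreter as an argument to env.
    if interpreter == "env" and len(parts) > 1:
        interpreter = parts[1]
    return INTERPRETERS.get(interpreter)


def from_heuristics(candidates, content):
    # Content-based tie-breakers for extensions shared by several languages.
    if "Objective-C" in candidates and re.search(
            r"^\s*(@interface|#import)\b", content, re.M):
        return "Objective-C"
    if "C++" in candidates and re.search(
            r"^\s*(class|namespace|template)\b", content, re.M):
        return "C++"
    # Simplification: fall back to the first candidate, if any.
    return candidates[0] if candidates else None


def detect(path, content):
    """Apply each strategy in turn and stop at the first definite answer."""
    lines = content.splitlines()
    name = basename(path)
    language = (
        from_modeline(lines[:5])
        or FILENAMES.get(name)
        or from_shebang(lines[0] if lines else "")
    )
    if language:
        return language
    candidates = EXTENSIONS.get(splitext(name)[1].lower(), [])
    if len(candidates) == 1:
        return candidates[0]
    return from_heuristics(candidates, content)
```

For example, detect("script", "#!/usr/bin/env python\n") is resolved by the shebang step, while a .h file reaches the heuristics step because its extension alone matches several languages.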

If GitHub is misidentifying your repository's language or missing one entirely, Linguist is where you'd report it. The tool is extensible: if you write in a language it doesn't recognize yet, you can contribute support for it. You can also override its decisions on a per-file or per-folder basis using a .gitattributes file in your project—for example, telling it to ignore a vendor folder or to treat a specific file as a different language than it guessed.
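
To make the override mechanism concrete, the Python sketch below embeds a small example .gitattributes file and resolves which overrides apply to a given path. The attribute names (linguist-vendored, linguist-documentation, linguist-language, linguist-generated) are the ones Linguist documents; the paths, the helper names parse_rules and attributes_for, and the matching logic are illustrative assumptions. In particular, fnmatch only approximates Git's real pattern rules, and Linguist itself reads these attributes through Git rather than parsing the file this way.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Example .gitattributes content; the paths are illustrative.
GITATTRIBUTES = """\
# Treat everything under vendor/ as third-party code.
vendor/* linguist-vendored
# Treat everything under docs/ as documentation.
docs/* linguist-documentation
# Report .rb files as a different language than detected.
*.rb linguist-language=Java
# Mark one specific file as auto-generated.
Api.elm linguist-generated=true
"""


def parse_rules(text):
    """Parse '<pattern> <attr>[=<value>] ...' lines into (pattern, attrs)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        parsed = {}
        for attr in attrs:
            name, _, value = attr.partition("=")
            parsed[name] = value or "true"  # a bare attribute means "set"
        rules.append((pattern, parsed))
    return rules


def attributes_for(path, rules):
    """Collect the overrides that apply to `path`; later rules win."""
    result = {}
    for pattern, attrs in rules:
        # Approximation: patterns without a slash match the file name
        # at any depth, other patterns match the whole path.
        target = path if "/" in pattern else basename(path)
        if fnmatchcase(target, pattern):
            result.update(attrs)
    return result


rules = parse_rules(GITATTRIBUTES)
print(attributes_for("vendor/jquery.js", rules))
print(attributes_for("lib/app.rb", rules))
```

In a real project you would only write the .gitattributes lines themselves and commit the file at the repository root; the Python code here exists purely to show how per-folder and per-file rules combine.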

Developers and maintainers use this because accurate language stats matter. They help you understand what your codebase is actually made of, help potential contributors know what languages a project uses, and make repositories easier to search and categorize on GitHub. For most people, it just works invisibly—but when it doesn't, this is the project you can improve.