gorilla

Python · ★ 13k · updated 2mo ago

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

A UC Berkeley research project that trains and benchmarks AI models to accurately call external APIs and functions, including a drop-in replacement for OpenAI function calling and a live leaderboard.

Python · PyTorch · REST API · setup: hard · complexity 4/5

Gorilla is a research project from UC Berkeley focused on training and evaluating large language models (LLMs) to call external APIs accurately. When you ask an AI assistant to use a tool or service, it needs to produce a correctly formatted function call with the right arguments. Gorilla studies how to make that work reliably across thousands of different APIs.

The repository contains several interconnected components. The original Gorilla model is a fine-tuned language model trained on a dataset called APIBench, which covers more than 1,600 APIs from sources like Hugging Face, PyTorch Hub, and TensorFlow Hub. The model takes a plain-language request and produces the correct API call, including proper argument names and types, with fewer errors than general-purpose models at the time of release.
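To make "proper argument names and types" concrete, here is a minimal sketch of how a generated call could be checked against a function signature. It is an illustration only and not code from the Gorilla repository: `load_model` is a hypothetical stand-in for a real API, and `check_call` is a helper name invented for this example.

```python
import inspect


def load_model(repo: str, model: str, pretrained: bool = True):
    """Hypothetical stand-in for a real API; not part of Gorilla."""
    return f"{repo}:{model}:{pretrained}"


def check_call(func, kwargs):
    """Return a list of problems found in a proposed keyword-argument call."""
    sig = inspect.signature(func)
    try:
        # bind() raises TypeError for missing or unexpected argument names.
        bound = sig.bind(**kwargs)
    except TypeError as exc:
        return [str(exc)]

    problems = []
    for name, value in bound.arguments.items():
        expected = sig.parameters[name].annotation
        # Only plain class annotations (str, bool, ...) are checked here.
        if expected is not inspect.Parameter.empty and not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

An empty list means the call has the right argument names and types for that signature. A model that writes `name=` instead of `model=`, or passes an integer where a string is required, gets a non-empty list back.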

The Berkeley Function Calling Leaderboard (BFCL) is a benchmark for ranking AI models on their ability to call functions correctly. It has gone through several versions, progressing from single function calls to multi-turn conversations and multi-step workflows. V4 tests tool use in realistic agent settings, including web search with multi-hop reasoning and memory management. The leaderboard is publicly available and tracks many commercial and open-source models.
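One simple way to score function-calling accuracy is to parse each predicted call and compare its function name and arguments with a reference call. The sketch below does this with Python's standard `ast` module. It is an assumption about how such a check could look, not BFCL's actual scoring code; `parse_call` and `accuracy` are names invented for this example, and only keyword arguments with literal values are supported.

```python
import ast


def parse_call(text):
    """Parse 'name(k=v, ...)' into (function name, keyword-argument dict)."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    if node.args:
        raise ValueError("positional arguments are not supported in this sketch")
    name = ast.unparse(node.func)  # requires Python 3.9+
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


def accuracy(predictions, references):
    """Fraction of predictions whose name and arguments match the reference."""
    correct = 0
    for pred, ref in zip(predictions, references):
        try:
            correct += parse_call(pred) == parse_call(ref)
        except (SyntaxError, ValueError):
            pass  # unparseable output counts as wrong
    return correct / len(references)
```

Comparing parsed structures rather than raw strings means that argument order and whitespace do not affect the score, while a wrong argument name or a malformed call still counts as a miss.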

Gorilla OpenFunctions V2 is a model designed as a drop-in replacement for OpenAI function calling, with support for Python, Java, JavaScript, and REST APIs. It can execute multiple functions in parallel and includes logic to detect when a function call is not actually relevant, reducing unnecessary invocations.
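The two behaviours described above, parallel calls and relevance detection, can be illustrated on the application side with a small dispatcher. This is a sketch under stated assumptions: the call format (a list of dicts with `name` and `arguments`), the `REGISTRY` table, and the two toy functions are hypothetical and do not reflect the actual OpenFunctions output format. An empty list stands for the case where the model decided no function was relevant.

```python
from concurrent.futures import ThreadPoolExecutor


def get_weather(city):
    """Toy function standing in for a real tool."""
    return f"weather for {city}"


def get_time(city):
    """Toy function standing in for a real tool."""
    return f"time in {city}"


# Hypothetical mapping from function names to callables.
REGISTRY = {"get_weather": get_weather, "get_time": get_time}


def dispatch(calls):
    """Run every requested call concurrently; return results in request order."""
    if not calls:
        # The model judged no function relevant, so nothing is invoked.
        return []
    unknown = [c["name"] for c in calls if c["name"] not in REGISTRY]
    if unknown:
        raise KeyError(f"unknown functions: {unknown}")
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(REGISTRY[c["name"]], **c["arguments"]) for c in calls]
        return [f.result() for f in futures]
```

Collecting results from the futures in submission order keeps the output aligned with the order of the requested calls, even if the functions finish at different times.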

GoEx is a separate component that acts as a runtime for executing actions an LLM generates, with built-in support for undoing actions and limiting damage from unintended operations. The project is licensed under Apache 2.0.
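The undo idea behind GoEx can be sketched by pairing every action with a function that reverses it and keeping those reversals on a stack. The code below is a minimal illustration of that pattern, not GoEx's implementation or API; `ReversibleRunner` and `set_key` are hypothetical names, and a plain dictionary stands in for whatever external system the actions would touch.

```python
_MISSING = object()  # sentinel for "key was not present before the action"


class ReversibleRunner:
    """Runs actions and remembers how to undo them, most recent first."""

    def __init__(self):
        self._undo_stack = []

    def run(self, do, undo):
        result = do()
        self._undo_stack.append(undo)
        return result

    def undo_last(self):
        """Undo the most recent action; return False if nothing is left."""
        if not self._undo_stack:
            return False
        self._undo_stack.pop()()
        return True

    def undo_all(self):
        while self.undo_last():
            pass


def set_key(store, key, value):
    """Build a (do, undo) pair that sets store[key] and can restore it."""
    state = {}

    def do():
        state["previous"] = store.get(key, _MISSING)
        store[key] = value

    def undo():
        if state["previous"] is _MISSING:
            store.pop(key, None)
        else:
            store[key] = state["previous"]

    return do, undo
```

A caller would run an action with `runner.run(*set_key(store, "a", 2))` and later call `runner.undo_last()` or `runner.undo_all()` to roll back. Undoing in reverse order matters because later actions may depend on the effects of earlier ones.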

Where it fits

Benchmark your AI model's function-calling accuracy using the Berkeley Function Calling Leaderboard.
Use Gorilla OpenFunctions V2 as a drop-in replacement for OpenAI function calling in Python or JavaScript apps.
Run LLM-generated actions safely with GoEx, which supports undoing unintended operations.
Train a model on APIBench to produce accurate API calls from plain-language requests.
