Goutte

PHP ★ 9.2k updated 3y ago ▣ archived

Goutte, a simple PHP Web Scraper

Goutte is a deprecated PHP library for scraping websites: it visits pages, extracts data with CSS selectors, and submits forms from code. New projects should use the HttpBrowser class from Symfony's BrowserKit component directly instead.

PHP · Symfony · Composer · setup: easy · complexity 2/5

Goutte is a PHP library for web scraping: it lets you write code that visits websites, reads their content, and extracts data from HTML pages without needing a real browser. You can use it to pull information from websites automatically, navigate between pages, click links, and submit forms.

The basic workflow is to create a client, point it at a URL, and get back a crawler object that lets you search through the page using CSS selectors. From there you can pull out text, follow links, or fill in and submit forms, all from PHP code.
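
A minimal sketch of that workflow, assuming Goutte was installed with Composer (package fabpot/goutte) and that vendor/autoload.php sits next to the script. The URL and the CSS selectors are placeholders chosen for illustration, not values taken from the project.

```php
<?php
// Assumes `composer require fabpot/goutte` has been run in this directory.
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

// Create a client and fetch a page; request() returns a DomCrawler Crawler.
$client = new Client();
$crawler = $client->request('GET', 'https://example.com/'); // placeholder URL

// Extract text with a CSS selector ('h1' is an illustrative selector).
$headings = $crawler->filter('h1')->each(function ($node) {
    return $node->text();
});

foreach ($headings as $heading) {
    echo $heading, PHP_EOL;
}

// Follow the first link on the page, if the page has one.
$links = $crawler->filter('a');
if ($links->count() > 0) {
    $crawler = $client->click($links->first()->link());
    echo 'Now at: ', $crawler->getUri(), PHP_EOL;
}
```

Forms follow the same pattern: select a button with selectButton(), turn it into a form object with form(), and pass that to the client's submit() method along with the field values. The button label and field names depend entirely on the target page.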

However, the README is clear that Goutte is deprecated. Starting with version 4, the library became a thin wrapper around HttpBrowser, a class in Symfony's BrowserKit component. If you have existing code that uses Goutte, the migration is straightforward: replace Goutte\Client with Symfony\Component\BrowserKit\HttpBrowser in your code. New projects should use the Symfony components directly rather than Goutte.
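
The migration can be sketched as follows. This assumes the Symfony packages symfony/browser-kit, symfony/http-client, symfony/dom-crawler and symfony/css-selector are installed through Composer; the URL, the selector, and the fallback string are placeholders.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

// Before the migration the script would have used:
//   use Goutte\Client;
//   $client = new Client();
use Symfony\Component\BrowserKit\HttpBrowser;
use Symfony\Component\HttpClient\HttpClient;

// HttpBrowser takes the HTTP client that performs the actual requests.
$browser = new HttpBrowser(HttpClient::create());

// From here on the calls are the same ones Goutte exposed.
$crawler = $browser->request('GET', 'https://example.com/'); // placeholder URL

// text() accepts a default that is returned when nothing matches the selector.
echo $crawler->filter('title')->text('(no title found)'), PHP_EOL;
```

Because the Goutte client was only a proxy to this class from version 4 onward, code that calls request(), click(), and submit() should keep working once the class name and constructor are swapped.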

Under the hood, Goutte is built on four Symfony components: BrowserKit for browser-like navigation, DomCrawler for searching HTML structure, CssSelector for using CSS-style queries to find page elements, and HttpClient for making the actual HTTP requests. These components have their own documentation and are actively maintained by the Symfony project.
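
To illustrate how the parsing side works on its own, here is a small sketch that uses DomCrawler and CssSelector without any HTTP request. It assumes symfony/dom-crawler and symfony/css-selector are installed through Composer; the HTML snippet and the selector are invented for the example.

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Inline HTML invented for this example, so nothing is fetched over the network.
$html = <<<'HTML'
<html><body>
  <ul class="posts">
    <li><a href="/first">First post</a></li>
    <li><a href="/second">Second post</a></li>
  </ul>
</body></html>
HTML;

// DomCrawler parses the markup; filter() relies on CssSelector
// to translate the CSS query into XPath.
$crawler = new Crawler($html);

// Collect the link text and target of every matching anchor.
$posts = $crawler->filter('ul.posts > li > a')->each(function (Crawler $node) {
    return ['title' => $node->text(), 'href' => $node->attr('href')];
});

print_r($posts);
```

BrowserKit and HttpClient add the navigation layer on top: they fetch a page, keep cookies and history, and hand the response body to a Crawler like the one built by hand here.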

The library requires PHP 7.1 or higher and installs via Composer, the standard PHP package manager, with a single command. It is released under the MIT license. The name is pronounced like "goot" to rhyme with "boot."

Where it fits