examples-of-web-crawlers

HTML ★ 15k updated 11mo ago

Some very interesting Python crawler examples that are fairly beginner-friendly, mainly crawling sites such as Taobao, Tmall, WeChat, WeChat Read, Douban, and QQ.

A beginner-friendly collection of Python web scraping scripts that automatically collect data from websites, including Chinese shopping sites, stock platforms, and social apps like WeChat and QQ.

Python | Selenium | Chrome | Setup: moderate | Complexity: 2/5

This repository is a collection of Python web crawling examples aimed at beginners. Web crawling means writing code that visits websites and collects data from them, much as a person would open a page, read the information, and copy it down, except that the program does it automatically and at scale. The examples are billed as beginner-friendly, with heavy commenting that explains each step.
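The visit-then-extract loop described above can be sketched with only the Python standard library. This is a minimal illustration, not code from the repository: `fetch_html`, `TitleCollector`, and `extract_titles` are hypothetical names, and the assumption that the wanted text sits in `<span class="title">` elements is only an example to adapt to the target page's markup.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleCollector(HTMLParser):
    """Collect the text of every <span> whose class list contains `target_class`.

    Assumes matching spans are not nested inside each other.
    """

    def __init__(self, target_class: str = "title") -> None:
        super().__init__()
        self.target_class = target_class
        self.titles: list[str] = []
        self._inside = False  # True while inside a matching <span>

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            # Attributes without a value arrive as None, hence the `or ""`.
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._inside = True
                self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so append rather than assign.
        if self._inside:
            self.titles[-1] += data


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page, sending a browser-like User-Agent header."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_titles(html: str, target_class: str = "title") -> list[str]:
    """Parse HTML and return the stripped text of each matching <span>."""
    parser = TitleCollector(target_class)
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.titles]
```

A real script would call `extract_titles(fetch_html(url))` for each page and pause between requests.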

The collection covers more than a dozen separate projects, each targeting a specific task. Several examples focus on Chinese platforms such as Taobao and Tmall (major Chinese shopping sites) and Douban (a Chinese review site for films, books, and music). For these, the scripts use Selenium, a browser-automation tool that drives a real Chrome window, so the code can log in and navigate pages that would otherwise block automated access.
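A rough sketch of the Selenium approach, assuming Selenium 4 and a local Chrome install. The helper names, the `q` query parameter, the CSS selector you pass in, and the manual-login pause are illustrative assumptions rather than the repository's actual code; only the Selenium calls themselves (`webdriver.Chrome`, `WebDriverWait`, `expected_conditions`, `find_elements`) are real API.

```python
from typing import Optional
from urllib.parse import urlencode


def build_search_url(base_url: str, keyword: str) -> str:
    """Append a URL-encoded ?q=<keyword> query (the parameter name is an assumption)."""
    return f"{base_url}?{urlencode({'q': keyword})}"


def scrape_with_browser(url: str, item_selector: str,
                        login_url: Optional[str] = None,
                        timeout: float = 15.0) -> list[str]:
    """Open `url` in Chrome, wait for `item_selector`, and return each match's text."""
    # Imported lazily so build_search_url works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # needs Chrome and a matching ChromeDriver
    try:
        if login_url:
            # Let the user sign in by hand in the real browser window
            # (for example by scanning a QR code) before continuing.
            driver.get(login_url)
            input("Log in in the Chrome window, then press Enter...")
        driver.get(url)
        # Wait until at least one result element has been rendered.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, item_selector))
        )
        return [el.text for el in driver.find_elements(By.CSS_SELECTOR, item_selector)]
    finally:
        driver.quit()  # always close the browser, even after an error
```

For example, `scrape_with_browser(build_search_url(search_page, "usb hub"), ".item-title")` would open the page, wait for matching elements, and return their visible text; `search_page` and `.item-title` are placeholders. The explicit wait matters because pages like these often render results with JavaScript after the initial load.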

Other examples include downloading high-resolution wallpapers from a Mac wallpaper app, scraping movie rankings from Douban (a Chinese film review site), collecting mutual fund and stock data from a financial site using multiple threads and a pool of rotating IP addresses to avoid being blocked, building a personal report from your WeChat contact list, and producing a historical summary report for your QQ account. There is also a script that sends scheduled reminder messages to a WeChat contact at set times each day.
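The multithreaded, proxy-rotating fund crawler can be approximated with a thread pool and a shared round-robin proxy list. This is a hedged sketch: `ProxyPool`, `fetch_via_proxy`, `crawl_all`, and the retry policy are hypothetical, the repository may build and rotate its pool differently, and the proxy addresses must come from you.

```python
import itertools
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


class ProxyPool:
    """Hand out proxies round-robin; a lock makes it safe to share between threads."""

    def __init__(self, proxies: list[str]) -> None:
        if not proxies:
            raise ValueError("proxy list must not be empty")
        self._cycle = itertools.cycle(proxies)
        self._lock = threading.Lock()

    def get(self) -> str:
        with self._lock:
            return next(self._cycle)


def fetch_via_proxy(url: str, proxy: str, timeout: float = 10.0) -> str:
    """Fetch `url` through `proxy`, e.g. "http://1.2.3.4:8080"."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_all(urls: list[str], pool: ProxyPool,
              fetch: Callable[[str, str], str] = fetch_via_proxy,
              workers: int = 8, retries: int = 3) -> dict[str, Optional[str]]:
    """Fetch every URL concurrently, switching proxy after each failed attempt."""

    def task(url: str) -> tuple[str, Optional[str]]:
        for _ in range(retries):
            try:
                return url, fetch(url, pool.get())
            except OSError:
                # URLError, timeouts, and refused connections are all OSErrors;
                # move on to the next proxy in the pool.
                continue
        return url, None  # every attempt failed

    with ThreadPoolExecutor(max_workers=workers) as executor:
        return dict(executor.map(task, urls))
```

Passing `fetch` as a parameter keeps the retry and threading logic testable without network access, and the lock stops concurrent threads from advancing the shared iterator at the same time.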

Most examples follow the same setup pattern: install the Python packages listed in a requirements file, download a Chrome browser driver (ChromeDriver) that matches your installed Chrome version if the example uses Selenium, fill in your account credentials in the script, and then run a single Python file. Some projects include animated screenshots in the README showing the program in action.
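The README has you type credentials directly into each script. A common alternative, shown here only as a sketch and not as the repository's method, is to read them from environment variables so they never end up in version control. The `<PREFIX>_USERNAME` / `<PREFIX>_PASSWORD` variable names and the `load_credentials` helper are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Credentials:
    username: str
    password: str


def load_credentials(prefix: str = "CRAWLER") -> Credentials:
    """Read <PREFIX>_USERNAME and <PREFIX>_PASSWORD from the environment."""
    names = (f"{prefix}_USERNAME", f"{prefix}_PASSWORD")
    values = [os.environ.get(name) for name in names]
    # Report every missing variable at once instead of failing on the first.
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Credentials(values[0], values[1])
```

A script would then call `load_credentials()` at startup, after the variables have been exported in the shell.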

The README is written primarily in Chinese, but the code itself and the project structure are straightforward enough that the steps can be followed with the help of a translation tool. The project is licensed under the MIT license.

Where it fits

Scrape product listings or rankings from Chinese e-commerce sites like Taobao or Tmall automatically.
Collect stock and mutual fund data from financial sites using multiple threads and rotating IP addresses to avoid being blocked.
Generate a summary report from your WeChat contacts or QQ chat history with a single Python script.
Download high-resolution wallpapers automatically from a wallpaper app.
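For the WeChat contact report listed above, most of the work after downloading the contact list is counting. The sketch below assumes each contact is a dict with `Sex` (1 = male, 2 = female, other values unknown) and `Province` keys, modeled on the shape returned by the third-party itchat library; the field names, the function names, and whether the repository uses itchat at all are assumptions.

```python
from collections import Counter

# Assumed encoding of the "Sex" field; any other value counts as unknown.
SEX_LABELS = {1: "male", 2: "female"}


def contact_report(contacts: list[dict], top_n: int = 3) -> dict:
    """Summarize a contact list by gender and by the most common provinces."""
    genders = Counter(SEX_LABELS.get(c.get("Sex"), "unknown") for c in contacts)
    provinces = Counter(c.get("Province") or "unknown" for c in contacts)
    return {
        "total": len(contacts),
        "gender": dict(genders),
        "top_provinces": provinces.most_common(top_n),
    }


def format_report(report: dict) -> str:
    """Render the summary as plain text with percentage shares."""
    total = report["total"]
    lines = [f"Total contacts: {total}"]
    for label, count in sorted(report["gender"].items()):
        share = 100 * count / total if total else 0.0
        lines.append(f"  {label}: {count} ({share:.1f}%)")
    lines.append("Top provinces: " + ", ".join(
        f"{province} ({n})" for province, n in report["top_provinces"]))
    return "\n".join(lines)
```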
