gitmyhub

ECommerceCrawlers

Python ★ 5.6k updated 2y ago

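Hands-on 🐍 crawlers 🕷 for a variety of websites and e-commerce data. Includes 🕸: Taobao products, WeChat public accounts, Dianping, QiChaCha, job listing sites, Xianyu, Alibaba tasks, Cnblogs, Weibo, Baidu Tieba, Douban Movies, ibaotu, Quanjing, Douban Music, a provincial drug administration, Sohu News, machine-learning text collection, fofa asset collection, Autohome, the National Bureau of Statistics, Baidu keyword index counts, spider pan-directory, Toutiao, Douban film reviews, Ctrip, the Xiaomi app store, Anjuke, Tujia homestays ❤️❤️❤️. WeChat crawler showcase project: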

A Python collection of web scrapers targeting major Chinese websites including Taobao, Weibo, Dianping, and more, built from real client work and presented as learning examples for web scraping techniques.

Python · setup: moderate · complexity 3/5

ECommerceCrawlers is a Python collection of web scrapers targeting Chinese websites and platforms. Each scraper in the repository is a standalone project written by contributors to the group, and the README notes that about 80 percent were originally built for paying clients who agreed to open-source the work before it was added here. The collection is also presented as a learning resource for people studying web scraping techniques.

The repository covers more than twenty different targets, ranging from e-commerce platforms to social media, news sites, and business databases. Among the included scrapers are tools for Taobao (China's largest shopping platform), Dianping (a restaurant and business review site), and Xianyu (a second-hand goods marketplace). There are also scrapers for job listing sites, WeChat public accounts, Weibo (a large Chinese microblogging platform), Douban (a movie and music review site), and Baidu Tieba (a popular forum network). Travel booking data from Ctrip, business registration data from QiChaCha, and property listings from Anjuke and Tujia are also covered.

Each sub-project in the collection comes with its own readme explaining how the scraping process works for that particular site. The README describes the collection as practical examples that help someone new to crawling understand common problems and solutions, built around real targets rather than toy exercises.

The project is maintained on both GitHub and a Chinese code hosting platform called Gitee. The README is written in Chinese, and the project is clearly aimed at a Chinese-speaking audience familiar with these platforms.

Where it fits