Anti-Anti-Spider
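More and more websites have anti-crawler features: some hide key data inside images, and others use CAPTCHAs that are hostile to human users. This repository collects anti-anti-crawler code and aims to improve technique by taking on websites with different defenses (with no malicious intent). (Submissions of hard-to-scrape websites are welcome.) (The project is paused for work reasons.)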
A Python learning project that trains a convolutional neural network to recognize and solve CAPTCHA images, achieving about 95.5% accuracy using AlexNet or LeNet architectures on TensorFlow 1.9.
This repository contains a Python library for automatically reading and solving CAPTCHA images with a convolutional neural network (CNN). CAPTCHAs are the distorted text or image puzzles that websites use to check whether a visitor is a human rather than an automated program. The project was built as a learning resource around one specific technical challenge: training a model to recognize those images.
The README is written in Chinese. It describes using two CNN model architectures called AlexNet and LeNet, and reports around 95.5% accuracy on CAPTCHA recognition. The training process runs on TensorFlow 1.9.0, either on a standard CPU or with an NVIDIA GPU for faster training. AlexNet requires images resized to 227 by 227 pixels, and the repository includes a preprocessing script to handle that resizing.
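The 227 by 227 input requirement can be illustrated with a dependency-free sketch. The function below is a hypothetical nearest-neighbour resize over a plain grid of pixel values. It is not the repository's preprocessing script, and the name `resize_nearest` and its parameters are assumptions made purely for illustration. In practice an image library would normally do this work.

```python
def resize_nearest(pixels, new_width=227, new_height=227):
    """Resize a 2D grid of pixel values with nearest-neighbour sampling.

    `pixels` is a non-empty list of equal-length rows (hypothetical input
    format; a real pipeline would load image files instead).
    """
    old_height = len(pixels)
    old_width = len(pixels[0])
    resized = []
    for y in range(new_height):
        # Map each target row back to a source row (floor mapping).
        src_y = y * old_height // new_height
        # Do the same for every column in that row.
        row = [pixels[src_y][x * old_width // new_width] for x in range(new_width)]
        resized.append(row)
    return resized


# A tiny 2x2 "image" scaled up to the AlexNet input size.
sample = [[0, 255], [255, 0]]
scaled = resize_nearest(sample)
```

Nearest-neighbour sampling is the simplest possible choice and is used here only to keep the sketch short; smoother interpolation is usually preferable for real CAPTCHA images.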
The workflow is: collect and label CAPTCHA images, split them into training and validation sets, place them in the sample directory, run the training script, and then use the recognition script to test how well the trained model performs. Configuration is handled through a JSON file.
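The splitting step of that workflow might look roughly like the sketch below. It assumes a list of labeled image file names and a small JSON configuration. The keys `validation_ratio` and `seed`, the function `split_samples`, and the file names are all hypothetical and are not taken from the repository's actual configuration file or scripts.

```python
import json
import random

# Hypothetical configuration; the real JSON file may use different keys.
CONFIG_TEXT = '{"validation_ratio": 0.2, "seed": 0}'


def split_samples(filenames, validation_ratio, seed):
    """Shuffle file names and split them into (training, validation) lists."""
    shuffled = sorted(filenames)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_ratio)
    return shuffled[n_validation:], shuffled[:n_validation]


config = json.loads(CONFIG_TEXT)

# Ten made-up file names standing in for labeled CAPTCHA images.
files = [f"captcha_{i:04d}.png" for i in range(10)]
train, valid = split_samples(files, config["validation_ratio"], config["seed"])
```

Fixing the seed keeps the split identical between runs, which makes accuracy figures from different training runs easier to compare.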
The author notes the project is now paused due to other commitments, and emphasizes it is intended for learning about image recognition and CNNs only, not for malicious use. The project folder also includes older content in an Anti-Anti-Spider subdirectory covering related web-scraping techniques.
The project description and topics are also in Chinese, and no English version of the README is provided. Pre-trained model files can be downloaded from a link in the README, so training from scratch is optional.
Where it fits
- Learn how convolutional neural networks work in practice by training one on a concrete image-recognition task like CAPTCHA solving.
- Compare AlexNet and LeNet architectures on a small image dataset and observe the accuracy and speed trade-offs firsthand.
- Download the pre-trained model to run CAPTCHA recognition in a research or learning context without training from scratch.