Scrapinghub command line client
===============================

.. image:: https://img.shields.io/pypi/v/shub.svg
   :target: https://pypi.python.org/pypi/shub
   :alt: PyPI Version

.. image:: https://img.shields.io/pypi/pyversions/shub.svg
   :target: https://pypi.python.org/pypi/shub
   :alt: Python Versions

.. image:: https://github.com/scrapinghub/shub/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/scrapinghub/shub/actions/workflows/tests.yml
   :alt: Tests

.. image:: https://img.shields.io/codecov/c/github/scrapinghub/shub/master.svg
   :target: https://codecov.io/github/scrapinghub/shub?branch=master
   :alt: Coverage report

``shub`` is the Scrapinghub command line client. It allows you to deploy
projects or dependencies, schedule spiders, and retrieve scraped data or logs
without leaving the command line.

Requirements
------------
- Python >= 3.10

Installation
------------

If you have ``pip`` installed on your system, you can install ``shub`` from
the Python Package Index::

    pip install shub

Please note:

- if you are using Python < 3.6, you should pin ``shub`` to ``2.13.0`` or lower.
- if you are using Python < 3.9, you should pin ``shub`` to ``2.15.4`` or lower.
- if you are using Python < 3.10, you should pin ``shub`` to ``2.16.0`` or lower.
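
The pinning notes above can be turned into a small helper that picks a pip requirement string for the running interpreter. This is only a minimal sketch: the names ``PINS`` and ``shub_requirement`` are hypothetical and are not part of ``shub`` itself; the version numbers are the ones listed above.

```python
import sys

# Hypothetical table built from the notes above:
# (Python versions below this need a pin, last shub release to pin to).
PINS = [
    ((3, 6), "2.13.0"),
    ((3, 9), "2.15.4"),
    ((3, 10), "2.16.0"),
]


def shub_requirement(python_version=None):
    """Return a pip requirement string for shub on the given Python version.

    python_version is a (major, minor) tuple; it defaults to the
    interpreter that is running this script.
    """
    version = tuple(python_version or sys.version_info[:2])
    for below, last_supported in PINS:
        if version < below:
            # Older interpreter: pin to the last release listed for it.
            return "shub<=" + last_supported
    # Python >= 3.10: no pin needed.
    return "shub"


if __name__ == "__main__":
    print(shub_requirement())
```

If the script were saved under a name of your choice, its output could be passed straight to ``pip install``.
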

See also the `latest Github release`_.

.. _latest Github release: https://github.com/scrapinghub/shub/releases/latest

Documentation
-------------

Documentation is available online via Read the Docs:
https://shub.readthedocs.io/, or in the ``docs`` directory.