
pandas-cookbook

Jupyter Notebook ★ 7.1k updated 1y ago

Recipes for using Python's pandas library

Nine interactive Jupyter notebooks teaching pandas data analysis using real-world datasets, from reading a CSV file to grouping, cleaning messy data, and working with dates and SQL.

Python · pandas · Jupyter Notebook · setup: easy · complexity 2/5

Pandas is a Python library for working with structured data like spreadsheets and CSV files. It is widely used in data analysis because it makes it fast to filter, sort, group, and combine large datasets. This cookbook is a collection of worked examples intended to help beginners get started with pandas using real datasets rather than toy examples.
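As a rough illustration of the four operations named above (filter, sort, group, combine), here is a minimal sketch. The inline data, the column names, and the population figures are made up for this example and do not come from the cookbook; only the pandas calls themselves are real.

```python
import io

import pandas as pd

# Hypothetical inline data standing in for a CSV file on disk.
csv_text = """borough,complaint,n_calls
Brooklyn,Noise,12
Queens,Heating,7
Brooklyn,Heating,5
Queens,Noise,3
"""
calls = pd.read_csv(io.StringIO(csv_text))

# Filter: keep only the rows where the complaint is "Noise".
noise = calls[calls["complaint"] == "Noise"]

# Sort: order rows from most to fewest calls.
by_count = calls.sort_values("n_calls", ascending=False)

# Group: total calls per borough.
totals = calls.groupby("borough")["n_calls"].sum()

# Combine: attach a second (also made-up) table by matching on "borough".
populations = pd.DataFrame(
    {"borough": ["Brooklyn", "Queens"], "population_millions": [2.6, 2.3]}
)
merged = calls.merge(populations, on="borough", how="left")

print(totals)
```

Each step returns a new object and leaves `calls` unchanged, which is why the intermediate results can be inspected one at a time in a notebook cell.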

The cookbook is organized as nine chapters, each in its own Jupyter Notebook file. Jupyter Notebooks are interactive documents where code and explanatory text are combined, so you can run each example step by step in your browser or on your own machine. The chapters start with the basics, like reading a CSV file and selecting rows or columns, and progress through more involved tasks: grouping data to find patterns, combining multiple datasets, extracting information from text, cleaning up messy data, working with dates and timestamps, and loading data from a SQL database.
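Some of the later topics in that list (text columns, messy values, timestamps) can be sketched in a few lines. This is not code from the cookbook: the inline weather readings, the column names, and the "--" placeholder for missing values are assumptions chosen for illustration.

```python
import io

import pandas as pd

# Hypothetical hourly readings; "--" marks a missing temperature.
csv_text = """timestamp,weather,temp_c
2012-01-01 00:00,Fog,-1.8
2012-01-01 01:00,Snow,-1.5
2012-01-02 00:00,Clear,--
2012-01-02 01:00,Snow Showers,-3.0
"""

weather = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["timestamp"],  # dates: parse the column as timestamps
    index_col="timestamp",      # and use it as the row index
    na_values=["--"],           # messy data: treat "--" as missing
)

# Text: flag rows whose description mentions snow.
is_snowy = weather["weather"].str.contains("Snow")

# Dates: average the hourly temperatures per calendar day.
# Missing readings are skipped by mean() by default.
daily_mean = weather["temp_c"].resample("D").mean()

print(is_snowy.sum(), "snowy readings")
print(daily_mean)
```

Declaring the missing-value marker at load time keeps `temp_c` numeric, so the daily average works without a separate cleanup pass.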

All three real-world datasets used in the cookbook are included in the repository, so you can run every example immediately without hunting for data. The datasets are calls to New York City's 311 service line, bicycle path counts in Montreal, and hourly Montreal weather data for 2012.

You can try the cookbook in your browser via JupyterLite without installing anything. To run it locally, clone the repository, install the dependencies with pip, and start Jupyter. The README also describes a Docker option for those who prefer containers.

The cookbook was written by Julia Evans, who notes in the README that the official pandas documentation is thorough but that many people find it hard to get started without concrete examples that show real-world messiness. The license is Creative Commons Attribution-ShareAlike 4.0. A Chinese translation of the repository exists separately.

Where it fits