data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
A collection of Jupyter notebooks with working code examples covering data science and machine learning topics like deep learning, scikit-learn, pandas, and big data processing.
This repository is a large collection of Jupyter notebooks covering a wide range of data science topics. A Jupyter notebook is an interactive document that combines written explanation with runnable Python code. The collection gives learners and practitioners a single organized reference for the most common tools and techniques used in data science and machine learning.
The notebooks are organized by topic. There are sections on deep learning using TensorFlow, Theano, Keras, and Caffe; on scikit-learn for traditional machine learning tasks like classification and regression; on pandas and NumPy for manipulating data; on matplotlib for creating charts; on Spark and Hadoop MapReduce for processing very large datasets that don't fit on a single machine; on working with Amazon Web Services; and on Python fundamentals. There are also notebooks on Kaggle, a platform that hosts data science competitions.
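To give a feel for the pandas and scikit-learn topics listed above, here is a minimal sketch of the kind of workflow such notebooks typically walk through. It is not taken from the repository: the choice of the bundled iris dataset, the logistic regression model, and the split parameters are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris dataset that ships with scikit-learn as a pandas DataFrame
# (illustrative choice; no download is needed).
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus a "target" column

# pandas: summarize each feature by class.
class_means = df.groupby("target").mean()
print(class_means)

# scikit-learn: hold out a quarter of the rows, then fit a classifier.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Score the model on the rows it has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Fixing `random_state` keeps the split reproducible, which makes it easier to compare results after changing the model or its parameters.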
Each notebook walks through a concept with working code examples, making it easy to see both the explanation and the actual output side by side. You can open any notebook, run the code, and experiment with it directly.
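A notebook file is stored as JSON, with explanation and code held in separate cells. The sketch below uses only the standard library and a hand-written, hypothetical two-cell notebook (not one from this repository) to show how the two kinds of cell sit next to each other. The helper name `summarize_cells` is invented for this example.

```python
import json

# A hypothetical two-cell notebook in the nbformat 4 layout,
# written by hand for illustration.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Mean of a list\n", "Explanation goes here."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["values = [1, 2, 3]\n", "sum(values) / len(values)"],
        },
    ],
})


def summarize_cells(raw):
    """Return (cell_type, source_text) pairs for each cell in a notebook."""
    notebook = json.loads(raw)
    # This sketch assumes "source" is a list of lines, as in the sample above.
    return [
        (cell["cell_type"], "".join(cell["source"]))
        for cell in notebook["cells"]
    ]


cells = summarize_cells(notebook_json)
for cell_type, source in cells:
    print(f"[{cell_type}]")
    print(source)
```

In practice you would open the file in Jupyter and run the cells there; reading the JSON directly is mainly useful for inspecting or searching many notebooks at once.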
Use this repository when you are learning data science or machine learning in Python, or when you want a quick working example of a particular library or technique without starting from scratch.
Where it fits
- Learn data science fundamentals by running interactive notebooks with explanations and code side by side.
- Find working examples of how to use libraries like pandas, scikit-learn, or TensorFlow without building from scratch.
- Explore deep learning, traditional machine learning, and big data processing techniques with executable code.
- Reference common data manipulation and visualization patterns when building your own data science projects.