gitmyhub

reinforcement-learning

Jupyter Notebook · ★ 22k · updated 3y ago

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, TensorFlow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

A hands-on learning resource with Python code examples and exercises for reinforcement learning, aligned with the Sutton-Barto textbook and David Silver's lectures.

Python 3 · Jupyter Notebook · OpenAI Gym · TensorFlow · setup: moderate · complexity 3/5

This repository is a learning resource for reinforcement learning — a branch of artificial intelligence where a software agent learns to make decisions by trial and error, receiving rewards for good actions and penalties for bad ones. Think of it like training a dog with treats, but applied to algorithms.
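The trial-and-error idea can be illustrated with a small sketch. This is not code from the repository: it is a hypothetical three-action "bandit" problem in which the agent mostly repeats the action with the best average reward so far and occasionally tries a random one (an epsilon-greedy rule). The function name `run_bandit`, the reward means, and the noise level are all assumptions chosen for illustration.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit with noisy (Gaussian) rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward seen for each action
    counts = [0] * n_arms        # how often each action has been tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick the action that currently looks best
            action = max(range(n_arms), key=lambda a: estimates[a])
        # Reward (or penalty, if negative) drawn around the action's true mean
        reward = rng.gauss(true_means[action], 0.5)
        counts[action] += 1
        # Incremental update of the running average
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical task: action 1 is the best, action 2 is penalised on average
estimates, counts = run_bandit([0.2, 1.0, -0.5])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

After enough steps the estimate for the highest-paying action should dominate, so the agent ends up choosing it most of the time without ever being told which action was correct.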

The code is designed to accompany two specific learning materials: the textbook "Reinforcement Learning: An Introduction" (2nd edition) by Sutton and Barto, and David Silver's university lecture course on reinforcement learning. Each folder in the repo corresponds to a chapter or topic from those materials, and contains exercises, worked solutions, a summary of the key concepts, and links to further reading.

The implemented algorithms progress from foundational to more advanced techniques: dynamic programming (planning when you have a complete model of the environment), Monte Carlo methods (learning from complete episodes of experience), temporal difference learning (learning step by step without waiting for an episode to end), Q-Learning (an off-policy temporal difference method that learns the value of the greedy policy while following a more exploratory one), and Deep Q-Learning (Q-Learning with a neural network as the value function, which scales to large state spaces such as Atari games). Policy gradient methods and an actor-critic algorithm are also included.
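To make the Q-Learning step concrete, here is a minimal tabular sketch. It is not taken from the repository and does not use OpenAI Gym: the five-state chain environment, the `step` helper, and the hyperparameters are all hypothetical, chosen so the example runs with the standard library alone. The agent starts at state 0 and receives a reward of 1.0 only when it reaches the last state.

```python
import random

N_STATES = 5       # states 0..4; state 4 is terminal (hypothetical toy task)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic chain: reward 1.0 only on reaching the last state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behaviour policy: epsilon-greedy, choosing randomly on ties
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action,
            # regardless of which action the agent will actually take next
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

The update happens after every single step rather than at the end of an episode, which is the temporal difference idea; taking the maximum over next actions in the target is what makes the method off-policy. In this toy task the learned greedy policy should move right in every non-terminal state.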

Everything is written in Python 3 using Jupyter Notebooks — interactive documents that mix code, explanations, and output — and uses OpenAI Gym for training environments and TensorFlow for the neural network-based algorithms.

You would use this repo if you are studying reinforcement learning and want hands-on code alongside the theory.

Where it fits