gitmyhub

awesome-project-ideas

★ 9.2k, updated 3y ago

Curated list of Machine Learning, NLP, Vision, Recommender Systems Project Ideas

A curated list of over 30 machine learning and deep learning project ideas for learners and practitioners, each paired with links to public datasets you can download and start training on.

setup: easy, complexity: 1/5

This repository is a curated list of project ideas for people learning or practicing machine learning and deep learning. It is not a library or a tool. It is a collection of prompts describing what you could build, along with pointers to public datasets you could use to build it. The list covers more than 30 ideas, ranging from beginner-friendly tasks to research-level problems.

The ideas are organized into several categories. The text and natural language section includes things like automatically tagging forum questions, detecting abusive comments, answering questions from a document, summarizing long articles, and detecting whether two questions mean the same thing. Each entry names a relevant public dataset you can download and train on.
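To give a sense of how small a first attempt at one of these ideas can be, here is a minimal baseline sketch for the duplicate-question idea. It is not taken from the repository (which contains no code). The function names, the word-overlap (Jaccard) approach, and the 0.5 threshold are all assumptions chosen for illustration, and a real project would train a model on the dataset the list points to.

```python
import re


def tokenize(question):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard_similarity(a, b):
    """Return |A ∩ B| / |A ∪ B| over the word sets of the two questions."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty questions are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_probable_duplicate(a, b, threshold=0.5):
    """Flag a pair as a likely duplicate; the threshold is an arbitrary assumption."""
    return jaccard_similarity(a, b) >= threshold


if __name__ == "__main__":
    print(is_probable_duplicate("How can I learn Python?", "How do I learn Python?"))
    print(is_probable_duplicate("What is the capital of France?", "How do I learn Python?"))
```

Word overlap misses paraphrases that share no vocabulary, which is why the task is usually treated as a learning problem. A baseline like this is mainly useful as a score to beat.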

A forecasting section covers prediction tasks such as rainfall, air quality, electricity demand, and blood donation likelihood. A recommendation systems section includes a movie recommender built from ratings data and a book recommender. A vision section covers tasks like identifying plant diseases from photos, detecting objects in satellite imagery, and recognizing lip movements from video.
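For the movie recommender idea, one possible starting point is item-based collaborative filtering. The sketch below is not from the repository. The ratings are made-up toy values, and the helper names and the cosine-similarity scoring are assumptions for illustration only; a real project would load the public ratings dataset the list links to.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}), invented for illustration.
RATINGS = {
    "ann": {"Alien": 5, "Blade Runner": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Blade Runner": 5},
    "cat": {"Casablanca": 5, "Amelie": 4},
}


def item_vectors(ratings):
    """Invert user -> movie ratings into movie -> {user: rating} vectors."""
    vectors = {}
    for user, movies in ratings.items():
        for movie, score in movies.items():
            vectors.setdefault(movie, {})[user] = score
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in set(u) & set(v))
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, top_n=3):
    """Rank unseen movies by similarity to the user's rated movies, weighted by rating."""
    vectors = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in vectors.items():
        if candidate in seen:
            continue
        score = sum(cosine(vector, vectors[m]) * r for m, r in seen.items())
        if score > 0:  # drop movies with no overlap with anything the user rated
            scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("bob", RATINGS))
```

With so little data the ranking is not meaningful. The point is only the shape of the computation: build item vectors, compare them, and score unseen items against what the user has already rated.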

A hackathon ideas section, added more recently, focuses on projects that make use of large language models, such as a command-line tool that takes plain-English instructions and converts them to shell commands, knowledge base question answering, text-to-SQL, guided summarization, and text-to-music generation. These entries mention specific open-source tools and models that could serve as starting points.

The README also includes a short music and audio section covering tasks like genre classification and automatic playlist generation. No code is included in the repository itself; the value is in the idea descriptions and dataset links.

Where it fits