
researchpooler

Python ★ 460 updated 2y ago

Automating research publication discovery and analysis. For example, ever wish your computer could automatically open the papers most similar to a paper at an arbitrary URL? How about finding all papers that report results on some dataset? Let's re-imagine literature review.

What This Project Does

ResearchPooler is a tool that tries to make literature review easier by automating the tedious parts. Instead of manually searching academic databases, clicking through links, and reading abstracts one by one, this project lets you write simple queries to find, filter, and open papers in bulk. For example, you could find every paper mentioning "deep learning" in the title and open them all at once in your browser, or search across the full text of papers for mentions of a specific dataset like MNIST.

The core idea is to build a searchable database of research papers where you can ask practical questions: "Show me all papers similar to this one," or "Find every paper that uses this dataset," or "List all papers by this author." Today this feels obvious, but when this project was created, academic search tools were much more rigid and limited.
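As a rough illustration of the "papers similar to this one" question, the sketch below ranks records by bag-of-words cosine similarity. This is only one simple way to score similarity and is not confirmed to be the project's method; the record fields (`title`, `text`), the helper names, and the sample data are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample records; the real project may use different fields.
papers = [
    {"title": "Paper A", "text": "convolutional networks improve image classification accuracy"},
    {"title": "Paper B", "text": "bayesian inference for topic models of text"},
    {"title": "Paper C", "text": "image segmentation with graph cuts"},
]

def word_counts(text):
    """Lowercase the text and count alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar(query_text, records, top_n=3):
    """Return (score, title) pairs, highest score first."""
    query = word_counts(query_text)
    scored = [(cosine(query, word_counts(r["text"])), r["title"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]

ranked = most_similar("convolutional networks for image classification", papers)
ranked_titles = [title for _, title in ranked]
```

A practical version would likely weight rare words more heavily (for example with TF-IDF), but the ranking idea is the same.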

At a high level, the project organizes papers into simple data records. Think of it as a spreadsheet where each row is a paper with fields like title, authors, year, venue, and ideally the full text extracted from the PDF. Once that data is collected and organized, you can write quick Python scripts to search and filter it. The author provides working examples: three lines of code to find and open all NIPS papers mentioning "deep" in the title, or to search across full paper text for dataset names.
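The kind of query described above might look like the following sketch. The record layout, field names (`title`, `venue`, `pdf_url`, `text`), and helper functions are assumptions made for illustration, not the project's actual schema or API.

```python
import webbrowser

# Hypothetical records: one dictionary per paper.
papers = [
    {"title": "Deep Belief Networks for Digit Recognition", "authors": ["A. Author"],
     "year": 2009, "venue": "NIPS", "pdf_url": "http://example.com/a.pdf",
     "text": "We report results on MNIST and compare against baselines."},
    {"title": "Sparse Coding Revisited", "authors": ["B. Author"],
     "year": 2010, "venue": "NIPS", "pdf_url": "http://example.com/b.pdf",
     "text": "Experiments use natural image patches."},
    {"title": "Learning Deep Hierarchies", "authors": ["C. Author"],
     "year": 2010, "venue": "NIPS", "pdf_url": "http://example.com/c.pdf",
     "text": "Evaluation is performed on CIFAR-10."},
]

def title_contains(records, keyword):
    """Case-insensitive substring match on the title field."""
    needle = keyword.lower()
    return [r for r in records if needle in r["title"].lower()]

def text_mentions(records, term):
    """Case-insensitive substring match on the full text, if present."""
    needle = term.lower()
    return [r for r in records if needle in r.get("text", "").lower()]

def open_all(records):
    """Open each paper's PDF link in the default browser."""
    for r in records:
        webbrowser.open(r["pdf_url"])

deep_titles = [r["title"] for r in title_contains(papers, "deep")]
mnist_titles = [r["title"] for r in text_mentions(papers, "MNIST")]
# open_all(title_contains(papers, "deep"))  # uncomment to open the matches
```

The `open_all` call is left commented out so that running the sketch does not launch browser tabs.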

The project is deliberately kept simple and flat. Rather than building a complex database system right away, it stores papers as a pickled list of dictionaries: an ordinary Python list serialized to disk with the standard pickle module. This makes it easy for anyone to add new conferences or venues by writing a simple parser script that downloads papers from that conference's website and extracts basic metadata. The author explicitly invites contributors to build parsers for other venues like ICML or CVPR, following the same pattern.
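A minimal sketch of that pattern, parsing a venue listing into dictionaries and pickling the list, could look like this. The HTML snippet, the regular expression, the field names, and the function names are all hypothetical; a real venue page will have different markup and need its own parser.

```python
import os
import pickle
import re
import tempfile

# Hypothetical listing markup standing in for a downloaded conference page.
SAMPLE_LISTING = """
<li><a href="/paper/1.pdf">Deep Networks for Vision</a> <i>A. Author, B. Author</i></li>
<li><a href="/paper/2.pdf">Sparse Coding Revisited</a> <i>C. Author</i></li>
"""

ENTRY = re.compile(
    r'<li><a href="(?P<url>[^"]+)">(?P<title>[^<]+)</a>\s*<i>(?P<authors>[^<]+)</i></li>'
)

def parse_listing(html, venue, year):
    """Turn each listing entry into one paper dictionary."""
    records = []
    for match in ENTRY.finditer(html):
        records.append({
            "title": match.group("title").strip(),
            "authors": [a.strip() for a in match.group("authors").split(",")],
            "year": year,
            "venue": venue,
            "pdf_url": match.group("url"),
        })
    return records

def save_papers(records, path):
    """Serialize the list of dictionaries with pickle."""
    with open(path, "wb") as f:
        pickle.dump(records, f)

def load_papers(path):
    """Load a previously pickled list of dictionaries."""
    with open(path, "rb") as f:
        return pickle.load(f)

papers = parse_listing(SAMPLE_LISTING, venue="ICML", year=2011)
path = os.path.join(tempfile.mkdtemp(), "papers.pkl")
save_papers(papers, path)
restored = load_papers(path)
```

Because unpickling can execute arbitrary code, a pickle file like this should only be loaded when it comes from a trusted source.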

This project would appeal to graduate students, researchers, and academics who spend hours hunting for related work, and to anyone building a literature review tool. The real value is not in any single query but in the ability to batch-process and combine searches in ways that would take much longer by hand. The README notes that this was built quickly as a proof of concept, so it is rough around the edges, but the core concept is useful.