OLMo ★ PINNED
Modeling, training, eval, and inference code for OLMo
Python ★ 6.6k 6mo ago
dolma ★ PINNED
Data and tools for generating and inspecting OLMo pre-training data.
Python ★ 1.5k 7mo ago
ai2thor ★ PINNED
An open-source platform for Visual AI.
C# ★ 1.7k 7mo ago
olmocr ★ PINNED
Toolkit for linearizing PDFs for LLM datasets/training
Python ★ 17k 2mo ago
OLMoE ★ PINNED
OLMoE: Open Mixture-of-Experts Language Models
Jupyter Notebook ★ 1.0k 9mo ago
allennlp ▣
An open-source NLP research library, built on PyTorch.
Python ★ 12k 3y ago
open-instruct
AllenAI's post-training codebase
Python ★ 3.8k 20h ago
RL4LMs
A modular RL library to fine-tune language models to human preferences
Python ★ 2.4k 2y ago
longformer
Longformer: The Long-Document Transformer
Python ★ 2.2k 3y ago
scispacy
A full spaCy pipeline and models for scientific/biomedical documents.
Python ★ 2.0k 6mo ago
scibert
A BERT model for scientific text.
Python ★ 1.7k 4y ago
bilm-tf
TensorFlow implementation of contextualized word representations from bi-directional language models
Python ★ 1.6k 3y ago
bi-att-flow
Bi-directional Attention Flow (BiDAF) network is a multi-stage hierarchical process that represents context at different levels of granularity and uses a bi-directional attention flow mechanism to achieve a query-aware context representation without early summarization.
Python ★ 1.5k 3y ago
OLMo-core
PyTorch building blocks for the OLMo ecosystem
Python ★ 1.3k 3h ago
objaverse-xl
🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!
Python ★ 1.3k 1y ago
s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
Python ★ 1.1k 2y ago
natural-instructions
Expanding natural instructions
Python ★ 1.0k 2y ago
mmc4
MultimodalC4 is a multimodal extension of c4 that interleaves millions of images with text.
Python ★ 953 1y ago
molmo
Code for the Molmo Vision-Language Model
Python ★ 913 1y ago
XNOR-Net
ImageNet classification using binary Convolutional Neural Networks
Lua ★ 870 8y ago
papermage
library supporting NLP and CV research on scientific papers
Python ★ 797 1y ago
visprog
Official code for VisProg (CVPR 2023 Best Paper!)
Python ★ 773 1y ago
scitldr
No description.
Python ★ 759 3y ago
pdffigures2
Given a scholarly PDF, extract figures, tables, captions, and section titles.
Scala ★ 748 2y ago
reward-bench
RewardBench: the first evaluation tool for reward models.
Python ★ 721 4mo ago
science-parse
Science Parse parses scientific papers (in PDF form) and returns them in structured form.
Java ★ 700 2y ago
molmo2
Code for the Molmo2 Vision-Language Model
Python ★ 660 3mo ago
unified-io-2
No description.
Python ★ 648 2y ago
molmoact2
Official Repository for MolmoAct2
Python ★ 619 4d ago
specter
SPECTER: Document-level Representation Learning using Citation-informed Transformers
Python ★ 583 3y ago
WildDet3D
Allen Institute for AI: WildDet3D: Scaling Promptable 3D Detection in the Wild
Python ★ 582 19d ago
molmoweb
No description.
Python ★ 574 11d ago
tango
Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.
Python ★ 571 2y ago
allennlp-models ▣
Officially supported AllenNLP models
Python ★ 563 3y ago
Holodeck
CVPR 2024: Language Guided Generation of 3D Embodied AI Environments.
Python ★ 553 1y ago
dont-stop-pretraining
Code associated with the Don't Stop Pretraining ACL 2020 paper
Python ★ 543 4y ago
python-package-template
A template repo for Python packages
Python ★ 535 1y ago
OLMoASR
An open-source implementation of Whisper
Python ★ 491 7mo ago
lumos
Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"
Python ★ 478 2y ago
s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
Python ★ 469 2y ago
procthor
🏘️ Scaling Embodied AI by Procedurally Generating Interactive 3D Houses
Python ★ 442 3y ago
pawls
Software that makes labeling PDFs easy.
Python ★ 430 2y ago
scholarphi
An interactive PDF reader.
Python ★ 428 2y ago
deep_qa ▣
A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Python ★ 403 8y ago
ir_datasets
Provides a common interface to many IR ranking datasets.
Python ★ 390 23d ago
allenact
An open source framework for research in Embodied-AI from AI2.
Python ★ 382 1mo ago
olmes
Reproducible, flexible LLM evaluations
Python ★ 380 2mo ago
vla-evaluation-harness
One framework to evaluate any VLA model on any robot simulation benchmark.
Python ★ 374 2d ago
molmoact
Official Repository for MolmoAct
Python ★ 369 1mo ago
molmospaces
An end-to-end open ecosystem for robot learning
Python ★ 366 2d ago
ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
Scala ★ 365 6mo ago
awesome-open-source-lms
Friends of OLMo and their links.
★ 364 9mo ago
codescientist
CodeScientist: An automated scientific discovery system for code-based experiments
Python ★ 346 2mo ago
satlas-super-resolution
No description.
Python ★ 340 2mo ago
openie-standalone
Quality information extraction at web scale.
Scala ★ 333 9y ago
OLMoE.swift
No description.
Swift ★ 310 1y ago
ai2-scholarqa-lib
Repo housing the open sourced code for the ai2 scholar qa app and also the corresponding library
Python ★ 281 3mo ago
satlas
No description.
Python ★ 279 2mo ago
s2-folks ▣
Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.
★ 276 1y ago
scifact
Data and models for the SciFact verification task.
Python ★ 265 2y ago
objaverse-rendering
📷 Scripts for rendering Objaverse
Python ★ 265 2y ago
comet-atomic-2020
No description.
Python ★ 262 1y ago
WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
Python ★ 254 1y ago
olmoearth_pretrain
Earth system foundation model data, training, and eval
Python ★ 251 1d ago
asta-paper-finder
frozen-in-time version of our Paper Finder agent for reproducing evaluation results
Python ★ 244 3mo ago
sera-cli
A tool to use the Ai2 Open Coding Agents Soft-Verified Efficient Repository Agents (SERA) model with Claude Code
Python ★ 240 3mo ago
real-toxicity-prompts
No description.
Jupyter Notebook ★ 233 5y ago
wimbd
What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets
Python ★ 228 1y ago
cartography
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
Jupyter Notebook ★ 219 3y ago
discoveryworld
A virtual environment for developing and evaluating automated scientific discovery agents.
Python ★ 216 1y ago
hidden-networks
No description.
Python ★ 198 3mo ago
cord19
Get started with CORD-19
★ 187 1y ago
autodiscovery-neurips
Official code for NeurIPS 2025 paper "AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise"
Python ★ 186 2mo ago
peS2o
Pretraining Efficiently on S2ORC!
Python ★ 186 1y ago
medicat
Dataset of medical images, captions, subfigure-subcaption annotations, and inline textual references
Python ★ 176 4mo ago
asta-theorizer
Staging area for a public release of Theorizer
HTML ★ 169 1mo ago
mmda
multimodal document analysis
Jupyter Notebook ★ 166 1mo ago
pixmo-docs
ACL 2025: Synthetic data generation pipelines for text-rich images.
Python ★ 163 1y ago
multimodalqa
No description.
Python ★ 156 3y ago
spoc-robot-training
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Python ★ 154 1y ago
IFBench
No description.
Python ★ 151 1mo ago
FlexOlmo
Code and training scripts for FlexOlmo
Python ★ 150 2mo ago
discoverybench
Discovering Data-driven Hypotheses in the Wild
Python ★ 148 1y ago
scidocs
Dataset accompanying the SPECTER model
Python ★ 148 3y ago
SERA
Data generation and training repository for SERA: Soft-Verified Efficient Repository Agents.
Python ★ 146 26d ago
satlaspretrain_models
No description.
Jupyter Notebook ★ 144 1y ago
agent-baselines
No description.
Python ★ 143 12d ago
SPECTER2
No description.
Python ★ 137 3mo ago
bolmo-core
Code for Bolmo: Byteifying the Next Generation of Language Models
Python ★ 136 4d ago
scicite
Repository for NAACL 2019 paper on Citation Intent prediction
Python ★ 130 6y ago
wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Python ★ 126 1y ago
procthor-10k
The ProcTHOR-10K Houses Dataset
Python ★ 123 3y ago
aokvqa
Official repository for the A-OKVQA dataset
Python ★ 116 2y ago
s2search
The Semantic Scholar Search Reranker
Python ★ 112 5y ago
asta-bench
No description.
Python ★ 111 3d ago
S2AND
Semantic Scholar's Author Disambiguation Algorithm & Evaluation Suite
Python ★ 109 3d ago
PoliFormer
PoliFormer: Scaling On-Policy RL with Transformers Results in Masterful Navigators
Python ★ 106 1y ago
infinigram-api
No description.
Python ★ 101 8d ago
safety-eval
A simple evaluation of generative language models and safety classifiers.
Python ★ 100 4d ago
DecomP
Repository for Decomposed Prompting
Python ★ 99 2y ago
robothor-challenge
RoboTHOR Challenge
Python ★ 99 5y ago
pdf-component-library
No description.
TypeScript ★ 92 2y ago
reclip
No description.
Python ★ 92 4y ago
scirepeval
SciRepEval benchmark training and evaluation scripts
Python ★ 91 1mo ago
rslearn
A tool for developing remote sensing datasets and models.
Python ★ 90 2d ago
MolmoBot
Code and website for "MolmoB0T: Large-Scale Simulation Enables Zero-Shot Manipulation".
Python ★ 90 16d ago
Break
No description.
HTML ★ 89 3y ago
clin
No description.
JavaScript ★ 89 2y ago
duplodocus
Tooling for exact and MinHash deduplication of large-scale text datasets
Rust ★ 87 2mo ago
allennlp-reading-comprehension ▣
No description.
Python ★ 87 6y ago
dolma3
No description.
Python ★ 78 2mo ago
olmoearth_projects
OlmoEarth projects
Python ★ 76 1mo ago
pnp
Probabilistic Neural Programming
Scala ★ 74 7y ago
olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
Python ★ 73 23d ago
SAGE
[arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
Python ★ 71 6mo ago
codenav
CodeNav is an LLM agent that navigates and leverages previously unseen code repositories to solve user queries.
Python ★ 69 1y ago
ms2
No description.
Python ★ 68 3y ago
atlantes
Efficient and low latency real-time global-scale GPS trajectory modeling
Python ★ 66 4d ago
phone2proc
📱👉🏠 Perform conditional procedural generation to generate houses like your own!
Python ★ 64 2y ago
marg-reviewer
Code/data for MARG (multi-agent review generation)
Python ★ 64 3mo ago
paper-embedding-public-apis
Collection of public APIs for embedding scientific papers
★ 60 5y ago
ruletaker
No description.
Python ★ 55 1y ago
scruples
A corpus and code for understanding norms and subjectivity. 🤖
Python ★ 54 1y ago
datamap-rs
Data mapping framework for rust stuff
Rust ★ 54 2mo ago
super-benchmark
No description.
Jupyter Notebook ★ 53 1y ago
molmo-motion
No description.
Python ★ 52 2d ago
cached_path
A file utility for accessing both local and remote files through a unified interface.
Python ★ 48 1mo ago
EMO
No description.
HTML ★ 44 14h ago
nlpstack
NLP toolkit (tokenizer, POS-tagger, parser, etc.)
Scala ★ 43 9y ago
wildteaming
No description.
Python ★ 43 1y ago
olmo-eval
No description.
Python ★ 42 2d ago
olmix
No description.
Python ★ 41 26d ago
bff
No description.
Rust ★ 39 2y ago
fermi
No description.
Python ★ 38 4y ago
artifact-linker
ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery
Python ★ 37 2d ago
multicite
MultiCite code and data. Models are available on Huggingface.
Python ★ 36 4y ago
c4-documentation
No description.
★ 33 5y ago
decon
decontamination
Rust ★ 33 3mo ago
prescience
PreScience: A Benchmark for Forecasting Scientific Contributions
Python ★ 30 1mo ago
signal-and-noise
Measuring the Signal to Noise Ratio in Language Model Evaluation
Python ★ 30 10mo ago
recoma
Reasoning by Communicating with Agents
Python ★ 30 1y ago
fluid-benchmarking
Fluid Language Model Benchmarking
Python ★ 30 9mo ago
persona-bias
No description.
Python ★ 29 2y ago
few_shot_explanations
Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations"
HTML ★ 29 3y ago
hybrid-preferences
Learning to route instances for Human vs AI Feedback (ACL Main '25)
Python ★ 29 11mo ago
natural-instructions-v1
Benchmarking Generalization to New Tasks from Natural Language Instructions
Python ★ 28 5y ago
noncompliance
This repository contains data, code and models for contextual noncompliance.
Python ★ 26 1y ago
grobid ⑂
A machine learning software for extracting information from scholarly documents
Java ★ 23 5d ago
sbt-plugins ▣
SBT Plugins for AI2 projects
Scala ★ 23 3y ago
rslearn_projects
No description.
Python ★ 22 2d ago
S2APLER
S2APLER: S2 Agglomeration of Papers with Low Error Rate (it's for academic paper clustering)
Python ★ 22 1mo ago
neurodiscoverybench
No description.
Python ★ 22 4mo ago
SimplerEnv
No description.
Jupyter Notebook ★ 21 9mo ago
openscilm
Demo for https://arxiv.org/abs/2411.14199
Python ★ 20 2mo ago
DrawEduMath
Can VLMs understand students' hand-drawn math work?
Python ★ 19 5mo ago
twentyquestions
A web application for playing 20 Questions to crowdsource common sense. 🤖
Python ★ 17 3y ago
asta-plugins
No description.
Python ★ 16 2d ago
AskOlmo
No description.
Python ★ 16 7mo ago
MolmoPoint-GUISyn
Synthetic GUI Pointing Data Generation
Python ★ 15 2mo ago
agent-eval
No description.
Python ★ 14 4d ago
lerobot ⑂
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
Python ★ 14 19d ago
s6ui
A fast AWS S3 browser, with inspiration from s5cmd
Rust ★ 13 22d ago
clarifydelphi
No description.
Python ★ 13 2y ago
feb
Code associated with the paper: "Few-Shot Self-Rationalization with Natural Language Prompts"
HTML ★ 12 4y ago
lighthouse
distance to the nearest coastline is all you need.
Python ★ 12 6mo ago
olmoearth_pretrain_minimal
No description.
Python ★ 12 1mo ago
panda
Panda ("plan-and-act") agent for Autonomous Scientific Discovery
Python ★ 10 2mo ago
STTS
Official Repository for STTS.
Python ★ 10 3mo ago
olmpool
Code for the paper "Cracks in the Foundation: Seemingly Minor Architectural Choices Impact Long Context Extension"
Python ★ 8 1mo ago
mathfish
No description.
Python ★ 8 1y ago
common ▣
A collection of useful utility classes and functions.
Scala ★ 8 5y ago
olmoearth_ml4rs_tutorial
No description.
Jupyter Notebook ★ 8 1mo ago
grapal-website ▣
GrapAL landing page, user guide & documentation
Ruby ★ 7 6y ago
sinonym
Format and normalize Chinese names into Western forms
Python ★ 6 4d ago
olmo-api
Deprecated: use https://github.com/allenai/playground
Python ★ 6 1mo ago
olmo-ui
Deprecated: use https://github.com/allenai/playground
TypeScript ★ 6 2mo ago
OLMo-ladder
Repository for task scaling laws using model ladders
Python ★ 6 7mo ago
asta-resource-repo
Service for sharing data resource between asta users, clients, and agents
Python ★ 5 2d ago
layout-parser ⑂
A Python Library for Document Layout Understanding
Python ★ 5 16d ago
dracula
Official implementation of "DRACULA: Hunting for the Actions Users Want Deep Research Agents to Execute"
Python ★ 5 2mo ago
mcp-tool-eval
No description.
Python ★ 5 7mo ago
molmospaces-resources
Resource manager for MolmoSpaces
Python ★ 4 9d ago
molmo-utils
A set of helper functions for processing and integrating visual inputs with Molmo
Python ★ 4 6mo ago
asta-bench-leaderboard
No description.
Python ★ 3 2d ago
nora_lib
No description.
Python ★ 3 3d ago
ai2_robot_infra
Common real-robot infra for Ai2 Robotics
Python ★ 3 3mo ago
skiff2-actions
GitHub actions for skiff2 repositories.
TypeScript ★ 3 12d ago
pier
Workspace manager for coding agents. Interactively solve and develop Harbor tasks.
Python ★ 3 2mo ago
OlmoEarth-Feedback
Repo for collection of feedback on OlmoEarth
★ 2 4mo ago
mujoco
No description.
C++ ★ 2 5d ago
personalized-scholarqa-eval
Evaluation code for the paper "Language Models Don't Know What You Want: Evaluating Personalization in Deep Research Needs Real Users"
Python ★ 2 2mo ago
olmes-docker
Execution environments for OLMES
Python ★ 2 8mo ago
call_molmo
No description.
Python ★ 2 8mo ago
ai2-scholarqa-eval
No description.
Python ★ 2 3mo ago
intent-aware-lfqa
No description.
Python ★ 2 2mo ago
curobo
No description.
Python ★ 1 1mo ago
homebrew-s6ui
Homebrew tap to install allenai/s6ui
Ruby ★ 1 3mo ago
reshard-tokenized
CLI for merging tokenized shard .npy files and remapping .csv.gz metadata offsets.
Rust ★ 1 3mo ago
molmospaces_policy_zoo
Policy zoo for data generation + evaluation in MolmoSpaces
Python ★ 0 4d ago
fairseq ⑂
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Python ★ 0 17d ago