- Trace2Skill
  Official codebase of the paper -- Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
  Python ★ 152 1mo ago
- STAR
  STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
  Python ★ 49 1mo ago
- MARCH
  No description.
  Python ★ 28 10d ago
- CollectionLoRA
  Implementation of "CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation"
  Python ★ 27 19d ago
- CLIPO
  CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR
  Python ★ 21 2mo ago
- OpenRS
  Open Rubric System: Scaling Reinforcement Learning with Pairwise Adaptive Rubric
  Python ★ 18 3mo ago
- SSP ⑂
  Search Self-Play: Pushing the Frontier of Agent Capability without Supervision
  Python ★ 18 5mo ago
- DIR
  No description.
  Python ★ 17 4mo ago
- Skill-RM
  Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill
  Python ★ 14 11d ago
- GD2PO
  No description.
  Python ★ 9 3d ago
- SiameseNorm ⑂
  Code of the research paper "SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm"
  Python ★ 9 28d ago
- EVPV-PRM
  No description.
  Python ★ 7 3mo ago
- Proxy-GRM
  No description.
  Python ★ 4 3mo ago
- ATP-Bench
  No description.
  Python ★ 3 1mo ago
- RiT
  RiT: Rubrics-in-Thinking Reinforcement Learning for Improved Reasoning in Large Language Models
  ★ 0 2mo ago