Hi, I'm Kobe 👋
🚀 Currently Maintaining/Contributing
- Harbor — Agent evaluation framework and RL environment toolkit. [[paper]](https://arxiv.org/abs/2601.11868)
- SkillsBench — Evaluating how well skills work and how effective agents are at using them. [[paper]](https://arxiv.org/abs/2602.12670)
- LMCache — The fastest KV cache layer for LLMs. [[paper]](https://arxiv.org/abs/2510.09665)
- OT Agent — Open-source terminal agent from the Open Thoughts team. [[blog]](https://www.openthoughts.ai/blog/openthoughts-tblite) [[blog]](https://www.openthoughts.ai/blog/agent)
- ClawsBench — Benchmark for claw-like agents. [[paper]](https://arxiv.org/abs/2604.05172)
🛠️ Previous Projects
Agents & Evaluation
- Terminal Bench — Benchmark for LLMs on complex terminal tasks. [[paper]](https://arxiv.org/abs/2601.11868)
- lmcache-agent-trace — Agent application, benchmark, and workload traces for LLM serving research.
- claude-code-tracing — Tracing tooling for Claude Code agent runs. [[blog]](https://huggingface.co/blog/kobe0938/context-engineering-reuse-pattern-claude-code)
LLM Inference & Serving Infra
- vLLM / production-stack — High-throughput LLM inference engine and its K8s-native serving stack. *Contributor.*
- inference-engine-arena — Postman & Chatbot Arena for inference benchmarking. *(Open-sourced ~3 months before SemiAnalysisAI/InferenceX.)*
- cacheserve — KV-cache-aware serving experiments. [[paper]](https://arxiv.org/abs/2512.14946)
- lmcache-trace-analysis / mooncake-trace-replayer — Trace analysis & replay for inference workloads.
Others
- Continuum — Multi-turn LLM agent scheduling with KV-cache time-to-live for efficient serving. *Contributor.* [[paper]](https://arxiv.org/abs/2511.02230)
- VidGen — Diffusion + autoregressive models for interactive video/game generation (Diffusive AI).
- LAG — Research experiments.
- citation-verifier — Verifying citations produced by LLM agents (TypeScript).
Repositories
- claude-code-tracing (Python, ★ 3, 26d ago)
- llm-inference-fast-benchmark: Benchmarks large language model (LLM) performance on an 8B role-play model (Sao10K/L3-8B-Lunaris-v1) with an average input of 4k tokens and an output of 250 tokens. (Python, ★ 3, 1y ago)
- LMCache ⑂: Making Long-Context LLM Inference 10x Faster and 10x Cheaper. (Python, ★ 2, 7mo ago)
- cacheserve (Jupyter Notebook, ★ 1, 6mo ago)
- docker-hub-star-tracker (Python, ★ 0, 27m ago)
- harbor ⑂: Harbor is a framework for running agent evaluations and creating and using RL environments. (Python, ★ 0, 3h ago)
- harborize-harbor-check-experiment-logs (Python, ★ 0, 6d ago)
- habor-Include-Exclude-patterns-in-agent-verifier (Shell, ★ 0, 9d ago)
- terminal-bench-leaderboard-detection-blog (Python, ★ 0, 11d ago)
- harbor-datasets ⑂ (HTML, ★ 0, 5mo ago)
- kobe0938: My profile README, covering agents, LLM infra, and the projects I'm working on. (★ 0, 1mo ago)
- long-horizon ⑂: Verifiable long-horizon SWE tasks. (★ 0, 1mo ago)
- citation-verifier (TypeScript, ★ 0, 2mo ago)
- ghostfolio-fork ⑂: Open Source Wealth Management Software. Angular + NestJS + Prisma + Nx + TypeScript 🤍 (TypeScript, ★ 0, 2mo ago)
- smolclaw-fork ⑂: High-resolution mock environments for testing and improving claw-like agents. (Python, ★ 0, 3mo ago)
- terminal-bench ⑂: A benchmark for LLMs on complicated tasks in the terminal. (Python, ★ 0, 7mo ago)
- LAG (Python, ★ 0, 3mo ago)
- terminal-bench-3-fork ⑂: 🚧 Accepting Task Submissions 🚧 (Python, ★ 0, 4mo ago)
- awesome-harbor ⑂: A curated list of awesome Harbor ecosystem projects. (★ 0, 4mo ago)
- skillsbench ⑂: SkillsBench evaluates how well skills work and how effective agents are at using them. (PDDL, ★ 0, 4mo ago)
- blog (HTML, ★ 0, 4mo ago)
- tb-parity_experiment-trace (★ 0, 7mo ago)
- terminal-bench-datasets ⑂ (Python, ★ 0, 7mo ago)
- vllm-fork ⑂: A high-throughput and memory-efficient inference and serving engine for LLMs. (Python, ★ 0, 8mo ago)
- lmcache-trace-analysis (Python, ★ 0, 8mo ago)
- lmcache.github.io ⑂: LMCache official blog. (HTML, ★ 0, 8mo ago)
- production-stack ⑂: vLLM's reference system for K8s-native cluster-wide deployment with community-driven performance optimization. (Python, ★ 0, 9mo ago)
- VidGen: A research project by Diffusive AI exploring diffusion and autoregressive models for generating interactive videos and games. (Python, ★ 0, 1y ago)
- mooncake-trace-replayer (Python, ★ 0, 10mo ago)
- docs (MDX, ★ 0, 1y ago)
- fastapi (Python, ★ 0, 1y ago)
- rembg ⑂: Rembg is a tool to remove image backgrounds. (Python, ★ 0, 1y ago)
- gorilla ⑂: Training and Evaluating LLMs for Function Calls (Tool Calls). (Python, ★ 0, 1y ago)
- ChatGPT-Next-Web-BISV ⑂: A well-designed cross-platform ChatGPT UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT app with one click. (★ 0, 2y ago)
- chatgpt-vercel ⑂: Elegant and Powerful. Powered by OpenAI and Vercel. (★ 0, 3y ago)
- seismic-hazard-risk-class ⑂: Code supporting Jack Baker's seismic hazard and risk analysis class. (★ 0, 3y ago)
- Large-Platform-Reinforcement-Learning-Model (Jupyter Notebook, ★ 0, 3y ago)
- Full-Stack-Web-Application (JavaScript, ★ 0, 3y ago)
- leetcode ⑂: Leetcode solutions. (★ 0, 3y ago)
- Data-Structure (Java, ★ 0, 4y ago)
- Piglets-Nursing-Level-Prediction (Jupyter Notebook, ★ 0, 4y ago)
- DataBase-SQL (Jupyter Notebook, ★ 0, 4y ago)
- Android-BunnyWorld (Java, ★ 0, 4y ago)
- Machine-Learning (TeX, ★ 0, 4y ago)