Hi there, I'm XuMing 👋
- 🔭 I’m currently working on Agentica, MedicalGPT, TreeSearch, ai-paper-analysis
- 🌱 I’m currently learning Multimodal Technology, NLP, CV
- 👯 I’m looking to collaborate on pycorrector, text2vec
- 💬 Ask me about NLP, PyTorch, deep learning
- 📫 How to reach me: Ming Xu (徐明)
- 😄 Pronouns: Be a star, with edges and corners, and still shining
---
pycorrector ★ PINNED
pycorrector is a toolkit for text error correction. It applies models such as Kenlm, T5, MacBERT, ChatGLM3, and Qwen2.5 to error correction, ready to use out of the box.
Python ★ 6.5k 24d ago
text2vec ★ PINNED
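text2vec, text to vector. A text representation toolkit that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.
Python ★ 5.0k 4mo ago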
MedicalGPT ★ PINNED
MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. Trains medical large language models, implementing incremental pretraining (PT), supervised fine-tuning (SFT), RLHF, DPO, ORPO, and GRPO.
Python ★ 5.6k 25d ago
agentica ★ PINNED
Agentica: Lightweight async-first Python framework for AI agents, with support for tool calling, RAG, multi-agent setups, and MCP.
Python ★ 319 3h ago
imgocr ★ PINNED
Python3 package for Chinese/English OCR, using the paddleocr-v5 onnx model (~20MB) with ultra-fast inference speed. Inference is based on the ppocr-v5-onnx model, open-source SOTA for Chinese and English OCR.
Python ★ 132 2mo ago
TreeSearch ★ PINNED
TreeSearch: Search your codebase like a human, not like a vector database. No embeddings. No chunking. Just millisecond search over structured documents and large codebases.
Python ★ 210 2mo ago
python-tutorial
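A practical Python tutorial covering Python basics, advanced Python features, object-oriented programming, multithreading, databases, data science, Flask, and web crawler development.
Jupyter Notebook ★ 2.4k 3y ago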
similarity
similarity: Text similarity calculation toolkit for Java. Written in Java, it can be used for tasks such as text similarity calculation and sentiment analysis, ready to use out of the box.
Java ★ 1.6k 5mo ago
textgen
TextGen: Implementation of text generation models, including LLaMA, ChatGLM, BLOOM, GPT2, Seq2Seq, BART, T5, SongNet, and UDA, with training and prediction support, ready to use out of the box.
Python ★ 980 1y ago
similarities
Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over data at the hundred-million scale. Developed in Python 3, ready to use out of the box.
Python ★ 903 3mo ago
ChatPDF
RAG for Local LLM, chat with PDF/doc/txt files, ChatPDF. A pure native RAG implementation built on a local LLM, an embedding model, and a reranker model. Supports GraphRAG and requires no third-party agent library.
Python ★ 854 1y ago
ChatPilot
ChatPilot: Chat Agent Web UI. A chat front end that supports Google search, chatting with files and URLs (RAG), and a code interpreter. It reproduces Kimi Chat (drag in a file, send a URL).
Svelte ★ 600 5mo ago
parrots
Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) engine. Chinese and English speech recognition and multi-speaker speech synthesis, with multilingual support and high accuracy.
Python ★ 525 7mo ago
pytextclassifier
pytextclassifier is a toolkit for text classification. It implements classification models such as LR, Xgboost, TextCNN, FastText, TextRNN, and BERT, ready to use out of the box.
Python ★ 523 1y ago
nlp-tutorial
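A natural language processing (NLP) tutorial covering word vectors, lexical analysis, pretrained language models, text classification, semantic text matching, information extraction, translation, and dialogue.
Jupyter Notebook ★ 486 4y ago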
dialogbot
dialogbot provides search-based dialogue, task-based dialogue, and generative dialogue models. It supports web-retrieval QA, domain-knowledge QA, task-guided QA, and chit-chat QA, ready to use out of the box.
Python ★ 329 2y ago
addressparser ⑂
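A Chinese address extraction tool that supports extracting and mapping China's three-level administrative divisions (province, city, district) and drawing address heat maps.
Python ★ 241 1y ago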
pke_zh
pke_zh, python keyphrase extraction for chinese(zh). A tool for extracting Chinese keywords or key sentences, implementing algorithms such as KeyBert, PositionRank, TopicRank, and TextRank, ready to use out of the box.
Python ★ 215 2y ago
AIAvatar
AI Avatar: Build Your Personal Digital Avatar. A real-time interactive AI digital human that supports synchronized audio and video conversation.
Python ★ 147 7mo ago
chatgpt-webui
ChatGPT WebUI using gradio. A simple, easy-to-use Web UI for LLM chat and retrieval-based knowledge QA (RAG).
Python ★ 142 1y ago
lmft ▣
ChatGLM-6B fine-tuning.
Python ★ 135 3y ago
nerpy
🌈 NERpy: Implementation of Named Entity Recognition using Python. Supports models such as BertSoftmax and BertSpan, ready to use out of the box.
Python ★ 118 2y ago
open-o1
open-o1: Using GPT-4o with CoT to Create o1-like Reasoning Chains
Python ★ 115 1y ago
pysenti
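Chinese Sentiment Classification Tool. Sentiment polarity classification based on the HowNet, Tsinghua, and BosonNLP sentiment lexicons. An easily extensible baseline method, ready to use out of the box.
Python ★ 103 2y ago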
companynameparser
company name parser, extract company name brand. A Chinese company name segmentation tool that extracts the place name, brand name (main word), industry word, and company suffix from a company name.
Python ★ 97 3y ago
CodeAssist
CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++ and so on.
Python ★ 60 9mo ago
judger
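An automated essay scoring tool that supports intelligent scoring of Chinese and English essays, self-training of scoring models, processing model data with WEKA, and custom scoring algorithms. Developed in Java.
Roff ★ 56 9y ago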
github-hot
Automatically tracks hot GitHub repos and updates daily.
Python ★ 52 11h ago
relext
RelExt: A Tool for Relation Extraction from Text.
Python ★ 49 4y ago
deep-research
Python implementation of AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models.
Python ★ 49 1y ago
WebResearcher
WebResearcher: An Iterative Deep-Research Agent.
Python ★ 48 4mo ago
SearchGPT ⑂
SearchGPT: Building a quick conversation-based search engine with LLMs.
TypeScript ★ 45 1y ago
rater
rater, recommender systems. Implements recommendation models including DeepFM, Wide&Deep, DIN, DeepWalk, and Node2Vec, ready to use out of the box.
Python ★ 45 5y ago
ai-paper-analysis
Daily in-depth Chinese analyses of frontier AI research papers, including topics such as LLMs, agents, RAG, and RL.
Python ★ 44 4d ago
text-feature
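Text feature extraction for novels, academic papers, argumentative essays, and similar texts. Extracts features such as words, sentences, and dependency relations. Developed in Python.
Python ★ 42 8y ago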
labelit
labelit, a labeling tool with active learning for classification tasks. Automatic labeling based on active learning: the model learns while you label, reducing manual annotation effort.
Python ★ 31 3y ago
pinyin-tokenizer
pinyintokenizer, a pinyin tokenizer that splits continuous pinyin into a list of single-character pinyin syllables.
Python ★ 31 1y ago
title-generator
Automatic Text Summarization and Title Generation.
Python ★ 25 5y ago
thinking-intervention
Used for thinking process intervention of reasoning models such as DeepSeek-R1, effectively controlling the reasoning thinking process.
Python ★ 24 1y ago
case-analysis
Medical record analysis with NLP: extracts key information from medical record text for later analysis and processing.
Java ★ 22 9y ago
skills
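The Learn from Experience skill lets an Agent learn from experience: corrected mistakes are not repeated, good methods are retained automatically, and it understands you better the more you use it. It works by recording, organizing, and reusing the experience from your interactions with the Agent (pitfalls, corrections, good methods), so that it no longer disappears when the session ends.
★ 20 2mo ago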
EssaySocring
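An automated scoring system for English essays that supports self-training of scoring models, processing model data with WEKA, and custom scoring algorithms. Developed in Java.
Roff ★ 17 5y ago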
weibo-roast
A sharp-tongued Weibo AI that relentlessly disses Weibo bloggers.
Python ★ 15 1y ago
crf-seg
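crf-seg: a Chinese word segmentation tool for production environments, with customizable corpora, customizable models, a clear architecture, and good segmentation quality. Written in Java.
Java ★ 14 4y ago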
AIDailyNews ⑂
auto push daily news with ai
Astro ★ 13 7h ago
VoiceInput
Lightweight macOS menu-bar app for voice input. Hold Fn → speak → release. Built from a single prompt.
Swift ★ 13 2mo ago
text2vec-service
Service for Bert model to Vector. An efficient Text-To-Vector service that supports multiple GPUs, multiple workers, and multiple clients, ready to use out of the box.
Python ★ 12 4y ago
synth-wiki
LLM-wiki: drop in papers/articles/notes, get a structured, interlinked, searchable personal wiki knowledge base.
Python ★ 11 2mo ago
zh-normalization
Chinese(zh) sentence NSW(Non-Standard-Word) Normalization
Python ★ 11 1y ago
authorship-identification
[Toutiao] Text authorship identification competition.
Jupyter Notebook ★ 10 7y ago
fake-news-detector
Fake News Detection Competition
Python ★ 9 4y ago
chinese-chess-ai
Chinese Chess AI Game. A human-versus-AI Chinese chess game.
JavaScript ★ 8 8mo ago
mcp-run-python-code
Python interpreter, MCP server, no API key, free. Get results from running Python code. Features include executing Python code, installing Python libraries, and running Python scripts.
Python ★ 8 8mo ago
ChatGPT-API-server
Build a Python server for the ChatGPT API.
Python ★ 8 3y ago
prompt-optimizer
A lightweight 300-line Python tool to automatically optimize LLM prompts using labeled data.
Python ★ 7 3mo ago
cpp-tutorial
A C++ development tutorial with examples: basics, advanced use of open-source libraries, and advanced techniques.
C++ ★ 6 8y ago
ChatGPT-Next-Web ⑂
A cross-platform ChatGPT/Gemini UI (Web / PWA / Linux / Win / MacOS). Get your own cross-platform ChatGPT/Gemini app in one click.
TypeScript ★ 5 2y ago
angry-birds-gesture
Angry Birds - Gesture Control Edition. A web-based Angry Birds game controlled with hand gestures.
JavaScript ★ 5 6mo ago
llm-debate-arena
LLM Debate Tournament (AI Debate Arena) - Watch LLMs compete in structured debates with ELO rankings, real-time judging, and tool-augmented argumentation
Python ★ 5 6mo ago
url_crawler
High-Concurrency URL Web Page Content Fetching Service.
Python ★ 5 7mo ago
text2vec-encoder
**Text2vecEncoder** wraps the text2vec model with jina. It encodes text data into dense vectors.
Python ★ 5 4y ago
nlpcommon
NLP common tools.
Python ★ 5 4y ago
dual-mem
Dual-system layered memory SDK for LLM agents — structured write path, hybrid retrieval, evolution chains.
Python ★ 4 5h ago
NER-models ⑂
Named Entity Recognition(NER) models, include BERT(softmax, CRF, Span), BiLSTM-CRF model.
Python ★ 4 5y ago
weather-forecast-server
weather-forecast-server, a free weather forecast MCP server that needs no API key. Get weather for cities around the world.
Python ★ 4 8mo ago
cvnet
have fun with image AI
Jupyter Notebook ★ 4 1y ago
HanLP ⑂
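Natural language processing: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, keyword extraction, automatic summarization, phrase extraction, pinyin, and Simplified/Traditional Chinese conversion.
Java ★ 4 6y ago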
t5-pegasus-pytorch ⑂
PyTorch implementation of t5-pegasus, which was open-sourced by Zhuiyi Technology (追一科技).
Python ★ 4 3y ago
shibing624
No description.
★ 3 2mo ago
agentica-gateway ▣
agentica gateway service: use Lark (Feishu) or WeCom to control your own personal AI assistant.
Python ★ 3 2mo ago
open-webui ⑂
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
JavaScript ★ 3 5mo ago
weibo-api-sdk ⑂
A wrapper around the Weibo mobile-site (m-site) API for fetching Weibo data without logging in.
Python ★ 3 8mo ago
tools
tools
JavaScript ★ 3 2y ago
codev
Code Agent, Code Dev tool.
Python ★ 3 1y ago
BlogDemo ▣
Code used in my CSDN blog, mainly algorithms.
Java ★ 3 4y ago
graphrag-lite
graphrag-lite, Lightweight GraphRAG implementation with openai API and knowledge traceability.
Python ★ 2 4mo ago
file-server
File Server: a simple self-hosted file storage service with password protection.
HTML ★ 2 4mo ago
NeuralNLP-NeuralClassifier ⑂
An Open-source Neural Hierarchical Multi-label Text Classification Toolkit, add bert hierarchical multi label classification.
Python ★ 2 3y ago
phrase-search
Phrase search supporting phrases such as company names and address names, with custom ranking, pinyin handling, and a built-in Jetty web interface. Written in Java.
Java ★ 2 4y ago
Diffusion-Tuning
Diffusion-Tuning: Training Your Own Diffusion model with custom dataset.
Python ★ 2 2y ago
mcp-bocha-search ⑂
Bocha Search MCP Server.
Python ★ 1 6mo ago
homebrew-tap
Homebrew tap for shibing624 tools (treesearch and friends)
Ruby ★ 1 2mo ago
little-spring
Understanding Spring's core code by writing a simplified imitation of Spring.
Java ★ 1 9y ago
DeepCTR ⑂
Easy-to-use,Modular and Extendible package of deep-learning based CTR models.
Python ★ 1 6y ago
ansj_seg ⑂
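ansj word segmentation: a true Java implementation of ict, with segmentation quality and speed that both exceed the open-source version of ict. Chinese word segmentation, person-name recognition, part-of-speech tagging, and user-defined dictionaries.
Java ★ 1 9y ago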
pyweb
Web server use tornado.
Python ★ 1 5y ago
html5-demos
Use the html5 to show funny web demos
JavaScript ★ 1 10y ago
claude-code ⑂
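An unmodified, runnable, and buildable version of Claude Code; TypeScript types fully fixed; enterprise-grade reliability; safe and free of malware, with a faithful lock file, so you can run bun i directly; start with bun run dev.
★ 0 2mo ago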
claude-code-source-code ⑂
Claude Code v2.1.88 Source Code
★ 0 2mo ago
transformers ⑂
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
★ 0 2y ago
ChuanhuChatGPT ⑂
GUI for ChatGPT API and many LLMs. Supports agents, file-based QA, GPT finetuning and query with web search. All with a neat UI.
★ 0 2y ago
PaddleOCR ⑂
Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Python ★ 0 2y ago
FastChat ⑂
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Python ★ 0 2y ago
SongNet ⑂
Code for ACL 2020 paper "Rigid Formats Controlled Text Generation":https://www.aclweb.org/anthology/2020.acl-main.68/
Python ★ 0 4y ago
Paddle-Image-Models ⑂
A PaddlePaddle version image model zoo.
★ 0 4y ago
PaddleNLP ⑂
An NLP library with Awesome pre-trained Transformer models and easy-to-use interface, supporting wide-range of NLP tasks from research to industrial applications.
★ 0 4y ago