gitmyhub

wukong

Go ★ 4.5k updated 4y ago

A highly customizable full-text search engine

Wukong is a fast, embeddable full-text search engine library for Go apps, with built-in Chinese word segmentation, BM25 relevance ranking, and the ability to index one million documents in under 30 seconds.

Go · BM25 · sego · setup: moderate · complexity 3/5

Wukong is a full-text search engine library written in Go. A full-text search engine lets your application search across a collection of text documents by keyword, returning the most relevant results quickly. Wukong is designed to be embedded in your own Go application rather than run as a standalone service.
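Full-text engines typically answer keyword queries from an inverted index. An inverted index maps each term to the documents that contain it. The sketch below is a generic illustration of that data structure, not Wukong's internal code. It splits text on whitespace as a stand-in for real word segmentation. The `InvertedIndex` type and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// InvertedIndex maps each term to the set of document IDs containing it.
// Illustrative sketch only; not Wukong's implementation.
type InvertedIndex struct {
	postings map[string]map[int]bool
}

func NewInvertedIndex() *InvertedIndex {
	return &InvertedIndex{postings: make(map[string]map[int]bool)}
}

// Add tokenizes text on whitespace (a stand-in for real segmentation)
// and records docID under every term.
func (idx *InvertedIndex) Add(docID int, text string) {
	for _, term := range strings.Fields(strings.ToLower(text)) {
		if idx.postings[term] == nil {
			idx.postings[term] = make(map[int]bool)
		}
		idx.postings[term][docID] = true
	}
}

// Search returns the sorted IDs of documents that contain every query term.
func (idx *InvertedIndex) Search(query string) []int {
	terms := strings.Fields(strings.ToLower(query))
	if len(terms) == 0 {
		return nil
	}
	var result []int
	// Start from the first term's posting list and keep only documents
	// that also appear in every other term's list.
	for docID := range idx.postings[terms[0]] {
		matchesAll := true
		for _, t := range terms[1:] {
			if !idx.postings[t][docID] { // reading a nil map yields false
				matchesAll = false
				break
			}
		}
		if matchesAll {
			result = append(result, docID)
		}
	}
	sort.Ints(result)
	return result
}

func main() {
	idx := NewInvertedIndex()
	idx.Add(1, "fast search engine in Go")
	idx.Add(2, "Go web framework")
	idx.Add(3, "search engine for Chinese text")
	fmt.Println(idx.Search("search engine")) // [1 3]
	fmt.Println(idx.Search("go"))            // [1 2]
}
```

A real engine also stores term positions and frequencies in each posting so that results can be ranked rather than just matched.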

The README is written in Chinese and describes an engine built with Chinese-language content in mind, though the underlying techniques apply to other languages too. It cites these benchmark figures:

- Indexing one million short posts (500MB in total) takes about 28 seconds.
- Search responses average 1.65 milliseconds.
- The engine handles about 19,000 queries per second.

Chinese word segmentation is built in through a companion library called sego, which processes text at 27 megabytes per second.
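Chinese text has no spaces between words, so a segmenter must decide where words begin and end before anything can be indexed. As a rough illustration, here is forward maximum matching, a simple textbook dictionary-based method. It is not the algorithm sego uses. The `segmentFMM` function and the tiny dictionary are hypothetical. The sample sentence means "Baidu is China's largest search engine".

```go
package main

import (
	"fmt"
	"strings"
)

// segmentFMM splits text with forward maximum matching: at each position it
// takes the longest dictionary word (up to maxLen runes) that matches there,
// falling back to a single character. Illustrative only; sego uses a
// different algorithm.
func segmentFMM(text string, dict map[string]bool, maxLen int) []string {
	if maxLen < 1 {
		maxLen = 1
	}
	runes := []rune(text) // work in runes so multi-byte characters stay whole
	var words []string
	for i := 0; i < len(runes); {
		n := maxLen
		if rem := len(runes) - i; rem < n {
			n = rem
		}
		// Shrink the window until it is a dictionary word or a single rune.
		for n > 1 && !dict[string(runes[i:i+n])] {
			n--
		}
		words = append(words, string(runes[i:i+n]))
		i += n
	}
	return words
}

func main() {
	// Hypothetical toy dictionary; a real segmenter loads a large word list.
	dict := map[string]bool{"百度": true, "中国": true, "最大": true, "搜索": true, "搜索引擎": true}
	words := segmentFMM("百度是中国最大的搜索引擎", dict, 4)
	fmt.Println(strings.Join(words, " / "))
	// prints: 百度 / 是 / 中国 / 最大 / 的 / 搜索引擎
}
```

Because it always grabs the longest match, forward maximum matching can mis-split ambiguous strings. Frequency-aware segmenters exist largely to handle such cases.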

Beyond basic keyword matching, the engine supports three kinds of scoring:

- Proximity scoring rewards results where the searched terms appear close together in the original text.
- BM25 relevance scoring is a standard information-retrieval formula for ranking how well a document matches a query.
- Custom scoring rules let developers define their own ranking logic.

Documents can be added to and removed from the index while the engine is running, without a restart. The index can also be saved to disk and reloaded, and the README mentions a distributed mode for spreading work across multiple machines.
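The sketch below shows one common form of BM25. For each query term q it sums IDF(q) · f · (k1 + 1) / (f + k1 · (1 − b + b · |D| / avgdl)). Here f is the term's frequency in document D, |D| is the document's length in tokens, and avgdl is the average document length. The sketch pairs this with a simple proximity measure: the length of the shortest token window that contains every query term. A made-up custom rule then combines the two by adding 1/span to the BM25 score.

Some assumptions to keep in mind:

- The values k1 = 1.2 and b = 0.75 are common defaults, not values taken from Wukong.
- Wukong's own proximity definition and scoring interface may differ from this one.
- All function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// bm25 scores one tokenized document against the query terms, using the
// whole corpus for document frequencies and the average document length.
// Illustrative sketch; not Wukong's implementation.
func bm25(query, doc []string, docs [][]string, k1, b float64) float64 {
	n := float64(len(docs))
	totalLen := 0.0
	for _, d := range docs {
		totalLen += float64(len(d))
	}
	avgdl := totalLen / n

	tf := make(map[string]float64) // term frequencies within this document
	for _, t := range doc {
		tf[t]++
	}

	score := 0.0
	for _, q := range query {
		df := 0.0 // number of documents containing q
		for _, d := range docs {
			for _, t := range d {
				if t == q {
					df++
					break
				}
			}
		}
		// Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
		idf := math.Log(1 + (n-df+0.5)/(df+0.5))
		f := tf[q]
		score += idf * f * (k1 + 1) / (f + k1*(1-b+b*float64(len(doc))/avgdl))
	}
	return score
}

// minSpan returns the length in tokens of the shortest window of doc that
// contains every query term, or -1 if some term is missing. A smaller span
// means the terms sit closer together. Uses a sliding window.
func minSpan(query, doc []string) int {
	need := make(map[string]bool)
	for _, q := range query {
		need[q] = true
	}
	count := make(map[string]int) // needed-term occurrences inside the window
	covered, best, left := 0, -1, 0
	for right, t := range doc {
		if !need[t] {
			continue
		}
		count[t]++
		if count[t] == 1 {
			covered++
		}
		// Shrink from the left while the window still covers every term.
		for covered == len(need) {
			if span := right - left + 1; best == -1 || span < best {
				best = span
			}
			if lt := doc[left]; need[lt] {
				count[lt]--
				if count[lt] == 0 {
					covered--
				}
			}
			left++
		}
	}
	return best
}

func main() {
	query := []string{"search", "engine"}
	docs := [][]string{
		{"wukong", "is", "a", "search", "engine", "in", "go"},
		{"go", "search", "for", "a", "fast", "engine"},
		{"chinese", "word", "segmentation"},
	}
	for i, d := range docs {
		score := bm25(query, d, docs, 1.2, 0.75)
		span := minSpan(query, d)
		combined := score
		if span > 0 {
			// Hypothetical custom rule: reward tighter term proximity.
			combined += 1.0 / float64(span)
		}
		fmt.Printf("doc %d: bm25=%.3f span=%d combined=%.3f\n", i+1, score, span, combined)
	}
}
```

With these toy inputs, document 2 gets the higher BM25 score because it is shorter. Document 1 ranks first once the proximity bonus is added, because its two terms are adjacent.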

The code example in the README shows the minimal setup: initialize the engine with a dictionary file, add a few documents, flush the index, and run a search. The result is a ranked list of matching documents.
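A likely reason for the flush step is that indexing runs asynchronously, so recently added documents may not be searchable until the queue drains. The toy engine below is a hypothetical sketch of that pattern. A worker goroutine drains a queue of add and remove operations, and `Flush` waits until the queue is empty. The sketch also shows documents being removed while the engine runs. Its type and method names are invented and do not match Wukong's actual API.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// indexOp is a queued add or remove request (hypothetical type).
type indexOp struct {
	id     uint64
	text   string
	remove bool
}

// toyEngine mimics the "add documents, flush, then search" workflow.
// Illustrative sketch only; not Wukong's API.
type toyEngine struct {
	mu       sync.RWMutex
	postings map[string]map[uint64]bool
	queue    chan indexOp
	pending  sync.WaitGroup
}

func newToyEngine() *toyEngine {
	e := &toyEngine{
		postings: make(map[string]map[uint64]bool),
		queue:    make(chan indexOp, 100),
	}
	go e.worker()
	return e
}

// worker applies queued operations to the index under a write lock.
func (e *toyEngine) worker() {
	for op := range e.queue {
		e.mu.Lock()
		if op.remove {
			for _, ids := range e.postings {
				delete(ids, op.id)
			}
		} else {
			for _, term := range strings.Fields(op.text) {
				if e.postings[term] == nil {
					e.postings[term] = make(map[uint64]bool)
				}
				e.postings[term][op.id] = true
			}
		}
		e.mu.Unlock()
		e.pending.Done()
	}
}

// Index and Remove enqueue work and return without waiting for it.
func (e *toyEngine) Index(id uint64, text string) {
	e.pending.Add(1)
	e.queue <- indexOp{id: id, text: text}
}

func (e *toyEngine) Remove(id uint64) {
	e.pending.Add(1)
	e.queue <- indexOp{id: id, remove: true}
}

// Flush blocks until every queued operation has been applied.
func (e *toyEngine) Flush() { e.pending.Wait() }

// Search returns the sorted IDs of documents containing a single term.
func (e *toyEngine) Search(term string) []uint64 {
	e.mu.RLock()
	defer e.mu.RUnlock()
	var ids []uint64
	for id := range e.postings[term] {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

// Close stops the background worker; call it only after the last Index/Remove.
func (e *toyEngine) Close() { close(e.queue) }

func main() {
	e := newToyEngine()
	defer e.Close()
	e.Index(1, "wukong is a search engine")
	e.Index(2, "sego segments chinese text")
	e.Index(3, "a fast search library")
	e.Flush()
	fmt.Println(e.Search("search")) // [1 3]
	e.Remove(1)
	e.Flush()
	fmt.Println(e.Search("search")) // [3]
}
```

Queueing lets callers keep adding documents without waiting on index updates. The trade-off is that they must flush whenever they need read-after-write consistency.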

The project is released under the Apache License v2, which permits commercial use. The README offers little documentation in English, but it links to a tutorial that walks through building a microblog search site in under 200 lines of Go code.

Where it fits