gitmyhub

libsvm

Java ★ 4.7k updated 6mo ago

LIBSVM -- A Library for Support Vector Machines

LIBSVM is a widely used, decades-old library for training Support Vector Machine models that classify data or predict numbers. It comes with command-line tools, a Python interface, and a helper script that automates the full training pipeline for beginners.

C, Java, Python, MATLAB | setup: moderate | complexity 3/5

LIBSVM is a software library for training and using Support Vector Machines, a family of mathematical models used in machine learning for classification and regression tasks. Given a dataset with labeled examples, a support vector machine learns a boundary that separates the categories and can then classify new, unseen data points. LIBSVM is one of the most widely used implementations of this technique and has been a standard reference in academic and applied machine learning for decades.

The library covers several variations of the SVM approach: C-SVC and nu-SVC for classifying data into categories, epsilon-SVR and nu-SVR for predicting continuous numeric values, and one-class SVM for detecting whether new data resembles the training set. These variants differ in how they handle the tradeoffs between fitting the training data and tolerating errors.
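As an illustration of that tradeoff, the standard textbook form of the C-SVC training problem can be written as below. The symbols are not taken from this summary and are assumptions for the sketch: $w$ and $b$ define the boundary, $\phi$ is the feature map, $\xi_i$ is the slack allowed for example $i$, $l$ is the number of training examples, and $C > 0$ is the penalty parameter.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\, w^{T} w + C \sum_{i=1}^{l} \xi_i \\
\text{subject to}\quad & y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad i = 1, \dots, l .
\end{aligned}
```

A larger $C$ penalizes training errors more heavily and fits the training data more tightly; a smaller $C$ tolerates more errors in exchange for a simpler boundary.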

Users interact with LIBSVM through three command-line programs. The svm-train program reads a data file, fits a model, and writes it to disk. The svm-predict program loads a saved model and produces predictions on new data. The svm-scale program rescales input features to a consistent range, which the documentation says improves results in practice. A Python script called easy.py automates the full pipeline, including scaling and searching for good model parameters, so that people new to the technique can get a working model without tuning each step by hand.
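The rescaling step can be sketched in a few lines of Python. This is a minimal illustration of linear min-max scaling, not svm-scale's actual implementation. The function names `fit_ranges` and `scale` are hypothetical, the target range of -1 to 1 is an assumption, and the sketch assumes every row lists every feature explicitly.

```python
def fit_ranges(rows):
    """Record the (min, max) of every feature index seen in the training rows."""
    ranges = {}
    for features in rows:
        for index, value in features.items():
            lo, hi = ranges.get(index, (value, value))
            ranges[index] = (min(lo, value), max(hi, value))
    return ranges


def scale(features, ranges, lower=-1.0, upper=1.0):
    """Map each feature linearly into [lower, upper] using the stored ranges."""
    scaled = {}
    for index, value in features.items():
        if index not in ranges:
            continue  # feature never seen during training: skip it
        lo, hi = ranges[index]
        if hi == lo:
            continue  # constant feature: nothing to rescale
        scaled[index] = lower + (upper - lower) * (value - lo) / (hi - lo)
    return scaled


# Hypothetical training rows, each a {feature_index: value} dict.
train = [{1: 0.0, 2: 10.0}, {1: 5.0, 2: 20.0}, {1: 10.0, 2: 30.0}]
ranges = fit_ranges(train)
scaled_row = scale({1: 5.0, 2: 10.0}, ranges)
```

The ranges are computed once from the training data and then reused for any new data, so that training and prediction inputs are rescaled in the same way.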

Data files use a plain-text format where each line represents one example: a label followed by index-value pairs for each feature. This sparse format is efficient for datasets where many features are zero.
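To make the format concrete, here is a small Python sketch that reads and writes one line of it. The helper names `parse_line` and `format_line` are hypothetical and not part of LIBSVM. The sketch assumes feature indices start at 1 and appear in ascending order.

```python
def parse_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for pair in parts[1:]:
        index, value = pair.split(":")
        features[int(index)] = float(value)
    return label, features


def format_line(label, dense):
    """Write a dense feature list as a sparse line, omitting zero entries."""
    pairs = [f"{i}:{v:g}" for i, v in enumerate(dense, start=1) if v != 0]
    return " ".join([f"{label:g}"] + pairs)


# Feature 2 is zero, so it is simply left out of the line.
label, features = parse_line("+1 1:0.5 3:-2")
line = format_line(1, [0.5, 0, -2.0])
```

Any index missing from a line is treated as zero, which is why a dataset with mostly empty features stays small on disk.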

Interfaces are available for Java, Python, and MATLAB/Octave, in addition to the core C implementation. A simple graphical toy program lets users draw data points on screen and visualize how the model separates them. Pre-built Windows binaries are included. The library is distributed with a copyright notice that permits free use for research and commercial purposes with attribution.

Where it fits