scikit-learn user guide, Release 0.20.dev0
• sklearn_theano scikit-learn compatible estimators, transformers, and datasets which use Theano internally
• nolearn A number of wrappers and abstractions around existing neural network libraries
• keras Deep Learning library capable of running on top of either TensorFlow or Theano.
• lasagne A lightweight library to build and train neural networks in Theano.
Broad scope
• mlxtend Includes a number of additional estimators as well as model visualization utilities.
• sparkit-learn Scikit-learn API and functionality for PySpark’s distributed modelling.
Other regression and classification
• xgboost Optimised gradient boosted decision tree library.
• ML-Ensemble Generalized ensemble learning (stacking, blending, subsemble, deep ensembles, etc.).
• lightning Fast state-of-the-art linear model solvers (SDCA, AdaGrad, SVRG, SAG, etc.).
• py-earth Multivariate adaptive regression splines
• Kernel Regression Implementation of Nadaraya-Watson kernel regression with automatic bandwidth selection
• gplearn Genetic Programming for symbolic regression tasks.
• multiisotonic Isotonic regression on multidimensional features.
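To make the Kernel Regression entry above more concrete, here is a minimal pure-Python sketch of a Nadaraya-Watson estimator with a Gaussian kernel. It is not the API of the listed package: the function names (gaussian_kernel, nadaraya_watson, loo_bandwidth), the choice of kernel, and the leave-one-out search over a fixed candidate grid are all assumptions made for illustration, and the package may select its bandwidth differently.

```python
import math


def gaussian_kernel(u):
    # Unnormalised Gaussian kernel; the normalising constant cancels
    # in the Nadaraya-Watson ratio.
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    # Kernel-weighted average of the training targets around x_query.
    weights = [gaussian_kernel((x_query - xi) / bandwidth) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger bandwidth")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total


def loo_bandwidth(x_train, y_train, candidates):
    # Hypothetical helper: pick the candidate bandwidth with the smallest
    # leave-one-out squared prediction error.
    best_h, best_err = None, math.inf
    for h in candidates:
        err = 0.0
        for i in range(len(x_train)):
            x_rest = x_train[:i] + x_train[i + 1:]
            y_rest = y_train[:i] + y_train[i + 1:]
            pred = nadaraya_watson(x_rest, y_rest, x_train[i], h)
            err += (pred - y_train[i]) ** 2
        if err < best_err:
            best_h, best_err = h, err
    return best_h


# Toy data (made up for this example): y = x ** 2 on a small grid.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [0.0, 1.0, 4.0, 9.0, 16.0]

h = loo_bandwidth(x_train, y_train, [0.25, 0.5, 1.0, 2.0])
estimate = nadaraya_watson(x_train, y_train, 2.0, h)
print("selected bandwidth:", h)
print("estimate at x = 2.0:", estimate)
```

Because the estimate is a weighted average of the training targets, it always lies between the smallest and largest observed target value; a smaller bandwidth follows the data more closely, while a larger one smooths more.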
Decomposition and clustering
• lda Fast implementation of latent Dirichlet allocation in Cython, which uses Gibbs sampling
to sample from the true posterior distribution. (scikit-learn’s
sklearn.decomposition.LatentDirichletAllocation implementation uses variational inference to sample from a tractable
approximation of a topic model’s posterior distribution.)
• Sparse Filtering Unsupervised feature learning based on sparse-filtering
• kmodes k-modes clustering algorithm for categorical data, and several of its variations.
• hdbscan HDBSCAN and Robust Single Linkage clustering algorithms for robust variable density clustering.
• spherecluster Spherical K-means and mixture of von Mises-Fisher clustering routines for data on the unit hypersphere.
Pre-processing
• categorical-encoding A library of scikit-learn compatible categorical variable encoders.
• imbalanced-learn Various methods to under- and over-sample datasets.
1.4.3 Statistical learning with Python
Other packages useful for data analysis and machine learning.
• Pandas Tools for working with heterogeneous and columnar data, relational queries, time series and basic statistics.
• theano A CPU/GPU array processing framework geared towards deep learning research.
• statsmodels Estimating and analysing statistical models. More focused on statistical tests and less on prediction
than scikit-learn.
• PyMC Bayesian statistical models and fitting algorithms.
• Sacred Tool to help you configure, organize, log and reproduce experiments