way compare TensorFlow, as we have dedicated Section VI to
this specific purpose.
A. General Machine Learning
In the following paragraphs we list and briefly review a
small set of general machine learning libraries in
chronological order. By general, we mean any library whose
common use cases in the machine learning and data science
community include, but are not limited to, deep learning.
As such, these libraries may be used for statistical
analysis, clustering, dimensionality reduction, structured
prediction, anomaly detection, shallow (as opposed to deep)
neural networks and other tasks.
We begin our review with a library published 21 years
before TensorFlow: MLC++ [9]. MLC++ is a software library
developed in the C++ programming language providing
algorithms alongside a comparison framework for a number of
data mining, statistical analysis and pattern recognition
techniques. It was originally developed at Stanford University
in 1994 and is now owned and maintained by Silicon Graphics,
Inc. (SGI¹). To the best of our knowledge, MLC++ is the oldest
machine learning library still available today.
Following MLC++ in chronological order, OpenCV² (Open Source
Computer Vision) was released in the year 2000 by
Bradski et al. [10]. It is aimed primarily at solving learning
tasks in the field of computer vision and image recognition,
and includes a collection of algorithms for face recognition,
object identification, 3D-model extraction and other purposes.
It is released under a BSD license and provides interfaces in
multiple programming languages such as C++, Python and
MATLAB.
Another machine learning library we wish to mention is
scikit-learn³ [7]. The scikit-learn project was originally
developed by David Cournapeau as part of the Google Summer of
Code program⁴ in 2007. It is an open source machine learning
library written in Python, on top of the NumPy, SciPy and
matplotlib libraries. It is useful for a large class of both
supervised and unsupervised learning problems.
The Accord.NET⁵ library stands apart from the aforementioned
examples in that it is written in the C# ("C Sharp")
programming language. Released in 2008, it comprises not
only a variety of machine learning algorithms, but also
signal processing modules for speech and image recognition
[11].
Massive Online Analysis⁶ (MOA) is an open source framework
for online and offline analysis of massive, potentially
infinite, data streams. MOA includes a variety of tools for
classification, regression, recommender systems and other
disciplines. It is written in the Java programming language
and maintained by staff of the University of Waikato, New
Zealand. It was conceived in 2010 [12].

¹ https://www.sgi.com/tech/mlc/
² http://opencv.org
³ http://scikit-learn.org/stable/
⁴ https://summerofcode.withgoogle.com
⁵ http://accord-framework.net/index.html
⁶ http://moa.cms.waikato.ac.nz
The Mahout⁷ project, part of the Apache Software Foundation⁸,
is a Java programming environment for scalable machine
learning applications, built on top of the Apache Hadoop⁹
platform. It allows for analysis of large datasets distributed
in the Hadoop Distributed File System (HDFS) using the MapReduce
programming paradigm. Mahout provides machine learning
algorithms for classification, clustering and filtering.
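
As an illustration of how a machine learning algorithm can be
phrased in the MapReduce paradigm, the sketch below expresses a
single k-means clustering iteration as a map, shuffle and reduce
phase in plain Python. It uses neither Mahout nor Hadoop. All
function names (map_phase, shuffle_phase, reduce_phase,
kmeans_step) and the toy data are assumptions made for this
example, and the partitions are processed sequentially rather
than on separate nodes.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for point in points:
        distances = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids
        ]
        yield distances.index(min(distances)), point


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: average the points of each group to obtain its new centroid."""
    return {
        key: tuple(sum(axis) / len(values) for axis in zip(*values))
        for key, values in groups.items()
    }


def kmeans_step(partitions, centroids):
    """Run one k-means iteration over data that is split into partitions."""
    pairs = []
    for partition in partitions:  # in a real cluster, one mapper per partition
        pairs.extend(map_phase(partition, centroids))
    return reduce_phase(shuffle_phase(pairs))


# Toy dataset, split into two partitions as a distributed file system might.
partitions = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(10.0, 10.0), (10.0, 12.0)],
]
new_centroids = kmeans_step(partitions, centroids=[(1.0, 1.0), (9.0, 9.0)])
print(new_centroids)
```

The map phase only needs the current centroids and its local
partition, which is why this step can be distributed across the
nodes that store the data.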
Pattern¹⁰ is a Python machine learning module we include
in our list due to its rich set of web mining facilities. It
comprises not only general machine learning algorithms (e.g.
clustering, classification or nearest neighbor search) and
natural language processing methods (e.g. n-gram search or
sentiment analysis), but also a web crawler that can, for
example, fetch Tweets or Wikipedia entries, facilitating quick
data analysis on these sources. It was published by the
University of Antwerp in 2012 and is open source.
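
For readers unfamiliar with n-grams, the following short sketch
shows how word-level n-grams can be extracted and counted using
only the Python standard library. It is not Pattern's API. The
helper name ngrams and the two example documents are hypothetical
and serve purely as illustration.

```python
from collections import Counter


def ngrams(text, n=2):
    """Return all word-level n-grams of a text as tuples of lowercase words."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


documents = [
    "machine learning with python",
    "deep learning with python",
]

# Count how often each bigram occurs across all documents.
counts = Counter(gram for doc in documents for gram in ngrams(doc, n=2))
print(counts[("learning", "with")])
```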
Lastly, Spark MLlib¹¹ is an open source machine learning
and data analysis platform released in 2015 and built on top
of the Apache Spark¹² project [13], a fast cluster computing
system. Similar to Apache Mahout, it supports processing
of large scale distributed datasets and training of machine
learning models across a cluster of commodity hardware. To
this end, it includes classification, regression, clustering
and other machine learning algorithms [14].
B. Deep Learning
While the software libraries mentioned in the previous
section are useful for a great variety of different machine
learning and statistical analysis tasks, the following paragraphs
list software frameworks especially effective in training deep
learning models.
The first and oldest framework in our list suited to the
development and training of deep neural networks is Torch¹³,
released as early as 2002 [6]. Torch originally consisted of
a pure C++ implementation and interface. Today, its core
is implemented in C/CUDA while it exposes an interface
in the Lua¹⁴ scripting language. For this, Torch makes use
of the LuaJIT (just-in-time) compiler to connect Lua routines
to the underlying C implementations. It includes, inter alia,
numerical optimization routines, neural network models as
well as general purpose n-dimensional array (tensor) objects.
Theano¹⁵, released in 2008 [5], is another noteworthy deep
learning library. We note that while Theano enjoys great
popularity among the machine learning community, it is, in
essence, not a machine learning library at all. Rather, it is a
⁷ http://mahout.apache.org
⁸ http://www.apache.org
⁹ http://hadoop.apache.org
¹⁰ http://www.clips.ua.ac.be/pages/pattern
¹¹ http://spark.apache.org/mllib
¹² http://spark.apache.org/
¹³ http://torch.ch
¹⁴ https://www.lua.org
¹⁵ http://deeplearning.net/software/theano/