Python scikit-learn Machine Learning in Practice: User Guide for Release 0.16.1

PDF, 54.19 MB. Updated 2024-07-22.

"scikit-learn用户手册0.16.1版" scikit-learn是Python编程语言中广泛使用的机器学习库，它基于BSD开源许可证，由David Cournapeau在2007年发起，并由一个活跃的社区志愿者团队持续维护和发展。该库提供了丰富的机器学习算法，适用于各种任务，包括分类、回归、聚类、数据降维、模型选择以及数据预处理。在使用scikit-learn解决机器学习问题时，一般遵循以下三个关键步骤： 1. 数据准备与预处理：首先，你需要导入和清洗数据，这可能涉及到数据的加载、缺失值处理、异常值检测、特征缩放（如标准化或归一化）以及特征工程等。scikit-learn支持多种数据格式，例如经典的iris数据集和LibSVM格式的数据。 2. 模型选择与训练：在预处理阶段之后，你可以选择合适的模型进行训练。scikit-learn提供了大量预训练的模型，如线性回归、逻辑回归、决策树、随机森林、支持向量机、神经网络等。使用`fit()`方法将数据拟合到模型中，进行训练。 3. 模型验证与参数调优：训练完成后，需要评估模型的性能，可以使用交叉验证、网格搜索等方法进行模型选择和参数调优。scikit-learn提供了`GridSearchCV`等工具，用于自动化地寻找最佳参数组合。手册中的主要内容涵盖： - 1. An introduction to machine learning with scikit-learn：介绍机器学习的基本概念，以及如何使用scikit-learn来处理这些问题。 - 2. A tutorial on statistical-learning for scientific data processing：深入讨论统计学习方法，包括监督学习（如分类和回归）、模型选择和参数调整、无监督学习（如聚类和降维）等。 - 3. Working With Text Data：专门针对文本数据处理的教程，涵盖了加载文本数据、提取特征、训练分类器、构建流水线、评估性能和参数调优的完整流程。手册还提供了实际示例，如20 Newsgroups数据集的处理，以及如何进行文本分类和情感分析等练习，帮助用户更好地理解和应用scikit-learn的功能。通过scikit-learn，开发者可以轻松实现机器学习算法，进行模型开发和验证，这使得它成为数据科学家和机器学习工程师的首选工具之一。不断更新的文档和强大的社区支持确保了scikit-learn能够适应不断发展的机器学习领域。

scikit-learn user guide, Release 0.16.1

>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.predict(X[0:1])
array([0])
>>> y[0]
0

In the specific case of the scikit, it may be more interesting to use joblib's replacement of pickle (joblib.dump & joblib.load), which is more efficient on big data, but can only pickle to the disk and not to a string:

>>> from sklearn.externals import joblib
>>> joblib.dump(clf, 'filename.pkl')

Later you can load back the pickled model (possibly in another Python process) with:

>>> clf = joblib.load('filename.pkl')

Note: joblib.dump returns a list of filenames. Each individual numpy array contained in the clf object is serialized as a separate file on the filesystem. All files are required in the same folder when reloading the model with joblib.load.

Note that pickle has some security and maintainability issues. Please refer to section Model persistence for more detailed information about model persistence with scikit-learn.
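The two persistence routes described above can be compared side by side. The sketch below is an illustration under stated assumptions rather than part of the guide: it assumes a recent environment where joblib is installed as a standalone package and imported with `import joblib` (the `sklearn.externals.joblib` path used in release 0.16.1 was removed from later scikit-learn releases). The file name `svc_iris.joblib` is hypothetical, and a temporary directory is used so that nothing is left on disk.

```python
import os
import pickle
import tempfile

import joblib  # assumption: standalone joblib package, not sklearn.externals
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data, iris.target
clf = svm.SVC().fit(X, y)

# Route 1: pickle round trip through an in-memory byte string.
clf_from_bytes = pickle.loads(pickle.dumps(clf))

# Route 2: joblib round trip through a file on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svc_iris.joblib")  # hypothetical name
    written_files = joblib.dump(clf, model_path)
    clf_from_disk = joblib.load(model_path)

# Both restored models should reproduce the original predictions.
same_as_pickle = np.array_equal(clf.predict(X), clf_from_bytes.predict(X))
same_as_joblib = np.array_equal(clf.predict(X), clf_from_disk.predict(X))
print(same_as_pickle, same_as_joblib)
```

As with pickle, a joblib file should only be loaded when it comes from a trusted source, and a model is safest to reload with the same library versions that were used to save it.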

