Self-taught Learning: Transfer Learning from Unlabeled Data
Rajat Raina rajatr@cs.stanford.edu
Alexis Battle ajbattle@cs.stanford.edu
Honglak Lee hllee@cs.stanford.edu
Benjamin Packer bpacker@cs.stanford.edu
Andrew Y. Ng ang@cs.stanford.edu
Computer Science Department, Stanford University, CA 94305 USA
Abstract
We present a new machine learning framework called “self-taught
learning” for using unlabeled data in supervised classification tasks.
We do not assume that the unlabeled data follows the same class labels
or generative distribution as the labeled data. Thus, we would like to
use a large number of unlabeled images (or audio samples, or text
documents) randomly downloaded from the Internet to improve performance
on a given image (or audio, or text) classification task. Such
unlabeled data is significantly easier to obtain than in typical
semi-supervised or transfer learning settings, making self-taught
learning widely applicable to many practical learning problems. We
describe an approach to self-taught learning that uses sparse coding to
construct higher-level features using the unlabeled data. These
features form a succinct input representation and significantly improve
classification performance. When using an SVM for classification, we
further show how a Fisher kernel can be learned for this
representation.
Appearing in Proceedings of the 24th International Conference on
Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the
author(s)/owner(s).
1. Introduction
Labeled data for machine learning is often very difficult and expensive
to obtain, and thus the ability to use unlabeled data holds significant
promise in terms of vastly expanding the applicability of learning
methods. In this paper, we study a novel use of unlabeled data for
improving performance on supervised learning tasks. To motivate our
discussion, consider as a running example the computer vision task of
classifying images of elephants and rhinos. For this task, it is
difficult to obtain many labeled examples of elephants and rhinos;
indeed, it is difficult even to obtain many unlabeled examples of
elephants and rhinos. (In fact, we find it difficult to envision a
process for collecting such unlabeled images that does not immediately
also provide the class labels.) This makes the classification task
quite hard with existing algorithms for using labeled and unlabeled
data, including most semi-supervised learning algorithms such as the
one by Nigam et al. (2000). In this paper, we ask how unlabeled images
from other object classes, which are much easier to obtain than images
specifically of elephants and rhinos, can be used. For example, given
unlimited access to unlabeled, randomly chosen images downloaded from
the Internet (probably none of which contain elephants or rhinos), can
we do better on the given supervised classification task?
Our approach is motivated by the observation that even many randomly
downloaded images will contain basic visual patterns (such as edges)
that are similar to those in images of elephants and rhinos. If,
therefore, we can learn to recognize such patterns from the unlabeled
data, these patterns can be used for the supervised learning task of
interest, such as recognizing elephants and rhinos. Concretely, our
approach learns a succinct, higher-level feature representation of the
inputs using unlabeled data; this representation makes the
classification task of interest easier.
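The idea of learning a representation from unlabeled data and then reusing it on the labeled task can be illustrated with a minimal sketch. It assumes a standard L1-regularized sparse coding objective, solved here by alternating ISTA steps for the activations with a ridge-stabilized least-squares update for the bases; this solver, the synthetic data, and the names `soft_threshold`, `sparse_codes`, `learn_bases` and `beta` are assumptions made for illustration and are not taken from the text above.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_codes(X, B, beta, n_iter=100):
    """Approximately minimize 0.5*||X - A B||_F^2 + beta*||A||_1 over A (ISTA).

    X: (n, d) inputs; B: (k, d) bases stored as rows. Returns A: (n, k).
    """
    # Step size 1/L, where L is the squared spectral norm of B.
    L = np.linalg.norm(B, 2) ** 2 + 1e-12
    A = np.zeros((X.shape[0], B.shape[0]))
    for _ in range(n_iter):
        grad = (A @ B - X) @ B.T          # gradient of the quadratic term
        A = soft_threshold(A - grad / L, beta / L)
    return A

def learn_bases(X_unlabeled, k, beta, n_outer=10, seed=0):
    """Alternate between sparse activations and a least-squares basis update."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((k, X_unlabeled.shape[1]))
    B /= np.linalg.norm(B, axis=1, keepdims=True)
    for _ in range(n_outer):
        A = sparse_codes(X_unlabeled, B, beta)
        # Small ridge term keeps the system solvable if a basis is unused.
        G = A.T @ A + 1e-6 * np.eye(k)
        B = np.linalg.solve(G, A.T @ X_unlabeled)
        # Rescale any basis with norm above 1 back onto the unit ball.
        B /= np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1.0)
    return B

# Synthetic stand-in data (hypothetical): sparse mixtures of 8 hidden patterns.
rng = np.random.default_rng(0)
true_bases = rng.standard_normal((8, 16))

def sample(n):
    acts = rng.standard_normal((n, 8)) * (rng.random((n, 8)) < 0.25)
    return acts @ true_bases + 0.01 * rng.standard_normal((n, 16))

X_unlabeled = sample(300)   # plentiful unlabeled inputs
X_labeled = sample(20)      # scarce labeled inputs (labels omitted here)

B = learn_bases(X_unlabeled, k=8, beta=0.1)      # uses unlabeled data only
features = sparse_codes(X_labeled, B, beta=0.1)  # new inputs for a classifier
print(features.shape)  # (20, 8)
```

In this sketch the bases are fit without ever looking at the labeled set; the labeled inputs are only encoded against them afterwards, and the resulting `features` array would then be passed to whatever supervised classifier is used.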
Although we use computer vision as a running example, the problem that
we pose to the machine learning community is more general. Formally, we
consider solving a supervised learning task given labeled and unlabeled
data, where the unlabeled data does not share the class labels or the
generative distribution of the labeled data. For example, given
unlimited access to natural sounds (audio), can we perform better
speaker identification? Given unlimited access to news articles (text),
can we perform better email foldering of “ICML reviewing” vs. “NIPS
reviewing” emails?
Like semi-supervised learning (Nigam et al., 2000), our algorithms will
therefore use labeled and unlabeled data. But unlike semi-supervised
learning as it is typically studied in the literature, we do not assume
that the unlabeled data can be assigned to the supervised learning
task’s class labels. To thus distinguish our formalism from such forms
of semi-supervised learning, we will call our task self-taught
learning.
There is no prior general, principled framework for
incorporating such unlabeled data into a supervised