Self-taught Learning: Transfer Learning from Unlabeled Data


Rajat Raina rajatr@cs.stanford.edu

Alexis Battle ajbattle@cs.stanford.edu

Honglak Lee hllee@cs.stanford.edu

Benjamin Packer bpacker@cs.stanford.edu

Andrew Y. Ng ang@cs.stanford.edu

Computer Science Department, Stanford University, CA 94305 USA

Abstract

We present a new machine learning framework called “self-taught learning” for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the labeled data. Thus, we would like to use a large number of unlabeled images (or audio samples, or text documents) randomly downloaded from the Internet to improve performance on a given image (or audio, or text) classification task. Such unlabeled data is significantly easier to obtain than in typical semi-supervised or transfer learning settings, making self-taught learning widely applicable to many practical learning problems. We describe an approach to self-taught learning that uses sparse coding to construct higher-level features using the unlabeled data. These features form a succinct input representation and significantly improve classification performance. When using an SVM for classification, we further show how a Fisher kernel can be learned for this representation.

1. Introduction

Labeled data for machine learning is often very difficult and expensive to obtain, and thus the ability to use unlabeled data holds significant promise in terms of vastly expanding the applicability of learning methods. In this paper, we study a novel use of unlabeled data for improving performance on supervised learning tasks. To motivate our discussion, consider as a running example the computer vision task of classifying images of elephants and rhinos. For this task, it is difficult to obtain many labeled examples of elephants and rhinos; indeed, it is difficult even to obtain many unlabeled examples of elephants and rhinos. (In fact, we find it difficult to envision a process for collecting such unlabeled images that does not immediately also provide the class labels.) This makes the classification task quite hard with existing algorithms for using labeled and unlabeled data, including most semi-supervised learning algorithms such as the one by Nigam et al. (2000). In this paper, we ask how unlabeled images from other object classes, which are much easier to obtain than images specifically of elephants and rhinos, can be used. For example, given unlimited access to unlabeled, randomly chosen images downloaded from the Internet (probably none of which contain elephants or rhinos), can we do better on the given supervised classification task?

Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).

Our approach is motivated by the observation that even many randomly downloaded images will contain basic visual patterns (such as edges) that are similar to those in images of elephants and rhinos. If, therefore, we can learn to recognize such patterns from the unlabeled data, these patterns can be used for the supervised learning task of interest, such as recognizing elephants and rhinos. Concretely, our approach learns a succinct, higher-level feature representation of the inputs using unlabeled data; this representation makes the classification task of interest easier.
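The idea of learning features from unlabeled inputs and then reusing them on labeled inputs can be sketched in a few lines. The abstract names sparse coding as the feature learner, but this excerpt gives no formulation, so the sketch below assumes a standard L1-penalized reconstruction objective, minimizing ||x - sum_j a_j b_j||^2 + beta * ||a||_1 with each basis b_j kept at norm at most 1. The solvers (ISTA for the activations, a projected gradient step for the bases) are simple stand-ins and are not claimed to be the authors' algorithm. All function and parameter names (`sparse_code`, `learn_bases`, `beta`, `lr`) are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct(bases, a):
    """Return sum_j a[j] * bases[j]."""
    dim = len(bases[0])
    return [sum(a[j] * bases[j][d] for j in range(len(bases))) for d in range(dim)]


def project_to_unit_ball(v):
    """Rescale v so that its Euclidean norm is at most 1."""
    norm = math.sqrt(dot(v, v))
    return list(v) if norm <= 1.0 else [vi / norm for vi in v]


def sparse_code(x, bases, beta, steps=100):
    """Approximately minimize ||x - sum_j a_j b_j||^2 + beta * ||a||_1 over a (ISTA)."""
    num_bases = len(bases)
    # Assumed-safe step size: with every ||b_j|| <= 1, the gradient of the
    # squared error is Lipschitz with constant at most 2 * num_bases.
    eta = 1.0 / (2.0 * num_bases)
    a = [0.0] * num_bases
    for _ in range(steps):
        resid = [xi - ri for xi, ri in zip(x, reconstruct(bases, a))]
        for j in range(num_bases):
            # Gradient step on the squared error, then soft-thresholding for the L1 term.
            v = a[j] + 2.0 * eta * dot(bases[j], resid)
            a[j] = math.copysign(max(abs(v) - eta * beta, 0.0), v)
    return a


def learn_bases(unlabeled, num_bases, beta, iters=10, lr=0.1, seed=0):
    """Learn bases from unlabeled inputs only; no class labels are used here."""
    rng = random.Random(seed)
    dim = len(unlabeled[0])
    bases = [project_to_unit_ball([rng.gauss(0.0, 1.0) for _ in range(dim)])
             for _ in range(num_bases)]
    for _ in range(iters):
        for x in unlabeled:
            a = sparse_code(x, bases, beta)
            resid = [xi - ri for xi, ri in zip(x, reconstruct(bases, a))]
            # Stochastic gradient step on each basis (the factor 2 is folded into lr),
            # followed by projection back onto the norm constraint.
            bases = [project_to_unit_ball([b + lr * a[j] * r
                                           for b, r in zip(bases[j], resid)])
                     for j in range(num_bases)]
    return bases
```

In use, `learn_bases` would be run once on the unlabeled inputs, and `sparse_code(x, bases, beta)` would then map each labeled input to an activation vector that serves as its feature representation for whatever supervised classifier is chosen.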

Although we use computer vision as a running example, the problem that we pose to the machine learning community is more general. Formally, we consider solving a supervised learning task given labeled and unlabeled data, where the unlabeled data does not share the class labels or the generative distribution of the labeled data. For example, given unlimited access to natural sounds (audio), can we perform better speaker identification? Given unlimited access to news articles (text), can we perform better email foldering of “ICML reviewing” vs. “NIPS reviewing” emails?
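One way to write this setting down is given below. The symbols (T, U, m, k, n, C) are notation assumed here for illustration; the text above does not define them.

```latex
\begin{align*}
\text{Labeled data:}\quad
  & T = \{(x_l^{(1)}, y^{(1)}), \ldots, (x_l^{(m)}, y^{(m)})\},
  \quad x_l^{(i)} \in \mathbb{R}^n,\; y^{(i)} \in \{1, \ldots, C\} \\
\text{Unlabeled data:}\quad
  & U = \{x_u^{(1)}, \ldots, x_u^{(k)}\},
  \quad x_u^{(j)} \in \mathbb{R}^n
\end{align*}
```

The only link assumed between the two sets is the shared input space (for example, image patches of the same size). The unlabeled inputs need not be drawn from the distribution that generated the labeled inputs, and they need not belong to any of the C classes.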

Like semi-supervised learning (Nigam et al., 2000), our algorithms will therefore use labeled and unlabeled data. But unlike semi-supervised learning as it is typically studied in the literature, we do not assume that the unlabeled data can be assigned to the supervised learning task’s class labels. To thus distinguish our formalism from such forms of semi-supervised learning, we will call our task self-taught learning.

There is no prior general, principled framework for incorporating such unlabeled data into a supervised
