Lecture Notes in Computer Science 7700
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Germany
Madhu Sudan
Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbruecken, Germany

Grégoire Montavon
Geneviève B. Orr
Klaus-Robert Müller (Eds.)
Neural Networks:
Tricks of the Trade
Second Edition

Volume Editors
Grégoire Montavon
Technische Universität Berlin
Department of Computer Science
Franklinstr. 28/29, 10587 Berlin, Germany
E-mail: gregoire.montavon@tu-berlin.de
Geneviève B. Orr
Willamette University
Department of Computer Science
900 State Street, Salem, OR 97301, USA
E-mail: gorr@willamette.edu
Klaus-Robert Müller
Technische Universität Berlin
Department of Computer Science
Franklinstr. 28/29, 10587 Berlin, Germany
and
Korea University
Department of Brain and Cognitive Engineering
Anam-dong, Seongbuk-gu, Seoul 136-713, Korea
E-mail: klaus-robert.mueller@tu-berlin.de
ISSN 0302-9743 e-ISSN 1611-3349
ISBN 978-3-642-35288-1 e-ISBN 978-3-642-35289-8
DOI 10.1007/978-3-642-35289-8
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2012952591
CR Subject Classification (1998): F.1, I.2.6, I.5.1, C.1.3, F.2, J.3
LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues
© Springer-Verlag Berlin Heidelberg 1998, 2012
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws
and regulations and therefore free for general use.
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface to the Second Edition
There have been substantial changes in the field of neural networks since the first
edition of this book in 1998. Some of them have been driven by external factors
such as the increase of available data and computing power. The Internet made
public massive amounts of labeled and unlabeled data. The ever-increasing raw
mass of user-generated and sensed data is made easily accessible by databases
and Web crawlers. Nowadays, anyone with an Internet connection can parse
the 4,000,000+ articles available on Wikipedia and construct a dataset out of
them. Anyone can capture a Web TV stream and obtain days of video content
to test their learning algorithm.
Another development is the amount of available computing power, which has
continued to rise at a steady rate owing to progress in hardware design and
engineering. While the number of cycles per second of processors has plateaued
due to physical limitations, the slow-down has been offset by the emergence of
processing parallelism, best exemplified by massively parallel graphics
processing units (GPUs). Nowadays, everybody can buy a GPU board (usually
already available in consumer-grade laptops), install free GPU software, and
run computation-intensive simulations at low cost.
These developments have raised the following question: Can we make use of
this large computing power to make sense of these increasingly complex datasets?
Neural networks are a promising approach, as they have the intrinsic modeling
capacity and flexibility to represent the solution. Their intrinsically distributed
nature allows one to leverage the massively parallel computing resources.
During the last two decades, the focus of neural network research and the
practice of training neural networks underwent important changes. Learning in
deep architectures (or "deep learning") has to a certain degree displaced the
once more prevalent regularization issues, or, more precisely, changed the
practice of regularizing neural networks. Use of unlabeled data via unsupervised
layer-wise pretraining or deep unsupervised embeddings is now often preferred
over traditional regularization schemes such as weight decay or restricted
connectivity. This new
paradigm has started to spread over a large number of applications such as image
recognition, speech recognition, natural language processing, complex systems,
neuroscience, and computational physics.
The second edition of the book reloads the first edition with more tricks.
These tricks arose from 14 years of theory and experimentation (from 1998
to 2012) by some of the world’s most prominent neural networks researchers.
These tricks can make a substantial difference (in terms of speed, ease of im-
plementation, and accuracy) when it comes to putting algorithms to work on
real problems. Tricks may not necessarily have solid theoretical foundations or
formal validation. As Yoshua Bengio states in Chap. 19, “the wisdom distilled
here should be taken as a guideline, to be tried and challenged, not as a practice
set in stone” [1].

The second part of the new edition starts with tricks to optimize neural
networks faster and to make more efficient use of the potentially infinite stream of
data presented to them. Chapter 18 [2] shows that a simple stochastic gradi-
ent descent (learning one example at a time) is suited for training most neural
networks. Chapter 19 [1] introduces a large number of tricks and recommenda-
tions for training feed-forward neural networks and choosing their many
hyper-parameters.
When the representation built by the neural network is highly sensitive to
small parameter changes, for example, in recurrent neural networks, second-order
methods based on mini-batches such as those presented in Chap. 20 [9] can be a
better choice. The seemingly simple optimization procedures presented in these
chapters require their fair share of tricks in order to work optimally. The software
Torch7 presented in Chap. 21 [5] provides a fast and modular implementation
of these neural networks.
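
To make the per-example update described in Chap. 18 concrete, the following
minimal NumPy sketch trains a small feed-forward network by plain stochastic
gradient descent, updating the parameters after every single example. The
synthetic data, layer sizes, and learning rate are illustrative assumptions,
not recommendations taken from the chapter.

    import numpy as np

    # Minimal per-example stochastic gradient descent on a toy problem.
    # Data, layer sizes, and the learning rate are illustrative choices only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))                # 1000 examples, 20 features
    y = (X[:, :2].sum(axis=1) > 0).astype(float)   # synthetic binary labels

    W1 = rng.normal(scale=0.1, size=(20, 10)); b1 = np.zeros(10)
    W2 = rng.normal(scale=0.1, size=(10, 1));  b2 = np.zeros(1)
    lr = 0.05                                      # learning rate

    for epoch in range(5):
        for i in rng.permutation(len(X)):          # visit examples in random order
            x, t = X[i], y[i]
            h = np.tanh(x @ W1 + b1)               # forward pass, hidden layer
            p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # output probability
            # backward pass for the logistic (cross-entropy) loss
            dout = p - t
            dW2 = np.outer(h, dout); db2 = dout
            dh = (W2 @ dout) * (1.0 - h ** 2)
            dW1 = np.outer(x, dh);   db1 = dh
            # update the parameters immediately, one example at a time
            W2 -= lr * dW2; b2 -= lr * db2
            W1 -= lr * dW1; b1 -= lr * db1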
The novel second part of this volume continues with tricks to incorporate
invariance into the model. In the context of image recognition, Chap. 22 [4] shows
that translation invariance can be achieved by learning a k-means representation
of image patches and spatially pooling the k-means activations. Chapter 23 [3]
shows that invariance can be injected directly in the input space in the form
of elastic distortions. Unlabeled data are ubiquitous and using them to capture
regularities in data is an important component of many learning algorithms.
For example, we can learn an unsupervised model of data as a first step, as
discussed in Chaps. 24 [7] and 25 [10], and feed the unsupervised representation
to a supervised classifier. Chapter 26 [12] shows that similar improvements can
be obtained by learning an unsupervised embedding in the deep layers of a neural
network, with added flexibility.
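
As a concrete illustration of the k-means recipe summarized above for
Chap. 22, the sketch below learns a small dictionary of normalized image
patches with plain Lloyd's k-means and then sum-pools the centroid activations
over the four quadrants of an image. The toy images, patch size, number of
centroids, and the triangle-style activation are assumptions chosen for
brevity, not the chapter's exact pipeline.

    import numpy as np

    rng = np.random.default_rng(0)
    images = rng.normal(size=(50, 32, 32))        # toy grayscale images
    P, K = 6, 16                                  # patch size and number of centroids

    # 1) sample random patches and normalize each one
    def sample_patches(imgs, n=5000):
        out = []
        for _ in range(n):
            img = imgs[rng.integers(len(imgs))]
            r, c = rng.integers(0, 32 - P, size=2)
            p = img[r:r+P, c:c+P].ravel()
            out.append((p - p.mean()) / (p.std() + 1e-8))
        return np.array(out)

    patches = sample_patches(images)

    # 2) plain Lloyd's k-means to learn a patch dictionary
    centroids = patches[rng.choice(len(patches), K, replace=False)]
    for _ in range(10):
        d = ((patches[:, None, :] - centroids[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for k in range(K):
            if (assign == k).any():
                centroids[k] = patches[assign == k].mean(0)

    # 3) encode an image: centroid activations for every patch,
    #    sum-pooled over the four image quadrants
    def encode(img):
        feats = np.zeros((2, 2, K))
        for r in range(0, 32 - P):
            for c in range(0, 32 - P):
                p = img[r:r+P, c:c+P].ravel()
                p = (p - p.mean()) / (p.std() + 1e-8)
                d = ((centroids - p) ** 2).sum(1)
                act = np.maximum(0.0, d.mean() - d)   # "triangle"-style activation
                feats[r // 16, c // 16] += act
        return feats.ravel()                          # translation-robust descriptor

    feature_vector = encode(images[0])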
The book concludes with the application of neural networks to modeling time
series and optimal control systems. Modeling time series can be done using a very
simple technique discussed in Chap. 27 [8] that consists of fitting a linear model on
top of a “reservoir” that implements a rich set of time series primitives. Chapter 28
[13] offers an alternative to the previous method by directly identifying the underly-
ing dynamical system that generates the time series data. Chapter 29 [6] presents
how these system identification techniques can be used to identify a Markov de-
cision process from the observation of a control system (a sequence of states and
actions in the reinforcement learning terminology). Chapter 30 [11] concludes by
showing how the control system can be dynamically improved by fitting a neural
network as the control system explores the space of states and actions.
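
To illustrate the "linear model on top of a reservoir" idea of Chap. 27, the
sketch below drives a fixed random reservoir with a noisy sine wave and trains
only a ridge-regression readout for one-step-ahead prediction. The reservoir
size, spectral radius, ridge penalty, and the toy signal are assumptions made
for this example, not settings advocated in the chapter.

    import numpy as np

    # Echo-state-style sketch: the reservoir is fixed, only the readout is trained.
    # Signal, reservoir size, spectral radius, and ridge penalty are illustrative.
    rng = np.random.default_rng(0)

    T = 2000
    u = np.sin(0.1 * np.arange(T)) + 0.05 * rng.normal(size=T)   # toy time series

    N, rho, ridge = 200, 0.9, 1e-6
    W_in = rng.uniform(-0.5, 0.5, size=N)            # fixed input weights
    W = rng.normal(size=(N, N))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius

    # run the reservoir over the whole signal and record its states
    states = np.zeros((T, N))
    x = np.zeros(N)
    for t in range(T):
        x = np.tanh(W @ x + W_in * u[t])
        states[t] = x

    # ridge-regression readout: predict u[t+1] from the reservoir state at t
    warmup = 100                                     # discard initial transient
    S, target = states[warmup:-1], u[warmup + 1:]
    W_out = np.linalg.solve(S.T @ S + ridge * np.eye(N), S.T @ target)

    next_value = states[-1] @ W_out                  # one-step-ahead prediction

Because the recurrent weights stay fixed, fitting the readout is an ordinary
linear regression, which is what makes this family of methods so simple to use.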
The book intends to provide a timely snapshot of tricks, theory, and algo-
rithms that are of use. Our hope is that some of the chapters of the new second
edition will become our companions when doing experimental work—eventually
becoming classics, as some of the papers of the first edition have become. Even-
tually in some years, there may be an urge to reload again...
September 2012
Grégoire Montavon
Klaus-Robert Müller