A Comprehensive Survey of Neural Architecture Search:
Challenges and Solutions
PENGZHEN REN∗ and YUN XIAO∗, Northwest University
XIAOJUN CHANG, Monash University
PO-YAO HUANG, Carnegie Mellon University
ZHIHUI LI, University of New South Wales
XIAOJIANG CHEN and XIN WANG, Northwest University
Deep learning has made major breakthroughs and progress in many fields, owing to its powerful
automatic representation capabilities. It has been proved that the design of the network
architecture is crucial to the feature representation of data and the final performance. In order to obtain a good
feature representation of data, researchers have designed various complex network architectures. However, the
design of the network architecture relies heavily on the researchers' prior knowledge and experience. Due to
the limitations of humans' inherent knowledge, it is difficult for people to jump out of their original thinking
paradigm and design an optimal model. Therefore, a natural idea is to reduce human intervention as much as
possible and let the algorithm automatically design the network architecture, thus moving a step further
toward strong artificial intelligence.
In recent years, a large number of related algorithms for Neural Architecture Search (NAS) have emerged.
They have made various improvements to the NAS algorithm, and the related research work is complex
and rich. In order to reduce the difficulty for beginners to conduct NAS-related research, a comprehensive and
systematic survey on NAS is essential. Previous related surveys classified existing work mainly
by the basic components of NAS: search space, search strategy and evaluation strategy. This classification
method is intuitive, but it makes it difficult for readers to grasp the challenges and the landmark work along
the way. Therefore, in this survey, we provide a new perspective: starting with an overview of the characteristics
of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then giving
the solutions adopted by subsequent related research. In addition, we conduct a detailed and comprehensive
analysis, comparison and summary of these works. Finally, we give possible future research directions.
CCS Concepts: • Computing methodologies → Machine learning algorithms.
Additional Key Words and Phrases: Neural Architecture Search, AutoDL, Modular Search Strategy, Continuous
Search Space, Network Architecture Recycle, Incomplete Training.
ACM Reference Format:
Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, and Xin Wang. 2020. A
Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Comput. Surv. 37, 4,
Article 111 (August 2020), 30 pages. https://doi.org/10.1145/1122445.1122456
∗Both authors contributed equally to this research.
Authors’ addresses: Pengzhen Ren, pzhren@foxmail.com; Yun Xiao, yxiao@nwu.edu.cn, Northwest University; Xiaojun
Chang, Monash University; Po-Yao Huang, Carnegie Mellon University; Zhihui Li, University of New South Wales; Xiaojiang
Chen; Xin Wang, Northwest University.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2020 Association for Computing Machinery.
0360-0300/2020/8-ART111 $15.00
https://doi.org/10.1145/1122445.1122456
1 INTRODUCTION
Deep learning has already demonstrated strong learning capabilities in many fields: machine translation
[1–3], image recognition [4, 6, 7] and object detection [8–10]. This is mainly due to the powerful
automatic feature extraction capabilities of deep learning for unstructured data. Deep learning has
transformed the traditional way of manually designing features [13, 14] into automatic extraction
[4, 29, 30]. This allows researchers to focus on the design of the neural architecture [11, 12, 19].
However, the design of the neural architecture relies heavily on the researchers' prior knowledge
and experience, which makes it difficult for beginners to make reasonable modifications to the
network architecture according to their actual needs. In addition, humans' existing prior knowledge
and fixed thinking paradigm are likely to limit the discovery of new network architectures to a
certain extent.
As a result, Neural Architecture Search (NAS) came into being. NAS aims to design a network
architecture with the best performance using limited computing resources in an automated way with
as little human intervention as possible. The work of NAS-RL [11] and MetaQNN [12] is considered
pioneering work on NAS. The network architectures they obtained using reinforcement learning (RL)
methods reached state-of-the-art classification accuracy on image classification tasks. This
shows that the idea of automated network architecture design is feasible. Subsequently, Large-scale
Evolution [15] once again verified the feasibility of this idea, using evolutionary learning to achieve
similar results. However, these methods consumed hundreds of GPU-days or even more computing
resources, an amount of computation that is almost prohibitive for ordinary researchers. Therefore,
a lot of work has emerged on how to reduce the amount of computation and accelerate the search for
the network architecture [18–20, 48, 49, 52, 84, 105]. With the improvement of NAS search efficiency,
NAS has also quickly been applied to object detection [65, 75, 111, 118], semantic segmentation
[63, 64, 120], adversarial learning [53], architecture scaling [114, 122, 124], multi-objective
optimization [39, 115, 125], platform-aware search [28, 34, 103, 117], data augmentation [121, 123]
and so on. In addition, some work considers how to strike a balance between performance and efficiency
[116, 119]. Although NAS-related research is already abundant, it is still difficult to compare and
reproduce NAS methods [127], because different NAS methods differ in search space, hyperparameters,
tricks, etc. Some work is therefore devoted to providing a unified evaluation platform for popular
NAS methods [78, 126].
With the deepening and rapid development of NAS-related research, some methods previously
accepted by researchers have been shown to be imperfect by newer research, and improved solutions
have soon followed. For example, early NAS trained each candidate network architecture from
scratch during the architecture search phase, leading to a surge in computation [11, 12]. ENAS [19]
proposes to accelerate the architecture search process by using a parameter-sharing strategy. This
strategy avoids training each subnet from scratch and instead forces all subnets to share weights,
thereby greatly reducing the time needed to obtain the best-performing subnet from a large number
of candidate networks. Due to the superiority of ENAS in search efficiency, the weight-sharing
strategy was quickly adopted by a large number of researchers [23, 53, 54]. However, new research
soon found that the widely accepted weight-sharing strategy is likely to lead to an inaccurate
ranking of candidate architectures [24]. This makes it difficult for NAS to select the optimal
network architecture from a large number of candidates, thereby further degrading the performance
of the finally searched architecture. Shortly afterwards, DNA [21] modularized the large NAS search
space into blocks, so that candidate architectures can be fully trained and the representation-shift
problem caused by weight sharing is reduced. In addition, GDAS-NSAS [25] proposes a Novelty Search
based Architecture Selection (NSAS) loss function to address the multi-model forgetting problem
caused by weight sharing during supernet training (when weight sharing is used to sequentially
train new network architectures, the performance of previously trained architectures degrades).

Fig. 1. The general framework of NAS. NAS generally starts with a set of predefined operation sets, and uses
search strategies to obtain a large number of candidate network architectures based on the search space
formed by the operation sets. The candidate network architectures are trained and ranked. Then, the search
strategy is adjusted according to the ranking information of the candidate network architectures, thereby
obtaining a set of new candidate network architectures. When the search terminates, the most promising
network architecture is selected as the final optimal network architecture and used for the final performance
evaluation.
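To make the weight-sharing idea concrete, below is a minimal PyTorch-style sketch (our own illustration, not the ENAS implementation; the class name SharedSupernet, the candidate operations and the arch argument are assumptions): every candidate architecture is a path through a small set of shared operations, so evaluating a new subnet reuses weights that earlier subnets have already trained instead of training from scratch.

```python
import torch
import torch.nn as nn

class SharedSupernet(nn.Module):
    """Toy weight-sharing supernet: each layer holds a few candidate
    operations, and a sub-network is one choice of operation per layer."""

    def __init__(self, channels=16, num_layers=4):
        super().__init__()
        # Candidate operations are created once and shared by all subnets.
        self.layers = nn.ModuleList([
            nn.ModuleDict({
                "conv3x3": nn.Conv2d(channels, channels, 3, padding=1),
                "conv5x5": nn.Conv2d(channels, channels, 5, padding=2),
                "identity": nn.Identity(),
            })
            for _ in range(num_layers)
        ])

    def forward(self, x, arch):
        # `arch` lists one operation name per layer, e.g. ["conv3x3", ...].
        for layer, op_name in zip(self.layers, arch):
            x = layer[op_name](x)
        return x

supernet = SharedSupernet()
x = torch.randn(2, 16, 8, 8)
# Two different candidate subnets reuse the same underlying weights.
out_a = supernet(x, ["conv3x3", "identity", "conv5x5", "conv3x3"])
out_b = supernet(x, ["conv5x5", "conv3x3", "identity", "identity"])
```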
Similar research clues are very common in the rapidly developing NAS research field, so a
comprehensive and systematic survey based on challenges and solutions is very useful for NAS
research. Previous related surveys classified existing work mainly according to the basic components
of NAS: search space, search strategy and evaluation strategy [26, 27]. This classification method is
intuitive, but it is not conducive to readers capturing the research clues. Therefore, in this
survey, we will first summarize the characteristics and corresponding challenges of the early NAS
methods. Based on these challenges, we summarize and categorize existing research in
order to show readers a comprehensive and systematic overview based on challenges and solutions.
Finally, we compare the performance of existing research work, and give possible future
research directions and some thoughts.
2 CHARACTERISTICS OF EARLY NAS
In this section, we summarize the general framework of early NAS methods, their characteristics,
and the challenges facing subsequent NAS research.
We summarize the general framework of NAS in Fig. 1. NAS usually starts with a
set of predefined operation sets and uses a search strategy to obtain a large number of candidate
network architectures based on the search space formed by these operation sets. The candidate
network architectures are then trained on the training set and ranked according to their accuracy
on the validation set. This ranking information is used as feedback to adjust the search strategy,
thereby obtaining a set of new candidate network architectures. When the termination condition is
reached, the search stops and the best network architecture is selected; its performance is then
evaluated on the test set.
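To make this loop concrete, the following is a minimal Python sketch (our own illustration; the operation set, sample_architecture and evaluate_on_validation are hypothetical placeholders): a search strategy proposes candidates, the candidates are scored on a validation set, and the ranking decides which architecture is kept.

```python
import random

# Hypothetical helpers standing in for real training and evaluation.
def sample_architecture(op_set, depth=4):
    # Search strategy: here, plain random sampling from the search space.
    return [random.choice(op_set) for _ in range(depth)]

def evaluate_on_validation(arch):
    # Placeholder: a real system trains `arch` on the training set and
    # returns its accuracy on the validation set.
    return random.random()

def nas_search(op_set, num_rounds=10, num_candidates=8):
    best_arch, best_acc = None, float("-inf")
    for _ in range(num_rounds):
        # Obtain a batch of candidate architectures from the search space.
        candidates = [sample_architecture(op_set) for _ in range(num_candidates)]
        # Train and rank the candidates by validation accuracy.
        scored = sorted(((evaluate_on_validation(a), a) for a in candidates),
                        reverse=True)
        acc, arch = scored[0]
        if acc > best_acc:
            best_arch, best_acc = arch, acc
        # A real NAS method would also feed the ranking back into the
        # search strategy (e.g. update an RNN controller or a population).
    return best_arch

best = nas_search(["conv3x3", "conv5x5", "maxpool", "identity"])
print(best)
```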
Early NAS also roughly followed the above process [11, 12, 15, 16]. The idea of NAS-RL [11]
comes from the simple observation that the architecture of a neural network can be described
as a variable-length string. Therefore, a natural idea is to use an RNN as a controller to
generate such a string, and then use RL to optimize the controller, finally obtaining a satisfactory
network architecture. MetaQNN [12] regards the selection process of the network architecture
as a Markov decision process, and uses Q-learning to record rewards, so as to obtain the optimal
network architecture. Large-scale Evolution [15] aims to automatically learn an optimal network
architecture using evolutionary algorithms (EA) while reducing human intervention as much as
possible. It initializes a large population with the simplest network structures, and obtains the best
network architecture by reproducing, mutating, and selecting within the population. GeNet [16] also
uses EA; it proposes a new neural network architecture encoding scheme, which represents the network
architecture as a fixed-length binary string. It randomly initializes a group of individuals, uses a
predefined set of genetic operations to modify the binary strings to generate new individuals, and
finally selects the most competitive individual as the final network architecture.
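As a rough illustration of the fixed-length binary encoding idea used by GeNet [16], the sketch below (an assumption-laden toy, not GeNet's actual encoding scheme, fitness function or genetic operators) evolves a population of bit strings by selection and mutation:

```python
import random

CODE_LENGTH = 12  # length of the fixed-length binary encoding (assumed)

def random_individual():
    return [random.randint(0, 1) for _ in range(CODE_LENGTH)]

def mutate(code, rate=0.1):
    # Genetic operation: flip each bit with a small probability.
    return [1 - bit if random.random() < rate else bit for bit in code]

def fitness(code):
    # Placeholder: a real system decodes the bit string into a network,
    # trains it, and returns its validation accuracy.
    return random.random()

population = [random_individual() for _ in range(20)]
for generation in range(10):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:5]                                    # selection
    children = [mutate(random.choice(parents)) for _ in range(15)]
    population = parents + children                         # next generation

best = max(population, key=fitness)
```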
These early NAS methods made the automatic generation of network architectures a reality. In order to
understand the reasons restricting the widespread use of early NAS, we summarize the common
characteristics of early NAS work, from today's perspective, as follows:
• Global search strategy.
It requires the NAS to use a search strategy to search all necessary
components of the network architecture. This means that NAS needs to find an optimal
network architecture within a huge search space. Obviously, the larger the search space, the
higher the corresponding search cost.
• Discrete search space.
It regards the differences between different network architectures
as a limited set of basic operations, that is, by discretely modifying an operation to change
the network architecture. This means that we cannot use the gradient strategy to quickly
adjust the network architecture.
• Search from scratch.
The model is built from scratch until the final network architecture is
generated. Obviously, this method wastes the existing network architecture design experience
and cannot utilize the existing excellent network architecture.
• Fully trained.
It requires training each candidate network architecture from scratch to
convergence. We know that there is a similar network structure between the subsequent
network architecture and the previous network architecture, as well as between the network
architectures at the same stage. Therefore, training each candidate network architecture from
scratch obviously does not fully utilize this relationship. In addition, we only need to obtain
the relative performance ranking of the candidate architecture. Whether it is necessary to
train each candidate architecture to convergence is also a question worth considering.
The search space is determined by the predefined operation set and the hyperparameters of
the network architecture (for example: an architectural template, the connection method, the number
of channels of the convolutional layers used for feature extraction in the initial stage, etc.). These
parameters define which network architectures can be searched by the NAS. Fig. 2 shows examples
of two common global search spaces with a chain structure in early NAS work. $o_i$ is an operation in
the candidate operation set and the $i$-th operation in the chain structure. The feature map generated
by $o_i$ is represented as $z^{(i)}$. The input goes through a series of operations to get the final output.
Fig. 2 (left) shows the simplest example of a chain structure, as used in MetaQNN [12]. In this case, any feature
map $z^{(i)}$ has only one input node $z^{(i-1)}$, and
$$z^{(i)} = o_i\left(z^{(i-1)}\right). \qquad (1)$$
Fig. 2 (right) shows the example after adding skip connections [11, 15, 16]. In this case, any feature map
$z^{(i)}$ can have multiple inputs, and
$$z^{(i)} = o_i\left(\left\{z^{(i-1)}\right\} \odot \left\{z^{(k)} \,\middle|\, \alpha_{k,i} = 1,\ k < i-1\right\}\right), \qquad (2)$$
where $\odot$ can be a sum operation or a merge (concatenation) operation, and $\alpha_{k,i} = 1$ indicates a
skip connection from $z^{(k)}$ to the input of $o_i$. For example, $\odot$ is a merge operation in
NAS-RL [11] and a sum operation in GeNet [16]. It should be pointed out that NASNet [31]
considered both operations in its experiments, and the results show that the sum
operation is better than the merge operation. Therefore, much subsequent work has taken
summation as the method of connecting the feature maps obtained from different
branch operations [17, 36, 37]. Similar to the chain structure, MnasNet [28] suggests searching for a
network architecture composed of multiple segments connected in sequence, with each segment having
its own repeated structure.

Fig. 2. Two common global search spaces with a chain structure in early NAS work. Left: the simplest example
of a chain structure. Right: the example after adding skip connections. $o_i$ is an operation in the candidate set
of operations and the $i$-th operation in the chain structure. The feature map generated by $o_i$ is represented as
$z^{(i)}$. The input goes through a series of operations to get the final output.
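To illustrate Eq. (1) and Eq. (2), the sketch below (our own illustration; the operation list, the matrix alpha and the helper chain_forward are assumptions) computes the feature maps of a chain-structured network with optional skip connections, using summation as the $\odot$ operation:

```python
import torch
import torch.nn as nn

def chain_forward(x, ops, alpha):
    """Forward pass of a chain structure with optional skip connections.

    ops:   list of modules; ops[i - 1] plays the role of o_i.
    alpha: (len(ops)+1) x (len(ops)+1) binary matrix; alpha[k][i] == 1 means
           a skip connection from z^(k) into the input of o_i (k < i - 1).
    Summation is used as the combination operation (the ⊙ in Eq. (2)).
    """
    z = [x]  # z[0] is the input feature map z^(0)
    for i in range(1, len(ops) + 1):
        inp = z[i - 1]                 # z^(i-1) is always an input
        for k in range(i - 1):         # k < i - 1
            if alpha[k][i] == 1:
                inp = inp + z[k]       # add skip-connected feature maps
        z.append(ops[i - 1](inp))      # z^(i) = o_i(...)
    return z[-1]

channels = 8
ops = [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(4)]
n = len(ops)
alpha = [[0] * (n + 1) for _ in range(n + 1)]
alpha[0][3] = 1  # skip connection from the input z^(0) into o_3

x = torch.randn(1, channels, 16, 16)
out = chain_forward(x, ops, alpha)
```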
In addition, in early NAS, searching from scratch was a commonly adopted strategy.
NAS-RL [11] expresses the network architecture as a variable-length string, which is generated
by an RNN controller. The corresponding network architecture is then built according to this
string, and reinforcement learning is used as the search strategy to adjust the architecture
search. MetaQNN [12] trains an agent to sequentially select the layer structure of the neural
network in the search space constructed by the predefined operation set. It regards the layer
selection process as a Markov decision process, and uses Q-learning as the search strategy to
adjust the agent's selection behavior. Similar to NAS-RL [11], GeNet [16] also adopts the idea
of encoding the network structure. The difference is that in GeNet [16], the network architecture
is represented as a fixed-length binary code, which is regarded as the DNA of the network
architecture. The population is initialized randomly, evolutionary learning is then used to
reproduce, mutate and select within the population, and the best individual is selected
iteratively. It can be seen from the above analysis that these methods do not make use of
existing, well-designed manual network architectures, but instead search the network architecture
from scratch. Even more simply, Large-scale Evolution [15] uses only a single-layer model without
convolution as the starting point for individual evolution. Evolutionary learning methods are then
used to evolve the population, and the most competitive individuals in the population are selected.
We take Large-scale Evolution [15] as an example and show an example of searching from scratch
in Fig. 3.
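As a rough illustration of treating layer selection as a Markov decision process in the spirit of MetaQNN [12], the sketch below (a toy with an assumed layer set, reward placeholder and hyperparameters, not MetaQNN's actual algorithm or exploration schedule) uses tabular Q-learning where the state is the current depth and the action is the layer type:

```python
import random
from collections import defaultdict

# Assumed layer set, maximum depth, reward placeholder and hyperparameters.
LAYER_TYPES = ["conv3x3", "conv5x5", "maxpool", "fc"]
MAX_DEPTH = 4
ALPHA, GAMMA, EPSILON = 0.1, 1.0, 0.2

def reward(architecture):
    # Placeholder: a real system trains the architecture and returns
    # its validation accuracy as the terminal reward.
    return random.random()

Q = defaultdict(float)  # Q[(depth, layer_type)]

for episode in range(200):
    arch = []
    for depth in range(MAX_DEPTH):
        if random.random() < EPSILON:   # explore
            action = random.choice(LAYER_TYPES)
        else:                           # exploit the current Q-values
            action = max(LAYER_TYPES, key=lambda a: Q[(depth, a)])
        arch.append(action)
    r = reward(arch)                    # reward only at the end of the episode
    for depth, action in enumerate(arch):
        if depth == MAX_DEPTH - 1:
            target = r                  # terminal step: use the reward
        else:
            target = GAMMA * max(Q[(depth + 1, a)] for a in LAYER_TYPES)
        Q[(depth, action)] += ALPHA * (target - Q[(depth, action)])

# Greedy architecture after training the Q-table.
best_arch = [max(LAYER_TYPES, key=lambda a: Q[(d, a)]) for d in range(MAX_DEPTH)]
```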
The common characteristics of these early NAS works are also the collective challenges faced
by the automatic generation of network architectures. Based on the above challenges, we
summarize the solutions proposed in subsequent NAS-related research work in Section 3.