ISSCC 2019: Digest of the World's Leading-Edge Integrated Circuit Technology

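"ISSCC2019-Digest.pdf" is the digest of technical papers of the 2019 IEEE International Solid-State Circuits Conference (ISSCC), sponsored by the IEEE Solid-State Circuits Society. The conference gathers leading-edge work from around the world, with a focus on the latest memory technology, high-speed design, analog amplifier design, and power amplifiers. It was held February 17-21, 2019. The digest is volume 62 of the series and is a key reference for that year's solid-state circuit technology.

Attendees can learn about the latest progress in memory and storage technology. Storage is a foundation of information technology and covers computer memory, flash memory, hard disk drives, and more. New storage technologies may offer higher density, faster reads and writes, lower energy consumption, or entirely new storage mechanisms such as quantum or bio-inspired storage, all of which can improve the performance and efficiency of electronic devices.

High-speed design is another key area, spanning communication systems, data centers, and high-performance computing. It typically involves signal integrity, power integrity, and electromagnetic compatibility, and studies how to keep a system stable and reliable as speed increases. ISSCC 2019 may present new high-speed interface technologies, high-speed digital circuit design methods, and new solutions to the challenges of high-speed design.

Analog amplifier design is a core technology in electronic engineering, widely used in sensor signal processing, audio systems, and image sensors. An amplifier design usually has to balance gain, bandwidth, noise, and power consumption. The digest can be expected to include new analog amplifier architectures and high-performance amplifiers optimized for specific applications.

Power amplifiers play a critical role in wireless communication because they determine the transmit power and efficiency of a wireless device. Under the tag "power wireless", the conference can be expected to cover the latest research on wireless power amplifiers, including more efficient designs, linearization techniques, and new techniques for 5G and other next-generation wireless standards.

"ISSCC2019-Digest.pdf" presents the latest progress made by scientists and engineers worldwide in analog and digital integrated circuits. Attendees and readers can draw on these papers for state-of-the-art research results and a deeper understanding of existing technology, which makes the digest a valuable resource for both academic research and industrial product development.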


ISSCC 2019 / February 18, 2019 / 8:45 AM

3.2.1 Dynamic Networks, Differentiable Programming: A relatively recent

concept in DL is the idea of dynamic networks. Regular DL systems use a

static network of parameterized modules. But, in increasingly many

applications, the network architecture is dynamic and changes for every new

data point. In effect, dynamic DL systems can be seen as the execution trace

of a program, with conditionals and loops that are input-dependent. DL

frameworks such as PyTorch record a “tape” of this execution trace, which

can be played backwards to back-propagate gradients through the program.

This method is known as “autograd”. The phrase “differentiable programming”

designates the process of writing a program with calls to parameterized

functions that automatically compute the gradient of the function’s output

with respect to the parameters, allowing the function to be finalized through

learning. Dynamic networks are particularly useful in a variety of applications:

for natural-language processing, for data that does not come in the form of a

fixed-size tensor, for systems that need to activate parts of a large network

on demand in a data-dependent way (such as the Multi-scale DenseNet

architecture [47] shown in Figure 1.1.4) and for “reasoning” networks whose

output is another network specifically designed to answer a particular question

[48, 49] (see Figure 1.1.5).
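
To make the “tape” idea concrete, here is a minimal sketch in plain Python. It is not how PyTorch is implemented; the names Var, record, mul, model, and backward are hypothetical and chosen only for illustration. The loop in model runs a number of times that depends on the input value, so the recorded trace differs from one data point to the next.

```python
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (input Var, local derivative)
        self.grad = 0.0


tape = []  # Vars in creation order: the execution trace of the program


def record(value, parents=()):
    """Create a Var and append it to the tape."""
    node = Var(value, parents)
    tape.append(node)
    return node


def mul(a, b):
    """Multiply two Vars and record the local derivatives."""
    return record(a.value * b.value, ((a, b.value), (b, a.value)))


def model(x, w):
    """Repeatedly multiply by w; the loop count depends on the input."""
    h = x
    steps = 0
    while abs(h.value) < 10.0 and steps < 20:
        h = mul(h, w)
        steps += 1
    return h, steps


def backward(output):
    """Play the tape backwards, accumulating gradients by the chain rule."""
    output.grad = 1.0
    for node in reversed(tape):
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad


x = record(1.5)
w = record(2.0)
y, steps = model(x, w)
backward(y)
print(steps, y.value, w.grad)
```

With x = 1.5 and w = 2.0 the loop runs three times, so the program computes x·w³ = 12, and replaying the tape gives the gradient 3·x·w² = 18 with respect to w.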

3.2.2 Neural Networks on Graphs: One of the most exciting recent developments

in DL is neural networks on graphs [50]. Many problems

are difficult to represent with fixed-size tensors or variable-length

sequences of tensors, but are better represented by graphs whose arcs

and nodes are annotated by tensors. This suggests the use of networks

of differentiable modules whose inputs and outputs are annotated

graphs. The idea goes back to Graph Transformer Networks, built to

recognize character strings [18]. But recent incarnations of graph neural

networks have been applied to 3D meshes, social networks, gene-

regulation networks, and chemical molecules. Convolution operations

can easily be defined on irregular graphs: they are defined as diagonal

operators in the eigenbasis of the graph Laplacian, which plays the role of a

generalized Fourier basis. We foresee an increase in the

usage of such networks for a wide variety of applications, which are

likely to violate the assumptions of current DL hardware.
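
The spectral definition of graph convolution mentioned above can be written out as follows. This is a sketch under common conventions that are not stated in the text: an undirected graph with adjacency matrix A and degree matrix D, a signal x with one value per node, and a learnable filter θ with one coefficient per eigenvector.

```latex
L = D - A = U \Lambda U^{\top}, \qquad
\hat{x} = U^{\top} x, \qquad
g_{\theta} \star x = U \,\mathrm{diag}(\theta_1, \dots, \theta_n)\, U^{\top} x
```

Here U holds the eigenvectors of the Laplacian L, the product Uᵀx is the graph analogue of a Fourier transform, and the filter acts as a diagonal operator in that basis.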

3.2.3 Graph Embedding Networks: Increasingly, DL is used for large-scale

embedding of knowledge bases. For example, using a large

knowledge graph composed of triplets (subject, relation, object), such as

(“Barack Obama”, “was born in”, “Hawaii”), one may train a network to rate

such triplets or to predict one of the elements from the other two. A special

case of this consists in learning a vector for each object and subject, such

that a simple scalar-valued operation between the vectors (e.g., a distance)

predicts the presence or absence of a particular relation between the object

and the subject. These methods, applied on a large scale, are particularly efficient

for recommender systems, and can use hyperbolic metric spaces to

represent hierarchical categories [51].
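
As a rough illustration of the special case described above, the sketch below scores a single relation by the distance between two entity vectors. The 2-D vectors, the threshold, and the function names are all hypothetical; in a real system the vectors would be learned, and the metric could be hyperbolic rather than Euclidean.

```python
import math

# Hypothetical 2-D embeddings; in practice these vectors would be learned.
embedding = {
    "Barack Obama": (0.9, 0.1),
    "Hawaii": (1.0, 0.0),
    "Paris": (-0.8, 0.6),
}


def distance(a, b):
    """Euclidean distance between the vectors of two entities."""
    return math.dist(embedding[a], embedding[b])


def relation_holds(subject, obj, threshold=0.5):
    """Predict the relation as present when the two vectors are close."""
    return distance(subject, obj) < threshold


def rank_objects(subject, candidates):
    """Predict the missing element of a triplet: closest candidate first."""
    return sorted(candidates, key=lambda c: distance(subject, c))


print(rank_objects("Barack Obama", ["Paris", "Hawaii"]))
```

The same distance serves both uses named in the text: rating a triplet (relation_holds) and predicting one element from the other two (rank_objects).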

3.2.4 Memory-Augmented Networks: To endow DL systems with the ability

to reason, they need a short-term memory, to be used as an

episodic memory, or a scratchpad/working memory. For example, if a

system is to answer questions about a series of events (described as a

text), it must be able to store the story in a memory and retrieve the

relevant bits to answer a particular question. This led to the memory-

network architecture [52, 53] in which a recurrent neural network is

augmented by what amounts to a differentiable associative memory

circuit (see Figure 1.1.5). This associative memory can be quite large

and requires finding the nearest neighbors to a key vector very

efficiently. As DL systems are increasingly used for high-level cognitive

tasks, such memory modules will become commonplace and very large,

requiring hardware support.
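
A differentiable associative memory of this kind can be sketched in a few lines. The version below follows the description in the caption of Figure 1.1.5 (dot-product matching, scores normalized to sum to one, weighted sum of values); using a softmax as the normalization is an assumption, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Normalize scores into positive weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, keys, values):
    """Soft lookup: compare the query with every key, then blend the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
read = memory_read([8.0, 0.0], keys, values)
print(read)
```

Every key is touched on every read, which is why a large memory calls for fast nearest-neighbor search: only the few best-matching keys contribute noticeably to the output.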

3.2.5 Complex Inference and Search: Most of today’s DL systems simply

produce an output given an input. But complex reasoning

requires that the output variable actually be an input to a scoring

network whose scalar output (akin to energy) indicates the

incompatibility between the input and an output proposal. An inference

procedure must search for the output value that minimizes the energy.

This type of model is called an energy-based model [54]. If the energy-

minimizing inference procedure is gradient-based, inference hardware

will need to support back-propagation.
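
The sketch below illustrates inference by energy minimization on a toy problem. The energy function is hypothetical and hand-written rather than a trained scoring network; it has two minima, to show that several outputs can be compatible with the same input. The gradient is taken with respect to the output y while the input x is held fixed.

```python
def energy(x, y):
    """Hypothetical energy: low when y is close to 2*x or to -2*x."""
    return (y * y - 4.0 * x * x) ** 2


def energy_grad_y(x, y):
    """Analytic derivative of the energy with respect to the output y."""
    return 4.0 * y * (y * y - 4.0 * x * x)


def infer(x, y_init, step=0.01, iterations=500):
    """Inference as search: descend the energy over y, with x held fixed."""
    y = y_init
    for _ in range(iterations):
        y -= step * energy_grad_y(x, y)
    return y


print(infer(1.0, 0.5), infer(1.0, -0.5))
```

Starting from different proposals, the search settles on different low-energy outputs (near 2 and near -2 for x = 1). With a learned energy, energy_grad_y would be obtained by back-propagation through the scoring network, which is the hardware requirement the text points to.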

3.2.6 Sparse Activations: As the size of DL systems grows, it is likely that

the modules’ activations will become increasingly sparse, with only a subset

of variables of a subset of modules being activated at any one time. This is

akin to how the brain represents information: on average, neurons in the brain

are at 2% of their maximum activation, and most neurons are quiet

most of the time, which is good for power dissipation. Examples of

explicitly sparse networks already exist (for processing volumetric

imaging data [56]).

3.2.7 Overall:

New architectural concepts such as dynamic networks, graph data,

associative-memory structures, and inference-through-minimization

procedures are likely to affect the type of hardware architectures that will be

required in the future.

4 The Revolution will not be Supervised

Despite all the hype around the new AI and DL, the way machines learn today

is vastly less efficient than the way humans and animals learn. Almost all

practical applications of DL use supervised learning (SL), in which the

system is fed the desired output during training, with a tiny minority using

reinforcement learning (RL). Most humans are capable of learning to drive

a car in about 30 hours of training without ever causing accidents. In

contrast, current model-free RL methods would likely require millions of

hours of practice, with numerous accidents, for an autonomous car to learn

to drive. This is not a problem in easy-to-simulate fully-observable

environments with discrete state, such as the game of Go or chess. But it

does not work in the real world! Obviously, our current learning paradigms

are missing a key ingredient.

One hypothesis is that this missing ingredient is self-supervised learning.

The bulk of learning in humans and animals is self-supervised: we learn

enormous amounts of background knowledge about how the world works

by observation in the first days, weeks, and months of life. In particular,

we learn intuitive physics and the properties of the physical world. By the

age of 9 months, babies understand object permanence, stability, animate

vs. inanimate objects, gravity, inertia, and so on. The ability to

predict what is going to happen in the world is what allows us to learn to

drive without causing accidents: our world model allows us to anticipate

the consequences of our actions, to maintain the car on the road, and to

avoid disasters.

The idea of self-supervised learning is to train a machine to predict any

subset of its input from other subsets (with a possible overlap between

the subsets). For example, given a 6-frame video clip, one could train a

DL system to predict the last two frames from the first four.
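
The video example can be sketched as a data-preparation step: no labels are needed, because the targets are cut from the input itself. The function names are hypothetical, and strings stand in for image tensors.

```python
def make_ssl_example(clip, n_context=4):
    """Split one clip into (visible frames, frames to predict)."""
    if len(clip) <= n_context:
        raise ValueError("clip must be longer than the context")
    return clip[:n_context], clip[n_context:]


def sliding_examples(video, clip_len=6, n_context=4):
    """Cut a long unlabeled video into overlapping training examples."""
    return [
        make_ssl_example(video[start:start + clip_len], n_context)
        for start in range(len(video) - clip_len + 1)
    ]


video = [f"frame{i}" for i in range(8)]  # stand-ins for image tensors
examples = sliding_examples(video)
print(examples[0])
```

An 8-frame video already yields three training examples, which hints at why the supply of training data for this kind of learning is essentially unlimited.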

Why should self-supervised learning (SSL) be more efficient than either RL or SL? In RL, the system

produces an output (often an action or sequence of actions) and gets in

return a single scalar value representing the “reward” for this action.

Learning a complex task in this scenario requires a very large number of

trials, and a large number of errors. While the process works fine for fully-

observable games (such as chess and Go) where millions of trials can be

generated through self-play, it is largely impractical in the real world. A

model-free RL system would require millions of hours of driving and

numerous crashes to train a car to drive itself. The number of trials required

is large because the feedback from the environment is information-poor.

In SL, the system is given the correct answer, generally in the form of a

target output vector. While this is less information-poor than in RL, it still

requires a lot of training samples to capture the essence of the problem.

On the other hand, SSL asks the machine to predict a large amount of

information in the form of a high-dimensional signal (such as a whole video

frame). More complex models with more parameters can be learned with

a given number of samples or trials. The main difficulty is that predicting

the future of a video is not achievable exactly because the world is not

entirely predictable. If one uses a least-squares criterion to train a video

predictor, the resulting predictions are blurry frames: an average of all the

possible futures. To make sharp predictions, one must have a set of latent

variables that, when passed through a predictor, parameterize the set of

plausible predictions. One technique used to train such models is

Generative Adversarial Networks (GANs) [59], which train two

networks simultaneously: a generator that makes predictions using

15DIGEST OF TECHNICAL PAPERS •

16 • 2019 IEEE International Solid-State Circuits Conference

ISSCC 2019 / SESSION 1 / PLENARY / 1.1

observations and a source of random vectors drawn from a known

distribution, and a discriminator whose role is to produce a scalar energy

indicating whether a generated prediction is plausible or not. The

discriminator is trained to distinguish real data (low energy) from generated

predictions (high energy). The generator trains itself to produce predictions

that the discriminator cannot tell are fake. To do so, the generator uses the

gradient of the discriminator’s output energy with respect to its input to

compute how to modify its predictions, and thereby modify its parameters.

Variations of GANs have produced stunning results in image generation

[61, 62]. Other latent-variable generative models, such as Variational Auto-

Encoders [60] and regularized latent variable models [45] have also

produced good results.
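
The blurring effect of a least-squares criterion, mentioned earlier in this discussion, can be checked on a toy case. The one-dimensional “futures” below are hypothetical stand-ins for video frames: a car ahead turns either left (-1.0) or right (+1.0), equally often.

```python
def mse(prediction, futures):
    """Mean squared error of one prediction against all observed futures."""
    return sum((prediction - f) ** 2 for f in futures) / len(futures)


# Hypothetical 1-D futures: the car goes left (-1.0) or right (+1.0).
futures = [-1.0, 1.0, -1.0, 1.0]

# Try every single-valued prediction on a grid from -1.0 to 1.0.
candidates = [c / 10.0 for c in range(-10, 11)]
best = min(candidates, key=lambda c: mse(c, futures))
print(best, mse(best, futures))
```

The prediction that minimizes the squared error is 0.0, the average of the two futures, which matches neither of them. A latent variable that selects “left” or “right” is what lets a predictor commit to one sharp outcome.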

One hope is that training a system to predict videos will allow it to

discover much of the hidden regularities, geometry, and physics of the

world, such as the fact that the scenery changes in particular ways as

the camera moves, and that certain objects occlude others and can

move independently. Such predictions can be done in pixel space [43,

44], or in higher-level representations (such as instance segmentation

maps obtained by a pre-trained system [46]).

The use of predictive models that not only predict the evolution of the

environment, but also predict the consequences of actions, is key to

reducing the number of trials a system needs to learn a skill. I predict that

self-supervised latent-variable predictive models will be the centerpiece of

intelligent systems based on model-predictive control and model-based

reinforcement learning for such applications as robotic grasping and

manipulation [44] and autonomous driving [45]. Figure 1.1.6 shows a

latent-variable predictive model that predicts a visual representation of the

surroundings of a car. This kind of model can be used to predict multiple

scenarios of how surrounding cars are going to move, and to plan a driving

policy accordingly.

If self-supervised learning eventually allows machines to learn vast

amounts of background knowledge about how the world works through

observation, one may hypothesize that some form of machine common

sense could emerge! One form of common sense is our ability to fill in the

blanks, using our knowledge of the structure and constraints of the world.

Future DL systems will largely be trained using a form of self-supervised

learning. These systems will be much larger than they are today, because the

amount of data with which they can be trained (e.g. raw video) is essentially

unlimited. Such systems will eventually be trained to acquire vast amounts of

background knowledge so as to acquire a form of common sense. New high-

performance hardware will be required to enable such progress.

5 Requirements for Future DL Hardware and Software

5.1 How Will DL Software Evolve?

Clearly, what is needed is a software framework for differentiable

programming that is interactive, flexible, dynamic, and efficient.

Although frameworks such as PyTorch, TensorFlow, and others are moving

in that direction, the main obstacle is that people love Python, largely

because of its gigantic set of libraries. But Python is very slow and memory-

hungry. It is often impractical to develop high-volume applications or

embedded applications that rely on Python at runtime. However, for static

compute graphs, there is no issue: one can export the graph to adhere to

a standard format, such as ONNX (Open Neural Network Exchange), and use one

of the numerous ONNX-compliant backends. On the other hand, for

dynamic networks, there are two main options. One is to provide a compiler

for a sufficiently large subset of Python that can produce Python-

independent executables for DL (such as torch.jit in the recently released

PyTorch 1.0 [65]). This may also require an auxiliary domain-specific

language to specify low-level numerical operations (on tensors and graphs)

such as Tensor Comprehensions [55]. A second option is to design a

suitable compilable language from scratch. It would have to be interactive

and dynamic, have safe parallelism, and use type inference as much as

possible, perhaps something resembling Julia or Skip [64] with good

support for scientific computing. However, users’ desire to

access the vast repository of Python libraries will limit its potential

adoption.

5.2 Hardware for Training

One problem is that sparsity, architecture dynamicity, and modules that

manipulate non-tensor data (graphs) break the assumption that one

can perform computation on batches of identically-sized samples.

Unfortunately, with current hardware, batching is what allows us to

reduce most low-level neural network operations to matrix products,

and thereby to reduce the memory access-to-computation ratio. Thus,

we will need new hardware architectures that can function efficiently

with a batch size of one. Handling sparse structured data is

another requirement. Increasingly, input data will come to us in a variety

of forms, beyond tensors, such as graphs annotated with tensors and

symbols.
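
The effect of batching on memory traffic can be estimated with simple arithmetic. The sketch below counts multiply-accumulates per memory access for one fully-connected layer, under simplifying assumptions that are not in the text: each weight is fetched once per batch, each activation is read or written once, and there is no caching beyond that. It reports the inverse of the memory access-to-computation ratio, so higher is better.

```python
def macs_per_memory_access(n_inputs, n_outputs, batch_size):
    """Multiply-accumulates per memory access for one dense layer."""
    macs = n_inputs * n_outputs * batch_size
    accesses = (
        n_inputs * n_outputs        # weights, fetched once per batch
        + n_inputs * batch_size     # input activations
        + n_outputs * batch_size    # output activations
    )
    return macs / accesses


for batch in (1, 8, 64):
    print(batch, round(macs_per_memory_access(1024, 1024, batch), 2))
```

For a 1024×1024 layer the figure is just under 1 at batch size one and above 50 at batch size 64, which is why hardware built around matrix products loses most of its advantage when batching is not possible.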

Down the line, one can imagine architectures and learning algorithms

that favor sparse activations in the network. When most units are off

most of the time, it may become advantageous to make our hardware

event-driven, so that only the units that are activated consume

resources. Sparse networks such as Submanifold Sparse

ConvNets (implemented in software) have been shown to be very

effective for processing sparse data, such as 3D scenes, which are

represented by voxel arrays that are largely empty [56]. Sparse

activation is one of the features that makes the brain so power-efficient.
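
The saving from event-driven computation can be illustrated in software. This is only a sketch of the idea, with hypothetical function names; it is not the Submanifold Sparse ConvNet algorithm of [56]. Both functions compute the same layer output, but the second one does work only for units that are active.

```python
def dense_layer(weights, activations):
    """Every input unit is visited, whether or not it is active."""
    n_out = len(weights[0])
    out = [0.0] * n_out
    macs = 0
    for i, a in enumerate(activations):
        for j in range(n_out):
            out[j] += a * weights[i][j]
            macs += 1
    return out, macs


def event_driven_layer(weights, events):
    """Only active units, given as (index, value) events, cost any work."""
    n_out = len(weights[0])
    out = [0.0] * n_out
    macs = 0
    for i, a in events:
        for j in range(n_out):
            out[j] += a * weights[i][j]
            macs += 1
    return out, macs


weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
activations = [0.0, 0.5, 0.0, 0.0]
events = [(i, a) for i, a in enumerate(activations) if a != 0.0]
print(dense_layer(weights, activations), event_driven_layer(weights, events))
```

With one active unit out of four, the event-driven version performs 2 multiply-accumulates instead of 8 for the same result; the gap widens as activations become sparser.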

5.3 Hardware for Inference

While demand for data-center and cloud-based inference will grow,

future DL applications will increasingly run on mobile phones,

wearables, home appliances, vehicles, IoT devices, and robots.

Applications in augmented and virtual reality and telepresence will

require extremely low-power ASICs for DL inference for such things as

real-time/low-latency object tracking, 3D reconstruction, instance

labeling, facial reconstruction, predictive compression and display.

In the short and medium term, the bulk of the computation will be

convolutions. Since batching is out of the question, hardware will have

to exploit the regularities of convolutions instead of being mere matrix-

product engines.
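
One regularity of convolutions is that every kernel weight is reused at every output position, so there is weight reuse even for a single sample. A minimal one-dimensional sketch, with hypothetical function names:

```python
def conv1d(signal, kernel):
    """Direct 'valid' 1-D convolution (cross-correlation, as in ConvNets)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def weight_reuse(signal_len, kernel_len):
    """How many times each kernel weight is used for a single sample."""
    return signal_len - kernel_len + 1


print(conv1d([1, 2, 3, 4, 5], [1, 0, -1]), weight_reuse(5, 3))
```

For realistic feature maps the reuse factor is the number of output positions, often in the thousands, which is the property dedicated convolution hardware can exploit without batching.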

Ultimately, the solution to power constraints may well be the exploitation

of sparse activations, perhaps using event-based computation. In any

case, inference hardware may exploit exotic number representations (such as the 8-bit

logarithmic representation of [35]).
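
To show why a logarithmic representation is attractive for low-power hardware, here is a sketch of a generic 8-bit log-domain number. It is not the format proposed in [35]; the bit split (1 sign bit plus a 7-bit fixed-point log2 with 3 fractional bits) and all names are assumptions made for illustration, and zero is not handled.

```python
import math

FRAC_BITS = 3                   # assumed: 1 sign bit + 7-bit fixed-point log2
CODE_MIN, CODE_MAX = -64, 63    # range of the 7-bit signed code


def encode(x):
    """Return (is_negative, code), code ~ log2(|x|) in fixed point; x != 0."""
    code = round(math.log2(abs(x)) * (1 << FRAC_BITS))
    return (x < 0, max(CODE_MIN, min(CODE_MAX, code)))


def decode(number):
    """Convert a (is_negative, code) pair back to a float."""
    negative, code = number
    magnitude = 2.0 ** (code / (1 << FRAC_BITS))
    return -magnitude if negative else magnitude


def multiply(a, b):
    """A product in the log domain is an integer addition plus a sign XOR."""
    code = max(CODE_MIN, min(CODE_MAX, a[1] + b[1]))
    return (a[0] != b[0], code)


print(decode(multiply(encode(4.0), encode(-0.5))))
```

Multiplication reduces to a small integer add, at the cost of a few percent of relative rounding error and a more involved addition, which is the trade-off such formats have to manage.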

6 The Long Term Outlook

In the long run, could we see a return to analog implementations?

Perhaps programmable resistor technology will become sufficiently

compact, reliable, durable, and configurable for DL applications. But

since this would require one unmovable physical memory cell per

parameter in the network, only activations could be circulated (assuming

they are converted to digital representation), and hardware multiplexing

would be limited to sections that share weights (as in the ANNA chip).

It is very unclear whether analog implementations provide any power

dissipation advantages over digital, and current evidence seems to point

in the opposite direction.

A number of authors have been advocating architectures with spiking

neurons. Unfortunately, the performance of spiking neuron circuits

seems considerably inferior to that of traditional digital architectures for

realistic ConvNet-type networks [57]. Current learning algorithms do

not take advantage of the peculiarities of spiking networks, and no

spiking-neuron learning algorithm has been shown to come close to

the accuracy of backprop with continuous representations.

The important trends discussed in this paper include: (1) more self-

supervised learning, resulting in larger network architectures; (2)

dynamic networks resulting from differentiable programs whose

architecture changes for each new sample; (3) the need for hardware

that is efficient for batch-size 1, implying the end of reliance on matrix

products as the lowest-level operator; (4) exotic number representations

for inference on low-power hardware; (5) very large networks with very

sparse activations, that new architectures could exploit for power

reduction; (6) new operators such as fast K-nearest neighbors for

(differentiable) associative-memory modules; (7) networks that

manipulate annotated graphs instead of tensors. However, chances are

that the bulk of the computation in future DL systems will still consist

primarily of convolutions.

References

[1] L.D. Jackel, R.E. Howard, H.P. Graf, B. Straughn, J.S. Denker, “Artificial Neural Networks for Computing”, Journal of Vacuum Science & Technology B: Microelectronics Processing and Phenomena, 4(1), pp. 61-63, 1986.

[2] H. Graf, P. de Vegvar, “A CMOS Associative Memory Chip Based on Neural Networks”, ISSCC, pp. 304-305, 1987.

[3] G. Indiveri, et al., “Neuromorphic Silicon Neuron Circuits”, Frontiers in Neuroscience, 5, p. 73, 2011.

[4] S.B. Furber, F. Galluppi, S. Temple, L. Plana, “The Spinnaker Project”, Proceedings of the IEEE, 102(5), pp. 652-665, 2014.

[5] F. Rosenblatt, “The Perceptron, A Perceiving and Recognizing Automaton (Project Para)”, Cornell Aeronautical Laboratory, 1957.

[6] B. Widrow, W.H. Pierce, J.B. Angell, “Birth, Life, and Death in Microelectronic Systems”, IRE Trans. Mil. Electron., 1051(3), pp. 191-201, 1961.

[7] R.W. Lucky, “Automatic Equalization for Digital Communication”, Bell System Technical Journal, 44(4), pp. 547-588, 1965.

[8] M. Minsky, S.A. Papert, “Perceptrons: An Introduction to Computational Geometry”, MIT Press, 1969.

[9] J.J. Hopfield, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities”, Proceedings of the National Academy of Sciences, 79(8), pp. 2554-2558, 1982.

[10] G.E. Hinton, T.J. Sejnowski, “Optimal Perceptual Inference”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 448-453, June 1983.

[11] D.E. Rumelhart, G.E. Hinton, R.J. Williams, “Learning Representations by Back-Propagating Errors”, Nature, 323(6088), pp. 533-536, 1986.

[12] Y. LeCun, B.E. Boser, J.S. Denker, D. Henderson, R.E. Howard, W. Hubbard, L.D. Jackel, “Backpropagation Applied to Handwritten Zip Code Recognition”, Neural Computation, 1(4), pp. 541-551, 1989.

[13] Y. LeCun, B.E. Boser, J.S. Denker, D. Henderson, R.E. Howard, W.E. Hubbard, L.D. Jackel, “Handwritten Digit Recognition with a Back-Propagation Network”, NIPS, pp. 396-404, 1989.

[14] H.P. Graf, R. Janow, D. Henderson, R. Lee, “Reconfigurable Neural Net Chip with 32K Connections”, Advances in Neural Information Processing Systems, pp. 1032-1038, 1991.

[15] B.E. Boser, E. Sackinger, J. Bromley, Y. Le Cun, L.D. Jackel, “An Analog Neural Network Processor with Programmable Topology”, IEEE Journal of Solid-State Circuits, 26(12), pp. 2017-2025, 1991.

[16] E. Sackinger, B.E. Boser, J. Bromley, Y. LeCun, L.D. Jackel, “Application of the ANNA Neural Network Chip to High-Speed Character Recognition”, IEEE Transactions on Neural Networks, 3(3), pp. 498-505, 1992.

[17] J. Cloutier, E. Cosatto, S. Pigeon, F.R. Boyer, P.Y. Simard, “VIP: An FPGA-Based Processor for Image Processing and Neural Networks”, Proc. of Int. Conf. Microelectronics for Neural Networks, pp. 330-336, 1996.

[18] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, “Gradient-Based Learning Applied to Document Recognition”, Proceedings of the IEEE, 86(11), pp. 2278-2324, 1998.

[19] L. Bottou, P. Gallinari, “A Framework for the Cooperation of Learning Algorithms”, Advances in Neural Information Processing Systems, pp. 781-788, 1991.

[20] C. Farabet, C. Poulet, Y. LeCun, “An FPGA-Based Stream Processor for Embedded Real-Time Vision with Convolutional Networks”, ICCV Workshops, pp. 878-885, September 2009.

[21] C. Farabet, Y. LeCun, K. Kavukcuoglu, E. Culurciello, B. Martini, P. Akselrod, S. Talay, “Large-Scale FPGA-Based Convolutional Networks”, in R. Bekkerman, M. Bilenko, J. Langford (Eds.), “Scaling up Machine Learning: Parallel and Distributed Approaches”, pp. 399-419, Cambridge University Press, 2011.

[22] A. Canziani, A. Paszke, E. Culurciello, “An Analysis of Deep Neural Network Models for Practical Applications”, arXiv:1605.07678, 2017.

[23] Y.H. Chen, T. Krishna, J.S. Emer, V. Sze, “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks”, IEEE Journal of Solid-State Circuits, 52(1), pp. 127-138, 2017.

[24] Y.H. Chen, J. Emer, V. Sze, “Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks”, arXiv:1807.07928, 2018.

[25] G.E. Hinton, R.R. Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks”, Science, 313(5786), pp. 504-507, 2006.

[26] G.E. Hinton, L. Deng, D. Yu, G.E. Dahl, A.R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, B. Kingsbury, “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The shared views of four research groups”, IEEE Signal Processing Magazine, 29(6), pp. 82-97, 2012.

[27] R. Collobert, J. Weston, “A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning”, ICML, pp. 160-167, 2008.

[28] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, “Natural Language Processing (almost) from Scratch”, Journal of Machine Learning Research, pp. 2493-2537, August 2011.

[29] C. Farabet, C. Couprie, L. Najman, Y. LeCun, “Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers”, ICML, arXiv:1202.2160, 2012.

[30] P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning”, CVPR, pp. 3626-3633, 2013.

[31] A. Krizhevsky, I. Sutskever, G.E. Hinton, “Imagenet Classification with Deep Convolutional Neural Networks”, Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.

[32] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, “Bag of Tricks for Efficient Text Classification”, Proc. 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 427-431, 2017.

[33] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, M. Paluri, “A Closer Look at Spatiotemporal Convolutions for Action Recognition”, Proc. Computer Vision and Pattern Recognition, pp. 6450-6459, 2018.

[34] M. Ott, S. Edunov, D. Grangier, M. Auli, “Scaling Neural Machine Translation”, arXiv:1806.00187, 2018.

[35] J. Johnson, “Rethinking Floating Point for Deep Learning”, arXiv:1811.01721, 2018.

[36] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, I. Polosukhin, “Attention is all you need”, NIPS, pp. 5998-6008, 2017.

[37] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, “Overfeat: Integrated Recognition, Localization and Detection Using Convolutional Networks”, Proc. ICLR, arXiv:1312.6229, 2014.

[38] K. He, G. Gkioxari, P. Dollár, R. Girshick, “Mask R-CNN”, Proc. ICCV, pp. 2980-2988, October 2017.

[39] T.Y. Lin, P. Dollár, R.B. Girshick, K. He, B. Hariharan, S.J. Belongie, “Feature Pyramid Networks for Object Detection”, CVPR, Vol. 1, No. 2, p. 4, 2017.

[40] T.Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, “Focal Loss for Dense Object Detection”, Proc. ICCV, arXiv:1708.02002, 2017.

[41] O. Ronneberger, P. Fischer, T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation”, International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 234-241, October 2015.

[42] P. Jaeger, S. Kohl, S. Bickelhaupt, F. Isensee, T.A. Kuder, H.-P. Schlemmer, K. Maier-Hein, “Retina U-Net: Embarrassingly Simple Exploitation of Segmentation Supervision for Medical Object Detection”, arXiv:1811.08661, 2018.

[43] M. Mathieu, C. Couprie, Y. LeCun, “Deep Multi-Scale Video Prediction Beyond Mean Square Error”, ICLR, arXiv:1511.05440, 2016.

[44] C. Finn, I. Goodfellow, S. Levine, “Unsupervised Learning for Physical Interaction Through Video Prediction”, Advances in Neural Information Processing Systems, pp. 64-72, 2016.

[45] M. Henaff, A. Canziani, Y. LeCun, “Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic”, to appear in 2019.

[46] P. Luc, C. Couprie, Y. LeCun, J. Verbeek, “Predicting Future Instance Segmentations by Forecasting Convolutional Features”, ECCV, arXiv:1803.11496, 2018.


[47] G. Huang, D. Chen, T. Li, F. Wu, L. van der Maaten, K.Q. Weinberger, “Multi-Scale Dense Networks for Resource Efficient Image Classification”, ICLR, arXiv:1703.09844, 2018.

[48] J. Johnson, et al., “Inferring and Executing Programs for Visual Reasoning”, ICCV, pp. 3008-3017, 2017.

[49] R. Hu, J. Andreas, M. Rohrbach, T. Darrell, K. Saenko, “Learning to Reason: End-to-End Module Networks for Visual Question Answering”, ICCV, arXiv:1704.05526, 2017.

[50] M.M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, P. Vandergheynst, “Geometric Deep Learning: Going Beyond Euclidean Data”, IEEE Signal Processing Magazine, 34(4), pp. 18-42, 2017.

[51] M. Nickel, D. Kiela, “Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry”, arXiv:1806.03417, 2018.

[52] S. Sukhbaatar, J. Weston, R. Fergus, “End-to-End Memory Networks”, Advances in Neural Information Processing Systems, pp. 2440-2448, 2015.

[53] A. Miller, A. Fisch, J. Dodge, A.H. Karimi, A. Bordes, J. Weston, “Key-Value Memory Networks for Directly Reading Documents”, arXiv:1606.03126, 2016.

[54] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato, F. Huang, “A Tutorial on Energy-Based Learning”, in Bakir et al. (Eds.), Predicting Structured Data, MIT Press, 2006.

[55] N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. DeVito, W.S. Moses, S. Verdoolaege, A. Adams, A. Cohen, “Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions”, arXiv:1802.04730, 2018.

[56] B. Graham, M. Engelcke, L. van der Maaten, “3D Semantic Segmentation with Submanifold Sparse Convolutional Networks”, CVPR, pp. 18-22, 2018.

[57] C. Farabet, R. Paz, J. Pérez-Carrasco, C. Zamarreño, A. Linares-Barranco, Y. LeCun, E. Culurciello, T. Serrano-Gotarredona, B. Linares-Barranco, “Comparison Between Frame-Constrained Fix-Pixel-Value and Frame-Free Spiking-Dynamic-Pixel ConvNets for Visual Processing”, Frontiers in Neuroscience, 6, 32, 2012.

[58] D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, L. van der Maaten, “Exploring the Limits of Weakly Supervised Pretraining”, ECCV, arXiv:1805.00932, 2018.

[59] I. Goodfellow, et al., “Generative Adversarial Nets”, NIPS, pp. 2672-2680, 2014.

[60] D.P. Kingma, M. Welling, “Auto-Encoding Variational Bayes”, ICLR, arXiv:1312.6114, 2014.

[61] T. Karras, T. Aila, S. Laine, J. Lehtinen, “Progressive Growing of Gans for Improved Quality, Stability, and Variation”, ICLR, arXiv:1710.10196, 2018.

[62] A. Brock, J. Donahue, K. Simonyan, “Large Scale Gan Training for High Fidelity Natural Image Synthesis”, arXiv:1809.11096, 2018.

[63] http://github.com/facebookresearch/maskrcnn-benchmark

[64] http://www.skiplang.com

[65] https://pytorch.org


Figure 1.1.1: Early neural network chips from Bell Labs. (A) 1986: 12-

resistor array, 6×6 microns [1]; (B) 1987: 54×54 analog array with

programmable ternary weight array with FIFOs for convolutions [14]; (D)

1991: ANNA ConvNet chip 64×64 array with 6-bit weights and 3-bit

activations [15].

Figure 1.1.2: An example of Convolutional Network architecture for

image recognition. Not all layers are represented [37].

Figure 1.1.3: Top-1 accuracy on ImageNet versus number of

operations for one pass of various ConvNet architectures. Circle size

indicates the number of parameters [22].

Figure 1.1.5: (Top) Memory Network architecture [52]; (Bottom) Key-

Value Memory Network architecture for question answering [53]. Both

architectures contain a central processing network connected with a “soft”

associative memory circuit that stores facts. The memory module

is a “soft” associative memory circuit in which the “address” vector is

compared with each key vector through a dot product, producing scalar

matching scores. The scores are normalized to sum to one. The output

is a linear combination of the stored value vectors, weighted by the

normalized scores.

Figure 1.1.6: An example of self-supervised learning. A latent-

variable model predicts how surrounding cars will move relative to

the ego car (in the center). The model takes a few past frames and

predicts the future relative positions of other cars, conditioned on a

vector of latent variables. It is trained using data collected from traffic

cameras overlooking roads. Different samplings of the latent variable

produce different futures. This model can be used to plan or to train

an artificial driver to minimize the probability of collision.

Figure 1.1.4: (Top) Multi-Scale DenseNet with conditional computation

for accelerated results [47]. (Bottom) RetinaNet architecture for image

semantic segmentation [40].
