References
22. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778
23. K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN. arXiv:1703.06870 (2017, preprint)
24. G. Hinton, S. Osindero, Y. Teh, A fast learning algorithm for deep belief nets. Neural Comput.
18(7), 1527–1554 (2006)
25. S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen [Investigations on dynamic
neural networks], Diploma thesis, Technische Universität München, 91, 1991
26. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
27. S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the
difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent
Neural Networks (IEEE Press, 2001)
28. K. Hornik, Approximation capabilities of multilayer feedforward networks. Neural Netw. 4(2),
251–257 (1991)
29. A. Ivakhnenko, V. Lapa, Cybernetic predicting devices, Technical report, DTIC Document,
1966
30. L. Kanal, Perceptron, in Encyclopedia of Computer Science (Wiley, Chichester, 2003)
31. Y. LeCun, D. Touresky, G. Hinton, T. Sejnowski, A theoretical framework for back-
propagation, in The Connectionist Models Summer School, vol. 1 (1988), pp. 21–28
32. Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L. Jackel, Handwritten
digit recognition with a back-propagation network, in Advances in Neural Information
Processing Systems (1990), pp. 396–404
33. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
34. Y. LeCun, C. Cortes, C. Burges, MNIST handwritten digit database. AT&T Labs [Online].
http://yann.lecun.com/exdb/mnist (2010)
35. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
36. H. Lin, M. Tegmark, Why does deep and cheap learning work so well? arXiv:1608.08225
(2016, preprint)
37. H. Lütkepohl, Handbook of Matrices (Wiley, Hoboken, 1997)
38. A. Maas, A. Hannun, A. Ng, Rectifier nonlinearities improve neural network acoustic models,
in Proceedings of ICML, vol. 30 (2013)
39. W. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity. Bull.
Math. Biol. 5(4), 115–133 (1943)
40. M. Minsky, S. Papert, Perceptrons (MIT Press, Cambridge, 1969)
41. V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves,
M. Riedmiller et al., Human-level control through deep reinforcement learning. Nature
518(7540), 529–533 (2015)
42. G. Montufar, R. Pascanu, K. Cho, Y. Bengio, On the number of linear regions of deep neural
networks, in Advances in Neural Information Processing Systems (2014), pp. 2924–2932
43. V. Nair, G. Hinton, Rectified linear units improve restricted Boltzmann machines, in
Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010),
pp. 807–814
44. R. Pascanu, G. Montufar, Y. Bengio, On the number of response regions of deep feed forward
networks with piece-wise linear activations. arXiv:1312.6098 (2013, preprint)
45. A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep
convolutional generative adversarial networks. arXiv:1511.06434 (2015, preprint)
46. F. Rosenblatt, The perceptron: a probabilistic model for information storage and organization
in the brain. Psychol. Rev. 65(6), 386 (1958)
47. D. Rumelhart, G. Hinton, R. Williams, Learning internal representations by error propagation,
Technical report, California Univ San Diego La Jolla Inst for Cognitive Science, 1985
48. T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved
techniques for training GANs, in Advances in Neural Information Processing Systems (2016),
pp. 2226–2234