3.3 Performance Analysis
The backpropagation algorithm must visit every connection; this cannot be
changed, and it is therefore not possible to improve the asymptotic running time
of the backpropagation algorithm. However, as described in section 2.3, other
more advanced algorithms exist which can obtain better results than the back-
propagation algorithm. These algorithms do not execute faster per iteration, but
they adjust the weights more precisely, allowing them to reach a result faster.
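To give an impression of why, a condensed sketch of the RPROP idea is shown
below. The constants and names follow the commonly published description of
RPROP, not this implementation, and the code is only illustrative.

    #include <math.h>

    /* Illustrative sketch of the RPROP idea: each weight keeps its own
     * step size, which grows while the gradient keeps its sign and
     * shrinks when the sign flips. Only the sign of the gradient is
     * used, never its magnitude. */
    void rprop_step(float *weight, float *step, float *prev_grad, float grad)
    {
        if (grad * *prev_grad > 0)          /* same direction: speed up */
            *step = fminf(*step * 1.2f, 50.0f);
        else if (grad * *prev_grad < 0) {   /* sign flip: slow down */
            *step = fmaxf(*step * 0.5f, 1e-6f);
            grad = 0.0f;                    /* skip the update this time */
        }
        if (grad > 0)      *weight -= *step;
        else if (grad < 0) *weight += *step;
        *prev_grad = grad;
    }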
I have chosen to implement the backpropagation algorithm because it is simple
and effective enough in most cases. This decision means that I have knowingly
omitted an important optimization of the training algorithm, which in turn implies
that there is little point in spending too much time on the other optimization
strategies: a highly tuned backpropagation algorithm will still be slower than an
untuned RPROP algorithm. In spite of that, a basic level of optimization is still
a desirable feature in the implementation of the backpropagation algorithm.
In conclusion, not much is done about the algorithms themselves (although some-
thing could be done about the training), which means that the running time remains
Θ(n), where n is the number of connections. However, there is still room for opti-
mizing the overhead involved in executing the actual calculations.
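To make the Θ(n) bound concrete, consider the weight update for a single fully
connected layer: every connection is touched exactly once per pass. The sketch
below is illustrative only, and the function and parameter names are hypothetical,
not taken from the implementation.

    /* Hypothetical sketch: backpropagation weight update for one fully
     * connected layer. Each of the num_in * num_out connections is
     * visited exactly once, so the work is proportional to the number
     * of connections -- hence the Theta(n) running time. */
    void update_weights(float *weights,      /* num_in * num_out weights */
                        const float *inputs, /* activations of previous layer */
                        const float *deltas, /* error terms of this layer */
                        int num_in, int num_out, float learning_rate)
    {
        for (int j = 0; j < num_out; j++)
            for (int i = 0; i < num_in; i++)
                /* one multiply-add per connection */
                weights[j * num_in + i] += learning_rate * deltas[j] * inputs[i];
    }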
3.3.2 Architectural Optimization
There are many ways of building the architecture (data structures) for a neural
network. The object-oriented approach would be to make everything an object, and
there are indeed good abstract concepts like neurons, synapses etc. which would
make for a clean class hierarchy. In Jet’s Neural Library [Heller, 2002] such an
approach has been chosen, with all the advantages and disadvantages of this choice.
There are several major disadvantages to this approach:
• The data itself is not located closely together in memory, so cache performance
is very poor.
• Algorithms like executing the network have their code spread across several
different classes, which makes the code hard to optimize and adds overhead to
several key functions.
• It is difficult to write tight inner loops.
These problems could obviously be fixed while still using the object-oriented
approach, but that approach makes it difficult to do so.
A good architecture for a neural network should not take up too much space and
should not use too deep an object hierarchy. On the other hand, some level of object
abstraction is highly desirable. Perhaps a three-level hierarchy would be acceptable,
with the outer level consisting of the entire ANN, the next level consisting of the
individual layers and the last level consisting of the single neurons and connections.
A good architecture will also allow easy access to information such as the total
number of neurons, as sketched below.
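A minimal sketch of such a three-level structure could look as follows; all names
are hypothetical and the details would differ in a real implementation.

    /* Hypothetical three-level structure: the ANN owns the layers, the
     * layers own their neurons, and all weights live in one flat array
     * instead of in individual synapse objects. */
    struct neuron {
        float  value;       /* activation of this neuron */
        int    num_inputs;  /* number of incoming connections */
        float *weights;     /* points into the ANN's flat weight array */
    };

    struct layer {
        int            num_neurons;
        struct neuron *neurons;    /* contiguous array of neurons */
    };

    struct ann {
        int           num_layers;
        int           total_neurons;     /* summary information is kept */
        int           total_connections; /* directly in the outer object */
        struct layer *layers;            /* contiguous array of layers */
        float        *weights;           /* all weights in one long array */
    };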
3.3.3 Cache Optimization
If a good data architecture is in place, much of the work for cache optimization
is already done. But some work still remains in refining the architecture and
making sure that the algorithms themselves are cache aware.
The architecture should ensure that data can be accessed sequentially, which gives
good cache performance. A good example of this is the weights, which should be
accessed sequentially when executing the network. For this reason the weights
should be laid out in memory in one long array that can be traversed sequentially.
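As an illustration, executing one layer against such a flat weight array could look
like the sketch below (hypothetical names); the weight pointer simply advances
through memory, so the accesses are strictly sequential.

    /* Hypothetical sketch: executing one layer when all weights lie in a
     * single contiguous array. The weight pointer walks straight through
     * memory, which is the cache-friendly access pattern described above. */
    void run_layer(const float *weights, const float *inputs,
                   float *outputs, int num_in, int num_out)
    {
        const float *w = weights;
        for (int j = 0; j < num_out; j++) {
            float sum = 0.0f;
            for (int i = 0; i < num_in; i++)
                sum += *w++ * inputs[i];  /* sequential walk over the weights */
            outputs[j] = sum;             /* activation function omitted */
        }
    }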
The algorithms themselves should obviously use this optimized architecture and
access the data sequentially, as sketched above. The algorithms should also ensure
that all the code,