Engineering Applications of Artificial Intelligence 68 (2018) 53–62
Accelerating nearest neighbor partitioning neural network classifier based
on CUDA
Lin Wang a,*, Xuehui Zhu a, Bo Yang a, Jifeng Guo a, Shuangrong Liu a, Meihui Li a, Jian Zhu c, Ajith Abraham b

a Shandong Provincial Key Laboratory of Network based Intelligent Computing, University of Jinan, Jinan, 250022, China
b Machine Intelligence Research Labs (MIR Labs), Scientific Network for Innovation and Research Excellence, Auburn, WA, 98071, USA
c Department of Radiation Oncology, Shandong Cancer Hospital and Institute, Jinan 250117, China
Article info
Keywords:
Parallel nearest neighbor partitioning
Neural network classifier
Compute unified device architecture
Graphics processing units
Abstract
The nearest neighbor partitioning (NNP) method is a high-performance approach for improving traditional neural network classifiers. However, the construction process of the NNP model is very time-consuming, particularly for large data sets, which limits its range of application. In this study, a parallel NNP method is proposed to accelerate NNP based on the Compute Unified Device Architecture (CUDA). In this method, blocks are used to evaluate potential neural networks, and threads are used to perform parallel subtasks. Experimental results demonstrate that the proposed parallel method improves the performance of the NNP neural network classifier. Furthermore, the application of parallel NNP to performance evaluation of cement microstructure indicates that the proposed approach performs favorably.
© 2017 Elsevier Ltd. All rights reserved.
1. Introduction
Classification is a useful tool in the fields of data mining and machine
learning. It aims to learn a classification function from existing
data or to construct a classification model, i.e., the "classifier". After
learning, the function or model not only maps the records in the
training data to a specific class but is also able to predict unknown
samples. In data mining, "classifier" is a general term for models that
classify samples, including neural networks (Lu et al., 1996;
Misra et al., 2008; Gao and Ji, 2005; Hassan, 2011), support vector
machines (López and Suykens, 2011; Vapnik, 1997; Chua, 2003), rule-based
systems, decision trees (Quinlan, 1986; Freund, 1995), data
gravitation (Peng et al., 2014, 2017), etc. The artificial neural network is
a general model for approximating nonlinear functions; it directly
maps each sample to a centroid that belongs to a class. It has been successfully
applied to many classification tasks (Casta et al., 2011; An et al., 2011;
Avci, 2012; Yaakob and Jain, 2012; Kang and Park, 2009).
In traditional neural networks, the positions, number, and labels
of the centroids are fixed during training. However,
in the optimization process, mapping samples to fixed centroids
reduces the possibility of finding the optimal neural network. The
floating centroid method (FCM) (Wang et al., 2012) was therefore proposed to
solve this problem. Its centroids are obtained through a clustering
* Corresponding author. E-mail address: ise_wanglin@ujn.edu.cn (L. Wang).
method (Hartigan and Wong, 1979; Zhou et al., 2016) and are distributed
automatically over the entire partition space. However, the FCM also
has a limitation: it cannot yield flexible decision boundaries.
The nearest neighbor partitioning (NNP) method (Wang et al., 2017)
overcomes this limitation by generating flexible decision boundaries in
sphere-like zones. The goodness of each candidate neural network in
NNP is evaluated using a nearest-neighbor criterion. NNP easily
yields flexible decision boundaries and partitions, thus increasing
the probability of discovering the optimal solution.
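As a rough illustration of the idea above (a sketch, not the authors' implementation), the following Python snippet shows a simple nearest-neighbor assignment: each mapped sample receives the label of its nearest centroid, so centroids placed freely in the partition space (e.g., by clustering) can carve out flexible, sphere-like partitions. All function names and the toy data are hypothetical.

```python
import numpy as np

def nearest_neighbor_labels(points, centroids, centroid_labels):
    """Assign each mapped point the label of its nearest centroid.

    Unlike fixed, pre-labeled target positions, any set of centroids
    (e.g., obtained by clustering the mapped samples) can define the
    partition, which is the intuition behind the nearest-neighbor
    criterion described above.
    """
    # Pairwise squared Euclidean distances, shape (n_points, n_centroids).
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # Label of the nearest centroid for each point.
    return centroid_labels[d2.argmin(axis=1)]

# Toy example: two centroids per class placed freely in a 2-D partition space.
centroids = np.array([[0.0, 0.0], [2.0, 2.0], [0.0, 2.0], [2.0, 0.0]])
centroid_labels = np.array([0, 0, 1, 1])
points = np.array([[0.1, 0.1], [1.9, 1.8], [0.2, 1.9]])
print(nearest_neighbor_labels(points, centroids, centroid_labels))  # [0 0 1]
```

Because class regions are induced by proximity to whichever centroids exist, a class may occupy several disjoint sphere-like zones, which is what makes the resulting decision boundaries flexible.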
Despite NNP's significant improvement in accuracy, its efficiency,
particularly on large data sets, remains problematic. The
collection of scientific data is becoming easier owing to advances in
experimental devices and technologies, and in recent years the
efficiency of approaches to big data has drawn wide attention.
In NNP, similarity computation is an important procedure of the
nearest-neighbor criterion in the training stage (Baraldi et al.,
2016; Yu et al., 2015), and the distribution of the points it evaluates
in the target function is essential. However, the dramatic
increase in data size raises the complexity of similarity computation
and thus degrades the efficiency of NNP. Furthermore, the efficiency of
the other stages of NNP, including sample mapping, normalization, and
target function calculation, is also strongly influenced by the size of
https://doi.org/10.1016/j.engappai.2017.10.023
Received 26 May 2017; Received in revised form 1 October 2017; Accepted 30 October 2017
Available online 21 November 2017
0952-1976/© 2017 Elsevier Ltd. All rights reserved.