Bearing Fault Diagnosis with a One-Dimensional Vision Transformer and Multiscale Convolution Fusion

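"This paper proposes a one-dimensional Vision Transformer model for bearing fault diagnosis that incorporates multiscale convolution fusion (MCF-1DViT). It aims to solve the problem that traditional fault diagnosis methods based on convolutional neural networks (CNNs) cannot effectively capture the temporal information of rolling bearings. A multiscale convolution fusion (MCF) layer is designed to automatically and effectively extract multiscale features from the collected vibration signals, and an improved Vision Transformer architecture is then introduced to process these features, thereby improving diagnosis performance."

In prognostics and health management (PHM) for industrial equipment, bearing fault diagnosis is a critical task, because bearing faults can lead to serious equipment downtime and economic losses. Traditional CNN-based methods perform well in image recognition and pattern analysis, but when processing one-dimensional time series such as mechanical vibration signals they may not fully exploit temporal continuity and dynamic changes.

The novelty of the proposed MCF-1DViT model lies in combining a one-dimensional Vision Transformer with a multiscale convolution fusion layer. The one-dimensional Vision Transformer (1DViT) adapts the two-dimensional Vision Transformer to one-dimensional sequence data such as sound and vibration signals. Through its self-attention mechanism, the Transformer can capture long-range dependencies in a sequence, which matters for understanding and diagnosing faults in rotating machinery such as rolling bearings.

The multiscale convolution fusion layer strengthens the model's ability to capture features at different time scales. In the vibration signals of mechanical equipment, fault features may appear in different frequency bands or time windows. With multiscale convolution, the model can learn short-term and long-term features at the same time, which makes fault detection more precise. In the paper, the MCF layer and the 1DViT are likely combined by first extracting multiscale features with multiscale convolution and then feeding these features into the Transformer, whose self-attention mechanism integrates global information and further improves the quality of the feature representation.

Experiments of this kind typically compare the model with traditional CNN models and other Transformer variants to demonstrate the advantages of MCF-1DViT in accuracy, robustness and generalization for bearing fault diagnosis. The work offers a new perspective on processing one-dimensional time series, particularly for industrial equipment health monitoring and fault prediction. By combining the learning capacity of the Transformer with the feature extraction capability of multiscale convolution, the MCF-1DViT model provides a useful tool for bearing fault diagnosis and similar applications.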

2021 Global Reliability and Prognostics and Health Management (PHM-Nanjing)

A One-Dimensional Vision Transformer with Multiscale Convolution Fusion for Bearing Fault Diagnosis

Chaoyang Weng

School of Mechanical Engineering

Nanjing University of Science and Technology

Nanjing, China

wengcy@njust.edu.cn

Baochun Lu

School of Mechanical Engineering

Nanjing University of Science and Technology

Nanjing, China

lbcnust@sina.com

Jiachen Yao

School of Mechanical Engineering

Nanjing University of Science and Technology

Nanjing, China

791344334@qq.com

Abstract—Traditional fault diagnosis methods based on convolutional neural networks (CNNs) cannot capture the temporal information of rolling bearing signals. To address this problem, this paper proposes a one-dimensional Vision Transformer with Multiscale Convolution Fusion (MCF-1DViT). A multiscale convolution fusion (MCF) layer is designed to automatically and effectively extract fault features at multiple time scales from the collected vibration signals. An improved Vision Transformer architecture is then introduced to learn long-term temporal information, which can significantly improve diagnosis accuracy and anti-noise ability. Finally, experiments on a popular rolling bearing dataset validate the proposed method. The results show that it achieves superior diagnosis performance compared with existing methods.

Keywords- bearing fault diagnosis; one-dimensional; Vision Transformers; multiscale; self-attention

I. INTRODUCTION

Rotating machinery is widely used in modern industry. It often has to work in harsh environments and under complex working conditions, which leads to various faults [1]. As a key component of rotating machinery, rolling bearings account for 30% of all failures of rotating components [2]. The failure of a rolling bearing can cause huge economic losses and, in severe cases, even endanger the safety of operators [3]. Therefore, an effective intelligent bearing fault diagnosis method is needed.

Recently, deep learning, as an effective approach to automatic feature extraction and classification, has been widely applied in fields such as machine vision and speech recognition [4, 5]. Because it can automatically learn high-level representations of its inputs without manual feature extraction, deep learning has also been applied to fault diagnosis, using models such as deep belief networks (DBNs) [6], convolutional neural networks (CNNs) [7] and residual convolutional networks (ResNet) [8]. Among these methods, the CNN is a typical deep learning architecture based on a special multilayer perceptron neural network, which processes shift-invariant data through convolution and pooling operations [9]. Many scholars have therefore used CNNs for bearing fault diagnosis. For example, Chen et al. [7] used map representations of Cyclic Spectral Coherence as the input of a CNN and greatly improved the recognition performance for bearing faults. Wang et al. [10] combined the symmetrized dot pattern with a CNN for intelligent bearing fault diagnosis. Wen et al. [11] eliminated the effect of manual features by converting the signal directly into two-dimensional (2D) images and feeding them into a novel CNN-based method. However, vibration signals are usually one-dimensional (1D) time-domain signals. Zhang et al. [12] therefore used raw vibration signals as the input of a deep CNN with wide first-layer kernels and obtained better robustness in complex environments. Huang et al. [13] added kernels of different scales to the first layer of a CNN, which can adaptively obtain distinguishable information at multiple time scales. Liu et al. [14] proposed a multi-scale kernel based residual CNN architecture to capture fault features. In summary, CNN-based methods can extract highly localized features via kernels and achieve a certain fault recognition accuracy. However, these methods do not leverage information about the relative or absolute position of elements across the entire raw vibration signal sequence.
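The idea of applying kernels of several widths to the same raw signal can be illustrated with a minimal sketch. It is not the architecture of [13], [14] or of this paper: the kernel sizes, the moving-average weights (standing in for learned weights) and the function names `conv1d_same` and `multiscale_features` are all assumptions made for illustration.

```python
def conv1d_same(signal, kernel):
    """Cross-correlate a 1D signal with a kernel, zero-padding to keep the length."""
    k = len(kernel)
    left = (k - 1) // 2
    right = k - 1 - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def multiscale_features(signal, kernel_sizes=(3, 7, 15)):
    """Apply one kernel per scale and stack the outputs as channels.

    Moving-average kernels replace learned weights (illustrative assumption).
    """
    channels = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k
        channels.append(conv1d_same(signal, kernel))
    return channels  # one row per kernel size, each as long as the input


# A toy "vibration" signal: a single impulse surrounded by zeros.
signal = [0.0] * 20
signal[10] = 1.0
features = multiscale_features(signal)
```

Each channel responds to the same impulse over a different time window (3, 7 and 15 samples here), which is the sense in which such a layer sees the signal at multiple time scales. Every output still depends only on a local neighbourhood of the input.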

Unlike CNN-based methods, which typically use filters with a local receptive field, a newer type of deep learning model called the Transformer [15] relates spatially distant concepts through self-attention in token space. Self-attention can capture long-range relationships between the elements of a sequence and judiciously allocate computation by attending to important regions instead of treating all points equally [16]. Transformers are thus currently considered state-of-the-art models for sequential data, especially in natural language processing (NLP). Inspired by the major success of Transformer architectures in NLP, researchers have introduced Transformers to computer vision tasks. In particular, the Vision Transformer (ViT) [17] was proposed to perform classification by mapping a sequence of image patches to a semantic label. The Transformer employed by the ViT can process different regions of the image and integrate information across the entire image.
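How self-attention relates distant elements can be seen in a minimal sketch of scaled dot-product attention. As a simplifying assumption, the tokens themselves serve as queries, keys and values; a real Transformer layer learns separate projection matrices and uses multiple heads. The function names and toy tokens are illustrative only.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Using the tokens directly as queries, keys and values is a simplification;
    a trained layer would apply learned linear projections first.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[c] for w, v in zip(row, tokens)) for c in range(d)]
        for row in weights
    ]
    return outputs, weights


# Four toy tokens; the first and last are identical although far apart.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

In this toy case the first token attends to the last token as strongly as to itself, and more strongly than to its immediate neighbour. The attention weight depends on content similarity, not on distance within the sequence.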

These observations suggest that ViT has great potential for intelligent fault diagnosis. However, Transformers lack some inductive biases inherent to CNNs, such as translation equivariance and locality, so they do not generalize well when trained on insufficient data. Moreover, vibration signals are 1D time-domain signals, and 2D images reshaped from raw vibration signals cannot reflect the inherent vibration information [18], which makes it difficult to learn meaningful fault features directly.
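One way to see why reshaping can distort a time signal is to track sample indices through both kinds of patching. The sketch below is only an illustration under assumed settings (a 16-sample signal, a row-major 4x4 image, 2x2 tiles); the helper names `patches_1d` and `patches_2d` are hypothetical and the argument of [18] may rest on other considerations.

```python
def patches_1d(signal, patch_len):
    """Cut a 1D signal into non-overlapping, time-contiguous patches."""
    return [
        signal[i:i + patch_len]
        for i in range(0, len(signal) - patch_len + 1, patch_len)
    ]


def patches_2d(signal, width, patch):
    """Reshape row-major into rows of `width`, then cut patch x patch tiles.

    Assumes len(signal) is a multiple of `width`.
    """
    rows = [signal[r:r + width] for r in range(0, len(signal), width)]
    tiles = []
    for top in range(0, len(rows) - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            tile = [
                rows[top + dr][left + dc]
                for dr in range(patch)
                for dc in range(patch)
            ]
            tiles.append(tile)
    return tiles


# Use the sample indices themselves as the "signal" to track where samples go.
indices = list(range(16))
one_d = patches_1d(indices, 4)
two_d = patches_2d(indices, width=4, patch=2)
```

Here the first 1D patch holds the consecutive samples 0 to 3, while the first 2D tile holds samples 0, 1, 4 and 5. Vertically adjacent pixels are a full row width apart in time, so a 2D patch mixes samples that are not neighbours in the original signal.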
