A New Method for Multimodal Image Alignment: Landmark Matching Driven by Linear Mapping


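This is a research paper on multimodal image alignment via linear mapping between feature modalities. The authors, Yanyun Jiang, Yuanjie Zheng, and colleagues from Shandong Normal University and other institutions, address a key problem in computer science: how to establish accurate alignment between images of different modalities, particularly under heavy noise. They propose a landmark-matching strategy that relies on solving a linear mapping between different feature modalities. Conventional image alignment typically depends on correspondences between feature points; the new method instead uses the solved linear mapping to define a new similarity measure between images of different modalities. Rather than considering only the matching of individual feature points, it takes the structure of the whole feature space into account, which improves its ability to handle complex nonlinear, nonrigid spatial transformations. For the implementation, the authors design an algorithmic framework that jointly optimizes the linear mapping and the landmark (keypoint or feature) correspondences by minimizing a convex quadratic function. The approach is described as applicable not only to static images but also to multimodal data involving dynamic images or video sequences, as in medical image analysis, where different imaging techniques may capture different views or time windows. The paper was received in January 2017 and, after review, published in July of the same year, with Saverio Affatato serving as academic editor. Because it is distributed under the Creative Commons Attribution License, readers may use, copy, and distribute the paper without restriction, provided the original work is properly cited. The work offers both a theoretical contribution and practical value for multimodal image analysis: it provides a robust and flexible tool that can improve the quality of cross-modal data fusion and may benefit tasks such as face recognition, object recognition, and medical image processing.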

Research Article

Multimodal Image Alignment via Linear Mapping between Feature Modalities

Yanyun Jiang, Yuanjie Zheng, Sujuan Hou, Yuchou Chang, and James Gee

School of Information Science and Engineering, Key Lab of Intelligent Computing & Information Security in Universities of Shandong, Institute of Life Sciences, Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology and Key Lab of Intelligent Information Processing, Shandong Normal University, Jinan, Shandong 250014, China

Computer Science and Engineering Technology Department, University of Houston-Downtown, Houston, TX 77002, USA

Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA

Correspondence should be addressed to Yuanjie Zheng; zhengyuanjie@gmail.com

Received 8 January 2017; Accepted 10 May 2017; Published 6 July 2017

Academic Editor: Saverio Affatato

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We propose a novel landmark-matching-based method for aligning multimodal images, which is accomplished uniquely by resolving a linear mapping between different feature modalities. This linear mapping yields a new similarity measure for images captured from different modalities. In addition, our method simultaneously solves for this linear mapping and the landmark correspondences by minimizing a convex quadratic function. Our method can estimate complex image relationships between different modalities and nonlinear, nonrigid spatial transformations even in the presence of heavy noise, as shown in our experiments carried out on a variety of image modalities.
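To make the idea of a linear mapping between feature modalities concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the landmark correspondences are already known, so only the mapping is estimated, whereas the paper solves for both jointly. The function names, the small `ridge` term, and the synthetic data are all illustrative assumptions. Fitting the mapping is an ordinary least-squares problem, which is itself a convex quadratic minimization.

```python
import numpy as np

def fit_linear_mapping(feats_a, feats_b, ridge=1e-6):
    """Least-squares map M with feats_a @ M approximately equal to feats_b.

    feats_a: (n, da) features of n landmarks in modality A.
    feats_b: (n, db) features of the corresponding landmarks in modality B.
    ridge:   small Tikhonov term (an assumption, added for numerical stability).
    """
    da = feats_a.shape[1]
    gram = feats_a.T @ feats_a + ridge * np.eye(da)
    # Normal equations of min_M ||feats_a @ M - feats_b||_F^2 + ridge * ||M||_F^2
    return np.linalg.solve(gram, feats_a.T @ feats_b)

def mapped_dissimilarity(feats_a, feats_b, mapping):
    """Squared distance between every mapped A feature and every B feature."""
    mapped = feats_a @ mapping                       # (n_a, db)
    diff = mapped[:, None, :] - feats_b[None, :, :]  # (n_a, n_b, db)
    return np.sum(diff ** 2, axis=-1)                # (n_a, n_b)

# Synthetic example: modality B features are a noisy linear function of A's.
rng = np.random.default_rng(0)
true_map = rng.normal(size=(8, 6))
feats_a = rng.normal(size=(50, 8))
feats_b = feats_a @ true_map + 0.01 * rng.normal(size=(50, 6))

mapping = fit_linear_mapping(feats_a, feats_b)
cost = mapped_dissimilarity(feats_a, feats_b, mapping)
matches = cost.argmin(axis=1)  # nearest B landmark for each A landmark
```

Once a mapping is available, the dissimilarity matrix compares features across modalities in a common space, which is the role the abstract assigns to the new similarity measure.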

1. Introduction

Multimodal/multispectral images acquired from multiple modalities or different spectral bands of the same subject or organ are of great importance for medical diagnosis and computer-aided surgery, benefiting from the complementary information captured by sensors of different modalities/spectra (e.g., magnetic resonance imaging and computed tomography, or multispectral imaging) [1–3]. They are also increasingly used in other fields, such as computer vision and computational photography, where they are acquired via different imaging modalities (e.g., RGB and near infrared) or under various imaging conditions (e.g., flash and no-flash, or depth and color images) [4].

Image alignment resolves spatial correspondences between images and plays a fundamentally important role in practical applications of multimodal images. Various techniques currently exist [4–9] for multimodal image alignment, which can be broadly categorized into feature-based and patch-based methods. The feature-based methods detect sparse salient points and extract features to describe their local photometric/geometric patterns [10, 11]. Unlike alignment of generic images, multimodal image alignment requires the features, together with their similarity measurement, to be able to deal with image variations caused by the modality difference [6]. The patch-based methods measure the similarity between local patches by computing their mutual information [12], cross correlation [4, 6, 13], or their combination [14].
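The two patch similarity measures named above can be sketched as follows. This is an illustrative NumPy implementation of the standard definitions, not code from the cited works; the function names, the histogram bin count, and the synthetic patches are assumptions. The example uses a contrast-inverted copy of a patch as a crude stand-in for a modality change.

```python
import numpy as np

def normalized_cross_correlation(patch_a, patch_b, eps=1e-12):
    """Zero-mean normalized cross correlation of two equally sized patches."""
    a = patch_a.astype(float).ravel()
    b = patch_b.astype(float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def mutual_information(patch_a, patch_b, bins=16):
    """Histogram-based mutual information (in nats) of two patches."""
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    pxy = joint / joint.sum()            # joint intensity distribution
    px = pxy.sum(axis=1, keepdims=True)  # marginal of patch_a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of patch_b, shape (1, bins)
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
inverted = 1.0 - patch            # same structure, inverted contrast
unrelated = rng.random((32, 32))  # independent patch

ncc_inverted = normalized_cross_correlation(patch, inverted)
mi_inverted = mutual_information(patch, inverted)
mi_unrelated = mutual_information(patch, unrelated)
```

For the inverted patch the cross correlation is strongly negative while the mutual information stays high, which illustrates why mutual information is a common choice when intensities in two modalities are related but not linearly increasing.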

Despite the promising results reported in existing papers, multimodal image alignment remains a challenge, mainly due to the complex and unknown relationship between image modalities (as shown by the left two images in Figure 1(c)). The common information between multimodal images is needed for defining image features. However, it is not always trivial to recognize, model, or learn this information in practice due to outliers, large displacement, and the complex relationship [4]. Moreover, the predefined image features can work well only when the corresponding measurement of the feature similarity fits these features, which is not always an easy task in practice. Finally, the definition of image feature and similarity is independent

Hindawi

Journal of Healthcare Engineering

Volume 2017, Article ID 8625951, 6 pages

https://doi.org/10.1155/2017/8625951

