[Figure 2 panels: (a) Activation, (b) Input, (c) Output, (d) NeRF (Input View to Novel View)]
Figure 2. The figure illustrates various modifications to alleviate the spectral bias problem in INRs, provides an overview of their underlying principles, and introduces NeRF as an additional background method, as discussed in Section 2.
(INR | medical | NeRF). Here, Task refers to one
of the applications covered (Figure 4). To ensure the
selection of relevant papers, we conducted a meticulous
evaluation based on factors such as novelty, contribution,
and significance. Priority was given to papers that were
pioneering in the field of medical imaging. Subsequently,
we selected papers with the highest rankings for further
examination.
2. Background
Implicitly representing signals with neural networks has
gathered pace in recent years. Instead of parametrizing sig-
nals with discrete representations such as grids, voxels, point
clouds, and meshes, a simple MLP can be learned to contin-
uously represent the signal of interest as an implicit function
Ψ : x → Ψ(x), mapping their spatial coordinates x ∈ R^M from the M-dimensional space to their corresponding N-dimensional values Ψ(x) ∈ R^N (e.g., occupancy, color, etc.). While
INRs have shown promise, they can fail to encode high-
frequency details compared to discrete representations, lead-
ing to a suppressed representation quality. Rahaman et al.
[39] have made significant strides in uncovering the limitations of conventional ReLU-based MLPs in accurately representing fine details of underlying signals. These
MLPs have shown a propensity to learn low-frequency de-
tails, leading to a phenomenon known as spectral bias in
piece-wise linear networks. In order to address this issue,
several approaches have been explored to redirect the network's focus toward high-frequency components and to represent the signal at a finer level of detail.
To enhance the representation of the input signal, three av-
enues can be pursued within an MLP framework based on its
structure. Firstly, one can consider changing the input type
by mapping it to a higher-dimensional space to enable the
network to capture more intricate details within the signal.
Secondly, one can replace the ReLU activation function with an alternative that better facilitates the learning of high-frequency components.
Lastly, one can explore altering the output of the MLP to
a higher-dimensional space, where each node is responsi-
ble for reconstructing a specific part of the signal. In this
section, we will provide background on the modifications that can be made to mitigate the spectral bias issue. Additionally, we will cover a neural volume rendering
model called NeRF [34] as a pioneering approach to bridge
implicit representations and novel view synthesis. Figure 2 provides an overview of this background.
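To make the notion of an implicit function concrete, the sketch below fits a plain ReLU-based coordinate MLP to a 2-D RGB image (M = 2, N = 3). It is a minimal illustration under assumed layer sizes and training settings, not an implementation from any of the surveyed papers, and, being a ReLU MLP on raw coordinates, it exhibits exactly the spectral bias discussed above.

import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Implicit function Psi: R^M -> R^N realized as a plain ReLU MLP."""
    def __init__(self, in_dim=2, out_dim=3, hidden=256, depth=4):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, coords):          # coords: (..., M) in [0, 1]
        return self.net(coords)         # values: (..., N), e.g. RGB

def fit_image(image, steps=2000, lr=1e-4):
    """Regress pixel colors from normalized (row, col) coordinates."""
    h, w, _ = image.shape
    ys, xs = torch.meshgrid(torch.linspace(0, 1, h),
                            torch.linspace(0, 1, w), indexing="ij")
    coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)
    targets = image.reshape(-1, 3)
    model = CoordinateMLP()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(coords) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    return model                        # model(coords) now approximates the image

Because the continuous representation lives entirely in the MLP weights, the fitted signal can afterwards be queried at arbitrary, even off-grid, coordinates.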
2.1. Input
The conventional approach in INR treats the spatial co-
ordinate of each element in the signal, such as pixels in an
image, as the input to an MLP. However, this approach tends
to learn low-frequency functions, limiting its ability to effec-
tively represent complex signals. To address this limitation,
recent progress suggests using a sinusoidal mapping of the
Cartesian coordinates to a higher-dimensional space, which
enables the learning of high-frequency details more effec-
tively [55]:
1. Basic: γ(v) = [cos(2πv), sin(2πv)]^T.
2. PE: γ(v) = [..., cos(2πσ^(j/m) v), sin(2πσ^(j/m) v), ...]^T for j = 0, ..., m − 1. PE denotes Positional Encoding,
and the scale σ is determined for individual tasks and
datasets through a process of hyperparameter sweep.
3. Gaussian: γ(v) = [cos(2πBv), sin(2πBv)]^T, where the variable v represents the signal coordinates, while B is a random Gaussian matrix, where each entry is independently sampled from a normal distribution N(0, σ²). Similarly, the scale σ is selected through a
hyperparameter sweep for each task and dataset.
These encoding processes are known as Fourier feature mapping.
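As a concrete illustration, the sketch below implements the three mappings in PyTorch. The function names, the default scale σ, and the number of frequencies m are assumptions made for the example rather than values prescribed by [55]; the encoded coordinates γ(v), rather than the raw coordinates, are then fed to the MLP.

import math
import torch

def basic_mapping(v):
    # γ(v) = [cos(2πv), sin(2πv)]^T
    return torch.cat([torch.cos(2 * math.pi * v),
                      torch.sin(2 * math.pi * v)], dim=-1)

def positional_encoding(v, sigma=10.0, m=6):
    # γ(v) = [..., cos(2π σ^(j/m) v), sin(2π σ^(j/m) v), ...]^T, j = 0..m-1
    feats = []
    for j in range(m):
        freq = 2 * math.pi * sigma ** (j / m)
        feats += [torch.cos(freq * v), torch.sin(freq * v)]
    return torch.cat(feats, dim=-1)

def gaussian_mapping(v, b):
    # γ(v) = [cos(2πBv), sin(2πBv)]^T with B ~ N(0, σ²), sampled once and kept fixed
    proj = 2 * math.pi * v @ b.T
    return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

# Example: map 2-D coordinates to 512-D Gaussian features before the MLP.
coords = torch.rand(1024, 2)            # v: signal coordinates
B = torch.randn(256, 2) * 10.0          # σ = 10, illustrative value from a sweep
features = gaussian_mapping(coords, B)  # shape (1024, 512)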
2.2. Activation Function
In general, the intuition behind activation functions is to
apply non-linearity to the neural network. As for implicit