[Figure 2 panels: (a) Activation, (b) Input, (c) Output, (d) NeRF (Input View to Novel View)]
Figure 2. The figure illustrates various modifications to alleviate the spectral bias problem in INRs, provides an overview of their underlying principles, and introduces NeRF as an additional background method, as discussed in Section 2.
(INR | medical | NeRF). Here, Task refers to one
of the applications covered (Figure 4). To ensure the
selection of relevant papers, we conducted a meticulous
evaluation based on factors such as novelty, contribution,
and significance. Priority was given to papers that were
pioneering in the field of medical imaging. Subsequently,
we selected papers with the highest rankings for further
examination.
2. Background
Implicitly representing signals with neural networks has
gathered pace in recent years. Instead of parametrizing sig-
nals with discrete representations such as grids, voxels, point
clouds, and meshes, a simple MLP can be learned to contin-
uously represent the signal of interest as an implicit function
Ψ : x → Ψ(x), mapping their spatial coordinates x ∈ R^M from the M-dimensional space to their corresponding N-dimensional values Ψ(x) ∈ R^N (e.g., occupancy, color, etc.). While
INRs have shown promise, they can fail to encode high-
frequency details compared to discrete representations, lead-
ing to a suppressed representation quality. Rahaman et al.
[39] have made significant strides in uncovering the limitations of conventional ReLU-based MLPs in accurately representing fine details of underlying signals. These
MLPs have shown a propensity to learn low-frequency de-
tails, leading to a phenomenon known as spectral bias in
piece-wise linear networks. In order to address this issue,
several approaches have been explored to redirect the network's focus toward high-frequency components and to represent the signal at a finer level of detail.
To enhance the representation of the input signal, three av-
enues can be pursued within an MLP framework based on its
structure. Firstly, one can consider changing the input type
by mapping it to a higher-dimensional space to enable the
network to capture more intricate details within the signal.
Secondly, one can replace the ReLU activation function with an alternative that better facilitates the learning of high-frequency components.
Lastly, one can explore altering the output of the MLP to
a higher-dimensional space, where each node is responsi-
ble for reconstructing a specific part of the signal. In this
section, we will provide background on the modifications that can be made to mitigate the spectral bias issue. Additionally, we will cover a neural volume rendering
model called NeRF [34] as a pioneering approach to bridge
implicit representations and novel view synthesis. Figure 2 provides an overview of this background.
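To make the notion of an implicit function concrete, the sketch below fits a plain ReLU-based coordinate MLP to a 2-D RGB image (M = 2, N = 3). It is a minimal illustration under assumed layer sizes and training settings, not an implementation from any of the surveyed papers, and, being a ReLU MLP on raw coordinates, it exhibits exactly the spectral bias discussed above.

import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Implicit function Psi: R^M -> R^N realized as a plain ReLU MLP."""
    def __init__(self, in_dim=2, out_dim=3, hidden=256, depth=4):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, coords):          # coords: (..., M) in [0, 1]
        return self.net(coords)         # values: (..., N), e.g. RGB

def fit_image(image, steps=2000, lr=1e-4):
    """Regress pixel colors from normalized (row, col) coordinates."""
    h, w, _ = image.shape
    ys, xs = torch.meshgrid(torch.linspace(0, 1, h),
                            torch.linspace(0, 1, w), indexing="ij")
    coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)
    targets = image.reshape(-1, 3)
    model = CoordinateMLP()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(coords) - targets) ** 2).mean()
        loss.backward()
        opt.step()
    return model                        # model(coords) now approximates the image

Because the continuous representation lives entirely in the MLP weights, the fitted signal can afterwards be queried at arbitrary, even off-grid, coordinates.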
2.1. Input
The conventional approach in INR treats the spatial co-
ordinate of each element in the signal, such as pixels in an
image, as the input to an MLP. However, this approach tends
to learn low-frequency functions, limiting its ability to effec-
tively represent complex signals. To address this limitation,
recent progress suggests using a sinusoidal mapping of the
Cartesian coordinates to a higher-dimensional space, which
enables the learning of high-frequency details more effec-
tively [55]:
1. Basic: γ(v) = [cos(2πv), sin(2πv)]^T.
2. PE: γ(v) = [..., cos(2πσ^(j/m) v), sin(2πσ^(j/m) v), ...]^T for j = 0, ..., m − 1. PE denotes Positional Encoding,
and the scale σ is determined for individual tasks and
datasets through a process of hyperparameter sweep.
3. Gaussian: γ(v) = [cos(2πBv), sin(2πBv)]^T, where the variable v represents the signal coordinates, while B is a random Gaussian matrix, where each entry is independently sampled from a normal distribution N(0, σ²). Similarly, the scale σ is selected through a
hyperparameter sweep for each task and dataset.
These encoding processes are known as Fourier feature mapping.
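As a concrete illustration, the sketch below implements the three mappings in PyTorch. The function names, the default scale σ, and the number of frequencies m are assumptions made for the example rather than values prescribed by [55]; the encoded coordinates γ(v), rather than the raw coordinates, are then fed to the MLP.

import math
import torch

def basic_mapping(v):
    # γ(v) = [cos(2πv), sin(2πv)]^T
    return torch.cat([torch.cos(2 * math.pi * v),
                      torch.sin(2 * math.pi * v)], dim=-1)

def positional_encoding(v, sigma=10.0, m=6):
    # γ(v) = [..., cos(2π σ^(j/m) v), sin(2π σ^(j/m) v), ...]^T, j = 0..m-1
    feats = []
    for j in range(m):
        freq = 2 * math.pi * sigma ** (j / m)
        feats += [torch.cos(freq * v), torch.sin(freq * v)]
    return torch.cat(feats, dim=-1)

def gaussian_mapping(v, b):
    # γ(v) = [cos(2πBv), sin(2πBv)]^T with B ~ N(0, σ²), sampled once and kept fixed
    proj = 2 * math.pi * v @ b.T
    return torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)

# Example: map 2-D coordinates to 512-D Gaussian features before the MLP.
coords = torch.rand(1024, 2)            # v: signal coordinates
B = torch.randn(256, 2) * 10.0          # σ = 10, illustrative value from a sweep
features = gaussian_mapping(coords, B)  # shape (1024, 512)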
2.2. Activation Function
In general, the intuition behind activation functions is to
apply non-linearity to the neural network. As for implicit