Snapshot hyperspectral light field imaging: a new method for acquiring the complete information of a hyperspectral light field
Snapshot Hyperspectral Light Field Imaging

Zhiwei Xiong¹, Lizhi Wang², Huiqun Li¹, Dong Liu¹, Feng Wu¹
¹University of Science and Technology of China  ²Beijing Institute of Technology

Abstract

This paper presents the first practical snapshot hyperspectral light field imager. Specifically, we design a novel hybrid camera system to acquire two complementary measurements that sample the angular and spectral dimensions, respectively. To recover the full 5D hyperspectral light field from the severely undersampled measurements, we propose an efficient computational reconstruction algorithm that exploits the large correlations across the angular and spectral dimensions through self-learned dictionaries. Simulation on an elaborately designed hyperspectral light field dataset validates the effectiveness of the proposed approach. Hardware experimental results demonstrate that, for the first time to our knowledge, a 5D hyperspectral light field containing 9×9 angular views and 27 spectral bands can be acquired in a single shot.

1. Introduction

Computational imaging has made tremendous progress during the past decades, owing to the rapid development of optical instruments and the explosive growth of computing power. The ultimate goal of computational imaging is to resolve all seven dimensions of the plenoptic function simultaneously, i.e., 3D in space (2D plane plus 1D depth), 1D in time, 1D in spectrum, and 2D in angle [33]. The dimensions beyond conventional digital imaging, i.e., depth, spectrum, and angle, have been extensively explored in the literature, ranging from depth/3D imaging and multi/hyperspectral imaging to light field imaging [4, 34, 8, 12]. Moreover, commercial depth cameras (e.g., Kinect) and light field cameras (e.g., Lytro) are already in daily use, which provides new opportunities for solving difficult computer vision tasks and enables new applications as well [11, 22, 17].

Figure 1. Illustration of the snapshot hyperspectral light field imager.

The trend in computational imaging is to integrate higher plenoptic dimensions together while maintaining the respective resolutions as much as possible. Following this trend, the current research frontier extends to depth acquisition from light fields [13, 30], light field super-resolution [31, 35], hyperspectral video acquisition [20, 28], hyperspectral 3D imaging [14, 29], and so on. Thanks to the large correlations across different optical dimensions, it is possible to recover high-dimensional optical information from severely undersampled measurements. Still, elaborately designed hardware systems and computational reconstruction algorithms are required to guarantee decent performance.

In this paper, we explore a new direction of computational imaging, i.e., snapshot hyperspectral light field imaging, which is an important step toward covering all dimensions of the plenoptic function. Previously, acquiring a hyperspectral light field had to be done in a scanning manner, either along both the angular and spectral dimensions sequentially using a spectrometer mounted on a gantry [32], or along the spectral dimension alone using a microlens array and a tunable filter [16]. However, such scanning methods are not applicable in time-critical scenarios, e.g., for scenes with dynamic objects or varying illumination. How to acquire a hyperspectral light field without sacrificing temporal resolution remains a key challenge.

To this end, we design a novel hybrid camera system, as shown in Figure 1, which consists of an off-the-shelf light field camera (Lytro) and a coded aperture snapshot spectral imager (CASSI). The two branches are co-located through a beam splitter and calibrated in the spatial, angular, and spectral dimensions. The incident light from the scene is equally divided by the beam splitter and then captured by Lytro and CASSI, respectively. The RGB light field obtained by Lytro contains the angular information of the scene but lacks spectral resolution, while the compressive measurement obtained by CASSI encodes the hyperspectral information of the scene but lacks angular resolution. The hybrid imager thus provides complementary measurements for recovering a 5D hyperspectral light field with both high angular and high spectral resolution, as shown in Figure 1.

However, recovering the full 5D hyperspectral light field from such undersampled measurements is a severely underdetermined problem, due to the large dimensionality gap. The correlations across the angular and spectral dimensions should be exploited to assist the reconstruction. A key observation here is that the 5D hyperspectral light field can be regarded as a concatenation of 4D band-wise light fields, and each band-wise light field shares similar structures to one of the blue, green, and red light fields obtained by Lytro, according to the spectral proximity. The RGB light field therefore provides a strong prior for the sparse representation of the band-wise light fields. Specifically, three over-complete 4D dictionaries can be learned from the RGB light field to sparsely represent each band-wise light field. We then propose to formulate the hyperspectral light field reconstruction as a sparsity-constrained optimization problem, given the undersampled measurements and the self-learned dictionaries.

For evaluation purposes, we prepare a hyperspectral light field dataset by scanning a set of static scenes with a spectrometer mounted on a gantry. This dataset is used to validate the effectiveness of the computational reconstruction algorithm through simulation, as well as to optimize the parameters used in the reconstruction. We then conduct hardware experiments using the developed hybrid camera system and the proposed reconstruction algorithm. Both simulation and hardware experimental results demonstrate that, for the first time to our knowledge, a 5D hyperspectral light field with both high angular and high spectral resolution can be obtained in a single shot.

The main contributions of this work can be summarized in three aspects: (1) the first hardware system for snapshot hyperspectral light field acquisition; (2) an effective computational reconstruction algorithm that recovers the full 5D hyperspectral light field from severely undersampled measurements; (3) an elaborately crafted hyperspectral light field dataset, which will be made publicly available for developing new computational imaging systems and algorithms. Moreover, it is worth mentioning that, owing to its snapshot nature, the proposed approach can be naturally extended to 6D hyperspectral light field video acquisition. In this sense, the proposed approach takes a big step toward the ultimate goal of resolving the 7D plenoptic function simultaneously.
2. Related work

Light field imaging. Light field imaging technology has matured into commercial cameras (e.g., Lytro and Raytrix) available to both consumers and laboratories. In general, these cameras deliver the RGB light field of a scene, gaining angular resolution at the cost of reduced spatial resolution on the detector. Alternatively, compressive light field photography [21] does not sacrifice spatial resolution, but it requires an over-complete 4D dictionary learned from external light field databases to assist the computational reconstruction. Our proposed system uses Lytro to directly capture the RGB light field, from which three over-complete 4D dictionaries are learned to exploit the correlations across the angular and spectral dimensions. These self-learned dictionaries guarantee high sparsity when representing the band-wise light fields to be recovered.

Snapshot hyperspectral imaging. Various snapshot hyperspectral imaging prototypes have been developed recently, among which CASSI [2, 25] and its variants [18, 19] demonstrate impressive performance and attract increasing attention. CASSI optically encodes the 3D spectral information onto a 2D detector using a coded aperture and a disperser, relying on compressive sensing theory and computational reconstruction to recover the full hyperspectral image, which supports video recording of dynamic scenes [26]. To improve the reconstruction fidelity of CASSI while maintaining its snapshot advantage, dual-camera designs of CASSI have been proposed [27, 28], where the measurements of CASSI and a co-located grayscale camera are jointly used for hyperspectral reconstruction. Inspired by this, our proposed system integrates Lytro with CASSI to acquire hyperspectral light fields without sacrificing temporal resolution.

Hybrid imaging. Hybrid camera systems have been used on many occasions to break through the capability limits of a single camera. For example, a high-speed low-resolution camera and a low-speed high-resolution camera can be combined for high-speed high-resolution imaging [3]; a low-resolution hyperspectral imager can be used together with a high-resolution RGB camera for high-resolution hyperspectral imaging [20]; and a low-resolution light field camera can be combined with a high-resolution RGB camera for high-resolution light field imaging [6]. Our proposed system inherits the idea of hybrid imaging for hyperspectral light field acquisition, bridging the angular and spectral dimensions for the first time.

3. System principle

Figure 2 shows the data flow of the proposed system. After the beam splitter, the Lytro branch captures an RGB light field containing the angular information of the scene.

Figure 2. Data flow of the proposed system: the beam splitter feeds the Lytro branch (micro lens array), which yields the RGB light field used for 4D dictionary learning, and the CASSI branch (coded aperture and dispersive prism), which yields the compressive measurement; computational reconstruction then produces the hyperspectral light field.

4. Computational reconstruction

Due to the large dimensionality gap, recovering the full 5D hyperspectral light field F from its undersampled measurements G is a severely underdetermined problem and has rarely been investigated before. To tackle this problem, we exploit the large correlations across the angular and spectral dimensions of the 5D signal through the sparsity prior. Specifically, the 5D hyperspectral light field can be regarded as a concatenation of 4D band-wise light fields. If we treat the RGB light field from Lytro as three separate light fields, then each 4D band-wise light field shares similar structures to one of them according to the spectral proximity. For example, a band-wise light field falling in the green spectrum is observed to have similar structures to the green light field.
Therefore, the RGB light field can be used to learn three over-complete 4D dictionaries to sparsely represent each 4D band-wise light field.

To learn the three over-complete dictionaries, we randomly sample a number of 4D patches sized m = w × h × s × s from the blue, green, and red light fields, separately. The corresponding dictionary Dk ∈ R^(m×n) (k = 1, 2, 3) is then derived by KSVD [1], where n (n > m) is the number of atoms (i.e., vectorized 4D patches) remaining in the dictionary. These self-learned dictionaries ensure high sparsity when they are used to represent the band-wise light fields to be recovered. Once the dictionaries are ready, the 5D hyperspectral light field can be sparsely represented as

F = [F1, F2, ..., FΩ]^T = [D1, D2, D3] ∘ [α1, α2, ..., αΩ]^T    (7)

where Fλ (1 ≤ λ ≤ Ω) denotes a band-wise light field, αλ (1 ≤ λ ≤ Ω) denotes the sparse coefficient vector that represents Fλ on Dk, and the operation ∘ is defined as

F = [D1(α1, ..., αi), D2(αi+1, ..., αj), D3(αj+1, ..., αΩ)]^T    (8)

where 1 ≤ i < j ≤ Ω specify which dictionary should be used for each band-wise light field and are determined by the spectral response of the Lytro detector. A more simplified expression is

F = D ∘ α    (9)

where D is composed of {Dk} (k = 1, 2, 3) and α is the concatenation of {αλ} (1 ≤ λ ≤ Ω).

According to compressive sensing theory [7, 9], F can be recovered by solving the following optimization problem instead:

α̂ = arg min_α ||G − ΦD ∘ α||²_2 + τ||α||_0    (10)

where τ is a regularization parameter. This optimization problem can be efficiently solved by employing the orthogonal matching pursuit algorithm [24].

Figure 3. (a) The platform for preparing the hyperspectral light field dataset. (b)-(d) Three static scenes for generating the dataset: Toys, Boards, and Fruits.

Figure 4. Spectral sensitivity curves of the Lytro (left) and CASSI (right) detectors used in our hardware system.
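The dictionary-based recovery pipeline of Eqs. (7)-(10) can be sketched numerically as follows. This is a minimal toy example, not the paper's actual operators: the "patches" are synthetic vectors drawn from a low-dimensional subspace rather than real 4D light field patches, scikit-learn's MiniBatchDictionaryLearning stands in for KSVD, and the sensing matrix Phi is a random stand-in for the CASSI observation matrix. All sizes are illustrative.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)

# Toy stand-in for vectorized 4D light-field patches: m-dimensional vectors
# drawn from an 8-dimensional subspace (real patches come from the RGB light field).
m, n_patches = 64, 3000                 # patch dim (e.g. 6*6*S*S in the paper)
basis = rng.standard_normal((8, m))
patches = rng.standard_normal((n_patches, 8)) @ basis

# Learn an over-complete dictionary with n = 2m atoms, as in the paper.
# MiniBatchDictionaryLearning is used here as a stand-in for KSVD.
n_atoms = 2 * m
dico = MiniBatchDictionaryLearning(n_components=n_atoms, random_state=0).fit(patches)
D = dico.components_.T                  # shape (m, n_atoms)

# Compressive measurement of an unseen patch: y = Phi @ f, with f = D @ alpha.
f = rng.standard_normal(8) @ basis      # ground-truth patch
Phi = rng.standard_normal((32, m))      # toy sensing matrix (stands in for CASSI)
y = Phi @ f

# Solve Eq. (10) greedily: sparse code over the effective dictionary Phi @ D
# via orthogonal matching pursuit, then synthesize the patch as f_hat = D @ alpha.
alpha = orthogonal_mp(Phi @ D, y, n_nonzero_coefs=10)
f_hat = D @ alpha
print("relative error:", np.linalg.norm(f_hat - f) / np.linalg.norm(f))
```

In practice the sparsity penalty in Eq. (10) is the l0 pseudo-norm, which is exactly what OMP's `n_nonzero_coefs` cap enforces; the band index of each patch would additionally select which of the three dictionaries to use.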
5. Simulation

5.1. Dataset

Since hyperspectral light field data is rarely available in public, we prepare a hyperspectral light field dataset ourselves for evaluation purposes. The dataset is obtained by scanning three static scenes using a spectrometer (Senop Rikola) mounted on a gantry. The platform used to collect the data is shown in Figure 3(a). The spectrometer uses a liquid crystal tunable filter and captures a narrow-band spectral image with up to 1nm bandwidth in a single shot. The gantry supports 2D translation of the spectrometer with 0.01mm precision in each direction. The three scenes, as shown in Figure 3(b)-(d), are placed at a distance of around 1m to the platform and contain a variety of materials with diverse geometry and reflectance characteristics. The raw data for each scene contains 9 × 9 angular views and 25 spectral bands (ranging from 450nm to 690nm in 10nm increments) at a spatial resolution of 512 × 512. In total, we capture 2025 images for each scene with the spectrometer placed at different locations (the distance between two neighboring views is 10mm). Note that a one-time calibration is needed in advance to address the extrinsic and intrinsic camera parameters for rectifying the captured images.

5.2. Algorithm evaluation

The spectral sensitivity curves of the Lytro and CASSI detectors used in our hardware system are shown in Figure 4, which are discretized to generate the required spectral response functions for simulation. In addition, for the CASSI branch, the transmission function of the coded aperture is generated as a random Bernoulli distribution with p = 0.5, and the dispersion function of the prism is assumed to be a linear distribution for simplicity. For the Lytro branch, we further divide the spectral coordinate into three intervals, 450-530nm, 540-600nm, and 610-690nm, which correspond to the blue, green, and red channels and specify the dictionary that should be used for recovering a certain band-wise light field (i.e., determining i and j in Eq. 8).
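The simulated CASSI branch described above can be sketched as follows: a Bernoulli (p = 0.5) coded aperture masks each band, a linear dispersion shears the bands before they sum onto the 2D detector, and the spectral intervals assign each band to a dictionary. The cube size and the one-pixel-per-band dispersion step are assumptions made for illustration, not the paper's calibrated values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hyperspectral slice: one angular view as an (H, W, bands) cube,
# with 25 bands as in the simulation dataset.
H, W, n_bands = 64, 64, 25
cube = rng.random((H, W, n_bands))

# Coded aperture: random Bernoulli transmission mask with p = 0.5.
mask = (rng.random((H, W)) < 0.5).astype(float)

def cassi_measure(cube, mask):
    """CASSI forward model sketch: mask each band, shear it by a
    band-dependent shift (linear dispersion, one pixel per band assumed),
    and sum all sheared bands onto the 2D detector."""
    H, W, n_bands = cube.shape
    g = np.zeros((H, W + n_bands - 1))  # detector widened by the dispersion
    for b in range(n_bands):
        g[:, b:b + W] += mask * cube[:, :, b]
    return g

g = cassi_measure(cube, mask)
print(g.shape)  # (64, 88)

# Band-to-dictionary mapping used by the reconstruction: 450-530nm bands
# use the blue dictionary, 540-600nm green, 610-690nm red.
wavelengths = np.arange(450, 700, 10)              # 25 band centers
dict_index = np.digitize(wavelengths, [535, 605])  # 0: blue, 1: green, 2: red
```

Note that the single 2D measurement `g` compresses the whole spectral dimension, which is why the dictionary prior of Section 4 is needed to invert it.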
For simulation, we test three different angular resolutions of S = 5, 7, 9. The spectral resolution remains Ω = 25, and the spatial resolution is slightly cropped after rectification. The parameters used in our proposed dictionary-based reconstruction (DBR) algorithm are selected optimally through a cross-validation process. For dictionary learning, the 4D patch size is set as m = 6 × 6 × S × S, and 30000 patches are randomly sampled from each of the blue, green, and red light fields, respectively. After KSVD, there are n = 2m atoms remaining in the dictionary. The maximum iteration number of DBR is set to 80, and τ is set to 0.004, 0.002, and 0.0005 when S equals 5, 7, and 9, respectively. For comparison, we also generate the reconstruction results using the two-step iterative shrinkage/thresholding (TwIST) algorithm [5] along with the total variation regularizer.

Quantitative evaluation. Two quantitative image quality metrics, peak signal-to-noise ratio (PSNR) and spectral angle mapping (SAM) [15], are adopted to evaluate the reconstruction fidelity. PSNR measures the spatial fidelity of the reconstruction; it is calculated on each 2D spatial image and then averaged over the spectral and angular dimensions. SAM measures the spectral fidelity of the reconstruction; it is calculated on each 1D spectral vector and then averaged over the spatial and angular dimensions. The PSNR and SAM results of TwIST and DBR are reported in Table 1. It can be seen that both TwIST and DBR decently recover the 5D hyperspectral light field at different angular resolutions, which demonstrates the feasibility of our proposed hybrid imaging model. Moreover, DBR outperforms TwIST with an average 2.45dB gain in PSNR and a 15% decrease in SAM (a smaller SAM indicates a higher fidelity reconstruction), which validates the effectiveness of the sparsity-constrained reconstruction using the self-learned dictionaries.
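The two metrics and their averaging scheme can be sketched as follows. The 5D axis ordering (view row, view column, height, width, band) and the toy data are assumptions for illustration; SAM is reported here in radians.

```python
import numpy as np

def psnr(ref, rec, peak=1.0):
    """PSNR of one 2D spatial image against its reference."""
    mse = np.mean((ref - rec) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

def sam(ref, rec, eps=1e-12):
    """Spectral angle (radians) between two 1D spectral vectors."""
    cos = np.dot(ref, rec) / (np.linalg.norm(ref) * np.linalg.norm(rec) + eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Toy 5D hyperspectral light field (views_y, views_x, H, W, bands)
# and a lightly perturbed "reconstruction" of it.
rng = np.random.default_rng(0)
gt = rng.random((5, 5, 16, 16, 25))
rec = np.clip(gt + 0.01 * rng.standard_normal(gt.shape), 0, 1)

# PSNR: per 2D spatial image, averaged over spectral and angular dimensions.
psnr_avg = np.mean([psnr(gt[u, v, :, :, b], rec[u, v, :, :, b])
                    for u in range(5) for v in range(5) for b in range(25)])

# SAM: per 1D spectral vector, averaged over spatial and angular dimensions.
sam_avg = np.mean([sam(gt[u, v, y, x, :], rec[u, v, y, x, :])
                   for u in range(5) for v in range(5)
                   for y in range(16) for x in range(16)])
print(psnr_avg, sam_avg)
```

A smaller SAM indicates higher spectral fidelity, which matches how the scores in Table 1 are read.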
Qualitative evaluation. Figure 5 shows one selected band from the central view of the reconstructed hyperspectral light field for each scene when S equals 5.

Table 1. Quantitative evaluation of two reconstruction methods at different angular resolutions.

| Views | Scene | PSNR (TwIST) | PSNR (DBR) | SAM (TwIST) | SAM (DBR) |
|---|---|---|---|---|---|
| 5 × 5 | Boards | 34.44 | 37.29 | 0.097 | 0.079 |
| | Toys | 36.43 | 38.60 | 0.077 | 0.065 |
| | Fruits | 36.35 | 38.86 | 0.074 | 0.065 |
| 7 × 7 | Boards | 33.98 | 36.80 | 0.098 | 0.082 |
| | Toys | 36.18 | 38.12 | 0.078 | 0.066 |
| | Fruits | 35.97 | 37.90 | 0.077 | 0.067 |
| 9 × 9 | Boards | 32.99 | 36.30 | 0.108 | 0.092 |
| | Toys | 35.71 | 38.00 | 0.087 | 0.074 |
| | Fruits | 35.72 | 38.01 | 0.082 | 0.072 |
| Average | | 35.31 | 37.76 | 0.086 | 0.073 |

We can see that, on the one hand, the original image is decently recovered under the hybrid imaging model through either TwIST or DBR. On the other hand, as can be easily observed from the zoom-in results, TwIST tends to smear out the object details due to its local smoothness prior, while DBR better preserves the object details by further exploiting the correlations across the angular and spectral dimensions through the self-learned dictionaries.

View-wise and band-wise evaluation. For an inspection across the angular and spectral dimensions, Figure 6 shows the view-wise and band-wise PSNR results of the DBR reconstruction of the Boards scene when S equals 5. As demonstrated, in terms of the spectral dimension, the PSNR of each band in a certain view varies with wavelength, due to the non-uniform spectral sensitivity of the detectors. In terms of the angular dimensions, all views share a similar PSNR distribution, as they are treated equally in simulation. In practice, due to the vignetting effect of the micro-lens array in the light field camera, the central view generally has higher reconstruction fidelity than the corner views, as can be seen from the hardware experimental results.

Spectral signature evaluation. For a more comprehensive comparison, Figure 7 shows the recovered spectral signatures at two selected spatial points from the central view of the Boards scene when S equals 5.
Besides TwIST and DBR, we also generate the spectral signature through interpolation from the RGB values [23]. As can be seen, the interpolation results have a large deviation from the groundtruth, while the DBR results are the closest to the groundtruth. Table 2 gives the corresponding root-mean-square-error (RMSE) of the recovered signatures by different methods, which confirms the superiority of DBR.

Figure 5. Reconstruction results of one selected band from the central view of three scenes (5 × 5 views). From top to bottom: Boards (620nm), Toys (570nm), and Fruits (520nm). (Please see the electronic version for better visualization.)

Figure 6. View-wise and band-wise PSNR results of the DBR reconstruction of the Boards scene (5 × 5 views). Horizontal axis for wavelength and vertical axis for PSNR in each view.

Figure 7. Spectral signatures at two selected spatial points from the central view of the Boards scene (5 × 5 views).

6. Experiments

6.1. Hardware system

Figure 8 demonstrates the prototype system we have developed for snapshot hyperspectral light field imaging. The incident light from the scene is equally divided by a beam splitter and captured by Lytro and CASSI, respectively. The Lytro branch captures an RGB light field with 9 × 9 views at a spatial resolution of 380 × 380. The main objective lens of Lytro has a fixed f-number of f/2 [10].
Table 2. RMSE of spectral signatures at two points in Figure 7.

| Point | DBR | TwIST | Interpolation |
|---|---|---|---|
| (a) | 0.0029 | 0.0038 | 0.0106 |
| (b) | 0.0031 | 0.0050 | 0.0132 |

In the CASSI branch, an 8mm objective lens is used to project the scene onto a coded aperture, for which the f-number is also set to f/2 to match with Lytro. The manufactured coded aperture is a random binary pattern with 300 × 300 elements, and each element has a size of 10µm × 10µm. A double Amici prism vertically disperses the spectrum with the center wavelength at 550nm. Each element on the coded aperture is mapped to 2 × 2 pixels on a panchromatic detector (PointGrey FL3-U3-13Y3M-C) by a relay lens (Edmund 45-762), so the spatial resolution of the CASSI measurement is 600 × 600. An optical filter with a passband from 500nm to 700nm is used in each branch to restrict the spectrum to the same range.

Figure 8. Prototype of the snapshot hyperspectral light field imager.

The calibration of our system contains two steps: calibration of CASSI and calibration between Lytro and CASSI. The CASSI calibration is conducted following the procedures in the seminal work [26], from which the observation matrix of CASSI is obtained and the entire spectrum spanning over the passband of the optical filter is discretized into 27 bands with different intervals. The calibration between Lytro and CASSI is conducted using a checkerboard scene. Owing to the CASSI calibration, we only need to align the central view of Lytro with the projection of one wavelength on the CASSI detector, and the alignment with other wavelengths can then be easily deduced. To this end, the checkerboard is illuminated by monochromatic light and captured by Lytro and CASSI simultaneously. Once the optical components in the CASSI b