Figure 5. Progression of the 3D model after each stage. In this figure, we show how the 3D model changes after each stage in our pipeline. As shown in a), Stage 1 (Sec. 4.1) creates a point cloud with many empty regions. In b), we show the subsequent inpainted model from Stage 2 (Sec. 4.2). Finally, the fine-tuning stage (Sec. 4.4) refines b) to produce the final model, with greater cohesion and sharper detail.
We zero out gradients outside the masked region.
4.3. Depth diffusion for text-to-3D
Recently, image-conditional depth diffusion models have shown state-of-the-art results [29] for relative depth estimation by finetuning a large-scale pre-trained RGB prior [51] on depth datasets. In this section, we show how to distill the knowledge from such methods for text-to-3D synthesis. We consider a depth diffusion model $\epsilon_{\text{depth}}$ that is conditioned on images and text.
As mentioned in the previous section, during training we can draw samples $\hat{x}$ using the inpainting diffusion model $\epsilon_{\text{inpaint}}$ applied to a noisy rendering of the current scene. Our insight is to use these clean samples as the conditioning for the depth diffusion model (shown in Fig. 4). Starting from pure noise $d_1 \sim \mathcal{N}(0, I)$, we predict the normalized depth using DDIM sampling [59]. We then compute the (negated) Pearson correlation between the rendered depth and the sampled depth:
$$\mathcal{L}_{\text{depth}} = -\frac{\sum_i \left(d_i - \frac{1}{n}\sum_k d_k\right)\left(\hat{d}_i - \frac{1}{n}\sum_k \hat{d}_k\right)}{\sqrt{\sum_i \left(d_i - \frac{1}{n}\sum_k d_k\right)^2 \, \sum_i \left(\hat{d}_i - \frac{1}{n}\sum_k \hat{d}_k\right)^2}} \qquad (6)$$
where $d$ is the rendered depth from the 3DGS model, $\hat{d}$ is the depth sampled from the depth diffusion model, and $n$ is the number of pixels.
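Concretely, Eq. (6) is a standard negated Pearson correlation over pixels. A minimal PyTorch sketch, assuming flattened depth tensors; the `eps` stabilizer is an implementation assumption, not part of Eq. (6):

```python
import torch

def depth_loss(d_render, d_sample, eps=1e-8):
    """Negated Pearson correlation between rendered and sampled depth (Eq. 6).

    d_render: (n,) rendered depth from the 3DGS model, flattened over pixels
    d_sample: (n,) depth sampled from the depth diffusion model via DDIM
    eps: assumed small constant for numerical stability
    """
    d = d_render - d_render.mean()        # d_i - (1/n) sum_k d_k
    d_hat = d_sample - d_sample.mean()    # d_hat_i - (1/n) sum_k d_hat_k
    corr = (d * d_hat).sum() / (torch.sqrt((d ** 2).sum() * (d_hat ** 2).sum()) + eps)
    return -corr  # minimizing this maximizes correlation
```

Because the Pearson correlation is invariant to affine transformations of its inputs, this loss aligns the rendered depth with the diffusion model's relative depth without requiring a shared scale or shift.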
4.4. Optimization and Finetuning
The final loss for the first training stage of our pipeline is
thus:
$$\mathcal{L}_{\text{init}} = \mathcal{L}_{\text{inpaint}} + \mathcal{L}_{\text{depth}}. \qquad (7)$$
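For illustration, one stage-1 optimization step can be composed as follows; `render`, `sample_inpaint`, `sample_depth`, and `inpaint_loss` are hypothetical stand-ins for the pipeline components described above, and `depth_loss` is the sketch from Sec. 4.3:

```python
# Sketch of one stage-1 training step (Eq. 7); helper names are hypothetical.
rgb, depth = render(gaussians, camera)        # differentiable 3DGS rendering
x_hat = sample_inpaint(rgb, mask, prompt)     # clean sample from eps_inpaint (Sec. 4.2)
d_hat = sample_depth(x_hat, prompt)           # DDIM sample from eps_depth, conditioned on x_hat
loss = inpaint_loss(rgb, x_hat) + depth_loss(depth.flatten(), d_hat.flatten())
loss.backward()
optimizer.step()
optimizer.zero_grad()
```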
After training with this loss, we have a 3D scene that roughly corresponds to the text prompt, but which may lack cohesiveness between the reference image $I_{\text{ref}}$ and the inpainted regions. To remedy this, we incorporate an additional lightweight finetuning phase. In this phase, we utilize a vanilla text-to-image diffusion model $\epsilon_{\text{text}}$ personalized for the input image [15, 37, 48, 52]. We compute $\hat{x}$ using the same procedure as in Sec. 4.2, except with $\epsilon_{\text{text}}$. The loss $\mathcal{L}_{\text{text}}$ is the same as Eq. (5), except with the $\hat{z}$ and $\hat{x}$ sampled with this finetuned diffusion model $\epsilon_{\text{text}}$.
To encourage sharp details in our model, we use a lower noise strength than in the inpainting stage and uniformly sample the timestep $t$ within this limited range. We also propose a novel sharpening procedure, which improves the sharpness of our final 3D model. Instead of using $\hat{x}$ to compute the image-space diffusion loss introduced earlier, we use $S(\hat{x})$, where $S$ is a sharpening filter applied to samples from the diffusion model. Finally, to encourage high-opacity points in our 3DGS model, we also add a per-point opacity loss $\mathcal{L}_{\text{opacity}}$ that encourages the opacity of each point to reach either 0 or 1, inspired by the transmittance regularizer used in Plenoxels [16].
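The sketch below shows one plausible realization of both components; the exact sharpening kernel and the functional form of $\mathcal{L}_{\text{opacity}}$ are assumptions here (a Laplacian-based unsharp mask and a binary-entropy penalty, respectively), not necessarily the forms used in our experiments:

```python
import torch
import torch.nn.functional as F

def sharpen(x, amount=1.0):
    """Unsharp-mask style sharpening S(x_hat) for diffusion samples.

    Assumed filter choice: add back Laplacian high frequencies.
    x: (B, 3, H, W) images in [0, 1].
    """
    kernel = torch.tensor([[0., -1., 0.],
                           [-1., 4., -1.],
                           [0., -1., 0.]]) * amount
    kernel = kernel.view(1, 1, 3, 3).repeat(3, 1, 1, 1).to(x)
    high_freq = F.conv2d(x, kernel, padding=1, groups=3)  # per-channel Laplacian
    return (x + high_freq).clamp(0, 1)

def opacity_loss(opacities, eps=1e-6):
    """Assumed binarization penalty: pushes per-point opacities toward 0 or 1.

    opacities: (P,) per-Gaussian opacities in (0, 1).
    """
    o = opacities.clamp(eps, 1 - eps)
    return -(o * o.log() + (1 - o) * (1 - o).log()).mean()
```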
The combined loss for the fine-tuning stage can be written as:
$$\mathcal{L}_{\text{finetune}} = \mathcal{L}_{\text{text}} + \lambda_{\text{opacity}} \, \mathcal{L}_{\text{opacity}}, \qquad (8)$$
where $\lambda_{\text{opacity}}$ is a hyperparameter controlling the effect of the opacity loss.
4.5. Implementation Details
Point Cloud Initialization. We implement the point cloud initialization stage (Sec. 4.1) in PyTorch3D [49], with Stable Diffusion [51] as our inpainting model. To lift the generated samples to 3D, we use a high-quality monocular depth estimator.
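A minimal sketch of this lifting step, assuming a pinhole camera with known intrinsics `K`; `lift_to_pointcloud` is a hypothetical helper, not our released implementation:

```python
import torch
from pytorch3d.structures import Pointclouds

def lift_to_pointcloud(rgb, depth, K):
    """Unproject an RGB image with per-pixel depth into a 3D point cloud.

    rgb:   (H, W, 3) float tensor in [0, 1]
    depth: (H, W) float tensor of estimated depth
    K:     (3, 3) camera intrinsics (assumed known)
    """
    H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Back-project through the pinhole model: X = z * (u - cx) / fx, etc.
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    X = (xs - cx) / fx * depth
    Y = (ys - cy) / fy * depth
    pts = torch.stack([X, Y, depth], dim=-1).reshape(-1, 3)  # camera-space points
    colors = rgb.reshape(-1, 3)
    return Pointclouds(points=[pts], features=[colors])
```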