Human Motion Capture from Monocular Video: MonoPerfCap



MonoPerfCap: Human Performance Capture from Monocular Video • 39:3

Fig. 2. Given a monocular video and a personalized actor rig, our approach reconstructs the actor motion as well as medium-scale surface deformations.

The monocular reconstruction problem is solved by joint recovery of temporally coherent per-batch motion based on a low dimensional trajectory subspace.

Non-rigid alignment based on automatically extracted silhouettes is used to better match the input.

manual initialization [Wren et al. 1997] and pose correction. [Wei and Chai 2010] obtain high-quality 3D pose from challenging sports video sequences using physical constraints, but require manual joint position annotations for each keyframe (every 30 frames). Simpler temporal priors have also been applied [Sidenbladh et al. 2000; Urtasun et al. 2005, 2006]. With recent advances in convolutional neural networks (CNNs), fully automatic, high-accuracy 2D pose estimation [Jain et al. 2014; Newell et al. 2016; Pishchulin et al. 2016; Toshev and Szegedy 2014; Wei et al. 2016] is feasible from a single image. Lifting the 2D detections to the corresponding 3D pose is common [Akhter and Black 2015; Li et al. 2015; Mori and Malik 2006; Simo-Serra et al. 2012; Taylor 2000; Wang et al. 2014; Yasin et al. 2016], but is a hard and underconstrained problem [Sminchisescu and Triggs 2003b]. [Bogo et al. 2016] employ a pose prior based on a mixture of Gaussians in combination with penetration constraints. The approach of [Zhou et al. 2015] reconstructs 3D pose as a sparse linear combination of a set of example poses. Direct regression from a single image to the 3D pose is an alternative [Ionescu et al. 2014a; Li and Chan 2014; Mehta et al. 2016; Pavlakos et al. 2016; Tekin et al. 2016; Zhou et al. 2016a], but leads to temporally incoherent reconstructions.
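To illustrate why lifting 2D detections to 3D is underconstrained, the following minimal sketch builds two different 3D joint configurations that project to identical 2D positions. It assumes a simple pinhole camera with unit focal length; the toy three-joint "skeleton", the `project` helper, and the per-joint scale factors are hypothetical and are not taken from any of the cited methods.

```python
import numpy as np

def project(points_3d, focal=1.0):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) image points."""
    points_3d = np.asarray(points_3d, dtype=float)
    return focal * points_3d[:, :2] / points_3d[:, 2:3]

# Hypothetical toy skeleton: three joints in camera space (z is depth).
pose_a = np.array([[0.0, 0.0, 2.0],
                   [0.2, 0.5, 2.0],
                   [-0.3, 0.4, 2.5]])

# Slide each joint along its own viewing ray by a different factor.
# Every joint keeps its image position, but the 3D pose changes.
scales = np.array([[1.0], [1.5], [0.8]])
pose_b = pose_a * scales

same_2d = np.allclose(project(pose_a), project(pose_b))
different_3d = not np.allclose(pose_a, pose_b)
print("identical 2D projections:", same_2d)
print("different 3D poses:", different_3d)
```

Because each joint can move freely along its viewing ray without changing the image evidence, additional constraints such as bone lengths, pose priors, or temporal coherence are needed to select a plausible 3D pose.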

Hybrid approaches that combine discriminative 2D [Elhayek et al. 2015] and 3D pose estimation techniques [Rosales and Sclaroff 2006; Sminchisescu et al. 2006] with generative image formation models are promising, but these approaches require multiple views of the scene. Recently, a real-time 3D human pose estimation approach has been proposed [Mehta et al. 2017], which also relies on monocular video input. It is a very fast method, but does not achieve the temporal stability and robustness to difficult poses of our approach. In contrast to this previous work, our method not only estimates the 3D skeleton more robustly, by leveraging the complementary strengths of 2D and 3D discriminative models and trajectory subspace constraints, but also recovers medium-scale non-rigid surface deformations that cannot be modeled using only skeleton subspace deformation. We extensively compare to the approach of [Mehta et al. 2017] in Sec. 6.
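The idea of a low-dimensional trajectory subspace can be sketched as follows. This is only an illustration under stated assumptions: it uses a truncated DCT basis, which is a common choice for trajectory subspaces but is not confirmed by this excerpt, and the batch length, number of coefficients, synthetic trajectory, and function names are all hypothetical.

```python
import numpy as np

def dct_basis(num_frames, num_coeffs):
    """Return a (num_frames, num_coeffs) orthonormal DCT-II basis."""
    t = np.arange(num_frames)
    k = np.arange(num_coeffs)
    basis = np.cos(np.pi * (2 * t[:, None] + 1) * k[None, :] / (2 * num_frames))
    basis[:, 0] *= np.sqrt(1.0 / num_frames)
    basis[:, 1:] *= np.sqrt(2.0 / num_frames)
    return basis

def project_to_subspace(trajectory, num_coeffs):
    """Least-squares fit of a (frames, dims) trajectory in the DCT subspace."""
    basis = dct_basis(trajectory.shape[0], num_coeffs)
    coeffs = basis.T @ trajectory  # valid because the basis columns are orthonormal
    return basis @ coeffs

# Synthetic batch: a smooth 2D joint trajectory corrupted by per-frame noise.
rng = np.random.default_rng(0)
frames = 50
t = np.linspace(0.0, 1.0, frames)
clean = np.stack([np.sin(2 * np.pi * t), t ** 2], axis=1)
noisy = clean + 0.05 * rng.standard_normal(clean.shape)

# Keeping only 8 low-frequency coefficients enforces temporal coherence.
smooth = project_to_subspace(noisy, num_coeffs=8)
```

Restricting each batch to a few low-frequency coefficients reduces the number of unknowns per joint from one value per frame to one value per coefficient, and removes frame-to-frame jitter by construction.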

Dense Monocular Shape Reconstruction. Reconstructing strongly deforming non-rigid objects and humans in general apparel given just monocular input is an ill-posed problem. By constraining the solution to a low-dimensional space, coarse human shape can be reconstructed based on a foreground segmentation [Chen et al. 2010; Grest et al. 2005; Guan et al. 2009; Jain et al. 2010; Rogge et al. 2014; Zhou et al. 2010]. Still, these approaches rely on manual initialization and correction steps. Fully automatic approaches combine generative body models with discriminative pose and shape estimation, e.g. conditioned on silhouette cues [Sigal et al. 2007] and 2D pose [Bogo et al. 2016], but can also only capture skin-tight clothing without surface details. The recent work of [Huang et al. 2017], which fits a parametric human body model to the 2D pose detection and the silhouettes over time, has demonstrated compelling results on both multi-view and monocular data. But again, their method is not able to model loose clothing. Model-free reconstructions are based on rigidity and temporal smoothness assumptions [Garg et al. 2013; Russell et al. 2014] and only apply to medium-scale deformations and simple motions. Template-based approaches enable fast sequential tracking [Bartoli et al. 2015; Salzmann and Fua 2011; Yu et al. 2015], but are unable to capture the fast and highly articulated motion of the human body. Automatic monocular performance capture of more general human motion is still an unsolved problem, especially if non-rigid surface deformations are taken into account. Our approach tackles this challenging problem.

Fig. 3. Acquisition of a textured template mesh from handheld video footage of the actor in a static pose.

3 METHOD OVERVIEW

Non-rigid 3D reconstruction from monocular RGB video is a challenging and ill-posed problem, since the subjects are partially visible at each time instance and depth cues are implicit. To tackle the problem of partial visibility, similar to many previous works, we employ a template mesh, pre-acquired by image-based monocular reconstruction of the actor in a static pose. When it comes to the

2018-02-26 01:54 page 3 (pp. 1-15) ACM Transactions on Graphics, Vol. 9, No. 4, Article 39. Publication date: March 2018.
