
and shape from contour, feature and representative curvature lines.
On the other hand, while these previous methods rely on detailed user annotations to parse the sketches into curves of different functions and often use expensive nonlinear numerical optimizations to solve the 2D to 3D conversion, we utilize CNN models to parse the sketch and infer the geometry with improved efficiency and reduced user sketching and annotation. See Sec. 5.2 for comparisons.
Data-driven methods. For many common objects and scenes, it is usually reasoned that we humans envision their 3D shapes by first recognizing what they are and then matching prior shapes of the same category in memory to the observations. This idea underlies a range of data-driven methods for sketch-based modeling, as they generally separate the modeling task into two steps: first search a database for shapes matching an input sketch, and then adapt the retrieved shapes as necessary to fit the input sketch. Examples include pure sketch-based retrieval [Eitz et al. 2012; Su et al. 2015; Wang et al. 2015c], and retrieval with subsequent adaptation and composition, like Sketch2Scene [Xu et al. 2013] for scene modeling and [Guo et al. 2016; Lee and Funkhouser 2008; Xie et al. 2013] for object modeling. While these approaches significantly ease the user burden by providing abundant prior knowledge for the specific category of objects that the user tries to model, the tool built for one category, however, does not generalize to others. In comparison, our machine learning model learns the more generic geometric reconstruction process rather than the knowledge of class-specific 3D objects, which makes our method possibly less efficient for modeling a particular class of objects but more generic, with finer levels of shape control provided to the user.
Later works in this domain do not explicitly separate the model searching and adapting steps, but rely on powerful deep neural networks to map directly from sketches to 3D data; examples include [Delanoy et al. 2017; Lun et al. 2017; Su et al. 2018]. [Su et al. 2018] predicts normal maps from a category-specific 2D sketch by an encoder-decoder network, which minimizes normal fitting loss and adversarial loss, and takes as optional input user-specified normal samples. [Delanoy et al. 2017] uses a CNN to map sketches to a volumetric occupancy grid representing the 3D shape, and allows the incremental update of the shape through an updater CNN as the user sketches in new views. However, it is shown that the CNN trained for each object category does not generalize to other categories. Besides, the volumetric representation restricts the resolution of modeled shapes. The work by Lun et al. [2017] inputs category-specific planar sketches from canonical viewpoints (front, side, top) to a CNN with an encoder and thirteen decoders, each of which outputs the depth and normal maps for one of thirteen predefined viewpoints, which are then fused together into a 3D mesh.
Different from [Delanoy et al. 2017] and [Lun et al. 2017], which solve the generation of complete 3D shapes of trained categories, our work focuses on modeling freeform surfaces that are represented as depth maps, while also providing a multi-view fusion approach to combine the surfaces into full 3D models. By modeling one surface at a time using general geometric rules and learned priors for shape from sketch, our approach is agnostic to shape categories. However, we note that breaking a complete 3D shape into multiple surface patches to be modeled sequentially is not always straightforward for users to conceive, which we regard as a price to pay for the category-free advantage. To help the user with modeling, our multi-view interactive process allows the user to sketch different parts of the shape incrementally in arbitrary views, with surfaces modeled in other views assisting the sketching in a new view (Sec. 4).
Procedural and parametric models provide another kind of prior knowledge, which effectively reduces the modeling task to a mapping from sketches to model parameters. Many works learn the mapping from data, for modeling urban architectures [Nishida et al. 2016], faces [Han et al. 2017], and others [Huang et al. 2016]. These methods are tailored for the given parametric models and do not generalize to other freeform shapes.
Recent works directly reconstruct from 2D images the 3D shapes and scenes represented as depth and normal maps [Eigen and Fergus 2015; Tatarchenko et al. 2016; Wang et al. 2015a], point clouds [Fan et al. 2017] or volumetric grids [Choy et al. 2016; Tatarchenko et al. 2017; Wu et al. 2016], utilizing data-driven and CNN models. In this paper we focus on reconstructing high quality 3D shapes from sketches, which contain much sparser information than images, and on providing the user convenient control for 3D modeling.
3 SINGLE VIEW MODELING
In a single view, we recover depth and normal data from a sparse planar sketch. There are two primary challenges for this process. First, the sparse strokes in a sketch have different meanings, each affecting proximate regions of the corresponding 3D shape differently; we need to parse the strokes consistently and interpolate their data over the entire planar region to infer the 3D surface. To solve this problem, we rely on a CNN model to parse the different input lines automatically with minimal user specification, thus saving much user effort. See Fig. 2 for an example where the ridges and valleys are distinguished automatically from input unlabeled strokes.

Second, 2D sketches are inherently ambiguous about what 3D shapes they represent, which can defeat even a powerful machine learning model that attempts brute-force regression of the reconstruction. Previous approaches usually resolve the ambiguity by restricting to shapes of common classes; as a result, such a model works well for the particular category it is trained for but does not generalize to others [Delanoy et al. 2017; Lun et al. 2017]. We instead strive for more general freeform shape modeling and focus on using geometric principles with optional user input to combat ambiguities.
To summarize, at the core of our single view modeling is a two-stage CNN regression model (Fig. 2):

• Given the input sketches, a first-stage subnetwork (DFNet) regresses the flow field, a dense signal that describes the surface curvature directions and guides its reconstruction (Sec. 3.2).

• A second-stage subnetwork (GeomNet) takes the sketch and the flow field guidance, and predicts depth/normal maps as well as a confidence map that shows how much ambiguity there is for each point of the input sketch (Sec. 3.3).
In addition, the user can further modify the surface and resolve ambiguity by providing curvature hints over strokes or depth values on sparse sample points; our CNN model is trained to utilize these optional inputs. Next we discuss the single view modeling in detail.
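To make the data flow of this two-stage prediction concrete, the following is a minimal, hypothetical PyTorch-style sketch of the pipeline. The module layouts, layer and channel counts, and the assumed encoding of the sketch, hint, flow, and output maps are illustrative assumptions for exposition, not the exact architecture described in the paper.

import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # 3x3 convolution + ReLU, preserving spatial resolution.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1),
                         nn.ReLU(inplace=True))

class DFNet(nn.Module):
    # Stage 1: regress a dense flow field (here a 2-channel direction map)
    # from the rasterized sketch plus optional hint channels.
    def __init__(self, in_ch=3, flow_ch=2):
        super().__init__()
        self.net = nn.Sequential(conv_block(in_ch, 32), conv_block(32, 32),
                                 nn.Conv2d(32, flow_ch, 3, padding=1))

    def forward(self, sketch):
        return self.net(sketch)

class GeomNet(nn.Module):
    # Stage 2: from sketch + flow field, predict a depth map (1 channel),
    # a normal map (3 channels), and a per-pixel confidence map (1 channel).
    def __init__(self, in_ch=3 + 2):
        super().__init__()
        self.net = nn.Sequential(conv_block(in_ch, 64), conv_block(64, 64),
                                 nn.Conv2d(64, 1 + 3 + 1, 3, padding=1))

    def forward(self, sketch, flow):
        out = self.net(torch.cat([sketch, flow], dim=1))
        depth, normal, conf = out[:, :1], out[:, 1:4], out[:, 4:]
        return depth, normal, torch.sigmoid(conf)

# Example usage: channel 0 holds the strokes; the remaining channels could
# carry the optional curvature hints and sparse depth samples (assumed layout).
sketch = torch.zeros(1, 3, 256, 256)
flow = DFNet()(sketch)
depth, normal, conf = GeomNet()(sketch, flow)

The essential point the sketch conveys is the staging: the flow field is predicted first and then concatenated with the sketch input, so the geometry prediction is conditioned on the curvature-direction guidance rather than on the sparse strokes alone.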