解决C++持久化对象中的隐藏指针问题

需积分: 3 92 浏览量更新于2024-10-20 收藏 317KB PDF 举报

"C++对象持久化：隐藏指针问题与对象布局的形式验证" 在C++编程语言中，具有虚函数或虚基类的对象在C11标准下包含一种特殊的内存指针，即“隐藏指针”（volatile 'memory' pointers）。这些指针未在用户指定的代码中明确表示，当这类C11对象被持久化后，这些指针在程序的不同执行实例之间可能会变得无效。这个问题在基于C11的数据库语言O11的实现中尤为突出，因为O11扩展了C11，使其能够创建和访问持久对象。隐藏指针问题的核心在于，当C++对象被持久化存储并在后续的程序运行时重新加载，其内部的虚函数表指针和虚基类指针可能无法正确指向内存中的新位置。这会导致程序在访问这些对象时出现错误，如虚函数调用失败或者对象类型识别错误。解决这个问题的方法通常需要在不修改C11编译器或改变C11语义的前提下进行。可能的解决方案包括在持久化过程中更新隐藏指针的值，或者在恢复对象时动态地重新绑定这些指针。此外，还需要处理C++中基类指针可以指向派生类对象这一特性带来的问题，因为在持久化和恢复过程中，这种多态性可能导致意外的行为。为了确保对象布局的正确性，论文进一步探讨了对象布局的形式验证，特别是针对C++多重继承的复杂情况。作者们通过形式化的方法定义了一组C++对象布局策略，并使用Wassermann等人提出的多重继承操作语义进行机械验证。这种形式化方法足够灵活，可以适应空基类优化和尾部填充优化等空间节省技术。论文的应用成果是为现实的、优化的对象布局算法提供了首个正式的正确性证明，其中包括基于流行的GNU C++应用二进制接口的布局算法。这项工作不仅为发现和证明新的布局优化提供了语义基础，而且是向验证C++编译器前端迈出的重要一步。这个摘要涵盖了C++对象持久化的挑战，尤其是隐藏指针问题，以及通过形式化方法确保C++对象布局正确性的研究进展。这些问题对于开发高效、可靠的数据库系统和软件至关重要，特别是在需要持久化数据和多态性的场景下。通过深入理解和解决这些问题，开发者可以更好地设计和实现支持持久化存储的C++系统。

Assume the size and natural alignment

of type int are 4 bytes. To ensure

proper alignment of ﬁeld a, 3 bytes of

padding are inserted after ﬁeld b in B,

and 3 more bytes of padding are in-

serted after ﬁeld c in C. (To understand

why, consider arrays whose elements

have type B or C.) Consequently, B has

size 8 while C has size 12.

84 12

However, it is perfectly legitimate to reuse one

of the 3 bytes of padding present at the end of

B to hold the value of ﬁeld c of C. In this case,

C occupies only 8 bytes, for a nice space saving

of 1/3. This layout trick is known as the tail

padding optimization, and is used by several

C++ compilers, including GCC.

a cb

54 8

2.4 Empty base class optimization

Empty classes offer another opportunity for reusing space that is

unused in a base class. Consider:

struct A {};

struct B1 : A {};

struct B2 : A {};

struct C : B1 , B2 { char c1 ; char c2 ; };

C c;

A * a1 = (A *) ( B1 *) & c;

A * a2 = (A *) ( B2 *) & c;

Here, the classes A, B1 and B2 are empty: they

contain zero ﬁelds and they do not need any dy-

namic type information. It is tempting to give

them size 0, so that their instances occupy no

memory space at all. However, this would vi-

olate the “object identity” requirement: as de-

picted to the right, the pointers a1 and a2 (to

the two instances of A logically contained within

C) would compare equal, while C++’s semantics

mandate that they are different.

c1 c2

a1 == a2

The layout algorithm must therefore insert one

byte of padding in A, resulting in A, B1 and B2

having size 1. Following the naive approach

outlined in section 2.2, C therefore occupies 4

bytes.

B1 B2

a1 a2

However, it is unnecessary to keep the ﬁelds

c1 and c2 disjoint from the subobjects of types

B1 and B2: the padding inserted in the latter to

satisfy the object identity requirement can be

reused to hold the values of ﬁelds c1 and c2,

as shown to the right.

c1 c2

(B2*)c

a2a1

(B1*)c

This technique is known as the empty base class optimization. It is

implemented by many C++ compilers, but often in restricted cases

only. Here is an example where GCC 4.3 misses an opportunity for

reusing the tail padding of an empty base class.

struct A {};

struct B1 : A {};

struct B2 : A {};

struct C : B1 , B2 { char c ; };

struct D : C { char d ; };

As in the previous example, B1 and B2 must be laid out at different

offsets within C, to allow distinguishing the two A contained in C.

Thus, C must have size at least 2. This lower bound can be achieved

by placing c at offset 0, as explained above.

What about D? GCC places d at offset 2, resulting in a size of

3 for D. However, the second byte of C is just padding introduced

to preserve the identity of empty base classes, and it can be reused

to hold data such as ﬁeld d. This optimized layout ﬁts D in just two

bytes.

Why empty classes matter Over the years, successful C++ soft-

ware, such as the Standard Template Libraries (STL), have become

dependent on C++’s ability to deliver efﬁcient code based on sim-

ple techniques such as empty classes and inlining. Part of the suc-

cess of the STL is based on its archetypical use of function objects:

these are objects of class with the function call operator overloaded.

These objects typically do not carry any runtime data. Rather, their

semantics is their static types. As an example, consider sorting an

array of integers. The STL provides the following template:

template < typename Ran , typename Comp >

void sort(Ran first , Ran last , Comp cmp);

The comparator cmp could be a pointer to function (like the qsort

function in C), but in idiomatic C++ it is any object whose type

overloads the function call operator.

struct MyGreater {

typedef int first_argument_type ;

typedef int second_argument_type ;

typedef bool result_type ;

bool operator ()( int i , int j) const {

return i > j;

}

};

The sort template can, then, be invoked as sort(t, t + n,

MyGreater()). The comparator object constructed by MyGreater()

is of interest not for its runtime data (it carries none), but for its

type: The comparison function is not passed to the sorting routine

through data, but through the type of the object. Consequently, it

is directly called, and in most cases inlined since the body is a

simple integer comparison, which typically reduces to a single

machine instruction. This simple technique, a cornerstone of the

STL success, is effective. One can think of it as a simulation of

dependent types, where data is encoded at the level of types,

therefore making data ﬂow obvious.

The function object technique just described is at the basis of

a composable component of the STL. Composition implies the ex-

istence of a protocol that parts being composed should adhere to.

For example, if we want to combine two unary function objects,

we need a mechanism to ensure that the result type of one func-

tion object agrees with the argument type of the other. To that

end, the STL requires the existence of certain nested types, such

as first_argument_type, second_argument_type and result_type

in the MyGreater class above. To reduce clutter, the STL provides

ready-to-use classes that deﬁne those types. In this case, idiomati-

cally, one would write

struct MyGreater :

std:: binary_function <int,in t ,bool > {

bool operator ()( int i , int j) const {

return i > j;

}

};

The sole purpose of std::binary_function<int,int,bool> is to

provide those nested types. It is an empty base class that introduces

no data and no virtual functions. Its semantics is purely static. This

usage, while not object-oriented programming by most popular

deﬁnitions, is very common in modern C++ programs.

The pattern is not restricted to only one empty base class: a class

can simulatenously inherit from several empty base classes. This

is the case of bidirectional_input_iterator_tag, which inherits

3 2010/7/15

剩余11页未读，继续阅读

kurt6868

粉丝: 4

解决C++持久化对象中的隐藏指针问题

深入探讨gambit-persistent：持久性数据结构实现

Net::HTTP::Persistent: Ruby中的线程安全持久HTTP连接技术

graphql-w-persistent: SQL数据库的GraphQL服务包更新与示例

persistent:Dart 的高效持久数据结构

js-persistent：JavaScript的持久数据结构

cheese-mvc-persistent：Java Spring Boot Studio

C++ Common Persistent Objects Library-开源

persistent:Haskell的持久性接口允许多种存储方法

todobackend-scotty-persistent:不赞成使用todobackend-haskell

servant-persistent:持之以恒的仆人的一个简短例子

最新资源