part of the node. On the other hand, if the access time is after the modification time, then we use the value in the
modification box, overriding the corresponding value in the node. (Say the modification box has a new left pointer. Then we’ll use
it instead of the normal left pointer, but we’ll still use the normal right pointer.)
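As a minimal sketch of this read rule (the Node layout, field names, and read_field helper here are illustrative assumptions, not taken from the source):

    from dataclasses import dataclass
    from typing import Any, Optional, Tuple

    @dataclass
    class Node:
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        value: Any = None
        # Modification box: at most one (time, field_name, new_value) triple.
        mod: Optional[Tuple[int, str, Any]] = None

    def read_field(node: Node, name: str, access_time: int) -> Any:
        """Return node's field `name` as seen at `access_time`."""
        if node.mod is not None:
            mod_time, mod_name, mod_value = node.mod
            # Use the boxed value only for the boxed field, and only when
            # the access time is at or after the modification time.
            if mod_name == name and access_time >= mod_time:
                return mod_value
        return getattr(node, name)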
Modifying a node works like this. (We assume that each modification touches one pointer or similar field.) If the
node’s modification box is empty, then we fill it with the modification. Otherwise, the modification box is full. We
make a copy of the node, but using only the latest values. (That is, we overwrite one of the node’s fields with the
value that was stored in the modification box.) Then we perform the modification directly on the new node, without
using the modification box. (We overwrite one of the new node’s fields, and its modification box stays empty.)
Finally, we cascade this change to the node’s parent, just like path copying. (This may involve filling the parent’s
modification box, or making a copy of the parent recursively. If the node has no parent—it’s the root—we add the
new root to a sorted array of roots.)
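Continuing the illustrative sketch above, one way to write the modification step (set_field, update, and the path argument are hypothetical names, not from the source):

    def set_field(node: Node, name: str, value: Any, time: int) -> Node:
        """Apply one field change at `time`. Returns `node` itself if the
        change fit in its modification box, or a fresh copy otherwise."""
        if node.mod is None:
            node.mod = (time, name, value)   # empty box: just fill it
            return node
        # Full box: copy the node using only the latest values, i.e.
        # overwrite the boxed field with the boxed value...
        _, mod_name, mod_value = node.mod
        copy = Node(left=node.left, right=node.right, value=node.value)
        setattr(copy, mod_name, mod_value)
        # ...then perform the new change directly on the copy; its
        # modification box stays empty.
        setattr(copy, name, value)
        return copy

    def update(path, name, value, time, roots):
        """`path` lists the nodes from the root down to the target node;
        `roots` is the sorted-by-time array of roots. Copies cascade
        toward the root, as in path copying."""
        node = path[-1]
        replacement = set_field(node, name, value, time)
        for parent in reversed(path[:-1]):
            if replacement is node:
                return            # absorbed into a modification box; done
            side = "left" if read_field(parent, "left", time) is node else "right"
            node, replacement = parent, set_field(parent, side, replacement, time)
        if replacement is not node:
            roots.append((time, replacement))   # the root itself was copied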
With this algorithm, given any time t, at most one modification box exists in the data structure with time t. Thus, a
modification at time t splits the tree into three parts: one part contains the data from before time t, one part contains
the data from after time t, and one part was unaffected by the modification.
Complexity of the combination
Time and space for modifications require amortized analysis. A modification takes O(1) amortized space, and O(1)
amortized time. To see why, use a potential function ϕ, where ϕ(T) is the number of full live nodes in T. The live
nodes of T are just the nodes that are reachable from the current root at the current time (that is, after the last
modification). The full live nodes are the live nodes whose modification boxes are full.
Each modification involves some number of copies, say k, followed by 1 change to a modification box. (Well, not
quite—you could add a new root—but that doesn’t change the argument.) Consider each of the k copies. Each costs
O(1) space and time, but decreases the potential function by one. (First, the node we copy must be full and live, so it
contributes to the potential function. The potential function will only drop, however, if the old node isn’t reachable in
the new tree. But we know it isn’t reachable in the new tree—the next step in the algorithm will be to modify the
node’s parent to point at the copy. Finally, we know the copy’s modification box is empty. Thus, we’ve replaced a
full live node with an empty live node, and ϕ goes down by one.) The final step fills a modification box, which costs
O(1) time and increases ϕ by one.
Putting it all together, the change in ϕ is Δϕ = 1 − k. Thus, we’ve paid O(k + Δϕ) = O(1) space and O(k + Δϕ + 1) =
O(1) time.
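For concreteness, a modification that makes k = 3 copies and then fills one modification box does O(3 + 1) actual work, but Δϕ = 1 − 3 = −2, so the potential released by the copies pays for them and the amortized cost is still O(1).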
Fully persistent
In the fully persistent model, both updates and queries are allowed on any version of the data structure.
Confluently persistent
In the confluently persistent model, we use combinators to combine inputs from more than one previous version into
a single new version. Rather than a branching tree, combinations of versions induce a DAG (directed acyclic graph)
structure on the version graph.
Examples of persistent data structures
Perhaps the simplest persistent data structure is the singly linked list or cons-based list, a simple list of objects
formed by each carrying a reference to the next in the list. This is persistent because we can take a tail of the list,
meaning the last k items for some k, and add new nodes onto the front of it. The tail will not be duplicated, instead
becoming shared between both the old list and the new list. So long as the contents of the tail are immutable, this
sharing will be invisible to the program.
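For instance, a minimal cons-based list in Python (the Cons class and its field names are illustrative, not a standard library type):

    from dataclasses import dataclass
    from typing import Any, Optional

    @dataclass(frozen=True)          # immutable, so structural sharing is safe
    class Cons:
        head: Any
        tail: Optional["Cons"] = None

    old_list = Cons(1, Cons(2, Cons(3)))    # the list [1, 2, 3]
    new_list = Cons(0, old_list.tail)       # the list [0, 2, 3]

    # Both versions coexist; the tail [2, 3] is shared, not duplicated.
    assert new_list.tail is old_list.tail

Because Cons is frozen, neither version can mutate the shared tail, which is exactly the immutability condition that makes the sharing invisible to the program.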