Similarity-based Learning Approaches: Knowledge Transfer between Computer Vision and Text Mining


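The book Knowledge Transfer between Computer Vision and Text Mining: Similarity-based Learning Approaches addresses fields where machine learning is now widely applied, notably computer vision, bioinformatics, information retrieval, natural language processing, and audio processing. Its core subject is similarity-based learning: methods that learn from the similarity, dissimilarity, or distance between pairs of training samples. Such methods can be supervised or unsupervised.

The book first presents a nearest neighbor model built on a novel image dissimilarity measure for handwritten digit recognition, and reports excellent performance. It then proposes a new kernel for histograms of visual words that achieves state-of-the-art results in object recognition from images. For facial expression recognition, it shows how several kernels based on a pyramid representation can be applied. The authors also show that the pyramid representation can be used successfully with string kernels for text categorization by topic, and they describe a native language identification approach based on string kernels that reaches state-of-the-art performance while being language independent and theory neutral.

The machine learning tasks covered in the book fall roughly into two areas: computer vision and string processing. The two look unrelated on the surface, but image analysis and string processing have aspects in common. The book stresses that treating images and text in a similar way has shown great potential in particular computer vision applications. One well-known image classification approach, for example, is inspired by the bag-of-words model, which further supports the possibility of knowledge transfer between the two fields.

The two authors, Radu Tudor Ionescu and Marius Popescu, are both from the Department of Computer Science at the University of Bucharest, Romania, and together they examine how computer vision and text mining share knowledge and influence each other. Through a series of similarity-based learning methods, the book shows how techniques and performance gains can carry across disciplinary boundaries. Readers interested in recent advances in computer vision, in knowledge transfer to and from text mining, or in similarity-based learning strategies will find it a substantial resource for further study.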
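To make the idea of a string kernel concrete, here is a minimal sketch of a p-spectrum kernel over character n-grams. This summary does not state which string kernels the book uses, so the spectrum kernel is swapped in purely as a standard illustration; the function names `char_ngrams`, `spectrum_kernel`, and `normalized_kernel` and the default `p = 3` are assumptions made for this example. Because it works directly on characters, with no tokenizer or linguistic resources, it hints at why such approaches can be language independent.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, p: int) -> Counter:
    """Count every contiguous substring of length p (a character p-gram)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s: str, t: str, p: int = 3) -> int:
    """Inner product of the p-gram count vectors of s and t.

    Illustrative stand-in only: not necessarily a kernel used in the book.
    """
    counts_s, counts_t = char_ngrams(s, p), char_ngrams(t, p)
    # A Counter returns 0 for missing keys, so unshared p-grams add nothing.
    return sum(count * counts_t[gram] for gram, count in counts_s.items())


def normalized_kernel(s: str, t: str, p: int = 3) -> float:
    """Cosine-style normalization, so a string compared with itself gives 1."""
    denom = sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    # Strings shorter than p have no p-grams; define the similarity as 0.
    return spectrum_kernel(s, t, p) / denom if denom else 0.0
```

A kernel matrix built by calling `normalized_kernel` on every pair of training documents could then be passed to any kernel method that accepts a precomputed kernel.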

Figure 2.4 The function φ embeds the data into a feature space where the nonlinear relations now appear linear. Machine learning methods can easily detect such linear relations.

Figure 4.1 Two images that are compared with LPD. a For every position (x1, y1) in the first image, LPD tries to find a similar patch in the second image. First, it looks at the same position (x1, y1) in the second image. The patches are not similar. b LPD gradually looks around position (x1, y1) in the second image to find a similar patch. c LPD sums up the spatial offset between the similar patches at (x1, y1) from the first image and (x2, y2) from the second image.

Figure 4.2 A random sample of 15 handwritten digits from the MNIST data set.

Figure 4.3 A random sample of 12 images from the Birds data set. There are two images per class. Images from the same class sit next to each other in this figure.

Figure 4.4 Average accuracy rates of the 3-NN based on LPD model with patches of 1 × 1 pixels at the top and 2 × 2 pixels at the bottom. Experiment performed on the MNIST subset of 100 images. a Accuracy rates with patches of 1 × 1 pixels. b Accuracy rates with patches of 2 × 2 pixels.

Figure 4.5 Average accuracy rates of the 3-NN based on LPD model with patches of 3 × 3 pixels at the top and 4 × 4 pixels at the bottom. Experiment performed on the MNIST subset of 100 images. a Accuracy rates with patches of 3 × 3 pixels. b Accuracy rates with patches of 4 × 4 pixels.

Figure 4.6 Average accuracy rates of the 3-NN based on LPD model with patches of 5 × 5 pixels at the top and 6 × 6 pixels at the bottom. Experiment performed on the MNIST subset of 100 images. a Accuracy rates with patches of 5 × 5 pixels. b Accuracy rates with patches of 6 × 6 pixels.

Figure 4.7 Average accuracy rates of the 3-NN based on LPD model with patches of 7 × 7 pixels at the top and 8 × 8 pixels at the bottom. Experiment performed on the MNIST subset of 100 images. a Accuracy rates with patches of 7 × 7 pixels. b Accuracy rates with patches of 8 × 8 pixels.

Figure 4.8 Average accuracy rates of the 3-NN based on LPD model with patches of 9 × 9 pixels at the top and 10 × 10 pixels at the bottom. Experiment performed on the MNIST subset of 100 images. a Accuracy rates with patches of 9 × 9 pixels. b Accuracy rates with patches of 10 × 10 pixels.

List of Figures

Figure 4.9 Average accuracy rates of the 3-NN based on LPD model with patches ranging from 2 × 2 pixels to 9 × 9 pixels. Experiment performed on the MNIST subset of 300 images.

Figure 4.10 Similarity matrix based on LPD with patches of 4 × 4 pixels and a similarity threshold of 0.12, obtained by computing pairwise dissimilarities between the samples of the MNIST subset of 1000 images.

Figure 4.11 Euclidean distance matrix based on L2-norm, obtained by computing pairwise distances between the samples of the MNIST subset of 1000 images.

Figure 4.12 Error rate drops as K increases for 3-NN and 6-NN classifiers based on LPD with filtering.

Figure 4.13 Sample images from three classes of the Brodatz data set.

Figure 4.14 Sample images from four classes of the UIUCTex data set. Each image shows a textured surface viewed under different poses. a Bark. b Pebbles. c Brick. d Plaid.

Figure 4.15 Sample images from the biomass texture data set. a Wheat. b Waste. c Corn.

Figure 4.16 Similarity matrix based on LTD with patches of 32 × 32 pixels and a similarity threshold of 0.02, obtained by computing pairwise dissimilarities between the texture samples of the Brodatz data set.

Figure 4.17 Similarity matrix based on LTD with patches of 64 × 64 pixels and a similarity threshold of 0.02, obtained by computing pairwise dissimilarities between the texture samples of the UIUCTex data set.

Figure 5.1 The BOVW learning model for object class recognition. The feature vector consists of SIFT features computed on a regular grid across the image (dense SIFT) and vector quantized into visual words. The frequency of each visual word is then recorded in a histogram. The histograms enter the training stage. Learning is done by a kernel method.

Figure 5.2 The spatial similarity of two images computed with the SNAK framework. First, the center of mass is computed according to the objectness map. The average position and the standard deviation of the spatial distribution of each visual word are computed next. The images are aligned according to their centers, and the SNAK kernel is computed by summing the distances between the average positions and the standard deviations of each visual word in the two images.


Figure 5.3 A random sample of 12 images from the Pascal VOC data set. Some of the images contain objects of more than one class. For example, the image at the top left shows a dog sitting on a couch, and the image at the top right shows a person and a horse. Dog, couch, person, and horse are among the 20 classes of this data set.

Figure 5.4 A random sample of 12 images from the Birds data set. There are two images per class. Images from the same class sit next to each other in this figure.

Figure 5.5 The BOVW learning model for facial expression recognition. The feature vector consists of SIFT features computed on a regular grid across the image (dense SIFT) and vector quantized into visual words. The presence of each visual word is then recorded in a presence vector. Normalized presence vectors enter the training stage. Learning is done by a local kernel method.

Figure 5.6 An example of SIFT features extracted from two images representing distinct emotions: fear (left) and disgust (right).

Figure 5.7 The six nearest neighbors selected with the presence kernel from the vicinity of the test image are visually more similar than the other six images randomly selected from the training set. Despite this fact, the nearest neighbors do not adequately indicate the test label (disgust). Thus, a learning method needs to be trained on the selected neighbors to accurately predict the label of the test image.

Figure 7.1 Phylogenetic tree obtained for 22 mammalian mtDNA sequences using LRD based on 2-mers.

Figure 7.2 Phylogenetic tree obtained for 22 mammalian mtDNA sequences using LRD based on 4-mers.

Figure 7.3 Phylogenetic tree obtained for 22 mammalian mtDNA sequences using LRD based on 6-mers.

Figure 7.4 Phylogenetic tree obtained for 22 mammalian mtDNA sequences using LRD based on 8-mers.

Figure 7.5 Phylogenetic tree obtained for 22 mammalian mtDNA sequences using LRD based on 10-mers.

Figure 7.6 Phylogenetic tree obtained for 22 mammalian mtDNA sequences using LRD based on sum of k-mers.

Figure 7.7 Phylogenetic tree obtained for 27 mammalian mtDNA sequences using LRD based on 18-mers.


Figure 7.8 The distance evolution of the best chromosome at each generation for the rat–mouse–cow experiment. The green line represents the rat–house mouse (RH) distance, the blue line represents the rat–fat dormouse (RF) distance, and the red line represents the rat–cow (RC) distance.

Figure 7.9 The precision–recall curves of the state-of-the-art aligners versus the precision–recall curves of the two LRD aligners, when 10,000 contaminated reads of length 100 from the orangutan are included. The two variants of the BOWTIE aligner are based on local and global alignment, respectively. The LRD aligner based on hash tables is a fast approximate version of the original LRD aligner.

Figure 7.10 The precision–recall curves of the state-of-the-art aligners versus the precision–recall curves of the two LRD aligners, when 50,000 contaminated reads of length 100 from 5 mammals are included. The two variants of the BOWTIE aligner are based on local and global alignment, respectively. The LRD aligner based on hash tables is a fast approximate version of the original LRD aligner.

Figure 7.11 Local Rank Distance computed in the presence of different types of DNA changes such as point mutations, indels, and inversions. In the first three cases a–c, a single type of DNA polymorphism is included in the second (bottom) string. The last case d shows how LRD measures the differences between the two DNA strings when all the types of DNA changes occur in the second string. The nucleotides affected by changes are marked in bold. To compare the results for the different types of DNA changes, the first string is the same in all four cases. Note that in all four examples, LRD is based on 1-mers. In each case, $\Delta_{LRD} = \Delta_{left} + \Delta_{right}$. a Measuring LRD with point mutations. The T at index 7 is substituted with C. b Measuring LRD with indels. The substring GT is deleted. c Measuring LRD with inversions. The substring AGTT is inverted. d Measuring LRD with point mutations, indels, and inversions.

Figure 8.1 An example with three classes that illustrates the masking problem. Class A is masked by classes B and C.
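The Figure 7.11 caption states only that LRD works on k-mers and decomposes into a left term and a right term. The following is a rough sketch of how such a two-sided distance might be computed, not the book's definition. It assumes that each k-mer is matched to the nearest identical k-mer in the other string, that the cost of a match is the absolute difference in positions, and that an unmatched or too-distant k-mer costs a cap `max_offset`. The names `_one_sided`, `local_rank_distance`, and `max_offset` are hypothetical.

```python
def _one_sided(a: str, b: str, k: int, max_offset: int) -> int:
    """Sum, over the k-mers of a, of the offset to the nearest equal k-mer in b.

    Assumption: unmatched or too-distant k-mers cost max_offset.
    """
    positions = {}
    for j in range(len(b) - k + 1):
        positions.setdefault(b[j:j + k], []).append(j)

    total = 0
    for i in range(len(a) - k + 1):
        candidates = positions.get(a[i:i + k], [])
        nearest = min((abs(i - j) for j in candidates), default=max_offset)
        total += min(nearest, max_offset)
    return total


def local_rank_distance(x: str, y: str, k: int = 1, max_offset: int = 10) -> int:
    """Two-sided sum mirroring the left + right decomposition in the caption."""
    left = _one_sided(x, y, k, max_offset)   # k-mers of x searched in y
    right = _one_sided(y, x, k, max_offset)  # k-mers of y searched in x
    return left + right
```

Under these assumptions a point mutation is penalized in both directions: with 1-mers, `local_rank_distance("ACGT", "ACCT")` pays the cap for the unmatched G and an offset of 1 for the extra C.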


List of Tables

Table 4.1 Results of the experiment performed on the MNIST subset of 300 images, using the 3-NN based on LPD model with patches ranging from 2 × 2 pixels to 9 × 9 pixels.

Table 4.2 Results of the experiment performed on the MNIST subset of 300 images, using various maximum offsets, patches of 4 × 4 pixels, and a similarity threshold of 0.12.

Table 4.3 Baseline 3-NN versus 3-NN based on LPD.

Table 4.4 Accuracy rates of several classifiers based on LPD versus the accuracy rates of the standard SVM and KRR.

Table 4.5 Comparison of several classifiers (some based on LPD).

Table 4.6 Error and time of the 3-NN classifier based on LPD with filtering, for various K values.

Table 4.7 Confusion matrix of the 3-NN based on LPD with filtering using K = 50.

Table 4.8 Error rates on the entire MNIST data set for baseline 3-NN, k-NN based on Tangent distance, and k-NN based on LPD with filtering.

Table 4.9 Error rates of different k-NN models on Birds data set.

Table 4.10 Error on Birds data set for texton learning methods of Lazebnik et al. (2005a) and kernel methods based on LPD.

Table 4.11 Accuracy rates on the Brodatz data set using 3 random samples per class for training.

Table 4.12 Accuracy rates of several MKL approaches that include LTD compared with state-of-the-art methods on the Brodatz data set.

Table 4.13 Accuracy rates on the UIUCTex data set using 20 random samples per class for training.

Table 4.14 Accuracy rates of several MKL approaches that include LTD compared with state-of-the-art methods on the UIUCTex data set.
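Several of the tables above vary the patch size, maximum offset, and similarity threshold of LPD, and the Figure 4.1 caption outlines the search itself: look for a similar patch at the same position, then gradually farther away, and sum the offsets. The following is a rough sketch of that outline, not the book's definition. Patch similarity is assumed to be a mean absolute pixel difference at or below `threshold`, the search is assumed to proceed in square rings, and a patch with no match is assumed to cost `max_offset`. The helper names and defaults are hypothetical, and images are plain lists of rows with intensities in [0, 1].

```python
from typing import List

Image = List[List[float]]


def _patch_diff(a: Image, ax: int, ay: int,
                b: Image, bx: int, by: int, p: int) -> float:
    """Mean absolute pixel difference between two p x p patches (assumed measure)."""
    total = 0.0
    for dy in range(p):
        for dx in range(p):
            total += abs(a[ay + dy][ax + dx] - b[by + dy][bx + dx])
    return total / (p * p)


def _nearest_offset(img1: Image, img2: Image, x: int, y: int,
                    p: int, max_offset: int, threshold: float) -> int:
    """Radius of the first square ring around (x, y) holding a similar patch."""
    h2, w2 = len(img2), len(img2[0])
    for r in range(max_offset):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if max(abs(dx), abs(dy)) != r:
                    continue  # only visit positions exactly on ring r
                nx, ny = x + dx, y + dy
                if not (0 <= nx <= w2 - p and 0 <= ny <= h2 - p):
                    continue  # patch would fall outside the second image
                if _patch_diff(img1, x, y, img2, nx, ny, p) <= threshold:
                    return r
    return max_offset  # assumption: unmatched patches cost the maximum offset


def local_patch_dissimilarity(img1: Image, img2: Image, p: int = 2,
                              max_offset: int = 3,
                              threshold: float = 0.12) -> int:
    """Sum of the offsets at which each patch of img1 finds a match in img2."""
    h, w = len(img1), len(img1[0])
    return sum(
        _nearest_offset(img1, img2, x, y, p, max_offset, threshold)
        for y in range(h - p + 1)
        for x in range(w - p + 1)
    )
```

This sketch searches in one direction only, from the first image to the second, as the caption describes. The resulting value could serve as the distance in a nearest neighbor classifier such as the 3-NN model named in the table captions.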

