Introduction to Algorithms Lecture Notes (MIT 6.006)

MIT OpenCourseWare

http://ocw.mit.edu

6.006 Introduction to Algorithms

Spring 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms

Lecture 1 Introduction and Document Distance 6.006 Spring 2008

Lecture 1: Introduction and the Document Distance Problem

Course Overview

• Efficient procedures for solving problems on large inputs (Ex: entire works of Shakespeare, human genome, U.S. Highway map)
• Scalability
• Classic data structures and elementary algorithms (CLRS text)
• Real implementations in Python ⇔ Fun problem sets!
• β version of the class - feedback is welcome!

Pre-requisites

• Familiarity with Python and Discrete Mathematics

Contents

The course is divided into 7 modules, each of which has a motivating problem and problem set (except for the last module). The modules and their motivating problems are listed below:

1. Linked Data Structures: Document Distance (DD)
2. Hashing: DD, Genome Comparison
3. Sorting: Gas Simulation
4. Search: Rubik’s Cube 2 × 2 × 2
5. Shortest Paths: Caltech → MIT
6. Dynamic Programming: Stock Market

7. Numerics: √2

Document Distance Problem

Motivation

Given two documents, how similar are they?

• Identical - easy?
• Modified or related (Ex: DNA, Plagiarism, Authorship)




• Did Francis Bacon write Shakespeare’s plays?

To answer the above, we need to define practical metrics. Metrics are defined in terms of word frequencies.

Definitions

1. Word: Sequence of alphanumeric characters. For example, the phrase “6.006 is fun” has 4 words: “6”, “006”, “is”, and “fun” (the period separates “6” from “006”).

2. Word Frequencies: Word frequency D(w) of a given word w is the number of times it occurs in a document D.

For example, the words and word frequencies for the above phrase are as below:

Count:  1    0    1    1    0     1
Word:   6    the  is   006  easy  fun

In practice, while counting, it is easy to choose some canonical ordering of words.

3. Distance Metric: The document distance metric is the inner product of the vectors $D_1$ and $D_2$ containing the word frequencies for all words in the 2 documents. Equivalently, this is the projection of vector $D_1$ onto $D_2$ or vice versa. Mathematically this is expressed as:

$$D_1 \cdot D_2 = \sum_{w} D_1(w) \cdot D_2(w) \tag{1}$$

4. Angle Metric: The angle between the vectors $D_1$ and $D_2$ gives an indication of overlap between the 2 documents. Mathematically this angle is expressed as:

$$\theta(D_1, D_2) = \arccos\left(\frac{D_1 \cdot D_2}{\|D_1\| \, \|D_2\|}\right), \qquad 0 \le \theta \le \pi/2$$

An angle metric of 0 means the two documents are identical (more precisely, that their word-frequency vectors point in the same direction), whereas an angle metric of π/2 implies that there are no common words.

5. Number of Words in Document: The magnitude of the vector $D$ which contains word frequencies of all words in the document. Mathematically this is expressed as:

$$N(D) = \|D\| = \sqrt{D \cdot D} \tag{2}$$
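The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's docdist code: the function names (`word_frequencies`, `inner_product`, `magnitude`, `angle`) are hypothetical, words are lower-cased and restricted to ASCII letters and digits (the definitions do not say how case is handled), and both documents are assumed to contain at least one word.

```python
import math
import re
from collections import Counter


def word_frequencies(text):
    """Map each word (a maximal run of alphanumeric characters) to its count."""
    # Lower-casing is an assumption; the definitions do not mention case.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)


def inner_product(d1, d2):
    """Equation (1): sum over all words w of D1(w) * D2(w)."""
    # Words missing from d2 contribute 0, so iterating over d1 is enough.
    return sum(count * d2.get(w, 0) for w, count in d1.items())


def magnitude(d):
    """Equation (2): sqrt(D . D)."""
    return math.sqrt(inner_product(d, d))


def angle(d1, d2):
    """Angle metric in radians: arccos of the normalized inner product."""
    cosine = inner_product(d1, d2) / (magnitude(d1) * magnitude(d2))
    # Clamp to guard against rounding pushing the ratio slightly above 1.
    return math.acos(min(1.0, cosine))


d1 = word_frequencies("6.006 is fun")
d2 = word_frequencies("the 6.006 is easy")
print(sorted(d1.items()))
print(inner_product(d1, d2), angle(d1, d2))
```

The two sample phrases share the words “6”, “006”, and “is”, so their inner product is 3. Storing only the words that actually occur avoids having to fix a canonical ordering of every possible word up front.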

So let’s apply the ideas to a few Python programs and try to flesh out more.


Document Distance in Practice

Computing Document Distance: docdist1.py

The Python code and results relevant to this section are available here. This program computes the distance between 2 documents by performing the following steps:

• Read file
• Make word list [“the”,“year”,. . . ]
• Count frequencies [[“the”,4012],[“year”,55],. . . ]
• Sort into order [[“a”,3120],[“after”,17],. . . ]
• Compute θ
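The steps above can be sketched as follows. This is not the actual docdist1.py, whose source is not reproduced in these notes; it is a minimal illustration that follows the same list-based outline, with hypothetical function names. It takes strings rather than file names so that it stays self-contained, and it assumes that words are lower-cased.

```python
import math


def make_word_list(text):
    """Split text into lower-case words; non-alphanumerics act as separators."""
    words, current = [], []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def count_frequencies(words):
    """Build [[word, count], ...] by scanning the pairs found so far for each word."""
    freq = []
    for w in words:
        for entry in freq:
            if entry[0] == w:
                entry[1] += 1
                break
        else:
            freq.append([w, 1])
    return freq


def inner_product(a, b):
    """Inner product of two frequency lists sorted by word, via a merge-style walk."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            total += a[i][1] * b[j][1]
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return total


def document_angle(text1, text2):
    """Run the pipeline on two non-empty documents and return the angle in radians."""
    a = sorted(count_frequencies(make_word_list(text1)))
    b = sorted(count_frequencies(make_word_list(text2)))
    cosine = inner_product(a, b) / math.sqrt(inner_product(a, a) * inner_product(b, b))
    return math.acos(min(1.0, cosine))


print(document_angle("the year after the war", "a year of war"))
```

Sorting both frequency lists into the same order is what lets `inner_product` find the common words in a single pass over the two lists.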

Ideally, we would like to run this program to compute document distances between writings of the following authors:

• Jules Verne - document size 25k
• Bobbsey Twins - document size 268k
• Lewis and Clark - document size 1M
• Shakespeare - document size 5.5M
• Churchill - document size 10M

Experiment: Comparing the Bobbsey and Lewis documents with docdist1.py gives θ = 0.574. However, it takes approximately 3 minutes to compute this document distance, and it will probably get slower as the inputs get larger.

What is wrong with the efficiency of this program?

Is it a Python vs. C issue? Is it a choice of algorithm issue - $\Theta(n^2)$ versus $\Theta(n)$?
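To make the quadratic-versus-linear question concrete, here is one way such a gap can arise when counting word frequencies. It is an illustration only, not a diagnosis of docdist1.py: the two functions are hypothetical, and the input (every word distinct) is chosen as a worst case for the scanning version.

```python
import time


def count_by_scanning(words):
    """For each word, scan the list of [word, count] pairs: quadratic in the worst case."""
    freq = []
    for w in words:
        for entry in freq:
            if entry[0] == w:
                entry[1] += 1
                break
        else:
            freq.append([w, 1])
    return freq


def count_by_hashing(words):
    """One expected constant-time dictionary update per word: linear overall."""
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq


def seconds(func, words):
    """Wall-clock time of a single call to func(words)."""
    start = time.perf_counter()
    func(words)
    return time.perf_counter() - start


for n in (1000, 2000, 4000):
    words = [str(i) for i in range(n)]  # worst case: every word is distinct
    print(n, seconds(count_by_scanning, words), seconds(count_by_hashing, words))
```

Doubling n should roughly quadruple the time of the scanning version while only doubling the time of the dictionary version, although the exact numbers depend on the machine.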

Profiling: docdist2.py

In order to figure out why our initial program is so slow, we now “instrument” the program so that Python will tell us where the running time is going. This can be done simply using the profile module in Python. The profile module indicates how much time is spent in each routine.

(See this link for details on profile).

The profile module is imported into docdist1.py and the end of the docdist1.py file is modified. The modified docdist1.py file is renamed as docdist2.py.

Detailed results of document comparisons are available here
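As a rough sketch of what instrumenting a program with the profile module can look like, the example below profiles a small stand-in program. The functions `count_frequencies` and `main` are hypothetical placeholders, not the contents of docdist2.py, and the input text is made up.

```python
import io
import profile
import pstats


def count_frequencies(words):
    """Hypothetical stand-in for one routine of the program being profiled."""
    freq = {}
    for w in words:
        freq[w] = freq.get(w, 0) + 1
    return freq


def main():
    """Hypothetical entry point; a real program would read its input files here."""
    words = ("the year after the war " * 2000).split()
    return count_frequencies(words)


# Run main() under the profiler and keep its return value.
profiler = profile.Profile()
result = profiler.runcall(main)

# Format the collected timings, sorted by cumulative time per routine.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats()
print(report.getvalue())
```

The report lists, for each routine, the number of calls and the time spent in it. Calling `profile.run("main()")` at the end of a script is a shorter alternative that prints a similar report directly.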
