Python Programming: Exploring Information in Practice


"Python for Information" 是一本关于使用Python进行信息探索的书籍，作者是Charles Severance。本书主要讲解如何利用Python进行网络爬虫程序的编写，内容涵盖网络爬虫的基础，包括数据抓取的三种方法，缓存数据的提取，多线程和多进程的并发抓取技术，动态页面内容的抓取，处理表单交互和验证码问题，以及使用Scrapy和Portia这两个工具进行数据抓取。此外，书中还通过实例展示了如何应用所学技术对真实网站进行数据抓取。全书分为多个章节，从基础的Python编程概念开始，逐步深入到高级的网络爬虫技术。2013年版本增加了关于数据可视化的新章节，并对第13章和第14章进行了重大修订，采用JSON格式处理数据，并介绍了OAuth协议的使用。从2009年至2014年，书本经历了多次修订和完善，内容更加丰富和专业。书中首先介绍了Python的基本语法和编程思维，让读者建立起计算机科学的基础。接着，讲解了网络爬虫的基础知识，如HTTP协议、HTML解析、正则表达式等，帮助读者理解如何从网页中抓取所需信息。在数据提取部分，书中详细讨论了三种方法，包括DOM解析、XPath和BeautifulSoup库的使用。针对动态内容的抓取，书中提到了JavaScript执行和异步请求的处理，这对于抓取现代Web应用程序中的信息至关重要。同时，书中也涵盖了如何处理登录、表单提交和验证码识别，这些都是实际爬虫项目中常见的挑战。在并发抓取部分，作者介绍了Python的多线程和多进程技术，帮助提高爬虫的效率。Scrapy是一个强大的爬虫框架，而Portia则提供了一个可视化的爬虫设计工具，这两者结合使用可以让数据抓取工作更加高效和便捷。最后，书中通过实例教程演示了如何将这些理论和技术应用到实际的网站数据抓取中，使读者能够将所学知识付诸实践，增强解决实际问题的能力。《Python for Information》是一本全面且实用的Python网络爬虫指南，适合对网络爬虫感兴趣的初学者和有一定编程基础的读者。通过学习这本书，读者不仅可以掌握Python编程基础，还能深入了解网络爬虫的原理和实践，为信息获取和数据分析打下坚实基础。


Chapter 1. Why should you learn to write programs?

While most of the detail of how these components work is best left to computer

builders, it helps to have some terminology so we can talk about these different

parts as we write our programs.

As a programmer, your job is to use and orchestrate each of these resources to

solve the problem that you need to solve and analyze the data you get from the

solution. As a programmer you will mostly be “talking” to the CPU and telling

it what to do next. Sometimes you will tell the CPU to use the main memory,

secondary memory, network, or the input/output devices.

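[Figure: hardware architecture diagram. You write Software that talks to the Central Processing Unit, which keeps asking “What Next?”; the CPU is connected to the Input/Output Devices, Main Memory, Secondary Memory, and the Network.]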

You need to be the person who answers the CPU’s “What next?” question. But it

would be very uncomfortable to shrink you down to 5mm tall and insert you into

the computer just so you could issue a command three billion times per second. So

instead, you must write down your instructions in advance. We call these stored

instructions a program and the act of writing these instructions down and getting

the instructions to be correct programming.

1.3 Understanding programming

In the rest of this book, we will try to turn you into a person who is skilled

in the art of programming. In the end you will be a programmer, perhaps

not a professional programmer, but at least you will have the skills to look at a

data/information analysis problem and develop a program to solve the problem.

In a sense, you need two skills to be a programmer:

• First, you need to know the programming language (Python) - you need

to know the vocabulary and the grammar. You need to be able to spell

the words in this new language properly and know how to construct well-

formed “sentences” in this new language.


• Second, you need to “tell a story”. In writing a story, you combine words

and sentences to convey an idea to the reader. There is a skill and art in

constructing the story, and skill in story writing is improved by doing some

writing and getting some feedback. In programming, our program is the

“story” and the problem you are trying to solve is the “idea”.

Once you learn one programming language such as Python, you will find it much

easier to learn a second programming language such as JavaScript or C++. The

new programming language has very different vocabulary and grammar but the

problem-solving skills will be the same across all programming languages.

You will learn the “vocabulary” and “sentences” of Python pretty quickly. It will

take longer for you to be able to write a coherent program to solve a brand-new

problem. We teach programming much like we teach writing. We start reading

and explaining programs, then we write simple programs, and then we write in-

creasingly complex programs over time. At some point you “get your muse” and

see the patterns on your own and can see more naturally how to take a problem

and write a program that solves that problem. And once you get to that point,

programming becomes a very pleasant and creative process.

We start with the vocabulary and structure of Python programs. Be patient as the

simple examples remind you of when you started reading for the first time.

1.4 Words and sentences

Unlike human languages, the Python vocabulary is actually pretty small. We call

this “vocabulary” the “reserved words”. These are words that have very special

meaning to Python. When Python sees these words in a Python program, they

have one and only one meaning to Python. Later as you write programs you will

make up your own words that have meaning to you called variables. You will

have great latitude in choosing your names for your variables, but you cannot use

any of Python’s reserved words as a name for a variable.

When we train a dog, we use special words like “sit”, “stay”, and “fetch”. When

you talk to a dog and don’t use any of the reserved words, they just look at you with

a quizzical look on their face until you say a reserved word. For example, if you

say, “I wish more people would walk to improve their overall health”, what most

dogs likely hear is, “blah blah blah walk blah blah blah blah.” That is because

“walk” is a reserved word in dog language. Many might suggest that the language

between humans and cats has no reserved words (see http://xkcd.com/231/).

The reserved words in the language where humans talk to Python include the

following:


and       del       from      not       while
as        elif      global    or        with
assert    else      if        pass      yield
break     except    import    print
class     exec      in        raise
continue  finally   is        return
def       for       lambda    try

That is it, and unlike a dog, Python is already completely trained. When you say

“try”, Python will try every time you say it without fail.
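You do not have to memorize the reserved words, because Python can list them for you. The sketch below assumes Python 3 rather than the Python 2 used in this book, so its list differs slightly from the one above: in Python 3, print and exec are ordinary functions and no longer reserved. The helper name is_valid_variable_name is made up for this illustration.

```python
import keyword

# keyword.kwlist holds the reserved words of the interpreter that is running.
reserved = keyword.kwlist


def is_valid_variable_name(name):
    """Return True if name is a legal identifier and not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(len(reserved), "reserved words in this version of Python")

# "walk" is reserved in dog language, but not in Python.
for candidate in ["walk", "class", "try"]:
    print(candidate, is_valid_variable_name(candidate))
```

Running this on your own computer is the most reliable way to see which words your version of Python reserves.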

We will learn these reserved words and how they are used in good time, but for

now we will focus on the Python equivalent of “speak” (in human-to-dog lan-

guage). The nice thing about telling Python to speak is that we can even tell it

what to say by giving it a message in quotes:

print 'Hello world!'

And we have even written our first syntactically correct Python sentence. Our

sentence starts with the reserved word print followed by a string of text of our

choosing enclosed in single quotes.
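If the Python on your computer is Python 3 rather than the Python 2 used in this book, the same sentence is spelled a little differently, because print became a function and the message goes inside parentheses. A minimal sketch, assuming Python 3; the variable name message is only for illustration:

```python
# Python 3 spelling of the first sentence: the text to say is still a
# string in single quotes, but it is passed to print inside parentheses.
message = 'Hello world!'
print(message)
```

Typing the Python 2 form, with no parentheses, into a Python 3 interpreter produces a SyntaxError.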

1.5 Conversing with Python

Now that we have a word and a simple sentence that we know in Python, we need

to know how to start a conversation with Python to test our new language skills.

Before you can converse with Python, you must first install the Python software on

your computer and learn how to start Python on your computer. That is too much

detail for this chapter so I suggest that you consult www.pythonlearn.com where

I have detailed instructions and screencasts of setting up and starting Python on

Macintosh and Windows systems. At some point, you will be in a terminal or

command window and you will type python and the Python interpreter will start

executing in interactive mode and appear somewhat as follows:

Python 2.6.1 (r261:67515, Jun 24 2010, 21:47:49)

[GCC 4.2.1 (Apple Inc. build 5646)] on darwin

Type "help", "copyright", "credits" or "license" for more information.

>>>

The >>> prompt is the Python interpreter’s way of asking you, “What do you want

me to do next?” Python is ready to have a conversation with you. All you have to

know is how to speak the Python language.

Let’s say for example that you did not know even the simplest Python language

words or sentences. You might want to use the standard line that astronauts use

when they land on a faraway planet and try to speak with the inhabitants of the

planet:


>>> I come in peace, please take me to your leader

File "<stdin>", line 1

I come in peace, please take me to your leader

SyntaxError: invalid syntax

>>>

This is not going so well. Unless you think of something quickly, the inhabitants

of the planet are likely to stab you with their spears, put you on a spit, roast you

over a fire, and eat you for dinner.

Luckily you brought a copy of this book on your travels, and you thumb to this

very page and try again:

>>> print 'Hello world!'
Hello world!

This is looking much better, so you try to communicate some more:

>>> print 'You must be the legendary god that comes from the sky'
You must be the legendary god that comes from the sky
>>> print 'We have been waiting for you for a long time'
We have been waiting for you for a long time
>>> print 'Our legend says you will be very tasty with mustard'
Our legend says you will be very tasty with mustard
>>> print 'We will have a feast tonight unless you say
  File "<stdin>", line 1
    print 'We will have a feast tonight unless you say
SyntaxError: EOL while scanning string literal
>>>

The conversation was going so well for a while and then you made the tiniest

mistake using the Python language and Python brought the spears back out.

At this point, you should also realize that while Python is amazingly complex and

powerful and very picky about the syntax you use to communicate with it, Python

is not intelligent. You are really just having a conversation with yourself, but using

proper syntax.

In a sense, when you use a program written by someone else the conversation is

between you and those other programmers with Python acting as an intermediary.

Python is a way for the creators of programs to express how the conversation is

supposed to proceed. And in just a few more chapters, you will be one of those

programmers using Python to talk to the users of your program.

Before we leave our first conversation with the Python interpreter, you should

probably know the proper way to say “good-bye” when interacting with the in-

habitants of Planet Python:

>>> good-bye
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'good' is not defined
>>> if you don't mind, I need to leave
  File "<stdin>", line 1
    if you don't mind, I need to leave
SyntaxError: invalid syntax
>>> quit()

You will notice that the error is different for the first two incorrect attempts. The

second error is different because if is a reserved word and Python saw the reserved

word and thought we were trying to say something but got the syntax of the sen-

tence wrong.
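One way to see why the two failed good-byes produce different errors is to separate the two steps Python takes: first it checks that the sentence is well-formed, and only then does it try to carry it out. The sketch below assumes Python 3 and uses a made-up helper named classify; it relies on the built-in compile() to check the syntax and exec() to run the result.

```python
def classify(source):
    """Report which kind of error, if any, a line of Python produces."""
    try:
        # Step 1: Python reads the sentence and checks its grammar.
        code = compile(source, "<stdin>", "exec")
    except SyntaxError:
        return "SyntaxError"
    try:
        # Step 2: the sentence is well-formed, so Python tries to run it.
        exec(code, {})
    except NameError:
        return "NameError"
    return "ok"


# Well-formed: Python reads this as "good minus bye", then fails to find "good".
print(classify("good-bye"))

# Not well-formed: Python gives up before running anything.
print(classify("if you don't mind, I need to leave"))
```

The first attempt gets as far as running, which is why Python complains about a name; the second never gets past the grammar check.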

The proper way to say “good-bye” to Python is to enter quit() at the interactive

chevron >>> prompt. It would have probably taken you quite a while to guess that

one, so having a book handy probably will turn out to be helpful.

1.6 Terminology: interpreter and compiler

Python is a high-level language intended to be relatively straightforward for hu-

mans to read and write and for computers to read and process. Other high-level

languages include Java, C++, PHP, Ruby, Basic, Perl, JavaScript, and many more.

The actual hardware inside the Central Processing Unit (CPU) does not understand

any of these high-level languages.

The CPU understands a language we call machine language. Machine language

is very simple and frankly very tiresome to write because it is represented all in

zeros and ones:

01010001110100100101010000001111

11100110000011101010010101101101

...

Machine language seems quite simple on the surface, given that there are only

zeros and ones, but its syntax is even more complex and far more intricate than

Python. So very few programmers ever write machine language. Instead we

build various translators to allow programmers to write in high-level languages

like Python or JavaScript and these translators convert the programs to machine

language for actual execution by the CPU.
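You can get a feel for translation from inside Python itself. The sketch below assumes Python 3 and uses the built-in compile() to translate one line of source text into a lower-level form. Note the limits of the illustration: what compile() produces is instructions for Python's own interpreter, not machine language for the CPU, and the exact zeros and ones differ from one Python version to the next.

```python
source = "print('Hello world!')"

# compile() translates the source text into a code object.
code = compile(source, "<example>", "exec")

# co_code holds the translated instructions as raw bytes. Show the first
# four bytes as zeros and ones, eight bits per byte.
bits = " ".join(format(byte, "08b") for byte in code.co_code[:4])
print(bits)

# The translated form can be run without translating the text again.
exec(code)
```

Nobody would want to write those bits by hand, which is exactly why we let a translator produce them.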

Since machine language is tied to the computer hardware, machine language is not

portable across different types of hardware. Programs written in high-level lan-

guages can be moved between different computers by using a different interpreter

on the new machine or recompiling the code to create a machine language version

of the program for the new machine.

These programming language translators fall into two general categories:

(1) interpreters and (2) compilers.
