Writing a web crawler in Python
Time: 2024-05-05 13:15:39 Views: 89
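As an AI language model, I cannot execute code directly, but I can provide a basic framework and sample code for a Python web crawler to help you start writing your own.
Basic framework of a web crawler:
1. Import the required Python libraries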
```
import requests
from bs4 import BeautifulSoup
```
2. Fetch the HTML source of the target page
```
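url = 'http://www.example.com'
# A timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)
# Raise an exception on HTTP 4xx/5xx responses
response.raise_for_status()
html = response.content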
```
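Network requests can fail temporarily, so it may help to wrap the fetch step in a small retry loop. The following is only a minimal sketch: `fetch_with_retry`, its `retries` and `delay` parameters, and the linear back-off are illustrative assumptions, not part of `requests`.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0):
    """Call fetch(url), retrying on any exception for up to `retries` attempts.

    `fetch` is any callable that takes a URL and returns the page content.
    This helper is hypothetical and only illustrates the idea.
    """
    if retries < 1:
        raise ValueError('retries must be at least 1')
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # in real code, catch a narrower exception type
            last_error = exc
            if attempt < retries:
                # Wait a little longer after each failed attempt
                time.sleep(delay * attempt)
    # Every attempt failed, so re-raise the last error
    raise last_error
```

With `requests`, it could be called as `fetch_with_retry(lambda u: requests.get(u, timeout=10).content, url)`.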
3. Parse the HTML source and extract the information you need
```
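soup = BeautifulSoup(html, 'html.parser')
# soup.title is None when the page has no <title> tag
title = soup.title.get_text(strip=True) if soup.title else ''
links = []
# href=True skips <a> tags that have no href attribute
for link in soup.find_all('a', href=True):
    links.append(link.get('href'))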
```
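The `href` values collected in step 3 are often relative paths, duplicates, or non-web links such as `mailto:`. A possible clean-up step, using only the standard library, is sketched below. The `normalize_links` helper and the sample values are assumptions made for illustration.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Return absolute http(s) URLs, de-duplicated, in their original order."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # <a> tag without an href
        # Resolve relative paths against the URL of the page they came from
        absolute = urljoin(base_url, href.strip())
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue  # skip mailto:, javascript:, and similar
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample = ['/about', 'page2.html', None, 'mailto:a@example.com', '/about']
print(normalize_links('http://www.example.com/docs/', sample))
```

Passing the page's own URL as `base_url` lets the returned list be fed straight back into the fetch step.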
4. Store or process the extracted information
```
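# Save to a file
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(title or '')
    f.write('\n')
    for link in links:
        if link:  # skip anchors that had no href
            f.write(link)
            f.write('\n')
# Process the extracted information
for link in links:
    # do something with link
    print(link)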
```
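If the results will be opened in a spreadsheet or loaded by another program, CSV may be more convenient than plain text. This is a minimal sketch using the standard `csv` module. The `save_links_csv` function and the column names `page_title` and `link` are assumptions chosen for illustration.

```python
import csv


def save_links_csv(path, title, links):
    """Write one row per link, repeating the page title in the first column."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['page_title', 'link'])  # header row
        for link in links:
            writer.writerow([title, link])
```

For example, `save_links_csv('output.csv', title, links)` would write the values extracted in step 3.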
Example: scraping movie titles and ratings from the Douban Movie Top 250
```
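import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
# Douban tends to reject the default requests User-Agent, so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
html = response.content
soup = BeautifulSoup(html, 'html.parser')
# Each movie entry sits in a <div class="info"> element
movies = soup.find_all('div', class_='info')
for movie in movies:
    title = movie.find('span', class_='title').string
    rating = movie.find('span', class_='rating_num').string
    print(title, rating)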
```
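The example above requests only the first page of the list. To cover all 250 entries, the crawler would need to visit every page. The sketch below assumes the list is paginated 25 movies at a time through a `start` query parameter. That is an assumption about the site's URL scheme, so please verify it in a browser first. The helper `top250_page_urls` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'


def top250_page_urls(total=250, page_size=25):
    """Build one URL per page; `start` is the assumed offset parameter."""
    urls = []
    for offset in range(0, total, page_size):
        urls.append(f'{BASE_URL}?{urlencode({"start": offset})}')
    return urls


for page_url in top250_page_urls():
    print(page_url)
```

Each URL can then go through the same fetch and parse steps, ideally with a short `time.sleep` between requests so the site is not overloaded.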
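Note: when writing a web crawler, comply with applicable laws and ethical norms; do not attack the target site maliciously or infringe on others' privacy.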
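One practical way to follow this advice is to check a site's `robots.txt` before crawling it. The standard library's `urllib.robotparser` can do this. In the sketch below, the `is_allowed` helper, the `my-crawler` user agent, and the sample rules are illustrative assumptions; a real crawler would download the site's actual `robots.txt`.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules for illustration only, not taken from any real site
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, 'my-crawler', 'http://www.example.com/index.html'))
print(is_allowed(SAMPLE_ROBOTS, 'my-crawler', 'http://www.example.com/private/data'))
```

With the sample rules, the first URL is permitted and the second is not.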