Python web crawler example code
Date: 2023-12-11 09:03:44
Below are three example Python crawler projects for your reference:
1. Scrape the movie titles and ratings from the Douban Movie Top 250
```python
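import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/top250'
# Send a browser-like User-Agent; the default one used by requests may be rejected
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
movie_list = []

# The list is paginated, 25 movies per page, via the `start` query parameter
for i in range(0, 250, 25):
    req = requests.get(url, params={'start': i}, headers=headers, timeout=10)
    req.raise_for_status()  # stop early on an HTTP error instead of parsing an error page
    soup = BeautifulSoup(req.text, 'html.parser')
    # Each movie's title link sits inside a <div class="hd">
    movies = soup.find_all('div', {'class': 'hd'})
    for movie in movies:
        title = movie.a.span.text.strip()
        # The rating is not inside the "hd" div, so search from its parent element
        rating = movie.parent.find('span', {'class': 'rating_num'}).text.strip()
        movie_list.append(title + ' ' + rating)

print('\n'.join(movie_list))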
```
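The loop in the example above steps the `start` offset through 0, 25, ..., 225 to cover ten pages of 25 movies each. As a minimal sketch of that pagination logic in isolation, the helper below builds the page URLs without sending any request. The names `page_urls`, `total`, and `page_size` are hypothetical and introduced here only for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'


def page_urls(total=250, page_size=25):
    """Return one URL per result page, using the `start` offset parameter.

    `page_urls`, `total`, and `page_size` are illustrative names, not part
    of the original example.
    """
    return [BASE_URL + '?' + urlencode({'start': start})
            for start in range(0, total, page_size)]


urls = page_urls()
print(len(urls))   # 10
print(urls[0])     # https://movie.douban.com/top250?start=0
print(urls[-1])    # https://movie.douban.com/top250?start=225
```

Building the URLs up front makes it easy to check the offsets before crawling, and to add a pause between requests if the site throttles rapid access.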
2. Scrape a question and its answers from Zhihu
```python
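import requests
from bs4 import BeautifulSoup

url = 'https://www.zhihu.com/question/37787176'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
question = ''
answers = []

req = requests.get(url, headers=headers, timeout=10)
req.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(req.text, 'html.parser')

# find() returns None when the element is missing (for example, if the page
# served is a login wall), so check before reading .text
title_tag = soup.find('h1', {'class': 'QuestionHeader-title'})
if title_tag is not None:
    question = title_tag.text.strip()

# Each answer is a "List-item" div; its body text is in "RichContent-inner"
answer_list = soup.find_all('div', {'class': 'List-item'})
for answer in answer_list:
    content = answer.find('div', {'class': 'RichContent-inner'})
    if content is not None:
        answers.append(content.text.strip())

print('Question: ' + question)
print('Answers:\n' + '\n\n'.join(answers))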
```