Scraping Douban Top250 with requests
Time: 2023-06-05 18:47:51 Views: 170
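You can scrape the Douban Movie Top250 data with Python's requests library. First send an HTTP request with requests to fetch the page content, then parse that content with the BeautifulSoup library and extract the data you need. The steps are as follows:
1. Import the requests and BeautifulSoup libraries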
```
import requests
from bs4 import BeautifulSoup
```
2. Send an HTTP request to fetch the page content
```
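url = 'https://movie.douban.com/top250'
# Browser-like User-Agent: Douban tends to reject the default one sent by requests
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on a non-2xx response
html = response.text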
```
3. Parse the page content and extract the data you need
```
soup = BeautifulSoup(html, 'html.parser')
# Each movie entry is wrapped in <div class="info">
movies = soup.find_all('div', class_='info')
for movie in movies:
    # find() returns the first match, i.e. the primary title span
    title = movie.find('span', class_='title').text
    rating = movie.find('span', class_='rating_num').text
    print(title, rating)
```
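This prints the title and rating of each movie on the first page of Douban Movie Top250 (25 of the 250 entries); the remaining entries are on the subsequent pages.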
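To cover all 250 entries, the same fetch-and-parse steps can be repeated for every list page. The sketch below only builds the page URLs. It assumes the list is split into pages of 25 movies selected by a `start` offset in the query string, which is how the site's pagination links appeared at the time of writing; `page_urls` and `BASE_URL` are illustrative names, not part of any library.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'


def page_urls(total=250, page_size=25):
    """Yield one list-page URL per block of `page_size` movies.

    Assumption: the site paginates with a `start` offset query parameter.
    """
    for start in range(0, total, page_size):
        query = urlencode({'start': start})
        yield f'{BASE_URL}?{query}'


# Print the ten page URLs; each can be fetched and parsed like the first page.
for page_url in page_urls():
    print(page_url)
```

Each URL can be passed to `requests.get` and parsed exactly as in steps 2 and 3. Pausing briefly between requests (for example with `time.sleep`) keeps the load on the site low.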
Related questions
Scraping Douban Top250 with XPath
Below is sample code that scrapes Douban Top250 using XPath:
```python
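import requests
from lxml import etree

# Send the request (browser-like User-Agent: Douban tends to reject the default one)
url = 'https://movie.douban.com/top250'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers)

# Parse the page content
html = etree.HTML(response.text)

# Extract the movie titles and ratings
movies = html.xpath('//div[@class="hd"]')
for movie in movies:
    # [0] takes the first title span, i.e. the primary title
    title = movie.xpath('.//span[@class="title"]/text()')[0]
    # rating_num is outside div.hd, so the search starts from the parent element
    rating = movie.xpath('..//span[@class="rating_num"]/text()')[0]
    print(f"Title: {title}, rating: {rating}")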
```
This code first sends a request to fetch the Douban Top250 page, then parses the page with XPath, extracts each movie's title and rating, and prints them.
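Indexing an XPath result with `[0]` raises `IndexError` whenever an entry lacks the expected node. One possible variation, sketched here, iterates over each `div.info` block and wraps the path in XPath's `string()` function, which evaluates to an empty string when nothing matches. `SAMPLE_HTML` is a hypothetical, trimmed-down fragment modelled on the list markup rather than a copy of the real page, and `extract_movies` is an illustrative name.

```python
from lxml import etree

# Hypothetical sample markup; the second entry deliberately has no rating.
SAMPLE_HTML = '''
<ol>
  <li><div class="info">
    <div class="hd"><a href="#"><span class="title">Movie A</span></a></div>
    <div class="bd"><span class="rating_num">9.7</span></div>
  </div></li>
  <li><div class="info">
    <div class="hd"><a href="#"><span class="title">Movie B</span></a></div>
    <div class="bd"></div>
  </div></li>
</ol>
'''


def extract_movies(html_text):
    """Return a list of (title, rating) tuples, one per div.info block."""
    tree = etree.HTML(html_text)
    results = []
    for info in tree.xpath('//div[@class="info"]'):
        # string(...) yields '' for an empty node set, so no IndexError
        title = info.xpath('string(.//span[@class="title"])')
        rating = info.xpath('string(.//span[@class="rating_num"])')
        results.append((title.strip(), rating.strip()))
    return results


print(extract_movies(SAMPLE_HTML))
```

Starting from `div.info` keeps the title and the rating paths relative to the same element, so neither needs to climb to a parent with `..`.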
Scraping Douban Top250 with Python
To scrape the movie information in Douban Top250, you can use Python's web-scraping libraries. Here is one basic approach:
```python
import requests
from bs4 import BeautifulSoup


def get_movies(url):
    """Return the primary titles of the movies listed on the given page."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Each div.hd holds the link and title spans for one movie
    movie_list = soup.find_all('div', class_='hd')
    movies = []
    for movie in movie_list:
        # The first <span> inside the link is the primary title
        title = movie.a.span.text.strip()
        movies.append(title)
    return movies


def main():
    url = 'https://movie.douban.com/top250'
    movies = get_movies(url)
    for i, movie in enumerate(movies):
        print(f'{i+1}. {movie}')


if __name__ == '__main__':
    main()
```