Scraping the top 100 Douban movies with Python
Posted: 2023-12-18 07:05:13 · Views: 13
You can use a Python scraping library such as BeautifulSoup or Scrapy to collect information on Douban's top movies. Below is a simple example that uses requests and BeautifulSoup to fetch the titles and ratings of the first ten movies on the Douban chart page:
```python
import requests
from bs4 import BeautifulSoup

url = 'https://movie.douban.com/chart'
# Douban rejects requests that lack a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}
res = requests.get(url, headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
movies = soup.find_all('div', {'class': 'pl2'})
for movie in movies[:10]:
    title = movie.find('a')['title']
    rate = movie.find('span', {'class': 'rating_nums'}).text
    print(f'{title}:{rate}')
```
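Because the extraction logic depends on Douban's markup, it can be checked offline against a small hand-written fragment that mimics the chart page's `pl2` structure (the snippet below is invented for illustration, not copied from the live page):

```python
from bs4 import BeautifulSoup

# A made-up HTML fragment in the shape of the chart page's 'pl2' blocks,
# so the title/rating extraction can be exercised without a network call.
sample = '''
<div class="pl2">
  <a href="#" title="肖申克的救赎">肖申克的救赎 / The Shawshank Redemption</a>
  <span class="rating_nums">9.7</span>
</div>
'''

soup = BeautifulSoup(sample, 'html.parser')
movie = soup.find('div', {'class': 'pl2'})
title = movie.find('a')['title']                          # attribute lookup
rate = movie.find('span', {'class': 'rating_nums'}).text  # tag text
print(f'{title}:{rate}')
```

If the live page ever changes its class names, this kind of fixture makes the breakage obvious immediately.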
Related questions
Scrape the Douban Top 250 movies with Python
Sure. Here is Python code that scrapes the Douban Top 250 movies and writes them to a CSV file:
```python
import requests
from bs4 import BeautifulSoup
import csv

url = 'https://movie.douban.com/top250'
# Douban rejects requests that lack a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}

def get_movies(url):
    res = requests.get(url, headers=headers)
    soup = BeautifulSoup(res.text, 'html.parser')
    movie_list = soup.find('ol', {'class': 'grid_view'})
    movies = []
    for movie_li in movie_list.find_all('li'):
        movie = {}
        movie_hd = movie_li.find('div', {'class': 'hd'})
        movie['title'] = movie_hd.find('span', {'class': 'title'}).text
        movie_info = movie_li.find('div', {'class': 'bd'})
        movie['rating'] = movie_info.find('span', {'class': 'rating_num'}).text
        # A few entries have no one-line quote; guard against a missing span
        quote = movie_info.find('span', {'class': 'inq'})
        movie['quote'] = quote.text if quote else ''
        movies.append(movie)
    return movies

movies = []
for i in range(0, 250, 25):
    movies.extend(get_movies(url + '?start=' + str(i)))

# utf-8-sig keeps the Chinese headers readable when opened in Excel
with open('douban_top250_movies.csv', 'w', newline='', encoding='utf-8-sig') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['电影名称', '评分', '简介'])
    for movie in movies:
        writer.writerow([movie['title'], movie['rating'], movie['quote']])
```
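The CSV-writing step can be tried on its own with a couple of made-up rows and an in-memory buffer, assuming the same column layout as above:

```python
import csv
import io

# Two illustrative rows in the same dict shape get_movies() produces
movies = [
    {'title': '肖申克的救赎', 'rating': '9.7', 'quote': '希望让人自由。'},
    {'title': '霸王别姬', 'rating': '9.6', 'quote': '风华绝代。'},
]

buf = io.StringIO()          # stands in for the real CSV file
writer = csv.writer(buf)
writer.writerow(['电影名称', '评分', '简介'])
for movie in movies:
    writer.writerow([movie['title'], movie['rating'], movie['quote']])

lines = buf.getvalue().splitlines()  # header + one line per movie
```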
This code scrapes each movie's title, rating, and one-line quote from the Douban Top 250 pages and writes them to a CSV file named `douban_top250_movies.csv`.
Write a Python program that scrapes the Douban movie Top 250
Sure. Here are the steps, with Python code, to scrape the Douban movie Top 250:
1. Import the required libraries
```python
import requests
from bs4 import BeautifulSoup
```
2. Define a function to fetch a page
```python
def get_html(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
    response = requests.get(url, headers=headers)
    html = response.text
    return html
```
3. Define a function to parse the page
```python
def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    movie_list = soup.find('ol', class_='grid_view').find_all('li')
    result = []
    for movie in movie_list:
        name = movie.find('span', class_='title').text
        rating = float(movie.find('span', class_='rating_num').text)
        # The fourth span in the 'star' div reads like '2859059人评价';
        # [:-3] strips the three-character '人评价' suffix
        comments = int(movie.find('div', class_='star').find_all('span')[3].text[:-3])
        result.append((name, rating, comments))
    return result
```
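The least obvious part of `parse_html` is the comment-count parse, where `[:-3]` drops the `人评价` ("people rated") suffix before converting to an integer. A standalone sketch of that slicing, with an invented sample value:

```python
# Each Chinese character is one string element, so '人评价' is 3 characters
raw = '2859059人评价'   # hypothetical value from the ratings span
comments = int(raw[:-3])  # strip the suffix, keep the digits
```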
4. Define a function to save the data to a file
```python
def save_data(data):
    with open('top250.txt', 'w', encoding='utf-8') as f:
        for item in data:
            f.write(item[0] + '\t' + str(item[1]) + '\t' + str(item[2]) + '\n')
```
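The tab-separated line format `save_data` produces can be round-tripped on sample data using a temporary file in place of `top250.txt` (the movie tuple below is illustrative):

```python
import os
import tempfile

# One illustrative (name, rating, comment-count) tuple
data = [('肖申克的救赎', 9.7, 2859059)]

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# Write with the same '\t'-joined format as save_data
with open(path, 'w', encoding='utf-8') as f:
    for item in data:
        f.write(item[0] + '\t' + str(item[1]) + '\t' + str(item[2]) + '\n')

# Read one line back and split on tabs to recover the fields
with open(path, encoding='utf-8') as f:
    fields = f.readline().rstrip('\n').split('\t')

os.remove(path)
```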
5. Main routine: scrape the Top 250 and save the data
```python
if __name__ == '__main__':
    url = 'https://movie.douban.com/top250'
    data = []
    for i in range(0, 10):
        html = get_html(url + '?start=' + str(i * 25))
        result = parse_html(html)
        data += result
    save_data(data)
```
This completes the scrape of the Douban movie Top 250; the data is stored in the file top250.txt.
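The main loop pages through the list 25 entries at a time; the `start` offsets it requests can be listed without touching the network:

```python
# The ten start offsets the loop generates: 0, 25, ..., 225
offsets = [i * 25 for i in range(10)]

# And the corresponding page URLs it fetches
urls = ['https://movie.douban.com/top250?start=' + str(o) for o in offsets]
```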