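Implement Python code that scrapes the Douban Movie Top 250 using bs4, regular expressions, and the Flask framework, and finally stores the data in an Excel spreadsheet. Can you help me implement this?
Posted: 2024-06-10 07:04:21 Views: 147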
Sure. Below is Python code that scrapes the Douban Movie Top 250 and stores the data in an Excel spreadsheet:
```python
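import re

import requests
import xlwt
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)


def get_movies():
    """Fetch the primary title of every movie in the Douban Top 250."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    movie_list = []
    # The list spans 10 pages of 25 movies each, selected by the start offset.
    for i in range(0, 10):
        link = 'https://movie.douban.com/top250?start=' + str(i * 25)
        r = requests.get(link, headers=headers, timeout=10)
        r.raise_for_status()  # stop early on a blocked or failed request
        # The 'lxml' parser requires the lxml package to be installed.
        soup = BeautifulSoup(r.text, 'lxml')
        div_list = soup.find_all('div', class_='hd')
        for each in div_list:
            # The first <span> inside the link holds the primary title;
            # the regex collapses any runs of whitespace inside it.
            movie = re.sub(r'\s+', ' ', each.a.span.text).strip()
            movie_list.append(movie)
    return movie_list


def save_to_excel(movie_list):
    """Write the titles to a single column of an .xls workbook."""
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet('Top250 Movies', cell_overwrite_ok=True)
    sheet.write(0, 0, 'Top250 Movies')
    # Row 0 is the header, so data rows start at 1.
    for i, movie in enumerate(movie_list, start=1):
        sheet.write(i, 0, movie)
    workbook.save('豆瓣电影Top250.xls')


@app.route('/')
def hello():
    # Visiting the root URL triggers the scrape and writes the file.
    movies = get_movies()
    save_to_excel(movies)
    return '豆瓣电影Top250.xls has been saved!'


if __name__ == '__main__':
    app.run()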
```
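This code fetches the movie titles from the Douban Movie Top 250 and saves them to an Excel spreadsheet. Because the scraper is wrapped in a Flask application, run the application locally and then open <http://127.0.0.1:5000/> in a browser; the request to that URL triggers the scrape and writes the file.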
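The request also mentions regular expressions. As one illustration of where they could fit, the sketch below pulls a title, a rating, and a release year out of a single list entry with the standard `re` module. The HTML fragment, the class names `title` and `rating_num`, and the helper `parse_item` are assumptions made for this example and have not been checked against the live page, so the patterns would likely need adjusting before use.

```python
import re

# Hypothetical HTML fragment, loosely modeled on one list entry.
# The real page markup may differ; verify it before relying on these patterns.
SAMPLE_HTML = """
<div class="item">
  <div class="hd">
    <a href="#">
      <span class="title">Example Movie</span>
    </a>
  </div>
  <div class="bd">
    <p>Director: Jane Doe<br>1994 / USA / Drama</p>
    <span class="rating_num">9.7</span>
  </div>
</div>
"""

# Capture the text of the first title span.
TITLE_RE = re.compile(r'<span class="title">([^<]+)</span>')
# Capture a rating such as 9.7 (integer part with an optional decimal part).
RATING_RE = re.compile(r'<span class="rating_num"[^>]*>(\d+(?:\.\d+)?)</span>')
# Match a standalone four-digit year starting with 19 or 20.
YEAR_RE = re.compile(r'\b(?:19|20)\d{2}\b')


def parse_item(html):
    """Extract title, rating, and year from one entry; missing fields become None."""
    title = TITLE_RE.search(html)
    rating = RATING_RE.search(html)
    year = YEAR_RE.search(html)
    return {
        'title': title.group(1).strip() if title else None,
        'rating': float(rating.group(1)) if rating else None,
        'year': int(year.group(0)) if year else None,
    }


if __name__ == '__main__':
    print(parse_item(SAMPLE_HTML))
```

Returning `None` for a field that does not match keeps one malformed entry from aborting the whole run. Each extra field could then be written to its own column with additional `sheet.write` calls in `save_to_excel`.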