My program code is:
```python
import json

import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify, request

app = Flask(__name__)


# Get the news list
@app.route('/news_list', methods=['GET'])
def get_news_list():
    url = 'https://www.ecjtuit.edu.cn/'
    r = requests.get(url)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'html.parser')
    news_list = soup.select('.new_list li a')[:5]
    result = []
    for news in news_list:
        news_title = news['title']
        news_url = url + news['href']
        result.append({
            'title': news_title,
            'url': news_url
        })
    return json.dumps(result, ensure_ascii=False)


@app.route('/get_carousel_info')
def get_carousel_info():
    url = 'https://www.ecjtuit.edu.cn/'
    r = requests.get(url)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'html.parser')
    carousel_items = soup.select('.bd ul li a')
    # print(carousel_items)
    result = []
    for item in carousel_items:
        try:
            title = item['title']
            href = url + item['href']
            img_src = item.img['src']
            result.append({
                'title': title,
                'href': href,
                'img_src': 'https://www.ecjtuit.edu.cn' + img_src
            })
        except TypeError:
            # If a TypeError is raised, skip this item
            pass
    return jsonify(result)


if __name__ == '__main__':
    app.run(debug=True)
```
Time: 2023-06-16 11:05:05 Views: 165
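This is a web application built with the Flask framework. It scrapes the specified website for its news list and carousel information, and exposes that data through API endpoints that other applications can call. The `get_news_list()` function returns the first five news items (title and URL), and `get_carousel_info()` returns the title, link, and image URL of each carousel item. The application can run locally or be deployed to a server.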
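One detail in the code above is worth a closer look: links are built with `url + news['href']`, which only works if every `href` is a relative path without a leading slash. A minimal sketch of a more forgiving approach, using `urljoin` from the standard library, might look like this. The helper name `absolute_url` and the sample `href` values are hypothetical and only serve as illustration; the actual links on the site may look different.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.ecjtuit.edu.cn/'


def absolute_url(href, base=BASE_URL):
    """Resolve an href taken from the page against the base URL."""
    return urljoin(base, href)


# Hypothetical href values, for illustration only
print(BASE_URL + '/info/1001/2345.htm')       # plain concatenation doubles the slash
print(absolute_url('/info/1001/2345.htm'))    # root-relative path
print(absolute_url('info/1001/2345.htm'))     # relative path
print(absolute_url('https://example.com/a'))  # already absolute, returned unchanged
```

If this fits the site's markup, the same helper could replace both the `url + item['href']` concatenation and the hard-coded `'https://www.ecjtuit.edu.cn' + img_src`.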
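Related questions
I need Python code that scrapes the Douban Top 250 films. It must use bs4, regular expressions, and the Flask framework, and finally store the data in an Excel spreadsheet. Can you help me implement it?
Certainly. The following Python code scrapes the Douban Top 250 films and stores the data in an Excel spreadsheet: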
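```python
import re  # requested in the question; not needed for the title extraction below

import requests
import xlwt
from bs4 import BeautifulSoup
from flask import Flask

app = Flask(__name__)


def get_movies():
    """Scrape the main title of every film on the Douban Top 250 list."""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    movie_list = []
    # The list spans 10 pages of 25 films each; `start` is the offset
    for i in range(0, 10):
        link = 'https://movie.douban.com/top250?start=' + str(i * 25)
        r = requests.get(link, headers=headers, timeout=10)
        soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser needs the lxml package
        div_list = soup.find_all('div', class_='hd')
        for each in div_list:
            # The first <span> inside the link holds the main title
            movie = each.a.span.text.strip()
            movie_list.append(movie)
    return movie_list


def save_to_excel(movie_list):
    """Write the titles into the first column of an .xls workbook."""
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet('Top250 Movies', cell_overwrite_ok=True)
    sheet.write(0, 0, 'Top250 Movies')  # header row
    for row, movie in enumerate(movie_list, start=1):
        sheet.write(row, 0, movie)
    workbook.save('豆瓣电影Top250.xls')


@app.route('/')
def hello():
    # Each visit to the root URL re-scrapes the list and rewrites the file
    movies = get_movies()
    save_to_excel(movies)
    return '豆瓣电影Top250.xls has been saved!'


if __name__ == '__main__':
    app.run()
```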
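This code fetches the film titles from the Douban Top 250 list and saves them to an Excel spreadsheet. Run the Flask application locally, then open <http://127.0.0.1:5000/> in a browser: each visit to that address triggers the scrape and writes the file.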
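The question also asks for regular expressions, which the title extraction itself does not need. A minimal sketch of where a regex could fit is pulling a release year out of a line of descriptive text, assuming such a line contains a four-digit year. The helper `extract_year` and the sample string are hypothetical; the sample was not fetched from Douban, and the real page text may be laid out differently.

```python
import re

# Assumption: the text contains a standalone four-digit year starting with 19 or 20
YEAR_PATTERN = re.compile(r'(?<!\d)(?:19|20)\d{2}(?!\d)')


def extract_year(info_text):
    """Return the first four-digit year found in info_text, or None."""
    match = YEAR_PATTERN.search(info_text)
    return int(match.group(0)) if match else None


sample = '1994 / USA / Crime Drama'  # made-up sample, for illustration only
print(extract_year(sample))
```

The lookarounds keep the pattern from matching part of a longer digit run, so a string with no standalone year yields `None` rather than a wrong value.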