Scrape the news data from the page https://www.xuexi.cn/f997e76a890b0e5a053c57b19f468436/018d244441062d8916dd472a4c6a0a0b.html.


Time: 2023-10-31 08:59:02

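Sure. You can do this with a Python crawler: use the requests library to fetch the page and the BeautifulSoup library to parse it. The following code implements the task:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.xuexi.cn/f997e76a890b0e5a053c57b19f468436/018d244441062d8916dd472a4c6a0a0b.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Fetch the page; fail early on an HTTP error or a hung connection.
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')

news_list = []
# Each news item is expected to sit in a <div class="text-con">.
for item in soup.find_all('div', class_='text-con'):
    title = item.find('h3')
    content = item.find('p')
    if title is None or content is None:
        continue  # skip items that lack a title or a body paragraph
    news_list.append({
        'title': title.text.strip(),
        'content': content.text.strip(),
    })

print(news_list)
```

This code first sends a GET request to retrieve the page's HTML, then parses it with BeautifulSoup. It calls find_all to locate every news item on the page, reads the title from each item's `h3` element and the content from its `p` element, stores both in a dictionary, and appends the dictionary to a list. Items missing either element are skipped rather than raising an error. Finally, it prints the news list.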
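The parsing step described above can be checked without touching the network by moving it into a function and feeding it a small inline HTML sample. The sketch below is only an illustration: the function name `parse_news` and the `SAMPLE_HTML` markup are assumptions made up for this example, not taken from the real page, and the `div.text-con` / `h3` / `p` selectors are simply the ones used in the answer.

```python
from bs4 import BeautifulSoup


def parse_news(html):
    """Return a list of {'title', 'content'} dicts from div.text-con blocks.

    Hypothetical helper: it assumes the same selectors as the answer above.
    """
    soup = BeautifulSoup(html, 'html.parser')
    news_list = []
    for item in soup.find_all('div', class_='text-con'):
        title = item.find('h3')
        content = item.find('p')
        if title is None or content is None:
            continue  # incomplete item: no title or no body paragraph
        news_list.append({
            'title': title.get_text(strip=True),
            'content': content.get_text(strip=True),
        })
    return news_list


# Made-up sample markup for illustration; not copied from the real page.
SAMPLE_HTML = """
<div class="text-con"><h3> First headline </h3><p> First summary. </p></div>
<div class="text-con"><h3>No summary here</h3></div>
<div class="other"><h3>Ignored</h3><p>Not a news item.</p></div>
<div class="text-con"><h3>Second headline</h3><p>Second summary.</p></div>
"""

if __name__ == '__main__':
    for news in parse_news(SAMPLE_HTML):
        print(news['title'], '|', news['content'])
```

If the same function returns an empty list for the live page, one possible cause is that the news items are injected by JavaScript after the initial HTML loads, in which case requests never sees them; inspecting `response.text` for the `text-con` class is a quick way to check.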
