Scrape the notice titles, link URLs, and dates from the page https://ie.njxzc.edu.cn/8743/.


Date: 2023-08-15 13:06:26

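Here is code that scrapes the notice titles, link URLs, and dates from https://ie.njxzc.edu.cn/8743/:

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://ie.njxzc.edu.cn/8743/'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
response.encoding = response.apparent_encoding  # avoid garbled Chinese text
soup = BeautifulSoup(response.text, 'html.parser')

# Each notice sits in a div whose class is "wp_article_list_item clearfix"
notices = soup.find_all('div', attrs={'class': 'wp_article_list_item clearfix'})
for notice in notices:
    anchor = notice.find('a')
    date_tag = notice.find('span', attrs={'class': 'date'})
    if anchor is None or date_tag is None:
        continue  # skip items missing a link or a date
    title = anchor.text.strip()
    link = urljoin(url, anchor.get('href', ''))  # make relative links absolute
    date = date_tag.text.strip()
    print(title, link, date)
```

The code first uses the requests library to send a GET request and fetch the page's HTML, then parses that HTML with BeautifulSoup. Inspecting the page's HTML structure shows that the notice title, link URL, and date all sit inside div elements whose class is `wp_article_list_item clearfix`. We therefore call `soup.find_all()` to collect every matching div, loop over them, and extract the title, link, and date from each one. Three small safeguards are included: a request timeout with `raise_for_status()` so that a failed request does not get parsed as if it were the page, `urljoin()` so that relative `href` values become full URLs, and a check that skips any item missing its `a` or `span` tag. Finally, `print()` writes the extracted information to the console. Depending on your needs, you can instead save it to a local file or a database.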
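As one way to save the results to a local file, here is a minimal sketch that writes the scraped records to CSV using only the standard library. The function name `save_notices`, its parameters, and the column names are hypothetical choices for this illustration and not part of the original answer; it assumes you have already collected the notices as `(title, link, date)` tuples.

```python
import csv
from urllib.parse import urljoin

BASE_URL = "https://ie.njxzc.edu.cn/8743/"  # page URL from the question


def save_notices(rows, path, base_url=BASE_URL):
    """Write (title, link, date) tuples to a CSV file and return the row count.

    Hypothetical helper for illustration; names and columns are assumptions.
    """
    count = 0
    # utf-8-sig adds a BOM so spreadsheet tools detect UTF-8 for Chinese titles;
    # newline="" is what the csv module expects when writing to a file.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "link", "date"])  # header row
        for title, link, date in rows:
            # Resolve relative hrefs against the page URL before saving.
            writer.writerow([title.strip(), urljoin(base_url, link), date.strip()])
            count += 1
    return count
```

To use it, append a `(title, link, date)` tuple to a list inside the scraping loop instead of printing, then call `save_notices(records, "notices.csv")` once the loop has finished.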