
Scraping the notices section of https://www.ntst.edu.cn/

Time: 2023-10-07 22:13:10 Views: 65

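You can use Python's requests and BeautifulSoup libraries to scrape the page content. Here is some sample code:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# The URL and the CSS selector come from this answer as given; verify both against the live page.
url = "https://www.ntst.edu.cn/tztg.htm"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')

# Each notice is assumed to be an <a> inside an <li> under an element with class "list1".
for notice in soup.select('.list1 li a[href]'):
    title = notice.get_text(strip=True)
    link = urljoin(url, notice['href'])  # resolve relative hrefs against the page URL
    print(title, link)
```

This code prints the title and link of every notice that the selector matches on the notices page of www.ntst.edu.cn. You can adapt it as needed, for example by using regular expressions to extract more information. To avoid putting excessive load on the site, use crawlers with care.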
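As one illustration of using a regular expression to pull more information out of each entry, the sketch below splits a notice's text into a title and a date. It assumes the date appears at the end of the text in `YYYY-MM-DD` form, which is not confirmed for this site. The function name `split_title_and_date` and the sample strings are hypothetical.

```python
import re

# Assumption: the list item text ends with a date such as 2023-10-07.
DATE_AT_END = re.compile(r'^(?P<title>.*?)\s*(?P<date>\d{4}-\d{2}-\d{2})$')

def split_title_and_date(text):
    """Return (title, date) if text ends with YYYY-MM-DD, else (text, None)."""
    text = text.strip()
    match = DATE_AT_END.match(text)
    if match:
        return match.group('title'), match.group('date')
    return text, None

# Hypothetical sample strings, not taken from the real page.
samples = ["Notice on the holiday schedule 2023-09-28", "Library opening hours"]
for sample in samples:
    print(split_title_and_date(sample))
```

In the scraping loop, a function like this could be applied to the text of each list item instead of printing it directly.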

Use this approach, r'<ul><a href=.*?>(.*?)</a></ul>', to scrape the notices section of the https://www.ntst.edu.cn/ home page

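You can do this with Python's requests and re libraries. The code is as follows:

```python
import re
import requests

url = 'https://www.ntst.edu.cn/'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
response.encoding = 'utf-8'
html = response.text

# Pattern as requested; re.S lets "." span line breaks inside the link text.
notices = re.findall(r'<ul><a href=.*?>(.*?)</a></ul>', html, re.S)
for notice in notices:
    print(notice)
```

This code first fetches the page's HTML with requests, then uses the regular expression r'<ul><a href=.*?>(.*?)</a></ul>' to match the content of the notices section, and finally prints the results. Note that the pattern only matches a link that sits directly inside a `<ul>` with nothing between the tags, so it returns an empty list if the page's markup is structured differently.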
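To see what this pattern does and does not match without touching the network, here is a small sketch on made-up HTML. The sample markup and the function name `extract_link_texts` are hypothetical. The real home page may well wrap each link in `<li>`, in which case the pattern finds nothing.

```python
import re

# The requested pattern, compiled once; re.S lets "." span newlines.
PATTERN = re.compile(r'<ul><a href=.*?>(.*?)</a></ul>', re.S)

def extract_link_texts(html):
    """Return the text of every <a> that sits directly inside <ul>...</ul>."""
    return PATTERN.findall(html)

# Made-up markup: the first <ul> fits the pattern, the second wraps the link in <li>.
flat = '<ul><a href="/info/1.htm">Holiday notice</a></ul>'
nested = '<ul><li><a href="/info/2.htm">Exam schedule</a></li></ul>'

print(extract_link_texts(flat))
print(extract_link_texts(nested))
```

If the nested case is what the page uses, a parser-based selector such as the BeautifulSoup approach from the first answer is usually less brittle than widening the regex.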
