Scrape the news data from the page https://www.xuexi.cn/f997e76a890b0e5a053c57b19f468436/018d244441062d8916dd472a4c6a0a0b.html.
Posted: 2023-10-31 14:57:39
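Hello. To scrape the news data from https://www.xuexi.cn/f997e76a890b0e5a053c57b19f468436/018d244441062d8916dd472a4c6a0a0b.html, we can use Python's requests and BeautifulSoup libraries. Note that this approach only works if the news list is present in the HTML the server returns, and the class names used below (`layout-content-side`, `text-title`) are assumptions that should be checked against the page in the browser's developer tools. The steps are as follows: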
1. Use the requests library to fetch the page content:
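```python
import requests
url = "https://www.xuexi.cn/f997e76a890b0e5a053c57b19f468436/018d244441062d8916dd472a4c6a0a0b.html"
headers = {"User-Agent": "Mozilla/5.0"}  # browser-like UA; some sites reject the default one
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail early on 4xx/5xx responses
response.encoding = response.apparent_encoding  # decode Chinese text with the detected charset
html = response.text
```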
2. Use BeautifulSoup to parse the page and extract the news data:
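```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
news_list = []
# Find the div elements that hold the news list (class name is an assumption; verify it on the page)
news_divs = soup.find_all("div", {"class": "layout-content-side"})
# Walk each div and collect the title and link of every news item
for news_div in news_divs:
    news_links = news_div.find_all("a", {"class": "text-title"})
    for news_link in news_links:
        href = news_link.get("href")
        if not href:
            continue  # skip anchors without a link
        news_title = news_link.get_text(strip=True)
        # urljoin handles both relative and absolute hrefs (url comes from step 1)
        news_url = urljoin(url, href)
        news_list.append({"title": news_title, "url": news_url})
```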
3. Print the news data:
```python
for news in news_list:
    print(news["title"], news["url"])
```
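If step 3 prints nothing, the likely cause is that the news list is not in the HTML returned in step 1: pages on sites like xuexi.cn may be assembled in the browser by JavaScript, so `response.text` can be little more than an empty shell plus `<script>` tags. The heuristic below is a minimal sketch using only the standard library `html.parser`; the names `looks_client_rendered` and `_TextCounter` and the 200-character threshold are hypothetical choices, not anything the site documents.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would see, not script/style source
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def looks_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic (threshold is an assumption): little visible text plus at
    least one <script> suggests the content is filled in by JavaScript."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.text_chars < min_text_chars and counter.script_tags > 0
```

Call it as `looks_client_rendered(html)` right after step 1. If it returns `True`, parsing the static HTML will not find the news; common workarounds are to locate the request that delivers the data in the browser's developer tools (Network tab) and fetch that directly, or to render the page with a headless browser such as Selenium or Playwright.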
The complete code:
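```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = "https://www.xuexi.cn/f997e76a890b0e5a053c57b19f468436/018d244441062d8916dd472a4c6a0a0b.html"
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = response.apparent_encoding
html = response.text

soup = BeautifulSoup(html, "html.parser")
news_list = []
# The class names below are assumptions about the page's markup
news_divs = soup.find_all("div", {"class": "layout-content-side"})
for news_div in news_divs:
    news_links = news_div.find_all("a", {"class": "text-title"})
    for news_link in news_links:
        href = news_link.get("href")
        if not href:
            continue
        news_title = news_link.get_text(strip=True)
        news_url = urljoin(url, href)
        news_list.append({"title": news_title, "url": news_url})

# An empty list usually means the news is not in the static HTML
if not news_list:
    print("No news items found; the page may be rendered by JavaScript.")
for news in news_list:
    print(news["title"], news["url"])
```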
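Printing is enough for a quick look, but if the same article appears in more than one block the list will contain duplicates, and `json.dumps` escapes Chinese characters as `\uXXXX` by default. A minimal sketch for de-duplicating by URL and saving the result as readable UTF-8 JSON; the function names and the `news.json` filename are hypothetical.

```python
import json
from pathlib import Path


def dedupe_news(news_list):
    """Drop items whose URL was already seen, keeping the first occurrence."""
    seen = set()
    unique = []
    for item in news_list:
        if item["url"] not in seen:
            seen.add(item["url"])
            unique.append(item)
    return unique


def news_to_json(news_list):
    """Serialize de-duplicated items; ensure_ascii=False keeps Chinese titles readable."""
    return json.dumps(dedupe_news(news_list), ensure_ascii=False, indent=2)


def save_news(news_list, path="news.json"):
    """Write the JSON text to a UTF-8 file (hypothetical default filename)."""
    Path(path).write_text(news_to_json(news_list), encoding="utf-8")
```

At the end of the complete script, `save_news(news_list)` writes the collected items to `news.json` in the working directory.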
I hope this answer helps.