
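1. Scrape the news headlines and images that appear in the International · Military (国际 · 军事) module of the People's Daily Online homepage, http://www.people.com.cn/.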

Posted: 2023-12-30 21:05:35

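Sure. Here is a Python crawler for this task:

```python
import requests
from bs4 import BeautifulSoup

url = "http://www.people.com.cn/"
response = requests.get(url, timeout=10)
response.raise_for_status()
response.encoding = "gbk"
soup = BeautifulSoup(response.text, "html.parser")

# Find the first news link in the "International · Military" module
module = soup.find("div", class_="ej_channel")
if module is None:
    raise SystemExit("Module div not found; the page layout may have changed.")

news_url = None
for link in module.find_all("a"):
    href = link.get("href")
    if href and "world" in href and "people" in href:
        news_url = href
        break
if news_url is None:
    raise SystemExit("No matching news link found.")

# Open the news page and extract its headline and images
response = requests.get(news_url, timeout=10)
response.raise_for_status()
response.encoding = "gbk"
soup = BeautifulSoup(response.text, "html.parser")

title = soup.find("h1")
print("News title:", title.get_text(strip=True) if title else "(no <h1> found)")

for image in soup.find_all("img"):
    src = image.get("src")
    if src and "http" in src:
        print("News image:", src)
```

How the code works: it first requests the People's Daily Online homepage and parses the response with BeautifulSoup. Inspecting the page source shows that the news links in the "International · Military" module are given by the `href` attribute of `a` tags. The code locates the module's `div`, iterates over every `a` tag inside it, and treats a link as a match when its `href` contains both the strings "world" and "people". The loop stops at the first match, so only one news page is followed. On that news page, the page source shows that the headline is the text of the `h1` tag and that images are given by the `src` attribute of `img` tags. BeautifulSoup's `find` and `find_all` methods retrieve these values, and the results are printed to the console. If the module `div` or a matching link is missing, the script exits with a message instead of failing on an undefined variable.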
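The link and image filters described above keep only the first matching link and skip any image whose `src` is a relative path. As a minimal sketch of one way to loosen both limits, the two filters can be written as small functions that use only the standard library's `urllib.parse`. The function names `collect_links` and `collect_images` are hypothetical and do not come from the original answer. Note also that resolving relative paths before matching makes the "world"/"people" check slightly more inclusive than the original substring test on the raw `href`.

```python
from urllib.parse import urljoin, urlparse


def collect_links(hrefs, base_url, keywords=("world", "people")):
    """Return absolute, de-duplicated URLs that contain every keyword.

    Hypothetical helper: `hrefs` is any iterable of raw href values,
    which may include None, empty strings, or relative paths.
    """
    seen, links = set(), []
    for href in hrefs:
        if not href:
            continue
        absolute = urljoin(base_url, href.strip())
        # Skip non-HTTP targets such as javascript: or mailto: links
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if all(k in absolute for k in keywords) and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def collect_images(srcs, base_url):
    """Resolve image src values against the page URL, keeping HTTP(S) only."""
    images = []
    for src in srcs:
        if not src:
            continue
        absolute = urljoin(base_url, src.strip())
        if urlparse(absolute).scheme in ("http", "https"):
            images.append(absolute)
    return images
```

With BeautifulSoup, these could be fed from the parsed page, for example `collect_links([a.get("href") for a in module.find_all("a")], url)`, and each returned URL could then be fetched in a loop to print its headline and images.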
