
Scraping CCTV News Text with Python

Date: 2023-06-10 08:08:50

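To scrape news text from CCTV News, you can use Python's requests and BeautifulSoup libraries. First, use requests to send an HTTP request and fetch the HTML source of the CCTV News site:

```python
import requests

url = "http://news.cctv.com/"
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on HTTP errors
html = response.content
```

Then parse the HTML with BeautifulSoup and extract each news title and body:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
# "newslist" and "cnt_bd" are the class names used in this answer;
# adjust them if the page markup differs.
news_list = soup.find_all("div", class_="newslist")
for news in news_list:
    anchor = news.find("a", href=True)
    if anchor is None:
        continue  # skip blocks that contain no link
    title = anchor.get_text(strip=True)
    link = urljoin(url, anchor["href"])  # resolve relative links
    article = requests.get(link, timeout=10)
    article_soup = BeautifulSoup(article.content, "html.parser")
    body = article_soup.find("div", class_="cnt_bd")
    content = body.get_text(strip=True) if body else ""
    print(title)
    print(content)
```

This retrieves the news text from the CCTV News site, provided the class names above match the page markup. To reduce the risk of being blocked, add a delay between requests and send a randomized User-Agent header.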
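The advice about delays and a random User-Agent could be sketched roughly as follows. This is a minimal illustration, not a guaranteed way to avoid blocking: `USER_AGENTS`, `build_headers`, and `polite_get` are hypothetical names introduced here, and the delay bounds and User-Agent strings are assumed placeholder values.

```python
import random
import time

# Hypothetical pool of User-Agent strings (assumed values; replace with your own).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/114.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/114.0",
]


def build_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def polite_get(url, session=None, min_delay=1.0, max_delay=3.0, timeout=10):
    """Sleep for a random interval, then fetch url with a random User-Agent.

    session may be a requests.Session (or any object with a compatible
    get method); when omitted, the requests module itself is used.
    """
    time.sleep(random.uniform(min_delay, max_delay))
    if session is None:
        import requests  # imported lazily so a stub session can be used in tests
        session = requests
    response = session.get(url, headers=build_headers(), timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx responses
    return response
```

In the scraping loop, each `requests.get(...)` call could then be replaced with `polite_get(...)`, so that every page fetch is preceded by a pause and carries a different header.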