Scraping Sina News with Python

Time: 2023-09-30 21:02:27

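To scrape Sina News with Python, you can use the third-party libraries requests and BeautifulSoup. First install both libraries with the following commands:

```shell
pip install requests
pip install beautifulsoup4
```

Next, decide which news page to scrape. Taking Sina as an example, to scrape sports news you would start from the sports news page: https://sports.sina.com.cn/. We can then use requests to download the page's HTML source and BeautifulSoup to parse it and extract the news information we need. Note that requests only returns the HTML sent by the server; content that the page loads later with JavaScript will not appear in `res.text`. Here is a simple example:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://sports.sina.com.cn/'

# Request the page; a browser-like User-Agent and a timeout make the request more robust
headers = {'User-Agent': 'Mozilla/5.0'}
res = requests.get(url, headers=headers, timeout=10)
res.raise_for_status()  # stop early on HTTP errors such as 404 or 500
# Guess the encoding from the page content so Chinese text is not garbled
res.encoding = res.apparent_encoding

# Parse the page
soup = BeautifulSoup(res.text, 'html.parser')

# Find all <a> tags with class "news-item" (check the real class name in the page source)
news_links = soup.find_all('a', class_='news-item')

# Print the title and URL of each news link
for link in news_links:
    title = link.get_text(strip=True)
    href = link.get('href')  # None if the tag has no href attribute
    if href:
        print(title, href)
```

Explanation:

1. First, define the URL of the news page to scrape.
2. Use requests to send a GET request to that URL and fetch the page's HTML source. `raise_for_status()` raises an exception on HTTP errors, and setting `res.encoding = res.apparent_encoding` helps avoid garbled Chinese text when the server's `Content-Type` header does not declare a charset.
3. Use BeautifulSoup to parse the source.
4. Use `find_all` to find every `a` tag whose class is `news-item`; these tags are assumed to contain the news links we need. The class name comes from this example and may not match Sina's current markup, so check the real class names in the page source (for example with the browser's developer tools) and adjust it.
5. Iterate over the links, taking the news title from the tag's text and the news URL from its `href` attribute.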
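
The loop in the example prints each `href` exactly as it appears in the HTML. On a real page these values may be relative paths, scheme-relative `//...` links, `javascript:` placeholders, or the same article linked more than once. Below is a minimal sketch of a post-processing step that uses only the standard library's `urllib.parse`; `clean_news_links` is a hypothetical helper name (not part of requests or BeautifulSoup), and the sample pairs are made-up data, not real Sina links.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def clean_news_links(pairs, base_url):
    """Normalize (title, href) pairs scraped from a page.

    Hypothetical helper: resolves relative hrefs against base_url, keeps
    only http/https links with a non-empty title, strips #fragments, and
    drops duplicate URLs while preserving the original order.
    """
    seen = set()
    result = []
    for title, href in pairs:
        title = (title or '').strip()
        if not title or not href:
            continue  # skip links without visible text or without an href
        # Turn relative links like "/doc.shtml" or "//host/doc.html" into absolute URLs
        full_url = urljoin(base_url, href.strip())
        # Skip javascript:, mailto: and other non-web links
        if urlparse(full_url).scheme not in ('http', 'https'):
            continue
        # "doc.shtml#comments" and "doc.shtml" point to the same article
        full_url = urldefrag(full_url).url
        if full_url in seen:
            continue
        seen.add(full_url)
        result.append((title, full_url))
    return result


# Made-up sample data for illustration only
sample_pairs = [
    ('Headline A', '/doc-a.shtml'),
    ('Headline A again', 'https://sports.sina.com.cn/doc-a.shtml#comments'),
    ('', '/doc-b.shtml'),
    ('Share', 'javascript:void(0)'),
    ('Headline C', '//k.sina.com.cn/article_c.html'),
]
for title, link_url in clean_news_links(sample_pairs, 'https://sports.sina.com.cn/'):
    print(title, link_url)
```

To use it with the scraping code, collect `(link.get_text(strip=True), link.get('href'))` for each tag in `news_links` and pass that list together with `url`. Keeping this step separate from the network request also means the filtering logic can be tested on saved HTML or sample data without hitting the site.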
