
Batch-downloading Tencent News web pages with a Python crawler

Posted: 2023-11-08 16:53:31

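To batch-download Tencent News web pages in Python, you can combine requests (to fetch pages over HTTP) with Beautiful Soup (to parse the HTML). Here is a simple example:

```python
import re
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# The page to crawl
url = 'https://news.qq.com/'
# Some sites reject requests that lack a browser-like User-Agent
headers = {'User-Agent': 'Mozilla/5.0'}

# Send the HTTP request and stop on HTTP errors (4xx/5xx)
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

# Parse the HTML content
soup = BeautifulSoup(response.text, 'html.parser')

# Find all news links that carry an href
# (the class name 'linkto' may not match the site's current markup)
news_links = soup.find_all('a', class_='linkto', href=True)

# Iterate over the links and download each page
for link in news_links:
    # Resolve relative links such as '/a/123.html' against the homepage URL
    news_url = urljoin(url, link['href'])
    news_response = requests.get(news_url, headers=headers, timeout=10)
    if not news_response.ok:
        continue  # skip pages that failed to download
    # Replace characters that are illegal in file names (/ : ? etc. on Windows)
    filename = re.sub(r'[\\/:*?"<>|]', '_', news_url) + '.html'
    # Save the raw bytes so the page keeps its original encoding
    with open(filename, 'wb') as f:
        f.write(news_response.content)
    time.sleep(1)  # pause between requests to avoid overloading the server
```

This code first sends an HTTP request to fetch the HTML of the Tencent News homepage, parses it with Beautiful Soup, and collects the news links. It then visits each link in turn, downloads the corresponding article page, and saves it to a local file. If the homepage fills in its article list with JavaScript, those links are not present in the HTML that requests receives, and `find_all` returns an empty list; in that case, inspect the page source to find the right selector before running the loop.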
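The link-collection and file-naming steps can be made more robust. The sketch below is one possible approach, assuming you only need `<a href>` values: it uses the standard library's `html.parser.HTMLParser` in place of Beautiful Soup so it runs without extra packages, and the names `LinkCollector`, `extract_links` and `url_to_filename` are hypothetical helpers, not part of any library. It resolves relative links, drops `javascript:`/`mailto:` links and duplicates (including the same page with a different `#fragment`), and maps each URL to a short, filesystem-safe name with a hash suffix so that distinct URLs never overwrite each other.

```python
import hashlib
import re
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags, optionally filtered by CSS class."""

    def __init__(self, css_class=None):
        super().__init__()
        self.css_class = css_class
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        href = attrs.get('href')
        if not href:
            return
        classes = (attrs.get('class') or '').split()
        if self.css_class is None or self.css_class in classes:
            self.hrefs.append(href)


def extract_links(html, base_url, css_class=None):
    """Return absolute http(s) links in document order, without duplicates."""
    parser = LinkCollector(css_class)
    parser.feed(html)
    parser.close()
    seen = set()
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        # Skip javascript:, mailto: and other non-web links
        if urlparse(absolute).scheme not in ('http', 'https'):
            continue
        # Drop the #fragment so the same page is not downloaded twice
        absolute = urldefrag(absolute).url
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


def url_to_filename(url, max_len=80):
    """Map a URL to a safe file name: a sanitized, truncated prefix plus a short hash."""
    parsed = urlparse(url)
    stem = re.sub(r'[^A-Za-z0-9._-]+', '_', parsed.netloc + parsed.path).strip('_')
    # The hash covers the full URL (query string included), so names stay unique
    digest = hashlib.sha1(url.encode('utf-8')).hexdigest()[:10]
    return f'{stem[:max_len]}_{digest}.html'
```

In the download loop, these helpers could replace the `find_all` and file-name lines, for example `for news_url in extract_links(response.text, url, 'linkto'):` followed by `open(url_to_filename(news_url), 'wb')`.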
