
Scraping the content of multiple web pages with Python

Time: 2023-07-12 21:55:42 Views: 89

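To scrape the content of multiple web pages, you can use Python's requests and BeautifulSoup libraries. First, use requests to send an HTTP request and fetch each page's HTML, then use BeautifulSoup to parse the HTML and extract what you need. The following simple example loops over several URLs and prints each page's title, body text, and first link. The `div` with class `content` is a placeholder for wherever the body text lives, so adjust the selector to the markup of the pages you are scraping.

```python
import requests
from bs4 import BeautifulSoup

urls = [
    'https://www.example.com/page1',
    'https://www.example.com/page2',
    'https://www.example.com/page3',
]

for url in urls:
    # Send an HTTP request to fetch the HTML; skip pages that fail to load
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
    except requests.RequestException as exc:
        print('Failed to fetch', url, '-', exc)
        continue

    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Look up the title, body text, and first link; any of them may be missing
    title_tag = soup.title
    content_tag = soup.find('div', {'class': 'content'})
    link_tag = soup.find('a', href=True)

    title = title_tag.get_text(strip=True) if title_tag else None
    content = content_tag.get_text(strip=True) if content_tag else None
    link = link_tag['href'] if link_tag else None

    # Print the results
    print('Title:', title)
    print('Content:', content)
    print('Link:', link)
    print()
```

Note that scraping may violate a website's terms of use. Comply with the relevant laws, regulations, and ethical norms, and do not scrape maliciously or in ways that infringe on other people's privacy.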
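One practical way to act on the caution about a site's rules is to check its robots.txt before fetching anything. The sketch below is only an illustration using the standard library's `urllib.robotparser`. The user agent name `example-crawler`, the helper names `build_robot_parser` and `filter_allowed`, and the sample robots.txt text are all hypothetical and not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent name, used only for this illustration
USER_AGENT = "example-crawler"

# Hypothetical robots.txt content; a real crawler would download the site's own file
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /page2
"""


def build_robot_parser(robots_txt):
    """Build a parser from robots.txt text that has already been obtained."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_allowed(urls, parser, user_agent=USER_AGENT):
    """Keep only the URLs that the robots.txt rules allow this user agent to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


urls = [
    "https://www.example.com/page1",
    "https://www.example.com/page2",
    "https://www.example.com/page3",
]

parser = build_robot_parser(SAMPLE_ROBOTS_TXT)
allowed_urls = filter_allowed(urls, parser)
print(allowed_urls)
```

Against a live site, you would typically call `parser.set_url(...)` with the site's robots.txt address and then `parser.read()` instead of passing in sample text, and pause between requests (for example with `time.sleep`) so the crawl does not overload the server. A robots.txt check does not replace reading the site's terms of use.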
