
Scraping Web Page Data with a Python Crawler

Date: 2023-07-03 16:04:45

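A Python crawler makes it easy to fetch data from web pages. The two most commonly used libraries are requests and BeautifulSoup. First, use requests to send an HTTP request and get the page content:

```
import requests

url = 'http://example.com'
response = requests.get(url, timeout=10)  # avoid hanging forever on a slow server
response.raise_for_status()               # raise an exception on 4xx/5xx responses
html = response.text
```

Then use BeautifulSoup to parse the HTML:

```
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
```

Next, use the methods BeautifulSoup provides to extract the information you need, for example:

```
# Get the page title (None if the page has no <title> tag)
title = soup.title.string if soup.title else None

# Get all link targets, skipping <a> tags that have no href
links = [link['href'] for link in soup.find_all('a', href=True)]

# Get all image URLs, skipping <img> tags that have no src
img_links = [img['src'] for img in soup.find_all('img', src=True)]
```

Finally, close the response. With a plain `requests.get` call the body has already been read and the connection has been returned to the pool, so this matters mainly for streamed requests (`stream=True`), but calling it is harmless:

```
response.close()
```

Note that the extracted `href` and `src` values may be relative URLs. When crawling, follow the site's rules (for example its robots.txt and terms of service), and do not send requests at an excessive rate or act maliciously, or you risk having your IP blocked or breaking laws and regulations.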
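The advice about respecting site rules and not over-requesting can be sketched in code. The example below is a minimal illustration using only the standard library, not a complete crawler: it checks URLs against robots.txt rules with `urllib.robotparser` and enforces a minimum delay between requests. The `Throttle` class, the `build_robots_checker` helper, the `my-crawler` user agent, and the sample robots.txt content are all hypothetical names and data made up for this sketch.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Throttle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = None  # monotonic time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the last request was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

robots = build_robots_checker(SAMPLE_ROBOTS)
throttle = Throttle(min_interval=0.2)

for url in ["http://example.com/", "http://example.com/private/data"]:
    if not robots.can_fetch("my-crawler", url):
        print("skipping (disallowed by robots.txt):", url)
        continue
    throttle.wait()
    # A real crawler would fetch the page here, e.g. requests.get(url, timeout=10).
    print("allowed:", url)
```

In a real crawler the robots.txt text would be downloaded from the target site rather than hard-coded, and the delay would be chosen to suit that site. The 0.2 second interval here is only an assumption to keep the demonstration fast.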