
How to scrape HTML data with Python

Posted: 2024-04-30 21:17:39 Views: 60


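You need two third-party Python libraries: requests and Beautiful Soup.

1. First, use requests to send an HTTP request and read the response body:

```python
import requests

url = 'http://example.com'
response = requests.get(url)
html_content = response.text
```

2. Next, parse the HTML with Beautiful Soup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')
```

3. You can then extract data with the methods Beautiful Soup provides, for example by reading a tag's content by tag name:

```python
title_tag = soup.title
print(title_tag.text)
```

4. You can also pick out specific elements with CSS selectors. Note that `select_one` returns `None` when nothing matches, so check the result before reading `.text`:

```python
element = soup.select_one('#id')  # select the element whose id attribute is "id"
print(element.text)
```

This is a simple example; you can explore the rest of Beautiful Soup as your needs require. Also keep in mind that scraping a website must comply with the applicable laws and regulations.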
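The steps above can be combined into a small script. The following is only a minimal sketch: the helper names `fetch_html`, `extract_links`, and `text_or_default`, the 10-second timeout, and the `SAMPLE_HTML` string are all assumptions made for illustration and do not come from the original answer. It adds two things the step-by-step snippets leave out: failing loudly on HTTP errors, and handling a selector that matches nothing.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sample page, used so the parsing helpers can be tried offline.
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <div id="main">Hello</div>
    <a href="/first">First</a>
    <a>No link target</a>
  </body>
</html>
"""


def fetch_html(url, timeout=10.0):
    """Download a page and return its decoded HTML (assumed helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text


def extract_links(html):
    """Return (text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.select("a[href]")]


def text_or_default(html, selector, default=""):
    """Return the text of the first match, or `default` if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)  # None when the selector matches nothing
    return element.get_text(strip=True) if element is not None else default


if __name__ == "__main__":
    # Parse the offline sample; swap in fetch_html(url) for a real page.
    print(extract_links(SAMPLE_HTML))
    print(text_or_default(SAMPLE_HTML, "#main"))
    print(text_or_default(SAMPLE_HTML, "#missing", default="(not found)"))
```

To run it against a live page, pass the result of `fetch_html('http://example.com')` to the parsing helpers in place of `SAMPLE_HTML`.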
