
Scraping Web Page Data with a Python Crawler

Time: 2023-10-07 21:05:53

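A crawler written in Python can send network requests and retrieve web page data with libraries such as urllib or requests. (The urllib2 module exists only in Python 2; in Python 3 its functionality lives in urllib.request.) Of these, requests is the more convenient and easier to use. The following example uses requests to fetch a page:

```python
import requests

# Send a GET request; the timeout keeps the call from hanging indefinitely
response = requests.get('https://www.example.com', timeout=10)

# Get the response body as text
content = response.text

# Print the response body
print(content)
```

In addition, the BeautifulSoup library makes it easy to parse an HTML page and extract the data you need. For example:

```python
from bs4 import BeautifulSoup
import requests

# Send a GET request; the timeout keeps the call from hanging indefinitely
response = requests.get('https://www.example.com', timeout=10)

# Get the response body as text
content = response.text

# Parse the HTML page with BeautifulSoup
soup = BeautifulSoup(content, 'html.parser')

# Find all <a> tags
links = soup.find_all('a')

# Print the href attribute of every link (None if the tag has no href)
for link in links:
    print(link.get('href'))
```

These examples are deliberately simple. A real crawler usually has to handle more, such as the page's character encoding and the site's anti-crawler mechanisms. When writing a crawler, comply with the applicable laws and regulations and do not scrape data illegally.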
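The two concerns mentioned above, page encoding and anti-crawler checks, can be sketched with the standard-library urllib.request module alone. This is a minimal sketch under stated assumptions, not a complete solution: the helper names `build_request`, `decode_body`, and `fetch_html` and the `USER_AGENT` value are hypothetical and chosen only for illustration. Sending an explicit User-Agent addresses only the simplest kind of check, and falling back to UTF-8 is just one possible policy for pages whose charset label is missing or wrong.

```python
import urllib.request
from typing import Optional

# Hypothetical identifier for this sketch; replace it with one describing your crawler.
USER_AGENT = "example-crawler/0.1"


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def decode_body(raw: bytes, charset: Optional[str]) -> str:
    """Decode with the declared charset, falling back to UTF-8."""
    try:
        return raw.decode(charset or "utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown or incorrect charset label: decode as UTF-8 and
        # substitute a replacement character for undecodable bytes.
        return raw.decode("utf-8", errors="replace")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its text, honoring the Content-Type charset."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Charset from the Content-Type header, or None if not declared.
        charset = resp.headers.get_content_charset()
        return decode_body(resp.read(), charset)
```

Calling `fetch_html('https://www.example.com')` returns the decoded page text, which can then be passed to BeautifulSoup exactly as in the earlier example. When fetching several pages in a loop, pausing between requests (for instance with `time.sleep`) reduces the load on the target site.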
