首页如何编写一个python爬虫脚本

如何编写一个python爬虫脚本

时间: 2023-05-24 21:06:01 浏览: 270

编写一个 Python 爬虫脚本的基本步骤如下： 1. 导入所需库 ``` import requests from bs4 import BeautifulSoup ``` 2. 构建请求头和请求参数 ``` headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'} params = { 'q': 'python', 'start': '0', 'num': '10' } ``` 3. 发起请求并获取响应 ``` response = requests.get('https://www.google.com/search', params=params, headers=headers) ``` 4. 解析页面内容 ``` soup = BeautifulSoup(response.text, 'html.parser') ``` 5. 提取需要的数据 ``` titles = soup.find_all('h3', class_='r') for title in titles: print(title.text) ``` 完整代码示例： ``` import requests from bs4 import BeautifulSoup headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'} params = { 'q': 'python', 'start': '0', 'num': '10' } response = requests.get('https://www.google.com/search', params=params, headers=headers) soup = BeautifulSoup(response.text, 'html.parser') titles = soup.find_all('h3', class_='r') for title in titles: print(title.text) ``` 以上代码可以用于在 Google 搜索中搜索 Python 相关的结果，并提取出标题。编写爬虫时需要注意网站的合法性，遵守规则和尊重网站所有者的意愿，避免被认为是恶意爬虫。

阅读全文