Python code for scraping Dianping (大众点评)
Posted: 2023-10-10 11:06:25
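Dianping (大众点评) has anti-scraping measures, so scraping it directly may get your IP blocked. Routing requests through proxy IPs and sending a random User-Agent is recommended. Below is a simple code example that scrapes Dianping shop information: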
```python
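import random
import time

import requests
from bs4 import BeautifulSoup

# Request headers: a browser-like User-Agent plus a Referer
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Referer': 'http://www.dianping.com/',
}

# Proxy settings (placeholder address; point this at your own proxy).
# For an ordinary HTTP proxy, both keys use the http:// scheme,
# because the scheme describes how to reach the proxy itself.
proxies = {
    'http': 'http://127.0.0.1:1080',
    'https': 'http://127.0.0.1:1080',
}

# City and category to scrape.
# Note: the site's real listing URLs may use a pinyin city slug and a
# channel code instead of Chinese names; copy the values from the URL
# your browser shows for the listing page you want.
city = '上海'
category = '美食'

# Number of result pages to scrape
pages = 10


def text_of(node):
    """Return the stripped text of a tag, or '' if the tag was not found."""
    return node.text.strip() if node else ''


for i in range(1, pages + 1):
    # Build the request URL for page i
    url = 'http://www.dianping.com/{0}/{1}/p{2}'.format(city, category, i)
    # Send the request (the timeout keeps a dead proxy from hanging the loop)
    response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')
    # Get the shop list
    shop_list = soup.find_all('div', {'class': 'txt'})
    # Walk the shop list and extract each shop's details
    for shop in shop_list:
        # Shop name
        title = shop.find('div', {'class': 'tit'})
        name = text_of(title.find('a') if title else None)
        # Rating and review count share the same container
        comment = shop.find('span', {'class': 'comment-list'})
        rating = text_of(comment.find('b') if comment else None)
        links = comment.find_all('a') if comment else []
        review_count = text_of(links[1]) if len(links) > 1 else ''
        # Average price per person
        avg_price = text_of(shop.find('span', {'class': 'mean-price'}))
        # Address
        address = text_of(shop.find('span', {'class': 'addr'}))
        # Print the shop details
        print(name, rating, review_count, avg_price, address)
    # Sleep for a random 1 to 3 seconds after each page to reduce the risk of an IP ban
    time.sleep(random.randint(1, 3))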
```
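The script in the fenced block scrapes food listings for Shanghai across 10 result pages, taking however many shops each page lists. For each shop in the list it extracts the name, rating, review count, average price per person, and address. Note that after each page is processed, the script sleeps for a random 1 to 3 seconds to reduce the risk of an IP ban.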
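The recommendation to use proxy IPs and a random User-Agent can be made concrete with a small helper that picks both per request. The following is only a minimal sketch: `USER_AGENTS`, `PROXY_POOL`, `build_headers`, and `build_proxies` are hypothetical names introduced for illustration, and the User-Agent strings and proxy addresses are placeholders to replace with your own.

```python
import random

# Hypothetical pool of User-Agent strings (placeholders; use current browser strings)
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
]

# Hypothetical pool of proxy endpoints (placeholders; use proxies you control)
PROXY_POOL = [
    'http://127.0.0.1:1080',
    'http://127.0.0.1:1081',
]


def build_headers(rng=random):
    """Return request headers with a randomly chosen User-Agent."""
    return {
        'User-Agent': rng.choice(USER_AGENTS),
        'Referer': 'http://www.dianping.com/',
    }


def build_proxies(rng=random):
    """Return a requests-style proxies dict pointing at one random proxy."""
    proxy = rng.choice(PROXY_POOL)
    return {'http': proxy, 'https': proxy}
```

Inside the page loop, the request would then become something like `requests.get(url, headers=build_headers(), proxies=build_proxies(), timeout=10)`, so each page is fetched with a freshly chosen identity. Passing a seeded `random.Random` instance as `rng` makes the choice reproducible when debugging.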
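Printing the five fields is enough for a quick check, but for ten pages of results it is usually more convenient to keep them. As a hedged sketch, assuming each shop is collected into a dict keyed by the same five field names (the `FIELDS` list and the `shops_to_csv` function are hypothetical, not part of the original script), the standard `csv` module can serialize the records:

```python
import csv
import io

# Column order matches the fields extracted for each shop
FIELDS = ['name', 'rating', 'review_count', 'avg_price', 'address']


def shops_to_csv(shops):
    """Serialize a list of shop dicts to CSV text.

    Missing fields are written as empty strings; unknown keys are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        restval='',
        extrasaction='ignore',
        lineterminator='\n',
    )
    writer.writeheader()
    writer.writerows(shops)
    return buffer.getvalue()
```

To use it, append a dict such as `{'name': name, 'rating': rating, ...}` to a list inside the inner loop instead of calling `print`, then write the returned text to a file opened with `encoding='utf-8'` once all pages are done.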