Using Scrapy to scrape restaurant names, average price per person, addresses, and other details from Dianping (大众点评网) food listings
Posted: 2024-05-03 18:22:56
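Dianping requires you to log in before you can browse it. You therefore need to obtain the cookies issued after login first, and then send those cookies with every subsequent request.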
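At the HTTP level, "sending the cookies" means turning the login response's `Set-Cookie` headers into a `Cookie` header on later requests. The following is a minimal standard-library sketch of that round trip. The `Set-Cookie` values and the cookie names `session_id` and `uid` are hypothetical and are used only for illustration. They are not Dianping's real cookies.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header values from a login response (illustrative only)
login_set_cookie_headers = [
    'session_id=abc123; Path=/; HttpOnly',
    'uid=42; Path=/',
]


def cookies_from_set_cookie(headers: list) -> dict:
    """Parse Set-Cookie header values into a {name: value} dict, ignoring attributes like Path."""
    parsed = SimpleCookie()
    for header in headers:
        parsed.load(header)
    return {name: morsel.value for name, morsel in parsed.items()}


def cookie_header(cookies: dict) -> str:
    """Serialize a {name: value} dict into the value of a Cookie request header."""
    return '; '.join(f'{name}={value}' for name, value in cookies.items())


cookies = cookies_from_set_cookie(login_set_cookie_headers)
print(cookies)                 # {'session_id': 'abc123', 'uid': '42'}
print(cookie_header(cookies))  # session_id=abc123; uid=42
```

A `{name: value}` dict like `cookies` is also the form that `scrapy.Request(..., cookies=...)` accepts.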
The example below scrapes the name, average price per person, address, and phone number of the food merchants in one city on Dianping:
```
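import scrapy
from scrapy.http.cookies import CookieJar


class DianpingSpider(scrapy.Spider):
    name = 'dianping'
    # Use the parent domain so the login request to account.dianping.com
    # is not dropped by Scrapy's offsite filtering.
    allowed_domains = ['dianping.com']
    custom_settings = {
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299',
    }

    def start_requests(self):
        # Open the login page first; logging in is how we obtain the session cookies
        yield scrapy.Request(
            url='https://account.dianping.com/login?redir=https://www.dianping.com/',
            callback=self.parse_login,
        )

    def parse_login(self, response):
        # Field names follow the original example and may not match the live form.
        # The geetest_* fields belong to a GeeTest CAPTCHA; a real login will
        # most likely fail if they are left empty.
        formdata = {
            'username': 'your_username',
            'password': 'your_password',
            'redir': 'https://www.dianping.com/',
            'geetest_challenge': '',
            'geetest_validate': '',
            'geetest_seccode': '',
        }
        # Fill in and submit the <form> found on the login page
        yield scrapy.FormRequest.from_response(
            response,
            formdata=formdata,
            callback=self.after_login,
        )

    def after_login(self, response):
        # Treat the presence of "我的点评" ("My reviews") as a sign of a successful login
        if '我的点评' not in response.text:
            self.logger.error('Login failed: check the credentials or the CAPTCHA fields')
            return
        # Copy the cookies set by this response into a plain dict.
        # Note: Scrapy's CookiesMiddleware (enabled by default) already stores and
        # resends cookies within a crawl, so this explicit step is optional.
        cookie_jar = CookieJar()
        cookie_jar.extract_cookies(response, response.request)
        cookies = {cookie.name: cookie.value for cookie in cookie_jar}
        # Crawl the food listing (category ch10) of one city
        city = 'shenzhen'
        url = f'https://www.dianping.com/{city}/ch10'
        yield scrapy.Request(url=url, cookies=cookies, callback=self.parse)

    def parse(self, response):
        # Follow the link of every merchant on the listing page
        for item in response.css('div.tit > a'):
            name = item.css('::text').get(default='').strip()
            url = item.css('::attr(href)').get()
            if not url:
                continue
            # response.follow resolves relative hrefs against the page URL
            yield response.follow(
                url,
                cookies=response.request.cookies,
                callback=self.parse_detail,
                meta={'name': name},
            )

    def parse_detail(self, response):
        # Extract the merchant's details; the CSS selectors depend on the
        # current page layout and may need updating.
        name = response.meta['name']
        avg_price = response.css('span.avg-price > b::text').get(default='').strip()
        address = response.css('div.expand-info.address span::text').get(default='').strip()
        phone = response.css('p.expand-info.tel span::text').get(default='').strip()
        yield {
            'name': name,
            'avg_price': avg_price,
            'address': address,
            'phone': phone,
        }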
```
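Replace `your_username` and `your_password` in the code with your own account name and password. The spider uses the `scrapy.http.cookies.CookieJar` class to collect the cookies issued after login and sends them with subsequent requests. When it requests a merchant's detail page, it uses the `meta` parameter to pass along the merchant name taken from the listing page, so that the name can be combined with the details scraped from the detail page into a single item.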
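Scrapy's `scrapy.http.cookies.CookieJar` is a wrapper around the standard library's `http.cookiejar.CookieJar`. This means the "jar to dict" loop in `after_login()` can be tried offline with the stdlib class alone. The following is a minimal sketch under that assumption. `FakeResponse` is a hypothetical stand-in for a real HTTP response, and the cookie names are made up.

```python
import email.message
import urllib.request
from http.cookiejar import CookieJar


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; CookieJar only needs .info()."""

    def __init__(self, set_cookie_values):
        self._headers = email.message.Message()
        for value in set_cookie_values:
            # Assigning the same header name again appends another Set-Cookie line
            self._headers['Set-Cookie'] = value

    def info(self):
        return self._headers


def jar_to_dict(jar: CookieJar) -> dict:
    """Same idea as the loop in after_login(): one entry per cookie, keyed by name."""
    return {cookie.name: cookie.value for cookie in jar}


request = urllib.request.Request('https://www.dianping.com/')
response = FakeResponse(['session_id=abc123; Path=/', 'uid=42; Path=/'])

jar = CookieJar()
jar.extract_cookies(response, request)  # applies the default cookie policy, then stores
print(jar_to_dict(jar))  # {'session_id': 'abc123', 'uid': '42'}
```

Because a real Scrapy crawl keeps cookies through `CookiesMiddleware`, this conversion is mainly useful when you want to inspect the cookies or pass them somewhere explicitly.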