Scraping JD.com Suzhou hotel information with Jupyter
Time: 2023-10-17 18:37:11 Views: 135
Here are the steps for scraping JD.com hotel information for Suzhou in a Jupyter Notebook:
1. Import the required libraries
```python
import requests
from fake_useragent import UserAgent
from bs4 import BeautifulSoup
import pandas as pd
```
2. Set the request headers
```python
headers = {
    'User-Agent': UserAgent().chrome
}
```
3. Send the request and parse the page
```python
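url = 'https://hotels.jd.com/city-suzhou.html'
# A timeout keeps the notebook cell from hanging if the server does not respond
response = requests.get(url, headers=headers, timeout=10)
# Raise an exception on 4xx/5xx responses instead of parsing an error page
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')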
```
4. Get the list of Suzhou hotels
```python
hotel_list = soup.select('.hotel-item')
```
5. Iterate over the hotel list and extract the required fields
```python
hotel_info_list = []
for hotel in hotel_list:
    # Collect the fields for one hotel. select_one returns None when a
    # selector matches nothing, in which case .text raises AttributeError.
    hotel_info = {}
    hotel_info['hotel_name'] = hotel.select_one('.hotel-name').text.strip()
    hotel_info['hotel_address'] = hotel.select_one('.hotel-address').text.strip()
    hotel_info['hotel_score'] = hotel.select_one('.hotel-score').text.strip()
    hotel_info['hotel_comment_num'] = hotel.select_one('.hotel-comment-num').text.strip()
    hotel_info['hotel_price'] = hotel.select_one('.room-price').text.strip()
    hotel_info_list.append(hotel_info)
```
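The loop above fails with an `AttributeError` as soon as one hotel lacks one of the fields. A minimal sketch of a more tolerant version follows, assuming the same class names as above. The sample markup, the `FIELD_SELECTORS` mapping, and the `extract_fields` helper are hypothetical and only illustrate the idea; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names mirror the selectors used above,
# but the real page may use different ones.
SAMPLE_HTML = """
<div class="hotel-item">
  <span class="hotel-name"> Example Hotel A </span>
  <span class="hotel-address">1 Example Road</span>
  <span class="hotel-score">4.6</span>
  <span class="room-price">328</span>
</div>
<div class="hotel-item">
  <span class="hotel-name">Example Hotel B</span>
</div>
"""

# Output column -> CSS selector (assumed; edit to match the real page).
FIELD_SELECTORS = {
    'hotel_name': '.hotel-name',
    'hotel_address': '.hotel-address',
    'hotel_score': '.hotel-score',
    'hotel_comment_num': '.hotel-comment-num',
    'hotel_price': '.room-price',
}


def extract_fields(item, selectors):
    """Return a dict for one hotel item, with None for fields that are absent."""
    record = {}
    for field, selector in selectors.items():
        node = item.select_one(selector)
        record[field] = node.get_text(strip=True) if node is not None else None
    return record


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
records = [extract_fields(item, FIELD_SELECTORS) for item in soup.select('.hotel-item')]
print(records)
```

Keeping the selectors in one mapping means that adapting the scraper to a changed page only requires editing that mapping, not the loop.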
6. Save the data
```python
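hotel_df = pd.DataFrame(hotel_info_list)
# utf-8-sig adds a byte order mark so that Excel displays Chinese text correctly
hotel_df.to_csv('suzhou_hotel_info.csv', index=False, encoding='utf-8-sig')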
```
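The scraped score, comment count, and price are all strings, so they usually need converting before any numeric analysis. The sketch below shows one possible way to do that with the standard library. The `parse_number` helper and the sample strings are hypothetical; the actual text on the page may look different.

```python
import re


def parse_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    if text is None:
        return None
    # Drop thousands separators, then take the first integer or decimal.
    match = re.search(r'\d+(?:\.\d+)?', text.replace(',', ''))
    return float(match.group()) if match else None


# Hypothetical examples of what the scraped strings might look like.
samples = ['CNY 328 up', '4.6 / 5', '1,234 reviews', '', None]
parsed = [parse_number(s) for s in samples]
print(parsed)
```

With pandas, a helper like this could be applied to a column before saving, for example `hotel_df['hotel_price'].map(parse_number)`, assuming the column holds strings or `None`.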
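These are the basic steps for scraping JD.com hotel information for Suzhou in a Jupyter Notebook. Note that the class names used in the selectors are examples, so the code has to be adjusted to match the actual structure of the page.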
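One way to check the selectors against the page is to count how many elements each one matches. This is a minimal sketch run on a hypothetical HTML snippet; `count_matches` is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def count_matches(html, selectors):
    """Return how many elements each CSS selector matches in the given HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    return {selector: len(soup.select(selector)) for selector in selectors}


# Hypothetical snippet standing in for the downloaded page.
sample_html = '<div class="hotel-item"><span class="hotel-name">Example Hotel</span></div>'
counts = count_matches(sample_html, ['.hotel-item', '.hotel-name', '.room-price'])
print(counts)
```

In the notebook, the same function could be called with `response.text`. A count of zero suggests that the class name differs on the real page, or that the content is loaded by JavaScript and is therefore missing from the HTML that `requests` receives.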