Scraping nationwide residential-community data from Beike (ke.com) with Python
Date: 2023-10-04 14:04:39
Below is example Python code for scraping nationwide residential-community data from ke.com:
```python
import csv
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Request headers: a browser User-Agent reduces the chance of being blocked
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# City index page on ke.com
url = 'https://www.ke.com/city/'

# Fetch and parse the city list
res = requests.get(url, headers=headers, timeout=10)
soup = BeautifulSoup(res.text, 'html.parser')
city_list = soup.select('.city_list_section ul li')

# Open the CSV once and write a header row, instead of reopening it per community
with open('community_info.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['city', 'community', 'room_type', 'area', 'price'])
    for city in city_list:
        # City name and link (links may be protocol-relative, e.g. //bj.ke.com/)
        city_name = city.a.text
        city_url = urljoin(url, city.a['href'])
        res = requests.get(city_url, headers=headers, timeout=10)
        city_soup = BeautifulSoup(res.text, 'html.parser')
        # Community listings on the city page
        community_list = city_soup.select('.listContent .clear .info')
        for community in community_list:
            title_link = community.select_one('.title a')
            if title_link is None:
                continue
            community_name = title_link.text.strip()
            community_url = title_link['href']
            # Fetch the community detail page for unit-type information
            res = requests.get(community_url, headers=headers, timeout=10)
            detail_soup = BeautifulSoup(res.text, 'html.parser')
            room_list = detail_soup.select('.room-list .content li')
            for room in room_list:
                room_type = room.select_one('.room').text.strip()
                room_area = room.select_one('.area').text.strip()
                room_price = room.select_one('.price').text.strip()
                writer.writerow([city_name, community_name, room_type, room_area, room_price])
            # Be polite: throttle requests to avoid triggering the anti-bot system
            time.sleep(1)

print('Done!')
```
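The repeated `requests.get` calls are the most failure-prone part of the script. One way to harden them is to wrap each call in a small retry helper with exponential backoff; the sketch below is a generic, standard-library-only helper (the `fetch` callable and its parameters are illustrative, not part of any ke.com API):

```python
import time


def fetch_with_retries(fetch, url, max_retries=3, base_delay=1.0):
    """Call fetch(url) up to max_retries times.

    Sleeps with exponential backoff (base_delay, 2*base_delay, ...) between
    failed attempts, and re-raises the last exception if all attempts fail.
    """
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

In the scraper above, `res = requests.get(city_url, ...)` would then become something like `res = fetch_with_retries(lambda u: requests.get(u, headers=headers, timeout=10), city_url)`.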
This script saves community information for every listed city (city, community name, unit type, floor area, and price) to a CSV file. Note that ke.com's anti-scraping measures are fairly strict, so crawling through proxy IPs is recommended; the CSS selectors above also depend on the site's current page layout and may need updating.
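The proxy-IP advice can be sketched as a simple round-robin pool. The addresses below are placeholders you would replace with real proxy endpoints; the returned dict is in the shape `requests` expects for its `proxies=` argument:

```python
import itertools

# Hypothetical proxy pool -- replace with real proxy addresses
proxy_pool = itertools.cycle([
    'http://10.0.0.1:8080',
    'http://10.0.0.2:8080',
])


def next_proxies():
    """Return a requests-style proxies dict for the next proxy in the pool."""
    proxy = next(proxy_pool)
    return {'http': proxy, 'https': proxy}
```

Each request would then be made as `requests.get(url, headers=headers, proxies=next_proxies(), timeout=10)`, spreading traffic across the pool.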