首页python爬取济南链家网户型信息代码

python爬取济南链家网户型信息代码

时间: 2023-07-22 21:27:01 浏览: 162

以下是使用 Python 爬取济南链家网户型信息的示例代码： ```python import requests from bs4 import BeautifulSoup url = 'https://jn.lianjia.com/ershoufang/pg1/' headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'} response = requests.get(url, headers=headers) soup = BeautifulSoup(response.text, 'html.parser') house_items = soup.find_all('div', class_='info clear') for item in house_items: title = item.find('div', class_='title').a.text.strip() price = item.find('div', class_='totalPrice').span.text.strip() + '万' area = item.find('div', class_='houseInfo').text.strip().split('|')[1].replace('平米', '').strip() room = item.find('div', class_='houseInfo').text.strip().split('|')[0] floor = item.find('div', class_='positionInfo').text.strip().split(' ')[0] year = item.find('div', class_='positionInfo').text.strip().split(' ')[-1].replace('年建', '').strip() print(title, price, area, room, floor, year) ``` 使用 `requests` 库和 `BeautifulSoup` 库分别发送请求和解析 HTML 页面。代码中的 `url` 是济南链家网二手房列表的第一页，可以根据需要修改为其他页面的链接。`headers` 是模拟浏览器发送请求的请求头，需要设置 User-Agent，否则链家网会返回 403 状态码。解析 HTML 页面后，使用 `find_all` 方法找到所有的房源信息，包括标题、总价、面积、房型、楼层和建造年份。最后将每个房源的信息打印出来。

阅读全文