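Python second-hand housing tutorial: scraping Lianjia second-hand listings with a Python crawler
Date: 2023-07-12 21:46:38
Below is a simple tutorial on using a Python crawler to scrape second-hand housing listings from Lianjia: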
1. Install the required libraries: requests, BeautifulSoup4 and pandas
```
pip install requests
pip install BeautifulSoup4
pip install pandas
```
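To confirm the installation worked, you can run a quick check like the one below. It is a sketch that uses only the standard library. Note that the pip package `beautifulsoup4` is imported as `bs4`. The names `REQUIRED` and `check_dependencies` are hypothetical.

```python
import importlib.util

# pip package name -> module name used in `import` (they differ for bs4)
REQUIRED = {"requests": "requests", "beautifulsoup4": "bs4", "pandas": "pandas"}


def check_dependencies():
    """Return {pip package name: True/False} depending on whether it is importable."""
    return {pkg: importlib.util.find_spec(mod) is not None
            for pkg, mod in REQUIRED.items()}


if __name__ == "__main__":
    for pkg, ok in check_dependencies().items():
        print(f"{pkg}: {'installed' if ok else 'missing, run: pip install ' + pkg}")
```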
2. Import the libraries
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```
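3. Define a function that downloads a page and parses it
```python
def get_page(url):
    # Browser-like User-Agent; some sites reject the default one sent by requests
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
    response = requests.get(url, headers=headers, timeout=10)  # avoid hanging forever
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    response.encoding = 'utf-8'
    # Parse the HTML with Python's built-in parser
    return BeautifulSoup(response.text, 'html.parser')
```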
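When fetching many pages, reusing one connection and retrying transient failures makes the crawler sturdier. The sketch below assumes `requests` together with urllib3's `Retry` class. The names `make_session` and `get_page_with_session` are hypothetical, and the retry count and status codes are illustrative choices, not values documented by Lianjia.

```python
import requests
from bs4 import BeautifulSoup
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}


def make_session(retries=3, backoff=1.0):
    """Build a Session that keeps connections alive and retries transient errors."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                      # exponentially growing wait between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # also retry on these status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    session.headers.update(HEADERS)  # sent with every request from this session
    return session


def get_page_with_session(session, url, timeout=10):
    """Hypothetical variant of get_page that reuses a shared session."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    response.encoding = 'utf-8'
    return BeautifulSoup(response.text, 'html.parser')
```

Create the session once (`session = make_session()`) and pass it to every call, so all pages share the same connection pool and headers.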
4. Define a function that extracts the listing information
```python
def get_house_info(soup):
    house_list = []
    # The selectors assume each listing is an .info block inside .sellListContent
    house_nodes = soup.select('.sellListContent .info')
    for house_node in house_nodes:
        house = {}
        # select() returns a list; [0] takes the first match (IndexError if none)
        house['title'] = house_node.select('.title a')[0].get_text().strip()
        house['address'] = house_node.select('.flood .positionInfo a')[0].get_text().strip()
        house['info'] = house_node.select('.address .houseInfo')[0].get_text().strip()
        house['area'] = house_node.select('.address .area')[0].get_text().strip()
        house['total_price'] = house_node.select('.priceInfo .totalPrice span')[0].get_text().strip()
        house['unit_price'] = house_node.select('.priceInfo .unitPrice span')[0].get_text().strip()
        house_list.append(house)
    return house_list
```
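Because `get_house_info` indexes each selector's result with `[0]`, one missing element (for example, a listing without an `.area` node) raises `IndexError` and aborts the whole crawl. A more forgiving variant is sketched below, assuming an empty string is an acceptable placeholder for missing fields. It can be tested offline against a hand-written HTML fragment; the fragment only mimics the structure the selectors expect and is not copied from Lianjia. `select_text` and `get_house_info_safe` are hypothetical names.

```python
from bs4 import BeautifulSoup


def select_text(node, selector, default=''):
    """Return the stripped text of the first match, or `default` if nothing matches."""
    found = node.select_one(selector)
    return found.get_text(strip=True) if found else default


def get_house_info_safe(soup):
    house_list = []
    for node in soup.select('.sellListContent .info'):
        house_list.append({
            'title': select_text(node, '.title a'),
            'address': select_text(node, '.flood .positionInfo a'),
            'info': select_text(node, '.address .houseInfo'),
            'area': select_text(node, '.address .area'),
            'total_price': select_text(node, '.priceInfo .totalPrice span'),
            'unit_price': select_text(node, '.priceInfo .unitPrice span'),
        })
    return house_list


# Hypothetical fragment for offline testing; it deliberately has no .area element.
SAMPLE = """
<ul class="sellListContent">
  <li><div class="info">
    <div class="title"><a href="#">Sample listing</a></div>
    <div class="flood"><div class="positionInfo"><a>Sample Community</a></div></div>
    <div class="address"><div class="houseInfo">2 rooms | 80 sqm</div></div>
    <div class="priceInfo">
      <div class="totalPrice"><span>500</span></div>
      <div class="unitPrice"><span>62,500 yuan/sqm</span></div>
    </div>
  </div></li>
</ul>
"""

if __name__ == '__main__':
    print(get_house_info_safe(BeautifulSoup(SAMPLE, 'html.parser')))
```

On this fragment the listing is still parsed, and its `area` field simply comes back as an empty string.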
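5. Define the main function
```python
import time


def main(max_pages=100, delay=1.0):
    url = 'https://bj.lianjia.com/ershoufang/'
    soup = get_page(url)
    # Read the page count from the pager; fall back to max_pages if the element
    # is missing (the pager may be rendered by JavaScript)
    total_node = soup.select_one('.page-box .totalPage')
    total_pages = int(total_node.get_text(strip=True)) if total_node else max_pages
    total_pages = min(total_pages, max_pages)
    house_list = []
    for page in range(1, total_pages + 1):
        page_url = url + 'pg{}/'.format(page)
        house_list += get_house_info(get_page(page_url))
        time.sleep(delay)  # pause between requests to keep the load on the site low
    df = pd.DataFrame(house_list)
    # utf-8-sig writes a BOM so that Excel displays the Chinese text correctly
    df.to_csv('house.csv', index=False, encoding='utf-8-sig')
```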
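The CSV stores prices as text. For analysis, they can be converted to numbers with pandas. The sketch below assumes `total_price` holds a bare number and `unit_price` holds a number with thousands separators followed by a unit; both formats are assumptions, so check them against the data you actually scraped. `clean_prices` is a hypothetical helper.

```python
import pandas as pd


def clean_prices(df):
    """Return a copy of df with numeric price columns added.

    Values that cannot be parsed become NaN instead of raising an error.
    """
    df = df.copy()
    df['total_price_num'] = pd.to_numeric(df['total_price'], errors='coerce')
    unit_digits = (
        df['unit_price']
        .str.replace(',', '', regex=False)                 # drop thousands separators
        .str.extract(r'(\d+(?:\.\d+)?)', expand=False)     # first number in the text
    )
    df['unit_price_num'] = pd.to_numeric(unit_digits, errors='coerce')
    return df


if __name__ == '__main__':
    df = pd.read_csv('house.csv', dtype=str)  # read everything as text first
    print(clean_prices(df)[['total_price_num', 'unit_price_num']].describe())
```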
6. Run the main function
```python
if __name__ == '__main__':
    main()
```
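This crawler collects Lianjia's Beijing second-hand listings page by page (up to 100 pages) and saves the results to a CSV file. You can adapt the code to your own needs. Keep in mind that many site operators do not welcome crawlers: follow the site's rules (such as its robots.txt and terms of use), keep your request rate low, and respect the rights of others.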
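One concrete way to follow a site's rules is to check its `robots.txt` before crawling. Python's standard library provides `urllib.robotparser` for this. The rules in the sketch below are a hypothetical example, not Lianjia's actual file, and `allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, user_agent='*'):
    """Check whether `url` may be fetched under robots.txt rules given as lines."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    rp.modified()  # mark rules as loaded; can_fetch() returns False until this is set
    return rp.can_fetch(user_agent, url)


# Hypothetical rules for illustration only. For a live site, use
# rp.set_url('https://bj.lianjia.com/robots.txt') followed by rp.read().
SAMPLE_ROBOTS = [
    'User-agent: *',
    'Disallow: /private/',
]

if __name__ == '__main__':
    print(allowed(SAMPLE_ROBOTS, 'https://bj.lianjia.com/ershoufang/'))
```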