How to scrape Lianjia with Python and save the results as a table
To scrape the Lianjia site and turn the results into a table, follow these steps:
1. Import the required libraries: requests, BeautifulSoup, and pandas.
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```
2. Define the URL to scrape, fetch the page with the requests library, and parse it with BeautifulSoup. A browser-like User-Agent header is sent so the request is less likely to be rejected.
```python
url = 'https://bj.lianjia.com/zufang/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
```
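Lianjia actively blocks automated traffic, so it helps to verify the response before parsing it. A minimal sketch of such a check, continuing from the variables above; the 'verify' substring test is only a heuristic for spotting a CAPTCHA redirect, not documented behavior:
```python
# Stop early on HTTP errors (4xx/5xx) instead of parsing an error page
response.raise_for_status()

# Heuristic: suspected bots are often redirected to a verification page
if 'verify' in response.url:
    raise RuntimeError(f'Redirected to a verification page: {response.url}')
```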
3. Extract the listing cards from the parsed page and store each one's title, price, and description in a list.
```python
house_list = []
house_items = soup.find_all('div', class_='content__list--item--main')
for item in house_items:
    # Class names match Lianjia's rental list markup at the time of writing
    # and may change; note the description lives in a <p>, not a <span>
    title = item.find('p', class_='content__list--item--title').text.strip()
    price = item.find('span', class_='content__list--item-price').text.strip()
    area = item.find('p', class_='content__list--item--des').text.strip()
    house_list.append([title, price, area])
```
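`find()` returns `None` when a tag is missing, so a single incomplete card (for example an interleaved ad) would crash the loop with an `AttributeError`. A more defensive variant of the same loop that simply skips such cards:
```python
for item in house_items:
    title_tag = item.find('p', class_='content__list--item--title')
    price_tag = item.find('span', class_='content__list--item-price')
    des_tag = item.find('p', class_='content__list--item--des')
    # Skip cards missing any field instead of raising AttributeError
    if not (title_tag and price_tag and des_tag):
        continue
    house_list.append([title_tag.text.strip(),
                       price_tag.text.strip(),
                       des_tag.text.strip()])
```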
4. Convert the list of listings into a pandas DataFrame and save it as an Excel file.
```python
df = pd.DataFrame(house_list, columns=['Title', 'Price', 'Area'])
df.to_excel('lianjia.xlsx', index=False)
```
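Note that `DataFrame.to_excel` needs an Excel writer engine (openpyxl for `.xlsx` files), which is installed separately from pandas. If a plain table is enough, CSV avoids that dependency:
```python
# CSV needs no extra engine; utf-8-sig keeps Chinese text readable when
# the file is opened in Excel
df.to_csv('lianjia.csv', index=False, encoding='utf-8-sig')
```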
The complete script:
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://bj.lianjia.com/zufang/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.text, 'html.parser')
house_list = []
house_items = soup.find_all('div', class_='content__list--item--main')
for item in house_items:
    title = item.find('p', class_='content__list--item--title').text.strip()
    price = item.find('span', class_='content__list--item-price').text.strip()
    area = item.find('p', class_='content__list--item--des').text.strip()
    house_list.append([title, price, area])
df = pd.DataFrame(house_list, columns=['Title', 'Price', 'Area'])
df.to_excel('lianjia.xlsx', index=False)
```
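The script above only covers the first page of results. Lianjia's listing URLs appear to paginate with a `/pg{n}/` path segment (an observed URL pattern, not a documented API), so a sketch that walks several pages with a polite delay could look like this:
```python
import time

import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}
house_list = []
for page in range(1, 4):  # first three pages; adjust as needed
    # Assumed pagination pattern: /zufang/pg2/, /zufang/pg3/, ...
    url = f'https://bj.lianjia.com/zufang/pg{page}/'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    for item in soup.find_all('div', class_='content__list--item--main'):
        title = item.find('p', class_='content__list--item--title')
        price = item.find('span', class_='content__list--item-price')
        des = item.find('p', class_='content__list--item--des')
        if title and price and des:
            house_list.append([title.text.strip(), price.text.strip(),
                               des.text.strip()])
    time.sleep(2)  # be polite: pause between page requests
df = pd.DataFrame(house_list, columns=['Title', 'Price', 'Area'])
df.to_excel('lianjia.xlsx', index=False)
```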