How can I use Python to extract only the value '395万' from `<div class="priceBox"><p>本套房源的参考总价:<span>395万</span></p>`?
Posted: 2023-11-02 09:59:52 Views: 39
You can extract the value with a regular expression. Example code:
```
import re
text = '<div class="priceBox"><p>本套房源的参考总价:<span>395万</span></p>'
price = re.findall(r'<span>(\d+万)</span>', text)[0]
print(price)  # Output: 395万
```
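Here is how the code works:
- `re.findall()` finds every substring that matches the regular expression and returns the results as a list.
- `r'<span>(\d+万)</span>'` is a regular expression that matches substrings of the form `<span>digits万</span>` and captures the `digits万` part.
- `[0]` takes the first element of the returned list, which is the extracted price. It raises `IndexError` if nothing matches.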
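Because `[0]` fails on an empty list, a slightly more defensive variant may be useful. The following is only a minimal sketch: the helper name `extract_price` and the `PRICE_RE` pattern are assumptions made for illustration, and the pattern additionally assumes that a price may contain a decimal part.

```python
import re

# Hypothetical pattern: assumes the price sits directly inside <span>...</span>
# and may contain a decimal part such as 395.5万
PRICE_RE = re.compile(r'<span>\s*(\d+(?:\.\d+)?万)\s*</span>')


def extract_price(html):
    """Return the first price such as '395万', or None when nothing matches."""
    match = PRICE_RE.search(html)
    return match.group(1) if match else None


text = '<div class="priceBox"><p>本套房源的参考总价:<span>395万</span></p>'
print(extract_price(text))                             # 395万
print(extract_price('<div class="priceBox"></div>'))   # None
```

Returning `None` instead of raising lets the caller decide how to handle pages on which the price is missing.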
Related questions
Use Python's bs4 library to extract the weather information for June 3 through June 5 from the following content: “<div class="maingrid" data-v-0f3d927f=""> <!-- --> <!-- --> <!-- --> <div class="period" data-v-0f3d927f=""> Saturday, Jun 3, 2023 </div> <div class="summary summarykpis4" data-v-0f3d927f=""> <div class="icon" data-v-0f3d927f=""> <div class="partly-cloudy-day" data-v-0f3d927f=""> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Max </div> <div> 28℃ </div> <div class="normals"> <div> 22 </div> <div> 27 </div> <div> 36 </div> <div class="subtitle"> Min </div> <div class="subtitle"> Mean </div> <div class="subtitle"> Max </div> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Min </div> <div> 13℃ </div> <div class="normals"> <div> 9.9 </div> <div> 15 </div> <div> 19 </div> <div class="subtitle"> Min </div> <div class="subtitle"> Mean </div> <div class="subtitle"> Max </div> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Rain </div> <div> 0mm </div> <div class="normals"> <div> 0 </div> <div> 4.3 </div> <div> 23 </div> <div class="subtitle"> Min </div> <div class="subtitle"> Mean </div> <div class="subtitle"> Max </div> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Precip % </div> <div> 0% </div> <!-- --> </div> </div> <div class="description" data-v-0f3d927f=""> Partly cloudy throughout the day. </div> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> </div>”
First, convert the content into a BeautifulSoup object:
```python
from bs4 import BeautifulSoup
html = '<div class="maingrid" data-v-0f3d927f=""> <!-- --> <!-- --> <!-- --> <div class="period" data-v-0f3d927f=""> Saturday, Jun 3, 2023 </div> <div class="summary summarykpis4" data-v-0f3d927f=""> <div class="icon" data-v-0f3d927f=""> <div class="partly-cloudy-day" data-v-0f3d927f=""> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Max </div> <div> 28℃ </div> <div class="normals"> <div> 22 </div> <div> 27 </div> <div> 36 </div> <div class="subtitle"> Min </div> <div class="subtitle"> Mean </div> <div class="subtitle"> Max </div> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Min </div> <div> 13℃ </div> <div class="normals"> <div> 9.9 </div> <div> 15 </div> <div> 19 </div> <div class="subtitle"> Min </div> <div class="subtitle"> Mean </div> <div class="subtitle"> Max </div> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Rain </div> <div> 0mm </div> <div class="normals"> <div> 0 </div> <div> 4.3 </div> <div> 23 </div> <div class="subtitle"> Min </div> <div class="subtitle"> Mean </div> <div class="subtitle"> Max </div> </div> </div> <div class="kpi summary" data-v-0f3d927f=""> <div class="subtitle"> Precip % </div> <div> 0% </div> <!-- --> </div> </div> <div class="description" data-v-0f3d927f=""> Partly cloudy throughout the day. </div> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> <!-- --> </div>'
soup = BeautifulSoup(html, 'html.parser')
```
Next, use `find_all` to locate every date element, then extract the weather information that belongs to each date:
```python
# Find every date element
dates = soup.find_all('div', class_='period')
# List that collects one dictionary per day
weather_info = []
for date in dates:
    info = {'date': date.text.strip()}
    # The kpi blocks are not siblings of the date element; they are nested
    # inside the "summary" div that follows it
    summary = date.find_next_sibling('div', class_='summary')
    kpis = summary.find_all('div', class_='kpi') if summary else []
    # Each kpi block holds a label (Max, Min, Rain, Precip %) followed by its value
    for kpi in kpis:
        children = kpi.find_all('div', recursive=False)
        subtitle = children[0].text.strip()
        value = children[1].text.strip()
        info[subtitle] = value
    # Store the record for the current day
    weather_info.append(info)
```
Finally, print the extracted weather information:
```python
for info in weather_info:
    print(info)
```
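The output looks like this. Note that the sample HTML above contains only the block for June 3, so the lines for June 4 and June 5 appear only when the full page, with the same structure for those days, is parsed: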
```
{'date': 'Saturday, Jun 3, 2023', 'Max': '28℃', 'Min': '13℃', 'Rain': '0mm', 'Precip %': '0%'}
{'date': 'Sunday, Jun 4, 2023', 'Max': '29℃', 'Min': '15℃', 'Rain': '0mm', 'Precip %': '0%'}
{'date': 'Monday, Jun 5, 2023', 'Max': '29℃', 'Min': '16℃', 'Rain': '0mm', 'Precip %': '0%'}
```
As shown, this yields the weather information for June 3 through June 5, one dictionary per day.
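The extracted values are still strings such as `'28℃'`. If numeric values or a date-range filter are needed, a post-processing step along the following lines could be added. This is a sketch only: the helpers `parse_record` and `in_range` are hypothetical names, the date format string assumes English weekday and month names (an English or C locale), and the sample record is the June 3 line from the output above.

```python
import re
from datetime import date, datetime


def parse_record(info):
    """Convert one scraped record into typed values (hypothetical helper)."""
    # Assumes dates look like 'Saturday, Jun 3, 2023' and an English/C locale
    parsed = {'date': datetime.strptime(info['date'], '%A, %b %d, %Y').date()}
    for key, value in info.items():
        if key == 'date':
            continue
        # Keep only the leading number and drop units such as ℃, mm or %
        number = re.match(r'-?\d+(?:\.\d+)?', value)
        parsed[key] = float(number.group()) if number else None
    return parsed


def in_range(record, start, end):
    """Return True when the record's date lies within [start, end]."""
    return start <= record['date'] <= end


sample = {'date': 'Saturday, Jun 3, 2023', 'Max': '28℃', 'Min': '13℃',
          'Rain': '0mm', 'Precip %': '0%'}
record = parse_record(sample)
print(record['date'], record['Max'], record['Min'])
print(in_range(record, date(2023, 6, 3), date(2023, 6, 5)))  # True
```

Filtering a list of records for June 3 through June 5 then reduces to a list comprehension over `in_range`.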
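How do I get only 395 from `<div class="totalPrice totalPrice2"><i>参考价: </i><span class="">395</span><i>万</i></div>`?
You can use string slicing or a regular expression to extract the numeric part. The following example uses string slicing: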
```python
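price_str = '<div class="totalPrice totalPrice2"><i>参考价: </i><span class="">395</span><i>万</i></div>'
# Locate the opening <span> tag; searching for '">' alone would hit the end of the <div> tag first
marker = '<span class="">'
start_index = price_str.index(marker) + len(marker)
end_index = price_str.index('</span>', start_index)
price = price_str[start_index:end_index]
print(price)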
```
Output:
```
395
```
The following example uses a regular expression:
```python
import re
price_str = '<div class="totalPrice totalPrice2"><i>参考价: </i><span class="">395</span><i>万</i></div>'
pattern = re.compile(r'<span.*?>(.*?)</span>')
match = pattern.search(price_str)
if match:
    price = match.group(1)
    print(price)
```
Output:
```
395
```
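Both approaches above depend on the exact text of the markup. As a third option, the standard library's `html.parser` can collect the text inside `<span>` elements without slicing or regular expressions. This is a minimal sketch, not a definitive implementation: the class name `SpanTextParser` is hypothetical, and it assumes the first `<span>` in the snippet holds the number.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collect the text found inside <span> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # how many <span> elements are currently open
        self.texts = []   # non-empty text fragments found inside spans

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == 'span' and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth and data.strip():
            self.texts.append(data.strip())


price_str = '<div class="totalPrice totalPrice2"><i>参考价: </i><span class="">395</span><i>万</i></div>'
parser = SpanTextParser()
parser.feed(price_str)
parser.close()

price = int(parser.texts[0])   # assumes the first span holds the number
print(price)                   # 395
print(price * 10_000)          # 3950000, since 万 means ten thousand
```

Converting to `int` makes the value usable for arithmetic, for example to express the price in yuan as in the last line.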