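Parse web page data. Use a suitable parsing method (the re module, the lxml module, or the BeautifulSoup module) to parse the page source and extract the date, weather conditions, temperature, and wind force and direction. Print the parsed data, and take a screenshot of the first 10 records of the printed output.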
Date: 2024-05-02 07:21:15
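Because the question gives neither a specific page URL nor the location of the data to parse, the code below is a generic example for reference only.
Parsing the page with the BeautifulSoup module: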
```python
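import requests
from bs4 import BeautifulSoup

url = "http://www.weather.com.cn/weather/101280101.shtml"
# A browser-like User-Agent and a timeout make the request more robust.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = "utf-8"
soup = BeautifulSoup(response.text, "html.parser")


def text_of(node):
    """Return the stripped text of a tag, or an empty string if it is missing."""
    return node.get_text(strip=True) if node is not None else ""


# Assumed layout (the same one the lxml example targets): one <li> per day
# inside <ul class="t clearfix">. Adjust the selectors to the actual page.
records = []
for day in soup.select("ul.t.clearfix > li"):
    # Wind direction is read from the title attribute of the <span> icons,
    # wind force from the text of the whole <p class="win"> element.
    wind_dirs = [span.get("title", "") for span in day.select("p.win em span")]
    wind_force = text_of(day.select_one("p.win"))
    records.append({
        "date": text_of(day.select_one("h1")),
        "weather": text_of(day.select_one("p.wea")),
        # Whole-element text keeps both the high and the low temperature;
        # append a unit here only if the page itself omits it.
        "temperature": text_of(day.select_one("p.tem")),
        "wind": ("/".join(wind_dirs) + " " + wind_force).strip(),
    })

# Print the first 10 records (a 7-day page yields fewer).
for record in records[:10]:
    print(record)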
```
Parsing the page with the lxml module:
```python
import requests
from lxml import etree

url = "http://www.weather.com.cn/weather/101280101.shtml"
# A browser-like User-Agent and a timeout make the request more robust.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = "utf-8"
tree = etree.HTML(response.text)

# One <li> per forecast day. The relative XPath expressions below (starting
# with ".") keep every field tied to its own day.
days = tree.xpath('//div[@id="7d"]/ul[@class="t clearfix"]/li')

# Print the first 10 records (a 7-day page yields fewer).
for day in days[:10]:
    date = "".join(day.xpath('./h1/text()')).strip()
    weather = "".join(day.xpath('./p[@class="wea"]/text()')).strip()
    # string() concatenates all text inside the element, so both the high
    # and the low temperature are kept; split/join removes the whitespace.
    temp = "".join(day.xpath('string(./p[@class="tem"])').split())
    # Wind direction comes from the title attributes, wind force from the text.
    wind_dirs = day.xpath('./p[@class="win"]/em/span/@title')
    wind_force = "".join(day.xpath('string(./p[@class="win"])').split())
    wind = ("/".join(wind_dirs) + " " + wind_force).strip()
    print({"date": date, "weather": weather, "temperature": temp, "wind": wind})
```
Parsing the page with the re module:
```python
import re
from html import unescape

import requests

url = "http://www.weather.com.cn/weather/101280101.shtml"
# A browser-like User-Agent and a timeout make the request more robust.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
response.encoding = "utf-8"
page = response.text


def clean(fragment):
    """Remove tags and whitespace from an HTML fragment and decode entities."""
    without_tags = re.sub(r"<[^>]+>", "", fragment)
    return unescape(re.sub(r"\s+", "", without_tags))


# [^>]* tolerates extra attributes on a tag; re.S lets "." match newlines.
# The patterns assume the same layout as the lxml example.
dates = [clean(d) for d in re.findall(r"<h1>(.*?)</h1>", page, re.S)]
weathers = [clean(w) for w in re.findall(r'<p[^>]*class="wea"[^>]*>(.*?)</p>', page, re.S)]
temps = [clean(t) for t in re.findall(r'<p[^>]*class="tem"[^>]*>(.*?)</p>', page, re.S)]

# Wind direction comes from the title attributes, wind force from the text.
wind_blocks = re.findall(r'<p[^>]*class="win"[^>]*>(.*?)</p>', page, re.S)
winds = [
    ("/".join(re.findall(r'title="([^"]*)"', block)) + " " + clean(block)).strip()
    for block in wind_blocks
]

# zip() pairs the lists by position, so it assumes every day has all four
# fields. Print the first 10 records (a 7-day page yields fewer).
for record in list(zip(dates, weathers, temps, winds))[:10]:
    print(record)
```
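Note: the code above is only an example. The selectors, XPath expressions, and regular expressions must be adapted to the structure of the actual page, and they may stop matching when the site is updated.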
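Because a changed page layout tends to fail silently (a selector simply matches nothing), it can help to check the extracted fields before printing them. The following is a minimal sketch, assuming the fields were collected as separate lists; the function name `build_records` and the sample values are hypothetical and not part of any library or of the real page.

```python
def build_records(dates, weathers, temps, winds, limit=10):
    """Combine per-field lists into per-day records, failing loudly on a mismatch."""
    fields = {"dates": dates, "weathers": weathers, "temps": temps, "winds": winds}

    # An empty list usually means the selector or pattern no longer matches.
    empty = [name for name, values in fields.items() if not values]
    if empty:
        raise ValueError("no matches for: " + ", ".join(empty))

    # Unequal lengths mean zip() would pair values from different days.
    lengths = {name: len(values) for name, values in fields.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError("field counts differ: " + str(lengths))

    # Keep only the first `limit` records, as the task asks for the first 10.
    return list(zip(dates, weathers, temps, winds))[:limit]


# Placeholder values for illustration only, not real forecast data.
sample = build_records(
    ["Day 1", "Day 2"],
    ["cloudy", "rain"],
    ["28/22", "26/21"],
    ["SE <3", "E 3-4"],
)
for record in sample:
    print(record)
```

Raising an error is a deliberate choice here: a failed run is easier to notice than a table whose columns have quietly drifted out of alignment.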