Using the Scrapy framework to crawl nationwide (China) weather data for 2022
Time: 2023-09-20 14:08:21
Sure. Below is an example Scrapy spider for crawling nationwide weather data for 2022:
```python
import scrapy


class WeatherSpider(scrapy.Spider):
    name = 'weather'

    def start_requests(self):
        # Entry page of the historical-weather section.
        # The XPaths below assume the site's current HTML layout and may need adjusting.
        urls = ['http://www.tianqihoubao.com/lishi/']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse_province)

    def parse_province(self, response):
        # One link per province
        for province in response.xpath('//div[@class="citychk"]/dl/dd/a'):
            province_name = province.xpath('./text()').get(default='').strip()
            province_url = province.xpath('./@href').get()
            if province_url:
                # response.follow resolves relative URLs; cb_kwargs passes data to the callback
                yield response.follow(province_url, callback=self.parse_city,
                                      cb_kwargs={'province': province_name})

    def parse_city(self, response, province):
        # One link per city within the province
        for city in response.xpath('//div[@class="citychk"]/dl/dd/a'):
            city_name = city.xpath('./text()').get(default='').strip()
            city_url = city.xpath('./@href').get()
            if city_url:
                yield response.follow(city_url, callback=self.parse_weather,
                                      cb_kwargs={'province': province, 'city': city_name})

    def parse_weather(self, response, province, city):
        # Skip the header row of the weather table
        for row in response.xpath('//table[@class="b"]/tr[position()>1]'):
            date = row.xpath('./td[1]/a/text()').get(default='').strip()
            # Keep only 2022 records; assumes dates begin with the year, e.g. "2022年01月01日"
            if not date.startswith('2022'):
                continue
            yield {
                'province': province,
                'city': city,
                'date': date,
                # normalize-space() strips the newlines and padding inside the cells
                'weather': row.xpath('normalize-space(./td[2])').get(),
                'temperature': row.xpath('normalize-space(./td[3])').get(),
                'wind': row.xpath('normalize-space(./td[4])').get(),
            }
```
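This spider starts at http://www.tianqihoubao.com/lishi/, follows each province link, then each city link within that province, and finally scrapes the weather table for each city. Every record is yielded as a Python dict. You can add your own storage logic (for example, an item pipeline) to save the data to a database or a file; for a quick file dump, Scrapy's built-in feed exports also work without extra code, e.g. `scrapy crawl weather -o weather.json`.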
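As one way to add that storage logic, here is a minimal sketch of a Scrapy item pipeline that writes each yielded dict into SQLite using only the standard-library `sqlite3` module. The class name `SQLiteWeatherPipeline`, the default file `weather.db` and the table layout are hypothetical choices for illustration; the pipeline only relies on the `open_spider` / `process_item` / `close_spider` hooks that Scrapy calls on item pipelines.

```python
import sqlite3


class SQLiteWeatherPipeline:
    """Hypothetical item pipeline that stores the spider's weather dicts in SQLite."""

    def __init__(self, db_path='weather.db'):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the crawl starts: open the database and create the table
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            'CREATE TABLE IF NOT EXISTS weather ('
            'province TEXT, city TEXT, date TEXT, '
            'weather TEXT, temperature TEXT, wind TEXT, '
            'PRIMARY KEY (province, city, date))'
        )

    def process_item(self, item, spider):
        # INSERT OR REPLACE keeps a single row per (province, city, date)
        # even if the same page is crawled more than once
        self.conn.execute(
            'INSERT OR REPLACE INTO weather VALUES (?, ?, ?, ?, ?, ?)',
            (item.get('province'), item.get('city'), item.get('date'),
             item.get('weather'), item.get('temperature'), item.get('wind')),
        )
        # Return the item so later pipelines (or feed exports) still receive it
        return item

    def close_spider(self, spider):
        # Called once when the crawl ends: commit all rows and close the connection
        self.conn.commit()
        self.conn.close()
```

To enable it, register the class in the project's `settings.py`, e.g. `ITEM_PIPELINES = {'myproject.pipelines.SQLiteWeatherPipeline': 300}`, where `myproject.pipelines` is a placeholder for wherever you put the class. Committing only in `close_spider` keeps the sketch short; for long crawls you may prefer to commit periodically so an interrupted run does not lose everything.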