How to scrape the top 20 of Maoyan Movies' Most Anticipated chart with Scrapy and save it to a CSV file
Posted: 2024-06-08 12:11:17
First, install Scrapy with the following command:
```
pip install scrapy
```
Then create a new Scrapy project and, from inside the project directory, generate a spider:
```
scrapy startproject maoyan
cd maoyan
scrapy genspider maoyan_spider maoyan.com
```
Next, open the generated `maoyan_spider.py` file (under `maoyan/spiders/`) and set the `start_urls` list to the URL of Maoyan's Most Anticipated chart:
```python
start_urls = ['https://maoyan.com/films?showType=3']
```
Then write the scraping logic in the `parse` method:
```python
def parse(self, response):
    # Select every movie card, then keep only the first 20.
    movies = response.xpath('//div[@class="movie-item film-channel"]')[:20]
    for movie in movies:
        # default='' avoids calling .strip() on None when a node is missing.
        title = movie.xpath('./a/@title').get(default='').strip()
        release_time = movie.xpath('./div[2]/text()').get(default='').strip()
        yield {'title': title, 'release_time': release_time}
```
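The code above uses XPath to extract each movie's title and release time, and uses `yield` to return each result as a dict. The `[:20]` slice keeps only the first 20 movies. The class names and child positions in these selectors reflect the page markup this answer assumes, so check them against the live page if the spider returns nothing.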
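To see how the container query, the relative paths, and the slice work together without hitting the network, here is a minimal sketch. It swaps Scrapy's selectors for the standard library's `xml.etree.ElementTree`, which supports only a small XPath subset, and it runs against made-up markup; `SAMPLE`, `extract_movies`, and the film names are illustrative assumptions, not anything taken from the real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup that mimics the structure the XPath expects.
SAMPLE = """
<div id="list">
  <div class="movie-item film-channel">
    <a href="/films/1" title=" Example Film A "></a>
    <div class="tag">Trailer</div>
    <div class="date"> Releases 2024-07-01 </div>
  </div>
  <div class="movie-item film-channel">
    <a href="/films/2" title="Example Film B"></a>
    <div class="tag">Trailer</div>
  </div>
  <div class="movie-item film-channel">
    <a href="/films/3" title="Example Film C"></a>
    <div class="tag">Trailer</div>
    <div class="date">Releases 2024-08-15</div>
  </div>
</div>
"""


def extract_movies(markup, limit):
    """Return up to `limit` dicts with 'title' and 'release_time' keys."""
    root = ET.fromstring(markup.strip())
    # Find every movie container, then slice to keep only the first `limit`.
    nodes = root.findall(".//div[@class='movie-item film-channel']")[:limit]
    results = []
    for node in nodes:
        link = node.find("./a")       # path relative to the current container
        date = node.find("./div[2]")  # second <div> child (1-based index)
        # Fall back to an empty string when a node or its text is missing.
        title = link.get("title", "") if link is not None else ""
        release_time = (date.text or "") if date is not None else ""
        results.append({"title": title.strip(),
                        "release_time": release_time.strip()})
    return results


movies = extract_movies(SAMPLE, limit=2)
for movie in movies:
    print(movie)
```

The second sample entry has no second `<div>`, so its `release_time` comes back as an empty string instead of raising an error, which is the same failure mode to guard against in the real spider.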
Finally, run the spider with the following command to save the results to `movies.csv`:
```
scrapy crawl maoyan_spider -o movies.csv
```
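As an alternative to passing `-o` on every run, the export can be configured through Scrapy's `FEEDS` setting. The fragment below is a sketch of what could go into the project's `settings.py`; the `utf-8-sig` encoding is an assumption that the file will be opened in a spreadsheet program, and the `overwrite` option is only available in reasonably recent Scrapy releases.

```python
# Hypothetical addition to maoyan/settings.py; adjust to your own project.
FEEDS = {
    "movies.csv": {
        "format": "csv",
        # Assumption: a byte-order mark helps spreadsheet tools detect UTF-8.
        "encoding": "utf-8-sig",
        # Fix the column order of the exported file.
        "fields": ["title", "release_time"],
        # Replace the file on each run instead of appending to it.
        "overwrite": True,
    },
}
```

With this in place, running `scrapy crawl maoyan_spider` with no output flag should write the same file. Note that `-o` appends to an existing file, so repeated runs without overwriting will accumulate duplicate rows.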
The complete spider code is as follows:
```python
import scrapy


class MaoyanSpiderSpider(scrapy.Spider):
    name = 'maoyan_spider'
    allowed_domains = ['maoyan.com']
    start_urls = ['https://maoyan.com/films?showType=3']

    def parse(self, response):
        # Select every movie card, then keep only the first 20.
        movies = response.xpath('//div[@class="movie-item film-channel"]')[:20]
        for movie in movies:
            # default='' avoids calling .strip() on None when a node is missing.
            title = movie.xpath('./a/@title').get(default='').strip()
            release_time = movie.xpath('./div[2]/text()').get(default='').strip()
            yield {'title': title, 'release_time': release_time}
```
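After a run, it can be worth checking that the exported file actually contains what you expect. The following is a small sketch using only the standard library's `csv` module; `load_movies` and `check_export` are hypothetical helper names, and the column names assume the `title` and `release_time` keys yielded by the spider.

```python
import csv


def load_movies(path):
    """Read the exported CSV and return its rows as a list of dicts."""
    # utf-8-sig reads files both with and without a byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))


def check_export(rows, limit=20):
    """Return a list of problems found; an empty list means no issues."""
    problems = []
    if len(rows) > limit:
        problems.append(f"expected at most {limit} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        # A missing or blank title usually means a selector did not match.
        if not (row.get("title") or "").strip():
            problems.append(f"row {i} has an empty title")
    return problems
```

A typical call would be `check_export(load_movies("movies.csv"))`. More than 20 rows usually indicates that `-o` appended to a file left over from an earlier run.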
Hope this helps.