How to use Scrapy to scrape the top 20 of Maoyan's Most Anticipated movie chart and save it to a CSV file


Date: 2024-06-08 12:11:17 Views: 174

First, install Scrapy with the following command:

```
pip install scrapy
```

Then create a new Scrapy project and a spider:

```
scrapy startproject maoyan
cd maoyan
scrapy genspider maoyan_spider maoyan.com
```

Next, open `maoyan_spider.py` (generated under `maoyan/spiders/`) and put the link to Maoyan's most-anticipated chart in the `start_urls` list:

```python
start_urls = ['https://maoyan.com/films?showType=3']
```

Then write the scraping logic in the `parse` method:

```python
def parse(self, response):
    # Keep only the first 20 movie entries on the page.
    movies = response.xpath('//div[@class="movie-item film-channel"]')[:20]
    for movie in movies:
        # get(default='') avoids calling strip() on None when a node is missing.
        title = movie.xpath('./a/@title').get(default='').strip()
        release_time = movie.xpath('./div[2]/text()').get(default='').strip()
        yield {'title': title, 'release_time': release_time}
```

The code above uses XPath to extract each movie's title and release time, and uses `yield` to return each result as an item. Note that the slice `[:20]` keeps only the first 20 movies. Finally, run the spider and save the results to `movies.csv` with:

```
scrapy crawl maoyan_spider -o movies.csv
```

The complete spider code is:

```python
import scrapy


class MaoyanSpiderSpider(scrapy.Spider):
    name = 'maoyan_spider'
    allowed_domains = ['maoyan.com']
    start_urls = ['https://maoyan.com/films?showType=3']

    def parse(self, response):
        # Keep only the first 20 movie entries on the page.
        movies = response.xpath('//div[@class="movie-item film-channel"]')[:20]
        for movie in movies:
            # get(default='') avoids calling strip() on None when a node is missing.
            title = movie.xpath('./a/@title').get(default='').strip()
            release_time = movie.xpath('./div[2]/text()').get(default='').strip()
            yield {'title': title, 'release_time': release_time}
```

Hope this helps.
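To make the shape of the resulting `movies.csv` concrete, here is a minimal standard-library sketch of roughly what the CSV export step amounts to for these items. It is not Scrapy's own exporter: the helper names `clean` and `write_movies_csv` and the sample rows are hypothetical and exist only for illustration. It also shows the same two safeguards as the spider: missing values become empty strings, and at most 20 rows are written.

```python
import csv
import io
from itertools import islice


def clean(value):
    """Return a stripped string, or '' when the value is missing (None)."""
    return (value or "").strip()


def write_movies_csv(items, fp, limit=20):
    """Write at most `limit` items to `fp` as CSV with a header row.

    Hypothetical helper: Scrapy's `-o movies.csv` option handles the
    export itself, so this only illustrates the resulting layout.
    """
    writer = csv.DictWriter(fp, fieldnames=["title", "release_time"])
    writer.writeheader()
    count = 0
    for item in islice(items, limit):
        writer.writerow({
            "title": clean(item.get("title")),
            "release_time": clean(item.get("release_time")),
        })
        count += 1
    return count


# Made-up sample rows standing in for the items yielded by parse().
sample_items = [
    {"title": " Movie A ", "release_time": "2024-07-01"},
    {"title": "Movie B", "release_time": None},
]

buffer = io.StringIO()
written = write_movies_csv(sample_items, buffer)
print(written)
print(buffer.getvalue())
```

When writing to a real file instead of `io.StringIO`, open it with `newline=""` as the `csv` module documentation recommends, so that row terminators are not doubled on Windows.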
