Scraping the Maoyan Movie TOP100 Board with Scrapy
Date: 2023-12-09 07:05:35
Scrape the top 100 movies on the Maoyan movie board
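The following example uses the Scrapy framework to scrape the Maoyan movie TOP100 board:
1. Create the Scrapy project
Run the following command to create a Scrapy project named maoyan: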
```
scrapy startproject maoyan
```
2. Create the spider
Change into the maoyan project directory and run the following command to create a spider named movies:
```
scrapy genspider movies maoyan.com
```
3. Write the spider code
Open maoyan/spiders/movies.py and replace its contents with the following code:
```python
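import scrapy

from maoyan.items import MaoyanItem


class MoviesSpider(scrapy.Spider):
    name = 'movies'
    allowed_domains = ['maoyan.com']
    # The board lists 10 movies per page; offsets 0, 10, ..., 90 cover the top 100.
    start_urls = [f'https://maoyan.com/board/4?offset={n}' for n in range(0, 100, 10)]

    def parse(self, response):
        # Assumes each movie is one <dd> under <dl class="board-wrapper">, with the
        # rank in an <i class="board-index ..."> next to the info block, not inside it.
        for movie in response.xpath('//dl[contains(@class, "board-wrapper")]/dd'):
            info = movie.xpath('.//div[@class="movie-item-info"]')
            item = MaoyanItem()
            # get(default='') avoids calling strip() on None when a node is missing.
            item['rank'] = movie.xpath('./i[contains(@class, "board-index")]/text()').get(default='').strip()
            item['title'] = info.xpath('./p[@class="name"]/a/@title').get(default='').strip()
            item['star'] = info.xpath('./p[@class="star"]/text()').get(default='').strip()
            item['release_time'] = info.xpath('./p[@class="releasetime"]/text()').get(default='').strip()
            yield item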
```
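The star and release_time values are taken straight from the page text, so they may still carry a label in front of the value. Below is a minimal sketch of removing such a label, assuming the text looks like "主演：张国荣,张丰毅,巩俐" or "上映时间：1993-01-01". Both sample strings and the helper name strip_label are illustrative assumptions, not part of the spider above.

```python
def strip_label(text):
    """Drop a leading 'label：' or 'label:' prefix and surrounding whitespace.

    Hypothetical helper. It assumes the label ends at the first full-width
    colon, or at the first ASCII colon if no full-width colon is present.
    Text without any colon is returned unchanged (apart from trimming).
    """
    text = text.strip()
    for sep in ('：', ':'):
        _, found, value = text.partition(sep)
        if found:
            return value.strip()
    return text
```

Inside parse, the helper could wrap the extracted string before it is stored, for example strip_label(info.xpath('./p[@class="star"]/text()').get(default='')). Note that an unlabeled value containing an ASCII colon would be cut at that colon, so this is only suitable for fields known to carry a label.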
4. Define the Item
Open maoyan/items.py, which scrapy startproject already generated, and replace its contents with the following code:
```python
import scrapy


class MaoyanItem(scrapy.Item):
    rank = scrapy.Field()
    title = scrapy.Field()
    star = scrapy.Field()
    release_time = scrapy.Field()
```
5. Run the spider
From the maoyan project directory, run the spider with the following command:
```
scrapy crawl movies -o movies.csv
```
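If the crawl finishes without yielding any items, one possible cause is that the site rejects Scrapy's default request headers. The sketch below shows settings that could be added to maoyan/settings.py. The user-agent string and the delay value are illustrative assumptions, and whether Maoyan actually requires them is not confirmed by the steps here.

```python
# maoyan/settings.py (additions; all values are illustrative assumptions)

# Identify as an ordinary browser instead of Scrapy's default user agent.
USER_AGENT = (
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36'
)

# Pause between requests to the same site to keep the load low.
DOWNLOAD_DELAY = 1

# Export feeds as UTF-8 with a BOM so spreadsheet tools detect the encoding
# of the Chinese titles in movies.csv.
FEED_EXPORT_ENCODING = 'utf-8-sig'
```

USER_AGENT, DOWNLOAD_DELAY, and FEED_EXPORT_ENCODING are standard Scrapy settings; only the values chosen here are guesses.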
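6. Check the results
The command writes a file named movies.csv to the maoyan project directory, containing the rank, title, leading actors, and release time of each movie on the Maoyan TOP100 board.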
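A quick way to check the export is to load it back with Python's csv module. This is a minimal sketch: the function name load_movies is hypothetical, and it assumes the file has a header row with the four item field names.

```python
import csv


def load_movies(path):
    """Read the exported CSV into a list of dicts, one per movie."""
    # utf-8-sig reads plain UTF-8 and also drops a BOM if one is present.
    with open(path, newline='', encoding='utf-8-sig') as f:
        return list(csv.DictReader(f))
```

Calling load_movies('movies.csv') after the crawl and checking the length of the result shows how many rows were exported; fewer than 100 suggests that some pages were not fetched or not parsed.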