Please help me complete the Scrapy code below. I want it to loop over the pages at the URL built in comments_url and crawl each one. Here is my code:

def parse_detail(self, response, **kwargs):
    movie_item = kwargs['item']
    sel = Selector(response)
    movie_item['directors'] = sel.xpath('//a[@rel="v:directedBy"]/text()').extract()
    movie_item['casts'] = sel.xpath('//a[@rel="v:starring"]/text()').extract()
    movie_item['types'] = sel.css('span[property="v:genre"]::text').extract()
    movie_item['country'] = sel.css('span[property="v:initialReleaseDate"]::text').extract()
    movie_item['movie_time'] = sel.css('span[property="v:runtime"]::text').extract()
    comments = movie_item['url']
    comment = "/comments?start=0&limit=20&status=P&sort=new_score"
    comments_url = comments + comment
Here is code that pages through the URLs built from comments_url and crawls each page of comments:
```
import scrapy
from scrapy.selector import Selector

def parse_detail(self, response, **kwargs):
    movie_item = kwargs['item']
    sel = Selector(response)
    movie_item['directors'] = sel.xpath('//a[@rel="v:directedBy"]/text()').extract()
    movie_item['casts'] = sel.xpath('//a[@rel="v:starring"]/text()').extract()
    movie_item['types'] = sel.css('span[property="v:genre"]::text').extract()
    movie_item['country'] = sel.css('span[property="v:initialReleaseDate"]::text').extract()
    movie_item['movie_time'] = sel.css('span[property="v:runtime"]::text').extract()
    movie_item['comments'] = []
    # Request the first page of comments; pagination continues in parse_comments
    comments_url = movie_item['url'] + "/comments?start=0&limit=20&status=P&sort=new_score"
    yield scrapy.Request(comments_url, callback=self.parse_comments,
                         meta={'item': movie_item, 'start': 0})

def parse_comments(self, response):
    movie_item = response.meta['item']
    start = response.meta['start']
    comments = response.css('div.comment-item')
    if not comments:
        # An empty page means we have reached the end: yield the finished item once
        yield movie_item
        return
    for comment in comments:
        commenter = comment.css('a[href^="https://www.douban.com/people/"]::text').extract_first()
        rating = comment.css('span.comment-info>span.rating::attr(title)').extract_first()
        content = comment.css('span.short::text').extract_first()
        time = comment.css('span.comment-time::text').extract_first()
        movie_item['comments'].append({'commenter': commenter, 'rating': rating,
                                       'content': content, 'time': time})
    # Advance the offset by 20 and request the next page
    next_url = movie_item['url'] + "/comments?start={}&limit=20&status=P&sort=new_score".format(start + 20)
    yield scrapy.Request(next_url, callback=self.parse_comments,
                         meta={'item': movie_item, 'start': start + 20})
```
The comments URL takes a start offset that begins at 0 and increases by 20 per page. Each page is fetched with scrapy.Request, and the movie item is carried along through the meta parameter so that all callbacks accumulate into the same item. In `parse_comments`, CSS selectors extract each comment's commenter, rating, content, and time, and append them to the item's comments field; paging continues until a page returns no comments, at which point the complete movie item is yielded. (Note: the original version looped with `while True` inside `parse_detail`, which never terminates because Scrapy requests are asynchronous and the loop has no stopping condition; checking for an empty page in the callback is what actually ends the crawl.)
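The stop-on-empty-page logic can be sketched independently of Scrapy. Here `fetch_page` is a hypothetical stand-in for the request/response cycle; it returns the list of comments starting at a given offset:

```python
def crawl_comments(fetch_page, page_size=20):
    """Collect comments page by page until an empty page signals the end.

    fetch_page(start) stands in for the Scrapy request: it returns the
    list of comments beginning at offset `start`.
    """
    all_comments = []
    start = 0
    while True:
        page = fetch_page(start)
        if not page:          # empty page -> no more comments, stop
            break
        all_comments.extend(page)
        start += page_size    # same start += 20 stepping as the spider
    return all_comments

# Simulate a movie with 45 comments served in pages of 20
fake_store = ["comment {}".format(i) for i in range(45)]
fetch = lambda start: fake_store[start:start + 20]
print(len(crawl_comments(fetch)))  # 45
```

The key point is that the termination test lives next to the fetch result, which is why, in the spider, it belongs in `parse_comments` rather than in a loop inside `parse_detail`.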