Writing a Zhihu Hot List crawler with Scrapy

Date: 2023-11-23 22:48:47 Views: 168

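The following is a sample Scrapy spider that scrapes the Zhihu Hot List:

```python
import scrapy


class ZhihuHotSpider(scrapy.Spider):
    name = "zhihuhot"
    allowed_domains = ["www.zhihu.com"]
    start_urls = ["https://www.zhihu.com/hot"]

    def parse(self, response):
        # Each hot-list entry is expected to be a .HotItem element
        for item in response.css('.HotItem'):
            # .get() returns the first match, or None if nothing matches
            yield {
                'title': item.css('.HotItem-title a::text').get(),
                'url': item.css('.HotItem-title a::attr(href)').get(),
                'excerpt': item.css('.HotItem-excerpt::text').get(),
                'score': item.css('.HotItem-metrics::text').get(),
                'image': item.css('.HotItem-cover img::attr(src)').get(),
            }
```

The spider first requests the Zhihu Hot List page, then uses CSS selectors to extract the title, URL, excerpt, score, and image of each hot entry, and yields each entry as a dictionary. The selectors depend on the class names in Zhihu's page markup, so check them against the live page before relying on the output.

Save the spider as `zhihuhot.py` and run it with the following command:

```
scrapy runspider zhihuhot.py -o zhihuhot.json
```

This command starts the spider and saves the results to a JSON file named "zhihuhot.json".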
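Because `.get()` returns `None` when a selector matches nothing, and extracted text or links may contain stray whitespace or relative paths, the exported items usually benefit from a small cleanup pass. The following is a minimal sketch that uses only the standard library. The helper names `clean_item` and `load_items` are hypothetical, and the sample item is made up for illustration; it is not real scraped data.

```python
import json
from urllib.parse import urljoin

# Same page as start_urls in the spider; used to resolve relative links
BASE_URL = "https://www.zhihu.com/hot"


def clean_item(item):
    """Return a copy of one scraped item with normalized fields (hypothetical helper)."""
    cleaned = {}
    for key, value in item.items():
        # Strip whitespace from strings; leave None (no selector match) untouched
        cleaned[key] = value.strip() if isinstance(value, str) else value
    # Resolve relative or protocol-relative links against the page URL
    for key in ("url", "image"):
        if cleaned.get(key):
            cleaned[key] = urljoin(BASE_URL, cleaned[key])
    return cleaned


def load_items(path):
    """Load the JSON array written by `-o zhihuhot.json` and clean each item."""
    with open(path, encoding="utf-8") as f:
        return [clean_item(item) for item in json.load(f)]


if __name__ == "__main__":
    # Made-up item for illustration only
    sample = {
        "title": "  Example title \n",
        "url": "//www.zhihu.com/question/1",
        "excerpt": None,
        "score": " 100 ",
        "image": None,
    }
    print(clean_item(sample))
```

After a crawl, `load_items("zhihuhot.json")` would return the cleaned list. Keeping the cleanup outside the spider leaves `parse` short; the same logic could also live in a Scrapy item pipeline if preferred.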
