Python crawler code for scraping classical Chinese poetry

Time: 2023-12-20 20:05:17 Views: 145

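Below is example code that uses the Python crawling framework Scrapy to scrape classical Chinese poetry:

```python
import scrapy


class PoemSpider(scrapy.Spider):
    name = 'poemSpider'
    start_urls = ['https://www.gushiwen.org/gushi/tangshi.aspx']

    def parse(self, response):
        # Each poem sits in a "cont" block inside a "sons" container.
        poem_list = response.xpath('//div[@class="left"]/div[@class="sons"]/div[@class="cont"]')
        for poem in poem_list:
            title = poem.xpath('./a/b/text()').get()
            author = poem.xpath('./a[@href]/text()').get()
            # The body may be split across several text nodes, so join them.
            content = poem.xpath('./div[@class="contson"]/text()').getall()
            content = ''.join(content).strip()
            yield {
                'title': title,
                'author': author,
                'content': content,
            }

        # Follow the "next page" link, if the page has one.
        next_page = response.xpath('//div[@class="pages"]/a[@class="amore"]/@href').get()
        if next_page:
            yield response.follow(next_page, self.parse)
```

This spider uses the Scrapy framework to scrape the title, author, and content of each poem from the Three Hundred Tang Poems page of the classical poetry site, and yields each result as a dictionary. The XPath expressions depend on the page's current markup and may need adjusting if the site changes. To run it, save the code as a .py file in the `spiders` directory of a Scrapy project, then enter the command `scrapy crawl poemSpider` in a terminal from inside that project and press Enter. If the file is standalone rather than part of a project, run it with `scrapy runspider` followed by the file name instead.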
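The spider yields plain dictionaries, and the `content` field can still contain line breaks and full-width indentation from the page. As a minimal sketch of post-processing such items in plain Python, the helpers below normalize whitespace and serialize the items as JSON Lines. The names `clean_item` and `to_json_lines` are hypothetical and not part of Scrapy, the sample item is illustrative, and removing all inner whitespace assumes the text is Chinese, where spaces carry no meaning.

```python
import json


def clean_item(item):
    """Return a copy of a scraped item with whitespace removed.

    Hypothetical helper: missing fields (None) become empty strings.
    Assumes Chinese text, so all whitespace (including newlines and
    full-width spaces) is dropped rather than collapsed to one space.
    """
    cleaned = {}
    for key in ('title', 'author', 'content'):
        value = item.get(key) or ''
        # str.split() with no argument splits on any Unicode whitespace.
        cleaned[key] = ''.join(value.split())
    return cleaned


def to_json_lines(items):
    """Serialize items as JSON Lines, one object per line.

    ensure_ascii=False keeps Chinese characters readable instead of
    escaping them as \\uXXXX sequences.
    """
    return '\n'.join(
        json.dumps(clean_item(item), ensure_ascii=False) for item in items
    )


if __name__ == '__main__':
    # Illustrative sample item shaped like the spider's output.
    sample = {
        'title': ' 静夜思 ',
        'author': None,
        'content': '床前明月光，\n\u3000\u3000疑是地上霜。',
    }
    print(to_json_lines([sample]))
```

Scrapy's own feed exports (for example the `-o` option of `scrapy crawl`) can also write items to a file directly. A helper like this is only useful when the items need cleaning first.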
