Code for a Scrapy spider that crawls multi-level pages
Date: 2023-02-20 10:35:40 Views: 13
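Scrapy is a Python framework for writing web crawlers, and it can follow links across several levels of pages. Below is a simple example spider that starts at a home page, follows links to category pages, and from there requests the individual item pages: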
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    # Level 1: pass every link matching /category/ to parse_category.
    rules = [
        Rule(LinkExtractor(allow=r'/category/'), callback='parse_category'),
    ]

    def parse_category(self, response):
        # Level 2: request each /item/ link found on the category page.
        for link in LinkExtractor(allow=r'/item/').extract_links(response):
            yield scrapy.Request(link.url, callback=self.parse_item)

    def parse_item(self, response):
        # Level 3: the item page itself; this example only logs its URL.
        self.logger.info('Hi, this is an item page! %s', response.url)
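To see which links each level of the spider would pick up, the regex filtering can be imitated with the standard library alone. The sketch below is not Scrapy's implementation. It assumes only that the allow pattern is searched against the absolute URL and that duplicates are dropped, and it ignores allowed_domains, deny patterns and the other LinkExtractor filters. The helper matching_links and the two HTML snippets are hypothetical, made up for this illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def matching_links(html, base_url, allow):
    """Return the absolute, de-duplicated URLs in html that match allow.

    Rough stand-in for LinkExtractor(allow=...).extract_links(response).
    """
    collector = _HrefCollector()
    collector.feed(html)
    pattern = re.compile(allow)
    seen, result = set(), []
    for href in collector.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if pattern.search(url) and url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Hypothetical page contents, invented for this example.
HOME = '<a href="/category/books">Books</a> <a href="/about">About</a>'
CATEGORY = (
    '<a href="/item/1">One</a> <a href="/item/2">Two</a> '
    '<a href="/category/toys">Toys</a>'
)

# Level 1: category links on the start page.
level_1 = matching_links(HOME, "http://www.example.com", r"/category/")
# Level 2: item links on the first category page.
level_2 = matching_links(CATEGORY, level_1[0], r"/item/")
print(level_1)
print(level_2)
```

Checking the patterns offline like this against a saved copy of a page can help confirm that the /category/ and /item/ expressions select the intended links before the spider is run.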