scrapy爬虫完整实例爬虫完整实例
主要介绍了scrapy爬虫完整实例,小编觉得还是挺不错的,具有一定借鉴价值,需要的朋友可以参考下
本文主要通过实例介绍了scrapy框架的使用,分享了两个例子,爬豆瓣文本例程 douban 和图片例程 douban_imgs ,具体如
下。
例程例程1:: douban
目录树目录树
douban
--douban
--spiders
--__init__.py
--bookspider.py
--douban_comment_spider.py
--doumailspider.py
--__init__.py
--items.py
--pipelines.py
--settings.py
--scrapy.cfg
–spiders–init.py
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
bookspider.py
# -*- coding:utf-8 -*-
'''by sudo rm -rf http://imchenkun.com'''
import scrapy
from douban.items import DoubanBookItem
class BookSpider(scrapy.Spider):
name = 'douban-book'
allowed_domains = ['douban.com']
start_urls = [
'https://book.douban.com/top250'
]
def parse(self, response):
# 请求第一页
yield scrapy.Request(response.url, callback=self.parse_next)
# 请求其它页
for page in response.xpath('//div[@class="paginator"]/a'):
link = page.xpath('@href').extract()[0]
yield scrapy.Request(link, callback=self.parse_next)
def parse_next(self, response):
for item in response.xpath('//tr[@class="item"]'):
book = DoubanBookItem()
book['name'] = item.xpath('td[2]/div[1]/a/@title').extract()[0]
book['content'] = item.xpath('td[2]/p/text()').extract()[0]
book['ratings'] = item.xpath('td[2]/div[2]/span[2]/text()').extract()[0]
yield book
douban_comment_spider.py
# -*- coding:utf-8 -*-
import scrapy
from faker import Factory
from douban.items import DoubanMovieCommentItem
import urlparse
f = Factory.create()
class MailSpider(scrapy.Spider):
name = 'douban-comment'
评论0