Python Scrapy web crawler example
Date: 2023-07-14 16:09:50
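Scrapy is a Python web-crawling framework for extracting data from websites. Below is a simple example that scrapes the Douban Movie Top 250 list:
1. Install Scrapy: `pip install scrapy`
2. Create a new project: `scrapy startproject douban_spider`
3. Enter the project directory (`cd douban_spider`) and generate a spider: `scrapy genspider douban movie.douban.com`
4. Edit `douban_spider/spiders/douban.py` to implement the crawling logic
5. Run the spider from the project directory: `scrapy crawl douban`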
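Step 3 passes `movie.douban.com` to `genspider`, which writes it into the spider's `allowed_domains`; Scrapy then drops requests whose host is outside that list. The check can be illustrated with the standard library. This is a simplified approximation, not Scrapy's own implementation, and `is_allowed` is a hypothetical name:

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Simplified allowed_domains check (illustration only): the host must
    equal an allowed domain or be a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


allowed = ["movie.douban.com"]
print(is_allowed("https://movie.douban.com/top250?start=25", allowed))  # True
print(is_allowed("https://book.douban.com/top250", allowed))  # False
```

Matching on a leading dot means a lookalike host such as `evilmovie.douban.com` is not treated as a subdomain.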
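Example code for `douban_spider/spiders/douban.py` (the XPath selectors assume the Top 250 page's markup at the time of writing and may need adjusting if Douban changes its HTML):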
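```python
import scrapy


class DoubanSpider(scrapy.Spider):
    name = "douban"
    allowed_domains = ["movie.douban.com"]
    start_urls = ["https://movie.douban.com/top250"]
    # Douban often rejects Scrapy's default User-Agent; a browser-like one is assumed here
    custom_settings = {"USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

    def parse(self, response):
        # Each movie is one <li> inside <ol class="grid_view">
        for sel in response.xpath('//ol[@class="grid_view"]/li'):
            # The first <p> in <div class="bd"> holds two text nodes split by <br>:
            # "导演: ... 主演: ..." and "year / country / genres"
            credits = sel.xpath('.//div[@class="bd"]/p[1]/text()[1]').get(default="").strip()
            details = sel.xpath('.//div[@class="bd"]/p[1]/text()[2]').get(default="").strip()
            yield {
                "name": sel.xpath('.//span[@class="title"]/text()').get(),
                "year": sel.xpath('.//div[@class="bd"]/p[1]/text()[2]').re_first(r"\d{4}"),
                "score": sel.xpath('.//span[@class="rating_num"]/text()').get(),
                # Keep only the director part of the credits line
                "director": credits.split("主演")[0].strip(),
                # Genres are the last "/"-separated field
                "classification": details.split("/")[-1].strip(),
            }
```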
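The spider above requests only the first page of the list, which shows 25 movies. The Top 250 list is paginated through a `start` query parameter in steps of 25 (for example `https://movie.douban.com/top250?start=25`). Assuming that URL scheme still holds, one way to cover all ten pages is to build `start_urls` up front; `top250_urls` is a hypothetical helper:

```python
BASE_URL = "https://movie.douban.com/top250"


def top250_urls(total=250, page_size=25):
    """Build one URL per list page, assuming Douban paginates with ?start=N."""
    return [f"{BASE_URL}?start={offset}" for offset in range(0, total, page_size)]


urls = top250_urls()
print(len(urls))  # 10
print(urls[-1])   # https://movie.douban.com/top250?start=225
```

Inside the spider this would become `start_urls = top250_urls()`. Listing every page up front lets Scrapy schedule the requests concurrently; following the page's own "next" link with `response.follow` is the common alternative and adapts if the page size changes.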
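When run, the spider outputs the scraped movie information, and Scrapy's feed exports can save it to a CSV or JSON file, e.g. `scrapy crawl douban -o movies.csv`.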
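With Scrapy's feed exports, `scrapy crawl douban -O movies.json` writes the yielded items as a JSON array (`-O` overwrites the file, while `-o` appends, which can leave an invalid JSON file after a second run). A small standard-library sketch of post-processing such an export, assuming items carry the `name` and `score` fields from the spider above with `score` scraped as a string like "9.7"; `top_rated` is a hypothetical helper:

```python
import json


def top_rated(path, n=10):
    """Load a Scrapy JSON feed export and return the n highest-scoring items."""
    with open(path, encoding="utf-8") as f:
        items = json.load(f)
    # Scores are scraped as text, so convert before sorting; skip items without one
    scored = [it for it in items if it.get("score")]
    scored.sort(key=lambda it: float(it["score"]), reverse=True)
    return scored[:n]
```

For example, `for item in top_rated("movies.json"): print(item["score"], item["name"])` lists the ten best-rated entries of the export.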