Using Python's Scrapy to crawl multi-page JD.com data, store it in a database, and visualize it with pyecharts
OK, this is a fairly involved task, so let's break it into a few steps.
1. Install and configure Scrapy
First, install Scrapy. You can do this from a terminal with:
```
pip install scrapy
```
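Scrapy expects the files below to live inside a project, so if you have not created one yet, scaffold it first (the project name `jd` is assumed throughout the code that follows):
```
scrapy startproject jd
cd jd
```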
Once the project exists, you need to configure three of its files: `settings.py`, `items.py`, and `pipelines.py`.
In `settings.py`, set the following options:
```
BOT_NAME = 'jd'
SPIDER_MODULES = ['jd.spiders']
NEWSPIDER_MODULE = 'jd.spiders'
ROBOTSTXT_OBEY = False
ITEM_PIPELINES = {
    'jd.pipelines.JdPipeline': 300,
}
FEED_EXPORT_ENCODING = 'utf-8'
```
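JD applies basic anti-crawling checks, so it can also help to present a browser-like User-Agent and slow the crawl down. Here is a minimal sketch of two extra `settings.py` entries (the exact User-Agent string and delay value are assumptions; tune them to your needs):
```
# Assumed additions: identify as a regular browser and throttle requests
# so the crawl is less likely to be blocked.
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36')
DOWNLOAD_DELAY = 1
```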
`items.py` defines the fields we want to scrape:
```
import scrapy
class JdItem(scrapy.Item):
    title = scrapy.Field()    # product title
    price = scrapy.Field()    # listed price
    comment = scrapy.Field()  # review count text
    shop = scrapy.Field()     # shop name
```
In `pipelines.py` we process the scraped items and write them to the database:
```
import pymysql
class JdPipeline(object):
    def __init__(self):
        # Connect to a local MySQL database named `jd`
        # (adjust host/user/password to your own setup).
        self.connect = pymysql.connect(
            host='localhost',
            port=3306,
            db='jd',
            user='root',
            passwd='123456',
            charset='utf8',
            use_unicode=True)
        self.cursor = self.connect.cursor()

    def process_item(self, item, spider):
        self.cursor.execute(
            """INSERT INTO jd_goods(title, price, comment, shop)
               VALUES (%s, %s, %s, %s)""",
            (item['title'], item['price'], item['comment'], item['shop']))
        self.connect.commit()
        return item

    def close_spider(self, spider):
        # Release the database connection when the spider finishes.
        self.cursor.close()
        self.connect.close()
```
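The pipeline assumes a `jd_goods` table already exists in the `jd` database. The original code does not show the schema, so here is a minimal assumed one, created with a one-off script (the column types and sizes are guesses sized to the scraped strings):
```
import pymysql

# One-off setup: create the jd_goods table the pipeline writes to.
# This schema is an assumption, not taken from the original post.
connect = pymysql.connect(host='localhost', port=3306, db='jd',
                          user='root', passwd='123456', charset='utf8')
with connect.cursor() as cursor:
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS jd_goods (
            id INT AUTO_INCREMENT PRIMARY KEY,
            title VARCHAR(255),
            price VARCHAR(32),
            comment VARCHAR(64),
            shop VARCHAR(255)
        ) DEFAULT CHARSET=utf8
    """)
connect.commit()
connect.close()
```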
2. Write the Scrapy spider
Next, write a Scrapy spider to crawl JD product data. The example below searches for the keyword "手机" (mobile phones) and follows the pagination links across multiple pages. Note that JD's search page loads part of each result page via JavaScript as you scroll, so a plain Scrapy fetch may only capture a subset of the listings.
```
import scrapy
from jd.items import JdItem
class JdSpider(scrapy.Spider):
    name = 'jd'
    allowed_domains = ['jd.com']
    start_urls = ['https://search.jd.com/Search?keyword=手机&enc=utf-8']

    def parse(self, response):
        # Each product sits in an <li> inside the result list.
        goods_list = response.xpath('//ul[@class="gl-warp clearfix"]/li')
        for goods in goods_list:
            item = JdItem()
            item['title'] = goods.xpath('div[@class="gl-i-wrap"]/div[@class="p-name"]/a/em/text()').extract_first()
            item['price'] = goods.xpath('div[@class="gl-i-wrap"]/div[@class="p-price"]/strong/i/text()').extract_first()
            item['comment'] = goods.xpath('div[@class="gl-i-wrap"]/div[@class="p-commit"]/strong/a/text()').extract_first()
            item['shop'] = goods.xpath('div[@class="gl-i-wrap"]/div[@class="p-shop"]/span/a/text()').extract_first()
            yield item

        # Pagination: follow the "next page" link until it disappears.
        next_page = response.xpath('//a[@class="pn-next"]/@href')
        if next_page:
            url = response.urljoin(next_page.extract_first())
            yield scrapy.Request(url, callback=self.parse)
```
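If the XPath expressions come back empty (JD changes its page markup from time to time), `scrapy shell` is a quick way to test selectors interactively before editing the spider:
```
scrapy shell "https://search.jd.com/Search?keyword=手机&enc=utf-8"
>>> response.xpath('//ul[@class="gl-warp clearfix"]/li')
```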
Run the spider from the project root:
```
scrapy crawl jd
```
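Optionally, Scrapy's `-o` flag also exports the scraped items to a flat file, which is handy for spot-checking the data before it reaches MySQL:
```
scrapy crawl jd -o goods.csv
```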
3. Visualize the data
Finally, use pyecharts to visualize the scraped data. Here is a bar chart example:
```
import pymysql
from pyecharts import options as opts
from pyecharts.charts import Bar
# Read the per-shop product counts back out of MySQL.
connect = pymysql.connect(
    host='localhost',
    port=3306,
    db='jd',
    user='root',
    passwd='123456',
    charset='utf8',
    use_unicode=True)
cursor = connect.cursor()
cursor.execute("""SELECT shop, COUNT(*) FROM jd_goods GROUP BY shop""")
data = cursor.fetchall()

# One bar per shop; the series name "商品数量" means "product count" and
# the title "京东手机商品店铺分布" means "JD phone listings by shop".
bar = (
    Bar()
    .add_xaxis([i[0] for i in data])
    .add_yaxis("商品数量", [i[1] for i in data])
    .set_global_opts(title_opts=opts.TitleOpts(title="京东手机商品店铺分布"))
)
bar.render("jd.html")
```
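With many distinct shops the x-axis gets crowded. One assumed refinement (not in the original code) is to chart only the ten best-represented shops:
```
# Assumed refinement: limit the chart to the ten shops with the most listings.
cursor.execute(
    "SELECT shop, COUNT(*) AS cnt FROM jd_goods "
    "GROUP BY shop ORDER BY cnt DESC LIMIT 10"
)
data = cursor.fetchall()
```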
Finally, open `jd.html` in a browser to view the generated bar chart.