Python code example: scraping book information from Dangdang, JD.com, and Amazon

Updated 2023-03-16 (357 KB PDF)


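This article presents example Python code for scraping book information from Dangdang, JD.com, and Amazon. It may be a useful reference for readers working on similar tasks.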

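Notes:

1. This program stores its results in a MySQL database through pymysql. Before running it, edit the database connection settings at the top of the script.

2. It requires the bs4 (BeautifulSoup), requests, and pymysql libraries.

3. It supports multithreading.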
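Note 3 says the program supports multithreading: in the listing below each site gets its own threading.Thread subclass (DangDangThread), and every thread writes through the same module-level cursor. The following is a minimal sketch of that pattern, not the author's code. SiteThread, the fake crawl inside run(), and the in-memory results list are hypothetical stand-ins for the real scraper and database. A threading.Lock serializes writes to the shared store, because PyMySQL allows threads to share the module but not a single connection.

```python
import threading

# Hypothetical stand-in for the database: a shared list guarded by a lock.
results = []
results_lock = threading.Lock()


class SiteThread(threading.Thread):
    """One crawler thread per site, mirroring DangDangThread's structure."""

    def __init__(self, site, keyword):
        super().__init__()
        self.site = site
        self.keyword = keyword

    def run(self):
        # Hypothetical crawl: pretend each site yields three results.
        rows = ['{} result {} for {}'.format(self.site, i, self.keyword) for i in range(1, 4)]
        # Only one thread at a time may touch the shared store.
        with results_lock:
            results.extend((self.site, row) for row in rows)


threads = [SiteThread(site, 'python') for site in ('DangDang', 'JD', 'Amazon')]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait for every site to finish before using the results

print(len(results))  # 9
```

In the real program the lock would wrap each cursor.execute call; alternatively, each thread can open its own pymysql connection.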

from bs4 import BeautifulSoup
import os
import re
import threading
import traceback

import pymysql
import requests

# Database connection settings: edit host/user/passwd/db before running
try:
    conn = pymysql.connect(host='127.0.0.1', port=3306, user='root', passwd='root', db='book', charset='utf8')
    cursor = conn.cursor()
except pymysql.MySQLError:
    print('Error: failed to connect to the database')

# Return the HTML text of the given page ('' on failure)
def getHTMLText(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
        r = requests.get(url, headers=headers, timeout=10)  # timeout so a stalled request cannot hang a thread
        r.raise_for_status()
        r.encoding = r.apparent_encoding  # guess the encoding from the page content
        return r.text
    except requests.RequestException:
        return ''

# Return a BeautifulSoup object for the given URL
def getSoupObject(url):
    html = getHTMLText(url)
    return BeautifulSoup(html, 'html.parser')

# Get the total number of result pages for the keyword on the given book site
def getPageLength(webSiteName, url):
    try:
        soup = getSoupObject(url)
        if webSiteName == 'DangDang':
            a = soup('a', {'name': 'bottom-page-turn'})
            return a[-1].string  # text of the last pagination link
        elif webSiteName == 'Amazon':
            a = soup('span', {'class': 'pagnDisabled'})
            return a[-1].string
    except IndexError:  # no pagination elements found
        print('Error: failed to get the total page count for {}...'.format(webSiteName))
    return -1

class DangDangThread(threading.Thread):
    def __init__(self, keyword):
        super().__init__()
        self.keyword = keyword

    def run(self):
        print('Info: starting to crawl Dangdang...')
        count = 1
        length = getPageLength('DangDang', 'http://search.dangdang.com/?key={}'.format(self.keyword))  # total number of pages
        tableName = 'db_{}_dangdang'.format(self.keyword)
        try:
            print('Info: creating the DangDang table...')
            cursor.execute('create table `{}` (id int, title text, prNow text, prPre text, link text)'.format(tableName))
            print('Info: crawling Dangdang pages...')
            for i in range(1, int(length) + 1):  # page_index starts at 1; +1 includes the last page
                url = 'http://search.dangdang.com/?key={}&page_index={}'.format(self.keyword, i)
                soup = getSoupObject(url)
                lis = soup('li', {'class': re.compile(r'line'), 'id': re.compile(r'p')})
                for li in lis:
                    # '单品标题' is the dd_name attribute value matched in Dangdang's result markup
                    a = li.find_all('a', {'name': 'itemlist-title', 'dd_name': '单品标题'})
                    pn = li.find_all('span', {'class': 'search_now_price'})  # current price
                    pp = li.find_all('span', {'class': 'search_pre_price'})  # original price
                    if a:
                        link = a[0].attrs['href']
                        title = a[0].attrs['title'].strip()
                    else:
                        ...  # the rest of the listing is not included in the source
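Dangdang's search results are addressed by a page_index query parameter that, as the listing implies, starts at 1, and getPageLength returns the number on the last pagination link. The loop therefore has to run from 1 to that number inclusive; a loop written as range(1, int(length)) would stop one page short. Below is a minimal sketch of building the page URLs, assuming the same search.dangdang.com URL pattern as the listing. build_page_urls is a hypothetical helper, and urllib.parse.urlencode is used so that non-ASCII keywords are percent-encoded explicitly.

```python
from urllib.parse import urlencode

SEARCH_URL = 'http://search.dangdang.com/'  # same endpoint as in the listing


def build_page_urls(keyword, page_count):
    """Return one search URL per result page, page_index 1..page_count inclusive."""
    urls = []
    for page in range(1, page_count + 1):  # +1 so the last page is included
        query = urlencode({'key': keyword, 'page_index': page})
        urls.append(SEARCH_URL + '?' + query)
    return urls


urls = build_page_urls('python', 3)
print(len(urls))  # 3
print(urls[-1])   # http://search.dangdang.com/?key=python&page_index=3
```

If getPageLength fails and returns -1, range(1, 0) is empty, so no pages are requested.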

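The listing creates one table per keyword (db_<keyword>_dangdang, with columns id, title, prNow, prPre, link) and, in the part not shown, presumably inserts one row per book. Values are safer passed as query parameters than formatted into the SQL string; only the table name, which cannot be a parameter, has to be built into the statement, so it is validated first. This sketch swaps in Python's built-in sqlite3 module for MySQL so it runs without a server; with pymysql the placeholder is %s instead of ?. make_table_name, save_books, and the sample row are hypothetical.

```python
import re
import sqlite3


def make_table_name(keyword):
    """Build db_<keyword>_dangdang, rejecting keywords that are not plain word characters."""
    if not re.fullmatch(r'\w+', keyword):  # \w also matches Chinese characters in Python 3
        raise ValueError('unsafe keyword for a table name: {!r}'.format(keyword))
    return 'db_{}_dangdang'.format(keyword)


def save_books(conn, keyword, books):
    """Create the per-keyword table and insert (title, prNow, prPre, link) rows."""
    table = make_table_name(keyword)
    # Identifiers cannot be query parameters, hence the validation above.
    create_sql = 'CREATE TABLE IF NOT EXISTS "{}" (id INTEGER, title TEXT, prNow TEXT, prPre TEXT, link TEXT)'
    conn.execute(create_sql.format(table))
    rows = [(i, *book) for i, book in enumerate(books, start=1)]  # id counts from 1
    # Values go in as parameters (sqlite3 uses ?, pymysql uses %s).
    conn.executemany('INSERT INTO "{}" VALUES (?, ?, ?, ?, ?)'.format(table), rows)
    conn.commit()
    return table


conn = sqlite3.connect(':memory:')
# Hypothetical sample row: (title, current price, original price, link)
table = save_books(conn, 'python', [('Example Book', '10.00', '20.00', 'http://example.com/book/1')])
print(conn.execute('SELECT id, title FROM "{}"'.format(table)).fetchall())  # [(1, 'Example Book')]
```

In MySQL the identifier would be quoted with backticks rather than double quotes.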

Uploaded by weixin_38606076


Related resources:


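Scraping specified book data from Dangdang with Python

Python book-information crawler example

Scraping Douban book information with Python and saving it locally

Scraping book information with Python: code example for Dangdang, JD.com, and Amazon

Scraping Dangdang's children's book chart with Python: introduction

Scraping books from Dangdang with Python

Scraping Dangdang's children's book chart with Python: sources

Scraping Dangdang book information with Python and saving it to CSV

Scraping Dangdang book reviews with Python

Scraping and visualizing Dangdang data with Python

Scraping music charts with Python: example code for the NetEase Cloud Music hot songs chart

Scraping the NetEase Cloud Music rising chart with Python; example code for the NetEase Cloud Music hot songs chart...

Scraping multiple pages of JD.com product information with Python

Python code for scraping computer listings on JD.com

Scraping JD.com product information with Python

Scraping Weibo trending topics with Python

Scraping JD.com reviews with Python

Getting the source code of an already-open web page with Python

Scraping JD.com product information with Python and saving it as an .exe file

Scraping JD.com product details with a Python crawler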
