Dangdang web scraper that saves results locally
Date: 2024-02-11 20:02:05
The following is an example scraper that crawls book titles, links, prices, and comment counts from Dangdang and saves them to a local file:
```python
import csv

import requests
from lxml import etree


def crawl_dangdang():
    """Scrape book titles, links, prices, and comment counts from a Dangdang search page."""
    url = "http://search.dangdang.com/?key=python&act=input&page_index=1"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    response = requests.get(url, headers=headers)
    response.encoding = response.apparent_encoding  # Dangdang pages are often GBK-encoded
    html = response.text

    # Parse the page source with lxml
    tree = etree.HTML(html)

    # Extract titles, links, prices, and comment counts via XPath
    book_names = tree.xpath('//ul[@class="bigimg"]/li//a[@class="pic"]/@title')
    book_links = tree.xpath('//ul[@class="bigimg"]/li//a[@class="pic"]/@href')
    book_prices = tree.xpath('//ul[@class="bigimg"]/li//p[@class="price"]/span[@class="search_now_price"]/text()')
    book_comments = tree.xpath('//ul[@class="bigimg"]/li//p[@class="search_star_line"]/a/text()')

    # Write the data to a CSV file; zip() stops at the shortest list,
    # so rows stay aligned even if one XPath matches fewer items
    with open('books.csv', 'w', encoding='utf-8', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['书名', '链接', '价格', '评论数'])
        for name, link, price, comments in zip(book_names, book_links, book_prices, book_comments):
            writer.writerow([name, link, price, comments])


if __name__ == '__main__':
    crawl_dangdang()
```
This scraper saves the collected book titles, links, prices, and comment counts to a CSV file named `books.csv`. Note that Dangdang may change its page markup at any time, so the XPath expressions above may need updating before the scraper works.
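The CSV-writing pattern used above can be exercised on its own, without hitting the live site. The sketch below uses made-up sample rows (the book data is invented for illustration) to write the same four-column layout and read it back with `csv.DictReader`:

```python
import csv

# Made-up rows standing in for scraped results
rows = [
    ["Python编程", "http://example.com/book1", "¥59.00", "1200条评论"],
    ["流畅的Python", "http://example.com/book2", "¥99.00", "800条评论"],
]

# Write the header and rows, matching the scraper's CSV layout
with open('books_demo.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['书名', '链接', '价格', '评论数'])
    writer.writerows(rows)

# Read the file back; DictReader keys each row by the header columns
with open('books_demo.csv', encoding='utf-8', newline='') as f:
    books = list(csv.DictReader(f))
```

Reading the file back this way is a quick sanity check that the encoding and `newline=''` handling round-trip Chinese text correctly.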