Given a book title, scrape JD.com book information with Python and store author and category in an Excel file
Date: 2024-09-21 19:12:46
To scrape book information from JD.com with Python and store it in an Excel file, use the `requests` library to send HTTP requests and fetch the page, `BeautifulSoup` to parse the HTML and extract the fields you need (author and category), and `pandas` to write the data to Excel. A simplified walkthrough:
1. Install the required libraries:
```bash
pip install requests beautifulsoup4 pandas openpyxl
```
2. Fetch the page source with `requests.get()`:
```python
import requests

def get_html(url):
    # Send a browser-like User-Agent; many sites reject the default one
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    # A timeout prevents the request from hanging indefinitely
    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        return response.text
    else:
        print(f"Failed to fetch the page. Status code: {response.status_code}")
        return None
```
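JD.com may throttle or intermittently reject rapid requests. As an optional hardening step (not part of the original sketch), a `requests.Session` can be mounted with urllib3's `Retry` so transient 429/5xx responses are retried automatically with backoff:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff=0.5):
    # Build a session that retries transient failures automatically
    session = requests.Session()
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,                 # 0.5s, 1s, 2s, ...
        status_forcelist=[429, 500, 502, 503, 504],
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session

# session.get(url, headers=headers, timeout=10) can then replace requests.get(...)
```

Passing the session into `get_html` (or calling `session.get` in place of `requests.get`) keeps the rest of the code unchanged.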
3. Parse the HTML and extract the information:
```python
from bs4 import BeautifulSoup

def extract_info(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    # The CSS selectors below are placeholders; adjust them to the
    # actual structure of the page you are scraping
    book_list = soup.select('.book-list')
    books_data = []
    for book in book_list:
        # select_one returns None when the element is missing, so guard each field
        author = book.select_one('.author').text.strip() if book.select_one('.author') else ""
        category = book.select_one('.category').text.strip() if book.select_one('.category') else ""
        books_data.append({'Author': author, 'Category': category})
    return books_data
```
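To check how `extract_info`'s selector logic behaves before pointing it at the live site, you can run the same pattern on a hand-written HTML snippet. The `.book-list`, `.author`, and `.category` class names here are the placeholder assumptions from the sketch above, not JD's real markup:

```python
from bs4 import BeautifulSoup

# Minimal fake page matching the placeholder selectors
sample_html = """
<div class="book-list">
  <span class="author">Mark Lutz</span>
  <span class="category">Programming</span>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
books = []
for book in soup.select('.book-list'):
    author_tag = book.select_one('.author')
    category_tag = book.select_one('.category')
    books.append({
        'Author': author_tag.text.strip() if author_tag else "",
        'Category': category_tag.text.strip() if category_tag else "",
    })
print(books)  # [{'Author': 'Mark Lutz', 'Category': 'Programming'}]
```

Once the real selectors are known (via the browser's developer tools), substitute them into `extract_info` and the rest of the pipeline stays the same.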
4. Write the data to Excel:
```python
import pandas as pd

def write_to_excel(data, filename):
    df_books = pd.DataFrame(data)
    df_books.to_excel(filename, index=False)

# Assuming you already know the URL of the book page
url = "https://jd.com/book/<book_url>"
html_content = get_html(url)
if html_content is not None:
    extracted_data = extract_info(html_content)
    write_to_excel(extracted_data, 'books_info.xlsx')
else:
    print("Failed to fetch the HTML content.")
```
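A quick round trip with `pandas.read_excel` confirms the file comes out in the expected shape. The rows below are made-up sample data standing in for real scraped results; reading `.xlsx` files relies on the `openpyxl` engine installed in step 1:

```python
import pandas as pd

# Made-up sample rows standing in for real scraped results
data = [
    {'Author': 'Mark Lutz', 'Category': 'Programming'},
    {'Author': 'Luciano Ramalho', 'Category': 'Programming'},
]

pd.DataFrame(data).to_excel('books_check.xlsx', index=False)
df_back = pd.read_excel('books_check.xlsx')
print(list(df_back.columns))  # ['Author', 'Category']
```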
**Note**:
- Replace `<book_url>` in the code above with the link of the specific book you want to scrape.
- Adjust the CSS selectors to JD.com's actual HTML structure so that the author and category fields are extracted correctly.
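Since the task starts from a book title rather than a URL, one common approach is to query JD's search page and scrape the result list. The endpoint below (`search.jd.com/Search?keyword=...`) is an assumption that should be verified against the live site; the key point is that the title must be URL-encoded:

```python
from urllib.parse import quote

def build_search_url(title):
    # search.jd.com/Search?keyword=... is an assumed endpoint;
    # verify it against the live site before relying on it
    return f"https://search.jd.com/Search?keyword={quote(title)}"

print(build_search_url("Python编程"))
# https://search.jd.com/Search?keyword=Python%E7%BC%96%E7%A8%8B
```

The resulting URL can then be passed to `get_html`, with `extract_info`'s selectors adjusted to the search-result markup.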