Scrape and download the web page of a book on Dangdang and save it as HTML

Date: 2023-04-27 21:06:36 Views: 386

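Python can be used to scrape the page of a book on Dangdang and save the result as an HTML file. The steps are:

1. Import the required libraries, such as requests and beautifulsoup4.
2. Build request headers that mimic a browser.
3. Send the request and fetch the page content.
4. Parse the page with beautifulsoup4 and extract the information you need.
5. Save the extracted information to a file in HTML format.

Here is a simple Python example:

```python
import requests
from bs4 import BeautifulSoup

# Build request headers that mimic a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}

# Send the request and fetch the page content
url = 'http://product.dangdang.com/23384958.html'
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
# Guess the charset from the body; without a declared charset,
# requests falls back to ISO-8859-1 for text responses
response.encoding = response.apparent_encoding
html = response.text

# Parse the page with beautifulsoup4 and extract the information we need.
# The selectors depend on the page layout: if an element is missing,
# find() returns None and the attribute access raises AttributeError.
soup = BeautifulSoup(html, 'html.parser')
book_title = soup.find('div', class_='name_info').h1.text.strip()
book_author = soup.find('div', class_='author').a.text.strip()
book_price = soup.find('span', class_='price_n').text.strip()

# Save the extracted information as an HTML file
with open('book.html', 'w', encoding='utf-8') as f:
    f.write(f'<html><head><meta charset="utf-8"><title>{book_title}</title></head><body>')
    f.write(f'<h1>{book_title}</h1>')
    f.write(f'<p>Author: {book_author}</p>')
    f.write(f'<p>Price: {book_price}</p>')
    f.write('</body></html>')
```

Running this code creates a file named book.html in the current directory. The file contains the extracted book information (title, author, and price) rather than a copy of the original page.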
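The example above writes only the extracted fields. If the goal is to keep the whole page exactly as the server delivered it, a minimal sketch might look like the following. It swaps requests for the standard library's `urllib.request` so that it runs without extra packages, and it writes the raw bytes so the page's own charset declaration stays valid. The helper names `filename_from_url` and `save_page` are hypothetical, and the URL is the one from the example, which may no longer resolve.

```python
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Assumption: a browser-like User-Agent is enough for the site to answer.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def filename_from_url(url, default="page.html"):
    """Derive a local file name from the last path segment of the URL."""
    name = PurePosixPath(urlparse(url).path).name
    # Fall back to a default when the URL does not end in an HTML file name.
    return name if name.endswith((".html", ".htm")) else default


def save_page(url, out_dir=".", timeout=10.0):
    """Download the page and write its bytes unchanged into out_dir."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    path = Path(out_dir) / filename_from_url(url)
    path.write_bytes(raw)  # no decoding, so the original encoding is preserved
    return path


if __name__ == "__main__":
    # Performs a real network request when run as a script.
    print(save_page("http://product.dangdang.com/23384958.html"))
```

Because nothing is decoded or re-encoded, the saved file should open in a browser the same way the live page does, apart from resources that are loaded through relative links.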
