Write a crawler with the Beautiful Soup external library to scrape product data from Dangdang and save the results to a file


Time: 2024-04-05 19:35:20

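Sure, I can help with that. First, install Beautiful Soup and requests (`pip install beautifulsoup4 requests`). Then use the requests library to fetch the page's HTML:

```python
import requests

url = "http://search.dangdang.com/?key=python&act=input"
# A timeout keeps the script from hanging if the server does not respond
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status
# Let requests guess the charset from the body, since the page may not be UTF-8
response.encoding = response.apparent_encoding
html = response.text
```

Next, parse the HTML with Beautiful Soup and extract the fields you need. The class names below (`line`, `pic`, `search_now_price`) are assumptions about the page markup, so check them against the actual HTML before relying on them. Here is an example:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
items = soup.find_all('li', {'class': 'line'})

# Open the output file once rather than reopening it for every item
with open('dangdang.txt', 'a', encoding='utf-8') as f:
    for item in items:
        pic = item.find('a', {'class': 'pic'})
        price_tag = item.find('span', {'class': 'search_now_price'})
        # find returns None when nothing matches, so skip incomplete entries
        if pic is None or price_tag is None:
            continue
        name = pic.get('title', '')              # product name
        price = price_tag.get_text(strip=True)   # product price
        link = pic.get('href', '')               # product link
        # Save one tab-separated record per line
        f.write(name + '\t' + price + '\t' + link + '\n')
```

In the code above, `find_all` returns every tag that matches the given conditions, while `find` returns only the first match, or `None` if there is none. Each extracted record is then appended to `dangdang.txt` as one tab-separated line. I hope this helps.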
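The tab-separated writes above work for simple titles, but a product name that itself contains a tab or a line break would corrupt the file. As one possible alternative, the sketch below saves the records with Python's standard `csv` module, which quotes such fields automatically. The function name `save_products` and the rows in `sample_rows` are hypothetical placeholders for illustration, not real scraped data.

```python
import csv


def save_products(rows, path):
    """Write (name, price, link) tuples to a CSV file with a header row.

    Hypothetical helper for illustration; returns the number of data rows.
    """
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'price', 'link'])
        writer.writerows(rows)
    return len(rows)


# Placeholder rows for illustration only, not real scraped data
sample_rows = [
    ('Example Book, 2nd Edition', '¥59.00', 'http://example.com/item/1'),
    ('Another Example', '¥35.50', 'http://example.com/item/2'),
]
```

To use it with the scraper, collect `(name, price, link)` tuples in a list inside the loop and call `save_products(collected, 'dangdang.csv')` once at the end. Note that the first sample name contains a comma, which the `csv` writer wraps in quotes so that it still reads back as a single field.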