
A Practice Python Web Scraper for JD.com Product Information

Date: 2024-09-18 09:14:02 Views: 76

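A simple Python scraper for JD.com product information typically uses libraries such as `requests`, `BeautifulSoup`, or `selenium`. The basic example here uses `requests` and `BeautifulSoup` to extract each product's name, price, and link. Note that JD.com has anti-scraping measures: in practice you need to comply with its robots.txt, and you may also have to handle login and dynamically loaded content.

```python
import requests
from bs4 import BeautifulSoup

# Mimic a browser's request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'
}


def get_html(url, params=None):
    """Fetch a page and return its HTML, or None if the request fails."""
    try:
        response = requests.get(url, headers=headers, params=params, timeout=10)
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    if response.status_code == 200:
        return response.text
    print(f"Request failed, status code: {response.status_code}")
    return None


def parse_html(html):
    soup = BeautifulSoup(html, 'lxml')  # requires the lxml package
    # Assumed class names; JD.com's markup changes, so check them against the live page
    products = soup.find_all(class_='gl-item')
    for product in products:
        name_tag = product.find(class_='p-name')    # product name
        price_tag = product.find(class_='p-price')  # product price
        link_tag = product.find('a', href=True)     # link to the detail page
        if name_tag is None or price_tag is None or link_tag is None:
            continue  # skip items that are missing an expected field
        name = name_tag.text.strip()
        price = price_tag.text.strip()
        item_url = link_tag['href']
        print(f"Name: {name}, Price: {price}, Link: {item_url}")


if __name__ == '__main__':
    # Replace the keyword with the product category you want to scrape
    url = "https://search.jd.com/Search"
    html = get_html(url, params={'keyword': '电子产品', 'enc': 'utf-8'})
    if html:
        parse_html(html)
    else:
        print("Scraping failed; check the network connection and the target URL.")
```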
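Since the text stresses complying with the site's robots.txt, one possible way to check a URL before requesting it is the standard library's `urllib.robotparser`. This is a minimal sketch, not part of the original example: the helper names `build_robots_checker` and `is_allowed` are hypothetical, and `SAMPLE_ROBOTS` is made-up rule text for illustration, not JD.com's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Made-up rules for illustration only; NOT JD.com's real robots.txt
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_robots_checker(robots_txt):
    """Build a parser from robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent="*"):
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checker = build_robots_checker(SAMPLE_ROBOTS)
    for candidate in ("https://example.com/private/page",
                      "https://example.com/public/page"):
        print(candidate, is_allowed(checker, candidate))
```

In a real run, the robots.txt text would be downloaded from the target site first (for example with `RobotFileParser.set_url` followed by `read`), and a scraper would call a check like `is_allowed` before each request.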
