Using Python to scrape the text and images from http://hbjd.com.cn/article-view-id-20554.html and save them locally


Date: 2024-03-26 08:37:24

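Sure. Below is example code that uses Python to scrape the text and images from http://hbjd.com.cn/article-view-id-20554.html and save them locally. The CSS class names `article-title` and `article-detail` assume a particular page layout. If either lookup finds nothing, inspect the page's HTML and adjust them.

```python
import os
import re
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

# Send a GET request to fetch the page
url = 'http://hbjd.com.cn/article-view-id-20554.html'
response = requests.get(url, timeout=10)
response.raise_for_status()

# Parse the page with BeautifulSoup
soup = BeautifulSoup(response.content, 'html.parser')

# Get the article title; fall back to a fixed name if the <h1> is missing
title_tag = soup.find('h1', {'class': 'article-title'})
title = title_tag.get_text().strip() if title_tag else 'article'
# Replace characters that are not allowed in file/folder names
safe_title = re.sub(r'[\\/:*?"<>|]', '_', title) or 'article'

# Locate the article body once and reuse it
detail = soup.find('div', {'class': 'article-detail'})
if detail is None:
    raise RuntimeError('Article body not found; check the CSS class name')

# Get the article body text, one paragraph per line
content = ''.join(p.get_text().strip() + '\n' for p in detail.find_all('p'))

# Create the folder for the images
os.makedirs(safe_title, exist_ok=True)

# Download each image in the article body and save it locally
for img in detail.find_all('img'):
    src = img.get('src')
    if not src:
        continue
    # urljoin handles absolute, root-relative and relative src values
    img_url = urljoin(url, src)
    img_name = os.path.basename(urlparse(img_url).path) or 'image'
    img_resp = requests.get(img_url, timeout=10)
    if not img_resp.ok:
        continue  # skip images that fail to download
    with open(os.path.join(safe_title, img_name), 'wb') as f:
        f.write(img_resp.content)

# Save the article title and body to a local text file
with open(safe_title + '.txt', 'w', encoding='utf-8') as f:
    f.write(title + '\n\n')
    f.write(content)
```

Running the code creates a folder in the current directory named after the article title (with characters that are illegal in file names replaced by `_`) that contains every image in the article body. It also writes a text file with the same name that contains the article's title and body.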
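Two steps in this kind of script are easy to get wrong. The first is turning the article title into a folder or file name. The second is turning an `<img>` tag's `src` value into a downloadable URL, because plain string concatenation onto `http://hbjd.com.cn` only works when `src` starts with `/`. The following minimal, stdlib-only sketch handles both steps so they can be checked without contacting the site. The helper names `safe_filename` and `resolve_image` are hypothetical. The forbidden-character set assumes Windows filename rules, which are the strictest common case.

```python
import os
import re
from urllib.parse import urljoin, urlparse

# Characters rejected in Windows file names (assumption: strictest common rule set)
_FORBIDDEN = re.compile(r'[\\/:*?"<>|]')


def safe_filename(title: str, default: str = 'article') -> str:
    """Hypothetical helper: make a title usable as a file or folder name."""
    cleaned = _FORBIDDEN.sub('_', title).strip().rstrip('.')
    return cleaned or default


def resolve_image(page_url: str, src: str) -> tuple[str, str]:
    """Hypothetical helper: return (absolute image URL, local file name)."""
    img_url = urljoin(page_url, src)  # resolves src relative to the page URL
    name = os.path.basename(urlparse(img_url).path) or 'image'  # drops ?query
    return img_url, name


if __name__ == '__main__':
    page = 'http://hbjd.com.cn/article-view-id-20554.html'
    print(safe_filename('标题: 示例/测试?'))  # -> 标题_ 示例_测试_
    print(resolve_image(page, '/uploads/a.jpg'))
    print(resolve_image(page, 'uploads/b.png?v=2'))
    print(resolve_image(page, 'https://cdn.example.com/c.gif'))
```

The `src` values above are made-up examples, not taken from the real page. For instance, `'http://hbjd.com.cn' + 'uploads/b.png'` would produce the broken `http://hbjd.com.cnuploads/b.png`, while `urljoin` resolves it to `http://hbjd.com.cn/uploads/b.png`.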
