Python crawler: code for saving text and images to local files
Date: 2023-08-06 18:03:35
Sure, here is a simple Python crawler example that saves the text and images of a web page to local files.
```python
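import os
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def save_text(url, save_path):
    """Fetch the page at url and write its visible text to save_path."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    text = soup.get_text()
    with open(save_path, 'w', encoding='utf-8') as file:
        file.write(text)


def save_image(url, save_dir):
    """Download one image into save_dir, named after the last URL path segment."""
    response = requests.get(url, stream=True, timeout=10)
    response.raise_for_status()
    # Use only the URL path so a query string does not end up in the file name.
    filename = os.path.basename(urlparse(url).path) or 'image'
    save_path = os.path.join(save_dir, filename)
    with open(save_path, 'wb') as file:
        for chunk in response.iter_content(chunk_size=8192):
            file.write(chunk)


def main():
    url = 'http://example.com'  # replace with the URL of the page to scrape
    save_dir = 'save_folder'  # replace with the folder to save files in
    os.makedirs(save_dir, exist_ok=True)
    save_text(url, os.path.join(save_dir, 'text.txt'))

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    for img_tag in soup.find_all('img'):
        src = img_tag.get('src')
        if not src:
            continue  # skip <img> tags that have no src attribute
        img_url = urljoin(url, src)  # src is often relative to the page URL
        try:
            save_image(img_url, save_dir)
        except requests.RequestException as exc:
            print(f'Skipping {img_url}: {exc}')


if __name__ == '__main__':
    main()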
```
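Replace `url` in the code with the URL of the page you want to scrape, and `save_dir` with the path of the folder where the files should be saved. The code uses the third-party library `requests` for HTTP requests and `BeautifulSoup` for HTML parsing; both can be installed with `pip install requests beautifulsoup4`. The `save_text` function saves the text content of the page, and the `save_image` function saves an image from the page. In `main`, the output folder is created first, then `save_text` is called once for the page and `save_image` is called once for each `img` tag found.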
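`save_image` names each file after the last segment of the image URL, and `img` tags often carry a relative `src` rather than a full URL. The sketch below is one possible way to handle both cases using only the standard library, so it runs without network access. The helper names `resolve_image_url` and `filename_from_url`, the `fallback` parameter, and the sample URLs are hypothetical and are not part of the answer's code.

```python
import os
from urllib.parse import urljoin, urlparse


def resolve_image_url(page_url, src):
    """Turn a possibly relative src value into an absolute URL (hypothetical helper)."""
    return urljoin(page_url, src)


def filename_from_url(img_url, fallback='image'):
    """Use the last path segment as the file name, ignoring any query string."""
    name = os.path.basename(urlparse(img_url).path)
    # A URL that ends in '/' has no last segment, so fall back to a default name.
    return name or fallback


page_url = 'http://example.com/blog/post.html'  # assumed example page
print(resolve_image_url(page_url, 'img/cat.png'))
# → http://example.com/blog/img/cat.png
print(resolve_image_url(page_url, '/static/logo.png?v=2'))
# → http://example.com/static/logo.png?v=2
print(filename_from_url('http://example.com/static/logo.png?v=2'))
# → logo.png
```

Passing the resolved URL to `save_image` lets the download work for relative paths, and stripping the query string keeps characters such as `?` out of the file name. Two images that share a last path segment would still overwrite each other, which this sketch does not address.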