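Python code: read the contents of the text1 and text2 files, obtain the separate cookie1 and cookie2 values, crawl the images on the related web pages, and save them locally as image1 and image2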
Time: 2024-10-21 20:17:19 Views: 52
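In Python, you can use the requests library to send HTTP requests, the BeautifulSoup library to parse HTML and extract image links, and the os library to handle file operations. Below is a simple example, assuming requests and beautifulsoup4 are already installed (os ships with Python):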
```python
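import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# File names and storage path
file1 = "text1.txt"
file2 = "text2.txt"
cookie1 = "your_cookie1"  # raw Cookie header value, e.g. "name=value"
cookie2 = "your_cookie2"
image_dir = "images"


# Read the URLs from a text file (one URL per line, blank lines skipped)
def get_urls_from_files(file_path):
    with open(file_path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Fetch each page with the given cookie and save the images it references
def download_images(url_list, cookie, image_dir):
    os.makedirs(image_dir, exist_ok=True)
    headers = {"Cookie": cookie}  # the same cookie is sent with every request
    count = 0
    for url in url_list:
        response = requests.get(url, headers=headers, timeout=10)
        # Check whether the request succeeded
        if response.status_code != 200:
            print(f"Failed to fetch {url} (status code: {response.status_code})")
            continue
        # Extract image links from the page; relative links are resolved against it
        soup = BeautifulSoup(response.text, "html.parser")
        for img in soup.find_all("img", src=True):
            img_url = urljoin(url, img["src"])
            img_response = requests.get(img_url, headers=headers, timeout=10)
            if img_response.status_code != 200:
                print(f"Failed to download {img_url} (status code: {img_response.status_code})")
                continue
            count += 1
            file_name = f"image{count}.jpg"  # assumes every image is a JPEG
            with open(os.path.join(image_dir, file_name), "wb") as f:
                f.write(img_response.content)
            print(f"Downloaded and saved {file_name}")


# Main function: each URL list is paired with its own cookie and output folder
def main():
    text_file_urls1 = get_urls_from_files(file1)
    text_file_urls2 = get_urls_from_files(file2)
    download_images(text_file_urls1, cookie1, os.path.join(image_dir, "cookies1"))
    download_images(text_file_urls2, cookie2, os.path.join(image_dir, "cookies2"))


if __name__ == "__main__":
    main()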
```
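The example above gives every saved file a `.jpg` extension. One possible refinement is to pick the extension from the response's `Content-Type` header instead. The sketch below is only an illustration: the `EXTENSIONS` table and the `extension_for` helper are hypothetical names, and the table covers just a few common image types.

```python
# Hypothetical lookup table: MIME type -> file extension (not exhaustive)
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def extension_for(content_type, default=".jpg"):
    """Return a file extension for a Content-Type header value.

    Falls back to `default` when the header is missing or the type is unknown.
    """
    # Drop parameters such as "; charset=binary" and normalise the case
    mime = (content_type or "").split(";")[0].strip().lower()
    return EXTENSIONS.get(mime, default)
```

With a requests response object, the header can be read with `response.headers.get("Content-Type")` and passed to `extension_for` when building the file name.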
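Note that the script above assumes each text file contains one or more URLs, one per line, and that those URLs can only be accessed with a specific cookie. In practice you may need a close understanding of the page's HTML structure to locate the image elements precisely.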
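The script hard-codes `cookie1` and `cookie2`. If the cookies themselves are stored in text files, as the question title suggests, they could be loaded along the following lines. This is a minimal sketch under assumptions: each file is taken to hold a single cookie string in `name=value; name2=value2` form, and `parse_cookie_string` and `load_cookies` are hypothetical helper names, not part of any library.

```python
from pathlib import Path


def parse_cookie_string(raw):
    """Turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in raw.split(";"):
        # Split on the first "=" only, so values may themselves contain "="
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed fragments
            cookies[name.strip()] = value.strip()
    return cookies


def load_cookies(file_path):
    """Read a cookie string from a UTF-8 text file and parse it into a dict."""
    raw = Path(file_path).read_text(encoding="utf-8").strip()
    return parse_cookie_string(raw)
```

The resulting dict can be passed to requests through its `cookies` parameter, for example `requests.get(url, cookies=load_cookies("text1.txt"))`.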