find_all(): I want to get the src value of images

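`find_all()` is a commonly used method of Python's `BeautifulSoup` library. It returns every element in an HTML or XML document that matches the given filter (a tag name, a regular expression, attribute values, and so on). When scraping a page you often need the links stored in the `src` attribute of its images. To extract the `src` value of every `<img>` tag in an HTML document, you can do this:

```python
from bs4 import BeautifulSoup
import requests

# Assume url is the address of the page you want to scrape
url = 'http://example.com'

# Fetch the page content with requests
response = requests.get(url)
html_content = response.text

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')

# Find all <img> tags that have a src attribute
images = soup.find_all('img', src=True)

# images is a list of <img> Tag objects; read src from each one
for img in images:
    print(img['src'])
```

In this example, `find_all('img', src=True)` finds every `img` tag and returns only those that have a `src` attribute. Any `img` without `src` is left out of the result.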
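To see the effect of the `src=True` filter without making a network request, here is a minimal sketch that runs on an inline HTML fragment. The fragment and the variable names are made up for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, used only for illustration
html = """
<div>
  <img src="/images/a.jpg" alt="a">
  <img alt="no source">
  <img src="https://example.com/b.png">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every <img> tag, with or without a src attribute
all_imgs = soup.find_all("img")

# Only the <img> tags that carry a src attribute
imgs_with_src = soup.find_all("img", src=True)

# Indexing is safe here because the filter guarantees src exists
srcs = [img["src"] for img in imgs_with_src]

print(len(all_imgs), len(imgs_with_src))
print(srcs)
```

The list comprehension turns the list of `Tag` objects into a plain list of strings, which is usually what you want to pass on to a downloader.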

Python beautifulsoup4: getting the src inside img after find_all

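You can get the `src` attribute of `img` tags with the following code:

```python
from bs4 import BeautifulSoup
import requests

url = 'http://example.com'  # page URL
response = requests.get(url)
soup = BeautifulSoup(response.content, 'html.parser')

imgs = soup.find_all('img')  # find all img tags
for img in imgs:
    src = img.get('src')  # read the src attribute of the img tag
    print(src)
```

Here `response.content` is the raw bytes of the response and `html.parser` is the parser. The loop goes over the `img` tags that were found, reads each `src` value with `get()`, and prints it. For a tag without `src`, `get()` returns `None` instead of raising an error.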
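Many pages use relative `src` values, so a common follow-up step is to skip tags without `src` and resolve the rest against the page address. The sketch below assumes a helper named `extract_image_urls` and sample HTML, neither of which appears in the answers above; `urljoin` comes from the standard library.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(html, base_url):
    """Return absolute URLs for every <img> in html that has a src."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")  # None when the attribute is missing
        if src:
            # Resolve relative paths against the page address
            urls.append(urljoin(base_url, src))
    return urls


# Hypothetical sample input for illustration
sample = '<img src="/static/logo.png"><img><img src="pics/cat.jpg">'
urls = extract_image_urls(sample, "http://example.com/gallery/")
print(urls)
```

A `src` that starts with `/` is resolved against the site root, while one without a leading slash is resolved against the directory of the base URL.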

Optimize this Python code:

    # coding:utf-8
    import requests
    from bs4 import BeautifulSoup
    import re
    import os

    obj = {}
    response = requests.get('http://hbjd.com.cn/article-view-id-20554.html')
    dom = BeautifulSoup(response.text, "html.parser")
    article = dom.find_all("article")[0]
    subdom = BeautifulSoup(str(article), "html.parser")
    title = subdom.find_all("h2")[0].text
    obj["title"] = title
    p1 = subdom.find_all("p")[0].text
    obj["p1"] = p1
    p2 = subdom.find_all("p")[1]
    subdom2 = BeautifulSoup(str(p2), "html.parser")
    p2 = subdom2.find_all("span")[1].text + subdom2.find_all("span")[2].text
    obj["p2"] = p2
    p3 = subdom.find_all("p")[3].text
    obj["p3"] = p3
    out = "".join(obj["p1"].split())
    out1 = "".join(obj["p2"].split())
    out2 = "".join(obj["p3"].split())
    print(title)
    content = out + "\n" + out1 + "\n" + out2
    print(content)
    with open("article.txt", "w") as fp:
        if content != None and len(content) > 0:
            fp.write(str(content))
            fp.flush()
    subdom3 = BeautifulSoup(str(subdom.find_all("p")[2]), "html.parser")
    img = subdom3.find_all("img")[0]
    url = "http://www.hbjd.com.cn/" + img.attrs['src']
    responseImg = requests.get(url)
    with open("img1.jpg", "wb") as fp:
        if len(responseImg.content) > 0:
            fp.write(responseImg.content)
            fp.flush()
    subdom4 = BeautifulSoup(str(subdom.find_all("p")[4]), "html.parser")
    img = subdom4.find_all("img")[0]
    url = "http://www.hbjd.com.cn/" + img.attrs['src']
    responseImg = requests.get(url)
    with open("img2.jpg", "wb") as fp:
        if len(responseImg.content) > 0:
            fp.write(responseImg.content)
            fp.flush()

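The code can be improved in several ways:

1. Reuse connections: the code issues separate requests for the page and for each image. A `Session` does not reduce the number of requests, but it reuses the TCP connection and so saves connection setup time.
2. Parse the HTML less often: the code calls BeautifulSoup several times on fragments of the same page. Parse once and search within the elements you already have.
3. Simplify string handling: the code repeats the same whitespace-stripping steps several times. A regular expression or Python's built-in string methods can do this in one place.
4. Handle exceptions: the code has no exception handling. Adding some makes it more robust.

Here is an optimized version:

```python
# coding:utf-8
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'http://hbjd.com.cn/article-view-id-20554.html'

# Use a session to reuse the TCP connection
session = requests.Session()
response = session.get(url)
response.encoding = 'utf-8'

# Parse once, then search inside the <article> element
dom = BeautifulSoup(response.text, 'html.parser')
article = dom.find('article')

title = article.find('h2').text
paragraphs = article.find_all('p')
spans = paragraphs[1].find_all('span')

# Same pieces as the original code: p[0], spans 1 and 2 of p[1], and p[3]
p1 = paragraphs[0].text
p2 = spans[1].text + spans[2].text
p3 = paragraphs[3].text

# Strip all whitespace with a regular expression
out, out1, out2 = (re.sub(r'\s+', '', p) for p in (p1, p2, p3))

print(title)
content = out + '\n' + out1 + '\n' + out2
print(content)

# Handle errors when writing the file
try:
    with open('article.txt', 'w', encoding='utf-8') as fp:
        if content:
            fp.write(content)
except OSError as e:
    print('Failed to write to file:', e)

# Download every image in the article
for i, img in enumerate(article.find_all('img', src=True), 1):
    img_url = urljoin('http://www.hbjd.com.cn/', img['src'])
    responseImg = session.get(img_url)
    if responseImg.ok:
        with open(f'img{i}.jpg', 'wb') as fp:
            fp.write(responseImg.content)
```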
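The exception-handling point can be carried over to the image downloads as well. The following is only a sketch under stated assumptions: the function name `download_images`, its parameters, and the `img{i}.jpg` naming are illustrative choices, not part of the original code. It expects a `requests.Session` (or any object with a compatible `get` method) and skips URLs that fail instead of stopping.

```python
import os

import requests


def download_images(session, urls, dest_dir, timeout=10):
    """Save each URL as img1.jpg, img2.jpg, ... and return the saved paths.

    Hypothetical helper: failed downloads are reported and skipped.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, 1):
        try:
            resp = session.get(url, timeout=timeout)
            resp.raise_for_status()  # turn HTTP 4xx/5xx into an exception
        except requests.RequestException as exc:
            print(f"Skipping {url}: {exc}")
            continue
        path = os.path.join(dest_dir, f"img{i}.jpg")
        with open(path, "wb") as fp:
            fp.write(resp.content)
        saved.append(path)
    return saved
```

It could be called as `download_images(requests.Session(), urls, "images")`, where `urls` is a list of absolute image URLs. The timeout keeps one slow server from blocking the whole run.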
