song_list = soup.select('#song > li') Optimize this so it scrapes this link: <a target="_1" href="/play/588129.htm" class="songNameA"><font>As Long As You Love Me (Acoustic Version)</font></a>
To extract the song name and link from `<a target="_1" href="/play/588129.htm" class="songNameA"><font>As Long As You Love Me (Acoustic Version)</font></a>`, you can use the following code:
```python
# soup is assumed to be a BeautifulSoup object for the page containing <ul id="song">
song_list = soup.select('#song > li')
for song in song_list:
    song_link = song.select_one('.songNameA')['href']    # href attribute of the <a> tag
    song_name = song.select_one('.songNameA font').text  # text inside the nested <font> tag
    print(song_name, song_link)
```
Here the `select_one` method is used to get the link and the song name: `select_one('.songNameA')['href']` returns the link, and `select_one('.songNameA font').text` returns the song name. Note that `select_one` only returns the first match; since each song has exactly one link and one name, that is sufficient here.
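As a quick, self-contained check, the snippet below parses just the sample `<a>` tag from the question; the surrounding `<ul id="song"><li>` wrapper is an assumption, since only the anchor tag was shown:
```python
from bs4 import BeautifulSoup

# Minimal HTML reproducing the structure the selectors above assume;
# the <ul id="song"><li> wrapper is an assumption, only the <a> tag was given.
html = '''
<ul id="song">
  <li><a target="_1" href="/play/588129.htm" class="songNameA"><font>As Long As You Love Me (Acoustic Version)</font></a></li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')
for song in soup.select('#song > li'):
    link = song.select_one('.songNameA')
    print(link.select_one('font').text, link['href'])
# -> As Long As You Love Me (Acoustic Version) /play/588129.htm
```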
Related questions
Optimize this code: if soup.find_all("div", class_="zg_page list_pagebox"): another_url = soup.select('div.zg_page.list_pagebox > p > a')[1].get("href") wb2_data = requests.get(another_url) wb2_data.encoding = 'gb2312' soup = BeautifulSoup(wb2_data.text, 'lxml') passage1 = soup.select('div.cont.clearfix > div.zgsz_show.fl > div.zgsz_sContent.clearfix > p') passage1.pop(0) for paragraph1 in passage1: data1 = paragraph1.get_text() if len(data1) > 30: f.write(data1 + '\n')
This code can be optimized in a few ways, for example:
1. Use a try-except block to handle exceptions that may occur, so the program does not crash on an error.
2. Wrap the code in a function to make it more modular and reusable.
3. Use a with statement to manage the file object, so the file is always closed properly.
An optimized version of the code is shown below:
```python
import requests
from bs4 import BeautifulSoup

def get_passages(url, file_path):
    try:
        response = requests.get(url)
        response.encoding = 'gb2312'
        soup = BeautifulSoup(response.text, 'lxml')
        # Follow the "next page" link recursively if a pagination box is present
        if soup.find_all("div", class_="zg_page list_pagebox"):
            next_url = soup.select('div.zg_page.list_pagebox > p > a')[1].get("href")
            get_passages(next_url, file_path)
        passages = soup.select('div.cont.clearfix > div.zgsz_show.fl > div.zgsz_sContent.clearfix > p')
        passages.pop(0)  # drop the first paragraph, as in the original code
        with open(file_path, 'a', encoding='utf-8') as f:
            for passage in passages:
                data = passage.get_text().strip()
                if len(data) > 30:  # keep only paragraphs longer than 30 characters
                    f.write(data + '\n')
    except Exception as e:
        print('Error:', e)
```
This function takes two arguments, url and file_path: the URL of the page to scrape and the path of the file where the data is saved. It crawls all pages recursively and writes every paragraph that meets the length condition to the file. If an exception occurs along the way, the function prints the error message.
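A minimal way to call it might look like this; the start URL and output path are placeholders, not taken from the question:
```python
if __name__ == '__main__':
    start_url = 'https://example.com/news/list_1.html'  # placeholder URL
    get_passages(start_url, 'passages.txt')  # appends matching paragraphs to passages.txt
```
Note that because the function follows pagination by calling itself, very long chains of pages are bounded by Python's recursion limit (about 1000 frames by default); rewriting the pagination as an iterative loop would remove that limit.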
Optimize this code: import requests from bs4 import BeautifulSoup import csv # Request URL url = "https://pvp.qq.com/web201605/herodetail/527.shtml" # Request headers headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.57"} # Send the request response = requests.get(url, headers=headers) # Parse the HTML soup = BeautifulSoup(response.content, "html.parser") # Get all hero links hero_links = [] for hero in soup.select(".herolist > li > a"): hero_links.append(hero["href"]) # Scrape each hero's attributes heroes = [] for link in hero_links: response = requests.get(link, headers=headers) soup = BeautifulSoup(response.content, "html.parser") # Get hero attributes name = soup.select(".cover-name")[0].text survive = soup.select(".")[0].text attack = soup.select(".cover-list-bar data-bar2 fl")[0].text skill = soup.select(".skill")[0].text difficulty = soup.select(".difficulty")[0].text # Save hero attributes heroes.append({"name": name, "survive": survive, "attack": attack, "skill": skill, "difficulty": difficulty}) # Write the data to a CSV file with open("heroes.csv", "w", newline="", encoding="utf-8-sig") as csvfile: fieldnames = ["name", "survive", "attack", "skill", "difficulty"] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) # Write the header row writer.writeheader() # Write the data rows for hero in heroes: writer.writerow(hero)
You can use multithreading or asynchronous requests to speed up the scraping, for example with `ThreadPoolExecutor` from the `concurrent.futures` library. The code can also be tightened to avoid unnecessary HTTP requests, for instance by de-duplicating the links in `hero_links` so the same hero is not scraped twice. In addition, the hero-attribute selectors can be wrapped in a function to improve readability and maintainability. The optimized code:
```python
import csv
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urljoin

# Request headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.57"
}

# Fetch one hero's attributes
def get_hero_attrs(link):
    response = requests.get(link, headers=headers)
    soup = BeautifulSoup(response.content, "html.parser")
    name = soup.select(".cover-name")[0].text
    # NOTE: the selectors below are carried over from the question and look
    # incomplete; adjust them to the real page structure before relying on them.
    survive = soup.select(".")[0].text
    attack = soup.select(".cover-list-bar data-bar2 fl")[0].text
    skill = soup.select(".skill")[0].text
    difficulty = soup.select(".difficulty")[0].text
    return {"name": name, "survive": survive, "attack": attack,
            "skill": skill, "difficulty": difficulty}

# Page to scrape
url = "https://pvp.qq.com/web201605/herodetail/527.shtml"

# Send the request and parse the HTML
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")

# Collect all hero links, de-duplicated; resolve possibly-relative hrefs against the page URL
hero_links = set(urljoin(url, hero["href"]) for hero in soup.select(".herolist > li > a"))

# Scrape every hero's attributes with a thread pool
heroes = []
with ThreadPoolExecutor(max_workers=8) as executor:
    hero_attrs = executor.map(get_hero_attrs, hero_links)
    for attr in hero_attrs:
        heroes.append(attr)

# Write the data to a CSV file
with open("heroes.csv", "w", newline="", encoding="utf-8-sig") as csvfile:
    fieldnames = ["name", "survive", "attack", "skill", "difficulty"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()  # header row
    for hero in heroes:
        writer.writerow(hero)
```
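One caveat with `executor.map`: if any `get_hero_attrs` call raises (for example because a selector finds nothing), the exception is re-raised when the results are iterated and the whole loop stops. Below is a sketch of an alternative using `submit` and `as_completed`, reusing the same `get_hero_attrs` defined above, so a single failing hero page is skipped instead of aborting the run:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

heroes = []
with ThreadPoolExecutor(max_workers=8) as executor:
    # Map each future back to its link so failures can be reported clearly
    futures = {executor.submit(get_hero_attrs, link): link for link in hero_links}
    for future in as_completed(futures):
        try:
            heroes.append(future.result())
        except Exception as e:
            # Skip hero pages whose request failed or whose selectors did not match
            print('Failed to scrape', futures[future], ':', e)
```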