```
def parse_onepage(html):
    soup = BeautifulSoup(html, 'lxml')
    div_list = soup.select('div.li_txt')
    jsons = []
    # a = li_list[0].select('div > div.info > div.hd > a > span.title')[0].get.text()
    # print(a)
    for node in div_list:
        title = node.select('h4')[0].get_text()
        name = node.select('h3')[0].get_text()
        info = node.select('p')[0].get_text()
        pic = node.select('src')
        data = {'头衔': title, '老师': name, '简介': info, '图片': pic}
        jsons.append(data)
    save(jsons)
```
How can this code download the teacher images?
Posted: 2024-02-23 19:02:09 · Views: 21
The line that fetches the teacher image is wrong: `node.select('src')` looks for a `<src>` tag, which doesn't exist. You should select the `<img>` tag with `node.select('img')` and then read its `src` attribute. The corrected code:
```
def parse_onepage(html):
    soup = BeautifulSoup(html, 'lxml')
    div_list = soup.select('div.li_txt')
    jsons = []
    for node in div_list:
        title = node.select('h4')[0].get_text()
        name = node.select('h3')[0].get_text()
        info = node.select('p')[0].get_text()
        pic = node.select('img')[0].get('src')  # fixed: select the <img> tag, then read its src attribute
        data = {'头衔': title, '老师': name, '简介': info, '图片': pic}
        jsons.append(data)
    save(jsons)
```
This retrieves each teacher's image URL.
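The snippet also calls a `save(jsons)` helper that is never defined in the question. A minimal sketch, assuming the goal is to write the collected records to a JSON file (the `teachers.json` filename is made up):

```python
import json

def save(records, path='teachers.json'):
    # Write the list of teacher dicts to a JSON file.
    # ensure_ascii=False keeps the Chinese keys and values readable in the file.
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

With the image URL in hand, downloading each picture would then be a separate `requests.get` per URL.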
Related questions
```
def get_argparser():
    parser = argparse.ArgumentParser()
```
Based on the signature alone, `get_argparser()` is a Python function that builds and returns an instance of the `ArgumentParser` class from the `argparse` module. `ArgumentParser` creates a parser object that can parse command-line arguments and options.
Here's an example usage of the `ArgumentParser` class:
```
import argparse

def get_argparser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--name', help='Enter your name')
    parser.add_argument('--age', type=int, help='Enter your age')
    return parser

if __name__ == '__main__':
    parser = get_argparser()
    args = parser.parse_args()
    print(f"Hello {args.name}, you are {args.age} years old!")
```
In the above example, the `get_argparser()` function creates an instance of the `ArgumentParser` class and adds two arguments to it - `name` and `age`. The `parse_args()` method of the parser object is then called to parse the command-line arguments and options. Finally, the values of the `name` and `age` arguments are printed using the `print()` function.
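Besides reading `sys.argv`, `parse_args()` also accepts an explicit list of strings, which makes a parser easy to exercise without a shell. A small sketch (the `Alice`/`30` values are made up for illustration):

```python
import argparse

# Parser mirroring the example above.
parser = argparse.ArgumentParser()
parser.add_argument('--name', help='Enter your name')
parser.add_argument('--age', type=int, help='Enter your age')

# Passing a list of strings bypasses sys.argv entirely.
args = parser.parse_args(['--name', 'Alice', '--age', '30'])
print(f"Hello {args.name}, you are {args.age} years old!")  # Hello Alice, you are 30 years old!
```

Note that `type=int` converts the `--age` string to an integer before it is stored on `args`.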
```
import requests
from bs4 import BeautifulSoup

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return "产生异常"

def jiexi(html_doc):
    soup = BeautifulSoup(html_doc, "html.parser")
    links = soup.find_all('a')
    for link in links:
        print(link.name, link["nref"], link.get.next())

if __name__ == 'main':
    url = "https://wallhaven.cc/toplist?page=7"
    print(getHTMLText(url))
```
How can this code be improved?
This code can be improved in several ways:
1. The function name `jiexi` is not descriptive; rename it to the clearer `parse_links`.
2. When iterating over the links, print a link only if its `href` starts with `http` or `https`; skip the rest.
3. In the main block, call `parse_links` on the fetched page instead of printing the raw HTML. There are also three bugs to fix: `__name__ == 'main'` should be `'__main__'`, `link["nref"]` should be `link['href']`, and `link.get.next()` should be `link.get_text()`.
The improved code:
```
import requests
from bs4 import BeautifulSoup

def get_html_text(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return "产生异常"

def parse_links(html_doc):
    soup = BeautifulSoup(html_doc, "html.parser")
    links = soup.find_all('a')
    for link in links:
        if link.has_attr('href') and (link['href'].startswith('http') or link['href'].startswith('https')):
            print(link.name, link['href'], link.get_text())

if __name__ == '__main__':
    url = "https://wallhaven.cc/toplist?page=7"
    html_text = get_html_text(url)
    parse_links(html_text)
```
With these changes the code is clearer and its behavior is correct.
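One further refinement worth considering: the bare `except:` swallows every error (even `KeyboardInterrupt`) and returns the string "产生异常", which callers can mistake for page content. A sketch that narrows the handler to request-related errors, assuming `None` is an acceptable failure sentinel:

```python
import requests

def get_html_text(url):
    # Catch only request-related errors instead of using a bare except,
    # and return None so callers can tell failure apart from page text.
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.exceptions.RequestException:
        return None
```

The caller can then guard with `if html_text is not None:` before parsing.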