Traceback (most recent call last): File "C:\Users\test\Desktop\DIDI测试.py", line 13, in <module> extract_html_info(file_path) ^^^^^^^^^^^^^^^^^ NameError: name 'extract_html_info' is not defined
Posted: 2023-08-04 22:06:25 Views: 144
HTML file for testing
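Sorry, the code I gave earlier contained an error. The `extract_html_info` function must be defined before it is called. Place the following code block before the call to `extract_html_info`: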
```python
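import re

from bs4 import BeautifulSoup


# Extract the required data from the HTML file
def extract_html_info(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        # Read the HTML source
        html = file.read()
    soup = BeautifulSoup(html, 'html.parser')
    # Extract the time. This follows the original lookup (the Content-Type <meta> tag);
    # that tag normally holds a MIME type, so adjust the selector to match your file.
    meta = soup.find('meta', {'http-equiv': 'Content-Type'})
    timestamp = meta.get('content', '') if meta else ''
    # Extract all <p> tags
    for p_tag in soup.find_all('p'):
        # Extract the ID from the nearest preceding text that contains '[talkid:'
        id_text = p_tag.find_previous(string=lambda text: isinstance(text, str) and '[talkid:' in text)
        id_match = re.search(r'\[talkid:([^\]]*)\]', id_text) if id_text else None
        talk_id = id_match.group(1) if id_match else ''
        # Extract the sender and receiver numbers
        numbers = p_tag.find_all('span', {'class': 'hint-success'})
        if len(numbers) < 2:
            continue  # Skip <p> tags that do not carry both numbers
        send_number, receive_number = numbers[0].text, numbers[1].text
        # Extract the message content from the text node just before the <p> tag
        sibling = p_tag.previous_sibling
        message = sibling.strip() if isinstance(sibling, str) else ''
        # Check whether the message is audio ('音频' means "audio" in the source data)
        link = p_tag.find('a')
        if '音频' in message and link is not None and link.get('href'):
            file_url = link['href']
            print(f"ID: {talk_id}, Time: {timestamp}, Sender: {send_number}, Receiver: {receive_number}, Audio file URL: {file_url}")
        else:
            print(f"ID: {talk_id}, Time: {timestamp}, Sender: {send_number}, Receiver: {receive_number}, Message: {message}")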
```
With this, the `extract_html_info` function is defined and can be called from the main program. If you have any other questions, feel free to ask.
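To see why the order matters, here is a minimal sketch that reproduces the same kind of `NameError` in isolation. The helper `run_snippet` and the toy function `greet` are hypothetical names made up for this illustration; they are not part of the script in the question. Python executes a module from top to bottom, so a name only exists once its `def` statement has run.

```python
def run_snippet(source):
    """Run source in a fresh namespace and report whether a NameError occurred."""
    namespace = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"


# The call comes before the definition, as in the failing script.
CALL_FIRST = "greet()\ndef greet():\n    return 'hi'\n"

# The definition comes first, so the call can find the name.
DEF_FIRST = "def greet():\n    return 'hi'\ngreet()\n"

if __name__ == "__main__":
    print(run_snippet(CALL_FIRST))
    print(run_snippet(DEF_FIRST))
```

In a real script, one common way to avoid this class of error is to keep all `def` blocks at the top of the file and put the calls at the bottom, for example under an `if __name__ == "__main__":` guard.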