How to extract the JS source code inside a web page's script tags and save it as a Python dictionary

Date: 2023-05-12 21:05:41

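You can combine Python's BeautifulSoup library with regular expressions to pull the JS source out of a page's `<script>` tags and convert it into a Python dictionary. The steps are:

1. Fetch the page source with the requests library and store it in a variable.
2. Parse the source with BeautifulSoup and find all `<script>` tags.
3. Use a regular expression to match the JS code you need inside those tags (here, an object literal assigned to a variable named `data`) and store it in a variable.
4. Convert the matched JS code into a Python dictionary with the json library. This only works if the object literal is also valid JSON: double-quoted keys and strings, no trailing commas, no comments, no functions.
5. Save the dictionary to a file, or use it directly in your program.

Example code:

```python
import json
import re

import requests
from bs4 import BeautifulSoup

# 1. Fetch the page source
url = 'https://www.example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
html = response.text

# 2. Parse the page and collect every <script> tag
soup = BeautifulSoup(html, 'html.parser')
scripts = soup.find_all('script')

# 3. Find the object literal assigned to `data`
#    (assumes the page contains something like `var data = {...};`)
pattern = re.compile(r'(?:var|let|const)\s+data\s*=\s*(\{.*?\});', re.DOTALL)
js_code = None
for script in scripts:
    if script.string:  # None for empty or external (<script src=...>) scripts
        match = pattern.search(script.string)
        if match:
            js_code = match.group(1)
            break

if js_code is None:
    raise ValueError('no `data = {...};` assignment found in any <script> tag')

# 4. Convert to a Python dict (only works if the literal is valid JSON)
data = json.loads(js_code)

# 5. Save to a file
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
```

Note: the code above is for reference only. Adapt the URL, the variable name and the regular expression to the page you are actually scraping.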
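The regular expression in step 3 is fragile: the non-greedy `(\{.*?\});` stops at the first `};` it meets, even when that `};` sits inside a string value or a nested function body, and `json.loads` then fails on the truncated text. A minimal sketch of a sturdier step 3, assuming the target is a `var`/`let`/`const` assignment of an object literal, scans forward from the opening brace and counts braces while skipping quoted strings. The helper names `extract_js_object` and `load_js_object` and the sample script text are hypothetical; the sketch uses only the standard library so it can be tried without a network request. In the code above you would pass `script.string` as `script_text`.

```python
import json
import re


def extract_js_object(script_text: str, var_name: str) -> str:
    """Return the source of the object literal assigned to `var_name`.

    Hypothetical helper. Counts braces from the opening '{' and ignores
    braces inside quoted strings, so `}` or `};` inside a string or a
    nested object does not end the match early. It does NOT understand
    JS comments or `${...}` inside template literals.
    """
    match = re.search(
        r"\b(?:var|let|const)\s+" + re.escape(var_name) + r"\s*=\s*(?=\{)",
        script_text,
    )
    if match is None:
        raise ValueError(f"no object literal assigned to {var_name!r}")
    start = match.end()  # index of the opening '{'
    depth = 0
    quote = None  # quote character of the string we are inside, if any
    escaped = False
    for i in range(start, len(script_text)):
        ch = script_text[i]
        if quote is not None:
            # Inside a string: only look for its closing quote
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
            continue
        if ch in ("'", '"', "`"):
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return script_text[start:i + 1]
    raise ValueError("unbalanced braces in object literal")


def load_js_object(script_text: str, var_name: str) -> dict:
    """Extract the literal and parse it with json (hypothetical helper)."""
    literal = extract_js_object(script_text, var_name)
    try:
        return json.loads(literal)
    except json.JSONDecodeError as exc:
        # Unquoted keys, single quotes or trailing commas are legal
        # JavaScript but not JSON, so they end up here.
        raise ValueError(f"{var_name!r} is not valid JSON: {exc}") from exc


# Hypothetical script text; the original regex would stop at the `};`
# inside the "title" string.
sample = """
var config = {"theme": "dark"};
var data = {"title": "Hello }; world", "items": [{"id": 1}, {"id": 2}], "meta": {"n": 2}};
console.log(data);
"""

print(load_js_object(sample, "data")["items"])  # → [{'id': 1}, {'id': 2}]
```

Brace counting only fixes where the literal ends; step 4 still requires the literal to be valid JSON. A literal with unquoted keys or single-quoted strings raises `ValueError` here and needs a JavaScript-aware parser instead.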
