Scrape the historical net asset values (NAV) of every fund in the medical-services sector on Tiantian Fund (fund.eastmoney.com) and save them as CSV files
Sure. This needs Python's scraping and data-processing libraries. Below is one possible implementation:
1. Import the required libraries (pandas.read_html, used in step 6, additionally needs lxml or html5lib installed):
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
```
2. Set the request headers and the target URL:
```python
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
url = 'http://fund.eastmoney.com'
```
3. Send the request to fetch the page, then parse it with BeautifulSoup:
```python
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
```
4. Find the URL of the medical-services sector page:
```python
med_url = ''
for a in soup.find_all('a'):
    # Match the first link whose markup mentions the medical-services sector (医疗服务)
    if '医疗服务' in str(a):
        href = a.get('href', '')
        # Prepend the site root only when the link is relative
        med_url = href if href.startswith('http') else url + href
        break
```
5. Open the medical-services sector page and collect the NAV-page URLs of all of its funds:
```python
response = requests.get(med_url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
fund_urls = []
for a in soup.find_all('a'):
    # Keep links that point to fund NAV pages, skipping duplicates
    if '基金净值' in str(a) and 'html' in str(a):
        href = a.get('href', '')
        fund_url = href if href.startswith('http') else url + href
        if fund_url not in fund_urls:
            fund_urls.append(fund_url)
```
6. Open each fund's NAV page, extract the historical NAV table, and save it as a CSV file:
```python
for fund_url in fund_urls:
    response = requests.get(fund_url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    # Fund name shown in the page title area
    fund_name = soup.find('div', {'class': 'fundDetail-tit'}).text.strip()
    # Historical NAV table (class name taken from the page markup)
    table = soup.find('table', {'class': 'w782 comm tzxq'})
    df = pd.read_html(str(table))[0]
    # Save one CSV per fund; utf-8-sig keeps Chinese text readable in Excel
    df.to_csv(fund_name + '.csv', index=False, encoding='utf-8-sig')
```
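One caveat: on many of these pages the NAV table is filled in by JavaScript, so the static HTML that requests receives may not contain it. In that case a commonly used fallback is the fund's F10 history endpoint. The URL, the parameters (type=lsjz, code, page, per) and the response layout in the sketch below are assumptions based on how the page is typically observed to load its data; verify them in the browser's network panel before relying on them:
```python
import re

import pandas as pd
import requests


def fetch_nav_history(fund_code, page=1, per=49):
    """Fetch one page of historical NAV rows for a fund code such as '110023'.

    The endpoint, its parameters and the response layout are assumptions;
    confirm them in the browser's developer tools before relying on this.
    """
    api = 'http://fund.eastmoney.com/f10/F10DataApi.aspx'
    params = {'type': 'lsjz', 'code': fund_code, 'page': page, 'per': per}
    resp = requests.get(api, params=params, timeout=10)
    # The body is a small JS snippet whose content field holds an HTML table
    match = re.search(r'content:"(.*?)",records', resp.text, re.S)
    if not match:
        return pd.DataFrame()
    return pd.read_html(match.group(1))[0]
```
Each returned DataFrame can then be concatenated across pages and written out with to_csv exactly as in step 6.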
Putting the snippets together, the complete code looks roughly as follows (the CSS class names and the link-matching heuristics are the same assumptions made in the steps above and may need adjusting if the site's markup changes):
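```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}
url = 'http://fund.eastmoney.com'

# Locate the medical-services sector page from the front page
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
med_url = ''
for a in soup.find_all('a'):
    if '医疗服务' in str(a):
        href = a.get('href', '')
        med_url = href if href.startswith('http') else url + href
        break
if not med_url:
    raise SystemExit('Medical-services sector link not found; the page layout may have changed.')

# Collect the NAV-page URLs of all funds in the sector
response = requests.get(med_url, headers=headers)
soup = BeautifulSoup(response.content, 'html.parser')
fund_urls = []
for a in soup.find_all('a'):
    if '基金净值' in str(a) and 'html' in str(a):
        href = a.get('href', '')
        fund_url = href if href.startswith('http') else url + href
        if fund_url not in fund_urls:
            fund_urls.append(fund_url)

# Save each fund's historical NAV table as a CSV file
for fund_url in fund_urls:
    response = requests.get(fund_url, headers=headers)
    soup = BeautifulSoup(response.content, 'html.parser')
    fund_name = soup.find('div', {'class': 'fundDetail-tit'}).text.strip()
    table = soup.find('table', {'class': 'w782 comm tzxq'})
    df = pd.read_html(str(table))[0]
    df.to_csv(fund_name + '.csv', index=False, encoding='utf-8-sig')
```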