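Answer: steps for scraping and downloading a given company's annual reports from cninfo (巨潮网) with Python
Time: 2024-09-23 22:02:01 Views: 208
Scraping annual reports from cninfo (巨潮网, a listed-company information disclosure website designated by the China Securities Regulatory Commission) with Python usually takes a few steps: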
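1. **Install the required libraries**:
First, install the `requests` library to send HTTP requests and the `BeautifulSoup` library (the `beautifulsoup4` package) to parse HTML. Both can be installed from the command line with pip: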
```
pip install requests beautifulsoup4
```
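2. **Work out the URL structure**:
Look at the URL structure of the target company's annual report page; annual reports are generally grouped by year. For example, https://www.cninfo.com.cn/companys/hisAnnounce?cmpid=XXX (replace XXX with the company's securities code).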
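As a minimal sketch of the URL structure described above, the helper below builds the page URL from a securities code using only the standard library. The function name `build_announce_url` is hypothetical, the `annNo` and `tabName` parameters are copied from the example URL used in this answer, and the six-digit check is an assumption about mainland-exchange codes; the live site may expect something different.

```python
from urllib.parse import urlencode

# Base path taken from the example URL in this answer; the real site layout may differ.
BASE_URL = "https://www.cninfo.com.cn/companys/hisAnnounce"


def build_announce_url(company_code: str, tab_name: str = "year") -> str:
    """Build the announcement-history URL for one company (hypothetical helper)."""
    code = company_code.strip()
    # Assumption: securities codes are exactly six digits.
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a six-digit securities code, got {company_code!r}")
    # urlencode keeps the dict order and escapes any unsafe characters.
    query = urlencode({"cmpid": code, "annNo": "", "tabName": tab_name})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_announce_url("000001"))
```

Building the query with `urlencode` rather than string concatenation avoids malformed URLs if a parameter ever contains characters that need escaping.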
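3. **Write the scraper script**:
Write a Python function that takes the company code, loops over the annual report links on the page (one per year), and downloads each report. Sample code: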
```python
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup


def get_annual_report(url_template, company_code):
    url = url_template.format(company_code)
    # Send the GET request
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Raise an exception if the request failed
    # Parse the HTML
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find the annual report links (this depends on the actual page structure and may need adjusting)
    annual_report_links = soup.select('.annual-report-link')  # Assumes the class name is 'annual-report-link'
    for link in annual_report_links:
        report_year = link.text.split(' ')[0]  # Extract the year
        report_url = urljoin(url, link['href'])  # Resolve the PDF link, which may be relative
        filename = f'{company_code}_{report_year}.pdf'  # Build the file name
        download_pdf(report_url, filename)


def download_pdf(pdf_url, filename):
    # Stream the response so large PDFs are not held in memory
    with requests.get(pdf_url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive new chunks
                    f.write(chunk)


# Call the function with the company code
get_annual_report("https://www.cninfo.com.cn/companys/hisAnnounce?cmpid={}&annNo=&tabName=year", 'your_company_code')
```
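The script above takes the report year from the first space-separated token of the link text, which only works if the title happens to be laid out that way. A slightly more defensive sketch is to search the title for a four-digit year with a regular expression. The helper names `extract_report_year` and `make_filename` are hypothetical, and the sample titles are illustrative assumptions about how report titles might look, not data taken from the site.

```python
import re
from typing import Optional

# A four-digit year starting with 19 or 20, not embedded in a longer digit run.
YEAR_PATTERN = re.compile(r"(?<!\d)((?:19|20)\d{2})(?!\d)")


def extract_report_year(title: str) -> Optional[str]:
    """Return the first plausible year found in a report title, or None."""
    match = YEAR_PATTERN.search(title)
    return match.group(1) if match else None


def make_filename(company_code: str, title: str) -> str:
    """Build '<code>_<year>.pdf', falling back to 'unknown' when no year is found."""
    year = extract_report_year(title) or "unknown"
    return f"{company_code}_{year}.pdf"


if __name__ == "__main__":
    # Illustrative titles only; real titles may be formatted differently.
    for sample in ("2023年年度报告", "年度报告"):
        print(make_filename("000001", sample))
```

Falling back to `unknown` keeps the download going when a title has no recognizable year, though several such files would overwrite each other, so a real script would likely add another distinguishing field.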