The Excel file produced by my scraper comes out empty. Please rewrite the Python scraper for the bid-award announcements on the procurement site https://www.ccgp-hainan.gov.cn/cgw/cgw_list.jsp: scrape 10 pages, extract the link, title, publish time, winning-bid amount, and full text of each announcement, and write the results to Excel or CSV.
Posted: 2023-05-13 18:05:51
Thank you for the question. Below is a Python scraper for the bid-award announcements at https://www.ccgp-hainan.gov.cn/cgw/cgw_list.jsp. It scrapes 10 pages, extracts the link, title, publish time, winning-bid amount, and full text of each announcement, and writes the results to an Excel file. One caveat: the CSS selectors and the `?page=` pagination parameter are guesses about the page structure; if they do not match the live site, nothing is ever appended and the file comes out empty, so verify them in your browser's developer tools first.
```python
import time

import requests
from bs4 import BeautifulSoup
import openpyxl

# Request header to mimic a normal browser visit
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

BASE = 'https://www.ccgp-hainan.gov.cn'


def text_of(soup, selector):
    """Return the stripped text of the first match, or '' if the selector misses."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else ''


# Create the Excel workbook and write the header row
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = '招标公告'
sheet.append(['链接', '标题', '发布时间', '中标金额', '全文'])

# Scrape 10 pages of listings
for i in range(1, 11):
    # NOTE: confirm the real pagination parameter in your browser; if the site
    # paginates differently (e.g. via a POST form), every request returns the
    # same page or an empty list, and the workbook stays blank.
    url = BASE + '/cgw/cgw_list.jsp?page=' + str(i)
    response = requests.get(url, headers=headers, timeout=15)
    response.encoding = response.apparent_encoding  # avoid garbled Chinese text
    soup = BeautifulSoup(response.text, 'html.parser')

    # Collect the link of every announcement on the listing page.
    # If this prints 0, the selector does not match the live page: an empty
    # match list is exactly what leaves the output file blank.
    links = soup.select('.ewb-info-list2 a')
    print(f'page {i}: {len(links)} links')

    for link in links:
        href = link.get('href', '')
        detail_url = href if href.startswith('http') else BASE + href

        # Fetch the detail page for title, publish time, amount and full text
        detail_response = requests.get(detail_url, headers=headers, timeout=15)
        detail_response.encoding = detail_response.apparent_encoding
        detail_soup = BeautifulSoup(detail_response.text, 'html.parser')

        title = text_of(detail_soup, '.ewb-article h1')
        pub_time = text_of(detail_soup, '.ewb-article .ewb-article-info')
        amount = text_of(detail_soup, '.ewb-article .ewb-article-content p:nth-of-type(3)')
        content = text_of(detail_soup, '.ewb-article .ewb-article-content')

        # Append one row per announcement
        sheet.append([detail_url, title, pub_time, amount, content])
        time.sleep(0.5)  # be polite to the server

# Save the workbook
wb.save('招标公告.xlsx')
```
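Since the question also allows CSV output, here is a minimal sketch of the CSV variant using the standard `csv` module. It assumes you collect each row into a list while scraping (the sample row below is purely illustrative); `utf-8-sig` writes a BOM so that Excel on Windows detects the encoding of the Chinese text correctly.

```python
import csv

# Hypothetical sample rows; in the scraper above you would build this list by
# appending [detail_url, title, pub_time, amount, content] inside the loop.
rows = [
    ['https://www.ccgp-hainan.gov.cn/cgw/example', '示例公告', '2023-05-13', '100万元', '…'],
]

# utf-8-sig adds a BOM so Excel on Windows opens the Chinese text correctly
with open('招标公告.csv', 'w', newline='', encoding='utf-8-sig') as f:
    writer = csv.writer(f)
    writer.writerow(['链接', '标题', '发布时间', '中标金额', '全文'])
    writer.writerows(rows)
```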
I hope this helps. If the output file still comes out empty after these fixes, the short diagnostic below will show where the pipeline breaks.
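The usual culprits are a failed request, a selector that matches nothing, or a list that is rendered by JavaScript and therefore absent from the raw HTML. This check, assuming the same URL and selector as above, distinguishes the three cases:

```python
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}
url = 'https://www.ccgp-hainan.gov.cn/cgw/cgw_list.jsp?page=1'

response = requests.get(url, headers=headers, timeout=15)
response.encoding = response.apparent_encoding
print('status:', response.status_code)  # anything other than 200 means the request failed

soup = BeautifulSoup(response.text, 'html.parser')
print('matches:', len(soup.select('.ewb-info-list2 a')))  # 0 means the selector is wrong

# Inspect the start of the raw HTML: if the announcement list is not in it,
# the page builds the list with JavaScript, and you need to call the site's
# data API directly (see the browser's Network tab) or drive a real browser.
print(response.text[:1000])
```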