Write Python code to crawl the award (中标) notices on the Hainan government procurement site, crawling 10 pages, with the fields link, title, publication date, award amount, and full text, and write the results to Excel or CSV. The URL is https://www.ccgp-hainan.gov.cn/cgw/cgw_list.jsp
Below is Python code that crawls the award notices on the Hainan procurement site, fetching 10 list pages, extracting the link, title, publication date, award amount, and full text for each notice, and writing the results to a CSV file:
```python
import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

url = 'https://www.ccgp-hainan.gov.cn/cgw/cgw_list.jsp'
params = {
    'currentPage': 1,
    'area': '',
    'industries': '',
    'noticesType': '',
    'noticesTitle': '',
    'noticesInfo': '',
    'agentName': '',
    'supplierName': '',
    'startTime': '',
    'endTime': '',
    'pageSize': 20,
    'pageCount': 10,  # number of list pages to crawl
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

def get_content(link):
    """Fetch a single notice page and return its full text."""
    response = requests.get(link, headers=headers, timeout=30)
    soup = BeautifulSoup(response.text, 'html.parser')
    node = soup.find('div', {'class': 'content'})
    return node.text.strip() if node else ''

def get_data():
    data = []
    for page in range(1, params['pageCount'] + 1):
        params['currentPage'] = page
        response = requests.get(url, params=params, headers=headers, timeout=30)
        soup = BeautifulSoup(response.text, 'html.parser')
        table = soup.find('table', {'class': 'table'})
        if table is None:  # layout changed or request was blocked
            continue
        for row in table.find_all('tr')[1:]:  # skip the header row
            cols = row.find_all('td')
            if len(cols) < 4:
                continue
            # hrefs on the list page may be relative, so join with the base URL
            link = urljoin(url, cols[1].find('a')['href'])
            title = cols[1].find('a').text.strip()
            date = cols[2].text.strip()
            amount = cols[3].text.strip()
            content = get_content(link)
            data.append([link, title, date, amount, content])
    return data

def write_csv(data):
    # utf-8-sig adds a BOM so Excel opens the Chinese text correctly
    with open('result.csv', 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.writer(f)
        writer.writerow(['链接', '标题', '发布时间', '中标金额', '全文'])
        writer.writerows(data)

if __name__ == '__main__':
    write_csv(get_data())
```
Note that this code is only an example; in actual use, the table class names, column positions, and query parameters must be adjusted to match the site's real page structure.
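The award amount (中标金额) is captured above as raw text. If a numeric value is needed for sorting or analysis, a small parser can normalize it. A minimal sketch, assuming amounts look like `1,234,567.89元` or `50万元` (the exact formats on the site are an assumption, and `parse_amount` is a hypothetical helper, not part of any library):

```python
import re

def parse_amount(text):
    """Convert an award-amount string into a float in yuan.

    Handles comma-grouped numbers and the 万 (ten-thousand) unit,
    which are assumed formats; returns None if no number is found.
    """
    match = re.search(r'([\d,]+(?:\.\d+)?)\s*(万)?元?', text)
    if not match:
        return None
    value = float(match.group(1).replace(',', ''))
    if match.group(2):  # '万' multiplies the figure by 10,000
        value *= 10000
    return value
```

This could be applied to the `amount` field before writing the CSV, keeping the raw string in a separate column so nothing is lost when parsing fails.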