
Help me write code to scrape https://www.luogu.com.cn/contest/68651#scoreboard

Time: 2023-12-26 21:06:33 Views: 147

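Sure. Here is Python code to scrape https://www.luogu.com.cn/contest/68651#scoreboard:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Build the request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Send the request (the "#scoreboard" fragment is not sent to the server)
url = 'https://www.luogu.com.cn/contest/68651#scoreboard'
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop early on HTTP errors

# Parse the page
soup = BeautifulSoup(response.text, 'html.parser')

# Extract the data
table = soup.find("table", {"class": "lg-table"})
if table is None:
    # The table is absent from the returned HTML (for example, rendered by JavaScript)
    raise RuntimeError("No table with class 'lg-table' found in the response")
thead = table.find("thead")
th_list = thead.find_all("th")
columns = [th.text.strip() for th in th_list]
tbody = table.find("tbody")
tr_list = tbody.find_all("tr")
data = []
for tr in tr_list:
    td_list = tr.find_all("td")
    row_data = [td.text.strip() for td in td_list]
    data.append(row_data)

# Save the data to a CSV file
df = pd.DataFrame(data, columns=columns)
df.to_csv("contest_data.csv", index=False)
```

This code uses the requests library to send a request to the given URL and the BeautifulSoup library to parse the returned HTML. It locates the contest ranking table in the HTML, reads the column names from the `<thead>` and the rows from the `<tbody>`, and uses pandas to save the table to a CSV file. The User-Agent request header makes the request look like it came from a browser, which reduces the chance of the site identifying it as a crawler and blocking it. The `#scoreboard` part of the URL is handled by the browser and is never sent to the server, so if the scoreboard is filled in by JavaScript after the page loads, the table will not be in the returned HTML and the code stops with an error instead of writing an empty file. If you need to scrape a different site, adjust the request headers and the table selector as needed.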
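The table-extraction step can be tried offline before pointing it at the real page. The following is a minimal sketch, not the answer's own method: it swaps in the standard-library `html.parser` and `csv` modules for BeautifulSoup and pandas so that it runs with no third-party packages and no network access. The names `TableParser`, `table_to_csv`, and `SAMPLE_HTML` are hypothetical, and the sample rows are made-up illustration data, not real scoreboard contents.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th> texts as column names and <td> texts as rows."""

    def __init__(self):
        super().__init__()
        self.columns = []   # text of every <th> cell
        self.rows = []      # one list of cell texts per <tr> that has <td> cells
        self._cell = None   # text fragments of the cell being read, or None
        self._row = None    # <td> texts of the current <tr>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <th> or <td>
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            self._cell = None
            if tag == "th":
                self.columns.append(text)
            elif self._row is not None:
                self._row.append(text)
        elif tag == "tr":
            if self._row:  # skip header rows, which contain no <td>
                self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return (columns, rows, csv_text) for the table cells found in html."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(parser.columns)
    writer.writerows(parser.rows)
    return parser.columns, parser.rows, out.getvalue()


# Made-up sample markup; the real page's structure may differ.
SAMPLE_HTML = """
<table class="lg-table">
  <thead><tr><th>Rank</th><th>User</th><th>Score</th></tr></thead>
  <tbody>
    <tr><td>1</td><td> alice </td><td>300</td></tr>
    <tr><td>2</td><td>bob</td><td>250</td></tr>
  </tbody>
</table>
"""

columns, rows, csv_text = table_to_csv(SAMPLE_HTML)
print(csv_text)
```

Running the parser on a saved copy of the page in the same way is a quick check of whether the table is present in the raw HTML at all.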
