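Write a Python crawler that uses asynchronous requests to scrape the contest scoreboard at https://www.luogu.com.cn/contest/68651#scoreboard. Read the user information file 用户信息.xls. If an account in the user information is not on the scoreboard, fill in -1 as its rank. If the user is on the scoreboard, record the user's rank and per-problem results (for example, fill in AC if problem A was solved, otherwise leave the cell blank).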
Posted: 2024-02-29 16:54:55 · Views: 82
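Sure. This task can be handled with the aiohttp and asyncio libraries, which provide asynchronous HTTP requests and coroutines. One possible implementation: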
```python
import asyncio

import aiohttp
import pandas as pd

CONTEST_URL = 'https://www.luogu.com.cn/contest/68651'
# NOTE: this endpoint and the JSON field names used below ('currentData',
# 'rank', 'user_name', 'solved') are assumptions made by this answer and are
# not verified; check the real request in the browser's network panel first.
SCOREBOARD_URL = CONTEST_URL + '/scoreboard/ajax?_='
HEADERS = {
    'Referer': CONTEST_URL,
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
}
PROBLEMS = 'ABCDEFGH'


async def fetch(session, url):
    """Send a GET request and decode the response body as JSON."""
    async with session.get(url) as response:
        response.raise_for_status()  # fail early on 4xx/5xx responses
        return await response.json()


async def main():
    # Fetch the scoreboard data
    async with aiohttp.ClientSession(headers=HEADERS) as session:
        data = await fetch(session, SCOREBOARD_URL)

    # Index the scoreboard entries by username for direct lookup
    ranklist = {item['user_name']: item for item in data['currentData']}

    # Read the user information file (reading .xls requires the xlrd package)
    user_df = pd.read_excel('用户信息.xls')

    # Default values: rank -1 and empty problem cells for every user
    user_df['排名'] = -1
    for problem in PROBLEMS:
        user_df[problem] = ''

    # Fill in the rank and per-problem status for users on the scoreboard
    for index, row in user_df.iterrows():
        item = ranklist.get(row['账号'])
        if item is None:
            continue  # not on the scoreboard: rank stays -1
        user_df.at[index, '排名'] = item['rank']
        for problem in PROBLEMS:
            # 'solved' is assumed to hold the letters of the solved problems
            if problem in item['solved']:
                user_df.at[index, problem] = 'AC'

    # Save the result. pandas 2.0+ can no longer write legacy .xls files,
    # so the output goes to an .xlsx file (requires the openpyxl package).
    user_df.to_excel('用户信息.xlsx', index=False)


if __name__ == '__main__':
    asyncio.run(main())
```
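The matching step can be checked without any network access or Excel file. The following is a minimal sketch, not part of the original answer: `build_rows` and the sample entries are hypothetical, and the entry shape (`user_name`, `rank`, `solved` as a string of problem letters) is the same unverified assumption the code above makes.

```python
def build_rows(accounts, scoreboard, problems="ABCDEFGH"):
    """Return one result row per account, in the order given."""
    # Index entries by username so each account needs a single lookup
    by_name = {entry["user_name"]: entry for entry in scoreboard}
    rows = []
    for account in accounts:
        entry = by_name.get(account)
        # Accounts missing from the scoreboard get rank -1
        row = {"账号": account, "排名": -1 if entry is None else entry["rank"]}
        for p in problems:
            solved = entry is not None and p in entry["solved"]
            row[p] = "AC" if solved else ""
        rows.append(row)
    return rows


# Made-up sample data, for illustration only
sample_board = [
    {"user_name": "alice", "rank": 1, "solved": "ABC"},
    {"user_name": "bob", "rank": 2, "solved": "A"},
]
rows = build_rows(["bob", "carol"], sample_board, problems="ABC")
for row in rows:
    print(row)
```

Keeping this logic in a pure function makes it easy to adjust once the real response format is known, since only the field names inside `build_rows` would need to change.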
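Note that `asyncio.run` requires Python 3.7 or later. Asynchronous requests are purely a client-side technique and need no special support from the server. The request can still fail if the site requires login, blocks crawlers, or returns JSON that differs from the endpoint and field names (`currentData`, `user_name`, `solved`) assumed in the code, so verify those against the actual response before relying on the results.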
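A single request gains little from being asynchronous. If the scoreboard turns out to be split across several pages (an assumption, not something the question confirms), the pages could be requested concurrently. The sketch below shows only the concurrency pattern, using the standard library: `gather_limited` and `fake_fetch_page` are hypothetical names, and the fake fetcher sleeps instead of contacting any server.

```python
import asyncio


async def gather_limited(fetch_one, keys, limit=5):
    """Run fetch_one(key) for every key, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(key):
        # The semaphore caps how many fetches are in flight at once
        async with semaphore:
            return await fetch_one(key)

    # gather returns results in the same order as `keys`
    return await asyncio.gather(*(guarded(key) for key in keys))


async def fake_fetch_page(page):
    """Stand-in for a real request; sleeps instead of using the network."""
    await asyncio.sleep(0.01)
    return {"page": page, "entries": [f"user{page}-{i}" for i in range(2)]}


pages = asyncio.run(gather_limited(fake_fetch_page, range(1, 4), limit=2))
print([p["page"] for p in pages])
```

In a real crawler, `fake_fetch_page` would be replaced by a coroutine that calls `session.get` on a shared `aiohttp.ClientSession`. The concurrency limit keeps the crawler from sending too many requests to the site at once.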