```
for i in range(1, 123):
    if i == 1:
        url = 'https://www.4399.com/flash_fl/5_1.htm'
        html = requests.get(url, headers=headers).content
        doc = etree.HTML(html)
        name = doc.xpath('//*[@class="list affix cf"]/li/a/text()')
        url_1 = doc.xpath('//*[@class="list affix cf"]/li/a/@href')
        data_list = []
        for wu_y_s in url_1:
            new_url = 'https://www.4399.com' + wu_y_s
        for i in range(len(name)):
            dit = {}
            dit['游戏名称'] = name[i]
            dit['游戏地址'] = new_url[i]
            data_list.append(dit)
            csv_writer.writerow(dit)
    else:
        url = 'https://www.4399.com/flash_fl/more_5_{}.htm'.format(str(i))
        html = requests.get(url, headers=headers).content
        doc = etree.HTML(html)
        name = doc.xpath('//*[@class="list affix cf"]/li/a/text()')
        url_1 = doc.xpath('//*[@class="list affix cf"]/li/a/@href')
        data_list = []
        for wu_y_s in url_1:
            new_url = 'https://www.4399.com' + wu_y_s
        for i in range(len(name)):
            dit = {}
            dit['游戏名称'] = name[i]
            dit['游戏地址'] = new_url[i]
            data_list.append(dit)
            csv_writer.writerow(dit)
print(data_list)
```
How should I fix this?
Based on your code, you want to scrape game names and links from the 4399 games site and save them to a CSV file. However, the code has a few problems. The main one is that data_list is re-created inside the loop on every iteration, so each page overwrites the data scraped before it and only the last page is ever saved. Two smaller bugs compound this: the `for wu_y_s in url_1` loop overwrites new_url on every pass, so only the last link survives, and `new_url[i]` then indexes into that string, yielding single characters rather than URLs.
To fix this, define data_list outside the loop so each page's newly scraped rows are appended to it, pair every name with its own link (zip does this cleanly), and open the CSV file and write all rows once after the loop has finished.
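To see the pairing bug in isolation, here is a minimal sketch; the name and url_1 values below are made-up sample data, not real site content:
```
# hypothetical sample data for illustration only
name = ['游戏A', '游戏B']
url_1 = ['/flash/1.htm', '/flash/2.htm']

# the original inner loop keeps only the last href...
for wu_y_s in url_1:
    new_url = 'https://www.4399.com' + wu_y_s
print(new_url[0])  # indexes into a string: prints 'h', not a URL

# correct pairing: walk both lists in step with zip
for game_name, href in zip(name, url_1):
    print(game_name, 'https://www.4399.com' + href)
```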
Here is the revised code:
```
import requests
import csv
from lxml import etree
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
data_list = []  # defined once, outside the loop, so every page's rows accumulate here
for i in range(1, 123):
if i == 1:
url = 'https://www.4399.com/flash_fl/5_1.htm'
else:
        url = 'https://www.4399.com/flash_fl/more_5_{}.htm'.format(i)
html = requests.get(url, headers=headers).content
doc = etree.HTML(html)
name = doc.xpath('//*[@class="list affix cf"]/li/a/text()')
url_1 = doc.xpath('//*[@class="list affix cf"]/li/a/@href')
    # zip walks name and url_1 in step, pairing each game with its own
    # link; this avoids the nested-loop duplication and the new_url[i]
    # string-indexing bug in the original code
    for game_name, href in zip(name, url_1):
        dit = {}
        dit['游戏名称'] = game_name
        dit['游戏地址'] = 'https://www.4399.com' + href
        data_list.append(dit)
# after the loop finishes, write all collected rows to the CSV file
with open('4399_games.csv', 'w', newline='', encoding='utf-8') as f:
fieldnames = ['游戏名称', '游戏地址']
csv_writer = csv.DictWriter(f, fieldnames=fieldnames)
csv_writer.writeheader()
csv_writer.writerows(data_list)
print(data_list)
```
In the code above, data_list is defined once as an empty list, each page's newly scraped data is appended to it inside the loop, and after all iterations finish the CSV file is opened once and every row is written. zip guarantees that each dictionary pairs a game name with its own link, so data_list ends up holding one correct name/URL entry per game, all of which are written to the CSV file. (If the Chinese text looks garbled when the file is opened in Excel, change the encoding to 'utf-8-sig'.)
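One further suggestion: the loop issues 122 sequential requests, so a single network hiccup will abort the whole crawl. Below is a minimal sketch of a retry helper; fetch_page is a hypothetical name and not part of the original code:
```
import time
import requests

def fetch_page(url, headers, retries=3):
    """Fetch one listing page with a timeout and a simple retry loop."""
    for attempt in range(retries):
        try:
            resp = requests.get(url, headers=headers, timeout=10)
            resp.raise_for_status()  # raise on 4xx/5xx so we retry
            return resp.content
        except requests.RequestException:
            time.sleep(2)  # brief pause before the next attempt
    return None  # caller should skip the page if every retry fails
```
In the main loop you would then write html = fetch_page(url, headers) and skip the page (continue) when it returns None, instead of letting one bad response raise an exception.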