下面代码运行后文档内只有标题,没有内容,请帮我找一下原因 # -*- coding: utf-8 -*- import urllib.request import re import pandas as pd import time def getdata(url): req = urllib.request.Request(url) req.add_header('User-Agent', ' Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36') data = urllib.request.urlopen(req).read().decode('utf-8') str1 = str(data) pat = '''<tr> <td>(.*?)</td> <td>(.*?)</td> <td>(.*?)</td> <td>(.*?)</td> <td>.?</td> </tr>''' result = re.compile(pat).findall(str1) return result if __name__ == '__main__': for i in range(2018, 2020): print('正在收集第%d年数据' % i) rank = [] country = [] zhou = [] total = [] url = 'https://www.kuaiyilicai.com/stats/global/yearly/g_gdp/' + str(i) + '.html' data = getdata(url) for j in range(0, len(data)): rank.append(data[j][0]) country.append(data[j][1]) zhou.append(data[j][2]) total.append(data[j][3]) dataframe = pd.DataFrame({'排名': rank, '国家/地区': country, '所在洲': zhou, 'GDP(美元计)': total}) with open(str(i) + "年世界gdp排名.csv", mode="w", encoding="utf_8_sig") as f: dataframe.to_csv(f, index=False, sep=',') print(i, '年数据收集完成')
时间: 2024-04-28 22:21:54 浏览: 134
Python错误: SyntaxError: Non-ASCII character解决办法
这个代码的问题可能是在正则表达式的匹配上。你可以把这一行代码:
```
pat = '''<tr> <td>(.*?)</td> <td>(.*?)</td> <td>(.*?)</td> <td>(.*?)</td> <td>.?</td> </tr>'''
```
修改为:
```
pat = '''<tr> <td>(.*?)</td> <td>(.*?)</td> <td>(.*?)</td> <td>(.*?)</td> <td>(.*?)</td> </tr>'''
```
原因是因为在该网页中,每一行的最后一个数据被包含在了 `<td>` 标签内,而原来的正则表达式并没有匹配到该数据。
阅读全文