6. Create a DataFrame from a CSV file with "," as the delimiter and GBK as the encoding.
Time: 2024-10-28 08:07:20
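In pandas, the `read_csv()` function reads data from a CSV file and creates a DataFrame. If you specify the delimiter (`sep`, or its alias `delimiter`) and the encoding (`encoding`), the file is parsed exactly as specified. Here is how: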
```python
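import pandas as pd
# File path, delimiter, and encoding
file_path = "your_file.csv"  # Replace with the path of the CSV file to read
delimiter = ","  # Comma-separated (also read_csv's default)
encoding = "gbk"  # GBK encoding
# Call read_csv() with the parameters defined above
df_from_csv = pd.read_csv(file_path, sep=delimiter, encoding=encoding)
# Print the first five rows of the DataFrame
print(df_from_csv.head())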
```
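In this example, replace `"your_file.csv"` with the actual path of the file you want to read. When the code runs, pandas reads the CSV file, splits each row into fields at the commas, and decodes the content as GBK.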
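To try the call without a file on disk, the sketch below passes `read_csv()` a small in-memory GBK byte buffer through `io.BytesIO`. The sample rows are made up for illustration. The last step shows that decoding the same bytes with the wrong codec raises an error rather than producing readable text.

```python
import io

import pandas as pd

# Made-up sample data, encoded as GBK bytes to stand in for a file on disk
raw = "姓名,城市\n张三,北京\n李四,上海\n".encode("gbk")

# read_csv accepts a binary file-like object in place of a path
df = pd.read_csv(io.BytesIO(raw), sep=",", encoding="gbk")
print(df)

# Decoding GBK bytes as UTF-8 fails with UnicodeDecodeError
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```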
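**Related questions:**
1. Does `read_csv()` support other common encodings, such as UTF-8?
2. What does `read_csv()` do if the CSV file does not exist or the path is wrong?
3. How can you handle missing values or malformed data in a CSV file?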
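The three questions above can be explored with a short sketch. It uses made-up in-memory data and a hypothetical missing file name, `no_such_file.csv`. It also uses the `on_bad_lines` parameter, which requires pandas 1.3 or later.

```python
import io

import pandas as pd

# 1. Other encodings: pass the codec name; "utf-8" is read_csv's default
utf8_bytes = "名称,数量\n苹果,3\n香蕉,\n".encode("utf-8")
df = pd.read_csv(io.BytesIO(utf8_bytes), encoding="utf-8")

# 2. A nonexistent path raises FileNotFoundError, which can be caught
try:
    pd.read_csv("no_such_file.csv", encoding="gbk")
    file_found = True
except FileNotFoundError:
    file_found = False

# 3a. Empty fields are read as NaN; fill (or drop) them afterwards
missing_count = int(df["数量"].isna().sum())
filled = df.fillna({"数量": 0})

# 3b. Rows with too many fields raise an error by default; on_bad_lines="skip" drops them
messy = "a,b\n1,2\n3,4,5\n6,7\n"
clean = pd.read_csv(io.StringIO(messy), on_bad_lines="skip")
print(missing_count, filled, clean, sep="\n")
```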
Related questions
How do I create a DataFrame from a CSV file with ";" as the delimiter and GBK encoding?
Use the `read_csv` function from the pandas library to read the CSV file and create a DataFrame, specifying both the delimiter and the encoding. For example:
```python
import pandas as pd
df = pd.read_csv('filename.csv', sep=';', encoding='gbk')
```
This creates a DataFrame from a file that uses semicolons as the delimiter and decodes it as GBK.
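Here is a minimal check of the semicolon case, again using made-up GBK-encoded data in memory. With `sep=";"`, commas inside a field are kept as ordinary characters instead of starting a new column.

```python
import io

import pandas as pd

# Made-up semicolon-separated data whose fields themselves contain commas
raw = "姓名;地址\n张三;北京,海淀\n李四;上海,浦东\n".encode("gbk")

df = pd.read_csv(io.BytesIO(raw), sep=";", encoding="gbk")
print(df)
```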
Convert the following code into working Python. The code is:
```
let
    源 = Table.FromColumns({Lines.FromBinary(Web.Contents("https://1x2d.titan007.com/" & "2337054" & ".js"), null, null, 936)}),
    Column1 = 源{50}[Column1],
    拆分文本 = Text.Split(Column1, ";"","),
    转换为表 = Table.FromList(拆分文本, Splitter.SplitTextByDelimiter(":"), null, null, ExtraValues.Error),
    更改的类型 = Table.TransformColumnTypes(转换为表,{{"Column1", type text}}),
    按分隔符拆分列 = Table.SplitColumn(更改的类型, "Column1", Splitter.SplitTextByDelimiter("^", QuoteStyle.Csv), {"Column1.1", "Column1.2"}),
    按分隔符拆分列1 = Table.ExpandListColumn(Table.TransformColumns(按分隔符拆分列, {{"Column1.2", Splitter.SplitTextByDelimiter(";", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1.2"),
    更改的类型1 = Table.TransformColumnTypes(按分隔符拆分列1,{{"Column1.1", type text}, {"Column1.2", type text}}),
    替换的值 = Table.ReplaceValue(更改的类型1,"var gameDetail=Array(","",Replacer.ReplaceText,{"Column1.1"}),
    按分隔符拆分列2 = Table.SplitColumn(替换的值, "Column1.2", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Column1.2.1", "Column1.2.2", "Column1.2.3", "Column1.2.4", "Column1.2.5", "Column1.2.6", "Column1.2.7"}),
    更改的类型2 = Table.TransformColumnTypes(按分隔符拆分列2,{{"Column1.2.1", type number}, {"Column1.2.2", type number}, {"Column1.2.3", type number}, {"Column1.2.4", type text}, {"Column1.2.5", type number}, {"Column1.2.6", type number}, {"Column1.2.7", type number}}),
    重命名的列 = Table.RenameColumns(更改的类型2,{{"Column1.1", "公司ID"}, {"Column1.2.1", "胜"}, {"Column1.2.2", "平"}, {"Column1.2.3", "负"}, {"Column1.2.4", "时间"}, {"Column1.2.5", "凯胜"}, {"Column1.2.6", "凯平"}, {"Column1.2.7", "凯负"}}),
    筛选的行 = Table.SelectRows(重命名的列, each [时间] <> null and [时间] <> ""),
```
Here is the code converted to Python:
```python
import urllib.request

import pandas as pd

url = "https://1x2d.titan007.com/" + "2337054" + ".js"
# 源: Lines.FromBinary(..., 936) reads the file line by line as code page 936 (GBK);
# read_csv would tokenize the JavaScript as CSV, so decode the text directly instead
with urllib.request.urlopen(url) as resp:
    lines = resp.read().decode("gbk").splitlines()
column1 = lines[50]  # 源{50}[Column1]: M indexes from 0, so this is line 51
# 拆分文本: the M literal ";""," is the string ;",
# (转换为表 with the ":" splitter is assumed to keep each item whole in Column1)
table = pd.DataFrame({"Column1": column1.split(';",')})
# 按分隔符拆分列: company ID before the first "^", odds history after it
table[["公司ID", "history"]] = table["Column1"].str.split("^", n=1, expand=True)
# 按分隔符拆分列1: one row per ";"-separated record
table = table.assign(history=table["history"].str.split(";")).explode("history", ignore_index=True)
# 替换的值: drop the JavaScript prefix (stripping quotes approximates QuoteStyle.Csv)
table["公司ID"] = table["公司ID"].str.replace("var gameDetail=Array(", "", regex=False).str.strip('"')
# 按分隔符拆分列2 + 重命名的列: split each record at "|" into 7 named fields
names = ["胜", "平", "负", "时间", "凯胜", "凯平", "凯负"]
parts = table["history"].str.split("|", expand=True).reindex(columns=range(7))
parts.columns = names
final_table = pd.concat([table[["公司ID"]], parts], axis=1)
# 更改的类型2: numeric columns; unparseable values become NaN rather than M errors
for col in ["胜", "平", "负", "凯胜", "凯平", "凯负"]:
    final_table[col] = pd.to_numeric(final_table[col], errors="coerce")
# 筛选的行: keep rows whose 时间 is neither null nor empty
filtered_table = final_table[final_table["时间"].notna() & (final_table["时间"] != "")]
print(filtered_table)
```
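Please note that I could not access the website you provided, so I used sample data for the conversion. If the data source differs, the code will need to be adjusted accordingly.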
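Because the site may not be reachable, you can also check the parsing steps offline. The sketch below wraps the M query's steps (after the line is fetched) in a hypothetical helper, `parse_game_detail`. It runs the helper on a made-up line in the shape the query implies: company entries separated by `",`, each entry written as `ID^record;record;`, and each record made of seven `|`-separated fields. The real file's layout is an assumption here and may differ. `explode(ignore_index=True)` needs pandas 1.1 or later.

```python
import pandas as pd

NAMES = ["胜", "平", "负", "时间", "凯胜", "凯平", "凯负"]


def parse_game_detail(column1: str) -> pd.DataFrame:
    """Hypothetical helper: apply the M query's steps after 源{50}[Column1] to one line."""
    table = pd.DataFrame({"Column1": column1.split(';",')})
    # Company ID before the first "^", odds history after it
    table[["公司ID", "history"]] = table["Column1"].str.split("^", n=1, expand=True)
    # One row per ";"-separated record
    table = table.assign(history=table["history"].str.split(";")).explode("history", ignore_index=True)
    # Drop the JavaScript prefix and stray quotes from the ID
    table["公司ID"] = table["公司ID"].str.replace("var gameDetail=Array(", "", regex=False).str.strip('"')
    # Split each record into (at most) 7 named fields
    parts = table["history"].str.split("|", expand=True).reindex(columns=range(7))
    parts.columns = NAMES
    result = pd.concat([table[["公司ID"]], parts], axis=1)
    for col in ["胜", "平", "负", "凯胜", "凯平", "凯负"]:
        result[col] = pd.to_numeric(result[col], errors="coerce")
    # Keep only rows with a non-empty 时间
    return result[result["时间"].notna() & (result["时间"] != "")]


# Made-up sample line: two companies, three odds records in total
sample = (
    'var gameDetail=Array("115^1.50|4.00|5.50|10-28 08:00|90.1|5.2|4.7;'
    '1.55|3.90|5.20|10-27 20:00|89.0|5.5|5.5;",'
    '"281^2.10|3.20|3.40|10-28 07:30|80.0|20.0|10.0;");'
)
odds = parse_game_detail(sample)
print(odds)
```

The trailing `");` left after the last record becomes a row with no 时间, so the final filter removes it, just as the M query's `筛选的行` step would.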