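Read the sepal length (`sepal_length`) data from the iris dataset (already saved in CSV format), then sort and deduplicate it, and compute the sum, cumulative sum, mean, standard deviation, variance, minimum, and maximum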
Posted: 2024-10-09 11:03:28
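First, import the required library. pandas handles both the data loading and all of the statistics below, so NumPy is optional here. Assuming `data.csv` is the CSV file that holds the Iris dataset, proceed as follows: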
1. **Load the data**:
```python
import pandas as pd
data = pd.read_csv("data.csv")
sepal_length = data["sepal_length"]
```
2. **Preprocess the data**:
- **Sort** (ascending by default):
```python
sorted_sepal_length = sepal_length.sort_values()
```
- **Deduplicate** (applied to the sorted values, so the result is the sorted list of distinct lengths):
```python
unique_sepal_length = sorted_sepal_length.drop_duplicates()
```
3. **Compute summary statistics** (on the original column, duplicates included):
- **Sum (total)**:
```python
total_sum = sepal_length.sum()
```
- **Cumulative sum** (a running total with one value per row, in the original row order):
```python
cumulative_sum = sepal_length.cumsum()
```
- **Mean**:
```python
mean = sepal_length.mean()
```
- **Standard deviation** (pandas returns the sample standard deviation, `ddof=1`, by default):
```python
std_dev = sepal_length.std()
```
- **Variance** (sample variance, `ddof=1`, by default):
```python
variance = sepal_length.var()
```
- **Minimum**:
```python
min_value = sepal_length.min()
```
- **Maximum**:
```python
max_value = sepal_length.max()
```
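The steps above use only pandas methods, although NumPy is mentioned as a library for numerical work. As a hedged cross-check, the sketch below computes the same quantities with NumPy on a small made-up sample (the five values are illustrative and are not read from `data.csv`). One detail matters when comparing the two libraries: pandas `std()` and `var()` default to the sample estimators (`ddof=1`), while `np.std` and `np.var` default to the population estimators (`ddof=0`), so `ddof=1` has to be passed explicitly to get matching numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for the sepal_length column.
sample = pd.Series([5.1, 4.9, 4.7, 5.1, 5.0], name="sepal_length")
values = sample.to_numpy()

# np.unique returns the sorted distinct values in one step.
sorted_unique = np.unique(values)

np_sum = np.sum(values)
np_cumsum = np.cumsum(values)
np_mean = np.mean(values)
# ddof=1 matches the pandas defaults for std() and var().
np_std = np.std(values, ddof=1)
np_var = np.var(values, ddof=1)
np_min = np.min(values)
np_max = np.max(values)

# The NumPy results agree with the pandas methods used above.
assert np.isclose(np_std, sample.std())
assert np.isclose(np_var, sample.var())
assert np.array_equal(
    sorted_unique, sample.sort_values().drop_duplicates().to_numpy()
)
```

Leaving out `ddof=1` would not raise an error. It would silently give slightly smaller values for the standard deviation and variance, which is an easy discrepancy to miss.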
The complete code is as follows:
```python
import pandas as pd

# Load the data
data = pd.read_csv("data.csv")
sepal_length = data["sepal_length"]

# Sort and deduplicate
sorted_unique_sepal_length = sepal_length.sort_values().drop_duplicates()

# Compute the statistics on the original column (duplicates included)
total_sum = sepal_length.sum()
cumulative_sum = sepal_length.cumsum()
mean = sepal_length.mean()
std_dev = sepal_length.std()    # sample standard deviation (ddof=1)
variance = sepal_length.var()   # sample variance (ddof=1)
min_value = sepal_length.min()
max_value = sepal_length.max()

print(f"Sorted unique values: {sorted_unique_sepal_length.tolist()}")
print(f"Sum: {total_sum}")
print(f"Cumulative sum: {cumulative_sum.tolist()}")
print(f"Mean: {mean}")
print(f"Standard deviation: {std_dev}")
print(f"Variance: {variance}")
print(f"Minimum: {min_value}")
print(f"Maximum: {max_value}")
```
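As an alternative to calling each method separately, the reductions can be requested in a single call with `Series.agg`. The following is a minimal sketch on a made-up five-value sample (hypothetical data standing in for `data["sepal_length"]`), not a replacement for the code above.

```python
import pandas as pd

# Hypothetical sample standing in for data["sepal_length"].
sepal_length = pd.Series([5.1, 4.9, 4.7, 5.1, 5.0], name="sepal_length")

# agg accepts a list of reduction names and returns one labeled Series.
summary = sepal_length.agg(["sum", "mean", "std", "var", "min", "max"])
print(summary)

# cumsum is not a reduction (it returns one value per row),
# so it is computed separately.
cumulative_sum = sepal_length.cumsum()
print(cumulative_sum.tolist())
```

The result is indexed by the function names, so an individual value can be read with, for example, `summary["mean"]`.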