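Jupyter: read the sepal and petal lengths from the Iris dataset, then sort them, remove duplicates, and compute the sum, cumulative sum, mean, standard deviation, variance, maximum, and minimum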
Posted: 2024-09-24 19:15:18
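Jupyter is an interactive notebook environment commonly used in data science and machine learning projects. In Python, the pandas library makes it easy to work with tabular datasets such as Iris. The Iris dataset records four measurements for each of 150 iris flowers: sepal length, sepal width, petal length, and petal width.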
First, import the required libraries:
```python
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
```
Then load the Iris dataset into a DataFrame:
```python
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
```
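As a quick sanity check before going further, it can help to confirm the shape and column names of the loaded frame, since the later steps select columns by their exact names. The sketch below simply repeats the loading step and inspects the result.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the dataset and wrap the measurement matrix in a DataFrame
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Number of rows (flowers) and columns (measurements)
print(df.shape)

# Exact column names, including the "(cm)" unit suffix
print(list(df.columns))
```

The column names carry the unit suffix, so `df['sepal length']` without `(cm)` would raise a `KeyError`.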
Next, select the sepal length and petal length columns to work with:
```python
sepal_length = df['sepal length (cm)']
petal_length = df['petal length (cm)']
```
Sort both columns in ascending order:
```python
sepal_length_sorted = sepal_length.sort_values(ascending=True)
petal_length_sorted = petal_length.sort_values(ascending=True)
```
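One detail worth knowing is that `sort_values` keeps the original row labels attached to each value. The sketch below uses a small made-up Series (the values and the name `toy_length` are illustrative, not taken from Iris) to show this, along with `reset_index(drop=True)` as one way to renumber the result.

```python
import pandas as pd

# Illustrative toy values, not real Iris measurements
s = pd.Series([5.1, 4.9, 4.7, 5.0], name="toy_length")

# Sorting reorders the rows but each value keeps its original label
sorted_s = s.sort_values(ascending=True)
print(sorted_s.index.tolist())

# Renumber from 0 if positional labels are preferred
renumbered = sorted_s.reset_index(drop=True)
print(renumbered.index.tolist())
```

Keeping the original labels is useful when the sorted values need to be traced back to rows in `df`; resetting the index is handier when only the ordered values matter.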
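Remove duplicates. The measurements are recorded to one decimal place, so many values repeat across the 150 samples; `drop_duplicates` keeps the first occurrence of each distinct value: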
```python
sepal_length_unique = sepal_length_sorted.drop_duplicates()
petal_length_unique = petal_length_sorted.drop_duplicates()
```
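To see that duplicates really are present, the number of rows can be compared with the number of distinct values. This is a minimal sketch that reloads the data so it runs on its own; `np.unique` is shown only as an alternative that returns the sorted distinct values as a NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
sepal_length = df['sepal length (cm)']

# Total number of rows versus number of distinct values
print(len(sepal_length), sepal_length.nunique())

# Distinct values in ascending order, as a Series
unique_sorted = sepal_length.drop_duplicates().sort_values()

# The same distinct values as a sorted NumPy array
unique_array = np.unique(sepal_length.to_numpy())
```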
Compute the sum and the cumulative sum:
```python
# Total of all 150 values (duplicates included)
total_sepal_length = sepal_length.sum()
# Running total over the values in ascending order
cumulative_sum_sepal_length = sepal_length_sorted.cumsum()
total_petal_length = petal_length.sum()
cumulative_sum_petal_length = petal_length_sorted.cumsum()
```
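The cumulative sum depends on row order: taking it over the sorted values gives different intermediate totals than taking it over the original order, although the final entry matches the overall sum either way. The sketch below illustrates this with made-up values chosen so the arithmetic is exact.

```python
import pandas as pd

# Illustrative toy values, not real Iris measurements
s = pd.Series([1.5, 0.5, 2.0])

# Running total in the original row order
cumsum_original = s.cumsum()

# Running total after sorting in ascending order
cumsum_sorted = s.sort_values().cumsum()

print(cumsum_original.tolist())
print(cumsum_sorted.tolist())
print(s.sum())
```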
Compute the mean, standard deviation, variance, maximum, and minimum:
```python
# Sepal length: pandas std() and var() default to the sample form (ddof=1)
mean_sepal_length = sepal_length.mean()
std_dev_sepal_length = sepal_length.std()
variance_sepal_length = sepal_length.var()
# Petal length
mean_petal_length = petal_length.mean()
std_dev_petal_length = petal_length.std()
variance_petal_length = petal_length.var()
# Extremes of each column
max_sepal_length = sepal_length.max()
min_sepal_length = sepal_length.min()
max_petal_length = petal_length.max()
min_petal_length = petal_length.min()
```
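A common source of confusion is that pandas and NumPy use different defaults for standard deviation and variance: pandas divides by n - 1 (`ddof=1`), while `np.std` and `np.var` on an array divide by n (`ddof=0`). The sketch below uses a small made-up Series to show the difference and how to request the population form in pandas.

```python
import numpy as np
import pandas as pd

# Illustrative toy values with mean 5 and population variance 4
s = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# pandas default: sample statistics (divide by n - 1)
sample_var = s.var()
sample_std = s.std()

# Population statistics (divide by n), requested explicitly
population_std = s.std(ddof=0)

# NumPy's default on a plain array is the population form
numpy_std = np.std(s.to_numpy())

print(sample_std, population_std, numpy_std)
```

Which form is appropriate depends on whether the 150 flowers are treated as a sample or as the whole population of interest; the original code uses the pandas default.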
Finally, the results above can be collected into a dictionary or a DataFrame for display.
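The original answer is cut off at this point, so the following is only one possible way to build such a summary, not the author's own code. It assumes the same column names as above and uses `DataFrame.agg` with a list of statistic names to compute everything in one call; the variable names `cols` and `summary` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# The two length columns used throughout this answer
cols = ['sepal length (cm)', 'petal length (cm)']

# One row per statistic, one column per measurement
summary = df[cols].agg(['sum', 'mean', 'std', 'var', 'max', 'min'])

# Transpose so each measurement becomes a row, which is easier to read
print(summary.T)
```

The sorted, de-duplicated, and cumulative-sum results are Series of different lengths, so they are better displayed separately than forced into the same table.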