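Pick or download a dataset at random, or generate some data. 1. Draw box plots, bar charts, scatter plots, line charts, etc. 2. Write programs for common distance and similarity measures: Minkowski distance, Manhattan distance, Euclidean distance, Chebyshev distance, cosine similarity, Jaccard similarity, Hamming distance. Complete code is required.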
Time: 2024-10-11 16:07:14 Views: 44
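In Python, you can use the pandas library to handle the dataset, matplotlib and seaborn to draw charts such as box plots, bar charts, scatter plots, and line charts, and scikit-learn to compute several of the distances. Here is a simplified example: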
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
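# Load the dataset
data = pd.read_csv('your_dataset.csv')  # replace with the path to your dataset
# Box plot
sns.boxplot(x="column_name", y="another_column_name", data=data)
plt.show()
# Bar chart: sum the numeric columns per category and keep 'category' as a regular column
df_grouped = data.groupby('category', as_index=False).sum(numeric_only=True)
sns.barplot(data=df_grouped, x='category', y='value')
plt.show()
# Scatter plot
plt.scatter(data['x_column'], data['y_column'])
plt.show()
# Line chart (drawn in its own figure so it does not overlap the scatter plot)
plt.plot(data['time_series_column'])
plt.show()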
# Compute distances and similarities
from sklearn.metrics.pairwise import euclidean_distances, manhattan_distances, cosine_similarity

def calculate_distance(x, y, method):
    # scikit-learn's pairwise functions expect 2-D input, so each sample is wrapped in a list
    if method == 'euclidean':
        return euclidean_distances([x], [y])[0][0]
    elif method == 'manhattan':
        return manhattan_distances([x], [y])[0][0]
    elif method == 'cosine':
        # Cosine distance = 1 - cosine similarity
        return 1 - cosine_similarity([x], [y])[0][0]
    raise ValueError(f"Unknown method: {method}")

x = [1, 2, 3]  # sample 1
y = [4, 5, 6]  # sample 2
distance_euclid = calculate_distance(x, y, 'euclidean')
distance_manh = calculate_distance(x, y, 'manhattan')
distance_cosine = calculate_distance(x, y, 'cosine')
print(f"Euclidean distance: {distance_euclid}")
print(f"Manhattan distance: {distance_manh}")
print(f"Cosine distance: {distance_cosine}")
print(f"Cosine similarity: {1 - distance_cosine}")
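# Jaccard similarity treats the two samples as sets: |intersection| / |union|
jaccard = len(set(x) & set(y)) / len(set(x) | set(y))
# Hamming distance counts the positions at which two equal-length sequences differ
hamming = sum(a != b for a, b in zip(x, y))
print(f"Jaccard similarity: {jaccard}")
print(f"Hamming distance: {hamming}")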
```
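The question also asks for Minkowski and Chebyshev distances, which the example above does not cover. The following is a minimal sketch that implements all seven measures using only the standard library. The function names (`minkowski`, `cosine_sim`, and so on) are illustrative choices, not part of any library, and the sketch assumes both inputs are equal-length numeric sequences (or arbitrary iterables for the Jaccard case).

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p; p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chebyshev(x, y):
    """Largest absolute coordinate difference (the limit of Minkowski as p grows)."""
    return max(abs(a - b) for a, b in zip(x, y))

def cosine_sim(x, y):
    """Cosine of the angle between x and y; undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union, with inputs treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical here (an assumption of this sketch)
    return len(a & b) / len(union) if union else 1.0

def hamming(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))

x = [1, 2, 3]  # sample 1
y = [4, 5, 6]  # sample 2
print(f"Minkowski (p=3): {minkowski(x, y, 3)}")
print(f"Manhattan: {manhattan(x, y)}")
print(f"Euclidean: {euclidean(x, y)}")
print(f"Chebyshev: {chebyshev(x, y)}")
print(f"Cosine similarity: {cosine_sim(x, y)}")
print(f"Jaccard similarity: {jaccard_similarity(x, y)}")
print(f"Hamming distance: {hamming(x, y)}")
```

Because Manhattan and Euclidean are the p=1 and p=2 cases of Minkowski, `minkowski(x, y, 1)` and `minkowski(x, y, 2)` can serve as a quick consistency check against the dedicated functions.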