path = 'data/cora/'
data_name = 'cora'
print('Loading from raw data file...')
idx_features_labels = np.genfromtxt("{}{}.content".format(path, data_name), dtype=np.dtype(str))
features = sp.csr_matrix(idx_features_labels[:, 1:-1], dtype=np.float32)
_, _, labels = np.unique(idx_features_labels[:, -1], return_index=True, return_inverse=True)


Date: 2023-06-13 21:02:07


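The Cora dataset provides node features, labels, and an adjacency matrix. It contains 2708 samples, each of which is a scientific paper, and the samples are divided into 7 classes: 1) Case-Based; 2) Genetic Algorithms; 3) Neural Networks; 4) Probabilistic Methods; 5) Reinforcement Learning; 6) Rule Learning; 7) Theory.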

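This code loads data from the raw files of the Cora dataset. Cora is a widely used citation-network benchmark for node classification. It contains 2708 scientific publications, each described by a 1433-dimensional word-occurrence feature vector and labeled with the single class it belongs to.

The code first reads the file with NumPy's genfromtxt function. Each line of the .content file holds a publication ID, 1433 feature values, and a class label. Because the label column is text, the whole table cannot be read as numbers, so dtype=np.dtype(str) tells NumPy to read every field as a string.

Next, the slice idx_features_labels[:, 1:-1] drops the first column (the ID) and the last column (the label), keeping only the feature values. These are converted to np.float32 and stored in a SciPy CSR sparse matrix, which suits the data because most entries are zero.

Finally, np.unique is called on the label column with return_index=True and return_inverse=True, so it returns three arrays: the sorted unique labels, the index of the first occurrence of each, and the inverse indices. The code discards the first two and keeps the inverse indices in the variable labels. This array gives each publication an integer class index that can be used during training.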
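The steps described above can be tried on a tiny stand-in file. The following is only a sketch under stated assumptions: the file name toy.content, the four rows, and the four feature columns are made up for illustration (the real file has 1433 feature columns), and the string slice is cast to float32 explicitly before building the sparse matrix so the example does not depend on how SciPy treats string input.

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp

# Hypothetical miniature of cora.content, one paper per line:
# <paper_id> <feature values...> <class_label>
toy_rows = [
    "31336\t0\t1\t0\t1\tNeural_Networks",
    "1061127\t1\t0\t0\t0\tRule_Learning",
    "1106406\t0\t0\t1\t0\tReinforcement_Learning",
    "13195\t0\t1\t1\t0\tNeural_Networks",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    content_path = os.path.join(tmp_dir, "toy.content")
    with open(content_path, "w") as f:
        f.write("\n".join(toy_rows) + "\n")
    # Read every field as a string, because the last column is text.
    idx_features_labels = np.genfromtxt(content_path, dtype=np.dtype(str))

# Drop the ID (first) and label (last) columns, cast to float32,
# and store the result in CSR format.
features = sp.csr_matrix(idx_features_labels[:, 1:-1].astype(np.float32))

# np.unique returns the sorted unique labels, the index of each label's
# first occurrence, and the inverse indices (one integer class per row).
classes, first_index, labels = np.unique(
    idx_features_labels[:, -1], return_index=True, return_inverse=True
)

print(features.shape)    # (4, 4)
print(features.nnz)      # 6 stored nonzero entries
print(classes.tolist())  # ['Neural_Networks', 'Reinforcement_Learning', 'Rule_Learning']
print(labels.tolist())   # [0, 2, 1, 0]
```

Note that the integer indices follow the sorted order of the label strings, not the order in which labels first appear in the file; classes[labels] recovers the original label column.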
