Then, the median absolute deviation (MAD) was calculated for each remaining feature [21]. Features with a MAD of zero were discarded, as they were considered non-informative. After this step, 33881 features were left. Next, we selected features with prognostic value. Prognostic performance was assessed using the concordance index (C-index), a generalization of the area under the receiver operating characteristic (ROC) curve (AUC) [22]. The C-index was calculated for each feature, and features with C-index ≥ 0.580 were considered predictive factors. After the prognostic performance analysis, 1581 features remained. We then further reduced the data dimension by removing highly correlated features: the correlation coefficient between each pair of features was calculated, and for each pair with a correlation coefficient ≥ 0.90, the more prognostic feature was retained and the other was removed. Finally, the remaining 150 image features were selected and regarded as robust, predictive and nonredundant. Explain.
Time: 2024-04-29 12:19:23
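This passage describes a feature selection procedure. First, the median absolute deviation (MAD) is computed for each remaining feature, and features whose MAD equals zero are removed because they are considered non-informative; 33881 features are left after this step. Next, the prognostic value of these features is assessed with the concordance index (C-index), a generalization of the area under the receiver operating characteristic (ROC) curve (AUC). Features with C-index ≥ 0.580 are regarded as predictive factors, which leaves 1581 features. The data dimension is then reduced further by computing the correlation coefficient between every pair of features: for each pair with a correlation coefficient ≥ 0.90, the more prognostic feature is kept and the other is removed. Finally, the remaining 150 image features are selected as robust, predictive and nonredundant.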
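The three filtering steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the passage does not say how the C-index or the correlation was computed, so the sketch assumes Harrell's C-index on right-censored survival data (a higher feature value is taken to mean shorter survival) and the absolute Pearson correlation. The function names `mad`, `c_index` and `select_features`, and the inputs `X`, `time` and `event`, are hypothetical.

```python
import numpy as np

def mad(x):
    """Median absolute deviation of a 1-D array."""
    return np.median(np.abs(x - np.median(x)))

def c_index(score, time, event):
    """Harrell's C-index (assumed variant): share of comparable pairs in which
    the sample with the shorter survival time has the higher score."""
    concordant, comparable = 0.0, 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if score[i] > score[j]:
                    concordant += 1.0
                elif score[i] == score[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable if comparable else 0.5

def select_features(X, time, event, c_min=0.580, r_max=0.90):
    """Return the column indices of X that survive the three filters."""
    # Step 1: discard non-informative features (MAD equal to zero).
    informative = [j for j in range(X.shape[1]) if mad(X[:, j]) > 0]
    # Step 2: keep features whose C-index reaches the threshold.
    scores = {j: c_index(X[:, j], time, event) for j in informative}
    prognostic = [j for j in informative if scores[j] >= c_min]
    # Step 3: visit features from most to least prognostic and drop any feature
    # that is highly correlated with one already kept (assumption: |Pearson r|).
    prognostic.sort(key=lambda j: scores[j], reverse=True)
    selected = []
    for j in prognostic:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < r_max for k in selected):
            selected.append(j)
    return selected
```

Visiting the features in descending order of C-index is one simple way to make sure that, within any highly correlated pair, the more prognostic feature is the one retained.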
Related questions
Outlier detection: MAD (median absolute deviation)
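MAD (median absolute deviation) is a common basis for outlier detection. It measures the spread of the data around its median.
Specifically, MAD is computed as follows:
1. Compute the median of the data;
2. For each data point, compute the absolute difference between the point and the median;
3. Take the median of all these absolute differences; this value is the MAD.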
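The three steps map directly onto a few lines of NumPy. The sketch below is illustrative; the function name `median_absolute_deviation` and the sample data are made up for the example.

```python
import numpy as np

def median_absolute_deviation(values):
    """Unscaled MAD of a 1-D sequence of numbers."""
    values = np.asarray(values, dtype=float)
    center = np.median(values)              # step 1: median of the data
    abs_dev = np.abs(values - center)       # step 2: absolute differences from the median
    return float(np.median(abs_dev))        # step 3: median of those differences

# Median is 3, absolute differences are [2, 1, 0, 1, 97], so the MAD is 1.
print(median_absolute_deviation([1, 2, 3, 4, 100]))  # 1.0
```

Note that the extreme value 100 has no effect on the result here, which is the property that makes MAD attractive for data containing outliers.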
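Like the standard deviation, MAD measures the spread of the data, but it is more robust to outliers: it is built on medians rather than means, and the median is barely affected by extreme values. MAD is therefore better suited to data that contains outliers.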
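For outlier detection, compare each data point's absolute difference from the median with the MAD: if the difference exceeds k times the MAD, the point can be treated as an outlier. Here k is an empirical constant, usually 2 or 3.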
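A minimal sketch of this rule is shown below. The function name `mad_outliers` and the sample data are hypothetical, and returning "no outliers" when the MAD is zero is an assumption made for the example, since the rule above does not cover that case.

```python
import numpy as np

def mad_outliers(values, k=3.0):
    """Boolean mask marking points whose distance from the median exceeds k * MAD."""
    values = np.asarray(values, dtype=float)
    center = np.median(values)
    abs_dev = np.abs(values - center)
    mad = np.median(abs_dev)
    if mad == 0:
        # Assumption: with a zero MAD the rule is degenerate, so flag nothing.
        return np.zeros(values.shape, dtype=bool)
    return abs_dev > k * mad

data = [10, 11, 12, 13, 14, 50]
# Median is 12.5 and MAD is 1.5, so with k = 3 the cut-off distance is 4.5.
print(mad_outliers(data, k=3.0))  # only the last point (50) is flagged
```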
Please use the method in DSelection to find the median of medians recursively on the following array A in groups of 5.
The input array A is not included in the question, so the operation cannot be carried out on it here. The method DSelection uses to find the median of medians recursively on an array A in groups of 5 can still be explained.
The DSelection algorithm is a modified version of the QuickSelect algorithm that uses the median of medians as its pivot element. The array is divided into groups of 5 and the median of each group is found. The median of those group medians is then obtained by applying the same selection procedure recursively to the array of medians. The resulting element is used as the pivot for the QuickSelect partition step.
Here are the steps to find the median of medians recursively on an array A in groups of 5 using the DSelection algorithm:
1. Divide the array A into groups of 5 elements each.
2. Find the median of each group by sorting the group and selecting the middle element.
3. Create a new array B containing all the medians found in step 2.
4. Recursively apply the selection procedure to array B to find its median; once an array has at most 5 elements, sort it and take the middle element directly.
5. Use the median element found in step 4 as the pivot for the QuickSelect algorithm.
By using the median of medians as the pivot, the DSelection algorithm guarantees that every partition step discards a constant fraction of the elements (roughly 30% or more), so the worst-case time complexity of the selection is O(n). This makes it a useful algorithm for finding the kth smallest element in an unsorted array.
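Since the array A from the question is not available, the steps can be illustrated on a made-up array instead. The sketch below is one possible rendering of median-of-medians selection, not necessarily the exact DSelection routine from the course material. The function name `median_of_medians_select` and the sample array are hypothetical, and it builds new lists rather than partitioning in place to keep the code short.

```python
def median_of_medians_select(a, k):
    """Return the k-th smallest element of list a (k is 0-based)."""
    if len(a) <= 5:
        return sorted(a)[k]  # base case: small arrays are sorted directly
    # Steps 1-3: split into groups of 5 and collect each group's median in B.
    groups = [a[i:i + 5] for i in range(0, len(a), 5)]
    medians = [sorted(g)[len(g) // 2] for g in groups]
    # Step 4: find the median of B with a recursive call to the same procedure.
    pivot = median_of_medians_select(medians, len(medians) // 2)
    # Step 5: partition around the pivot and recurse into the side holding k.
    lower = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    upper = [x for x in a if x > pivot]
    if k < len(lower):
        return median_of_medians_select(lower, k)
    if k < len(lower) + len(equal):
        return pivot
    return median_of_medians_select(upper, k - len(lower) - len(equal))

A = [9, 1, 8, 2, 7, 3, 6, 4, 5, 0, 11, 10]  # hypothetical input
print(median_of_medians_select(A, len(A) // 2))  # same value as sorted(A)[len(A) // 2]
```

Grouping equal elements separately means each recursive call works on a strictly smaller list, so the recursion also terminates on arrays with many duplicates.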