+++++ took 9.99999 msecs for Outlier scoring
Now let's look at the histogram of the outlier scores to choose a threshold for deciding whether a data point
is an outlier or not.
In [235]: weights = np.ones_like(outlier_score)/outlier_score.shape[0] # each sample counts 1/N, so bar heights are fractions of the data
hist(outlier_score, bins = 50, weights = weights, histtype = 'stepfilled', color = 'cyan')
title('Distribution of outlier score')
Out[235]: <matplotlib.text.Text at 0x36030588>
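The weighting used above can be checked in isolation. The following is a minimal sketch, assuming a synthetic `scores` array as a hypothetical stand-in for `outlier_score`; it uses `np.histogram` instead of the plotting call so the bar heights can be inspected directly.

```python
import numpy as np

# Hypothetical stand-in for outlier_score; the real scores come from the earlier cells.
rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=1000))

# Each sample contributes 1/N, so every bar height is the fraction of samples in that bin.
weights = np.ones_like(scores) / scores.shape[0]
heights, edges = np.histogram(scores, bins=50, weights=weights)

# The fractions over all bins add up to 1 (up to floating-point rounding).
total = heights.sum()
```

With this normalization the vertical axis reads as "fraction of data points", which makes histograms of data sets with different sizes comparable.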
It can be observed that the outlier-score threshold for deciding whether a data point is an outlier
is around 2 in most cases, so let's use it to see our results.
In [236]: threshold = 2.
# plot all points in green; the outliers are overplotted in red below
scatter(data[:, 0], data[:, 1], c = 'green', s = 10, edgecolors='none', alpha=0.5)
# find the row indices of the outliers and plot them in red
idx = np.where(outlier_score > threshold)[0]
scatter(data[idx, 0], data[idx, 1], c = 'red', s = 10, edgecolors='none', alpha=0.5)
Out[236]: <matplotlib.collections.PathCollection at 0x3640e6a0>
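The same split can be written with a boolean mask, which also makes it easy to see how many points the threshold flags. This is only a sketch under stated assumptions: `data` is synthetic, and the `outlier_score` computed here (distance from the origin) is a placeholder, not the scoring method used in this notebook.

```python
import numpy as np

# Synthetic 2-D points; a hypothetical stand-in for the notebook's data.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))

# Placeholder score (distance from the origin), NOT the notebook's scoring method.
outlier_score = np.linalg.norm(data, axis=1)
threshold = 2.0

# Boolean mask: True where the score exceeds the threshold.
is_outlier = outlier_score > threshold
inliers = data[~is_outlier]
outliers = data[is_outlier]

# Fraction of points flagged as outliers at this threshold.
fraction_flagged = is_outlier.mean()
```

Plotting `inliers` in green and `outliers` in red would then draw each point exactly once, and `fraction_flagged` gives a quick sanity check on whether the chosen threshold is too strict or too loose.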