“Stocks.txt” contains stock data: the ticker symbols appear in column 1, and the remaining columns hold variables describing each symbol. Question: (1) Apply PCA to this data and explain how much variability is explained by the first two principal components, and how many components to keep if we want more than 90% of the variance explained. (2) Use biplot() to visualize the PCA result, and interpret how many variables make up principal component 1.
Time: 2023-12-18 10:04:59 Views: 76
To apply PCA on the data in "Stocks.txt", we first need to load the data into a data frame in R. Assuming that the data is in tab-delimited format, we can use the following code to read the data into R:
```
stocks <- read.table("Stocks.txt", header = TRUE, sep = "\t")
```
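Since the symbols are labels rather than variables, one option is to move them into the row names while reading the file. The sketch below illustrates this on a small made-up file; the symbols, column names, and values are hypothetical stand-ins, not the contents of "Stocks.txt".

```r
# Hypothetical stand-in for Stocks.txt: made-up symbols and values in a temp file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("Symbol\tPrice\tVolume\tPE",
             "AAA\t10.5\t1200\t15.2",
             "BBB\t22.1\t800\t9.8",
             "CCC\t5.3\t4300\t30.1"), tmp)

# row.names = 1 moves the first column (the symbols) into the row names,
# so every remaining column is a variable.
stocks_demo <- read.table(tmp, header = TRUE, sep = "\t", row.names = 1)

# PCA needs numeric input, so check how the columns were parsed.
str(stocks_demo)
all_numeric <- all(sapply(stocks_demo, is.numeric))
print(all_numeric)
```

With this layout the whole data frame can be passed to the PCA function instead of dropping the first column, and the symbols are then available as point labels in plots.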
Next, we need to perform PCA on the data using the `princomp()` function in R. Here is the code to do this:
```
pca <- princomp(stocks[,2:ncol(stocks)], cor = TRUE)
```
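This code selects all columns from the second to the last in the data frame (`stocks[,2:ncol(stocks)]`) as the variables to be included in the PCA, which assumes that all of these columns are numeric. The `cor = TRUE` argument specifies that the correlation matrix should be used instead of the covariance matrix, so each variable is standardized and variables measured on different scales contribute equally.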
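To make the effect of `cor = TRUE` concrete, the following sketch checks that the component variances equal the eigenvalues of the correlation matrix. It uses the built-in `USArrests` data set purely as a stand-in, since "Stocks.txt" is not available here; the object names are illustrative.

```r
# Stand-in data: USArrests ships with R; it is NOT the stock data.
pca_demo <- princomp(USArrests, cor = TRUE)

# Eigenvalues of the correlation matrix, largest first.
eig_values <- eigen(cor(USArrests))$values

# Component variances are the squared standard deviations.
component_variances <- unname(pca_demo$sdev^2)

# TRUE if the two agree up to floating-point tolerance.
same_values <- isTRUE(all.equal(component_variances, eig_values))
print(same_values)
```

Because a correlation matrix has ones on its diagonal, these variances sum to the number of variables.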
To determine how much variability is explained by the first two principal components, we can use the `summary()` function on the PCA object:
```
summary(pca)
```
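The share of variance captured by the first two components can also be computed directly from the component standard deviations. This is a minimal sketch on the built-in `USArrests` data as a stand-in for the stock data; `pca_demo`, `prop_var`, and `first_two` are illustrative names.

```r
# Stand-in data: USArrests ships with R; it is NOT the stock data.
pca_demo <- princomp(USArrests, cor = TRUE)

# Each component's variance is its squared standard deviation;
# dividing by the total gives the proportion of variance explained.
prop_var <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)

# Share of total variance captured by the first two components together.
first_two <- sum(prop_var[1:2])

print(round(prop_var, 4))
print(round(first_two, 4))
```

The same calculation applied to the `pca` object from the stock data answers the first part of question (1).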
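This produces output that includes the standard deviation, the proportion of variance, and the cumulative proportion for each principal component. The variability explained by the first two components together is the Cumulative Proportion value under Comp.2. We can also use the `screeplot()` function to plot the variance of each component: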
```
screeplot(pca)
```
To determine how many components to keep if we want to have more than 90% variance explained, we can use the `cumsum()` function to calculate the cumulative proportion of variance explained and then identify the number of components needed to reach 90%:
```
cumulative.variance <- cumsum(pca$sdev^2 / sum(pca$sdev^2))
# index of the first component at which the cumulative proportion exceeds 0.9
n.components <- which(cumulative.variance > 0.9)[1]
```
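The threshold rule can be wrapped in a small helper so the cutoff is explicit. This is a sketch only: `components_needed` is a hypothetical function name, and the standard deviations in the example are made up to keep the arithmetic easy to follow.

```r
# Hypothetical helper: smallest number of components whose cumulative
# proportion of variance exceeds the given threshold.
components_needed <- function(sdev, threshold = 0.9) {
  cumulative <- cumsum(sdev^2 / sum(sdev^2))
  which(cumulative > threshold)[1]
}

# Made-up standard deviations: variances 4, 1, 0.25 (total 5.25),
# so the cumulative proportions are about 0.762, 0.952, 1.
toy_sdev <- c(2, 1, 0.5)
toy_result <- components_needed(toy_sdev, 0.9)
print(toy_result)
```

In the toy example only one component stays at or below 90%, yet two components are needed to exceed it, so counting the values at or below the threshold would understate the answer by one.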
For this data, the answer reports that the first three principal components are needed to explain more than 90% of the variance; this can be confirmed against the Cumulative Proportion row of `summary(pca)`.
To create a biplot to visualize the PCA result, we can use the `biplot()` function:
```
biplot(pca)
```
This produces a plot that shows the scores of the observations on the first two principal components as points, and the loadings of the variables on these components as arrows. Strictly speaking, every variable contributes to principal component 1, because each component is a linear combination of all the variables; the practical question is which variables contribute substantially. In the biplot, the horizontal extent of each arrow reflects the variable's loading on component 1, so variables whose arrows reach far along the horizontal axis dominate that component, while arrows that are nearly vertical relate mainly to component 2. The exact values are available in `pca$loadings[, 1]`, and the variable labels show which variables have the largest absolute loadings on component 1.
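Reading loadings off a plot can be imprecise, so it may help to rank them numerically. The sketch below does this on the built-in `USArrests` data as a stand-in for the stock data; the object names are illustrative.

```r
# Stand-in data: USArrests ships with R; it is NOT the stock data.
pca_demo <- princomp(USArrests, cor = TRUE)

# Loadings of every variable on the first principal component.
pc1 <- pca_demo$loadings[, 1]

# Rank variables by absolute loading, largest contribution first.
pc1_ranked <- pc1[order(abs(pc1), decreasing = TRUE)]
print(round(pc1_ranked, 3))
```

The sign of a loading depends on an arbitrary orientation of the component, so the absolute value is what indicates how strongly a variable contributes.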