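```R
n <- 10000000   # number of observations (x alone takes about 800 MB)
p <- 10         # number of predictors
x <- matrix(rnorm(n * p), ncol = p)     # standard normal predictors
beta <- matrix(c(1:p), ncol = 1)        # true coefficients 1, 2, ..., p
z <- x %*% beta                         # linear index
condprob <- pnorm(z)                    # P(y = 1 | x) under a probit model
y <- matrix(rbinom(n, size = 1, prob = condprob), ncol = 1)
# Fit three models without an intercept (the simulated data have none)
prob.fit <- glm.fit(x, y, family = binomial(link = "probit"))$coefficients
logit.fit <- glm.fit(x, y, family = binomial(link = "logit"))$coefficients
linear.fit <- glm.fit(x, y, family = gaussian(link = "identity"))$coefficients
coef.mat <- cbind(prob.fit, logit.fit, linear.fit)
print(coef.mat)
# Element-wise ratios of the coefficients between pairs of models
prop.mat <- cbind(prob.fit / logit.fit, prob.fit / linear.fit, logit.fit / linear.fit)
print(prop.mat)
```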
Time: 2024-04-26 14:21:12
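This code generates a dataset with n observations and p predictors from a probit model, then fits three generalized linear models (GLMs) to the response y: probit, logit, and linear. It prints the matrix of estimated coefficients from the three models (`coef.mat`) and a matrix of element-wise coefficient ratios between pairs of models (`prop.mat`: probit/logit, probit/linear, logit/linear). The probit and logit models are GLMs for a binary response. The Gaussian model with identity link is the GLM normally used for a continuous response; fitted to the 0/1 y here, it is a linear probability model estimated by least squares. If the entries within a column of `prop.mat` are roughly equal, the two models in that column differ only by a common scale factor.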
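Because the original run with n = 1e7 is memory-hungry, here is a scaled-down sketch of the same comparison. It assumes n = 200000, p = 3 and true coefficients `(1:p) / p` (smaller than the original `1:p`, which keeps fitted probabilities away from 0 and 1); the seed is also an assumption. For normally distributed predictors, as here, slopes from a misspecified link are known to be proportional to the true coefficients in large samples, so each column of `prop.mat` should come out nearly constant.

```r
# Scaled-down version of the probit / logit / linear comparison (sizes are assumptions)
set.seed(1)
n <- 200000
p <- 3
x <- matrix(rnorm(n * p), ncol = p)
beta <- (1:p) / p                          # true coefficients, scaled down
y <- rbinom(n, size = 1, prob = pnorm(drop(x %*% beta)))   # probit data-generating model

prob.fit   <- glm.fit(x, y, family = binomial(link = "probit"))$coefficients
logit.fit  <- glm.fit(x, y, family = binomial(link = "logit"))$coefficients
linear.fit <- glm.fit(x, y, family = gaussian(link = "identity"))$coefficients

# A roughly constant column means the two fits differ only by a scale factor
prop.mat <- cbind(probit_logit  = prob.fit / logit.fit,
                  probit_linear = prob.fit / linear.fit,
                  logit_linear  = logit.fit / linear.fit)
print(round(prop.mat, 3))
```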
Related questions
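```R
k <- 1000          # number of Monte Carlo replications
n <- 100           # sample size in each replication
beta0 <- c(1, 1)   # true coefficients
alp <- 0.05        # significance level
beta1hat <- numeric(k)              # estimates of the coefficient on x1
beta2hat <- numeric(k)              # estimates of the coefficient on x2
hsig <- numeric(k)                  # estimated error standard deviation
hus <- matrix(nrow = k, ncol = 2)   # upper confidence limits (x1, x2)
hls <- matrix(nrow = k, ncol = 2)   # lower confidence limits (x1, x2)
for (i in 1:k) {
  x1 <- rnorm(n, 0, 0.5)
  x2 <- rbinom(n, 1, prob = 0.5)
  eb <- rnorm(n, 0, 1)
  X <- cbind(x1, x2)                # design matrix, no intercept
  hy <- X %*% beta0 + eb            # response from the TRUE coefficients
  XtXinv <- solve(t(X) %*% X)
  bhat <- XtXinv %*% t(X) %*% hy    # joint least-squares estimates
  beta1hat[i] <- bhat[1]
  beta2hat[i] <- bhat[2]
  hsig[i] <- sqrt(sum((hy - X %*% bhat)^2) / (n - 2))
  se <- hsig[i] * sqrt(diag(XtXinv))   # standard errors of the estimates
  tq <- qt(1 - alp / 2, df = n - 2)    # t critical value
  hus[i, ] <- bhat + tq * se
  hls[i, ] <- bhat - tq * se
}
```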
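This code simulates a linear regression k times. In each replication it generates random covariates (x1 normal, x2 binary) and a normal error term; the response hy should be generated from the true coefficients beta0 plus the error (not from the estimates), and the two regression coefficients are then estimated by ordinary least squares. beta1hat and beta2hat store the two coefficient estimates, while hsig, hus and hls hold the estimated error standard deviation (from which the coefficients' standard errors follow) and the upper and lower confidence limits at level 1 - alp.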
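A natural check on this simulation is the empirical coverage of the confidence intervals: with normal errors, the t-based 1 - alp intervals should contain beta0 in about 95% of the replications. The sketch below runs that check with base R's `lm()` and `confint()` instead of the hand-coded matrix algebra; the seed and the names `covered` and `coverage` are assumptions.

```r
set.seed(2024)
k <- 1000
n <- 100
beta0 <- c(1, 1)
alp <- 0.05
covered <- matrix(FALSE, nrow = k, ncol = 2, dimnames = list(NULL, c("x1", "x2")))
for (i in 1:k) {
  x1 <- rnorm(n, 0, 0.5)
  x2 <- rbinom(n, 1, prob = 0.5)
  y <- beta0[1] * x1 + beta0[2] * x2 + rnorm(n, 0, 1)
  fit <- lm(y ~ x1 + x2 - 1)            # no intercept, matching the simulation model
  ci <- confint(fit, level = 1 - alp)   # rows: x1, x2; columns: lower, upper limit
  covered[i, ] <- ci[, 1] <= beta0 & beta0 <= ci[, 2]
}
coverage <- colMeans(covered)           # should be close to 1 - alp
print(coverage)
```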
1. Simulate a dataset containing continuous traits with a given h^2. Split it into training and testing datasets with proportion 4:1.

```R
set.seed(20132014)
n <- 5000            # number of individuals
p <- 1000            # number of SNPs
h <- c(0.2, 0.8)[1]  # h^2: [1] = low (0.2), [2] = high (0.8)
# simulate genotype (not exactly)
x_r <- matrix(rnorm(n * p), ncol = p)
xmean <- matrix(rep(colMeans(x_r), n), ncol = p, byrow = TRUE)
xsd <- matrix(rep(apply(x_r, 2, sd), n), ncol = p, byrow = TRUE)
x <- (x_r - xmean) / xsd   # standardize each column (same values as scale(x_r))
# ...
```

4. Calculate polygenic scores with both the marginal and joint models on the testing data, and compare their performance under low and high h^2.
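The question's code elides how the phenotype is generated (`# ...`). One common way to simulate a trait with a given h^2 is to rescale the genetic value so its variance equals h^2 and add noise with variance 1 - h^2. The sketch below is hypothetical, not the original code: the function `simulate_trait`, its `n_causal` argument, the normal effect sizes, and the small 500 x 50 stand-in genotype matrix are all assumptions.

```r
# Hypothetical helper: simulate a trait with heritability h2 from a standardized
# genotype matrix x, then split individuals 4:1 into training and testing sets.
simulate_trait <- function(x, h2, n_causal = ncol(x)) {
  n <- nrow(x)
  p <- ncol(x)
  causal <- sample(p, n_causal)          # SNPs with nonzero effects
  b <- numeric(p)
  b[causal] <- rnorm(n_causal)           # effect sizes (overall scale fixed below)
  g <- as.vector(x %*% b)                # genetic value
  g <- g * sqrt(h2 / var(g))             # rescale so var(g) equals h2 in this sample
  e <- rnorm(n, sd = sqrt(1 - h2))       # environmental noise with variance 1 - h2
  y <- g + e
  train <- sample(n, round(0.8 * n))     # 80% training, 20% testing
  list(x_train = x[train, , drop = FALSE], y_train = y[train],
       x_test = x[-train, , drop = FALSE], y_test = y[-train],
       g = g, y = y)
}

set.seed(1)
x <- scale(matrix(rnorm(500 * 50), ncol = 50))   # small stand-in genotype matrix
dat <- simulate_trait(x, h2 = 0.2)
print(c(train = length(dat$y_train), test = length(dat$y_test),
        realized_h2 = var(dat$g) / var(dat$y)))
```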
Sure, here's the code to calculate polygenic scores with both the marginal and joint models on the testing data:
```R
library(glmnet)

# Joint model: Lasso on all SNPs at once, lambda chosen by 10-fold cross-validation
cv_joint <- cv.glmnet(x_train, y_train, alpha = 1)
# Polygenic scores on the testing set at lambda.min (the lambda with the lowest CV error)
ps_joint <- as.vector(predict(cv_joint, newx = x_test, s = "lambda.min"))

# Marginal model: a separate simple linear regression of y on each SNP
x_c <- scale(x_train, scale = FALSE)            # center each SNP column
y_c <- as.vector(y_train) - mean(y_train)       # centered phenotype
beta_marginal <- colSums(x_c * y_c) / colSums(x_c^2)   # one slope per SNP
# Polygenic scores on the testing set: SNPs weighted by their marginal effects
ps_marginal <- as.vector(x_test %*% beta_marginal)

# Prediction accuracy: correlation between observed and predicted phenotype
cor_joint <- cor(as.vector(y_test), ps_joint)
cor_marginal <- cor(as.vector(y_test), ps_marginal)
print(c(h2 = h, joint = cor_joint, marginal = cor_marginal))
```
Here the joint model is a Lasso fit of the phenotype on all SNPs simultaneously, and the marginal model estimates each SNP's effect from its own simple linear regression, which is how marginal (GWAS-style) effect sizes are obtained. `cv.glmnet` is used rather than taking `glmnet(...)$beta[, 1]`, because the first column of the glmnet coefficient path corresponds to the largest lambda, where all coefficients are zero. Likewise, `lm(y_train ~ x_train)` would fit all SNPs jointly, so it is not a marginal model.

Because `h` holds a single h^2 value per simulated dataset, comparing low and high h^2 means running the simulation and the code above twice: once with `h <- c(0.2, 0.8)[1]` (low h^2) and once with `h <- c(0.2, 0.8)[2]` (high h^2). Then compare `cor_joint` and `cor_marginal` across the two runs. The higher the correlation on the testing data, the better the model predicts the phenotype.
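To see both h^2 settings side by side in one run, the comparison can be wrapped in a function of h^2. This is a self-contained sketch under stated assumptions: smaller sizes than the question (n = 2000, p = 200), every SNP causal with normal effects, and the joint model fitted by ordinary least squares (`lm.fit`) instead of the Lasso, so it runs without glmnet. The function name `compare_h2` is hypothetical.

```r
set.seed(20132014)
n <- 2000
p <- 200
x <- scale(matrix(rnorm(n * p), ncol = p))   # standardized stand-in genotypes
train <- sample(n, 0.8 * n)                  # 4:1 training/testing split

compare_h2 <- function(h2) {
  g <- as.vector(x %*% rnorm(p))
  g <- g * sqrt(h2 / var(g))                 # genetic variance equals h2
  y <- g + rnorm(n, sd = sqrt(1 - h2))       # total variance about 1
  x_tr <- x[train, ]
  y_tr <- y[train]
  x_te <- x[-train, ]
  y_te <- y[-train]
  # Joint model: all SNPs in one least-squares fit (intercept dropped for scoring)
  b_joint <- lm.fit(cbind(1, x_tr), y_tr)$coefficients[-1]
  # Marginal model: one simple regression per SNP
  x_c <- scale(x_tr, scale = FALSE)
  b_marg <- colSums(x_c * (y_tr - mean(y_tr))) / colSums(x_c^2)
  c(joint = cor(y_te, x_te %*% b_joint)[1],
    marginal = cor(y_te, x_te %*% b_marg)[1])
}

res <- rbind(low = compare_h2(0.2), high = compare_h2(0.8))
print(round(res, 3))
```

Each row of `res` holds the testing-set correlations for one h^2 value, so the comparison the question asks for is a single table.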
I hope this helps! Let me know if you have any further questions.