mutate(n=row_number())

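This expression is used inside `mutate()` from the R package dplyr. It adds a new variable named `n` to the data frame and gives each row a sequential number starting at 1, following the current row order. On a grouped data frame the numbering restarts at 1 within each group. Called with a column, as in `row_number(x)`, it instead returns each row's rank by that column, with ties broken by position. It is typically used during preprocessing, to add a new variable that serves as a unique identifier for each row.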
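A minimal sketch of the three behaviours described above, on a toy data frame. The data frame `df` and its columns `station` and `trips` are made up for illustration and do not come from the original question.

```r
library(dplyr)

# Toy data; the column names are hypothetical.
df <- data.frame(
  station = c("A", "A", "B", "B", "B"),
  trips   = c(10, 30, 20, 5, 15)
)

# Ungrouped: n is the row position, 1..nrow(df).
numbered <- df %>% mutate(n = row_number())

# Grouped: the numbering restarts at 1 inside each station.
numbered_by_group <- df %>%
  group_by(station) %>%
  mutate(n = row_number()) %>%
  ungroup()

# With a column argument: n is the rank of each row by trips
# (ties, if any, are broken by position).
ranked <- df %>% mutate(n = row_number(trips))
```

To number rows in a sorted order rather than the current order, one option is to call `arrange()` before `mutate()`.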

library(dplyr)

data <- data %>%
  mutate(startlabel = as.character(start.station.id),
         endlabel = as.character(end.station.id)) %>%
  group_by(start.station.id, end.station.id) %>%
  mutate(label = row_number()) %>%
  ungroup() %>%
  mutate(startlabel = ifelse(duplicated(startlabel), paste0(startlabel, ".", endlabel), startlabel),
         endlabel = ifelse(duplicated(endlabel), paste0(endlabel, ".", start.station.id), endlabel)) %>%
  group_by(start.station.id) %>%
  mutate(startlabel = dense_rank(startlabel)) %>%
  ungroup() %>%
  group_by(end.station.id) %>%
  mutate(endlabel = dense_rank(endlabel)) %>%
  ungroup() %>%
  mutate(startlabel = ifelse(is.na(startlabel), max(endlabel, na.rm = TRUE) + row_number(), startlabel),
         endlabel = ifelse(is.na(endlabel), max(startlabel, na.rm = TRUE) + row_number(), endlabel)) %>%
  select(-label)

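This code uses the dplyr package in R to transform a data frame named `data` through a series of steps. Each step is briefly explained below:

1. `mutate(startlabel = as.character(start.station.id), endlabel = as.character(end.station.id))`: converts the values of `start.station.id` and `end.station.id` to character and stores the results in the new columns `startlabel` and `endlabel`.
2. `group_by(start.station.id, end.station.id)`: groups the data by `start.station.id` and `end.station.id`.
3. `mutate(label = row_number())`: adds a new `label` column whose value is the row number of each observation within its group.
4. `ungroup()`: removes the grouping.
5. `mutate(startlabel = ifelse(duplicated(startlabel), paste0(startlabel, ".", endlabel), startlabel), endlabel = ifelse(duplicated(endlabel), paste0(endlabel, ".", start.station.id), endlabel))`: for every `startlabel` or `endlabel` that repeats an earlier value, appends a dot and the other station's value to tell them apart. `duplicated()` flags only the second and later occurrences, so the first occurrence keeps its original label.
6. `group_by(start.station.id)`: groups the data again, this time by `start.station.id`.
7. `mutate(startlabel = dense_rank(startlabel))`: dense-ranks `startlabel` within each group, so each distinct value in a group gets its own integer with no gaps. The integers are unique within a group, not across the whole data frame.
8. `ungroup()`: removes the grouping.
9. `group_by(end.station.id)`: groups the data by `end.station.id`.
10. `mutate(endlabel = dense_rank(endlabel))`: dense-ranks `endlabel` within each group in the same way.
11. `ungroup()`: removes the grouping.
12. `mutate(startlabel = ifelse(is.na(startlabel), max(endlabel, na.rm = TRUE) + row_number(), startlabel), endlabel = ifelse(is.na(endlabel), max(startlabel, na.rm = TRUE) + row_number(), endlabel))`: fills any missing `startlabel` with the maximum `endlabel` plus the row number, and any missing `endlabel` with the maximum `startlabel` plus the row number.
13. `select(-label)`: drops the `label` column.

Note that this is only an explanation of the code; the code was not run here. To run it, make sure the dplyr package is installed and that the data frame `data` contains the columns referenced above.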
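To make steps 1 and 5 to 8 concrete, here is a small sketch that applies only the `startlabel` half of the pipeline to four made-up trips. The data frame `trips` and its station ids are hypothetical; only the function calls mirror the code above.

```r
library(dplyr)

# Four hypothetical trips; the station ids are invented for illustration.
trips <- data.frame(
  start.station.id = c(72, 72, 79, 72),
  end.station.id   = c(79, 79, 72, 82)
)

labelled <- trips %>%
  # Step 1: character copies of the ids.
  mutate(startlabel = as.character(start.station.id),
         endlabel   = as.character(end.station.id)) %>%
  # Step 5 (startlabel only): repeats of an earlier label get ".<endlabel>" appended.
  mutate(startlabel = ifelse(duplicated(startlabel),
                             paste0(startlabel, ".", endlabel),
                             startlabel))

# Labels after step 5, kept for inspection.
labels_after_step5 <- labelled$startlabel

# Steps 6 to 8: dense rank of the labels within each start station.
labelled <- labelled %>%
  group_by(start.station.id) %>%
  mutate(startlabel = dense_rank(startlabel)) %>%
  ungroup()
```

In this toy case the labels after step 5 are `"72"`, `"72.79"`, `"79"`, `"72.82"`, and the dense ranks are 1, 2, 1, 3. The value 1 appears for both station 72 and station 79, which shows that the ranks identify rows only within a start station.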

Write a program in R: Simulate a dataset containing continuous traits with given h^2. Split it into training and testing datasets with proportion 4:1.

set.seed(20132014)
n <- 5000
p <- 1000
h <- c(0.2, 0.8)[1]
# simulate genotype (not exactly)
x_r <- matrix(rnorm(n * p), ncol = p)
xmean <- matrix(rep(colMeans(x_r), n), ncol=p, byrow = TRUE)
xsd <- matrix(rep(apply(x_r, 2, sd), n), ncol=p, byrow = TRUE)
x <- (x_r - xmean)/xsd

Please perform the marginal model.

## [1] 526 776 50 557 801 345 298 559 178 214

• Show the locations of significant SNPs.
• Replicate above figure, and highlight the significant coefficients in red.

Sure, here is R code to simulate the dataset, split it into training and testing datasets, and fit a model. Note that the model fitted here is a lasso regression via glmnet, with the penalty chosen by cross-validation, rather than a per-SNP marginal regression:

```R
# Simulate dataset
set.seed(20132014)
n <- 5000
p <- 1000
h2 <- 0.2
x_r <- matrix(rnorm(n * p), ncol = p)
xmean <- matrix(rep(colMeans(x_r), n), ncol = p, byrow = TRUE)
xsd <- matrix(rep(apply(x_r, 2, sd), n), ncol = p, byrow = TRUE)
x <- (x_r - xmean) / xsd

# Every SNP gets a small effect; genetic variance is about h2, noise variance 1 - h2
beta <- rnorm(p, mean = 0, sd = sqrt(h2 / p))
y <- x %*% beta + rnorm(n, mean = 0, sd = sqrt(1 - h2))

# Split into training and testing datasets (4:1)
train_idx <- sample(1:n, size = n / 5 * 4, replace = FALSE)
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_test <- x[-train_idx, ]
y_test <- y[-train_idx]

# Fit the lasso and pick lambda by cross-validation
library(glmnet)
fit <- glmnet(x_train, y_train, family = "gaussian")
cvfit <- cv.glmnet(x_train, y_train, family = "gaussian")
lambda_min <- cvfit$lambda.min
beta_hat <- coef(fit, s = lambda_min)[-1, ]  # drop the intercept
```

To show the locations of the selected SNPs, we can plot the absolute values of the estimated coefficients and highlight the nonzero ones in red:

```R
# Plot coefficients
library(ggplot2)
library(dplyr)

beta_df <- data.frame(value = abs(beta_hat)) %>%
  mutate(SNP = row_number())

# Coefficients that are exactly zero are not finite on a log scale,
# so ggplot2 warns about them
ggplot(beta_df, aes(x = SNP, y = value)) +
  geom_point() +
  scale_y_log10() +
  labs(title = "Estimated Coefficients", x = "SNP", y = "Absolute Value") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())

# Highlight the SNPs with a nonzero coefficient
significant_snps <- which(beta_hat != 0)
significant_df <- beta_df %>%
  filter(SNP %in% significant_snps) %>%
  mutate(significant = TRUE)

ggplot(beta_df, aes(x = SNP, y = value)) +
  geom_point() +
  geom_point(data = significant_df, aes(color = significant), size = 2) +
  scale_y_log10() +
  scale_color_manual(values = c("TRUE" = "red")) +
  labs(title = "Estimated Coefficients", x = "SNP", y = "Absolute Value") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
```

The second plot highlights in red the SNPs whose lasso coefficient is nonzero. Here "significant" means selected by the lasso, which is not the same as passing a significance test.
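For comparison, a marginal model in the usual sense regresses the trait on one SNP at a time. The sketch below is one possible reading of the question, not the asker's reference solution. It assumes 10 causal SNPs chosen at random (the question does not state the effect model), noise scaled so that the heritability is approximately `h2`, and a Bonferroni threshold of `0.05 / p`. The names `causal`, `beta_marg`, `p_val` and `significant` are introduced here for illustration. It relies on the fact that in a simple linear regression with an intercept, the slope and its t-test can be computed from the correlation, which avoids fitting 1000 separate `lm()` models.

```r
set.seed(20132014)
n  <- 5000
p  <- 1000
h2 <- 0.2

# Standardised "genotypes", as in the question
x <- scale(matrix(rnorm(n * p), ncol = p))

# ASSUMPTION: 10 causal SNPs picked at random; all other effects are zero
causal <- sample(p, 10)
beta <- numeric(p)
beta[causal] <- rnorm(10, sd = sqrt(h2 / 10))
g <- drop(x %*% beta)
# Noise variance chosen so that var(g) / var(y) is approximately h2
y <- g + rnorm(n, sd = sqrt(var(g) * (1 - h2) / h2))

# 4:1 split into training and testing sets
train_idx <- sample(n, size = n * 4 / 5)
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_test  <- x[-train_idx, ]
y_test  <- y[-train_idx]

# Marginal regression y ~ x_j for every SNP j, computed from correlations
n_train   <- length(y_train)
r         <- drop(cor(x_train, y_train))
beta_marg <- r * sd(y_train) / apply(x_train, 2, sd)   # per-SNP slope
t_stat    <- r * sqrt((n_train - 2) / (1 - r^2))        # slope t-statistic
p_val     <- 2 * pt(-abs(t_stat), df = n_train - 2)     # two-sided p-value

# ASSUMPTION: Bonferroni correction over the p tests
significant <- which(p_val < 0.05 / p)

# Marginal coefficients by SNP position, significant ones in red
plot(beta_marg, pch = 16,
     col = ifelse(p_val < 0.05 / p, "red", "black"),
     xlab = "SNP", ylab = "Marginal coefficient")
```

The vector `significant` holds the positions of the SNPs that pass the threshold and can be compared with `causal`. The held-out `x_test` and `y_test` are not used by the marginal fit itself and are left for evaluating predictions.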
