ssgsea富集分析代码
时间: 2023-06-21 15:21:46 浏览: 217
SSGSEA(Single-sample Gene Set Enrichment Analysis)是一种基于基因集富集分析的方法,可以对单个样本进行基因表达谱的分析。以下是一个Python实现的SSGSEA富集分析代码示例:
```python
import numpy as np
from scipy.stats import norm
def ssgsea(gene_exp, gene_sets, nperm=1000, weighted_score_type=1, permutation=True,
min_size=1, max_size=5000, verbose=False, seed=None):
"""
:param gene_exp: array-like, shape (n_samples, n_features)
Gene expression matrix (rows are samples and columns are features).
:param gene_sets: dict
Gene sets in the format of dictionary. Keys are pathway names and values are gene lists.
:param nperm: int, optional
The number of permutations for calculating the p-value. Default is 1000.
:param weighted_score_type: int, optional
The weighting score type. Default is 1.
:param permutation: bool, optional
Whether to do permutation for calculating the p-value. Default is True.
:param min_size: int, optional
The minimum number of genes in a gene set to be considered. Default is 1.
:param max_size: int, optional
The maximum number of genes in a gene set to be considered. Default is 5000.
:param verbose: bool, optional
Whether to print the progress. Default is False.
:param seed: int, optional
The seed for the random number generator. Default is None.
:return: dict
A dictionary of pathway names and enrichment scores.
"""
# Initialize the random number generator
if seed is not None:
np.random.seed(seed)
# Prepare the gene expression matrix
gene_exp = np.array(gene_exp)
# Prepare the gene set list
gene_sets = {k: v for k, v in gene_sets.items() if min_size <= len(v) <= max_size}
# Compute the gene set scores
pathways = {}
for pathway, genes in gene_sets.items():
# Compute the gene set score for each sample
gss = []
for i in range(gene_exp.shape[0]):
# Get the gene expression values for the pathway genes
pathway_exp = gene_exp[i, np.isin(gene_exp.columns, genes)]
# Compute the gene set score
if weighted_score_type == 0:
gss.append(pathway_exp.sum())
elif weighted_score_type == 1:
gss.append(pathway_exp.mean())
elif weighted_score_type == -1:
gss.append(pathway_exp.abs().mean())
# Compute the enrichment score and p-value
if permutation:
null_gss = []
for i in range(nperm):
# Shuffle the gene expression values
shuffle_exp = gene_exp.apply(np.random.permutation, axis=1)
# Compute the gene set score for each sample
null_gss.append(shuffle_exp.apply(lambda x: x[np.isin(gene_exp.columns, genes)].mean(), axis=1))
null_gss = pd.concat(null_gss, axis=1)
null_es = null_gss.apply(lambda x: (x > gss).mean() - (x < gss).mean())
es = (gss - null_es.mean()) / null_es.std()
pval = (null_es < gss).mean()
else:
es = (gss - gss.mean()) / gss.std()
pval = 1 - norm.cdf(es)
pathways[pathway] = {'es': es, 'pval': pval}
if verbose:
print('%s: ES = %.3f, p-value = %.3f' % (pathway, es, pval))
return pathways
```
该代码使用了NumPy和SciPy库进行计算。在使用时,需要将基因表达谱和基因集传递给`ssgsea`函数。此外,还可以设置其他参数,例如是否进行置换和置换次数等。函数返回一个包含富集分析结果的字典。
阅读全文