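Perform the following Python operations on the gene.txt file obtained in the previous step: for each gene_type, compute the length distribution and draw a box plot; find the five most numerous gene types and draw a pie chart. Save both figures as PDF files at 300 dpi.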
Posted: 2023-11-27 13:55:30
Here is one way to implement it:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Read the data (tab-separated, with a header row)
df = pd.read_csv('gene.txt', sep='\t')

# Box plot of the length distribution for each gene_type
gene_types = df['gene_type'].unique()
lengths = [df.loc[df['gene_type'] == t, 'length'].to_numpy() for t in gene_types]
fig, ax = plt.subplots(figsize=(max(10, 0.4 * len(gene_types)), 6))
ax.boxplot(lengths, widths=0.5)
ax.set_xticklabels(gene_types, rotation=90)
ax.set_ylabel('length')
ax.set_title('Distribution of Gene Length by Gene Type')
fig.tight_layout()
fig.savefig('gene_length_boxplot.pdf', dpi=300)
plt.close(fig)

# Pie chart of the five most common gene_type values, with the rest grouped as 'Other'
counts = df['gene_type'].value_counts()
pie_counts = counts.head(5)
other_count = counts.iloc[5:].sum()
if other_count > 0:
    pie_counts = pd.concat([pie_counts, pd.Series([other_count], index=['Other'])])
fig, ax = plt.subplots(figsize=(8, 6))
# labeldistance=None keeps the labels off the wedges but available to the legend
ax.pie(pie_counts, labels=pie_counts.index, labeldistance=None,
       autopct='%1.1f%%', pctdistance=0.7, explode=[0.1] * len(pie_counts))
ax.legend(bbox_to_anchor=(1.1, 1.05))
ax.set_title('Top 5 Gene Types by Count')
fig.savefig('gene_type_piechart.pdf', dpi=300, bbox_inches='tight')
plt.close(fig)
```
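Here, `gene.txt` is the previously generated file. It is tab-separated and contains three columns: `gene_id`, `gene_type`, and `length`.
Running the script produces two PDF files: `gene_length_boxplot.pdf` (the box plot) and `gene_type_piechart.pdf` (the pie chart).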
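Before plotting, it can help to check that the file really has the three expected columns and to look at the numbers the two figures will show. The following is a minimal sketch, not part of the original answer: the six sample rows and their gene type names are made up for illustration, and an in-memory table stands in for `gene.txt` so that the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical sample in the same three-column, tab-separated layout as gene.txt
sample = io.StringIO(
    "gene_id\tgene_type\tlength\n"
    "g1\tprotein_coding\t1200\n"
    "g2\tprotein_coding\t3400\n"
    "g3\tlncRNA\t800\n"
    "g4\tlncRNA\t950\n"
    "g5\tlncRNA\t400\n"
    "g6\tmiRNA\t90\n"
)
df = pd.read_csv(sample, sep="\t")

# Fail early if an expected column is missing
missing = {"gene_id", "gene_type", "length"} - set(df.columns)
if missing:
    raise ValueError(f"missing columns: {sorted(missing)}")

# Quartiles per gene_type: the values a box plot draws
length_summary = df.groupby("gene_type")["length"].describe()

# Counts of the five most common gene types: the values a pie chart draws
top_counts = df["gene_type"].value_counts().head(5)

print(length_summary[["count", "25%", "50%", "75%"]])
print(top_counts)
```

To run the same check on the real data, replace `sample` with the path `'gene.txt'` in the `pd.read_csv` call.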