Write a Python script that reads a FASTA-format sequence file and outputs a FASTA file without line breaks or blank lines

Posted: 2024-10-19 11:10:10

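In Python, you can use the built-in `open()` function together with the regular-expression module `re` to read a FASTA file and remove the line breaks inside each sequence as well as any blank lines. The script below keeps every header line (the lines starting with `>`), joins the sequence lines that follow it into a single line, and skips blank lines. A simple example:

```python
import re


def read_fasta(filename):
    """Return a list of (header, sequence) tuples, one per FASTA record."""
    records = []
    header = None   # header of the record being read (without the leading ">")
    chunks = []     # sequence pieces collected for the current record
    with open(filename, "r") as f:
        for line in f:
            line = line.strip()  # drop the newline / carriage return
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header starts a new record: store the previous one first
                if header is not None:
                    records.append((header, "".join(chunks)))
                header = line[1:]
                chunks = []
            elif header is not None:
                # Sequence line: remove any whitespace left inside it
                chunks.append(re.sub(r"\s+", "", line))
    # Store the last record, if there is one
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Use the function and save the result to a new FASTA file
records = read_fasta("input.fasta")
with open("output.fasta", "w") as out_file:
    for header, seq in records:
        out_file.write(f">{header}\n{seq}\n")
```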
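For large files it may be preferable not to hold every sequence in memory. The following is a minimal sketch of a streaming variant, under the assumption that headers start with `>` and that everything else is sequence text. The function name `unwrap_fasta` and the sample data are made up for illustration and are not part of the original answer.

```python
import os
import tempfile


def unwrap_fasta(src_path, dst_path):
    """Copy a FASTA file, writing each record's sequence on a single line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        have_seq = False  # True while a sequence line is open (no newline written yet)
        for raw in src:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if have_seq:
                    dst.write("\n")  # close the previous record's sequence line
                    have_seq = False
                dst.write(line + "\n")  # headers are copied unchanged
            else:
                # Append the sequence piece without a newline, dropping inner whitespace
                dst.write("".join(line.split()))
                have_seq = True
        if have_seq:
            dst.write("\n")  # close the last sequence line


# Small demonstration on a temporary file (hypothetical sample data)
sample = ">seq1 demo\nACGT\n\nNNAC\n>seq2\n\nMKVLA\n"
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.fasta")
    dst = os.path.join(tmp, "output.fasta")
    with open(src, "w") as fh:
        fh.write(sample)
    unwrap_fasta(src, dst)
    with open(dst) as fh:
        result = fh.read()
print(result)
```

Because each line is written as soon as it is read, memory use stays constant regardless of file size. Unlike a version that filters for `ACGT` only, this sketch leaves the sequence characters untouched, so ambiguous bases such as `N` and protein sequences pass through unchanged.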
