Please help me write a Python program using regular expressions that finds the "protein_id=" string (one that is not preceded by '_') in 'ID=cds-NP_001096854.1;Parent=rna-NM_001103384.3;Dbxref=FLYBASE:FBpp0111834,GeneID:5740847,Genbank:NP_001096854.1,FLYBASE:FBgn0025837;orig_protein_id=gnl|FlyBase|CG17636-PC|gb|AFH07158;orig_transcript_id=gnl|FlyBase|CG17636-RC;product=uncharacterized protein%2C transcript variant C;Name=NP_001096854.1;gbkey=CDS;gene=CG17636;locus_tag=Dmel_CG17636;orig_transcript_id=gnl|FlyBase|CG17636-RA;product=uncharacterized protein%2C isoform A;protein_id=NP_001096854.1'
Time: 2023-03-19 22:21:21
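Here is a Python regular expression that does this. Because "orig_protein_id=" also contains "protein_id=", the pattern uses the negative lookbehind (?<!_) to skip any match preceded by '_'. It then captures the value up to the next ';':

    import re

    # Shortened excerpt of the string from the question
    string = 'orig_protein_id=gnl|FlyBase|CG17636-PC|gb|AFH07158;protein_id=NP_001096854.1'

    pattern = re.compile(r'(?<!_)protein_id=([^;]+)')
    result = pattern.search(string)
    if result:  # search() returns None when nothing matches
        protein_id = result.group(1)
        print(protein_id)  # NP_001096854.1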
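The question requires that "protein_id=" not be preceded by '_'. The following sketch shows why that condition matters. It uses re.findall, which returns every captured value instead of only the first, to run the same pattern with and without the negative lookbehind (?<!_) over the full string from the question:

```python
import re

# Attribute string copied verbatim from the question
attrs = ('ID=cds-NP_001096854.1;Parent=rna-NM_001103384.3;'
         'Dbxref=FLYBASE:FBpp0111834,GeneID:5740847,Genbank:NP_001096854.1,FLYBASE:FBgn0025837;'
         'orig_protein_id=gnl|FlyBase|CG17636-PC|gb|AFH07158;'
         'orig_transcript_id=gnl|FlyBase|CG17636-RC;'
         'product=uncharacterized protein%2C transcript variant C;'
         'Name=NP_001096854.1;gbkey=CDS;gene=CG17636;locus_tag=Dmel_CG17636;'
         'orig_transcript_id=gnl|FlyBase|CG17636-RA;'
         'product=uncharacterized protein%2C isoform A;'
         'protein_id=NP_001096854.1')

# Without the lookbehind, "protein_id=" also matches inside "orig_protein_id="
naive = re.findall(r'protein_id=([^;]+)', attrs)
# (?<!_) rejects any occurrence whose preceding character is '_'
strict = re.findall(r'(?<!_)protein_id=([^;]+)', attrs)

print(naive)   # ['gnl|FlyBase|CG17636-PC|gb|AFH07158', 'NP_001096854.1']
print(strict)  # ['NP_001096854.1']
```

Capturing [^;]+ rather than a fixed character class keeps underscores and dots in the value. A class such as [A-Za-z0-9.-]+ would stop at the '_' in "NP_001096854.1".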
Related questions
Please help me write a Python program using regular expressions that finds the value of "protein_id=" in 'ID=cds-NP_001096854.1;Parent=rna-NM_001103384.3;Dbxref=FLYBASE:FBpp0111834,GeneID:5740847,Genbank:NP_001096854.1,FLYBASE:FBgn0025837;orig_protein_id=gnl|FlyBase|CG17636-PC|gb|AFH07158;orig_transcript_id=gnl|FlyBase|CG17636-RC;product=uncharacterized protein%2C transcript variant C;Name=NP_001096854.1;gbkey=CDS;gene=CG17636;locus_tag=Dmel_CG17636;orig_transcript_id=gnl|FlyBase|CG17636-RA;product=uncharacterized protein%2C isoform A;protein_id=NP_001096854.1'
First, import Python's regular expression module with import re. Then search the attribute string from the question (stored in a variable named string) for the value of "protein_id". This requires two conditions. First, the value ends at the next ';' or at the end of the string, not at a space. Second, the "protein_id=" inside "orig_protein_id=" must not match. The pattern therefore captures [^;]+ after a "protein_id=" that is not preceded by '_':

    protein_id = re.search(r"(?<!_)protein_id=([^;]+)", string).group(1)  # 'NP_001096854.1'
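An alternative to a targeted search is to parse every key=value pair with one regular expression and then look keys up exactly. With exact lookups, "orig_protein_id" is simply a different key. This sketch assumes the string is a GFF3-style attribute column, as its ID, Parent and Dbxref keys suggest. parse_attributes is a hypothetical helper name. Decoding %2C with urllib.parse.unquote assumes GFF3-style percent-encoding:

```python
import re
from urllib.parse import unquote


def parse_attributes(attrs: str) -> dict[str, list[str]]:
    """Hypothetical helper: split a GFF3-style attribute string into {key: [values]}.

    Repeated keys (here orig_transcript_id and product) keep every value in order.
    """
    parsed: dict[str, list[str]] = {}
    # Each pair is "key=value"; pairs are separated by ';', so a value runs to the next ';'
    for key, value in re.findall(r'([^=;]+)=([^;]*)', attrs):
        # Decode percent-escapes such as %2C -> ','
        parsed.setdefault(key, []).append(unquote(value))
    return parsed


# Attribute string copied verbatim from the question
attrs = ('ID=cds-NP_001096854.1;Parent=rna-NM_001103384.3;'
         'Dbxref=FLYBASE:FBpp0111834,GeneID:5740847,Genbank:NP_001096854.1,FLYBASE:FBgn0025837;'
         'orig_protein_id=gnl|FlyBase|CG17636-PC|gb|AFH07158;'
         'orig_transcript_id=gnl|FlyBase|CG17636-RC;'
         'product=uncharacterized protein%2C transcript variant C;'
         'Name=NP_001096854.1;gbkey=CDS;gene=CG17636;locus_tag=Dmel_CG17636;'
         'orig_transcript_id=gnl|FlyBase|CG17636-RA;'
         'product=uncharacterized protein%2C isoform A;'
         'protein_id=NP_001096854.1')

fields = parse_attributes(attrs)
print(fields['protein_id'])          # ['NP_001096854.1']
print(fields['orig_transcript_id'])  # ['gnl|FlyBase|CG17636-RC', 'gnl|FlyBase|CG17636-RA']
print(fields['product'][0])          # uncharacterized protein, transcript variant C
```

Storing a list per key keeps both values of the repeated orig_transcript_id and product keys. A single re.search would return only the first of them.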