A Python Tool for Automated Retrieval of Protein Data from the CAZy Database

Updated 2024-11-17 · 1.12 MB ZIP

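Resource summary: cazy_webscraper is a Python 3 package that automates the retrieval of all protein data from the CAZy website database. CAZy (the Carbohydrate-Active enZYmes Database) stores and classifies carbohydrate-active enzymes (CAZymes). cazy_webscraper builds a local SQL database that stores the protein data retrieved from the CAZy website and lets users query that data more comprehensively than the CAZy website itself allows. The package also includes a module named expand, which extends the retrieved protein sequence data and can also retrieve the associated protein structure files from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank.

cazy_webscraper is highly configurable. Users can scrape the entire database, selected CAZy classes, or specific CAZy families, and can apply taxonomic filters that restrict the scrape to CAZymes from particular kingdoms, genera, species, and/or strains. The package is released under the MIT license, so it may be used free of charge provided appropriate attribution is given. Detailed documentation and an entity-relationship (ER) model diagram in the repository root explain how to use the tool and how its data is organized.

The package is tagged 'scraper', indicating that it is a web scraper that collects data from the internet. The 'cazy' tag identifies the CAZy database as its data source, and 'cazymes' refers to the enzymes that database covers, the carbohydrate-active enzymes. The 'HTML' tag likely reflects that the program processes or generates HTML content when interacting with web pages.

'cazy_webscraper-master' is the name of the archive file; extracting it yields the package's source code. 'master' conventionally names the main branch in version control systems such as Git, so the archive probably contains the latest development version of the package.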
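The local database and taxonomic filtering described in the summary can be illustrated with a minimal sketch. It uses only Python's standard `sqlite3` module and does not call cazy_webscraper itself. The table layout, the column names, the sample rows, and the helpers `build_database` and `find_accessions` are all assumptions made up for illustration; the package's actual schema is the one shown in its ER model diagram.

```python
import sqlite3

# Hypothetical schema for illustration only; the real cazy_webscraper
# schema is described by the ER model shipped with the package.
SCHEMA = """
CREATE TABLE taxonomy (
    taxonomy_id INTEGER PRIMARY KEY,
    kingdom TEXT NOT NULL,
    genus TEXT NOT NULL,
    species TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    accession TEXT NOT NULL UNIQUE,
    family TEXT NOT NULL,
    taxonomy_id INTEGER NOT NULL REFERENCES taxonomy(taxonomy_id)
);
"""

# Placeholder rows, not real CAZy records.
TAXA = [
    (1, "Bacteria", "Genus_a", "species_a"),
    (2, "Eukaryota", "Genus_b", "species_b"),
]
PROTEINS = [
    (1, "ACC_0001", "GH1", 1),
    (2, "ACC_0002", "GH1", 2),
    (3, "ACC_0003", "PL2", 1),
]


def build_database(path=":memory:"):
    """Create the illustrative tables and load the placeholder rows."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO taxonomy VALUES (?, ?, ?, ?)", TAXA)
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)", PROTEINS)
    conn.commit()
    return conn


def find_accessions(conn, family=None, kingdom=None):
    """Return sorted accessions, optionally filtered by family and kingdom."""
    query = (
        "SELECT p.accession FROM protein AS p "
        "JOIN taxonomy AS t ON t.taxonomy_id = p.taxonomy_id"
    )
    clauses, params = [], []
    if family is not None:
        clauses.append("p.family = ?")
        params.append(family)
    if kingdom is not None:
        clauses.append("t.kingdom = ?")
        params.append(kingdom)
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    query += " ORDER BY p.accession"
    # Values are bound as parameters, never interpolated into the SQL text.
    return [row[0] for row in conn.execute(query, params)]


if __name__ == "__main__":
    conn = build_database()
    print(find_accessions(conn, family="GH1", kingdom="Bacteria"))
    conn.close()
```

Combining a family filter with a kingdom filter in a single query is the kind of cross-cutting question a local relational copy makes straightforward. Passing a file path instead of `":memory:"` to `build_database` would keep the sketch database on disk.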


cazy_webscraper: a web scraper that retrieves all protein data catalogued by the CAZy website database (116 files)

kb_pag_page.html 96KB

http___www_cazy_org_GH60_all_html.html 19KB

test_crawler_parse_html_pages.py 5KB

protein_no_gbks.html 864B

bioconda-badge-wide.png 103KB

no_protein_total.html 93KB

cazy_homepage.html 16KB

cazy_gh147_page.html 92KB

sql_orm.py 18KB

no_table.html 21KB

test_crawler_get_html_pages.py 24KB

incorrect_format.html 19KB

SOURCES.txt 413B

test_crawler_scrape_all.py 26KB

protein_multiple_primary_gbks.html 1KB

setup.py 2KB

__init__.py 19KB

write_fasta_from_db.py 0B

no_cazyme_table.html 19KB

test_crawler_scrape_kingdoms.py 27KB

get_cazy_pages.py 31KB

license.rst 1KB

pract.py 403B

make.bat 799B

__init__.py 25KB

cazy_classpage_no_subfams.html 108KB

.gitignore 251B

kb_incomplete.html 22KB

scrape_all.py 34KB

Format_and_parsing_errors_CW_timestamp.log 332B

protein_without_ec.html 926B

Makefile 638B

cazy_classpage_no_urls.html 55KB

configuration_scraper.rst 13KB

add_cazyme_data.py 14KB

__init__.py 0B

cazy_homepage_no_urls.html 16KB

protein_no_primary_genbanks.html 917B

test_family_urls.txt 5KB

README.md 204B

test_webscraper.py 26KB

empty_fam.html 19KB

__init__.py 17KB

family_urls.txt 6KB

__init__.py 0B

get_pdb_structures.py 12KB

test_sql_interface.py 14KB

pdb.rst 2KB

getting_started_poster.pdf 725KB

no_pagination_pag.html 94KB

CAZy_connection_failures_CW_timestamp.log 600B

PKG-INFO 15KB

http___www_cazy_org_GH21_all_html.html 19KB

__init__.py 21KB

cazy_dictionary.json 2KB

__init__.py 1KB

README.md 219B

parse_local_pages.py 11KB

subfamily_urls.txt 6KB

deleted_fam.html 19KB

pag_page.html 715KB

cazy_webscraper.py 18KB

test_sql_queries.py 0B

SQL_errors_CW_timestamp.log 512B

sequence.fasta 226B

theme_overrides.css 374B

genbank.rst 3KB

LICENSE 1KB

test_expand.py 27KB

cazy_dictionary.json 2KB

test_sql_interface__init__.py 10KB

kb_no_pag.html 30KB

test_sql_orm.py 4KB

cazy_homepage_no_spip_out.html 6KB

test_crawler.py 22KB

README.md 19KB

README.md 270B

__init__.py 3KB

test_file_io.py 8KB

settings.json 690B

unit_test_2021-04-27--11-54-58.db 144KB

index.rst 8KB

__init__.py 1KB

http___www_cazy_org_GH61_all_html.html 19KB

cazy_classpage.html 115KB

scrape_by_kingdom.py 35KB

cazy_classpage_incorrect_urls.html 115KB

no_table.html 19KB

conftest.py 2KB

__init__.py 1KB

proteins.txt 9KB

__init__.py 5KB

conf.py 2KB

protein_with_ec.html 1KB

test_utilities.py 3KB

test_parse_configuration.py 20KB

__init__.py 1KB

test_log 0B

get_genbank_sequences.py 36KB


Uploader: 狛绝的追随者

