
Published as a workshop paper at ICLR 2019
GENERATIVE MODELS FOR GRAPH-BASED PROTEIN DESIGN
John Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola
CSAIL, MIT
ABSTRACT
Engineered proteins offer the potential to solve many problems in biomedicine, energy, and materials science, but creating designs that succeed is difficult in practice. A significant aspect of this challenge is the complex coupling between protein sequence and 3D structure, and the task of finding a viable design is often referred to as the inverse protein folding problem. We develop generative models for protein sequences conditioned on a graph-structured specification of the design target. Our approach efficiently captures the complex dependencies in proteins by focusing on those that are long-range in sequence but local in 3D space. Our framework significantly improves upon prior parametric models of protein sequences given structure, and takes a step toward rapid and targeted biomolecular design with the aid of deep generative models.
1 INTRODUCTION
A central goal for computational protein design is to automate the invention of protein molecules
with defined structural and functional properties. This field has seen tremendous progress in the past
two decades (Huang et al., 2016), including the design of novel 3D folds (Kuhlman et al., 2003),
enzymes (Siegel et al., 2010), and complexes (Bale et al., 2016). However, the current practice often
requires multiple rounds of trial-and-error, with first designs frequently failing (Koga et al., 2012;
Rocklin et al., 2017). Several of these challenges stem from the bottom-up nature of contemporary approaches, which rely both on the accuracy of energy functions to describe protein physics and on the efficiency of sampling algorithms to explore the protein sequence and structure space.
Here, we explore an alternative, top-down framework for protein design that directly learns a conditional generative model for protein sequences given a specification of the target structure, which
is represented as a graph over the sequence elements. Specifically, we augment the autoregressive
self-attention of recent sequence models (Vaswani et al., 2017) with graph-based descriptions of the
3D structure. By composing multiple layers of structured self-attention, our model can effectively
capture higher-order, interaction-based dependencies between sequence and structure, in contrast to
previous parametric approaches (O’Connell et al., 2018; Wang et al., 2018) that are limited to first-order effects.
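To make this concrete, the following is a minimal sketch of one layer of graph-restricted self-attention, in the spirit of the approach described above but not the authors' exact architecture: the class name, tensor shapes, single attention head, and the concatenation of node and edge features are all illustrative assumptions, and the causal masking required for autoregressive decoding is omitted for brevity.

```python
import torch
import torch.nn as nn

class SparseGraphAttention(nn.Module):
    """Self-attention restricted to each residue's spatial neighborhood (illustrative sketch)."""

    def __init__(self, d_model: int, d_edge: int):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model + d_edge, d_model)
        self.W_v = nn.Linear(d_model + d_edge, d_model)
        self.scale = d_model ** -0.5

    def forward(self, h, e, neighbor_idx):
        # h: [L, d_model] residue embeddings; e: [L, k, d_edge] edge features
        # neighbor_idx: [L, k] indices of each residue's k nearest 3D neighbors
        h_nbr = h[neighbor_idx]                    # gather neighbor embeddings: [L, k, d_model]
        kv = torch.cat([h_nbr, e], dim=-1)         # join node and edge features
        q = self.W_q(h).unsqueeze(1)               # [L, 1, d_model]
        logits = (q * self.W_k(kv)).sum(-1) * self.scale   # [L, k] attention logits
        att = torch.softmax(logits, dim=-1)
        return (att.unsqueeze(-1) * self.W_v(kv)).sum(1)   # [L, d_model]
```

Because each position attends only to a fixed set of k neighbors rather than the full sequence, stacking several such layers composes local structural interactions into the higher-order dependencies discussed above.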
The graph-structured conditioning of a sequence model affords several benefits, including favorable computational efficiency, inductive bias, and representational flexibility. We accomplish the first two by leveraging a well-evidenced finding in protein science, namely that long-range dependencies in sequence are generally short-range in 3D space (Marks et al., 2011; Morcos et al., 2011; Balakrishnan et al., 2011). By making the graph and self-attention similarly sparse and localized in 3D space, we achieve computational scaling that is linear in sequence length. Additionally, graph-structured inputs offer representational flexibility, as they accommodate both coarse, ‘flexible backbone’ (connectivity and topology) and fine-grained (precise atom locations) descriptions of structure.
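As a rough illustration of where the sparsity comes from, the snippet below builds a k-nearest-neighbor graph from per-residue 3D coordinates; restricting each layer's attention to these k neighbors yields a per-layer cost of O(Lk), linear in sequence length L. This is a sketch under stated assumptions (one representative atom per residue, e.g. the alpha carbon, and an illustrative default k); the brute-force distance computation shown here is itself quadratic and is used only for clarity, whereas a spatial index would avoid it for long chains.

```python
import torch

def knn_graph(coords: torch.Tensor, k: int = 30) -> torch.Tensor:
    """Indices of each residue's k nearest neighbors in 3D (coords: [L, 3])."""
    dist = torch.cdist(coords, coords)      # [L, L] pairwise Euclidean distances
    dist.fill_diagonal_(float("inf"))       # exclude self-edges
    _, neighbor_idx = dist.topk(k, dim=-1, largest=False)
    return neighbor_idx                     # [L, k]
```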
We demonstrate the merits of our approach via a detailed empirical study. Specifically, we evaluate
our model on structural generalization to sequences of protein folds that were outside of the training set. Our model achieves considerably improved generalization performance over recent deep models of protein sequence given structure as well as over structure-naïve language models.