cnn与transformer融合
时间: 2023-09-25 07:16:57 浏览: 92
CoAtNet是一种将CNN和Transformer结合起来的通用视觉模型。融合CNN和Transformer的正确方法是通过使用MBConv块和相对自注意力来将平移同变性、输入自适应加权和全局感受野融合在一起。CoAtNet利用了CNN和Transformer的超强能力,并在经典的ViT结构上引入了由3x3卷积组成的Conv Stem和由Depth-wise卷积和自注意力机制组合而成的CMT模块。通过这种融合,CoAtNet能够在不增加太多计算量的情况下大幅度提升视觉网络的精度。在ImageNet和下游任务上的实验证明了CoAtNet架构的优越性。 <span class="em">1</span><span class="em">2</span><span class="em">3</span>
#### 引用[.reference_title]
- *1* *2* [论文导读:CoAtNet是如何完美结合 CNN 和 Transformer的](https://blog.csdn.net/m0_46510245/article/details/123372945)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_2"}}] [.reference_item style="max-width: 50%"]
- *3* [CVPR22 |CMT:CNN和Transformer的高效结合(开源)](https://blog.csdn.net/qq_29462849/article/details/125551051)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_2"}}] [.reference_item style="max-width: 50%"]
[ .reference_list ]