GPT-4 Technical Report: A Breakthrough in Large Multimodal Models
"本文档是关于GPT-4技术报告的摘要,由OpenAI发布。GPT-4是一个大规模的多模态模型,能够接受图像和文本输入,并生成文本输出。尽管在许多现实世界场景中,GPT-4的能力不如人类,但它在各种专业和学术基准测试中表现出与人类相当的水平,包括模拟考试,成绩达到前10%的测试者水平。GPT-4基于Transformer架构,通过预训练来预测文档中的下一个标记。经过后训练的对齐过程,模型在事实准确性以及遵循期望行为方面的表现得到提升。项目的关键部分是开发能够在大规模范围内预测性行为的基础设施和优化方法。这使得即使在使用比GPT-4训练所用计算资源小1/1000的情况下,也能准确预测模型的部分性能。
1. Introduction
This technical report introduces GPT-4, a large-scale multimodal model that can process image and text inputs and produce text outputs. Research on such models is important because of their potential for broad application across many domains.
### Technical Features and Capabilities of GPT-4
GPT-4's core feature is its multimodal processing ability: it can understand and process image and text information together. This broadens its scope beyond pure-text natural language processing (NLP) tasks to complex tasks such as image understanding and question answering over combined image-and-text inputs. Its high scores on simulated exams demonstrate strong abilities in comprehension, reasoning, and applying knowledge.
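The report deliberately does not disclose GPT-4's architecture, so the sketch below is only a hypothetical illustration of one common way multimodal models combine the two input types: image features are projected into the text embedding space and joined to the token sequence before a Transformer stack. The class name, dimensions, and the use of a small unmasked encoder are assumptions made for brevity, not details from the report.

```python
import torch
import torch.nn as nn

class ToyMultimodalLM(nn.Module):
    """Illustrative sketch only: fuse image features with text tokens by
    projecting both into a shared embedding space (not GPT-4's actual design)."""

    def __init__(self, vocab_size=32000, d_model=512, image_feat_dim=768):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)          # text tokens -> vectors
        self.image_proj = nn.Linear(image_feat_dim, d_model)        # image features -> same space
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)  # tiny stand-in for the LM stack
        self.lm_head = nn.Linear(d_model, vocab_size)               # next-token logits

    def forward(self, image_feats, token_ids):
        # image_feats: (batch, n_patches, image_feat_dim); token_ids: (batch, seq_len)
        img = self.image_proj(image_feats)       # treat projected patches as extra "tokens"
        txt = self.token_emb(token_ids)
        x = torch.cat([img, txt], dim=1)         # prepend image tokens to the text sequence
        return self.lm_head(self.backbone(x))    # logits over the joint sequence

# Toy usage with random inputs:
model = ToyMultimodalLM()
logits = model(torch.randn(1, 16, 768), torch.randint(0, 32000, (1, 32)))
print(logits.shape)  # torch.Size([1, 48, 32000])
```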
### Pre-training and Post-training Alignment
GPT-4 is based on the Transformer architecture, a deep learning model particularly well suited to sequence data such as text. During pre-training, the model learns language patterns by predicting the next token from the preceding context. A post-training alignment stage then further tunes the model so that it is more factually accurate, avoids harmful outputs, and follows desired behavior; this stage uses supervised fine-tuning and reinforcement learning from human feedback (RLHF) to make the model's outputs more reliable and safe.
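To make the pre-training objective concrete, here is a minimal sketch of the next-token prediction loss in PyTorch: the model's prediction at each position is scored against the token that follows it. The sizes and random inputs are hypothetical; this illustrates the objective only, not OpenAI's training code.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, token_ids):
    """Cross-entropy for next-token prediction.

    logits:    (batch, seq_len, vocab_size) model outputs
    token_ids: (batch, seq_len) input token ids
    Position t is trained to predict the token at position t + 1.
    """
    pred = logits[:, :-1, :]      # predictions for positions 0 .. T-2
    target = token_ids[:, 1:]     # targets are the inputs shifted left by one
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)), target.reshape(-1))

# Toy usage with random data:
batch, seq_len, vocab = 2, 16, 100
loss = next_token_loss(torch.randn(batch, seq_len, vocab),
                       torch.randint(0, vocab, (batch, seq_len)))
print(float(loss))  # roughly log(vocab) ~ 4.6 for a random, untrained model
```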
### Infrastructure and Optimization Methods
A key achievement of the project team is infrastructure and optimization methods whose behavior is predictable across scales, meaning the model's performance can be forecast at very different training scales. This allows researchers to estimate a large model's performance from much smaller, cheaper training runs before committing to the full run, which is important for controlling cost and improving efficiency, and it offers a useful template for future large-model development.
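The GPT-4 report describes predicting the final model's loss by fitting a scaling law with an irreducible-loss term, roughly L(C) = a * C^b + c, to smaller runs trained with the same methodology. The sketch below shows the idea with made-up (compute, loss) data points; the values, the initial guess, and the use of scipy.optimize.curve_fit are illustrative assumptions, not OpenAI's actual data or code.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements from small training runs: compute as a fraction
# of the final run's compute, and the final training loss observed at that scale.
compute = np.array([1e-6, 1e-5, 1e-4, 1e-3, 1e-2])
loss = np.array([4.10, 3.55, 3.12, 2.78, 2.51])

def scaling_law(c, a, b, irreducible):
    """Power law with an irreducible-loss floor: L(C) = a * C**b + irreducible."""
    return a * np.power(c, b) + irreducible

# Fit the curve to the small runs, then extrapolate to the full run (compute = 1.0).
params, _ = curve_fit(scaling_law, compute, loss, p0=(1.0, -0.1, 1.0), maxfev=10000)
predicted = scaling_law(1.0, *params)
print(f"a={params[0]:.3f}, b={params[1]:.3f}, floor={params[2]:.3f}, "
      f"predicted full-run loss={predicted:.3f}")
```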
### Application Prospects
GPT-4's performance and versatility give it potential applications in many areas, such as virtual assistants, automated content generation, education, medical information analysis, and legal document understanding. At the same time, its limitations and potential risks must be kept in mind, including misleading outputs, privacy issues, and ethical considerations.
### Conclusion
As OpenAI's latest result, GPT-4 demonstrates the great potential of multimodal models for advancing artificial intelligence. Challenges remain, but these advances lay the groundwork for smarter AI systems that are better adapted to the real world. As models scale up and algorithms improve, we can expect more innovative applications to emerge.
Dataset contributions (all author lists are sorted alphabetically)
Diogo Almeida, Mo Bavarian, Juan Felipe Cerón Uribe, Tyna Eloundou, Liam Fedus, Tarun Gogineni, Rapha Gontijo-Lopes, Jonathan Gordon, Joost Huizinga, Shawn Jain, Roger Jiang, Łukasz Kaiser, Christina Kim, Jan Leike, Chak Ming Li, Stephanie Lin, Ryan Lowe, Jacob Menick, Luke Metz, Pamela Mishkin, Tong Mu, Oleg Murk, Ashvin Nair, Long Ouyang, Alex Passos, Michael (Rai) Pokorny, Vitchyr Pong, Shibani Santurkar, Daniel Selsam, Sarah Shoker, Carroll Wainwright, Matt Wiethoff, Jeff Wu, Kai Xiao, Kevin Yu, Marvin Zhang, Chong Zhang, William Zhuk, Barret Zoph
Data infrastructure
Irwan Bello, Lenny Bogdonoff, Juan Felipe Cerón Uribe, Joshua
Gross, Shawn Jain, Haozhun Jin, Christina Kim, Aris Konstantinidis,
Teddy Lee, David Medina, Jacob Menick, Luke Metz, Ashvin Nair,
Long Ouyang, Michael (Rai) Pokorny, Vitchyr Pong, John Schulman,
Jonathan Ward, Jiayi Weng, Matt Wiethoff, Sarah Yoo, Kevin Yu,
Wojciech Zaremba, William Zhuk, Barret Zoph
ChatML format
Ilge Akkaya, Christina Kim, Chak Ming Li, Rachel Lim, Jacob Menick, Luke Metz, Andrey Mishchenko, Vitchyr Pong, John Schulman, Carroll Wainwright, Barret Zoph
Model safety
Josh Achiam, Steven Adler, Juan Felipe Cerón Uribe, Hyung Won Chung, Tyna Eloundou, Rapha Gontijo-Lopes, Shixiang Shane Gu, Johannes Heidecke, Joost Huizinga, Teddy Lee, Jan Leike, Stephanie Lin, Ryan Lowe, Todor Markov, Luke Metz, Tong Mu, Shibani Santurkar, John Schulman, Andrea Vallone, Carroll Wainwright, Jason Wei, Lilian Weng, Kai Xiao, Chong Zhang, Marvin Zhang, Barret Zoph
Refusals
Juan Felipe Cerón Uribe, Tyna Eloundou, Johannes Heidecke, Joost
Huizinga, Jan Leike, Stephanie Lin, Ryan Lowe, Pamela Mishkin,
Tong Mu, Carroll Wainwright, Lilian Weng, Kai Xiao, Chong Zhang,
Barret Zoph
Foundational RLHF and InstructGPT work
Diogo Almeida, Joost Huizinga, Roger Jiang, Jan Leike, Stephanie Lin, Ryan Lowe, Pamela Mishkin, Dan Mossing, Long Ouyang, Katarina Slama, Carroll Wainwright, Jeff Wu, Kai Xiao, Marvin Zhang
Flagship training runs
Greg Brockman, Liam Fedus, Johannes Heidecke, Joost Huizinga,
Roger Jiang, Kyle Kosic, Luke Metz, Ashvin Nair, Jiayi Weng,
Chong Zhang, Shengjia Zhao, Barret Zoph
Code capability
Ilge Akkaya, Mo Bavarian, Jonathan Gordon, Shawn Jain, Haozhun
Jin, Teddy Lee, Chak Ming Li, Oleg Murk, Ashvin Nair, Vitchyr
Pong, Benjamin Sokolowsky, Jerry Tworek, Matt Wiethoff, Sarah
Yoo, Kevin Yu, Wojciech Zaremba, William Zhuk
Evaluation & analysis
Core contributors
Sandhini Agarwal System card co-lead
Lama Ahmad Expert red teaming & adversarial testing program lead
Mo Bavarian Capability prediction co-lead
Tyna Eloundou Safety evaluations co-lead
Andrew Kondrich OpenAI Evals open-sourcing co-lead
Gretchen Krueger System card co-lead
Michael Lampe Privacy and PII evaluations lead
Pamela Mishkin Economic impact & overreliance evaluations lead
Benjamin Sokolowsky Capability prediction co-lead
Jack Rae Research benchmark execution lead
Chelsea Voss Eval execution lead
Alvin Wang OpenAI Evals lead
Kai Xiao Safety evaluations co-lead
Marvin Zhang OpenAI Evals open-sourcing co-lead
OpenAI Evals library
Shixiang Shane Gu, Angela Jiang, Logan Kilpatrick, Andrew Kondrich, Pamela Mishkin, Jakub Pachocki, Ted Sanders, Jessica Shieh, Alvin Wang, Marvin Zhang
Model-graded evaluation infrastructure
Liam Fedus, Rapha Gontijo-Lopes, Shixiang Shane Gu, Andrew
Kondrich, Michael (Rai) Pokorny, Wojciech Zaremba, Chong Zhang,
Marvin Zhang, Shengjia Zhao, Barret Zoph
Acceleration forecasting
Alan Hickey, Daniel Kokotajlo, Cullen O’Keefe, Sarah Shoker
ChatGPT evaluations
Juan Felipe Cerón Uribe, Hyung Won Chung, Rapha Gontijo-Lopes,
Liam Fedus, Luke Metz, Michael Rai Pokorny, Jason Wei, Shengjia
Zhao, Barret Zoph
Capability evaluations
Tyna Eloundou, Shengli Hu, Roger Jiang, Jamie Kiros, Teddy Lee,
Scott Mayer McKinney, Jakub Pachocki, Alex Paino, Giambattista
Parascandolo, Boris Power, Raul Puri, Jack Rae, Nick Ryder, Ted
Sanders, Szymon Sidor, Benjamin Sokolowsky, Chelsea Voss, Alvin
Wang, Rowan Zellers, Juntang Zhuang
Coding evaluations
Ilge Akkaya, Mo Bavarian, Jonathan Gordon, Shawn Jain, Chak Ming
Li, Oleg Murk, Vitchyr Pong, Benjamin Sokolowsky, Jerry Tworek,
Kevin Yu, Wojciech Zaremba
Real-world use case evaluations
Andrew Kondrich, Joe Palermo, Boris Power, Ted Sanders
Contamination investigations
Adrien Ecoffet, Roger Jiang, Ingmar Kanitscheider, Scott Mayer McKinney, Alex Paino, Giambattista Parascandolo, Jack Rae, Qiming Yuan
Instruction following and API evals
Diogo Almeida, Carroll Wainwright, Marvin Zhang
Novel capability discovery
Filipe de Avila Belbute Peres, Kevin Button, Fotis Chantzis, Mike Heaton, Wade Hickey, Xin Hu, Andrew Kondrich, Matt Knight, Andrew Mayne, Jake McNeil, Vinnie Monaco, Joe Palermo, Joel Parish, Boris Power, Bob Rotsted, Ted Sanders
Vision evaluations
Shixiang Shane Gu, Shengli Hu, Jamie Kiros, Hyeonwoo Noh, Raul
Puri, Rowan Zellers
Economic impact evaluation
Tyna Eloundou, Sam Manning, Aalok Mehta, Pamela Mishkin
Non-proliferation, international humanitarian law & national security red teaming
Sarah Shoker
Overreliance analysis
Miles Brundage, Michael Lampe, Pamela Mishkin
Privacy and PII evaluations
Michael Lampe, Vinnie Monaco, Ashley Pantuliano
Safety and policy evaluations
Josh Achiam, Sandhini Agarwal, Lama Ahmad, Jeff Belgum, Tyna
Eloundou, Johannes Heidecke, Shengli Hu, Joost Huizinga, Jamie
Kiros, Gretchen Krueger, Michael Lampe, Stephanie Lin, Ryan
Lowe, Todor Markov, Vinnie Monaco, Tong Mu, Raul Puri, Girish
Sastry, Andrea Vallone, Carroll Wainwright, CJ Weinmann, Lilian
Weng, Kai Xiao, Chong Zhang
OpenAI adversarial testers
Josh Achiam, Steven Adler, Lama Ahmad, Shyamal Anadkat, Red Avila, Gabriel Bernadett-Shapiro, Anna-Luisa Brakman, Tim Brooks, Miles Brundage, Chelsea Carlson, Derek Chen, Hyung Won Chung, Jeremiah Currier, Daniel Kokotajlo, David Dohan, Adrien Ecoffet, Juston Forte, Vik Goel, Ryan Greene, Johannes Heidecke, Alan Hickey, Shengli Hu, Joost Huizinga, Janko, Tomer Kaftan, Ali Kamali, Nitish Shirish Keskar, Tabarak Khan, Hendrik Kirchner, Daniel Kokotajlo, Gretchen Krueger, Michael Lampe, Teddy Lee, Molly Lin, Ryan Lowe, Todor Markov, Jake McNeil, Pamela Mishkin, Vinnie Monaco, Daniel Mossing, Tong Mu, Oleg Murk, Cullen O’Keefe, Joe Palermo, Giambattista Parascandolo, Joel Parish, Boris Power, Alethea Power, Cameron Raymond, Francis Real, Bob Rotsted, Mario Salterelli, Sam Wolrich, Ted Sanders, Girish Sastry, Sarah Shoker, Shyamal Anadkat, Yang Song, Natalie Staudacher, Madeleine Thompson, Elizabeth Tseng, Chelsea Voss, Jason Wei, Chong Zhang
System card & broader impacts analysis
Steven Adler, Sandhini Agarwal, Lama Ahmad, Janko Altenschmidt, Jeff Belgum, Gabriel Bernadett-Shapiro, Miles Brundage, Derek Chen, Tyna Eloundou, Liam Fedus, Leo Gao, Vik Goel, Johannes Heidecke, Alan Hickey, Shengli Hu, Joost Huizinga, Daniel Kokotajlo, Gretchen Krueger, Michael Lampe, Jade Leung, Stephanie Lin, Ryan Lowe, Kim Malfacini, Todor Markov, Bianca Martin, Aalok Mehta, Pamela Mishkin, Tong Mu, Richard Ngo, Cullen O’Keefe, Joel Parish, Rai Pokorny, Bob Rotsted, Girish Sastry, Sarah Shoker, Andrea Vallone, Carroll Wainwright, CJ Weinmann, Lilian Weng, Dave Willner, Kai Xiao, Chong Zhang
Deployment
Core contributors
Steven Adler Early stage program management lead
Sandhini Agarwal Launch safety lead
Derek Chen Monitoring & response lead
Atty Eleti GPT-4 API co-lead
Joanne Jang GPT-4 product co-lead
Angela Jiang GPT-4 product co-lead
Tomer Kaftan Inference infrastructure & deployment lead
Rachel Lim GPT-4 API co-lead
Kim Malfacini Usage policy lead
Bianca Martin Release program management lead
Evan Morikawa Engineering lead
Henrique Ponde de Oliveira Pinto Inference workflow lead
Heather Schmidt GPT-4 infrastructure management
Maddie Simens Design lead
Felipe Petroski Such Inference optimization & reliability lead
Andrea Vallone Detection & refusals policy lead
Lilian Weng Applied research lead
Dave Willner Trust & safety lead
Michael Wu Inference research lead
Inference research
Paul Baltescu, Scott Gray, Yuchen He, Arvind Neelakantan, Michael
Wu
GPT-4 API & ChatML deployment
Greg Brockman, Brooke Chan, Chester Cho, Atty Eleti, Rachel Lim,
Andrew Peng, Michelle Pokrass, Sherwin Wu
GPT-4 web experience
Valerie Balcom, Lenny Bogdonoff, Jason Chen, Dave Cummings,
Noah Deutsch, Mike Heaton, Paul McMillan, Rajeev Nayak, Joel
Parish, Adam Perelman, Eric Sigler, Nick Turley, Arun Vijayvergiya,
Chelsea Voss
Inference infrastructure
Brooke Chan, Scott Gray, Chris Hallacy, Kenny Hsu, Tomer Kaftan,
Rachel Lim, Henrique Ponde de Oliveira Pinto, Raul Puri, Heather
Schmidt, Felipe Petroski Such
Reliability engineering
Haiming Bao, Madelaine Boyd, Ben Chess, Damien Deville, Yufei
Guo, Vishal Kuo, Ikai Lan, Michelle Pokrass, Carl Ross, David
Schnurr, Jordan Sitkin, Felipe Petroski Such
Trust & safety engineering
Jeff Belgum, Madelaine Boyd, Vik Goel
Trust & safety monitoring and response
Janko Altenschmidt, Anna-Luisa Brakman, Derek Chen, Florencia
Leoni Aleman, Molly Lin, Cameron Raymond, CJ Weinmann, Dave
Willner, Samuel Wolrich
Trust & safety policy
Rosie Campbell, Kim Malfacini, Andrea Vallone, Dave Willner
Deployment compute
Peter Hoeschele, Evan Morikawa
Product management
Jeff Harris, Joanne Jang, Angela Jiang
Additional contributions
Sam Altman, Katie Mayer, Bob McGrew, Mira Murati, Ilya Sutskever,
Peter Welinder
Blog post & paper content
Sandhini Agarwal, Greg Brockman, Miles Brundage, Adrien Ecoffet, Tyna Eloundou, David Farhi, Johannes Heidecke, Shengli Hu, Joost Huizinga, Roger Jiang, Gretchen Krueger, Jan Leike, Daniel Levy, Stephanie Lin, Ryan Lowe, Tong Mu, Hyeonwoo Noh, Jakub Pachocki, Jack Rae, Kendra Rimbach, Shibani Santurkar, Szymon Sidor, Benjamin Sokolowsky, Jie Tang, Chelsea Voss, Kai Xiao, Rowan Zellers, Chong Zhang, Marvin Zhang
Communications
Ruby Chen, Cory Decareaux, Thomas Degry, Steve Dowling, Niko
Felix, Elie Georges, Anna Makanju, Andrew Mayne, Aalok Mehta,
Elizabeth Proehl, Kendra Rimbach, Natalie Summers, Justin Jay
Wang, Hannah Wong
Compute allocation support
Theresa Lopez, Elizabeth Tseng
Contracting, revenue, pricing, & finance support
Brooke Chan, Denny Jin, Billie Jonn, Patricia Lue, Kyla Sheppard,
Lauren Workman
Launch partners & product operations
Filipe de Avila Belbute Peres, Brittany Carey, Simón Posada Fishman,
Isabella Fulford, Teddy Lee, Yaniv Markovski, Tolly Powell, Toki
Sherbakov, Jessica Shieh, Natalie Staudacher, Preston Tuggle
Legal
Jake Berdine, Che Chang, Sheila Dunning, Ashley Pantuliano
Security & privacy engineering
Kevin Button, Fotis Chantzis, Wade Hickey, Xin Hu, Shino Jomoto,
Matt Knight, Jake McNeil, Vinnie Monaco, Joel Parish, Bob Rotsted
System administration & on-call support
Morgan Grafstein, Francis Real, Mario Saltarelli
Authorship & credit attribution
David Farhi
We also acknowledge and thank every OpenAI team member not explicitly mentioned above,
including the amazing people on the executive assistant, finance, go to market, human resources,
legal, operations and recruiting teams. From hiring everyone in the company, to making sure we have
an amazing office space, to building the administrative, HR, legal, and financial structures that allow
us to do our best work, everyone at OpenAI has contributed to GPT-4.
We thank Microsoft for their partnership, especially Microsoft Azure for supporting model
training with infrastructure design and management, and the Microsoft Bing team and Microsoft’s
safety teams for their partnership on safe deployment.
We are grateful to our expert adversarial testers and red teamers who helped test our models at early stages of development and informed our risk assessments as well as the System Card.
Participation in this red teaming process is not an endorsement of the deployment plans of OpenAI or
OpenAI’s policies: Steven Basart, Sophie Duba, Cèsar Ferri, Heather Frase, Gavin Hartnett, Jake J.
Hecla, Dan Hendrycks, Jose Hernandez-Orallo, Alice Hunsberger, Rajiv W. Jain, Boru Gollo Jattani,
Lauren Kahn, Dan Kaszeta, Sara Kingsley, Noam Kolt, Nathan Labenz, Eric Liddick, Andrew J.
Lohn, Andrew MacPherson, Sam Manning, Mantas Mazeika, Anna Mills, Yael Moros, Jimin Mun,
Aviv Ovadya, Roya Pakzad, Yifan Peng, Ciel Qi, Alex Rosenblatt, Paul Röttger, Maarten Sap, Wout
Schellaert, George Shih, Muhammad Shoker, Melanie Subbiah, Bryan West, Andrew D. White, Anna
Katariina Wisakanto, Akhila Yerukola, Lexin Zhou, Xuhui Zhou.
We thank our collaborators at Casetext and Stanford CodeX for conducting the simulated
bar exam: P. Arredondo (Casetext/Stanford CodeX), D. Katz (Stanford CodeX), M. Bommarito
(Stanford CodeX), S. Gao (Casetext).
GPT-4 was used for help with wording, formatting, and styling throughout this work.