Artificial intelligence (AI) researchers have been developing and refining
large language models (LLMs) that exhibit remarkable capabilities across a
variety of domains and tasks, challenging our understanding of learning and
cognition. The latest model developed by OpenAI, GPT-4, was trained using an
unprecedented scale of compute and data. In this paper, we report on our
investigation of an early version of GPT-4, when it was still in active
development by OpenAI. We contend that (this early version of) GPT-4 is part of
a new cohort of LLMs (along with ChatGPT and Google's PaLM, for example) that
exhibit more general intelligence than previous AI models. We discuss the
rising capabilities and implications of these models. We demonstrate that,
beyond its mastery of language, GPT-4 can solve novel and difficult tasks that
span mathematics, coding, vision, medicine, law, psychology, and more, without
needing any special prompting. Moreover, on all of these tasks, GPT-4's
performance is strikingly close to human level, and it often vastly
surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's
capabilities, we believe that it could reasonably be viewed as an early (yet
still incomplete) version of an artificial general intelligence (AGI) system.
In our exploration of GPT-4, we put special emphasis on discovering its
limitations, and we discuss the challenges ahead for advancing towards deeper
and more comprehensive versions of AGI, including the possible need for
pursuing a new paradigm that moves beyond next-word prediction. We conclude
with reflections on the societal impact of this recent technological leap and
on future research directions.
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang
2023-03-22