Overview
The launch of OpenAI's chatbot ChatGPT in November 2022 marked a major milestone in the development of artificial intelligence, signaling broad adoption across domains such as entertainment, education, productivity, and healthcare. On February 15, 2024 (February 16 in China), OpenAI announced a new text-to-video model called Sora. According to OpenAI, Sora can generate up to 60 seconds of video from a text prompt, including detailed backgrounds, multi-angle shots, and emotionally expressive characters.
Sora extends advanced AI capabilities into the video domain. The model demonstrates progress in understanding and creating complex visual content, with implications for the content production, entertainment, and film industries.
Sora demonstrations
Compilation of Sora demonstration videos released at launch.
01. Foundational reflections on Sora
Humans perceive and recognize the world by collecting and interpreting various types of information, which enables cognition and transformation of the environment. Information exists in many formats, including text, symbols, tables, audio, images, video, and animation.
Sora extends to machines the ability to understand, simulate, and generate aspects of the real world, something that was previously a human-only process. With it, generative models now cover text, images, audio, and video, formats that previously required human creation. This expansion suggests substantial changes in how visual and multimodal content can be produced.
02. Core technology of large models: machine learning
Large models rely on natural language processing and deep learning. Specifically, they are typically large neural networks based on the Transformer architecture and trained at scale with self-supervised objectives (often loosely called unsupervised), such as autoregressive language modeling, which predicts the next token from the preceding ones, and masked language modeling, which recovers hidden tokens from their surrounding context.
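To make the two training objectives concrete, the following minimal sketch shows how training examples could be built from a single sentence for each of them. It is an illustration only, not how any particular model is trained: the function names `autoregressive_pairs` and `masked_example`, the `[MASK]` placeholder, the whitespace tokenization, and the masking probability are all assumptions made for this example.

```python
import random


def autoregressive_pairs(tokens):
    """Pair every prefix with the token that follows it (next-token prediction)."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_example(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens; the model must recover them from context."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[position] = token  # label: the original token at this position
        else:
            masked.append(token)
    return masked, targets


# Hypothetical toy sentence, split on whitespace instead of a real tokenizer.
sentence = "a cat sits on the mat".split()

# Autoregressive view: each growing prefix predicts the next token.
for context, target in autoregressive_pairs(sentence):
    print(context, "->", target)

# Masked view: some positions are hidden and become the prediction targets.
masked, targets = masked_example(sentence, mask_prob=0.3)
print(masked, targets)
```

In both cases the labels come from the text itself, which is why no human annotation is needed and why these objectives scale to very large corpora.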
Machine learning is complex in practice, but it becomes more accessible when compared with how people learn. Humans often begin by grasping abstract concepts, which corresponds to feature engineering in machine learning: deciding which properties of the raw input are worth representing. For example, when learning about triangles in geometry, students first recognize the symbols, then learn the definitions, properties, and formulas.
Teachers commonly assign homework to reinforce understanding; the process of completing assignments is analogous to training on labeled samples in machine learning. Training data are used to teach models and build the internal representations that capture knowledge.
Students may attend extra classes or work through practice exercises to obtain more varied and effective training examples, which is analogous to augmenting a training dataset. Periodic assessments such as weekly quizzes, midterms, and finals play the same role as test sets in machine learning: they evaluate performance on material that was not seen during practice. Accurate predictions on such unseen data are the end goal of many machine learning activities.
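The analogy above (concepts as features, homework as training data, exams as a test set) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended pipeline: the tiny hand-written triangle dataset, the function names, the side-ratio features, and the choice of a 1-nearest-neighbor classifier are all hypothetical choices made for this example.

```python
import random


def extract_features(sides):
    """Feature engineering: side ratios, so the result ignores overall size."""
    a, b, c = sorted(sides)
    return (a / c, b / c)


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold some back as the 'exam'."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict(train, features):
    """1-nearest-neighbor: return the label of the closest training example."""
    def squared_distance(other):
        return sum((x - y) ** 2 for x, y in zip(other, features))
    return min(train, key=lambda sample: squared_distance(sample[0]))[1]


def accuracy(train, test):
    """Fraction of held-out examples whose label is predicted correctly."""
    correct = sum(predict(train, features) == label for features, label in test)
    return correct / len(test)


# Hypothetical labeled 'homework': side lengths and the kind of triangle.
labeled = [
    ((3, 3, 3), "equilateral"),
    ((5, 5, 5), "equilateral"),
    ((2, 2, 2), "equilateral"),
    ((7, 7, 7), "equilateral"),
    ((3, 4, 5), "scalene"),
    ((5, 12, 13), "scalene"),
    ((6, 8, 10), "scalene"),
    ((4, 5, 6), "scalene"),
]

samples = [(extract_features(sides), label) for sides, label in labeled]
train, test = train_test_split(samples)
print("held-out accuracy:", accuracy(train, test))
```

The key design point is that the test examples are never shown to the model during training, just as exam questions are not handed out as homework; otherwise the score would measure memorization instead of understanding.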
Many computational techniques, including machine learning, are inspired by everyday human activities and practices, and can often be understood through familiar analogies.
Traditional cultural knowledge and wisdom can be transmitted without written language, and this inheritance can be seen as a form of encoded information passed across generations. By analogy, humans have now created successive generations of artificial intelligence built on previous knowledge and practice.
03. Potential disruptions and impacts of Sora
The introduction of Sora is likely to have significant effects on video production workflows and may create new opportunities across industries. The scope and depth of its impact could extend beyond content creation.
1. Industry and occupational disruption
According to OpenAI's documentation and demonstration videos, Sora is a powerful video-generation tool. If integrated into widely used products, similar to how DALL·E was incorporated into ChatGPT, a video model like Sora could gradually change sectors involved with video, including film production, short-form video creation, online education, advertising, game development, and live streaming.
2. Effects from blending virtual and physical realities
A model like Sora could generate scenarios and scenes based on user preferences and actions, enhancing interactive experiences. As media and virtual environments such as AR and VR become more immersive, virtual content may become harder to distinguish from reality, which raises questions about perception and user behavior.
3. Digitization and implications for social behavior
With advancing digital technologies, many human activities can be recorded and represented as data. Sora could facilitate the generation and reconstruction of detailed models or trajectories of real-world events based on value profiles, organizational or personal tags, and behavior records.
For example, it may become possible to generate video reconstructions without relying solely on camera footage or witness sketches. Greater transparency of transactions and holdings on digital platforms could follow, affecting how financial flows and assets are perceived and managed. These developments may reduce avenues for concealing illicit behavior and could influence social norms and ethical conduct, potentially shifting focus from material pursuits toward ethical and spiritual considerations.
Sora demonstrates not only video generation capability but also how a large model's understanding and simulation of the real world could drive broader societal changes.
04. Conclusion
There is a traditional Chinese saying about the number seven that reflects cycles and change. The 21st century presents rapid transitions and complexities that make older, 20th-century modes of thinking less sufficient. In a digital era, alternative approaches and strategic shifts are often necessary, and prior learning remains valuable.
Sora marks a new milestone in AI technology. China is, for now, behind the United States in some aspects of AI. More broadly, AI's impact on society will bring both benefits and risks that remain to be fully observed. Some perspectives view inventions as discoveries that become evident at the appropriate time. If one imagines an invisible digital system that maps and records the world, Sora's development further narrows the gap between physical reality and its digital twin, potentially producing additional societal changes.