SORA — a new frontier in deepfake technology
February 20, 2024
This week brought another huge step forward in deepfake technology. OpenAI’s impressive SORA model lets users create realistic, sophisticated video content from a simple text prompt. This innovative technology promises to change the way we create and consume video content, but it also raises important questions about the future of creative industries and the ethical implications of AI-generated content.
How Does SORA Work?
Designed to convert even the most basic text prompts into detailed, expansive video content, SORA takes the generative ability of text-to-image generators a step further. While much of the specific architecture of this new model has not been revealed, AI experts have been able to theorize realistically about how such a system might be built.
First, deep learning models absorb hundreds of thousands of text-image pairs. Using architectures such as generative adversarial networks (GANs) and transformer models, SORA gains the ability to transform textual information into corresponding visual content.
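To make that pairing concrete, here is a minimal sketch of text-conditioned image generation in PyTorch. Every module size and name below is an illustrative assumption; OpenAI has not published SORA’s actual architecture, so treat this as a toy stand-in for the idea, not the real system.

```python
# Hypothetical sketch: a text encoder conditions a GAN-style image generator.
# All dimensions are illustrative assumptions, not SORA's real design.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Encode a token sequence into a single conditioning vector."""
    def __init__(self, vocab_size=10_000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        h = self.encoder(self.embed(tokens))    # (batch, seq_len, dim)
        return h.mean(dim=1)                    # pool to (batch, dim)

class ImageGenerator(nn.Module):
    """GAN-style generator: noise + text conditioning -> 64x64 RGB image."""
    def __init__(self, dim=256, noise_dim=128):
        super().__init__()
        self.project = nn.Linear(dim + noise_dim, 512 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, text_vec, noise):
        x = self.project(torch.cat([text_vec, noise], dim=-1))
        return self.deconv(x.view(-1, 512, 4, 4))   # (batch, 3, 64, 64)

# Usage: one random "prompt" in, one image tensor out.
tokens = torch.randint(0, 10_000, (1, 12))
image = ImageGenerator()(TextEncoder()(tokens), torch.randn(1, 128))
print(image.shape)  # torch.Size([1, 3, 64, 64])
```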
It is quite likely that a model such as SORA begins by generating dozens of still frames representing the key moments of a video depiction of the given prompt. Image-to-video synthesis models then turn these frames into dynamic, realistic transitions, modeling temporal motion and human physics to produce a smooth, continuous video that mimics live-action footage.
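As a toy illustration of that keyframe idea, the sketch below fills in intermediate frames by blending between consecutive key frames. A real image-to-video model would learn motion and physics rather than crossfade, so this is purely an assumed stand-in for the pipeline stage the theory describes.

```python
# Toy keyframe-to-video expansion via linear blending between key frames.
# Learned motion models would replace this crossfade in a real pipeline.
import torch

def interpolate_keyframes(keyframes: torch.Tensor, steps: int) -> torch.Tensor:
    """keyframes: (num_keys, 3, H, W) -> video: (num_frames, 3, H, W)."""
    frames = []
    for a, b in zip(keyframes[:-1], keyframes[1:]):
        for t in torch.linspace(0.0, 1.0, steps + 1)[:-1]:
            frames.append((1 - t) * a + t * b)   # blend toward the next key frame
    frames.append(keyframes[-1])
    return torch.stack(frames)

# A dozen key frames expanded into a 133-frame clip.
keys = torch.rand(12, 3, 64, 64)
video = interpolate_keyframes(keys, steps=12)
print(video.shape)  # torch.Size([133, 3, 64, 64])
```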
Lastly, it is more than likely that these stitched videos undergo a post-processing pass that further enhances visual quality, enforces consistent physics, and adds fine detail.
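One plausible (and here, assumed) post-processing pass is temporal smoothing to suppress frame-to-frame flicker. A production pipeline would more likely use learned enhancement models; a moving average along the time axis just makes the idea concrete.

```python
# Assumed post-processing step: average each pixel over a small time window
# to reduce flicker. Illustrative only; not SORA's documented pipeline.
import torch
import torch.nn.functional as F

def temporal_smooth(video: torch.Tensor, window: int = 3) -> torch.Tensor:
    """video: (num_frames, 3, H, W) -> same shape, smoothed over `window` frames."""
    f, c, h, w = video.shape
    # Treat each pixel's value over time as a 1-D signal.
    x = video.permute(1, 2, 3, 0).reshape(-1, 1, f)        # (C*H*W, 1, frames)
    kernel = torch.full((1, 1, window), 1.0 / window)       # box filter
    x = F.pad(x, (window // 2, window // 2), mode="replicate")
    x = F.conv1d(x, kernel)                                 # (C*H*W, 1, frames)
    return x.reshape(c, h, w, f).permute(3, 0, 1, 2)

smoothed = temporal_smooth(torch.rand(133, 3, 64, 64))
print(smoothed.shape)  # torch.Size([133, 3, 64, 64])
```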
Potential Shortcomings of the SORA Model
As with every new machine learning model, SORA is not without its shortcomings. Having analyzed SORA outputs and reviewed the inconsistencies that OpenAI itself has acknowledged, we see three key areas where improvement is necessary.
First, SORA tends to struggle to model physics accurately. This is an understandable problem, given that even the most sophisticated scientific models struggle to simulate physics. Still, it leaves many SORA videos looking unrealistic or at times comical; consider this example released by OpenAI, which illustrates the point.
Second, the model fails to understand cause-and-effect situations. While SORA’s outputs are beyond impressive, and its ability to generate complex scenes with many involved characters is unparalleled, the model seems to struggle when characters interact with one another. For instance, consider this adorable video of puppies playing together. Despite the accurate representation of the animals, their collisions cause a number of inconsistencies that immediately look off to the human eye.
Lastly, SORA struggles to represent time-based descriptions of phenomena. Take this OpenAI-released example of a grandmother blowing out her birthday candles. While the model accurately simulates the action of blowing, it fails to connect the individual blowing out the candles with the candles actually being extinguished. While not destructive, this shortcoming certainly calls into question the realism of many of these generated videos.
Ethical Considerations of the SORA Model
Each time a sophisticated new generative model is released, we believe it is our obligation to consider the ethical impact it may have on the continued propagation of synthetically manipulated content on the internet. While we celebrate the advances of this amazing technology, it is also paramount to consider how it may be used to manipulate or harm.
These videos can unquestionably be used to propagate misinformation. Let’s begin with a fairly mundane example: the AI-generated social media phenomenon of former President Trump being arrested in New York City. Imagine that instead of releasing a single image, someone released a full video series of Trump’s arrest, adding significant fuel to the fire of misinformation. While such a video might easily be disproven, the damage would already have been done, illustrating how this advancing technology can build on the streams of misinformation that AI-generated images have already created.
Now consider a more serious example: an individual is on trial for a crime they did not commit, say, stealing a wallet from someone in a dark alley. The prosecution plays a video, unknowingly generated by an AI model, in which a person looking quite like the defendant steals and escapes with another’s possessions. Perhaps it doesn’t look perfectly human, but the prosecution claims this is simply an artifact of the dark alley and that the video is proof of the defendant’s crime. Certainly, we would hope that a jury could tease apart this deception, but it could realistically be quite difficult for a jury to pass judgment on such new and advanced technology.
Say an expert is brought in and testifies that the video appears to have been generated by AI. The defendant is found not guilty and begins to return to their life, until a trial reporter releases the generated video on social media. Despite the expert’s opinion, the individual becomes a news sensation, and their life is turned upside down by a video created from a simple text prompt.
Where does this leave us?
The advent of OpenAI’s SORA model marks a significant milestone in the evolution of deepfake technology, offering unprecedented capabilities in video content generation. While SORA’s ability to create realistic videos from simple text prompts is undoubtedly impressive, it also underscores the urgent need for a comprehensive framework to address the ethical and societal implications of AI-generated content.