Hard Code | As Sora breaks cover, a reminder of all that’s at stake - Hindustan Times

Feb 19, 2024 03:13 PM IST

Sure, AI video tools like Sora will upend media production, but there are more pressing concerns about today’s AI arms race

This past week, OpenAI released Sora, a brand-new artificial intelligence tool that allows users to create videos from text instructions.

A photo shows a frame of a video generated by a new artificial intelligence tool, dubbed "Sora", unveiled by the company OpenAI, in Paris on February 16, 2024. OpenAI, the creator of ChatGPT and image generator DALL-E, said it was testing a text-to-video model called Sora that would allow users to create realistic videos with a simple prompt. The Microsoft-backed company said the new platform was currently being tested but released a few videos of what it said was already possible, with the accompanying input made to generate the video. (Photo by Stefano RELLANDINI / AFP)

The company and its founders released a handful of videos to demonstrate its capabilities and said an early group of testers is now “red-teaming” it: trying to break it to see how, or if, it can be made to bypass the safeguards meant to keep the content it generates safe (that is, free of nudity, violence or any other conventionally unacceptable material).

The release of the demos has been met with the sort of awe and alarm that OpenAI’s ChatGPT elicited when it first debuted in late 2022.

For instance, the first video the company posted on X (formerly Twitter) shows a couple walking through a street that looks like it is in Tokyo, generated from the instruction (or prompt): “Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous Sakura petals are flying through the wind along with snowflakes.”

The details and the photorealism — something the company said sets it apart from others — are striking. In another example, of a Pixar-like “short fluffy monster”, Sora generated exceptional detail of the imaginary animal’s fur.

OpenAI said Sora can churn out such videos up to a length of 60 seconds.

Sora’s aren’t the first such demonstrations. Google’s Lumiere does something similar, as demonstration videos from January show. It is perhaps not as good as Sora, but the difference is slim enough to think the tech behemoth could close the gap.

OpenAI said Sora can create videos from a prompt, extend a still image into a video (by imaging, and imagining, what could be), and extend a video too. The implications of a machine’s ability to do such things are immense for many industries, as ChatGPT’s have been.

At the most disruptive extreme, it could bring the sword down on the visual effects and film industries, where art and illustration have been the preserve of artists who sketch (or vector) their work, tapping into nothing but their raw imagination.

Earlier this year, the Animation Guild released a survey showing how industry leaders sized up the potential for disruption. Most of them, C-suite executives who make decisions guided primarily by business efficiency (read: the cheapest and most effective way of doing a task), seemed to agree that jobs such as sound editors, 3D modellers, audio and video technicians, and game and UI/UX designers could be “displaced”.

The report contends that the hardest hit could be those in entry-level positions. “These have rarely been glamorous or high-paying jobs, but they have offered entry points into entertainment industries and serve as the primary pipeline to mid- and senior-level positions… Such changes will disproportionately affect those from less affluent backgrounds and underrepresented communities who have traditionally used these roles as a means towards economic and career mobility.”

This is not unprecedented. A similar entry-level “jobocalypse” has been forecast for the software industry ever since AI chatbots demonstrated that, from a few lines of instructions, they could write pages of code that would normally take a person days to churn out.

While it is too early to say how the economic disruption from such technology will play out (remember, AI has so far been best seen as a tool to augment human capability, not replace it), the friction is best captured by comments made over the past year by two big names.

First was Disney CEO Bob Iger, who, in an earnings call, said the studio is starting to use AI to operate more efficiently. “Overall, I’m bullish about the prospects because I think they’ll create efficiencies and ways for us to basically provide better services to customers.”

On the other hand, James Cameron, one of the most celebrated directors alive and the creator of sci-fi blockbusters like The Terminator and Avatar, said in July that he has no plans to let AI into the creative areas of his craft. “I just don’t believe that a disembodied mind that’s just regurgitating what other embodied minds have said will ever have something that’s going to move an audience,” Cameron told CTV News in an interview.

While Sora has understandably reignited questions about how we see creativity and its place in the process of creating art, other crucial aspects need to be talked about, as both opportunities and threats.

From an opportunity standpoint, AI tools like Sora could someday give indie filmmakers the same sort of visual capabilities that big studios, with multimillion-dollar budgets, wield. They could help journalists and documentary filmmakers with limited resources, often the very ones who have the strongest need to get a message across, create more engaging work. And, just as the industrial age did for the production of goods, they could free people from repetitive but necessary nuts-and-bolts processes by automating them.

But there will be new threats too. Deepfake images and audio have already stressed our ability to sift the real from the fake. One of Sora’s stated capabilities, extending videos or creating motion from a still frame, has dangerous implications, especially if such tools are subverted to show people doing things that they did not do. A real clip of a politician walking off stage can, theoretically, be extended with realistic but fake footage showing them assaulting a rival; or an innocuous scene from a movie can be extended into a graphic, sexual encounter, robbing an actor of their agency.

Both Google and OpenAI acknowledge the safety risks and say they are working to ensure these harms are mitigated. “Our primary goal in this work is to enable novice users to generate visual content in a creative and flexible way. However, there is a risk of misuse for creating fake or harmful content with our technology, and we believe that it is crucial to develop and apply tools for detecting biases and malicious use cases in order to ensure safe and fair use,” Google’s technical paper on Lumiere concludes.

This, in itself, is an area that needs urgent attention. Aside from demos and technical details on how their products work, no AI developer has opened up their technology for an under-the-hood analysis by independent researchers and academics, as has long been the demand ever since social media companies and their black-box algorithms wreaked havoc on society.

The private arms race between AI companies also has larger environmental, geopolitical and natural-resource implications. Sora’s 1-minute videos, according to one report, are not produced in an instant: one researcher is quoted as saying that churning out one such sub-1-minute video takes about as much time as it does to “go out and get a burrito”. This implies vast computing resources, which only a few companies will ever be able to amass and possess.

Computing resources draw power, and the hardware they rely on requires rare earth metals. How these resources are used (or misused) is a conversation that needs to be spotlighted.

Sora, like Lumiere, ChatGPT, Gemini or Midjourney, has immense implications for the future. But there are hard consequences for the present too, especially legal questions that need to be resolved soon.

Binayak Dasgupta, the Page 1 editor of Hindustan Times, looks at the emerging challenges from technology and what society, laws and technology itself can do about them
