Google Announces Development of Lumiere, a Next-Generation AI-Driven Text-to-Video Generator

Examples of results generated by Lumiere, including text-to-video generation (first row), image-to-video generation (second row), and style-referenced generation and video inpainting (third row; the bounding box indicates the inpainting mask region). Credit: arXiv (2024). DOI: 10.48550/arxiv.2401.12945

A team of AI researchers at Google Research has developed a next-generation AI-powered text-to-video generator called Lumiere. The group has published a paper describing its work on the arXiv preprint server.

In recent years, artificial intelligence applications have moved from the research laboratory to the user community at large: LLMs such as ChatGPT, for example, have been integrated into browsers, allowing users to generate text in unprecedented ways.

More recently, text-to-image generators have allowed users to create surreal images, and text-to-video generators have allowed users to generate short video clips using just a few words. In this new effort, the Google team has taken the latter category to new heights with the announcement of a text-to-video generator called Lumiere.

Lumiere, probably named after the Lumière brothers, who pioneered photographic equipment, allows users to enter a simple phrase such as “two raccoons reading books together” and get back a finished video showing two raccoons doing exactly that, in remarkably high resolution. The new generator represents the next step in the development of text-to-video generators, producing much more aesthetically pleasing results.

Google describes the technology behind the new generator as a “revolutionary space-time U-Net architecture.” It was designed to generate an animated video in a single pass through the model.

The demo video shows that Google has added further features, such as allowing users to edit an existing video by highlighting part of it and typing instructions such as “change dress color to red.” The generator also produces different types of results, such as stylizations, in which a subject is rendered in a given style rather than as a full-color representation. It also supports variations on this, such as different style references. And it can create cinemagraphs, in which a user highlights part or all of a still image and has it animated.

In its announcement, Google did not specify whether it plans to release or distribute Lumiere to the public, likely due to the obvious legal consequences that could arise from potentially creating videos that violate copyright laws.

More information:
Omer Bar-Tal et al, Lumiere: A Space-Time Diffusion Model for Video Generation, arXiv (2024). DOI: 10.48550/arxiv.2401.12945

lumiere-video.github.io/

Journal information:
arXiv

Citation: Google announces the development of Lumiere, a next-generation AI-based text-to-video generator (January 26, 2024) retrieved January 26, 2024 from

