A team of AI researchers at Google Research has developed a next-generation AI-powered text-to-video generator called Lumiere. The group has published a paper describing its work on the arXiv preprint server.
In recent years, artificial intelligence applications have moved from the research laboratory to the user community at large. LLMs such as ChatGPT, for example, have been integrated into browsers, allowing users to generate text in unprecedented ways.
More recently, text-to-image generators have allowed users to create surreal images, and text-to-video generators have allowed users to generate short video clips using just a few words. In this new effort, the Google team has taken the latter category to new heights with the announcement of a text-to-video generator called Lumiere.
Lumiere, probably named after the Lumière brothers, who pioneered photographic equipment, allows users to enter a simple phrase such as “two raccoons reading books together” and get back a finished video showing two raccoons doing exactly that, at strikingly high resolution. The new generator represents the next step in the development of text-to-video generators, producing results that are much more aesthetically pleasing.
Google describes the technology behind the new generator as a “revolutionary space-time U-Net architecture.” It was designed to generate an animated video in a single pass through the model.
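The paper's abstract describes the architecture as using both spatial and temporal down- and up-sampling, so that the whole clip is processed at once rather than frame by frame. As a rough illustration of that one idea, the toy sketch below shrinks a small clip along time, height, and width together. It is not Google's implementation: plain average pooling stands in for the learned layers, and the names `make_video`, `shape`, and `downsample_space_time` are hypothetical.

```python
# Illustrative only: plain average pooling stands in for the learned
# down-sampling layers of a space-time U-Net. All names are hypothetical.

def make_video(frames, height, width):
    """Build a toy grayscale clip as nested lists: video[t][y][x]."""
    return [[[float(t + y + x) for x in range(width)]
             for y in range(height)]
            for t in range(frames)]

def shape(video):
    """Return (frames, height, width) of a nested-list clip."""
    return (len(video), len(video[0]), len(video[0][0]))

def downsample_space_time(video, factor=2):
    """Average-pool by `factor` along time, height, and width at once."""
    t_len, h_len, w_len = shape(video)
    pooled = []
    for t in range(0, t_len - factor + 1, factor):
        plane = []
        for y in range(0, h_len - factor + 1, factor):
            row = []
            for x in range(0, w_len - factor + 1, factor):
                # Collect one factor x factor x factor block of pixels.
                block = [video[t + dt][y + dy][x + dx]
                         for dt in range(factor)
                         for dy in range(factor)
                         for dx in range(factor)]
                row.append(sum(block) / len(block))
            plane.append(row)
        pooled.append(plane)
    return pooled

clip = make_video(frames=8, height=4, width=4)
coarse = downsample_space_time(clip)
print(shape(clip), "->", shape(coarse))
```

A spatial-only U-Net would leave the number of frames unchanged at every level. Here the frame count shrinks along with the image size, which is the part of the design the "space-time" label refers to.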
The demo video shows that Google has added further features, such as allowing users to edit an existing video by highlighting part of it and typing instructions, such as “change dress color to red.” The generator also produces different types of results, such as stylizations, in which a subject is rendered in a chosen style rather than as a full-color representation. It supports variations on this as well, driven by different style references. It also makes cinemagraphs, in which a user can highlight a region of a still image and have it animated.
In its announcement, Google did not specify whether it plans to release or distribute Lumiere to the public, likely because of the legal ramifications that could arise from the creation of videos that violate copyright laws.
More information:
Omer Bar-Tal et al, Lumiere: A Space-Time Diffusion Model for Video Generation, arXiv (2024). DOI: 10.48550/arxiv.2401.12945
lumiere-video.github.io/
© 2024 Science X Network
Citation: Google announces the development of Lumiere, a next-generation AI-based text-to-video generator (January 26, 2024) retrieved January 26, 2024 from