Ablation study of the spatial pyramid at the 50k image training step. On the right is a quantitative comparison of FID results, where our method converges almost three times faster.
A team of AI researchers from Peking University, Kuaishou Technology and Beijing University of Posts and Telecommunications has developed a new AI model called Pyramid Flow, which can be used to generate high-resolution (768p) video. The group has written a paper describing how they built the model, its attributes and the uses to which it could be put, and posted it on the arXiv preprint server.
Over the past few years, several entities, both private and public, have worked to create AI video generation models. Such models could be used to build applications that produce virtual video content for television and film at a much lower cost than shooting real scenes.
This means that the value of such AI models is rising quickly. In this new effort, the Chinese team has chosen to make its model open source, meaning that anyone who chooses to build an application for it (an inference shell) and run it locally, including for commercial purposes, can do so at no cost.
The creators of Pyramid Flow have added a new feature to AI video generation: the model generates the video in several low-resolution stages before producing the final, full-resolution result. The research team claims that an inference shell can generate a five-second video at 384p resolution in 56 seconds.
They point out that their approach generates video using much less computing power, making it less expensive. This also significantly reduces the number of tokens needed for video generation, making it more efficient.
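To see why working at low resolution for most of the process cuts the token count, here is a rough back-of-the-envelope sketch in Python. It is not the team's code. The patch size, the number of pyramid stages, the halving of resolution between stages, the number of steps per stage, and the frame size are all assumptions chosen only for illustration, and the function names are hypothetical.

```python
# Illustrative arithmetic only: every number below is an assumption,
# not a value taken from the Pyramid Flow paper or code.

def tokens_per_frame(height, width, patch):
    """Number of patch tokens for one frame (hypothetical patch size)."""
    return (height // patch) * (width // patch)


def pyramid_token_steps(height, width, patch, stages, steps_per_stage):
    """Total tokens processed when each earlier stage runs at half the
    resolution of the next one, and only the last stage is full size."""
    total = 0
    for k in range(stages):
        scale = 2 ** (stages - 1 - k)  # e.g. 4, 2, 1 for three stages
        total += steps_per_stage * tokens_per_frame(
            height // scale, width // scale, patch
        )
    return total


def full_res_token_steps(height, width, patch, stages, steps_per_stage):
    """Total tokens processed when every step runs at full resolution."""
    return stages * steps_per_stage * tokens_per_frame(height, width, patch)


if __name__ == "__main__":
    # Assumed settings: 768x1280 frame, 16-pixel patches,
    # 3 stages, 10 generation steps per stage.
    pyramid = pyramid_token_steps(768, 1280, 16, 3, 10)
    full = full_res_token_steps(768, 1280, 16, 3, 10)
    print("pyramid:", pyramid, "full resolution:", full)
    print("ratio:", pyramid / full)
```

With these made-up settings, the pyramid schedule processes fewer than half as many tokens per frame as running every step at full resolution. The real savings depend on the model's actual stage schedule and on how video frames are compressed, neither of which this sketch models.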
The team has published the code for Pyramid Flow on GitHub under the MIT license, along with example videos that demonstrate the very realistic results that can be expected from the model. They also listed the open source datasets they used to train their model, which total up to 10 million short videos.
The research team did not address ongoing allegations from those who view virtual videos made from open source datasets as violating the rights of copyright holders. However, they suggest that Pyramid Flow could be a suitable tool for refining open source material, without the need to pay a third party.
More information:
Yang Jin et al, Pyramid flow matching for efficient video generative modeling, arXiv (2024). DOI: 10.48550/arxiv.2410.05954
Demo: huggingface.co/spaces/Pyramid-Flow/pyramid-flow
© 2024 Science X Network
Citation: New AI model for high-resolution video generation, Pyramid Flow, is available as open source software (October 14, 2024) retrieved October 14, 2024 from
This document is subject to copyright. Except for fair use for private study or research purposes, no part may be reproduced without written permission. The content is provided for informational purposes only.