Video-to-audio research uses video pixels and text prompts to generate rich soundtracks
Video generation models are advancing at an incredible pace, but many current systems can only generate silent output. One of the next major steps toward bringing generated movies to life is creating soundtracks for these silent videos.
Today, we're sharing progress on our video-to-audio (V2A) technology, which makes synchronized audiovisual generation possible. V2A combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action.
Our V2A technology can be paired with video generation models like Veo to create shots with a dramatic score, realistic sound effects or dialogue that matches the characters and tone of a video.
It can also generate soundtracks for traditional footage, including archival material, silent films and more, opening a wider range of creative opportunities.
Prompt for audio: Cinematic, thriller, horror film, music, tension, ambience, footsteps on concrete