{
"title": "Latent Swap Joint Diffusion for Long-Form Audio Generation",
"authors": [
"Yusheng Dai",
"Chenxi Wang",
"Chang Li",
"Chen Wang",
"Jun Du",
"Kewei Li",
"Ruoyu Wang",
"Jiefeng Ma",
"Lei Sun",
"Jianqing Gao"
],
"abstract": "Previous work on long-form audio generation using global-view diffusion or\niterative generation demands significant training or inference costs. While\nrecent advancements in multi-view joint diffusion for panoramic generation\nprovide an efficient option, they struggle with spectrum generation with severe\noverlap distortions and high cross-view consistency costs. We initially explore\nthis phenomenon through the connectivity inheritance of latent maps and uncover\nthat averaging operations excessively smooth the high-frequency components of\nthe latent map. To address these issues, we propose Swap Forward (SaFa), a\nframe-level latent swap framework that synchronizes multiple diffusions to\nproduce a globally coherent long audio with more spectrum details in a\nforward-only manner. At its core, the bidirectional Self-Loop Latent Swap is\napplied between adjacent views, leveraging stepwise diffusion trajectory to\nadaptively enhance high-frequency components without disrupting low-frequency\ncomponents. Furthermore, to ensure cross-view consistency, the unidirectional\nReference-Guided Latent Swap is applied between the reference and the\nnon-overlap regions of each subview during the early stages, providing\ncentralized trajectory guidance. Quantitative and qualitative experiments\ndemonstrate that SaFa significantly outperforms existing joint diffusion\nmethods and even training-based long audio generation models. Moreover, we find\nthat it also adapts well to panoramic generation, achieving comparable\nstate-of-the-art performance with greater efficiency and model\ngeneralizability. Project page is available at https://swapforward.github.io/.",
"pdf_url": "http://arxiv.org/pdf/2502.05130v1",
"entry_id": "http://arxiv.org/abs/2502.05130v1",
"categories": [
"cs.SD",
"cs.AI",
"cs.CV",
"cs.MM",
"eess.AS"
]
}