Z-SASLM: Zero-Shot Style-Aligned SLI Blending Latent Manipulation

Abstract

We introduce Z-SASLM, a Zero-Shot Style-Aligned SLI (Spherical Linear Interpolation) Blending Latent Manipulation pipeline that overcomes the limitations of current multi-style blending methods. Conventional approaches rely on linear blending, which assumes a flat latent space and leads to suboptimal results when integrating multiple reference styles. In contrast, our framework leverages the non-linear geometry of the latent space by using SLI Blending to combine weighted style representations. By interpolating along the geodesic on the hypersphere, Z-SASLM preserves the intrinsic structure of the latent space, ensuring high-fidelity and coherent blending of diverse styles, all without the need for fine-tuning. We further propose a new metric, Weighted Multi-Style DINO ViT-B/8, designed to quantitatively evaluate the consistency of the blended styles. While our primary focus is on the theoretical and practical advantages of SLI Blending for style manipulation, we also demonstrate its effectiveness in a multi-modal content fusion setting through comprehensive experimental studies. Experimental results show that Z-SASLM achieves enhanced and robust style alignment. The implementation code is available at: https://github.com/alessioborgi/Z-SASLM.
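To illustrate the contrast between linear blending and interpolation along a geodesic on the hypersphere, here is a minimal sketch in plain Python. It is not taken from the Z-SASLM repository. The function names `slerp` and `blend_styles` are hypothetical, and the incremental weighting scheme used to combine more than two styles is an assumption made for illustration only; the abstract does not specify how the weighted style representations are combined.

```python
import math


def slerp(v0, v1, t, eps=1e-7):
    """Spherical linear interpolation between vectors v0 and v1 at fraction t in [0, 1]."""
    n0 = math.sqrt(sum(x * x for x in v0))
    n1 = math.sqrt(sum(x * x for x in v1))
    # Angle between the two directions, clamped to guard against rounding error.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly parallel or antiparallel vectors: the geodesic weights are
        # numerically unstable, so fall back to plain linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def blend_styles(styles, weights):
    """Blend several style vectors by repeated pairwise slerp.

    ASSUMPTION: this incremental scheme (each new style is mixed in with
    fraction w / running_total) is illustrative, not the authors' method.
    """
    total = 0.0
    blended = None
    for style, w in zip(styles, weights):
        if w <= 0:
            continue  # skip styles that carry no weight
        total += w
        blended = list(style) if blended is None else slerp(blended, style, w / total)
    if blended is None:
        raise ValueError("at least one weight must be positive")
    return blended


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    linear = [0.5 * x + 0.5 * y for x, y in zip(a, b)]
    spherical = slerp(a, b, 0.5)
    # The linear midpoint falls inside the unit circle; the slerp midpoint stays on it.
    print(math.hypot(*linear), math.hypot(*spherical))
```

For two unit vectors, the linear midpoint has a norm smaller than 1, while the spherical midpoint keeps unit norm. This is the sense in which interpolating along the geodesic preserves the structure that a flat-space assumption ignores.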

Z-SASLM: Zero-Shot Style-Aligned SLI Blending Latent Manipulation - arXiv