Virtual try-on has become a transformative technology, empowering users to experiment with fashion without physically trying on clothing. However, existing methods often struggle to generate high-fidelity, detail-consistent results. Diffusion models have demonstrated their ability to produce high-quality, photorealistic images, but in conditional generation scenarios such as virtual try-on they still face challenges in control and consistency. Outfit Anyone addresses these limitations with a two-stream conditional diffusion model, enabling it to adeptly handle garment deformation for more lifelike results. It distinguishes itself through scalability (modulating factors such as pose and body shape) and broad applicability, extending from anime to in-the-wild images. Outfit Anyone's performance across diverse scenarios underscores its utility and readiness for real-world deployment.
The conditional diffusion model central to our approach takes images of the model and the garments, along with accompanying text prompts, using the garment images as the control signal. Internally, the network splits into two streams that independently process model and garment data; these streams converge in a fusion network that embeds garment details into the model's feature representation. On this foundation we build Outfit Anyone, which comprises two key components: the Zero-shot Try-on Network, which generates the initial try-on image, and the Post-hoc Refiner, which enhances clothing and skin texture in the output.
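To make the two-stream design concrete, here is a minimal PyTorch sketch of the idea: separate encoders for the model image and the garment image, with a fusion step that uses cross-attention to inject garment features into the model's feature map. This is an illustrative toy, not the authors' implementation; all module names, layer sizes, and the choice of cross-attention for fusion are assumptions.

```python
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Small convolutional encoder; one instance per stream (model / garment).
    Layer sizes are illustrative, not taken from the paper."""
    def __init__(self, in_ch: int = 3, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class FusionBlock(nn.Module):
    """Cross-attention from model features (queries) to garment features
    (keys/values), embedding garment detail onto the model representation."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, model_feat: torch.Tensor, garment_feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = model_feat.shape
        q = model_feat.flatten(2).transpose(1, 2)    # (B, H*W, C)
        kv = garment_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        fused, _ = self.attn(self.norm(q), kv, kv)
        fused = q + fused                             # residual connection
        return fused.transpose(1, 2).reshape(b, c, h, w)

class TwoStreamDenoiser(nn.Module):
    """Toy denoiser: predicts noise for the (noised) model image,
    conditioned on the garment image via the fusion block."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.model_stream = StreamEncoder(dim=dim)
        self.garment_stream = StreamEncoder(dim=dim)
        self.fuse = FusionBlock(dim=dim)
        # Upsample back to input resolution (2 stride-2 convs => factor 4).
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="nearest"),
            nn.Conv2d(dim, 3, 3, padding=1),
        )

    def forward(self, noisy_model_img: torch.Tensor, garment_img: torch.Tensor) -> torch.Tensor:
        m = self.model_stream(noisy_model_img)
        g = self.garment_stream(garment_img)
        return self.decode(self.fuse(m, g))

net = TwoStreamDenoiser()
noise_pred = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(noise_pred.shape)  # same spatial size as the input image
```

In a full system the denoiser would be a UNet run over many diffusion timesteps, with the garment stream conditioning every resolution level; the sketch keeps only the structural point that garment and model features are processed independently and merged in a dedicated fusion step.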
We showcase Outfit Anyone's capability for versatile outfit changes, including full ensembles and individual pieces, in realistic scenarios.
Here we showcase our model's ability to handle a wide range of eccentric and unique clothing styles, dress models in them, and even create matching outfit combinations when needed.
Our model generalizes to various body types, including fit, curvy, and petite figures, catering to the try-on demands of individuals from all walks of life.
We demonstrate the powerful generalization ability of our model, which can support the creation of new anime characters.
Furthermore, we showcase results before and after applying the Refiner, demonstrating that it significantly enhances the texture and realism of the clothing while maintaining consistency in the apparel.
We demonstrate the integration of Outfit Anyone with Animate Anyone, a state-of-the-art pose-to-video model, to achieve outfit changes and motion video generation for any character.