Researchers at Meta Platforms Inc. have unveiled significant strides in AI-driven image and video generation. These advancements build on Emu, Meta's foundational image generation model, and introduce new tools that give users precise control over image editing through text-based instructions.
Emu Edit, one of these tools, streamlines image manipulation by letting users issue text-based instructions for tasks such as local and global edits, background removal and addition, color and geometry transformations, object detection and segmentation, and more. Its design makes precise, pixel-level alterations based on the user's instruction while avoiding modifications beyond the specified request: adding text to a baseball cap, for instance, leaves the cap itself and the rest of the image unchanged.
The tool was trained on an extensive dataset of 10 million synthesized samples, which Meta credits for its strong fidelity to instructions and high image quality.
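Emu Edit itself has not been released as a public API, but the workflow it describes, editing an image from a plain-text instruction, can be sketched with the open InstructPix2Pix pipeline in Hugging Face diffusers, an analogous instruction-based editing technique. The checkpoint, URL, and parameter values below are illustrative stand-ins, not Emu Edit's.

```python
# Instruction-based image editing, sketched with the open InstructPix2Pix
# pipeline (an analogous technique; Emu Edit itself is not publicly available).
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",  # open instruction-editing model, not Emu Edit
    torch_dtype=torch.float16,
).to("cuda")

# Load the image to edit (URL is a placeholder).
image = load_image("https://example.com/cap.png")

# A plain-text instruction; the model should change only what the
# instruction asks for, leaving the rest of the image intact.
edited = pipe(
    "add the text 'GO TEAM' to the baseball cap",
    image=image,
    num_inference_steps=30,
    image_guidance_scale=1.5,  # higher values keep the output closer to the input
).images[0]

edited.save("cap_edited.png")
```

The `image_guidance_scale` knob here illustrates the same design goal the Emu Edit announcement emphasizes: pulling the output toward the original image so that only the requested change is made.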
Similarly, Emu Video advances video generation by building on the Emu model and employing diffusion models for text-to-video creation. The process first generates an image conditioned on a text prompt, then generates a video conditioned on both that image and the same text. Compared with prior methods, this "factorized" approach simplifies the training of video generation models while maintaining high quality and fidelity to the original prompt.
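Emu Video is likewise not publicly available, but the factorized recipe can be approximated with open models: a text-to-image diffusion model produces the conditioning frame, and an image-to-video diffusion model animates it. One caveat: unlike Emu Video's second stage, the open image-to-video model below conditions only on the image, not on the text as well. The checkpoints named here are stand-ins, not Meta's models.

```python
# Factorized text-to-video, approximated with open diffusion models
# (Emu Video itself is not released; these checkpoints are stand-ins).
import torch
from diffusers import DiffusionPipeline, StableVideoDiffusionPipeline
from diffusers.utils import export_to_video

prompt = "a corgi skateboarding through a sunny park"

# Stage 1: condition on the text prompt by generating a single image.
t2i = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
frame = t2i(prompt).images[0].resize((1024, 576))

# Stage 2: animate that image into a short clip. (Emu Video would also
# feed the text prompt into this stage; this open model does not.)
i2v = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")
frames = i2v(frame, decode_chunk_size=8).frames[0]

export_to_video(frames, "corgi.mp4", fps=7)
```

Splitting generation into these two stages is what the announcement means by "factorized": each diffusion model solves a simpler conditional problem than direct text-to-video generation.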
Meta frames both advances as aids to creative self-expression rather than replacements for professional artists and animators. The tools open up possibilities such as effortlessly creating personalized animated stickers and GIFs, and editing photos without specialized software.