The Journey of Developing an AI Image Editor: Challenges and Takeaways

Jan Schmitz

Jun 12, 2024 • 6 min read

Screenshot of AI Image Editor

Over the past 6 months, I've embarked on an incredible journey to develop an AI-powered image editor. This project, which started as a simple idea, has now evolved into a robust tool that leverages various AI models to simplify and enhance the image editing process while combining it with a classical image editing canvas. Today, I want to share the challenges I faced and the valuable takeaways from this endeavour.

The Genesis of the Idea

The idea for the AI Image Editor was born out of a personal need for a more intuitive and powerful image editing tool. Traditional image editors, while powerful, often have steep learning curves and can be overwhelming for casual users. The best example of this is Adobe Photoshop with its new AI features. The goal was to create an editor that combines high-quality generative AI with a user-friendly interface to make image editing accessible and efficient. In simple terms, I wanted to create an image editor that combines the best parts of Canva, Figma and Adobe Photoshop.

Core Features and Functionality

First, I had to compose a list of core features that I believed would work together best to bring an easy, flexible, yet high-quality workflow to life. After initial user research, I settled on the list of features below. With a good first idea of what to focus on, the hacking could get started.

First, a solid canvas editor had to be put in place that serves as the bedrock for all generative AI features. Here is the list of features and AI models that I have incorporated into the product.

Canvas-Based Editing: The cornerstone of our AI Image Editor is a versatile canvas, built using the robust Fabric.js library. This canvas serves as the primary workspace where users can engage in a wide range of editing and composition tasks. The canvas supports the creation and manipulation of various objects, including text, lines, circles, squares, and other shapes. Users can easily layer different objects, adjust their properties, and create complex compositions with precision and flexibility.

Object Manipulation and Layering: The editor allows for extensive object manipulation, including resizing, rotating, and moving elements within the canvas. Each object can be layered, enabling users to control the depth and visual hierarchy of their compositions. This functionality is crucial for creating visually appealing and professionally structured designs.

Text and Shape Integration: Adding text and shapes to the canvas is straightforward, with tools designed for creating and customizing these elements. Text objects can be styled with various fonts, sizes, and colors, while shapes can be adjusted for stroke, fill, and other properties. This integration supports a wide range of design applications, from simple annotations to complex graphical compositions.
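To make the layering idea concrete, here is a small language-agnostic sketch written in Python. In the editor itself, Fabric.js manages stacking in the browser, so the `LayerStack` class and its method names are purely hypothetical and only model how reordering changes the drawing order.

```python
class LayerStack:
    """Toy z-order model: index 0 is the bottom layer, the last index is the top."""

    def __init__(self) -> None:
        self._layers: list[str] = []

    def add(self, name: str) -> None:
        # New objects land on top of everything else.
        self._layers.append(name)

    def bring_to_front(self, name: str) -> None:
        self._layers.remove(name)
        self._layers.append(name)

    def send_to_back(self, name: str) -> None:
        self._layers.remove(name)
        self._layers.insert(0, name)

    def bring_forward(self, name: str) -> None:
        # Swap with the layer directly above, if there is one.
        i = self._layers.index(name)
        if i < len(self._layers) - 1:
            self._layers[i], self._layers[i + 1] = self._layers[i + 1], self._layers[i]

    def order(self) -> list[str]:
        # Bottom-to-top drawing order.
        return list(self._layers)


stack = LayerStack()
for layer_name in ["background", "photo", "caption"]:
    stack.add(layer_name)
stack.bring_forward("background")  # "background" now sits above "photo"
```

Objects are drawn in the order returned by `order()`, so whatever comes last visually covers everything before it.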

Generative AI Enhancements

Complementing the core canvas features, our editor incorporates advanced generative AI functionalities. These include:

Text-to-Image: Users can generate entirely new images that align with their creative vision by entering a text prompt. This is usually the entry point into the image generation features. The text-to-image model interprets the input text and generates high-quality images, making it a powerful tool for ideation and creative exploration.

The text-to-image feature is powered by an SDXL model and the DPM++ 2M Karras sampler.
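The "Karras" part of the sampler name refers to the noise schedule proposed by Karras et al. (2022), which spaces the noise levels so that more steps are spent at low noise. The following is a minimal sketch of that schedule only, not the sampler itself and not the editor's actual configuration. The function name and the default values for `sigma_min`, `sigma_max` and `rho` are illustrative assumptions.

```python
def karras_sigmas(
    n_steps: int,
    sigma_min: float = 0.03,   # assumed lowest noise level (illustrative)
    sigma_max: float = 14.6,   # assumed highest noise level (illustrative)
    rho: float = 7.0,          # curvature of the schedule; 7 is the value used in the paper
) -> list[float]:
    """Return n_steps noise levels, decreasing from sigma_max to sigma_min."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    min_inv_rho = sigma_min ** (1.0 / rho)
    max_inv_rho = sigma_max ** (1.0 / rho)
    sigmas = []
    for i in range(n_steps):
        t = i / (n_steps - 1)  # runs from 0.0 to 1.0
        # Interpolate linearly in sigma^(1/rho) space, then map back.
        sigmas.append((max_inv_rho + t * (min_inv_rho - max_inv_rho)) ** rho)
    return sigmas


schedule = karras_sigmas(20)
```

The first entry of `schedule` is the starting noise level and the last is the final one; the sampler walks through them in order.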

Inpainting: The inpainting feature seamlessly fills missing parts of an image or extends it beyond its original borders. The models ensure that generated content matches the surrounding context, maintaining visual coherence and detail.

For the inpainting, I used a fine-tuned version of the SDXL model.
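Extending an image beyond its borders can be framed as inpainting on a larger canvas: the original pixels are placed on a padded canvas, and a mask marks the new border area as the part to generate. Here is a rough sketch of that preparation step using plain nested lists instead of a real image type. The function name is hypothetical, and the mask convention (1 = generate, 0 = keep) is an assumption that follows a common inpainting convention.

```python
def pad_for_outpainting(image, left, top, right, bottom, fill=0):
    """Place `image` (rows of pixel values) on a larger canvas.

    Returns (canvas, mask) where mask is 1 for pixels the model should
    generate and 0 for pixels copied from the original image.
    """
    height = len(image)
    width = len(image[0])
    new_w = left + width + right
    new_h = top + height + bottom
    canvas = [[fill] * new_w for _ in range(new_h)]
    mask = [[1] * new_w for _ in range(new_h)]
    for y in range(height):
        for x in range(width):
            canvas[top + y][left + x] = image[y][x]
            mask[top + y][left + x] = 0  # keep the original pixel
    return canvas, mask


# Extend a 2x2 image by one pixel on the left, right and bottom.
canvas, mask = pad_for_outpainting([[5, 6], [7, 8]], left=1, top=0, right=1, bottom=1)
```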

Object Removal: The object removal feature works in a similar way, but in reverse: it identifies and removes unwanted objects from images. The removed areas are intelligently filled, ensuring that the edits blend seamlessly with the rest of the image.

For the object removal feature, I use the LaMa inpainting model (not to be confused with Meta's Llama language models).

Removing a fish as an object from an image
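One common way to keep such an edit from touching the rest of the picture is to composite the model output back onto the original using the mask, so only the masked area changes. This is a sketch under that assumption, not a description of the editor's actual pipeline. The function name is hypothetical, and pixels are simplified to single float values.

```python
def composite(original, filled, mask):
    """Blend per pixel: mask 1.0 takes the filled value, 0.0 keeps the original.

    Values between 0 and 1 (a feathered mask edge) mix the two linearly.
    """
    return [
        [m * f + (1.0 - m) * o for o, f, m in zip(o_row, f_row, m_row)]
        for o_row, f_row, m_row in zip(original, filled, mask)
    ]


original = [[10.0, 20.0]]
filled = [[50.0, 60.0]]
mask = [[0.0, 0.5]]  # first pixel untouched, second pixel half blended
result = composite(original, filled, mask)
```

A soft mask edge is what lets the filled region fade into its surroundings instead of ending at a hard seam.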

Background Removal: Powered by the Highly Accurate Dichotomous Image Segmentation (DIS) model, this feature precisely separates subjects from their backgrounds. Users can effortlessly isolate subjects, enabling the creation of clean and professional images for various applications.
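A segmentation model of this kind typically outputs a matte, meaning one foreground probability per pixel. Turning that into a cut-out amounts to using the matte as the alpha channel. Below is a minimal sketch of that last step. It assumes the matte holds floats between 0 and 1, and the function name is hypothetical.

```python
def apply_matte(rgb_pixels, matte):
    """Attach an alpha channel to RGB pixels.

    rgb_pixels: rows of (r, g, b) tuples.
    matte: rows of floats, 1.0 = foreground, 0.0 = background.
    Returns rows of (r, g, b, a) tuples with a in 0..255.
    """
    result = []
    for rgb_row, matte_row in zip(rgb_pixels, matte):
        out_row = []
        for (r, g, b), alpha in zip(rgb_row, matte_row):
            clamped = min(1.0, max(0.0, alpha))  # guard against out-of-range values
            out_row.append((r, g, b, round(clamped * 255)))
        result.append(out_row)
    return result


cutout = apply_matte([[(255, 0, 0), (0, 255, 0)]], [[1.0, 0.0]])
```

Background pixels keep their color values but become fully transparent, which is why the subject can then be dropped onto any new canvas background.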

The integration of these generative AI features with the Fabric.js-based canvas enhances the overall editing workflow. Users can switch between manual editing and AI-powered enhancements seamlessly, combining the best of both worlds. This integration not only streamlines the editing process but also opens up new creative possibilities, allowing users to achieve complex edits with minimal effort.

Challenges Faced

Integration of AI Models: One of the biggest challenges was integrating AI models into the editor. It required extensive research and testing to ensure that the models could perform reliably across a wide range of images. Ensuring the AI's accuracy and efficiency without compromising performance was a delicate balance.

Performance Optimization: Handling large image files and performing real-time edits posed significant performance challenges. Optimizing the canvas rendering and managing memory usage was critical to maintaining a smooth user experience.
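One simple lever for large files is to cap the working resolution before an image is sent to a model or rendered as a preview. The sketch below shows the dimension calculation only. The function name and the `max_side` default of 1024 are assumptions, not the editor's actual settings.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is preserved, and images that already fit are returned unchanged.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Never let a side collapse to zero for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


working_size = fit_within(4000, 3000, max_side=1000)
```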

User Experience Design: Designing an intuitive user interface that could accommodate advanced features while remaining simple to use was a complex task. Continuous user testing and feedback were essential in refining the design. Let me know what you guys think of the current UX.

Scalability and Stability: As the tool evolved, ensuring scalability and stability became paramount. This involved building a robust backend infrastructure to handle the AI processing and implementing fail-safes to prevent crashes during intensive operations. It's not perfect yet, and with more users, more ways to break the application will be revealed 😉
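As one example of what a fail-safe can look like, a call to a busy AI backend can be wrapped in a retry with exponential backoff instead of failing on the first error. This is a generic sketch of that pattern and not the editor's actual code. The function name, attempt count and delays are all assumptions.

```python
import time


def call_with_retries(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run operation(); on failure wait and retry, doubling the delay each time.

    The last failure is re-raised so the caller can show an error
    message instead of crashing silently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
```

The `sleep` parameter is injectable so the waiting behavior can be tested without real delays.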

Takeaways

Iterative Development: The importance of iterative development cannot be overstated. Regularly testing new features and gathering user feedback helped in identifying issues early and making necessary adjustments.

Balancing Innovation with Usability: Striking the right balance between innovative features and usability is crucial. It’s easy to get carried away with adding advanced features, but ensuring that the tool remains user-friendly should always be a priority.

Community Engagement: Engaging with the community and potential users provided invaluable insights. Their feedback not only highlighted areas of improvement but also inspired new features.

Continuous Learning: The field of AI and image processing is rapidly evolving. Staying updated with the latest advancements and continuously learning is essential to keep the tool relevant and effective. As an example, while I was developing this application, Stability AI released a new flagship model, Stable Diffusion 3, which I'd like to test next and potentially integrate.

Looking Ahead

The AI Image Editor is a testament to what can be achieved with a blend of innovative technology and user-centered design. As I continue refining and expanding the tool, the focus remains on enhancing user experience and leveraging AI to simplify complex tasks.

For anyone interested in exploring the AI Image Editor, I invite you to visit Sage Marketer. Your feedback and insights would be greatly appreciated as we continue to improve and innovate.

Screenshot of AI Image Editor - Check it out and give feedback

This journey has been immensely rewarding, and I hope that sharing these experiences will inspire and assist others in their own projects. Whether you’re a developer, designer, or just someone interested in AI, there’s always something new to learn and discover.