AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People

Authors:

Seonghee Lee, Maho Kohga, Steve Landau, Sile O’Modhrain, Hari Subramonyam

Paper:

Introduction

Creating visual content is a significant challenge for blind or visually impaired (BVI) individuals, especially when it involves conveying spatial and structural information. Traditional accessible drawing tools, which often rely on line-by-line construction, are limited in their ability to support expressive artwork. On the other hand, generative AI-based text-to-image tools can produce rich illustrations from natural language descriptions but lack precise control over image composition and properties. This gap in functionality inspired the development of AltCanvas, a tile-based image editor that integrates generative AI to provide enhanced control and editing capabilities for BVI users.

Related Work

Image Editing Challenges for BVI Users

BVI users face significant challenges when editing images because much current editing software provides no real-time non-visual feedback. Complex image editors like Adobe Photoshop are particularly difficult to navigate because of their extensive keyboard commands and interfaces that work poorly with screen readers. Effective image editing tools for BVI users should incorporate elements that assist in navigating the spatial layout and understanding the interactive drawing space.

BVI Image Editing Interfaces

Previous research has explored grid-based interfaces and haptic displays to help BVI users make precise point selections. These interfaces often use keyboard commands coupled with verbalizations of grid locations. Additionally, innovative navigational aids like TextSL offer collision-free navigation using natural language cues. Tactile feedback and image description systems have also been developed to enhance accessibility, allowing users to interact with and understand visual content through a combination of touch and auditory feedback.

Image Generation and Editing with AI Tools

Generative AI has been employed to make visual content more accessible, for example by producing video summaries and supporting tactile graphics design. Tools like WorldSmith and Crosspower leverage language structure to facilitate graphic content creation, highlighting both the potential and the limitations of using natural language as the primary medium for image editing.

Research Methodology

Formative Study with Blind Visual Content Creators

To understand existing visual content authoring workflows and associated challenges, semi-structured interviews were conducted with five experienced blind visual content creators. Participants were asked about their authoring processes, challenges, and opinions on generative text-to-image models. Key findings included the need for precise control and guidance, the ability to use pre-existing graphics, and enhanced usability of generative AI features through verbal descriptions.

Design and Development of AltCanvas

Based on the formative study, AltCanvas was developed using an iterative design and evaluation approach. The system combines a dynamic tile-based interface with sonification to support the authoring of visual graphics. The tile view allows users to add, edit, move, and arrange objects while receiving speech and audio feedback. Once completed, the scene can be rendered as a color illustration or as a vector graphic for tactile graphic generation.
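The tile view described above can be pictured with a small data structure. The following is a minimal sketch, not the AltCanvas implementation: the class name TileCanvas, its methods, and the wording of the spoken messages are all hypothetical, and each tile is assumed to hold at most one object.

```python
from dataclasses import dataclass, field


@dataclass
class TileCanvas:
    """Hypothetical sparse grid: each occupied tile holds one object label."""
    tiles: dict = field(default_factory=dict)  # (row, col) -> label

    def add(self, pos, label):
        """Place an object and return the sentence a speech engine could read."""
        if pos in self.tiles:
            return f"Row {pos[0]}, column {pos[1]} already holds {self.tiles[pos]}."
        self.tiles[pos] = label
        return f"Added {label} at row {pos[0]}, column {pos[1]}."

    def move(self, src, dst):
        """Move an object to an empty tile, or report why the move failed."""
        if src not in self.tiles:
            return f"No object at row {src[0]}, column {src[1]}."
        if dst in self.tiles:
            return f"Row {dst[0]}, column {dst[1]} is occupied by {self.tiles[dst]}."
        self.tiles[dst] = self.tiles.pop(src)
        return f"Moved {self.tiles[dst]} to row {dst[0]}, column {dst[1]}."

    def delete(self, pos):
        """Remove an object and confirm the deletion verbally."""
        if pos not in self.tiles:
            return f"No object at row {pos[0]}, column {pos[1]}."
        label = self.tiles.pop(pos)
        return f"Deleted {label}."


canvas = TileCanvas()
print(canvas.add((0, 0), "tree"))
print(canvas.move((0, 0), (0, 1)))
```

Returning the feedback sentence from every operation keeps the spoken confirmation tied to the state change, which is the property the tile view relies on.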

Experimental Design

Setup and Tutorial

Users begin by opening AltCanvas in their web browser and familiarizing themselves with the keyboard commands. The system supports stereo audio with panning to aid directional navigation. Users can adjust the speech rate and select the type of image they wish to create: a color illustration for general audiences or a tactile graphic.
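To illustrate how stereo panning can convey direction, the sketch below maps a tile's column to left and right channel gains. The text does not say which panning law AltCanvas uses; equal-power panning is assumed here, and the function name pan_gains and its parameters are hypothetical.

```python
import math


def pan_gains(col, num_cols):
    """Return (left_gain, right_gain) for a tile column.

    Assumes equal-power panning: the leftmost column plays only in the
    left channel, the rightmost only in the right channel.
    """
    if num_cols < 1 or not 0 <= col < num_cols:
        raise ValueError("col must lie in [0, num_cols)")
    # Map the column to a pan position in [0, 1]; a single column sits in the centre.
    position = 0.5 if num_cols == 1 else col / (num_cols - 1)
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


for column in range(5):
    left, right = pan_gains(column, 5)
    print(f"column {column}: left={left:.2f} right={right:.2f}")
```

With this law the squared gains always sum to one, so a cue keeps roughly the same perceived loudness as it moves across the canvas.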

Adding Objects to the Canvas

Users add objects to the canvas using voice commands. The system processes the command, generates the image, and provides a detailed description of the generated image. Users can then navigate through the tiles to add more objects relative to the initial image.

Perception of AI-Generated Content

AltCanvas provides four commands to help users understand the layout and orientation of objects on the canvas: Global Canvas Descriptions, Local Information, Radar Scan, and Chat. These features offer detailed auditory feedback, helping users visualize the spatial relationships between objects.
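The text names the Radar Scan command without specifying its behavior. As one plausible reading, the sketch below sweeps clockwise around the focused tile and announces each other object by direction and tile distance. The function name radar_scan, the direction vocabulary, and the distance measure are assumptions made for illustration only.

```python
import math

# Eight direction names, clockwise starting from straight up.
DIRECTIONS = ["above", "upper right", "right", "lower right",
              "below", "lower left", "left", "upper left"]


def radar_scan(focus, objects):
    """List objects around the focus tile in clockwise order.

    objects maps (row, col) to a label; rows grow downward.
    """
    focus_row, focus_col = focus
    hits = []
    for (row, col), label in objects.items():
        if (row, col) == focus:
            continue
        d_row, d_col = row - focus_row, col - focus_col
        # Angle measured clockwise from "up", in degrees within [0, 360).
        angle = math.degrees(math.atan2(d_col, -d_row)) % 360
        # Distance in tiles, counting diagonal steps as one.
        steps = max(abs(d_row), abs(d_col))
        hits.append((angle, steps, label))
    hits.sort()
    lines = []
    for angle, steps, label in hits:
        direction = DIRECTIONS[round(angle / 45) % 8]
        unit = "tile" if steps == 1 else "tiles"
        lines.append(f"{label}: {direction}, {steps} {unit} away")
    return lines


scene = {(1, 1): "house", (0, 1): "sun", (1, 2): "tree", (2, 0): "dog"}
for line in radar_scan((1, 1), scene):
    print(line)
```

Each returned line could be handed to a speech engine, optionally combined with stereo panning so that the direction is heard as well as spoken.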

Editing and Composition

AltCanvas supports essential editing operations, including modifying the location and size of objects, pushing images around the canvas, and deleting unwanted elements. These operations are accompanied by auditory feedback to ensure users can accurately manipulate the visual elements.
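The push operation can be sketched as follows. The text does not define its exact semantics, so this example assumes that pushing an object one tile also shifts any unbroken run of objects in its path, and that the canvas has no fixed boundary. The function name push and the dictionary representation of the canvas are hypothetical.

```python
UNIT_STEPS = {(0, 1), (0, -1), (1, 0), (-1, 0)}


def push(tiles, src, direction):
    """Move the object at src one tile and shove neighbours in its path.

    tiles maps (row, col) to a label and is modified in place.
    Returns the labels that moved, starting with the pushed object.
    """
    if direction not in UNIT_STEPS:
        raise ValueError("direction must be a one-tile step")
    if src not in tiles:
        raise KeyError(f"no object at {src}")
    d_row, d_col = direction
    # Collect the contiguous run of occupied tiles starting at src.
    chain = []
    pos = src
    while pos in tiles:
        chain.append(pos)
        pos = (pos[0] + d_row, pos[1] + d_col)
    # Shift from the far end first so no object is overwritten.
    for row, col in reversed(chain):
        tiles[(row + d_row, col + d_col)] = tiles.pop((row, col))
    return [tiles[(row + d_row, col + d_col)] for row, col in chain]


canvas = {(0, 0): "cat", (0, 1): "ball", (0, 3): "tree"}
moved = push(canvas, (0, 0), (0, 1))
print(moved)
print(canvas)
```

The returned list of moved labels gives the system enough information to announce every object affected by the push, in line with the auditory feedback the editing operations provide.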

Rendering the Final Image

Once editing is complete, users can render the final image for tactile graphic printing or enhance its visual qualities for sharing with sighted audiences. The system uses a dedicated model to optimize the image for tactile graphics, adjusting texture and relief for tactile perception.

Results and Analysis

Usability Evaluation

A user study was conducted with eight BVI users to evaluate the overall usability and effectiveness of AltCanvas. Participants completed a series of illustration tasks and provided feedback on the system’s features. The study found that users were able to successfully create illustrations using the tile-based paradigm and authoring workflow.

Image Generations

Users found the voice command feature intuitive and were generally pleased with the quality of the generated images. However, some users expressed frustration with the lack of control over the AI-generated outputs.

Image Descriptions

Participants were satisfied with the generated image descriptions, which helped them understand the state of the canvas. The image descriptions were used to confirm image generations, verify current canvas states, and ensure desired interactions were displayed correctly.

Image Editing Interactions

The tile-based interface was effective for navigating image locations relative to one another, and participants appreciated the sonification features for spatial awareness. Verbal feedback was also found to be helpful during the editing process.

Quality of Final Illustration

Participants responded positively to the match between their perceived illustration and the final printed output. They suggested additional features to enhance the tool’s functionality, such as background adjustment and image rotation.

Overall Conclusion

AltCanvas introduces a novel workflow for creating visual content through generative AI, offering enhanced control over editing interactions and scene composition for BVI users. The tile-based paradigm supports spatial understanding and manipulation of visual content, making it a valuable tool for various applications, including educational materials and professional graphic design. Future work will focus on expanding editing capabilities, improving assistive technology integration, and enhancing sonification experiences to create a more comprehensive and accessible tool for BVI users.