Assisting the blind to reach daily objects using smart glasses.
Displays 2026 Jul; 93.

Abstract

Searching for objects in their surroundings is a daily challenge for blind and visually impaired (BVI) individuals. Current assistive technologies powered by large language models (LLMs) and vision language models (VLMs) can offer BVI individuals scene descriptions through conversation. However, such communication is often inefficient in helping them reach everyday objects or destinations, because general-purpose LLMs/VLMs are not optimized for interpreting or conveying spatial information. We developed a smart glasses solution that uses open-vocabulary object detection models to help BVI individuals search for and reach a variety of specific objects, which are not limited to the fixed categories used in model training. In our implementations, video streams from the glasses can be processed by open-vocabulary object detection models either locally or on other connected devices, such as a smartphone or computer. Users can input custom search prompts verbally. This hands-free solution allows people to scan their surroundings naturally by moving their heads, while stereo audio tones provide directional cues in the horizontal and vertical directions to help them zero in on targets and reach them accurately. We conducted a human subject pilot study in which 5 blindfolded individuals reached for specific objects (e.g., grabbing the red bottle; reaching the empty chair) among distractors. The smart glasses solution was compared with Ray-Ban Meta glasses running the built-in Meta AI for scene recognition. The average task time with our solution (53 seconds) was significantly shorter than with the Meta glasses (126 seconds, p<0.001). The device was also demonstrated to successfully aid a blind user in a grocery shopping scenario.
This work shows that active orientation guidance, which is typically lacking in VLMs but is provided by our smart glasses solution, can aid interaction with the surrounding environment, such as reaching for objects and destinations.
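The abstract does not specify how detections are turned into stereo audio tones. The following is only a minimal sketch of one plausible mapping, assuming the horizontal offset of a detected bounding box from the image centre sets the left/right balance and its vertical position sets the tone frequency. The names `AudioCue`, `direction_cue`, and `stereo_gains`, the pixel box format, and the 300 to 1200 Hz range are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AudioCue:
    pan: float           # -1.0 = fully left, +1.0 = fully right
    frequency_hz: float  # higher tone = target is higher in the view


def direction_cue(box, frame_w, frame_h, low_hz=300.0, high_hz=1200.0):
    """Map a detection box (x_min, y_min, x_max, y_max), in pixels, to a cue.

    Assumed mapping for illustration only, not the authors' method.
    """
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Horizontal offset from the image centre, normalised and clamped to [-1, 1].
    pan = max(-1.0, min(1.0, (cx - frame_w / 2.0) / (frame_w / 2.0)))
    # Image y grows downward, so invert it: top of frame -> 1, bottom -> 0.
    height = max(0.0, min(1.0, 1.0 - cy / frame_h))
    return AudioCue(pan=pan, frequency_hz=low_hz + height * (high_hz - low_hz))


def stereo_gains(pan):
    """Linear (left, right) channel gains for a pan value in [-1, 1]."""
    return (1.0 - pan) / 2.0, (1.0 + pan) / 2.0
```

Under these assumptions, a target centred in the camera view yields equal gains in both ears and a mid-range tone, so a wearer could turn the head until the tone is balanced and then tilt it until the pitch reaches the middle of the range.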

Authors

Singh A: Schepens Eye Research Institute of Mass Eye & Ear, Boston MA. Northeastern University, Boston MA.
Bhanushali MA: Schepens Eye Research Institute of Mass Eye & Ear, Boston MA. Northeastern University, Boston MA.
Luo J: Schepens Eye Research Institute of Mass Eye & Ear, Boston MA. Northeastern University, Boston MA.
Luo G: Schepens Eye Research Institute of Mass Eye & Ear, Boston MA. Harvard Medical School Department of Ophthalmology, Boston MA.
Pundlik S: Schepens Eye Research Institute of Mass Eye & Ear, Boston MA. Harvard Medical School Department of Ophthalmology, Boston MA.

Pub Type(s)

Journal Article

Language

eng

PubMed ID

42266717