
Conversation

@seanbudd (Member) commented Jan 9, 2026

Reverts PR

Reverts:

Issues fixed

Fixes #19298

Issues reopened

Reopens #16281

Reason for revert / Can this PR be reimplemented? If so, what is required for the next attempt

The current implementation of AI image descriptions yields low-quality captions from a three-year-old model (see #19298).
It also requires numpy, which hogs RAM, slows initialization, and increases the size of the installer.
An attempt was made to convert this to C++ using WinML and the Windows ONNX runtimes, as per #18662.
This would have removed numpy and improved flexibility for using different models in the future.
Unfortunately, this was not found to be feasible, as the ONNX C++ runtime fails to work via 64-bit emulation on ARM (microsoft/onnxruntime#15403).
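For context, here is a minimal sketch of the general shape of a Python onnxruntime captioning path like the one being reverted, and why numpy ends up as a hard dependency. This is not the removed NVDA module; the model path, preprocessing, and tensor names are hypothetical.

```python
# Minimal sketch of a Python onnxruntime captioning path (not the removed NVDA code).
import numpy as np
import onnxruntime as ort
from PIL import Image


def caption_image(model_path: str, image_path: str) -> np.ndarray:
	# onnxruntime's Python API exchanges tensors as numpy arrays,
	# which is why numpy is a hard dependency of this approach.
	session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

	# Typical ViT-style preprocessing: resize, scale to [0, 1], NCHW layout.
	image = Image.open(image_path).convert("RGB").resize((224, 224))
	pixels = np.asarray(image, dtype=np.float32) / 255.0
	pixels = pixels.transpose(2, 0, 1)[np.newaxis, ...]  # shape (1, 3, 224, 224)

	input_name = session.get_inputs()[0].name
	outputs = session.run(None, {input_name: pixels})
	return outputs[0]  # e.g. encoder features or generated token IDs
```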

This means we have the following options for image descriptions:

  1. Continue to use the Python onnxruntime and accept the RAM and storage hits, and instead improve the quality of the captioner with better models such as git-base-coco or blip2 (see the sketch after this list).
  2. Wait until MS builds ARM64EC into the C++ ONNX runtime (blocked by "OnnxRuntime for Windows on Arm as Arm64EC variant?", microsoft/onnxruntime#15403).
  3. Attempt to build our own fork of ONNX with ARM64EC.
  4. Build a separate ARM-native installer of NVDA and offer it as an alternative, so that ARM devices can do image descriptions with numpy.
  5. Release the C++ implementation of the feature without support for ARM devices.
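To make option 1 concrete, here is a minimal sketch of how a candidate replacement model such as git-base-coco could be evaluated locally before any ONNX export. It assumes the Hugging Face transformers library and PyTorch are available as an offline evaluation aid; this is not code that would ship in NVDA.

```python
# Offline evaluation sketch for a candidate captioning model (option 1).
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "microsoft/git-base-coco"

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# "example.jpg" is a placeholder test image.
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)
```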

All of these options require a significant amount of work.
As such, sadly this feature is not ready for a stable release.

Instead, this code will be moved to a feature branch until ONNX C++ matures, for example by fixing microsoft/onnxruntime#15403.
Additionally, the ONNX C++ runtimes are only available through the experimental 2.0 version of the Windows App SDK, which requires you to build your own headers from it.
I think this feature will be blocked until microsoft/onnxruntime#15403 is implemented and the 2.0 version of the Windows App SDK becomes stable.
Future re-implementations should also consider using higher-quality, more modern models.

Copilot AI review requested due to automatic review settings January 9, 2026 05:28
@seanbudd seanbudd requested review from a team as code owners January 9, 2026 05:28
@seanbudd seanbudd added this to the 2026.1 milestone Jan 9, 2026
@seanbudd seanbudd changed the title from "Revert image description work" to "Revert AI image description work" Jan 9, 2026
Copilot AI left a comment

Pull request overview

This pull request reverts the AI image descriptions feature that was previously introduced across multiple PRs (#18475, #19036, #19024, #19055, #19057, #19178, #19243, #19327, and partial #19342). The revert is motivated by quality concerns with the 3-year-old model producing low-quality captions, and technical challenges with numpy dependencies causing RAM/storage overhead and ARM64 compatibility issues.

Key Changes:

  • Removes on-device AI image captioning functionality and the NVDA+g gesture
  • Eliminates numpy and onnxruntime dependencies from the codebase
  • Removes the _localCaptioner module and all related GUI components

Reviewed changes

Copilot reviewed 28 out of 31 changed files in this pull request and generated 2 comments.

Show a summary per file

| File | Description |
| --- | --- |
| `uv.lock` | Removes numpy, onnxruntime, onnx, coloredlogs, flatbuffers, ml-dtypes, mpmath, sympy and related dependencies; updates remaining packages to newer versions |
| `pyproject.toml` | Removes numpy, onnxruntime, and onnx from dependencies and system-tests |
| `source/setup.py` | Moves numpy from packages to the excludes list; removes numpy-specific includes |
| `source/config/configSpec.py` | Removes the automatedImageDescriptions config section; fixes indentation inconsistencies |
| `source/config/__init__.py` | Removes automatedImageDescriptions from profile sections |
| `source/NVDAState.py` | Removes the modelsDir property |
| `source/core.py` | Removes _localCaptioner initialization and termination |
| `source/globalCommands.py` | Removes image description scripts and the SCRCAT_IMAGE_DESC category |
| `source/gui/__init__.py` | Removes LocalCaptionerSettingsPanel references |
| `source/gui/settingsDialogs.py` | Removes the LocalCaptionerSettingsPanel class |
| `source/gui/blockAction.py` | Removes the SCREEN_CURTAIN context check |
| `source/_localCaptioner/*` | Removes the entire module, including captioner, downloader, and UI components |
| `source/gui/_localCaptioner/*` | Removes dialog implementations |
| `tests/unit/test_localCaptioner/*` | Removes unit tests |
| `tests/system/robot/automatedImageDescriptions.*` | Removes system tests |
| `tests/system/nvdaSettingsFiles/standard-doLoadMockModel.ini` | Removes test configuration |
| `tests/system/libraries/SystemTestSpy/mockModels.py` | Removes the mock model generator |
| `tests/system/libraries/SystemTestSpy/configManager.py` | Removes model configuration logic |
| `user_docs/en/userGuide.md` | Removes the Image Captioner section and references |
| `user_docs/en/changes.md` | Removes the feature announcement from the changelog |
| `.github/workflows/testAndPublish.yml` | Removes the imageDescriptions test job |


@tianzeshi-study (Contributor) commented:

> The current implementation of AI image descriptions yields low-quality captions from a three-year-old model (see #19298). It also requires numpy, which hogs RAM, slows initialization, and increases the size of the installer. An attempt was made to convert this to C++ using WinML and the Windows ONNX runtimes, as per #18662. This would have removed numpy and improved flexibility for using different models in the future. Unfortunately, this was not found to be feasible, as the ONNX C++ runtime fails to work via 64-bit emulation on ARM (microsoft/onnxruntime#15403).
>
> This means we have the following options for image descriptions:
>
> 1. Continue to use the Python onnxruntime and accept the RAM and storage hits, and instead improve the quality of the captioner with better models such as [git-base-coco](https://huggingface.co/microsoft/git-base-coco) or [blip2](https://huggingface.co/Salesforce/blip2-opt-2.7b-coco).

It is worth noting that, in addition to the default model, the current architecture can run Mozilla's distilvit, which is not a three-year-old model, with zero code changes.

At the same time, the proposed “better” models (BLIP-2 and GIT-base-COCO) are from roughly the same period as vit-gpt2-image-captioning. Their performance and output quality may still need to be validated, and they may not actually perform as well as expected.

To be honest, running models on consumer-grade CPUs is not a major focus of current industry and research efforts. As a result, there are only a limited number of transformer-based models that can produce results within three seconds on a CPU.
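If candidate models are compared, a rough CPU latency check along these lines could help verify the three-second budget mentioned above. This is only a sketch; the `caption_image` callable and the image path are hypothetical placeholders for whatever captioner is under test.

```python
# Rough CPU latency check for a candidate captioner (sketch only).
import statistics
import time


def measure_caption_latency(caption_image, image_path: str, runs: int = 5) -> float:
	caption_image(image_path)  # warm-up run, excluded from timing
	timings = []
	for _ in range(runs):
		start = time.perf_counter()
		caption_image(image_path)
		timings.append(time.perf_counter() - start)
	return statistics.median(timings)


# Example usage: flag models that exceed the ~3 second CPU budget discussed above.
# median = measure_caption_latency(my_captioner, "example.jpg")
# print(f"median latency: {median:.2f}s", "(too slow)" if median > 3.0 else "(ok)")
```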

That said, we still hope that more capable multimodal models will emerge in the future to address this gap.

Regarding the use of numpy:

If having it as a dependency is truly unacceptable, then in the future any offline model-based translation or OCR features would also be unable to rely on numpy and would have to be implemented in C++ instead. This would represent a significant amount of work. Additionally, introducing too many submodules could make the repository increasingly bloated and harder to maintain.
