Feature/parakeet onnx by chuntwdev · Pull Request #1 · Cyronlee/TransFlow

chuntwdev · 2026-02-10T06:19:01Z

Summary

This PR delivers full on-device STT support with sherpa-onnx, adds selectable local engines/models, and significantly improves local model download reliability and UX.

Core STT integration

Integrates sherpa-onnx into the macOS app via XCFramework + bridging header + project linkage updates.
Refactors speech backends behind a shared TranscriptionEngine abstraction:
- AppleSpeechEngine (existing Apple path)
- ParakeetSpeechEngine (offline local)
- NemotronStreamingSpeechEngine (streaming local)
Adds local engine/model selection architecture:
- TranscriptionEngineKind: .apple / .local
- LocalTranscriptionModelKind: Parakeet + Nemotron
- AppSettings persistence + backward compatibility for legacy engine key.
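Here is a minimal Swift sketch of how the shared abstraction and the persisted engine/model selection could fit together. The TranscriptionEngine requirements, the enum raw values, and the assumption that the legacy engine key stored "parakeet" are illustrative guesses, not the PR's actual code.

```swift
import Foundation

/// Hypothetical shape of the shared engine abstraction; requirement names are illustrative.
protocol TranscriptionEngine {
    /// Language identifiers the engine can transcribe (local engines: English only).
    var supportedLanguages: [String] { get }
    func start() throws
    func stop()
}

enum TranscriptionEngineKind: String {
    case apple
    case local
}

enum LocalTranscriptionModelKind: String {
    case parakeet
    case nemotron
}

/// Resolves persisted raw values into an engine + local model pair.
/// Assumption: before local models were split out, the legacy engine key
/// could hold "parakeet", meaning "local engine with the Parakeet model".
func resolveEngineSelection(
    engineRaw: String?,
    modelRaw: String?
) -> (engine: TranscriptionEngineKind, model: LocalTranscriptionModelKind) {
    // Legacy value: map it onto the new two-level selection.
    if engineRaw == "parakeet" {
        return (.local, .parakeet)
    }
    // Apple Speech stays the default when nothing valid is stored.
    let engine = engineRaw.flatMap(TranscriptionEngineKind.init(rawValue:)) ?? .apple
    let model = modelRaw.flatMap(LocalTranscriptionModelKind.init(rawValue:)) ?? .parakeet
    return (engine, model)
}
```

Mapping the legacy value at read time lets older settings keep working without a separate migration pass.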

Local model management + download pipeline

Refactors LocalModelManager into a spec-driven, multi-model manager.
Replaces byte-stream download path with URLSessionDownloadTask delegate pipeline.
Adds robust large-file handling:
- persisted resume data per model/file role
- bounded transient retry with exponential backoff
- staging + validation + atomic install/replace
- cleanup of stale staging/resume artifacts
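The bounded transient retry with exponential backoff could be expressed like this. It is only a sketch: the attempt cap, base delay, ceiling, and the set of URLError codes treated as transient are assumed values, since the PR does not list them.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLError lives here on non-Apple platforms
#endif

/// Hypothetical retry policy; all constants are assumptions.
struct RetryPolicy {
    var maxAttempts = 4                  // total attempts, including the first
    var baseDelay: TimeInterval = 1.0    // delay before the first retry
    var maxDelay: TimeInterval = 30.0    // cap so large files don't stall for minutes

    /// Delay before retry number `retry` (1-based): baseDelay * 2^(retry-1), capped.
    func delay(beforeRetry retry: Int) -> TimeInterval {
        let exponential = baseDelay * pow(2.0, Double(retry - 1))
        return min(exponential, maxDelay)
    }

    /// Retry only transient failures, and only while attempts remain.
    func shouldRetry(afterAttempt attempt: Int, error: Error) -> Bool {
        attempt < maxAttempts && Self.isTransient(error)
    }

    /// Network-level hiccups count as transient; anything else fails fast.
    static func isTransient(_ error: Error) -> Bool {
        guard let urlError = error as? URLError else { return false }
        switch urlError.code {
        case .timedOut, .networkConnectionLost, .notConnectedToInternet, .cannotConnectToHost:
            return true
        default:
            return false
        }
    }
}
```

In a resumable pipeline like this one, each retry would ideally restart from the persisted resume data for that file rather than from byte zero. The policy only decides whether and when to try again.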
Improves settings UX for local models:
- per-model picker and state
- richer progress feedback (bytes/speed/ETA)
- resume-aware Download action + Cancel
- added/updated i18n keys for en / zh-Hans.
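The bytes/speed/ETA feedback can be derived from the cumulative byte counts that URLSessionDownloadTask progress callbacks report. This sketch borrows the LocalModelDownloadDetail name from the commit history, but its fields, the DownloadSpeedTracker type, and the moving-average smoothing are hypothetical.

```swift
import Foundation

/// Hypothetical progress snapshot; field names are assumptions.
struct LocalModelDownloadDetail {
    var bytesWritten: Int64
    var totalBytes: Int64?          // nil when the server sends no Content-Length
    var bytesPerSecond: Double

    var fractionCompleted: Double? {
        guard let total = totalBytes, total > 0 else { return nil }
        return Double(bytesWritten) / Double(total)
    }

    /// Seconds remaining, or nil when size or speed is unknown.
    var estimatedSecondsRemaining: TimeInterval? {
        guard let total = totalBytes, bytesPerSecond > 0 else { return nil }
        return Double(max(total - bytesWritten, 0)) / bytesPerSecond
    }
}

/// Turns raw cumulative byte counts into a smoothed speed reading.
struct DownloadSpeedTracker {
    /// Weight of the newest sample in an exponential moving average.
    var smoothing: Double = 0.3
    private var lastBytes: Int64 = 0
    private var lastTime: TimeInterval?
    private var smoothedRate: Double?

    mutating func update(totalBytesWritten: Int64,
                         totalBytesExpected: Int64?,
                         at time: TimeInterval) -> LocalModelDownloadDetail {
        if let previousTime = lastTime, time > previousTime {
            let instantaneous = Double(totalBytesWritten - lastBytes) / (time - previousTime)
            let alpha = smoothing
            if let previousRate = smoothedRate {
                smoothedRate = alpha * instantaneous + (1 - alpha) * previousRate
            } else {
                smoothedRate = instantaneous  // first measurable sample
            }
        }
        lastBytes = totalBytesWritten
        lastTime = time
        return LocalModelDownloadDetail(bytesWritten: totalBytesWritten,
                                        totalBytes: totalBytesExpected,
                                        bytesPerSecond: smoothedRate ?? 0)
    }
}
```

Smoothing keeps the ETA from jumping on every callback. A nil total hides the ETA instead of showing a misleading one.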

Validation-driven fixes included

Path handling fix for app support directories: use path(percentEncoded: false).
Download temp-file ownership fix in delegate callback (prevents delayed move failures).
Nemotron online config fix: sets bpe_vocab correctly for BPE models.
Language selector UX fix: refresh language options when engine changes.
Control bar ProgressView layout warning cleanup.
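The temp-file ownership fix follows from a documented URLSession rule. The file handed to urlSession(_:downloadTask:didFinishDownloadingTo:) is deleted once that delegate method returns, so it must be moved synchronously inside the callback, not after an async hop. Below is a hypothetical helper for that step. It moves the file into a staging directory where validation and the atomic install can happen later. The function name and parameters are illustrative.

```swift
import Foundation

/// Hypothetical helper, meant to be called synchronously from inside
/// urlSession(_:downloadTask:didFinishDownloadingTo:). URLSession deletes the
/// temporary file once that delegate method returns, so the move happens here.
func claimDownloadedFile(at temporaryURL: URL,
                         stagingDirectory: URL,
                         fileName: String) throws -> URL {
    let fm = FileManager.default
    try fm.createDirectory(at: stagingDirectory, withIntermediateDirectories: true)
    let staged = stagingDirectory.appendingPathComponent(fileName)
    if fm.fileExists(atPath: staged.path) {
        try fm.removeItem(at: staged)  // drop a stale file left by an earlier attempt
    }
    try fm.moveItem(at: temporaryURL, to: staged)
    return staged  // validation and install can now proceed asynchronously
}
```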

Developer workflow improvements

Hardens scripts/build-sherpa-onnx.sh:
- prerequisite checks
- configurable flags (--version, --archs, --deployment-target, --jobs, --clean, --reclone, --output)
- deterministic source/tag sync
- modern cmake -S/-B, cmake --build, cmake --install
- static lib presence checks + arch validation
- atomic XCFramework output
- improved logging/help output
Updates README.md / README_EN.md with local STT developer setup instructions.

Why

Enable practical offline/local transcription with user-selectable model backends.
Keep app bundle small by downloading model assets on demand.
Make model downloads resilient and user-friendly (resume/retry/progress visibility).
Preserve Apple Speech as default while expanding to local inference workflows.
Improve contributor onboarding for local STT development.

Test Plan

Build succeeds:
- DEVELOPER_DIR="/Applications/Xcode.app/Contents/Developer" xcodebuild -project "TransFlow/TransFlow.xcodeproj" -scheme "TransFlow" -configuration Debug -sdk macosx build
Engine switching:
- Apple engine shows multi-language picker.
- Local engine constrains language to English as expected.
- Switching back to Apple refreshes language options immediately.
Local model downloads:
- progress, bytes/speed/ETA update correctly.
- cancel + resume path works.
- transient failure retry path is exercised.
Model lifecycle:
- not downloaded -> downloading -> ready transitions.
- delete resets model status.
Runtime engines:
- Parakeet offline path starts and transcribes.
- Nemotron streaming path initializes and transcribes.
Build script sanity:
- bash -n scripts/build-sherpa-onnx.sh
- ./scripts/build-sherpa-onnx.sh --help

Notes

Download integrity verification (e.g., hash checks) is intentionally deferred for a later iteration.
Apple speech model flow remains intact and unchanged in default behavior.

- Introduced a new transcription engine option for local speech recognition.
- Added model management features including download, validation, and status tracking.
- Updated settings UI to allow engine selection and model management.
- Ensured compatibility with existing Apple Speech backend as default.
- Included localization for new UI elements and model statuses.

- Added support for local Parakeet TDT backend using sherpa-onnx.
- Implemented model download, validation, and status tracking features.
- Updated settings UI to allow selection between Apple Speech and Parakeet engines.
- Included localization for new UI elements and model statuses.
- Introduced a bridging header for integrating C API with Swift.

Co-authored-by: Cursor <cursoragent@cursor.com>

…gine
- Updated ParakeetSpeechEngine to handle errors during recognizer and VAD initialization.
- Enhanced memory management by optimizing sample buffer handling and reducing unnecessary copies.
- Adjusted VAD parameters for improved performance.
- Added functionality to emit detected speech segments more efficiently.
- Updated SettingsView to refresh model statuses based on selected engine.

- Added support for local ASR models, including Nemotron and Parakeet, with corresponding localization.
- Updated AppSettings to manage selected local model and ensure backward compatibility.
- Enhanced LocalModelManager for improved model status tracking and management.
- Refactored TransFlowViewModel and SettingsView to accommodate new local model options and statuses.
- Introduced NemotronStreamingSpeechEngine for real-time speech recognition.
- Improved error handling and user feedback in the settings interface.

- Introduced LocalModelDownloadDetail struct to track download progress, speed, and estimated time.
- Updated LocalModelManager to handle download cancellation and resume functionality.
- Enhanced SettingsView to display detailed download progress and allow users to cancel ongoing downloads.
- Improved localization strings for new UI elements related to model management.
- Adjusted VAD parameters in ParakeetSpeechEngine for better performance.

- Added a new state variable for app settings in MainView.
- Implemented an onChange listener for selectedEngine to trigger loading of supported languages asynchronously.

Improve build script reliability with prerequisite checks, deterministic source sync, configurable flags, and atomic xcframework output. Document local STT developer setup in both Chinese and English READMEs.

Co-authored-by: Cursor <cursoragent@cursor.com>

Cyronlee · 2026-02-12T02:05:37Z

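Thanks for the PR, I'll give it a try 🫡

Also, I'd actually prefer to use the FluidAudio framework. ONNX is too low-level and has to be compiled every time. What do you think?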

Cyronlee · 2026-02-12T02:53:36Z

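Nice, I tried it and it works, but there are a few small issues:

I had also tried the Parakeet TDT model before. It doesn't seem suited to real-time transcription because it needs more chunking, so without special handling nothing appears in the live area, as shown in the screenshot:

I think TDT is better suited to post-processing, i.e. refining subtitles after the fact.

Also, the Nemotron Streaming 0.6B model's accuracy is fairly mediocre; see the test screenshot.

Reference: https://github.com/FluidInference/FluidAudio/blob/main/Documentation/Models.md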

chuntwdev · 2026-02-12T06:20:03Z

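Agreed. In practice neither model performs as well as Speech Analyzer, so this PR can be closed.

FluidAudio looks great. They support Parakeet EOU, which can do streaming, but it doesn't seem to have auto capitalization or punctuation. I'll try it out first.

BTW, the app is really well made and useful. Thanks!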

chuntwdev and others added 8 commits February 9, 2026 16:24

feat: integrate settings change handling in MainView

06778cf

Merge branch 'main' into feature/parakeet-onnx

d33d57f

chuntwdev wants to merge 8 commits into Cyronlee:main from chuntwdev:feature/parakeet-onnx
