- Introduced a new transcription engine option for local speech recognition.
- Added model management features including download, validation, and status tracking.
- Updated settings UI to allow engine selection and model management.
- Ensured compatibility with existing Apple Speech backend as default.
- Included localization for new UI elements and model statuses.
- Added support for local Parakeet TDT backend using sherpa-onnx.
- Implemented model download, validation, and status tracking features.
- Updated settings UI to allow selection between Apple Speech and Parakeet engines.
- Included localization for new UI elements and model statuses.
- Introduced a bridging header for integrating C API with Swift.

Co-authored-by: Cursor <cursoragent@cursor.com>
…gine
- Updated ParakeetSpeechEngine to handle errors during recognizer and VAD initialization.
- Enhanced memory management by optimizing sample buffer handling and reducing unnecessary copies.
- Adjusted VAD parameters for improved performance.
- Added functionality to emit detected speech segments more efficiently.
- Updated SettingsView to refresh model statuses based on selected engine.
- Added support for local ASR models, including Nemotron and Parakeet, with corresponding localization.
- Updated AppSettings to manage selected local model and ensure backward compatibility.
- Enhanced LocalModelManager for improved model status tracking and management.
- Refactored TransFlowViewModel and SettingsView to accommodate new local model options and statuses.
- Introduced NemotronStreamingSpeechEngine for real-time speech recognition.
- Improved error handling and user feedback in the settings interface.
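The backward-compatibility behavior mentioned for AppSettings could look roughly like the sketch below. It is only an illustration: the enum names come from the PR description, but the storage keys (`transcriptionEngineKind`, `localModelKind`, `transcriptionEngine`), the raw values, the `EngineSelection` type, and the choice of Parakeet as the fallback model are all assumptions, and a plain dictionary stands in for whatever persistence AppSettings actually uses.

```swift
// Hypothetical sketch: key names, raw values, and EngineSelection are
// assumptions for illustration, not the PR's actual storage format.
enum TranscriptionEngineKind: String {
    case apple
    case local
}

enum LocalTranscriptionModelKind: String {
    case parakeet
    case nemotron
}

struct EngineSelection: Equatable {
    var engine: TranscriptionEngineKind
    var localModel: LocalTranscriptionModelKind
}

/// Resolves the engine/model selection from persisted key/value pairs,
/// preferring the new keys and falling back to a legacy single key.
func resolveSelection(from stored: [String: String]) -> EngineSelection {
    // Assumed fallback model when nothing valid is stored.
    let defaultModel = LocalTranscriptionModelKind.parakeet

    // New-style keys: engine kind and local model are stored separately.
    if let raw = stored["transcriptionEngineKind"],
       let engine = TranscriptionEngineKind(rawValue: raw) {
        let model = stored["localModelKind"]
            .flatMap(LocalTranscriptionModelKind.init(rawValue:)) ?? defaultModel
        return EngineSelection(engine: engine, localModel: model)
    }

    // Legacy key (assumed): older builds stored "apple" or "parakeet" directly.
    if stored["transcriptionEngine"] == "parakeet" {
        return EngineSelection(engine: .local, localModel: .parakeet)
    }

    // Apple Speech stays the default when no usable value is found.
    return EngineSelection(engine: .apple, localModel: defaultModel)
}
```

Keeping the resolution in one pure function makes the migration easy to unit test: unknown or missing values always land on the Apple Speech default.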
- Introduced LocalModelDownloadDetail struct to track download progress, speed, and estimated time.
- Updated LocalModelManager to handle download cancellation and resume functionality.
- Enhanced SettingsView to display detailed download progress and allow users to cancel ongoing downloads.
- Improved localization strings for new UI elements related to model management.
- Adjusted VAD parameters in ParakeetSpeechEngine for better performance.
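A struct that tracks progress, speed, and estimated time might be shaped like the following sketch. Only the type name `LocalModelDownloadDetail` comes from the commit message; the field names, the `updatedDetail` helper, and the exponential smoothing of the speed value are assumptions made for illustration.

```swift
// Hypothetical sketch: field names and the smoothing approach are assumptions.
struct LocalModelDownloadDetail {
    var bytesReceived: Int64
    var totalBytes: Int64        // 0 or negative when the size is unknown
    var bytesPerSecond: Double   // smoothed transfer speed

    /// Fraction in 0...1, or nil when the total size is unknown.
    var fractionCompleted: Double? {
        guard totalBytes > 0 else { return nil }
        return min(1.0, Double(bytesReceived) / Double(totalBytes))
    }

    /// Remaining time in seconds, or nil when it cannot be estimated.
    var estimatedSecondsRemaining: Double? {
        guard totalBytes > 0, bytesPerSecond > 0 else { return nil }
        let remaining = max(0, totalBytes - bytesReceived)
        return Double(remaining) / bytesPerSecond
    }
}

/// Produces a new detail value from a progress callback.
/// `elapsedSeconds` is the time since the previous callback.
func updatedDetail(previous: LocalModelDownloadDetail,
                   bytesReceived: Int64,
                   elapsedSeconds: Double,
                   smoothing: Double = 0.3) -> LocalModelDownloadDetail {
    var next = previous
    next.bytesReceived = bytesReceived
    guard elapsedSeconds > 0 else { return next }

    let delta = Double(bytesReceived - previous.bytesReceived)
    let instant = max(0, delta / elapsedSeconds)
    // The first sample seeds the speed; later samples are blended so the
    // displayed value does not jump on every callback.
    next.bytesPerSecond = previous.bytesPerSecond > 0
        ? smoothing * instant + (1 - smoothing) * previous.bytesPerSecond
        : instant
    return next
}
```

Returning nil for unknown totals lets the UI fall back to an indeterminate progress indicator instead of showing a misleading percentage.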
- Added a new state variable for app settings in MainView.
- Implemented an onChange listener for selectedEngine to trigger loading of supported languages asynchronously.
Improve build script reliability with prerequisite checks, deterministic source sync, configurable flags, and atomic xcframework output. Document local STT developer setup in both Chinese and English READMEs. Co-authored-by: Cursor <cursoragent@cursor.com>
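Owner
Thanks for the PR, I'll give it a try 🫡 Also, I'd actually prefer to use the FluidAudio framework; onnx is too low-level and has to be compiled every time. What do you think?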
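Owner
Nice, I tried it and it works, but there are a few small issues: I have also tried the Parakeet TDT model before, and it does not seem suited to real-time transcription. It needs more chunking, so without special handling the real-time area shows nothing, as in the screenshot. I think TDT is better suited to post-processing for refining subtitles. Also, the Nemotron Streaming 0.6B model is fairly mediocre; see the test screenshot. Reference: https://github.com/FluidInference/FluidAudio/blob/main/Documentation/Models.md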
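Collaborator
Author
Agreed. In practice these two models do not perform as well as Speech Analyzer, so this PR can be closed. FluidAudio looks very good: they support Parakeet EOU, which can do streaming, but it does not seem to have auto capitalization and punctuation. I'll go try it out first. BTW, the app is really well made and very useful, thanks!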
Summary
This PR delivers full on-device STT support with sherpa-onnx, adds selectable local engines/models, and significantly improves local model download reliability and UX.

Core STT integration
- sherpa-onnx into the macOS app via XCFramework + bridging header + project linkage updates.
- TranscriptionEngine abstraction:
  - AppleSpeechEngine (existing Apple path)
  - ParakeetSpeechEngine (offline local)
  - NemotronStreamingSpeechEngine (streaming local)
- TranscriptionEngineKind: .apple / .local
- LocalTranscriptionModelKind: Parakeet + Nemotron
- AppSettings persistence + backward compatibility for legacy engine key.

Local model management + download pipeline
- LocalModelManager into a spec-driven, multi-model manager.
- URLSessionDownloadTask delegate pipeline.
- en / zh-Hans.

Validation-driven fixes included
- path(percentEncoded: false).
- bpe_vocab correctly for BPE models.
- ProgressView layout warning cleanup.

Developer workflow improvements
- scripts/build-sherpa-onnx.sh:
  - (--version, --archs, --deployment-target, --jobs, --clean, --reclone, --output)
  - cmake -S/-B, cmake --build, cmake --install
- README.md / README_EN.md with local STT developer setup instructions.

Why

Test Plan
- DEVELOPER_DIR="/Applications/Xcode.app/Contents/Developer" xcodebuild -project "TransFlow/TransFlow.xcodeproj" -scheme "TransFlow" -configuration Debug -sdk macosx build
- bash -n scripts/build-sherpa-onnx.sh
- ./scripts/build-sherpa-onnx.sh --help

Notes
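For readers unfamiliar with the engine abstraction listed in the summary, here is a minimal sketch of how a protocol plus the two kind enums could drive engine selection. The type names are taken from the PR description; everything else (the `EngineMode` enum, the `mode` property, the `makeEngine` factory, and the stub structs standing in for the real engines) is hypothetical and does not reflect the actual implementation or the sherpa-onnx API.

```swift
// Hypothetical sketch: the stubs below stand in for the real engines and
// carry no recognition logic; EngineMode and makeEngine are assumptions.
enum TranscriptionEngineKind {
    case apple
    case local
}

enum LocalTranscriptionModelKind {
    case parakeet
    case nemotron
}

// Mirrors the parenthetical labels used in the summary.
enum EngineMode {
    case existingApplePath
    case offlineLocal
    case streamingLocal
}

protocol TranscriptionEngine {
    var mode: EngineMode { get }
}

struct AppleSpeechEngine: TranscriptionEngine {
    let mode = EngineMode.existingApplePath
}

struct ParakeetSpeechEngine: TranscriptionEngine {
    let mode = EngineMode.offlineLocal
}

struct NemotronStreamingSpeechEngine: TranscriptionEngine {
    let mode = EngineMode.streamingLocal
}

/// Picks an engine from the selected kind; the local model only matters
/// when the kind is `.local`.
func makeEngine(kind: TranscriptionEngineKind,
                localModel: LocalTranscriptionModelKind) -> any TranscriptionEngine {
    switch kind {
    case .apple:
        return AppleSpeechEngine()
    case .local:
        switch localModel {
        case .parakeet:
            return ParakeetSpeechEngine()
        case .nemotron:
            return NemotronStreamingSpeechEngine()
        }
    }
}
```

With this shape, the view model depends only on `TranscriptionEngine`, so adding another local model means one new enum case and one new conforming type.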