SOTY - Speech-To-Text Recognition on Ubuntu Touch
-
New version is out
Changes:
- Fixed a few bugs
- Added a model installer (works for many of the languages listed in the menu; other models will be uploaded later)
- The application UI can now be translated
- Now comes with an amd64 build
-
The client library for the v2 protocol is now a complete QML plugin, which can easily be added to your application and then used in your QML layout. The repository contains all the steps needed to integrate the speech recognition client into your project. No permissions are needed; the only requirement is that the server application is running locally in the background.
-
This is great, and I'll try it out soon!
How much would have to change to add TTS?
-
@undrwater
Thanks for your interest. You can easily integrate TTS support into your application using espeak-ng. However, the espeak data takes 20 MB of user storage space. If you want this functionality in the Soty server, that would require changing the communication protocol for both server and client. That would not be too hard either, but I personally think we should look for a more accurate solution for this task, one that could be seamlessly integrated into the system, such as speech-dispatcher.
-
@idonthatevests I assume you're using LLM for STT. I'm doing that on my desktop in a python venv.
I'm still figuring my way around how UT is organized (I use gentoo, and it's quite different).
Anyway, I remember speech-dispatcher being used in the past, but generally with non-LLM targets (espeak, sphinx, etc.). If it could be a wrapper for any target, that would be great!
-
@undrwater said in SOTY - Speech-To-Text Recognition on Ubuntu Touch:
@idonthatevests I assume you're using LLM for STT. I'm doing that on my desktop in a python venv.
No, I use small ASR models. Running an LLM on an old mobile CPU for this task would likely make speech recognition expensive and slow, and I think the same would apply to speech synthesis. So using LLMs for this on a mobile OS is probably possible, but only for tasks that are not time-critical. Still, in my opinion espeak-ng is a fine option for that, and it is highly configurable.
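To make "highly configurable" a bit more concrete, here is a minimal Python sketch that builds an espeak-ng command line with a voice, speaking rate and pitch. It only uses the standard `-v`, `-s`, `-p` and `-w` options of the espeak-ng command-line tool. The function names and the default values are made up for illustration, and nothing here is part of Soty.

```python
import shutil
import subprocess


def build_espeak_command(text, voice="en", speed_wpm=160, pitch=50, wav_path=None):
    """Build an espeak-ng argument list (hypothetical helper, not part of Soty)."""
    # -v selects the voice, -s the rate in words per minute, -p the pitch (0-99).
    cmd = ["espeak-ng", "-v", voice, "-s", str(speed_wpm), "-p", str(pitch)]
    if wav_path is not None:
        # -w writes the audio to a WAV file instead of playing it.
        cmd += ["-w", str(wav_path)]
    cmd.append(text)
    return cmd


def speak(text, **options):
    """Run espeak-ng if it is installed; return True only if it exited cleanly."""
    if shutil.which("espeak-ng") is None:
        return False  # espeak-ng is not on PATH, nothing to run
    result = subprocess.run(build_espeak_command(text, **options), check=False)
    return result.returncode == 0


if __name__ == "__main__":
    # Only prints the command; call speak(...) to actually produce audio.
    print(build_espeak_command("Hello from Ubuntu Touch", voice="en-us", speed_wpm=140))
```

Calling `speak("Hello", voice="en-us", wav_path="hello.wav")` would write a WAV file rather than play the audio, which may be handier when the result has to be passed on to another component.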
I'm still figuring my way around how UT is organized (I use gentoo, and it's quite different).
There are many things in UT that are not organized yet, but for me that's what is great about UT: you can do it yourself! Have fun with your research.
-
Can you link the ASR model you're using?
Also, I'm not getting the microphone icon in the UT Translator app. enableSTT=true is set in the conf file, and the Soty server is running (configured not to be killed by the tweak tool).
I am getting "failed to load Vosk model" in the Soty server, but not sure why. I unzipped the model into ~/.local/kl.soty/en, and permissions look good.
Can soty be started from the terminal?
-
@undrwater
There's an automatic installer in the Soty app. You can reach it by tapping the settings button (gear icon) in the right corner of the top panel.
For manual installation, models should be put in .local/share/kl.soty
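If a manually installed model fails to load, a quick look at the folder layout can help. The sketch below is only a rough check under several assumptions: the per-language subfolder name (for example `en`) is a guess taken from this thread, the list of expected subfolders reflects what unpacked Vosk model archives commonly contain, and `inspect_model_dir` is a made-up helper, not something Soty provides.

```python
from pathlib import Path

# Location named in this thread, resolved against the home directory.
MODELS_ROOT = Path.home() / ".local" / "share" / "kl.soty"

# Assumption: subfolders commonly present in an unpacked Vosk model archive.
EXPECTED_SUBDIRS = ("am", "conf", "graph")


def inspect_model_dir(model_dir):
    """Return a short status string describing what was found in model_dir."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        return f"missing: {model_dir} does not exist"
    missing = [name for name in EXPECTED_SUBDIRS if not (model_dir / name).is_dir()]
    if not missing:
        return "ok: model files found at the top level"
    # Common pitfall: unzipping leaves the model inside one extra nested folder.
    for child in sorted(model_dir.iterdir()):
        if child.is_dir() and all((child / name).is_dir() for name in EXPECTED_SUBDIRS):
            return f"nested: model files are one level down, in {child.name}"
    return "incomplete: missing " + ", ".join(missing)


if __name__ == "__main__":
    # "en" is a hypothetical language folder name used only for illustration.
    print(inspect_model_dir(MODELS_ROOT / "en"))
```

A "nested" result would suggest moving the contents of the inner folder up one level. Whether Soty actually expects that layout is not confirmed here.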
Now I see that it's my mistake. I'm sorry for providing instructions that didn't work. I'm editing the original post right now.
-
Cool. I did try installing from the gear icon, but already had the language model downloaded and unzipped into 'en'.
When I tapped the English model from the gear icon, it didn't seem to finish. Is this because the model I unzipped was in the way?
I'll move it out of the way and try again.
-
Still no luck. Getting the following message:
"THE INSTALLATION IS IN PROGRESS. THIS MAY TAKE A WHILE. DO NOT PRESS THAT BUTTON UNTIL THE INSTALLATION IS FINISHED PLEASE."
This message remains for...well...ever. Far too long for a model install. I've left it overnight (though of course the phone went to sleep). I've tried keeping the phone awake for about 30 minutes, but the message stays.
The reason I asked about running it from the CLI is to see whether there are any useful messages explaining why it's taking so long.
Thanks!
PS: I've got vosk running on my desktop. Amazing what it can recognize!