
    SOTY - Speech-To-Text Recognition on Ubuntu Touch

        idonthatevests @idonthatevests

        New version is out
        Introducing protocol v2: client apps no longer need to use the microphone, because audio recording is performed on the server. This decision has some drawbacks, though.
        CPU load has been reduced in this release.
        This release is also backwards compatible with protocol v1, where the client sends recorded data to the server, in case someone wants to send audio data from other sources.

        Also, my apologies, I forgot to link
        Client library for v2 protocol
        Client library for v1 protocol (requires Microphone permission)

          idonthatevests

          New version is out
          Changes:

          • Fixed a few bugs
          • Added a model installer (works for many of the languages listed in the menu; other models will be uploaded later)
          • The application UI can now be translated
          • Now comes with an amd64 build
            idonthatevests

            The client library for the v2 protocol is now a complete QML plugin, which can easily be added to your application and then used in your QML layout. The repository contains all the steps needed to integrate the speech recognition client into your project. No permissions are needed; the only requirement is that the server application is running locally in the background.

              undrwater

              This is great, and I'll try it out soon!

              How much change would it take to add TTS?

                idonthatevests @undrwater

                @undrwater
                Thanks for your interest. You can easily integrate TTS support into your application using espeak-ng. However, the espeak data takes 20 MB of user storage space. If you want this functionality in the Soty server, that would require changing the communication protocol for both server and client. That would not be too hard either, but I personally think we should look for a better-suited solution for this task, one that could be seamlessly integrated into the system, such as speech-dispatcher.
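
For the in-app route mentioned above, here is a minimal sketch of calling the espeak-ng command-line tool from Python. It assumes the `espeak-ng` binary is installed and on `PATH`; the helper names `build_espeak_command` and `speak` are hypothetical and not part of Soty or its client libraries. Only the standard `-v` (voice), `-s` (speed in words per minute), and `-w` (write WAV file) options are used.

```python
import shutil
import subprocess
from typing import List, Optional


def build_espeak_command(text: str, voice: str = "en", wpm: int = 160,
                         wav_path: Optional[str] = None) -> List[str]:
    """Build the espeak-ng argument list (hypothetical helper).

    A list is used instead of a shell string, so the text needs no quoting.
    """
    cmd = ["espeak-ng", "-v", voice, "-s", str(wpm)]
    if wav_path is not None:
        # -w writes the speech to a WAV file instead of playing it.
        cmd += ["-w", wav_path]
    # "--" ends option parsing, so text starting with "-" is still spoken.
    cmd += ["--", text]
    return cmd


def speak(text: str, voice: str = "en", wpm: int = 160,
          wav_path: Optional[str] = None) -> bool:
    """Speak the text, or return False if espeak-ng is not installed."""
    if shutil.which("espeak-ng") is None:
        return False
    subprocess.run(build_espeak_command(text, voice, wpm, wav_path), check=True)
    return True


if __name__ == "__main__":
    if not speak("Hello from Ubuntu Touch"):
        print("espeak-ng was not found on PATH")
```

Keeping the command construction separate from the call makes it easy to log or test the exact arguments without producing any audio.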

                  undrwater @idonthatevests

                  @idonthatevests I assume you're using an LLM for STT. I'm doing that on my desktop in a Python venv.

                  I'm still figuring my way around how UT is organized (I use Gentoo, and it's quite different).

                  Anyway, I remember using speech-dispatcher in the past, but it generally drove non-LLM targets (espeak, sphinx, etc.). If it could be a wrapper for any target, that would be great!

                    idonthatevests @undrwater

                    @undrwater said in SOTY - Speech-To-Text Recognition on Ubuntu Touch:

                    > @idonthatevests I assume you're using an LLM for STT. I'm doing that on my desktop in a Python venv.

                    No, I use small ASR models. Running an LLM on an old mobile CPU for this task would likely make speech recognition expensive and slow, and I think the same would be true of speech synthesis. So using LLMs for this on a mobile OS is probably possible, but only for tasks that are not time-critical. In my opinion, espeak-ng is still a fine option for speech synthesis, and it is highly configurable.

                    > I'm still figuring my way around how UT is organized (I use Gentoo, and it's quite different).

                    There are many things in UT that are not organized yet, but that's what is great about UT for me: you can do it yourself! Have fun with your research.

                      undrwater @idonthatevests

                      @idonthatevests

                      Can you link the ASR model you're using?

                      Also, I'm not getting the microphone icon in the UT Translator app. enableSTT=true is set in the conf file, and the Soty server is running (configured in the tweak tool not to be killed).

                      I am getting "failed to load Vosk model" in the Soty server, but I'm not sure why. I unzipped the model into ~/.local/kl.soty/en, and the permissions look good.

                      Can soty be started from the terminal?

                        idonthatevests @undrwater

                        @undrwater
                        There's an automatic installer in the Soty app. Open it by tapping the settings button (gear icon) in the right corner of the top panel.
                        For manual installation, models should be put in .local/share/kl.soty
                        Now I see that it's my mistake. I'm sorry for providing instructions that don't work. I'm editing the original post right now.
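
For anyone installing manually, here is a rough sketch of a sanity check for an unpacked model directory. It is not part of Soty. It assumes a per-language subdirectory such as `en` under `~/.local/share/kl.soty` (inferred from this thread, not confirmed), and it assumes the model root contains `am` and `conf` entries, which is common for current Vosk models but not guaranteed for every model. The function name `diagnose_model_dir` is hypothetical. A frequent cause of load failures is an archive that unpacks into an extra nested folder, so the check looks for that too.

```python
from pathlib import Path
from typing import List

# Entries commonly found at the root of an unpacked Vosk model.
# Assumption: the exact layout varies by model, so a miss is only a hint.
EXPECTED_ENTRIES = ("am", "conf")


def diagnose_model_dir(model_dir: Path) -> List[str]:
    """Return a list of likely problems; an empty list means nothing obvious."""
    if not model_dir.is_dir():
        return [f"{model_dir} does not exist or is not a directory"]

    problems: List[str] = []
    missing = [name for name in EXPECTED_ENTRIES
               if not (model_dir / name).exists()]
    if missing:
        problems.append("missing expected entries: " + ", ".join(missing))
        # Check whether the archive unpacked into an extra nested folder.
        for sub in sorted(p for p in model_dir.iterdir() if p.is_dir()):
            if all((sub / name).exists() for name in EXPECTED_ENTRIES):
                problems.append(
                    f"model files appear to be nested one level down in {sub.name}/"
                )
    return problems


if __name__ == "__main__":
    # The "en" subdirectory is an assumption taken from this thread.
    target = Path.home() / ".local" / "share" / "kl.soty" / "en"
    found = diagnose_model_dir(target)
    print("\n".join(found) if found else f"{target} looks plausible")
```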

                          undrwater @idonthatevests

                          @idonthatevests

                          Cool. I did try installing from the gear icon, but already had the language model downloaded and unzipped into 'en'.

                          When I tapped the English model from the gear icon, it didn't seem to finish. Is this because the model I unzipped was in the way?

                          I'll move it to try.

                            undrwater @undrwater

                            Still no luck. Getting the following message:

                            "THE INSTALLATION IS IN PROGRESS. THIS MAY TAKE A WHILE. DO NOT PRESS THAT BUTTON UNTIL THE INSTALLATION IS FINISHED PLEASE."

                            This message remains for...well...ever. Far too long for a model install. I've left it overnight (though of course the phone went to sleep). I've tried keeping the phone awake for about 30 minutes, but the message stays.

                            The reason I asked about running from the CLI is to see whether there are any useful messages explaining why it's taking so long.

                            Thanks!

                            PS: I've got Vosk running on my desktop. Amazing what it can recognize!
