UBports Forum

    SOTY - Speech-To-Text Recognition on Ubuntu Touch

    App Development

        idonthatevests

        Here, in this thread we discuss

        SPEECH RECOGNITION ENGINE on UBUNTU TOUCH

        Yeah, it's real. I've made it run on UT. Locally. Without sending your audio to someone's server.

        This solution is called SOTY and is free software. It is a port of the VOSK API wrapper for Kaldi, a pretty neat speech recognition framework.

        Well, back to the point. Post your test results here, be surprised by how (in)accurate they are, review the source code and propose changes, ask questions about adapting your application to this feature, and don't forget to ask again why this isn't working in the background and how to use it. Please read this whole long post before posting yourself.

        What is it?

        It is a speech recognition server, which means it receives input (raw audio data) from a client, processes it, and then sends back a transcription of the audio recorded on the client.

        The server itself is completely useless without a client; it doesn't even have access to the audio subsystem.
        It was made to be combined with other software that can make use of speech recognition.

        So, right now this is more like a framework for developers who might be interested in it.

        Installation

        Downloading the application from OpenStore will not be enough.
        You also need to install models.

        To install the English language model, run these commands in the terminal:

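        # create the SOTY model directory (note: lowercase -p)
        mkdir -p /home/phablet/.local/kl.soty
        cd /home/phablet/.local/kl.soty
        # quote the URL so the shell does not interpret "&", and name the output file explicitly
        curl -L -o vosk-model-small-en-us-0.15.zip 'https://gitlab.com/soty-stt/soty-models/-/raw/main/en/small/vosk-model-small-en-us-0.15.zip?ref_type=heads&inline=false'
        unzip vosk-model-small-en-us-0.15.zip
        # the model directory is named after the language code
        mv vosk-model-small-en-us-0.15 en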
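
A quick sanity check after the commands above can save some head-scratching. This is only a minimal sketch: `check_model` is a hypothetical helper (not part of SOTY), and it assumes nothing more than that the model should end up as a non-empty directory at /home/phablet/.local/kl.soty/en.

```shell
# Hypothetical helper (not part of SOTY): report whether a model directory
# exists and is non-empty. The default path is the one used in the install commands.
check_model() {
    model_dir="${1:-/home/phablet/.local/kl.soty/en}"
    if [ -d "$model_dir" ] && [ -n "$(ls -A "$model_dir" 2>/dev/null)" ]; then
        echo "model found: $model_dir"
    else
        echo "model missing: $model_dir"
        return 1
    fi
}

# Check the default English model location; "|| true" keeps the script going
# even when the model is not installed yet.
check_model || true
```

If it reports the model as missing, the download or the unzip step most likely failed, so it is worth re-running those commands one by one.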
        

        Now it supports transcribing in English. You can test your accent with...

        List of applications that work with SOTY STT

        UT Translator (recent update)
        To enable SOTY integration, run in the terminal:

        sed -i 's/enableSTT=false/enableSTT=true/g' /home/phablet/.config/ut-dictionary-frontend.ut-dictionary-frontend/ut-dictionary-frontend.ut-dictionary-frontend.conf
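
Note that the sed command only flips an existing enableSTT=false line; if that line is not in the config file, nothing changes and there is no error message. Here is a minimal sketch of a check, assuming nothing about the config file beyond that key. `enable_stt` is a hypothetical helper, and the sample file contents (including the `[General]` section) are made up for the demo.

```shell
# Hypothetical helper: apply the same sed substitution to a config file,
# then confirm that the setting is actually present afterwards.
enable_stt() {
    conf="$1"
    sed -i 's/enableSTT=false/enableSTT=true/g' "$conf"
    if grep -q 'enableSTT=true' "$conf"; then
        echo "enabled"
    else
        echo "not enabled"
        return 1
    fi
}

# Demo on a throwaway file with made-up contents, not the real config.
demo="$(mktemp)"
printf '[General]\nenableSTT=false\n' > "$demo"
enable_stt "$demo"
rm -f "$demo"
```

On a device you would pass the UT Translator config path from the command above instead of the temporary file. If it says "not enabled", the key is probably not in the file yet.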
        
        

        Then, once SOTY and the model are installed properly:

        1. Open SOTY first and start the server
        2. Open UT Translator (WITHOUT CLOSING THE SOTY SERVER)
        3. Choose the English language. A microphone icon will appear in the top panel. Tap it to start recording audio.

        I hope this list will grow bigger over time.

        (I would be more than happy to have it integrated into the Lomiri keyboard, and that would probably eliminate the need to integrate it with any other app, but I don't know if that's ever going to happen.)

        Quality

        It is now possible to transcribe everything you say locally on a device. Your smartphone running Ubuntu Touch could totally do that.
        Too good to be true? Of course there are limitations.
        If we use small models, which is currently the case, they won't cover all the words in a language. And our small models are not good at transcribing previously unknown words or individual letters.

        You could try using much bigger models intended for use on servers, but they require more RAM and more time to process your data. It is significantly slower. You will not like it. Implementing VAD (voice activity detection) preprocessing might help a little. And might not.
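
Before experimenting with a bigger model, it may be worth looking at how much memory the device actually has free. This is a rough sketch only: `mem_available_mib` is a hypothetical helper, it relies on the MemAvailable field in /proc/meminfo (which older kernels may not provide), and it says nothing about how much RAM a particular model really needs.

```shell
# Hypothetical helper: print MemAvailable in MiB from a meminfo-style file.
# Defaults to /proc/meminfo; prints nothing if the field is absent.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Only query the real file where it exists (i.e. on Linux).
if [ -r /proc/meminfo ]; then
    echo "available RAM (MiB): $(mem_available_mib)"
fi
```

Comparing that number with the unpacked size of a model (for example via `du -sm` on the model directory) gives a first hint, nothing more, about whether it has a chance of fitting.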

        Another fly in the ointment is that the models currently in use are no help with spelling out words. At all. They would need to be re-trained for this specific task.

        Summary

        I hope it has some potential. Whether it evolves into an open-source voice assistant for your device of the future or remains a funny conceptual toy is up to you, dear Community.

        How can I help

        Here's what you can do for this project:

        • Design
        • Code improvements
        • Guides for other people
        • If you are an app developer: think of ways it could be useful in your application
        • Testing and reporting bugs
        • (The most important) Improving models

        Plans

        Add a model installer.
        Make it configurable.
        Integrate with the system OSK (on-screen keyboard).

        Improving models

        Under construction
