UBports Forum

    SOTY - Speech-To-Text Recognition on Ubuntu Touch

      In this thread, we discuss:

      SPEECH RECOGNITION ENGINE on UBUNTU TOUCH

      Yeah, it's real. I've made it run on UT. Locally. Without sending your audio to someone's server.

      This solution is called SOTY, and it is free software. It is a port of the VOSK API, a wrapper for Kaldi, which is a pretty neat speech recognition framework.

      Well, back to the point. Post your test results here, be surprised at how (in)accurate they are, review the source code and propose changes, ask questions about adapting your application to this feature, and don't forget to ask (again) why it doesn't work in the background and how to use it. Please read this whole long post before posting yourself.

      What is it?

      It is a speech recognition server: it receives input (raw audio data) from a client, processes it, and sends back a transcription of the audio recorded on the client.

      The server itself is completely useless without a client; it doesn't even have access to the audio subsystem.
      It was made to be combined with other software that can make use of speech recognition.

      So, right now this is more of a framework for developers who might be interested in it.
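For client developers, it can help to know how much raw audio a client actually sends to the server. A minimal sketch, assuming uncompressed 16 kHz, mono, 16-bit PCM (a common input format for Vosk models; the exact format SOTY expects is not stated in this post). The function name `raw_audio_bytes` is hypothetical.

```shell
# Hypothetical helper: size in bytes of uncompressed PCM audio.
# Usage: raw_audio_bytes SECONDS [RATE_HZ] [CHANNELS] [BYTES_PER_SAMPLE]
# Defaults assume 16 kHz, mono, 16-bit samples (an assumption, not
# confirmed for SOTY).
raw_audio_bytes() {
    seconds=$1
    rate=${2:-16000}
    channels=${3:-1}
    bytes_per_sample=${4:-2}
    echo $(( seconds * rate * channels * bytes_per_sample ))
}

# Example: five seconds of speech at the default format
raw_audio_bytes 5   # prints 160000
```

At these defaults that is 32,000 bytes per second of speech.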

      Installation

      Installing the application from OpenStore is not enough.
      You also need to install language models.

      To install the English language model, run these commands in a terminal:

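      # Create the model directory and enter it
      mkdir -p /home/phablet/.local/kl.soty
      cd /home/phablet/.local/kl.soty
      # Quote the URL so the shell does not treat ? and & specially
      curl -L -o vosk-model-small-en-us-0.15.zip "https://gitlab.com/soty-stt/soty-models/-/raw/main/en/small/vosk-model-small-en-us-0.15.zip?ref_type=heads&inline=false"
      unzip vosk-model-small-en-us-0.15.zip
      # Rename the extracted model folder to the language code "en"
      mv vosk-model-small-en-us-0.15 en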
      

      Now it supports transcribing in English. You can test your accent with...
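Before testing, you can check that the model ended up where the installation commands put it. A minimal sketch: it assumes an unpacked Vosk model contains an `am/final.mdl` file and a `conf/` directory (the usual layout of Vosk model archives, but treat it as an assumption), and the function name `soty_model_ok` is hypothetical.

```shell
# Hypothetical check: does DIR look like an unpacked Vosk model?
# Assumes the usual Vosk layout with am/final.mdl and a conf/ directory.
soty_model_ok() {
    dir=${1:-/home/phablet/.local/kl.soty/en}
    if [ -f "$dir/am/final.mdl" ] && [ -d "$dir/conf" ]; then
        echo "model found in $dir"
        return 0
    fi
    echo "no usable model in $dir" >&2
    return 1
}
```

Running `soty_model_ok` without arguments checks the `en` folder created by the installation commands.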

      List of applications that work with SOTY STT

      UT Translator (recent update)
      To enable SOTY integration, run this in a terminal:

      sed -i 's/enableSTT=false/enableSTT=true/g' /home/phablet/.config/ut-dictionary-frontend.ut-dictionary-frontend/ut-dictionary-frontend.ut-dictionary-frontend.conf
      
      
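The `sed` command above only switches `false` to `true`. A slightly more careful sketch, assuming the flag sits at the start of its own line as `enableSTT=...` (which is what that `sed` pattern implies): the hypothetical helper `set_enable_stt` sets the flag to either value, rejects anything else, and prints the resulting line so you can see that it took effect. Running it with `false` turns the integration back off.

```shell
# Hypothetical helper: set enableSTT in the UT Translator config.
# Usage: set_enable_stt true|false [CONFIG_FILE]
set_enable_stt() {
    value=$1
    conf=${2:-/home/phablet/.config/ut-dictionary-frontend.ut-dictionary-frontend/ut-dictionary-frontend.ut-dictionary-frontend.conf}
    case $value in
        true|false) ;;
        *) echo "expected true or false" >&2; return 1 ;;
    esac
    [ -f "$conf" ] || { echo "config not found: $conf" >&2; return 1; }
    # Replace whatever value is currently set with the requested one
    sed -i "s/^enableSTT=.*/enableSTT=$value/" "$conf"
    # Print the resulting line; grep fails if the key is missing entirely
    grep '^enableSTT=' "$conf"
}
```

For example, `set_enable_stt true` should print `enableSTT=true`.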

      Then, once SOTY is properly installed:

      1. Open SOTY first and start the server.
      2. Open UT Translator (WITHOUT CLOSING THE SOTY SERVER).
      3. Choose English. A microphone icon will appear on the top panel; click it to start recording audio.

      I hope this list will grow over time.

      (I would be more than happy to have it integrated into the Lomiri keyboard, which would probably eliminate the need to integrate it with every other app, but I don't know if that's ever going to happen.)

      Quality

      It is now possible to transcribe everything you say locally, on the device; your smartphone running Ubuntu Touch can totally do that.
      Too good to be true? Of course there are limitations.
      Small models, which are what we currently use, don't cover every word in the language, and they are not good at transcribing previously unknown words or individual letters.

      You could try much bigger models intended for servers, but they require more RAM and more time to process your data. They are significantly slower. You will not like it. Implementing VAD (voice activity detection) preprocessing might help a little. Or it might not.

      Another fly in the ointment: the models currently in use are no help with spelling out words. At all. They would need to be retrained for this specific task.

      Summary

      I hope it has some potential. Whether it evolves into an open-source voice assistant for your device of the future or remains a fun conceptual toy is up to you, dear Community.

      How can I help

      Here's what you can do for this project:

      • Design
      • Code improvements
      • Guides for other people
      • If you are an app developer: think of ways it could be useful in your application
      • Testing and reporting bugs
      • Improving models (the most important)

      Plans

      Add a model installer.
      Make it configurable.
      System OSK integration.
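A model installer could start out as little more than the installation commands wrapped in a function. A rough sketch only: the function name `install_soty_model`, the `SOTY_MODEL_DIR` override, and passing the archive's top-level folder name as an argument are all hypothetical, not part of SOTY.

```shell
# Hypothetical installer: download a zipped Vosk model and install it
# as <LANG> under the SOTY model directory.
# Usage: install_soty_model LANG URL EXTRACTED_DIR
# The body runs in a subshell, so "cd" and "exit" don't affect the caller.
install_soty_model() (
    lang=$1
    url=$2
    extracted=$3
    base=${SOTY_MODEL_DIR:-/home/phablet/.local/kl.soty}
    zip="model-$lang.zip"

    mkdir -p "$base" && cd "$base" || exit 1
    # Refuse to overwrite: mv would otherwise nest the model inside it
    if [ -e "$lang" ]; then
        echo "$base/$lang already exists" >&2
        exit 1
    fi
    curl -fL -o "$zip" "$url" || exit 1
    unzip -q "$zip" || exit 1
    rm -f "$zip"
    [ -d "$extracted" ] || { echo "archive has no $extracted/ folder" >&2; exit 1; }
    mv "$extracted" "$lang"
)
```

Reproducing the English installation would then look like `install_soty_model en "<model URL from the Installation section>" vosk-model-small-en-us-0.15`. Passing the extracted folder name explicitly keeps the sketch simple; a real installer would more likely read it from the archive.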

      Improving models

      Under construction

      posted in App Development
      idonthatevests