idonthatevests

In this thread we discuss

SPEECH RECOGNITION ENGINE on UBUNTU TOUCH

Yeah, it's real. I've made it run on UT. Locally. Without sending your audio to someone's server.

This solution is called SOTY and is free software. It is a port of the VOSK API, a wrapper for Kaldi, which is a pretty neat speech recognition framework.

Well, back to the point. Post your test results here, be surprised at how (in)accurate they are, review the source code and propose changes, ask questions about adapting your application to this feature, and don't forget to ask again why this isn't working in the background and how to use it. Please read this whole long post before posting yourself.

What is it?

It is a speech recognition server, which means it receives input (raw audio data) from a client, processes it, and then sends back a transcription of the audio recorded on the client.

The server itself is completely useless without a client; it doesn't even have access to the audio subsystem.
It was made to be combined with other software that can make use of speech recognition.
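The receive-audio, send-back-text flow described above can be sketched roughly like this. It is not SOTY's actual protocol: the length-prefixed framing, the `fake_transcribe` stub and every function name here are made up for illustration, and a real server would call a recognizer where the stub is.

```python
import socket
import struct
import threading
from typing import Iterable


def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Send one frame: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_frame(sock: socket.socket) -> bytes:
    """Read one length-prefixed frame."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def fake_transcribe(audio: bytes) -> str:
    """Hypothetical stand-in for a real recognizer."""
    return f"<{len(audio)} bytes of audio>"


def serve_one(sock: socket.socket) -> None:
    """Server side: collect audio frames until an empty frame, then reply."""
    audio = bytearray()
    while True:
        frame = recv_frame(sock)
        if not frame:
            break
        audio.extend(frame)
    send_frame(sock, fake_transcribe(bytes(audio)).encode("utf-8"))


def transcribe_remote(sock: socket.socket, chunks: Iterable[bytes]) -> str:
    """Client side: stream raw audio chunks, then wait for the text."""
    for chunk in chunks:
        send_frame(sock, chunk)
    send_frame(sock, b"")  # an empty frame marks the end of the audio
    return recv_frame(sock).decode("utf-8")


def demo() -> str:
    """Run client and server over a local socket pair."""
    server_sock, client_sock = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server_sock,))
    worker.start()
    try:
        return transcribe_remote(client_sock, [b"\x00" * 320, b"\x00" * 320])
    finally:
        worker.join()
        server_sock.close()
        client_sock.close()
```

The point of the sketch is only the division of labour: the client owns the audio, the server owns the recognizer, and nothing leaves the device.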

So right now it is more of a framework for developers who might be interested in it.

Installation

Downloading the application from OpenStore will not be enough.
You also need to install models.

~~To install English language model you need to run these commands in terminal:~~

Models can be installed using the in-application installer, which is accessible through the "gear" icon in the top right corner of the app interface.
Now it supports transcribing in English. You can test your accent with...

List of applications that work with SOTY STT

UT Translator (recent update)
To enable SOTY integration, run this in a terminal:

sed -i 's/enableSTT=false/enableSTT=true/g' /home/phablet/.config/ut-dictionary-frontend.ut-dictionary-frontend/ut-dictionary-frontend.ut-dictionary-frontend.conf
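If you would rather not run sed by hand, the same edit can be sketched in Python. This only mirrors what the sed command does (switch every `enableSTT=false` to `enableSTT=true`); the function names are made up here, and the path is the one from the command above.

```python
from pathlib import Path

# Path taken from the sed command above.
CONFIG_PATH = Path(
    "/home/phablet/.config/ut-dictionary-frontend.ut-dictionary-frontend/"
    "ut-dictionary-frontend.ut-dictionary-frontend.conf"
)


def enable_stt(config_text: str) -> str:
    """Return the config text with every enableSTT=false switched to true."""
    return config_text.replace("enableSTT=false", "enableSTT=true")


def enable_stt_in_file(path: Path = CONFIG_PATH) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    original = path.read_text(encoding="utf-8")
    updated = enable_stt(original)
    if updated != original:
        path.write_text(updated, encoding="utf-8")
    return updated != original
```

Calling `enable_stt_in_file()` on the device returns `False` when the setting was already enabled, which is a quick way to check whether the change had been applied before.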

Then, once SOTY is installed and set up properly:

Open SOTY first and start the server
Open UT Translator (WITHOUT CLOSING THE SOTY SERVER)
Choose the English language. A microphone icon will appear on the top panel. Tap it to start recording audio.

I hope this list will grow bigger over time

(I would be more than happy to have it integrated into the Lomiri keyboard, and that would probably eliminate the need to integrate it with any other app, but I don't know if that's ever going to happen)

Quality

It is now possible to transcribe everything you say locally on the device; your smartphone running Ubuntu Touch can totally do that.
Too good to be true? Of course there are limitations.
If we use small models, which is currently the case, they won't cover all the words in a language. Our small models are also not good at transcribing previously unknown words and separate letters.

You could try using models that are much bigger and intended for use on servers, but they would require more RAM and more time to process your data. That is significantly slower. You will not like it. Implementing VAD (voice activity detection) preprocessing might help a little. And might not.
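To give an idea of what VAD preprocessing means, here is a minimal sketch of the crudest possible variant: an energy threshold that drops quiet frames before they reach the recognizer. It is not something SOTY or VOSK does as far as this post states; the function names, the 320-sample frame (20 ms at 16 kHz) and the threshold of 500 are all arbitrary assumptions, and the input is assumed to be 16-bit mono PCM in native byte order.

```python
import math
from array import array


def frame_rms(samples) -> float:
    """Root-mean-square amplitude of one frame of 16-bit samples."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def voiced_frames(pcm: bytes, frame_len: int = 320, threshold: float = 500.0):
    """Split PCM into frames and flag the ones at or above the threshold.

    frame_len is counted in samples, not bytes.
    """
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a trailing odd byte
    flags = []
    for start in range(0, len(samples), frame_len):
        flags.append(frame_rms(samples[start:start + frame_len]) >= threshold)
    return flags


def drop_silence(pcm: bytes, frame_len: int = 320, threshold: float = 500.0) -> bytes:
    """Keep only the frames flagged as voiced."""
    pcm = pcm[: len(pcm) - len(pcm) % 2]
    flags = voiced_frames(pcm, frame_len, threshold)
    step = frame_len * 2  # two bytes per 16-bit sample
    return b"".join(
        pcm[i * step:(i + 1) * step] for i, keep in enumerate(flags) if keep
    )
```

A plain threshold like this is easily fooled by background noise, so treat it as an illustration of the idea and not as a recommendation.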

Another fly in the ointment is that the models currently in use are no help with spelling words out. At all. They would need to be re-trained for this specific task.

Summary

I hope it has some potential. Whether it evolves into an open-source voice assistant for your device of the future or remains a funny conceptual toy is up to you, dear Community.

How can I help

Here's what you can do for this project:

Design
Code improvements
Guides for other people
If you are an app developer: think of ways it could be useful in your application
Testing and reporting bugs
(Most important) Improving models

Plans

Add models installer.
Make it configurable.
System OSK integration.

Improving models

Under construction

idonthatevests

New version is out
Introducing protocol v2: client apps no longer need to use the microphone, because audio recording is performed on the server. This decision has some drawbacks, though.
CPU load has been reduced in this release.
This release is also backwards compatible with protocol v1, where the client sends recorded data to the server, just in case someone wants to send audio data from other sources.

Also, my apologies, I forgot to link
Client library for v2 protocol
Client library for v1 protocol (requires Microphone permission)

idonthatevests

This idea has some flaws: there are many good devs and apps, and with these restrictions some of them would probably go undeservedly unmentioned. Some, despite putting a lot of work into this project, may not be nominated because we don't usually see them where we see other devs.

I would like to nominate the following apps:

Waydroid because we all know why
LogViewer since it makes debugging much less painful

Developers who absolutely deserve a mention here:

Danfro
fredldotme

idonthatevests

New version is out
Changes:

Fixed a few bugs
Added models installer (works for many languages listed in the menu, other models will be uploaded later)
The application UI can be translated.
It now comes with an amd64 build.

idonthatevests

@Leroy_Linux
The RC for 1.x has VoLTE support, and Waydroid works fine.

idonthatevests

You might want to teach her how to force a shutdown, just in case; it is a little tricky on this model.

idonthatevests

The client library for the v2 protocol is now a complete QML plugin, which can easily be added to your application and then used in your QML layout. The repository contains all the steps needed to integrate the speech recognition client into your project. No permissions are needed; the only requirement is that the server application is running locally in the background.

idonthatevests

@freddo
Thanks, it would be great to have a few mirrors, but currently there is no integrity check mechanism implemented in OpenStore, and that is a requirement. I think I'll try doing something about it soon. That would also require cooperation with the OpenStore team.

idonthatevests

@shengchieh
The OnePlus N10 is a good device for that, if you can find one with an unlockable bootloader. It sometimes glitches out, though (could be because of my use case), and then needs a forced shutdown, which means holding all the buttons for some time.

idonthatevests

@undrwater
Thanks for your interest. You can easily integrate TTS support into your application using espeak-ng. However, the espeak data takes 20 MB of user storage space. If you want this functionality in the Soty server, that would require changing the communication protocol for both server and client. That would not be too hard either, but I personally think we should look for a more accurate solution for this task, one that could be seamlessly integrated into the system, such as speech-dispatcher.
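As a rough idea of the espeak-ng route, here is a minimal sketch that shells out to the `espeak-ng` command-line tool. It assumes the binary is on PATH, which may well not be the case inside a confined app (there you would more likely bundle it or link against the library); the function names are made up for this example.

```python
import shutil
import subprocess
from typing import List, Optional


def build_espeak_command(
    text: str, voice: str = "en", wav_path: Optional[str] = None
) -> List[str]:
    """Build an espeak-ng command line.

    With wav_path set, espeak-ng writes a WAV file instead of playing audio.
    """
    cmd = ["espeak-ng", "-v", voice]
    if wav_path is not None:
        cmd += ["-w", wav_path]
    cmd.append(text)
    return cmd


def speak(text: str, voice: str = "en") -> bool:
    """Speak text if espeak-ng is installed; return False when it is missing."""
    if shutil.which("espeak-ng") is None:
        return False
    subprocess.run(build_espeak_command(text, voice), check=True)
    return True
```

For example, `speak("Hello from Ubuntu Touch")` plays the phrase when the tool is available and quietly returns `False` otherwise.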

idonthatevests

@wally said in BUDGET 5G smartphone w/ Ubuntu Touch:

@idonthatevests Does this glitch involve the screen flickering? Does it seem to occur in conjunction with incoming notifications/messages?

Yes, sometimes that happens when I try to reboot: the screen starts flickering while in the system, and even if it stops, it will glitch out. Might be caused by using Firefox, I think.

idonthatevests

In this topic Ubuntu Touch users can get technical support for the UT Translator app, instead of filling the application page with misinformation and bug reports. So, if you experience technical issues while using this app, want something changed, or have questions about it, feel free to ask here.

Questions that have been asked, in one way or another

How do I install languages?
You can open the installer menu by pressing the gear icon in the top right corner of the application.

Where are Bulgarian, Czech, German, Spanish, Estonian, Persian, French, Icelandic, Italian, Dutch, Polish, Portuguese, Russian, Ukrainian?
All these languages are supported by this application and can be installed by choosing the "Install basic language models (600 MB)" entry in the installer menu.

idonthatevests

@DJac
Yes.
/usr/bin/X11 is a link to the current directory on this system.

idonthatevests

A good option for controlling audio volume in desktop apps is to install pavucontrol in Libertine.

idonthatevests

@tigermoth Hi, this would be a volunteer's job. It may or may not happen for the apps you want to see in OpenStore. Leaving a request for these apps would raise the chances a little. In fact, you could try porting some yourself; in the best case, recompiling the application with a new build config would be enough to make it run on Noble. And some of them might work just as they are, downloaded from OpenStore.

idonthatevests

@linhmieu2 Have you tried changing string constants, such as CONFIG_LOCALVERSION, just to confirm that the kernel has been flashed correctly? You need to unlock your device's bootloader to flash modified images.
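The CONFIG_LOCALVERSION value becomes part of the kernel release string (what `uname -r` prints), so after flashing you can check for it. A minimal sketch, where the function names and the example suffix are made up:

```python
import platform


def has_local_version(release: str, expected_suffix: str) -> bool:
    """True if the kernel release string contains the expected local version."""
    return bool(expected_suffix) and expected_suffix in release


def running_kernel_has(expected_suffix: str) -> bool:
    """Check the currently running kernel (same string as `uname -r`)."""
    return has_local_version(platform.release(), expected_suffix)
```

If you built with, say, CONFIG_LOCALVERSION="-mytest", then `running_kernel_has("-mytest")` returning `False` after a reboot suggests the device is still booting the old kernel.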

idonthatevests

There's a postmarketOS port for your device. Have you tried looking at their flashing configs for it?

idonthatevests

@ravachol This information is outdated. Ubuntu Touch has VoLTE support, and the development blog has stated that the FP4 recently received VoLTE updates. It may still need polishing, or your device's modem may not support your operator's frequencies, though that rarely happens.