Gotta release fast

dobey

So, in the Canonical days, there was a lot more testing, with more automation, along with more visibility into things like unit test coverage, and crash reports.

Could we perhaps document where we differ now from what processes and features were in place at that time, so we can perhaps bring back some of those in a manner suitable for UBports, and then go from there?

In particular, the tests situation is really bad now with UBports, as many packages just have tests disabled during the deb builds, autopilot tests aren't being run anywhere, and we don't have the infrastructure set up to be building with coverage enabled.

rogier.oudshoorn

My vote is for FAST. Smaller changes at every release usually means it's easier to fix, and said fix is always only 2 weeks away (unless you also want to be able to do hotfixes to stable, which might mean days away).

Given all the above, I would suggest having a manual step before RC gets promoted to stable, just so that someone can check how many people have actually used the RC and judge its stability from that. It also means that, during vacation times, nothing gets promoted automatically.

I do have a question: can you actually OTA the entire population within a day?

dobey

Everyone would obviously vote for fast. But we already have that, so why are they not just using the devel channel?

We don't need faster (or slower) releases. We need verifiable releases. Once a month for stable channel is plenty fast enough a cadence, and we could do interim security fix releases if necessary. If you're on stable, that's plenty fast for releases, and people who want faster can switch to rc, or even devel. System updates being released too often on stable channel just makes the product seem immature, while not enough updates makes it look unmaintained. We need to find the happy medium there.

But if we can't verify the reliability of the software, any release cadence is going to be problematic.

zx81

I prefer the fast option but I am happy to go with the flow if the slower option is preferred by the majority.

At the end of the day you guys are doing a fantastic job but can't be expected to devote your entire time to it.

vandys

I have to say that the transition to Xenial was a huge step, almost a defining milestone for the project. I think you made the right call after that big hurdle to get it out, even with some imperfections. Maybe from here on onward you might choose to hold back, but I think for something as big as your new release, it was right to get it out in the world rather than letting it drag on like one of those never-released "1.0"'s.

neopar

Great to see there is public discussion about this topic!

I think the point is to pick up almost every UT user, since there are not many of them. There are still users using Canonical version out there.

In some projects we used more layers to solve that kind of problems. In UT world the approach could be to introduce one more release channel (maybe that would be the happy medium mentioned by @dobey):

devel → will likely break something on your phone
testing → kinda mix of current devel and rc
rc → stable enough to use as daily driver, well tested, introducing new features frequently
stable → rock solid version, for users who don’t care much about new features they don’t need ATM.

The user could decide to use rc or stable without being afraid that something will break with the next update.

But there should be a way to deliver critical bug fixes on any channel instantly.

A Former User

@elastic: "of course one can follow the xfce path as well it's ready when it's ready"

I think there's a lot to be said for that, actually. It would give things a more natural flow.

UniSuperBox

I love the discussion that this has garnered. I think we can take all of the points to bring together a better plan soon.

@alan_g:

The criteria for moving from rc to stable needs to be clearer or stricter. I suggest that any new issues on rc (i.e. ones that don't also exist on stable) should default to critical-rc unless there is an explicit group decision otherwise.

Good choice. It would require some very on-the-ball triagers keeping track of bugs and being able to mark anything new in this RC as a regression. I'm not sure GitHub or GitLab automation would have the tools we need to make that automatic.
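As a rough illustration of that default rule, here is a minimal Python sketch. It is not an existing GitHub or GitLab feature, and it assumes a triager has already recorded which channels each issue reproduces on. The `Issue` class, the `waived` flag (standing in for the "explicit group decision otherwise") and the function name are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """Hypothetical issue record; not tied to any real tracker API."""
    title: str
    channels: set[str]                      # channels where the issue reproduces
    labels: set[str] = field(default_factory=set)
    waived: bool = False                    # assumption: set after a group decision


def triage_rc_regression(issue: Issue) -> Issue:
    """Label issues seen on rc but not on stable as 'critical-rc' by default."""
    new_on_rc = "rc" in issue.channels and "stable" not in issue.channels
    if new_on_rc and not issue.waived:
        issue.labels.add("critical-rc")
    return issue
```

The hard part stays manual: someone still has to check whether the issue also exists on stable before the `channels` field can be trusted.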

Maybe there could be a place where they can sign off that they have "used it for 24 hours on XXX device" to feed into the release-to-stable decision?

Another good choice. Something as simple as a form where they enter their device, the version they're using, and say "yes" or "no" could work. Having trusted testers in our QA group with ultimate sign-off powers is what I'd come back with, though.
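To make the form idea concrete, here is a small Python sketch of how such sign-offs might feed the release-to-stable decision. Everything in it is an assumption for illustration: the `SignOff` fields, the `trusted` flag for QA group members, and the rule that every required device needs a trusted "yes" and no trusted "no". Reports from other testers are treated as advisory only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignOff:
    """One hypothetical form submission."""
    tester: str
    device: str
    version: str
    ok: bool               # the "yes" or "no" answer
    trusted: bool = False  # assumption: tester is in the QA group


def ready_for_stable(signoffs, version, required_devices):
    """True if every required device has a trusted 'yes' and no trusted 'no'."""
    trusted = [s for s in signoffs if s.version == version and s.trusted]
    if any(not s.ok for s in trusted):
        return False
    approved = {s.device for s in trusted if s.ok}
    return set(required_devices) <= approved
```

A stricter or looser rule (for example, counting untrusted "no" answers) would only change the two filters in `ready_for_stable`.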

@dobey:

Could we perhaps document where we differ now from what processes and features were in place at that [Canonical] time, so we can perhaps bring back some of those in a manner suitable for UBports, and then go from there?

Hmm. I don't have a clear understanding of what those processes were, to be honest. I was just getting started with UBports when Canonical stopped releasing updates altogether. Could you give a clearer picture of what you're looking for, maybe some wiki.ubuntu.com pages?

This post is my first try for getting our processes compiled into a coherent place. Later, a fully fleshed out release document would be placed on docs.ubports.com.

Once a month for stable channel is plenty fast enough a cadence, and we could do interim security fix releases if necessary.

A good plan on its face, but the rolling development nature might make the "Interim security releases" part sticky. What if there's an in-image security fix at week 2 but we merged in a feature at week 1 that is still undergoing full QA (since we have a month, right?) prior to release? I think this situation would be fairly common and I wouldn't be comfortable pulling the release trigger in that case.

the tests situation is really bad now with UBports, as many packages just have tests disabled during the deb builds, autopilot tests aren't being run anywhere, and we don't have the infrastructure set up to be building with coverage enabled.

Correct on all counts. While Autopilot might be a problem (no one is maintaining it), autopkgtests should definitely be running.

System updates being released too often on stable channel just makes the product seem immature, while not enough updates makes it look unmaintained. We need to find the happy medium there.

Maybe if not weekly, then every 2 weeks. My primary concern is discouraging a mindset of "bring it in for QA, the release is 4 weeks away" and encouraging "QA, can you look at this a lot before we bring it in?" (this applies to @3arn0wl's quote from @elastic, too)

@neopar:

I think the point is to pick up almost every UT user, since there are not many of them. There are still users using Canonical version out there.

Not really. The only way we'll reach the people on Canonical builds is with marketing and word of mouth. The release cadence doesn't relate to that.

In some projects we used more layers to solve that kind of problems. In UT world the approach could be to introduce one more release channel

Who is your ideal user for the fourth channel? Right now, the split is fairly clear:

Developers who are merging things into the image should have devel on at least one device which they update extremely often.
Users who want faster updates and QA testers should have rc on all their devices.
Users who want only the most tested software should have stable on all their devices.

I'm not sure where "testing" would fit in here.

But there should be a way to deliver critical bug fixes on any channel instantly.

The rolling release model shoots this as far as I can see. That's why I'd prefer a faster, but scheduled, release.

alan_g

@unisuperbox said in Gotta release fast:

Good choice. It would require some very on-the-ball triagers keeping track of bugs and being able to mark anything new in this RC as a regression. I'm not sure GitHub or GitLab automation would have the tools we need to make that automatic.

UBports has an unpaid (or even paying!) community who want to get involved but may not have the skills or resources to help with writing the code. Reviewing issue reports is an essential task with a modest learning curve and can be done with limited resources such as time.

All that is required is to:

Ensure the report describes a problem
Ensure there is enough information to reproduce the problem
Ensure the affected device(s), channel(s) and version(s) are mentioned
Mark the issue as "checked"

It should be clear (but probably isn't) that this is something anyone with "5 minutes" can help with. It amounts to putting the issue into this outline:

Title
How to reproduce
Expected result
Actual result
Devices, channels & versions
Confirmed by

The above has to happen before additional steps, which require more knowledge or skills, can be taken:

Ensure the bug is reported against the right project
Assess the impact
Prioritize the work to fix
Do the work to fix
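The first-pass check described above is mechanical enough to sketch in a few lines of Python. This assumes the report has already been split into a dictionary keyed by the outline's section names; the function names are hypothetical and nothing here talks to a real issue tracker.

```python
# Section names taken from the outline above.
REQUIRED_SECTIONS = (
    "Title",
    "How to reproduce",
    "Expected result",
    "Actual result",
    "Devices, channels & versions",
    "Confirmed by",
)


def missing_sections(report: dict[str, str]) -> list[str]:
    """Return the outline sections that are absent or blank in the report."""
    return [name for name in REQUIRED_SECTIONS if not report.get(name, "").strip()]


def is_checked(report: dict[str, str]) -> bool:
    """A report can be marked "checked" once every section is filled in."""
    return not missing_sections(report)
```

A check like this only confirms that the sections are present. Whether the steps actually reproduce the problem still needs a person with a device and those "5 minutes".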

neopar

@unisuperbox said in Gotta release fast:

Who is your ideal user for the fourth channel? Right now, the split is fairly clear:

I'll try to clarify my point of view by mapping the suggested channels to the channels we currently have:

devel -> current devel
testing -> current rc
rc -> current stable
stable -> kind of LTS release; for users who don't want to download the entire rootfs image every week, but want to receive security updates

It is not relevant ATM, since most users are kind of beta testers. But it could be important when the community grows.

kugiigi

I don't know how much effort is needed for every stable release, but perhaps we can have a regular OTA and some kind of mini-OTA midway. This mini-OTA would include small bug fixes that most likely won't create regressions. An example of this is the layout issues in core apps. The regular OTA would be for bigger fixes and new features.

alan_g

@kugiigi said in Gotta release fast:

I don't know how much effort is needed for every stable release, but perhaps we can have a regular OTA and some kind of mini-OTA midway. This mini-OTA would include small bug fixes that most likely won't create regressions. An example of this is the layout issues in core apps. The regular OTA would be for bigger fixes and new features.

That sounds plausible, but I've worked (or consulted) on a lot of software projects and this is hard to make work. It is very difficult to manage updating both a released version and a development version without more than doubling the workload. It turns out there can be unexpected interactions between apparently small, safe changes.

It is better to be in a position where there's no need to manage two sets of changes. This does have risks (which is why devel breaks) and we have to mitigate those risks. One way is to release often so that the number of changes in each release is small, another is to have easy ways to revert to a previous good release, another is to release to a small group before rollout.

With the proposed use of rc, a group of "canaries" tries changes on rc first to identify unexpected issues.

Which reminds me: one essential test is that rc can revert back to stable in the event of problems.

dobey

@alan_g One should always be able to revert to the previous image in the same channel as well. The main place where this becomes problematic, is for things which are shipped as clicks, rather than as the more immutable system image. Reverting to previous clicks is nigh impossible (without already having the old click), and we don't have stable/testing/devel channels for things in the app store.

alan_g

@dobey said in Gotta release fast:

@alan_g One should always be able to revert to the previous image in the same channel as well. The main place where this becomes problematic, is for things which are shipped as clicks, rather than as the more immutable system image. Reverting to previous clicks is nigh impossible (without already having the old click), and we don't have stable/testing/devel channels for things in the app store.

That is good both for the user and for the channel owner. Is that supported?

dobey

@alan_g said in Gotta release fast:

That is good both for the user and for the channel owner. Is that supported?

I would say so, though there isn't UI for it at the moment. We could probably make some changes to the UI to make it easier to select any of the available images in a channel, to switch to (assuming we only keep N images in a channel, rather than all images ever built).

Currently one needs to do the revert either using ubuntu-device-flash or by using system-image-cli directly on the device via an adb/ssh shell.

Lakotaubp

@dobey Would be good to have an easy UI rather than having to resort to terminal etc. Only I don't know how difficult it would be to include.

UniSuperBox

For clicks:

The OpenStore can (and does) downgrade Clicks when it sees fit, but there is currently no way in the UI to select an older app version to download and install.

For images:

There is currently no way to select which image in a channel you wish to use. You have the latest, always.

dobey

@unisuperbox said in Gotta release fast:

There is currently no way to select which image in a channel you wish to use. You have the latest, always.

You can specify which revision of an image to install, and from which channel, with the two tools I mentioned. I presume the installer always uses the latest image and has no way to specify a build number, but I do not know with certainty, so I didn't mention it.

I think it would be pretty easy to add something to the System Settings updates/channels panel though.

UniSuperBox

As we near the OTA-6 release, I would like to gather what I believe are the most important points from this post:

Everyone wants "release faster", without a doubt.
"Release faster" depends on having enough confidence that software being released to stable is well-tested, which we do not have. Some things which would translate into more release confidence include:
- Automated tests on all system components
- Integration tests between system components
- Automated full-device testing, such as Canonical's fabled "Frankenstein device lab"
- Formalized manual testing by users who can confirm stability
Even with full release confidence (but especially without), an automatic release model requires the ability to roll back to a previous version of apps and the full system image
- Aside: I want to experiment with one of the new atomic release models like OSTree which allows agnostic packaging formats to be installed on top. From my quick foray, I see:
  - Benefits: Automatic and manual rollbacks, automatic diffing, and switchable roots
  - Drawbacks that make this a long-term or impossible idea for our existing devices
    - Probably requires newer kernel versions than 3.4 or 3.10
    - We don't have enough engineering power to concoct a solution which would allow converting a system-image system to OSTree in-flight, so we'd probably require users to manually switch.

I think that bringing up the edge channel as a place to do large migrations out-of-band with the normal release cadence is going to have a huge impact on how quickly we can release in the future. Previously, @mariogrip's work on bringing us to upstream Libhybris would have had to wait until after OTA-6 landed, and then it would have delayed OTA-7 until we had proper confidence in it. With edge, we can have people help us test these huge changes with the ability to roll back and without disrupting current users.

To respond to my earlier hindsight: OTA-6 was not (yet) taken away by TDS. We did not hit the original deadline set, but we will still be within our 6-8 week cycle when it releases. This includes our testing and release admin stage, which we are currently in. All of this while I'm not sweating bullets from pushing a release on ourselves before we're sure we're ready.

We have ended up in "Gotta release slower" in this cycle without a doubt. I hope to make the improvements we've identified here so "gotta release fast" can become a reality. I also hope to be able to take our release management up so that we hit 2-4 weeks development, 2 weeks testing. This means we'll get faster releases, but we still won't hit fast releases.

Important note for anyone interested in contributing to... basically anything I've said here: I'll be holding an OTA-7 development meeting at the start of the cycle where we'll discuss how we want to improve during the OTA-7 cycle and take bugs from the tracker for the release. Only assigned tickets will be added to this milestone, so if you have a pet bug that you want fixed, now is the time to get involved to help fix it! Subscribe to this GitLab issue for more information as the day draws nearer.