* [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
@ 2018-07-02 18:00 Lars Kurth
  2018-07-02 18:03 ` Lars Kurth
  0 siblings, 1 reply; 82+ messages in thread
From: Lars Kurth @ 2018-07-02 18:00 UTC (permalink / raw)
  To: xen-devel; +Cc: Doug Goldstein, Rich Persaud, committers, advisory-board



# Topics to discuss

### Release Cadence

2 years ago, we moved to a 6 monthly release cadence. The idea was to help companies get
features into Xen in a more predictable way. This appears not to have worked. At the same time,
the number of releases is creating problems for the security team and some downstreams. I
wanted to collect views to kick-start an e-mail discussion.

### Security Process:

See https://lists.xenproject.org/archives/html/xen-devel/2018-05/msg01127.html

### Other changes that may be worth highlighting ...



# Discussed topics

### Release Cadence: 6 months vs 9 months

We compared the pros and cons of both models

| 6 months                                       | 9 months                                     |
| ------                                         | ------                                       |
| Large features have to be broken up into parts | Less of an issue with 9 months               |
| 4 months development 2-3 months hardening      | 6 months development 3-4 months hardening    |
| Fixed release cycle, fairly predictable        | In the past we slipped these more than today |
| Security overhead                              | Less of a security overhead                  |
| Positive benefits did not materialize.         |                                              |
| Distros use a wider range of Xen versions      | Distros 1.5 years out of date                |



In terms of positive benefits: we have mainly been releasing on time, but have less
predictability on what makes it into a release. Also, contributors frequently miss their targeted
releases.



We then had a discussion around why the positive benefits didn't materialize:
* Andrew and a few others believe that the model isn't broken, but that the issue is with how we
  develop. In other words, moving to a 9-month model will *not* fix the underlying issues, but
  merely provide an incentive not to fix them.
* Issues highlighted were:
  * The 2-3 month stabilizing period is too long
  * Too much start/stop of development - we should branch earlier (we mainly do this on the last
    RC now). The serial period of development has essentially become too short. *Everyone* in the
    room agreed that fixing this is the *most important issue*.
  * Testing (aka OSSTEST) and CI automation is too fragile and we are too slow. Do we need another
    way of smoke testing which acts as a filter in front of OSSTEST and reduces the number of
    issues that block everyone? Also see the GitLab discussion
    (https://lists.xenproject.org/archives/html/xen-devel/2018-07/threads.html#00126).
    Also, we could do testing using QEMU for most cases which are not hardware specific, for
    faster turnaround, as ViryaOS is planning to do. This is slow, but can be offloaded into the
    cloud and could be triggered before a patch hits staging. Doug indicated he could help get the
    needed capacity for free.
  * Testing for end-users is too hard: we should not require them to build Xen. Again, with GitLab
    as per the previous discussion, we could deliver unsigned RPMs and debug builds for testing.
    If we were able to do this, it should help



Then there was a discussion on how we could make more downstreams coalesce around the same Xen
releases. This discussion was driven by Doug G, but we didn't come to a conclusion. If we were
able to do this, we could introduce a model similar to Debian's, where downstreams could pull
together and fund a longer support period for a Xen version which the majority of downstreams
use, without impacting the stable release maintainer.



Doug mentioned to me afterwards that there is a Xen Slack channel for distros which is very
active, where we could test this idea.



Another approach would be to follow the model of the Wayland and X11 communities, which have
different support periods for odd and even releases. They use the odd-unstable/even-stable style
of versioning. This may be worth investigating further.



Also, there was a discussion about Ubuntu, which struggled with its release model for
4 years, but stuck with it and essentially fixed the underlying issues.



*ACTION:* Lars volunteered to put together a document covering the release cycle, similar to the
one we produced for Security. The summit discussion provided some interesting pointers and
insights. As a first step, we ought to get a clearer picture of the pain points within the
release cycle that we experience today.



### Security Process

*Batches and timing:* Everyone present felt that informal batching is good (with the exception
of Doug G), but that we should not move to a Patch Tuesday model. For that to work, we would need
significantly more man-power in the security team than we have now. In addition, it would also
mean deferring *critical issues* merely to hit an arbitrary date, which was perceived as bad,
in particular by the downstream reps at the meeting. However, as in the xen-devel@ discussion,
there was no disagreement about codifying batching as an option in our security policy, as long
as it is flexible enough.



Again, there was a sense that some of the issues we are seeing could be resolved by:
* Better CI capability, as suggested in the Release Cadence discussion
* Improving some of the internal working practices of the security team
* Trying changes (such as improved batching) informally before we commit to them,
  e.g. the security team could try to work towards more predictable dates for batches rather
  than making a concrete process change



Note that we did not get to the stable baseline discussion, but it was highlighted that several
members of the security team also wear the hat of distro packagers for Debian and CentOS and
are starting to feel pain.



Lars noted that it may make sense to arrange a community call to make more progress on this discussion.


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-02 18:00 [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, Lars Kurth
@ 2018-07-02 18:03 ` Lars Kurth
  2018-07-03  6:26   ` Juergen Gross
                     ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Lars Kurth @ 2018-07-02 18:03 UTC (permalink / raw)
  To: xen-devel; +Cc: Doug Goldstein, Rich Persaud, committers, advisory-board

Non-html version: apologies
Lars

From: Lars Kurth <lars.kurth@citrix.com>
Date: Monday, 2 July 2018 at 19:00
To: xen-devel <xen-devel@lists.xenproject.org>
Cc: "committers@xenproject.org" <committers@xenproject.org>, Rich Persaud <persaur@gmail.com>, Doug Goldstein <cardoe@cardoe.com>, "advisory-board@lists.xenproject.org" <advisory-board@lists.xenproject.org>
Subject: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...

# Topics to discuss
### Release Cadence
2 years ago, we moved to a 6 monthly release cadence. The idea was to help companies get
features into Xen in a more predictable way. This appears not to have worked. At the same time,
the number of releases is creating problems for the security team and some downstreams. I
wanted to collect views to kick-start an e-mail discussion.
### Security Process: 
See https://lists.xenproject.org/archives/html/xen-devel/2018-05/msg01127.html
### Other changes that may be worth highlighting ...
 
# Discussed topics
### Release Cadence: 6 months vs 9 months
We compared the pros and cons of both models
| 6 months                                       | 9 months                                     |
| ------                                         | ------                                       |
| Large features have to be broken up into parts | Less of an issue with 9 months               |
| 4 months development 2-3 months hardening      | 6 months development 3-4 months hardening    |
| Fixed release cycle, fairly predictable        | In the past we slipped these more than today |
| Security overhead                              | Less of a security overhead                  |
| Positive benefits did not materialize.         |                                              |
| Distros use a wider range of Xen versions      | Distros 1.5 years out of date                |
 
In terms of positive benefits: we have mainly been releasing on time, but have less 
predictability on what makes it into a release. Also, contributors frequently miss their targeted 
releases.
 
We then had a discussion around why the positive benefits didn't materialize:
* Andrew and a few others believe that the model isn't broken, but that the issue is with how we
  develop. In other words, moving to a 9-month model will *not* fix the underlying issues, but
  merely provide an incentive not to fix them.
* Issues highlighted were:
  * The 2-3 month stabilizing period is too long
  * Too much start/stop of development - we should branch earlier (we mainly do this on the last
    RC now). The serial period of development has essentially become too short. *Everyone* in the
    room agreed that fixing this is the *most important issue*.
  * Testing (aka OSSTEST) and CI automation is too fragile and we are too slow. Do we need another
    way of smoke testing which acts as a filter in front of OSSTEST and reduces the number of
    issues that block everyone? Also see the GitLab discussion
    (https://lists.xenproject.org/archives/html/xen-devel/2018-07/threads.html#00126).
    Also, we could do testing using QEMU for most cases which are not hardware specific, for
    faster turnaround, as ViryaOS is planning to do. This is slow, but can be offloaded into the
    cloud and could be triggered before a patch hits staging. Doug indicated he could help get the
    needed capacity for free.
  * Testing for end-users is too hard: we should not require them to build Xen. Again, with GitLab
    as per the previous discussion, we could deliver unsigned RPMs and debug builds for testing.
    If we were able to do this, it should help
 
Then there was a discussion on how we could make more downstreams coalesce around the same Xen
releases. This discussion was driven by Doug G, but we didn't come to a conclusion. If we were
able to do this, we could introduce a model similar to Debian's, where downstreams could pull
together and fund a longer support period for a Xen version which the majority of downstreams
use, without impacting the stable release maintainer.
 
Doug mentioned to me afterwards that there is a Xen Slack channel for distros which is very
active, where we could test this idea.
 
Another approach would be to follow the model of the Wayland and X11 communities, which have
different support periods for odd and even releases. They use the odd-unstable/even-stable style
of versioning. This may be worth investigating further.
 
Also, there was a discussion about Ubuntu, which struggled with its release model for
4 years, but stuck with it and essentially fixed the underlying issues.
 
*ACTION:* Lars volunteered to put together a document covering the release cycle, similar to the
one we produced for Security. The summit discussion provided some interesting pointers and
insights. As a first step, we ought to get a clearer picture of the pain points within the
release cycle that we experience today.
 
### Security Process
*Batches and timing:* Everyone present felt that informal batching is good (with the exception
of Doug G), but that we should not move to a Patch Tuesday model. For that to work, we would need
significantly more man-power in the security team than we have now. In addition, it would also
mean deferring *critical issues* merely to hit an arbitrary date, which was perceived as bad,
in particular by the downstream reps at the meeting. However, as in the xen-devel@ discussion,
there was no disagreement about codifying batching as an option in our security policy, as long
as it is flexible enough.
 
Again, there was a sense that some of the issues we are seeing could be resolved by:
* Better CI capability, as suggested in the Release Cadence discussion
* Improving some of the internal working practices of the security team
* Trying changes (such as improved batching) informally before we commit to them,
  e.g. the security team could try to work towards more predictable dates for batches rather
  than making a concrete process change
 
Note that we did not get to the stable baseline discussion, but it was highlighted that several
members of the security team also wear the hat of distro packagers for Debian and CentOS and
are starting to feel pain.
 
Lars noted that it may make sense to arrange a community call to make more progress on this discussion.


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-02 18:03 ` Lars Kurth
@ 2018-07-03  6:26   ` Juergen Gross
  2018-07-03  7:00     ` Jan Beulich
  2018-07-03  8:06     ` Lars Kurth
  2018-07-03 10:07   ` Roger Pau Monné
  2018-07-05 17:48   ` Doug Goldstein
  2 siblings, 2 replies; 82+ messages in thread
From: Juergen Gross @ 2018-07-03  6:26 UTC (permalink / raw)
  To: Lars Kurth, xen-devel
  Cc: committers, Rich Persaud, Doug Goldstein, advisory-board

On 02/07/18 20:03, Lars Kurth wrote:
> Non-html version: apologies
> Lars
> 
> From: Lars Kurth <lars.kurth@citrix.com>
> Date: Monday, 2 July 2018 at 19:00
> To: xen-devel <xen-devel@lists.xenproject.org>
> Cc: "committers@xenproject.org" <committers@xenproject.org>, Rich Persaud <persaur@gmail.com>, Doug Goldstein <cardoe@cardoe.com>, "advisory-board@lists.xenproject.org" <advisory-board@lists.xenproject.org>
> Subject: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
> 
> # Topics to discuss
> ### Release Cadence
> 2 years ago, we moved to a 6 monthly release cadence. The idea was to help companies getting 
> features into Xen in a more predictable way. This appears not to have worked. At the same time, 
> the number of releases is creating problems for the security team and some downstreams. I 
> wanted to collect views to kick-start an e-mail discussion.
> ### Security Process: 
> See https://lists.xenproject.org/archives/html/xen-devel/2018-05/msg01127.html
> ### Other changes that may be worth highlighting ...
>  
> # Discussed topics
> ### Release Cadence: 6 months vs 9 months
> We compared the pros and cons of both models
> | 6 months                                       | 9 months                                     |
> | ------                                         | ------                                       |
> | Large features have to be broken up into parts | Less of an issue with 9 months               |
> | 4 months development 2-3 months hardening      | 6 months development 3-4 months hardening    |
> | Fixed release cycle, fairly predictable        | In the past we slipped these more than today |
> | Security overhead                              | Less of a security overhead                  |
> | Positive benefits did not materialize.         |                                              |
> | Distros use a wider range of Xen versions      | Distros 1.5 years out of date                |
>  
> In terms of positive benefits: we have mainly been releasing on time, but have less 
> predictability on what makes it into a release. Also, contributors frequently miss their targeted 
> releases.
>  
> We then had a discussion around why the positive benefits didn't materialize:
> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
>   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
>   merely provide an incentive not to fix them.

Keeping the 6-month schedule still leads to a substantial burden for
managing the stable releases.

> * Issues highlighted were:
>   * 2-3 months stabilizing period is too long

I absolutely agree.

>   * Too much start/stop of development - we should branch earlier (we mainly do this on the last 
>     RC now). The serial period of development has essentially become too short. *Everyone* in the 
>     room agreed that fixing this, is the *most important issue*.

While I'm really in favor of branching early, I fear that this will
further raise the burden on the few developers who need to backport
fixes to the just-branched release candidate. One approach to solving
this would be to accept a development patch only if it is accompanied
by the release backport.

>   * Testing (aka OSSTEST) and CI automation is too fragile and we are too slow. Do we need another
>     way of smoke testing which acts as a filter to OSSTEST and reduces the number of issues that 
>     block everyone - also see the Gitlab discussion 
>     (https://lists.xenproject.org/archives/html/xen-devel/2018-07/threads.html#00126)
>     Also, we could do testing using QEMU for most cases which are not hardware specific for faster 
>     turnaround as ViryaOS is planning on doing. This is slow, but can be offloaded into the cloud
>     and could be triggered before a patch hits staging. Doug indicated he could help get the needed 
>     capacity for free.

I really like automatic testing, but OSSTEST has its problems.

A major source of the pain seems to be the hardware: About half of all
cases where I looked into the test reports to find the reason for a
failing test flight were related to hardware failures. Not sure how to
solve that.

Another potential problem showed up last week: OSSTEST is using the
Debian servers for doing the basic installation. A change there (e.g.
a new point release) will block tests. I'd prefer to have a local cache
of the last known-good set of *.deb files to be used, especially for the
branched Xen versions. This would rule out remote problems for releases.
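
Just to sketch what I mean (the paths, snapshot name and sources.list
line below are made up, not anything osstest does today): a directory of
known-good .debs could be published as a trivial local APT repository
with dpkg-scanpackages and frozen per branch:

# Hypothetical snapshot of a known-good package set for a branched Xen
# version; everything here is a placeholder.
SNAP=/srv/osstest/debian-cache/stretch-20180701
mkdir -p "$SNAP"
cp /var/cache/apt/archives/*.deb "$SNAP"  # or debmirror a pinned point release

# Publish the snapshot as a minimal local APT repository.
cd "$SNAP"
dpkg-scanpackages -m . /dev/null | gzip -9c > Packages.gz

# Test hosts would then install from the frozen snapshot instead of the
# live Debian mirrors, e.g.:
#   deb [trusted=yes] file:///srv/osstest/debian-cache/stretch-20180701 ./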

>   * Testing for end-users is too hard: we should not require them to build Xen - again with GitLab 
>     as per previous discussion we could deliver unsigned RPMs and Debug Builds for testing. If we 
>     were able to do this, this should help 
>  
> Then there was a discussion on how we could make more downstreams coalesce around the same Xen 
> releases. This discussion was driven by Doug G, but we didn't come to a conclusion. If we were 
> able to do this, we could introduce a model similarly to Debian, were downstreams could pull 
> together and fund a longer support period for a Xen version which the majority of downstreams 
> use, without impacting the stable release maintainer. 
>  
> Doug mentioned to me afterwards, that there is a Xen Slack channel for distros which is very 
> active, where we could test this idea.
>  
> Another approach would be to follow a model as the Wayland and X11 communities do, which have 
> different support periods for odd and even releases. They use the odd-unstable/even-stable style 
> of versioning. This may be worth investigating further.

I believe just leaving out every second release (i.e. doing one release
per year) is a rough equivalent of that, with less overhead.

> Lars noted that I may make sense to arrange a community call to make more progress on this discussion.

Yes, please. I'd like to know how to do the next release. :-)


Juergen


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03  6:26   ` Juergen Gross
@ 2018-07-03  7:00     ` Jan Beulich
  2018-07-03  8:13       ` Lars Kurth
  2018-07-03  8:06     ` Lars Kurth
  1 sibling, 1 reply; 82+ messages in thread
From: Jan Beulich @ 2018-07-03  7:00 UTC (permalink / raw)
  To: Lars Kurth, Juergen Gross
  Cc: xen-devel, Rich Persaud, Doug Goldstein, advisory-board, committers

>>> On 03.07.18 at 08:26, <jgross@suse.com> wrote:
> On 02/07/18 20:03, Lars Kurth wrote:
>>   * Too much start/stop of development - we should branch earlier (we mainly do this on the last 
>>     RC now). The serial period of development has essentially become too short. *Everyone* in the 
>>     room agreed that fixing this, is the *most important issue*.
> 
> While I'm really in favor of branching early I fear that this will even
> raise the burden on some few developers who need to backport fixes to
> the just branched off release candidate. An approach to solve this would
> be to accept a development patch only in case it is accompanied by the
> release backport.

I think that would depend on when exactly we branch and whether, as
we do now, we try to avoid doing intrusive commits until the release
is done. Generally backports to the most recent stable tree (even
after its release) are pretty simple.

The thing I'd be worried about if we branched really early (say at the
first RC) is that people would focus even less on the release branch,
but pay attention only to what they want in the next version. To be
fair, looking at "for-next" patch submissions, this hasn't been as bad
this time as it had been during the 4.10 freeze, but I'd very much
expect the situation to become worse again if we formally started
the next development period early.

Fundamentally the problem can also be seen when looking at any of
the stable branches: The variety of authors there is significantly
narrower than for what goes into master. I understand people
mostly care about their features, but there ought to be a certain
level of responsibility beyond that by everyone. For example, I'd
sort of expect it to be the rule rather than the exception that
people look at nearby code or code they clone, and address issues
they see. At the risk of repeating myself, a large number of the
security issues found result from paying attention to nearby code
(also during code review). Looking over the list of reporters
very much supports my statement above regarding feature-submission
authors vs bug-fix ones.

Which reminds me of a related question: How do we define
maintainership? Is it really enough to ack a few patches here and
there to be considered a maintainer? To me, code maintenance
also (and perhaps first of all) means actively looking after the
code. And yes, I'm aware that an implication of this might be
the undesirable situation of us having more unmaintained code in
the tree and/or even larger bodies of code in even fewer hands.
So it is (as almost always) a matter of weighing pros and cons.

Jan




* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03  6:26   ` Juergen Gross
  2018-07-03  7:00     ` Jan Beulich
@ 2018-07-03  8:06     ` Lars Kurth
  2018-07-03 10:01       ` Jan Beulich
  2018-07-05 11:05       ` Ian Jackson
  1 sibling, 2 replies; 82+ messages in thread
From: Lars Kurth @ 2018-07-03  8:06 UTC (permalink / raw)
  To: Juergen Gross, xen-devel
  Cc: committers, Rich Persaud, Doug Goldstein, advisory-board, Matt Spencer



On 03/07/2018, 07:26, "Juergen Gross" <jgross@suse.com> wrote:
    On 02/07/18 20:03, Lars Kurth wrote:
    > We then had a discussion around why the positive benefits didn't materialize:
    > * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
    >   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
    >   merely provide an incentive not to fix them.
    
    Keeping the 6 month schedule still leads to a substantial burden for
    managing the stable releases.

That is true, which is why there was a discussion about maybe treating even/odd releases differently, and other models were discussed as well (see below). But that of course has an impact on downstreams and vendors.
    
    > * Issues highlighted were:
    >   * 2-3 months stabilizing period is too long
    
    I absolutely agree.

Could we try to establish a list of the underlying reasons? Issues with OSSTEST appear to be one of them, but this may not be the only one.
    
    >   * Too much start/stop of development - we should branch earlier (we mainly do this on the last 
    >     RC now). The serial period of development has essentially become too short. *Everyone* in the 
    >     room agreed that fixing this, is the *most important issue*.
    
    While I'm really in favor of branching early I fear that this will even
    raise the burden on some few developers who need to backport fixes to
    the just branched off release candidate. An approach to solve this would
    be to accept a development patch only in case it is accompanied by the
    release backport.
    
    >   * Testing (aka OSSTEST) and CI automation is too fragile and we are too slow. Do we need another
    >     way of smoke testing which acts as a filter to OSSTEST and reduces the number of issues that 
    >     block everyone - also see the Gitlab discussion 
    >     (https://lists.xenproject.org/archives/html/xen-devel/2018-07/threads.html#00126)
    >     Also, we could do testing using QEMU for most cases which are not hardware specific for faster 
    >     turnaround as ViryaOS is planning on doing. This is slow, but can be offloaded into the cloud
    >     and could be triggered before a patch hits staging. Doug indicated he could help get the needed 
    >     capacity for free.
    
    I really like automatic testing, but OSSTEST has its problems.

The GitLab discussion was really interesting. Looking at OSSTEST, it basically performs build tests and integration tests on hardware. While all of these are needed, build testing and testing of functionality that does not depend on hardware could be done earlier. The gist of the GitLab discussion was to "move" build testing - and possibly some basic integration testing - to the point of submitting a patch. The basic flow is:
* Someone posts a patch to the list => this starts the GitLab machinery
* The GitLab machinery does build tests (and the discussion showed that we should be able to do this via cross compilation or compilation on a real system if a service such as infosifter is used - @Matt: I can't find the company via Google, maybe the spelling in the minutes is wrong)
* This could eventually include a basic set of smoke tests that are system independent and could run under QEMU - Doug already uses a basic test where a Xen host and/or VM is started
* If it fails, a mail is sent in reply to the patch submission by a bot - that piece has been developed by Intel for QEMU and Linux and could be re-used

This would free up reviewers' time and also lead to issues being found earlier. In other words, OSSTEST would merely re-test what had been tested earlier and would focus on testing on real hardware. Thus, OSSTEST failures should become less likely. But obviously implementing such a system, even though all the pieces for it exist, will take some time.
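
To make that a little more concrete, a first-cut build job could be as simple as the following sketch (the series/ directory exported by a list-watching bot, the clone URL and the exact make invocation are assumptions, not an existing setup):

#!/bin/sh -e
# Hypothetical pre-OSSTEST CI job: build-test a patch series posted to the list.
# A second job could repeat the build with XEN_TARGET_ARCH=arm64 and
# CROSS_COMPILE=aarch64-linux-gnu- to catch architecture-specific breakage early.

git clone --depth 1 https://xenbits.xen.org/git-http/xen.git xen.git
cd xen.git
git am ../series/*.patch     # the series a bot exported from the mailing list

# Build the hypervisor only; tools and stubdom builds would follow the same
# pattern in further jobs.
make -j"$(nproc)" dist-xen

# If any step above fails, the surrounding machinery mails the submitter in
# reply to the posted series (the patchbot piece mentioned above).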

Another interesting piece of information is that OpenXT started looking at OSSTEST, but is now not going down this route. Maybe Rich can expand. 
    
    A major source of the pain seems to be the hardware: About half of all
    cases where I looked into the test reports to find the reason for a
    failing test flight were related to hardware failures. Not sure how to
    solve that.

This is worrying and I would like to get Ian Jackson's viewpoint on it. 
    
    Another potential problems showed up last week: OSSTEST is using the
    Debian servers for doing the basic installation. A change there (e.g.
    a new point release) will block tests. I'd prefer to have a local cache
    of the last known good set of *.deb files to be used especially for the
    branched Xen versions. This would rule out remote problems for releases.

This is again something which we should definitely look at.
    
    >   * Testing for end-users is too hard: we should not require them to build Xen - again with GitLab 
    >     as per previous discussion we could deliver unsigned RPMs and Debug Builds for testing. If we 
    >     were able to do this, this should help 
    >  
    > Then there was a discussion on how we could make more downstreams coalesce around the same Xen 
    > releases. This discussion was driven by Doug G, but we didn't come to a conclusion. If we were 
    > able to do this, we could introduce a model similarly to Debian, were downstreams could pull 
    > together and fund a longer support period for a Xen version which the majority of downstreams 
    > use, without impacting the stable release maintainer. 
    >  
    > Doug mentioned to me afterwards, that there is a Xen Slack channel for distros which is very 
    > active, where we could test this idea.
    >  
    > Another approach would be to follow a model as the Wayland and X11 communities do, which have 
    > different support periods for odd and even releases. They use the odd-unstable/even-stable style 
    > of versioning. This may be worth investigating further.
    
    I believe just leaving out every second release (i.e. doing one release
    per year) is a rough equivalent for that with less overhead.

Well, yes and no. A longer development cycle leads to less discipline and thus longer stabilization periods, at least in theory. I think for now we should park this and collect more information on why the stabilization period is so long, on OSSTEST, and on improving CI capability. All these activities benefit the release process whatever its length.
    
    > Lars noted that I may make sense to arrange a community call to make more progress on this discussion.
    
    Yes, please. I'd like to know how to do the next release. :-)

Until we decide on a different way forward, the default assumption would be that we stick with what we have. I am wondering whether we should use 30 minutes of the upcoming community call (next week) to discuss the release cycle (in particular for the next release). A lot of progress was made on most of the blocked feature work at the summit, so I think covering the release model should be OK.

Regards
Lars



* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03  7:00     ` Jan Beulich
@ 2018-07-03  8:13       ` Lars Kurth
  2018-07-03 10:13         ` Jan Beulich
  0 siblings, 1 reply; 82+ messages in thread
From: Lars Kurth @ 2018-07-03  8:13 UTC (permalink / raw)
  To: Jan Beulich, Juergen Gross
  Cc: advisory-board, Doug Goldstein, Rich Persaud, committers,
	xen-devel, Matt Spencer



On 03/07/2018, 08:00, "Jan Beulich" <JBeulich@suse.com> wrote:

    >>> On 03.07.18 at 08:26, <jgross@suse.com> wrote:
    > On 02/07/18 20:03, Lars Kurth wrote:
    >>   * Too much start/stop of development - we should branch earlier (we mainly do this on the last 
    >>     RC now). The serial period of development has essentially become too short. *Everyone* in the 
    >>     room agreed that fixing this, is the *most important issue*.
    > 
    > While I'm really in favor of branching early I fear that this will even
    > raise the burden on some few developers who need to backport fixes to
    > the just branched off release candidate. An approach to solve this would
    > be to accept a development patch only in case it is accompanied by the
    > release backport.
    
    I think that would depend on when exactly we branch and whether, as
    we do now, we try to avoid doing intrusive commits until the release
    was done. Generally backports to the most recent stable tree (even
    after its release) are pretty simple.
    
    The thing I'd be worried about if we branched really early (say at the
    first RC) is that people would focus even less on the release branch,
    but pay attention only to what they want in the next version. To be
    fair, looking at "for-next" patch submissions, this hasn't been as bad
    this time as it had been during the 4.10 freeze, but I'd very much
    expect the situation to become worse again if we formally started
    the next development period early.

I think no one is talking about RC1. My expectation would be around a month or so after the freeze (i.e. halfway through the freeze period). In practice we seem to be opting for the last or second-to-last RC today.
    
    Fundamentally the problem can as well be seen when looking at any
    of the stable branches: The variety of authors there is significantly
    more narrow than for what goes into master. I understand people
    mostly care about their features, but there ought to be a certain
    level of responsibility beyond that by everyone. For example, I'd
    sort of expect it to be the rule rather than the exception that
    people look at nearby code or code they clone, and address issues
    they see. At the risk of repeating myself, a large number of the
    security issues found results from paying attention to nearby code
    (also during code review). Looking over the list of reporters there
    very well supports my statement above regarding feature
    submission authors vs bug fix ones.

That is understood: if the project leadership agrees, then this is no issue, as committers essentially are the gatekeepers for what goes in. In other words, if committers are mainly focusing on getting a release out, then even if master is open during hardening, development would necessarily still be slower than before. I don't think any contributors would have an issue with this.
    
    Which reminds me of a related question: How do we define
    maintainership? Is it really enough to ack a few patches here and
    there to be considered a maintainer? To me, code maintenance
    also (and perhaps first of all) means actively looking after the
    code. And yes, I'm aware that an implication of the implication
    here might be the undesirable situation of us having more
    unmaintained code in the tree and/or even larger bodies of code
    in even fewer hands. So it is (as almost always) a matter of
    weighing pros and cons.
    
What would speak against elevating the more active maintainers to committers (maybe on probation for a fixed time period, and not yet responsible for THE REST)? Would this help in your view?

Regards
Lars
    


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03  8:06     ` Lars Kurth
@ 2018-07-03 10:01       ` Jan Beulich
  2018-07-03 10:14         ` Lars Kurth
  2018-07-05 11:05       ` Ian Jackson
  1 sibling, 1 reply; 82+ messages in thread
From: Jan Beulich @ 2018-07-03 10:01 UTC (permalink / raw)
  To: Lars Kurth
  Cc: Juergen Gross, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Matt Spencer

>>> On 03.07.18 at 10:06, <lars.kurth@citrix.com> wrote:
> The GitLab discussion was really interesting. Looking at OSSTEST, it 
> basically performs build test and integration tests on Hardware. Whereas all 
> these are needed, build testing and testing of functionality that does not 
> depend on hardware could be done earlier. The ghist of the GitLab discussion 
> was to "move" build testing - and possibly some basic integration testing - 
> to the point of submitting a patch. The basic flow is:
> * Someone posts a patch to the list => this will start the GitLab machinery 
> * The Gitlab machinery will do build tests (and the discussion showed that 
> we should be able to do this via cross compilation or compilation on a real 
> system if a service such as infosifter is used - @Matt: I can't find the 
> company via Google, maybe the spelling in the minutes is wrong)
> * This could eventually include a basic set of smoke tests that are system 
> independent and could run under QEMU - Doug already uses a basic test where a 
> xen host and/or VM is started 
> * If it fails, a mail is sent in reply to the patch submission by a bot - 
> that piece has been developed by Intel for QEMU and Linux and could be 
> re-used
> 
> This would free up time by reviewers and also leads to issues found earlier. 
> In other words, OSSTEST would merely re-test what had been tested earlier and 
> would focus on testing on real hardware. Thus, OSSTEST failures should become 
> less likely. But obviously implementing such a system, even though all the 
> pieces for it exist, will take some time.

But the problem is rarely with actual build issues, and much more
frequently with other hiccups. Plus osstest re-testing what had already
been tested elsewhere is not going to help osstest's bandwidth. Yet with
now 6 stable branches regularly needing testing, bandwidth is an issue.

Jan




* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-02 18:03 ` Lars Kurth
  2018-07-03  6:26   ` Juergen Gross
@ 2018-07-03 10:07   ` Roger Pau Monné
  2018-07-03 10:23     ` Lars Kurth
                       ` (2 more replies)
  2018-07-05 17:48   ` Doug Goldstein
  2 siblings, 3 replies; 82+ messages in thread
From: Roger Pau Monné @ 2018-07-03 10:07 UTC (permalink / raw)
  To: Lars Kurth
  Cc: xen-devel, Rich Persaud, Doug Goldstein, advisory-board, committers

On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
> We then had a discussion around why the positive benefits didn't materialize:
> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
>   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
>   merely provide an incentive not to fix them.
> * Issues highlighted were:
>   * 2-3 months stabilizing period is too long

I think one of the goals of the 6 month release cycle was to shrink
the stabilizing period, but it didn't turn out that way, and the
stabilizing period is quite similar with a 6 or a 9 month release
cycle.

Roger.


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03  8:13       ` Lars Kurth
@ 2018-07-03 10:13         ` Jan Beulich
  0 siblings, 0 replies; 82+ messages in thread
From: Jan Beulich @ 2018-07-03 10:13 UTC (permalink / raw)
  To: Lars Kurth
  Cc: Juergen Gross, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Matt Spencer

>>> On 03.07.18 at 10:13, <lars.kurth@citrix.com> wrote:
> On 03/07/2018, 08:00, "Jan Beulich" <JBeulich@suse.com> wrote:
>     Fundamentally the problem can as well be seen when looking at any
>     of the stable branches: The variety of authors there is significantly
>     more narrow than for what goes into master. I understand people
>     mostly care about their features, but there ought to be a certain
>     level of responsibility beyond that by everyone. For example, I'd
>     sort of expect it to be the rule rather than the exception that
>     people look at nearby code or code they clone, and address issues
>     they see. At the risk of repeating myself, a large number of the
>     security issues found results from paying attention to nearby code
>     (also during code review). Looking over the list of reporters there
>     very well supports my statement above regarding feature
>     submission authors vs bug fix ones.
> 
> That is understood: if the project leadership agrees, then this is no issue, 
> as committers essentially are the gate keepers for what goes in. So in other 
> words, if committers are mainly focussing on getting a release out, 
> necessarily even if master is open during hardening, development would still 
> be slower than before. I don't think any contributors would have an issue 
> with this.

You didn't get one of my main points then: It is _contributors_ whom
I'd expect to focus more on stabilization (in general, not just during
freezes).

>     Which reminds me of a related question: How do we define
>     maintainership? Is it really enough to ack a few patches here and
>     there to be considered a maintainer? To me, code maintenance
>     also (and perhaps first of all) means actively looking after the
>     code. And yes, I'm aware that an implication of the implication
>     here might be the undesirable situation of us having more
>     unmaintained code in the tree and/or even larger bodies of code
>     in even fewer hands. So it is (as almost always) a matter of
>     weighing pros and cons.
>     
> What would speak against elevating the more active maintainers to committers 
> (maybe on probation for a fixed time period, not yet responsible for THE 
> REST). Would this help in your view?

I don't think so, no. We need more active maintainers, not more
committers.

Jan



* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:01       ` Jan Beulich
@ 2018-07-03 10:14         ` Lars Kurth
  2018-07-03 11:29           ` Wei Liu
  0 siblings, 1 reply; 82+ messages in thread
From: Lars Kurth @ 2018-07-03 10:14 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Matt Spencer



On 03/07/2018, 11:01, "Jan Beulich" <JBeulich@suse.com> wrote:

    >>> On 03.07.18 at 10:06, <lars.kurth@citrix.com> wrote:
    > The GitLab discussion was really interesting. Looking at OSSTEST, it 
    > basically performs build test and integration tests on Hardware. Whereas all 
    > these are needed, build testing and testing of functionality that does not 
    > depend on hardware could be done earlier. The ghist of the GitLab discussion 
    > was to "move" build testing - and possibly some basic integration testing - 
    > to the point of submitting a patch. The basic flow is:
    > * Someone posts a patch to the list => this will start the GitLab machinery 
    > * The Gitlab machinery will do build tests (and the discussion showed that 
    > we should be able to do this via cross compilation or compilation on a real 
    > system if a service such as infosifter is used - @Matt: I can't find the 
    > company via Google, maybe the spelling in the minutes is wrong)
    > * This could eventually include a basic set of smoke tests that are system 
    > independent and could run under QEMU - Doug already uses a basic test where a 
    > xen host and/or VM is started 
    > * If it fails, a mail is sent in reply to the patch submission by a bot - 
    > that piece has been developed by Intel for QEMU and Linux and could be 
    > re-used
    > 
    > This would free up time by reviewers and also leads to issues found earlier. 
    > In other words, OSSTEST would merely re-test what had been tested earlier and 
    > would focus on testing on real hardware. Thus, OSSTEST failures should become 
    > less likely. But obviously implementing such a system, even though all the 
    > pieces for it exist, will take some time.
    
    But the problem is rarely with actual build issues, and much more
    frequently with other hickups. Plus osstest re-testing what had already
    been tested elsewhere is not going to help osstest's bandwidth. Yet with
    now 6 stable branches regularly needing testing, bandwidth is an issue.
    
OK, I didn't realize bandwidth is the primary issue. If it is, we can fix this part by throwing more HW at the problem.
Let me have a chat with Ian when I am in the office and come up with a list of issues from his perspective and feed this back into the thread.

Lars
    
    
    


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:07   ` Roger Pau Monné
@ 2018-07-03 10:23     ` Lars Kurth
  2018-07-03 10:47       ` Juergen Gross
  2018-07-03 10:30     ` Juergen Gross
  2018-07-04 15:26     ` George Dunlap
  2 siblings, 1 reply; 82+ messages in thread
From: Lars Kurth @ 2018-07-03 10:23 UTC (permalink / raw)
  To: Roger Pau Monne, 'Jan Beulich'
  Cc: xen-devel, Rich Persaud, Doug Goldstein, advisory-board, committers

Combined reply to Jan and Roger
Lars

On 03/07/2018, 11:07, "Roger Pau Monne" <roger.pau@citrix.com> wrote:

    On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
    > We then had a discussion around why the positive benefits didn't materialize:
    > * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
    >   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
    >   merely provide an incentive not to fix them.
    > * Issues highlighted were:
    >   * 2-3 months stabilizing period is too long
    
    I think one of the goals with the 6 month release cycle was to shrink
    the stabilizing period, but it didn't turn that way, and the
    stabilizing period is quite similar with a 6 or a 9 month release
    cycle.

Right: we need to establish what the reasons are:
* One has to do with a race condition between security issues and the desire to cut a release which has issues fixed in it. If I remember correctly, that has in effect almost added a month to the last few releases (more to this one). 
* One seems to have to do with issues with OSSTEST
* <Please add other reasons>

On 03/07/2018, 08:00, "Jan Beulich" <JBeulich@suse.com> wrote:
     Fundamentally the problem can as well be seen when looking at any
     of the stable branches: The variety of authors there is significantly
     more narrow than for what goes into master. I understand people
     mostly care about their features, but there ought to be a certain
     level of responsibility beyond that by everyone. For example, I'd
     sort of expect it to be the rule rather than the exception that
     people look at nearby code or code they clone, and address issues
     they see. At the risk of repeating myself, a large number of the
     security issues found results from paying attention to nearby code
     (also during code review). Looking over the list of reporters there
     very well supports my statement above regarding feature
     submission authors vs bug fix ones.

  That is understood: if the project leadership agrees, then this is no issue,
  as committers essentially are the gate keepers for what goes in. So in other
  words, if committers are mainly focussing on getting a release out,
  necessarily even if master is open during hardening, development would still 
  be slower than before. I don't think any contributors would have an issue
  with this.

  You didn't get one of my main points then: It is _contributors_ whom
  I'd expect to focus more on stabilization (in general, not just during
  freezes).

You are right: I misunderstood. You talked about people, which is ambiguous.
I am wondering how we can encourage different behaviour. Have to think about this. 

Lars


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:07   ` Roger Pau Monné
  2018-07-03 10:23     ` Lars Kurth
@ 2018-07-03 10:30     ` Juergen Gross
  2018-07-04 15:26     ` George Dunlap
  2 siblings, 0 replies; 82+ messages in thread
From: Juergen Gross @ 2018-07-03 10:30 UTC (permalink / raw)
  To: Roger Pau Monné, Lars Kurth
  Cc: xen-devel, Rich Persaud, Doug Goldstein, advisory-board, committers

On 03/07/18 12:07, Roger Pau Monné wrote:
> On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
>> We then had a discussion around why the positive benefits didn't materialize:
>> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
>>   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
>>   merely provide an incentive not to fix them.
>> * Issues highlighted were:
>>   * 2-3 months stabilizing period is too long
> 
> I think one of the goals with the 6 month release cycle was to shrink
> the stabilizing period, but it didn't turn that way, and the
> stabilizing period is quite similar with a 6 or a 9 month release
> cycle.

I guess that is to be expected.

It's not as if OSSTEST would run only during the stabilizing period.
The stabilizing period will be used to catch the bugs introduced since
the last successful OSSTEST run (which shouldn't be older than about
one month) and to find some rarely triggering bugs, which could easily
be present in older releases, too (e.g. the hypercall buffer issue
found only now).

<sarcasm>
This would lead to the conclusion that the stabilizing period can be
made shorter only by shortening the development period to less than
the average time between OSSTEST pushes.

Or IOW: we can maximize the ratio development/stabilizing by making
the development period as long as possible.
</sarcasm>


Juergen



* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:23     ` Lars Kurth
@ 2018-07-03 10:47       ` Juergen Gross
  2018-07-03 11:24         ` Lars Kurth
                           ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Juergen Gross @ 2018-07-03 10:47 UTC (permalink / raw)
  To: Lars Kurth, Roger Pau Monne, 'Jan Beulich'
  Cc: xen-devel, Rich Persaud, Doug Goldstein, advisory-board, committers

On 03/07/18 12:23, Lars Kurth wrote:
> Combined reply to Jan and Roger
> Lars
> 
> On 03/07/2018, 11:07, "Roger Pau Monne" <roger.pau@citrix.com> wrote:
> 
>     On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
>     > We then had a discussion around why the positive benefits didn't materialize:
>     > * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
>     >   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
>     >   merely provide an incentive not to fix them.
>     > * Issues highlighted were:
>     >   * 2-3 months stabilizing period is too long
>     
>     I think one of the goals with the 6 month release cycle was to shrink
>     the stabilizing period, but it didn't turn that way, and the
>     stabilizing period is quite similar with a 6 or a 9 month release
>     cycle.
> 
> Right: we need to establish what the reasons are:
> * One has to do with a race condition between security issues and the desire to cut a release which has issues fixed in it. If I remember correctly, that has in effect almost added a month to the last few releases (more to this one). 

The only way to avoid that would be to not allow any security fixes to
be included in the release in the last few weeks before the planned
release date. I don't think this is a good idea. I'd rather miss the
planned release date.

BTW: the problem wasn't waiting for the security patches, but that some
of those patches needed further fixes. And this is something you can
never rule out. And waiting for those fixes meant new security fixes
became ready...

> * One seems to have to do with issues with OSSTEST

... which in turn led to more security fixes being available.

> * <Please add other reasons>

We didn't look at the sporadically failing tests thoroughly enough. The
hypercall buffer failure has been there for ages; a newer kernel just
made it more probable. Catching it earlier would have saved us some weeks.


Juergen


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:47       ` Juergen Gross
@ 2018-07-03 11:24         ` Lars Kurth
  2018-07-05 11:16         ` Ian Jackson
  2018-07-05 17:58         ` Doug Goldstein
  2 siblings, 0 replies; 82+ messages in thread
From: Lars Kurth @ 2018-07-03 11:24 UTC (permalink / raw)
  To: Juergen Gross, Roger Pau Monne, 'Jan Beulich'
  Cc: xen-devel, Rich Persaud, Doug Goldstein, committers

Taking the advisory board list off the CC list: will summarize when we have more of a plan forward

On 03/07/2018, 11:47, "Juergen Gross" <jgross@suse.com> wrote:

    On 03/07/18 12:23, Lars Kurth wrote:
    > Combined reply to Jan and Roger
    > Lars
    > 
    > On 03/07/2018, 11:07, "Roger Pau Monne" <roger.pau@citrix.com> wrote:
    > 
    >     On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
    >     > We then had a discussion around why the positive benefits didn't materialize:
    >     > * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
    >     >   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
    >     >   merely provide an incentive not to fix them.
    >     > * Issues highlighted were:
    >     >   * 2-3 months stabilizing period is too long
    >     
    >     I think one of the goals with the 6 month release cycle was to shrink
    >     the stabilizing period, but it didn't turn that way, and the
    >     stabilizing period is quite similar with a 6 or a 9 month release
    >     cycle.
    > 
    > Right: we need to establish what the reasons are:
    > * One has to do with a race condition between security issues and the desire to cut a release which has issues fixed in it. If I remember correctly, that has in effect almost added a month to the last few releases (more to this one). 
    
    The only way to avoid that would be to not allow any security fixes to
    be included in the release the last few weeks before the planned release
    date. I don't think this is a good idea. I'd rather miss the planned
    release date.

This partially comes back to opening master earlier. When we are at the stage where we are only waiting for security issues, we should already have opened master. Although in this case, we also had
    
    BTW: the problem wasn't waiting for the security patches, but some
    fixes for those needed. And this is something you can never rule out.
    And waiting for the fixes meant new security fixes being ready...

That is of course true. And some of the side-channel attack mitigations are complex and large, and introduce more risk than more traditional fixes.
    
    > * One seems to have to do with issues with OSSTEST
    
    ... which in turn led to more security fixes being available.

Agreed: because we didn't release when we planned, another set of security fixes pushed out the release. 
    
    > * <Please add other reasons>
    
    We didn't look at the sporadic failing tests thoroughly enough. The
    hypercall buffer failure has been there for ages, a newer kernel just
    made it more probable. This would have saved us some weeks.

That is certainly something we could look at. It seems to me that there is a dynamic where, because there is too much noise (random issues, HW issues), we ignore OSSTEST too often. I am wondering whether there is a way of mapping some tests to maintainers. Maintainers should certainly care about test failures in their respective areas, but to make this practical we need a way to map failures to the right people and to CC them on reports. We could also potentially use get_maintainer.pl on the patches which are being tested (aka the staging => master transition), but we would need to know that a test was "clean" before. Maybe we need to build in an effort to deal with the sporadically failing tests: e.g. a commit moratorium until we get to a better base state.
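
To make this more concrete, here is a very rough sketch (untested and purely illustrative: the repository path is made up, and I am assuming the tree's scripts/get_maintainer.pl behaves like the Linux one) of how the patches in flight could be mapped to the people to CC on a failure report:

    import subprocess

    def changed_files(repo, rev_range="master..staging"):
        # Files touched by the patches currently in flight (staging => master).
        out = subprocess.check_output(
            ["git", "-C", repo, "diff", "--name-only", rev_range], text=True)
        return [f for f in out.splitlines() if f]

    def maintainers_for(repo, files):
        # Ask the tree's get_maintainer.pl who covers each touched file.
        cc = set()
        for f in files:
            out = subprocess.check_output(
                ["perl", "scripts/get_maintainer.pl", "-f", f],
                cwd=repo, text=True)
            cc.update(l.strip() for l in out.splitlines() if "@" in l)
        return sorted(cc)

    if __name__ == "__main__":
        repo = "/path/to/xen.git"   # hypothetical local checkout
        print("\n".join(maintainers_for(repo, changed_files(repo))))

Something along those lines could run as part of whatever sends out the failure mails, so reports land with the maintainers of the code that actually changed.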

I also think that, from a psychological viewpoint, having some test capability at patch posting time, with a patchbot rejecting failing patches, would change the contribution dynamic significantly. In other words, it would make dealing with quality issues part of the contribution process, which currently often seems to be deferred until commit time and/or release hardening time. Just a thought.
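
To illustrate the kind of patchbot reply I mean (a sketch only: the addresses, SMTP setup and verdict text are all made up, and the existing Intel tooling may work quite differently):

    import smtplib
    from email.message import EmailMessage

    def send_test_result(orig_message_id, orig_subject, verdict, log_excerpt):
        # Reply to the patch submission so the result shows up in the thread.
        msg = EmailMessage()
        msg["From"] = "patchbot@example.org"              # hypothetical bot
        msg["To"] = "xen-devel@lists.xenproject.org"
        msg["Subject"] = "Re: " + orig_subject
        msg["In-Reply-To"] = orig_message_id              # thread the reply
        msg["References"] = orig_message_id
        msg.set_content("Automated build/smoke test result: %s\n\n%s"
                        % (verdict, log_excerpt))
        with smtplib.SMTP("localhost") as smtp:           # assumes a local MTA
            smtp.send_message(msg)

The important bit is the In-Reply-To header, so the result lands in the same thread as the patch on the list.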

Also, coming back to Jan's bandwidth issue: if we had a set of more generic tests that can be offloaded into a cloud instance (e.g. via testing on QEMU), then we could reserve OSSTEST for tests which require hardware, thus potentially reducing bottlenecks. I am also wondering whether the bottleneck we are seeing is caused by the lack of good Arm test hardware (aka is that the critical path for the entire system): if so, maybe the two things can somehow be de-coupled.
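
For the QEMU side, I am imagining something as simple as the sketch below (assuming we had a disk image with Xen pre-installed whose dom0 prints a known marker on the serial console once it is up; the image name, marker and timeout are made up):

    import subprocess, time

    def qemu_smoke_test(image="xen-smoke.qcow2",
                        marker="XEN SMOKE TEST OK", timeout=600):
        # Boot the image headless and watch the serial console for the marker.
        cmd = ["qemu-system-x86_64", "-m", "2048", "-nographic", "-no-reboot",
               "-drive", "file=%s,format=qcow2" % image]
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        deadline = time.time() + timeout
        passed = False
        try:
            for line in proc.stdout:
                if marker in line:
                    passed = True
                    break
                if time.time() > deadline:   # crude timeout; a real harness
                    break                    # would kill the VM asynchronously
        finally:
            proc.kill()
        return passed

This is slow compared to real hardware, but it needs nothing special from the cloud instance it runs on, so it can be parallelised cheaply.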

These ideas are fairly half-baked right now, so I am opening them up for discussion. I wanted to get a good amount of input before we discuss this at the community call.

Regards
Lars
 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:14         ` Lars Kurth
@ 2018-07-03 11:29           ` Wei Liu
  0 siblings, 0 replies; 82+ messages in thread
From: Wei Liu @ 2018-07-03 11:29 UTC (permalink / raw)
  To: Lars Kurth
  Cc: Juergen Gross, Wei Liu, advisory-board, Doug Goldstein,
	Rich Persaud, committers, Jan Beulich, xen-devel, Matt Spencer

On Tue, Jul 03, 2018 at 10:14:06AM +0000, Lars Kurth wrote:
> 
> 
> On 03/07/2018, 11:01, "Jan Beulich" <JBeulich@suse.com> wrote:
> 
>     >>> On 03.07.18 at 10:06, <lars.kurth@citrix.com> wrote:
>     > The GitLab discussion was really interesting. Looking at OSSTEST, it 
>     > basically performs build test and integration tests on Hardware. Whereas all 
>     > these are needed, build testing and testing of functionality that does not 
>     > depend on hardware could be done earlier. The gist of the GitLab discussion 
>     > was to "move" build testing - and possibly some basic integration testing - 
>     > to the point of submitting a patch. The basic flow is:
>     > * Someone posts a patch to the list => this will start the GitLab machinery 
>     > * The Gitlab machinery will do build tests (and the discussion showed that 
>     > we should be able to do this via cross compilation or compilation on a real 
>     > system if a service such as infosifter is used - @Matt: I can't find the 
>     > company via Google, maybe the spelling in the minutes is wrong)
>     > * This could eventually include a basic set of smoke tests that are system 
>     > independent and could run under QEMU - Doug already uses a basic test where a 
>     > xen host and/or VM is started 
>     > * If it fails, a mail is sent in reply to the patch submission by a bot - 
>     > that piece has been developed by Intel for QEMU and Linux and could be 
>     > re-used
>     > 
>     > This would free up time by reviewers and also leads to issues found earlier. 
>     > In other words, OSSTEST would merely re-test what had been tested earlier and 
>     > would focus on testing on real hardware. Thus, OSSTEST failures should become 
>     > less likely. But obviously implementing such a system, even though all the 
>     > pieces for it exist, will take some time.
>     
>     But the problem is rarely with actual build issues, and much more
>     frequently with other hiccups. Plus osstest re-testing what had already
>     been tested elsewhere is not going to help osstest's bandwidth. Yet with
>     now 6 stable branches regularly needing testing, bandwidth is an issue.
>     
> OK, I didn't realize bandwidth is the primary issue. If it is, we can fix this part by throwing more HW at the problem.
> Let me have a chat with Ian when I am in the office and come up with a list of issues from his perspective and feed this back into the thread.

Throwing in more hardware is good, but that also brings more hardware and
(non-Xen) software related issues. We need to weigh the pros and cons of
this carefully.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:07   ` Roger Pau Monné
  2018-07-03 10:23     ` Lars Kurth
  2018-07-03 10:30     ` Juergen Gross
@ 2018-07-04 15:26     ` George Dunlap
  2018-07-04 15:47       ` Ian Jackson
                         ` (2 more replies)
  2 siblings, 3 replies; 82+ messages in thread
From: George Dunlap @ 2018-07-04 15:26 UTC (permalink / raw)
  To: Roger Pau Monne
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel



> On Jul 3, 2018, at 11:07 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> 
> On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
>> We then had a discussion around why the positive benefits didn't materialize:
>> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
>>   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
>>   merely provide an incentive not to fix them.
>> * Issues highlighted were:
>>   * 2-3 months stabilizing period is too long
> 
> I think one of the goals with the 6 month release cycle was to shrink
> the stabilizing period, but it didn't turn that way, and the
> stabilizing period is quite similar with a 6 or a 9 month release
> cycle.

Right, and I think this was something that wasn’t quite captured in Lars’ summary.

Everyone agreed:
1. The expectation was that a shorter release cycle would lead to shorter stabilization periods
2. This has not turned out to be the case, which means
3. At the moment, our “time doing development” to “time fixing bugs for a release” ratio is far too low.

One option to fix #3 is to go back to a 9-month cycle (or even a 12-month cycle), which would increase the “development” part of the equation.

But Doug was advocating trying instead to attack the “time fixing bugs” part of the equation.  He said he was a big fan of “continuous delivery” — of being *always* ready to release.  And I think there’s a fair amount of agreement that one of the reasons it takes so long to stabilize is that our testing isn’t reliably catching bugs for whatever reason.

So a fair amount of the discussion was about what it would look like, and what it would take, to make it such that almost any push from osstest (or whatever testing infrastructure we went with) could reasonably be released, and would have a very low expectation of having extraneous bugs.

I seem to recall saying that even if we agreed that moving to continuous delivery was a goal we wanted to pursue, we would still be several years away from achieving anything like it; and so in the mean time, it would probably make sense to move back to a 9-month cycle while we attack the problem.

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-04 15:26     ` George Dunlap
@ 2018-07-04 15:47       ` Ian Jackson
  2018-07-04 15:59         ` Steven Haigh
  2018-07-04 15:51       ` Steven Haigh
  2018-07-05  7:53       ` Wei Liu
  2 siblings, 1 reply; 82+ messages in thread
From: Ian Jackson @ 2018-07-04 15:47 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Roger Pau Monne

George Dunlap writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> I seem to recall saying that even if we agreed that moving to continuous delivery was a goal we wanted to pursue, we would still be several years away from achieving anything like it; and so in the mean time, it would probably make sense to move back to a 9-month cycle while we attack the problem.

Another thing is that, as our window of N years'
security-supported releases has filled up with ~6-month releases,
there are more of them.

I know we had concerns that this makes backporting harder.  I'm not
really sure that's true.  The total amount of backporting lossage
(merge conflicts etc.) is the same, and trivial automatic backports
are nearly no work.

But one thing that is noticeable is that this significantly increases
our test load when a security update comes out.  Each
security-supported branch gets updates, and osstest suddenly needs to
test them all.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-04 15:26     ` George Dunlap
  2018-07-04 15:47       ` Ian Jackson
@ 2018-07-04 15:51       ` Steven Haigh
  2018-07-05  7:53       ` Wei Liu
  2 siblings, 0 replies; 82+ messages in thread
From: Steven Haigh @ 2018-07-04 15:51 UTC (permalink / raw)
  To: xen-devel
  Cc: Lars Kurth, advisory-board, Doug Goldstein, George Dunlap,
	Rich Persaud, committers, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 4458 bytes --]

On Thursday, 5 July 2018 1:26:16 AM AEST George Dunlap wrote:
> > On Jul 3, 2018, at 11:07 AM, Roger Pau Monné <roger.pau@citrix.com>
> > wrote:
> > 
> > On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
> > 
> >> We then had a discussion around why the positive benefits didn't
> >> materialize:
> >> * Andrew and a few other believe that the model isn't broken, but that
> >>   the issue is with how we develop. In other words, moving to a 9 months
> >>   model will *not* fix the underlying issues, but merely provide an
> >>   incentive not to fix them.
> >> 
> >> * Issues highlighted were:
> >> 
> >>   * 2-3 months stabilizing period is too long
> > 
> > 
> > I think one of the goals with the 6 month release cycle was to shrink
> > the stabilizing period, but it didn't turn that way, and the
> > stabilizing period is quite similar with a 6 or a 9 month release
> > cycle.
> 
> 
> Right, and I think this was something that wasn’t quite captured in Lars’
> summary.
> 
> Everyone agreed:
> 1. The expectation was that a shorter release cycle would lead to shorter
> stabilization periods
> 2. This has not turned out to be the case, which means
> 3 At the moment, our “time doing development” to “time fixing bugs for a
> release” ratio is far too low.
> 
> One option to fix #3 is to go back to a 9-month cycle (or even a 12-month
> cycle), which would increase the “development” part of the equation.
> 
> But Doug was advocating trying instead to attack the “time fixing bugs” part
> of the equation.  He said he was a big fan “continuous delivery” — of being
> *always* ready to release.  And I think there’s a fair amount of agreement
> that one of the reasons it takes so long to stabilize is that our testing
> isn’t reliably catching bugs for whatever reason.

On this point alone, release quickly, release often. The kernel for instance 
runs about 1 release per week on the stable branches - sometimes 2.

With a regular process, releases should be easy to achieve - and that means 
automation, automation and more automation.

From my end as a 'consumer' of this process, a regular cadence makes it easy 
for me to rebase work on each release. In fact, it usually takes me 
less than 30 minutes to update, compile, package and release kernel builds - 
and all of that is compile time. I check kernel.org every 6 hours, and fire 
off automated builds as needed. The reality is, I could easily make this 
hourly and reduce time from release to package to less than an hour - but at 
that point is it worth it? :)

The longest delay in getting this to clients is the time it takes for 
non-local mirrors to catch up.
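
To give an idea of how little is involved, the poller is essentially something like this (illustrative only, not my actual scripts - the remote URL, state file and build script are placeholders):

    import subprocess

    REMOTE = "https://git.example.org/project.git"   # placeholder remote
    SEEN = "seen-tags.txt"                           # local state file

    def remote_tags():
        out = subprocess.check_output(
            ["git", "ls-remote", "--tags", REMOTE], text=True)
        return {line.split("refs/tags/")[-1].rstrip("^{}")
                for line in out.splitlines() if "refs/tags/" in line}

    def main():
        try:
            seen = set(open(SEEN).read().split())
        except FileNotFoundError:
            seen = set()
        new = sorted(remote_tags() - seen)
        for tag in new:
            # Kick off the packaging job for each new release tag.
            subprocess.run(["./build-package.sh", tag], check=True)
        if new:
            open(SEEN, "w").write("\n".join(sorted(seen | set(new))))

    if __name__ == "__main__":
        main()

Run something like that from cron every few hours and the packages largely take care of themselves.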
 
> So a fair amount of the discussion was about what it would look like, and
> what it would take, to make it such that almost any push from osstest (or
> whatever testing infrasctructure we went with) could reasonably be
> released, and would have a very low expectation of having extraneous bugs.

The key here is testing. If we started at the release end of the tree, 
something as simple as a git tag triggering a tarball export of the tree, 
versioned as per the tag, may well be a huge step forward in the release 
process.
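
For illustration, the tag-to-tarball step could be as small as the sketch below (a sketch only: real release tarballs may want extra steps such as pre-generated docs, and the RELEASE-x.y.z tag name is just an example):

    import subprocess

    def export_tarball(repo, tag):
        # Produce a tarball of the tree exactly as tagged, named after the tag.
        out = "%s.tar.gz" % tag
        with open(out, "wb") as f:
            subprocess.run(
                ["git", "-C", repo, "archive", "--format=tar.gz",
                 "--prefix=%s/" % tag, tag],
                stdout=f, check=True)
        return out

    # e.g. export_tarball("/path/to/xen.git", "RELEASE-4.11.0")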

Time can then be taken to tweak the testing side as we find things that 
become obvious through the rapid release process.

> I seem to recall saying that even if we agreed that moving to continuous
> delivery was a goal we wanted to pursue, we would still be several years
> away from achieving anything like it; and so in the mean time, it would
> probably make sense to move back to a 9-month cycle while we attack the
> problem.

Honestly, as a packager, I don't see ground-breaking changes in any version of 
Xen that is currently released. Most are optimisations like the new PVH code.

My point here is: are there really enough major features that would break 
everything to require a new major release every 9 months?

Would 12 months for major features be more suitable, keeping smaller 
refinements / additions as point releases?

With a solid base via testing, there's no real reason why a weekly (or daily) 
release wouldn't be technically feasible - apart from not having enough 
changes to justify it ;)

-- 
Steven Haigh

📧 netwiz@crc.id.au       💻 https://www.crc.id.au
📞 +61 (3) 9001 6090    📱 0412 935 897

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-04 15:47       ` Ian Jackson
@ 2018-07-04 15:59         ` Steven Haigh
  0 siblings, 0 replies; 82+ messages in thread
From: Steven Haigh @ 2018-07-04 15:59 UTC (permalink / raw)
  To: xen-devel
  Cc: Lars Kurth, advisory-board, Doug Goldstein, George Dunlap,
	Rich Persaud, committers, Ian Jackson, Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 1932 bytes --]

On Thursday, 5 July 2018 1:47:27 AM AEST Ian Jackson wrote:
> George Dunlap writes ("Re: [Xen-devel] [Notes for xen summit 2018 design 
session] Process changes: is the 6 monthly release Cadence too short, Security 
Process, ..."):
> > I seem to recall saying that even if we agreed that moving to continuous
> > delivery was a goal we wanted to pursue, we would still be several years
> > away from achieving anything like it; and so in the mean time, it would
> > probably make sense to move back to a 9-month cycle while we attack the
> > problem.
> Another thing that is that as our window of N years'
> security-supported releases has filled up with ~6-month releases,
> there are more of them.
> 
> I know we had concerns that this makes backporting harder.  I'm not
> really sure that's true.  The total amount of backporting lossage
> (merge conflicts etc.) is the same, and trivial automatic backports
> are nearly no work.
> 
> But one thing that is noticeable is that this significantly increases
> our test load when a security update comes out.  Each
> security-supported branch gets updates, and osstest suddenly needs to
> test them all.

I did like the idea of an 'LTS' vs 'testing' release - the idea that odd / 
even versions (or similar) would vary in lifecycle, allowing entire versions 
to be killed off rapidly.

If we had 'LTS' support at 2 (or more?) years, and 'testing' support at 6 (or 
less?) months, would this help?

I guess the idea would be that the 'testing' versions are where all the rapid 
features are added, which are then 'frozen' into an LTS release at some point 
to only get security fixes via point releases (a la linux kernel version type 
bumping).

Is this even practical? I guess this would depend on how often an LTS release 
is spawned...

-- 
Steven Haigh

📧 netwiz@crc.id.au       💻 https://www.crc.id.au
📞 +61 (3) 9001 6090    📱 0412 935 897

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-04 15:26     ` George Dunlap
  2018-07-04 15:47       ` Ian Jackson
  2018-07-04 15:51       ` Steven Haigh
@ 2018-07-05  7:53       ` Wei Liu
  2018-07-05  8:06         ` Roger Pau Monné
                           ` (3 more replies)
  2 siblings, 4 replies; 82+ messages in thread
From: Wei Liu @ 2018-07-05  7:53 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, Wei Liu, advisory-board, Doug Goldstein,
	Rich Persaud, committers, xen-devel, Roger Pau Monne

On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> 
> 
> > On Jul 3, 2018, at 11:07 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> > 
> > On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
> >> We then had a discussion around why the positive benefits didn't materialize:
> >> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
> >>   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
> >>   merely provide an incentive not to fix them.
> >> * Issues highlighted were:
> >>   * 2-3 months stabilizing period is too long
> > 
> > I think one of the goals with the 6 month release cycle was to shrink
> > the stabilizing period, but it didn't turn that way, and the
> > stabilizing period is quite similar with a 6 or a 9 month release
> > cycle.
> 
> Right, and I think this was something that wasn’t quite captured in Lars’ summary.
> 
> Everyone agreed:
> 1. The expectation was that a shorter release cycle would lead to shorter stabilization periods
> 2. This has not turned out to be the case, which means
> 3 At the moment, our “time doing development” to “time fixing bugs for a release” ratio is far too low.
> 
> One option to fix #3 is to go back to a 9-month cycle (or even a
> 12-month cycle), which would increase the “development” part of the
> equation.

You get more changes in, you also get more bugs.  Assuming bugs are
introduced at a constant rate in relation to changes, moving back to 9
months won't help.

At least in my experience, a majority of time during the freeze is spent
on *waiting*. Waiting for osstest to turn around, waiting for security
issues to become public. Moving to 9 months won't change those factors.

A typical bug would need five working days (one week) to fix.

1. Someone or osstest reports a bug. (Day 1)
2. Someone analyses it and writes a patch. (Day 2)
3. Someone reviews it. (Day 2 or 3).
4. Someone commits it. (Day 3 or 4).
5. Osstest produces test results (Day 3 to 5).

For a simple bug, we might finish 1-4 in one day. But we still need to
allow for at least two days to get a push.

In reality, a number of factors actually prolong getting things fixed
(in the sense that patches are pushed to master): 1. bug fixes are
incomplete; 2. hardware issues in test system; 3. other random hiccups.
Should any of these happen, another 2 to 3 days is required to get
patches pushed.

Osstest is really resource-intensive and heavyweight. We need to think of
a way to reduce its turnaround time, or we can introduce other
auxiliary test systems to reduce its burden.

> 
> But Doug was advocating trying instead to attack the “time fixing
> bugs” part of the equation.  He said he was a big fan “continuous
> delivery” — of being *always* ready to release.  And I think there’s a
> fair amount of agreement that one of the reasons it takes so long to
> stabilize is that our testing isn’t reliably catching bugs for
> whatever reason.

I also think CD is a good idea.

Is there anything technical that stops us from doing this? I don't think so.
We just need more automation. Maintainers push tags, and tarballs are
automatically produced. Currently the process involves a lot of manual
work. Frankly, I don't think that's a good use of people's time.

> 
> So a fair amount of the discussion was about what it would look like,
> and what it would take, to make it such that almost any push from
> osstest (or whatever testing infrasctructure we went with) could
> reasonably be released, and would have a very low expectation of
> having extraneous bugs.

I would also like to advocate changing the mentality a bit. The current
mentality is that "we want to be reasonably sure there is low
expectation of bugs before we can release". Why not change to "we
release when we're sure there is definitely improvement in the tree
compared to last release"?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  7:53       ` Wei Liu
@ 2018-07-05  8:06         ` Roger Pau Monné
  2018-07-05  8:19           ` Wei Liu
  2018-07-05  8:28         ` George Dunlap
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2018-07-05  8:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, advisory-board, Doug Goldstein, George Dunlap,
	Rich Persaud, committers, xen-devel

On Thu, Jul 05, 2018 at 08:53:51AM +0100, Wei Liu wrote:
> On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> > So a fair amount of the discussion was about what it would look like,
> > and what it would take, to make it such that almost any push from
> > osstest (or whatever testing infrasctructure we went with) could
> > reasonably be released, and would have a very low expectation of
> > having extraneous bugs.
> 
> I would also like to advocate changing the mentality a bit. The current
> mentality is that "we want to be reasonably sure there is low
> expectation of bugs before we can release". Why not change to "we
> release when we're sure there is definitely improvement in the tree
> compared to last release"?

The current guideline is quite objective: if there are no reported
bugs and the osstest flight doesn't show any regressions, we are ready to
release. OTOH how should the improvements to the tree be quantified and
measured?

At any point during the development or the release process the tree
will contain improvements in some areas compared to the last
release.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:06         ` Roger Pau Monné
@ 2018-07-05  8:19           ` Wei Liu
  2018-07-05  8:43             ` Roger Pau Monné
  0 siblings, 1 reply; 82+ messages in thread
From: Wei Liu @ 2018-07-05  8:19 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lars Kurth, Wei Liu, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel

On Thu, Jul 05, 2018 at 10:06:52AM +0200, Roger Pau Monné wrote:
> On Thu, Jul 05, 2018 at 08:53:51AM +0100, Wei Liu wrote:
> > On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> > > So a fair amount of the discussion was about what it would look like,
> > > and what it would take, to make it such that almost any push from
> > > osstest (or whatever testing infrasctructure we went with) could
> > > reasonably be released, and would have a very low expectation of
> > > having extraneous bugs.
> > 
> > I would also like to advocate changing the mentality a bit. The current
> > mentality is that "we want to be reasonably sure there is low
> > expectation of bugs before we can release". Why not change to "we
> > release when we're sure there is definitely improvement in the tree
> > compared to last release"?
> 
> The current guideline is quite objective, if there are no reported
> bugs and osstest flight doesn't show any regressions we are ready to
> release. OTOH how should the improvements to the tree be quantized and
> measured?

Say, a security bug is fixed? A major bug is closed?

> 
> At any point during the development or the release process the tree
> will contain improvements in some areas compared to the last
> release.

Yes, that is right. That's what CD does, right?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  7:53       ` Wei Liu
  2018-07-05  8:06         ` Roger Pau Monné
@ 2018-07-05  8:28         ` George Dunlap
  2018-07-05  8:44           ` Wei Liu
  2018-07-05  8:31         ` Juergen Gross
  2018-07-05 11:19         ` Ian Jackson
  3 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2018-07-05  8:28 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Roger Pau Monne



> On Jul 5, 2018, at 8:53 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
>> 
>> 
>>> On Jul 3, 2018, at 11:07 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>> 
>>> On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
>>>> We then had a discussion around why the positive benefits didn't materialize:
>>>> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
>>>>  develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
>>>>  merely provide an incentive not to fix them.
>>>> * Issues highlighted were:
>>>>  * 2-3 months stabilizing period is too long
>>> 
>>> I think one of the goals with the 6 month release cycle was to shrink
>>> the stabilizing period, but it didn't turn that way, and the
>>> stabilizing period is quite similar with a 6 or a 9 month release
>>> cycle.
>> 
>> Right, and I think this was something that wasn’t quite captured in Lars’ summary.
>> 
>> Everyone agreed:
>> 1. The expectation was that a shorter release cycle would lead to shorter stabilization periods
>> 2. This has not turned out to be the case, which means
>> 3 At the moment, our “time doing development” to “time fixing bugs for a release” ratio is far too low.
>> 
>> One option to fix #3 is to go back to a 9-month cycle (or even a
>> 12-month cycle), which would increase the “development” part of the
>> equation.
> 
> You get more changes in, you also get more bugs.  Assuming bugs are
> introduced at a constant rate in relation to changes, moving back to 9
> months won't help.

Er, sorry — are you saying you think that the stabilization period we’ve had for 6-month releases *has* been shorter than the one we’ve had for 9-month releases?  Or are you saying they’re the same length, but due to some other reason (such as, we’ve been finding more bugs), and so if we go back to 9 months it will now be *even longer* than it was before?

Bugs are found outside of the “stabilization” window; so going from 6 to 9 months won’t result in 50% more bugs to find during the stabilization window — many of those will have been found during the normal course of development.

> 
> At least in my experience, a majority of time during the freeze is spent
> on *waiting*. Waiting for osstest to turn around, waiting for security
> issues to become public. Moving to 9 months won't change those factors.

Agreed, but rather than spending 3 months developing and 3 months stabilizing (as it has nearly been), we’d be spending 6 months developing and 3 months stabilizing.  

Obviously 7 months developing and 2 months stabilizing or 8 months developing and 1 month stabilizing would be even better, so doing the testing / automation improvements discussed would definitely be worthwhile.  But can we get all those improvements done in a reasonable amount of time?

Personally I much prefer the idea of doing 6-month releases.  But at the moment it seems clear to me that 1) it's causing a lot of extra overhead, and 2) we can’t fix the root causes in any reasonable time frame.  Given that the extra overhead will *also* distract us from fixing root causes, I think it makes sense to consider moving back to 9 months in the short term, and reconsidering once we’ve sorted out all our automation / testing issues.

> 
>> 
>> So a fair amount of the discussion was about what it would look like,
>> and what it would take, to make it such that almost any push from
>> osstest (or whatever testing infrasctructure we went with) could
>> reasonably be released, and would have a very low expectation of
>> having extraneous bugs.
> 
> I would also like to advocate changing the mentality a bit. The current
> mentality is that "we want to be reasonably sure there is low
> expectation of bugs before we can release". Why not change to "we
> release when we're sure there is definitely improvement in the tree
> compared to last release”?

I’m pretty sure most people consuming our releases care much more about bugs than about new features.  If *we* aren’t doing the testing to find some point that is reasonably bug-free, then who will?  Fedora, Centos, Debian certainly don’t have the resources for that, which means those suddenly stop being viable platforms for normal users.  Citrix, SuSE, and Oracle have to go back to duplicating their own testing and having massive patchqueues on every release.  And where does that leave less well-funded projects, like QubesOS and OpenXT?

I think our bug-finding standard is about right.  It doesn’t need to be as high as for a commercial offering, but it certainly needs to be good enough for an average motivated user.

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  7:53       ` Wei Liu
  2018-07-05  8:06         ` Roger Pau Monné
  2018-07-05  8:28         ` George Dunlap
@ 2018-07-05  8:31         ` Juergen Gross
  2018-07-05  8:55           ` Wei Liu
  2018-07-05 11:19         ` Ian Jackson
  3 siblings, 1 reply; 82+ messages in thread
From: Juergen Gross @ 2018-07-05  8:31 UTC (permalink / raw)
  To: Wei Liu, George Dunlap
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Roger Pau Monne

On 05/07/18 09:53, Wei Liu wrote:
> On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
>>
>>
>>> On Jul 3, 2018, at 11:07 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
>>>
>>> On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
>>>> We then had a discussion around why the positive benefits didn't materialize:
>>>> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
>>>>   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
>>>>   merely provide an incentive not to fix them.
>>>> * Issues highlighted were:
>>>>   * 2-3 months stabilizing period is too long
>>>
>>> I think one of the goals with the 6 month release cycle was to shrink
>>> the stabilizing period, but it didn't turn that way, and the
>>> stabilizing period is quite similar with a 6 or a 9 month release
>>> cycle.
>>
>> Right, and I think this was something that wasn’t quite captured in Lars’ summary.
>>
>> Everyone agreed:
>> 1. The expectation was that a shorter release cycle would lead to shorter stabilization periods
>> 2. This has not turned out to be the case, which means
>> 3 At the moment, our “time doing development” to “time fixing bugs for a release” ratio is far too low.
>>
>> One option to fix #3 is to go back to a 9-month cycle (or even a
>> 12-month cycle), which would increase the “development” part of the
>> equation.
> 
> You get more changes in, you also get more bugs.  Assuming bugs are
> introduced at a constant rate in relation to changes, moving back to 9
> months won't help.

Uuh, why not? It isn't as if no bugs are found and corrected in the
development period. As long as the development period is longer than
the average time between OSSTEST pushes, the stabilization period should
be roughly constant. So a longer development period will result in a
better development / stabilization ratio.

> At least in my experience, a majority of time during the freeze is spent
> on *waiting*. Waiting for osstest to turn around, waiting for security
> issues to become public. Moving to 9 months won't change those factors.

But waiting isn't a factor, it is a constant, assuming the number of
unresolved bugs at the end of the development period is roughly the
same. And that will be the case as long as we don't:

a) ignore OSSTEST results during development (i.e. we keep trying
   to get pushes as often as possible)

b) rush most series in at the end of the development period

> A typical bug would need five working days (one week) to fix.
> 
> 1. Someone report or osstest reports a bug. (Day 1)
> 2. Someone analyses it and writes a patch. (Day 2)
> 3. Someone reviews it. (Day 2 or 3).
> 4. Someone commits it. (Day 3 or 4).
> 5. Osstest produces test results (Day 3 to 5).
> 
> For a simple bug, we might finish 1-4 in one day. But we still need to
> allow for at least two days to get a push.

In the case of OSSTEST, often enough multiple bugs are reported in parallel
and can (and should) be processed concurrently.

> In reality, a number of factors actually prolong getting things fixed
> (in the sense that patches are pushed to master): 1. bug fixes are
> incomplete; 2. hardware issues in test system; 3. other random hiccups.
> Should any of these happens, another 2 to 3 days is required to get
> patches pushed.

4. rarely-triggering bugs which have been ignored before re-surface
and cause delays


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:19           ` Wei Liu
@ 2018-07-05  8:43             ` Roger Pau Monné
  2018-07-05  8:47               ` Wei Liu
  2018-07-05 10:43               ` Sander Eikelenboom
  0 siblings, 2 replies; 82+ messages in thread
From: Roger Pau Monné @ 2018-07-05  8:43 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, advisory-board, Doug Goldstein, George Dunlap,
	Rich Persaud, committers, xen-devel

On Thu, Jul 05, 2018 at 09:19:10AM +0100, Wei Liu wrote:
> On Thu, Jul 05, 2018 at 10:06:52AM +0200, Roger Pau Monné wrote:
> > On Thu, Jul 05, 2018 at 08:53:51AM +0100, Wei Liu wrote:
> > > On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> > > > So a fair amount of the discussion was about what it would look like,
> > > > and what it would take, to make it such that almost any push from
> > > > osstest (or whatever testing infrasctructure we went with) could
> > > > reasonably be released, and would have a very low expectation of
> > > > having extraneous bugs.
> > > 
> > > I would also like to advocate changing the mentality a bit. The current
> > > mentality is that "we want to be reasonably sure there is low
> > > expectation of bugs before we can release". Why not change to "we
> > > release when we're sure there is definitely improvement in the tree
> > > compared to last release"?
> > 
> > The current guideline is quite objective, if there are no reported
> > bugs and osstest flight doesn't show any regressions we are ready to
> > release. OTOH how should the improvements to the tree be quantized and
> > measured?
> 
> Say, a security bug is fixed? A major bug is closed?

I think this is still quite subjective, whereas the previous criterion
was objective.

Who will take the decision of whether a bug is major or not?

> > 
> > At any point during the development or the release process the tree
> > will contain improvements in some areas compared to the last
> > release.
> 
> Yes, that is right. That's what CD does, right?

I think so, but I'm not an expert on development techniques TBH :).

IMO one of the problems with Xen is that users don't tend to test
master often. I assume this is because Xen is a critical piece of
their infra, and they require it to be completely stable. Not everyone
can afford an extra box just for testing Xen master. I'm not sure this
is going to change a lot even if nightly builds are provided.

This is different from, say, email or IRC clients, where people don't
mind that much using unstable versions, and so the development branch
gets more testing even before the release process starts.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:28         ` George Dunlap
@ 2018-07-05  8:44           ` Wei Liu
  0 siblings, 0 replies; 82+ messages in thread
From: Wei Liu @ 2018-07-05  8:44 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, Wei Liu, advisory-board, Doug Goldstein,
	Rich Persaud, committers, xen-devel, Roger Pau Monne

On Thu, Jul 05, 2018 at 09:28:07AM +0100, George Dunlap wrote:
> 
> 
> > On Jul 5, 2018, at 8:53 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> > 
> > On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> >> 
> >> 
> >>> On Jul 3, 2018, at 11:07 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>> 
> >>> On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
> >>>> We then had a discussion around why the positive benefits didn't materialize:
> >>>> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
> >>>>  develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
> >>>>  merely provide an incentive not to fix them.
> >>>> * Issues highlighted were:
> >>>>  * 2-3 months stabilizing period is too long
> >>> 
> >>> I think one of the goals with the 6 month release cycle was to shrink
> >>> the stabilizing period, but it didn't turn that way, and the
> >>> stabilizing period is quite similar with a 6 or a 9 month release
> >>> cycle.
> >> 
> >> Right, and I think this was something that wasn’t quite captured in Lars’ summary.
> >> 
> >> Everyone agreed:
> >> 1. The expectation was that a shorter release cycle would lead to shorter stabilization periods
> >> 2. This has not turned out to be the case, which means
> >> 3 At the moment, our “time doing development” to “time fixing bugs for a release” ratio is far too low.
> >> 
> >> One option to fix #3 is to go back to a 9-month cycle (or even a
> >> 12-month cycle), which would increase the “development” part of the
> >> equation.
> > 
> > You get more changes in, you also get more bugs.  Assuming bugs are
> > introduced at a constant rate in relation to changes, moving back to 9
> > months won't help.
> 
> Er, sorry — are you saying you think that the stabilization period
> we’ve had for 6-month releases *has* been shorter than the one we’ve
> had for 9-month releases?  Or are you saying they’re the same length,
> but due to some other reason (such as, we’ve been finding more bugs),
> and so if we go back to 9 months it will now be *even longer* than it
> was before?
> 

The stabilisation period for 6 months is actually shorter, or of the same
length in proportion to the development window -- barring any unpredictable
security issues.

> Bugs are found outside of the “stabilization” window; so going from 6
> to 9 months won’t result in 50% more bugs to find during the
> stabilization window — many of those will have been found during the
> normal course of development.

Right. Bugs are found outside of that stabilisation window, but bugs are
also introduced outside of the same window.  "Stabilisation" in Xen's
sense means no more new features. If we keep committing new features, we
get new bugs.

To be clear, I'm not against moving back to 9 months, but I'm not buying
the argument that moving to 9 months changes the pattern of the
stabilisation period. We will still get bugs in proportion to the
development window.

> 
> > 
> > At least in my experience, a majority of time during the freeze is spent
> > on *waiting*. Waiting for osstest to turn around, waiting for security
> > issues to become public. Moving to 9 months won't change those factors.
> 

> Agreed, but rather than spending 3 months developing and 3 months
> stabilizing (as it has nearly been), we’d be spending 6 months
> developing and 3 months stabilizing.  

Well 6 months means 4 months development + 2 months stabilisation.
There has never been 3+3. So it is the same as 6+3 in terms of ratio.

> 
> Obviously 7 months developing and 2 months stabilizing or 8 months
> developing and 1 month stabilizing would be even better, so doing the
> testing / automation improvements discussed would definitely be
> worthwhile.  But can we get all those improvements done in a
> reasonable amount of time?
> 
> Personally I much prefer the idea of doing 6-month releases.  But at
> the moment it seems clear to me that 1) it's causing a lot of extra
> overhead, and 2) we can’t fix the root causes in any reasonable time
> frame.  Given that the extra overhead will *also* distract us from
> fixing root causes, I think it makes sense to consider moving back to
> 9 months in the short term, and reconsidering once we’ve sorted out
> all our automation / testing issues.

If we move back to 9 months, I think we should try 7+2.

> 
> > 
> >> 
> >> So a fair amount of the discussion was about what it would look like,
> >> and what it would take, to make it such that almost any push from
> >> osstest (or whatever testing infrasctructure we went with) could
> >> reasonably be released, and would have a very low expectation of
> >> having extraneous bugs.
> > 
> > I would also like to advocate changing the mentality a bit. The current
> > mentality is that "we want to be reasonably sure there is low
> > expectation of bugs before we can release". Why not change to "we
> > release when we're sure there is definitely improvement in the tree
> > compared to last release”?
> 

> I’m pretty sure most people consuming our releases care much more
> about bugs than about new features.  If *we* aren’t doing the testing
> to find some point that is reasonably bug-free, then who will?
> Fedora, Centos, Debian certainly don’t have the resources for that,
> which means suddenly those become not viable platforms for  normal
> users anymore.  Citrix, SuSE, and Oracle have to go back to
> duplicating their own testing and having massive patchqueues on every
> release.  And where does that leave less well-funded projects, like
> QubesOS and OpenXT?
> 
> I think our bug-finding standard is about right.  It doesn’t need to
> be as high as for a commercial offering, but it certainly needs to be
> good enough for an average motivated user.

"Improvement" doesn't mean new features only. Why not release when a
major security issue is fixed? Isn't that what downstream wants?

Wei.

> 
>  -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:43             ` Roger Pau Monné
@ 2018-07-05  8:47               ` Wei Liu
  2018-07-05  8:55                 ` Roger Pau Monné
  2018-07-05 10:43               ` Sander Eikelenboom
  1 sibling, 1 reply; 82+ messages in thread
From: Wei Liu @ 2018-07-05  8:47 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lars Kurth, Wei Liu, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel

On Thu, Jul 05, 2018 at 10:43:38AM +0200, Roger Pau Monné wrote:
> On Thu, Jul 05, 2018 at 09:19:10AM +0100, Wei Liu wrote:
> > On Thu, Jul 05, 2018 at 10:06:52AM +0200, Roger Pau Monné wrote:
> > > On Thu, Jul 05, 2018 at 08:53:51AM +0100, Wei Liu wrote:
> > > > On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> > > > > So a fair amount of the discussion was about what it would look like,
> > > > > and what it would take, to make it such that almost any push from
> > > > > osstest (or whatever testing infrasctructure we went with) could
> > > > > reasonably be released, and would have a very low expectation of
> > > > > having extraneous bugs.
> > > > 
> > > > I would also like to advocate changing the mentality a bit. The current
> > > > mentality is that "we want to be reasonably sure there is low
> > > > expectation of bugs before we can release". Why not change to "we
> > > > release when we're sure there is definitely improvement in the tree
> > > > compared to last release"?
> > > 
> > > The current guideline is quite objective, if there are no reported
> > > bugs and osstest flight doesn't show any regressions we are ready to
> > > release. OTOH how should the improvements to the tree be quantized and
> > > measured?
> > 
> > Say, a security bug is fixed? A major bug is closed?
> 
> I think this is still quite subjective, whereas the previous criteria
> was objective.
> 

They are orthogonal. We can still wait a bit until osstest reports no
regression and no one reports bugs.

> Who will take the decision of whether a bug is major or not?

That's as subjective as why a release should be done in 6 months or 9
months but not 1 year or 2 years.

> 
> > > 
> > > At any point during the development or the release process the tree
> > > will contain improvements in some areas compared to the last
> > > release.
> > 
> > Yes, that is right. That's what CD does, right?
> 
> I thin so, but I'm not an expert on development techniques TBH :).
> 
> IMO one of the problems with Xen is that users don't tend to test
> master often, I assume this is because Xen is a critical piece of
> their infra, and they require it to be completely stable. Not everyone
> can effort an extra box just for testing Xen master. I'm not sure this
> is going to change a lot even if nightly builds are provided.

I think you underestimate the number of people wanting to use nightly
builds. I think XenServer.org is a good example?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:31         ` Juergen Gross
@ 2018-07-05  8:55           ` Wei Liu
  2018-07-05 11:24             ` Ian Jackson
  0 siblings, 1 reply; 82+ messages in thread
From: Wei Liu @ 2018-07-05  8:55 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Lars Kurth, Wei Liu, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel,
	Roger Pau Monne

On Thu, Jul 05, 2018 at 10:31:10AM +0200, Juergen Gross wrote:
> On 05/07/18 09:53, Wei Liu wrote:
> > On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> >>
> >>
> >>> On Jul 3, 2018, at 11:07 AM, Roger Pau Monné <roger.pau@citrix.com> wrote:
> >>>
> >>> On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
> >>>> We then had a discussion around why the positive benefits didn't materialize:
> >>>> * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
> >>>>   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
> >>>>   merely provide an incentive not to fix them.
> >>>> * Issues highlighted were:
> >>>>   * 2-3 months stabilizing period is too long
> >>>
> >>> I think one of the goals with the 6 month release cycle was to shrink
> >>> the stabilizing period, but it didn't turn that way, and the
> >>> stabilizing period is quite similar with a 6 or a 9 month release
> >>> cycle.
> >>
> >> Right, and I think this was something that wasn’t quite captured in Lars’ summary.
> >>
> >> Everyone agreed:
> >> 1. The expectation was that a shorter release cycle would lead to shorter stabilization periods
> >> 2. This has not turned out to be the case, which means
> >> 3 At the moment, our “time doing development” to “time fixing bugs for a release” ratio is far too low.
> >>
> >> One option to fix #3 is to go back to a 9-month cycle (or even a
> >> 12-month cycle), which would increase the “development” part of the
> >> equation.
> > 
> > You get more changes in, you also get more bugs.  Assuming bugs are
> > introduced at a constant rate in relation to changes, moving back to 9
> > months won't help.
> 
> Uuh, why not? It isn't as if no bugs are found and corrected in the
> development period. As long as the development period is longer than
> the average time between OSSTEST pushs the stabilization period should
> be roghly constant. So a longer development period will result in a
> better ratio development / stabilization.
> 
> > At least in my experience, a majority of time during the freeze is spent
> > on *waiting*. Waiting for osstest to turn around, waiting for security
> > issues to become public. Moving to 9 months won't change those factors.
> 
> But waiting isn't a factor, it is a constant, assuming the number of
> unresolved bugs at the end of the development period is roughly the
> same. And that will be the case if we don't:
> 
> a) pay no attention to OSSTEST results during development (i.e. trying
>    to get pushs as often as possible)
> 
> b) rush most series in at the end of the development period

b) has never held in the last 5 years.

What stops committers from rushing most series in at the end of the
development period? If the RM is to take over the tree before stabilisation
begins, shouldn't we just call the point of takeover the start of the
stabilisation period?

(IIRC the record I know is committers pushed 250 patches in a single day
before the freeze)

> 
> > A typical bug would need five working days (one week) to fix.
> > 
> > 1. Someone report or osstest reports a bug. (Day 1)
> > 2. Someone analyses it and writes a patch. (Day 2)
> > 3. Someone reviews it. (Day 2 or 3).
> > 4. Someone commits it. (Day 3 or 4).
> > 5. Osstest produces test results (Day 3 to 5).
> > 
> > For a simple bug, we might finish 1-4 in one day. But we still need to
> > allow for at least two days to get a push.
> 
> In case of OSSTEST often enough multiple bugs are reported in parallel
> and can (and should) be processed concurrently.

It is more the case that one incomplete fix blocks all other valid
fixes, so the time from staging to master is even longer.

(The record in this case is 100 patches between staging and master and
exactly 1 calendar month to get a push)

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:47               ` Wei Liu
@ 2018-07-05  8:55                 ` Roger Pau Monné
  2018-07-05  9:17                   ` Wei Liu
  0 siblings, 1 reply; 82+ messages in thread
From: Roger Pau Monné @ 2018-07-05  8:55 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, advisory-board, Doug Goldstein, George Dunlap,
	Rich Persaud, committers, xen-devel

On Thu, Jul 05, 2018 at 09:47:43AM +0100, Wei Liu wrote:
> On Thu, Jul 05, 2018 at 10:43:38AM +0200, Roger Pau Monné wrote:
> > On Thu, Jul 05, 2018 at 09:19:10AM +0100, Wei Liu wrote:
> > > On Thu, Jul 05, 2018 at 10:06:52AM +0200, Roger Pau Monné wrote:
> > > > On Thu, Jul 05, 2018 at 08:53:51AM +0100, Wei Liu wrote:
> > > > > On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> > > > > > So a fair amount of the discussion was about what it would look like,
> > > > > > and what it would take, to make it such that almost any push from
> > > > > > osstest (or whatever testing infrasctructure we went with) could
> > > > > > reasonably be released, and would have a very low expectation of
> > > > > > having extraneous bugs.
> > > > > 
> > > > > I would also like to advocate changing the mentality a bit. The current
> > > > > mentality is that "we want to be reasonably sure there is low
> > > > > expectation of bugs before we can release". Why not change to "we
> > > > > release when we're sure there is definitely improvement in the tree
> > > > > compared to last release"?
> > > > 
> > > > The current guideline is quite objective, if there are no reported
> > > > bugs and osstest flight doesn't show any regressions we are ready to
> > > > release. OTOH how should the improvements to the tree be quantified and
> > > > measured?
> > > 
> > > Say, a security bug is fixed? A major bug is closed?
> > 
> > I think this is still quite subjective, whereas the previous criteria
> > was objective.
> > 
> 
> They are orthogonal. We can still wait a bit until osstest reports no
> regression and no one reports bugs.
> 
> > Who will take the decision of whether a bug is major or not?
> 
> That's as subjective as why a release should be done in 6 months or 9
> months but not 1 year or 2 years.

But that's a subjective one-time decision that, once taken, the
project sticks to. Deciding when to release in your scenario involves
at least one subjective decision before each release.

As an example, just see how many opinions we are having about changing
the release cycle. Imagine having this discussion every time the project
needs to decide whether to release or not.

> > 
> > > > 
> > > > At any point during the development or the release process the tree
> > > > will contain improvements in some areas compared to the last
> > > > release.
> > > 
> > > Yes, that is right. That's what CD does, right?
> > 
> > I think so, but I'm not an expert on development techniques TBH :).
> > 
> > IMO one of the problems with Xen is that users don't tend to test
> > master often, I assume this is because Xen is a critical piece of
> > their infra, and they require it to be completely stable. Not everyone
> > can afford an extra box just for testing Xen master. I'm not sure this
> > is going to change a lot even if nightly builds are provided.
> 
> I think you underestimate the number of people wanting to use nightly
> builds. I think XenServer.org is a good example?

OK, maybe. The above was mostly my opinion, but I agree I don't know
that much. I think xenserver nightly builds are much more stable,
because the hypervisor there doesn't change that often and it's based
on a curated stable branch.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:55                 ` Roger Pau Monné
@ 2018-07-05  9:17                   ` Wei Liu
  0 siblings, 0 replies; 82+ messages in thread
From: Wei Liu @ 2018-07-05  9:17 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Lars Kurth, Wei Liu, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel

On Thu, Jul 05, 2018 at 10:55:49AM +0200, Roger Pau Monné wrote:
> On Thu, Jul 05, 2018 at 09:47:43AM +0100, Wei Liu wrote:
> > On Thu, Jul 05, 2018 at 10:43:38AM +0200, Roger Pau Monné wrote:
> > > On Thu, Jul 05, 2018 at 09:19:10AM +0100, Wei Liu wrote:
> > > > On Thu, Jul 05, 2018 at 10:06:52AM +0200, Roger Pau Monné wrote:
> > > > > On Thu, Jul 05, 2018 at 08:53:51AM +0100, Wei Liu wrote:
> > > > > > On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
> > > > > > > So a fair amount of the discussion was about what it would look like,
> > > > > > > and what it would take, to make it such that almost any push from
> > > > > > > osstest (or whatever testing infrastructure we went with) could
> > > > > > > reasonably be released, and would have a very low expectation of
> > > > > > > having extraneous bugs.
> > > > > > 
> > > > > > I would also like to advocate changing the mentality a bit. The current
> > > > > > mentality is that "we want to be reasonably sure there is low
> > > > > > expectation of bugs before we can release". Why not change to "we
> > > > > > release when we're sure there is definitely improvement in the tree
> > > > > > compared to last release"?
> > > > > 
> > > > > The current guideline is quite objective, if there are no reported
> > > > > bugs and osstest flight doesn't show any regressions we are ready to
> > > > > release. OTOH how should the improvements to the tree be quantified and
> > > > > measured?
> > > > 
> > > > Say, a security bug is fixed? A major bug is closed?
> > > 
> > > I think this is still quite subjective, whereas the previous criteria
> > > was objective.
> > > 
> > 
> > They are orthogonal. We can still wait a bit until osstest reports no
> > regression and no one reports bugs.
> > 
> > > Who will take the decision of whether a bug is major or not?
> > 
> > That's as subjective as why a release should be done in 6 months or 9
> > months but not 1 year or 2 years.
> 
> But that's a subjective one-time decision that, once taken, the
> project sticks to. Deciding when to release in your scenario involves
> at least one subjective decision before each release.
> 
> As an example, just see how many opinions we are having about changing
> the release cycle. Imagine having this discussion every time the project
> needs to decide whether to release or not.

They are different issues.  Why would we argue over whether to release
or not if the process is lightweight and releasing a new version is as
easy as pushing a tag?

I can think of major releases being a problem because they affect how
many branches we support, but point releases shouldn't be a point of
contention at all.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:43             ` Roger Pau Monné
  2018-07-05  8:47               ` Wei Liu
@ 2018-07-05 10:43               ` Sander Eikelenboom
  1 sibling, 0 replies; 82+ messages in thread
From: Sander Eikelenboom @ 2018-07-05 10:43 UTC (permalink / raw)
  To: Roger Pau Monné, Wei Liu
  Cc: Lars Kurth, advisory-board, Doug Goldstein, George Dunlap,
	Rich Persaud, committers, xen-devel

On 05/07/18 10:43, Roger Pau Monné wrote:
> On Thu, Jul 05, 2018 at 09:19:10AM +0100, Wei Liu wrote:
>> On Thu, Jul 05, 2018 at 10:06:52AM +0200, Roger Pau Monné wrote:
>>> On Thu, Jul 05, 2018 at 08:53:51AM +0100, Wei Liu wrote:
>>>> On Wed, Jul 04, 2018 at 03:26:16PM +0000, George Dunlap wrote:
>>>>> So a fair amount of the discussion was about what it would look like,
>>>>> and what it would take, to make it such that almost any push from
>>>>> osstest (or whatever testing infrastructure we went with) could
>>>>> reasonably be released, and would have a very low expectation of
>>>>> having extraneous bugs.
>>>>
>>>> I would also like to advocate changing the mentality a bit. The current
>>>> mentality is that "we want to be reasonably sure there is low
>>>> expectation of bugs before we can release". Why not change to "we
>>>> release when we're sure there is definitely improvement in the tree
>>>> compared to last release"?
>>>
>>> The current guideline is quite objective, if there are no reported
>>> bugs and osstest flight doesn't show any regressions we are ready to
>>> release. OTOH how should the improvements to the tree be quantified and
>>> measured?
>>
>> Say, a security bug is fixed? A major bug is closed?
> 
> I think this is still quite subjective, whereas the previous criteria
> was objective.
> 
> Who will take the decision of whether a bug is major or not?
> 
>>>
>>> At any point during the development or the release process the tree
>>> will contain improvements in some areas compared to the last
>>> release.
>>
>> Yes, that is right. That's what CD does, right?
> 
> I think so, but I'm not an expert on development techniques TBH :).
> 
> IMO one of the problems with Xen is that users don't tend to test
> master often, I assume this is because Xen is a critical piece of
> their infra, and they require it to be completely stable. Not everyone
> can afford an extra box just for testing Xen master. I'm not sure this
> is going to change a lot even if nightly builds are provided.

Since I actually do (on my homeserver), just to share the experience:
- Master/xen-unstable is actually quite stable!
- Most issues I encounter are boot issues, and not uncommonly boot issues
  due to upstream linux kernel changes (by other developers) which have
  an unforeseen impact on Xen.
- So one of the issues with testing is other projects that Xen depends
  upon, and which are out of your control.
- But that's n=1 and on older hardware, so I don't run into issues
  with newer hardware features.

One thing I haven't seen mentioned regarding OSSTEST and testing in
general is the somewhat unwieldy test matrix. For a lot of Xen
functionality there are at least 2 options.
I do understand that feature deprecation isn't an easy thing to do,
that there are always reasons to keep stuff around, and that even if
you do it, it still takes time to reap the benefits, but the benefits
could be:
    - (a lot of) code cleanup, which eases development in general.
    - much less building and testing that has to be done.
    - easier documentation-wise.
    - in the end, easier for users as well.

But perhaps it wouldn't hurt to have some discussion about whether, *why*
and for how long certain features/sub-systems are still worth keeping
(for example: qemu-trad <-> qemu-xen, PV <-> PVH, seabios <-> rombios).
Since deprecation generally takes a few releases, I think it would be
wise to have such a discussion ahead of every release, so a deprecation
warning can be incorporated in that release and the release notes if
need be.

--
Sander

> This is different from say an email or IRC clients, where people don't
> mind that much using unstable versions, and so the development branch
> gets more testing even before the release process starts.
> 
> Roger.
> 
> 


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03  8:06     ` Lars Kurth
  2018-07-03 10:01       ` Jan Beulich
@ 2018-07-05 11:05       ` Ian Jackson
  2018-07-05 11:18         ` George Dunlap
  1 sibling, 1 reply; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 11:05 UTC (permalink / raw)
  To: Lars Kurth
  Cc: Juergen Gross, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Matt Spencer

Lars Kurth writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> * The Gitlab machinery will do build tests (and the discussion
>   showed that we should be able to do this via cross compilation or
>   compilation on a real system if a service such as infosifter is
>   used [...]
> * This could eventually include a basic set of smoke tests that are
>   system independent and could run under QEMU - Doug already uses a
>   basic test where a xen host and/or VM is started

Firstly, I think this is all an excellent idea.  It should be pursued.

I don't think it interacts directly with osstest except to reduce the
rate of test failures.


> [Juergen:]
>     A major source of the pain seems to be the hardware: About half of all
>     cases where I looked into the test reports to find the reason for a
>     failing test flight were related to hardware failures. Not sure how to
>     solve that.
>     
> This is worrying and I would like to get Ian Jackson's viewpoint on it. 

I haven't worked up a formal analysis of the pattern of failures, but
(discussing hardware trouble only):

 * We are still waiting for new ARM hardware.  When we get it we will
   hopefully be able to decommission the arndale dev boards, whose
   network controllers are unreliable.

   Sadly, the unreliability of the armhf tests has become so
   normalised that we all just shrug and hope the next one will be
   better.  Another option would be to decommission the arndales right
   away and reduce the armhf test coverage.

 * We have had problems with PDU relays, affecting about three
   machines (depending how you count things).  My experience with the
   PDUs in Massachusetts has been much poorer than in Cambridge.  I
   think the underlying cause is probably USAian 110V electricity (!)
   I have a plan to fix this, involving more use of IPMI in tandem
   with the PDUs, which I hope will reduce this significantly.

 * As the test lab increases in size, the rate of hardware failure
   necessarily also rises.  Right now, response to that is manual: a
   human must notice the problem, inspect test results, decide it is a
   hardware problem, and take the affected node out of service.  I
   am working on a plan to do that part automatically.

   Human intervention will still be required to diagnose and repair
   the problem of course, but in the meantime, further tests will not
   be affected.

>     Another potential problem showed up last week: OSSTEST is using the
>     Debian servers for doing the basic installation. A change there (e.g.
>     a new point release) will block tests. I'd prefer to have a local cache
>     of the last known good set of *.deb files to be used especially for the
>     branched Xen versions. This would rule out remote problems for releases.
> 
> This is again something which we should definitely look at.

This was bad luck.  This kind of update happens about 3-4 times a
year.  It does break everything, leading to a delay of a day or two,
but the fix is straightforward.

Obviously this is not ideal but the solutions are nontrivial.  It is
not really possible to "have a local cache of the last known good set
of *.deb files" without knowing what that subset should be; that would
require an edifice to track what is used, or some manual configuration
which would probably break.  Alternatively we could run a complete
mirror but that is a *lot* of space and bandwidth, most of which would
be unused.

I think the right approach is probably to switch from using d-i for
host installs, to something like FAI.  That would be faster as well.
However that amounts to reengineering the way osstest does host
installs; it would also leave us maintaining an additional way to do
host installs, since we would still want to be able to *test* d-i
operation as a guest.

So overall I have left this one on the back burner.


Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:47       ` Juergen Gross
  2018-07-03 11:24         ` Lars Kurth
@ 2018-07-05 11:16         ` Ian Jackson
  2018-07-05 11:39           ` George Dunlap
                             ` (2 more replies)
  2018-07-05 17:58         ` Doug Goldstein
  2 siblings, 3 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 11:16 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, 'Jan Beulich',
	xen-devel, Roger Pau Monne

Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> We didn't look at the sporadic failing tests thoroughly enough. The
> hypercall buffer failure has been there for ages, a newer kernel just
> made it more probable. This would have saved us some weeks.

In general, as a community, we are very bad at this kind of thing.

In my experience, the development community is not really interested
in fixing bugs which aren't directly in their way.

You can observe this easily in the way that regressions in Linux,
spotted by osstest, are handled.  Linux 4.9 has been broken for 43
days.  Linux mainline is broken too.

We do not have a team of people reading these test reports, and
chasing developers to fix them.  I certainly do not have time to do
this triage.  On trees where osstest failures do not block
development, things go unfixed for weeks, sometimes months.

And overall my gut feeling is that tests which fail intermittently are
usually blamed (even if this is not stated explicitly) on problems
with osstest or with our test infrastructure.  It is easy for
developers to think this because if they wait, the test will get
"lucky", and pass, and so there will be a push and the developers can
carry on.

I have a vague plan to sit down and think about how osstest's
results analysers could respond better to intermittent failures.  If
I can, I would like intermittent failures to block pushes.  That
would at least help address the problem of heisenbugs (which are often
actually quite serious issues) not being taken seriously.

I would love to hear suggestions for how to get people to actually fix
test failures in trees not maintained by the Xen Project and therefore
not gated by osstest.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 11:05       ` Ian Jackson
@ 2018-07-05 11:18         ` George Dunlap
  2018-07-05 11:51           ` Andrew Cooper
  0 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2018-07-05 11:18 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	Rich Persaud, committers, xen-devel, Matt Spencer



> On Jul 5, 2018, at 12:05 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> 
> Lars Kurth writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> * The Gitlab machinery will do build tests (and the discussion
>>  showed that we should be able to do this via cross compilation or
>>  compilation on a real system if a service such as infosifter is
>>  used [...]
>> * This could eventually include a basic set of smoke tests that are
>>  system independent and could run under QEMU - Doug already uses a
>>  basic test where a xen host and/or VM is started
> 
> Firstly, I think this is all an excellent idea.  It should be pursued.
> 
> I don't think it interacts directly with osstest except to reduce the
> rate of test failures.
> 
> 
>> [Juergen:]
>>    A major source of the pain seems to be the hardware: About half of all
>>    cases where I looked into the test reports to find the reason for a
>>    failing test flight were related to hardware failures. Not sure how to
>>    solve that.
>> 
>> This is worrying and I would like to get Ian Jackson's viewpoint on it. 
> 
> I haven't worked up a formal analysis of the pattern of failures, but
> (discussing hardware trouble only):
> 
> * We are still waiting for new ARM hardware.  When we get it we will
>   hopefully be able to decommission the arndale dev boards, whose
>   network controllers are unreliable.
> 
>   Sadly, the unreliability of the armhf tests has become so
>   normalised that we all just shrug and hope the next one will be
>   better.  Another option would be to decommission the arndales right
>   away and reduce the armhf test coverage.
> 
> * We have had problems with PDU relays, affecting about three
>   machines (depending how you count things).  My experience with the
>   PDUs in Massachusetts has been much poorer than in Cambridge.  I
>   think the underlying cause is probably USAian 110V electricity (!)
>   I have a plan to fix this, involving more use of IPMI in tandem
>   with the PDUs, which I hope will reduce this significantly.
> 
> * As the test lab increases in size, the rate of hardware failure
>   necessarily also rises.  Right now, response to that is manual: a
>   human must notice the problem, inspect test results, decide it is a
>   hardware problem, and take the affected node out of service.  I
>   am working on a plan to do that part automatically.
> 
>   Human intervention will still be required to diagnose and repair
>   the problem of course, but in the meantime, further tests will not
>   be affected.
> 
>>    Another potential problem showed up last week: OSSTEST is using the
>>    Debian servers for doing the basic installation. A change there (e.g.
>>    a new point release) will block tests. I'd prefer to have a local cache
>>    of the last known good set of *.deb files to be used especially for the
>>    branched Xen versions. This would rule out remote problems for releases.
>> 
>> This is again something which we should definitely look at.
> 
> This was bad luck.  This kind of update happens about 3-4 times a
> year.  It does break everything, leading to a delay of a day or two,
> but the fix is straightforward.
> 
> Obviously this is not ideal but the solutions are nontrivial.  It is
> not really possible to "have a local cache of the last known good set
> of *.deb files" without knowing what that subset should be; that would
> require an edifice to track what is used, or some manual configuration
> which would probably break.  Alternatively we could run a complete
> mirror but that is a *lot* of space and bandwidth, most of which would
> be unused.
> 
> I think the right approach is probably to switch from using d-i for
> host installs, to something like FAI.  That would be faster as well.
> However that amounts to reengineering the way osstest does host
> installs; it would also leave us maintaining an additional way to do
> host installs, since we would still want to be able to *test* d-i
> operation as a guest.

What I think would be ideal is a way to take ‘snapshots’ of different states of setup for various hosts and revert to them.  There’s absolutely no reason to do a full install of a host every osstest run, when that install happens 1) before we even install Xen, and 2) should be nearly identical each time.  We should be able to install a host, take a snapshot of the “clean” install, then do the build prep, take a snapshot of that, and then simply revert to one or both of those (assuming build requirements haven’t changed in the mean time) whenever necessary.  Re-generating these snapshots once per week per host should be plenty, and sounds like it would massively improve the current throughput.

I’d like to propose the idea also that we try to find a more efficient way of testing guest functionality than doing a guest install.  I understand it’s a natural way to test a reasonable range of functionality, but particularly for Windows guests, my impression is that it’s very slow; there must be a way to make a test that would have similar coverage but be able to be completed with a pre-installed snapshot, in only a few minutes.

 -George





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  7:53       ` Wei Liu
                           ` (2 preceding siblings ...)
  2018-07-05  8:31         ` Juergen Gross
@ 2018-07-05 11:19         ` Ian Jackson
  3 siblings, 0 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 11:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: Lars Kurth, advisory-board, Doug Goldstein, George Dunlap,
	Rich Persaud, committers, xen-devel, Roger Pau Monne

Wei Liu writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> Osstest is really resource intense and heavy weight. We need to think of
> a way to  reduce its turnaround time.  Or we can introduce other
> auxiliary test systems to reduce its burden.

We should pursue both of these strategies simultaneously.

I have a plan to improve osstest's efficiency by having it reinstall
test hosts less often.  But of course there's only one of me and there
is often a fire to be put out (and automated firefighting machinery to
develop).

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05  8:55           ` Wei Liu
@ 2018-07-05 11:24             ` Ian Jackson
  0 siblings, 0 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 11:24 UTC (permalink / raw)
  To: Wei Liu
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel,
	Roger Pau Monne

Wei Liu writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> It is more the case that one incomplete fix blocks all other valid
> fixes, so the time from staging to master is even longer.
> 
> (The record in this case is 100 patches between staging and master and
> exactly 1 calendar month to get a push)

One possibility would be to split osstest's input queues up.

Currently, osstest uses an unusual model, compared to many other CI
systems: by and large the input branches to osstest are
fast-forwarding.  I don't operate this way with osstest itself.  I
allow myself to rewind osstest pretest (the equivalent of staging)
whenever it seems appropriate.

The result is that if maintainer X pushes a bad series, the only way
to move forward is to fix it or revert it.

An alternative model would be that each batch of work is prepared by
committers separately, and only becomes part of public master after it
has been tested.

Obviously this would divide osstest bandwidth between the batches, so
each batch would have to wait longer for a test.  But it does mean
that if a batch produces a test failure, no-one else is blocked, and
the batch can be reworked.

If this is an interesting idea we should talk about what more
precisely it would look like.
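
For illustration, a toy sketch of such a per-batch gating flow (Python;
run_osstest_flight is a hypothetical placeholder, not a real osstest
interface, so this models the idea rather than an implementation):

from typing import Callable, List, Tuple

Batch = Tuple[str, List[str]]  # (committer, commits in the batch)

def gate_batches(master: List[str],
                 batches: List[Batch],
                 run_osstest_flight: Callable[[List[str]], bool]) -> List[str]:
    # Each batch is tested on top of the current master and only
    # fast-forwards master if its own flight passes; a failing batch is
    # returned to its committer for rework and does not block the others.
    for committer, commits in batches:
        candidate = master + commits
        if run_osstest_flight(candidate):
            master = candidate
        else:
            print(f"batch from {committer} failed; needs rework")
    return master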

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 11:16         ` Ian Jackson
@ 2018-07-05 11:39           ` George Dunlap
  2018-07-05 18:14             ` Doug Goldstein
  2018-07-05 11:41           ` Juergen Gross
  2018-07-05 18:13           ` Doug Goldstein
  2 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2018-07-05 11:39 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	Rich Persaud, committers, Jan Beulich, xen-devel,
	Roger Pau Monne



> On Jul 5, 2018, at 12:16 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> 
> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> We didn't look at the sporadic failing tests thoroughly enough. The
>> hypercall buffer failure has been there for ages, a newer kernel just
>> made it more probable. This would have saved us some weeks.
> 
> In general, as a community, we are very bad at this kind of thing.
> 
> In my experience, the development community is not really interested
> in fixing bugs which aren't directly in their way.
> 
> You can observe this easily in the way that regressions in Linux,
> spotted by osstest, are handled.  Linux 4.9 has been broken for 43
> days.  Linux mainline is broken too.
> 
> We do not have a team of people reading these test reports, and
> chasing developers to fix them.  I certainly do not have time to do
> this triage.  On trees where osstest failures do not block
> development, things go unfixed for weeks, sometimes months.
> 
> And overall my gut feeling is that tests which fail intermittently are
> usually blamed (even if this is not stated explicitly) on problems
> with osstest or with our test infrastructure.  It is easy for
> developers to think this because if they wait, the test will get
> "lucky", and pass, and so there will be a push and the developers can
> carry on.
> 
> I have a vague plan to sit down and think about how osstest's
> results analysers could respond better to intermittent failures.  If
> I can, I would like intermittent failures to block pushes.  That
> would at least help address the problem of heisenbugs (which are often
> actually quite serious issues) not being taken seriously.
> 
> I would love to hear suggestions for how to get people to actually fix
> test failures in trees not maintained by the Xen Project and therefore
> not gated by osstest.

Well at the moment, investigation is ad-hoc.  Basically everyone has to look to see *whether* there’s been a failure, and it’s nobody’s job in particular to try to chase it down to find out what it might be.  If we had a team, we could have a robot rotate through the team to nominate one particular person per failure to take a look at the result and at least try to classify it, and maybe try to find the appropriate person who may be able to take a deeper look.
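
For illustration, a minimal sketch of such a rotation robot (Python; the
team members and failure identifiers are hypothetical placeholders,
nothing that exists today):

import itertools

team = ["alice", "bob", "carol"]   # hypothetical triage rota
rota = itertools.cycle(team)

def nominate(failure_id: str) -> str:
    # Round-robin assignment: the next person on the rota gets asked to
    # classify this failure and find whoever can take a deeper look.
    assignee = next(rota)
    print(f"{failure_id}: please triage -> {assignee}")
    return assignee

# e.g. one nomination per failing step in a flight report
for failure in ["flight-1234/step-A", "flight-1234/step-B"]:
    nominate(failure)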

 -George





_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 11:16         ` Ian Jackson
  2018-07-05 11:39           ` George Dunlap
@ 2018-07-05 11:41           ` Juergen Gross
  2018-07-05 18:13           ` Doug Goldstein
  2 siblings, 0 replies; 82+ messages in thread
From: Juergen Gross @ 2018-07-05 11:41 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, 'Jan Beulich',
	xen-devel, Roger Pau Monne

On 05/07/18 13:16, Ian Jackson wrote:
> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> We didn't look at the sporadic failing tests thoroughly enough. The
>> hypercall buffer failure has been there for ages, a newer kernel just
>> made it more probable. This would have saved us some weeks.
> 
> In general, as a community, we are very bad at this kind of thing.

We should nominate someone to take care of that. This could be e.g.
the Release Manager (who will have to do that during the stabilization
period anyway). I'd be happy for other people to help out, obviously.

> In my experience, the development community is not really interested
> in fixing bugs which aren't directly in their way.
> 
> You can observe this easily in the way that regressions in Linux,
> spotted by osstest, are handled.  Linux 4.9 has been broken for 43
> days.  Linux mainline is broken too.

Just sent the patch to stable repairing that issue.

Unfortunately I didn't spot the problem when sending the backports
of the patches for repairing the recent problems on AMD hardware: I
had specified kernel parameters in my tests avoiding the latest issues.

It took longer than I had hoped to find time to look into the problem,
due to ongoing security work and the time spent on release-related
stuff.

> We do not have a team of people reading these test reports, and
> chasing developers to fix them.  I certainly do not have time to do
> this triage.  On trees where osstest failures do not block
> development, things go unfixed for weeks, sometimes months.

Maybe we should find an owner for each tree who will get the reports
directly and who is responsible for reaching out to the developers?
As said above I think the Release Manager is a possible owner of the
xen-unstable test tree.

> And overall my gut feeling is that tests which fail intermittently are
> usually blamed (even if this is not stated explicitly) on problems
> with osstest or with our test infrastructure.  It is easy for
> developers to think this because if they wait, the test will get
> "lucky", and pass, and so there will be a push and the developers can
> carry on.

Yes.

> I have a vague plan to sit down and think about how osstest's
> results analysers could respond better to intermittent failures.  If
> I can, I would like intermittent failures to block pushes.  That
> would at least help address the problem of heisenbugs (which are often
> actually quite serious issues) not being taken seriously.

+1

> I would love to hear suggestions for how to get people to actually fix
> test failures in trees not maintained by the Xen Project and therefore
> not gated by osstest.

If nobody stands up to do it, this will be quite difficult. One
option could be to drop the failing feature from Xen if it isn't
absolutely mandatory. If somebody really wants to keep that feature,
they would have to act in order to repair it.


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 11:18         ` George Dunlap
@ 2018-07-05 11:51           ` Andrew Cooper
  2018-07-05 12:20             ` Juergen Gross
  0 siblings, 1 reply; 82+ messages in thread
From: Andrew Cooper @ 2018-07-05 11:51 UTC (permalink / raw)
  To: George Dunlap, Ian Jackson
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	Rich Persaud, committers, xen-devel, Matt Spencer

On 05/07/18 12:18, George Dunlap wrote:
>
>>>    Another potential problem showed up last week: OSSTEST is using the
>>>    Debian servers for doing the basic installation. A change there (e.g.
>>>    a new point release) will block tests. I'd prefer to have a local cache
>>>    of the last known good set of *.deb files to be used especially for the
>>>    branched Xen versions. This would rule out remote problems for releases.
>>>
>>> This is again something which we should definitely look at.
>> This was bad luck.  This kind of update happens about 3-4 times a
>> year.  It does break everything, leading to a delay of a day or two,
>> but the fix is straightforward.
>>
>> Obviously this is not ideal but the solutions are nontrivial.  It is
>> not really possible to "have a local cache of the last known good set
>> of *.deb files" without knowing what that subset should be; that would
>> require an edifice to track what is used, or some manual configuration
>> which would probably break.  Alternatively we could run a complete
>> mirror but that is a *lot* of space and bandwidth, most of which would
>> be unused.
>>
>> I think the right approach is probably to switch from using d-i for
>> host installs, to something like FAI.  That would be faster as well.
>> However that amounts to reengineering the way osstest does host
>> installs; it would also leave us maintaining an additional way to do
>> host installs, since we would still want to be able to *test* d-i
>> operation as a guest.
> What I think would be ideal is a way to take ‘snapshots’ of different states of setup for various hosts and revert to them.  There’s absolutely no reason to do a full install of a host every osstest run, when that install happens 1) before we even install Xen, and 2) should be nearly identical each time.  We should be able to install a host, take a snapshot of the “clean” install, then do the build prep, take a snapshot of that, and then simply revert to one or both of those (assuming build requirements haven’t changed in the mean time) whenever necessary.  Re-generating these snapshots once per week per host should be plenty, and sounds like it would massively improve the current throughput.
>
> I’d like to propose the idea also that we try to find a more efficient way of testing guest functionality than doing a guest install.  I understand it’s a natural way to test a reasonable range of functionality, but particularly for Windows guests, my impression is that it’s very slow; there must be a way to make a test that would have similar coverage but be able to be completed with a pre-installed snapshot, in only a few minutes.

We've had similar discussions in XenServer. That idea is superficially
attractive but actually makes things worse, because it means that
filesystem clone/snapshot is now in the mix of things which can go wrong.

Particularly with OSSTest testing mainline kernels, rather than distro
stable kernels, the chances of finding filesystem bugs grow
substantially, and the complexity of diagnosing an issue is outside of
our area of expertise.

Testing, particularly smoke testing, needs to be 100% reliable to be
useful, and OSSTest is not, as is demonstrated across this thread.

The only way to make things better is to improve the reliability. 
Improving reliability means removing all unnecessary complexity, and
replacing any unreliable hardware.

The Xen Project has the money to replace intermittent PDUs (if that is
believed to be the cause of the problem).  What the Xen Project doesn't
have is the time for people to investigate intermittent issues, and what
it can't afford is the current attitude of "oh - that's just OSSTest
being flaky - it will hopefully pass next time".

By far and away the best overall timesaving comes from having all
testing working reliably, at which point OSSTest doesn't need to rerun
tests again in the hope of getting a different answer, and identified
failures are a clear sign to developers that there is a problem which
needs fixing.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 11:51           ` Andrew Cooper
@ 2018-07-05 12:20             ` Juergen Gross
  2018-07-05 12:34               ` Lars Kurth
  2018-07-05 15:14               ` Ian Jackson
  0 siblings, 2 replies; 82+ messages in thread
From: Juergen Gross @ 2018-07-05 12:20 UTC (permalink / raw)
  To: Andrew Cooper, George Dunlap, Ian Jackson
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel, Matt Spencer

On 05/07/18 13:51, Andrew Cooper wrote:
> On 05/07/18 12:18, George Dunlap wrote:
>>
>>>>    Another potential problem showed up last week: OSSTEST is using the
>>>>    Debian servers for doing the basic installation. A change there (e.g.
>>>>    a new point release) will block tests. I'd prefer to have a local cache
>>>>    of the last known good set of *.deb files to be used especially for the
>>>>    branched Xen versions. This would rule out remote problems for releases.
>>>>
>>>> This is again something which we should definitely look at.
>>> This was bad luck.  This kind of update happens about 3-4 times a
>>> year.  It does break everything, leading to a delay of a day or two,
>>> but the fix is straightforward.
>>>
>>> Obviously this is not ideal but the solutions are nontrivial.  It is
>>> not really possible to "have a local cache of the last known good set
>>> of *.deb files" without knowing what that subset should be; that would
>>> require an edifice to track what is used, or some manual configuration
>>> which would probably break.  Alternatively we could run a complete
>>> mirror but that is a *lot* of space and bandwidth, most of which would
>>> be unused.
>>>
>>> I think the right approach is probably to switch from using d-i for
>>> host installs, to something like FAI.  That would be faster as well.
>>> However that amounts to reengineering the way osstest does host
>>> installs; it would also leave us maintaining an additional way to do
>>> host installs, since we would still want to be able to *test* d-i
>>> operation as a guest.
>> What I think would be ideal is a way to take ‘snapshots’ of different states of setup for various hosts and revert to them.  There’s absolutely no reason to do a full install of a host every osstest run, when that install happens 1) before we even install Xen, and 2) should be nearly identical each time.  We should be able to install a host, take a snapshot of the “clean” install, then do the build prep, take a snapshot of that, and then simply revert to one or both of those (assuming build requirements haven’t changed in the mean time) whenever necessary.  Re-generating these snapshots once per week per host should be plenty, and sounds like it would massively improve the current throughput.
>>
>> I’d like to propose the idea also that we try to find a more efficient way of testing guest functionality than doing a guest install.  I understand it’s a natural way to test a reasonable range of functionality, but particularly for Windows guests, my impression is that it’s very slow; there must be a way to make a test that would have similar coverage but be able to be completed with a pre-installed snapshot, in only a few minutes.
> 
> We've had similar discussions in XenServer. That idea is superficially
> attractive but actually makes things worse, because it now means that
> filesystem clone/snapshot is now in the mix of things which can go wrong.
> 
> Particularly with OSSTest testing mainline kernels, rather than distro
> stable kernels, the chances of finding filesystem bugs grow
> substantially, and the complexity of diagnosing an issue is outside of
> our area of expertise.

But are really all tests required to use a freshly installed mainline
kernel? Can't we e.g. do a guest install test once a new kernel is
released and clone the resulting guest disk image for other tests
(after shutting down the guest, of course)?

Same applies to the host: the base system (without the to be tested
component like qemu, xen, or whatever) could be installed just by
cloning a disk/partition/logical volume.

Each image would run through the stages new->staging->stable:

- Each time a component an image is based on is released (e.g. a new
  mainline kernel), a new image is created by installing it. In case this
  succeeds, the image is moved to the staging area.
- The images in the staging area are tested using known stable
  components in order to test the image, not the test-components. In
  case all tests succeed, the image is moved to the stable area.
- The stable images are used to test components from staging. In case of
  success the related components can be pushed to stable/master.
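
For illustration, a minimal sketch of this promotion flow (Python;
install_image() and test_image_with_stable_components() are hypothetical
placeholders, not existing osstest code):

from enum import Enum

class Area(Enum):
    NEW = "new"          # freshly released component, install not yet verified
    STAGING = "staging"  # installs, but not yet proven against stable components
    STABLE = "stable"    # safe to use when testing components from staging

def promote(component_release, install_image, test_image_with_stable_components):
    # component_release could be e.g. a new mainline kernel version.
    image = install_image(component_release)
    if image is None:
        return Area.NEW          # install failed, nothing to promote
    if not test_image_with_stable_components(image):
        return Area.STAGING      # the image itself is suspect, keep it out of stable
    return Area.STABLE           # now usable for testing components from staging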


Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 12:20             ` Juergen Gross
@ 2018-07-05 12:34               ` Lars Kurth
  2018-07-05 15:14               ` Ian Jackson
  1 sibling, 0 replies; 82+ messages in thread
From: Lars Kurth @ 2018-07-05 12:34 UTC (permalink / raw)
  To: Juergen Gross, Andrew Cooper, George Dunlap, Ian Jackson
  Cc: committers, Rich Persaud, Doug Goldstein, Matt Spencer, xen-devel

Hi all,
at some point we need to take a step back and summarize this discussion and establish 
a) what seems to be agreed
b) what is possible
c) and what is controversial
I am sort of volunteering for this and was planning to do so tomorrow. 
Lars

On 05/07/2018, 13:20, "Juergen Gross" <jgross@suse.com> wrote:

    On 05/07/18 13:51, Andrew Cooper wrote:
    > On 05/07/18 12:18, George Dunlap wrote:
    >>
    >>>>    Another potential problem showed up last week: OSSTEST is using the
    >>>>    Debian servers for doing the basic installation. A change there (e.g.
    >>>>    a new point release) will block tests. I'd prefer to have a local cache
    >>>>    of the last known good set of *.deb files to be used especially for the
    >>>>    branched Xen versions. This would rule out remote problems for releases.
    >>>>
    >>>> This is again something which we should definitely look at.
    >>> This was bad luck.  This kind of update happens about 3-4 times a
    >>> year.  It does break everything, leading to a delay of a day or two,
    >>> but the fix is straightforward.
    >>>
    >>> Obviously this is not ideal but the solutions are nontrivial.  It is
    >>> not really possible to "have a local cache of the last known good set
    >>> of *.deb files" without knowing what that subset should be; that would
    >>> require an edifice to track what is used, or some manual configuration
    >>> which would probably break.  Alternatively we could run a complete
    >>> mirror but that is a *lot* of space and bandwidth, most of which would
    >>> be unused.
    >>>
    >>> I think the right approach is probably to switch from using d-i for
    >>> host installs, to something like FAI.  That would be faster as well.
    >>> However that amounts to reengineering the way osstest does host
    >>> installs; it would also leave us maintaining an additional way to do
    >>> host installs, since we would still want to be able to *test* d-i
    >>> operation as a guest.
    >> What I think would be ideal is a way to take ‘snapshots’ of different states of setup for various hosts and revert to them.  There’s absolutely no reason to do a full install of a host every osstest run, when that install happens 1) before we even install Xen, and 2) should be nearly identical each time.  We should be able to install a host, take a snapshot of the “clean” install, then do the build prep, take a snapshot of that, and then simply revert to one or both of those (assuming build requirements haven’t changed in the mean time) whenever necessary.  Re-generating these snapshots once per week per host should be plenty, and sounds like it would massively improve the current throughput.
    >>
    >> I’d like to propose the idea also that we try to find a more efficient way of testing guest functionality than doing a guest install.  I understand it’s a natural way to test a reasonable range of functionality, but particularly for Windows guests, my impression is that it’s very slow; there must be a way to make a test that would have similar coverage but be able to be completed with a pre-installed snapshot, in only a few minutes.
    > 
    > We've had similar discussions in XenServer. That idea is superficially
    > attractive but actually makes things worse, because it now means that
    > filesystem clone/snapshot is now in the mix of things which can go wrong.
    > 
    > Particularly with OSSTest testing mainline kernels, rather than distro
    > stable kernels, the chances of finding filesystem bugs grows
    > substantially, and the complexity of diagnosing an issue is outside of
    > our area of expertise.
    
    But are really all tests required to use a freshly installed mainline
    kernel? Can't we e.g. do a guest install test once a new kernel is
    released and clone the resulting guest disk image for other tests
    (after shutting down the guest, of course)?
    
    Same applies to the host: the base system (without the to be tested
    component like qemu, xen, or whatever) could be installed just by
    cloning a disk/partition/logical volume.
    
    Each image would run through the stages new->staging->stable:
    
    - Each time a component an image is based on is released (e.g. a new
      mainline kernel), a new image is created by installing it. In case this
      succeeds, the image is moved to the staging area.
    - The images in the staging area are tested using known stable
      components in order to test the image, not the test-components. In
      case all tests succeed, the image is moved to the stable area.
    - The stable images are used to test components from staging. In case of
      success the related components can be pushed to stable/master.
    
    
    Juergen
    

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 12:20             ` Juergen Gross
  2018-07-05 12:34               ` Lars Kurth
@ 2018-07-05 15:14               ` Ian Jackson
  2018-07-05 15:40                 ` Sander Eikelenboom
  2018-07-06  8:58                 ` Juergen Gross
  1 sibling, 2 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 15:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Lars Kurth, Andrew Cooper, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel, Matt Spencer

Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> Same applies to the host: the base system (without the to be tested
> component like qemu, xen, or whatever) could be installed just by
> cloning a disk/partition/logical volume.

Certainly it would be a bad idea to use anything *on the test host
itself* as a basis for a subsequent test.  The previous test might
have corrupted it.

So that means that often, and at least from one test flight to the
next, all of the base dom0 OS needs to be copied from somewhere else
to the test host.  This is not currently as fast as it could be, but
running d-i is not massively slower than something like FAI.

To a fairly large extent, similar considerations apply to guest
images.

> Each image would run through the stages new->staging->stable:
> 
> - Each time a component an image is based on is released (e.g. a new
>   mainline kernel), a new image is created by installing it. In case this
>   succeeds, the image is moved to the staging area.

This would happen a lot more often than you seem to imagine.  "Released"
here really means "is updated in its appropriate git branch".

Unless you think we should do our testing of Xen mainly with released
versions of Linux stable branches (in which case, given how Linux
stable branches are often broken, we might be long out of date), or
our testing of Linux only with point releases of Xen, etc.

The current approach is mostly to take the most recent
tested-and-working git commit from each of the inputs.  This aspect of
osstest generally works well, I think.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 15:14               ` Ian Jackson
@ 2018-07-05 15:40                 ` Sander Eikelenboom
  2018-07-05 16:23                   ` Ian Jackson
  2018-07-06  8:58                 ` Juergen Gross
  1 sibling, 1 reply; 82+ messages in thread
From: Sander Eikelenboom @ 2018-07-05 15:40 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, George Dunlap, Rich Persaud, committers,
	xen-devel, Matt Spencer


Thursday, July 5, 2018, 5:14:39 PM, you wrote:

> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> Same applies to the host: the base system (without the to be tested
>> component like qemu, xen, or whatever) could be installed just by
>> cloning a disk/partition/logical volume.

> Certainly it would be a bad idea to use anything *on the test host
> itself* as a basis for a subsequent test.  The previous test might
> have corrupted it.

> So that means that often, and at least from one test flight to the
> next, all of the base dom0 OS needs to be copied from somewhere else
> to the test host.  This is not currently as fast as it could be, but
> running d-i is not massively slower than something like FAI.

How about using (LVM) snapshotting (which does COW) and dropping the
snapshots after a test? Only do a new OS install once a day/week (or
point release) and only after having an OSSTEST pass?
That should have fairly little overhead.
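
To make this concrete, a rough sketch (Python wrapping standard LVM2
commands; the volume names are hypothetical) of taking a snapshot of the
clean dom0 root before a test and merging it back afterwards to revert:

import subprocess

VG_ORIGIN = "vg0/dom0root"   # hypothetical LV holding the clean dom0 install
SNAP_NAME = "dom0-clean"     # COW snapshot recording the clean state

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def snapshot_clean_state():
    # Take a copy-on-write snapshot of the freshly installed dom0 root.
    run(["lvcreate", "-s", "-L", "10G", "-n", SNAP_NAME, f"/dev/{VG_ORIGIN}"])

def revert_after_test():
    # Merge the snapshot back into the origin, rolling dom0 back to the
    # clean state; the merge completes on the next activation of the LV.
    run(["lvconvert", "--merge", f"/dev/vg0/{SNAP_NAME}"])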

--
Sander

> To a fairly large extent, similar considerations apply to guest
> images.

>> Each image would run through the stages new->staging->stable:
>> 
>> - Each time a component an image is based on is released (e.g. a new
>>   mainline kernel), a new image is created by installing it. In case this
>>   succeeds, the image is moved to the staging area.

> This would happen a lot more often than you seem to imagine.  "Released"
> here really means "is updated in its appropriate git branch".

> Unless you think we should do our testing of Xen mainly with released
> versions of Linux stable branches (in which case, given how Linux
> stable branches are often broken, we might be long out of date), or
> our testing of Linux only with point releases of Xen, etc.

> The current approach is mostly to take the most recent
> tested-and-working git commit from each of the inputs.  This aspect of
> osstest generally works well, I think.

> Ian.




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 15:40                 ` Sander Eikelenboom
@ 2018-07-05 16:23                   ` Ian Jackson
  2018-07-05 16:41                     ` George Dunlap
  0 siblings, 1 reply; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 16:23 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, George Dunlap, Rich Persaud, committers,
	xen-devel, Matt Spencer

Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> Thursday, July 5, 2018, 5:14:39 PM, you wrote:
> > So that means that often, and at least from one test flight to the
> > next, all of the base dom0 OS needs to be copied from somewhere else
> > to the test host.  This is not currently as fast as it could be, but
> > running d-i is not massively slower than something like FAI.
> 
> How about using (LVM) snapshotting (which does COW) and dropping the
> snapshots after a test?  Only do a new OS install once a day/week
> (or point release) and only after having an OSSTEST pass?  That
> should have fairly little overhead.

I'm sorry to have to say this, but you seem not to have read what I
wrote above.

Leaving aside other questions about using LVM for a whole machine
including possible EFI system partition, bootloader etc., where would
the base image be for this LVM snapshot ?

If it is on the host itself then the previous test can corrupt it.
This is not theoretical: we are doing OS and hypervisor development.
Breakage is to be expected.

If it is not on the host itself, then the system is doing some kind of
network cow lvm thing.  That is not going to improve the test
reliability.  And it is unreliability which is our main problem.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 16:23                   ` Ian Jackson
@ 2018-07-05 16:41                     ` George Dunlap
  2018-07-05 16:54                       ` Andrew Cooper
  0 siblings, 1 reply; 82+ messages in thread
From: George Dunlap @ 2018-07-05 16:41 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, Sander Eikelenboom, Rich Persaud, committers,
	xen-devel, Matt Spencer



> On Jul 5, 2018, at 5:23 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> 
> Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> Thursday, July 5, 2018, 5:14:39 PM, you wrote:
>>> So that means that often, and at least from one test flight to the
>>> next, all of the base dom0 OS needs to be copied from somewhere else
>>> to the test host.  This is not currently as fast as it could be, but
>>> running d-i is not massively slower than something like FAI.
>> 
>> how about using (LVM) snapshotting (which does COW) and drop the
>> snapshots after a test ?  Only do a new OS install once a day/week
>> (or point release) and only after having an OSSTEST pass ?  That
>> should have fairly little overhead.
> 
> I'm sorry to have to say this, but you seem not to have read what I
> wrote above.
> 
> Leaving aside other questions about using LVM for a whole machine
> including possible EFI system partition, bootloader etc., where would
> the base image be for this LVM snapshot ?
> 
> If it is on the host itself then the previous test can corrupt it.
> This is not theoretical: we are doing OS and hypervisor development.
> Breakage is to be expected.

What would you think of a “backup partition” scheme instead?  I.e., make (say) /dev/sda1 and /dev/sda2 identical sizes, install to /dev/sda1, use a small “snapshot” netboot utility to dd it into /dev/sda2.  Run a test, then dd from /dev/sda2 back into /dev/sda1.  If this were only a few gigs it shouldn’t take more than a minute or two.  How long does a full install take?

In theory of course a wild OS write could corrupt something in an unmounted partition, but in practice the chance of that happening in *our* testing seems pretty tiny.  (Embedded device manufacturers seem to think this is rare enough to update firmware with, and they have a lot more to lose than we do.)
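
As a back-of-the-envelope illustration (the 8 GiB image size and ~150 MB/s
sequential throughput below are guesses, not measurements of our hosts):

    # Rough cost of restoring the root filesystem from the backup partition
    # with dd, versus the ~15 minute d-i install mentioned elsewhere in the
    # thread.  Both inputs are assumptions.
    image_bytes = 8 * 1024**3        # assumed size of the installed system
    throughput = 150e6               # bytes/s, assumed sequential disk speed
    print(f"dd restore: ~{image_bytes / throughput / 60:.1f} minutes")
    # -> roughly a minute, which is where the "minute or two" guess comes from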

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 16:41                     ` George Dunlap
@ 2018-07-05 16:54                       ` Andrew Cooper
  2018-07-05 17:02                         ` Ian Jackson
  0 siblings, 1 reply; 82+ messages in thread
From: Andrew Cooper @ 2018-07-05 16:54 UTC (permalink / raw)
  To: George Dunlap, Ian Jackson
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	Sander Eikelenboom, Rich Persaud, committers, xen-devel,
	Matt Spencer

On 05/07/18 17:41, George Dunlap wrote:
>
>> On Jul 5, 2018, at 5:23 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
>>
>> Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>>> Thursday, July 5, 2018, 5:14:39 PM, you wrote:
>>>> So that means that often, and at least from one test flight to the
>>>> next, all of the base dom0 OS needs to be copied from somewhere else
>>>> to the test host.  This is not currently as fast as it could be, but
>>>> running d-i is not massively slower than something like FAI.
>>> how about using (LVM) snapshotting (which does COW) and drop the
>>> snapshots after a test ?  Only do a new OS install once a day/week
>>> (or point release) and only after having an OSSTEST pass ?  That
>>> should have fairly little overhead.
>> I'm sorry to have to say this, but you seem not to have read what I
>> wrote above.
>>
>> Leaving aside other questions about using LVM for a whole machine
>> including possible EFI system partition, bootloader etc., where would
>> the base image be for this LVM snapshot ?
>>
>> If it is on the host itself then the previous test can corrupt it.
>> This is not theoretical: we are doing OS and hypervisor development.
>> Breakage is to be expected.
> What would you think of a “backup partition” scheme instead?  I.e., make (say) /dev/sda1 and /dev/sda2  identical size, install to /dev/sda1, use a small “snapshot” netboot utility to dd it into /dev/sda2.  Run a test, then dd from /dev/sda2 back into /dev/sda1.  If this were only a few gigs it shouldn’t take more than a minute or two.  How long does a full install take?
>
> In theory of course a wild OS write could corrupt something in an unmounted partition, but in practice the chance of that happening in *our* testing seems pretty tiny.  (Embedded device manufacturers seem to think this is rare enough to update firmware with, and they have a lot more to lose than we do.)

XenRT, which is XenServer's provisioning and testing system, can deploy
arbitrary builds of XenServer, or arbitrary builds of various
Linux distros, in 10 minutes (although for distros, we limit our install
media to published point releases).  Google "10 minutes to Xen" for some
PR on this subject done back in the day!

This is a fresh install of the host.  It's not hard, and it's not rocket
science.  What it is, is absolutely necessary for reliable testing.

Attempting to cleverly cache the existing install to avoid reinstalls
won't save you much time, and it will introduce extra complexity, extra
corner cases and inevitably make the problem we're trying to solve even
worse.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 16:54                       ` Andrew Cooper
@ 2018-07-05 17:02                         ` Ian Jackson
  2018-07-05 17:06                           ` Sander Eikelenboom
  2018-07-05 17:22                           ` George Dunlap
  0 siblings, 2 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 17:02 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	George Dunlap, Sander Eikelenboom, Rich Persaud, committers,
	xen-devel, Matt Spencer

Andrew Cooper writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> XenRT, which is XenServer's provisioning and testing system,
> can deploy arbitrary builds of XenServer, or arbitrary builds of various
> Linux distros, in 10 minutes (although for distros, we limit our install
> media to published point releases).  Google "10 minutes to Xen" for some
> PR on this subject done back in the day!

osstest's d-i runs take more like 15 minutes.  As I say, this could be
improved by using something like FAI, but by a factor of at most 2 I
think.  Instead of working on that, I have been working on reusing an
install when it is feasible to do so: specifically, after a passing
job and when the host is to be reused by the same flight, with an
identical configuration.  In my tests that saves about 50% of the host
installs.  I haven't yet completed and deployed this.
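
To spell out the reuse rule (a sketch only; the field names are invented for
illustration and are not osstest's actual data model):

    # Reuse an existing host install only when it is safe by construction:
    # the previous job passed, the host stays within the same flight, and
    # the requested host configuration is identical.
    from collections import namedtuple

    Job = namedtuple("Job", "flight host_config passed")

    def can_reuse_install(prev, new):
        return (prev.passed
                and prev.flight == new.flight
                and prev.host_config == new.host_config)

    print(can_reuse_install(Job(124946, "hostA/default", True),
                            Job(124946, "hostA/default", False)))   # -> True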

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:02                         ` Ian Jackson
@ 2018-07-05 17:06                           ` Sander Eikelenboom
  2018-07-05 17:11                             ` Ian Jackson
  2018-07-05 17:22                           ` George Dunlap
  1 sibling, 1 reply; 82+ messages in thread
From: Sander Eikelenboom @ 2018-07-05 17:06 UTC (permalink / raw)
  To: Ian Jackson, Andrew Cooper
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel, Matt Spencer

On 05/07/18 19:02, Ian Jackson wrote:
> Andrew Cooper writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> XenRT, which is XenServer's provisioning and testing system,
>> can deploy arbitrary builds of XenServer, or arbitrary builds of various
>> Linux distros, in 10 minutes (although for distros, we limit our install
>> media to published point releases).  Google "10 minutes to Xen" for some
>> PR on this subject done back in the day!
> 
> osstest's d-i runs take more like 15 minutes.  As I say, this could be
> improved by using something like FAI, but by a factor of at most 2 I
> think.  Instead of working on that, I have been working on reusing an
> install when it is feasible to do so: specifically, after a passing
> job and when the host is to be reused by the same flight, with an
> identical configuration.  In my tests that saves about 50% of the host
> installs.  I haven't yet completed and deployed this.
> 
> Ian.
> 
Just wondering, are there any timing statistics kept for the OSStest
flights (and separately for building the various components and running
the individual tests)?  Or should they be parseable from the logs that are kept?

That could perhaps give some better insight into the average and variation
in time spent in all the components.
--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:06                           ` Sander Eikelenboom
@ 2018-07-05 17:11                             ` Ian Jackson
  2018-07-05 17:20                               ` Sander Eikelenboom
  2018-07-05 22:47                               ` Sander Eikelenboom
  0 siblings, 2 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 17:11 UTC (permalink / raw)
  To: Sander Eikelenboom
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, George Dunlap, Rich Persaud, committers,
	xen-devel, Matt Spencer

Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> Just wondering, are there any timing statistics kept for the OSStest
> flights (and separate for building the various components and running
> the individual tests ?). Or should they be parse-able from the logs kept ?

Yes.  The database has a started and stopped time_t for each test
step.  That's where I got the ~15 mins number from.
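
For anyone who wants to pull those numbers out themselves, something along
these lines would do it (the SQL and the row format are assumptions about the
schema, not the real osstest table definitions):

    # Summarise per-step durations from (step name, started, stopped) rows,
    # e.g. the result of something like
    #   SELECT step, started, stopped FROM steps WHERE flight = %s;
    # (table and column names are guesses; check the actual schema).
    from collections import defaultdict
    from statistics import mean

    def summarise(rows):
        per_step = defaultdict(list)
        for step, started, stopped in rows:
            per_step[step].append(stopped - started)
        for step, secs in sorted(per_step.items()):
            print(f"{step:20s} n={len(secs):3d} avg={mean(secs) / 60:6.1f} min")

    summarise([("host-install(4)", 0, 1005), ("host-install(4)", 0, 789)])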

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:11                             ` Ian Jackson
@ 2018-07-05 17:20                               ` Sander Eikelenboom
  2018-07-05 22:47                               ` Sander Eikelenboom
  1 sibling, 0 replies; 82+ messages in thread
From: Sander Eikelenboom @ 2018-07-05 17:20 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, George Dunlap, Rich Persaud, committers,
	xen-devel, Matt Spencer

On 05/07/18 19:11, Ian Jackson wrote:
> Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> Just wondering, are there any timing statistics kept for the OSStest
>> flights (and separate for building the various components and running
>> the individual tests ?). Or should they be parse-able from the logs kept ?
> 
> Yes.  The database has a started and stopped time_t for each test
> step.  That's where I got the ~15 mins number from.
> 
> Ian.
> 

And how much time would a complete flight for a push require at minimum?
And is there much variation between flights (does a non-push with some failing tests require more or less time than a successful one)?

--
Sander

BTW:
The following link: http://osstest.xs.citrite.net/~osstest/testlogs/logs
from the osstest mail with subject "[Xen-devel] [xen-4.10-testing baseline-only test] 74937: tolerable FAIL" doesn't seem to work.

(Is that server not publicly accessible?)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:02                         ` Ian Jackson
  2018-07-05 17:06                           ` Sander Eikelenboom
@ 2018-07-05 17:22                           ` George Dunlap
  2018-07-05 17:25                             ` Ian Jackson
  1 sibling, 1 reply; 82+ messages in thread
From: George Dunlap @ 2018-07-05 17:22 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, Sander Eikelenboom, Rich Persaud, committers,
	xen-devel, Matt Spencer



> On Jul 5, 2018, at 6:02 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> 
> Andrew Cooper writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> XenRT, which is XenServer's provisioning and testing system,
>> can deploy arbitrary builds of XenServer, or arbitrary builds of various
>> Linux distros, in 10 minutes (although for distros, we limit our install
>> media to published point releases).  Google "10 minutes to Xen" for some
>> PR on this subject done back in the day!
> 
> osstest's d-i runs take more like 15 minutes.  As I say, this could be
> improved by using something like FAI, but by a factor of at most 2 I
> think.

I figure a dd from a backup partition couldn’t take more than 2 minutes.  So obviously we need to apply Amdahl’s law here. So what’s the total percentage of time spent doing host installs now, if you had to guess?  If it’s something like 5%, then yeah, 15 -> 2 will save you a bit but not much.  If it’s closer to 50%, then you’re talking a much more significant savings from avoiding the re-install.
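
To make the Amdahl's law arithmetic concrete (the install-share percentages
below are placeholders, not measured figures):

    # Overall flight speed-up from cutting the host install from 15 to 2
    # minutes, for a few assumed shares of total time spent in installs.
    # Amdahl's law: overall = 1 / ((1 - p) + p / s).
    step_speedup = 15 / 2
    for p in (0.05, 0.30, 0.50):                  # assumed install share
        overall = 1 / ((1 - p) + p / step_speedup)
        print(f"installs = {p:4.0%} of flight -> overall {overall:.2f}x")
    # 5% -> ~1.05x, 30% -> ~1.35x, 50% -> ~1.76x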

>  Instead of working on that, I have been working on reusing an
> install when it is feasible to do so: specifically, after a passing
> job and when the host is to be reused by the same flight, with an
> identical configuration.

I don’t really understand why you’re more worried about a test corrupting a backup partition or LVM snapshot, than of a test corrupting a filesystem even when the test actually passed.  I don’t have the same experience you do, but it seems like random stuff left over from a previous test — even if the test passes — would have more of a chance of screwing up a future test than some sort of corruption of an LVM snapshot, and even less so a backup partition.

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:22                           ` George Dunlap
@ 2018-07-05 17:25                             ` Ian Jackson
  2018-07-05 17:47                               ` Sander Eikelenboom
  0 siblings, 1 reply; 82+ messages in thread
From: Ian Jackson @ 2018-07-05 17:25 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, Sander Eikelenboom, Rich Persaud, committers,
	xen-devel, Matt Spencer

George Dunlap writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> I don’t really understand why you’re more worried about a test
> corrupting a backup partition or LVM snapshot, than of a test
> corrupting a filesystem even when the test actually passed.  I don’t
> have the same experience you do, but it seems like random stuff left
> over from a previous test — even if the test passes — would have
> more of a chance of screwing up a future test than some sort of
> corruption of an LVM snapshot, and even less so a backup partition.

The difference is that these are tests *in the same flight*.  That
means they're testing the same software.

If test A passes, but corrupts the disk which is detected by test B
because the host wasn't wiped in between, causing test B to fail, then
that is a genuine test failure - albeit one whose repro conditions are
complicated.  I'm betting that this will be rare enough not to matter.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:25                             ` Ian Jackson
@ 2018-07-05 17:47                               ` Sander Eikelenboom
  0 siblings, 0 replies; 82+ messages in thread
From: Sander Eikelenboom @ 2018-07-05 17:47 UTC (permalink / raw)
  To: Ian Jackson, George Dunlap
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, Rich Persaud, committers, xen-devel,
	Matt Spencer

On 05/07/18 19:25, Ian Jackson wrote:
> George Dunlap writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> I don’t really understand why you’re more worried about a test
>> corrupting a backup partition or LVM snapshot, than of a test
>> corrupting a filesystem even when the test actually passed.  I don’t
>> have the same experience you do, but it seems like random stuff left
>> over from a previous test — even if the test passes — would have
>> more of a chance of screwing up a future test than some sort of
>> corruption of an LVM snapshot, and even less so a backup partition.
> 
> The difference is that these are tests *in the same flight*.  That
> means they're testing the same software.
> 
> If test A passes, but corrupts the disk which is detected by test B
> because the host wasn't wiped in between, causing test B to fail, then
> that is a genuine test failure - albeit one whose repro conditions are
> complicated.  I'm betting that this will be rare enough not to matter.
> 
> Ian.
> 

I know assumption happens to be the mother of some children with a certain
"attitude", but with LVM I think that in practice the chances of corruption would be pretty minimal.
The most prominent test which could cause issues would be one with a linux-linus (unstable) kernel.
Otherwise there would have to be a very, very grave bug in either a stable Linux kernel, the hardware, or the Xen used in the test flight which nukes something quite specific,
since you don't use the LVM LV with the base image itself but only the snapshot, and you recycle the snapshot every time.

The other points you mentioned about EFI etc. could be interesting though.

On the other hand, using a setup with LVM wouldn't prohibit you from reinstalling the base image to the LV on every flight, just
as you seem to suggest above. It does have the benefit of being able to keep it across flights with some seemingly simple adjustments,
when in practice that would just seem to work (while always being able to revert to doing a new base image every flight).

But I will leave it at this for now; it was merely a suggestion :).

--
Sander


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-02 18:03 ` Lars Kurth
  2018-07-03  6:26   ` Juergen Gross
  2018-07-03 10:07   ` Roger Pau Monné
@ 2018-07-05 17:48   ` Doug Goldstein
  2018-07-05 18:51     ` George Dunlap
  2 siblings, 1 reply; 82+ messages in thread
From: Doug Goldstein @ 2018-07-05 17:48 UTC (permalink / raw)
  To: Lars Kurth; +Cc: xen-devel, Rich Persaud, committers, advisory-board

On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
> ### Security Process
> *Batches and timing:* Everyone present, felt that informal batching is good (exception Doug G), 

FWIW, I don't dislike the batching. I just complained when there are a lot
of items in the batch. We attempt to live patch every issue and have
that ready to go when the embargo drops. When there are multiple XSAs
we each grab one to work on, but depending on the size of the batch and
the current workload of the team there might be one that has no staffing
available. This obviously puts a bit of strain on us, since whoever finishes
with one first needs to grab that last one. Then quite often at least
one XSA has revisions to the patch during the process, which requires
additional work, and suddenly we're swamped. It was more an off-the-cuff
remark about big batches than something noteworthy as a formal
objection to batching.

> Again, there was a sense that some of the issues we are seeing could be solved if we had better 
> CI capability: in other words, some of the issues we were seeing could be resolved by
> * Better CI capability as suggested in the Release Cadence discussion
> * Improving some of the internal working practices of the security team
> * Before we commit to a change (such as improved batching), we should try them first informally. 
>   E.g. the security team could try and work towards more predictable dates for batches vs. a 
>   concrete process change

My feeling on CI is clear in this thread and other threads. But I think
what would help with OSSTEST bottlenecks is doing better at separating
the different parts of the testing process into more parallel tasks that
also provide feedback to the contributor faster. I'll obviously never
suggest the GitHub/GitLab PR/MR model to a ML-driven project, because I
wouldn't survive the hate mail, but there is something that those models
do provide. A lot of work can be pushed back onto the contributor in an
automatic fashion instead of onto the reviewer. The Rust project is a
decent model here. They only accept code contributions via a GitHub PR,
but their process causes the submission to immediately be run through
code style checks, a build test on all their supported platforms, and
a number of unit tests over the entire code base. Lastly
they have a bot assign a random maintainer from that part of the code
base to review the submission. Ultimately, the way Xen works, the first
three steps are up to the reviewer to validate and the last one is
manually up to the contributor (and should they make a mistake, the
reviewer needs to chime in).

The biggest boon to our review process would be to automate away a bunch
of these tasks, because our reviewers are human and things get missed.
Many misses aren't even the fault of the reviewer doing a poor job, e.g. the
code change breaks the build with a GCC newer than the one the reviewer has
locally.
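
To sketch the kind of mechanical pre-review gate I mean (the script and
compiler names below are placeholders, not anything that exists in the Xen
tree today):

    # Run the cheap, mechanical checks before a human ever sees the series:
    # style first, then a build with each compiler we claim to support.
    # Any failure bounces straight back to the contributor.
    import subprocess, sys

    CHECKS = [
        ["./scripts/check-style.sh"],          # hypothetical style checker
        ["make", "-C", "xen", "CC=gcc-4.8"],   # compiler list is an example
        ["make", "-C", "xen", "CC=gcc-8"],
        ["make", "-C", "xen", "CC=clang-6.0"],
    ]

    def pre_review_gate():
        for cmd in CHECKS:
            if subprocess.run(cmd).returncode != 0:
                print("FAIL:", " ".join(cmd), "- back to the contributor")
                return 1
        print("mechanical checks passed - over to a human reviewer")
        return 0

    if __name__ == "__main__":
        sys.exit(pre_review_gate())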

>  
> Note that we did not get to the stable baseline discussion: but it was highlighted that several 
> members of the security team also wear the hat of distro packagers for Debian and CentOS and 
> are starting to feel pain.
>  

To me the hardship comes from the fact that security patches apply
against the staging branch for that release (e.g. staging-4.10) but not
necessarily to the last release. Steven Haigh has brought this up as
well. This leaves each downstream responsible for backporting the
security patch against the release they shipped, which has caused a
number of distros to bow out of providing security updates for Xen.
Yocto (via meta-virt) and Ubuntu are two notable ones that don't update
for XSAs. Gentoo is typically treated as a best effort depending on how
much time that maintainer has.

--
Doug

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-03 10:47       ` Juergen Gross
  2018-07-03 11:24         ` Lars Kurth
  2018-07-05 11:16         ` Ian Jackson
@ 2018-07-05 17:58         ` Doug Goldstein
  2 siblings, 0 replies; 82+ messages in thread
From: Doug Goldstein @ 2018-07-05 17:58 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Lars Kurth, advisory-board, Rich Persaud, committers,
	'Jan Beulich',
	xen-devel, Roger Pau Monne

On Tue, Jul 03, 2018 at 12:47:14PM +0200, Juergen Gross wrote:
> On 03/07/18 12:23, Lars Kurth wrote:
> > Combined reply to Jan and Roger
> > Lars
> > 
> > On 03/07/2018, 11:07, "Roger Pau Monne" <roger.pau@citrix.com> wrote:
> > 
> >     On Mon, Jul 02, 2018 at 06:03:39PM +0000, Lars Kurth wrote:
> >     > We then had a discussion around why the positive benefits didn't materialize:
> >     > * Andrew and a few other believe that the model isn't broken, but that the issue is with how we 
> >     >   develop. In other words, moving to a 9 months model will *not* fix the underlying issues, but 
> >     >   merely provide an incentive not to fix them.
> >     > * Issues highlighted were:
> >     >   * 2-3 months stabilizing period is too long
> >     
> >     I think one of the goals with the 6 month release cycle was to shrink
> >     the stabilizing period, but it didn't turn that way, and the
> >     stabilizing period is quite similar with a 6 or a 9 month release
> >     cycle.
> > 
> > Right: we need to establish what the reasons are:
> > * One has to do with a race condition between security issues and the desire to cut a release which has issues fixed in it. If I remember correctly, that has in effect almost added a month to the last few releases (more to this one). 
> 
> The only way to avoid that would be to not allow any security fixes to
> be included in the release the last few weeks before the planned release
> date. I don't think this is a good idea. I'd rather miss the planned
> release date.

Another option could be to make the release on time without any security
patches and then, once the security issue is resolved, do a point
release. I'm gonna beat on the Rust drum again, but they recently did a
1.27.0 release with a known issue. They felt it was more important to
remain with their 6 *WEEK* release cadence than to break that cadence.
They followed it up with a 1.27.1 release that fixed the issue. The
difference is that for security issues Xen puts out patches that don't
necessarily apply cleanly against the last release tarball, and the
staging branch has fixes other than the security issue in it, making
it less clear and easy for a downstream to ship a fix.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 11:16         ` Ian Jackson
  2018-07-05 11:39           ` George Dunlap
  2018-07-05 11:41           ` Juergen Gross
@ 2018-07-05 18:13           ` Doug Goldstein
  2018-07-06  8:32             ` Jan Beulich
  2 siblings, 1 reply; 82+ messages in thread
From: Doug Goldstein @ 2018-07-05 18:13 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, advisory-board, Rich Persaud,
	committers, 'Jan Beulich',
	xen-devel, Roger Pau Monne

On Thu, Jul 05, 2018 at 12:16:09PM +0100, Ian Jackson wrote:
> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> > We didn't look at the sporadic failing tests thoroughly enough. The
> > hypercall buffer failure has been there for ages, a newer kernel just
> > made it more probable. This would have saved us some weeks.
> 
> In general, as a community, we are very bad at this kind of thing.
> 
> In my experience, the development community is not really interested
> in fixing bugs which aren't directly in their way.
> 
> You can observe this easily in the way that regression in Linux,
> spotted by osstest, are handled.  Linux 4.9 has been broken for 43
> days.  Linux mainline is broken too.
> 
> We do not have a team of people reading these test reports, and
> chasing developers to fix them.  I certainly do not have time to do
> this triage.  On trees where osstest failures do not block
> development, things go unfixed for weeks, sometimes months.

Honestly this is where we need some kind of metrics with output that my
5-year-old could decipher. The OSSTEST emails are large and overwhelming
and require a bit of a time commitment to digest the volume and amount of
data.

Jenkins uses weather icons to attempt to convey whether a test is
trending worse or better, or is consistently successful or broken. If it fails, but not
every time, and the number of failures is increasing over time, then it's
got storm clouds. If the number of failures is decreasing, there's a
little bit of sun peeking out.

Just some kind of dashboard which would tell me what would provide the
most value to drill into would likely go a long way. But again, this is
just an assumption and could be a time waste.
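
Even something as simple as the following would be a start (the pass/fail
history format is made up for the sake of the example):

    # Jenkins-style "weather" for one test: compare the failure rate of the
    # last few runs with the few runs before that.  History is newest-first,
    # True meaning the run passed.
    def weather(history, window=5):
        recent = history[:window]
        older = history[window:2 * window] or recent
        rate = lambda runs: sum(not ok for ok in runs) / len(runs)
        if rate(recent) == 0:
            return "sunny"
        return "storm clouds" if rate(recent) > rate(older) else "sun peeking out"

    print(weather([False, True, False, False, True,    # last 5 runs
                   True, True, True, True, False]))    # the 5 before that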

--
Doug

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 11:39           ` George Dunlap
@ 2018-07-05 18:14             ` Doug Goldstein
  0 siblings, 0 replies; 82+ messages in thread
From: Doug Goldstein @ 2018-07-05 18:14 UTC (permalink / raw)
  To: George Dunlap
  Cc: Juergen Gross, Lars Kurth, advisory-board, Rich Persaud,
	committers, Jan Beulich, xen-devel, Ian Jackson, Roger Pau Monne

On Thu, Jul 05, 2018 at 11:39:51AM +0000, George Dunlap wrote:
> 
> 
> > On Jul 5, 2018, at 12:16 PM, Ian Jackson <ian.jackson@citrix.com> wrote:
> > 
> > Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> >> We didn't look at the sporadic failing tests thoroughly enough. The
> >> hypercall buffer failure has been there for ages, a newer kernel just
> >> made it more probable. This would have saved us some weeks.
> > 
> > In general, as a community, we are very bad at this kind of thing.
> > 
> > In my experience, the development community is not really interested
> > in fixing bugs which aren't directly in their way.
> > 
> > You can observe this easily in the way that regression in Linux,
> > spotted by osstest, are handled.  Linux 4.9 has been broken for 43
> > days.  Linux mainline is broken too.
> > 
> > We do not have a team of people reading these test reports, and
> > chasing developers to fix them.  I certainly do not have time to do
> > this triage.  On trees where osstest failures do not block
> > development, things go unfixed for weeks, sometimes months.
> > 
> > And overall my gut feeling is that tests which fail intermittently are
> > usually blamed (even if this is not stated explicitly) on problems
> > with osstest or with our test infrastructure.  It is easy for
> > developers to think this because if they wait, the test will get
> > "lucky", and pass, and so there will be a push and the developers can
> > carry on.
> > 
> > I have a vague plan to sit down and think about how osstest's
> > results analysers could respond better to intermittent failures.
> > If I can, I would like intermittent failures to block pushes.  That
> > would at least help address the problem of heisenbugs (which are often
> > actually quite serious issues) not being taken seriously.
> > 
> > I would love to hear suggestions for how to get people to actually fix
> > test failures in trees not maintained by the Xen Project and therefore
> > not gated by osstest.
> 
> Well at the moment, investigation is ad-hoc.  Basically everyone has to look to see *whether* there’s been a failure, and it’s nobody’s job in particular to try to chase it down to find out what it might be.  If we had a team, we could have a robot rotate between the teams to nominate one particular person per failure to take a look at the result and at least try to classify it, maybe try to find the appropriate person who may be able to take a deeper look.
> 
>  -George

I forget the saying exactly and forgot who said it but it goes something
like "Any task that is the job of everyone is the job of no one and will
not get done."
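
A rotation like George describes would at least make it somebody's job each
time; a sketch (the roster is obviously a placeholder):

    # Deterministically nominate one person per failure so that triage is
    # always assigned to someone rather than to "everyone".
    import itertools

    TEAM = ["alice", "bob", "carol"]          # placeholder roster
    rotation = itertools.cycle(TEAM)

    def nominate(failure):
        who = next(rotation)
        print(f"{failure}: assigned to {who} for classification")
        return who

    nominate("linux-4.9 regression, broken for 43 days")
    nominate("intermittent hypercall buffer failure")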

--
Doug

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:48   ` Doug Goldstein
@ 2018-07-05 18:51     ` George Dunlap
  2018-07-05 19:00       ` Stefano Stabellini
                         ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: George Dunlap @ 2018-07-05 18:51 UTC (permalink / raw)
  To: Doug Goldstein
  Cc: Lars Kurth, Rich Persaud, committers, advisory-board, xen-devel

> 
>> Again, there was a sense that some of the issues we are seeing could be solved if we had better 
>> CI capability: in other words, some of the issues we were seeing could be resolved by
>> * Better CI capability as suggested in the Release Cadence discussion
>> * Improving some of the internal working practices of the security team
>> * Before we commit to a change (such as improved batching), we should try them first informally. 
>>   E.g. the security team could try and work towards more predictable dates for batches vs. a 
>>   concrete process change
> 
> My feeling on CI is clear in this thread and other threads. But I think
> what would help OSSTEST bottlenecks if we do better at separating up
> different parts of the testing process into more parallel tasks that
> also provide feedback to the contributor faster. I'll obviously never
> suggest the GitHub/GitLab PR/MR model to a ML driven project because I
> wouldn't survive the hate mail but there is something that those models
> do provide.

FWIW we (IanJ, Wei, Roger, Anthony and I) just had a fairly extended discussion about this in our team meeting today, and everyone basically agreed that there are some things about the web-based PR model that are *really* nice:

1. Effective tracking of submission state — open / assigned to a reviewer / merged / rejected
2. Automation 
3. Not having to marshal git commits into email, and then marshal them back into git commits again

On the other hand, the general consensus, from people who had used such websites “in anger” (as they say here in the UK) was that they really didn’t like the way that reviews worked.  Email was seen as:
1. Much more convenient for giving feedback and having discussions
2. Easier for people to “listen in” on other people’s reviews
3. More accessible for posterity

In the end we generally agreed that it was an idea worth thinking about more.  Not sure how the wider community feels, but there are at least a decent cohort who wouldn’t send you hate mail. :-)

 -George
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 18:51     ` George Dunlap
@ 2018-07-05 19:00       ` Stefano Stabellini
  2018-07-05 19:02       ` Doug Goldstein
  2018-07-06  2:54       ` Tamas K Lengyel
  2 siblings, 0 replies; 82+ messages in thread
From: Stefano Stabellini @ 2018-07-05 19:00 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, xen-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 3146 bytes --]

On Thu, 5 Jul 2018, George Dunlap wrote:
> >> Again, there was a sense that some of the issues we are seeing could be solved if we had better 
> >> CI capability: in other words, some of the issues we were seeing could be resolved by
> >> * Better CI capability as suggested in the Release Cadence discussion
> >> * Improving some of the internal working practices of the security team
> >> * Before we commit to a change (such as improved batching), we should try them first informally. 
> >>   E.g. the security team could try and work towards more predictable dates for batches vs. a 
> >>   concrete process change
> > 
> > My feeling on CI is clear in this thread and other threads. But I think
> > what would help OSSTEST bottlenecks if we do better at separating up
> > different parts of the testing process into more parallel tasks that
> > also provide feedback to the contributor faster. I'll obviously never
> > suggest the GitHub/GitLab PR/MR model to a ML driven project because I
> > wouldn't survive the hate mail but there is something that those models
> > do provide.
> 
> FWIW we (IanJ, Wei, Roger, Anthony and I) just had a fairly extended discussion about this in our team meeting today, and everyone basically agreed that there are some things about the web-based PR model that are *really* nice:
> 
> 1. Effective tracking of submission state — open / assigned to a reviewer / merged / rejected
> 2. Automation 
> 3. Not having to marshal git commits into email, and then marshal them back into git commits again
> 
> On the other hand, the general consensus, from people who had used such websites “in anger” (as they say here in the UK) was that they really didn’t like the way that reviews worked.  Email was seen as:
> 1. Much more convenient for giving feedback and having discussions
> 2. Easier for people to “listen in” on other people’s reviews
> 3. More accessible for posterity
> 
> In the end we generally agreed that it was an idea worth thinking about more.  Not sure how the wider community feels, but there are at least a decent cohort who wouldn’t send you hate mail. :-)

A properly run patchwork instance is supposed to be able to give us many
of the benefits of the web PR model while retaining the benefits of the
ML based model.

For instance, at Xen Summit we discussed the possibility of running
automated tests on submitted patch series, before they are even
reviewed. The purpose is to free up reviewers' time. Obviously, we don't
need web PRs to enable this, given that both the Linux kernel and QEMU
already have something along those lines, however, we do need a way to
recognize patch series submissions and run something in response to
them. I don't know what Intel is using for the Linux kernel but maybe it
is something based on patchworks? We had an action item to get in touch
with them and ask.

Of course, if we don't even have the test bandwidth to do releases,
testing un-reviewed patch series is not really a priority right now.
However, these tests wouldn't be done on OSSTest, they could be done
with the aforementioned GitLab infrastructure; they are independent.
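
As a very rough sketch of "recognise a series submission and run something in
response" against a Patchwork-style REST API (the endpoint, parameters and
payload fields below are assumptions about Patchwork 2.x and would need
checking against a real instance):

    # Poll a Patchwork instance for completed series and kick off a build.
    # The URL is a placeholder and the event payload shape is an assumption.
    import subprocess
    import requests

    PW = "https://patchwork.example.org/api"       # placeholder instance

    def poll_and_test():
        events = requests.get(f"{PW}/events/",
                              params={"category": "series-completed"}).json()
        for ev in events:
            series = ev["payload"]["series"]       # exact shape depends on version
            print("testing series", series["id"])
            # hypothetical helper that applies the series mbox and builds it
            subprocess.run(["./ci/apply-and-build.sh", series["mbox"]])

    poll_and_test()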

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 18:51     ` George Dunlap
  2018-07-05 19:00       ` Stefano Stabellini
@ 2018-07-05 19:02       ` Doug Goldstein
  2018-07-06  1:58         ` Doug Goldstein
  2018-07-06  2:54       ` Tamas K Lengyel
  2 siblings, 1 reply; 82+ messages in thread
From: Doug Goldstein @ 2018-07-05 19:02 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, Rich Persaud, committers, advisory-board, xen-devel

On Thu, Jul 05, 2018 at 06:51:16PM +0000, George Dunlap wrote:
> > 
> >> Again, there was a sense that some of the issues we are seeing could be solved if we had better 
> >> CI capability: in other words, some of the issues we were seeing could be resolved by
> >> * Better CI capability as suggested in the Release Cadence discussion
> >> * Improving some of the internal working practices of the security team
> >> * Before we commit to a change (such as improved batching), we should try them first informally. 
> >>   E.g. the security team could try and work towards more predictable dates for batches vs. a 
> >>   concrete process change
> > 
> > My feeling on CI is clear in this thread and other threads. But I think
> > what would help OSSTEST bottlenecks if we do better at separating up
> > different parts of the testing process into more parallel tasks that
> > also provide feedback to the contributor faster. I'll obviously never
> > suggest the GitHub/GitLab PR/MR model to a ML driven project because I
> > wouldn't survive the hate mail but there is something that those models
> > do provide.
> 
> FWIW we (IanJ, Wei, Roger, Anthony and I) just had a fairly extended discussion about this in our team meeting today, and everyone basically agreed that there are some things about the web-based PR model that are *really* nice:
> 
> 1. Effective tracking of submission state — open / assigned to a reviewer / merged / rejected
> 2. Automation 
> 3. Not having to marshal git commits into email, and then marshal them back into git commits again
> 
> On the other hand, the general consensus, from people who had used such websites “in anger” (as they say here in the UK) was that they really didn’t like the way that reviews worked.  Email was seen as:
> 1. Much more convenient for giving feedback and having discussions
> 2. Easier for people to “listen in” on other people’s reviews
> 3. More accessible for posterity
> 
> In the end we generally agreed that it was an idea worth thinking about more.  Not sure how the wider community feels, but there are at least a decent cohort who wouldn’t send you hate mail. :-)
> 
>  -George

I guess my point is "no one should think that I'm suggesting the web PR model,
so please don't fire off the email cannons!". But I will say there are
some nice things about the model, like you mentioned. I'm wondering if we
could somehow implement something to get the best of both worlds, if that
makes sense. That's what I'm hoping to do with GitLab, but I haven't had
the cycles to dive deeply into it.

--
Doug

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 17:11                             ` Ian Jackson
  2018-07-05 17:20                               ` Sander Eikelenboom
@ 2018-07-05 22:47                               ` Sander Eikelenboom
  2018-07-06  8:09                                 ` Sander Eikelenboom
  1 sibling, 1 reply; 82+ messages in thread
From: Sander Eikelenboom @ 2018-07-05 22:47 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, George Dunlap, Rich Persaud, committers,
	xen-devel, Matt Spencer

On 05/07/18 19:11, Ian Jackson wrote:
> Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> Just wondering, are there any timing statistics kept for the OSStest
>> flights (and separate for building the various components and running
>> the individual tests ?). Or should they be parse-able from the logs kept ?
> 
> Yes.  The database has a started and stopped time_t for each test
> step.  That's where I got the ~15 mins number from.
> 
> Ian.
> 

Hi Ian,

Since the current OSStest emails give a 404 on the link to the logs,
I dug through the archives and found the right URL:
    http://logs.test-lab.xenproject.org/osstest/logs/

I took the liberty of browsing through some of the flights, trying to get a grasp on how
to interpret the numbers.

Let's take an example: http://logs.test-lab.xenproject.org/osstest/logs/124946/
Started:	2018-07-03 13:08:06 Z
Finished:	2018-07-05 06:08:54 Z

That is quite some time ...

Now if i take an example job/test say: http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl/info.html

I see:
- step 2 hosts-allocate takes 20012 seconds
  which, if I interpret it right, indicates a lot of time waiting before actually having a slot available to run,
  so that seems to indicate at least a capacity problem in the infrastructure.
- Step 3 seems to be the elapsed time while syslog recorded all the steps thereafter.
  It's 2639 seconds, while the rest of the steps remaining give a sum of 2630, so that seems about right.

  All the other steps together take 2630 seconds, so the run to wait ratio is about 1/7 ....
  For the remainder let's keep the waiting out of the equation, under the assumption that if we can reduce the rest, 
  we reduce the load on the infrastructure and reduce the waiting time as well.
 
- step 4 host-install(4) takes 1005 seconds
  It seems step 4 is the step you referred to with the 15 minutes (it's indeed around 15 minutes)?
  That is around 38% of the time of all the steps (excluding the waiting from step 2)!

- step 10 debian-install which seems to be the guest install, seems modest with 288 seconds.

I also browsed some other tests and flights, and at first sight they seem to show the same pattern.

So (sometimes) a lot of time is spent on waiting for a slot, followed by doing the host install.

So any improvement in the latter will probably reap a double benefit by also reducing the wait time!


When i look at job/test: http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl-qemuu-win10-i386/info.html

I see:
- step 2 hosts-allocate: 47116 seconds.
- step 3 syslog-server: 8191 seconds.
- step 4 host-install(4): 789 seconds, somewhat shorter than the other job/test.
- step 10 windows-install 7061 seconds, but a failing windows 10 guest install dwarfs them all...


When i look at job/test: http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl-qemuu-win7-amd64/info.html

I see:
- step 2 hosts-allocate: 13272 seconds.
- step 3 syslog-server: 2985 seconds.
- step 4 host-install(4): 675 seconds, even somewhat shorter than both the other job/tests.
- step 10 windows-install 1029 seconds, that's a lot better than the failing windows 10 install from the other job.

So running the windows install is currently a black box with a timeout of 7000 seconds.
If it fails, the total runtime of the job/test is around 8000 seconds, which is over 2 hours!

Which we do 4 times: 
- test-amd64-amd64-xl-qemut-win10-i386
- test-amd64-i386-xl-qemut-win10-i386
- test-amd64-amd64-xl-qemuu-win10-i386
- test-amd64-i386-xl-qemuu-win10-i386

Which all seem to result in a "10. windows-install" -> "fail never pass".
I sincerely *hope* I'm not interpreting this correctly... but are we wasting 4 * 2 hours = 8 hours in a flight
on jobs/tests that have *never ever* passed (and probably never will, miracles or a specific bugfix excluded)?

Would it be an idea to run "fail never pass" install steps only every once in a while (they can't be blockers anyway?),
if at all (only re-enabling them manually after a fix?). If my interpretation is right, this seems to be quite low-hanging fruit.
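
A quick sanity check of the waste, and the kind of skip rule I mean (the
numbers are just the ones read off the logs above, and the history format is
invented for the example):

    # Time per flight going into install steps that have never passed, and
    # a filter that would skip such jobs until they are re-enabled manually.
    win10_jobs = 4                  # qemut/qemuu x amd64/i386
    per_job_seconds = 8000          # observed runtime when the install times out
    print(f"never-pass win10 installs: ~{win10_jobs * per_job_seconds / 3600:.1f} "
          f"hours per flight")

    def should_run(job, history, force=False):
        # Run the job if its install step has ever passed, or if someone
        # explicitly re-enabled it after a fix.
        return force or any(history)

    print(should_run("test-amd64-amd64-xl-qemuu-win10-i386",
                     history=[False] * 30))        # -> False, skip it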

--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 19:02       ` Doug Goldstein
@ 2018-07-06  1:58         ` Doug Goldstein
  2018-07-06  8:15           ` Roger Pau Monné
  0 siblings, 1 reply; 82+ messages in thread
From: Doug Goldstein @ 2018-07-06  1:58 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, Rich Persaud, committers, advisory-board, xen-devel

On Thu, Jul 05, 2018 at 02:02:33PM -0500, Doug Goldstein wrote:
> On Thu, Jul 05, 2018 at 06:51:16PM +0000, George Dunlap wrote:
> > > 
> > >> Again, there was a sense that some of the issues we are seeing could be solved if we had better 
> > >> CI capability: in other words, some of the issues we were seeing could be resolved by
> > >> * Better CI capability as suggested in the Release Cadence discussion
> > >> * Improving some of the internal working practices of the security team
> > >> * Before we commit to a change (such as improved batching), we should try them first informally. 
> > >>   E.g. the security team could try and work towards more predictable dates for batches vs. a 
> > >>   concrete process change
> > > 
> > > My feeling on CI is clear in this thread and other threads. But I think
> > > what would help OSSTEST bottlenecks if we do better at separating up
> > > different parts of the testing process into more parallel tasks that
> > > also provide feedback to the contributor faster. I'll obviously never
> > > suggest the GitHub/GitLab PR/MR model to a ML driven project because I
> > > wouldn't survive the hate mail but there is something that those models
> > > do provide.
> > 
> > FWIW we (IanJ, Wei, Roger, Anthony and I) just had a fairly extended discussion about this in our team meeting today, and everyone basically agreed that there are some things about the web-based PR model that are *really* nice:
> > 
> > 1. Effective tracking of submission state — open / assigned to a reviewer / merged / rejected
> > 2. Automation 
> > 3. Not having to marshal git commits into email, and then marshal them back into git commits again
> > 
> > On the other hand, the general consensus, from people who had used such websites “in anger” (as they say here in the UK) was that they really didn’t like the way that reviews worked.  Email was seen as:
> > 1. Much more convenient for giving feedback and having discussions
> > 2. Easier for people to “listen in” on other people’s reviews
> > 3. More accessible for posterity
> > 
> > In the end we generally agreed that it was an idea worth thinking about more.  Not sure how the wider community feels, but there are at least a decent cohort who wouldn’t send you hate mail. :-)
> > 
> >  -George
> 
> I guess my point is "No one think that I'm suggesting the web PR model
> so please don't fire off the email cannons!". But I was say there are
> some nice things about the model like you mentioned. I'm wondering if we
> could somehow implement something to get the best of both worlds if that
> makes sense. That's what I'm hoping to do with GitLab but I haven't had
> the cycles to dive deeply into it.
> 
> --
> Doug

I'll also mention I personally feel less comfortable reviewing things
on the mailing list. I review and read through most patches but I don't
comment on them because I'm not necessarily confident enough to add my
R-b to them. I'm not sure if others feel this way.

--
Doug

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 18:51     ` George Dunlap
  2018-07-05 19:00       ` Stefano Stabellini
  2018-07-05 19:02       ` Doug Goldstein
@ 2018-07-06  2:54       ` Tamas K Lengyel
  2018-07-11 14:06         ` Rich Persaud
  2 siblings, 1 reply; 82+ messages in thread
From: Tamas K Lengyel @ 2018-07-06  2:54 UTC (permalink / raw)
  To: George Dunlap
  Cc: Lars Kurth, advisory-board, cardoe, Rich Persaud, committers, Xen-devel

On Thu, Jul 5, 2018 at 12:52 PM George Dunlap <George.Dunlap@citrix.com> wrote:
>
> >
> >> Again, there was a sense that some of the issues we are seeing could be solved if we had better
> >> CI capability: in other words, some of the issues we were seeing could be resolved by
> >> * Better CI capability as suggested in the Release Cadence discussion
> >> * Improving some of the internal working practices of the security team
> >> * Before we commit to a change (such as improved batching), we should try them first informally.
> >>   E.g. the security team could try and work towards more predictable dates for batches vs. a
> >>   concrete process change
> >
> > My feeling on CI is clear in this thread and other threads. But I think
> > what would help OSSTEST bottlenecks if we do better at separating up
> > different parts of the testing process into more parallel tasks that
> > also provide feedback to the contributor faster. I'll obviously never
> > suggest the GitHub/GitLab PR/MR model to a ML driven project because I
> > wouldn't survive the hate mail but there is something that those models
> > do provide.
>
> FWIW we (IanJ, Wei, Roger, Anthony and I) just had a fairly extended discussion about this in our team meeting today, and everyone basically agreed that there are some things about the web-based PR model that are *really* nice:
>
> 1. Effective tracking of submission state — open / assigned to a reviewer / merged / rejected
> 2. Automation
> 3. Not having to marshal git commits into email, and then marshal them back into git commits again
>
> On the other hand, the general consensus, from people who had used such websites “in anger” (as they say here in the UK) was that they really didn’t like the way that reviews worked.  Email was seen as:
> 1. Much more convenient for giving feedback and having discussions
> 2. Easier for people to “listen in” on other people’s reviews
> 3. More accessible for posterity
>
> In the end we generally agreed that it was an idea worth thinking about more.  Not sure how the wider community feels, but there are at least a decent cohort who wouldn’t send you hate mail. :-)

I for one would very much welcome a PR-style model. Keeping track of
patches in emails I need to review is not fun (and I'm pretty bad at
it), and then just to find a patch that doesn't even compile is a waste
of everyone's time. Automatic style checks and compile checks are the
bare minimum I would consider any project should have today. There is
already a Travis CI script shipped with Xen, yet it's not used when
patches are submitted. Perhaps the reviews are more accessible for
posterity, but I personally never end up reading old reviews; even in
the depths of the worst code archaeology it's always just looking at
git blame and commit messages. Giving feedback and having discussions I also
find a lot easier to navigate on, say, GitHub than on the
mailing list - and I do get email copies of PRs and can reply inline
via email if I want to. We are already keeping track of open patch
series on Jira - or at least there was an attempt to do so, not sure
how up-to-date that is - but that's not the right way, as that requires
manual porting of tasks from the mailing list. Perhaps it should be the
other way around.

Just my 2c.

Tamas
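
The automatic compile and style gate asked for here does not need to be
elaborate. A minimal sketch in Python, assuming a checked-out xen.git and a
patch file on disk (the commands and the "xen" make target are used for
illustration only; this is not an existing Xen project job), could look like:

    # Hypothetical pre-review smoke check: does the patch apply and build?
    import subprocess, sys

    def run(cmd, cwd="xen.git"):
        print("+", " ".join(cmd))
        return subprocess.run(cmd, cwd=cwd).returncode

    def smoke_check(patch_path):
        if run(["git", "apply", "--check", patch_path]):
            return "patch does not apply"
        run(["git", "apply", patch_path])
        if run(["make", "-j4", "xen"]):          # hypervisor-only build
            return "build failure"
        return None                              # looks sane, worth a human review

    if __name__ == "__main__":
        verdict = smoke_check(sys.argv[1])
        print("FAIL:" if verdict else "OK", verdict or "")

A gate of this shape, run from Travis or GitLab CI before a series reaches
reviewers, would catch the "doesn't even compile" case mentioned above.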

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 22:47                               ` Sander Eikelenboom
@ 2018-07-06  8:09                                 ` Sander Eikelenboom
  0 siblings, 0 replies; 82+ messages in thread
From: Sander Eikelenboom @ 2018-07-06  8:09 UTC (permalink / raw)
  To: Sander Eikelenboom, Ian Jackson
  Cc: Juergen Gross, Lars Kurth, Andrew Cooper, advisory-board,
	Doug Goldstein, George Dunlap, Rich Persaud, committers,
	xen-devel, Matt Spencer

On 06/07/18 00:47, Sander Eikelenboom wrote:
> On 05/07/18 19:11, Ian Jackson wrote:
>> Sander Eikelenboom writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>>> Just wondering, are there any timing statistics kept for the OSStest
>>> flights (and separate for building the various components and running
>>> the individual tests ?). Or should they be parse-able from the logs kept ?
>>
>> Yes.  The database has a started and stopped time_t for each test
>> step.  That's where I got the ~15 mins number from.
>>
>> Ian.
>>
> 
> Hi Ian,
> 
> Since the current OSStest emails give a 404 on the link to the logs,
> i digged in the archives and found the right url:
>     http://logs.test-lab.xenproject.org/osstest/logs/
> 
> I took the liberty to browse through some of the flights trying to get a grasp on how
> to interpret the numbers.
> 
> Let't take an example: http://logs.test-lab.xenproject.org/osstest/logs/124946/
> Started:	2018-07-03 13:08:06 Z
> Finished:	2018-07-05 06:08:54 Z
> 
> That is quite some time ...
> 
> Now if i take an example job/test say: http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl/info.html
> 
> I see:
> - step 2 hosts-allocate takes 20012 seconds
>   which if i interpret it right, indicates a lot of time waiting before actually having a slot available to run,
>   so that seems to be indicating at least a capacity problem on the infra structure.
> - Step 3 seems to be the elapsed time while syslog recorded all the steps thereafter.
>   It's 2639 seconds, while the rest of the steps remaining give a sum of 2630, so that seems about right.
> 
>   All the other steps together take 2630 seconds, so the run to wait ratio is about 1/7 ....
>   For the remainder let's keep the waiting out of the equation, under the assumption that if we can reduce the rest, 
>   we reduce the load on the infrastructure and reduce the waiting time as well.
>  
> - step 4 host-install(4) takes 1005 seconds
>   It seems step 4 is the step you referred to with the 15 minutes (it's indeed around 15 minutes) ?
>   That is around 38% percent of all the steps (excluding the waiting from step 2) !
> 
> - step 10 debian-install which seems to be the guest install, seems modest with 288 seconds.
> 
> I also browsed some other tests and flights and on first sight it does seem the give the same pattern.
> 
> So (sometimes), a lot of time is spent on waiting for a slot, followed by doing the host install. 
> 
> So any improvement in the later will probably reap a double benefit by also reducing the wait time !
> 
> 
> When i look at job/test: http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl-qemuu-win10-i386/info.html
> 
> I see:
> - step 2 hosts-allocate: 47116 seconds.
> - step 3 syslog-server: 8191 seconds.
> - step 4 host-install(4): 789 seconds, somewhat shorter than the other job/test.
> - step 10 windows-install 7061 seconds, but a failing windows 10 guest install dwarfs them all...
> 
> 
> When i look at job/test: http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl-qemuu-win7-amd64/info.html
> 
> I see:
> - step 2 hosts-allocate: 13272 seconds.
> - step 3 syslog-server: 2985 seconds.
> - step 4 host-install(4): 675 seconds, even somewhat shorter than both the other job/tests.
> - step 10 windows-install 1029 seconds, that's a lot better than the failing windows 10 install from the other job.
> 
> So running the windows install is currently a black box with a timeout of 7000 seconds.
> If it fails the total runtime of the job/test is around 8000 seconds which is almost 2 hours !
> 
> Which we do 4 times: 
> - test-amd64-amd64-xl-qemut-win10-i386
> - test-amd64-i386-xl-qemut-win10-i386
> - test-amd64-amd64-xl-qemuu-win10-i386
> - test-amd64-i386-xl-qemuu-win10-i386
> 
> Which all seem to result in a "10. windows-install" -> "fail never pass".
> I sincerely *hope* i'm not interpreting this correct .. but are we wasting 4 * 2 hours = 8 hours in a flight, 
> on a job/test that has *never ever* passed (and probably will never, miracles or a specific bugfix excluded) ?

This morning I had another look, and http://logs.test-lab.xenproject.org/osstest/logs/124940/test-amd64-amd64-xl-qemuu-win10-i386/fiano0_win.guest.osstest-vnc.jpeg
could indicate that Windows 10 has detected no NIC. Perhaps changing the emulated NIC type from the default Realtek 8139 to an Intel e1000 would be all it takes to make
the test succeed; it seems worth a try. Hopefully a successful Windows 10 install test will take significantly less time than the 2 hours of a failing one.

--
Sander


> 
> Would it be an idea to only test "fail never pass" on install steps only every once in a while (they can't be blockers anyway ?)
> if at all (only re-enable manually after fix?). If my interpretation is right this seems to be quite low hanging fruit.
> 
> --
> Sander
> 
> 
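
For what it's worth, the run-vs-wait arithmetic above is easy to reproduce.
A small Python sketch using the step durations quoted for
test-amd64-amd64-xl in flight 124940 (numbers hard-coded from the mail, not
fetched from the osstest database):

    # Step durations in seconds, as quoted above for flight 124940.
    wait = 20012                      # step 2, hosts-allocate
    host_install = 1005               # step 4, host-install(4)
    guest_install = 288               # step 10, debian-install
    total_work = 2630                 # sum of all steps except hosts-allocate

    print("work:wait ratio ~1:%.1f" % (wait / total_work))                     # ~1:7.6
    print("host-install share of work: %.0f%%" % (100.0 * host_install / total_work))   # ~38%
    print("guest-install share of work: %.0f%%" % (100.0 * guest_install / total_work))

which matches the ~1/7 ratio and ~38% host-install share worked out by hand,
and makes it easy to re-run the same sums for other jobs or flights.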


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06  1:58         ` Doug Goldstein
@ 2018-07-06  8:15           ` Roger Pau Monné
  0 siblings, 0 replies; 82+ messages in thread
From: Roger Pau Monné @ 2018-07-06  8:15 UTC (permalink / raw)
  To: Doug Goldstein
  Cc: Lars Kurth, advisory-board, George Dunlap, Rich Persaud,
	committers, xen-devel

On Thu, Jul 05, 2018 at 08:58:18PM -0500, Doug Goldstein wrote:
> On Thu, Jul 05, 2018 at 02:02:33PM -0500, Doug Goldstein wrote:
> > I guess my point is "No one think that I'm suggesting the web PR model
> > so please don't fire off the email cannons!". But I was say there are
> > some nice things about the model like you mentioned. I'm wondering if we
> > could somehow implement something to get the best of both worlds if that
> > makes sense. That's what I'm hoping to do with GitLab but I haven't had
> > the cycles to dive deeply into it.
> > 
> > --
> > Doug
> 
> I'll also mention I personally feel less comfortable reviewing things
> on the mailing list. I review and read through most patches but I don't
> comment on them because I'm not necessarily confident enough to add my
> R-b to them. I'm not sure if others feel this way.

I'm afraid I don't see how switching to a web PR review model is going
to change that. Confidence of the reviewer seems orthogonal to the
tool used to perform the reviews.

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 18:13           ` Doug Goldstein
@ 2018-07-06  8:32             ` Jan Beulich
  2018-07-06  8:44               ` Andrew Cooper
  2018-07-06 14:52               ` Doug Goldstein
  0 siblings, 2 replies; 82+ messages in thread
From: Jan Beulich @ 2018-07-06  8:32 UTC (permalink / raw)
  To: Doug Goldstein
  Cc: Juergen Gross, Lars Kurth, advisory-board, Rich Persaud,
	committers, xen-devel, Ian Jackson, Roger Pau Monne

>>> On 05.07.18 at 20:13, <cardoe@cardoe.com> wrote:
> On Thu, Jul 05, 2018 at 12:16:09PM +0100, Ian Jackson wrote:
>> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design 
> session] Process changes: is the 6 monthly release Cadence too short, 
> Security Process, ..."):
>> > We didn't look at the sporadic failing tests thoroughly enough. The
>> > hypercall buffer failure has been there for ages, a newer kernel just
>> > made it more probable. This would have saved us some weeks.
>> 
>> In general, as a community, we are very bad at this kind of thing.
>> 
>> In my experience, the development community is not really interested
>> in fixing bugs which aren't directly in their way.
>> 
>> You can observe this easily in the way that regression in Linux,
>> spotted by osstest, are handled.  Linux 4.9 has been broken for 43
>> days.  Linux mainline is broken too.
>> 
>> We do not have a team of people reading these test reports, and
>> chasing developers to fix them.  I certainly do not have time to do
>> this triage.  On trees where osstest failures do not block
>> development, things go unfixed for weeks, sometimes months.
> 
> Honestly this is where we need some kind of metrics with output that my
> 5-year old could decipher. The OSSTEST emails are large and overwhelming
> and require a bit of time commitment to digest the volume and amount of
> data.

I don't understand this: All that's really relevant in those mails for
an initial check is the topmost section "Tests which did not succeed
and are blocking". Everything beyond that requires looking into
one or more of the logs and auxiliary files linked to at the very top
of those mails.

> Jenkins uses weather icons to attempt to convey if this test is
> trending worse or better or successful or broken. If it fails but not
> every time and the amount of failures is increasing over time then its
> got storm clouds. If the amount of failures is decreasing there's a
> little bit of sun peaking out.
> 
> Just some kind of dashboard which would tell me what would provide the
> most value to drill into would likely go a long way. But again, this is
> just an assumption and could be a time waste.

I think every test failure warrants looking into. It is just the case that,
after having seen a certain "uninteresting" case a number of times, I
for instance draw further conclusions from that on later flight reports.
Maybe I shouldn't, but I also can't afford spending endless hours
looking at all the details of all the flights.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06  8:32             ` Jan Beulich
@ 2018-07-06  8:44               ` Andrew Cooper
  2018-07-06 14:03                 ` Ian Jackson
  2018-07-06 14:52               ` Doug Goldstein
  1 sibling, 1 reply; 82+ messages in thread
From: Andrew Cooper @ 2018-07-06  8:44 UTC (permalink / raw)
  To: Jan Beulich, Doug Goldstein
  Cc: Juergen Gross, Lars Kurth, advisory-board, Rich Persaud,
	committers, xen-devel, Ian Jackson, Roger Pau Monne

On 06/07/2018 09:32, Jan Beulich wrote:
>> Jenkins uses weather icons to attempt to convey if this test is
>> trending worse or better or successful or broken. If it fails but not
>> every time and the amount of failures is increasing over time then its
>> got storm clouds. If the amount of failures is decreasing there's a
>> little bit of sun peaking out.
>>
>> Just some kind of dashboard which would tell me what would provide the
>> most value to drill into would likely go a long way. But again, this is
>> just an assumption and could be a time waste.
> I think every test failure warrants looking into. It is just the case that
> after having seen a certain "uninteresting" case a number of times, I
> for instance make further implications from that on later flight reports.
> Maybe I shouldn't, but I also can't afford spending endless hours on
> looking all the details of all the flights.

The results of testing should be a single bit.  Yes or No.

No means that someone needs to investigate and get it back to saying Yes.

I've said this before, but categories like "fail never pass" and "fail
not blocking" only muddy the water and train people to get complacent
about the results.  (Also, fail never pass is a 100% waste of time running in
the first place, and this isn't the first time I've suggested that it is
the lowest hanging of the low hanging fruit...)

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-05 15:14               ` Ian Jackson
  2018-07-05 15:40                 ` Sander Eikelenboom
@ 2018-07-06  8:58                 ` Juergen Gross
  2018-07-06 14:08                   ` Ian Jackson
  1 sibling, 1 reply; 82+ messages in thread
From: Juergen Gross @ 2018-07-06  8:58 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lars Kurth, Andrew Cooper, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel, Matt Spencer

On 05/07/18 17:14, Ian Jackson wrote:
> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> Same applies to the host: the base system (without the to be tested
>> component like qemu, xen, or whatever) could be installed just by
>> cloning a disk/partition/logical volume.
> 
> Certainly it would be a bad idea to use anything *on the test host
> itself* as a basis for a subsequent test.  The previous test might
> have corrupted it.

Right. I'm not sure whether it's possible, but in an ideal environment we'd have
an external storage system reachable by the control nodes and the test
systems. The test systems should be able to access their test data only,
while the control nodes would initialize the test data while the related
test machine is still offline.

>> Each image would run through the stages new->staging->stable:
>>
>> - Each time a component is released an image is based on (e.g. a new
>>   mainline kernel) a new image is created by installing it. In case this
>>   succeeds, the image is moved to the staging area.
> 
> This would happen a lot more often than you seem to image.  "Releaed"
> here really means "is updated in its appropriate git branch".
> 
> Unless you think we should do our testing of Xen mainly with released
> versions of Linux stable branches (in which case, given how Linux
> stable branches are often broken, we might be long out of date), or
> our testing of Linux only with point releases of Xen, etc.

Yes, that's what I think.

The Xen (hypervisor, tools) tests should be done with released kernels
(either stable or the last one from upstream).

Tests of Linux Xen support should be done with released Xen versions.

> The current approach is mostly to take the most recent
> tested-and-working git commit from each of the inputs.  This aspect of
> osstest generally works well, I think.

We have a bandwidth problem already. If one unstable input product is
failing, all related tests do so, too. I'd rather know which of the
input sources is the most probable one to blame for a test failure.

Another aspect of using stable versions is the possibility of finding
even performance regressions automatically. Changing multiple versions
between tests makes that impossible, as you don't know which component is to blame.


Juergen
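
To make the image life cycle above concrete, a toy sketch (names and the
install/test callbacks are placeholders, not a proposal for actual osstest
code):

    # Toy model of the proposed image life cycle: new -> staging -> stable.
    # An image is promoted to "staging" once it installs cleanly, and to
    # "stable" once tests that use it have passed.
    class Image:
        def __init__(self, name):
            self.name, self.stage = name, "new"

    def promote(image, install_ok, tests_ok):
        if image.stage == "new" and install_ok(image):
            image.stage = "staging"
        elif image.stage == "staging" and tests_ok(image):
            image.stage = "stable"
        return image.stage

    img = Image("linux-4.17.y-release")           # placeholder name
    promote(img, lambda i: True, lambda i: True)  # -> staging
    promote(img, lambda i: True, lambda i: True)  # -> stable
    print(img.name, img.stage)

The point of the exercise is that only images which have reached "stable"
would be used as the fixed baseline for testing the other components, which
is what would make automatic performance comparisons between tests meaningful.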

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06  8:44               ` Andrew Cooper
@ 2018-07-06 14:03                 ` Ian Jackson
  2018-07-06 14:09                   ` Juergen Gross
  0 siblings, 1 reply; 82+ messages in thread
From: Ian Jackson @ 2018-07-06 14:03 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Juergen Gross, Lars Kurth, advisory-board, Doug Goldstein,
	Rich Persaud, committers, Jan Beulich, xen-devel,
	Roger Pau Monne

Andrew Cooper writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> On 06/07/2018 09:32, Jan Beulich wrote:
> > I think every test failure warrants looking into. It is just the case that
> > after having seen a certain "uninteresting" case a number of times, I
> > for instance make further implications from that on later flight reports.
> > Maybe I shouldn't, but I also can't afford spending endless hours on
> > looking all the details of all the flights.
> 
> The results of testing should be a single bit.  Yes or No.
> 
> No means that someone needs to investigate and get it back to saying Yes.

That one bit is "did we get a push".  It's in the Subject line.

> I've said this before, but categories like "fail never pass" and "fail
> not blocking" only muddy the water and train people to get complacent at
> the results.

These messages are clearly separated from the things one needs to care
about.

IMO the real difficulties we are having are (i) the blockers which are
not bugs in the code under test; (ii) the heisenbugs; and (iii) that
bugs are being detected in expensive (slow) osstest runs which could
have been detected earlier.

As for (i), we have discussed the various categories of this and what
we are doing about them, elsewhere in this thread.  (ii) is a bit more
complicated.  (iii) is addressed to a large extent by the patchwork
proposals.

> (Also, fail never pass is a 100% waste of time running in
> the first place, and this isn't the first time I've suggested that it is
> the lowest hanging of the low hanging fruit...)

I disagree entirely.  In general, these consume negligible resources,
and never block pushes.  Allowing the existence of this category (and
also "fail pass in NNNN") means that it is possible to develop tests
for a feature in parallel with the feature, and greatly reduces the
amount of sequencing between committing to various trees.

"fail not blocking" is obviously an essential category.  If a
particular thing is unreliable, it needs to be stopped from blocking
tests.

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06  8:58                 ` Juergen Gross
@ 2018-07-06 14:08                   ` Ian Jackson
  2018-07-06 14:17                     ` Juergen Gross
  0 siblings, 1 reply; 82+ messages in thread
From: Ian Jackson @ 2018-07-06 14:08 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Lars Kurth, Andrew Cooper, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel, Matt Spencer

Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> On 05/07/18 17:14, Ian Jackson wrote:
> > Certainly it would be a bad idea to use anything *on the test host
> > itself* as a basis for a subsequent test.  The previous test might
> > have corrupted it.
> 
> Right. Not sure, whether possible, but in an ideal environment we'd have
> an external storage system reachable by the control nodes and the test
> systems. The test systems should be able to access their test data only,
> while the control nodes would initialize the test data while the related
> test machine is still offline.

That would mean that every test would run with the test host accessing
its primary OS storage via something like iSCSI.  That would be an
awful lot of extra complexity.

> > Unless you think we should do our testing of Xen mainly with released
> > versions of Linux stable branches (in which case, given how Linux
> > stable branches are often broken, we might be long out of date), or
> > our testing of Linux only with point releases of Xen, etc.
> 
> Yes, that's what I think.
> 
> The Xen (hypervisor, tools) tests should be done with released kernels
> (either stable or the last one from upstream).
> 
> Tests of Linux Xen support should be done with released Xen versions.

The result of this is that a feature which requires support in
multiple trees could not be tested until at least one (and probably
several) of the relevant trees had been released.  Which would be
rather late to discover that it doesn't work.

> > The current approach is mostly to take the most recent
> > tested-and-working git commit from each of the inputs.  This aspect of
> > osstest generally works well, I think.
> 
> We have a bandwidth problem already. If one unstable input product is
> failing all related tests do so, too. I'd rather know which of the
> input sources is the most probable one to be blamed for a test failure.

I think you are conflating "released" with "tested".  In the current
osstest setup each test of xen-unstable#staging is done with *tested*
versions of all the other trees.

So barring heisenbugs, hardware problems, or whatever, blocking
failures will be due to changes in xen-unstable#staging.
(Host-specific failures might also slip through, but this is not very
likely.)

The problem is that we have too many "heisenbugs, hardware problems,
or whatever".

> Another aspect of using stable versions is the possibility to find even
> performance regressions automatically. Changing multiple versions
> between tests makes that impossible, as you don't know who is to blame.

osstest does not change multiple versions in that sense.  At each
stage it is testing a candidate version of some particular component,
with tested-and-passed versions of the other components.

The candidate component is what the "branch" is, in the Subject line.

Ian.
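
For readers less familiar with osstest, the selection rule described here
can be illustrated with a few lines of Python (the revisions are invented,
and this is not how osstest actually stores its state):

    # The branch under test gets its candidate revision; every other input
    # tree gets the revision that last passed and was pushed.
    last_pushed = {
        "xen": "a1b2c3d",
        "linux": "e4f5a6b",
        "qemu-upstream": "c7d8e9f",
    }

    def revisions_for_flight(branch_under_test, candidate_rev):
        revs = dict(last_pushed)
        revs[branch_under_test] = candidate_rev   # only one tree is "new"
        return revs

    print(revisions_for_flight("xen", "0123abc"))
    # xen at its candidate revision, linux and qemu at already-tested ones

So, barring heisenbugs and hardware trouble, a blocking failure points at
the one candidate tree.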

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06 14:03                 ` Ian Jackson
@ 2018-07-06 14:09                   ` Juergen Gross
  2018-07-06 14:26                     ` Ian Jackson
  0 siblings, 1 reply; 82+ messages in thread
From: Juergen Gross @ 2018-07-06 14:09 UTC (permalink / raw)
  To: Ian Jackson, Andrew Cooper
  Cc: Lars Kurth, advisory-board, Doug Goldstein, Rich Persaud,
	committers, Jan Beulich, xen-devel, Roger Pau Monne

On 06/07/18 16:03, Ian Jackson wrote:
> "fail not blocking" is obviously an essential category.  If a
> particular thing is unreliable, it needs to be stopped from blocking
> tests.

What is the value of such a test?

Either we say the tested functionality isn't mandatory, so a failure
should not block the release - we could just drop the test without
losing anything.

Or the test is wrong, so it should be either corrected or removed, but
letting it use scarce resources is questionable.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06 14:08                   ` Ian Jackson
@ 2018-07-06 14:17                     ` Juergen Gross
  2018-07-06 14:27                       ` Ian Jackson
  0 siblings, 1 reply; 82+ messages in thread
From: Juergen Gross @ 2018-07-06 14:17 UTC (permalink / raw)
  To: Ian Jackson
  Cc: Lars Kurth, Andrew Cooper, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel, Matt Spencer

On 06/07/18 16:08, Ian Jackson wrote:
> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
>> On 05/07/18 17:14, Ian Jackson wrote:
>>> Certainly it would be a bad idea to use anything *on the test host
>>> itself* as a basis for a subsequent test.  The previous test might
>>> have corrupted it.
>>
>> Right. Not sure, whether possible, but in an ideal environment we'd have
>> an external storage system reachable by the control nodes and the test
>> systems. The test systems should be able to access their test data only,
>> while the control nodes would initialize the test data while the related
>> test machine is still offline.
> 
> That would mean that every test would run with the test host accessing
> its primary OS storage via something like iSCSI.  That would be an
> awful lot of extra complexity.

iSCSI and NAS are not the only storage technologies available.

In my previous employment we used FC disk arrays for that purpose.

>>> Unless you think we should do our testing of Xen mainly with released
>>> versions of Linux stable branches (in which case, given how Linux
>>> stable branches are often broken, we might be long out of date), or
>>> our testing of Linux only with point releases of Xen, etc.
>>
>> Yes, that's what I think.
>>
>> The Xen (hypervisor, tools) tests should be done with released kernels
>> (either stable or the last one from upstream).
>>
>> Tests of Linux Xen support should be done with released Xen versions.
> 
> The result of this is that a feature which requires support in
> multiple trees could not be tested until at least one (and probably
> several) of the relevant trees had been released.  Which would be
> rather late to discover that it doesn't work.

For that purpose, special tests could be set up, e.g. by using a Linux
kernel tree based on a released kernel with just the needed patches on
top.

>>> The current approach is mostly to take the most recent
>>> tested-and-working git commit from each of the inputs.  This aspect of
>>> osstest generally works well, I think.
>>
>> We have a bandwidth problem already. If one unstable input product is
>> failing all related tests do so, too. I'd rather know which of the
>> input sources is the most probable one to be blamed for a test failure.
> 
> I think you are conflating "released" with "tested".  In the current
> osstest setup each test of xen-unstable#staging is done with *tested*
> versions of all the other trees.

And what about tests of the other trees? Do those only use
xen-unstable#master? If yes, I'm fine.

> So barring heisenbugs, hardware problems, or whatever, blocking
> failures will be due to changes in xen-unstable#staging.
> (Host-specific failures might also slip through, but this is not very
> likely.)
> 
> The problem is that we have too many "heisenbugs, hardware problems,
> or whatever".

Yes.


Juergen


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06 14:09                   ` Juergen Gross
@ 2018-07-06 14:26                     ` Ian Jackson
  0 siblings, 0 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-06 14:26 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Lars Kurth, Andrew Cooper, advisory-board, Doug Goldstein,
	Rich Persaud, committers, Jan Beulich, xen-devel,
	Roger Pau Monne

Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> On 06/07/18 16:03, Ian Jackson wrote:
> > "fail not blocking" is obviously an essential category.  If a
> > particular thing is unreliable, it needs to be stopped from blocking
> > tests.
> 
> What is the value of such a test?
> 
> Either we say the tested functionality isn't mandatory, so a failure
> should not block the release - we could just drop the test without
> losing anything.

Such a test exists in anticipation that the code will be fixed, and
start to pass.

When the test is a longstanding failure it represents a longstanding
deficiency.  Deleting the test removes any remaining pressure to fix
the deficiency.

> Or the test is wrong, so it should be either corrected or removed, but
> letting it use scarce resources is questionable.

Certainly if a whole job has been failing, for any significant period
of time, then this is indeed a waste of resources.  But that isn't
always the case.

Looking at the report from 124956, for example, the failing steps are
ones like these:

 test-armhf-armhf-libvirt-xsm 14 saverestore-support-check    fail  like 124566

   Dozens of these, and of the corresponding migrate support check.  1
   second each.  This is not a problem.  I do not intend to remove
   this.

 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 10 debian-hvm-install fail never pass

   Whole job is essentially a wipeout, so resources are indeed being
   "wasted".  But this is a new feature.  It depends on changes in
   Linux, Xen, qemu, and of course osstest.  Those pieces are on their
   way so it should start to pass soon, in at least some branches.

So those are OK I think.


Then we have this:

 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop            fail like 124789

   200 seconds each; this is more of a problem.

 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install         fail never pass

   7062 seconds each.  Four of these.

This is a much more serious problem.  Also, these tests take
inordinately long because Windows takes forever to install even when
it succeeds.  We do install tests because the churning that Windows
does when it installs does tend to reveal all sorts of
bizarrenesses.

But we have a problem with triage and minding these tests.  I know
very little about Windows.  You may remember me posting to xen-devel
asking for help debugging these Windows tests.  Such help has not
really been forthcoming; certainly not in the quantity needed.

I probably should have chased this up, and set a deadline for dropping
all Windows 10 testing.  You can see why that wouldn't be popular.


Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06 14:17                     ` Juergen Gross
@ 2018-07-06 14:27                       ` Ian Jackson
  0 siblings, 0 replies; 82+ messages in thread
From: Ian Jackson @ 2018-07-06 14:27 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Lars Kurth, Andrew Cooper, advisory-board, Doug Goldstein,
	George Dunlap, Rich Persaud, committers, xen-devel, Ian Jackson,
	Matt Spencer

Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> On 06/07/18 16:08, Ian Jackson wrote:
> > I think you are conflating "released" with "tested".  In the current
> > osstest setup each test of xen-unstable#staging is done with *tested*
> > versions of all the other trees.
> 
> And what about tests of the other trees? Do those only use
> xen-unstable#master? If yes, I'm fine.

Yes, exactly.  Good.

> > So barring heisenbugs, hardware problems, or whatever, blocking
> > failures will be due to changes in xen-unstable#staging.
> > (Host-specific failures might also slip through, but this is not very
> > likely.)
> > 
> > The problem is that we have too many "heisenbugs, hardware problems,
> > or whatever".
> 
> Yes.

Right.  I think that's what we need to work on.  I am trying, but
there is only one of me and I have other responsibilities too...

Ian.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06  8:32             ` Jan Beulich
  2018-07-06  8:44               ` Andrew Cooper
@ 2018-07-06 14:52               ` Doug Goldstein
  2018-07-06 15:09                 ` Ian Jackson
  1 sibling, 1 reply; 82+ messages in thread
From: Doug Goldstein @ 2018-07-06 14:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Juergen Gross, Lars Kurth, advisory-board, Rich Persaud,
	committers, xen-devel, Ian Jackson, Roger Pau Monne

On Fri, Jul 06, 2018 at 02:32:16AM -0600, Jan Beulich wrote:
> >>> On 05.07.18 at 20:13, <cardoe@cardoe.com> wrote:
> > On Thu, Jul 05, 2018 at 12:16:09PM +0100, Ian Jackson wrote:
> >> Juergen Gross writes ("Re: [Xen-devel] [Notes for xen summit 2018 design 
> > session] Process changes: is the 6 monthly release Cadence too short, 
> > Security Process, ..."):
> >> > We didn't look at the sporadic failing tests thoroughly enough. The
> >> > hypercall buffer failure has been there for ages, a newer kernel just
> >> > made it more probable. This would have saved us some weeks.
> >> 
> >> In general, as a community, we are very bad at this kind of thing.
> >> 
> >> In my experience, the development community is not really interested
> >> in fixing bugs which aren't directly in their way.
> >> 
> >> You can observe this easily in the way that regression in Linux,
> >> spotted by osstest, are handled.  Linux 4.9 has been broken for 43
> >> days.  Linux mainline is broken too.
> >> 
> >> We do not have a team of people reading these test reports, and
> >> chasing developers to fix them.  I certainly do not have time to do
> >> this triage.  On trees where osstest failures do not block
> >> development, things go unfixed for weeks, sometimes months.
> > 
> > Honestly this is where we need some kind of metrics with output that my
> > 5-year old could decipher. The OSSTEST emails are large and overwhelming
> > and require a bit of time commitment to digest the volume and amount of
> > data.
> 
> I don't understand this: All that's really relevant in those mails for
> an initial check is the top most section "Tests which did not succeed
> and are blocking". Everything further from that requires looking into
> one or more of the logs and auxiliary files linked to at the very top
> of those mails.
> 

My point is more about human nature. When people feel overwhelmed they
tend to shy away. The number of emails that people need to check on is
fairly high, so reducing it down into an easy summary would help get
more eyes on things. This is the reason we have cover letters for a
series of commits: folks would be overwhelmed if they had to explore
each one to see what the goal of the series was.

Also, one test flight email doesn't provide information on trends.

> > Jenkins uses weather icons to attempt to convey if this test is
> > trending worse or better or successful or broken. If it fails but not
> > every time and the amount of failures is increasing over time then its
> > got storm clouds. If the amount of failures is decreasing there's a
> > little bit of sun peaking out.
> > 
> > Just some kind of dashboard which would tell me what would provide the
> > most value to drill into would likely go a long way. But again, this is
> > just an assumption and could be a time waste.
> 
> I think every test failure warrants looking into. It is just the case that
> after having seen a certain "uninteresting" case a number of times, I
> for instance make further implications from that on later flight reports.
> Maybe I shouldn't, but I also can't afford spending endless hours on
> looking all the details of all the flights.
> 
> Jan

You effectively supported my point in the end. People value their time.
Giving them details about trends could help folks focus on the test
failures that are actually "interesting".

--
Doug
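
As a sketch of what such a summary could look like (the per-test history
below is invented, and the thresholds are arbitrary):

    # Jenkins-style "weather" for each test: map recent pass/fail history
    # to a coarse trend label that is quick to scan.
    history = {
        "test-amd64-amd64-xl":                  [1, 1, 1, 1, 1],   # 1 = pass
        "test-amd64-amd64-xl-qemuu-win10-i386": [0, 0, 0, 0, 0],
        "test-armhf-armhf-xl":                  [1, 0, 1, 1, 0],
    }

    def weather(results):
        rate = sum(results) / len(results)
        if rate == 1.0:
            return "sunny"
        if rate == 0.0:
            return "storm (never passes)"
        recent = sum(results[-2:]) / 2.0
        return "clearing up" if recent > rate else "clouding over"

    for test, results in sorted(history.items()):
        print("%-40s %s" % (test, weather(results)))

Something of this shape, generated per flight or per week, would give the
at-a-glance trend information that a single report email cannot.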

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06 14:52               ` Doug Goldstein
@ 2018-07-06 15:09                 ` Ian Jackson
  2018-07-06 16:42                   ` Lars Kurth
  0 siblings, 1 reply; 82+ messages in thread
From: Ian Jackson @ 2018-07-06 15:09 UTC (permalink / raw)
  To: Doug Goldstein
  Cc: Juergen Gross, Lars Kurth, advisory-board, Rich Persaud,
	committers, Jan Beulich, xen-devel, Roger Pau Monne

Doug Goldstein writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
> You effectively supported my point in the end. People value their time.
> Giving them details about trends could help folks to look at test
> failures that have "interesting" failures.

Your focus on the notion of "trend" here is interesting.

One way of looking at that is that really you are asking about change
over time.  Of course that's what the "regression" concept is in
osstest.

But that is for each individual test step.  When we have a heisenbug
which affects multiple tests, osstest is not really very good right now
at aggregating that information.

Maybe your "trend" idea is useful here.  osstest could perhaps track
the proportion of "heisen" failures somehow.  I will have to think
about how to do that.  Ie, exactly how to calculate the numerator and
denominator.

Do we want to track that only over flights that got pushes, or
something like that?  ISTM that, for example, if master==staging~2, and tests of
staging~1 were a wipeout because of a regression fixed in staging~0,
then when we report the "trend" in the test report of staging~0, that
should disregard the disaster that was the test(s) of staging~1.

And of course I should be asking a different question entirely if we
decide to move to multiple, rewinding, input branches, rather than a
single fast-forwarding one.

Ian.
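
One possible numerator/denominator, purely as a strawman (the flight data
below is invented): count, over a window of flights, the steps that both
passed and failed somewhere in that window, against all steps that failed
at least once.

    # Strawman "heisen" metric over a window of flights.
    flights = [
        {"guest-start": "pass", "guest-stop": "fail", "windows-install": "fail"},
        {"guest-start": "pass", "guest-stop": "pass", "windows-install": "fail"},
        {"guest-start": "fail", "guest-stop": "pass", "windows-install": "fail"},
    ]

    def heisen_proportion(flights):
        outcomes = {}
        for flight in flights:
            for step, result in flight.items():
                outcomes.setdefault(step, set()).add(result)
        failed = [s for s, o in outcomes.items() if "fail" in o]
        heisen = [s for s in failed if "pass" in outcomes[s]]
        return len(heisen), len(failed)

    num, den = heisen_proportion(flights)
    print("%d/%d failing steps look 'heisen'" % (num, den))   # 2/3 here

Whether the window should contain only flights that got a push is exactly
the open question above.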

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06 15:09                 ` Ian Jackson
@ 2018-07-06 16:42                   ` Lars Kurth
  2018-07-12  9:24                     ` Lars Kurth
  0 siblings, 1 reply; 82+ messages in thread
From: Lars Kurth @ 2018-07-06 16:42 UTC (permalink / raw)
  To: Ian Jackson, Doug Goldstein, Jan Beulich, Roger Pau Monne,
	Rich Persaud, xen-devel, Juergen Gross, committers,
	Sander Eikelenboom, Stefano Stabellini

Hi all, (I also moved the AB to BCC)

I summarized the discussion in https://docs.google.com/document/d/1W7OuISUau-FtPG6tIinD4GXYFb-hKDjaqTj84pogNrA/edit?usp=sharing 

I may have missed some things or misinterpreted them, but it looks as if consensus is emerging in some areas. I would like to discuss what we do for the 4.12 release at next week's community call. As far as I can see we have a few options:
* Go on as we are
* Move to 9 months, until we fixed the underlying issues - the problem is that unless we get some sort of commitment 
* Skip a release as a one-off: Set ourselves some goals that must be achieved in this cycle around testing - this will need some commitment from vendors

Regards
Lars

On 06/07/2018, 16:09, "Ian Jackson" <ian.jackson@citrix.com> wrote:

    Doug Goldstein writes ("Re: [Xen-devel] [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ..."):
    > You effectively supported my point in the end. People value their time.
    > Giving them details about trends could help folks to look at test
    > failures that have "interesting" failures.
    
    Your focuse on the notion of "trend" here is interesting.
    
    One way of looking at that is that really you are asking about change
    over time.  Of course that's what the "regression" concept is in
    osstest.
    
    But that is for each individual test step.  When we have a heisenbug
    which affects multiple tests, osstest is not really right now very
    good at aggregating that information.
    
    Maybe your "trend" idea is useful here.  osstest could perhaps track
    the proportion of "heisen" failures somehow.  I will have to think
    about how to do that.  Ie, exactly how to calculate the numerator and
    denominator.
    
    Do we want to track that over flights that got pushes, or something,
    only ?  ISTM that, for example, if master==staging~2, and tests of
    staging~1 were a wipeout because of a regression fixed in staging~0,
    then when we report the "trend" in the test report of staging~0, that
    should disregard the disaster that was the test(s) of staging~1.
    
    And of course I should be asking a different question entirely if we
    decide to move to multiple, rewinding, input branches, rather than a
    single fast-forwarding one.
    
    Ian.
    

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06  2:54       ` Tamas K Lengyel
@ 2018-07-11 14:06         ` Rich Persaud
  2018-07-11 15:12           ` Paul Durrant
  0 siblings, 1 reply; 82+ messages in thread
From: Rich Persaud @ 2018-07-11 14:06 UTC (permalink / raw)
  To: Xen-devel; +Cc: committers, Lars Kurth, cardoe, George Dunlap, Tamas K Lengyel

On Jul 5, 2018, at 22:54, Tamas K Lengyel <tamas.k.lengyel@gmail.com> wrote:
> 
>> On Thu, Jul 5, 2018 at 12:52 PM George Dunlap <George.Dunlap@citrix.com> wrote:
>> 
>>> 
>>>> Again, there was a sense that some of the issues we are seeing could be solved if we had better
>>>> CI capability: in other words, some of the issues we were seeing could be resolved by
>>>> * Better CI capability as suggested in the Release Cadence discussion
>>>> * Improving some of the internal working practices of the security team
>>>> * Before we commit to a change (such as improved batching), we should try them first informally.
>>>>  E.g. the security team could try and work towards more predictable dates for batches vs. a
>>>>  concrete process change
>>> 
>>> My feeling on CI is clear in this thread and other threads. But I think
>>> what would help OSSTEST bottlenecks if we do better at separating up
>>> different parts of the testing process into more parallel tasks that
>>> also provide feedback to the contributor faster. I'll obviously never
>>> suggest the GitHub/GitLab PR/MR model to a ML driven project because I
>>> wouldn't survive the hate mail but there is something that those models
>>> do provide.
>> 
>> FWIW we (IanJ, Wei, Roger, Anthony and I) just had a fairly extended discussion about this in our team meeting today, and everyone basically agreed that there are some things about the web-based PR model that are *really* nice:
>> 
>> 1. Effective tracking of submission state — open / assigned to a reviewer / merged / rejected
>> 2. Automation
>> 3. Not having to marshal git commits into email, and then marshal them back into git commits again
>> 
>> On the other hand, the general consensus, from people who had used such websites “in anger” (as they say here in the UK) was that they really didn’t like the way that reviews worked.  Email was seen as:
>> 1. Much more convenient for giving feedback and having discussions
>> 2. Easier for people to “listen in” on other people’s reviews
>> 3. More accessible for posterity
>> 
>> In the end we generally agreed that it was an idea worth thinking about more.  Not sure how the wider community feels, but there are at least a decent cohort who wouldn’t send you hate mail. :-)
> 
> I for one would very much welcome a PR-style model. Keeping track of
> patches in emails I need to review is not fun (and I'm pretty bad at
> it) and then just to find a patch that doesn't even compile is a waste
> of everyone's time. Automatic style checks and compile checks are the
> bare minimum I would consider any project should have today. There is
> already a Travis CI script shipped with Xen yet it's not used when
> patches are submitted.. Perhaps the reviews are more accessible for
> posterity but I personally never end up reading old reviews, even in
> the depths of the worst code archeology it's always just looking at
> git blame and commit messages. Giving feedback and discussions I also
> find a lot more easier to navigate on say Github then on the
> mailinglist - and I do get email copies of PRs and can reply inline
> via email if I want to.. We are already keeping track of open patch
> series on Jira - or at least there was an attempt to do so, not sure
> how up-to-date that is - but that's not the right way as that requires
> manual porting of tasks from the mailinglist. Perhaps it should be the
> other way around.
> 
> Just my 2c.
> 
> Tamas

OpenXT uses JIRA for issue tracking and Github for pull requests and approval workflow.  JIRA can link issues to PRs, based on the ticket number in the PR description.

Both JIRA and Github can mirror issue/PR comments and content to email (individual or mailing list).  Replies to these emails will be associated with issues/PRs, if the sender has an account on the service.

Would there be interest in testing a Gitlab/Github workflow in a Xen sub project, where contributors are already inclined to use such tools?  Windows PV drivers could be a candidate, as QubesOS uses Github PRs and the volume of changes is not high.

The value of these services is not just the metadata archive and structure that they bring to the dev workflow, but the ever-expanding ecosystem of analytics tools that can use the historical data.  Yes, in theory, similar data can be extracted from xen-devel archives, but it often requires custom tooling.  Case in point - the lack of accessible quantitative data to substantiate the intuitions expressed in this thread about the differences in dev patterns with 6-month vs. longer release cycles.

Rich
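
The issue/PR linking described here can be driven from either side. As an
illustration only (the repository name and the XEN-123 style ticket pattern
are assumptions, and this is not OpenXT's actual tooling), scanning open
GitHub pull requests for JIRA ticket references might look like:

    # Find JIRA-style ticket IDs mentioned in open pull request descriptions.
    import re
    import requests

    TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")      # e.g. XEN-123, OXT-42

    def tickets_in_open_prs(owner, repo):
        url = "https://api.github.com/repos/%s/%s/pulls?state=open" % (owner, repo)
        links = {}
        for pr in requests.get(url, timeout=30).json():
            found = TICKET.findall(pr.get("body") or "")
            if found:
                links[pr["number"]] = sorted(set(found))
        return links

    print(tickets_in_open_prs("OpenXT", "openxt"))       # {PR number: [tickets]}

It is this kind of structured, queryable history that the analytics tools
mentioned above build on.
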
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-11 14:06         ` Rich Persaud
@ 2018-07-11 15:12           ` Paul Durrant
  0 siblings, 0 replies; 82+ messages in thread
From: Paul Durrant @ 2018-07-11 15:12 UTC (permalink / raw)
  To: 'Rich Persaud', Xen-devel
  Cc: Lars Kurth, committers, George Dunlap, cardoe, Tamas K Lengyel

> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf
> Of Rich Persaud
> Sent: 11 July 2018 15:06
> To: Xen-devel <xen-devel@lists.xenproject.org>
> Cc: committers@xenproject.org; Lars Kurth <lars.kurth@citrix.com>;
> cardoe@cardoe.com; George Dunlap <George.Dunlap@citrix.com>; Tamas K
> Lengyel <tamas.k.lengyel@gmail.com>
> Subject: Re: [Xen-devel] [Notes for xen summit 2018 design session] Process
> changes: is the 6 monthly release Cadence too short, Security Process, ...
> 
> On Jul 5, 2018, at 22:54, Tamas K Lengyel <tamas.k.lengyel@gmail.com>
> wrote:
> >
> >> On Thu, Jul 5, 2018 at 12:52 PM George Dunlap
> <George.Dunlap@citrix.com> wrote:
> >>
> >>>
> >>>> Again, there was a sense that some of the issues we are seeing could
> be solved if we had better
> >>>> CI capability: in other words, some of the issues we were seeing could
> be resolved by
> >>>> * Better CI capability as suggested in the Release Cadence discussion
> >>>> * Improving some of the internal working practices of the security team
> >>>> * Before we commit to a change (such as improved batching), we
> should try them first informally.
> >>>>  E.g. the security team could try and work towards more predictable
> dates for batches vs. a
> >>>>  concrete process change
> >>>
> >>> My feeling on CI is clear in this thread and other threads. But I think
> >>> what would help OSSTEST bottlenecks if we do better at separating up
> >>> different parts of the testing process into more parallel tasks that
> >>> also provide feedback to the contributor faster. I'll obviously never
> >>> suggest the GitHub/GitLab PR/MR model to a ML driven project because
> I
> >>> wouldn't survive the hate mail but there is something that those models
> >>> do provide.
> >>
> >> FWIW we (IanJ, Wei, Roger, Anthony and I) just had a fairly extended
> discussion about this in our team meeting today, and everyone basically
> agreed that there are some things about the web-based PR model that are
> *really* nice:
> >>
> >> 1. Effective tracking of submission state — open / assigned to a reviewer /
> merged / rejected
> >> 2. Automation
> >> 3. Not having to marshal git commits into email, and then marshal them
> back into git commits again
> >>
> >> On the other hand, the general consensus, from people who had used
> such websites “in anger” (as they say here in the UK) was that they really
> didn’t like the way that reviews worked.  Email was seen as:
> >> 1. Much more convenient for giving feedback and having discussions
> >> 2. Easier for people to “listen in” on other people’s reviews
> >> 3. More accessible for posterity
> >>
> >> In the end we generally agreed that it was an idea worth thinking about
> more.  Not sure how the wider community feels, but there are at least a
> decent cohort who wouldn’t send you hate mail. :-)
> >
> > I for one would very much welcome a PR-style model. Keeping track of
> > patches in emails I need to review is not fun (and I'm pretty bad at
> > it) and then just to find a patch that doesn't even compile is a waste
> > of everyone's time. Automatic style checks and compile checks are the
> > bare minimum I would consider any project should have today. There is
> > already a Travis CI script shipped with Xen yet it's not used when
> > patches are submitted.. Perhaps the reviews are more accessible for
> > posterity but I personally never end up reading old reviews, even in
> > the depths of the worst code archeology it's always just looking at
> > git blame and commit messages. Giving feedback and discussions I also
> > find a lot more easier to navigate on say Github then on the
> > mailinglist - and I do get email copies of PRs and can reply inline
> > via email if I want to.. We are already keeping track of open patch
> > series on Jira - or at least there was an attempt to do so, not sure
> > how up-to-date that is - but that's not the right way as that requires
> > manual porting of tasks from the mailinglist. Perhaps it should be the
> > other way around.
> >
> > Just my 2c.
> >
> > Tamas
> 
> OpenXT uses JIRA for issue tracking and Github for pull requests and
> approval workflow.  JIRA can link  issues to PRs, based on ticket number in
> the PR description.
> 
> Both JIRA and Github can mirror  issue/PR comments and content to email
> (individual or mailing list).  Replies to these emails will be associated with
> issues/PRs, if the sender has an account on the service.
> 
> Would there be interest in testing a Gitlab/Github workflow in a Xen sub
> project, where contributors are already inclined to use such tools?   Windows
> PV drivers could be a candidate, as QubesOS uses Github PRs and the volume
> of changes is not high.
> 

Personally, I'm not a fan of web-based workflows. I think that mailing lists work much better for review: my experience of using web review tools has been that it is nearly impossible to comment on a patch as a whole, and when comments are mirrored to email they end up as some sort of digest in reverse chronological order. That said, pulling the final reviewed code from a branch is certainly much easier than applying patches from a mailbox.

  Paul

> The value of these services is not just the metadata archive and structure
> that it brings to dev workflow, but the ever-expanding ecosystem of
> analytics tools that can use the  historical data.  Yes, in theory, similar data can
> be extracted from xen-devel archives, but it often requires custom tooling.
> Case in point - the lack of accessible quantitative data to substantiate the
> intuitions expressed in this thread, about the differences in dev patterns
> with 6-month vs. longer release cycles.
> 
> Rich
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-06 16:42                   ` Lars Kurth
@ 2018-07-12  9:24                     ` Lars Kurth
  2018-07-19 15:39                       ` Juergen Gross
  0 siblings, 1 reply; 82+ messages in thread
From: Lars Kurth @ 2018-07-12  9:24 UTC (permalink / raw)
  To: Ian Jackson, Doug Goldstein, Jan Beulich, Roger Pau Monne,
	Rich Persaud, xen-devel, Juergen Gross, committers,
	Sander Eikelenboom, Stefano Stabellini



On 06/07/2018, 17:42, "Lars Kurth" <lars.kurth@citrix.com> wrote:

    Hi all, (I also moved the AB to BCC)
    
    I summarized the discussion in https://docs.google.com/document/d/1W7OuISUau-FtPG6tIinD4GXYFb-hKDjaqTj84pogNrA/edit?usp=sharing 
    
    I may have missed some things or misinterpreted them, but it looks as if consensus is emerging in some areas. I would like to discuss what we do for the 4.12 release at next week's community call. As far as I can see we have a few options:
    * Go on as we are
    * Move to 9 months, until we fixed the underlying issues - the problem is that unless we get some sort of commitment 
    * Skip a release as a one-off: Set ourselves some goals that must be achieved in this cycle around testing - this will need some commitment from vendors
    
    Regards
    Lars
    
That discussion took place yesterday, with some people who were not at the design session attending, but not everyone on this list. Thus, I am copying the notes in here as well (and into the google doc), so that everything is in one place.

Juergen: raises the point that keeping the release cadence at 6 months is very unfair on Jan,
who has raised many times that the workload resulting from having to maintain so many
release branches would be too high. After running 6-monthly releases for some time, this
has in fact come true, even though Jan's concerns were dismissed at the time. The overhead
breaks down into backporting fixes, backporting security fixes and dealing with the release
mechanics.

Jan: raised the point that hardly anyone responds to calls for backports, and those who do
only send change-set references, leaving Jan to do the actual backporting. Jan also suspects
that people may not respond to backport requests precisely because that would require them
to do the backporting themselves.

George: points out that he can only flag a patch as backport-worthy if he remembers to
consider this at the time he writes or reviews it.

George and Andrew raised the idea that we could maintain a list of pending backports and
assign backport tasks to people.
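
As a rough illustration of how such a pending-backports list could be generated rather than curated by hand, the sketch below combines `git cherry` (patch-equivalence between branches) with `Fixes:` tags. The branch names and the assumption that `Fixes:` tags are used consistently are mine for the sake of the example, not a statement of current Xen practice.

```python
#!/usr/bin/env python3
"""Sketch: derive a pending-backports list from git metadata.
Branch names, the reliance on Fixes: tags and the patch-equivalence check
are assumptions for illustration, not a description of current Xen practice."""
import re
import subprocess

MASTER = "origin/master"         # assumed development branch
STABLE = "origin/stable-4.11"    # assumed maintained release branch

def git(*args):
    return subprocess.run(["git", *args], capture_output=True,
                          text=True, check=True).stdout

def fixes_on(branch, since="1 year ago"):
    """Yield (commit, fixed_commit) pairs for commits carrying a Fixes: tag."""
    log = git("log", f"--since={since}", "--format=%H%x00%B%x01", branch)
    for entry in log.split("\x01"):
        if "\x00" not in entry:
            continue
        sha, body = entry.split("\x00", 1)
        for match in re.finditer(r"^Fixes:\s*([0-9a-f]{8,40})", body, re.M):
            yield sha.strip(), match.group(1)

def is_ancestor(sha, branch):
    return subprocess.run(["git", "merge-base", "--is-ancestor", sha, branch],
                          capture_output=True).returncode == 0

def pending_backports():
    # Commits on MASTER with no patch-equivalent commit on STABLE.
    missing = {line[2:].strip()
               for line in git("cherry", STABLE, MASTER).splitlines()
               if line.startswith("+ ")}
    for fix, broken in fixes_on(MASTER):
        # The bug is present on the stable branch, but the fix is not.
        if fix in missing and is_ancestor(broken, STABLE):
            yield fix

if __name__ == "__main__":
    for sha in pending_backports():
        print(sha[:12], git("log", "-1", "--format=%s", sha).strip())
```
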

Jan: having a single person maintain all the release trees is the most efficient way of doing
it, but then we need to restrict the number of trees, and 2 releases per year are too many.

Andrew: suggests that an even/odd release model with different support cycles would solve
this. By doing this, we would retain the discipline of doing releases.

Juergen: this would, however, still impose the release overhead.

Andrew: agrees that we need to reduce our release overhead regardless, but this issue is
orthogonal from the release cadence.

**To stay at 6 months we would have to find someone willing to carry the maintenance
load; otherwise we should move to a longer cadence. We also need to make it clear that
reducing the release overhead is independent of release cadence and process: we should be
doing this irrespective of the cadence.**

Juergen: We could **look at 8 months (instead of 9): it is better from a scheduling
perspective (working around public holidays).** With an 8-month release cycle, the release
occurs at only 3 different dates in the calendar year, rather than the 4 dates of a 9-month
cycle, which makes it easier to select dates that avoid public holidays. 8 months is also
closer to the 6-month cycle for those preferring a shorter cadence, and an 8-month cycle
would not increase the number of concurrently supported branches compared with a
9-month cycle.
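
The "3 dates rather than 4" observation is just modular arithmetic: the pattern of release dates repeats after lcm(cadence, 12) months, giving lcm(cadence, 12) / cadence distinct calendar dates. A quick sketch, with an arbitrary March starting point as the only assumption:

```python
from math import gcd

def release_months(cadence, start_month=3):
    """Distinct calendar months hit by a fixed release cadence.
    The March (month 3) starting point is an arbitrary assumption."""
    period = cadence * 12 // gcd(cadence, 12)          # months until the dates repeat
    return sorted({(start_month - 1 + i * cadence) % 12 + 1
                   for i in range(period // cadence)})

for cadence in (6, 8, 9):
    print(f"{cadence}-month cadence -> release months {release_months(cadence)}")
# 6-month cadence -> release months [3, 9]         (2 per year)
# 8-month cadence -> release months [3, 7, 11]     (3 dates, repeating every 2 years)
# 9-month cadence -> release months [3, 6, 9, 12]  (4 dates, repeating every 3 years)
```
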

**ACTION: George will put together a survey for the committers outlining the issue and
trade-offs and then go from there** 


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-12  9:24                     ` Lars Kurth
@ 2018-07-19 15:39                       ` Juergen Gross
  2018-07-19 17:44                         ` Lars Kurth
  0 siblings, 1 reply; 82+ messages in thread
From: Juergen Gross @ 2018-07-19 15:39 UTC (permalink / raw)
  To: Lars Kurth, Ian Jackson, Doug Goldstein, Jan Beulich,
	Roger Pau Monne, Rich Persaud, xen-devel, committers,
	Sander Eikelenboom, Stefano Stabellini

On 12/07/18 11:24, Lars Kurth wrote:
> 
> 
> On 06/07/2018, 17:42, "Lars Kurth" <lars.kurth@citrix.com> wrote:
> 
>     Hi all, (I also moved the AB to BCC)
>     
>     I summarized the discussion in https://docs.google.com/document/d/1W7OuISUau-FtPG6tIinD4GXYFb-hKDjaqTj84pogNrA/edit?usp=sharing 
>     
>     I may have missed some things or misinterpreted them, but it looks as if consensus is emerging in some areas. I would like to discuss what we do for the 4.12 release at next week's community call. As far as I can see we have a few options:
>     * Go on as we are
>     * Move to 9 months until we have fixed the underlying issues - the problem is that, unless we get some sort of commitment to fix them, they may never get fixed
>     * Skip a release as a one-off: set ourselves some goals around testing that must be achieved in this cycle - this will need some commitment from vendors
>     
>     Regards
>     Lars
>     
> That discussion took place yesterday, with some people who were not at the design session attending, but not everyone on this list. Thus, I am copying the notes in here as well (and into the google doc), so that everything is in one place.
> 
> Juergen: raises the point that keeping the release cadence at 6 months is very unfair on Jan,
> who has raised many times that the workload resulting from having to maintain so many
> release branches would be too high. After running 6-monthly releases for some time, this
> has in fact come true, even though Jan's concerns were dismissed at the time. The overhead
> breaks down into backporting fixes, backporting security fixes and dealing with the release
> mechanics.
> 
> Jan: raised the point that hardly anyone responds to calls for backports, and those who do
> only send change-set references, leaving Jan to do the actual backporting. Jan also suspects
> that people may not respond to backport requests precisely because that would require them
> to do the backporting themselves.
> 
> George: points out that he can only flag a patch as backport-worthy if he remembers to
> consider this at the time he writes or reviews it.
> 
> George and Andrew raised the idea that we could maintain a list of pending backports and
> assign backport tasks to people.
> 
> Jan: having a single person maintain all the release trees is the most efficient way of doing
> it, but then we need to restrict the number of trees, and 2 releases per year are too many.
> 
> Andrew: suggests that an even/odd release model with different support cycles would solve
> this. By doing this, we would retain the discipline of doing releases.
> 
> Juergen: this would, however, still impose the release overhead.
> 
> Andrew: agrees that we need to reduce our release overhead regardless, but this issue is
> orthogonal from the release cadence.
> 
> **To stay at 6 months we would have to find someone willing to carry the maintenance
> load; otherwise we should move to a longer cadence. We also need to make it clear that
> reducing the release overhead is independent of release cadence and process: we should be
> doing this irrespective of the cadence.**
> 
> Juergen: We could **look at 8 months (instead of 9): it is better from a scheduling
> perspective (working around public holidays).** With an 8-month release cycle, the release
> occurs at only 3 different dates in the calendar year, rather than the 4 dates of a 9-month
> cycle, which makes it easier to select dates that avoid public holidays. 8 months is also
> closer to the 6-month cycle for those preferring a shorter cadence, and an 8-month cycle
> would not increase the number of concurrently supported branches compared with a
> 9-month cycle.
> 
> **ACTION: George will put together a survey for the committers outlining the issue and
> trade-offs and then go from there** 
> 

Ping? Anything new? I'd like to know the dates for 4.12...


Juergen


* Re: [Notes for xen summit 2018 design session] Process changes: is the 6 monthly release Cadence too short, Security Process, ...
  2018-07-19 15:39                       ` Juergen Gross
@ 2018-07-19 17:44                         ` Lars Kurth
  0 siblings, 0 replies; 82+ messages in thread
From: Lars Kurth @ 2018-07-19 17:44 UTC (permalink / raw)
  To: Juergen Gross, Ian Jackson, Doug Goldstein, Jan Beulich,
	Roger Pau Monne, Rich Persaud, xen-devel, committers,
	Sander Eikelenboom, Stefano Stabellini, George Dunlap



On 19/07/2018, 08:40, "Juergen Gross" <jgross@suse.com> wrote:

    > **ACTION: George will put together a survey for the committers outlining the issue and
    > trade-offs and then go from there** 
    > 
    
    Ping? Anything new? I'd like to know the dates for 4.12...
    
Adding George: he was doing the 5-point survey and has been travelling, but will be back as of tomorrow. I think we have a way forward, but I will let George reply.
Lars
    

