* [lustre-devel] Should we have fewer releases?
       [not found]   ` <D25FC1DB.89C48%spitzcor@cray.com>
@ 2015-11-05 21:45     ` Christopher J. Morrone
  2015-11-06 13:53       ` DEGREMONT Aurelien
  2015-11-06 14:39       ` Drokin, Oleg
  0 siblings, 2 replies; 10+ messages in thread
From: Christopher J. Morrone @ 2015-11-05 21:45 UTC (permalink / raw)
  To: lustre-devel

Hi,

I think that Cory meant to send his message to this list.  Please read 
his comment at the end before reading my reply here.

The notes summarize Peter Jones as saying that how long releases take 
seems to depend on how much change was introduced into the tree.  I 
agree; this is a causal relationship.

If our six-month releases are often late and end up taking 7-9 months, 
then I believe that planned nine-month releases will in actuality take 
12+ months.

It may not be the current advocate's reason for suggesting the longer 
release cycle, but one argument I have heard many times is that a longer 
cycle will reduce the amount of manpower needed to create releases.  I 
don't think that is substantially true.  While there are some fixed 
costs in creating a release, there is no real reason that those fixed 
costs need be a dominant factor for manpower demands.  On the other 
hand, required manpower is almost always going to be strongly 
proportional to, and dominated by, the amount of change we introduce.

If we perform excellent, in-depth reviews on all code changes and we 
also perform strong testing throughout the development cycle, then the 
manpower centered around "release time" need not be very high.  But 
right now our peer reviews aren't quite as in depth as they could be, 
and community testing, while improving of late, is unpredictably applied 
and concentrated near the end of the cycle.  This guarantees a large and 
unpredictable amount of development effort shortly before the release 
date, often resulting in a missed release target.

So let's think about what happens if we extend the development cycle, 
including extending freeze dates.  Assuming only minor, gradual 
improvements in code reviews and continuous testing (a very safe 
assumption, I think), the amount of change introduced into the release 
will be proportionally higher the longer we leave the landing window 
open.  The greater the change, the larger the amount of effort needed to 
stabilize the code after the fact.

Furthermore, I would speculate that extending the release cycle and 
putting off the testing and stabilization effort will actually require a 
super-linear increase in the time for that effort.

Consider, for instance, that the longer we make the release cycle, the 
more likely it is that bug authors will have moved on to another task or 
project.
Since this is an open source project we don't have any way to order the 
bug author back to work on her code.  Even if the original author is 
available to work on the bug, she may need significant time to shift 
gears and remember how the code she touched works before she can make 
significant progress.  If the original author is not available, then 
someone else needs to learn that portion of code and that has even more 
obvious impact on time to solution and release.

I think there are also other effects that will conspire (e.g. unexpected 
change interactions) to make the testing and stabilization period grow 
super-linearly with the increase in the landing window.

Therefore, I would argue that lengthening the release cycle will neither 
reduce our manpower needs nor result in more predictable release dates.

On the contrary, we need to go in the opposite direction to achieve 
those goals.  We need to shorten the release cycle and have more 
frequent releases.  I would recommend that we move to a roughly 
three-month release cycle.  Some of the benefits might be:

* Less change will accumulate before the release
* The penalty for missing a release landing window is reduced when 
releases happen more often
* Code reviewers have less pressure to land unfinished and/or 
insufficiently reviewed and tested code when the penalty is reduced
* Less change means less to test and fix at release time
* Bug authors are more likely to still remember what they did and 
participate in cleanup.
* Less time before bugs that slip through the cracks appear in a major 
release
* Reduces developer frustration with long freeze windows
* Encourages developers to rally more frequently around the landing 
windows instead of falling into a long period of silence and then trying 
to shove a bunch of code in just before freeze.  (They'll still try to 
ram things in just before freeze, but with more frequent landing windows 
the amount will be smaller and more manageable.)

It was also mentioned in the LWG email that vendors believe that the 
open source releases need to adhere to an advertised schedule.  Having 
shorter release cycles with smaller and more manageable change will 
directly contribute to Lustre releases happening on a more regular schedule.

Those same vendors tend to be concerned that they will not be able to 
productise every single release if releases happen on a three-month 
schedule.  It is important to recognize that a vendor's product schedule 
need not be directly in sync with every community release.  It is 
actually quite common in the open source world for vendors to select a 
version to productise, and skip over some community releases to find the 
next version which they will productise.  Consider, for instance, the 
Linux kernel.  Red Hat selects a version of the kernel to include in 
RHEL and then sticks with that code base for many years.  They will 
backport changes as they see fit, but the base of that release remains 
the same.  The next kernel that they decide to package in their product 
will skip over many of the upstream Linux releases.

Some Lustre vendors already operate this way, and the ones that do not 
need to adapt to this common, successful open source model.

Shortening the release cycle will help encourage and sustain an active 
open source community of Lustre developers from a diverse set of 
organizations.

Conversely, lengthening the release cycle will result in less Lustre 
stability and encourage stagnation.  It will make us less nimble, less 
likely to meet the needs of our current user base, and slower to expand 
into new markets.

Let's start working through what process changes we will need to make 
to shorten the development cycle and make Lustre releases more often.

Thanks,
Chris

On 11/04/2015 01:16 PM, Cory Spitz wrote:
> Hello, Lustre developers.
>
> On today's OpenSFS LWG teleconference call (notes at
> http://wiki.opensfs.org/LWG_Minutes_2015-11-04) I proposed that we change
> the Lustre release cadence from six months to nine months.  Chris M.
> responded (below) that any discussion about development changes should
> happen here on lustre-devel.  I agree, developers need to be on-board.
>
> So what do you think about release changes?  What requirements do you
> have?  What issues would you have if OpenSFS changed the major release
> cadence to nine months?
>
> Thanks,
>
> -Cory
>
> On 11/4/15, 1:58 PM, "lwg on behalf of Christopher J. Morrone"
> <lwg-bounces at lists.opensfs.org on behalf of morrone2@llnl.gov> wrote:
>
>> On 11/04/2015 10:28 AM, Cory Spitz wrote:
>>
>>> Lustre release cadence
>>> We haven't been good about hitting our 6 month schedules
>>> Cory proposed a 9 month cadence just to recognize reality.  Certainly
>>> pros/cons to any scheme.  Should be up for discussion.  How/when to
>>> decide?
>>
>> Any development change like that needs to be discussed on lustre-devel.
>>
>> Chris
>>
>> _______________________________________________
>> lwg mailing list
>> lwg at lists.opensfs.org
>> http://lists.opensfs.org/listinfo.cgi/lwg-opensfs.org
>
> _______________________________________________
> lustre-devel mailing list
> lustre-devel at lists.opensfs.org
> http://lists.opensfs.org/listinfo.cgi/lustre-devel-opensfs.org
>


* [lustre-devel] Should we have fewer releases?
  2015-11-05 21:45     ` [lustre-devel] Should we have fewer releases? Christopher J. Morrone
@ 2015-11-06 13:53       ` DEGREMONT Aurelien
  2015-11-06 14:33         ` Rick Wagner
  2015-11-06 22:18         ` Christopher J. Morrone
  2015-11-06 14:39       ` Drokin, Oleg
  1 sibling, 2 replies; 10+ messages in thread
From: DEGREMONT Aurelien @ 2015-11-06 13:53 UTC (permalink / raw)
  To: lustre-devel

Hi

You're right about most of your comments.

However, you forgot one important thing: there are no more public 
maintenance releases.
Your theory is correct if Lustre releases get bugfix-only minor releases 
(i.e. 2.7.1, 2.7.2, ...) like they used to have.

HPC centers cannot upgrade their Lustre release very often. Usually they 
pick one and stick to it for a while, doing at most one or two major 
upgrades during the computer's life cycle (let's say 5 years).
The more Lustre releases there are, the more scattered the Lustre 
versions in production will be. As bugfix releases are no longer made, 
admins need to concentrate their efforts on fewer releases to benefit 
from the debugging and patches produced by others on the same Lustre 
release.

Regarding lengthening the release cycle, I clearly agree that having a 
longer landing window won't help at all. That means that lengthening the 
release cycle only means lengthening the code freeze window.


Aurélien



* [lustre-devel] Should we have fewer releases?
  2015-11-06 13:53       ` DEGREMONT Aurelien
@ 2015-11-06 14:33         ` Rick Wagner
  2015-11-06 22:18         ` Christopher J. Morrone
  1 sibling, 0 replies; 10+ messages in thread
From: Rick Wagner @ 2015-11-06 14:33 UTC (permalink / raw)
  To: lustre-devel

Aurélien,

Your point about the lack of bug fix releases is only partially correct. As part of their maintenance contract with OpenSFS, Intel HPDD has chosen to focus on feature releases. However, there is nothing preventing the community from managing intermediate releases. Even a curated list of recommended patches and a well-documented, reproducible build process would be a start.
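
Purely as a hypothetical sketch of what I mean (the file name, tag, and layout below are invented, and the recommended patches are assumed to be mbox files exported from Gerrit), something this small would already let sites rebuild the same curated tree:

#!/usr/bin/env python3
# Hypothetical sketch: apply a community-curated list of recommended
# patches on top of a tagged Lustre release so that every site can
# reproduce the same tree.  "curated-patches.txt" and the tag name are
# invented for illustration; patches are assumed to be mbox files.
import pathlib
import subprocess
import sys

def build_curated_tree(repo, base_tag, patch_list):
    def git(*args):
        subprocess.run(("git",) + args, cwd=repo, check=True)

    # Start a branch for the curated tree at the chosen base release.
    git("checkout", "-B", base_tag + "-curated", base_tag)

    for line in pathlib.Path(patch_list).read_text().splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments in the curated list
        patch = (pathlib.Path(patch_list).parent / entry).resolve()
        git("am", str(patch))  # apply each recommended fix in order

if __name__ == "__main__":
    # e.g.: python3 curate.py ~/lustre-release 2.7.0 curated-patches.txt
    build_curated_tree(sys.argv[1], sys.argv[2], sys.argv[3])

The build side could be pinned in the same way, but even the patch list alone would help sites converge on the same code.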

--Rick

> On Nov 6, 2015, at 5:53 AM, DEGREMONT Aurelien <aurelien.degremont@cea.fr> wrote:
> 
> Hi
> 
> You're right for most of your comments.
> 
> However, you forgot one important thing: there is no more public maintenance releases.
> Your theory is correct if Lustre releases get bugfix only minor releases (i.e: 2.7.1, 2.7.2, ...) like it uses to have.
> 
> HPC centers cannot upgrade Lustre release very often. Usually they pick one and stick to it for a while, doing at most one or two major upgrades during the computer life cycle (lets say 5 years).
> The more Lustre releases they will be, the more scattering of Lustre versions in production they will be. As there is no more bugfix release made, admins need to regroup their efforts on fewer releases to benefit from debugging and patches produced by others, on the same Lustre release.
> 
> Regarding lengthening the release cycle, I clearly agree that having longer landing window won't help at all. That means that lengthening release cycle only mean lengthening the code freeze window.
> 
> 
> Aurélien


* [lustre-devel] Should we have fewer releases?
  2015-11-05 21:45     ` [lustre-devel] Should we have fewer releases? Christopher J. Morrone
  2015-11-06 13:53       ` DEGREMONT Aurelien
@ 2015-11-06 14:39       ` Drokin, Oleg
  2015-11-06 22:08         ` Christopher J. Morrone
  1 sibling, 1 reply; 10+ messages in thread
From: Drokin, Oleg @ 2015-11-06 14:39 UTC (permalink / raw)
  To: lustre-devel

Hello!

On Nov 5, 2015, at 4:45 PM, Christopher J. Morrone wrote:
> On the contrary, we need to go in the opposite direction to achieve those goals.  We need to shorten the release cycle and have more frequent releases.  I would recommend that we move to to a roughly three month release cycle.  Some of the benefits might be:
> 
> * Less change and accumulate before the release
> * The penalty for missing a release landing window is reduced when releases are more often
> * Code reviewers have less pressure to land unfinished and/or insufficiently reviewed and tested code when the penalty is reduced
> * Less change means less to test and fix at release time
> * Bug authors are more likely to still remember what they did and participate in cleanup.
> * Less time before bugs that slip through the cracks appear in a major release
> * Reduces developer frustration with long freeze windows
> * Encourages developers to rally more frequently around the landing windows instead of falling into a long period of silence and then trying to shove a bunch of code in just before freeze.  (They'll still try to ram things in just before freeze, but with more frequent landing windows the amount will be smaller and more manageable.)

Bringing this to the logical extreme - we should just have one release per major feature.
Sadly, I think the stabilization process is not likely to get any shorter. Either that, or interested parties would only jump into testing when enough interesting features accumulate,
after which point there'd be a bunch of bug reports for the current feature plus the backlog that did not get any significant real-world testing before. We have seen this pattern
to some degree already even with current releases. The releases that are ignored by the community for one reason or another tend to be not very stable, and then the follow-on release
gets this "testing debt" baggage that is paid at release time once testing outside of Intel picks up the pace.

Bye,
    Oleg


* [lustre-devel] Should we have fewer releases?
  2015-11-06 14:39       ` Drokin, Oleg
@ 2015-11-06 22:08         ` Christopher J. Morrone
  2015-11-07  0:46           ` Mitchell Erblich
  2015-11-07  8:36           ` [lustre-devel] Should we have fewer releases? Drokin, Oleg
  0 siblings, 2 replies; 10+ messages in thread
From: Christopher J. Morrone @ 2015-11-06 22:08 UTC (permalink / raw)
  To: lustre-devel

On 11/06/2015 06:39 AM, Drokin, Oleg wrote:
> Hello!
>
> On Nov 5, 2015, at 4:45 PM, Christopher J. Morrone wrote:
>> On the contrary, we need to go in the opposite direction to achieve those goals.  We need to shorten the release cycle and have more frequent releases.  I would recommend that we move to to a roughly three month release cycle.  Some of the benefits might be:
>>
>> * Less change and accumulate before the release
>> * The penalty for missing a release landing window is reduced when releases are more often
>> * Code reviewers have less pressure to land unfinished and/or insufficiently reviewed and tested code when the penalty is reduced
>> * Less change means less to test and fix at release time
>> * Bug authors are more likely to still remember what they did and participate in cleanup.
>> * Less time before bugs that slip through the cracks appear in a major release
>> * Reduces developer frustration with long freeze windows
>> * Encourages developers to rally more frequently around the landing windows instead of falling into a long period of silence and then trying to shove a bunch of code in just before freeze.  (They'll still try to ram things in just before freeze, but with more frequent landing windows the amount will be smaller and more manageable.)
>
> Bringing this to the logical extreme - we should just have one release per major feature.

I do not agree that it is logical to extend the argument to that 
extreme.  That is the "Appeal to Extremes" logical fallacy.

I also don't think it is appropriate to conflate major releases with 
major features.  When/if we move to a shorter release cycle, it would be 
entirely appropriate to put out a major release with no headline "major 
features".  It is totally acceptable to release the many changes that 
did make it in the landing window.  Even if none of the changes 
individually count as "major", they still collectively represent a major 
amount of work.

Right now we combine that major amount of work with seriously 
destabilizing new features that more than offset all the bug fixing that 
went on.  Why do we insist on making those destabilizing influences a 
requirement for a release?

Whether a major feature makes it into any particular release should be 
judged primarily on the quality and completeness of code, testing, and 
documentation for said feature.  Further, how many major features can be 
landed in a release would be gated on the amount of manpower we have for 
review and testing.  If 3 major features are truly complete and ready 
to land, but we can only fully vet 1 in the landing window, well, only 
one will land.  We'll have to make a judgement call as a community on 
the priority and work on that.

In summary: I think we should decouple the concept of major releases and 
major features.  Major releases do not need to be subject to major features.

> Sadly, I think the stabilization process is not likely to get any shorter.

Do you not see a connection between the amount of change and the time 
it takes to stabilize that change?  Can you explain why you think that?

> Either that or interested parties would only jump into testing when enough of interesting features accumulate,
> after which point there'd be a bunch of bugreports for the current feature plus the backlocd that did not get any significant real-world testing before. We have seen this pattern
> to some degree already even with current releases.

The scary future you paint is no different than our present. 
Organizations like LLNL only move to new major releases every 18 months 
at the earliest, and we would really like to run the same version for 
more like three years in some cases.  We are too busy drowning in 
production Lustre issues half the time to get involved in testing except 
when it is something that is on our roadmap to put into production.  I 
don't think we're alone.  Even if it isn't Lustre issues, everyone has 
day jobs that keep us busy and time for testing things that don't look 
immediately relevant to upper management can be difficult to justify.

So I agree, many people already are skipping the testing of many 
releases and that will continue into the future.

Frankly, I think that relying on an open source community to do rigorous 
and systematic testing is foolhardy.  The only way that really works is 
if your user base is large in proportion to your code size and 
complexity.  I would estimate that Lustre is low in that ratio, while 
something like ZFS is probably medium to large, and Linux is large.

The testing you get from an open source community is going to be 
fairly random in terms of code coverage.  In order for the coverage to 
be reasonably complete, you need _a lot_ of people testing.

If we rely on voluntary, at-will community testing as our primary SQA 
enforcement method, we are not ever going to put out terribly good 
quality code with something as complex and poorly documented as Lustre.

Let's not apply the Appeal to Extremes argument to this either.  I am 
not saying that we shouldn't have testing.  We absolutely should.  We 
should also strive to make the barriers to testing as low as possible, 
and make the opportunities for testing as frequent as reasonable.

If we have releases every three months on a _reliable_ schedule, that 
will give prospective testers the ability to plan their testing time 
ahead, and it increases the probability that each prospective tester 
will have spare time that aligns with one of our release testing windows.

All that said, I think you might also be wrong about no one testing 
each release.  ORNL has already demonstrated a commitment to try every 
version.  Cray is stepping up testing.  I would like to have my team at 
LLNL become more active on master in the future, and work our testing 
person into the Lustre development cycle.

> The releases that are ignored by community for one reason or another tend to be not very stable and then the follow-on release
> gets this "testing debt" baggage that is paid at release time once testing outside of Intel picks up the pace.

That is a challenge now, and I acknowledge that it will continue to be a 
challenge in the future.

Making releases more frequently and on a reliable schedule is not 
magic; it will not fix everything about our development process on its 
own.  Nevertheless I do believe that it will be a key supporting element 
in improving our software development and SQA processes.

Chris


* [lustre-devel] Should we have fewer releases?
  2015-11-06 13:53       ` DEGREMONT Aurelien
  2015-11-06 14:33         ` Rick Wagner
@ 2015-11-06 22:18         ` Christopher J. Morrone
  1 sibling, 0 replies; 10+ messages in thread
From: Christopher J. Morrone @ 2015-11-06 22:18 UTC (permalink / raw)
  To: lustre-devel

I agree that the lack of public stable branches is an issue for many. 
But I also think that the topic of stable branches is somewhat 
orthogonal to the discussion about the development cycle on the master 
branch.  We can make these decisions about changing master's development 
process without knowing what we are going to do about stable branches in 
the future.

That said, there are some advantages to the three-month release cycle 
that I propose that would help those who are hurt the most by the loss 
of the stable branches.

Assuming that we make these two changes:

1) Releases happen every three months on schedule
2) Releases are decoupled from major features (we don't hold up major 
releases for unfinished features, and it is fine to have a major release 
with no major feature)

Then the people most hurt by the lack of stable branches and the least 
able to purchase support contracts will see these benefits:

1) Less time to wait for a major release with relevant bug fixes
2) Less reluctance to track the major releases

It is really in Lustre's best interest in general for us to move our 
development model in a direction that makes everyone less afraid of 
trying new versions of Lustre.  We don't have a good reputation there, 
but I think that is fixable if we make an effort.

Chris

On 11/06/2015 05:53 AM, DEGREMONT Aurelien wrote:
> Hi
>
> You're right for most of your comments.
>
> However, you forgot one important thing: there is no more public
> maintenance releases.
> Your theory is correct if Lustre releases get bugfix only minor releases
> (i.e: 2.7.1, 2.7.2, ...) like it uses to have.
>
> HPC centers cannot upgrade Lustre release very often. Usually they pick
> one and stick to it for a while, doing at most one or two major upgrades
> during the computer life cycle (lets say 5 years).
> The more Lustre releases they will be, the more scattering of Lustre
> versions in production they will be. As there is no more bugfix release
> made, admins need to regroup their efforts on fewer releases to benefit
> from debugging and patches produced by others, on the same Lustre release.
>
> Regarding lengthening the release cycle, I clearly agree that having
> longer landing window won't help at all. That means that lengthening
> release cycle only mean lengthening the code freeze window.
>
>
> Aurélien


* [lustre-devel] Should we have fewer releases?
  2015-11-06 22:08         ` Christopher J. Morrone
@ 2015-11-07  0:46           ` Mitchell Erblich
  2015-11-07  1:07             ` [lustre-devel] How do I use mailman? Christopher J. Morrone
  2015-11-07  8:36           ` [lustre-devel] Should we have fewer releases? Drokin, Oleg
  1 sibling, 1 reply; 10+ messages in thread
From: Mitchell Erblich @ 2015-11-07  0:46 UTC (permalink / raw)
  To: lustre-devel

Group,

		Is there a reason why a person cannot be unsubscribed?


* [lustre-devel] How do I use mailman?
  2015-11-07  0:46           ` Mitchell Erblich
@ 2015-11-07  1:07             ` Christopher J. Morrone
  0 siblings, 0 replies; 10+ messages in thread
From: Christopher J. Morrone @ 2015-11-07  1:07 UTC (permalink / raw)
  To: lustre-devel

Mitchell,

Anyone who would like to unsubscribe is free to do so.  Here is the 
link to the mailing list's web page:

   http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

Here are instructions on how to unsubscribe from a mailing list managed 
by mailman to help you with the process:

   http://www.gnu.org/software/mailman/mailman-member/node14.html

Chris

On 11/06/2015 04:46 PM, Mitchell Erblich wrote:
> Group,
>
> 		Is their a reason why a person can not be unsubscribed?
>
>
>
>


* [lustre-devel] Should we have fewer releases?
  2015-11-06 22:08         ` Christopher J. Morrone
  2015-11-07  0:46           ` Mitchell Erblich
@ 2015-11-07  8:36           ` Drokin, Oleg
  2015-12-01  3:33             ` Christopher J. Morrone
  1 sibling, 1 reply; 10+ messages in thread
From: Drokin, Oleg @ 2015-11-07  8:36 UTC (permalink / raw)
  To: lustre-devel

Hello!

On Nov 6, 2015, at 5:08 PM, Christopher J. Morrone wrote:

> On 11/06/2015 06:39 AM, Drokin, Oleg wrote:
>> Hello!
>> 
>> On Nov 5, 2015, at 4:45 PM, Christopher J. Morrone wrote:
>>> On the contrary, we need to go in the opposite direction to achieve those goals.  We need to shorten the release cycle and have more frequent releases.  I would recommend that we move to to a roughly three month release cycle.  Some of the benefits might be:
>>> 
>>> * Less change and accumulate before the release
>>> * The penalty for missing a release landing window is reduced when releases are more often
>>> * Code reviewers have less pressure to land unfinished and/or insufficiently reviewed and tested code when the penalty is reduced
>>> * Less change means less to test and fix at release time
>>> * Bug authors are more likely to still remember what they did and participate in cleanup.
>>> * Less time before bugs that slip through the cracks appear in a major release
>>> * Reduces developer frustration with long freeze windows
>>> * Encourages developers to rally more frequently around the landing windows instead of falling into a long period of silence and then trying to shove a bunch of code in just before freeze.  (They'll still try to ram things in just before freeze, but with more frequent landing windows the amount will be smaller and more manageable.)
>> 
>> Bringing this to the logical extreme - we should just have one release per major feature.
> 
> It do not agree that it is logical to extend the argument to that extreme.  That is the "Appeal to Extremes" logical fallacy.

It probably is. But it's sometimes useful still.

> I also don't think it is appropriate to conflate major releases with major features.  When/if we move to a shorter release cycle, it would be entirely appropriate to put out a major release with no headline "major features".  It is totally acceptable to release the many changes that did make it in the landing window.  Even if none of the changes individually count as "major", they still collectively represent a major amount of work.

Yes, I agree there could be releases with no major new features, though attempts to make them in the past were not met with great enthusiasm.

> Right now we combine that major amount of work with seriously destabilizing new features that more than offset all the bug fixing that went on.  Why do we insist on making those destabilizing influences a requirement for a release?

That's what people want, apparently. Features are developed because there's a need for them.

> Whether a major feature makes it into any particular release should be judge primarily on the quality and completeness of code, testing, and documentation for said feature.  Further, how many major features can be landed in a release would be gated on the amount of manpower we have for review and testing.  If 3 major features are truely complete and ready to land, but we can only fully vet 1 in the landing window, well, only one will land.  We'll have to make a judgement call as a community on the priority and work on that.

I am skeptical this is going to work. If a feature is perceived to be ready, but is not accepted for whatever reason, those who feel they need it would just find some way of using it anyway.
And it would lead to more fragmentation in the end.

> In summary: I think we should decouple the concept of major releases and major features.  Major releases do not need to be subject to major features.

Should there be a period in which no new features have reached the "ready to include" state, then yes - I am all for it.
I guess you think this is going to be easier to achieve by shortening the time to the next release. It's just that right now we have such a backlog of features that that might not be a realistic assumption.

>> Sadly, I think the stabilization process is not likely to get any shorter.
> Do not see a connection between the amount of change and the time it takes to stabilize that change?  Can you explain why you think that?

Testing (and vetting) takes a fixed amount of time. For large-scale community testing we also depend on the availability schedules of large systems. Those do not change.
Any problems found would require a retest once the fix is in place.
Then there's a backlog of "deferred" bugs that are not deemed super critical, but as the number of truly critical bugs goes down, I suspect those bugs from the backlog would be viewed
as more serious (and I don't think that's a bad thing).

Of course I might be all wrong on this, but it's just my feeling. If we take any past Lustre release and add another X months of pure code freeze and stabilization,
do you think that particular release would not have benefitted from that?
I suspect the same is true of (almost?) any other software project.

>> Either that or interested parties would only jump into testing when enough of interesting features accumulate,
>> after which point there'd be a bunch of bugreports for the current feature plus the backlocd that did not get any significant real-world testing before. We have seen this pattern
>> to some degree already even with current releases.
> The scary future you paint is no different than our present. Organizations like LLNL only move to new major releases every 18 months at the earliest, and we would really like to run the same version for more like three years in some cases.  We are too busy drowning in production Lustre issues half the time to get involved in testing except when it is something that is on our roadmap to put into production.  I don't think we're alone.  Even if it isn't Lustre issues, everyone has day jobs that keep us busy and time for testing things that don't look immediately relevant to upper management can be difficult to justify.

Indeed. I am not painting any scary future, I am just making observations about today.

> So I agree, many people already are skipping the testing of many releases and that will continue into the future.
> 
> Frankly, I think that relying on an open source community to do rigorous and systematic testing is foolhardy.  The only way that really works is if your user base is large in proportion to the size of your code size and complexity.  I would estimate the Lustre is low in that ratio, while something like ZFS is probably medium to large, and Linux is large.
> 
> The testing you get from an open source community is going to be a fairly random in terms of code coverage.  In order to the coverage to be reasonably complete, you need _alot_ of people testing.

This is very true. We need many more unique environments to extend the coverage. That, or finding some way of forcing every possible code path to execute in testing, which is
not really realistic.

> If we rely on a voluntary, at-will community testing as out primary SQA enforcement method, we are not going to ever put out terribly quality code with something as complex and poorly documented as Lustre.
> 
> Let's not apply the Appeal to Extremes argument to this either.  I am not saying that we shouldn't have testing.  We absolutely should.  We should also strive to make the barriers to testing as low as possible,
> and make the opportunities for testing as frequent as reasonable.
> 
> If we have a release every three months on a _reliable_ schedule, that will give prospective testers the ability to plan their testing time ahead, and increase the probability that each prospective tester will have spare time that aligns with one of our release testing windows.

We need all the diverse testing we can get, and then some. So there's no disagreement from me here.
If you think that just doubling the number of releases gets us double the testing time from the community, that alone might be worth it.

> All that said, I think you might also be wrong about no one testing each release.  ORNL has already demonstrated a commitment to try every version.  Cray is stepping up testing.  I would like to have my team at LLNL become more active on master in the future, and have our testing person integrated into the Lustre development cycle.

There were releases in the past when this was true for various reasons.

>> The releases that are ignored by the community for one reason or another tend not to be very stable, and then the follow-on release
>> gets this "testing debt" baggage that is paid at release time once testing outside of Intel picks up the pace.
> 
> That is a challenge now, and I acknowledge that it will continue to be a challenge in the future.
> 
> Making releases more frequently and on a reliable schedule is not magic; it will not fix everything about our development process on its own.  Nevertheless, I do believe that it will be a key supporting element in improving our software development and SQA processes.

We just need to ensure the rate at which bugs are introduced is a lot smaller than the rate at which bugs are fixed. ;)
And we also need to achieve this without somehow choking off new features.

Bye,
    Oleg


* [lustre-devel] Should we have fewer releases?
  2015-11-07  8:36           ` [lustre-devel] Should we have fewer releases? Drokin, Oleg
@ 2015-12-01  3:33             ` Christopher J. Morrone
  0 siblings, 0 replies; 10+ messages in thread
From: Christopher J. Morrone @ 2015-12-01  3:33 UTC (permalink / raw)
  To: lustre-devel

On 11/07/2015 12:36 AM, Drokin, Oleg wrote:
> Hello!
>
> On Nov 6, 2015, at 5:08 PM, Christopher J. Morrone wrote:
>
>> On 11/06/2015 06:39 AM, Drokin, Oleg wrote:
>>> Hello!
>>>
>>> On Nov 5, 2015, at 4:45 PM, Christopher J. Morrone wrote:
>>>> On the contrary, we need to go in the opposite direction to achieve those goals.  We need to shorten the release cycle and have more frequent releases.  I would recommend that we move to a roughly three month release cycle.  Some of the benefits might be:
>>>>
>>>> * Less change will accumulate before the release
>>>> * The penalty for missing a release landing window is reduced when releases are more frequent
>>>> * Code reviewers have less pressure to land unfinished and/or insufficiently reviewed and tested code when the penalty is reduced
>>>> * Less change means less to test and fix at release time
>>>> * Bug authors are more likely to still remember what they did and participate in cleanup.
>>>> * Less time before bugs that slip through the cracks appear in a major release
>>>> * Reduces developer frustration with long freeze windows
>>>> * Encourages developers to rally more frequently around the landing windows instead of falling into a long period of silence and then trying to shove a bunch of code in just before freeze.  (They'll still try to ram things in just before freeze, but with more frequent landing windows the amount will be smaller and more manageable.)
>>>
>>> Bringing this to the logical extreme - we should just have one release per major feature.
>>
>> I do not agree that it is logical to extend the argument to that extreme.  That is the "Appeal to Extremes" logical fallacy.
>
> It probably is. But it's sometimes useful still.
>
>> I also don't think it is appropriate to conflate major releases with major features.  When/if we move to a shorter release cycle, it would be entirely appropriate to put out a major release with no headline "major features".  It is totally acceptable to release the many changes that did make it in the landing window.  Even if none of the changes individually count as "major", they still collectively represent a major amount of work.
>
> Yes, I agree there could be releases with no major new features, though attempts to make them in the past were not met with great enthusiasm.
>
>> Right now we combine that major amount of work with seriously destabilizing new features that more than offset all the bug fixing that went on.  Why do we insist on making those destabilizing influences a requirement for a release?
>
> That's what people want, apparently. Features are developed because there's a need for them.

Yes, people want everything.  While we want features, we want stability
just as much.  So far stability has been playing second chair to
features with Lustre.  We can't stop adding features, but we do need to
shift the balance a bit.

>> Whether a major feature makes it into any particular release should be judged primarily on the quality and completeness of code, testing, and documentation for said feature.  Further, how many major features can be landed in a release would be gated on the amount of manpower we have for review and testing.  If 3 major features are truly complete and ready to land, but we can only fully vet 1 in the landing window, well, only one will land.  We'll have to make a judgement call as a community on the priority and work on that.
>
> I am skeptical this is going to work. If a feature is perceived to be ready, but is not accepted for whatever reason, those who feel they need it would just find some way of using it anyway.
> And it would lead to more fragmentation in the end.

If you really think about it, we already have that fragmentation today.
Hopefully the burden of constantly refreshing one's private major
features to work with master will convince organizations that it is in
their best interest to upstream their code.

I think that what causes people to be most annoyed and stop working with 
an upstream is when the upstream developers are uncommunicative and 
seemingly arbitrary.  We could use some work in that area, I think, if 
we are all honest.

I think that if we all discuss landing priority and someone's feature is
deemed too low to get into the current release, yes, that author will be
disappointed.  But I think that the vast majority of developers would
also be understanding.  They would especially be understanding if we
explain that it will be at the top of the list for the next landing cycle.

Feedback and communication go a long way to keeping everyone content.

>> In summary: I think we should decouple the concept of major releases and major features.  Major releases do not need to be subject to major features.
>
> Should there be a period where no new features are developed into the "Ready to include" state? Yes - I am all for it.
> I guess you think this is going to be easier to achieve by shortening the time to the next release. It's just that right now we have such a backlog of features that this might not be a realistic assumption.

We have a backlog of stability and usability issues too.  I don't care how big
the feature backlog is, we can't allow features in unless they are
reasonably stable.

>>> Sadly, I think the stabilization process is not likely to get any shorter.
>> You do not see a connection between the amount of change and the time it takes to stabilize that change?  Can you explain why you think that?
>
> Testing (and vetting) takes a fixed amount of time. For large-scale community testing we also depend on the availability schedule of large systems. These do not change.

If you are right that test dates on large systems are unchangeable, then
our current unpredictable releases are completely incompatible with that
model.

But, with all due respect, I think you are wrong about that.  Test dates 
on large machines _are_ changeable.  Those dates need to be scheduled in 
advance, but we _can_ change when those future testing windows will be.

Also, if we can't find ways to improve SQA other than small testing 
windows on large systems, then we may as well just give up on Lustre 
now.  That will never result in quality software.

Fortunately, I do not believe that is the case.  I think there are many 
ways that we can improve the development processes over time to result 
in higher quality software.  Testing won't catch enough on its own.

> Any problems found would require a retest once the fix is in place.
> Then there's a backlog of "deferred" bugs that are not deemed super critical, but as the number of truly critical bugs goes down, I suspect the bugs from that backlog will be viewed
> as more serious (and I don't think that's a bad thing).
>
> Of course I might be all wrong on this, but it's just my feeling. If we take any past Lustre release and add another X months of pure code freeze and stabilization,
> do you think that particular release would not have benefited from that?
> I suspect the same is true of (almost?) any other software project.

Not really.  We have already been adding X months of code freeze 
randomly as needed.  Once the easily found bugs are squashed the 
developers move on to adding new bugs^H^H^H^Hfeatures.  I don't think 
that testing alone is going to solve our quality problem.

[cut some things we agreed upon]
>> If we have a release every three months on a _reliable_ schedule, that will give prospective testers the ability to plan their testing time ahead, and increase the probability that each prospective tester will have spare time that aligns with one of our release testing windows.
>
> We need all the diverse testing we can get, and then some. So there's no disagreement from me here.
> If you think that just doubling the number of releases gets us double the testing time from the community, that alone might be worth it.

No, I don't think the testing will double.  I think a smaller overall 
improvement (10%? 20%? 30%?) might be reasonable.

I want this change not primarily for testing, but for the other 
advantages that it can provide to the development process.  Mainly it 
allows us to start getting control of the amount of change in each release.

Less change and a shorter delay between landing and testing will tend to
decrease the difficulty of removing the bugs that we find.

We would be under less pressure to land unfinished features because the
penalty for missing a release is lowered (only 3 months until the next
release instead of the current 6-9 months).
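
To make that concrete with a purely illustrative back-of-envelope sketch
in Python (the landing rate below is a made-up number, not a measurement
of our actual tree), the proportionality looks like this:

    # Toy model: change accumulated per release, and the worst-case wait
    # for a feature that misses a landing window, assuming a constant
    # landing rate.  All numbers are hypothetical.
    landing_rate = 100  # patches landed per month (made up)

    for cycle_months in (3, 6, 9):
        change_per_release = landing_rate * cycle_months
        worst_case_wait = cycle_months  # miss the window, wait one cycle
        print("%d-month cycle: ~%d patches to stabilize, up to %d months "
              "extra wait for a missed feature"
              % (cycle_months, change_per_release, worst_case_wait))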

Etc.

[cut]
>> Making releases more frequently and on a reliable schedule is not magic; it will not fix everything about our development process on its own.  Nevertheless, I do believe that it will be a key supporting element in improving our software development and SQA processes.
>
> We just need to ensure the rate at which bugs are introduced is a lot smaller than the rate at which bugs are fixed. ;)
> And we also need to achieve this without somehow choking off new features.

I agree!

We haven't been too successful at introducing fewer bugs than we fix 
thus far with Lustre. :)
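
To put the rate argument in the simplest possible terms, here is a tiny,
purely hypothetical sketch (the rates are invented for illustration; the
only point is that the backlog shrinks only while the introduction rate
stays below the fix rate):

    # Toy model of the open-bug backlog across successive releases.
    # All rates are made up for illustration.
    open_bugs = 200              # hypothetical starting backlog
    introduced_per_release = 40  # must stay well below...
    fixed_per_release = 60       # ...the fix rate, or the backlog grows

    for release in range(1, 6):
        open_bugs = max(open_bugs + introduced_per_release - fixed_per_release, 0)
        print("after release %d: %d open bugs" % (release, open_bugs))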

The painful truth is that feature progress probably needs to be slower
if we want higher quality software in the future.  We shouldn't stop
feature development, but we should take more care in landing features
than we have in the past if we want stability to improve.

It is always a balancing game.

Chris


end of thread

Thread overview: 10+ messages
     [not found] <D25FA9D2.89BFF%spitzcor@cray.com>
     [not found] ` <563A6351.4080407@llnl.gov>
     [not found]   ` <D25FC1DB.89C48%spitzcor@cray.com>
2015-11-05 21:45     ` [lustre-devel] Should we have fewer releases? Christopher J. Morrone
2015-11-06 13:53       ` DEGREMONT Aurelien
2015-11-06 14:33         ` Rick Wagner
2015-11-06 22:18         ` Christopher J. Morrone
2015-11-06 14:39       ` Drokin, Oleg
2015-11-06 22:08         ` Christopher J. Morrone
2015-11-07  0:46           ` Mitchell Erblich
2015-11-07  1:07             ` [lustre-devel] How do I use mailman? Christopher J. Morrone
2015-11-07  8:36           ` [lustre-devel] Should we have fewer releases? Drokin, Oleg
2015-12-01  3:33             ` Christopher J. Morrone
