All of lore.kernel.org
 help / color / mirror / Atom feed
* Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-13 21:57 Thomas D.
  2016-06-15  0:02   ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas D. @ 2016-06-13 21:57 UTC (permalink / raw)
  To: xfs

Hi,

can anybody confirm if there's something broken with the latest XFS
change set which is now applied on all stable kernels?

I found https://forums.grsecurity.net/viewtopic.php?t=4489&p=16355 and
grsec changelog says

> commit 1f621dc42acbabb71bd69f6ba606cee56e7ad3bc
> Author: Brad Spengler <spender@grsecurity.net>
> Date:   Sat Jun 11 08:14:32 2016 -0400
> 
>     Fix Greg KH's broken XFS backport, caused a benign case to be detected
>     as disk corruption
>     Problem was due to a tree-wide conversion of error codes to their negative
>     counterparts, which would likely never be backported to older kernels, but
>     the backports didn't account for the change
> 
>  fs/xfs/xfs_inode.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

This is the change grsec applied:

> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index fb8579d..af807d8 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3098,7 +3111,7 @@ xfs_iflush(
>  	 */
>  	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
>  			       0);
> -	if (error == -EAGAIN) {
> +	if (error == EAGAIN) {
>  		xfs_ifunlock(ip);
>  		return error;
>  	}

The bad commit according to grsec's statement is

> From b1438f477934f5a4d5a44df26f3079a7575d5946 Mon Sep 17 00:00:00 2001
> From: Dave Chinner <dchinner@redhat.com>
> Date: Wed, 18 May 2016 13:53:42 +1000
> Subject: [PATCH] xfs: xfs_iflush_cluster fails to abort on error

Would be nice to get some clarification.

Thanks.


-- 
Regards,
Thomas

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
  2016-06-13 21:57 Something badly broken with the latest XFS changeset in all stable kernels? Thomas D.
@ 2016-06-15  0:02   ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2016-06-15  0:02 UTC (permalink / raw)
  To: Thomas D.; +Cc: spender, stable, xfs

On Mon, Jun 13, 2016 at 11:57:53PM +0200, Thomas D. wrote:
> Hi,
> 
> can anybody confirm if there's something broken with the latest XFS
> change set which is now applied on all stable kernels?
> 
> I found https://forums.grsecurity.net/viewtopic.php?t=4489&p=16355 and
> grsec changelog says
> 
> > commit 1f621dc42acbabb71bd69f6ba606cee56e7ad3bc
> > Author: Brad Spengler <spender@grsecurity.net>
> > Date:   Sat Jun 11 08:14:32 2016 -0400
> > 
> >     Fix Greg KH's broken XFS backport, caused a benign case to be detected
> >     as disk corruption
> >     Problem was due to a tree-wide conversion of error codes to their negative
> >     counterparts, which would likely never be backported to older kernels, but
> >     the backports didn't account for the change
> > 
> >  fs/xfs/xfs_inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)

Yes, the backport is busted, and needs this fix. The error sign
change occurred in 3.17. xfstests would have picked up this
regression in a couple of minutes, so I'm guessing that none of
these stable releases have had any significant regression testing
done....

> This is the change grsec applied:
> 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index fb8579d..af807d8 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3098,7 +3111,7 @@ xfs_iflush(
> >  	 */
> >  	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
> >  			       0);
> > -	if (error == -EAGAIN) {
> > +	if (error == EAGAIN) {
> >  		xfs_ifunlock(ip);
> >  		return error;
> >  	}

Yes, that is the fix that is needed. Thank you for reporting it to
us.

Mr Spender: it would be appreciated if you reported stable kernel
regressions to the relevant upstream maintainers so they can be
fixed quickly for everyone, rather than having one of your users
decide it needs to be reported.

Stable kernel maintainers: the above error sign change is needed for
stable kernels 3.16 and earlier, as a matter of critical importance.
And as a further matter of critical importance: in future, please
take the time to regression test the changes you backport.

> The bad commit according to grsec's statement is
> 
> > From b1438f477934f5a4d5a44df26f3079a7575d5946 Mon Sep 17 00:00:00 2001
> > From: Dave Chinner <dchinner@redhat.com>
> > Date: Wed, 18 May 2016 13:53:42 +1000
> > Subject: [PATCH] xfs: xfs_iflush_cluster fails to abort on error
> 
> Would be nice to get some clarification.

There's nothing wrong with that commit in the upstream kernel,
it's the backport that has a bug in it because it failed to take
into account changes outside the context of the upstream commit that
the older kernels don't have.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-15  0:02   ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2016-06-15  0:02 UTC (permalink / raw)
  To: Thomas D.; +Cc: xfs, stable, spender

On Mon, Jun 13, 2016 at 11:57:53PM +0200, Thomas D. wrote:
> Hi,
> 
> can anybody confirm if there's something broken with the latest XFS
> change set which is now applied on all stable kernels?
> 
> I found https://forums.grsecurity.net/viewtopic.php?t=4489&p=16355 and
> grsec changelog says
> 
> > commit 1f621dc42acbabb71bd69f6ba606cee56e7ad3bc
> > Author: Brad Spengler <spender@grsecurity.net>
> > Date:   Sat Jun 11 08:14:32 2016 -0400
> > 
> >     Fix Greg KH's broken XFS backport, caused a benign case to be detected
> >     as disk corruption
> >     Problem was due to a tree-wide conversion of error codes to their negative
> >     counterparts, which would likely never be backported to older kernels, but
> >     the backports didn't account for the change
> > 
> >  fs/xfs/xfs_inode.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)

Yes, the backport is busted, and needs this fix. The error sign
change occurred in 3.17. xfstests would have picked up this
regression in a couple of minutes, so I'm guessing that none of
these stable releases have had any significant regression testing
done....

> This is the change grsec applied:
> 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index fb8579d..af807d8 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -3098,7 +3111,7 @@ xfs_iflush(
> >  	 */
> >  	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
> >  			       0);
> > -	if (error == -EAGAIN) {
> > +	if (error == EAGAIN) {
> >  		xfs_ifunlock(ip);
> >  		return error;
> >  	}

Yes, that is the fix that is needed. Thank you for reporting it to
us.

Mr Spender: it would be appreciated if you reported stable kernel
regressions to the relevant upstream maintainers so they can be
fixed quickly for everyone, rather than having one of your users
decide it needs to be reported.

Stable kernel maintainers: the above error sign change is needed for
stable kernels 3.16 and earlier, as a matter of critical importance.
And as a further matter of critical importance: in future, please
take the time to regression test the changes you backport.

> The bad commit according to grsec's statement is
> 
> > From b1438f477934f5a4d5a44df26f3079a7575d5946 Mon Sep 17 00:00:00 2001
> > From: Dave Chinner <dchinner@redhat.com>
> > Date: Wed, 18 May 2016 13:53:42 +1000
> > Subject: [PATCH] xfs: xfs_iflush_cluster fails to abort on error
> 
> Would be nice to get some clarification.

There's nothing wrong with that commit in the upstream kernel,
it's the backport that has a bug in it because it failed to take
into account changes outside the context of the upstream commit that
the older kernels don't have.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
  2016-06-15  0:02   ` Dave Chinner
@ 2016-06-15  1:30     ` Greg KH
  -1 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2016-06-15  1:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Thomas D., spender, stable, xfs

On Wed, Jun 15, 2016 at 10:02:41AM +1000, Dave Chinner wrote:
> On Mon, Jun 13, 2016 at 11:57:53PM +0200, Thomas D. wrote:
> > Hi,
> > 
> > can anybody confirm if there's something broken with the latest XFS
> > change set which is now applied on all stable kernels?
> > 
> > I found https://forums.grsecurity.net/viewtopic.php?t=4489&p=16355 and
> > grsec changelog says
> > 
> > > commit 1f621dc42acbabb71bd69f6ba606cee56e7ad3bc
> > > Author: Brad Spengler <spender@grsecurity.net>
> > > Date:   Sat Jun 11 08:14:32 2016 -0400
> > > 
> > >     Fix Greg KH's broken XFS backport, caused a benign case to be detected
> > >     as disk corruption
> > >     Problem was due to a tree-wide conversion of error codes to their negative
> > >     counterparts, which would likely never be backported to older kernels, but
> > >     the backports didn't account for the change
> > > 
> > >  fs/xfs/xfs_inode.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Yes, the backport is busted, and needs this fix. The error sign
> change occurred in 3.17. xfstests would have picked up this
> regression in a couple of minutes, so I'm guessing that none of
> these stable releases have had any significant regression testing
> done....
> 
> > This is the change grsec applied:
> > 
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index fb8579d..af807d8 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -3098,7 +3111,7 @@ xfs_iflush(
> > >  	 */
> > >  	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
> > >  			       0);
> > > -	if (error == -EAGAIN) {
> > > +	if (error == EAGAIN) {
> > >  		xfs_ifunlock(ip);
> > >  		return error;
> > >  	}
> 
> Yes, that is the fix that is needed. Thank you for reporting it to
> us.
> 
> Mr Spender: it would be appreciated if you reported stable kernel
> regressions to the relevant upstream maintainers so they can be
> fixed quickly for everyone, rather than having one of your users
> decide it needs to be reported.
> 
> Stable kernel maintainers: the above error sign change is needed for
> stable kernels 3.16 and earlier, as a matter of critical importance.
> And as a further matter of critical importance: in future, please
> take the time to regression test the changes you backport.
> 
> > The bad commit according to grsec's statement is
> > 
> > > From b1438f477934f5a4d5a44df26f3079a7575d5946 Mon Sep 17 00:00:00 2001
> > > From: Dave Chinner <dchinner@redhat.com>
> > > Date: Wed, 18 May 2016 13:53:42 +1000
> > > Subject: [PATCH] xfs: xfs_iflush_cluster fails to abort on error
> > 
> > Would be nice to get some clarification.
> 
> There's nothing wrong with that commit in the upstream kernel,
> it's the backport that has a bug in it because it failed to take
> into account changes outside the context of the upstream commit that
> the older kernels don't have.

Thanks for letting me know about this.

As the patch was tagged with 3.10+, I assumed that it was safe to be
merged to those older kernels, otherwise I would never have done so.  We
do have ways to mark external things like this for stable patches, it's
a great help when doing backports.

thanks,

greg k-h

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-15  1:30     ` Greg KH
  0 siblings, 0 replies; 15+ messages in thread
From: Greg KH @ 2016-06-15  1:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Thomas D., xfs, stable, spender

On Wed, Jun 15, 2016 at 10:02:41AM +1000, Dave Chinner wrote:
> On Mon, Jun 13, 2016 at 11:57:53PM +0200, Thomas D. wrote:
> > Hi,
> > 
> > can anybody confirm if there's something broken with the latest XFS
> > change set which is now applied on all stable kernels?
> > 
> > I found https://forums.grsecurity.net/viewtopic.php?t=4489&p=16355 and
> > grsec changelog says
> > 
> > > commit 1f621dc42acbabb71bd69f6ba606cee56e7ad3bc
> > > Author: Brad Spengler <spender@grsecurity.net>
> > > Date:   Sat Jun 11 08:14:32 2016 -0400
> > > 
> > >     Fix Greg KH's broken XFS backport, caused a benign case to be detected
> > >     as disk corruption
> > >     Problem was due to a tree-wide conversion of error codes to their negative
> > >     counterparts, which would likely never be backported to older kernels, but
> > >     the backports didn't account for the change
> > > 
> > >  fs/xfs/xfs_inode.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Yes, the backport is busted, and needs this fix. The error sign
> change occurred in 3.17. xfstests would have picked up this
> regression in a couple of minutes, so I'm guessing that none of
> these stable releases have had any significant regression testing
> done....
> 
> > This is the change grsec applied:
> > 
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index fb8579d..af807d8 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -3098,7 +3111,7 @@ xfs_iflush(
> > >  	 */
> > >  	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
> > >  			       0);
> > > -	if (error == -EAGAIN) {
> > > +	if (error == EAGAIN) {
> > >  		xfs_ifunlock(ip);
> > >  		return error;
> > >  	}
> 
> Yes, that is the fix that is needed. Thank you for reporting it to
> us.
> 
> Mr Spender: it would be appreciated if you reported stable kernel
> regressions to the relevant upstream maintainers so they can be
> fixed quickly for everyone, rather than having one of your users
> decide it needs to be reported.
> 
> Stable kernel maintainers: the above error sign change is needed for
> stable kernels 3.16 and earlier, as a matter of critical importance.
> And as a further matter of critical importance: in future, please
> take the time to regression test the changes you backport.
> 
> > The bad commit according to grsec's statement is
> > 
> > > From b1438f477934f5a4d5a44df26f3079a7575d5946 Mon Sep 17 00:00:00 2001
> > > From: Dave Chinner <dchinner@redhat.com>
> > > Date: Wed, 18 May 2016 13:53:42 +1000
> > > Subject: [PATCH] xfs: xfs_iflush_cluster fails to abort on error
> > 
> > Would be nice to get some clarification.
> 
> There's nothing wrong with that commit in the upstream kernel,
> it's the backport that has a bug in it because it failed to take
> into account changes outside the context of the upstream commit that
> the older kernels don't have.

Thanks for letting me know about this.

As the patch was tagged with 3.10+, I assumed that it was safe to be
merged to those older kernels, otherwise I would never have done so.  We
do have ways to mark external things like this for stable patches, it's
a great help when doing backports.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
  2016-06-15  1:30     ` Greg KH
@ 2016-06-15  5:28       ` Willy Tarreau
  -1 siblings, 0 replies; 15+ messages in thread
From: Willy Tarreau @ 2016-06-15  5:28 UTC (permalink / raw)
  To: Greg KH; +Cc: Thomas D., spender, stable, xfs

On Tue, Jun 14, 2016 at 06:30:56PM -0700, Greg KH wrote:
> As the patch was tagged with 3.10+, I assumed that it was safe to be
> merged to those older kernels, otherwise I would never have done so.  We
> do have ways to mark external things like this for stable patches, it's
> a great help when doing backports.

I guess only 3.14.72 was affected in the end. 3.18.35 has the fix but my
understanding is that after 3.17 it's OK. Apparently it's not yet queued
for 3.16. Jiri has queued it for 3.12 but not yet released it. For 3.10
I only pick patches that are already in a 3.14 release so I didn't have
the time to backport it to 3.10 yet as the preview started before the
release. Overall the regression lived only 8 days in a single branch, I
guess it shows that our process works rather well and limits the exposure
to regressions.

Brad, it would have been nice to report it to the stable team especially
since you rely on these kernels for yours as well.

Regards,
Willy

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-15  5:28       ` Willy Tarreau
  0 siblings, 0 replies; 15+ messages in thread
From: Willy Tarreau @ 2016-06-15  5:28 UTC (permalink / raw)
  To: Greg KH; +Cc: Dave Chinner, Thomas D., xfs, stable, spender

On Tue, Jun 14, 2016 at 06:30:56PM -0700, Greg KH wrote:
> As the patch was tagged with 3.10+, I assumed that it was safe to be
> merged to those older kernels, otherwise I would never have done so.  We
> do have ways to mark external things like this for stable patches, it's
> a great help when doing backports.

I guess only 3.14.72 was affected in the end. 3.18.35 has the fix but my
understanding is that after 3.17 it's OK. Apparently it's not yet queued
for 3.16. Jiri has queued it for 3.12 but not yet released it. For 3.10
I only pick patches that are already in a 3.14 release so I didn't have
the time to backport it to 3.10 yet as the preview started before the
release. Overall the regression lived only 8 days in a single branch, I
guess it shows that our process works rather well and limits the exposure
to regressions.

Brad, it would have been nice to report it to the stable team especially
since you rely on these kernels for yours as well.

Regards,
Willy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
  2016-06-15  1:30     ` Greg KH
@ 2016-06-15  6:40       ` Dave Chinner
  -1 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2016-06-15  6:40 UTC (permalink / raw)
  To: Greg KH; +Cc: Thomas D., spender, stable, xfs

On Tue, Jun 14, 2016 at 06:30:56PM -0700, Greg KH wrote:
> On Wed, Jun 15, 2016 at 10:02:41AM +1000, Dave Chinner wrote:
> > On Mon, Jun 13, 2016 at 11:57:53PM +0200, Thomas D. wrote:
> > > The bad commit according to grsec's statement is
> > > 
> > > > From b1438f477934f5a4d5a44df26f3079a7575d5946 Mon Sep 17 00:00:00 2001
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > Date: Wed, 18 May 2016 13:53:42 +1000
> > > > Subject: [PATCH] xfs: xfs_iflush_cluster fails to abort on error
> > > 
> > > Would be nice to get some clarification.
> > 
> > There's nothing wrong with that commit in the upstream kernel,
> > it's the backport that has a bug in it because it failed to take
> > into account changes outside the context of the upstream commit that
> > the older kernels don't have.
> 
> Thanks for letting me know about this.
> 
> As the patch was tagged with 3.10+, I assumed that it was safe to be
> merged to those older kernels, otherwise I would never have done so.  We
> do have ways to mark external things like this for stable patches, it's
> a great help when doing backports.

Little things like this are very easy to forget about - those error
sign changes are ancient history as far as upstream development is
concerned. This is why we have regression tests - the zero-day
kernel robot can run xfstests - perhaps stable kernels should be
submitted to a full round of testing before release to catch
subtle "patch applies but ends up wrong" issues like this...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-15  6:40       ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2016-06-15  6:40 UTC (permalink / raw)
  To: Greg KH; +Cc: Thomas D., xfs, stable, spender

On Tue, Jun 14, 2016 at 06:30:56PM -0700, Greg KH wrote:
> On Wed, Jun 15, 2016 at 10:02:41AM +1000, Dave Chinner wrote:
> > On Mon, Jun 13, 2016 at 11:57:53PM +0200, Thomas D. wrote:
> > > The bad commit according to grsec's statement is
> > > 
> > > > From b1438f477934f5a4d5a44df26f3079a7575d5946 Mon Sep 17 00:00:00 2001
> > > > From: Dave Chinner <dchinner@redhat.com>
> > > > Date: Wed, 18 May 2016 13:53:42 +1000
> > > > Subject: [PATCH] xfs: xfs_iflush_cluster fails to abort on error
> > > 
> > > Would be nice to get some clarification.
> > 
> > There's nothing wrong with that commit in the upstream kernel,
> > it's the backport that has a bug in it because it failed to take
> > into account changes outside the context of the upstream commit that
> > the older kernels don't have.
> 
> Thanks for letting me know about this.
> 
> As the patch was tagged with 3.10+, I assumed that it was safe to be
> merged to those older kernels, otherwise I would never have done so.  We
> do have ways to mark external things like this for stable patches, it's
> a great help when doing backports.

Little things like this are very easy to forget about - those error
sign changes are ancient history as far as upstream development is
concerned. This is why we have regression tests - the zero-day
kernel robot can run xfstests - perhaps stable kernels should be
submitted to a full round of testing before release to catch
subtle "patch applies but ends up wrong" issues like this...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
  2016-06-15  5:28       ` Willy Tarreau
@ 2016-06-15  6:51         ` Dave Chinner
  -1 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2016-06-15  6:51 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Greg KH, xfs, spender, stable, Thomas D.

On Wed, Jun 15, 2016 at 07:28:31AM +0200, Willy Tarreau wrote:
> On Tue, Jun 14, 2016 at 06:30:56PM -0700, Greg KH wrote:
> > As the patch was tagged with 3.10+, I assumed that it was safe to be
> > merged to those older kernels, otherwise I would never have done so.  We
> > do have ways to mark external things like this for stable patches, it's
> > a great help when doing backports.
> 
> I guess only 3.14.72 was affected in the end. 3.18.35 has the fix but my
> understanding is that after 3.17 it's OK.

Correct.

> Apparently it's not yet queued
> for 3.16.

The queue for 3.16 has a different bunch of XFS stuff that I've
previous said is pretty risky and not advisable to back port (the
mmaplock stuff). I'm waiting for that to get to users and a new
bunch of whacky problems to be reported...

> Jiri has queued it for 3.12 but not yet released it. For 3.10
> I only pick patches that are already in a 3.14 release so I didn't have
> the time to backport it to 3.10 yet as the preview started before the
> release.

Just lucky, eh?

> Overall the regression lived only 8 days in a single branch, I
> guess it shows that our process works rather well and limits the exposure
> to regressions.

That it got as far as release and it took so long to get to the
upstream maintainers shows the process could do with being improved.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-15  6:51         ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2016-06-15  6:51 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Greg KH, Thomas D., xfs, stable, spender

On Wed, Jun 15, 2016 at 07:28:31AM +0200, Willy Tarreau wrote:
> On Tue, Jun 14, 2016 at 06:30:56PM -0700, Greg KH wrote:
> > As the patch was tagged with 3.10+, I assumed that it was safe to be
> > merged to those older kernels, otherwise I would never have done so.  We
> > do have ways to mark external things like this for stable patches, it's
> > a great help when doing backports.
> 
> I guess only 3.14.72 was affected in the end. 3.18.35 has the fix but my
> understanding is that after 3.17 it's OK.

Correct.

> Apparently it's not yet queued
> for 3.16.

The queue for 3.16 has a different bunch of XFS stuff that I've
previous said is pretty risky and not advisable to back port (the
mmaplock stuff). I'm waiting for that to get to users and a new
bunch of whacky problems to be reported...

> Jiri has queued it for 3.12 but not yet released it. For 3.10
> I only pick patches that are already in a 3.14 release so I didn't have
> the time to backport it to 3.10 yet as the preview started before the
> release.

Just lucky, eh?

> Overall the regression lived only 8 days in a single branch, I
> guess it shows that our process works rather well and limits the exposure
> to regressions.

That it got as far as release and it took so long to get to the
upstream maintainers shows the process could do with being improved.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
  2016-06-15  5:28       ` Willy Tarreau
@ 2016-06-15  7:00         ` Jiri Slaby
  -1 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2016-06-15  7:00 UTC (permalink / raw)
  To: Willy Tarreau, Greg KH; +Cc: Thomas D., spender, stable, xfs

On 06/15/2016, 07:28 AM, Willy Tarreau wrote:
> Jiri has queued it for 3.12 but not yet released it.

Fixed now, thanks!

-- 
js
suse labs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-15  7:00         ` Jiri Slaby
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2016-06-15  7:00 UTC (permalink / raw)
  To: Willy Tarreau, Greg KH; +Cc: Dave Chinner, Thomas D., xfs, stable, spender

On 06/15/2016, 07:28 AM, Willy Tarreau wrote:
> Jiri has queued it for 3.12 but not yet released it.

Fixed now, thanks!

-- 
js
suse labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
  2016-06-15  6:51         ` Dave Chinner
@ 2016-06-15  7:14           ` Willy Tarreau
  -1 siblings, 0 replies; 15+ messages in thread
From: Willy Tarreau @ 2016-06-15  7:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Greg KH, xfs, spender, stable, Thomas D.

On Wed, Jun 15, 2016 at 04:51:21PM +1000, Dave Chinner wrote:
> > Overall the regression lived only 8 days in a single branch, I
> > guess it shows that our process works rather well and limits the exposure
> > to regressions.
> 
> That it got as far as release and it took so long to get to the
> upstream maintainers shows the process could do with being improved.

Yep but users relying on oldest LTS usually run on servers that cannot
reboot without prior planning, so in practice we know that users tend
to be late by 1 week or more and often one or two versions. I'm pretty
sure this is the reason why a single user reported the issue. I'm not
saying this to dismiss the importance of a regression, just the fact
that the zero-risk never exists and that if an issue can only be
discovered in field (since the commit was backported according to the
tags and that the need to adapt it was not detected during the review
process), it will need one first victim to save all other users. In the
end the quick enough report ensured that most 3.14 users will not have
the time to upgrade to the broken version and 3.10/3.12 users will not
even have the opportunity to see the issue at all. That's what I'm
saying is not that bad.

We could imagine some automated tests depending on the affected files
and/or subsystems but it's not even certain that an automated test
would have detected this one :-/  We need to thank the initial reporter
here, and to encourage Brad to pass the information back ASAP next time
this happens via his channel.

Willy

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Something badly broken with the latest XFS changeset in all stable kernels?
@ 2016-06-15  7:14           ` Willy Tarreau
  0 siblings, 0 replies; 15+ messages in thread
From: Willy Tarreau @ 2016-06-15  7:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Greg KH, Thomas D., xfs, stable, spender

On Wed, Jun 15, 2016 at 04:51:21PM +1000, Dave Chinner wrote:
> > Overall the regression lived only 8 days in a single branch, I
> > guess it shows that our process works rather well and limits the exposure
> > to regressions.
> 
> That it got as far as release and it took so long to get to the
> upstream maintainers shows the process could do with being improved.

Yep but users relying on oldest LTS usually run on servers that cannot
reboot without prior planning, so in practice we know that users tend
to be late by 1 week or more and often one or two versions. I'm pretty
sure this is the reason why a single user reported the issue. I'm not
saying this to dismiss the importance of a regression, just the fact
that the zero-risk never exists and that if an issue can only be
discovered in field (since the commit was backported according to the
tags and that the need to adapt it was not detected during the review
process), it will need one first victim to save all other users. In the
end the quick enough report ensured that most 3.14 users will not have
the time to upgrade to the broken version and 3.10/3.12 users will not
even have the opportunity to see the issue at all. That's what I'm
saying is not that bad.

We could imagine some automated tests depending on the affected files
and/or subsystems but it's not even certain that an automated test
would have detected this one :-/  We need to thank the initial reporter
here, and to encourage Brad to pass the information back ASAP next time
this happens via his channel.

Willy

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-06-15  7:14 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-13 21:57 Something badly broken with the latest XFS changeset in all stable kernels? Thomas D.
2016-06-15  0:02 ` Dave Chinner
2016-06-15  0:02   ` Dave Chinner
2016-06-15  1:30   ` Greg KH
2016-06-15  1:30     ` Greg KH
2016-06-15  5:28     ` Willy Tarreau
2016-06-15  5:28       ` Willy Tarreau
2016-06-15  6:51       ` Dave Chinner
2016-06-15  6:51         ` Dave Chinner
2016-06-15  7:14         ` Willy Tarreau
2016-06-15  7:14           ` Willy Tarreau
2016-06-15  7:00       ` Jiri Slaby
2016-06-15  7:00         ` Jiri Slaby
2016-06-15  6:40     ` Dave Chinner
2016-06-15  6:40       ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.