All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: always free inline data before resetting inode fork during ifree
@ 2017-11-23  6:01 Darrick J. Wong
  2017-11-23  8:14 ` Christoph Hellwig
  2018-03-23  1:30 ` Luis R. Rodriguez
  0 siblings, 2 replies; 34+ messages in thread
From: Darrick J. Wong @ 2017-11-23  6:01 UTC (permalink / raw)
  To: xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

In xfs_ifree, we reset the data/attr forks to extents format without
bothering to free any inline data buffer that might still be around
after all the blocks have been truncated off the file.  Prior to commit
43518812d2 ("xfs: remove support for inlining data/extents into the
inode fork") nobody noticed because the leftover inline data after
truncation was small enough to fit inside the inline buffer inside the
fork itself.

However, now that we've removed the inline buffer, we /always/ have to
free the inline data buffer or else we leak them like crazy.  This test
was found by turning on kmemleak for generic/001 or generic/388.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_inode.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 61d1cb7..8012741 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
 }
 
 /*
+ * Free any local-format buffers sitting around before we reset to
+ * extents format.
+ */
+static inline void
+xfs_ifree_local_data(
+	struct xfs_inode	*ip,
+	int			whichfork)
+{
+	struct xfs_ifork	*ifp;
+
+	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
+		return;
+
+	ifp = XFS_IFORK_PTR(ip, whichfork);
+	xfs_idata_realloc(ip, -ifp->if_bytes, whichfork);
+}
+
+/*
  * This is called to return an inode to the inode free list.
  * The inode should already be truncated to 0 length and have
  * no pages associated with it.  This routine also assumes that
@@ -2437,6 +2455,9 @@ xfs_ifree(
 	if (error)
 		return error;
 
+	xfs_ifree_local_data(ip, XFS_DATA_FORK);
+	xfs_ifree_local_data(ip, XFS_ATTR_FORK);
+
 	VFS_I(ip)->i_mode = 0;		/* mark incore inode as free */
 	ip->i_d.di_flags = 0;
 	ip->i_d.di_dmevmask = 0;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2017-11-23  6:01 [PATCH] xfs: always free inline data before resetting inode fork during ifree Darrick J. Wong
@ 2017-11-23  8:14 ` Christoph Hellwig
  2018-03-23  1:30 ` Luis R. Rodriguez
  1 sibling, 0 replies; 34+ messages in thread
From: Christoph Hellwig @ 2017-11-23  8:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2017-11-23  6:01 [PATCH] xfs: always free inline data before resetting inode fork during ifree Darrick J. Wong
  2017-11-23  8:14 ` Christoph Hellwig
@ 2018-03-23  1:30 ` Luis R. Rodriguez
  2018-03-23  3:41   ` Darrick J. Wong
  1 sibling, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2018-03-23  1:30 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> In xfs_ifree, we reset the data/attr forks to extents format without
> bothering to free any inline data buffer that might still be around
> after all the blocks have been truncated off the file.  Prior to commit
> 43518812d2 ("xfs: remove support for inlining data/extents into the
> inode fork") nobody noticed because the leftover inline data after
> truncation was small enough to fit inside the inline buffer inside the
> fork itself.
> 
> However, now that we've removed the inline buffer, we /always/ have to
> free the inline data buffer or else we leak them like crazy.  This test
> was found by turning on kmemleak for generic/001 or generic/388.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_inode.c |   21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 61d1cb7..8012741 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
>  }
>  
>  /*
> + * Free any local-format buffers sitting around before we reset to
> + * extents format.
> + */
> +static inline void
> +xfs_ifree_local_data(
> +	struct xfs_inode	*ip,
> +	int			whichfork)
> +{
> +	struct xfs_ifork	*ifp;
> +
> +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> +		return;

I'm new to all this so this was a bit hard to follow. I'm confused with how
commit 43518812d2 ("xfs: remove support for inlining data/extents into the
inode fork") exacerbated the leak, isn't that commit about
XFS_DINODE_FMT_EXTENTS?

Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-23  1:30 ` Luis R. Rodriguez
@ 2018-03-23  3:41   ` Darrick J. Wong
  2018-03-23 17:08     ` Luis R. Rodriguez
  0 siblings, 1 reply; 34+ messages in thread
From: Darrick J. Wong @ 2018-03-23  3:41 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: xfs

On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > In xfs_ifree, we reset the data/attr forks to extents format without
> > bothering to free any inline data buffer that might still be around
> > after all the blocks have been truncated off the file.  Prior to commit
> > 43518812d2 ("xfs: remove support for inlining data/extents into the
> > inode fork") nobody noticed because the leftover inline data after
> > truncation was small enough to fit inside the inline buffer inside the
> > fork itself.
> > 
> > However, now that we've removed the inline buffer, we /always/ have to
> > free the inline data buffer or else we leak them like crazy.  This test
> > was found by turning on kmemleak for generic/001 or generic/388.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_inode.c |   21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 61d1cb7..8012741 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> >  }
> >  
> >  /*
> > + * Free any local-format buffers sitting around before we reset to
> > + * extents format.
> > + */
> > +static inline void
> > +xfs_ifree_local_data(
> > +	struct xfs_inode	*ip,
> > +	int			whichfork)
> > +{
> > +	struct xfs_ifork	*ifp;
> > +
> > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> > +		return;
> 
> I'm new to all this so this was a bit hard to follow. I'm confused with how
> commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> inode fork") exacerbated the leak, isn't that commit about
> XFS_DINODE_FMT_EXTENTS?

Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
incore data was small enough to fit in if_inline_ata.

> Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?

An empty directory is 6 bytes, which is what you get with a fresh mkdir
or after deleting everything in the directory.  Prior to the 43518812d2
patch we could get away with not even checking if we had to free if_data
when deleting a directory because it fit within if_inline_data.

--D

>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-23  3:41   ` Darrick J. Wong
@ 2018-03-23 17:08     ` Luis R. Rodriguez
  2018-03-23 17:26       ` Darrick J. Wong
  0 siblings, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2018-03-23 17:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Luis R. Rodriguez, xfs

On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 61d1cb7..8012741 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> > >  }
> > >  
> > >  /*
> > > + * Free any local-format buffers sitting around before we reset to
> > > + * extents format.
> > > + */
> > > +static inline void
> > > +xfs_ifree_local_data(
> > > +	struct xfs_inode	*ip,
> > > +	int			whichfork)
> > > +{
> > > +	struct xfs_ifork	*ifp;
> > > +
> > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> > > +		return;
> > 
> > I'm new to all this so this was a bit hard to follow. I'm confused with how
> > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> > inode fork") exacerbated the leak, isn't that commit about
> > XFS_DINODE_FMT_EXTENTS?
> 
> Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> incore data was small enough to fit in if_inline_ata.

Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.

> > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> 
> An empty directory is 6 bytes, which is what you get with a fresh mkdir
> or after deleting everything in the directory.  Prior to the 43518812d2
> patch we could get away with not even checking if we had to free if_data
> when deleting a directory because it fit within if_inline_data.

Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-23 17:08     ` Luis R. Rodriguez
@ 2018-03-23 17:26       ` Darrick J. Wong
  2018-03-23 18:23         ` Luis R. Rodriguez
  0 siblings, 1 reply; 34+ messages in thread
From: Darrick J. Wong @ 2018-03-23 17:26 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: xfs

On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > index 61d1cb7..8012741 100644
> > > > --- a/fs/xfs/xfs_inode.c
> > > > +++ b/fs/xfs/xfs_inode.c
> > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> > > >  }
> > > >  
> > > >  /*
> > > > + * Free any local-format buffers sitting around before we reset to
> > > > + * extents format.
> > > > + */
> > > > +static inline void
> > > > +xfs_ifree_local_data(
> > > > +	struct xfs_inode	*ip,
> > > > +	int			whichfork)
> > > > +{
> > > > +	struct xfs_ifork	*ifp;
> > > > +
> > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> > > > +		return;
> > > 
> > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> > > inode fork") exacerbated the leak, isn't that commit about
> > > XFS_DINODE_FMT_EXTENTS?
> > 
> > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> > incore data was small enough to fit in if_inline_ata.
> 
> Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> 
> > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> > 
> > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> > or after deleting everything in the directory.  Prior to the 43518812d2
> > patch we could get away with not even checking if we had to free if_data
> > when deleting a directory because it fit within if_inline_data.
> 
> Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.

You'd have to modify the patch so that it doesn't try to kmem_free
if_data if if_data == if_inline_data but otherwise (in theory) I think
that the concept applies to pre-4.15 kernels.

(YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)

--D

>   Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-23 17:26       ` Darrick J. Wong
@ 2018-03-23 18:23         ` Luis R. Rodriguez
  2018-03-24  9:06           ` Greg Kroah-Hartman
  2018-03-25 22:33           ` Dave Chinner
  0 siblings, 2 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2018-03-23 18:23 UTC (permalink / raw)
  To: Darrick J. Wong, Christoph Hellwig
  Cc: Luis R. Rodriguez, xfs, linux-kernel, Sasha Levin,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
> On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > > index 61d1cb7..8012741 100644
> > > > > --- a/fs/xfs/xfs_inode.c
> > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> > > > >  }
> > > > >  
> > > > >  /*
> > > > > + * Free any local-format buffers sitting around before we reset to
> > > > > + * extents format.
> > > > > + */
> > > > > +static inline void
> > > > > +xfs_ifree_local_data(
> > > > > +	struct xfs_inode	*ip,
> > > > > +	int			whichfork)
> > > > > +{
> > > > > +	struct xfs_ifork	*ifp;
> > > > > +
> > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> > > > > +		return;
> > > > 
> > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> > > > inode fork") exacerbated the leak, isn't that commit about
> > > > XFS_DINODE_FMT_EXTENTS?
> > > 
> > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> > > incore data was small enough to fit in if_inline_ata.
> > 
> > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> > 
> > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> > > 
> > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> > > or after deleting everything in the directory.  Prior to the 43518812d2
> > > patch we could get away with not even checking if we had to free if_data
> > > when deleting a directory because it fit within if_inline_data.
> > 
> > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
> 
> You'd have to modify the patch so that it doesn't try to kmem_free
> if_data if if_data == if_inline_data but otherwise (in theory) I think
> that the concept applies to pre-4.15 kernels.
> 
> (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)

Well... so we need a resolution and better get testing this already given that
*I believe* the new auto-selection algorithm used to cherry pick patches onto
stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
are prefixed with AUTOSEL, a recent discussion covered this in November 2017
[1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
inlining data/extents into the inode fork").

Sasha, Greg,

Can you confirm if the algorithm was used in this case?

Since both commits are merged on v4.15, this is a non-issue on >= 4.15.

I do wonder if other XFS folks are *at least* aware that the auto-selection
algorithm now currently merging patches onto stable for XFS?

FWIW I just finished completing review *all* the other stable commits merged on
XFS on v4.14 *and* v4.13.y and this was the only one that cried out as fishy...
so I would not use this as a reason to say we shouldn't use it for XFS,
specially in lieu of any formal active process which we can count on always
takes place for XFS stable patches. In fact, I'd say that if the auto-selection
algorithm was used we should be able to fine tune it with a bit more subsystem
involvement.  I can at least volunteer to help try to review the candidate
patches that AUTOSEL comes up with (any others?), but note I'm new to XFS... I
can also think of a few modifications to the algorithm but which I can make in
a separate thread. Anyway, provided this is reasonable with others, then
perhaps we can keep using it for XFS?

[0] https://soarsmu.github.io/papers/icse12-patch.pdf
[1] https://lkml.org/lkml/2017/11/21/486

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-23 18:23         ` Luis R. Rodriguez
@ 2018-03-24  9:06           ` Greg Kroah-Hartman
  2018-03-24 17:21             ` Darrick J. Wong
  2018-03-25 22:33           ` Dave Chinner
  1 sibling, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2018-03-24  9:06 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Darrick J. Wong, Christoph Hellwig, xfs, linux-kernel,
	Sasha Levin, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
> On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
> > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > > > index 61d1cb7..8012741 100644
> > > > > > --- a/fs/xfs/xfs_inode.c
> > > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > + * Free any local-format buffers sitting around before we reset to
> > > > > > + * extents format.
> > > > > > + */
> > > > > > +static inline void
> > > > > > +xfs_ifree_local_data(
> > > > > > +	struct xfs_inode	*ip,
> > > > > > +	int			whichfork)
> > > > > > +{
> > > > > > +	struct xfs_ifork	*ifp;
> > > > > > +
> > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> > > > > > +		return;
> > > > > 
> > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> > > > > inode fork") exacerbated the leak, isn't that commit about
> > > > > XFS_DINODE_FMT_EXTENTS?
> > > > 
> > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> > > > incore data was small enough to fit in if_inline_ata.
> > > 
> > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> > > 
> > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> > > > 
> > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> > > > or after deleting everything in the directory.  Prior to the 43518812d2
> > > > patch we could get away with not even checking if we had to free if_data
> > > > when deleting a directory because it fit within if_inline_data.
> > > 
> > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
> > 
> > You'd have to modify the patch so that it doesn't try to kmem_free
> > if_data if if_data == if_inline_data but otherwise (in theory) I think
> > that the concept applies to pre-4.15 kernels.
> > 
> > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
> 
> Well... so we need a resolution and better get testing this already given that
> *I believe* the new auto-selection algorithm used to cherry pick patches onto
> stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
> are prefixed with AUTOSEL, a recent discussion covered this in November 2017
> [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
> data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
> on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
> inlining data/extents into the inode fork").
> 
> Sasha, Greg,
> 
> Can you confirm if the algorithm was used in this case?

No idea.

I think xfs should just be added to the "blacklist" so that it is not
even looked at for these types of auto-selected patches.  Much like the
i915 driver currently is handled (it too is ignored for these patches
due to objections from the maintainers of it.)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-24  9:06           ` Greg Kroah-Hartman
@ 2018-03-24 17:21             ` Darrick J. Wong
  2018-03-26  4:54               ` Sasha Levin
  0 siblings, 1 reply; 34+ messages in thread
From: Darrick J. Wong @ 2018-03-24 17:21 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Luis R. Rodriguez, Christoph Hellwig, xfs, linux-kernel,
	Sasha Levin, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Sat, Mar 24, 2018 at 10:06:38AM +0100, Greg Kroah-Hartman wrote:
> On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
> > On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
> > > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> > > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> > > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> > > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > > > > index 61d1cb7..8012741 100644
> > > > > > > --- a/fs/xfs/xfs_inode.c
> > > > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> > > > > > >  }
> > > > > > >  
> > > > > > >  /*
> > > > > > > + * Free any local-format buffers sitting around before we reset to
> > > > > > > + * extents format.
> > > > > > > + */
> > > > > > > +static inline void
> > > > > > > +xfs_ifree_local_data(
> > > > > > > +	struct xfs_inode	*ip,
> > > > > > > +	int			whichfork)
> > > > > > > +{
> > > > > > > +	struct xfs_ifork	*ifp;
> > > > > > > +
> > > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> > > > > > > +		return;
> > > > > > 
> > > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> > > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> > > > > > inode fork") exacerbated the leak, isn't that commit about
> > > > > > XFS_DINODE_FMT_EXTENTS?
> > > > > 
> > > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> > > > > incore data was small enough to fit in if_inline_ata.
> > > > 
> > > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> > > > 
> > > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> > > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> > > > > 
> > > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> > > > > or after deleting everything in the directory.  Prior to the 43518812d2
> > > > > patch we could get away with not even checking if we had to free if_data
> > > > > when deleting a directory because it fit within if_inline_data.
> > > > 
> > > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
> > > 
> > > You'd have to modify the patch so that it doesn't try to kmem_free
> > > if_data if if_data == if_inline_data but otherwise (in theory) I think
> > > that the concept applies to pre-4.15 kernels.
> > > 
> > > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
> > 
> > Well... so we need a resolution and better get testing this already given that
> > *I believe* the new auto-selection algorithm used to cherry pick patches onto
> > stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
> > are prefixed with AUTOSEL, a recent discussion covered this in November 2017
> > [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
> > data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
> > on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
> > inlining data/extents into the inode fork").
> > 
> > Sasha, Greg,
> > 
> > Can you confirm if the algorithm was used in this case?
> 
> No idea.
> 
> I think xfs should just be added to the "blacklist" so that it is not
> even looked at for these types of auto-selected patches.  Much like the
> i915 driver currently is handled (it too is ignored for these patches
> due to objections from the maintainers of it.)

Just out of curiosity, how does this autoselection mechanism work today?
If it's smart enough to cherry pick patches, apply them to a kernel,
build the kernel and run xfstests, and propose the patches if nothing
weird happened, then I'd be interested in looking further.  I've nothing
against algorithmic selection per se, but I'd want to know more about
the data sets and parameters that feed the algorithm.

I did receive the AUTOSEL tagged patches a few days ago, but I couldn't
figure out what automated regression testing, if any, had been done; or
whether the patch submission was asking if we wanted it put into 4.14
or if it was a declaration that they were on their way in.  Excuse me
for being behind the times, but I'd gotten accustomed xfs patches only
ending up in the stable kernels because we'd deliberately put them
there. :)

If blacklisting xfs is more convenient then I'm happy to continue things
as they were.

--D

> thanks,
> 
> greg k-h
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-23 18:23         ` Luis R. Rodriguez
  2018-03-24  9:06           ` Greg Kroah-Hartman
@ 2018-03-25 22:33           ` Dave Chinner
  2018-03-26 23:54             ` Sasha Levin
  1 sibling, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2018-03-25 22:33 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Darrick J. Wong, Christoph Hellwig, xfs, linux-kernel,
	Sasha Levin, Greg Kroah-Hartman, Julia Lawall, Josh Triplett,
	Takashi Iwai, Michal Hocko, Joerg Roedel

On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
> On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
> > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > > > > index 61d1cb7..8012741 100644
> > > > > > --- a/fs/xfs/xfs_inode.c
> > > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> > > > > >  }
> > > > > >  
> > > > > >  /*
> > > > > > + * Free any local-format buffers sitting around before we reset to
> > > > > > + * extents format.
> > > > > > + */
> > > > > > +static inline void
> > > > > > +xfs_ifree_local_data(
> > > > > > +	struct xfs_inode	*ip,
> > > > > > +	int			whichfork)
> > > > > > +{
> > > > > > +	struct xfs_ifork	*ifp;
> > > > > > +
> > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> > > > > > +		return;
> > > > > 
> > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> > > > > inode fork") exacerbated the leak, isn't that commit about
> > > > > XFS_DINODE_FMT_EXTENTS?
> > > > 
> > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> > > > incore data was small enough to fit in if_inline_ata.
> > > 
> > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> > > 
> > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> > > > 
> > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> > > > or after deleting everything in the directory.  Prior to the 43518812d2
> > > > patch we could get away with not even checking if we had to free if_data
> > > > when deleting a directory because it fit within if_inline_data.
> > > 
> > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
> > 
> > You'd have to modify the patch so that it doesn't try to kmem_free
> > if_data if if_data == if_inline_data but otherwise (in theory) I think
> > that the concept applies to pre-4.15 kernels.
> > 
> > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
> 
> Well... so we need a resolution and better get testing this already given that
> *I believe* the new auto-selection algorithm used to cherry pick patches onto
> stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
> are prefixed with AUTOSEL, a recent discussion covered this in November 2017
> [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
> data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
> on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
> inlining data/extents into the inode fork").

Yikes. That sets off all my "how to break filesysetms for fun and
profit" alarm bells. This is like playing russian roulette with all
our user's data.  XFS fixes that look like they are simple often
have subtle dependencies in them that automated backports won't ever
be able to understand, and if we don't get that right, we break
stuff.

Filesystems aren't like drivers or memory management - you can't
reboot to fix a filesystem corruption or data loss bug. User's tend
to care a lot more about their data and cat photos than they do
about how often the dodgy hardware they bought on ebay needs OS
rebooting to get working again..

> I do wonder if other XFS folks are *at least* aware that the auto-selection
> algorithm now currently merging patches onto stable for XFS?

No I wasn't aware that this was happening.  I'm kinda shit scared
right now hearing about how automated backports of random kernel
patches are being done with minimal oversight and no visibility to
the subsystem developers. When did this start happening?

At this point I'd be much more comfortable if XFS was blacklisted
until there's solid subsystem developer visibility of the iautomated
backports, not to mention a solid set of automated regression
testing backing this automated backport proceedure.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-24 17:21             ` Darrick J. Wong
@ 2018-03-26  4:54               ` Sasha Levin
  2018-03-26  6:48                 ` Darrick J. Wong
  2018-03-26 17:39                 ` Luis R. Rodriguez
  0 siblings, 2 replies; 34+ messages in thread
From: Sasha Levin @ 2018-03-26  4:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Greg Kroah-Hartman, Luis R. Rodriguez, Christoph Hellwig, xfs,
	linux-kernel, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Sat, Mar 24, 2018 at 10:21:59AM -0700, Darrick J. Wong wrote:
>On Sat, Mar 24, 2018 at 10:06:38AM +0100, Greg Kroah-Hartman wrote:
>> On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
>> > On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
>> > > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
>> > > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
>> > > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
>> > > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
>> > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> > > > > > > index 61d1cb7..8012741 100644
>> > > > > > > --- a/fs/xfs/xfs_inode.c
>> > > > > > > +++ b/fs/xfs/xfs_inode.c
>> > > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
>> > > > > > >  }
>> > > > > > >
>> > > > > > >  /*
>> > > > > > > + * Free any local-format buffers sitting around before we reset to
>> > > > > > > + * extents format.
>> > > > > > > + */
>> > > > > > > +static inline void
>> > > > > > > +xfs_ifree_local_data(
>> > > > > > > +	struct xfs_inode	*ip,
>> > > > > > > +	int			whichfork)
>> > > > > > > +{
>> > > > > > > +	struct xfs_ifork	*ifp;
>> > > > > > > +
>> > > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
>> > > > > > > +		return;
>> > > > > >
>> > > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
>> > > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
>> > > > > > inode fork") exacerbated the leak, isn't that commit about
>> > > > > > XFS_DINODE_FMT_EXTENTS?
>> > > > >
>> > > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
>> > > > > incore data was small enough to fit in if_inline_ata.
>> > > >
>> > > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
>> > > >
>> > > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
>> > > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
>> > > > >
>> > > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
>> > > > > or after deleting everything in the directory.  Prior to the 43518812d2
>> > > > > patch we could get away with not even checking if we had to free if_data
>> > > > > when deleting a directory because it fit within if_inline_data.
>> > > >
>> > > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
>> > >
>> > > You'd have to modify the patch so that it doesn't try to kmem_free
>> > > if_data if if_data == if_inline_data but otherwise (in theory) I think
>> > > that the concept applies to pre-4.15 kernels.
>> > >
>> > > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
>> >
>> > Well... so we need a resolution and better get testing this already given that
>> > *I believe* the new auto-selection algorithm used to cherry pick patches onto
>> > stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
>> > are prefixed with AUTOSEL, a recent discussion covered this in November 2017
>> > [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
>> > data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
>> > on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
>> > inlining data/extents into the inode fork").
>> >
>> > Sasha, Greg,
>> >
>> > Can you confirm if the algorithm was used in this case?
>>
>> No idea.
>>
>> I think xfs should just be added to the "blacklist" so that it is not
>> even looked at for these types of auto-selected patches.  Much like the
>> i915 driver currently is handled (it too is ignored for these patches
>> due to objections from the maintainers of it.)
>
>Just out of curiosity, how does this autoselection mechanism work today?
>If it's smart enough to cherry pick patches, apply them to a kernel,
>build the kernel and run xfstests, and propose the patches if nothing
>weird happened, then I'd be interested in looking further.  I've nothing
>against algorithmic selection per se, but I'd want to know more about
>the data sets and parameters that feed the algorithm.

It won't go beyond build testing.

>I did receive the AUTOSEL tagged patches a few days ago, but I couldn't
>figure out what automated regression testing, if any, had been done; or
>whether the patch submission was asking if we wanted it put into 4.14
>or if it was a declaration that they were on their way in.  Excuse me

There would be (at least) 3 different mails involved in this process:

 1. You'd get a mail from me, proposing this patch for stable. We give
 at least 1 week (but usually closer to 2) to comment on whether this
 patch should or should not go in stable.

 2. If no objections were received, Greg would add it to his queue and
 you'd get another mail about that.

 3. A few more days later, Greg would release that stable tree and you'd
 get another mail.

>for being behind the times, but I'd gotten accustomed xfs patches only
>ending up in the stable kernels because we'd deliberately put them
>there. :)
>
>If blacklisting xfs is more convenient then I'm happy to continue things
>as they were.

No problem with blacklisting subsystems if maintainers prefer it that
way, but the i915 case was slightly different as their development
process was very quirky and testing was complex, so they asked to just
keep doing their own selection for stable.

However, looking at stable history, it seems that no patch from fs/xfs/
was proposed for stable for about half a year now, which is something
that the autoselection project is trying to help with.

A different flow I'm working on for this is to send an email as a reply
to the original patch submission to lkml if the patch is selected by the
network, including details about which trees it was applied to and build
results. I think it might work better for subsystems such as xfs.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-26  4:54               ` Sasha Levin
@ 2018-03-26  6:48                 ` Darrick J. Wong
  2018-03-26 17:39                 ` Luis R. Rodriguez
  1 sibling, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2018-03-26  6:48 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Greg Kroah-Hartman, Luis R. Rodriguez, Christoph Hellwig, xfs,
	linux-kernel, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Mon, Mar 26, 2018 at 04:54:59AM +0000, Sasha Levin wrote:
> On Sat, Mar 24, 2018 at 10:21:59AM -0700, Darrick J. Wong wrote:
> >On Sat, Mar 24, 2018 at 10:06:38AM +0100, Greg Kroah-Hartman wrote:
> >> On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
> >> > On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
> >> > > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> >> > > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> >> > > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> >> > > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> >> > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> > > > > > > index 61d1cb7..8012741 100644
> >> > > > > > > --- a/fs/xfs/xfs_inode.c
> >> > > > > > > +++ b/fs/xfs/xfs_inode.c
> >> > > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> >> > > > > > >  }
> >> > > > > > >
> >> > > > > > >  /*
> >> > > > > > > + * Free any local-format buffers sitting around before we reset to
> >> > > > > > > + * extents format.
> >> > > > > > > + */
> >> > > > > > > +static inline void
> >> > > > > > > +xfs_ifree_local_data(
> >> > > > > > > +	struct xfs_inode	*ip,
> >> > > > > > > +	int			whichfork)
> >> > > > > > > +{
> >> > > > > > > +	struct xfs_ifork	*ifp;
> >> > > > > > > +
> >> > > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> >> > > > > > > +		return;
> >> > > > > >
> >> > > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> >> > > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> >> > > > > > inode fork") exacerbated the leak, isn't that commit about
> >> > > > > > XFS_DINODE_FMT_EXTENTS?
> >> > > > >
> >> > > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> >> > > > > incore data was small enough to fit in if_inline_ata.
> >> > > >
> >> > > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> >> > > >
> >> > > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> >> > > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> >> > > > >
> >> > > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> >> > > > > or after deleting everything in the directory.  Prior to the 43518812d2
> >> > > > > patch we could get away with not even checking if we had to free if_data
> >> > > > > when deleting a directory because it fit within if_inline_data.
> >> > > >
> >> > > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
> >> > >
> >> > > You'd have to modify the patch so that it doesn't try to kmem_free
> >> > > if_data if if_data == if_inline_data but otherwise (in theory) I think
> >> > > that the concept applies to pre-4.15 kernels.
> >> > >
> >> > > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
> >> >
> >> > Well... so we need a resolution and better get testing this already given that
> >> > *I believe* the new auto-selection algorithm used to cherry pick patches onto
> >> > stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
> >> > are prefixed with AUTOSEL, a recent discussion covered this in November 2017
> >> > [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
> >> > data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
> >> > on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
> >> > inlining data/extents into the inode fork").
> >> >
> >> > Sasha, Greg,
> >> >
> >> > Can you confirm if the algorithm was used in this case?
> >>
> >> No idea.
> >>
> >> I think xfs should just be added to the "blacklist" so that it is not
> >> even looked at for these types of auto-selected patches.  Much like the
> >> i915 driver currently is handled (it too is ignored for these patches
> >> due to objections from the maintainers of it.)
> >
> >Just out of curiosity, how does this autoselection mechanism work today?
> >If it's smart enough to cherry pick patches, apply them to a kernel,
> >build the kernel and run xfstests, and propose the patches if nothing
> >weird happened, then I'd be interested in looking further.  I've nothing
> >against algorithmic selection per se, but I'd want to know more about
> >the data sets and parameters that feed the algorithm.
> 
> It won't go beyond build testing.

No further regression testing ==> please blacklist XFS.

We will continue our current practices w.r.t. stable.

--D

> >I did receive the AUTOSEL tagged patches a few days ago, but I couldn't
> >figure out what automated regression testing, if any, had been done; or
> >whether the patch submission was asking if we wanted it put into 4.14
> >or if it was a declaration that they were on their way in.  Excuse me
> 
> There would be (at least) 3 different mails involved in this process:
> 
>  1. You'd get a mail from me, proposing this patch for stable. We give
>  at least 1 week (but usually closer to 2) to comment on whether this
>  patch should or should not go in stable.
> 
>  2. If no objections were received, Greg would add it to his queue and
>  you'd get another mail about that.
> 
>  3. A few more days later, Greg would release that stable tree and you'd
>  get another mail.
> 
> >for being behind the times, but I'd gotten accustomed xfs patches only
> >ending up in the stable kernels because we'd deliberately put them
> >there. :)
> >
> >If blacklisting xfs is more convenient then I'm happy to continue things
> >as they were.
> 
> No problem with blacklisting subsystems if maintainers prefer it that
> way, but the i915 case was slightly different as their development
> process was very quirky and testing was complex, so they asked to just
> keep doing their own selection for stable.
> 
> However, looking at stable history, it seems that no patch from fs/xfs/
> was proposed for stable for about half a year now, which is something
> that the autoselection project is trying to help with.
> 
> A different flow I'm working on for this is to send an email as a reply
> to the original patch submission to lkml if the patch is selected by the
> network, including details about which trees it was applied to and build
> results. I think it might work better for subsystems such as xfs.
> 
> 
> --
> Thanks,
> Sasha--
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-26  4:54               ` Sasha Levin
  2018-03-26  6:48                 ` Darrick J. Wong
@ 2018-03-26 17:39                 ` Luis R. Rodriguez
  1 sibling, 0 replies; 34+ messages in thread
From: Luis R. Rodriguez @ 2018-03-26 17:39 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Darrick J. Wong, Greg Kroah-Hartman, Luis R. Rodriguez,
	Christoph Hellwig, xfs, linux-kernel, Julia Lawall,
	Josh Triplett, Takashi Iwai, Michal Hocko, Joerg Roedel

On Mon, Mar 26, 2018 at 04:54:59AM +0000, Sasha Levin wrote:
> On Sat, Mar 24, 2018 at 10:21:59AM -0700, Darrick J. Wong wrote:
> >On Sat, Mar 24, 2018 at 10:06:38AM +0100, Greg Kroah-Hartman wrote:
> >> On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
> >> > On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
> >> > > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> >> > > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> >> > > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> >> > > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> >> > > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> > > > > > > index 61d1cb7..8012741 100644
> >> > > > > > > --- a/fs/xfs/xfs_inode.c
> >> > > > > > > +++ b/fs/xfs/xfs_inode.c
> >> > > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> >> > > > > > >  }
> >> > > > > > >
> >> > > > > > >  /*
> >> > > > > > > + * Free any local-format buffers sitting around before we reset to
> >> > > > > > > + * extents format.
> >> > > > > > > + */
> >> > > > > > > +static inline void
> >> > > > > > > +xfs_ifree_local_data(
> >> > > > > > > +	struct xfs_inode	*ip,
> >> > > > > > > +	int			whichfork)
> >> > > > > > > +{
> >> > > > > > > +	struct xfs_ifork	*ifp;
> >> > > > > > > +
> >> > > > > > > +	if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> >> > > > > > > +		return;
> >> > > > > >
> >> > > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> >> > > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> >> > > > > > inode fork") exacerbated the leak, isn't that commit about
> >> > > > > > XFS_DINODE_FMT_EXTENTS?
> >> > > > >
> >> > > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> >> > > > > incore data was small enough to fit in if_inline_ata.
> >> > > >
> >> > > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> >> > > >
> >> > > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> >> > > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> >> > > > >
> >> > > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> >> > > > > or after deleting everything in the directory.  Prior to the 43518812d2
> >> > > > > patch we could get away with not even checking if we had to free if_data
> >> > > > > when deleting a directory because it fit within if_inline_data.
> >> > > >
> >> > > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
> >> > >
> >> > > You'd have to modify the patch so that it doesn't try to kmem_free
> >> > > if_data if if_data == if_inline_data but otherwise (in theory) I think
> >> > > that the concept applies to pre-4.15 kernels.
> >> > >
> >> > > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
> >> >
> >> > Well... so we need a resolution and better get testing this already given that
> >> > *I believe* the new auto-selection algorithm used to cherry pick patches onto
> >> > stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
> >> > are prefixed with AUTOSEL, a recent discussion covered this in November 2017
> >> > [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
> >> > data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
> >> > on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
> >> > inlining data/extents into the inode fork").
> >> >
> >> > Sasha, Greg,
> >> >
> >> > Can you confirm if the algorithm was used in this case?
> >>
> >> No idea.
> >>
> >> I think xfs should just be added to the "blacklist" so that it is not
> >> even looked at for these types of auto-selected patches.  Much like the
> >> i915 driver currently is handled (it too is ignored for these patches
> >> due to objections from the maintainers of it.)
> >
> >Just out of curiosity, how does this autoselection mechanism work today?
> >If it's smart enough to cherry pick patches, apply them to a kernel,
> >build the kernel and run xfstests, and propose the patches if nothing
> >weird happened, then I'd be interested in looking further.  I've nothing
> >against algorithmic selection per se, but I'd want to know more about
> >the data sets and parameters that feed the algorithm.
> 
> It won't go beyond build testing.

This is one area where we can improve things.

Perhaps the typical use case for the kernel for looking for stable fixes is to
groom for good candidate fixes using some heuristics, test compile and if there
is no issue merge. It would be not so different than a human using their own
experience of some sort using a pattern to do the same. However the big
difference for filesystems is we *need* proper testing.

The issue with filesystems is the consequence of a bug are much more severe
than just a panic. Any corner case issue can really be much more critical than
just crashing a system. Ruining your filesystem is simply unforgivable,
specially if all that was missing was a proper regression test, and we already
have infrastructure for it.

The good news is that we have proper a proper testing suite to help out, and as
per my inspection 0-day already embraces a few tests. We can do a bit more here
for expanding on these... but also use a baseline for testing proposed XFS stable
fixes. For now I suggest a bit of manual work and I can volunteer to lead this
on, with a transparent mechanism to enable other folks to reproduce similar
testing in an easy way. I had in mind a mechanism to automate this long term,
and I started working on something, so I can use that to start this effort,
and publish what I have soon.

But that would not be enough, we need proper testing and careful oversight. I
can volunteer to review the patches manually (any others?) and if any questions
come up bring them up and check / verify with others.

The new autoselection truly is state of the art and we'd be silly to not
address corner cases for subsystems to help take advantage of the candidate
patches it provides. My inspection so far revealed only one possible issue
fix, that is not bad at all.

> >I did receive the AUTOSEL tagged patches a few days ago, but I couldn't
> >figure out what automated regression testing, if any, had been done; or
> >whether the patch submission was asking if we wanted it put into 4.14
> >or if it was a declaration that they were on their way in.  Excuse me
> 
> There would be (at least) 3 different mails involved in this process:
> 
>  1. You'd get a mail from me, proposing this patch for stable. We give
>  at least 1 week (but usually closer to 2) to comment on whether this
>  patch should or should not go in stable.
> 
>  2. If no objections were received, Greg would add it to his queue and
>  you'd get another mail about that.

Is there a temporary tree with all the patches already merged by any chance?
That would help with testing the queue of candidate fixes.

>  3. A few more days later, Greg would release that stable tree and you'd
>  get another mail.

What I propose is for XFS we add a few more steps:

1a. Ensure the fixes are in a temporary tree
1b. Add me to the list of reviewers
1c. I'll run some tests against it compared to a baseline which would be
    pre-established
1d. While the tests are running do a manual inspection of the patches
1e. Only if patches produce no regressions with the established baseline and get
    a Reviewed-by by at least one developer do we publish to stable

I can work on establishing the baseline with the community next.

> >for being behind the times, but I'd gotten accustomed xfs patches only
> >ending up in the stable kernels because we'd deliberately put them
> >there. :)
> >
> >If blacklisting xfs is more convenient then I'm happy to continue things
> >as they were.
> 
> No problem with blacklisting subsystems if maintainers prefer it that
> way, but the i915 case was slightly different as their development
> process was very quirky and testing was complex, so they asked to just
> keep doing their own selection for stable.

filesystems are the same -- the issue is the risk of an issue is much more
severe.

> However, looking at stable history, it seems that no patch from fs/xfs/
> was proposed for stable for about half a year now, which is something
> that the autoselection project is trying to help with.

Indeed. I have addressed the stable question on XFS in person with at the small
BoF at Vault last year, and with other folks later. The current process may
work for some... but I think we can do better and it just requires volunteers,
and I think using the auto-selection process to come up with *candidates* is a
great opportunity, specially in light of the patches I have seen so far already
merged.

My testing of them also proved no regressions (modulo I skipped the one I
pointed out on this thread.

> A different flow I'm working on for this is to send an email as a reply
> to the original patch submission to lkml if the patch is selected by the
> network, including details about which trees it was applied to and build
> results. I think it might work better for subsystems such as xfs.

We need testing. Without testing this cannot fly.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-25 22:33           ` Dave Chinner
@ 2018-03-26 23:54             ` Sasha Levin
  2018-03-27  7:06               ` Michal Hocko
  2018-03-28  3:32               ` Dave Chinner
  0 siblings, 2 replies; 34+ messages in thread
From: Sasha Levin @ 2018-03-26 23:54 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Luis R. Rodriguez, Darrick J. Wong, Christoph Hellwig, xfs,
	linux-kernel@vger.kernel.org List, Sasha Levin,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Sun, Mar 25, 2018 at 6:33 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
>> On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
>> > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
>> > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
>> > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
>> > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
>> > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
>> > > > > > index 61d1cb7..8012741 100644
>> > > > > > --- a/fs/xfs/xfs_inode.c
>> > > > > > +++ b/fs/xfs/xfs_inode.c
>> > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
>> > > > > >  }
>> > > > > >
>> > > > > >  /*
>> > > > > > + * Free any local-format buffers sitting around before we reset to
>> > > > > > + * extents format.
>> > > > > > + */
>> > > > > > +static inline void
>> > > > > > +xfs_ifree_local_data(
>> > > > > > +   struct xfs_inode        *ip,
>> > > > > > +   int                     whichfork)
>> > > > > > +{
>> > > > > > +   struct xfs_ifork        *ifp;
>> > > > > > +
>> > > > > > +   if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
>> > > > > > +           return;
>> > > > >
>> > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
>> > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
>> > > > > inode fork") exacerbated the leak, isn't that commit about
>> > > > > XFS_DINODE_FMT_EXTENTS?
>> > > >
>> > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
>> > > > incore data was small enough to fit in if_inline_ata.
>> > >
>> > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
>> > >
>> > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
>> > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
>> > > >
>> > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
>> > > > or after deleting everything in the directory.  Prior to the 43518812d2
>> > > > patch we could get away with not even checking if we had to free if_data
>> > > > when deleting a directory because it fit within if_inline_data.
>> > >
>> > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
>> >
>> > You'd have to modify the patch so that it doesn't try to kmem_free
>> > if_data if if_data == if_inline_data but otherwise (in theory) I think
>> > that the concept applies to pre-4.15 kernels.
>> >
>> > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
>>
>> Well... so we need a resolution and better get testing this already given that
>> *I believe* the new auto-selection algorithm used to cherry pick patches onto
>> stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
>> are prefixed with AUTOSEL, a recent discussion covered this in November 2017
>> [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
>> data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
>> on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
>> inlining data/extents into the inode fork").
>
> Yikes. That sets off all my "how to break filesysetms for fun and
> profit" alarm bells. This is like playing russian roulette with all
> our user's data.  XFS fixes that look like they are simple often
> have subtle dependencies in them that automated backports won't ever
> be able to understand, and if we don't get that right, we break
> stuff.

On the other hand, XFS has a few commits that fix possible
corruptions, that have never ended up in a stable tree. Isn't it just
as bad ("playing roulette") for users?

> Filesystems aren't like drivers or memory management - you can't
> reboot to fix a filesystem corruption or data loss bug. User's tend
> to care a lot more about their data and cat photos than they do
> about how often the dodgy hardware they bought on ebay needs OS
> rebooting to get working again..

Thank you for your input Dave. Let me give you the background for why
I've been doing it this way up to now to explain my reasoning, and
where I was wrong.

After I've built the initial framework for this, I ran it through a
set of kernel versions and ended up with a massively large set of
commits that were detected as bug fixes but didn't end up getting in
stable trees.

My thinking back then was that I can start building branches of
proposed commits, send them to their respective authors/maintainers,
and remove any patches that maintainers objected to their inclusion.
This process ran for a few months, and a few thousand patches (between
all kernel trees) went in this way. The rate of rejection (either when
a maintainer objects to an inclusion, or a regression discovered after
a kernel was released) was on par (and even lower) than the one for
"regular" patches tagged for stable.

What I didn't account for correctly was that people are way more
busier than I have expected, so a lot of these mails were lost or
ignored in mailboxes, so some of these patches never received review
before going in a stable tree.

I'm trying to fix this with a different approach (more below).

>> I do wonder if other XFS folks are *at least* aware that the auto-selection
>> algorithm now currently merging patches onto stable for XFS?
>
> No I wasn't aware that this was happening.  I'm kinda shit scared
> right now hearing about how automated backports of random kernel
> patches are being done with minimal oversight and no visibility to
> the subsystem developers. When did this start happening?

About half a year ago. I'm not sure about the no visibility part -
maintainers and authors would receive at least 3 mails for each patch
that got in this way, and would have at least a week (usually a lot
more) to object to the inclusion. Did you not receive any mails from
me?

> At this point I'd be much more comfortable if XFS was blacklisted
> until there's solid subsystem developer visibility of the iautomated
> backports, not to mention a solid set of automated regression
> testing backing this automated backport proceedure.

I'll be winding down what I'm trying to do now, and will be trying to
address these concerns from maintainers in a few different ways.

I've started working on a framework to automate reviews of sent
patches to lkml by my framework, this will allow me to do the
following:

 - I would send a reply to the original patch sent to LKML within a
few hours for patches that have a high probability for being a bug fix
rather than sending a brand new mail a few months after this patch
made it upstream. This will help reviews as this commit is still fresh
in the author+maintainers head.
 - I will include the results of builds for various build testing (I
got that working now). At this point I suspect this will mostly help
Greg with patches that are already sent with stable tags.
 - This will turn into an opt-in rather than opt-out, but it will be
extremely easy to opt in (something like replying with "ack" to have
that patch included in the proposed stable branches).
 - In the future, I'd also like to create a per-subsystem testing
procedure (so for example, for xfs - run xfstest). I'll try working
with maintainers of each subsystem to create something they're happy
with. Given this discussion, I'll make XFS my first attempt at this :)

The mails will look something like this (an example based on a recent
XFS commit):

> From: Sasha Levin <alexander.levin@microsoft.com>
> To: Sasha Levin <alexander.levin@microsoft.com>
> To: linux-xfs@vger.kernel.org, "Darrick J . Wong" <darrick.wong@oracle.com>
> Cc: Brian Foster <bfoster@redhat.com>, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] xfs: Correctly invert xfs_buftarg LRU isolation logic
> In-Reply-To: <20180306102638.25322-1-vbendel@redhat.com>
> References: <20180306102638.25322-1-vbendel@redhat.com>
>
> Hi Vratislav Bendel,
>
> [This is an automated email]
>
> This commit has been processed by the -stable helper bot and determined
> to be a high probability candidate for -stable trees. (score: 6.4845)
>
> The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
>
> v4.15.12: OK!
> v4.14.29: OK!
> v4.9.89: OK!
> v4.4.123: OK!
> v4.1.50: OK!
> v3.18.101: OK!
>
> Please reply with "ack" to have this patch included in the appropriate stable trees.
>
> --
> Thanks,
> Sasha

If you look at the recent history for fs/xfs, there were no commits in
the past half a year or so that were submitted to any stable tree in
the "traditional" way. There are no XFS fixes in the 4.14 LTS tree
besides the ones submitted with the autoselection method. This is not
finger pointing at XFS, but rather at the -stable process itself. It's
difficult to keep track on which branches authors need to test their
patches on, what sort of tests they need to do, and how they should
tag their commits. In quite a few cases the effort to properly tag a
commit for stable takes more effort that writing the code for that
commit, which deters people from working with stable.

Thanks again for your input.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-26 23:54             ` Sasha Levin
@ 2018-03-27  7:06               ` Michal Hocko
  2018-03-27 19:54                 ` Luis R. Rodriguez
  2018-03-28  1:11                 ` Sasha Levin
  2018-03-28  3:32               ` Dave Chinner
  1 sibling, 2 replies; 34+ messages in thread
From: Michal Hocko @ 2018-03-27  7:06 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Chinner, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Sasha Levin, Greg Kroah-Hartman, Julia Lawall, Josh Triplett,
	Takashi Iwai, Joerg Roedel

On Mon 26-03-18 19:54:31, Sasha Levin wrote:
[...]
> About half a year ago. I'm not sure about the no visibility part -
> maintainers and authors would receive at least 3 mails for each patch
> that got in this way, and would have at least a week (usually a lot
> more) to object to the inclusion. Did you not receive any mails from
> me?

Well, I was aware of your emails yet my free time didn't really allow me
to follow those patch bombs. I responded only when some email subject
hit my eyes as being non-stable candidate. So by no means the MM
backports were reviewed by me. And considering how hard it is to get any
review for MM patches in general I strongly suspect that others didn't
review either.

In general I am quite skeptical about the automagic backports
selections, to be honest. MM patches should be reasonably good at
selecting stable backports and adding more patches on top just risks
regressions. 
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-27  7:06               ` Michal Hocko
@ 2018-03-27 19:54                 ` Luis R. Rodriguez
  2018-03-28 13:21                   ` Michal Hocko
  2018-03-28  1:11                 ` Sasha Levin
  1 sibling, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2018-03-27 19:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Sasha Levin, Dave Chinner, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Sasha Levin, Greg Kroah-Hartman, Julia Lawall, Josh Triplett,
	Takashi Iwai, Joerg Roedel

On Tue, Mar 27, 2018 at 09:06:37AM +0200, Michal Hocko wrote:
> So by no means the MM backports were reviewed by me. And considering how hard
> it is to get any review for MM patches in general I strongly suspect that
> others didn't review either.
> 
> In general I am quite skeptical about the automagic backports
> selections, to be honest. MM patches should be reasonably good at
> selecting stable backports and adding more patches on top just risks
> regressions. 

BTW other than suggesting we needing *actual review* of the MM patches, are
there known unit tests which could be run as well? Thinking long term.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-27  7:06               ` Michal Hocko
  2018-03-27 19:54                 ` Luis R. Rodriguez
@ 2018-03-28  1:11                 ` Sasha Levin
  2018-03-28 13:20                   ` Michal Hocko
  1 sibling, 1 reply; 34+ messages in thread
From: Sasha Levin @ 2018-03-28  1:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Sasha Levin, Dave Chinner, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Joerg Roedel

On Tue, Mar 27, 2018 at 09:06:37AM +0200, Michal Hocko wrote:
>On Mon 26-03-18 19:54:31, Sasha Levin wrote:
>[...]
>> About half a year ago. I'm not sure about the no visibility part -
>> maintainers and authors would receive at least 3 mails for each patch
>> that got in this way, and would have at least a week (usually a lot
>> more) to object to the inclusion. Did you not receive any mails from
>> me?
>
>Well, I was aware of your emails yet my free time didn't really allow me
>to follow those patch bombs. I responded only when some email subject
>hit my eyes as being non-stable candidate. So by no means the MM
>backports were reviewed by me. And considering how hard it is to get any
>review for MM patches in general I strongly suspect that others didn't
>review either.

Being aware is different than saying it was snuck in without anyone
knowing.

>In general I am quite skeptical about the automagic backports
>selections, to be honest. MM patches should be reasonably good at
>selecting stable backports and adding more patches on top just risks
>regressions.

Okay, let's take mm/ as a subsystem that is doing a good job with
submitting stuff to -stable.

There were 40 patches in total going into the 4.14 LTS tree that touched
mm/. Out of those, 5 were selected via the "automagic" process:

> 647a37ec1a17 mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'
> d91c3f2e540f mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep
> b2ba0bd34695 kmemleak: add scheduling point to kmemleak_scan()
> 5ca94e03675a slub: fix sysfs duplicate filename creation when slub_debug=O
> 342ee8775800 mm, x86/mm: Fix performance regression in get_user_pages_fast()

The only "questionable" one here I see is the performance regression
one, which I decided to keep in because it's a rather severe one in a
common code path.

Even good subsystems might sometimes miss a patch or two.

But yes, I've received less response from maintainers than I expected,
so I'll switch it to an opt-in model for new patches that go upstream.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-26 23:54             ` Sasha Levin
  2018-03-27  7:06               ` Michal Hocko
@ 2018-03-28  3:32               ` Dave Chinner
  2018-03-28 19:30                 ` Sasha Levin
  1 sibling, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2018-03-28  3:32 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Luis R. Rodriguez, Darrick J. Wong, Christoph Hellwig, xfs,
	linux-kernel@vger.kernel.org List, Sasha Levin,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Mon, Mar 26, 2018 at 07:54:31PM -0400, Sasha Levin wrote:
> On Sun, Mar 25, 2018 at 6:33 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Fri, Mar 23, 2018 at 06:23:02PM +0000, Luis R. Rodriguez wrote:
> >> On Fri, Mar 23, 2018 at 10:26:20AM -0700, Darrick J. Wong wrote:
> >> > On Fri, Mar 23, 2018 at 05:08:13PM +0000, Luis R. Rodriguez wrote:
> >> > > On Thu, Mar 22, 2018 at 08:41:45PM -0700, Darrick J. Wong wrote:
> >> > > > On Fri, Mar 23, 2018 at 01:30:37AM +0000, Luis R. Rodriguez wrote:
> >> > > > > On Wed, Nov 22, 2017 at 10:01:37PM -0800, Darrick J. Wong wrote:
> >> > > > > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> >> > > > > > index 61d1cb7..8012741 100644
> >> > > > > > --- a/fs/xfs/xfs_inode.c
> >> > > > > > +++ b/fs/xfs/xfs_inode.c
> >> > > > > > @@ -2401,6 +2401,24 @@ xfs_ifree_cluster(
> >> > > > > >  }
> >> > > > > >
> >> > > > > >  /*
> >> > > > > > + * Free any local-format buffers sitting around before we reset to
> >> > > > > > + * extents format.
> >> > > > > > + */
> >> > > > > > +static inline void
> >> > > > > > +xfs_ifree_local_data(
> >> > > > > > +   struct xfs_inode        *ip,
> >> > > > > > +   int                     whichfork)
> >> > > > > > +{
> >> > > > > > +   struct xfs_ifork        *ifp;
> >> > > > > > +
> >> > > > > > +   if (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL)
> >> > > > > > +           return;
> >> > > > >
> >> > > > > I'm new to all this so this was a bit hard to follow. I'm confused with how
> >> > > > > commit 43518812d2 ("xfs: remove support for inlining data/extents into the
> >> > > > > inode fork") exacerbated the leak, isn't that commit about
> >> > > > > XFS_DINODE_FMT_EXTENTS?
> >> > > >
> >> > > > Not specifically _EXTENTS, merely any fork (EXTENTS or LOCAL) whose
> >> > > > incore data was small enough to fit in if_inline_ata.
> >> > >
> >> > > Got it, I thought those were XFS_DINODE_FMT_EXTENTS by definition.
> >> > >
> >> > > > > Did we have cases where the format was XFS_DINODE_FMT_LOCAL and yet
> >> > > > > ifp->if_u1.if_data == ifp->if_u2.if_inline_data ?
> >> > > >
> >> > > > An empty directory is 6 bytes, which is what you get with a fresh mkdir
> >> > > > or after deleting everything in the directory.  Prior to the 43518812d2
> >> > > > patch we could get away with not even checking if we had to free if_data
> >> > > > when deleting a directory because it fit within if_inline_data.
> >> > >
> >> > > Ah got it. So your fix *is* also applicable even prior to commit 43518812d2.
> >> >
> >> > You'd have to modify the patch so that it doesn't try to kmem_free
> >> > if_data if if_data == if_inline_data but otherwise (in theory) I think
> >> > that the concept applies to pre-4.15 kernels.
> >> >
> >> > (YMMV, please do run this through QA/kmemleak just in case I'm wrong, etc...)
> >>
> >> Well... so we need a resolution and better get testing this already given that
> >> *I believe* the new auto-selection algorithm used to cherry pick patches onto
> >> stable for linux-4.14.y (covered on a paper [0] and when used, stable patches
> >> are prefixed with AUTOSEL, a recent discussion covered this in November 2017
> >> [1]) recommended to merge your commit 98c4f78dcdd8 ("xfs: always free inline
> >> data before resetting inode fork during ifree") as stable commit 1eccdbd4836a41
> >> on v4.14.17 *without* merging commit 43518812d2 ("xfs: remove support for
> >> inlining data/extents into the inode fork").
> >
> > Yikes. That sets off all my "how to break filesysetms for fun and
> > profit" alarm bells. This is like playing russian roulette with all
> > our user's data.  XFS fixes that look like they are simple often
> > have subtle dependencies in them that automated backports won't ever
> > be able to understand, and if we don't get that right, we break
> > stuff.
> 
> On the other hand, XFS has a few commits that fix possible
> corruptions, that have never ended up in a stable tree. Isn't it just
> as bad ("playing roulette") for users?

No, because most corruption problems we fix are rarely seen by
users. Those that are seen or considered a significant risk are
backported as per the usual process. What we don't do is shovel
things that *look like fixes* back in older kernels.

This is the third time in recent weeks where I've had to explain
this. e.g:

https://marc.info/?l=linux-xfs&m=152103080002315&w=2

And note Christoph's followup:
https://marc.info/?l=linux-xfs&m=152103175702634&w=2

What's important to note is that the discussion in that thread lead
to the patch being backported, validated and then included in Greg's
stable tree.

Validating that backports to all the stable kernels is effectively a
full time job in itself, and we simply don't have enough upstream
developer resources available to do that. So it's a simple: if we
don't have the resources to validate changes properly, then we
*don't change the code*.

....

> >> I do wonder if other XFS folks are *at least* aware that the auto-selection
> >> algorithm now currently merging patches onto stable for XFS?
> >
> > No I wasn't aware that this was happening.  I'm kinda shit scared
> > right now hearing about how automated backports of random kernel
> > patches are being done with minimal oversight and no visibility to
> > the subsystem developers. When did this start happening?
> 
> About half a year ago. I'm not sure about the no visibility part -
> maintainers and authors would receive at least 3 mails for each patch
> that got in this way, and would have at least a week (usually a lot
> more) to object to the inclusion. Did you not receive any mails from
> me?

I'm not the XFS maintainer (haven't been for 18 months now), I don't
subscribe to LKML anymore and none of my patches were selected for
backports. So, no, I had no idea this was going on.

> I've started working on a framework to automate reviews of sent
> patches to lkml by my framework, this will allow me to do the
> following:
> 
>  - I would send a reply to the original patch sent to LKML within a
> few hours for patches that have a high probability for being a bug fix
> rather than sending a brand new mail a few months after this patch
> made it upstream. This will help reviews as this commit is still fresh
> in the author+maintainers head.
>  - I will include the results of builds for various build testing (I
> got that working now). At this point I suspect this will mostly help
> Greg with patches that are already sent with stable tags.
>  - This will turn into an opt-in rather than opt-out, but it will be
> extremely easy to opt in (something like replying with "ack" to have
> that patch included in the proposed stable branches).
>  - In the future, I'd also like to create a per-subsystem testing
> procedure (so for example, for xfs - run xfstest). I'll try working
> with maintainers of each subsystem to create something they're happy
> with. Given this discussion, I'll make XFS my first attempt at this :)

How much time are your test rigs going to be able to spend running
xfstests? A single pass on a single filesysetm config on spinning
disks will take 3-4 hours of run time. And we have at least 4 common
configs that need validation (v4, v4 w/ 512b block size, v5
(defaults), and v5 w/ reflink+rmap) and so you're looking at a
minimum 12-24 hours of machine test time per kernel you'd need to
test.

And that's just for XFS. There's the same sort of basic
configuration matrix test for ext4 (Ted does it via kvm-xfstests on
GCE which, IIRC takes about 20 hours to run) and btrfs has similar
test requirements. Then there's f2fs, overlay, etc.

You can probably start to see the scope of the validation problem
stable kernels pose, and this is just for filesystem changes....

> The mails will look something like this (an example based on a recent
> XFS commit):
> 
> > From: Sasha Levin <alexander.levin@microsoft.com>
> > To: Sasha Levin <alexander.levin@microsoft.com>
> > To: linux-xfs@vger.kernel.org, "Darrick J . Wong" <darrick.wong@oracle.com>
> > Cc: Brian Foster <bfoster@redhat.com>, linux-kernel@vger.kernel.org
> > Subject: Re: [PATCH] xfs: Correctly invert xfs_buftarg LRU isolation logic
> > In-Reply-To: <20180306102638.25322-1-vbendel@redhat.com>
> > References: <20180306102638.25322-1-vbendel@redhat.com>
> >
> > Hi Vratislav Bendel,
> >
> > [This is an automated email]
> >
> > This commit has been processed by the -stable helper bot and determined
> > to be a high probability candidate for -stable trees. (score: 6.4845)
> >
> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
> >
> > v4.15.12: OK!
> > v4.14.29: OK!
> > v4.9.89: OK!
> > v4.4.123: OK!
> > v4.1.50: OK!
> > v3.18.101: OK!
> >
> > Please reply with "ack" to have this patch included in the appropriate stable trees.

That might help, but the testing and validation is completely
opaque. If I wanted to know what that "OK!" actually meant, where
do I go to find that out?

> If you look at the recent history for fs/xfs, there were no commits in
> the past half a year or so that were submitted to any stable tree in
> the "traditional" way. There are no XFS fixes in the 4.14 LTS tree
> besides the ones submitted with the autoselection method. This is not
> finger pointing at XFS, but rather at the -stable process itself.

It's not a reflection on the -stable process, it's a reflection on
the amount of work validation of filesystem changes require. If we
decide to do backports, the -stable process will work just fine for
the mechanical code movement into the stable trees. It's all the
extra stuff before and after that movement occurs that incurs the
resource costs.

> It's
> difficult to keep track on which branches authors need to test their
> patches on, what sort of tests they need to do, and how they should
> tag their commits. In quite a few cases the effort to properly tag a
> commit for stable takes more effort that writing the code for that
> commit, which deters people from working with stable.

See the link I posted above - I explicitly address the overhead
involved in adding "fixes" tags and identifying backport targets.
And even without the overhead of having to add "fixes" tags, the
broader point I'm making about effectively random selection of
commits for backports is very relevant to the auto-backport magic
we've just learnt about...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28  1:11                 ` Sasha Levin
@ 2018-03-28 13:20                   ` Michal Hocko
  0 siblings, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2018-03-28 13:20 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Sasha Levin, Dave Chinner, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Joerg Roedel

On Wed 28-03-18 01:11:55, Sasha Levin wrote:
> On Tue, Mar 27, 2018 at 09:06:37AM +0200, Michal Hocko wrote:
> >On Mon 26-03-18 19:54:31, Sasha Levin wrote:
> >[...]
> >> About half a year ago. I'm not sure about the no visibility part -
> >> maintainers and authors would receive at least 3 mails for each patch
> >> that got in this way, and would have at least a week (usually a lot
> >> more) to object to the inclusion. Did you not receive any mails from
> >> me?
> >
> >Well, I was aware of your emails yet my free time didn't really allow me
> >to follow those patch bombs. I responded only when some email subject
> >hit my eyes as being non-stable candidate. So by no means the MM
> >backports were reviewed by me. And considering how hard it is to get any
> >review for MM patches in general I strongly suspect that others didn't
> >review either.
> 
> Being aware is different than saying it was snuck in without anyone
> knowing.

Well, you have to realize that we are receiving tons of emails and if
there are large piles appearing in my inbox I tend to delete them all if
they fall into certain pattern categories.

But just to be absolutely clear. I do not want to accuse anybody from
sneaking changes into the stable tree behind maintainers backs.

> 
> >In general I am quite skeptical about the automagic backports
> >selections, to be honest. MM patches should be reasonably good at
> >selecting stable backports and adding more patches on top just risks
> >regressions.
> 
> Okay, let's take mm/ as a subsystem that is doing a good job with
> submitting stuff to -stable.
> 
> There were 40 patches in total going into the 4.14 LTS tree that touched
> mm/. Out of those, 5 were selected via the "automagic" process:
> 
> > 647a37ec1a17 mm/frame_vector.c: release a semaphore in 'get_vaddr_frames()'
> > d91c3f2e540f mm/early_ioremap: Fix boot hang with earlyprintk=efi,keep
> > b2ba0bd34695 kmemleak: add scheduling point to kmemleak_scan()
> > 5ca94e03675a slub: fix sysfs duplicate filename creation when slub_debug=O
> > 342ee8775800 mm, x86/mm: Fix performance regression in get_user_pages_fast()
> 
> The only "questionable" one here I see is the performance regression
> one, which I decided to keep in because it's a rather severe one in a
> common code path.
> 
> Even good subsystems might sometimes miss a patch or two.

Ohh, absolutely. But if anybody is going to hit the issue, it would be
easier to spot with the upstream kernel having the fix already. Most of
above patches have good chance of never being hit or at least not that
harmful.

> But yes, I've received less response from maintainers than I expected,
> so I'll switch it to an opt-in model for new patches that go upstream.

OK, that would certainly be more conservative and to some degree safer.
I have had that feeling that stable simply has to rely on maintainers
others we are operating on the pure luck mode. Just the fact that the
patch applies doesn't mean it is correct to be backported and it is out
of reasonable to expect stable tree maintainers to evaluate that.

So as much as I appreciate the work you and others are doing (the
automagic part is actually and interesting concept for sure) the less is
sometimes more...
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-27 19:54                 ` Luis R. Rodriguez
@ 2018-03-28 13:21                   ` Michal Hocko
  2018-03-28 19:33                     ` Sasha Levin
  0 siblings, 1 reply; 34+ messages in thread
From: Michal Hocko @ 2018-03-28 13:21 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Sasha Levin, Dave Chinner, Darrick J. Wong, Christoph Hellwig,
	xfs, linux-kernel@vger.kernel.org List, Sasha Levin,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Joerg Roedel

On Tue 27-03-18 19:54:35, Luis R. Rodriguez wrote:
> On Tue, Mar 27, 2018 at 09:06:37AM +0200, Michal Hocko wrote:
> > So by no means the MM backports were reviewed by me. And considering how hard
> > it is to get any review for MM patches in general I strongly suspect that
> > others didn't review either.
> > 
> > In general I am quite skeptical about the automagic backports
> > selections, to be honest. MM patches should be reasonably good at
> > selecting stable backports and adding more patches on top just risks
> > regressions. 
> 
> BTW other than suggesting we needing *actual review* of the MM patches, are
> there known unit tests which could be run as well? Thinking long term.

There are some in selftests but most fixes are quite hard to get a
specialized testcase for. Rememeber the MM is a pile of heuristics to
handle large scale of workloads.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28  3:32               ` Dave Chinner
@ 2018-03-28 19:30                 ` Sasha Levin
  2018-03-28 19:40                   ` Darrick J. Wong
  2018-03-28 23:05                   ` Dave Chinner
  0 siblings, 2 replies; 34+ messages in thread
From: Sasha Levin @ 2018-03-28 19:30 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Wed, Mar 28, 2018 at 02:32:28PM +1100, Dave Chinner wrote:
>How much time are your test rigs going to be able to spend running
>xfstests? A single pass on a single filesysetm config on spinning
>disks will take 3-4 hours of run time. And we have at least 4 common
>configs that need validation (v4, v4 w/ 512b block size, v5
>(defaults), and v5 w/ reflink+rmap) and so you're looking at a
>minimum 12-24 hours of machine test time per kernel you'd need to
>test.

No reason they can't run in parallel, right?

>> > From: Sasha Levin <alexander.levin@microsoft.com>
>> > To: Sasha Levin <alexander.levin@microsoft.com>
>> > To: linux-xfs@vger.kernel.org, "Darrick J . Wong" <darrick.wong@oracle.com>
>> > Cc: Brian Foster <bfoster@redhat.com>, linux-kernel@vger.kernel.org
>> > Subject: Re: [PATCH] xfs: Correctly invert xfs_buftarg LRU isolation logic
>> > In-Reply-To: <20180306102638.25322-1-vbendel@redhat.com>
>> > References: <20180306102638.25322-1-vbendel@redhat.com>
>> >
>> > Hi Vratislav Bendel,
>> >
>> > [This is an automated email]
>> >
>> > This commit has been processed by the -stable helper bot and determined
>> > to be a high probability candidate for -stable trees. (score: 6.4845)
>> >
>> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
>> >
>> > v4.15.12: OK!
>> > v4.14.29: OK!
>> > v4.9.89: OK!
>> > v4.4.123: OK!
>> > v4.1.50: OK!
>> > v3.18.101: OK!
>> >
>> > Please reply with "ack" to have this patch included in the appropriate stable trees.
>
>That might help, but the testing and validation is completely
>opaque. If I wanted to know what that "OK!" actually meant, where
>do I go to find that out?

This is actually something I want maintainers to dictate. What sort of
testing would make the XFS folks happy here? Right now I'm doing
"./check 'xfs/*'" with xfstests. Is it sufficient? Anything else you'd like to see?

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28 13:21                   ` Michal Hocko
@ 2018-03-28 19:33                     ` Sasha Levin
  2018-03-29  7:01                       ` Michal Hocko
  0 siblings, 1 reply; 34+ messages in thread
From: Sasha Levin @ 2018-03-28 19:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Luis R. Rodriguez, Sasha Levin, Dave Chinner, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Joerg Roedel

On Wed, Mar 28, 2018 at 03:21:48PM +0200, Michal Hocko wrote:
>On Tue 27-03-18 19:54:35, Luis R. Rodriguez wrote:
>> On Tue, Mar 27, 2018 at 09:06:37AM +0200, Michal Hocko wrote:
>> > So by no means the MM backports were reviewed by me. And considering how hard
>> > it is to get any review for MM patches in general I strongly suspect that
>> > others didn't review either.
>> >
>> > In general I am quite skeptical about the automagic backports
>> > selections, to be honest. MM patches should be reasonably good at
>> > selecting stable backports and adding more patches on top just risks
>> > regressions.
>>
>> BTW other than suggesting we needing *actual review* of the MM patches, are
>> there known unit tests which could be run as well? Thinking long term.
>
>There are some in selftests but most fixes are quite hard to get a
>specialized testcase for. Rememeber the MM is a pile of heuristics to
>handle large scale of workloads.

Would running mmtests for each patch help here at all?

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28 19:30                 ` Sasha Levin
@ 2018-03-28 19:40                   ` Darrick J. Wong
  2018-03-28 23:05                   ` Dave Chinner
  1 sibling, 0 replies; 34+ messages in thread
From: Darrick J. Wong @ 2018-03-28 19:40 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Chinner, Sasha Levin, Luis R. Rodriguez, Christoph Hellwig,
	xfs, linux-kernel@vger.kernel.org List, Greg Kroah-Hartman,
	Julia Lawall, Josh Triplett, Takashi Iwai, Michal Hocko,
	Joerg Roedel

On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> On Wed, Mar 28, 2018 at 02:32:28PM +1100, Dave Chinner wrote:
> >How much time are your test rigs going to be able to spend running
> >xfstests? A single pass on a single filesysetm config on spinning
> >disks will take 3-4 hours of run time. And we have at least 4 common
> >configs that need validation (v4, v4 w/ 512b block size, v5
> >(defaults), and v5 w/ reflink+rmap) and so you're looking at a
> >minimum 12-24 hours of machine test time per kernel you'd need to
> >test.
> 
> No reason they can't run in parallel, right?

Correct, parallelizing them turns horrifying long test runs into
manageable quantities.

> >> > From: Sasha Levin <alexander.levin@microsoft.com>
> >> > To: Sasha Levin <alexander.levin@microsoft.com>
> >> > To: linux-xfs@vger.kernel.org, "Darrick J . Wong" <darrick.wong@oracle.com>
> >> > Cc: Brian Foster <bfoster@redhat.com>, linux-kernel@vger.kernel.org
> >> > Subject: Re: [PATCH] xfs: Correctly invert xfs_buftarg LRU isolation logic
> >> > In-Reply-To: <20180306102638.25322-1-vbendel@redhat.com>
> >> > References: <20180306102638.25322-1-vbendel@redhat.com>
> >> >
> >> > Hi Vratislav Bendel,
> >> >
> >> > [This is an automated email]
> >> >
> >> > This commit has been processed by the -stable helper bot and determined
> >> > to be a high probability candidate for -stable trees. (score: 6.4845)
> >> >
> >> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
> >> >
> >> > v4.15.12: OK!
> >> > v4.14.29: OK!
> >> > v4.9.89: OK!
> >> > v4.4.123: OK!
> >> > v4.1.50: OK!
> >> > v3.18.101: OK!
> >> >
> >> > Please reply with "ack" to have this patch included in the appropriate stable trees.
> >
> >That might help, but the testing and validation is completely
> >opaque. If I wanted to know what that "OK!" actually meant, where
> >do I go to find that out?
> 
> This is actually something I want maintainers to dictate. What sort of
> testing would make the XFS folks happy here? Right now I'm doing
> "./check 'xfs/*'" with xfstests. Is it sufficient? Anything else you'd like to see?

FWIW /me usually runs ./check '-g auto,quick,clone,dedupe,fsmap,rmap'
with the following four mkfs configs:

MKFS_OPTIONS='-m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'
MKFS_OPTIONS='-m reflink=1,rmapbt=1, -i sparse=1,'
MKFS_OPTIONS='-m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
MKFS_OPTIONS='-m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'

Eventually I'll turn quotas on all the time too, time permitting.

--D

> 
> --
> Thanks,
> Sasha--
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28 19:30                 ` Sasha Levin
  2018-03-28 19:40                   ` Darrick J. Wong
@ 2018-03-28 23:05                   ` Dave Chinner
  2018-03-29 18:12                     ` Luis R. Rodriguez
  2018-03-30  2:47                     ` Sasha Levin
  1 sibling, 2 replies; 34+ messages in thread
From: Dave Chinner @ 2018-03-28 23:05 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> On Wed, Mar 28, 2018 at 02:32:28PM +1100, Dave Chinner wrote:
> >How much time are your test rigs going to be able to spend running
> >xfstests? A single pass on a single filesysetm config on spinning
> >disks will take 3-4 hours of run time. And we have at least 4 common
> >configs that need validation (v4, v4 w/ 512b block size, v5
> >(defaults), and v5 w/ reflink+rmap) and so you're looking at a
> >minimum 12-24 hours of machine test time per kernel you'd need to
> >test.
> 
> No reason they can't run in parallel, right?

Sure they can, if you've got the infrastructure to do it. e.g. putting
concurrent test runs on the same spinning disk doesn't speed up the
overall test run time by very much - they slow each other down as
they contend for IO from the same spindle...

I have 5-6 configs on each of my test VMs that I use for validation.
They all have the default config, all have a reflink enabledi
config, and then have varing numbers of other unique configs
according to how fast they run. i.e. it's tailored to "overnight"
testing, so 12-16 hours of test run time.

With them all running in parallel, it takes about 16 hours to cover
all the different configs. I could create more test VMs and run one
config per VM, but that's slower than (due to resource contention)
than running mutliple configs sequentially with limited
concurrency. What is most efficient for your available resources
will be different, so don't assume what works for me will work for
you....

> >> > From: Sasha Levin <alexander.levin@microsoft.com>
> >> > To: Sasha Levin <alexander.levin@microsoft.com>
> >> > To: linux-xfs@vger.kernel.org, "Darrick J . Wong" <darrick.wong@oracle.com>
> >> > Cc: Brian Foster <bfoster@redhat.com>, linux-kernel@vger.kernel.org
> >> > Subject: Re: [PATCH] xfs: Correctly invert xfs_buftarg LRU isolation logic
> >> > In-Reply-To: <20180306102638.25322-1-vbendel@redhat.com>
> >> > References: <20180306102638.25322-1-vbendel@redhat.com>
> >> >
> >> > Hi Vratislav Bendel,
> >> >
> >> > [This is an automated email]
> >> >
> >> > This commit has been processed by the -stable helper bot and determined
> >> > to be a high probability candidate for -stable trees. (score: 6.4845)
> >> >
> >> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
> >> >
> >> > v4.15.12: OK!
> >> > v4.14.29: OK!
> >> > v4.9.89: OK!
> >> > v4.4.123: OK!
> >> > v4.1.50: OK!
> >> > v3.18.101: OK!
> >> >
> >> > Please reply with "ack" to have this patch included in the appropriate stable trees.
> >
> >That might help, but the testing and validation is completely
> >opaque. If I wanted to know what that "OK!" actually meant, where
> >do I go to find that out?
> 
> This is actually something I want maintainers to dictate. What sort of
> testing would make the XFS folks happy here? Right now I'm doing
> "./check 'xfs/*'" with xfstests. Is it sufficient? Anything else you'd like to see?

... and you're doing it wrong. This is precisely why being able
to discover /exactly/ what you are testing and being able to browse
the test results so we can find out if tests passed when a user
reports a bug on a stable kernel.

The way you are running fstests skips more than half the test suite
It also runs tests that are considered dangerous because they are
likely to cause the test run to fail in some way (i.e. trigger an
oops, hang the machine, leave a filesystem in an unmountable state,
etc) and hence not complete a full pass.

"./check -g auto" runs the full "expected to pass" regression test
suite for all configured test configurations. (i.e. all config
sections listed in the configs/<host>.config file)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28 19:33                     ` Sasha Levin
@ 2018-03-29  7:01                       ` Michal Hocko
  0 siblings, 0 replies; 34+ messages in thread
From: Michal Hocko @ 2018-03-29  7:01 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Luis R. Rodriguez, Sasha Levin, Dave Chinner, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Joerg Roedel

On Wed 28-03-18 19:33:06, Sasha Levin wrote:
> On Wed, Mar 28, 2018 at 03:21:48PM +0200, Michal Hocko wrote:
> >On Tue 27-03-18 19:54:35, Luis R. Rodriguez wrote:
> >> On Tue, Mar 27, 2018 at 09:06:37AM +0200, Michal Hocko wrote:
> >> > So by no means the MM backports were reviewed by me. And considering how hard
> >> > it is to get any review for MM patches in general I strongly suspect that
> >> > others didn't review either.
> >> >
> >> > In general I am quite skeptical about the automagic backports
> >> > selections, to be honest. MM patches should be reasonably good at
> >> > selecting stable backports and adding more patches on top just risks
> >> > regressions.
> >>
> >> BTW other than suggesting we needing *actual review* of the MM patches, are
> >> there known unit tests which could be run as well? Thinking long term.
> >
> >There are some in selftests but most fixes are quite hard to get a
> >specialized testcase for. Rememeber the MM is a pile of heuristics to
> >handle large scale of workloads.
> 
> Would running mmtests for each patch help here at all?

mmtests are more for performance than regression/correctness testing
AFAIR.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28 23:05                   ` Dave Chinner
@ 2018-03-29 18:12                     ` Luis R. Rodriguez
  2018-03-29 18:17                       ` Josef Bacik
  2018-03-30  2:47                     ` Sasha Levin
  1 sibling, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2018-03-29 18:12 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Sasha Levin, Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel, Anna Schumaker, Josef Bacik, Tso Ted

On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
> On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> > 
> > This is actually something I want maintainers to dictate. What sort of
> > testing would make the XFS folks happy here? Right now I'm doing
> > "./check 'xfs/*'" with xfstests. Is it sufficient? Anything else you'd like to see?
> 
> ... and you're doing it wrong. This is precisely why being able
> to discover /exactly/ what you are testing and being able to browse
> the test results so we can find out if tests passed when a user
> reports a bug on a stable kernel.
> 
> The way you are running fstests skips more than half the test suite
> It also runs tests that are considered dangerous because they are
> likely to cause the test run to fail in some way (i.e. trigger an
> oops, hang the machine, leave a filesystem in an unmountable state,
> etc) and hence not complete a full pass.
> 
> "./check -g auto" runs the full "expected to pass" regression test
> suite for all configured test configurations. (i.e. all config
> sections listed in the configs/<host>.config file)

ie, it would be safer to expect that an algorithmic auto-selection process for
fixes for stable kernels should have direct input and involvement from
subsystems for run-time testing and simply guessing or assuming won't suffice.

The days of just compile testing should be way over by now, and we should
expect no less for stable kernels, *specially* if we start involving
automation.

Would a way to *start* to address this long term for XFS or other filesystems
for auto-selection long-term be a topic worth covering / addressing at LSF/MM?

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-29 18:12                     ` Luis R. Rodriguez
@ 2018-03-29 18:17                       ` Josef Bacik
  2018-03-29 18:36                         ` Sasha Levin
  0 siblings, 1 reply; 34+ messages in thread
From: Josef Bacik @ 2018-03-29 18:17 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Dave Chinner, Sasha Levin, Sasha Levin, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel, Anna Schumaker, Josef Bacik, Tso Ted

On Thu, Mar 29, 2018 at 06:12:23PM +0000, Luis R. Rodriguez wrote:
> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
> > On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> > > 
> > > This is actually something I want maintainers to dictate. What sort of
> > > testing would make the XFS folks happy here? Right now I'm doing
> > > "./check 'xfs/*'" with xfstests. Is it sufficient? Anything else you'd like to see?
> > 
> > ... and you're doing it wrong. This is precisely why being able
> > to discover /exactly/ what you are testing and being able to browse
> > the test results so we can find out if tests passed when a user
> > reports a bug on a stable kernel.
> > 
> > The way you are running fstests skips more than half the test suite
> > It also runs tests that are considered dangerous because they are
> > likely to cause the test run to fail in some way (i.e. trigger an
> > oops, hang the machine, leave a filesystem in an unmountable state,
> > etc) and hence not complete a full pass.
> > 
> > "./check -g auto" runs the full "expected to pass" regression test
> > suite for all configured test configurations. (i.e. all config
> > sections listed in the configs/<host>.config file)
> 
> ie, it would be safer to expect that an algorithmic auto-selection process for
> fixes for stable kernels should have direct input and involvement from
> subsystems for run-time testing and simply guessing or assuming won't suffice.
> 
> The days of just compile testing should be way over by now, and we should
> expect no less for stable kernels, *specially* if we start involving
> automation.
> 
> Would a way to *start* to address this long term for XFS or other filesystems
> for auto-selection long-term be a topic worth covering / addressing at LSF/MM?
> 

It would be cool to tie tests to commit numbers for things where we're making
sure a oops/hang doesn't happen again, but honestly I'm not sure it's worth the
effort.  Maybe this is my upstream bias showing, but I only ever run xfstests
against something relatively close to linus, so I'm not super worried about
./check -g auto eating my box.  I expect that if I run auto that everything
minus the few flakey tests are going to pass.

Also TIL about configs/<host>.config, that's pretty fucking cool.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-29 18:17                       ` Josef Bacik
@ 2018-03-29 18:36                         ` Sasha Levin
  0 siblings, 0 replies; 34+ messages in thread
From: Sasha Levin @ 2018-03-29 18:36 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Luis R. Rodriguez, Dave Chinner, Sasha Levin, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel, Anna Schumaker, Tso Ted

On Thu, Mar 29, 2018 at 02:17:13PM -0400, Josef Bacik wrote:
>On Thu, Mar 29, 2018 at 06:12:23PM +0000, Luis R. Rodriguez wrote:
>> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
>> > On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
>> > >
>> > > This is actually something I want maintainers to dictate. What sort of
>> > > testing would make the XFS folks happy here? Right now I'm doing
>> > > "./check 'xfs/*'" with xfstests. Is it sufficient? Anything else you'd like to see?
>> >
>> > ... and you're doing it wrong. This is precisely why being able
>> > to discover /exactly/ what you are testing and being able to browse
>> > the test results so we can find out if tests passed when a user
>> > reports a bug on a stable kernel.
>> >
>> > The way you are running fstests skips more than half the test suite
>> > It also runs tests that are considered dangerous because they are
>> > likely to cause the test run to fail in some way (i.e. trigger an
>> > oops, hang the machine, leave a filesystem in an unmountable state,
>> > etc) and hence not complete a full pass.
>> >
>> > "./check -g auto" runs the full "expected to pass" regression test
>> > suite for all configured test configurations. (i.e. all config
>> > sections listed in the configs/<host>.config file)
>>
>> ie, it would be safer to expect that an algorithmic auto-selection process for
>> fixes for stable kernels should have direct input and involvement from
>> subsystems for run-time testing and simply guessing or assuming won't suffice.
>>
>> The days of just compile testing should be way over by now, and we should
>> expect no less for stable kernels, *specially* if we start involving
>> automation.
>>
>> Would a way to *start* to address this long term for XFS or other filesystems
>> for auto-selection long-term be a topic worth covering / addressing at LSF/MM?

If this is something the FS/MM folks would like to discuss I'd be happy
to attend. I do plan on pushing this to other subsystems (and given that
I'd support xfstests, fs/ is a good candidate) once the infrastructure +
XFS specific stuff is done (hopefully later today).

>It would be cool to tie tests to commit numbers for things where we're making
>sure a oops/hang doesn't happen again, but honestly I'm not sure it's worth the
>effort.  Maybe this is my upstream bias showing, but I only ever run xfstests
>against something relatively close to linus, so I'm not super worried about
>./check -g auto eating my box.  I expect that if I run auto that everything
>minus the few flakey tests are going to pass.
>
>Also TIL about configs/<host>.config, that's pretty fucking cool.  Thanks,

On the other hand, from all the customers I've seen, none run anything
"close" (<3 months delta?) from Linus. The "good" ones run 4.15.y, the
worst are probably 4.1.y for upstream and the 2.6.32 RHEL ones.

So while making upstream bug-free is important, no one is actually using
that code :)

By the time those customers end up using a 4.16 kernel, for example,
they'll get all the fixes you're pushing in now, but they'll also be
stuck with the new bugs that slipped in.

Either way, I've integrated xfstests to run the way Derrick and Dave
recommended for every stable tree that I'm considering a backport of a
patch to, I'll send another mail later today once I wrap it up.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-28 23:05                   ` Dave Chinner
  2018-03-29 18:12                     ` Luis R. Rodriguez
@ 2018-03-30  2:47                     ` Sasha Levin
  2018-03-30 19:49                       ` Luis R. Rodriguez
  2018-03-31 22:02                       ` Dave Chinner
  1 sibling, 2 replies; 34+ messages in thread
From: Sasha Levin @ 2018-03-30  2:47 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
>On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
>> This is actually something I want maintainers to dictate. What sort of
>> testing would make the XFS folks happy here? Right now I'm doing
>> "./check 'xfs/*'" with xfstests. Is it sufficient? Anything else you'd like to see?
>
>... and you're doing it wrong. This is precisely why being able
>to discover /exactly/ what you are testing and being able to browse
>the test results so we can find out if tests passed when a user
>reports a bug on a stable kernel.
>
>The way you are running fstests skips more than half the test suite
>It also runs tests that are considered dangerous because they are
>likely to cause the test run to fail in some way (i.e. trigger an
>oops, hang the machine, leave a filesystem in an unmountable state,
>etc) and hence not complete a full pass.
>
>"./check -g auto" runs the full "expected to pass" regression test
>suite for all configured test configurations. (i.e. all config
>sections listed in the configs/<host>.config file)

Great! With information from Darrick and yourself I've modified tests to
be more relevant. Right now I run 4 configs for each stable kernel, but
can add more or remove any - depends on what helps people analyse the
results.

The complete VM serial logs as well as results/ from xfstests are also
available and are linked from the email.

Here's an example of such email:

> From: Sasha Levin <alexander.levin@microsoft.com>
> To: Sasha Levin <alexander.levin@microsoft.com>
> To: linux-xfs@vger.kernel.org, "Darrick J . Wong" <darrick.wong@oracle.com>
> Cc: Brian Foster <bfoster@redhat.com>, linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] xfs: Correctly invert xfs_buftarg LRU isolation logic
> In-Reply-To: <20180306102638.25322-1-vbendel@redhat.com>
> References: <20180306102638.25322-1-vbendel@redhat.com>
>
> Hi Vratislav Bendel,
>
> [This is an automated email]
>
> This commit has been processed by the -stable helper bot and determined
> to be a high probability candidate for -stable trees. (score: 6.4845)
>
> The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
>
> v4.15.12: Build OK!
> v4.14.29: Build OK!
> v4.9.89: Build OK!
> v4.4.123: Build OK!
> v4.1.50: Build OK!
> v3.18.101: Build OK!
>
> XFS Specific tests:
>
> v4.15.12 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.15.12/tests/):
>     No tests completed!
> v4.14.29 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.14.29/tests/):
>     No tests completed!
> v4.9.89 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.9.89/tests/):
>     No tests completed!
> v4.4.123 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.4.123/tests/):
>     v4:
>         Thu Mar 29 21:23:57 UTC 2018
>         Interrupted!
>         Passed all 0 tests
>     v4_reflink:
>         Thu Mar 29 21:24:37 UTC 2018
>         Interrupted!
>         Passed all 0 tests
> v4.1.50 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.1.50/tests/):
>     No tests completed!
> v3.18.101 (http://stable-bot.westus2.cloudapp.azure.com/test/v3.18.101/tests/):
>     v4:
>         Thu Mar 29 21:30:40 UTC 2018
>         Interrupted!
>         Passed all 0 tests
>     v4_reflink:
>         Thu Mar 29 21:25:14 UTC 2018
>         Interrupted!
>         Passed all 0 tests
>
> Please let us know if you'd like to have this patch included in a stable tree.
>
> --
> Thanks,
> Sasha

Let me know if this would be good enough for now, and if there's
anything else to add that'll be useful.

This brings me to the sad part of this mail: not a single stable kernel
survived a run. Most are paniced, some are hanging, and some were killed
because of KASan.

All have hit various warnings in fs/iomap.c, and kernels accross several
versions hit the BUG at fs/xfs/xfs_message.c:113 (+-1 line)

4.15.12 is hitting a use-after-free in xfs_efi_release().
4.14.29 and 4.9.89 seems to end up with corrupted memory (KASAN
warnings) at or before generic/027.
And finally, 3.18.101 is pretty unhappy with sleeping functions called
from atomic context.

--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-30  2:47                     ` Sasha Levin
@ 2018-03-30 19:49                       ` Luis R. Rodriguez
  2018-04-02  0:35                         ` Sasha Levin
  2018-03-31 22:02                       ` Dave Chinner
  1 sibling, 1 reply; 34+ messages in thread
From: Luis R. Rodriguez @ 2018-03-30 19:49 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Dave Chinner, Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Fri, Mar 30, 2018 at 02:47:05AM +0000, Sasha Levin wrote:
> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
> >On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> >"./check -g auto" runs the full "expected to pass" regression test
> >suite for all configured test configurations. (i.e. all config
> >sections listed in the configs/<host>.config file)
> 
> Great! With information from Darrick and yourself I've modified tests to
> be more relevant. Right now I run 4 configs for each stable kernel, but
> can add more or remove any - depends on what helps people analyse the
> results.
>
> This brings me to the sad part of this mail: not a single stable kernel
> survived a run. Most are paniced, some are hanging, 

I expected this. The semantics over -g auto yielding "expected to pass"
are relative. Perhaps its better described as "should pass"?

> and some were killed
> because of KASan.
> 
> All have hit various warnings in fs/iomap.c, and kernels accross several
> versions hit the BUG at fs/xfs/xfs_message.c:113 (+-1 line)
> 
> 4.15.12 is hitting a use-after-free in xfs_efi_release().
> 4.14.29 and 4.9.89 seems to end up with corrupted memory (KASAN
> warnings) at or before generic/027.
> And finally, 3.18.101 is pretty unhappy with sleeping functions called
> from atomic context.

>From my limited experience you have no option but to create an expunge list for
each failure for now, and then pass the expunge lists -- that in essence would
define the stable baseline and you should expect this to be different per
kernel release. If you upgrade tooling, it can also change the results, and
likewise if you upgrade fstests.

If you define an expunge list you can then pass the list with the -E parameter,
you can for instance categorize then failures by type and use a file for each
type of failure, whether that's a triage list or a type of common failure.
Format can be:

test # comments are ignored

Since you may want to database this somehow, perhaps a format that lists
some tracking for it or other heuristics:

generic/388 # bug#12345 - 1/300 run fails

I'd recommend to just add all failures to one large expunge list for now,
and later you can split / sort them them as needed.

The idea later is that any failure later would be a regression. What would
be good is to test a stable kernel prior to the auto-selection and use that
as baseline, then bump the kernel and ensure no regressions were created.

A dicey corner issue is that of tests which are supposed to "pass" but yet
can fail every blue moon. For instance I've been running into one-off failures
with generic/388 -- but only if I run it over 300 times.

As such the baseline IMHO should also track these as just failures, however it
will be often picked up as a regression first. The only way to rule this out
is to loop test the same test prior to a kernel update and ensure it wasn't
a regression -- ie, that it *was* still failing before.

This is why all this work is rather full time'ish. There is no way around it,
it will take time to establish a baseline from fstests for a filesystem. There
will also be a lot of odd ins and outs of each filesystem.

  Luis

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-30  2:47                     ` Sasha Levin
  2018-03-30 19:49                       ` Luis R. Rodriguez
@ 2018-03-31 22:02                       ` Dave Chinner
  2018-04-02  0:32                         ` Sasha Levin
  1 sibling, 1 reply; 34+ messages in thread
From: Dave Chinner @ 2018-03-31 22:02 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Fri, Mar 30, 2018 at 02:47:05AM +0000, Sasha Levin wrote:
> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
> >On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> > This commit has been processed by the -stable helper bot and determined
> > to be a high probability candidate for -stable trees. (score: 6.4845)
> >
> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
> >
> > v4.15.12: Build OK!
> > v4.14.29: Build OK!
> > v4.9.89: Build OK!
> > v4.4.123: Build OK!
> > v4.1.50: Build OK!
> > v3.18.101: Build OK!
> >
> > XFS Specific tests:
> >
> > v4.15.12 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.15.12/tests/):
> >     No tests completed!

Can you capture the actual check command output into it's own file?
That tells us at a glance which tests succeed or failed.

So I'm looking at the v5.log file:

....
echo 'export MKFS_OPTIONS='\''-m crc=0,reflink=0,rmapbt=0, -i sparse=0,'\'''
....


FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 autosel 4.15.12+
MKFS_OPTIONS  -- -f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,
/dev/vdb
MOUNT_OPTIONS -- /dev/vdb /mnt2

That's not testing v5 filesystems. That's turned off crcs, and
so is testing a v4 filesystem. You'll see this on filesysetms that
don't support reflink:

 [not run] Reflink not supported by test filesystem type: xfs

Also, you need to make the test filesystem to match the options
the test run is configured with (i.e. v4, v5, reflink, etc)
otherwise half the tests don't exercise the expected config.

[not run] src/dbtest not built

 [not run] chacl command not found

 [not run] xfs_io set_encpolicy support is missing

You need to update your userspace.

And the test run has not completed. It's run to:

generic/430	[11172.480621] run fstests generic/430 at 2018-03-30 00:20:12
+ scp -i /home/sasha/ssh/id_rsa -P 10022 -r root@10.3.38.7:/root/xfstests-dev/results /home/sasha/data/results/test/v4.15.12/tests//v5/
+ az vm delete -y --resource-group sasha-auto-stable --name sasha-worker-629016242-vm

generic/430 and then stopped. There's still another ~50 tests in the
generic group to run, and then there's the shared and XFS subdirs to
run, too. So there's still something wrong in the way you are
setting up/installing fstests here....

> > v4.14.29 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.14.29/tests/):
> >     No tests completed!
> > v4.9.89 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.9.89/tests/):
> >     No tests completed!
> > v4.4.123 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.4.123/tests/):
> >     v4:
> >         Thu Mar 29 21:23:57 UTC 2018
> >         Interrupted!
> >         Passed all 0 tests
> >     v4_reflink:

There's no such configuration as "v4 reflink". reflink is only
available on v5 (crc enabled) filesystems on kernels >=4.10 (IIRC).
Perhaps you've mislabelled them?

> Let me know if this would be good enough for now, and if there's
> anything else to add that'll be useful.
> 
> This brings me to the sad part of this mail: not a single stable kernel
> survived a run. Most are paniced, some are hanging, and some were killed
> because of KASan.
> 
> All have hit various warnings in fs/iomap.c,

Normal - the dmesg filter in the test harness catches those and
ignores them if they are known/expected to occur.

> and kernels accross several
> versions hit the BUG at fs/xfs/xfs_message.c:113 (+-1 line)

That's an ASSERT() failure, indicating a fatal error. e.g:

Stuff like this (from
http://stable-bot.westus2.cloudapp.azure.com/test/v4.9.89/tests/v4_reflink.log)

.....
generic/083	[ 4443.536212] run fstests generic/083 at 2018-03-29 22:32:17
[ 4444.557989] XFS (vdb): Unmounting Filesystem
[ 4445.498461] XFS (vdb): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[ 4445.505860] XFS (vdb): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[ 4445.513090] XFS (vdb): Mounting V5 Filesystem
[ 4445.531284] XFS (vdb): Ending clean mount
[ 4458.087406] XFS: Assertion failed: xfs_is_reflink_inode(ip), file: fs/xfs/xfs_reflink.c, line: 509

[snip stack trace]

Indicate a problem that should not be occurring. It's debug an
triage time - there's some problem that needs backports to fix. I
doubt anyone in XFS land has time to do this on top of everything
else we alreayd have to do...

> 4.15.12 is hitting a use-after-free in xfs_efi_release().

Debug and triage time.

> 4.14.29 and 4.9.89 seems to end up with corrupted memory (KASAN
> warnings) at or before generic/027.

More debug and triage time.

> And finally, 3.18.101 is pretty unhappy with sleeping functions called
> from atomic context.

Needle in a haystack :/

So this is just basic XFS validation, and it's throwing problems up
all over the place. Now do you see why we've been saying maintaining
stable backports for XFS is pretty much a full time job for someone?

And keep in mind this is just one filesystem. You're going to end up
with the same issues on ext4 and btrfs - the regression tests are
going to show up all sorts of problems that have been fixed in the
upstream kernels but never backported....

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-31 22:02                       ` Dave Chinner
@ 2018-04-02  0:32                         ` Sasha Levin
  2018-04-03  1:46                           ` Dave Chinner
  0 siblings, 1 reply; 34+ messages in thread
From: Sasha Levin @ 2018-04-02  0:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Sun, Apr 01, 2018 at 08:02:45AM +1000, Dave Chinner wrote:
>On Fri, Mar 30, 2018 at 02:47:05AM +0000, Sasha Levin wrote:
>> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
>> >On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
>> > This commit has been processed by the -stable helper bot and determined
>> > to be a high probability candidate for -stable trees. (score: 6.4845)
>> >
>> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
>> >
>> > v4.15.12: Build OK!
>> > v4.14.29: Build OK!
>> > v4.9.89: Build OK!
>> > v4.4.123: Build OK!
>> > v4.1.50: Build OK!
>> > v3.18.101: Build OK!
>> >
>> > XFS Specific tests:
>> >
>> > v4.15.12 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.15.12/tests/):
>> >     No tests completed!
>
>Can you capture the actual check command output into it's own file?
>That tells us at a glance which tests succeed or failed.
>
>So I'm looking at the v5.log file:
>
>....
>echo 'export MKFS_OPTIONS='\''-m crc=0,reflink=0,rmapbt=0, -i sparse=0,'\'''
>....
>
>
>FSTYP         -- xfs (debug)
>PLATFORM      -- Linux/x86_64 autosel 4.15.12+
>MKFS_OPTIONS  -- -f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,
>/dev/vdb
>MOUNT_OPTIONS -- /dev/vdb /mnt2
>
>That's not testing v5 filesystems. That's turned off crcs, and
>so is testing a v4 filesystem. You'll see this on filesysetms that
>don't support reflink:
>
> [not run] Reflink not supported by test filesystem type: xfs

I mixed up the configs here. Basically I have 4 different ones (provided
by Darrick):

MKFS_OPTIONS='-m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'
MKFS_OPTIONS='-m reflink=1,rmapbt=1, -i sparse=1,'
MKFS_OPTIONS='-m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
MKFS_OPTIONS='-m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'

>Also, you need to make the test filesystem to match the options
>the test run is configured with (i.e. v4, v5, reflink, etc)
>otherwise half the tests don't exercise the expected config.
>
>[not run] src/dbtest not built
>
> [not run] chacl command not found
>
> [not run] xfs_io set_encpolicy support is missing
>
>You need to update your userspace.

I thought xfsprogs in Ubuntu 18.04 is more recent, but is really 1 year
old. Instead, I'm just building it from source now.

>And the test run has not completed. It's run to:
>
>generic/430	[11172.480621] run fstests generic/430 at 2018-03-30 00:20:12
>+ scp -i /home/sasha/ssh/id_rsa -P 10022 -r root@10.3.38.7:/root/xfstests-dev/results /home/sasha/data/results/test/v4.15.12/tests//v5/
>+ az vm delete -y --resource-group sasha-auto-stable --name sasha-worker-629016242-vm
>
>generic/430 and then stopped. There's still another ~50 tests in the
>generic group to run, and then there's the shared and XFS subdirs to
>run, too. So there's still something wrong in the way you are
>setting up/installing fstests here....

I had a timeout I forgot about. There tests do take a while... :)

>> > v4.14.29 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.14.29/tests/):
>> >     No tests completed!
>> > v4.9.89 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.9.89/tests/):
>> >     No tests completed!
>> > v4.4.123 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.4.123/tests/):
>> >     v4:
>> >         Thu Mar 29 21:23:57 UTC 2018
>> >         Interrupted!
>> >         Passed all 0 tests
>> >     v4_reflink:
>
>There's no such configuration as "v4 reflink". reflink is only
>available on v5 (crc enabled) filesystems on kernels >=4.10 (IIRC).
>Perhaps you've mislabelled them?
>
>> Let me know if this would be good enough for now, and if there's
>> anything else to add that'll be useful.
>>
>> This brings me to the sad part of this mail: not a single stable kernel
>> survived a run. Most are paniced, some are hanging, and some were killed
>> because of KASan.
>>
>> All have hit various warnings in fs/iomap.c,
>
>Normal - the dmesg filter in the test harness catches those and
>ignores them if they are known/expected to occur.
>
>> and kernels accross several
>> versions hit the BUG at fs/xfs/xfs_message.c:113 (+-1 line)
>
>That's an ASSERT() failure, indicating a fatal error. e.g:
>
>Stuff like this (from
>http://stable-bot.westus2.cloudapp.azure.com/test/v4.9.89/tests/v4_reflink.log)
>
>.....
>generic/083	[ 4443.536212] run fstests generic/083 at 2018-03-29 22:32:17
>[ 4444.557989] XFS (vdb): Unmounting Filesystem
>[ 4445.498461] XFS (vdb): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
>[ 4445.505860] XFS (vdb): EXPERIMENTAL reflink feature enabled. Use at your own risk!
>[ 4445.513090] XFS (vdb): Mounting V5 Filesystem
>[ 4445.531284] XFS (vdb): Ending clean mount
>[ 4458.087406] XFS: Assertion failed: xfs_is_reflink_inode(ip), file: fs/xfs/xfs_reflink.c, line: 509
>
>[snip stack trace]
>
>Indicate a problem that should not be occurring. It's debug an
>triage time - there's some problem that needs backports to fix. I
>doubt anyone in XFS land has time to do this on top of everything
>else we alreayd have to do...
>
>> 4.15.12 is hitting a use-after-free in xfs_efi_release().
>
>Debug and triage time.
>
>> 4.14.29 and 4.9.89 seems to end up with corrupted memory (KASAN
>> warnings) at or before generic/027.
>
>More debug and triage time.
>
>> And finally, 3.18.101 is pretty unhappy with sleeping functions called
>> from atomic context.
>
>Needle in a haystack :/
>
>So this is just basic XFS validation, and it's throwing problems up
>all over the place. Now do you see why we've been saying maintaining
>stable backports for XFS is pretty much a full time job for someone?

Maintaining stable backports is indeed lots of work, but I feel that
a lot of progress can be made by automating parts of it.

>And keep in mind this is just one filesystem. You're going to end up
>with the same issues on ext4 and btrfs - the regression tests are
>going to show up all sorts of problems that have been fixed in the
>upstream kernels but never backported....

Right, but this effort will both make it harder for new regressions to
creep in, and will make it easier to backport fixes for these issues
much easier.

I've tried running it on current mainline (that was about 12 hours
before v4.16 was released), and saw some failures there, in particular
the use-after-free which seems to be caused by generic/388:

[25302.342348] ==================================================================
[25302.348851] BUG: KASAN: use-after-free in xfs_rui_release+0x1ce/0x220
[25302.355334] Read of size 4 at addr ffff8801600f3630 by task fsstress/6274
[25302.360678]
[25302.360678] CPU: 7 PID: 6274 Comm: fsstress Not tainted 4.16.0-rc7+ #22
[25302.360678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[25302.360678] Call Trace:
[25302.360678]  dump_stack+0xd6/0x184
[25302.360678]  ? _atomic_dec_and_lock+0x2cc/0x2cc
[25302.360678]  ? show_regs_print_info+0x5/0x5
[25302.360678]  print_address_description+0x75/0x430
[25302.360678]  ? xfs_rui_release+0x1ce/0x220
[25302.360678]  kasan_report+0x1d8/0x460
[25302.360678]  ? xfs_rui_release+0x1ce/0x220
[25302.360678]  xfs_rui_release+0x1ce/0x220
[25302.407137]  ? xfs_rui_copy_format+0x100/0x100
[25302.407137]  xfs_defer_trans_abort+0x1e6/0xac0
[25302.407137]  ? xfs_defer_intake_work+0xad0/0xad0
[25302.407137]  ? xfs_trans_free_items+0x2e0/0x2e0
[25302.407137]  ? __lock_is_held+0xcb/0x1a0
[25302.407137]  xfs_defer_trans_roll+0x727/0xe60
[25302.407137]  ? xfs_defer_trans_abort+0xac0/0xac0
[25302.407137]  ? xfs_bmap_add_extent_hole_delay+0x1320/0x1320
[25302.407137]  ? __lock_is_held+0xcb/0x1a0
[25302.407137]  xfs_defer_finish+0x2a6/0x2140
[25302.407137]  ? xfs_trans_log_inode+0xa5/0x5b0
[25302.407137]  ? xfs_defer_bjoin+0x100/0x100
[25302.407137]  ? xfs_trans_ichgtime+0x1a0/0x1a0
[25302.448302]  ? save_stack+0x89/0xb0
[25302.448302]  ? xfs_bmapi_write+0x275c/0x4e20
[25302.448302]  ? do_syscall_64+0x24b/0x8b0
[25302.460213]  ? entry_SYSCALL_64_after_hwframe+0x42/0xb7
[25302.460213]  ? __bfs+0xaa0/0xaf0
[25302.460213]  ? xfs_bmapi_reserve_delalloc+0xd20/0xd20
[25302.460213]  ? __lock_is_held+0xcb/0x1a0
[25302.460213]  ? __lock_is_held+0xcb/0x1a0
[25302.460213]  ? rcu_read_lock_sched_held+0x102/0x120
[25302.482063]  ? xfs_defer_init+0x433/0x5f0
[25302.482063]  ? xfs_trans_unreserve_and_mod_sb+0xc10/0xc10
[25302.490331]  xfs_alloc_file_space+0x53a/0xf90
[25302.490331]  ? xfs_prepare_shift+0x160/0x160
[25302.498803]  ? lock_acquire+0x1c0/0x600
[25302.498803]  ? xfs_ilock+0x42b/0x670
[25302.498803]  ? lock_contended+0x1330/0x1330
[25302.498803]  ? down_write_nested+0xf2/0x190
[25302.498803]  ? _down_write_nest_lock+0x190/0x190
[25302.498803]  ? xfs_get_cowextsz_hint+0xa0/0xa0
[25302.498803]  ? xfs_break_layouts+0x2c/0x350
[25302.498803]  xfs_file_fallocate+0x5c6/0xac0
[25302.498803]  ? xfs_update_prealloc_flags+0x370/0x370
[25302.498803]  ? __lock_is_held+0xcb/0x1a0
[25302.498803]  ? rcu_read_lock_sched_held+0x102/0x120
[25302.498803]  ? rcu_sync_lockdep_assert+0xbc/0x150
[25302.498803]  vfs_fallocate+0x2f5/0x930
[25302.498803]  ioctl_preallocate+0x1dc/0x320
[25302.498803]  ? vfs_ioctl+0xf0/0xf0
[25302.550185]  do_vfs_ioctl+0xfe4/0x1690
[25302.550185]  ? cp_new_stat+0x66a/0x8e0
[25302.550185]  ? inode_get_bytes+0xc0/0xc0
[25302.550185]  ? ioctl_preallocate+0x320/0x320
[25302.550185]  ? fget_raw+0x10/0x10
[25302.550185]  ? vfs_statx_fd+0x3f/0x70
[25302.550185]  ? SYSC_newfstat+0x7c/0xd0
[25302.570302]  ? vfs_statx_fd+0x70/0x70
[25302.570302]  SyS_ioctl+0x6f/0x80
[25302.570302]  ? do_vfs_ioctl+0x1690/0x1690
[25302.570302]  do_syscall_64+0x24b/0x8b0
[25302.570302]  ? exit_to_usermode_loop+0xdf/0x1f0
[25302.570302]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[25302.570302]  ? syscall_return_slowpath+0x560/0x560
[25302.590209]  ? syscall_return_slowpath+0x21f/0x560
[25302.590209]  ? entry_SYSCALL_64_after_hwframe+0x52/0xb7
[25302.600302]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[25302.600302]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[25302.607106] RIP: 0033:0x7fb0455315d7
[25302.607106] RSP: 002b:00007fffc620fef8 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
[25302.607106] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fb0455315d7
[25302.607106] RDX: 00007fffc620ff20 RSI: 000000004030582a RDI: 0000000000000003
[25302.607106] RBP: 00000000000000dd R08: 000000000000007d R09: feff54bcff603065
[25302.607106] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000051873
[25302.607106] R13: 0000000051eb851f R14: 00007fffc62104b6 R15: 000055be2ca263e0
[25302.607106]
[25302.607106] Allocated by task 6274:
[25302.607106]  kasan_kmalloc+0xa0/0xd0
[25302.649054]  kmem_cache_alloc+0x109/0x450
[25302.649054]  kmem_zone_alloc+0x67/0x180
[25302.649054]  xfs_rui_init+0xbd/0x2e0
[25302.649054]  xfs_rmap_update_create_intent+0x43/0xd0
[25302.660185]  xfs_defer_intake_work+0x184/0xad0
[25302.660185]  xfs_defer_finish+0x29b/0x2140
[25302.660185]  xfs_alloc_file_space+0x53a/0xf90
[25302.660185]  xfs_file_fallocate+0x5c6/0xac0
[25302.660185]  vfs_fallocate+0x2f5/0x930
[25302.680347]  ioctl_preallocate+0x1dc/0x320
[25302.680347]  do_vfs_ioctl+0xfe4/0x1690
[25302.680347]  SyS_ioctl+0x6f/0x80
[25302.680347]  do_syscall_64+0x24b/0x8b0
[25302.680347]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[25302.680347]
[25302.680347] Freed by task 6274:
[25302.680347]  __kasan_slab_free+0x136/0x180
[25302.700464]  kmem_cache_free+0xe7/0x4b0
[25302.700464]  xfs_trans_free_items+0x198/0x2e0
[25302.700464]  __xfs_trans_commit+0x27f/0xcc0
[25302.711238]  xfs_trans_roll+0x17b/0x2a0
[25302.713745]  xfs_defer_trans_roll+0x6ad/0xe60
[25302.713745]  xfs_defer_finish+0x2a6/0x2140
[25302.713745]  xfs_alloc_file_space+0x53a/0xf90
[25302.713745]  xfs_file_fallocate+0x5c6/0xac0
[25302.713745]  vfs_fallocate+0x2f5/0x930
[25302.713745]  ioctl_preallocate+0x1dc/0x320
[25302.713745]  do_vfs_ioctl+0xfe4/0x1690
[25302.713745]  SyS_ioctl+0x6f/0x80
[25302.713745]  do_syscall_64+0x24b/0x8b0
[25302.713745]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[25302.713745]
[25302.713745] The buggy address belongs to the object at ffff8801600f35a8
[25302.713745]  which belongs to the cache xfs_rui_item of size 680
[25302.757350] The buggy address is located 136 bytes inside of
[25302.757350]  680-byte region [ffff8801600f35a8, ffff8801600f3850)
[25302.757350] The buggy address belongs to the page:
[25302.757350] page:ffffea0005803c00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
[25302.757350] flags: 0x2fffc0000008100(slab|head)
[25302.757350] raw: 02fffc0000008100 0000000000000000 0000000000000000 0000000180140014
[25302.757350] raw: ffffea000e4da600 0000000500000005 ffff8803a707c300 0000000000000000
[25302.757350] page dumped because: kasan: bad access detected
[25302.757350]
[25302.757350] Memory state around the buggy address:
[25302.757350]  ffff8801600f3500: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
[25302.757350]  ffff8801600f3580: fc fc fc fc fc fb fb fb fb fb fb fb fb fb fb fb
[25302.757350] >ffff8801600f3600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[25302.757350]                                      ^
[25302.757350]  ffff8801600f3680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[25302.757350]  ffff8801600f3700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[25302.757350] ==================================================================


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-03-30 19:49                       ` Luis R. Rodriguez
@ 2018-04-02  0:35                         ` Sasha Levin
  0 siblings, 0 replies; 34+ messages in thread
From: Sasha Levin @ 2018-04-02  0:35 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Dave Chinner, Sasha Levin, Darrick J. Wong, Christoph Hellwig,
	xfs, linux-kernel@vger.kernel.org List, Greg Kroah-Hartman,
	Julia Lawall, Josh Triplett, Takashi Iwai, Michal Hocko,
	Joerg Roedel

On Fri, Mar 30, 2018 at 07:49:46PM +0000, Luis R. Rodriguez wrote:
>On Fri, Mar 30, 2018 at 02:47:05AM +0000, Sasha Levin wrote:
>> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
>> >On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
>> >"./check -g auto" runs the full "expected to pass" regression test
>> >suite for all configured test configurations. (i.e. all config
>> >sections listed in the configs/<host>.config file)
>>
>> Great! With information from Darrick and yourself I've modified tests to
>> be more relevant. Right now I run 4 configs for each stable kernel, but
>> can add more or remove any - depends on what helps people analyse the
>> results.
>>
>> This brings me to the sad part of this mail: not a single stable kernel
>> survived a run. Most are paniced, some are hanging,
>
>I expected this. The semantics over -g auto yielding "expected to pass"
>are relative. Perhaps its better described as "should pass"?
>
>> and some were killed
>> because of KASan.
>>
>> All have hit various warnings in fs/iomap.c, and kernels accross several
>> versions hit the BUG at fs/xfs/xfs_message.c:113 (+-1 line)
>>
>> 4.15.12 is hitting a use-after-free in xfs_efi_release().
>> 4.14.29 and 4.9.89 seems to end up with corrupted memory (KASAN
>> warnings) at or before generic/027.
>> And finally, 3.18.101 is pretty unhappy with sleeping functions called
>> from atomic context.
>
>From my limited experience you have no option but to create an expunge list for
>each failure for now, and then pass the expunge lists -- that in essence would
>define the stable baseline and you should expect this to be different per
>kernel release. If you upgrade tooling, it can also change the results, and
>likewise if you upgrade fstests.
>
>If you define an expunge list you can then pass the list with the -E parameter,
>you can for instance categorize then failures by type and use a file for each
>type of failure, whether that's a triage list or a type of common failure.
>Format can be:
>
>test # comments are ignored
>
>Since you may want to database this somehow, perhaps a format that lists
>some tracking for it or other heuristics:
>
>generic/388 # bug#12345 - 1/300 run fails
>
>I'd recommend to just add all failures to one large expunge list for now,
>and later you can split / sort them them as needed.
>
>The idea later is that any failure later would be a regression. What would
>be good is to test a stable kernel prior to the auto-selection and use that
>as baseline, then bump the kernel and ensure no regressions were created.
>
>A dicey corner issue is that of tests which are supposed to "pass" but yet
>can fail every blue moon. For instance I've been running into one-off failures
>with generic/388 -- but only if I run it over 300 times.
>
>As such the baseline IMHO should also track these as just failures, however it
>will be often picked up as a regression first. The only way to rule this out
>is to loop test the same test prior to a kernel update and ensure it wasn't
>a regression -- ie, that it *was* still failing before.

Thanks for the pointers!

>This is why all this work is rather full time'ish. There is no way around it,
>it will take time to establish a baseline from fstests for a filesystem. There
>will also be a lot of odd ins and outs of each filesystem.

Right, but the way I see it, no one actually uses upstream. If anything,
it's a development branch, and the "real" users pick up one of the
stable trees to work with. So while there seems to be a lot of effort
dedicated to new features or fixing upstream bugs, not enough people
care that no one won't see those fixes for a few years.


--
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH] xfs: always free inline data before resetting inode fork during ifree
  2018-04-02  0:32                         ` Sasha Levin
@ 2018-04-03  1:46                           ` Dave Chinner
  0 siblings, 0 replies; 34+ messages in thread
From: Dave Chinner @ 2018-04-03  1:46 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Sasha Levin, Luis R. Rodriguez, Darrick J. Wong,
	Christoph Hellwig, xfs, linux-kernel@vger.kernel.org List,
	Greg Kroah-Hartman, Julia Lawall, Josh Triplett, Takashi Iwai,
	Michal Hocko, Joerg Roedel

On Mon, Apr 02, 2018 at 12:32:44AM +0000, Sasha Levin wrote:
> On Sun, Apr 01, 2018 at 08:02:45AM +1000, Dave Chinner wrote:
> >On Fri, Mar 30, 2018 at 02:47:05AM +0000, Sasha Levin wrote:
> >> On Thu, Mar 29, 2018 at 10:05:35AM +1100, Dave Chinner wrote:
> >> >On Wed, Mar 28, 2018 at 07:30:06PM +0000, Sasha Levin wrote:
> >> > This commit has been processed by the -stable helper bot and determined
> >> > to be a high probability candidate for -stable trees. (score: 6.4845)
> >> >
> >> > The bot has tested the following trees: v4.15.12, v4.14.29, v4.9.89, v4.4.123, v4.1.50, v3.18.101.
> >> >
> >> > v4.15.12: Build OK!
> >> > v4.14.29: Build OK!
> >> > v4.9.89: Build OK!
> >> > v4.4.123: Build OK!
> >> > v4.1.50: Build OK!
> >> > v3.18.101: Build OK!
> >> >
> >> > XFS Specific tests:
> >> >
> >> > v4.15.12 (http://stable-bot.westus2.cloudapp.azure.com/test/v4.15.12/tests/):
> >> >     No tests completed!
> >
> >Can you capture the actual check command output into it's own file?
> >That tells us at a glance which tests succeed or failed.
> >
> >So I'm looking at the v5.log file:
> >
> >....
> >echo 'export MKFS_OPTIONS='\''-m crc=0,reflink=0,rmapbt=0, -i sparse=0,'\'''
> >....
> >
> >
> >FSTYP         -- xfs (debug)
> >PLATFORM      -- Linux/x86_64 autosel 4.15.12+
> >MKFS_OPTIONS  -- -f -m crc=0,reflink=0,rmapbt=0, -i sparse=0,
> >/dev/vdb
> >MOUNT_OPTIONS -- /dev/vdb /mnt2
> >
> >That's not testing v5 filesystems. That's turned off crcs, and
> >so is testing a v4 filesystem. You'll see this on filesysetms that
> >don't support reflink:
> >
> > [not run] Reflink not supported by test filesystem type: xfs
> 
> I mixed up the configs here. Basically I have 4 different ones (provided
> by Darrick):
> 
> MKFS_OPTIONS='-m reflink=1,rmapbt=1, -i sparse=1, -b size=1024,'
> MKFS_OPTIONS='-m reflink=1,rmapbt=1, -i sparse=1,'
> MKFS_OPTIONS='-m crc=0,reflink=0,rmapbt=0, -i sparse=0,'
> MKFS_OPTIONS='-m crc=0,reflink=0,rmapbt=0, -i sparse=0, -b size=512,'

That doesn't actually test the default options that most users will
currently be using. It tests a new format relatively few people are
using in production, and the old format that people are slowly
moving away from. If you run mkfs.xfs without any options right now
you get

MKFS_OPTIONS='-m crc=1,reflink=0,rmapbt=0, -i sparse=0'

> >So this is just basic XFS validation, and it's throwing problems up
> >all over the place. Now do you see why we've been saying maintaining
> >stable backports for XFS is pretty much a full time job for someone?
> 
> Maintaining stable backports is indeed lots of work, but I feel that
> a lot of progress can be made by automating parts of it.

Yup, but the sorting out the results of the automated tests is the
part that takes all the developer time and resources, not the
running of the tests...

> >And keep in mind this is just one filesystem. You're going to end up
> >with the same issues on ext4 and btrfs - the regression tests are
> >going to show up all sorts of problems that have been fixed in the
> >upstream kernels but never backported....
> 
> Right, but this effort will both make it harder for new regressions to
> creep in, and will make it easier to backport fixes for these issues
> much easier.

Assuming someone has the time to actually look at all these
problems....

> I've tried running it on current mainline (that was about 12 hours
> before v4.16 was released), and saw some failures there, in particular
> the use-after-free which seems to be caused by generic/388:
> 
> [25302.342348] ==================================================================
> [25302.348851] BUG: KASAN: use-after-free in xfs_rui_release+0x1ce/0x220
> [25302.355334] Read of size 4 at addr ffff8801600f3630 by task fsstress/6274
> [25302.360678]
> [25302.360678] CPU: 7 PID: 6274 Comm: fsstress Not tainted 4.16.0-rc7+ #22
> [25302.360678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [25302.360678] Call Trace:
> [25302.360678]  dump_stack+0xd6/0x184
> [25302.360678]  ? _atomic_dec_and_lock+0x2cc/0x2cc
> [25302.360678]  ? show_regs_print_info+0x5/0x5
> [25302.360678]  print_address_description+0x75/0x430
> [25302.360678]  ? xfs_rui_release+0x1ce/0x220
> [25302.360678]  kasan_report+0x1d8/0x460
> [25302.360678]  ? xfs_rui_release+0x1ce/0x220
> [25302.360678]  xfs_rui_release+0x1ce/0x220
> [25302.407137]  ? xfs_rui_copy_format+0x100/0x100
> [25302.407137]  xfs_defer_trans_abort+0x1e6/0xac0

I'm sure this was fixed. It's simply a case of having the code call
the release function rather than freeing the item it directly on
abort.

But I can't find the patch, commit, etc.

Ah, a drive-by bug fix back to the EFI code in January, didn't pass
review, was not followed up on with fixes required. Just posted a
patch to fix it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-04-03  1:46 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-23  6:01 [PATCH] xfs: always free inline data before resetting inode fork during ifree Darrick J. Wong
2017-11-23  8:14 ` Christoph Hellwig
2018-03-23  1:30 ` Luis R. Rodriguez
2018-03-23  3:41   ` Darrick J. Wong
2018-03-23 17:08     ` Luis R. Rodriguez
2018-03-23 17:26       ` Darrick J. Wong
2018-03-23 18:23         ` Luis R. Rodriguez
2018-03-24  9:06           ` Greg Kroah-Hartman
2018-03-24 17:21             ` Darrick J. Wong
2018-03-26  4:54               ` Sasha Levin
2018-03-26  6:48                 ` Darrick J. Wong
2018-03-26 17:39                 ` Luis R. Rodriguez
2018-03-25 22:33           ` Dave Chinner
2018-03-26 23:54             ` Sasha Levin
2018-03-27  7:06               ` Michal Hocko
2018-03-27 19:54                 ` Luis R. Rodriguez
2018-03-28 13:21                   ` Michal Hocko
2018-03-28 19:33                     ` Sasha Levin
2018-03-29  7:01                       ` Michal Hocko
2018-03-28  1:11                 ` Sasha Levin
2018-03-28 13:20                   ` Michal Hocko
2018-03-28  3:32               ` Dave Chinner
2018-03-28 19:30                 ` Sasha Levin
2018-03-28 19:40                   ` Darrick J. Wong
2018-03-28 23:05                   ` Dave Chinner
2018-03-29 18:12                     ` Luis R. Rodriguez
2018-03-29 18:17                       ` Josef Bacik
2018-03-29 18:36                         ` Sasha Levin
2018-03-30  2:47                     ` Sasha Levin
2018-03-30 19:49                       ` Luis R. Rodriguez
2018-04-02  0:35                         ` Sasha Levin
2018-03-31 22:02                       ` Dave Chinner
2018-04-02  0:32                         ` Sasha Levin
2018-04-03  1:46                           ` Dave Chinner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.