All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers
@ 2018-03-21 16:13 Brian Foster
  2018-03-21 17:27 ` Darrick J. Wong
  2018-03-24  0:54 ` Darrick J. Wong
  0 siblings, 2 replies; 7+ messages in thread
From: Brian Foster @ 2018-03-21 16:13 UTC (permalink / raw)
  To: linux-xfs

Most of the generic data structures embedded in xfs_mount are
dynamically initialized immediately after mp is allocated. A few
fields are left out and initialized during the xfs_mountfs()
sequence, after mp has been attached to the superblock.

To clean this up and help prevent premature access of associated
fields, refactor xfs_mount allocation and all dependent init calls
into a new helper. This self-documents that all low level data
structures (i.e., locks, trees, etc.) should be initialized before
xfs_mount is attached to the superblock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---

I realized after the fact that this steps on the ability to instrument
the use-before-init variant[1] of the problematic radix tree access on
failed mount issue that Dave tracked down. This is because we'd
immediately initialize the structure and thus a subsequent memset()
would ultimately require another proper initialization.

As Dave noted in that thread, we could alternatively instrument the
use-after-free variant of the problem with a post-free delay of the mp.
I actually wonder if a last step instrumented failure in fill_super()
might be a more broadly useful test because it provides future coverage
of the entire mount teardown sequence rather than just the bad radix
tree access.

On the flipside, this patch isn't the cleanup of the century so I'd be
fine with dropping it in favor of [1], but I think it's worthy enough to
consider.

Brian

[1] https://marc.info/?l=linux-xfs&m=152152202700598&w=2

 fs/xfs/libxfs/xfs_sb.c |  1 -
 fs/xfs/xfs_mount.c     |  2 --
 fs/xfs/xfs_super.c     | 39 +++++++++++++++++++++++++++++----------
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index a55f7a45fa78..53433cc024fd 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -731,7 +731,6 @@ xfs_sb_mount_common(
 	struct xfs_sb	*sbp)
 {
 	mp->m_agfrotor = mp->m_agirotor = 0;
-	spin_lock_init(&mp->m_agirotor_lock);
 	mp->m_maxagi = mp->m_sb.sb_agcount;
 	mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
 	mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 6f5a5e6764d8..a901b86772f8 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -817,8 +817,6 @@ xfs_mountfs(
 	/*
 	 * Allocate and initialize the per-ag data.
 	 */
-	spin_lock_init(&mp->m_perag_lock);
-	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
 	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
 	if (error) {
 		xfs_warn(mp, "Failed per-ag init: %d", error);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 951271f57d00..612c1d5348b3 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1579,29 +1579,48 @@ xfs_destroy_percpu_counters(
 	percpu_counter_destroy(&mp->m_fdblocks);
 }
 
-STATIC int
-xfs_fs_fill_super(
-	struct super_block	*sb,
-	void			*data,
-	int			silent)
+static struct xfs_mount *
+xfs_mount_alloc(
+	struct super_block	*sb)
 {
-	struct inode		*root;
-	struct xfs_mount	*mp = NULL;
-	int			flags = 0, error = -ENOMEM;
+	struct xfs_mount	*mp;
 
 	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL);
 	if (!mp)
-		goto out;
+		return NULL;
 
+	mp->m_super = sb;
 	spin_lock_init(&mp->m_sb_lock);
+	spin_lock_init(&mp->m_agirotor_lock);
+	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
+	spin_lock_init(&mp->m_perag_lock);
 	mutex_init(&mp->m_growlock);
 	atomic_set(&mp->m_active_trans, 0);
 	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
 	INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker);
 	INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
 	mp->m_kobj.kobject.kset = xfs_kset;
+	return mp;
+}
 
-	mp->m_super = sb;
+
+STATIC int
+xfs_fs_fill_super(
+	struct super_block	*sb,
+	void			*data,
+	int			silent)
+{
+	struct inode		*root;
+	struct xfs_mount	*mp = NULL;
+	int			flags = 0, error = -ENOMEM;
+
+	/*
+	 * allocate mp and do all low-level struct initializations before we
+	 * attach it to the super
+	 */
+	mp = xfs_mount_alloc(sb);
+	if (!mp)
+		goto out;
 	sb->s_fs_info = mp;
 
 	error = xfs_parseargs(mp, (char *)data);
-- 
2.13.6


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers
  2018-03-21 16:13 [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers Brian Foster
@ 2018-03-21 17:27 ` Darrick J. Wong
  2018-03-21 18:07   ` Brian Foster
  2018-03-24  0:54 ` Darrick J. Wong
  1 sibling, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2018-03-21 17:27 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, Mar 21, 2018 at 12:13:25PM -0400, Brian Foster wrote:
> Most of the generic data structures embedded in xfs_mount are
> dynamically initialized immediately after mp is allocated. A few
> fields are left out and initialized during the xfs_mountfs()
> sequence, after mp has been attached to the superblock.
> 
> To clean this up and help prevent premature access of associated
> fields, refactor xfs_mount allocation and all dependent init calls
> into a new helper. This self-documents that all low level data
> structures (i.e., locks, trees, etc.) should be initialized before
> xfs_mount is attached to the superblock.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> 
> I realized after the fact that this steps on the ability to instrument
> the use-before-init variant[1] of the problematic radix tree access on
> failed mount issue that Dave tracked down. This is because we'd
> immediately initialize the structure and thus a subsequent memset()
> would ultimately require another proper initialization.

That's simply a matter of re-calling

	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);

after the msleep, right?

> As Dave noted in that thread, we could alternatively instrument the
> use-after-free variant of the problem with a post-free delay of the mp.
> I actually wonder if a last step instrumented failure in fill_super()
> might be a more broadly useful test because it provides future coverage
> of the entire mount teardown sequence rather than just the bad radix
> tree access.

That sounds like a very good idea, there's a lot of complex stuff that
has to be unwound and at least I've stumbled over getting it right.

> On the flipside, this patch isn't the cleanup of the century so I'd be
> fine with dropping it in favor of [1], but I think it's worthy enough to
> consider.
> 
> Brian
> 
> [1] https://marc.info/?l=linux-xfs&m=152152202700598&w=2
> 
>  fs/xfs/libxfs/xfs_sb.c |  1 -
>  fs/xfs/xfs_mount.c     |  2 --
>  fs/xfs/xfs_super.c     | 39 +++++++++++++++++++++++++++++----------
>  3 files changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index a55f7a45fa78..53433cc024fd 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -731,7 +731,6 @@ xfs_sb_mount_common(
>  	struct xfs_sb	*sbp)
>  {
>  	mp->m_agfrotor = mp->m_agirotor = 0;

Can this be removed since we're using kzalloc now?

Mostly looks ok to me, though I wonder if the INIT_DONE flag can be
replaced with radix_tree_empty() now that we're guaranteed never to see
an mp with an uninitialized perag radix tree?

--D

> -	spin_lock_init(&mp->m_agirotor_lock);
>  	mp->m_maxagi = mp->m_sb.sb_agcount;
>  	mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
>  	mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 6f5a5e6764d8..a901b86772f8 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -817,8 +817,6 @@ xfs_mountfs(
>  	/*
>  	 * Allocate and initialize the per-ag data.
>  	 */
> -	spin_lock_init(&mp->m_perag_lock);
> -	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
>  	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
>  	if (error) {
>  		xfs_warn(mp, "Failed per-ag init: %d", error);
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 951271f57d00..612c1d5348b3 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1579,29 +1579,48 @@ xfs_destroy_percpu_counters(
>  	percpu_counter_destroy(&mp->m_fdblocks);
>  }
>  
> -STATIC int
> -xfs_fs_fill_super(
> -	struct super_block	*sb,
> -	void			*data,
> -	int			silent)
> +static struct xfs_mount *
> +xfs_mount_alloc(
> +	struct super_block	*sb)
>  {
> -	struct inode		*root;
> -	struct xfs_mount	*mp = NULL;
> -	int			flags = 0, error = -ENOMEM;
> +	struct xfs_mount	*mp;
>  
>  	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL);
>  	if (!mp)
> -		goto out;
> +		return NULL;
>  
> +	mp->m_super = sb;
>  	spin_lock_init(&mp->m_sb_lock);
> +	spin_lock_init(&mp->m_agirotor_lock);
> +	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> +	spin_lock_init(&mp->m_perag_lock);
>  	mutex_init(&mp->m_growlock);
>  	atomic_set(&mp->m_active_trans, 0);
>  	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
>  	INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker);
>  	INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
>  	mp->m_kobj.kobject.kset = xfs_kset;
> +	return mp;
> +}
>  
> -	mp->m_super = sb;
> +
> +STATIC int
> +xfs_fs_fill_super(
> +	struct super_block	*sb,
> +	void			*data,
> +	int			silent)
> +{
> +	struct inode		*root;
> +	struct xfs_mount	*mp = NULL;
> +	int			flags = 0, error = -ENOMEM;
> +
> +	/*
> +	 * allocate mp and do all low-level struct initializations before we
> +	 * attach it to the super
> +	 */
> +	mp = xfs_mount_alloc(sb);
> +	if (!mp)
> +		goto out;
>  	sb->s_fs_info = mp;
>  
>  	error = xfs_parseargs(mp, (char *)data);
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers
  2018-03-21 17:27 ` Darrick J. Wong
@ 2018-03-21 18:07   ` Brian Foster
  2018-03-22  0:19     ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Brian Foster @ 2018-03-21 18:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Mar 21, 2018 at 10:27:28AM -0700, Darrick J. Wong wrote:
> On Wed, Mar 21, 2018 at 12:13:25PM -0400, Brian Foster wrote:
> > Most of the generic data structures embedded in xfs_mount are
> > dynamically initialized immediately after mp is allocated. A few
> > fields are left out and initialized during the xfs_mountfs()
> > sequence, after mp has been attached to the superblock.
> > 
> > To clean this up and help prevent premature access of associated
> > fields, refactor xfs_mount allocation and all dependent init calls
> > into a new helper. This self-documents that all low level data
> > structures (i.e., locks, trees, etc.) should be initialized before
> > xfs_mount is attached to the superblock.
> > 
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > 
> > I realized after the fact that this steps on the ability to instrument
> > the use-before-init variant[1] of the problematic radix tree access on
> > failed mount issue that Dave tracked down. This is because we'd
> > immediately initialize the structure and thus a subsequent memset()
> > would ultimately require another proper initialization.
> 
> That's simply a matter of re-calling
> 
> 	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> 
> after the msleep, right?
> 

I suppose that may work (I'll let Dave chime in)...

> > As Dave noted in that thread, we could alternatively instrument the
> > use-after-free variant of the problem with a post-free delay of the mp.
> > I actually wonder if a last step instrumented failure in fill_super()
> > might be a more broadly useful test because it provides future coverage
> > of the entire mount teardown sequence rather than just the bad radix
> > tree access.
> 
> That sounds like a very good idea, there's a lot of complex stuff that
> has to be unwound and at least I've stumbled over getting it right.
> 

... but it sounds like we agree this may be preferable anyways. ;)

> > On the flipside, this patch isn't the cleanup of the century so I'd be
> > fine with dropping it in favor of [1], but I think it's worthy enough to
> > consider.
> > 
> > Brian
> > 
> > [1] https://marc.info/?l=linux-xfs&m=152152202700598&w=2
> > 
> >  fs/xfs/libxfs/xfs_sb.c |  1 -
> >  fs/xfs/xfs_mount.c     |  2 --
> >  fs/xfs/xfs_super.c     | 39 +++++++++++++++++++++++++++++----------
> >  3 files changed, 29 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> > index a55f7a45fa78..53433cc024fd 100644
> > --- a/fs/xfs/libxfs/xfs_sb.c
> > +++ b/fs/xfs/libxfs/xfs_sb.c
> > @@ -731,7 +731,6 @@ xfs_sb_mount_common(
> >  	struct xfs_sb	*sbp)
> >  {
> >  	mp->m_agfrotor = mp->m_agirotor = 0;
> 
> Can this be removed since we're using kzalloc now?
> 

Hm, that's one example where we actually don't rely on the zeroed
allocation. I suppose we could remove that, but note this patch doesn't
introduce the use of kzalloc(). I actually don't mind seeing code like
the above if the developer wants to be clear about intent. (I also don't
consider it a requirement given the use of kzalloc(), so just my .02).

> Mostly looks ok to me, though I wonder if the INIT_DONE flag can be
> replaced with radix_tree_empty() now that we're guaranteed never to see
> an mp with an uninitialized perag radix tree?
> 

I'm not sure we even need that. The lookup should just return NULL if
the tree is empty or partially populated. That's probably what already
happens today due to kzalloc(), so this patch is really just more of a
cleanup...

(It sounds like Dave was looking into more generic shrinker changes and
thus may have other reasons to avoid the flag anyways, but that's for
the other[1] thread...).

Brian

> --D
> 
> > -	spin_lock_init(&mp->m_agirotor_lock);
> >  	mp->m_maxagi = mp->m_sb.sb_agcount;
> >  	mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
> >  	mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index 6f5a5e6764d8..a901b86772f8 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -817,8 +817,6 @@ xfs_mountfs(
> >  	/*
> >  	 * Allocate and initialize the per-ag data.
> >  	 */
> > -	spin_lock_init(&mp->m_perag_lock);
> > -	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> >  	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
> >  	if (error) {
> >  		xfs_warn(mp, "Failed per-ag init: %d", error);
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 951271f57d00..612c1d5348b3 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1579,29 +1579,48 @@ xfs_destroy_percpu_counters(
> >  	percpu_counter_destroy(&mp->m_fdblocks);
> >  }
> >  
> > -STATIC int
> > -xfs_fs_fill_super(
> > -	struct super_block	*sb,
> > -	void			*data,
> > -	int			silent)
> > +static struct xfs_mount *
> > +xfs_mount_alloc(
> > +	struct super_block	*sb)
> >  {
> > -	struct inode		*root;
> > -	struct xfs_mount	*mp = NULL;
> > -	int			flags = 0, error = -ENOMEM;
> > +	struct xfs_mount	*mp;
> >  
> >  	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL);
> >  	if (!mp)
> > -		goto out;
> > +		return NULL;
> >  
> > +	mp->m_super = sb;
> >  	spin_lock_init(&mp->m_sb_lock);
> > +	spin_lock_init(&mp->m_agirotor_lock);
> > +	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> > +	spin_lock_init(&mp->m_perag_lock);
> >  	mutex_init(&mp->m_growlock);
> >  	atomic_set(&mp->m_active_trans, 0);
> >  	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
> >  	INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker);
> >  	INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
> >  	mp->m_kobj.kobject.kset = xfs_kset;
> > +	return mp;
> > +}
> >  
> > -	mp->m_super = sb;
> > +
> > +STATIC int
> > +xfs_fs_fill_super(
> > +	struct super_block	*sb,
> > +	void			*data,
> > +	int			silent)
> > +{
> > +	struct inode		*root;
> > +	struct xfs_mount	*mp = NULL;
> > +	int			flags = 0, error = -ENOMEM;
> > +
> > +	/*
> > +	 * allocate mp and do all low-level struct initializations before we
> > +	 * attach it to the super
> > +	 */
> > +	mp = xfs_mount_alloc(sb);
> > +	if (!mp)
> > +		goto out;
> >  	sb->s_fs_info = mp;
> >  
> >  	error = xfs_parseargs(mp, (char *)data);
> > -- 
> > 2.13.6
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers
  2018-03-21 18:07   ` Brian Foster
@ 2018-03-22  0:19     ` Dave Chinner
  2018-03-24  0:52       ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2018-03-22  0:19 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, linux-xfs

On Wed, Mar 21, 2018 at 02:07:38PM -0400, Brian Foster wrote:
> On Wed, Mar 21, 2018 at 10:27:28AM -0700, Darrick J. Wong wrote:
> > On Wed, Mar 21, 2018 at 12:13:25PM -0400, Brian Foster wrote:
> > > Most of the generic data structures embedded in xfs_mount are
> > > dynamically initialized immediately after mp is allocated. A few
> > > fields are left out and initialized during the xfs_mountfs()
> > > sequence, after mp has been attached to the superblock.
> > > 
> > > To clean this up and help prevent premature access of associated
> > > fields, refactor xfs_mount allocation and all dependent init calls
> > > into a new helper. This self-documents that all low level data
> > > structures (i.e., locks, trees, etc.) should be initialized before
> > > xfs_mount is attached to the superblock.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > I realized after the fact that this steps on the ability to instrument
> > > the use-before-init variant[1] of the problematic radix tree access on
> > > failed mount issue that Dave tracked down. This is because we'd
> > > immediately initialize the structure and thus a subsequent memset()
> > > would ultimately require another proper initialization.
> > 
> > That's simply a matter of re-calling
> > 
> > 	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> > 
> > after the msleep, right?
> > 
> 
> I suppose that may work (I'll let Dave chime in)...

Well, to tell the truth, this whole "VFS accesses the fs during
mount: problem is isolated to only the inode cache shrinker, and it
while a mount is in progress it does not run inode cache scans
(can't lock s_umount because the mount holds it locked).

IOWs, this crash can be fixed entirely by modifying the VFS code as
I mentioned to avoid super_cache_count() doing any work during mount
and unmount.

So, really, none of these XFS changes are necessary to fix the
problem that was reported. I'm not opposed to it as a cleanup, but
let's not conflate it with the VFS superblock shrinker bug that
needs to be fixed....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers
  2018-03-22  0:19     ` Dave Chinner
@ 2018-03-24  0:52       ` Darrick J. Wong
  2018-03-25 22:17         ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2018-03-24  0:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Brian Foster, linux-xfs

On Thu, Mar 22, 2018 at 11:19:45AM +1100, Dave Chinner wrote:
> On Wed, Mar 21, 2018 at 02:07:38PM -0400, Brian Foster wrote:
> > On Wed, Mar 21, 2018 at 10:27:28AM -0700, Darrick J. Wong wrote:
> > > On Wed, Mar 21, 2018 at 12:13:25PM -0400, Brian Foster wrote:
> > > > Most of the generic data structures embedded in xfs_mount are
> > > > dynamically initialized immediately after mp is allocated. A few
> > > > fields are left out and initialized during the xfs_mountfs()
> > > > sequence, after mp has been attached to the superblock.
> > > > 
> > > > To clean this up and help prevent premature access of associated
> > > > fields, refactor xfs_mount allocation and all dependent init calls
> > > > into a new helper. This self-documents that all low level data
> > > > structures (i.e., locks, trees, etc.) should be initialized before
> > > > xfs_mount is attached to the superblock.
> > > > 
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > > 
> > > > I realized after the fact that this steps on the ability to instrument
> > > > the use-before-init variant[1] of the problematic radix tree access on
> > > > failed mount issue that Dave tracked down. This is because we'd
> > > > immediately initialize the structure and thus a subsequent memset()
> > > > would ultimately require another proper initialization.
> > > 
> > > That's simply a matter of re-calling
> > > 
> > > 	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> > > 
> > > after the msleep, right?
> > > 
> > 
> > I suppose that may work (I'll let Dave chime in)...
> 
> Well, to tell the truth, this whole "VFS accesses the fs during
> mount: problem is isolated to only the inode cache shrinker, and it
> while a mount is in progress it does not run inode cache scans
> (can't lock s_umount because the mount holds it locked).
> 
> IOWs, this crash can be fixed entirely by modifying the VFS code as
> I mentioned to avoid super_cache_count() doing any work during mount
> and unmount.
> 
> So, really, none of these XFS changes are necessary to fix the
> problem that was reported. I'm not opposed to it as a cleanup, but
> let's not conflate it with the VFS superblock shrinker bug that
> needs to be fixed....

Are you working on such a patch?  I guessed that one could change the
vfs to avoid register_shrinker until after fill_super returns?  Or
possibly just have super_cache_count bail out if !SB_ACTIVE, though that
could be messy given that xfs (and others) mess with that flag.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers
  2018-03-21 16:13 [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers Brian Foster
  2018-03-21 17:27 ` Darrick J. Wong
@ 2018-03-24  0:54 ` Darrick J. Wong
  1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2018-03-24  0:54 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Wed, Mar 21, 2018 at 12:13:25PM -0400, Brian Foster wrote:
> Most of the generic data structures embedded in xfs_mount are
> dynamically initialized immediately after mp is allocated. A few
> fields are left out and initialized during the xfs_mountfs()
> sequence, after mp has been attached to the superblock.
> 
> To clean this up and help prevent premature access of associated
> fields, refactor xfs_mount allocation and all dependent init calls
> into a new helper. This self-documents that all low level data
> structures (i.e., locks, trees, etc.) should be initialized before
> xfs_mount is attached to the superblock.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>

Looks ok enough to test,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> ---
> 
> I realized after the fact that this steps on the ability to instrument
> the use-before-init variant[1] of the problematic radix tree access on
> failed mount issue that Dave tracked down. This is because we'd
> immediately initialize the structure and thus a subsequent memset()
> would ultimately require another proper initialization.
> 
> As Dave noted in that thread, we could alternatively instrument the
> use-after-free variant of the problem with a post-free delay of the mp.
> I actually wonder if a last step instrumented failure in fill_super()
> might be a more broadly useful test because it provides future coverage
> of the entire mount teardown sequence rather than just the bad radix
> tree access.
> 
> On the flipside, this patch isn't the cleanup of the century so I'd be
> fine with dropping it in favor of [1], but I think it's worthy enough to
> consider.
> 
> Brian
> 
> [1] https://marc.info/?l=linux-xfs&m=152152202700598&w=2
> 
>  fs/xfs/libxfs/xfs_sb.c |  1 -
>  fs/xfs/xfs_mount.c     |  2 --
>  fs/xfs/xfs_super.c     | 39 +++++++++++++++++++++++++++++----------
>  3 files changed, 29 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
> index a55f7a45fa78..53433cc024fd 100644
> --- a/fs/xfs/libxfs/xfs_sb.c
> +++ b/fs/xfs/libxfs/xfs_sb.c
> @@ -731,7 +731,6 @@ xfs_sb_mount_common(
>  	struct xfs_sb	*sbp)
>  {
>  	mp->m_agfrotor = mp->m_agirotor = 0;
> -	spin_lock_init(&mp->m_agirotor_lock);
>  	mp->m_maxagi = mp->m_sb.sb_agcount;
>  	mp->m_blkbit_log = sbp->sb_blocklog + XFS_NBBYLOG;
>  	mp->m_blkbb_log = sbp->sb_blocklog - BBSHIFT;
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 6f5a5e6764d8..a901b86772f8 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -817,8 +817,6 @@ xfs_mountfs(
>  	/*
>  	 * Allocate and initialize the per-ag data.
>  	 */
> -	spin_lock_init(&mp->m_perag_lock);
> -	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
>  	error = xfs_initialize_perag(mp, sbp->sb_agcount, &mp->m_maxagi);
>  	if (error) {
>  		xfs_warn(mp, "Failed per-ag init: %d", error);
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 951271f57d00..612c1d5348b3 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1579,29 +1579,48 @@ xfs_destroy_percpu_counters(
>  	percpu_counter_destroy(&mp->m_fdblocks);
>  }
>  
> -STATIC int
> -xfs_fs_fill_super(
> -	struct super_block	*sb,
> -	void			*data,
> -	int			silent)
> +static struct xfs_mount *
> +xfs_mount_alloc(
> +	struct super_block	*sb)
>  {
> -	struct inode		*root;
> -	struct xfs_mount	*mp = NULL;
> -	int			flags = 0, error = -ENOMEM;
> +	struct xfs_mount	*mp;
>  
>  	mp = kzalloc(sizeof(struct xfs_mount), GFP_KERNEL);
>  	if (!mp)
> -		goto out;
> +		return NULL;
>  
> +	mp->m_super = sb;
>  	spin_lock_init(&mp->m_sb_lock);
> +	spin_lock_init(&mp->m_agirotor_lock);
> +	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> +	spin_lock_init(&mp->m_perag_lock);
>  	mutex_init(&mp->m_growlock);
>  	atomic_set(&mp->m_active_trans, 0);
>  	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
>  	INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker);
>  	INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker);
>  	mp->m_kobj.kobject.kset = xfs_kset;
> +	return mp;
> +}
>  
> -	mp->m_super = sb;
> +
> +STATIC int
> +xfs_fs_fill_super(
> +	struct super_block	*sb,
> +	void			*data,
> +	int			silent)
> +{
> +	struct inode		*root;
> +	struct xfs_mount	*mp = NULL;
> +	int			flags = 0, error = -ENOMEM;
> +
> +	/*
> +	 * allocate mp and do all low-level struct initializations before we
> +	 * attach it to the super
> +	 */
> +	mp = xfs_mount_alloc(sb);
> +	if (!mp)
> +		goto out;
>  	sb->s_fs_info = mp;
>  
>  	error = xfs_parseargs(mp, (char *)data);
> -- 
> 2.13.6
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers
  2018-03-24  0:52       ` Darrick J. Wong
@ 2018-03-25 22:17         ` Dave Chinner
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2018-03-25 22:17 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On Fri, Mar 23, 2018 at 05:52:03PM -0700, Darrick J. Wong wrote:
> On Thu, Mar 22, 2018 at 11:19:45AM +1100, Dave Chinner wrote:
> > On Wed, Mar 21, 2018 at 02:07:38PM -0400, Brian Foster wrote:
> > > On Wed, Mar 21, 2018 at 10:27:28AM -0700, Darrick J. Wong wrote:
> > > > On Wed, Mar 21, 2018 at 12:13:25PM -0400, Brian Foster wrote:
> > > > > Most of the generic data structures embedded in xfs_mount are
> > > > > dynamically initialized immediately after mp is allocated. A few
> > > > > fields are left out and initialized during the xfs_mountfs()
> > > > > sequence, after mp has been attached to the superblock.
> > > > > 
> > > > > To clean this up and help prevent premature access of associated
> > > > > fields, refactor xfs_mount allocation and all dependent init calls
> > > > > into a new helper. This self-documents that all low level data
> > > > > structures (i.e., locks, trees, etc.) should be initialized before
> > > > > xfs_mount is attached to the superblock.
> > > > > 
> > > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > > ---
> > > > > 
> > > > > I realized after the fact that this steps on the ability to instrument
> > > > > the use-before-init variant[1] of the problematic radix tree access on
> > > > > failed mount issue that Dave tracked down. This is because we'd
> > > > > immediately initialize the structure and thus a subsequent memset()
> > > > > would ultimately require another proper initialization.
> > > > 
> > > > That's simply a matter of re-calling
> > > > 
> > > > 	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
> > > > 
> > > > after the msleep, right?
> > > > 
> > > 
> > > I suppose that may work (I'll let Dave chime in)...
> > 
> > Well, to tell the truth, this whole "VFS accesses the fs during
> > mount: problem is isolated to only the inode cache shrinker, and it
> > while a mount is in progress it does not run inode cache scans
> > (can't lock s_umount because the mount holds it locked).
> > 
> > IOWs, this crash can be fixed entirely by modifying the VFS code as
> > I mentioned to avoid super_cache_count() doing any work during mount
> > and unmount.
> > 
> > So, really, none of these XFS changes are necessary to fix the
> > problem that was reported. I'm not opposed to it as a cleanup, but
> > let's not conflate it with the VFS superblock shrinker bug that
> > needs to be fixed....
> 
> Are you working on such a patch?

Yes.

> I guessed that one could change the
> vfs to avoid register_shrinker until after fill_super returns?  Or
> possibly just have super_cache_count bail out if !SB_ACTIVE, though that
> could be messy given that xfs (and others) mess with that flag.

I think it's fine for the shrinker to check SB_ACTIVE, because if
it's not set then then inodes are not placed on thr LRUs that the
shrinker works on.  The problem is that the shrinker can run before
filesystems assert SB_ACTIVE, even though there will be nothing for
it to do and the filesystem has not set everything up. As such, I
think it's the right thing to do to match SB_ACTIVE checks from
iput_final() to super_cache_cache().

FWIW, if a filesystem is setting SB_ACTIVE earlier than the VFS,
then it better make sure it's already done all the setup it needs
to. The filesystem has to do that setup anyway (as the VFS requires
it), so it's just a matter of what order we do things in
->fill_super.

In XFS we have finished all the setup of the structures the shrinker
is dependent on by the time we set SB_ACTIVE in log recovery. Hence
I don't really see any problems with XFS or any of the other 
filesystems that set SB_ACTIVE in their mount routines.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-03-25 22:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-21 16:13 [PATCH] xfs: clean up xfs_mount allocation and dynamic initializers Brian Foster
2018-03-21 17:27 ` Darrick J. Wong
2018-03-21 18:07   ` Brian Foster
2018-03-22  0:19     ` Dave Chinner
2018-03-24  0:52       ` Darrick J. Wong
2018-03-25 22:17         ` Dave Chinner
2018-03-24  0:54 ` Darrick J. Wong

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.