From: Dave Chinner <david@fromorbit.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	djwong@kernel.org, "Theodore Ts'o" <tytso@mit.edu>,
	Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.cz>,
	Jan Kara <jack@suse.cz>,
	ceph-devel@vger.kernel.org, cluster-devel@redhat.com,
	linux-nfs@vger.kernel.org, logfs@logfs.org,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-mtd@lists.infradead.org,
	reiserfs-devel@vger.kernel.org,
	linux-ntfs-dev@lists.sourceforge.net,
	linux-f2fs-devel@lists.sourceforge.net,
	linux-afs@lists.infradead.org,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/6] xfs: use memalloc_nofs_{save,restore} instead of memalloc_noio*
Date: Tue, 7 Feb 2017 09:51:50 +1100
Message-ID: <20170206225150.GB12125@dastard>
In-Reply-To: <20170206184743.GB20731@dhcp22.suse.cz>

On Mon, Feb 06, 2017 at 07:47:43PM +0100, Michal Hocko wrote:
> On Mon 06-02-17 10:32:37, Darrick J. Wong wrote:
> > On Mon, Feb 06, 2017 at 06:44:15PM +0100, Michal Hocko wrote:
> > > On Mon 06-02-17 07:39:23, Matthew Wilcox wrote:
> > > > On Mon, Feb 06, 2017 at 03:07:16PM +0100, Michal Hocko wrote:
> > > > > +++ b/fs/xfs/xfs_buf.c
> > > > > @@ -442,17 +442,17 @@ _xfs_buf_map_pages(
> > > > >  		bp->b_addr = NULL;
> > > > >  	} else {
> > > > >  		int retried = 0;
> > > > > -		unsigned noio_flag;
> > > > > +		unsigned nofs_flag;
> > > > >  
> > > > >  		/*
> > > > >  		 * vm_map_ram() will allocate auxillary structures (e.g.
> > > > >  		 * pagetables) with GFP_KERNEL, yet we are likely to be under
> > > > >  		 * GFP_NOFS context here. Hence we need to tell memory reclaim
> > > > > -		 * that we are in such a context via PF_MEMALLOC_NOIO to prevent
> > > > > +		 * that we are in such a context via PF_MEMALLOC_NOFS to prevent
> > > > >  		 * memory reclaim re-entering the filesystem here and
> > > > >  		 * potentially deadlocking.
> > > > >  		 */
> > > > 
> > > > This comment feels out of date ... how about:
> > > 
> > > which part is out of date?
> > > 
> > > > 
> > > > 		/*
> > > > 		 * vm_map_ram will allocate auxiliary structures (eg page
> > > > 		 * tables) with GFP_KERNEL.  If that tries to reclaim memory
> > > > 		 * by calling back into this filesystem, we may deadlock.
> > > > 		 * Prevent that by setting the NOFS flag.
> > > > 		 */
> > > 
> > > dunno, the previous wording seems clear enough to me. Maybe little bit
> > > more chatty than yours but I am not sure this is worth changing.
> > 
> > I prefer to keep the "...yet we are likely to be under GFP_NOFS..."
> > wording of the old comment because it captures the uncertainty of
> > whether or not we actually are already under NOFS.  If someone actually
> > has audited this code well enough to know for sure then yes let's change
> > the comment, but I haven't gone that far.
> 
> I believe we can drop the memalloc_nofs_save then as well because either
> we are called from a potentially dangerous context and thus we are in
> the nofs scope or we do not need the protection at all.

No, absolutely not. "Belief" is not a sufficient justification for
removing low level deadlock avoidance infrastructure. This code
needs to remain in _xfs_buf_map_pages() until a full audit of the
caller paths is done and we're 100% certain that there are no
lurking deadlocks.

For example, I'm pretty sure we can call into _xfs_buf_map_pages()
outside of a transaction context but with an inode ILOCK held
exclusively. If we then recurse into memory reclaim and try to run a
transaction during reclaim, we have an inverted ILOCK vs transaction
locking order. i.e. we are not allowed to call xfs_trans_reserve()
with an ILOCK held as that can deadlock the log:  log full, locked
inode pins tail of log, inode cannot be flushed because ILOCK is
held by caller waiting for log space to become available....

i.e. there are certain situations where holding an ILOCK is a
deadlock vector. See xfs_lock_inodes() for an example of the lengths
we go to avoid ILOCK-based log deadlocks like this...
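
To make the scoping argument concrete, here is a rough userspace model
of the save/restore pattern being discussed. It is only a sketch: every
model_* name and MODEL_* constant is made up for illustration and is
not the kernel API, and the task flag word is a plain static variable
standing in for current->flags. It assumes the allocator masks out the
FS bit of a requested gfp mask while the task is inside a NOFS scope.

```c
#include <assert.h>

/* Userspace model only: these names mirror, but are not, the kernel API. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u	/* hypothetical task flag bit */
#define MODEL_GFP_IO           0x1u	/* hypothetical gfp bits */
#define MODEL_GFP_FS           0x2u
#define MODEL_GFP_KERNEL       (MODEL_GFP_IO | MODEL_GFP_FS)

/* Stand-in for the flags word of the running task. */
static unsigned int model_task_flags;

/* Enter a NOFS scope; return the previous state of the NOFS bit. */
static unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

	model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
	return old;
}

/* Leave a NOFS scope, putting the NOFS bit back to its saved state. */
static void model_nofs_restore(unsigned int old)
{
	/* Only the NOFS bit may be carried in the saved value. */
	assert((old & ~MODEL_PF_MEMALLOC_NOFS) == 0);
	model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* The mask the allocator would really use for a requested mask. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_MEMALLOC_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/*
 * Stand-in for a callee such as vm_map_ram() that hard-codes GFP_KERNEL.
 * Returns nonzero if reclaim triggered here may re-enter the filesystem.
 */
static int model_callee_may_enter_fs(void)
{
	return (model_effective_gfp(MODEL_GFP_KERNEL) & MODEL_GFP_FS) != 0;
}

/* Shape of the _xfs_buf_map_pages() pattern: wrap the callee in a scope. */
static int model_map_pages(void)
{
	unsigned int nofs_flag = model_nofs_save();
	int may_enter_fs = model_callee_may_enter_fs();

	model_nofs_restore(nofs_flag);
	return may_enter_fs;
}
```

In this model the inner scope is harmless when the caller is already in
a NOFS scope, because restore puts back whatever state it found. It is
also the only protection for a caller that is not in such a scope, e.g.
one holding an ILOCK outside of a transaction, which is why it should
not be dropped without an audit of the callers.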

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

