All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org
Subject: Re: [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS
Date: Thu, 18 Jan 2024 15:32:14 -0800	[thread overview]
Message-ID: <20240118233214.GJ674499@frogsfrogsfrogs> (raw)
In-Reply-To: <20240115230113.4080105-8-david@fromorbit.com>

On Tue, Jan 16, 2024 at 09:59:45AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In the past we've had problems with lockdep false positives stemming
> from inode locking occurring in memory reclaim contexts (e.g. from
> superblock shrinkers). Lockdep doesn't know that inodes access from
> above memory reclaim cannot be accessed from below memory reclaim
> (and vice versa) but there has never been a good solution to solving
> this problem with lockdep annotations.
> 
> This situation isn't unique to inode locks - buffers are also locked
> above and below memory reclaim, and we have to maintain lock
> ordering for them - and against inodes - appropriately. IOWs, the
> same code paths and locks are taken both above and below memory
> reclaim and so we always need to make sure the lock orders are
> consistent. We are spared the lockdep problems this might cause
> by the fact that semaphores and bit locks aren't covered by lockdep.
> 
> In general, this sort of lockdep false positive detection is cause
> by code that runs GFP_KERNEL memory allocation with an actively
> referenced inode locked. When it is run from a transaction, memory
> allocation is automatically GFP_NOFS, so we don't have reclaim
> recursion issues. So in the places where we do memory allocation
> with inodes locked outside of a transaction, we have explicitly set
> them to use GFP_NOFS allocations to prevent lockdep false positives
> from being reported if the allocation dips into direct memory
> reclaim.
> 
> More recently, __GFP_NOLOCKDEP was added to the memory allocation
> flags to tell lockdep not to track that particular allocation for
> the purposes of reclaim recursion detection. This is a much better
> way of preventing false positives - it allows us to use GFP_KERNEL
> context outside of transactions, and allows direct memory reclaim to
> proceed normally without throwing out false positive deadlock
> warnings.
> 
> The obvious places that lock inodes and do memory allocation are the
> lookup paths and inode extent list initialisation. These occur in
> non-transactional GFP_KERNEL contexts, and so can run direct reclaim
> and lock inodes.
> 
> This patch makes a first path through all the explicit GFP_NOFS
> allocations in XFS and converts the obvious ones to GFP_KERNEL |
> __GFP_NOLOCKDEP as a first step towards removing explicit GFP_NOFS
> allocations from the XFS code.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks pretty straightforward to me,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_ag.c         |  2 +-
>  fs/xfs/libxfs/xfs_btree.h      |  4 +++-
>  fs/xfs/libxfs/xfs_da_btree.c   |  8 +++++---
>  fs/xfs/libxfs/xfs_dir2.c       | 14 ++++----------
>  fs/xfs/libxfs/xfs_iext_tree.c  | 22 +++++++++++++---------
>  fs/xfs/libxfs/xfs_inode_fork.c |  8 +++++---
>  fs/xfs/xfs_icache.c            |  5 ++---
>  fs/xfs/xfs_qm.c                |  6 +++---
>  8 files changed, 36 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 937ea48d5cc0..036f4ee43fd3 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -389,7 +389,7 @@ xfs_initialize_perag(
>  		pag->pag_agno = index;
>  		pag->pag_mount = mp;
>  
> -		error = radix_tree_preload(GFP_NOFS);
> +		error = radix_tree_preload(GFP_KERNEL | __GFP_RETRY_MAYFAIL);
>  		if (error)
>  			goto out_free_pag;
>  
> diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
> index d906324e25c8..75a0e2c8e115 100644
> --- a/fs/xfs/libxfs/xfs_btree.h
> +++ b/fs/xfs/libxfs/xfs_btree.h
> @@ -725,7 +725,9 @@ xfs_btree_alloc_cursor(
>  {
>  	struct xfs_btree_cur	*cur;
>  
> -	cur = kmem_cache_zalloc(cache, GFP_NOFS | __GFP_NOFAIL);
> +	/* BMBT allocations can come through from non-transactional context. */
> +	cur = kmem_cache_zalloc(cache,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	cur->bc_tp = tp;
>  	cur->bc_mp = mp;
>  	cur->bc_btnum = btnum;
> diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c
> index 3383b4525381..444ec1560f43 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.c
> +++ b/fs/xfs/libxfs/xfs_da_btree.c
> @@ -85,7 +85,8 @@ xfs_da_state_alloc(
>  {
>  	struct xfs_da_state	*state;
>  
> -	state = kmem_cache_zalloc(xfs_da_state_cache, GFP_NOFS | __GFP_NOFAIL);
> +	state = kmem_cache_zalloc(xfs_da_state_cache,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	state->args = args;
>  	state->mp = args->dp->i_mount;
>  	return state;
> @@ -2519,7 +2520,8 @@ xfs_dabuf_map(
>  	int			error = 0, nirecs, i;
>  
>  	if (nfsb > 1)
> -		irecs = kzalloc(sizeof(irec) * nfsb, GFP_NOFS | __GFP_NOFAIL);
> +		irecs = kzalloc(sizeof(irec) * nfsb,
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  
>  	nirecs = nfsb;
>  	error = xfs_bmapi_read(dp, bno, nfsb, irecs, &nirecs,
> @@ -2533,7 +2535,7 @@ xfs_dabuf_map(
>  	 */
>  	if (nirecs > 1) {
>  		map = kzalloc(nirecs * sizeof(struct xfs_buf_map),
> -				GFP_NOFS | __GFP_NOFAIL);
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  		if (!map) {
>  			error = -ENOMEM;
>  			goto out_free_irecs;
> diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
> index e60aa8f8d0a7..728f72f0d078 100644
> --- a/fs/xfs/libxfs/xfs_dir2.c
> +++ b/fs/xfs/libxfs/xfs_dir2.c
> @@ -333,7 +333,8 @@ xfs_dir_cilookup_result(
>  					!(args->op_flags & XFS_DA_OP_CILOOKUP))
>  		return -EEXIST;
>  
> -	args->value = kmalloc(len, GFP_NOFS | __GFP_RETRY_MAYFAIL);
> +	args->value = kmalloc(len,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_RETRY_MAYFAIL);
>  	if (!args->value)
>  		return -ENOMEM;
>  
> @@ -364,15 +365,8 @@ xfs_dir_lookup(
>  	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
>  	XFS_STATS_INC(dp->i_mount, xs_dir_lookup);
>  
> -	/*
> -	 * We need to use KM_NOFS here so that lockdep will not throw false
> -	 * positive deadlock warnings on a non-transactional lookup path. It is
> -	 * safe to recurse into inode recalim in that case, but lockdep can't
> -	 * easily be taught about it. Hence KM_NOFS avoids having to add more
> -	 * lockdep Doing this avoids having to add a bunch of lockdep class
> -	 * annotations into the reclaim path for the ilock.
> -	 */
> -	args = kzalloc(sizeof(*args), GFP_NOFS | __GFP_NOFAIL);
> +	args = kzalloc(sizeof(*args),
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	args->geo = dp->i_mount->m_dir_geo;
>  	args->name = name->name;
>  	args->namelen = name->len;
> diff --git a/fs/xfs/libxfs/xfs_iext_tree.c b/fs/xfs/libxfs/xfs_iext_tree.c
> index 16f18b08fe4c..8796f2b3e534 100644
> --- a/fs/xfs/libxfs/xfs_iext_tree.c
> +++ b/fs/xfs/libxfs/xfs_iext_tree.c
> @@ -394,12 +394,18 @@ xfs_iext_leaf_key(
>  	return leaf->recs[n].lo & XFS_IEXT_STARTOFF_MASK;
>  }
>  
> +static inline void *
> +xfs_iext_alloc_node(
> +	int	size)
> +{
> +	return kzalloc(size, GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
> +}
> +
>  static void
>  xfs_iext_grow(
>  	struct xfs_ifork	*ifp)
>  {
> -	struct xfs_iext_node	*node = kzalloc(NODE_SIZE,
> -						GFP_NOFS | __GFP_NOFAIL);
> +	struct xfs_iext_node	*node = xfs_iext_alloc_node(NODE_SIZE);
>  	int			i;
>  
>  	if (ifp->if_height == 1) {
> @@ -455,8 +461,7 @@ xfs_iext_split_node(
>  	int			*nr_entries)
>  {
>  	struct xfs_iext_node	*node = *nodep;
> -	struct xfs_iext_node	*new = kzalloc(NODE_SIZE,
> -						GFP_NOFS | __GFP_NOFAIL);
> +	struct xfs_iext_node	*new = xfs_iext_alloc_node(NODE_SIZE);
>  	const int		nr_move = KEYS_PER_NODE / 2;
>  	int			nr_keep = nr_move + (KEYS_PER_NODE & 1);
>  	int			i = 0;
> @@ -544,8 +549,7 @@ xfs_iext_split_leaf(
>  	int			*nr_entries)
>  {
>  	struct xfs_iext_leaf	*leaf = cur->leaf;
> -	struct xfs_iext_leaf	*new = kzalloc(NODE_SIZE,
> -						GFP_NOFS | __GFP_NOFAIL);
> +	struct xfs_iext_leaf	*new = xfs_iext_alloc_node(NODE_SIZE);
>  	const int		nr_move = RECS_PER_LEAF / 2;
>  	int			nr_keep = nr_move + (RECS_PER_LEAF & 1);
>  	int			i;
> @@ -586,8 +590,7 @@ xfs_iext_alloc_root(
>  {
>  	ASSERT(ifp->if_bytes == 0);
>  
> -	ifp->if_data = kzalloc(sizeof(struct xfs_iext_rec),
> -					GFP_NOFS | __GFP_NOFAIL);
> +	ifp->if_data = xfs_iext_alloc_node(sizeof(struct xfs_iext_rec));
>  	ifp->if_height = 1;
>  
>  	/* now that we have a node step into it */
> @@ -607,7 +610,8 @@ xfs_iext_realloc_root(
>  	if (new_size / sizeof(struct xfs_iext_rec) == RECS_PER_LEAF)
>  		new_size = NODE_SIZE;
>  
> -	new = krealloc(ifp->if_data, new_size, GFP_NOFS | __GFP_NOFAIL);
> +	new = krealloc(ifp->if_data, new_size,
> +			GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	memset(new + ifp->if_bytes, 0, new_size - ifp->if_bytes);
>  	ifp->if_data = new;
>  	cur->leaf = new;
> diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
> index f6d5b86b608d..709fda3d742f 100644
> --- a/fs/xfs/libxfs/xfs_inode_fork.c
> +++ b/fs/xfs/libxfs/xfs_inode_fork.c
> @@ -50,7 +50,8 @@ xfs_init_local_fork(
>  		mem_size++;
>  
>  	if (size) {
> -		char *new_data = kmalloc(mem_size, GFP_NOFS | __GFP_NOFAIL);
> +		char *new_data = kmalloc(mem_size,
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  
>  		memcpy(new_data, data, size);
>  		if (zero_terminate)
> @@ -205,7 +206,8 @@ xfs_iformat_btree(
>  	}
>  
>  	ifp->if_broot_bytes = size;
> -	ifp->if_broot = kmalloc(size, GFP_NOFS | __GFP_NOFAIL);
> +	ifp->if_broot = kmalloc(size,
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	ASSERT(ifp->if_broot != NULL);
>  	/*
>  	 * Copy and convert from the on-disk structure
> @@ -690,7 +692,7 @@ xfs_ifork_init_cow(
>  		return;
>  
>  	ip->i_cowfp = kmem_cache_zalloc(xfs_ifork_cache,
> -				       GFP_NOFS | __GFP_NOFAIL);
> +				GFP_KERNEL | __GFP_NOLOCKDEP | __GFP_NOFAIL);
>  	ip->i_cowfp->if_format = XFS_DINODE_FMT_EXTENTS;
>  }
>  
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index dba514a2c84d..06046827b5fe 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -659,10 +659,9 @@ xfs_iget_cache_miss(
>  	/*
>  	 * Preload the radix tree so we can insert safely under the
>  	 * write spinlock. Note that we cannot sleep inside the preload
> -	 * region. Since we can be called from transaction context, don't
> -	 * recurse into the file system.
> +	 * region.
>  	 */
> -	if (radix_tree_preload(GFP_NOFS)) {
> +	if (radix_tree_preload(GFP_KERNEL | __GFP_NOLOCKDEP)) {
>  		error = -EAGAIN;
>  		goto out_destroy;
>  	}
> diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
> index 46a7fe70e57e..384a5349e696 100644
> --- a/fs/xfs/xfs_qm.c
> +++ b/fs/xfs/xfs_qm.c
> @@ -643,9 +643,9 @@ xfs_qm_init_quotainfo(
>  	if (error)
>  		goto out_free_lru;
>  
> -	INIT_RADIX_TREE(&qinf->qi_uquota_tree, GFP_NOFS);
> -	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_NOFS);
> -	INIT_RADIX_TREE(&qinf->qi_pquota_tree, GFP_NOFS);
> +	INIT_RADIX_TREE(&qinf->qi_uquota_tree, GFP_KERNEL);
> +	INIT_RADIX_TREE(&qinf->qi_gquota_tree, GFP_KERNEL);
> +	INIT_RADIX_TREE(&qinf->qi_pquota_tree, GFP_KERNEL);
>  	mutex_init(&qinf->qi_tree_lock);
>  
>  	/* mutex used to serialize quotaoffs */
> -- 
> 2.43.0
> 
> 

  reply	other threads:[~2024-01-18 23:32 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-15 22:59 [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Dave Chinner
2024-01-15 22:59 ` [PATCH 01/12] xfs: convert kmem_zalloc() to kzalloc() Dave Chinner
2024-01-18 22:48   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 02/12] xfs: convert kmem_alloc() to kmalloc() Dave Chinner
2024-01-18 22:50   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 03/12] xfs: move kmem_to_page() Dave Chinner
2024-01-18 22:50   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 04/12] xfs: convert kmem_free() for kvmalloc users to kvfree() Dave Chinner
2024-01-18 22:53   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 05/12] xfs: convert remaining kmem_free() to kfree() Dave Chinner
2024-01-18 22:54   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 06/12] xfs: use an empty transaction for fstrim Dave Chinner
2024-01-18 22:55   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 07/12] xfs: use __GFP_NOLOCKDEP instead of GFP_NOFS Dave Chinner
2024-01-18 23:32   ` Darrick J. Wong [this message]
2024-01-15 22:59 ` [PATCH 08/12] xfs: use GFP_KERNEL in pure transaction contexts Dave Chinner
2024-01-18 23:38   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 09/12] xfs: place intent recovery under NOFS allocation context Dave Chinner
2024-01-18 23:39   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 10/12] xfs: place the CIL under nofs " Dave Chinner
2024-01-18 23:41   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 11/12] xfs: clean up remaining GFP_NOFS users Dave Chinner
2024-01-19  0:52   ` Darrick J. Wong
2024-01-15 22:59 ` [PATCH 12/12] xfs: use xfs_defer_alloc a bit more Dave Chinner
2024-01-18 23:41   ` Darrick J. Wong
2024-03-25 17:46 ` [PATCH 00/12] xfs: remove remaining kmem interfaces and GFP_NOFS usage Pankaj Raghav (Samsung)
2024-04-01 21:30   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240118233214.GJ674499@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.