* [PATCH 0/6 v2] xfs: lockless buffer lookups
@ 2022-06-27  6:08 Dave Chinner
  2022-06-27  6:08 ` [PATCH 1/6] xfs: rework xfs_buf_incore() API Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Dave Chinner @ 2022-06-27  6:08 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

Current work to merge the XFS inode life cycle with the VFS inode
life cycle is finding some interesting issues. If we have a path
that hits buffer trylocks fairly hard (e.g. a non-blocking
background inode freeing function), we end up hitting massive
contention on the buffer cache hash locks:

-   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
   - 92.67% xfs_inodegc_worker
      - 92.13% xfs_inode_unlink
         - 91.52% xfs_inactive_ifree
            - 85.63% xfs_read_agi
               - 85.61% xfs_trans_read_buf_map
                  - 85.59% xfs_buf_read_map
                     - xfs_buf_get_map
                        - 85.55% xfs_buf_find
                           - 72.87% _raw_spin_lock
                              - do_raw_spin_lock
                                   71.86% __pv_queued_spin_lock_slowpath
                           - 8.74% xfs_buf_rele
                              - 7.88% _raw_spin_lock
                                 - 7.88% do_raw_spin_lock
                                      7.63% __pv_queued_spin_lock_slowpath
                           - 1.70% xfs_buf_trylock
                              - 1.68% down_trylock
                                 - 1.41% _raw_spin_lock_irqsave
                                    - 1.39% do_raw_spin_lock
                                         __pv_queued_spin_lock_slowpath
                           - 0.76% _raw_spin_unlock
                                0.75% do_raw_spin_unlock

This is basically hammering the pag->pag_buf_lock from lots of CPUs
doing trylocks at the same time. Most of the buffer trylock
operations ultimately fail after we've done the lookup, so we're
really hammering the buf hash lock whilst making no progress.

We can also see significant spinlock traffic on the same lock just
under normal operation when lots of tasks are accessing metadata
from the same AG, so let's avoid all this by creating a lookup fast
path which leverages the rhashtable's ability to do rcu protected
lookups.

This is a rework of the initial lockless buffer lookup patch I sent
here:

https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/

And the alternative cleanup sent by Christoph here:

https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/

This version isn't quite as short as Christoph's, but it does roughly
the same thing in killing the two-phase _xfs_buf_find() call
mechanism. It separates the fast and slow paths a little more
cleanly and doesn't have context dependent buffer return state from
the slow path that the caller needs to handle. It also picks up the
rhashtable insert optimisation that Christoph added.

This series passes fstests under several different configs and does
not cause any obvious regressions in scalability testing that has
been performed. Hence I'm proposing this as potential 5.20 cycle
material.

Thoughts, comments?

Version 2:
- based on 5.19-rc2
- high speed collision of original proposals.

Initial versions:
- https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/
- https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/



^ permalink raw reply	[flat|nested] 29+ messages in thread
* Re: [PATCH 3/6] xfs: merge xfs_buf_find() and xfs_buf_get_map()
@ 2022-06-27 22:09 kernel test robot
  0 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2022-06-27 22:09 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 9521 bytes --]

CC: kbuild-all@lists.01.org
BCC: lkp@intel.com
In-Reply-To: <20220627060841.244226-4-david@fromorbit.com>
References: <20220627060841.244226-4-david@fromorbit.com>
TO: Dave Chinner <david@fromorbit.com>
TO: linux-xfs@vger.kernel.org

Hi Dave,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on xfs-linux/for-next]
[also build test WARNING on linus/master v5.19-rc4 next-20220627]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/xfs-lockless-buffer-lookups/20220627-141053
base:   https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git for-next
:::::: branch date: 16 hours ago
:::::: commit date: 16 hours ago
config: x86_64-randconfig-m001-20220627 (https://download.01.org/0day-ci/archive/20220628/202206280549.pN2aOPuA-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

smatch warnings:
fs/xfs/xfs_buf.c:703 xfs_buf_get_map() error: we previously assumed 'bp' could be null (see line 686)

vim +/bp +703 fs/xfs/xfs_buf.c

ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  651  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  652  /*
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  653   * Assembles a buffer covering the specified range. The code is optimised for
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  654   * cache hits, as metadata intensive workloads will see 3 orders of magnitude
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  655   * more hits than misses.
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  656   */
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  657  int
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  658  xfs_buf_get_map(
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  659  	struct xfs_buftarg	*btp,
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  660  	struct xfs_buf_map	*map,
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  661  	int			nmaps,
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  662  	xfs_buf_flags_t		flags,
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  663  	struct xfs_buf		**bpp)
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  664  {
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  665  	struct xfs_perag	*pag;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  666  	struct xfs_buf		*bp = NULL;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  667  	struct xfs_buf_map	cmap = { .bm_bn = map[0].bm_bn };
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  668  	int			error;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  669  	int			i;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  670  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  671  	for (i = 0; i < nmaps; i++)
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  672  		cmap.bm_len += map[i].bm_len;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  673  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  674  	error = xfs_buf_find_verify(btp, &cmap);
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  675  	if (error)
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  676  		return error;
^1da177e4c3f41 fs/xfs/linux-2.6/xfs_buf.c Linus Torvalds    2005-04-16  677  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  678  	pag = xfs_perag_get(btp->bt_mount,
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  679  			    xfs_daddr_to_agno(btp->bt_mount, cmap.bm_bn));
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  680  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  681  	error = xfs_buf_find_fast(pag, &cmap, flags, &bp);
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  682  	if (error && error != -ENOENT)
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  683  		goto out_put_perag;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  684  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  685  	/* cache hits always outnumber misses by at least 10:1 */
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27 @686  	if (unlikely(!bp)) {
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  687  		XFS_STATS_INC(btp->bt_mount, xb_miss_locked);
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  688  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  689  		if (flags & XBF_INCORE)
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  690  			goto out_put_perag;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  691  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  692  		/* xfs_buf_find_insert() consumes the perag reference. */
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  693  		error = xfs_buf_find_insert(btp, pag, &cmap, map, nmaps,
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  694  				flags, &bp);
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  695  		if (error)
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  696  			return error;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  697  	} else {
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  698  		XFS_STATS_INC(btp->bt_mount, xb_get_locked);
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  699  		xfs_perag_put(pag);
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  700  	}
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  701  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  702  	/* We do not hold a perag reference anymore. */
611c99468c7aa1 fs/xfs/xfs_buf.c           Dave Chinner      2012-04-23 @703  	if (!bp->b_addr) {
ce8e922c0e79c8 fs/xfs/linux-2.6/xfs_buf.c Nathan Scott      2006-01-11  704  		error = _xfs_buf_map_pages(bp, flags);
^1da177e4c3f41 fs/xfs/linux-2.6/xfs_buf.c Linus Torvalds    2005-04-16  705  		if (unlikely(error)) {
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  706  			xfs_warn_ratelimited(btp->bt_mount,
93baa55af1a19a fs/xfs/xfs_buf.c           Darrick J. Wong   2020-02-21  707  				"%s: failed to map %u pages", __func__,
93baa55af1a19a fs/xfs/xfs_buf.c           Darrick J. Wong   2020-02-21  708  				bp->b_page_count);
a8acad70731e7d fs/xfs/xfs_buf.c           Dave Chinner      2012-04-23  709  			xfs_buf_relse(bp);
3848b5f6709221 fs/xfs/xfs_buf.c           Darrick J. Wong   2020-01-23  710  			return error;
^1da177e4c3f41 fs/xfs/linux-2.6/xfs_buf.c Linus Torvalds    2005-04-16  711  		}
^1da177e4c3f41 fs/xfs/linux-2.6/xfs_buf.c Linus Torvalds    2005-04-16  712  	}
^1da177e4c3f41 fs/xfs/linux-2.6/xfs_buf.c Linus Torvalds    2005-04-16  713  
b79f4a1c68bb99 fs/xfs/xfs_buf.c           Dave Chinner      2016-01-12  714  	/*
b79f4a1c68bb99 fs/xfs/xfs_buf.c           Dave Chinner      2016-01-12  715  	 * Clear b_error if this is a lookup from a caller that doesn't expect
b79f4a1c68bb99 fs/xfs/xfs_buf.c           Dave Chinner      2016-01-12  716  	 * valid data to be found in the buffer.
b79f4a1c68bb99 fs/xfs/xfs_buf.c           Dave Chinner      2016-01-12  717  	 */
b79f4a1c68bb99 fs/xfs/xfs_buf.c           Dave Chinner      2016-01-12  718  	if (!(flags & XBF_READ))
b79f4a1c68bb99 fs/xfs/xfs_buf.c           Dave Chinner      2016-01-12  719  		xfs_buf_ioerror(bp, 0);
b79f4a1c68bb99 fs/xfs/xfs_buf.c           Dave Chinner      2016-01-12  720  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  721  	XFS_STATS_INC(btp->bt_mount, xb_get);
0b1b213fcf3a84 fs/xfs/linux-2.6/xfs_buf.c Christoph Hellwig 2009-12-14  722  	trace_xfs_buf_get(bp, flags, _RET_IP_);
3848b5f6709221 fs/xfs/xfs_buf.c           Darrick J. Wong   2020-01-23  723  	*bpp = bp;
3848b5f6709221 fs/xfs/xfs_buf.c           Darrick J. Wong   2020-01-23  724  	return 0;
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  725  
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  726  out_put_perag:
ac3a2dcca9b378 fs/xfs/xfs_buf.c           Dave Chinner      2022-06-27  727  	xfs_perag_put(pag);
170041f71596da fs/xfs/xfs_buf.c           Christoph Hellwig 2021-06-07  728  	return error;
^1da177e4c3f41 fs/xfs/linux-2.6/xfs_buf.c Linus Torvalds    2005-04-16  729  }
^1da177e4c3f41 fs/xfs/linux-2.6/xfs_buf.c Linus Torvalds    2005-04-16  730  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

* [PATCH 0/6 v3] xfs: lockless buffer lookups
@ 2022-07-07 23:52 Dave Chinner
  2022-07-07 23:52 ` [PATCH 3/6] xfs: merge xfs_buf_find() and xfs_buf_get_map() Dave Chinner
  0 siblings, 1 reply; 29+ messages in thread
From: Dave Chinner @ 2022-07-07 23:52 UTC (permalink / raw)
  To: linux-xfs

Hi folks,

Current work to merge the XFS inode life cycle with the VFS inode
life cycle is finding some interesting issues. If we have a path
that hits buffer trylocks fairly hard (e.g. a non-blocking
background inode freeing function), we end up hitting massive
contention on the buffer cache hash locks:

-   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
   - 92.67% xfs_inodegc_worker
      - 92.13% xfs_inode_unlink
         - 91.52% xfs_inactive_ifree
            - 85.63% xfs_read_agi
               - 85.61% xfs_trans_read_buf_map
                  - 85.59% xfs_buf_read_map
                     - xfs_buf_get_map
                        - 85.55% xfs_buf_find
                           - 72.87% _raw_spin_lock
                              - do_raw_spin_lock
                                   71.86% __pv_queued_spin_lock_slowpath
                           - 8.74% xfs_buf_rele
                              - 7.88% _raw_spin_lock
                                 - 7.88% do_raw_spin_lock
                                      7.63% __pv_queued_spin_lock_slowpath
                           - 1.70% xfs_buf_trylock
                              - 1.68% down_trylock
                                 - 1.41% _raw_spin_lock_irqsave
                                    - 1.39% do_raw_spin_lock
                                         __pv_queued_spin_lock_slowpath
                           - 0.76% _raw_spin_unlock
                                0.75% do_raw_spin_unlock

This is basically hammering the pag->pag_buf_lock from lots of CPUs
doing trylocks at the same time. Most of the buffer trylock
operations ultimately fail after we've done the lookup, so we're
really hammering the buf hash lock whilst making no progress.

We can also see significant spinlock traffic on the same lock just
under normal operation when lots of tasks are accessing metadata
from the same AG, so let's avoid all this by creating a lookup fast
path which leverages the rhashtable's ability to do rcu protected
lookups.

This is a rework of the initial lockless buffer lookup patch I sent
here:

https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/

And the alternative cleanup sent by Christoph here:

https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/

This version isn't quite as short as Christoph's, but it does roughly
the same thing in killing the two-phase _xfs_buf_find() call
mechanism. It separates the fast and slow paths a little more
cleanly and doesn't have context dependent buffer return state from
the slow path that the caller needs to handle. It also picks up the
rhashtable insert optimisation that Christoph added.

This series passes fstests under several different configs and does
not cause any obvious regressions in scalability testing that has
been performed. Hence I'm proposing this as potential 5.20 cycle
material.

Thoughts, comments?

Version 3:
- rebased onto linux-xfs/for-next
- rearranged some of the changes to avoid repeated shuffling of code
  to different locations
- fixed typos in commits
- s/xfs_buf_find_verify/xfs_buf_map_verify/
- s/xfs_buf_find_fast/xfs_buf_lookup/

Version 2:
- https://lore.kernel.org/linux-xfs/20220627060841.244226-1-david@fromorbit.com/
- based on 5.19-rc2
- high speed collision of original proposals.

Initial versions:
- https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/
- https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/




end of thread, other threads:[~2022-07-12  0:01 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-27  6:08 [PATCH 0/6 v2] xfs: lockless buffer lookups Dave Chinner
2022-06-27  6:08 ` [PATCH 1/6] xfs: rework xfs_buf_incore() API Dave Chinner
2022-06-29  7:30   ` Christoph Hellwig
2022-06-29 21:24   ` Darrick J. Wong
2022-06-27  6:08 ` [PATCH 2/6] xfs: break up xfs_buf_find() into individual pieces Dave Chinner
2022-06-28  2:22   ` Chris Dunlop
2022-06-29  7:35   ` Christoph Hellwig
2022-06-29 21:50   ` Darrick J. Wong
2022-06-27  6:08 ` [PATCH 3/6] xfs: merge xfs_buf_find() and xfs_buf_get_map() Dave Chinner
2022-06-29  7:40   ` Christoph Hellwig
2022-06-29 22:06     ` Darrick J. Wong
2022-07-07 12:39       ` Dave Chinner
2022-06-27  6:08 ` [PATCH 4/6] xfs: reduce the number of atomic when locking a buffer after lookup Dave Chinner
2022-06-29 22:00   ` Darrick J. Wong
2022-06-27  6:08 ` [PATCH 5/6] xfs: remove a superflous hash lookup when inserting new buffers Dave Chinner
2022-06-29  7:40   ` Christoph Hellwig
2022-06-29 22:01   ` Darrick J. Wong
2022-06-27  6:08 ` [PATCH 6/6] xfs: lockless buffer lookup Dave Chinner
2022-06-29  7:41   ` Christoph Hellwig
2022-06-29 22:04   ` Darrick J. Wong
2022-07-07 12:36     ` Dave Chinner
2022-07-07 17:55       ` Darrick J. Wong
2022-07-11  5:16       ` Christoph Hellwig
2022-07-07  2:40 ` [PATCH 0/6 v2] xfs: lockless buffer lookups Darrick J. Wong
2022-06-27 22:09 [PATCH 3/6] xfs: merge xfs_buf_find() and xfs_buf_get_map() kernel test robot
2022-07-07 23:52 [PATCH 0/6 v3] xfs: lockless buffer lookups Dave Chinner
2022-07-07 23:52 ` [PATCH 3/6] xfs: merge xfs_buf_find() and xfs_buf_get_map() Dave Chinner
2022-07-10  0:15   ` Darrick J. Wong
2022-07-11  5:14   ` Christoph Hellwig
2022-07-12  0:01     ` Dave Chinner
