From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 0/6 v3] xfs: lockless buffer lookups
Date: Wed, 13 Jul 2022 10:01:15 -0700	[thread overview]
Message-ID: <Ys76W8V72KJmXN+B@magnolia> (raw)
In-Reply-To: <20220707235259.1097443-1-david@fromorbit.com>

On Fri, Jul 08, 2022 at 09:52:53AM +1000, Dave Chinner wrote:
> Hi folks,
> 
> Current work to merge the XFS inode life cycle with the VFS inode
> life cycle is finding some interesting issues. If we have a path
> that hits buffer trylocks fairly hard (e.g. a non-blocking
> background inode freeing function), we end up hitting massive
> contention on the buffer cache hash locks:

Hmm.  I applied this to a test branch and this fell out of xfs/436 when
it runs rmmod xfs.  I'll see if I can reproduce it more regularly, but
thought I'd put this out there early...

XFS (sda3): Unmounting Filesystem
=============================================================================
BUG xfs_buf (Not tainted): Objects remaining in xfs_buf on __kmem_cache_shutdown()
-----------------------------------------------------------------------------

Slab 0xffffea000443b780 objects=18 used=4 fp=0xffff888110edf340 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff)
CPU: 3 PID: 30378 Comm: modprobe Not tainted 5.19.0-rc5-djwx #rc5 bebda13a030d0898279476b6652ddea67c2060cc
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x34/0x44
 slab_err+0x95/0xc9
 __kmem_cache_shutdown.cold+0x39/0x1e9
 kmem_cache_destroy+0x49/0x130
 exit_xfs_fs+0x50/0xc57 [xfs 370e1c994a59de083c05cd4df389f629878b8122]
 __do_sys_delete_module.constprop.0+0x145/0x220
 ? exit_to_user_mode_prepare+0x6c/0x100
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fe7d7877c9b
Code: 73 01 c3 48 8b 0d 95 21 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 65 21 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007fffb911cab8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 0000555a217adcc0 RCX: 00007fe7d7877c9b
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 0000555a217add28
RBP: 0000555a217adcc0 R08: 0000000000000000 R09: 0000000000000000
R10: 00007fe7d790fac0 R11: 0000000000000206 R12: 0000555a217add28
R13: 0000000000000000 R14: 0000555a217add28 R15: 00007fffb911ede8
 </TASK>
Disabling lock debugging due to kernel taint
Object 0xffff888110ede000 @offset=0
Object 0xffff888110ede1c0 @offset=448
Object 0xffff888110edefc0 @offset=4032
Object 0xffff888110edf6c0 @offset=5824
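
For anyone decoding the splat: kmem_cache_destroy() complains when any
object allocated from a cache has not been freed back to it, so the
four "Object ..." lines above are xfs_buf structures still live when
the module is unloaded.  A minimal sketch of the pattern that trips
the check -- illustrative only (hypothetical cache name and function,
not the actual XFS teardown code):

	#include <linux/slab.h>

	static void leak_demo(void)
	{
		struct kmem_cache	*cache;
		void			*obj;

		/* 448 bytes matches the object spacing in the slab dump. */
		cache = kmem_cache_create("xfs_buf_demo", 448, 0,
					  SLAB_HWCACHE_ALIGN, NULL);
		obj = kmem_cache_zalloc(cache, GFP_KERNEL);
		/* obj never freed, e.g. because a reference leaked ... */
		kmem_cache_destroy(cache); /* => "Objects remaining" BUG */
	}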

--D

> -   92.71%     0.05%  [kernel]                  [k] xfs_inodegc_worker
>    - 92.67% xfs_inodegc_worker
>       - 92.13% xfs_inode_unlink
>          - 91.52% xfs_inactive_ifree
>             - 85.63% xfs_read_agi
>                - 85.61% xfs_trans_read_buf_map
>                   - 85.59% xfs_buf_read_map
>                      - xfs_buf_get_map
>                         - 85.55% xfs_buf_find
>                            - 72.87% _raw_spin_lock
>                               - do_raw_spin_lock
>                                    71.86% __pv_queued_spin_lock_slowpath
>                            - 8.74% xfs_buf_rele
>                               - 7.88% _raw_spin_lock
>                                  - 7.88% do_raw_spin_lock
>                                       7.63% __pv_queued_spin_lock_slowpath
>                            - 1.70% xfs_buf_trylock
>                               - 1.68% down_trylock
>                                  - 1.41% _raw_spin_lock_irqsave
>                                     - 1.39% do_raw_spin_lock
>                                          __pv_queued_spin_lock_slowpath
>                            - 0.76% _raw_spin_unlock
>                                 0.75% do_raw_spin_unlock
> 
> This is basically hammering the pag->pag_buf_lock from lots of CPUs
> doing trylocks at the same time. Most of the buffer trylock
> operations ultimately fail after we've done the lookup, so we're
> really hammering the buf hash lock whilst making no progress.
> 
> We can also see significant spinlock traffic on the same lock just
> under normal operation when lots of tasks are accessing metadata
> from the same AG, so let's avoid all this by creating a lookup fast
> path which leverages the rhashtable's ability to do RCU-protected
> lookups.
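
A condensed sketch of the RCU fast path the series adds (names follow
the v3 patches, but this is abbreviated for illustration rather than
the literal committed code):

	static int
	xfs_buf_lookup(
		struct xfs_perag	*pag,
		struct xfs_buf_map	*map,
		xfs_buf_flags_t		flags,
		struct xfs_buf		**bpp)
	{
		struct xfs_buf		*bp;
		int			error;

		/*
		 * No pag_buf_lock here: the rhashtable walk is RCU
		 * protected, and a buffer being freed concurrently is
		 * skipped by taking a hold only if b_hold is non-zero.
		 */
		rcu_read_lock();
		bp = rhashtable_lookup(&pag->pag_buf_hash, map,
				xfs_buf_hash_params);
		if (!bp || !atomic_inc_not_zero(&bp->b_hold)) {
			rcu_read_unlock();
			return -ENOENT;
		}
		rcu_read_unlock();

		error = xfs_buf_find_lock(bp, flags);
		if (error) {
			xfs_buf_rele(bp);
			return error;
		}
		*bpp = bp;
		return 0;
	}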
> 
> This is a rework of the initial lockless buffer lookup patch I sent
> here:
> 
> https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/
> 
> And the alternative cleanup sent by Christoph here:
> 
> https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/
> 
> This version isn't quite as short as Christoph's, but it does roughly
> the same thing in killing the two-phase _xfs_buf_find() call
> mechanism. It separates the fast and slow paths a little more
> cleanly and doesn't have context-dependent buffer return state from
> the slow path that the caller needs to handle. It also picks up the
> rhashtable insert optimisation that Christoph added.
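
The insert optimisation referenced above replaces a separate
lookup-then-insert pair on the hash with a single combined operation.
A hedged sketch of the resulting slow path, again condensed from the
series (the error/cleanup labels are illustrative):

	/* Slow path: insert new_bp, or find a buffer that raced us in. */
	spin_lock(&pag->pag_buf_lock);
	bp = rhashtable_lookup_get_insert_fast(&pag->pag_buf_hash,
			&new_bp->b_rhash_head, xfs_buf_hash_params);
	if (IS_ERR(bp)) {
		error = PTR_ERR(bp);
		spin_unlock(&pag->pag_buf_lock);
		goto out_free_buf;
	}
	if (bp) {
		/* Lost the race: take a hold on the existing buffer. */
		atomic_inc(&bp->b_hold);
		spin_unlock(&pag->pag_buf_lock);
		goto found;
	}
	spin_unlock(&pag->pag_buf_lock);
	*bpp = new_bp;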
> 
> This series passes fstests under several different configs and does
> not cause any obvious regressions in scalability testing that has
> been performed. Hence I'm proposing this as potential 5.20 cycle
> material.
> 
> Thoughts, comments?
> 
> Version 3:
> - rebased onto linux-xfs/for-next
> - rearranged some of the changes to avoid repeated shuffling of code
>   to different locations
> - fixed typos in commits
> - s/xfs_buf_find_verify/xfs_buf_map_verify/
> - s/xfs_buf_find_fast/xfs_buf_lookup/
> 
> Version 2:
> - https://lore.kernel.org/linux-xfs/20220627060841.244226-1-david@fromorbit.com/
> - based on 5.19-rc2
> - high speed collision of original proposals.
> 
> Initial versions:
> - https://lore.kernel.org/linux-xfs/20220403120119.235457-1-hch@lst.de/
> - https://lore.kernel.org/linux-xfs/20220328213810.1174688-1-david@fromorbit.com/
> 
> 
