linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 23/24] xfs: reclaim inodes from the LRU
Date: Thu, 8 Aug 2019 12:39:05 -0400	[thread overview]
Message-ID: <20190808163905.GC24551@bfoster> (raw)
In-Reply-To: <20190801021752.4986-24-david@fromorbit.com>

On Thu, Aug 01, 2019 at 12:17:51PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Replace the AG radix tree walking reclaim code with a list_lru
> walker, giving us both node-aware and memcg-aware inode reclaim
> at the XFS level. This requires adding an inode isolation function to
> determine if the inode can be reclaim, and a list walker to
> dispose of the inodes that were isolated.
> 
> We want the isolation function to be non-blocking. If we can't
> grab an inode then we either skip it or rotate it. If it's clean
> then we skip it, if it's dirty then we rotate to give it time to be

Do you mean we remove it if it's clean?

> cleaned before it is scanned again.
> 
> This congregates the dirty inodes at the tail of the LRU, which
> means that if we start hitting a majority of dirty inodes either
> there are lots of unlinked inodes in the reclaim list or we've
> reclaimed all the clean inodes and we're looped back on the dirty
> inodes. Either way, this is an indication we should tell kswapd to
> back off.
> 
> The non-blocking isolation function introduces a complexity for the
> filesystem shutdown case. When the filesystem is shut down, we want
> to free the inode even if it is dirty, and this may require
> blocking. We already hold the locks needed to do this blocking, so
> what we do is that we leave inodes locked - both the ILOCK and the
> flush lock - while they are sitting on the dispose list to be freed
> after the LRU walk completes.  This allows us to process the
> shutdown state outside the LRU walk where we can block safely.
> 
> Keep in mind we don't have to care about inode lock order or
> blocking with inode locks held here because a) we are using
> trylocks, and b) once marked with XFS_IRECLAIM they can't be found
> via the LRU and inode cache lookups will abort and retry. Hence
> nobody will try to lock them in any other context that might also be
> holding other inode locks.
> 
> Also convert xfs_reclaim_inodes() to use a LRU walk to free all
> the reclaimable inodes in the filesystem.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_icache.c | 199 ++++++++++++++++++++++++++++++++++++++------
>  fs/xfs/xfs_icache.h |  10 ++-
>  fs/xfs/xfs_inode.h  |   8 ++
>  fs/xfs/xfs_super.c  |  50 +++++++++--
>  4 files changed, 232 insertions(+), 35 deletions(-)
> 
...
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index b5c4c1b6fd19..e3e898a2896c 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
...
> @@ -1810,23 +1811,58 @@ xfs_fs_mount(
>  }
>  
>  static long
> -xfs_fs_nr_cached_objects(
> +xfs_fs_free_cached_objects(
>  	struct super_block	*sb,
>  	struct shrink_control	*sc)
>  {
> -	/* Paranoia: catch incorrect calls during mount setup or teardown */
> -	if (WARN_ON_ONCE(!sb->s_fs_info))
> -		return 0;
> +	struct xfs_mount	*mp = XFS_M(sb);
> +        struct xfs_ireclaim_args ra;

^ whitespace damage

> +	long freed;
>  
> -	return list_lru_shrink_count(&XFS_M(sb)->m_inode_lru, sc);
> +	INIT_LIST_HEAD(&ra.freeable);
> +	ra.lowest_lsn = NULLCOMMITLSN;
> +	ra.dirty_skipped = 0;
> +
> +	freed = list_lru_shrink_walk(&mp->m_inode_lru, sc,
> +					xfs_inode_reclaim_isolate, &ra);

This is more related to the locking discussion on the earlier patch, but
this looks like it has more similar serialization to the example patch I
posted than the one without locking at all. IIUC, this walk has an
internal lock per node lru that is held across the walk and passed into
the callback. We never cycle it, so for any given node we only allow one
reclaimer through here at a time.

That seems to be Ok given we don't do much in the isolation handler, the
lock isn't held across the dispose sequence and we're still batching in
the shrinker core on top of that. We're still serialized over the lru
fixups such that concurrent reclaimers aren't processing the same
inodes, however.

BTW I got a lockdep splat[1] for some reason on a straight mount/unmount
cycle with this patch.

Brian

[1] dmesg output:

[   39.011867] ============================================
[   39.012643] WARNING: possible recursive locking detected
[   39.013422] 5.3.0-rc2+ #205 Not tainted
[   39.014623] --------------------------------------------
[   39.015636] umount/871 is trying to acquire lock:
[   39.016122] 00000000ea09de26 (&xfs_nondir_ilock_class){+.+.}, at: xfs_ilock+0xd2/0x280 [xfs]
[   39.017072] 
[   39.017072] but task is already holding lock:
[   39.017832] 000000001a5b5707 (&xfs_nondir_ilock_class){+.+.}, at: xfs_ilock_nowait+0xcb/0x320 [xfs]
[   39.018909] 
[   39.018909] other info that might help us debug this:
[   39.019570]  Possible unsafe locking scenario:
[   39.019570] 
[   39.020248]        CPU0
[   39.020512]        ----
[   39.020778]   lock(&xfs_nondir_ilock_class);
[   39.021246]   lock(&xfs_nondir_ilock_class);
[   39.021705] 
[   39.021705]  *** DEADLOCK ***
[   39.021705] 
[   39.022338]  May be due to missing lock nesting notation
[   39.022338] 
[   39.023070] 3 locks held by umount/871:
[   39.023481]  #0: 000000004d39d244 (&type->s_umount_key#61){+.+.}, at: deactivate_super+0x43/0x50
[   39.024462]  #1: 0000000011270366 (&xfs_dir_ilock_class){++++}, at: xfs_ilock_nowait+0xcb/0x320 [xfs]
[   39.025488]  #2: 000000001a5b5707 (&xfs_nondir_ilock_class){+.+.}, at: xfs_ilock_nowait+0xcb/0x320 [xfs]
[   39.027163] 
[   39.027163] stack backtrace:
[   39.027681] CPU: 3 PID: 871 Comm: umount Not tainted 5.3.0-rc2+ #205
[   39.028534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   39.029152] Call Trace:
[   39.029428]  dump_stack+0x67/0x90
[   39.029889]  __lock_acquire.cold.67+0x121/0x1f9
[   39.030519]  lock_acquire+0x90/0x170
[   39.031170]  ? xfs_ilock+0xd2/0x280 [xfs]
[   39.031603]  down_write_nested+0x4f/0xb0
[   39.032064]  ? xfs_ilock+0xd2/0x280 [xfs]
[   39.032684]  ? xfs_dispose_inodes+0x124/0x320 [xfs]
[   39.033575]  xfs_ilock+0xd2/0x280 [xfs]
[   39.034058]  xfs_dispose_inodes+0x124/0x320 [xfs]
[   39.034656]  xfs_reclaim_inodes+0x149/0x190 [xfs]
[   39.035381]  ? finish_wait+0x80/0x80
[   39.035855]  xfs_unmountfs+0x81/0x190 [xfs]
[   39.036443]  xfs_fs_put_super+0x35/0x90 [xfs]
[   39.037016]  generic_shutdown_super+0x6c/0x100
[   39.037554]  kill_block_super+0x21/0x50
[   39.037986]  deactivate_locked_super+0x34/0x70
[   39.038477]  cleanup_mnt+0xb8/0x140
[   39.038879]  task_work_run+0x9e/0xd0
[   39.039302]  exit_to_usermode_loop+0xb3/0xc0
[   39.039774]  do_syscall_64+0x206/0x210
[   39.040591]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   39.041771] RIP: 0033:0x7fcd4ec0536b
[   39.042627] Code: 0b 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ed 0a 0c 00 f7 d8 64 89 01 48
[   39.045336] RSP: 002b:00007ffdedf686c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[   39.046119] RAX: 0000000000000000 RBX: 00007fcd4ed2f1e4 RCX: 00007fcd4ec0536b
[   39.047506] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055ad23f2ad90
[   39.048295] RBP: 000055ad23f2ab80 R08: 0000000000000000 R09: 00007ffdedf67440
[   39.049062] R10: 000055ad23f2adb0 R11: 0000000000000246 R12: 000055ad23f2ad90
[   39.049869] R13: 0000000000000000 R14: 000055ad23f2ac78 R15: 0000000000000000


  reply	other threads:[~2019-08-08 16:39 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-01  2:17 [RFC] [PATCH 00/24] mm, xfs: non-blocking inode reclaim Dave Chinner
2019-08-01  2:17 ` [PATCH 01/24] mm: directed shrinker work deferral Dave Chinner
2019-08-02 15:27   ` Brian Foster
2019-08-04  1:49     ` Dave Chinner
2019-08-05 17:42       ` Brian Foster
2019-08-05 23:43         ` Dave Chinner
2019-08-06 12:27           ` Brian Foster
2019-08-06 22:22             ` Dave Chinner
2019-08-07 11:13               ` Brian Foster
2019-08-01  2:17 ` [PATCH 02/24] shrinkers: use will_defer for GFP_NOFS sensitive shrinkers Dave Chinner
2019-08-02 15:27   ` Brian Foster
2019-08-04  1:50     ` Dave Chinner
2019-08-01  2:17 ` [PATCH 03/24] mm: factor shrinker work calculations Dave Chinner
2019-08-02 15:08   ` Nikolay Borisov
2019-08-04  2:05     ` Dave Chinner
2019-08-02 15:31   ` Brian Foster
2019-08-01  2:17 ` [PATCH 04/24] shrinker: defer work only to kswapd Dave Chinner
2019-08-02 15:34   ` Brian Foster
2019-08-04 16:48   ` Nikolay Borisov
2019-08-04 21:37     ` Dave Chinner
2019-08-07 16:12   ` kbuild test robot
2019-08-07 18:00   ` kbuild test robot
2019-08-01  2:17 ` [PATCH 05/24] shrinker: clean up variable types and tracepoints Dave Chinner
2019-08-01  2:17 ` [PATCH 06/24] mm: reclaim_state records pages reclaimed, not slabs Dave Chinner
2019-08-01  2:17 ` [PATCH 07/24] mm: back off direct reclaim on excessive shrinker deferral Dave Chinner
2019-08-01  2:17 ` [PATCH 08/24] mm: kswapd backoff for shrinkers Dave Chinner
2019-08-01  2:17 ` [PATCH 09/24] xfs: don't allow log IO to be throttled Dave Chinner
2019-08-01 13:39   ` Chris Mason
2019-08-01 23:58     ` Dave Chinner
2019-08-02  8:12       ` Christoph Hellwig
2019-08-02 14:11       ` Chris Mason
2019-08-02 18:34         ` Matthew Wilcox
2019-08-02 23:28         ` Dave Chinner
2019-08-05 18:32           ` Chris Mason
2019-08-05 23:09             ` Dave Chinner
2019-08-01  2:17 ` [PATCH 10/24] xfs: fix missed wakeup on l_flush_wait Dave Chinner
2019-08-01  2:17 ` [PATCH 11/24] xfs:: account for memory freed from metadata buffers Dave Chinner
2019-08-01  8:16   ` Christoph Hellwig
2019-08-01  9:21     ` Dave Chinner
2019-08-06  5:51       ` Christoph Hellwig
2019-08-01  2:17 ` [PATCH 12/24] xfs: correctly acount for reclaimable slabs Dave Chinner
2019-08-06  5:52   ` Christoph Hellwig
2019-08-06 21:05     ` Dave Chinner
2019-08-01  2:17 ` [PATCH 13/24] xfs: synchronous AIL pushing Dave Chinner
2019-08-05 17:51   ` Brian Foster
2019-08-05 23:21     ` Dave Chinner
2019-08-06 12:29       ` Brian Foster
2019-08-01  2:17 ` [PATCH 14/24] xfs: tail updates only need to occur when LSN changes Dave Chinner
2019-08-05 17:53   ` Brian Foster
2019-08-05 23:28     ` Dave Chinner
2019-08-06  5:33       ` Dave Chinner
2019-08-06 12:53         ` Brian Foster
2019-08-06 21:11           ` Dave Chinner
2019-08-01  2:17 ` [PATCH 15/24] xfs: eagerly free shadow buffers to reduce CIL footprint Dave Chinner
2019-08-05 18:03   ` Brian Foster
2019-08-05 23:33     ` Dave Chinner
2019-08-06 12:57       ` Brian Foster
2019-08-06 21:21         ` Dave Chinner
2019-08-01  2:17 ` [PATCH 16/24] xfs: Lower CIL flush limit for large logs Dave Chinner
2019-08-04 17:12   ` Nikolay Borisov
2019-08-01  2:17 ` [PATCH 17/24] xfs: don't block kswapd in inode reclaim Dave Chinner
2019-08-06 18:21   ` Brian Foster
2019-08-06 21:27     ` Dave Chinner
2019-08-07 11:14       ` Brian Foster
2019-08-01  2:17 ` [PATCH 18/24] xfs: reduce kswapd blocking on inode locking Dave Chinner
2019-08-06 18:22   ` Brian Foster
2019-08-06 21:33     ` Dave Chinner
2019-08-07 11:30       ` Brian Foster
2019-08-07 23:16         ` Dave Chinner
2019-08-01  2:17 ` [PATCH 19/24] xfs: kill background reclaim work Dave Chinner
2019-08-01  2:17 ` [PATCH 20/24] xfs: use AIL pushing for inode reclaim IO Dave Chinner
2019-08-07 18:09   ` Brian Foster
2019-08-07 23:10     ` Dave Chinner
2019-08-08 16:20       ` Brian Foster
2019-08-01  2:17 ` [PATCH 21/24] xfs: remove mode from xfs_reclaim_inodes() Dave Chinner
2019-08-01  2:17 ` [PATCH 22/24] xfs: track reclaimable inodes using a LRU list Dave Chinner
2019-08-08 16:36   ` Brian Foster
2019-08-09  0:10     ` Dave Chinner
2019-08-01  2:17 ` [PATCH 23/24] xfs: reclaim inodes from the LRU Dave Chinner
2019-08-08 16:39   ` Brian Foster [this message]
2019-08-09  1:20     ` Dave Chinner
2019-08-09 12:36       ` Brian Foster
2019-08-11  2:17         ` Dave Chinner
2019-08-11 12:46           ` Brian Foster
2019-08-01  2:17 ` [PATCH 24/24] xfs: remove unusued old inode reclaim code Dave Chinner
2019-08-06  5:57 ` [RFC] [PATCH 00/24] mm, xfs: non-blocking inode reclaim Christoph Hellwig
2019-08-06 21:37   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190808163905.GC24551@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).