* Re: [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim
@ 2020-06-18 17:37 kernel test robot
0 siblings, 0 replies; 6+ messages in thread
From: kernel test robot @ 2020-06-18 17:37 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 7892 bytes --]
CC: kbuild-all(a)lists.01.org
In-Reply-To: <20200615160830.8471-3-longman@redhat.com>
References: <20200615160830.8471-3-longman@redhat.com>
TO: Waiman Long <longman@redhat.com>
TO: "Darrick J. Wong" <darrick.wong@oracle.com>
TO: Ingo Molnar <mingo@redhat.com>
TO: Peter Zijlstra <peterz@infradead.org>
TO: Juri Lelli <juri.lelli@redhat.com>
TO: Vincent Guittot <vincent.guittot@linaro.org>
CC: linux-xfs(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Dave Chinner <david@fromorbit.com>
CC: Qian Cai <cai@lca.pw>
CC: Eric Sandeen <sandeen@redhat.com>
Hi Waiman,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on tip/auto-latest]
[also build test WARNING on linus/master v5.8-rc1 next-20200618]
[cannot apply to xfs-linux/for-next tip/sched/core djwong-xfs/djwong-devel]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Waiman-Long/sched-xfs-Add-PF_MEMALLOC_NOLOCKDEP-to-fix-lockdep-problem-in-xfs/20200616-002456
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 83cdaef93988a6bc6875623781de571b2694fe02
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
config: openrisc-randconfig-m031-20200618 (attached as .config)
compiler: or1k-linux-gcc (GCC) 9.3.0
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
New smatch warnings:
fs/xfs/xfs_trans.c:277 xfs_trans_alloc() warn: should this be a bitwise op?
Old smatch warnings:
fs/xfs/xfs_trans.c:313 xfs_trans_alloc() warn: should this be a bitwise op?
fs/xfs/xfs_trans.c:819 xfs_trans_committed_bulk() error: double locked 'ailp->ail_lock' (orig line 796)
# https://github.com/0day-ci/linux/commit/111f7859b082fd14d74340a9220aa1f3d52aa65f
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 111f7859b082fd14d74340a9220aa1f3d52aa65f
vim +277 fs/xfs/xfs_trans.c
^1da177e4c3f41 Linus Torvalds 2005-04-16 247
253f4911f297b8 Christoph Hellwig 2016-04-06 248 int
253f4911f297b8 Christoph Hellwig 2016-04-06 249 xfs_trans_alloc(
253f4911f297b8 Christoph Hellwig 2016-04-06 250 struct xfs_mount *mp,
253f4911f297b8 Christoph Hellwig 2016-04-06 251 struct xfs_trans_res *resp,
253f4911f297b8 Christoph Hellwig 2016-04-06 252 uint blocks,
253f4911f297b8 Christoph Hellwig 2016-04-06 253 uint rtextents,
253f4911f297b8 Christoph Hellwig 2016-04-06 254 uint flags,
253f4911f297b8 Christoph Hellwig 2016-04-06 255 struct xfs_trans **tpp)
253f4911f297b8 Christoph Hellwig 2016-04-06 256 {
253f4911f297b8 Christoph Hellwig 2016-04-06 257 struct xfs_trans *tp;
111f7859b082fd Waiman Long 2020-06-15 258 int error = 0;
111f7859b082fd Waiman Long 2020-06-15 259 unsigned long pflags = -1;
111f7859b082fd Waiman Long 2020-06-15 260
111f7859b082fd Waiman Long 2020-06-15 261 /*
111f7859b082fd Waiman Long 2020-06-15 262 * When XFS_TRANS_NO_WRITECOUNT is set, it means there are no dirty
111f7859b082fd Waiman Long 2020-06-15 263 * data pages in the filesystem at this point. So even if fs reclaim
111f7859b082fd Waiman Long 2020-06-15 264 * is being done, it won't happen to this filesystem. In this case,
111f7859b082fd Waiman Long 2020-06-15 265 * PF_MEMALLOC_NOLOCKDEP should be set to avoid false positive
111f7859b082fd Waiman Long 2020-06-15 266 * lockdep splat like:
111f7859b082fd Waiman Long 2020-06-15 267 *
111f7859b082fd Waiman Long 2020-06-15 268 * CPU0 CPU1
111f7859b082fd Waiman Long 2020-06-15 269 * ---- ----
111f7859b082fd Waiman Long 2020-06-15 270 * lock(sb_internal);
111f7859b082fd Waiman Long 2020-06-15 271 * lock(fs_reclaim);
111f7859b082fd Waiman Long 2020-06-15 272 * lock(sb_internal);
111f7859b082fd Waiman Long 2020-06-15 273 * lock(fs_reclaim);
111f7859b082fd Waiman Long 2020-06-15 274 *
111f7859b082fd Waiman Long 2020-06-15 275 * *** DEADLOCK ***
111f7859b082fd Waiman Long 2020-06-15 276 */
111f7859b082fd Waiman Long 2020-06-15 @277 if (PF_MEMALLOC_NOLOCKDEP && (flags & XFS_TRANS_NO_WRITECOUNT))
111f7859b082fd Waiman Long 2020-06-15 278 current_set_flags_nested(&pflags, PF_MEMALLOC_NOLOCKDEP);
253f4911f297b8 Christoph Hellwig 2016-04-06 279
8683edb7755b85 Dave Chinner 2018-09-29 280 /*
8683edb7755b85 Dave Chinner 2018-09-29 281 * Allocate the handle before we do our freeze accounting and setting up
8683edb7755b85 Dave Chinner 2018-09-29 282 * GFP_NOFS allocation context so that we avoid lockdep false positives
8683edb7755b85 Dave Chinner 2018-09-29 283 * by doing GFP_KERNEL allocations inside sb_start_intwrite().
8683edb7755b85 Dave Chinner 2018-09-29 284 */
707e0ddaf67e89 Tetsuo Handa 2019-08-26 285 tp = kmem_zone_zalloc(xfs_trans_zone, 0);
253f4911f297b8 Christoph Hellwig 2016-04-06 286 if (!(flags & XFS_TRANS_NO_WRITECOUNT))
253f4911f297b8 Christoph Hellwig 2016-04-06 287 sb_start_intwrite(mp->m_super);
253f4911f297b8 Christoph Hellwig 2016-04-06 288
10ee25268e1f84 Darrick J. Wong 2018-06-21 289 /*
10ee25268e1f84 Darrick J. Wong 2018-06-21 290 * Zero-reservation ("empty") transactions can't modify anything, so
10ee25268e1f84 Darrick J. Wong 2018-06-21 291 * they're allowed to run while we're frozen.
10ee25268e1f84 Darrick J. Wong 2018-06-21 292 */
10ee25268e1f84 Darrick J. Wong 2018-06-21 293 WARN_ON(resp->tr_logres > 0 &&
10ee25268e1f84 Darrick J. Wong 2018-06-21 294 mp->m_super->s_writers.frozen == SB_FREEZE_COMPLETE);
253f4911f297b8 Christoph Hellwig 2016-04-06 295
253f4911f297b8 Christoph Hellwig 2016-04-06 296 tp->t_magic = XFS_TRANS_HEADER_MAGIC;
253f4911f297b8 Christoph Hellwig 2016-04-06 297 tp->t_flags = flags;
253f4911f297b8 Christoph Hellwig 2016-04-06 298 tp->t_mountp = mp;
253f4911f297b8 Christoph Hellwig 2016-04-06 299 INIT_LIST_HEAD(&tp->t_items);
253f4911f297b8 Christoph Hellwig 2016-04-06 300 INIT_LIST_HEAD(&tp->t_busy);
9d9e6233859706 Brian Foster 2018-08-01 301 INIT_LIST_HEAD(&tp->t_dfops);
bba59c5e4b38e1 Brian Foster 2018-07-11 302 tp->t_firstblock = NULLFSBLOCK;
253f4911f297b8 Christoph Hellwig 2016-04-06 303
253f4911f297b8 Christoph Hellwig 2016-04-06 304 error = xfs_trans_reserve(tp, resp, blocks, rtextents);
253f4911f297b8 Christoph Hellwig 2016-04-06 305 if (error) {
253f4911f297b8 Christoph Hellwig 2016-04-06 306 xfs_trans_cancel(tp);
111f7859b082fd Waiman Long 2020-06-15 307 goto out;
253f4911f297b8 Christoph Hellwig 2016-04-06 308 }
253f4911f297b8 Christoph Hellwig 2016-04-06 309
ba18781b91569a Dave Chinner 2018-05-09 310 trace_xfs_trans_alloc(tp, _RET_IP_);
253f4911f297b8 Christoph Hellwig 2016-04-06 311 *tpp = tp;
111f7859b082fd Waiman Long 2020-06-15 312 out:
111f7859b082fd Waiman Long 2020-06-15 313 if (PF_MEMALLOC_NOLOCKDEP && (pflags != -1))
111f7859b082fd Waiman Long 2020-06-15 314 current_restore_flags_nested(&pflags, PF_MEMALLOC_NOLOCKDEP);
111f7859b082fd Waiman Long 2020-06-15 315 return error;
253f4911f297b8 Christoph Hellwig 2016-04-06 316 }
253f4911f297b8 Christoph Hellwig 2016-04-06 317
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 26466 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim
2020-06-15 20:53 ` Waiman Long
@ 2020-06-16 16:29 ` Darrick J. Wong
0 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2020-06-16 16:29 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
linux-xfs, linux-kernel, Dave Chinner, Qian Cai, Eric Sandeen,
Andrew Morton
On Mon, Jun 15, 2020 at 04:53:38PM -0400, Waiman Long wrote:
> On 6/15/20 12:43 PM, Darrick J. Wong wrote:
> > On Mon, Jun 15, 2020 at 12:08:30PM -0400, Waiman Long wrote:
> > > Depending on the workloads, the following circular locking dependency
> > > warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
> > > lock) may show up:
> > >
> > > ======================================================
> > > WARNING: possible circular locking dependency detected
> > > 5.0.0-rc1+ #60 Tainted: G W
> > > ------------------------------------------------------
> > > fsfreeze/4346 is trying to acquire lock:
> > > 0000000026f1d784 (fs_reclaim){+.+.}, at:
> > > fs_reclaim_acquire.part.19+0x5/0x30
> > >
> > > but task is already holding lock:
> > > 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
> > >
> > > which lock already depends on the new lock.
> > > :
> > > Possible unsafe locking scenario:
> > >
> > > CPU0 CPU1
> > > ---- ----
> > > lock(sb_internal);
> > > lock(fs_reclaim);
> > > lock(sb_internal);
> > > lock(fs_reclaim);
> > >
> > > *** DEADLOCK ***
> > >
> > > 4 locks held by fsfreeze/4346:
> > > #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
> > > #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
> > > #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
> > > #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
> > >
> > > stack backtrace:
> > > Call Trace:
> > > dump_stack+0xe0/0x19a
> > > print_circular_bug.isra.10.cold.34+0x2f4/0x435
> > > check_prev_add.constprop.19+0xca1/0x15f0
> > > validate_chain.isra.14+0x11af/0x3b50
> > > __lock_acquire+0x728/0x1200
> > > lock_acquire+0x269/0x5a0
> > > fs_reclaim_acquire.part.19+0x29/0x30
> > > fs_reclaim_acquire+0x19/0x20
> > > kmem_cache_alloc+0x3e/0x3f0
> > > kmem_zone_alloc+0x79/0x150
> > > xfs_trans_alloc+0xfa/0x9d0
> > > xfs_sync_sb+0x86/0x170
> > > xfs_log_sbcount+0x10f/0x140
> > > xfs_quiesce_attr+0x134/0x270
> > > xfs_fs_freeze+0x4a/0x70
> > > freeze_super+0x1af/0x290
> > > do_vfs_ioctl+0xedc/0x16c0
> > > ksys_ioctl+0x41/0x80
> > > __x64_sys_ioctl+0x73/0xa9
> > > do_syscall_64+0x18f/0xd23
> > > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > > This is a false positive as all the dirty pages are flushed out before
> > > the filesystem can be frozen.
> > >
> > > Perhaps breaking the fs_reclaim pseudo lock into a per filesystem lock
> > > may fix the issue. However, that will greatly complicate the logic and
> > > may not be worth it.
> > >
> > > Another way to fix it is to disable the taking of the fs_reclaim
> > > pseudo lock when in the freezing code path as a reclaim on the
> > > freezed filesystem is not possible. By using the newly introduced
> > > PF_MEMALLOC_NOLOCKDEP flag, lockdep checking is disabled in
> > > xfs_trans_alloc() if XFS_TRANS_NO_WRITECOUNT flag is set.
> > >
> > > In the freezing path, there is another path where memory allocation
> > > is being done without the XFS_TRANS_NO_WRITECOUNT flag:
> > >
> > > xfs_fs_freeze()
> > > => xfs_quiesce_attr()
> > > => xfs_log_quiesce()
> > > => xfs_log_unmount_write()
> > > => xlog_unmount_write()
> > > => xfs_log_reserve()
> > > => xlog_ticket_alloc()
> > >
> > > In this case, we just disable fs reclaim for this particular 600 bytes
> > > memory allocation.
> > >
> > > Without this patch, the command sequence below will show that the lock
> > > dependency chain sb_internal -> fs_reclaim exists.
> > >
> > > # fsfreeze -f /home
> > > # fsfreeze --unfreeze /home
> > > # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal
> > >
> > > After applying the patch, such sb_internal -> fs_reclaim lock dependency
> > > chain can no longer be found. Because of that, the locking dependency
> > > warning will not be shown.
> > >
> > > Suggested-by: Dave Chinner <david@fromorbit.com>
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > > fs/xfs/xfs_log.c | 9 +++++++++
> > > fs/xfs/xfs_trans.c | 30 ++++++++++++++++++++++++++----
> > > 2 files changed, 35 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index 00fda2e8e738..33244680d0d4 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -830,8 +830,17 @@ xlog_unmount_write(
> > > xfs_lsn_t lsn;
> > > uint flags = XLOG_UNMOUNT_TRANS;
> > > int error;
> > > + unsigned long pflags;
> > > + /*
> > > + * xfs_log_reserve() allocates memory. This can lead to fs reclaim
> > > + * which may conflicts with the unmount process. To avoid that,
> > > + * disable fs reclaim for this allocation.
> > > + */
> > > + current_set_flags_nested(&pflags, PF_MEMALLOC_NOFS);
> > > error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
> > > + current_restore_flags_nested(&pflags, PF_MEMALLOC_NOFS);
> > > +
> > > if (error)
> > > goto out_err;
> > > diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> > > index 3c94e5ff4316..ddb10ad3f51f 100644
> > > --- a/fs/xfs/xfs_trans.c
> > > +++ b/fs/xfs/xfs_trans.c
> > > @@ -255,7 +255,27 @@ xfs_trans_alloc(
> > > struct xfs_trans **tpp)
> > > {
> > > struct xfs_trans *tp;
> > > - int error;
> > > + int error = 0;
> > > + unsigned long pflags = -1;
> > > +
> > > + /*
> > > + * When XFS_TRANS_NO_WRITECOUNT is set, it means there are no dirty
> > > + * data pages in the filesystem at this point.
> > That's not true. Look at the other callers of xfs_trans_alloc_empty.
> Yes, I am aware of that. I can change it to check the freeze state.
<nod>
> > Also: Why not set PF_MEMALLOC_NOFS at the start of the freeze call
> > chain?
>
> I guess we can do that, but it eliminates a potential source for memory
> reclaim leading to freeze error when not much free memory is left. We can go
> this route if you think this is not a problem.
<shrug> It sounds like you & Dave had already worked that out, so we can
leave this as it is. I saw the untrue claim in the code comment and
started asking more questions. ;)
(Says me who has been checked out the last few days, not following
the various lockdep shuttup patch threads...)
--D
> Cheers,
> Longman
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim
2020-06-15 16:43 ` Darrick J. Wong
2020-06-15 20:53 ` Waiman Long
@ 2020-06-15 23:28 ` Dave Chinner
1 sibling, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2020-06-15 23:28 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Waiman Long, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot, linux-xfs, linux-kernel, Qian Cai, Eric Sandeen,
Andrew Morton
On Mon, Jun 15, 2020 at 09:43:51AM -0700, Darrick J. Wong wrote:
> Also: Why not set PF_MEMALLOC_NOFS at the start of the freeze call
> chain?
Because there's no guarantee that we are always going to do this
final work in the freeze syscall context? i.e. the state is specific
to the context in which we are running the transaction, not the
task context it is running in...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim
2020-06-15 16:43 ` Darrick J. Wong
@ 2020-06-15 20:53 ` Waiman Long
2020-06-16 16:29 ` Darrick J. Wong
2020-06-15 23:28 ` Dave Chinner
1 sibling, 1 reply; 6+ messages in thread
From: Waiman Long @ 2020-06-15 20:53 UTC (permalink / raw)
To: Darrick J. Wong
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
linux-xfs, linux-kernel, Dave Chinner, Qian Cai, Eric Sandeen,
Andrew Morton
On 6/15/20 12:43 PM, Darrick J. Wong wrote:
> On Mon, Jun 15, 2020 at 12:08:30PM -0400, Waiman Long wrote:
>> Depending on the workloads, the following circular locking dependency
>> warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
>> lock) may show up:
>>
>> ======================================================
>> WARNING: possible circular locking dependency detected
>> 5.0.0-rc1+ #60 Tainted: G W
>> ------------------------------------------------------
>> fsfreeze/4346 is trying to acquire lock:
>> 0000000026f1d784 (fs_reclaim){+.+.}, at:
>> fs_reclaim_acquire.part.19+0x5/0x30
>>
>> but task is already holding lock:
>> 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
>>
>> which lock already depends on the new lock.
>> :
>> Possible unsafe locking scenario:
>>
>> CPU0 CPU1
>> ---- ----
>> lock(sb_internal);
>> lock(fs_reclaim);
>> lock(sb_internal);
>> lock(fs_reclaim);
>>
>> *** DEADLOCK ***
>>
>> 4 locks held by fsfreeze/4346:
>> #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
>> #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
>> #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
>> #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
>>
>> stack backtrace:
>> Call Trace:
>> dump_stack+0xe0/0x19a
>> print_circular_bug.isra.10.cold.34+0x2f4/0x435
>> check_prev_add.constprop.19+0xca1/0x15f0
>> validate_chain.isra.14+0x11af/0x3b50
>> __lock_acquire+0x728/0x1200
>> lock_acquire+0x269/0x5a0
>> fs_reclaim_acquire.part.19+0x29/0x30
>> fs_reclaim_acquire+0x19/0x20
>> kmem_cache_alloc+0x3e/0x3f0
>> kmem_zone_alloc+0x79/0x150
>> xfs_trans_alloc+0xfa/0x9d0
>> xfs_sync_sb+0x86/0x170
>> xfs_log_sbcount+0x10f/0x140
>> xfs_quiesce_attr+0x134/0x270
>> xfs_fs_freeze+0x4a/0x70
>> freeze_super+0x1af/0x290
>> do_vfs_ioctl+0xedc/0x16c0
>> ksys_ioctl+0x41/0x80
>> __x64_sys_ioctl+0x73/0xa9
>> do_syscall_64+0x18f/0xd23
>> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> This is a false positive as all the dirty pages are flushed out before
>> the filesystem can be frozen.
>>
>> Perhaps breaking the fs_reclaim pseudo lock into a per filesystem lock
>> may fix the issue. However, that will greatly complicate the logic and
>> may not be worth it.
>>
>> Another way to fix it is to disable the taking of the fs_reclaim
>> pseudo lock when in the freezing code path as a reclaim on the
>> freezed filesystem is not possible. By using the newly introduced
>> PF_MEMALLOC_NOLOCKDEP flag, lockdep checking is disabled in
>> xfs_trans_alloc() if XFS_TRANS_NO_WRITECOUNT flag is set.
>>
>> In the freezing path, there is another path where memory allocation
>> is being done without the XFS_TRANS_NO_WRITECOUNT flag:
>>
>> xfs_fs_freeze()
>> => xfs_quiesce_attr()
>> => xfs_log_quiesce()
>> => xfs_log_unmount_write()
>> => xlog_unmount_write()
>> => xfs_log_reserve()
>> => xlog_ticket_alloc()
>>
>> In this case, we just disable fs reclaim for this particular 600 bytes
>> memory allocation.
>>
>> Without this patch, the command sequence below will show that the lock
>> dependency chain sb_internal -> fs_reclaim exists.
>>
>> # fsfreeze -f /home
>> # fsfreeze --unfreeze /home
>> # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal
>>
>> After applying the patch, such sb_internal -> fs_reclaim lock dependency
>> chain can no longer be found. Because of that, the locking dependency
>> warning will not be shown.
>>
>> Suggested-by: Dave Chinner <david@fromorbit.com>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>> fs/xfs/xfs_log.c | 9 +++++++++
>> fs/xfs/xfs_trans.c | 30 ++++++++++++++++++++++++++----
>> 2 files changed, 35 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
>> index 00fda2e8e738..33244680d0d4 100644
>> --- a/fs/xfs/xfs_log.c
>> +++ b/fs/xfs/xfs_log.c
>> @@ -830,8 +830,17 @@ xlog_unmount_write(
>> xfs_lsn_t lsn;
>> uint flags = XLOG_UNMOUNT_TRANS;
>> int error;
>> + unsigned long pflags;
>>
>> + /*
>> + * xfs_log_reserve() allocates memory. This can lead to fs reclaim
>> + * which may conflicts with the unmount process. To avoid that,
>> + * disable fs reclaim for this allocation.
>> + */
>> + current_set_flags_nested(&pflags, PF_MEMALLOC_NOFS);
>> error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
>> + current_restore_flags_nested(&pflags, PF_MEMALLOC_NOFS);
>> +
>> if (error)
>> goto out_err;
>>
>> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
>> index 3c94e5ff4316..ddb10ad3f51f 100644
>> --- a/fs/xfs/xfs_trans.c
>> +++ b/fs/xfs/xfs_trans.c
>> @@ -255,7 +255,27 @@ xfs_trans_alloc(
>> struct xfs_trans **tpp)
>> {
>> struct xfs_trans *tp;
>> - int error;
>> + int error = 0;
>> + unsigned long pflags = -1;
>> +
>> + /*
>> + * When XFS_TRANS_NO_WRITECOUNT is set, it means there are no dirty
>> + * data pages in the filesystem at this point.
> That's not true. Look at the other callers of xfs_trans_alloc_empty.
Yes, I am aware of that. I can change it to check the freeze state.
>
> Also: Why not set PF_MEMALLOC_NOFS at the start of the freeze call
> chain?
I guess we can do that, but it eliminates a potential source for memory
reclaim leading to freeze error when not much free memory is left. We
can go this route if you think this is not a problem.
Cheers,
Longman
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim
2020-06-15 16:08 ` [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim Waiman Long
@ 2020-06-15 16:43 ` Darrick J. Wong
2020-06-15 20:53 ` Waiman Long
2020-06-15 23:28 ` Dave Chinner
0 siblings, 2 replies; 6+ messages in thread
From: Darrick J. Wong @ 2020-06-15 16:43 UTC (permalink / raw)
To: Waiman Long
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
linux-xfs, linux-kernel, Dave Chinner, Qian Cai, Eric Sandeen,
Andrew Morton
On Mon, Jun 15, 2020 at 12:08:30PM -0400, Waiman Long wrote:
> Depending on the workloads, the following circular locking dependency
> warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
> lock) may show up:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.0.0-rc1+ #60 Tainted: G W
> ------------------------------------------------------
> fsfreeze/4346 is trying to acquire lock:
> 0000000026f1d784 (fs_reclaim){+.+.}, at:
> fs_reclaim_acquire.part.19+0x5/0x30
>
> but task is already holding lock:
> 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
>
> which lock already depends on the new lock.
> :
> Possible unsafe locking scenario:
>
> CPU0 CPU1
> ---- ----
> lock(sb_internal);
> lock(fs_reclaim);
> lock(sb_internal);
> lock(fs_reclaim);
>
> *** DEADLOCK ***
>
> 4 locks held by fsfreeze/4346:
> #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
> #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
> #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
> #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
>
> stack backtrace:
> Call Trace:
> dump_stack+0xe0/0x19a
> print_circular_bug.isra.10.cold.34+0x2f4/0x435
> check_prev_add.constprop.19+0xca1/0x15f0
> validate_chain.isra.14+0x11af/0x3b50
> __lock_acquire+0x728/0x1200
> lock_acquire+0x269/0x5a0
> fs_reclaim_acquire.part.19+0x29/0x30
> fs_reclaim_acquire+0x19/0x20
> kmem_cache_alloc+0x3e/0x3f0
> kmem_zone_alloc+0x79/0x150
> xfs_trans_alloc+0xfa/0x9d0
> xfs_sync_sb+0x86/0x170
> xfs_log_sbcount+0x10f/0x140
> xfs_quiesce_attr+0x134/0x270
> xfs_fs_freeze+0x4a/0x70
> freeze_super+0x1af/0x290
> do_vfs_ioctl+0xedc/0x16c0
> ksys_ioctl+0x41/0x80
> __x64_sys_ioctl+0x73/0xa9
> do_syscall_64+0x18f/0xd23
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> This is a false positive as all the dirty pages are flushed out before
> the filesystem can be frozen.
>
> Perhaps breaking the fs_reclaim pseudo lock into a per filesystem lock
> may fix the issue. However, that will greatly complicate the logic and
> may not be worth it.
>
> Another way to fix it is to disable the taking of the fs_reclaim
> pseudo lock when in the freezing code path as a reclaim on the
> freezed filesystem is not possible. By using the newly introduced
> PF_MEMALLOC_NOLOCKDEP flag, lockdep checking is disabled in
> xfs_trans_alloc() if XFS_TRANS_NO_WRITECOUNT flag is set.
>
> In the freezing path, there is another path where memory allocation
> is being done without the XFS_TRANS_NO_WRITECOUNT flag:
>
> xfs_fs_freeze()
> => xfs_quiesce_attr()
> => xfs_log_quiesce()
> => xfs_log_unmount_write()
> => xlog_unmount_write()
> => xfs_log_reserve()
> => xlog_ticket_alloc()
>
> In this case, we just disable fs reclaim for this particular 600 bytes
> memory allocation.
>
> Without this patch, the command sequence below will show that the lock
> dependency chain sb_internal -> fs_reclaim exists.
>
> # fsfreeze -f /home
> # fsfreeze --unfreeze /home
> # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal
>
> After applying the patch, such sb_internal -> fs_reclaim lock dependency
> chain can no longer be found. Because of that, the locking dependency
> warning will not be shown.
>
> Suggested-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> fs/xfs/xfs_log.c | 9 +++++++++
> fs/xfs/xfs_trans.c | 30 ++++++++++++++++++++++++++----
> 2 files changed, 35 insertions(+), 4 deletions(-)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index 00fda2e8e738..33244680d0d4 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -830,8 +830,17 @@ xlog_unmount_write(
> xfs_lsn_t lsn;
> uint flags = XLOG_UNMOUNT_TRANS;
> int error;
> + unsigned long pflags;
>
> + /*
> + * xfs_log_reserve() allocates memory. This can lead to fs reclaim
> + * which may conflicts with the unmount process. To avoid that,
> + * disable fs reclaim for this allocation.
> + */
> + current_set_flags_nested(&pflags, PF_MEMALLOC_NOFS);
> error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
> + current_restore_flags_nested(&pflags, PF_MEMALLOC_NOFS);
> +
> if (error)
> goto out_err;
>
> diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
> index 3c94e5ff4316..ddb10ad3f51f 100644
> --- a/fs/xfs/xfs_trans.c
> +++ b/fs/xfs/xfs_trans.c
> @@ -255,7 +255,27 @@ xfs_trans_alloc(
> struct xfs_trans **tpp)
> {
> struct xfs_trans *tp;
> - int error;
> + int error = 0;
> + unsigned long pflags = -1;
> +
> + /*
> + * When XFS_TRANS_NO_WRITECOUNT is set, it means there are no dirty
> + * data pages in the filesystem at this point.
That's not true. Look at the other callers of xfs_trans_alloc_empty.
Also: Why not set PF_MEMALLOC_NOFS at the start of the freeze call
chain?
--D
> + * So even if fs reclaim
> + * is being done, it won't happen to this filesystem. In this case,
> + * PF_MEMALLOC_NOLOCKDEP should be set to avoid false positive
> + * lockdep splat like:
> + *
> + * CPU0 CPU1
> + * ---- ----
> + * lock(sb_internal);
> + * lock(fs_reclaim);
> + * lock(sb_internal);
> + * lock(fs_reclaim);
> + *
> + * *** DEADLOCK ***
> + */
> + if (PF_MEMALLOC_NOLOCKDEP && (flags & XFS_TRANS_NO_WRITECOUNT))
> + current_set_flags_nested(&pflags, PF_MEMALLOC_NOLOCKDEP);
>
> /*
> * Allocate the handle before we do our freeze accounting and setting up
> @@ -284,13 +304,15 @@ xfs_trans_alloc(
> error = xfs_trans_reserve(tp, resp, blocks, rtextents);
> if (error) {
> xfs_trans_cancel(tp);
> - return error;
> + goto out;
> }
>
> trace_xfs_trans_alloc(tp, _RET_IP_);
> -
> *tpp = tp;
> - return 0;
> +out:
> + if (PF_MEMALLOC_NOLOCKDEP && (pflags != -1))
> + current_restore_flags_nested(&pflags, PF_MEMALLOC_NOLOCKDEP);
> + return error;
> }
>
> /*
> --
> 2.18.1
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim
2020-06-15 16:08 [PATCH 0/2] sched, xfs: Add PF_MEMALLOC_NOLOCKDEP to fix lockdep problem in xfs Waiman Long
@ 2020-06-15 16:08 ` Waiman Long
2020-06-15 16:43 ` Darrick J. Wong
0 siblings, 1 reply; 6+ messages in thread
From: Waiman Long @ 2020-06-15 16:08 UTC (permalink / raw)
To: Darrick J. Wong, Ingo Molnar, Peter Zijlstra, Juri Lelli,
Vincent Guittot
Cc: linux-xfs, linux-kernel, Dave Chinner, Qian Cai, Eric Sandeen,
Andrew Morton, Waiman Long
Depending on the workloads, the following circular locking dependency
warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
lock) may show up:
======================================================
WARNING: possible circular locking dependency detected
5.0.0-rc1+ #60 Tainted: G W
------------------------------------------------------
fsfreeze/4346 is trying to acquire lock:
0000000026f1d784 (fs_reclaim){+.+.}, at:
fs_reclaim_acquire.part.19+0x5/0x30
but task is already holding lock:
0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
which lock already depends on the new lock.
:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal);
lock(fs_reclaim);
lock(sb_internal);
lock(fs_reclaim);
*** DEADLOCK ***
4 locks held by fsfreeze/4346:
#0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
#1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
#2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
#3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
stack backtrace:
Call Trace:
dump_stack+0xe0/0x19a
print_circular_bug.isra.10.cold.34+0x2f4/0x435
check_prev_add.constprop.19+0xca1/0x15f0
validate_chain.isra.14+0x11af/0x3b50
__lock_acquire+0x728/0x1200
lock_acquire+0x269/0x5a0
fs_reclaim_acquire.part.19+0x29/0x30
fs_reclaim_acquire+0x19/0x20
kmem_cache_alloc+0x3e/0x3f0
kmem_zone_alloc+0x79/0x150
xfs_trans_alloc+0xfa/0x9d0
xfs_sync_sb+0x86/0x170
xfs_log_sbcount+0x10f/0x140
xfs_quiesce_attr+0x134/0x270
xfs_fs_freeze+0x4a/0x70
freeze_super+0x1af/0x290
do_vfs_ioctl+0xedc/0x16c0
ksys_ioctl+0x41/0x80
__x64_sys_ioctl+0x73/0xa9
do_syscall_64+0x18f/0xd23
entry_SYSCALL_64_after_hwframe+0x49/0xbe
This is a false positive as all the dirty pages are flushed out before
the filesystem can be frozen.
Perhaps breaking the fs_reclaim pseudo lock into a per filesystem lock
may fix the issue. However, that will greatly complicate the logic and
may not be worth it.
Another way to fix it is to disable the taking of the fs_reclaim
pseudo lock when in the freezing code path as a reclaim on the
freezed filesystem is not possible. By using the newly introduced
PF_MEMALLOC_NOLOCKDEP flag, lockdep checking is disabled in
xfs_trans_alloc() if XFS_TRANS_NO_WRITECOUNT flag is set.
In the freezing path, there is another path where memory allocation
is being done without the XFS_TRANS_NO_WRITECOUNT flag:
xfs_fs_freeze()
=> xfs_quiesce_attr()
=> xfs_log_quiesce()
=> xfs_log_unmount_write()
=> xlog_unmount_write()
=> xfs_log_reserve()
=> xlog_ticket_alloc()
In this case, we just disable fs reclaim for this particular 600 bytes
memory allocation.
Without this patch, the command sequence below will show that the lock
dependency chain sb_internal -> fs_reclaim exists.
# fsfreeze -f /home
# fsfreeze --unfreeze /home
# grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal
After applying the patch, such sb_internal -> fs_reclaim lock dependency
chain can no longer be found. Because of that, the locking dependency
warning will not be shown.
Suggested-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Waiman Long <longman@redhat.com>
---
fs/xfs/xfs_log.c | 9 +++++++++
fs/xfs/xfs_trans.c | 30 ++++++++++++++++++++++++++----
2 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 00fda2e8e738..33244680d0d4 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -830,8 +830,17 @@ xlog_unmount_write(
xfs_lsn_t lsn;
uint flags = XLOG_UNMOUNT_TRANS;
int error;
+ unsigned long pflags;
+ /*
+ * xfs_log_reserve() allocates memory. This can lead to fs reclaim
+ * which may conflicts with the unmount process. To avoid that,
+ * disable fs reclaim for this allocation.
+ */
+ current_set_flags_nested(&pflags, PF_MEMALLOC_NOFS);
error = xfs_log_reserve(mp, 600, 1, &tic, XFS_LOG, 0);
+ current_restore_flags_nested(&pflags, PF_MEMALLOC_NOFS);
+
if (error)
goto out_err;
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 3c94e5ff4316..ddb10ad3f51f 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -255,7 +255,27 @@ xfs_trans_alloc(
struct xfs_trans **tpp)
{
struct xfs_trans *tp;
- int error;
+ int error = 0;
+ unsigned long pflags = -1;
+
+ /*
+ * When XFS_TRANS_NO_WRITECOUNT is set, it means there are no dirty
+ * data pages in the filesystem at this point. So even if fs reclaim
+ * is being done, it won't happen to this filesystem. In this case,
+ * PF_MEMALLOC_NOLOCKDEP should be set to avoid false positive
+ * lockdep splat like:
+ *
+ * CPU0 CPU1
+ * ---- ----
+ * lock(sb_internal);
+ * lock(fs_reclaim);
+ * lock(sb_internal);
+ * lock(fs_reclaim);
+ *
+ * *** DEADLOCK ***
+ */
+ if (PF_MEMALLOC_NOLOCKDEP && (flags & XFS_TRANS_NO_WRITECOUNT))
+ current_set_flags_nested(&pflags, PF_MEMALLOC_NOLOCKDEP);
/*
* Allocate the handle before we do our freeze accounting and setting up
@@ -284,13 +304,15 @@ xfs_trans_alloc(
error = xfs_trans_reserve(tp, resp, blocks, rtextents);
if (error) {
xfs_trans_cancel(tp);
- return error;
+ goto out;
}
trace_xfs_trans_alloc(tp, _RET_IP_);
-
*tpp = tp;
- return 0;
+out:
+ if (PF_MEMALLOC_NOLOCKDEP && (pflags != -1))
+ current_restore_flags_nested(&pflags, PF_MEMALLOC_NOLOCKDEP);
+ return error;
}
/*
--
2.18.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2020-06-18 17:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-18 17:37 [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim kernel test robot
-- strict thread matches above, loose matches on Subject: below --
2020-06-15 16:08 [PATCH 0/2] sched, xfs: Add PF_MEMALLOC_NOLOCKDEP to fix lockdep problem in xfs Waiman Long
2020-06-15 16:08 ` [PATCH 2/2] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim Waiman Long
2020-06-15 16:43 ` Darrick J. Wong
2020-06-15 20:53 ` Waiman Long
2020-06-16 16:29 ` Darrick J. Wong
2020-06-15 23:28 ` Dave Chinner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.