All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Al Viro <viro@zeniv.linux.org.uk>,
	Dave Chinner <david@fromorbit.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Jan Kara <jack@suse.cz>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] change sb_writers to use percpu_rw_semaphore
Date: Fri, 7 Aug 2015 21:55:52 +0200	[thread overview]
Message-ID: <20150807195552.GB28529@redhat.com> (raw)
In-Reply-To: <20150722211513.GA19986@redhat.com>

Jan, Dave, please help.

I'll try to re-check/re-test, but so far I think that this and the
previous series are correct. However 3/4 from the previous series
(attached at the end just in case) uncovers (I think) some problems
in xfs locking.

What should I do now?

On 07/22, Oleg Nesterov wrote:
>
> Testing. Well, so far I only verified that ioctl(FIFREEZE) +
> ioctl(FITHAW) seems to wors "as expected" on my testing machine
> with ext3. So probably this needs more testing.

Finally I got the testing machine and ran xfstests, I did

	dd if=/dev/zero of=TEST.img bs=1MiB count=4000
	dd if=/dev/zero of=SCRATCH.img bs=1MiB count=4000

	losetup --find --show TEST.img
	losetup --find --show SCRATCH.img

	mkfs.xfs -f /dev/loop0
	mkfs.xfs -f /dev/loop1

	mkdir -p TEST SCRATCH

	mount /dev/loop0 TEST
	mount /dev/loop1 SCRATCH

	TEST_DEV=/dev/loop0 TEST_DIR=TEST SCRATCH_DEV=/dev/loop1 SCRATCH_MNT=SCRATCH \
	./check `grep -il freeze tests/*/???`

several times and every time the result looks like below:

	FSTYP         -- xfs (non-debug)
	PLATFORM      -- Linux/x86_64 intel-canoepass-10 4.1.0+
	MKFS_OPTIONS  -- -f -bsize=4096 /dev/loop1
	MOUNT_OPTIONS -- -o context=system_u:object_r:nfs_t:s0 /dev/loop1 SCRATCH

	generic/068 59s ... 61s
	generic/085 23s ... 26s
	generic/280 2s ... 3s
	generic/311 169s ... 167s
	xfs/011 21s ... 20s
	xfs/119 32s ... 21s
	xfs/297 455s ... 301s
	Ran: generic/068 generic/085 generic/280 generic/311 xfs/011 xfs/119 xfs/297
	Passed all 7 tests

But with CONFIG_LOCKDEP=y generic/068 triggers the warning:

	[   66.092831] ======================================================
	[   66.099726] [ INFO: possible circular locking dependency detected ]
	[   66.106719] 4.1.0+ #2 Not tainted
	[   66.110414] -------------------------------------------------------
	[   66.117405] fsstress/2210 is trying to acquire lock:
	[   66.122944]  (&mm->mmap_sem){++++++}, at: [<ffffffff81200562>] might_fault+0x42/0xa0
	[   66.131637]
		       but task is already holding lock:
	[   66.138146]  (&xfs_dir_ilock_class){++++..}, at: [<ffffffffa057da52>] xfs_ilock+0xc2/0x170 [xfs]
	[   66.148022]
		       which lock already depends on the new lock.

	[   66.157141]
		       the existing dependency chain (in reverse order) is:
	[   66.165490]
		       -> #4 (&xfs_dir_ilock_class){++++..}:
	[   66.170974]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
	[   66.177596]        [<ffffffff810f4bbc>] down_write_nested+0x3c/0x70
	[   66.184605]        [<ffffffffa057dab6>] xfs_ilock+0x126/0x170 [xfs]
	[   66.191645]        [<ffffffffa057c4da>] xfs_setattr_nonsize+0x3ba/0x5d0 [xfs]
	[   66.199638]        [<ffffffffa057cb5a>] xfs_vn_setattr+0x9a/0xd0 [xfs]
	[   66.206950]        [<ffffffff81279411>] notify_change+0x271/0x3a0
	[   66.213762]        [<ffffffff8125391b>] chown_common.isra.11+0x15b/0x200
	[   66.221251]        [<ffffffff812551a6>] SyS_lchown+0xa6/0xf0
	[   66.227576]        [<ffffffff8182662e>] system_call_fastpath+0x12/0x76
	[   66.234878]
		       -> #3 (sb_internal#2){++++++}:
	[   66.239698]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
	[   66.246316]        [<ffffffff81824256>] down_write+0x36/0x70
	[   66.252644]        [<ffffffff81100979>] percpu_down_write+0x39/0x110
	[   66.259760]        [<ffffffff8125901d>] freeze_super+0xbd/0x190
	[   66.266369]        [<ffffffff8126dbc8>] do_vfs_ioctl+0x3f8/0x520
	[   66.273082]        [<ffffffff8126dd71>] SyS_ioctl+0x81/0xa0
	[   66.279311]        [<ffffffff8182662e>] system_call_fastpath+0x12/0x76
	[   66.286610]
		       -> #2 (sb_pagefaults#2){++++++}:
	[   66.291623]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
	[   66.298237]        [<ffffffff811007c1>] percpu_down_read+0x51/0xa0
	[   66.305144]        [<ffffffff8125979b>] __sb_start_write+0xdb/0x120
	[   66.312140]        [<ffffffff81295eda>] block_page_mkwrite+0x3a/0xb0
	[   66.319245]        [<ffffffffa057046e>] xfs_filemap_page_mkwrite+0x5e/0xb0 [xfs]
	[   66.327533]        [<ffffffff812009a8>] do_page_mkwrite+0x58/0x100
	[   66.334433]        [<ffffffff81205497>] handle_mm_fault+0x537/0x17c0
	[   66.341533]        [<ffffffff8106a03c>] __do_page_fault+0x19c/0x450
	[   66.348542]        [<ffffffff8106a31f>] do_page_fault+0x2f/0x80
	[   66.355172]        [<ffffffff818286e8>] page_fault+0x28/0x30
	[   66.361493]
		       -> #1 (&(&ip->i_mmaplock)->mr_lock){++++++}:
	[   66.367659]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
	[   66.374275]        [<ffffffff810f4aef>] down_read_nested+0x3f/0x60
	[   66.381185]        [<ffffffffa057daf8>] xfs_ilock+0x168/0x170 [xfs]
	[   66.388212]        [<ffffffffa057050c>] xfs_filemap_fault+0x4c/0xb0 [xfs]
	[   66.395817]        [<ffffffff81200a9e>] __do_fault+0x4e/0x100
	[   66.402239]        [<ffffffff81205454>] handle_mm_fault+0x4f4/0x17c0
	[   66.409340]        [<ffffffff8106a03c>] __do_page_fault+0x19c/0x450
	[   66.416344]        [<ffffffff8106a31f>] do_page_fault+0x2f/0x80
	[   66.422959]        [<ffffffff818286e8>] page_fault+0x28/0x30
	[   66.429283]
		       -> #0 (&mm->mmap_sem){++++++}:
	[   66.434093]        [<ffffffff810fb867>] __lock_acquire+0x20d7/0x2100
	[   66.441194]        [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
	[   66.447808]        [<ffffffff8120058f>] might_fault+0x6f/0xa0
	[   66.454235]        [<ffffffff8126e05a>] filldir+0x9a/0x130
	[   66.460373]        [<ffffffffa056daed>] xfs_dir2_sf_getdents.isra.10+0x1bd/0x220 [xfs]
	[   66.469233]        [<ffffffffa056e72e>] xfs_readdir+0x17e/0x1a0 [xfs]
	[   66.476449]        [<ffffffffa057095b>] xfs_file_readdir+0x2b/0x30 [xfs]
	[   66.483954]        [<ffffffff8126de2a>] iterate_dir+0x9a/0x140
	[   66.490473]        [<ffffffff8126e361>] SyS_getdents+0x91/0x120
	[   66.497091]        [<ffffffff8182662e>] system_call_fastpath+0x12/0x76
	[   66.504381]
		       other info that might help us debug this:

	[   66.513316] Chain exists of:
			 &mm->mmap_sem --> sb_internal#2 --> &xfs_dir_ilock_class

	[   66.522619]  Possible unsafe locking scenario:

	[   66.529222]        CPU0                    CPU1
	[   66.534275]        ----                    ----
	[   66.539327]   lock(&xfs_dir_ilock_class);
	[   66.543823]                                lock(sb_internal#2);
	[   66.550465]                                lock(&xfs_dir_ilock_class);
	[   66.557767]   lock(&mm->mmap_sem);
	[   66.561580]
			*** DEADLOCK ***

	[   66.568186] 2 locks held by fsstress/2210:
	[   66.572753]  #0:  (&type->i_mutex_dir_key#5){+.+.+.}, at: [<ffffffff8126ddf1>] iterate_dir+0x61/0x140
	[   66.583103]  #1:  (&xfs_dir_ilock_class){++++..}, at: [<ffffffffa057da52>] xfs_ilock+0xc2/0x170 [xfs]
	[   66.593457]
		       stack backtrace:
	[   66.598321] CPU: 26 PID: 2210 Comm: fsstress Not tainted 4.1.0+ #2
	[   66.605215] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013
	[   66.616372]  0000000000000000 0000000048bb9c77 ffff8817fa127ad8 ffffffff8181d9dd
	[   66.624663]  0000000000000000 ffffffff8288f500 ffff8817fa127b28 ffffffff810f7b26
	[   66.632955]  0000000000000002 ffff8817fa127b98 ffff8817fa127b28 ffff881803f06480
	[   66.641249] Call Trace:
	[   66.643983]  [<ffffffff8181d9dd>] dump_stack+0x45/0x57
	[   66.649713]  [<ffffffff810f7b26>] print_circular_bug+0x206/0x280
	[   66.656413]  [<ffffffff810fb867>] __lock_acquire+0x20d7/0x2100
	[   66.662919]  [<ffffffff810fa1b7>] ? __lock_acquire+0xa27/0x2100
	[   66.669523]  [<ffffffff810fc94e>] lock_acquire+0xbe/0x150
	[   66.675543]  [<ffffffff81200562>] ? might_fault+0x42/0xa0
	[   66.681566]  [<ffffffff8120058f>] might_fault+0x6f/0xa0
	[   66.687394]  [<ffffffff81200562>] ? might_fault+0x42/0xa0
	[   66.693418]  [<ffffffff8126e05a>] filldir+0x9a/0x130
	[   66.698968]  [<ffffffffa057db34>] ? xfs_ilock_data_map_shared+0x34/0x40 [xfs]
	[   66.706941]  [<ffffffffa056daed>] xfs_dir2_sf_getdents.isra.10+0x1bd/0x220 [xfs]
	[   66.715203]  [<ffffffffa057da52>] ? xfs_ilock+0xc2/0x170 [xfs]
	[   66.721712]  [<ffffffffa056e72e>] xfs_readdir+0x17e/0x1a0 [xfs]
	[   66.728316]  [<ffffffff81822b6d>] ? mutex_lock_killable_nested+0x3ad/0x480
	[   66.735998]  [<ffffffffa057095b>] xfs_file_readdir+0x2b/0x30 [xfs]
	[   66.742894]  [<ffffffff8126de2a>] iterate_dir+0x9a/0x140
	[   66.748819]  [<ffffffff8127a1ac>] ? __fget_light+0x6c/0xa0
	[   66.754938]  [<ffffffff8126e361>] SyS_getdents+0x91/0x120
	[   66.760958]  [<ffffffff8126dfc0>] ? fillonedir+0xf0/0xf0
	[   66.766884]  [<ffffffff8182662e>] system_call_fastpath+0x12/0x76

The chain reported by lockdep is not exactly the same every time,
but similar.

Once again, I'll recheck. but the patch below still looks correct
to me, and I think that it is actually a fix.

Before this patch freeze_super() calls sync_filesystem() and
s_op->freeze_fs(sb) without ->s_writers locks (as it seen by
lockdep) and this is wrong.

After this patch lockdep knows about ->s_writers locks held and this
is what we want, but apparently this way lockdep can notice the new
dependencies.

Oleg.

------------------------------------------------------------------------------
Subject: [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super()

Move the "fool the lockdep" code from sb_wait_write() into the new
simple helper, sb_lockdep_release(), called once before return from
freeze_super().

This is preparation, but imo this makes sense in any case. This way
we do not hide from lockdep the "write" locks we hold when we call
s_op->freeze_fs(sb).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/super.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d0fdd49..e7ea9f1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1236,16 +1236,10 @@ static void sb_wait_write(struct super_block *sb, int level)
 {
 	s64 writers;
 
-	/*
-	 * We just cycle-through lockdep here so that it does not complain
-	 * about returning with lock to userspace
-	 */
 	rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
-	rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
 
 	do {
 		DEFINE_WAIT(wait);
-
 		/*
 		 * We use a barrier in prepare_to_wait() to separate setting
 		 * of frozen and checking of the counter
@@ -1261,6 +1255,14 @@ static void sb_wait_write(struct super_block *sb, int level)
 	} while (writers);
 }
 
+static void sb_freeze_release(struct super_block *sb)
+{
+	int level;
+	/* Avoid the warning from lockdep_sys_exit() */
+	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
+		rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
+}
+
 /**
  * freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
@@ -1349,6 +1351,7 @@ int freeze_super(struct super_block *sb)
 			sb->s_writers.frozen = SB_UNFROZEN;
 			smp_wmb();
 			wake_up(&sb->s_writers.wait_unfrozen);
+			sb_freeze_release(sb);
 			deactivate_locked_super(sb);
 			return ret;
 		}
@@ -1358,6 +1361,7 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+	sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }
-- 
1.5.5.1



  parent reply	other threads:[~2015-08-07 19:58 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-22 21:15 [PATCH 0/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-22 21:15 ` [PATCH 1/4] percpu-rwsem: introduce percpu_down_read_trylock() Oleg Nesterov
2015-07-22 21:15 ` [PATCH 2/4] percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire() Oleg Nesterov
2015-07-31 10:20   ` Peter Zijlstra
2015-08-03 15:40     ` Oleg Nesterov
2015-07-22 21:15 ` [PATCH 3/4] shift percpu_counter_destroy() into destroy_super_work() Oleg Nesterov
2015-07-28  8:36   ` Jan Kara
2015-07-22 21:15 ` [PATCH 4/4] change sb_writers to use percpu_rw_semaphore Oleg Nesterov
2015-07-22 21:34   ` Oleg Nesterov
2015-07-28  8:34   ` Jan Kara
2015-08-03 17:30     ` Oleg Nesterov
2015-08-07 19:54   ` Oleg Nesterov
2015-08-07 19:55 ` Oleg Nesterov [this message]
2015-08-10 14:59   ` [PATCH 0/4] " Jan Kara
2015-08-10 22:41     ` Dave Chinner
2015-08-11 13:16       ` Oleg Nesterov
2015-08-11 13:29         ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150807195552.GB28529@redhat.com \
    --to=oleg@redhat.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.