linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] sb_write: lockdep fixes/cleanups
@ 2015-07-20 17:00 Oleg Nesterov
  2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:00 UTC (permalink / raw)
  To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel

So this is the preparation for percpu_rw_semaphore conversion. But let
me repeat, imo these changes make sense in any case, so I'd like to send
them separately.

2/4 fixes 2 bugs and cleanups the "trylock" hack. Although this hack can
be probably removed, see the changelog.

3/4 and 4/4 try to make the lockdep annotations more consistent. If we
want to use lockdep we should not hide the "write" locks we hold when we
call freeze_fs(sb) and unfreeze_fs(sb).

Please review.

Oleg.

 fs/btrfs/transaction.c |    8 +---
 fs/super.c             |  101 +++++++++++++++++++++++++++++------------------
 fs/xfs/xfs_aops.c      |    6 +--
 include/linux/fs.h     |    5 ++
 4 files changed, 71 insertions(+), 49 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers
  2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
  2015-07-21  8:23   ` Jan Kara
  2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
  To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel

Preparation to hide the sb->s_writers internals from xfs and btrfs.
Add 2 trivial define's they can use rather than play with ->s_writers
directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/btrfs/transaction.c |    8 ++------
 fs/xfs/xfs_aops.c      |    6 ++----
 include/linux/fs.h     |    5 +++++
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5628e25..6dca4e9 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1620,9 +1620,7 @@ static void do_async_commit(struct work_struct *work)
 	 * Tell lockdep about it.
 	 */
 	if (ac->newtrans->type & __TRANS_FREEZABLE)
-		rwsem_acquire_read(
-		     &ac->root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
-		     0, 1, _THIS_IP_);
+		__sb_acquire_write(ac->root->fs_info->sb, SB_FREEZE_FS);
 
 	current->journal_info = ac->newtrans;
 
@@ -1661,9 +1659,7 @@ int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
 	 * async commit thread will be the one to unlock it.
 	 */
 	if (ac->newtrans->type & __TRANS_FREEZABLE)
-		rwsem_release(
-			&root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
-			1, _THIS_IP_);
+		__sb_release_write(root->fs_info->sb, SB_FREEZE_FS);
 
 	schedule_work(&ac->work);
 
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a56960d..8034c78 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -119,8 +119,7 @@ xfs_setfilesize_trans_alloc(
 	 * We may pass freeze protection with a transaction.  So tell lockdep
 	 * we released it.
 	 */
-	rwsem_release(&ioend->io_inode->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
-		      1, _THIS_IP_);
+	__sb_release_write(ioend->io_inode->i_sb, SB_FREEZE_FS);
 	/*
 	 * We hand off the transaction to the completion thread now, so
 	 * clear the flag here.
@@ -171,8 +170,7 @@ xfs_setfilesize_ioend(
 	 * Similarly for freeze protection.
 	 */
 	current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
-	rwsem_acquire_read(&VFS_I(ip)->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
-			   0, 1, _THIS_IP_);
+	__sb_acquire_write(VFS_I(ip)->i_sb, SB_FREEZE_FS);
 
 	return xfs_setfilesize(ip, tp, ioend->io_offset, ioend->io_size);
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 35ec87e..78ac768 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1362,6 +1362,11 @@ extern struct timespec current_fs_time(struct super_block *sb);
 void __sb_end_write(struct super_block *sb, int level);
 int __sb_start_write(struct super_block *sb, int level, bool wait);
 
+#define __sb_acquire_write(sb, lev)	\
+	rwsem_acquire_read(&(sb)->s_writers.lock_map[(lev)-1], 0, 1, _THIS_IP_)
+#define __sb_release_write(sb, lev)	\
+	rwsem_release(&(sb)->s_writers.lock_map[(lev)-1], 1, _THIS_IP_)
+
 /**
  * sb_end_write - drop write access to a superblock
  * @sb: the super we wrote to
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] fix the broken lockdep logic in __sb_start_write()
  2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
  2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
  2015-07-21  8:38   ` Jan Kara
  2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
  2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
  3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
  To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel

1. wait_event(frozen < level) without rwsem_acquire_read() is just
   wrong from lockdep perspective. If we are going to deadlock
   because the caller is buggy, lockdep detect this problem.

2. __sb_start_write() can race with thaw_super() + freeze_super(),
   and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.

3. The "tell lockdep we are doing trylock" hack doesn't look nice.

   I think this is correct, but this logic should be more explicit.
   Yes, the recursive read_lock() is fine if we hold the lock on a
   higher level. But we do not need to fool lockdep. If we can not
   deadlock in this case then try-lock must not fail and we can use
   use wait == F throughout this code.

Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can be probably removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/super.c |   73 ++++++++++++++++++++++++++++++++---------------------------
 1 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 928c20f..d0fdd49 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1158,38 +1158,11 @@ void __sb_end_write(struct super_block *sb, int level)
 }
 EXPORT_SYMBOL(__sb_end_write);
 
-#ifdef CONFIG_LOCKDEP
-/*
- * We want lockdep to tell us about possible deadlocks with freezing but
- * it's it bit tricky to properly instrument it. Getting a freeze protection
- * works as getting a read lock but there are subtle problems. XFS for example
- * gets freeze protection on internal level twice in some cases, which is OK
- * only because we already hold a freeze protection also on higher level. Due
- * to these cases we have to tell lockdep we are doing trylock when we
- * already hold a freeze protection for a higher freeze level.
- */
-static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
+static int do_sb_start_write(struct super_block *sb, int level, bool wait,
 				unsigned long ip)
 {
-	int i;
-
-	if (!trylock) {
-		for (i = 0; i < level - 1; i++)
-			if (lock_is_held(&sb->s_writers.lock_map[i])) {
-				trylock = true;
-				break;
-			}
-	}
-	rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip);
-}
-#endif
-
-/*
- * This is an internal function, please use sb_start_{write,pagefault,intwrite}
- * instead.
- */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
-{
+	if (wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
 retry:
 	if (unlikely(sb->s_writers.frozen >= level)) {
 		if (!wait)
@@ -1198,9 +1171,6 @@ retry:
 			   sb->s_writers.frozen < level);
 	}
 
-#ifdef CONFIG_LOCKDEP
-	acquire_freeze_lock(sb, level, !wait, _RET_IP_);
-#endif
 	percpu_counter_inc(&sb->s_writers.counter[level-1]);
 	/*
 	 * Make sure counter is updated before we check for frozen.
@@ -1211,8 +1181,45 @@ retry:
 		__sb_end_write(sb, level);
 		goto retry;
 	}
+
+	if (!wait)
+		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
 	return 1;
 }
+
+/*
+ * This is an internal function, please use sb_start_{write,pagefault,intwrite}
+ * instead.
+ */
+int __sb_start_write(struct super_block *sb, int level, bool wait)
+{
+	bool force_trylock = false;
+	int ret;
+
+#ifdef CONFIG_LOCKDEP
+	/*
+	 * We want lockdep to tell us about possible deadlocks with freezing
+	 * but it's it bit tricky to properly instrument it. Getting a freeze
+	 * protection works as getting a read lock but there are subtle
+	 * problems. XFS for example gets freeze protection on internal level
+	 * twice in some cases, which is OK only because we already hold a
+	 * freeze protection also on higher level. Due to these cases we have
+	 * to use wait == F (trylock mode) which must not fail.
+	 */
+	if (wait) {
+		int i;
+
+		for (i = 0; i < level - 1; i++)
+			if (lock_is_held(&sb->s_writers.lock_map[i])) {
+				force_trylock = true;
+				break;
+			}
+	}
+#endif
+	ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_);
+	WARN_ON(force_trylock & !ret);
+	return ret;
+}
 EXPORT_SYMBOL(__sb_start_write);
 
 /**
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super()
  2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
  2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
  2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
  2015-07-21  8:40   ` Jan Kara
  2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
  3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
  To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel

Move the "fool the lockdep" code from sb_wait_write() into the new
simple helper, sb_lockdep_release(), called once before return from
freeze_super().

This is preparation, but imo this makes sense in any case. This way
we do not hide from lockdep the "write" locks we hold when we call
s_op->freeze_fs(sb).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/super.c |   16 ++++++++++------
 1 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index d0fdd49..e7ea9f1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1236,16 +1236,10 @@ static void sb_wait_write(struct super_block *sb, int level)
 {
 	s64 writers;
 
-	/*
-	 * We just cycle-through lockdep here so that it does not complain
-	 * about returning with lock to userspace
-	 */
 	rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
-	rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
 
 	do {
 		DEFINE_WAIT(wait);
-
 		/*
 		 * We use a barrier in prepare_to_wait() to separate setting
 		 * of frozen and checking of the counter
@@ -1261,6 +1255,14 @@ static void sb_wait_write(struct super_block *sb, int level)
 	} while (writers);
 }
 
+static void sb_freeze_release(struct super_block *sb)
+{
+	int level;
+	/* Avoid the warning from lockdep_sys_exit() */
+	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
+		rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
+}
+
 /**
  * freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
@@ -1349,6 +1351,7 @@ int freeze_super(struct super_block *sb)
 			sb->s_writers.frozen = SB_UNFROZEN;
 			smp_wmb();
 			wake_up(&sb->s_writers.wait_unfrozen);
+			sb_freeze_release(sb);
 			deactivate_locked_super(sb);
 			return ret;
 		}
@@ -1358,6 +1361,7 @@ int freeze_super(struct super_block *sb)
 	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+	sb_freeze_release(sb);
 	up_write(&sb->s_umount);
 	return 0;
 }
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map
  2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
                   ` (2 preceding siblings ...)
  2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
  2015-07-21  8:48   ` Jan Kara
  3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
  To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel

Change thaw_super() to re-acquire the "write" locks we are going to
release. This way s_op->unfreeze_fs(sb) is called with these locks
held as it seen by lockdep, and this matches the reality. This adds
another trivial helper, sb_freeze_acquire().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/super.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index e7ea9f1..b4db3ee 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1263,6 +1263,14 @@ static void sb_freeze_release(struct super_block *sb)
 		rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
 }
 
+static void sb_freeze_acquire(struct super_block *sb)
+{
+	int level;
+
+	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
+		rwsem_acquire(sb->s_writers.lock_map + level, 0, 1, _THIS_IP_);
+}
+
 /**
  * freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
@@ -1386,16 +1394,20 @@ int thaw_super(struct super_block *sb)
 	if (sb->s_flags & MS_RDONLY)
 		goto out;
 
+	sb_freeze_acquire(sb);
+
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
 		if (error) {
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
+			sb_freeze_release(sb);
 			up_write(&sb->s_umount);
 			return error;
 		}
 	}
 
+	sb_freeze_release(sb);
 out:
 	sb->s_writers.frozen = SB_UNFROZEN;
 	smp_wmb();
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers
  2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
@ 2015-07-21  8:23   ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-07-21  8:23 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
	linux-kernel

On Mon 20-07-15 19:01:01, Oleg Nesterov wrote:
> Preparation to hide the sb->s_writers internals from xfs and btrfs.
> Add 2 trivial define's they can use rather than play with ->s_writers
> directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Nice. You can add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza

> ---
>  fs/btrfs/transaction.c |    8 ++------
>  fs/xfs/xfs_aops.c      |    6 ++----
>  include/linux/fs.h     |    5 +++++
>  3 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 5628e25..6dca4e9 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1620,9 +1620,7 @@ static void do_async_commit(struct work_struct *work)
>  	 * Tell lockdep about it.
>  	 */
>  	if (ac->newtrans->type & __TRANS_FREEZABLE)
> -		rwsem_acquire_read(
> -		     &ac->root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
> -		     0, 1, _THIS_IP_);
> +		__sb_acquire_write(ac->root->fs_info->sb, SB_FREEZE_FS);
>  
>  	current->journal_info = ac->newtrans;
>  
> @@ -1661,9 +1659,7 @@ int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
>  	 * async commit thread will be the one to unlock it.
>  	 */
>  	if (ac->newtrans->type & __TRANS_FREEZABLE)
> -		rwsem_release(
> -			&root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
> -			1, _THIS_IP_);
> +		__sb_release_write(root->fs_info->sb, SB_FREEZE_FS);
>  
>  	schedule_work(&ac->work);
>  
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index a56960d..8034c78 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -119,8 +119,7 @@ xfs_setfilesize_trans_alloc(
>  	 * We may pass freeze protection with a transaction.  So tell lockdep
>  	 * we released it.
>  	 */
> -	rwsem_release(&ioend->io_inode->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
> -		      1, _THIS_IP_);
> +	__sb_release_write(ioend->io_inode->i_sb, SB_FREEZE_FS);
>  	/*
>  	 * We hand off the transaction to the completion thread now, so
>  	 * clear the flag here.
> @@ -171,8 +170,7 @@ xfs_setfilesize_ioend(
>  	 * Similarly for freeze protection.
>  	 */
>  	current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
> -	rwsem_acquire_read(&VFS_I(ip)->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
> -			   0, 1, _THIS_IP_);
> +	__sb_acquire_write(VFS_I(ip)->i_sb, SB_FREEZE_FS);
>  
>  	return xfs_setfilesize(ip, tp, ioend->io_offset, ioend->io_size);
>  }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 35ec87e..78ac768 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1362,6 +1362,11 @@ extern struct timespec current_fs_time(struct super_block *sb);
>  void __sb_end_write(struct super_block *sb, int level);
>  int __sb_start_write(struct super_block *sb, int level, bool wait);
>  
> +#define __sb_acquire_write(sb, lev)	\
> +	rwsem_acquire_read(&(sb)->s_writers.lock_map[(lev)-1], 0, 1, _THIS_IP_)
> +#define __sb_release_write(sb, lev)	\
> +	rwsem_release(&(sb)->s_writers.lock_map[(lev)-1], 1, _THIS_IP_)
> +
>  /**
>   * sb_end_write - drop write access to a superblock
>   * @sb: the super we wrote to
> -- 
> 1.5.5.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] fix the broken lockdep logic in __sb_start_write()
  2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
@ 2015-07-21  8:38   ` Jan Kara
  2015-07-22 21:13     ` Oleg Nesterov
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2015-07-21  8:38 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
	linux-kernel

On Mon 20-07-15 19:01:03, Oleg Nesterov wrote:
> 1. wait_event(frozen < level) without rwsem_acquire_read() is just
>    wrong from lockdep perspective. If we are going to deadlock
>    because the caller is buggy, lockdep detect this problem.
> 
> 2. __sb_start_write() can race with thaw_super() + freeze_super(),
>    and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.
> 
> 3. The "tell lockdep we are doing trylock" hack doesn't look nice.
> 
>    I think this is correct, but this logic should be more explicit.
>    Yes, the recursive read_lock() is fine if we hold the lock on a
>    higher level. But we do not need to fool lockdep. If we can not
>    deadlock in this case then try-lock must not fail and we can use
>    use wait == F throughout this code.
> 
> Note: as Dave Chinner explains, the "trylock" hack and the fat comment
> can be probably removed. But this needs a separate change and it will
> be trivial: just kill __sb_start_write() and rename do_sb_start_write()
> back to __sb_start_write().

The patch looks good. Did you test this BTW? You can add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  fs/super.c |   73 ++++++++++++++++++++++++++++++++---------------------------
>  1 files changed, 40 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 928c20f..d0fdd49 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1158,38 +1158,11 @@ void __sb_end_write(struct super_block *sb, int level)
>  }
>  EXPORT_SYMBOL(__sb_end_write);
>  
> -#ifdef CONFIG_LOCKDEP
> -/*
> - * We want lockdep to tell us about possible deadlocks with freezing but
> - * it's it bit tricky to properly instrument it. Getting a freeze protection
> - * works as getting a read lock but there are subtle problems. XFS for example
> - * gets freeze protection on internal level twice in some cases, which is OK
> - * only because we already hold a freeze protection also on higher level. Due
> - * to these cases we have to tell lockdep we are doing trylock when we
> - * already hold a freeze protection for a higher freeze level.
> - */
> -static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
> +static int do_sb_start_write(struct super_block *sb, int level, bool wait,
>  				unsigned long ip)
>  {
> -	int i;
> -
> -	if (!trylock) {
> -		for (i = 0; i < level - 1; i++)
> -			if (lock_is_held(&sb->s_writers.lock_map[i])) {
> -				trylock = true;
> -				break;
> -			}
> -	}
> -	rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip);
> -}
> -#endif
> -
> -/*
> - * This is an internal function, please use sb_start_{write,pagefault,intwrite}
> - * instead.
> - */
> -int __sb_start_write(struct super_block *sb, int level, bool wait)
> -{
> +	if (wait)
> +		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
>  retry:
>  	if (unlikely(sb->s_writers.frozen >= level)) {
>  		if (!wait)
> @@ -1198,9 +1171,6 @@ retry:
>  			   sb->s_writers.frozen < level);
>  	}
>  
> -#ifdef CONFIG_LOCKDEP
> -	acquire_freeze_lock(sb, level, !wait, _RET_IP_);
> -#endif
>  	percpu_counter_inc(&sb->s_writers.counter[level-1]);
>  	/*
>  	 * Make sure counter is updated before we check for frozen.
> @@ -1211,8 +1181,45 @@ retry:
>  		__sb_end_write(sb, level);
>  		goto retry;
>  	}
> +
> +	if (!wait)
> +		rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
>  	return 1;
>  }
> +
> +/*
> + * This is an internal function, please use sb_start_{write,pagefault,intwrite}
> + * instead.
> + */
> +int __sb_start_write(struct super_block *sb, int level, bool wait)
> +{
> +	bool force_trylock = false;
> +	int ret;
> +
> +#ifdef CONFIG_LOCKDEP
> +	/*
> +	 * We want lockdep to tell us about possible deadlocks with freezing
> +	 * but it's it bit tricky to properly instrument it. Getting a freeze
> +	 * protection works as getting a read lock but there are subtle
> +	 * problems. XFS for example gets freeze protection on internal level
> +	 * twice in some cases, which is OK only because we already hold a
> +	 * freeze protection also on higher level. Due to these cases we have
> +	 * to use wait == F (trylock mode) which must not fail.
> +	 */
> +	if (wait) {
> +		int i;
> +
> +		for (i = 0; i < level - 1; i++)
> +			if (lock_is_held(&sb->s_writers.lock_map[i])) {
> +				force_trylock = true;
> +				break;
> +			}
> +	}
> +#endif
> +	ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_);
> +	WARN_ON(force_trylock & !ret);
> +	return ret;
> +}
>  EXPORT_SYMBOL(__sb_start_write);
>  
>  /**
> -- 
> 1.5.5.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super()
  2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
@ 2015-07-21  8:40   ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-07-21  8:40 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
	linux-kernel

On Mon 20-07-15 19:01:06, Oleg Nesterov wrote:
> Move the "fool the lockdep" code from sb_wait_write() into the new
> simple helper, sb_lockdep_release(), called once before return from
> freeze_super().
> 
> This is preparation, but imo this makes sense in any case. This way
> we do not hide from lockdep the "write" locks we hold when we call
> s_op->freeze_fs(sb).
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Good. You can add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza

> ---
>  fs/super.c |   16 ++++++++++------
>  1 files changed, 10 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index d0fdd49..e7ea9f1 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1236,16 +1236,10 @@ static void sb_wait_write(struct super_block *sb, int level)
>  {
>  	s64 writers;
>  
> -	/*
> -	 * We just cycle-through lockdep here so that it does not complain
> -	 * about returning with lock to userspace
> -	 */
>  	rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
> -	rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
>  
>  	do {
>  		DEFINE_WAIT(wait);
> -
>  		/*
>  		 * We use a barrier in prepare_to_wait() to separate setting
>  		 * of frozen and checking of the counter
> @@ -1261,6 +1255,14 @@ static void sb_wait_write(struct super_block *sb, int level)
>  	} while (writers);
>  }
>  
> +static void sb_freeze_release(struct super_block *sb)
> +{
> +	int level;
> +	/* Avoid the warning from lockdep_sys_exit() */
> +	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> +		rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
> +}
> +
>  /**
>   * freeze_super - lock the filesystem and force it into a consistent state
>   * @sb: the super to lock
> @@ -1349,6 +1351,7 @@ int freeze_super(struct super_block *sb)
>  			sb->s_writers.frozen = SB_UNFROZEN;
>  			smp_wmb();
>  			wake_up(&sb->s_writers.wait_unfrozen);
> +			sb_freeze_release(sb);
>  			deactivate_locked_super(sb);
>  			return ret;
>  		}
> @@ -1358,6 +1361,7 @@ int freeze_super(struct super_block *sb)
>  	 * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
>  	 */
>  	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> +	sb_freeze_release(sb);
>  	up_write(&sb->s_umount);
>  	return 0;
>  }
> -- 
> 1.5.5.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map
  2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
@ 2015-07-21  8:48   ` Jan Kara
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-07-21  8:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
	linux-kernel

On Mon 20-07-15 19:01:09, Oleg Nesterov wrote:
> Change thaw_super() to re-acquire the "write" locks we are going to
> release. This way s_op->unfreeze_fs(sb) is called with these locks
> held as it seen by lockdep, and this matches the reality. This adds
> another trivial helper, sb_freeze_acquire().
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>

Ah, I was confused for a moment why this doesn't trigger a lockdep warning
because of lock inversion between s_umount and freeze protection but you
use trylock in sb_freeze_acquire(). So things are fine.

You can add:

Reviewed-by: Jan Kara <jack@suse.com>

								Honza

> ---
>  fs/super.c |   12 ++++++++++++
>  1 files changed, 12 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index e7ea9f1..b4db3ee 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1263,6 +1263,14 @@ static void sb_freeze_release(struct super_block *sb)
>  		rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
>  }
>  
> +static void sb_freeze_acquire(struct super_block *sb)
> +{
> +	int level;
> +
> +	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> +		rwsem_acquire(sb->s_writers.lock_map + level, 0, 1, _THIS_IP_);
> +}
> +
>  /**
>   * freeze_super - lock the filesystem and force it into a consistent state
>   * @sb: the super to lock
> @@ -1386,16 +1394,20 @@ int thaw_super(struct super_block *sb)
>  	if (sb->s_flags & MS_RDONLY)
>  		goto out;
>  
> +	sb_freeze_acquire(sb);
> +
>  	if (sb->s_op->unfreeze_fs) {
>  		error = sb->s_op->unfreeze_fs(sb);
>  		if (error) {
>  			printk(KERN_ERR
>  				"VFS:Filesystem thaw failed\n");
> +			sb_freeze_release(sb);
>  			up_write(&sb->s_umount);
>  			return error;
>  		}
>  	}
>  
> +	sb_freeze_release(sb);
>  out:
>  	sb->s_writers.frozen = SB_UNFROZEN;
>  	smp_wmb();
> -- 
> 1.5.5.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/4] fix the broken lockdep logic in __sb_start_write()
  2015-07-21  8:38   ` Jan Kara
@ 2015-07-22 21:13     ` Oleg Nesterov
  0 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-22 21:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: Al Viro, Dave Chinner, Dave Hansen, linux-fsdevel, linux-kernel

On 07/21, Jan Kara wrote:
>
> On Mon 20-07-15 19:01:03, Oleg Nesterov wrote:
> > 1. wait_event(frozen < level) without rwsem_acquire_read() is just
> >    wrong from lockdep perspective. If we are going to deadlock
> >    because the caller is buggy, lockdep detect this problem.
> >
> > 2. __sb_start_write() can race with thaw_super() + freeze_super(),
> >    and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.
> >
> > 3. The "tell lockdep we are doing trylock" hack doesn't look nice.
> >
> >    I think this is correct, but this logic should be more explicit.
> >    Yes, the recursive read_lock() is fine if we hold the lock on a
> >    higher level. But we do not need to fool lockdep. If we can not
> >    deadlock in this case then try-lock must not fail and we can use
> >    use wait == F throughout this code.
> >
> > Note: as Dave Chinner explains, the "trylock" hack and the fat comment
> > can be probably removed. But this needs a separate change and it will
> > be trivial: just kill __sb_start_write() and rename do_sb_start_write()
> > back to __sb_start_write().
>
> The patch looks good. Did you test this BTW? You can add:

Yes, but "artificially". I just wrote the function which takes/drops
SB_FREEZE_FS twice with and then without SB_FREEZE_WRITE. It worked
as expected, lockdep complained when SB_FREEZE_WRITE wasn't held.

> Reviewed-by: Jan Kara <jack@suse.com>

Thanks!

Oleg.


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-07-22 21:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
2015-07-21  8:23   ` Jan Kara
2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
2015-07-21  8:38   ` Jan Kara
2015-07-22 21:13     ` Oleg Nesterov
2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
2015-07-21  8:40   ` Jan Kara
2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
2015-07-21  8:48   ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).