* [PATCH 0/4] sb_write: lockdep fixes/cleanups
@ 2015-07-20 17:00 Oleg Nesterov
2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:00 UTC (permalink / raw)
To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel
So this is the preparation for percpu_rw_semaphore conversion. But let
me repeat, imo these changes make sense in any case, so I'd like to send
them separately.
2/4 fixes 2 bugs and cleanups the "trylock" hack. Although this hack can
be probably removed, see the changelog.
3/4 and 4/4 try to make the lockdep annotations more consistent. If we
want to use lockdep we should not hide the "write" locks we hold when we
call freeze_fs(sb) and unfreeze_fs(sb).
Please review.
Oleg.
fs/btrfs/transaction.c | 8 +---
fs/super.c | 101 +++++++++++++++++++++++++++++------------------
fs/xfs/xfs_aops.c | 6 +--
include/linux/fs.h | 5 ++
4 files changed, 71 insertions(+), 49 deletions(-)
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers
2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
2015-07-21 8:23 ` Jan Kara
2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel
Preparation to hide the sb->s_writers internals from xfs and btrfs.
Add 2 trivial define's they can use rather than play with ->s_writers
directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/btrfs/transaction.c | 8 ++------
fs/xfs/xfs_aops.c | 6 ++----
include/linux/fs.h | 5 +++++
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5628e25..6dca4e9 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1620,9 +1620,7 @@ static void do_async_commit(struct work_struct *work)
* Tell lockdep about it.
*/
if (ac->newtrans->type & __TRANS_FREEZABLE)
- rwsem_acquire_read(
- &ac->root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
- 0, 1, _THIS_IP_);
+ __sb_acquire_write(ac->root->fs_info->sb, SB_FREEZE_FS);
current->journal_info = ac->newtrans;
@@ -1661,9 +1659,7 @@ int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
* async commit thread will be the one to unlock it.
*/
if (ac->newtrans->type & __TRANS_FREEZABLE)
- rwsem_release(
- &root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
- 1, _THIS_IP_);
+ __sb_release_write(root->fs_info->sb, SB_FREEZE_FS);
schedule_work(&ac->work);
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index a56960d..8034c78 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -119,8 +119,7 @@ xfs_setfilesize_trans_alloc(
* We may pass freeze protection with a transaction. So tell lockdep
* we released it.
*/
- rwsem_release(&ioend->io_inode->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
- 1, _THIS_IP_);
+ __sb_release_write(ioend->io_inode->i_sb, SB_FREEZE_FS);
/*
* We hand off the transaction to the completion thread now, so
* clear the flag here.
@@ -171,8 +170,7 @@ xfs_setfilesize_ioend(
* Similarly for freeze protection.
*/
current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
- rwsem_acquire_read(&VFS_I(ip)->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
- 0, 1, _THIS_IP_);
+ __sb_acquire_write(VFS_I(ip)->i_sb, SB_FREEZE_FS);
return xfs_setfilesize(ip, tp, ioend->io_offset, ioend->io_size);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 35ec87e..78ac768 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1362,6 +1362,11 @@ extern struct timespec current_fs_time(struct super_block *sb);
void __sb_end_write(struct super_block *sb, int level);
int __sb_start_write(struct super_block *sb, int level, bool wait);
+#define __sb_acquire_write(sb, lev) \
+ rwsem_acquire_read(&(sb)->s_writers.lock_map[(lev)-1], 0, 1, _THIS_IP_)
+#define __sb_release_write(sb, lev) \
+ rwsem_release(&(sb)->s_writers.lock_map[(lev)-1], 1, _THIS_IP_)
+
/**
* sb_end_write - drop write access to a superblock
* @sb: the super we wrote to
--
1.5.5.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/4] fix the broken lockdep logic in __sb_start_write()
2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
2015-07-21 8:38 ` Jan Kara
2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel
1. wait_event(frozen < level) without rwsem_acquire_read() is just
wrong from lockdep perspective. If we are going to deadlock
because the caller is buggy, lockdep detect this problem.
2. __sb_start_write() can race with thaw_super() + freeze_super(),
and after "goto retry" the 2nd acquire_freeze_lock() is wrong.
3. The "tell lockdep we are doing trylock" hack doesn't look nice.
I think this is correct, but this logic should be more explicit.
Yes, the recursive read_lock() is fine if we hold the lock on a
higher level. But we do not need to fool lockdep. If we can not
deadlock in this case then try-lock must not fail and we can use
use wait == F throughout this code.
Note: as Dave Chinner explains, the "trylock" hack and the fat comment
can be probably removed. But this needs a separate change and it will
be trivial: just kill __sb_start_write() and rename do_sb_start_write()
back to __sb_start_write().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/super.c | 73 ++++++++++++++++++++++++++++++++---------------------------
1 files changed, 40 insertions(+), 33 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 928c20f..d0fdd49 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1158,38 +1158,11 @@ void __sb_end_write(struct super_block *sb, int level)
}
EXPORT_SYMBOL(__sb_end_write);
-#ifdef CONFIG_LOCKDEP
-/*
- * We want lockdep to tell us about possible deadlocks with freezing but
- * it's it bit tricky to properly instrument it. Getting a freeze protection
- * works as getting a read lock but there are subtle problems. XFS for example
- * gets freeze protection on internal level twice in some cases, which is OK
- * only because we already hold a freeze protection also on higher level. Due
- * to these cases we have to tell lockdep we are doing trylock when we
- * already hold a freeze protection for a higher freeze level.
- */
-static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
+static int do_sb_start_write(struct super_block *sb, int level, bool wait,
unsigned long ip)
{
- int i;
-
- if (!trylock) {
- for (i = 0; i < level - 1; i++)
- if (lock_is_held(&sb->s_writers.lock_map[i])) {
- trylock = true;
- break;
- }
- }
- rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip);
-}
-#endif
-
-/*
- * This is an internal function, please use sb_start_{write,pagefault,intwrite}
- * instead.
- */
-int __sb_start_write(struct super_block *sb, int level, bool wait)
-{
+ if (wait)
+ rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
retry:
if (unlikely(sb->s_writers.frozen >= level)) {
if (!wait)
@@ -1198,9 +1171,6 @@ retry:
sb->s_writers.frozen < level);
}
-#ifdef CONFIG_LOCKDEP
- acquire_freeze_lock(sb, level, !wait, _RET_IP_);
-#endif
percpu_counter_inc(&sb->s_writers.counter[level-1]);
/*
* Make sure counter is updated before we check for frozen.
@@ -1211,8 +1181,45 @@ retry:
__sb_end_write(sb, level);
goto retry;
}
+
+ if (!wait)
+ rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
return 1;
}
+
+/*
+ * This is an internal function, please use sb_start_{write,pagefault,intwrite}
+ * instead.
+ */
+int __sb_start_write(struct super_block *sb, int level, bool wait)
+{
+ bool force_trylock = false;
+ int ret;
+
+#ifdef CONFIG_LOCKDEP
+ /*
+ * We want lockdep to tell us about possible deadlocks with freezing
+ * but it's it bit tricky to properly instrument it. Getting a freeze
+ * protection works as getting a read lock but there are subtle
+ * problems. XFS for example gets freeze protection on internal level
+ * twice in some cases, which is OK only because we already hold a
+ * freeze protection also on higher level. Due to these cases we have
+ * to use wait == F (trylock mode) which must not fail.
+ */
+ if (wait) {
+ int i;
+
+ for (i = 0; i < level - 1; i++)
+ if (lock_is_held(&sb->s_writers.lock_map[i])) {
+ force_trylock = true;
+ break;
+ }
+ }
+#endif
+ ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_);
+ WARN_ON(force_trylock & !ret);
+ return ret;
+}
EXPORT_SYMBOL(__sb_start_write);
/**
--
1.5.5.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super()
2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
2015-07-21 8:40 ` Jan Kara
2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel
Move the "fool the lockdep" code from sb_wait_write() into the new
simple helper, sb_lockdep_release(), called once before return from
freeze_super().
This is preparation, but imo this makes sense in any case. This way
we do not hide from lockdep the "write" locks we hold when we call
s_op->freeze_fs(sb).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/super.c | 16 ++++++++++------
1 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index d0fdd49..e7ea9f1 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1236,16 +1236,10 @@ static void sb_wait_write(struct super_block *sb, int level)
{
s64 writers;
- /*
- * We just cycle-through lockdep here so that it does not complain
- * about returning with lock to userspace
- */
rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
- rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
do {
DEFINE_WAIT(wait);
-
/*
* We use a barrier in prepare_to_wait() to separate setting
* of frozen and checking of the counter
@@ -1261,6 +1255,14 @@ static void sb_wait_write(struct super_block *sb, int level)
} while (writers);
}
+static void sb_freeze_release(struct super_block *sb)
+{
+ int level;
+ /* Avoid the warning from lockdep_sys_exit() */
+ for (level = 0; level < SB_FREEZE_LEVELS; ++level)
+ rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
+}
+
/**
* freeze_super - lock the filesystem and force it into a consistent state
* @sb: the super to lock
@@ -1349,6 +1351,7 @@ int freeze_super(struct super_block *sb)
sb->s_writers.frozen = SB_UNFROZEN;
smp_wmb();
wake_up(&sb->s_writers.wait_unfrozen);
+ sb_freeze_release(sb);
deactivate_locked_super(sb);
return ret;
}
@@ -1358,6 +1361,7 @@ int freeze_super(struct super_block *sb)
* sees write activity when frozen is set to SB_FREEZE_COMPLETE.
*/
sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+ sb_freeze_release(sb);
up_write(&sb->s_umount);
return 0;
}
--
1.5.5.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map
2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
` (2 preceding siblings ...)
2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
@ 2015-07-20 17:01 ` Oleg Nesterov
2015-07-21 8:48 ` Jan Kara
3 siblings, 1 reply; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-20 17:01 UTC (permalink / raw)
To: Al Viro, Dave Chinner, Dave Hansen, Jan Kara; +Cc: linux-fsdevel, linux-kernel
Change thaw_super() to re-acquire the "write" locks we are going to
release. This way s_op->unfreeze_fs(sb) is called with these locks
held as it seen by lockdep, and this matches the reality. This adds
another trivial helper, sb_freeze_acquire().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/super.c | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index e7ea9f1..b4db3ee 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1263,6 +1263,14 @@ static void sb_freeze_release(struct super_block *sb)
rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
}
+static void sb_freeze_acquire(struct super_block *sb)
+{
+ int level;
+
+ for (level = 0; level < SB_FREEZE_LEVELS; ++level)
+ rwsem_acquire(sb->s_writers.lock_map + level, 0, 1, _THIS_IP_);
+}
+
/**
* freeze_super - lock the filesystem and force it into a consistent state
* @sb: the super to lock
@@ -1386,16 +1394,20 @@ int thaw_super(struct super_block *sb)
if (sb->s_flags & MS_RDONLY)
goto out;
+ sb_freeze_acquire(sb);
+
if (sb->s_op->unfreeze_fs) {
error = sb->s_op->unfreeze_fs(sb);
if (error) {
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
+ sb_freeze_release(sb);
up_write(&sb->s_umount);
return error;
}
}
+ sb_freeze_release(sb);
out:
sb->s_writers.frozen = SB_UNFROZEN;
smp_wmb();
--
1.5.5.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers
2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
@ 2015-07-21 8:23 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-07-21 8:23 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
linux-kernel
On Mon 20-07-15 19:01:01, Oleg Nesterov wrote:
> Preparation to hide the sb->s_writers internals from xfs and btrfs.
> Add 2 trivial define's they can use rather than play with ->s_writers
> directly. No changes in btrfs/transaction.o and xfs/xfs_aops.o.
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Nice. You can add:
Reviewed-by: Jan Kara <jack@suse.com>
Honza
> ---
> fs/btrfs/transaction.c | 8 ++------
> fs/xfs/xfs_aops.c | 6 ++----
> include/linux/fs.h | 5 +++++
> 3 files changed, 9 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 5628e25..6dca4e9 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1620,9 +1620,7 @@ static void do_async_commit(struct work_struct *work)
> * Tell lockdep about it.
> */
> if (ac->newtrans->type & __TRANS_FREEZABLE)
> - rwsem_acquire_read(
> - &ac->root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
> - 0, 1, _THIS_IP_);
> + __sb_acquire_write(ac->root->fs_info->sb, SB_FREEZE_FS);
>
> current->journal_info = ac->newtrans;
>
> @@ -1661,9 +1659,7 @@ int btrfs_commit_transaction_async(struct btrfs_trans_handle *trans,
> * async commit thread will be the one to unlock it.
> */
> if (ac->newtrans->type & __TRANS_FREEZABLE)
> - rwsem_release(
> - &root->fs_info->sb->s_writers.lock_map[SB_FREEZE_FS-1],
> - 1, _THIS_IP_);
> + __sb_release_write(root->fs_info->sb, SB_FREEZE_FS);
>
> schedule_work(&ac->work);
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index a56960d..8034c78 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -119,8 +119,7 @@ xfs_setfilesize_trans_alloc(
> * We may pass freeze protection with a transaction. So tell lockdep
> * we released it.
> */
> - rwsem_release(&ioend->io_inode->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
> - 1, _THIS_IP_);
> + __sb_release_write(ioend->io_inode->i_sb, SB_FREEZE_FS);
> /*
> * We hand off the transaction to the completion thread now, so
> * clear the flag here.
> @@ -171,8 +170,7 @@ xfs_setfilesize_ioend(
> * Similarly for freeze protection.
> */
> current_set_flags_nested(&tp->t_pflags, PF_FSTRANS);
> - rwsem_acquire_read(&VFS_I(ip)->i_sb->s_writers.lock_map[SB_FREEZE_FS-1],
> - 0, 1, _THIS_IP_);
> + __sb_acquire_write(VFS_I(ip)->i_sb, SB_FREEZE_FS);
>
> return xfs_setfilesize(ip, tp, ioend->io_offset, ioend->io_size);
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 35ec87e..78ac768 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1362,6 +1362,11 @@ extern struct timespec current_fs_time(struct super_block *sb);
> void __sb_end_write(struct super_block *sb, int level);
> int __sb_start_write(struct super_block *sb, int level, bool wait);
>
> +#define __sb_acquire_write(sb, lev) \
> + rwsem_acquire_read(&(sb)->s_writers.lock_map[(lev)-1], 0, 1, _THIS_IP_)
> +#define __sb_release_write(sb, lev) \
> + rwsem_release(&(sb)->s_writers.lock_map[(lev)-1], 1, _THIS_IP_)
> +
> /**
> * sb_end_write - drop write access to a superblock
> * @sb: the super we wrote to
> --
> 1.5.5.1
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/4] fix the broken lockdep logic in __sb_start_write()
2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
@ 2015-07-21 8:38 ` Jan Kara
2015-07-22 21:13 ` Oleg Nesterov
0 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2015-07-21 8:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
linux-kernel
On Mon 20-07-15 19:01:03, Oleg Nesterov wrote:
> 1. wait_event(frozen < level) without rwsem_acquire_read() is just
> wrong from lockdep perspective. If we are going to deadlock
> because the caller is buggy, lockdep detect this problem.
>
> 2. __sb_start_write() can race with thaw_super() + freeze_super(),
> and after "goto retry" the 2nd acquire_freeze_lock() is wrong.
>
> 3. The "tell lockdep we are doing trylock" hack doesn't look nice.
>
> I think this is correct, but this logic should be more explicit.
> Yes, the recursive read_lock() is fine if we hold the lock on a
> higher level. But we do not need to fool lockdep. If we can not
> deadlock in this case then try-lock must not fail and we can use
> use wait == F throughout this code.
>
> Note: as Dave Chinner explains, the "trylock" hack and the fat comment
> can be probably removed. But this needs a separate change and it will
> be trivial: just kill __sb_start_write() and rename do_sb_start_write()
> back to __sb_start_write().
The patch looks good. Did you test this BTW? You can add:
Reviewed-by: Jan Kara <jack@suse.com>
Honza
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
> fs/super.c | 73 ++++++++++++++++++++++++++++++++---------------------------
> 1 files changed, 40 insertions(+), 33 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index 928c20f..d0fdd49 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1158,38 +1158,11 @@ void __sb_end_write(struct super_block *sb, int level)
> }
> EXPORT_SYMBOL(__sb_end_write);
>
> -#ifdef CONFIG_LOCKDEP
> -/*
> - * We want lockdep to tell us about possible deadlocks with freezing but
> - * it's it bit tricky to properly instrument it. Getting a freeze protection
> - * works as getting a read lock but there are subtle problems. XFS for example
> - * gets freeze protection on internal level twice in some cases, which is OK
> - * only because we already hold a freeze protection also on higher level. Due
> - * to these cases we have to tell lockdep we are doing trylock when we
> - * already hold a freeze protection for a higher freeze level.
> - */
> -static void acquire_freeze_lock(struct super_block *sb, int level, bool trylock,
> +static int do_sb_start_write(struct super_block *sb, int level, bool wait,
> unsigned long ip)
> {
> - int i;
> -
> - if (!trylock) {
> - for (i = 0; i < level - 1; i++)
> - if (lock_is_held(&sb->s_writers.lock_map[i])) {
> - trylock = true;
> - break;
> - }
> - }
> - rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, trylock, ip);
> -}
> -#endif
> -
> -/*
> - * This is an internal function, please use sb_start_{write,pagefault,intwrite}
> - * instead.
> - */
> -int __sb_start_write(struct super_block *sb, int level, bool wait)
> -{
> + if (wait)
> + rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 0, ip);
> retry:
> if (unlikely(sb->s_writers.frozen >= level)) {
> if (!wait)
> @@ -1198,9 +1171,6 @@ retry:
> sb->s_writers.frozen < level);
> }
>
> -#ifdef CONFIG_LOCKDEP
> - acquire_freeze_lock(sb, level, !wait, _RET_IP_);
> -#endif
> percpu_counter_inc(&sb->s_writers.counter[level-1]);
> /*
> * Make sure counter is updated before we check for frozen.
> @@ -1211,8 +1181,45 @@ retry:
> __sb_end_write(sb, level);
> goto retry;
> }
> +
> + if (!wait)
> + rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 1, ip);
> return 1;
> }
> +
> +/*
> + * This is an internal function, please use sb_start_{write,pagefault,intwrite}
> + * instead.
> + */
> +int __sb_start_write(struct super_block *sb, int level, bool wait)
> +{
> + bool force_trylock = false;
> + int ret;
> +
> +#ifdef CONFIG_LOCKDEP
> + /*
> + * We want lockdep to tell us about possible deadlocks with freezing
> + * but it's it bit tricky to properly instrument it. Getting a freeze
> + * protection works as getting a read lock but there are subtle
> + * problems. XFS for example gets freeze protection on internal level
> + * twice in some cases, which is OK only because we already hold a
> + * freeze protection also on higher level. Due to these cases we have
> + * to use wait == F (trylock mode) which must not fail.
> + */
> + if (wait) {
> + int i;
> +
> + for (i = 0; i < level - 1; i++)
> + if (lock_is_held(&sb->s_writers.lock_map[i])) {
> + force_trylock = true;
> + break;
> + }
> + }
> +#endif
> + ret = do_sb_start_write(sb, level, wait && !force_trylock, _RET_IP_);
> + WARN_ON(force_trylock & !ret);
> + return ret;
> +}
> EXPORT_SYMBOL(__sb_start_write);
>
> /**
> --
> 1.5.5.1
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super()
2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
@ 2015-07-21 8:40 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-07-21 8:40 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
linux-kernel
On Mon 20-07-15 19:01:06, Oleg Nesterov wrote:
> Move the "fool the lockdep" code from sb_wait_write() into the new
> simple helper, sb_lockdep_release(), called once before return from
> freeze_super().
>
> This is preparation, but imo this makes sense in any case. This way
> we do not hide from lockdep the "write" locks we hold when we call
> s_op->freeze_fs(sb).
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Good. You can add:
Reviewed-by: Jan Kara <jack@suse.com>
Honza
> ---
> fs/super.c | 16 ++++++++++------
> 1 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index d0fdd49..e7ea9f1 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1236,16 +1236,10 @@ static void sb_wait_write(struct super_block *sb, int level)
> {
> s64 writers;
>
> - /*
> - * We just cycle-through lockdep here so that it does not complain
> - * about returning with lock to userspace
> - */
> rwsem_acquire(&sb->s_writers.lock_map[level-1], 0, 0, _THIS_IP_);
> - rwsem_release(&sb->s_writers.lock_map[level-1], 1, _THIS_IP_);
>
> do {
> DEFINE_WAIT(wait);
> -
> /*
> * We use a barrier in prepare_to_wait() to separate setting
> * of frozen and checking of the counter
> @@ -1261,6 +1255,14 @@ static void sb_wait_write(struct super_block *sb, int level)
> } while (writers);
> }
>
> +static void sb_freeze_release(struct super_block *sb)
> +{
> + int level;
> + /* Avoid the warning from lockdep_sys_exit() */
> + for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> + rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
> +}
> +
> /**
> * freeze_super - lock the filesystem and force it into a consistent state
> * @sb: the super to lock
> @@ -1349,6 +1351,7 @@ int freeze_super(struct super_block *sb)
> sb->s_writers.frozen = SB_UNFROZEN;
> smp_wmb();
> wake_up(&sb->s_writers.wait_unfrozen);
> + sb_freeze_release(sb);
> deactivate_locked_super(sb);
> return ret;
> }
> @@ -1358,6 +1361,7 @@ int freeze_super(struct super_block *sb)
> * sees write activity when frozen is set to SB_FREEZE_COMPLETE.
> */
> sb->s_writers.frozen = SB_FREEZE_COMPLETE;
> + sb_freeze_release(sb);
> up_write(&sb->s_umount);
> return 0;
> }
> --
> 1.5.5.1
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map
2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
@ 2015-07-21 8:48 ` Jan Kara
0 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2015-07-21 8:48 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Al Viro, Dave Chinner, Dave Hansen, Jan Kara, linux-fsdevel,
linux-kernel
On Mon 20-07-15 19:01:09, Oleg Nesterov wrote:
> Change thaw_super() to re-acquire the "write" locks we are going to
> release. This way s_op->unfreeze_fs(sb) is called with these locks
> held as it seen by lockdep, and this matches the reality. This adds
> another trivial helper, sb_freeze_acquire().
>
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Ah, I was confused for a moment why this doesn't trigger a lockdep warning
because of lock inversion between s_umount and freeze protection but you
use trylock in sb_freeze_acquire(). So things are fine.
You can add:
Reviewed-by: Jan Kara <jack@suse.com>
Honza
> ---
> fs/super.c | 12 ++++++++++++
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index e7ea9f1..b4db3ee 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1263,6 +1263,14 @@ static void sb_freeze_release(struct super_block *sb)
> rwsem_release(sb->s_writers.lock_map + level, 1, _THIS_IP_);
> }
>
> +static void sb_freeze_acquire(struct super_block *sb)
> +{
> + int level;
> +
> + for (level = 0; level < SB_FREEZE_LEVELS; ++level)
> + rwsem_acquire(sb->s_writers.lock_map + level, 0, 1, _THIS_IP_);
> +}
> +
> /**
> * freeze_super - lock the filesystem and force it into a consistent state
> * @sb: the super to lock
> @@ -1386,16 +1394,20 @@ int thaw_super(struct super_block *sb)
> if (sb->s_flags & MS_RDONLY)
> goto out;
>
> + sb_freeze_acquire(sb);
> +
> if (sb->s_op->unfreeze_fs) {
> error = sb->s_op->unfreeze_fs(sb);
> if (error) {
> printk(KERN_ERR
> "VFS:Filesystem thaw failed\n");
> + sb_freeze_release(sb);
> up_write(&sb->s_umount);
> return error;
> }
> }
>
> + sb_freeze_release(sb);
> out:
> sb->s_writers.frozen = SB_UNFROZEN;
> smp_wmb();
> --
> 1.5.5.1
>
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/4] fix the broken lockdep logic in __sb_start_write()
2015-07-21 8:38 ` Jan Kara
@ 2015-07-22 21:13 ` Oleg Nesterov
0 siblings, 0 replies; 10+ messages in thread
From: Oleg Nesterov @ 2015-07-22 21:13 UTC (permalink / raw)
To: Jan Kara; +Cc: Al Viro, Dave Chinner, Dave Hansen, linux-fsdevel, linux-kernel
On 07/21, Jan Kara wrote:
>
> On Mon 20-07-15 19:01:03, Oleg Nesterov wrote:
> > 1. wait_event(frozen < level) without rwsem_acquire_read() is just
> > wrong from lockdep perspective. If we are going to deadlock
> > because the caller is buggy, lockdep detect this problem.
> >
> > 2. __sb_start_write() can race with thaw_super() + freeze_super(),
> > and after "goto retry" the 2nd acquire_freeze_lock() is wrong.
> >
> > 3. The "tell lockdep we are doing trylock" hack doesn't look nice.
> >
> > I think this is correct, but this logic should be more explicit.
> > Yes, the recursive read_lock() is fine if we hold the lock on a
> > higher level. But we do not need to fool lockdep. If we can not
> > deadlock in this case then try-lock must not fail and we can use
> > use wait == F throughout this code.
> >
> > Note: as Dave Chinner explains, the "trylock" hack and the fat comment
> > can be probably removed. But this needs a separate change and it will
> > be trivial: just kill __sb_start_write() and rename do_sb_start_write()
> > back to __sb_start_write().
>
> The patch looks good. Did you test this BTW? You can add:
Yes, but "artificially". I just wrote the function which takes/drops
SB_FREEZE_FS twice with and then without SB_FREEZE_WRITE. It worked
as expected, lockdep complained when SB_FREEZE_WRITE wasn't held.
> Reviewed-by: Jan Kara <jack@suse.com>
Thanks!
Oleg.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2015-07-22 21:23 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-20 17:00 [PATCH 0/4] sb_write: lockdep fixes/cleanups Oleg Nesterov
2015-07-20 17:01 ` [PATCH 1/4] introduce __sb_{acquire,release}_write() helpers Oleg Nesterov
2015-07-21 8:23 ` Jan Kara
2015-07-20 17:01 ` [PATCH 2/4] fix the broken lockdep logic in __sb_start_write() Oleg Nesterov
2015-07-21 8:38 ` Jan Kara
2015-07-22 21:13 ` Oleg Nesterov
2015-07-20 17:01 ` [PATCH 3/4] move rwsem_release() from sb_wait_write() to freeze_super() Oleg Nesterov
2015-07-21 8:40 ` Jan Kara
2015-07-20 17:01 ` [PATCH 4/4] change thaw_super() to re-acquire s_writers.lock_map Oleg Nesterov
2015-07-21 8:48 ` Jan Kara
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).