All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/11] Cleanup dev-replace locking
@ 2018-09-07 14:54 David Sterba
  2018-09-07 14:55 ` [PATCH 01/11] btrfs: remove btrfs_dev_replace::read_locks David Sterba
                   ` (11 more replies)
  0 siblings, 12 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:54 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The series peels off the custom locking that's used for dev-replace and
uses read-write semaphore in the end.

I've mainly focused on correctness and haven't measured the performance
effects. There should be none as the blocking and waiting was merely
open coding what the rw semaphore does, but without the fairness. The
overall number of locks taken is low, there's a lots of IO in between so
even if new scheme is slower, I don't expect any dramatic change.

  git://github.com/kdave/btrfs-devel.git dev/dev-replace-locking

David Sterba (11):
  btrfs: remove btrfs_dev_replace::read_locks
  btrfs: open code btrfs_dev_replace_clear_lock_blocking
  btrfs: open code btrfs_dev_replace_stats_inc
  btrfs: open code btrfs_after_dev_replace_commit
  btrfs: dev-replace: avoid useless lock on error handling path
  btrfs: dev-replace: move replace members out of fs_info
  btrfs: dev-replace: remove pointless assert in write unlock
  btrfs: reada: reorder dev-replace locks before radix tree preload
  btrfs: dev-replace: swich locking to rw semaphore
  btrfs: dev-replace: remove custom read/write blocking scheme
  btrfs: dev-replace: open code trivial locking helpers

 fs/btrfs/ctree.h       |  11 ++--
 fs/btrfs/dev-replace.c | 136 ++++++++++++-----------------------------
 fs/btrfs/dev-replace.h |  13 ----
 fs/btrfs/disk-io.c     |  14 ++---
 fs/btrfs/reada.c       |  16 ++---
 fs/btrfs/scrub.c       |  26 ++++----
 fs/btrfs/transaction.c |   5 +-
 fs/btrfs/volumes.c     |  23 ++++---
 8 files changed, 87 insertions(+), 157 deletions(-)

-- 
2.18.0

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 01/11] btrfs: remove btrfs_dev_replace::read_locks
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 14:55 ` [PATCH 02/11] btrfs: open code btrfs_dev_replace_clear_lock_blocking David Sterba
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

This member seems to be copied from the extent_buffer locking scheme and
is at least used to assert that the read lock/unlock is properly nested.
In some way. While the _inc/_dec are called inside the read lock
section, the asserts are both inside and outside, so the ordering is not
guaranteed and we can see read/inc/dec ordered in any way
(theoretically).

A missing call of btrfs_dev_replace_clear_lock_blocking could cause
unexpected read_locks count, so this at least looks like a valid
assertion, but this will become unnecessary with later updates.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h       | 1 -
 fs/btrfs/dev-replace.c | 5 -----
 fs/btrfs/disk-io.c     | 1 -
 3 files changed, 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 441610de9908..31f7a58add7e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -367,7 +367,6 @@ struct btrfs_dev_replace {
 
 	struct mutex lock_finishing_cancel_unmount;
 	rwlock_t lock;
-	atomic_t read_locks;
 	atomic_t blocking_readers;
 	wait_queue_head_t read_lock_wq;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index dec01970d8c5..24529d2c723a 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -961,13 +961,10 @@ int btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace)
 void btrfs_dev_replace_read_lock(struct btrfs_dev_replace *dev_replace)
 {
 	read_lock(&dev_replace->lock);
-	atomic_inc(&dev_replace->read_locks);
 }
 
 void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace)
 {
-	ASSERT(atomic_read(&dev_replace->read_locks) > 0);
-	atomic_dec(&dev_replace->read_locks);
 	read_unlock(&dev_replace->lock);
 }
 
@@ -994,7 +991,6 @@ void btrfs_dev_replace_set_lock_blocking(
 					struct btrfs_dev_replace *dev_replace)
 {
 	/* only set blocking for read lock */
-	ASSERT(atomic_read(&dev_replace->read_locks) > 0);
 	atomic_inc(&dev_replace->blocking_readers);
 	read_unlock(&dev_replace->lock);
 }
@@ -1004,7 +1000,6 @@ void btrfs_dev_replace_clear_lock_blocking(
 					struct btrfs_dev_replace *dev_replace)
 {
 	/* only set blocking for read lock */
-	ASSERT(atomic_read(&dev_replace->read_locks) > 0);
 	ASSERT(atomic_read(&dev_replace->blocking_readers) > 0);
 	read_lock(&dev_replace->lock);
 	/* Barrier implied by atomic_dec_and_test */
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3611df2ce5c1..acd0cbc5a6b9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2155,7 +2155,6 @@ static void btrfs_init_dev_replace_locks(struct btrfs_fs_info *fs_info)
 {
 	mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
 	rwlock_init(&fs_info->dev_replace.lock);
-	atomic_set(&fs_info->dev_replace.read_locks, 0);
 	atomic_set(&fs_info->dev_replace.blocking_readers, 0);
 	init_waitqueue_head(&fs_info->replace_wait);
 	init_waitqueue_head(&fs_info->dev_replace.read_lock_wq);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 02/11] btrfs: open code btrfs_dev_replace_clear_lock_blocking
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
  2018-09-07 14:55 ` [PATCH 01/11] btrfs: remove btrfs_dev_replace::read_locks David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 14:55 ` [PATCH 03/11] btrfs: open code btrfs_dev_replace_stats_inc David Sterba
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

There's a single caller and the function name does not say it's actually
taking the lock, so open coding makes it more explicit.

For now, btrfs_dev_replace_read_lock is used instead of read_lock so
it's paired with the unlocking wrapper in the same block.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 12 ------------
 fs/btrfs/dev-replace.h |  2 --
 fs/btrfs/volumes.c     |  6 +++++-
 3 files changed, 5 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 24529d2c723a..004e8d380a11 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -995,18 +995,6 @@ void btrfs_dev_replace_set_lock_blocking(
 	read_unlock(&dev_replace->lock);
 }
 
-/* acquire read lock and dec blocking cnt */
-void btrfs_dev_replace_clear_lock_blocking(
-					struct btrfs_dev_replace *dev_replace)
-{
-	/* only set blocking for read lock */
-	ASSERT(atomic_read(&dev_replace->blocking_readers) > 0);
-	read_lock(&dev_replace->lock);
-	/* Barrier implied by atomic_dec_and_test */
-	if (atomic_dec_and_test(&dev_replace->blocking_readers))
-		cond_wake_up_nomb(&dev_replace->read_lock_wq);
-}
-
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 {
 	percpu_counter_inc(&fs_info->bio_counter);
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index b6d4206188bb..39f022c457ff 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -28,8 +28,6 @@ void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_set_lock_blocking(struct btrfs_dev_replace *dev_replace);
-void btrfs_dev_replace_clear_lock_blocking(
-					struct btrfs_dev_replace *dev_replace);
 
 static inline void btrfs_dev_replace_stats_inc(atomic64_t *stat_value)
 {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ea9845a5e601..762a4499d00a 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5904,7 +5904,11 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	}
 out:
 	if (dev_replace_is_ongoing) {
-		btrfs_dev_replace_clear_lock_blocking(dev_replace);
+		ASSERT(atomic_read(&dev_replace->blocking_readers) > 0);
+		btrfs_dev_replace_read_lock(dev_replace);
+		/* Barrier implied by atomic_dec_and_test */
+		if (atomic_dec_and_test(&dev_replace->blocking_readers))
+			cond_wake_up_nomb(&dev_replace->read_lock_wq);
 		btrfs_dev_replace_read_unlock(dev_replace);
 	}
 	free_extent_map(em);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 03/11] btrfs: open code btrfs_dev_replace_stats_inc
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
  2018-09-07 14:55 ` [PATCH 01/11] btrfs: remove btrfs_dev_replace::read_locks David Sterba
  2018-09-07 14:55 ` [PATCH 02/11] btrfs: open code btrfs_dev_replace_clear_lock_blocking David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 21:02   ` Omar Sandoval
  2018-09-07 14:55 ` [PATCH 04/11] btrfs: open code btrfs_after_dev_replace_commit David Sterba
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The wrapper is too trivial, open coding does not make it less readable.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.h |  5 -----
 fs/btrfs/scrub.c       | 11 ++++-------
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 39f022c457ff..1c91bf9ef39a 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -29,9 +29,4 @@ void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_set_lock_blocking(struct btrfs_dev_replace *dev_replace);
 
-static inline void btrfs_dev_replace_stats_inc(atomic64_t *stat_value)
-{
-	atomic64_inc(stat_value);
-}
-
 #endif
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 4bcc275f7612..902819d3cf41 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1124,7 +1124,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 
 			if (scrub_write_page_to_dev_replace(sblock_other,
 							    page_num) != 0) {
-				btrfs_dev_replace_stats_inc(
+				atomic64_inc(
 					&fs_info->dev_replace.num_write_errors);
 				success = 0;
 			}
@@ -1564,8 +1564,7 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
 		if (btrfsic_submit_bio_wait(bio)) {
 			btrfs_dev_stat_inc_and_print(page_bad->dev,
 				BTRFS_DEV_STAT_WRITE_ERRS);
-			btrfs_dev_replace_stats_inc(
-				&fs_info->dev_replace.num_write_errors);
+			atomic64_inc(&fs_info->dev_replace.num_write_errors);
 			bio_put(bio);
 			return -EIO;
 		}
@@ -1592,8 +1591,7 @@ static void scrub_write_block_to_dev_replace(struct scrub_block *sblock)
 
 		ret = scrub_write_page_to_dev_replace(sblock, page_num);
 		if (ret)
-			btrfs_dev_replace_stats_inc(
-				&fs_info->dev_replace.num_write_errors);
+			atomic64_inc(&fs_info->dev_replace.num_write_errors);
 	}
 }
 
@@ -1726,8 +1724,7 @@ static void scrub_wr_bio_end_io_worker(struct btrfs_work *work)
 			struct scrub_page *spage = sbio->pagev[i];
 
 			spage->io_error = 1;
-			btrfs_dev_replace_stats_inc(&dev_replace->
-						    num_write_errors);
+			atomic64_inc(&dev_replace->num_write_errors);
 		}
 	}
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 04/11] btrfs: open code btrfs_after_dev_replace_commit
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (2 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 03/11] btrfs: open code btrfs_dev_replace_stats_inc David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 21:03   ` Omar Sandoval
  2018-09-07 14:55 ` [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path David Sterba
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Too trivial, the purpose can be simply documented in a comment.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 8 --------
 fs/btrfs/dev-replace.h | 1 -
 fs/btrfs/transaction.c | 5 ++++-
 3 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 004e8d380a11..8b512dddf727 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -382,14 +382,6 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
 	return ret;
 }
 
-void btrfs_after_dev_replace_commit(struct btrfs_fs_info *fs_info)
-{
-	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
-
-	dev_replace->committed_cursor_left =
-		dev_replace->cursor_left_last_write_of_item;
-}
-
 static char* btrfs_dev_name(struct btrfs_device *device)
 {
 	if (!device || test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 1c91bf9ef39a..795c551f5b5e 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -11,7 +11,6 @@ struct btrfs_ioctl_dev_replace_args;
 int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info);
 int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
 			  struct btrfs_fs_info *fs_info);
-void btrfs_after_dev_replace_commit(struct btrfs_fs_info *fs_info);
 int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
 			    struct btrfs_ioctl_dev_replace_args *args);
 int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e7856e15adbf..c0c75b5ae12a 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1197,7 +1197,10 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
 
 	list_add_tail(&fs_info->extent_root->dirty_list,
 		      &trans->transaction->switch_commits);
-	btrfs_after_dev_replace_commit(fs_info);
+
+	/* Update dev-replace pointer once everything is committed */
+	fs_info->dev_replace.committed_cursor_left =
+		fs_info->dev_replace.cursor_left_last_write_of_item;
 
 	return 0;
 }
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (3 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 04/11] btrfs: open code btrfs_after_dev_replace_commit David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 21:06   ` Omar Sandoval
  2018-09-14 16:53   ` David Sterba
  2018-09-07 14:55 ` [PATCH 06/11] btrfs: dev-replace: move replace members out of fs_info David Sterba
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The exit sequence in btrfs_dev_replace_start does not allow to simply
add a label to the right place so the error handling after starting
transaction failure jumps there. Currently there's a lock that pairs
with the unlock in the section, which is unnecessary and only raises
questions.  Add a variable to track the locking status and avoid the
extra locking.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 8b512dddf727..0a49843b2ee6 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -400,6 +400,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	int ret;
 	struct btrfs_device *tgt_device = NULL;
 	struct btrfs_device *src_device = NULL;
+	bool need_unlock;
 
 	ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
 					    srcdev_name, &src_device);
@@ -424,6 +425,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 		return PTR_ERR(trans);
 	}
 
+	need_unlock = true;
 	btrfs_dev_replace_write_lock(dev_replace);
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
@@ -462,6 +464,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	atomic64_set(&dev_replace->num_write_errors, 0);
 	atomic64_set(&dev_replace->num_uncorrectable_read_errors, 0);
 	btrfs_dev_replace_write_unlock(dev_replace);
+	need_unlock = false;
 
 	ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
 	if (ret)
@@ -473,7 +476,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
-		btrfs_dev_replace_write_lock(dev_replace);
 		goto leave;
 	}
 
@@ -497,7 +499,8 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 leave:
 	dev_replace->srcdev = NULL;
 	dev_replace->tgtdev = NULL;
-	btrfs_dev_replace_write_unlock(dev_replace);
+	if (need_unlock)
+		btrfs_dev_replace_write_unlock(dev_replace);
 	btrfs_destroy_dev_replace_tgtdev(tgt_device);
 	return ret;
 }
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 06/11] btrfs: dev-replace: move replace members out of fs_info
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (4 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 21:09   ` Omar Sandoval
  2018-09-07 14:55 ` [PATCH 07/11] btrfs: dev-replace: remove pointless assert in write unlock David Sterba
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The replace_wait and bio_counter were mistakenly added to fs_info in
commit c404e0dc2c843b154f ("Btrfs: fix use-after-free in the finishing
procedure of the device replace"), but they logically belong to
fs_info::dev_replace. Besides, bio_counter is a very generic name and is
confusing in bare fs_info context.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h       |  6 +++---
 fs/btrfs/dev-replace.c | 16 ++++++++--------
 fs/btrfs/disk-io.c     |  9 +++++----
 3 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 31f7a58add7e..89fc90eafb0a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -371,6 +371,9 @@ struct btrfs_dev_replace {
 	wait_queue_head_t read_lock_wq;
 
 	struct btrfs_scrub_progress scrub_progress;
+
+	struct percpu_counter bio_counter;
+	wait_queue_head_t replace_wait;
 };
 
 /* For raid type sysfs entries */
@@ -1093,9 +1096,6 @@ struct btrfs_fs_info {
 	/* device replace state */
 	struct btrfs_dev_replace dev_replace;
 
-	struct percpu_counter bio_counter;
-	wait_queue_head_t replace_wait;
-
 	struct semaphore uuid_tree_rescan_sem;
 
 	/* Used to reclaim the metadata space in the background. */
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 0a49843b2ee6..c0e9b2c229d5 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -540,8 +540,8 @@ int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
 static void btrfs_rm_dev_replace_blocked(struct btrfs_fs_info *fs_info)
 {
 	set_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state);
-	wait_event(fs_info->replace_wait, !percpu_counter_sum(
-		   &fs_info->bio_counter));
+	wait_event(fs_info->dev_replace.replace_wait, !percpu_counter_sum(
+		   &fs_info->dev_replace.bio_counter));
 }
 
 /*
@@ -550,7 +550,7 @@ static void btrfs_rm_dev_replace_blocked(struct btrfs_fs_info *fs_info)
 static void btrfs_rm_dev_replace_unblocked(struct btrfs_fs_info *fs_info)
 {
 	clear_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state);
-	wake_up(&fs_info->replace_wait);
+	wake_up(&fs_info->dev_replace.replace_wait);
 }
 
 static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
@@ -992,25 +992,25 @@ void btrfs_dev_replace_set_lock_blocking(
 
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 {
-	percpu_counter_inc(&fs_info->bio_counter);
+	percpu_counter_inc(&fs_info->dev_replace.bio_counter);
 }
 
 void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
 {
-	percpu_counter_sub(&fs_info->bio_counter, amount);
-	cond_wake_up_nomb(&fs_info->replace_wait);
+	percpu_counter_sub(&fs_info->dev_replace.bio_counter, amount);
+	cond_wake_up_nomb(&fs_info->dev_replace.replace_wait);
 }
 
 void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
 {
 	while (1) {
-		percpu_counter_inc(&fs_info->bio_counter);
+		percpu_counter_inc(&fs_info->dev_replace.bio_counter);
 		if (likely(!test_bit(BTRFS_FS_STATE_DEV_REPLACING,
 				     &fs_info->fs_state)))
 			break;
 
 		btrfs_bio_counter_dec(fs_info);
-		wait_event(fs_info->replace_wait,
+		wait_event(fs_info->dev_replace.replace_wait,
 			   !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
 				     &fs_info->fs_state));
 	}
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index acd0cbc5a6b9..3b3906589d12 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2156,7 +2156,7 @@ static void btrfs_init_dev_replace_locks(struct btrfs_fs_info *fs_info)
 	mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
 	rwlock_init(&fs_info->dev_replace.lock);
 	atomic_set(&fs_info->dev_replace.blocking_readers, 0);
-	init_waitqueue_head(&fs_info->replace_wait);
+	init_waitqueue_head(&fs_info->dev_replace.replace_wait);
 	init_waitqueue_head(&fs_info->dev_replace.read_lock_wq);
 }
 
@@ -2646,7 +2646,8 @@ int open_ctree(struct super_block *sb,
 		goto fail_dirty_metadata_bytes;
 	}
 
-	ret = percpu_counter_init(&fs_info->bio_counter, 0, GFP_KERNEL);
+	ret = percpu_counter_init(&fs_info->dev_replace.bio_counter, 0,
+			GFP_KERNEL);
 	if (ret) {
 		err = ret;
 		goto fail_delalloc_bytes;
@@ -3307,7 +3308,7 @@ int open_ctree(struct super_block *sb,
 
 	iput(fs_info->btree_inode);
 fail_bio_counter:
-	percpu_counter_destroy(&fs_info->bio_counter);
+	percpu_counter_destroy(&fs_info->dev_replace.bio_counter);
 fail_delalloc_bytes:
 	percpu_counter_destroy(&fs_info->delalloc_bytes);
 fail_dirty_metadata_bytes:
@@ -4016,7 +4017,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
 
 	percpu_counter_destroy(&fs_info->dirty_metadata_bytes);
 	percpu_counter_destroy(&fs_info->delalloc_bytes);
-	percpu_counter_destroy(&fs_info->bio_counter);
+	percpu_counter_destroy(&fs_info->dev_replace.bio_counter);
 	cleanup_srcu_struct(&fs_info->subvol_srcu);
 
 	btrfs_free_stripe_hash_table(fs_info);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 07/11] btrfs: dev-replace: remove pointless assert in write unlock
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (5 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 06/11] btrfs: dev-replace: move replace members out of fs_info David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 14:55 ` [PATCH 08/11] btrfs: reada: reorder dev-replace locks before radix tree preload David Sterba
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The value of blocking_readers is increased only when the lock is taken
for read, no way we can fail the condition with the write lock.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index c0e9b2c229d5..ba1cbedaa422 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -977,7 +977,6 @@ void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace)
 
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace)
 {
-	ASSERT(atomic_read(&dev_replace->blocking_readers) == 0);
 	write_unlock(&dev_replace->lock);
 }
 
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 08/11] btrfs: reada: reorder dev-replace locks before radix tree preload
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (6 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 07/11] btrfs: dev-replace: remove pointless assert in write unlock David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 14:55 ` [PATCH 09/11] btrfs: dev-replace: swich locking to rw semaphore David Sterba
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The read lock is going to use rw semaphore that might sleep, this is not
possible in the radix tree preload section. The lock nesting is now:

* device replace
  * radix tree preload
    * readahead spinlock

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/reada.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index dec14b739b10..6c4fdd98582d 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -376,26 +376,28 @@ static struct reada_extent *reada_find_extent(struct btrfs_fs_info *fs_info,
 		goto error;
 	}
 
+	/* insert extent in reada_tree + all per-device trees, all or nothing */
+	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
 	ret = radix_tree_preload(GFP_KERNEL);
-	if (ret)
+	if (ret) {
+		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 		goto error;
+	}
 
-	/* insert extent in reada_tree + all per-device trees, all or nothing */
-	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
 	spin_lock(&fs_info->reada_lock);
 	ret = radix_tree_insert(&fs_info->reada_tree, index, re);
 	if (ret == -EEXIST) {
 		re_exist = radix_tree_lookup(&fs_info->reada_tree, index);
 		re_exist->refcnt++;
 		spin_unlock(&fs_info->reada_lock);
-		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 		radix_tree_preload_end();
+		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 		goto error;
 	}
 	if (ret) {
 		spin_unlock(&fs_info->reada_lock);
-		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 		radix_tree_preload_end();
+		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
 		goto error;
 	}
 	radix_tree_preload_end();
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 09/11] btrfs: dev-replace: swich locking to rw semaphore
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (7 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 08/11] btrfs: reada: reorder dev-replace locks before radix tree preload David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 14:55 ` [PATCH 10/11] btrfs: dev-replace: remove custom read/write blocking scheme David Sterba
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The this is first part of removing the custom locking and waiting scheme
used for device replace. It was probably copied from extent buffer
locking, but there's nothing that would require more than is provided by
the common locking primitives.

The rw spinlock protects waiting tasks counter in case of incompatible
locks and the waitqueue. Same as rw semaphore.

This patch only switches the locking primitive, for better
bisectability.  There should be no functional change other than the
overhead of the locking and potential sleeping instead of spinning when
the lock is contended.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/dev-replace.c | 12 ++++++------
 fs/btrfs/disk-io.c     |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 89fc90eafb0a..02b5372a42d7 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -366,7 +366,7 @@ struct btrfs_dev_replace {
 	struct btrfs_device *tgtdev;
 
 	struct mutex lock_finishing_cancel_unmount;
-	rwlock_t lock;
+	struct rw_semaphore rwsem;
 	atomic_t blocking_readers;
 	wait_queue_head_t read_lock_wq;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index ba1cbedaa422..ddaa9dcf8284 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -955,12 +955,12 @@ int btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace)
 
 void btrfs_dev_replace_read_lock(struct btrfs_dev_replace *dev_replace)
 {
-	read_lock(&dev_replace->lock);
+	down_read(&dev_replace->rwsem);
 }
 
 void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace)
 {
-	read_unlock(&dev_replace->lock);
+	up_read(&dev_replace->rwsem);
 }
 
 void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace)
@@ -968,16 +968,16 @@ void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace)
 again:
 	wait_event(dev_replace->read_lock_wq,
 		   atomic_read(&dev_replace->blocking_readers) == 0);
-	write_lock(&dev_replace->lock);
+	down_write(&dev_replace->rwsem);
 	if (atomic_read(&dev_replace->blocking_readers)) {
-		write_unlock(&dev_replace->lock);
+		up_write(&dev_replace->rwsem);
 		goto again;
 	}
 }
 
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace)
 {
-	write_unlock(&dev_replace->lock);
+	up_write(&dev_replace->rwsem);
 }
 
 /* inc blocking cnt and release read lock */
@@ -986,7 +986,7 @@ void btrfs_dev_replace_set_lock_blocking(
 {
 	/* only set blocking for read lock */
 	atomic_inc(&dev_replace->blocking_readers);
-	read_unlock(&dev_replace->lock);
+	up_read(&dev_replace->rwsem);
 }
 
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3b3906589d12..3f73f2a3d67b 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2154,7 +2154,7 @@ static void btrfs_init_btree_inode(struct btrfs_fs_info *fs_info)
 static void btrfs_init_dev_replace_locks(struct btrfs_fs_info *fs_info)
 {
 	mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
-	rwlock_init(&fs_info->dev_replace.lock);
+	init_rwsem(&fs_info->dev_replace.rwsem);
 	atomic_set(&fs_info->dev_replace.blocking_readers, 0);
 	init_waitqueue_head(&fs_info->dev_replace.replace_wait);
 	init_waitqueue_head(&fs_info->dev_replace.read_lock_wq);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 10/11] btrfs: dev-replace: remove custom read/write blocking scheme
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (8 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 09/11] btrfs: dev-replace: swich locking to rw semaphore David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-07 14:55 ` [PATCH 11/11] btrfs: dev-replace: open code trivial locking helpers David Sterba
  2018-09-27 11:06 ` [PATCH 00/11] Cleanup dev-replace locking David Sterba
  11 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

After the rw semaphore has been added, the custom blocking using
::blocking_readers and ::read_lock_wq is redundant.

The blocking logic in __btrfs_map_block is replaced by extending the
time the semaphore is held, that has the same blocking effect on writes
as the previous custom scheme that waited until ::blocking_readers was
zero.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h       |  2 --
 fs/btrfs/dev-replace.c | 16 ----------------
 fs/btrfs/dev-replace.h |  1 -
 fs/btrfs/disk-io.c     |  2 --
 fs/btrfs/volumes.c     | 13 ++++++-------
 5 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 02b5372a42d7..2a255d5897f8 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -367,8 +367,6 @@ struct btrfs_dev_replace {
 
 	struct mutex lock_finishing_cancel_unmount;
 	struct rw_semaphore rwsem;
-	atomic_t blocking_readers;
-	wait_queue_head_t read_lock_wq;
 
 	struct btrfs_scrub_progress scrub_progress;
 
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index ddaa9dcf8284..e8c2f418b2d7 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -965,14 +965,7 @@ void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace)
 
 void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace)
 {
-again:
-	wait_event(dev_replace->read_lock_wq,
-		   atomic_read(&dev_replace->blocking_readers) == 0);
 	down_write(&dev_replace->rwsem);
-	if (atomic_read(&dev_replace->blocking_readers)) {
-		up_write(&dev_replace->rwsem);
-		goto again;
-	}
 }
 
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace)
@@ -980,15 +973,6 @@ void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace)
 	up_write(&dev_replace->rwsem);
 }
 
-/* inc blocking cnt and release read lock */
-void btrfs_dev_replace_set_lock_blocking(
-					struct btrfs_dev_replace *dev_replace)
-{
-	/* only set blocking for read lock */
-	atomic_inc(&dev_replace->blocking_readers);
-	up_read(&dev_replace->rwsem);
-}
-
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 {
 	percpu_counter_inc(&fs_info->dev_replace.bio_counter);
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index 795c551f5b5e..eec7289c2618 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -26,6 +26,5 @@ void btrfs_dev_replace_read_lock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace);
 void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace);
-void btrfs_dev_replace_set_lock_blocking(struct btrfs_dev_replace *dev_replace);
 
 #endif
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3f73f2a3d67b..520b1d6f615f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2155,9 +2155,7 @@ static void btrfs_init_dev_replace_locks(struct btrfs_fs_info *fs_info)
 {
 	mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
 	init_rwsem(&fs_info->dev_replace.rwsem);
-	atomic_set(&fs_info->dev_replace.blocking_readers, 0);
 	init_waitqueue_head(&fs_info->dev_replace.replace_wait);
-	init_waitqueue_head(&fs_info->dev_replace.read_lock_wq);
 }
 
 static void btrfs_init_qgroup(struct btrfs_fs_info *fs_info)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 762a4499d00a..d1f83529ac80 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5706,10 +5706,12 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 
 	btrfs_dev_replace_read_lock(dev_replace);
 	dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
+	/*
+	 * Hold the semaphore for read during the whole operation, write is
+	 * requested at commit time but must wait.
+	 */
 	if (!dev_replace_is_ongoing)
 		btrfs_dev_replace_read_unlock(dev_replace);
-	else
-		btrfs_dev_replace_set_lock_blocking(dev_replace);
 
 	if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
 	    !need_full_stripe(op) && dev_replace->tgtdev != NULL) {
@@ -5904,11 +5906,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	}
 out:
 	if (dev_replace_is_ongoing) {
-		ASSERT(atomic_read(&dev_replace->blocking_readers) > 0);
-		btrfs_dev_replace_read_lock(dev_replace);
-		/* Barrier implied by atomic_dec_and_test */
-		if (atomic_dec_and_test(&dev_replace->blocking_readers))
-			cond_wake_up_nomb(&dev_replace->read_lock_wq);
+		lockdep_assert_held(&dev_replace->rwsem);
+		/* Unlock and let waiting writers proceed */
 		btrfs_dev_replace_read_unlock(dev_replace);
 	}
 	free_extent_map(em);
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 11/11] btrfs: dev-replace: open code trivial locking helpers
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (9 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 10/11] btrfs: dev-replace: remove custom read/write blocking scheme David Sterba
@ 2018-09-07 14:55 ` David Sterba
  2018-09-27 11:06 ` [PATCH 00/11] Cleanup dev-replace locking David Sterba
  11 siblings, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-07 14:55 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The dev-replace locking functions are now trivial wrappers around rw
semaphore that can be used directly everywhere. No functional change.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/dev-replace.c | 73 ++++++++++++++++--------------------------
 fs/btrfs/dev-replace.h |  4 ---
 fs/btrfs/reada.c       | 12 +++----
 fs/btrfs/scrub.c       | 15 +++++----
 fs/btrfs/volumes.c     | 14 ++++----
 5 files changed, 48 insertions(+), 70 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index e8c2f418b2d7..d61c225b3e83 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -285,13 +285,13 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
 	struct btrfs_dev_replace_item *ptr;
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 
-	btrfs_dev_replace_read_lock(dev_replace);
+	down_read(&dev_replace->rwsem);
 	if (!dev_replace->is_valid ||
 	    !dev_replace->item_needs_writeback) {
-		btrfs_dev_replace_read_unlock(dev_replace);
+		up_read(&dev_replace->rwsem);
 		return 0;
 	}
-	btrfs_dev_replace_read_unlock(dev_replace);
+	up_read(&dev_replace->rwsem);
 
 	key.objectid = 0;
 	key.type = BTRFS_DEV_REPLACE_KEY;
@@ -349,7 +349,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
 	ptr = btrfs_item_ptr(eb, path->slots[0],
 			     struct btrfs_dev_replace_item);
 
-	btrfs_dev_replace_write_lock(dev_replace);
+	down_write(&dev_replace->rwsem);
 	if (dev_replace->srcdev)
 		btrfs_set_dev_replace_src_devid(eb, ptr,
 			dev_replace->srcdev->devid);
@@ -372,7 +372,7 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
 	btrfs_set_dev_replace_cursor_right(eb, ptr,
 		dev_replace->cursor_right);
 	dev_replace->item_needs_writeback = 0;
-	btrfs_dev_replace_write_unlock(dev_replace);
+	up_write(&dev_replace->rwsem);
 
 	btrfs_mark_buffer_dirty(eb);
 
@@ -426,7 +426,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	}
 
 	need_unlock = true;
-	btrfs_dev_replace_write_lock(dev_replace);
+	down_write(&dev_replace->rwsem);
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
@@ -463,7 +463,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	dev_replace->item_needs_writeback = 1;
 	atomic64_set(&dev_replace->num_write_errors, 0);
 	atomic64_set(&dev_replace->num_uncorrectable_read_errors, 0);
-	btrfs_dev_replace_write_unlock(dev_replace);
+	up_write(&dev_replace->rwsem);
 	need_unlock = false;
 
 	ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
@@ -500,7 +500,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
 	dev_replace->srcdev = NULL;
 	dev_replace->tgtdev = NULL;
 	if (need_unlock)
-		btrfs_dev_replace_write_unlock(dev_replace);
+		up_write(&dev_replace->rwsem);
 	btrfs_destroy_dev_replace_tgtdev(tgt_device);
 	return ret;
 }
@@ -567,18 +567,18 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	/* don't allow cancel or unmount to disturb the finishing procedure */
 	mutex_lock(&dev_replace->lock_finishing_cancel_unmount);
 
-	btrfs_dev_replace_read_lock(dev_replace);
+	down_read(&dev_replace->rwsem);
 	/* was the operation canceled, or is it finished? */
 	if (dev_replace->replace_state !=
 	    BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED) {
-		btrfs_dev_replace_read_unlock(dev_replace);
+		up_read(&dev_replace->rwsem);
 		mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
 		return 0;
 	}
 
 	tgt_device = dev_replace->tgtdev;
 	src_device = dev_replace->srcdev;
-	btrfs_dev_replace_read_unlock(dev_replace);
+	up_read(&dev_replace->rwsem);
 
 	/*
 	 * flush all outstanding I/O and inode extent mappings before the
@@ -602,7 +602,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	/* keep away write_all_supers() during the finishing procedure */
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 	mutex_lock(&fs_info->chunk_mutex);
-	btrfs_dev_replace_write_lock(dev_replace);
+	down_write(&dev_replace->rwsem);
 	dev_replace->replace_state =
 		scrub_ret ? BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED
 			  : BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED;
@@ -622,7 +622,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 				 btrfs_dev_name(src_device),
 				 src_device->devid,
 				 rcu_str_deref(tgt_device->name), scrub_ret);
-		btrfs_dev_replace_write_unlock(dev_replace);
+		up_write(&dev_replace->rwsem);
 		mutex_unlock(&fs_info->chunk_mutex);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 		btrfs_rm_dev_replace_blocked(fs_info);
@@ -658,8 +658,7 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list);
 	fs_info->fs_devices->rw_devices++;
 
-	btrfs_dev_replace_write_unlock(dev_replace);
-
+	up_write(&dev_replace->rwsem);
 	btrfs_rm_dev_replace_blocked(fs_info);
 
 	btrfs_rm_dev_replace_remove_srcdev(src_device);
@@ -756,7 +755,7 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info,
 {
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 
-	btrfs_dev_replace_read_lock(dev_replace);
+	down_read(&dev_replace->rwsem);
 	/* even if !dev_replace_is_valid, the values are good enough for
 	 * the replace_status ioctl */
 	args->result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NO_ERROR;
@@ -768,7 +767,7 @@ void btrfs_dev_replace_status(struct btrfs_fs_info *fs_info,
 	args->status.num_uncorrectable_read_errors =
 		atomic64_read(&dev_replace->num_uncorrectable_read_errors);
 	args->status.progress_1000 = btrfs_dev_replace_progress(fs_info);
-	btrfs_dev_replace_read_unlock(dev_replace);
+	up_read(&dev_replace->rwsem);
 }
 
 int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
@@ -785,13 +784,13 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 		return -EROFS;
 
 	mutex_lock(&dev_replace->lock_finishing_cancel_unmount);
-	btrfs_dev_replace_write_lock(dev_replace);
+	down_write(&dev_replace->rwsem);
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
 		result = BTRFS_IOCTL_DEV_REPLACE_RESULT_NOT_STARTED;
-		btrfs_dev_replace_write_unlock(dev_replace);
+		up_write(&dev_replace->rwsem);
 		goto leave;
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_SUSPENDED:
@@ -805,7 +804,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info)
 	dev_replace->replace_state = BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED;
 	dev_replace->time_stopped = ktime_get_real_seconds();
 	dev_replace->item_needs_writeback = 1;
-	btrfs_dev_replace_write_unlock(dev_replace);
+	up_write(&dev_replace->rwsem);
 	btrfs_scrub_cancel(fs_info);
 
 	trans = btrfs_start_transaction(root, 0);
@@ -834,7 +833,8 @@ void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info)
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 
 	mutex_lock(&dev_replace->lock_finishing_cancel_unmount);
-	btrfs_dev_replace_write_lock(dev_replace);
+	down_write(&dev_replace->rwsem);
+
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
@@ -850,7 +850,7 @@ void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info)
 		break;
 	}
 
-	btrfs_dev_replace_write_unlock(dev_replace);
+	up_write(&dev_replace->rwsem);
 	mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
 }
 
@@ -860,12 +860,13 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info)
 	struct task_struct *task;
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 
-	btrfs_dev_replace_write_lock(dev_replace);
+	down_write(&dev_replace->rwsem);
+
 	switch (dev_replace->replace_state) {
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_FINISHED:
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_CANCELED:
-		btrfs_dev_replace_write_unlock(dev_replace);
+		up_write(&dev_replace->rwsem);
 		return 0;
 	case BTRFS_IOCTL_DEV_REPLACE_STATE_STARTED:
 		break;
@@ -879,10 +880,10 @@ int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info)
 			   "cannot continue dev_replace, tgtdev is missing");
 		btrfs_info(fs_info,
 			   "you may cancel the operation after 'mount -o degraded'");
-		btrfs_dev_replace_write_unlock(dev_replace);
+		up_write(&dev_replace->rwsem);
 		return 0;
 	}
-	btrfs_dev_replace_write_unlock(dev_replace);
+	up_write(&dev_replace->rwsem);
 
 	/*
 	 * This could collide with a paused balance, but the exclusive op logic
@@ -953,26 +954,6 @@ int btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace)
 	return 1;
 }
 
-void btrfs_dev_replace_read_lock(struct btrfs_dev_replace *dev_replace)
-{
-	down_read(&dev_replace->rwsem);
-}
-
-void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace)
-{
-	up_read(&dev_replace->rwsem);
-}
-
-void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace)
-{
-	down_write(&dev_replace->rwsem);
-}
-
-void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace)
-{
-	up_write(&dev_replace->rwsem);
-}
-
 void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
 {
 	percpu_counter_inc(&fs_info->dev_replace.bio_counter);
diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
index eec7289c2618..f81cb16e8e56 100644
--- a/fs/btrfs/dev-replace.h
+++ b/fs/btrfs/dev-replace.h
@@ -22,9 +22,5 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info *fs_info);
 void btrfs_dev_replace_suspend_for_unmount(struct btrfs_fs_info *fs_info);
 int btrfs_resume_dev_replace_async(struct btrfs_fs_info *fs_info);
 int btrfs_dev_replace_is_ongoing(struct btrfs_dev_replace *dev_replace);
-void btrfs_dev_replace_read_lock(struct btrfs_dev_replace *dev_replace);
-void btrfs_dev_replace_read_unlock(struct btrfs_dev_replace *dev_replace);
-void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace);
-void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace);
 
 #endif
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 6c4fdd98582d..96002e6e2346 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -377,10 +377,10 @@ static struct reada_extent *reada_find_extent(struct btrfs_fs_info *fs_info,
 	}
 
 	/* insert extent in reada_tree + all per-device trees, all or nothing */
-	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
+	down_read(&fs_info->dev_replace.rwsem);
 	ret = radix_tree_preload(GFP_KERNEL);
 	if (ret) {
-		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+		up_read(&fs_info->dev_replace.rwsem);
 		goto error;
 	}
 
@@ -391,13 +391,13 @@ static struct reada_extent *reada_find_extent(struct btrfs_fs_info *fs_info,
 		re_exist->refcnt++;
 		spin_unlock(&fs_info->reada_lock);
 		radix_tree_preload_end();
-		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+		up_read(&fs_info->dev_replace.rwsem);
 		goto error;
 	}
 	if (ret) {
 		spin_unlock(&fs_info->reada_lock);
 		radix_tree_preload_end();
-		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+		up_read(&fs_info->dev_replace.rwsem);
 		goto error;
 	}
 	radix_tree_preload_end();
@@ -439,13 +439,13 @@ static struct reada_extent *reada_find_extent(struct btrfs_fs_info *fs_info,
 			}
 			radix_tree_delete(&fs_info->reada_tree, index);
 			spin_unlock(&fs_info->reada_lock);
-			btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+			up_read(&fs_info->dev_replace.rwsem);
 			goto error;
 		}
 		have_zone = 1;
 	}
 	spin_unlock(&fs_info->reada_lock);
-	btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+	up_read(&fs_info->dev_replace.rwsem);
 
 	if (!have_zone)
 		goto error;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 902819d3cf41..4936cc65dca0 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3596,11 +3596,12 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 			break;
 		}
 
-		btrfs_dev_replace_write_lock(&fs_info->dev_replace);
+		down_write(&fs_info->dev_replace.rwsem);
 		dev_replace->cursor_right = found_key.offset + length;
 		dev_replace->cursor_left = found_key.offset;
 		dev_replace->item_needs_writeback = 1;
-		btrfs_dev_replace_write_unlock(&fs_info->dev_replace);
+		up_write(&dev_replace->rwsem);
+
 		ret = scrub_chunk(sctx, scrub_dev, chunk_offset, length,
 				  found_key.offset, cache);
 
@@ -3636,10 +3637,10 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 
 		scrub_pause_off(fs_info);
 
-		btrfs_dev_replace_write_lock(&fs_info->dev_replace);
+		down_write(&fs_info->dev_replace.rwsem);
 		dev_replace->cursor_left = dev_replace->cursor_right;
 		dev_replace->item_needs_writeback = 1;
-		btrfs_dev_replace_write_unlock(&fs_info->dev_replace);
+		up_write(&fs_info->dev_replace.rwsem);
 
 		if (ro_set)
 			btrfs_dec_block_group_ro(cache);
@@ -3838,16 +3839,16 @@ int btrfs_scrub_dev(struct btrfs_fs_info *fs_info, u64 devid, u64 start,
 		return -EIO;
 	}
 
-	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
+	down_read(&fs_info->dev_replace.rwsem);
 	if (dev->scrub_ctx ||
 	    (!is_dev_replace &&
 	     btrfs_dev_replace_is_ongoing(&fs_info->dev_replace))) {
-		btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+		up_read(&fs_info->dev_replace.rwsem);
 		mutex_unlock(&fs_info->scrub_lock);
 		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 		return -EINPROGRESS;
 	}
-	btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+	up_read(&fs_info->dev_replace.rwsem);
 
 	ret = scrub_workers_get(fs_info, is_dev_replace);
 	if (ret) {
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d1f83529ac80..bde2992c3713 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1862,12 +1862,12 @@ static u64 btrfs_num_devices(struct btrfs_fs_info *fs_info)
 {
 	u64 num_devices = fs_info->fs_devices->num_devices;
 
-	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
+	down_read(&fs_info->dev_replace.rwsem);
 	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace)) {
 		ASSERT(num_devices > 1);
 		num_devices--;
 	}
-	btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+	up_read(&fs_info->dev_replace.rwsem);
 
 	return num_devices;
 }
@@ -5122,11 +5122,11 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 		ret = 1;
 	free_extent_map(em);
 
-	btrfs_dev_replace_read_lock(&fs_info->dev_replace);
+	down_read(&fs_info->dev_replace.rwsem);
 	if (btrfs_dev_replace_is_ongoing(&fs_info->dev_replace) &&
 	    fs_info->dev_replace.tgtdev)
 		ret++;
-	btrfs_dev_replace_read_unlock(&fs_info->dev_replace);
+	up_read(&fs_info->dev_replace.rwsem);
 
 	return ret;
 }
@@ -5704,14 +5704,14 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	if (!bbio_ret)
 		goto out;
 
-	btrfs_dev_replace_read_lock(dev_replace);
+	down_read(&dev_replace->rwsem);
 	dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace);
 	/*
 	 * Hold the semaphore for read during the whole operation, write is
 	 * requested at commit time but must wait.
 	 */
 	if (!dev_replace_is_ongoing)
-		btrfs_dev_replace_read_unlock(dev_replace);
+		up_read(&dev_replace->rwsem);
 
 	if (dev_replace_is_ongoing && mirror_num == map->num_stripes + 1 &&
 	    !need_full_stripe(op) && dev_replace->tgtdev != NULL) {
@@ -5908,7 +5908,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 	if (dev_replace_is_ongoing) {
 		lockdep_assert_held(&dev_replace->rwsem);
 		/* Unlock and let waiting writers proceed */
-		btrfs_dev_replace_read_unlock(dev_replace);
+		up_read(&dev_replace->rwsem);
 	}
 	free_extent_map(em);
 	return ret;
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 03/11] btrfs: open code btrfs_dev_replace_stats_inc
  2018-09-07 14:55 ` [PATCH 03/11] btrfs: open code btrfs_dev_replace_stats_inc David Sterba
@ 2018-09-07 21:02   ` Omar Sandoval
  0 siblings, 0 replies; 21+ messages in thread
From: Omar Sandoval @ 2018-09-07 21:02 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Fri, Sep 07, 2018 at 04:55:06PM +0200, David Sterba wrote:
> The wrapper is too trivial, open coding does not make it less readable.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/dev-replace.h |  5 -----
>  fs/btrfs/scrub.c       | 11 ++++-------
>  2 files changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
> index 39f022c457ff..1c91bf9ef39a 100644
> --- a/fs/btrfs/dev-replace.h
> +++ b/fs/btrfs/dev-replace.h
> @@ -29,9 +29,4 @@ void btrfs_dev_replace_write_lock(struct btrfs_dev_replace *dev_replace);
>  void btrfs_dev_replace_write_unlock(struct btrfs_dev_replace *dev_replace);
>  void btrfs_dev_replace_set_lock_blocking(struct btrfs_dev_replace *dev_replace);
>  
> -static inline void btrfs_dev_replace_stats_inc(atomic64_t *stat_value)
> -{
> -	atomic64_inc(stat_value);
> -}
> -
>  #endif
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 4bcc275f7612..902819d3cf41 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1124,7 +1124,7 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
>  
>  			if (scrub_write_page_to_dev_replace(sblock_other,
>  							    page_num) != 0) {
> -				btrfs_dev_replace_stats_inc(
> +				atomic64_inc(
>  					&fs_info->dev_replace.num_write_errors);
>  				success = 0;
>  			}
> @@ -1564,8 +1564,7 @@ static int scrub_repair_page_from_good_copy(struct scrub_block *sblock_bad,
>  		if (btrfsic_submit_bio_wait(bio)) {
>  			btrfs_dev_stat_inc_and_print(page_bad->dev,
>  				BTRFS_DEV_STAT_WRITE_ERRS);
> -			btrfs_dev_replace_stats_inc(
> -				&fs_info->dev_replace.num_write_errors);
> +			atomic64_inc(&fs_info->dev_replace.num_write_errors);
>  			bio_put(bio);
>  			return -EIO;
>  		}
> @@ -1592,8 +1591,7 @@ static void scrub_write_block_to_dev_replace(struct scrub_block *sblock)
>  
>  		ret = scrub_write_page_to_dev_replace(sblock, page_num);
>  		if (ret)
> -			btrfs_dev_replace_stats_inc(
> -				&fs_info->dev_replace.num_write_errors);
> +			atomic64_inc(&fs_info->dev_replace.num_write_errors);
>  	}
>  }
>  
> @@ -1726,8 +1724,7 @@ static void scrub_wr_bio_end_io_worker(struct btrfs_work *work)
>  			struct scrub_page *spage = sbio->pagev[i];
>  
>  			spage->io_error = 1;
> -			btrfs_dev_replace_stats_inc(&dev_replace->
> -						    num_write_errors);
> +			atomic64_inc(&dev_replace->num_write_errors);
>  		}
>  	}
>  
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 04/11] btrfs: open code btrfs_after_dev_replace_commit
  2018-09-07 14:55 ` [PATCH 04/11] btrfs: open code btrfs_after_dev_replace_commit David Sterba
@ 2018-09-07 21:03   ` Omar Sandoval
  0 siblings, 0 replies; 21+ messages in thread
From: Omar Sandoval @ 2018-09-07 21:03 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Fri, Sep 07, 2018 at 04:55:09PM +0200, David Sterba wrote:
> Too trivial, the purpose can be simply documented in a comment.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/dev-replace.c | 8 --------
>  fs/btrfs/dev-replace.h | 1 -
>  fs/btrfs/transaction.c | 5 ++++-
>  3 files changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 004e8d380a11..8b512dddf727 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -382,14 +382,6 @@ int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
>  	return ret;
>  }
>  
> -void btrfs_after_dev_replace_commit(struct btrfs_fs_info *fs_info)
> -{
> -	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
> -
> -	dev_replace->committed_cursor_left =
> -		dev_replace->cursor_left_last_write_of_item;
> -}
> -
>  static char* btrfs_dev_name(struct btrfs_device *device)
>  {
>  	if (!device || test_bit(BTRFS_DEV_STATE_MISSING, &device->dev_state))
> diff --git a/fs/btrfs/dev-replace.h b/fs/btrfs/dev-replace.h
> index 1c91bf9ef39a..795c551f5b5e 100644
> --- a/fs/btrfs/dev-replace.h
> +++ b/fs/btrfs/dev-replace.h
> @@ -11,7 +11,6 @@ struct btrfs_ioctl_dev_replace_args;
>  int btrfs_init_dev_replace(struct btrfs_fs_info *fs_info);
>  int btrfs_run_dev_replace(struct btrfs_trans_handle *trans,
>  			  struct btrfs_fs_info *fs_info);
> -void btrfs_after_dev_replace_commit(struct btrfs_fs_info *fs_info);
>  int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
>  			    struct btrfs_ioctl_dev_replace_args *args);
>  int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index e7856e15adbf..c0c75b5ae12a 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -1197,7 +1197,10 @@ static noinline int commit_cowonly_roots(struct btrfs_trans_handle *trans)
>  
>  	list_add_tail(&fs_info->extent_root->dirty_list,
>  		      &trans->transaction->switch_commits);
> -	btrfs_after_dev_replace_commit(fs_info);
> +
> +	/* Update dev-replace pointer once everything is committed */
> +	fs_info->dev_replace.committed_cursor_left =
> +		fs_info->dev_replace.cursor_left_last_write_of_item;
>  
>  	return 0;
>  }
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path
  2018-09-07 14:55 ` [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path David Sterba
@ 2018-09-07 21:06   ` Omar Sandoval
  2018-09-14 16:53   ` David Sterba
  1 sibling, 0 replies; 21+ messages in thread
From: Omar Sandoval @ 2018-09-07 21:06 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Fri, Sep 07, 2018 at 04:55:11PM +0200, David Sterba wrote:
> The exit sequence in btrfs_dev_replace_start does not allow to simply
> add a label to the right place so the error handling after starting
> transaction failure jumps there. Currently there's a lock that pairs
> with the unlock in the section, which is unnecessary and only raises
> questions.  Add a variable to track the locking status and avoid the
> extra locking.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/dev-replace.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 8b512dddf727..0a49843b2ee6 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -400,6 +400,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	int ret;
>  	struct btrfs_device *tgt_device = NULL;
>  	struct btrfs_device *src_device = NULL;
> +	bool need_unlock;
>  
>  	ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
>  					    srcdev_name, &src_device);
> @@ -424,6 +425,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  		return PTR_ERR(trans);
>  	}
>  
> +	need_unlock = true;
>  	btrfs_dev_replace_write_lock(dev_replace);
>  	switch (dev_replace->replace_state) {
>  	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
> @@ -462,6 +464,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	atomic64_set(&dev_replace->num_write_errors, 0);
>  	atomic64_set(&dev_replace->num_uncorrectable_read_errors, 0);
>  	btrfs_dev_replace_write_unlock(dev_replace);
> +	need_unlock = false;
>  
>  	ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
>  	if (ret)
> @@ -473,7 +476,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	trans = btrfs_start_transaction(root, 0);
>  	if (IS_ERR(trans)) {
>  		ret = PTR_ERR(trans);
> -		btrfs_dev_replace_write_lock(dev_replace);
>  		goto leave;
>  	}
>  
> @@ -497,7 +499,8 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  leave:
>  	dev_replace->srcdev = NULL;
>  	dev_replace->tgtdev = NULL;
> -	btrfs_dev_replace_write_unlock(dev_replace);
> +	if (need_unlock)
> +		btrfs_dev_replace_write_unlock(dev_replace);
>  	btrfs_destroy_dev_replace_tgtdev(tgt_device);
>  	return ret;
>  }
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 06/11] btrfs: dev-replace: move replace members out of fs_info
  2018-09-07 14:55 ` [PATCH 06/11] btrfs: dev-replace: move replace members out of fs_info David Sterba
@ 2018-09-07 21:09   ` Omar Sandoval
  0 siblings, 0 replies; 21+ messages in thread
From: Omar Sandoval @ 2018-09-07 21:09 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Fri, Sep 07, 2018 at 04:55:13PM +0200, David Sterba wrote:
> The replace_wait and bio_counter were mistakenly added to fs_info in
> commit c404e0dc2c843b154f ("Btrfs: fix use-after-free in the finishing
> procedure of the device replace"), but they logically belong to
> fs_info::dev_replace. Besides, bio_counter is a very generic name and is
> confusing in bare fs_info context.

Makes much more sense there.

Reviewed-by: Omar Sandoval <osandov@fb.com>

> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/ctree.h       |  6 +++---
>  fs/btrfs/dev-replace.c | 16 ++++++++--------
>  fs/btrfs/disk-io.c     |  9 +++++----
>  3 files changed, 16 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 31f7a58add7e..89fc90eafb0a 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -371,6 +371,9 @@ struct btrfs_dev_replace {
>  	wait_queue_head_t read_lock_wq;
>  
>  	struct btrfs_scrub_progress scrub_progress;
> +
> +	struct percpu_counter bio_counter;
> +	wait_queue_head_t replace_wait;
>  };
>  
>  /* For raid type sysfs entries */
> @@ -1093,9 +1096,6 @@ struct btrfs_fs_info {
>  	/* device replace state */
>  	struct btrfs_dev_replace dev_replace;
>  
> -	struct percpu_counter bio_counter;
> -	wait_queue_head_t replace_wait;
> -
>  	struct semaphore uuid_tree_rescan_sem;
>  
>  	/* Used to reclaim the metadata space in the background. */
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 0a49843b2ee6..c0e9b2c229d5 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -540,8 +540,8 @@ int btrfs_dev_replace_by_ioctl(struct btrfs_fs_info *fs_info,
>  static void btrfs_rm_dev_replace_blocked(struct btrfs_fs_info *fs_info)
>  {
>  	set_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state);
> -	wait_event(fs_info->replace_wait, !percpu_counter_sum(
> -		   &fs_info->bio_counter));
> +	wait_event(fs_info->dev_replace.replace_wait, !percpu_counter_sum(
> +		   &fs_info->dev_replace.bio_counter));
>  }
>  
>  /*
> @@ -550,7 +550,7 @@ static void btrfs_rm_dev_replace_blocked(struct btrfs_fs_info *fs_info)
>  static void btrfs_rm_dev_replace_unblocked(struct btrfs_fs_info *fs_info)
>  {
>  	clear_bit(BTRFS_FS_STATE_DEV_REPLACING, &fs_info->fs_state);
> -	wake_up(&fs_info->replace_wait);
> +	wake_up(&fs_info->dev_replace.replace_wait);
>  }
>  
>  static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
> @@ -992,25 +992,25 @@ void btrfs_dev_replace_set_lock_blocking(
>  
>  void btrfs_bio_counter_inc_noblocked(struct btrfs_fs_info *fs_info)
>  {
> -	percpu_counter_inc(&fs_info->bio_counter);
> +	percpu_counter_inc(&fs_info->dev_replace.bio_counter);
>  }
>  
>  void btrfs_bio_counter_sub(struct btrfs_fs_info *fs_info, s64 amount)
>  {
> -	percpu_counter_sub(&fs_info->bio_counter, amount);
> -	cond_wake_up_nomb(&fs_info->replace_wait);
> +	percpu_counter_sub(&fs_info->dev_replace.bio_counter, amount);
> +	cond_wake_up_nomb(&fs_info->dev_replace.replace_wait);
>  }
>  
>  void btrfs_bio_counter_inc_blocked(struct btrfs_fs_info *fs_info)
>  {
>  	while (1) {
> -		percpu_counter_inc(&fs_info->bio_counter);
> +		percpu_counter_inc(&fs_info->dev_replace.bio_counter);
>  		if (likely(!test_bit(BTRFS_FS_STATE_DEV_REPLACING,
>  				     &fs_info->fs_state)))
>  			break;
>  
>  		btrfs_bio_counter_dec(fs_info);
> -		wait_event(fs_info->replace_wait,
> +		wait_event(fs_info->dev_replace.replace_wait,
>  			   !test_bit(BTRFS_FS_STATE_DEV_REPLACING,
>  				     &fs_info->fs_state));
>  	}
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index acd0cbc5a6b9..3b3906589d12 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2156,7 +2156,7 @@ static void btrfs_init_dev_replace_locks(struct btrfs_fs_info *fs_info)
>  	mutex_init(&fs_info->dev_replace.lock_finishing_cancel_unmount);
>  	rwlock_init(&fs_info->dev_replace.lock);
>  	atomic_set(&fs_info->dev_replace.blocking_readers, 0);
> -	init_waitqueue_head(&fs_info->replace_wait);
> +	init_waitqueue_head(&fs_info->dev_replace.replace_wait);
>  	init_waitqueue_head(&fs_info->dev_replace.read_lock_wq);
>  }
>  
> @@ -2646,7 +2646,8 @@ int open_ctree(struct super_block *sb,
>  		goto fail_dirty_metadata_bytes;
>  	}
>  
> -	ret = percpu_counter_init(&fs_info->bio_counter, 0, GFP_KERNEL);
> +	ret = percpu_counter_init(&fs_info->dev_replace.bio_counter, 0,
> +			GFP_KERNEL);
>  	if (ret) {
>  		err = ret;
>  		goto fail_delalloc_bytes;
> @@ -3307,7 +3308,7 @@ int open_ctree(struct super_block *sb,
>  
>  	iput(fs_info->btree_inode);
>  fail_bio_counter:
> -	percpu_counter_destroy(&fs_info->bio_counter);
> +	percpu_counter_destroy(&fs_info->dev_replace.bio_counter);
>  fail_delalloc_bytes:
>  	percpu_counter_destroy(&fs_info->delalloc_bytes);
>  fail_dirty_metadata_bytes:
> @@ -4016,7 +4017,7 @@ void close_ctree(struct btrfs_fs_info *fs_info)
>  
>  	percpu_counter_destroy(&fs_info->dirty_metadata_bytes);
>  	percpu_counter_destroy(&fs_info->delalloc_bytes);
> -	percpu_counter_destroy(&fs_info->bio_counter);
> +	percpu_counter_destroy(&fs_info->dev_replace.bio_counter);
>  	cleanup_srcu_struct(&fs_info->subvol_srcu);
>  
>  	btrfs_free_stripe_hash_table(fs_info);
> -- 
> 2.18.0
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path
  2018-09-07 14:55 ` [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path David Sterba
  2018-09-07 21:06   ` Omar Sandoval
@ 2018-09-14 16:53   ` David Sterba
  1 sibling, 0 replies; 21+ messages in thread
From: David Sterba @ 2018-09-14 16:53 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Fri, Sep 07, 2018 at 04:55:11PM +0200, David Sterba wrote:
> The exit sequence in btrfs_dev_replace_start does not allow to simply
> add a label to the right place so the error handling after starting
> transaction failure jumps there. Currently there's a lock that pairs
> with the unlock in the section, which is unnecessary and only raises
> questions.  Add a variable to track the locking status and avoid the
> extra locking.
> 
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
>  fs/btrfs/dev-replace.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
> index 8b512dddf727..0a49843b2ee6 100644
> --- a/fs/btrfs/dev-replace.c
> +++ b/fs/btrfs/dev-replace.c
> @@ -400,6 +400,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	int ret;
>  	struct btrfs_device *tgt_device = NULL;
>  	struct btrfs_device *src_device = NULL;
> +	bool need_unlock;
>  
>  	ret = btrfs_find_device_by_devspec(fs_info, srcdevid,
>  					    srcdev_name, &src_device);
> @@ -424,6 +425,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  		return PTR_ERR(trans);
>  	}
>  
> +	need_unlock = true;
>  	btrfs_dev_replace_write_lock(dev_replace);
>  	switch (dev_replace->replace_state) {
>  	case BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED:
> @@ -462,6 +464,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	atomic64_set(&dev_replace->num_write_errors, 0);
>  	atomic64_set(&dev_replace->num_uncorrectable_read_errors, 0);
>  	btrfs_dev_replace_write_unlock(dev_replace);
> +	need_unlock = false;
>  
>  	ret = btrfs_sysfs_add_device_link(tgt_device->fs_devices, tgt_device);
>  	if (ret)
> @@ -473,7 +476,6 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
>  	trans = btrfs_start_transaction(root, 0);
>  	if (IS_ERR(trans)) {
>  		ret = PTR_ERR(trans);
> -		btrfs_dev_replace_write_lock(dev_replace);

There's a merge conflict with Jeff's patch
https://patchwork.kernel.org/patch/10591003/ that adds some dev-replace
state change, so the lock must be here. I'll update my patch to

@@ -474,6 +477,7 @@ int btrfs_dev_replace_start(struct btrfs_fs_info *fs_info,
        trans = btrfs_start_transaction(root, 0);
        if (IS_ERR(trans)) {
                ret = PTR_ERR(trans);
+               need_unlock = true;
                btrfs_dev_replace_write_lock(dev_replace);
                dev_replace->replace_state =
                        BTRFS_IOCTL_DEV_REPLACE_STATE_NEVER_STARTED;
---

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 00/11] Cleanup dev-replace locking
  2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
                   ` (10 preceding siblings ...)
  2018-09-07 14:55 ` [PATCH 11/11] btrfs: dev-replace: open code trivial locking helpers David Sterba
@ 2018-09-27 11:06 ` David Sterba
  2018-10-04  1:06   ` Anand Jain
  11 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2018-09-27 11:06 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

On Fri, Sep 07, 2018 at 04:54:59PM +0200, David Sterba wrote:
> The series peels off the custom locking that's used for dev-replace and
> uses read-write semaphore in the end.
> 
> I've mainly focused on correctness and haven't measured the performance
> effects. There should be none as the blocking and waiting was merely
> open coding what the rw semaphore does, but without the fairness. The
> overall number of locks taken is low, there's a lots of IO in between so
> even if new scheme is slower, I don't expect any dramatic change.

Mixed results. The btrfs/011 is a test that usually takes long time
(around 1000 sec on my test box) and I have a long list of run times to
compare. There this patchset does not show any problems. However on a VM
with 8 CPUs, this goes from similar times (1000 sec) to several hours,
without apparent reason. The is other low activity on the system, but so
this is with testing other patchsets and the times are not that bad.

So, I'm going to merge the first part of the patchset that does not
switch the locking primitives to the semaphores, more analysis needed.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 00/11] Cleanup dev-replace locking
  2018-09-27 11:06 ` [PATCH 00/11] Cleanup dev-replace locking David Sterba
@ 2018-10-04  1:06   ` Anand Jain
  2018-10-04 10:13     ` David Sterba
  0 siblings, 1 reply; 21+ messages in thread
From: Anand Jain @ 2018-10-04  1:06 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs



On 09/27/2018 07:06 PM, David Sterba wrote:
> On Fri, Sep 07, 2018 at 04:54:59PM +0200, David Sterba wrote:
>> The series peels off the custom locking that's used for dev-replace and
>> uses read-write semaphore in the end.
>>
>> I've mainly focused on correctness and haven't measured the performance
>> effects. There should be none as the blocking and waiting was merely
>> open coding what the rw semaphore does, but without the fairness. The
>> overall number of locks taken is low, there's a lots of IO in between so
>> even if new scheme is slower, I don't expect any dramatic change.
> 
> Mixed results. The btrfs/011 is a test that usually takes long time
> (around 1000 sec on my test box) and I have a long list of run times to
> compare. There this patchset does not show any problems. However on a VM
> with 8 CPUs, this goes from similar times (1000 sec) to several hours,
> without apparent reason. The is other low activity on the system, but so
> this is with testing other patchsets and the times are not that bad.
> 
> So, I'm going to merge the first part of the patchset that does not
> switch the locking primitives to the semaphores, more analysis needed.
> 

I have reviewed/tested all in this set and it looks good to me.

So Reviewed-by: Anand Jain <anand.jain@oracle.com>

I do observer btrfs/011 taking longer time to complete (200sec more) and 
Null pointer dereference even without this patch set, tracing back lead 
to conclusion, that

v4.17 is good.
v4.18-rc1 is bad.

And in particular these commit ids..

7a932516f55c Merge tag 'vfs-timespec64' of 
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground
e7655d2b2546 Merge tag 'for-4.18-part2-tag' of 
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
15eefe2a99b2 Merge branch 'vfs_timespec64' of 
https://github.com/deepa-hub/vfs into vfs-timespec64
6396bb221514 treewide: kzalloc() -> kcalloc()

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 00/11] Cleanup dev-replace locking
  2018-10-04  1:06   ` Anand Jain
@ 2018-10-04 10:13     ` David Sterba
  2018-10-05  8:58       ` Anand Jain
  0 siblings, 1 reply; 21+ messages in thread
From: David Sterba @ 2018-10-04 10:13 UTC (permalink / raw)
  To: Anand Jain; +Cc: dsterba, David Sterba, linux-btrfs

On Thu, Oct 04, 2018 at 09:06:01AM +0800, Anand Jain wrote:
> 
> 
> On 09/27/2018 07:06 PM, David Sterba wrote:
> > On Fri, Sep 07, 2018 at 04:54:59PM +0200, David Sterba wrote:
> >> The series peels off the custom locking that's used for dev-replace and
> >> uses read-write semaphore in the end.
> >>
> >> I've mainly focused on correctness and haven't measured the performance
> >> effects. There should be none as the blocking and waiting was merely
> >> open coding what the rw semaphore does, but without the fairness. The
> >> overall number of locks taken is low, there's a lots of IO in between so
> >> even if new scheme is slower, I don't expect any dramatic change.
> > 
> > Mixed results. The btrfs/011 is a test that usually takes long time
> > (around 1000 sec on my test box) and I have a long list of run times to
> > compare. There this patchset does not show any problems. However on a VM
> > with 8 CPUs, this goes from similar times (1000 sec) to several hours,
> > without apparent reason. The is other low activity on the system, but so
> > this is with testing other patchsets and the times are not that bad.
> > 
> > So, I'm going to merge the first part of the patchset that does not
> > switch the locking primitives to the semaphores, more analysis needed.
> > 
> 
> I have reviewed/tested all in this set and it looks good to me.
> 
> So Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks for testing, I'll add the tag to the patches.

> I do observer btrfs/011 taking longer time to complete (200sec more) and 
> Null pointer dereference even without this patch set, tracing back lead 
> to conclusion, that

The run times of 011 are quite wild and sometimes drop to 500 secs and
go back to 1000+, on the same devices and basically same kernel. But
replacing the locking would need further analysis as it's possible that
the reader/writer fairness of the rwsem is acually not helping in the
replace case that as currently implemented favors readers.

I'm rarely observing a crash in 011, stacktrace attached, please let me
know if it's the same as the one you mention.

[  738.344228] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdc started
[  739.585174] WARNING: CPU: 3 PID: 20788 at fs/btrfs/scrub.c:620 scrub_setup_ctx.isra.19+0x20c/0x210 [btrfs]
[  739.591953] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
[  739.594926] CPU: 3 PID: 20788 Comm: btrfs Not tainted 4.19.0-rc5-default+ #271
[  739.596350] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[  739.598304] RIP: 0010:scrub_setup_ctx.isra.19+0x20c/0x210 [btrfs]
[  739.603855] RSP: 0018:ffff914d076bbc30 EFLAGS: 00010246
[  739.605120] RAX: 0000000000000001 RBX: 0000000000000040 RCX: 0000000000182000
[  739.606206] RDX: ffffffffc04aa000 RSI: 0000000000000000 RDI: 0000000000000038
[  739.607119] RBP: ffffffffc03c9640 R08: ffffffffc0328000 R09: ffffffff9a00abc0
[  739.608025] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  739.608950] R13: ffff9049f94d8000 R14: ffff9049fbc56800 R15: ffff9049f94daf30
[  739.610321] FS:  00007f72e9a328c0(0000) GS:ffff9049fd200000(0000) knlGS:0000000000000000
[  739.612299] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  739.613602] CR2: 00005645d94eb4f0 CR3: 0000000046de0000 CR4: 00000000000006e0
[  739.615244] Call Trace:
[  739.615986]  btrfs_scrub_dev+0x14c/0x5a0 [btrfs]
[  739.617122]  ? start_transaction+0xa1/0x470 [btrfs]
[  739.618335]  btrfs_dev_replace_start.cold.24+0x9a/0x144 [btrfs]
[  739.619680]  btrfs_dev_replace_by_ioctl+0x35/0x60 [btrfs]
[  739.621003]  btrfs_ioctl+0x2a94/0x31d0 [btrfs]
[  739.622123]  ? filemap_map_pages+0x279/0x3d0
[  739.623199]  ? _raw_spin_unlock+0x24/0x40
[  739.624225]  ? __handle_mm_fault+0xb52/0xcb0
[  739.625263]  ? do_vfs_ioctl+0xa2/0x6b0
[  739.626247]  do_vfs_ioctl+0xa2/0x6b0
[  739.627184]  ksys_ioctl+0x3a/0x70
[  739.628058]  __x64_sys_ioctl+0x16/0x20
[  739.628953]  do_syscall_64+0x5a/0x1a0
[  739.629893]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  739.631083] RIP: 0033:0x7f72e8efeaa7
[  739.636109] RSP: 002b:00007ffd6d3695a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  739.637900] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f72e8efeaa7
[  739.639508] RDX: 00007ffd6d3699e0 RSI: 00000000ca289435 RDI: 0000000000000003
[  739.641116] RBP: 0000000000000000 R08: 00007f72e9a328c0 R09: 00007f72e93e72a0
[  739.642780] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd6d36c278
[  739.644499] R13: 0000000000000000 R14: 0000000000000004 R15: 00005645d95b72a0
[  739.646082] irq event stamp: 0
[  739.646968] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
[  739.648878] hardirqs last disabled at (0): [<ffffffff9905b9cd>] copy_process.part.66+0x84d/0x1db0
[  739.651061] softirqs last  enabled at (0): [<ffffffff9905b9cd>] copy_process.part.66+0x84d/0x1db0
[  739.653042] softirqs last disabled at (0): [<0000000000000000>]           (null)
[  739.654965] ---[ end trace 9e2151540055738e ]---
[  740.471443] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdc canceled
[  740.614161] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
[  740.617332] PGD 0 P4D 0
[  740.618575] Oops: 0000 [#1] PREEMPT SMP
[  740.620664] CPU: 0 PID: 12214 Comm: kworker/u16:4 Tainted: G        W         4.19.0-rc5-default+ #271
[  740.626680] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[  740.629983] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs]
[  740.631239] RIP: 0010:scrub_write_page_to_dev_replace+0x86/0x2d0 [btrfs]
[  740.636572] RSP: 0018:ffff914d0b0fbd28 EFLAGS: 00010206
[  740.637711] RAX: ffff9049931b0800 RBX: ffff9049e3103400 RCX: 0000000000000000
[  740.639271] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9049931b0800
[  740.640820] RBP: ffff9049fbc56800 R08: ffff9049931b0780 R09: ffff9049e3103400
[  740.642359] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9049e30dbf00
[  740.643876] R13: ffff9049fbc56ae8 R14: ffffffffc03c5ce0 R15: ffff9049c6d82a00
[  740.645378] FS:  0000000000000000(0000) GS:ffff9049fcc00000(0000) knlGS:0000000000000000
[  740.647291] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  740.648514] CR2: 00000000000000b0 CR3: 0000000038011000 CR4: 00000000000006f0
[  740.650067] Call Trace:
[  740.650709]  scrub_write_block_to_dev_replace+0x3b/0x60 [btrfs]
[  740.652083]  scrub_bio_end_io_worker+0xee/0x500 [btrfs]
[  740.653241]  normal_work_helper+0xcd/0x530 [btrfs]
[  740.654368]  process_one_work+0x246/0x5c0
[  740.655321]  worker_thread+0x3c/0x390
[  740.656233]  ? rescuer_thread+0x360/0x360
[  740.657271]  kthread+0x116/0x130
[  740.658138]  ? kthread_create_worker_on_cpu+0x70/0x70
[  740.659353]  ret_from_fork+0x24/0x30
[  740.660276] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
[  740.662453] CR2: 00000000000000b0
[  740.663402] ---[ end trace 9e2151540055738f ]---
[  740.664622] RIP: 0010:scrub_write_page_to_dev_replace+0x86/0x2d0 [btrfs]
[  740.670545] RSP: 0018:ffff914d0b0fbd28 EFLAGS: 00010206
[  740.671856] RAX: ffff9049931b0800 RBX: ffff9049e3103400 RCX: 0000000000000000
[  740.673523] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9049931b0800
[  740.675211] RBP: ffff9049fbc56800 R08: ffff9049931b0780 R09: ffff9049e3103400
[  740.676876] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9049e30dbf00
[  740.678582] R13: ffff9049fbc56ae8 R14: ffffffffc03c5ce0 R15: ffff9049c6d82a00
[  740.680092] FS:  0000000000000000(0000) GS:ffff9049fcc00000(0000) knlGS:0000000000000000
[  740.682002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  740.683264] CR2: 00000000000000b0 CR3: 0000000038011000 CR4: 00000000000006f0
[  740.684623] BUG: sleeping function called from invalid context at include/linux/cgroup-defs.h:711
[  740.686144] in_atomic(): 0, irqs_disabled(): 1, pid: 12214, name: kworker/u16:4
[  740.687471] INFO: lockdep is turned off.
[  740.688446] irq event stamp: 755514
[  740.689212] hardirqs last  enabled at (755513): [<ffffffff9960adf9>] _raw_spin_unlock_irq+0x29/0x50
[  740.690933] hardirqs last disabled at (755514): [<ffffffff9960283e>] __schedule+0xae/0xaf0
[  740.692750] softirqs last  enabled at (755460): [<ffffffff99a0034d>] __do_softirq+0x34d/0x442
[  740.694759] softirqs last disabled at (755439): [<ffffffff990648ba>] irq_exit+0xca/0xd0
[  740.696621] CPU: 0 PID: 12214 Comm: kworker/u16:4 Tainted: G      D W         4.19.0-rc5-default+ #271
[  740.698340] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[  740.700854] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs]
[  740.702106] Call Trace:
[  740.702695]  dump_stack+0x85/0xc0
[  740.703527]  ___might_sleep.cold.79+0xac/0xbe
[  740.704539]  exit_signals+0x30/0x130
[  740.705425]  do_exit+0xac/0xb90
[  740.706178]  ? rescuer_thread+0x360/0x360
[  740.707018]  ? kthread+0x116/0x130
[  740.707748]  rewind_stack_do_exit+0x17/0x20

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 00/11] Cleanup dev-replace locking
  2018-10-04 10:13     ` David Sterba
@ 2018-10-05  8:58       ` Anand Jain
  0 siblings, 0 replies; 21+ messages in thread
From: Anand Jain @ 2018-10-05  8:58 UTC (permalink / raw)
  To: dsterba, David Sterba, linux-btrfs



>> I do observer btrfs/011 taking longer time to complete (200sec more) and
>> Null pointer dereference even without this patch set,


>> tracing back lead
>> to conclusion, that

Ah. My trace back was for the circular lock dependency warning which I 
reported here: Subject: btrfs/011 possible circular locking dependency 
detected. Which is just a warning. If possible we could fix it. But 
dereference is something we should try harder. Null deference doesn't 
happen consistently here, I couldn't capture the stack trace.

> I'm rarely observing a crash in 011, stacktrace attached,

  Yep rarely. Thanks for the stacktrace.

Per the stack trace..

replace started..

[1]
[  738.344228] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 
1) to /dev/vdc started
[  739.585174] WARNING: CPU: 3 PID: 20788 at fs/btrfs/scrub.c:620 
scrub_setup_ctx.isra.19+0x20c/0x210 [btrfs]

The replace was canceled 'before' (as below) the scrub_setup_ctx() was 
able to init with the replace target missing warning.

-------------------
  .. scrub_setup_ctx(..)
  573 {
::
  619         if (is_dev_replace) {
  620                 WARN_ON(!fs_info->dev_replace.tgtdev);
::
-------------------



[2]
[  740.471443] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 
1) to /dev/vdc canceled

As scrub_setup_ctx() was able to init scrub successfully with a warning 
as above


[3]

[  740.614161] BUG: unable to handle kernel NULL pointer dereference at 
00000000000000b0
[  740.629983] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs]
[  740.631239] RIP: 0010:scrub_write_page_to_dev_replace+0x86/0x2d0 [btrfs]
[  740.650709]  scrub_write_block_to_dev_replace+0x3b/0x60 [btrfs]

We try to do bio to the target which is gone.


I can reproduce with [4] (sorry for the very rudimentary diff)
(without this patch-set.).

[4]
----------------------
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index dec01970d8c5..767cc380d08e 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -20,6 +20,9 @@
  #include "rcu-string.h"
  #include "dev-replace.h"
  #include "sysfs.h"
+#include <linux/delay.h>
+
+bool replace_cancel_done = false;

  static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
                                        int scrub_ret);
@@ -488,6 +491,11 @@ int btrfs_dev_replace_start(struct btrfs_fs_info 
*fs_info,
         ret = btrfs_commit_transaction(trans);
         WARN_ON(ret);

+       while(replace_cancel_done == false) {
+               msleep(1000);
+       }
+       replace_cancel_done = false;
+
         /* the disk copy procedure reuses the scrub code */
         ret = btrfs_scrub_dev(fs_info, src_device->devid, 0,
                               btrfs_device_get_total_bytes(src_device),
@@ -829,6 +837,7 @@ int btrfs_dev_replace_cancel(struct btrfs_fs_info 
*fs_info)
         if (tgt_device)
                 btrfs_destroy_dev_replace_tgtdev(tgt_device);

+       replace_cancel_done = true;
  leave:
         mutex_unlock(&dev_replace->lock_finishing_cancel_unmount);
         return result;
----------------------

Run:
mkfs.btrfs -qf /dev/sdb && mount /dev/sdb /btrfs
btrfs replace start -B 1 /dev/sdc /btrfs
from another terminal
btrfs replace cancel /btrfs

And we see the panic [5] which is same as yours.

The reason is the replace write lock does not encapsulate the scrub 
init, but if you do it now how to support cancel.

Fix is WIP. Thanks, Anand


[5]
-----------------------------------------------
[22304.655493] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 
1) to /dev/sdc started
[22343.645654] BTRFS info (device sdb): dev_replace from /dev/sdb (devid 
1) to /dev/sdc canceled
[22344.226987] WARNING: CPU: 0 PID: 7128 at fs/btrfs/scrub.c:620 
scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
[22344.226999] Modules linked in: btrfs xor zstd_decompress 
zstd_compress xxhash raid6_pq iptable_nat nf_nat_ipv4 nf_nat [last 
unloaded: xor]
[22344.227026] CPU: 0 PID: 7128 Comm: btrfs Not tainted 4.19.0-rc6+ #9
[22344.227029] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
VirtualBox 12/01/2006
[22344.227045] RIP: 0010:scrub_setup_ctx.isra.19+0x220/0x230 [btrfs]
[22344.227048] Code: 00 00 49 8b 86 30 1e 00 00 5b 5d 49 89 84 24 50 03 
00 00 41 c6 84 24 58 03 00 00 00 4c 89 e0 41 5c 41 5d 41 5e c3 0f 0b eb 
8f <0f> 0b eb c8 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41
[22344.227051] RSP: 0018:ffffadc3c0633c48 EFLAGS: 00010246
[22344.227054] RAX: ffff99e978bab800 RBX: 000000000000003f RCX: 
0000000000000001
[22344.227057] RDX: 0000000000000000 RSI: 000000000015b000 RDI: 
0000000000000000
[22344.227059] RBP: ffff99e978bab9f8 R08: 0000000000000066 R09: 
ffff99e97013dc00
[22344.227061] R10: 0000000000000001 R11: 00000000fcf2000b R12: 
ffff99e978bab800
[22344.227064] R13: 0000000000000001 R14: ffff99e9734d0000 R15: 
0000000000000000
[22344.227067] FS:  00007f5f281a98c0(0000) GS:ffff99e97da00000(0000) 
knlGS:0000000000000000
[22344.227069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22344.227073] CR2: 000055ca99e26f30 CR3: 0000000078a38000 CR4: 
00000000000406f0
[22344.227075] Call Trace:
[22344.227092]  btrfs_scrub_dev+0x19b/0x5b0 [btrfs]
[22344.227110]  btrfs_dev_replace_start+0x564/0x6a0 [btrfs]
[22344.227126]  btrfs_dev_replace_by_ioctl+0x3a/0x60 [btrfs]
[22344.227140]  btrfs_ioctl+0x201b/0x2ce0 [btrfs]
[22344.227151]  ? find_held_lock+0x2d/0x90
[22344.227161]  do_vfs_ioctl+0xa9/0x6c0
[22344.227165]  ? lockdep_hardirqs_on+0xed/0x1a0
[22344.227171]  ksys_ioctl+0x60/0x90
[22344.227176]  __x64_sys_ioctl+0x16/0x20
[22344.227179]  do_syscall_64+0x50/0x180
[22344.227187]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[22344.227190] RIP: 0033:0x7f5f26fa7567
[22344.227194] Code: 44 00 00 48 8b 05 29 09 2d 00 64 c7 00 26 00 00 00 
48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 
05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f9 08 2d 00 f7 d8 64 89 01 48
[22344.227196] RSP: 002b:00007fff72353c08 EFLAGS: 00000202 ORIG_RAX: 
0000000000000010
[22344.227200] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 
00007f5f26fa7567
[22344.227202] RDX: 00007fff72354a80 RSI: 00000000ca289435 RDI: 
0000000000000003
[22344.227204] RBP: 0000000000000001 R08: 0000000000000000 R09: 
00007fff72353b60
[22344.227206] R10: 0000000000000008 R11: 0000000000000202 R12: 
0000000000000001
[22344.227208] R13: 0000000000000000 R14: 0000000000000003 R15: 
00000000024f4050
[22344.227222] irq event stamp: 43742
[22344.227227] hardirqs last  enabled at (43741): [<ffffffffa0a9518c>] 
__slab_alloc+0x52/0x61
[22344.227231] hardirqs last disabled at (43742): [<ffffffffa0801ad7>] 
trace_hardirqs_off_thunk+0x1a/0x1c
[22344.227234] softirqs last  enabled at (43244): [<ffffffffa180031c>] 
__do_softirq+0x31c/0x3a6
[22344.227238] softirqs last disabled at (43233): [<ffffffffa08e11d2>] 
irq_exit+0xb2/0xc0
[22344.227240] ---[ end trace 98bfe3316ee3a52a ]---
[22344.232735] BUG: unable to handle kernel NULL pointer dereference at 
00000000000000a0
[22344.233165] PGD 0 P4D 0
[22344.233409] Oops: 0000 [#1] SMP PTI
[22344.233641] CPU: 1 PID: 1091 Comm: kworker/u4:5 Tainted: G        W 
       4.19.0-rc6+ #9
[22344.234036] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS 
VirtualBox 12/01/2006
[22344.234429] Workqueue: btrfs-scrub btrfs_scrub_helper [btrfs]
[22344.234757] RIP: 0010:scrub_write_page_to_dev_replace+0xb4/0x260 [btrfs]
[22344.235149] Code: 43 28 48 8b 85 50 03 00 00 48 89 43 10 48 8b 43 18 
48 85 c0 0f 84 20 01 00 00 48 89 58 50 48 c7 40 48 30 ee 1c c0 48 8b 53 
10 <48> 8b 92 a0 00 00 00 48 8b 92 e0 00 00 00 48 39 50 08 74 18 66 81
[22344.236053] RSP: 0018:ffffadc3c0c27d20 EFLAGS: 00010246
[22344.236355] RAX: ffff99e97a65cd00 RBX: ffff99e97c197400 RCX: 
0000000000000000
[22344.236707] RDX: 0000000000000000 RSI: ffff99e97a65cd00 RDI: 
ffff99e97a65cd00
[22344.237085] RBP: ffff99e978bab800 R08: 0000000000027970 R09: 
ffff99e97c197400
[22344.237424] R10: ffffffffa16001ff R11: 0000000000000000 R12: 
ffff99e978e12680
[22344.237780] R13: ffff99e978babac8 R14: 0000000000000000 R15: 
ffff99e978513838
[22344.238135] FS:  0000000000000000(0000) GS:ffff99e97db00000(0000) 
knlGS:0000000000000000
[22344.238629] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22344.239043] CR2: 00000000000000a0 CR3: 0000000078ca4000 CR4: 
00000000000406e0
[22344.239516] Call Trace:
[22344.239886]  scrub_write_block_to_dev_replace+0x3f/0x60 [btrfs]
[22344.240305]  scrub_bio_end_io_worker+0x1af/0x490 [btrfs]
[22344.240777]  normal_work_helper+0xf0/0x4c0 [btrfs]
[22344.241201]  process_one_work+0x1f4/0x520
[22344.241792]  ? process_one_work+0x16e/0x520
[22344.242106]  worker_thread+0x46/0x3d0
[22344.242270]  kthread+0xf8/0x130
[22344.242530]  ? process_one_work+0x520/0x520
[22344.242824]  ? kthread_delayed_work_timer_fn+0x80/0x80
[22344.243138]  ret_from_fork+0x3a/0x50
[22344.243418] Modules linked in: btrfs xor zstd_decompress 
zstd_compress xxhash raid6_pq iptable_nat nf_nat_ipv4 nf_nat [last 
unloaded: xor]
[22344.244014] CR2: 00000000000000a0
[22344.244266] ---[ end trace 98bfe3316ee3a52b ]---

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2018-10-05  8:59 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-07 14:54 [PATCH 00/11] Cleanup dev-replace locking David Sterba
2018-09-07 14:55 ` [PATCH 01/11] btrfs: remove btrfs_dev_replace::read_locks David Sterba
2018-09-07 14:55 ` [PATCH 02/11] btrfs: open code btrfs_dev_replace_clear_lock_blocking David Sterba
2018-09-07 14:55 ` [PATCH 03/11] btrfs: open code btrfs_dev_replace_stats_inc David Sterba
2018-09-07 21:02   ` Omar Sandoval
2018-09-07 14:55 ` [PATCH 04/11] btrfs: open code btrfs_after_dev_replace_commit David Sterba
2018-09-07 21:03   ` Omar Sandoval
2018-09-07 14:55 ` [PATCH 05/11] btrfs: dev-replace: avoid useless lock on error handling path David Sterba
2018-09-07 21:06   ` Omar Sandoval
2018-09-14 16:53   ` David Sterba
2018-09-07 14:55 ` [PATCH 06/11] btrfs: dev-replace: move replace members out of fs_info David Sterba
2018-09-07 21:09   ` Omar Sandoval
2018-09-07 14:55 ` [PATCH 07/11] btrfs: dev-replace: remove pointless assert in write unlock David Sterba
2018-09-07 14:55 ` [PATCH 08/11] btrfs: reada: reorder dev-replace locks before radix tree preload David Sterba
2018-09-07 14:55 ` [PATCH 09/11] btrfs: dev-replace: swich locking to rw semaphore David Sterba
2018-09-07 14:55 ` [PATCH 10/11] btrfs: dev-replace: remove custom read/write blocking scheme David Sterba
2018-09-07 14:55 ` [PATCH 11/11] btrfs: dev-replace: open code trivial locking helpers David Sterba
2018-09-27 11:06 ` [PATCH 00/11] Cleanup dev-replace locking David Sterba
2018-10-04  1:06   ` Anand Jain
2018-10-04 10:13     ` David Sterba
2018-10-05  8:58       ` Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.