From: Qu Wenruo <quwenruo@cn.fujitsu.com>
To: <linux-btrfs@vger.kernel.org>
Subject: [PATCH 5/5] btrfs: replace: Use ref counts to avoid destroying target device when canceled
Date: Fri, 3 Feb 2017 16:20:23 +0800	[thread overview]
Message-ID: <20170203082023.3577-6-quwenruo@cn.fujitsu.com> (raw)
In-Reply-To: <20170203082023.3577-1-quwenruo@cn.fujitsu.com>

When dev-replace and scrub run at the same time, dev-replace can be
canceled by scrub. This happens quite commonly in btrfs/069.

The backtrace would be like:
general protection fault: 0000 [#1] SMP
Workqueue: btrfs-endio-raid56 btrfs_endio_raid56_helper [btrfs]
RIP: 0010:[<ffffffff813a2fa8>]  [<ffffffff813a2fa8>] generic_make_request_checks+0x198/0x5a0
Call Trace:
 [<ffffffff813a4cff>] ? generic_make_request+0xcf/0x290
 [<ffffffff813a4c54>] generic_make_request+0x24/0x290
 [<ffffffff813a4cff>] ? generic_make_request+0xcf/0x290
 [<ffffffff813a4f2e>] submit_bio+0x6e/0x120
 [<ffffffffa087279d>] ? page_in_rbio+0x4d/0x80 [btrfs]
 [<ffffffffa08737d0>] ? rbio_orig_end_io+0x80/0x80 [btrfs]
 [<ffffffffa0873e31>] finish_rmw+0x401/0x550 [btrfs]
 [<ffffffffa0874fc6>] validate_rbio_for_rmw+0x36/0x40 [btrfs]
 [<ffffffffa087504d>] raid_rmw_end_io+0x7d/0x90 [btrfs]
 [<ffffffff8139c536>] bio_endio+0x56/0x60
 [<ffffffffa07e6e5c>] end_workqueue_fn+0x3c/0x40 [btrfs]
 [<ffffffffa08285bf>] btrfs_scrubparity_helper+0xef/0x610 [btrfs]
 [<ffffffffa0828b9e>] btrfs_endio_raid56_helper+0xe/0x10 [btrfs]
 [<ffffffff810ec8df>] process_one_work+0x2af/0x720
 [<ffffffff810ec85b>] ? process_one_work+0x22b/0x720
 [<ffffffff810ecd9b>] worker_thread+0x4b/0x4f0
 [<ffffffff810ecd50>] ? process_one_work+0x720/0x720
 [<ffffffff810ecd50>] ? process_one_work+0x720/0x720
 [<ffffffff810f39d3>] kthread+0xf3/0x110
 [<ffffffff810f38e0>] ? kthread_park+0x60/0x60
 [<ffffffff81857647>] ret_from_fork+0x27/0x40

In that case, the target device can be destroyed at cancel time,
leading to a use-after-free bug:

     Process A (dev-replace)         |         Process B(scrub)
----------------------------------------------------------------------
                                     |(Any RW is OK)
                                     |scrub_setup_recheck_block()
                                     ||- btrfs_map_sblock()
                                     |   Got a bbio with tgtdev
btrfs_dev_replace_finishing()        |
|- btrfs_destroy_dev_replace_tgtdev()|
   |- call_rcu(free_device)          |
      |- __free_device()             |
         |- kfree(device)            |
                                     | Scrub worker:
                                     | Access bbio->stripes[], which
                                     | contains tgtdev.
                                     | This triggers general protection.

The bug is most obvious for RAID5/6, since raid56 chooses to keep the
old rbio and rbio->bbio for later stealing. This hugely enlarges the
race window and makes the bug much easier to trigger.

This patch introduces 'tgtdev_refs' and 'tgtdev_wait' in btrfs_device,
so that we can wait until all users of the target device have released
it before destroying it.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/dev-replace.c |  7 ++++++-
 fs/btrfs/volumes.c     | 36 +++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.h     | 10 ++++++++++
 3 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 5de280b9ad73..794a6a0bedf2 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -558,7 +558,6 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 			  rcu_str_deref(src_device->name),
 			  src_device->devid,
 			  rcu_str_deref(tgt_device->name));
-	tgt_device->is_tgtdev_for_dev_replace = 0;
 	tgt_device->devid = src_device->devid;
 	src_device->devid = BTRFS_DEV_REPLACE_DEVID;
 	memcpy(uuid_tmp, tgt_device->uuid, sizeof(uuid_tmp));
@@ -579,6 +578,12 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 
 	btrfs_dev_replace_unlock(dev_replace, 1);
 
+	/*
+	 * Only change is_tgtdev_for_dev_replace flag after all its
+	 * users get released.
+	 */
+	wait_target_device(tgt_device);
+	tgt_device->is_tgtdev_for_dev_replace = 0;
 	btrfs_rm_dev_replace_blocked(fs_info);
 
 	btrfs_rm_dev_replace_remove_srcdev(fs_info, src_device);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3c3c69c0eee4..84db9fb22b7d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2064,6 +2064,7 @@ void btrfs_destroy_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	WARN_ON(!tgtdev);
 	mutex_lock(&fs_info->fs_devices->device_list_mutex);
 
+	wait_target_device(tgtdev);
 	btrfs_sysfs_rm_device_link(fs_info->fs_devices, tgtdev);
 
 	if (tgtdev->bdev)
@@ -2598,6 +2599,8 @@ int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	device->is_tgtdev_for_dev_replace = 1;
 	device->mode = FMODE_EXCL;
 	device->dev_stats_valid = 1;
+	atomic_set(&device->tgtdev_refs, 0);
+	init_waitqueue_head(&device->tgtdev_wait);
 	set_blocksize(device->bdev, 4096);
 	device->fs_devices = fs_info->fs_devices;
 	list_add(&device->dev_list, &fs_info->fs_devices->devices);
@@ -2624,6 +2627,8 @@ void btrfs_init_dev_replace_tgtdev_for_resume(struct btrfs_fs_info *fs_info,
 	tgtdev->sector_size = sectorsize;
 	tgtdev->fs_info = fs_info;
 	tgtdev->in_fs_metadata = 1;
+	atomic_set(&tgtdev->tgtdev_refs, 0);
+	init_waitqueue_head(&tgtdev->tgtdev_wait);
 }
 
 static noinline int btrfs_update_device(struct btrfs_trans_handle *trans,
@@ -5302,6 +5307,32 @@ static struct btrfs_bio *alloc_btrfs_bio(int total_stripes, int real_stripes)
 	return bbio;
 }
 
+static void pin_bbio_target_device(struct btrfs_bio *bbio)
+{
+	int i;
+
+	for (i = 0; i < bbio->num_stripes; i++) {
+		struct btrfs_device *device = bbio->stripes[i].dev;
+
+		if (device->is_tgtdev_for_dev_replace)
+			atomic_inc(&device->tgtdev_refs);
+	}
+}
+
+static void unpin_bbio_target_device(struct btrfs_bio *bbio)
+{
+	int i;
+
+	for (i = 0; i < bbio->num_stripes; i++) {
+		struct btrfs_device *device = bbio->stripes[i].dev;
+
+		if (device->is_tgtdev_for_dev_replace) {
+			atomic_dec(&device->tgtdev_refs);
+			wake_up(&device->tgtdev_wait);
+		}
+	}
+}
+
 void btrfs_get_bbio(struct btrfs_bio *bbio)
 {
 	WARN_ON(!atomic_read(&bbio->refs));
@@ -5312,8 +5343,10 @@ void btrfs_put_bbio(struct btrfs_bio *bbio)
 {
 	if (!bbio)
 		return;
-	if (atomic_dec_and_test(&bbio->refs))
+	if (atomic_dec_and_test(&bbio->refs)) {
+		unpin_bbio_target_device(bbio);
 		kfree(bbio);
+	}
 }
 
 static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
@@ -5868,6 +5901,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info,
 		bbio->stripes[0].physical = physical_to_patch_in_first_stripe;
 		bbio->mirror_num = map->num_stripes + 1;
 	}
+	pin_bbio_target_device(bbio);
 out:
 	if (dev_replace_is_ongoing) {
 		btrfs_dev_replace_clear_lock_blocking(dev_replace);
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 24ba6bc3ec34..702361182597 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -149,6 +149,10 @@ struct btrfs_device {
 	/* Counter to record the change of device stats */
 	atomic_t dev_stats_ccnt;
 	atomic_t dev_stat_values[BTRFS_DEV_STAT_VALUES_MAX];
+
+	/* To ensure we wait this target device before destroying it */
+	atomic_t tgtdev_refs;
+	wait_queue_head_t tgtdev_wait;
 };
 
 /*
@@ -538,4 +542,10 @@ struct list_head *btrfs_get_fs_uuids(void);
 void btrfs_set_fs_info_ptr(struct btrfs_fs_info *fs_info);
 void btrfs_reset_fs_info_ptr(struct btrfs_fs_info *fs_info);
 
+static inline void wait_target_device(struct btrfs_device *tgtdev)
+{
+	if (!tgtdev || !tgtdev->is_tgtdev_for_dev_replace)
+		return;
+	wait_event(tgtdev->tgtdev_wait, atomic_read(&tgtdev->tgtdev_refs) == 0);
+}
 #endif
-- 
2.11.0




Thread overview: 27+ messages
2017-02-03  8:20 [PATCH 0/5] raid56: variant bug fixes Qu Wenruo
2017-02-03  8:20 ` [PATCH 1/5] btrfs: scrub: Introduce full stripe lock for RAID56 Qu Wenruo
2017-02-03  8:20 ` [PATCH 2/5] btrfs: scrub: Fix RAID56 recovery race condition Qu Wenruo
2017-02-03  8:20 ` [PATCH 3/5] btrfs: raid56: Use correct stolen pages to calculate P/Q Qu Wenruo
2017-03-16  5:36   ` Liu Bo
2017-03-16  8:30     ` Qu Wenruo
2017-03-17  6:31     ` Qu Wenruo
2017-03-17 22:19       ` Liu Bo
2017-03-20  4:33         ` Qu Wenruo
2017-02-03  8:20 ` [PATCH 4/5] btrfs: raid56: Don't keep rbio for later steal Qu Wenruo
2017-03-17  4:44   ` Liu Bo
2017-03-17  5:28     ` Qu Wenruo
2017-03-18  2:03       ` Liu Bo
2017-03-20  6:21         ` Qu Wenruo
2017-03-20 20:23           ` Liu Bo
2017-03-21  0:44             ` Qu Wenruo
2017-03-21  2:08               ` Liu Bo
2017-03-21  2:23                 ` Qu Wenruo
2017-03-21  5:45                   ` Liu Bo
2017-03-21  7:00                     ` Qu Wenruo
2017-02-03  8:20 ` Qu Wenruo [this message]
2017-03-18  2:12   ` [PATCH 5/5] btrfs: replace: Use ref counts to avoid destroying target device when canceled Liu Bo
2017-03-20  6:30     ` Qu Wenruo
2017-03-20 19:31       ` Liu Bo
2017-03-07  3:48 ` [PATCH 0/5] raid56: variant bug fixes Qu Wenruo
2017-03-14 13:47   ` David Sterba
2017-03-14 21:30     ` Goffredo Baroncelli
