[PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted
@ 2014-07-03 10:22 Miao Xie
  2014-07-03 10:22 ` [PATCH V2 2/9] btrfs: check generation as replace duplicates devid+uuid Miao Xie
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Anand Jain, Wang Shilong

From: Anand Jain <anand.jain@oracle.com>

device_list_add() is called when user runs btrfs dev scan, which would add
any btrfs device into the btrfs_fs_devices list.

Now think of a mounted btrfs. And a new device which contains the a SB
from the mounted btrfs devices.

In this situation when user runs btrfs dev scan, the current code would
just replace existing device with the new device.

Which is to note that old device is neither closed nor gracefully
removed from the btrfs.

The FS is still operational with the old bdev however the device name
is the btrfs_device is new which is provided by the btrfs dev scan.

reproducer:

devmgt[1] detach /dev/sdc

replace the missing disk /dev/sdc

btrfs rep start -f 1 /dev/sde /btrfs
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
        Total devices 2 FS bytes used 32.00KiB
        devid    1 size 958.94MiB used 115.88MiB path /dev/sde
        devid    2 size 958.94MiB used 103.88MiB path /dev/sdd

make /dev/sdc to reappear

devmgt attach host2

btrfs dev scan

btrfs fi show -m
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120^M
        Total devices 2 FS bytes used 32.00KiB^M
        devid    1 size 958.94MiB used 115.88MiB path /dev/sdc <- Wrong.
        devid    2 size 958.94MiB used 103.88MiB path /dev/sdd

since /dev/sdc has been replaced with /dev/sde, the /dev/sdc shouldn't be
part of the btrfs-fsid when it reappears. If user want it to be part of it
then sys admin should be using btrfs device add instead.

[1] github.com/anajain/devmgt.git

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v3->v4:
- Fix the over-80-charactor problem
---
 fs/btrfs/volumes.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a9c11a0..16e71a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -508,6 +508,33 @@ static noinline int device_list_add(const char *path,
 		ret = 1;
 		device->fs_devices = fs_devices;
 	} else if (!device->name || strcmp(device->name->str, path)) {
+		/*
+		 * When FS is already mounted.
+		 * 1. If you are here and if the device->name is NULL that
+		 *    means this device was missing at time of FS mount.
+		 * 2. If you are here and if the device->name is different
+		 *    from 'path' that means either
+		 *      a. The same device disappeared and reappeared with
+		 *         different name. or
+		 *      b. The missing-disk-which-was-replaced, has
+		 *         reappeared now.
+		 *
+		 * We must allow 1 and 2a above. But 2b would be a spurious
+		 * and unintentional.
+		 *
+		 * Further in case of 1 and 2a above, the disk at 'path'
+		 * would have missed some transaction when it was away and
+		 * in case of 2a the stale bdev has to be updated as well.
+		 * 2b must not be allowed at all time.
+		 */
+
+		/*
+		 * As of now don't allow update to btrfs_fs_device through
+		 * the btrfs dev scan cli, after FS has been mounted.
+		 */
+		if (fs_devices->opened)
+			return -EBUSY;
+
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name)
 			return -ENOMEM;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V2 2/9] btrfs: check generation as replace duplicates devid+uuid
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-03 10:22 ` [PATCH RESEND 3/9] Btrfs: make defragment work with nodatacow option Miao Xie
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Anand Jain, Wang Shilong

From: Anand Jain <anand.jain@oracle.com>

When FS in unmounted we need to check generation number as well
since devid+uuid combination could match with the missing replaced
disk when it reappears, and without this patch it might pair with
the replaced disk again.

 device_list_add() function is called in the following threads,
	mount device option
	mount argument
	ioctl BTRFS_IOC_SCAN_DEV (btrfs dev scan)
	ioctl BTRFS_IOC_DEVICES_READY (btrfs dev ready <dev>)
 they have been unit tested to work fine with this patch.

 If the user knows what he is doing and really want to pair with
 replaced disk (which is not a standard operation), then he should
 first clear the kernel btrfs device list in the memory by doing
 the module unload/load and followed with the mount -o device option.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1->v2:
- Fix the over-80-charactor problem and unreasonable error number
---
 fs/btrfs/volumes.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 16e71a1..1891541 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -532,8 +532,19 @@ static noinline int device_list_add(const char *path,
 		 * As of now don't allow update to btrfs_fs_device through
 		 * the btrfs dev scan cli, after FS has been mounted.
 		 */
-		if (fs_devices->opened)
+		if (fs_devices->opened) {
 			return -EBUSY;
+		} else {
+			/*
+			 * That is if the FS is _not_ mounted and if you
+			 * are here, that means there is more than one
+			 * disk with same uuid and devid.We keep the one
+			 * with larger generation number or the last-in if
+			 * generation are equal.
+			 */
+			if (found_transid < device->generation)
+				return -EEXIST;
+		}
 
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name)
@@ -546,6 +557,15 @@ static noinline int device_list_add(const char *path,
 		}
 	}
 
+	/*
+	 * Unmount does not free the btrfs_device struct but would zero
+	 * generation along with most of the other members. So just update
+	 * it back. We need it to pick the disk with largest generation
+	 * (as above).
+	 */
+	if (!fs_devices->opened)
+		device->generation = found_transid;
+
 	if (found_transid > fs_devices->latest_trans) {
 		fs_devices->latest_devid = devid;
 		fs_devices->latest_trans = found_transid;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RESEND 3/9] Btrfs: make defragment work with nodatacow option
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
  2014-07-03 10:22 ` [PATCH V2 2/9] btrfs: check generation as replace duplicates devid+uuid Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-03 10:22 ` [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail Miao Xie
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Wang Shilong

From: Wang Shilong <wangsl.fnst@cn.fujitsu.com>

Btrfs defragment will utilize COW feature, which means this
did not work for nodatacow option, this problem was detected
by xfstests generic/018 with nodatacow mount option.

Fix this problem by forcing cow for a extent with state
@EXTETN_DEFRAG setting.

Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/btrfs_inode.h |  6 ++++++
 fs/btrfs/inode.c       | 39 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index a0cf3e5..01cfcba 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -127,6 +127,12 @@ struct btrfs_inode {
 	u64 delalloc_bytes;
 
 	/*
+	 * total number of bytes pending defrag, used by stat to check whether
+	 * it needs COW.
+	 */
+	u64 defrag_bytes;
+
+	/*
 	 * the size of the file stored in the metadata on disk.  data=ordered
 	 * means the in-memory i_size might be larger than the size on disk
 	 * because not all the blocks are written yet.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6b65fab..a616fa4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1425,6 +1425,26 @@ error:
 	return ret;
 }
 
+static inline int need_force_cow(struct inode *inode, u64 start, u64 end)
+{
+
+	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
+	    !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC))
+		return 0;
+
+	/*
+	 * @defrag_bytes is a hint value, no spinlock held here,
+	 * if is not zero, it means the file is defragging.
+	 * Force cow if given extent needs to be defragged.
+	 */
+	if (BTRFS_I(inode)->defrag_bytes &&
+	    test_range_bit(&BTRFS_I(inode)->io_tree, start, end,
+			   EXTENT_DEFRAG, 0, NULL))
+		return 1;
+
+	return 0;
+}
+
 /*
  * extent_io.c call back to do delayed allocation processing
  */
@@ -1434,11 +1454,12 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 {
 	int ret;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int force_cow = need_force_cow(inode, start, end);
 
-	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) {
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 1, nr_written);
-	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC) {
+	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
 	} else if (!btrfs_test_opt(root, COMPRESS) &&
@@ -1535,6 +1556,8 @@ static void btrfs_set_bit_hook(struct inode *inode,
 			       struct extent_state *state, unsigned long *bits)
 {
 
+	if ((*bits & EXTENT_DEFRAG) && !(*bits & EXTENT_DELALLOC))
+		WARN_ON(1);
 	/*
 	 * set_bit and clear bit hooks normally require _irqsave/restore
 	 * but in this case, we are only testing for the DELALLOC
@@ -1557,6 +1580,8 @@ static void btrfs_set_bit_hook(struct inode *inode,
 				     root->fs_info->delalloc_batch);
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->delalloc_bytes += len;
+		if (*bits & EXTENT_DEFRAG)
+			BTRFS_I(inode)->defrag_bytes += len;
 		if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST,
 					 &BTRFS_I(inode)->runtime_flags))
 			btrfs_add_delalloc_inodes(root, inode);
@@ -1571,6 +1596,13 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 				 struct extent_state *state,
 				 unsigned long *bits)
 {
+	u64 len = state->end + 1 - state->start;
+
+	spin_lock(&BTRFS_I(inode)->lock);
+	if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG))
+		BTRFS_I(inode)->defrag_bytes -= len;
+	spin_unlock(&BTRFS_I(inode)->lock);
+
 	/*
 	 * set_bit and clear bit hooks normally require _irqsave/restore
 	 * but in this case, we are only testing for the DELALLOC
@@ -1578,7 +1610,6 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 	 */
 	if ((state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
 		struct btrfs_root *root = BTRFS_I(inode)->root;
-		u64 len = state->end + 1 - state->start;
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
 		if (*bits & EXTENT_FIRST_DELALLOC) {
@@ -8089,6 +8120,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->last_sub_trans = 0;
 	ei->logged_trans = 0;
 	ei->delalloc_bytes = 0;
+	ei->defrag_bytes = 0;
 	ei->disk_i_size = 0;
 	ei->flags = 0;
 	ei->csum_bytes = 0;
@@ -8148,6 +8180,7 @@ void btrfs_destroy_inode(struct inode *inode)
 	WARN_ON(BTRFS_I(inode)->reserved_extents);
 	WARN_ON(BTRFS_I(inode)->delalloc_bytes);
 	WARN_ON(BTRFS_I(inode)->csum_bytes);
+	WARN_ON(BTRFS_I(inode)->defrag_bytes);
 
 	/*
 	 * This can happen where we create an inode, but somebody else also
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
  2014-07-03 10:22 ` [PATCH V2 2/9] btrfs: check generation as replace duplicates devid+uuid Miao Xie
  2014-07-03 10:22 ` [PATCH RESEND 3/9] Btrfs: make defragment work with nodatacow option Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-04  1:58   ` Satoru Takeuchi
  2014-07-03 10:22 ` [PATCH RESEND 5/9] Btrfs: fix missing error handler if submiting re-read bio fails Miao Xie
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs

The caller of btrfs_submit_direct_hook() will put the original dio bio
when btrfs_submit_direct_hook() return a error number, so we needn't
put the original bio in btrfs_submit_direct_hook().

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a616fa4..15902eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7325,10 +7325,8 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	map_length = orig_bio->bi_iter.bi_size;
 	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
 			      &map_length, NULL, 0);
-	if (ret) {
-		bio_put(orig_bio);
+	if (ret)
 		return -EIO;
-	}
 
 	if (map_length >= orig_bio->bi_iter.bi_size) {
 		bio = orig_bio;
@@ -7345,6 +7343,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
 	if (!bio)
 		return -ENOMEM;
+
 	bio->bi_private = dip;
 	bio->bi_end_io = btrfs_end_dio_bio;
 	atomic_inc(&dip->pending_bios);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RESEND 5/9] Btrfs: fix missing error handler if submiting re-read bio fails
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
                   ` (2 preceding siblings ...)
  2014-07-03 10:22 ` [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-03 10:22 ` [PATCH RESEND 6/9] Btrfs: cleanup the read failure record after write or when the inode is freeing Miao Xie
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs

We forgot to free failure record and bio after submitting re-read bio failed,
fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/extent_io.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 23398ad..3a64354 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2345,6 +2345,11 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 	ret = tree->ops->submit_bio_hook(inode, read_mode, bio,
 					 failrec->this_mirror,
 					 failrec->bio_flags, 0);
+	if (ret) {
+		free_io_failure(inode, failrec, 0);
+		bio_put(bio);
+	}
+
 	return ret;
 }
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH RESEND 6/9] Btrfs: cleanup the read failure record after write or when the inode is freeing
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
                   ` (3 preceding siblings ...)
  2014-07-03 10:22 ` [PATCH RESEND 5/9] Btrfs: fix missing error handler if submiting re-read bio fails Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-03 10:22 ` [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null Miao Xie
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs

After the data is written successfully, we should cleanup the read failure record
in that range because
- If we set data COW for the file, the range that the failure record pointed to is
  mapped to a new place, so it is invalid.
- If we set no data COW for the file, and if there is no error during writting,
  the corrupted data is corrected, so the failure record can be removed. And if
  some errors happen on the mirrors, we also needn't worry about it because the
  failure record will be recreated if we read the same place again.

Sometimes, we may fail to correct the data, so the failure records will be left
in the tree, we need free them when we free the inode or the memory leak happens.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/extent_io.c | 34 ++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_io.h |  1 +
 fs/btrfs/inode.c     |  6 ++++++
 3 files changed, 41 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3a64354..d67bb4f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2003,6 +2003,40 @@ static int free_io_failure(struct inode *inode, struct io_failure_record *rec,
 }
 
 /*
+ * Can be called when
+ * - hold extent lock
+ * - under ordered extent
+ * - the inode is freeing
+ */
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end)
+{
+	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	struct io_failure_record *failrec;
+	struct extent_state *state, *next;
+
+	if (RB_EMPTY_ROOT(&failure_tree->state))
+		return;
+
+	spin_lock(&failure_tree->lock);
+	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
+	while (state) {
+		if (state->start > end)
+			break;
+
+		ASSERT(state->end <= end);
+
+		next = next_state(state);
+
+		failrec = (struct io_failure_record *)state->private;
+		free_extent_state(state);
+		kfree(failrec);
+
+		state = next;
+	}
+	spin_unlock(&failure_tree->lock);
+}
+
+/*
  * this bypasses the standard btrfs submit functions deliberately, as
  * the standard behavior is to write all copies in a raid setup. here we only
  * want to write the one bad copy. so we do the mapping for ourselves and issue
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index ccc264e..d06780b 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -347,6 +347,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
 int end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int repair_eb_io_failure(struct btrfs_root *root, struct extent_buffer *eb,
 			 int mirror_num);
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 noinline u64 find_lock_delalloc_range(struct inode *inode,
 				      struct extent_io_tree *tree,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 15902eb..b431c58 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2670,6 +2670,10 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
+	btrfs_free_io_failure_record(inode, ordered_extent->file_offset,
+				     ordered_extent->file_offset +
+				     ordered_extent->len - 1);
+
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
 		truncated = true;
 		logical_len = ordered_extent->truncated_len;
@@ -4745,6 +4749,8 @@ void btrfs_evict_inode(struct inode *inode)
 	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
 	btrfs_wait_ordered_range(inode, 0, (u64)-1);
 
+	btrfs_free_io_failure_record(inode, 0, (u64)-1);
+
 	if (root->fs_info->log_root_recovering) {
 		BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 				 &BTRFS_I(inode)->runtime_flags));
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
                   ` (4 preceding siblings ...)
  2014-07-03 10:22 ` [PATCH RESEND 6/9] Btrfs: cleanup the read failure record after write or when the inode is freeing Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-07  4:04   ` Anand Jain
  2014-07-03 10:22 ` [PATCH 8/9] Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs Miao Xie
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Anand Jain

From: Anand Jain <Anand.Jain@oracle.com>

when one of the device path is missing btrfs_device name is null. So this
patch will check for that.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff812e18c0>] strlen+0x0/0x30
[<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
[<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
[<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
[<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
[<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
[<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
[<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
[<ffffffff81193426>] ? blkdev_put+0x106/0x110
[<ffffffff81179175>] ? mntput+0x35/0x40
[<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
[<ffffffff8115c72e>] ? ____fput+0xe/0x10
[<ffffffff81068033>] ? task_work_run+0xb3/0xd0
[<ffffffff8116d547>] SyS_ioctl+0x57/0x90
[<ffffffff817a793e>] ? do_page_fault+0xe/0x10
[<ffffffff817abe52>] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
Changelog v1->v2:
- Fix the problem that we forgot to set the missing flag for the cloned device
---
 fs/btrfs/volumes.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1891541..4731bd6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 		if (IS_ERR(device))
 			goto error;
 
-		/*
-		 * This is ok to do without rcu read locked because we hold the
-		 * uuid mutex so nothing we touch in here is going to disappear.
-		 */
-		name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
-		if (!name) {
-			kfree(device);
-			goto error;
+		if (orig_dev->missing) {
+			device->missing = 1;
+			fs_devices->missing_devices++;
+		} else {
+			ASSERT(orig_dev->name);
+			/*
+			 * This is ok to do without rcu read locked because
+			 * we hold the uuid mutex so nothing we touch in here
+			 * is going to disappear.
+			 */
+			name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
+			if (!name) {
+				kfree(device);
+				goto error;
+			}
+			rcu_assign_pointer(device->name, name);
 		}
-		rcu_assign_pointer(device->name, name);
 
 		list_add(&device->dev_list, &fs_devices->devices);
 		device->fs_devices = fs_devices;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 8/9] Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
                   ` (5 preceding siblings ...)
  2014-07-03 10:22 ` [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-03 10:22 ` [PATCH 9/9] Btrfs: fix writing data into the seed filesystem Miao Xie
  2014-07-04  8:07 ` [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Satoru Takeuchi
  8 siblings, 0 replies; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs

We forgot to zero some members in fs_devices when we create new fs_devices
from the one of the seed fs. It would cause the problem that we got wrong
chunk profile when allocating chunks. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4731bd6..73a82e5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1993,6 +1993,9 @@ static int btrfs_prepare_sprout(struct btrfs_root *root)
 	fs_devices->seeding = 0;
 	fs_devices->num_devices = 0;
 	fs_devices->open_devices = 0;
+	fs_devices->missing_devices = 0;
+	fs_devices->num_can_discard = 0;
+	fs_devices->rotating = 0;
 	fs_devices->seed = seed_devices;
 
 	generate_random_uuid(fs_devices->fsid);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 9/9] Btrfs: fix writing data into the seed filesystem
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
                   ` (6 preceding siblings ...)
  2014-07-03 10:22 ` [PATCH 8/9] Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs Miao Xie
@ 2014-07-03 10:22 ` Miao Xie
  2014-07-04  3:17   ` Liu Bo
  2014-07-04  8:07 ` [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Satoru Takeuchi
  8 siblings, 1 reply; 16+ messages in thread
From: Miao Xie @ 2014-07-03 10:22 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Josef Bacik

If we mounted a seed filesystem with degraded option, and then added a new
device into the seed filesystem, then we found adding device failed because
of the IO failure.

Steps to reproduce:
 # mkfs.btrfs -d raid1 -m raid1 <dev0> <dev1>
 # btrfstune -S 1 <dev0>
 # mount <dev0> -o degraded <mnt>
 # btrfs device add -f <dev2> <mnt>

It is because the original didn't set the chunk on the seed device to be
read-only if the degraded flag was set. It was introduced by patch f48b90756,
which fixed the problem the raid1 filesystem became read-only after one device
of it was missing. But this fix method was not right, we should set the read-only
flag according to the number of the missing devices, not the degraded mount
option, if the number of the missing devices is less than the max error number
that the profile of the chunk tolerates, we don't set it to be read-only.

Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 52 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73a82e5..daecfa5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4584,12 +4584,31 @@ out:
 	return ret;
 }
 
+static inline int btrfs_chunk_max_errors(struct map_lookup *map)
+{
+	int max_errors;
+
+	if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID10 |
+			 BTRFS_BLOCK_GROUP_RAID5 |
+			 BTRFS_BLOCK_GROUP_DUP)) {
+		max_errors = 1;
+	} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
+		max_errors = 2;
+	} else {
+		max_errors = 0;
+	}
+
+	return max_errors;
+}
+
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
 	struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree;
 	int readonly = 0;
+	int miss_ndevs = 0;
 	int i;
 
 	read_lock(&map_tree->map_tree.lock);
@@ -4598,18 +4617,27 @@ int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
 	if (!em)
 		return 1;
 
-	if (btrfs_test_opt(root, DEGRADED)) {
-		free_extent_map(em);
-		return 0;
-	}
-
 	map = (struct map_lookup *)em->bdev;
 	for (i = 0; i < map->num_stripes; i++) {
+		if (map->stripes[i].dev->missing) {
+			miss_ndevs++;
+			continue;
+		}
+
 		if (!map->stripes[i].dev->writeable) {
 			readonly = 1;
-			break;
+			goto end;
 		}
 	}
+
+	/*
+	 * If the number of missing devices is larger than max errors,
+	 * we can not write the data into that chunk successfully, so
+	 * set it readonly.
+	 */
+	if (miss_ndevs > btrfs_chunk_max_errors(map))
+		readonly = 1;
+end:
 	free_extent_map(em);
 	return readonly;
 }
@@ -5220,16 +5248,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 		}
 	}
 
-	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS)) {
-		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
-				 BTRFS_BLOCK_GROUP_RAID10 |
-				 BTRFS_BLOCK_GROUP_RAID5 |
-				 BTRFS_BLOCK_GROUP_DUP)) {
-			max_errors = 1;
-		} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
-			max_errors = 2;
-		}
-	}
+	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS))
+		max_errors = btrfs_chunk_max_errors(map);
 
 	if (dev_replace_is_ongoing && (rw & (REQ_WRITE | REQ_DISCARD)) &&
 	    dev_replace->tgtdev != NULL) {
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail
  2014-07-03 10:22 ` [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail Miao Xie
@ 2014-07-04  1:58   ` Satoru Takeuchi
  0 siblings, 0 replies; 16+ messages in thread
From: Satoru Takeuchi @ 2014-07-04  1:58 UTC (permalink / raw)
  To: Miao Xie, linux-btrfs

Hi Miao,

(2014/07/03 19:22), Miao Xie wrote:
> The caller of btrfs_submit_direct_hook() will put the original dio bio
> when btrfs_submit_direct_hook() return a error number, so we needn't
> put the original bio in btrfs_submit_direct_hook().
> 
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>

Here is the review result, CMIIW.

call trace:
  btrfs_submit_direct
    -> btrfs_submit_direct_hook

fs/btrfs/inode.c:
===============================================================================
static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
				    int skip_sum)
{
	...
	struct bio *orig_bio = dip->orig_bio;
	...
	map_length = orig_bio->bi_iter.bi_size;
	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
			      &map_length, NULL, 0);
	if (ret) {
		bio_put(orig_bio); # (1)
		return -EIO;
	}
	...
}
...
static void btrfs_submit_direct(int rw, struct bio *dio_bio,
				struct inode *inode, loff_t file_offset)
{
	dip = kmalloc(sizeof(*dip) + sum_len, GFP_NOFS);
	if (!dip) {
		ret = -ENOMEM;
		goto free_io_bio; # (2)
	}
	...
	ret = btrfs_submit_direct_hook(rw, dip, skip_sum);
	if (!ret)
		return;

free_io_bio:
	bio_put(io_bio); # (3)
	...
}
===============================================================================

If btrfs_map_block() fails in btrfs_submit_direct_hook(), it put
orig_bio at (1) and return -EIO. Then caller, btrfs_submit_direct()
free the same bio at (3). Since (3) is also used for other error
handling (2), I consider your way, removing (1), is better.

Thanks,
Satoru

> ---
>   fs/btrfs/inode.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a616fa4..15902eb 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7325,10 +7325,8 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
>   	map_length = orig_bio->bi_iter.bi_size;
>   	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
>   			      &map_length, NULL, 0);
> -	if (ret) {
> -		bio_put(orig_bio);
> +	if (ret)
>   		return -EIO;
> -	}
>   
>   	if (map_length >= orig_bio->bi_iter.bi_size) {
>   		bio = orig_bio;
> @@ -7345,6 +7343,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
>   	bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
>   	if (!bio)
>   		return -ENOMEM;
> +
>   	bio->bi_private = dip;
>   	bio->bi_end_io = btrfs_end_dio_bio;
>   	atomic_inc(&dip->pending_bios);
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 9/9] Btrfs: fix writing data into the seed filesystem
  2014-07-03 10:22 ` [PATCH 9/9] Btrfs: fix writing data into the seed filesystem Miao Xie
@ 2014-07-04  3:17   ` Liu Bo
  0 siblings, 0 replies; 16+ messages in thread
From: Liu Bo @ 2014-07-04  3:17 UTC (permalink / raw)
  To: Miao Xie; +Cc: linux-btrfs, Josef Bacik

On Thu, Jul 03, 2014 at 06:22:13PM +0800, Miao Xie wrote:
> If we mounted a seed filesystem with degraded option, and then added a new
> device into the seed filesystem, then we found adding device failed because
> of the IO failure.
> 
> Steps to reproduce:
>  # mkfs.btrfs -d raid1 -m raid1 <dev0> <dev1>
>  # btrfstune -S 1 <dev0>
>  # mount <dev0> -o degraded <mnt>
>  # btrfs device add -f <dev2> <mnt>
> 
> It is because the original didn't set the chunk on the seed device to be
> read-only if the degraded flag was set. It was introduced by patch f48b90756,
> which fixed the problem the raid1 filesystem became read-only after one device
> of it was missing. But this fix method was not right, we should set the read-only
> flag according to the number of the missing devices, not the degraded mount
> option, if the number of the missing devices is less than the max error number
> that the profile of the chunk tolerates, we don't set it to be read-only.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

-liubo

> 
> Cc: Josef Bacik <jbacik@fb.com>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
>  fs/btrfs/volumes.c | 52 ++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 36 insertions(+), 16 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 73a82e5..daecfa5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4584,12 +4584,31 @@ out:
>  	return ret;
>  }
>  
> +static inline int btrfs_chunk_max_errors(struct map_lookup *map)
> +{
> +	int max_errors;
> +
> +	if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
> +			 BTRFS_BLOCK_GROUP_RAID10 |
> +			 BTRFS_BLOCK_GROUP_RAID5 |
> +			 BTRFS_BLOCK_GROUP_DUP)) {
> +		max_errors = 1;
> +	} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
> +		max_errors = 2;
> +	} else {
> +		max_errors = 0;
> +	}
> +
> +	return max_errors;
> +}
> +
>  int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
>  {
>  	struct extent_map *em;
>  	struct map_lookup *map;
>  	struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree;
>  	int readonly = 0;
> +	int miss_ndevs = 0;
>  	int i;
>  
>  	read_lock(&map_tree->map_tree.lock);
> @@ -4598,18 +4617,27 @@ int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
>  	if (!em)
>  		return 1;
>  
> -	if (btrfs_test_opt(root, DEGRADED)) {
> -		free_extent_map(em);
> -		return 0;
> -	}
> -
>  	map = (struct map_lookup *)em->bdev;
>  	for (i = 0; i < map->num_stripes; i++) {
> +		if (map->stripes[i].dev->missing) {
> +			miss_ndevs++;
> +			continue;
> +		}
> +
>  		if (!map->stripes[i].dev->writeable) {
>  			readonly = 1;
> -			break;
> +			goto end;
>  		}
>  	}
> +
> +	/*
> +	 * If the number of missing devices is larger than max errors,
> +	 * we can not write the data into that chunk successfully, so
> +	 * set it readonly.
> +	 */
> +	if (miss_ndevs > btrfs_chunk_max_errors(map))
> +		readonly = 1;
> +end:
>  	free_extent_map(em);
>  	return readonly;
>  }
> @@ -5220,16 +5248,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
>  		}
>  	}
>  
> -	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS)) {
> -		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
> -				 BTRFS_BLOCK_GROUP_RAID10 |
> -				 BTRFS_BLOCK_GROUP_RAID5 |
> -				 BTRFS_BLOCK_GROUP_DUP)) {
> -			max_errors = 1;
> -		} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
> -			max_errors = 2;
> -		}
> -	}
> +	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS))
> +		max_errors = btrfs_chunk_max_errors(map);
>  
>  	if (dev_replace_is_ongoing && (rw & (REQ_WRITE | REQ_DISCARD)) &&
>  	    dev_replace->tgtdev != NULL) {
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted
  2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
                   ` (7 preceding siblings ...)
  2014-07-03 10:22 ` [PATCH 9/9] Btrfs: fix writing data into the seed filesystem Miao Xie
@ 2014-07-04  8:07 ` Satoru Takeuchi
  8 siblings, 0 replies; 16+ messages in thread
From: Satoru Takeuchi @ 2014-07-04  8:07 UTC (permalink / raw)
  To: Miao Xie, linux-btrfs; +Cc: Anand Jain, Wang Shilong

Hi Miao, Anand,

(2014/07/03 19:22), Miao Xie wrote:
> From: Anand Jain <anand.jain@oracle.com>
> 
> device_list_add() is called when user runs btrfs dev scan, which would add
> any btrfs device into the btrfs_fs_devices list.

Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>

Thanks,
Satoru

> 
> Now think of a mounted btrfs. And a new device which contains the a SB
> from the mounted btrfs devices.
> 
> In this situation when user runs btrfs dev scan, the current code would
> just replace existing device with the new device.
> 
> Which is to note that old device is neither closed nor gracefully
> removed from the btrfs.
> 
> The FS is still operational with the old bdev however the device name
> is the btrfs_device is new which is provided by the btrfs dev scan.
> 
> reproducer:
> 
> devmgt[1] detach /dev/sdc
> 
> replace the missing disk /dev/sdc
> 
> btrfs rep start -f 1 /dev/sde /btrfs
> Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
>          Total devices 2 FS bytes used 32.00KiB
>          devid    1 size 958.94MiB used 115.88MiB path /dev/sde
>          devid    2 size 958.94MiB used 103.88MiB path /dev/sdd
> 
> make /dev/sdc to reappear
> 
> devmgt attach host2
> 
> btrfs dev scan
> 
> btrfs fi show -m
> Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120^M
>          Total devices 2 FS bytes used 32.00KiB^M
>          devid    1 size 958.94MiB used 115.88MiB path /dev/sdc <- Wrong.
>          devid    2 size 958.94MiB used 103.88MiB path /dev/sdd
> 
> since /dev/sdc has been replaced with /dev/sde, the /dev/sdc shouldn't be
> part of the btrfs-fsid when it reappears. If user want it to be part of it
> then sys admin should be using btrfs device add instead.
> 
> [1] github.com/anajain/devmgt.git
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
> Changelog v3->v4:
> - Fix the over-80-charactor problem
> ---
>   fs/btrfs/volumes.c | 27 +++++++++++++++++++++++++++
>   1 file changed, 27 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index a9c11a0..16e71a1 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -508,6 +508,33 @@ static noinline int device_list_add(const char *path,
>   		ret = 1;
>   		device->fs_devices = fs_devices;
>   	} else if (!device->name || strcmp(device->name->str, path)) {
> +		/*
> +		 * When FS is already mounted.
> +		 * 1. If you are here and if the device->name is NULL that
> +		 *    means this device was missing at time of FS mount.
> +		 * 2. If you are here and if the device->name is different
> +		 *    from 'path' that means either
> +		 *      a. The same device disappeared and reappeared with
> +		 *         different name. or
> +		 *      b. The missing-disk-which-was-replaced, has
> +		 *         reappeared now.
> +		 *
> +		 * We must allow 1 and 2a above. But 2b would be a spurious
> +		 * and unintentional.
> +		 *
> +		 * Further in case of 1 and 2a above, the disk at 'path'
> +		 * would have missed some transaction when it was away and
> +		 * in case of 2a the stale bdev has to be updated as well.
> +		 * 2b must not be allowed at all time.
> +		 */
> +
> +		/*
> +		 * As of now don't allow update to btrfs_fs_device through
> +		 * the btrfs dev scan cli, after FS has been mounted.
> +		 */
> +		if (fs_devices->opened)
> +			return -EBUSY;
> +
>   		name = rcu_string_strdup(path, GFP_NOFS);
>   		if (!name)
>   			return -ENOMEM;
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
  2014-07-03 10:22 ` [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null Miao Xie
@ 2014-07-07  4:04   ` Anand Jain
  2014-07-07  4:22     ` Miao Xie
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2014-07-07  4:04 UTC (permalink / raw)
  To: Miao Xie; +Cc: linux-btrfs



On 03/07/2014 18:22, Miao Xie wrote:
> From: Anand Jain <Anand.Jain@oracle.com>
>
> when one of the device path is missing btrfs_device name is null. So this
> patch will check for that.
>
> stack:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> IP: [<ffffffff812e18c0>] strlen+0x0/0x30
> [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
> [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
> [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
> [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
> [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
> [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
> [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
> [<ffffffff81193426>] ? blkdev_put+0x106/0x110
> [<ffffffff81179175>] ? mntput+0x35/0x40
> [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
> [<ffffffff8115c72e>] ? ____fput+0xe/0x10
> [<ffffffff81068033>] ? task_work_run+0xb3/0xd0
> [<ffffffff8116d547>] SyS_ioctl+0x57/0x90
> [<ffffffff817a793e>] ? do_page_fault+0xe/0x10
> [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b
>
> reproducer:
> mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
> btrfstune -S 1 /dev/sdg1
> modprobe -r btrfs && modprobe btrfs
> mount -o degraded /dev/sdg1 /btrfs
> btrfs dev add /dev/sdg3 /btrfs
>
> Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
> ---
> Changelog v1->v2:
> - Fix the problem that we forgot to set the missing flag for the cloned device
> ---
>   fs/btrfs/volumes.c | 25 ++++++++++++++++---------
>   1 file changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 1891541..4731bd6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>   		if (IS_ERR(device))
>   			goto error;
>
> -		/*
> -		 * This is ok to do without rcu read locked because we hold the
> -		 * uuid mutex so nothing we touch in here is going to disappear.
> -		 */
> -		name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
> -		if (!name) {
> -			kfree(device);
> -			goto error;
> +		if (orig_dev->missing) {
> +			device->missing = 1;
> +			fs_devices->missing_devices++;

  as mentioned in some places we just check name (for missing device)
  and  don't set the missing flag so it better to ..

  if (orig_dev->missing || !orig_dev->name) {
			device->missing = 1;
			fs_devices->missing_devices++;

> +		} else {
> +			ASSERT(orig_dev->name);
> +			/*
> +			 * This is ok to do without rcu read locked because
> +			 * we hold the uuid mutex so nothing we touch in here
> +			 * is going to disappear.
> +			 */
> +			name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
> +			if (!name) {
> +				kfree(device);
> +				goto error;
> +			}
> +			rcu_assign_pointer(device->name, name);
>   		}
> -		rcu_assign_pointer(device->name, name);
>
>   		list_add(&device->dev_list, &fs_devices->devices);
>   		device->fs_devices = fs_devices;
>

Thanks, Anand

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
  2014-07-07  4:04   ` Anand Jain
@ 2014-07-07  4:22     ` Miao Xie
  2014-07-07  9:56       ` Anand Jain
  0 siblings, 1 reply; 16+ messages in thread
From: Miao Xie @ 2014-07-07  4:22 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Mon, 7 Jul 2014 12:04:09 +0800, Anand Jain wrote:
>> when one of the device path is missing btrfs_device name is null. So this
>> patch will check for that.
>>
>> stack:
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>> IP: [<ffffffff812e18c0>] strlen+0x0/0x30
>> [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
>> [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
>> [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
>> [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
>> [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
>> [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
>> [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
>> [<ffffffff81193426>] ? blkdev_put+0x106/0x110
>> [<ffffffff81179175>] ? mntput+0x35/0x40
>> [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
>> [<ffffffff8115c72e>] ? ____fput+0xe/0x10
>> [<ffffffff81068033>] ? task_work_run+0xb3/0xd0
>> [<ffffffff8116d547>] SyS_ioctl+0x57/0x90
>> [<ffffffff817a793e>] ? do_page_fault+0xe/0x10
>> [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b
>>
>> reproducer:
>> mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
>> btrfstune -S 1 /dev/sdg1
>> modprobe -r btrfs && modprobe btrfs
>> mount -o degraded /dev/sdg1 /btrfs
>> btrfs dev add /dev/sdg3 /btrfs
>>
>> Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>> ---
>> Changelog v1->v2:
>> - Fix the problem that we forgot to set the missing flag for the cloned device
>> ---
>>   fs/btrfs/volumes.c | 25 ++++++++++++++++---------
>>   1 file changed, 16 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 1891541..4731bd6 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>>           if (IS_ERR(device))
>>               goto error;
>>
>> -        /*
>> -         * This is ok to do without rcu read locked because we hold the
>> -         * uuid mutex so nothing we touch in here is going to disappear.
>> -         */
>> -        name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>> -        if (!name) {
>> -            kfree(device);
>> -            goto error;
>> +        if (orig_dev->missing) {
>> +            device->missing = 1;
>> +            fs_devices->missing_devices++;
> 
>  as mentioned in some places we just check name (for missing device)
>  and  don't set the missing flag so it better to ..
> 
>  if (orig_dev->missing || !orig_dev->name) {
>             device->missing = 1;
>             fs_devices->missing_devices++;

I don't think we need check name pointer here because only missing device
doesn't have its own name. Or there is something wrong in the code, so
I add assert in else branch. Am I right?

Thanks
Miao

> 
>> +        } else {
>> +            ASSERT(orig_dev->name);
>> +            /*
>> +             * This is ok to do without rcu read locked because
>> +             * we hold the uuid mutex so nothing we touch in here
>> +             * is going to disappear.
>> +             */
>> +            name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>> +            if (!name) {
>> +                kfree(device);
>> +                goto error;
>> +            }
>> +            rcu_assign_pointer(device->name, name);
>>           }
>> -        rcu_assign_pointer(device->name, name);
>>
>>           list_add(&device->dev_list, &fs_devices->devices);
>>           device->fs_devices = fs_devices;
>>
> 
> Thanks, Anand
> .
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
  2014-07-07  4:22     ` Miao Xie
@ 2014-07-07  9:56       ` Anand Jain
  2014-07-08  2:11         ` Miao Xie
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Jain @ 2014-07-07  9:56 UTC (permalink / raw)
  To: miaox; +Cc: linux-btrfs



On 07/07/2014 12:22, Miao Xie wrote:
> On Mon, 7 Jul 2014 12:04:09 +0800, Anand Jain wrote:
>>> when one of the device path is missing btrfs_device name is null. So this
>>> patch will check for that.
>>>
>>> stack:
>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>>> IP: [<ffffffff812e18c0>] strlen+0x0/0x30
>>> [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
>>> [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
>>> [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
>>> [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
>>> [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
>>> [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
>>> [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
>>> [<ffffffff81193426>] ? blkdev_put+0x106/0x110
>>> [<ffffffff81179175>] ? mntput+0x35/0x40
>>> [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
>>> [<ffffffff8115c72e>] ? ____fput+0xe/0x10
>>> [<ffffffff81068033>] ? task_work_run+0xb3/0xd0
>>> [<ffffffff8116d547>] SyS_ioctl+0x57/0x90
>>> [<ffffffff817a793e>] ? do_page_fault+0xe/0x10
>>> [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b
>>>
>>> reproducer:
>>> mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
>>> btrfstune -S 1 /dev/sdg1
>>> modprobe -r btrfs && modprobe btrfs
>>> mount -o degraded /dev/sdg1 /btrfs
>>> btrfs dev add /dev/sdg3 /btrfs
>>>
>>> Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
>>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>>> ---
>>> Changelog v1->v2:
>>> - Fix the problem that we forgot to set the missing flag for the cloned device
>>> ---
>>>    fs/btrfs/volumes.c | 25 ++++++++++++++++---------
>>>    1 file changed, 16 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 1891541..4731bd6 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>>>            if (IS_ERR(device))
>>>                goto error;
>>>
>>> -        /*
>>> -         * This is ok to do without rcu read locked because we hold the
>>> -         * uuid mutex so nothing we touch in here is going to disappear.
>>> -         */
>>> -        name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>>> -        if (!name) {
>>> -            kfree(device);
>>> -            goto error;
>>> +        if (orig_dev->missing) {
>>> +            device->missing = 1;
>>> +            fs_devices->missing_devices++;
>>
>>   as mentioned in some places we just check name (for missing device)
>>   and  don't set the missing flag so it better to ..
>>
>>   if (orig_dev->missing || !orig_dev->name) {
>>              device->missing = 1;
>>              fs_devices->missing_devices++;
>
> I don't think we need check name pointer here because only missing device
> doesn't have its own name. Or there is something wrong in the code, so
> I add assert in else branch. Am I right?

At few critical code, the below and I guess in the chunk/strips
function as well, we don't make use of missing flag, but rather
->name.

-----
btrfsic_process_superblock
::
                 if (!device->bdev || !device->name)
                         continue;
-----

  But here without !orig_dev->name check, is also good enough.

Thanks, Anand


>>> +        } else {
>>> +            ASSERT(orig_dev->name);
>>> +            /*
>>> +             * This is ok to do without rcu read locked because
>>> +             * we hold the uuid mutex so nothing we touch in here
>>> +             * is going to disappear.
>>> +             */
>>> +            name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>>> +            if (!name) {
>>> +                kfree(device);
>>> +                goto error;
>>> +            }
>>> +            rcu_assign_pointer(device->name, name);
>>>            }
>>> -        rcu_assign_pointer(device->name, name);
>>>
>>>            list_add(&device->dev_list, &fs_devices->devices);
>>>            device->fs_devices = fs_devices;
>>>
>>
>> Thanks, Anand
>> .
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
  2014-07-07  9:56       ` Anand Jain
@ 2014-07-08  2:11         ` Miao Xie
  0 siblings, 0 replies; 16+ messages in thread
From: Miao Xie @ 2014-07-08  2:11 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On Mon, 7 Jul 2014 17:56:13 +0800, Anand Jain wrote:
> 
> 
> On 07/07/2014 12:22, Miao Xie wrote:
>> On Mon, 7 Jul 2014 12:04:09 +0800, Anand Jain wrote:
>>>> when one of the device path is missing btrfs_device name is null. So this
>>>> patch will check for that.
>>>>
>>>> stack:
>>>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
>>>> IP: [<ffffffff812e18c0>] strlen+0x0/0x30
>>>> [<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
>>>> [<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
>>>> [<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
>>>> [<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
>>>> [<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
>>>> [<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
>>>> [<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
>>>> [<ffffffff81193426>] ? blkdev_put+0x106/0x110
>>>> [<ffffffff81179175>] ? mntput+0x35/0x40
>>>> [<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
>>>> [<ffffffff8115c72e>] ? ____fput+0xe/0x10
>>>> [<ffffffff81068033>] ? task_work_run+0xb3/0xd0
>>>> [<ffffffff8116d547>] SyS_ioctl+0x57/0x90
>>>> [<ffffffff817a793e>] ? do_page_fault+0xe/0x10
>>>> [<ffffffff817abe52>] system_call_fastpath+0x16/0x1b
>>>>
>>>> reproducer:
>>>> mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
>>>> btrfstune -S 1 /dev/sdg1
>>>> modprobe -r btrfs && modprobe btrfs
>>>> mount -o degraded /dev/sdg1 /btrfs
>>>> btrfs dev add /dev/sdg3 /btrfs
>>>>
>>>> Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
>>>> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
>>>> ---
>>>> Changelog v1->v2:
>>>> - Fix the problem that we forgot to set the missing flag for the cloned device
>>>> ---
>>>>    fs/btrfs/volumes.c | 25 ++++++++++++++++---------
>>>>    1 file changed, 16 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 1891541..4731bd6 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
>>>>            if (IS_ERR(device))
>>>>                goto error;
>>>>
>>>> -        /*
>>>> -         * This is ok to do without rcu read locked because we hold the
>>>> -         * uuid mutex so nothing we touch in here is going to disappear.
>>>> -         */
>>>> -        name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>>>> -        if (!name) {
>>>> -            kfree(device);
>>>> -            goto error;
>>>> +        if (orig_dev->missing) {
>>>> +            device->missing = 1;
>>>> +            fs_devices->missing_devices++;
>>>
>>>   as mentioned in some places we just check name (for missing device)
>>>   and  don't set the missing flag so it better to ..
>>>
>>>   if (orig_dev->missing || !orig_dev->name) {
>>>              device->missing = 1;
>>>              fs_devices->missing_devices++;
>>
>> I don't think we need check name pointer here because only missing device
>> doesn't have its own name. Or there is something wrong in the code, so
>> I add assert in else branch. Am I right?
> 
> At few critical code, the below and I guess in the chunk/strips
> function as well, we don't make use of missing flag, but rather
> ->name.
> 
> -----
> btrfsic_process_superblock
> ::
>                 if (!device->bdev || !device->name)
>                         continue;
> -----
> 
>  But here without !orig_dev->name check, is also good enough.

Right.
According to the code, only missing device doesn't have its own name,
that is we can check the device is a missing device or not by missing flag
or its name pointer. Maybe we can remove missing flag, check the device
just by its name pointer(In order to make the code be more readable, maybe
we need introduce a function to wrap the missing device check)

Thanks
Miao

> 
> Thanks, Anand
> 
> 
>>>> +        } else {
>>>> +            ASSERT(orig_dev->name);
>>>> +            /*
>>>> +             * This is ok to do without rcu read locked because
>>>> +             * we hold the uuid mutex so nothing we touch in here
>>>> +             * is going to disappear.
>>>> +             */
>>>> +            name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
>>>> +            if (!name) {
>>>> +                kfree(device);
>>>> +                goto error;
>>>> +            }
>>>> +            rcu_assign_pointer(device->name, name);
>>>>            }
>>>> -        rcu_assign_pointer(device->name, name);
>>>>
>>>>            list_add(&device->dev_list, &fs_devices->devices);
>>>>            device->fs_devices = fs_devices;
>>>>
>>>
>>> Thanks, Anand
>>> .
>>>
>>
>> -- 
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> .
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-07-08  2:09 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-03 10:22 [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Miao Xie
2014-07-03 10:22 ` [PATCH V2 2/9] btrfs: check generation as replace duplicates devid+uuid Miao Xie
2014-07-03 10:22 ` [PATCH RESEND 3/9] Btrfs: make defragment work with nodatacow option Miao Xie
2014-07-03 10:22 ` [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail Miao Xie
2014-07-04  1:58   ` Satoru Takeuchi
2014-07-03 10:22 ` [PATCH RESEND 5/9] Btrfs: fix missing error handler if submiting re-read bio fails Miao Xie
2014-07-03 10:22 ` [PATCH RESEND 6/9] Btrfs: cleanup the read failure record after write or when the inode is freeing Miao Xie
2014-07-03 10:22 ` [PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null Miao Xie
2014-07-07  4:04   ` Anand Jain
2014-07-07  4:22     ` Miao Xie
2014-07-07  9:56       ` Anand Jain
2014-07-08  2:11         ` Miao Xie
2014-07-03 10:22 ` [PATCH 8/9] Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs Miao Xie
2014-07-03 10:22 ` [PATCH 9/9] Btrfs: fix writing data into the seed filesystem Miao Xie
2014-07-04  3:17   ` Liu Bo
2014-07-04  8:07 ` [PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted Satoru Takeuchi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.