* [PATCH 0/6] Chunk allocator DUP fix and cleanups
@ 2018-10-04 21:24 Hans van Kranenburg
  2018-10-04 21:24 ` [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes Hans van Kranenburg
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:24 UTC (permalink / raw)
  To: linux-btrfs

This patch set contains an additional fix for a newly exposed bug after
the previous attempt to fix a chunk allocator bug for new DUP chunks:

https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374

The DUP fix is "fix more DUP stripe size handling". I did that one
before starting to change more things so it can be applied to earlier
LTS kernels.

Besides that patch, which fixes the bug in the least intrusive way, I
added a bunch of other patches to help get the chunk allocator code
into a state that is a bit less error-prone and bug-attracting.

When running this and trying the reproduction scenario, I can now see
that the created DUP device extent is 827326464 bytes long, which is
good.

I wrote and tested this on top of Linus' 4.19-rc5. I still need to
create a list of related use cases and test more things, at least
walking through a bunch of obvious use cases to make sure nothing
explodes too quickly with these changes. However, I'm quite confident
about it, so I'm sharing all of it already.

Any feedback and review is appreciated. Be gentle and keep in mind that
I'm still very much in a learning stage regarding kernel development.

The stable patches handling workflow is not 100% clear to me yet. I
guess I have to add a Fixes: tag in the DUP patch which points to the
previous commit 92e222df7b.

Moo!,
Knorrie

Hans van Kranenburg (6):
  btrfs: alloc_chunk: do not refurbish num_bytes
  btrfs: alloc_chunk: improve chunk size variable name
  btrfs: alloc_chunk: fix more DUP stripe size handling
  btrfs: fix ncopies raid_attr for RAID56
  btrfs: introduce nparity raid_attr
  btrfs: alloc_chunk: rework chunk/stripe calculations

 fs/btrfs/volumes.c | 84 +++++++++++++++++++++++-----------------------
 fs/btrfs/volumes.h |  4 ++-
 2 files changed, 45 insertions(+), 43 deletions(-)

-- 
2.19.0.329.g76f2f5c1e3



* [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
@ 2018-10-04 21:24 ` Hans van Kranenburg
  2018-10-05  8:59   ` Nikolay Borisov
  2018-10-04 21:24 ` [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name Hans van Kranenburg
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:24 UTC (permalink / raw)
  To: linux-btrfs

num_bytes is used to store the chunk length of the chunk that we're
allocating. Do not reuse it for something really different in the same
function.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
---
 fs/btrfs/volumes.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f4405e430da6..9c9bb127eeee 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4837,8 +4837,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 		goto error_del_extent;
 
 	for (i = 0; i < map->num_stripes; i++) {
-		num_bytes = map->stripes[i].dev->bytes_used + stripe_size;
-		btrfs_device_set_bytes_used(map->stripes[i].dev, num_bytes);
+		btrfs_device_set_bytes_used(map->stripes[i].dev,
+					    map->stripes[i].dev->bytes_used +
+					    stripe_size);
 	}
 
 	atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space);
-- 
2.19.0.329.g76f2f5c1e3



* [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
  2018-10-04 21:24 ` [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes Hans van Kranenburg
@ 2018-10-04 21:24 ` Hans van Kranenburg
  2018-10-04 21:36   ` Hans van Kranenburg
  2018-10-05  9:00   ` Nikolay Borisov
  2018-10-04 21:24 ` [PATCH 3/6] btrfs: alloc_chunk: fix more DUP stripe size handling Hans van Kranenburg
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:24 UTC (permalink / raw)
  To: linux-btrfs

num_bytes is way too generic a name for a variable in this function.
There are a dozen other variables that hold a number of bytes as their
value.

Give it a name that actually describes what it does, which is holding
the size of the chunk that we're allocating.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
---
 fs/btrfs/volumes.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9c9bb127eeee..40fa85e68b1f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -123,6 +123,8 @@ const char *get_raid_name(enum btrfs_raid_types type)
 	return btrfs_raid_array[type].raid_name;
 }
 
+
+
 static int init_first_rw_device(struct btrfs_trans_handle *trans,
 				struct btrfs_fs_info *fs_info);
 static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
@@ -4599,7 +4601,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	u64 max_stripe_size;
 	u64 max_chunk_size;
 	u64 stripe_size;
-	u64 num_bytes;
+	u64 chunk_size;
 	int ndevs;
 	int i;
 	int j;
@@ -4801,9 +4803,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	map->type = type;
 	map->sub_stripes = sub_stripes;
 
-	num_bytes = stripe_size * data_stripes;
+	chunk_size = stripe_size * data_stripes;
 
-	trace_btrfs_chunk_alloc(info, map, start, num_bytes);
+	trace_btrfs_chunk_alloc(info, map, start, chunk_size);
 
 	em = alloc_extent_map();
 	if (!em) {
@@ -4814,7 +4816,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags);
 	em->map_lookup = map;
 	em->start = start;
-	em->len = num_bytes;
+	em->len = chunk_size;
 	em->block_start = 0;
 	em->block_len = em->len;
 	em->orig_block_len = stripe_size;
@@ -4832,7 +4834,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	refcount_inc(&em->refs);
 	write_unlock(&em_tree->lock);
 
-	ret = btrfs_make_block_group(trans, 0, type, start, num_bytes);
+	ret = btrfs_make_block_group(trans, 0, type, start, chunk_size);
 	if (ret)
 		goto error_del_extent;
 
-- 
2.19.0.329.g76f2f5c1e3



* [PATCH 3/6] btrfs: alloc_chunk: fix more DUP stripe size handling
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
  2018-10-04 21:24 ` [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes Hans van Kranenburg
  2018-10-04 21:24 ` [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name Hans van Kranenburg
@ 2018-10-04 21:24 ` Hans van Kranenburg
  2018-10-04 21:24 ` [PATCH 4/6] btrfs: fix ncopies raid_attr for RAID56 Hans van Kranenburg
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:24 UTC (permalink / raw)
  To: linux-btrfs

Commit 92e222df7b "btrfs: alloc_chunk: fix DUP stripe size handling"
fixed calculating the stripe_size for a new DUP chunk.

However, the same calculation reappears a bit later, and that one was
not changed yet. The resulting bug that is exposed is that the newly
allocated device extents ('stripes') can have a few MiB overlap with the
next thing stored after them, which is another device extent or the end
of the disk.

The scenario in which this can happen is:
* The block device for the filesystem is less than 10GiB in size.
* The amount of contiguous free unallocated disk space chosen to use for
  chunk allocation is 20% of the total device size, or a few MiB more or
  less.

An example:
- The filesystem device is 7880MiB (max_chunk_size gets set to 788MiB)
- There's 1578MiB unallocated raw disk space left in one contiguous
  piece.

In this case stripe_size is first calculated as 789MiB (half of
1578MiB).

Since 789MiB (stripe_size * data_stripes) > 788MiB (max_chunk_size), we
enter the if block. Now the stripe_size value is immediately overwritten
while calculating an adjusted value based on max_chunk_size, which ends
up as 788MiB.

Next, the value is rounded up to a 16MiB boundary, 800MiB, which is
actually more than the value we had before. However, the last comparison
fails to detect this, because it's comparing the value with the total
amount of free space, which is about twice the size of stripe_size.

In the example above, this means that the resulting raw disk space being
allocated is 1600MiB, while only a gap of 1578MiB has been found. The
second device extent object for this DUP chunk will overlap for 22MiB
with whatever comes next.

The underlying problem here is that stripe_size is reused all the time
for different things. So, when entering the code in the if block,
stripe_size is immediately overwritten with something else, and if we
later decide we want the previous value back, the logic to compute it
has to be copy-pasted in again.

With this change, the value in stripe_size is not unnecessarily
destroyed, so the duplicated calculation is not needed any more.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
---
 fs/btrfs/volumes.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 40fa85e68b1f..7045814fc98d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4763,19 +4763,16 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	/*
 	 * Use the number of data stripes to figure out how big this chunk
 	 * is really going to be in terms of logical address space,
-	 * and compare that answer with the max chunk size
+	 * and compare that answer with the max chunk size. If it's higher,
+	 * we try to reduce stripe_size.
 	 */
 	if (stripe_size * data_stripes > max_chunk_size) {
-		stripe_size = div_u64(max_chunk_size, data_stripes);
-
-		/* bump the answer up to a 16MB boundary */
-		stripe_size = round_up(stripe_size, SZ_16M);
-
-		/*
-		 * But don't go higher than the limits we found while searching
-		 * for free extents
+		/* Reduce stripe_size, round it up to a 16MB boundary
+		 * again and then use it, unless it ends up being even
+		 * bigger than the previous value we had already.
 		 */
-		stripe_size = min(devices_info[ndevs - 1].max_avail,
+		stripe_size = min(round_up(div_u64(max_chunk_size,
+						   data_stripes), SZ_16M),
 				  stripe_size);
 	}
 
-- 
2.19.0.329.g76f2f5c1e3



* [PATCH 4/6] btrfs: fix ncopies raid_attr for RAID56
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
                   ` (2 preceding siblings ...)
  2018-10-04 21:24 ` [PATCH 3/6] btrfs: alloc_chunk: fix more DUP stripe size handling Hans van Kranenburg
@ 2018-10-04 21:24 ` Hans van Kranenburg
  2018-10-05  9:10   ` Nikolay Borisov
  2018-10-04 21:24 ` [PATCH 5/6] btrfs: introduce nparity raid_attr Hans van Kranenburg
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:24 UTC (permalink / raw)
  To: linux-btrfs

The RAID5 and RAID6 profiles store one copy of the data, not 2 or 3.
By the way, these values are not used anywhere.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
---
 fs/btrfs/volumes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7045814fc98d..d82b3d735ebe 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -96,7 +96,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.devs_min	= 2,
 		.tolerated_failures = 1,
 		.devs_increment	= 1,
-		.ncopies	= 2,
+		.ncopies	= 1,
 		.raid_name	= "raid5",
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID5,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
@@ -108,7 +108,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.devs_min	= 3,
 		.tolerated_failures = 2,
 		.devs_increment	= 1,
-		.ncopies	= 3,
+		.ncopies	= 1,
 		.raid_name	= "raid6",
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID6,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
-- 
2.19.0.329.g76f2f5c1e3



* [PATCH 5/6] btrfs: introduce nparity raid_attr
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
                   ` (3 preceding siblings ...)
  2018-10-04 21:24 ` [PATCH 4/6] btrfs: fix ncopies raid_attr for RAID56 Hans van Kranenburg
@ 2018-10-04 21:24 ` Hans van Kranenburg
  2018-10-04 22:21   ` Hans van Kranenburg
  2018-10-05 14:42   ` Nikolay Borisov
  2018-10-04 21:24 ` [PATCH 6/6] btrfs: alloc_chunk: rework chunk/stripe calculations Hans van Kranenburg
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:24 UTC (permalink / raw)
  To: linux-btrfs

Instead of hardcoding exceptions for RAID5 and RAID6 in the code, use an
nparity field in raid_attr.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
---
 fs/btrfs/volumes.c | 18 +++++++++++-------
 fs/btrfs/volumes.h |  2 ++
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d82b3d735ebe..453046497ac8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -37,6 +37,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.tolerated_failures = 1,
 		.devs_increment	= 2,
 		.ncopies	= 2,
+		.nparity        = 0,
 		.raid_name	= "raid10",
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID10,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET,
@@ -49,6 +50,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.tolerated_failures = 1,
 		.devs_increment	= 2,
 		.ncopies	= 2,
+		.nparity        = 0,
 		.raid_name	= "raid1",
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
@@ -61,6 +63,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.tolerated_failures = 0,
 		.devs_increment	= 1,
 		.ncopies	= 2,
+		.nparity        = 0,
 		.raid_name	= "dup",
 		.bg_flag	= BTRFS_BLOCK_GROUP_DUP,
 		.mindev_error	= 0,
@@ -73,6 +76,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.tolerated_failures = 0,
 		.devs_increment	= 1,
 		.ncopies	= 1,
+		.nparity        = 0,
 		.raid_name	= "raid0",
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID0,
 		.mindev_error	= 0,
@@ -85,6 +89,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.tolerated_failures = 0,
 		.devs_increment	= 1,
 		.ncopies	= 1,
+		.nparity        = 0,
 		.raid_name	= "single",
 		.bg_flag	= 0,
 		.mindev_error	= 0,
@@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.tolerated_failures = 1,
 		.devs_increment	= 1,
 		.ncopies	= 1,
+		.nparity        = 2,
 		.raid_name	= "raid5",
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID5,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
@@ -109,6 +115,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.tolerated_failures = 2,
 		.devs_increment	= 1,
 		.ncopies	= 1,
+		.nparity        = 2,
 		.raid_name	= "raid6",
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID6,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
@@ -4597,6 +4604,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int devs_min;		/* min devs needed */
 	int devs_increment;	/* ndevs has to be a multiple of this */
 	int ncopies;		/* how many copies to data has */
+	int nparity;		/* number of stripes worth of bytes to
+				   store parity information */
 	int ret;
 	u64 max_stripe_size;
 	u64 max_chunk_size;
@@ -4623,6 +4632,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	devs_min = btrfs_raid_array[index].devs_min;
 	devs_increment = btrfs_raid_array[index].devs_increment;
 	ncopies = btrfs_raid_array[index].ncopies;
+	nparity = btrfs_raid_array[index].nparity;
 
 	if (type & BTRFS_BLOCK_GROUP_DATA) {
 		max_stripe_size = SZ_1G;
@@ -4752,13 +4762,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	 * this will have to be fixed for RAID1 and RAID10 over
 	 * more drives
 	 */
-	data_stripes = num_stripes / ncopies;
-
-	if (type & BTRFS_BLOCK_GROUP_RAID5)
-		data_stripes = num_stripes - 1;
-
-	if (type & BTRFS_BLOCK_GROUP_RAID6)
-		data_stripes = num_stripes - 2;
+	data_stripes = (num_stripes - nparity) / ncopies;
 
 	/*
 	 * Use the number of data stripes to figure out how big this chunk
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 23e9285d88de..0fe005b4295a 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -331,6 +331,8 @@ struct btrfs_raid_attr {
 	int tolerated_failures; /* max tolerated fail devs */
 	int devs_increment;	/* ndevs has to be a multiple of this */
 	int ncopies;		/* how many copies to data has */
+	int nparity;		/* number of stripes worth of bytes to
+				   store parity information */
 	int mindev_error;	/* error code if min devs requisite is unmet */
 	const char raid_name[8]; /* name of the raid */
 	u64 bg_flag;		/* block group flag of the raid */
-- 
2.19.0.329.g76f2f5c1e3



* [PATCH 6/6] btrfs: alloc_chunk: rework chunk/stripe calculations
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
                   ` (4 preceding siblings ...)
  2018-10-04 21:24 ` [PATCH 5/6] btrfs: introduce nparity raid_attr Hans van Kranenburg
@ 2018-10-04 21:24 ` Hans van Kranenburg
  2018-10-05  7:51 ` [PATCH 0/6] Chunk allocator DUP fix and cleanups Qu Wenruo
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:24 UTC (permalink / raw)
  To: linux-btrfs

Previously, the stripe_size variable was modified too many times in the
__btrfs_alloc_chunk function. The most problematic place was the if
block dealing with a chunk bigger than max_chunk_size, which would throw
away (overwrite) the value of stripe_size, maybe realizing a few lines
later that the previous value was actually better and executing a copy
of the former logic to try to get it back into the previous state.

Instead of calculating the target chunk size on the fly, adjust the
max_stripe_size variable based on the max_chunk_size that was set
before, and simply compare stripe_size against it later. This removes
the whole problematic if block.

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
---
 fs/btrfs/volumes.c | 46 +++++++++++++++++++++-------------------------
 fs/btrfs/volumes.h |  2 +-
 2 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 453046497ac8..862ee17ee0e5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4596,14 +4596,15 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	struct btrfs_device_info *devices_info = NULL;
 	u64 total_avail;
 	int num_stripes;	/* total number of stripes to allocate */
-	int data_stripes;	/* number of stripes that count for
-				   block group size */
+	int num_data_stripes;	/* number of stripes worth of bytes to
+				   store data including copies */
 	int sub_stripes;	/* sub_stripes info for map */
 	int dev_stripes;	/* stripes per dev */
 	int devs_max;		/* max devs to use */
 	int devs_min;		/* min devs needed */
 	int devs_increment;	/* ndevs has to be a multiple of this */
-	int ncopies;		/* how many copies to data has */
+	int ncopies;		/* how many times actual data is duplicated
+				   inside num_data_stripes */
 	int nparity;		/* number of stripes worth of bytes to
 				   store parity information */
 	int ret;
@@ -4747,6 +4748,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	}
 
 	ndevs = min(ndevs, devs_max);
+	num_stripes = ndevs * dev_stripes;
+	num_data_stripes = num_stripes - nparity;
 
 	/*
 	 * The primary goal is to maximize the number of stripes, so use as
@@ -4756,31 +4759,24 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	 * max_avail is the total size so we have to adjust.
 	 */
 	stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
-	num_stripes = ndevs * dev_stripes;
-
-	/*
-	 * this will have to be fixed for RAID1 and RAID10 over
-	 * more drives
-	 */
-	data_stripes = (num_stripes - nparity) / ncopies;
 
 	/*
-	 * Use the number of data stripes to figure out how big this chunk
-	 * is really going to be in terms of logical address space,
-	 * and compare that answer with the max chunk size. If it's higher,
-	 * we try to reduce stripe_size.
+	 * Now that we know how many stripes we're going to use, we can adjust
+	 * down max_stripe_size if needed, paying attention to max_chunk_size.
+	 *
+	 * By multiplying chunk size with ncopies, we get the total amount of
+	 * bytes that need to fit into all the non-parity stripes.
+	 *
+	 * A chunk is allowed to end up being a bit bigger than max_chunk_size
+	 * when rounding up the stripe_size to a 16MiB boundary makes it so.
+	 * Unless... it ends up being bigger than the amount of physical free
+	 * space we can use for it.
 	 */
-	if (stripe_size * data_stripes > max_chunk_size) {
-		/* Reduce stripe_size, round it up to a 16MB boundary
-		 * again and then use it, unless it ends up being even
-		 * bigger than the previous value we had already.
-		 */
-		stripe_size = min(round_up(div_u64(max_chunk_size,
-						   data_stripes), SZ_16M),
-				  stripe_size);
-	}
+	max_stripe_size = min(round_up((max_chunk_size * ncopies) /
+				       num_data_stripes, SZ_16M),
+			      max_stripe_size);
 
-	/* align to BTRFS_STRIPE_LEN */
+	stripe_size = min(max_stripe_size, stripe_size);
 	stripe_size = round_down(stripe_size, BTRFS_STRIPE_LEN);
 
 	map = kmalloc(map_lookup_size(num_stripes), GFP_NOFS);
@@ -4804,7 +4800,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	map->type = type;
 	map->sub_stripes = sub_stripes;
 
-	chunk_size = stripe_size * data_stripes;
+	chunk_size = div_u64(stripe_size * num_data_stripes, ncopies);
 
 	trace_btrfs_chunk_alloc(info, map, start, chunk_size);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 0fe005b4295a..ee2ec77b1291 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -330,7 +330,7 @@ struct btrfs_raid_attr {
 	int devs_min;		/* min devs needed */
 	int tolerated_failures; /* max tolerated fail devs */
 	int devs_increment;	/* ndevs has to be a multiple of this */
-	int ncopies;		/* how many copies to data has */
+	int ncopies;		/* how many copies the data has */
 	int nparity;		/* number of stripes worth of bytes to
 				   store parity information */
 	int mindev_error;	/* error code if min devs requisite is unmet */
-- 
2.19.0.329.g76f2f5c1e3



* Re: [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name
  2018-10-04 21:24 ` [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name Hans van Kranenburg
@ 2018-10-04 21:36   ` Hans van Kranenburg
  2018-10-05  9:00   ` Nikolay Borisov
  1 sibling, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 21:36 UTC (permalink / raw)
  To: linux-btrfs

On 10/04/2018 11:24 PM, Hans van Kranenburg wrote:
> num_bytes is really a way too generic name for a variable in this
> function. There are a dozen other variables that hold a number of bytes
> as value.
> 
> Give it a name that actually describes what it does, which is holding
> the size of the chunk that we're allocating.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> ---
>  fs/btrfs/volumes.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9c9bb127eeee..40fa85e68b1f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -123,6 +123,8 @@ const char *get_raid_name(enum btrfs_raid_types type)
>  	return btrfs_raid_array[type].raid_name;
>  }
>  
> +
> +

Meh, missed that one, will remove.

>  static int init_first_rw_device(struct btrfs_trans_handle *trans,
>  				struct btrfs_fs_info *fs_info);
>  static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
> @@ -4599,7 +4601,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	u64 max_stripe_size;
>  	u64 max_chunk_size;
>  	u64 stripe_size;
> -	u64 num_bytes;
> +	u64 chunk_size;
>  	int ndevs;
>  	int i;
>  	int j;
> @@ -4801,9 +4803,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	map->type = type;
>  	map->sub_stripes = sub_stripes;
>  
> -	num_bytes = stripe_size * data_stripes;
> +	chunk_size = stripe_size * data_stripes;
>  
> -	trace_btrfs_chunk_alloc(info, map, start, num_bytes);
> +	trace_btrfs_chunk_alloc(info, map, start, chunk_size);
>  
>  	em = alloc_extent_map();
>  	if (!em) {
> @@ -4814,7 +4816,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags);
>  	em->map_lookup = map;
>  	em->start = start;
> -	em->len = num_bytes;
> +	em->len = chunk_size;
>  	em->block_start = 0;
>  	em->block_len = em->len;
>  	em->orig_block_len = stripe_size;
> @@ -4832,7 +4834,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	refcount_inc(&em->refs);
>  	write_unlock(&em_tree->lock);
>  
> -	ret = btrfs_make_block_group(trans, 0, type, start, num_bytes);
> +	ret = btrfs_make_block_group(trans, 0, type, start, chunk_size);
>  	if (ret)
>  		goto error_del_extent;
>  
> 


-- 
Hans van Kranenburg


* Re: [PATCH 5/6] btrfs: introduce nparity raid_attr
  2018-10-04 21:24 ` [PATCH 5/6] btrfs: introduce nparity raid_attr Hans van Kranenburg
@ 2018-10-04 22:21   ` Hans van Kranenburg
  2018-10-05 14:42   ` Nikolay Borisov
  1 sibling, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-04 22:21 UTC (permalink / raw)
  To: linux-btrfs

On 10/04/2018 11:24 PM, Hans van Kranenburg wrote:
> Instead of hardcoding exceptions for RAID5 and RAID6 in the code, use an
> nparity field in raid_attr.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> ---
>  fs/btrfs/volumes.c | 18 +++++++++++-------
>  fs/btrfs/volumes.h |  2 ++
>  2 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d82b3d735ebe..453046497ac8 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -37,6 +37,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 1,
>  		.devs_increment	= 2,
>  		.ncopies	= 2,
> +		.nparity        = 0,
>  		.raid_name	= "raid10",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID10,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET,
> @@ -49,6 +50,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 1,
>  		.devs_increment	= 2,
>  		.ncopies	= 2,
> +		.nparity        = 0,
>  		.raid_name	= "raid1",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
> @@ -61,6 +63,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 0,
>  		.devs_increment	= 1,
>  		.ncopies	= 2,
> +		.nparity        = 0,
>  		.raid_name	= "dup",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_DUP,
>  		.mindev_error	= 0,
> @@ -73,6 +76,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 0,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 0,
>  		.raid_name	= "raid0",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID0,
>  		.mindev_error	= 0,
> @@ -85,6 +89,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 0,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 0,
>  		.raid_name	= "single",
>  		.bg_flag	= 0,
>  		.mindev_error	= 0,
> @@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 1,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 2,

Ahem, this obviously needs to be 1. Another reorganize/rebase-fail.

>  		.raid_name	= "raid5",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID5,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
> @@ -109,6 +115,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 2,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 2,
>  		.raid_name	= "raid6",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID6,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
> @@ -4597,6 +4604,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	int devs_min;		/* min devs needed */
>  	int devs_increment;	/* ndevs has to be a multiple of this */
>  	int ncopies;		/* how many copies to data has */
> +	int nparity;		/* number of stripes worth of bytes to
> +				   store parity information */
>  	int ret;
>  	u64 max_stripe_size;
>  	u64 max_chunk_size;
> @@ -4623,6 +4632,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	devs_min = btrfs_raid_array[index].devs_min;
>  	devs_increment = btrfs_raid_array[index].devs_increment;
>  	ncopies = btrfs_raid_array[index].ncopies;
> +	nparity = btrfs_raid_array[index].nparity;
>  
>  	if (type & BTRFS_BLOCK_GROUP_DATA) {
>  		max_stripe_size = SZ_1G;
> @@ -4752,13 +4762,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	 * this will have to be fixed for RAID1 and RAID10 over
>  	 * more drives
>  	 */
> -	data_stripes = num_stripes / ncopies;
> -
> -	if (type & BTRFS_BLOCK_GROUP_RAID5)
> -		data_stripes = num_stripes - 1;
> -
> -	if (type & BTRFS_BLOCK_GROUP_RAID6)
> -		data_stripes = num_stripes - 2;
> +	data_stripes = (num_stripes - nparity) / ncopies;
>  
>  	/*
>  	 * Use the number of data stripes to figure out how big this chunk
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 23e9285d88de..0fe005b4295a 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -331,6 +331,8 @@ struct btrfs_raid_attr {
>  	int tolerated_failures; /* max tolerated fail devs */
>  	int devs_increment;	/* ndevs has to be a multiple of this */
>  	int ncopies;		/* how many copies to data has */
> +	int nparity;		/* number of stripes worth of bytes to
> +				   store parity information */
>  	int mindev_error;	/* error code if min devs requisite is unmet */
>  	const char raid_name[8]; /* name of the raid */
>  	u64 bg_flag;		/* block group flag of the raid */
> 


-- 
Hans van Kranenburg


* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
                   ` (5 preceding siblings ...)
  2018-10-04 21:24 ` [PATCH 6/6] btrfs: alloc_chunk: rework chunk/stripe calculations Hans van Kranenburg
@ 2018-10-05  7:51 ` Qu Wenruo
  2018-10-05 10:58   ` Hans van Kranenburg
  2018-10-05  7:51 ` Qu Wenruo
  2018-10-11 15:13 ` David Sterba
  8 siblings, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2018-10-05  7:51 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs





On 2018/10/5 5:24 AM, Hans van Kranenburg wrote:
> This patch set contains an additional fix for a newly exposed bug after
> the previous attempt to fix a chunk allocator bug for new DUP chunks:
> 
> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374

For that bug, did you succeed in reproducing the bug?

I'm adding dev extent overlap checking in btrfs_verify_dev_extents() and
btrfs-progs.
I think it could help to detect such problems.

Thanks,
Qu

> 
> The DUP fix is "fix more DUP stripe size handling". I did that one
> before starting to change more things so it can be applied to earlier
> LTS kernels.
> 
> Besides that patch, which is fixing the bug in a way that is least
> intrusive, I added a bunch of other patches to help getting the chunk
> allocator code in a state that is a bit less error-prone and
> bug-attracting.
> 
> When running this and trying the reproduction scenario, I can now see
> that the created DUP device extent is 827326464 bytes long, which is
> good.
> 
> I wrote and tested this on top of linus 4.19-rc5. I still need to create
> a list of related use cases and test more things to at least walk
> through a bunch of obvious use cases to see if there's nothing exploding
> too quickly with these changes. However, I'm quite confident about it,
> so I'm sharing all of it already.
> 
> Any feedback and review is appreciated. Be gentle and keep in mind that
> I'm still very much in a learning stage regarding kernel development.
> 
> The stable patches handling workflow is not 100% clear to me yet. I
> guess I have to add a Fixes: in the DUP patch which points to the
> previous commit 92e222df7b.
> 
> Moo!,
> Knorrie
> 
> Hans van Kranenburg (6):
>   btrfs: alloc_chunk: do not refurbish num_bytes
>   btrfs: alloc_chunk: improve chunk size variable name
>   btrfs: alloc_chunk: fix more DUP stripe size handling
>   btrfs: fix ncopies raid_attr for RAID56
>   btrfs: introduce nparity raid_attr
>   btrfs: alloc_chunk: rework chunk/stripe calculations
> 
>  fs/btrfs/volumes.c | 84 +++++++++++++++++++++++-----------------------
>  fs/btrfs/volumes.h |  4 ++-
>  2 files changed, 45 insertions(+), 43 deletions(-)
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes
  2018-10-04 21:24 ` [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes Hans van Kranenburg
@ 2018-10-05  8:59   ` Nikolay Borisov
  2018-10-05  9:00     ` Nikolay Borisov
  0 siblings, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2018-10-05  8:59 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs



On  5.10.2018 00:24, Hans van Kranenburg wrote:
> num_bytes is used to store the chunk length of the chunk that we're
> allocating. Do not reuse it for something really different in the same
> function.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>

nit: That's a minor cleanup and brings no functional changes. I think it
even allows to give a more descriptive name of num_bytes such as
chunk_size, especially since we have a max_chunk_size. I think they
would pair nicely.

Anyway,

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/volumes.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index f4405e430da6..9c9bb127eeee 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4837,8 +4837,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  		goto error_del_extent;
>  
>  	for (i = 0; i < map->num_stripes; i++) {
> -		num_bytes = map->stripes[i].dev->bytes_used + stripe_size;
> -		btrfs_device_set_bytes_used(map->stripes[i].dev, num_bytes);
> +		btrfs_device_set_bytes_used(map->stripes[i].dev,
> +					    map->stripes[i].dev->bytes_used +
> +					    stripe_size);
>  	}
>  
>  	atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space);
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes
  2018-10-05  8:59   ` Nikolay Borisov
@ 2018-10-05  9:00     ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-10-05  9:00 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs



On  5.10.2018 11:59, Nikolay Borisov wrote:
> 
> 
> On  5.10.2018 00:24, Hans van Kranenburg wrote:
>> num_bytes is used to store the chunk length of the chunk that we're
>> allocating. Do not reuse it for something really different in the same
>> function.
>>
>> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> 
> nit: That's a minor cleanup and brings no functional changes. I think it
> even allows to give a more descriptive name of num_bytes such as
> chunk_size, especially since we have a max_chunk_size. I think they
> would pair nicely.

I saw that your patch 2 actually does that, so ignore this comment :)


> 
> Anyway,
> 
> Reviewed-by: Nikolay Borisov <nborisov@suse.com>
> 
>> ---
>>  fs/btrfs/volumes.c | 5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index f4405e430da6..9c9bb127eeee 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -4837,8 +4837,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  		goto error_del_extent;
>>  
>>  	for (i = 0; i < map->num_stripes; i++) {
>> -		num_bytes = map->stripes[i].dev->bytes_used + stripe_size;
>> -		btrfs_device_set_bytes_used(map->stripes[i].dev, num_bytes);
>> +		btrfs_device_set_bytes_used(map->stripes[i].dev,
>> +					    map->stripes[i].dev->bytes_used +
>> +					    stripe_size);
>>  	}
>>  
>>  	atomic64_sub(stripe_size * map->num_stripes, &info->free_chunk_space);
>>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name
  2018-10-04 21:24 ` [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name Hans van Kranenburg
  2018-10-04 21:36   ` Hans van Kranenburg
@ 2018-10-05  9:00   ` Nikolay Borisov
  1 sibling, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-10-05  9:00 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs



On  5.10.2018 00:24, Hans van Kranenburg wrote:
> num_bytes is really a way too generic name for a variable in this
> function. There are a dozen other variables that hold a number of bytes
> as value.
> 
> Give it a name that actually describes what it does, which is holding
> the size of the chunk that we're allocating.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>

Apart from the extra 2 blank lines you can add:

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/volumes.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 9c9bb127eeee..40fa85e68b1f 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -123,6 +123,8 @@ const char *get_raid_name(enum btrfs_raid_types type)
>  	return btrfs_raid_array[type].raid_name;
>  }
>  
> +
> +
>  static int init_first_rw_device(struct btrfs_trans_handle *trans,
>  				struct btrfs_fs_info *fs_info);
>  static int btrfs_relocate_sys_chunks(struct btrfs_fs_info *fs_info);
> @@ -4599,7 +4601,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	u64 max_stripe_size;
>  	u64 max_chunk_size;
>  	u64 stripe_size;
> -	u64 num_bytes;
> +	u64 chunk_size;
>  	int ndevs;
>  	int i;
>  	int j;
> @@ -4801,9 +4803,9 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	map->type = type;
>  	map->sub_stripes = sub_stripes;
>  
> -	num_bytes = stripe_size * data_stripes;
> +	chunk_size = stripe_size * data_stripes;
>  
> -	trace_btrfs_chunk_alloc(info, map, start, num_bytes);
> +	trace_btrfs_chunk_alloc(info, map, start, chunk_size);
>  
>  	em = alloc_extent_map();
>  	if (!em) {
> @@ -4814,7 +4816,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	set_bit(EXTENT_FLAG_FS_MAPPING, &em->flags);
>  	em->map_lookup = map;
>  	em->start = start;
> -	em->len = num_bytes;
> +	em->len = chunk_size;
>  	em->block_start = 0;
>  	em->block_len = em->len;
>  	em->orig_block_len = stripe_size;
> @@ -4832,7 +4834,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	refcount_inc(&em->refs);
>  	write_unlock(&em_tree->lock);
>  
> -	ret = btrfs_make_block_group(trans, 0, type, start, num_bytes);
> +	ret = btrfs_make_block_group(trans, 0, type, start, chunk_size);
>  	if (ret)
>  		goto error_del_extent;
>  
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/6] btrfs: fix ncopies raid_attr for RAID56
  2018-10-04 21:24 ` [PATCH 4/6] btrfs: fix ncopies raid_attr for RAID56 Hans van Kranenburg
@ 2018-10-05  9:10   ` Nikolay Borisov
  0 siblings, 0 replies; 26+ messages in thread
From: Nikolay Borisov @ 2018-10-05  9:10 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs



On  5.10.2018 00:24, Hans van Kranenburg wrote:
> RAID5 and RAID6 profile store one copy of the data, not 2 or 3. These
> values are not used anywhere by the way.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>

Reviewed-by: Nikolay Borisov <nborisov@suse.com>

> ---
>  fs/btrfs/volumes.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 7045814fc98d..d82b3d735ebe 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -96,7 +96,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.devs_min	= 2,
>  		.tolerated_failures = 1,
>  		.devs_increment	= 1,
> -		.ncopies	= 2,
> +		.ncopies	= 1,
>  		.raid_name	= "raid5",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID5,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
> @@ -108,7 +108,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.devs_min	= 3,
>  		.tolerated_failures = 2,
>  		.devs_increment	= 1,
> -		.ncopies	= 3,
> +		.ncopies	= 1,
>  		.raid_name	= "raid6",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID6,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-05  7:51 ` [PATCH 0/6] Chunk allocator DUP fix and cleanups Qu Wenruo
@ 2018-10-05 10:58   ` Hans van Kranenburg
  2018-10-08  6:43     ` Qu Wenruo
  0 siblings, 1 reply; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-05 10:58 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2513 bytes --]

On 10/05/2018 09:51 AM, Qu Wenruo wrote:
> 
> 
>> On 2018/10/5 5:24 AM, Hans van Kranenburg wrote:
>> This patch set contains an additional fix for a newly exposed bug after
>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>
>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
> 
> For that bug, did you succeed in reproducing the bug?

Yes, open the above link and scroll to "Steps to reproduce".

o/

> I'm adding dev extent overlap checking in btrfs_verify_dev_extents() and
> btrfs-progs.
> I think it could help to detect such problems.
> 
> Thanks,
> Qu
> 
>>
>> The DUP fix is "fix more DUP stripe size handling". I did that one
>> before starting to change more things so it can be applied to earlier
>> LTS kernels.
>>
>> Besides that patch, which is fixing the bug in a way that is least
>> intrusive, I added a bunch of other patches to help getting the chunk
>> allocator code in a state that is a bit less error-prone and
>> bug-attracting.
>>
>> When running this and trying the reproduction scenario, I can now see
>> that the created DUP device extent is 827326464 bytes long, which is
>> good.
>>
>> I wrote and tested this on top of linus 4.19-rc5. I still need to create
>> a list of related use cases and test more things to at least walk
>> through a bunch of obvious use cases to see if there's nothing exploding
>> too quickly with these changes. However, I'm quite confident about it,
>> so I'm sharing all of it already.
>>
>> Any feedback and review is appreciated. Be gentle and keep in mind that
>> I'm still very much in a learning stage regarding kernel development.
>>
>> The stable patches handling workflow is not 100% clear to me yet. I
>> guess I have to add a Fixes: in the DUP patch which points to the
>> previous commit 92e222df7b.
>>
>> Moo!,
>> Knorrie
>>
>> Hans van Kranenburg (6):
>>   btrfs: alloc_chunk: do not refurbish num_bytes
>>   btrfs: alloc_chunk: improve chunk size variable name
>>   btrfs: alloc_chunk: fix more DUP stripe size handling
>>   btrfs: fix ncopies raid_attr for RAID56
>>   btrfs: introduce nparity raid_attr
>>   btrfs: alloc_chunk: rework chunk/stripe calculations
>>
>>  fs/btrfs/volumes.c | 84 +++++++++++++++++++++++-----------------------
>>  fs/btrfs/volumes.h |  4 ++-
>>  2 files changed, 45 insertions(+), 43 deletions(-)
>>
> 


-- 
Hans van Kranenburg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] btrfs: introduce nparity raid_attr
  2018-10-04 21:24 ` [PATCH 5/6] btrfs: introduce nparity raid_attr Hans van Kranenburg
  2018-10-04 22:21   ` Hans van Kranenburg
@ 2018-10-05 14:42   ` Nikolay Borisov
  2018-10-05 22:45     ` Hans van Kranenburg
  1 sibling, 1 reply; 26+ messages in thread
From: Nikolay Borisov @ 2018-10-05 14:42 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs



On  5.10.2018 00:24, Hans van Kranenburg wrote:
> Instead of hardcoding exceptions for RAID5 and RAID6 in the code, use an
> nparity field in raid_attr.
> 
> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
> ---
>  fs/btrfs/volumes.c | 18 +++++++++++-------
>  fs/btrfs/volumes.h |  2 ++
>  2 files changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index d82b3d735ebe..453046497ac8 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -37,6 +37,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 1,
>  		.devs_increment	= 2,
>  		.ncopies	= 2,
> +		.nparity        = 0,
>  		.raid_name	= "raid10",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID10,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET,
> @@ -49,6 +50,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 1,
>  		.devs_increment	= 2,
>  		.ncopies	= 2,
> +		.nparity        = 0,
>  		.raid_name	= "raid1",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
> @@ -61,6 +63,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 0,
>  		.devs_increment	= 1,
>  		.ncopies	= 2,
> +		.nparity        = 0,
>  		.raid_name	= "dup",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_DUP,
>  		.mindev_error	= 0,
> @@ -73,6 +76,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 0,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 0,
>  		.raid_name	= "raid0",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID0,
>  		.mindev_error	= 0,
> @@ -85,6 +89,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 0,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 0,
>  		.raid_name	= "single",
>  		.bg_flag	= 0,
>  		.mindev_error	= 0,
> @@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 1,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 2,
>  		.raid_name	= "raid5",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID5,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
> @@ -109,6 +115,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.tolerated_failures = 2,
>  		.devs_increment	= 1,
>  		.ncopies	= 1,
> +		.nparity        = 2,
>  		.raid_name	= "raid6",
>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID6,
>  		.mindev_error	= BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
> @@ -4597,6 +4604,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	int devs_min;		/* min devs needed */
>  	int devs_increment;	/* ndevs has to be a multiple of this */
>  	int ncopies;		/* how many copies to data has */
> +	int nparity;		/* number of stripes worth of bytes to
> +				   store parity information */
>  	int ret;
>  	u64 max_stripe_size;
>  	u64 max_chunk_size;
> @@ -4623,6 +4632,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	devs_min = btrfs_raid_array[index].devs_min;
>  	devs_increment = btrfs_raid_array[index].devs_increment;
>  	ncopies = btrfs_raid_array[index].ncopies;
> +	nparity = btrfs_raid_array[index].nparity;
>  
>  	if (type & BTRFS_BLOCK_GROUP_DATA) {
>  		max_stripe_size = SZ_1G;
> @@ -4752,13 +4762,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>  	 * this will have to be fixed for RAID1 and RAID10 over
>  	 * more drives
>  	 */
> -	data_stripes = num_stripes / ncopies;
> -
> -	if (type & BTRFS_BLOCK_GROUP_RAID5)
> -		data_stripes = num_stripes - 1;
> -
> -	if (type & BTRFS_BLOCK_GROUP_RAID6)
> -		data_stripes = num_stripes - 2;
> +	data_stripes = (num_stripes - nparity) / ncopies;
>  
>  	/*
>  	 * Use the number of data stripes to figure out how big this chunk
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 23e9285d88de..0fe005b4295a 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -331,6 +331,8 @@ struct btrfs_raid_attr {
>  	int tolerated_failures; /* max tolerated fail devs */
>  	int devs_increment;	/* ndevs has to be a multiple of this */
>  	int ncopies;		/* how many copies to data has */
> +	int nparity;		/* number of stripes worth of bytes to
> +				   store parity information */

The description can be simplified to: number of parity stripes

>  	int mindev_error;	/* error code if min devs requisite is unmet */
>  	const char raid_name[8]; /* name of the raid */
>  	u64 bg_flag;		/* block group flag of the raid */
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 5/6] btrfs: introduce nparity raid_attr
  2018-10-05 14:42   ` Nikolay Borisov
@ 2018-10-05 22:45     ` Hans van Kranenburg
  0 siblings, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-05 22:45 UTC (permalink / raw)
  To: Nikolay Borisov, linux-btrfs

On 10/05/2018 04:42 PM, Nikolay Borisov wrote:
> 
> 
> On  5.10.2018 00:24, Hans van Kranenburg wrote:
>> Instead of hardcoding exceptions for RAID5 and RAID6 in the code, use an
>> nparity field in raid_attr.
>>
>> Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
>> ---
>>  fs/btrfs/volumes.c | 18 +++++++++++-------
>>  fs/btrfs/volumes.h |  2 ++
>>  2 files changed, 13 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index d82b3d735ebe..453046497ac8 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -37,6 +37,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.tolerated_failures = 1,
>>  		.devs_increment	= 2,
>>  		.ncopies	= 2,
>> +		.nparity        = 0,
>>  		.raid_name	= "raid10",
>>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID10,
>>  		.mindev_error	= BTRFS_ERROR_DEV_RAID10_MIN_NOT_MET,
>> @@ -49,6 +50,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.tolerated_failures = 1,
>>  		.devs_increment	= 2,
>>  		.ncopies	= 2,
>> +		.nparity        = 0,
>>  		.raid_name	= "raid1",
>>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1,
>>  		.mindev_error	= BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
>> @@ -61,6 +63,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.tolerated_failures = 0,
>>  		.devs_increment	= 1,
>>  		.ncopies	= 2,
>> +		.nparity        = 0,
>>  		.raid_name	= "dup",
>>  		.bg_flag	= BTRFS_BLOCK_GROUP_DUP,
>>  		.mindev_error	= 0,
>> @@ -73,6 +76,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.tolerated_failures = 0,
>>  		.devs_increment	= 1,
>>  		.ncopies	= 1,
>> +		.nparity        = 0,
>>  		.raid_name	= "raid0",
>>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID0,
>>  		.mindev_error	= 0,
>> @@ -85,6 +89,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.tolerated_failures = 0,
>>  		.devs_increment	= 1,
>>  		.ncopies	= 1,
>> +		.nparity        = 0,
>>  		.raid_name	= "single",
>>  		.bg_flag	= 0,
>>  		.mindev_error	= 0,
>> @@ -97,6 +102,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.tolerated_failures = 1,
>>  		.devs_increment	= 1,
>>  		.ncopies	= 1,
>> +		.nparity        = 2,
>>  		.raid_name	= "raid5",
>>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID5,
>>  		.mindev_error	= BTRFS_ERROR_DEV_RAID5_MIN_NOT_MET,
>> @@ -109,6 +115,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.tolerated_failures = 2,
>>  		.devs_increment	= 1,
>>  		.ncopies	= 1,
>> +		.nparity        = 2,
>>  		.raid_name	= "raid6",
>>  		.bg_flag	= BTRFS_BLOCK_GROUP_RAID6,
>>  		.mindev_error	= BTRFS_ERROR_DEV_RAID6_MIN_NOT_MET,
>> @@ -4597,6 +4604,8 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  	int devs_min;		/* min devs needed */
>>  	int devs_increment;	/* ndevs has to be a multiple of this */
>>  	int ncopies;		/* how many copies to data has */
>> +	int nparity;		/* number of stripes worth of bytes to
>> +				   store parity information */
>>  	int ret;
>>  	u64 max_stripe_size;
>>  	u64 max_chunk_size;
>> @@ -4623,6 +4632,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  	devs_min = btrfs_raid_array[index].devs_min;
>>  	devs_increment = btrfs_raid_array[index].devs_increment;
>>  	ncopies = btrfs_raid_array[index].ncopies;
>> +	nparity = btrfs_raid_array[index].nparity;
>>  
>>  	if (type & BTRFS_BLOCK_GROUP_DATA) {
>>  		max_stripe_size = SZ_1G;
>> @@ -4752,13 +4762,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
>>  	 * this will have to be fixed for RAID1 and RAID10 over
>>  	 * more drives
>>  	 */
>> -	data_stripes = num_stripes / ncopies;
>> -
>> -	if (type & BTRFS_BLOCK_GROUP_RAID5)
>> -		data_stripes = num_stripes - 1;
>> -
>> -	if (type & BTRFS_BLOCK_GROUP_RAID6)
>> -		data_stripes = num_stripes - 2;
>> +	data_stripes = (num_stripes - nparity) / ncopies;
>>  
>>  	/*
>>  	 * Use the number of data stripes to figure out how big this chunk
>> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
>> index 23e9285d88de..0fe005b4295a 100644
>> --- a/fs/btrfs/volumes.h
>> +++ b/fs/btrfs/volumes.h
>> @@ -331,6 +331,8 @@ struct btrfs_raid_attr {
>>  	int tolerated_failures; /* max tolerated fail devs */
>>  	int devs_increment;	/* ndevs has to be a multiple of this */
>>  	int ncopies;		/* how many copies to data has */
>> +	int nparity;		/* number of stripes worth of bytes to
>> +				   store parity information */
> 
> The description can be simplified to: number of parity stripes

Well, if 'stripe' is a synonym for 'device extent' here, then
technically that's incorrect, since all profiles using parity do
distribute the parity data over all stripes. IOW, it's not RAID4.

For the calculations in here we treat it like that, because on this
level we don't care about what goes where later, just that the total
amount of allocated bytes is right.

That's why I thought putting a little hint in there, to make the reader
think about it and then realize that it's actually not that simple. And
apparently that works. :)

> 
>>  	int mindev_error;	/* error code if min devs requisite is unmet */
>>  	const char raid_name[8]; /* name of the raid */
>>  	u64 bg_flag;		/* block group flag of the raid */
>>


-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-05 10:58   ` Hans van Kranenburg
@ 2018-10-08  6:43     ` Qu Wenruo
  2018-10-08 13:19       ` Hans van Kranenburg
  0 siblings, 1 reply; 26+ messages in thread
From: Qu Wenruo @ 2018-10-08  6:43 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2844 bytes --]



On 2018/10/5 6:58 PM, Hans van Kranenburg wrote:
> On 10/05/2018 09:51 AM, Qu Wenruo wrote:
>>
>>
>> On 2018/10/5 5:24 AM, Hans van Kranenburg wrote:
>>> This patch set contains an additional fix for a newly exposed bug after
>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>
>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>
>> For that bug, did you succeed in reproducing the bug?
> 
> Yes, open the above link and scroll to "Steps to reproduce".

That's beyond device boundary one. Also reproduced here.
And hand-crafted a super small image as test case for btrfs-progs.

But I'm a little curious about the dev extent overlapping case.
Have you got one?

Thanks,
Qu

> 
> o/
> 
>> I'm adding dev extent overlap checking in btrfs_verify_dev_extents() and
>> btrfs-progs.
>> I think it could help to detect such problems.
>>
>> Thanks,
>> Qu
>>
>>>
>>> The DUP fix is "fix more DUP stripe size handling". I did that one
>>> before starting to change more things so it can be applied to earlier
>>> LTS kernels.
>>>
>>> Besides that patch, which is fixing the bug in a way that is least
>>> intrusive, I added a bunch of other patches to help getting the chunk
>>> allocator code in a state that is a bit less error-prone and
>>> bug-attracting.
>>>
>>> When running this and trying the reproduction scenario, I can now see
>>> that the created DUP device extent is 827326464 bytes long, which is
>>> good.
>>>
>>> I wrote and tested this on top of linus 4.19-rc5. I still need to create
>>> a list of related use cases and test more things to at least walk
>>> through a bunch of obvious use cases to see if there's nothing exploding
>>> too quickly with these changes. However, I'm quite confident about it,
>>> so I'm sharing all of it already.
>>>
>>> Any feedback and review is appreciated. Be gentle and keep in mind that
>>> I'm still very much in a learning stage regarding kernel development.
>>>
>>> The stable patches handling workflow is not 100% clear to me yet. I
>>> guess I have to add a Fixes: in the DUP patch which points to the
>>> previous commit 92e222df7b.
>>>
>>> Moo!,
>>> Knorrie
>>>
>>> Hans van Kranenburg (6):
>>>   btrfs: alloc_chunk: do not refurbish num_bytes
>>>   btrfs: alloc_chunk: improve chunk size variable name
>>>   btrfs: alloc_chunk: fix more DUP stripe size handling
>>>   btrfs: fix ncopies raid_attr for RAID56
>>>   btrfs: introduce nparity raid_attr
>>>   btrfs: alloc_chunk: rework chunk/stripe calculations
>>>
>>>  fs/btrfs/volumes.c | 84 +++++++++++++++++++++++-----------------------
>>>  fs/btrfs/volumes.h |  4 ++-
>>>  2 files changed, 45 insertions(+), 43 deletions(-)
>>>
>>
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-08  6:43     ` Qu Wenruo
@ 2018-10-08 13:19       ` Hans van Kranenburg
  2018-10-08 23:51         ` Hans van Kranenburg
  0 siblings, 1 reply; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-08 13:19 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 1220 bytes --]

On 10/08/2018 08:43 AM, Qu Wenruo wrote:
> 
> 
>> On 2018/10/5 6:58 PM, Hans van Kranenburg wrote:
>> On 10/05/2018 09:51 AM, Qu Wenruo wrote:
>>>
>>>
>>> On 2018/10/5 5:24 AM, Hans van Kranenburg wrote:
>>>> This patch set contains an additional fix for a newly exposed bug after
>>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>>
>>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>>
>>> For that bug, did you succeed in reproducing the bug?
>>
>> Yes, open the above link and scroll to "Steps to reproduce".
> 
> That's beyond device boundary one. Also reproduced here.
> And hand-crafted a super small image as test case for btrfs-progs.
> 
> But I'm a little curious about the dev extent overlapping case.
> Have you got one?

Ah ok, I see. No, I didn't do that yet.

By using unmodified tooling, I think this can be done by a combination
of a few resizings and using very specific balance to cause a fs of
exactly 7880MiB again with a single 1578MiB gap inside...

I'll try later today to see if I can come up with a recipe for this.

-- 
Hans van Kranenburg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-08 13:19       ` Hans van Kranenburg
@ 2018-10-08 23:51         ` Hans van Kranenburg
  0 siblings, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-08 23:51 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 13020 bytes --]

On 10/08/2018 03:19 PM, Hans van Kranenburg wrote:
> On 10/08/2018 08:43 AM, Qu Wenruo wrote:
>>
>>
>> On 2018/10/5 6:58 PM, Hans van Kranenburg wrote:
>>> On 10/05/2018 09:51 AM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018/10/5 5:24 AM, Hans van Kranenburg wrote:
>>>>> This patch set contains an additional fix for a newly exposed bug after
>>>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>>>
>>>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>>>
>>>> For that bug, did you succeed in reproducing it?
>>>
>>> Yes, open the above link and scroll to "Steps to reproduce".
>>
>> That's the beyond-device-boundary one. Also reproduced here.
>> And I hand-crafted a super small image as a test case for btrfs-progs.
>>
>> But I'm a little curious about the dev extent overlapping case.
>> Have you got one?
> 
> Ah ok, I see. No, I didn't do that yet.
> 
> Using unmodified tooling, I think this can be done with a combination
> of a few resizes and a very specific balance, ending up with a fs of
> exactly 7880MiB again with a single 1578MiB gap inside...
> 
> I'll try later today to see if I can come up with a recipe for this.

Ok, this is actually pretty simple to do:

---- >8 ----

-# mkdir bork
-# cd bork
-# dd if=/dev/zero of=image bs=1 count=0 seek=1024M
0+0 records in
0+0 records out
0 bytes copied, 0.000183343 s, 0.0 kB/s
-# mkfs.btrfs -d dup -m dup image
-# losetup -f image
-# mount -o space_cache=v2 /dev/loop0 mountpoint
-# dd if=/dev/zero of=mountpoint/flapsie bs=1M
dd: error writing 'mountpoint/flapsie': No space left on device
453+0 records in
452+0 records out
474185728 bytes (474 MB, 452 MiB) copied, 4.07663 s, 116 MB/s

---- >8 ----

-# ./show_usage.py /bork/mountpoint/
Target profile for SYSTEM (chunk tree): DUP
Target profile for METADATA: DUP
Target profile for DATA: DUP
Mixed groups: False

Virtual space usage by block group type:
|
| type                    total         used
| ----                    -----         ----
| Data                452.31MiB    452.22MiB
| System                8.00MiB     16.00KiB
| Metadata             51.19MiB    656.00KiB

Total raw filesystem size: 1.00GiB
Total raw allocated bytes: 1023.00MiB
Allocatable bytes remaining: 1.00MiB
Unallocatable raw bytes: 0.00B
Unallocatable bytes that can be reclaimed by balancing: 0.00B

Estimated virtual space left to use for metadata: 51.05MiB
Estimated virtual space left to use for data: 96.00KiB

Allocated raw disk bytes by chunk type. Parity is a reserved part of the
allocated bytes, limiting the amount that can be used for data or metadata:
|
| flags               allocated         used       parity
| -----               ---------         ----       ------
| DATA|DUP            904.62MiB    904.44MiB        0.00B
| SYSTEM|DUP           16.00MiB     32.00KiB        0.00B
| METADATA|DUP        102.38MiB      1.28MiB        0.00B

Allocated bytes per device:
|
| devid      total size    allocated path
| -----      ----------    --------- ----
| 1             1.00GiB   1023.00MiB /dev/loop0

Allocated bytes per device, split up per chunk type. Parity bytes are again
part of the total amount of allocated bytes.
|
| Device ID: 1
| | flags               allocated       parity
| | -----               ---------       ------
| | DATA|DUP            904.62MiB        0.00B
| | SYSTEM|DUP           16.00MiB        0.00B
| | METADATA|DUP        102.38MiB        0.00B

Unallocatable raw bytes per device:
|
| devid    unallocatable
| -----    -------------
| 1               0.00B

---- >8 ----

Now we're going to fill up a neat 1578MiB of space that we'll free
again later to reproduce the failure:

-# dd if=/dev/zero of=image bs=1 count=0 seek=2602M
0+0 records in
0+0 records out
0 bytes copied, 0.000188621 s, 0.0 kB/s
-# losetup -c /dev/loop0
-# btrfs fi resize max mountpoint/
Resize 'mountpoint/' of 'max'
-# dd if=/dev/zero of=mountpoint/1578MiB bs=1M
dd: error writing 'mountpoint/1578MiB': No space left on device
790+0 records in
789+0 records out
827326464 bytes (827 MB, 789 MiB) copied, 12.2452 s, 67.6 MB/s
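Quick sanity check on that dd result: the image grew from 1024MiB to 2602MiB, and since DUP stores every byte twice, only half of the new raw space is usable for file data (plain Python, numbers taken from the commands above):

```python
MiB = 1024 * 1024

raw_new_space = (2602 - 1024) * MiB  # disk grown from 1024M to 2602M: 1578MiB raw
dup_copies = 2                       # DUP writes every byte twice
usable = raw_new_space // dup_copies

print(usable)  # 827326464 bytes, exactly the 789MiB that dd managed to write
```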

---- >8 ----

-# python3
import btrfs
fs = btrfs.FileSystem('/bork/mountpoint/')
for d in fs.dev_extents():
  print("start {} end {} vaddr {}".format(d.paddr, d.paddr + d.length, d.vaddr))

start 1048576 end 11534336 vaddr 547880960
start 11534336 end 22020096 vaddr 547880960
start 22020096 end 30408704 vaddr 22020096
start 30408704 end 38797312 vaddr 22020096
start 38797312 end 92471296 vaddr 30408704
start 92471296 end 146145280 vaddr 30408704
start 146145280 end 213254144 vaddr 84082688
start 213254144 end 280363008 vaddr 84082688
start 280363008 end 397803520 vaddr 151191552
start 397803520 end 515244032 vaddr 151191552
start 515244032 end 632684544 vaddr 268632064
start 632684544 end 750125056 vaddr 268632064
start 750125056 end 867565568 vaddr 386072576
start 867565568 end 985006080 vaddr 386072576
start 985006080 end 1029373952 vaddr 503513088
start 1029373952 end 1073741824 vaddr 503513088
start 1073741824 end 1358954496 vaddr 558366720
start 1358954496 end 1644167168 vaddr 558366720
start 1644167168 end 1929379840 vaddr 843579392
start 1929379840 end 2214592512 vaddr 843579392
start 2214592512 end 2471493632 vaddr 1128792064
start 2471493632 end 2728394752 vaddr 1128792064

The last three block groups were just added, using up the new space:

for vaddr in (558366720, 843579392, 1128792064):
  print(fs.block_group(vaddr))

block group vaddr 558366720 length 285212672 flags DATA|DUP used 285212672 used_pct 100
block group vaddr 843579392 length 285212672 flags DATA|DUP used 285212672 used_pct 100
block group vaddr 1128792064 length 256901120 flags DATA|DUP used 256901120 used_pct 100
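As a cross-check, the lengths of the three new DATA|DUP block groups add up exactly to the amount of data just written (plain Python, lengths copied from the output above):

```python
# lengths of the three new DATA|DUP block groups listed above
new_block_groups = [285212672, 285212672, 256901120]

total = sum(new_block_groups)
print(total)  # 827326464 bytes, matching the 789MiB file
```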

By the way: both the first time I did this and the second time (for the
writeup), extent allocation looked like the paste linked below O_o... I
have no idea where the pattern of decreasing extent sizes came from the
first time, and no idea why the second time all the data was stored in
144KiB extents...

http://paste.debian.net/plainh/537bdd95

---- >8 ----

Ok, now let's extend the disk to our famous number of 7880M

-# dd if=/dev/zero of=image bs=1 count=0 seek=7880M
0+0 records in
0+0 records out
0 bytes copied, 0.000185768 s, 0.0 kB/s
-# losetup -c /dev/loop0
-# btrfs fi resize max mountpoint/
Resize 'mountpoint/' of 'max'

-# ./show_usage.py /bork/mountpoint/
[...]
Total raw filesystem size: 7.70GiB
Total raw allocated bytes: 2.54GiB
Allocatable bytes remaining: 5.16GiB
[...]

-# dd if=/dev/zero of=mountpoint/bakkiepleur bs=1M
dd: error writing 'mountpoint/bakkiepleur': No space left on device
2384+0 records in
2383+0 records out
2498756608 bytes (2.5 GB, 2.3 GiB) copied, 34.1946 s, 73.1 MB/s

-# ./show_usage.py /bork/mountpoint/
Target profile for SYSTEM (chunk tree): DUP
Target profile for METADATA: DUP
Target profile for DATA: DUP
Mixed groups: False

Virtual space usage by block group type:
|
| type                    total         used
| ----                    -----         ----
| Data                  3.54GiB      3.54GiB
| System                8.00MiB     16.00KiB
| Metadata            307.19MiB    102.05MiB

Total raw filesystem size: 7.70GiB
Total raw allocated bytes: 7.69GiB
Allocatable bytes remaining: 1.00MiB
Unallocatable raw bytes: 0.00B
Unallocatable bytes that can be reclaimed by balancing: 0.00B

Estimated virtual space left to use for metadata: 205.64MiB
Estimated virtual space left to use for data: 0.00B

Allocated raw disk bytes by chunk type. Parity is a reserved part of the
allocated bytes, limiting the amount that can be used for data or metadata:
|
| flags               allocated         used       parity
| -----               ---------         ----       ------
| DATA|DUP              7.08GiB      7.08GiB        0.00B
| SYSTEM|DUP           16.00MiB     32.00KiB        0.00B
| METADATA|DUP        614.38MiB    204.09MiB        0.00B

Allocated bytes per device:
|
| devid      total size    allocated path
| -----      ----------    --------- ----
| 1             7.70GiB      7.69GiB /dev/loop0

Allocated bytes per device, split up per chunk type. Parity bytes are again
part of the total amount of allocated bytes.
|
| Device ID: 1
| | flags               allocated       parity
| | -----               ---------       ------
| | DATA|DUP              7.08GiB        0.00B
| | SYSTEM|DUP           16.00MiB        0.00B
| | METADATA|DUP        614.38MiB        0.00B

Unallocatable raw bytes per device:
|
| devid    unallocatable
| -----    -------------
| 1               0.00B

---- >8 ----

Now, generating the gap in the physical space is as simple as dropping
the 1578MiB file and triggering a transaction commit to clean up the
empty block groups:

-# rm mountpoint/1578MiB
-# btrfs fi sync mountpoint/

for d in fs.dev_extents():
  print("start {} end {} vaddr {}".format(d.paddr, d.paddr + d.length, d.vaddr))

start 1048576 end 11534336 vaddr 547880960
start 11534336 end 22020096 vaddr 547880960
start 22020096 end 30408704 vaddr 22020096
start 30408704 end 38797312 vaddr 22020096
start 38797312 end 92471296 vaddr 30408704
start 92471296 end 146145280 vaddr 30408704
start 146145280 end 213254144 vaddr 84082688
start 213254144 end 280363008 vaddr 84082688
start 280363008 end 397803520 vaddr 151191552
start 397803520 end 515244032 vaddr 151191552
start 515244032 end 632684544 vaddr 268632064
start 632684544 end 750125056 vaddr 268632064
start 750125056 end 867565568 vaddr 386072576
start 867565568 end 985006080 vaddr 386072576
start 985006080 end 1029373952 vaddr 503513088
start 1029373952 end 1073741824 vaddr 503513088

  [558366720, 843579392 and 1128792064 are gone]

start 2728394752 end 3567255552 vaddr 1385693184
start 3567255552 end 4406116352 vaddr 1385693184
start 4406116352 end 5244977152 vaddr 2224553984
start 5244977152 end 6083837952 vaddr 2224553984
start 6083837952 end 6352273408 vaddr 3063414784
start 6352273408 end 6620708864 vaddr 3063414784
start 6620708864 end 7441743872 vaddr 3331850240
start 7441743872 end 8262778880 vaddr 3331850240

-# ./show_usage.py /bork/mountpoint/
[...]
Allocatable bytes remaining: 1.54GiB
[...]
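That 1.54GiB is exactly the 1578MiB hole between the two dev extents bordering the deleted block groups; a quick check in plain Python (find_gaps is a hypothetical helper, the extents are hard-coded from the listing above):

```python
def find_gaps(extents):
    """Return (gap_start, gap_end, length) for every hole between
    consecutive device extents, sorted by physical address."""
    gaps = []
    for (_, prev_end), (next_start, _) in zip(extents, extents[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start, next_start - prev_end))
    return gaps

# last extent before the hole and first extent after it, from the listing
extents = [(1029373952, 1073741824), (2728394752, 3567255552)]
for start, end, length in find_gaps(extents):
    print("gap start {} end {} length {} ({}MiB)".format(
        start, end, length, length // (1024 * 1024)))
```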

---- >8 ----

And then...

-# dd if=/dev/zero of=mountpoint/omgwtfbbq bs=1M
dd: error writing 'mountpoint/omgwtfbbq': No space left on device
801+0 records in
800+0 records out
838860800 bytes (839 MB, 800 MiB) copied, 12.6689 s, 66.2 MB/s

We happily write 800MiB(!), destroying the first 22MiB of the device
extent at 2728394752...

for d in fs.dev_extents():
  print("start {} end {} vaddr {}".format(d.paddr, d.paddr + d.length, d.vaddr))

[...]
start 1912602624 end 2751463424 vaddr 4152885248
start 2728394752 end 3567255552 vaddr 1385693184
[...]
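The size of the collision follows directly from those two lines: the end of the new dev extent runs exactly 22MiB past the start of its neighbor (plain Python, numbers copied from the listing):

```python
# the freshly created dev extent and the pre-existing neighbor it runs
# into; numbers copied from the listing above
new_start, new_length = 1912602624, 838860800
neighbor_start = 2728394752

overlap = (new_start + new_length) - neighbor_start
print(overlap, overlap // (1024 * 1024))  # 23068672 bytes, i.e. 22MiB
```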

import btrfs
fs = btrfs.FileSystem('/bork/mountpoint/')
paddr = 1912602624
tree = btrfs.ctree.DEV_TREE_OBJECTID
devid = 1
# search key for the dev extent item: (devid, DEV_EXTENT, paddr)
key = btrfs.ctree.Key(devid, btrfs.ctree.DEV_EXTENT_KEY, paddr)
for header, data in btrfs.ioctl.search_v2(fs.fd, tree, key, key):
  btrfs.utils.pretty_print(btrfs.ctree.DevExtent(header, data))

<btrfs.ctree.DevExtent (1 DEV_EXTENT 1912602624)>
devid: 1 (key objectid)
paddr: 1912602624 (key offset)
length: 838860800
chunk_offset: 4152885248
uuid: 00000000-0000-0000-0000-000000000000
chunk_objectid: 256
chunk_tree: 3

Ouch!

-# ./show_usage.py /bork/mountpoint/

[...]
Total raw filesystem size: 7.70GiB
Total raw allocated bytes: 7.72GiB
Allocatable bytes remaining: 0.00B
Unallocatable raw bytes: -22020096.00B
[...]

---- >8 ----

And now more fun:

-# btrfs scrub status mountpoint/
scrub status for 1f7a998a-4ea3-4117-8b42-9d0e6f1d3b4c
	scrub started at Tue Oct  9 01:35:30 2018 and finished after 00:00:13
	total bytes scrubbed: 6.52GiB with 0 errors

But that's probably because it's all data from /dev/zero...

-# btrfs check /dev/loop0
Checking filesystem on /dev/loop0
UUID: 1f7a998a-4ea3-4117-8b42-9d0e6f1d3b4c
checking extents
Device extent[1, 2728394752, 838860800] existed.
Chunk[256, 228, 1385693184] stripe[1, 2728394752] dismatch dev extent[1, 1912602624, 838860800]
Dev extent's total-byte(7445938176) is not equal to byte-used(8284798976) in dev[1, 216, 1]
ERROR: errors found in extent allocation tree or chunk allocation
checking free space tree
checking fs roots
checking only csum items (without verifying data)
checking root refs
found 3917824000 bytes used, error(s) found
total csum bytes: 3722464
total tree bytes: 106020864
total fs tree bytes: 48955392
total extent tree bytes: 48644096
btree space waste bytes: 1428937
file data blocks allocated: 3811803136
 referenced 3811803136

This one actually already detects that something is wrong.

-- 
Hans van Kranenburg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
                   ` (7 preceding siblings ...)
  2018-10-05  7:51 ` Qu Wenruo
@ 2018-10-11 15:13 ` David Sterba
  2018-10-11 19:40   ` Hans van Kranenburg
  8 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2018-10-11 15:13 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: linux-btrfs

On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
> This patch set contains an additional fix for a newly exposed bug after
> the previous attempt to fix a chunk allocator bug for new DUP chunks:
> 
> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
> 
> The DUP fix is "fix more DUP stripe size handling". I did that one
> before starting to change more things so it can be applied to earlier
> LTS kernels.
> 
> Besides that patch, which is fixing the bug in a way that is least
> intrusive, I added a bunch of other patches to help getting the chunk
> allocator code in a state that is a bit less error-prone and
> bug-attracting.
> 
> When running this and trying the reproduction scenario, I can now see
> that the created DUP device extent is 827326464 bytes long, which is
> good.
> 
> I wrote and tested this on top of linus 4.19-rc5. I still need to create
> a list of related use cases and test more things to at least walk
> through a bunch of obvious use cases to see if there's nothing exploding
> too quickly with these changes. However, I'm quite confident about it,
> so I'm sharing all of it already.
> 
> Any feedback and review is appreciated. Be gentle and keep in mind that
> I'm still very much in a learning stage regarding kernel development.

The patches look good, thanks. Problem is explained, preparatory work is
separated from the fix itself.

> The stable patches handling workflow is not 100% clear to me yet. I
> guess I have to add a Fixes: in the DUP patch which points to the
> previous commit 92e222df7b.

Almost nobody does it right, no worries. If you can identify a single
patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
with version where it makes sense & applies is enough. I do that check
myself regardless of what's in the patch.

I ran the patches in a VM and hit a division-by-zero in test
fstests/btrfs/011, stacktrace below. First guess is that it's caused by
patch 3/6.

[ 3116.065595] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 1 transid 5 /dev/vdb
[ 3116.071274] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 2 transid 5 /dev/vdc
[ 3116.087086] BTRFS info (device vdb): disk space caching is enabled
[ 3116.088644] BTRFS info (device vdb): has skinny extents
[ 3116.089796] BTRFS info (device vdb): flagging fs with big metadata feature
[ 3116.093971] BTRFS info (device vdb): checking UUID tree
[ 3125.853755] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdd started
[ 3125.860269] divide error: 0000 [#1] PREEMPT SMP
[ 3125.861264] CPU: 1 PID: 6477 Comm: btrfs Not tainted 4.19.0-rc7-default+ #288
[ 3125.862841] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 3125.865385] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
[ 3125.870541] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
[ 3125.871862] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
[ 3125.873587] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
[ 3125.874677] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
[ 3125.875816] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
[ 3125.876742] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
[ 3125.877657] FS:  00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
[ 3125.878862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3125.880080] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
[ 3125.881485] Call Trace:
[ 3125.882105]  do_chunk_alloc+0x266/0x3e0 [btrfs]
[ 3125.882841]  btrfs_inc_block_group_ro+0x10e/0x160 [btrfs]
[ 3125.883875]  scrub_enumerate_chunks+0x18b/0x5d0 [btrfs]
[ 3125.884658]  ? is_module_address+0x11/0x30
[ 3125.885271]  ? wait_for_completion+0x160/0x190
[ 3125.885928]  btrfs_scrub_dev+0x1b8/0x5a0 [btrfs]
[ 3125.887767]  ? start_transaction+0xa1/0x470 [btrfs]
[ 3125.888648]  btrfs_dev_replace_start.cold.19+0x155/0x17e [btrfs]
[ 3125.889459]  btrfs_dev_replace_by_ioctl+0x35/0x60 [btrfs]
[ 3125.890251]  btrfs_ioctl+0x2a94/0x31d0 [btrfs]
[ 3125.890885]  ? do_sigaction+0x7c/0x210
[ 3125.891731]  ? do_vfs_ioctl+0xa2/0x6b0
[ 3125.892652]  do_vfs_ioctl+0xa2/0x6b0
[ 3125.893642]  ? do_sigaction+0x1a7/0x210
[ 3125.894665]  ksys_ioctl+0x3a/0x70
[ 3125.895523]  __x64_sys_ioctl+0x16/0x20
[ 3125.896339]  do_syscall_64+0x5a/0x1a0
[ 3125.896949]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3125.897638] RIP: 0033:0x7f6de28ecaa7
[ 3125.901313] RSP: 002b:00007ffe963da9c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3125.902486] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6de28ecaa7
[ 3125.903538] RDX: 00007ffe963dae00 RSI: 00000000ca289435 RDI: 0000000000000003
[ 3125.904878] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 3125.905788] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffe963de26f
[ 3125.906700] R13: 0000000000000001 R14: 0000000000000004 R15: 000055fceeceb2a0
[ 3125.907954] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
[ 3125.909342] ---[ end trace 5492bb467d3be2da ]---
[ 3125.910031] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
[ 3125.913600] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
[ 3125.914595] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
[ 3125.916209] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
[ 3125.917701] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
[ 3125.919209] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
[ 3125.920782] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
[ 3125.922413] FS:  00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
[ 3125.924264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3125.925627] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-11 15:13 ` David Sterba
@ 2018-10-11 19:40   ` Hans van Kranenburg
  2018-10-11 20:34     ` Hans van Kranenburg
  2018-11-13 15:03     ` David Sterba
  0 siblings, 2 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-11 19:40 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 10/11/2018 05:13 PM, David Sterba wrote:
> On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
>> [...]
> 
> The patches look good, thanks. Problem is explained, preparatory work is
> separated from the fix itself.

\o/

>> The stable patches handling workflow is not 100% clear to me yet. I
>> guess I have to add a Fixes: in the DUP patch which points to the
>> previous commit 92e222df7b.
> 
> Almost nobody does it right, no worries. If you can identify a single
> patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
> with version where it makes sense & applies is enough. I do that check
> myself regardless of what's in the patch.

It's 92e222df7b, and the thing I'm not sure about is whether it will
also catch the same patch where it was already backported to LTS
kernels, since 92e222df7b itself has a Fixes: tag in it... So by now the
new bug is in 4.19, 4.14, 4.9, 4.4, 3.16...

> I ran the patches in a VM and hit a division-by-zero in test
> fstests/btrfs/011, stacktrace below. First guess is that it's caused by
> patch 3/6.

Ah interesting, dev replace.

I'll play around with replace and find out how to run the tests properly
and then reproduce this.

The code introduced in patch 3 is removed again in patch 6, so I don't
suspect that one. :)

But, I'll find out.

Thanks for testing.

Hans

> [...]
> 


-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-11 19:40   ` Hans van Kranenburg
@ 2018-10-11 20:34     ` Hans van Kranenburg
  2018-11-13 15:03     ` David Sterba
  1 sibling, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-10-11 20:34 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 10/11/2018 09:40 PM, Hans van Kranenburg wrote:
> On 10/11/2018 05:13 PM, David Sterba wrote:
>> On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
>>> [...]
>>
>> The patches look good, thanks. Problem is explained, preparatory work is
>> separated from the fix itself.
> 
> \o/
> 
>>> The stable patches handling workflow is not 100% clear to me yet. I
>>> guess I have to add a Fixes: in the DUP patch which points to the
>>> previous commit 92e222df7b.
>>
>> Almost nobody does it right, no worries. If you can identify a single
>> patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
>> with version where it makes sense & applies is enough. I do that check
>> myself regardless of what's in the patch.
> 
> It's 92e222df7b, and the thing I'm not sure about is whether it will
> also catch the same patch where it was already backported to LTS
> kernels, since 92e222df7b itself has a Fixes: tag in it... So by now
> the new bug is in 4.19, 4.14, 4.9, 4.4, 3.16...
> 
>> I ran the patches in a VM and hit a division-by-zero in test
>> fstests/btrfs/011, stacktrace below. First guess is that it's caused by
>> patch 3/6.
> 
> Ah interesting, dev replace.
> 
> I'll play around with replace and find out how to run the tests properly
> and then reproduce this.
> 
> The code introduced in patch 3 is removed again in patch 6, so I don't
> suspect that one. :)

Actually, while writing this I realize that this means patch 3 should be
tested separately (like, an older kernel with only patch 3 applied),
because, well, obvious I guess.

> But, I'll find out.
> 
> Thanks for testing.
> 
> Hans
> 
>> [...]
>> [ 3125.919209] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
>> [ 3125.920782] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
>> [ 3125.922413] FS:  00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
>> [ 3125.924264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 3125.925627] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
>>
> 
> 


-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-10-11 19:40   ` Hans van Kranenburg
  2018-10-11 20:34     ` Hans van Kranenburg
@ 2018-11-13 15:03     ` David Sterba
  2018-11-13 16:45       ` Hans van Kranenburg
  1 sibling, 1 reply; 26+ messages in thread
From: David Sterba @ 2018-11-13 15:03 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: dsterba, linux-btrfs

On Thu, Oct 11, 2018 at 07:40:22PM +0000, Hans van Kranenburg wrote:
> On 10/11/2018 05:13 PM, David Sterba wrote:
> > On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
> >> This patch set contains an additional fix for a newly exposed bug after
> >> the previous attempt to fix a chunk allocator bug for new DUP chunks:
> >>
> >> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
> >>
> >> The DUP fix is "fix more DUP stripe size handling". I did that one
> >> before starting to change more things so it can be applied to earlier
> >> LTS kernels.
> >>
> >> Besides that patch, which is fixing the bug in a way that is least
> >> intrusive, I added a bunch of other patches to help getting the chunk
> >> allocator code in a state that is a bit less error-prone and
> >> bug-attracting.
> >>
> >> When running this and trying the reproduction scenario, I can now see
> >> that the created DUP device extent is 827326464 bytes long, which is
> >> good.
> >>
> >> I wrote and tested this on top of linus 4.19-rc5. I still need to create
> >> a list of related use cases and test more things to at least walk
> >> through a bunch of obvious use cases to see if there's nothing exploding
> >> too quickly with these changes. However, I'm quite confident about it,
> >> so I'm sharing all of it already.
> >>
> >> Any feedback and review is appreciated. Be gentle and keep in mind that
> >> I'm still very much in a learning stage regarding kernel development.
> > 
> > The patches look good, thanks. Problem is explained, preparatory work is
> > separated from the fix itself.
> 
> \o/
> 
> >> The stable patches handling workflow is not 100% clear to me yet. I
> >> guess I have to add a Fixes: in the DUP patch which points to the
> >> previous commit 92e222df7b.
> > 
> > Almost nobody does it right, no worries. If you can identify a single
> > patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
> > with version where it makes sense & applies is enough. I do that check
> > myself regardless of what's in the patch.
> 
> It's 92e222df7b and the thing I'm not sure about is if it also will
> catch the same patch which was already backported to LTS kernels since
> 92e222df7b also has Fixes in it... So by now the new bug is in 4.19,
> 4.14, 4.9, 4.4, 3.16...
> 
> > I ran the patches in a VM and hit a division-by-zero in test
> > fstests/btrfs/011, stacktrace below. First guess is that it's caused by
> > patch 3/6.
> 
> Ah interesting, dev replace.
> 
> I'll play around with replace and find out how to run the tests properly
> and then reproduce this.
> 
> The code introduced in patch 3 is removed again in patch 6, so I don't
> suspect that one. :)
> 
> But, I'll find out.
> 
> Thanks for testing.

I've merged patches 1-5 to misc-next as they seem to be safe and pass
fstests, please let me know when you have updates to the last one.
Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/6] Chunk allocator DUP fix and cleanups
  2018-11-13 15:03     ` David Sterba
@ 2018-11-13 16:45       ` Hans van Kranenburg
  0 siblings, 0 replies; 26+ messages in thread
From: Hans van Kranenburg @ 2018-11-13 16:45 UTC (permalink / raw)
  To: dsterba, linux-btrfs

On 11/13/18 4:03 PM, David Sterba wrote:
> On Thu, Oct 11, 2018 at 07:40:22PM +0000, Hans van Kranenburg wrote:
>> On 10/11/2018 05:13 PM, David Sterba wrote:
>>> On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
>>>> This patch set contains an additional fix for a newly exposed bug after
>>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>>
>>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>>>
>>>> The DUP fix is "fix more DUP stripe size handling". I did that one
>>>> before starting to change more things so it can be applied to earlier
>>>> LTS kernels.
>>>>
>>>> Besides that patch, which is fixing the bug in a way that is least
>>>> intrusive, I added a bunch of other patches to help getting the chunk
>>>> allocator code in a state that is a bit less error-prone and
>>>> bug-attracting.
>>>>
>>>> When running this and trying the reproduction scenario, I can now see
>>>> that the created DUP device extent is 827326464 bytes long, which is
>>>> good.
>>>>
>>>> I wrote and tested this on top of linus 4.19-rc5. I still need to create
>>>> a list of related use cases and test more things to at least walk
>>>> through a bunch of obvious use cases to see if there's nothing exploding
>>>> too quickly with these changes. However, I'm quite confident about it,
>>>> so I'm sharing all of it already.
>>>>
>>>> Any feedback and review is appreciated. Be gentle and keep in mind that
>>>> I'm still very much in a learning stage regarding kernel development.
>>>
>>> The patches look good, thanks. Problem is explained, preparatory work is
>>> separated from the fix itself.
>>
>> \o/
>>
>>>> The stable patches handling workflow is not 100% clear to me yet. I
>>>> guess I have to add a Fixes: in the DUP patch which points to the
>>>> previous commit 92e222df7b.
>>>
>>> Almost nobody does it right, no worries. If you can identify a single
>>> patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
>>> with version where it makes sense & applies is enough. I do that check
>>> myself regardless of what's in the patch.
>>
>> It's 92e222df7b and the thing I'm not sure about is if it also will
>> catch the same patch which was already backported to LTS kernels since
>> 92e222df7b also has Fixes in it... So by now the new bug is in 4.19,
>> 4.14, 4.9, 4.4, 3.16...
>>
>>> I ran the patches in a VM and hit a division-by-zero in test
>>> fstests/btrfs/011, stacktrace below. First guess is that it's caused by
>>> patch 3/6.
>>
>> Ah interesting, dev replace.
>>
>> I'll play around with replace and find out how to run the tests properly
>> and then reproduce this.
>>
>> The code introduced in patch 3 is removed again in patch 6, so I don't
>> suspect that one. :)
>>
>> But, I'll find out.
>>
>> Thanks for testing.
> 
> I've merged patches 1-5 to misc-next as they seem to be safe and pass
> fstests, please let me know when you have updates to the last one.
> Thanks.

I'll definitely follow up on that. It has not happened yet because of
something about todo lists, ordering work, and not doing too many things
at the same time.

Thanks for confirming it was not patch 3 but patch 6 that has the
problem; I already suspected that.

For patch 3, the actual fix: how do we get that on top of old stable
kernels that already have the previous fix (92e222df7b) in? Because of
its Fixes: line, that one was picked up again for 4.14, 4.9, 4.4 and
even 3.16. How does this work? Does the new patch need all the other
commit ids from those branches in Fixes: lines, or is there magic
somewhere that does this?
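To make the question concrete, this is the kind of trailer block I mean
(the quoted subject line of 92e222df7b is recalled from memory, so treat
it as an assumption):

```
btrfs: alloc_chunk: fix more DUP stripe size handling

...

Fixes: 92e222df7b ("btrfs: alloc_chunk: fix DUP stripe size handling")
Cc: stable@vger.kernel.org # 4.4+
```

My understanding is that the stable tooling keys off the mainline commit
id and applies the fix per branch itself, but confirmation would help.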

From your "development cycle open" mails, I understand that for testing
myself, I should base on misc-next, for-next or for-x.y, in that order,
depending on which of them already contains the first 5 patches at that
moment?

Hans

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2018-11-13 16:46 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-04 21:24 [PATCH 0/6] Chunk allocator DUP fix and cleanups Hans van Kranenburg
2018-10-04 21:24 ` [PATCH 1/6] btrfs: alloc_chunk: do not refurbish num_bytes Hans van Kranenburg
2018-10-05  8:59   ` Nikolay Borisov
2018-10-05  9:00     ` Nikolay Borisov
2018-10-04 21:24 ` [PATCH 2/6] btrfs: alloc_chunk: improve chunk size variable name Hans van Kranenburg
2018-10-04 21:36   ` Hans van Kranenburg
2018-10-05  9:00   ` Nikolay Borisov
2018-10-04 21:24 ` [PATCH 3/6] btrfs: alloc_chunk: fix more DUP stripe size handling Hans van Kranenburg
2018-10-04 21:24 ` [PATCH 4/6] btrfs: fix ncopies raid_attr for RAID56 Hans van Kranenburg
2018-10-05  9:10   ` Nikolay Borisov
2018-10-04 21:24 ` [PATCH 5/6] btrfs: introduce nparity raid_attr Hans van Kranenburg
2018-10-04 22:21   ` Hans van Kranenburg
2018-10-05 14:42   ` Nikolay Borisov
2018-10-05 22:45     ` Hans van Kranenburg
2018-10-04 21:24 ` [PATCH 6/6] btrfs: alloc_chunk: rework chunk/stripe calculations Hans van Kranenburg
2018-10-05  7:51 ` [PATCH 0/6] Chunk allocator DUP fix and cleanups Qu Wenruo
2018-10-05 10:58   ` Hans van Kranenburg
2018-10-08  6:43     ` Qu Wenruo
2018-10-08 13:19       ` Hans van Kranenburg
2018-10-08 23:51         ` Hans van Kranenburg
2018-10-05  7:51 ` Qu Wenruo
2018-10-11 15:13 ` David Sterba
2018-10-11 19:40   ` Hans van Kranenburg
2018-10-11 20:34     ` Hans van Kranenburg
2018-11-13 15:03     ` David Sterba
2018-11-13 16:45       ` Hans van Kranenburg
