linux-btrfs.vger.kernel.org archive mirror
* [PATCH v3 0/4] RAID1 with 3- and 4- copies
@ 2019-10-31 15:13 David Sterba
  2019-10-31 15:13 ` [PATCH v2 1/4] btrfs: add support for 3-copy replication (raid1c3) David Sterba
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: David Sterba @ 2019-10-31 15:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Here it goes again, RAID1 with 3- and 4- copies. I found the bug that stopped
it from inclusion last time, it was in the test itself, so the kernel code is
effectively unchanged.

So, with 1 or 2 missing devices, replace by device id works. There's one
annoying thing, but it's not new: when replacing a missing device, some
extra single/dup block groups are created during the replace process.
Example below. This can happen on plain raid1 with a degraded read-write
mount as well.

Now, what's the merge target?

The patches almost made it to 5.3. The changes build on existing code, so
the actual addition of the new profiles is mostly in the definitions and a
few additional cases. It should be safe.

I'm for adding it to the 5.5 queue, though we're at rc5 and this can be
seen as late for a feature. The user benefits are noticeable: raid1c3 can
replace raid6 for metadata, which is the most problematic part of raid56
and much more complicated to fix otherwise (a write-ahead journal or
something like that). The feedback regarding plain 3-copy as a replacement
was positive on IRC, and there are mails about that too.
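As a side note, the utilization comparison can be checked with a bit of
arithmetic. The sketch below is illustrative only and not part of the
patches:

```python
# Illustrative arithmetic (not part of the patches): the fraction of raw
# capacity holding unique data for the profiles compared above.
def raid1cN_utilization(ncopies):
    # every byte is stored ncopies times across the devices
    return 1 / ncopies

def raid6_utilization(ndevs):
    # two devices' worth of every stripe holds parity
    return (ndevs - 2) / ndevs

# raid1c3: ~33% utilization while tolerating 2 lost devices,
# the same failure tolerance raid6 provides.
print(round(raid1cN_utilization(3), 2))   # 0.33
# raid6 on 4 devices: 50% utilization with the same 2-device tolerance.
print(raid6_utilization(4))               # 0.5
```

So raid1c3 trades capacity for a much simpler redundancy model than raid56.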

Further information can be found in the 5.3-time submission:
https://lore.kernel.org/linux-btrfs/cover.1559917235.git.dsterba@suse.com/

--

Example of 2 devices gone missing and replaced
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 - mkfs -d raid1c3 -m raid1c3 /dev/sda10 /dev/sda11 /dev/sda12

 - delete devices 2 and 3 from the system

              Data      Metadata  System
Id Path       RAID1C3   RAID1C3   RAID1C3  Unallocated
-- ---------- --------- --------- -------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB  8.00MiB     8.74GiB
 2 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
 3 missing      1.00GiB 256.00MiB  8.00MiB    -1.26GiB
-- ---------- --------- --------- -------- -----------
   Total        1.00GiB 256.00MiB  8.00MiB     6.23GiB
   Used       200.31MiB 320.00KiB 16.00KiB

 - mount -o degraded

 - btrfs replace 2 /dev/sda13

              Data      Metadata  Metadata  System   System
Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
-- ---------- --------- --------- --------- -------- ------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
 2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
 3 missing      1.00GiB         - 256.00MiB        - 8.00MiB    -1.26GiB
-- ---------- --------- --------- --------- -------- ------- -----------
   Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    15.95GiB
   Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B


 - btrfs replace 3 /dev/sda14

              Data      Metadata  Metadata  System   System
Id Path       RAID1C3   single    RAID1C3   single   RAID1C3 Unallocated
-- ---------- --------- --------- --------- -------- ------- -----------
 1 /dev/sda10   1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB     8.46GiB
 2 /dev/sda13   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
 3 /dev/sda14   1.00GiB         - 256.00MiB        - 8.00MiB     8.74GiB
-- ---------- --------- --------- --------- -------- ------- -----------
   Total        1.00GiB 256.00MiB 256.00MiB 32.00MiB 8.00MiB    25.95GiB
   Used       200.31MiB     0.00B 320.00KiB 16.00KiB   0.00B

There you can see the metadata/single and system/single chunks that are
otherwise unused if there are no other writes happening during the
replace. Running 'btrfs balance start -mconvert=raid1c3,profiles=single'
should get rid of them.

This is an annoyance. We have a plan to avoid it, but that requires
changing the behaviour of degraded read-write mounts.

Implementation details: The new profiles are reduced to the expected
  fallbacks (raid1 -> single or dup) to allow writes without breaking the
  raid constraints.  Relaxing that condition, so that writes go to the
  "half" of the raid that's still present, would skip creating the extra
  block groups.

  This is similar to MD-RAID, which allows writing to just one of the
  RAID1 devices and then syncs to the other when it's available again.

  With the btrfs-style raid1 we can do better when there are enough other
  devices to satisfy the raid1 constraint (even with a missing device).
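The reduction above can be modeled roughly as follows. This is an
illustrative sketch, not the kernel code: the devs_min values mirror the
btrfs_raid_array entries added by the patches, while the step-by-step
fallback chain is a simplification of what the allocator actually does.

```python
# Illustrative sketch (not kernel code) of profile reduction: when a chunk
# must be allocated but too few devices are present, the profile is
# reduced toward single instead of failing the write.
DEVS_MIN = {"raid1c4": 4, "raid1c3": 3, "raid1": 2, "single": 1}
FALLBACK = {"raid1c4": "raid1c3", "raid1c3": "raid1", "raid1": "single"}

def reduce_profile(profile, present_devices):
    """Return the profile new block groups would actually get."""
    while present_devices < DEVS_MIN[profile] and profile in FALLBACK:
        profile = FALLBACK[profile]
    return profile

# raid1c3 with one of its three devices missing: two remain, so raid1
# would still satisfy a 2-copy constraint.
print(reduce_profile("raid1c3", 2))   # raid1
# Only one device left: nothing but single remains possible.
print(reduce_profile("raid1", 1))     # single
```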

--

David Sterba (4):
  btrfs: add support for 3-copy replication (raid1c3)
  btrfs: add support for 4-copy replication (raid1c4)
  btrfs: add incompat for raid1 with 3, 4 copies
  btrfs: drop incompat bit for raid1c34 after last block group is gone

 fs/btrfs/block-group.c          | 27 ++++++++++++++--------
 fs/btrfs/ctree.h                |  7 +++---
 fs/btrfs/super.c                |  4 ++++
 fs/btrfs/sysfs.c                |  2 ++
 fs/btrfs/volumes.c              | 40 +++++++++++++++++++++++++++++++--
 fs/btrfs/volumes.h              |  4 ++++
 include/uapi/linux/btrfs.h      |  5 ++++-
 include/uapi/linux/btrfs_tree.h | 10 ++++++++-
 8 files changed, 83 insertions(+), 16 deletions(-)

-- 
2.23.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/4] btrfs: add support for 3-copy replication (raid1c3)
  2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
@ 2019-10-31 15:13 ` David Sterba
  2019-10-31 15:13 ` [PATCH v2 2/4] btrfs: add support for 4-copy replication (raid1c4) David Sterba
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-31 15:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Add a new block group profile to store 3 copies in a similar way to the
current RAID1. The profile attributes and constraints are defined in the
raid table and used by the same code that already handles the 2-copy
RAID1.

The minimum number of devices is 3; the maximum number of devices/chunks
that can be lost or damaged is 2. This is like RAID6, but with 33% space
utilization.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h                |  4 ++--
 fs/btrfs/super.c                |  2 ++
 fs/btrfs/volumes.c              | 19 +++++++++++++++++--
 fs/btrfs/volumes.h              |  2 ++
 include/uapi/linux/btrfs.h      |  3 ++-
 include/uapi/linux/btrfs_tree.h |  6 +++++-
 6 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1c8f01eaf27c..aa1b437fb951 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -57,9 +57,9 @@ struct btrfs_ref;
  * filesystem data as well that can be used to read data in order to repair
  * read errors on other disks.
  *
- * Current value is derived from RAID1 with 2 copies.
+ * Current value is derived from RAID1C3 with 3 copies.
  */
-#define BTRFS_MAX_MIRRORS (2 + 1)
+#define BTRFS_MAX_MIRRORS (3 + 1)
 
 #define BTRFS_MAX_LEVEL 8
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 3f49407cc2aa..a5aff138e2e0 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1935,6 +1935,8 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
 		num_stripes = nr_devices;
 	else if (type & BTRFS_BLOCK_GROUP_RAID1)
 		num_stripes = 2;
+	else if (type & BTRFS_BLOCK_GROUP_RAID1C3)
+		num_stripes = 3;
 	else if (type & BTRFS_BLOCK_GROUP_RAID10)
 		num_stripes = 4;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f534a6a5553e..22560062269f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -58,6 +58,18 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
 	},
+	[BTRFS_RAID_RAID1C3] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 3,
+		.tolerated_failures = 2,
+		.devs_increment	= 3,
+		.ncopies	= 3,
+		.raid_name	= "raid1c3",
+		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1C3,
+		.mindev_error	= BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
+	},
 	[BTRFS_RAID_DUP] = {
 		.sub_stripes	= 1,
 		.dev_stripes	= 2,
@@ -4839,8 +4851,11 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 	     btrfs_cmp_device_info, NULL);
 
-	/* round down to number of usable stripes */
-	ndevs = round_down(ndevs, devs_increment);
+	/*
+	 * Round down to number of usable stripes, devs_increment can be any
+	 * number so we can't use round_down()
+	 */
+	ndevs -= ndevs % devs_increment;
 
 	if (ndevs < devs_min) {
 		ret = -ENOSPC;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index ac4ba8c57283..a4e26b84e1b9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -545,6 +545,8 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 		return BTRFS_RAID_RAID10;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 		return BTRFS_RAID_RAID1;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
+		return BTRFS_RAID_RAID1C3;
 	else if (flags & BTRFS_BLOCK_GROUP_DUP)
 		return BTRFS_RAID_DUP;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index 3ee0678c0a83..ba22f91a3f5b 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -831,7 +831,8 @@ enum btrfs_err_code {
 	BTRFS_ERROR_DEV_TGT_REPLACE,
 	BTRFS_ERROR_DEV_MISSING_NOT_FOUND,
 	BTRFS_ERROR_DEV_ONLY_WRITABLE,
-	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS
+	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
+	BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
 };
 
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 5160be1d7332..52b2964b0311 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -841,6 +841,7 @@ struct btrfs_dev_replace_item {
 #define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
 #define BTRFS_BLOCK_GROUP_RAID5         (1ULL << 7)
 #define BTRFS_BLOCK_GROUP_RAID6         (1ULL << 8)
+#define BTRFS_BLOCK_GROUP_RAID1C3       (1ULL << 9)
 #define BTRFS_BLOCK_GROUP_RESERVED	(BTRFS_AVAIL_ALLOC_BIT_SINGLE | \
 					 BTRFS_SPACE_INFO_GLOBAL_RSV)
 
@@ -852,6 +853,7 @@ enum btrfs_raid_types {
 	BTRFS_RAID_SINGLE,
 	BTRFS_RAID_RAID5,
 	BTRFS_RAID_RAID6,
+	BTRFS_RAID_RAID1C3,
 	BTRFS_NR_RAID_TYPES
 };
 
@@ -861,6 +863,7 @@ enum btrfs_raid_types {
 
 #define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |   \
 					 BTRFS_BLOCK_GROUP_RAID1 |   \
+					 BTRFS_BLOCK_GROUP_RAID1C3 | \
 					 BTRFS_BLOCK_GROUP_RAID5 |   \
 					 BTRFS_BLOCK_GROUP_RAID6 |   \
 					 BTRFS_BLOCK_GROUP_DUP |     \
@@ -868,7 +871,8 @@ enum btrfs_raid_types {
 #define BTRFS_BLOCK_GROUP_RAID56_MASK	(BTRFS_BLOCK_GROUP_RAID5 |   \
 					 BTRFS_BLOCK_GROUP_RAID6)
 
-#define BTRFS_BLOCK_GROUP_RAID1_MASK	(BTRFS_BLOCK_GROUP_RAID1)
+#define BTRFS_BLOCK_GROUP_RAID1_MASK	(BTRFS_BLOCK_GROUP_RAID1 |   \
+					 BTRFS_BLOCK_GROUP_RAID1C3)
 
 /*
  * We need a bit for restriper to be able to tell when chunks of type
-- 
2.23.0


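A side note on the allocator change in the patch above: the kernel's
round_down() macro works only for power-of-two alignments, while
devs_increment is 3 for raid1c3, hence the switch to plain modulo
arithmetic. A small stand-alone illustration (ordinary Python, not
kernel code):

```python
# The kernel's round_down() is a bit trick valid only for power-of-two
# alignment; the patch replaces it with modulo, which works for any step.
def round_down_pow2(n, align):
    # mirrors the kernel macro: n & ~(align - 1), align must be 2^k
    return n & ~(align - 1)

def round_down_any(n, step):
    # what the patch does: ndevs -= ndevs % devs_increment
    return n - n % step

# With a power-of-two step the two agree...
print(round_down_pow2(7, 4), round_down_any(7, 4))   # 4 4
# ...but with devs_increment == 3 the bit trick gives a wrong answer.
print(round_down_any(7, 3))    # 6, the number of usable stripes
print(round_down_pow2(7, 3))   # 5, not even a multiple of 3
```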

* [PATCH v2 2/4] btrfs: add support for 4-copy replication (raid1c4)
  2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
  2019-10-31 15:13 ` [PATCH v2 1/4] btrfs: add support for 3-copy replication (raid1c3) David Sterba
@ 2019-10-31 15:13 ` David Sterba
  2019-10-31 15:13 ` [PATCH v2 3/4] btrfs: add incompat for raid1 with 3, 4 copies David Sterba
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-31 15:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Add a new block group profile to store 4 copies in a similar way to the
current RAID1.  The profile attributes and constraints are defined in the
raid table and used by the same code that already handles the 2- and
3-copy RAID1.

The minimum number of devices is 4; the maximum number of devices/chunks
that can be lost or damaged is 3. There is no comparable traditional RAID
level; the profile is added for future needs, to accompany triple-parity
and beyond.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h                |  4 ++--
 fs/btrfs/super.c                |  2 ++
 fs/btrfs/volumes.c              | 12 ++++++++++++
 fs/btrfs/volumes.h              |  2 ++
 include/uapi/linux/btrfs.h      |  1 +
 include/uapi/linux/btrfs_tree.h |  6 +++++-
 6 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aa1b437fb951..923a8804ae94 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -57,9 +57,9 @@ struct btrfs_ref;
  * filesystem data as well that can be used to read data in order to repair
  * read errors on other disks.
  *
- * Current value is derived from RAID1C3 with 3 copies.
+ * Current value is derived from RAID1C4 with 4 copies.
  */
-#define BTRFS_MAX_MIRRORS (3 + 1)
+#define BTRFS_MAX_MIRRORS (4 + 1)
 
 #define BTRFS_MAX_LEVEL 8
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index a5aff138e2e0..a98c3c71fc54 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1937,6 +1937,8 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
 		num_stripes = 2;
 	else if (type & BTRFS_BLOCK_GROUP_RAID1C3)
 		num_stripes = 3;
+	else if (type & BTRFS_BLOCK_GROUP_RAID1C4)
+		num_stripes = 4;
 	else if (type & BTRFS_BLOCK_GROUP_RAID10)
 		num_stripes = 4;
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 22560062269f..238d814f83a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -70,6 +70,18 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1C3,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
 	},
+	[BTRFS_RAID_RAID1C4] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 4,
+		.tolerated_failures = 3,
+		.devs_increment	= 4,
+		.ncopies	= 4,
+		.raid_name	= "raid1c4",
+		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1C4,
+		.mindev_error	= BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET,
+	},
 	[BTRFS_RAID_DUP] = {
 		.sub_stripes	= 1,
 		.dev_stripes	= 2,
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index a4e26b84e1b9..46987a2da786 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -547,6 +547,8 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 		return BTRFS_RAID_RAID1;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
 		return BTRFS_RAID_RAID1C3;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C4)
+		return BTRFS_RAID_RAID1C4;
 	else if (flags & BTRFS_BLOCK_GROUP_DUP)
 		return BTRFS_RAID_DUP;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index ba22f91a3f5b..a2b761275bba 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -833,6 +833,7 @@ enum btrfs_err_code {
 	BTRFS_ERROR_DEV_ONLY_WRITABLE,
 	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
 	BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
+	BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET,
 };
 
 #define BTRFS_IOC_SNAP_CREATE _IOW(BTRFS_IOCTL_MAGIC, 1, \
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 52b2964b0311..8e322e2c7e78 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -842,6 +842,7 @@ struct btrfs_dev_replace_item {
 #define BTRFS_BLOCK_GROUP_RAID5         (1ULL << 7)
 #define BTRFS_BLOCK_GROUP_RAID6         (1ULL << 8)
 #define BTRFS_BLOCK_GROUP_RAID1C3       (1ULL << 9)
+#define BTRFS_BLOCK_GROUP_RAID1C4       (1ULL << 10)
 #define BTRFS_BLOCK_GROUP_RESERVED	(BTRFS_AVAIL_ALLOC_BIT_SINGLE | \
 					 BTRFS_SPACE_INFO_GLOBAL_RSV)
 
@@ -854,6 +855,7 @@ enum btrfs_raid_types {
 	BTRFS_RAID_RAID5,
 	BTRFS_RAID_RAID6,
 	BTRFS_RAID_RAID1C3,
+	BTRFS_RAID_RAID1C4,
 	BTRFS_NR_RAID_TYPES
 };
 
@@ -864,6 +866,7 @@ enum btrfs_raid_types {
 #define BTRFS_BLOCK_GROUP_PROFILE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |   \
 					 BTRFS_BLOCK_GROUP_RAID1 |   \
 					 BTRFS_BLOCK_GROUP_RAID1C3 | \
+					 BTRFS_BLOCK_GROUP_RAID1C4 | \
 					 BTRFS_BLOCK_GROUP_RAID5 |   \
 					 BTRFS_BLOCK_GROUP_RAID6 |   \
 					 BTRFS_BLOCK_GROUP_DUP |     \
@@ -872,7 +875,8 @@ enum btrfs_raid_types {
 					 BTRFS_BLOCK_GROUP_RAID6)
 
 #define BTRFS_BLOCK_GROUP_RAID1_MASK	(BTRFS_BLOCK_GROUP_RAID1 |   \
-					 BTRFS_BLOCK_GROUP_RAID1C3)
+					 BTRFS_BLOCK_GROUP_RAID1C3 | \
+					 BTRFS_BLOCK_GROUP_RAID1C4)
 
 /*
  * We need a bit for restriper to be able to tell when chunks of type
-- 
2.23.0



* [PATCH v2 3/4] btrfs: add incompat for raid1 with 3, 4 copies
  2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
  2019-10-31 15:13 ` [PATCH v2 1/4] btrfs: add support for 3-copy replication (raid1c3) David Sterba
  2019-10-31 15:13 ` [PATCH v2 2/4] btrfs: add support for 4-copy replication (raid1c4) David Sterba
@ 2019-10-31 15:13 ` David Sterba
  2019-10-31 15:13 ` [PATCH v2 4/4] btrfs: drop incompat bit for raid1c34 after last block group is gone David Sterba
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-31 15:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

The new raid1c3 and raid1c4 profiles are backward incompatible. The
incompat feature name is 'raid1c34'; the status can be found in the
globally supported features in /sys/fs/btrfs/features or in the
per-filesystem directory.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ctree.h           | 3 ++-
 fs/btrfs/sysfs.c           | 2 ++
 fs/btrfs/volumes.c         | 9 +++++++++
 include/uapi/linux/btrfs.h | 1 +
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 923a8804ae94..e76b3cda13e3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -292,7 +292,8 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES	|	\
-	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
+	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID	|	\
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34)
 
 #define BTRFS_FEATURE_INCOMPAT_SAFE_SET			\
 	(BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF)
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 4a78bc4ec62e..1725578c5464 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -259,6 +259,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(skinny_metadata, SKINNY_METADATA);
 BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
+BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
 
 /*
 static struct btrfs_feature_attr btrfs_attr_features_checksums_name = {
@@ -283,6 +284,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(no_holes),
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
+	BTRFS_FEAT_ATTR_PTR(raid1c34),
 	NULL
 };
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 238d814f83a1..a674a960c7be 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4717,6 +4717,14 @@ static void check_raid56_incompat_flag(struct btrfs_fs_info *info, u64 type)
 	btrfs_set_fs_incompat(info, RAID56);
 }
 
+static void check_raid1c34_incompat_flag(struct btrfs_fs_info *info, u64 type)
+{
+	if (!(type & (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)))
+		return;
+
+	btrfs_set_fs_incompat(info, RAID1C34);
+}
+
 static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			       u64 start, u64 type)
 {
@@ -4983,6 +4991,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 
 	free_extent_map(em);
 	check_raid56_incompat_flag(info, type);
+	check_raid1c34_incompat_flag(info, type);
 
 	kfree(devices_info);
 	return 0;
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index a2b761275bba..7a8bc8b920f5 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -270,6 +270,7 @@ struct btrfs_ioctl_fs_info_args {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID	(1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
 
 struct btrfs_ioctl_feature_flags {
 	__u64 compat_flags;
-- 
2.23.0


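The effect of the incompat bit added in the patch above can be sketched
as a mask check: a kernel mounts a filesystem read-write only if it
supports every incompat bit set in the superblock. Only the RAID1C34 bit
value below is taken from the uapi header; the "old kernel" mask is a
hypothetical stand-in.

```python
# Sketch of the incompat handshake (illustrative, not kernel code).
INCOMPAT_RAID1C34 = 1 << 11            # matches the uapi header above
SUPP_MASK_OLD = (1 << 11) - 1          # hypothetical kernel without the patch
SUPP_MASK_NEW = SUPP_MASK_OLD | INCOMPAT_RAID1C34

def can_mount(sb_incompat, supported):
    # any superblock bit left over after masking means refuse the mount
    return (sb_incompat & ~supported) == 0

print(can_mount(INCOMPAT_RAID1C34, SUPP_MASK_OLD))   # False
print(can_mount(INCOMPAT_RAID1C34, SUPP_MASK_NEW))   # True
```

This is why patch 4 drops the bit again once no raid1c34 block groups
remain: old kernels can then mount the filesystem again.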

* [PATCH v2 4/4] btrfs: drop incompat bit for raid1c34 after last block group is gone
  2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
                   ` (2 preceding siblings ...)
  2019-10-31 15:13 ` [PATCH v2 3/4] btrfs: add incompat for raid1 with 3, 4 copies David Sterba
@ 2019-10-31 15:13 ` David Sterba
  2019-10-31 18:43 ` [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 David Sterba
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-31 15:13 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

When there are no raid1c3 or raid1c4 block groups left after balance
(either convert or with other filters applied), remove the incompat bit.
This is already done for RAID56; do the same for RAID1C34.

Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/block-group.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 1e521db3ef56..9ce9c2e318cf 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -828,27 +828,36 @@ static void clear_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
  *
  * - RAID56 - in case there's neither RAID5 nor RAID6 profile block group
  *            in the whole filesystem
+ *
+ * - RAID1C34 - same as above for RAID1C3 and RAID1C4 block groups
  */
 static void clear_incompat_bg_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
-	if (flags & BTRFS_BLOCK_GROUP_RAID56_MASK) {
+	bool found_raid56 = false;
+	bool found_raid1c34 = false;
+
+	if ((flags & BTRFS_BLOCK_GROUP_RAID56_MASK) ||
+	    (flags & BTRFS_BLOCK_GROUP_RAID1C3) ||
+	    (flags & BTRFS_BLOCK_GROUP_RAID1C4)) {
 		struct list_head *head = &fs_info->space_info;
 		struct btrfs_space_info *sinfo;
 
 		list_for_each_entry_rcu(sinfo, head, list) {
-			bool found = false;
-
 			down_read(&sinfo->groups_sem);
 			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID5]))
-				found = true;
+				found_raid56 = true;
 			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID6]))
-				found = true;
+				found_raid56 = true;
+			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID1C3]))
+				found_raid1c34 = true;
+			if (!list_empty(&sinfo->block_groups[BTRFS_RAID_RAID1C4]))
+				found_raid1c34 = true;
 			up_read(&sinfo->groups_sem);
-
-			if (found)
-				return;
 		}
-		btrfs_clear_fs_incompat(fs_info, RAID56);
+		if (!found_raid56)
+			btrfs_clear_fs_incompat(fs_info, RAID56);
+		if (!found_raid1c34)
+			btrfs_clear_fs_incompat(fs_info, RAID1C34);
 	}
 }
 
-- 
2.23.0


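The scan in clear_incompat_bg_bits() above boils down to: drop a
family's incompat bit only when no block group of that family remains
anywhere on the filesystem. An illustrative model (plain Python, not
kernel code; names are hypothetical):

```python
# Model of the incompat-bit cleanup: given the set of block group
# profiles still present, keep only the incompat bits whose profile
# family still has at least one block group.
def remaining_incompat_bits(block_groups, incompat):
    families = {
        "raid56":   {"raid5", "raid6"},
        "raid1c34": {"raid1c3", "raid1c4"},
    }
    return {bit for bit in incompat
            if bit not in families or families[bit] & block_groups}

# Converting the last raid1c3 group away clears raid1c34 but not raid56.
bits = remaining_incompat_bits({"raid5", "single"}, {"raid56", "raid1c34"})
print(sorted(bits))   # ['raid56']
```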

* [PATCH] btrfs-progs: add support for raid1c3 and raid1c4
  2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
                   ` (3 preceding siblings ...)
  2019-10-31 15:13 ` [PATCH v2 4/4] btrfs: drop incompat bit for raid1c34 after last block group is gone David Sterba
@ 2019-10-31 18:43 ` David Sterba
  2019-10-31 18:44 ` [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
  2019-11-01 14:54 ` Neal Gompa
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-31 18:43 UTC (permalink / raw)
  To: linux-btrfs; +Cc: David Sterba

Add support for the 3- and 4-copy variants of RAID1. This adds resiliency
against the loss or damage of 2 or 3 devices, respectively.

$ ./mkfs.btrfs -m raid1c4 -d raid1c3 /dev/sd[abcd]

Label:              (null)
UUID:               f1f988ab-6750-4bc2-957b-98a4ebe98631
Node size:          16384
Sector size:        4096
Filesystem size:    8.00GiB
Block group profiles:
  Data:             RAID1C3         273.06MiB
  Metadata:         RAID1C4         204.75MiB
  System:           RAID1C4           8.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata, raid1c34
Number of devices:  4
Devices:
   ID        SIZE  PATH
    1     2.00GiB  /dev/sda
    2     2.00GiB  /dev/sdb
    3     2.00GiB  /dev/sdc
    4     2.00GiB  /dev/sdd

Signed-off-by: David Sterba <dsterba@suse.com>
---
 cmds/balance.c              |  4 ++++
 cmds/filesystem-usage.c     |  8 +++++++
 cmds/inspect-dump-super.c   |  3 ++-
 cmds/rescue-chunk-recover.c |  4 ++++
 common/fsfeatures.c         |  6 +++++
 common/utils.c              | 12 +++++++++-
 ctree.h                     |  8 +++++++
 extent-tree.c               |  4 ++++
 ioctl.h                     |  4 +++-
 mkfs/main.c                 | 11 ++++++++-
 print-tree.c                |  6 +++++
 volumes.c                   | 48 +++++++++++++++++++++++++++++++++++--
 volumes.h                   |  4 ++++
 13 files changed, 116 insertions(+), 6 deletions(-)

diff --git a/cmds/balance.c b/cmds/balance.c
index 32830002f3a0..2d0fb6ef52ed 100644
--- a/cmds/balance.c
+++ b/cmds/balance.c
@@ -46,6 +46,10 @@ static int parse_one_profile(const char *profile, u64 *flags)
 		*flags |= BTRFS_BLOCK_GROUP_RAID0;
 	} else if (!strcmp(profile, "raid1")) {
 		*flags |= BTRFS_BLOCK_GROUP_RAID1;
+	} else if (!strcmp(profile, "raid1c3")) {
+		*flags |= BTRFS_BLOCK_GROUP_RAID1C3;
+	} else if (!strcmp(profile, "raid1c4")) {
+		*flags |= BTRFS_BLOCK_GROUP_RAID1C4;
 	} else if (!strcmp(profile, "raid10")) {
 		*flags |= BTRFS_BLOCK_GROUP_RAID10;
 	} else if (!strcmp(profile, "raid5")) {
diff --git a/cmds/filesystem-usage.c b/cmds/filesystem-usage.c
index 212322188d19..744ff2de5a7f 100644
--- a/cmds/filesystem-usage.c
+++ b/cmds/filesystem-usage.c
@@ -374,6 +374,10 @@ static int print_filesystem_usage_overall(int fd, struct chunk_info *chunkinfo,
 			ratio = 1;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 			ratio = 2;
+		else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
+			ratio = 3;
+		else if (flags & BTRFS_BLOCK_GROUP_RAID1C4)
+			ratio = 4;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID5)
 			ratio = 0;
 		else if (flags & BTRFS_BLOCK_GROUP_RAID6)
@@ -654,6 +658,10 @@ static u64 calc_chunk_size(struct chunk_info *ci)
 		return ci->size / ci->num_stripes;
 	else if (ci->type & BTRFS_BLOCK_GROUP_RAID1)
 		return ci->size ;
+	else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C3)
+		return ci->size;
+	else if (ci->type & BTRFS_BLOCK_GROUP_RAID1C4)
+		return ci->size;
 	else if (ci->type & BTRFS_BLOCK_GROUP_DUP)
 		return ci->size ;
 	else if (ci->type & BTRFS_BLOCK_GROUP_RAID5)
diff --git a/cmds/inspect-dump-super.c b/cmds/inspect-dump-super.c
index bf380ad2b56a..b32a5ebecc86 100644
--- a/cmds/inspect-dump-super.c
+++ b/cmds/inspect-dump-super.c
@@ -227,7 +227,8 @@ static struct readable_flag_entry incompat_flags_array[] = {
 	DEF_INCOMPAT_FLAG_ENTRY(RAID56),
 	DEF_INCOMPAT_FLAG_ENTRY(SKINNY_METADATA),
 	DEF_INCOMPAT_FLAG_ENTRY(NO_HOLES),
-	DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID)
+	DEF_INCOMPAT_FLAG_ENTRY(METADATA_UUID),
+	DEF_INCOMPAT_FLAG_ENTRY(RAID1C34),
 };
 static const int incompat_flags_num = sizeof(incompat_flags_array) /
 				      sizeof(struct readable_flag_entry);
diff --git a/cmds/rescue-chunk-recover.c b/cmds/rescue-chunk-recover.c
index 329a608dfc6b..5d573161905f 100644
--- a/cmds/rescue-chunk-recover.c
+++ b/cmds/rescue-chunk-recover.c
@@ -1582,6 +1582,10 @@ static int calc_num_stripes(u64 type)
 	else if (type & (BTRFS_BLOCK_GROUP_RAID1 |
 			 BTRFS_BLOCK_GROUP_DUP))
 		return 2;
+	else if (type & (BTRFS_BLOCK_GROUP_RAID1C3))
+		return 3;
+	else if (type & (BTRFS_BLOCK_GROUP_RAID1C4))
+		return 4;
 	else
 		return 1;
 }
diff --git a/common/fsfeatures.c b/common/fsfeatures.c
index 50934bd161b0..ac12d57b25a3 100644
--- a/common/fsfeatures.c
+++ b/common/fsfeatures.c
@@ -86,6 +86,12 @@ static const struct btrfs_fs_feature {
 		VERSION_TO_STRING2(4,0),
 		NULL, 0,
 		"no explicit hole extents for files" },
+	{ "raid1c34", BTRFS_FEATURE_INCOMPAT_RAID1C34,
+		"raid1c34",
+		VERSION_TO_STRING2(5,5),
+		NULL, 0,
+		NULL, 0,
+		"RAID1 with 3 or 4 copies" },
 	/* Keep this one last */
 	{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
diff --git a/common/utils.c b/common/utils.c
index 2cf15c333f6b..23e0a7927172 100644
--- a/common/utils.c
+++ b/common/utils.c
@@ -1117,8 +1117,10 @@ static int group_profile_devs_min(u64 flag)
 	case BTRFS_BLOCK_GROUP_RAID5:
 		return 2;
 	case BTRFS_BLOCK_GROUP_RAID6:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
 		return 3;
 	case BTRFS_BLOCK_GROUP_RAID10:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
 		return 4;
 	default:
 		return -1;
@@ -1135,9 +1137,10 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
 	default:
 	case 4:
 		allowed |= BTRFS_BLOCK_GROUP_RAID10;
+		allowed |= BTRFS_BLOCK_GROUP_RAID10 | BTRFS_BLOCK_GROUP_RAID1C4;
 		__attribute__ ((fallthrough));
 	case 3:
-		allowed |= BTRFS_BLOCK_GROUP_RAID6;
+		allowed |= BTRFS_BLOCK_GROUP_RAID6 | BTRFS_BLOCK_GROUP_RAID1C3;
 		__attribute__ ((fallthrough));
 	case 2:
 		allowed |= BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
@@ -1191,7 +1194,10 @@ int group_profile_max_safe_loss(u64 flags)
 	case BTRFS_BLOCK_GROUP_RAID10:
 		return 1;
 	case BTRFS_BLOCK_GROUP_RAID6:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
 		return 2;
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		return 3;
 	default:
 		return -1;
 	}
@@ -1341,6 +1347,10 @@ const char* btrfs_group_profile_str(u64 flag)
 		return "RAID0";
 	case BTRFS_BLOCK_GROUP_RAID1:
 		return "RAID1";
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+		return "RAID1C3";
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		return "RAID1C4";
 	case BTRFS_BLOCK_GROUP_RAID5:
 		return "RAID5";
 	case BTRFS_BLOCK_GROUP_RAID6:
diff --git a/ctree.h b/ctree.h
index b2745e1e8f13..f5227c053eb2 100644
--- a/ctree.h
+++ b/ctree.h
@@ -489,6 +489,7 @@ struct btrfs_super_block {
 #define BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA	(1ULL << 8)
 #define BTRFS_FEATURE_INCOMPAT_NO_HOLES		(1ULL << 9)
 #define BTRFS_FEATURE_INCOMPAT_METADATA_UUID    (1ULL << 10)
+#define BTRFS_FEATURE_INCOMPAT_RAID1C34		(1ULL << 11)
 
 #define BTRFS_FEATURE_COMPAT_SUPP		0ULL
 
@@ -512,6 +513,7 @@ struct btrfs_super_block {
 	 BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS |		\
 	 BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA |	\
 	 BTRFS_FEATURE_INCOMPAT_NO_HOLES |		\
+	 BTRFS_FEATURE_INCOMPAT_RAID1C34 |		\
 	 BTRFS_FEATURE_INCOMPAT_METADATA_UUID)
 
 /*
@@ -961,6 +963,8 @@ struct btrfs_csum_item {
 #define BTRFS_BLOCK_GROUP_RAID10	(1ULL << 6)
 #define BTRFS_BLOCK_GROUP_RAID5    	(1ULL << 7)
 #define BTRFS_BLOCK_GROUP_RAID6    	(1ULL << 8)
+#define BTRFS_BLOCK_GROUP_RAID1C3    	(1ULL << 9)
+#define BTRFS_BLOCK_GROUP_RAID1C4    	(1ULL << 10)
 #define BTRFS_BLOCK_GROUP_RESERVED	BTRFS_AVAIL_ALLOC_BIT_SINGLE
 
 enum btrfs_raid_types {
@@ -971,6 +975,8 @@ enum btrfs_raid_types {
 	BTRFS_RAID_SINGLE,
 	BTRFS_RAID_RAID5,
 	BTRFS_RAID_RAID6,
+	BTRFS_RAID_RAID1C3,
+	BTRFS_RAID_RAID1C4,
 	BTRFS_NR_RAID_TYPES
 };
 
@@ -982,6 +988,8 @@ enum btrfs_raid_types {
 					 BTRFS_BLOCK_GROUP_RAID1 |   \
 					 BTRFS_BLOCK_GROUP_RAID5 |   \
 					 BTRFS_BLOCK_GROUP_RAID6 |   \
+					 BTRFS_BLOCK_GROUP_RAID1C3 | \
+					 BTRFS_BLOCK_GROUP_RAID1C4 | \
 					 BTRFS_BLOCK_GROUP_DUP |     \
 					 BTRFS_BLOCK_GROUP_RAID10)
 
diff --git a/extent-tree.c b/extent-tree.c
index 662fb1fa2b9a..d5cd13bd4328 100644
--- a/extent-tree.c
+++ b/extent-tree.c
@@ -1669,6 +1669,8 @@ static void set_avail_alloc_bits(struct btrfs_fs_info *fs_info, u64 flags)
 {
 	u64 extra_flags = flags & (BTRFS_BLOCK_GROUP_RAID0 |
 				   BTRFS_BLOCK_GROUP_RAID1 |
+				   BTRFS_BLOCK_GROUP_RAID1C3 |
+				   BTRFS_BLOCK_GROUP_RAID1C4 |
 				   BTRFS_BLOCK_GROUP_RAID10 |
 				   BTRFS_BLOCK_GROUP_RAID5 |
 				   BTRFS_BLOCK_GROUP_RAID6 |
@@ -3104,6 +3106,8 @@ static u64 get_dev_extent_len(struct map_lookup *map)
 	case 0: /* Single */
 	case BTRFS_BLOCK_GROUP_DUP:
 	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
 		div = 1;
 		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
diff --git a/ioctl.h b/ioctl.h
index 66ee599f7a82..d3dfd6375de1 100644
--- a/ioctl.h
+++ b/ioctl.h
@@ -775,7 +775,9 @@ enum btrfs_err_code {
 	BTRFS_ERROR_DEV_TGT_REPLACE,
 	BTRFS_ERROR_DEV_MISSING_NOT_FOUND,
 	BTRFS_ERROR_DEV_ONLY_WRITABLE,
-	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS
+	BTRFS_ERROR_DEV_EXCL_RUN_IN_PROGRESS,
+	BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
+	BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET,
 };
 
 /* An error code to error string mapping for the kernel
diff --git a/mkfs/main.c b/mkfs/main.c
index f52e8b61a460..dd1223f703e4 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -337,7 +337,7 @@ static void print_usage(int ret)
 	printf("Usage: mkfs.btrfs [options] dev [ dev ... ]\n");
 	printf("Options:\n");
 	printf("  allocation profiles:\n");
-	printf("\t-d|--data PROFILE       data profile, raid0, raid1, raid5, raid6, raid10, dup or single\n");
+	printf("\t-d|--data PROFILE       data profile, raid0, raid1, raid1c3, raid1c4, raid5, raid6, raid10, dup or single\n");
 	printf("\t-m|--metadata PROFILE   metadata profile, values like for data profile\n");
 	printf("\t-M|--mixed              mix metadata and data together\n");
 	printf("  features:\n");
@@ -370,6 +370,10 @@ static u64 parse_profile(const char *s)
 		return BTRFS_BLOCK_GROUP_RAID0;
 	} else if (strcasecmp(s, "raid1") == 0) {
 		return BTRFS_BLOCK_GROUP_RAID1;
+	} else if (strcasecmp(s, "raid1c3") == 0) {
+		return BTRFS_BLOCK_GROUP_RAID1C3;
+	} else if (strcasecmp(s, "raid1c4") == 0) {
+		return BTRFS_BLOCK_GROUP_RAID1C4;
 	} else if (strcasecmp(s, "raid5") == 0) {
 		return BTRFS_BLOCK_GROUP_RAID5;
 	} else if (strcasecmp(s, "raid6") == 0) {
@@ -1065,6 +1069,11 @@ int BOX_MAIN(mkfs)(int argc, char **argv)
 		features |= BTRFS_FEATURE_INCOMPAT_RAID56;
 	}
 
+	if ((data_profile | metadata_profile) &
+	    (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4)) {
+		features |= BTRFS_FEATURE_INCOMPAT_RAID1C34;
+	}
+
 	if (btrfs_check_nodesize(nodesize, sectorsize,
 				 features))
 		goto error;
diff --git a/print-tree.c b/print-tree.c
index f70ce6844a7e..35ab9234cf48 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -162,6 +162,12 @@ static void bg_flags_to_str(u64 flags, char *ret)
 	case BTRFS_BLOCK_GROUP_RAID1:
 		strcat(ret, "|RAID1");
 		break;
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+		strcat(ret, "|RAID1C3");
+		break;
+	case BTRFS_BLOCK_GROUP_RAID1C4:
+		strcat(ret, "|RAID1C4");
+		break;
 	case BTRFS_BLOCK_GROUP_DUP:
 		strcat(ret, "|DUP");
 		break;
diff --git a/volumes.c b/volumes.c
index fbbc22b5b1b3..63e7fba975cf 100644
--- a/volumes.c
+++ b/volumes.c
@@ -57,6 +57,28 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1,
 		.mindev_error	= BTRFS_ERROR_DEV_RAID1_MIN_NOT_MET,
 	},
+	[BTRFS_RAID_RAID1C3] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 3,
+		.tolerated_failures = 2,
+		.devs_increment	= 3,
+		.ncopies	= 3,
+		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1C3,
+		.mindev_error	= BTRFS_ERROR_DEV_RAID1C3_MIN_NOT_MET,
+	},
+	[BTRFS_RAID_RAID1C4] = {
+		.sub_stripes	= 1,
+		.dev_stripes	= 1,
+		.devs_max	= 0,
+		.devs_min	= 4,
+		.tolerated_failures = 3,
+		.devs_increment	= 4,
+		.ncopies	= 4,
+		.bg_flag	= BTRFS_BLOCK_GROUP_RAID1C4,
+		.mindev_error	= BTRFS_ERROR_DEV_RAID1C4_MIN_NOT_MET,
+	},
 	[BTRFS_RAID_DUP] = {
 		.sub_stripes	= 1,
 		.dev_stripes	= 2,
@@ -854,6 +872,8 @@ static u64 chunk_bytes_by_type(u64 type, u64 calc_size, int num_stripes,
 {
 	if (type & (BTRFS_BLOCK_GROUP_RAID1 | BTRFS_BLOCK_GROUP_DUP))
 		return calc_size;
+	else if (type & (BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4))
+		return calc_size;
 	else if (type & BTRFS_BLOCK_GROUP_RAID10)
 		return calc_size * (num_stripes / sub_stripes);
 	else if (type & BTRFS_BLOCK_GROUP_RAID5)
@@ -1034,6 +1054,20 @@ int btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 			return -ENOSPC;
 		min_stripes = 2;
 	}
+	if (type & BTRFS_BLOCK_GROUP_RAID1C3) {
+		num_stripes = min_t(u64, 3,
+				  btrfs_super_num_devices(info->super_copy));
+		if (num_stripes < 3)
+			return -ENOSPC;
+		min_stripes = 3;
+	}
+	if (type & BTRFS_BLOCK_GROUP_RAID1C4) {
+		num_stripes = min_t(u64, 4,
+				  btrfs_super_num_devices(info->super_copy));
+		if (num_stripes < 4)
+			return -ENOSPC;
+		min_stripes = 4;
+	}
 	if (type & BTRFS_BLOCK_GROUP_DUP) {
 		num_stripes = 2;
 		min_stripes = 2;
@@ -1382,7 +1416,8 @@ int btrfs_num_copies(struct btrfs_fs_info *fs_info, u64 logical, u64 len)
 	}
 	map = container_of(ce, struct map_lookup, ce);
 
-	if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1))
+	if (map->type & (BTRFS_BLOCK_GROUP_DUP | BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4))
 		ret = map->num_stripes;
 	else if (map->type & BTRFS_BLOCK_GROUP_RAID10)
 		ret = map->sub_stripes;
@@ -1578,6 +1613,8 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 
 	if (rw == WRITE) {
 		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+				 BTRFS_BLOCK_GROUP_RAID1C3 |
+				 BTRFS_BLOCK_GROUP_RAID1C4 |
 				 BTRFS_BLOCK_GROUP_DUP)) {
 			stripes_required = map->num_stripes;
 		} else if (map->type & BTRFS_BLOCK_GROUP_RAID10) {
@@ -1620,6 +1657,7 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 	stripe_offset = offset - stripe_offset;
 
 	if (map->type & (BTRFS_BLOCK_GROUP_RAID0 | BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID1C3 | BTRFS_BLOCK_GROUP_RAID1C4 |
 			 BTRFS_BLOCK_GROUP_RAID5 | BTRFS_BLOCK_GROUP_RAID6 |
 			 BTRFS_BLOCK_GROUP_RAID10 |
 			 BTRFS_BLOCK_GROUP_DUP)) {
@@ -1635,7 +1673,9 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 
 	multi->num_stripes = 1;
 	stripe_index = 0;
-	if (map->type & BTRFS_BLOCK_GROUP_RAID1) {
+	if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID1C3 |
+			 BTRFS_BLOCK_GROUP_RAID1C4)) {
 		if (rw == WRITE)
 			multi->num_stripes = map->num_stripes;
 		else if (mirror_num)
@@ -1905,6 +1945,8 @@ int btrfs_check_chunk_valid(struct btrfs_fs_info *fs_info,
 	if ((type & BTRFS_BLOCK_GROUP_RAID10 && (sub_stripes != 2 ||
 		  !IS_ALIGNED(num_stripes, sub_stripes))) ||
 	    (type & BTRFS_BLOCK_GROUP_RAID1 && num_stripes < 1) ||
+	    (type & BTRFS_BLOCK_GROUP_RAID1C3 && num_stripes < 3) ||
+	    (type & BTRFS_BLOCK_GROUP_RAID1C4 && num_stripes < 4) ||
 	    (type & BTRFS_BLOCK_GROUP_RAID5 && num_stripes < 2) ||
 	    (type & BTRFS_BLOCK_GROUP_RAID6 && num_stripes < 3) ||
 	    (type & BTRFS_BLOCK_GROUP_DUP && num_stripes > 2) ||
@@ -2464,6 +2506,8 @@ u64 btrfs_stripe_length(struct btrfs_fs_info *fs_info,
 	switch (profile) {
 	case 0: /* Single profile */
 	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID1C3:
+	case BTRFS_BLOCK_GROUP_RAID1C4:
 	case BTRFS_BLOCK_GROUP_DUP:
 		stripe_len = chunk_len;
 		break;
diff --git a/volumes.h b/volumes.h
index 586588c871ab..a6351dcf0bc3 100644
--- a/volumes.h
+++ b/volumes.h
@@ -135,6 +135,10 @@ static inline enum btrfs_raid_types btrfs_bg_flags_to_raid_index(u64 flags)
 		return BTRFS_RAID_RAID10;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID1)
 		return BTRFS_RAID_RAID1;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C3)
+		return BTRFS_RAID_RAID1C3;
+	else if (flags & BTRFS_BLOCK_GROUP_RAID1C4)
+		return BTRFS_RAID_RAID1C4;
 	else if (flags & BTRFS_BLOCK_GROUP_DUP)
 		return BTRFS_RAID_DUP;
 	else if (flags & BTRFS_BLOCK_GROUP_RAID0)
-- 
2.23.0
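The per-profile parameters in the btrfs_raid_array additions follow a simple pattern for mirror-style profiles: n copies need at least n devices, tolerate n-1 failures, and a chunk consumes calc_size on each stripe. A rough Python model of those relationships (the helper names and dict layout are illustrative, not taken from the btrfs sources):

```python
# Illustrative model of the mirror-profile parameters added by this patch.
# Values mirror the btrfs_raid_array entries; function names are made up.
RAID_ATTRS = {
    "raid1":   {"ncopies": 2, "devs_min": 2, "tolerated_failures": 1},
    "raid1c3": {"ncopies": 3, "devs_min": 3, "tolerated_failures": 2},
    "raid1c4": {"ncopies": 4, "devs_min": 4, "tolerated_failures": 3},
}

def chunk_bytes_by_type(profile: str, calc_size: int) -> int:
    """For RAID1-style profiles every stripe holds a full copy, so the
    usable chunk size equals calc_size regardless of the copy count
    (cf. chunk_bytes_by_type in volumes.c above)."""
    assert profile in RAID_ATTRS
    return calc_size

def num_stripes_for(profile: str, num_devices: int) -> int:
    """Mimics the -ENOSPC check in btrfs_alloc_chunk: a raid1cN chunk
    needs at least N devices and then uses exactly N stripes."""
    attrs = RAID_ATTRS[profile]
    if num_devices < attrs["devs_min"]:
        raise ValueError("not enough devices (-ENOSPC)")
    return min(attrs["ncopies"], num_devices)
```

Note that tolerated_failures == ncopies - 1 for every entry, which is what makes raid1c3 a candidate replacement for raid6 metadata: two device losses survivable without any parity reconstruction.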


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
  2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
                   ` (4 preceding siblings ...)
  2019-10-31 18:43 ` [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 David Sterba
@ 2019-10-31 18:44 ` David Sterba
  2019-11-01 14:54 ` Neal Gompa
  6 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-10-31 18:44 UTC (permalink / raw)
  To: David Sterba; +Cc: linux-btrfs

The kernel code can be pulled from (based on misc-next)

  git://github.com/kdave/btrfs-devel.git dev/raid1c3-5.5-final

and for btrfs-progs (based on 5.3.1)

  git://github.com/kdave/btrfs-progs.git dev/raid1c34


* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
  2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
                   ` (5 preceding siblings ...)
  2019-10-31 18:44 ` [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
@ 2019-11-01 14:54 ` Neal Gompa
  2019-11-01 15:09   ` David Sterba
  6 siblings, 1 reply; 13+ messages in thread
From: Neal Gompa @ 2019-11-01 14:54 UTC (permalink / raw)
  To: David Sterba; +Cc: Btrfs BTRFS

On Thu, Oct 31, 2019 at 11:17 AM David Sterba <dsterba@suse.com> wrote:
>
> Here it goes again, RAID1 with 3- and 4- copies. I found the bug that stopped
> it from inclusion last time, it was in the test itself, so the kernel code is
> effectively unchanged.
>
> So, with 1 or 2 missing devices, replace by device id works. There's one
> annoying thing but not new: regarding replace of a missing device, some
> extra single/dup block groups are created during the replace process.
> Example below. This can happen on plain raid1 with degraded read-write
> mount as well.
>
> Now what's the merge target.
>
> The patches almost made it to 5.3, the changes build on existing code so the
> actual addition of new profiles is namely in the definitions and additional
> cases. So it should be safe.
>
> I'm for adding it to 5.5 queue, though we're at rc5 and this can be seen as a
> late time for a feature. The user benefits are noticeable, raid1c3 can replace
> raid6 of metadata which is the most problematic part and much more complicated
> to fix (write ahead journal or something like that). The feedback regarding the
> plain 3-copy as a replacement was positive, on IRC and there are mails about
> that too.
>

What's the reasoning for not submitting this for 5.4? I think the
improvements here are definitely worth pulling into the 5.4 kernel
release...

-- 
真実はいつも一つ!/ Always, there's only one truth!


* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
  2019-11-01 14:54 ` Neal Gompa
@ 2019-11-01 15:09   ` David Sterba
  2019-11-03  0:35     ` waxhead
  2019-11-14  5:13     ` Zygo Blaxell
  0 siblings, 2 replies; 13+ messages in thread
From: David Sterba @ 2019-11-01 15:09 UTC (permalink / raw)
  To: Neal Gompa; +Cc: David Sterba, Btrfs BTRFS

On Fri, Nov 01, 2019 at 10:54:45AM -0400, Neal Gompa wrote:
> What's the reasoning for not submitting this for 5.4? I think the
> improvements here are definitely worth pulling into the 5.4 kernel
> release...

Because 5.4 is at rc5, new features can be merged only during the merge
window, i.e. before 5.4-rc1. That's more than a month ago. From rc1 to
the final release only regression fixes and bug fixes are applied, so
you can see pull requests, but the subject lines almost always contain
'fix'.

A new feature has to be in the development branch at least two weeks
before the merge window opens (for testing), so right now is the last
opportunity to get it into 5.5; 5.4 is out of the question, no matter
how much I or users want it merged. This is how the linux development
process works.

The raid1c34 patches are not intrusive and could be backported on top of
5.3 because all the preparatory work has been merged already.


* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
  2019-11-01 15:09   ` David Sterba
@ 2019-11-03  0:35     ` waxhead
  2019-11-04 13:40       ` David Sterba
  2019-11-14  5:13     ` Zygo Blaxell
  1 sibling, 1 reply; 13+ messages in thread
From: waxhead @ 2019-11-03  0:35 UTC (permalink / raw)
  To: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS

Would GRUB be able to boot from RAID1C34 by treating it as "regular"
RAID1? If not, I think a warning could be useful.

David Sterba wrote:
> On Fri, Nov 01, 2019 at 10:54:45AM -0400, Neal Gompa wrote:
>> What's the reasoning for not submitting this for 5.4? I think the
>> improvements here are definitely worth pulling into the 5.4 kernel
>> release...
> 
> Because 5.4 is at rc5, new features are allowed to be merged only during
> the merge window, ie. before 5.4-rc1. Thats more than a month ago.  From
> rc1-rcX only regressions or fixes can be applied, so you can see pull
> requests but the subject lines almost always contain 'fix'.
> 
> A new feature has to be in the develoment branch at least 2 weeks before
> the merge window opens (for testing), so right now it's the last
> opportunity to get it to 5.5, 5.4 is out of question. No matter how much
> I or users want to get it merged. This is how the linux development
> process works.
> 
> The raid1c34 patches are not intrusive and could be backported on top of
> 5.3 because all the preparatory work has been merged already.
> 


* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
  2019-11-03  0:35     ` waxhead
@ 2019-11-04 13:40       ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-11-04 13:40 UTC (permalink / raw)
  To: waxhead; +Cc: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS

On Sun, Nov 03, 2019 at 01:35:34AM +0100, waxhead wrote:
> Would GRUB be able to boot from RAID1c34 by treating it as "regular" 
> RAID1?! If not I think a warning could be useful.

Currently grub will refuse to boot from that with an 'unknown profile'
message. Adding the support seems to be fairly easy, I'll send the
patches.


* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
  2019-11-01 15:09   ` David Sterba
  2019-11-03  0:35     ` waxhead
@ 2019-11-14  5:13     ` Zygo Blaxell
  2019-11-15 10:28       ` David Sterba
  1 sibling, 1 reply; 13+ messages in thread
From: Zygo Blaxell @ 2019-11-14  5:13 UTC (permalink / raw)
  To: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS

On Fri, Nov 01, 2019 at 04:09:08PM +0100, David Sterba wrote:
> The raid1c34 patches are not intrusive and could be backported on top of
> 5.3 because all the preparatory work has been merged already.

Indeed, that's how I ended up testing them.  I couldn't get the 5.4-rc
kernels to run long enough to do meaningful testing before they locked
up.  I tested with 5.3.8 + patches.

I left out the last patch that removes the raid1c3 incompat flag because
5.3 didn't have the block group tree code to apply it to.

I ran my raid1 and raid56 corruption recovery tests modified for raid1c3.
The first test is roughly:

	mkfs.btrfs -draid1c3 -mraid1c3 /dev/vd[bcdef]
	mount /dev/vdb /test
	cp -a 9GB_data /test
	sync
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	head -c 9g /dev/urandom > /dev/vdb
	head -c 9g /dev/urandom > /dev/vdc
	sync
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	btrfs scrub start -Bd /test
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test
	btrfs scrub start -Bd /test
	sysctl vm.drop_caches=3
	diff -r 9GB_data /test

First scrub reported a lot of corruption on /dev/vdb and /dev/vdc.  Second
scrub reported no errors.  diff (all instances) reported no differences.

Second test is:

	mkfs.btrfs -draid6 -mraid1c3 /dev/vd[bcdef]
	# rest as above...

Similar results:  first scrub reported many errors as expected.
Second scrub reported no errors.  No diffs.
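Both tests wipe two of the five devices, and the reason every diff comes back clean can be sketched abstractly for the raid1c3 case: with three copies of each chunk on distinct devices, losing any two devices still leaves at least one intact copy everywhere. A small hypothetical simulation (round-robin placement is an assumption here; the real allocator picks devices by available space):

```python
from itertools import combinations

def place_chunks(num_chunks: int, num_devices: int, ncopies: int = 3):
    """Place each chunk's copies on ncopies distinct devices, round-robin."""
    return [{(c + i) % num_devices for i in range(ncopies)}
            for c in range(num_chunks)]

def survives(placement, dead: set) -> bool:
    """True if every chunk keeps at least one copy outside the dead set."""
    return all(copies - dead for copies in placement)

# Three copies on five devices: every possible 2-device failure is survivable.
chunks = place_chunks(num_chunks=100, num_devices=5, ncopies=3)
ok = all(survives(chunks, set(dead)) for dead in combinations(range(5), 2))
```

A third simultaneous device loss can destroy all copies of some chunk, which matches tolerated_failures = 2 for raid1c3.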



* Re: [PATCH v3 0/4] RAID1 with 3- and 4- copies
  2019-11-14  5:13     ` Zygo Blaxell
@ 2019-11-15 10:28       ` David Sterba
  0 siblings, 0 replies; 13+ messages in thread
From: David Sterba @ 2019-11-15 10:28 UTC (permalink / raw)
  To: Zygo Blaxell; +Cc: dsterba, Neal Gompa, David Sterba, Btrfs BTRFS

On Thu, Nov 14, 2019 at 12:13:24AM -0500, Zygo Blaxell wrote:
> On Fri, Nov 01, 2019 at 04:09:08PM +0100, David Sterba wrote:
> > The raid1c34 patches are not intrusive and could be backported on top of
> > 5.3 because all the preparatory work has been merged already.
> 
> Indeed, that's how I ended up testing them.  I couldn't get the 5.4-rc
> kernels to run long enough to do meaningful testing before they locked
> up.  I tested with 5.3.8 + patches.
> 
> I left out the last patch that removes the raid1c3 incompat flag because
> 5.3 didn't have the block group tree code to apply it to.
> 
> I ran my raid1 and raid56 corruption recovery tests modified for raid1c3.
> The first test is roughly:
> 
> 	mkfs.btrfs -draid1c3 -mraid1c3 /dev/vd[bcdef]
> 	mount /dev/vdb /test
> 	cp -a 9GB_data /test
> 	sync
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	head -c 9g /dev/urandom > /dev/vdb
> 	head -c 9g /dev/urandom > /dev/vdc
> 	sync
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	btrfs scrub start -Bd /test
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 	btrfs scrub start -Bd /test
> 	sysctl vm.drop_caches=3
> 	diff -r 9GB_data /test
> 
> First scrub reported a lot of corruption on /dev/vdb and /dev/vdc.  Second
> scrub reported no errors.  diff (all instances) reported no differences.
> 
> Second test is:
> 
> 	mkfs.btrfs -draid6 -mraid1c3 /dev/vd[bcdef]
> 	# rest as above...
> 
> Similar results:  first scrub reported many errors as expected.
> Second scrub reported no errors.  No diffs.

Thanks for the tests.


end of thread, other threads:[~2019-11-15 10:28 UTC | newest]

Thread overview: 13+ messages
2019-10-31 15:13 [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
2019-10-31 15:13 ` [PATCH v2 1/4] btrfs: add support for 3-copy replication (raid1c3) David Sterba
2019-10-31 15:13 ` [PATCH v2 2/4] btrfs: add support for 4-copy replication (raid1c4) David Sterba
2019-10-31 15:13 ` [PATCH v2 3/4] btrfs: add incompat for raid1 with 3, 4 copies David Sterba
2019-10-31 15:13 ` [PATCH v2 4/4] btrfs: drop incompat bit for raid1c34 after last block group is gone David Sterba
2019-10-31 18:43 ` [PATCH] btrfs-progs: add support for raid1c3 and raid1c4 David Sterba
2019-10-31 18:44 ` [PATCH v3 0/4] RAID1 with 3- and 4- copies David Sterba
2019-11-01 14:54 ` Neal Gompa
2019-11-01 15:09   ` David Sterba
2019-11-03  0:35     ` waxhead
2019-11-04 13:40       ` David Sterba
2019-11-14  5:13     ` Zygo Blaxell
2019-11-15 10:28       ` David Sterba
