* [PATCH 0/2] Introduce new raid0 state 'broken'
@ 2019-07-29 20:31 ` Guilherme G. Piccoli
  0 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-07-29 20:31 UTC (permalink / raw)
  To: linux-raid
  Cc: songliubraving, gpiccoli, neilb, linux-block, dm-devel, jay.vosburgh

Currently the md/raid0 device behaves quite differently from other block
devices when it comes to failure. While other md levels contain vast
logic to deal with failures, and other non-md devices like scsi disks
or nvme rely on a dying queue when they fail, md/raid0 for instance
does not signal failures if an array member is removed while the array
is mounted; in that case, udev signals the device removal but mdadm
cannot succeed in the STOP_ARRAY ioctl, since it's mounted.

An earlier attempt tried to change this behavior to match the scsi/nvme
devices (see [0]), but it was quite complex, had some corner cases, and
the few community reviews it got weren't generally positive.
So, we are trying again with a simpler approach this time.

This series introduces a new array state 'broken' (for raid0 only), which
mimics the state 'clean'. The main goal for this new state is a way to
signal the user that something is wrong with the array. We also included a
warn_once-style message in the kernel log to alert the user when the array
has a failed member.
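
As an illustration of how a monitoring tool could consume the new state,
here is a minimal userspace sketch. It assumes the state is exposed through
the existing 'array_state' sysfs attribute (for example
/sys/block/md0/md/array_state, where md0 is just an example device name).
The helper names state_is_broken() and array_is_broken() are hypothetical
and are not part of the kernel or mdadm.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if the text read from an md 'array_state' attribute is
 * exactly "broken". sysfs attributes normally end with a newline, so
 * everything from the first newline on is ignored.
 */
bool state_is_broken(const char *buf)
{
	size_t len = strcspn(buf, "\n");

	return len == strlen("broken") && strncmp(buf, "broken", len) == 0;
}

/*
 * Read the state file at 'path' and classify it.
 * Returns 1 if the array is broken, 0 for any other state and
 * -1 if the file cannot be opened or read.
 */
int array_is_broken(const char *path)
{
	char buf[32] = { 0 };
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return state_is_broken(buf) ? 1 : 0;
}
```

On a kernel without this series the attribute would never read 'broken', so
such a caller would simply always see 0 for a running array.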

The series encompasses changes in the kernel and in the mdadm tool. To get
the 'broken' state completely functional, both changes are required, but
mdadm and the kernel can each live without their counterpart changes (in
case some users get an updated mdadm, for example, but keep using an old
kernel).

This series does not affect at all the way md/raid0 will react to I/O
failures. It was discussed in [0] that it would be better if raid0 could
fail faster in case it gets a member removed; we just proposed a change in
that realm too (see [1]), but it seems better to have them reviewed/treated
separately.

This series was tested with raid0 arrays holding both ext4 and xfs
filesystems. Thanks in advance for the reviews/feedback.
Cheers,


Guilherme


[0] lore.kernel.org/linux-block/20190418220448.7219-1-gpiccoli@canonical.com
[1] lore.kernel.org/linux-block/20190729193359.11040-1-gpiccoli@canonical.com


Guilherme G. Piccoli (1):
  md/raid0: Introduce new array state 'broken' for raid0

[kernel part]
 drivers/md/md.c    | 23 +++++++++++++++++++----
 drivers/md/md.h    |  2 ++
 drivers/md/raid0.c | 26 ++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 4 deletions(-)

[mdadm]
 Detail.c  | 16 ++++++++++++++--
 Monitor.c |  9 +++++++--
 maps.c    |  1 +
 mdadm.h   |  1 +
 mdmon.h   |  2 +-
 monitor.c |  4 ++--
 6 files changed, 26 insertions(+), 7 deletions(-)

-- 
2.22.0



* [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-29 20:31 ` Guilherme G. Piccoli
@ 2019-07-29 20:31   ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-07-29 20:31 UTC (permalink / raw)
  To: linux-raid
  Cc: songliubraving, gpiccoli, neilb, linux-block, dm-devel, jay.vosburgh

Currently, if an md/raid0 array gets one or more members removed while
mounted, the kernel keeps showing state 'clean' in the 'array_state'
sysfs attribute. Despite udev signaling the member device is gone, 'mdadm'
cannot issue the STOP_ARRAY ioctl successfully, given the array is mounted.

Nothing else hints that something is wrong (except that the removed devices
don't show properly in the output of the 'mdadm detail' command). There is
no other property to be checked, and if the user is not performing
reads/writes to the array, even the kernel log is quiet and doesn't give a
clue about the missing member.

This patch changes this behavior: when 'array_state' is read, we perform
an inexpensive check (only for raid0) that compares the total number of
disks the array had when it was assembled against the gendisk flags of
those devices, to validate that all members are available and functional.
A new array state 'broken' was added: it mimics the state 'clean' in every
aspect, being useful only to signal that such an array has some member
missing. Also, we show a warn-once-style message in the kernel log in
that case.
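
The check described above can be modeled in plain userspace C as follows.
This is only a sketch under stated assumptions: MODEL_FL_UP stands in for
GENHD_FL_UP, struct model_array and struct model_member are hypothetical
reductions of the md structures, and the warn-once latch is kept per array
here, whereas the patch below uses a function-local static variable.

```c
#include <stdbool.h>
#include <stdio.h>

#define MODEL_FL_UP 0x10u	/* hypothetical stand-in for GENHD_FL_UP */

struct model_member {
	unsigned int flags;	/* flags of the member's disk */
};

struct model_array {
	const char *name;
	int def_disks;				/* members at assembly time */
	const struct model_member *members;	/* members still attached */
	int nr_members;
	bool already_warned;			/* per-array warn-once latch */
};

/* Count the members whose disk is still flagged as up. */
int model_working_disks(const struct model_array *a)
{
	int i, up = 0;

	for (i = 0; i < a->nr_members; i++)
		if (a->members[i].flags & MODEL_FL_UP)
			up++;
	return up;
}

/*
 * Return true if at least one member is missing. The warning is printed
 * only on the first detection; the latch is re-armed once all members
 * are seen as working again.
 */
bool model_is_missing_dev(struct model_array *a)
{
	int missing = a->def_disks - model_working_disks(a);

	if (missing) {
		if (!a->already_warned) {
			a->already_warned = true;
			fprintf(stderr,
				"md: %s: raid0 array has %d missing/failed members\n",
				a->name, missing);
		}
		return true;
	}

	a->already_warned = false;
	return false;
}
```

The cost is one pass over the member list per read of 'array_state', which
is why the check can be described as inexpensive.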

This patch has no proper functional change other than adding a 'clean'-like
state; it was tested with ext4 and xfs filesystems. It requires a 'mdadm'
counterpart to handle the 'broken' state.

Cc: NeilBrown <neilb@suse.com>
Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---
 drivers/md/md.c    | 23 +++++++++++++++++++----
 drivers/md/md.h    |  2 ++
 drivers/md/raid0.c | 26 ++++++++++++++++++++++++++
 3 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index fba49918d591..b80f36084ec1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4160,12 +4160,18 @@ __ATTR_PREALLOC(resync_start, S_IRUGO|S_IWUSR,
  * active-idle
  *     like active, but no writes have been seen for a while (100msec).
  *
+ * broken
+ *     RAID0-only: same as clean, but array is missing a member.
+ *     It's useful because RAID0 mounted-arrays aren't stopped
+ *     when a member is gone, so this state will at least alert
+ *     the user that something is wrong.
+ *
  */
 enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active,
-		   write_pending, active_idle, bad_word};
+		   write_pending, active_idle, broken, bad_word};
 static char *array_states[] = {
 	"clear", "inactive", "suspended", "readonly", "read-auto", "clean", "active",
-	"write-pending", "active-idle", NULL };
+	"write-pending", "active-idle", "broken", NULL };
 
 static int match_word(const char *word, char **list)
 {
@@ -4181,7 +4187,7 @@ array_state_show(struct mddev *mddev, char *page)
 {
 	enum array_state st = inactive;
 
-	if (mddev->pers)
+	if (mddev->pers) {
 		switch(mddev->ro) {
 		case 1:
 			st = readonly;
@@ -4201,7 +4207,15 @@ array_state_show(struct mddev *mddev, char *page)
 				st = active;
 			spin_unlock(&mddev->lock);
 		}
-	else {
+
+		if ((mddev->pers->level == 0) &&
+		   ((st == clean) || (st == broken))) {
+			if (mddev->pers->is_missing_dev(mddev))
+				st = broken;
+			else
+				st = clean;
+		}
+	} else {
 		if (list_empty(&mddev->disks) &&
 		    mddev->raid_disks == 0 &&
 		    mddev->dev_sectors == 0)
@@ -4315,6 +4329,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
 		break;
 	case write_pending:
 	case active_idle:
+	case broken:
 		/* these cannot be set */
 		break;
 	}
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 41552e615c4c..e7b42b75701a 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -590,6 +590,8 @@ struct md_personality
 	int (*congested)(struct mddev *mddev, int bits);
 	/* Changes the consistency policy of an active array. */
 	int (*change_consistency_policy)(struct mddev *mddev, const char *buf);
+	/* Check if there is any missing/failed members - RAID0 only for now. */
+	bool (*is_missing_dev)(struct mddev *mddev);
 };
 
 struct md_sysfs_entry {
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 58a9cc5193bf..79618a6ae31a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -455,6 +455,31 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
 	}
 }
 
+bool raid0_is_missing_dev(struct mddev *mddev)
+{
+	struct md_rdev *rdev;
+	static int already_missing;
+	int def_disks, work_disks = 0;
+	struct r0conf *conf = mddev->private;
+
+	def_disks = conf->strip_zone[0].nb_dev;
+	rdev_for_each(rdev, mddev)
+		if (rdev->bdev->bd_disk->flags & GENHD_FL_UP)
+			work_disks++;
+
+	if (unlikely(def_disks - work_disks)) {
+		if (!already_missing) {
+			already_missing = 1;
+			pr_warn("md: %s: raid0 array has %d missing/failed members\n",
+				mdname(mddev), (def_disks - work_disks));
+		}
+		return true;
+	}
+
+	already_missing = 0;
+	return false;
+}
+
 static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 {
 	struct r0conf *conf = mddev->private;
@@ -789,6 +814,7 @@ static struct md_personality raid0_personality=
 	.takeover	= raid0_takeover,
 	.quiesce	= raid0_quiesce,
 	.congested	= raid0_congested,
+	.is_missing_dev	= raid0_is_missing_dev,
 };
 
 static int __init raid0_init (void)
-- 
2.22.0



* [PATCH 2/2] mdadm: Introduce new array state 'broken' for raid0
  2019-07-29 20:31 ` Guilherme G. Piccoli
@ 2019-07-29 20:31   ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-07-29 20:31 UTC (permalink / raw)
  To: linux-raid
  Cc: songliubraving, gpiccoli, neilb, linux-block, dm-devel, jay.vosburgh

Currently, if an md/raid0 array gets one or more members removed while
mounted, the kernel keeps showing state 'clean' in the 'array_state'
sysfs attribute. Despite udev signaling the member device is gone, 'mdadm'
cannot issue the STOP_ARRAY ioctl successfully, given the array is mounted.

Nothing else hints that something is wrong (except that the removed devices
don't show properly in the output of the mdadm 'detail' command). There is
no other property to be checked, and if the user is not performing
reads/writes to the array, even the kernel log is quiet and doesn't give a
clue about the missing member.

This patch is the mdadm counterpart of the kernel's new array state 'broken'.
The 'broken' state mimics the state 'clean' in every aspect, being useful
only to signal that an array has some member missing. All necessary
paths in mdadm were changed to deal with the 'broken' state, and in case the
tool runs on a kernel that is not updated, it'll work normally, i.e., it
doesn't require the 'broken' state in order to work.
Also, this patch changes the way the array state is shown in the 'detail'
command (for raid0 only): it now takes the 'array_state' sysfs attribute
into account instead of relying only on the MD_SB_CLEAN flag.
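
The selection of the word printed in the "State :" line can be sketched as
below. This is a simplified model, not the mdadm code itself:
model_detail_state() and MODEL_SB_CLEAN are hypothetical names, and the
fallback to "clean" when no sysfs state string is available is an addition
of this sketch rather than something the patch does.

```c
#include <stddef.h>

#define MODEL_SB_CLEAN 0	/* bit position; stand-in for MD_SB_CLEAN */

/*
 * Pick the state word for the 'detail' output.
 *
 * level       - RAID level of the array
 * sb_state    - state bits as reported for the array
 * sysfs_state - string read from 'array_state', or NULL if unavailable
 */
const char *model_detail_state(int level, unsigned int sb_state,
			       const char *sysfs_state)
{
	/* Without the clean bit the array is reported as active. */
	if (!(sb_state & (1u << MODEL_SB_CLEAN)))
		return "active";

	/* For raid0 only, trust sysfs so that 'broken' becomes visible. */
	if (level == 0 && sysfs_state)
		return sysfs_state;

	return "clean";
}
```

With an old kernel the sysfs attribute keeps reporting 'clean' for such an
array, so the output for raid0 is unchanged in that case.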

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---
 Detail.c  | 16 ++++++++++++++--
 Monitor.c |  9 +++++++--
 maps.c    |  1 +
 mdadm.h   |  1 +
 mdmon.h   |  2 +-
 monitor.c |  4 ++--
 6 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Detail.c b/Detail.c
index 20ea03a..4bf86b1 100644
--- a/Detail.c
+++ b/Detail.c
@@ -81,6 +81,7 @@ int Detail(char *dev, struct context *c)
 	int external;
 	int inactive;
 	int is_container = 0;
+	char arrayst[12] = { 0 }; /* no state is >10 chars currently */
 
 	if (fd < 0) {
 		pr_err("cannot open %s: %s\n",
@@ -485,9 +486,20 @@ int Detail(char *dev, struct context *c)
 			else
 				st = ", degraded";
 
+			if (array.state & (1 << MD_SB_CLEAN)) {
+				if (array.level == 0)
+					strncpy(arrayst,
+						map_num(sysfs_array_states,
+							sra->array_state),
+						sizeof(arrayst)-1);
+				else
+					strncpy(arrayst, "clean",
+						sizeof(arrayst)-1);
+			} else
+				strncpy(arrayst, "active", sizeof(arrayst)-1);
+
 			printf("             State : %s%s%s%s%s%s \n",
-			       (array.state & (1 << MD_SB_CLEAN)) ?
-			       "clean" : "active", st,
+			       arrayst, st,
 			       (!e || (e->percent < 0 &&
 				       e->percent != RESYNC_PENDING &&
 				       e->percent != RESYNC_DELAYED)) ?
diff --git a/Monitor.c b/Monitor.c
index 036103f..9a6a250 100644
--- a/Monitor.c
+++ b/Monitor.c
@@ -1055,8 +1055,12 @@ int Wait(char *dev)
 	}
 }
 
+/* The state "broken" is used only for RAID0 - it's the same as "clean", but
+ * used in case the array has one or more members missing.
+ */
+#define CLEAN_STATES_LAST_POS	5
 static char *clean_states[] = {
-	"clear", "inactive", "readonly", "read-auto", "clean", NULL };
+	"clear", "inactive", "readonly", "read-auto", "clean", "broken", NULL };
 
 int WaitClean(char *dev, int verbose)
 {
@@ -1116,7 +1120,8 @@ int WaitClean(char *dev, int verbose)
 			rv = read(state_fd, buf, sizeof(buf));
 			if (rv < 0)
 				break;
-			if (sysfs_match_word(buf, clean_states) <= 4)
+			if (sysfs_match_word(buf, clean_states)
+			    <= CLEAN_STATES_LAST_POS)
 				break;
 			rv = sysfs_wait(state_fd, &delay);
 			if (rv < 0 && errno != EINTR)
diff --git a/maps.c b/maps.c
index 02a0474..98ddbbc 100644
--- a/maps.c
+++ b/maps.c
@@ -150,6 +150,7 @@ mapping_t sysfs_array_states[] = {
 	{ "read-auto", ARRAY_READ_AUTO },
 	{ "clean", ARRAY_CLEAN },
 	{ "write-pending", ARRAY_WRITE_PENDING },
+	{ "broken", ARRAY_BROKEN_RAID0 },
 	{ NULL, ARRAY_UNKNOWN_STATE }
 };
 
diff --git a/mdadm.h b/mdadm.h
index c36d7fd..72c2525 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -375,6 +375,7 @@ struct mdinfo {
 		ARRAY_ACTIVE,
 		ARRAY_WRITE_PENDING,
 		ARRAY_ACTIVE_IDLE,
+		ARRAY_BROKEN_RAID0,
 		ARRAY_UNKNOWN_STATE,
 	} array_state;
 	struct md_bb bb;
diff --git a/mdmon.h b/mdmon.h
index 818367c..b3d72ac 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -21,7 +21,7 @@
 extern const char Name[];
 
 enum array_state { clear, inactive, suspended, readonly, read_auto,
-		   clean, active, write_pending, active_idle, bad_word};
+		   clean, active, write_pending, active_idle, broken, bad_word};
 
 enum sync_action { idle, reshape, resync, recover, check, repair, bad_action };
 
diff --git a/monitor.c b/monitor.c
index 81537ed..e0d3be6 100644
--- a/monitor.c
+++ b/monitor.c
@@ -26,7 +26,7 @@
 
 static char *array_states[] = {
 	"clear", "inactive", "suspended", "readonly", "read-auto",
-	"clean", "active", "write-pending", "active-idle", NULL };
+	"clean", "active", "write-pending", "active-idle", "broken", NULL };
 static char *sync_actions[] = {
 	"idle", "reshape", "resync", "recover", "check", "repair", NULL
 };
@@ -476,7 +476,7 @@ static int read_and_act(struct active_array *a, fd_set *fds)
 		a->next_state = clean;
 		ret |= ARRAY_DIRTY;
 	}
-	if (a->curr_state == clean) {
+	if ((a->curr_state == clean) || (a->curr_state == broken)) {
 		a->container->ss->set_array_state(a, 1);
 	}
 	if (a->curr_state == active ||
-- 
2.22.0



* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-29 20:31   ` Guilherme G. Piccoli
@ 2019-07-30  0:11     ` NeilBrown
  -1 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2019-07-30  0:11 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-raid
  Cc: linux-block, jay.vosburgh, songliubraving, dm-devel, Neil F Brown



On Mon, Jul 29 2019,  Guilherme G. Piccoli  wrote:

> Currently if a md/raid0 array gets one or more members removed while
> being mounted, kernel keeps showing state 'clean' in the 'array_state'
> sysfs attribute. Despite udev signaling the member device is gone, 'mdadm'
> cannot issue the STOP_ARRAY ioctl successfully, given the array is mounted.
>
> Nothing else hints that something is wrong (except that the removed devices
> don't show properly in the output of 'mdadm detail' command). There is no
> other property to be checked, and if user is not performing reads/writes
> to the array, even kernel log is quiet and doesn't give a clue about the
> missing member.
>
> This patch changes this behavior; when 'array_state' is read we introduce
> a non-expensive check (only for raid0) that relies in the comparison of
> the total number of disks when array was assembled with gendisk flags of
> those devices to validate if all members are available and functional.
> A new array state 'broken' was added: it mimics the state 'clean' in every
> aspect, being useful only to distinguish if such array has some member
> missing. Also, we show a rate-limited warning in kernel log in such case.
>
> This patch has no proper functional change other than adding a 'clean'-like
> state; it was tested with ext4 and xfs filesystems. It requires a 'mdadm'
> counterpart to handle the 'broken' state.
>
> Cc: NeilBrown <neilb@suse.com>
> Cc: Song Liu <songliubraving@fb.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>  drivers/md/md.c    | 23 +++++++++++++++++++----
>  drivers/md/md.h    |  2 ++
>  drivers/md/raid0.c | 26 ++++++++++++++++++++++++++
>  3 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index fba49918d591..b80f36084ec1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4160,12 +4160,18 @@ __ATTR_PREALLOC(resync_start, S_IRUGO|S_IWUSR,
>   * active-idle
>   *     like active, but no writes have been seen for a while (100msec).
>   *
> + * broken
> + *     RAID0-only: same as clean, but array is missing a member.
> + *     It's useful because RAID0 mounted-arrays aren't stopped
> + *     when a member is gone, so this state will at least alert
> + *     the user that something is wrong.
> + *
>   */
>  enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active,
> -		   write_pending, active_idle, bad_word};
> +		   write_pending, active_idle, broken, bad_word};
>  static char *array_states[] = {
>  	"clear", "inactive", "suspended", "readonly", "read-auto", "clean", "active",
> -	"write-pending", "active-idle", NULL };
> +	"write-pending", "active-idle", "broken", NULL };
>  
>  static int match_word(const char *word, char **list)
>  {
> @@ -4181,7 +4187,7 @@ array_state_show(struct mddev *mddev, char *page)
>  {
>  	enum array_state st = inactive;
>  
> -	if (mddev->pers)
> +	if (mddev->pers) {
>  		switch(mddev->ro) {
>  		case 1:
>  			st = readonly;
> @@ -4201,7 +4207,15 @@ array_state_show(struct mddev *mddev, char *page)
>  				st = active;
>  			spin_unlock(&mddev->lock);
>  		}
> -	else {
> +
> +		if ((mddev->pers->level == 0) &&

Don't test if ->level is 0.  Instead, test if ->is_missing_dev is not
NULL.

NeilBrown
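
A minimal sketch of the suggested pattern, written as standalone C so it
can be compiled outside the kernel. All names here (struct
model_personality, struct model_dev, model_refine_state) are hypothetical
stand-ins for the md structures; the point is only that the caller checks
whether the optional hook is set instead of checking the RAID level.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_dev;

/* Reduced stand-in for struct md_personality. */
struct model_personality {
	int level;
	/* Optional hook; personalities that don't implement it leave it NULL. */
	bool (*is_missing_dev)(struct model_dev *dev);
};

struct model_dev {
	const struct model_personality *pers;
	int missing;	/* number of missing members, for this model only */
};

enum model_state { MODEL_CLEAN, MODEL_ACTIVE, MODEL_BROKEN };

/* Example hook, as a raid0-like personality might provide it. */
bool model_raid0_is_missing_dev(struct model_dev *dev)
{
	return dev->missing > 0;
}

/*
 * Refine 'clean' into 'broken' only when the personality provides the
 * hook and the hook reports a missing member. No level test is needed.
 */
enum model_state model_refine_state(struct model_dev *dev,
				    enum model_state st)
{
	if (st == MODEL_CLEAN && dev->pers->is_missing_dev &&
	    dev->pers->is_missing_dev(dev))
		return MODEL_BROKEN;
	return st;
}
```

Checking the pointer keeps the generic code free of level-specific
knowledge: any other personality that later implements the hook gets the
same behavior without touching the caller.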


> +		   ((st == clean) || (st == broken))) {
> +			if (mddev->pers->is_missing_dev(mddev))
> +				st = broken;
> +			else
> +				st = clean;
> +		}
> +	} else {
>  		if (list_empty(&mddev->disks) &&
>  		    mddev->raid_disks == 0 &&
>  		    mddev->dev_sectors == 0)
> @@ -4315,6 +4329,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
>  		break;
>  	case write_pending:
>  	case active_idle:
> +	case broken:
>  		/* these cannot be set */
>  		break;
>  	}
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 41552e615c4c..e7b42b75701a 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -590,6 +590,8 @@ struct md_personality
>  	int (*congested)(struct mddev *mddev, int bits);
>  	/* Changes the consistency policy of an active array. */
>  	int (*change_consistency_policy)(struct mddev *mddev, const char *buf);
> +	/* Check if there is any missing/failed members - RAID0 only for now. */
> +	bool (*is_missing_dev)(struct mddev *mddev);
>  };
>  
>  struct md_sysfs_entry {
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 58a9cc5193bf..79618a6ae31a 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -455,6 +455,31 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
>  	}
>  }
>  
> +bool raid0_is_missing_dev(struct mddev *mddev)
> +{
> +	struct md_rdev *rdev;
> +	static int already_missing;
> +	int def_disks, work_disks = 0;
> +	struct r0conf *conf = mddev->private;
> +
> +	def_disks = conf->strip_zone[0].nb_dev;
> +	rdev_for_each(rdev, mddev)
> +		if (rdev->bdev->bd_disk->flags & GENHD_FL_UP)
> +			work_disks++;
> +
> +	if (unlikely(def_disks - work_disks)) {
> +		if (!already_missing) {
> +			already_missing = 1;
> +			pr_warn("md: %s: raid0 array has %d missing/failed members\n",
> +				mdname(mddev), (def_disks - work_disks));
> +		}
> +		return true;
> +	}
> +
> +	already_missing = 0;
> +	return false;
> +}
> +
>  static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>  {
>  	struct r0conf *conf = mddev->private;
> @@ -789,6 +814,7 @@ static struct md_personality raid0_personality=
>  	.takeover	= raid0_takeover,
>  	.quiesce	= raid0_quiesce,
>  	.congested	= raid0_congested,
> +	.is_missing_dev	= raid0_is_missing_dev,
>  };
>  
>  static int __init raid0_init (void)
> -- 
> 2.22.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-29 20:31   ` Guilherme G. Piccoli
@ 2019-07-30  6:20     ` Bob Liu
  -1 siblings, 0 replies; 26+ messages in thread
From: Bob Liu @ 2019-07-30  6:20 UTC (permalink / raw)
  To: Guilherme G. Piccoli, linux-raid
  Cc: linux-block, jay.vosburgh, dm-devel, songliubraving, neilb

On 7/30/19 4:31 AM, Guilherme G. Piccoli wrote:
> Currently if a md/raid0 array gets one or more members removed while
> being mounted, kernel keeps showing state 'clean' in the 'array_state'
> sysfs attribute. Despite udev signaling the member device is gone, 'mdadm'
> cannot issue the STOP_ARRAY ioctl successfully, given the array is mounted.
> 
> Nothing else hints that something is wrong (except that the removed devices
> don't show properly in the output of 'mdadm detail' command). There is no
> other property to be checked, and if user is not performing reads/writes
> to the array, even kernel log is quiet and doesn't give a clue about the
> missing member.
> 
> This patch changes this behavior; when 'array_state' is read we introduce
> a non-expensive check (only for raid0) that relies in the comparison of
> the total number of disks when array was assembled with gendisk flags of
> those devices to validate if all members are available and functional.
> A new array state 'broken' was added: it mimics the state 'clean' in every
> aspect, being useful only to distinguish if such array has some member
> missing. Also, we show a rate-limited warning in kernel log in such case.
> 
> This patch has no proper functional change other than adding a 'clean'-like
> state; it was tested with ext4 and xfs filesystems. It requires a 'mdadm'
> counterpart to handle the 'broken' state.
> 
> Cc: NeilBrown <neilb@suse.com>
> Cc: Song Liu <songliubraving@fb.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>  drivers/md/md.c    | 23 +++++++++++++++++++----
>  drivers/md/md.h    |  2 ++
>  drivers/md/raid0.c | 26 ++++++++++++++++++++++++++
>  3 files changed, 47 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index fba49918d591..b80f36084ec1 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4160,12 +4160,18 @@ __ATTR_PREALLOC(resync_start, S_IRUGO|S_IWUSR,
>   * active-idle
>   *     like active, but no writes have been seen for a while (100msec).
>   *
> + * broken
> + *     RAID0-only: same as clean, but array is missing a member.
> + *     It's useful because RAID0 mounted-arrays aren't stopped
> + *     when a member is gone, so this state will at least alert
> + *     the user that something is wrong.


Curious why only raid0 has this issue? 

Thanks, -Bob

> + *
>   */
>  enum array_state { clear, inactive, suspended, readonly, read_auto, clean, active,
> -		   write_pending, active_idle, bad_word};
> +		   write_pending, active_idle, broken, bad_word};
>  static char *array_states[] = {
>  	"clear", "inactive", "suspended", "readonly", "read-auto", "clean", "active",
> -	"write-pending", "active-idle", NULL };
> +	"write-pending", "active-idle", "broken", NULL };
>  
>  static int match_word(const char *word, char **list)
>  {
> @@ -4181,7 +4187,7 @@ array_state_show(struct mddev *mddev, char *page)
>  {
>  	enum array_state st = inactive;
>  
> -	if (mddev->pers)
> +	if (mddev->pers) {
>  		switch(mddev->ro) {
>  		case 1:
>  			st = readonly;
> @@ -4201,7 +4207,15 @@ array_state_show(struct mddev *mddev, char *page)
>  				st = active;
>  			spin_unlock(&mddev->lock);
>  		}
> -	else {
> +
> +		if ((mddev->pers->level == 0) &&
> +		   ((st == clean) || (st == broken))) {
> +			if (mddev->pers->is_missing_dev(mddev))
> +				st = broken;
> +			else
> +				st = clean;
> +		}
> +	} else {
>  		if (list_empty(&mddev->disks) &&
>  		    mddev->raid_disks == 0 &&
>  		    mddev->dev_sectors == 0)
> @@ -4315,6 +4329,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
>  		break;
>  	case write_pending:
>  	case active_idle:
> +	case broken:
>  		/* these cannot be set */
>  		break;
>  	}
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 41552e615c4c..e7b42b75701a 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -590,6 +590,8 @@ struct md_personality
>  	int (*congested)(struct mddev *mddev, int bits);
>  	/* Changes the consistency policy of an active array. */
>  	int (*change_consistency_policy)(struct mddev *mddev, const char *buf);
> +	/* Check if there is any missing/failed members - RAID0 only for now. */
> +	bool (*is_missing_dev)(struct mddev *mddev);
>  };
>  
>  struct md_sysfs_entry {
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 58a9cc5193bf..79618a6ae31a 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -455,6 +455,31 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
>  	}
>  }
>  
> +bool raid0_is_missing_dev(struct mddev *mddev)
> +{
> +	struct md_rdev *rdev;
> +	static int already_missing;
> +	int def_disks, work_disks = 0;
> +	struct r0conf *conf = mddev->private;
> +
> +	def_disks = conf->strip_zone[0].nb_dev;
> +	rdev_for_each(rdev, mddev)
> +		if (rdev->bdev->bd_disk->flags & GENHD_FL_UP)
> +			work_disks++;
> +
> +	if (unlikely(def_disks - work_disks)) {
> +		if (!already_missing) {
> +			already_missing = 1;
> +			pr_warn("md: %s: raid0 array has %d missing/failed members\n",
> +				mdname(mddev), (def_disks - work_disks));
> +		}
> +		return true;
> +	}
> +
> +	already_missing = 0;
> +	return false;
> +}
> +
>  static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>  {
>  	struct r0conf *conf = mddev->private;
> @@ -789,6 +814,7 @@ static struct md_personality raid0_personality=
>  	.takeover	= raid0_takeover,
>  	.quiesce	= raid0_quiesce,
>  	.congested	= raid0_congested,
> +	.is_missing_dev	= raid0_is_missing_dev,
>  };
>  
>  static int __init raid0_init (void)
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-30  0:11     ` NeilBrown
@ 2019-07-30 11:43       ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-07-30 11:43 UTC (permalink / raw)
  To: NeilBrown, linux-raid
  Cc: linux-block, jay.vosburgh, songliubraving, dm-devel, Neil F Brown

On 29/07/2019 21:11, NeilBrown wrote:
> [...]
>> -	else {
>> +
>> +		if ((mddev->pers->level == 0) &&
> 
> Don't test if ->level is 0.  Instead, test if ->is_missing_dev is not
> NULL.
> 
> NeilBrown

Hi Neil, thanks for the feedback. Good idea; I'll change that in a
potential V2 (if the patches are likely to be accepted).
Cheers,


Guilherme
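
For illustration, a minimal userspace sketch of the check suggested above:
the extra clean/broken refinement is keyed on whether the personality
provides the hook, not on the level number. All names here (struct
personality, struct array, refine_state, the ST_* states) are hypothetical
stand-ins rather than the kernel's definitions, and this is not the V2
patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for struct mddev / struct md_personality. */
struct array;

struct personality {
	int level;
	/* Optional hook: left NULL by levels that do not implement it. */
	bool (*is_missing_dev)(struct array *a);
};

struct array {
	const struct personality *pers;
	int expected_disks;	/* members the array was assembled with */
	int working_disks;	/* members currently present */
};

enum state { ST_CLEAN, ST_BROKEN };

static bool striped_is_missing_dev(struct array *a)
{
	return a->working_disks < a->expected_disks;
}

/* One personality that implements the hook, one that does not. */
static const struct personality striped = {
	.level = 0,
	.is_missing_dev = striped_is_missing_dev,
};
static const struct personality mirrored = {
	.level = 1,
	.is_missing_dev = NULL,
};

/* Refine clean/broken only when the hook exists; the level is never tested. */
static enum state refine_state(struct array *a, enum state st)
{
	assert(a != NULL && a->pers != NULL);

	if (a->pers->is_missing_dev && (st == ST_CLEAN || st == ST_BROKEN))
		st = a->pers->is_missing_dev(a) ? ST_BROKEN : ST_CLEAN;
	return st;
}
```

With this shape, a second personality could opt in just by filling in the
hook, with no change to the caller.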


> 
> 
>> +		   ((st == clean) || (st == broken))) {
>> +			if (mddev->pers->is_missing_dev(mddev))
>> +				st = broken;
>> +			else
>> +				st = clean;
>> +		}
>> +	} else {
>>  		if (list_empty(&mddev->disks) &&
>>  		    mddev->raid_disks == 0 &&
>>  		    mddev->dev_sectors == 0)
>> @@ -4315,6 +4329,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
>>  		break;
>>  	case write_pending:
>>  	case active_idle:
>> +	case broken:
>>  		/* these cannot be set */
>>  		break;
>>  	}
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 41552e615c4c..e7b42b75701a 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -590,6 +590,8 @@ struct md_personality
>>  	int (*congested)(struct mddev *mddev, int bits);
>>  	/* Changes the consistency policy of an active array. */
>>  	int (*change_consistency_policy)(struct mddev *mddev, const char *buf);
>> +	/* Check if there is any missing/failed members - RAID0 only for now. */
>> +	bool (*is_missing_dev)(struct mddev *mddev);
>>  };
>>  
>>  struct md_sysfs_entry {
>> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
>> index 58a9cc5193bf..79618a6ae31a 100644
>> --- a/drivers/md/raid0.c
>> +++ b/drivers/md/raid0.c
>> @@ -455,6 +455,31 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
>>  	}
>>  }
>>  
>> +bool raid0_is_missing_dev(struct mddev *mddev)
>> +{
>> +	struct md_rdev *rdev;
>> +	static int already_missing;
>> +	int def_disks, work_disks = 0;
>> +	struct r0conf *conf = mddev->private;
>> +
>> +	def_disks = conf->strip_zone[0].nb_dev;
>> +	rdev_for_each(rdev, mddev)
>> +		if (rdev->bdev->bd_disk->flags & GENHD_FL_UP)
>> +			work_disks++;
>> +
>> +	if (unlikely(def_disks - work_disks)) {
>> +		if (!already_missing) {
>> +			already_missing = 1;
>> +			pr_warn("md: %s: raid0 array has %d missing/failed members\n",
>> +				mdname(mddev), (def_disks - work_disks));
>> +		}
>> +		return true;
>> +	}
>> +
>> +	already_missing = 0;
>> +	return false;
>> +}
>> +
>>  static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>>  {
>>  	struct r0conf *conf = mddev->private;
>> @@ -789,6 +814,7 @@ static struct md_personality raid0_personality=
>>  	.takeover	= raid0_takeover,
>>  	.quiesce	= raid0_quiesce,
>>  	.congested	= raid0_congested,
>> +	.is_missing_dev	= raid0_is_missing_dev,
>>  };
>>  
>>  static int __init raid0_init (void)
>> -- 
>> 2.22.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-30  6:20     ` Bob Liu
@ 2019-07-30 12:18       ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-07-30 12:18 UTC (permalink / raw)
  To: Bob Liu, linux-raid
  Cc: linux-block, jay.vosburgh, dm-devel, songliubraving, neilb

On 30/07/2019 03:20, Bob Liu wrote:
> [...]
>> + * broken
>> + *     RAID0-only: same as clean, but array is missing a member.
>> + *     It's useful because RAID0 mounted-arrays aren't stopped
>> + *     when a member is gone, so this state will at least alert
>> + *     the user that something is wrong.
> 
> 
> Curious why only raid0 has this issue? 
> 
> Thanks, -Bob

Hi Bob, I understand that all other levels have fault-tolerance logic,
while raid0 is just a "bypass" driver that selects the correct
underlying device to send the BIO and blindly sends it. It's known to be
a performance-only/lightweight solution whereas the other levels aim to
be reliable.
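
To make the "bypass" point concrete, here is a simplified sketch of a
striping map, assuming a single zone where all members have the same size
(the real raid0 code handles multiple zones; struct stripe_layout,
struct stripe_target and stripe_map are hypothetical names). Note that the
mapping is pure arithmetic: nothing in it consults the state of the target
member.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-zone layout: every member has the same size. */
struct stripe_layout {
	uint64_t chunk_sectors;	/* sectors per chunk */
	unsigned int nr_devs;	/* number of member devices */
};

struct stripe_target {
	unsigned int dev;	/* index of the member holding the sector */
	uint64_t dev_sector;	/* sector offset within that member */
};

/* Map an array sector to (member, sector on member); no health check. */
static struct stripe_target stripe_map(const struct stripe_layout *l,
				       uint64_t sector)
{
	struct stripe_target t;
	uint64_t chunk, offset;

	assert(l->chunk_sectors > 0 && l->nr_devs > 0);

	chunk = sector / l->chunk_sectors;	/* which chunk of the array */
	offset = sector % l->chunk_sectors;	/* position inside the chunk */

	t.dev = (unsigned int)(chunk % l->nr_devs);
	t.dev_sector = (chunk / l->nr_devs) * l->chunk_sectors + offset;
	return t;
}
```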

I've quickly tested raid5 and raid10, and see messages like this in the
kernel log when removing a device (in raid5):

[35.764975] md/raid:md0: Disk failure on nvme1n1, disabling device.
md/raid:md0: Operation continuing on 1 devices.

The message seen in raid10 is basically the same. As a (cheap)
comparison of the complexity among levels, look at this:

<...>/linux-mainline/drivers/md# cat raid5* | wc -l
14191

<...>/linux-mainline/drivers/md# cat raid10* | wc -l
5135

<...>/linux-mainline/drivers/md# cat raid0* | wc -l
820

Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-30  6:20     ` Bob Liu
@ 2019-07-31  0:28       ` NeilBrown
  -1 siblings, 0 replies; 26+ messages in thread
From: NeilBrown @ 2019-07-31  0:28 UTC (permalink / raw)
  To: Bob Liu, Guilherme G. Piccoli, linux-raid
  Cc: linux-block, jay.vosburgh, songliubraving, dm-devel, Neil F Brown


On Tue, Jul 30 2019, Bob Liu wrote:
>
>
> Curious why only raid0 has this issue? 

Actually, it isn't only raid0.  'linear' has the same issue.
Probably the fix for raid0 should be applied to linear too.

NeilBrown

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-31  0:28       ` NeilBrown
@ 2019-07-31 13:04         ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-07-31 13:04 UTC (permalink / raw)
  To: NeilBrown, Bob Liu, linux-raid
  Cc: linux-block, jay.vosburgh, songliubraving, dm-devel, Neil F Brown

On 30/07/2019 21:28, NeilBrown wrote:
> On Tue, Jul 30 2019, Bob Liu wrote:
>>
>>
>> Curious why only raid0 has this issue? 
> 
> Actually, it isn't only raid0.  'linear' has the same issue.
> Probably the fix for raid0 should be applied to linear too.
> 
> NeilBrown
> 

Thanks Neil, that makes sense! I hadn't considered "linear" and indeed,
after some testing, it behaves exactly like raid0/striping.

In case this patch gets good acceptance I can certainly include
md/linear in that!
Cheers,


Guilherme

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-29 20:31   ` Guilherme G. Piccoli
@ 2019-07-31 19:43     ` Song Liu
  -1 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2019-07-31 19:43 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: linux-block, Song Liu, NeilBrown, linux-raid, dm-devel, Jay Vosburgh

On Mon, Jul 29, 2019 at 1:33 PM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> Currently if a md/raid0 array gets one or more members removed while
> being mounted, kernel keeps showing state 'clean' in the 'array_state'
> sysfs attribute. Despite udev signaling the member device is gone, 'mdadm'
> cannot issue the STOP_ARRAY ioctl successfully, given the array is mounted.
>
> Nothing else hints that something is wrong (except that the removed devices
> don't show properly in the output of 'mdadm detail' command). There is no
> other property to be checked, and if user is not performing reads/writes
> to the array, even kernel log is quiet and doesn't give a clue about the
> missing member.
>
> This patch changes this behavior; when 'array_state' is read we introduce
> a non-expensive check (only for raid0) that relies in the comparison of
> the total number of disks when array was assembled with gendisk flags of
> those devices to validate if all members are available and functional.
> A new array state 'broken' was added: it mimics the state 'clean' in every
> aspect, being useful only to distinguish if such array has some member
> missing. Also, we show a rate-limited warning in kernel log in such case.
>
> This patch has no proper functional change other than adding a 'clean'-like
> state; it was tested with ext4 and xfs filesystems. It requires a 'mdadm'
> counterpart to handle the 'broken' state.
>
> Cc: NeilBrown <neilb@suse.com>
> Cc: Song Liu <songliubraving@fb.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
>
[...]
> @@ -4315,6 +4329,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
>                 break;
>         case write_pending:
>         case active_idle:
> +       case broken:
>                 /* these cannot be set */
>                 break;
>         }

Maybe it would be useful to allow setting the "broken" state, e.g. when
user space finds an issue with a drive?

Thanks,
Song


> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 41552e615c4c..e7b42b75701a 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -590,6 +590,8 @@ struct md_personality
>         int (*congested)(struct mddev *mddev, int bits);
>         /* Changes the consistency policy of an active array. */
>         int (*change_consistency_policy)(struct mddev *mddev, const char *buf);
> +       /* Check if there is any missing/failed members - RAID0 only for now. */
> +       bool (*is_missing_dev)(struct mddev *mddev);
>  };
>
>  struct md_sysfs_entry {
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 58a9cc5193bf..79618a6ae31a 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -455,6 +455,31 @@ static inline int is_io_in_chunk_boundary(struct mddev *mddev,
>         }
>  }
>
> +bool raid0_is_missing_dev(struct mddev *mddev)
> +{
> +       struct md_rdev *rdev;
> +       static int already_missing;
> +       int def_disks, work_disks = 0;
> +       struct r0conf *conf = mddev->private;
> +
> +       def_disks = conf->strip_zone[0].nb_dev;
> +       rdev_for_each(rdev, mddev)
> +               if (rdev->bdev->bd_disk->flags & GENHD_FL_UP)
> +                       work_disks++;
> +
> +       if (unlikely(def_disks - work_disks)) {
> +               if (!already_missing) {
> +                       already_missing = 1;
> +                       pr_warn("md: %s: raid0 array has %d missing/failed members\n",
> +                               mdname(mddev), (def_disks - work_disks));
> +               }
> +               return true;
> +       }
> +
> +       already_missing = 0;
> +       return false;
> +}
> +
>  static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
>  {
>         struct r0conf *conf = mddev->private;
> @@ -789,6 +814,7 @@ static struct md_personality raid0_personality=
>         .takeover       = raid0_takeover,
>         .quiesce        = raid0_quiesce,
>         .congested      = raid0_congested,
> +       .is_missing_dev = raid0_is_missing_dev,
>  };
>
>  static int __init raid0_init (void)
> --
> 2.22.0
>
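
For readers following the quoted hunk, here is a minimal userspace sketch of
the same counting logic. It is not kernel code and not part of the patch: the
`struct member` and `struct array` types, the `array_is_missing_dev()` name,
and the per-array `warned` field are all hypothetical stand-ins. It only
illustrates comparing the expected member count with the members that are
still up, and warning once per transition.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct member {
	bool up;		/* models GENHD_FL_UP on the member's gendisk */
};

struct array {
	const char *name;
	struct member *members;
	int nr_members;		/* members still attached to the array */
	int expected;		/* models conf->strip_zone[0].nb_dev */
	bool warned;		/* assumption: kept per array, see note below */
};

/* Returns true if fewer members are up than the array was assembled with. */
static bool array_is_missing_dev(struct array *a)
{
	int working = 0;

	for (int i = 0; i < a->nr_members; i++)
		if (a->members[i].up)
			working++;

	if (a->expected != working) {
		if (!a->warned) {
			a->warned = true;
			fprintf(stderr,
				"md: %s: array has %d missing/failed members\n",
				a->name, a->expected - working);
		}
		return true;
	}

	a->warned = false;	/* re-arm the warning once all members are up */
	return false;
}
```

One difference from the quoted hunk is deliberate: there, `already_missing`
is a function-static variable, so a single flag is shared by every raid0
array on the system. The sketch keeps the flag in the per-array structure
instead, which is one possible variation rather than what the patch does.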

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-31 13:04         ` Guilherme G. Piccoli
@ 2019-07-31 19:47           ` Song Liu
  -1 siblings, 0 replies; 26+ messages in thread
From: Song Liu @ 2019-07-31 19:47 UTC (permalink / raw)
  To: Guilherme G. Piccoli
  Cc: Neil F Brown, Song Liu, dm-devel, NeilBrown, linux-raid, Bob Liu,
	linux-block, Jay Vosburgh

On Wed, Jul 31, 2019 at 6:05 AM Guilherme G. Piccoli
<gpiccoli@canonical.com> wrote:
>
> On 30/07/2019 21:28, NeilBrown wrote:
> > On Tue, Jul 30 2019, Bob Liu wrote:
> >>
> >>
> >> Curious why only raid0 has this issue?
> >
> > Actually, it isn't only raid0.  'linear' has the same issue.
> > Probably the fix for raid0 should be applied to linear too.
> >
> > NeilBrown
> >
>
> Thanks Neil, that makes sense! I hadn't considered "linear" and indeed,
> after some testing, it behaves exactly like raid0/striping.
>
> In case this patch gets good acceptance I can certainly include
> md/linear in that!
> Cheers,

This looks good. Please include Neil's feedback in v2.

Btw, there is no 2/2 in this set, right?

Song

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-07-31 19:43     ` Song Liu
@ 2019-08-01 12:07       ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-08-01 12:07 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-block, Song Liu, NeilBrown, linux-raid, dm-devel, Jay Vosburgh

On 31/07/2019 16:43, Song Liu wrote:
>[...]
>> @@ -4315,6 +4329,7 @@ array_state_store(struct mddev *mddev, const char *buf, size_t len)
>>                 break;
>>         case write_pending:
>>         case active_idle:
>> +       case broken:
>>                 /* these cannot be set */
>>                 break;
>>         }
> 
> Maybe it is useful to set "broken" state? When user space found some issues
> with the drive?
> 
> Thanks,
> Song

Hi Song, thanks a lot for your feedback. I agree with you, and I can
easily add that in V2 along with Neil's suggestion.
There is a 2/2 patch with the mdadm counterpart; you should have
received the email but, for completeness, the link is:

lore.kernel.org/linux-block/20190729203135.12934-3-gpiccoli@canonical.com

Thanks,


Guilherme
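
As an illustration of how user space could observe the new state through the
'array_state' sysfs attribute, here is a minimal sketch. It is not the mdadm
change from patch 2/2; the helper names `read_sysfs_line()` and
`state_is_broken()` are hypothetical, and the array name md0 in the usage
note is an assumption.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Reads the first line of a sysfs attribute into buf and strips the
 * trailing newline. Returns 0 on success, -1 if the file cannot be
 * opened or read.
 */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/* True only for the state string introduced by this series. */
static bool state_is_broken(const char *state)
{
	return strcmp(state, "broken") == 0;
}
```

A caller would pass a path such as /sys/block/md0/md/array_state to
read_sysfs_line() and then hand the result to state_is_broken(). On a kernel
without the patch the attribute keeps reporting 'clean', so the check simply
never fires.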

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0
  2019-08-01 12:07       ` Guilherme G. Piccoli
@ 2019-08-16 13:48         ` Guilherme G. Piccoli
  -1 siblings, 0 replies; 26+ messages in thread
From: Guilherme G. Piccoli @ 2019-08-16 13:48 UTC (permalink / raw)
  To: Song Liu
  Cc: linux-block, Song Liu, NeilBrown, linux-raid, dm-devel, Jay Vosburgh

V2 just sent:
lore.kernel.org/linux-block/20190816134059.29751-1-gpiccoli@canonical.com

Thanks,


Guilherme

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-08-16 13:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-29 20:31 [PATCH 0/2] Introduce new raid0 state 'broken' Guilherme G. Piccoli
2019-07-29 20:31 ` Guilherme G. Piccoli
2019-07-29 20:31 ` [PATCH 1/2] md/raid0: Introduce new array state 'broken' for raid0 Guilherme G. Piccoli
2019-07-29 20:31   ` Guilherme G. Piccoli
2019-07-30  0:11   ` NeilBrown
2019-07-30  0:11     ` NeilBrown
2019-07-30 11:43     ` Guilherme G. Piccoli
2019-07-30 11:43       ` Guilherme G. Piccoli
2019-07-30  6:20   ` Bob Liu
2019-07-30  6:20     ` Bob Liu
2019-07-30 12:18     ` Guilherme G. Piccoli
2019-07-30 12:18       ` Guilherme G. Piccoli
2019-07-31  0:28     ` NeilBrown
2019-07-31  0:28       ` NeilBrown
2019-07-31 13:04       ` Guilherme G. Piccoli
2019-07-31 13:04         ` Guilherme G. Piccoli
2019-07-31 19:47         ` Song Liu
2019-07-31 19:47           ` Song Liu
2019-07-31 19:43   ` Song Liu
2019-07-31 19:43     ` Song Liu
2019-08-01 12:07     ` Guilherme G. Piccoli
2019-08-01 12:07       ` Guilherme G. Piccoli
2019-08-16 13:48       ` Guilherme G. Piccoli
2019-08-16 13:48         ` Guilherme G. Piccoli
2019-07-29 20:31 ` [PATCH 2/2] mdadm: " Guilherme G. Piccoli
2019-07-29 20:31   ` Guilherme G. Piccoli
