* [PATCH 0/7] patches: fix dm-raid1 race, bug 502927
@ 2009-11-18 12:09 Mikulas Patocka
  2009-11-18 12:10 ` [PATCH 1/7] Explicitly initialize bio lists Mikulas Patocka
  2009-11-30 16:46 ` [PATCH 0/7] patches: fix dm-raid1 race, bug 502927 Takahiro Yasui
  0 siblings, 2 replies; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:09 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Hi

Here is the series of 7 patches to hold write bios on dm-raid1 until 
dmeventd does its job. It fixes bug 
https://bugzilla.redhat.com/show_bug.cgi?id=502927 . The first 6 patches 
are preparatory (they just move code around); the last patch does the 
actual fix.

I tested this: I managed to reproduce the bug (by manually stopping 
dmeventd with a STOP signal, failing the primary mirror leg and writing to 
the device) and I also verified that the patches fix the bug.

For non-dmeventd operation, the current behavior is wrong and I just keep 
it as wrong as it was. There is no easy fix. It is simply assumed that if 
the user doesn't use dmeventd, he can't activate failed disks again.

Mikulas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/7] Explicitly initialize bio lists
  2009-11-18 12:09 [PATCH 0/7] patches: fix dm-raid1 race, bug 502927 Mikulas Patocka
@ 2009-11-18 12:10 ` Mikulas Patocka
  2009-11-18 12:11   ` [PATCH 2/7] A framework for holding bios until suspend Mikulas Patocka
  2009-11-30 16:46 ` [PATCH 0/7] patches: fix dm-raid1 race, bug 502927 Takahiro Yasui
  1 sibling, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:10 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Explicitly initialize bio lists.

Currently, zeroing the lists via kzalloc already does the initialization,
but if we ever switch to another bio list implementation, it could become a problem.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6.31-fast-new/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31-fast-new.orig/drivers/md/dm-raid1.c	2009-10-06 18:15:09.000000000 +0200
+++ linux-2.6.31-fast-new/drivers/md/dm-raid1.c	2009-10-06 18:15:10.000000000 +0200
@@ -791,6 +791,9 @@ static struct mirror_set *alloc_context(
 	}
 
 	spin_lock_init(&ms->lock);
+	bio_list_init(&ms->reads);
+	bio_list_init(&ms->writes);
+	bio_list_init(&ms->failures);
 
 	ms->ti = ti;
 	ms->nr_mirrors = nr_mirrors;

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 2/7] A framework for holding bios until suspend
  2009-11-18 12:10 ` [PATCH 1/7] Explicitly initialize bio lists Mikulas Patocka
@ 2009-11-18 12:11   ` Mikulas Patocka
  2009-11-18 12:11     ` [PATCH 3/7] Use the hold framework in do_failures Mikulas Patocka
  2009-11-28 18:02     ` [PATCH 2/7] A framework for holding bios until suspend Takahiro Yasui
  0 siblings, 2 replies; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:11 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

A framework for holding bios until suspend, and then either requeuing them
with DM_ENDIO_REQUEUE (if the suspend was noflush) or terminating them
with -EIO.

A bio can be held with the function hold_bio(). It is unused so far;
it will be used in later patches.
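
For illustration, a later patch in this series calls it roughly like this
from the failure path (condensed from patch 3, not part of this patch):

	while ((bio = bio_list_pop(failures))) {
		if (!ms->log_failure) {
			ms->in_sync = 0;
			dm_rh_mark_nosync(ms->rh, bio, bio->bi_size, 0);
		} else {
			/* park the bio until suspend or dmeventd action */
			hold_bio(ms, bio);
		}
	}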

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c |   36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

Index: linux-2.6.31-fast-new/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31-fast-new.orig/drivers/md/dm-raid1.c	2009-10-06 18:15:10.000000000 +0200
+++ linux-2.6.31-fast-new/drivers/md/dm-raid1.c	2009-10-06 18:15:14.000000000 +0200
@@ -57,6 +57,7 @@ struct mirror_set {
 	struct bio_list reads;
 	struct bio_list writes;
 	struct bio_list failures;
+	struct bio_list hold;	/* bios are waiting until suspend */
 
 	struct dm_region_hash *rh;
 	struct dm_kcopyd_client *kcopyd_client;
@@ -415,6 +416,20 @@ static void map_region(struct dm_io_regi
 	io->count = bio->bi_size >> 9;
 }
 
+static void hold_bio(struct mirror_set *ms, struct bio *bio)
+{
+	if (atomic_read(&ms->suspend)) {
+		if (dm_noflush_suspending(ms->ti))
+			bio_endio(bio, DM_ENDIO_REQUEUE);
+		else
+			bio_endio(bio, -EIO);
+	} else {
+		spin_lock_irq(&ms->lock);
+		bio_list_add(&ms->hold, bio);
+		spin_unlock_irq(&ms->lock);
+	}
+}
+
 /*-----------------------------------------------------------------
  * Reads
  *---------------------------------------------------------------*/
@@ -760,6 +775,7 @@ static void do_mirror(struct work_struct
 	bio_list_init(&ms->reads);
 	bio_list_init(&ms->writes);
 	bio_list_init(&ms->failures);
+	bio_list_init(&ms->hold);
 	spin_unlock_irqrestore(&ms->lock, flags);
 
 	dm_rh_update_states(ms->rh, errors_handled(ms));
@@ -1192,6 +1208,9 @@ static void mirror_presuspend(struct dm_
 	struct mirror_set *ms = (struct mirror_set *) ti->private;
 	struct dm_dirty_log *log = dm_rh_dirty_log(ms->rh);
 
+	struct bio_list hold;
+	struct bio *bio;
+
 	atomic_set(&ms->suspend, 1);
 
 	/*
@@ -1214,6 +1233,23 @@ static void mirror_presuspend(struct dm_
 	 * we know that all of our I/O has been pushed.
 	 */
 	flush_workqueue(ms->kmirrord_wq);
+
+	/*
+	 * Once we set ms->suspend and flush the workqueue,
+	 * no more entries are being added to ms->hold list.
+	 * Process the list now.
+	 *
+	 * (note that bios can still arive concurrently with
+	 * or after presuspend, but they are never put to
+	 * hold list because of ms->suspend != 0).
+	 */
+	spin_lock_irq(&ms->lock);
+	hold = ms->hold;
+	bio_list_init(&ms->hold);
+	spin_unlock_irq(&ms->lock);
+
+	while ((bio = bio_list_pop(&hold)))
+		hold_bio(ms, bio);
 }
 
 static void mirror_postsuspend(struct dm_target *ti)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 3/7] Use the hold framework in do_failures
  2009-11-18 12:11   ` [PATCH 2/7] A framework for holding bios until suspend Mikulas Patocka
@ 2009-11-18 12:11     ` Mikulas Patocka
  2009-11-18 12:12       ` [PATCH 4/7] Don't optimize for failure case Mikulas Patocka
  2009-11-28 18:02     ` [PATCH 2/7] A framework for holding bios until suspend Takahiro Yasui
  1 sibling, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:11 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Use the hold framework in do_failures.

This patch doesn't change any logic in bio processing; it just
simplifies failure handling and avoids periodically polling the failures
list.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c |   34 +++++++++-------------------------
 1 file changed, 9 insertions(+), 25 deletions(-)

Index: linux-2.6.31-fast-new/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31-fast-new.orig/drivers/md/dm-raid1.c	2009-10-06 18:15:14.000000000 +0200
+++ linux-2.6.31-fast-new/drivers/md/dm-raid1.c	2009-10-06 18:15:17.000000000 +0200
@@ -703,20 +703,12 @@ static void do_failures(struct mirror_se
 {
 	struct bio *bio;
 
-	if (!failures->head)
+	if (likely(!failures->head))
 		return;
 
-	if (!ms->log_failure) {
-		while ((bio = bio_list_pop(failures))) {
-			ms->in_sync = 0;
-			dm_rh_mark_nosync(ms->rh, bio, bio->bi_size, 0);
-		}
-		return;
-	}
-
 	/*
 	 * If the log has failed, unattempted writes are being
-	 * put on the failures list.  We can't issue those writes
+	 * put on the hold list.  We can't issue those writes
 	 * until a log has been marked, so we must store them.
 	 *
 	 * If a 'noflush' suspend is in progress, we can requeue
@@ -731,23 +723,15 @@ static void do_failures(struct mirror_se
 	 * for us to treat them the same and requeue them
 	 * as well.
 	 */
-	if (dm_noflush_suspending(ms->ti)) {
-		while ((bio = bio_list_pop(failures)))
-			bio_endio(bio, DM_ENDIO_REQUEUE);
-		return;
-	}
 
-	if (atomic_read(&ms->suspend)) {
-		while ((bio = bio_list_pop(failures)))
-			bio_endio(bio, -EIO);
-		return;
+	while ((bio = bio_list_pop(failures))) {
+		if (!ms->log_failure) {
+			ms->in_sync = 0;
+			dm_rh_mark_nosync(ms->rh, bio, bio->bi_size, 0);
+		} else {
+			hold_bio(ms, bio);
+		}
 	}
-
-	spin_lock_irq(&ms->lock);
-	bio_list_merge(&ms->failures, failures);
-	spin_unlock_irq(&ms->lock);
-
-	delayed_wake(ms);
 }
 
 static void trigger_event(struct work_struct *work)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 4/7] Don't optimize for failure case
  2009-11-18 12:11     ` [PATCH 3/7] Use the hold framework in do_failures Mikulas Patocka
@ 2009-11-18 12:12       ` Mikulas Patocka
  2009-11-18 12:13         ` [PATCH 5/7] Move a logic to get a valid mirror leg to a function Mikulas Patocka
  0 siblings, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:12 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Don't optimize for failure case.

should_wake is just an optimization to avoid waking the thread when there
is already something queued for it. Don't bother with this optimization in
the failure-handling code; it is pointless to optimize the failure path.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c |    6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

Index: linux-2.6.31-fast-new/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31-fast-new.orig/drivers/md/dm-raid1.c	2009-10-06 18:15:17.000000000 +0200
+++ linux-2.6.31-fast-new/drivers/md/dm-raid1.c	2009-10-06 18:15:20.000000000 +0200
@@ -529,7 +529,6 @@ static void write_callback(unsigned long
 	struct bio *bio = (struct bio *) context;
 	struct mirror_set *ms;
 	int uptodate = 0;
-	int should_wake = 0;
 	unsigned long flags;
 
 	ms = bio_get_m(bio)->ms;
@@ -561,12 +560,9 @@ static void write_callback(unsigned long
 		 * the main thread.
 		 */
 		spin_lock_irqsave(&ms->lock, flags);
-		if (!ms->failures.head)
-			should_wake = 1;
 		bio_list_add(&ms->failures, bio);
 		spin_unlock_irqrestore(&ms->lock, flags);
-		if (should_wake)
-			wakeup_mirrord(ms);
+		wakeup_mirrord(ms);
 		return;
 	}
 out:

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 5/7] Move a logic to get a valid mirror leg to a function
  2009-11-18 12:12       ` [PATCH 4/7] Don't optimize for failure case Mikulas Patocka
@ 2009-11-18 12:13         ` Mikulas Patocka
  2009-11-18 12:18           ` [PATCH 6/7] Move bio completion from dm_rh_mark_nosync to its caller Mikulas Patocka
  0 siblings, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:13 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Move the logic for finding a valid mirror leg into a separate function.
It will be reused later.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

Index: linux-2.6.31-rc3-devel/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31-rc3-devel.orig/drivers/md/dm-raid1.c	2009-07-20 20:46:49.000000000 +0200
+++ linux-2.6.31-rc3-devel/drivers/md/dm-raid1.c	2009-07-20 20:47:44.000000000 +0200
@@ -180,6 +180,15 @@ static void set_default_mirror(struct mi
 	atomic_set(&ms->default_mirror, m - m0);
 }
 
+static struct mirror *get_valid_mirror(struct mirror_set *ms)
+{
+	struct mirror *m;
+	for (m = ms->mirror; m < ms->mirror + ms->nr_mirrors; m++)
+		if (!atomic_read(&m->error_count))
+			return m;
+	return NULL;
+}
+
 /* fail_mirror
  * @m: mirror device to fail
  * @error_type: one of the enum's, DM_RAID1_*_ERROR
@@ -225,13 +234,10 @@ static void fail_mirror(struct mirror *m
 		goto out;
 	}
 
-	for (new = ms->mirror; new < ms->mirror + ms->nr_mirrors; new++)
-		if (!atomic_read(&new->error_count)) {
-			set_default_mirror(new);
-			break;
-		}
-
-	if (unlikely(new == ms->mirror + ms->nr_mirrors))
+	new = get_valid_mirror(ms);
+	if (new)
+		set_default_mirror(new);
+	else
 		DMWARN("All sides of mirror have failed.");
 
 out:

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 6/7] Move bio completion from dm_rh_mark_nosync to its caller
  2009-11-18 12:13         ` [PATCH 5/7] Move a logic to get a valid mirror leg to a function Mikulas Patocka
@ 2009-11-18 12:18           ` Mikulas Patocka
  2009-11-18 12:19             ` [PATCH 7/7] Hold all write bios when errors are handled Mikulas Patocka
  0 siblings, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:18 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Move bio completion from dm_rh_mark_nosync to its caller.

It is more understandable this way.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c          |    3 ++-
 drivers/md/dm-region-hash.c    |    6 +-----
 include/linux/dm-region-hash.h |    3 +--
 3 files changed, 4 insertions(+), 8 deletions(-)

Index: linux-2.6.31.6-fast/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/md/dm-raid1.c	2009-11-18 10:51:37.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/md/dm-raid1.c	2009-11-18 10:53:00.000000000 +0100
@@ -729,7 +729,8 @@ static void do_failures(struct mirror_se
 	while ((bio = bio_list_pop(failures))) {
 		if (!ms->log_failure) {
 			ms->in_sync = 0;
-			dm_rh_mark_nosync(ms->rh, bio, bio->bi_size, 0);
+			dm_rh_mark_nosync(ms->rh, bio);
+			bio_endio(bio, 0);
 		} else {
 			hold_bio(ms, bio);
 		}
Index: linux-2.6.31.6-fast/drivers/md/dm-region-hash.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/md/dm-region-hash.c	2009-11-18 10:53:10.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/md/dm-region-hash.c	2009-11-18 10:59:17.000000000 +0100
@@ -392,8 +392,6 @@ static void complete_resync_work(struct 
 /* dm_rh_mark_nosync
  * @ms
  * @bio
- * @done
- * @error
  *
  * The bio was written on some mirror(s) but failed on other mirror(s).
  * We can successfully endio the bio but should avoid the region being
@@ -401,8 +399,7 @@ static void complete_resync_work(struct 
  *
  * This function is _not_ safe in interrupt context!
  */
-void dm_rh_mark_nosync(struct dm_region_hash *rh,
-		       struct bio *bio, unsigned done, int error)
+void dm_rh_mark_nosync(struct dm_region_hash *rh, struct bio *bio)
 {
 	unsigned long flags;
 	struct dm_dirty_log *log = rh->log;
@@ -439,7 +436,6 @@ void dm_rh_mark_nosync(struct dm_region_
 	BUG_ON(!list_empty(&reg->list));
 	spin_unlock_irqrestore(&rh->region_lock, flags);
 
-	bio_endio(bio, error);
 	if (recovering)
 		complete_resync_work(reg, 0);
 }
Index: linux-2.6.31.6-fast/include/linux/dm-region-hash.h
===================================================================
--- linux-2.6.31.6-fast.orig/include/linux/dm-region-hash.h	2009-11-18 10:53:39.000000000 +0100
+++ linux-2.6.31.6-fast/include/linux/dm-region-hash.h	2009-11-18 10:53:45.000000000 +0100
@@ -78,8 +78,7 @@ void dm_rh_dec(struct dm_region_hash *rh
 /* Delay bios on regions. */
 void dm_rh_delay(struct dm_region_hash *rh, struct bio *bio);
 
-void dm_rh_mark_nosync(struct dm_region_hash *rh,
-		       struct bio *bio, unsigned done, int error);
+void dm_rh_mark_nosync(struct dm_region_hash *rh, struct bio *bio);
 
 /*
  * Region recovery control.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-18 12:18           ` [PATCH 6/7] Move bio completion from dm_rh_mark_nosync to its caller Mikulas Patocka
@ 2009-11-18 12:19             ` Mikulas Patocka
  2009-11-23  5:58               ` malahal
  0 siblings, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-18 12:19 UTC (permalink / raw)
  To: Alasdair G Kergon; +Cc: dm-devel

Hold all write bios when errors are handled.

The patch changes these things:
- Previously, the failures list was used only when handling errors with
  dmeventd. Now it is used for all failed bios. Even when not using
  dmeventd, the regions where some writes failed must be marked as nosync.
  This can only be done in process context (i.e. in the raid1 workqueue),
  not in the write_callback function.
- Previously, the write would succeed if writing to at least one leg
  succeeded. This is wrong because data from the failed leg may later be
  replicated over the correct leg.
  Now, if using dmeventd, a write with some failures will be held until
  dmeventd does its job and reconfigures the array.
  If not using dmeventd, the write still succeeds if at least one leg
  succeeds; this is wrong but consistent with the current behavior.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c |   54 ++++++++++++++++++++++++++------------------------
 1 file changed, 29 insertions(+), 25 deletions(-)

Index: linux-2.6.31.6-fast/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.31.6-fast.orig/drivers/md/dm-raid1.c	2009-11-18 10:53:00.000000000 +0100
+++ linux-2.6.31.6-fast/drivers/md/dm-raid1.c	2009-11-18 12:40:35.000000000 +0100
@@ -534,7 +534,6 @@ static void write_callback(unsigned long
 	unsigned i, ret = 0;
 	struct bio *bio = (struct bio *) context;
 	struct mirror_set *ms;
-	int uptodate = 0;
 	unsigned long flags;
 
 	ms = bio_get_m(bio)->ms;
@@ -546,33 +545,23 @@ static void write_callback(unsigned long
 	 * This way we handle both writes to SYNC and NOSYNC
 	 * regions with the same code.
 	 */
-	if (likely(!error))
-		goto out;
+	if (likely(!error)) {
+		bio_endio(bio, ret);
+		return;
+	}
 
 	for (i = 0; i < ms->nr_mirrors; i++)
 		if (test_bit(i, &error))
 			fail_mirror(ms->mirror + i, DM_RAID1_WRITE_ERROR);
-		else
-			uptodate = 1;
 
-	if (unlikely(!uptodate)) {
-		DMERR("All replicated volumes dead, failing I/O");
-		/* None of the writes succeeded, fail the I/O. */
-		ret = -EIO;
-	} else if (errors_handled(ms)) {
-		/*
-		 * Need to raise event.  Since raising
-		 * events can block, we need to do it in
-		 * the main thread.
-		 */
-		spin_lock_irqsave(&ms->lock, flags);
-		bio_list_add(&ms->failures, bio);
-		spin_unlock_irqrestore(&ms->lock, flags);
-		wakeup_mirrord(ms);
-		return;
-	}
-out:
-	bio_endio(bio, ret);
+	/*
+	 * In either case we must mark the region as NOSYNC.
+	 * That would block, so do it in the thread.
+	 */
+	spin_lock_irqsave(&ms->lock, flags);
+	bio_list_add(&ms->failures, bio);
+	spin_unlock_irqrestore(&ms->lock, flags);
+	wakeup_mirrord(ms);
 }
 
 static void do_write(struct mirror_set *ms, struct bio *bio)
@@ -730,10 +719,25 @@ static void do_failures(struct mirror_se
 		if (!ms->log_failure) {
 			ms->in_sync = 0;
 			dm_rh_mark_nosync(ms->rh, bio);
+		}
+		/*
+		 * If all the legs are dead, fail the I/O.
+		 *
+		 * If we are not using dmeventd, we pretend that the I/O
+		 * succeeded. This is wrong (the failed leg might come online
+		 * again after reboot and it would be replicated back to
+		 * the good leg) but it is consistent with current behavior.
+		 * For proper behavior, dm-raid1 shouldn't be used without
+		 * dmeventd at all.
+		 *
+		 * If we use dmeventd, hold the bio until dmeventd does its job.
+		 */
+		if (!get_valid_mirror(ms))
+			bio_endio(bio, -EIO);
+		else if (!errors_handled(ms))
 			bio_endio(bio, 0);
-		} else {
+		else
 			hold_bio(ms, bio);
-		}
 	}
 }
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-18 12:19             ` [PATCH 7/7] Hold all write bios when errors are handled Mikulas Patocka
@ 2009-11-23  5:58               ` malahal
  2009-11-23 17:54                 ` Takahiro Yasui
  2009-11-24 11:51                 ` Mikulas Patocka
  0 siblings, 2 replies; 29+ messages in thread
From: malahal @ 2009-11-23  5:58 UTC (permalink / raw)
  To: dm-devel

Mikulas Patocka [mpatocka@redhat.com] wrote:
> Index: linux-2.6.31.6-fast/drivers/md/dm-raid1.c
> -	if (unlikely(!uptodate)) {
> -		DMERR("All replicated volumes dead, failing I/O");
> -		/* None of the writes succeeded, fail the I/O. */
> -		ret = -EIO;
> -	} else if (errors_handled(ms)) {
> -		/*
> -		 * Need to raise event.  Since raising
> -		 * events can block, we need to do it in
> -		 * the main thread.
> -		 */
> -		spin_lock_irqsave(&ms->lock, flags);
> -		bio_list_add(&ms->failures, bio);
> -		spin_unlock_irqrestore(&ms->lock, flags);
> -		wakeup_mirrord(ms);
> -		return;
> -	}
> -out:
> -	bio_endio(bio, ret);
> +	/*
> +	 * In either case we must mark the region as NOSYNC.
> +	 * That would block, so do it in the thread.
> +	 */

What exactly do you mean by "either case" here? The "errors_handled" case or not?
The comment needs adjusting, as we don't do that check here anymore.

> @@ -730,10 +719,25 @@ static void do_failures(struct mirror_se
>  		if (!ms->log_failure) {
>  			ms->in_sync = 0;
>  			dm_rh_mark_nosync(ms->rh, bio);
> +		}
> +		/*
> +		 * If all the legs are dead, fail the I/O.
> +		 *
> +		 * If we are not using dmeventd, we pretend that the I/O
> +		 * succeeded. This is wrong (the failed leg might come online
> +		 * again after reboot and it would be replicated back to
> +		 * the good leg) but it is consistent with current behavior.
> +		 * For proper behavior, dm-raid1 shouldn't be used without
> +		 * dmeventd at all.
> +		 *
> +		 * If we use dmeventd, hold the bio until dmeventd does its job.
> +		 */
> +		if (!get_valid_mirror(ms))
> +			bio_endio(bio, -EIO);
> +		else if (!errors_handled(ms))
>  			bio_endio(bio, 0);
> -		} else {
> +		else
>  			hold_bio(ms, bio);

You seem to be holding the I/O that has failed, but what about writes
that are issued after this failure. They seem to go through just fine.
Did I miss something? I think do_writes() needs to check for a leg
failure (just like it does for log_failure already), and put them on
failure/hold list, right?

Also, we do need to do the above work only if "primary" leg fails. We
can continue to work just like the old code if "secondary" legs fail,
right? Not sure if this is worth optimizing though, but I would like to
see it implemented as it is just a few extra checks. We can have
primary_failure field like log_failure field.

Thanks, Malahal.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-23  5:58               ` malahal
@ 2009-11-23 17:54                 ` Takahiro Yasui
  2009-11-24 11:51                 ` Mikulas Patocka
  1 sibling, 0 replies; 29+ messages in thread
From: Takahiro Yasui @ 2009-11-23 17:54 UTC (permalink / raw)
  To: dm-devel

On 11/23/09 00:58, malahal@us.ibm.com wrote:
> Mikulas Patocka [mpatocka@redhat.com] wrote:
>> @@ -730,10 +719,25 @@ static void do_failures(struct mirror_se
>>  		if (!ms->log_failure) {
>>  			ms->in_sync = 0;
>>  			dm_rh_mark_nosync(ms->rh, bio);
>> +		}
>> +		/*
>> +		 * If all the legs are dead, fail the I/O.
>> +		 *
>> +		 * If we are not using dmeventd, we pretend that the I/O
>> +		 * succeeded. This is wrong (the failed leg might come online
>> +		 * again after reboot and it would be replicated back to
>> +		 * the good leg) but it is consistent with current behavior.
>> +		 * For proper behavior, dm-raid1 shouldn't be used without
>> +		 * dmeventd at all.
>> +		 *
>> +		 * If we use dmeventd, hold the bio until dmeventd does its job.
>> +		 */
>> +		if (!get_valid_mirror(ms))
>> +			bio_endio(bio, -EIO);
>> +		else if (!errors_handled(ms))
>>  			bio_endio(bio, 0);
>> -		} else {
>> +		else
>>  			hold_bio(ms, bio);
> 
> You seem to be holding the I/O that has failed, but what about writes
> that are issued after this failure. They seem to go through just fine.
> Did I miss something? I think do_writes() needs to check for a leg
> failure (just like it does for log_failure already), and put them on
> failure/hold list, right?

On 09/09/09 16:18, Takahiro Yasui wrote:
>  - As I mentioned before, bios which are sent to out-of-sync regions also
>    need to be blocked because bios to out-of-sync regions are processed by
>    generic_make_request() and would return the "success" to upper layer
>    without dm-raid1 notices it. This might cause data corruption.
>    https://www.redhat.com/archives/dm-devel/2009-July/msg00118.html

I think you are right. I also commented on this two months ago, but
I haven't received a response from Mikulas.

> Also, we do need to do the above work only if "primary" leg fails. We
> can continue to work just like the old code if "secondary" legs fail,
> right? Not sure if this is worth optimizing though, but I would like to
> see it implemented as it is just a few extra checks. We can have
> primary_failure field like log_failure field.

Good idea. The data corruption we are talking about could happen only if
the primary leg fails. Keeping the system running when a non-primary leg
fails looks reasonable.

Taka

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-23  5:58               ` malahal
  2009-11-23 17:54                 ` Takahiro Yasui
@ 2009-11-24 11:51                 ` Mikulas Patocka
  2009-11-24 19:17                   ` malahal
  1 sibling, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-24 11:51 UTC (permalink / raw)
  To: malahal; +Cc: dm-devel

> > -out:
> > -	bio_endio(bio, ret);
> > +	/*
> > +	 * In either case we must mark the region as NOSYNC.
> > +	 * That would block, so do it in the thread.
> > +	 */
> 
> What exactly do you mean by "either case" here? The "errors_handled" case or not?
> The comment needs adjusting, as we don't do that check here anymore.

Yes. I mean both "errors_handled" case and "errors_not_handled".

> You seem to be holding the I/O that has failed, but what about writes
> that are issued after this failure. They seem to go through just fine.
> Did I miss something? I think do_writes() needs to check for a leg
> failure (just like it does for log_failure already), and put them on
> failure/hold list, right?

Yes, writes after the failed request are processed, but it is not a 
problem --- if the write succeeded on all legs, it is returned as success 
--- in this case, resynchronization can't corrupt written data. If the 
write succeeded only on some legs, it is held again.

So in practice, if some leg fails completely, all writes will be held.

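To make this concrete, the resulting decision is roughly this (condensed
and paraphrased from the patch 7 hunks above, not new code):

	/* write_callback(): complete the bio only if no leg reported an error */
	if (likely(!error)) {
		bio_endio(bio, 0);
		return;
	}
	/* otherwise the bio is queued on ms->failures and kmirrord is woken */

	/* do_failures(), for each failed bio: */
	if (!get_valid_mirror(ms))
		bio_endio(bio, -EIO);	/* all legs are dead */
	else if (!errors_handled(ms))
		bio_endio(bio, 0);	/* no dmeventd: old behavior */
	else
		hold_bio(ms, bio);	/* hold until dmeventd reconfigures */
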
> Also, we do need to do the above work only if "primary" leg fails. We
> can continue to work just like the old code if "secondary" legs fail,
> right? Not sure if this is worth optimizing though, but I would like to
> see it implemented as it is just a few extra checks. We can have
> primary_failure field like log_failure field.
> 
> Thanks, Malahal.

I thought about it too, but concluded that we need to hold bios even if 
the primary leg fails.

Imagine this scenario:
* secondary leg fails
* write fails on the secondary leg and succeeds on the primary leg 
and is successfully complete
* the computer crashes
* after a reboot, the primary leg is inaccessible and the secondary leg is 
back online --- now raid1 would be returning stale data.

In transaction-processing applications, an erroneous revert of a committed 
transaction is a worse problem than a computer crash.


If we hold the bios if the secondary leg fails (as the patch does), one of 
these two scenarios happen:

* secondary leg fails
* write succeeds on the primary leg and is held
* the computer crashes
* after a reboot, the primary leg is inaccessible and the secondary leg is
back online --- but we haven't completed the write, so the transaction 
wasn't reported as committed

or

* secondary leg fails
* write succeeds on the primary leg and is held
* dmeventd removes the secondary leg and the write succeeds
* the computer crashes
* after a reboot, the primary leg is inaccessible, the secondary leg was 
already removed by dmeventd, so the array is considered inaccessible. So 
it doesn't work but at least it doesn't revert already committed 
transaction.

Mikulas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-24 11:51                 ` Mikulas Patocka
@ 2009-11-24 19:17                   ` malahal
  2009-11-25 13:19                     ` Mikulas Patocka
  0 siblings, 1 reply; 29+ messages in thread
From: malahal @ 2009-11-24 19:17 UTC (permalink / raw)
  To: dm-devel

Mikulas Patocka [mpatocka@redhat.com] wrote:
> Yes, writes after the failed request are processed, but it is not a 
> problem --- if the write succeeded on all legs, it is returned as success 
> --- in this case, resynchronization can't corrupt written data. If the 
> write succeeded only on some legs, it is held again.
> 
> So in practice, if some leg fails completely, all writes will be held.

I need to look at the code again, but I thought any new writes to a
failed region go to a surviving leg. In that case, we end up returning
I/O's to the application after writing to a single leg.
 
> > Also, we do need to do the above work only if "primary" leg fails. We
> > can continue to work just like the old code if "secondary" legs fail,
> > right? Not sure if this is worth optimizing though, but I would like to
> > see it implemented as it is just a few extra checks. We can have
> > primary_failure field like log_failure field.
 
> I thought about it too, but concluded that we need to hold bios even if 
> the primary leg fails.
> 
> Imagine this scenario:
> * secondary leg fails
> * write fails on the secondary leg and succeeds on the primary leg 
> and is successfully complete
> * the computer crashes
> * after a reboot, the primary leg is inaccessible and the secondary leg is 
> back online --- now raid1 would be returning stale data.

The software can detect this case. We can fail this completely or use
the data from the secondary that could be "stale" with help from admin. 
Let us call this method 1.

> If we hold the bios if the secondary leg fails (as the patch does), one of 
> these two scenarios happen:
> 
> * secondary leg fails
> * write succeeds on the primary leg and is held
> * the computer crashes
> * after a reboot, the primary leg is inaccessible and the secondary leg is
> back online --- but we haven't completed the write, so the transaction 
> wasn't reported as committed
> 
> or
> 
> * secondary leg fails
> * write succeeds on the primary leg and is held
> * dmeventd removes the secondary leg and the write succeeds
> * the computer crashes
> * after a reboot, the primary leg is inaccessible, the secondary leg was 
> already removed by dmeventd, so the array is considered inaccessible. So 
> it doesn't work but at least it doesn't revert already committed 
> transaction.

How is this latter case (it doesn't need a crash anyway)
different/better from the case where we detect that 'primary' is missing
and ask admin if he wants to use the data on the secondary or not. At
least, the admin has a choice with "method 1" and this doesn't have that
choice.

Thanks, Malahal.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-24 19:17                   ` malahal
@ 2009-11-25 13:19                     ` Mikulas Patocka
  2009-11-25 15:43                       ` Takahiro Yasui
  2009-11-25 20:23                       ` [PATCH 7/7] Hold all write bios when errors are handled malahal
  0 siblings, 2 replies; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-25 13:19 UTC (permalink / raw)
  To: malahal; +Cc: dm-devel



On Tue, 24 Nov 2009, malahal@us.ibm.com wrote:

> I need to look at the code again, but I thought any new writes to a
> failed region go to a surviving leg. In that case, we end up returning
> I/O's to the application after writing to a single leg.

Writes always go to all the legs, see do_write(). Anyway, dmeventd removes 
the failed leg soon.
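
Roughly, do_write() looks like this (paraphrased from the current source and
slightly simplified, so details may differ):

	static void do_write(struct mirror_set *ms, struct bio *bio)
	{
		unsigned i;
		struct dm_io_region io[ms->nr_mirrors], *dest = io;
		struct mirror *m;
		struct dm_io_request io_req = {
			.bi_rw = WRITE,
			.mem.type = DM_IO_BVEC,
			.mem.ptr.bvec = bio->bi_io_vec + bio->bi_idx,
			.notify.fn = write_callback,
			.notify.context = bio,
			.client = ms->io_client,
		};

		/* build one dm_io_region per mirror leg ... */
		for (i = 0, m = ms->mirror; i < ms->nr_mirrors; i++, m++)
			map_region(dest++, m, bio);

		/* ... and submit a single dm_io request covering all of them */
		bio_set_m(bio, get_default_mirror(ms));
		BUG_ON(dm_io(&io_req, ms->nr_mirrors, io, NULL));
	}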

> > > Also, we do need to do the above work only if "primary" leg fails. We
> > > can continue to work just like the old code if "secondary" legs fail,
> > > right? Not sure if this is worth optimizing though, but I would like to
> > > see it implemented as it is just a few extra checks. We can have
> > > primary_failure field like log_failure field.
>  
> > I thought about it too, but concluded that we need to hold bios even if 
> > the primary leg fails.
> > 
> > Imagine this scenario:
> > * secondary leg fails
> > * write fails on the secondary leg and succeeds on the primary leg 
> > and is successfully complete
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible and the secondary leg is 
> > back online --- now raid1 would be returning stale data.
> 
> The software can detect this case. We can fail this completely or use
> the data from the secondary that could be "stale" with help from admin. 
> Let us call this method 1.

You can't detect it because the computer crashed *before* you write the 
information that the secondary leg failed to the metadata.

So, after a reboot, you can't tell if any mirror leg failed some requests 
before the crash.

> > If we hold the bios if the secondary leg fails (as the patch does), one of 
> > these two scenarios happen:
> > 
> > * secondary leg fails
> > * write succeeds on the primary leg and is held
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible and the secondary leg is
> > back online --- but we haven't completed the write, so the transaction 
> > wasn't reported as committed
> > 
> > or
> > 
> > * secondary leg fails
> > * write succeeds on the primary leg and is held
> > * dmeventd removes the secondary leg and the write succeeds
> > * the computer crashes
> > * after a reboot, the primary leg is inaccessible, the secondary leg was 
> > already removed by dmeventd, so the array is considered inaccessible. So 
> > it doesn't work but at least it doesn't revert already committed 
> > transaction.
> 
> How is this latter case (it doesn't need a crash anyway)
> different/better from the case where we detect that 'primary' is missing
> and ask admin if he wants to use the data on the secondary or not. At
> least, the admin has a choice with "method 1" and this doesn't have that
> choice.

If you always ask the admin when the primary leg fails and wait for his 
action, you lose fault tolerance --- the computer would wait until the admin 
acts.

The requirements are:
* if one of legs fail or log fails, you must automatically continue 
without human intervention
* if both legs fail, you must shut it down and not pretend that something 
was written when it wasn't (this would break durability requirement of 
transactions).

Mikulas

> Thanks, Malahal.
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 13:19                     ` Mikulas Patocka
@ 2009-11-25 15:43                       ` Takahiro Yasui
  2009-11-25 20:44                         ` malahal
  2009-11-26 17:54                         ` [PATCH 8/7] Hold all write bios in nosync region Mikulas Patocka
  2009-11-25 20:23                       ` [PATCH 7/7] Hold all write bios when errors are handled malahal
  1 sibling, 2 replies; 29+ messages in thread
From: Takahiro Yasui @ 2009-11-25 15:43 UTC (permalink / raw)
  To: dm-devel

On 11/25/09 08:19, Mikulas Patocka wrote:
> 
> 
> On Tue, 24 Nov 2009, malahal@us.ibm.com wrote:
> 
>> I need to look at the code again, but I thought any new writes to a
>> failed region go to a surviving leg. In that case, we end up returning
>> I/O's to the application after writing to a single leg.
> 
> Writes always go to all the legs, see do_write(). Anyway, dmeventd removes 
> the failed leg soon.

Is that correct? When a region is in the out-of-sync (NOSYNC) state,
I/Os are not processed by do_write() but by generic_make_request() in
do_writes().
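
For reference, this is the nosync path in do_writes() that I mean (simplified):

	while ((bio = bio_list_pop(&nosync))) {
		map_bio(get_default_mirror(ms), bio);
		generic_make_request(bio);	/* bypasses write_callback() */
	}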

>>>> Also, we do need to do the above work only if "primary" leg fails. We
>>>> can continue to work just like the old code if "secondary" legs fail,
>>>> right? Not sure if this is worth optimizing though, but I would like to
>>>> see it implemented as it is just a few extra checks. We can have
>>>> primary_failure field like log_failure field.
>>  
>>> I thought about it too, but concluded that we need to hold bios even if 
>>> the primary leg fails.
>>>
>>> Imagine this scenario:
>>> * secondary leg fails
>>> * write fails on the secondary leg and succeeds on the primary leg 
>>> and is successfully complete
>>> * the computer crashes
>>> * after a reboot, the primary leg is inaccessible and the secondary leg is 
>>> back online --- now raid1 would be returning stale data.
>>
>> The software can detect this case. We can fail this completely or use
>> the data from the secondary that could be "stale" with help from admin. 
>> Let us call this method 1.
> 
> You can't detect it because the computer crashed *before* you write the 
> information that the secondary leg failed to the metadata.
> 
> So, after a reboot, you can't tell if any mirror leg failed some requests 
> before the crash.
> 
>>> If we hold the bios if the secondary leg fails (as the patch does), one of 
>>> these two scenarios happen:
>>>
>>> * secondary leg fails
>>> * write succeeds on the primary leg and is held
>>> * the computer crashes
>>> * after a reboot, the primary leg is inaccessible and the secondary leg is
>>> back online --- but we haven't completed the write, so the transaction 
>>> wasn't reported as committed
>>>
>>> or
>>>
>>> * secondary leg fails
>>> * write succeeds on the primary leg and is held
>>> * dmeventd removes the secondary leg and the write succeeds
>>> * the computer crashes
>>> * after a reboot, the primary leg is inaccessible, the secondary leg was 
>>> already removed by dmeventd, so the array is considered inaccessible. So 
>>> it doesn't work but at least it doesn't revert already committed 
>>> transaction.
>>
>> How is this latter case (it doesn't need a crash anyway)
>> different/better from the case where we detect that 'primary' is missing
>> and ask admin if he wants to use the data on the secondary or not. At
>> least, the admin has a choice with "method 1" and this doesn't have that
>> choice.
> 
> If you always ask the admin when the primary leg fails and wait for his 
> action, you lose fault tolerance --- the computer would wait until the admin 
> acts.
> 
> The requirements are:
> * if one of legs fail or log fails, you must automatically continue 
> without human intervention
> * if both legs fail, you must shut it down and not pretend that something 
> was written when it wasn't (this would break durability requirement of 
> transactions).

I agree with this point. An lvm mirror could be used under filesystems such
as ext3, and each filesystem and application would need to take care of those
situations to prevent data corruption. I don't think that is realistic; the
underlying layer should prevent data corruption. I now understand that both
primary and secondary disks need to be blocked.

Thanks,
Taka

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 13:19                     ` Mikulas Patocka
  2009-11-25 15:43                       ` Takahiro Yasui
@ 2009-11-25 20:23                       ` malahal
  2009-11-25 22:47                         ` Takahiro Yasui
  2009-11-26 17:58                         ` Mikulas Patocka
  1 sibling, 2 replies; 29+ messages in thread
From: malahal @ 2009-11-25 20:23 UTC (permalink / raw)
  To: dm-devel

Mikulas Patocka [mpatocka@redhat.com] wrote:
> 
> > > Imagine this scenario:
> > > * secondary leg fails
> > > * write fails on the secondary leg and succeeds on the primary leg 
> > > and is successfully complete
> > > * the computer crashes
> > > * after a reboot, the primary leg is inaccessible and the secondary leg is 
> > > back online --- now raid1 would be returning stale data.
> > 
> > The software can detect this case. We can fail this completely or use
> > the data from the secondary that could be "stale" with help from admin. 
> > Let us call this method 1.
> 
> You can't detect it because the computer crashed *before* you write the 
> information that the secondary leg failed to the metadata.
> 
> So, after a reboot, you can't tell if any mirror leg failed some requests 
> before the crash.

My definition of 'primary' is the first leg. From now on, I will use "first
leg" to avoid confusion.  On a reboot, LVM can find out whether its first leg
is missing. If it is missing, it can ask the admin whether to use the
'second' leg or not. When I said "software" can detect it, I really meant
that LVM can detect that the "first leg" is missing.

> > > If we hold the bios if the secondary leg fails (as the patch does), one of 
> > > these two scenarios happen:
> > > 
> > > * secondary leg fails
> > > * write succeeds on the primary leg and is held
> > > * the computer crashes
> > > * after a reboot, the primary leg is inaccessible and the secondary leg is
> > > back online --- but we haven't completed the write, so the transaction 
> > > wasn't reported as committed
> > > 
> > > or
> > > 
> > > * secondary leg fails
> > > * write succeeds on the primary leg and is held
> > > * dmeventd removes the secondary leg and the write succeeds
> > > * the computer crashes
> > > * after a reboot, the primary leg is inaccessible, the secondary leg was 
> > > already removed by dmeventd, so the array is considered inaccessible. So 
> > > it doesn't work but at least it doesn't revert already committed 
> > > transaction.
> > 
> > How is this latter case (it doesn't need a crash anyway)
> > different/better from the case where we detect that 'primary' is missing
> > and ask admin if he wants to use the data on the secondary or not. At
> > least, the admin has a choice with "method 1" and this doesn't have that
> > choice.
> 
> If you always ask the admin when the primary leg fails and wait for his 
> action, you lose fault tolerance --- the computer would wait until the admin 
> acts.

Well, we are talking about one-time admin help at reboot [actually activation
time], and only if the "state of the mirror" changed during the reboot!
This is not applicable at run time. Most of the time, dmeventd will have
a chance to change the mirror to linear anyway.

> The requirements are:
> * if one of legs fail or log fails, you must automatically continue 
> without human intervention

This is satisfied. I was suggesting admin input only at reboot time.
What happens now if the first leg fails? I think activation fails and you
need to specify "--partial" or so.

> * if both legs fail, you must shut it down and not pretend that something 
> was written when it wasn't (this would break durability requirement of 
> transactions).

True.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 15:43                       ` Takahiro Yasui
@ 2009-11-25 20:44                         ` malahal
  2009-11-25 22:50                           ` Takahiro Yasui
  2009-11-26 17:56                           ` Mikulas Patocka
  2009-11-26 17:54                         ` [PATCH 8/7] Hold all write bios in nosync region Mikulas Patocka
  1 sibling, 2 replies; 29+ messages in thread
From: malahal @ 2009-11-25 20:44 UTC (permalink / raw)
  To: dm-devel

Takahiro Yasui [tyasui@redhat.com] wrote:
> > The requirements are:
> > * if one of legs fail or log fails, you must automatically continue 
> > without human intervention
> > * if both legs fail, you must shut it down and not pretend that something 
> > was written when it wasn't (this would break durability requirement of 
> > transactions).
> 
> I agree with this point. An lvm mirror could be used under filesystems such
> as ext3, and each filesystem and application would need to take care of those
> situations to prevent data corruption. I don't think that is realistic; the
> underlying layer should prevent data corruption. I now understand that both
> primary and secondary disks need to be blocked.

If you think that I/O needs to be blocked for first or second leg
failures, then I am afraid that there is no way to keep the failed
mirror in the system for re-integrating failed devices.

I would like to keep the mirror as a mirror when one of the legs fails, as
opposed to dmeventd converting it to linear as it does now. This is so a
failed leg can be re-integrated into the mirror, to handle transient device
failures. Here is what I am planning to do; let me know if you find any
issues:

1. Secondary leg failure. dmeventd will NOT change the meta data at all.
   It will call "lvchange --refresh", which will start resynchronization.
   This will be tried a few times. If the device still fails to
   re-integrate, it is left the way it is and no further re-integration
   attempts are made. The attempts can be made at configurable
   intervals.

2. First leg failure (aka primary leg failure): The kernel would stop
   handling any further I/O. dmeventd will change the mirror meta data
   [it will re-order the legs] so that the secondary now becomes the
   primary. I will try to re-integrate the failed device a few times before
   giving up, just as in case "1" above.

As you can see above, the kernel module will never make progress on its own
after a first leg failure.

Thanks, Malahal.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 20:23                       ` [PATCH 7/7] Hold all write bios when errors are handled malahal
@ 2009-11-25 22:47                         ` Takahiro Yasui
  2009-11-25 23:20                           ` malahal
  2009-11-26 17:58                         ` Mikulas Patocka
  1 sibling, 1 reply; 29+ messages in thread
From: Takahiro Yasui @ 2009-11-25 22:47 UTC (permalink / raw)
  To: dm-devel

On 11/25/09 15:23, malahal@us.ibm.com wrote:
> Mikulas Patocka [mpatocka@redhat.com] wrote:
>>
>>>> Imagine this scenario:
>>>> * secondary leg fails
>>>> * write fails on the secondary leg and succeeds on the primary leg 
>>>> and is successfully complete
>>>> * the computer crashes
>>>> * after a reboot, the primary leg is inaccessible and the secondary leg is 
>>>> back online --- now raid1 would be returning stale data.
>>>
>>> The software can detect this case. We can fail this completely or use
>>> the data from the secondary that could be "stale" with help from admin. 
>>> Let us call this method 1.
>>
>> You can't detect it because the computer crashed *before* you write the 
>> information that the secondary leg failed to the metadata.
>>
>> So, after a reboot, you can't tell if any mirror leg failed some requests 
>> before the crash.
> 
> My definition of 'primary' is the first leg. Now on, I will use "first
> leg" to avoid confusion.  On a reboot, LVM can find if its first leg is
> missing. If it is missing, it can ask the admin whether to use the
> 'second' leg or not. When I said, "software" can detect, I really meant
> that LVM can detect that the "first leg" is missing.

I thought again about the scenario which Mikulas pointed out. It involves
double failures (failures on two legs), so human intervention would be
acceptable. However, how do we know whether the second leg contains valid
data?

There might be two cases.

  1) System crashed during write operations without any disk failures, and
     the first leg fails at the next boot.

     We can use the secondary leg because data in the secondary leg is valid.

  2) System crashed after the secondary leg failed, and the first leg fails
     and the secondary leg gets back online at the next boot.

     We can't use the secondary leg because data might be stale.

I haven't checked the contents of log disk, but I guess we can't differentiate
these cases from log disks. Another possibility I thought of was error messages.
If any error messages for the secondary leg are recorded, we can judge that
the secondary leg contains stale data, but I suspect that it is not a reliable
way because syslog might not be written to disk before a system crash.

I would like to enhance system availability by keeping the system running
when the secondary leg fails, but we need to confirm this case.

I appreciate your comments.

Thanks,
Taka

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 20:44                         ` malahal
@ 2009-11-25 22:50                           ` Takahiro Yasui
  2009-11-26 17:56                           ` Mikulas Patocka
  1 sibling, 0 replies; 29+ messages in thread
From: Takahiro Yasui @ 2009-11-25 22:50 UTC (permalink / raw)
  To: dm-devel

On 11/25/09 15:44, malahal@us.ibm.com wrote:
> Takahiro Yasui [tyasui@redhat.com] wrote:
>>> The requirements are:
>>> * if one of legs fail or log fails, you must automatically continue 
>>> without human intervention
>>> * if both legs fail, you must shut it down and not pretend that something 
>>> was written when it wasn't (this would break durability requirement of 
>>> transactions).
>>
>> I agree with this point. An lvm mirror could be used under filesystems such
>> as ext3, and each filesystem and application would need to take care of those
>> situations to prevent data corruption. I don't think that is realistic; the
>> underlying layer should prevent data corruption. I now understand that both
>> primary and secondary disks need to be blocked.
> 
> If you think that I/O needs to be blocked for first or second leg
> failures, then I am afraid that there is no way to keep the failed
> mirror in the system for re-integrating failed devices.
> 
> I would like to keep the mirror as mirror when one of the legs fails as
> opposed to making it linear now by dmeventd. This is to re-integrate a
> failed leg into the mirror to handle transient device failures. Here is
> what I am planning to do, let me know if you find any issues:
> 
> 1. Secondary leg failure. dmeventd will NOT change the meta data at all.
>    It will call "lvchange --refresh" that will start resynchronization.
>    This will be done a few times. If the device still fails to
>    re-integrate, it is left the way it is and no further re-integrations
>    attempts are done. The number of attempts can be done at configurable
>    intervals.
> 
> 2. First leg failure (aka Primary leg failure): The kernel would stop
>    handling any further I/O. The dmeventd will change mirror meta data
>    [it will re-order the legs ] so that the secondary now becomes the
>    primary. I will try to re-integrate the failed device few times before
>    giving up just as in the case "1" above.
> 
> The kernel module will never progress with a first leg failure as you
> see above.

It looks like a very good idea for both increasing system availability and
handling transient errors. I hope we can solve the issue I commented on in
this email:
  https://www.redhat.com/archives/dm-devel/2009-November/msg00253.html
and adopt this method.

Thanks,
Taka

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 22:47                         ` Takahiro Yasui
@ 2009-11-25 23:20                           ` malahal
  2009-11-25 23:50                             ` Takahiro Yasui
  0 siblings, 1 reply; 29+ messages in thread
From: malahal @ 2009-11-25 23:20 UTC (permalink / raw)
  To: dm-devel

Takahiro Yasui [tyasui@redhat.com] wrote:
> I thought again about the scenario which Mikulas pointed out. It involves
> double failures (failures on two legs), so human intervention would be
> acceptable. However, how do we know whether the second leg contains valid
> data?
> 
> There might be two cases.
> 
>   1) System crashed during write operations without any disk failures, and
>      the first leg fails at the next boot.
> 
>      We can use the secondary leg because data in the secondary leg is valid.
> 
>   2) System crashed after the secondary leg failed, and the first leg fails
>      and the secondary leg gets back online at the next boot.
> 
>      We can't use the secondary leg because data might be stale.
> 
> I haven't checked the contents of log disk, but I guess we can't
> differentiate these cases from log disks.

There were plans to add a new region state to make sure that all the
mirror legs have same data after a "crash". Currently your best bet is a
complete resync after a crash! I am not sure if the state is written to
log disk though. It may be possible to distinguish the above two cases
with this...

Or just have LVM meta data that records a device failure. Suspend writes
[for any kind of leg] and record device failure in the LVM meta data and
restart writes. This requires LVM meta data change though!

> > Another possibility I thought of was error messages.
> > If any error messages for the secondary leg are recorded, we can judge that
> > the secondary leg contains stale data, but I suspect that it is not a reliable
> > way because syslog might not be written to disk before a system crash.

We should be able to fix it in the LVM meta data!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 23:20                           ` malahal
@ 2009-11-25 23:50                             ` Takahiro Yasui
  2009-11-26  0:30                               ` malahal
  0 siblings, 1 reply; 29+ messages in thread
From: Takahiro Yasui @ 2009-11-25 23:50 UTC (permalink / raw)
  To: dm-devel

On 11/25/09 18:20, malahal@us.ibm.com wrote:
> Takahiro Yasui [tyasui@redhat.com] wrote:
>> I thought again about the scenario which Mikulas pointed out. It involves
>> double failures (failures on two legs), so human intervention would be
>> acceptable. However, how do we know whether the second leg contains valid
>> data?
>>
>> There might be two cases.
>>
>>   1) System crashed during write operations without any disk failures, and
>>      the first leg fails at the next boot.
>>
>>      We can use the secondary leg because data in the secondary leg is valid.
>>
>>   2) System crashed after the secondary leg failed, and the first leg fails
>>      and the secondary leg gets back online at the next boot.
>>
>>      We can't use the secondary leg because data might be stale.
>>
>> I haven't checked the contents of log disk, but I guess we can't
>> differentiate these cases from log disks.
> 
> There were plans to add a new region state to make sure that all the
> mirror legs have same data after a "crash". Currently your best bet is a
> complete resync after a crash!

Please let me clarify this. There are two legs and system crash happens.
Then, how can we resync? We have only one leg (secondary) after boot.
When we use "mirror", we expect the last device to contain valid data,
don't we?

> Or just have LVM meta data that records a device failure. Suspend writes
> [for any kind of leg] and record device failure in the LVM meta data and
> restart writes. This requires LVM meta data change though!

Do you mean that write I/Os need to be blocked when the secondary leg fails
in order to update LVM meta data by lvm commands called by dmeventd?

Thanks,
Taka

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 23:50                             ` Takahiro Yasui
@ 2009-11-26  0:30                               ` malahal
  0 siblings, 0 replies; 29+ messages in thread
From: malahal @ 2009-11-26  0:30 UTC (permalink / raw)
  To: dm-devel

Takahiro Yasui [tyasui@redhat.com] wrote:
> >> I haven't checked the contents of log disk, but I guess we can't
> >> differentiate these cases from log disks.
> > 
> > There were plans to add a new region state to make sure that all the
> > mirror legs have same data after a "crash". Currently your best bet is a
> > complete resync after a crash!
> 
> Please let me clarify this. There are two legs and system crash happens.
> Then, how can we resync? We have only one leg (secondary) after boot.
> When we use "mirror", we expect the last device to contain valid data,
> don't we?

I was actually referring to a problem that I mentioned in a DM/LVM
meeting.  Mikulas posted it here and there were some discussions.
http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/5392

If I remember correctly, the proposed solution would add another state bit.
If it is implemented, then we can use that information to differentiate the
cases you mentioned. Essentially, if there are only 'dirty' regions, we
can safely use the secondary. On the other hand, if the log has some 'out
of sync' regions, then we can't use the secondary.
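
To make the idea concrete, here is a minimal sketch of that check. The
structure and helper names are hypothetical; it only assumes the on-disk log
would gain a per-region "out of sync because a leg failed" bit next to the
existing dirty bit:

	/*
	 * Hypothetical post-crash decision: can the surviving (secondary)
	 * leg be trusted as the resync source?  Kernel-style sketch, not
	 * real dm-log code.
	 */
	struct crash_log {
		unsigned int region_count;
		unsigned long *dirty_bits;    /* write was in flight */
		unsigned long *legfail_bits;  /* leg was already out of sync */
	};

	static int secondary_leg_usable(struct crash_log *log)
	{
		unsigned int r;

		for (r = 0; r < log->region_count; r++) {
			/*
			 * A merely dirty region was being written to all legs
			 * when the crash happened, so resyncing it from the
			 * survivor is safe.  A region already marked out of
			 * sync means the secondary had failed before the
			 * crash and its data may be stale.
			 */
			if (test_bit(r, log->legfail_bits))
				return 0;
		}
		return 1;
	}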

You can ignore my reference to "complete resynchronization". I was
referring to an existing problem with system crashes!
 
> > Or just have LVM meta data that records a device failure. Suspend writes
> > [for any kind of leg] and record device failure in the LVM meta data and
> > restart writes. This requires LVM meta data change though!
> 
> Do you mean that write I/Os need to be blocked when the secondary leg fails
> in order to update LVM meta data by lvm commands called by dmeventd?

Yes, that is correct. We need to stop write I/O for any leg failure if
we choose the LVM meta data method.

Thanks, Malahal.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 8/7] Hold all write bios in nosync region
  2009-11-25 15:43                       ` Takahiro Yasui
  2009-11-25 20:44                         ` malahal
@ 2009-11-26 17:54                         ` Mikulas Patocka
  1 sibling, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-26 17:54 UTC (permalink / raw)
  To: Takahiro Yasui; +Cc: dm-devel, Alasdair G Kergon



On Wed, 25 Nov 2009, Takahiro Yasui wrote:

> On 11/25/09 08:19, Mikulas Patocka wrote:
> > 
> > 
> > On Tue, 24 Nov 2009, malahal@us.ibm.com wrote:
> > 
> >> I need to look at the code again, but I thought any new writes to a
> >> failed region go to a surviving leg. In that case, we end up returning
> >> I/O's to the application after writing to a single leg.
> > 
> > Writes always go to all the legs, see do_write(). Anyway, dmeventd removes 
> > the failed leg soon.
> 
> Is it correct? When the region is in the state of out of sync (NOSYNC), 
> I/Os are not processed by do_write() but generic_make_request() in the
> do_writes().

This is good point. Here I'm sending patch 8 (to be applied on the top of 
the rest of patches) that fixes it.

Mikulas

---

Hold all write bios when a leg fails and errors are handled

When using dmeventd to handle errors, write bios must be held until dmeventd
does its job. This patch prevents the following race:
* the primary leg fails
* write "1" fails, the write is held, the secondary leg becomes the default
* write "2" goes straight to the secondary leg
*** crash *** (before dmeventd does its job)
* after a reboot, the primary leg comes back online and its stale contents
  are resynchronized onto the secondary leg, so write "2" is reverted
  although it already completed

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 drivers/md/dm-raid1.c |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Index: linux-2.6.32-rc8-devel/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.32-rc8-devel.orig/drivers/md/dm-raid1.c	2009-11-26 17:26:26.000000000 +0100
+++ linux-2.6.32-rc8-devel/drivers/md/dm-raid1.c	2009-11-26 18:30:48.000000000 +0100
@@ -68,6 +68,7 @@ struct mirror_set {
 	region_t nr_regions;
 	int in_sync;
 	int log_failure;
+	int leg_failure;
 	atomic_t suspend;
 
 	atomic_t default_mirror;	/* Default mirror */
@@ -210,6 +211,8 @@ static void fail_mirror(struct mirror *m
 	struct mirror_set *ms = m->ms;
 	struct mirror *new;
 
+	ms->leg_failure = 1;
+
 	/*
 	 * error_count is used for nothing more than a
 	 * simple way to tell if a device has encountered
@@ -694,8 +697,12 @@ static void do_writes(struct mirror_set 
 		dm_rh_delay(ms->rh, bio);
 
 	while ((bio = bio_list_pop(&nosync))) {
-		map_bio(get_default_mirror(ms), bio);
-		generic_make_request(bio);
+		if (unlikely(ms->leg_failure) && errors_handled(ms))
+			hold_bio(ms, bio);
+		else {
+			map_bio(get_default_mirror(ms), bio);
+			generic_make_request(bio);
+		}
 	}
 }
 
@@ -816,6 +823,7 @@ static struct mirror_set *alloc_context(
 	ms->nr_regions = dm_sector_div_up(ti->len, region_size);
 	ms->in_sync = 0;
 	ms->log_failure = 0;
+	ms->leg_failure = 0;
 	atomic_set(&ms->suspend, 0);
 	atomic_set(&ms->default_mirror, DEFAULT_MIRROR);
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 20:44                         ` malahal
  2009-11-25 22:50                           ` Takahiro Yasui
@ 2009-11-26 17:56                           ` Mikulas Patocka
  1 sibling, 0 replies; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-26 17:56 UTC (permalink / raw)
  To: malahal; +Cc: dm-devel



On Wed, 25 Nov 2009, malahal@us.ibm.com wrote:

> Takahiro Yasui [tyasui@redhat.com] wrote:
> > > The requirements are:
> > > * if one of legs fail or log fails, you must automatically continue 
> > > without human intervention
> > > * if both legs fail, you must shut it down and not pretend that something 
> > > was written when it wasn't (this would break durability requirement of 
> > > transactions).
> > 
> > I agree with this point. lvm mirror could be used on filesystems such as
> > ext3 and each filesystem and application needs to take care those situation
> > to prevent data corruption. I don't think that it is realistic, and the
> > underlying layer should prevent data corruption. I now understand primary
> > and secondary disks need to be blocked.
> 
> If you think that I/O needs to be blocked for first or second leg
> failures, then I am afraid that there is no way to keep the failed
> mirror in the system for re-integrating failed devices.
> 
> I would like to keep the mirror as mirror when one of the legs fails as
> opposed to making it linear now by dmeventd. This is to re-integrate a
> failed leg into the mirror to handle transient device failures. Here is
> what I am planning to do, let me know if you find any issues:
> 
> 1. Secondary leg failure. dmeventd will NOT change the meta data at all.
>    It will call "lvchange --refresh" that will start resynchronization.
>    This will be done a few times. If the device still fails to
>    re-integrate, it is left the way it is and no further re-integrations
>    attempts are done. The number of attempts can be done at configurable
>    intervals.
> 
> 2. First leg failure (aka Primary leg failure): The kernel would stop
>    handling any further I/O. The dmeventd will change mirror meta data
>    [it will re-order the legs ] so that the secondary now becomes the
>    primary. I will try to re-integrate the failed device few times before
>    giving up just as in the case "1" above.
> 
> The kernel module will never progress with a first leg failure as you
> see above.
> 
> Thanks, Malahal.

This is possible but it would need some redesign. The purpose of my patch 
was just to fix the race, not redesign it for transient failures. But yes 
--- in the long term, it could be done.

Mikulas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-25 20:23                       ` [PATCH 7/7] Hold all write bios when errors are handled malahal
  2009-11-25 22:47                         ` Takahiro Yasui
@ 2009-11-26 17:58                         ` Mikulas Patocka
  2009-11-26 22:22                           ` malahal
  1 sibling, 1 reply; 29+ messages in thread
From: Mikulas Patocka @ 2009-11-26 17:58 UTC (permalink / raw)
  To: device-mapper development

> > If you ask the admin always if primary leg failed and wait for his action, 
> > you lose fault-tolerance --- the computer would wait until the admin does 
> > an action.
> 
> Well, we are talking one time admin help at reboot [actually activation
> time] if and only if the "state of the mirror" changed during reboot!
> This is not applicable at run time. Most of the time, dmeventd will have
> a chance to change the mirror to linear anyway.

You can ask the admin about the failure on reboot --- but why do it if it can
continue on its own? (unless both legs fail)

Mikulas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 7/7] Hold all write bios when errors are handled
  2009-11-26 17:58                         ` Mikulas Patocka
@ 2009-11-26 22:22                           ` malahal
  0 siblings, 0 replies; 29+ messages in thread
From: malahal @ 2009-11-26 22:22 UTC (permalink / raw)
  To: dm-devel

Mikulas Patocka [mpatocka@redhat.com] wrote:
> > > If you ask the admin always if primary leg failed and wait for his
> > > action, you lose fault-tolerance --- the computer would wait until
> > > the admin does an action.
> > 
> > Well, we are talking one time admin help at reboot [actually activation
> > time] if and only if the "state of the mirror" changed during reboot!
> > This is not applicable at run time. Most of the time, dmeventd will have
> > a chance to change the mirror to linear anyway.
> 
> You can ask the admin about the failure on reboot --- but why do it if it can
> continue on its own? (unless both legs fail)

Agreed, it is better to stop on either failure.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/7] A framework for holding bios until suspend
  2009-11-18 12:11   ` [PATCH 2/7] A framework for holding bios until suspend Mikulas Patocka
  2009-11-18 12:11     ` [PATCH 3/7] Use the hold framework in do_failures Mikulas Patocka
@ 2009-11-28 18:02     ` Takahiro Yasui
  2009-11-30  2:55       ` malahal
  2009-11-30  9:41       ` Alasdair G Kergon
  1 sibling, 2 replies; 29+ messages in thread
From: Takahiro Yasui @ 2009-11-28 18:02 UTC (permalink / raw)
  To: dm-devel

On 11/18/09 07:11, Mikulas Patocka wrote:
> @@ -760,6 +775,7 @@ static void do_mirror(struct work_struct
>  	bio_list_init(&ms->reads);
>  	bio_list_init(&ms->writes);
>  	bio_list_init(&ms->failures);
> +	bio_list_init(&ms->hold);
>  	spin_unlock_irqrestore(&ms->lock, flags);

Initializing the hold list in do_mirror() is a problem. Some bios might
already be in it and they will never be processed. This makes device-mapper
get "stuck" when the device is suspended. The hold list should be initialized
only once, when the mirror device is created in alloc_context().

alloc_context()
        spin_lock_init(&ms->lock);
        bio_list_init(&ms->reads);
        bio_list_init(&ms->writes);
        bio_list_init(&ms->failures);
+        bio_list_init(&ms->hold);

Thanks,
Taka

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/7] A framework for holding bios until suspend
  2009-11-28 18:02     ` [PATCH 2/7] A framework for holding bios until suspend Takahiro Yasui
@ 2009-11-30  2:55       ` malahal
  2009-11-30  9:41       ` Alasdair G Kergon
  1 sibling, 0 replies; 29+ messages in thread
From: malahal @ 2009-11-30  2:55 UTC (permalink / raw)
  To: dm-devel

Takahiro Yasui [tyasui@redhat.com] wrote:
> On 11/18/09 07:11, Mikulas Patocka wrote:
> > @@ -760,6 +775,7 @@ static void do_mirror(struct work_struct
> >  	bio_list_init(&ms->reads);
> >  	bio_list_init(&ms->writes);
> >  	bio_list_init(&ms->failures);
> > +	bio_list_init(&ms->hold);
> >  	spin_unlock_irqrestore(&ms->lock, flags);
> 
> Initializing the hold list in do_mirror() is a problem. Some bios might
> already be in it and they will never be processed. This makes device-mapper
> get "stuck" when the device is suspended. The hold list should be initialized
> only once, when the mirror device is created in alloc_context().
> 
> alloc_context()
>         spin_lock_init(&ms->lock);
>         bio_list_init(&ms->reads);
>         bio_list_init(&ms->writes);
>         bio_list_init(&ms->failures);
> +        bio_list_init(&ms->hold);
> 
> Thanks,
> Taka

Good catch!

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/7] A framework for holding bios until suspend
  2009-11-28 18:02     ` [PATCH 2/7] A framework for holding bios until suspend Takahiro Yasui
  2009-11-30  2:55       ` malahal
@ 2009-11-30  9:41       ` Alasdair G Kergon
  1 sibling, 0 replies; 29+ messages in thread
From: Alasdair G Kergon @ 2009-11-30  9:41 UTC (permalink / raw)
  To: Takahiro Yasui; +Cc: dm-devel

On Sat, Nov 28, 2009 at 01:02:15PM -0500, Takahiro Yasui wrote:
> Initializing the hold list in do_mirror() is a problem. Some bios might
> already be in it and they will never be processed. This makes device-mapper
> get "stuck" when the device is suspended. The hold list should be initialized
> only once, when the mirror device is created in alloc_context().
 
I fixed that as I merged the patch into my tree and didn't even notice the
original patch was wrong!  (I assumed it was a fuzzy patching error at my end.)

http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-raid1-add-framework-to-hold-bios-during-suspend.patch
http://www.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/dm-raid1-explicitly-initialise-bio_lists.patch

Alasdair

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/7] patches: fix dm-raid1 race, bug 502927
  2009-11-18 12:09 [PATCH 0/7] patches: fix dm-raid1 race, bug 502927 Mikulas Patocka
  2009-11-18 12:10 ` [PATCH 1/7] Explicitly initialize bio lists Mikulas Patocka
@ 2009-11-30 16:46 ` Takahiro Yasui
  1 sibling, 0 replies; 29+ messages in thread
From: Takahiro Yasui @ 2009-11-30 16:46 UTC (permalink / raw)
  To: dm-devel

On 11/18/09 07:09, Mikulas Patocka wrote:
> Hi
> 
> Here is the serie of 7 patches to hold write bios on dm-raid1 until 
> dmeventd does its job. It fixes bug 
> https://bugzilla.redhat.com/show_bug.cgi?id=502927 . The first 6 patches 
> are preparatory, they just move the code around, the last patch does the 
> fix.
> 
> I tested the thing, I managed to reproduce the bug (by manually stopping 
> dmeventd with STOP signal, failing primary mirror leg and writing to the 
> device) and I also verified that the patches fix the bug.
> 
> For non-dmeventd operation, the current behavior is wrong and I just keep 
> it as wrong as it was. There is no easy fix. It is just assume that if the 
> user doesn't use dmeventd, he can't activate failed disks again.
> 
> Mikulas
> 
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel

I reviewed and tested your patch set and it looks good on the kernel side.

Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>

However, there are two issues related to #8, and they require improvements
to dmeventd and the lvm commands. This patch set is based on the idea that
dmeventd and lvm commands (lvconvert and vgreduce) fix device failures and
release the blocked write I/Os. However, in some cases the blocked write
I/Os will never be released.

* Case 1: Medium error

When a medium error is detected for a write I/O and reported to dmeventd,
lvconvert is invoked by dmeventd, but nothing is done. Therefore, write
I/Os will be blocked forever. The lvm commands behave as follows:

# dmsetup status vg00-lv00
0 24576 mirror 2 253:1 253:2 23/24 1 DA 3 disk 253:0 A
# lvconvert --config devices{ignore_suspended_devices=1} --repair vg00/lv00
  The mirror is consistent, nothing to repair.
# vgreduce --removemissing vg00
  Volume group "vg00" is already consistent

# /usr/sbin/lvm version
  LVM version:     2.02.57(1)-cvs (2009-11-24)
  Library version: 1.02.41-cvs (2009-11-24)
  Driver version:  4.15.0
# uname -mr
2.6.31.6 i686


* Case 2: Sync error

dmeventd doesn't handle the sync error ('S' shown by status) which happens
during recovery. When write I/Os are issued to an out-of-sync region, they
are blocked, but dmeventd won't handle the sync error and release the
blocked I/Os.

An error on the primary leg during recovery cannot be helped and we have to
accept that the system stops because no valid leg is left. An error on the
secondary leg, however, can be handled like a regular write error. We can fix
this issue by changing the error flag from DM_RAID1_SYNC_ERROR to
DM_RAID1_WRITE_ERROR so that the error can be handled by dmeventd.
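
As a rough sketch (a fragment only; it assumes the resync write failure on a
non-default leg is the one reported via fail_mirror() from recovery_complete()
in drivers/md/dm-raid1.c), the change would look something like:

	/*
	 * In recovery_complete(): a write error during resync hit a
	 * non-default (secondary) leg.  Reporting it as a plain write
	 * error instead of a sync error lets the existing dmeventd error
	 * handling repair the mirror and release the held bios.
	 */
	if (test_bit(bit, &write_err))
		fail_mirror(ms->mirror + m, DM_RAID1_WRITE_ERROR);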

Another idea is to change dmeventd so that it can handle the sync error ('S')
on the secondary leg. That would also be a small and easy patch to make.

Thanks,
Taka

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread

Thread overview: 29+ messages
2009-11-18 12:09 [PATCH 0/7] patches: fix dm-raid1 race, bug 502927 Mikulas Patocka
2009-11-18 12:10 ` [PATCH 1/7] Explicitly initialize bio lists Mikulas Patocka
2009-11-18 12:11   ` [PATCH 2/7] A framework for holding bios until suspend Mikulas Patocka
2009-11-18 12:11     ` [PATCH 3/7] Use the hold framework in do_failures Mikulas Patocka
2009-11-18 12:12       ` [PATCH 4/7] Don't optimize for failure case Mikulas Patocka
2009-11-18 12:13         ` [PATCH 5/7] Move a logic to get a valid mirror leg to a function Mikulas Patocka
2009-11-18 12:18           ` [PATCH 6/7] Move bio completion from dm_rh_mark_nosync to its caller Mikulas Patocka
2009-11-18 12:19             ` [PATCH 7/7] Hold all write bios when errors are handled Mikulas Patocka
2009-11-23  5:58               ` malahal
2009-11-23 17:54                 ` Takahiro Yasui
2009-11-24 11:51                 ` Mikulas Patocka
2009-11-24 19:17                   ` malahal
2009-11-25 13:19                     ` Mikulas Patocka
2009-11-25 15:43                       ` Takahiro Yasui
2009-11-25 20:44                         ` malahal
2009-11-25 22:50                           ` Takahiro Yasui
2009-11-26 17:56                           ` Mikulas Patocka
2009-11-26 17:54                         ` [PATCH 8/7] Hold all write bios in nosync region Mikulas Patocka
2009-11-25 20:23                       ` [PATCH 7/7] Hold all write bios when errors are handled malahal
2009-11-25 22:47                         ` Takahiro Yasui
2009-11-25 23:20                           ` malahal
2009-11-25 23:50                             ` Takahiro Yasui
2009-11-26  0:30                               ` malahal
2009-11-26 17:58                         ` Mikulas Patocka
2009-11-26 22:22                           ` malahal
2009-11-28 18:02     ` [PATCH 2/7] A framework for holding bios until suspend Takahiro Yasui
2009-11-30  2:55       ` malahal
2009-11-30  9:41       ` Alasdair G Kergon
2009-11-30 16:46 ` [PATCH 0/7] patches: fix dm-raid1 race, bug 502927 Takahiro Yasui
