All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 2/3] Fix possible lost of temporary devices in pvmove.
@ 2009-09-17 15:25 Milan Broz
  2009-09-28 19:44 ` Alasdair G Kergon
  0 siblings, 1 reply; 4+ messages in thread
From: Milan Broz @ 2009-09-17 15:25 UTC (permalink / raw)
  To: lvm-devel

Fix possible lost of temporary devices in pvmove.

If some process locks temporary pvmove device (udev scan etc),
the code do not detect that device remains in kernel.
Because device is removed from lvm metadata, subsequent
pvmove will crash leaving suspended devices.

Fix consists from these steps:
 - with previous patch we can detect that lv_deactivate failed.
 
 - instead of creating empty pvmove layer in remove_layers_for_segments_all
 replace this device with error target.
 (which btw means that number of segments for active kernel device is still
 the same as size in metadata - now we have activated mirror and zero size
 LV in metadata)
 
 - if pvmove fails, try to sleep some time and retry. Usually device is locked
 by scan from wrong udev rule and remapping to error target causes all scanning
 programs to quickly fail.
 (This is real hack, but it was proved in cryptsetup in similar situation
 that it helps.)

Signed-off-by: Milan Broz <mbroz@redhat.com>
---
 lib/metadata/lv_manip.c |    2 +-
 tools/pvmove.c          |   27 ++++++++++++++++++++++-----
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/lib/metadata/lv_manip.c b/lib/metadata/lv_manip.c
index 70d9da2..33be952 100644
--- a/lib/metadata/lv_manip.c
+++ b/lib/metadata/lv_manip.c
@@ -2351,7 +2351,7 @@ int remove_layers_for_segments_all(struct cmd_context *cmd,
 			return_0;
 	}
 
-	if (!lv_empty(layer_lv))
+	if (!replace_lv_with_error_segment(layer_lv))
 		return_0;
 
 	return 1;
diff --git a/tools/pvmove.c b/tools/pvmove.c
index 2490863..364b084 100644
--- a/tools/pvmove.c
+++ b/tools/pvmove.c
@@ -456,6 +456,26 @@ out:
 	return r;
 }
 
+#define TEMP_DEVICE_REMOVAL_RETRIES 3
+static int retry_deactivate_lv(struct cmd_context *cmd,
+				struct logical_volume *lv_mirr)
+{
+	int retries = 0;
+
+	while (!deactivate_lv(cmd, lv_mirr)) {
+		log_debug("Temporary pvmove volume %s locked, retrying...",
+			  lv_mirr->name);
+		sleep(1);
+		if (++retries > TEMP_DEVICE_REMOVAL_RETRIES) {
+			log_error("ABORTING: Unable to deactivate temporary logical "
+			"volume %s", lv_mirr->name);
+			return 0;
+		}
+	}
+
+	return 1;
+}
+
 static int _finish_pvmove(struct cmd_context *cmd, struct volume_group *vg,
 			  struct logical_volume *lv_mirr,
 			  struct dm_list *lvs_changed)
@@ -517,11 +537,8 @@ static int _finish_pvmove(struct cmd_context *cmd, struct volume_group *vg,
 	resume_lvs(cmd, lvs_changed);
 
 	/* Deactivate mirror LV */
-	if (!deactivate_lv(cmd, lv_mirr)) {
-		log_error("ABORTING: Unable to deactivate temporary logical "
-			  "volume \"%s\"", lv_mirr->name);
-		r = 0;
-	}
+	if (!retry_deactivate_lv(cmd, lv_mirr))
+		return 0;
 
 	log_verbose("Removing temporary pvmove LV");
 	if (!lv_remove(lv_mirr)) {




^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/3] Fix possible lost of temporary devices in pvmove.
  2009-09-17 15:25 [PATCH 2/3] Fix possible lost of temporary devices in pvmove Milan Broz
@ 2009-09-28 19:44 ` Alasdair G Kergon
  2009-09-29  6:13   ` Milan Broz
  0 siblings, 1 reply; 4+ messages in thread
From: Alasdair G Kergon @ 2009-09-28 19:44 UTC (permalink / raw)
  To: lvm-devel

On Thu, Sep 17, 2009 at 05:25:43PM +0200, Milan Broz wrote:
>  - if pvmove fails, try to sleep some time and retry. Usually device is locked
>  by scan from wrong udev rule and remapping to error target causes all scanning
>  programs to quickly fail.

I'd rather see the real problem addressed: people fixing misbehaving udev
rules.  There should be no need for workarounds like this in lvm2, which would
only perpetuate the problem.  Presumably the problem will disappear anyway
when the udev integration work is activated.

Alasdair



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/3] Fix possible lost of temporary devices in pvmove.
  2009-09-28 19:44 ` Alasdair G Kergon
@ 2009-09-29  6:13   ` Milan Broz
  2009-09-29 11:08     ` Alasdair G Kergon
  0 siblings, 1 reply; 4+ messages in thread
From: Milan Broz @ 2009-09-29  6:13 UTC (permalink / raw)
  To: lvm-devel

Alasdair G Kergon wrote:
> On Thu, Sep 17, 2009 at 05:25:43PM +0200, Milan Broz wrote:
>>  - if pvmove fails, try to sleep some time and retry. Usually device is locked
>>  by scan from wrong udev rule and remapping to error target causes all scanning
>>  programs to quickly fail.
> 
> I'd rather see the real problem addressed: people fixing misbehaving udev
> rules.  There should be no need for workarounds like this in lvm2, which would
> only perpetuate the problem.  Presumably the problem will disappear anyway
> when the udev integration work is activated.

Of course the real problem must be addressed. But this is not the first time someone
had idea to scan all devices in system. Now udev calling dmsetup from rules,
DevKit doing the same using own programs, various GUI disk utilities etc.
Maybe even some monitoring systems do that...

If there is no such workaround, pvmove is no longer reliable in such enviroment.
The problem exists in most of recent distros.
And it leads to data corruption, see  https://bugzilla.redhat.com/show_bug.cgi?id=520913

I would prefer adding simple and documented workaround then lose users' data.

Milan



^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/3] Fix possible lost of temporary devices in pvmove.
  2009-09-29  6:13   ` Milan Broz
@ 2009-09-29 11:08     ` Alasdair G Kergon
  0 siblings, 0 replies; 4+ messages in thread
From: Alasdair G Kergon @ 2009-09-29 11:08 UTC (permalink / raw)
  To: lvm-devel

On Tue, Sep 29, 2009 at 08:13:26AM +0200, Milan Broz wrote:
> And it leads to data corruption, see  https://bugzilla.redhat.com/show_bug.cgi?id=520913
 
Any data corruption is not a consequence of bogus udev rules.
It's a consequence of incorrect error handling in the lvm2 code.

Alasdair



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-09-29 11:08 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-17 15:25 [PATCH 2/3] Fix possible lost of temporary devices in pvmove Milan Broz
2009-09-28 19:44 ` Alasdair G Kergon
2009-09-29  6:13   ` Milan Broz
2009-09-29 11:08     ` Alasdair G Kergon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.