All of lore.kernel.org
 help / color / mirror / Atom feed
* [md PATCH 2/5]  md: Enable reshape for external metadata
@ 2010-06-09 14:21 Kwolek, Adam
  2010-06-16  4:53 ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Kwolek, Adam @ 2010-06-09 14:21 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

(md: Online Capacity Expansion for IMSM)
Reshape can't go forward for external metadatas due to fact, that internal md flags are updated during native meta writing. For external metadatas
md_update_sb() is not called (reshape process is blocked).
To take carry about md flags in external metadata array case and allow reshape to roll over, md_update_sb() is called in similar way to native metadata. The difference is that metadata is not stored to disks by md, but externally by mdmon.
---

 drivers/md/md.c |   24 +++++++++++++++++-------
 1 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c index 9cec771..0a51406 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2065,8 +2065,6 @@ static void md_update_sb(mddev_t * mddev, int force_change)
 	int nospares = 0;
 
 	mddev->utime = get_seconds();
-	if (mddev->external)
-		return;
 repeat:
 	spin_lock_irq(&mddev->write_lock);
 
@@ -2130,11 +2128,16 @@ repeat:
 	/*
 	 * do not write anything to disk if using
 	 * nonpersistent superblocks
+	 * and
+	 * for external meta and reshape
+	 * we can get here also
 	 */
-	if (!mddev->persistent) {
-		if (!mddev->external)
-			clear_bit(MD_CHANGE_PENDING, &mddev->flags);
-
+	if (!mddev->persistent ||
+	    mddev->external) {
+		/* Reshape for external meta
+		 * MD_CHANGE_PENDING has to be cleared also
+		 */
+		clear_bit(MD_CHANGE_PENDING, &mddev->flags);
 		spin_unlock_irq(&mddev->write_lock);
 		wake_up(&mddev->sb_wait);
 		return;
@@ -6895,8 +6898,15 @@ void md_check_recovery(mddev_t *mddev)
 		(mddev->external == 0 && mddev->safemode == 1) ||
 		(mddev->safemode == 2 && ! atomic_read(&mddev->writes_pending)
 		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
-		))
+		)) {
+
+		/* Reshape for external meta /add disks/
+		 */
+		if ((mddev->external) && (mddev->flags))
+			md_update_sb(mddev, 0);
+
 		return;
+	}
 
 	if (mddev_trylock(mddev)) {
 		int spares = 0;


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [md PATCH 2/5]  md: Enable reshape for external metadata
  2010-06-09 14:21 [md PATCH 2/5] md: Enable reshape for external metadata Kwolek, Adam
@ 2010-06-16  4:53 ` Neil Brown
  2010-06-16 14:52   ` Kwolek, Adam
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Brown @ 2010-06-16  4:53 UTC (permalink / raw)
  To: Kwolek, Adam; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

On Wed, 9 Jun 2010 15:21:05 +0100
"Kwolek, Adam" <adam.kwolek@intel.com> wrote:

> (md: Online Capacity Expansion for IMSM)
> Reshape can't go forward for external metadatas due to fact, that internal md flags are updated during native meta writing. For external metadatas
> md_update_sb() is not called (reshape process is blocked).
> To take carry about md flags in external metadata array case and allow reshape to roll over, md_update_sb() is called in similar way to native metadata. The difference is that metadata is not stored to disks by md, but externally by mdmon.

I agree there is a problem here but I think you are approaching it the wrong
way.  We need to make sure the problem flag doesn't get set when external
metadata is used.

I found something similar (maybe the same thing) when writing the dm-raid456
module.  Does that patch:

http://neil.brown.name/git?p=md;a=commitdiff;h=3b930b37e50702f97f8fad6d7f5ee6d3f268394e

solve the problem for you?

Thanks,
NeilBrown


> ---
> 
>  drivers/md/md.c |   24 +++++++++++++++++-------
>  1 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c index 9cec771..0a51406 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -2065,8 +2065,6 @@ static void md_update_sb(mddev_t * mddev, int force_change)
>  	int nospares = 0;
>  
>  	mddev->utime = get_seconds();
> -	if (mddev->external)
> -		return;
>  repeat:
>  	spin_lock_irq(&mddev->write_lock);
>  
> @@ -2130,11 +2128,16 @@ repeat:
>  	/*
>  	 * do not write anything to disk if using
>  	 * nonpersistent superblocks
> +	 * and
> +	 * for external meta and reshape
> +	 * we can get here also
>  	 */
> -	if (!mddev->persistent) {
> -		if (!mddev->external)
> -			clear_bit(MD_CHANGE_PENDING, &mddev->flags);
> -
> +	if (!mddev->persistent ||
> +	    mddev->external) {
> +		/* Reshape for external meta
> +		 * MD_CHANGE_PENDING has to be cleared also
> +		 */
> +		clear_bit(MD_CHANGE_PENDING, &mddev->flags);
>  		spin_unlock_irq(&mddev->write_lock);
>  		wake_up(&mddev->sb_wait);
>  		return;
> @@ -6895,8 +6898,15 @@ void md_check_recovery(mddev_t *mddev)
>  		(mddev->external == 0 && mddev->safemode == 1) ||
>  		(mddev->safemode == 2 && ! atomic_read(&mddev->writes_pending)
>  		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
> -		))
> +		)) {
> +
> +		/* Reshape for external meta /add disks/
> +		 */
> +		if ((mddev->external) && (mddev->flags))
> +			md_update_sb(mddev, 0);
> +
>  		return;
> +	}
>  
>  	if (mddev_trylock(mddev)) {
>  		int spares = 0;
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [md PATCH 2/5]  md: Enable reshape for external metadata
  2010-06-16  4:53 ` Neil Brown
@ 2010-06-16 14:52   ` Kwolek, Adam
  2010-06-17  6:11     ` Neil Brown
  0 siblings, 1 reply; 8+ messages in thread
From: Kwolek, Adam @ 2010-06-16 14:52 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Wednesday, June 16, 2010 6:54 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: [md PATCH 2/5] md: Enable reshape for external metadata
> 
> On Wed, 9 Jun 2010 15:21:05 +0100
> "Kwolek, Adam" <adam.kwolek@intel.com> wrote:
> 
> > (md: Online Capacity Expansion for IMSM)
> > Reshape can't go forward for external metadatas due to fact, that
> internal md flags are updated during native meta writing. For external
> metadatas
> > md_update_sb() is not called (reshape process is blocked).
> > To take carry about md flags in external metadata array case and
> allow reshape to roll over, md_update_sb() is called in similar way to
> native metadata. The difference is that metadata is not stored to disks
> by md, but externally by mdmon.
> 
> I agree there is a problem here but I think you are approaching it the
> wrong
> way.  We need to make sure the problem flag doesn't get set when
> external
> metadata is used.
> 
> I found something similar (maybe the same thing) when writing the dm-
> raid456
> module.  Does that patch:
> 
> http://neil.brown.name/git?p=md;a=commitdiff;h=3b930b37e50702f97f8fad6d
> 7f5ee6d3f268394e
> 
> solve the problem for you?
> 
> Thanks,
> NeilBrown

It almost solves problem.
To make it solved the following change I have to add:

md/md.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/md/md.c b/md/md.c
index e9b5d67..01e88cd 100644
--- a/md/md.c
+++ b/md/md.c
@@ -6437,7 +6437,8 @@ int md_allow_write(mddev_t *mddev)
 	spin_lock_irq(&mddev->write_lock);
 	if (mddev->in_sync) {
 		mddev->in_sync = 0;
-		set_bit(MD_CHANGE_CLEAN, &mddev->flags);
+		if (mddev->persistent)
+			set_bit(MD_CHANGE_CLEAN, &mddev->flags);
 		if (mddev->safemode_delay &&
 		    mddev->safemode == 0)
 			mddev->safemode = 1;

This closes MD_CHANGE_CLEAN flag management for me.
What do you think about this?

Another thing is waiting during reshape for metadata update on MD_CHANGE_DEVS flag. 
To roll reshape I've added the following code (instead calling md_ubdate_sb()):


diff --git a/md/md.c b/md/md.c
index 01e88cd..539b323 100644
--- a/md/md.c
+++ b/md/md.c
@@ -6896,8 +6896,17 @@ void md_check_recovery(mddev_t *mddev)
 		(mddev->external == 0 && mddev->safemode == 1) ||
 		(mddev->safemode == 2 && ! atomic_read(&mddev->writes_pending)
 		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
-		))
+		)) {
+			if ((mddev->external) && (mddev->flags)) {
+				/* FIXME: 
+				 * for external metadata checkpointing purposes
+				 * put communication with userspace here
+				 */
+				clear_bit(MD_CHANGE_DEVS, &mddev->flags);
+				wake_up(&mddev->sb_wait);
+			}
 			return;
+		}
 
 	if (mddev_trylock(mddev)) {
 		int spares = 0;


This change we can reuse for checkpointing purposes.
External metadata updates can be triggered in the same moments as native one (reuse internal md communication mechanisms).

What is your opinion about this?

BR
Adam



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [md PATCH 2/5]  md: Enable reshape for external metadata
  2010-06-16 14:52   ` Kwolek, Adam
@ 2010-06-17  6:11     ` Neil Brown
  2010-06-17  9:40       ` Trela, Maciej
  2010-06-17 16:49       ` Kwolek, Adam
  0 siblings, 2 replies; 8+ messages in thread
From: Neil Brown @ 2010-06-17  6:11 UTC (permalink / raw)
  To: Kwolek, Adam; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

On Wed, 16 Jun 2010 15:52:06 +0100
"Kwolek, Adam" <adam.kwolek@intel.com> wrote:

> > -----Original Message-----
> > From: Neil Brown [mailto:neilb@suse.de]
> > Sent: Wednesday, June 16, 2010 6:54 AM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> > Subject: Re: [md PATCH 2/5] md: Enable reshape for external metadata
> > 
> > On Wed, 9 Jun 2010 15:21:05 +0100
> > "Kwolek, Adam" <adam.kwolek@intel.com> wrote:
> > 
> > > (md: Online Capacity Expansion for IMSM)
> > > Reshape can't go forward for external metadatas due to fact, that
> > internal md flags are updated during native meta writing. For external
> > metadatas
> > > md_update_sb() is not called (reshape process is blocked).
> > > To take carry about md flags in external metadata array case and
> > allow reshape to roll over, md_update_sb() is called in similar way to
> > native metadata. The difference is that metadata is not stored to disks
> > by md, but externally by mdmon.
> > 
> > I agree there is a problem here but I think you are approaching it the
> > wrong
> > way.  We need to make sure the problem flag doesn't get set when
> > external
> > metadata is used.
> > 
> > I found something similar (maybe the same thing) when writing the dm-
> > raid456
> > module.  Does that patch:
> > 
> > http://neil.brown.name/git?p=md;a=commitdiff;h=3b930b37e50702f97f8fad6d
> > 7f5ee6d3f268394e
> > 
> > solve the problem for you?
> > 
> > Thanks,
> > NeilBrown
> 
> It almost solves problem.
> To make it solved the following change I have to add:
> 
> md/md.c |    5 +++--
>  1 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/md/md.c b/md/md.c
> index e9b5d67..01e88cd 100644
> --- a/md/md.c
> +++ b/md/md.c
> @@ -6437,7 +6437,8 @@ int md_allow_write(mddev_t *mddev)
>  	spin_lock_irq(&mddev->write_lock);
>  	if (mddev->in_sync) {
>  		mddev->in_sync = 0;
> -		set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> +		if (mddev->persistent)
> +			set_bit(MD_CHANGE_CLEAN, &mddev->flags);
>  		if (mddev->safemode_delay &&
>  		    mddev->safemode == 0)
>  			mddev->safemode = 1;
> 
> This closes MD_CHANGE_CLEAN flag management for me.
> What do you think about this?

No, that would be wrong.

If the array is 'clean' and a write happens, the array_state changes to
"write_pending" and the write blocks until mdmon updates the metadata and
then writes "active" to "array_state".

Setting MD_CHANGE_CLEAN is important for this handshake to work properly.


> 
> Another thing is waiting during reshape for metadata update on MD_CHANGE_DEVS flag. 
> To roll reshape I've added the following code (instead calling md_ubdate_sb()):

Yes, there is a real issue there...

I don't think we ever need the kernel to wait for an external metadata handler
to respond to device changes (apart from failure which is handled separately).
So maybe the best thing is to guard all settings of MD_CHANGE_DEVS with
if (mddev->persistent)

I think that would be best, but I've make a note to review that later.

Thanks,
NeilBrown


> 
> 
> diff --git a/md/md.c b/md/md.c
> index 01e88cd..539b323 100644
> --- a/md/md.c
> +++ b/md/md.c
> @@ -6896,8 +6896,17 @@ void md_check_recovery(mddev_t *mddev)
>  		(mddev->external == 0 && mddev->safemode == 1) ||
>  		(mddev->safemode == 2 && ! atomic_read(&mddev->writes_pending)
>  		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
> -		))
> +		)) {
> +			if ((mddev->external) && (mddev->flags)) {
> +				/* FIXME: 
> +				 * for external metadata checkpointing purposes
> +				 * put communication with userspace here
> +				 */
> +				clear_bit(MD_CHANGE_DEVS, &mddev->flags);
> +				wake_up(&mddev->sb_wait);
> +			}
>  			return;
> +		}
>  
>  	if (mddev_trylock(mddev)) {
>  		int spares = 0;
> 
> 
> This change we can reuse for checkpointing purposes.
> External metadata updates can be triggered in the same moments as native one (reuse internal md communication mechanisms).
> 
> What is your opinion about this?
> 
> BR
> Adam
> 


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [md PATCH 2/5]  md: Enable reshape for external metadata
  2010-06-17  6:11     ` Neil Brown
@ 2010-06-17  9:40       ` Trela, Maciej
  2010-06-17 10:35         ` Neil Brown
  2010-06-17 16:49       ` Kwolek, Adam
  1 sibling, 1 reply; 8+ messages in thread
From: Trela, Maciej @ 2010-06-17  9:40 UTC (permalink / raw)
  To: Neil Brown, Kwolek, Adam; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed

> 
> >
> > Another thing is waiting during reshape for metadata update on
> MD_CHANGE_DEVS flag.
> > To roll reshape I've added the following code (instead calling
> md_ubdate_sb()):
> 
> Yes, there is a real issue there...
> 
> I don't think we ever need the kernel to wait for an external metadata
> handler
> to respond to device changes (apart from failure which is handled
> separately).
> So maybe the best thing is to guard all settings of MD_CHANGE_DEVS with
> if (mddev->persistent)
> 
> I think that would be best, but I've make a note to review that later.
> 

Neil,
from what I see in the raid5.c/md.c "native" code uses MD_CHANGE_DEVS
during the reshape if it reaches special points when metadata
write is really needed to update the reshape checkpoint.
In reshape_request():
	/* Cannot proceed until we've updated the superblock */
	..
	set_bit(MD_CHANGE_DEVS, mddev->flags)

In md_check_recovery() we have:
	if (mddev->flags) 
		md_update_sb()

Couldn't we follow this logic with MD_CHANGE_DEVS for external metadata?
If not, how to detect the need for migration checkpoint update?

Thanks,
Maciek


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [md PATCH 2/5]  md: Enable reshape for external metadata
  2010-06-17  9:40       ` Trela, Maciej
@ 2010-06-17 10:35         ` Neil Brown
  2010-06-17 13:19           ` Trela, Maciej
  0 siblings, 1 reply; 8+ messages in thread
From: Neil Brown @ 2010-06-17 10:35 UTC (permalink / raw)
  To: Trela, Maciej; +Cc: Kwolek, Adam, linux-raid, Williams, Dan J, Ciechanowski, Ed

On Thu, 17 Jun 2010 10:40:36 +0100
"Trela, Maciej" <Maciej.Trela@intel.com> wrote:

> > 
> > >
> > > Another thing is waiting during reshape for metadata update on
> > MD_CHANGE_DEVS flag.
> > > To roll reshape I've added the following code (instead calling
> > md_ubdate_sb()):
> > 
> > Yes, there is a real issue there...
> > 
> > I don't think we ever need the kernel to wait for an external metadata
> > handler
> > to respond to device changes (apart from failure which is handled
> > separately).
> > So maybe the best thing is to guard all settings of MD_CHANGE_DEVS with
> > if (mddev->persistent)
> > 
> > I think that would be best, but I've make a note to review that later.
> > 
> 
> Neil,
> from what I see in the raid5.c/md.c "native" code uses MD_CHANGE_DEVS
> during the reshape if it reaches special points when metadata
> write is really needed to update the reshape checkpoint.
> In reshape_request():
> 	/* Cannot proceed until we've updated the superblock */
> 	..
> 	set_bit(MD_CHANGE_DEVS, mddev->flags)
> 
> In md_check_recovery() we have:
> 	if (mddev->flags) 
> 		md_update_sb()
> 
> Couldn't we follow this logic with MD_CHANGE_DEVS for external metadata?
> If not, how to detect the need for migration checkpoint update?

Good question.
The first question to ask is
  How does mdmon know when a metadata update is required, and how does
  it tell md that the metadata update is complete.

OK, 2 first questions...

For the first I suspect it should watch 'md/reshape_position' (which need to
use sysfs_notify for).
For the second .... I don't know.
- Maybe sync_action could change to 'paused' and mdmon writes 'continue'....
  but that is possibly overloading that file too much.
- We could have a new sysfs file which just shows paused/active ??
- We could require that mdmon sets 'sync_max' appropriately so that reshape
  will stop at the right place, and then when mdmon has updated the metadata,
  it sets a new sync_max value.
- As above, but if sync_max is set too high, it is automatically reduced
  to the place when raid5 finds that it has to stop

I think the last one is probably best.
Before updating ->reshape_position, raid5 checks ->resync_max and if it is
too high for safety it set is lower to a safer value.
Then it changes ->reshape_position and calls sysfs_notify.

mdmon watches for 'reshape_postion' to change.  when it does it updates the
metadata and then writes a larger value to ->resync_max.

Things can get a little confusing when reshaping to fewer devices as
reshape_position decreases, but sync_completed always increases and sync_max
is still an 'upper' limit.

But it should work OK.

Does that seem reasonable?

NeilBrown

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [md PATCH 2/5]  md: Enable reshape for external metadata
  2010-06-17 10:35         ` Neil Brown
@ 2010-06-17 13:19           ` Trela, Maciej
  0 siblings, 0 replies; 8+ messages in thread
From: Trela, Maciej @ 2010-06-17 13:19 UTC (permalink / raw)
  To: Neil Brown; +Cc: Kwolek, Adam, linux-raid, Williams, Dan J, Ciechanowski, Ed



> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Thursday, June 17, 2010 12:35 PM
> To: Trela, Maciej
> Cc: Kwolek, Adam; linux-raid@vger.kernel.org; Williams, Dan J;
> Ciechanowski, Ed
> Subject: Re: [md PATCH 2/5] md: Enable reshape for external metadata
> 
> On Thu, 17 Jun 2010 10:40:36 +0100
> "Trela, Maciej" <Maciej.Trela@intel.com> wrote:
> 
> > >
> > > >
> > > > Another thing is waiting during reshape for metadata update on
> > > MD_CHANGE_DEVS flag.
> > > > To roll reshape I've added the following code (instead calling
> > > md_ubdate_sb()):
> > >
> > > Yes, there is a real issue there...
> > >
> > > I don't think we ever need the kernel to wait for an external
> metadata
> > > handler
> > > to respond to device changes (apart from failure which is handled
> > > separately).
> > > So maybe the best thing is to guard all settings of MD_CHANGE_DEVS
> with
> > > if (mddev->persistent)
> > >
> > > I think that would be best, but I've make a note to review that
> later.
> > >
> >
> > Neil,
> > from what I see in the raid5.c/md.c "native" code uses MD_CHANGE_DEVS
> > during the reshape if it reaches special points when metadata
> > write is really needed to update the reshape checkpoint.
> > In reshape_request():
> > 	/* Cannot proceed until we've updated the superblock */
> > 	..
> > 	set_bit(MD_CHANGE_DEVS, mddev->flags)
> >
> > In md_check_recovery() we have:
> > 	if (mddev->flags)
> > 		md_update_sb()
> >
> > Couldn't we follow this logic with MD_CHANGE_DEVS for external
> metadata?
> > If not, how to detect the need for migration checkpoint update?
> 
> Good question.
> The first question to ask is
>   How does mdmon know when a metadata update is required, and how does
>   it tell md that the metadata update is complete.
> 
> OK, 2 first questions...
> 
> For the first I suspect it should watch 'md/reshape_position' (which
> need to
> use sysfs_notify for).

Yes, I agree here.

> For the second .... I don't know.
> - Maybe sync_action could change to 'paused' and mdmon writes
> 'continue'....
>   but that is possibly overloading that file too much.
> - We could have a new sysfs file which just shows paused/active ??
> - We could require that mdmon sets 'sync_max' appropriately so that
> reshape
>   will stop at the right place, and then when mdmon has updated the
> metadata,
>   it sets a new sync_max value.
> - As above, but if sync_max is set too high, it is automatically
> reduced
>   to the place when raid5 finds that it has to stop
> 
> I think the last one is probably best.
> Before updating ->reshape_position, raid5 checks ->resync_max and if it
> is
> too high for safety it set is lower to a safer value.
> Then it changes ->reshape_position and calls sysfs_notify.
> 
> mdmon watches for 'reshape_postion' to change.  when it does it updates
> the
> metadata and then writes a larger value to ->resync_max.
> 
> Things can get a little confusing when reshaping to fewer devices as
> reshape_position decreases, but sync_completed always increases and
> sync_max
> is still an 'upper' limit.
> 
> But it should work OK.
> 
> Does that seem reasonable?
> 

Yes, it looks ok to me.
However, I have one doubt:
Let's assume we have child_grow() reshape type:

1. child_grow() performs backup for N stripes and sets 
	resync_max = N * stripe_size

2. md start reshaping

3. For one of the first stripes = M (where M < N) md decides to
   update metadata and notifies mdmon with the new reshape_position = M
   However, before that, it changes resync_max = M so md_do_sync main
   loop will wait for a new resync_max value.

4. mdmon wakes on reshape_position, updates meta and it should
   set the new value for resync_max to confirm the checkpoint.
   I think that resync_max should be set to N at this moment but mdmon
   is not able to do that... 
   it does not know the original resync_max value (set in Grow.c)

what do you think?

Maybe using other sysfs file would be better?
I would vote for "resync_start" = "continue" or something like that.. :)


Maciek.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [md PATCH 2/5]  md: Enable reshape for external metadata
  2010-06-17  6:11     ` Neil Brown
  2010-06-17  9:40       ` Trela, Maciej
@ 2010-06-17 16:49       ` Kwolek, Adam
  1 sibling, 0 replies; 8+ messages in thread
From: Kwolek, Adam @ 2010-06-17 16:49 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Williams, Dan J, Ciechanowski, Ed



> -----Original Message-----
> From: Neil Brown [mailto:neilb@suse.de]
> Sent: Thursday, June 17, 2010 8:12 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> Subject: Re: [md PATCH 2/5] md: Enable reshape for external metadata
> 
> On Wed, 16 Jun 2010 15:52:06 +0100
> "Kwolek, Adam" <adam.kwolek@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Neil Brown [mailto:neilb@suse.de]
> > > Sent: Wednesday, June 16, 2010 6:54 AM
> > > To: Kwolek, Adam
> > > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed
> > > Subject: Re: [md PATCH 2/5] md: Enable reshape for external
> metadata
> > >
> > > On Wed, 9 Jun 2010 15:21:05 +0100
> > > "Kwolek, Adam" <adam.kwolek@intel.com> wrote:
> > >
> > > > (md: Online Capacity Expansion for IMSM)
> > > > Reshape can't go forward for external metadatas due to fact, that
> > > internal md flags are updated during native meta writing. For
> external
> > > metadatas
> > > > md_update_sb() is not called (reshape process is blocked).
> > > > To take carry about md flags in external metadata array case and
> > > allow reshape to roll over, md_update_sb() is called in similar way
> to
> > > native metadata. The difference is that metadata is not stored to
> disks
> > > by md, but externally by mdmon.
> > >
> > > I agree there is a problem here but I think you are approaching it
> the
> > > wrong
> > > way.  We need to make sure the problem flag doesn't get set when
> > > external
> > > metadata is used.
> > >
> > > I found something similar (maybe the same thing) when writing the
> dm-
> > > raid456
> > > module.  Does that patch:
> > >
> > >
> http://neil.brown.name/git?p=md;a=commitdiff;h=3b930b37e50702f97f8fad6d
> > > 7f5ee6d3f268394e
> > >
> > > solve the problem for you?
> > >
> > > Thanks,
> > > NeilBrown
> >
> > It almost solves problem.
> > To make it solved the following change I have to add:
> >
> > md/md.c |    5 +++--
> >  1 files changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/md/md.c b/md/md.c
> > index e9b5d67..01e88cd 100644
> > --- a/md/md.c
> > +++ b/md/md.c
> > @@ -6437,7 +6437,8 @@ int md_allow_write(mddev_t *mddev)
> >  	spin_lock_irq(&mddev->write_lock);
> >  	if (mddev->in_sync) {
> >  		mddev->in_sync = 0;
> > -		set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> > +		if (mddev->persistent)
> > +			set_bit(MD_CHANGE_CLEAN, &mddev->flags);
> >  		if (mddev->safemode_delay &&
> >  		    mddev->safemode == 0)
> >  			mddev->safemode = 1;
> >
> > This closes MD_CHANGE_CLEAN flag management for me.
> > What do you think about this?
> 
> 
> >
> > Another thing is waiting during reshape for metadata update on
> MD_CHANGE_DEVS flag.
> > To roll reshape I've added the following code (instead calling
> md_ubdate_sb()):
> 
> Yes, there is a real issue there...
> 
> I don't think we ever need the kernel to wait for an external metadata
> handler
> to respond to device changes (apart from failure which is handled
> separately).
> So maybe the best thing is to guard all settings of MD_CHANGE_DEVS with
> if (mddev->persistent)
> 
> I think that would be best, but I've make a note to review that later.
> 
> Thanks,
> NeilBrown
> 
> 
> >
> >
> > diff --git a/md/md.c b/md/md.c
> > index 01e88cd..539b323 100644
> > --- a/md/md.c
> > +++ b/md/md.c
> > @@ -6896,8 +6896,17 @@ void md_check_recovery(mddev_t *mddev)
> >  		(mddev->external == 0 && mddev->safemode == 1) ||
> >  		(mddev->safemode == 2 && ! atomic_read(&mddev-
> >writes_pending)
> >  		 && !mddev->in_sync && mddev->recovery_cp == MaxSector)
> > -		))
> > +		)) {
> > +			if ((mddev->external) && (mddev->flags)) {
> > +				/* FIXME:
> > +				 * for external metadata checkpointing purposes
> > +				 * put communication with userspace here
> > +				 */
> > +				clear_bit(MD_CHANGE_DEVS, &mddev->flags);
> > +				wake_up(&mddev->sb_wait);
> > +			}
> >  			return;
> > +		}
> >
> >  	if (mddev_trylock(mddev)) {
> >  		int spares = 0;
> >
> >
> > This change we can reuse for checkpointing purposes.
> > External metadata updates can be triggered in the same moments as
> native one (reuse internal md communication mechanisms).
> >
> > What is your opinion about this?
> >
> > BR
> > Adam
> >


> No, that would be wrong.
> 
> If the array is 'clean' and a write happens, the array_state changes to
> "write_pending" and the write blocks until mdmon updates the metadata
> and
> then writes "active" to "array_state".
> 
> Setting MD_CHANGE_CLEAN is important for this handshake to work
> properly.


The problem is during setting new raid_disks value for external meta when md doesn't update metadata.
When in resize_stripe() md_allow_write() is called it returns an error (due to this handshake mechanism), so user space
has no opportunity to answer in such case (before function call result is checked in md) even for notification that is present there. 
In such case, I can clear mddev->in_sync, before md_allow_write(). This leaves md_allow_write()
unattached (for use in other places). I think the best place for this is chechk_reshape(). 
This allows to set new raid_disks value.

diff --git a/md/raid5.c b/md/raid5.c
index cbd5c14..f5c42c9 100644
--- a/md/raid5.c
+++ b/md/raid5.c
@@ -5404,6 +5403,10 @@ static int check_reshape(mddev_t *mddev)
 	if (!check_stripe_cache(mddev))
 		return -ENOSPC;
 
+	if (mddev->external && mddev->delta_disks > 0)
+		mddev->in_sync = 0;
+
+
 	return resize_stripes(conf, conf->raid_disks + mddev->delta_disks);
 }


In fact in_sync == 0 at this moment shows real situation, not just trick.

This is my proposition as I see no possibility to interact with user space during this call 
to manage MD_CHANGE_CLEAN flag.

Can you have a look at this problem?

Thanks 
Adam



^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-06-17 16:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-09 14:21 [md PATCH 2/5] md: Enable reshape for external metadata Kwolek, Adam
2010-06-16  4:53 ` Neil Brown
2010-06-16 14:52   ` Kwolek, Adam
2010-06-17  6:11     ` Neil Brown
2010-06-17  9:40       ` Trela, Maciej
2010-06-17 10:35         ` Neil Brown
2010-06-17 13:19           ` Trela, Maciej
2010-06-17 16:49       ` Kwolek, Adam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.