* [PATCH] md: Prevent IO hold during accessing to failed raid5 array
@ 2016-07-15 13:24 Alexey Obitotskiy
  2016-07-19 22:46 ` Shaohua Li
  0 siblings, 1 reply; 5+ messages in thread
From: Alexey Obitotskiy @ 2016-07-15 13:24 UTC (permalink / raw)
  To: shli; +Cc: linux-raid

After the array enters a failed state (e.g. the number of failed drives
exceeds what the raid5 level can tolerate), error flags are set
(one of these flags is MD_CHANGE_PENDING). This flag prevents any
new or unfinished IO to the array from completing and holds it in
a pending state. In some cases this can lead to a deadlock.

For example, udev handles the array state change (a drive becomes
faulty) and blkid is started, but blkid cannot finish its reads
because of the IO hold. At the same time we cannot get exclusive
access to the array (to stop it, in our case) because another
external application (blkid, in our case) is still using it.

The fix makes it possible to return the IO with errors immediately,
so the external application can finish working with the array and
release it for exclusive access by other applications.

Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
---
 drivers/md/raid5.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 6c1149d..99471b6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4692,7 +4692,9 @@ finish:
 	}
 
 	if (!bio_list_empty(&s.return_bi)) {
-		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags)) {
+		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags) &&
+				(s.failed <= conf->max_degraded ||
+					conf->mddev->external == 0)) {
 			spin_lock_irq(&conf->device_lock);
 			bio_list_merge(&conf->return_bi, &s.return_bi);
 			spin_unlock_irq(&conf->device_lock);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] md: Prevent IO hold during accessing to failed raid5 array
  2016-07-15 13:24 [PATCH] md: Prevent IO hold during accessing to failed raid5 array Alexey Obitotskiy
@ 2016-07-19 22:46 ` Shaohua Li
  2016-07-20  6:25   ` Obitotskiy, Aleksey
  2016-07-29  9:07   ` Obitotskiy, Aleksey
  0 siblings, 2 replies; 5+ messages in thread
From: Shaohua Li @ 2016-07-19 22:46 UTC (permalink / raw)
  To: Alexey Obitotskiy; +Cc: linux-raid

On Fri, Jul 15, 2016 at 03:24:27PM +0200, Alexey Obitotskiy wrote:
> After the array enters a failed state (e.g. the number of failed drives
> exceeds what the raid5 level can tolerate), error flags are set
> (one of these flags is MD_CHANGE_PENDING). This flag prevents any
> new or unfinished IO to the array from completing and holds it in
> a pending state. In some cases this can lead to a deadlock.
> 
> For example, udev handles the array state change (a drive becomes
> faulty) and blkid is started, but blkid cannot finish its reads
> because of the IO hold. At the same time we cannot get exclusive
> access to the array (to stop it, in our case) because another
> external application (blkid, in our case) is still using it.
> 
> The fix makes it possible to return the IO with errors immediately,
> so the external application can finish working with the array and
> release it for exclusive access by other applications.
> 
> Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
> ---
>  drivers/md/raid5.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 6c1149d..99471b6 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4692,7 +4692,9 @@ finish:
>  	}
>  
>  	if (!bio_list_empty(&s.return_bi)) {
> -		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags)) {
> +		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags) &&
> +				(s.failed <= conf->max_degraded ||
> +					conf->mddev->external == 0)) {
>  			spin_lock_irq(&conf->device_lock);
>  			bio_list_merge(&conf->return_bi, &s.return_bi);
>  			spin_unlock_irq(&conf->device_lock);
> -- 
> 2.7.4

Hi Alexey,

I'm not clear about the race. When we set MD_CHANGE_PENDING, we
schedule a superblock write, which will eventually finish (either
successfully or with a timeout). Why would the IO be held forever?

Thanks,
Shaohua


* RE: [PATCH] md: Prevent IO hold during accessing to failed raid5 array
  2016-07-19 22:46 ` Shaohua Li
@ 2016-07-20  6:25   ` Obitotskiy, Aleksey
  2016-07-29  9:07   ` Obitotskiy, Aleksey
  1 sibling, 0 replies; 5+ messages in thread
From: Obitotskiy, Aleksey @ 2016-07-20  6:25 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid

Hello,

md_update_sb does not clear the MD_CHANGE_PENDING flag for imsm arrays (i.e. external == 1),
and while MD_CHANGE_PENDING is set, all remaining and new IO is not finished
but stays on the return_bi list.

Regards,
Aleksey

-----Original Message-----
From: Shaohua Li [mailto:shli@kernel.org] 
Sent: Wednesday, 20 July, 2016 00:46
To: Obitotskiy, Aleksey <aleksey.obitotskiy@intel.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [PATCH] md: Prevent IO hold during accessing to failed raid5 array

On Fri, Jul 15, 2016 at 03:24:27PM +0200, Alexey Obitotskiy wrote:
> After the array enters a failed state (e.g. the number of failed drives
> exceeds what the raid5 level can tolerate), error flags are set
> (one of these flags is MD_CHANGE_PENDING). This flag prevents any
> new or unfinished IO to the array from completing and holds it in
> a pending state. In some cases this can lead to a deadlock.
> 
> For example, udev handles the array state change (a drive becomes
> faulty) and blkid is started, but blkid cannot finish its reads
> because of the IO hold. At the same time we cannot get exclusive
> access to the array (to stop it, in our case) because another
> external application (blkid, in our case) is still using it.
> 
> The fix makes it possible to return the IO with errors immediately,
> so the external application can finish working with the array and
> release it for exclusive access by other applications.
> 
> Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
> ---
>  drivers/md/raid5.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 6c1149d..99471b6 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4692,7 +4692,9 @@ finish:
>  	}
>  
>  	if (!bio_list_empty(&s.return_bi)) {
> -		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags)) {
> +		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags) &&
> +				(s.failed <= conf->max_degraded ||
> +					conf->mddev->external == 0)) {
>  			spin_lock_irq(&conf->device_lock);
>  			bio_list_merge(&conf->return_bi, &s.return_bi);
>  			spin_unlock_irq(&conf->device_lock);
> --
> 2.7.4

Hi Alexey,

I'm not clear about the race. When we set MD_CHANGE_PENDING, we schedule a superblock write, which will eventually finish (either successfully or with a timeout). Why would the IO be held forever?

Thanks,
Shaohua


* Re: [PATCH] md: Prevent IO hold during accessing to failed raid5 array
  2016-07-19 22:46 ` Shaohua Li
  2016-07-20  6:25   ` Obitotskiy, Aleksey
@ 2016-07-29  9:07   ` Obitotskiy, Aleksey
  2016-07-30 21:01     ` Shaohua Li
  1 sibling, 1 reply; 5+ messages in thread
From: Obitotskiy, Aleksey @ 2016-07-29  9:07 UTC (permalink / raw)
  To: shli; +Cc: linux-raid

Hello,

I would like to know the status of this patch.
Should I provide more information about it?

Regards,
Aleksey

On Tue, 2016-07-19 at 15:46 -0700, Shaohua Li wrote:
> On Fri, Jul 15, 2016 at 03:24:27PM +0200, Alexey Obitotskiy wrote:
> > 
> > After the array enters a failed state (e.g. the number of failed drives
> > exceeds what the raid5 level can tolerate), error flags are set
> > (one of these flags is MD_CHANGE_PENDING). This flag prevents any
> > new or unfinished IO to the array from completing and holds it in
> > a pending state. In some cases this can lead to a deadlock.
> > 
> > For example, udev handles the array state change (a drive becomes
> > faulty) and blkid is started, but blkid cannot finish its reads
> > because of the IO hold. At the same time we cannot get exclusive
> > access to the array (to stop it, in our case) because another
> > external application (blkid, in our case) is still using it.
> > 
> > The fix makes it possible to return the IO with errors immediately,
> > so the external application can finish working with the array and
> > release it for exclusive access by other applications.
> > 
> > Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
> > ---
> >  drivers/md/raid5.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > index 6c1149d..99471b6 100644
> > --- a/drivers/md/raid5.c
> > +++ b/drivers/md/raid5.c
> > @@ -4692,7 +4692,9 @@ finish:
> >  	}
> >  
> >  	if (!bio_list_empty(&s.return_bi)) {
> > -		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags)) {
> > +		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags) &&
> > +				(s.failed <= conf->max_degraded ||
> > +					conf->mddev->external == 0)) {
> >  			spin_lock_irq(&conf->device_lock);
> >  			bio_list_merge(&conf->return_bi, &s.return_bi);
> >  			spin_unlock_irq(&conf->device_lock);


* Re: [PATCH] md: Prevent IO hold during accessing to failed raid5 array
  2016-07-29  9:07   ` Obitotskiy, Aleksey
@ 2016-07-30 21:01     ` Shaohua Li
  0 siblings, 0 replies; 5+ messages in thread
From: Shaohua Li @ 2016-07-30 21:01 UTC (permalink / raw)
  To: Obitotskiy, Aleksey; +Cc: linux-raid

On Fri, Jul 29, 2016 at 09:07:43AM +0000, Obitotskiy, Aleksey wrote:
> Hello,
> 
> I would like to know the status of this patch.
> Should I provide more information about it?

I'm on vacation, so my responses are slow, sorry. Please rework the patch log
and mention that this is for externally managed arrays. What is the s.failed <=
conf->max_degraded check for?

Thanks,
Shaohua

> 
> Regards,
> Aleksey
> 
> On Tue, 2016-07-19 at 15:46 -0700, Shaohua Li wrote:
> > On Fri, Jul 15, 2016 at 03:24:27PM +0200, Alexey Obitotskiy wrote:
> > > 
> > > After the array enters a failed state (e.g. the number of failed drives
> > > exceeds what the raid5 level can tolerate), error flags are set
> > > (one of these flags is MD_CHANGE_PENDING). This flag prevents any
> > > new or unfinished IO to the array from completing and holds it in
> > > a pending state. In some cases this can lead to a deadlock.
> > > 
> > > For example, udev handles the array state change (a drive becomes
> > > faulty) and blkid is started, but blkid cannot finish its reads
> > > because of the IO hold. At the same time we cannot get exclusive
> > > access to the array (to stop it, in our case) because another
> > > external application (blkid, in our case) is still using it.
> > > 
> > > The fix makes it possible to return the IO with errors immediately,
> > > so the external application can finish working with the array and
> > > release it for exclusive access by other applications.
> > > 
> > > Signed-off-by: Alexey Obitotskiy <aleksey.obitotskiy@intel.com>
> > > ---
> > >  drivers/md/raid5.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > index 6c1149d..99471b6 100644
> > > --- a/drivers/md/raid5.c
> > > +++ b/drivers/md/raid5.c
> > > @@ -4692,7 +4692,9 @@ finish:
> > >  	}
> > >  
> > >  	if (!bio_list_empty(&s.return_bi)) {
> > > -		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags)) {
> > > +		if (test_bit(MD_CHANGE_PENDING, &conf->mddev->flags) &&
> > > +				(s.failed <= conf->max_degraded ||
> > > +					conf->mddev->external == 0)) {
> > >  			spin_lock_irq(&conf->device_lock);
> > >  			bio_list_merge(&conf->return_bi, &s.return_bi);
> > >  			spin_unlock_irq(&conf->device_lock);


end of thread, other threads:[~2016-07-30 21:01 UTC | newest]

Thread overview: 5+ messages
2016-07-15 13:24 [PATCH] md: Prevent IO hold during accessing to failed raid5 array Alexey Obitotskiy
2016-07-19 22:46 ` Shaohua Li
2016-07-20  6:25   ` Obitotskiy, Aleksey
2016-07-29  9:07   ` Obitotskiy, Aleksey
2016-07-30 21:01     ` Shaohua Li
