* New RAID causing system lockups
@ 2010-09-11 18:20 Mike Hartman
  2010-09-11 18:45 ` Mike Hartman
  2010-09-11 20:43 ` Neil Brown
  0 siblings, 2 replies; 17+ messages in thread
From: Mike Hartman @ 2010-09-11 18:20 UTC (permalink / raw)
  To: linux-raid

PART 3:

Update:

I'm even more concerned about this now, because I just started the
newest reshaping to add a new drive with:

mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0

And the system output:

mdadm: Need to backup 768K of critical section..

cat /proc/mdstat shows the reshaping is proceeding,

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md0 : active raid6 sdi1[0] sdf1[5] md1p1[4] sdj1[3] sdh1[1]
      2929691136 blocks super 1.2 level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]
      [>....................]  reshape =  0.0% (56576/1464845568)
finish=2156.9min speed=11315K/sec

md1 : active raid0 sdg1[0] sdk1[1]
      1465141760 blocks super 1.2 128k chunks

unused devices: <none>

but I've checked for /grow_md0.bak and it's not there. So it looks
like for some reason it ignored my backup file option.

This scares me, because if I experience the lockup again and am forced
to reboot, without a backup file I'm afraid my array will be hosed.
I'm also afraid to stop it cleanly right now for the same reason.

So in addition to fixing the lockup itself, does anyone know if
there's a way to either cancel this reshaping or belatedly add the
backup file in a different way so it will be recoverable? It's only at
1% and says it will take another 2193 minutes.

Mike


* Re: New RAID causing system lockups
  2010-09-11 18:20 New RAID causing system lockups Mike Hartman
@ 2010-09-11 18:45 ` Mike Hartman
  2010-09-11 20:43 ` Neil Brown
  1 sibling, 0 replies; 17+ messages in thread
From: Mike Hartman @ 2010-09-11 18:45 UTC (permalink / raw)
  To: linux-raid

Every time I try to send the list a message with the relevant
attachments the message never gets there, so I've given up and posted
them here:

http://www.hartmanipulation.com/raid/

Includes lspci -v, kernel config, dmesg outputs from both lockups.

Mike


* Re: New RAID causing system lockups
  2010-09-11 18:20 New RAID causing system lockups Mike Hartman
  2010-09-11 18:45 ` Mike Hartman
@ 2010-09-11 20:43 ` Neil Brown
  2010-09-11 20:56   ` Mike Hartman
  1 sibling, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-09-11 20:43 UTC (permalink / raw)
  To: Mike Hartman; +Cc: linux-raid

On Sat, 11 Sep 2010 14:20:40 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> PART 3:
> 
> Update:
> 
> I'm even more concerned about this now, because I just started the
> newest reshaping to add a new drive with:
> 
> mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0
> 
> And the system output:
> 
> mdadm: Need to backup 768K of critical section..
> 
> cat /proc/mdstat shows the reshaping is proceeding,
> 
> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
> md0 : active raid6 sdi1[0] sdf1[5] md1p1[4] sdj1[3] sdh1[1]
>       2929691136 blocks super 1.2 level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]
>       [>....................]  reshape =  0.0% (56576/1464845568)
> finish=2156.9min speed=11315K/sec
> 
> md1 : active raid0 sdg1[0] sdk1[1]
>       1465141760 blocks super 1.2 128k chunks
> 
> unused devices: <none>
> 
> but I've checked for /grow_md0.bak and it's not there. So it looks
> like for some reason it ignored my backup file option.

It didn't.

When you make an array larger, you only need the backup file for a small
'critical section' at the beginning of the reshape - 768K worth in your case.

Once that is complete the backup file is not needed, so it is removed.

So your current situation is no worse than before.

[When making an array smaller, the critical section happens at the very end,
so mdadm keeps the backup file around - unused - until then, then uses it
quickly and completes.  When reshaping an array without changing its size,
the 'critical section' lasts for the entire reshape, so a backup file is
needed and is very heavily used.]
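
Roughly, and with the device name, device counts and backup paths below
being placeholders only (details can differ between mdadm versions), the
three cases look like:

  # grow: the backup file only matters for the critical section at the start
  mdadm --grow /dev/md0 --raid-devices=6 --backup-file=/root/md0_grow.bak

  # shrink: the backup file sits unused until the critical section at the end
  # (the array size has to have been reduced with --array-size first)
  mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/md0_shrink.bak

  # same-size reshape (e.g. a chunk size change): the backup file is in
  # use for the entire reshape
  mdadm --grow /dev/md0 --chunk=256 --backup-file=/root/md0_chunk.bak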

I don't know yet what is causing the lock-up.  A quick look at your logs
suggests that it could be related to the barrier handling.  Maybe trying to
handle a barrier during a reshape is prone to races of some sort - I wouldn't
be very surprised by that.

I'll have a look at the code and see what I can find.

Thanks for the report,
NeilBrown


> 
> This scares me, because if I experience the lockup again and am forced
> to reboot, without a backup file I'm afraid my array will be hosed.
> I'm also afraid to stop it cleanly right now for the same reason.
> 
> So in addition to fixing the lockup itself, does anyone know if
> there's a way to either cancel this reshaping or belatedly add the
> backup file in a different way so it will be recoverable? It's only at
> 1% and says it will take another 2193 minutes.
> 
> Mike



* Re: New RAID causing system lockups
  2010-09-11 20:43 ` Neil Brown
@ 2010-09-11 20:56   ` Mike Hartman
  2010-09-13  6:28     ` Mike Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Hartman @ 2010-09-11 20:56 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

On Sat, Sep 11, 2010 at 4:43 PM, Neil Brown <neilb@suse.de> wrote:
> On Sat, 11 Sep 2010 14:20:40 -0400
> Mike Hartman <mike@hartmanipulation.com> wrote:
>
>> PART 3:
>>
>> Update:
>>
>> I'm even more concerned about this now, because I just started the
>> newest reshaping to add a new drive with:
>>
>> mdadm --grow -c 256 --raid-devices=5 --backup-file=/grow_md0.bak /dev/md0
>>
>> And the system output:
>>
>> mdadm: Need to backup 768K of critical section..
>>
>> cat /proc/mdstat shows the reshaping is proceeding,
>>
>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
>> md0 : active raid6 sdi1[0] sdf1[5] md1p1[4] sdj1[3] sdh1[1]
>>       2929691136 blocks super 1.2 level 6, 128k chunk, algorithm 2 [5/5] [UUUUU]
>>       [>....................]  reshape =  0.0% (56576/1464845568)
>> finish=2156.9min speed=11315K/sec
>>
>> md1 : active raid0 sdg1[0] sdk1[1]
>>       1465141760 blocks super 1.2 128k chunks
>>
>> unused devices: <none>
>>
>> but I've checked for /grow_md0.bak and it's not there. So it looks
>> like for some reason it ignored my backup file option.
>
> It didn't.
>
> When you making an array larger, you only need the backup file for a small
> 'critical region' at the beginning of the reshape - 768K worth in your case.
>
> Once that is complete the backup-file is not needed and so is removed.
>
> So your current situation is no worse that before.

OK. When I did the reshape from RAID 5 to RAID 6 (moving from 3 disks
to 4), it kept the backup file around until at least 13% (since that's
when it locked up and I had to restart it with the backup), but I
imagine that's a less common case than just growing an array. Your
comments give me renewed confidence.

>
> [When making an array smaller, the critical section happen and the very end,
> so mdadm keeps the backup file around - unused - until then.  Then uses it
> quickly and completes.  When reshaping an array without changing the size the
> 'critical section' lasts for the entire time so a backup file is needed and
> is very heavily used]
>
> I don't know yet what is causing the lock-up.  A quick look at your logs
> suggest that it could be related to the barrier handling.  Maybe trying to
> handle a barrier during a reshape is prone to races of some sort - I wouldn't
> be very surprised by that.

Just note that during the second lockup no reshape or resync was going
on. The array state was stable, I was just writing to it.

>
> I'll have a look at the code and see what I can find.

Thanks a lot. If it was only a risk when I was growing/reshaping the
array, and covered by the backup file, it would just be an
inconvenience. But since it can seemingly happen at any time it's a
problem.

>
> Thanks for the report,
> NeilBrown
>
>
>>
>> This scares me, because if I experience the lockup again and am forced
>> to reboot, without a backup file I'm afraid my array will be hosed.
>> I'm also afraid to stop it cleanly right now for the same reason.
>>
>> So in addition to fixing the lockup itself, does anyone know if
>> there's a way to either cancel this reshaping or belatedly add the
>> backup file in a different way so it will be recoverable? It's only at
>> 1% and says it will take another 2193 minutes.
>>
>> Mike
>
>


* Re: New RAID causing system lockups
  2010-09-11 20:56   ` Mike Hartman
@ 2010-09-13  6:28     ` Mike Hartman
  2010-09-13 15:57       ` Mike Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Hartman @ 2010-09-13  6:28 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

>> I don't know yet what is causing the lock-up.  A quick look at your logs
>> suggest that it could be related to the barrier handling.  Maybe trying to
>> handle a barrier during a reshape is prone to races of some sort - I wouldn't
>> be very surprised by that.
>
> Just note that during the second lockup no reshape or resync was going
> on. The array state was stable, I was just writing to it.
>
>>
>> I'll have a look at the code and see what I can find.
>
> Thanks a lot. If it was only a risk when I was growing/reshaping the
> array, and covered by the backup file, it would just be an
> inconvenience. But since it can seemingly happen at any time it's a
> problem.
>

The lockup just happened again. I wasn't doing any
growing/reshaping/anything like that. Just copying some data into the
partition that lives on md0. dmesg_3.txt has been uploaded alongside
the other files at http://www.hartmanipulation.com/raid/. The trace
looks pretty similar to me.

Mike


* Re: New RAID causing system lockups
  2010-09-13  6:28     ` Mike Hartman
@ 2010-09-13 15:57       ` Mike Hartman
  2010-09-13 23:51         ` Neil Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Hartman @ 2010-09-13 15:57 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

>>> I don't know yet what is causing the lock-up.  A quick look at your logs
>>> suggest that it could be related to the barrier handling.  Maybe trying to
>>> handle a barrier during a reshape is prone to races of some sort - I wouldn't
>>> be very surprised by that.
>>
>> Just note that during the second lockup no reshape or resync was going
>> on. The array state was stable, I was just writing to it.
>>
>>>
>>> I'll have a look at the code and see what I can find.
>>
>> Thanks a lot. If it was only a risk when I was growing/reshaping the
>> array, and covered by the backup file, it would just be an
>> inconvenience. But since it can seemingly happen at any time it's a
>> problem.
>>
>
> The lockup just happened again. I wasn't doing any
> growing/reshaping/anything like that. Just copying some data into the
> partition that lives on md0. dmesg_3.txt has been uploaded alongside
> the other files at http://www.hartmanipulation.com/raid/. The trace
> looks pretty similar to me.
>

The lockup just happened for the fourth time, less than an hour after
I rebooted to clear the previous lockup from last night. All I did was
boot the system, start the RAID, and start copying some files onto it.
The problem seems to be getting worse - up until now I got at least a
full day of fairly heavy usage out of the system before it happened.
dmesg_4.txt has been uploaded alongside the other files. Let me know
if there's any other system information that would be useful.

Mike


* Re: New RAID causing system lockups
  2010-09-13 15:57       ` Mike Hartman
@ 2010-09-13 23:51         ` Neil Brown
       [not found]           ` <AANLkTin=jy=xJTtN5mQ6U=rYw3p+_4-nmkhO7zqR0KLP@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-09-13 23:51 UTC (permalink / raw)
  To: Mike Hartman; +Cc: linux-raid

On Mon, 13 Sep 2010 11:57:03 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> >>> I don't know yet what is causing the lock-up.  A quick look at your logs
> >>> suggest that it could be related to the barrier handling.  Maybe trying to
> >>> handle a barrier during a reshape is prone to races of some sort - I wouldn't
> >>> be very surprised by that.
> >>
> >> Just note that during the second lockup no reshape or resync was going
> >> on. The array state was stable, I was just writing to it.
> >>
> >>>
> >>> I'll have a look at the code and see what I can find.
> >>
> >> Thanks a lot. If it was only a risk when I was growing/reshaping the
> >> array, and covered by the backup file, it would just be an
> >> inconvenience. But since it can seemingly happen at any time it's a
> >> problem.
> >>
> >
> > The lockup just happened again. I wasn't doing any
> > growing/reshaping/anything like that. Just copying some data into the
> > partition that lives on md0. dmesg_3.txt has been uploaded alongside
> > the other files at http://www.hartmanipulation.com/raid/. The trace
> > looks pretty similar to me.
> >
> 
> The lockup just happened for the fourth time, less than an hour after
> I rebooted to clear the previous lockup from last night. All I did was
> boot the system, start the RAID, and start copying some files onto it.
> The problem seems to be getting worse - up until now I got at least a
> full day of fairly heavy usage out of the system before it happened.
> dmesg_4.txt has been uploaded alongside the other files. Let me know
> if there's any other system information that would be useful.
> 
> Mike

Hi Mike,
 thanks for the updates.

I'm not entirely clear what is happening (in fact, due to a cold that I am
still fighting off, nothing is entirely clear at the moment), but it looks
very likely that the problem is due to an interplay between barrier handling
and the multi-level structure of your array (a raid0 being a member of a
raid5).

When a barrier request is processed, both arrays will schedule 'work' to be
done by the 'event' thread, and I'm guessing that you can get into a situation
where one work item is waiting for the other, but the other is behind the one
on the single queue (I wonder if that makes sense...).

Anyway, this patch might make a difference.  It reduces the number of work
items scheduled in a way that could conceivably fix the problem.

If you can test this, please report the results.  I cannot easily reproduce
the problem so there is limited testing that I can do.
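
To build it, the usual routine should do (assuming the patch is saved as
md-barrier.patch at the top of your kernel tree - the file name is just a
placeholder):

  cd /usr/src/linux
  patch -p1 < md-barrier.patch
  make

then install the kernel the way you normally would and reboot onto it.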

Thanks,
NeilBrown


diff --git a/drivers/md/md.c b/drivers/md/md.c
index f20d13e..7f2785c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
 
 #define POST_REQUEST_BARRIER ((void*)1)
 
+static void md_barrier_done(mddev_t *mddev)
+{
+	struct bio *bio = mddev->barrier;
+
+	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
+		bio_endio(bio, -EOPNOTSUPP);
+	else if (bio->bi_size == 0)
+		bio_endio(bio, 0);
+	else {
+		/* other options need to be handled from process context */
+		schedule_work(&mddev->barrier_work);
+		return;
+	}
+	mddev->barrier = NULL;
+	wake_up(&mddev->sb_wait);
+}
+
 static void md_end_barrier(struct bio *bio, int err)
 {
 	mdk_rdev_t *rdev = bio->bi_private;
@@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
 			wake_up(&mddev->sb_wait);
 		} else
 			/* The pre-request barrier has finished */
-			schedule_work(&mddev->barrier_work);
+			md_barrier_done(mddev);
 	}
 	bio_put(bio);
 }
@@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
 
 	atomic_set(&mddev->flush_pending, 1);
 
-	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
-		bio_endio(bio, -EOPNOTSUPP);
-	else if (bio->bi_size == 0)
-		/* an empty barrier - all done */
-		bio_endio(bio, 0);
-	else {
-		bio->bi_rw &= ~REQ_HARDBARRIER;
-		if (mddev->pers->make_request(mddev, bio))
-			generic_make_request(bio);
-		mddev->barrier = POST_REQUEST_BARRIER;
-		submit_barriers(mddev);
-	}
+	bio->bi_rw &= ~REQ_HARDBARRIER;
+	if (mddev->pers->make_request(mddev, bio))
+		generic_make_request(bio);
+	mddev->barrier = POST_REQUEST_BARRIER;
+	submit_barriers(mddev);
+
 	if (atomic_dec_and_test(&mddev->flush_pending)) {
 		mddev->barrier = NULL;
 		wake_up(&mddev->sb_wait);
@@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
 	submit_barriers(mddev);
 
 	if (atomic_dec_and_test(&mddev->flush_pending))
-		schedule_work(&mddev->barrier_work);
+		md_barrier_done(mddev);
 }
 EXPORT_SYMBOL(md_barrier_request);
 



* RE: New RAID causing system lockups
       [not found]           ` <AANLkTin=jy=xJTtN5mQ6U=rYw3p+_4-nmkhO7zqR0KLP@mail.gmail.com>
@ 2010-09-14  1:11             ` Mike Hartman
  2010-09-14  1:35               ` Neil Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Hartman @ 2010-09-14  1:11 UTC (permalink / raw)
  To: linux-raid

Forgot to include the mailing list on this.

> Hi Mike,
>  thanks for the updates.
>
> I'm not entirely clear what is happening (in fact, due to a cold that I am
> still fighting off, nothing is entirely clear at the moment), but it looks
> very likely that the problem is due to an interplay between barrier handling,
> and the multi-level structure of your array (a raid0 being a member of a
> raid5).
>
> When a barrier request is processed, both arrays will schedule 'work' to be
> done by the 'event' thread and I'm guess that you can get into a situation
> where one work time is wait for the other, but the other is behind the one on
> the single queue (I wonder if that make sense...)
>
> Anyway, this patch might make a difference,  It reduced the number of work
> items schedule in a way that could conceivably fix the problem.
>
> If you can test this, please report the results.  I cannot easily reproduce
> the problem so there is limited testing that I can do.
>
> Thanks,
> NeilBrown
>
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index f20d13e..7f2785c 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
>
>  #define POST_REQUEST_BARRIER ((void*)1)
>
> +static void md_barrier_done(mddev_t *mddev)
> +{
> +       struct bio *bio = mddev->barrier;
> +
> +       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
> +               bio_endio(bio, -EOPNOTSUPP);
> +       else if (bio->bi_size == 0)
> +               bio_endio(bio, 0);
> +       else {
> +               /* other options need to be handled from process context */
> +               schedule_work(&mddev->barrier_work);
> +               return;
> +       }
> +       mddev->barrier = NULL;
> +       wake_up(&mddev->sb_wait);
> +}
> +
>  static void md_end_barrier(struct bio *bio, int err)
>  {
>        mdk_rdev_t *rdev = bio->bi_private;
> @@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
>                        wake_up(&mddev->sb_wait);
>                } else
>                        /* The pre-request barrier has finished */
> -                       schedule_work(&mddev->barrier_work);
> +                       md_barrier_done(mddev);
>        }
>        bio_put(bio);
>  }
> @@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
>
>        atomic_set(&mddev->flush_pending, 1);
>
> -       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
> -               bio_endio(bio, -EOPNOTSUPP);
> -       else if (bio->bi_size == 0)
> -               /* an empty barrier - all done */
> -               bio_endio(bio, 0);
> -       else {
> -               bio->bi_rw &= ~REQ_HARDBARRIER;
> -               if (mddev->pers->make_request(mddev, bio))
> -                       generic_make_request(bio);
> -               mddev->barrier = POST_REQUEST_BARRIER;
> -               submit_barriers(mddev);
> -       }
> +       bio->bi_rw &= ~REQ_HARDBARRIER;
> +       if (mddev->pers->make_request(mddev, bio))
> +               generic_make_request(bio);
> +       mddev->barrier = POST_REQUEST_BARRIER;
> +       submit_barriers(mddev);
> +
>        if (atomic_dec_and_test(&mddev->flush_pending)) {
>                mddev->barrier = NULL;
>                wake_up(&mddev->sb_wait);
> @@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
>        submit_barriers(mddev);
>
>        if (atomic_dec_and_test(&mddev->flush_pending))
> -               schedule_work(&mddev->barrier_work);
> +               md_barrier_done(mddev);
>  }
>  EXPORT_SYMBOL(md_barrier_request);
>
>
>

Neil, thanks for the patch. I experienced the lockup for the 5th time
an hour ago (about 3 hours after the last hard reboot) so I thought it
would be a good time to try your patch. Unfortunately I'm getting an
error:

patching file drivers/md/md.c
Hunk #1 succeeded at 291 with fuzz 1 (offset -3 lines).
Hunk #2 FAILED at 324.
Hunk #3 FAILED at 364.
Hunk #4 FAILED at 391.
3 out of 4 hunks FAILED -- saving rejects to file drivers/md/md.c.rej

"uname -r" gives "2.6.35-gentoo-r4", so I suspect that's why. I guess
the standard gentoo patchset does something with that file. I'm
skimming through md.c to see if I can understand it well enough to
apply the patch functionality manually. I've also uploaded my
2.6.35-gentoo-r4 md.c to www.hartmanipulation.com/raid/ with the other
files in case you or someone else wants to take a look at it.

Mike


* Re: New RAID causing system lockups
  2010-09-14  1:11             ` Mike Hartman
@ 2010-09-14  1:35               ` Neil Brown
  2010-09-14  2:50                 ` Mike Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-09-14  1:35 UTC (permalink / raw)
  To: Mike Hartman; +Cc: linux-raid

On Mon, 13 Sep 2010 21:11:30 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> Forgot to include the mailing list on this.
> 
> > Hi Mike,
> >  thanks for the updates.
> >
> > I'm not entirely clear what is happening (in fact, due to a cold that I am
> > still fighting off, nothing is entirely clear at the moment), but it looks
> > very likely that the problem is due to an interplay between barrier handling,
> > and the multi-level structure of your array (a raid0 being a member of a
> > raid5).
> >
> > When a barrier request is processed, both arrays will schedule 'work' to be
> > done by the 'event' thread and I'm guess that you can get into a situation
> > where one work time is wait for the other, but the other is behind the one on
> > the single queue (I wonder if that make sense...)
> >
> > Anyway, this patch might make a difference,  It reduced the number of work
> > items schedule in a way that could conceivably fix the problem.
> >
> > If you can test this, please report the results.  I cannot easily reproduce
> > the problem so there is limited testing that I can do.
> >
> > Thanks,
> > NeilBrown
> >
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index f20d13e..7f2785c 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
> >
> >  #define POST_REQUEST_BARRIER ((void*)1)
> >
> > +static void md_barrier_done(mddev_t *mddev)
> > +{
> > +       struct bio *bio = mddev->barrier;
> > +
> > +       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
> > +               bio_endio(bio, -EOPNOTSUPP);
> > +       else if (bio->bi_size == 0)
> > +               bio_endio(bio, 0);
> > +       else {
> > +               /* other options need to be handled from process context */
> > +               schedule_work(&mddev->barrier_work);
> > +               return;
> > +       }
> > +       mddev->barrier = NULL;
> > +       wake_up(&mddev->sb_wait);
> > +}
> > +
> >  static void md_end_barrier(struct bio *bio, int err)
> >  {
> >        mdk_rdev_t *rdev = bio->bi_private;
> > @@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
> >                        wake_up(&mddev->sb_wait);
> >                } else
> >                        /* The pre-request barrier has finished */
> > -                       schedule_work(&mddev->barrier_work);
> > +                       md_barrier_done(mddev);
> >        }
> >        bio_put(bio);
> >  }
> > @@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
> >
> >        atomic_set(&mddev->flush_pending, 1);
> >
> > -       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
> > -               bio_endio(bio, -EOPNOTSUPP);
> > -       else if (bio->bi_size == 0)
> > -               /* an empty barrier - all done */
> > -               bio_endio(bio, 0);
> > -       else {
> > -               bio->bi_rw &= ~REQ_HARDBARRIER;
> > -               if (mddev->pers->make_request(mddev, bio))
> > -                       generic_make_request(bio);
> > -               mddev->barrier = POST_REQUEST_BARRIER;
> > -               submit_barriers(mddev);
> > -       }
> > +       bio->bi_rw &= ~REQ_HARDBARRIER;
> > +       if (mddev->pers->make_request(mddev, bio))
> > +               generic_make_request(bio);
> > +       mddev->barrier = POST_REQUEST_BARRIER;
> > +       submit_barriers(mddev);
> > +
> >        if (atomic_dec_and_test(&mddev->flush_pending)) {
> >                mddev->barrier = NULL;
> >                wake_up(&mddev->sb_wait);
> > @@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
> >        submit_barriers(mddev);
> >
> >        if (atomic_dec_and_test(&mddev->flush_pending))
> > -               schedule_work(&mddev->barrier_work);
> > +               md_barrier_done(mddev);
> >  }
> >  EXPORT_SYMBOL(md_barrier_request);
> >
> >
> >
> 
> Neil, thanks for the patch. I experienced the lockup for the 5th time
> an hour ago (about 3 hours after the last hard reboot) so I thought it
> would be a good time to try your patch. Unfortunately I'm getting an
> error:
> 
> patching file drivers/md/md.c
> Hunk #1 succeeded at 291 with fuzz 1 (offset -3 lines).
> Hunk #2 FAILED at 324.
> Hunk #3 FAILED at 364.
> Hunk #4 FAILED at 391.
> 3 out of 4 hunks FAILED -- saving rejects to file drivers/md/md.c.rej

That is odd.
I took the md.c that you posted on the web site, used "patch" to apply my
patch to it, and only Hunk #3 failed.

I used 'wiggle' to apply the patch and it applied perfectly, properly
replacing (1<<BIO_RW_BARRIER) with REQ_HARDBARRIER (or the other way around).

Try this version.  You will need to be in drivers/md/, or use

 patch drivers/md/md.c < this-patch


NeilBrown

--- md.c.orig	2010-09-14 11:29:15.000000000 +1000
+++ md.c	2010-09-14 11:29:50.000000000 +1000
@@ -291,6 +291,23 @@
 
 #define POST_REQUEST_BARRIER ((void*)1)
 
+static void md_barrier_done(mddev_t *mddev)
+{
+	struct bio *bio = mddev->barrier;
+
+	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
+		bio_endio(bio, -EOPNOTSUPP);
+	else if (bio->bi_size == 0)
+		bio_endio(bio, 0);
+	else {
+		/* other options need to be handled from process context */
+		schedule_work(&mddev->barrier_work);
+		return;
+	}
+	mddev->barrier = NULL;
+	wake_up(&mddev->sb_wait);
+}
+
 static void md_end_barrier(struct bio *bio, int err)
 {
 	mdk_rdev_t *rdev = bio->bi_private;
@@ -307,7 +324,7 @@
 			wake_up(&mddev->sb_wait);
 		} else
 			/* The pre-request barrier has finished */
-			schedule_work(&mddev->barrier_work);
+			md_barrier_done(mddev);
 	}
 	bio_put(bio);
 }
@@ -347,18 +364,12 @@
 
 	atomic_set(&mddev->flush_pending, 1);
 
-	if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
-		bio_endio(bio, -EOPNOTSUPP);
-	else if (bio->bi_size == 0)
-		/* an empty barrier - all done */
-		bio_endio(bio, 0);
-	else {
-		bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
-		if (mddev->pers->make_request(mddev, bio))
-			generic_make_request(bio);
-		mddev->barrier = POST_REQUEST_BARRIER;
-		submit_barriers(mddev);
-	}
+	bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
+	if (mddev->pers->make_request(mddev, bio))
+		generic_make_request(bio);
+	mddev->barrier = POST_REQUEST_BARRIER;
+	submit_barriers(mddev);
+
 	if (atomic_dec_and_test(&mddev->flush_pending)) {
 		mddev->barrier = NULL;
 		wake_up(&mddev->sb_wait);
@@ -380,7 +391,7 @@
 	submit_barriers(mddev);
 
 	if (atomic_dec_and_test(&mddev->flush_pending))
-		schedule_work(&mddev->barrier_work);
+		md_barrier_done(mddev);
 }
 EXPORT_SYMBOL(md_barrier_request);
 


* Re: New RAID causing system lockups
  2010-09-14  1:35               ` Neil Brown
@ 2010-09-14  2:50                 ` Mike Hartman
  2010-09-14  3:35                   ` Mike Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Hartman @ 2010-09-14  2:50 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

>>
>> > Hi Mike,
>> >  thanks for the updates.
>> >
>> > I'm not entirely clear what is happening (in fact, due to a cold that I am
>> > still fighting off, nothing is entirely clear at the moment), but it looks
>> > very likely that the problem is due to an interplay between barrier handling,
>> > and the multi-level structure of your array (a raid0 being a member of a
>> > raid5).
>> >
>> > When a barrier request is processed, both arrays will schedule 'work' to be
>> > done by the 'event' thread and I'm guess that you can get into a situation
>> > where one work time is wait for the other, but the other is behind the one on
>> > the single queue (I wonder if that make sense...)
>> >
>> > Anyway, this patch might make a difference,  It reduced the number of work
>> > items schedule in a way that could conceivably fix the problem.
>> >
>> > If you can test this, please report the results.  I cannot easily reproduce
>> > the problem so there is limited testing that I can do.
>> >
>> > Thanks,
>> > NeilBrown
>> >
>> >
>> > diff --git a/drivers/md/md.c b/drivers/md/md.c
>> > index f20d13e..7f2785c 100644
>> > --- a/drivers/md/md.c
>> > +++ b/drivers/md/md.c
>> > @@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
>> >
>> >  #define POST_REQUEST_BARRIER ((void*)1)
>> >
>> > +static void md_barrier_done(mddev_t *mddev)
>> > +{
>> > +       struct bio *bio = mddev->barrier;
>> > +
>> > +       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
>> > +               bio_endio(bio, -EOPNOTSUPP);
>> > +       else if (bio->bi_size == 0)
>> > +               bio_endio(bio, 0);
>> > +       else {
>> > +               /* other options need to be handled from process context */
>> > +               schedule_work(&mddev->barrier_work);
>> > +               return;
>> > +       }
>> > +       mddev->barrier = NULL;
>> > +       wake_up(&mddev->sb_wait);
>> > +}
>> > +
>> >  static void md_end_barrier(struct bio *bio, int err)
>> >  {
>> >        mdk_rdev_t *rdev = bio->bi_private;
>> > @@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
>> >                        wake_up(&mddev->sb_wait);
>> >                } else
>> >                        /* The pre-request barrier has finished */
>> > -                       schedule_work(&mddev->barrier_work);
>> > +                       md_barrier_done(mddev);
>> >        }
>> >        bio_put(bio);
>> >  }
>> > @@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
>> >
>> >        atomic_set(&mddev->flush_pending, 1);
>> >
>> > -       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
>> > -               bio_endio(bio, -EOPNOTSUPP);
>> > -       else if (bio->bi_size == 0)
>> > -               /* an empty barrier - all done */
>> > -               bio_endio(bio, 0);
>> > -       else {
>> > -               bio->bi_rw &= ~REQ_HARDBARRIER;
>> > -               if (mddev->pers->make_request(mddev, bio))
>> > -                       generic_make_request(bio);
>> > -               mddev->barrier = POST_REQUEST_BARRIER;
>> > -               submit_barriers(mddev);
>> > -       }
>> > +       bio->bi_rw &= ~REQ_HARDBARRIER;
>> > +       if (mddev->pers->make_request(mddev, bio))
>> > +               generic_make_request(bio);
>> > +       mddev->barrier = POST_REQUEST_BARRIER;
>> > +       submit_barriers(mddev);
>> > +
>> >        if (atomic_dec_and_test(&mddev->flush_pending)) {
>> >                mddev->barrier = NULL;
>> >                wake_up(&mddev->sb_wait);
>> > @@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
>> >        submit_barriers(mddev);
>> >
>> >        if (atomic_dec_and_test(&mddev->flush_pending))
>> > -               schedule_work(&mddev->barrier_work);
>> > +               md_barrier_done(mddev);
>> >  }
>> >  EXPORT_SYMBOL(md_barrier_request);
>> >
>> >
>> >
>>
>> Neil, thanks for the patch. I experienced the lockup for the 5th time
>> an hour ago (about 3 hours after the last hard reboot) so I thought it
>> would be a good time to try your patch. Unfortunately I'm getting an
>> error:
>>
>> patching file drivers/md/md.c
>> Hunk #1 succeeded at 291 with fuzz 1 (offset -3 lines).
>> Hunk #2 FAILED at 324.
>> Hunk #3 FAILED at 364.
>> Hunk #4 FAILED at 391.
>> 3 out of 4 hunks FAILED -- saving rejects to file drivers/md/md.c.rej
>
> That is odd.
> I took the md.c that you posted on the web site, use "patch" to apply my
> patch to it, and only Hunk #3 failed.
>
> I used 'wiggle' to apply the patch and it applied perfectly, properly
> replacing (1<<BIO_RW_BARRIER) with REQ_HARDBARRIER (or the other way around).
>
> Try this version.  You will need to be in drivers/md/, or use
>
>  patch drivers/md/md.c < this-patch
>
>
> NeilBrown
>
> --- md.c.orig   2010-09-14 11:29:15.000000000 +1000
> +++ md.c        2010-09-14 11:29:50.000000000 +1000
> @@ -291,6 +291,23 @@
>
>  #define POST_REQUEST_BARRIER ((void*)1)
>
> +static void md_barrier_done(mddev_t *mddev)
> +{
> +       struct bio *bio = mddev->barrier;
> +
> +       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
> +               bio_endio(bio, -EOPNOTSUPP);
> +       else if (bio->bi_size == 0)
> +               bio_endio(bio, 0);
> +       else {
> +               /* other options need to be handled from process context */
> +               schedule_work(&mddev->barrier_work);
> +               return;
> +       }
> +       mddev->barrier = NULL;
> +       wake_up(&mddev->sb_wait);
> +}
> +
>  static void md_end_barrier(struct bio *bio, int err)
>  {
>        mdk_rdev_t *rdev = bio->bi_private;
> @@ -307,7 +324,7 @@
>                        wake_up(&mddev->sb_wait);
>                } else
>                        /* The pre-request barrier has finished */
> -                       schedule_work(&mddev->barrier_work);
> +                       md_barrier_done(mddev);
>        }
>        bio_put(bio);
>  }
> @@ -347,18 +364,12 @@
>
>        atomic_set(&mddev->flush_pending, 1);
>
> -       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
> -               bio_endio(bio, -EOPNOTSUPP);
> -       else if (bio->bi_size == 0)
> -               /* an empty barrier - all done */
> -               bio_endio(bio, 0);
> -       else {
> -               bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
> -               if (mddev->pers->make_request(mddev, bio))
> -                       generic_make_request(bio);
> -               mddev->barrier = POST_REQUEST_BARRIER;
> -               submit_barriers(mddev);
> -       }
> +       bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
> +       if (mddev->pers->make_request(mddev, bio))
> +               generic_make_request(bio);
> +       mddev->barrier = POST_REQUEST_BARRIER;
> +       submit_barriers(mddev);
> +
>        if (atomic_dec_and_test(&mddev->flush_pending)) {
>                mddev->barrier = NULL;
>                wake_up(&mddev->sb_wait);
> @@ -380,7 +391,7 @@
>        submit_barriers(mddev);
>
>        if (atomic_dec_and_test(&mddev->flush_pending))
> -               schedule_work(&mddev->barrier_work);
> +               md_barrier_done(mddev);
>  }
>  EXPORT_SYMBOL(md_barrier_request);
>
>

Sorry about that, Neil - it was my fault. I copied your patch out of the
email and I think it picked up some unintended characters. I tried
copying it from the mailing list archive website instead and it
patched in fine. The kernel compiled with no trouble and I'm booted
into it now. No unexpected side-effects yet. I'll continue with my
copying and we'll see if it locks up again. Thanks for all your help!

Mike


* Re: New RAID causing system lockups
  2010-09-14  2:50                 ` Mike Hartman
@ 2010-09-14  3:35                   ` Mike Hartman
  2010-09-14  3:48                     ` Neil Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Hartman @ 2010-09-14  3:35 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid

>>>
>>> > Hi Mike,
>>> >  thanks for the updates.
>>> >
>>> > I'm not entirely clear what is happening (in fact, due to a cold that I am
>>> > still fighting off, nothing is entirely clear at the moment), but it looks
>>> > very likely that the problem is due to an interplay between barrier handling,
>>> > and the multi-level structure of your array (a raid0 being a member of a
>>> > raid5).
>>> >
>>> > When a barrier request is processed, both arrays will schedule 'work' to be
>>> > done by the 'event' thread and I'm guess that you can get into a situation
>>> > where one work time is wait for the other, but the other is behind the one on
>>> > the single queue (I wonder if that make sense...)
>>> >
>>> > Anyway, this patch might make a difference,  It reduced the number of work
>>> > items schedule in a way that could conceivably fix the problem.
>>> >
>>> > If you can test this, please report the results.  I cannot easily reproduce
>>> > the problem so there is limited testing that I can do.
>>> >
>>> > Thanks,
>>> > NeilBrown
>>> >
>>> >
>>> > diff --git a/drivers/md/md.c b/drivers/md/md.c
>>> > index f20d13e..7f2785c 100644
>>> > --- a/drivers/md/md.c
>>> > +++ b/drivers/md/md.c
>>> > @@ -294,6 +294,23 @@ EXPORT_SYMBOL(mddev_congested);
>>> >
>>> >  #define POST_REQUEST_BARRIER ((void*)1)
>>> >
>>> > +static void md_barrier_done(mddev_t *mddev)
>>> > +{
>>> > +       struct bio *bio = mddev->barrier;
>>> > +
>>> > +       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
>>> > +               bio_endio(bio, -EOPNOTSUPP);
>>> > +       else if (bio->bi_size == 0)
>>> > +               bio_endio(bio, 0);
>>> > +       else {
>>> > +               /* other options need to be handled from process context */
>>> > +               schedule_work(&mddev->barrier_work);
>>> > +               return;
>>> > +       }
>>> > +       mddev->barrier = NULL;
>>> > +       wake_up(&mddev->sb_wait);
>>> > +}
>>> > +
>>> >  static void md_end_barrier(struct bio *bio, int err)
>>> >  {
>>> >        mdk_rdev_t *rdev = bio->bi_private;
>>> > @@ -310,7 +327,7 @@ static void md_end_barrier(struct bio *bio, int err)
>>> >                        wake_up(&mddev->sb_wait);
>>> >                } else
>>> >                        /* The pre-request barrier has finished */
>>> > -                       schedule_work(&mddev->barrier_work);
>>> > +                       md_barrier_done(mddev);
>>> >        }
>>> >        bio_put(bio);
>>> >  }
>>> > @@ -350,18 +367,12 @@ static void md_submit_barrier(struct work_struct *ws)
>>> >
>>> >        atomic_set(&mddev->flush_pending, 1);
>>> >
>>> > -       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
>>> > -               bio_endio(bio, -EOPNOTSUPP);
>>> > -       else if (bio->bi_size == 0)
>>> > -               /* an empty barrier - all done */
>>> > -               bio_endio(bio, 0);
>>> > -       else {
>>> > -               bio->bi_rw &= ~REQ_HARDBARRIER;
>>> > -               if (mddev->pers->make_request(mddev, bio))
>>> > -                       generic_make_request(bio);
>>> > -               mddev->barrier = POST_REQUEST_BARRIER;
>>> > -               submit_barriers(mddev);
>>> > -       }
>>> > +       bio->bi_rw &= ~REQ_HARDBARRIER;
>>> > +       if (mddev->pers->make_request(mddev, bio))
>>> > +               generic_make_request(bio);
>>> > +       mddev->barrier = POST_REQUEST_BARRIER;
>>> > +       submit_barriers(mddev);
>>> > +
>>> >        if (atomic_dec_and_test(&mddev->flush_pending)) {
>>> >                mddev->barrier = NULL;
>>> >                wake_up(&mddev->sb_wait);
>>> > @@ -383,7 +394,7 @@ void md_barrier_request(mddev_t *mddev, struct bio *bio)
>>> >        submit_barriers(mddev);
>>> >
>>> >        if (atomic_dec_and_test(&mddev->flush_pending))
>>> > -               schedule_work(&mddev->barrier_work);
>>> > +               md_barrier_done(mddev);
>>> >  }
>>> >  EXPORT_SYMBOL(md_barrier_request);
>>> >
>>> >
>>> >
>>>
>>> Neil, thanks for the patch. I experienced the lockup for the 5th time
>>> an hour ago (about 3 hours after the last hard reboot) so I thought it
>>> would be a good time to try your patch. Unfortunately I'm getting an
>>> error:
>>>
>>> patching file drivers/md/md.c
>>> Hunk #1 succeeded at 291 with fuzz 1 (offset -3 lines).
>>> Hunk #2 FAILED at 324.
>>> Hunk #3 FAILED at 364.
>>> Hunk #4 FAILED at 391.
>>> 3 out of 4 hunks FAILED -- saving rejects to file drivers/md/md.c.rej
>>
>> That is odd.
>> I took the md.c that you posted on the web site, use "patch" to apply my
>> patch to it, and only Hunk #3 failed.
>>
>> I used 'wiggle' to apply the patch and it applied perfectly, properly
>> replacing (1<<BIO_RW_BARRIER) with REQ_HARDBARRIER (or the other way around).
>>
>> Try this version.  You will need to be in drivers/md/, or use
>>
>>  patch drivers/md/md.c < this-patch
>>
>>
>> NeilBrown
>>
>> --- md.c.orig   2010-09-14 11:29:15.000000000 +1000
>> +++ md.c        2010-09-14 11:29:50.000000000 +1000
>> @@ -291,6 +291,23 @@
>>
>>  #define POST_REQUEST_BARRIER ((void*)1)
>>
>> +static void md_barrier_done(mddev_t *mddev)
>> +{
>> +       struct bio *bio = mddev->barrier;
>> +
>> +       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
>> +               bio_endio(bio, -EOPNOTSUPP);
>> +       else if (bio->bi_size == 0)
>> +               bio_endio(bio, 0);
>> +       else {
>> +               /* other options need to be handled from process context */
>> +               schedule_work(&mddev->barrier_work);
>> +               return;
>> +       }
>> +       mddev->barrier = NULL;
>> +       wake_up(&mddev->sb_wait);
>> +}
>> +
>>  static void md_end_barrier(struct bio *bio, int err)
>>  {
>>        mdk_rdev_t *rdev = bio->bi_private;
>> @@ -307,7 +324,7 @@
>>                        wake_up(&mddev->sb_wait);
>>                } else
>>                        /* The pre-request barrier has finished */
>> -                       schedule_work(&mddev->barrier_work);
>> +                       md_barrier_done(mddev);
>>        }
>>        bio_put(bio);
>>  }
>> @@ -347,18 +364,12 @@
>>
>>        atomic_set(&mddev->flush_pending, 1);
>>
>> -       if (test_bit(BIO_EOPNOTSUPP, &bio->bi_flags))
>> -               bio_endio(bio, -EOPNOTSUPP);
>> -       else if (bio->bi_size == 0)
>> -               /* an empty barrier - all done */
>> -               bio_endio(bio, 0);
>> -       else {
>> -               bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
>> -               if (mddev->pers->make_request(mddev, bio))
>> -                       generic_make_request(bio);
>> -               mddev->barrier = POST_REQUEST_BARRIER;
>> -               submit_barriers(mddev);
>> -       }
>> +       bio->bi_rw &= ~(1<<BIO_RW_BARRIER);
>> +       if (mddev->pers->make_request(mddev, bio))
>> +               generic_make_request(bio);
>> +       mddev->barrier = POST_REQUEST_BARRIER;
>> +       submit_barriers(mddev);
>> +
>>        if (atomic_dec_and_test(&mddev->flush_pending)) {
>>                mddev->barrier = NULL;
>>                wake_up(&mddev->sb_wait);
>> @@ -380,7 +391,7 @@
>>        submit_barriers(mddev);
>>
>>        if (atomic_dec_and_test(&mddev->flush_pending))
>> -               schedule_work(&mddev->barrier_work);
>> +               md_barrier_done(mddev);
>>  }
>>  EXPORT_SYMBOL(md_barrier_request);
>>
>>
>
> Sorry about that Neil, it was my fault. I copied your patch out of the
> email and I think it picked up some unintended characters. I tried
> copying it from the mailing list archive website instead and it
> patched in fine. The kernel compiled with no trouble and I'm booted
> into it now. No unexpected side-effects yet. I'll continue with my
> copying and we'll see if it locks up again. Thanks for all your help!
>
> Mike
>

Sorry Neil, locked up again in less than an hour. I've uploaded
dmesg_5.txt in case the trace shows something different/useful in
light of your patch.

Mike


* Re: New RAID causing system lockups
  2010-09-14  3:35                   ` Mike Hartman
@ 2010-09-14  3:48                     ` Neil Brown
       [not found]                       ` <AANLkTimXabL-TyjqJ81syrx-Oxn50qexbA8q9p22sxJt@mail.gmail.com>
  0 siblings, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-09-14  3:48 UTC (permalink / raw)
  To: Mike Hartman; +Cc: linux-raid

On Mon, 13 Sep 2010 23:35:50 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> Sorry Neil, locked up again in less than an hour. I've uploaded
> dmesg_5.txt in case the trace shows something different/useful in
> light of your patch.
> 
> Mike


Hmmm..
 Can you try mounting with 
    -o barrier=0

just to see if my theory is at all correct?
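
For example, assuming the filesystem on md0 is ext4 and is already
mounted somewhere like /mnt/md0 (the mount point here is just a
placeholder), something along the lines of

  mount -o remount,barrier=0 /mnt/md0

should switch it off without unmounting.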

Thanks,
NeilBrown


* Re: New RAID causing system lockups
       [not found]                       ` <AANLkTimXabL-TyjqJ81syrx-Oxn50qexbA8q9p22sxJt@mail.gmail.com>
@ 2010-09-15 21:49                         ` Mike Hartman
  2010-09-21  2:26                           ` Neil Brown
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Hartman @ 2010-09-15 21:49 UTC (permalink / raw)
  To: Neil Brown, linux-raid

>> Hmmm..
>>  Can you try mounting with
>>    -o barrier=0
>>
>> just to see if my theory is at all correct?
>>
>> Thanks,
>> NeilBrown
>>
>

Progress report:

I made the barrier change shortly after sending my last message (about
40 hours ago). With that in place, I was able to finish emptying one
of the non-assimilated drives onto the array, after which I added that
drive as a hot spare and started the process to grow the array onto it
- the same procedure I've been applying since I created the RAID the
other week. No problems so far, and the reshape is at 46%.

It's hard to be positive that the barrier deactivation is responsible
yet though - while the last few lockups have only been 1-16 hours
apart, I believe the first two had at least 2 or 3 days between them.
I'll keep the array busy to enhance the chances of a lockup though -
each one so far has been during a reshape or a large batch of writing
to the array's partition. If I make it another couple days (meaning
time for this reshape to complete, another drive to be emptied onto
the array, and another reshape at least started) I'll be pretty
confident the problem has been identified.

Assuming the barrier is the culprit (and I'm pretty sure you're right),
what are the consequences of just leaving it off? I gather the idea of
the barrier is to prevent journal corruption in the event of a power
failure or other sudden shutdown, which seems pretty important, but it
also doesn't seem like it was enabled by default in ext3/4 until 2008,
which makes it seem less critical.
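
If I do end up leaving it off permanently, I assume that's just a matter
of a barrier=0 mount option in fstab - something like the line below,
where the device and mount point are placeholders for my actual setup:

  /dev/md0p1  /mnt/storage  ext4  defaults,barrier=0  0  2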

Even if the ultimate solution for me is to just leave it disabled I'm
happy to keep trying patches if you want to get it properly fixed in
md. We may have to come up with an alternate way to work the array
hard enough to trigger the lockups though - my last 1.5TB drive is
what's being merged in now. After that completes I only have one more
pair of 750GBs (that will have to be shoehorned in using RAID0 again).
I do have a single 750GB left over, so I'll probably find a mate for
it and get it added too. After that we're maxed out on hardware for a
while.

Mike


* Re: New RAID causing system lockups
  2010-09-15 21:49                         ` Mike Hartman
@ 2010-09-21  2:26                           ` Neil Brown
  2010-09-21 11:28                             ` Mike Hartman
  0 siblings, 1 reply; 17+ messages in thread
From: Neil Brown @ 2010-09-21  2:26 UTC (permalink / raw)
  To: Mike Hartman; +Cc: linux-raid

On Wed, 15 Sep 2010 17:49:44 -0400
Mike Hartman <mike@hartmanipulation.com> wrote:

> >> Hmmm..
> >>  Can you try mounting with
> >>    -o barrier=0
> >>
> >> just to see if my theory is at all correct?
> >>
> >> Thanks,
> >> NeilBrown
> >>
> >
> 
> Progress report:
> 
> I made the barrier change shortly after sending my last message (about
> 40 hours ago). With that in place, I was able to finish emptying one
> of the non-assimilated drives onto the array, after which I added that
> drive as a hot spare and started the process to grow the array onto it
> - the same procedure I've been applying since I created the RAID the
> other week. No problems so far, and the reshape is at 46%.
> 
> It's hard to be positive that the barrier deactivation is responsible
> yet though - while the last few lockups have only been 1-16 hours
> apart, I believe the first two had at least 2 or 3 days between them.
> I'll keep the array busy to enhance the chances of a lockup though -
> each one so far has been during a reshape or a large batch of writing
> to the array's partition. If I make it another couple days (meaning
> time for this reshape to complete, another drive to be emptied onto
> the array, and another reshape at least started) I'll be pretty
> confident the problem has been identified.

Thanks for the update.

> 
> Assuming the barrier is the culprit (and I'm pretty sure you're right)
> what are the consequences of just leaving it off? I gather the idea of
> the barrier is to prevent journal corruption in the event of a power
> failure or other sudden shutdown, which seems pretty important, but it
> also doesn't seem like it was enabled by default in ext3/4 until 2008,
> which makes it seem less critical.

Correct.  Without barriers the chance of corruption during a power failure is
higher.  I don't really know how much higher; it depends a lot on the
filesystem design and the particular implementation.  I think ext4 tends to
be fairly safe - after all, some devices don't support barriers and it has to
make a best effort on those too.

> 
> Even if the ultimate solution for me is to just leave it disabled I'm
> happy to keep trying patches if you want to get it properly fixed in
> md. We may have to come up with an alternate way to work the array
> hard enough to trigger the lockups though - my last 1.5TB drive is
> what's being merged in now. After that completes I only have one more
> pair of 750GBs (that will have to be shoehorned in using RAID0 again).
> I do have a single 750GB left over, so I'll probably find a mate for
> it and get it added to. After that we're maxed out on hardware for a
> while.
> 
> Mike

I'll stare at the code a bit more and see if anything jumps out at me.

Thanks,
NeilBrown



* Re: New RAID causing system lockups
  2010-09-21  2:26                           ` Neil Brown
@ 2010-09-21 11:28                             ` Mike Hartman
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Hartman @ 2010-09-21 11:28 UTC (permalink / raw)
  To: Neil Brown, linux-raid

On Mon, Sep 20, 2010 at 10:26 PM, Neil Brown <neilb@suse.de> wrote:
> On Wed, 15 Sep 2010 17:49:44 -0400
> Mike Hartman <mike@hartmanipulation.com> wrote:
>
>> >> Hmmm..
>> >>  Can you try mounting with
>> >>    -o barrier=0
>> >>
>> >> just to see if my theory is at all correct?
>> >>
>> >> Thanks,
>> >> NeilBrown
>> >>
>> >
>>
>> Progress report:
>>
>> I made the barrier change shortly after sending my last message (about
>> 40 hours ago). With that in place, I was able to finish emptying one
>> of the non-assimilated drives onto the array, after which I added that
>> drive as a hot spare and started the process to grow the array onto it
>> - the same procedure I've been applying since I created the RAID the
>> other week. No problems so far, and the reshape is at 46%.
>>
>> It's hard to be positive that the barrier deactivation is responsible
>> yet though - while the last few lockups have only been 1-16 hours
>> apart, I believe the first two had at least 2 or 3 days between them.
>> I'll keep the array busy to enhance the chances of a lockup though -
>> each one so far has been during a reshape or a large batch of writing
>> to the array's partition. If I make it another couple days (meaning
>> time for this reshape to complete, another drive to be emptied onto
>> the array, and another reshape at least started) I'll be pretty
>> confident the problem has been identified.
>
> Thanks for the update.
>
>>
>> Assuming the barrier is the culprit (and I'm pretty sure you're right)
>> what are the consequences of just leaving it off? I gather the idea of
>> the barrier is to prevent journal corruption in the event of a power
>> failure or other sudden shutdown, which seems pretty important, but it
>> also doesn't seem like it was enabled by default in ext3/4 until 2008,
>> which makes it seem less critical.
>
> Correct.  Without the barriers the chance of corruption during powerfail is
> higher.  I don't really know how much higher, it depends a lot on the
> filesystem design and the particular implementation.  I think ext4 tends to
> be fairly safe - after all some devices don't support barriers and it has to
> do best-effort on those too.
>
>>
>> Even if the ultimate solution for me is to just leave it disabled I'm
>> happy to keep trying patches if you want to get it properly fixed in
>> md. We may have to come up with an alternate way to work the array
>> hard enough to trigger the lockups though - my last 1.5TB drive is
>> what's being merged in now. After that completes I only have one more
>> pair of 750GBs (that will have to be shoehorned in using RAID0 again).
>> I do have a single 750GB left over, so I'll probably find a mate for
>> it and get it added to. After that we're maxed out on hardware for a
>> while.
>>
>> Mike
>
> I'll stare at the code a bit more and see if anything jumps out at me.
>
> Thanks,
> NeilBrown
>
>

I've just finished my last grow-and-copy with no problems. The only
drive that's not part of the array now is the leftover 750GB, which is
now empty. I haven't experienced any further lockups, so your barrier
diagnosis seems to be spot on. I'm planning to just leave that option
turned off, but as I said, I'm happy to test any patches you come up
with. Thanks for all your help.
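
For reference, leaving barriers off just means keeping the barrier=0
mount option, e.g. remounting by hand:

mount -o remount,barrier=0 /mnt/array

or the equivalent /etc/fstab entry (the mount point here is only a
stand-in for wherever the array's filesystem actually lives):

/dev/md0   /mnt/array   ext4   defaults,barrier=0   0 2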

Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* New RAID causing system lockups
@ 2010-09-11 18:13 Mike Hartman
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Hartman @ 2010-09-11 18:13 UTC (permalink / raw)
  To: linux-raid

PART 2:

Eventually I realized that while I couldn't do anything with bash, I
could run (some) commands directly via ssh (ssh odin <command>) and
they would work ok. I was able to run dmesg and cat some files, and I
could ls some directories for a while, but eventually couldn't anymore.
I was NOT able to cat /proc/mdstat - it would just hang. Attached
(dmesg_1.txt) is the dmesg output I got, which seems to include
everything from the start of the reshaping up to the lockup. The RAID
system definitely seems to be involved.
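
(By "directly via ssh" I mean one-shot invocations rather than an
interactive shell, roughly like:

ssh odin dmesg > dmesg_1.txt
ssh odin cat /proc/mdstat        # this one just hung
)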

After waiting a day or so with no change and nothing else working I
gritted my teeth and did a hard reboot, hoping my array wasn't totally
hosed. Fortunately, I was able to reassemble the array using the
backup file specified as part of my conversion command and the
reshaping picked back up where it left off. It completed without
further incident (took about 4 days).
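
The reassemble step was along these lines - the device names and the
backup file path here are placeholders rather than my exact command:

mdadm --assemble /dev/md0 --backup-file=/reshape_md0.bak /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/md1p1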

Once the reshaping was complete I ran fsck on its filesystem (came
back clean even when forced), mounted it, and everything looked ok. No
files appeared to be lost. Chalking the freeze up to a one-time
problem related to the reshaping, I started copying all the data from
one of the other 1.5TB drives onto md0. (The idea is to keep
copying each drive's contents into the array, wiping it, adding it as
a hot spare, and then growing the RAID and its filesystem
accordingly.)
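
In outline, each of those rounds looks something like this - the device
names, mount points and backup file path are placeholders, and the
--raid-devices count goes up by one each pass:

rsync -a /mnt/olddisk/ /mnt/array/        # copy the next drive's contents onto the array
mdadm /dev/md0 --add /dev/sdX1            # after repartitioning it, add it as a hot spare
mdadm --grow /dev/md0 --raid-devices=5 --backup-file=/reshape_md0.bak
resize2fs /dev/md0                        # grow the ext4 filesystem once the reshape finishes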

When I was almost done (the 1.5TB only had 55GB left on it), the system
hung again. Same symptoms as before. I was able to run dmesg again
(dmesg_2.txt) and the call trace looks pretty similar. It still
mentions the RAID system a good bit, even though no high level RAID
operations were going on and I was just writing to the array. This
time I only waited an hour or two before giving up and opting for the
hard reboot. Once again the array seemed to be ok after it was brought
back up.

Seems to be a fairly fundamental problem, whatever it is, and anything
that causes a lockup like this is a pretty big bug in a stable kernel.
The individual drives test out fine with everything I've tried.
Everything looks completely healthy until these lockups occur. I've
attached my lspci and kernel config in case there's something useful
in there.

Any ideas?

Mike

^ permalink raw reply	[flat|nested] 17+ messages in thread

* New RAID causing system lockups
@ 2010-09-11 18:12 Mike Hartman
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Hartman @ 2010-09-11 18:12 UTC (permalink / raw)
  To: linux-raid

I've tried sending this twice now (it was my first post to the list)
but it never seems to make it through. Resending in multiple parts to
see if it's just too long.

PART 1:

First let me outline where I am and how I got to this point.

About a week ago I created a RAID array on my Gentoo server. I already
had a handful of full, independent drives on that server, and 3 new
empty ones. The three new 1.5TB SATA drives are in an external e-SATA
enclosure, along with 2 of the existing drives (750GB each). The
e-SATA enclosure is connected to the server with a Syba SD-SA2PEX-2E
card (SIL3132 chipset) since it supports port multipliers. The other 4
drives (2 1.5TB, 2 750GB) are still mounted in the server itself.

My goal was to end up with all the drives (9) in a single RAID 6 array
to use as a storage partition (not for any system files). I only had 3
clean ones, so I wanted to start with RAID 5, use that new space to
clear off some of the other drives, and bootstrap up to a RAID 6.

My first step was to update to the newest stable Gentoo kernel
(2.6.35-gentoo-r4) to be sure I had reasonably current mdadm support.
No problems during that upgrade.

Then I created 1.5 TB partitions (type 0xDA) on each of the 3 new
(empty) drives and assembled them into a RAID 5 array (md0). Once that
was finished resyncing I created an ext4 filesystem and started
copying over everything that was on the 2 750GB drives in the same
enclosure.
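
That initial setup amounted to something like the following - the drive
letters here are just examples, not my actual devices:

mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
mkfs.ext4 /dev/md0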

Once that was done (no problems) and the 750s were empty I created a
RAID 0 (md1) from them. I created a 1.5TB partition on md1 just like I
had on the bare drives, and then added that partition to md0 as a hot
spare. I've seen that approach in several RAID tutorials - it seems
like the only way to get these undersized drives into the same RAID 6.
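
Roughly, again with example device names (the single partition on md1
shows up as md1p1):

mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sde1 /dev/sdf1
# create one 1.5TB partition (type 0xDA) on md1, then add it as a spare:
mdadm /dev/md0 --add /dev/md1p1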

Then I switched md0 over to a RAID 6, using that hot spare. The
reshaping was SLOW (4MB/s) but that seems to be par for the course in
a RAID5->RAID6 transition.
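
The level change itself was a single grow command, something along these
lines (the backup file path is just an example):

mdadm --grow /dev/md0 --level=6 --raid-devices=4 --backup-file=/raid5to6.bak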

It was during this reshaping that I saw my first lockup. I was
monitoring things via SSH, and the reshaping was about 13% complete.
The filesystem was mounted but wasn't being written to (or even read
much). I noticed my SSH session had stopped responding, so I tried
creating a new one in a fresh terminal. I was able to enter my
password, see the MOTD, and get a prompt, but couldn't type anything
into it. I tried this several times with no luck. I physically sat down
at the computer (no X running) and couldn't even get the screen to wake
up. The monitor's LED made it seem awake, but I only got a black
screen and couldn't even Ctrl-Alt-F2 to get a fresh terminal.

CONTINUED IN PART 2

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-09-21 11:28 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-11 18:20 New RAID causing system lockups Mike Hartman
2010-09-11 18:45 ` Mike Hartman
2010-09-11 20:43 ` Neil Brown
2010-09-11 20:56   ` Mike Hartman
2010-09-13  6:28     ` Mike Hartman
2010-09-13 15:57       ` Mike Hartman
2010-09-13 23:51         ` Neil Brown
     [not found]           ` <AANLkTin=jy=xJTtN5mQ6U=rYw3p+_4-nmkhO7zqR0KLP@mail.gmail.com>
2010-09-14  1:11             ` Mike Hartman
2010-09-14  1:35               ` Neil Brown
2010-09-14  2:50                 ` Mike Hartman
2010-09-14  3:35                   ` Mike Hartman
2010-09-14  3:48                     ` Neil Brown
     [not found]                       ` <AANLkTimXabL-TyjqJ81syrx-Oxn50qexbA8q9p22sxJt@mail.gmail.com>
2010-09-15 21:49                         ` Mike Hartman
2010-09-21  2:26                           ` Neil Brown
2010-09-21 11:28                             ` Mike Hartman
  -- strict thread matches above, loose matches on Subject: below --
2010-09-11 18:13 Mike Hartman
2010-09-11 18:12 Mike Hartman
