* 2.6.20-rc4-mm1 md problem
@ 2007-01-12 13:33 Michal Piotrowski
2007-01-12 15:52 ` Rafael J. Wysocki
2007-01-15 7:17 ` Ingo Molnar
0 siblings, 2 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-12 13:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: Ingo Molnar, Neil Brown, LKML
My system hangs on this
http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
Debug plan:
- revert md-* patches
- binary search
Does someone have a better idea?
Regards,
Michal
--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem
2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski
@ 2007-01-12 15:52 ` Rafael J. Wysocki
2007-01-12 17:40 ` Michal Piotrowski
2007-01-15 7:17 ` Ingo Molnar
1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2007-01-12 15:52 UTC (permalink / raw)
To: Michal Piotrowski; +Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML
On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> My system hangs on this
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
>
> Debug plan:
> - revert md-* patches
> - binary search
>
> Does someone have a better idea?
Revert git-block.patch and related stuff?
Greetings,
Rafael
--
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem
2007-01-12 15:52 ` Rafael J. Wysocki
@ 2007-01-12 17:40 ` Michal Piotrowski
2007-01-12 17:58 ` Michal Piotrowski
0 siblings, 1 reply; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-12 17:40 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML, Jens Axboe
On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> > My system hangs on this
> > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> >
> > Debug plan:
> > - revert md-* patches
> > - binary search
> >
> > Does someone have a better idea?
>
> Revert git-block.patch and related stuff?
Indeed.
GOOD
#
##git-sym2.patch
#git-scsi-target.patch
#git-scsi-target-fixup.patch
#
git-block.patch
git-block-fixup.patch
BAD
git-block.patch it's huge patch.
diffstat git-block.patch
[..]
drivers/md/bitmap.c | 1
drivers/md/dm-emc.c | 2
drivers/md/dm-table.c | 14 -
drivers/md/dm.c | 18 -
drivers/md/dm.h | 1
drivers/md/linear.c | 14 -
drivers/md/md.c | 3
drivers/md/multipath.c | 32 --
drivers/md/raid0.c | 17 -
drivers/md/raid1.c | 70 -----
drivers/md/raid10.c | 73 ------
drivers/md/raid5.c | 60 ----
[..]
I'll do a binary search through those files.
>
> Greetings,
> Rafael
Regards,
Michal
--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem
2007-01-12 17:40 ` Michal Piotrowski
@ 2007-01-12 17:58 ` Michal Piotrowski
2007-01-14 21:59 ` Jens Axboe
0 siblings, 1 reply; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-12 17:58 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Andrew Morton, Ingo Molnar, Neil Brown, LKML, Jens Axboe, Jens Axboe
On 12/01/07, Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
> On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> > On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> > > My system hangs on this
> > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> > > http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> > >
> > > Debug plan:
> > > - revert md-* patches
> > > - binary search
> > >
> > > Does someone have a better idea?
> >
> > Revert git-block.patch and related stuff?
>
> Indeed.
>
> GOOD
> #
> ##git-sym2.patch
> #git-scsi-target.patch
> #git-scsi-target-fixup.patch
> #
> git-block.patch
> git-block-fixup.patch
> BAD
>
> git-block.patch it's huge patch.
>
> diffstat git-block.patch
> [..]
> drivers/md/bitmap.c | 1
> drivers/md/dm-emc.c | 2
> drivers/md/dm-table.c | 14 -
> drivers/md/dm.c | 18 -
> drivers/md/dm.h | 1
> drivers/md/linear.c | 14 -
> drivers/md/md.c | 3
> drivers/md/multipath.c | 32 --
> drivers/md/raid0.c | 17 -
> drivers/md/raid1.c | 70 -----
> drivers/md/raid10.c | 73 ------
> drivers/md/raid5.c | 60 ----
> [..]
>
> I'll do a binary search through those files.
>
That wasn't a good idea.
/mnt/md0/devel/linux-mm/drivers/md/raid1.c: In function 'unplug_slaves':
/mnt/md0/devel/linux-mm/drivers/md/raid1.c:556: error:
'request_queue_t' has no member named 'unplug_fn'
Regards,
Michal
--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem
2007-01-12 17:58 ` Michal Piotrowski
@ 2007-01-14 21:59 ` Jens Axboe
0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2007-01-14 21:59 UTC (permalink / raw)
To: Michal Piotrowski
Cc: Rafael J. Wysocki, Andrew Morton, Ingo Molnar, Neil Brown, LKML
On Fri, Jan 12 2007, Michal Piotrowski wrote:
> On 12/01/07, Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
> >On 12/01/07, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> >> On Friday, 12 January 2007 14:33, Michal Piotrowski wrote:
> >> > My system hangs on this
> >> >
> >http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> >> >
> >http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
> >> >
> >> > Debug plan:
> >> > - revert md-* patches
> >> > - binary search
> >> >
> >> > Does someone have a better idea?
> >>
> >> Revert git-block.patch and related stuff?
> >
> >Indeed.
> >
> >GOOD
> >#
> >##git-sym2.patch
> >#git-scsi-target.patch
> >#git-scsi-target-fixup.patch
> >#
> >git-block.patch
> >git-block-fixup.patch
> >BAD
> >
> >git-block.patch it's huge patch.
> >
> >diffstat git-block.patch
> >[..]
> >drivers/md/bitmap.c | 1
> > drivers/md/dm-emc.c | 2
> > drivers/md/dm-table.c | 14 -
> > drivers/md/dm.c | 18 -
> > drivers/md/dm.h | 1
> > drivers/md/linear.c | 14 -
> > drivers/md/md.c | 3
> > drivers/md/multipath.c | 32 --
> > drivers/md/raid0.c | 17 -
> > drivers/md/raid1.c | 70 -----
> > drivers/md/raid10.c | 73 ------
> > drivers/md/raid5.c | 60 ----
> >[..]
> >
> >I'll do a binary search through those files.
> >
>
> That wasn't a good idea.
>
> /mnt/md0/devel/linux-mm/drivers/md/raid1.c: In function 'unplug_slaves':
> /mnt/md0/devel/linux-mm/drivers/md/raid1.c:556: error:
> 'request_queue_t' has no member named 'unplug_fn'
You can't go throught he individual files and expect that to compile,
yet alone work. Fortunately I think I already fixed this last week,
unfortunately I think I forgot to push the #plug branch for #for-akpm so
that Andrew automatically picks it up...
--
Jens Axboe
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem
2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski
2007-01-12 15:52 ` Rafael J. Wysocki
@ 2007-01-15 7:17 ` Ingo Molnar
2007-01-15 8:08 ` Thomas Gleixner
1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2007-01-15 7:17 UTC (permalink / raw)
To: Michal Piotrowski
Cc: Andrew Morton, Neil Brown, LKML, Thomas Gleixner, Jens Axboe
* Michal Piotrowski <michal.k.k.piotrowski@gmail.com> wrote:
> My system hangs on this
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/bug2.jpg
> http://www.stardust.webpages.pl/files/tbf/euridica/2.6.20-rc4-mm1/mm-config
>
> Debug plan:
> - revert md-* patches
> - binary search
>
> Does someone have a better idea?
Thomas saw something similar yesterday and he the partial results that
git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or
filesystems drivers. For me it worked fine, so it must be only on some
combinations. The changes to ll_rw_block.c look quite extensive.
Ingo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 2.6.20-rc4-mm1 md problem
2007-01-15 7:17 ` Ingo Molnar
@ 2007-01-15 8:08 ` Thomas Gleixner
2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-01-15 8:08 UTC (permalink / raw)
To: Ingo Molnar
Cc: Michal Piotrowski, Andrew Morton, Neil Brown, LKML, Jens Axboe
On Mon, 2007-01-15 at 08:17 +0100, Ingo Molnar wrote:
> > Debug plan:
> > - revert md-* patches
> > - binary search
> >
> > Does someone have a better idea?
>
> Thomas saw something similar yesterday and he the partial results that
> git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or
> filesystems drivers. For me it worked fine, so it must be only on some
> combinations. The changes to ll_rw_block.c look quite extensive.
Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* [patch-mm] Workaround for RAID breakage
2007-01-15 8:08 ` Thomas Gleixner
@ 2007-01-15 18:03 ` Thomas Gleixner
2007-01-15 20:17 ` Michal Piotrowski
2007-01-16 0:41 ` Jens Axboe
0 siblings, 2 replies; 12+ messages in thread
From: Thomas Gleixner @ 2007-01-15 18:03 UTC (permalink / raw)
To: Ingo Molnar
Cc: Michal Piotrowski, Andrew Morton, Neil Brown, LKML, Jens Axboe
On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote:
> > Thomas saw something similar yesterday and he the partial results that
> > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or
> > filesystems drivers. For me it worked fine, so it must be only on some
> > combinations. The changes to ll_rw_block.c look quite extensive.
>
> Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.
I tracked this down and found two problems:
- The new plug/unplug code does not check for underruns. That allows the
plug count (ioc->plugged) to become negative. This gets triggered from
various places.
AFAICS this is intentional to avoid checks all over the place, but the
underflow check is missing. All we need to do is make sure, that in case
of ioc->plugged == 0 we return early and bug, if there is either a queue
plugged in or the plugged_list is not empty.
Jens ?
- The raid1 code has no bitmap set in remount r/w. So the
pending_bio_list gets not processed for quite a time. The workaround is
to kick mddev->thread, so the list is processed. Not sure about that.
Neil ?
At least it boots and behaves normal.
tglx
Index: linux-2.6.20-rc4-mm1/block/ll_rw_blk.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/block/ll_rw_blk.c
+++ linux-2.6.20-rc4-mm1/block/ll_rw_blk.c
@@ -3757,6 +3757,12 @@ void blk_unplug_current(void)
if (!ioc)
return;
+ if (!ioc->plugged) {
+ BUG_ON(!list_empty(&ioc->plugged_list));
+ BUG_ON(ioc->plugged_queue);
+ return;
+ }
+
ioc->plugged--;
if (ioc->plugged)
return;
Index: linux-2.6.20-rc4-mm1/drivers/md/raid1.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/drivers/md/raid1.c
+++ linux-2.6.20-rc4-mm1/drivers/md/raid1.c
@@ -897,7 +897,7 @@ static int make_request(request_queue_t
spin_unlock_irqrestore(&conf->device_lock, flags);
- if (do_sync)
+ if (do_sync || !bitmap)
md_wakeup_thread(mddev->thread);
#if 0
while ((bio = bio_list_pop(&bl)) != NULL)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage
2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
@ 2007-01-15 20:17 ` Michal Piotrowski
2007-01-16 0:41 ` Jens Axboe
1 sibling, 0 replies; 12+ messages in thread
From: Michal Piotrowski @ 2007-01-15 20:17 UTC (permalink / raw)
To: tglx; +Cc: Ingo Molnar, Andrew Morton, Neil Brown, LKML, Jens Axboe
On 15/01/07, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote:
> > > Thomas saw something similar yesterday and he the partial results that
> > > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or
> > > filesystems drivers. For me it worked fine, so it must be only on some
> > > combinations. The changes to ll_rw_block.c look quite extensive.
> >
> > Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.
>
> I tracked this down and found two problems:
>
> - The new plug/unplug code does not check for underruns. That allows the
> plug count (ioc->plugged) to become negative. This gets triggered from
> various places.
>
> AFAICS this is intentional to avoid checks all over the place, but the
> underflow check is missing. All we need to do is make sure, that in case
> of ioc->plugged == 0 we return early and bug, if there is either a queue
> plugged in or the plugged_list is not empty.
>
> Jens ?
>
> - The raid1 code has no bitmap set in remount r/w. So the
> pending_bio_list gets not processed for quite a time. The workaround is
> to kick mddev->thread, so the list is processed. Not sure about that.
>
> Neil ?
>
> At least it boots and behaves normal.
Yes, it works fine now. Thanks!
> tglx
Regards,
Michal
--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage
2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
2007-01-15 20:17 ` Michal Piotrowski
@ 2007-01-16 0:41 ` Jens Axboe
2007-01-16 8:27 ` Thomas Gleixner
1 sibling, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2007-01-16 0:41 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML
On Mon, Jan 15 2007, Thomas Gleixner wrote:
> On Mon, 2007-01-15 at 09:08 +0100, Thomas Gleixner wrote:
> > > Thomas saw something similar yesterday and he the partial results that
> > > git.block (between rc2-mm1 and rc4-mm1) breaks certain disk drivers or
> > > filesystems drivers. For me it worked fine, so it must be only on some
> > > combinations. The changes to ll_rw_block.c look quite extensive.
> >
> > Yes. Jens Axboe confirmed yesterday that the plug changes broke RAID.
>
> I tracked this down and found two problems:
>
> - The new plug/unplug code does not check for underruns. That allows the
> plug count (ioc->plugged) to become negative. This gets triggered from
> various places.
>
> AFAICS this is intentional to avoid checks all over the place, but the
> underflow check is missing. All we need to do is make sure, that in case
> of ioc->plugged == 0 we return early and bug, if there is either a queue
> plugged in or the plugged_list is not empty.
>
> Jens ?
It should not go negative, that would be a bug elsewhere. So it's
interesting if it does, we should definitely put a WARN_ON() check in
there for that.
> - The raid1 code has no bitmap set in remount r/w. So the
> pending_bio_list gets not processed for quite a time. The workaround is
> to kick mddev->thread, so the list is processed. Not sure about that.
>
> Neil ?
Super, thanks for that Thomas! I'll merge it in the plug branch.
--
Jens Axboe
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage
2007-01-16 0:41 ` Jens Axboe
@ 2007-01-16 8:27 ` Thomas Gleixner
2007-01-16 10:07 ` Jens Axboe
0 siblings, 1 reply; 12+ messages in thread
From: Thomas Gleixner @ 2007-01-16 8:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML
On Tue, 2007-01-16 at 11:41 +1100, Jens Axboe wrote:
> > AFAICS this is intentional to avoid checks all over the place, but the
> > underflow check is missing. All we need to do is make sure, that in case
> > of ioc->plugged == 0 we return early and bug, if there is either a queue
> > plugged in or the plugged_list is not empty.
> >
> > Jens ?
>
> It should not go negative, that would be a bug elsewhere. So it's
> interesting if it does, we should definitely put a WARN_ON() check in
> there for that.
Well. All offenders come via queue_sync_plugs(). queue_sync_plugs()
calls blk_unplug_current().
One path which triggers is blk_sync_queue(). This happens e.g. in the
cleanup of the floppy check. There are other call pathes too.
The other is raid md_super_write(). It is not plugged and calls with the
barrier bit set, which executes the unlikely path in __make_request():
if (unlikely(bio_barrier(bio))) {
queue_sync_plugs(q);
spin_lock_irq(q->queue_lock);
goto get_rq;
}
So you either need checks all over the place to avoid calling
blk_unplug_current(), or you prevent the unplug below 0 like I did. The
BUG_ON()s I added should catch any real invalid callers. But it's up to
you.
tglx
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [patch-mm] Workaround for RAID breakage
2007-01-16 8:27 ` Thomas Gleixner
@ 2007-01-16 10:07 ` Jens Axboe
0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2007-01-16 10:07 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Ingo Molnar, Michal Piotrowski, Andrew Morton, Neil Brown, LKML
On Tue, Jan 16 2007, Thomas Gleixner wrote:
> On Tue, 2007-01-16 at 11:41 +1100, Jens Axboe wrote:
> > > AFAICS this is intentional to avoid checks all over the place, but the
> > > underflow check is missing. All we need to do is make sure, that in case
> > > of ioc->plugged == 0 we return early and bug, if there is either a queue
> > > plugged in or the plugged_list is not empty.
> > >
> > > Jens ?
> >
> > It should not go negative, that would be a bug elsewhere. So it's
> > interesting if it does, we should definitely put a WARN_ON() check in
> > there for that.
>
> Well. All offenders come via queue_sync_plugs(). queue_sync_plugs()
> calls blk_unplug_current().
Ah, again a problem because the #plug branch wasn't pushed to #for-akpm
in due time. That problem was fixed already :/
> One path which triggers is blk_sync_queue(). This happens e.g. in the
> cleanup of the floppy check. There are other call pathes too.
>
> The other is raid md_super_write(). It is not plugged and calls with the
> barrier bit set, which executes the unlikely path in __make_request():
>
> if (unlikely(bio_barrier(bio))) {
> queue_sync_plugs(q);
> spin_lock_irq(q->queue_lock);
> goto get_rq;
> }
>
> So you either need checks all over the place to avoid calling
> blk_unplug_current(), or you prevent the unplug below 0 like I did. The
> BUG_ON()s I added should catch any real invalid callers. But it's up to
> you.
blk_replug_current_nested() should do it. I'll make sure the branch is
updated for next -mm.
The BUG_ON()'s are indeed correct, they should just be moved further
down the function.
--
Jens Axboe
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2007-01-16 10:08 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-12 13:33 2.6.20-rc4-mm1 md problem Michal Piotrowski
2007-01-12 15:52 ` Rafael J. Wysocki
2007-01-12 17:40 ` Michal Piotrowski
2007-01-12 17:58 ` Michal Piotrowski
2007-01-14 21:59 ` Jens Axboe
2007-01-15 7:17 ` Ingo Molnar
2007-01-15 8:08 ` Thomas Gleixner
2007-01-15 18:03 ` [patch-mm] Workaround for RAID breakage Thomas Gleixner
2007-01-15 20:17 ` Michal Piotrowski
2007-01-16 0:41 ` Jens Axboe
2007-01-16 8:27 ` Thomas Gleixner
2007-01-16 10:07 ` Jens Axboe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.