From: Neil Brown <neilb@suse.de>
To: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@osdl.org>,
	Reuben Farrelly <reuben-lkml@reub.net>,
	linux-kernel@vger.kernel.org
Subject: Re: 2.6.15-mm2
Date: Wed, 11 Jan 2006 15:16:40 +1100
Message-ID: <17348.34472.105452.831193@cse.unsw.edu.au>
In-Reply-To: message from Ingo Molnar on Tuesday January 10

On Tuesday January 10, mingo@elte.hu wrote:
> 
> * Andrew Morton <akpm@osdl.org> wrote:
> 
> > Reuben Farrelly <reuben-lkml@reub.net> wrote:
> > >
> > > Ok here's the latest one, this time with KALLSYMS_ALL, CONFIG_FRAME_POINTER, 
> > >  CONFIG_DETECT_SOFTLOCKUP and the DEBUG_WARN_ON(current->state != TASK_RUNNING); 
> > >  patch from Ingo.
> > 
> > This is quite ugly.  I'd be suspecting a block layer problem: RAID or 
> > the underlying device driver (ahci) has lost an IO.
> 
> yeah, now it more looks like that to me too. What happens is a raid1 
> resync happens in the background - which is one of the more complex 
> raid1 workloads - and there've been a good number of md patches 
> recently. Reuben, does -git5 show the same symptoms?

There isn't a resync happening - if there was you would see a process
called
   mdX_resync
(for some X).
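
As a rough illustration of that check, here is a minimal sketch that
filters a list of process names for anything matching mdX_resync.  It
assumes X is a plain number, and the process list shown is made up
for illustration (on a real system it could come from 'ps').

```python
import re

# Thread name pattern from the text above: md<X>_resync.
# Assumption: X is a decimal array number (md0, md2, ...).
RESYNC_NAME = re.compile(r"^md\d+_resync$")


def resync_threads(process_names):
    """Return the names that look like md resync threads."""
    return [name for name in process_names if RESYNC_NAME.match(name)]


# Hypothetical process list, for illustration only.
names = ["pdflush", "md2_raid1", "md0_raid1", "kjournald"]
print(resync_threads(names))
```

An empty result means no resync thread is present in the list.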

What I see here is:
 pdflush at:
Call Trace:
  [<c02a2f72>] md_write_start+0xbc/0x150
  [<c029a659>] make_request+0x51/0x432
  [<c01e1146>] generic_make_request+0xbe/0x13d
  [<c01e120e>] submit_bio+0x49/0xd3

So it is trying to write to a raid1 which was 'clean' and needs to
be marked 'dirty' (or 'active') before the first write.
md_write_start arranges for the array's thread to do this.
What is that thread doing?

md2_raid1     D F7227200     0   386     11           390   382 (L-TLB)
  ...
Call Trace:
  [<c029d004>] md_super_wait+0xd5/0xea
  [<c02a4f93>] bitmap_unplug+0x1d8/0x1df
  [<c029b72b>] raid1d+0x7d/0x555
  [<c02a211a>] md_thread+0x44/0x14f

It probably hasn't tried to write out the superblock, and just
now it is writing out some write-intent-bitmap entries and waiting
for the write to complete.

md_super_wait is waiting for 'pending_writes' to become zero.
It is incremented when any superblock or bitmap write starts, and
is decremented when that write completes.
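
The counting scheme can be modelled in user space roughly as follows.
This is only a toy sketch, not the kernel code: the class and method
names are invented for illustration, and the timeout exists only so
the example terminates (the real wait has no timeout).

```python
import threading


class PendingWrites:
    """Toy user-space model of the counter described above."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()

    def write_started(self):
        # Called when a superblock or bitmap write is issued.
        with self._cond:
            self._count += 1

    def write_completed(self):
        # Called when a write finishes; wake waiters once nothing is pending.
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def super_wait(self, timeout=None):
        """Block until the count is zero; return False if the timeout expires."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout)


pw = PendingWrites()
pw.write_started()
pw.write_started()
pw.write_completed()
# The second completion never arrives, so the waiter gives up.
print(pw.super_wait(timeout=0.1))
```

With one completion missing, the waiter never sees zero, which is the
kind of hang a single lost write would produce.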

So a lost write request in one of the components of the array could
cause this, but it is too easy to simply blame it on someone else....

But there is something I don't understand....

If md2_raid1 is in bitmap_unplug, that means there are outstanding
write requests to md2_raid1, so the one that pdflush is currently
generating cannot be the first.

This suggests that pdflush is not writing to md2, but to something
else.
Ahhhh.. md0_raid1 is also blocked:
Call Trace:
  [<c029d004>] md_super_wait+0xd5/0xea
  [<c029ec29>] md_update_sb+0xc9/0x153
  [<c02a3a20>] md_check_recovery+0x182/0x437
  [<c029b6cd>] raid1d+0x1f/0x555

It has just updated the superblocks for md0 and is waiting for those
writes to complete.  But they don't seem to want to complete.

So it seems that two raid1 arrays are blocked in slightly different
places.

I'm tempted to blame the IO scheduler, only because there have been
vaguely similar problems in the recent past that can be avoided by
changing the scheduler.

Reuben:  could you check what IO scheduler your drives are using, and 
try changing it.  I suspect they use 'as' by default.  Try 'cfq' or
'deadline'.
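
A minimal sketch of checking and changing it through sysfs, assuming
the kernel exposes /sys/block/<dev>/queue/scheduler and marks the
active scheduler with square brackets.  The device name is whatever
your drives are called (e.g. sda, an assumption here), writing
requires root, and 'as' is listed there as 'anticipatory'.

```python
def parse_schedulers(text):
    """Split the sysfs scheduler line into (active, available)."""
    active = None
    available = []
    for token in text.split():
        # The active scheduler is shown in brackets, e.g. "[anticipatory]".
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def read_schedulers(device):
    """Read the scheduler line for one block device, e.g. 'sda'."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return parse_schedulers(f.read())


def set_scheduler(device, name):
    """Select a scheduler by name (needs root), e.g. 'cfq' or 'deadline'."""
    with open(f"/sys/block/{device}/queue/scheduler", "w") as f:
        f.write(name)


# Sample line for illustration; real contents depend on the kernel config.
print(parse_schedulers("noop [anticipatory] deadline cfq"))
```

Calling read_schedulers() on each component drive of the array shows
what is in use, and set_scheduler() switches it without a reboot.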

NeilBrown
