linux-kernel.vger.kernel.org archive mirror
* bdflush and postgres stuck in D state
From: Jakob Østergaard @ 2001-09-18 10:56 UTC (permalink / raw)
  To: linux-kernel


Hello,

I have a machine here running RedHat 7.0 plus official updates and a
kernel.org kernel:

[osprey:joe] $ uname -a
Linux osprey 2.4.7 #1 Sat Jul 21 21:50:21 CEST 2001 i686 unknown

[osprey:joe] $ gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-85)

/ is mounted on a four-disk (software) RAID-5.  The same four disks are also
used for a RAID-1, but the fs on there is mounted somewhere irrelevant for
this post.

All fs are ext2.  The machine is not heavily loaded at all.  What made me
wonder was that the load on the machine was '1' some days ago and '2' today.
The machine usually has 90%+ CPU idle, and doesn't use the disks very much.
Looking at 'ps' shows:

[osprey:joe] $ ps ax|grep ' D'
    6 ?        DW     0:26 [bdflush]
10023 ?        D      0:00 /usr/bin/postmaster -D /var/lib/pgsql/data

But there is *NO* disk activity. The processes are just stuck.   As far as
I can see, there's nothing even remotely suspicious in dmesg.
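
The `ps ax|grep ' D'` check above matches the string ' D' anywhere on the line, so it can also pick up sleeping processes whose command line happens to contain it. Below is a minimal sketch of a stricter filter that looks only at the STAT column. It assumes the five-column `ps ax` layout shown above (PID, TTY, STAT, TIME, COMMAND). The function name `uninterruptible`, the header row and the `init` sample row are illustrative and not from the original post.

```python
def uninterruptible(ps_output):
    """Return (pid, stat, command) for each process whose STAT starts with 'D'."""
    stuck = []
    for line in ps_output.splitlines():
        # Assumed layout: PID TTY STAT TIME COMMAND (command may contain spaces).
        fields = line.split(None, 4)
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip the header row and anything malformed
        pid, _tty, stat, _time, command = fields
        if stat.startswith("D"):
            stuck.append((int(pid), stat, command))
    return stuck


# Sample input: the two D-state rows are copied from the post above;
# the header and the 'init' row are made up for illustration.
ps_sample = """\
  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:04 init
    6 ?        DW     0:26 [bdflush]
10023 ?        D      0:00 /usr/bin/postmaster -D /var/lib/pgsql/data
"""

for pid, stat, command in uninterruptible(ps_sample):
    print(pid, stat, command)
```

Filtering on the STAT field rather than on the whole line avoids false matches, at the cost of depending on the column order of this particular `ps` invocation.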

Any ideas ?    I can dig further before rebooting and trying 2.4.9
if someone tells me where to dig...

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:


* Re: bdflush and postgres stuck in D state
From: Jakob Østergaard @ 2001-09-18 17:30 UTC (permalink / raw)
  To: linux-kernel


Sorry for following up on my own post, I have a little extra
information.

I started a g++ job to try to force the machine to write out some dirty
buffers before I reboot.   g++ now hangs along with two sync's, bdflush
and the postgres process.

This is from top:

  PID USER     PRI  NI  SIZE  RSS SHARE WCHAN     STAT %CPU %MEM   TIME COMMAN
    6 root       9   0     0    0     0 raid1_all DW    0.0  0.0   0:26 bdflush
 1140 joe        9   0 71564  39M     0 wait_on_b D     0.0 32.3   1:04 cc1plus
 1007 root       9   0    72    4     4 wait_on_b D     0.0  0.0   0:00 sync
10023 postgres   9   0   368    4     4 wait_on_b D     0.0  0.0   0:00 postmas


Seems like something's rotten with bdflush and raid1_all (-something).
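
The WCHAN column is what singles out bdflush here: every other stuck process waits in `wait_on_b`, while bdflush alone waits in `raid1_all`. Below is a small sketch that groups the rows of the top output above by wait channel. It assumes the 13-column layout shown in that output. The function name `group_by_wchan` is hypothetical, and the truncated names (`raid1_all`, `wait_on_b`, `COMMAN`) are kept exactly as top printed them.

```python
from collections import defaultdict


def group_by_wchan(top_output):
    """Map each WCHAN value to the list of commands waiting in it."""
    groups = defaultdict(list)
    for line in top_output.splitlines():
        fields = line.split()
        # Assumed layout: PID USER PRI NI SIZE RSS SHARE WCHAN STAT %CPU %MEM TIME COMMAND
        if len(fields) != 13 or not fields[0].isdigit():
            continue  # skip the header row and anything that does not fit
        wchan, command = fields[7], fields[12]
        groups[wchan].append(command)
    return dict(groups)


# Rows copied from the top output quoted above.
top_sample = """\
  PID USER     PRI  NI  SIZE  RSS SHARE WCHAN     STAT %CPU %MEM   TIME COMMAN
    6 root       9   0     0    0     0 raid1_all DW    0.0  0.0   0:26 bdflush
 1140 joe        9   0 71564  39M     0 wait_on_b D     0.0 32.3   1:04 cc1plus
 1007 root       9   0    72    4     4 wait_on_b D     0.0  0.0   0:00 sync
10023 postgres   9   0   368    4     4 wait_on_b D     0.0  0.0   0:00 postmas
"""

for wchan, commands in group_by_wchan(top_sample).items():
    print(wchan, commands)
```

A wait channel shared by only one process, with everything else queued behind buffer waits, is a hint about where to look first; it is not proof of the cause.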

There is one (software) RAID1 on four SCSI disks in the machine, perhaps RAID1
has a misfeature when more than just two disks are in the mirror ?

Anyway, this machine is going down now - I can't wait anymore, sorry. I wonder
how many file writes didn't make it...



* Re: bdflush and postgres stuck in D state
From: Andrew Morton @ 2001-09-18 17:49 UTC (permalink / raw)
  To: Jakob Østergaard; +Cc: linux-kernel

Jakob Østergaard wrote:
> 
> Sorry for following up on my own post, I have a little extra
> information.
> 
> I started a g++ job to try to force the machine to write out some dirty
> buffers before I reboot.   g++ now hangs along with two sync's, bdflush
> and the postgres process.
> 

Since 2.4.7 several bugs have been fixed in RAID1 which would
cause this, including a missing blockdevice unplug and failure
to hang onto the supposedly-reserved RAID1 buffer-heads.

-


* Re: bdflush and postgres stuck in D state
From: David Rees @ 2001-09-18 21:08 UTC (permalink / raw)
  To: linux-kernel

On Tue, Sep 18, 2001 at 10:49:10AM -0700, Andrew Morton wrote:
> Jakob Østergaard wrote:
> > 
> > Sorry for following up on my own post, I have a little extra
> > information.
> > 
> > I started a g++ job to try to force the machine to write out some dirty
> > buffers before I reboot.   g++ now hangs along with two sync's, bdflush
> > and the postgres process.
> > 
> 
> Since 2.4.7 several bugs have been fixed in RAID1 which would
> cause this, including a missing blockdevice unplug and failure
> to hang onto the supposedly-reserved RAID1 buffer-heads.

Even kernels as recent as 2.4.9 have this bug.  See this thread for more
info and a patch which fixes this bug.

The thread:
http://marc.theaimsgroup.com/?t=99911655500004&w=2&r=1

The patch:
http://marc.theaimsgroup.com/?l=linux-kernel&m=99913223508789&w=2

-Dave


* Re: bdflush and postgres stuck in D state
From: Jakob Østergaard @ 2001-09-19  9:26 UTC (permalink / raw)
  To: David Rees, linux-kernel

On Tue, Sep 18, 2001 at 02:08:20PM -0700, David Rees wrote:
> On Tue, Sep 18, 2001 at 10:49:10AM -0700, Andrew Morton wrote:
> > Jakob Østergaard wrote:
> > > 
> > > Sorry for following up on my own post, I have a little extra
> > > information.
> > > 
> > > I started a g++ job to try to force the machine to write out some dirty
> > > buffers before I reboot.   g++ now hangs along with two sync's, bdflush
> > > and the postgres process.
> > > 
> > 
> > Since 2.4.7 several bugs have been fixed in RAID1 which would
> > cause this, including a missing blockdevice unplug and failure
> > to hang onto the supposedly-reserved RAID1 buffer-heads.
> 
> Even kernels as recent as 2.4.9 have this bug.  See this thread for more
> info and a patch which fixes this bug.
> 
> The thread:
> http://marc.theaimsgroup.com/?t=99911655500004&w=2&r=1
> 
> The patch:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=99913223508789&w=2


Thanks a lot !

Somehow I seem to have lost "most" linux-raid mails, dunno why...  I hadn't
seen that thread before, but it was indeed the problem I saw here too.

I didn't lose any data on the 2.4.7 that did this, but it seems the situation
is more severe in 2.4.9, leading potentially to significant data loss.

/me prepares another boot (and a spare 32MB stick) for the raid-1 box


