linux-kernel.vger.kernel.org archive mirror
* Re: OOM Killer killing whole system
@ 2006-01-23 12:55 Nicolas Mailhot
  0 siblings, 0 replies; 22+ messages in thread
From: Nicolas Mailhot @ 2006-01-23 12:55 UTC
  To: linux-kernel
  Cc: kalin, chase.venters, axboe, arjan, akpm, James.Bottomley,
	a.titov, davej, jgarzik

Chase Venters wrote:
> Just a shot in the dark, but in the last few kernel revisions have you
> experienced any SATA problems with DMA timeouts, in some versions leading to
> a hang?

Like this, for example?
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=177951
http://bugzilla.kernel.org/show_bug.cgi?id=5914

Jens Axboe wrote:
> Just a note - you seem to have the raid1 in common with the rest of the
> reporters so far.

This is a SATA x86-64 raid1 system too.
With recent git kernels, raid goes half-dead at boot, with FS corruption.

I think I also managed to make it OOM a few weeks ago (2.6.15 or
pre-2.6.15), which is quite a feat for a 2-GiB desktop. I didn't report it at
the time; the possible culprits were either ivtv or massive I/O (I
shuffled/processed ~10 GiB of picture data from SATA CDs and then all
around the FS). The memory was unreclaimable, and I didn't know to check the
slab caches at the time. (Also, when I did use ivtv, it generated some fat
mpeg2 files.)
The new corruption problem happens with kernels without ivtv (I only build it
for kernels that boot fine).

All the kernels are untainted (the only patches are davej's, plus v4l+ivtv in
one case).

Regards,

-- 
Nicolas Mailhot


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: OOM Killer killing whole system
  2006-01-21  3:29               ` Anton Titov
  2006-01-21  3:44                 ` Andrew Morton
@ 2006-01-25 22:20                 ` Matthias Urlichs
  1 sibling, 0 replies; 22+ messages in thread
From: Matthias Urlichs @ 2006-01-25 22:20 UTC
  To: linux-kernel; +Cc: linux-scsi

Hi, Anton Titov wrote:

> Just to mention, that 2.6.14.2 does not have this problem:

That's good, in that a couple of iterations with "git bisect" should
be able to pinpoint the bug.
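Bisection keeps the number of test kernels small. Each test halves the range of suspect commits, so the number of builds grows only with the logarithm of the number of commits between the good and bad releases. The C sketch below shows the idea under two simplifying assumptions. First, history is linear. Second, a hypothetical `is_bad` callback stands in for "build, boot and check /proc/slabinfo". Real `git bisect` walks the commit graph and is driven with `git bisect good` / `git bisect bad`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test callback: returns true if `commit` shows the bug. */
typedef bool (*is_bad_fn)(size_t commit, void *ctx);

/*
 * Linear-history model of bisection (git bisect also copes with merges;
 * this sketch does not). Index 0 is known good, index n - 1 is known bad,
 * and every commit after the first bad one is assumed to be bad too.
 * Requires n >= 2. Stores the number of probes in *tests and returns the
 * index of the first bad commit.
 */
size_t bisect_first_bad(size_t n, is_bad_fn is_bad, void *ctx,
                        unsigned *tests)
{
    size_t good = 0, bad = n - 1;   /* invariant: good is good, bad is bad */

    *tests = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        ++*tests;                   /* one kernel build + boot per probe */
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;                     /* adjacent to a good commit */
}

/* Hypothetical example predicate: the bug appears at commit *(size_t *)ctx. */
bool bad_from(size_t commit, void *ctx)
{
    return commit >= *(const size_t *)ctx;
}
```

For example, suppose 4096 commits separate the known-good and known-bad kernels. Then `bisect_first_bad(4097, ...)` needs 12 test boots.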

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
:hack on: vt. [very common] To {hack}; implies that the subject is some
   pre-existing hunk of code that one is evolving, as opposed to something
   one might {hack up}.




* Re: OOM Killer killing whole system
  2006-01-23  8:41             ` Arjan van de Ven
@ 2006-01-23  9:10               ` Chase Venters
  0 siblings, 0 replies; 22+ messages in thread
From: Chase Venters @ 2006-01-23  9:10 UTC
  To: Arjan van de Ven
  Cc: Jens Axboe, Andrew Morton, Anton Titov, linux-kernel, linux-scsi

On Monday 23 January 2006 02:41, Arjan van de Ven wrote:
> > Just a note - you seem to have the raid1 in common with the rest of the
> > reporters so far.
>
> Time to get out some of the obvious heavy hitters: enable slab debugging
> and CONFIG_DEBUG_PAGEALLOC, on the chance that they catch a random
> scribble of some sort.

Will do this tomorrow. Please note that my raid1 array is only used for the
unmounted /boot filesystem. mount says:

turbotaz linux # mount
/dev/md/1 on / type reiserfs (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
udev on /dev type tmpfs (rw,nosuid)
devpts on /dev/pts type devpts (rw)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
usbfs on /proc/bus/usb type usbfs (rw,devmode=0664,devgid=85)

/dev/md/1 is raid10.

Cheers,
Chase


* Re: OOM Killer killing whole system
  2006-01-23  8:39           ` Jens Axboe
@ 2006-01-23  8:41             ` Arjan van de Ven
  2006-01-23  9:10               ` Chase Venters
  0 siblings, 1 reply; 22+ messages in thread
From: Arjan van de Ven @ 2006-01-23  8:41 UTC
  To: Jens Axboe
  Cc: Chase Venters, Andrew Morton, Anton Titov, linux-kernel, linux-scsi

On Mon, 2006-01-23 at 09:39 +0100, Jens Axboe wrote:
> On Fri, Jan 20 2006, Chase Venters wrote:
> > On Friday 20 January 2006 16:49, Andrew Morton wrote:
> > > This is 2.6.15 and we have a deadly bug in scsi.
> > >
> > > Next time you reboot 2.6.15 on that machine can you please send the output
> > > of `dmesg -s 1000000'?  You might have to set CONFIG_LOG_BUF_SHIFT=17 to
> > > prevent it from being truncated.
> > 
> > Here's mine (attached). Curious - the -s... were you expecting the
> > ring buffer to exceed 16384? I don't think my (boot time) buffer does.
> 
> Just a note - you seem to have the raid1 in common with the rest of the
> reporters so far.

Time to get out some of the obvious heavy hitters: enable slab debugging
and CONFIG_DEBUG_PAGEALLOC, on the chance that they catch a random
scribble of some sort.



* Re: OOM Killer killing whole system
  2006-01-21  0:19         ` Chase Venters
  2006-01-21  0:50           ` Andrew Morton
@ 2006-01-23  8:39           ` Jens Axboe
  2006-01-23  8:41             ` Arjan van de Ven
  1 sibling, 1 reply; 22+ messages in thread
From: Jens Axboe @ 2006-01-23  8:39 UTC
  To: Chase Venters; +Cc: Andrew Morton, Anton Titov, linux-kernel, linux-scsi

On Fri, Jan 20 2006, Chase Venters wrote:
> On Friday 20 January 2006 16:49, Andrew Morton wrote:
> > This is 2.6.15 and we have a deadly bug in scsi.
> >
> > Next time you reboot 2.6.15 on that machine can you please send the output
> > of `dmesg -s 1000000'?  You might have to set CONFIG_LOG_BUF_SHIFT=17 to
> > prevent it from being truncated.
> 
> Here's mine (attached). Curious - the -s... were you expecting the
> ring buffer to exceed 16384? I don't think my (boot time) buffer does.

Just a note - you seem to have the raid1 in common with the rest of the
reporters so far.

-- 
Jens Axboe



* Re: OOM Killer killing whole system
  2006-01-23  2:56                       ` Kalin KOZHUHAROV
@ 2006-01-23  3:28                         ` Chase Venters
  0 siblings, 0 replies; 22+ messages in thread
From: Chase Venters @ 2006-01-23  3:28 UTC
  To: Kalin KOZHUHAROV; +Cc: linux-kernel, linux-scsi

On Sunday 22 January 2006 20:56, Kalin KOZHUHAROV wrote:
> I have two of these boards and one of them is constantly hanging, just
> simply dead. With 2.6.15 it reports failed I/O (SATA here) and mounts
> reiserfs root RO. sky2 works for me, but I had another hang, so sk98lin
> might not be the culprit.

Really? I had serious problems with mine hanging in earlier kernel revisions.
I haven't seen a hang yet on 2.6.15, but that may be because the scsi leak has
kept me from reaching a longer uptime.

When I hang, I get complaints about DMA timeouts / weird ATA port statuses as
the last messages on my serial console. After that, not even SysRq works.

Cheers,
Chase



* Re: OOM Killer killing whole system
  2006-01-21  4:35                     ` Chase Venters
@ 2006-01-23  2:56                       ` Kalin KOZHUHAROV
  2006-01-23  3:28                         ` Chase Venters
  0 siblings, 1 reply; 22+ messages in thread
From: Kalin KOZHUHAROV @ 2006-01-23  2:56 UTC
  To: linux-kernel; +Cc: linux-scsi

Chase Venters wrote:
> On Friday 20 January 2006 22:21, Anton Titov wrote:
>> On Fri, 2006-01-20 at 21:53 -0600, Chase Venters wrote:
>>> Random guess... Asus P5GDC-V with Firewire and USB turned off?
>> Exactly (Asus P5GDC-V Deluxe actually, with a few more things off). So
>> maybe it's ICH6?
> 
> Just a shot in the dark, but in the last few kernel revisions have you 
> experienced any SATA problems with DMA timeouts, in some versions leading to 
> a hang?

I have two of these boards and one of them is constantly hanging, just
simply dead. With 2.6.15 it reports failed I/O (SATA here) and mounts
reiserfs root RO. sky2 works for me, but I had another hang, so sk98lin
might not be the culprit.

The other box (the difference is the SATA drive and the CD) is working OK,
with almost 4 days of uptime now.

Will try to revive the black machine and report more.
(the bad machine is called black and the good is called white :-)

Kalin.

-- 
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|



* Re: OOM Killer killing whole system
  2006-01-21  4:21                   ` Anton Titov
@ 2006-01-21  4:35                     ` Chase Venters
  2006-01-23  2:56                       ` Kalin KOZHUHAROV
  0 siblings, 1 reply; 22+ messages in thread
From: Chase Venters @ 2006-01-21  4:35 UTC
  To: Anton Titov; +Cc: James Bottomley, linux-kernel, linux-scsi

On Friday 20 January 2006 22:21, Anton Titov wrote:
> On Fri, 2006-01-20 at 21:53 -0600, Chase Venters wrote:
> > Random guess... Asus P5GDC-V with Firewire and USB turned off?
>
> Exactly (Asus P5GDC-V Deluxe actually, with a few more things off). So
> maybe it's ICH6?

Just a shot in the dark, but in the last few kernel revisions have you 
experienced any SATA problems with DMA timeouts, in some versions leading to 
a hang?

Cheers,
Chase


* Re: OOM Killer killing whole system
  2006-01-21  3:53                 ` Chase Venters
@ 2006-01-21  4:21                   ` Anton Titov
  2006-01-21  4:35                     ` Chase Venters
  0 siblings, 1 reply; 22+ messages in thread
From: Anton Titov @ 2006-01-21  4:21 UTC
  To: Chase Venters; +Cc: James Bottomley, linux-kernel, linux-scsi

On Fri, 2006-01-20 at 21:53 -0600, Chase Venters wrote:
> Random guess... Asus P5GDC-V with Firewire and USB turned off?

Exactly (Asus P5GDC-V Deluxe actually, with a few more things off). So
maybe it's ICH6?



* Re: OOM Killer killing whole system
  2006-01-21  3:45               ` Anton Titov
@ 2006-01-21  3:53                 ` Chase Venters
  2006-01-21  4:21                   ` Anton Titov
  0 siblings, 1 reply; 22+ messages in thread
From: Chase Venters @ 2006-01-21  3:53 UTC
  To: Anton Titov; +Cc: James Bottomley, Andrew Morton, linux-kernel, linux-scsi

On Friday 20 January 2006 21:44, Anton Titov wrote:
> 00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to I/O Controller (rev 0e)
> 00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Express Chipset Family Graphics Controller (rev 0e)
> 00:02.1 Display controller: Intel Corporation 82915G Express Chipset Family Graphics Controller (rev 0e)
> 00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
> 00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2 (rev 03)
> 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
> 00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
> 00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
> 00:1f.2 SATA controller: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03)
> 00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
> 01:04.0 Mass storage controller: <pci_lookup_name: buffer too small> (rev 13)
> 02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)

Random guess... Asus P5GDC-V with Firewire and USB turned off?

00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to I/O Controller (rev 04)
00:01.0 PCI bridge: Intel Corporation 915G/P/GV/GL/PL/910GL PCI Express Root Port (rev 04)
00:1b.0 Class 0403: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) High Definition Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #3 (rev 03)
00:1d.3 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB UHCI #4 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
00:1f.2 Class 0106: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
01:03.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
01:04.0 Mass storage controller: <pci_lookup_name: buffer too small> (rev 13)
01:09.0 Multimedia audio controller: Creative Labs SB Audigy (rev 04)
01:09.1 Input device controller: Creative Labs SB Audigy MIDI/Game port (rev 04)
01:09.2 FireWire (IEEE 1394): Creative Labs SB Audigy FireWire Port (rev 04)
01:0a.0 SCSI storage controller: Adaptec AHA-7850 (rev 03)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 Gigabit Ethernet Controller (rev 15)
04:00.0 VGA compatible controller: nVidia Corporation Unknown device 0092 (rev a1)

Also using Marvell's sk98lin driver (IIRC, sky2 should supersede it soon
enough). This is the only machine I'm using sk98lin on, but I haven't had any
trouble with it on prior kernels.

Thanks,
Chase


* Re: OOM Killer killing whole system
  2006-01-21  1:17             ` James Bottomley
  2006-01-21  3:29               ` Anton Titov
@ 2006-01-21  3:45               ` Anton Titov
  2006-01-21  3:53                 ` Chase Venters
  1 sibling, 1 reply; 22+ messages in thread
From: Anton Titov @ 2006-01-21  3:45 UTC
  To: James Bottomley; +Cc: Andrew Morton, Chase Venters, linux-kernel, linux-scsi

On Fri, 2006-01-20 at 19:17 -0600, James Bottomley wrote:
> There's another curiosity about this: the linux command stack is pretty
> well counted per scsi device (it's how we control queue depth), so if a
> driver leaks commands we see it not by this type of behaviour, but by
> the system hanging (waiting for all the commands the mid-layer thinks
> are outstanding to return).  So, the only way we could leak commands
> like this is in the mid-layer command return logic ... and I can't find
> anywhere this might happen.

Additionally, I've looked at Chase's dmesg, and we seem to use pretty
much the same motherboard (at least the Marvell NIC and the ICH6 controller),
so could it be an ICH6 issue? Or sk98lin (I have another server with the
sk98lin patch, which works well)?

Just in case, here is my lspci:

00:00.0 Host bridge: Intel Corporation 915G/P/GV/GL/PL/910GL Processor to I/O Controller (rev 0e)
00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Express Chipset Family Graphics Controller (rev 0e)
00:02.1 Display controller: Intel Corporation 82915G Express Chipset Family Graphics Controller (rev 0e)
00:1c.0 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) PCI Express Port 2 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d3)
00:1f.0 ISA bridge: Intel Corporation 82801FB/FR (ICH6/ICH6R) LPC Interface Bridge (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801FR/FRW (ICH6R/ICH6RW) SATA Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) SMBus Controller (rev 03)
01:04.0 Mass storage controller: <pci_lookup_name: buffer too small> (rev 13)
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)




* Re: OOM Killer killing whole system
  2006-01-21  3:29               ` Anton Titov
@ 2006-01-21  3:44                 ` Andrew Morton
  2006-01-25 22:20                 ` Matthias Urlichs
  1 sibling, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2006-01-21  3:44 UTC
  To: Anton Titov; +Cc: James.Bottomley, chase.venters, linux-kernel, linux-scsi

Anton Titov <a.titov@host.bg> wrote:
>
> On Fri, 2006-01-20 at 19:17 -0600, James Bottomley wrote:
> > On Fri, 2006-01-20 at 16:50 -0800, Andrew Morton wrote:
> > > For linux-scsi reference, Chase's /proc/slabinfo says:
> > > 
> > > scsi_cmd_cache    1547440 1547440    384   10    1 : tunables   54   27    8 : slabdata 154744 154744      0
> > 
> > There's another curiosity about this: the linux command stack is pretty
> > well counted per scsi device (it's how we control queue depth), so if a
> > driver leaks commands we see it not by this type of behaviour, but by
> > the system hanging (waiting for all the commands the mid-layer thinks
> > are outstanding to return).  So, the only way we could leak commands
> > like this is in the mid-layer command return logic ... and I can't find
> > anywhere this might happen.
> > 
> 
> Just to mention, that 2.6.14.2 does not have this problem:
> 
> vip ~ # cat /proc/slabinfo | grep scsi
> scsi_cmd_cache        60     60    384   10    1 : tunables   54   27    8 : slabdata      6      6     27
> 
> but my guess is that the problem may not be in SCSI, as now (and actually
> previously too) I have this:
> 
> vip ~ # cat /proc/slabinfo | grep reiser
> reiser_inode_cache 556594 556614    408    9    1 : tunables   54   27    8 : slabdata  61846  61846      0
> 
> which seems too high as well.

Having large numbers of cached inodes is fairly common.  Try running
something which uses lots of memory: memset(malloc(gigabytes)), or usemem
from http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz or
read a multi-gigabyte file from disk and you should see the inode count wind
down.
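The point of `memset(malloc(gigabytes))` is that merely allocating is not enough. The program has to write to the memory so the kernel must actually supply pages, and reclaiming those pages is what shrinks caches such as `reiser_inode_cache`. Below is a minimal C sketch of that idea. The helper name is hypothetical, and this is not the `usemem` tool from ext3-tools. Instead of a full `memset()`, it writes one byte per page through a `volatile` pointer. That is enough to fault each page in, and the compiler cannot optimise the stores away before `free()`. Pick a size below RAM plus swap, or the OOM killer may fire instead.

```c
#include <stdlib.h>

#define CHUNK_BYTES  (1024UL * 1024UL)  /* allocate in 1 MiB pieces */
#define TOUCH_STRIDE 4096UL             /* assumes pages of at least 4 KiB */

/*
 * Hypothetical helper: allocate `mib` MiB, write to every page so the
 * kernel has to back it with real memory (reclaiming caches if needed),
 * then free everything again. Returns the number of MiB actually touched.
 */
size_t apply_memory_pressure(size_t mib)
{
    char **chunks;
    size_t got;

    if (mib == 0)
        return 0;
    chunks = malloc(mib * sizeof *chunks);
    if (chunks == NULL)
        return 0;

    for (got = 0; got < mib; got++) {
        volatile char *p = malloc(CHUNK_BYTES);

        if (p == NULL)
            break;                      /* stop at the first failed malloc */
        /* One store per page; volatile keeps the stores from being elided. */
        for (size_t off = 0; off < CHUNK_BYTES; off += TOUCH_STRIDE)
            p[off] = 1;
        chunks[got] = (char *)p;
    }

    for (size_t i = 0; i < got; i++)
        free(chunks[i]);
    free(chunks);
    return got;
}
```

After running it with a size close to the machine's free memory, repeat `grep reiser /proc/slabinfo`. If those objects are only cached inodes, their count should drop. Leaked objects, like the growing `scsi_cmd_cache`, are never freed, so pressure will not shrink them.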




* Re: OOM Killer killing whole system
  2006-01-21  1:17             ` James Bottomley
@ 2006-01-21  3:29               ` Anton Titov
  2006-01-21  3:44                 ` Andrew Morton
  2006-01-25 22:20                 ` Matthias Urlichs
  2006-01-21  3:45               ` Anton Titov
  1 sibling, 2 replies; 22+ messages in thread
From: Anton Titov @ 2006-01-21  3:29 UTC
  To: James Bottomley; +Cc: Andrew Morton, Chase Venters, linux-kernel, linux-scsi

On Fri, 2006-01-20 at 19:17 -0600, James Bottomley wrote:
> On Fri, 2006-01-20 at 16:50 -0800, Andrew Morton wrote:
> > For linux-scsi reference, Chase's /proc/slabinfo says:
> > 
> > scsi_cmd_cache    1547440 1547440    384   10    1 : tunables   54   27    8 : slabdata 154744 154744      0
> 
> There's another curiosity about this: the linux command stack is pretty
> well counted per scsi device (it's how we control queue depth), so if a
> driver leaks commands we see it not by this type of behaviour, but by
> the system hanging (waiting for all the commands the mid-layer thinks
> are outstanding to return).  So, the only way we could leak commands
> like this is in the mid-layer command return logic ... and I can't find
> anywhere this might happen.
> 

Just to mention, that 2.6.14.2 does not have this problem:

vip ~ # cat /proc/slabinfo | grep scsi
scsi_cmd_cache        60     60    384   10    1 : tunables   54   27    8 : slabdata      6      6     27

but my guess is that the problem may not be in SCSI, as now (and actually
previously too) I have this:

vip ~ # cat /proc/slabinfo | grep reiser
reiser_inode_cache 556594 556614    408    9    1 : tunables   54   27    8 : slabdata  61846  61846      0

which seems too high as well.



* Re: OOM Killer killing whole system
  2006-01-21  0:50           ` Andrew Morton
@ 2006-01-21  1:17             ` James Bottomley
  2006-01-21  3:29               ` Anton Titov
  2006-01-21  3:45               ` Anton Titov
  0 siblings, 2 replies; 22+ messages in thread
From: James Bottomley @ 2006-01-21  1:17 UTC
  To: Andrew Morton; +Cc: Chase Venters, a.titov, linux-kernel, linux-scsi

On Fri, 2006-01-20 at 16:50 -0800, Andrew Morton wrote:
> For linux-scsi reference, Chase's /proc/slabinfo says:
> 
> scsi_cmd_cache    1547440 1547440    384   10    1 : tunables   54   27    8 : slabdata 154744 154744      0

There's another curiosity about this: the linux command stack is pretty
well counted per scsi device (it's how we control queue depth), so if a
driver leaks commands we see it not by this type of behaviour, but by
the system hanging (waiting for all the commands the mid-layer thinks
are outstanding to return).  So, the only way we could leak commands
like this is in the mid-layer command return logic ... and I can't find
anywhere this might happen.

The sequence is:

driver
  -> cmd->scsi_done()
  -> blk softirq
  -> scsi_softirq_done()
  -> scsi_finish_cmd()     (the queue counts are decremented here, so anything
                            after this point could leak commands if the rest
                            of the chain is broken)
  -> cmd->done()           (the ULD completion callback)
  -> scsi_io_completion()  (frees the sg table, so if the sgpool slabs aren't
                            out of whack we must be past here)
  -> scsi_end_request()
  -> scsi_next_command()
  -> scsi_put_command()    (where the command goes back to the slab)
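James's argument can be checked with a toy model of the two counters in that completion sequence:

- the per-device count that enforces queue depth, released in scsi_finish_cmd();
- the number of objects still allocated from scsi_cmd_cache, released in scsi_put_command().

This is a sketch, not kernel code. The struct, the function names and the queue depth of 31 used in the example are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the two counters, not kernel data structures. */
struct cmd_model {
    unsigned queue_depth;   /* per-device limit on outstanding commands */
    unsigned device_busy;   /* dropped in scsi_finish_cmd() */
    size_t   slab_objs;     /* dropped in scsi_put_command() */
};

/* Issue one command; returns false (caller must wait) at the queue depth. */
bool model_issue(struct cmd_model *m)
{
    if (m->device_busy >= m->queue_depth)
        return false;
    m->device_busy++;
    m->slab_objs++;         /* command allocated from the slab */
    return true;
}

/*
 * Complete one previously issued command. `lost_after_finish` models a
 * break in the chain between scsi_finish_cmd() and scsi_put_command():
 * the queue count drops, but the object never returns to the slab.
 */
void model_complete(struct cmd_model *m, bool lost_after_finish)
{
    m->device_busy--;                   /* scsi_finish_cmd() */
    if (!lost_after_finish)
        m->slab_objs--;                 /* scsi_put_command() */
}
```

Suppose commands are lost before scsi_finish_cmd(), for example by a driver that never completes them. Then model_issue() starts failing once queue_depth commands are outstanding, which is the hang James describes. If they are lost afterwards instead, device_busy stays at zero while slab_objs grows without bound. That matches the 1.5 million scsi_cmd_cache objects in Chase's slabinfo.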

James


> > Curious - the -s... were you expecting the ring buffer 
> > to exceed 16384?
> 
> It can sometimes be quite large.  I always say -s 1000000 to make sure
> everything got there.
> 
> > I don't think my (boot time) buffer does.
> 
> It's compile-time configurable with CONFIG_LOG_BUF_SHIFT and boot-time
> configurable with log_buf_len=n.



* Re: OOM Killer killing whole system
  2006-01-21  0:19         ` Chase Venters
@ 2006-01-21  0:50           ` Andrew Morton
  2006-01-21  1:17             ` James Bottomley
  2006-01-23  8:39           ` Jens Axboe
  1 sibling, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-01-21  0:50 UTC
  To: Chase Venters; +Cc: a.titov, linux-kernel, linux-scsi

Chase Venters <chase.venters@clientec.com> wrote:
>
> > Next time you reboot 2.6.15 on that machine can you please send the output
> > of `dmesg -s 1000000'?  You might have to set CONFIG_LOG_BUF_SHIFT=17 to
> > prevent it from being truncated.
> 
> Here's mine (attached).

Great, thanks.  That tells us all sorts of stuff about your setup.

For linux-scsi reference, Chase's /proc/slabinfo says:

scsi_cmd_cache    1547440 1547440    384   10    1 : tunables   54   27    8 : slabdata 154744 154744      0
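Converting that slabinfo line into bytes shows how much memory the leak holds. In the slabinfo 2.x layout the columns are:

- name, active_objs, num_objs, objsize, objperslab and pagesperslab;
- a `tunables` group (limit, batchcount, sharedfactor);
- a `slabdata` group (active_slabs, num_slabs, sharedavail).

Below is a minimal C parser sketch. It assumes that column layout and a 4 KiB page size, which is right for this i386 box but is an assumption in general. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical container for the fields of one /proc/slabinfo 2.x line. */
struct slab_usage {
    char name[64];
    unsigned long active_objs, num_objs, objsize, objperslab, pagesperslab;
    unsigned long active_slabs, num_slabs;
};

/*
 * Parse one data line. Whitespace in the format also matches newlines, so a
 * line wrapped by a mail client still parses. Returns 1 on success, 0 for
 * headers or malformed input.
 */
int parse_slabinfo_line(const char *line, struct slab_usage *u)
{
    unsigned long limit, batchcount, sharedfactor, sharedavail;
    int n = sscanf(line,
                   "%63s %lu %lu %lu %lu %lu : tunables %lu %lu %lu"
                   " : slabdata %lu %lu %lu",
                   u->name, &u->active_objs, &u->num_objs, &u->objsize,
                   &u->objperslab, &u->pagesperslab,
                   &limit, &batchcount, &sharedfactor,
                   &u->active_slabs, &u->num_slabs, &sharedavail);
    return n == 12;
}

/* Memory pinned by the cache: every slab holds pagesperslab pages. */
unsigned long long slab_bytes(const struct slab_usage *u,
                              unsigned long page_size)
{
    return (unsigned long long)u->num_slabs * u->pagesperslab * page_size;
}
```

Chase's line works out to 154744 slabs × 1 page × 4096 bytes = 633,831,424 bytes, about 604 MiB. His dmesg reports roughly 1 GiB of RAM. The 2.6.14.2 line Anton posted occupies only 6 pages.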

> Curious - the -s... were you expecting the ring buffer 
> to exceed 16384?

It can sometimes be quite large.  I always say -s 1000000 to make sure
everything got there.

> I don't think my (boot time) buffer does.

It's compile-time configurable with CONFIG_LOG_BUF_SHIFT and boot-time
configurable with log_buf_len=n.
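The kernel log buffer holds 1 << CONFIG_LOG_BUF_SHIFT bytes. The 16384 Chase mentions is therefore a shift of 14, and the suggested CONFIG_LOG_BUF_SHIFT=17 gives 128 KiB. (`dmesg -s` only sets the size of the buffer dmesg reads into, which is why a generous value like 1000000 is harmless.)

The sketch below does that arithmetic. It also asks the running kernel for its buffer size through glibc's `klogctl()` with command 10 (SYSLOG_ACTION_SIZE_BUFFER, available since Linux 2.6.6). Newer kernels may refuse that query to unprivileged users when `kernel.dmesg_restrict` is set. The helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/klog.h>

/* Size in bytes of a log buffer built with CONFIG_LOG_BUF_SHIFT=shift. */
unsigned long log_buf_bytes(unsigned shift)
{
    return 1UL << shift;
}

/* Non-zero if `bytes` of boot messages fit without the oldest being lost. */
int log_fits(unsigned long bytes, unsigned shift)
{
    return bytes <= log_buf_bytes(shift);
}

/*
 * Size of the running kernel's log buffer. Command 10 is
 * SYSLOG_ACTION_SIZE_BUFFER; the symbolic name is not reliably exported to
 * user space, hence the literal. Returns -1 on error (e.g. EPERM).
 */
int running_log_buf_bytes(void)
{
    return klogctl(10, NULL, 0);
}
```

The dmesg attached to Chase's message is 31899 bytes. That would not fit in a 16384-byte buffer, which is exactly the truncation the larger shift guards against.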


* Re: OOM Killer killing whole system
  2006-01-20 22:50       ` Andrew Morton
  2006-01-21  0:09         ` Anton Titov
@ 2006-01-21  0:19         ` Chase Venters
  2006-01-21  0:50           ` Andrew Morton
  2006-01-23  8:39           ` Jens Axboe
  1 sibling, 2 replies; 22+ messages in thread
From: Chase Venters @ 2006-01-21  0:19 UTC
  To: Andrew Morton; +Cc: Anton Titov, linux-kernel, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 449 bytes --]

On Friday 20 January 2006 16:49, Andrew Morton wrote:
> This is 2.6.15 and we have a deadly bug in scsi.
>
> Next time you reboot 2.6.15 on that machine can you please send the output
> of `dmesg -s 1000000'?  You might have to set CONFIG_LOG_BUF_SHIFT=17 to
> prevent it from being truncated.

Here's mine (attached). Curious - the -s... were you expecting the ring buffer 
to exceed 16384? I don't think my (boot time) buffer does.

Thanks,
Chase

[-- Attachment #2: dmesg --]
[-- Type: text/plain, Size: 31899 bytes --]

Linux version 2.6.15-ck1 (root@turbotaz) (gcc version 3.3.6 (Gentoo 3.3.6, ssp-3.3.6-1.0, pie-8.7.8)) #1 SMP PREEMPT Tue Jan 10 12:42:28 CST 2006
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003ffb0000 (usable)
 BIOS-e820: 000000003ffb0000 - 000000003ffbe000 (ACPI data)
 BIOS-e820: 000000003ffbe000 - 000000003fff0000 (ACPI NVS)
 BIOS-e820: 000000003fff0000 - 0000000040000000 (reserved)
 BIOS-e820: 00000000ffb80000 - 0000000100000000 (reserved)
1023MB LOWMEM available.
found SMP MP-table at 000ff780
On node 0 totalpages: 262064
  DMA zone: 4096 pages, LIFO batch:0
  DMA32 zone: 0 pages, LIFO batch:0
  Normal zone: 257968 pages, LIFO batch:31
  HighMem zone: 0 pages, LIFO batch:0
DMI 2.3 present.
Intel MultiProcessor Specification v1.4
    Virtual Wire compatibility mode.
OEM ID: INTEL    Product ID: DELUXE       APIC at: 0xFEE00000
Processor #0 15:4 APIC version 20
I/O APIC #2 Version 32 at 0xFEC00000.
Enabling APIC mode:  Flat.  Using 1 I/O APICs
Processors: 1
Allocating PCI resources starting at 50000000 (gap: 40000000:bfb80000)
Built 1 zonelists
Kernel command line: root=/dev/md1 noapic acpi=off console=ttyS0,38400n8 console=tty0
mapped APIC to ffffd000 (fee00000)
mapped IOAPIC to ffffc000 (fec00000)
Initializing CPU#0
CPU 0 irqstacks, hard=b0791000 soft=b078f000
PID hash table entries: 4096 (order: 12, 65536 bytes)
Detected 3212.427 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 131072 (order: 7, 524288 bytes)
Memory: 1030612k/1048256k available (4837k kernel code, 17088k reserved, 1597k data, 256k init, 0k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay using timer specific routine.. 6429.43 BogoMIPS (lpj=3214717)
Mount-cache hash table entries: 512
CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 0000441d 00000000 00000000
CPU: After vendor identify, caps: bfebfbff 00000000 00000000 00000000 0000441d 00000000 00000000
monitor/mwait feature present.
using mwait in idle threads.
CPU: Trace cache: 12K uops, L1 D cache: 16K
CPU: L2 cache: 1024K
CPU: Physical Processor ID: 0
CPU: After all inits, caps: bfebfbff 00000000 00000000 00000080 0000441d 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
mtrr: v2.0 (20020519)
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) Pentium(R) 4 CPU 3.20GHz stepping 01
Total of 1 processors activated (6429.43 BogoMIPS).
Brought up 1 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xf0031, last bus=4
PCI: Using configuration type 1
ACPI: Subsystem revision 20050902
ACPI: Interpreter disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI: disabled
Generic PHY: Registered new driver
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Ignoring BAR0-3 of IDE controller 0000:00:1f.1
Boot video device is 0000:04:00.0
PCI: Transparent bridge - 0000:00:1e.0
PCI: Discovered primary peer bus ff [IRQ]
PCI: Using IRQ router PIIX/ICH [8086/2640] at 0000:00:1f.0
PCI: IRQ 0 for device 0000:00:01.0 doesn't match PIRQ mask - try pci=usepirqmask
PCI: Found IRQ 10 for device 0000:00:01.0
PCI: Sharing IRQ 10 with 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:00:1c.0
PCI: Sharing IRQ 10 with 0000:00:1d.3
PCI: Sharing IRQ 10 with 0000:04:00.0
PCI: IRQ 0 for device 0000:00:1c.1 doesn't match PIRQ mask - try pci=usepirqmask
PCI: Found IRQ 5 for device 0000:00:1c.1
PCI: Sharing IRQ 5 with 0000:02:00.0
PCI: Sharing IRQ 5 with 0000:01:09.0
PCI: IRQ 0 for device 0000:00:1f.1 doesn't match PIRQ mask - try pci=usepirqmask
PCI: Found IRQ 5 for device 0000:00:1f.1
PCI: Sharing IRQ 5 with 0000:00:1d.2
PCI: Sharing IRQ 5 with 0000:01:09.2
PCI: IRQ 0 for device 0000:00:1f.3 doesn't match PIRQ mask - try pci=usepirqmask
PCI: Found IRQ 3 for device 0000:00:1f.3
PCI: Sharing IRQ 3 with 0000:00:1d.1
PCI: Sharing IRQ 3 with 0000:00:1f.2
PCI: Bridge: 0000:00:01.0
  IO window: e000-efff
  MEM window: cdf00000-cfffffff
  PREFETCH window: d0000000-dfffffff
PCI: Bridge: 0000:00:1c.0
  IO window: d000-dfff
  MEM window: disabled.
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1c.1
  IO window: c000-cfff
  MEM window: cde00000-cdefffff
  PREFETCH window: disabled.
PCI: Bridge: 0000:00:1e.0
  IO window: a000-bfff
  MEM window: cdd00000-cddfffff
  PREFETCH window: 50000000-500fffff
PCI: Found IRQ 10 for device 0000:00:01.0
PCI: Sharing IRQ 10 with 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:00:1c.0
PCI: Sharing IRQ 10 with 0000:00:1d.3
PCI: Sharing IRQ 10 with 0000:04:00.0
PCI: Setting latency timer of device 0000:00:01.0 to 64
PCI: Found IRQ 10 for device 0000:00:1c.0
PCI: Sharing IRQ 10 with 0000:00:01.0
PCI: Sharing IRQ 10 with 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:00:1d.3
PCI: Sharing IRQ 10 with 0000:04:00.0
PCI: Setting latency timer of device 0000:00:1c.0 to 64
PCI: Found IRQ 5 for device 0000:00:1c.1
PCI: Sharing IRQ 5 with 0000:02:00.0
PCI: Sharing IRQ 5 with 0000:01:09.0
PCI: Setting latency timer of device 0000:00:1c.1 to 64
PCI: Setting latency timer of device 0000:00:1e.0 to 64
Machine check exception polling timer started.
IA-32 Microcode Update Driver: v1.14 <tigran@veritas.com>
audit: initializing netlink socket (disabled)
audit(1137716093.984:1): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
fuse init (API version 7.3)
JFS: nTxBlock = 8052, nTxLock = 64423
SGI XFS with ACLs, security attributes, large block numbers, no debug enabled
SGI XFS Quota Management subsystem
Initializing Cryptographic API
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
 0000:00:1d.0: uhci_check_and_reset_hc: legsup = 0x0f30
 0000:00:1d.0: Performing full reset
 0000:00:1d.1: uhci_check_and_reset_hc: legsup = 0x0030
 0000:00:1d.1: Performing full reset
 0000:00:1d.2: uhci_check_and_reset_hc: legsup = 0x0030
 0000:00:1d.2: Performing full reset
 0000:00:1d.3: uhci_check_and_reset_hc: legsup = 0x0030
 0000:00:1d.3: Performing full reset
0000:00:1d.7 EHCI: early BIOS handoff failed (BIOS bug ?)
PCI: Found IRQ 10 for device 0000:00:01.0
PCI: Sharing IRQ 10 with 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:00:1c.0
PCI: Sharing IRQ 10 with 0000:00:1d.3
PCI: Sharing IRQ 10 with 0000:04:00.0
PCI: Setting latency timer of device 0000:00:01.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[pcie00]
PCI: Found IRQ 10 for device 0000:00:1c.0
PCI: Sharing IRQ 10 with 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:00:1d.3
PCI: Sharing IRQ 10 with 0000:04:00.0
PCI: Setting latency timer of device 0000:00:1c.0 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[pcie00]
Allocate Port Service[pcie02]
PCI: Found IRQ 5 for device 0000:00:1c.1
PCI: Sharing IRQ 5 with 0000:02:00.0
PCI: Sharing IRQ 5 with 0000:01:09.0
PCI: Setting latency timer of device 0000:00:1c.1 to 64
assign_interrupt_mode Found MSI capability
Allocate Port Service[pcie00]
Allocate Port Service[pcie02]
Real Time Clock Driver v1.12
Non-volatile memory driver v1.2
hw_random: RNG not detected
[drm] Initialized drm 1.0.0 20040925
PNP: No PS/2 controller found. Probing ports directly.
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
pktcdvd: v0.2.0a 2004-07-14 Jens Axboe (axboe@suse.de) and petero2@telia.com
Marvell 88E1101: Registered new driver
Davicom DM9161E: Registered new driver
Davicom DM9131: Registered new driver
Cicada Cis8204: Registered new driver
LXT970: Registered new driver
LXT971: Registered new driver
QS6612: Registered new driver
Linux video capture interface: v1.00
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ICH6: IDE controller at PCI slot 0000:00:1f.1
PCI: Found IRQ 5 for device 0000:00:1f.1
PCI: Sharing IRQ 5 with 0000:00:1d.2
PCI: Sharing IRQ 5 with 0000:01:09.2
ICH6: chipset revision 3
ICH6: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
Probing IDE interface ide0...
hda: PLEXTOR DVDR PX-716A, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Probing IDE interface ide1...
IT8212: IDE controller at PCI slot 0000:01:04.0
PCI: Found IRQ 11 for device 0000:01:04.0
PCI: Sharing IRQ 11 with 0000:00:1d.0
PCI: Sharing IRQ 11 with 0000:00:1d.7
IT8212: chipset revision 19
it821x: controller in pass through mode.
IT8212: 100% native mode on irq 11
    ide2: BM-DMA at 0xa880-0xa887, BIOS settings: hde:pio, hdf:pio
    ide3: BM-DMA at 0xa888-0xa88f, BIOS settings: hdg:pio, hdh:pio
Probing IDE interface ide2...
Probing IDE interface ide3...
Probing IDE interface ide1...
Probing IDE interface ide2...
Probing IDE interface ide3...
hda: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 8192kB Cache, UDMA(66)
Uniform CD-ROM driver Revision: 3.20
PCI: Found IRQ 11 for device 0000:01:0a.0
PCI: Sharing IRQ 11 with 0000:01:03.0
scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
        <Adaptec 2902/04/10/15/20C/30C SCSI adapter>
        aic7850: Ultra Single Channel A, SCSI Id=7, 3/253 SCBs

  Vendor: PLEXTOR   Model: CD-R   PX-W124TS  Rev: 1.07
  Type:   CD-ROM                             ANSI SCSI revision: 02
 target0:0:4: Beginning Domain Validation
 target0:0:4: FAST-20 SCSI 20.0 MB/s ST (50 ns, offset 8)
 target0:0:4: Domain Validation skipping write tests
 target0:0:4: Ending Domain Validation
libata version 1.20 loaded.
ata_piix 0000:00:1f.2: version 1.05
PCI: Found IRQ 3 for device 0000:00:1f.2
PCI: Sharing IRQ 3 with 0000:00:1d.1
PCI: Sharing IRQ 3 with 0000:00:1f.3
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x9C00 ctl 0x9882 bmdma 0x9400 irq 3
ata2: SATA max UDMA/133 cmd 0x9800 ctl 0x9482 bmdma 0x9408 irq 3
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3e01 87:4003 88:203f
ata1: dev 0 ATA-6, max UDMA/100, 625142448 sectors: LBA48
ata1: dev 1 cfg 49:2f00 82:306b 83:7e01 84:4003 85:3069 86:3c01 87:4003 88:203f
ata1: dev 1 ATA-6, max UDMA/100, 625142448 sectors: LBA48
ata1: dev 0 configured for UDMA/100
ata1: dev 1 configured for UDMA/100
scsi1 : ata_piix
ata2: dev 0 cfg 49:2f00 82:306b 83:7e01 84:4003 85:3069 86:3c01 87:4003 88:203f
ata2: dev 0 ATA-6, max UDMA/100, 625142448 sectors: LBA48
ata2: dev 1 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata2: dev 1 ATA-6, max UDMA/100, 625142448 sectors: LBA48
ata2: dev 0 configured for UDMA/100
ata2: dev 1 configured for UDMA/100
scsi2 : ata_piix
  Vendor: ATA       Model: WDC WD3200JD-98K  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: WDC WD3200JD-60K  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: WDC WD3200JD-60K  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: WDC WD3200JD-00K  Rev: 08.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sda: drive cache: write back
 sda: sda1 sda2 sda3
sd 1:0:0:0: Attached scsi disk sda
SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdb: drive cache: write back
 sdb: sdb1 sdb2 sdb3
sd 1:0:1:0: Attached scsi disk sdb
SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdc: drive cache: write back
SCSI device sdc: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdc: drive cache: write back
 sdc: sdc1 sdc2 sdc3
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdd: drive cache: write back
SCSI device sdd: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdd: drive cache: write back
 sdd: sdd1 sdd2 sdd3
sd 2:0:1:0: Attached scsi disk sdd
sr0: scsi3-mmc drive: 32x/32x writer cd/rw xa/form2 cdda tray
sr 0:0:4:0: Attached scsi CD-ROM sr0
sr 0:0:4:0: Attached scsi generic sg0 type 5
sd 1:0:0:0: Attached scsi generic sg1 type 0
sd 1:0:1:0: Attached scsi generic sg2 type 0
sd 2:0:0:0: Attached scsi generic sg3 type 0
sd 2:0:1:0: Attached scsi generic sg4 type 0
ehci_hcd: block sizes: qh 128 qtd 96 itd 192 sitd 96
PCI: Found IRQ 11 for device 0000:00:1d.7
PCI: Sharing IRQ 11 with 0000:00:1d.0
PCI: Sharing IRQ 11 with 0000:01:04.0
PCI: Setting latency timer of device 0000:00:1d.7 to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
ehci_hcd 0000:00:1d.7: reset hcs_params 0x104208 dbg=1 cc=4 pcc=2 ordered !ppc ports=8
ehci_hcd 0000:00:1d.7: reset hcc_params 6871 thresh 7 uframes 1024 64 bit addr
ehci_hcd 0000:00:1d.7: debug port 1
ehci_hcd 0000:00:1d.7: capability 1000001 at 68
PCI: cache line size of 128 is not supported by device 0000:00:1d.7
drivers/usb/core/inode.c: creating file 'devices'
drivers/usb/core/inode.c: creating file '001'
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1d.7: irq 11, io mem 0xcdcff800
ehci_hcd 0000:00:1d.7: reset command 080012 (park)=0 ithresh=8 Periodic period=1024 Reset HALT
ehci_hcd 0000:00:1d.7: init command 010001 (park)=0 ithresh=1 period=1024 RUN
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: default language 0x0409
usb usb1: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.15-ck1 ehci_hcd
usb usb1: SerialNumber: 0000:00:1d.7
usb usb1: hotplug
usb usb1: adding 1-0:1.0 (config #1, interface 0)
usb 1-0:1.0: hotplug
hub 1-0:1.0: usb_probe_interface
hub 1-0:1.0: usb_probe_interface - got id
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 8 ports detected
hub 1-0:1.0: standalone hub
hub 1-0:1.0: no power switching (usb 1.0)
hub 1-0:1.0: individual port over-current protection
hub 1-0:1.0: Single TT
hub 1-0:1.0: TT requires at most 8 FS bit times (666 ns)
hub 1-0:1.0: power on to power good time: 20ms
hub 1-0:1.0: local power source is good
hub 1-0:1.0: state 5 ports 8 chg 0000 evt 0000
drivers/usb/core/inode.c: creating file '001'
ehci_hcd 0000:00:1d.7: GetStatus port 2 status 001803 POWER sig=j CSC CONNECT
hub 1-0:1.0: port 2, status 0501, change 0001, 480 Mb/s
USB Universal Host Controller Interface driver v2.3
PCI: Found IRQ 11 for device 0000:00:1d.0
PCI: Sharing IRQ 11 with 0000:00:1d.7
PCI: Sharing IRQ 11 with 0000:01:04.0
PCI: Setting latency timer of device 0000:00:1d.0 to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
uhci_hcd 0000:00:1d.0: detected 2 ports
uhci_hcd 0000:00:1d.0: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1d.0: Performing full reset
drivers/usb/core/inode.c: creating file '002'
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 2
uhci_hcd 0000:00:1d.0: irq 11, io base 0x00008880
usb usb2: default language 0x0409
usb usb2: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: UHCI Host Controller
hub 1-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x501
usb usb2: Manufacturer: Linux 2.6.15-ck1 uhci_hcd
usb usb2: SerialNumber: 0000:00:1d.0
usb usb2: hotplug
usb usb2: adding 2-0:1.0 (config #1, interface 0)
usb 2-0:1.0: hotplug
hub 2-0:1.0: usb_probe_interface
hub 2-0:1.0: usb_probe_interface - got id
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 2 ports detected
hub 2-0:1.0: standalone hub
hub 2-0:1.0: no power switching (usb 1.0)
hub 2-0:1.0: individual port over-current protection
hub 2-0:1.0: power on to power good time: 2ms
hub 2-0:1.0: local power source is good
ehci_hcd 0000:00:1d.7: port 2 high speed
ehci_hcd 0000:00:1d.7: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
usb 1-2: new high speed USB device using ehci_hcd and address 2
drivers/usb/core/inode.c: creating file '001'
PCI: Found IRQ 3 for device 0000:00:1d.1
PCI: Sharing IRQ 3 with 0000:00:1f.2
PCI: Sharing IRQ 3 with 0000:00:1f.3
ehci_hcd 0000:00:1d.7: port 2 high speed
ehci_hcd 0000:00:1d.7: GetStatus port 2 status 001005 POWER sig=se0 PE CONNECT
PCI: Setting latency timer of device 0000:00:1d.1 to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
uhci_hcd 0000:00:1d.1: detected 2 ports
uhci_hcd 0000:00:1d.1: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1d.1: Performing full reset
drivers/usb/core/inode.c: creating file '003'
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1d.1: irq 3, io base 0x00008c00
usb usb3: default language 0x0409
usb usb3: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.15-ck1 uhci_hcd
usb usb3: SerialNumber: 0000:00:1d.1
usb usb3: hotplug
usb usb3: adding 3-0:1.0 (config #1, interface 0)
usb 3-0:1.0: hotplug
hub 3-0:1.0: usb_probe_interface
hub 3-0:1.0: usb_probe_interface - got id
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
hub 3-0:1.0: standalone hub
hub 3-0:1.0: no power switching (usb 1.0)
hub 3-0:1.0: individual port over-current protection
hub 3-0:1.0: power on to power good time: 2ms
hub 3-0:1.0: local power source is good
usb 1-2: default language 0x0409
usb 1-2: new device strings: Mfr=0, Product=1, SerialNumber=0
usb 1-2: Product: USB2.0 Hub
usb 1-2: hotplug
usb 1-2: adding 1-2:1.0 (config #1, interface 0)
usb 1-2:1.0: hotplug
hub 1-2:1.0: usb_probe_interface
hub 1-2:1.0: usb_probe_interface - got id
hub 1-2:1.0: USB hub found
hub 1-2:1.0: 4 ports detected
hub 1-2:1.0: standalone hub
hub 1-2:1.0: individual port power switching
hub 1-2:1.0: individual port over-current protection
hub 1-2:1.0: Single TT
hub 1-2:1.0: TT requires at most 32 FS bit times (2664 ns)
hub 1-2:1.0: Port indicators are supported
hub 1-2:1.0: power on to power good time: 100ms
hub 1-2:1.0: local power source is good
hub 1-2:1.0: enabling power on all ports
drivers/usb/core/inode.c: creating file '001'
PCI: Found IRQ 5 for device 0000:00:1d.2
PCI: Sharing IRQ 5 with 0000:00:1f.1
PCI: Sharing IRQ 5 with 0000:01:09.2
usb 1-2: link qh256-0001/efad5100 start 255 [1/0 us]
drivers/usb/core/inode.c: creating file '002'
ehci_hcd 0000:00:1d.7: GetStatus port 5 status 001403 POWER sig=k CSC CONNECT
hub 1-0:1.0: port 5, status 0501, change 0001, 480 Mb/s
PCI: Setting latency timer of device 0000:00:1d.2 to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
uhci_hcd 0000:00:1d.2: detected 2 ports
uhci_hcd 0000:00:1d.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1d.2: Performing full reset
drivers/usb/core/inode.c: creating file '004'
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1d.2: irq 5, io base 0x00009000
usb usb4: default language 0x0409
usb usb4: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.15-ck1 uhci_hcd
usb usb4: SerialNumber: 0000:00:1d.2
usb usb4: hotplug
usb usb4: adding 4-0:1.0 (config #1, interface 0)
usb 4-0:1.0: hotplug
hub 4-0:1.0: usb_probe_interface
hub 4-0:1.0: usb_probe_interface - got id
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
hub 4-0:1.0: standalone hub
hub 4-0:1.0: no power switching (usb 1.0)
hub 4-0:1.0: individual port over-current protection
hub 4-0:1.0: power on to power good time: 2ms
hub 4-0:1.0: local power source is good
hub 1-0:1.0: debounce: port 5: total 100ms stable 100ms status 0x501
ehci_hcd 0000:00:1d.7: port 5 low speed --> companion
ehci_hcd 0000:00:1d.7: GetStatus port 5 status 003002 POWER OWNER sig=se0 CSC
hub 1-0:1.0: state 5 ports 8 chg 0000 evt 0000
hub 2-0:1.0: state 5 ports 2 chg 0000 evt 0004
uhci_hcd 0000:00:1d.0: port 2 portsc 0082,00
hub 2-0:1.0: port 2, status 0100, change 0001, 12 Mb/s
drivers/usb/core/inode.c: creating file '001'
PCI: Found IRQ 10 for device 0000:00:1d.3
PCI: Sharing IRQ 10 with 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:04:00.0
PCI: Setting latency timer of device 0000:00:1d.3 to 64
uhci_hcd 0000:00:1d.3: UHCI Host Controller
uhci_hcd 0000:00:1d.3: detected 2 ports
uhci_hcd 0000:00:1d.3: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1d.3: Performing full reset
drivers/usb/core/inode.c: creating file '005'
uhci_hcd 0000:00:1d.3: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1d.3: irq 10, io base 0x00009080
hub 2-0:1.0: debounce: port 2: total 100ms stable 100ms status 0x100
hub 3-0:1.0: state 5 ports 2 chg 0000 evt 0000
hub 1-2:1.0: state 5 ports 4 chg 0000 evt 0018
usb usb5: default language 0x0409
usb usb5: new device strings: Mfr=3, Product=2, SerialNumber=1
usb usb5: Product: UHCI Host Controller
usb usb5: Manufacturer: Linux 2.6.15-ck1 uhci_hcd
usb usb5: SerialNumber: 0000:00:1d.3
usb usb5: hotplug
usb usb5: adding 5-0:1.0 (config #1, interface 0)
usb 5-0:1.0: hotplug
hub 5-0:1.0: usb_probe_interface
hub 5-0:1.0: usb_probe_interface - got id
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
hub 1-2:1.0: port 3, status 0301, change 0001, 1.5 Mb/s
hub 5-0:1.0: standalone hub
hub 5-0:1.0: no power switching (usb 1.0)
hub 5-0:1.0: individual port over-current protection
hub 5-0:1.0: power on to power good time: 2ms
hub 5-0:1.0: local power source is good
drivers/usb/core/inode.c: creating file '001'
hub 1-2:1.0: debounce: port 3: total 100ms stable 100ms status 0x301
usb 1-2.3: new low speed USB device using ehci_hcd and address 4
usb 1-2.3: skipped 1 descriptor after interface
usb 1-2.3: default language 0x0409
usb 1-2.3: new device strings: Mfr=4, Product=26, SerialNumber=0
usb 1-2.3: Product: Keytronic USB Keyboard
usb 1-2.3: Manufacturer: Key Tronic
usb 1-2.3: hotplug
usb 1-2.3: adding 1-2.3:1.0 (config #1, interface 0)
usb 1-2.3:1.0: hotplug
usb 1-2.3: wrong descriptor type 00 for string 20 ("ic?Keytronic USB Keyboard?1234???????????????????????©")
drivers/usb/core/inode.c: creating file '004'
hub 1-2:1.0: port 4, status 0301, change 0001, 1.5 Mb/s
hub 1-2:1.0: debounce: port 4: total 100ms stable 100ms status 0x301
usb 1-2.4: new low speed USB device using ehci_hcd and address 5
usb 1-2.4: skipped 1 descriptor after interface
usb 1-2.4: default language 0x0409
usb 1-2.4: new device strings: Mfr=0, Product=1, SerialNumber=0
usb 1-2.4: Product: USB Mouse
usb 1-2.4: hotplug
usb 1-2.4: adding 1-2.4:1.0 (config #1, interface 0)
usb 1-2.4:1.0: hotplug
drivers/usb/core/inode.c: creating file '005'
hub 4-0:1.0: state 5 ports 2 chg 0000 evt 0002
uhci_hcd 0000:00:1d.2: port 1 portsc 01a3,00
hub 4-0:1.0: port 1, status 0301, change 0001, 1.5 Mb/s
usbcore: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
uhci_hcd 0000:00:1d.1: suspend_rh (auto-stop)
Initializing USB Mass Storage driver...
hub 4-0:1.0: debounce: port 1: total 100ms stable 100ms status 0x301
usb 4-1: new low speed USB device using uhci_hcd and address 2
uhci_hcd 0000:00:1d.0: suspend_rh (auto-stop)
usb 4-1: skipped 1 descriptor after interface
usb 4-1: default language 0x0409
usb 4-1: new device strings: Mfr=3, Product=1, SerialNumber=2
usb 4-1: Product: Back-UPS RS 1000 FW:7.g8 .D USB FW:g8 
uhci_hcd 0000:00:1d.3: suspend_rh (auto-stop)
usb 4-1: Manufacturer: American Power Conversion
usb 4-1: SerialNumber: QB0507149462  
usb 4-1: hotplug
usb 4-1: adding 4-1:1.0 (config #1, interface 0)
usb 4-1:1.0: hotplug
drivers/usb/core/inode.c: creating file '002'
hub 5-0:1.0: state 5 ports 2 chg 0000 evt 0000
hub 1-2:1.0: state 5 ports 4 chg 0000 evt 0010
usbcore: registered new driver usb-storage
hub 4-0:1.0: state 5 ports 2 chg 0000 evt 0002
USB Mass Storage support registered.
usbcore: registered new driver ati_remote
drivers/usb/input/ati_remote.c: Registered USB driver ATI/X10 RF USB Remote Control v. 2.2.1
usbcore: registered new driver hiddev
usbhid 1-2.3:1.0: usb_probe_interface
usbhid 1-2.3:1.0: usb_probe_interface - got id
input: Key Tronic Keytronic USB Keyboard as /class/input/input0
usb 1-2.3: link qh8-0601/efad5280 start 7 [1/2 us]
input: USB HID v1.10 Keyboard [Key Tronic Keytronic USB Keyboard] on usb-0000:00:1d.7-2.3
usbhid 1-2.4:1.0: usb_probe_interface
usbhid 1-2.4:1.0: usb_probe_interface - got id
input: USB Mouse as /class/input/input1
input: USB HID v1.00 Mouse [USB Mouse] on usb-0000:00:1d.7-2.4
usbhid 4-1:1.0: usb_probe_interface
usbhid 4-1:1.0: usb_probe_interface - got id
drivers/usb/core/file.c: looking for a minor, starting at 0
hiddev0: USB HID v1.10 Device [American Power Conversion Back-UPS RS 1000 FW:7.g8 .D USB FW:g8 ] on usb-0000:00:1d.2-1
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.6:USB HID core driver
usbcore: registered new driver usbserial
drivers/usb/serial/usb-serial.c: USB Serial support registered for generic
usbcore: registered new driver usbserial_generic
drivers/usb/serial/usb-serial.c: USB Serial Driver core
gameport: EMU10K1 is pci0000:01:09.1/gameport0, io 0xbc00, speed 932kHz
mice: PS/2 mouse device common for all mice
input: PC Speaker as /class/input/input2
I2O subsystem v1.288
i2o: max drivers = 8
I2O Configuration OSM v1.248
I2O Bus Adapter OSM v$Rev$
I2O Block Device OSM v1.287
I2O SCSI Peripheral OSM v1.282
I2O ProcFS OSM v1.145
i2c /dev entries driver
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid10 personality registered as nr 9
md: raid5 personality registered as nr 4
raid5: automatically using best checksumming function: pIII_sse
   pIII_sse  :  4744.000 MB/sec
raid5: using function: pIII_sse (4744.000 MB/sec)
raid6: int32x1    921 MB/s
raid6: int32x2    992 MB/s
raid6: int32x4    726 MB/s
raid6: int32x8    609 MB/s
raid6: mmxx1     1964 MB/s
raid6: mmxx2     2269 MB/s
raid6: sse1x1    1164 MB/s
raid6: sse1x2    1250 MB/s
raid6: sse2x1    2226 MB/s
raid6: sse2x2    2402 MB/s
raid6: using algorithm sse2x2 (2402 MB/s)
md: raid6 personality registered as nr 8
md: multipath personality registered as nr 7
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
device-mapper: dm-multipath version 1.0.4 loaded
device-mapper: dm-round-robin version 1.0.0 loaded
device-mapper: dm-emc version 0.0.3 loaded
wbsd: Winbond W83L51xD SD/MMC card interface driver, 1.5
wbsd: Copyright(c) Pierre Ossman
padlock: VIA PadLock not detected.
oprofile: using NMI interrupt.
NET: Registered protocol family 2
IP route cache hash table entries: 65536 (order: 6, 262144 bytes)
TCP established hash table entries: 262144 (order: 9, 3145728 bytes)
TCP bind hash table entries: 65536 (order: 7, 786432 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
TCP reno registered
IPv4 over IPv4 tunneling driver
GRE over IPv4 tunneling driver
TCP bic registered
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
NET: Registered protocol family 15
Bridge firewalling registered
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
CCID: Registered CCID 3 (ccid3)
Using IPI Shortcut mode
md: Autodetecting RAID arrays.
spurious 8259A interrupt: IRQ7.
md: autorun ...
md: considering sdd3 ...
md:  adding sdd3 ...
md: sdd1 has different UUID to sdd3
md:  adding sdc3 ...
md: sdc1 has different UUID to sdd3
md:  adding sdb3 ...
md: sdb1 has different UUID to sdd3
md:  adding sda3 ...
md: sda1 has different UUID to sdd3
md: created md1
md: bind<sda3>
md: bind<sdb3>
md: bind<sdc3>
md: bind<sdd3>
md: running: <sdd3><sdc3><sdb3><sda3>
raid10: raid set md1 active with 4 out of 4 devices
md: considering sdd1 ...
md:  adding sdd1 ...
md:  adding sdc1 ...
md:  adding sdb1 ...
md:  adding sda1 ...
md: created md0
md: bind<sda1>
md: bind<sdb1>
md: bind<sdc1>
md: bind<sdd1>
md: running: <sdd1><sdc1><sdb1><sda1>
raid1: raid set md0 active with 4 out of 4 mirrors
md: ... autorun DONE.
ReiserFS: md1: found reiserfs format "3.6" with standard journal
ReiserFS: md1: using ordered data mode
ReiserFS: md1: journal params: device md1, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: md1: checking transaction log (md1)
ReiserFS: md1: Using r5 hash to sort names
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 256k freed
Adding 1004052k swap on /dev/sda2.  Priority:-1 extents:1 across:1004052k
Adding 1004052k swap on /dev/sdb2.  Priority:-2 extents:1 across:1004052k
PCI: Found IRQ 5 for device 0000:02:00.0
PCI: Sharing IRQ 5 with 0000:01:09.0
sk98lin: Network Device Driver v8.23.1.3
(C)Copyright 1999-2005 Marvell(R).
PCI: Found IRQ 5 for device 0000:02:00.0
PCI: Sharing IRQ 5 with 0000:01:09.0
PCI: Setting latency timer of device 0000:02:00.0 to 64
eth0: Yukon Gigabit Ethernet 10/100/1000Base-T Adapter
      PrefPort:A  RlmtMode:Check Link State
PCI: Found IRQ 10 for device 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:00:1d.3
PCI: Sharing IRQ 10 with 0000:04:00.0
PCI: Setting latency timer of device 0000:00:1b.0 to 64
nvidia: no version for "struct_module" found: kernel tainted.
nvidia: module license 'NVIDIA' taints kernel.
PCI: Found IRQ 10 for device 0000:04:00.0
PCI: Sharing IRQ 10 with 0000:00:1b.0
PCI: Sharing IRQ 10 with 0000:00:1d.3
PCI: Setting latency timer of device 0000:04:00.0 to 64
NVRM: loading NVIDIA Linux x86 NVIDIA Kernel Module  1.0-7676  Fri Jul 29 12:58:54 PDT 2005
ADDRCONF(NETDEV_UP): eth0: link is not ready
eth0: network connection up using port A
    speed:           100
    autonegotiation: yes
    duplex mode:     full
    flowctrl:        none
    irq moderation:  disabled
    tcp offload:     enabled
    scatter-gather:  enabled
    tx-checksum:     enabled
    rx-checksum:     enabled
    rx-polling:      enabled
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
cdrom: open failed.
eth0: no IPv6 routers present
usb 1-2.4: link qh8-3008/efad5300 start 7 [1/2 us]
usb 1-2.4: unlink qh8-3008/efad5300 start 7 [1/2 us]
ehci_hcd 0000:00:1d.7: reused qh efad5300 schedule
usb 1-2.4: link qh8-3008/efad5300 start 7 [1/2 us]

* Re: OOM Killer killing whole system
  2006-01-20 22:50       ` Andrew Morton
@ 2006-01-21  0:09         ` Anton Titov
  2006-01-21  0:19         ` Chase Venters
  1 sibling, 0 replies; 22+ messages in thread
From: Anton Titov @ 2006-01-21  0:09 UTC
  To: Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

On Fri, 2006-01-20 at 14:50 -0800, Andrew Morton wrote:
> That's great, thanks.
> 
> This is 2.6.15 and we have a deadly bug in scsi.
> 
> Next time you reboot 2.6.15 on that machine can you please send the output
> of `dmesg -s 1000000'?  You might have to set CONFIG_LOG_BUF_SHIFT=17 to
> prevent it from being truncated.

Sure, here it is. I just rebooted, and it seems complete to me.

15 minutes after rebooting I have:

vip ~ # cat /proc/slabinfo | grep scsi
scsi_cmd_cache      6160   6160    384   10    1 : tunables   54   27    8 : slabdata    616    616      0
vip ~ # uptime
 02:04:49 up 15 min,  1 user,  load average: 0.16, 0.21, 0.19

[-- Attachment #2: dmesg.gz --]
[-- Type: application/x-gzip, Size: 4332 bytes --]
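The two readings of scsi_cmd_cache in this thread allow a rough average growth rate. Before the reboot there were 1458778 active objects. Fifteen minutes after the reboot there were 6160. The sketch below assumes the cache held roughly zero objects right after boot, and assumes the rate stays constant. Both are simplifications, and the extrapolated time is an illustration, not a measurement.

```python
def growth_per_minute(objs_start: int, objs_end: int, minutes: float) -> float:
    """Average number of slab objects added per minute between two readings."""
    return (objs_end - objs_start) / minutes


# Assumption: scsi_cmd_cache held roughly 0 objects right after boot.
rate = growth_per_minute(0, 6160, 15)
print(f"{rate:.1f} objects/minute")  # 410.7 objects/minute

# Illustrative linear extrapolation only: how long, at this average rate,
# the cache would take to reach the 1458778 active objects seen before the reboot.
hours_to_reach = 1458778 / rate / 60
print(f"{hours_to_reach:.1f} hours")  # 59.2 hours
```

If the leak really is steady, a pre-reboot count of 1458778 would correspond to a few days of uptime. Real I/O load is bursty, though, so two snapshots taken a known interval apart are a better basis than an assumed zero at boot.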

* Re: OOM Killer killing whole system
  2006-01-20 21:48     ` Anton Titov
@ 2006-01-20 22:50       ` Andrew Morton
  2006-01-21  0:09         ` Anton Titov
  2006-01-21  0:19         ` Chase Venters
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2006-01-20 22:50 UTC
  To: Anton Titov; +Cc: chase.venters, linux-kernel, linux-scsi

Anton Titov <a.titov@host.bg> wrote:
>
> On Fri, 2006-01-20 at 14:04 -0600, Chase Venters wrote:
> > On Fri, 20 Jan 2006, Andrew Morton wrote:
> > >> Jan 15 06:05:09 vip 216477 pages slab
> > >
> > > It's all in slab.  800MB.
> > >
> > > I'd be suspecting a slab memory leak.  If it happens again, please take a
> > > copy of /proc/slabinfo, send it.
> > >
> > 
> > Andrew & Anton,
> >  The culprit was 1.5 million SCSI commands in the scsi command cache. 
> > 
> > Thanks,
> > Chase
> 
> I currently have this:
> scsi_cmd_cache    1458778 1458790    384   10    1 : tunables   54 27    8 : slabdata 145879 145879      0
> 
> in /proc/slabinfo, which is pretty close to 1.5 million. The system is
> working fine, but it shouldn't be very loaded anyway, so a memory leak
> would not show up early. I just checked, and scsi_cmd_cache on my other
> machines is under 100, so it does seem like a problem.

That's great, thanks.

This is 2.6.15 and we have a deadly bug in scsi.

Next time you reboot 2.6.15 on that machine can you please send the output
of `dmesg -s 1000000'?  You might have to set CONFIG_LOG_BUF_SHIFT=17 to
prevent it from being truncated.
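Andrew's request involves two different sizes. The first is the kernel's log ring buffer, whose size is 2 to the power CONFIG_LOG_BUF_SHIFT bytes and is fixed at build time. The second is the buffer `dmesg -s` uses when it reads that ring. A small sketch of the arithmetic for the values he suggested:

```python
# CONFIG_LOG_BUF_SHIFT gives the kernel log ring buffer size as a power of two.
LOG_BUF_SHIFT = 17
log_buf_bytes = 1 << LOG_BUF_SHIFT
print(f"{log_buf_bytes} bytes = {log_buf_bytes // 1024} KiB")  # 131072 bytes = 128 KiB

# `dmesg -s 1000000` asks dmesg to read with a 1,000,000-byte buffer,
# which is larger than the kernel's ring buffer, so nothing is cut off on read.
DMESG_READ_BUF = 1_000_000
print(DMESG_READ_BUF >= log_buf_bytes)  # True
```

A large `-s` only helps if the kernel still holds the messages. If boot output exceeds the ring buffer, the oldest lines are overwritten no matter what read size dmesg uses, and that is why the config option matters too.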


* Re: OOM Killer killing whole system
  2006-01-20 20:04   ` Chase Venters
@ 2006-01-20 21:48     ` Anton Titov
  2006-01-20 22:50       ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Anton Titov @ 2006-01-20 21:48 UTC
  To: Chase Venters; +Cc: Andrew Morton, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1089 bytes --]

On Fri, 2006-01-20 at 14:04 -0600, Chase Venters wrote:
> On Fri, 20 Jan 2006, Andrew Morton wrote:
> >> Jan 15 06:05:09 vip 216477 pages slab
> >
> > It's all in slab.  800MB.
> >
> > I'd be suspecting a slab memory leak.  If it happens again, please take a
> > copy of /proc/slabinfo, send it.
> >
> 
> Andrew & Anton,
>  The culprit was 1.5 million SCSI commands in the scsi command cache. 
> 
> Thanks,
> Chase

I currently have this:
scsi_cmd_cache    1458778 1458790    384   10    1 : tunables   54 27    8 : slabdata 145879 145879      0

in /proc/slabinfo, which is pretty close to 1.5 million. The system is
working fine, but it shouldn't be very loaded anyway, so a memory leak
would not show up early. I just checked, and scsi_cmd_cache on my other
machines is under 100, so it does seem like a problem.

Unfortunately, although I'm a programmer, I have no idea what
/proc/slabinfo means, but I'm perfectly willing to provide a shell
on this machine (for Andrew or another famous developer, even a root
shell).

I'm attaching the /proc/slabinfo

Thanks for the help,
Anton



[-- Attachment #2: slab.gz --]
[-- Type: application/x-gzip, Size: 1863 bytes --]
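For readers who, like Anton, are unsure what the scsi_cmd_cache line means: in the slabinfo 2.x layout each data line reads `name active_objs num_objs objsize objperslab pagesperslab : tunables limit batchcount sharedfactor : slabdata active_slabs num_slabs sharedavail`. The sketch below parses Anton's line, with the mail client's line wrap removed, and estimates how much memory the cache pins. It assumes 4 KiB pages, as on x86. The parser is a hypothetical helper, not a kernel or procps API.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def parse_slabinfo_line(line: str) -> dict:
    """Parse one data line of /proc/slabinfo (version 2.x layout)."""
    head, tunables, slabdata = (part.split() for part in line.split(":"))
    fields = {"name": head[0]}
    fields.update(zip(
        ["active_objs", "num_objs", "objsize", "objperslab", "pagesperslab"],
        map(int, head[1:])))
    fields.update(zip(["limit", "batchcount", "sharedfactor"], map(int, tunables[1:])))
    fields.update(zip(["active_slabs", "num_slabs", "sharedavail"], map(int, slabdata[1:])))
    return fields


# Anton's line, rejoined after the mail client's line wrap.
line = ("scsi_cmd_cache    1458778 1458790    384   10    1 : "
        "tunables   54 27    8 : slabdata 145879 145879      0")
s = parse_slabinfo_line(line)

object_bytes = s["num_objs"] * s["objsize"]                   # memory in objects
slab_bytes = s["num_slabs"] * s["pagesperslab"] * PAGE_SIZE   # pages backing them
print(f"{object_bytes / 2**20:.1f} MiB in objects")   # 534.2 MiB in objects
print(f"{slab_bytes / 2**20:.1f} MiB in slab pages")  # 569.8 MiB in slab pages
```

Each 384-byte command sits ten to a one-page slab, so roughly 570 MiB of low memory is tied up by this one cache. That is most of the 896 MB or so of ZONE_NORMAL on a 32-bit x86 box.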

* Re: OOM Killer killing whole system
  2006-01-20 12:11 ` Andrew Morton
@ 2006-01-20 20:04   ` Chase Venters
  2006-01-20 21:48     ` Anton Titov
  0 siblings, 1 reply; 22+ messages in thread
From: Chase Venters @ 2006-01-20 20:04 UTC
  To: Andrew Morton; +Cc: Anton Titov, linux-kernel

On Fri, 20 Jan 2006, Andrew Morton wrote:
>> Jan 15 06:05:09 vip 216477 pages slab
>
> It's all in slab.  800MB.
>
> I'd be suspecting a slab memory leak.  If it happens again, please take a
> copy of /proc/slabinfo, send it.
>

Andrew & Anton,
 	I've experienced slab leaking in my system lately too. The culprit 
was 1.5 million SCSI commands in the scsi command cache. I haven't had an 
opportunity to look into it further yet; I'll try to copy you guys when I 
do.

Thanks,
Chase

* Re: OOM Killer killing whole system
  2006-01-15 15:05 Anton Titov
@ 2006-01-20 12:11 ` Andrew Morton
  2006-01-20 20:04   ` Chase Venters
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-01-20 12:11 UTC
  To: Anton Titov; +Cc: linux-kernel

Anton Titov <a.titov@host.bg> wrote:
>
> Yesterday I accidentally noticed a few OOM killer messages in the system log
> and left a console tailing the log overnight. At 6 in the morning the
> OOM killer went mad, generating 500 lines in the log, and 5 minutes later
> the system closed the ssh connection and became unresponsive. The guy in the
> datacenter told me that when he attached a keyboard, even Caps Lock was not
> working. In spite of this, the system still responded to ping (and only to ping).
> 
> The strange thing is that this machine is relatively lightly loaded; now,
> after 6 hours of uptime, free shows:
>               total       used       free     shared    buffers    cached
>  Mem:       2075468    1148564     926904          0     123472    314516
>  -/+ buffers/cache:     710576    1364892
>  Swap:      1004020          0    1004020
> 
> Load average stays under 0.5 most of the time. At 6 in the morning there
> should be almost no load (there are no cron jobs scheduled at that time).
> 
>  I'm attaching messages from the log and my .config.

What kernel version?  <looks in config.gz>.   2.6.15.


> Jan 15 06:05:09 vip Normal free:3700kB min:3756kB low:4692kB high:5632kB active:9964kB inactive:8532kB present:901120kB pages_scanned:19628 

Pretty much all of the ZONE_NORMAL memory is AWOL.
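A rough check of that remark, using only the kB figures from the Normal zone line quoted above. The sketch treats free plus active plus inactive pages as the part of the zone the page cache and LRU lists account for. Whatever remains is held by something else, such as slab, page tables or other kernel data. This is a back-of-the-envelope accounting, not how the kernel itself reports the zone.

```python
# kB figures copied from the "Normal" zone line of the OOM report quoted above.
zone = {"free": 3700, "min": 3756, "active": 9964, "inactive": 8532, "present": 901120}

# Free memory is below the "min" watermark, so ordinary allocations from
# this zone cannot be satisfied without reclaiming memory first.
below_min = zone["free"] < zone["min"]

# Pages on the LRU lists plus free pages; everything else in the zone is
# held by something the LRU does not track (slab, page tables, kernel data).
on_lru_or_free = zone["free"] + zone["active"] + zone["inactive"]
elsewhere = zone["present"] - on_lru_or_free
print(below_min, on_lru_or_free, elsewhere)  # True 22196 878924
```

Only about 22 MB of an 880 MB zone is free or reclaimable page cache. With `pages_scanned:19628` showing reclaim making no progress, an OOM kill is the expected outcome.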

> Jan 15 06:05:09 vip 216477 pages slab

It's all in slab.  800MB.
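The 800MB figure comes from converting the page count in the "216477 pages slab" line to bytes. A minimal sketch, assuming 4 KiB pages on this x86 machine:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages on this x86 machine

slab_pages = 216477  # from the "216477 pages slab" line in the OOM report
slab_bytes = slab_pages * PAGE_SIZE
print(slab_bytes)                      # 886689792
print(f"{slab_bytes / 10**6:.0f} MB")  # 887 MB
print(f"{slab_bytes // 1024} kB")      # 865908 kB
```

At roughly 866 MB in kB terms, slab alone is nearly the whole 901120kB Normal zone reported in the line quoted above. That is why the zone ran dry even though the machine has 2 GB of RAM.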

I'd be suspecting a slab memory leak.  If it happens again, please take a
copy of /proc/slabinfo, send it.


* OOM Killer killing whole system
@ 2006-01-15 15:05 Anton Titov
  2006-01-20 12:11 ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Anton Titov @ 2006-01-15 15:05 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 972 bytes --]

Yesterday I accidentally noticed a few OOM killer messages in the system log
and left a console tailing the log overnight. At 6 in the morning the
OOM killer went mad, generating 500 lines in the log, and 5 minutes later
the system closed the ssh connection and became unresponsive. The guy in the
datacenter told me that when he attached a keyboard, even Caps Lock was not
working. In spite of this, the system still responded to ping (and only to ping).

The strange thing is that this machine is relatively lightly loaded; now,
after 6 hours of uptime, free shows:
             total       used       free     shared    buffers    cached
Mem:       2075468    1148564     926904          0     123472    314516
-/+ buffers/cache:     710576    1364892
Swap:      1004020          0    1004020
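For reference, the "-/+ buffers/cache" row of free(1) subtracts buffers and page cache from "used" and adds them to "free", since that memory can normally be reclaimed. A quick check of the figures above (all in kB):

```python
total, used, free = 2075468, 1148564, 926904
buffers, cached = 123472, 314516

# "-/+ buffers/cache" row: buffers and page cache counted as free memory.
used_minus_cache = used - buffers - cached
free_plus_cache = free + buffers + cached

print(used_minus_cache, free_plus_cache)  # 710576 1364892
assert used + free == total
```

So only about 700MB was really in use at the time of this snapshot.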

Load average stays under 0.5 most of the time. At 6 in the morning there
should be almost no load (there are no cron jobs scheduled at that time).

I'm attaching messages from the log and my .config.

Anton Titov

[-- Attachment #2: log.gz --]
[-- Type: application/x-gzip, Size: 1320 bytes --]

[-- Attachment #3: config.gz --]
[-- Type: application/x-gzip, Size: 5700 bytes --]


Thread overview: 22+ messages
2006-01-23 12:55 OOM Killer killing whole system Nicolas Mailhot
  -- strict thread matches above, loose matches on Subject: below --
2006-01-15 15:05 Anton Titov
2006-01-20 12:11 ` Andrew Morton
2006-01-20 20:04   ` Chase Venters
2006-01-20 21:48     ` Anton Titov
2006-01-20 22:50       ` Andrew Morton
2006-01-21  0:09         ` Anton Titov
2006-01-21  0:19         ` Chase Venters
2006-01-21  0:50           ` Andrew Morton
2006-01-21  1:17             ` James Bottomley
2006-01-21  3:29               ` Anton Titov
2006-01-21  3:44                 ` Andrew Morton
2006-01-25 22:20                 ` Matthias Urlichs
2006-01-21  3:45               ` Anton Titov
2006-01-21  3:53                 ` Chase Venters
2006-01-21  4:21                   ` Anton Titov
2006-01-21  4:35                     ` Chase Venters
2006-01-23  2:56                       ` Kalin KOZHUHAROV
2006-01-23  3:28                         ` Chase Venters
2006-01-23  8:39           ` Jens Axboe
2006-01-23  8:41             ` Arjan van de Ven
2006-01-23  9:10               ` Chase Venters