linux-kernel.vger.kernel.org archive mirror
* Re: buffer_head slab memory leak, Linux bug?
From: Andi Kleen @ 2001-09-02 13:13 UTC (permalink / raw)
  To: Elisheva Alexander; +Cc: linux-kernel

Elisheva Alexander <ealexand@checkpoint.com> writes:


> It happens quite often (at random), so it's not too hard to recreate.

It is Linux telling you that your code is crappy ;)

It's easy to fix: change the lock so that it does not keep interrupts disabled
for such a long time. If you're writing non-driver network code, you likely don't
need an _irqsave lock anyway; a _bh lock should suffice. Better still, for locks
held this long, use a different lock structure that does not depend on blocking
bottom halves or interrupts (see how the TCP socket lock works, for example).

-Andi



* Re: buffer_head slab memory leak, Linux bug?
From: Alan Cox @ 2001-09-02 12:46 UTC (permalink / raw)
  To: eli7; +Cc: linux-kernel

> On an SMP machine I get:
> "stuck on TLB IPI wait (CPU#1)"
> The driver that I am debugging uses a spin lock, and sometimes we hold the
> lock for a pretty long time.

Basically, you can't hold a spinlock too long, or the kernel will conclude that the
other processor has hung. Anything that holds a spinlock long enough to trigger
that event is so non-scalable it's not funny.
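The failure Alan describes can be illustrated with a simplified, single-threaded C model. It assumes a handshake in which one CPU sends an inter-processor interrupt (IPI) and spins until the other CPU acknowledges it; since an IPI is an interrupt, the other CPU cannot service it while it sits in an interrupts-disabled (_irqsave) section. The waiter spins for a bounded number of iterations and then reports the stall. The names cpu_model, target_tick and flush_and_wait, and the tick-based timing, are hypothetical; this is not the code in arch/i386/kernel/smp.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: one "tick" is one pass of the waiter's loop. */
struct cpu_model {
    int irqs_off_ticks; /* ticks left before this CPU re-enables interrupts */
    bool ipi_pending;   /* IPI sent but not yet serviced */
};

/* Advance the target CPU by one tick. It can only run the IPI handler
 * (and so acknowledge) once its interrupts are enabled again. */
static void target_tick(struct cpu_model *c)
{
    if (c->irqs_off_ticks > 0)
        c->irqs_off_ticks--;   /* still inside the interrupts-off section */
    else
        c->ipi_pending = false; /* interrupts on: handler runs and acknowledges */
}

/* Waiting CPU: send the IPI, then spin for at most 'budget' ticks.
 * 'cpu' is only the number printed in the message.
 * Returns true if acknowledged, false if it gave up waiting. */
static bool flush_and_wait(struct cpu_model *target, long budget, int cpu)
{
    assert(budget >= 0);
    target->ipi_pending = true;
    while (target->ipi_pending) {
        if (budget-- == 0) {
            printf("stuck on TLB IPI wait (CPU#%d)\n", cpu);
            return false;
        }
        target_tick(target);
    }
    return true;
}
```

With a short interrupts-off section the acknowledgement arrives within the budget; with a long one the waiter gives up and prints the message. In this model the waiter only abandons its own wait; it does nothing to the lock held on the other CPU.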

It could also be that you have a locking error and are leaving the lock
held in some obscure case - and genuinely deadlocking the box.
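Alan's second possibility, a lock left held on some rarely taken path, is commonly guarded against in C by routing every exit through a single unlock label. The sketch below is a minimal userspace illustration; counted_lock, check_packet and its error codes are hypothetical stand-ins for the driver's real lock and code, and the depth counter exists only so a test can detect a leaked acquisition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock that records how deep it is held. */
struct counted_lock { int depth; };

static void lock(struct counted_lock *l)   { l->depth++; }
static void unlock(struct counted_lock *l) { assert(l->depth > 0); l->depth--; }

/* Hypothetical check with several early exits. Every path leaves through
 * 'out', so no error path can return with the lock still held. */
static int check_packet(struct counted_lock *l, int len, bool table_full)
{
    int ret;

    lock(l);
    if (len <= 0) {
        ret = -1;      /* malformed input: bail out, but via the unlock */
        goto out;
    }
    if (table_full) {
        ret = -2;      /* the rare "obscure case" that is easy to forget */
        goto out;
    }
    ret = len;         /* normal path */
out:
    unlock(l);
    return ret;
}
```

Adding a new error path then only means setting ret and jumping to out, rather than remembering a separate unlock before each return.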


> This happens during heavy load, which is why I think that the problem
> is that in smp_flush_tlb() in ./arch/i386/kernel/smp.c, one of the CPUs gets
> all upset that the other CPU is stuck in the lock for too long, and releases
> it before it was meant to be released.

Probably.

Alan


* buffer_head slab memory leak, Linux bug?
From: Elisheva Alexander @ 2001-09-02 11:01 UTC (permalink / raw)
  To: linux-kernel

Dear kernel list,

If anyone can send me some pointers or hints on how to tackle this bug, I
will be very happy.

On an SMP machine I get:
"stuck on TLB IPI wait (CPU#1)"
The driver that I am debugging uses a spin lock, and sometimes we hold the
lock for a pretty long time.
This happens during heavy load, which is why I think that the problem
is that in smp_flush_tlb() in ./arch/i386/kernel/smp.c, one of the CPUs gets
all upset that the other CPU is stuck in the lock for too long, and releases
it before it was meant to be released.

Things I tried that didn't help:

A patch that fixed a similar problem in reiserfs
(http://www.geocrawler.com/mail/msg.php3?msg_id=3962182&list=3455)
The patch for the fast Pentium problem, since I have a Pentium III
(http://www.ultraviolet.org/mail-archives/reiserfs.2000/6201.html)

I put a breakpoint where this occurs using kGDB, but I am not able to get
the registers (and stack) of the CPU that is stuck, only those of the one that
prints the message. So I don't really know where in our own code this
occurs.
Does anyone know how I can extract the stack of the second CPU at the
time of this error?

I am using a dual-CPU Intel Pentium III machine.
I am debugging Check Point's firewall and VPN modules, with kernel 2.2.14
from the Red Hat RPM, but this also happens with the latest 2.2.19.

It happens quite often (at random), so it's not too hard to recreate.

Thanks a lot.

(Please CC me, as I am not subscribed to the list.)

-- 
 Elisheva Alexander                          Software Developer
================================================================
 Email from people at checkpoint.com does not usually represent 
 official policy of Check Point (TM) Software Technologies Ltd.
