linux-kernel.vger.kernel.org archive mirror
* [PATCH -rt] Preemption problem in kernel RT Patch
@ 2007-06-21 19:39 Beauchemin, Mark
  2007-06-22  8:03 ` Thomas Gleixner
  2009-10-14 12:55 ` dtslinux
  0 siblings, 2 replies; 15+ messages in thread
From: Beauchemin, Mark @ 2007-06-21 19:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: mingo, tglx

Hi,
	I've found a preemption problem in kernel/rtmutex.c:649.  The BUG_ON listed in the patch below makes sure a preemption event hasn't occurred since the thread last checked the owner of the lock.  If it did happen and the current task is now the owner, it asserts with BUG_ON.  With the RT-PATCH applied, however, interrupts are not disabled and preemption is possible.  The following patch removes the BUG_ON as it is an incorrect check in the rt kernel.  I've checked the rtmutex code and it appears to handle this case just fine.

	Thanks,

		Mark Beauchemin

Here's the patch:

--- linux-2.6.21.3-rt9/kernel/rtmutex.c 2007-06-01 15:21:12.000000000 -0400
+++ linux-2.6.21.3-rt9_new/kernel/rtmutex.c     2007-06-20 12:15:44.000000000 -0400
@@ -646,7 +646,7 @@
          return;
        }
 
-       BUG_ON(rt_mutex_owner(lock) == current);
+/*     BUG_ON(rt_mutex_owner(lock) == current); */
 
        /*
         * Here we save whatever state the task was in originally,



Here's the bug assertion:


: ------------[ cut here ]------------                                                                                
Kernel BUG at c01c7bc4 [verbose debug info unavailable]                                                               
Oops: Exception in kernel mode, sig: 5 [#1]                                                                           
PREEMPT                                                                                                               
NIP: C01C7BC4 LR: C01C7BA8 CTR: C01531CC
REGS: d010ba90 TRAP: 0700   Tainted: P
MSR: 00021000 <ME>  CR: 24002082  XER: 00000000                                                                       
TASK = d0100920[5] 'softirq-timer/0' THREAD: d010a000                                                                 
GPR00: 00000001 D010BB40 D0100920 00000000 00000030 00000002 C0260000 00029000                                        
GPR08: D0100920 00000000 D2C07970 D0100920 9F472D4B 00121868 C0220000 00000000                                        
GPR16: 00000000 000F422C 29000000 0000003B 9ACA0000 C026562C D010A028 00004000                                        
GPR24: D191F898 00000000 D2C1CAC8 D2C1CB60 D2C07960 CEA752C0 D010A000 00029000                                        
NIP [C01C7BC4] rt_spin_lock_slowlock+0x60/0x1f8                                                                       
LR [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8                                                                        
Call Trace:                                                                                                           
[D010BB40] [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8 (unreliable)                                                   
[D010BB90] [C0153464] dev_queue_xmit+0x298/0x2a0         Tunnel2                                                      
[D010BBB0] [C0176398] ip_output+0x288/0x2dc                                                                           
[D010BBE0] [C01AC078] ipip_tunnel_xmit+0x508/0x698                                                                    
[D010BC60] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4                                                                 
[D010BC80] [C0153430] dev_queue_xmit+0x264/0x2a0         Tunnel4                                                      
[D010BCA0] [C0176398] ip_output+0x288/0x2dc                                                                           
[D010BCD0] [C01AC078] ipip_tunnel_xmit+0x508/0x698                                                                    
[D010BD50] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4                                                                 
[D010BD70] [C0153430] dev_queue_xmit+0x264/0x2a0         Tunnel2                                                      
[D010BD90] [C0176398] ip_output+0x288/0x2dc                                                                           
[D010BDC0] [C017685C] ip_queue_xmit+0x1ac/0x4e4                                                                       
[D010BE30] [C018762C] tcp_transmit_skb+0x390/0x810                                                                    
[D010BE70] [C018882C] tcp_retransmit_skb+0x160/0x638                                                                  
[D010BEA0] [C018BA5C] tcp_write_timer+0x274/0x6c0                                                                     
[D010BED0] [C0024314] run_timer_softirq+0x2d0/0xedc                                                                   
[D010BF80] [C001F1C4] ksoftirqd+0xf8/0x1b0                                                                            
[D010BFC0] [C0031588] kthread+0xc0/0xfc                                                                               
[D010BFF0] [C000471C] kernel_thread+0x44/0x60                                                                         
Instruction dump:                                                                                                     
913e000c 80030004 2f800000 419e0188 4be73599 2f830000 409e0144 801c0010                                               
5400003a 7c001278 7c000034 5400d97e <0f000000> 39200004 7f401028 7d20112d                                             
note: softirq-timer/0[5] exited with preempt_count 1                                                                  






* Re: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-06-21 19:39 [PATCH -rt] Preemption problem in kernel RT Patch Beauchemin, Mark
@ 2007-06-22  8:03 ` Thomas Gleixner
  2007-06-23 14:08   ` Beauchemin, Mark
  2009-10-14 12:55 ` dtslinux
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2007-06-22  8:03 UTC (permalink / raw)
  To: Beauchemin, Mark; +Cc: linux-kernel, mingo, David Miller

Mark,

please fix your mail client to do proper line wraps at column 78.

On Thu, 2007-06-21 at 15:39 -0400, Beauchemin, Mark wrote:
> Hi,
> 	I've found a preemption problem in kernel/rtmutex.c:649.  The BUG_ON
> listed in the patch below makes sure a preemption event hasn't
> occurred since the thread last checked the owner of the lock.  If it
> did happen and the current task is now the owner, it asserts with
> BUG_ON.  With the RT-PATCH applied, however, interrupts are not
> disabled and preemption is possible.  The following patch removes the
> BUG_ON as it is an incorrect check in the rt kernel. I've checked the
> rtmutex code and it appears to handle this case just fine.

Nice, but nevertheless wrong theory.

This check is part of the RT-Patch and it _is_ entirely correct: 
Something tries to do a spin_lock() on a lock, which the same task has
already locked before. That's what the BUG_ON is catching.

There is nothing which can make a task magically the owner of a lock,
whether preemption is enabled or not.

> Call Trace:                                                                                                           
> [D010BB40] [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8 (unreliable)                                                   
> [D010BB90] [C0153464] dev_queue_xmit+0x298/0x2a0         Tunnel2                                                      
> [D010BBB0] [C0176398] ip_output+0x288/0x2dc                                                                           
> [D010BBE0] [C01AC078] ipip_tunnel_xmit+0x508/0x698                                                                    
> [D010BC60] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4                                                                 
> [D010BC80] [C0153430] dev_queue_xmit+0x264/0x2a0         Tunnel4                                                      
> [D010BCA0] [C0176398] ip_output+0x288/0x2dc                                                                           
> [D010BCD0] [C01AC078] ipip_tunnel_xmit+0x508/0x698                                                                    
> [D010BD50] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4                                                                 
> [D010BD70] [C0153430] dev_queue_xmit+0x264/0x2a0         Tunnel2                                                      
> [D010BD90] [C0176398] ip_output+0x288/0x2dc                                                                           
> [D010BDC0] [C017685C] ip_queue_xmit+0x1ac/0x4e4                                                                       
> [D010BE30] [C018762C] tcp_transmit_skb+0x390/0x810                                                                    
> [D010BE70] [C018882C] tcp_retransmit_skb+0x160/0x638                                                                  
> [D010BEA0] [C018BA5C] tcp_write_timer+0x274/0x6c0                                                                     
> [D010BED0] [C0024314] run_timer_softirq+0x2d0/0xedc                                                                   
> [D010BF80] [C001F1C4] ksoftirqd+0xf8/0x1b0                                                                            
> [D010BFC0] [C0031588] kthread+0xc0/0xfc                                                                               
> [D010BFF0] [C000471C] kernel_thread+0x44/0x60                                                                         

Looks like the tunnel code is doing a nasty recursive thing. 

Dave, any idea ?

Mark, can you please turn on CONFIG_PROVE_LOCKING. This should produce
more detailed information about the problem.

	tglx
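
To illustrate the point above, here is a minimal user-space sketch of the
condition that the BUG_ON tests.  It is not the rtmutex code: struct task,
struct toy_lock and the toy_lock_* helpers are hypothetical stand-ins, and
the sketch assumes a single thread driving the lock by hand.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel task; not the real task_struct. */
struct task {
	const char *name;
};

/* Hypothetical lock that only records its owner. */
struct toy_lock {
	struct task *owner;	/* NULL while unlocked */
};

enum lock_result { LOCK_OK, LOCK_BUSY, LOCK_SELF_DEADLOCK };

/* Try to take the lock on behalf of 'cur'. */
static enum lock_result toy_lock_acquire(struct toy_lock *l, struct task *cur)
{
	if (l->owner == NULL) {
		l->owner = cur;
		return LOCK_OK;
	}
	/*
	 * Same condition as BUG_ON(rt_mutex_owner(lock) == current):
	 * the caller already holds the lock it is asking for.
	 */
	if (l->owner == cur)
		return LOCK_SELF_DEADLOCK;
	return LOCK_BUSY;	/* a real lock would block here */
}

/* Release the lock; only the owner may do so. */
static void toy_lock_release(struct toy_lock *l, struct task *cur)
{
	if (l->owner == cur)
		l->owner = NULL;
}
```

The only way to reach LOCK_SELF_DEADLOCK in this model is for the same task
to call toy_lock_acquire() twice without a release in between.  Nothing else
ever writes that task's pointer into the owner field.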




* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-06-22  8:03 ` Thomas Gleixner
@ 2007-06-23 14:08   ` Beauchemin, Mark
  2007-06-23 14:26     ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Beauchemin, Mark @ 2007-06-23 14:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, mingo, David Miller

Thomas,


> please fix your mail client to do proper line wraps at column 78.

	Outlook sucks.  I'll install thunderbird this weekend.  sorry.

> Nice, but nevertheless wrong theory.
> 
> This check is part of the RT-Patch and it _is_ entirely correct: 
> Something tries to do a spin_lock() on a lock, which the same task has
> already locked before. That's what the BUG_ON is catching.
>
> There is nothing which can make a task magically the owner of a lock,
> whether preemption is enabled or not.

	Thanks for straightening me out.  I was reading the function 
try_to_take_rt_mutex wrong...  The problem makes more sense now.  The tunnel 
code encapsulates the current packet in a new packet and calls ip_output 
to get it to the destination.  If the routing table is changing (which 
I'm doing when this happens) it could be called recursively.  The tunnel
code tries to handle recursion at the top of ipip_tunnel_xmit:

	if (tunnel->recursion++) {
		tunnel->stat.collisions++;
		goto tx_error;
	} 

	The problem is it tries to take dev->lock which it already owns in 
dev_queue_xmit before the check for recursion.

	Unfortunately, every time I put in debug to see the routing 
changes which cause the bug, it doesn't happen.  I'll certainly try to 
reproduce it with CONFIG_PROVE_LOCKING on, but it won't be till end of next 
week as we have a release going out.

	Thanks for your help,

		Mark Beauchemin
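
The ordering problem described above can be modelled outside the kernel.
What follows is a rough sketch only: toy_dev, toy_queue_xmit and
toy_tunnel_xmit are made-up names that mimic the shape of dev_queue_xmit
and ipip_tunnel_xmit, a plain owner pointer stands in for the transmit
lock, and a single task is assumed.

```c
#include <stddef.h>

struct task {
	const char *name;
};

/* Hypothetical device: a transmit-lock owner plus a recursion guard. */
struct toy_dev {
	const char *name;
	struct task *tx_owner;		/* holder of the transmit lock, or NULL */
	int recursion;			/* guard counter, like tunnel->recursion */
	struct toy_dev *next_hop;	/* device the encapsulated packet goes to */
};

enum xmit_result { XMIT_DONE, XMIT_GUARD_HIT, XMIT_RELOCK };

static enum xmit_result toy_queue_xmit(struct toy_dev *dev, struct task *cur);

/* Device transmit hook: check the guard, then re-enter the stack. */
static enum xmit_result toy_tunnel_xmit(struct toy_dev *dev, struct task *cur)
{
	enum xmit_result r = XMIT_DONE;

	if (dev->recursion++) {
		dev->recursion--;
		return XMIT_GUARD_HIT;
	}
	if (dev->next_hop)
		r = toy_queue_xmit(dev->next_hop, cur);
	dev->recursion--;
	return r;
}

/* Queueing layer: takes the transmit lock before calling the hook. */
static enum xmit_result toy_queue_xmit(struct toy_dev *dev, struct task *cur)
{
	enum xmit_result r;

	/* The lock is requested here, before the guard above can run. */
	if (dev->tx_owner == cur)
		return XMIT_RELOCK;	/* deadlock or BUG in a real kernel */

	dev->tx_owner = cur;		/* single task assumed, so never contended */
	r = toy_tunnel_xmit(dev, cur);
	dev->tx_owner = NULL;
	return r;
}
```

With two devices routed at each other (Tunnel2 to Tunnel4 and back), the
second entry for Tunnel2 stops at the lock and never reaches the guard in
toy_tunnel_xmit(), which matches the shape of the backtrace.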







* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-06-23 14:08   ` Beauchemin, Mark
@ 2007-06-23 14:26     ` Thomas Gleixner
  2007-07-24 15:48       ` Beauchemin, Mark
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2007-06-23 14:26 UTC (permalink / raw)
  To: Beauchemin, Mark; +Cc: linux-kernel, mingo, David Miller

Mark,

On Sat, 2007-06-23 at 10:08 -0400, Beauchemin, Mark wrote:
> 	Thanks for straightening me out.  I was reading the function 
> try_to_take_rt_mutex wrong...  The problem makes more sense now.  The tunnel 
> code encapsulates the current packet in a new packet and calls ip_output 
> to get it to the destination.  If the routing table is changing (which 
> I'm doing when this happens) it could be called recursively.  The tunnel
> code tries to handle recursion at the top of ipip_tunnel_xmit:
> 
> 	if (tunnel->recursion++) {
> 		tunnel->stat.collisions++;
> 		goto tx_error;
> 	} 
> 
> 	The problem is it tries to take dev->lock which it already owns in 
> dev_queue_xmit before the check for recursion.

Hmm, this sounds scary. On a vanilla kernel (with debugging disabled),
this code will simply deadlock.

Do you have a test case? If you need more help, please contact the
netdev folks (netdev@vger.kernel.org).

> 	Unfortunately, every time I put in debug to see the routing 
> changes which cause the bug, it doesn't happen.  I'll certainly try to 
> reproduce it with CONFIG_PROVE_LOCKING on, but it won't be till end of next 
> week as we have a release going out.

Well, you won't see much more than you already debugged. You see the
place where the lock was taken and the call trace of the function in the
same way you have seen it with the BUG_ON().

	tglx




* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-06-23 14:26     ` Thomas Gleixner
@ 2007-07-24 15:48       ` Beauchemin, Mark
  2007-07-24 19:15         ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Beauchemin, Mark @ 2007-07-24 15:48 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, mingo, David Miller

Thomas,

	I think I've gotten to the heart of the problem.  Here's an excerpt
from the latest -rt patch:  net/core/dev.c in the function dev_queue_xmit


@@ -1588,11 +1588,17 @@ gso:
 	   Either shot noqueue qdisc, it is even simpler 8)
 	 */
 	if (dev->flags & IFF_UP) {
-		int cpu = smp_processor_id(); /* ok because BHs are off */
+		/*
+		 * No need to check for recursion with threaded interrupts:
+		 */
+#ifdef CONFIG_PREEMPT_RT
+		if (1) {
+#else
+		int cpu = raw_smp_processor_id(); /* ok because BHs are off */
 
 		if (dev->xmit_lock_owner != cpu) {
-
-			HARD_TX_LOCK(dev, cpu);
+#endif
+			HARD_TX_LOCK(dev);
 
 			if (!netif_queue_stopped(dev) &&
 			    !netif_subqueue_stopped(dev, skb->queue_mapping)) {


I'm not sure why the check for recursion has been removed.  
In the backtrace below, I think it would be caught by this check and 
not recursively call the spinlock code.

> Call Trace:                                                                                                           
> [D010BB40] [C01C7BA8] rt_spin_lock_slowlock+0x44/0x1f8 (unreliable)                                                   
> [D010BB90] [C0153464] dev_queue_xmit+0x298/0x2a0         Tunnel2                                                      
> [D010BBB0] [C0176398] ip_output+0x288/0x2dc                                                                           
> [D010BBE0] [C01AC078] ipip_tunnel_xmit+0x508/0x698                                                                    
> [D010BC60] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4                                                                 
> [D010BC80] [C0153430] dev_queue_xmit+0x264/0x2a0         Tunnel4                                                      
> [D010BCA0] [C0176398] ip_output+0x288/0x2dc                                                                           
> [D010BCD0] [C01AC078] ipip_tunnel_xmit+0x508/0x698                                                                    
> [D010BD50] [C0150DF4] dev_hard_start_xmit+0x1b4/0x2a4                                                                 
> [D010BD70] [C0153430] dev_queue_xmit+0x264/0x2a0         Tunnel2                                                      
> [D010BD90] [C0176398] ip_output+0x288/0x2dc                                                                           
> [D010BDC0] [C017685C] ip_queue_xmit+0x1ac/0x4e4                                                                       
> [D010BE30] [C018762C] tcp_transmit_skb+0x390/0x810                                                                    
> [D010BE70] [C018882C] tcp_retransmit_skb+0x160/0x638                                                                  
> [D010BEA0] [C018BA5C] tcp_write_timer+0x274/0x6c0                                                                     
> [D010BED0] [C0024314] run_timer_softirq+0x2d0/0xedc                                                                   
> [D010BF80] [C001F1C4] ksoftirqd+0xf8/0x1b0                                                                            
> [D010BFC0] [C0031588] kthread+0xc0/0xfc                                                                               
> [D010BFF0] [C000471C] kernel_thread+0x44/0x60                                                                         



I found one other place in the code which appears to do the same thing.
Although it is written to handle smp collisions, I think it should also
handle the error case above.


Index: linux-rt-rebase.q/net/sched/sch_generic.c
===================================================================
--- linux-rt-rebase.q.orig/net/sched/sch_generic.c
+++ linux-rt-rebase.q/net/sched/sch_generic.c
@@ -12,6 +12,7 @@
  */
 
 #include <linux/bitops.h>
+#include <linux/kallsyms.h>
 #include <linux/module.h>
 #include <linux/types.h>
 #include <linux/kernel.h>
@@ -150,16 +151,28 @@ static inline int qdisc_restart(struct n
 	 */
 	lockless = (dev->features & NETIF_F_LLTX);
 
-	if (!lockless && !netif_tx_trylock(dev)) {
-		/* Another CPU grabbed the driver tx lock */
-		return handle_dev_cpu_collision(skb, dev, q);
+	if (!lockless) {
+#ifdef CONFIG_PREEMPT_RT
+		netif_tx_lock(dev);
+#else
+		if (netif_tx_trylock(dev))
+			/* Another CPU grabbed the driver tx lock */
+			return handle_dev_cpu_collision(skb, dev, q);
+#endif


	What do you think?

		Thanks,
			Mark



* Re: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-07-24 15:48       ` Beauchemin, Mark
@ 2007-07-24 19:15         ` Ingo Molnar
  2007-07-24 19:36           ` Beauchemin, Mark
  2007-08-01 14:15           ` Beauchemin, Mark
  0 siblings, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2007-07-24 19:15 UTC (permalink / raw)
  To: Beauchemin, Mark; +Cc: Thomas Gleixner, linux-kernel, David Miller


* Beauchemin, Mark <Mark.Beauchemin@sycamorenet.com> wrote:

> I'm not sure why the check for recursion has been removed.  In the 
> backtrace below, I think it would be caught by this check and not 
> recursively call the spinlock code.

ah ... i think i did it like that because i didnt realize that there 
would be a recursive call sequence, i was concentrating on recursive 
locking.

incidentally, this code got cleaned up in .23-rc1-rt0, and now it looks 
quite similar to your suggested fix. Could you double-check that it 
solves your problem?

	Ingo


* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-07-24 19:15         ` Ingo Molnar
@ 2007-07-24 19:36           ` Beauchemin, Mark
  2007-08-01 14:15           ` Beauchemin, Mark
  1 sibling, 0 replies; 15+ messages in thread
From: Beauchemin, Mark @ 2007-07-24 19:36 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller

Ingo,

> ah ... i think i did it like that because i didnt realize that there 
> would be a recursive call sequence, i was concentrating on recursive 
> locking.

	That makes sense.  It doesn't seem right that it gets called 
recursively in the first place.  I'm not sure, however, if the tunnel 
code can help it as it uses ip_output to send packets.

> incidentally, this code got cleaned up in .23-rc1-rt0, and now it looks 
> quite similar to your suggested fix. Could you double-check that it 
> solves your problem?

I've downloaded .23-rc1-rt0.  It appears the issue remains in both places
as seen below:


net/core/dev.c:  dev_queue_xmit

#ifdef CONFIG_PREEMPT_RT
		if (1) {
#else
		int cpu = raw_smp_processor_id(); /* ok because BHs are off */

		if (dev->xmit_lock_owner != cpu) {
#endif


and net/sched/sch_generic.c: qdisc_restart

#ifdef CONFIG_PREEMPT_RT
		netif_tx_lock(dev);
#else
		if (netif_tx_trylock(dev))
			/* Another CPU grabbed the driver tx lock */
			return handle_dev_cpu_collision(skb, dev, q);
#endif

	Mark

	


* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-07-24 19:15         ` Ingo Molnar
  2007-07-24 19:36           ` Beauchemin, Mark
@ 2007-08-01 14:15           ` Beauchemin, Mark
  2007-08-01 14:22             ` Beauchemin, Mark
  1 sibling, 1 reply; 15+ messages in thread
From: Beauchemin, Mark @ 2007-08-01 14:15 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller

Ingo,

I tried just removing the CONFIG_PREEMPT_RT code, but that drops 
packets if another task has the lock.  Here are the debug printouts:

<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/
<4>xmit_lock_owner owned by sigd not softirq-net-rx/

Our quality department has been testing the patch below for a 
few days and has not seen any problems.  It pretty much 
preserves the original -rt patch pieces, but adds recursive checking.

I changed xmit_lock_owner to a void * as it is now a pointer
to the task which owns the lock.  What do you think?
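
One plausible reading of the debug output above is that a CPU-number owner
check cannot tell a recursive call apart from a different task that was
preempted on the same CPU while holding the lock.  The sketch below
compares the two kinds of bookkeeping in user space.  struct task,
struct toy_dev and the looks_recursive_* helpers are hypothetical, and
placing both tasks on CPU 0 is an assumption made for the example, not
something the log states.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task: a name plus the CPU it last ran on. */
struct task {
	const char *name;
	int cpu;
};

/* Hypothetical device that records its lock owner both ways. */
struct toy_dev {
	bool locked;
	int owner_cpu;			/* CPU-id bookkeeping, -1 when unlocked */
	struct task *owner_task;	/* task bookkeeping, NULL when unlocked */
};

/* Record 't' as the holder of the transmit lock. */
static void toy_take(struct toy_dev *d, struct task *t)
{
	d->locked = true;
	d->owner_cpu = t->cpu;
	d->owner_task = t;
}

/* Recursion test keyed on the CPU number. */
static bool looks_recursive_by_cpu(const struct toy_dev *d,
				   const struct task *t)
{
	return d->locked && d->owner_cpu == t->cpu;
}

/* Recursion test keyed on the task pointer. */
static bool looks_recursive_by_task(const struct toy_dev *d,
				    const struct task *t)
{
	return d->locked && d->owner_task == t;
}
```

If "sigd" holds the lock and "softirq-net-rx" then runs on the same CPU,
the CPU-keyed test reports recursion although no recursion took place,
while the task-keyed test only fires for the task that really holds the
lock.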

Thanks,
	Mark

diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h      2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h  2007-08-01 09:01:32.000000000 -0400
@@ -468,7 +468,11 @@
        /* cpu id of processor entered to hard_start_xmit or -1,
           if nobody entered there.
         */
-       int                     xmit_lock_owner;
+#ifdef CONFIG_PREEMPT_RT
+       void *                  xmit_lock_owner;
+#else
+       int                     xmit_lock_owner;
+#endif
        void                    *priv;  /* pointer to private data      */
        int                     (*hard_start_xmit) (struct sk_buff *skb,
                                                    struct net_device *dev);
@@ -1041,32 +1045,54 @@
 static inline void netif_tx_lock(struct net_device *dev)
 {
        spin_lock(&dev->_xmit_lock);
-       dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)current;
+#else
+       dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
        spin_lock_bh(&dev->_xmit_lock);
-       dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)current;
+#else
+       dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline int netif_tx_trylock(struct net_device *dev)
 {
        int ok = spin_trylock(&dev->_xmit_lock);
        if (likely(ok))
-               dev->xmit_lock_owner = raw_smp_processor_id();
+       {
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)current;
+#else
+       dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
+    }
        return ok;
 }
 
 static inline void netif_tx_unlock(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
        dev->xmit_lock_owner = -1;
+#else
+       dev->xmit_lock_owner = (void *)-1;
+#endif
        spin_unlock(&dev->_xmit_lock);
 }
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
        dev->xmit_lock_owner = -1;
+#else
+       dev->xmit_lock_owner = (void *)-1;
+#endif
        spin_unlock_bh(&dev->_xmit_lock);
 }
 
diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c 2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c     2007-08-01 08:56:02.000000000 -0400
@@ -1592,7 +1592,7 @@
                 * No need to check for recursion with threaded interrupts:
                 */
 #ifdef CONFIG_PREEMPT_RT
-               if (1) {
+               if (dev->xmit_lock_owner != (void *)current) {
 #else
                int cpu = raw_smp_processor_id(); /* ok because BHs are off */
 
diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c        2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c    2007-08-01 08:57:14.000000000 -0400
@@ -153,7 +153,13 @@
 
        if (!lockless) {
 #ifdef CONFIG_PREEMPT_RT
-               netif_tx_lock(dev);
+        if (dev->xmit_lock_owner == (void *)current) {
+            kfree_skb(skb);
+            if (net_ratelimit())
+                printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+            return -1;
+        }
+        netif_tx_lock(dev);
 #else
                if (netif_tx_trylock(dev))
                        /* Another CPU grabbed the driver tx lock */




* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-08-01 14:15           ` Beauchemin, Mark
@ 2007-08-01 14:22             ` Beauchemin, Mark
  2007-08-06  7:13               ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Beauchemin, Mark @ 2007-08-01 14:22 UTC (permalink / raw)
  To: Beauchemin, Mark, Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller

sorry..  I sent the wrong patch file.  There was a warning in the other one.



diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h      2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h  2007-08-01 09:10:01.000000000 -0400
@@ -468,7 +468,11 @@
        /* cpu id of processor entered to hard_start_xmit or -1,
           if nobody entered there.
         */
-       int                     xmit_lock_owner;
+#ifdef CONFIG_PREEMPT_RT
+       void *                  xmit_lock_owner;
+#else
+       int                     xmit_lock_owner;
+#endif
        void                    *priv;  /* pointer to private data      */
        int                     (*hard_start_xmit) (struct sk_buff *skb,
                                                    struct net_device *dev);
@@ -1041,32 +1045,54 @@
 static inline void netif_tx_lock(struct net_device *dev)
 {
        spin_lock(&dev->_xmit_lock);
-       dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)current;
+#else
+       dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
        spin_lock_bh(&dev->_xmit_lock);
-       dev->xmit_lock_owner = raw_smp_processor_id();
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)current;
+#else
+       dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
 }
 
 static inline int netif_tx_trylock(struct net_device *dev)
 {
        int ok = spin_trylock(&dev->_xmit_lock);
        if (likely(ok))
-               dev->xmit_lock_owner = raw_smp_processor_id();
+       {
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)current;
+#else
+       dev->xmit_lock_owner = raw_smp_processor_id();
+#endif
+    }
        return ok;
 }
 
 static inline void netif_tx_unlock(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)-1;
+#else
        dev->xmit_lock_owner = -1;
+#endif
        spin_unlock(&dev->_xmit_lock);
 }
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
+#ifdef CONFIG_PREEMPT_RT
+       dev->xmit_lock_owner = (void *)-1;
+#else
        dev->xmit_lock_owner = -1;
+#endif
        spin_unlock_bh(&dev->_xmit_lock);
 }
 
diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c 2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c     2007-08-01 08:56:02.000000000 -0400
@@ -1592,7 +1592,7 @@
                 * No need to check for recursion with threaded interrupts:
                 */
 #ifdef CONFIG_PREEMPT_RT
-               if (1) {
+               if (dev->xmit_lock_owner != (void *)current) {
 #else
                int cpu = raw_smp_processor_id(); /* ok because BHs are off */
 
diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c        2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c    2007-08-01 08:57:14.000000000 -0400
@@ -153,7 +153,13 @@
 
        if (!lockless) {
 #ifdef CONFIG_PREEMPT_RT
-               netif_tx_lock(dev);
+        if (dev->xmit_lock_owner == (void *)current) {
+            kfree_skb(skb);
+            if (net_ratelimit())
+                printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+            return -1;
+        }
+        netif_tx_lock(dev);
 #else
                if (netif_tx_trylock(dev))
                        /* Another CPU grabbed the driver tx lock */


* Re: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-08-01 14:22             ` Beauchemin, Mark
@ 2007-08-06  7:13               ` Ingo Molnar
  2007-08-07 19:41                 ` Beauchemin, Mark
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2007-08-06  7:13 UTC (permalink / raw)
  To: Beauchemin, Mark; +Cc: Thomas Gleixner, linux-kernel, David Miller


* Beauchemin, Mark <Mark.Beauchemin@sycamorenet.com> wrote:

> sorry..  I sent the wrong patch file.  There was a warning in the 
> other one.

> -       int                     xmit_lock_owner;
> +#ifdef CONFIG_PREEMPT_RT
> +       void *                  xmit_lock_owner;
> +#else
> +       int                     xmit_lock_owner;
> +#endif

could you please change this to use 'current' (instead of the CPU 
number) as the xmit_lock_owner unconditionally? That results in much 
fewer #ifdefs and far cleaner code.

	Ingo
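
The suggestion above can be sketched in user space. This is only an illustration, not kernel code: `struct tx_lock`, `current_task()` and `NO_OWNER` are hypothetical stand-ins for the `_xmit_lock`/`xmit_lock_owner` pair, the kernel's `current`, and the `(void *)-1` sentinel. The idea is that a task identity still names the lock holder if that holder is preempted or moved to another CPU, which a CPU number cannot do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Sentinel meaning "nobody holds the lock", like (void *)-1 in the patch. */
#define NO_OWNER ((void *)-1)

/* Hypothetical stand-in for the _xmit_lock / xmit_lock_owner pair. */
struct tx_lock {
	atomic_flag locked;
	_Atomic(void *) owner;
};

/* Hypothetical stand-in for 'current': one unique address per thread. */
static _Thread_local char task_tag;

static void *current_task(void)
{
	return &task_tag;
}

static void tx_lock_init(struct tx_lock *l)
{
	atomic_flag_clear(&l->locked);
	atomic_init(&l->owner, NO_OWNER);
}

static void tx_lock(struct tx_lock *l)
{
	/* Spin until the flag is ours, then record who took it. */
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;
	atomic_store_explicit(&l->owner, current_task(), memory_order_relaxed);
}

static int tx_lock_held_by_me(struct tx_lock *l)
{
	return atomic_load_explicit(&l->owner, memory_order_relaxed) == current_task();
}

static void tx_unlock(struct tx_lock *l)
{
	/* Only the recorded owner may unlock; clear the owner before releasing. */
	assert(tx_lock_held_by_me(l));
	atomic_store_explicit(&l->owner, NO_OWNER, memory_order_relaxed);
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Because the owner is a per-task value, the same comparison works with or without CONFIG_PREEMPT_RT, which is what removes the need for the #ifdefs.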


* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-08-06  7:13               ` Ingo Molnar
@ 2007-08-07 19:41                 ` Beauchemin, Mark
  2007-09-17 13:03                   ` Beauchemin, Mark
  0 siblings, 1 reply; 15+ messages in thread
From: Beauchemin, Mark @ 2007-08-07 19:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel, David Miller



> could you please change this to use 'current' (instead of the CPU 
> number) as the xmit_lock_owner unconditionally? That results in much 
> fewer #ifdefs and far cleaner code.
> 
> 	Ingo

Ingo,

	Here's the new patch.  Please check me on the non-rt portion.  
I think the check is still functionally the same.  

Thanks,
	Mark  


diff -ur linux-2.6.23-rc1-rt0/include/linux/netdevice.h linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h
--- linux-2.6.23-rc1-rt0/include/linux/netdevice.h	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/include/linux/netdevice.h	2007-08-06 09:26:51.000000000 -0400
@@ -468,7 +468,7 @@
 	/* cpu id of processor entered to hard_start_xmit or -1,
 	   if nobody entered there.
 	 */
-	int			xmit_lock_owner;
+	void *			xmit_lock_owner;
 	void			*priv;	/* pointer to private data	*/
 	int			(*hard_start_xmit) (struct sk_buff *skb,
 						    struct net_device *dev);
@@ -1041,32 +1041,34 @@
 static inline void netif_tx_lock(struct net_device *dev)
 {
 	spin_lock(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+ 	dev->xmit_lock_owner = (void *)current;
 }
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
 	spin_lock_bh(&dev->_xmit_lock);
-	dev->xmit_lock_owner = raw_smp_processor_id();
+ 	dev->xmit_lock_owner = (void *)current;
 }
 
 static inline int netif_tx_trylock(struct net_device *dev)
 {
 	int ok = spin_trylock(&dev->_xmit_lock);
 	if (likely(ok))
-		dev->xmit_lock_owner = raw_smp_processor_id();
+	{
+     	dev->xmit_lock_owner = (void *)current;
+    }
 	return ok;
 }
 
 static inline void netif_tx_unlock(struct net_device *dev)
 {
-	dev->xmit_lock_owner = -1;
+	dev->xmit_lock_owner = (void *)-1;
 	spin_unlock(&dev->_xmit_lock);
 }
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
-	dev->xmit_lock_owner = -1;
+	dev->xmit_lock_owner = (void *)-1;
 	spin_unlock_bh(&dev->_xmit_lock);
 }
 
diff -ur linux-2.6.23-rc1-rt0/net/core/dev.c linux-2.6.23-rc1-rt0_new/net/core/dev.c
--- linux-2.6.23-rc1-rt0/net/core/dev.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/core/dev.c	2007-08-07 15:22:31.000000000 -0400
@@ -1588,16 +1588,7 @@
 	   Either shot noqueue qdisc, it is even simpler 8)
 	 */
 	if (dev->flags & IFF_UP) {
-		/*
-		 * No need to check for recursion with threaded interrupts:
-		 */
-#ifdef CONFIG_PREEMPT_RT
-		if (1) {
-#else
-		int cpu = raw_smp_processor_id(); /* ok because BHs are off */
-
-		if (dev->xmit_lock_owner != cpu) {
-#endif
+		if (dev->xmit_lock_owner != (void *)current) {
 			HARD_TX_LOCK(dev);
 
 			if (!netif_queue_stopped(dev) &&
@@ -3349,7 +3340,7 @@
 	spin_lock_init(&dev->queue_lock);
 	spin_lock_init(&dev->_xmit_lock);
 	netdev_set_lockdep_class(&dev->_xmit_lock, dev->type);
-	dev->xmit_lock_owner = -1;
+	dev->xmit_lock_owner = (void *)-1;
 	spin_lock_init(&dev->ingress_lock);
 
 	dev->iflink = -1;
diff -ur linux-2.6.23-rc1-rt0/net/mac80211/ieee80211.c linux-2.6.23-rc1-rt0_new/net/mac80211/ieee80211.c
--- linux-2.6.23-rc1-rt0/net/mac80211/ieee80211.c	2007-07-24 15:15:57.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/mac80211/ieee80211.c	2007-08-07 15:21:28.000000000 -0400
@@ -2413,7 +2413,7 @@
 static inline void netif_tx_lock_nested(struct net_device *dev, int subclass)
 {
 	spin_lock_nested(&dev->_xmit_lock, subclass);
-	dev->xmit_lock_owner = smp_processor_id();
+	dev->xmit_lock_owner = (void *)current;
 }
 
 static void ieee80211_set_multicast_list(struct net_device *dev)
diff -ur linux-2.6.23-rc1-rt0/net/sched/sch_generic.c linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c
--- linux-2.6.23-rc1-rt0/net/sched/sch_generic.c	2007-07-24 15:17:07.000000000 -0400
+++ linux-2.6.23-rc1-rt0_new/net/sched/sch_generic.c	2007-08-07 15:20:10.000000000 -0400
@@ -88,7 +88,7 @@
 {
 	int ret;
 
-	if (unlikely(dev->xmit_lock_owner == smp_processor_id())) {
+	if (unlikely(dev->xmit_lock_owner == (void *)current)) {
 		/*
 		 * Same CPU holding the lock. It may be a transient
 		 * configuration error, when hard_start_xmit() recurses. We
@@ -153,7 +153,13 @@
 
 	if (!lockless) {
 #ifdef CONFIG_PREEMPT_RT
-		netif_tx_lock(dev);
+        if (dev->xmit_lock_owner == (void *)current) {
+            kfree_skb(skb);
+            if (net_ratelimit())
+                printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+            return -1;
+        }
+        netif_tx_lock(dev);
 #else
 		if (netif_tx_trylock(dev))
 			/* Another CPU grabbed the driver tx lock */
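
The recursion guard added in the last hunk can be modelled with a small single-threaded sketch. Everything here is hypothetical (`struct fake_dev`, `fake_xmit()`, `task_tag`); it only shows why the owner is checked before the blocking lock is taken: a transmit path that re-enters itself while holding the lock drops the packet and returns -1 instead of deadlocking.

```c
#include <assert.h>
#include <stddef.h>

#define NO_OWNER ((void *)-1)

/* Hypothetical device model: only the fields the guard needs. */
struct fake_dev {
	int locked;	/* stand-in for _xmit_lock (single-threaded demo only) */
	void *owner;	/* stand-in for xmit_lock_owner */
	int dropped;	/* packets dropped by the guard */
	int sent;	/* packets that completed transmission */
};

static char task_tag;	/* hypothetical stand-in for 'current' */

static int fake_xmit(struct fake_dev *dev, int recurse)
{
	if (dev->owner == (void *)&task_tag) {
		/* Re-entered by the lock holder: drop instead of deadlocking. */
		dev->dropped++;
		return -1;
	}

	assert(!dev->locked);
	dev->locked = 1;
	dev->owner = &task_tag;

	/* A misconfigured setup might transmit again from inside xmit. */
	if (recurse)
		fake_xmit(dev, 0);
	dev->sent++;

	dev->owner = NO_OWNER;
	dev->locked = 0;
	return 0;
}
```

Calling `fake_xmit(&dev, 1)` exercises the guard once: the nested call is dropped and the outer call still completes.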





* RE: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-08-07 19:41                 ` Beauchemin, Mark
@ 2007-09-17 13:03                   ` Beauchemin, Mark
  2007-09-17 13:59                     ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Beauchemin, Mark @ 2007-09-17 13:03 UTC (permalink / raw)
  To: Beauchemin, Mark, Ingo Molnar; +Cc: Thomas Gleixner, linux-kernel


Ingo,

	Any thoughts on the patch?

Thanks,
	Mark


* Re: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-09-17 13:03                   ` Beauchemin, Mark
@ 2007-09-17 13:59                     ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2007-09-17 13:59 UTC (permalink / raw)
  To: Beauchemin, Mark; +Cc: Thomas Gleixner, linux-kernel


* Beauchemin, Mark <Mark.Beauchemin@sycamorenet.com> wrote:

> Ingo,
> 
> 	Any thoughts on the patch?

looks good to me - but it has a number of style issues, please run it 
through scripts/checkpatch.pl to see those.

	Ingo


* Re: [PATCH -rt] Preemption problem in kernel RT Patch
  2007-06-21 19:39 [PATCH -rt] Preemption problem in kernel RT Patch Beauchemin, Mark
  2007-06-22  8:03 ` Thomas Gleixner
@ 2009-10-14 12:55 ` dtslinux
  2009-10-14 15:38   ` Steven Rostedt
  1 sibling, 1 reply; 15+ messages in thread
From: dtslinux @ 2009-10-14 12:55 UTC (permalink / raw)
  To: linux-kernel

Hello All,

I am having an issue with kernel 2.6.24.7 and the RT-27 patch. I am using a block device driver that performs I/O operations on a virtual device. The driver uses separate kernel threads for read and write operations. It works fine on non-RT kernels and usually works on 2.6.24.7 with the RT-27 patch as well, but sometimes I get the following bug when running a write test with the xdd benchmark (on 2.6.24.7 with the RT-27 patch):

WARNING: at kernel/rtmutex.c:979 rt_spin_lock()
Pid: 12634, comm: pdflush Tainted: GF       2.6.24.7-rt27 #9
 [<c04046b8>] show_trace_log_lvl+0x1f/0x34
 [<c0404f67>] show_trace+0x17/0x19
 [<c04052e2>] dump_stack+0x6f/0x75
 [<c063ac74>] rt_spin_lock+0x4a/0xa2
 [<c04f33a4>] cfq_exit_io_context+0x30/0x56
 [<c04ed88f>] exit_io_context+0x68/0x72
 [<c04206c1>] do_exit+0x6c2/0x739
 [<c040432d>] kernel_thread_helper+0xd/0x10
 =======================

<1>BUG: unable to handle kernel NULL pointer dereference at virtual address 0000003d
printing eip: c043d18a *pdpt = 00000000349dd001 *pde = 0000000000000000

<0>Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
Modules linked in: sysfs_driver(F) regularcache(F) dts nls_utf8 hfsplus ramdisk_driver bridge autofs4 hidp rfcomm l2cap bluetooth sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp ib_ipoib ipv6 rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa ib_mad ib_core dm_mirror dm_multipath dm_mod sbs sbshc battery ac lp floppy sg serio_raw parport_pc parport snd_intel8x0 snd_ac97_codec 8250_pnp ac97_bus snd_seq_dummy 8250 serial_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss button snd_pcm e1000 snd_timer snd soundcore i2c_i801 snd_page_alloc i2c_core pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 12634, comm: pdflush Tainted: GF       (2.6.24.7-rt27 #9)
EIP: 0060:[<c043d18a>] EFLAGS: 00010016 CPU: 0
EIP is at task_blocks_on_rt_mutex+0xf8/0x240
EAX: ef42406c EBX: 0000001a ECX: ef424044 EDX: ef424044
ESI: ef424044 EDI: 00000009 EBP: e7ba7eec ESP: e7ba7ebc
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 preempt:00000003
Process pdflush (pid: 12634, ti=e7ba7000 task=e5268060 task.ti=e7ba7000)
Stack: 00000001 ef424054 ef424044 00000296 00000000 e7ba7f04 ef424044 00000009
       e7ba7f1c 00000296 ef424044 00000296 e7ba7f58 c063a57e 00000296 00000046
       00000000 ffffffff 00000078 e7ba7f08 e7ba7f08 e7ba7f10 e7ba7f10 00000000
Call Trace:
 [<c04046b8>] show_trace_log_lvl+0x1f/0x34
 [<c0404772>] show_stack_log_lvl+0xa5/0xb9
 [<c040483a>] show_registers+0xb4/0x1b8
 [<c0404a5e>] die+0x120/0x21b
 [<c063ddfc>] do_page_fault+0x845/0xa07
 [<c063c29a>] error_code+0x6a/0x70
 [<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
 [<c063a9d4>] __rt_spin_lock+0x48/0x4b
 [<c063acbe>] rt_spin_lock+0x94/0xa2
 [<c04f33a4>] cfq_exit_io_context+0x30/0x56
 [<c04ed88f>] exit_io_context+0x68/0x72
 [<c04206c1>] do_exit+0x6c2/0x739
 [<c040432d>] kernel_thread_helper+0xd/0x10
 =======================
INFO: lockdep is turned off.
Code: 24 08 4c 00 00 00 c7 44 24 04 83 42 6e c0 c7 04 24 b6 78 6d c0 e8 c0 0d fe ff e8 f5 80 fc ff 8b 7f 08 8b 4d e8 83 ef 0c 89 7d ec <39> 4f 34 74 04 0f 0b eb fe 8b 7d e8 8b 45 e4 83 c7 20 89 fa e8

<0>EIP: [<c043d18a>] task_blocks_on_rt_mutex+0xf8/0x240 SS:ESP 0068:e7ba7ebc
---[ end trace 432e3e53cc0cfa18 ]---
Fixing recursive fault but reboot is needed!
BUG: scheduling with irqs disabled: pdflush/0x00000002/12634
caller is do_exit+0xcc/0x739
Pid: 12634, comm: pdflush Tainted: GF     D 2.6.24.7-rt27 #9
 [<c04046b8>] show_trace_log_lvl+0x1f/0x34
 [<c0404f67>] show_trace+0x17/0x19
 [<c04052e2>] dump_stack+0x6f/0x75
 [<c0638e41>] schedule+0x8a/0x105
 [<c04200cb>] do_exit+0xcc/0x739
 [<c0404b51>] die+0x213/0x21b
 [<c063ddfc>] do_page_fault+0x845/0xa07
 [<c063c29a>] error_code+0x6a/0x70
 [<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
 [<c063a9d4>] __rt_spin_lock+0x48/0x4b
 [<c063acbe>] rt_spin_lock+0x94/0xa2
 [<c04f33a4>] cfq_exit_io_context+0x30/0x56
 [<c04ed88f>] exit_io_context+0x68/0x72
 [<c04206c1>] do_exit+0x6c2/0x739
 [<c040432d>] kernel_thread_helper+0xd/0x10
 =======================

I am confused about whether my driver is causing this problem, because I cannot find any of my driver's functions in the trace above. All the functions belong to the kernel, and pdflush is the process that hits the problem. In the "Modules linked in:" line above, sysfs_driver(F), regularcache(F) and dts are my modules.
I am using CentOS 5 with 1GB RAM on an "Intel(R) Pentium(R) 4 CPU 2.80GHz". Please let me know if I am making a mistake.

Thanks!
Furahm

--
This message was sent on behalf of dtslinux@hotmail.com at openSubscriber.com
http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/6978704.html


* Re: [PATCH -rt] Preemption problem in kernel RT Patch
  2009-10-14 12:55 ` dtslinux
@ 2009-10-14 15:38   ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2009-10-14 15:38 UTC (permalink / raw)
  To: dtslinux; +Cc: linux-kernel

Note, LKML receives 600 emails a day. If you want to make sure your email is seen,
it is best to Cc those that post the -rt patches. Otherwise, your email is
likely to get lost in the noise.

On Wed, Oct 14, 2009 at 08:55:28AM -0400, dtslinux@hotmail.com wrote:
> Hello All,
> 
> I am having an issue with kernel 2.6.24.7 and the RT-27 patch. I am using a block device driver that performs I/O operations on a virtual device. The driver uses separate kernel threads for read and write operations. It works fine on non-RT kernels and usually works on 2.6.24.7 with the RT-27 patch as well, but sometimes I get the following bug when running a write test with the xdd benchmark (on 2.6.24.7 with the RT-27 patch):
> 
> WARNING: at kernel/rtmutex.c:979 rt_spin_lock()

This shows that a lock that is not an rtmutex was passed to the rtmutex code.

> Pid: 12634, comm: pdflush Tainted: GF       2.6.24.7-rt27 #9
>  [<c04046b8>] show_trace_log_lvl+0x1f/0x34
>  [<c0404f67>] show_trace+0x17/0x19
>  [<c04052e2>] dump_stack+0x6f/0x75
>  [<c063ac74>] rt_spin_lock+0x4a/0xa2
>  [<c04f33a4>] cfq_exit_io_context+0x30/0x56

The q->queue_lock used in cfq_exit_single_io_context is not an rtmutex.
Yes, this will crash the kernel.

-- Steve

>  [<c04ed88f>] exit_io_context+0x68/0x72
>  [<c04206c1>] do_exit+0x6c2/0x739
>  [<c040432d>] kernel_thread_helper+0xd/0x10
>  =======================
> 
> <1>BUG: unable to handle kernel NULL pointer dereference at virtual address 0000003d
> printing eip: c043d18a *pdpt = 00000000349dd001 *pde = 0000000000000000
> 
> <0>Oops: 0000 [#1] PREEMPT DEBUG_PAGEALLOC
> Modules linked in: sysfs_driver(F) regularcache(F) dts nls_utf8 hfsplus ramdisk_driver bridge autofs4 hidp rfcomm l2cap bluetooth sunrpc ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp ib_ipoib ipv6 rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ib_sa ib_mad ib_core dm_mirror dm_multipath dm_mod sbs sbshc battery ac lp floppy sg serio_raw parport_pc parport snd_intel8x0 snd_ac97_codec 8250_pnp ac97_bus snd_seq_dummy 8250 serial_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss button snd_pcm e1000 snd_timer snd soundcore i2c_i801 snd_page_alloc i2c_core pcspkr ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> 
> Pid: 12634, comm: pdflush Tainted: GF       (2.6.24.7-rt27 #9)
> EIP: 0060:[<c043d18a>] EFLAGS: 00010016 CPU: 0
> EIP is at task_blocks_on_rt_mutex+0xf8/0x240
> EAX: ef42406c EBX: 0000001a ECX: ef424044 EDX: ef424044
> ESI: ef424044 EDI: 00000009 EBP: e7ba7eec ESP: e7ba7ebc
>  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 preempt:00000003
> Process pdflush (pid: 12634, ti=e7ba7000 task=e5268060 task.ti=e7ba7000)
> Stack: 00000001 ef424054 ef424044 00000296 00000000 e7ba7f04 ef424044 00000009
>        e7ba7f1c 00000296 ef424044 00000296 e7ba7f58 c063a57e 00000296 00000046
>        00000000 ffffffff 00000078 e7ba7f08 e7ba7f08 e7ba7f10 e7ba7f10 00000000
> Call Trace:
>  [<c04046b8>] show_trace_log_lvl+0x1f/0x34
>  [<c0404772>] show_stack_log_lvl+0xa5/0xb9
>  [<c040483a>] show_registers+0xb4/0x1b8
>  [<c0404a5e>] die+0x120/0x21b
>  [<c063ddfc>] do_page_fault+0x845/0xa07
>  [<c063c29a>] error_code+0x6a/0x70
>  [<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
>  [<c063a9d4>] __rt_spin_lock+0x48/0x4b
>  [<c063acbe>] rt_spin_lock+0x94/0xa2
>  [<c04f33a4>] cfq_exit_io_context+0x30/0x56
>  [<c04ed88f>] exit_io_context+0x68/0x72
>  [<c04206c1>] do_exit+0x6c2/0x739
>  [<c040432d>] kernel_thread_helper+0xd/0x10
>  =======================
> INFO: lockdep is turned off.
> Code: 24 08 4c 00 00 00 c7 44 24 04 83 42 6e c0 c7 04 24 b6 78 6d c0 e8 c0 0d fe ff e8 f5 80 fc ff 8b 7f 08 8b 4d e8 83 ef 0c 89 7d ec <39> 4f 34 74 04 0f 0b eb fe 8b 7d e8 8b 45 e4 83 c7 20 89 fa e8
> 
> <0>EIP: [<c043d18a>] task_blocks_on_rt_mutex+0xf8/0x240 SS:ESP 0068:e7ba7ebc
> ---[ end trace 432e3e53cc0cfa18 ]---
> Fixing recursive fault but reboot is needed!
> BUG: scheduling with irqs disabled: pdflush/0x00000002/12634
> caller is do_exit+0xcc/0x739
> Pid: 12634, comm: pdflush Tainted: GF     D 2.6.24.7-rt27 #9
>  [<c04046b8>] show_trace_log_lvl+0x1f/0x34
>  [<c0404f67>] show_trace+0x17/0x19
>  [<c04052e2>] dump_stack+0x6f/0x75
>  [<c0638e41>] schedule+0x8a/0x105
>  [<c04200cb>] do_exit+0xcc/0x739
>  [<c0404b51>] die+0x213/0x21b
>  [<c063ddfc>] do_page_fault+0x845/0xa07
>  [<c063c29a>] error_code+0x6a/0x70
>  [<c063a57e>] rt_spin_lock_slowlock+0xc5/0x1b7
>  [<c063a9d4>] __rt_spin_lock+0x48/0x4b
>  [<c063acbe>] rt_spin_lock+0x94/0xa2
>  [<c04f33a4>] cfq_exit_io_context+0x30/0x56
>  [<c04ed88f>] exit_io_context+0x68/0x72
>  [<c04206c1>] do_exit+0x6c2/0x739
>  [<c040432d>] kernel_thread_helper+0xd/0x10
>  =======================
> 
> I am confused about whether my driver is causing this problem, because I cannot find any of my driver's functions in the trace above. All the functions belong to the kernel, and pdflush is the process that hits the problem. In the "Modules linked in:" line above, sysfs_driver(F), regularcache(F) and dts are my modules.
> I am using CentOS 5 with 1GB RAM on an "Intel(R) Pentium(R) 4 CPU 2.80GHz". Please let me know if I am making a mistake.
> 
> Thanks!
> Furahm
> 
> --
> This message was sent on behalf of dtslinux@hotmail.com at openSubscriber.com
> http://www.opensubscriber.com/message/linux-kernel@vger.kernel.org/6978704.html


end of thread

Thread overview: 15+ messages
2007-06-21 19:39 [PATCH -rt] Preemption problem in kernel RT Patch Beauchemin, Mark
2007-06-22  8:03 ` Thomas Gleixner
2007-06-23 14:08   ` Beauchemin, Mark
2007-06-23 14:26     ` Thomas Gleixner
2007-07-24 15:48       ` Beauchemin, Mark
2007-07-24 19:15         ` Ingo Molnar
2007-07-24 19:36           ` Beauchemin, Mark
2007-08-01 14:15           ` Beauchemin, Mark
2007-08-01 14:22             ` Beauchemin, Mark
2007-08-06  7:13               ` Ingo Molnar
2007-08-07 19:41                 ` Beauchemin, Mark
2007-09-17 13:03                   ` Beauchemin, Mark
2007-09-17 13:59                     ` Ingo Molnar
2009-10-14 12:55 ` dtslinux
2009-10-14 15:38   ` Steven Rostedt
