All of lore.kernel.org
 help / color / mirror / Atom feed
* unable to handle kernel NULL pointer dereference in skb_dequeue
@ 2010-12-03 12:42 Toshio
  2010-12-03 13:09 ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Toshio @ 2010-12-03 12:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: gvs

I have also hit on Bug 20292 (https://bugzilla.kernel.org/show_bug.cgi?id=20292) in final 2.6.36. After investigating changes made between 2.6.35.4, which worked, and 2.6.36 which started oopsing, I think the problem was in double freeing of skb caused by change of return value for __pppoe_xmit in case of errors.

As it turned out this might be the cause of random BUG reports throught the kernel, whenever something touched skb. Most common BUG with my use case happened at skb_dequeue:

00000060 <skb_dequeue>:
      60:       53                      push   %ebx
      61:       89 c2                   mov    %eax,%edx
      63:       9c                      pushf  
      64:       59                      pop    %ecx
      65:       fa                      cli    
      66:       8b 00                   mov    (%eax),%eax
      68:       39 c2                   cmp    %eax,%edx
      6a:       74 24                   je     90 <skb_dequeue+0x30>
      6c:       85 c0                   test   %eax,%eax
      6e:       74 1a                   je     8a <skb_dequeue+0x2a>
      70:       ff 4a 08                decl   0x8(%edx)
      73:       8b 18                   mov    (%eax),%ebx
      75:       c7 00 00 00 00 00       movl   $0x0,(%eax)
      7b:       8b 50 04                mov    0x4(%eax),%edx
      7e:       c7 40 04 00 00 00 00    movl   $0x0,0x4(%eax)
      85:       89 53 04                mov    %edx,0x4(%ebx)
      88:*      89 1a                   mov    %ebx,(%edx)
      8a:       51                      push   %ecx
      8b:       9d                      popf   
      8c:       5b                      pop    %ebx
      8d:       c3                      ret    
      8e:       66 90                   xchg   %ax,%ax
      90:       b8 00 00 00 00          mov    $0x0,%eax
      95:       eb f3                   jmp    8a <skb_dequeue+0x2a>
      97:       89 f6                   mov    %esi,%esi
      99:       8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi


This location corresponds to line "next = next->next" from inlined __skb_dequeue(). The only reason BUG could happen here is something overwriting or otherwise corrupting skb list.

Patch that works for me is below. Now I only hope I haven't (re)introduced a memory leak...

I am not subscribed to LKML, so please reply-to-all if you need to contact me.

-----------------------------------------------------------------------------
--- linux-2.6.36/drivers/net/pppoe.c    2010-10-20 22:30:22.000000000 +0200
+++ linux-2.6.36.toshio/drivers/net/pppoe.c     2010-12-03 13:11:56.000000000 +0100
@@ -924,8 +924,10 @@
        /* Copy the data if there is no space for the header or if it's
         * read-only.
         */
-       if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
+       if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) {
+               kfree_skb(skb);
                goto abort;
+       }
 
        __skb_push(skb, sizeof(*ph));
        skb_reset_network_header(skb);
@@ -947,7 +949,6 @@
        return 1;
 
 abort:
-       kfree_skb(skb);
        return 0;
 }

-----------------------------------------------------------------------------

Andrej Ota.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-03 12:42 unable to handle kernel NULL pointer dereference in skb_dequeue Toshio
@ 2010-12-03 13:09 ` Eric Dumazet
  2010-12-03 14:37   ` Andrej Ota
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2010-12-03 13:09 UTC (permalink / raw)
  To: Toshio; +Cc: linux-kernel, gvs, Rami Rosen, netdev

Le vendredi 03 décembre 2010 à 13:42 +0100, Toshio a écrit :
> I have also hit on Bug 20292 (https://bugzilla.kernel.org/show_bug.cgi?id=20292) in final 2.6.36. After investigating changes made between 2.6.35.4, which worked, and 2.6.36 which started oopsing, I think the problem was in double freeing of skb caused by change of return value for __pppoe_xmit in case of errors.
> 
> As it turned out this might be the cause of random BUG reports throught the kernel, whenever something touched skb. Most common BUG with my use case happened at skb_dequeue:
> 

CC netdev and Rami Rosen

> 00000060 <skb_dequeue>:
>       60:       53                      push   %ebx
>       61:       89 c2                   mov    %eax,%edx
>       63:       9c                      pushf  
>       64:       59                      pop    %ecx
>       65:       fa                      cli    
>       66:       8b 00                   mov    (%eax),%eax
>       68:       39 c2                   cmp    %eax,%edx
>       6a:       74 24                   je     90 <skb_dequeue+0x30>
>       6c:       85 c0                   test   %eax,%eax
>       6e:       74 1a                   je     8a <skb_dequeue+0x2a>
>       70:       ff 4a 08                decl   0x8(%edx)
>       73:       8b 18                   mov    (%eax),%ebx
>       75:       c7 00 00 00 00 00       movl   $0x0,(%eax)
>       7b:       8b 50 04                mov    0x4(%eax),%edx
>       7e:       c7 40 04 00 00 00 00    movl   $0x0,0x4(%eax)
>       85:       89 53 04                mov    %edx,0x4(%ebx)
>       88:*      89 1a                   mov    %ebx,(%edx)
>       8a:       51                      push   %ecx
>       8b:       9d                      popf   
>       8c:       5b                      pop    %ebx
>       8d:       c3                      ret    
>       8e:       66 90                   xchg   %ax,%ax
>       90:       b8 00 00 00 00          mov    $0x0,%eax
>       95:       eb f3                   jmp    8a <skb_dequeue+0x2a>
>       97:       89 f6                   mov    %esi,%esi
>       99:       8d bc 27 00 00 00 00    lea    0x0(%edi,%eiz,1),%edi
> 
> 
> This location corresponds to line "next = next->next" from inlined __skb_dequeue(). The only reason BUG could happen here is something overwriting or otherwise corrupting skb list.
> 
> Patch that works for me is below. Now I only hope I haven't (re)introduced a memory leak...
> 
> I am not subscribed to LKML, so please reply-to-all if you need to contact me.
> 
> -----------------------------------------------------------------------------
> --- linux-2.6.36/drivers/net/pppoe.c    2010-10-20 22:30:22.000000000 +0200
> +++ linux-2.6.36.toshio/drivers/net/pppoe.c     2010-12-03 13:11:56.000000000 +0100
> @@ -924,8 +924,10 @@
>         /* Copy the data if there is no space for the header or if it's
>          * read-only.
>          */
> -       if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len))
> +       if (skb_cow_head(skb, sizeof(*ph) + dev->hard_header_len)) {
> +               kfree_skb(skb);
>                 goto abort;
> +       }
>  
>         __skb_push(skb, sizeof(*ph));
>         skb_reset_network_header(skb);
> @@ -947,7 +949,6 @@
>         return 1;
>  
>  abort:
> -       kfree_skb(skb);
>         return 0;
>  }
> 

Problem comes from commit 55c95e738da85 (fix return value of
__pppoe_xmit() method)

I am not sure patch is OK





^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-03 13:09 ` Eric Dumazet
@ 2010-12-03 14:37   ` Andrej Ota
  2010-12-03 14:46     ` Eric Dumazet
  0 siblings, 1 reply; 9+ messages in thread
From: Andrej Ota @ 2010-12-03 14:37 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, gvs, Rami Rosen, netdev

>> Patch that works for me is below. Now I only hope I haven't
>> (re)introduced a memory leak...

> Problem comes from commit 55c95e738da85 (fix return value of
> __pppoe_xmit() method)
> 
> I am not sure patch is OK


Me neither. That's why I wrote "works for me". All I dare say is that it
works better than current code and is probably no worse than it was before
above mentioned commit. Apart from that, there is no point in having return
value for __pppoe_xmit if return value isn't needed.

Easiest way of triggering this BUG is by terminating PPPoE on the server
side, which then hits "if (!dev) { goto abort; }". This in turn calls
"kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then
goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug.

I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have
just caused another BUG. However, if I read file comments at the top, I see
a comment from 19/07/01 stating that I have to delete original skb if code
succeeds and never delete it on failure. About the skb copy mentioned in
the same comment, I don't know. 2001 was many commits ago.

Andrej Ota.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-03 14:37   ` Andrej Ota
@ 2010-12-03 14:46     ` Eric Dumazet
  2010-12-03 22:07       ` Jarek Poplawski
                         ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Eric Dumazet @ 2010-12-03 14:46 UTC (permalink / raw)
  To: Andrej Ota; +Cc: linux-kernel, gvs, Rami Rosen, netdev

Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit :
> >> Patch that works for me is below. Now I only hope I haven't
> >> (re)introduced a memory leak...
> 
> > Problem comes from commit 55c95e738da85 (fix return value of
> > __pppoe_xmit() method)
> > 
> > I am not sure patch is OK
> 
> 
> Me neither. That's why I wrote "works for me". All I dare say is that it
> works better than current code and is probably no worse than it was before
> above mentioned commit. Apart from that, there is no point in having return
> value for __pppoe_xmit if return value isn't needed.
> 
> Easiest way of triggering this BUG is by terminating PPPoE on the server
> side, which then hits "if (!dev) { goto abort; }". This in turn calls
> "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then
> goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug.
> 
> I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have
> just caused another BUG. However, if I read file comments at the top, I see
> a comment from 19/07/01 stating that I have to delete original skb if code
> succeeds and never delete it on failure. About the skb copy mentioned in
> the same comment, I don't know. 2001 was many commits ago.

Well, all I wanted to say was that _I_ was not sure, but probably other
network guys have a better diagnostic.

Rami, could you re-explain the rationale of your patch ?

Thanks



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-03 14:46     ` Eric Dumazet
@ 2010-12-03 22:07       ` Jarek Poplawski
  2010-12-03 22:16       ` Denys Fedoryshchenko
  2010-12-10 19:51       ` Denys Fedoryshchenko
  2 siblings, 0 replies; 9+ messages in thread
From: Jarek Poplawski @ 2010-12-03 22:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andrej Ota, linux-kernel, gvs, Rami Rosen, netdev

Eric Dumazet wrote:
> Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit :
>>>> Patch that works for me is below. Now I only hope I haven't
>>>> (re)introduced a memory leak...
>>
>>> Problem comes from commit 55c95e738da85 (fix return value of
>>> __pppoe_xmit() method)
...
> Rami, could you re-explain the rationale of your patch ?

I guess we could revert it in the meantime: it seems there might be
at least three threads (including bugzilla) with this problem.

Jarek P.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-03 14:46     ` Eric Dumazet
  2010-12-03 22:07       ` Jarek Poplawski
@ 2010-12-03 22:16       ` Denys Fedoryshchenko
  2010-12-10 19:51       ` Denys Fedoryshchenko
  2 siblings, 0 replies; 9+ messages in thread
From: Denys Fedoryshchenko @ 2010-12-03 22:16 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andrej Ota, linux-kernel, gvs, Rami Rosen, netdev

On Friday 03 December 2010 16:46:35 Eric Dumazet wrote:
> Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit :
> > >> Patch that works for me is below. Now I only hope I haven't
> > >> (re)introduced a memory leak...
> > > 
> > > Problem comes from commit 55c95e738da85 (fix return value of
> > > __pppoe_xmit() method)
> > > 
> > > I am not sure patch is OK
> > 
> > Me neither. That's why I wrote "works for me". All I dare say is that it
> > works better than current code and is probably no worse than it was
> > before above mentioned commit. Apart from that, there is no point in
> > having return value for __pppoe_xmit if return value isn't needed.
> > 
> > Easiest way of triggering this BUG is by terminating PPPoE on the server
> > side, which then hits "if (!dev) { goto abort; }". This in turn calls
> > "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then
> > goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug.
> > 
> > I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have
> > just caused another BUG. However, if I read file comments at the top, I
> > see a comment from 19/07/01 stating that I have to delete original skb
> > if code succeeds and never delete it on failure. About the skb copy
> > mentioned in the same comment, I don't know. 2001 was many commits ago.
> 
> Well, all I wanted to say was that _I_ was not sure, but probably other
> network guys have a better diagnostic.
> 
> Rami, could you re-explain the rationale of your patch ?
> 
> Thanks
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
This patch seems fixed my issue (Re: 2.6.35->2.6.36 regression, vanilla kernel 
panic, ppp or hrtimers crashing), tested now.
	

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-03 14:46     ` Eric Dumazet
  2010-12-03 22:07       ` Jarek Poplawski
  2010-12-03 22:16       ` Denys Fedoryshchenko
@ 2010-12-10 19:51       ` Denys Fedoryshchenko
  2010-12-10 20:18         ` David Miller
  2 siblings, 1 reply; 9+ messages in thread
From: Denys Fedoryshchenko @ 2010-12-10 19:51 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Andrej Ota, linux-kernel, gvs, Rami Rosen, netdev

On Friday 03 December 2010 16:46:35 Eric Dumazet wrote:
> Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit :
> > >> Patch that works for me is below. Now I only hope I haven't
> > >> (re)introduced a memory leak...
> > > 
> > > Problem comes from commit 55c95e738da85 (fix return value of
> > > __pppoe_xmit() method)
> > > 
> > > I am not sure patch is OK
> > 
> > Me neither. That's why I wrote "works for me". All I dare say is that it
> > works better than current code and is probably no worse than it was
> > before above mentioned commit. Apart from that, there is no point in
> > having return value for __pppoe_xmit if return value isn't needed.
> > 
> > Easiest way of triggering this BUG is by terminating PPPoE on the server
> > side, which then hits "if (!dev) { goto abort; }". This in turn calls
> > "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then
> > goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug.
> > 
> > I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have
> > just caused another BUG. However, if I read file comments at the top, I
> > see a comment from 19/07/01 stating that I have to delete original skb
> > if code succeeds and never delete it on failure. About the skb copy
> > mentioned in the same comment, I don't know. 2001 was many commits ago.
> 
> Well, all I wanted to say was that _I_ was not sure, but probably other
> network guys have a better diagnostic.
> 
> Rami, could you re-explain the rationale of your patch ?
> 
> Thanks
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Is there any plans to queue any patch to stable?
pppoe is almost dead in 2.6.36.*


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-10 19:51       ` Denys Fedoryshchenko
@ 2010-12-10 20:18         ` David Miller
  2010-12-10 21:30           ` Jarek Poplawski
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2010-12-10 20:18 UTC (permalink / raw)
  To: nuclearcat; +Cc: eric.dumazet, andrej, linux-kernel, gvs, ramirose, netdev

From: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Date: Fri, 10 Dec 2010 21:51:04 +0200

> On Friday 03 December 2010 16:46:35 Eric Dumazet wrote:
>> Le vendredi 03 décembre 2010 à 15:37 +0100, Andrej Ota a écrit :
>> > >> Patch that works for me is below. Now I only hope I haven't
>> > >> (re)introduced a memory leak...
>> > > 
>> > > Problem comes from commit 55c95e738da85 (fix return value of
>> > > __pppoe_xmit() method)
>> > > 
>> > > I am not sure patch is OK
>> > 
>> > Me neither. That's why I wrote "works for me". All I dare say is that it
>> > works better than current code and is probably no worse than it was
>> > before above mentioned commit. Apart from that, there is no point in
>> > having return value for __pppoe_xmit if return value isn't needed.
>> > 
>> > Easiest way of triggering this BUG is by terminating PPPoE on the server
>> > side, which then hits "if (!dev) { goto abort; }". This in turn calls
>> > "kfree_skb(skb); return 0;" which returns to pppoe_rcv_core which then
>> > goto-s to "abort_put" which again calls "kfree_skb(skb)". Voila the bug.
>> > 
>> > I don't know how to trigger "if (skb_cow_head(skb, ..." to see if I have
>> > just caused another BUG. However, if I read file comments at the top, I
>> > see a comment from 19/07/01 stating that I have to delete original skb
>> > if code succeeds and never delete it on failure. About the skb copy
>> > mentioned in the same comment, I don't know. 2001 was many commits ago.
>> 
>> Well, all I wanted to say was that _I_ was not sure, but probably other
>> network guys have a better diagnostic.
>> 
>> Rami, could you re-explain the rationale of your patch ?
>> 
>> Thanks
>> 
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Is there any plans to queue any patch to stable?
> pppoe is almost dead in 2.6.36.*

I'll deal with it for -stable once I evaluate this patch for upstream,
which I haven't even gotten to yet.

When people bark about -stable this and -stable that, it just takes
more time away from me actually getting through all the patches.  If
it causes a crash, I know it should go to stable and I'll take care of
it.  So there is no need to make an explicit note or query about it.

Thanks.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: unable to handle kernel NULL pointer dereference in skb_dequeue
  2010-12-10 20:18         ` David Miller
@ 2010-12-10 21:30           ` Jarek Poplawski
  0 siblings, 0 replies; 9+ messages in thread
From: Jarek Poplawski @ 2010-12-10 21:30 UTC (permalink / raw)
  To: David Miller
  Cc: nuclearcat, eric.dumazet, andrej, linux-kernel, gvs, ramirose, netdev

David Miller wrote:
> From: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Date: Fri, 10 Dec 2010 21:51:04 +0200
...
>> Is there any plans to queue any patch to stable?
>> pppoe is almost dead in 2.6.36.*
> 
> I'll deal with it for -stable once I evaluate this patch for upstream,
> which I haven't even gotten to yet.

This patch is here:
http://patchwork.ozlabs.org/patch/75095/

But IMHO it's buggy for the reason I mentioned. Since Andrej
didn't respond yet I'd suggest reverting of the offending
commit 55c95e738da85. If you agree with that but e.g. too busy,
let me know.

Jarek P.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2010-12-10 21:30 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-03 12:42 unable to handle kernel NULL pointer dereference in skb_dequeue Toshio
2010-12-03 13:09 ` Eric Dumazet
2010-12-03 14:37   ` Andrej Ota
2010-12-03 14:46     ` Eric Dumazet
2010-12-03 22:07       ` Jarek Poplawski
2010-12-03 22:16       ` Denys Fedoryshchenko
2010-12-10 19:51       ` Denys Fedoryshchenko
2010-12-10 20:18         ` David Miller
2010-12-10 21:30           ` Jarek Poplawski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.