netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
@ 2011-07-29 13:18 synapse
  2011-07-29 13:33 ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: synapse @ 2011-07-29 13:18 UTC (permalink / raw)
  To: netdev

Hello guys,

I have a problem that I hope you can help me resolv. This is my first 
real bug report, so please be
patient :)

### Description:
3.0.0-rc4 routinely locks up with BUG: unable to handle kernel NULL 
pointer dereference at 000000000000002c
I have an intel sr2600 machine with a 10Gbit interface, it periodically 
locks up after a few days.
It serves a lot of traffic. The trace is at the end of the mail.
###

### My efforts:
I've traced the error back from atomic_dec_and_test() to:

ipv4_dst_check()
check_peer_redir()
neigh_release()
atomic_dec_and_test()

The parameter to atomic_dec_and_test() is NULL (&neigh->refcnt in 
neigh_release), so atomic_dec_and_test()
at /arch/x86/include/asm/atomic.h dies at offset 0xffffffff8140f56f.

ffffffff8140f560:       48 8b 15 19 47 2f 00    mov    
0x2f4719(%rip),%rdx        # 0xffffffff81703c80
ffffffff8140f567:       48 89 50 18             mov    %rdx,0x18(%rax)
ffffffff8140f56b:       48 8b 7b 40             mov    0x40(%rbx),%rdi
ffffffff8140f56f:       f0 ff 4f 2c             lock decl 0x2c(%rdi)
ffffffff8140f573:       0f 94 c0                sete   %al
ffffffff8140f576:       84 c0                   test   %al,%al
ffffffff8140f578:       0f 85 ab 00 00 00       jne    0xffffffff8140f629

 From what I've seen is that this code is responsible for pmtu related 
things. The refcount member of struct neighbour
is NULL and the neigh pointer (struct neighbour *) in neigh_release() is 
not. I have no clue how this might happen,
though I suspect somebody releases the data structure somehow. Note that 
this code is invoked when redirect_learned.a4
is set and is different from rt_gateway in ipv4_dst_check().

Is it possible that two packets go to two different cores for processing 
and one core invalidates the rt entry
the other is currently working on (meaning the second will try to 
dereference a NULL ptr)?
###


This is just my clumsy attempt at tracking this down, I'm not a kernel 
expert unfortunately. I'm happy to provide
further info on the matter. If I'm completely on the wrong track please 
let me know.

Thank you for any help,
Gergely Kalman


TRACE:
===============================================================
BUG: unable to handle kernel NULL pointer dereference at 000000000000002c
IP: [<ffffffff8140f56f>] ipv4_dst_check+0xaf/0x190
PGD 0
Oops: 0002 [#1] SMP
CPU 8
Modules linked in: 8021q garp bridge stp llc iptable_filter ip_tables 
ixgbe ioatdma mdio dca hed

Pid: 0, comm: kworker/0:1 Not tainted 3.0.0-rc4-10g-lvs-pktgen #1 Intel 
Corporation S5520UR/S5520UR
RIP: 0010:[<ffffffff8140f56f>]  [<ffffffff8140f56f>] 
ipv4_dst_check+0xaf/0x190
RSP: 0018:ffff8801efc83a40  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88014d428900 RCX: ffff8801a44fa000
RDX: 0000000000000000 RSI: ffff8801a4335bc0 RDI: 0000000000000000
RBP: 00000000fea2476d R08: 000000000000fa4b R09: 0000000000007d25
R10: 00000000000000c0 R11: 0000000000000003 R12: ffff8801a4335bc0
R13: 0000000000006bc1 R14: 0000000000000000 R15: ffff88016291da20
FS:  0000000000000000(0000) GS:ffff8801efc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000002c CR3: 0000000001697000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 0, threadinfo ffff8801e90ee000, task 
ffff8801e90d9680)
Stack:
  ffff88014d428900 ffff88016291d780 0000000000000000 ffffffff813dccfa
  ffff88036fff9000 ffff8801b77bfc58 ffff88016291d780 ffffffff81417a82
  ffff8801a44fb0a0 ffff88016291d780 ffff8801b77bfc58 ffff8801b77bfc80
Call Trace:
<IRQ>
  [<ffffffff813dccfa>] ? __sk_dst_check+0x4a/0x70
  [<ffffffff81417a82>] ? ip_queue_xmit+0x2b2/0x3c0
  [<ffffffff8142c23b>] ? tcp_transmit_skb+0x3bb/0x850
  [<ffffffff8142e8cc>] ? tcp_write_xmit+0x1ec/0xa10
  [<ffffffff8142f239>] ? __tcp_push_pending_frames+0x19/0x80
  [<ffffffff81426076>] ? tcp_data_snd_check+0x36/0x120
  [<ffffffff8142a5d9>] ? tcp_rcv_established+0x349/0x7c0
  [<ffffffff8143204f>] ? tcp_v4_do_rcv+0x10f/0x2e0
  [<ffffffff81412300>] ? ip_rcv_finish+0x350/0x350
  [<ffffffff81433102>] ? tcp_v4_rcv+0x4e2/0x7a0
  [<ffffffff8141237d>] ? ip_local_deliver_finish+0x7d/0x130
  [<ffffffff813e802e>] ? __netif_receive_skb+0x1ae/0x350
  [<ffffffff813edc78>] ? netif_receive_skb+0x78/0x80
  [<ffffffff813ee21b>] ? napi_gro_receive+0xbb/0xd0
  [<ffffffff813edda8>] ? napi_skb_finish+0x38/0x50
  [<ffffffffa004c372>] ? ixgbe_clean_rx_irq+0x4f2/0x780 [ixgbe]
  [<ffffffffa004eddd>] ? ixgbe_clean_rxtx_many+0xed/0x1f0 [ixgbe]
  [<ffffffff8120b890>] ? timerqueue_add+0x60/0xb0
  [<ffffffff813ee366>] ? net_rx_action+0x86/0x170
  [<ffffffff8104aab1>] ? __do_softirq+0x91/0x140
  [<ffffffff8107ccfa>] ? handle_irq_event_percpu+0x7a/0x140
  [<ffffffff81474e4c>] ? call_softirq+0x1c/0x30
  [<ffffffff8100428d>] ? do_softirq+0x4d/0x80
  [<ffffffff8104a975>] ? irq_exit+0xb5/0xc0
  [<ffffffff81003aac>] ? do_IRQ+0x5c/0xd0
  [<ffffffff814737d3>] ? common_interrupt+0x13/0x13
<EOI>
  [<ffffffff81251c8c>] ? acpi_hw_read_multiple+0x28/0x60
  [<ffffffff81261afd>] ? acpi_idle_enter_bm+0x22c/0x260
  [<ffffffff81261af8>] ? acpi_idle_enter_bm+0x227/0x260
  [<ffffffff813b7281>] ? cpuidle_idle_call+0x81/0xf0
  [<ffffffff810017d8>] ? cpu_idle+0x58/0xb0
Code: 00 89 83 d4 00 00 00 eb 98 0f 1f 00 48 85 db 74 16 48 8b 43 40 31 
ff 48 85 c0 74 0f 48 8b 15 19 47 2f 00 48 89 50 18 48 8b 7b 40 <f0> ff 
4f 2c 0f 94 c0 84 c0 0f 85 ab 00 00 00 48 c7 43 40 00 00
RIP  [<ffffffff8140f56f>] ipv4_dst_check+0xaf/0x190
  RSP <ffff8801efc83a40>
CR2: 000000000000002c
---[ end trace 8a3fd44eb302579f ]---

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 13:18 PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check) synapse
@ 2011-07-29 13:33 ` Eric Dumazet
  2011-07-29 14:26   ` synapse
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2011-07-29 13:33 UTC (permalink / raw)
  To: synapse; +Cc: netdev

Le vendredi 29 juillet 2011 à 15:18 +0200, synapse a écrit :
> Hello guys,
> 
> I have a problem that I hope you can help me resolv. This is my first 
> real bug report, so please be
> patient :)
> 
> ### Description:
> 3.0.0-rc4 routinely locks up with BUG: unable to handle kernel NULL 
> pointer dereference at 000000000000002c
> I have an intel sr2600 machine with a 10Gbit interface, it periodically 
> locks up after a few days.
> It serves a lot of traffic. The trace is at the end of the mail.
> ###
> 
> ### My efforts:
> I've traced the error back from atomic_dec_and_test() to:
> 
> ipv4_dst_check()
> check_peer_redir()
> neigh_release()
> atomic_dec_and_test()
> 
> The parameter to atomic_dec_and_test() is NULL (&neigh->refcnt in 
> neigh_release), so atomic_dec_and_test()
> at /arch/x86/include/asm/atomic.h dies at offset 0xffffffff8140f56f.
> 
> ffffffff8140f560:       48 8b 15 19 47 2f 00    mov    
> 0x2f4719(%rip),%rdx        # 0xffffffff81703c80
> ffffffff8140f567:       48 89 50 18             mov    %rdx,0x18(%rax)
> ffffffff8140f56b:       48 8b 7b 40             mov    0x40(%rbx),%rdi
> ffffffff8140f56f:       f0 ff 4f 2c             lock decl 0x2c(%rdi)
> ffffffff8140f573:       0f 94 c0                sete   %al
> ffffffff8140f576:       84 c0                   test   %al,%al
> ffffffff8140f578:       0f 85 ab 00 00 00       jne    0xffffffff8140f629
> 
>  From what I've seen is that this code is responsible for pmtu related 
> things. The refcount member of struct neighbour
> is NULL and the neigh pointer (struct neighbour *) in neigh_release() is 
> not. I have no clue how this might happen,
> though I suspect somebody releases the data structure somehow. Note that 
> this code is invoked when redirect_learned.a4
> is set and is different from rt_gateway in ipv4_dst_check().
> 
> Is it possible that two packets go to two different cores for processing 
> and one core invalidates the rt entry
> the other is currently working on (meaning the second will try to 
> dereference a NULL ptr)?
> ###
> 
> 
> This is just my clumsy attempt at tracking this down, I'm not a kernel 
> expert unfortunately. I'm happy to provide
> further info on the matter. If I'm completely on the wrong track please 
> let me know.
> 
> Thank you for any help,
> Gergely Kalman
> 

This bug was probably already fixed.

Please try current linux tree




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 13:33 ` Eric Dumazet
@ 2011-07-29 14:26   ` synapse
  2011-07-29 14:36     ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: synapse @ 2011-07-29 14:26 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On 07/29/11 15:33, Eric Dumazet wrote:
> Le vendredi 29 juillet 2011 à 15:18 +0200, synapse a écrit :
>> Hello guys,
>>
>> I have a problem that I hope you can help me resolv. This is my first
>> real bug report, so please be
>> patient :)
>>
>> ### Description:
>> 3.0.0-rc4 routinely locks up with BUG: unable to handle kernel NULL
>> pointer dereference at 000000000000002c
>> I have an intel sr2600 machine with a 10Gbit interface, it periodically
>> locks up after a few days.
>> It serves a lot of traffic. The trace is at the end of the mail.
>> ###
>>
>> ### My efforts:
>> I've traced the error back from atomic_dec_and_test() to:
>>
>> ipv4_dst_check()
>> check_peer_redir()
>> neigh_release()
>> atomic_dec_and_test()
>>
>> The parameter to atomic_dec_and_test() is NULL (&neigh->refcnt in
>> neigh_release), so atomic_dec_and_test()
>> at /arch/x86/include/asm/atomic.h dies at offset 0xffffffff8140f56f.
>>
>> ffffffff8140f560:       48 8b 15 19 47 2f 00    mov
>> 0x2f4719(%rip),%rdx        # 0xffffffff81703c80
>> ffffffff8140f567:       48 89 50 18             mov    %rdx,0x18(%rax)
>> ffffffff8140f56b:       48 8b 7b 40             mov    0x40(%rbx),%rdi
>> ffffffff8140f56f:       f0 ff 4f 2c             lock decl 0x2c(%rdi)
>> ffffffff8140f573:       0f 94 c0                sete   %al
>> ffffffff8140f576:       84 c0                   test   %al,%al
>> ffffffff8140f578:       0f 85 ab 00 00 00       jne    0xffffffff8140f629
>>
>>   From what I've seen is that this code is responsible for pmtu related
>> things. The refcount member of struct neighbour
>> is NULL and the neigh pointer (struct neighbour *) in neigh_release() is
>> not. I have no clue how this might happen,
>> though I suspect somebody releases the data structure somehow. Note that
>> this code is invoked when redirect_learned.a4
>> is set and is different from rt_gateway in ipv4_dst_check().
>>
>> Is it possible that two packets go to two different cores for processing
>> and one core invalidates the rt entry
>> the other is currently working on (meaning the second will try to
>> dereference a NULL ptr)?
>> ###
>>
>>
>> This is just my clumsy attempt at tracking this down, I'm not a kernel
>> expert unfortunately. I'm happy to provide
>> further info on the matter. If I'm completely on the wrong track please
>> let me know.
>>
>> Thank you for any help,
>> Gergely Kalman
>>
> This bug was probably already fixed.
>
> Please try current linux tree
>
>
found no relevant things in the diffs, except for a check against 
DST_NOCOUNT
when calling dst_entries_add(opc, 1). Will try with the new kernel, but 
unfortunately
it might take days to reproduce.

Gergely Kalman


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 14:26   ` synapse
@ 2011-07-29 14:36     ` Eric Dumazet
  2011-07-29 14:39       ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2011-07-29 14:36 UTC (permalink / raw)
  To: synapse; +Cc: netdev

Le vendredi 29 juillet 2011 à 16:26 +0200, synapse a écrit :
> On 07/29/11 15:33, Eric Dumazet wrote:
> > Le vendredi 29 juillet 2011 à 15:18 +0200, synapse a écrit :
> >> Hello guys,
> >>
> >> I have a problem that I hope you can help me resolv. This is my first
> >> real bug report, so please be
> >> patient :)
> >>
> >> ### Description:
> >> 3.0.0-rc4 routinely locks up with BUG: unable to handle kernel NULL
> >> pointer dereference at 000000000000002c
> >> I have an intel sr2600 machine with a 10Gbit interface, it periodically
> >> locks up after a few days.
> >> It serves a lot of traffic. The trace is at the end of the mail.
> >> ###
> >>
> >> ### My efforts:
> >> I've traced the error back from atomic_dec_and_test() to:
> >>
> >> ipv4_dst_check()
> >> check_peer_redir()
> >> neigh_release()
> >> atomic_dec_and_test()
> >>
> >> The parameter to atomic_dec_and_test() is NULL (&neigh->refcnt in
> >> neigh_release), so atomic_dec_and_test()
> >> at /arch/x86/include/asm/atomic.h dies at offset 0xffffffff8140f56f.
> >>
> >> ffffffff8140f560:       48 8b 15 19 47 2f 00    mov
> >> 0x2f4719(%rip),%rdx        # 0xffffffff81703c80
> >> ffffffff8140f567:       48 89 50 18             mov    %rdx,0x18(%rax)
> >> ffffffff8140f56b:       48 8b 7b 40             mov    0x40(%rbx),%rdi
> >> ffffffff8140f56f:       f0 ff 4f 2c             lock decl 0x2c(%rdi)
> >> ffffffff8140f573:       0f 94 c0                sete   %al
> >> ffffffff8140f576:       84 c0                   test   %al,%al
> >> ffffffff8140f578:       0f 85 ab 00 00 00       jne    0xffffffff8140f629
> >>
> >>   From what I've seen is that this code is responsible for pmtu related
> >> things. The refcount member of struct neighbour
> >> is NULL and the neigh pointer (struct neighbour *) in neigh_release() is
> >> not. I have no clue how this might happen,
> >> though I suspect somebody releases the data structure somehow. Note that
> >> this code is invoked when redirect_learned.a4
> >> is set and is different from rt_gateway in ipv4_dst_check().
> >>
> >> Is it possible that two packets go to two different cores for processing
> >> and one core invalidates the rt entry
> >> the other is currently working on (meaning the second will try to
> >> dereference a NULL ptr)?
> >> ###
> >>
> >>
> >> This is just my clumsy attempt at tracking this down, I'm not a kernel
> >> expert unfortunately. I'm happy to provide
> >> further info on the matter. If I'm completely on the wrong track please
> >> let me know.
> >>
> >> Thank you for any help,
> >> Gergely Kalman
> >>
> > This bug was probably already fixed.
> >
> > Please try current linux tree
> >
> >
> found no relevant things in the diffs, except for a check against 
> DST_NOCOUNT
> when calling dst_entries_add(opc, 1). Will try with the new kernel, but 
> unfortunately
> it might take days to reproduce.

Hmm, I'll take a look, but check_peer_redir() seems suspicious at first
glance.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 14:36     ` Eric Dumazet
@ 2011-07-29 14:39       ` David Miller
  2011-07-29 14:41         ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2011-07-29 14:39 UTC (permalink / raw)
  To: eric.dumazet; +Cc: synapse, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 29 Jul 2011 16:36:24 +0200

> Hmm, I'll take a look, but check_peer_redir() seems suspicious at first
> glance.

I take full responsibility for any bugs you discover in it :-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 14:39       ` David Miller
@ 2011-07-29 14:41         ` Eric Dumazet
  2011-07-29 14:44           ` synapse
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2011-07-29 14:41 UTC (permalink / raw)
  To: David Miller; +Cc: synapse, netdev

Le vendredi 29 juillet 2011 à 07:39 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 29 Jul 2011 16:36:24 +0200
> 
> > Hmm, I'll take a look, but check_peer_redir() seems suspicious at first
> > glance.
> 
> I take full responsibility for any bugs you discover in it :-)


;)

Bug origin is commit f39925dbde7788cfb96419c0f092b086aa325c0f
ipv4: Cache learned redirect information in inetpeer.

I'll cook a patch ASAP



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 14:41         ` Eric Dumazet
@ 2011-07-29 14:44           ` synapse
  2011-07-29 15:11             ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: synapse @ 2011-07-29 14:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 07/29/11 16:41, Eric Dumazet wrote:
> Le vendredi 29 juillet 2011 à 07:39 -0700, David Miller a écrit :
>> From: Eric Dumazet<eric.dumazet@gmail.com>
>> Date: Fri, 29 Jul 2011 16:36:24 +0200
>>
>>> Hmm, I'll take a look, but check_peer_redir() seems suspicious at first
>>> glance.
>> I take full responsibility for any bugs you discover in it :-)
>
> ;)
>
> Bug origin is commit f39925dbde7788cfb96419c0f092b086aa325c0f
> ipv4: Cache learned redirect information in inetpeer.
>
> I'll cook a patch ASAP
>
wow that was QUICK :)

When you're done I'll check it ASAP

Gergely Kalman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 14:44           ` synapse
@ 2011-07-29 15:11             ` Eric Dumazet
  2011-07-29 15:19               ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2011-07-29 15:11 UTC (permalink / raw)
  To: synapse; +Cc: David Miller, netdev

Le vendredi 29 juillet 2011 à 16:44 +0200, synapse a écrit :
> On 07/29/11 16:41, Eric Dumazet wrote:
> > Le vendredi 29 juillet 2011 à 07:39 -0700, David Miller a écrit :
> >> From: Eric Dumazet<eric.dumazet@gmail.com>
> >> Date: Fri, 29 Jul 2011 16:36:24 +0200
> >>
> >>> Hmm, I'll take a look, but check_peer_redir() seems suspicious at first
> >>> glance.
> >> I take full responsibility for any bugs you discover in it :-)
> >
> > ;)
> >
> > Bug origin is commit f39925dbde7788cfb96419c0f092b086aa325c0f
> > ipv4: Cache learned redirect information in inetpeer.
> >
> > I'll cook a patch ASAP
> >
> wow that was QUICK :)
> 
> When you're done I'll check it ASAP
> 
> Gergely Kalman

Thats tricky, because I am not sure we dont need RCU protection since we
can now exchange dst neighbour on the fly.

Following patch would only reduce the window of bug, not a complete
fix...

David, any opinion on this ?


diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 1730689..6afc4eb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1628,16 +1628,18 @@ static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
 {
 	struct rtable *rt = (struct rtable *) dst;
 	__be32 orig_gw = rt->rt_gateway;
-	struct neighbour *n;
+	struct neighbour *n, *old_n;
 
 	dst_confirm(&rt->dst);
 
-	neigh_release(dst_get_neighbour(&rt->dst));
-	dst_set_neighbour(&rt->dst, NULL);
-
 	rt->rt_gateway = peer->redirect_learned.a4;
-	rt_bind_neighbour(rt);
-	n = dst_get_neighbour(&rt->dst);
+
+	n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
+	if (IS_ERR(n))
+		return PTR_ERR(n);
+	old_n = xchg(&rt->dst._neighbour, n);
+	if (old_n)
+		neigh_release(old_n);
 	if (!n || !(n->nud_state & NUD_VALID)) {
 		if (n)
 			neigh_event_send(n, NULL);



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 15:11             ` Eric Dumazet
@ 2011-07-29 15:19               ` David Miller
  2011-07-29 15:43                 ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2011-07-29 15:19 UTC (permalink / raw)
  To: eric.dumazet; +Cc: synapse, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 29 Jul 2011 17:11:46 +0200

> Thats tricky, because I am not sure we dont need RCU protection since we
> can now exchange dst neighbour on the fly.
> 
> Following patch would only reduce the window of bug, not a complete
> fix...
> 
> David, any opinion on this ?

Indeed, old code worked because we invalidated entire route cache
entry, and we never before ran arp_bind_neighbour() except on new
route cache entires before they become globally visible.

I think when we change an existing neigh we will need to release old
neigh via RCU, at a minimum.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 15:19               ` David Miller
@ 2011-07-29 15:43                 ` Eric Dumazet
  2011-07-30  5:00                   ` Eric Dumazet
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2011-07-29 15:43 UTC (permalink / raw)
  To: David Miller; +Cc: synapse, netdev

Le vendredi 29 juillet 2011 à 08:19 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 29 Jul 2011 17:11:46 +0200
> 
> > Thats tricky, because I am not sure we dont need RCU protection since we
> > can now exchange dst neighbour on the fly.
> > 
> > Following patch would only reduce the window of bug, not a complete
> > fix...
> > 
> > David, any opinion on this ?
> 
> Indeed, old code worked because we invalidated entire route cache
> entry, and we never before ran arp_bind_neighbour() except on new
> route cache entires before they become globally visible.
> 
> I think when we change an existing neigh we will need to release old
> neigh via RCU, at a minimum.


Oh well, we already use RCU in neigh_destroy(), so adding rcu would need
to change all dst_get_neighbour() callers to be in one rcu_read_lock()
section.

I'll take a look, I suspect its mostly already done.




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-29 15:43                 ` Eric Dumazet
@ 2011-07-30  5:00                   ` Eric Dumazet
  2011-08-01  8:57                     ` synapse
  2011-08-03 10:34                     ` David Miller
  0 siblings, 2 replies; 18+ messages in thread
From: Eric Dumazet @ 2011-07-30  5:00 UTC (permalink / raw)
  To: David Miller; +Cc: synapse, netdev

Le vendredi 29 juillet 2011 à 17:43 +0200, Eric Dumazet a écrit :

> Oh well, we already use RCU in neigh_destroy(), so adding rcu would need
> to change all dst_get_neighbour() callers to be in one rcu_read_lock()
> section.
> 
> I'll take a look, I suspect its mostly already done.
> 
> 

Here is the patch I finally cooked and tested.

Could you please test it as well Gergely ?

Thanks !

[PATCH] net: fix NULL dereferences in check_peer_redir()

Gergely Kalman reported crashes in check_peer_redir().

It appears commit f39925dbde778 (ipv4: Cache learned redirect
information in inetpeer.) added a race, leading to possible NULL ptr
dereference.

Since we can now change dst neighbour, we should make sure a reader can
safely use a neighbour.

Add RCU protection to dst neighbour, and make sure check_peer_redir()
can be called safely by different cpus in parallel.

As neighbours are already freed after one RCU grace period, this patch
should not add typical RCU penalty (cache cold effects)

Many thanks to Gergely for providing a pretty report pointing to the
bug.

Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/dst.h     |   17 +++++++++++++----
 net/ipv4/ip_output.c  |   10 ++++++++--
 net/ipv4/route.c      |   14 ++++++++------
 net/ipv6/addrconf.c   |    2 +-
 net/ipv6/ip6_fib.c    |    2 +-
 net/ipv6/ip6_output.c |   13 +++++++++++--
 net/ipv6/route.c      |   35 +++++++++++++++++++++++++----------
 7 files changed, 67 insertions(+), 26 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 29e2557..13d507d 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -37,7 +37,7 @@ struct dst_entry {
 	unsigned long		_metrics;
 	unsigned long		expires;
 	struct dst_entry	*path;
-	struct neighbour	*_neighbour;
+	struct neighbour __rcu	*_neighbour;
 #ifdef CONFIG_XFRM
 	struct xfrm_state	*xfrm;
 #else
@@ -88,12 +88,17 @@ struct dst_entry {
 
 static inline struct neighbour *dst_get_neighbour(struct dst_entry *dst)
 {
-	return dst->_neighbour;
+	return rcu_dereference(dst->_neighbour);
+}
+
+static inline struct neighbour *dst_get_neighbour_raw(struct dst_entry *dst)
+{
+	return rcu_dereference_raw(dst->_neighbour);
 }
 
 static inline void dst_set_neighbour(struct dst_entry *dst, struct neighbour *neigh)
 {
-	dst->_neighbour = neigh;
+	rcu_assign_pointer(dst->_neighbour, neigh);
 }
 
 extern u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old);
@@ -382,8 +387,12 @@ static inline void dst_rcu_free(struct rcu_head *head)
 static inline void dst_confirm(struct dst_entry *dst)
 {
 	if (dst) {
-		struct neighbour *n = dst_get_neighbour(dst);
+		struct neighbour *n;
+
+		rcu_read_lock();
+		n = dst_get_neighbour(dst);
 		neigh_confirm(n);
+		rcu_read_unlock();
 	}
 }
 
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index ccaaa85..77d3ede 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -204,9 +204,15 @@ static inline int ip_finish_output2(struct sk_buff *skb)
 		skb = skb2;
 	}
 
+	rcu_read_lock();
 	neigh = dst_get_neighbour(dst);
-	if (neigh)
-		return neigh_output(neigh, skb);
+	if (neigh) {
+		int res = neigh_output(neigh, skb);
+
+		rcu_read_unlock();
+		return res;
+	}
+	rcu_read_unlock();
 
 	if (net_ratelimit())
 		printk(KERN_DEBUG "ip_finish_output2: No header cache and no neighbour!\n");
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 1730689..6afc4eb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1628,16 +1628,18 @@ static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
 {
 	struct rtable *rt = (struct rtable *) dst;
 	__be32 orig_gw = rt->rt_gateway;
-	struct neighbour *n;
+	struct neighbour *n, *old_n;
 
 	dst_confirm(&rt->dst);
 
-	neigh_release(dst_get_neighbour(&rt->dst));
-	dst_set_neighbour(&rt->dst, NULL);
-
 	rt->rt_gateway = peer->redirect_learned.a4;
-	rt_bind_neighbour(rt);
-	n = dst_get_neighbour(&rt->dst);
+
+	n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
+	if (IS_ERR(n))
+		return PTR_ERR(n);
+	old_n = xchg(&rt->dst._neighbour, n);
+	if (old_n)
+		neigh_release(old_n);
 	if (!n || !(n->nud_state & NUD_VALID)) {
 		if (n)
 			neigh_event_send(n, NULL);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index a55500c..f012ebd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -656,7 +656,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen,
 	 * layer address of our nexhop router
 	 */
 
-	if (dst_get_neighbour(&rt->dst) == NULL)
+	if (dst_get_neighbour_raw(&rt->dst) == NULL)
 		ifa->flags &= ~IFA_F_OPTIMISTIC;
 
 	ifa->idev = idev;
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 54a4678..320d91d 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1455,7 +1455,7 @@ static int fib6_age(struct rt6_info *rt, void *arg)
 			RT6_TRACE("aging clone %p\n", rt);
 			return -1;
 		} else if ((rt->rt6i_flags & RTF_GATEWAY) &&
-			   (!(dst_get_neighbour(&rt->dst)->flags & NTF_ROUTER))) {
+			   (!(dst_get_neighbour_raw(&rt->dst)->flags & NTF_ROUTER))) {
 			RT6_TRACE("purging route %p via non-router but gateway\n",
 				  rt);
 			return -1;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 32e5339..4c882cf 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -135,10 +135,15 @@ static int ip6_finish_output2(struct sk_buff *skb)
 				skb->len);
 	}
 
+	rcu_read_lock();
 	neigh = dst_get_neighbour(dst);
-	if (neigh)
-		return neigh_output(neigh, skb);
+	if (neigh) {
+		int res = neigh_output(neigh, skb);
 
+		rcu_read_unlock();
+		return res;
+	}
+	rcu_read_unlock();
 	IP6_INC_STATS_BH(dev_net(dst->dev),
 			 ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
 	kfree_skb(skb);
@@ -975,12 +980,14 @@ static int ip6_dst_lookup_tail(struct sock *sk,
 	 * dst entry and replace it instead with the
 	 * dst entry of the nexthop router
 	 */
+	rcu_read_lock();
 	n = dst_get_neighbour(*dst);
 	if (n && !(n->nud_state & NUD_VALID)) {
 		struct inet6_ifaddr *ifp;
 		struct flowi6 fl_gw6;
 		int redirect;
 
+		rcu_read_unlock();
 		ifp = ipv6_get_ifaddr(net, &fl6->saddr,
 				      (*dst)->dev, 1);
 
@@ -1000,6 +1007,8 @@ static int ip6_dst_lookup_tail(struct sock *sk,
 			if ((err = (*dst)->error))
 				goto out_err_release;
 		}
+	} else {
+		rcu_read_unlock();
 	}
 #endif
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e8987da..9e69eb0 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -364,7 +364,7 @@ out:
 #ifdef CONFIG_IPV6_ROUTER_PREF
 static void rt6_probe(struct rt6_info *rt)
 {
-	struct neighbour *neigh = rt ? dst_get_neighbour(&rt->dst) : NULL;
+	struct neighbour *neigh;
 	/*
 	 * Okay, this does not seem to be appropriate
 	 * for now, however, we need to check if it
@@ -373,8 +373,10 @@ static void rt6_probe(struct rt6_info *rt)
 	 * Router Reachability Probe MUST be rate-limited
 	 * to no more than one per minute.
 	 */
+	rcu_read_lock();
+	neigh = rt ? dst_get_neighbour(&rt->dst) : NULL;
 	if (!neigh || (neigh->nud_state & NUD_VALID))
-		return;
+		goto out;
 	read_lock_bh(&neigh->lock);
 	if (!(neigh->nud_state & NUD_VALID) &&
 	    time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
@@ -387,8 +389,11 @@ static void rt6_probe(struct rt6_info *rt)
 		target = (struct in6_addr *)&neigh->primary_key;
 		addrconf_addr_solict_mult(target, &mcaddr);
 		ndisc_send_ns(rt->rt6i_dev, NULL, target, &mcaddr, NULL);
-	} else
+	} else {
 		read_unlock_bh(&neigh->lock);
+	}
+out:
+	rcu_read_unlock();
 }
 #else
 static inline void rt6_probe(struct rt6_info *rt)
@@ -412,8 +417,11 @@ static inline int rt6_check_dev(struct rt6_info *rt, int oif)
 
 static inline int rt6_check_neigh(struct rt6_info *rt)
 {
-	struct neighbour *neigh = dst_get_neighbour(&rt->dst);
+	struct neighbour *neigh;
 	int m;
+
+	rcu_read_lock();
+	neigh = dst_get_neighbour(&rt->dst);
 	if (rt->rt6i_flags & RTF_NONEXTHOP ||
 	    !(rt->rt6i_flags & RTF_GATEWAY))
 		m = 1;
@@ -430,6 +438,7 @@ static inline int rt6_check_neigh(struct rt6_info *rt)
 		read_unlock_bh(&neigh->lock);
 	} else
 		m = 0;
+	rcu_read_unlock();
 	return m;
 }
 
@@ -769,7 +778,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
 		rt->rt6i_dst.plen = 128;
 		rt->rt6i_flags |= RTF_CACHE;
 		rt->dst.flags |= DST_HOST;
-		dst_set_neighbour(&rt->dst, neigh_clone(dst_get_neighbour(&ort->dst)));
+		dst_set_neighbour(&rt->dst, neigh_clone(dst_get_neighbour_raw(&ort->dst)));
 	}
 	return rt;
 }
@@ -803,7 +812,7 @@ restart:
 	dst_hold(&rt->dst);
 	read_unlock_bh(&table->tb6_lock);
 
-	if (!dst_get_neighbour(&rt->dst) && !(rt->rt6i_flags & RTF_NONEXTHOP))
+	if (!dst_get_neighbour_raw(&rt->dst) && !(rt->rt6i_flags & RTF_NONEXTHOP))
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
 	else if (!(rt->dst.flags & DST_HOST))
 		nrt = rt6_alloc_clone(rt, &fl6->daddr);
@@ -1587,7 +1596,7 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
 	dst_confirm(&rt->dst);
 
 	/* Duplicate redirect: silently ignore. */
-	if (neigh == dst_get_neighbour(&rt->dst))
+	if (neigh == dst_get_neighbour_raw(&rt->dst))
 		goto out;
 
 	nrt = ip6_rt_copy(rt, dest);
@@ -1682,7 +1691,7 @@ again:
 	   1. It is connected route. Action: COW
 	   2. It is gatewayed route or NONEXTHOP route. Action: clone it.
 	 */
-	if (!dst_get_neighbour(&rt->dst) && !(rt->rt6i_flags & RTF_NONEXTHOP))
+	if (!dst_get_neighbour_raw(&rt->dst) && !(rt->rt6i_flags & RTF_NONEXTHOP))
 		nrt = rt6_alloc_cow(rt, daddr, saddr);
 	else
 		nrt = rt6_alloc_clone(rt, daddr);
@@ -2326,6 +2335,7 @@ static int rt6_fill_node(struct net *net,
 	struct nlmsghdr *nlh;
 	long expires;
 	u32 table;
+	struct neighbour *n;
 
 	if (prefix) {	/* user wants prefix routes only */
 		if (!(rt->rt6i_flags & RTF_PREFIX_RT)) {
@@ -2414,8 +2424,11 @@ static int rt6_fill_node(struct net *net,
 	if (rtnetlink_put_metrics(skb, dst_metrics_ptr(&rt->dst)) < 0)
 		goto nla_put_failure;
 
-	if (dst_get_neighbour(&rt->dst))
-		NLA_PUT(skb, RTA_GATEWAY, 16, &dst_get_neighbour(&rt->dst)->primary_key);
+	rcu_read_lock();
+	n = dst_get_neighbour(&rt->dst);
+	if (n)
+		NLA_PUT(skb, RTA_GATEWAY, 16, &n->primary_key);
+	rcu_read_unlock();
 
 	if (rt->dst.dev)
 		NLA_PUT_U32(skb, RTA_OIF, rt->rt6i_dev->ifindex);
@@ -2608,12 +2621,14 @@ static int rt6_info_route(struct rt6_info *rt, void *p_arg)
 #else
 	seq_puts(m, "00000000000000000000000000000000 00 ");
 #endif
+	rcu_read_lock();
 	n = dst_get_neighbour(&rt->dst);
 	if (n) {
 		seq_printf(m, "%pi6", n->primary_key);
 	} else {
 		seq_puts(m, "00000000000000000000000000000000");
 	}
+	rcu_read_unlock();
 	seq_printf(m, " %08x %08x %08x %08x %8s\n",
 		   rt->rt6i_metric, atomic_read(&rt->dst.__refcnt),
 		   rt->dst.__use, rt->rt6i_flags,



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-30  5:00                   ` Eric Dumazet
@ 2011-08-01  8:57                     ` synapse
  2011-08-01  9:15                       ` Eric Dumazet
  2011-08-03 10:34                     ` David Miller
  1 sibling, 1 reply; 18+ messages in thread
From: synapse @ 2011-08-01  8:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

Hello

Sorry, I wasn't home on the weekend. Exactly to which tree should I 
apply this?
It doesn't apply cleanly to 3.0.0. Am I missing something?

Gergely Kalman


On 07/30/11 07:00, Eric Dumazet wrote:
> Le vendredi 29 juillet 2011 à 17:43 +0200, Eric Dumazet a écrit :
>
>> Oh well, we already use RCU in neigh_destroy(), so adding rcu would need
>> to change all dst_get_neighbour() callers to be in one rcu_read_lock()
>> section.
>>
>> I'll take a look, I suspect its mostly already done.
>>
>>
> Here is the patch I finally cooked and tested.
>
> Could you please test it as well Gergely ?
>
> Thanks !
>
> [PATCH] net: fix NULL dereferences in check_peer_redir()
>
> Gergely Kalman reported crashes in check_peer_redir().
>
> It appears commit f39925dbde778 (ipv4: Cache learned redirect
> information in inetpeer.) added a race, leading to possible NULL ptr
> dereference.
>
> Since we can now change dst neighbour, we should make sure a reader can
> safely use a neighbour.
>
> Add RCU protection to dst neighbour, and make sure check_peer_redir()
> can be called safely by different cpus in parallel.
>
> As neighbours are already freed after one RCU grace period, this patch
> should not add typical RCU penalty (cache cold effects)
>
> Many thanks to Gergely for providing a pretty report pointing to the
> bug.
>
> Reported-by: Gergely Kalman<synapse@hippy.csoma.elte.hu>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
> ---
>   include/net/dst.h     |   17 +++++++++++++----
>   net/ipv4/ip_output.c  |   10 ++++++++--
>   net/ipv4/route.c      |   14 ++++++++------
>   net/ipv6/addrconf.c   |    2 +-
>   net/ipv6/ip6_fib.c    |    2 +-
>   net/ipv6/ip6_output.c |   13 +++++++++++--
>   net/ipv6/route.c      |   35 +++++++++++++++++++++++++----------
>   7 files changed, 67 insertions(+), 26 deletions(-)
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 29e2557..13d507d 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -37,7 +37,7 @@ struct dst_entry {
>   	unsigned long		_metrics;
>   	unsigned long		expires;
>   	struct dst_entry	*path;
> -	struct neighbour	*_neighbour;
> +	struct neighbour __rcu	*_neighbour;
>   #ifdef CONFIG_XFRM
>   	struct xfrm_state	*xfrm;
>   #else
> @@ -88,12 +88,17 @@ struct dst_entry {
>
>   static inline struct neighbour *dst_get_neighbour(struct dst_entry *dst)
>   {
> -	return dst->_neighbour;
> +	return rcu_dereference(dst->_neighbour);
> +}
> +
> +static inline struct neighbour *dst_get_neighbour_raw(struct dst_entry *dst)
> +{
> +	return rcu_dereference_raw(dst->_neighbour);
>   }
>
>   static inline void dst_set_neighbour(struct dst_entry *dst, struct neighbour *neigh)
>   {
> -	dst->_neighbour = neigh;
> +	rcu_assign_pointer(dst->_neighbour, neigh);
>   }
>
>   extern u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old);
> @@ -382,8 +387,12 @@ static inline void dst_rcu_free(struct rcu_head *head)
>   static inline void dst_confirm(struct dst_entry *dst)
>   {
>   	if (dst) {
> -		struct neighbour *n = dst_get_neighbour(dst);
> +		struct neighbour *n;
> +
> +		rcu_read_lock();
> +		n = dst_get_neighbour(dst);
>   		neigh_confirm(n);
> +		rcu_read_unlock();
>   	}
>   }
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index ccaaa85..77d3ede 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -204,9 +204,15 @@ static inline int ip_finish_output2(struct sk_buff *skb)
>   		skb = skb2;
>   	}
>
> +	rcu_read_lock();
>   	neigh = dst_get_neighbour(dst);
> -	if (neigh)
> -		return neigh_output(neigh, skb);
> +	if (neigh) {
> +		int res = neigh_output(neigh, skb);
> +
> +		rcu_read_unlock();
> +		return res;
> +	}
> +	rcu_read_unlock();
>
>   	if (net_ratelimit())
>   		printk(KERN_DEBUG "ip_finish_output2: No header cache and no neighbour!\n");
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 1730689..6afc4eb 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1628,16 +1628,18 @@ static int check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
>   {
>   	struct rtable *rt = (struct rtable *) dst;
>   	__be32 orig_gw = rt->rt_gateway;
> -	struct neighbour *n;
> +	struct neighbour *n, *old_n;
>
>   	dst_confirm(&rt->dst);
>
> -	neigh_release(dst_get_neighbour(&rt->dst));
> -	dst_set_neighbour(&rt->dst, NULL);
> -
>   	rt->rt_gateway = peer->redirect_learned.a4;
> -	rt_bind_neighbour(rt);
> -	n = dst_get_neighbour(&rt->dst);
> +
> +	n = ipv4_neigh_lookup(&rt->dst,&rt->rt_gateway);
> +	if (IS_ERR(n))
> +		return PTR_ERR(n);
> +	old_n = xchg(&rt->dst._neighbour, n);
> +	if (old_n)
> +		neigh_release(old_n);
>   	if (!n || !(n->nud_state&  NUD_VALID)) {
>   		if (n)
>   			neigh_event_send(n, NULL);
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index a55500c..f012ebd 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -656,7 +656,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen,
>   	 * layer address of our nexhop router
>   	 */
>
> -	if (dst_get_neighbour(&rt->dst) == NULL)
> +	if (dst_get_neighbour_raw(&rt->dst) == NULL)
>   		ifa->flags&= ~IFA_F_OPTIMISTIC;
>
>   	ifa->idev = idev;
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index 54a4678..320d91d 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -1455,7 +1455,7 @@ static int fib6_age(struct rt6_info *rt, void *arg)
>   			RT6_TRACE("aging clone %p\n", rt);
>   			return -1;
>   		} else if ((rt->rt6i_flags&  RTF_GATEWAY)&&
> -			   (!(dst_get_neighbour(&rt->dst)->flags&  NTF_ROUTER))) {
> +			   (!(dst_get_neighbour_raw(&rt->dst)->flags&  NTF_ROUTER))) {
>   			RT6_TRACE("purging route %p via non-router but gateway\n",
>   				  rt);
>   			return -1;
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 32e5339..4c882cf 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -135,10 +135,15 @@ static int ip6_finish_output2(struct sk_buff *skb)
>   				skb->len);
>   	}
>
> +	rcu_read_lock();
>   	neigh = dst_get_neighbour(dst);
> -	if (neigh)
> -		return neigh_output(neigh, skb);
> +	if (neigh) {
> +		int res = neigh_output(neigh, skb);
>
> +		rcu_read_unlock();
> +		return res;
> +	}
> +	rcu_read_unlock();
>   	IP6_INC_STATS_BH(dev_net(dst->dev),
>   			 ip6_dst_idev(dst), IPSTATS_MIB_OUTNOROUTES);
>   	kfree_skb(skb);
> @@ -975,12 +980,14 @@ static int ip6_dst_lookup_tail(struct sock *sk,
>   	 * dst entry and replace it instead with the
>   	 * dst entry of the nexthop router
>   	 */
> +	rcu_read_lock();
>   	n = dst_get_neighbour(*dst);
>   	if (n&&  !(n->nud_state&  NUD_VALID)) {
>   		struct inet6_ifaddr *ifp;
>   		struct flowi6 fl_gw6;
>   		int redirect;
>
> +		rcu_read_unlock();
>   		ifp = ipv6_get_ifaddr(net,&fl6->saddr,
>   				      (*dst)->dev, 1);
>
> @@ -1000,6 +1007,8 @@ static int ip6_dst_lookup_tail(struct sock *sk,
>   			if ((err = (*dst)->error))
>   				goto out_err_release;
>   		}
> +	} else {
> +		rcu_read_unlock();
>   	}
>   #endif
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index e8987da..9e69eb0 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -364,7 +364,7 @@ out:
>   #ifdef CONFIG_IPV6_ROUTER_PREF
>   static void rt6_probe(struct rt6_info *rt)
>   {
> -	struct neighbour *neigh = rt ? dst_get_neighbour(&rt->dst) : NULL;
> +	struct neighbour *neigh;
>   	/*
>   	 * Okay, this does not seem to be appropriate
>   	 * for now, however, we need to check if it
> @@ -373,8 +373,10 @@ static void rt6_probe(struct rt6_info *rt)
>   	 * Router Reachability Probe MUST be rate-limited
>   	 * to no more than one per minute.
>   	 */
> +	rcu_read_lock();
> +	neigh = rt ? dst_get_neighbour(&rt->dst) : NULL;
>   	if (!neigh || (neigh->nud_state&  NUD_VALID))
> -		return;
> +		goto out;
>   	read_lock_bh(&neigh->lock);
>   	if (!(neigh->nud_state&  NUD_VALID)&&
>   	time_after(jiffies, neigh->updated + rt->rt6i_idev->cnf.rtr_probe_interval)) {
> @@ -387,8 +389,11 @@ static void rt6_probe(struct rt6_info *rt)
>   		target = (struct in6_addr *)&neigh->primary_key;
>   		addrconf_addr_solict_mult(target,&mcaddr);
>   		ndisc_send_ns(rt->rt6i_dev, NULL, target,&mcaddr, NULL);
> -	} else
> +	} else {
>   		read_unlock_bh(&neigh->lock);
> +	}
> +out:
> +	rcu_read_unlock();
>   }
>   #else
>   static inline void rt6_probe(struct rt6_info *rt)
> @@ -412,8 +417,11 @@ static inline int rt6_check_dev(struct rt6_info *rt, int oif)
>
>   static inline int rt6_check_neigh(struct rt6_info *rt)
>   {
> -	struct neighbour *neigh = dst_get_neighbour(&rt->dst);
> +	struct neighbour *neigh;
>   	int m;
> +
> +	rcu_read_lock();
> +	neigh = dst_get_neighbour(&rt->dst);
>   	if (rt->rt6i_flags&  RTF_NONEXTHOP ||
>   	    !(rt->rt6i_flags&  RTF_GATEWAY))
>   		m = 1;
> @@ -430,6 +438,7 @@ static inline int rt6_check_neigh(struct rt6_info *rt)
>   		read_unlock_bh(&neigh->lock);
>   	} else
>   		m = 0;
> +	rcu_read_unlock();
>   	return m;
>   }
>
> @@ -769,7 +778,7 @@ static struct rt6_info *rt6_alloc_clone(struct rt6_info *ort,
>   		rt->rt6i_dst.plen = 128;
>   		rt->rt6i_flags |= RTF_CACHE;
>   		rt->dst.flags |= DST_HOST;
> -		dst_set_neighbour(&rt->dst, neigh_clone(dst_get_neighbour(&ort->dst)));
> +		dst_set_neighbour(&rt->dst, neigh_clone(dst_get_neighbour_raw(&ort->dst)));
>   	}
>   	return rt;
>   }
> @@ -803,7 +812,7 @@ restart:
>   	dst_hold(&rt->dst);
>   	read_unlock_bh(&table->tb6_lock);
>
> -	if (!dst_get_neighbour(&rt->dst)&&  !(rt->rt6i_flags&  RTF_NONEXTHOP))
> +	if (!dst_get_neighbour_raw(&rt->dst)&&  !(rt->rt6i_flags&  RTF_NONEXTHOP))
>   		nrt = rt6_alloc_cow(rt,&fl6->daddr,&fl6->saddr);
>   	else if (!(rt->dst.flags&  DST_HOST))
>   		nrt = rt6_alloc_clone(rt,&fl6->daddr);
> @@ -1587,7 +1596,7 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
>   	dst_confirm(&rt->dst);
>
>   	/* Duplicate redirect: silently ignore. */
> -	if (neigh == dst_get_neighbour(&rt->dst))
> +	if (neigh == dst_get_neighbour_raw(&rt->dst))
>   		goto out;
>
>   	nrt = ip6_rt_copy(rt, dest);
> @@ -1682,7 +1691,7 @@ again:
>   	   1. It is connected route. Action: COW
>   	   2. It is gatewayed route or NONEXTHOP route. Action: clone it.
>   	 */
> -	if (!dst_get_neighbour(&rt->dst)&&  !(rt->rt6i_flags&  RTF_NONEXTHOP))
> +	if (!dst_get_neighbour_raw(&rt->dst)&&  !(rt->rt6i_flags&  RTF_NONEXTHOP))
>   		nrt = rt6_alloc_cow(rt, daddr, saddr);
>   	else
>   		nrt = rt6_alloc_clone(rt, daddr);
> @@ -2326,6 +2335,7 @@ static int rt6_fill_node(struct net *net,
>   	struct nlmsghdr *nlh;
>   	long expires;
>   	u32 table;
> +	struct neighbour *n;
>
>   	if (prefix) {	/* user wants prefix routes only */
>   		if (!(rt->rt6i_flags&  RTF_PREFIX_RT)) {
> @@ -2414,8 +2424,11 @@ static int rt6_fill_node(struct net *net,
>   	if (rtnetlink_put_metrics(skb, dst_metrics_ptr(&rt->dst))<  0)
>   		goto nla_put_failure;
>
> -	if (dst_get_neighbour(&rt->dst))
> -		NLA_PUT(skb, RTA_GATEWAY, 16,&dst_get_neighbour(&rt->dst)->primary_key);
> +	rcu_read_lock();
> +	n = dst_get_neighbour(&rt->dst);
> +	if (n)
> +		NLA_PUT(skb, RTA_GATEWAY, 16,&n->primary_key);
> +	rcu_read_unlock();
>
>   	if (rt->dst.dev)
>   		NLA_PUT_U32(skb, RTA_OIF, rt->rt6i_dev->ifindex);
> @@ -2608,12 +2621,14 @@ static int rt6_info_route(struct rt6_info *rt, void *p_arg)
>   #else
>   	seq_puts(m, "00000000000000000000000000000000 00 ");
>   #endif
> +	rcu_read_lock();
>   	n = dst_get_neighbour(&rt->dst);
>   	if (n) {
>   		seq_printf(m, "%pi6", n->primary_key);
>   	} else {
>   		seq_puts(m, "00000000000000000000000000000000");
>   	}
> +	rcu_read_unlock();
>   	seq_printf(m, " %08x %08x %08x %08x %8s\n",
>   		   rt->rt6i_metric, atomic_read(&rt->dst.__refcnt),
>   		   rt->dst.__use, rt->rt6i_flags,
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-08-01  8:57                     ` synapse
@ 2011-08-01  9:15                       ` Eric Dumazet
  2011-08-01 15:25                         ` synapse
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2011-08-01  9:15 UTC (permalink / raw)
  To: synapse; +Cc: David Miller, netdev

Le lundi 01 août 2011 à 10:57 +0200, synapse a écrit :
> Hello
> 
> Sorry, I wasn't home on the weekend. Exactly to which tree should I 
> apply this?
> It doesn't apply cleanly to 3.0.0. Am I missing something?
> 

Could you try latest linux tree ?

We first validate patches on current tree, then backport them if needed
to previous kernels.

Thanks



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-08-01  9:15                       ` Eric Dumazet
@ 2011-08-01 15:25                         ` synapse
  0 siblings, 0 replies; 18+ messages in thread
From: synapse @ 2011-08-01 15:25 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev

On 08/01/11 11:15, Eric Dumazet wrote:
> Le lundi 01 août 2011 à 10:57 +0200, synapse a écrit :
>> Hello
>>
>> Sorry, I wasn't home on the weekend. Exactly to which tree should I
>> apply this?
>> It doesn't apply cleanly to 3.0.0. Am I missing something?
>>
> Could you try latest linux tree ?
>
> We first validate patches on current tree, then backport them if needed
> to previous kernels.
>
> Thanks
>
deployed, we'll see if it works out :)

Gergely Kalman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-07-30  5:00                   ` Eric Dumazet
  2011-08-01  8:57                     ` synapse
@ 2011-08-03 10:34                     ` David Miller
  2011-08-08 13:08                       ` synapse
  2011-08-09  6:56                       ` [PATCH] net: fix potential neighbour race in dst_ifdown() Eric Dumazet
  1 sibling, 2 replies; 18+ messages in thread
From: David Miller @ 2011-08-03 10:34 UTC (permalink / raw)
  To: eric.dumazet; +Cc: synapse, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 30 Jul 2011 07:00:53 +0200

> [PATCH] net: fix NULL dereferences in check_peer_redir()

I'm adding this now to my tree so it gets more widespread
testing.

Thanks Eric.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check)
  2011-08-03 10:34                     ` David Miller
@ 2011-08-08 13:08                       ` synapse
  2011-08-09  6:56                       ` [PATCH] net: fix potential neighbour race in dst_ifdown() Eric Dumazet
  1 sibling, 0 replies; 18+ messages in thread
From: synapse @ 2011-08-08 13:08 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

On 08/03/11 12:34, David Miller wrote:
> From: Eric Dumazet<eric.dumazet@gmail.com>
> Date: Sat, 30 Jul 2011 07:00:53 +0200
>
>> [PATCH] net: fix NULL dereferences in check_peer_redir()
> I'm adding this now to my tree so it gets more widespread
> testing.
>
> Thanks Eric.
So far, so good. The machine is alive and well ever since.

Thanks for your fast response.

Gergely Kalman

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] net: fix potential neighbour race in dst_ifdown()
  2011-08-03 10:34                     ` David Miller
  2011-08-08 13:08                       ` synapse
@ 2011-08-09  6:56                       ` Eric Dumazet
  2011-08-10  4:47                         ` David Miller
  1 sibling, 1 reply; 18+ messages in thread
From: Eric Dumazet @ 2011-08-09  6:56 UTC (permalink / raw)
  To: David Miller; +Cc: synapse, netdev

Le mercredi 03 août 2011 à 03:34 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sat, 30 Jul 2011 07:00:53 +0200
> 
> > [PATCH] net: fix NULL dereferences in check_peer_redir()
> 
> I'm adding this now to my tree so it gets more widespread
> testing.

Thanks David

We probably have other races, here is a followup patch, probably suited
for net-next, since its not clear if its a real problem while device is
unregistering. Consider it as a cleanup at very least.

[PATCH] net: fix potential neighbour race in dst_ifdown()

Followup of commit f2c31e32b378a (fix NULL dereferences in
check_peer_redir()).

We need to make sure dst neighbour doesnt change in dst_ifdown().

Fix some sparse errors.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/core/dst.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 14b33baf..d5e2c4c 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -171,7 +171,7 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
 	dst_init_metrics(dst, dst_default_metrics, true);
 	dst->expires = 0UL;
 	dst->path = dst;
-	dst->_neighbour = NULL;
+	RCU_INIT_POINTER(dst->_neighbour, NULL);
 #ifdef CONFIG_XFRM
 	dst->xfrm = NULL;
 #endif
@@ -229,11 +229,11 @@ struct dst_entry *dst_destroy(struct dst_entry * dst)
 	smp_rmb();
 
 again:
-	neigh = dst->_neighbour;
+	neigh = rcu_dereference_protected(dst->_neighbour, 1);
 	child = dst->child;
 
 	if (neigh) {
-		dst->_neighbour = NULL;
+		RCU_INIT_POINTER(dst->_neighbour, NULL);
 		neigh_release(neigh);
 	}
 
@@ -360,14 +360,19 @@ static void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 	if (!unregister) {
 		dst->input = dst->output = dst_discard;
 	} else {
+		struct neighbour *neigh;
+
 		dst->dev = dev_net(dst->dev)->loopback_dev;
 		dev_hold(dst->dev);
 		dev_put(dev);
-		if (dst->_neighbour && dst->_neighbour->dev == dev) {
-			dst->_neighbour->dev = dst->dev;
+		rcu_read_lock();
+		neigh = dst_get_neighbour(dst);
+		if (neigh && neigh->dev == dev) {
+			neigh->dev = dst->dev;
 			dev_hold(dst->dev);
 			dev_put(dev);
 		}
+		rcu_read_unlock();
 	}
 }
 



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] net: fix potential neighbour race in dst_ifdown()
  2011-08-09  6:56                       ` [PATCH] net: fix potential neighbour race in dst_ifdown() Eric Dumazet
@ 2011-08-10  4:47                         ` David Miller
  0 siblings, 0 replies; 18+ messages in thread
From: David Miller @ 2011-08-10  4:47 UTC (permalink / raw)
  To: eric.dumazet; +Cc: synapse, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 09 Aug 2011 08:56:14 +0200

> Le mercredi 03 août 2011 à 03:34 -0700, David Miller a écrit :
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Sat, 30 Jul 2011 07:00:53 +0200
>> 
>> > [PATCH] net: fix NULL dereferences in check_peer_redir()
>> 
>> I'm adding this now to my tree so it gets more widespread
>> testing.
> 
> Thanks David
> 
> We probably have other races, here is a followup patch, probably suited
> for net-next, since its not clear if its a real problem while device is
> unregistering. Consider it as a cleanup at very least.
> 
> [PATCH] net: fix potential neighbour race in dst_ifdown()
> 
> Followup of commit f2c31e32b378a (fix NULL dereferences in
> check_peer_redir()).
> 
> We need to make sure dst neighbour doesnt change in dst_ifdown().
> 
> Fix some sparse errors.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Ok, applied.  I'll keep working to eliminate dst->_neighbour entirely
as that's a better long term solution I think :-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2011-08-10  4:48 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-29 13:18 PROBLEM: BUG (NULL ptr dereference in ipv4_dst_check) synapse
2011-07-29 13:33 ` Eric Dumazet
2011-07-29 14:26   ` synapse
2011-07-29 14:36     ` Eric Dumazet
2011-07-29 14:39       ` David Miller
2011-07-29 14:41         ` Eric Dumazet
2011-07-29 14:44           ` synapse
2011-07-29 15:11             ` Eric Dumazet
2011-07-29 15:19               ` David Miller
2011-07-29 15:43                 ` Eric Dumazet
2011-07-30  5:00                   ` Eric Dumazet
2011-08-01  8:57                     ` synapse
2011-08-01  9:15                       ` Eric Dumazet
2011-08-01 15:25                         ` synapse
2011-08-03 10:34                     ` David Miller
2011-08-08 13:08                       ` synapse
2011-08-09  6:56                       ` [PATCH] net: fix potential neighbour race in dst_ifdown() Eric Dumazet
2011-08-10  4:47                         ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).