All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
@ 2017-10-16 21:29 David Ahern
  2017-10-16 21:46 ` Cong Wang
  2017-10-17  3:45 ` Eric Dumazet
  0 siblings, 2 replies; 8+ messages in thread
From: David Ahern @ 2017-10-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: xiyou.wangcong, David Ahern

Running perf in one window to capture tcp_retransmit_skb tracepoint:
    $ perf record -e tcp:tcp_retransmit_skb -a

And causing a retransmission on an active TCP session (e.g., dropping
packets in the receiver, changing MTU on the interface to 500 and back
to 1500) triggers a panic:

[   58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[   58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.546770] PGD 0 P4D 0
[   58.547472] Oops: 0000 [#1] SMP
[   58.548328] Modules linked in: vrf
[   58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
[   58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[   58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000
[   58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
[   58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282
[   58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098
[   58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68
[   58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a
[   58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0
[   58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00
[   58.563845] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[   58.564873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0
[   58.566538] Call Trace:
[   58.566865]  <IRQ>
[   58.567140]  __tcp_retransmit_skb+0x4ab/0x4c6
[   58.567704]  ? tcp_set_ca_state+0x22/0x3f
[   58.568231]  tcp_retransmit_skb+0x14/0xa3
[   58.568754]  tcp_retransmit_timer+0x472/0x5e3
[   58.569324]  ? tcp_write_timer_handler+0x1e9/0x1e9
[   58.569946]  tcp_write_timer_handler+0x95/0x1e9
[   58.570548]  tcp_write_timer+0x2a/0x58

Check that daddr_cache is non-NULL before de-referencing.

Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/trace/events/tcp.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 3d1cbd072b7e..13e8ee8af2c8 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -48,7 +48,8 @@ TRACE_EVENT(tcp_retransmit_skb,
 			pin6 = (struct in6_addr *)__entry->saddr_v6;
 			*pin6 = np->saddr;
 			pin6 = (struct in6_addr *)__entry->daddr_v6;
-			*pin6 = *(np->daddr_cache);
+			if (np->daddr_cache)
+				*pin6 = *(np->daddr_cache);
 		} else {
 			pin6 = (struct in6_addr *)__entry->saddr_v6;
 			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
  2017-10-16 21:29 [PATCH net-next] tcp: Check daddr_cache before use in tracepoint David Ahern
@ 2017-10-16 21:46 ` Cong Wang
  2017-10-16 21:55   ` David Ahern
  2017-10-17  3:45 ` Eric Dumazet
  1 sibling, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-10-16 21:46 UTC (permalink / raw)
  To: David Ahern; +Cc: Linux Kernel Network Developers

On Mon, Oct 16, 2017 at 2:29 PM, David Ahern <dsahern@gmail.com> wrote:
> Running perf in one window to capture tcp_retransmit_skb tracepoint:
>     $ perf record -e tcp:tcp_retransmit_skb -a
>
> And causing a retransmission on an active TCP session (e.g., dropping
> packets in the receiver, changing MTU on the interface to 500 and back
> to 1500) triggers a panic:
>
> [   58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [   58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
> [   58.546770] PGD 0 P4D 0
> [   58.547472] Oops: 0000 [#1] SMP
> [   58.548328] Modules linked in: vrf
> [   58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
> [   58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [   58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000
> [   58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
> [   58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282
> [   58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098
> [   58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68
> [   58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a
> [   58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0
> [   58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00
> [   58.563845] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [   58.564873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0
> [   58.566538] Call Trace:
> [   58.566865]  <IRQ>
> [   58.567140]  __tcp_retransmit_skb+0x4ab/0x4c6
> [   58.567704]  ? tcp_set_ca_state+0x22/0x3f
> [   58.568231]  tcp_retransmit_skb+0x14/0xa3
> [   58.568754]  tcp_retransmit_timer+0x472/0x5e3
> [   58.569324]  ? tcp_write_timer_handler+0x1e9/0x1e9
> [   58.569946]  tcp_write_timer_handler+0x95/0x1e9
> [   58.570548]  tcp_write_timer+0x2a/0x58
>
> Check that daddr_cache is non-NULL before de-referencing.

Well, for NULL case, the entry->saddr_v6 will not be filled with
your patch.

How about using sk->sk_v6_daddr?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
  2017-10-16 21:46 ` Cong Wang
@ 2017-10-16 21:55   ` David Ahern
  2017-10-16 22:05     ` Cong Wang
  0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2017-10-16 21:55 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers

On 10/16/17 3:46 PM, Cong Wang wrote:
> Well, for NULL case, the entry->saddr_v6 will not be filled with
> your patch.

I think you meant daddr_v6 and yes it will not get filled in, but 0 is
better than a panic ;-)

> 
> How about using sk->sk_v6_daddr?
> 

That works, but then both addresses should come from sk and not np:

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1ffab6d96e94..7989c2dcedf1 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -46,9 +46,9 @@ TRACE_EVENT(tcp_retransmit_skb,

                if (np) {
                        pin6 = (struct in6_addr *)__entry->saddr_v6;
-                       *pin6 = np->saddr;
+                       *pin6 = sk->sk_v6_rcv_saddr;
                        pin6 = (struct in6_addr *)__entry->daddr_v6;
-                       *pin6 = *(np->daddr_cache);
+                       *pin6 = sk->sk_v6_daddr;
                } else {
                        pin6 = (struct in6_addr *)__entry->saddr_v6;
                        ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);


So np is just a trigger for IPv6 versus v4 mapped.

I do have a question about why ipv4 addresses are non-NULL:

$ perf script
         swapper     0 [000]    25.025102: tcp:tcp_retransmit_skb:
sport=22 dport=48076 saddr=127.0.0.6 daddr=127.0.0.6
saddrv6=2001:db8:1::4 daddrv6=2001:db8:1::64
         swapper     0 [000]    25.231125: tcp:tcp_retransmit_skb:
sport=22 dport=48076 saddr=127.0.0.6 daddr=127.0.0.6
saddrv6=2001:db8:1::4 daddrv6=2001:db8:1::64

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
  2017-10-16 21:55   ` David Ahern
@ 2017-10-16 22:05     ` Cong Wang
  2017-10-16 22:08       ` David Ahern
  0 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-10-16 22:05 UTC (permalink / raw)
  To: David Ahern; +Cc: Linux Kernel Network Developers

On Mon, Oct 16, 2017 at 2:55 PM, David Ahern <dsahern@gmail.com> wrote:
> On 10/16/17 3:46 PM, Cong Wang wrote:
>> Well, for NULL case, the entry->saddr_v6 will not be filled with
>> your patch.
>
> I think you meant daddr_v6 and yes it will not get filled in, but 0 is
> better than a panic ;-)
>

Sure, but we can do even better with sk->sk_v6_daddr.


>>
>> How about using sk->sk_v6_daddr?
>>
>
> That works, but then both addresses should come from sk and not np:

Yes.

>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 1ffab6d96e94..7989c2dcedf1 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -46,9 +46,9 @@ TRACE_EVENT(tcp_retransmit_skb,
>
>                 if (np) {
>                         pin6 = (struct in6_addr *)__entry->saddr_v6;
> -                       *pin6 = np->saddr;
> +                       *pin6 = sk->sk_v6_rcv_saddr;
>                         pin6 = (struct in6_addr *)__entry->daddr_v6;
> -                       *pin6 = *(np->daddr_cache);
> +                       *pin6 = sk->sk_v6_daddr;
>                 } else {
>                         pin6 = (struct in6_addr *)__entry->saddr_v6;
>                         ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
>
>
> So np is just a trigger for IPv6 versus v4 mapped.
>
> I do have a question about why ipv4 addresses are non-NULL:

If you really mean non-NULL, they are addresses, not pointers to
addresses.

Or if you mean non-zero, they are set to LOOPBACK4_IPV6
which is what I expect.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
  2017-10-16 22:05     ` Cong Wang
@ 2017-10-16 22:08       ` David Ahern
  0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2017-10-16 22:08 UTC (permalink / raw)
  To: Cong Wang; +Cc: Linux Kernel Network Developers

On 10/16/17 4:05 PM, Cong Wang wrote:
> Or if you mean non-zero, they are set to LOOPBACK4_IPV6
> which is what I expect.

yes, I meant non-0. thanks for the explanation. I'll send a v2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
  2017-10-16 21:29 [PATCH net-next] tcp: Check daddr_cache before use in tracepoint David Ahern
  2017-10-16 21:46 ` Cong Wang
@ 2017-10-17  3:45 ` Eric Dumazet
  2017-10-17 15:32   ` David Ahern
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-10-17  3:45 UTC (permalink / raw)
  To: David Ahern; +Cc: netdev, xiyou.wangcong

On Mon, 2017-10-16 at 14:29 -0700, David Ahern wrote:
> Running perf in one window to capture tcp_retransmit_skb tracepoint:
>     $ perf record -e tcp:tcp_retransmit_skb -a
> 
> And causing a retransmission on an active TCP session (e.g., dropping
> packets in the receiver, changing MTU on the interface to 500 and back
> to 1500) triggers a panic:
...

> Check that daddr_cache is non-NULL before de-referencing.
> 
> Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
>  include/trace/events/tcp.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 3d1cbd072b7e..13e8ee8af2c8 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -48,7 +48,8 @@ TRACE_EVENT(tcp_retransmit_skb,
>  			pin6 = (struct in6_addr *)__entry->saddr_v6;
>  			*pin6 = np->saddr;
>  			pin6 = (struct in6_addr *)__entry->daddr_v6;
> -			*pin6 = *(np->daddr_cache);
> +			if (np->daddr_cache)
> +				*pin6 = *(np->daddr_cache);
>  		} else {
>  			pin6 = (struct in6_addr *)__entry->saddr_v6;
>  			ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);

This is weird.

IPV6 TCP uses sk->sk_v6_daddr and sk->->sk_v6_rcv_saddr

So I would rather remove the need to fetch np = inet6_sk(sk) in the
first place, and look at sk->sk_family instead.

No need to get to np at all.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
  2017-10-17  3:45 ` Eric Dumazet
@ 2017-10-17 15:32   ` David Ahern
  2017-10-17 17:05     ` Cong Wang
  0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2017-10-17 15:32 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, xiyou.wangcong

On 10/16/17 9:45 PM, Eric Dumazet wrote:
> IPV6 TCP uses sk->sk_v6_daddr and sk->->sk_v6_rcv_saddr

I moved v2 to those.

> So I would rather remove the need to fetch np = inet6_sk(sk) in the
> first place, and look at sk->sk_family instead.

I'll spin a v3 with that change.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
  2017-10-17 15:32   ` David Ahern
@ 2017-10-17 17:05     ` Cong Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Cong Wang @ 2017-10-17 17:05 UTC (permalink / raw)
  To: David Ahern; +Cc: Eric Dumazet, Linux Kernel Network Developers

On Tue, Oct 17, 2017 at 8:32 AM, David Ahern <dsahern@gmail.com> wrote:
> On 10/16/17 9:45 PM, Eric Dumazet wrote:
>> IPV6 TCP uses sk->sk_v6_daddr and sk->->sk_v6_rcv_saddr
>
> I moved v2 to those.
>
>> So I would rather remove the need to fetch np = inet6_sk(sk) in the
>> first place, and look at sk->sk_family instead.
>
> I'll spin a v3 with that change.

Note, with CONFIG_IPV6=n, inet6_sk() is just NULL and compiler
could eliminate the if(NULL) branch, I doubt compiler could do same
for sk->sk_family.

Oh, BTW, I thought it is okay to use sk_v6_daddr in if (NULL)
case (aka CONFIG_IPV6=n), but a quick test shows compiler still
complains... so probably need #if IS_ENABLED(CONFIG_IPV6).

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-10-17 17:05 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-16 21:29 [PATCH net-next] tcp: Check daddr_cache before use in tracepoint David Ahern
2017-10-16 21:46 ` Cong Wang
2017-10-16 21:55   ` David Ahern
2017-10-16 22:05     ` Cong Wang
2017-10-16 22:08       ` David Ahern
2017-10-17  3:45 ` Eric Dumazet
2017-10-17 15:32   ` David Ahern
2017-10-17 17:05     ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.