* [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
@ 2017-10-16 21:29 David Ahern
2017-10-16 21:46 ` Cong Wang
2017-10-17 3:45 ` Eric Dumazet
0 siblings, 2 replies; 8+ messages in thread
From: David Ahern @ 2017-10-16 21:29 UTC (permalink / raw)
To: netdev; +Cc: xiyou.wangcong, David Ahern
Running perf in one window to capture tcp_retransmit_skb tracepoint:
$ perf record -e tcp:tcp_retransmit_skb -a
And causing a retransmission on an active TCP session (e.g., dropping
packets in the receiver, changing MTU on the interface to 500 and back
to 1500) triggers a panic:
[ 58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
[ 58.546770] PGD 0 P4D 0
[ 58.547472] Oops: 0000 [#1] SMP
[ 58.548328] Modules linked in: vrf
[ 58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
[ 58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[ 58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000
[ 58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
[ 58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282
[ 58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098
[ 58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68
[ 58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a
[ 58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0
[ 58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00
[ 58.563845] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 58.564873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0
[ 58.566538] Call Trace:
[ 58.566865] <IRQ>
[ 58.567140] __tcp_retransmit_skb+0x4ab/0x4c6
[ 58.567704] ? tcp_set_ca_state+0x22/0x3f
[ 58.568231] tcp_retransmit_skb+0x14/0xa3
[ 58.568754] tcp_retransmit_timer+0x472/0x5e3
[ 58.569324] ? tcp_write_timer_handler+0x1e9/0x1e9
[ 58.569946] tcp_write_timer_handler+0x95/0x1e9
[ 58.570548] tcp_write_timer+0x2a/0x58
Check that daddr_cache is non-NULL before de-referencing.
Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
Signed-off-by: David Ahern <dsahern@gmail.com>
---
include/trace/events/tcp.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 3d1cbd072b7e..13e8ee8af2c8 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -48,7 +48,8 @@ TRACE_EVENT(tcp_retransmit_skb,
pin6 = (struct in6_addr *)__entry->saddr_v6;
*pin6 = np->saddr;
pin6 = (struct in6_addr *)__entry->daddr_v6;
- *pin6 = *(np->daddr_cache);
+ if (np->daddr_cache)
+ *pin6 = *(np->daddr_cache);
} else {
pin6 = (struct in6_addr *)__entry->saddr_v6;
ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
--
2.1.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
2017-10-16 21:29 [PATCH net-next] tcp: Check daddr_cache before use in tracepoint David Ahern
@ 2017-10-16 21:46 ` Cong Wang
2017-10-16 21:55 ` David Ahern
2017-10-17 3:45 ` Eric Dumazet
1 sibling, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-10-16 21:46 UTC (permalink / raw)
To: David Ahern; +Cc: Linux Kernel Network Developers
On Mon, Oct 16, 2017 at 2:29 PM, David Ahern <dsahern@gmail.com> wrote:
> Running perf in one window to capture tcp_retransmit_skb tracepoint:
> $ perf record -e tcp:tcp_retransmit_skb -a
>
> And causing a retransmission on an active TCP session (e.g., dropping
> packets in the receiver, changing MTU on the interface to 500 and back
> to 1500) triggers a panic:
>
> [ 58.543144] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [ 58.545300] IP: perf_trace_tcp_retransmit_skb+0xd0/0x145
> [ 58.546770] PGD 0 P4D 0
> [ 58.547472] Oops: 0000 [#1] SMP
> [ 58.548328] Modules linked in: vrf
> [ 58.549262] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc4+ #26
> [ 58.551004] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [ 58.554560] task: ffffffff81a0e540 task.stack: ffffffff81a00000
> [ 58.555817] RIP: 0010:perf_trace_tcp_retransmit_skb+0xd0/0x145
> [ 58.557137] RSP: 0018:ffff88003fc03d68 EFLAGS: 00010282
> [ 58.558292] RAX: 0000000000000000 RBX: ffffe8ffffc0ec80 RCX: ffff880038543098
> [ 58.559850] RDX: 0400000000000000 RSI: ffff88003fc03d70 RDI: ffff88003fc14b68
> [ 58.561099] RBP: ffff88003fc03da8 R08: 0000000000000000 R09: ffffea0000d3224a
> [ 58.562005] R10: ffff88003fc03db8 R11: 0000000000000010 R12: ffff8800385428c0
> [ 58.562930] R13: ffffe8ffffc0e478 R14: ffffffff81a93a40 R15: ffff88003d4f0c00
> [ 58.563845] FS: 0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
> [ 58.564873] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 58.565613] CR2: 0000000000000008 CR3: 000000003d68f004 CR4: 00000000000606f0
> [ 58.566538] Call Trace:
> [ 58.566865] <IRQ>
> [ 58.567140] __tcp_retransmit_skb+0x4ab/0x4c6
> [ 58.567704] ? tcp_set_ca_state+0x22/0x3f
> [ 58.568231] tcp_retransmit_skb+0x14/0xa3
> [ 58.568754] tcp_retransmit_timer+0x472/0x5e3
> [ 58.569324] ? tcp_write_timer_handler+0x1e9/0x1e9
> [ 58.569946] tcp_write_timer_handler+0x95/0x1e9
> [ 58.570548] tcp_write_timer+0x2a/0x58
>
> Check that daddr_cache is non-NULL before de-referencing.
Well, for NULL case, the entry->saddr_v6 will not be filled with
your patch.
How about using sk->sk_v6_daddr?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
2017-10-16 21:46 ` Cong Wang
@ 2017-10-16 21:55 ` David Ahern
2017-10-16 22:05 ` Cong Wang
0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2017-10-16 21:55 UTC (permalink / raw)
To: Cong Wang; +Cc: Linux Kernel Network Developers
On 10/16/17 3:46 PM, Cong Wang wrote:
> Well, for NULL case, the entry->saddr_v6 will not be filled with
> your patch.
I think you meant daddr_v6 and yes it will not get filled in, but 0 is
better than a panic ;-)
>
> How about using sk->sk_v6_daddr?
>
That works, but then both addresses should come from sk and not np:
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1ffab6d96e94..7989c2dcedf1 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -46,9 +46,9 @@ TRACE_EVENT(tcp_retransmit_skb,
if (np) {
pin6 = (struct in6_addr *)__entry->saddr_v6;
- *pin6 = np->saddr;
+ *pin6 = sk->sk_v6_rcv_saddr;
pin6 = (struct in6_addr *)__entry->daddr_v6;
- *pin6 = *(np->daddr_cache);
+ *pin6 = sk->sk_v6_daddr;
} else {
pin6 = (struct in6_addr *)__entry->saddr_v6;
ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
So np is just a trigger for IPv6 versus v4 mapped.
I do have a question about why ipv4 addresses are non-NULL:
$ perf script
swapper 0 [000] 25.025102: tcp:tcp_retransmit_skb:
sport=22 dport=48076 saddr=127.0.0.6 daddr=127.0.0.6
saddrv6=2001:db8:1::4 daddrv6=2001:db8:1::64
swapper 0 [000] 25.231125: tcp:tcp_retransmit_skb:
sport=22 dport=48076 saddr=127.0.0.6 daddr=127.0.0.6
saddrv6=2001:db8:1::4 daddrv6=2001:db8:1::64
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
2017-10-16 21:55 ` David Ahern
@ 2017-10-16 22:05 ` Cong Wang
2017-10-16 22:08 ` David Ahern
0 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-10-16 22:05 UTC (permalink / raw)
To: David Ahern; +Cc: Linux Kernel Network Developers
On Mon, Oct 16, 2017 at 2:55 PM, David Ahern <dsahern@gmail.com> wrote:
> On 10/16/17 3:46 PM, Cong Wang wrote:
>> Well, for NULL case, the entry->saddr_v6 will not be filled with
>> your patch.
>
> I think you meant daddr_v6 and yes it will not get filled in, but 0 is
> better than a panic ;-)
>
Sure, but we can do even better with sk->sk_v6_daddr.
>>
>> How about using sk->sk_v6_daddr?
>>
>
> That works, but then both addresses should come from sk and not np:
Yes.
>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 1ffab6d96e94..7989c2dcedf1 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -46,9 +46,9 @@ TRACE_EVENT(tcp_retransmit_skb,
>
> if (np) {
> pin6 = (struct in6_addr *)__entry->saddr_v6;
> - *pin6 = np->saddr;
> + *pin6 = sk->sk_v6_rcv_saddr;
> pin6 = (struct in6_addr *)__entry->daddr_v6;
> - *pin6 = *(np->daddr_cache);
> + *pin6 = sk->sk_v6_daddr;
> } else {
> pin6 = (struct in6_addr *)__entry->saddr_v6;
> ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
>
>
> So np is just a trigger for IPv6 versus v4 mapped.
>
> I do have a question about why ipv4 addresses are non-NULL:
If you really mean non-NULL, they are addresses, not pointers to
addresses.
Or if you mean non-zero, they are set to LOOPBACK4_IPV6
which is what I expect.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
2017-10-16 22:05 ` Cong Wang
@ 2017-10-16 22:08 ` David Ahern
0 siblings, 0 replies; 8+ messages in thread
From: David Ahern @ 2017-10-16 22:08 UTC (permalink / raw)
To: Cong Wang; +Cc: Linux Kernel Network Developers
On 10/16/17 4:05 PM, Cong Wang wrote:
> Or if you mean non-zero, they are set to LOOPBACK4_IPV6
> which is what I expect.
yes, I meant non-0. thanks for the explanation. I'll send a v2
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
2017-10-16 21:29 [PATCH net-next] tcp: Check daddr_cache before use in tracepoint David Ahern
2017-10-16 21:46 ` Cong Wang
@ 2017-10-17 3:45 ` Eric Dumazet
2017-10-17 15:32 ` David Ahern
1 sibling, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-10-17 3:45 UTC (permalink / raw)
To: David Ahern; +Cc: netdev, xiyou.wangcong
On Mon, 2017-10-16 at 14:29 -0700, David Ahern wrote:
> Running perf in one window to capture tcp_retransmit_skb tracepoint:
> $ perf record -e tcp:tcp_retransmit_skb -a
>
> And causing a retransmission on an active TCP session (e.g., dropping
> packets in the receiver, changing MTU on the interface to 500 and back
> to 1500) triggers a panic:
...
> Check that daddr_cache is non-NULL before de-referencing.
>
> Fixes: e086101b150a ("tcp: add a tracepoint for tcp retransmission")
> Signed-off-by: David Ahern <dsahern@gmail.com>
> ---
> include/trace/events/tcp.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
> index 3d1cbd072b7e..13e8ee8af2c8 100644
> --- a/include/trace/events/tcp.h
> +++ b/include/trace/events/tcp.h
> @@ -48,7 +48,8 @@ TRACE_EVENT(tcp_retransmit_skb,
> pin6 = (struct in6_addr *)__entry->saddr_v6;
> *pin6 = np->saddr;
> pin6 = (struct in6_addr *)__entry->daddr_v6;
> - *pin6 = *(np->daddr_cache);
> + if (np->daddr_cache)
> + *pin6 = *(np->daddr_cache);
> } else {
> pin6 = (struct in6_addr *)__entry->saddr_v6;
> ipv6_addr_set_v4mapped(inet->inet_saddr, pin6);
This is weird.
IPV6 TCP uses sk->sk_v6_daddr and sk->->sk_v6_rcv_saddr
So I would rather remove the need to fetch np = inet6_sk(sk) in the
first place, and look at sk->sk_family instead.
No need to get to np at all.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
2017-10-17 3:45 ` Eric Dumazet
@ 2017-10-17 15:32 ` David Ahern
2017-10-17 17:05 ` Cong Wang
0 siblings, 1 reply; 8+ messages in thread
From: David Ahern @ 2017-10-17 15:32 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, xiyou.wangcong
On 10/16/17 9:45 PM, Eric Dumazet wrote:
> IPV6 TCP uses sk->sk_v6_daddr and sk->->sk_v6_rcv_saddr
I moved v2 to those.
> So I would rather remove the need to fetch np = inet6_sk(sk) in the
> first place, and look at sk->sk_family instead.
I'll spin a v3 with that change.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] tcp: Check daddr_cache before use in tracepoint
2017-10-17 15:32 ` David Ahern
@ 2017-10-17 17:05 ` Cong Wang
0 siblings, 0 replies; 8+ messages in thread
From: Cong Wang @ 2017-10-17 17:05 UTC (permalink / raw)
To: David Ahern; +Cc: Eric Dumazet, Linux Kernel Network Developers
On Tue, Oct 17, 2017 at 8:32 AM, David Ahern <dsahern@gmail.com> wrote:
> On 10/16/17 9:45 PM, Eric Dumazet wrote:
>> IPV6 TCP uses sk->sk_v6_daddr and sk->->sk_v6_rcv_saddr
>
> I moved v2 to those.
>
>> So I would rather remove the need to fetch np = inet6_sk(sk) in the
>> first place, and look at sk->sk_family instead.
>
> I'll spin a v3 with that change.
Note, with CONFIG_IPV6=n, inet6_sk() is just NULL and compiler
could eliminate the if(NULL) branch, I doubt compiler could do same
for sk->sk_family.
Oh, BTW, I thought it is okay to use sk_v6_daddr in if (NULL)
case (aka CONFIG_IPV6=n), but a quick test shows compiler still
complains... so probably need #if IS_ENABLED(CONFIG_IPV6).
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2017-10-17 17:05 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-16 21:29 [PATCH net-next] tcp: Check daddr_cache before use in tracepoint David Ahern
2017-10-16 21:46 ` Cong Wang
2017-10-16 21:55 ` David Ahern
2017-10-16 22:05 ` Cong Wang
2017-10-16 22:08 ` David Ahern
2017-10-17 3:45 ` Eric Dumazet
2017-10-17 15:32 ` David Ahern
2017-10-17 17:05 ` Cong Wang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.