All of lore.kernel.org
 help / color / mirror / Atom feed
* [4.9.13] use after free in ipv4_mtu
@ 2017-03-02 12:42 Daniel J Blueman
  2017-03-02 13:08 ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel J Blueman @ 2017-03-02 12:42 UTC (permalink / raw)
  To: Netdev; +Cc: Eric Dumazet, David Miller, Stephen Hemminger

With debugging enabled [1,2], KASAN finds a use-after-free in ipv4_mtu
[3,4] with activity over a brcmfmac wireless card.

Let me know for further testing/debugging.

Thanks!
  Dan

[1] https://quora.org/linux/ipv4_mtu/config
[2] https://quora.org/linux/ipv4_mtu/vmlinux

-- [3]

BUG: KASAN: use-after-free in ipv4_mtu+0x25d/0x320 at addr ffff8803dd32e3cc
Read of size 4 by task https/2070
CPU: 3 PID: 2070 Comm: https Not tainted 4.9.13-debug+ #8
Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
 ffff88040e897690 ffffffff944db781 ffff88042740f740 ffff8803dd32e3c8
 ffff88040e8976b8 ffffffff93e4d9e1 ffff88040e897750 ffff8803dd32e3c0
 ffff88042740f740 ffff88040e897740 ffffffff93e4dc7a ffffffff93e471f2
Call Trace:
 [<ffffffff944db781>] dump_stack+0x85/0xc4
 [<ffffffff93e4d9e1>] kasan_object_err+0x21/0x70
 [<ffffffff93e4dc7a>] kasan_report_error+0x1fa/0x500
 [<ffffffff93e471f2>] ? __slab_alloc.isra.60+0x72/0xa0
 [<ffffffff93e4e0c1>] __asan_report_load4_noabort+0x61/0x70
 [<ffffffff9505d5cd>] ? ipv4_mtu+0x25d/0x320
 [<ffffffff9505d5cd>] ipv4_mtu+0x25d/0x320
 [<ffffffff950fca39>] tcp_current_mss+0x149/0x2f0
 [<ffffffff950fc8f0>] ? tcp_mtu_to_mss+0x360/0x360
 [<ffffffff93e4d0c1>] ? memset+0x31/0x40
 [<ffffffff94f40406>] ? __alloc_skb+0x316/0x5a0
 [<ffffffff94f400f0>] ? skb_checksum+0xa0/0xa0
 [<ffffffff93a8d0c8>] ? mark_held_locks+0xc8/0x120
 [<ffffffff9398e57f>] ? __local_bh_enable_ip+0x6f/0xd0
 [<ffffffff9510f2db>] tcp_send_fin+0x15b/0xbc0
 [<ffffffff950c43e1>] tcp_close+0xc41/0xf50
 [<ffffffff93ad9a87>] ? debug_lockdep_rcu_enabled+0x77/0x90
 [<ffffffff93f88a40>] ? __fsnotify_update_child_dentry_flags.part.1+0x270/0x270
 [<ffffffff9518003e>] inet_release+0xde/0x1c0
 [<ffffffff94f19718>] sock_release+0x88/0x1a0
 [<ffffffff94f19842>] sock_close+0x12/0x20
 [<ffffffff93eb7f51>] __fput+0x261/0x740
 [<ffffffff93eb849e>] ____fput+0xe/0x10
 [<ffffffff939e076e>] task_work_run+0xde/0x150
 [<ffffffff93989617>] do_exit+0x727/0x2cf0
 [<ffffffff93a8dca0>] ? debug_check_no_locks_freed+0x290/0x290
 [<ffffffff93988ef0>] ? mm_update_next_owner+0x720/0x720
 [<ffffffff93ad9a87>] ? debug_lockdep_rcu_enabled+0x77/0x90
 [<ffffffff939af064>] ? get_signal+0x1b4/0x1340
 [<ffffffff9398bd39>] do_group_exit+0xf9/0x300
 [<ffffffff939af345>] get_signal+0x495/0x1340
 [<ffffffff93ef3400>] ? do_select+0x1380/0x1380
 [<ffffffff9388fc73>] do_signal+0x93/0x1920
 [<ffffffff93ee4ed1>] ? putname+0xc1/0xf0
 [<ffffffff93ef16b0>] ? set_fd_set+0x70/0x70
 [<ffffffff9388fbe0>] ? setup_sigcontext+0x7e0/0x7e0
 [<ffffffff93ef3bef>] ? SyS_select+0x15f/0x1c0
 [<ffffffff93ef3a90>] ? core_sys_select+0x690/0x690
 [<ffffffff9380497a>] ? exit_to_usermode_loop+0x8a/0x130
 [<ffffffff938049ab>] exit_to_usermode_loop+0xbb/0x130
 [<ffffffff938075cc>] syscall_return_slowpath+0x16c/0x1a0
 [<ffffffff953d84e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6
Object at ffff8803dd32e3c8, in cache kmalloc-64 size: 64
Allocated:
PID = 1598
 save_stack_trace+0x1b/0x20
 save_stack+0x46/0xd0
 kasan_kmalloc+0xad/0xe0
 kmem_cache_alloc_trace+0x136/0x340
 fib_create_info+0x912/0x41f0
 fib_table_insert+0x18f/0x1ac0
 inet_rtm_newroute+0xfb/0x180
 rtnetlink_rcv_msg+0x249/0x6a0
 netlink_rcv_skb+0x247/0x350
 rtnetlink_rcv+0x2a/0x40
 netlink_unicast+0x481/0x690
 netlink_sendmsg+0x8c1/0xb70
 sock_sendmsg+0xba/0xf0
 ___sys_sendmsg+0x6c3/0x890
 __sys_sendmsg+0xd5/0x170
 SyS_sendmsg+0x12/0x20
 entry_SYSCALL_64_fastpath+0x23/0xc6
Freed:
PID = 1869
 save_stack_trace+0x1b/0x20
 save_stack+0x46/0xd0
 kasan_slab_free+0x71/0xb0
 kfree+0xe8/0x2e0
 free_fib_info_rcu+0x39a/0x490
 rcu_process_callbacks+0x9d2/0x1220
 __do_softirq+0x286/0x87d
Memory state around the buggy address:
 ffff8803dd32e280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8803dd32e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8803dd32e380: fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
                       ^
 ffff8803dd32e400: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8803dd32e480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

-- [4]

(gdb) list *(ipv4_mtu+0x25d)
0xffffffff8285d5cd is in ipv4_mtu (./include/net/dst.h:176).
171    static inline u32
172    dst_metric_raw(const struct dst_entry *dst, const int metric)
173    {
174        u32 *p = DST_METRICS_PTR(dst);
175
176        return p[metric-1];
177    }
178
179    static inline u32
180    dst_metric(const struct dst_entry *dst, const int metric)
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-02 12:42 [4.9.13] use after free in ipv4_mtu Daniel J Blueman
@ 2017-03-02 13:08 ` Eric Dumazet
  2017-03-02 13:28   ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2017-03-02 13:08 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Netdev, David Miller, Stephen Hemminger

On Thu, 2017-03-02 at 20:42 +0800, Daniel J Blueman wrote:
> With debugging enabled [1,2], KASAN finds a use-after-free in ipv4_mtu
> [3,4] with activity over a brcmfmac wireless card.
> 
> Let me know for further testing/debugging.
> 
> Thanks!
>   Dan
> 
> [1] https://quora.org/linux/ipv4_mtu/config
> [2] https://quora.org/linux/ipv4_mtu/vmlinux
> 
> -- [3]
> 
> BUG: KASAN: use-after-free in ipv4_mtu+0x25d/0x320 at addr ffff8803dd32e3cc
> Read of size 4 by task https/2070
> CPU: 3 PID: 2070 Comm: https Not tainted 4.9.13-debug+ #8
> Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS 1.2.19 12/22/2016
>  ffff88040e897690 ffffffff944db781 ffff88042740f740 ffff8803dd32e3c8
>  ffff88040e8976b8 ffffffff93e4d9e1 ffff88040e897750 ffff8803dd32e3c0
>  ffff88042740f740 ffff88040e897740 ffffffff93e4dc7a ffffffff93e471f2
> Call Trace:
>  [<ffffffff944db781>] dump_stack+0x85/0xc4
>  [<ffffffff93e4d9e1>] kasan_object_err+0x21/0x70
>  [<ffffffff93e4dc7a>] kasan_report_error+0x1fa/0x500
>  [<ffffffff93e471f2>] ? __slab_alloc.isra.60+0x72/0xa0
>  [<ffffffff93e4e0c1>] __asan_report_load4_noabort+0x61/0x70
>  [<ffffffff9505d5cd>] ? ipv4_mtu+0x25d/0x320
>  [<ffffffff9505d5cd>] ipv4_mtu+0x25d/0x320
>  [<ffffffff950fca39>] tcp_current_mss+0x149/0x2f0
>  [<ffffffff950fc8f0>] ? tcp_mtu_to_mss+0x360/0x360
>  [<ffffffff93e4d0c1>] ? memset+0x31/0x40
>  [<ffffffff94f40406>] ? __alloc_skb+0x316/0x5a0
>  [<ffffffff94f400f0>] ? skb_checksum+0xa0/0xa0
>  [<ffffffff93a8d0c8>] ? mark_held_locks+0xc8/0x120
>  [<ffffffff9398e57f>] ? __local_bh_enable_ip+0x6f/0xd0
>  [<ffffffff9510f2db>] tcp_send_fin+0x15b/0xbc0
>  [<ffffffff950c43e1>] tcp_close+0xc41/0xf50
>  [<ffffffff93ad9a87>] ? debug_lockdep_rcu_enabled+0x77/0x90
>  [<ffffffff93f88a40>] ? __fsnotify_update_child_dentry_flags.part.1+0x270/0x270
>  [<ffffffff9518003e>] inet_release+0xde/0x1c0
>  [<ffffffff94f19718>] sock_release+0x88/0x1a0
>  [<ffffffff94f19842>] sock_close+0x12/0x20
>  [<ffffffff93eb7f51>] __fput+0x261/0x740
>  [<ffffffff93eb849e>] ____fput+0xe/0x10
>  [<ffffffff939e076e>] task_work_run+0xde/0x150
>  [<ffffffff93989617>] do_exit+0x727/0x2cf0
>  [<ffffffff93a8dca0>] ? debug_check_no_locks_freed+0x290/0x290
>  [<ffffffff93988ef0>] ? mm_update_next_owner+0x720/0x720
>  [<ffffffff93ad9a87>] ? debug_lockdep_rcu_enabled+0x77/0x90
>  [<ffffffff939af064>] ? get_signal+0x1b4/0x1340
>  [<ffffffff9398bd39>] do_group_exit+0xf9/0x300
>  [<ffffffff939af345>] get_signal+0x495/0x1340
>  [<ffffffff93ef3400>] ? do_select+0x1380/0x1380
>  [<ffffffff9388fc73>] do_signal+0x93/0x1920
>  [<ffffffff93ee4ed1>] ? putname+0xc1/0xf0
>  [<ffffffff93ef16b0>] ? set_fd_set+0x70/0x70
>  [<ffffffff9388fbe0>] ? setup_sigcontext+0x7e0/0x7e0
>  [<ffffffff93ef3bef>] ? SyS_select+0x15f/0x1c0
>  [<ffffffff93ef3a90>] ? core_sys_select+0x690/0x690
>  [<ffffffff9380497a>] ? exit_to_usermode_loop+0x8a/0x130
>  [<ffffffff938049ab>] exit_to_usermode_loop+0xbb/0x130
>  [<ffffffff938075cc>] syscall_return_slowpath+0x16c/0x1a0
>  [<ffffffff953d84e6>] entry_SYSCALL_64_fastpath+0xc4/0xc6
> Object at ffff8803dd32e3c8, in cache kmalloc-64 size: 64
> Allocated:
> PID = 1598
>  save_stack_trace+0x1b/0x20
>  save_stack+0x46/0xd0
>  kasan_kmalloc+0xad/0xe0
>  kmem_cache_alloc_trace+0x136/0x340
>  fib_create_info+0x912/0x41f0
>  fib_table_insert+0x18f/0x1ac0
>  inet_rtm_newroute+0xfb/0x180
>  rtnetlink_rcv_msg+0x249/0x6a0
>  netlink_rcv_skb+0x247/0x350
>  rtnetlink_rcv+0x2a/0x40
>  netlink_unicast+0x481/0x690
>  netlink_sendmsg+0x8c1/0xb70
>  sock_sendmsg+0xba/0xf0
>  ___sys_sendmsg+0x6c3/0x890
>  __sys_sendmsg+0xd5/0x170
>  SyS_sendmsg+0x12/0x20
>  entry_SYSCALL_64_fastpath+0x23/0xc6
> Freed:
> PID = 1869
>  save_stack_trace+0x1b/0x20
>  save_stack+0x46/0xd0
>  kasan_slab_free+0x71/0xb0
>  kfree+0xe8/0x2e0
>  free_fib_info_rcu+0x39a/0x490
>  rcu_process_callbacks+0x9d2/0x1220
>  __do_softirq+0x286/0x87d
> Memory state around the buggy address:
>  ffff8803dd32e280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff8803dd32e300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> >ffff8803dd32e380: fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb
>                        ^
>  ffff8803dd32e400: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>  ffff8803dd32e480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> 
> -- [4]
> 
> (gdb) list *(ipv4_mtu+0x25d)
> 0xffffffff8285d5cd is in ipv4_mtu (./include/net/dst.h:176).
> 171    static inline u32
> 172    dst_metric_raw(const struct dst_entry *dst, const int metric)
> 173    {
> 174        u32 *p = DST_METRICS_PTR(dst);
> 175
> 176        return p[metric-1];
> 177    }
> 178
> 179    static inline u32
> 180    dst_metric(const struct dst_entry *dst, const int metric)

Thanks for the report !

This patch should solve this precise issue, but we need more work.

We need to audit all __sk_dst_get() and make sure they are inside an
rcu_read_lock()/rcu_read_unlock() section.

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 22548b5f05cbe5a655e0c53df2d31c5cc2e8a702..517963e1cb6eb9d70fcd71f44262813c3378759f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1459,7 +1459,7 @@ EXPORT_SYMBOL(tcp_sync_mss);
 unsigned int tcp_current_mss(struct sock *sk)
 {
 	const struct tcp_sock *tp = tcp_sk(sk);
-	const struct dst_entry *dst = __sk_dst_get(sk);
+	const struct dst_entry *dst;
 	u32 mss_now;
 	unsigned int header_len;
 	struct tcp_out_options opts;
@@ -1467,11 +1467,14 @@ unsigned int tcp_current_mss(struct sock *sk)
 
 	mss_now = tp->mss_cache;
 
+	rcu_read_lock();
+	dst = __sk_dst_get(sk);
 	if (dst) {
 		u32 mtu = dst_mtu(dst);
 		if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
 			mss_now = tcp_sync_mss(sk, mtu);
 	}
+	rcu_read_unlock();
 
 	header_len = tcp_established_options(sk, NULL, &opts, &md5) +
 		     sizeof(struct tcphdr);

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-02 13:08 ` Eric Dumazet
@ 2017-03-02 13:28   ` Eric Dumazet
  2017-03-06  6:33     ` Daniel J Blueman
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2017-03-02 13:28 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Netdev, David Miller, Stephen Hemminger

On Thu, 2017-03-02 at 05:08 -0800, Eric Dumazet wrote:

> Thanks for the report !
> 
> This patch should solve this precise issue, but we need more work.
> 
> We need to audit all __sk_dst_get() and make sure they are inside an
> rcu_read_lock()/rcu_read_unlock() section.
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 22548b5f05cbe5a655e0c53df2d31c5cc2e8a702..517963e1cb6eb9d70fcd71f44262813c3378759f 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -1459,7 +1459,7 @@ EXPORT_SYMBOL(tcp_sync_mss);
>  unsigned int tcp_current_mss(struct sock *sk)
>  {
>  	const struct tcp_sock *tp = tcp_sk(sk);
> -	const struct dst_entry *dst = __sk_dst_get(sk);
> +	const struct dst_entry *dst;
>  	u32 mss_now;
>  	unsigned int header_len;
>  	struct tcp_out_options opts;
> @@ -1467,11 +1467,14 @@ unsigned int tcp_current_mss(struct sock *sk)
>  
>  	mss_now = tp->mss_cache;
>  
> +	rcu_read_lock();
> +	dst = __sk_dst_get(sk);
>  	if (dst) {
>  		u32 mtu = dst_mtu(dst);
>  		if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
>  			mss_now = tcp_sync_mss(sk, mtu);
>  	}
> +	rcu_read_unlock();
>  
>  	header_len = tcp_established_options(sk, NULL, &opts, &md5) +
>  		     sizeof(struct tcphdr);
> 

Normally TCP sockets sk_dst_cache can only be changed if the thread
doing the change owns the socket.

I have not yet understood which point was breaking the rule yet.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-02 13:28   ` Eric Dumazet
@ 2017-03-06  6:33     ` Daniel J Blueman
  2017-03-06 13:45       ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel J Blueman @ 2017-03-06  6:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, David Miller, Stephen Hemminger

On 2 March 2017 at 21:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-03-02 at 05:08 -0800, Eric Dumazet wrote:
>
>> Thanks for the report !
>>
>> This patch should solve this precise issue, but we need more work.
>>
>> We need to audit all __sk_dst_get() and make sure they are inside an
>> rcu_read_lock()/rcu_read_unlock() section.
>>
>> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> index 22548b5f05cbe5a655e0c53df2d31c5cc2e8a702..517963e1cb6eb9d70fcd71f44262813c3378759f 100644
>> --- a/net/ipv4/tcp_output.c
>> +++ b/net/ipv4/tcp_output.c
>> @@ -1459,7 +1459,7 @@ EXPORT_SYMBOL(tcp_sync_mss);
>>  unsigned int tcp_current_mss(struct sock *sk)
>>  {
>>       const struct tcp_sock *tp = tcp_sk(sk);
>> -     const struct dst_entry *dst = __sk_dst_get(sk);
>> +     const struct dst_entry *dst;
>>       u32 mss_now;
>>       unsigned int header_len;
>>       struct tcp_out_options opts;
>> @@ -1467,11 +1467,14 @@ unsigned int tcp_current_mss(struct sock *sk)
>>
>>       mss_now = tp->mss_cache;
>>
>> +     rcu_read_lock();
>> +     dst = __sk_dst_get(sk);
>>       if (dst) {
>>               u32 mtu = dst_mtu(dst);
>>               if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
>>                       mss_now = tcp_sync_mss(sk, mtu);
>>       }
>> +     rcu_read_unlock();
>>
>>       header_len = tcp_established_options(sk, NULL, &opts, &md5) +
>>                    sizeof(struct tcphdr);
>
> Normally TCP sockets sk_dst_cache can only be changed if the thread
> doing the change owns the socket.
>
> I have not yet understood which point was breaking the rule yet.

Great work Eric! I have been unable to reproduce the KASAN warning
with this patch.

Reported-by: Daniel J Blueman <daniel@quora.org>
Tested-by: Daniel J Blueman <daniel@quora.org>

I do change the network queueing discipline and related at runtime [1]
which may be triggering this, though I did think I saw the KASAN
report only after resuming from suspend. rf(un)kill and other tweaking
may have been involved too.

Thanks,
  Dan

[1] /etc/sysctl.d/90-tcp.conf

net.core.default_qdisc = fq_codel
net.ipv4.tcp_congestion_control = bbr
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_ecn = 1
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-06  6:33     ` Daniel J Blueman
@ 2017-03-06 13:45       ` Eric Dumazet
  2017-03-06 16:20         ` Eric Dumazet
  2017-05-04  1:30         ` Daniel J Blueman
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2017-03-06 13:45 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Netdev, David Miller, Stephen Hemminger

On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:
> On 2 March 2017 at 21:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Thu, 2017-03-02 at 05:08 -0800, Eric Dumazet wrote:
> >
> >> Thanks for the report !
> >>
> >> This patch should solve this precise issue, but we need more work.
> >>
> >> We need to audit all __sk_dst_get() and make sure they are inside an
> >> rcu_read_lock()/rcu_read_unlock() section.
> >>
> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> >> index 22548b5f05cbe5a655e0c53df2d31c5cc2e8a702..517963e1cb6eb9d70fcd71f44262813c3378759f 100644
> >> --- a/net/ipv4/tcp_output.c
> >> +++ b/net/ipv4/tcp_output.c
> >> @@ -1459,7 +1459,7 @@ EXPORT_SYMBOL(tcp_sync_mss);
> >>  unsigned int tcp_current_mss(struct sock *sk)
> >>  {
> >>       const struct tcp_sock *tp = tcp_sk(sk);
> >> -     const struct dst_entry *dst = __sk_dst_get(sk);
> >> +     const struct dst_entry *dst;
> >>       u32 mss_now;
> >>       unsigned int header_len;
> >>       struct tcp_out_options opts;
> >> @@ -1467,11 +1467,14 @@ unsigned int tcp_current_mss(struct sock *sk)
> >>
> >>       mss_now = tp->mss_cache;
> >>
> >> +     rcu_read_lock();
> >> +     dst = __sk_dst_get(sk);
> >>       if (dst) {
> >>               u32 mtu = dst_mtu(dst);
> >>               if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
> >>                       mss_now = tcp_sync_mss(sk, mtu);
> >>       }
> >> +     rcu_read_unlock();
> >>
> >>       header_len = tcp_established_options(sk, NULL, &opts, &md5) +
> >>                    sizeof(struct tcphdr);
> >
> > Normally TCP sockets sk_dst_cache can only be changed if the thread
> > doing the change owns the socket.
> >
> > I have not yet understood which point was breaking the rule yet.
> 
> Great work Eric! I have been unable to reproduce the KASAN warning
> with this patch.
> 
> Reported-by: Daniel J Blueman <daniel@quora.org>
> Tested-by: Daniel J Blueman <daniel@quora.org>
> 
> I do change the network queueing discipline and related at runtime [1]
> which may be triggering this, though I did think I saw the KASAN
> report only after resuming from suspend. rf(un)kill and other tweaking
> may have been involved too.
> 
> Thanks,
>   Dan
> 
> [1] /etc/sysctl.d/90-tcp.conf
> 
> net.core.default_qdisc = fq_codel
> net.ipv4.tcp_congestion_control = bbr
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.tcp_ecn = 1

Thanks Daniel, but this bandaid patch should not be needed.

Somehow another point in the stack is at fault and needs to be
identified.

Otherwise we'll keep adding works around.

Since net-next is soon to be re-opened, I will submit patches adding
more lockdep assisted checks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-06 13:45       ` Eric Dumazet
@ 2017-03-06 16:20         ` Eric Dumazet
  2017-03-07 12:29           ` Daniel J Blueman
  2017-03-07 16:29           ` Stephen Hemminger
  2017-05-04  1:30         ` Daniel J Blueman
  1 sibling, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2017-03-06 16:20 UTC (permalink / raw)
  To: Daniel J Blueman; +Cc: Netdev, David Miller, Stephen Hemminger

On Mon, 2017-03-06 at 05:45 -0800, Eric Dumazet wrote:
> On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:

> > I do change the network queueing discipline and related at runtime [1]
> > which may be triggering this, though I did think I saw the KASAN
> > report only after resuming from suspend. rf(un)kill and other tweaking
> > may have been involved too.
> > 
> > Thanks,
> >   Dan
> > 
> > [1] /etc/sysctl.d/90-tcp.conf
> > 
> > net.core.default_qdisc = fq_codel
> > net.ipv4.tcp_congestion_control = bbr
> > net.ipv4.tcp_slow_start_after_idle = 0
> > net.ipv4.tcp_ecn = 1

BTW, fq_codel is not suitable for BBR.

Only fq contains the needed pacing for BBR.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-06 16:20         ` Eric Dumazet
@ 2017-03-07 12:29           ` Daniel J Blueman
  2017-03-07 16:29           ` Stephen Hemminger
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel J Blueman @ 2017-03-07 12:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, David Miller, Stephen Hemminger

On 7 March 2017 at 00:20, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2017-03-06 at 05:45 -0800, Eric Dumazet wrote:
>> On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:
>
>> > I do change the network queueing discipline and related at runtime [1]
>> > which may be triggering this, though I did think I saw the KASAN
>> > report only after resuming from suspend. rf(un)kill and other tweaking
>> > may have been involved too.
>> >
>> > Thanks,
>> >   Dan
>> >
>> > [1] /etc/sysctl.d/90-tcp.conf
>> >
>> > net.core.default_qdisc = fq_codel
>> > net.ipv4.tcp_congestion_control = bbr
>> > net.ipv4.tcp_slow_start_after_idle = 0
>> > net.ipv4.tcp_ecn = 1
>
> BTW, fq_codel is not suitable for BBR.
>
> Only fq contains the needed pacing for BBR.

Good tip, thanks Eric!
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-06 16:20         ` Eric Dumazet
  2017-03-07 12:29           ` Daniel J Blueman
@ 2017-03-07 16:29           ` Stephen Hemminger
  2017-03-07 16:56             ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2017-03-07 16:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Daniel J Blueman, Netdev, David Miller, Stephen Hemminger

On Mon, 06 Mar 2017 08:20:03 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Mon, 2017-03-06 at 05:45 -0800, Eric Dumazet wrote:
> > On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:  
> 
> > > I do change the network queueing discipline and related at runtime [1]
> > > which may be triggering this, though I did think I saw the KASAN
> > > report only after resuming from suspend. rf(un)kill and other tweaking
> > > may have been involved too.
> > > 
> > > Thanks,
> > >   Dan
> > > 
> > > [1] /etc/sysctl.d/90-tcp.conf
> > > 
> > > net.core.default_qdisc = fq_codel
> > > net.ipv4.tcp_congestion_control = bbr
> > > net.ipv4.tcp_slow_start_after_idle = 0
> > > net.ipv4.tcp_ecn = 1  
> 
> BTW, fq_codel is not suitable for BBR.
> 
> Only fq contains the needed pacing for BBR.

What about something like this???

diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index b89bce4c721e..647be997a9c5 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -801,6 +801,9 @@ static void bbr_init(struct sock *sk)
        struct bbr *bbr = inet_csk_ca(sk);
        u64 bw;
 
+       WARN_ONCE(strcmp(default_qdisc_ops->id, "fq"),
+                 "TCP BBR should only be used with FQ qdisc\n");
+
        bbr->prior_cwnd = 0;
        bbr->tso_segs_goal = 0;  /* default segs per skb until first ACK */
        bbr->rtt_cnt = 0;

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-07 16:29           ` Stephen Hemminger
@ 2017-03-07 16:56             ` Eric Dumazet
  0 siblings, 0 replies; 10+ messages in thread
From: Eric Dumazet @ 2017-03-07 16:56 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Daniel J Blueman, Netdev, David Miller, Stephen Hemminger

On Tue, 2017-03-07 at 08:29 -0800, Stephen Hemminger wrote:

> +       WARN_ONCE(strcmp(default_qdisc_ops->id, "fq"),
> +                 "TCP BBR should only be used with FQ qdisc\n");
> +
>  

Why would that be needed, especially for people that properly setup
their qdisc ? Maybe they do not want to force fq on all devices like tun
or taps ;)

Also we intend to provide a fallback in case FQ is not in the qdisc
chain : Some routers admins really want fq_codel (or whatever qdisc)

TCP will handle the pacing itself, at a small cost for the ones that
hate fq.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [4.9.13] use after free in ipv4_mtu
  2017-03-06 13:45       ` Eric Dumazet
  2017-03-06 16:20         ` Eric Dumazet
@ 2017-05-04  1:30         ` Daniel J Blueman
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel J Blueman @ 2017-05-04  1:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Netdev, David Miller, Stephen Hemminger

On 6 March 2017 at 21:45, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2017-03-06 at 14:33 +0800, Daniel J Blueman wrote:
>> On 2 March 2017 at 21:28, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2017-03-02 at 05:08 -0800, Eric Dumazet wrote:
>> >
>> >> Thanks for the report !
>> >>
>> >> This patch should solve this precise issue, but we need more work.
>> >>
>> >> We need to audit all __sk_dst_get() and make sure they are inside an
>> >> rcu_read_lock()/rcu_read_unlock() section.
>> >>
>> >> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
>> >> index 22548b5f05cbe5a655e0c53df2d31c5cc2e8a702..517963e1cb6eb9d70fcd71f44262813c3378759f 100644
>> >> --- a/net/ipv4/tcp_output.c
>> >> +++ b/net/ipv4/tcp_output.c
>> >> @@ -1459,7 +1459,7 @@ EXPORT_SYMBOL(tcp_sync_mss);
>> >>  unsigned int tcp_current_mss(struct sock *sk)
>> >>  {
>> >>       const struct tcp_sock *tp = tcp_sk(sk);
>> >> -     const struct dst_entry *dst = __sk_dst_get(sk);
>> >> +     const struct dst_entry *dst;
>> >>       u32 mss_now;
>> >>       unsigned int header_len;
>> >>       struct tcp_out_options opts;
>> >> @@ -1467,11 +1467,14 @@ unsigned int tcp_current_mss(struct sock *sk)
>> >>
>> >>       mss_now = tp->mss_cache;
>> >>
>> >> +     rcu_read_lock();
>> >> +     dst = __sk_dst_get(sk);
>> >>       if (dst) {
>> >>               u32 mtu = dst_mtu(dst);
>> >>               if (mtu != inet_csk(sk)->icsk_pmtu_cookie)
>> >>                       mss_now = tcp_sync_mss(sk, mtu);
>> >>       }
>> >> +     rcu_read_unlock();
>> >>
>> >>       header_len = tcp_established_options(sk, NULL, &opts, &md5) +
>> >>                    sizeof(struct tcphdr);
>> >
>> > Normally TCP sockets sk_dst_cache can only be changed if the thread
>> > doing the change owns the socket.
>> >
>> > I have not yet understood which point was breaking the rule yet.
>>
>> Great work Eric! I have been unable to reproduce the KASAN warning
>> with this patch.
>>
>> Reported-by: Daniel J Blueman <daniel@quora.org>
>> Tested-by: Daniel J Blueman <daniel@quora.org>
>>
>> I do change the network queueing discipline and related at runtime [1]
>> which may be triggering this, though I did think I saw the KASAN
>> report only after resuming from suspend. rf(un)kill and other tweaking
>> may have been involved too.
>>
>> Thanks,
>>   Dan
>
> Thanks Daniel, but this bandaid patch should not be needed.
>
> Somehow another point in the stack is at fault and needs to be
> identified.
>
> Otherwise we'll keep adding works around.
>
> Since net-next is soon to be re-opened, I will submit patches adding
> more lockdep assisted checks.

I am still seeing the same KASAN issue on 4.9.25; let me know if any
debug or testing would help. Thanks Eric!

https://quora.org/linux/ipv4_mtu/vmlinux
https://quora.org/linux/ipv4_mtu/config

[19515.180104] BUG: KASAN: use-after-free in ipv4_mtu+0x25d/0x320 at
addr ffff88040d6fecc4
[19515.180108] Read of size 4 by task Socket Thread/3299
[19515.180113] CPU: 7 PID: 3299 Comm: Socket Thread Tainted: G    BU
       4.9.25-debug+ #5
[19515.180115] Hardware name: Dell Inc. XPS 15 9550/0N7TVV, BIOS
1.2.21 02/17/2017
[19515.180127]  ffff8803f55a77d0 ffffffff9ce404b1 ffff88042d803800
ffff88040d6fecc0
[19515.180133]  ffff8803f55a77f8 ffffffff9c7e2751 ffff8803f55a7890
ffff88040d6fecc0
[19515.180137]  ffff88042d803800 ffff8803f55a7880 ffffffff9c7e29ea
ffff8803f55a7828
[19515.180140] Call Trace:
[19515.180148]  [<ffffffff9ce404b1>] dump_stack+0x63/0x82
[19515.180154]  [<ffffffff9c7e2751>] kasan_object_err+0x21/0x70
[19515.180158]  [<ffffffff9c7e29ea>] kasan_report_error+0x1fa/0x500
[19515.180162]  [<ffffffff9dced48d>] ? schedule+0x8d/0x1a0
[19515.180168]  [<ffffffff9c4f561e>] ? get_futex_key+0x64e/0xbb0
[19515.180174]  [<ffffffff9c40a1ab>] ? update_cfs_rq_load_avg+0x89b/0x1520
[19515.180177]  [<ffffffff9c7e2e31>] __asan_report_load4_noabort+0x61/0x70
[19515.180180]  [<ffffffff9d9ace2d>] ? ipv4_mtu+0x25d/0x320
[19515.180182]  [<ffffffff9d9ace2d>] ipv4_mtu+0x25d/0x320
[19515.180187]  [<ffffffff9da44693>] tcp_current_mss+0x133/0x270
[19515.180190]  [<ffffffff9da44560>] ? tcp_mtu_to_mss+0x280/0x280
[19515.180193]  [<ffffffff9c40eeb9>] ? select_idle_sibling+0x29/0xaa0
[19515.180195]  [<ffffffff9c4f6284>] ? futex_wait_setup+0x1d4/0x300
[19515.180198]  [<ffffffff9c424f81>] ? enqueue_task_fair+0x261/0x2760
[19515.180201]  [<ffffffff9d9f99e4>] tcp_send_mss+0x24/0x2b0
[19515.180203]  [<ffffffff9da09809>] tcp_sendmsg+0x3d9/0x3530
[19515.180208]  [<ffffffff9c3f018d>] ? ttwu_do_wakeup+0x1d/0x2c0
[19515.180211]  [<ffffffff9c3f0539>] ? ttwu_do_activate+0x109/0x180
[19515.180213]  [<ffffffff9da09430>] ? tcp_sendpage+0x1cc0/0x1cc0
[19515.180216]  [<ffffffff9c3f3a27>] ? wake_up_q+0x87/0xe0
[19515.180219]  [<ffffffff9c4fb707>] ? do_futex+0xf87/0x1730
[19515.180222]  [<ffffffff9dac9d90>] ? inet_gso_segment+0x1340/0x1340
[19515.180224]  [<ffffffff9daca1b0>] ? inet_recvmsg+0x420/0x420
[19515.180226]  [<ffffffff9daca3c7>] inet_sendmsg+0x217/0x3c0
[19515.180228]  [<ffffffff9daca1b0>] ? inet_recvmsg+0x420/0x420
[19515.180232]  [<ffffffff9d87752a>] sock_sendmsg+0xba/0xf0
[19515.180234]  [<ffffffff9d8785ef>] SYSC_sendto+0x1df/0x330
[19515.180236]  [<ffffffff9d878410>] ? SYSC_connect+0x2d0/0x2d0
[19515.180243]  [<ffffffff9d872794>] ? sock_ioctl+0x284/0x3a0
[19515.180248]  [<ffffffff9c87e3ef>] ? do_vfs_ioctl+0x1af/0xf60
[19515.180250]  [<ffffffff9c87e240>] ? ioctl_preallocate+0x1e0/0x1e0
[19515.180253]  [<ffffffff9c4fbfa4>] ? SyS_futex+0xf4/0x2a0
[19515.180259]  [<ffffffff9c8436b4>] ? vfs_read+0x254/0x2f0
[19515.180261]  [<ffffffff9c4fbeb0>] ? do_futex+0x1730/0x1730
[19515.180265]  [<ffffffff9c84705c>] ? SyS_read+0x11c/0x1c0
[19515.180267]  [<ffffffff9d87a31e>] SyS_sendto+0xe/0x10
[19515.180271]  [<ffffffff9dcf897b>] entry_SYSCALL_64_fastpath+0x1e/0xad
[19515.180274] Object at ffff88040d6fecc0, in cache kmalloc-64 size: 64
[19515.180277] Allocated:
[19515.180279] PID = 2326
[19515.180287]  save_stack_trace+0x1b/0x20
[19515.180291]  save_stack+0x46/0xd0
[19515.180293]  kasan_kmalloc+0xad/0xe0
[19515.180295]  kmem_cache_alloc_trace+0xe8/0x1e0
[19515.180298]  fib_create_info+0x917/0x3fc0
[19515.180300]  fib_table_insert+0x1d4/0x1c90
[19515.180302]  inet_rtm_newroute+0xfb/0x180
[19515.180306]  rtnetlink_rcv_msg+0x249/0x6a0
[19515.180312]  netlink_rcv_skb+0x247/0x350
[19515.180314]  rtnetlink_rcv+0x28/0x30
[19515.180316]  netlink_unicast+0x419/0x5c0
[19515.180318]  netlink_sendmsg+0x8a7/0xb60
[19515.180319]  sock_sendmsg+0xba/0xf0
[19515.180321]  ___sys_sendmsg+0x697/0x860
[19515.180323]  __sys_sendmsg+0xd5/0x170
[19515.180324]  SyS_sendmsg+0x12/0x20
[19515.180327]  entry_SYSCALL_64_fastpath+0x1e/0xad
[19515.180327] Freed:
[19515.180329] PID = 0
[19515.180333]  save_stack_trace+0x1b/0x20
[19515.180340]  save_stack+0x46/0xd0
[19515.180345]  kasan_slab_free+0x71/0xb0
[19515.180347]  kfree+0x8c/0x1a0
[19515.180353]  free_fib_info_rcu+0x558/0x760
[19515.180358]  rcu_process_callbacks+0x968/0x1000
[19515.180361]  __do_softirq+0x1a9/0x538
[19515.180361] Memory state around the buggy address:
[19515.180367]  ffff88040d6feb80: fc fc fc fc 00 00 00 00 00 00 00 00
fc fc fc fc
[19515.180369]  ffff88040d6fec00: fb fb fb fb fb fb fb fb fc fc fc fc
fb fb fb fb
[19515.180371] >ffff88040d6fec80: fb fb fb fb fc fc fc fc fb fb fb fb
fb fb fb fb
[19515.180373]                                            ^
[19515.180375]  ffff88040d6fed00: fc fc fc fc fb fb fb fb fb fb fb fb
fc fc fc fc
[19515.180378]  ffff88040d6fed80: 00 00 00 00 00 00 00 00 fc fc fc fc
fb fb fb fb
-- 
Daniel J Blueman

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-05-04  1:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-02 12:42 [4.9.13] use after free in ipv4_mtu Daniel J Blueman
2017-03-02 13:08 ` Eric Dumazet
2017-03-02 13:28   ` Eric Dumazet
2017-03-06  6:33     ` Daniel J Blueman
2017-03-06 13:45       ` Eric Dumazet
2017-03-06 16:20         ` Eric Dumazet
2017-03-07 12:29           ` Daniel J Blueman
2017-03-07 16:29           ` Stephen Hemminger
2017-03-07 16:56             ` Eric Dumazet
2017-05-04  1:30         ` Daniel J Blueman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.