linux-kernel.vger.kernel.org archive mirror
* net: use-after-free in neigh_timer_handler/sock_wfree
@ 2017-03-01 19:27 Dmitry Vyukov
  2017-03-01 21:24 ` Cong Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Vyukov @ 2017-03-01 19:27 UTC (permalink / raw)
  To: David Miller, netdev, LKML, Cong Wang, Eric Dumazet
  Cc: syzkaller, Alexey Kuznetsov, James Morris

Hello,

I am seeing the following use-after-free report while running the
syzkaller fuzzer on
linux-next/3e7350242c6f3d41d28e03418bd781cc1b7bad5f:

==================================================================
BUG: KASAN: use-after-free in constant_test_bit
arch/x86/include/asm/bitops.h:324 [inline] at addr ffff8801c56d5460
BUG: KASAN: use-after-free in sock_flag include/net/sock.h:789
[inline] at addr ffff8801c56d5460
BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
net/core/sock.c:1630 at addr ffff8801c56d5460
Read of size 8 by task syz-fuzzer/3261
CPU: 0 PID: 3261 Comm: syz-fuzzer Not tainted 4.10.0-next-20170224+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
 constant_test_bit arch/x86/include/asm/bitops.h:324 [inline]
 sock_flag include/net/sock.h:789 [inline]
 sock_wfree+0x118/0x120 net/core/sock.c:1630
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:654
 skb_release_all+0x15/0x60 net/core/skbuff.c:667
 __kfree_skb+0x15/0x20 net/core/skbuff.c:683
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:704
 ndisc_error_report+0xbb/0x190 net/ipv6/ndisc.c:683
 neigh_invalidate+0x23e/0x570 net/core/neighbour.c:848
 neigh_timer_handler+0x4e7/0x1140 net/core/neighbour.c:933
 call_timer_fn+0x241/0x820 kernel/time/timer.c:1266
 expire_timers kernel/time/timer.c:1305 [inline]
 __run_timers+0x960/0xcf0 kernel/time/timer.c:1599
 run_timer_softirq+0x21/0x80 kernel/time/timer.c:1612
 __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364 [inline]
 irq_exit+0x1cc/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:658 [inline]
 smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
 apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
RIP: 0033:0x46a7c3
RSP: 002b:000000c83e2d5180 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
RAX: 0000000000000000 RBX: 000000000046a7b0 RCX: 000000c820471200
RDX: 0000000000000020 RSI: 000000c839e1bba0 RDI: 000000c83e2d5190
RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000073
R10: 000000c839a31b03 R11: 000000c839e1bbf8 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000010 R15: 0000000001263e90
 </IRQ>
Object at ffff8801c56d5400, in cache RAWv6 size: 1480
Allocated:
PID = 12540
 kmem_cache_alloc+0x102/0x680 mm/slab.c:3568
 sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1332
 sk_alloc+0x8c/0x470 net/core/sock.c:1394
 inet6_create+0x44d/0x1140 net/ipv6/af_inet6.c:183
 __sock_create+0x4e4/0x870 net/socket.c:1197
 sock_create net/socket.c:1237 [inline]
 SYSC_socket net/socket.c:1267 [inline]
 SyS_socket+0xf9/0x230 net/socket.c:1247
 entry_SYSCALL_64_fastpath+0x1f/0xc2
Freed:
PID = 12572
 kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:580
 __cache_free mm/slab.c:3510 [inline]
 kmem_cache_free+0x71/0x240 mm/slab.c:3770
 sk_prot_free net/core/sock.c:1375 [inline]
 __sk_destruct+0x487/0x6b0 net/core/sock.c:1450
 sk_destruct+0x47/0x80 net/core/sock.c:1458
 __sk_free+0x57/0x230 net/core/sock.c:1466
 sk_free+0x23/0x30 net/core/sock.c:1477
 sock_put include/net/sock.h:1644 [inline]
 sk_common_release+0x3bf/0x5e0 net/core/sock.c:2781
 rawv6_close+0x4c/0x80 net/ipv6/raw.c:1218
 inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
 sock_release+0x8d/0x1e0 net/socket.c:597
 sock_close+0x16/0x20 net/socket.c:1061
 __fput+0x332/0x7f0 fs/file_table.c:208
 ____fput+0x15/0x20 fs/file_table.c:244
 task_work_run+0x18a/0x260 kernel/task_work.c:116
 exit_task_work include/linux/task_work.h:21 [inline]
 do_exit+0x1956/0x2900 kernel/exit.c:873
 do_group_exit+0x149/0x420 kernel/exit.c:977
 get_signal+0x7e0/0x1820 kernel/signal.c:2313
 do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:807
 exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:156
 prepare_exit_to_usermode arch/x86/entry/common.c:190 [inline]
 syscall_return_slowpath+0x4d3/0x570 arch/x86/entry/common.c:259
 entry_SYSCALL_64_fastpath+0xc0/0xc2

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: net: use-after-free in neigh_timer_handler/sock_wfree
  2017-03-01 19:27 net: use-after-free in neigh_timer_handler/sock_wfree Dmitry Vyukov
@ 2017-03-01 21:24 ` Cong Wang
  2017-03-01 21:43   ` Cong Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-03-01 21:24 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, netdev, LKML, Eric Dumazet, syzkaller,
	Alexey Kuznetsov, James Morris

On Wed, Mar 1, 2017 at 11:27 AM, Dmitry Vyukov <dvyukov@google.com> wrote:
> Hello,
>
> I am seeing the following use-after-free report while running the
> syzkaller fuzzer on
> linux-next/3e7350242c6f3d41d28e03418bd781cc1b7bad5f:
>
> ==================================================================
> BUG: KASAN: use-after-free in constant_test_bit
> arch/x86/include/asm/bitops.h:324 [inline] at addr ffff8801c56d5460
> BUG: KASAN: use-after-free in sock_flag include/net/sock.h:789
> [inline] at addr ffff8801c56d5460
> BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
> net/core/sock.c:1630 at addr ffff8801c56d5460
> Read of size 8 by task syz-fuzzer/3261
> CPU: 0 PID: 3261 Comm: syz-fuzzer Not tainted 4.10.0-next-20170224+ #1
> Hardware name: Google Google Compute Engine/Google Compute Engine,
> BIOS Google 01/01/2011
> Call Trace:
>  <IRQ>
>  __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
>  constant_test_bit arch/x86/include/asm/bitops.h:324 [inline]
>  sock_flag include/net/sock.h:789 [inline]
>  sock_wfree+0x118/0x120 net/core/sock.c:1630
>  skb_release_head_state+0xfc/0x200 net/core/skbuff.c:654
>  skb_release_all+0x15/0x60 net/core/skbuff.c:667
>  __kfree_skb+0x15/0x20 net/core/skbuff.c:683
>  kfree_skb+0x16e/0x4c0 net/core/skbuff.c:704
>  ndisc_error_report+0xbb/0x190 net/ipv6/ndisc.c:683
>  neigh_invalidate+0x23e/0x570 net/core/neighbour.c:848
>  neigh_timer_handler+0x4e7/0x1140 net/core/neighbour.c:933
>  call_timer_fn+0x241/0x820 kernel/time/timer.c:1266
>  expire_timers kernel/time/timer.c:1305 [inline]
>  __run_timers+0x960/0xcf0 kernel/time/timer.c:1599
>  run_timer_softirq+0x21/0x80 kernel/time/timer.c:1612
>  __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
>  invoke_softirq kernel/softirq.c:364 [inline]
>  irq_exit+0x1cc/0x200 kernel/softirq.c:405
>  exiting_irq arch/x86/include/asm/apic.h:658 [inline]
>  smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
>  apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707

This one looks very similar to a previous one:
https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ

Both happen on raw v6 sockets.

To me, it seems the sk refcnt is not correct: the skb should still hold
a refcnt, so sk should not be freed before kfree_skb() in the timer
handler...

* Re: net: use-after-free in neigh_timer_handler/sock_wfree
  2017-03-01 21:24 ` Cong Wang
@ 2017-03-01 21:43   ` Cong Wang
  2017-03-01 21:54     ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-03-01 21:43 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, netdev, LKML, Eric Dumazet, syzkaller,
	Alexey Kuznetsov, James Morris

On Wed, Mar 1, 2017 at 1:24 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> This one looks very similar to a previous one:
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>
> Both happen on raw v6 sockets.
>
> To me, it seems the sk refcnt is not correct: the skb should still hold
> a refcnt, so sk should not be freed before kfree_skb() in the timer
> handler...

More precisely, after this commit:

commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Thu Jun 11 02:55:43 2009 -0700

    net: No more expensive sock_hold()/sock_put() on each tx

we no longer take the (old) refcnt on the TX path; sk_wmem_alloc
is the new refcnt. ;)
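A minimal user-space sketch of that accounting, under a simplified model: every name below (`model_sock`, `model_skb`, `model_*`) is hypothetical, atomics are omitted, and the `SOCK_USE_WRITE_QUEUE` branch of sock_wfree() is ignored. The socket starts with one unit of sk_wmem_alloc for itself, each owned skb adds its truesize, and whichever release drops the counter to zero frees the socket. A `freed` flag stands in for the real free so that later access can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of sk_wmem_alloc; not kernel code. */
struct model_sock {
	int wmem_alloc; /* stands in for sk->sk_wmem_alloc */
	bool freed;     /* set instead of really freeing the socket */
};

struct model_skb {
	struct model_sock *sk; /* owner, like skb->sk */
	int truesize;          /* amount charged to the owner */
};

/* Like sk_alloc(): the socket holds one unit of wmem_alloc itself. */
static void model_sk_alloc(struct model_sock *sk)
{
	sk->wmem_alloc = 1;
	sk->freed = false;
}

/* Like skb_set_owner_w(): charge truesize to the owning socket. */
static void model_set_owner_w(struct model_skb *skb, struct model_sock *sk,
			      int truesize)
{
	assert(!sk->freed);
	skb->sk = sk;
	skb->truesize = truesize;
	sk->wmem_alloc += truesize;
}

/* Like sock_wfree(): uncharge truesize; reaching zero frees the socket. */
static void model_sock_wfree(struct model_skb *skb)
{
	struct model_sock *sk = skb->sk;

	assert(!sk->freed); /* the kernel has no such check; KASAN fires here */
	sk->wmem_alloc -= skb->truesize;
	if (sk->wmem_alloc == 0)
		sk->freed = true;
	skb->sk = NULL;
}

/* Like sk_free() on close(): drop the socket's own unit. */
static void model_sk_free(struct model_sock *sk)
{
	assert(!sk->freed);
	if (--sk->wmem_alloc == 0)
		sk->freed = true;
}
```

In this model, closing the socket while an skb is still charged leaves the socket alive; it is freed only when that skb's destructor uncharges the last units.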

* Re: net: use-after-free in neigh_timer_handler/sock_wfree
  2017-03-01 21:43   ` Cong Wang
@ 2017-03-01 21:54     ` Eric Dumazet
  2017-03-01 23:09       ` Cong Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-03-01 21:54 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dmitry Vyukov, David Miller, netdev, LKML, syzkaller,
	Alexey Kuznetsov, James Morris

On Wed, Mar 1, 2017 at 1:43 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> This one looks very similar to a previous one:
>> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>>
>> Both happen on raw v6 sockets.
>>
>> To me, it seems the sk refcnt is not correct: the skb should still hold
>> a refcnt, so sk should not be freed before kfree_skb() in the timer
>> handler...
>
> More precisely, after this commit:
>
> commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
> Author: Eric Dumazet <eric.dumazet@gmail.com>
> Date:   Thu Jun 11 02:55:43 2009 -0700
>
>     net: No more expensive sock_hold()/sock_put() on each tx
>
> we no longer take the (old) refcnt on the TX path; sk_wmem_alloc
> is the new refcnt. ;)

So the bug is that skb->truesize is mangled by the reassembly unit,
while skb->sk relies on sk_wmem_alloc, charged with skb->truesize,
to decide when it is safe to free sk.
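To make that failure mode concrete, here is a sketch in a hypothetical user-space model (all `model_*` names are illustrative, and the `growth` parameter merely stands in for whatever the reassembly unit does to truesize). Two skbs are charged; one has its truesize enlarged after charging, so its release uncharges part of the other skb's share. close() then frees the socket while the second skb, for example one parked on a neighbour queue, still points at it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of sk_wmem_alloc; not kernel code. */
struct model_sock {
	int wmem_alloc; /* stands in for sk->sk_wmem_alloc, starts at 1 */
	bool freed;     /* set instead of really freeing the socket */
};

struct model_skb {
	struct model_sock *sk;
	int truesize;
};

/* Charge truesize to the owner, like skb_set_owner_w(). */
static void model_set_owner_w(struct model_skb *skb, struct model_sock *sk,
			      int truesize)
{
	assert(!sk->freed);
	skb->sk = sk;
	skb->truesize = truesize;
	sk->wmem_alloc += truesize;
}

/* Uncharge like sock_wfree(); returns true if the owner was already freed. */
static bool model_sock_wfree(struct model_skb *skb)
{
	struct model_sock *sk = skb->sk;

	skb->sk = NULL;
	if (sk->freed)
		return true; /* in the kernel, this access is the use-after-free */
	sk->wmem_alloc -= skb->truesize;
	if (sk->wmem_alloc == 0)
		sk->freed = true;
	return false;
}

/* close(): drop the socket's own unit of wmem_alloc. */
static void model_sk_free(struct model_sock *sk)
{
	assert(!sk->freed);
	if (--sk->wmem_alloc == 0)
		sk->freed = true;
}

/* Returns true if releasing the queued skb touches a freed socket. */
static bool model_mangled_truesize(int growth)
{
	struct model_sock sk = { .wmem_alloc = 1, .freed = false };
	struct model_skb a, b;

	model_set_owner_w(&a, &sk, 100); /* wmem_alloc = 101 */
	model_set_owner_w(&b, &sk, 100); /* wmem_alloc = 201, b stays queued */
	a.truesize += growth;            /* truesize changed after charging */
	(void)model_sock_wfree(&a);      /* uncharges 100 + growth */
	model_sk_free(&sk);              /* close() */
	return model_sock_wfree(&b);     /* e.g. the neigh timer drops b */
}
```

With `growth == 0` the accounting balances and the socket is freed by b's release; with `growth == 100` the release of a takes the counter down to the socket's own unit, close() frees the socket, and b's later release hits freed memory.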

This is why we need to call skb_orphan(), as we did for IPv4 in
commit 8282f27449bf15548.
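In the same hypothetical model (names illustrative, not kernel API), skb_orphan() amounts to running the destructor early, while truesize still matches what was charged, and clearing skb->sk. Any later truesize change then touches no socket accounting, which is why orphaning before reassembly avoids the premature free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of sk_wmem_alloc; not kernel code. */
struct model_sock {
	int wmem_alloc;
	bool freed;
};

struct model_skb {
	struct model_sock *sk;
	int truesize;
};

static void model_set_owner_w(struct model_skb *skb, struct model_sock *sk,
			      int truesize)
{
	assert(!sk->freed);
	skb->sk = sk;
	skb->truesize = truesize;
	sk->wmem_alloc += truesize;
}

/* Returns true if the owner was already freed (a use-after-free). */
static bool model_sock_wfree(struct model_skb *skb)
{
	struct model_sock *sk = skb->sk;

	skb->sk = NULL;
	if (sk->freed)
		return true;
	sk->wmem_alloc -= skb->truesize;
	if (sk->wmem_alloc == 0)
		sk->freed = true;
	return false;
}

/* Like kfree_skb(): only an owned skb runs the write-space destructor. */
static bool model_kfree_skb(struct model_skb *skb)
{
	return skb->sk ? model_sock_wfree(skb) : false;
}

/* Like skb_orphan(): uncharge now, using the truesize that was charged. */
static void model_skb_orphan(struct model_skb *skb)
{
	if (skb->sk)
		(void)model_sock_wfree(skb);
}

static void model_sk_free(struct model_sock *sk)
{
	assert(!sk->freed);
	if (--sk->wmem_alloc == 0)
		sk->freed = true;
}

/* Orphan a before its truesize changes; returns true on use-after-free. */
static bool model_orphan_then_mangle(struct model_sock *sk, int growth)
{
	struct model_skb a, b;

	sk->wmem_alloc = 1;
	sk->freed = false;
	model_set_owner_w(&a, sk, 100); /* wmem_alloc = 101 */
	model_set_owner_w(&b, sk, 100); /* wmem_alloc = 201 */
	model_skb_orphan(&a);           /* wmem_alloc = 101, a.sk = NULL */
	a.truesize += growth;           /* harmless: a has no owner now */
	(void)model_kfree_skb(&a);      /* nothing left to uncharge */
	model_sk_free(sk);              /* wmem_alloc = 100, socket survives */
	return model_kfree_skb(&b);     /* wmem_alloc = 0, freed here */
}
```

The price is that an orphaned skb no longer counts against the socket's send buffer, which is why the thread treats skb_orphan() as something to use only where skb->sk is not needed.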

* Re: net: use-after-free in neigh_timer_handler/sock_wfree
  2017-03-01 21:54     ` Eric Dumazet
@ 2017-03-01 23:09       ` Cong Wang
  2017-03-01 23:15         ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-03-01 23:09 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, David Miller, netdev, LKML, syzkaller,
	Alexey Kuznetsov, James Morris

On Wed, Mar 1, 2017 at 1:54 PM, Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Mar 1, 2017 at 1:43 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>
>>> This one looks very similar to a previous one:
>>> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>>>
>>> Both happen on raw v6 sockets.
>>>
>>> To me, it seems the sk refcnt is not correct: the skb should still hold
>>> a refcnt, so sk should not be freed before kfree_skb() in the timer
>>> handler...
>>
>> More precisely, after this commit:
>>
>> commit 2b85a34e911bf483c27cfdd124aeb1605145dc80
>> Author: Eric Dumazet <eric.dumazet@gmail.com>
>> Date:   Thu Jun 11 02:55:43 2009 -0700
>>
>>     net: No more expensive sock_hold()/sock_put() on each tx
>>
>> we no longer take the (old) refcnt on the TX path; sk_wmem_alloc
>> is the new refcnt. ;)
>
> So the bug is that skb->truesize is mangled by the reassembly unit,
> while skb->sk relies on sk_wmem_alloc, charged with skb->truesize,
> to decide when it is safe to free sk.

That is my suspicion as well: skb->truesize is updated somewhere
but sk->sk_wmem_alloc isn't, which leads to this bug.

>
> This is why we need to call skb_orphan(), as we did for IPv4 in
> commit 8282f27449bf15548.


But I doubt skb_orphan() is the solution here; shouldn't we just
update sk->sk_wmem_alloc along with skb->truesize changes?
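That alternative can be sketched in the same hypothetical user-space model: route every truesize change through one helper (`model_skb_set_truesize()`, an invented name, not a kernel function) that moves the owner's charge by the same delta, so the later uncharge stays exact. The catch is that every place in the stack that touches truesize would have to be audited to go through such a helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of sk_wmem_alloc; not kernel code. */
struct model_sock {
	int wmem_alloc;
	bool freed;
};

struct model_skb {
	struct model_sock *sk;
	int truesize;
};

static void model_set_owner_w(struct model_skb *skb, struct model_sock *sk,
			      int truesize)
{
	assert(!sk->freed);
	skb->sk = sk;
	skb->truesize = truesize;
	sk->wmem_alloc += truesize;
}

/* Returns true if the owner was already freed (a use-after-free). */
static bool model_sock_wfree(struct model_skb *skb)
{
	struct model_sock *sk = skb->sk;

	skb->sk = NULL;
	if (sk->freed)
		return true;
	sk->wmem_alloc -= skb->truesize;
	if (sk->wmem_alloc == 0)
		sk->freed = true;
	return false;
}

static void model_sk_free(struct model_sock *sk)
{
	assert(!sk->freed);
	if (--sk->wmem_alloc == 0)
		sk->freed = true;
}

/* Hypothetical helper: change truesize and keep the owner's charge in sync. */
static void model_skb_set_truesize(struct model_skb *skb, int new_truesize)
{
	if (skb->sk)
		skb->sk->wmem_alloc += new_truesize - skb->truesize;
	skb->truesize = new_truesize;
}

/* The buggy sequence again, but with the charge kept in sync. */
static bool model_synced_truesize(struct model_sock *sk, int growth)
{
	struct model_skb a, b;

	sk->wmem_alloc = 1;
	sk->freed = false;
	model_set_owner_w(&a, sk, 100);                  /* 101 */
	model_set_owner_w(&b, sk, 100);                  /* 201 */
	model_skb_set_truesize(&a, a.truesize + growth); /* 201 + growth */
	(void)model_sock_wfree(&a);                      /* back to 101 */
	model_sk_free(sk);                               /* 100 */
	return model_sock_wfree(&b);                     /* 0: freed here */
}
```

Whether truesize grows or shrinks, the counter stays exact and the socket is freed by the last skb's release.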

* Re: net: use-after-free in neigh_timer_handler/sock_wfree
  2017-03-01 23:09       ` Cong Wang
@ 2017-03-01 23:15         ` Eric Dumazet
  2017-03-02  5:25           ` Cong Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-03-01 23:15 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dmitry Vyukov, David Miller, netdev, LKML, syzkaller,
	Alexey Kuznetsov, James Morris

On Wed, Mar 1, 2017 at 3:09 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:

>
> But I doubt skb_orphan() is the solution here; shouldn't we just
> update sk->sk_wmem_alloc along with skb->truesize changes?

Is it worth it? Apart from syzkaller, I mean...

We started with something that had a real impact on real workloads.

158f323b9868b59967ad96957c4ca388161be321 net: adjust skb->truesize in
pskb_expand_head()

Note that auditing the stack took me a while.

* Re: net: use-after-free in neigh_timer_handler/sock_wfree
  2017-03-01 23:15         ` Eric Dumazet
@ 2017-03-02  5:25           ` Cong Wang
  2017-03-02  5:36             ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Cong Wang @ 2017-03-02  5:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, David Miller, netdev, LKML, syzkaller,
	Alexey Kuznetsov, James Morris

On Wed, Mar 1, 2017 at 3:15 PM, Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Mar 1, 2017 at 3:09 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
>>
>> But I doubt skb_orphan() is the solution here; shouldn't we just
>> update sk->sk_wmem_alloc along with skb->truesize changes?
>
> Is it worth it? Apart from syzkaller, I mean...
>
> We started with something that had a real impact on real workloads.
>
> 158f323b9868b59967ad96957c4ca388161be321 net: adjust skb->truesize in
> pskb_expand_head()
>
> Note that auditing the stack took me a while.

I don't know how the sk refcnt could work correctly without keeping
sk_wmem_alloc correct. We certainly could just call skb_orphan()
if we don't need skb->sk any more, probably as in the frag case,
but in this case, the neigh one, the skbs sitting in neigh->arp_queue
are not going to be freed except in the failure case, therefore skb->sk
should not be orphaned so early.

* Re: net: use-after-free in neigh_timer_handler/sock_wfree
  2017-03-02  5:25           ` Cong Wang
@ 2017-03-02  5:36             ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2017-03-02  5:36 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dmitry Vyukov, David Miller, netdev, LKML, syzkaller,
	Alexey Kuznetsov, James Morris

On Wed, Mar 1, 2017 at 9:25 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Wed, Mar 1, 2017 at 3:15 PM, Eric Dumazet <edumazet@google.com> wrote:
>> On Wed, Mar 1, 2017 at 3:09 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>>>
>>> But I doubt skb_orphan() is the solution here; shouldn't we just
>>> update sk->sk_wmem_alloc along with skb->truesize changes?
>>
>> Is it worth it? Apart from syzkaller, I mean...
>>
>> We started with something that had a real impact on real workloads.
>>
>> 158f323b9868b59967ad96957c4ca388161be321 net: adjust skb->truesize in
>> pskb_expand_head()
>>
>> Note that auditing the stack took me a while.
>
> I don't know how the sk refcnt could work correctly without keeping
> sk_wmem_alloc correct. We certainly could just call skb_orphan()
> if we don't need skb->sk any more, probably as in the frag case,
> but in this case, the neigh one, the skbs sitting in neigh->arp_queue
> are not going to be freed except in the failure case, therefore skb->sk
> should not be orphaned so early.


There is absolutely no issue in the ARP/ND case.
Many skbs can sit there and it is fine.
Same with skbs sitting a long time in a qdisc.

Of course we try not to call skb_orphan() unless really needed.

tcp_gso_segment() tries very hard to propagate skb ownership to the segments,
but even something apparently easy like that took several patches before
being done right.

(For details, see 0d08c42cf9a71530fef5ebcfe368f38f2dd0476f "tcp: gso: fix
truesize tracking")
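What propagating ownership to the segments has to preserve can be sketched in the same hypothetical user-space model (illustrative names; this is not the real tcp_gso_segment() code). Each segment takes the original skb's owner, the socket is charged only the difference between the segments' total truesize and what the original skb was charged, and the original skb gives up its owner without uncharging. The counter then returns to zero exactly when the last segment is released after close().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of sk_wmem_alloc; not kernel code. */
struct model_sock {
	int wmem_alloc; /* stands in for sk->sk_wmem_alloc */
	bool freed;
};

struct model_skb {
	struct model_sock *sk;
	int truesize;
};

static void model_sk_alloc(struct model_sock *sk)
{
	sk->wmem_alloc = 1; /* the socket's own unit */
	sk->freed = false;
}

static void model_set_owner_w(struct model_skb *skb, struct model_sock *sk,
			      int truesize)
{
	assert(!sk->freed);
	skb->sk = sk;
	skb->truesize = truesize;
	sk->wmem_alloc += truesize;
}

/* Returns true if the owner was already freed (a use-after-free). */
static bool model_sock_wfree(struct model_skb *skb)
{
	struct model_sock *sk = skb->sk;

	skb->sk = NULL;
	if (sk->freed)
		return true;
	sk->wmem_alloc -= skb->truesize;
	if (sk->wmem_alloc == 0)
		sk->freed = true;
	return false;
}

static void model_sk_free(struct model_sock *sk)
{
	assert(!sk->freed);
	if (--sk->wmem_alloc == 0)
		sk->freed = true;
}

/*
 * Hand the charge of gso over to segs: every segment gets the same
 * owner, the socket is charged only the difference between the new
 * total and the old charge, and gso is detached without uncharging.
 */
static void model_segment_transfer(struct model_skb *gso,
				   struct model_skb *segs,
				   const int *seg_truesize, int nsegs)
{
	struct model_sock *sk = gso->sk;
	int sum = 0;

	for (int i = 0; i < nsegs; i++) {
		segs[i].sk = sk;
		segs[i].truesize = seg_truesize[i];
		sum += seg_truesize[i];
	}
	sk->wmem_alloc += sum - gso->truesize;
	gso->sk = NULL;
}
```

For example, an skb charged 3000 that becomes three segments of truesize 1100 moves the counter from 3001 to 3301; after close() only the third segment's release brings it to zero.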

conntrack reasm is mostly used in forwarding workloads, where skb->sk
is already NULL.

Are you thinking of a real workload where skb->sk _needs_ to be kept
in IPv6 reasm?

Thread overview: 8+ messages
2017-03-01 19:27 net: use-after-free in neigh_timer_handler/sock_wfree Dmitry Vyukov
2017-03-01 21:24 ` Cong Wang
2017-03-01 21:43   ` Cong Wang
2017-03-01 21:54     ` Eric Dumazet
2017-03-01 23:09       ` Cong Wang
2017-03-01 23:15         ` Eric Dumazet
2017-03-02  5:25           ` Cong Wang
2017-03-02  5:36             ` Eric Dumazet
