linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* net: suspicious RCU usage in nf_hook
@ 2017-01-27 21:15 Dmitry Vyukov
  2017-01-27 23:22 ` Cong Wang
  2017-01-27 23:35 ` Eric Dumazet
  0 siblings, 2 replies; 15+ messages in thread
From: Dmitry Vyukov @ 2017-01-27 21:15 UTC (permalink / raw)
  To: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel
  Cc: syzkaller

Hello,

I've got the following report while running syzkaller fuzzer on
fd694aaa46c7ed811b72eb47d5eb11ce7ab3f7f1:

[ INFO: suspicious RCU usage. ]
4.10.0-rc5+ #192 Not tainted
-------------------------------
./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
critical section!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 0
2 locks held by syz-executor14/23111:
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
include/net/sock.h:1454 [inline]
 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
 #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
include/linux/netfilter.h:201 [inline]
 #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
__ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160

stack backtrace:
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
 rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
 ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
 rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
 vfs_writev+0x87/0xc0 fs/read_write.c:911
 do_writev+0x110/0x2c0 fs/read_write.c:944
 SYSC_writev fs/read_write.c:1017 [inline]
 SyS_writev+0x27/0x30 fs/read_write.c:1014
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752
in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
INFO: lockdep is turned off.
CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
 rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
 sock_sendmsg_nosec net/socket.c:635 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:645
 sock_write_iter+0x326/0x600 net/socket.c:848
 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
 vfs_writev+0x87/0xc0 fs/read_write.c:911
 do_writev+0x110/0x2c0 fs/read_write.c:944
 SYSC_writev fs/read_write.c:1017 [inline]
 SyS_writev+0x27/0x30 fs/read_write.c:1014
 entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x445559
RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
BUG: scheduling while atomic: syz-executor14/23111/0x00000002
INFO: lockdep is turned off.
Modules linked in:
Kernel panic - not syncing: scheduling while atomic

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-27 21:15 net: suspicious RCU usage in nf_hook Dmitry Vyukov
@ 2017-01-27 23:22 ` Cong Wang
  2017-01-27 23:30   ` Cong Wang
  2017-01-27 23:35 ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-01-27 23:22 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Fri, Jan 27, 2017 at 1:15 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
> stack backtrace:
> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
>  rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
>  ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
>  __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
>  mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
>  atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
>  __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
>  static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
>  net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
>  sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
>  __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
>  sk_destruct+0x47/0x80 net/core/sock.c:1460

jump label uses a mutex and we call jump label API in softIRQ context...
Maybe we have to move it to a work struct as what we did for netlink.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-27 23:22 ` Cong Wang
@ 2017-01-27 23:30   ` Cong Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Cong Wang @ 2017-01-27 23:30 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Fri, Jan 27, 2017 at 3:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 1:15 PM, Dmitry Vyukov <dvyukov@google.com> wrote:
>> stack backtrace:
>> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Call Trace:
>>  __dump_stack lib/dump_stack.c:15 [inline]
>>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>>  lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
>>  rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
>>  ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
>>  __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
>>  mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
>>  atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
>>  __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
>>  static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
>>  net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
>>  sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
>>  __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
>>  sk_destruct+0x47/0x80 net/core/sock.c:1460
>
> jump label uses a mutex and we call jump label API in softIRQ context...
> Maybe we have to move it to a work struct as what we did for netlink.

Correct myself: process context but with RCU read lock held...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-27 21:15 net: suspicious RCU usage in nf_hook Dmitry Vyukov
  2017-01-27 23:22 ` Cong Wang
@ 2017-01-27 23:35 ` Eric Dumazet
  2017-01-28  1:00   ` Cong Wang
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2017-01-27 23:35 UTC (permalink / raw)
  To: Dmitry Vyukov
  Cc: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Fri, 2017-01-27 at 22:15 +0100, Dmitry Vyukov wrote:
> Hello,
> 
> I've got the following report while running syzkaller fuzzer on
> fd694aaa46c7ed811b72eb47d5eb11ce7ab3f7f1:
> 
> [ INFO: suspicious RCU usage. ]
> 4.10.0-rc5+ #192 Not tainted
> -------------------------------
> ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
> critical section!
> 
> other info that might help us debug this:
> 
> rcu_scheduler_active = 2, debug_locks = 0
> 2 locks held by syz-executor14/23111:
>  #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
> include/net/sock.h:1454 [inline]
>  #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
> rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
>  #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
> include/linux/netfilter.h:201 [inline]
>  #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
> __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
> 
> stack backtrace:
> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
>  rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
>  ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
>  __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
>  mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
>  atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
>  __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
>  static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
>  net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
>  sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
>  __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
>  sk_destruct+0x47/0x80 net/core/sock.c:1460
>  __sk_free+0x57/0x230 net/core/sock.c:1468
>  sock_wfree+0xae/0x120 net/core/sock.c:1645
>  skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
>  skb_release_all+0x15/0x60 net/core/skbuff.c:668
>  __kfree_skb+0x15/0x20 net/core/skbuff.c:684
>  kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
>  inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
>  inet_frag_put include/net/inet_frag.h:133 [inline]
>  nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617
>  ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
>  nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
>  nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
>  nf_hook include/linux/netfilter.h:212 [inline]
>  __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
>  ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
>  ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
>  ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
>  rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
>  rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  sock_write_iter+0x326/0x600 net/socket.c:848
>  do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
>  do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
>  vfs_writev+0x87/0xc0 fs/read_write.c:911
>  do_writev+0x110/0x2c0 fs/read_write.c:944
>  SYSC_writev fs/read_write.c:1017 [inline]
>  SyS_writev+0x27/0x30 fs/read_write.c:1014
>  entry_SYSCALL_64_fastpath+0x1f/0xc2
> RIP: 0033:0x445559
> RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
> RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
> RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
> RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
> R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752
> in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
> INFO: lockdep is turned off.
> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:15 [inline]
>  dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
>  ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
>  __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
>  mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
>  atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
>  __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
>  static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
>  net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
>  sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
>  __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
>  sk_destruct+0x47/0x80 net/core/sock.c:1460
>  __sk_free+0x57/0x230 net/core/sock.c:1468
>  sock_wfree+0xae/0x120 net/core/sock.c:1645
>  skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
>  skb_release_all+0x15/0x60 net/core/skbuff.c:668
>  __kfree_skb+0x15/0x20 net/core/skbuff.c:684
>  kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
>  inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
>  inet_frag_put include/net/inet_frag.h:133 [inline]
>  nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617
>  ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
>  nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
>  nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
>  nf_hook include/linux/netfilter.h:212 [inline]
>  __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
>  ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
>  ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
>  ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
>  rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
>  rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
>  inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
>  sock_sendmsg_nosec net/socket.c:635 [inline]
>  sock_sendmsg+0xca/0x110 net/socket.c:645
>  sock_write_iter+0x326/0x600 net/socket.c:848
>  do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
>  do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
>  vfs_writev+0x87/0xc0 fs/read_write.c:911
>  do_writev+0x110/0x2c0 fs/read_write.c:944
>  SYSC_writev fs/read_write.c:1017 [inline]
>  SyS_writev+0x27/0x30 fs/read_write.c:1014
>  entry_SYSCALL_64_fastpath+0x1f/0xc2

Oh well, I forgot to submit the official patch I think, Jan 9th.

https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-27 23:35 ` Eric Dumazet
@ 2017-01-28  1:00   ` Cong Wang
  2017-01-28  1:31     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-01-28  1:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Oh well, I forgot to submit the official patch I think, Jan 9th.
>
> https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>

Hmm, but why only fragments need skb_orphan()? It seems like
any kfree_skb() inside a nf hook needs to have a preceding
skb_orphan().

Also, I am not convinced it is similar to commit 8282f27449bf15548
which is on RX path.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-28  1:00   ` Cong Wang
@ 2017-01-28  1:31     ` Eric Dumazet
  2017-01-31  6:19       ` Cong Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2017-01-28  1:31 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote:
> On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Oh well, I forgot to submit the official patch I think, Jan 9th.
> >
> > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
> >
> 
> Hmm, but why only fragments need skb_orphan()? It seems like
> any kfree_skb() inside a nf hook needs to have a preceding
> skb_orphan().


> 
> Also, I am not convinced it is similar to commit 8282f27449bf15548
> which is on RX path.

Well, we clearly see IPv6 reassembly being part of the equation in both
cases.

I was replying to first part of the splat [1], which was already
diagnosed and had a non official patch.

use after free is also a bug, regardless of jump label being used or
not.

I still do not really understand this nf_hook issue, I thought we were
disabling BH in netfilter.

So the in_interrupt() check in net_disable_timestamp() should trigger,
this was the intent of netstamp_needed_deferred existence.

Not sure if we can test for rcu_read_lock() as well.

[1]
 sk_destruct+0x47/0x80 net/core/sock.c:1460
 __sk_free+0x57/0x230 net/core/sock.c:1468
 sock_wfree+0xae/0x120 net/core/sock.c:1645
 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
 skb_release_all+0x15/0x60 net/core/skbuff.c:668
 __kfree_skb+0x15/0x20 net/core/skbuff.c:684
 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
 inet_frag_put include/net/inet_frag.h:133 [inline]
 nf_ct_frag6_gather+0x1106/0x3840
net/ipv6/netfilter/nf_conntrack_reasm.c:617
 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
 nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
 nf_hook include/linux/netfilter.h:212 [inline]
 __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-28  1:31     ` Eric Dumazet
@ 2017-01-31  6:19       ` Cong Wang
  2017-01-31 15:44         ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-01-31  6:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Fri, Jan 27, 2017 at 5:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote:
>> On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Oh well, I forgot to submit the official patch I think, Jan 9th.
>> >
>> > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ
>> >
>>
>> Hmm, but why only fragments need skb_orphan()? It seems like
>> any kfree_skb() inside a nf hook needs to have a preceding
>> skb_orphan().
>
>
>>
>> Also, I am not convinced it is similar to commit 8282f27449bf15548
>> which is on RX path.
>
> Well, we clearly see IPv6 reassembly being part of the equation in both
> cases.

Yeah, of course. My worry is that this problem is more than just
IPv6 reassembly.

>
> I was replying to first part of the splat [1], which was already
> diagnosed and had a non official patch.
>
> use after free is also a bug, regardless of jump label being used or
> not.
>
> I still do not really understand this nf_hook issue, I thought we were
> disabling BH in netfilter.

It is a different warning from use-after-free, this one is about sleep
in atomic context, mutex lock is acquired with RCU read lock held.


>
> So the in_interrupt() check in net_disable_timestamp() should trigger,
> this was the intent of netstamp_needed_deferred existence.
>
> Not sure if we can test for rcu_read_lock() as well.
>

The context is process context (TX path before hitting qdisc), and
BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
makes me thinking maybe we really need to disable BH in this
case for nf_hook()? But it is called in RX path too, and BH is
already disabled there.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-31  6:19       ` Cong Wang
@ 2017-01-31 15:44         ` Eric Dumazet
  2017-02-01 20:51           ` Cong Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2017-01-31 15:44 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:

> 
> The context is process context (TX path before hitting qdisc), and
> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
> makes me thinking maybe we really need to disable BH in this
> case for nf_hook()? But it is called in RX path too, and BH is
> already disabled there.

ipt_do_table() and similar netfilter entry points disable BH.

Maybe it is done too late.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-01-31 15:44         ` Eric Dumazet
@ 2017-02-01 20:51           ` Cong Wang
  2017-02-01 21:16             ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-02-01 20:51 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:
>
>>
>> The context is process context (TX path before hitting qdisc), and
>> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
>> makes me thinking maybe we really need to disable BH in this
>> case for nf_hook()? But it is called in RX path too, and BH is
>> already disabled there.
>
> ipt_do_table() and similar netfilter entry points disable BH.
>
> Maybe it is done too late.

I think we need a fix like the following one for minimum impact.

diff --git a/net/core/dev.c b/net/core/dev.c
index 727b6fd..eee7d63 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
 void net_disable_timestamp(void)
 {
 #ifdef HAVE_JUMP_LABEL
-       if (in_interrupt()) {
-               atomic_inc(&netstamp_needed_deferred);
-               return;
-       }
-#endif
+       atomic_inc(&netstamp_needed_deferred);
+#else
        static_key_slow_dec(&netstamp_needed);
+#endif
 }
 EXPORT_SYMBOL(net_disable_timestamp);

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-02-01 20:51           ` Cong Wang
@ 2017-02-01 21:16             ` Eric Dumazet
  2017-02-01 21:22               ` Eric Dumazet
  2017-02-01 23:29               ` Cong Wang
  0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2017-02-01 21:16 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Wed, 2017-02-01 at 12:51 -0800, Cong Wang wrote:
> On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:
> >
> >>
> >> The context is process context (TX path before hitting qdisc), and
> >> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
> >> makes me thinking maybe we really need to disable BH in this
> >> case for nf_hook()? But it is called in RX path too, and BH is
> >> already disabled there.
> >
> > ipt_do_table() and similar netfilter entry points disable BH.
> >
> > Maybe it is done too late.
> 
> I think we need a fix like the following one for minimum impact.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 727b6fd..eee7d63 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
>  void net_disable_timestamp(void)
>  {
>  #ifdef HAVE_JUMP_LABEL
> -       if (in_interrupt()) {
> -               atomic_inc(&netstamp_needed_deferred);
> -               return;
> -       }
> -#endif
> +       atomic_inc(&netstamp_needed_deferred);
> +#else
>         static_key_slow_dec(&netstamp_needed);
> +#endif
>  }
>  EXPORT_SYMBOL(net_disable_timestamp);

This would permanently leave the kernel in the netstamp_needed state.

I would prefer the patch using a process context to perform the
cleanup ? Note there is a race window, but probably not a big deal.

 net/core/dev.c |   30 ++++++++++--------------------
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7f218e095361520d11c243d650e053321ea7274f..1cae681b6cfd1cf2c9bee7072eb8af9cf79cced8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1695,37 +1695,27 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 
 static struct static_key netstamp_needed __read_mostly;
 #ifdef HAVE_JUMP_LABEL
-/* We are not allowed to call static_key_slow_dec() from irq context
- * If net_disable_timestamp() is called from irq context, defer the
- * static_key_slow_dec() calls.
- */
 static atomic_t netstamp_needed_deferred;
-#endif
-
-void net_enable_timestamp(void)
+static void netstamp_clear(struct work_struct *work)
 {
-#ifdef HAVE_JUMP_LABEL
 	int deferred = atomic_xchg(&netstamp_needed_deferred, 0);
 
-	if (deferred) {
-		while (--deferred)
-			static_key_slow_dec(&netstamp_needed);
-		return;
-	}
+	while (deferred--)
+		static_key_slow_dec(&netstamp_needed);
+}
+static DECLARE_WORK(netstamp_work, netstamp_clear);
 #endif
+
+void net_enable_timestamp(void)
+{
 	static_key_slow_inc(&netstamp_needed);
 }
 EXPORT_SYMBOL(net_enable_timestamp);
 
 void net_disable_timestamp(void)
 {
-#ifdef HAVE_JUMP_LABEL
-	if (in_interrupt()) {
-		atomic_inc(&netstamp_needed_deferred);
-		return;
-	}
-#endif
-	static_key_slow_dec(&netstamp_needed);
+	atomic_inc(&netstamp_needed_deferred);
+	schedule_work(&netstamp_work);
 }
 EXPORT_SYMBOL(net_disable_timestamp);
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-02-01 21:16             ` Eric Dumazet
@ 2017-02-01 21:22               ` Eric Dumazet
  2017-02-01 23:29               ` Cong Wang
  1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2017-02-01 21:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Wed, 2017-02-01 at 13:16 -0800, Eric Dumazet wrote:

> This would permanently leave the kernel in the netstamp_needed state.
> 
> I would prefer the patch using a process context to perform the
> cleanup ? Note there is a race window, but probably not a big deal.
> 
>  net/core/dev.c |   30 ++++++++++--------------------
>  1 file changed, 10 insertions(+), 20 deletions(-)

Patch is not complete (for the HAVE_JUMP_LABEL=n case)

Would you like to author/submit it ?

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-02-01 21:16             ` Eric Dumazet
  2017-02-01 21:22               ` Eric Dumazet
@ 2017-02-01 23:29               ` Cong Wang
  2017-02-01 23:48                 ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Cong Wang @ 2017-02-01 23:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

[-- Attachment #1: Type: text/plain, Size: 1978 bytes --]

On Wed, Feb 1, 2017 at 1:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-02-01 at 12:51 -0800, Cong Wang wrote:
>> On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote:
>> >
>> >>
>> >> The context is process context (TX path before hitting qdisc), and
>> >> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this
>> >> makes me thinking maybe we really need to disable BH in this
>> >> case for nf_hook()? But it is called in RX path too, and BH is
>> >> already disabled there.
>> >
>> > ipt_do_table() and similar netfilter entry points disable BH.
>> >
>> > Maybe it is done too late.
>>
>> I think we need a fix like the following one for minimum impact.
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 727b6fd..eee7d63 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp);
>>  void net_disable_timestamp(void)
>>  {
>>  #ifdef HAVE_JUMP_LABEL
>> -       if (in_interrupt()) {
>> -               atomic_inc(&netstamp_needed_deferred);
>> -               return;
>> -       }
>> -#endif
>> +       atomic_inc(&netstamp_needed_deferred);
>> +#else
>>         static_key_slow_dec(&netstamp_needed);
>> +#endif
>>  }
>>  EXPORT_SYMBOL(net_disable_timestamp);
>
> This would permanently leave the kernel in the netstamp_needed state.
>
> I would prefer the patch using a process context to perform the
> cleanup ? Note there is a race window, but probably not a big deal.

Not sure if it is better. The difference is caught up in net_enable_timestamp(),
which is called setsockopt() path and sk_clone() path, so we could be
in netstamp_needed state for a long time too until user-space exercises
these paths.

I am feeling we probably need to get rid of netstamp_needed_deferred,
and simply defer the whole static_key_slow_dec(), like the attached patch
(compile only).

What do you think?

[-- Attachment #2: net-timestamp.diff --]
[-- Type: text/plain, Size: 1483 bytes --]

diff --git a/net/core/dev.c b/net/core/dev.c
index 727b6fd..0ef1734 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1694,38 +1694,28 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 #endif
 
 static struct static_key netstamp_needed __read_mostly;
-#ifdef HAVE_JUMP_LABEL
-/* We are not allowed to call static_key_slow_dec() from irq context
- * If net_disable_timestamp() is called from irq context, defer the
- * static_key_slow_dec() calls.
- */
-static atomic_t netstamp_needed_deferred;
-#endif
 
-void net_enable_timestamp(void)
+static void netstamp_clear(struct work_struct *work)
 {
-#ifdef HAVE_JUMP_LABEL
-	int deferred = atomic_xchg(&netstamp_needed_deferred, 0);
+	static_key_slow_dec(&netstamp_needed);
+}
 
-	if (deferred) {
-		while (--deferred)
-			static_key_slow_dec(&netstamp_needed);
-		return;
-	}
-#endif
+static DECLARE_WORK(netstamp_work, netstamp_clear);
+
+void net_enable_timestamp(void)
+{
+	flush_work(&netstamp_work);
 	static_key_slow_inc(&netstamp_needed);
 }
 EXPORT_SYMBOL(net_enable_timestamp);
 
 void net_disable_timestamp(void)
 {
-#ifdef HAVE_JUMP_LABEL
-	if (in_interrupt()) {
-		atomic_inc(&netstamp_needed_deferred);
-		return;
-	}
-#endif
-	static_key_slow_dec(&netstamp_needed);
+	/* We are not allowed to call static_key_slow_dec() from irq context
+	 * If net_disable_timestamp() is called from irq context, defer the
+	 * static_key_slow_dec() calls.
+	 */
+	schedule_work(&netstamp_work);
 }
 EXPORT_SYMBOL(net_disable_timestamp);
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-02-01 23:29               ` Cong Wang
@ 2017-02-01 23:48                 ` Eric Dumazet
  2017-02-01 23:59                   ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2017-02-01 23:48 UTC (permalink / raw)
  To: Cong Wang
  Cc: Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:

> Not sure if it is better. The difference is caught up in net_enable_timestamp(),
> which is called setsockopt() path and sk_clone() path, so we could be
> in netstamp_needed state for a long time too until user-space exercises
> these paths.
>
> I am feeling we probably need to get rid of netstamp_needed_deferred,
> and simply defer the whole static_key_slow_dec(), like the attached patch
> (compile only).
>
> What do you think?

I think we need to keep the atomic.

If two cpus call net_disable_timestamp() roughly at the same time, the
work will be scheduled once.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-02-01 23:48                 ` Eric Dumazet
@ 2017-02-01 23:59                   ` Eric Dumazet
  2017-02-02 18:01                     ` Cong Wang
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2017-02-01 23:59 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Cong Wang, Dmitry Vyukov, David Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Wed, 2017-02-01 at 15:48 -0800, Eric Dumazet wrote:
> On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> > Not sure if it is better. The difference is caught up in net_enable_timestamp(),
> > which is called setsockopt() path and sk_clone() path, so we could be
> > in netstamp_needed state for a long time too until user-space exercises
> > these paths.
> >
> > I am feeling we probably need to get rid of netstamp_needed_deferred,
> > and simply defer the whole static_key_slow_dec(), like the attached patch
> > (compile only).
> >
> > What do you think?
> 
> I think we need to keep the atomic.
> 
> If two cpus call net_disable_timestamp() roughly at the same time, the
> work will be scheduled once.

Updated patch (but not tested yet)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7f218e095361520d11c243d650e053321ea7274f..29101c98399f40b6b8e42c31a255d8f1fb6bd7a1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1695,24 +1695,19 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue);
 
 static struct static_key netstamp_needed __read_mostly;
 #ifdef HAVE_JUMP_LABEL
-/* We are not allowed to call static_key_slow_dec() from irq context
- * If net_disable_timestamp() is called from irq context, defer the
- * static_key_slow_dec() calls.
- */
 static atomic_t netstamp_needed_deferred;
-#endif
-
-void net_enable_timestamp(void)
+static void netstamp_clear(struct work_struct *work)
 {
-#ifdef HAVE_JUMP_LABEL
 	int deferred = atomic_xchg(&netstamp_needed_deferred, 0);
 
-	if (deferred) {
-		while (--deferred)
-			static_key_slow_dec(&netstamp_needed);
-		return;
-	}
+	while (deferred--)
+		static_key_slow_dec(&netstamp_needed);
+}
+static DECLARE_WORK(netstamp_work, netstamp_clear);
 #endif
+
+void net_enable_timestamp(void)
+{
 	static_key_slow_inc(&netstamp_needed);
 }
 EXPORT_SYMBOL(net_enable_timestamp);
@@ -1720,12 +1715,12 @@ EXPORT_SYMBOL(net_enable_timestamp);
 void net_disable_timestamp(void)
 {
 #ifdef HAVE_JUMP_LABEL
-	if (in_interrupt()) {
-		atomic_inc(&netstamp_needed_deferred);
-		return;
-	}
-#endif
+	/* net_disable_timestamp() can be called from non process context */
+	atomic_inc(&netstamp_needed_deferred);
+	schedule_work(&netstamp_work);
+#else
 	static_key_slow_dec(&netstamp_needed);
+#endif
 }
 EXPORT_SYMBOL(net_disable_timestamp);
 

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: net: suspicious RCU usage in nf_hook
  2017-02-01 23:59                   ` Eric Dumazet
@ 2017-02-02 18:01                     ` Cong Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Cong Wang @ 2017-02-02 18:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML,
	Pablo Neira Ayuso, netfilter-devel, syzkaller

On Wed, Feb 1, 2017 at 3:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2017-02-01 at 15:48 -0800, Eric Dumazet wrote:
>> On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>
>> > Not sure if it is better. The difference is caught up in net_enable_timestamp(),
>> > which is called setsockopt() path and sk_clone() path, so we could be
>> > in netstamp_needed state for a long time too until user-space exercises
>> > these paths.
>> >
>> > I am feeling we probably need to get rid of netstamp_needed_deferred,
>> > and simply defer the whole static_key_slow_dec(), like the attached patch
>> > (compile only).
>> >
>> > What do you think?
>>
>> I think we need to keep the atomic.
>>
>> If two cpus call net_disable_timestamp() roughly at the same time, the
>> work will be scheduled once.

Good point! Yeah, the same work will not be schedule twice.

>
> Updated patch (but not tested yet)

I can't think out a better way to fix this. I expect jump_label to provide
an API for this, but it doesn't, static_key_slow_dec_deferred()
is just for batching. Probably we should introduce one to avoid these
ugly #ifdef HAVE_JUMP_LABEL here, but that is a -next material.

So, please feel free to send it formally.

Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-02-02 18:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-27 21:15 net: suspicious RCU usage in nf_hook Dmitry Vyukov
2017-01-27 23:22 ` Cong Wang
2017-01-27 23:30   ` Cong Wang
2017-01-27 23:35 ` Eric Dumazet
2017-01-28  1:00   ` Cong Wang
2017-01-28  1:31     ` Eric Dumazet
2017-01-31  6:19       ` Cong Wang
2017-01-31 15:44         ` Eric Dumazet
2017-02-01 20:51           ` Cong Wang
2017-02-01 21:16             ` Eric Dumazet
2017-02-01 21:22               ` Eric Dumazet
2017-02-01 23:29               ` Cong Wang
2017-02-01 23:48                 ` Eric Dumazet
2017-02-01 23:59                   ` Eric Dumazet
2017-02-02 18:01                     ` Cong Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).