* net: suspicious RCU usage in nf_hook @ 2017-01-27 21:15 Dmitry Vyukov 2017-01-27 23:22 ` Cong Wang 2017-01-27 23:35 ` Eric Dumazet 0 siblings, 2 replies; 17+ messages in thread From: Dmitry Vyukov @ 2017-01-27 21:15 UTC (permalink / raw) To: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel Cc: syzkaller Hello, I've got the following report while running syzkaller fuzzer on fd694aaa46c7ed811b72eb47d5eb11ce7ab3f7f1: [ INFO: suspicious RCU usage. ] 4.10.0-rc5+ #192 Not tainted ------------------------------- ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 0 2 locks held by syz-executor14/23111: #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock include/net/sock.h:1454 [inline] #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919 #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook include/linux/netfilter.h:201 [inline] #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160 stack backtrace: CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452 rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline] ___might_sleep+0x560/0x650 kernel/sched/core.c:7748 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sock_wfree+0xae/0x120 net/core/sock.c:1645 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put include/net/inet_frag.h:133 [inline] nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook include/linux/netfilter.h:212 [inline] __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x600 net/socket.c:848 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 vfs_writev+0x87/0xc0 fs/read_write.c:911 do_writev+0x110/0x2c0 fs/read_write.c:944 SYSC_writev fs/read_write.c:1017 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1014 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x445559 RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559 RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005 RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000 R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752 in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14 INFO: lockdep is turned off. CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sock_wfree+0xae/0x120 net/core/sock.c:1645 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put include/net/inet_frag.h:133 [inline] nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook include/linux/netfilter.h:212 [inline] __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x600 net/socket.c:848 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 vfs_writev+0x87/0xc0 fs/read_write.c:911 do_writev+0x110/0x2c0 fs/read_write.c:944 SYSC_writev fs/read_write.c:1017 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1014 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x445559 RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559 RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005 RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000 R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400 BUG: scheduling while atomic: syz-executor14/23111/0x00000002 INFO: lockdep is turned off. Modules linked in: Kernel panic - not syncing: scheduling while atomic ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-27 21:15 net: suspicious RCU usage in nf_hook Dmitry Vyukov @ 2017-01-27 23:22 ` Cong Wang 2017-01-27 23:30 ` Cong Wang 2017-01-27 23:35 ` Eric Dumazet 1 sibling, 1 reply; 17+ messages in thread From: Cong Wang @ 2017-01-27 23:22 UTC (permalink / raw) To: Dmitry Vyukov Cc: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Fri, Jan 27, 2017 at 1:15 PM, Dmitry Vyukov <dvyukov@google.com> wrote: > stack backtrace: > CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452 > rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline] > ___might_sleep+0x560/0x650 kernel/sched/core.c:7748 > __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 > mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 > atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 > __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 > static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 > net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 > sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 > __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 > sk_destruct+0x47/0x80 net/core/sock.c:1460 jump label uses a mutex and we call jump label API in softIRQ context... Maybe we have to move it to a work struct as what we did for netlink. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-27 23:22 ` Cong Wang @ 2017-01-27 23:30 ` Cong Wang 0 siblings, 0 replies; 17+ messages in thread From: Cong Wang @ 2017-01-27 23:30 UTC (permalink / raw) To: Dmitry Vyukov Cc: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Fri, Jan 27, 2017 at 3:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > On Fri, Jan 27, 2017 at 1:15 PM, Dmitry Vyukov <dvyukov@google.com> wrote: >> stack backtrace: >> CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> Call Trace: >> __dump_stack lib/dump_stack.c:15 [inline] >> dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 >> lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452 >> rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline] >> ___might_sleep+0x560/0x650 kernel/sched/core.c:7748 >> __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 >> mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 >> atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 >> __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 >> static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 >> net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 >> sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 >> __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 >> sk_destruct+0x47/0x80 net/core/sock.c:1460 > > jump label uses a mutex and we call jump label API in softIRQ context... > Maybe we have to move it to a work struct as what we did for netlink. Correct myself: process context but with RCU read lock held... ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-27 21:15 net: suspicious RCU usage in nf_hook Dmitry Vyukov 2017-01-27 23:22 ` Cong Wang @ 2017-01-27 23:35 ` Eric Dumazet 2017-01-28 1:00 ` Cong Wang 1 sibling, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-01-27 23:35 UTC (permalink / raw) To: Dmitry Vyukov Cc: David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Fri, 2017-01-27 at 22:15 +0100, Dmitry Vyukov wrote: > Hello, > > I've got the following report while running syzkaller fuzzer on > fd694aaa46c7ed811b72eb47d5eb11ce7ab3f7f1: > > [ INFO: suspicious RCU usage. ] > 4.10.0-rc5+ #192 Not tainted > ------------------------------- > ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side > critical section! > > other info that might help us debug this: > > rcu_scheduler_active = 2, debug_locks = 0 > 2 locks held by syz-executor14/23111: > #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock > include/net/sock.h:1454 [inline] > #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] > rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919 > #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook > include/linux/netfilter.h:201 [inline] > #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] > __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160 > > stack backtrace: > CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452 > rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline] > ___might_sleep+0x560/0x650 kernel/sched/core.c:7748 > __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 > mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 > atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 > __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 > static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 > net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 > sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 > __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 > sk_destruct+0x47/0x80 net/core/sock.c:1460 > __sk_free+0x57/0x230 net/core/sock.c:1468 > sock_wfree+0xae/0x120 net/core/sock.c:1645 > skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 > skb_release_all+0x15/0x60 net/core/skbuff.c:668 > __kfree_skb+0x15/0x20 net/core/skbuff.c:684 > kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 > inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 > inet_frag_put include/net/inet_frag.h:133 [inline] > nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 > ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 > nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] > nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 > nf_hook include/linux/netfilter.h:212 [inline] > __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 > ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 > ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 > ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 > rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] > rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 > inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 > sock_sendmsg_nosec net/socket.c:635 [inline] > sock_sendmsg+0xca/0x110 net/socket.c:645 > sock_write_iter+0x326/0x600 net/socket.c:848 > do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 > do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 > vfs_writev+0x87/0xc0 fs/read_write.c:911 > do_writev+0x110/0x2c0 fs/read_write.c:944 > SYSC_writev fs/read_write.c:1017 [inline] > SyS_writev+0x27/0x30 fs/read_write.c:1014 > entry_SYSCALL_64_fastpath+0x1f/0xc2 > RIP: 0033:0x445559 > RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014 > RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559 > RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005 > RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000 > R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400 > BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752 > in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14 > INFO: lockdep is turned off. > CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 > Call Trace: > __dump_stack lib/dump_stack.c:15 [inline] > dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 > ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780 > __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 > mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 > atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 > __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 > static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 > net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 > sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 > __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 > sk_destruct+0x47/0x80 net/core/sock.c:1460 > __sk_free+0x57/0x230 net/core/sock.c:1468 > sock_wfree+0xae/0x120 net/core/sock.c:1645 > skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 > skb_release_all+0x15/0x60 net/core/skbuff.c:668 > __kfree_skb+0x15/0x20 net/core/skbuff.c:684 > kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 > inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 > inet_frag_put include/net/inet_frag.h:133 [inline] > nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 > ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 > nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] > nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 > nf_hook include/linux/netfilter.h:212 [inline] > __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 > ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 > ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 > ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 > rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] > rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 > inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 > sock_sendmsg_nosec net/socket.c:635 [inline] > sock_sendmsg+0xca/0x110 net/socket.c:645 > sock_write_iter+0x326/0x600 net/socket.c:848 > do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 > do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 > vfs_writev+0x87/0xc0 fs/read_write.c:911 > do_writev+0x110/0x2c0 fs/read_write.c:944 > SYSC_writev fs/read_write.c:1017 [inline] > SyS_writev+0x27/0x30 fs/read_write.c:1014 > entry_SYSCALL_64_fastpath+0x1f/0xc2 Oh well, I forgot to submit the official patch I think, Jan 9th. https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-27 23:35 ` Eric Dumazet @ 2017-01-28 1:00 ` Cong Wang 2017-01-28 1:31 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Cong Wang @ 2017-01-28 1:00 UTC (permalink / raw) To: Eric Dumazet Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > Oh well, I forgot to submit the official patch I think, Jan 9th. > > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ > Hmm, but why only fragments need skb_orphan()? It seems like any kfree_skb() inside a nf hook needs to have a preceding skb_orphan(). Also, I am not convinced it is similar to commit 8282f27449bf15548 which is on RX path. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-28 1:00 ` Cong Wang @ 2017-01-28 1:31 ` Eric Dumazet 2017-01-31 6:19 ` Cong Wang 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-01-28 1:31 UTC (permalink / raw) To: Cong Wang Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote: > On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > Oh well, I forgot to submit the official patch I think, Jan 9th. > > > > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ > > > > Hmm, but why only fragments need skb_orphan()? It seems like > any kfree_skb() inside a nf hook needs to have a preceding > skb_orphan(). > > Also, I am not convinced it is similar to commit 8282f27449bf15548 > which is on RX path. Well, we clearly see IPv6 reassembly being part of the equation in both cases. I was replying to first part of the splat [1], which was already diagnosed and had a non official patch. use after free is also a bug, regardless of jump label being used or not. I still do not really understand this nf_hook issue, I thought we were disabling BH in netfilter. So the in_interrupt() check in net_disable_timestamp() should trigger, this was the intent of netstamp_needed_deferred existence. Not sure if we can test for rcu_read_lock() as well. [1] sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sock_wfree+0xae/0x120 net/core/sock.c:1645 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put include/net/inet_frag.h:133 [inline] nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook include/linux/netfilter.h:212 [inline] __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-28 1:31 ` Eric Dumazet @ 2017-01-31 6:19 ` Cong Wang 2017-01-31 15:44 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Cong Wang @ 2017-01-31 6:19 UTC (permalink / raw) To: Eric Dumazet Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Fri, Jan 27, 2017 at 5:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Fri, 2017-01-27 at 17:00 -0800, Cong Wang wrote: >> On Fri, Jan 27, 2017 at 3:35 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > Oh well, I forgot to submit the official patch I think, Jan 9th. >> > >> > https://groups.google.com/forum/#!topic/syzkaller/BhyN5OFd7sQ >> > >> >> Hmm, but why only fragments need skb_orphan()? It seems like >> any kfree_skb() inside a nf hook needs to have a preceding >> skb_orphan(). > > >> >> Also, I am not convinced it is similar to commit 8282f27449bf15548 >> which is on RX path. > > Well, we clearly see IPv6 reassembly being part of the equation in both > cases. Yeah, of course. My worry is that this problem is more than just IPv6 reassembly. > > I was replying to first part of the splat [1], which was already > diagnosed and had a non official patch. > > use after free is also a bug, regardless of jump label being used or > not. > > I still do not really understand this nf_hook issue, I thought we were > disabling BH in netfilter. It is a different warning from use-after-free, this one is about sleep in atomic context, mutex lock is acquired with RCU read lock held. > > So the in_interrupt() check in net_disable_timestamp() should trigger, > this was the intent of netstamp_needed_deferred existence. > > Not sure if we can test for rcu_read_lock() as well. > The context is process context (TX path before hitting qdisc), and BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this makes me thinking maybe we really need to disable BH in this case for nf_hook()? But it is called in RX path too, and BH is already disabled there. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-31 6:19 ` Cong Wang @ 2017-01-31 15:44 ` Eric Dumazet 2017-02-01 20:51 ` Cong Wang 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-01-31 15:44 UTC (permalink / raw) To: Cong Wang Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote: > > The context is process context (TX path before hitting qdisc), and > BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this > makes me thinking maybe we really need to disable BH in this > case for nf_hook()? But it is called in RX path too, and BH is > already disabled there. ipt_do_table() and similar netfilter entry points disable BH. Maybe it is done too late. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-01-31 15:44 ` Eric Dumazet @ 2017-02-01 20:51 ` Cong Wang 2017-02-01 21:16 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Cong Wang @ 2017-02-01 20:51 UTC (permalink / raw) To: Eric Dumazet Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote: > >> >> The context is process context (TX path before hitting qdisc), and >> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this >> makes me thinking maybe we really need to disable BH in this >> case for nf_hook()? But it is called in RX path too, and BH is >> already disabled there. > > ipt_do_table() and similar netfilter entry points disable BH. > > Maybe it is done too late. I think we need a fix like the following one for minimum impact. diff --git a/net/core/dev.c b/net/core/dev.c index 727b6fd..eee7d63 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp); void net_disable_timestamp(void) { #ifdef HAVE_JUMP_LABEL - if (in_interrupt()) { - atomic_inc(&netstamp_needed_deferred); - return; - } -#endif + atomic_inc(&netstamp_needed_deferred); +#else static_key_slow_dec(&netstamp_needed); +#endif } EXPORT_SYMBOL(net_disable_timestamp); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-02-01 20:51 ` Cong Wang @ 2017-02-01 21:16 ` Eric Dumazet 2017-02-01 21:22 ` Eric Dumazet 2017-02-01 23:29 ` Cong Wang 0 siblings, 2 replies; 17+ messages in thread From: Eric Dumazet @ 2017-02-01 21:16 UTC (permalink / raw) To: Cong Wang Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Wed, 2017-02-01 at 12:51 -0800, Cong Wang wrote: > On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote: > > > >> > >> The context is process context (TX path before hitting qdisc), and > >> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this > >> makes me thinking maybe we really need to disable BH in this > >> case for nf_hook()? But it is called in RX path too, and BH is > >> already disabled there. > > > > ipt_do_table() and similar netfilter entry points disable BH. > > > > Maybe it is done too late. > > I think we need a fix like the following one for minimum impact. > > diff --git a/net/core/dev.c b/net/core/dev.c > index 727b6fd..eee7d63 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp); > void net_disable_timestamp(void) > { > #ifdef HAVE_JUMP_LABEL > - if (in_interrupt()) { > - atomic_inc(&netstamp_needed_deferred); > - return; > - } > -#endif > + atomic_inc(&netstamp_needed_deferred); > +#else > static_key_slow_dec(&netstamp_needed); > +#endif > } > EXPORT_SYMBOL(net_disable_timestamp); This would permanently leave the kernel in the netstamp_needed state. I would prefer the patch using a process context to perform the cleanup ? Note there is a race window, but probably not a big deal. net/core/dev.c | 30 ++++++++++-------------------- 1 file changed, 10 insertions(+), 20 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 7f218e095361520d11c243d650e053321ea7274f..1cae681b6cfd1cf2c9bee7072eb8af9cf79cced8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1695,37 +1695,27 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue); static struct static_key netstamp_needed __read_mostly; #ifdef HAVE_JUMP_LABEL -/* We are not allowed to call static_key_slow_dec() from irq context - * If net_disable_timestamp() is called from irq context, defer the - * static_key_slow_dec() calls. - */ static atomic_t netstamp_needed_deferred; -#endif - -void net_enable_timestamp(void) +static void netstamp_clear(struct work_struct *work) { -#ifdef HAVE_JUMP_LABEL int deferred = atomic_xchg(&netstamp_needed_deferred, 0); - if (deferred) { - while (--deferred) - static_key_slow_dec(&netstamp_needed); - return; - } + while (deferred--) + static_key_slow_dec(&netstamp_needed); +} +static DECLARE_WORK(netstamp_work, netstamp_clear); #endif + +void net_enable_timestamp(void) +{ static_key_slow_inc(&netstamp_needed); } EXPORT_SYMBOL(net_enable_timestamp); void net_disable_timestamp(void) { -#ifdef HAVE_JUMP_LABEL - if (in_interrupt()) { - atomic_inc(&netstamp_needed_deferred); - return; - } -#endif - static_key_slow_dec(&netstamp_needed); + atomic_inc(&netstamp_needed_deferred); + schedule_work(&netstamp_work); } EXPORT_SYMBOL(net_disable_timestamp); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-02-01 21:16 ` Eric Dumazet @ 2017-02-01 21:22 ` Eric Dumazet 2017-02-01 23:29 ` Cong Wang 1 sibling, 0 replies; 17+ messages in thread From: Eric Dumazet @ 2017-02-01 21:22 UTC (permalink / raw) To: Cong Wang Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Wed, 2017-02-01 at 13:16 -0800, Eric Dumazet wrote: > This would permanently leave the kernel in the netstamp_needed state. > > I would prefer the patch using a process context to perform the > cleanup ? Note there is a race window, but probably not a big deal. > > net/core/dev.c | 30 ++++++++++-------------------- > 1 file changed, 10 insertions(+), 20 deletions(-) Patch is not complete (for the HAVE_JUMP_LABEL=n case) Would you like to author/submit it ? Thanks. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-02-01 21:16 ` Eric Dumazet 2017-02-01 21:22 ` Eric Dumazet @ 2017-02-01 23:29 ` Cong Wang 2017-02-01 23:48 ` Eric Dumazet 1 sibling, 1 reply; 17+ messages in thread From: Cong Wang @ 2017-02-01 23:29 UTC (permalink / raw) To: Eric Dumazet Cc: Dmitry Vyukov, David Miller, Alexey Kuznetsov, Eric Dumazet, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller [-- Attachment #1: Type: text/plain, Size: 1978 bytes --] On Wed, Feb 1, 2017 at 1:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2017-02-01 at 12:51 -0800, Cong Wang wrote: >> On Tue, Jan 31, 2017 at 7:44 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: >> > On Mon, 2017-01-30 at 22:19 -0800, Cong Wang wrote: >> > >> >> >> >> The context is process context (TX path before hitting qdisc), and >> >> BH is not disabled, so in_interrupt() doesn't catch it. Hmm, this >> >> makes me thinking maybe we really need to disable BH in this >> >> case for nf_hook()? But it is called in RX path too, and BH is >> >> already disabled there. >> > >> > ipt_do_table() and similar netfilter entry points disable BH. >> > >> > Maybe it is done too late. >> >> I think we need a fix like the following one for minimum impact. >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index 727b6fd..eee7d63 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -1720,12 +1720,10 @@ EXPORT_SYMBOL(net_enable_timestamp); >> void net_disable_timestamp(void) >> { >> #ifdef HAVE_JUMP_LABEL >> - if (in_interrupt()) { >> - atomic_inc(&netstamp_needed_deferred); >> - return; >> - } >> -#endif >> + atomic_inc(&netstamp_needed_deferred); >> +#else >> static_key_slow_dec(&netstamp_needed); >> +#endif >> } >> EXPORT_SYMBOL(net_disable_timestamp); > > This would permanently leave the kernel in the netstamp_needed state. > > I would prefer the patch using a process context to perform the > cleanup ? Note there is a race window, but probably not a big deal. Not sure if it is better. The difference is caught up in net_enable_timestamp(), which is called setsockopt() path and sk_clone() path, so we could be in netstamp_needed state for a long time too until user-space exercises these paths. I am feeling we probably need to get rid of netstamp_needed_deferred, and simply defer the whole static_key_slow_dec(), like the attached patch (compile only). What do you think? [-- Attachment #2: net-timestamp.diff --] [-- Type: text/plain, Size: 1483 bytes --] diff --git a/net/core/dev.c b/net/core/dev.c index 727b6fd..0ef1734 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1694,38 +1694,28 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue); #endif static struct static_key netstamp_needed __read_mostly; -#ifdef HAVE_JUMP_LABEL -/* We are not allowed to call static_key_slow_dec() from irq context - * If net_disable_timestamp() is called from irq context, defer the - * static_key_slow_dec() calls. - */ -static atomic_t netstamp_needed_deferred; -#endif -void net_enable_timestamp(void) +static void netstamp_clear(struct work_struct *work) { -#ifdef HAVE_JUMP_LABEL - int deferred = atomic_xchg(&netstamp_needed_deferred, 0); + static_key_slow_dec(&netstamp_needed); +} - if (deferred) { - while (--deferred) - static_key_slow_dec(&netstamp_needed); - return; - } -#endif +static DECLARE_WORK(netstamp_work, netstamp_clear); + +void net_enable_timestamp(void) +{ + flush_work(&netstamp_work); static_key_slow_inc(&netstamp_needed); } EXPORT_SYMBOL(net_enable_timestamp); void net_disable_timestamp(void) { -#ifdef HAVE_JUMP_LABEL - if (in_interrupt()) { - atomic_inc(&netstamp_needed_deferred); - return; - } -#endif - static_key_slow_dec(&netstamp_needed); + /* We are not allowed to call static_key_slow_dec() from irq context + * If net_disable_timestamp() is called from irq context, defer the + * static_key_slow_dec() calls. + */ + schedule_work(&netstamp_work); } EXPORT_SYMBOL(net_disable_timestamp); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-02-01 23:29 ` Cong Wang @ 2017-02-01 23:48 ` Eric Dumazet 2017-02-01 23:59 ` Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-02-01 23:48 UTC (permalink / raw) To: Cong Wang Cc: Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > Not sure if it is better. The difference is caught up in net_enable_timestamp(), > which is called setsockopt() path and sk_clone() path, so we could be > in netstamp_needed state for a long time too until user-space exercises > these paths. > > I am feeling we probably need to get rid of netstamp_needed_deferred, > and simply defer the whole static_key_slow_dec(), like the attached patch > (compile only). > > What do you think? I think we need to keep the atomic. If two cpus call net_disable_timestamp() roughly at the same time, the work will be scheduled once. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-02-01 23:48 ` Eric Dumazet @ 2017-02-01 23:59 ` Eric Dumazet 2017-02-02 18:01 ` Cong Wang 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-02-01 23:59 UTC (permalink / raw) To: Eric Dumazet Cc: Cong Wang, Dmitry Vyukov, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Wed, 2017-02-01 at 15:48 -0800, Eric Dumazet wrote: > On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: > > > Not sure if it is better. The difference is caught up in net_enable_timestamp(), > > which is called setsockopt() path and sk_clone() path, so we could be > > in netstamp_needed state for a long time too until user-space exercises > > these paths. > > > > I am feeling we probably need to get rid of netstamp_needed_deferred, > > and simply defer the whole static_key_slow_dec(), like the attached patch > > (compile only). > > > > What do you think? > > I think we need to keep the atomic. > > If two cpus call net_disable_timestamp() roughly at the same time, the > work will be scheduled once. Updated patch (but not tested yet) diff --git a/net/core/dev.c b/net/core/dev.c index 7f218e095361520d11c243d650e053321ea7274f..29101c98399f40b6b8e42c31a255d8f1fb6bd7a1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1695,24 +1695,19 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue); static struct static_key netstamp_needed __read_mostly; #ifdef HAVE_JUMP_LABEL -/* We are not allowed to call static_key_slow_dec() from irq context - * If net_disable_timestamp() is called from irq context, defer the - * static_key_slow_dec() calls. - */ static atomic_t netstamp_needed_deferred; -#endif - -void net_enable_timestamp(void) +static void netstamp_clear(struct work_struct *work) { -#ifdef HAVE_JUMP_LABEL int deferred = atomic_xchg(&netstamp_needed_deferred, 0); - if (deferred) { - while (--deferred) - static_key_slow_dec(&netstamp_needed); - return; - } + while (deferred--) + static_key_slow_dec(&netstamp_needed); +} +static DECLARE_WORK(netstamp_work, netstamp_clear); #endif + +void net_enable_timestamp(void) +{ static_key_slow_inc(&netstamp_needed); } EXPORT_SYMBOL(net_enable_timestamp); @@ -1720,12 +1715,12 @@ EXPORT_SYMBOL(net_enable_timestamp); void net_disable_timestamp(void) { #ifdef HAVE_JUMP_LABEL - if (in_interrupt()) { - atomic_inc(&netstamp_needed_deferred); - return; - } -#endif + /* net_disable_timestamp() can be called from non process context */ + atomic_inc(&netstamp_needed_deferred); + schedule_work(&netstamp_work); +#else static_key_slow_dec(&netstamp_needed); +#endif } EXPORT_SYMBOL(net_disable_timestamp); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: net: suspicious RCU usage in nf_hook 2017-02-01 23:59 ` Eric Dumazet @ 2017-02-02 18:01 ` Cong Wang 2017-02-02 18:31 ` [PATCH net] net: use a work queue to defer net_disable_timestamp() work Eric Dumazet 0 siblings, 1 reply; 17+ messages in thread From: Cong Wang @ 2017-02-02 18:01 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, Dmitry Vyukov, David Miller, Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy, netdev, LKML, Pablo Neira Ayuso, netfilter-devel, syzkaller On Wed, Feb 1, 2017 at 3:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Wed, 2017-02-01 at 15:48 -0800, Eric Dumazet wrote: >> On Wed, Feb 1, 2017 at 3:29 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote: >> >> > Not sure if it is better. The difference is caught up in net_enable_timestamp(), >> > which is called setsockopt() path and sk_clone() path, so we could be >> > in netstamp_needed state for a long time too until user-space exercises >> > these paths. >> > >> > I am feeling we probably need to get rid of netstamp_needed_deferred, >> > and simply defer the whole static_key_slow_dec(), like the attached patch >> > (compile only). >> > >> > What do you think? >> >> I think we need to keep the atomic. >> >> If two cpus call net_disable_timestamp() roughly at the same time, the >> work will be scheduled once. Good point! Yeah, the same work will not be schedule twice. > > Updated patch (but not tested yet) I can't think out a better way to fix this. I expect jump_label to provide an API for this, but it doesn't, static_key_slow_dec_deferred() is just for batching. Probably we should introduce one to avoid these ugly #ifdef HAVE_JUMP_LABEL here, but that is a -next material. So, please feel free to send it formally. Thanks. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH net] net: use a work queue to defer net_disable_timestamp() work 2017-02-02 18:01 ` Cong Wang @ 2017-02-02 18:31 ` Eric Dumazet 2017-02-03 21:18 ` David Miller 0 siblings, 1 reply; 17+ messages in thread From: Eric Dumazet @ 2017-02-02 18:31 UTC (permalink / raw) To: Cong Wang; +Cc: Eric Dumazet, Dmitry Vyukov, David Miller, netdev From: Eric Dumazet <edumazet@google.com> Dmitry reported a warning [1] showing that we were calling net_disable_timestamp() -> static_key_slow_dec() from a non process context. Grabbing a mutex while holding a spinlock or rcu_read_lock() is not allowed. As Cong suggested, we now use a work queue. It is possible netstamp_clear() exits while netstamp_needed_deferred is not zero, but it is probably not worth trying to do better than that. netstamp_needed_deferred atomic tracks the exact number of deferred decrements. [1] [ INFO: suspicious RCU usage. ] 4.10.0-rc5+ #192 Not tainted ------------------------------- ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side critical section! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 0 2 locks held by syz-executor14/23111: #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock include/net/sock.h:1454 [inline] #0: (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919 #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook include/linux/netfilter.h:201 [inline] #1: (rcu_read_lock){......}, at: [<ffffffff83ae2678>] __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160 stack backtrace: CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452 rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline] ___might_sleep+0x560/0x650 kernel/sched/core.c:7748 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sock_wfree+0xae/0x120 net/core/sock.c:1645 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put include/net/inet_frag.h:133 [inline] nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook include/linux/netfilter.h:212 [inline] __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x600 net/socket.c:848 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 vfs_writev+0x87/0xc0 fs/read_write.c:911 do_writev+0x110/0x2c0 fs/read_write.c:944 SYSC_writev fs/read_write.c:1017 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1014 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x445559 RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559 RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005 RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000 R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:752 in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14 INFO: lockdep is turned off. CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51 ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780 __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739 mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752 atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060 __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149 static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174 net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728 sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403 __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441 sk_destruct+0x47/0x80 net/core/sock.c:1460 __sk_free+0x57/0x230 net/core/sock.c:1468 sock_wfree+0xae/0x120 net/core/sock.c:1645 skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655 skb_release_all+0x15/0x60 net/core/skbuff.c:668 __kfree_skb+0x15/0x20 net/core/skbuff.c:684 kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705 inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304 inet_frag_put include/net/inet_frag.h:133 [inline] nf_ct_frag6_gather+0x1106/0x3840 net/ipv6/netfilter/nf_conntrack_reasm.c:617 ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68 nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline] nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310 nf_hook include/linux/netfilter.h:212 [inline] __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160 ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170 ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722 ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742 rawv6_push_pending_frames net/ipv6/raw.c:613 [inline] rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 sock_write_iter+0x326/0x600 net/socket.c:848 do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695 do_readv_writev+0x42c/0x9b0 fs/read_write.c:872 vfs_writev+0x87/0xc0 fs/read_write.c:911 do_writev+0x110/0x2c0 fs/read_write.c:944 SYSC_writev fs/read_write.c:1017 [inline] SyS_writev+0x27/0x30 fs/read_write.c:1014 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x445559 Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context") Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> --- net/core/dev.c | 31 +++++++++++++------------------ 1 file changed, 13 insertions(+), 18 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 7f218e095361520d11c243d650e053321ea7274f..29101c98399f40b6b8e42c31a255d8f1fb6bd7a1 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -1695,24 +1695,19 @@ EXPORT_SYMBOL_GPL(net_dec_egress_queue); static struct static_key netstamp_needed __read_mostly; #ifdef HAVE_JUMP_LABEL -/* We are not allowed to call static_key_slow_dec() from irq context - * If net_disable_timestamp() is called from irq context, defer the - * static_key_slow_dec() calls. - */ static atomic_t netstamp_needed_deferred; -#endif - -void net_enable_timestamp(void) +static void netstamp_clear(struct work_struct *work) { -#ifdef HAVE_JUMP_LABEL int deferred = atomic_xchg(&netstamp_needed_deferred, 0); - if (deferred) { - while (--deferred) - static_key_slow_dec(&netstamp_needed); - return; - } + while (deferred--) + static_key_slow_dec(&netstamp_needed); +} +static DECLARE_WORK(netstamp_work, netstamp_clear); #endif + +void net_enable_timestamp(void) +{ static_key_slow_inc(&netstamp_needed); } EXPORT_SYMBOL(net_enable_timestamp); @@ -1720,12 +1715,12 @@ EXPORT_SYMBOL(net_enable_timestamp); void net_disable_timestamp(void) { #ifdef HAVE_JUMP_LABEL - if (in_interrupt()) { - atomic_inc(&netstamp_needed_deferred); - return; - } -#endif + /* net_disable_timestamp() can be called from non process context */ + atomic_inc(&netstamp_needed_deferred); + schedule_work(&netstamp_work); +#else static_key_slow_dec(&netstamp_needed); +#endif } EXPORT_SYMBOL(net_disable_timestamp); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH net] net: use a work queue to defer net_disable_timestamp() work 2017-02-02 18:31 ` [PATCH net] net: use a work queue to defer net_disable_timestamp() work Eric Dumazet @ 2017-02-03 21:18 ` David Miller 0 siblings, 0 replies; 17+ messages in thread From: David Miller @ 2017-02-03 21:18 UTC (permalink / raw) To: eric.dumazet; +Cc: xiyou.wangcong, edumazet, dvyukov, netdev From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 02 Feb 2017 10:31:35 -0800 > From: Eric Dumazet <edumazet@google.com> > > Dmitry reported a warning [1] showing that we were calling > net_disable_timestamp() -> static_key_slow_dec() from a non > process context. > > Grabbing a mutex while holding a spinlock or rcu_read_lock() > is not allowed. > > As Cong suggested, we now use a work queue. > > It is possible netstamp_clear() exits while netstamp_needed_deferred > is not zero, but it is probably not worth trying to do better than that. > > netstamp_needed_deferred atomic tracks the exact number of deferred > decrements. ... > Fixes: b90e5794c5bd ("net: dont call jump_label_dec from irq context") > Suggested-by: Cong Wang <xiyou.wangcong@gmail.com> > Reported-by: Dmitry Vyukov <dvyukov@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> Applied and queued up for -stable, thanks Eric. ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2017-02-03 21:18 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-01-27 21:15 net: suspicious RCU usage in nf_hook Dmitry Vyukov 2017-01-27 23:22 ` Cong Wang 2017-01-27 23:30 ` Cong Wang 2017-01-27 23:35 ` Eric Dumazet 2017-01-28 1:00 ` Cong Wang 2017-01-28 1:31 ` Eric Dumazet 2017-01-31 6:19 ` Cong Wang 2017-01-31 15:44 ` Eric Dumazet 2017-02-01 20:51 ` Cong Wang 2017-02-01 21:16 ` Eric Dumazet 2017-02-01 21:22 ` Eric Dumazet 2017-02-01 23:29 ` Cong Wang 2017-02-01 23:48 ` Eric Dumazet 2017-02-01 23:59 ` Eric Dumazet 2017-02-02 18:01 ` Cong Wang 2017-02-02 18:31 ` [PATCH net] net: use a work queue to defer net_disable_timestamp() work Eric Dumazet 2017-02-03 21:18 ` David Miller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.