* possible deadlock in sk_clone_lock
@ 2021-02-26 21:08 ` syzbot
  0 siblings, 0 replies; 25+ messages in thread
From: syzbot @ 2021-02-26 21:08 UTC (permalink / raw)
  To: akpm, linux-kernel, linux-mm, mike.kravetz, syzkaller-bugs

Hello,

syzbot found the following issue on:

HEAD commit:    577c2835 Add linux-next specific files for 20210224
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=137cef82d00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e9bb3d369b3bf49
dashboard link: https://syzkaller.appspot.com/bug?extid=506c8a2a115201881d45

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com

=====================================================
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
5.11.0-next-20210224-syzkaller #0 Not tainted
-----------------------------------------------------
syz-executor.3/15411 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
ffffffff8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
ffffffff8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: __free_huge_page+0x4cd/0xc10 mm/hugetlb.c:1390

and this task is already holding:
ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: __tcp_close+0x6d9/0x1170 net/ipv4/tcp.c:2788
which would create a new lock dependency:
 (slock-AF_INET){+.-.}-{2:2} -> (hugetlb_lock){+.+.}-{2:2}

but this new dependency connects a SOFTIRQ-irq-safe lock:
 (slock-AF_INET){+.-.}-{2:2}

... which became SOFTIRQ-irq-safe at:
  lock_acquire kernel/locking/lockdep.c:5510 [inline]
  lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:354 [inline]
  sk_clone_lock+0x296/0x1070 net/core/sock.c:1913
  inet_csk_clone_lock+0x21/0x4c0 net/ipv4/inet_connection_sock.c:830
  tcp_create_openreq_child+0x30/0x16d0 net/ipv4/tcp_minisocks.c:460
  tcp_v4_syn_recv_sock+0x10c/0x1460 net/ipv4/tcp_ipv4.c:1526
  tcp_check_req+0x616/0x1860 net/ipv4/tcp_minisocks.c:772
  tcp_v4_rcv+0x221a/0x3780 net/ipv4/tcp_ipv4.c:2001
  ip_protocol_deliver_rcu+0x5c/0x8a0 net/ipv4/ip_input.c:204
  ip_local_deliver_finish+0x20a/0x370 net/ipv4/ip_input.c:231
  NF_HOOK include/linux/netfilter.h:301 [inline]
  NF_HOOK include/linux/netfilter.h:295 [inline]
  ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:252
  dst_input include/net/dst.h:458 [inline]
  ip_sublist_rcv_finish+0x9a/0x2c0 net/ipv4/ip_input.c:551
  ip_list_rcv_finish.constprop.0+0x514/0x6e0 net/ipv4/ip_input.c:601
  ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
  ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
  __netif_receive_skb_list_ptype net/core/dev.c:5408 [inline]
  __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5456
  __netif_receive_skb_list net/core/dev.c:5508 [inline]
  netif_receive_skb_list_internal+0x777/0xd70 net/core/dev.c:5618
  gro_normal_list net/core/dev.c:5772 [inline]
  gro_normal_list net/core/dev.c:5768 [inline]
  napi_complete_done+0x1f1/0x820 net/core/dev.c:6474
  virtqueue_napi_complete+0x2c/0xc0 drivers/net/virtio_net.c:334
  virtnet_poll+0xae2/0xd90 drivers/net/virtio_net.c:1455
  __napi_poll+0xaf/0x440 net/core/dev.c:6892
  napi_poll net/core/dev.c:6959 [inline]
  net_rx_action+0x801/0xb40 net/core/dev.c:7036
  __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
  invoke_softirq kernel/softirq.c:221 [inline]
  __irq_exit_rcu kernel/softirq.c:422 [inline]
  irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
  common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
  asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
  tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
  tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
  tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
  tomoyo_path_permission security/tomoyo/file.c:587 [inline]
  tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
  tomoyo_path_perm+0x39e/0x400 security/tomoyo/file.c:838
  tomoyo_path_symlink+0x94/0xe0 security/tomoyo/tomoyo.c:200
  security_path_symlink+0xdf/0x150 security/security.c:1119
  do_symlinkat+0x123/0x300 fs/namei.c:4201
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xae

to a SOFTIRQ-irq-unsafe lock:
 (hugetlb_lock){+.+.}-{2:2}

... which became SOFTIRQ-irq-unsafe at:
...
  lock_acquire kernel/locking/lockdep.c:5510 [inline]
  lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
  __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
  _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
  spin_lock include/linux/spinlock.h:354 [inline]
  hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
  proc_sys_call_handler+0x336/0x610 fs/proc/proc_sysctl.c:591
  call_write_iter include/linux/fs.h:1977 [inline]
  new_sync_write+0x426/0x650 fs/read_write.c:519
  vfs_write+0x796/0xa30 fs/read_write.c:606
  ksys_write+0x12d/0x250 fs/read_write.c:659
  do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
  entry_SYSCALL_64_after_hwframe+0x44/0xae

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(hugetlb_lock);
                               local_irq_disable();
                               lock(slock-AF_INET);
                               lock(hugetlb_lock);
  <Interrupt>
    lock(slock-AF_INET);

 *** DEADLOCK ***

3 locks held by syz-executor.3/15411:
 #0: ffff88802a56a190 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:775 [inline]
 #0: ffff88802a56a190 (&sb->s_type->i_mutex_key#13){+.+.}-{3:3}, at: __sock_release+0x86/0x280 net/socket.c:598
 #1: ffff88802391c7a0 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1600 [inline]
 #1: ffff88802391c7a0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1e/0xc0 net/ipv4/tcp.c:2866
 #2: ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
 #2: ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: __tcp_close+0x6d9/0x1170 net/ipv4/tcp.c:2788

the dependencies between SOFTIRQ-irq-safe lock and the holding lock:
-> (slock-AF_INET){+.-.}-{2:2} {
   HARDIRQ-ON-W at:
                    lock_acquire kernel/locking/lockdep.c:5510 [inline]
                    lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
                    __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
                    _raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:175
                    spin_lock_bh include/linux/spinlock.h:359 [inline]
                    lock_sock_nested+0x40/0x120 net/core/sock.c:3063
                    lock_sock include/net/sock.h:1600 [inline]
                    inet_autobind+0x1a/0x190 net/ipv4/af_inet.c:180
                    inet_dgram_connect+0x1f5/0x2d0 net/ipv4/af_inet.c:578
                    __sys_connect_file+0x155/0x1a0 net/socket.c:1837
                    __sys_connect+0x161/0x190 net/socket.c:1854
                    __do_sys_connect net/socket.c:1864 [inline]
                    __se_sys_connect net/socket.c:1861 [inline]
                    __x64_sys_connect+0x6f/0xb0 net/socket.c:1861
                    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
                    entry_SYSCALL_64_after_hwframe+0x44/0xae
   IN-SOFTIRQ-W at:
                    lock_acquire kernel/locking/lockdep.c:5510 [inline]
                    lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
                    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
                    spin_lock include/linux/spinlock.h:354 [inline]
                    sk_clone_lock+0x296/0x1070 net/core/sock.c:1913
                    inet_csk_clone_lock+0x21/0x4c0 net/ipv4/inet_connection_sock.c:830
                    tcp_create_openreq_child+0x30/0x16d0 net/ipv4/tcp_minisocks.c:460
                    tcp_v4_syn_recv_sock+0x10c/0x1460 net/ipv4/tcp_ipv4.c:1526
                    tcp_check_req+0x616/0x1860 net/ipv4/tcp_minisocks.c:772
                    tcp_v4_rcv+0x221a/0x3780 net/ipv4/tcp_ipv4.c:2001
                    ip_protocol_deliver_rcu+0x5c/0x8a0 net/ipv4/ip_input.c:204
                    ip_local_deliver_finish+0x20a/0x370 net/ipv4/ip_input.c:231
                    NF_HOOK include/linux/netfilter.h:301 [inline]
                    NF_HOOK include/linux/netfilter.h:295 [inline]
                    ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:252
                    dst_input include/net/dst.h:458 [inline]
                    ip_sublist_rcv_finish+0x9a/0x2c0 net/ipv4/ip_input.c:551
                    ip_list_rcv_finish.constprop.0+0x514/0x6e0 net/ipv4/ip_input.c:601
                    ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
                    ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
                    __netif_receive_skb_list_ptype net/core/dev.c:5408 [inline]
                    __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5456
                    __netif_receive_skb_list net/core/dev.c:5508 [inline]
                    netif_receive_skb_list_internal+0x777/0xd70 net/core/dev.c:5618
                    gro_normal_list net/core/dev.c:5772 [inline]
                    gro_normal_list net/core/dev.c:5768 [inline]
                    napi_complete_done+0x1f1/0x820 net/core/dev.c:6474
                    virtqueue_napi_complete+0x2c/0xc0 drivers/net/virtio_net.c:334
                    virtnet_poll+0xae2/0xd90 drivers/net/virtio_net.c:1455
                    __napi_poll+0xaf/0x440 net/core/dev.c:6892
                    napi_poll net/core/dev.c:6959 [inline]
                    net_rx_action+0x801/0xb40 net/core/dev.c:7036
                    __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
                    invoke_softirq kernel/softirq.c:221 [inline]
                    __irq_exit_rcu kernel/softirq.c:422 [inline]
                    irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
                    common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
                    asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
                    tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
                    tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
                    tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
                    tomoyo_path_permission security/tomoyo/file.c:587 [inline]
                    tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
                    tomoyo_path_perm+0x39e/0x400 security/tomoyo/file.c:838
                    tomoyo_path_symlink+0x94/0xe0 security/tomoyo/tomoyo.c:200
                    security_path_symlink+0xdf/0x150 security/security.c:1119
                    do_symlinkat+0x123/0x300 fs/namei.c:4201
                    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
                    entry_SYSCALL_64_after_hwframe+0x44/0xae
   INITIAL USE at:
                   lock_acquire kernel/locking/lockdep.c:5510 [inline]
                   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
                   __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
                   _raw_spin_lock_bh+0x2f/0x40 kernel/locking/spinlock.c:175
                   spin_lock_bh include/linux/spinlock.h:359 [inline]
                   lock_sock_nested+0x40/0x120 net/core/sock.c:3063
                   lock_sock include/net/sock.h:1600 [inline]
                   inet_autobind+0x1a/0x190 net/ipv4/af_inet.c:180
                   inet_dgram_connect+0x1f5/0x2d0 net/ipv4/af_inet.c:578
                   __sys_connect_file+0x155/0x1a0 net/socket.c:1837
                   __sys_connect+0x161/0x190 net/socket.c:1854
                   __do_sys_connect net/socket.c:1864 [inline]
                   __se_sys_connect net/socket.c:1861 [inline]
                   __x64_sys_connect+0x6f/0xb0 net/socket.c:1861
                   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
                   entry_SYSCALL_64_after_hwframe+0x44/0xae
 }
 ... key      at: [<ffffffff901df860>] af_family_slock_keys+0x20/0x300
 ... acquired at:
   lock_acquire kernel/locking/lockdep.c:5510 [inline]
   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
   spin_lock include/linux/spinlock.h:354 [inline]
   __free_huge_page+0x4cd/0xc10 mm/hugetlb.c:1390
   free_huge_page+0x31/0xb0 mm/hugetlb.c:1461
   __put_page+0xf7/0x3e0 mm/swap.c:126
   put_page include/linux/mm.h:1219 [inline]
   __skb_frag_unref include/linux/skbuff.h:3085 [inline]
   skb_release_data+0x465/0x750 net/core/skbuff.c:666
   skb_release_all net/core/skbuff.c:725 [inline]
   __kfree_skb+0x46/0x60 net/core/skbuff.c:739
   sk_wmem_free_skb include/net/sock.h:1558 [inline]
   tcp_rtx_queue_purge net/ipv4/tcp.c:2895 [inline]
   tcp_write_queue_purge+0x44c/0x1250 net/ipv4/tcp.c:2908
   tcp_v4_destroy_sock+0xf2/0x780 net/ipv4/tcp_ipv4.c:2219
   inet_csk_destroy_sock+0x196/0x490 net/ipv4/inet_connection_sock.c:884
   __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2855
   tcp_close+0x29/0xc0 net/ipv4/tcp.c:2867
   inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
   __sock_release+0xcd/0x280 net/socket.c:599
   sock_close+0x18/0x20 net/socket.c:1258
   __fput+0x288/0x920 fs/file_table.c:280
   task_work_run+0xdd/0x1a0 kernel/task_work.c:140
   get_signal+0x1c89/0x2100 kernel/signal.c:2554
   arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
   handle_signal_work kernel/entry/common.c:147 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
   exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:208
   __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
   syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:301
   entry_SYSCALL_64_after_hwframe+0x44/0xae


the dependencies between the lock to be acquired
 and SOFTIRQ-irq-unsafe lock:
-> (hugetlb_lock){+.+.}-{2:2} {
   HARDIRQ-ON-W at:
                    lock_acquire kernel/locking/lockdep.c:5510 [inline]
                    lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
                    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
                    spin_lock include/linux/spinlock.h:354 [inline]
                    hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
                    proc_sys_call_handler+0x336/0x610 fs/proc/proc_sysctl.c:591
                    call_write_iter include/linux/fs.h:1977 [inline]
                    new_sync_write+0x426/0x650 fs/read_write.c:519
                    vfs_write+0x796/0xa30 fs/read_write.c:606
                    ksys_write+0x12d/0x250 fs/read_write.c:659
                    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
                    entry_SYSCALL_64_after_hwframe+0x44/0xae
   SOFTIRQ-ON-W at:
                    lock_acquire kernel/locking/lockdep.c:5510 [inline]
                    lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
                    __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                    _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
                    spin_lock include/linux/spinlock.h:354 [inline]
                    hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
                    proc_sys_call_handler+0x336/0x610 fs/proc/proc_sysctl.c:591
                    call_write_iter include/linux/fs.h:1977 [inline]
                    new_sync_write+0x426/0x650 fs/read_write.c:519
                    vfs_write+0x796/0xa30 fs/read_write.c:606
                    ksys_write+0x12d/0x250 fs/read_write.c:659
                    do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
                    entry_SYSCALL_64_after_hwframe+0x44/0xae
   INITIAL USE at:
                   lock_acquire kernel/locking/lockdep.c:5510 [inline]
                   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
                   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
                   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
                   spin_lock include/linux/spinlock.h:354 [inline]
                   hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
                   proc_sys_call_handler+0x336/0x610 fs/proc/proc_sysctl.c:591
                   call_write_iter include/linux/fs.h:1977 [inline]
                   new_sync_write+0x426/0x650 fs/read_write.c:519
                   vfs_write+0x796/0xa30 fs/read_write.c:606
                   ksys_write+0x12d/0x250 fs/read_write.c:659
                   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
                   entry_SYSCALL_64_after_hwframe+0x44/0xae
 }
 ... key      at: [<ffffffff8c0a0e18>] hugetlb_lock+0x18/0x4240
 ... acquired at:
   lock_acquire kernel/locking/lockdep.c:5510 [inline]
   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
   spin_lock include/linux/spinlock.h:354 [inline]
   __free_huge_page+0x4cd/0xc10 mm/hugetlb.c:1390
   free_huge_page+0x31/0xb0 mm/hugetlb.c:1461
   __put_page+0xf7/0x3e0 mm/swap.c:126
   put_page include/linux/mm.h:1219 [inline]
   __skb_frag_unref include/linux/skbuff.h:3085 [inline]
   skb_release_data+0x465/0x750 net/core/skbuff.c:666
   skb_release_all net/core/skbuff.c:725 [inline]
   __kfree_skb+0x46/0x60 net/core/skbuff.c:739
   sk_wmem_free_skb include/net/sock.h:1558 [inline]
   tcp_rtx_queue_purge net/ipv4/tcp.c:2895 [inline]
   tcp_write_queue_purge+0x44c/0x1250 net/ipv4/tcp.c:2908
   tcp_v4_destroy_sock+0xf2/0x780 net/ipv4/tcp_ipv4.c:2219
   inet_csk_destroy_sock+0x196/0x490 net/ipv4/inet_connection_sock.c:884
   __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2855
   tcp_close+0x29/0xc0 net/ipv4/tcp.c:2867
   inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
   __sock_release+0xcd/0x280 net/socket.c:599
   sock_close+0x18/0x20 net/socket.c:1258
   __fput+0x288/0x920 fs/file_table.c:280
   task_work_run+0xdd/0x1a0 kernel/task_work.c:140
   get_signal+0x1c89/0x2100 kernel/signal.c:2554
   arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
   handle_signal_work kernel/entry/common.c:147 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
   exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:208
   __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
   syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:301
   entry_SYSCALL_64_after_hwframe+0x44/0xae


stack backtrace:
CPU: 0 PID: 15411 Comm: syz-executor.3 Not tainted 5.11.0-next-20210224-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xfa/0x151 lib/dump_stack.c:120
 print_bad_irq_dependency kernel/locking/lockdep.c:2460 [inline]
 check_irq_usage.cold+0x50d/0x744 kernel/locking/lockdep.c:2689
 check_prev_add kernel/locking/lockdep.c:2940 [inline]
 check_prevs_add kernel/locking/lockdep.c:3059 [inline]
 validate_chain kernel/locking/lockdep.c:3674 [inline]
 __lock_acquire+0x2b2c/0x54c0 kernel/locking/lockdep.c:4900
 lock_acquire kernel/locking/lockdep.c:5510 [inline]
 lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
 __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
 spin_lock include/linux/spinlock.h:354 [inline]
 __free_huge_page+0x4cd/0xc10 mm/hugetlb.c:1390
 free_huge_page+0x31/0xb0 mm/hugetlb.c:1461
 __put_page+0xf7/0x3e0 mm/swap.c:126
 put_page include/linux/mm.h:1219 [inline]
 __skb_frag_unref include/linux/skbuff.h:3085 [inline]
 skb_release_data+0x465/0x750 net/core/skbuff.c:666
 skb_release_all net/core/skbuff.c:725 [inline]
 __kfree_skb+0x46/0x60 net/core/skbuff.c:739
 sk_wmem_free_skb include/net/sock.h:1558 [inline]
 tcp_rtx_queue_purge net/ipv4/tcp.c:2895 [inline]
 tcp_write_queue_purge+0x44c/0x1250 net/ipv4/tcp.c:2908
 tcp_v4_destroy_sock+0xf2/0x780 net/ipv4/tcp_ipv4.c:2219
 inet_csk_destroy_sock+0x196/0x490 net/ipv4/inet_connection_sock.c:884
 __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2855
 tcp_close+0x29/0xc0 net/ipv4/tcp.c:2867
 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
 __sock_release+0xcd/0x280 net/socket.c:599
 sock_close+0x18/0x20 net/socket.c:1258
 __fput+0x288/0x920 fs/file_table.c:280
 task_work_run+0xdd/0x1a0 kernel/task_work.c:140
 get_signal+0x1c89/0x2100 kernel/signal.c:2554
 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811
 handle_signal_work kernel/entry/common.c:147 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:171 [inline]
 exit_to_user_mode_prepare+0x148/0x250 kernel/entry/common.c:208
 __syscall_exit_to_user_mode_work kernel/entry/common.c:290 [inline]
 syscall_exit_to_user_mode+0x19/0x50 kernel/entry/common.c:301
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x465ef9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fbfc748c188 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: 0000000000005cd9 RBX: 000000000056bf60 RCX: 0000000000465ef9
RDX: ffffffffffffffd0 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00000000004bcd1c R08: 0000000000000000 R09: ffffffffffffff36
R10: 000000000401c005 R11: 0000000000000246 R12: 000000000056bf60
R13: 00007ffde63b79bf R14: 00007fbfc748c300 R15: 0000000000022000


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

* Re: possible deadlock in sk_clone_lock
  2021-02-26 21:08 ` syzbot
@ 2021-02-26 22:44   ` Shakeel Butt
  -1 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2021-02-26 22:44 UTC (permalink / raw)
  To: syzbot
  Cc: Andrew Morton, LKML, Linux MM, Mike Kravetz, syzkaller-bugs,
	Eric Dumazet, Mina Almasry

On Fri, Feb 26, 2021 at 2:09 PM syzbot
<syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    577c2835 Add linux-next specific files for 20210224
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=137cef82d00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e9bb3d369b3bf49
> dashboard link: https://syzkaller.appspot.com/bug?extid=506c8a2a115201881d45
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com
>
> =====================================================
> WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> 5.11.0-next-20210224-syzkaller #0 Not tainted
> -----------------------------------------------------
> syz-executor.3/15411 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
> ffffffff8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
> ffffffff8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: __free_huge_page+0x4cd/0xc10 mm/hugetlb.c:1390
>
> and this task is already holding:
> ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
> ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: __tcp_close+0x6d9/0x1170 net/ipv4/tcp.c:2788
> which would create a new lock dependency:
>  (slock-AF_INET){+.-.}-{2:2} -> (hugetlb_lock){+.+.}-{2:2}
>
> but this new dependency connects a SOFTIRQ-irq-safe lock:
>  (slock-AF_INET){+.-.}-{2:2}
>
> ... which became SOFTIRQ-irq-safe at:
>   lock_acquire kernel/locking/lockdep.c:5510 [inline]
>   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
>   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
>   spin_lock include/linux/spinlock.h:354 [inline]
>   sk_clone_lock+0x296/0x1070 net/core/sock.c:1913
>   inet_csk_clone_lock+0x21/0x4c0 net/ipv4/inet_connection_sock.c:830
>   tcp_create_openreq_child+0x30/0x16d0 net/ipv4/tcp_minisocks.c:460
>   tcp_v4_syn_recv_sock+0x10c/0x1460 net/ipv4/tcp_ipv4.c:1526
>   tcp_check_req+0x616/0x1860 net/ipv4/tcp_minisocks.c:772
>   tcp_v4_rcv+0x221a/0x3780 net/ipv4/tcp_ipv4.c:2001
>   ip_protocol_deliver_rcu+0x5c/0x8a0 net/ipv4/ip_input.c:204
>   ip_local_deliver_finish+0x20a/0x370 net/ipv4/ip_input.c:231
>   NF_HOOK include/linux/netfilter.h:301 [inline]
>   NF_HOOK include/linux/netfilter.h:295 [inline]
>   ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:252
>   dst_input include/net/dst.h:458 [inline]
>   ip_sublist_rcv_finish+0x9a/0x2c0 net/ipv4/ip_input.c:551
>   ip_list_rcv_finish.constprop.0+0x514/0x6e0 net/ipv4/ip_input.c:601
>   ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
>   ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
>   __netif_receive_skb_list_ptype net/core/dev.c:5408 [inline]
>   __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5456
>   __netif_receive_skb_list net/core/dev.c:5508 [inline]
>   netif_receive_skb_list_internal+0x777/0xd70 net/core/dev.c:5618
>   gro_normal_list net/core/dev.c:5772 [inline]
>   gro_normal_list net/core/dev.c:5768 [inline]
>   napi_complete_done+0x1f1/0x820 net/core/dev.c:6474
>   virtqueue_napi_complete+0x2c/0xc0 drivers/net/virtio_net.c:334
>   virtnet_poll+0xae2/0xd90 drivers/net/virtio_net.c:1455
>   __napi_poll+0xaf/0x440 net/core/dev.c:6892
>   napi_poll net/core/dev.c:6959 [inline]
>   net_rx_action+0x801/0xb40 net/core/dev.c:7036
>   __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
>   invoke_softirq kernel/softirq.c:221 [inline]
>   __irq_exit_rcu kernel/softirq.c:422 [inline]
>   irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
>   common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
>   asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
>   tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
>   tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
>   tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
>   tomoyo_path_permission security/tomoyo/file.c:587 [inline]
>   tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
>   tomoyo_path_perm+0x39e/0x400 security/tomoyo/file.c:838
>   tomoyo_path_symlink+0x94/0xe0 security/tomoyo/tomoyo.c:200
>   security_path_symlink+0xdf/0x150 security/security.c:1119
>   do_symlinkat+0x123/0x300 fs/namei.c:4201
>   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> to a SOFTIRQ-irq-unsafe lock:
>  (hugetlb_lock){+.+.}-{2:2}
>
> ... which became SOFTIRQ-irq-unsafe at:
> ...
>   lock_acquire kernel/locking/lockdep.c:5510 [inline]
>   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
>   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
>   spin_lock include/linux/spinlock.h:354 [inline]
>   hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
>   proc_sys_call_handler+0x336/0x610 fs/proc/proc_sysctl.c:591
>   call_write_iter include/linux/fs.h:1977 [inline]
>   new_sync_write+0x426/0x650 fs/read_write.c:519
>   vfs_write+0x796/0xa30 fs/read_write.c:606
>   ksys_write+0x12d/0x250 fs/read_write.c:659
>   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
>
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(hugetlb_lock);
>                                local_irq_disable();
>                                lock(slock-AF_INET);
>                                lock(hugetlb_lock);
>   <Interrupt>
>     lock(slock-AF_INET);
>
>  *** DEADLOCK ***

This has also been reproduced on the 4.19 stable kernel [1], and that
report includes a reproducer.

It seems to be triggered by sendmsg(MSG_ZEROCOPY) from a buffer backed
by hugetlb pages. I wonder if we just need to make hugetlb_lock
softirq-safe.

[1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
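
For context, the trigger described above corresponds roughly to the
userspace pattern below. This is an illustrative sketch only, not the
syzkaller reproducer: the loopback socket setup, the sizes, and the
assumption that a 2 MiB huge page is reserved are all illustrative.

/*
 * Illustrative trigger only -- NOT the syzkaller reproducer.  Send part
 * of a MAP_HUGETLB buffer with MSG_ZEROCOPY so the huge page ends up
 * referenced by skb frags, then close the socket: the write queue is
 * purged (and the huge page freed) under the socket lock with BHs
 * disabled.  Assumes a reserved 2 MiB huge page; error checks omitted.
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

int main(void)
{
	size_t len = 2UL << 20;			/* one 2 MiB huge page */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	socklen_t alen = sizeof(addr);
	int one = 1;
	int lfd = socket(AF_INET, SOCK_STREAM, 0);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	bind(lfd, (struct sockaddr *)&addr, sizeof(addr));  /* any port */
	listen(lfd, 1);
	getsockname(lfd, (struct sockaddr *)&addr, &alen);
	connect(fd, (struct sockaddr *)&addr, sizeof(addr));

	setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
	memset(buf, 0, len);
	/* Send only 32 KiB so the call does not block on buffer space. */
	send(fd, buf, 32768, MSG_ZEROCOPY);

	close(fd);		/* teardown frees the referenced frags */
	close(lfd);
	return 0;
}

And "making hugetlb_lock softirq-safe" would, in the usual pattern,
mean disabling bottom halves around every process-context acquisition.
A minimal generic sketch of that pattern (example_lock stands in for
hugetlb_lock; this is not the change that was eventually merged):

/*
 * Generic softirq-safe spinlock usage: process context uses the _bh
 * variants, softirq context can then take the lock without deadlock.
 */
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(example_lock);

static void process_context_path(void)
{
	spin_lock_bh(&example_lock);	/* BHs off: no softirq can race */
	/* ... adjust the huge page pool under the lock ... */
	spin_unlock_bh(&example_lock);
}

static void softirq_context_path(void)
{
	/* e.g. freeing an skb frag from the network receive softirq */
	spin_lock(&example_lock);
	/* ... return the huge page to the pool ... */
	spin_unlock(&example_lock);
}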

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
@ 2021-02-26 22:44   ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2021-02-26 22:44 UTC (permalink / raw)
  To: syzbot
  Cc: Andrew Morton, LKML, Linux MM, Mike Kravetz, syzkaller-bugs,
	Eric Dumazet, Mina Almasry

On Fri, Feb 26, 2021 at 2:09 PM syzbot
<syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit:    577c2835 Add linux-next specific files for 20210224
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=137cef82d00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=e9bb3d369b3bf49
> dashboard link: https://syzkaller.appspot.com/bug?extid=506c8a2a115201881d45
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com
>
> =====================================================
> WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
> 5.11.0-next-20210224-syzkaller #0 Not tainted
> -----------------------------------------------------
> syz-executor.3/15411 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
> ffffffff8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
> ffffffff8c0a0e18 (hugetlb_lock){+.+.}-{2:2}, at: __free_huge_page+0x4cd/0xc10 mm/hugetlb.c:1390
>
> and this task is already holding:
> ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:354 [inline]
> ffff88802391c720 (slock-AF_INET){+.-.}-{2:2}, at: __tcp_close+0x6d9/0x1170 net/ipv4/tcp.c:2788
> which would create a new lock dependency:
>  (slock-AF_INET){+.-.}-{2:2} -> (hugetlb_lock){+.+.}-{2:2}
>
> but this new dependency connects a SOFTIRQ-irq-safe lock:
>  (slock-AF_INET){+.-.}-{2:2}
>
> ... which became SOFTIRQ-irq-safe at:
>   lock_acquire kernel/locking/lockdep.c:5510 [inline]
>   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
>   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
>   spin_lock include/linux/spinlock.h:354 [inline]
>   sk_clone_lock+0x296/0x1070 net/core/sock.c:1913
>   inet_csk_clone_lock+0x21/0x4c0 net/ipv4/inet_connection_sock.c:830
>   tcp_create_openreq_child+0x30/0x16d0 net/ipv4/tcp_minisocks.c:460
>   tcp_v4_syn_recv_sock+0x10c/0x1460 net/ipv4/tcp_ipv4.c:1526
>   tcp_check_req+0x616/0x1860 net/ipv4/tcp_minisocks.c:772
>   tcp_v4_rcv+0x221a/0x3780 net/ipv4/tcp_ipv4.c:2001
>   ip_protocol_deliver_rcu+0x5c/0x8a0 net/ipv4/ip_input.c:204
>   ip_local_deliver_finish+0x20a/0x370 net/ipv4/ip_input.c:231
>   NF_HOOK include/linux/netfilter.h:301 [inline]
>   NF_HOOK include/linux/netfilter.h:295 [inline]
>   ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:252
>   dst_input include/net/dst.h:458 [inline]
>   ip_sublist_rcv_finish+0x9a/0x2c0 net/ipv4/ip_input.c:551
>   ip_list_rcv_finish.constprop.0+0x514/0x6e0 net/ipv4/ip_input.c:601
>   ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
>   ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
>   __netif_receive_skb_list_ptype net/core/dev.c:5408 [inline]
>   __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5456
>   __netif_receive_skb_list net/core/dev.c:5508 [inline]
>   netif_receive_skb_list_internal+0x777/0xd70 net/core/dev.c:5618
>   gro_normal_list net/core/dev.c:5772 [inline]
>   gro_normal_list net/core/dev.c:5768 [inline]
>   napi_complete_done+0x1f1/0x820 net/core/dev.c:6474
>   virtqueue_napi_complete+0x2c/0xc0 drivers/net/virtio_net.c:334
>   virtnet_poll+0xae2/0xd90 drivers/net/virtio_net.c:1455
>   __napi_poll+0xaf/0x440 net/core/dev.c:6892
>   napi_poll net/core/dev.c:6959 [inline]
>   net_rx_action+0x801/0xb40 net/core/dev.c:7036
>   __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
>   invoke_softirq kernel/softirq.c:221 [inline]
>   __irq_exit_rcu kernel/softirq.c:422 [inline]
>   irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
>   common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
>   asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:623
>   tomoyo_domain_quota_is_ok+0x2f1/0x550 security/tomoyo/util.c:1093
>   tomoyo_supervisor+0x2f2/0xf00 security/tomoyo/common.c:2089
>   tomoyo_audit_path_log security/tomoyo/file.c:168 [inline]
>   tomoyo_path_permission security/tomoyo/file.c:587 [inline]
>   tomoyo_path_permission+0x270/0x3a0 security/tomoyo/file.c:573
>   tomoyo_path_perm+0x39e/0x400 security/tomoyo/file.c:838
>   tomoyo_path_symlink+0x94/0xe0 security/tomoyo/tomoyo.c:200
>   security_path_symlink+0xdf/0x150 security/security.c:1119
>   do_symlinkat+0x123/0x300 fs/namei.c:4201
>   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> to a SOFTIRQ-irq-unsafe lock:
>  (hugetlb_lock){+.+.}-{2:2}
>
> ... which became SOFTIRQ-irq-unsafe at:
> ...
>   lock_acquire kernel/locking/lockdep.c:5510 [inline]
>   lock_acquire+0x1ab/0x730 kernel/locking/lockdep.c:5475
>   __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>   _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:151
>   spin_lock include/linux/spinlock.h:354 [inline]
>   hugetlb_overcommit_handler+0x260/0x3e0 mm/hugetlb.c:3448
>   proc_sys_call_handler+0x336/0x610 fs/proc/proc_sysctl.c:591
>   call_write_iter include/linux/fs.h:1977 [inline]
>   new_sync_write+0x426/0x650 fs/read_write.c:519
>   vfs_write+0x796/0xa30 fs/read_write.c:606
>   ksys_write+0x12d/0x250 fs/read_write.c:659
>   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
>
>  Possible interrupt unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(hugetlb_lock);
>                                local_irq_disable();
>                                lock(slock-AF_INET);
>                                lock(hugetlb_lock);
>   <Interrupt>
>     lock(slock-AF_INET);
>
>  *** DEADLOCK ***

This has also been reproduced on the 4.19 stable kernel [1], and that
report includes a reproducer.

The trigger appears to be sendmsg(MSG_ZEROCOPY) from a buffer backed by
hugetlb pages. I wonder if we just need to make hugetlb_lock
softirq-safe.

[1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
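
For anyone who wants a mental model of the usage in question, here is a
minimal userspace sketch of TCP TX zerocopy from a hugetlb-backed buffer
(this is an illustration, not the syzkaller reproducer; the address,
port and the assumption of a reserved 2MB huge page are mine, and error
handling is omitted):

#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

int main(void)
{
	/* Buffer backed by hugetlb pages (assumes a 2MB huge page has
	 * been reserved via /proc/sys/vm/nr_hugepages). */
	size_t len = 2UL * 1024 * 1024;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	memset(buf, 0x41, len);

	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int one = 1;
	setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(8000),
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
	};
	connect(fd, (struct sockaddr *)&addr, sizeof(addr));

	/* The hugetlb pages behind buf become skb frags; TCP drops its
	 * references (put_page -> free_huge_page) later, possibly while
	 * holding the socket lock with softirqs disabled. */
	send(fd, buf, len, MSG_ZEROCOPY);
	return 0;
}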


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-02-26 22:44   ` Shakeel Butt
  (?)
@ 2021-02-26 23:14   ` Mike Kravetz
  2021-02-27  0:00       ` Shakeel Butt
  -1 siblings, 1 reply; 25+ messages in thread
From: Mike Kravetz @ 2021-02-26 23:14 UTC (permalink / raw)
  To: Shakeel Butt, syzbot
  Cc: Andrew Morton, LKML, Linux MM, syzkaller-bugs, Eric Dumazet,
	Mina Almasry, Michal Hocko

Cc: Michal

On 2/26/21 2:44 PM, Shakeel Butt wrote:
> On Fri, Feb 26, 2021 at 2:09 PM syzbot
> <syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
<snip>
>> other info that might help us debug this:
>>
>>  Possible interrupt unsafe locking scenario:
>>
>>        CPU0                    CPU1
>>        ----                    ----
>>   lock(hugetlb_lock);
>>                                local_irq_disable();
>>                                lock(slock-AF_INET);
>>                                lock(hugetlb_lock);
>>   <Interrupt>
>>     lock(slock-AF_INET);
>>
>>  *** DEADLOCK ***
> 
> This has been reproduced on 4.19 stable kernel as well [1] and there
> is a reproducer as well.
> 
> It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> wonder if we just need to make hugetlb_lock softirq-safe.
> 
> [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93

Thanks Shakeel,

Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
context") attempted to address this issue.  It uses a work queue to
acquire hugetlb_lock if the caller is !in_task().

In another recent thread, there was the suggestion to change the
!in_task to in_atomic.

I need to do some research on the subtle differences between in_task,
in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
reported here.  But, that obviously is not the case.
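
For reference, the deferral that commit added looks roughly like the
below (paraphrased from mm/hugetlb.c rather than quoted verbatim, so
treat it as a sketch): free_huge_page() punts the page onto a lockless
list and lets a workqueue take hugetlb_lock from task context.

static LLIST_HEAD(hpage_freelist);

static void free_hpage_workfn(struct work_struct *work)
{
	struct llist_node *node = llist_del_all(&hpage_freelist);

	while (node) {
		/* page->mapping is reused as the llist_node while queued */
		struct page *page = container_of((struct address_space **)node,
						 struct page, mapping);

		node = node->next;
		__free_huge_page(page);	/* takes hugetlb_lock in task context */
	}
}
static DECLARE_WORK(free_hpage_work, free_hpage_workfn);

void free_huge_page(struct page *page)
{
	/* Defer freeing if in non-task context to avoid hugetlb_lock deadlock. */
	if (!in_task()) {
		/* Only kick the work if the list was previously empty. */
		if (llist_add((struct llist_node *)&page->mapping,
			      &hpage_freelist))
			schedule_work(&free_hpage_work);
		return;
	}

	__free_huge_page(page);
}

As this report shows, the !in_task() test is the part that does not
cover a task that frees with softirqs disabled.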
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-02-26 23:14   ` Mike Kravetz
@ 2021-02-27  0:00       ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2021-02-27  0:00 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: syzbot, Andrew Morton, LKML, Linux MM, syzkaller-bugs,
	Eric Dumazet, Mina Almasry, Michal Hocko

On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Cc: Michal
>
> On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > <syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
> <snip>
> >> other info that might help us debug this:
> >>
> >>  Possible interrupt unsafe locking scenario:
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(hugetlb_lock);
> >>                                local_irq_disable();
> >>                                lock(slock-AF_INET);
> >>                                lock(hugetlb_lock);
> >>   <Interrupt>
> >>     lock(slock-AF_INET);
> >>
> >>  *** DEADLOCK ***
> >
> > This has been reproduced on 4.19 stable kernel as well [1] and there
> > is a reproducer as well.
> >
> > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > wonder if we just need to make hugetlb_lock softirq-safe.
> >
> > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
>
> Thanks Shakeel,
>
> Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> context") attempted to address this issue.  It uses a work queue to
> acquire hugetlb_lock if the caller is !in_task().
>
> In another recent thread, there was the suggestion to change the
> !in_task to in_atomic.
>
> I need to do some research on the subtle differences between in_task,
> in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> reported here.  But, that obviously is not the case.

I think the freeing is happening in the process context in this report
but it is creating the lock chain from softirq-safe slock to
irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
always defer the freeing of hugetlb pages to a work queue or (2) make
hugetlb_lock softirq-safe.
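
As a sketch, option (2) would mean converting every hugetlb_lock user to
the bottom-half-disabling lock variants, roughly (illustrative only, not
a real patch):

-	spin_lock(&hugetlb_lock);
+	spin_lock_bh(&hugetlb_lock);
 	/* ... pool accounting, enqueue_huge_page(), ... */
-	spin_unlock(&hugetlb_lock);
+	spin_unlock_bh(&hugetlb_lock);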

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-02-27  0:00       ` Shakeel Butt
  (?)
@ 2021-03-01 12:11       ` Michal Hocko
  2021-03-01 15:10           ` Shakeel Butt
  -1 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2021-03-01 12:11 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > Cc: Michal
> >
> > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > <syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
> > <snip>
> > >> other info that might help us debug this:
> > >>
> > >>  Possible interrupt unsafe locking scenario:
> > >>
> > >>        CPU0                    CPU1
> > >>        ----                    ----
> > >>   lock(hugetlb_lock);
> > >>                                local_irq_disable();
> > >>                                lock(slock-AF_INET);
> > >>                                lock(hugetlb_lock);
> > >>   <Interrupt>
> > >>     lock(slock-AF_INET);
> > >>
> > >>  *** DEADLOCK ***
> > >
> > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > is a reproducer as well.
> > >
> > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > wonder if we just need to make hugetlb_lock softirq-safe.
> > >
> > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> >
> > Thanks Shakeel,
> >
> > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> > context") attempted to address this issue.  It uses a work queue to
> > acquire hugetlb_lock if the caller is !in_task().
> >
> > In another recent thread, there was the suggestion to change the
> > !in_task to in_atomic.
> >
> > I need to do some research on the subtle differences between in_task,
> > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > reported here.  But, that obviously is not the case.
> 
> I think the freeing is happening in the process context in this report
> but it is creating the lock chain from softirq-safe slock to
> irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> always defer the freeing of hugetlb pages to a work queue or (2) make
> hugetlb_lock softirq-safe.

There is __do_softirq in the stack, so this should be in soft IRQ
context, no?  Is this really reproducible with kernels which have
c77c0a8ac4c5 applied?

Btw. making hugetlb_lock irq safe has already been discussed and it
seems to be much harder than expected, as some heavy operations are done
under the lock. This is really bad. Postponing the whole freeing
operation into a worker context is certainly possible, but I would
consider it rather unfortunate. We would have to add some sync mechanism
to wait for hugetlb pages in flight, so that the deferred freeing is not
externally observable to userspace, e.g. when shrinking the pool.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-01 12:11       ` Michal Hocko
@ 2021-03-01 15:10           ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2021-03-01 15:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Mon, Mar 1, 2021 at 4:12 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> > On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > >
> > > Cc: Michal
> > >
> > > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > > <syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
> > > <snip>
> > > >> other info that might help us debug this:
> > > >>
> > > >>  Possible interrupt unsafe locking scenario:
> > > >>
> > > >>        CPU0                    CPU1
> > > >>        ----                    ----
> > > >>   lock(hugetlb_lock);
> > > >>                                local_irq_disable();
> > > >>                                lock(slock-AF_INET);
> > > >>                                lock(hugetlb_lock);
> > > >>   <Interrupt>
> > > >>     lock(slock-AF_INET);
> > > >>
> > > >>  *** DEADLOCK ***
> > > >
> > > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > > is a reproducer as well.
> > > >
> > > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > > wonder if we just need to make hugetlb_lock softirq-safe.
> > > >
> > > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> > >
> > > Thanks Shakeel,
> > >
> > > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> > > context") attempted to address this issue.  It uses a work queue to
> > > acquire hugetlb_lock if the caller is !in_task().
> > >
> > > In another recent thread, there was the suggestion to change the
> > > !in_task to in_atomic.
> > >
> > > I need to do some research on the subtle differences between in_task,
> > > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > > reported here.  But, that obviously is not the case.
> >
> > I think the freeing is happening in the process context in this report
> > but it is creating the lock chain from softirq-safe slock to
> > irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> > always defer the freeing of hugetlb pages to a work queue or (2) make
> > hugetlb_lock softirq-safe.
>
> There is __do_softirq so this should be in the soft IRQ context no?
> Is this really reproducible with kernels which have c77c0a8ac4c5
> applied?

Yes this is softirq context and syzbot has reproduced this on
linux-next 20210224.

>
> Btw. making hugetlb lock irq safe has been already discussed and it
> seems to be much harder than expected as some heavy operations are done
> under the lock. This is really bad.

What about making it just softirq-safe, i.e. spin_[un]lock_bh()? Will it
still be that bad?

> Postponing the whole freeing
> operation into a worker context is certainly possible but I would
> consider it rather unfortunate. We would have to add some sync mechanism
> to wait for hugetlb pages in flight to prevent from external
> observability to the userspace. E.g. when shrinking the pool.

I think in practice recycling of hugetlb pages is a rare event, so we
might get away without the sync mechanism. How about we start postponing
the freeing without a sync mechanism and add one later if there are any
user reports complaining?

> --
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-01 15:10           ` Shakeel Butt
  (?)
@ 2021-03-01 15:57           ` Michal Hocko
  2021-03-01 16:39               ` Shakeel Butt
  -1 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2021-03-01 15:57 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Mon 01-03-21 07:10:11, Shakeel Butt wrote:
> On Mon, Mar 1, 2021 at 4:12 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> > > On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > >
> > > > Cc: Michal
> > > >
> > > > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > > > <syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
> > > > <snip>
> > > > >> other info that might help us debug this:
> > > > >>
> > > > >>  Possible interrupt unsafe locking scenario:
> > > > >>
> > > > >>        CPU0                    CPU1
> > > > >>        ----                    ----
> > > > >>   lock(hugetlb_lock);
> > > > >>                                local_irq_disable();
> > > > >>                                lock(slock-AF_INET);
> > > > >>                                lock(hugetlb_lock);
> > > > >>   <Interrupt>
> > > > >>     lock(slock-AF_INET);
> > > > >>
> > > > >>  *** DEADLOCK ***
> > > > >
> > > > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > > > is a reproducer as well.
> > > > >
> > > > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > > > wonder if we just need to make hugetlb_lock softirq-safe.
> > > > >
> > > > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> > > >
> > > > Thanks Shakeel,
> > > >
> > > > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> > > > context") attempted to address this issue.  It uses a work queue to
> > > > acquire hugetlb_lock if the caller is !in_task().
> > > >
> > > > In another recent thread, there was the suggestion to change the
> > > > !in_task to in_atomic.
> > > >
> > > > I need to do some research on the subtle differences between in_task,
> > > > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > > > reported here.  But, that obviously is not the case.
> > >
> > > I think the freeing is happening in the process context in this report
> > > but it is creating the lock chain from softirq-safe slock to
> > > irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> > > always defer the freeing of hugetlb pages to a work queue or (2) make
> > > hugetlb_lock softirq-safe.
> >
> > There is __do_softirq so this should be in the soft IRQ context no?
> > Is this really reproducible with kernels which have c77c0a8ac4c5
> > applied?
> 
> Yes this is softirq context and syzbot has reproduced this on
> linux-next 20210224.

Then how come this can ever be a problem? in_task() should exclude soft
irq context unless I am mistaken.
 
> > Btw. making hugetlb lock irq safe has been already discussed and it
> > seems to be much harder than expected as some heavy operations are done
> > under the lock. This is really bad.
> 
> What about just softirq-safe i.e. spin_[un]lock_bh()? Will it still be that bad?

This would be a similar problem to the irq variant. It would just
potentially result in softirq processing being delayed.

> > Postponing the whole freeing
> > operation into a worker context is certainly possible but I would
> > consider it rather unfortunate. We would have to add some sync mechanism
> > to wait for hugetlb pages in flight to prevent from external
> > observability to the userspace. E.g. when shrinking the pool.
> 
> I think in practice recycling of hugetlb pages is a rare event, so we
> might get away without the sync mechanism. How about start postponing
> the freeing without sync mechanism and add it later if there are any
> user reports complaining?

I think this should be a last resort. Maybe we can come up with
something better. E.g. break down the hugetlb_lock and use something
different for expensive operations.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-01 15:57           ` Michal Hocko
@ 2021-03-01 16:39               ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2021-03-01 16:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 01-03-21 07:10:11, Shakeel Butt wrote:
> > On Mon, Mar 1, 2021 at 4:12 AM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Fri 26-02-21 16:00:30, Shakeel Butt wrote:
> > > > On Fri, Feb 26, 2021 at 3:14 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > > > >
> > > > > Cc: Michal
> > > > >
> > > > > On 2/26/21 2:44 PM, Shakeel Butt wrote:
> > > > > > On Fri, Feb 26, 2021 at 2:09 PM syzbot
> > > > > > <syzbot+506c8a2a115201881d45@syzkaller.appspotmail.com> wrote:
> > > > > <snip>
> > > > > >> other info that might help us debug this:
> > > > > >>
> > > > > >>  Possible interrupt unsafe locking scenario:
> > > > > >>
> > > > > >>        CPU0                    CPU1
> > > > > >>        ----                    ----
> > > > > >>   lock(hugetlb_lock);
> > > > > >>                                local_irq_disable();
> > > > > >>                                lock(slock-AF_INET);
> > > > > >>                                lock(hugetlb_lock);
> > > > > >>   <Interrupt>
> > > > > >>     lock(slock-AF_INET);
> > > > > >>
> > > > > >>  *** DEADLOCK ***
> > > > > >
> > > > > > This has been reproduced on 4.19 stable kernel as well [1] and there
> > > > > > is a reproducer as well.
> > > > > >
> > > > > > It seems like sendmsg(MSG_ZEROCOPY) from a buffer backed by hugetlb. I
> > > > > > wonder if we just need to make hugetlb_lock softirq-safe.
> > > > > >
> > > > > > [1] https://syzkaller.appspot.com/bug?extid=6383ce4b0b8ec575ad93
> > > > >
> > > > > Thanks Shakeel,
> > > > >
> > > > > Commit c77c0a8ac4c5 ("mm/hugetlb: defer freeing of huge pages if in non-task
> > > > > context") attempted to address this issue.  It uses a work queue to
> > > > > acquire hugetlb_lock if the caller is !in_task().
> > > > >
> > > > > In another recent thread, there was the suggestion to change the
> > > > > !in_task to in_atomic.
> > > > >
> > > > > I need to do some research on the subtle differences between in_task,
> > > > > in_atomic, etc.  TBH, I 'thought' !in_task would prevent the issue
> > > > > reported here.  But, that obviously is not the case.
> > > >
> > > > I think the freeing is happening in the process context in this report
> > > > but it is creating the lock chain from softirq-safe slock to
> > > > irq-unsafe hugetlb_lock. So, two solutions I can think of are: (1)
> > > > always defer the freeing of hugetlb pages to a work queue or (2) make
> > > > hugetlb_lock softirq-safe.
> > >
> > > There is __do_softirq so this should be in the soft IRQ context no?
> > > Is this really reproducible with kernels which have c77c0a8ac4c5
> > > applied?
> >
> > Yes this is softirq context and syzbot has reproduced this on
> > linux-next 20210224.
>
> Then how come this can ever be a problem? in_task() should exclude soft
> irq context unless I am mistaken.
>

If I take the following example of syzbot's deadlock scenario then
CPU1 is the one freeing the hugetlb pages. It is in the process
context but has disabled softirqs (see __tcp_close()).

        CPU0                    CPU1
        ----                    ----
   lock(hugetlb_lock);
                                local_irq_disable();
                                lock(slock-AF_INET);
                                lock(hugetlb_lock);
   <Interrupt>
     lock(slock-AF_INET);

So, this deadlock scenario is very much possible.
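
To make that concrete, the CPU1 side corresponds to something like this
(paraphrasing net/ipv4/tcp.c from memory, not a verbatim quote):

/* close() on a TCP socket with zerocopy pages still in flight */
void tcp_close(struct sock *sk, long timeout)
{
	lock_sock(sk);
	__tcp_close(sk, timeout);	/* internally: local_bh_disable();
					 * bh_lock_sock(sk);  <- slock-AF_INET
					 * ... purge queues -> put_page() on
					 * hugetlb-backed zerocopy frags ->
					 * free_huge_page() -> hugetlb_lock */
	release_sock(sk);
	sock_put(sk);
}

So the free runs with in_task() true but with bottom halves disabled and
slock-AF_INET held, which is exactly the CPU1 column above.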

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-01 16:39               ` Shakeel Butt
  (?)
@ 2021-03-01 17:23               ` Michal Hocko
  2021-03-02  1:16                 ` Mike Kravetz
  -1 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2021-03-01 17:23 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
[...]
> > Then how come this can ever be a problem? in_task() should exclude soft
> > irq context unless I am mistaken.
> >
> 
> If I take the following example of syzbot's deadlock scenario then
> CPU1 is the one freeing the hugetlb pages. It is in the process
> context but has disabled softirqs (see __tcp_close()).
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(hugetlb_lock);
>                                 local_irq_disable();
>                                 lock(slock-AF_INET);
>                                 lock(hugetlb_lock);
>    <Interrupt>
>      lock(slock-AF_INET);
> 
> So, this deadlock scenario is very much possible.

OK, I see the point now. I was focusing on the IRQ context and hugetlb
side too much. We do not need to be freeing from there. All it takes is
to get a dependency chain over a common lock held here. Thanks for
bearing with me.

Let's see whether we can make hugetlb_lock irq safe.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-01 17:23               ` Michal Hocko
@ 2021-03-02  1:16                 ` Mike Kravetz
  2021-03-02  9:44                   ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Kravetz @ 2021-03-02  1:16 UTC (permalink / raw)
  To: Michal Hocko, Shakeel Butt
  Cc: syzbot, Andrew Morton, LKML, Linux MM, syzkaller-bugs,
	Eric Dumazet, Mina Almasry

On 3/1/21 9:23 AM, Michal Hocko wrote:
> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> [...]
>>> Then how come this can ever be a problem? in_task() should exclude soft
>>> irq context unless I am mistaken.
>>>
>>
>> If I take the following example of syzbot's deadlock scenario then
>> CPU1 is the one freeing the hugetlb pages. It is in the process
>> context but has disabled softirqs (see __tcp_close()).
>>
>>         CPU0                    CPU1
>>         ----                    ----
>>    lock(hugetlb_lock);
>>                                 local_irq_disable();
>>                                 lock(slock-AF_INET);
>>                                 lock(hugetlb_lock);
>>    <Interrupt>
>>      lock(slock-AF_INET);
>>
>> So, this deadlock scenario is very much possible.
> 
> OK, I see the point now. I was focusing on the IRQ context and hugetlb
> side too much. We do not need to be freeing from there. All it takes is
> to get a dependency chain over a common lock held here. Thanks for
> bearing with me.
> 
> Let's see whether we can make hugetlb_lock irq safe.

I may be confused, but it seems like we have a general problem with
calling free_huge_page (as a result of put_page) with interrupts
disabled.

Consider the current free_huge_page code.  Today, we drop the lock
when processing gigantic pages because we may need to block on a mutex
in cma code.  If our caller has disabled interrupts, then it doesn't
matter if the hugetlb lock is irq safe: when we drop it, interrupts will
still be disabled and we can not block.  Right?  If correct, then making
hugetlb_lock irq safe would not help.

Again, I may be missing something.

Note that we also are considering doing more with the hugetlb lock
dropped in this path in the 'free vmemmap of hugetlb pages' series.

Since we need to do some work that could block in this path, it seems
like we really need to use a workqueue.  It is too bad that there is not
an interface to identify all the cases where interrupts are disabled.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-02  1:16                 ` Mike Kravetz
@ 2021-03-02  9:44                   ` Michal Hocko
  2021-03-02 14:11                     ` Shakeel Butt
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2021-03-02  9:44 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Shakeel Butt, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> On 3/1/21 9:23 AM, Michal Hocko wrote:
> > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> > [...]
> >>> Then how come this can ever be a problem? in_task() should exclude soft
> >>> irq context unless I am mistaken.
> >>>
> >>
> >> If I take the following example of syzbot's deadlock scenario then
> >> CPU1 is the one freeing the hugetlb pages. It is in the process
> >> context but has disabled softirqs (see __tcp_close()).
> >>
> >>         CPU0                    CPU1
> >>         ----                    ----
> >>    lock(hugetlb_lock);
> >>                                 local_irq_disable();
> >>                                 lock(slock-AF_INET);
> >>                                 lock(hugetlb_lock);
> >>    <Interrupt>
> >>      lock(slock-AF_INET);
> >>
> >> So, this deadlock scenario is very much possible.
> > 
> > OK, I see the point now. I was focusing on the IRQ context and hugetlb
> > side too much. We do not need to be freeing from there. All it takes is
> > to get a dependency chain over a common lock held here. Thanks for
> > bearing with me.
> > 
> > Let's see whether we can make hugetlb_lock irq safe.
> 
> I may be confused, but it seems like we have a general problem with
> calling free_huge_page (as a result of put_page) with interrupts
> disabled.
> 
> Consider the current free_huge_page code.  Today, we drop the lock
> when processing gigantic pages because we may need to block on a mutex
> in cma code.  If our caller has disabled interrupts, then it doesn't
> matter if the hugetlb lock is irq safe, when we drop it interrupts will
> still be disabled we can not block .  Right?  If correct, then making
> hugetlb_lock irq safe would not help.
> 
> Again, I may be missing something.
> 
> Note that we also are considering doing more with the hugetlb lock
> dropped in this path in the 'free vmemmap of hugetlb pages' series.
> 
> Since we need to do some work that could block in this path, it seems
> like we really need to use a workqueue.  It is too bad that there is not
> an interface to identify all the cases where interrupts are disabled.

Wouldn't something like this help? It is quite ugly but it would be
simple enough and backportable while we come up with a more rigorous 
solution. What do you think?

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..c9a8b39f678d 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
 void free_huge_page(struct page *page)
 {
 	/*
-	 * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
+	 * Defer freeing if in non-task context or when put_page is called
+	 * with IRQ disabled (e.g. via the TCP slock dependency chain) to
+	 * avoid hugetlb_lock deadlock.
 	 */
-	if (!in_task()) {
+	if (!in_task() || irqs_disabled()) {
 		/*
 		 * Only call schedule_work() if hpage_freelist is previously
 		 * empty. Otherwise, schedule_work() had been called but the
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-02  9:44                   ` Michal Hocko
@ 2021-03-02 14:11                     ` Shakeel Butt
  2021-03-02 14:29                       ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Shakeel Butt @ 2021-03-02 14:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> > On 3/1/21 9:23 AM, Michal Hocko wrote:
> > > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> > > [...]
> > >>> Then how come this can ever be a problem? in_task() should exclude soft
> > >>> irq context unless I am mistaken.
> > >>>
> > >>
> > >> If I take the following example of syzbot's deadlock scenario then
> > >> CPU1 is the one freeing the hugetlb pages. It is in the process
> > >> context but has disabled softirqs (see __tcp_close()).
> > >>
> > >>         CPU0                    CPU1
> > >>         ----                    ----
> > >>    lock(hugetlb_lock);
> > >>                                 local_irq_disable();
> > >>                                 lock(slock-AF_INET);
> > >>                                 lock(hugetlb_lock);
> > >>    <Interrupt>
> > >>      lock(slock-AF_INET);
> > >>
> > >> So, this deadlock scenario is very much possible.
> > >
> > > OK, I see the point now. I was focusing on the IRQ context and hugetlb
> > > side too much. We do not need to be freeing from there. All it takes is
> > > to get a dependency chain over a common lock held here. Thanks for
> > > bearing with me.
> > >
> > > Let's see whether we can make hugetlb_lock irq safe.
> >
> > I may be confused, but it seems like we have a general problem with
> > calling free_huge_page (as a result of put_page) with interrupts
> > disabled.
> >
> > Consider the current free_huge_page code.  Today, we drop the lock
> > when processing gigantic pages because we may need to block on a mutex
> > in cma code.  If our caller has disabled interrupts, then it doesn't
> > matter if the hugetlb lock is irq safe, when we drop it interrupts will
> > still be disabled we can not block .  Right?  If correct, then making
> > hugetlb_lock irq safe would not help.
> >
> > Again, I may be missing something.
> >
> > Note that we also are considering doing more with the hugetlb lock
> > dropped in this path in the 'free vmemmap of hugetlb pages' series.
> >
> > Since we need to do some work that could block in this path, it seems
> > like we really need to use a workqueue.  It is too bad that there is not
> > an interface to identify all the cases where interrupts are disabled.
>
> Wouldn't something like this help? It is quite ugly but it would be
> simple enough and backportable while we come up with a more rigorous
> solution. What do you think?
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4bdb58ab14cb..c9a8b39f678d 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
>  void free_huge_page(struct page *page)
>  {
>         /*
> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> +        * Defer freeing if in non-task context or when put_page is called
> +        * with IRQ disabled (e.g from via TCP slock dependency chain) to
> +        * avoid hugetlb_lock deadlock.
>          */
> -       if (!in_task()) {
> +       if (!in_task() || irqs_disabled()) {

Does irqs_disabled() also check softirqs?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-02 14:11                     ` Shakeel Butt
@ 2021-03-02 14:29                       ` Michal Hocko
  2021-03-02 21:19                         ` Mike Kravetz
  0 siblings, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2021-03-02 14:29 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> > > On 3/1/21 9:23 AM, Michal Hocko wrote:
> > > > On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > > >> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> > > > [...]
> > > >>> Then how come this can ever be a problem? in_task() should exclude soft
> > > >>> irq context unless I am mistaken.
> > > >>>
> > > >>
> > > >> If I take the following example of syzbot's deadlock scenario then
> > > >> CPU1 is the one freeing the hugetlb pages. It is in the process
> > > >> context but has disabled softirqs (see __tcp_close()).
> > > >>
> > > >>         CPU0                    CPU1
> > > >>         ----                    ----
> > > >>    lock(hugetlb_lock);
> > > >>                                 local_irq_disable();
> > > >>                                 lock(slock-AF_INET);
> > > >>                                 lock(hugetlb_lock);
> > > >>    <Interrupt>
> > > >>      lock(slock-AF_INET);
> > > >>
[...]
> > Wouldn't something like this help? It is quite ugly but it would be
> > simple enough and backportable while we come up with a more rigorous
> > solution. What do you think?
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 4bdb58ab14cb..c9a8b39f678d 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >  void free_huge_page(struct page *page)
> >  {
> >         /*
> > -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> > +        * Defer freeing if in non-task context or when put_page is called
> > +        * with IRQ disabled (e.g from via TCP slock dependency chain) to
> > +        * avoid hugetlb_lock deadlock.
> >          */
> > -       if (!in_task()) {
> > +       if (!in_task() || irqs_disabled()) {
> 
> Does irqs_disabled() also check softirqs?

Nope, it doesn't AFAICS. I was referring to the above lockdep splat,
which claims irqs being disabled to be the trigger. But now that you
mention it, it would be better to replace in_task() along the way. We
have discussed that in another email thread and I was suggesting to use
in_atomic(), which should also catch the bh-disabled situation. The big
IF is that this needs the preempt count to be enabled unconditionally.
There are changes in the RCU tree heading in that direction.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-02 14:29                       ` Michal Hocko
@ 2021-03-02 21:19                         ` Mike Kravetz
  2021-03-03  3:59                             ` Shakeel Butt
  2021-03-03  8:03                           ` Michal Hocko
  0 siblings, 2 replies; 25+ messages in thread
From: Mike Kravetz @ 2021-03-02 21:19 UTC (permalink / raw)
  To: Michal Hocko, Shakeel Butt
  Cc: syzbot, Andrew Morton, LKML, Linux MM, syzkaller-bugs,
	Eric Dumazet, Mina Almasry

On 3/2/21 6:29 AM, Michal Hocko wrote:
> On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
>> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
>>>
>>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
>>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
>>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
>>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
>>>>> [...]
>>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
>>>>>>> irq context unless I am mistaken.
>>>>>>>
>>>>>>
>>>>>> If I take the following example of syzbot's deadlock scenario then
>>>>>> CPU1 is the one freeing the hugetlb pages. It is in the process
>>>>>> context but has disabled softirqs (see __tcp_close()).
>>>>>>
>>>>>>         CPU0                    CPU1
>>>>>>         ----                    ----
>>>>>>    lock(hugetlb_lock);
>>>>>>                                 local_irq_disable();
>>>>>>                                 lock(slock-AF_INET);
>>>>>>                                 lock(hugetlb_lock);
>>>>>>    <Interrupt>
>>>>>>      lock(slock-AF_INET);
>>>>>>
> [...]
>>> Wouldn't something like this help? It is quite ugly but it would be
>>> simple enough and backportable while we come up with a more rigorous
>>> solution. What do you think?
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 4bdb58ab14cb..c9a8b39f678d 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
>>>  void free_huge_page(struct page *page)
>>>  {
>>>         /*
>>> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
>>> +        * Defer freeing if in non-task context or when put_page is called
>>> +        * with IRQ disabled (e.g from via TCP slock dependency chain) to
>>> +        * avoid hugetlb_lock deadlock.
>>>          */
>>> -       if (!in_task()) {
>>> +       if (!in_task() || irqs_disabled()) {
>>
>> Does irqs_disabled() also check softirqs?
> 
> Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> claims irq disabled to be the trigger. But now that you are mentioning
> that it would be better to replace in_task() along the way. We have
> discussed that in another email thread and I was suggesting to use
> in_atomic() which should catch also bh disabled situation. The big IF is
> that this needs preempt count to be enabled unconditionally. There are
> changes in the RCU tree heading that direction.

I have not been following developments in preemption and the RCU tree. 
The comment for in_atomic() says:

/*
 * Are we running in atomic context?  WARNING: this macro cannot
 * always detect atomic context; in particular, it cannot know about
 * held spinlocks in non-preemptible kernels.  Thus it should not be
 * used in the general case to determine whether sleeping is possible.
 * Do not use in_atomic() in driver code.
 */

That does seem to be the case.  I verified in_atomic() can detect softirq
context even in non-preemptible kernels.  But, as the comment says, it
will not detect a held spinlock in non-preemptible kernels.  So, I think
in_atomic() would be better than the current check for !in_task().  That
would handle this syzbot issue, but we could still have issues if the
hugetlb put_page path is called while someone is holding a spinlock with
interrupts enabled.  It looks like there is no way to detect that case
today in non-preemptible kernels; in_atomic() does detect spinlocks held
in preemptible kernels.
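
For reference, my reading of the relevant predicates (paraphrased from
include/linux/preempt.h, worth double-checking against the tree):

/* plain process context: not in hardirq, serving-softirq or NMI */
#define in_task()	(!(preempt_count() & \
			   (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))

/*
 * Any nonzero preempt_count: hardirq/softirq/NMI context, bottom
 * halves disabled, or (only on preemptible kernels) a spinlock held.
 * spin_lock() on !CONFIG_PREEMPT kernels does not bump preempt_count,
 * which is the blind spot the comment above warns about.
 */
#define in_atomic()	(preempt_count() != 0)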

I might suggest changing !in_task to in_atomic for now, and then work on
a more robust solution.  I'm afraid such a robust solution will
require considerable effort.  It would need to handle put_page being
called in any context: hardirq, softirq, spinlock held ...  The
put_page/free_huge_page path will need to offload (workqueue or
something else) any processing that can possibly sleep.
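
Concretely, the interim change being discussed would essentially be the
one-liner (untested sketch):

-	if (!in_task()) {
+	if (in_atomic()) {

in_atomic() is a superset of !in_task() here, since hardirq, softirq and
NMI contexts all contribute to preempt_count, so the existing deferral
cases keep working while additionally covering bh-disabled context.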

Is it worth making the in_atomic change now, or should we just start
working on the more robust complete solution?
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-02 21:19                         ` Mike Kravetz
@ 2021-03-03  3:59                             ` Shakeel Butt
  2021-03-03  8:03                           ` Michal Hocko
  1 sibling, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2021-03-03  3:59 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Michal Hocko, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Tue, Mar 2, 2021 at 1:19 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 3/2/21 6:29 AM, Michal Hocko wrote:
> > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
> >>>
> >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> >>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
> >>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> >>>>> [...]
> >>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
> >>>>>>> irq context unless I am mistaken.
> >>>>>>>
> >>>>>>
> >>>>>> If I take the following example of syzbot's deadlock scenario then
> >>>>>> CPU1 is the one freeing the hugetlb pages. It is in the process
> >>>>>> context but has disabled softirqs (see __tcp_close()).
> >>>>>>
> >>>>>>         CPU0                    CPU1
> >>>>>>         ----                    ----
> >>>>>>    lock(hugetlb_lock);
> >>>>>>                                 local_irq_disable();
> >>>>>>                                 lock(slock-AF_INET);
> >>>>>>                                 lock(hugetlb_lock);
> >>>>>>    <Interrupt>
> >>>>>>      lock(slock-AF_INET);
> >>>>>>
> > [...]
> >>> Wouldn't something like this help? It is quite ugly but it would be
> >>> simple enough and backportable while we come up with a more rigorous
> >>> solution. What do you think?
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >>>  void free_huge_page(struct page *page)
> >>>  {
> >>>         /*
> >>> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> >>> +        * Defer freeing if in non-task context or when put_page is called
> >>> +        * with IRQ disabled (e.g from via TCP slock dependency chain) to
> >>> +        * avoid hugetlb_lock deadlock.
> >>>          */
> >>> -       if (!in_task()) {
> >>> +       if (!in_task() || irqs_disabled()) {
> >>
> >> Does irqs_disabled() also check softirqs?
> >
> > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > claims irq disabled to be the trigger. But now that you are mentioning
> > that it would be better to replace in_task() along the way. We have
> > discussed that in another email thread and I was suggesting to use
> > in_atomic() which should catch also bh disabled situation. The big IF is
> > that this needs preempt count to be enabled unconditionally. There are
> > changes in the RCU tree heading that direction.
>
> I have not been following developments in preemption and the RCU tree.
> The comment for in_atomic() says:
>
> /*
>  * Are we running in atomic context?  WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels.  Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.
>  */
>
> That does seem to be the case.  I verified in_atomic can detect softirq
> context even in non-preemptible kernels.  But, as the comment says it
> will not detect a held spinlock in non-preemptible kernels.  So, I think
> in_atomic would be better than the current check for !in_task.  That
> would handle this syzbot issue, but we could still have issues if the
> hugetlb put_page path is called while someone is holding a spinlock with
> all interrupts enabled.  Looks like there is no way to detect this
> today in non-preemptible kernels.  in_atomic does detect spinlocks held
> in preemptible kernels.
>
> I might suggest changing !in_task to in_atomic for now, and then work on
> a more robust solution.  I'm afraid such a robust solution will
> require considerable effort.  It would need to handle put_page being
> called in any context: hardirq, softirq, spinlock held ...  The
> put_page/free_huge_page path will need to offload (workqueue or
> something else) any processing that can possibly sleep.
>
> Is it worth making the in_atomic change now, or should we just start
> working on the more robust complete solution?

IMHO the change to in_atomic() is beneficial because it will at least
fix this specific issue. There is no reason to keep users of TCP TX
zerocopy from hugetlb pages broken while waiting for a more
comprehensive solution.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
@ 2021-03-03  3:59                             ` Shakeel Butt
  0 siblings, 0 replies; 25+ messages in thread
From: Shakeel Butt @ 2021-03-03  3:59 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Michal Hocko, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Tue, Mar 2, 2021 at 1:19 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 3/2/21 6:29 AM, Michal Hocko wrote:
> > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
> >>>
> >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> >>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
> >>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> >>>>> [...]
> >>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
> >>>>>>> irq context unless I am mistaken.
> >>>>>>>
> >>>>>>
> >>>>>> If I take the following example of syzbot's deadlock scenario then
> >>>>>> CPU1 is the one freeing the hugetlb pages. It is in the process
> >>>>>> context but has disabled softirqs (see __tcp_close()).
> >>>>>>
> >>>>>>         CPU0                    CPU1
> >>>>>>         ----                    ----
> >>>>>>    lock(hugetlb_lock);
> >>>>>>                                 local_irq_disable();
> >>>>>>                                 lock(slock-AF_INET);
> >>>>>>                                 lock(hugetlb_lock);
> >>>>>>    <Interrupt>
> >>>>>>      lock(slock-AF_INET);
> >>>>>>
> > [...]
> >>> Wouldn't something like this help? It is quite ugly but it would be
> >>> simple enough and backportable while we come up with a more rigorous
> >>> solution. What do you think?
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >>>  void free_huge_page(struct page *page)
> >>>  {
> >>>         /*
> >>> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> >>> +        * Defer freeing if in non-task context or when put_page is called
> >>> +        * with IRQ disabled (e.g from via TCP slock dependency chain) to
> >>> +        * avoid hugetlb_lock deadlock.
> >>>          */
> >>> -       if (!in_task()) {
> >>> +       if (!in_task() || irqs_disabled()) {
> >>
> >> Does irqs_disabled() also check softirqs?
> >
> > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > claims irq disabled to be the trigger. But now that you are mentioning
> > that it would be better to replace in_task() along the way. We have
> > discussed that in another email thread and I was suggesting to use
> > in_atomic() which should catch also bh disabled situation. The big IF is
> > that this needs preempt count to be enabled unconditionally. There are
> > changes in the RCU tree heading that direction.
>
> I have not been following developments in preemption and the RCU tree.
> The comment for in_atomic() says:
>
> /*
>  * Are we running in atomic context?  WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels.  Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.
>  */
>
> That does seem to be the case.  I verified in_atomic can detect softirq
> context even in non-preemptible kernels.  But, as the comment says it
> will not detect a held spinlock in non-preemptible kernels.  So, I think
> in_atomic would be better than the current check for !in_task.  That
> would handle this syzbot issue, but we could still have issues if the
> hugetlb put_page path is called while someone is holding a spinlock with
> all interrupts enabled.  Looks like there is no way to detect this
> today in non-preemptible kernels.  in_atomic does detect spinlocks held
> in preemptible kernels.
>
> I might suggest changing !in_task to in_atomic for now, and then work on
> a more robust solution.  I'm afraid such a robust solution will
> require considerable effort.  It would need to handle put_page being
> called in any context: hardirq, softirq, spinlock held ...  The
> put_page/free_huge_page path will need to offload (workqueue or
> something else) any processing that can possibly sleep.
>
> Is it worth making the in_atomic change now, or should we just start
> working on the more robust complete solution?

IMHO the change to in_atomic is beneficial because it will at least
fix this specific issue. There is no reason to keep the users of TCP TX
zerocopy from hugetlb pages broken while we wait for a more comprehensive solution.
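
Concretely, the interim change could be as small as the following
completely untested sketch of free_huge_page() (it keeps the existing
llist/workqueue deferral and assumes in_atomic() is meaningful, i.e.
preempt counting is enabled):

void free_huge_page(struct page *page)
{
	/*
	 * Defer freeing whenever taking hugetlb_lock directly is not
	 * safe: hardirq/softirq context as well as sections running
	 * with bottom halves or interrupts disabled (e.g. the TCP
	 * slock path).
	 */
	if (in_atomic()) {
		/*
		 * Only call schedule_work() if hpage_freelist was
		 * previously empty; otherwise the workfn has not
		 * drained the list yet.
		 */
		if (llist_add((struct llist_node *)&page->mapping,
			      &hpage_freelist))
			schedule_work(&free_hpage_work);
		return;
	}

	__free_huge_page(page);
}

The deferral path already runs everything in process context via
free_hpage_workfn(), so the only change is how we decide when to take it.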


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-02 21:19                         ` Mike Kravetz
  2021-03-03  3:59                             ` Shakeel Butt
@ 2021-03-03  8:03                           ` Michal Hocko
  2021-03-03 17:59                             ` Paul E. McKenney
  1 sibling, 1 reply; 25+ messages in thread
From: Michal Hocko @ 2021-03-03  8:03 UTC (permalink / raw)
  To: Mike Kravetz, Paul E. McKenney
  Cc: Shakeel Butt, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

[Add Paul]

On Tue 02-03-21 13:19:34, Mike Kravetz wrote:
> On 3/2/21 6:29 AM, Michal Hocko wrote:
> > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
> >>>
> >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> >>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
> >>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> >>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> >>>>> [...]
> >>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
> >>>>>>> irq context unless I am mistaken.
> >>>>>>>
> >>>>>>
> >>>>>> If I take the following example of syzbot's deadlock scenario then
> >>>>>> CPU1 is the one freeing the hugetlb pages. It is in the process
> >>>>>> context but has disabled softirqs (see __tcp_close()).
> >>>>>>
> >>>>>>         CPU0                    CPU1
> >>>>>>         ----                    ----
> >>>>>>    lock(hugetlb_lock);
> >>>>>>                                 local_irq_disable();
> >>>>>>                                 lock(slock-AF_INET);
> >>>>>>                                 lock(hugetlb_lock);
> >>>>>>    <Interrupt>
> >>>>>>      lock(slock-AF_INET);
> >>>>>>
> > [...]
> >>> Wouldn't something like this help? It is quite ugly but it would be
> >>> simple enough and backportable while we come up with a more rigorous
> >>> solution. What do you think?
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> >>>  void free_huge_page(struct page *page)
> >>>  {
> >>>         /*
> >>> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> >>> +        * Defer freeing if in non-task context or when put_page is called
> >>> +        * with IRQ disabled (e.g. via the TCP slock dependency chain) to
> >>> +        * avoid hugetlb_lock deadlock.
> >>>          */
> >>> -       if (!in_task()) {
> >>> +       if (!in_task() || irqs_disabled()) {
> >>
> >> Does irqs_disabled() also check softirqs?
> > 
> > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > claims irq disabled to be the trigger. But now that you are mentioning
> > that it would be better to replace in_task() along the way. We have
> > discussed that in another email thread and I was suggesting to use
> > in_atomic() which should catch also bh disabled situation. The big IF is
> > that this needs preempt count to be enabled unconditionally. There are
> > changes in the RCU tree heading that direction.
> 
> I have not been following developments in preemption and the RCU tree. 
> The comment for in_atomic() says:
> 
> /*
>  * Are we running in atomic context?  WARNING: this macro cannot
>  * always detect atomic context; in particular, it cannot know about
>  * held spinlocks in non-preemptible kernels.  Thus it should not be
>  * used in the general case to determine whether sleeping is possible.
>  * Do not use in_atomic() in driver code.
>  */
> 
> That does seem to be the case.  I verified in_atomic can detect softirq
> context even in non-preemptible kernels.  But, as the comment says it
> will not detect a held spinlock in non-preemptible kernels.  So, I think
> in_atomic would be better than the current check for !in_task.  That
> would handle this syzbot issue, but we could still have issues if the
> hugetlb put_page path is called while someone is holding a spinlock with
> all interrupts enabled.  Looks like there is no way to detect this
> today in non-preemptible kernels.  in_atomic does detect spinlocks held
> in preemptible kernels.

Paul, what is the current plan for making in_atomic usable in !PREEMPT
configurations?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-03  8:03                           ` Michal Hocko
@ 2021-03-03 17:59                             ` Paul E. McKenney
  2021-03-04  9:58                               ` Michal Hocko
  0 siblings, 1 reply; 25+ messages in thread
From: Paul E. McKenney @ 2021-03-03 17:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mike Kravetz, Shakeel Butt, syzbot, Andrew Morton, LKML,
	Linux MM, syzkaller-bugs, Eric Dumazet, Mina Almasry, tglx,
	john.ogness, urezki, ast

On Wed, Mar 03, 2021 at 09:03:27AM +0100, Michal Hocko wrote:
> [Add Paul]
> 
> On Tue 02-03-21 13:19:34, Mike Kravetz wrote:
> > On 3/2/21 6:29 AM, Michal Hocko wrote:
> > > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> > >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > >>>
> > >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> > >>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
> > >>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > >>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> > >>>>> [...]
> > >>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
> > >>>>>>> irq context unless I am mistaken.
> > >>>>>>>
> > >>>>>>
> > >>>>>> If I take the following example of syzbot's deadlock scenario then
> > >>>>>> CPU1 is the one freeing the hugetlb pages. It is in the process
> > >>>>>> context but has disabled softirqs (see __tcp_close()).
> > >>>>>>
> > >>>>>>         CPU0                    CPU1
> > >>>>>>         ----                    ----
> > >>>>>>    lock(hugetlb_lock);
> > >>>>>>                                 local_irq_disable();
> > >>>>>>                                 lock(slock-AF_INET);
> > >>>>>>                                 lock(hugetlb_lock);
> > >>>>>>    <Interrupt>
> > >>>>>>      lock(slock-AF_INET);
> > >>>>>>
> > > [...]
> > >>> Wouldn't something like this help? It is quite ugly but it would be
> > >>> simple enough and backportable while we come up with a more rigorous
> > >>> solution. What do you think?
> > >>>
> > >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> > >>> --- a/mm/hugetlb.c
> > >>> +++ b/mm/hugetlb.c
> > >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> > >>>  void free_huge_page(struct page *page)
> > >>>  {
> > >>>         /*
> > >>> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> > >>> +        * Defer freeing if in non-task context or when put_page is called
> > >>> +        * with IRQ disabled (e.g. via the TCP slock dependency chain) to
> > >>> +        * avoid hugetlb_lock deadlock.
> > >>>          */
> > >>> -       if (!in_task()) {
> > >>> +       if (!in_task() || irqs_disabled()) {
> > >>
> > >> Does irqs_disabled() also check softirqs?
> > > 
> > > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > > claims irq disabled to be the trigger. But now that you are mentioning
> > > that it would be better to replace in_task() along the way. We have
> > > discussed that in another email thread and I was suggesting to use
> > > in_atomic() which should catch also bh disabled situation. The big IF is
> > > that this needs preempt count to be enabled unconditionally. There are
> > > changes in the RCU tree heading that direction.
> > 
> > I have not been following developments in preemption and the RCU tree. 
> > The comment for in_atomic() says:
> > 
> > /*
> >  * Are we running in atomic context?  WARNING: this macro cannot
> >  * always detect atomic context; in particular, it cannot know about
> >  * held spinlocks in non-preemptible kernels.  Thus it should not be
> >  * used in the general case to determine whether sleeping is possible.
> >  * Do not use in_atomic() in driver code.
> >  */
> > 
> > That does seem to be the case.  I verified in_atomic can detect softirq
> > context even in non-preemptible kernels.  But, as the comment says it
> > will not detect a held spinlock in non-preemptible kernels.  So, I think
> > in_atomic would be better than the current check for !in_task.  That
> > would handle this syzbot issue, but we could still have issues if the
> > hugetlb put_page path is called while someone is holding a spinlock with
> > all interrupts enabled.  Looks like there is no way to detect this
> > today in non-preemptible kernels.  in_atomic does detect spinlocks held
> > in preemptible kernels.
> 
> Paul, what is the current plan for making in_atomic usable in !PREEMPT
> configurations?

Ah, thank you for the reminder!  I have rebased that series on top of
v5.12-rc1 on -rcu branch tglx-pc.2021.03.03a.

The current state is that Linus is not convinced that this change is
worthwhile given that only RCU and printk() want it.  (BPF would use
it if it were available, but is willing to live without it, at least in
the short term.)

But maybe Linus will be convinced given your additional use case.
Here is hoping!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-03 17:59                             ` Paul E. McKenney
@ 2021-03-04  9:58                               ` Michal Hocko
  0 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2021-03-04  9:58 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Mike Kravetz, Shakeel Butt, syzbot, Andrew Morton, LKML,
	Linux MM, syzkaller-bugs, Eric Dumazet, Mina Almasry, tglx,
	john.ogness, urezki, ast

On Wed 03-03-21 09:59:45, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 09:03:27AM +0100, Michal Hocko wrote:
[...]
> > Paul, what is the current plan for making in_atomic usable in !PREEMPT
> > configurations?
> 
> Ah, thank you for the reminder!  I have rebased that series on top of
> v5.12-rc1 on -rcu branch tglx-pc.2021.03.03a.
> 
> The current state is that Linus is not convinced that this change is
> worthwhile given that only RCU and printk() want it.  (BPF would use
> it if it were available, but is willing to live without it, at least in
> the short term.)
> 
> But maybe Linus will be convinced given your additional use case.
> Here is hoping!

Yes, the hugetlb freeing path would benefit from this. You can reference
this lockdep report (http://lkml.kernel.org/r/000000000000f1c03b05bc43aadc@google.com),
with the additional argument that making hugetlb_lock irq safe is a
larger undertaking and that we will need something reasonably backportable
for older kernels as well.
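
Just to illustrate why it is a larger undertaking: an irq safe
hugetlb_lock would mean converting each and every locked section along
these lines (sketch only):

	unsigned long flags;

	spin_lock_irqsave(&hugetlb_lock, flags);
	/* hstate accounting, free list manipulation etc. */
	spin_unlock_irqrestore(&hugetlb_lock, flags);

hugetlb_lock is taken in many places all over mm/hugetlb.c, and some of
those paths would need a closer look at what is currently done under the
lock, so this is not something I would like to rush into stable backports.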
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: possible deadlock in sk_clone_lock
  2021-03-03  3:59                             ` Shakeel Butt
  (?)
@ 2021-03-05  9:09                             ` Michal Hocko
  -1 siblings, 0 replies; 25+ messages in thread
From: Michal Hocko @ 2021-03-05  9:09 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Mike Kravetz, syzbot, Andrew Morton, LKML, Linux MM,
	syzkaller-bugs, Eric Dumazet, Mina Almasry

On Tue 02-03-21 19:59:22, Shakeel Butt wrote:
> On Tue, Mar 2, 2021 at 1:19 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 3/2/21 6:29 AM, Michal Hocko wrote:
> > > On Tue 02-03-21 06:11:51, Shakeel Butt wrote:
> > >> On Tue, Mar 2, 2021 at 1:44 AM Michal Hocko <mhocko@suse.com> wrote:
> > >>>
> > >>> On Mon 01-03-21 17:16:29, Mike Kravetz wrote:
> > >>>> On 3/1/21 9:23 AM, Michal Hocko wrote:
> > >>>>> On Mon 01-03-21 08:39:22, Shakeel Butt wrote:
> > >>>>>> On Mon, Mar 1, 2021 at 7:57 AM Michal Hocko <mhocko@suse.com> wrote:
> > >>>>> [...]
> > >>>>>>> Then how come this can ever be a problem? in_task() should exclude soft
> > >>>>>>> irq context unless I am mistaken.
> > >>>>>>>
> > >>>>>>
> > >>>>>> If I take the following example of syzbot's deadlock scenario then
> > >>>>>> CPU1 is the one freeing the hugetlb pages. It is in the process
> > >>>>>> context but has disabled softirqs (see __tcp_close()).
> > >>>>>>
> > >>>>>>         CPU0                    CPU1
> > >>>>>>         ----                    ----
> > >>>>>>    lock(hugetlb_lock);
> > >>>>>>                                 local_irq_disable();
> > >>>>>>                                 lock(slock-AF_INET);
> > >>>>>>                                 lock(hugetlb_lock);
> > >>>>>>    <Interrupt>
> > >>>>>>      lock(slock-AF_INET);
> > >>>>>>
> > > [...]
> > >>> Wouldn't something like this help? It is quite ugly but it would be
> > >>> simple enough and backportable while we come up with a more rigorous
> > >>> solution. What do you think?
> > >>>
> > >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > >>> index 4bdb58ab14cb..c9a8b39f678d 100644
> > >>> --- a/mm/hugetlb.c
> > >>> +++ b/mm/hugetlb.c
> > >>> @@ -1495,9 +1495,11 @@ static DECLARE_WORK(free_hpage_work, free_hpage_workfn);
> > >>>  void free_huge_page(struct page *page)
> > >>>  {
> > >>>         /*
> > >>> -        * Defer freeing if in non-task context to avoid hugetlb_lock deadlock.
> > >>> +        * Defer freeing if in non-task context or when put_page is called
> > >>> +        * with IRQ disabled (e.g. via the TCP slock dependency chain) to
> > >>> +        * avoid hugetlb_lock deadlock.
> > >>>          */
> > >>> -       if (!in_task()) {
> > >>> +       if (!in_task() || irqs_disabled()) {
> > >>
> > >> Does irqs_disabled() also check softirqs?
> > >
> > > Nope it doesn't AFAICS. I was referring to the above lockdep splat which
> > > claims irq disabled to be the trigger. But now that you are mentioning
> > > that it would be better to replace in_task() along the way. We have
> > > discussed that in another email thread and I was suggesting to use
> > > in_atomic() which should catch also bh disabled situation. The big IF is
> > > that this needs preempt count to be enabled unconditionally. There are
> > > changes in the RCU tree heading that direction.
> >
> > I have not been following developments in preemption and the RCU tree.
> > The comment for in_atomic() says:
> >
> > /*
> >  * Are we running in atomic context?  WARNING: this macro cannot
> >  * always detect atomic context; in particular, it cannot know about
> >  * held spinlocks in non-preemptible kernels.  Thus it should not be
> >  * used in the general case to determine whether sleeping is possible.
> >  * Do not use in_atomic() in driver code.
> >  */
> >
> > That does seem to be the case.  I verified in_atomic can detect softirq
> > context even in non-preemptible kernels.  But, as the comment says it
> > will not detect a held spinlock in non-preemptible kernels.  So, I think
> > in_atomic would be better than the current check for !in_task.  That
> > would handle this syzbot issue, but we could still have issues if the
> > hugetlb put_page path is called while someone is holding a spinlock with
> > all interrupts enabled.  Looks like there is no way to detect this
> > today in non-preemptible kernels.  in_atomic does detect spinlocks held
> > in preemptible kernels.
> >
> > I might suggest changing !in_task to in_atomic for now, and then work on
> > a more robust solution.  I'm afraid such a robust solution will
> > require considerable effort.  It would need to handle put_page being
> > called in any context: hardirq, softirq, spinlock held ...  The
> > put_page/free_huge_page path will need to offload (workqueue or
> > something else) any processing that can possibly sleep.
> >
> > Is it worth making the in_atomic change now, or should we just start
> > working on the more robust complete solution?
> 
> IMHO the change to in_atomic is beneficial because it will at least
> fix this specific issue. There is no reason to keep the users of TCP TX
> zerocopy from hugetlb pages broken while we wait for a more comprehensive solution.

Another option would be to select PREEMPT_COUNT when hugetlb is enabled.
That would reduce the dependency on the patch series I was talking about
in the other email. Not nice, but here we are...
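
Roughly the following in fs/Kconfig (untested; which config option the
select should hang off is debatable):

 config HUGETLBFS
 	bool "HugeTLB file system support"
+	select PREEMPT_COUNT

That would make in_atomic() reliable whenever hugetlb pages can exist, at
the cost of the extra preempt_count accounting for everybody who enables
hugetlbfs.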

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2021-03-05  9:10 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-26 21:08 possible deadlock in sk_clone_lock syzbot
2021-02-26 21:08 ` syzbot
2021-02-26 22:44 ` Shakeel Butt
2021-02-26 22:44   ` Shakeel Butt
2021-02-26 23:14   ` Mike Kravetz
2021-02-27  0:00     ` Shakeel Butt
2021-02-27  0:00       ` Shakeel Butt
2021-03-01 12:11       ` Michal Hocko
2021-03-01 15:10         ` Shakeel Butt
2021-03-01 15:10           ` Shakeel Butt
2021-03-01 15:57           ` Michal Hocko
2021-03-01 16:39             ` Shakeel Butt
2021-03-01 16:39               ` Shakeel Butt
2021-03-01 17:23               ` Michal Hocko
2021-03-02  1:16                 ` Mike Kravetz
2021-03-02  9:44                   ` Michal Hocko
2021-03-02 14:11                     ` Shakeel Butt
2021-03-02 14:29                       ` Michal Hocko
2021-03-02 21:19                         ` Mike Kravetz
2021-03-03  3:59                           ` Shakeel Butt
2021-03-03  3:59                             ` Shakeel Butt
2021-03-05  9:09                             ` Michal Hocko
2021-03-03  8:03                           ` Michal Hocko
2021-03-03 17:59                             ` Paul E. McKenney
2021-03-04  9:58                               ` Michal Hocko
