* WARNING in sk_stream_kill_queues (5) @ 2020-11-30 8:40 syzbot 2020-12-03 15:58 ` Marco Elver 0 siblings, 1 reply; 18+ messages in thread From: syzbot @ 2020-11-30 8:40 UTC (permalink / raw) To: akpm, davem, dvyukov, elver, glider, jannh, kuba, linux-kernel, netdev, sfr, syzkaller-bugs Hello, syzbot found the following issue on: HEAD commit: 6147c83f Add linux-next specific files for 20201126 git tree: linux-next console output: https://syzkaller.appspot.com/x/log.txt?x=117c9679500000 kernel config: https://syzkaller.appspot.com/x/.config?x=9b91566da897c24f dashboard link: https://syzkaller.appspot.com/bug?extid=7b99aafdcc2eedea6178 compiler: gcc (GCC) 10.1.0-syz 20200507 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103bf743500000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=167c60c9500000 The issue was bisected to: commit 145cd60fb481328faafba76842aa0fd242e2b163 Author: Alexander Potapenko <glider@google.com> Date: Tue Nov 24 05:38:44 2020 +0000 mm, kfence: insert KFENCE hooks for SLUB bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13abe5b3500000 final oops: https://syzkaller.appspot.com/x/report.txt?x=106be5b3500000 console output: https://syzkaller.appspot.com/x/log.txt?x=17abe5b3500000 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com Fixes: 145cd60fb481 ("mm, kfence: insert KFENCE hooks for SLUB") ------------[ cut here ]------------ WARNING: CPU: 0 PID: 11307 at net/core/stream.c:207 sk_stream_kill_queues+0x3c3/0x530 net/core/stream.c:207 Modules linked in: CPU: 0 PID: 11307 Comm: syz-executor673 Not tainted 5.10.0-rc5-next-20201126-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:sk_stream_kill_queues+0x3c3/0x530 net/core/stream.c:207 Code: 00 00 00 fc ff df 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e 63 01 00 00 8b ab 20 02 00 00 e9 60 ff ff ff e8 ad 24 7b fa <0f> 0b eb 97 e8 a4 24 7b fa 0f 0b eb a0 e8 9b 24 7b fa 0f 0b e9 a5 RSP: 0018:ffffc9000979f978 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 00000000fffffe80 RCX: ffffffff86f5877a RDX: ffff88801ebb5040 RSI: ffffffff86f587e3 RDI: 0000000000000005 RBP: 0000000000000180 R08: 0000000000000001 R09: ffffffff8ebd9817 R10: 0000000000000000 R11: 0000000000000001 R12: ffff8880182f3ce0 R13: ffffffff8fb178c0 R14: ffff8880182f3ae8 R15: ffff8880182f3c70 FS: 0000000000000000(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000004c7cd8 CR3: 000000000b08e000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: inet_csk_destroy_sock+0x1a5/0x490 net/ipv4/inet_connection_sock.c:885 __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2585 tcp_close+0x29/0xc0 net/ipv4/tcp.c:2597 inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 __sock_release+0xcd/0x280 net/socket.c:596 sock_close+0x18/0x20 net/socket.c:1255 __fput+0x283/0x920 fs/file_table.c:280 task_work_run+0xdd/0x190 kernel/task_work.c:140 exit_task_work include/linux/task_work.h:30 [inline] do_exit+0xb89/0x29e0 kernel/exit.c:823 do_group_exit+0x125/0x310 kernel/exit.c:920 get_signal+0x3ec/0x2010 kernel/signal.c:2770 arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 handle_signal_work kernel/entry/common.c:144 [inline] exit_to_user_mode_loop kernel/entry/common.c:168 [inline] exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:198 syscall_exit_to_user_mode+0x36/0x260 kernel/entry/common.c:275 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x44ea59 Code: Unable to access opcode bytes at RIP 0x44ea2f. RSP: 002b:00007fd1200f3d98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: 0000000000012255 RBX: 00000000006e6a18 RCX: 000000000044ea59 RDX: 00000001000001bd RSI: 00000000200001c0 RDI: 0000000000000003 RBP: 00000000006e6a10 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e6a1c R13: 3030303030303030 R14: 3030303030303d65 R15: 2b74d0dd4a6f722c --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller@googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection syzbot can test patches for this issue, for details see: https://goo.gl/tpsmEJ#testing-patches ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-11-30 8:40 WARNING in sk_stream_kill_queues (5) syzbot @ 2020-12-03 15:58 ` Marco Elver 2020-12-03 16:27 ` Eric Dumazet 0 siblings, 1 reply; 18+ messages in thread From: Marco Elver @ 2020-12-03 15:58 UTC (permalink / raw) To: netdev Cc: akpm, davem, dvyukov, glider, jannh, kuba, linux-kernel, sfr, syzkaller-bugs, edumazet, willemb, syzbot On Mon, Nov 30, 2020 at 12:40AM -0800, syzbot wrote: > Hello, > > syzbot found the following issue on: > > HEAD commit: 6147c83f Add linux-next specific files for 20201126 > git tree: linux-next > console output: https://syzkaller.appspot.com/x/log.txt?x=117c9679500000 > kernel config: https://syzkaller.appspot.com/x/.config?x=9b91566da897c24f > dashboard link: https://syzkaller.appspot.com/bug?extid=7b99aafdcc2eedea6178 > compiler: gcc (GCC) 10.1.0-syz 20200507 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103bf743500000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=167c60c9500000 > > The issue was bisected to: > > commit 145cd60fb481328faafba76842aa0fd242e2b163 > Author: Alexander Potapenko <glider@google.com> > Date: Tue Nov 24 05:38:44 2020 +0000 > > mm, kfence: insert KFENCE hooks for SLUB > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13abe5b3500000 > final oops: https://syzkaller.appspot.com/x/report.txt?x=106be5b3500000 > console output: https://syzkaller.appspot.com/x/log.txt?x=17abe5b3500000 > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com > Fixes: 145cd60fb481 ("mm, kfence: insert KFENCE hooks for SLUB") > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 11307 at net/core/stream.c:207 sk_stream_kill_queues+0x3c3/0x530 net/core/stream.c:207 [...] > Call Trace: > inet_csk_destroy_sock+0x1a5/0x490 net/ipv4/inet_connection_sock.c:885 > __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2585 > tcp_close+0x29/0xc0 net/ipv4/tcp.c:2597 > inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 > __sock_release+0xcd/0x280 net/socket.c:596 > sock_close+0x18/0x20 net/socket.c:1255 > __fput+0x283/0x920 fs/file_table.c:280 > task_work_run+0xdd/0x190 kernel/task_work.c:140 > exit_task_work include/linux/task_work.h:30 [inline] > do_exit+0xb89/0x29e0 kernel/exit.c:823 > do_group_exit+0x125/0x310 kernel/exit.c:920 > get_signal+0x3ec/0x2010 kernel/signal.c:2770 > arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 > handle_signal_work kernel/entry/common.c:144 [inline] > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:198 > syscall_exit_to_user_mode+0x36/0x260 kernel/entry/common.c:275 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 I've been debugging this and I think enabling KFENCE uncovered that some code is assuming that the following is always true: ksize(kmalloc(S)) == ksize(kmalloc(S)) but I don't think this assumption can be made (with or without KFENCE). With KFENCE, we actually end up testing no code assumes this, because KFENCE's ksize() always returns the exact size S. I have narrowed it down to sk_wmem_queued becoming <0 in sk_wmem_free_skb(). The skb passed to sk_wmem_free_skb() and whose truesize causes sk_wmem_queued to become negative is always allocated in: | kmem_cache_alloc_node+0x140/0x400 mm/slub.c:2939 | __alloc_skb+0x6d/0x710 net/core/skbuff.c:198 | alloc_skb_fclone include/linux/skbuff.h:1144 [inline] | sk_stream_alloc_skb+0x109/0xc30 net/ipv4/tcp.c:888 | tso_fragment net/ipv4/tcp_output.c:2124 [inline] | tcp_write_xmit+0x1dbf/0x5ce0 net/ipv4/tcp_output.c:2674 | __tcp_push_pending_frames+0xaa/0x390 net/ipv4/tcp_output.c:2866 | tcp_push_pending_frames include/net/tcp.h:1864 [inline] | tcp_data_snd_check net/ipv4/tcp_input.c:5374 [inline] | tcp_rcv_established+0x8c9/0x1eb0 net/ipv4/tcp_input.c:5869 | tcp_v4_do_rcv+0x5d1/0x870 net/ipv4/tcp_ipv4.c:1668 | sk_backlog_rcv include/net/sock.h:1011 [inline] | __release_sock+0x134/0x3a0 net/core/sock.c:2523 | release_sock+0x54/0x1b0 net/core/sock.c:3053 | sk_wait_data+0x177/0x450 net/core/sock.c:2565 | tcp_recvmsg+0x17ea/0x2aa0 net/ipv4/tcp.c:2181 | inet_recvmsg+0x11b/0x5d0 net/ipv4/af_inet.c:848 | sock_recvmsg_nosec net/socket.c:885 [inline] | sock_recvmsg net/socket.c:903 [inline] | sock_recvmsg net/socket.c:899 [inline] | ____sys_recvmsg+0x2c4/0x600 net/socket.c:2563 | ___sys_recvmsg+0x127/0x200 net/socket.c:2605 | __sys_recvmsg+0xe2/0x1a0 net/socket.c:2641 | do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 | entry_SYSCALL_64_after_hwframe+0x44/0xa9 I used the code below to add some warnings that helped narrow it down. Does any of this help explain the problem? Thanks, -- Marco ------ >8 ------ diff --git a/include/net/sock.h b/include/net/sock.h index e8d958ef3ea0..ef4837f3aba4 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -35,6 +35,7 @@ #ifndef _SOCK_H #define _SOCK_H +#include <linux/kfence.h> #include <linux/hardirq.h> #include <linux/kernel.h> #include <linux/list.h> @@ -1534,7 +1535,15 @@ static inline void sk_mem_uncharge(struct sock *sk, int size) DECLARE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key); static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) { + bool bad = false; + sk_wmem_queued_add(sk, -skb->truesize); + + if (WARN_ON(READ_ONCE(sk->sk_wmem_queued) == -384)) { + pr_info("wmem_queued=%d truesize=%u\n", sk->sk_wmem_queued, skb->truesize); + bad = true; + } + sk_mem_uncharge(sk, skb->truesize); if (static_branch_unlikely(&tcp_tx_skb_cache_key) && !sk->sk_tx_skb_cache && !skb_cloned(skb)) { @@ -1544,6 +1553,9 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) return; } __kfree_skb(skb); + + if (bad) + (void)READ_ONCE(skb->truesize); /* UAF to let KASAN show where it was allocated */ } static inline void sock_release_ownership(struct sock *sk) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index ffe3dcc0ebea..f365495819ee 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -208,6 +208,19 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, size = SKB_DATA_ALIGN(size); size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); + + if (is_kfence_address(data)) + pr_info("kfence's ksize: %zu\n", ksize(data)); + /* + * BUG BUG + * Hypothesis: The problem is that some code assumes that: + * + * ksize(kmalloc(S)) == ksize(kmalloc(S)) + * + * Note: If we force no KFENCE allocation for @data above, the warnings + * disappear. KFENCE's ksize() always returns the exact size S. + */ + if (!data) goto nodata; /* kmalloc(size) might give us more room than requested. ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 15:58 ` Marco Elver @ 2020-12-03 16:27 ` Eric Dumazet 2020-12-03 16:34 ` Marco Elver 0 siblings, 1 reply; 18+ messages in thread From: Eric Dumazet @ 2020-12-03 16:27 UTC (permalink / raw) To: Marco Elver Cc: netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, Dec 3, 2020 at 4:58 PM Marco Elver <elver@google.com> wrote: > > On Mon, Nov 30, 2020 at 12:40AM -0800, syzbot wrote: > > Hello, > > > > syzbot found the following issue on: > > > > HEAD commit: 6147c83f Add linux-next specific files for 20201126 > > git tree: linux-next > > console output: https://syzkaller.appspot.com/x/log.txt?x=117c9679500000 > > kernel config: https://syzkaller.appspot.com/x/.config?x=9b91566da897c24f > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b99aafdcc2eedea6178 > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103bf743500000 > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=167c60c9500000 > > > > The issue was bisected to: > > > > commit 145cd60fb481328faafba76842aa0fd242e2b163 > > Author: Alexander Potapenko <glider@google.com> > > Date: Tue Nov 24 05:38:44 2020 +0000 > > > > mm, kfence: insert KFENCE hooks for SLUB > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13abe5b3500000 > > final oops: https://syzkaller.appspot.com/x/report.txt?x=106be5b3500000 > > console output: https://syzkaller.appspot.com/x/log.txt?x=17abe5b3500000 > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com > > Fixes: 145cd60fb481 ("mm, kfence: insert KFENCE hooks for SLUB") > > > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 11307 at net/core/stream.c:207 sk_stream_kill_queues+0x3c3/0x530 net/core/stream.c:207 > [...] > > Call Trace: > > inet_csk_destroy_sock+0x1a5/0x490 net/ipv4/inet_connection_sock.c:885 > > __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2585 > > tcp_close+0x29/0xc0 net/ipv4/tcp.c:2597 > > inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 > > __sock_release+0xcd/0x280 net/socket.c:596 > > sock_close+0x18/0x20 net/socket.c:1255 > > __fput+0x283/0x920 fs/file_table.c:280 > > task_work_run+0xdd/0x190 kernel/task_work.c:140 > > exit_task_work include/linux/task_work.h:30 [inline] > > do_exit+0xb89/0x29e0 kernel/exit.c:823 > > do_group_exit+0x125/0x310 kernel/exit.c:920 > > get_signal+0x3ec/0x2010 kernel/signal.c:2770 > > arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 > > handle_signal_work kernel/entry/common.c:144 [inline] > > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > > exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:198 > > syscall_exit_to_user_mode+0x36/0x260 kernel/entry/common.c:275 > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > I've been debugging this and I think enabling KFENCE uncovered that some > code is assuming that the following is always true: > > ksize(kmalloc(S)) == ksize(kmalloc(S)) > I do not think we make this assumption. Each skb tracks the 'truesize' which is populated from __alloc_skb() using ksize(allocated head) . So if ksize() decides to give us random data, it should be still fine, because we use ksize(buff) only once at alloc skb time, and record the value in skb->truesize (only the socket buffer accounting would be off) > but I don't think this assumption can be made (with or without KFENCE). > > With KFENCE, we actually end up testing no code assumes this, because > KFENCE's ksize() always returns the exact size S. > > I have narrowed it down to sk_wmem_queued becoming <0 in > sk_wmem_free_skb(). > > The skb passed to sk_wmem_free_skb() and whose truesize causes > sk_wmem_queued to become negative is always allocated in: > > | kmem_cache_alloc_node+0x140/0x400 mm/slub.c:2939 > | __alloc_skb+0x6d/0x710 net/core/skbuff.c:198 > | alloc_skb_fclone include/linux/skbuff.h:1144 [inline] > | sk_stream_alloc_skb+0x109/0xc30 net/ipv4/tcp.c:888 > | tso_fragment net/ipv4/tcp_output.c:2124 [inline] > | tcp_write_xmit+0x1dbf/0x5ce0 net/ipv4/tcp_output.c:2674 > | __tcp_push_pending_frames+0xaa/0x390 net/ipv4/tcp_output.c:2866 > | tcp_push_pending_frames include/net/tcp.h:1864 [inline] > | tcp_data_snd_check net/ipv4/tcp_input.c:5374 [inline] > | tcp_rcv_established+0x8c9/0x1eb0 net/ipv4/tcp_input.c:5869 > | tcp_v4_do_rcv+0x5d1/0x870 net/ipv4/tcp_ipv4.c:1668 > | sk_backlog_rcv include/net/sock.h:1011 [inline] > | __release_sock+0x134/0x3a0 net/core/sock.c:2523 > | release_sock+0x54/0x1b0 net/core/sock.c:3053 > | sk_wait_data+0x177/0x450 net/core/sock.c:2565 > | tcp_recvmsg+0x17ea/0x2aa0 net/ipv4/tcp.c:2181 > | inet_recvmsg+0x11b/0x5d0 net/ipv4/af_inet.c:848 > | sock_recvmsg_nosec net/socket.c:885 [inline] > | sock_recvmsg net/socket.c:903 [inline] > | sock_recvmsg net/socket.c:899 [inline] > | ____sys_recvmsg+0x2c4/0x600 net/socket.c:2563 > | ___sys_recvmsg+0x127/0x200 net/socket.c:2605 > | __sys_recvmsg+0xe2/0x1a0 net/socket.c:2641 > | do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > | entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > I used the code below to add some warnings that helped narrow it down. > > Does any of this help explain the problem? Not yet :) tso_fragment() transfers some payload from one skb to another, and properly shifts this amount from src skb to dst skb (@buff) : sk_wmem_queued_add(sk, buff->truesize); buff->truesize += nlen; skb->truesize -= nlen; > > Thanks, > -- Marco > > ------ >8 ------ > > > diff --git a/include/net/sock.h b/include/net/sock.h > index e8d958ef3ea0..ef4837f3aba4 100644 > --- a/include/net/sock.h > +++ b/include/net/sock.h > @@ -35,6 +35,7 @@ > #ifndef _SOCK_H > #define _SOCK_H > > +#include <linux/kfence.h> > #include <linux/hardirq.h> > #include <linux/kernel.h> > #include <linux/list.h> > @@ -1534,7 +1535,15 @@ static inline void sk_mem_uncharge(struct sock *sk, int size) > DECLARE_STATIC_KEY_FALSE(tcp_tx_skb_cache_key); > static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) > { > + bool bad = false; > + > sk_wmem_queued_add(sk, -skb->truesize); > + > + if (WARN_ON(READ_ONCE(sk->sk_wmem_queued) == -384)) { > + pr_info("wmem_queued=%d truesize=%u\n", sk->sk_wmem_queued, skb->truesize); > + bad = true; > + } > + > sk_mem_uncharge(sk, skb->truesize); > if (static_branch_unlikely(&tcp_tx_skb_cache_key) && > !sk->sk_tx_skb_cache && !skb_cloned(skb)) { > @@ -1544,6 +1553,9 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb) > return; > } > __kfree_skb(skb); > + > + if (bad) > + (void)READ_ONCE(skb->truesize); /* UAF to let KASAN show where it was allocated */ > } > > static inline void sock_release_ownership(struct sock *sk) > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index ffe3dcc0ebea..f365495819ee 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -208,6 +208,19 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > size = SKB_DATA_ALIGN(size); > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > + > + if (is_kfence_address(data)) > + pr_info("kfence's ksize: %zu\n", ksize(data)); > + /* > + * BUG BUG > + * Hypothesis: The problem is that some code assumes that: > + * > + * ksize(kmalloc(S)) == ksize(kmalloc(S)) > + * > + * Note: If we force no KFENCE allocation for @data above, the warnings > + * disappear. KFENCE's ksize() always returns the exact size S. > + */ > + > if (!data) > goto nodata; > /* kmalloc(size) might give us more room than requested. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 16:27 ` Eric Dumazet @ 2020-12-03 16:34 ` Marco Elver 2020-12-03 16:42 ` Eric Dumazet 0 siblings, 1 reply; 18+ messages in thread From: Marco Elver @ 2020-12-03 16:34 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, 3 Dec 2020 at 17:27, Eric Dumazet <edumazet@google.com> wrote: > On Thu, Dec 3, 2020 at 4:58 PM Marco Elver <elver@google.com> wrote: > > > > On Mon, Nov 30, 2020 at 12:40AM -0800, syzbot wrote: > > > Hello, > > > > > > syzbot found the following issue on: > > > > > > HEAD commit: 6147c83f Add linux-next specific files for 20201126 > > > git tree: linux-next > > > console output: https://syzkaller.appspot.com/x/log.txt?x=117c9679500000 > > > kernel config: https://syzkaller.appspot.com/x/.config?x=9b91566da897c24f > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b99aafdcc2eedea6178 > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103bf743500000 > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=167c60c9500000 > > > > > > The issue was bisected to: > > > > > > commit 145cd60fb481328faafba76842aa0fd242e2b163 > > > Author: Alexander Potapenko <glider@google.com> > > > Date: Tue Nov 24 05:38:44 2020 +0000 > > > > > > mm, kfence: insert KFENCE hooks for SLUB > > > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13abe5b3500000 > > > final oops: https://syzkaller.appspot.com/x/report.txt?x=106be5b3500000 > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17abe5b3500000 > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com > > > Fixes: 145cd60fb481 ("mm, kfence: insert KFENCE hooks for SLUB") > > > > > > ------------[ cut here ]------------ > > > WARNING: CPU: 0 PID: 11307 at net/core/stream.c:207 sk_stream_kill_queues+0x3c3/0x530 net/core/stream.c:207 > > [...] > > > Call Trace: > > > inet_csk_destroy_sock+0x1a5/0x490 net/ipv4/inet_connection_sock.c:885 > > > __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2585 > > > tcp_close+0x29/0xc0 net/ipv4/tcp.c:2597 > > > inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 > > > __sock_release+0xcd/0x280 net/socket.c:596 > > > sock_close+0x18/0x20 net/socket.c:1255 > > > __fput+0x283/0x920 fs/file_table.c:280 > > > task_work_run+0xdd/0x190 kernel/task_work.c:140 > > > exit_task_work include/linux/task_work.h:30 [inline] > > > do_exit+0xb89/0x29e0 kernel/exit.c:823 > > > do_group_exit+0x125/0x310 kernel/exit.c:920 > > > get_signal+0x3ec/0x2010 kernel/signal.c:2770 > > > arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 > > > handle_signal_work kernel/entry/common.c:144 [inline] > > > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > > > exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:198 > > > syscall_exit_to_user_mode+0x36/0x260 kernel/entry/common.c:275 > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > I've been debugging this and I think enabling KFENCE uncovered that some > > code is assuming that the following is always true: > > > > ksize(kmalloc(S)) == ksize(kmalloc(S)) > > > > > I do not think we make this assumption. > > Each skb tracks the 'truesize' which is populated from __alloc_skb() > using ksize(allocated head) . > > So if ksize() decides to give us random data, it should be still fine, > because we use ksize(buff) only once at alloc skb time, and record the > value in skb->truesize > (only the socket buffer accounting would be off) Good, thanks for clarifying. So something else must be off then. > > but I don't think this assumption can be made (with or without KFENCE). > > > > With KFENCE, we actually end up testing no code assumes this, because > > KFENCE's ksize() always returns the exact size S. > > > > I have narrowed it down to sk_wmem_queued becoming <0 in > > sk_wmem_free_skb(). > > > > The skb passed to sk_wmem_free_skb() and whose truesize causes > > sk_wmem_queued to become negative is always allocated in: > > > > | kmem_cache_alloc_node+0x140/0x400 mm/slub.c:2939 > > | __alloc_skb+0x6d/0x710 net/core/skbuff.c:198 > > | alloc_skb_fclone include/linux/skbuff.h:1144 [inline] > > | sk_stream_alloc_skb+0x109/0xc30 net/ipv4/tcp.c:888 > > | tso_fragment net/ipv4/tcp_output.c:2124 [inline] > > | tcp_write_xmit+0x1dbf/0x5ce0 net/ipv4/tcp_output.c:2674 > > | __tcp_push_pending_frames+0xaa/0x390 net/ipv4/tcp_output.c:2866 > > | tcp_push_pending_frames include/net/tcp.h:1864 [inline] > > | tcp_data_snd_check net/ipv4/tcp_input.c:5374 [inline] > > | tcp_rcv_established+0x8c9/0x1eb0 net/ipv4/tcp_input.c:5869 > > | tcp_v4_do_rcv+0x5d1/0x870 net/ipv4/tcp_ipv4.c:1668 > > | sk_backlog_rcv include/net/sock.h:1011 [inline] > > | __release_sock+0x134/0x3a0 net/core/sock.c:2523 > > | release_sock+0x54/0x1b0 net/core/sock.c:3053 > > | sk_wait_data+0x177/0x450 net/core/sock.c:2565 > > | tcp_recvmsg+0x17ea/0x2aa0 net/ipv4/tcp.c:2181 > > | inet_recvmsg+0x11b/0x5d0 net/ipv4/af_inet.c:848 > > | sock_recvmsg_nosec net/socket.c:885 [inline] > > | sock_recvmsg net/socket.c:903 [inline] > > | sock_recvmsg net/socket.c:899 [inline] > > | ____sys_recvmsg+0x2c4/0x600 net/socket.c:2563 > > | ___sys_recvmsg+0x127/0x200 net/socket.c:2605 > > | __sys_recvmsg+0xe2/0x1a0 net/socket.c:2641 > > | do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 > > | entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > I used the code below to add some warnings that helped narrow it down. > > > > Does any of this help explain the problem? > > Not yet :) Damn... > tso_fragment() transfers some payload from one skb to another, and > properly shifts this amount from src skb to dst skb (@buff) : > > sk_wmem_queued_add(sk, buff->truesize); > buff->truesize += nlen; > skb->truesize -= nlen; Any other stacktraces that might help? Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 16:34 ` Marco Elver @ 2020-12-03 16:42 ` Eric Dumazet 2020-12-03 17:41 ` Marco Elver 0 siblings, 1 reply; 18+ messages in thread From: Eric Dumazet @ 2020-12-03 16:42 UTC (permalink / raw) To: Marco Elver Cc: netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, Dec 3, 2020 at 5:34 PM Marco Elver <elver@google.com> wrote: > > On Thu, 3 Dec 2020 at 17:27, Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Dec 3, 2020 at 4:58 PM Marco Elver <elver@google.com> wrote: > > > > > > On Mon, Nov 30, 2020 at 12:40AM -0800, syzbot wrote: > > > > Hello, > > > > > > > > syzbot found the following issue on: > > > > > > > > HEAD commit: 6147c83f Add linux-next specific files for 20201126 > > > > git tree: linux-next > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=117c9679500000 > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=9b91566da897c24f > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b99aafdcc2eedea6178 > > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103bf743500000 > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=167c60c9500000 > > > > > > > > The issue was bisected to: > > > > > > > > commit 145cd60fb481328faafba76842aa0fd242e2b163 > > > > Author: Alexander Potapenko <glider@google.com> > > > > Date: Tue Nov 24 05:38:44 2020 +0000 > > > > > > > > mm, kfence: insert KFENCE hooks for SLUB > > > > > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13abe5b3500000 > > > > final oops: https://syzkaller.appspot.com/x/report.txt?x=106be5b3500000 > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17abe5b3500000 > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com > > > > Fixes: 145cd60fb481 ("mm, kfence: insert KFENCE hooks for SLUB") > > > > > > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 0 PID: 11307 at net/core/stream.c:207 sk_stream_kill_queues+0x3c3/0x530 net/core/stream.c:207 > > > [...] > > > > Call Trace: > > > > inet_csk_destroy_sock+0x1a5/0x490 net/ipv4/inet_connection_sock.c:885 > > > > __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2585 > > > > tcp_close+0x29/0xc0 net/ipv4/tcp.c:2597 > > > > inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 > > > > __sock_release+0xcd/0x280 net/socket.c:596 > > > > sock_close+0x18/0x20 net/socket.c:1255 > > > > __fput+0x283/0x920 fs/file_table.c:280 > > > > task_work_run+0xdd/0x190 kernel/task_work.c:140 > > > > exit_task_work include/linux/task_work.h:30 [inline] > > > > do_exit+0xb89/0x29e0 kernel/exit.c:823 > > > > do_group_exit+0x125/0x310 kernel/exit.c:920 > > > > get_signal+0x3ec/0x2010 kernel/signal.c:2770 > > > > arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 > > > > handle_signal_work kernel/entry/common.c:144 [inline] > > > > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > > > > exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:198 > > > > syscall_exit_to_user_mode+0x36/0x260 kernel/entry/common.c:275 > > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > I've been debugging this and I think enabling KFENCE uncovered that some > > > code is assuming that the following is always true: > > > > > > ksize(kmalloc(S)) == ksize(kmalloc(S)) > > > > > > > > > I do not think we make this assumption. > > > > Each skb tracks the 'truesize' which is populated from __alloc_skb() > > using ksize(allocated head) . > > > > So if ksize() decides to give us random data, it should be still fine, > > because we use ksize(buff) only once at alloc skb time, and record the > > value in skb->truesize > > (only the socket buffer accounting would be off) > > Good, thanks for clarifying. So something else must be off then. Actually we might have the following assumption : buff = kmalloc(size, GFP...) if (buff) ASSERT(ksize(buff) >= size) So obviously ksize() should not be completely random ;) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 16:42 ` Eric Dumazet @ 2020-12-03 17:41 ` Marco Elver 2020-12-03 18:01 ` Eric Dumazet 0 siblings, 1 reply; 18+ messages in thread From: Marco Elver @ 2020-12-03 17:41 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, Dec 03, 2020 at 05:42PM +0100, Eric Dumazet wrote: > On Thu, Dec 3, 2020 at 5:34 PM Marco Elver <elver@google.com> wrote: > > > > On Thu, 3 Dec 2020 at 17:27, Eric Dumazet <edumazet@google.com> wrote: > > > On Thu, Dec 3, 2020 at 4:58 PM Marco Elver <elver@google.com> wrote: > > > > > > > > On Mon, Nov 30, 2020 at 12:40AM -0800, syzbot wrote: > > > > > Hello, > > > > > > > > > > syzbot found the following issue on: > > > > > > > > > > HEAD commit: 6147c83f Add linux-next specific files for 20201126 > > > > > git tree: linux-next > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=117c9679500000 > > > > > kernel config: https://syzkaller.appspot.com/x/.config?x=9b91566da897c24f > > > > > dashboard link: https://syzkaller.appspot.com/bug?extid=7b99aafdcc2eedea6178 > > > > > compiler: gcc (GCC) 10.1.0-syz 20200507 > > > > > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=103bf743500000 > > > > > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=167c60c9500000 > > > > > > > > > > The issue was bisected to: > > > > > > > > > > commit 145cd60fb481328faafba76842aa0fd242e2b163 > > > > > Author: Alexander Potapenko <glider@google.com> > > > > > Date: Tue Nov 24 05:38:44 2020 +0000 > > > > > > > > > > mm, kfence: insert KFENCE hooks for SLUB > > > > > > > > > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=13abe5b3500000 > > > > > final oops: https://syzkaller.appspot.com/x/report.txt?x=106be5b3500000 > > > > > console output: https://syzkaller.appspot.com/x/log.txt?x=17abe5b3500000 > > > > > > > > > > IMPORTANT: if you fix the issue, please add the following tag to the commit: > > > > > Reported-by: syzbot+7b99aafdcc2eedea6178@syzkaller.appspotmail.com > > > > > Fixes: 145cd60fb481 ("mm, kfence: insert KFENCE hooks for SLUB") > > > > > > > > > > ------------[ cut here ]------------ > > > > > WARNING: CPU: 0 PID: 11307 at net/core/stream.c:207 sk_stream_kill_queues+0x3c3/0x530 net/core/stream.c:207 > > > > [...] > > > > > Call Trace: > > > > > inet_csk_destroy_sock+0x1a5/0x490 net/ipv4/inet_connection_sock.c:885 > > > > > __tcp_close+0xd3e/0x1170 net/ipv4/tcp.c:2585 > > > > > tcp_close+0x29/0xc0 net/ipv4/tcp.c:2597 > > > > > inet_release+0x12e/0x280 net/ipv4/af_inet.c:431 > > > > > __sock_release+0xcd/0x280 net/socket.c:596 > > > > > sock_close+0x18/0x20 net/socket.c:1255 > > > > > __fput+0x283/0x920 fs/file_table.c:280 > > > > > task_work_run+0xdd/0x190 kernel/task_work.c:140 > > > > > exit_task_work include/linux/task_work.h:30 [inline] > > > > > do_exit+0xb89/0x29e0 kernel/exit.c:823 > > > > > do_group_exit+0x125/0x310 kernel/exit.c:920 > > > > > get_signal+0x3ec/0x2010 kernel/signal.c:2770 > > > > > arch_do_signal_or_restart+0x2a8/0x1eb0 arch/x86/kernel/signal.c:811 > > > > > handle_signal_work kernel/entry/common.c:144 [inline] > > > > > exit_to_user_mode_loop kernel/entry/common.c:168 [inline] > > > > > exit_to_user_mode_prepare+0x124/0x200 kernel/entry/common.c:198 > > > > > syscall_exit_to_user_mode+0x36/0x260 kernel/entry/common.c:275 > > > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > > > > > I've been debugging this and I think enabling KFENCE uncovered that some > > > > code is assuming that the following is always true: > > > > > > > > ksize(kmalloc(S)) == ksize(kmalloc(S)) > > > > > > > > > > > > > I do not think we make this assumption. > > > > > > Each skb tracks the 'truesize' which is populated from __alloc_skb() > > > using ksize(allocated head) . > > > > > > So if ksize() decides to give us random data, it should be still fine, > > > because we use ksize(buff) only once at alloc skb time, and record the > > > value in skb->truesize > > > (only the socket buffer accounting would be off) > > > > Good, thanks for clarifying. So something else must be off then. > > Actually we might have the following assumption : > > buff = kmalloc(size, GFP...) > if (buff) > ASSERT(ksize(buff) >= size) > > So obviously ksize() should not be completely random ;) One more experiment -- simply adding --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, */ size = SKB_DATA_ALIGN(size); size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); + size = 1 << kmalloc_index(size); /* HACK */ data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); also got rid of the warnings. Something must be off with some value that is computed in terms of ksize(). If not, I don't have any explanation for why the above hides the problem. Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 17:41 ` Marco Elver @ 2020-12-03 18:01 ` Eric Dumazet 2020-12-07 15:30 ` Marco Elver ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Eric Dumazet @ 2020-12-03 18:01 UTC (permalink / raw) To: Marco Elver, Eric Dumazet Cc: netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On 12/3/20 6:41 PM, Marco Elver wrote: > One more experiment -- simply adding > > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > */ > size = SKB_DATA_ALIGN(size); > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > + size = 1 << kmalloc_index(size); /* HACK */ > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > > > also got rid of the warnings. Something must be off with some value that > is computed in terms of ksize(). If not, I don't have any explanation > for why the above hides the problem. Maybe the implementations of various macros (SKB_DATA_ALIGN and friends) hae some kind of assumptions, I will double check this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 18:01 ` Eric Dumazet @ 2020-12-07 15:30 ` Marco Elver 2020-12-07 16:28 ` Marco Elver 2020-12-08 19:06 ` Marco Elver 2 siblings, 0 replies; 18+ messages in thread From: Marco Elver @ 2020-12-07 15:30 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, Dec 03, 2020 at 07:01PM +0100, Eric Dumazet wrote: > On 12/3/20 6:41 PM, Marco Elver wrote: > > > One more experiment -- simply adding > > > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > */ > > size = SKB_DATA_ALIGN(size); > > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + size = 1 << kmalloc_index(size); /* HACK */ > > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > > > > > > also got rid of the warnings. Something must be off with some value that > > is computed in terms of ksize(). If not, I don't have any explanation > > for why the above hides the problem. > > Maybe the implementations of various macros (SKB_DATA_ALIGN and friends) > hae some kind of assumptions, I will double check this. I looked at some of these macros and am wondering why SKB_TRUESIZE() uses SKB_DATA_ALIGN(sizeof(struct sk_buff)). Because I don't understand how the memcaches that allocate sk_buff are aligned or somehow always return SKB_DATA_ALIGN(sizeof(struct sk_buff)) sized objects -- a simple BUG_ON(ksize(skb) != SKB_DATA_ALIGN(sizeof(struct sk_buff))) triggers. Alas, doing something like: --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -235,7 +235,7 @@ /* return minimum truesize of one skb containing X bytes of data */ #define SKB_TRUESIZE(X) ((X) + \ - SKB_DATA_ALIGN(sizeof(struct sk_buff)) + \ + sizeof(struct sk_buff) + \ SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) does not fix the problem. Still trying to debug, because I don't want this to block the SLUB enablement of KFENCE, even if it turns out it's not KFENCE. :-/ Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 18:01 ` Eric Dumazet 2020-12-07 15:30 ` Marco Elver @ 2020-12-07 16:28 ` Marco Elver 2020-12-08 19:06 ` Marco Elver 2 siblings, 0 replies; 18+ messages in thread From: Marco Elver @ 2020-12-07 16:28 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, Dec 03, 2020 at 07:01PM +0100, Eric Dumazet wrote: > On 12/3/20 6:41 PM, Marco Elver wrote: > > > One more experiment -- simply adding > > > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > */ > > size = SKB_DATA_ALIGN(size); > > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + size = 1 << kmalloc_index(size); /* HACK */ > > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > > > > > > also got rid of the warnings. Something must be off with some value that > > is computed in terms of ksize(). If not, I don't have any explanation > > for why the above hides the problem. > > Maybe the implementations of various macros (SKB_DATA_ALIGN and friends) > hae some kind of assumptions, I will double check this. Some more data; removing all uses of ksize() fixes the warnings: | --- a/net/core/skbuff.c | +++ b/net/core/skbuff.c | @@ -214,7 +214,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, | * Put skb_shared_info exactly at the end of allocated zone, | * to allow max possible filling before reallocation. | */ | - size = SKB_WITH_OVERHEAD(ksize(data)); | + size = SKB_WITH_OVERHEAD(size); | prefetchw(data + size); | | /* | @@ -1628,7 +1628,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, | gfp_mask, NUMA_NO_NODE, NULL); | if (!data) | goto nodata; | - size = SKB_WITH_OVERHEAD(ksize(data)); | + size = SKB_WITH_OVERHEAD(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); ^^ Reverting *only* this to 'ksize(data)' triggers the warning. | /* Copy only real data... and, alas, header. This should be | * optimized for the cases when header is void. | @@ -5901,7 +5901,7 @@ static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off, | if (!data) | return -ENOMEM; | | - size = SKB_WITH_OVERHEAD(ksize(data)); | + size = SKB_WITH_OVERHEAD(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); | | /* Copy real data, and all frags */ | skb_copy_from_linear_data_offset(skb, off, data, new_hlen); | @@ -6025,7 +6025,7 @@ static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off, | if (!data) | return -ENOMEM; | | - size = SKB_WITH_OVERHEAD(ksize(data)); | + size = SKB_WITH_OVERHEAD(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); | | memcpy((struct skb_shared_info *)(data + size), | skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0])); Conversely, only doing this also fixes the warnings: | --- a/net/core/skbuff.c | +++ b/net/core/skbuff.c | @@ -1628,7 +1628,7 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, | gfp_mask, NUMA_NO_NODE, NULL); | if (!data) | goto nodata; | - size = SKB_WITH_OVERHEAD(ksize(data)); | + size = SKB_WITH_OVERHEAD(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); | | /* Copy only real data... and, alas, header. This should be | * optimized for the cases when header is void. But not sure if any of this is helpful, since in the end what we want is to make a bunch of subtractions reach precisely 0, and any deviation somewhere might, by chance, achieve that. Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-03 18:01 ` Eric Dumazet 2020-12-07 15:30 ` Marco Elver 2020-12-07 16:28 ` Marco Elver @ 2020-12-08 19:06 ` Marco Elver 2020-12-09 12:47 ` Marco Elver 2 siblings, 1 reply; 18+ messages in thread From: Marco Elver @ 2020-12-08 19:06 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, 3 Dec 2020 at 19:01, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On 12/3/20 6:41 PM, Marco Elver wrote: > > > One more experiment -- simply adding > > > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > */ > > size = SKB_DATA_ALIGN(size); > > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + size = 1 << kmalloc_index(size); /* HACK */ > > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > > > > > > also got rid of the warnings. Something must be off with some value that > > is computed in terms of ksize(). If not, I don't have any explanation > > for why the above hides the problem. > > Maybe the implementations of various macros (SKB_DATA_ALIGN and friends) > hae some kind of assumptions, I will double check this. If I force kfence to return 4K sized allocations for everything, the warnings remain. That might suggest that it's not due to a missed ALIGN. Is it possible that copies or moves are a problem? E.g. we copy something from kfence -> non-kfence object (or vice-versa), and ksize() no longer matches, then things go wrong? Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-08 19:06 ` Marco Elver @ 2020-12-09 12:47 ` Marco Elver 2020-12-10 16:51 ` Marco Elver 0 siblings, 1 reply; 18+ messages in thread From: Marco Elver @ 2020-12-09 12:47 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot [-- Attachment #1: Type: text/plain, Size: 1635 bytes --] On Tue, Dec 08, 2020 at 08:06PM +0100, Marco Elver wrote: > On Thu, 3 Dec 2020 at 19:01, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > On 12/3/20 6:41 PM, Marco Elver wrote: > > > > > One more experiment -- simply adding > > > > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > > */ > > > size = SKB_DATA_ALIGN(size); > > > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > + size = 1 << kmalloc_index(size); /* HACK */ > > > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > > > > > > > > > also got rid of the warnings. Something must be off with some value that > > > is computed in terms of ksize(). If not, I don't have any explanation > > > for why the above hides the problem. > > > > Maybe the implementations of various macros (SKB_DATA_ALIGN and friends) > > hae some kind of assumptions, I will double check this. > > If I force kfence to return 4K sized allocations for everything, the > warnings remain. That might suggest that it's not due to a missed > ALIGN. > > Is it possible that copies or moves are a problem? E.g. we copy > something from kfence -> non-kfence object (or vice-versa), and > ksize() no longer matches, then things go wrong? I was able to narrow it down to allocations of size 640. I also narrowed it down to 5 allocations that go through kfence that start triggering the issue. I have attached the list of those 5 allocations with allocation + free stacks. I'll try to go through them, maybe I get lucky eventually. :-) Thanks, -- Marco [-- Attachment #2: suspect-allocations.log --] [-- Type: text/plain, Size: 18154 bytes --] kfence-#0 [0xffff888436814000-0xffff88843681427f, size=640, cache=kmalloc-1k] allocated by task 5298: __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0xb8/0x3f0 net/core/skbuff.c:210 alloc_skb_fclone include/linux/skbuff.h:1144 [inline] sk_stream_alloc_skb+0xd3/0x650 net/ipv4/tcp.c:888 tcp_fragment+0x124/0xac0 net/ipv4/tcp_output.c:1569 __tcp_retransmit_skb+0x5e3/0x1770 net/ipv4/tcp_output.c:3183 tcp_retransmit_skb+0x2a/0x200 net/ipv4/tcp_output.c:3257 tcp_retransmit_timer+0x958/0x1a60 net/ipv4/tcp_timer.c:527 tcp_write_timer_handler+0x4a6/0x5d0 net/ipv4/tcp_timer.c:610 tcp_write_timer+0x86/0x270 net/ipv4/tcp_timer.c:630 call_timer_fn+0x145/0x510 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x519/0x680 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0x6f/0x110 kernel/time/timer.c:1744 __do_softirq+0x132/0x40b kernel/softirq.c:298 asm_call_irq_on_stack+0xf/0x20 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x58/0x70 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:393 [inline] __irq_exit_rcu kernel/softirq.c:423 [inline] irq_exit_rcu+0xcd/0x110 kernel/softirq.c:435 sysvec_apic_timer_interrupt+0x38/0xd0 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 native_restore_fl arch/x86/include/asm/irqflags.h:41 [inline] arch_local_irq_restore arch/x86/include/asm/irqflags.h:84 [inline] __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] _raw_spin_unlock_irqrestore+0x20/0x40 kernel/locking/spinlock.c:191 spin_unlock_irqrestore include/linux/spinlock.h:409 [inline] free_debug_processing+0x1fc/0x2e0 mm/slub.c:1255 __slab_free+0x130/0x5b0 mm/slub.c:2991 do_slab_free mm/slub.c:3145 [inline] slab_free mm/slub.c:3158 [inline] kfree+0x532/0x580 mm/slub.c:4156 tomoyo_realpath_from_path+0x12b/0x3d0 security/tomoyo/realpath.c:291 tomoyo_get_realpath security/tomoyo/file.c:151 [inline] tomoyo_path_number_perm+0x11d/0x420 security/tomoyo/file.c:723 tomoyo_path_chmod+0x21/0x30 security/tomoyo/tomoyo.c:342 security_path_chmod+0x87/0xc0 security/security.c:1152 chmod_common+0xbd/0x280 fs/open.c:578 vfs_fchmod fs/open.c:598 [inline] __do_sys_fchmod fs/open.c:607 [inline] __se_sys_fchmod fs/open.c:601 [inline] __x64_sys_fchmod+0xaa/0x100 fs/open.c:601 do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 freed by task 12099: skb_free_head net/core/skbuff.c:595 [inline] skb_release_data+0x499/0x4e0 net/core/skbuff.c:615 skb_release_all net/core/skbuff.c:669 [inline] __kfree_skb+0x34/0x50 net/core/skbuff.c:683 sk_wmem_free_skb include/net/sock.h:1546 [inline] tcp_rtx_queue_unlink_and_free include/net/tcp.h:1856 [inline] tcp_clean_rtx_queue net/ipv4/tcp_input.c:3251 [inline] tcp_ack+0x124a/0x3450 net/ipv4/tcp_input.c:3795 tcp_rcv_established+0x367/0x10b0 net/ipv4/tcp_input.c:5858 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 sk_backlog_rcv include/net/sock.h:1010 [inline] __release_sock+0xd7/0x260 net/core/sock.c:2523 release_sock+0x40/0x120 net/core/sock.c:3053 sk_wait_data+0x127/0x2b0 net/core/sock.c:2565 tcp_recvmsg+0x1106/0x1b60 net/ipv4/tcp.c:2181 inet_recvmsg+0xb1/0x270 net/ipv4/af_inet.c:848 sock_recvmsg_nosec net/socket.c:885 [inline] sock_recvmsg net/socket.c:903 [inline] sock_recvmsg net/socket.c:899 [inline] ____sys_recvmsg+0x2fd/0x3a0 net/socket.c:2563 ___sys_recvmsg+0xd9/0x1b0 net/socket.c:2605 __sys_recvmsg+0x8b/0x130 net/socket.c:2641 __do_sys_recvmsg net/socket.c:2651 [inline] __se_sys_recvmsg net/socket.c:2648 [inline] __x64_sys_recvmsg+0x43/0x50 net/socket.c:2648 do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 --------------------------------- kfence-#1 [0xffff888436816000-0xffff88843681627f, size=640, cache=kmalloc-1k] allocated by task 29: __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0xb8/0x3f0 net/core/skbuff.c:210 alloc_skb_fclone include/linux/skbuff.h:1144 [inline] sk_stream_alloc_skb+0xd3/0x650 net/ipv4/tcp.c:888 tcp_fragment+0x124/0xac0 net/ipv4/tcp_output.c:1569 __tcp_retransmit_skb+0x5e3/0x1770 net/ipv4/tcp_output.c:3183 tcp_retransmit_skb+0x2a/0x200 net/ipv4/tcp_output.c:3257 tcp_xmit_retransmit_queue.part.0+0x389/0x6f0 net/ipv4/tcp_output.c:3339 tcp_xmit_retransmit_queue+0x36/0x40 net/ipv4/tcp_output.c:3293 tcp_xmit_recovery net/ipv4/tcp_input.c:3652 [inline] tcp_xmit_recovery+0x64/0xe0 net/ipv4/tcp_input.c:3638 tcp_ack+0x1a60/0x3450 net/ipv4/tcp_input.c:3825 tcp_rcv_established+0x367/0x10b0 net/ipv4/tcp_input.c:5858 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 tcp_v4_rcv+0x1e29/0x20c0 net/ipv4/tcp_ipv4.c:2050 ip_protocol_deliver_rcu+0x31/0x4f0 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x111/0x150 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip_local_deliver+0x244/0x250 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:447 [inline] ip_rcv_finish+0x14a/0x1d0 net/ipv4/ip_input.c:428 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip_rcv+0x1c4/0x1d0 net/ipv4/ip_input.c:539 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5316 __netif_receive_skb+0x29/0xe0 net/core/dev.c:5430 process_backlog+0x169/0x350 net/core/dev.c:6320 napi_poll net/core/dev.c:6764 [inline] net_rx_action+0x326/0xa30 net/core/dev.c:6834 __do_softirq+0x132/0x40b kernel/softirq.c:298 run_ksoftirqd kernel/softirq.c:653 [inline] run_ksoftirqd+0x21/0x40 kernel/softirq.c:645 smpboot_thread_fn+0x3e6/0x560 kernel/smpboot.c:165 kthread+0x24f/0x280 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296 freed by task 13615: skb_free_head net/core/skbuff.c:595 [inline] skb_release_data+0x499/0x4e0 net/core/skbuff.c:615 skb_release_all net/core/skbuff.c:669 [inline] __kfree_skb+0x34/0x50 net/core/skbuff.c:683 sk_wmem_free_skb include/net/sock.h:1546 [inline] tcp_rtx_queue_unlink_and_free include/net/tcp.h:1856 [inline] tcp_clean_rtx_queue net/ipv4/tcp_input.c:3251 [inline] tcp_ack+0x124a/0x3450 net/ipv4/tcp_input.c:3795 tcp_rcv_established+0x367/0x10b0 net/ipv4/tcp_input.c:5858 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 sk_backlog_rcv include/net/sock.h:1010 [inline] __release_sock+0xd7/0x260 net/core/sock.c:2523 release_sock+0x40/0x120 net/core/sock.c:3053 sk_wait_data+0x127/0x2b0 net/core/sock.c:2565 tcp_recvmsg+0x1106/0x1b60 net/ipv4/tcp.c:2181 inet_recvmsg+0xb1/0x270 net/ipv4/af_inet.c:848 sock_recvmsg_nosec net/socket.c:885 [inline] sock_recvmsg net/socket.c:903 [inline] sock_recvmsg net/socket.c:899 [inline] ____sys_recvmsg+0x2fd/0x3a0 net/socket.c:2563 ___sys_recvmsg+0xd9/0x1b0 net/socket.c:2605 __sys_recvmsg+0x8b/0x130 net/socket.c:2641 __do_sys_recvmsg net/socket.c:2651 [inline] __se_sys_recvmsg net/socket.c:2648 [inline] __x64_sys_recvmsg+0x43/0x50 net/socket.c:2648 do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 --------------------------------- kfence-#2 [0xffff888436818c00-0xffff888436818e7f, size=640, cache=kmalloc-1k] allocated by task 0: __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0xb8/0x3f0 net/core/skbuff.c:210 alloc_skb include/linux/skbuff.h:1094 [inline] __tcp_send_ack.part.0+0x47/0x3c0 net/ipv4/tcp_output.c:3945 __tcp_send_ack net/ipv4/tcp_output.c:3977 [inline] tcp_send_ack+0x47/0x50 net/ipv4/tcp_output.c:3977 __tcp_ack_snd_check+0xb2/0x530 net/ipv4/tcp_input.c:5400 tcp_ack_snd_check net/ipv4/tcp_input.c:5445 [inline] tcp_rcv_established+0x5c2/0x10b0 net/ipv4/tcp_input.c:5870 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 tcp_v4_rcv+0x1e29/0x20c0 net/ipv4/tcp_ipv4.c:2050 ip_protocol_deliver_rcu+0x31/0x4f0 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x111/0x150 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip_local_deliver+0x244/0x250 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:447 [inline] ip_rcv_finish+0x14a/0x1d0 net/ipv4/ip_input.c:428 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip_rcv+0x1c4/0x1d0 net/ipv4/ip_input.c:539 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5316 __netif_receive_skb+0x29/0xe0 net/core/dev.c:5430 process_backlog+0x169/0x350 net/core/dev.c:6320 napi_poll net/core/dev.c:6764 [inline] net_rx_action+0x326/0xa30 net/core/dev.c:6834 __do_softirq+0x132/0x40b kernel/softirq.c:298 asm_call_irq_on_stack+0xf/0x20 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x58/0x70 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:393 [inline] __irq_exit_rcu kernel/softirq.c:423 [inline] irq_exit_rcu+0xcd/0x110 kernel/softirq.c:435 sysvec_apic_timer_interrupt+0x38/0xd0 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline] arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline] default_idle+0xe/0x10 arch/x86/kernel/process.c:688 default_idle_call+0x32/0x50 kernel/sched/idle.c:98 cpuidle_idle_call kernel/sched/idle.c:168 [inline] do_idle+0x207/0x270 kernel/sched/idle.c:273 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:370 secondary_startup_64_no_verify+0xb0/0xbb freed by task 0: skb_free_head net/core/skbuff.c:595 [inline] skb_release_data+0x499/0x4e0 net/core/skbuff.c:615 skb_release_all net/core/skbuff.c:669 [inline] __kfree_skb+0x34/0x50 net/core/skbuff.c:683 tcp_data_queue+0x1801/0x2560 net/ipv4/tcp_input.c:4927 tcp_rcv_established+0x52c/0x10b0 net/ipv4/tcp_input.c:5867 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 tcp_v4_rcv+0x1e29/0x20c0 net/ipv4/tcp_ipv4.c:2050 ip_protocol_deliver_rcu+0x31/0x4f0 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x111/0x150 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip_local_deliver+0x244/0x250 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:447 [inline] ip_rcv_finish+0x14a/0x1d0 net/ipv4/ip_input.c:428 NF_HOOK include/linux/netfilter.h:301 [inline] NF_HOOK include/linux/netfilter.h:295 [inline] ip_rcv+0x1c4/0x1d0 net/ipv4/ip_input.c:539 __netif_receive_skb_one_core+0xb3/0xe0 net/core/dev.c:5316 __netif_receive_skb+0x29/0xe0 net/core/dev.c:5430 process_backlog+0x169/0x350 net/core/dev.c:6320 napi_poll net/core/dev.c:6764 [inline] net_rx_action+0x326/0xa30 net/core/dev.c:6834 __do_softirq+0x132/0x40b kernel/softirq.c:298 asm_call_irq_on_stack+0xf/0x20 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x58/0x70 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:393 [inline] __irq_exit_rcu kernel/softirq.c:423 [inline] irq_exit_rcu+0xcd/0x110 kernel/softirq.c:435 sysvec_apic_timer_interrupt+0x38/0xd0 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline] arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline] default_idle+0xe/0x10 arch/x86/kernel/process.c:688 default_idle_call+0x32/0x50 kernel/sched/idle.c:98 cpuidle_idle_call kernel/sched/idle.c:168 [inline] do_idle+0x207/0x270 kernel/sched/idle.c:273 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:370 secondary_startup_64_no_verify+0xb0/0xbb --------------------------------- kfence-#3 [0xffff88843681ac00-0xffff88843681ae7f, size=640, cache=kmalloc-1k] allocated by task 17012: __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0xb8/0x3f0 net/core/skbuff.c:210 alloc_skb_fclone include/linux/skbuff.h:1144 [inline] sk_stream_alloc_skb+0xd3/0x650 net/ipv4/tcp.c:888 tso_fragment net/ipv4/tcp_output.c:2124 [inline] tcp_write_xmit+0x1366/0x3510 net/ipv4/tcp_output.c:2674 __tcp_push_pending_frames+0x68/0x1f0 net/ipv4/tcp_output.c:2866 tcp_push_pending_frames include/net/tcp.h:1864 [inline] tcp_data_snd_check net/ipv4/tcp_input.c:5374 [inline] tcp_rcv_established+0x57c/0x10b0 net/ipv4/tcp_input.c:5869 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 sk_backlog_rcv include/net/sock.h:1010 [inline] __release_sock+0xd7/0x260 net/core/sock.c:2523 release_sock+0x40/0x120 net/core/sock.c:3053 sk_wait_data+0x127/0x2b0 net/core/sock.c:2565 tcp_recvmsg+0x1106/0x1b60 net/ipv4/tcp.c:2181 inet_recvmsg+0xb1/0x270 net/ipv4/af_inet.c:848 sock_recvmsg_nosec net/socket.c:885 [inline] sock_recvmsg net/socket.c:903 [inline] sock_recvmsg net/socket.c:899 [inline] ____sys_recvmsg+0x2fd/0x3a0 net/socket.c:2563 ___sys_recvmsg+0xd9/0x1b0 net/socket.c:2605 __sys_recvmsg+0x8b/0x130 net/socket.c:2641 __do_sys_recvmsg net/socket.c:2651 [inline] __se_sys_recvmsg net/socket.c:2648 [inline] __x64_sys_recvmsg+0x43/0x50 net/socket.c:2648 do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 freed by task 17012: skb_free_head net/core/skbuff.c:595 [inline] skb_release_data+0x499/0x4e0 net/core/skbuff.c:615 skb_release_all net/core/skbuff.c:669 [inline] __kfree_skb+0x34/0x50 net/core/skbuff.c:683 tcp_drop net/ipv4/tcp_input.c:4618 [inline] tcp_prune_ofo_queue+0x14b/0x3e0 net/ipv4/tcp_input.c:5250 tcp_prune_queue net/ipv4/tcp_input.c:5307 [inline] tcp_try_rmem_schedule+0x9a2/0xbc0 net/ipv4/tcp_input.c:4680 tcp_data_queue_ofo net/ipv4/tcp_input.c:4701 [inline] tcp_data_queue+0x2dd/0x2560 net/ipv4/tcp_input.c:5015 tcp_rcv_established+0x52c/0x10b0 net/ipv4/tcp_input.c:5867 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 sk_backlog_rcv include/net/sock.h:1010 [inline] __release_sock+0xd7/0x260 net/core/sock.c:2523 release_sock+0x40/0x120 net/core/sock.c:3053 sk_wait_data+0x127/0x2b0 net/core/sock.c:2565 tcp_recvmsg+0x1106/0x1b60 net/ipv4/tcp.c:2181 inet_recvmsg+0xb1/0x270 net/ipv4/af_inet.c:848 sock_recvmsg_nosec net/socket.c:885 [inline] sock_recvmsg net/socket.c:903 [inline] sock_recvmsg net/socket.c:899 [inline] ____sys_recvmsg+0x2fd/0x3a0 net/socket.c:2563 ___sys_recvmsg+0xd9/0x1b0 net/socket.c:2605 __sys_recvmsg+0x8b/0x130 net/socket.c:2641 __do_sys_recvmsg net/socket.c:2651 [inline] __se_sys_recvmsg net/socket.c:2648 [inline] __x64_sys_recvmsg+0x43/0x50 net/socket.c:2648 do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 --------------------------------- kfence-#4 [0xffff88843681c000-0xffff88843681c27f, size=640, cache=kmalloc-1k] allocated by task 0: __kmalloc_reserve net/core/skbuff.c:142 [inline] __alloc_skb+0xb8/0x3f0 net/core/skbuff.c:210 alloc_skb_fclone include/linux/skbuff.h:1144 [inline] sk_stream_alloc_skb+0xd3/0x650 net/ipv4/tcp.c:888 tso_fragment net/ipv4/tcp_output.c:2124 [inline] tcp_write_xmit+0x1366/0x3510 net/ipv4/tcp_output.c:2674 tcp_send_loss_probe+0x337/0x4c0 net/ipv4/tcp_output.c:2804 tcp_write_timer_handler+0x4d4/0x5d0 net/ipv4/tcp_timer.c:606 tcp_write_timer+0x86/0x270 net/ipv4/tcp_timer.c:630 call_timer_fn+0x145/0x510 kernel/time/timer.c:1417 expire_timers kernel/time/timer.c:1462 [inline] __run_timers.part.0+0x519/0x680 kernel/time/timer.c:1731 __run_timers kernel/time/timer.c:1712 [inline] run_timer_softirq+0x6f/0x110 kernel/time/timer.c:1744 __do_softirq+0x132/0x40b kernel/softirq.c:298 asm_call_irq_on_stack+0xf/0x20 __run_on_irqstack arch/x86/include/asm/irq_stack.h:26 [inline] run_on_irqstack_cond arch/x86/include/asm/irq_stack.h:77 [inline] do_softirq_own_stack+0x58/0x70 arch/x86/kernel/irq_64.c:77 invoke_softirq kernel/softirq.c:393 [inline] __irq_exit_rcu kernel/softirq.c:423 [inline] irq_exit_rcu+0xcd/0x110 kernel/softirq.c:435 sysvec_apic_timer_interrupt+0x38/0xd0 arch/x86/kernel/apic/apic.c:1096 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:628 native_safe_halt arch/x86/include/asm/irqflags.h:60 [inline] arch_safe_halt arch/x86/include/asm/irqflags.h:103 [inline] default_idle+0xe/0x10 arch/x86/kernel/process.c:688 default_idle_call+0x32/0x50 kernel/sched/idle.c:98 cpuidle_idle_call kernel/sched/idle.c:168 [inline] do_idle+0x207/0x270 kernel/sched/idle.c:273 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:370 secondary_startup_64_no_verify+0xb0/0xbb freed by task 17200: skb_free_head net/core/skbuff.c:595 [inline] skb_release_data+0x499/0x4e0 net/core/skbuff.c:615 skb_release_all net/core/skbuff.c:669 [inline] __kfree_skb+0x34/0x50 net/core/skbuff.c:683 sk_wmem_free_skb include/net/sock.h:1546 [inline] tcp_rtx_queue_unlink_and_free include/net/tcp.h:1856 [inline] tcp_shifted_skb+0x4f8/0x960 net/ipv4/tcp_input.c:1464 tcp_shift_skb_data net/ipv4/tcp_input.c:1607 [inline] tcp_sacktag_walk+0x7e0/0xc40 net/ipv4/tcp_input.c:1670 tcp_sacktag_write_queue+0xd5e/0x1b50 net/ipv4/tcp_input.c:1931 tcp_ack+0x1fcd/0x3450 net/ipv4/tcp_input.c:3758 tcp_rcv_established+0x367/0x10b0 net/ipv4/tcp_input.c:5858 tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 sk_backlog_rcv include/net/sock.h:1010 [inline] __release_sock+0xd7/0x260 net/core/sock.c:2523 release_sock+0x40/0x120 net/core/sock.c:3053 sk_wait_data+0x127/0x2b0 net/core/sock.c:2565 tcp_recvmsg+0x1106/0x1b60 net/ipv4/tcp.c:2181 inet_recvmsg+0xb1/0x270 net/ipv4/af_inet.c:848 sock_recvmsg_nosec net/socket.c:885 [inline] sock_recvmsg net/socket.c:903 [inline] sock_recvmsg net/socket.c:899 [inline] ____sys_recvmsg+0x2fd/0x3a0 net/socket.c:2563 ___sys_recvmsg+0xd9/0x1b0 net/socket.c:2605 __sys_recvmsg+0x8b/0x130 net/socket.c:2641 __do_sys_recvmsg net/socket.c:2651 [inline] __se_sys_recvmsg net/socket.c:2648 [inline] __x64_sys_recvmsg+0x43/0x50 net/socket.c:2648 do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-09 12:47 ` Marco Elver @ 2020-12-10 16:51 ` Marco Elver 2020-12-10 17:14 ` Eric Dumazet 0 siblings, 1 reply; 18+ messages in thread From: Marco Elver @ 2020-12-10 16:51 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Wed, Dec 09, 2020 at 01:47PM +0100, Marco Elver wrote: > On Tue, Dec 08, 2020 at 08:06PM +0100, Marco Elver wrote: > > On Thu, 3 Dec 2020 at 19:01, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > On 12/3/20 6:41 PM, Marco Elver wrote: > > > > > > > One more experiment -- simply adding > > > > > > > > --- a/net/core/skbuff.c > > > > +++ b/net/core/skbuff.c > > > > @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > > > */ > > > > size = SKB_DATA_ALIGN(size); > > > > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > + size = 1 << kmalloc_index(size); /* HACK */ > > > > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > > > > > > > > > > > > also got rid of the warnings. Something must be off with some value that > > > > is computed in terms of ksize(). If not, I don't have any explanation > > > > for why the above hides the problem. > > > > > > Maybe the implementations of various macros (SKB_DATA_ALIGN and friends) > > > hae some kind of assumptions, I will double check this. > > > > If I force kfence to return 4K sized allocations for everything, the > > warnings remain. That might suggest that it's not due to a missed > > ALIGN. > > > > Is it possible that copies or moves are a problem? E.g. we copy > > something from kfence -> non-kfence object (or vice-versa), and > > ksize() no longer matches, then things go wrong? > > I was able to narrow it down to allocations of size 640. I also narrowed > it down to 5 allocations that go through kfence that start triggering > the issue. I have attached the list of those 5 allocations with > allocation + free stacks. I'll try to go through them, maybe I get > lucky eventually. :-) [...] > kfence-#3 [0xffff88843681ac00-0xffff88843681ae7f, size=640, cache=kmalloc-1k] allocated by task 17012: > __kmalloc_reserve net/core/skbuff.c:142 [inline] > __alloc_skb+0xb8/0x3f0 net/core/skbuff.c:210 > alloc_skb_fclone include/linux/skbuff.h:1144 [inline] > sk_stream_alloc_skb+0xd3/0x650 net/ipv4/tcp.c:888 > tso_fragment net/ipv4/tcp_output.c:2124 [inline] > tcp_write_xmit+0x1366/0x3510 net/ipv4/tcp_output.c:2674 > __tcp_push_pending_frames+0x68/0x1f0 net/ipv4/tcp_output.c:2866 > tcp_push_pending_frames include/net/tcp.h:1864 [inline] > tcp_data_snd_check net/ipv4/tcp_input.c:5374 [inline] > tcp_rcv_established+0x57c/0x10b0 net/ipv4/tcp_input.c:5869 > tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 > sk_backlog_rcv include/net/sock.h:1010 [inline] > __release_sock+0xd7/0x260 net/core/sock.c:2523 > release_sock+0x40/0x120 net/core/sock.c:3053 > sk_wait_data+0x127/0x2b0 net/core/sock.c:2565 > tcp_recvmsg+0x1106/0x1b60 net/ipv4/tcp.c:2181 > inet_recvmsg+0xb1/0x270 net/ipv4/af_inet.c:848 > sock_recvmsg_nosec net/socket.c:885 [inline] > sock_recvmsg net/socket.c:903 [inline] > sock_recvmsg net/socket.c:899 [inline] > ____sys_recvmsg+0x2fd/0x3a0 net/socket.c:2563 > ___sys_recvmsg+0xd9/0x1b0 net/socket.c:2605 > __sys_recvmsg+0x8b/0x130 net/socket.c:2641 > __do_sys_recvmsg net/socket.c:2651 [inline] > __se_sys_recvmsg net/socket.c:2648 [inline] > __x64_sys_recvmsg+0x43/0x50 net/socket.c:2648 > do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 So I started putting gdb to work, and whenever I see an allocation exactly like the above that goes through tso_fragment() a warning immediately follows. Long story short, I somehow synthesized this patch that appears to fix things, but I can't explain why exactly: | --- a/net/core/skbuff.c | +++ b/net/core/skbuff.c | @@ -1679,13 +1679,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, | | skb_metadata_clear(skb); | | - /* It is not generally safe to change skb->truesize. | - * For the moment, we really care of rx path, or | - * when skb is orphaned (not attached to a socket). | - */ | - if (!skb->sk || skb->destructor == sock_edemux) | - skb->truesize += size - osize; | - | return 0; | | nofrags: Now, here are the breadcrumbs I followed: 1. Breakpoint on kfence_ksize() -- first allocation that matches the above: | #0 __kfence_ksize (s=18446612700164612096) at mm/kfence/core.c:726 | #1 0xffffffff816fbf30 in kfence_ksize (addr=0xffff888436856000) at mm/kfence/core.c:737 | #2 0xffffffff816217cf in ksize (objp=0xffff888436856000) at mm/slab_common.c:1178 | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 | #4 0xffffffff84d0ba73 in alloc_skb_fclone (priority=<optimized out>, size=<optimized out>) at ./include/linux/skbuff.h:1144 | #5 sk_stream_alloc_skb (sk=0xffff8881176cc000, size=0, gfp=2592, force_schedule=232) at net/ipv4/tcp.c:888 | #6 0xffffffff84d41c36 in tso_fragment (gfp=<optimized out>, mss_now=<optimized out>, len=<optimized out>, | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_output.c:2124 | #7 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=21950, nonagle=3096, push_one=-1996874776, gfp=0) | at net/ipv4/tcp_output.c:2674 | #8 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) | at ./include/net/sock.h:918 | #9 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 | #10 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 | #11 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 | #12 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f52ea0) at net/ipv4/tcp_ipv4.c:1668 | [...] Set watchpoint on skb->truesize: | (gdb) frame 3 | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 | 217 size = SKB_WITH_OVERHEAD(ksize(data)); | (gdb) p &skb->truesize | $5 = (unsigned int *) 0xffff888117f55f90 | (gdb) awatch *0xffff888117f55f90 | Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 2. Some time later, we see that the skb with kfence-allocated data is cloned: | Thread 7 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 | | Value = 1570 | 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 | 1002 C(truesize); | (gdb) bt | #0 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 | #1 0xffffffff8488bfb9 in skb_clone (skb=0xffff888117f55ec0, gfp_mask=2592) at net/core/skbuff.c:1454 | #2 0xffffffff84d3cd1c in __tcp_transmit_skb (sk=0xffff8881176cc000, skb=0xffff888117f55ec0, clone_it=0, gfp_mask=2592, | rcv_nxt=0) at net/ipv4/tcp_output.c:1267 | #3 0xffffffff84d4125b in tcp_transmit_skb (gfp_mask=<optimized out>, clone_it=<optimized out>, skb=<optimized out>, | sk=<optimized out>) at ./include/linux/tcp.h:439 | #4 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=392485600, nonagle=1326, push_one=-1996875104, gfp=0) | at net/ipv4/tcp_output.c:2688 | #5 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) | at ./include/net/sock.h:918 | #6 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 | #7 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 | #8 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 | #9 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f57820) at net/ipv4/tcp_ipv4.c:1668 | #10 0xffffffff8487bf67 in sk_backlog_rcv (skb=<optimized out>, sk=<optimized out>) at ./include/net/sock.h:1010 [...] 3. The original skb (that was cloned) has its truesize adjusted after a pskb_expand_head(): | Thread 2 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 | | Old value = 1570 | New value = 1954 ^^ the difference between the old and the new value is exactly 384, which is also the final underflow of the sk_wmem_queued that triggers the warning. Presumably if the original allocation had been through kmalloc-1k and not KFENCE, the difference here would have been 0, since ksize() of the original allocation in step (1) would have been 1024, and not 640 (difference of 384). | 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) | at net/core/skbuff.c:1687 | 1687 skb->truesize += size - osize; | (gdb) bt | #0 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) | at net/core/skbuff.c:1687 | #1 0xffffffff8488de01 in skb_prepare_for_shift (skb=<optimized out>) at ./arch/x86/include/asm/atomic.h:29 | #2 skb_prepare_for_shift (skb=0xffff888117f55ec0) at net/core/skbuff.c:3276 | #3 0xffffffff848936b1 in skb_shift (tgt=0xffff888117f549c0, skb=0xffff888117f55ec0, shiftlen=674) at net/core/skbuff.c:3351 | #4 0xffffffff84d264de in tcp_skb_shift (shiftlen=<optimized out>, pcount=<optimized out>, from=<optimized out>, | to=<optimized out>) at net/ipv4/tcp_input.c:1497 | #5 tcp_shift_skb_data (dup_sack=<optimized out>, end_seq=<optimized out>, start_seq=<optimized out>, state=<optimized out>, | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_input.c:1605 | #6 tcp_sacktag_walk (skb=0xffff888117f55ec0, sk=0xffff8881176cc000, next_dup=0x894, | state=0xffffffff88fa1aa0 <watchpoints+192>, start_seq=0, end_seq=401956752, dup_sack_in=false) | at net/ipv4/tcp_input.c:1670 | #7 0xffffffff84d276de in tcp_sacktag_write_queue (sk=0xffff888117f55f90, ack_skb=0x1888117f55f90, prior_snd_una=2196, | state=0xffffffff88fa1aa0 <watchpoints+192>) at net/ipv4/tcp_input.c:1931 | #8 0xffffffff84d2ca1d in tcp_ack (sk=0xffff8881176cc000, skb=0x1888117f55f90, flag=16643) at net/ipv4/tcp_input.c:3758 | #9 0xffffffff84d32387 in tcp_rcv_established (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_input.c:5858 | #10 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_ipv4.c:1668 [...] Any of this make sense? Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-10 16:51 ` Marco Elver @ 2020-12-10 17:14 ` Eric Dumazet 2020-12-10 19:01 ` Marco Elver 0 siblings, 1 reply; 18+ messages in thread From: Eric Dumazet @ 2020-12-10 17:14 UTC (permalink / raw) To: Marco Elver Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, Dec 10, 2020 at 5:51 PM Marco Elver <elver@google.com> wrote: > > On Wed, Dec 09, 2020 at 01:47PM +0100, Marco Elver wrote: > > On Tue, Dec 08, 2020 at 08:06PM +0100, Marco Elver wrote: > > > On Thu, 3 Dec 2020 at 19:01, Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > On 12/3/20 6:41 PM, Marco Elver wrote: > > > > > > > > > One more experiment -- simply adding > > > > > > > > > > --- a/net/core/skbuff.c > > > > > +++ b/net/core/skbuff.c > > > > > @@ -207,7 +207,21 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, > > > > > */ > > > > > size = SKB_DATA_ALIGN(size); > > > > > size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > > > > + size = 1 << kmalloc_index(size); /* HACK */ > > > > > data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); > > > > > > > > > > > > > > > also got rid of the warnings. Something must be off with some value that > > > > > is computed in terms of ksize(). If not, I don't have any explanation > > > > > for why the above hides the problem. > > > > > > > > Maybe the implementations of various macros (SKB_DATA_ALIGN and friends) > > > > hae some kind of assumptions, I will double check this. > > > > > > If I force kfence to return 4K sized allocations for everything, the > > > warnings remain. That might suggest that it's not due to a missed > > > ALIGN. > > > > > > Is it possible that copies or moves are a problem? E.g. we copy > > > something from kfence -> non-kfence object (or vice-versa), and > > > ksize() no longer matches, then things go wrong? > > > > I was able to narrow it down to allocations of size 640. I also narrowed > > it down to 5 allocations that go through kfence that start triggering > > the issue. I have attached the list of those 5 allocations with > > allocation + free stacks. I'll try to go through them, maybe I get > > lucky eventually. :-) > > [...] > > > kfence-#3 [0xffff88843681ac00-0xffff88843681ae7f, size=640, cache=kmalloc-1k] allocated by task 17012: > > __kmalloc_reserve net/core/skbuff.c:142 [inline] > > __alloc_skb+0xb8/0x3f0 net/core/skbuff.c:210 > > alloc_skb_fclone include/linux/skbuff.h:1144 [inline] > > sk_stream_alloc_skb+0xd3/0x650 net/ipv4/tcp.c:888 > > tso_fragment net/ipv4/tcp_output.c:2124 [inline] > > tcp_write_xmit+0x1366/0x3510 net/ipv4/tcp_output.c:2674 > > __tcp_push_pending_frames+0x68/0x1f0 net/ipv4/tcp_output.c:2866 > > tcp_push_pending_frames include/net/tcp.h:1864 [inline] > > tcp_data_snd_check net/ipv4/tcp_input.c:5374 [inline] > > tcp_rcv_established+0x57c/0x10b0 net/ipv4/tcp_input.c:5869 > > tcp_v4_do_rcv+0x361/0x4c0 net/ipv4/tcp_ipv4.c:1668 > > sk_backlog_rcv include/net/sock.h:1010 [inline] > > __release_sock+0xd7/0x260 net/core/sock.c:2523 > > release_sock+0x40/0x120 net/core/sock.c:3053 > > sk_wait_data+0x127/0x2b0 net/core/sock.c:2565 > > tcp_recvmsg+0x1106/0x1b60 net/ipv4/tcp.c:2181 > > inet_recvmsg+0xb1/0x270 net/ipv4/af_inet.c:848 > > sock_recvmsg_nosec net/socket.c:885 [inline] > > sock_recvmsg net/socket.c:903 [inline] > > sock_recvmsg net/socket.c:899 [inline] > > ____sys_recvmsg+0x2fd/0x3a0 net/socket.c:2563 > > ___sys_recvmsg+0xd9/0x1b0 net/socket.c:2605 > > __sys_recvmsg+0x8b/0x130 net/socket.c:2641 > > __do_sys_recvmsg net/socket.c:2651 [inline] > > __se_sys_recvmsg net/socket.c:2648 [inline] > > __x64_sys_recvmsg+0x43/0x50 net/socket.c:2648 > > do_syscall_64+0x34/0x80 arch/x86/entry/common.c:46 > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > So I started putting gdb to work, and whenever I see an allocation > exactly like the above that goes through tso_fragment() a warning > immediately follows. > > Long story short, I somehow synthesized this patch that appears to fix > things, but I can't explain why exactly: > > | --- a/net/core/skbuff.c > | +++ b/net/core/skbuff.c > | @@ -1679,13 +1679,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > | > | skb_metadata_clear(skb); > | > | - /* It is not generally safe to change skb->truesize. > | - * For the moment, we really care of rx path, or > | - * when skb is orphaned (not attached to a socket). > | - */ > | - if (!skb->sk || skb->destructor == sock_edemux) > | - skb->truesize += size - osize; > | - > | return 0; > | > | nofrags: > > Now, here are the breadcrumbs I followed: > > > 1. Breakpoint on kfence_ksize() -- first allocation that matches the above: > > | #0 __kfence_ksize (s=18446612700164612096) at mm/kfence/core.c:726 > | #1 0xffffffff816fbf30 in kfence_ksize (addr=0xffff888436856000) at mm/kfence/core.c:737 > | #2 0xffffffff816217cf in ksize (objp=0xffff888436856000) at mm/slab_common.c:1178 > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > | #4 0xffffffff84d0ba73 in alloc_skb_fclone (priority=<optimized out>, size=<optimized out>) at ./include/linux/skbuff.h:1144 > | #5 sk_stream_alloc_skb (sk=0xffff8881176cc000, size=0, gfp=2592, force_schedule=232) at net/ipv4/tcp.c:888 > | #6 0xffffffff84d41c36 in tso_fragment (gfp=<optimized out>, mss_now=<optimized out>, len=<optimized out>, > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_output.c:2124 > | #7 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=21950, nonagle=3096, push_one=-1996874776, gfp=0) > | at net/ipv4/tcp_output.c:2674 > | #8 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > | at ./include/net/sock.h:918 > | #9 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > | #10 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > | #11 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > | #12 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f52ea0) at net/ipv4/tcp_ipv4.c:1668 > | [...] > > Set watchpoint on skb->truesize: > > | (gdb) frame 3 > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > | 217 size = SKB_WITH_OVERHEAD(ksize(data)); > | (gdb) p &skb->truesize > | $5 = (unsigned int *) 0xffff888117f55f90 > | (gdb) awatch *0xffff888117f55f90 > | Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > 2. Some time later, we see that the skb with kfence-allocated data > is cloned: > > | Thread 7 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > | > | Value = 1570 > | 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > | 1002 C(truesize); > | (gdb) bt > | #0 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > | #1 0xffffffff8488bfb9 in skb_clone (skb=0xffff888117f55ec0, gfp_mask=2592) at net/core/skbuff.c:1454 > | #2 0xffffffff84d3cd1c in __tcp_transmit_skb (sk=0xffff8881176cc000, skb=0xffff888117f55ec0, clone_it=0, gfp_mask=2592, > | rcv_nxt=0) at net/ipv4/tcp_output.c:1267 > | #3 0xffffffff84d4125b in tcp_transmit_skb (gfp_mask=<optimized out>, clone_it=<optimized out>, skb=<optimized out>, > | sk=<optimized out>) at ./include/linux/tcp.h:439 > | #4 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=392485600, nonagle=1326, push_one=-1996875104, gfp=0) > | at net/ipv4/tcp_output.c:2688 > | #5 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > | at ./include/net/sock.h:918 > | #6 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > | #7 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > | #8 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > | #9 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f57820) at net/ipv4/tcp_ipv4.c:1668 > | #10 0xffffffff8487bf67 in sk_backlog_rcv (skb=<optimized out>, sk=<optimized out>) at ./include/net/sock.h:1010 > [...] > > > 3. The original skb (that was cloned) has its truesize adjusted > after a pskb_expand_head(): > > | Thread 2 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > | > | Old value = 1570 > | New value = 1954 > > ^^ the difference between the old and the new value is exactly > 384, which is also the final underflow of the sk_wmem_queued > that triggers the warning. Presumably if the original allocation > had been through kmalloc-1k and not KFENCE, the difference here > would have been 0, since ksize() of the original allocation in > step (1) would have been 1024, and not 640 (difference of 384). > > | 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > | at net/core/skbuff.c:1687 > | 1687 skb->truesize += size - osize; > | (gdb) bt > | #0 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > | at net/core/skbuff.c:1687 > | #1 0xffffffff8488de01 in skb_prepare_for_shift (skb=<optimized out>) at ./arch/x86/include/asm/atomic.h:29 > | #2 skb_prepare_for_shift (skb=0xffff888117f55ec0) at net/core/skbuff.c:3276 > | #3 0xffffffff848936b1 in skb_shift (tgt=0xffff888117f549c0, skb=0xffff888117f55ec0, shiftlen=674) at net/core/skbuff.c:3351 > | #4 0xffffffff84d264de in tcp_skb_shift (shiftlen=<optimized out>, pcount=<optimized out>, from=<optimized out>, > | to=<optimized out>) at net/ipv4/tcp_input.c:1497 > | #5 tcp_shift_skb_data (dup_sack=<optimized out>, end_seq=<optimized out>, start_seq=<optimized out>, state=<optimized out>, > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_input.c:1605 > | #6 tcp_sacktag_walk (skb=0xffff888117f55ec0, sk=0xffff8881176cc000, next_dup=0x894, > | state=0xffffffff88fa1aa0 <watchpoints+192>, start_seq=0, end_seq=401956752, dup_sack_in=false) > | at net/ipv4/tcp_input.c:1670 > | #7 0xffffffff84d276de in tcp_sacktag_write_queue (sk=0xffff888117f55f90, ack_skb=0x1888117f55f90, prior_snd_una=2196, > | state=0xffffffff88fa1aa0 <watchpoints+192>) at net/ipv4/tcp_input.c:1931 > | #8 0xffffffff84d2ca1d in tcp_ack (sk=0xffff8881176cc000, skb=0x1888117f55f90, flag=16643) at net/ipv4/tcp_input.c:3758 > | #9 0xffffffff84d32387 in tcp_rcv_established (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_input.c:5858 > | #10 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_ipv4.c:1668 > [...] > > > Any of this make sense? Very nice debugging ! I guess we could fix this in skb_prepare_for_shift(), eventually caring for the truesize manipulation (or reverting the change done in pskb_expand_head(), since only kfence is having this issue. (All TCP skbs in output path have the same allocation size for skb->head) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e578544b2cc7110ec2f6bcf4c29d93e4b4b1ad14..798b51eeeaa4fbed65d41d9eab207dbbf438dab3 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3270,7 +3270,14 @@ EXPORT_SYMBOL(skb_split); */ static int skb_prepare_for_shift(struct sk_buff *skb) { - return skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + unsigned int ret = 0, save; + + if (skb_cloned(skb)) { + save = skb->truesize; + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); + skb->truesize = save; + } + return ret; } ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-10 17:14 ` Eric Dumazet @ 2020-12-10 19:01 ` Marco Elver 2020-12-14 10:09 ` Marco Elver 0 siblings, 1 reply; 18+ messages in thread From: Marco Elver @ 2020-12-10 19:01 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, 10 Dec 2020 at 18:14, Eric Dumazet <edumazet@google.com> wrote: > On Thu, Dec 10, 2020 at 5:51 PM Marco Elver <elver@google.com> wrote: [...] > > So I started putting gdb to work, and whenever I see an allocation > > exactly like the above that goes through tso_fragment() a warning > > immediately follows. > > > > Long story short, I somehow synthesized this patch that appears to fix > > things, but I can't explain why exactly: > > > > | --- a/net/core/skbuff.c > > | +++ b/net/core/skbuff.c > > | @@ -1679,13 +1679,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > > | > > | skb_metadata_clear(skb); > > | > > | - /* It is not generally safe to change skb->truesize. > > | - * For the moment, we really care of rx path, or > > | - * when skb is orphaned (not attached to a socket). > > | - */ > > | - if (!skb->sk || skb->destructor == sock_edemux) > > | - skb->truesize += size - osize; > > | - > > | return 0; > > | > > | nofrags: > > > > Now, here are the breadcrumbs I followed: > > > > > > 1. Breakpoint on kfence_ksize() -- first allocation that matches the above: > > > > | #0 __kfence_ksize (s=18446612700164612096) at mm/kfence/core.c:726 > > | #1 0xffffffff816fbf30 in kfence_ksize (addr=0xffff888436856000) at mm/kfence/core.c:737 > > | #2 0xffffffff816217cf in ksize (objp=0xffff888436856000) at mm/slab_common.c:1178 > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > | #4 0xffffffff84d0ba73 in alloc_skb_fclone (priority=<optimized out>, size=<optimized out>) at ./include/linux/skbuff.h:1144 > > | #5 sk_stream_alloc_skb (sk=0xffff8881176cc000, size=0, gfp=2592, force_schedule=232) at net/ipv4/tcp.c:888 > > | #6 0xffffffff84d41c36 in tso_fragment (gfp=<optimized out>, mss_now=<optimized out>, len=<optimized out>, > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_output.c:2124 > > | #7 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=21950, nonagle=3096, push_one=-1996874776, gfp=0) > > | at net/ipv4/tcp_output.c:2674 > > | #8 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > | at ./include/net/sock.h:918 > > | #9 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > | #10 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > | #11 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > | #12 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f52ea0) at net/ipv4/tcp_ipv4.c:1668 > > | [...] > > > > Set watchpoint on skb->truesize: > > > > | (gdb) frame 3 > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > | 217 size = SKB_WITH_OVERHEAD(ksize(data)); > > | (gdb) p &skb->truesize > > | $5 = (unsigned int *) 0xffff888117f55f90 > > | (gdb) awatch *0xffff888117f55f90 > > | Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > 2. Some time later, we see that the skb with kfence-allocated data > > is cloned: > > > > | Thread 7 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > | > > | Value = 1570 > > | 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > | 1002 C(truesize); > > | (gdb) bt > > | #0 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > | #1 0xffffffff8488bfb9 in skb_clone (skb=0xffff888117f55ec0, gfp_mask=2592) at net/core/skbuff.c:1454 > > | #2 0xffffffff84d3cd1c in __tcp_transmit_skb (sk=0xffff8881176cc000, skb=0xffff888117f55ec0, clone_it=0, gfp_mask=2592, > > | rcv_nxt=0) at net/ipv4/tcp_output.c:1267 > > | #3 0xffffffff84d4125b in tcp_transmit_skb (gfp_mask=<optimized out>, clone_it=<optimized out>, skb=<optimized out>, > > | sk=<optimized out>) at ./include/linux/tcp.h:439 > > | #4 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=392485600, nonagle=1326, push_one=-1996875104, gfp=0) > > | at net/ipv4/tcp_output.c:2688 > > | #5 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > | at ./include/net/sock.h:918 > > | #6 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > | #7 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > | #8 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > | #9 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f57820) at net/ipv4/tcp_ipv4.c:1668 > > | #10 0xffffffff8487bf67 in sk_backlog_rcv (skb=<optimized out>, sk=<optimized out>) at ./include/net/sock.h:1010 > > [...] > > > > > > 3. The original skb (that was cloned) has its truesize adjusted > > after a pskb_expand_head(): > > > > | Thread 2 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > | > > | Old value = 1570 > > | New value = 1954 > > > > ^^ the difference between the old and the new value is exactly > > 384, which is also the final underflow of the sk_wmem_queued > > that triggers the warning. Presumably if the original allocation > > had been through kmalloc-1k and not KFENCE, the difference here > > would have been 0, since ksize() of the original allocation in > > step (1) would have been 1024, and not 640 (difference of 384). > > > > | 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > | at net/core/skbuff.c:1687 > > | 1687 skb->truesize += size - osize; > > | (gdb) bt > > | #0 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > | at net/core/skbuff.c:1687 > > | #1 0xffffffff8488de01 in skb_prepare_for_shift (skb=<optimized out>) at ./arch/x86/include/asm/atomic.h:29 > > | #2 skb_prepare_for_shift (skb=0xffff888117f55ec0) at net/core/skbuff.c:3276 > > | #3 0xffffffff848936b1 in skb_shift (tgt=0xffff888117f549c0, skb=0xffff888117f55ec0, shiftlen=674) at net/core/skbuff.c:3351 > > | #4 0xffffffff84d264de in tcp_skb_shift (shiftlen=<optimized out>, pcount=<optimized out>, from=<optimized out>, > > | to=<optimized out>) at net/ipv4/tcp_input.c:1497 > > | #5 tcp_shift_skb_data (dup_sack=<optimized out>, end_seq=<optimized out>, start_seq=<optimized out>, state=<optimized out>, > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_input.c:1605 > > | #6 tcp_sacktag_walk (skb=0xffff888117f55ec0, sk=0xffff8881176cc000, next_dup=0x894, > > | state=0xffffffff88fa1aa0 <watchpoints+192>, start_seq=0, end_seq=401956752, dup_sack_in=false) > > | at net/ipv4/tcp_input.c:1670 > > | #7 0xffffffff84d276de in tcp_sacktag_write_queue (sk=0xffff888117f55f90, ack_skb=0x1888117f55f90, prior_snd_una=2196, > > | state=0xffffffff88fa1aa0 <watchpoints+192>) at net/ipv4/tcp_input.c:1931 > > | #8 0xffffffff84d2ca1d in tcp_ack (sk=0xffff8881176cc000, skb=0x1888117f55f90, flag=16643) at net/ipv4/tcp_input.c:3758 > > | #9 0xffffffff84d32387 in tcp_rcv_established (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_input.c:5858 > > | #10 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_ipv4.c:1668 > > [...] > > > > > > Any of this make sense? > > Very nice debugging ! > > I guess we could fix this in skb_prepare_for_shift(), eventually > caring for the truesize manipulation > (or reverting the change done in pskb_expand_head(), since only kfence > is having this issue. Phew, good to hear I finally got lucky. :-) Either option is fine, as long as it avoids this problem in future. Hopefully it can be fixed for 5.11. > (All TCP skbs in output path have the same allocation size for skb->head) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index e578544b2cc7110ec2f6bcf4c29d93e4b4b1ad14..798b51eeeaa4fbed65d41d9eab207dbbf438dab3 > 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -3270,7 +3270,14 @@ EXPORT_SYMBOL(skb_split); > */ > static int skb_prepare_for_shift(struct sk_buff *skb) > { > - return skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > + unsigned int ret = 0, save; > + > + if (skb_cloned(skb)) { > + save = skb->truesize; > + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > + skb->truesize = save; > + } > + return ret; > } FWIW, Tested-by: Marco Elver <elver@google.com> Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-10 19:01 ` Marco Elver @ 2020-12-14 10:09 ` Marco Elver 2020-12-14 10:47 ` Eric Dumazet 2021-02-03 10:25 ` Marco Elver 0 siblings, 2 replies; 18+ messages in thread From: Marco Elver @ 2020-12-14 10:09 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Thu, 10 Dec 2020 at 20:01, Marco Elver <elver@google.com> wrote: > On Thu, 10 Dec 2020 at 18:14, Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Dec 10, 2020 at 5:51 PM Marco Elver <elver@google.com> wrote: > [...] > > > So I started putting gdb to work, and whenever I see an allocation > > > exactly like the above that goes through tso_fragment() a warning > > > immediately follows. > > > > > > Long story short, I somehow synthesized this patch that appears to fix > > > things, but I can't explain why exactly: > > > > > > | --- a/net/core/skbuff.c > > > | +++ b/net/core/skbuff.c > > > | @@ -1679,13 +1679,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > > > | > > > | skb_metadata_clear(skb); > > > | > > > | - /* It is not generally safe to change skb->truesize. > > > | - * For the moment, we really care of rx path, or > > > | - * when skb is orphaned (not attached to a socket). > > > | - */ > > > | - if (!skb->sk || skb->destructor == sock_edemux) > > > | - skb->truesize += size - osize; > > > | - > > > | return 0; > > > | > > > | nofrags: > > > > > > Now, here are the breadcrumbs I followed: > > > > > > > > > 1. Breakpoint on kfence_ksize() -- first allocation that matches the above: > > > > > > | #0 __kfence_ksize (s=18446612700164612096) at mm/kfence/core.c:726 > > > | #1 0xffffffff816fbf30 in kfence_ksize (addr=0xffff888436856000) at mm/kfence/core.c:737 > > > | #2 0xffffffff816217cf in ksize (objp=0xffff888436856000) at mm/slab_common.c:1178 > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > | #4 0xffffffff84d0ba73 in alloc_skb_fclone (priority=<optimized out>, size=<optimized out>) at ./include/linux/skbuff.h:1144 > > > | #5 sk_stream_alloc_skb (sk=0xffff8881176cc000, size=0, gfp=2592, force_schedule=232) at net/ipv4/tcp.c:888 > > > | #6 0xffffffff84d41c36 in tso_fragment (gfp=<optimized out>, mss_now=<optimized out>, len=<optimized out>, > > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_output.c:2124 > > > | #7 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=21950, nonagle=3096, push_one=-1996874776, gfp=0) > > > | at net/ipv4/tcp_output.c:2674 > > > | #8 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > | at ./include/net/sock.h:918 > > > | #9 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > > | #10 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > > | #11 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > > | #12 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f52ea0) at net/ipv4/tcp_ipv4.c:1668 > > > | [...] > > > > > > Set watchpoint on skb->truesize: > > > > > > | (gdb) frame 3 > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > | 217 size = SKB_WITH_OVERHEAD(ksize(data)); > > > | (gdb) p &skb->truesize > > > | $5 = (unsigned int *) 0xffff888117f55f90 > > > | (gdb) awatch *0xffff888117f55f90 > > > | Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > > > 2. Some time later, we see that the skb with kfence-allocated data > > > is cloned: > > > > > > | Thread 7 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > | > > > | Value = 1570 > > > | 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > | 1002 C(truesize); > > > | (gdb) bt > > > | #0 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > | #1 0xffffffff8488bfb9 in skb_clone (skb=0xffff888117f55ec0, gfp_mask=2592) at net/core/skbuff.c:1454 > > > | #2 0xffffffff84d3cd1c in __tcp_transmit_skb (sk=0xffff8881176cc000, skb=0xffff888117f55ec0, clone_it=0, gfp_mask=2592, > > > | rcv_nxt=0) at net/ipv4/tcp_output.c:1267 > > > | #3 0xffffffff84d4125b in tcp_transmit_skb (gfp_mask=<optimized out>, clone_it=<optimized out>, skb=<optimized out>, > > > | sk=<optimized out>) at ./include/linux/tcp.h:439 > > > | #4 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=392485600, nonagle=1326, push_one=-1996875104, gfp=0) > > > | at net/ipv4/tcp_output.c:2688 > > > | #5 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > | at ./include/net/sock.h:918 > > > | #6 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > > | #7 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > > | #8 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > > | #9 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f57820) at net/ipv4/tcp_ipv4.c:1668 > > > | #10 0xffffffff8487bf67 in sk_backlog_rcv (skb=<optimized out>, sk=<optimized out>) at ./include/net/sock.h:1010 > > > [...] > > > > > > > > > 3. The original skb (that was cloned) has its truesize adjusted > > > after a pskb_expand_head(): > > > > > > | Thread 2 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > | > > > | Old value = 1570 > > > | New value = 1954 > > > > > > ^^ the difference between the old and the new value is exactly > > > 384, which is also the final underflow of the sk_wmem_queued > > > that triggers the warning. Presumably if the original allocation > > > had been through kmalloc-1k and not KFENCE, the difference here > > > would have been 0, since ksize() of the original allocation in > > > step (1) would have been 1024, and not 640 (difference of 384). > > > > > > | 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > | at net/core/skbuff.c:1687 > > > | 1687 skb->truesize += size - osize; > > > | (gdb) bt > > > | #0 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > | at net/core/skbuff.c:1687 > > > | #1 0xffffffff8488de01 in skb_prepare_for_shift (skb=<optimized out>) at ./arch/x86/include/asm/atomic.h:29 > > > | #2 skb_prepare_for_shift (skb=0xffff888117f55ec0) at net/core/skbuff.c:3276 > > > | #3 0xffffffff848936b1 in skb_shift (tgt=0xffff888117f549c0, skb=0xffff888117f55ec0, shiftlen=674) at net/core/skbuff.c:3351 > > > | #4 0xffffffff84d264de in tcp_skb_shift (shiftlen=<optimized out>, pcount=<optimized out>, from=<optimized out>, > > > | to=<optimized out>) at net/ipv4/tcp_input.c:1497 > > > | #5 tcp_shift_skb_data (dup_sack=<optimized out>, end_seq=<optimized out>, start_seq=<optimized out>, state=<optimized out>, > > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_input.c:1605 > > > | #6 tcp_sacktag_walk (skb=0xffff888117f55ec0, sk=0xffff8881176cc000, next_dup=0x894, > > > | state=0xffffffff88fa1aa0 <watchpoints+192>, start_seq=0, end_seq=401956752, dup_sack_in=false) > > > | at net/ipv4/tcp_input.c:1670 > > > | #7 0xffffffff84d276de in tcp_sacktag_write_queue (sk=0xffff888117f55f90, ack_skb=0x1888117f55f90, prior_snd_una=2196, > > > | state=0xffffffff88fa1aa0 <watchpoints+192>) at net/ipv4/tcp_input.c:1931 > > > | #8 0xffffffff84d2ca1d in tcp_ack (sk=0xffff8881176cc000, skb=0x1888117f55f90, flag=16643) at net/ipv4/tcp_input.c:3758 > > > | #9 0xffffffff84d32387 in tcp_rcv_established (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_input.c:5858 > > > | #10 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_ipv4.c:1668 > > > [...] > > > > > > > > > Any of this make sense? > > > > Very nice debugging ! > > > > I guess we could fix this in skb_prepare_for_shift(), eventually > > caring for the truesize manipulation > > (or reverting the change done in pskb_expand_head(), since only kfence > > is having this issue. > > Phew, good to hear I finally got lucky. :-) > > Either option is fine, as long as it avoids this problem in future. > Hopefully it can be fixed for 5.11. > > > (All TCP skbs in output path have the same allocation size for skb->head) > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > index e578544b2cc7110ec2f6bcf4c29d93e4b4b1ad14..798b51eeeaa4fbed65d41d9eab207dbbf438dab3 > > 100644 > > --- a/net/core/skbuff.c > > +++ b/net/core/skbuff.c > > @@ -3270,7 +3270,14 @@ EXPORT_SYMBOL(skb_split); > > */ > > static int skb_prepare_for_shift(struct sk_buff *skb) > > { > > - return skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > + unsigned int ret = 0, save; > > + > > + if (skb_cloned(skb)) { > > + save = skb->truesize; > > + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > + skb->truesize = save; > > + } > > + return ret; > > } > > FWIW, > > Tested-by: Marco Elver <elver@google.com> Has this patch, or similar, already been sent? Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-14 10:09 ` Marco Elver @ 2020-12-14 10:47 ` Eric Dumazet 2021-01-11 8:55 ` Marco Elver 2021-02-03 10:25 ` Marco Elver 1 sibling, 1 reply; 18+ messages in thread From: Eric Dumazet @ 2020-12-14 10:47 UTC (permalink / raw) To: Marco Elver Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Mon, Dec 14, 2020 at 11:09 AM Marco Elver <elver@google.com> wrote: > > On Thu, 10 Dec 2020 at 20:01, Marco Elver <elver@google.com> wrote: > > On Thu, 10 Dec 2020 at 18:14, Eric Dumazet <edumazet@google.com> wrote: > > > On Thu, Dec 10, 2020 at 5:51 PM Marco Elver <elver@google.com> wrote: > > [...] > > > > So I started putting gdb to work, and whenever I see an allocation > > > > exactly like the above that goes through tso_fragment() a warning > > > > immediately follows. > > > > > > > > Long story short, I somehow synthesized this patch that appears to fix > > > > things, but I can't explain why exactly: > > > > > > > > | --- a/net/core/skbuff.c > > > > | +++ b/net/core/skbuff.c > > > > | @@ -1679,13 +1679,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > > > > | > > > > | skb_metadata_clear(skb); > > > > | > > > > | - /* It is not generally safe to change skb->truesize. > > > > | - * For the moment, we really care of rx path, or > > > > | - * when skb is orphaned (not attached to a socket). > > > > | - */ > > > > | - if (!skb->sk || skb->destructor == sock_edemux) > > > > | - skb->truesize += size - osize; > > > > | - > > > > | return 0; > > > > | > > > > | nofrags: > > > > > > > > Now, here are the breadcrumbs I followed: > > > > > > > > > > > > 1. Breakpoint on kfence_ksize() -- first allocation that matches the above: > > > > > > > > | #0 __kfence_ksize (s=18446612700164612096) at mm/kfence/core.c:726 > > > > | #1 0xffffffff816fbf30 in kfence_ksize (addr=0xffff888436856000) at mm/kfence/core.c:737 > > > > | #2 0xffffffff816217cf in ksize (objp=0xffff888436856000) at mm/slab_common.c:1178 > > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > > | #4 0xffffffff84d0ba73 in alloc_skb_fclone (priority=<optimized out>, size=<optimized out>) at ./include/linux/skbuff.h:1144 > > > > | #5 sk_stream_alloc_skb (sk=0xffff8881176cc000, size=0, gfp=2592, force_schedule=232) at net/ipv4/tcp.c:888 > > > > | #6 0xffffffff84d41c36 in tso_fragment (gfp=<optimized out>, mss_now=<optimized out>, len=<optimized out>, > > > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_output.c:2124 > > > > | #7 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=21950, nonagle=3096, push_one=-1996874776, gfp=0) > > > > | at net/ipv4/tcp_output.c:2674 > > > > | #8 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > > | at ./include/net/sock.h:918 > > > > | #9 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > > > | #10 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > > > | #11 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > > > | #12 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f52ea0) at net/ipv4/tcp_ipv4.c:1668 > > > > | [...] > > > > > > > > Set watchpoint on skb->truesize: > > > > > > > > | (gdb) frame 3 > > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > > | 217 size = SKB_WITH_OVERHEAD(ksize(data)); > > > > | (gdb) p &skb->truesize > > > > | $5 = (unsigned int *) 0xffff888117f55f90 > > > > | (gdb) awatch *0xffff888117f55f90 > > > > | Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > > > > > 2. Some time later, we see that the skb with kfence-allocated data > > > > is cloned: > > > > > > > > | Thread 7 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > | > > > > | Value = 1570 > > > > | 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > > | 1002 C(truesize); > > > > | (gdb) bt > > > > | #0 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > > | #1 0xffffffff8488bfb9 in skb_clone (skb=0xffff888117f55ec0, gfp_mask=2592) at net/core/skbuff.c:1454 > > > > | #2 0xffffffff84d3cd1c in __tcp_transmit_skb (sk=0xffff8881176cc000, skb=0xffff888117f55ec0, clone_it=0, gfp_mask=2592, > > > > | rcv_nxt=0) at net/ipv4/tcp_output.c:1267 > > > > | #3 0xffffffff84d4125b in tcp_transmit_skb (gfp_mask=<optimized out>, clone_it=<optimized out>, skb=<optimized out>, > > > > | sk=<optimized out>) at ./include/linux/tcp.h:439 > > > > | #4 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=392485600, nonagle=1326, push_one=-1996875104, gfp=0) > > > > | at net/ipv4/tcp_output.c:2688 > > > > | #5 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > > | at ./include/net/sock.h:918 > > > > | #6 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > > > | #7 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > > > | #8 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > > > | #9 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f57820) at net/ipv4/tcp_ipv4.c:1668 > > > > | #10 0xffffffff8487bf67 in sk_backlog_rcv (skb=<optimized out>, sk=<optimized out>) at ./include/net/sock.h:1010 > > > > [...] > > > > > > > > > > > > 3. The original skb (that was cloned) has its truesize adjusted > > > > after a pskb_expand_head(): > > > > > > > > | Thread 2 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > | > > > > | Old value = 1570 > > > > | New value = 1954 > > > > > > > > ^^ the difference between the old and the new value is exactly > > > > 384, which is also the final underflow of the sk_wmem_queued > > > > that triggers the warning. Presumably if the original allocation > > > > had been through kmalloc-1k and not KFENCE, the difference here > > > > would have been 0, since ksize() of the original allocation in > > > > step (1) would have been 1024, and not 640 (difference of 384). > > > > > > > > | 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > > | at net/core/skbuff.c:1687 > > > > | 1687 skb->truesize += size - osize; > > > > | (gdb) bt > > > > | #0 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > > | at net/core/skbuff.c:1687 > > > > | #1 0xffffffff8488de01 in skb_prepare_for_shift (skb=<optimized out>) at ./arch/x86/include/asm/atomic.h:29 > > > > | #2 skb_prepare_for_shift (skb=0xffff888117f55ec0) at net/core/skbuff.c:3276 > > > > | #3 0xffffffff848936b1 in skb_shift (tgt=0xffff888117f549c0, skb=0xffff888117f55ec0, shiftlen=674) at net/core/skbuff.c:3351 > > > > | #4 0xffffffff84d264de in tcp_skb_shift (shiftlen=<optimized out>, pcount=<optimized out>, from=<optimized out>, > > > > | to=<optimized out>) at net/ipv4/tcp_input.c:1497 > > > > | #5 tcp_shift_skb_data (dup_sack=<optimized out>, end_seq=<optimized out>, start_seq=<optimized out>, state=<optimized out>, > > > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_input.c:1605 > > > > | #6 tcp_sacktag_walk (skb=0xffff888117f55ec0, sk=0xffff8881176cc000, next_dup=0x894, > > > > | state=0xffffffff88fa1aa0 <watchpoints+192>, start_seq=0, end_seq=401956752, dup_sack_in=false) > > > > | at net/ipv4/tcp_input.c:1670 > > > > | #7 0xffffffff84d276de in tcp_sacktag_write_queue (sk=0xffff888117f55f90, ack_skb=0x1888117f55f90, prior_snd_una=2196, > > > > | state=0xffffffff88fa1aa0 <watchpoints+192>) at net/ipv4/tcp_input.c:1931 > > > > | #8 0xffffffff84d2ca1d in tcp_ack (sk=0xffff8881176cc000, skb=0x1888117f55f90, flag=16643) at net/ipv4/tcp_input.c:3758 > > > > | #9 0xffffffff84d32387 in tcp_rcv_established (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_input.c:5858 > > > > | #10 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_ipv4.c:1668 > > > > [...] > > > > > > > > > > > > Any of this make sense? > > > > > > Very nice debugging ! > > > > > > I guess we could fix this in skb_prepare_for_shift(), eventually > > > caring for the truesize manipulation > > > (or reverting the change done in pskb_expand_head(), since only kfence > > > is having this issue. > > > > Phew, good to hear I finally got lucky. :-) > > > > Either option is fine, as long as it avoids this problem in future. > > Hopefully it can be fixed for 5.11. > > > > > (All TCP skbs in output path have the same allocation size for skb->head) > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index e578544b2cc7110ec2f6bcf4c29d93e4b4b1ad14..798b51eeeaa4fbed65d41d9eab207dbbf438dab3 > > > 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -3270,7 +3270,14 @@ EXPORT_SYMBOL(skb_split); > > > */ > > > static int skb_prepare_for_shift(struct sk_buff *skb) > > > { > > > - return skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > + unsigned int ret = 0, save; > > > + > > > + if (skb_cloned(skb)) { > > > + save = skb->truesize; > > > + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > + skb->truesize = save; > > > + } > > > + return ret; > > > } > > > > FWIW, > > > > Tested-by: Marco Elver <elver@google.com> > > Has this patch, or similar, already been sent? Not yet, we have few weeks left before 5.11 is released ;) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-14 10:47 ` Eric Dumazet @ 2021-01-11 8:55 ` Marco Elver 0 siblings, 0 replies; 18+ messages in thread From: Marco Elver @ 2021-01-11 8:55 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Mon, 14 Dec 2020 at 11:47, Eric Dumazet <edumazet@google.com> wrote: > On Mon, Dec 14, 2020 at 11:09 AM Marco Elver <elver@google.com> wrote: > > On Thu, 10 Dec 2020 at 20:01, Marco Elver <elver@google.com> wrote: > > > On Thu, 10 Dec 2020 at 18:14, Eric Dumazet <edumazet@google.com> wrote: > > > > On Thu, Dec 10, 2020 at 5:51 PM Marco Elver <elver@google.com> wrote: [...] > > > Either option is fine, as long as it avoids this problem in future. > > > Hopefully it can be fixed for 5.11. > > > > > > > (All TCP skbs in output path have the same allocation size for skb->head) > > > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > > index e578544b2cc7110ec2f6bcf4c29d93e4b4b1ad14..798b51eeeaa4fbed65d41d9eab207dbbf438dab3 > > > > 100644 > > > > --- a/net/core/skbuff.c > > > > +++ b/net/core/skbuff.c > > > > @@ -3270,7 +3270,14 @@ EXPORT_SYMBOL(skb_split); > > > > */ > > > > static int skb_prepare_for_shift(struct sk_buff *skb) > > > > { > > > > - return skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > > + unsigned int ret = 0, save; > > > > + > > > > + if (skb_cloned(skb)) { > > > > + save = skb->truesize; > > > > + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > > + skb->truesize = save; > > > > + } > > > > + return ret; > > > > } > > > > > > FWIW, > > > > > > Tested-by: Marco Elver <elver@google.com> > > > > Has this patch, or similar, already been sent? > > > Not yet, we have few weeks left before 5.11 is released ;) Ping. Though KFENCE has been dropped from 5.11, just a reminder so we get this fixed for 5.12. Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: WARNING in sk_stream_kill_queues (5) 2020-12-14 10:09 ` Marco Elver 2020-12-14 10:47 ` Eric Dumazet @ 2021-02-03 10:25 ` Marco Elver 1 sibling, 0 replies; 18+ messages in thread From: Marco Elver @ 2021-02-03 10:25 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, netdev, Andrew Morton, David Miller, Dmitry Vyukov, Alexander Potapenko, Jann Horn, Jakub Kicinski, LKML, Stephen Rothwell, syzkaller-bugs, Willem de Bruijn, syzbot On Mon, 14 Dec 2020 at 11:09, Marco Elver <elver@google.com> wrote: > On Thu, 10 Dec 2020 at 20:01, Marco Elver <elver@google.com> wrote: > > On Thu, 10 Dec 2020 at 18:14, Eric Dumazet <edumazet@google.com> wrote: > > > On Thu, Dec 10, 2020 at 5:51 PM Marco Elver <elver@google.com> wrote: > > [...] > > > > So I started putting gdb to work, and whenever I see an allocation > > > > exactly like the above that goes through tso_fragment() a warning > > > > immediately follows. > > > > > > > > Long story short, I somehow synthesized this patch that appears to fix > > > > things, but I can't explain why exactly: > > > > > > > > | --- a/net/core/skbuff.c > > > > | +++ b/net/core/skbuff.c > > > > | @@ -1679,13 +1679,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail, > > > > | > > > > | skb_metadata_clear(skb); > > > > | > > > > | - /* It is not generally safe to change skb->truesize. > > > > | - * For the moment, we really care of rx path, or > > > > | - * when skb is orphaned (not attached to a socket). > > > > | - */ > > > > | - if (!skb->sk || skb->destructor == sock_edemux) > > > > | - skb->truesize += size - osize; > > > > | - > > > > | return 0; > > > > | > > > > | nofrags: > > > > > > > > Now, here are the breadcrumbs I followed: > > > > > > > > > > > > 1. Breakpoint on kfence_ksize() -- first allocation that matches the above: > > > > > > > > | #0 __kfence_ksize (s=18446612700164612096) at mm/kfence/core.c:726 > > > > | #1 0xffffffff816fbf30 in kfence_ksize (addr=0xffff888436856000) at mm/kfence/core.c:737 > > > > | #2 0xffffffff816217cf in ksize (objp=0xffff888436856000) at mm/slab_common.c:1178 > > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > > | #4 0xffffffff84d0ba73 in alloc_skb_fclone (priority=<optimized out>, size=<optimized out>) at ./include/linux/skbuff.h:1144 > > > > | #5 sk_stream_alloc_skb (sk=0xffff8881176cc000, size=0, gfp=2592, force_schedule=232) at net/ipv4/tcp.c:888 > > > > | #6 0xffffffff84d41c36 in tso_fragment (gfp=<optimized out>, mss_now=<optimized out>, len=<optimized out>, > > > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_output.c:2124 > > > > | #7 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=21950, nonagle=3096, push_one=-1996874776, gfp=0) > > > > | at net/ipv4/tcp_output.c:2674 > > > > | #8 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > > | at ./include/net/sock.h:918 > > > > | #9 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > > > | #10 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > > > | #11 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > > > | #12 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f52ea0) at net/ipv4/tcp_ipv4.c:1668 > > > > | [...] > > > > > > > > Set watchpoint on skb->truesize: > > > > > > > > | (gdb) frame 3 > > > > | #3 0xffffffff84896911 in __alloc_skb (size=914710528, gfp_mask=2592, flags=0, node=-1) at net/core/skbuff.c:217 > > > > | 217 size = SKB_WITH_OVERHEAD(ksize(data)); > > > > | (gdb) p &skb->truesize > > > > | $5 = (unsigned int *) 0xffff888117f55f90 > > > > | (gdb) awatch *0xffff888117f55f90 > > > > | Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > > > > > 2. Some time later, we see that the skb with kfence-allocated data > > > > is cloned: > > > > > > > > | Thread 7 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > | > > > > | Value = 1570 > > > > | 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > > | 1002 C(truesize); > > > > | (gdb) bt > > > > | #0 0xffffffff84886947 in __skb_clone (n=0xffff888117f55fa0, skb=0xffff888117f55ec0) at net/core/skbuff.c:1002 > > > > | #1 0xffffffff8488bfb9 in skb_clone (skb=0xffff888117f55ec0, gfp_mask=2592) at net/core/skbuff.c:1454 > > > > | #2 0xffffffff84d3cd1c in __tcp_transmit_skb (sk=0xffff8881176cc000, skb=0xffff888117f55ec0, clone_it=0, gfp_mask=2592, > > > > | rcv_nxt=0) at net/ipv4/tcp_output.c:1267 > > > > | #3 0xffffffff84d4125b in tcp_transmit_skb (gfp_mask=<optimized out>, clone_it=<optimized out>, skb=<optimized out>, > > > > | sk=<optimized out>) at ./include/linux/tcp.h:439 > > > > | #4 tcp_write_xmit (sk=0xffff8881176cc000, mss_now=392485600, nonagle=1326, push_one=-1996875104, gfp=0) > > > > | at net/ipv4/tcp_output.c:2688 > > > > | #5 0xffffffff84d43e48 in __tcp_push_pending_frames (sk=0xffff8881176cc000, cur_mss=337, nonagle=0) > > > > | at ./include/net/sock.h:918 > > > > | #6 0xffffffff84d3259c in tcp_push_pending_frames (sk=<optimized out>) at ./include/net/tcp.h:1864 > > > > | #7 tcp_data_snd_check (sk=<optimized out>) at net/ipv4/tcp_input.c:5374 > > > > | #8 tcp_rcv_established (sk=0xffff8881176cc000, skb=0x0 <fixed_percpu_data>) at net/ipv4/tcp_input.c:5869 > > > > | #9 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f57820) at net/ipv4/tcp_ipv4.c:1668 > > > > | #10 0xffffffff8487bf67 in sk_backlog_rcv (skb=<optimized out>, sk=<optimized out>) at ./include/net/sock.h:1010 > > > > [...] > > > > > > > > > > > > 3. The original skb (that was cloned) has its truesize adjusted > > > > after a pskb_expand_head(): > > > > > > > > | Thread 2 hit Hardware access (read/write) watchpoint 6: *0xffff888117f55f90 > > > > | > > > > | Old value = 1570 > > > > | New value = 1954 > > > > > > > > ^^ the difference between the old and the new value is exactly > > > > 384, which is also the final underflow of the sk_wmem_queued > > > > that triggers the warning. Presumably if the original allocation > > > > had been through kmalloc-1k and not KFENCE, the difference here > > > > would have been 0, since ksize() of the original allocation in > > > > step (1) would have been 1024, and not 640 (difference of 384). > > > > > > > > | 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > > | at net/core/skbuff.c:1687 > > > > | 1687 skb->truesize += size - osize; > > > > | (gdb) bt > > > > | #0 0xffffffff8488d84b in pskb_expand_head (skb=0xffff888117f55ec0, nhead=401956752, ntail=1954, gfp_mask=2298092192) > > > > | at net/core/skbuff.c:1687 > > > > | #1 0xffffffff8488de01 in skb_prepare_for_shift (skb=<optimized out>) at ./arch/x86/include/asm/atomic.h:29 > > > > | #2 skb_prepare_for_shift (skb=0xffff888117f55ec0) at net/core/skbuff.c:3276 > > > > | #3 0xffffffff848936b1 in skb_shift (tgt=0xffff888117f549c0, skb=0xffff888117f55ec0, shiftlen=674) at net/core/skbuff.c:3351 > > > > | #4 0xffffffff84d264de in tcp_skb_shift (shiftlen=<optimized out>, pcount=<optimized out>, from=<optimized out>, > > > > | to=<optimized out>) at net/ipv4/tcp_input.c:1497 > > > > | #5 tcp_shift_skb_data (dup_sack=<optimized out>, end_seq=<optimized out>, start_seq=<optimized out>, state=<optimized out>, > > > > | skb=<optimized out>, sk=<optimized out>) at net/ipv4/tcp_input.c:1605 > > > > | #6 tcp_sacktag_walk (skb=0xffff888117f55ec0, sk=0xffff8881176cc000, next_dup=0x894, > > > > | state=0xffffffff88fa1aa0 <watchpoints+192>, start_seq=0, end_seq=401956752, dup_sack_in=false) > > > > | at net/ipv4/tcp_input.c:1670 > > > > | #7 0xffffffff84d276de in tcp_sacktag_write_queue (sk=0xffff888117f55f90, ack_skb=0x1888117f55f90, prior_snd_una=2196, > > > > | state=0xffffffff88fa1aa0 <watchpoints+192>) at net/ipv4/tcp_input.c:1931 > > > > | #8 0xffffffff84d2ca1d in tcp_ack (sk=0xffff8881176cc000, skb=0x1888117f55f90, flag=16643) at net/ipv4/tcp_input.c:3758 > > > > | #9 0xffffffff84d32387 in tcp_rcv_established (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_input.c:5858 > > > > | #10 0xffffffff84d56731 in tcp_v4_do_rcv (sk=0xffff8881176cc000, skb=0xffff888117f54020) at net/ipv4/tcp_ipv4.c:1668 > > > > [...] > > > > > > > > > > > > Any of this make sense? > > > > > > Very nice debugging ! > > > > > > I guess we could fix this in skb_prepare_for_shift(), eventually > > > caring for the truesize manipulation > > > (or reverting the change done in pskb_expand_head(), since only kfence > > > is having this issue. > > > > Phew, good to hear I finally got lucky. :-) > > > > Either option is fine, as long as it avoids this problem in future. > > Hopefully it can be fixed for 5.11. > > > > > (All TCP skbs in output path have the same allocation size for skb->head) > > > > > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > > > index e578544b2cc7110ec2f6bcf4c29d93e4b4b1ad14..798b51eeeaa4fbed65d41d9eab207dbbf438dab3 > > > 100644 > > > --- a/net/core/skbuff.c > > > +++ b/net/core/skbuff.c > > > @@ -3270,7 +3270,14 @@ EXPORT_SYMBOL(skb_split); > > > */ > > > static int skb_prepare_for_shift(struct sk_buff *skb) > > > { > > > - return skb_cloned(skb) && pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > + unsigned int ret = 0, save; > > > + > > > + if (skb_cloned(skb)) { > > > + save = skb->truesize; > > > + ret = pskb_expand_head(skb, 0, 0, GFP_ATOMIC); > > > + skb->truesize = save; > > > + } > > > + return ret; > > > } > > > > FWIW, > > > > Tested-by: Marco Elver <elver@google.com> > > Has this patch, or similar, already been sent? This is now in net-next: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=097b9146c0e2 Thanks, -- Marco ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2021-02-03 10:27 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-11-30 8:40 WARNING in sk_stream_kill_queues (5) syzbot 2020-12-03 15:58 ` Marco Elver 2020-12-03 16:27 ` Eric Dumazet 2020-12-03 16:34 ` Marco Elver 2020-12-03 16:42 ` Eric Dumazet 2020-12-03 17:41 ` Marco Elver 2020-12-03 18:01 ` Eric Dumazet 2020-12-07 15:30 ` Marco Elver 2020-12-07 16:28 ` Marco Elver 2020-12-08 19:06 ` Marco Elver 2020-12-09 12:47 ` Marco Elver 2020-12-10 16:51 ` Marco Elver 2020-12-10 17:14 ` Eric Dumazet 2020-12-10 19:01 ` Marco Elver 2020-12-14 10:09 ` Marco Elver 2020-12-14 10:47 ` Eric Dumazet 2021-01-11 8:55 ` Marco Elver 2021-02-03 10:25 ` Marco Elver
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.