Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Hillf Danton <hdanton@sina.com>
Cc: syzbot <syzbot+83e7f982ca045ab4405c@syzkaller.appspotmail.com>,
	 andrii@kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org,  syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] [bpf?] [trace?] possible deadlock in force_sig_info_to_task
Date: Sun, 28 Apr 2024 13:01:19 -0700
Message-ID: <CAHk-=wjBvNvVggy14p9rkHA8W1ZVfoKXvW0oeX5NZWxWUv8gfQ@mail.gmail.com>
In-Reply-To: <20240427231321.3978-1-hdanton@sina.com>

On Sat, 27 Apr 2024 at 16:13, Hillf Danton <hdanton@sina.com> wrote:
>
> > -> #0 (&sighand->siglock){....}-{2:2}:
> >        check_prev_add kernel/locking/lockdep.c:3134 [inline]
> >        check_prevs_add kernel/locking/lockdep.c:3253 [inline]
> >        validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
> >        __lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
> >        lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
> >        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >        _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
> >        force_sig_info_to_task+0x68/0x580 kernel/signal.c:1334
> >        force_sig_fault_to_task kernel/signal.c:1733 [inline]
> >        force_sig_fault+0x12c/0x1d0 kernel/signal.c:1738
> >        __bad_area_nosemaphore+0x127/0x780 arch/x86/mm/fault.c:814
> >        handle_page_fault arch/x86/mm/fault.c:1505 [inline]
>
> Given page fault with runqueue locked, bpf makes trouble instead of
> helping anything in this case.

That's not the odd thing here.

Look, the callchain is:

> >        exc_page_fault+0x612/0x8e0 arch/x86/mm/fault.c:1563
> >        asm_exc_page_fault+0x26/0x30 arch/x86/include/asm/idtentry.h:623
> >        rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:48
> >        copy_user_generic arch/x86/include/asm/uaccess_64.h:110 [inline]
> >        raw_copy_from_user arch/x86/include/asm/uaccess_64.h:125 [inline]
> >        __copy_from_user_inatomic include/linux/uaccess.h:87 [inline]
> >        copy_from_user_nofault+0xbc/0x150 mm/maccess.c:125

IOW, this is all doing a copy from user with page faults disabled, and
it shouldn't have caused a signal to be sent, so the whole
__bad_area_nosemaphore -> force_sig_fault path is bad.

The *problem* here is that the page fault doesn't actually happen on a
user access, it happens on the *ret* instruction in
rep_movs_alternative itself (which doesn't have an exception fixup,
obviously, because no exception is supposed to happen there!):

  RIP: 0010:rep_movs_alternative+0x22/0x70 arch/x86/lib/copy_user_64.S:50
  Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 83 f9 40 73 40 83 f9 08
73 21 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 <c3> cc
cc cc cc 66 0f 1f 84 00 00 0$
  RSP: 0000:ffffc90004137468 EFLAGS: 00050002
  RAX: ffffffff8205ce4e RBX: dffffc0000000000 RCX: 0000000000000002
  RDX: 0000000000000000 RSI: 0000000000000900 RDI: ffffc900041374e8
  RBP: ffff88802d039784 R08: 0000000000000005 R09: ffffffff8205ce37
  R10: 0000000000000003 R11: ffff88802d038000 R12: 1ffff11005a072f0
  R13: 0000000000000900 R14: 0000000000000002 R15: ffffc900041374e8

where decoding that "Code:" line gives this:

   0: f3 0f 1e fa          endbr64
   4: 48 83 f9 40          cmp    $0x40,%rcx
   8: 73 40                jae    0x4a
   a: 83 f9 08             cmp    $0x8,%ecx
   d: 73 21                jae    0x30
   f: 85 c9                test   %ecx,%ecx
  11: 74 0f                je     0x22
  13: 8a 06                mov    (%rsi),%al
  15: 88 07                mov    %al,(%rdi)
  17: 48 ff c7             inc    %rdi
  1a: 48 ff c6             inc    %rsi
  1d: 48 ff c9             dec    %rcx
  20: 75 f1                jne    0x13
  22:* c3                  ret <-- trapping instruction

but I have no idea why the 'ret' instruction would take a page fault.
It really shouldn't.
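
As a rough cross-check of that decode, the following is a minimal
sketch (plain Python, not kernel code) that finds the <..>-marked byte
in the "Code:" line and computes its offset from the start of the
function. It assumes the function entry is the endbr64 that follows the
0x90 padding, which matches the decode above; the helper names are
hypothetical, and the truncated last token of the dump is simply left
out.

```python
# Bytes from the "Code:" line of the dump, re-joined from the wrapped text.
# The final token of the original line is cut off, so it is omitted here.
CODE_LINE = (
    "90 90 90 90 90 90 90 90 f3 0f 1e fa 48 83 f9 40 73 40 83 f9 08 "
    "73 21 85 c9 74 0f 8a 06 88 07 48 ff c7 48 ff c6 48 ff c9 75 f1 <c3> cc "
    "cc cc cc 66 0f 1f 84 00 00"
)

# Assumption: the function starts at the first endbr64 (f3 0f 1e fa).
ENDBR64 = ["f3", "0f", "1e", "fa"]


def find_trapping_byte(tokens):
    """Return (index, value) of the byte wrapped in <..> in the token list."""
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            return i, int(tok[1:-1], 16)
    raise ValueError("no <..> marker found in Code: line")


def find_sequence(tokens, seq):
    """Return the index of the first occurrence of seq in tokens."""
    for i in range(len(tokens) - len(seq) + 1):
        if tokens[i:i + len(seq)] == seq:
            return i
    raise ValueError("byte sequence not found")


tokens = CODE_LINE.split()
trap_index, trap_byte = find_trapping_byte(tokens)
func_start = find_sequence(tokens, ENDBR64)
trap_offset = trap_index - func_start

print(f"trapping byte 0x{trap_byte:02x} at function offset 0x{trap_offset:x}")
# prints: trapping byte 0xc3 at function offset 0x22
```

The 0x22 offset agrees with the rep_movs_alternative+0x22 in the RIP
line, and 0xc3 is the one-byte near 'ret' opcode.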

Now, it's not like 'ret' instructions can't take page faults, but it
sure shouldn't happen in the *kernel*. The reasons for page faults on
'ret' instructions are:

 - the instruction itself takes a page fault

 - the stack pointer is bogus

 - possibly because the stack *contents* are bogus (at least some x86
instructions that jump will check the destination in the jump
instruction itself, although I didn't think 'ret' was one of them)

but for the kernel, none of these actually seem to be the case
normally. And even abnormally I don't see this being an issue, since
the exception backtrace is happily shown (ie the stack looks all
good).
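
One quick sanity check on the "stack pointer is bogus" case can be
sketched from the register dump alone. This is only an illustration,
assuming 48-bit virtual addresses (4-level paging); the function names
are hypothetical, and passing the check is necessary but not
sufficient, since it says nothing about whether the stack page is
actually mapped.

```python
# Values copied from the register dump above.
RSP = 0xffffc90004137468
RSI = 0x0000000000000900  # the user-space source pointer, for contrast
RBX = 0xdffffc0000000000  # a non-canonical value, for contrast


def is_canonical_48(addr):
    """True if bits 63..47 are all identical (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == 0x1ffff


def is_kernel_half(addr):
    """True if the address lies in the upper (kernel) half of the address space."""
    return (addr >> 63) == 1


def plausible_kernel_stack_pointer(addr):
    """Canonical, in the kernel half, and 8-byte aligned."""
    return is_canonical_48(addr) and is_kernel_half(addr) and addr % 8 == 0


for name, value in (("RSP", RSP), ("RSI", RSI), ("RBX", RBX)):
    print(f"{name}={value:#018x} plausible={plausible_kernel_stack_pointer(value)}")
```

With the values from this dump, RSP passes all three checks while the
two contrast values do not, which is consistent with the stack pointer
looking fine.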

So this dump is just *WEIRD*.

End result: the problem is not about any kind of deadlock on circular
locking. That's just the symptom of that odd page fault that shouldn't
have happened, and that I don't quite see how it happened.

               Linus