Re: KASAN: use-after-free Read in handle_userfault (2)

From: Andrea Arcangeli <aarcange@redhat.com>
To: Dmitry Vyukov <dvyukov@google.com>
Cc: syzbot+cbc64b24b2b2d54c07a9@syzkaller.appspotmail.com,
	Sasha Levin <Alexander.Levin@microsoft.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Peter Xu <peterx@redhat.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: KASAN: use-after-free Read in handle_userfault (2)
Date: Fri, 4 Jan 2019 18:10:13 -0500	[thread overview]
Message-ID: <20190104231013.GB4924@redhat.com> (raw)
In-Reply-To: <CACT4Y+Y+ZdBPXGhytZycqcLc_7GOR-VZB8KKuX_qUgxmiVT60w@mail.gmail.com>

On Wed, Jan 02, 2019 at 02:37:58PM +0100, Dmitry Vyukov wrote:
> If we are proceeding with "mm: some enhancements to the page fault
> mechanism", that's good as it will eliminate at least part of this
> output.

Agreed.

> There are 2 types of debug configs: ones add additional checks for
> machines and another add verbose output for humans. CONFIG_DEBUG_VM
> seems to be more of the first type of debug config -- additional
> checks for machines. I've seen some of the second type debug configs
> are prefixed with DEBUG_VERBOSE or something along these lines. Maybe
> it makes sense to split this out of CONFIG_DEBUG_VM. Since it's a
> "gray" check (rather then white/black check), it can't be used in
> CI/fuzzing setups anyways -- not possible to analyse thousands of
> cases manually (though maybe we actually hitting some that can be
> classified as kernel bugs).

The problem with the DEBUG_VERBOSE is that if the kernel needs a
rebuild it's only marginally better than having to ask the developer
to add a one liner dump_stack before rebuilding the kernel.

Next time we need it, it may be simpler to figure a dynamic tracing
trick than to ask a rebuild.

In addition of pursuing the VM_FAULT_RETRY improvements from Peter,
it'd be fine to drop it for now to avoid confusing the machines.

> Yes, the problem is way more general. As you noted it applies to 100K
> of EINVAL|EFAULT|ENOMEM, it's super hard to figure out what/where
> exactly goes wrong in the kernel getting only -22. But at the same

What stands out for this location is that it's the bailout point
of all non uffd compatible syscalls.

The chances such annotation turns out to be useful is much higher than
a random -EINVAL location, but in principle it's the same kind of
issue and I agree it'd be unpractical to annotate them manually.

> time we can't have all of these 100K places dump stacks. I don't know
> what's a good solution for this. Manually annotating 100K places does
> not look like the right way to go. Maybe kprobes can do this? In some
> cases I used CONFIG_KCOV  and kcovtrace
> (https://github.com/google/syzkaller/blob/master/tools/kcovtrace/kcovtrace.c)
> to collect kernel trace from a failing syscall.

If there is a de facto standard to search for a "syscall bailout call
trace" that can provide us that very same dump_stack, that should work
for this uffd issue too indeed.

KCOV may or may not be enabled in enterprise -debug kernels, but the
main limitation of the kcovtrace.c seems to be the lack of threading
support and the privilege inheritance through fork. While it's not
mandatory, the uffd manager is practically always implemented as a
thread of some process so whatever alternative to dump_stack() should
work with threads.

I think kprobes/ebpf should work provided debug info is available
(which is not always guaranteed), otherwise to obtain it without debug
info, it'd require a ftrace static trace point to use with the
function graph tracer I guess.

Thanks,
Andrea