From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753775AbdJaTNm (ORCPT <rfc822;w@1wt.eu>);
        Tue, 31 Oct 2017 15:13:42 -0400
Received: from mx1.redhat.com ([209.132.183.28]:41828 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751366AbdJaTNk (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 31 Oct 2017 15:13:40 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 715662D26B6
Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=aarcange@redhat.com
Date: Tue, 31 Oct 2017 20:13:36 +0100
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>, Dmitry Vyukov <dvyukov@google.com>,
        syzbot 
        <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>,
        Jan Beulich <JBeulich@suse.com>, "H. Peter Anvin" <hpa@zytor.com>,
        Josh Poimboeuf <jpoimboe@redhat.com>,
        "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
        Laurent Dufour <ldufour@linux.vnet.ibm.com>,
        LKML <linux-kernel@vger.kernel.org>, Andy Lutomirski <luto@kernel.org>,
        Ingo Molnar <mingo@redhat.com>, syzkaller-bugs@googlegroups.com,
        Thomas Gleixner <tglx@linutronix.de>,
        the arch/x86 maintainers <x86@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Michal Hocko <mhocko@suse.com>, Hugh Dickins <hughd@google.com>,
        David Rientjes <rientjes@google.com>, linux-mm <linux-mm@kvack.org>,
        Thorsten Leemhuis <regressions@leemhuis.info>
Subject: Re: KASAN: use-after-free Read in __do_page_fault
Message-ID: <20171031191336.GA2799@redhat.com>
References: <94eb2c0433c8f42cac055cc86991@google.com>
 <CACT4Y+YtdzYFPZfs0gjDtuHqkkZdRNwKfe-zBJex_uXUevNtBg@mail.gmail.com>
 <b9c543d1-27f9-8db7-238e-7c1305b1bff5@suse.cz>
 <CACT4Y+ZzrcHAUSG25HSi7ybKJd8gxDtimXHE_6UsowOT3wcT5g@mail.gmail.com>
 <8e92c891-a9e0-efed-f0b9-9bf567d8fbcd@suse.cz>
 <4bc852be-7ef3-0b60-6dbb-81139d25a817@suse.cz>
 <CA+55aFwWJyArZMEuo1-4+VaiP95e__cRHkVvrfiQ+NUVJ15DNQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CA+55aFwWJyArZMEuo1-4+VaiP95e__cRHkVvrfiQ+NUVJ15DNQ@mail.gmail.com>
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Tue, 31 Oct 2017 19:13:40 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Oct 31, 2017 at 08:37:47AM -0700, Linus Torvalds wrote:
> Yes. Accessing "vma" after calling "handle_mm_fault()" is a bug. An
> unfortunate issue with userfaultfd.
> 
> The suggested fix to simply look up pkey beforehand seems sane and simple.

Agreed.

> 
> But sadly, from a quick check, it looks like arch/um/ has the same
> bug, but even worse. It will do
> 
>  (a) handle_mm_fault() in a loop without re-calculating vma. Don't ask me why.
> 
>  (b) flush_tlb_page(vma, address); afterwards

Yes, that flush_tlb_page is unsafe. Luckily it's only using it for
vma->vm_mm so it doesn't sound major issue to fix it.

> 
> but much more importantly, I think __get_user_pages() is broken in two ways:
> 
>  - faultin_page() does:
> 
>         ret = handle_mm_fault(vma, address, fault_flags);
>         ...
>         if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE))
> 
>    (easily fixed the same way)
> 
>  - more annoyingly and harder to fix: the retry case in
> __get_user_pages(), and the VMA saving there.
> 
> Ho humm.
> 
> Andrea, looking at that get_user_pages() case, I really think it's
> userfaultfd that is broken.
> 
> Could we perhaps limit userfaultfd to _only_ do the VM_FAULT_RETRY,
> and simply fail for non-retry faults?

In the get_user_pages case we already limit it to do only
VM_FAULT_RETRY so no use after free should materialize whenever gup is
involved.

The problematic path for the return to userland (get_user_pages
returns to kernel) is this one:

	if (return_to_userland) {
		if (signal_pending(current) &&
		    !fatal_signal_pending(current)) {
			/*
			 * If we got a SIGSTOP or SIGCONT and this is
			 * a normal userland page fault, just let
			 * userland return so the signal will be
			 * handled and gdb debugging works.  The page
			 * fault code immediately after we return from
			 * this function is going to release the
			 * mmap_sem and it's not depending on it
			 * (unlike gup would if we were not to return
			 * VM_FAULT_RETRY).
			 *
			 * If a fatal signal is pending we still take
			 * the streamlined VM_FAULT_RETRY failure path
			 * and there's no need to retake the mmap_sem
			 * in such case.
			 */
			down_read(&mm->mmap_sem);
			ret = VM_FAULT_NOPAGE;
		}
	}

We could remove the above branch all together and then
handle_userfault() would always return VM_FAULT_RETRY whenever it
decides to release the mmap_sem. The above makes debugging with gdb
more user friendly and it potentially lowers the latency of signals as
signals can unblock handle_userfault. The downside is that the return
to userland cannot dereference the vma after calling handle_mm_fault.

Thanks,
Andrea