From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753775AbdJaTNm (ORCPT ); Tue, 31 Oct 2017 15:13:42 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41828 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751366AbdJaTNk (ORCPT ); Tue, 31 Oct 2017 15:13:40 -0400 DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 715662D26B6 Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=aarcange@redhat.com Date: Tue, 31 Oct 2017 20:13:36 +0100 From: Andrea Arcangeli To: Linus Torvalds Cc: Vlastimil Babka , Dmitry Vyukov , syzbot , Jan Beulich , "H. Peter Anvin" , Josh Poimboeuf , "Kirill A. Shutemov" , Laurent Dufour , LKML , Andy Lutomirski , Ingo Molnar , syzkaller-bugs@googlegroups.com, Thomas Gleixner , the arch/x86 maintainers , Andrew Morton , Michal Hocko , Hugh Dickins , David Rientjes , linux-mm , Thorsten Leemhuis Subject: Re: KASAN: use-after-free Read in __do_page_fault Message-ID: <20171031191336.GA2799@redhat.com> References: <94eb2c0433c8f42cac055cc86991@google.com> <8e92c891-a9e0-efed-f0b9-9bf567d8fbcd@suse.cz> <4bc852be-7ef3-0b60-6dbb-81139d25a817@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.9.1 (2017-09-22) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Tue, 31 Oct 2017 19:13:40 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 31, 2017 at 08:37:47AM -0700, Linus Torvalds wrote: > Yes. Accessing "vma" after calling "handle_mm_fault()" is a bug. An > unfortunate issue with userfaultfd. > > The suggested fix to simply look up pkey beforehand seems sane and simple. Agreed. > > But sadly, from a quick check, it looks like arch/um/ has the same > bug, but even worse. It will do > > (a) handle_mm_fault() in a loop without re-calculating vma. Don't ask me why. > > (b) flush_tlb_page(vma, address); afterwards Yes, that flush_tlb_page is unsafe. Luckily it's only using it for vma->vm_mm so it doesn't sound major issue to fix it. > > but much more importantly, I think __get_user_pages() is broken in two ways: > > - faultin_page() does: > > ret = handle_mm_fault(vma, address, fault_flags); > ... > if ((ret & VM_FAULT_WRITE) && !(vma->vm_flags & VM_WRITE)) > > (easily fixed the same way) > > - more annoyingly and harder to fix: the retry case in > __get_user_pages(), and the VMA saving there. > > Ho humm. > > Andrea, looking at that get_user_pages() case, I really think it's > userfaultfd that is broken. > > Could we perhaps limit userfaultfd to _only_ do the VM_FAULT_RETRY, > and simply fail for non-retry faults? In the get_user_pages case we already limit it to do only VM_FAULT_RETRY so no use after free should materialize whenever gup is involved. The problematic path for the return to userland (get_user_pages returns to kernel) is this one: if (return_to_userland) { if (signal_pending(current) && !fatal_signal_pending(current)) { /* * If we got a SIGSTOP or SIGCONT and this is * a normal userland page fault, just let * userland return so the signal will be * handled and gdb debugging works. The page * fault code immediately after we return from * this function is going to release the * mmap_sem and it's not depending on it * (unlike gup would if we were not to return * VM_FAULT_RETRY). * * If a fatal signal is pending we still take * the streamlined VM_FAULT_RETRY failure path * and there's no need to retake the mmap_sem * in such case. */ down_read(&mm->mmap_sem); ret = VM_FAULT_NOPAGE; } } We could remove the above branch all together and then handle_userfault() would always return VM_FAULT_RETRY whenever it decides to release the mmap_sem. The above makes debugging with gdb more user friendly and it potentially lowers the latency of signals as signals can unblock handle_userfault. The downside is that the return to userland cannot dereference the vma after calling handle_mm_fault. Thanks, Andrea