From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752708AbaKUUPc (ORCPT );
	Fri, 21 Nov 2014 15:15:32 -0500
Received: from mx1.redhat.com ([209.132.183.28]:36596 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751848AbaKUUP1 (ORCPT );
	Fri, 21 Nov 2014 15:15:27 -0500
Date: Fri, 21 Nov 2014 21:14:15 +0100
From: Andrea Arcangeli
To: Peter Maydell
Cc: zhanghailiang, Robert Love, Dave Hansen, Jan Kara, kvm-devel,
	Neil Brown, Stefan Hajnoczi, QEMU Developers, KOSAKI Motohiro,
	Michel Lespinasse, Taras Glek, Andrew Jones, Juan Quintela,
	Hugh Dickins, Mel Gorman, Sasha Levin, Android Kernel Team,
	"Dr. David Alan Gilbert", "Huangpeng (Peter)",
	Andres Lagar-Cavilla, Christopher Covington, Anthony Liguori,
	Paolo Bonzini, Keith Packard, Wenchao Xia,
	lkml - Kernel Mailing List, Andy Lutomirski, Minchan Kim,
	Dmitry Adamushko, Johannes Weiner, Mike Hommey, Andrew Morton,
	Peter Feiner
Subject: Re: [Qemu-devel] [PATCH 00/17] RFC: userfault v2
Message-ID: <20141121201415.GK4569@redhat.com>
References: <1412356087-16115-1-git-send-email-aarcange@redhat.com>
	<544E1143.1080905@huawei.com>
	<20141029174607.GK19606@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Peter,

On Wed, Oct 29, 2014 at 05:56:59PM +0000, Peter Maydell wrote:
> On 29 October 2014 17:46, Andrea Arcangeli wrote:
> > After some chat during the KVMForum I've been already thinking it
> > could be beneficial for some usage to give userland the information
> > about the fault being read or write
>
> ...I wonder if that would let us replace the current nasty
> mess we use in linux-user to detect read vs write faults
> (which uses a bunch of architecture-specific hacks including in
> some cases "look at the insn that triggered this SEGV and
> decode it to see if it was a load or a store"; see the
> various cpu_signal_handler() implementations in user-exec.c).

There's currently no plan to deliver read access notifications for a
present page to userland, simply because the task of the userfaultfd
is to handle the page fault in userland, and if the page is mapped and
readable it won't fault in the first place :). I just mean it's not
like a gdb read watch.

Even if the region were set to PROT_NONE it would still SEGV without
triggering a userfault (after all, pte_present would still be true
because the page is still mapped despite not being readable, so in any
case it wouldn't be considered a not-present page fault).

If you temporarily remove the page (which requires an unavoidable TLB
flush, since if the page was previously mapped the TLB could still
resolve it for reads), it would work, because the plan is to provide
read/write fault information through the userfaultfd.

In theory it would be possible to deliver PROT_NONE faults through
userfault too, but it doesn't make much sense because PROT_NONE still
requires a TLB flush, in addition to the vma
modifications/splitting/rbtree-rebalance and taking the mmap_sem for
writing as well.

Temporarily removing/moving the page with remap_anon_pages should be
much better than using PROT_NONE for this (or with an alternative
syscall name that differentiates it further from remap_file_pages, or
with an equivalent userfaultfd command if we decide to hide the
pte/pmd mangling behind userfaultfd commands instead of adding new
standalone syscalls).

The only constraint would be that you must mark the region
MADV_DONTFORK if you intend linux-user to ever fork, or it won't work
reliably (that constraint eliminates the need for additional rmap
complexity, precisely so that it doesn't turn into something more
intrusive like remap_file_pages). I assume that would be a fine
constraint for linux-user.

Thanks,
Andrea
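
For readers following the PROT_NONE discussion above, here is a minimal sketch of the behaviour it describes: a page that is present but set to PROT_NONE raises SIGSEGV on a read, and the handler has to go through mprotect() to make it accessible again. It uses only existing Linux calls (mmap, madvise with MADV_DONTFORK, mprotect, sigaction). It does not use userfaultfd or the proposed remap_anon_pages. The names demo_prot_none_fault, on_segv, watched_page, watched_len and fault_count are hypothetical and chosen for illustration only.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of this comes from the patch set. */
static volatile sig_atomic_t fault_count; /* SIGSEGVs seen on the watched page */
static char *watched_page;                /* start of the watched page */
static size_t watched_len;                /* its length in bytes */

/* SIGSEGV handler: count faults on the watched page and restore access. */
static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    char *addr = (char *)info->si_addr;

    if (addr < watched_page || addr >= watched_page + watched_len)
        _exit(2); /* unrelated fault: give up */

    fault_count++;
    /* mprotect() is not on the POSIX async-signal-safe list; on Linux it
     * is a plain syscall and this pattern is common, but treat it as an
     * assumption of this sketch. */
    if (mprotect(watched_page, watched_len, PROT_READ | PROT_WRITE) != 0)
        _exit(3);
}

/* Returns how many SIGSEGVs a single read of a PROT_NONE page took,
 * or -1 if setup failed or the value read back was wrong. */
int demo_prot_none_fault(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    watched_len = (size_t)page;

    watched_page = mmap(NULL, watched_len, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (watched_page == MAP_FAILED)
        return -1;

    /* The constraint mentioned in the mail: keep this region out of any
     * child created by fork(). */
    if (madvise(watched_page, watched_len, MADV_DONTFORK) != 0)
        return -1;

    watched_page[0] = 42; /* touch the page so it is present */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    fault_count = 0;
    /* The page stays mapped, only its protection changes. */
    if (mprotect(watched_page, watched_len, PROT_NONE) != 0)
        return -1;

    volatile char *p = watched_page;
    char value = p[0]; /* faults; the handler restores access; the load retries */

    sigaction(SIGSEGV, &old, NULL);
    int faults = (int)fault_count;
    munmap(watched_page, watched_len);
    return value == 42 ? faults : -1;
}
```

Calling demo_prot_none_fault() from a main() is expected to return 1 on Linux: one SIGSEGV for the single read. Note that the handler only learns the faulting address from si_addr. Telling a read from a write still needs the architecture-specific decoding Peter mentions, which is the gap the read/write information planned for userfaultfd is meant to fill.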