From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753866AbaJGNiZ (ORCPT ); Tue, 7 Oct 2014 09:38:25 -0400
Received: from mx1.redhat.com ([209.132.183.28]:3275 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1753496AbaJGNiW (ORCPT ); Tue, 7 Oct 2014 09:38:22 -0400
Date: Tue, 7 Oct 2014 15:37:10 +0200
From: Andrea Arcangeli
To: "Kirill A. Shutemov"
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org,
        linux-kernel@vger.kernel.org, linux-mm@kvack.org,
        linux-api@vger.kernel.org, Robert Love, Dave Hansen, Jan Kara,
        Neil Brown, Stefan Hajnoczi, Andrew Jones, KOSAKI Motohiro,
        Michel Lespinasse, Taras Glek, Juan Quintela, Hugh Dickins,
        Isaku Yamahata, Mel Gorman, Sasha Levin, Android Kernel Team,
        "Dr. David Alan Gilbert", "Huangpeng (Peter)",
        Andres Lagar-Cavilla, Christopher Covington, Anthony Liguori,
        Paolo Bonzini, Keith Packard, Wenchao Xia, Andy Lutomirski,
        Minchan Kim, Dmitry Adamushko, Johannes Weiner, Mike Hommey,
        Andrew Morton, Linus Torvalds, Peter Feiner
Subject: Re: [Qemu-devel] [PATCH 10/17] mm: rmap preparation for
        remap_anon_pages
Message-ID: <20141007133710.GA2342@redhat.com>
References: <1412356087-16115-1-git-send-email-aarcange@redhat.com>
        <1412356087-16115-11-git-send-email-aarcange@redhat.com>
        <20141007111026.GD30762@node.dhcp.inet.fi>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20141007111026.GD30762@node.dhcp.inet.fi>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Kirill,

On Tue, Oct 07, 2014 at 02:10:26PM +0300, Kirill A.
Shutemov wrote:
> On Fri, Oct 03, 2014 at 07:08:00PM +0200, Andrea Arcangeli wrote:
> > There's one constraint enforced to allow this simplification: the
> > source pages passed to remap_anon_pages must be mapped only in one
> > vma, but this is not a limitation when used to handle userland page
> > faults with MADV_USERFAULT. The source addresses passed to
> > remap_anon_pages should be set as VM_DONTCOPY with MADV_DONTFORK to
> > avoid any risk of the mapcount of the pages increasing, if fork runs
> > in parallel in another thread, before or while remap_anon_pages runs.
>
> Have you considered triggering COW instead of adding limitation on
> pages' mapcount? The limitation looks artificial from interface POV.

I haven't considered it, mostly because I see it as a feature that it
returns -EBUSY. I prefer to avoid the risk of userland getting a
successful retval while the kernel internally and silently behaves
non-zerocopy by mistake, because some userland bug forgot to set
MADV_DONTFORK on the src_vma.

COW would not be zerocopy, so it's not ok. We get sub-1msec latency for
userfaults through 10gbit and we don't want to risk wasting CPU caches.

However, I did consider allowing the strict behavior (i.e. the feature)
to be extended later in a backwards compatible way. We could provide a
non-zerocopy behavior with a RAP_ALLOW_COW flag that would then turn
the -EBUSY error into a copy.

It's also more complex to implement the COW now, and it would make the
code that really matters harder to review. So it may be preferable to
extend this later in a backwards compatible way with a new
RAP_ALLOW_COW flag.

The current handling of the flags is already written in a way that
should allow backwards compatible extension with RAP_ALLOW_*:

    #define RAP_ALLOW_SRC_HOLES (1UL<<0)

    SYSCALL_DEFINE4(remap_anon_pages,
                    unsigned long, dst_start, unsigned long, src_start,
                    unsigned long, len, unsigned long, flags)
    [..]
            long err = -EINVAL;
    [..]
            if (flags & ~RAP_ALLOW_SRC_HOLES)
                    return err;
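
To spell out why rejecting unknown flag bits keeps the door open, here
is a minimal userland sketch of the same mask check. It is only an
illustration under assumptions: the bit value of RAP_ALLOW_COW, the two
RAP_KNOWN_FLAGS_* masks and the rap_check_flags() helper are all
hypothetical names made up for this example and are not in the patch.

```c
#include <errno.h>

/* Flag taken from the patch under discussion. */
#define RAP_ALLOW_SRC_HOLES (1UL << 0)
/* Hypothetical future flag: its bit value is an assumption. */
#define RAP_ALLOW_COW       (1UL << 1)

/* Hypothetical masks: what today's kernel and a later kernel accept. */
#define RAP_KNOWN_FLAGS_V1  (RAP_ALLOW_SRC_HOLES)
#define RAP_KNOWN_FLAGS_V2  (RAP_ALLOW_SRC_HOLES | RAP_ALLOW_COW)

/*
 * Same shape as the check in the syscall: any bit outside the set of
 * known flags is rejected with -EINVAL instead of being ignored.
 */
long rap_check_flags(unsigned long flags, unsigned long known)
{
	if (flags & ~known)
		return -EINVAL;
	return 0;
}
```

With this shape, a program that passes RAP_ALLOW_COW to a kernel that
only knows RAP_KNOWN_FLAGS_V1 gets -EINVAL, so it can detect the missing
feature and fall back, while a later kernel accepting
RAP_KNOWN_FLAGS_V2 would let the same call through. Old binaries that
never set the new bit are unaffected either way.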