From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932173AbcEBTLv (ORCPT ); Mon, 2 May 2016 15:11:51 -0400 Received: from mx1.redhat.com ([209.132.183.28]:42057 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755063AbcEBTLn (ORCPT ); Mon, 2 May 2016 15:11:43 -0400 Date: Mon, 2 May 2016 21:11:41 +0200 From: Andrea Arcangeli To: Jerome Glisse Cc: "Kirill A. Shutemov" , Oleg Nesterov , Hugh Dickins , Linus Torvalds , Andrew Morton , Alex Williamson , kirill.shutemov@linux.intel.com, linux-kernel@vger.kernel.org, "linux-mm@kvack.org" Subject: Re: GUP guarantees wrt to userspace mappings Message-ID: <20160502191141.GE12310@redhat.com> References: <20160429005106.GB2847@node.shutemov.name> <20160428204542.5f2053f7@ul30vt.home> <20160429070611.GA4990@node.shutemov.name> <20160429163444.GM11700@redhat.com> <20160502104119.GA23305@node.shutemov.name> <20160502111513.GA4079@gmail.com> <20160502121402.GB23305@node.shutemov.name> <20160502133919.GB4079@gmail.com> <20160502150013.GA24419@node.shutemov.name> <20160502152249.GA5827@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160502152249.GA5827@gmail.com> User-Agent: Mutt/1.6.0 (2016-04-01) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Mon, 02 May 2016 19:11:43 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 02, 2016 at 05:22:49PM +0200, Jerome Glisse wrote: > I think this is still fine as it means that device will read only and thus > you can migrate to different page (ie the guest is not expecting to read back > anything writen by the device and device writting to the page would be illegal > and a proper IOMMU would forbid it). So it is like direct-io when you write > from anonymous memory to a file. Agreed. write=1 is so that if there's an O_DIRECT write() and the app is only reading, there will be no COW generated on shared anonymous memory/MAP_PRIVATE-filebacked. > Now that i think again about it, i don't think it exist. pmdp_collapse_flush() > will flush the tlb and thus send an IPI but get_user_pages_fast() can't be > preempted so the flush will have to wait for existing get_user_pages_fast() to > complete. Or am i missunderstanding flush ? So khugepaged is safe from GUP_fast > point of view like the comment, inside it, says. This is exactly correct, there's no race window. The IPI (or the quiescent point in case of the gup_fast RCU version) are the things that flush away get_user_pages_fast with pmdp_collapse_flush(). > Well you can't not rely on special vma here. Qemu alloc anonymous memory and > hand it over to guest, then a guest driver (ie runing in the guest not on the > host) try to map that memory and need valid DMA address for it, this is when > vfio (on the host kernel) starts pining memory of regular anonymous vma (on > the host). That same memory might back some special vma with ->mmap callback > but in the guest. Point is there is no driver on the host and no special vma. > From host point of view this is anonymous memory, but from guest POV it is > just memory. It's quite important it stays regular tmpfs/anon as device memory is managed by the device and we'd lose everything (KSM/swapping/NUMA balancing/compaction/memory-hotunplug/CMA etc..).