Linux-ARM-Kernel Archive on
 help / color / Atom feed
From: Sean Christopherson <>
To: Christoffer Dall <>
Cc:, Marc Zyngier <>,, Ard Biesheuvel <>,
	Paolo Bonzini <>,,
Subject: Re: Memory regions and VMAs across architectures
Date: Wed, 20 Nov 2019 07:28:07 -0800
Message-ID: <> (raw)
In-Reply-To: <>

On Wed, Nov 20, 2019 at 12:52:16PM +0100, Christoffer Dall wrote:
> On Tue, Nov 19, 2019 at 07:44:48PM -0800, Sean Christopherson wrote:
> > On Fri, Nov 08, 2019 at 12:19:20PM +0100, Christoffer Dall wrote:
> > > First, what prevents user space from messing around with the VMAs after
> > > kvm_arch_prepare_memory_region() completes?  If nothing, then what is
> > > the value of the cheks we perform wrt. to VMAs?
> > 
> > Arm's prepare_memory_region() holds mmap_sem and mmu_lock while processing
> > the VMAs and populating the stage 2 page tables.  Holding mmap_sem prevents
> > the VMAs from being invalidated while the stage 2 tables are populated,
> > e.g. prevents racing with the mmu notifier.  The VMAs could be modified
> > after prepare_memory_region(), but the mmu notifier will ensure they are
> > unmapped from stage2 prior the the host change taking effect.  So I think
> > you're safe (famous last words).
> > 
> So we for example check:
> 	writeable = !(memslot->falgs & KVM_MEM_READONLY);
> 	if (writeable && !(vma->vm_flags & VM_WRITE))
> 		return -EPERM;
> And yes, user space can then unmap the VMAs and MMU notifiers will
> unmap the stage 2 entries, but user space can then create a new
> read-only VMA covering the area of the memslot and the fault-handling
> path will have to deal with this same check later.Only, the fault
> handling path, via gfn_to_pfn_prot(), returns an address based on an
> entirely different set of mechanics, than our prepare_memory_region,
> which I think indicates we are doing something wrong somewhere, and we
> should have a common path for faulting things in, for I/O, both if we do
> this up-front or if we do this at fault time.

Unconditionally interpreting vm_pgoff as a physical address does not seem
correct.  There are cases where that might be correct, e.g. if the backing
(virtual) file is a flat representation of the address space, which appears
to be the case on some architectures, e.g. for PCI handling.  But even then
there should be some confirmation that the VMA is actually associated with
such a file, otherwise KVM is at the mercy of userspace to do the right
thing (unless there are other guarantees on arm I am unaware of).

> > > Second, why would arm/arm64 need special handling for I/O mappings
> > > compared to other architectures, and how is this dealt with for
> > > x86/s390/power/... ?
> > 
> > As Ard mentioned, it looks like an optimization.  The "passthrough"
> > part from the changelog implies that VM_PFNMAP memory regions are exclusive
> > to the guest.  Mapping the entire thing would be a nice boot optimization
> > as it would save taking page faults on every page of the MMIO region.
> > 
> > As for how this is different from other archs... at least on x86, VM_PFNMAP
> > isn't guaranteed to be passthrough or even MMIO, e.g. prefaulting the
> > pages may actually trigger allocation, and remapping the addresses could be
> > flat out wrong.
> What does VM_PFNMAP mean on x86?  I didn't think we were relying on
> anything architecture specific in their meaning in the arm code, and I
> thought the VM_PFNMAP was a generic mm flag with generic mm meaning,
> but I could be wrong here?

No, you're correct, VM_PFNMAP is a generic flag that state the VMA doesn't
have an associated struct page and is being managed directly by something
other than the core mmu.

But not having a struct page doesn't guarantee that the PFN is backed by
MMIO, or that it is exclusive to the guest (although in practice this is
probably the case 99.9999% of the time).  E.g. x86 supports having guest
memory backed by regular ram that is hidden from the host kernel via
'mem=', which will show up as VM_PFNMAP.

> Is there any valid semantics for creating a memslot backed by a
> VM_PFNMAP on x86, and if so, what are those?
> Similarly, if you do map a device region straight to the guest on x86,
> how is that handled?  (A pointer to the right place in the myriad of EPT
> and shadow code in x86 would be much appreciated.)

There is no special handling in x86 for VM_PFNMAP memory, we rely on KVM's
generic __gfn_to_pfn_memslot() to retrieve the PFN on demand, and use
mmu_notifier_seq to ensure the stale PFNs (invalidated in the host) aren't
inserted into the guest page tables.  Effectively the same thing arm does,
sans the prepare_memory_region() shenanigans.

linux-arm-kernel mailing list

  reply index

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-08 11:19 Christoffer Dall
2019-11-08 13:59 ` Ard Biesheuvel
2019-11-20  3:44 ` Sean Christopherson
2019-11-20 11:52   ` Christoffer Dall
2019-11-20 15:28     ` Sean Christopherson [this message]
2019-11-21  9:40       ` Christoffer Dall

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ARM-Kernel Archive on

Archives are clonable:
	git clone --mirror linux-arm-kernel/git/0.git
	git clone --mirror linux-arm-kernel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ \
	public-inbox-index linux-arm-kernel

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone