Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / Atom feed
From: Christoffer Dall <christoffer.dall@arm.com>
To: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: kvm@vger.kernel.org, Marc Zyngier <maz@kernel.org>,
	borntraeger@de.ibm.com, Ard Biesheuvel <ard.biesheuvel@arm.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: Memory regions and VMAs across architectures
Date: Thu, 21 Nov 2019 10:40:03 +0100
Message-ID: <20191121094003.GB22554@e113682-lin.lund.arm.com> (raw)
In-Reply-To: <20191120152807.GA32572@linux.intel.com>

On Wed, Nov 20, 2019 at 07:28:07AM -0800, Sean Christopherson wrote:
> On Wed, Nov 20, 2019 at 12:52:16PM +0100, Christoffer Dall wrote:
> > On Tue, Nov 19, 2019 at 07:44:48PM -0800, Sean Christopherson wrote:
> > > On Fri, Nov 08, 2019 at 12:19:20PM +0100, Christoffer Dall wrote:
> > > > First, what prevents user space from messing around with the VMAs after
> > > > kvm_arch_prepare_memory_region() completes?  If nothing, then what is
> > > > the value of the cheks we perform wrt. to VMAs?
> > > 
> > > Arm's prepare_memory_region() holds mmap_sem and mmu_lock while processing
> > > the VMAs and populating the stage 2 page tables.  Holding mmap_sem prevents
> > > the VMAs from being invalidated while the stage 2 tables are populated,
> > > e.g. prevents racing with the mmu notifier.  The VMAs could be modified
> > > after prepare_memory_region(), but the mmu notifier will ensure they are
> > > unmapped from stage2 prior the the host change taking effect.  So I think
> > > you're safe (famous last words).
> > > 
> > 
> > So we for example check:
> > 
> > 	writeable = !(memslot->falgs & KVM_MEM_READONLY);
> > 	if (writeable && !(vma->vm_flags & VM_WRITE))
> > 		return -EPERM;
> > 
> > And yes, user space can then unmap the VMAs and MMU notifiers will
> > unmap the stage 2 entries, but user space can then create a new
> > read-only VMA covering the area of the memslot and the fault-handling
> > path will have to deal with this same check later.Only, the fault
> > handling path, via gfn_to_pfn_prot(), returns an address based on an
> > entirely different set of mechanics, than our prepare_memory_region,
> > which I think indicates we are doing something wrong somewhere, and we
> > should have a common path for faulting things in, for I/O, both if we do
> > this up-front or if we do this at fault time.
> 
> Unconditionally interpreting vm_pgoff as a physical address does not seem
> correct.  There are cases where that might be correct, e.g. if the backing
> (virtual) file is a flat representation of the address space, which appears
> to be the case on some architectures, e.g. for PCI handling.  But even then
> there should be some confirmation that the VMA is actually associated with
> such a file, otherwise KVM is at the mercy of userspace to do the right
> thing (unless there are other guarantees on arm I am unaware of).
> 
> > > > Second, why would arm/arm64 need special handling for I/O mappings
> > > > compared to other architectures, and how is this dealt with for
> > > > x86/s390/power/... ?
> > > 
> > > As Ard mentioned, it looks like an optimization.  The "passthrough"
> > > part from the changelog implies that VM_PFNMAP memory regions are exclusive
> > > to the guest.  Mapping the entire thing would be a nice boot optimization
> > > as it would save taking page faults on every page of the MMIO region.
> > > 
> > > As for how this is different from other archs... at least on x86, VM_PFNMAP
> > > isn't guaranteed to be passthrough or even MMIO, e.g. prefaulting the
> > > pages may actually trigger allocation, and remapping the addresses could be
> > > flat out wrong.
> > 
> > What does VM_PFNMAP mean on x86?  I didn't think we were relying on
> > anything architecture specific in their meaning in the arm code, and I
> > thought the VM_PFNMAP was a generic mm flag with generic mm meaning,
> > but I could be wrong here?
> 
> No, you're correct, VM_PFNMAP is a generic flag that state the VMA doesn't
> have an associated struct page and is being managed directly by something
> other than the core mmu.
> 
> But not having a struct page doesn't guarantee that the PFN is backed by
> MMIO, or that it is exclusive to the guest (although in practice this is
> probably the case 99.9999% of the time).  E.g. x86 supports having guest
> memory backed by regular ram that is hidden from the host kernel via
> 'mem=', which will show up as VM_PFNMAP.
> 
> > Is there any valid semantics for creating a memslot backed by a
> > VM_PFNMAP on x86, and if so, what are those?
> > 
> > Similarly, if you do map a device region straight to the guest on x86,
> > how is that handled?  (A pointer to the right place in the myriad of EPT
> > and shadow code in x86 would be much appreciated.)
> 
> There is no special handling in x86 for VM_PFNMAP memory, we rely on KVM's
> generic __gfn_to_pfn_memslot() to retrieve the PFN on demand, and use
> mmu_notifier_seq to ensure the stale PFNs (invalidated in the host) aren't
> inserted into the guest page tables.  Effectively the same thing arm does,
> sans the prepare_memory_region() shenanigans.

Thanks Sean, I'll have a look at reworking said shenanigans ;)

    Christoffer

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

      reply index

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-08 11:19 Christoffer Dall
2019-11-08 13:59 ` Ard Biesheuvel
2019-11-20  3:44 ` Sean Christopherson
2019-11-20 11:52   ` Christoffer Dall
2019-11-20 15:28     ` Sean Christopherson
2019-11-21  9:40       ` Christoffer Dall [this message]

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191121094003.GB22554@e113682-lin.lund.arm.com \
    --to=christoffer.dall@arm.com \
    --cc=ard.biesheuvel@arm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=maz@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=sean.j.christopherson@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ARM-Kernel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/0 linux-arm-kernel/git/0.git
	git clone --mirror https://lore.kernel.org/linux-arm-kernel/1 linux-arm-kernel/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arm-kernel linux-arm-kernel/ https://lore.kernel.org/linux-arm-kernel \
		linux-arm-kernel@lists.infradead.org
	public-inbox-index linux-arm-kernel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-arm-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git