kvmarm.lists.cs.columbia.edu archive mirror
From: Catalin Marinas <catalin.marinas@arm.com>
To: Steven Price <steven.price@arm.com>
Cc: Marc Zyngier <maz@kernel.org>,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [RFC PATCH 0/2] MTE support for KVM guest
Date: Wed, 24 Jun 2020 15:21:31 +0100
Message-ID: <20200624142131.GA27945@gaia>
In-Reply-To: <e04696b6-63de-1e25-f6f3-1da63f791754@arm.com>

On Wed, Jun 24, 2020 at 12:16:28PM +0100, Steven Price wrote:
> On 23/06/2020 18:48, Catalin Marinas wrote:
> > On Wed, Jun 17, 2020 at 01:38:42PM +0100, Steven Price wrote:
> > > These patches add support to KVM to enable MTE within a guest. It is
> > > based on Catalin's v4 MTE user space series[1].
> > > 
> > > [1] http://lkml.kernel.org/r/20200515171612.1020-1-catalin.marinas%40arm.com
> > > 
> > > Posting as an RFC as I'd like feedback on the approach taken. First a
> > > little background on how MTE fits within the architecture:
> > > 
> > > The stage 2 page tables have limited scope for controlling the
> > > availability of MTE. If a page is mapped as Normal and cached in stage 2
> > > then it's the stage 1 tables that get to choose whether the memory is
> > > tagged or not. So the only way of forbidding tags on a page from the
> > > hypervisor is to change the cacheability (or make it device memory)
> > > which would cause other problems.  Note this restriction fits the
> > > intention that a system should have all (general purpose) memory
> > > supporting tags if it supports MTE, so it's not too surprising.
> > > 
> > > However, the upshot of this is that to enable MTE within a guest all
> > > pages of memory mapped into the guest as normal cached pages in stage 2
> > > *must* support MTE (i.e. we must ensure the tags are appropriately
> > > sanitised and save/restore the tags during swap etc).
> > > 
> > > My current approach is that KVM transparently upgrades any pages
> > > provided by the VMM to be tag-enabled when they are faulted in (i.e.
> > > sets the PG_mte_tagged flag on the page) which has the benefit of
> > > requiring fewer changes in the VMM. However, save/restore of the VM
> > > state still requires the VMM to have a PROT_MTE enabled mapping so that
> > > it can access the tag values. A VMM which 'forgets' to enable PROT_MTE
> > > would lose the tag values when saving/restoring (tags are RAZ/WI when
> > > PROT_MTE isn't set).
> > > 
> > > An alternative approach would be to enforce the VMM provides PROT_MTE
> > > memory in the first place. This seems appealing to prevent the above
> > > potentially unexpected gotchas with save/restore, however this would
> > > also extend to memory that you might not expect to have PROT_MTE (e.g. a
> > > shared frame buffer for an emulated graphics card).
> > 
> > As you mentioned above, if memory is mapped as Normal Cacheable at Stage
> > 2 (whether we use FWB or not), the guest is allowed to turn MTE on via
> > Stage 1. There is no way for KVM to prevent a guest from using MTE other
> > than the big HCR_EL2.ATA knob.
> > 
> > This causes potential issues since we can't guarantee that all the
> > Cacheable memory slots allocated by the VMM support MTE. If they do not,
> > the arch behaviour is "unpredictable". We also can't trust the guest to
> > not enable MTE on such Cacheable mappings.
> 
> Architecturally it seems dodgy to export any address that isn't "normal
> memory" (i.e. with tag storage) to the guest as Normal Cacheable. Although
> I'm a bit worried this might cause a regression in some existing case.

What I had in mind is some persistent memory that may be given to the
guest for direct access. This is allowed to be cacheable (write-back)
but may not have tag storage.

> > On the host kernel, mmap'ing with PROT_MTE is only allowed for anonymous
> > mappings and shmem. So requiring the VMM to always pass PROT_MTE mapped
> > ranges to KVM, irrespective of whether it's guest RAM, emulated device,
> > virtio etc. (as long as they are Cacheable), filters unsafe ranges that
> > may be mapped into guest.
> 
> That would be an easy way of doing the filtering, but it's not clear whether
> PROT_MTE is actually what the VMM wants (it most likely doesn't want to have
> tag checking enabled on the memory in user space).

From the other sub-thread, yeah, we probably don't want to mandate
PROT_MTE because of potential inadvertent tag check faults in the VMM
itself.

> > Note that in the next revision of the MTE patches I'll drop the DT
> > memory nodes checking and rely only on the CPUID information (arch
> > update promised by the architects).
> > 
> > I see two possible ways to handle this (there may be more):
> > 
> > 1. As in your current patches, assume any Cacheable at Stage 2 can have
> >     MTE enabled at Stage 1. In addition, we need to check whether the
> >     physical memory supports MTE and it could be something simple like
> >     pfn_valid(). Is there a way to reject a memory slot passed by the
> >     VMM?
> 
> Yes pfn_valid() should have been in there. At the moment pfn_to_page() is
> called without any checks.
> 
> The problem with attempting to reject a memory slot is that the memory
> backing that slot can change. So checking at the time the slot is created
> isn't enough (although it might be a useful error checking feature).

But isn't the slot changed as a result of another VMM call? So we could
always have such a check in place.
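
To make the idea concrete, here is a minimal sketch of a check that
could run on every slot creation or update. It is a userspace model,
not kernel code: struct model_slot, model_check_slot() and the per-page
page_has_tags array are hypothetical names that stand in for whatever
KVM would actually use to decide whether a pfn has tag storage.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory slot handed to KVM by the VMM. */
struct model_slot {
	const bool *page_has_tags; /* one entry per page: tag storage present? */
	size_t npages;
	bool cacheable;            /* would be Normal Cacheable at Stage 2 */
};

/*
 * Return 0 if the slot is acceptable, -EINVAL if it must be rejected.
 * Only Cacheable slots of a VM that opted in to MTE are checked, since
 * those are the only ones where the guest can enable tags via Stage 1.
 */
int model_check_slot(const struct model_slot *slot, bool vm_mte_enabled)
{
	assert(slot != NULL);

	if (!vm_mte_enabled || !slot->cacheable)
		return 0;

	for (size_t i = 0; i < slot->npages; i++) {
		if (!slot->page_has_tags[i])
			return -EINVAL; /* e.g. persistent memory without tags */
	}
	return 0;
}
```

The point of returning an error here is that the VMM learns about the
problem from the call that set up the slot, rather than from the guest
being killed later.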

> It's not clear to me what we can do at fault time when we discover the
> memory isn't tag-capable and would have been mapped cacheable other than
> kill the VM.

Indeed, I don't have a better idea other than trying not to get in this
situation.

> > 2. Similar to 1 but instead of checking whether the pfn supports MTE, we
> >     require the VMM to only pass PROT_MTE ranges (filtering already done
> >     by the host kernel). We need a way to reject the slot and return an
> >     error to the VMM.
> > 
> > I think rejecting a slot at the Stage 2 fault time is very late. You
> > probably won't be able to do much other than killing the guest.
> 
> As above, we will struggle to catch all cases during slot creation, so I
> think we're going to have to deal with this late detection as well.

We can leave it in place as a safety check, killing the VM. My hope is
that we can detect any changes made after slot creation.
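
A rough sketch of what that fault-time safety check might look like,
combined with the transparent upgrade from the cover letter. Again this
is a userspace model with hypothetical names (struct model_page,
model_stage2_fault()); the mte_tagged field stands in for
PG_mte_tagged, and zeroing the tags is only an assumption about what
"sanitised" would mean here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_TAGS_PER_PAGE 256 /* 4 KiB page / 16-byte tag granule */

enum model_fault_result { MODEL_MAP_PAGE, MODEL_KILL_VM };

/* Hypothetical model of a host page backing guest memory. */
struct model_page {
	bool has_tag_storage;
	bool mte_tagged; /* stands in for PG_mte_tagged */
	unsigned char tags[MODEL_TAGS_PER_PAGE]; /* one byte per granule */
};

/* Decide what to do when the guest faults on a page at Stage 2. */
enum model_fault_result model_stage2_fault(struct model_page *page,
					   bool vm_mte_enabled, bool cacheable)
{
	assert(page != NULL);

	/* The guest cannot use tags here, so nothing to check. */
	if (!vm_mte_enabled || !cacheable)
		return MODEL_MAP_PAGE;

	/* Too late to return an error to the VMM: last-resort check. */
	if (!page->has_tag_storage)
		return MODEL_KILL_VM;

	/* Transparent upgrade: sanitise the tags once, then mark the page. */
	if (!page->mte_tagged) {
		memset(page->tags, 0, sizeof(page->tags));
		page->mte_tagged = true;
	}
	return MODEL_MAP_PAGE;
}
```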

> > Both 1 and 2 above risk breaking existing VMMs just because they happen
> > to start on an MTE-capable machine. So, can we also require the VMM to
> > explicitly opt in to MTE support in guests via some ioctl()? This in
> > turn would enable the additional checks in KVM for the MTE capability of
> > the memory slots (1 or 2 above).
> 
> Patch 2 introduces a VCPU feature which must be explicitly enabled for the
> guest to have MTE. So it won't break existing VMMs. However clearly simply
> setting that bit will likely break some configurations where not all memory
> is MTE capable.

Ah, I missed that. At least we won't break current, unaware VMMs. I
suspect the CPUID is also conditioned by this explicit enabling.

> > An alternative to an MTE enable ioctl(), if all the memory slots are set
> > up prior to the VM starting, KVM could check 1 or 2 above and decide
> > whether to expose MTE to guests (HCR_EL2.ATA).
> 
> The VMM needs to be fully aware of MTE before it's enabled by KVM otherwise
> it could lose the tags (e.g. during migration). So my preference is to make
> it an explicit opt-in.

I agree.
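
As an illustration of the opt-in, and of conditioning the CPUID on it,
here is a small userspace model. The names (struct model_vm,
model_vm_enable_mte(), model_guest_pfr1()) are hypothetical and this is
not the interface from patch 2. The only architectural fact it relies
on is that the MTE field of ID_AA64PFR1_EL1 is bits [11:8].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PFR1_MTE_SHIFT 8
#define MODEL_PFR1_MTE_MASK  (UINT64_C(0xf) << MODEL_PFR1_MTE_SHIFT)

/* Hypothetical per-VM state: MTE stays off unless the VMM asks for it. */
struct model_vm {
	bool mte_opted_in;
};

void model_vm_init(struct model_vm *vm)
{
	assert(vm != NULL);
	vm->mte_opted_in = false; /* unaware VMMs keep today's behaviour */
}

void model_vm_enable_mte(struct model_vm *vm)
{
	assert(vm != NULL);
	vm->mte_opted_in = true;
}

/* Value the guest would read for ID_AA64PFR1_EL1. */
uint64_t model_guest_pfr1(const struct model_vm *vm, uint64_t host_pfr1)
{
	assert(vm != NULL);

	/* Hide the MTE field unless the VMM explicitly opted in. */
	if (!vm->mte_opted_in)
		return host_pfr1 & ~MODEL_PFR1_MTE_MASK;
	return host_pfr1;
}
```

With a default of "off", a VMM that knows nothing about MTE never sees
the feature advertised to its guests, so it cannot lose tags during
save/restore or migration.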

-- 
Catalin