From: Marc Zyngier <maz@kernel.org>
To: Fuad Tabba <tabba@google.com>, Steven Price <steven.price@arm.com>
Cc: Steven Price <steven.price@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	qemu-devel@nongnu.org, Dave Martin <Dave.Martin@arm.com>,
	Juan Quintela <quintela@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v17 2/6] KVM: arm64: Introduce MTE VM feature
Date: Tue, 22 Jun 2021 12:29:27 +0100	[thread overview]
Message-ID: <8735tacf3c.wl-maz@kernel.org> (raw)
In-Reply-To: <CA+EHjTxgR3LraZ1gyXjwc5YoE5dVOtCfhjELYFH35KzJSuo6EQ@mail.gmail.com>

On Mon, 21 Jun 2021 18:00:20 +0100,
Fuad Tabba <tabba@google.com> wrote:
> 
> Hi,
> 
> On Mon, Jun 21, 2021 at 12:18 PM Steven Price <steven.price@arm.com> wrote:
> >
> > Add a new VM feature 'KVM_ARM_CAP_MTE' which enables memory tagging
> > for a VM. This will expose the feature to the guest and automatically
> > tag memory pages touched by the VM as PG_mte_tagged (and clear the tag
> > storage) to ensure that the guest cannot see stale tags, and so that
> > the tags are correctly saved/restored across swap.
> >
> > Actually exposing the new capability to user space happens in a later
> > patch.
> >
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Steven Price <steven.price@arm.com>
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h |  3 ++
> >  arch/arm64/include/asm/kvm_host.h    |  3 ++
> >  arch/arm64/kvm/hyp/exception.c       |  3 +-
> >  arch/arm64/kvm/mmu.c                 | 64 +++++++++++++++++++++++++++-
> >  arch/arm64/kvm/sys_regs.c            |  7 +++
> >  include/uapi/linux/kvm.h             |  1 +
> >  6 files changed, 79 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index 01b9857757f2..fd418955e31e 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -84,6 +84,9 @@ static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
> >         if (cpus_have_const_cap(ARM64_MISMATCHED_CACHE_TYPE) ||
> >             vcpu_el1_is_32bit(vcpu))
> >                 vcpu->arch.hcr_el2 |= HCR_TID2;
> > +
> > +       if (kvm_has_mte(vcpu->kvm))
> > +               vcpu->arch.hcr_el2 |= HCR_ATA;
> >  }
> >
> >  static inline unsigned long *vcpu_hcr(struct kvm_vcpu *vcpu)
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7cd7d5c8c4bc..afaa5333f0e4 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -132,6 +132,8 @@ struct kvm_arch {
> >
> >         u8 pfr0_csv2;
> >         u8 pfr0_csv3;
> > +       /* Memory Tagging Extension enabled for the guest */
> > +       bool mte_enabled;
> >  };
> 
> nit: newline before the comment/new member
> 
> >
> >  struct kvm_vcpu_fault_info {
> > @@ -769,6 +771,7 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
> >  #define kvm_arm_vcpu_sve_finalized(vcpu) \
> >         ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED)
> >
> > +#define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled)
> >  #define kvm_vcpu_has_pmu(vcpu)                                 \
> >         (test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features))
> >
> > diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
> > index 11541b94b328..0418399e0a20 100644
> > --- a/arch/arm64/kvm/hyp/exception.c
> > +++ b/arch/arm64/kvm/hyp/exception.c
> > @@ -112,7 +112,8 @@ static void enter_exception64(struct kvm_vcpu *vcpu, unsigned long target_mode,
> >         new |= (old & PSR_C_BIT);
> >         new |= (old & PSR_V_BIT);
> >
> > -       // TODO: TCO (if/when ARMv8.5-MemTag is exposed to guests)
> > +       if (kvm_has_mte(vcpu->kvm))
> > +               new |= PSR_TCO_BIT;
> >
> >         new |= (old & PSR_DIT_BIT);
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c10207fed2f3..52326b739357 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -822,6 +822,45 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >         return PAGE_SIZE;
> >  }
> >
> > +/*
> > + * The page will be mapped in stage 2 as Normal Cacheable, so the VM will be
> > + * able to see the page's tags and therefore they must be initialised first. If
> > + * PG_mte_tagged is set, tags have already been initialised.
> > + *
> > + * The race in the test/set of the PG_mte_tagged flag is handled by:
> > + * - preventing VM_SHARED mappings in a memslot with MTE preventing two VMs
> > + *   racing to sanitise the same page
> > + * - mmap_lock protects between a VM faulting a page in and the VMM performing
> > + *   an mprotect() to add VM_MTE
> > + */
> > +static int sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
> > +                            unsigned long size)
> > +{
> > +       unsigned long i, nr_pages = size >> PAGE_SHIFT;
> > +       struct page *page;
> > +
> > +       if (!kvm_has_mte(kvm))
> > +               return 0;
> > +
> > +       /*
> > +        * pfn_to_online_page() is used to reject ZONE_DEVICE pages
> > +        * that may not support tags.
> > +        */
> > +       page = pfn_to_online_page(pfn);
> > +
> > +       if (!page)
> > +               return -EFAULT;
> > +
> > +       for (i = 0; i < nr_pages; i++, page++) {
> > +               if (!test_bit(PG_mte_tagged, &page->flags)) {
> > +                       mte_clear_page_tags(page_address(page));
> > +                       set_bit(PG_mte_tagged, &page->flags);
> > +               }
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >                           struct kvm_memory_slot *memslot, unsigned long hva,
> >                           unsigned long fault_status)
> > @@ -971,8 +1010,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >         if (writable)
> >                 prot |= KVM_PGTABLE_PROT_W;
> >
> > -       if (fault_status != FSC_PERM && !device)
> > +       if (fault_status != FSC_PERM && !device) {
> > +               /* Check the VMM hasn't introduced a new VM_SHARED VMA */
> > +               if (kvm_has_mte(kvm) && vma->vm_flags & VM_SHARED) {
> > +                       ret = -EFAULT;
> > +                       goto out_unlock;
> > +               }
> > +               ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
> > +               if (ret)
> > +                       goto out_unlock;
> > +
> 
> nit: Would it make sense to bring in sanitise_mte_tags under the
> kvm_has_mte. I know that a check is done in kvm_has_mte as well, but
> since you're already checking, it might make the code a bit clearer.

I think it makes more sense once merged with -next, as the CMO has
been moved into the PT code. I came up with the following resolution:

	if (vma_pagesize == PAGE_SIZE && !(force_pte || device))
		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
							   &pfn, &fault_ipa);

	if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
		/* Check the VMM hasn't introduced a new VM_SHARED VMA */
		if (!(vma->vm_flags & VM_SHARED))
			ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
		else
			ret = -EFAULT;
		if (ret)
			goto out_unlock;
	}

	if (writable)
		prot |= KVM_PGTABLE_PROT_W;


However, there is a more annoying issue here, which is that the vma is
accessed outside of the mmap_lock. I *think* we're safe because, if an
unmap happens in parallel, the MMU notifier will kick in and we will be
in one of two cases:

- the unmap occurs before we take the kvm->mmu_lock, and the mmu
  notifier seq_lock is what saves us (we will drop everything and take
  the fault again -- see the sketch below),

- it occurs once we hold the lock, and this blocks the unmap.
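
For reference, what saves us in the first case is the usual mmu_notifier
sequence dance in user_mem_abort(). Roughly this -- quoted from memory
and heavily trimmed, so don't take the exact calls as gospel:

	mmu_seq = vcpu->kvm->mmu_notifier_seq;
	/* Pairs with the smp_wmb() in the invalidate_range_end() path */
	smp_rmb();

	pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
				   write_fault, &writable, NULL);

	/* ... */

	spin_lock(&kvm->mmu_lock);
	if (mmu_notifier_retry(kvm, mmu_seq))
		/* A parallel unmap raced with us: replay the fault */
		goto out_unlock;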

Either way, I'd be more confident if the shared state was sampled
inside the locked section.
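
Something like this, maybe (completely untested, and "vma_shared" is
just a made-up local to illustrate the idea): snapshot the flag while
we still hold the mmap_lock during the VMA walk, and only use the local
copy once the lock has been dropped:

	mmap_read_lock(current->mm);
	vma = find_vma_intersection(current->mm, hva, hva + 1);
	/* ... existing VMA sanity checks ... */

	/* Sample VM_SHARED while the VMA is known to be stable */
	vma_shared = !!(vma->vm_flags & VM_SHARED);

	mmap_read_unlock(current->mm);

	/* ... later, once kvm->mmu_lock is held ... */
	if (fault_status != FSC_PERM && !device && kvm_has_mte(kvm)) {
		/* Check the VMM hasn't introduced a new VM_SHARED VMA */
		if (!vma_shared)
			ret = sanitise_mte_tags(kvm, pfn, vma_pagesize);
		else
			ret = -EFAULT;
		if (ret)
			goto out_unlock;
	}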

Thoughts?

	M.

-- 
Without deviation from the norm, progress is not possible.

Thread overview: 92+ messages
2021-06-21 11:17 [PATCH v17 0/6] MTE support for KVM guest Steven Price
2021-06-21 11:17 ` [PATCH v17 1/6] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
2021-06-21 11:17 ` [PATCH v17 2/6] KVM: arm64: Introduce MTE VM feature Steven Price
2021-06-21 17:00   ` Fuad Tabba
2021-06-22 11:29     ` Marc Zyngier [this message]
2021-06-21 11:17 ` [PATCH v17 3/6] KVM: arm64: Save/restore MTE registers Steven Price
2021-06-22  9:46   ` Fuad Tabba
2021-06-21 11:17 ` [PATCH v17 4/6] KVM: arm64: Expose KVM_ARM_CAP_MTE Steven Price
2021-06-22  8:07   ` Fuad Tabba
2021-06-22  8:48     ` Marc Zyngier
2021-06-21 11:17 ` [PATCH v17 5/6] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
2021-06-22  8:56   ` Fuad Tabba
2021-06-22 10:25     ` Marc Zyngier
2021-06-22 10:56       ` Fuad Tabba
2021-06-23 14:07         ` Steven Price
2021-06-24 13:35   ` Marc Zyngier
2021-06-24 13:42     ` Steven Price
2021-06-21 11:17 ` [PATCH v17 6/6] KVM: arm64: Document MTE capability and ioctl Steven Price
2021-06-22  9:42   ` Fuad Tabba
2021-06-22 10:35     ` Marc Zyngier
2021-06-22 10:41       ` Fuad Tabba
2021-06-22 14:21 ` [PATCH v17 0/6] MTE support for KVM guest Marc Zyngier
2021-06-23 14:09   ` Steven Price
