From: Catalin Marinas <catalin.marinas@arm.com>
To: Steven Price <steven.price@arm.com>
Cc: Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Peter Maydell <peter.maydell@linaro.org>,
	Haibo Xu <Haibo.Xu@arm.com>, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH v12 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
Date: Mon, 24 May 2021 19:11:29 +0100	[thread overview]
Message-ID: <20210524181129.GI14645@arm.com> (raw)
In-Reply-To: <5eec330f-63c0-2af8-70f8-ba9b643e2558@arm.com>

On Fri, May 21, 2021 at 10:42:09AM +0100, Steven Price wrote:
> On 20/05/2021 18:27, Catalin Marinas wrote:
> > On Thu, May 20, 2021 at 04:58:01PM +0100, Steven Price wrote:
> >> On 20/05/2021 13:05, Catalin Marinas wrote:
> >>> On Mon, May 17, 2021 at 01:32:38PM +0100, Steven Price wrote:
> >>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> >>>> index e89a5e275e25..4b6c83beb75d 100644
> >>>> --- a/arch/arm64/kvm/arm.c
> >>>> +++ b/arch/arm64/kvm/arm.c
> >>>> @@ -1309,6 +1309,65 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
> >>>>  	}
> >>>>  }
> >>>>  
> >>>> +static int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> >>>> +				      struct kvm_arm_copy_mte_tags *copy_tags)
> >>>> +{
> >>>> +	gpa_t guest_ipa = copy_tags->guest_ipa;
> >>>> +	size_t length = copy_tags->length;
> >>>> +	void __user *tags = copy_tags->addr;
> >>>> +	gpa_t gfn;
> >>>> +	bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
> >>>> +	int ret = 0;
> >>>> +
> >>>> +	if (copy_tags->reserved[0] || copy_tags->reserved[1])
> >>>> +		return -EINVAL;
> >>>> +
> >>>> +	if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
> >>>> +		return -EINVAL;
> >>>> +
> >>>> +	if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
> >>>> +		return -EINVAL;
> >>>> +
> >>>> +	gfn = gpa_to_gfn(guest_ipa);
> >>>> +
> >>>> +	mutex_lock(&kvm->slots_lock);
> >>>> +
> >>>> +	while (length > 0) {
> >>>> +		kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
> >>>> +		void *maddr;
> >>>> +		unsigned long num_tags = PAGE_SIZE / MTE_GRANULE_SIZE;
> >>>> +
> >>>> +		if (is_error_noslot_pfn(pfn)) {
> >>>> +			ret = -EFAULT;
> >>>> +			goto out;
> >>>> +		}
> >>>> +
> >>>> +		maddr = page_address(pfn_to_page(pfn));
> >>>> +
> >>>> +		if (!write) {
> >>>> +			num_tags = mte_copy_tags_to_user(tags, maddr, num_tags);
> >>>> +			kvm_release_pfn_clean(pfn);
> >>>
> >>> Do we need to check if PG_mte_tagged is set? If the page was not faulted
> >>> into the guest address space but the VMM has the page, does the
> >>> gfn_to_pfn_prot() guarantee that a kvm_set_spte_gfn() was called? If
> >>> not, this may read stale tags.
> >>
> >> Ah, I hadn't thought about that... No I don't believe gfn_to_pfn_prot()
> >> will fault it into the guest.
> > 
> > It doesn't indeed. What it does is a get_user_pages() but it's not of
> > much help since the VMM pte wouldn't be tagged (we would have solved
> > lots of problems if we required PROT_MTE in the VMM...)
> 
> Sadly it solves some problems and creates others :(

I had some (random) thoughts on how to make things simpler, maybe. I
think most of these races would have been solved if we required PROT_MTE
in the VMM but this has an impact on the VMM if it wants to use MTE
itself. If such a requirement were in place, all KVM would need to do is
check PG_mte_tagged.

So what we actually need is a set_pte_at() in the VMM to clear the tags
and set PG_mte_tagged. Currently, we only do this if the memory type is
tagged (PROT_MTE) but it's not strictly necessary.

As an optimisation for normal programs, we don't want to do this all the
time but the visible behaviour wouldn't change (well, maybe for ptrace
slightly). However, it doesn't mean we couldn't for a VMM, with an
opt-in via prctl(). This would add an MMCF_MTE_TAG_INIT bit (couldn't
think of a better name) to mm_context_t.flags and set_pte_at() would
behave as if the pte was tagged without actually mapping the memory in
user space as tagged (protection flags not changed). Pages that don't
support tagging are still safe, just some unnecessary ignored tag
writes. This would need to be set before the mmap() for the guest
memory.

If we want finer-grained control we'd have to store this information in
the vma flags, in addition to VM_MTE (e.g. VM_MTE_TAG_INIT) but without
affecting the actual memory type. The easiest would be another pte bit,
though we are short on them. A more intrusive (not too bad) approach is
to introduce a set_pte_at_vma() and read the flags directly in the arch
code. In most places where set_pte_at() is called on a user mm, the vma
is also available.

Anyway, I'm not saying we go this route, just thinking out loud to get
some opinions.

> > Another thing I forgot to ask, what's guaranteeing that the page
> > supports tags? Does this ioctl ensure that it would attempt the tag
> > copying from some device mapping? Do we need some kvm_is_device_pfn()
> > check? I guess ZONE_DEVICE memory we just refuse to map in an earlier
> > patch.
> 
> Hmm, nothing much. While reads are now fine (the memory won't have
> PG_mte_tagged), writes could potentially happen on ZONE_DEVICE memory.

I don't think it's a problem for writes either as the host wouldn't map
such memory as tagged. It's just that it returns zeros and writes are
ignored, so we could instead return an error (I haven't checked your
latest series yet).

> >> 		} else {
> >> 			num_tags = mte_copy_tags_from_user(maddr, tags,
> >> 							MTE_GRANULES_PER_PAGE);
> >> 			kvm_release_pfn_dirty(pfn);
> >> 		}
> >>
> >> 		if (num_tags != MTE_GRANULES_PER_PAGE) {
> >> 			ret = -EFAULT;
> >> 			goto out;
> >> 		}
> >>
> >> 		if (write)
> >> 			test_and_set_bit(PG_mte_tagged, &page->flags);
> > 
> > I think a set_bit() would do, I doubt it's any more efficient. But why
> 
> I'd seen test_and_set_bit() used elsewhere (I forget where now) as a
> slightly more efficient approach. It compiles down to a READ_ONCE and a
> conditional atomic, vs a single unconditional atomic. But I don't have
> any actual data on the performance and this isn't a hot path, so I'll
> switch to the more obvious set_bit().

Yeah, I think I've seen this as well. Anyway, it's probably lost in the
noise of tag writing here.

-- 
Catalin

Thread overview: 196+ messages
2021-05-17 12:32 [PATCH v12 0/8] MTE support for KVM guest Steven Price
2021-05-17 12:32 ` [PATCH v12 1/8] arm64: mte: Handle race when synchronising tags Steven Price
2021-05-17 14:03   ` Marc Zyngier
2021-05-17 14:56     ` Steven Price
2021-05-19 17:32   ` Catalin Marinas
2021-05-17 12:32 ` [PATCH v12 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage() Steven Price
2021-05-17 12:32 ` [PATCH v12 3/8] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
2021-05-17 16:14   ` Marc Zyngier
2021-05-19  9:32     ` Steven Price
2021-05-19 17:48       ` Catalin Marinas
2021-05-19 18:06   ` Catalin Marinas
2021-05-20 11:55     ` Steven Price
2021-05-20 12:25       ` Catalin Marinas
2021-05-20 13:02         ` Catalin Marinas
2021-05-20 13:03         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 4/8] arm64: kvm: Introduce MTE VM feature Steven Price
2021-05-17 16:45   ` Marc Zyngier
2021-05-19 10:48     ` Steven Price
2021-05-20  8:51       ` Marc Zyngier
2021-05-20 14:46         ` Steven Price
2021-05-20 11:54   ` Catalin Marinas
2021-05-20 15:05     ` Steven Price
2021-05-20 17:50       ` Catalin Marinas
2021-05-21  9:28         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 5/8] arm64: kvm: Save/restore MTE registers Steven Price
2021-05-17 17:17   ` Marc Zyngier
2021-05-19 13:04     ` Steven Price
2021-05-20  9:46       ` Marc Zyngier
2021-05-20 15:21         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 6/8] arm64: kvm: Expose KVM_ARM_CAP_MTE Steven Price
2021-05-17 17:40   ` Marc Zyngier
2021-05-19 13:26     ` Steven Price
2021-05-20 10:09       ` Marc Zyngier
2021-05-20 10:51         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 7/8] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
2021-05-17 18:04   ` Marc Zyngier
2021-05-19 13:51     ` Steven Price
2021-05-20 12:05   ` Catalin Marinas
2021-05-20 15:58     ` Steven Price
2021-05-20 17:27       ` Catalin Marinas
2021-05-21  9:42         ` Steven Price
2021-05-24 18:11           ` Catalin Marinas [this message]
2021-05-24 18:11             ` Catalin Marinas
2021-05-24 18:11             ` Catalin Marinas
2021-05-24 18:11             ` Catalin Marinas
2021-05-27  7:50             ` Steven Price
2021-05-27  7:50               ` Steven Price
2021-05-27  7:50               ` Steven Price
2021-05-27  7:50               ` Steven Price
2021-05-27 13:08               ` Catalin Marinas
2021-05-27 13:08                 ` Catalin Marinas
2021-05-27 13:08                 ` Catalin Marinas
2021-05-27 13:08                 ` Catalin Marinas
2021-05-17 12:32 ` [PATCH v12 8/8] KVM: arm64: Document MTE capability and ioctl Steven Price
2021-05-17 12:32   ` Steven Price
2021-05-17 12:32   ` Steven Price
2021-05-17 12:32   ` Steven Price
2021-05-17 18:09   ` Marc Zyngier
2021-05-17 18:09     ` Marc Zyngier
2021-05-17 18:09     ` Marc Zyngier
2021-05-17 18:09     ` Marc Zyngier
2021-05-19 14:09     ` Steven Price
2021-05-19 14:09       ` Steven Price
2021-05-19 14:09       ` Steven Price
2021-05-19 14:09       ` Steven Price
2021-05-20 10:24       ` Marc Zyngier
2021-05-20 10:24         ` Marc Zyngier
2021-05-20 10:24         ` Marc Zyngier
2021-05-20 10:24         ` Marc Zyngier
2021-05-20 10:52         ` Steven Price
2021-05-20 10:52           ` Steven Price
2021-05-20 10:52           ` Steven Price
2021-05-20 10:52           ` Steven Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210524181129.GI14645@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=Dave.Martin@arm.com \
    --cc=Haibo.Xu@arm.com \
    --cc=dgilbert@redhat.com \
    --cc=drjones@redhat.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank
line before the message body.