linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Price <steven.price@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Peter Maydell <peter.maydell@linaro.org>,
	Haibo Xu <Haibo.Xu@arm.com>, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH v12 7/8] KVM: arm64: ioctl to fetch/store tags in a guest
Date: Thu, 27 May 2021 08:50:30 +0100	[thread overview]
Message-ID: <58345eca-6e5f-0faa-e47d-e9149d73f6c5@arm.com> (raw)
In-Reply-To: <20210524181129.GI14645@arm.com>

On 24/05/2021 19:11, Catalin Marinas wrote:
> On Fri, May 21, 2021 at 10:42:09AM +0100, Steven Price wrote:
>> On 20/05/2021 18:27, Catalin Marinas wrote:
>>> On Thu, May 20, 2021 at 04:58:01PM +0100, Steven Price wrote:
>>>> On 20/05/2021 13:05, Catalin Marinas wrote:
>>>>> On Mon, May 17, 2021 at 01:32:38PM +0100, Steven Price wrote:
>>>>>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>>>>>> index e89a5e275e25..4b6c83beb75d 100644
>>>>>> --- a/arch/arm64/kvm/arm.c
>>>>>> +++ b/arch/arm64/kvm/arm.c
>>>>>> @@ -1309,6 +1309,65 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
>>>>>>  	}
>>>>>>  }
>>>>>>  
>>>>>> +static int kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>>>>>> +				      struct kvm_arm_copy_mte_tags *copy_tags)
>>>>>> +{
>>>>>> +	gpa_t guest_ipa = copy_tags->guest_ipa;
>>>>>> +	size_t length = copy_tags->length;
>>>>>> +	void __user *tags = copy_tags->addr;
>>>>>> +	gpa_t gfn;
>>>>>> +	bool write = !(copy_tags->flags & KVM_ARM_TAGS_FROM_GUEST);
>>>>>> +	int ret = 0;
>>>>>> +
>>>>>> +	if (copy_tags->reserved[0] || copy_tags->reserved[1])
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	if (copy_tags->flags & ~KVM_ARM_TAGS_FROM_GUEST)
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	if (length & ~PAGE_MASK || guest_ipa & ~PAGE_MASK)
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>> +	gfn = gpa_to_gfn(guest_ipa);
>>>>>> +
>>>>>> +	mutex_lock(&kvm->slots_lock);
>>>>>> +
>>>>>> +	while (length > 0) {
>>>>>> +		kvm_pfn_t pfn = gfn_to_pfn_prot(kvm, gfn, write, NULL);
>>>>>> +		void *maddr;
>>>>>> +		unsigned long num_tags = PAGE_SIZE / MTE_GRANULE_SIZE;
>>>>>> +
>>>>>> +		if (is_error_noslot_pfn(pfn)) {
>>>>>> +			ret = -EFAULT;
>>>>>> +			goto out;
>>>>>> +		}
>>>>>> +
>>>>>> +		maddr = page_address(pfn_to_page(pfn));
>>>>>> +
>>>>>> +		if (!write) {
>>>>>> +			num_tags = mte_copy_tags_to_user(tags, maddr, num_tags);
>>>>>> +			kvm_release_pfn_clean(pfn);
>>>>>
>>>>> Do we need to check if PG_mte_tagged is set? If the page was not faulted
>>>>> into the guest address space but the VMM has the page, does the
>>>>> gfn_to_pfn_prot() guarantee that a kvm_set_spte_gfn() was called? If
>>>>> not, this may read stale tags.
>>>>
>>>> Ah, I hadn't thought about that... No I don't believe gfn_to_pfn_prot()
>>>> will fault it into the guest.
>>>
>>> It doesn't indeed. What it does is a get_user_pages() but it's not of
>>> much help since the VMM pte wouldn't be tagged (we would have solved
>>> lots of problems if we required PROT_MTE in the VMM...)
>>
>> Sadly it solves some problems and creates others :(
> 
> I had some (random) thoughts on how to make things simpler, maybe. I
> think most of these races would have been solved if we required PROT_MTE
> in the VMM but this has an impact on the VMM if it wants to use MTE
> itself. If such requirement was in place, all KVM needed to do is check
> PG_mte_tagged.
> 
> So what we actually need is a set_pte_at() in the VMM to clear the tags
> and set PG_mte_tagged. Currently, we only do this if the memory type is
> tagged (PROT_MTE) but it's not strictly necessary.
> 
> As an optimisation for normal programs, we don't want to do this all the
> time but the visible behaviour wouldn't change (well, maybe for ptrace
> slightly). However, it doesn't mean we couldn't for a VMM, with an
> opt-in via prctl(). This would add a MMCF_MTE_TAG_INIT bit (couldn't
> think of a better name) to mm_context_t.flags and set_pte_at() would
> behave as if the pte was tagged without actually mapping the memory in
> user space as tagged (protection flags not changed). Pages that don't
> support tagging are still safe, just some unnecessary ignored tag
> writes. This would need to be set before the mmap() for the guest
> memory.
> 
> If we want finer-grained control we'd have to store this information in
> the vma flags, in addition to VM_MTE (e.g. VM_MTE_TAG_INIT) but without
> affecting the actual memory type. The easiest would be another pte bit,
> though we are short on them. A more intrusive (not too bad) approach is
> to introduce a set_pte_at_vma() and read the flags directly in the arch
> code. In most places where set_pte_at() is called on a user mm, the vma
> is also available.
> 
> Anyway, I'm not saying we go this route, just thinking out loud, get
> some opinions.

Does get_user_pages() actually end up calling set_pte_at() normally? If
not then on the normal user_mem_abort() route although we can easily
check VM_MTE_TAG_INIT there's no obvious place to hook in to ensure that
the pages actually allocated have the PG_mte_tagged flag.

I'm also not sure how well this would work with the MMU notifiers path
in KVM. With MMU notifiers (i.e. the VMM replacing a page in the
memslot) there's not even an obvious hook to enforce the VMA flag. So I
think we'd end up with something like the sanitise_mte_tags() function
to at least check that the PG_mte_tagged flag is set on the pages
(assuming that the trigger for the MMU notifier has done the
corresponding set_pte_at()). Admittedly this might close the current
race documented there.

It also feels wrong to me to tie this to a process with prctl(), it
seems much more normal to implement this as a new mprotect() flag as
this is really a memory property not a process property. And I think
we'll find some scary corner cases if we try to associate everything
back to a process - although I can't instantly think of anything that
will actually break.

>>> Another thing I forgot to ask, what's guaranteeing that the page
>>> supports tags? Does this ioctl ensure that it would attempt the tag
>>> copying from some device mapping? Do we need some kvm_is_device_pfn()
>>> check? I guess ZONE_DEVICE memory we just refuse to map in an earlier
>>> patch.
>>
>> Hmm, nothing much. While reads are now fine (the memory won't have
>> PG_mte_tagged), writes could potentially happen on ZONE_DEVICE memory.
> 
> I don't think it's a problem for writes either as the host wouldn't map
> such memory as tagged. It's just that it returns zeros and writes are
> ignored, so we could instead return an error (I haven't checked your
> latest series yet).

The latest series uses pfn_to_online_page() to reject ZONE_DEVICE early.

>>>> 		} else {
>>>> 			num_tags = mte_copy_tags_from_user(maddr, tags,
>>>> 							MTE_GRANULES_PER_PAGE);
>>>> 			kvm_release_pfn_dirty(pfn);
>>>> 		}
>>>>
>>>> 		if (num_tags != MTE_GRANULES_PER_PAGE) {
>>>> 			ret = -EFAULT;
>>>> 			goto out;
>>>> 		}
>>>>
>>>> 		if (write)
>>>> 			test_and_set_bit(PG_mte_tagged, &page->flags);
>>>
>>> I think a set_bit() would do, I doubt it's any more efficient. But why
>>
>> I'd seen test_and_set_bit() used elsewhere (I forget where now) as a
>> slightly more efficient approach. It complies down to a READ_ONCE and a
>> conditional atomic, vs a single non-conditional atomic. But I don't have
>> any actual data on the performance and this isn't a hot path, so I'll
>> switch to the more obvious set_bit().
> 
> Yeah, I think I've seen this as well. Anyway, it's probably lost in the
> noise of tag writing here.
> 

Agreed.

Thanks,

Steve

  reply	other threads:[~2021-05-27  7:50 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-17 12:32 [PATCH v12 0/8] MTE support for KVM guest Steven Price
2021-05-17 12:32 ` [PATCH v12 1/8] arm64: mte: Handle race when synchronising tags Steven Price
2021-05-17 14:03   ` Marc Zyngier
2021-05-17 14:56     ` Steven Price
2021-05-19 17:32   ` Catalin Marinas
2021-05-17 12:32 ` [PATCH v12 2/8] arm64: Handle MTE tags zeroing in __alloc_zeroed_user_highpage() Steven Price
2021-05-17 12:32 ` [PATCH v12 3/8] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
2021-05-17 16:14   ` Marc Zyngier
2021-05-19  9:32     ` Steven Price
2021-05-19 17:48       ` Catalin Marinas
2021-05-19 18:06   ` Catalin Marinas
2021-05-20 11:55     ` Steven Price
2021-05-20 12:25       ` Catalin Marinas
2021-05-20 13:02         ` Catalin Marinas
2021-05-20 13:03         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 4/8] arm64: kvm: Introduce MTE VM feature Steven Price
2021-05-17 16:45   ` Marc Zyngier
2021-05-19 10:48     ` Steven Price
2021-05-20  8:51       ` Marc Zyngier
2021-05-20 14:46         ` Steven Price
2021-05-20 11:54   ` Catalin Marinas
2021-05-20 15:05     ` Steven Price
2021-05-20 17:50       ` Catalin Marinas
2021-05-21  9:28         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 5/8] arm64: kvm: Save/restore MTE registers Steven Price
2021-05-17 17:17   ` Marc Zyngier
2021-05-19 13:04     ` Steven Price
2021-05-20  9:46       ` Marc Zyngier
2021-05-20 15:21         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 6/8] arm64: kvm: Expose KVM_ARM_CAP_MTE Steven Price
2021-05-17 17:40   ` Marc Zyngier
2021-05-19 13:26     ` Steven Price
2021-05-20 10:09       ` Marc Zyngier
2021-05-20 10:51         ` Steven Price
2021-05-17 12:32 ` [PATCH v12 7/8] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
2021-05-17 18:04   ` Marc Zyngier
2021-05-19 13:51     ` Steven Price
2021-05-20 12:05   ` Catalin Marinas
2021-05-20 15:58     ` Steven Price
2021-05-20 17:27       ` Catalin Marinas
2021-05-21  9:42         ` Steven Price
2021-05-24 18:11           ` Catalin Marinas
2021-05-27  7:50             ` Steven Price [this message]
2021-05-27 13:08               ` Catalin Marinas
2021-05-17 12:32 ` [PATCH v12 8/8] KVM: arm64: Document MTE capability and ioctl Steven Price
2021-05-17 18:09   ` Marc Zyngier
2021-05-19 14:09     ` Steven Price
2021-05-20 10:24       ` Marc Zyngier
2021-05-20 10:52         ` Steven Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58345eca-6e5f-0faa-e47d-e9149d73f6c5@arm.com \
    --to=steven.price@arm.com \
    --cc=Dave.Martin@arm.com \
    --cc=Haibo.Xu@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=dgilbert@redhat.com \
    --cc=drjones@redhat.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).