All of lore.kernel.org
 help / color / mirror / Atom feed
From: Steven Price <steven.price@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Peter Maydell <peter.maydell@linaro.org>,
	Haibo Xu <Haibo.Xu@arm.com>, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature
Date: Thu, 29 Apr 2021 17:06:41 +0100	[thread overview]
Message-ID: <c3293d47-a5f2-ea4a-6730-f5cae26d8a7e@arm.com> (raw)
In-Reply-To: <20210428170705.GB4022@arm.com>

On 28/04/2021 18:07, Catalin Marinas wrote:
> On Fri, Apr 16, 2021 at 04:43:05PM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 77cb2d28f2a4..5f8e165ea053 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -879,6 +879,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>   							   &pfn, &fault_ipa);
>> +
>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && !device &&
>> +	    pfn_valid(pfn)) {
> 
> In the current implementation, device == !pfn_valid(), so we could skip
> the latter check.

Thanks, I'll drop that check.

>> +		/*
>> +		 * VM will be able to see the page's tags, so we must ensure
>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>> +		 * have already been initialised.
>> +		 */
>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>> +		struct page *page = pfn_to_online_page(pfn);
>> +
>> +		if (!page)
>> +			return -EFAULT;
> 
> I think that's fine, though maybe adding a comment that otherwise it
> would be mapped at stage 2 as Normal Cacheable and we cannot guarantee
> that the memory supports MTE tags.

That's what I intended by "be able to see the page's tags", but I'll 
reword to be explicit about it being Normal Cacheable.

>> +
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>> +				mte_clear_page_tags(page_address(page));
>> +		}
>> +	}
>> +
>>   	if (writable)
>>   		prot |= KVM_PGTABLE_PROT_W;
> 
> I probably asked already but is the only way to map a standard RAM page
> (not device) in stage 2 via the fault handler? One case I had in mind
> was something like get_user_pages() but it looks like that one doesn't
> call set_pte_at_notify(). There are a few other places where
> set_pte_at_notify() is called and these may happen before we got a
> chance to fault on stage 2, effectively populating the entry (IIUC). If
> that's an issue, we could move the above loop and check closer to the
> actual pte setting like kvm_pgtable_stage2_map().

The only call sites of kvm_pgtable_stage2_map() are in mmu.c:

  * kvm_phys_addr_ioremap() - maps as device in stage 2

  * user_mem_abort() - handled above

  * kvm_set_spte_handler() - ultimately called from the .change_pte() 
callback of the MMU notifier

So the last one is potentially a problem. It's called via the MMU 
notifiers in the case of set_pte_at_notify(). The users of that are:

  * uprobe_write_opcode(): Allocates a new page and performs a 
copy_highpage() to copy the data to the new page (which with MTE 
includes the tags and will copy across the PG_mte_tagged flag).

  * write_protect_page() (KSM): Changes the permissions on the PTE but 
it's still the same page, so nothing to do regarding MTE.

  * replace_page() (KSM): If the page has MTE tags then the MTE version 
of memcmp_pages() will return false, so the only caller 
(try_to_merge_one_page()) will never call this on a page with tags.

  * wp_page_copy(): This one is more interesting - if we go down the 
cow_user_page() path with an old page then everything is safe (tags are 
copied over). The is_zero_pfn() case worries me a bit - a new page is 
allocated, but I can't instantly see anything to zero out the tags (and 
set PG_mte_tagged).

  * migrate_vma_insert_page(): I think migration should be safe as the 
tags should be copied.

So wp_page_copy() looks suspicious.

kvm_pgtable_stage2_map() looks like it could be a good place for the 
checks, it looks like it should work and is probably a more obvious 
place for the checks.

> While the set_pte_at() race on the page flags is somewhat clearer, we
> may still have a race here with the VMM's set_pte_at() if the page is
> mapped as tagged. KVM has its own mmu_lock but it wouldn't be held when
> handling the VMM page tables (well, not always, see below).
> 
> gfn_to_pfn_prot() ends up calling get_user_pages*(). At least the slow
> path (hva_to_pfn_slow()) ends up with FOLL_TOUCH in gup and the VMM pte
> would be set, tags cleared (if PROT_MTE) before the stage 2 pte. I'm not
> sure whether get_user_page_fast_only() does the same.
> 
> The race with an mprotect(PROT_MTE) in the VMM is fine I think as the
> KVM mmu notifier is invoked before set_pte_at() and racing with another
> user_mem_abort() is serialised by the KVM mmu_lock. The subsequent
> set_pte_at() would see the PG_mte_tagged set either by the current CPU
> or by the one it was racing with.
> 

Given the changes to set_pte_at() which means that tags are restored 
from swap even if !PROT_MTE, the only race I can see remaining is the 
creation of new PROT_MTE mappings. As you mention an attempt to change 
mappings in the VMM memory space should involve a mmu notifier call 
which I think serialises this. So the remaining issue is doing this in a 
separate address space.

So I guess the potential problem is:

  * allocate memory MAP_SHARED but !PROT_MTE
  * fork()
  * VM causes a fault in parent address space
  * child does a mprotect(PROT_MTE)

With the last two potentially racing. Sadly I can't see a good way of 
handling that.

Steve

WARNING: multiple messages have this Message-ID (diff)
From: Steven Price <steven.price@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Andrew Jones <drjones@redhat.com>, Haibo Xu <Haibo.Xu@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	qemu-devel@nongnu.org, Marc Zyngier <maz@kernel.org>,
	Juan Quintela <quintela@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	James Morse <james.morse@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	Julien Thierry <julien.thierry.kdev@gmail.com>
Subject: Re: [PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature
Date: Thu, 29 Apr 2021 17:06:41 +0100	[thread overview]
Message-ID: <c3293d47-a5f2-ea4a-6730-f5cae26d8a7e@arm.com> (raw)
In-Reply-To: <20210428170705.GB4022@arm.com>

On 28/04/2021 18:07, Catalin Marinas wrote:
> On Fri, Apr 16, 2021 at 04:43:05PM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 77cb2d28f2a4..5f8e165ea053 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -879,6 +879,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>   							   &pfn, &fault_ipa);
>> +
>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && !device &&
>> +	    pfn_valid(pfn)) {
> 
> In the current implementation, device == !pfn_valid(), so we could skip
> the latter check.

Thanks, I'll drop that check.

>> +		/*
>> +		 * VM will be able to see the page's tags, so we must ensure
>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>> +		 * have already been initialised.
>> +		 */
>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>> +		struct page *page = pfn_to_online_page(pfn);
>> +
>> +		if (!page)
>> +			return -EFAULT;
> 
> I think that's fine, though maybe adding a comment that otherwise it
> would be mapped at stage 2 as Normal Cacheable and we cannot guarantee
> that the memory supports MTE tags.

That's what I intended by "be able to see the page's tags", but I'll 
reword to be explicit about it being Normal Cacheable.

>> +
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>> +				mte_clear_page_tags(page_address(page));
>> +		}
>> +	}
>> +
>>   	if (writable)
>>   		prot |= KVM_PGTABLE_PROT_W;
> 
> I probably asked already but is the only way to map a standard RAM page
> (not device) in stage 2 via the fault handler? One case I had in mind
> was something like get_user_pages() but it looks like that one doesn't
> call set_pte_at_notify(). There are a few other places where
> set_pte_at_notify() is called and these may happen before we got a
> chance to fault on stage 2, effectively populating the entry (IIUC). If
> that's an issue, we could move the above loop and check closer to the
> actual pte setting like kvm_pgtable_stage2_map().

The only call sites of kvm_pgtable_stage2_map() are in mmu.c:

  * kvm_phys_addr_ioremap() - maps as device in stage 2

  * user_mem_abort() - handled above

  * kvm_set_spte_handler() - ultimately called from the .change_pte() 
callback of the MMU notifier

So the last one is potentially a problem. It's called via the MMU 
notifiers in the case of set_pte_at_notify(). The users of that are:

  * uprobe_write_opcode(): Allocates a new page and performs a 
copy_highpage() to copy the data to the new page (which with MTE 
includes the tags and will copy across the PG_mte_tagged flag).

  * write_protect_page() (KSM): Changes the permissions on the PTE but 
it's still the same page, so nothing to do regarding MTE.

  * replace_page() (KSM): If the page has MTE tags then the MTE version 
of memcmp_pages() will return false, so the only caller 
(try_to_merge_one_page()) will never call this on a page with tags.

  * wp_page_copy(): This one is more interesting - if we go down the 
cow_user_page() path with an old page then everything is safe (tags are 
copied over). The is_zero_pfn() case worries me a bit - a new page is 
allocated, but I can't instantly see anything to zero out the tags (and 
set PG_mte_tagged).

  * migrate_vma_insert_page(): I think migration should be safe as the 
tags should be copied.

So wp_page_copy() looks suspicious.

kvm_pgtable_stage2_map() looks like it could be a good place for the 
checks, it looks like it should work and is probably a more obvious 
place for the checks.

> While the set_pte_at() race on the page flags is somewhat clearer, we
> may still have a race here with the VMM's set_pte_at() if the page is
> mapped as tagged. KVM has its own mmu_lock but it wouldn't be held when
> handling the VMM page tables (well, not always, see below).
> 
> gfn_to_pfn_prot() ends up calling get_user_pages*(). At least the slow
> path (hva_to_pfn_slow()) ends up with FOLL_TOUCH in gup and the VMM pte
> would be set, tags cleared (if PROT_MTE) before the stage 2 pte. I'm not
> sure whether get_user_page_fast_only() does the same.
> 
> The race with an mprotect(PROT_MTE) in the VMM is fine I think as the
> KVM mmu notifier is invoked before set_pte_at() and racing with another
> user_mem_abort() is serialised by the KVM mmu_lock. The subsequent
> set_pte_at() would see the PG_mte_tagged set either by the current CPU
> or by the one it was racing with.
> 

Given the changes to set_pte_at() which means that tags are restored 
from swap even if !PROT_MTE, the only race I can see remaining is the 
creation of new PROT_MTE mappings. As you mention an attempt to change 
mappings in the VMM memory space should involve a mmu notifier call 
which I think serialises this. So the remaining issue is doing this in a 
separate address space.

So I guess the potential problem is:

  * allocate memory MAP_SHARED but !PROT_MTE
  * fork()
  * VM causes a fault in parent address space
  * child does a mprotect(PROT_MTE)

With the last two potentially racing. Sadly I can't see a good way of 
handling that.

Steve


WARNING: multiple messages have this Message-ID (diff)
From: Steven Price <steven.price@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	qemu-devel@nongnu.org, Marc Zyngier <maz@kernel.org>,
	Juan Quintela <quintela@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature
Date: Thu, 29 Apr 2021 17:06:41 +0100	[thread overview]
Message-ID: <c3293d47-a5f2-ea4a-6730-f5cae26d8a7e@arm.com> (raw)
In-Reply-To: <20210428170705.GB4022@arm.com>

On 28/04/2021 18:07, Catalin Marinas wrote:
> On Fri, Apr 16, 2021 at 04:43:05PM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 77cb2d28f2a4..5f8e165ea053 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -879,6 +879,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>   							   &pfn, &fault_ipa);
>> +
>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && !device &&
>> +	    pfn_valid(pfn)) {
> 
> In the current implementation, device == !pfn_valid(), so we could skip
> the latter check.

Thanks, I'll drop that check.

>> +		/*
>> +		 * VM will be able to see the page's tags, so we must ensure
>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>> +		 * have already been initialised.
>> +		 */
>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>> +		struct page *page = pfn_to_online_page(pfn);
>> +
>> +		if (!page)
>> +			return -EFAULT;
> 
> I think that's fine, though maybe adding a comment that otherwise it
> would be mapped at stage 2 as Normal Cacheable and we cannot guarantee
> that the memory supports MTE tags.

That's what I intended by "be able to see the page's tags", but I'll 
reword to be explicit about it being Normal Cacheable.

>> +
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>> +				mte_clear_page_tags(page_address(page));
>> +		}
>> +	}
>> +
>>   	if (writable)
>>   		prot |= KVM_PGTABLE_PROT_W;
> 
> I probably asked already but is the only way to map a standard RAM page
> (not device) in stage 2 via the fault handler? One case I had in mind
> was something like get_user_pages() but it looks like that one doesn't
> call set_pte_at_notify(). There are a few other places where
> set_pte_at_notify() is called and these may happen before we got a
> chance to fault on stage 2, effectively populating the entry (IIUC). If
> that's an issue, we could move the above loop and check closer to the
> actual pte setting like kvm_pgtable_stage2_map().

The only call sites of kvm_pgtable_stage2_map() are in mmu.c:

  * kvm_phys_addr_ioremap() - maps as device in stage 2

  * user_mem_abort() - handled above

  * kvm_set_spte_handler() - ultimately called from the .change_pte() 
callback of the MMU notifier

So the last one is potentially a problem. It's called via the MMU 
notifiers in the case of set_pte_at_notify(). The users of that are:

  * uprobe_write_opcode(): Allocates a new page and performs a 
copy_highpage() to copy the data to the new page (which with MTE 
includes the tags and will copy across the PG_mte_tagged flag).

  * write_protect_page() (KSM): Changes the permissions on the PTE but 
it's still the same page, so nothing to do regarding MTE.

  * replace_page() (KSM): If the page has MTE tags then the MTE version 
of memcmp_pages() will return false, so the only caller 
(try_to_merge_one_page()) will never call this on a page with tags.

  * wp_page_copy(): This one is more interesting - if we go down the 
cow_user_page() path with an old page then everything is safe (tags are 
copied over). The is_zero_pfn() case worries me a bit - a new page is 
allocated, but I can't instantly see anything to zero out the tags (and 
set PG_mte_tagged).

  * migrate_vma_insert_page(): I think migration should be safe as the 
tags should be copied.

So wp_page_copy() looks suspicious.

kvm_pgtable_stage2_map() looks like it could be a good place for the 
checks, it looks like it should work and is probably a more obvious 
place for the checks.

> While the set_pte_at() race on the page flags is somewhat clearer, we
> may still have a race here with the VMM's set_pte_at() if the page is
> mapped as tagged. KVM has its own mmu_lock but it wouldn't be held when
> handling the VMM page tables (well, not always, see below).
> 
> gfn_to_pfn_prot() ends up calling get_user_pages*(). At least the slow
> path (hva_to_pfn_slow()) ends up with FOLL_TOUCH in gup and the VMM pte
> would be set, tags cleared (if PROT_MTE) before the stage 2 pte. I'm not
> sure whether get_user_page_fast_only() does the same.
> 
> The race with an mprotect(PROT_MTE) in the VMM is fine I think as the
> KVM mmu notifier is invoked before set_pte_at() and racing with another
> user_mem_abort() is serialised by the KVM mmu_lock. The subsequent
> set_pte_at() would see the PG_mte_tagged set either by the current CPU
> or by the one it was racing with.
> 

Given the changes to set_pte_at() which means that tags are restored 
from swap even if !PROT_MTE, the only race I can see remaining is the 
creation of new PROT_MTE mappings. As you mention an attempt to change 
mappings in the VMM memory space should involve a mmu notifier call 
which I think serialises this. So the remaining issue is doing this in a 
separate address space.

So I guess the potential problem is:

  * allocate memory MAP_SHARED but !PROT_MTE
  * fork()
  * VM causes a fault in parent address space
  * child does a mprotect(PROT_MTE)

With the last two potentially racing. Sadly I can't see a good way of 
handling that.

Steve
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: Steven Price <steven.price@arm.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	qemu-devel@nongnu.org, Juan Quintela <quintela@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Peter Maydell <peter.maydell@linaro.org>,
	Haibo Xu <Haibo.Xu@arm.com>, Andrew Jones <drjones@redhat.com>
Subject: Re: [PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature
Date: Thu, 29 Apr 2021 17:06:41 +0100	[thread overview]
Message-ID: <c3293d47-a5f2-ea4a-6730-f5cae26d8a7e@arm.com> (raw)
In-Reply-To: <20210428170705.GB4022@arm.com>

On 28/04/2021 18:07, Catalin Marinas wrote:
> On Fri, Apr 16, 2021 at 04:43:05PM +0100, Steven Price wrote:
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 77cb2d28f2a4..5f8e165ea053 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -879,6 +879,26 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>   							   &pfn, &fault_ipa);
>> +
>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && !device &&
>> +	    pfn_valid(pfn)) {
> 
> In the current implementation, device == !pfn_valid(), so we could skip
> the latter check.

Thanks, I'll drop that check.

>> +		/*
>> +		 * VM will be able to see the page's tags, so we must ensure
>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>> +		 * have already been initialised.
>> +		 */
>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>> +		struct page *page = pfn_to_online_page(pfn);
>> +
>> +		if (!page)
>> +			return -EFAULT;
> 
> I think that's fine, though maybe adding a comment that otherwise it
> would be mapped at stage 2 as Normal Cacheable and we cannot guarantee
> that the memory supports MTE tags.

That's what I intended by "be able to see the page's tags", but I'll 
reword to be explicit about it being Normal Cacheable.

>> +
>> +		for (i = 0; i < nr_pages; i++, page++) {
>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>> +				mte_clear_page_tags(page_address(page));
>> +		}
>> +	}
>> +
>>   	if (writable)
>>   		prot |= KVM_PGTABLE_PROT_W;
> 
> I probably asked already but is the only way to map a standard RAM page
> (not device) in stage 2 via the fault handler? One case I had in mind
> was something like get_user_pages() but it looks like that one doesn't
> call set_pte_at_notify(). There are a few other places where
> set_pte_at_notify() is called and these may happen before we got a
> chance to fault on stage 2, effectively populating the entry (IIUC). If
> that's an issue, we could move the above loop and check closer to the
> actual pte setting like kvm_pgtable_stage2_map().

The only call sites of kvm_pgtable_stage2_map() are in mmu.c:

  * kvm_phys_addr_ioremap() - maps as device in stage 2

  * user_mem_abort() - handled above

  * kvm_set_spte_handler() - ultimately called from the .change_pte() 
callback of the MMU notifier

So the last one is potentially a problem. It's called via the MMU 
notifiers in the case of set_pte_at_notify(). The users of that are:

  * uprobe_write_opcode(): Allocates a new page and performs a 
copy_highpage() to copy the data to the new page (which with MTE 
includes the tags and will copy across the PG_mte_tagged flag).

  * write_protect_page() (KSM): Changes the permissions on the PTE but 
it's still the same page, so nothing to do regarding MTE.

  * replace_page() (KSM): If the page has MTE tags then the MTE version 
of memcmp_pages() will return false, so the only caller 
(try_to_merge_one_page()) will never call this on a page with tags.

  * wp_page_copy(): This one is more interesting - if we go down the 
cow_user_page() path with an old page then everything is safe (tags are 
copied over). The is_zero_pfn() case worries me a bit - a new page is 
allocated, but I can't instantly see anything to zero out the tags (and 
set PG_mte_tagged).

  * migrate_vma_insert_page(): I think migration should be safe as the 
tags should be copied.

So wp_page_copy() looks suspicious.

kvm_pgtable_stage2_map() looks like it could be a good place for the 
checks, it looks like it should work and is probably a more obvious 
place for the checks.

> While the set_pte_at() race on the page flags is somewhat clearer, we
> may still have a race here with the VMM's set_pte_at() if the page is
> mapped as tagged. KVM has its own mmu_lock but it wouldn't be held when
> handling the VMM page tables (well, not always, see below).
> 
> gfn_to_pfn_prot() ends up calling get_user_pages*(). At least the slow
> path (hva_to_pfn_slow()) ends up with FOLL_TOUCH in gup and the VMM pte
> would be set, tags cleared (if PROT_MTE) before the stage 2 pte. I'm not
> sure whether get_user_page_fast_only() does the same.
> 
> The race with an mprotect(PROT_MTE) in the VMM is fine I think as the
> KVM mmu notifier is invoked before set_pte_at() and racing with another
> user_mem_abort() is serialised by the KVM mmu_lock. The subsequent
> set_pte_at() would see the PG_mte_tagged set either by the current CPU
> or by the one it was racing with.
> 

Given the changes to set_pte_at() which means that tags are restored 
from swap even if !PROT_MTE, the only race I can see remaining is the 
creation of new PROT_MTE mappings. As you mention an attempt to change 
mappings in the VMM memory space should involve a mmu notifier call 
which I think serialises this. So the remaining issue is doing this in a 
separate address space.

So I guess the potential problem is:

  * allocate memory MAP_SHARED but !PROT_MTE
  * fork()
  * VM causes a fault in parent address space
  * child does a mprotect(PROT_MTE)

With the last two potentially racing. Sadly I can't see a good way of 
handling that.

Steve

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-04-29 16:06 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16 15:43 [PATCH v11 0/6] MTE support for KVM guest Steven Price
2021-04-16 15:43 ` Steven Price
2021-04-16 15:43 ` Steven Price
2021-04-16 15:43 ` Steven Price
2021-04-16 15:43 ` [PATCH v11 1/6] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-27 17:43   ` Catalin Marinas
2021-04-27 17:43     ` Catalin Marinas
2021-04-27 17:43     ` Catalin Marinas
2021-04-27 17:43     ` Catalin Marinas
2021-04-29 16:06     ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-05-04 15:29       ` Catalin Marinas
2021-05-04 15:29         ` Catalin Marinas
2021-05-04 15:29         ` Catalin Marinas
2021-05-04 15:29         ` Catalin Marinas
2021-04-16 15:43 ` [PATCH v11 2/6] arm64: kvm: Introduce MTE VM feature Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-28 17:07   ` Catalin Marinas
2021-04-28 17:07     ` Catalin Marinas
2021-04-28 17:07     ` Catalin Marinas
2021-04-28 17:07     ` Catalin Marinas
2021-04-29 16:06     ` Steven Price [this message]
2021-04-29 16:06       ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-05-04 17:40       ` Catalin Marinas
2021-05-04 17:40         ` Catalin Marinas
2021-05-04 17:40         ` Catalin Marinas
2021-05-04 17:40         ` Catalin Marinas
2021-05-06 16:15         ` Steven Price
2021-05-06 16:15           ` Steven Price
2021-05-06 16:15           ` Steven Price
2021-05-06 16:15           ` Steven Price
2021-05-07 18:25           ` Catalin Marinas
2021-05-07 18:25             ` Catalin Marinas
2021-05-07 18:25             ` Catalin Marinas
2021-05-07 18:25             ` Catalin Marinas
2021-05-10 18:35             ` Catalin Marinas
2021-05-10 18:35               ` Catalin Marinas
2021-05-10 18:35               ` Catalin Marinas
2021-05-10 18:35               ` Catalin Marinas
2021-05-12 15:46               ` Steven Price
2021-05-12 15:46                 ` Steven Price
2021-05-12 15:46                 ` Steven Price
2021-05-12 15:46                 ` Steven Price
2021-05-12 17:45                 ` Catalin Marinas
2021-05-12 17:45                   ` Catalin Marinas
2021-05-12 17:45                   ` Catalin Marinas
2021-05-12 17:45                   ` Catalin Marinas
2021-05-13 10:57                   ` Steven Price
2021-05-13 10:57                     ` Steven Price
2021-05-13 10:57                     ` Steven Price
2021-05-13 10:57                     ` Steven Price
2021-05-13 15:08                     ` Catalin Marinas
2021-05-13 15:08                       ` Catalin Marinas
2021-05-13 15:08                       ` Catalin Marinas
2021-05-13 15:08                       ` Catalin Marinas
2021-05-13 15:21                     ` Catalin Marinas
2021-05-13 15:21                       ` Catalin Marinas
2021-05-13 15:21                       ` Catalin Marinas
2021-05-13 15:21                       ` Catalin Marinas
2021-04-16 15:43 ` [PATCH v11 3/6] arm64: kvm: Save/restore MTE registers Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43 ` [PATCH v11 4/6] arm64: kvm: Expose KVM_ARM_CAP_MTE Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43 ` [PATCH v11 5/6] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-27 17:58   ` Catalin Marinas
2021-04-27 17:58     ` Catalin Marinas
2021-04-27 17:58     ` Catalin Marinas
2021-04-27 17:58     ` Catalin Marinas
2021-04-29 16:06     ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-04-29 16:06       ` Steven Price
2021-05-04 17:44       ` Catalin Marinas
2021-05-04 17:44         ` Catalin Marinas
2021-05-04 17:44         ` Catalin Marinas
2021-05-04 17:44         ` Catalin Marinas
2021-05-07  9:44         ` Steven Price
2021-05-07  9:44           ` Steven Price
2021-05-07  9:44           ` Steven Price
2021-05-07  9:44           ` Steven Price
2021-05-07  9:59           ` David Laight
2021-05-07  9:59             ` David Laight
2021-05-07  9:59             ` David Laight
2021-05-07  9:59             ` David Laight
2021-04-16 15:43 ` [PATCH v11 6/6] KVM: arm64: Document MTE capability and ioctl Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price
2021-04-16 15:43   ` Steven Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c3293d47-a5f2-ea4a-6730-f5cae26d8a7e@arm.com \
    --to=steven.price@arm.com \
    --cc=Dave.Martin@arm.com \
    --cc=Haibo.Xu@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=dgilbert@redhat.com \
    --cc=drjones@redhat.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.