All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Price <steven.price@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Andrew Jones <drjones@redhat.com>, Haibo Xu <Haibo.Xu@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	qemu-devel@nongnu.org, Marc Zyngier <maz@kernel.org>,
	Juan Quintela <quintela@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	James Morse <james.morse@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	Julien Thierry <julien.thierry.kdev@gmail.com>
Subject: Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature
Date: Wed, 31 Mar 2021 11:32:46 +0200	[thread overview]
Message-ID: <d545a051-a02a-4c3a-0afe-66612839ba32@redhat.com> (raw)
In-Reply-To: <20210331092109.GA21921@arm.com>

On 31.03.21 11:21, Catalin Marinas wrote:
> On Wed, Mar 31, 2021 at 09:34:44AM +0200, David Hildenbrand wrote:
>> On 30.03.21 12:30, Catalin Marinas wrote:
>>> On Mon, Mar 29, 2021 at 05:06:51PM +0100, Steven Price wrote:
>>>> On 28/03/2021 13:21, Catalin Marinas wrote:
>>>>> On Sat, Mar 27, 2021 at 03:23:24PM +0000, Catalin Marinas wrote:
>>>>>> On Fri, Mar 12, 2021 at 03:18:58PM +0000, Steven Price wrote:
>>>>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>>>>> index 77cb2d28f2a4..b31b7a821f90 100644
>>>>>>> --- a/arch/arm64/kvm/mmu.c
>>>>>>> +++ b/arch/arm64/kvm/mmu.c
>>>>>>> @@ -879,6 +879,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>>     	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>>>>>     		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>>>>>     							   &pfn, &fault_ipa);
>>>>>>> +
>>>>>>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && pfn_valid(pfn)) {
>>>>>>> +		/*
>>>>>>> +		 * VM will be able to see the page's tags, so we must ensure
>>>>>>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>>>>>>> +		 * have already been initialised.
>>>>>>> +		 */
>>>>>>> +		struct page *page = pfn_to_page(pfn);
>>>>>>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>>>>>>> +
>>>>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>>>>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>>>>>> +				mte_clear_page_tags(page_address(page));
>>>>>>> +		}
>>>>>>> +	}
>>>>>>
>>>>>> This pfn_valid() check may be problematic. Following commit eeb0753ba27b
>>>>>> ("arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory"), it returns
>>>>>> true for ZONE_DEVICE memory but such memory is allowed not to support
>>>>>> MTE.
>>>>>
>>>>> Some more thinking, this should be safe as any ZONE_DEVICE would be
>>>>> mapped as untagged memory in the kernel linear map. It could be slightly
>>>>> inefficient if it unnecessarily tries to clear tags in ZONE_DEVICE,
>>>>> untagged memory. Another overhead is pfn_valid() which will likely end
>>>>> up calling memblock_is_map_memory().
>>>>>
>>>>> However, the bigger issue is that Stage 2 cannot disable tagging for
>>>>> Stage 1 unless the memory is Non-cacheable or Device at S2. Is there a
>>>>> way to detect what gets mapped in the guest as Normal Cacheable memory
>>>>> and make sure it's only early memory or hotplug but no ZONE_DEVICE (or
>>>>> something else like on-chip memory)?  If we can't guarantee that all
>>>>> Cacheable memory given to a guest supports tags, we should disable the
>>>>> feature altogether.
>>>>
>>>> In stage 2 I believe we only have two types of mapping - 'normal' or
>>>> DEVICE_nGnRE (see stage2_map_set_prot_attr()). Filtering out the latter is a
>>>> case of checking the 'device' variable, and makes sense to avoid the
>>>> overhead you describe.
>>>>
>>>> This should also guarantee that all stage-2 cacheable memory supports tags,
>>>> as kvm_is_device_pfn() is simply !pfn_valid(), and pfn_valid() should only
>>>> be true for memory that Linux considers "normal".
>>
>> If you think "normal" == "normal System RAM", that's wrong; see below.
> 
> By "normal" I think both Steven and I meant the Normal Cacheable memory
> attribute (another being the Device memory attribute).
> 
>>> That's the problem. With Anshuman's commit I mentioned above,
>>> pfn_valid() returns true for ZONE_DEVICE mappings (e.g. persistent
>>> memory, not talking about some I/O mapping that requires Device_nGnRE).
>>> So kvm_is_device_pfn() is false for such memory and it may be mapped as
>>> Normal but it is not guaranteed to support tagging.
>>
>> pfn_valid() means "there is a struct page"; if you do pfn_to_page() and
>> touch the page, you won't fault. So Anshuman's commit is correct.
> 
> I agree.
> 
>> pfn_to_online_page() means, "there is a struct page and it's system RAM
>> that's in use; the memmap has a sane content"
> 
> Does pfn_to_online_page() returns a valid struct page pointer for
> ZONE_DEVICE pages? IIUC, these are not guaranteed to be system RAM, for
> some definition of system RAM (I assume NVDIMM != system RAM). For
> example, pmem_attach_disk() calls devm_memremap_pages() and this would
> use the Normal Cacheable memory attribute without necessarily being
> system RAM.

No, not for ZONE_DEVICE.

However, if you expose PMEM via dax/kmem as System RAM to the system (-> 
add_memory_driver_managed()), then PMEM (managed via ZONE_NOMRAL or 
ZONE_MOVABLE) would work with pfn_to_online_page() -- as the system 
thinks it's "ordinary system RAM" and the memory is managed by the buddy.

> 
> So if pfn_valid() is not equivalent to system RAM, we have a potential
> issue with MTE. Even if "system RAM" includes NVDIMMs, we still have
> this issue and we may need a new term to describe MTE-safe memory. In
> the kernel we assume MTE-safe all pages that can be mapped as
> MAP_ANONYMOUS and I don't think these include ZONE_DEVICE pages.
> 
> Thanks.
> 


-- 
Thanks,

David / dhildenb


WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Andrew Jones <drjones@redhat.com>, Haibo Xu <Haibo.Xu@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	qemu-devel@nongnu.org, Marc Zyngier <maz@kernel.org>,
	Juan Quintela <quintela@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Steven Price <steven.price@arm.com>,
	James Morse <james.morse@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu,
	Thomas Gleixner <tglx@linutronix.de>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Will Deacon <will@kernel.org>, Dave Martin <Dave.Martin@arm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature
Date: Wed, 31 Mar 2021 11:32:46 +0200	[thread overview]
Message-ID: <d545a051-a02a-4c3a-0afe-66612839ba32@redhat.com> (raw)
In-Reply-To: <20210331092109.GA21921@arm.com>

On 31.03.21 11:21, Catalin Marinas wrote:
> On Wed, Mar 31, 2021 at 09:34:44AM +0200, David Hildenbrand wrote:
>> On 30.03.21 12:30, Catalin Marinas wrote:
>>> On Mon, Mar 29, 2021 at 05:06:51PM +0100, Steven Price wrote:
>>>> On 28/03/2021 13:21, Catalin Marinas wrote:
>>>>> On Sat, Mar 27, 2021 at 03:23:24PM +0000, Catalin Marinas wrote:
>>>>>> On Fri, Mar 12, 2021 at 03:18:58PM +0000, Steven Price wrote:
>>>>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>>>>> index 77cb2d28f2a4..b31b7a821f90 100644
>>>>>>> --- a/arch/arm64/kvm/mmu.c
>>>>>>> +++ b/arch/arm64/kvm/mmu.c
>>>>>>> @@ -879,6 +879,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>>     	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>>>>>     		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>>>>>     							   &pfn, &fault_ipa);
>>>>>>> +
>>>>>>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && pfn_valid(pfn)) {
>>>>>>> +		/*
>>>>>>> +		 * VM will be able to see the page's tags, so we must ensure
>>>>>>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>>>>>>> +		 * have already been initialised.
>>>>>>> +		 */
>>>>>>> +		struct page *page = pfn_to_page(pfn);
>>>>>>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>>>>>>> +
>>>>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>>>>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>>>>>> +				mte_clear_page_tags(page_address(page));
>>>>>>> +		}
>>>>>>> +	}
>>>>>>
>>>>>> This pfn_valid() check may be problematic. Following commit eeb0753ba27b
>>>>>> ("arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory"), it returns
>>>>>> true for ZONE_DEVICE memory but such memory is allowed not to support
>>>>>> MTE.
>>>>>
>>>>> Some more thinking, this should be safe as any ZONE_DEVICE would be
>>>>> mapped as untagged memory in the kernel linear map. It could be slightly
>>>>> inefficient if it unnecessarily tries to clear tags in ZONE_DEVICE,
>>>>> untagged memory. Another overhead is pfn_valid() which will likely end
>>>>> up calling memblock_is_map_memory().
>>>>>
>>>>> However, the bigger issue is that Stage 2 cannot disable tagging for
>>>>> Stage 1 unless the memory is Non-cacheable or Device at S2. Is there a
>>>>> way to detect what gets mapped in the guest as Normal Cacheable memory
>>>>> and make sure it's only early memory or hotplug but no ZONE_DEVICE (or
>>>>> something else like on-chip memory)?  If we can't guarantee that all
>>>>> Cacheable memory given to a guest supports tags, we should disable the
>>>>> feature altogether.
>>>>
>>>> In stage 2 I believe we only have two types of mapping - 'normal' or
>>>> DEVICE_nGnRE (see stage2_map_set_prot_attr()). Filtering out the latter is a
>>>> case of checking the 'device' variable, and makes sense to avoid the
>>>> overhead you describe.
>>>>
>>>> This should also guarantee that all stage-2 cacheable memory supports tags,
>>>> as kvm_is_device_pfn() is simply !pfn_valid(), and pfn_valid() should only
>>>> be true for memory that Linux considers "normal".
>>
>> If you think "normal" == "normal System RAM", that's wrong; see below.
> 
> By "normal" I think both Steven and I meant the Normal Cacheable memory
> attribute (another being the Device memory attribute).
> 
>>> That's the problem. With Anshuman's commit I mentioned above,
>>> pfn_valid() returns true for ZONE_DEVICE mappings (e.g. persistent
>>> memory, not talking about some I/O mapping that requires Device_nGnRE).
>>> So kvm_is_device_pfn() is false for such memory and it may be mapped as
>>> Normal but it is not guaranteed to support tagging.
>>
>> pfn_valid() means "there is a struct page"; if you do pfn_to_page() and
>> touch the page, you won't fault. So Anshuman's commit is correct.
> 
> I agree.
> 
>> pfn_to_online_page() means, "there is a struct page and it's system RAM
>> that's in use; the memmap has a sane content"
> 
> Does pfn_to_online_page() returns a valid struct page pointer for
> ZONE_DEVICE pages? IIUC, these are not guaranteed to be system RAM, for
> some definition of system RAM (I assume NVDIMM != system RAM). For
> example, pmem_attach_disk() calls devm_memremap_pages() and this would
> use the Normal Cacheable memory attribute without necessarily being
> system RAM.

No, not for ZONE_DEVICE.

However, if you expose PMEM via dax/kmem as System RAM to the system (-> 
add_memory_driver_managed()), then PMEM (managed via ZONE_NOMRAL or 
ZONE_MOVABLE) would work with pfn_to_online_page() -- as the system 
thinks it's "ordinary system RAM" and the memory is managed by the buddy.

> 
> So if pfn_valid() is not equivalent to system RAM, we have a potential
> issue with MTE. Even if "system RAM" includes NVDIMMs, we still have
> this issue and we may need a new term to describe MTE-safe memory. In
> the kernel we assume MTE-safe all pages that can be mapped as
> MAP_ANONYMOUS and I don't think these include ZONE_DEVICE pages.
> 
> Thanks.
> 


-- 
Thanks,

David / dhildenb



WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: qemu-devel@nongnu.org, Marc Zyngier <maz@kernel.org>,
	Juan Quintela <quintela@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Steven Price <steven.price@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.cs.columbia.edu,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Dave Martin <Dave.Martin@arm.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature
Date: Wed, 31 Mar 2021 11:32:46 +0200	[thread overview]
Message-ID: <d545a051-a02a-4c3a-0afe-66612839ba32@redhat.com> (raw)
In-Reply-To: <20210331092109.GA21921@arm.com>

On 31.03.21 11:21, Catalin Marinas wrote:
> On Wed, Mar 31, 2021 at 09:34:44AM +0200, David Hildenbrand wrote:
>> On 30.03.21 12:30, Catalin Marinas wrote:
>>> On Mon, Mar 29, 2021 at 05:06:51PM +0100, Steven Price wrote:
>>>> On 28/03/2021 13:21, Catalin Marinas wrote:
>>>>> On Sat, Mar 27, 2021 at 03:23:24PM +0000, Catalin Marinas wrote:
>>>>>> On Fri, Mar 12, 2021 at 03:18:58PM +0000, Steven Price wrote:
>>>>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>>>>> index 77cb2d28f2a4..b31b7a821f90 100644
>>>>>>> --- a/arch/arm64/kvm/mmu.c
>>>>>>> +++ b/arch/arm64/kvm/mmu.c
>>>>>>> @@ -879,6 +879,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>>     	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>>>>>     		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>>>>>     							   &pfn, &fault_ipa);
>>>>>>> +
>>>>>>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && pfn_valid(pfn)) {
>>>>>>> +		/*
>>>>>>> +		 * VM will be able to see the page's tags, so we must ensure
>>>>>>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>>>>>>> +		 * have already been initialised.
>>>>>>> +		 */
>>>>>>> +		struct page *page = pfn_to_page(pfn);
>>>>>>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>>>>>>> +
>>>>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>>>>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>>>>>> +				mte_clear_page_tags(page_address(page));
>>>>>>> +		}
>>>>>>> +	}
>>>>>>
>>>>>> This pfn_valid() check may be problematic. Following commit eeb0753ba27b
>>>>>> ("arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory"), it returns
>>>>>> true for ZONE_DEVICE memory but such memory is allowed not to support
>>>>>> MTE.
>>>>>
>>>>> Some more thinking, this should be safe as any ZONE_DEVICE would be
>>>>> mapped as untagged memory in the kernel linear map. It could be slightly
>>>>> inefficient if it unnecessarily tries to clear tags in ZONE_DEVICE,
>>>>> untagged memory. Another overhead is pfn_valid() which will likely end
>>>>> up calling memblock_is_map_memory().
>>>>>
>>>>> However, the bigger issue is that Stage 2 cannot disable tagging for
>>>>> Stage 1 unless the memory is Non-cacheable or Device at S2. Is there a
>>>>> way to detect what gets mapped in the guest as Normal Cacheable memory
>>>>> and make sure it's only early memory or hotplug but no ZONE_DEVICE (or
>>>>> something else like on-chip memory)?  If we can't guarantee that all
>>>>> Cacheable memory given to a guest supports tags, we should disable the
>>>>> feature altogether.
>>>>
>>>> In stage 2 I believe we only have two types of mapping - 'normal' or
>>>> DEVICE_nGnRE (see stage2_map_set_prot_attr()). Filtering out the latter is a
>>>> case of checking the 'device' variable, and makes sense to avoid the
>>>> overhead you describe.
>>>>
>>>> This should also guarantee that all stage-2 cacheable memory supports tags,
>>>> as kvm_is_device_pfn() is simply !pfn_valid(), and pfn_valid() should only
>>>> be true for memory that Linux considers "normal".
>>
>> If you think "normal" == "normal System RAM", that's wrong; see below.
> 
> By "normal" I think both Steven and I meant the Normal Cacheable memory
> attribute (another being the Device memory attribute).
> 
>>> That's the problem. With Anshuman's commit I mentioned above,
>>> pfn_valid() returns true for ZONE_DEVICE mappings (e.g. persistent
>>> memory, not talking about some I/O mapping that requires Device_nGnRE).
>>> So kvm_is_device_pfn() is false for such memory and it may be mapped as
>>> Normal but it is not guaranteed to support tagging.
>>
>> pfn_valid() means "there is a struct page"; if you do pfn_to_page() and
>> touch the page, you won't fault. So Anshuman's commit is correct.
> 
> I agree.
> 
>> pfn_to_online_page() means, "there is a struct page and it's system RAM
>> that's in use; the memmap has a sane content"
> 
> Does pfn_to_online_page() returns a valid struct page pointer for
> ZONE_DEVICE pages? IIUC, these are not guaranteed to be system RAM, for
> some definition of system RAM (I assume NVDIMM != system RAM). For
> example, pmem_attach_disk() calls devm_memremap_pages() and this would
> use the Normal Cacheable memory attribute without necessarily being
> system RAM.

No, not for ZONE_DEVICE.

However, if you expose PMEM via dax/kmem as System RAM to the system (-> 
add_memory_driver_managed()), then PMEM (managed via ZONE_NOMRAL or 
ZONE_MOVABLE) would work with pfn_to_online_page() -- as the system 
thinks it's "ordinary system RAM" and the memory is managed by the buddy.

> 
> So if pfn_valid() is not equivalent to system RAM, we have a potential
> issue with MTE. Even if "system RAM" includes NVDIMMs, we still have
> this issue and we may need a new term to describe MTE-safe memory. In
> the kernel we assume MTE-safe all pages that can be mapped as
> MAP_ANONYMOUS and I don't think these include ZONE_DEVICE pages.
> 
> Thanks.
> 


-- 
Thanks,

David / dhildenb

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: David Hildenbrand <david@redhat.com>
To: Catalin Marinas <catalin.marinas@arm.com>
Cc: Steven Price <steven.price@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	 Peter Maydell <peter.maydell@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Andrew Jones <drjones@redhat.com>, Haibo Xu <Haibo.Xu@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	qemu-devel@nongnu.org, Marc Zyngier <maz@kernel.org>,
	Juan Quintela <quintela@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	linux-kernel@vger.kernel.org, Dave Martin <Dave.Martin@arm.com>,
	James Morse <james.morse@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	Julien Thierry <julien.thierry.kdev@gmail.com>
Subject: Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature
Date: Wed, 31 Mar 2021 11:32:46 +0200	[thread overview]
Message-ID: <d545a051-a02a-4c3a-0afe-66612839ba32@redhat.com> (raw)
In-Reply-To: <20210331092109.GA21921@arm.com>

On 31.03.21 11:21, Catalin Marinas wrote:
> On Wed, Mar 31, 2021 at 09:34:44AM +0200, David Hildenbrand wrote:
>> On 30.03.21 12:30, Catalin Marinas wrote:
>>> On Mon, Mar 29, 2021 at 05:06:51PM +0100, Steven Price wrote:
>>>> On 28/03/2021 13:21, Catalin Marinas wrote:
>>>>> On Sat, Mar 27, 2021 at 03:23:24PM +0000, Catalin Marinas wrote:
>>>>>> On Fri, Mar 12, 2021 at 03:18:58PM +0000, Steven Price wrote:
>>>>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>>>>> index 77cb2d28f2a4..b31b7a821f90 100644
>>>>>>> --- a/arch/arm64/kvm/mmu.c
>>>>>>> +++ b/arch/arm64/kvm/mmu.c
>>>>>>> @@ -879,6 +879,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>>>     	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>>>>>     		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>>>>>     							   &pfn, &fault_ipa);
>>>>>>> +
>>>>>>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && pfn_valid(pfn)) {
>>>>>>> +		/*
>>>>>>> +		 * VM will be able to see the page's tags, so we must ensure
>>>>>>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>>>>>>> +		 * have already been initialised.
>>>>>>> +		 */
>>>>>>> +		struct page *page = pfn_to_page(pfn);
>>>>>>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>>>>>>> +
>>>>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>>>>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>>>>>> +				mte_clear_page_tags(page_address(page));
>>>>>>> +		}
>>>>>>> +	}
>>>>>>
>>>>>> This pfn_valid() check may be problematic. Following commit eeb0753ba27b
>>>>>> ("arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory"), it returns
>>>>>> true for ZONE_DEVICE memory but such memory is allowed not to support
>>>>>> MTE.
>>>>>
>>>>> Some more thinking, this should be safe as any ZONE_DEVICE would be
>>>>> mapped as untagged memory in the kernel linear map. It could be slightly
>>>>> inefficient if it unnecessarily tries to clear tags in ZONE_DEVICE,
>>>>> untagged memory. Another overhead is pfn_valid() which will likely end
>>>>> up calling memblock_is_map_memory().
>>>>>
>>>>> However, the bigger issue is that Stage 2 cannot disable tagging for
>>>>> Stage 1 unless the memory is Non-cacheable or Device at S2. Is there a
>>>>> way to detect what gets mapped in the guest as Normal Cacheable memory
>>>>> and make sure it's only early memory or hotplug but no ZONE_DEVICE (or
>>>>> something else like on-chip memory)?  If we can't guarantee that all
>>>>> Cacheable memory given to a guest supports tags, we should disable the
>>>>> feature altogether.
>>>>
>>>> In stage 2 I believe we only have two types of mapping - 'normal' or
>>>> DEVICE_nGnRE (see stage2_map_set_prot_attr()). Filtering out the latter is a
>>>> case of checking the 'device' variable, and makes sense to avoid the
>>>> overhead you describe.
>>>>
>>>> This should also guarantee that all stage-2 cacheable memory supports tags,
>>>> as kvm_is_device_pfn() is simply !pfn_valid(), and pfn_valid() should only
>>>> be true for memory that Linux considers "normal".
>>
>> If you think "normal" == "normal System RAM", that's wrong; see below.
> 
> By "normal" I think both Steven and I meant the Normal Cacheable memory
> attribute (another being the Device memory attribute).
> 
>>> That's the problem. With Anshuman's commit I mentioned above,
>>> pfn_valid() returns true for ZONE_DEVICE mappings (e.g. persistent
>>> memory, not talking about some I/O mapping that requires Device_nGnRE).
>>> So kvm_is_device_pfn() is false for such memory and it may be mapped as
>>> Normal but it is not guaranteed to support tagging.
>>
>> pfn_valid() means "there is a struct page"; if you do pfn_to_page() and
>> touch the page, you won't fault. So Anshuman's commit is correct.
> 
> I agree.
> 
>> pfn_to_online_page() means, "there is a struct page and it's system RAM
>> that's in use; the memmap has a sane content"
> 
> Does pfn_to_online_page() returns a valid struct page pointer for
> ZONE_DEVICE pages? IIUC, these are not guaranteed to be system RAM, for
> some definition of system RAM (I assume NVDIMM != system RAM). For
> example, pmem_attach_disk() calls devm_memremap_pages() and this would
> use the Normal Cacheable memory attribute without necessarily being
> system RAM.

No, not for ZONE_DEVICE.

However, if you expose PMEM via dax/kmem as System RAM to the system (-> 
add_memory_driver_managed()), then PMEM (managed via ZONE_NOMRAL or 
ZONE_MOVABLE) would work with pfn_to_online_page() -- as the system 
thinks it's "ordinary system RAM" and the memory is managed by the buddy.

> 
> So if pfn_valid() is not equivalent to system RAM, we have a potential
> issue with MTE. Even if "system RAM" includes NVDIMMs, we still have
> this issue and we may need a new term to describe MTE-safe memory. In
> the kernel we assume MTE-safe all pages that can be mapped as
> MAP_ANONYMOUS and I don't think these include ZONE_DEVICE pages.
> 
> Thanks.
> 


-- 
Thanks,

David / dhildenb


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-03-31  9:33 UTC|newest]

Thread overview: 112+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-12 15:18 [PATCH v10 0/6] MTE support for KVM guest Steven Price
2021-03-12 15:18 ` Steven Price
2021-03-12 15:18 ` Steven Price
2021-03-12 15:18 ` Steven Price
2021-03-12 15:18 ` [PATCH v10 1/6] arm64: mte: Sync tags for pages where PTE is untagged Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-26 18:56   ` Catalin Marinas
2021-03-26 18:56     ` Catalin Marinas
2021-03-26 18:56     ` Catalin Marinas
2021-03-26 18:56     ` Catalin Marinas
2021-03-29 15:55     ` Steven Price
2021-03-29 15:55       ` Steven Price
2021-03-29 15:55       ` Steven Price
2021-03-29 15:55       ` Steven Price
2021-03-30 10:13       ` Catalin Marinas
2021-03-30 10:13         ` Catalin Marinas
2021-03-30 10:13         ` Catalin Marinas
2021-03-30 10:13         ` Catalin Marinas
2021-03-31 10:09         ` Steven Price
2021-03-31 10:09           ` Steven Price
2021-03-31 10:09           ` Steven Price
2021-03-31 10:09           ` Steven Price
2021-03-12 15:18 ` [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-27 15:23   ` Catalin Marinas
2021-03-27 15:23     ` Catalin Marinas
2021-03-27 15:23     ` Catalin Marinas
2021-03-27 15:23     ` Catalin Marinas
2021-03-28 12:21     ` Catalin Marinas
2021-03-28 12:21       ` Catalin Marinas
2021-03-28 12:21       ` Catalin Marinas
2021-03-28 12:21       ` Catalin Marinas
2021-03-29 16:06       ` Steven Price
2021-03-29 16:06         ` Steven Price
2021-03-29 16:06         ` Steven Price
2021-03-29 16:06         ` Steven Price
2021-03-30 10:30         ` Catalin Marinas
2021-03-30 10:30           ` Catalin Marinas
2021-03-30 10:30           ` Catalin Marinas
2021-03-30 10:30           ` Catalin Marinas
2021-03-31  7:34           ` David Hildenbrand
2021-03-31  7:34             ` David Hildenbrand
2021-03-31  7:34             ` David Hildenbrand
2021-03-31  7:34             ` David Hildenbrand
2021-03-31  9:21             ` Catalin Marinas
2021-03-31  9:21               ` Catalin Marinas
2021-03-31  9:21               ` Catalin Marinas
2021-03-31  9:21               ` Catalin Marinas
2021-03-31  9:32               ` David Hildenbrand [this message]
2021-03-31  9:32                 ` David Hildenbrand
2021-03-31  9:32                 ` David Hildenbrand
2021-03-31  9:32                 ` David Hildenbrand
2021-03-31 10:41                 ` Steven Price
2021-03-31 10:41                   ` Steven Price
2021-03-31 10:41                   ` Steven Price
2021-03-31 10:41                   ` Steven Price
2021-03-31 14:14                   ` David Hildenbrand
2021-03-31 14:14                     ` David Hildenbrand
2021-03-31 14:14                     ` David Hildenbrand
2021-03-31 14:14                     ` David Hildenbrand
2021-03-31 18:43                   ` Catalin Marinas
2021-03-31 18:43                     ` Catalin Marinas
2021-03-31 18:43                     ` Catalin Marinas
2021-03-31 18:43                     ` Catalin Marinas
2021-04-07 10:20                     ` Steven Price
2021-04-07 10:20                       ` Steven Price
2021-04-07 10:20                       ` Steven Price
2021-04-07 10:20                       ` Steven Price
2021-04-07 15:14                       ` Catalin Marinas
2021-04-07 15:14                         ` Catalin Marinas
2021-04-07 15:14                         ` Catalin Marinas
2021-04-07 15:14                         ` Catalin Marinas
2021-04-07 15:30                         ` David Hildenbrand
2021-04-07 15:30                           ` David Hildenbrand
2021-04-07 15:30                           ` David Hildenbrand
2021-04-07 15:30                           ` David Hildenbrand
2021-04-07 15:52                         ` Steven Price
2021-04-07 15:52                           ` Steven Price
2021-04-07 15:52                           ` Steven Price
2021-04-07 15:52                           ` Steven Price
2021-04-08 14:18                           ` Catalin Marinas
2021-04-08 14:18                             ` Catalin Marinas
2021-04-08 14:18                             ` Catalin Marinas
2021-04-08 14:18                             ` Catalin Marinas
2021-04-08 18:16                             ` David Hildenbrand
2021-04-08 18:16                               ` David Hildenbrand
2021-04-08 18:16                               ` David Hildenbrand
2021-04-08 18:16                               ` David Hildenbrand
2021-04-08 18:21                               ` Catalin Marinas
2021-04-08 18:21                                 ` Catalin Marinas
2021-04-08 18:21                                 ` Catalin Marinas
2021-04-08 18:21                                 ` Catalin Marinas
2021-03-12 15:18 ` [PATCH v10 3/6] arm64: kvm: Save/restore MTE registers Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-12 15:18   ` Steven Price
2021-03-12 15:19 ` [PATCH v10 4/6] arm64: kvm: Expose KVM_ARM_CAP_MTE Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19 ` [PATCH v10 5/6] KVM: arm64: ioctl to fetch/store tags in a guest Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19 ` [PATCH v10 6/6] KVM: arm64: Document MTE capability and ioctl Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19   ` Steven Price
2021-03-12 15:19   ` Steven Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d545a051-a02a-4c3a-0afe-66612839ba32@redhat.com \
    --to=david@redhat.com \
    --cc=Dave.Martin@arm.com \
    --cc=Haibo.Xu@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=dgilbert@redhat.com \
    --cc=drjones@redhat.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=maz@kernel.org \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=steven.price@arm.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.