From: Zenghui Yu <yuzenghui@huawei.com>
To: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Cc: <zhengxiang9@huawei.com>, <marc.zyngier@arm.com>,
	<christoffer.dall@arm.com>, <catalin.marinas@arm.com>,
	<will.deacon@arm.com>, <james.morse@arm.com>,
	<linux-arm-kernel@lists.infradead.org>,
	<kvmarm@lists.cs.columbia.edu>, <linux-kernel@vger.kernel.org>,
	<wanghaibin.wang@huawei.com>, <lious.lilei@hisilicon.com>,
	<lishuo1@hisilicon.com>
Subject: Re: [RFC] Question about TLB flush while set Stage-2 huge pages
Date: Tue, 19 Mar 2019 17:05:23 +0800	[thread overview]
Message-ID: <25971fd5-3774-3389-a82a-04707480c1e0@huawei.com> (raw)
In-Reply-To: <20190318173405.GA31412@en101>

Hi Suzuki,

On 2019/3/19 1:34, Suzuki K Poulose wrote:
> Hi !
> On Sun, Mar 17, 2019 at 09:34:11PM +0800, Zenghui Yu wrote:
>> Hi Suzuki,
>>
>> ---8<---
>>
>> test: kvm: arm: Maybe two more fixes
>>
>> Applied based on Suzuki's patch.
>>
>> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
>> ---
>>   virt/kvm/arm/mmu.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 05765df..ccd5d5d 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1089,7 +1089,9 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct
>> kvm_mmu_memory_cache
>>   		 * Normal THP split/merge follows mmu_notifier
>>   		 * callbacks and do get handled accordingly.
>>   		 */
>> -			unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);
>> +			addr &= S2_PMD_MASK;
>> +			unmap_stage2_ptes(kvm, pmd, addr, addr + S2_PMD_SIZE);
>> +			get_page(virt_to_page(pmd));
>>   		} else {
>>
>>   			/*
>> @@ -1138,7 +1140,9 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct
>> kvm_mmu_memory_cache *cac
>>   	if (stage2_pud_present(kvm, old_pud)) {
>>   		/* If we have PTE level mapping, unmap the entire range */
>>   		if (WARN_ON_ONCE(!stage2_pud_huge(kvm, old_pud))) {
>> -			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
>> +			addr &= S2_PUD_MASK;
>> +			unmap_stage2_pmds(kvm, pudp, addr, addr + S2_PUD_SIZE);
>> +			get_page(virt_to_page(pudp));
>>   		} else {
>>   			stage2_pud_clear(kvm, pudp);
>>   			kvm_tlb_flush_vmid_ipa(kvm, addr);
> 
> This makes it a bit tricky to follow the code. The other option is to
> do something like :

Yes.

> 
> 
> ---8>---
> 
> kvm: arm: Fix handling of stage2 huge mappings
> 
> We rely on the mmu_notifier callbacks to handle the split/merging
> of huge pages and thus we are guaranteed that while creating a
> block mapping, the entire block is unmapped at stage2. However,
> we miss the case where a block mapping is split for dirty logging
> and could later be turned back into a block mapping if dirty
> logging is cancelled. This not only creates inconsistent TLB
> entries for the pages in the block, but also leaks the table
> pages at PMD level.
> 
> Handle these corner cases for the huge mappings at stage2 by
> unmapping the PTE level mapping. This could potentially release
> the upper level table. So we need to restart the table walk
> once we unmap the range.
> 
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>   virt/kvm/arm/mmu.c | 57 +++++++++++++++++++++++++++++++++++++++---------------
>   1 file changed, 41 insertions(+), 16 deletions(-)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index fce0983..a38a3f1 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1060,25 +1060,41 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>   {
>   	pmd_t *pmd, old_pmd;
>   
> +retry:
>   	pmd = stage2_get_pmd(kvm, cache, addr);
>   	VM_BUG_ON(!pmd);
>   
>   	old_pmd = *pmd;
> +	/*
> +	 * Multiple vcpus faulting on the same PMD entry, can
> +	 * lead to them sequentially updating the PMD with the
> +	 * same value. Following the break-before-make
> +	 * (pmd_clear() followed by tlb_flush()) process can
> +	 * hinder forward progress due to refaults generated
> +	 * on missing translations.
> +	 *
> +	 * Skip updating the page table if the entry is
> +	 * unchanged.
> +	 */
> +	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> +		return 0;
> +
>   	if (pmd_present(old_pmd)) {
>   		/*
> -		 * Multiple vcpus faulting on the same PMD entry, can
> -		 * lead to them sequentially updating the PMD with the
> -		 * same value. Following the break-before-make
> -		 * (pmd_clear() followed by tlb_flush()) process can
> -		 * hinder forward progress due to refaults generated
> -		 * on missing translations.
> -		 *
> -		 * Skip updating the page table if the entry is
> -		 * unchanged.
> +		 * If we already have PTE level mapping for this block,
> +		 * we must unmap it to avoid inconsistent TLB
> +		 * state. We could end up in this situation if
> +		 * the memory slot was marked for dirty logging
> +		 * and was reverted, leaving PTE level mappings
> +		 * for the pages accessed during the period.
> +		 * Normal THP split/merge follows mmu_notifier
> +		 * callbacks and do get handled accordingly.
> +		 * Unmap the PTE level mapping and retry.
>   		 */
> -		if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> -			return 0;
> -
> +		if (!pmd_thp_or_huge(old_pmd)) {
> +			unmap_stage2_range(kvm, (addr & S2_PMD_MASK), S2_PMD_SIZE);
Nit: we can get rid of the parentheses around "addr & S2_PMD_MASK" to
make it look the same as the PUD level (but it is not necessary).
> +			goto retry;
> +		}
>   		/*
>   		 * Mapping in huge pages should only happen through a
>   		 * fault.  If a page is merged into a transparent huge
> @@ -1090,8 +1106,7 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>   		 * should become splitting first, unmapped, merged,
>   		 * and mapped back in on-demand.
>   		 */
> -		VM_BUG_ON(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
> -
> +		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
>   		pmd_clear(pmd);
>   		kvm_tlb_flush_vmid_ipa(kvm, addr);
>   	} else {
> @@ -1107,6 +1122,7 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>   {
>   	pud_t *pudp, old_pud;
>   
> +retry:
>   	pudp = stage2_get_pud(kvm, cache, addr);
>   	VM_BUG_ON(!pudp);
>   
> @@ -1122,8 +1138,17 @@ static int stage2_set_pud_huge(struct kvm *kvm, struct kvm_mmu_memory_cache *cac
>   		return 0;
>   
>   	if (stage2_pud_present(kvm, old_pud)) {
> -		stage2_pud_clear(kvm, pudp);
> -		kvm_tlb_flush_vmid_ipa(kvm, addr);
> +		/*
> +		 * If we already have PTE level mapping, unmap the entire
> +		 * range and retry.
> +		 */
> +		if (!stage2_pud_huge(kvm, old_pud)) {
> +			unmap_stage2_range(kvm, addr & S2_PUD_MASK, S2_PUD_SIZE);
> +			goto retry;
> +		} else {
> +			stage2_pud_clear(kvm, pudp);
> +			kvm_tlb_flush_vmid_ipa(kvm, addr);
> +		}
>   	} else {
>   		get_page(virt_to_page(pudp));
>   	}
> 

It looks much better, and works fine now!


thanks,

zenghui




Thread overview: 58+ messages

2019-03-11 16:31 [RFC] Question about TLB flush while set Stage-2 huge pages Zheng Xiang
2019-03-12 11:32 ` Marc Zyngier
2019-03-12 15:30   ` Zheng Xiang
2019-03-12 18:18     ` Marc Zyngier
2019-03-13  9:45       ` Zheng Xiang
2019-03-14 10:55         ` Suzuki K Poulose
2019-03-14 15:50           ` Zenghui Yu
2019-03-15  8:21             ` Zheng Xiang
2019-03-15 14:56               ` Suzuki K Poulose
2019-03-17 13:34                 ` Zenghui Yu
2019-03-18 17:34                   ` Suzuki K Poulose
2019-03-19  9:05                     ` Zenghui Yu [this message]
2019-03-19 14:11                       ` [PATCH] kvm: arm: Fix handling of stage2 huge mappings Suzuki K Poulose
2019-03-19 16:02                         ` Zenghui Yu
2019-03-20  8:15                         ` Marc Zyngier
2019-03-20  9:44                           ` Suzuki K Poulose
2019-03-20 10:11                             ` Marc Zyngier
2019-03-20 10:23                               ` Suzuki K Poulose
2019-03-20 10:35                                 ` Marc Zyngier
2019-03-20 11:12                                   ` Suzuki K Poulose
2019-03-20 17:24                                     ` Marc Zyngier
2019-03-17 13:55                 ` [RFC] Question about TLB flush while set Stage-2 huge pages Zenghui Yu
