* [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
From: Suzuki K Poulose @ 2017-04-03 14:12 UTC (permalink / raw)
  To: linux-arm-kernel, andreyknvl
  Cc: dvyukov, marc.zyngier, christoffer.dall, kvmarm, kvm,
	linux-kernel, kcc, syzkaller, will.deacon, catalin.marinas,
	pbonzini, mark.rutland, suzuki.poulose, ard.biesheuvel, stable

In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
unmap_stage2_range() on the entire memory range for the guest. This can
race with other callers (e.g., munmap on a memslot) trying to unmap a
range under the lock. And since we now have to unmap the entire guest
memory range while holding a spinlock, make sure we yield the lock if
necessary after we unmap each PUD range.
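
A rough sketch of the racy interleaving before this patch (an
illustration, not an actual trace; the MMU notifier path is one example
of a racing caller):

  CPU0: kvm_free_stage2_pgd()           CPU1: munmap() on a memslot
  ---------------------------           ---------------------------
  unmap_stage2_range(kvm, 0,            [MMU notifier]
                     KVM_PHYS_SIZE)     spin_lock(&kvm->mmu_lock)
    /* kvm->mmu_lock NOT held */        unmap_stage2_range(kvm, gpa, size)
    walks and frees stage-2 tables        /* walks the same tables */
                                        spin_unlock(&kvm->mmu_lock)

Both walkers can operate on the same page-table pages concurrently, so
CPU1 may dereference entries that CPU0 is in the middle of freeing.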

Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Cc: stable@vger.kernel.org # v3.10+
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[ Avoid vCPU starvation and lockup detector warnings ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>

---
Changes since V2:
 - Restrict kvm->mmu_lock relaxation to bigger ranges in unmap_stage2_range(),
   to avoid possible issues like [0]

 [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/498210.html

Changes since V1:
 - Yield the kvm->mmu_lock if necessary in unmap_stage2_range to prevent
   vCPU starvation and lockup detector warnings.
---
 arch/arm/kvm/mmu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 13b9c1f..db94f3a 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -292,8 +292,15 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 	phys_addr_t addr = start, end = start + size;
 	phys_addr_t next;
 
+	assert_spin_locked(&kvm->mmu_lock);
 	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
 	do {
+		/*
+		 * If the range is too large, release the kvm->mmu_lock
+		 * to prevent starvation and lockup detector warnings.
+		 */
+		if (size > S2_PUD_SIZE)
+			cond_resched_lock(&kvm->mmu_lock);
 		next = stage2_pgd_addr_end(addr, end);
 		if (!stage2_pgd_none(*pgd))
 			unmap_stage2_puds(kvm, pgd, addr, next);
@@ -831,7 +838,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 	if (kvm->arch.pgd == NULL)
 		return;
 
+	spin_lock(&kvm->mmu_lock);
 	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+	spin_unlock(&kvm->mmu_lock);
+
 	/* Free the HW pgd, one page at a time */
 	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
 	kvm->arch.pgd = NULL;
-- 
2.7.4

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
From: Mark Rutland @ 2017-04-03 14:22 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, andreyknvl, dvyukov, marc.zyngier,
	christoffer.dall, kvmarm, kvm, linux-kernel, kcc, syzkaller,
	will.deacon, catalin.marinas, pbonzini, ard.biesheuvel, stable

Hi,

On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> unmap_stage2_range() on the entire memory range for the guest. This could
> cause problems with other callers (e.g, munmap on a memslot) trying to
> unmap a range. And since we have to unmap the entire Guest memory range
> holding a spinlock, make sure we yield the lock if necessary, after we
> unmap each PUD range.
> 
> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> Cc: stable@vger.kernel.org # v3.10+
> Cc: Paolo Bonzini <pbonzin@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> [ Avoid vCPU starvation and lockup detector warnings ]
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> ---
> Changes since V2:
>  - Restrict kvm->mmu_lock relaxation to bigger ranges in unmap_stage2_range(),
>    to avoid possible issues like [0]
> 
>  [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/498210.html

Sorry if I'm being thick, but how does restricting this to a larger
range help with the "sleeping function called from invalid context"
issue?

Surely that just makes it rarer?

Thanks,
Mark.

> 
> Changes since V1:
>  - Yield the kvm->mmu_lock if necessary in unmap_stage2_range to prevent
>    vCPU starvation and lockup detector warnings.
> ---
>  arch/arm/kvm/mmu.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 13b9c1f..db94f3a 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,15 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
>  
> +	assert_spin_locked(&kvm->mmu_lock);
>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  	do {
> +		/*
> +		 * If the range is too large, release the kvm->mmu_lock
> +		 * to prevent starvation and lockup detector warnings.
> +		 */
> +		if (size > S2_PUD_SIZE)
> +			cond_resched_lock(&kvm->mmu_lock);
>  		next = stage2_pgd_addr_end(addr, end);
>  		if (!stage2_pgd_none(*pgd))
>  			unmap_stage2_puds(kvm, pgd, addr, next);
> @@ -831,7 +838,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>  	if (kvm->arch.pgd == NULL)
>  		return;
>  
> +	spin_lock(&kvm->mmu_lock);
>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
> +	spin_unlock(&kvm->mmu_lock);
> +
>  	/* Free the HW pgd, one page at a time */
>  	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
>  	kvm->arch.pgd = NULL;
> -- 
> 2.7.4
> 

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
From: Suzuki K Poulose @ 2017-04-03 14:25 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-arm-kernel, andreyknvl, dvyukov, marc.zyngier,
	christoffer.dall, kvmarm, kvm, linux-kernel, kcc, syzkaller,
	will.deacon, catalin.marinas, pbonzini, ard.biesheuvel, stable

On 03/04/17 15:22, Mark Rutland wrote:
> Hi,
>
> On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>> unmap_stage2_range() on the entire memory range for the guest. This could
>> cause problems with other callers (e.g, munmap on a memslot) trying to
>> unmap a range. And since we have to unmap the entire Guest memory range
>> holding a spinlock, make sure we yield the lock if necessary, after we
>> unmap each PUD range.
>>
>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>> Cc: stable@vger.kernel.org # v3.10+
>> Cc: Paolo Bonzini <pbonzin@redhat.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> [ Avoid vCPU starvation and lockup detector warnings ]
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>> ---
>> Changes since V2:
>>  - Restrict kvm->mmu_lock relaxation to bigger ranges in unmap_stage2_range(),
>>    to avoid possible issues like [0]
>>
>>  [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/498210.html
>
> Sorry if I'm being thick, but how does restricting this to a larger
> range help with the "sleeping function called from invalid context"
> issue?
>
> Surely that just makes it rarer?

The issue in [0] arises when we unmap a page at stage 2 while already
holding a different spinlock, and then call cond_resched_lock(), thinking
we might spend too much time holding kvm->mmu_lock. With this patch we
don't try to relax the lock for small ranges, and hence we avoid
cond_resched_lock() in the cases where the operation will finish soon
enough anyway.
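
To make that concrete, here is a minimal sketch of the invalid nesting
(the lock names are illustrative, not from the kernel tree):

#include <linux/spinlock.h>
#include <linux/sched.h>

static DEFINE_SPINLOCK(outer);	/* e.g. the lock taken by try_to_unmap_one() */
static DEFINE_SPINLOCK(inner);	/* e.g. kvm->mmu_lock */

static void bad_nesting(void)
{
	spin_lock(&outer);	/* we are now in atomic context */
	spin_lock(&inner);

	/*
	 * cond_resched_lock(&inner) may drop 'inner' and schedule, but
	 * 'outer' is still held, so this can sleep in atomic context:
	 * "BUG: sleeping function called from invalid context".
	 */
	cond_resched_lock(&inner);

	spin_unlock(&inner);
	spin_unlock(&outer);
}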

Hope that helps.

Suzuki

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
From: Christoffer Dall @ 2017-04-03 14:31 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Suzuki K Poulose, linux-arm-kernel, andreyknvl, dvyukov,
	marc.zyngier, christoffer.dall, kvmarm, kvm, linux-kernel, kcc,
	syzkaller, will.deacon, catalin.marinas, pbonzini,
	ard.biesheuvel, stable

On Mon, Apr 03, 2017 at 03:22:11PM +0100, Mark Rutland wrote:
> Hi,
> 
> On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> > In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> > unmap_stage2_range() on the entire memory range for the guest. This could
> > cause problems with other callers (e.g, munmap on a memslot) trying to
> > unmap a range. And since we have to unmap the entire Guest memory range
> > holding a spinlock, make sure we yield the lock if necessary, after we
> > unmap each PUD range.
> > 
> > Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> > Cc: stable@vger.kernel.org # v3.10+
> > Cc: Paolo Bonzini <pbonzin@redhat.com>
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Cc: Christoffer Dall <christoffer.dall@linaro.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > [ Avoid vCPU starvation and lockup detector warnings ]
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > 
> > ---
> > Changes since V2:
> >  - Restrict kvm->mmu_lock relaxation to bigger ranges in unmap_stage2_range(),
> >    to avoid possible issues like [0]
> > 
> >  [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/498210.html
> 
> Sorry if I'm being thick, but how does restricting this to a larger
> range help with the "sleeping function called from invalid context"
> issue?
> 
> Surely that just makes it rarer?

As far as I can tell, the only problematic path, the one with the extra
lock taken from try_to_unmap_one(), reaches unmap_stage2_range() via the
kvm_unmap_hva() function, which always passes PAGE_SIZE as the size
argument, and PAGE_SIZE is always smaller than S2_PUD_SIZE.
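
For reference, that caller looks roughly like this in arch/arm/kvm/mmu.c
of this era (paraphrased from memory, so treat the details as
approximate):

static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
{
	/* the size passed down is always PAGE_SIZE on this path */
	unmap_stage2_range(kvm, gpa, PAGE_SIZE);
	return 0;
}

int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
{
	unsigned long end = hva + PAGE_SIZE;

	if (!kvm->arch.pgd)
		return 0;

	trace_kvm_unmap_hva(hva);
	/* the MMU notifier caller already holds kvm->mmu_lock */
	handle_hva_to_gpa(kvm, hva, end, &kvm_unmap_hva_handler, NULL);
	return 0;
}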

Did I miss something?

Thanks,
-Christoffer
> 
> > 
> > Changes since V1:
> >  - Yield the kvm->mmu_lock if necessary in unmap_stage2_range to prevent
> >    vCPU starvation and lockup detector warnings.
> > ---
> >  arch/arm/kvm/mmu.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> > 
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index 13b9c1f..db94f3a 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -292,8 +292,15 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> >  	phys_addr_t addr = start, end = start + size;
> >  	phys_addr_t next;
> >  
> > +	assert_spin_locked(&kvm->mmu_lock);
> >  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
> >  	do {
> > +		/*
> > +		 * If the range is too large, release the kvm->mmu_lock
> > +		 * to prevent starvation and lockup detector warnings.
> > +		 */
> > +		if (size > S2_PUD_SIZE)
> > +			cond_resched_lock(&kvm->mmu_lock);
> >  		next = stage2_pgd_addr_end(addr, end);
> >  		if (!stage2_pgd_none(*pgd))
> >  			unmap_stage2_puds(kvm, pgd, addr, next);
> > @@ -831,7 +838,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> >  	if (kvm->arch.pgd == NULL)
> >  		return;
> >  
> > +	spin_lock(&kvm->mmu_lock);
> >  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
> > +	spin_unlock(&kvm->mmu_lock);
> > +
> >  	/* Free the HW pgd, one page at a time */
> >  	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
> >  	kvm->arch.pgd = NULL;
> > -- 
> > 2.7.4
> > 


* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
From: Christoffer Dall @ 2017-04-04 10:13 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, andreyknvl, dvyukov, marc.zyngier,
	christoffer.dall, kvmarm, kvm, linux-kernel, kcc, syzkaller,
	will.deacon, catalin.marinas, pbonzini, mark.rutland,
	ard.biesheuvel, stable

Hi Suzuki,

On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> unmap_stage2_range() on the entire memory range for the guest. This could
> cause problems with other callers (e.g, munmap on a memslot) trying to
> unmap a range. And since we have to unmap the entire Guest memory range
> holding a spinlock, make sure we yield the lock if necessary, after we
> unmap each PUD range.
> 
> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> Cc: stable@vger.kernel.org # v3.10+
> Cc: Paolo Bonzini <pbonzin@redhat.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> [ Avoid vCPU starvation and lockup detector warnings ]
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 

This unfortunately fails to build on 32-bit ARM, and I also think we
intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.

How about adding this to your patch? It also renames S2_PGD_SIZE, which
is horribly confusing: that macro is the size of the first-level stage-2
table itself, whereas S2_PGDIR_SIZE is the size of the address space
mapped by a single entry in that table:

diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
index 460d616..c997f2d 100644
--- a/arch/arm/include/asm/stage2_pgtable.h
+++ b/arch/arm/include/asm/stage2_pgtable.h
@@ -35,10 +35,13 @@
 
 #define stage2_pud_huge(pud)			pud_huge(pud)
 
+#define S2_PGDIR_SIZE				PGDIR_SIZE
+#define S2_PGDIR_MASK				PGDIR_MASK
+
 /* Open coded p*d_addr_end that can deal with 64bit addresses */
 static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end)
 {
-	phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
+	phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
 
 	return (boundary - 1 < end - 1) ? boundary : end;
 }
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..6e79a4c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
 static unsigned long hyp_idmap_end;
 static phys_addr_t hyp_idmap_vector;
 
-#define S2_PGD_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
+#define S2_PGD_TABLE_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
 #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
 
 #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
@@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
 		 * If the range is too large, release the kvm->mmu_lock
 		 * to prevent starvation and lockup detector warnings.
 		 */
-		if (size > S2_PUD_SIZE)
+		if (size > S2_PGDIR_SIZE)
 			cond_resched_lock(&kvm->mmu_lock);
 		next = stage2_pgd_addr_end(addr, end);
 		if (!stage2_pgd_none(*pgd))
@@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 	}
 
 	/* Allocate the HW PGD, making sure that each page gets its own refcount */
-	pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
+	pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
 	if (!pgd)
 		return -ENOMEM;
 
@@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
 	spin_unlock(&kvm->mmu_lock);
 
 	/* Free the HW pgd, one page at a time */
-	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
+	free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
 	kvm->arch.pgd = NULL;
 }
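
As a concrete illustration of the two macros (a sketch, assuming the
40-bit IPA space and LPAE's 30-bit first-level shift used by 32-bit KVM):

/*
 *   PTRS_PER_S2_PGD   = 1 << (40 - 30)       = 1024 entries
 *   S2_PGD_TABLE_SIZE = 1024 * sizeof(pgd_t) = 8KB, the table itself
 *   S2_PGDIR_SIZE     = 1 << 30              = 1GB mapped per entry
 */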
 

Thanks,
-Christoffer

> ---
> Changes since V2:
>  - Restrict kvm->mmu_lock relaxation to bigger ranges in unmap_stage2_range(),
>    to avoid possible issues like [0]
> 
>  [0] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-March/498210.html
> 
> Changes since V1:
>  - Yield the kvm->mmu_lock if necessary in unmap_stage2_range to prevent
>    vCPU starvation and lockup detector warnings.
> ---
>  arch/arm/kvm/mmu.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 13b9c1f..db94f3a 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -292,8 +292,15 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  	phys_addr_t addr = start, end = start + size;
>  	phys_addr_t next;
>  
> +	assert_spin_locked(&kvm->mmu_lock);
>  	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>  	do {
> +		/*
> +		 * If the range is too large, release the kvm->mmu_lock
> +		 * to prevent starvation and lockup detector warnings.
> +		 */
> +		if (size > S2_PUD_SIZE)
> +			cond_resched_lock(&kvm->mmu_lock);
>  		next = stage2_pgd_addr_end(addr, end);
>  		if (!stage2_pgd_none(*pgd))
>  			unmap_stage2_puds(kvm, pgd, addr, next);
> @@ -831,7 +838,10 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>  	if (kvm->arch.pgd == NULL)
>  		return;
>  
> +	spin_lock(&kvm->mmu_lock);
>  	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
> +	spin_unlock(&kvm->mmu_lock);
> +
>  	/* Free the HW pgd, one page at a time */
>  	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
>  	kvm->arch.pgd = NULL;
> -- 
> 2.7.4
> 

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
From: Suzuki K Poulose @ 2017-04-04 10:35 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: linux-arm-kernel, andreyknvl, dvyukov, marc.zyngier,
	christoffer.dall, kvmarm, kvm, linux-kernel, kcc, syzkaller,
	will.deacon, catalin.marinas, pbonzini, mark.rutland,
	ard.biesheuvel, stable

Hi Christoffer,

On 04/04/17 11:13, Christoffer Dall wrote:
> Hi Suzuki,
>
> On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>> unmap_stage2_range() on the entire memory range for the guest. This could
>> cause problems with other callers (e.g, munmap on a memslot) trying to
>> unmap a range. And since we have to unmap the entire Guest memory range
>> holding a spinlock, make sure we yield the lock if necessary, after we
>> unmap each PUD range.
>>
>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>> Cc: stable@vger.kernel.org # v3.10+
>> Cc: Paolo Bonzini <pbonzin@redhat.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> [ Avoid vCPU starvation and lockup detector warnings ]
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>
> This unfortunately fails to build on 32-bit ARM, and I also think we
> intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.

Sorry about that, I didn't test the patch on arm32. I am fine with the
patch below, and I agree that the name change does make things more
readable. See further below for a hunk that I posted in reply to the
kbuild report.

>
> How about adding this to your patch (which includes a rename of
> S2_PGD_SIZE which is horribly confusing as it indicates the size of the
> first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
> of address space mapped by a single entry in the same table):
>
> diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
> index 460d616..c997f2d 100644
> --- a/arch/arm/include/asm/stage2_pgtable.h
> +++ b/arch/arm/include/asm/stage2_pgtable.h
> @@ -35,10 +35,13 @@
>
>  #define stage2_pud_huge(pud)			pud_huge(pud)
>
> +#define S2_PGDIR_SIZE				PGDIR_SIZE
> +#define S2_PGDIR_MASK				PGDIR_MASK
> +
>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>  static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end)
>  {
> -	phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
> +	phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
>
>  	return (boundary - 1 < end - 1) ? boundary : end;
>  }
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index db94f3a..6e79a4c 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
>  static unsigned long hyp_idmap_end;
>  static phys_addr_t hyp_idmap_vector;
>
> -#define S2_PGD_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
> +#define S2_PGD_TABLE_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
>  #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
>
>  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
> @@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  		 * If the range is too large, release the kvm->mmu_lock
>  		 * to prevent starvation and lockup detector warnings.
>  		 */
> -		if (size > S2_PUD_SIZE)
> +		if (size > S2_PGDIR_SIZE)
>  			cond_resched_lock(&kvm->mmu_lock);
>  		next = stage2_pgd_addr_end(addr, end);
>  		if (!stage2_pgd_none(*pgd))
> @@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>  	}
>
>  	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> -	pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
> +	pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
>  	if (!pgd)
>  		return -ENOMEM;
>
> @@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>  	spin_unlock(&kvm->mmu_lock);
>
>  	/* Free the HW pgd, one page at a time */
> -	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
> +	free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
>  	kvm->arch.pgd = NULL;
>  }
>

Btw, I have a different hunk to solve the problem, which I posted in
reply to the kbuild report. I will repeat it here for the sake of
capturing the discussion in one place. The following hunk, applied on top
of the patch, moves the yield to after we have processed each PGDIR
entry: the first time around the loop we haven't done much work with the
lock held yet, so it makes more sense to consider releasing it only after
the first round, when there is still more work left to do.

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..582a972 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
         assert_spin_locked(&kvm->mmu_lock);
         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
         do {
+               next = stage2_pgd_addr_end(addr, end);
+               if (!stage2_pgd_none(*pgd))
+                       unmap_stage2_puds(kvm, pgd, addr, next);
                 /*
                  * If the range is too large, release the kvm->mmu_lock
                  * to prevent starvation and lockup detector warnings.
                  */
-               if (size > S2_PUD_SIZE)
+               if (next != end)
                         cond_resched_lock(&kvm->mmu_lock);
-               next = stage2_pgd_addr_end(addr, end);
-               if (!stage2_pgd_none(*pgd))
-                       unmap_stage2_puds(kvm, pgd, addr, next);
         } while (pgd++, addr = next, addr != end);
  }
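
Note that with this variant the 'next != end' check also skips the yield
on the final (or only) iteration, so a single-entry unmap, such as the
PAGE_SIZE unmaps issued from kvm_unmap_hva(), never reaches
cond_resched_lock() at all.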

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
@ 2017-04-04 10:35     ` Suzuki K Poulose
  0 siblings, 0 replies; 22+ messages in thread
From: Suzuki K Poulose @ 2017-04-04 10:35 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: linux-arm-kernel, kvm, ard.biesheuvel, marc.zyngier, andreyknvl,
	will.deacon, linux-kernel, stable, kcc, syzkaller, dvyukov,
	catalin.marinas, pbonzini, kvmarm

Hi Christoffer,

On 04/04/17 11:13, Christoffer Dall wrote:
> Hi Suzuki,
>
> On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>> unmap_stage2_range() on the entire memory range for the guest. This could
>> cause problems with other callers (e.g, munmap on a memslot) trying to
>> unmap a range. And since we have to unmap the entire Guest memory range
>> holding a spinlock, make sure we yield the lock if necessary, after we
>> unmap each PUD range.
>>
>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>> Cc: stable@vger.kernel.org # v3.10+
>> Cc: Paolo Bonzini <pbonzin@redhat.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> [ Avoid vCPU starvation and lockup detector warnings ]
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>
>
> This unfortunately fails to build on 32-bit ARM, and I also think we
> intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.

Sorry about that, I didn't test the patch with arm32. I am fine the
patch below. And I agree that the name change does make things more
readable. See below for a hunk that I posted to the kbuild report.

>
> How about adding this to your patch (which includes a rename of
> S2_PGD_SIZE which is horribly confusing as it indicates the size of the
> first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
> of address space mapped by a single entry in the same table):
>
> diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
> index 460d616..c997f2d 100644
> --- a/arch/arm/include/asm/stage2_pgtable.h
> +++ b/arch/arm/include/asm/stage2_pgtable.h
> @@ -35,10 +35,13 @@
>
>  #define stage2_pud_huge(pud)			pud_huge(pud)
>
> +#define S2_PGDIR_SIZE				PGDIR_SIZE
> +#define S2_PGDIR_MASK				PGDIR_MASK
> +
>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>  static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end)
>  {
> -	phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
> +	phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
>
>  	return (boundary - 1 < end - 1) ? boundary : end;
>  }
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index db94f3a..6e79a4c 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
>  static unsigned long hyp_idmap_end;
>  static phys_addr_t hyp_idmap_vector;
>
> -#define S2_PGD_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
> +#define S2_PGD_TABLE_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
>  #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
>
>  #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
> @@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>  		 * If the range is too large, release the kvm->mmu_lock
>  		 * to prevent starvation and lockup detector warnings.
>  		 */
> -		if (size > S2_PUD_SIZE)
> +		if (size > S2_PGDIR_SIZE)
>  			cond_resched_lock(&kvm->mmu_lock);
>  		next = stage2_pgd_addr_end(addr, end);
>  		if (!stage2_pgd_none(*pgd))
> @@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>  	}
>
>  	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> -	pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
> +	pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
>  	if (!pgd)
>  		return -ENOMEM;
>
> @@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>  	spin_unlock(&kvm->mmu_lock);
>
>  	/* Free the HW pgd, one page at a time */
> -	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
> +	free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
>  	kvm->arch.pgd = NULL;
>  }
>

Btw, I have a different hunk to solve the problem, posted to the kbuild
report. I will post it here for the sake of capturing the discussion in
one place. The following hunk on top of the patch, changes the lock
release after we process one PGDIR entry. As for the first time
we enter the loop we haven't done much with the lock held, hence it may make
sense to do it after the first round and we have more work to do.

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index db94f3a..582a972 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
         assert_spin_locked(&kvm->mmu_lock);
         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
         do {
+               next = stage2_pgd_addr_end(addr, end);
+               if (!stage2_pgd_none(*pgd))
+                       unmap_stage2_puds(kvm, pgd, addr, next);
                 /*
                  * If the range is too large, release the kvm->mmu_lock
                  * to prevent starvation and lockup detector warnings.
                  */
-               if (size > S2_PUD_SIZE)
+               if (next != end)
                         cond_resched_lock(&kvm->mmu_lock);
-               next = stage2_pgd_addr_end(addr, end);
-               if (!stage2_pgd_none(*pgd))
-                       unmap_stage2_puds(kvm, pgd, addr, next);
         } while (pgd++, addr = next, addr != end);
  }

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
  2017-04-04 10:35     ` Suzuki K Poulose
@ 2017-04-04 12:29       ` Christoffer Dall
  -1 siblings, 0 replies; 22+ messages in thread
From: Christoffer Dall @ 2017-04-04 12:29 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, andreyknvl, dvyukov, marc.zyngier,
	christoffer.dall, kvmarm, kvm, linux-kernel, kcc, syzkaller,
	will.deacon, catalin.marinas, pbonzini, mark.rutland,
	ard.biesheuvel, stable

On Tue, Apr 04, 2017 at 11:35:35AM +0100, Suzuki K Poulose wrote:
> Hi Christoffer,
> 
> On 04/04/17 11:13, Christoffer Dall wrote:
> >Hi Suzuki,
> >
> >On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> >>In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> >>unmap_stage2_range() on the entire memory range for the guest. This could
> >>cause problems with other callers (e.g, munmap on a memslot) trying to
> >>unmap a range. And since we have to unmap the entire Guest memory range
> >>holding a spinlock, make sure we yield the lock if necessary, after we
> >>unmap each PUD range.
> >>
> >>Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> >>Cc: stable@vger.kernel.org # v3.10+
> >>Cc: Paolo Bonzini <pbonzin@redhat.com>
> >>Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>Cc: Christoffer Dall <christoffer.dall@linaro.org>
> >>Cc: Mark Rutland <mark.rutland@arm.com>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>[ Avoid vCPU starvation and lockup detector warnings ]
> >>Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >>Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> >>
> >
> >This unfortunately fails to build on 32-bit ARM, and I also think we
> >intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.
> 
> Sorry about that, I didn't test the patch on arm32. I am fine with the
> patch below. And I agree that the name change does make things more
> readable. See below for a hunk that I posted to the kbuild report.
> 
> >
> >How about adding this to your patch (which includes a rename of
> >S2_PGD_SIZE which is horribly confusing as it indicates the size of the
> >first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
> >of address space mapped by a single entry in the same table):
> >
> >diff --git a/arch/arm/include/asm/stage2_pgtable.h b/arch/arm/include/asm/stage2_pgtable.h
> >index 460d616..c997f2d 100644
> >--- a/arch/arm/include/asm/stage2_pgtable.h
> >+++ b/arch/arm/include/asm/stage2_pgtable.h
> >@@ -35,10 +35,13 @@
> >
> > #define stage2_pud_huge(pud)			pud_huge(pud)
> >
> >+#define S2_PGDIR_SIZE				PGDIR_SIZE
> >+#define S2_PGDIR_MASK				PGDIR_MASK
> >+
> > /* Open coded p*d_addr_end that can deal with 64bit addresses */
> > static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr, phys_addr_t end)
> > {
> >-	phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
> >+	phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
> >
> > 	return (boundary - 1 < end - 1) ? boundary : end;
> > }
> >diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >index db94f3a..6e79a4c 100644
> >--- a/arch/arm/kvm/mmu.c
> >+++ b/arch/arm/kvm/mmu.c
> >@@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
> > static unsigned long hyp_idmap_end;
> > static phys_addr_t hyp_idmap_vector;
> >
> >-#define S2_PGD_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
> >+#define S2_PGD_TABLE_SIZE	(PTRS_PER_S2_PGD * sizeof(pgd_t))
> > #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
> >
> > #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
> >@@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
> > 		 * If the range is too large, release the kvm->mmu_lock
> > 		 * to prevent starvation and lockup detector warnings.
> > 		 */
> >-		if (size > S2_PUD_SIZE)
> >+		if (size > S2_PGDIR_SIZE)
> > 			cond_resched_lock(&kvm->mmu_lock);
> > 		next = stage2_pgd_addr_end(addr, end);
> > 		if (!stage2_pgd_none(*pgd))
> >@@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
> > 	}
> >
> > 	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> >-	pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
> >+	pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
> > 	if (!pgd)
> > 		return -ENOMEM;
> >
> >@@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> > 	spin_unlock(&kvm->mmu_lock);
> >
> > 	/* Free the HW pgd, one page at a time */
> >-	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
> >+	free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
> > 	kvm->arch.pgd = NULL;
> > }
> >
> 
> Btw, I have a different hunk to solve the problem, posted in reply to the
> kbuild report. I will repost it here for the sake of capturing the
> discussion in one place. The following hunk, on top of the patch, moves
> the lock release to after we have processed one PGDIR entry. Since we
> haven't done much with the lock held the first time we enter the loop, it
> makes more sense to yield only after the first round, when there is more
> work left to do.
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index db94f3a..582a972 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, u64 size)
>         assert_spin_locked(&kvm->mmu_lock);
>         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>         do {
> +               next = stage2_pgd_addr_end(addr, end);
> +               if (!stage2_pgd_none(*pgd))
> +                       unmap_stage2_puds(kvm, pgd, addr, next);
>                 /*
>                  * If the range is too large, release the kvm->mmu_lock
>                  * to prevent starvation and lockup detector warnings.
>                  */
> -               if (size > S2_PUD_SIZE)
> +               if (next != end)
>                         cond_resched_lock(&kvm->mmu_lock);
> -               next = stage2_pgd_addr_end(addr, end);
> -               if (!stage2_pgd_none(*pgd))
> -                       unmap_stage2_puds(kvm, pgd, addr, next);
>         } while (pgd++, addr = next, addr != end);
>  }
> 
> 

I like your change; let me fix that up, and we can always do the rename
trick later.
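
To put numbers on the distinction (illustrative values only -- they assume
a 40-bit IPA space, 4K pages and 8-byte pgd entries, and the real values
depend on the configuration):

	#include <stdio.h>

	/* Hypothetical stage-2 geometry: 40-bit IPA, one pgd entry maps 1GB */
	#define IPA_BITS		40
	#define S2_PGDIR_SHIFT		30
	#define PTRS_PER_S2_PGD		(1UL << (IPA_BITS - S2_PGDIR_SHIFT))

	int main(void)
	{
		/* Guest address space covered by ONE first-level entry */
		printf("S2_PGDIR_SIZE     = %lu MB\n",
		       (1UL << S2_PGDIR_SHIFT) >> 20);
		/* Memory occupied by the first-level table itself */
		printf("S2_PGD_TABLE_SIZE = %lu bytes\n",
		       PTRS_PER_S2_PGD * sizeof(unsigned long));
		return 0;
	}

One entry spans a gigabyte of guest address space while the table itself is
only 8KB, so a single name for both sizes really was asking for trouble.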

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
  2017-04-04 10:35     ` Suzuki K Poulose
@ 2017-04-22  0:28       ` Alexander Graf
  -1 siblings, 0 replies; 22+ messages in thread
From: Alexander Graf @ 2017-04-22  0:28 UTC (permalink / raw)
  To: Suzuki K Poulose, Christoffer Dall
  Cc: linux-arm-kernel, kvm, ard.biesheuvel, marc.zyngier, andreyknvl,
	will.deacon, linux-kernel, stable, kcc, syzkaller, dvyukov,
	catalin.marinas, pbonzini, kvmarm



On 04.04.17 12:35, Suzuki K Poulose wrote:
> Hi Christoffer,
>
> On 04/04/17 11:13, Christoffer Dall wrote:
>> Hi Suzuki,
>>
>> On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
>>> In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
>>> unmap_stage2_range() on the entire memory range for the guest. This
>>> could
>>> cause problems with other callers (e.g, munmap on a memslot) trying to
>>> unmap a range. And since we have to unmap the entire Guest memory range
>>> holding a spinlock, make sure we yield the lock if necessary, after we
>>> unmap each PUD range.
>>>
>>> Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
>>> Cc: stable@vger.kernel.org # v3.10+
>>> Cc: Paolo Bonzini <pbonzin@redhat.com>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> [ Avoid vCPU starvation and lockup detector warnings ]
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>>
>>
>> This unfortunately fails to build on 32-bit ARM, and I also think we
>> intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.
>
> Sorry about that, I didn't test the patch on arm32. I am fine with the
> patch below. And I agree that the name change does make things more
> readable. See below for a hunk that I posted to the kbuild report.
>
>>
>> How about adding this to your patch (which includes a rename of
>> S2_PGD_SIZE which is horribly confusing as it indicates the size of the
>> first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
>> of address space mapped by a single entry in the same table):
>>
>> diff --git a/arch/arm/include/asm/stage2_pgtable.h
>> b/arch/arm/include/asm/stage2_pgtable.h
>> index 460d616..c997f2d 100644
>> --- a/arch/arm/include/asm/stage2_pgtable.h
>> +++ b/arch/arm/include/asm/stage2_pgtable.h
>> @@ -35,10 +35,13 @@
>>
>>  #define stage2_pud_huge(pud)            pud_huge(pud)
>>
>> +#define S2_PGDIR_SIZE                PGDIR_SIZE
>> +#define S2_PGDIR_MASK                PGDIR_MASK
>> +
>>  /* Open coded p*d_addr_end that can deal with 64bit addresses */
>>  static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr,
>> phys_addr_t end)
>>  {
>> -    phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
>> +    phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
>>
>>      return (boundary - 1 < end - 1) ? boundary : end;
>>  }
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index db94f3a..6e79a4c 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
>>  static unsigned long hyp_idmap_end;
>>  static phys_addr_t hyp_idmap_vector;
>>
>> -#define S2_PGD_SIZE    (PTRS_PER_S2_PGD * sizeof(pgd_t))
>> +#define S2_PGD_TABLE_SIZE    (PTRS_PER_S2_PGD * sizeof(pgd_t))
>>  #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
>>
>>  #define KVM_S2PTE_FLAG_IS_IOMAP        (1UL << 0)
>> @@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm,
>> phys_addr_t start, u64 size)
>>           * If the range is too large, release the kvm->mmu_lock
>>           * to prevent starvation and lockup detector warnings.
>>           */
>> -        if (size > S2_PUD_SIZE)
>> +        if (size > S2_PGDIR_SIZE)
>>              cond_resched_lock(&kvm->mmu_lock);
>>          next = stage2_pgd_addr_end(addr, end);
>>          if (!stage2_pgd_none(*pgd))
>> @@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
>>      }
>>
>>      /* Allocate the HW PGD, making sure that each page gets its own
>> refcount */
>> -    pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
>> +    pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
>>      if (!pgd)
>>          return -ENOMEM;
>>
>> @@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
>>      spin_unlock(&kvm->mmu_lock);
>>
>>      /* Free the HW pgd, one page at a time */
>> -    free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
>> +    free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
>>      kvm->arch.pgd = NULL;
>>  }
>>
>
> Btw, I have a different hunk to solve the problem, posted in reply to the
> kbuild report. I will repost it here for the sake of capturing the
> discussion in one place. The following hunk, on top of the patch, moves
> the lock release to after we have processed one PGDIR entry. Since we
> haven't done much with the lock held the first time we enter the loop, it
> makes more sense to yield only after the first round, when there is more
> work left to do.
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index db94f3a..582a972 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm,
> phys_addr_t start, u64 size)
>         assert_spin_locked(&kvm->mmu_lock);
>         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
>         do {
> +               next = stage2_pgd_addr_end(addr, end);
> +               if (!stage2_pgd_none(*pgd))

Just as a heads-up, I had this version applied to my tree by accident
(commit 8b3405e345b5a098101b0c31b264c812bba045d9 from Christoffer's
queue) and ran into a NULL pointer dereference:

[223090.242280] Unable to handle kernel NULL pointer dereference at virtual address 00000040
[223090.262330] PC is at unmap_stage2_range+0x8c/0x428
[223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
[223090.262531] Call trace:
[223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428
[223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c
[223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104
[223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc
[223090.262543] [<ffff0000080a2478>] kvm_mmu_notifier_invalidate_page+0x50/0x8c
[223090.262547] [<ffff0000082274f8>] __mmu_notifier_invalidate_page+0x5c/0x84
[223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0
[223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0
[223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4
[223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac
[223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac
[223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0
[223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290
[223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac
[223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4
[223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254
[223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538
[223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4
[223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c
[223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8
[223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c
[223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c
[223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774
[223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac
[223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648
[223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768
[223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4
[223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4
[223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c

0xffff0000080adb78 is in unmap_stage2_range (../arch/arm/kvm/mmu.c:260).
255		pud_t *pud, *start_pud;
256
257		start_pud = pud = stage2_pud_offset(pgd, addr);
258		do {
259			next = stage2_pud_addr_end(addr, end);
260			if (!stage2_pud_none(*pud)) {
261				if (stage2_pud_huge(*pud)) {
262					pud_t old_pud = *pud;
263
264					stage2_pud_clear(pud);


So please beware that, for some reason, the pud may become invalid after
rescheduling.
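
The faulting address 00000040 looks to me like a NULL base plus a small
table offset. A sketch of the arithmetic I have in mind -- purely a guess,
under the assumption that kvm->arch.pgd had already been cleared (or freed)
by the time the walker ran:

	/*
	 * Hypothetical reconstruction, not a verified analysis:
	 *
	 *	pgd = kvm->arch.pgd + stage2_pgd_index(addr);
	 *
	 * With kvm->arch.pgd == NULL and 8-byte pgd entries, an index of
	 * 8 puts pgd at (pgd_t *)0x40, so the first dereference of *pgd
	 * (or of the pud derived from it where the levels are folded)
	 * faults at virtual address 0x40.
	 */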


Alex

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
  2017-04-22  0:28       ` Alexander Graf
@ 2017-04-24  9:42         ` Suzuki K Poulose
  -1 siblings, 0 replies; 22+ messages in thread
From: Suzuki K Poulose @ 2017-04-24  9:42 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Christoffer Dall, linux-arm-kernel, kvm, ard.biesheuvel,
	marc.zyngier, andreyknvl, will.deacon, linux-kernel, stable, kcc,
	syzkaller, dvyukov, catalin.marinas, pbonzini, kvmarm

On Sat, Apr 22, 2017 at 02:28:44AM +0200, Alexander Graf wrote:
> 
> 
> On 04.04.17 12:35, Suzuki K Poulose wrote:
> > Hi Christoffer,
> > 
> > On 04/04/17 11:13, Christoffer Dall wrote:
> > > Hi Suzuki,
> > > 
> > > On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> > > > In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> > > > unmap_stage2_range() on the entire memory range for the guest. This
> > > > could
> > > > cause problems with other callers (e.g, munmap on a memslot) trying to
> > > > unmap a range. And since we have to unmap the entire Guest memory range
> > > > holding a spinlock, make sure we yield the lock if necessary, after we
> > > > unmap each PUD range.
> > > > 
> > > > Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> > > > Cc: stable@vger.kernel.org # v3.10+
> > > > Cc: Paolo Bonzini <pbonzin@redhat.com>
> > > > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > > > Cc: Christoffer Dall <christoffer.dall@linaro.org>
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > [ Avoid vCPU starvation and lockup detector warnings ]
> > > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > 
> > > 
> > > This unfortunately fails to build on 32-bit ARM, and I also think we
> > > intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.
> > 
> > Sorry about that, I didn't test the patch on arm32. I am fine with the
> > patch below. And I agree that the name change does make things more
> > readable. See below for a hunk that I posted to the kbuild report.
> > 
> > > 
> > > How about adding this to your patch (which includes a rename of
> > > S2_PGD_SIZE which is horribly confusing as it indicates the size of the
> > > first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
> > > of address space mapped by a single entry in the same table):
> > > 
> > > diff --git a/arch/arm/include/asm/stage2_pgtable.h
> > > b/arch/arm/include/asm/stage2_pgtable.h
> > > index 460d616..c997f2d 100644
> > > --- a/arch/arm/include/asm/stage2_pgtable.h
> > > +++ b/arch/arm/include/asm/stage2_pgtable.h
> > > @@ -35,10 +35,13 @@
> > > 
> > >  #define stage2_pud_huge(pud)            pud_huge(pud)
> > > 
> > > +#define S2_PGDIR_SIZE                PGDIR_SIZE
> > > +#define S2_PGDIR_MASK                PGDIR_MASK
> > > +
> > >  /* Open coded p*d_addr_end that can deal with 64bit addresses */
> > >  static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr,
> > > phys_addr_t end)
> > >  {
> > > -    phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
> > > +    phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
> > > 
> > >      return (boundary - 1 < end - 1) ? boundary : end;
> > >  }
> > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > > index db94f3a..6e79a4c 100644
> > > --- a/arch/arm/kvm/mmu.c
> > > +++ b/arch/arm/kvm/mmu.c
> > > @@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
> > >  static unsigned long hyp_idmap_end;
> > >  static phys_addr_t hyp_idmap_vector;
> > > 
> > > -#define S2_PGD_SIZE    (PTRS_PER_S2_PGD * sizeof(pgd_t))
> > > +#define S2_PGD_TABLE_SIZE    (PTRS_PER_S2_PGD * sizeof(pgd_t))
> > >  #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
> > > 
> > >  #define KVM_S2PTE_FLAG_IS_IOMAP        (1UL << 0)
> > > @@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm,
> > > phys_addr_t start, u64 size)
> > >           * If the range is too large, release the kvm->mmu_lock
> > >           * to prevent starvation and lockup detector warnings.
> > >           */
> > > -        if (size > S2_PUD_SIZE)
> > > +        if (size > S2_PGDIR_SIZE)
> > >              cond_resched_lock(&kvm->mmu_lock);
> > >          next = stage2_pgd_addr_end(addr, end);
> > >          if (!stage2_pgd_none(*pgd))
> > > @@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
> > >      }
> > > 
> > >      /* Allocate the HW PGD, making sure that each page gets its own
> > > refcount */
> > > -    pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
> > > +    pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
> > >      if (!pgd)
> > >          return -ENOMEM;
> > > 
> > > @@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> > >      spin_unlock(&kvm->mmu_lock);
> > > 
> > >      /* Free the HW pgd, one page at a time */
> > > -    free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
> > > +    free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
> > >      kvm->arch.pgd = NULL;
> > >  }
> > > 
> > 
> > Btw, I have a different hunk to solve the problem, posted in reply to
> > the kbuild report. I will repost it here for the sake of capturing the
> > discussion in one place. The following hunk, on top of the patch, moves
> > the lock release to after we have processed one PGDIR entry. Since we
> > haven't done much with the lock held the first time we enter the loop,
> > it makes more sense to yield only after the first round, when there is
> > more work left to do.
> > 
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index db94f3a..582a972 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm,
> > phys_addr_t start, u64 size)
> >         assert_spin_locked(&kvm->mmu_lock);
> >         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
> >         do {
> > +               next = stage2_pgd_addr_end(addr, end);
> > +               if (!stage2_pgd_none(*pgd))
> 
> Just as a heads-up, I had this version applied to my tree by accident (commit
> 8b3405e345b5a098101b0c31b264c812bba045d9 from Christoffer's queue) and ran
> into a NULL pointer dereference:
> 
> [223090.242280] Unable to handle kernel NULL pointer dereference at virtual address 00000040
> [223090.262330] PC is at unmap_stage2_range+0x8c/0x428
> [223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
> [223090.262531] Call trace:
> [223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428
> [223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c
> [223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104
> [223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc
> [223090.262543] [<ffff0000080a2478>] kvm_mmu_notifier_invalidate_page+0x50/0x8c
> [223090.262547] [<ffff0000082274f8>] __mmu_notifier_invalidate_page+0x5c/0x84
> [223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0
> [223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0
> [223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4
> [223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac
> [223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac
> [223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0
> [223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290
> [223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac
> [223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4
> [223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254
> [223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538
> [223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4
> [223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c
> [223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8
> [223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c
> [223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c
> [223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774
> [223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac
> [223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648
> [223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768
> [223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4
> [223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4
> [223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c
> 
> 0xffff0000080adb78 is in unmap_stage2_range (../arch/arm/kvm/mmu.c:260).
> 255		pud_t *pud, *start_pud;
> 256
> 257		start_pud = pud = stage2_pud_offset(pgd, addr);
> 258		do {
> 259			next = stage2_pud_addr_end(addr, end);
> 260			if (!stage2_pud_none(*pud)) {
> 261				if (stage2_pud_huge(*pud)) {
> 262					pud_t old_pud = *pud;
> 263
> 264					stage2_pud_clear(pud);
> 
> 
> So please beware that, for some reason, the pud may become invalid after
> rescheduling.

Alex,

Thanks for the report. The patch below should fix this one.

---8>---

kvm: arm/arm64: Fix race in resetting stage2 PGD

In kvm_free_stage2_pgd() we check the stage2 PGD before taking the
lock, and proceed only if it is valid. We then unmap the page tables
and release the lock, but reset the PGD only after dropping the lock.
This leaves a window in which another thread, waiting on the lock,
can see a PGD that still looks valid and proceed with a stage2
operation on tables that are being freed.

This patch moves the stage2 PGD manipulation under the lock.
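
A plausible interleaving behind the crash (a sketch, not a verified trace):

	CPU0: kvm_free_stage2_pgd()           CPU1: stage2 walker, e.g.
	                                            the kvm_unmap_hva() path
	-----------------------------------   -----------------------------------
	kvm->arch.pgd != NULL, so proceed
	spin_lock(&kvm->mmu_lock)
	unmap_stage2_range(0, KVM_PHYS_SIZE)
	spin_unlock(&kvm->mmu_lock)
	                                      spin_lock(&kvm->mmu_lock)
	free_pages_exact(kvm->arch.pgd, ...)  walks kvm->arch.pgd, which is
	kvm->arch.pgd = NULL                  freed or just reset -> crash

With this patch the PGD pointer is detached and reset under the lock, and
only the detached (now unreachable) table is freed outside it, so a
concurrent walker can no longer observe a valid-looking PGD whose tables
are in the middle of being freed.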

Reported-by: Alexander Graf <agraf@suse.de>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kvm/mmu.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 582a972..9c4026d 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -835,16 +835,18 @@ void stage2_unmap_vm(struct kvm *kvm)
  */
 void kvm_free_stage2_pgd(struct kvm *kvm)
 {
-	if (kvm->arch.pgd == NULL)
-		return;
+	void *pgd = NULL;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+	if (kvm->arch.pgd) {
+		unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+		pgd = kvm->arch.pgd;
+		kvm->arch.pgd = NULL;
+	}
 	spin_unlock(&kvm->mmu_lock);
-
 	/* Free the HW pgd, one page at a time */
-	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
-	kvm->arch.pgd = NULL;
+	if (pgd)
+		free_pages_exact(pgd, S2_PGD_SIZE);
 }
 
 static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
@ 2017-04-24  9:42         ` Suzuki K Poulose
  0 siblings, 0 replies; 22+ messages in thread
From: Suzuki K Poulose @ 2017-04-24  9:42 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, Apr 22, 2017 at 02:28:44AM +0200, Alexander Graf wrote:
> 
> 
> On 04.04.17 12:35, Suzuki K Poulose wrote:
> > Hi Christoffer,
> > 
> > On 04/04/17 11:13, Christoffer Dall wrote:
> > > Hi Suzuki,
> > > 
> > > On Mon, Apr 03, 2017 at 03:12:43PM +0100, Suzuki K Poulose wrote:
> > > > In kvm_free_stage2_pgd() we don't hold the kvm->mmu_lock while calling
> > > > unmap_stage2_range() on the entire memory range for the guest. This
> > > > could
> > > > cause problems with other callers (e.g, munmap on a memslot) trying to
> > > > unmap a range. And since we have to unmap the entire Guest memory range
> > > > holding a spinlock, make sure we yield the lock if necessary, after we
> > > > unmap each PUD range.
> > > > 
> > > > Fixes: commit d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> > > > Cc: stable at vger.kernel.org # v3.10+
> > > > Cc: Paolo Bonzini <pbonzin@redhat.com>
> > > > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > > > Cc: Christoffer Dall <christoffer.dall@linaro.org>
> > > > Cc: Mark Rutland <mark.rutland@arm.com>
> > > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > [ Avoid vCPU starvation and lockup detector warnings ]
> > > > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > > > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> > > > 
> > > 
> > > This unfortunately fails to build on 32-bit ARM, and I also think we
> > > intended to check against S2_PGDIR_SIZE, not S2_PUD_SIZE.
> > 
> > Sorry about that, I didn't test the patch with arm32. I am fine the
> > patch below. And I agree that the name change does make things more
> > readable. See below for a hunk that I posted to the kbuild report.
> > 
> > > 
> > > How about adding this to your patch (which includes a rename of
> > > S2_PGD_SIZE which is horribly confusing as it indicates the size of the
> > > first level stage-2 table itself, where S2_PGDIR_SIZE indicates the size
> > > of address space mapped by a single entry in the same table):
> > > 
> > > diff --git a/arch/arm/include/asm/stage2_pgtable.h
> > > b/arch/arm/include/asm/stage2_pgtable.h
> > > index 460d616..c997f2d 100644
> > > --- a/arch/arm/include/asm/stage2_pgtable.h
> > > +++ b/arch/arm/include/asm/stage2_pgtable.h
> > > @@ -35,10 +35,13 @@
> > > 
> > >  #define stage2_pud_huge(pud)            pud_huge(pud)
> > > 
> > > +#define S2_PGDIR_SIZE                PGDIR_SIZE
> > > +#define S2_PGDIR_MASK                PGDIR_MASK
> > > +
> > >  /* Open coded p*d_addr_end that can deal with 64bit addresses */
> > >  static inline phys_addr_t stage2_pgd_addr_end(phys_addr_t addr,
> > > phys_addr_t end)
> > >  {
> > > -    phys_addr_t boundary = (addr + PGDIR_SIZE) & PGDIR_MASK;
> > > +    phys_addr_t boundary = (addr + S2_PGDIR_SIZE) & S2_PGDIR_MASK;
> > > 
> > >      return (boundary - 1 < end - 1) ? boundary : end;
> > >  }
> > > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > > index db94f3a..6e79a4c 100644
> > > --- a/arch/arm/kvm/mmu.c
> > > +++ b/arch/arm/kvm/mmu.c
> > > @@ -41,7 +41,7 @@ static unsigned long hyp_idmap_start;
> > >  static unsigned long hyp_idmap_end;
> > >  static phys_addr_t hyp_idmap_vector;
> > > 
> > > -#define S2_PGD_SIZE    (PTRS_PER_S2_PGD * sizeof(pgd_t))
> > > +#define S2_PGD_TABLE_SIZE    (PTRS_PER_S2_PGD * sizeof(pgd_t))
> > >  #define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
> > > 
> > >  #define KVM_S2PTE_FLAG_IS_IOMAP        (1UL << 0)
> > > @@ -299,7 +299,7 @@ static void unmap_stage2_range(struct kvm *kvm,
> > > phys_addr_t start, u64 size)
> > >           * If the range is too large, release the kvm->mmu_lock
> > >           * to prevent starvation and lockup detector warnings.
> > >           */
> > > -        if (size > S2_PUD_SIZE)
> > > +        if (size > S2_PGDIR_SIZE)
> > >              cond_resched_lock(&kvm->mmu_lock);
> > >          next = stage2_pgd_addr_end(addr, end);
> > >          if (!stage2_pgd_none(*pgd))
> > > @@ -747,7 +747,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
> > >      }
> > > 
> > >      /* Allocate the HW PGD, making sure that each page gets its own
> > > refcount */
> > > -    pgd = alloc_pages_exact(S2_PGD_SIZE, GFP_KERNEL | __GFP_ZERO);
> > > +    pgd = alloc_pages_exact(S2_PGD_TABLE_SIZE, GFP_KERNEL | __GFP_ZERO);
> > >      if (!pgd)
> > >          return -ENOMEM;
> > > 
> > > @@ -843,7 +843,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
> > >      spin_unlock(&kvm->mmu_lock);
> > > 
> > >      /* Free the HW pgd, one page at a time */
> > > -    free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
> > > +    free_pages_exact(kvm->arch.pgd, S2_PGD_TABLE_SIZE);
> > >      kvm->arch.pgd = NULL;
> > >  }
> > > 
> > 
> > Btw, I have a different hunk to solve the problem, posted to the kbuild
> > report. I will post it here for the sake of capturing the discussion in
> > one place. The following hunk on top of the patch, changes the lock
> > release after we process one PGDIR entry. As for the first time
> > we enter the loop we haven't done much with the lock held, hence it may
> > make
> > sense to do it after the first round and we have more work to do.
> > 
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index db94f3a..582a972 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -295,15 +295,15 @@ static void unmap_stage2_range(struct kvm *kvm,
> > phys_addr_t start, u64 size)
> >         assert_spin_locked(&kvm->mmu_lock);
> >         pgd = kvm->arch.pgd + stage2_pgd_index(addr);
> >         do {
> > +               next = stage2_pgd_addr_end(addr, end);
> > +               if (!stage2_pgd_none(*pgd))
> 
> Just as heads up, I had this version applied to my tree by accident (commit
> 8b3405e345b5a098101b0c31b264c812bba045d9 from Christoffer's queue) and ran
> into a NULL pointer dereference:
> 
> [223090.242280] Unable to handle kernel NULL pointer dereference at virtual
> address 00000040
> [223090.262330] PC is at unmap_stage2_range+0x8c/0x428
> [223090.262332] LR is at kvm_unmap_hva_handler+0x2c/0x3c
> [223090.262531] Call trace:
> [223090.262533] [<ffff0000080adb78>] unmap_stage2_range+0x8c/0x428
> [223090.262535] [<ffff0000080adf40>] kvm_unmap_hva_handler+0x2c/0x3c
> [223090.262537] [<ffff0000080ace2c>] handle_hva_to_gpa+0xb0/0x104
> [223090.262539] [<ffff0000080af988>] kvm_unmap_hva+0x5c/0xbc
> [223090.262543] [<ffff0000080a2478>]
> kvm_mmu_notifier_invalidate_page+0x50/0x8c
> [223090.262547] [<ffff0000082274f8>]
> __mmu_notifier_invalidate_page+0x5c/0x84
> [223090.262551] [<ffff00000820b700>] try_to_unmap_one+0x1d0/0x4a0
> [223090.262553] [<ffff00000820c5c8>] rmap_walk+0x1cc/0x2e0
> [223090.262555] [<ffff00000820c90c>] try_to_unmap+0x74/0xa4
> [223090.262557] [<ffff000008230ce4>] migrate_pages+0x31c/0x5ac
> [223090.262561] [<ffff0000081f869c>] compact_zone+0x3fc/0x7ac
> [223090.262563] [<ffff0000081f8ae0>] compact_zone_order+0x94/0xb0
> [223090.262564] [<ffff0000081f91c0>] try_to_compact_pages+0x108/0x290
> [223090.262569] [<ffff0000081d5108>] __alloc_pages_direct_compact+0x70/0x1ac
> [223090.262571] [<ffff0000081d64a0>] __alloc_pages_nodemask+0x434/0x9f4
> [223090.262572] [<ffff0000082256f0>] alloc_pages_vma+0x230/0x254
> [223090.262574] [<ffff000008235e5c>] do_huge_pmd_anonymous_page+0x114/0x538
> [223090.262576] [<ffff000008201bec>] handle_mm_fault+0xd40/0x17a4
> [223090.262577] [<ffff0000081fb324>] __get_user_pages+0x12c/0x36c
> [223090.262578] [<ffff0000081fb804>] get_user_pages_unlocked+0xa4/0x1b8
> [223090.262579] [<ffff0000080a3ce8>] __gfn_to_pfn_memslot+0x280/0x31c
> [223090.262580] [<ffff0000080a3dd0>] gfn_to_pfn_prot+0x4c/0x5c
> [223090.262582] [<ffff0000080af3f8>] kvm_handle_guest_abort+0x240/0x774
> [223090.262584] [<ffff0000080b2bac>] handle_exit+0x11c/0x1ac
> [223090.262586] [<ffff0000080ab99c>] kvm_arch_vcpu_ioctl_run+0x31c/0x648
> [223090.262587] [<ffff0000080a1d78>] kvm_vcpu_ioctl+0x378/0x768
> [223090.262590] [<ffff00000825df5c>] do_vfs_ioctl+0x324/0x5a4
> [223090.262591] [<ffff00000825e26c>] SyS_ioctl+0x90/0xa4
> [223090.262595] [<ffff000008085d84>] el0_svc_naked+0x38/0x3c
> 
> 0xffff0000080adb78 is in unmap_stage2_range (../arch/arm/kvm/mmu.c:260).
> 255		pud_t *pud, *start_pud;
> 256
> 257		start_pud = pud = stage2_pud_offset(pgd, addr);
> 258		do {
> 259			next = stage2_pud_addr_end(addr, end);
> 260			if (!stage2_pud_none(*pud)) {
> 261				if (stage2_pud_huge(*pud)) {
> 262					pud_t old_pud = *pud;
> 263
> 264					stage2_pud_clear(pud);
> 
> 
> So please be aware that, for some reason, the pud may become invalid
> after rescheduling.

Alex,

Thanks for the report. The patch below should fix this one.
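
The suspected interleaving, reconstructed from the trace above (a sketch;
the function names match the kernel code, but the exact sequence is an
assumption): cond_resched_lock() drops kvm->mmu_lock in the middle of the
page table walk, another thread tears down the stage2 tables in that
window, and the walker then dereferences a pointer into freed memory.

	/*
	 * CPU0: unmap_stage2_range()          CPU1: stage2 teardown
	 * --------------------------          ---------------------
	 * spin_lock(&kvm->mmu_lock)
	 * pud = stage2_pud_offset(pgd, addr)
	 * cond_resched_lock(&kvm->mmu_lock)
	 *   ... lock dropped ...               spin_lock(&kvm->mmu_lock)
	 *                                      unmap and free page tables
	 *                                      spin_unlock(&kvm->mmu_lock)
	 *   ... lock retaken ...
	 * stage2_pud_none(*pud)     <-- pud now points into freed memory
	 */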

---8<---

kvm: arm/arm64: Fix race in resetting stage2 PGD

In kvm_free_stage2_pgd() we check the stage2 PGD before taking
the lock, and take the lock only if the PGD is valid. We then unmap
the page tables and release the lock. The PGD itself is reset only
after the lock has been dropped, which leaves a race window: another
thread waiting on the lock can still see a valid PGD and proceed to
perform a stage2 operation on page tables that are about to be freed.

This patch moves the stage2 PGD manipulation under the lock.

Reported-by: Alexander Graf <agraf@suse.de>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm/kvm/mmu.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 582a972..9c4026d 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -835,16 +835,18 @@ void stage2_unmap_vm(struct kvm *kvm)
  */
 void kvm_free_stage2_pgd(struct kvm *kvm)
 {
-	if (kvm->arch.pgd == NULL)
-		return;
+	void *pgd = NULL;
 
 	spin_lock(&kvm->mmu_lock);
-	unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+	if (kvm->arch.pgd) {
+		unmap_stage2_range(kvm, 0, KVM_PHYS_SIZE);
+		pgd = kvm->arch.pgd;
+		kvm->arch.pgd = NULL;
+	}
 	spin_unlock(&kvm->mmu_lock);
-
 	/* Free the HW pgd, one page at a time */
-	free_pages_exact(kvm->arch.pgd, S2_PGD_SIZE);
-	kvm->arch.pgd = NULL;
+	if (pgd)
+		free_pages_exact(pgd, S2_PGD_SIZE);
 }
 
 static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
-- 
2.7.4
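
As an aside, the shape of this fix -- detach the shared pointer while
holding the lock, free it after dropping the lock -- is a general pattern.
A minimal, self-contained userspace sketch (hypothetical names, with a
pthread mutex standing in for the kernel spinlock):

	#include <pthread.h>
	#include <stdlib.h>

	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static void *table;	/* plays the role of kvm->arch.pgd */

	static void free_table(void)
	{
		void *t = NULL;

		/*
		 * Detach under the lock: any thread that takes the lock
		 * after us sees table == NULL and backs off, so nothing
		 * can touch the memory we are about to free.
		 */
		pthread_mutex_lock(&lock);
		if (table) {
			t = table;
			table = NULL;
		}
		pthread_mutex_unlock(&lock);

		/* Safe to free outside the lock; free(NULL) is a no-op. */
		free(t);
	}

Resetting the pointer only after dropping the lock, as the old code did,
leaves a window in which a second caller sees a non-NULL table that is
already on its way to being freed.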

Thread overview: 22+ messages
2017-04-03 14:12 [PATCH v3] kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd Suzuki K Poulose
2017-04-03 14:22 ` Mark Rutland
2017-04-03 14:25   ` Suzuki K Poulose
2017-04-03 14:31   ` Christoffer Dall
2017-04-04 10:13 ` Christoffer Dall
2017-04-04 10:35   ` Suzuki K Poulose
2017-04-04 12:29     ` Christoffer Dall
2017-04-22  0:28     ` Alexander Graf
2017-04-24  9:42       ` Suzuki K Poulose
