linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
@ 2019-11-13 16:46 Greg Kurz
  2019-11-13 16:46 ` [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path Greg Kurz
  2019-11-20 16:38 ` [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Lijun Pan
  0 siblings, 2 replies; 4+ messages in thread
From: Greg Kurz @ 2019-11-13 16:46 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Cédric Le Goater,
	David Gibson, Lijun Pan, Satheesh Rajendran, Laurent Vivier,
	kvm-ppc, linuxppc-dev, stable, linux-kernel

The EQ page is allocated by the guest and then passed to the hypervisor
with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page
before handing it over to the HW. This reference is dropped either when
the guest issues the H_INT_RESET hcall or when the KVM device is released.
But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times,
either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot).
In both cases the existing EQ page reference is leaked because we simply
overwrite it in the XIVE queue structure without calling put_page().

This is especially visible when the guest memory is backed with huge pages:
start a VM up to the guest userspace, either reboot it or unplug a vCPU,
quit QEMU. The leak is observed by comparing the value of HugePages_Free in
/proc/meminfo before and after the VM is run.

Ideally we'd want the XIVE code to handle the EQ page de-allocation at the
platform level. This isn't the case right now because the various XIVE
drivers have different allocation needs. It could maybe worth introducing
hooks for this purpose instead of exposing XIVE internals to the drivers,
but this is certainly a huge work to be done later.

In the meantime, for easier backport, fix both vCPU unplug and guest reboot
leaks by introducing a wrapper around xive_native_configure_queue() that
does the necessary cleanup.

Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v5.2
Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
v2: use wrapper as suggested by Cedric
---
 arch/powerpc/kvm/book3s_xive_native.c |   31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 34bd123fa024..0e1fc5a16729 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -50,6 +50,24 @@ static void kvmppc_xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)
 	}
 }
 
+static int kvmppc_xive_native_configure_queue(u32 vp_id, struct xive_q *q,
+					      u8 prio, __be32 *qpage,
+					      u32 order, bool can_escalate)
+{
+	int rc;
+	__be32 *qpage_prev = q->qpage;
+
+	rc = xive_native_configure_queue(vp_id, q, prio, qpage, order,
+					 can_escalate);
+	if (rc)
+		return rc;
+
+	if (qpage_prev)
+		put_page(virt_to_page(qpage_prev));
+
+	return rc;
+}
+
 void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
@@ -575,19 +593,14 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 		q->guest_qaddr  = 0;
 		q->guest_qshift = 0;
 
-		rc = xive_native_configure_queue(xc->vp_id, q, priority,
-						 NULL, 0, true);
+		rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
+							NULL, 0, true);
 		if (rc) {
 			pr_err("Failed to reset queue %d for VCPU %d: %d\n",
 			       priority, xc->server_num, rc);
 			return rc;
 		}
 
-		if (q->qpage) {
-			put_page(virt_to_page(q->qpage));
-			q->qpage = NULL;
-		}
-
 		return 0;
 	}
 
@@ -646,8 +659,8 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 	  * OPAL level because the use of END ESBs is not supported by
 	  * Linux.
 	  */
-	rc = xive_native_configure_queue(xc->vp_id, q, priority,
-					 (__be32 *) qaddr, kvm_eq.qshift, true);
+	rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
+					(__be32 *) qaddr, kvm_eq.qshift, true);
 	if (rc) {
 		pr_err("Failed to configure queue %d for VCPU %d: %d\n",
 		       priority, xc->server_num, rc);


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
  2019-11-13 16:46 [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Greg Kurz
@ 2019-11-13 16:46 ` Greg Kurz
  2019-11-13 17:00   ` Cédric Le Goater
  2019-11-20 16:38 ` [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Lijun Pan
  1 sibling, 1 reply; 4+ messages in thread
From: Greg Kurz @ 2019-11-13 16:46 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Michael Ellerman, Benjamin Herrenschmidt, Cédric Le Goater,
	David Gibson, Lijun Pan, Satheesh Rajendran, Laurent Vivier,
	kvm-ppc, linuxppc-dev, stable, linux-kernel

We need to check the host page size is big enough to accomodate the
EQ. Let's do this before taking a reference on the EQ page to avoid
a potential leak if the check fails.

Cc: stable@vger.kernel.org # v5.2
Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_xive_native.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 0e1fc5a16729..d83adb1e1490 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -630,12 +630,6 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 	gfn = gpa_to_gfn(kvm_eq.qaddr);
-	page = gfn_to_page(kvm, gfn);
-	if (is_error_page(page)) {
-		srcu_read_unlock(&kvm->srcu, srcu_idx);
-		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
-		return -EINVAL;
-	}
 
 	page_size = kvm_host_page_size(kvm, gfn);
 	if (1ull << kvm_eq.qshift > page_size) {
@@ -644,6 +638,13 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 		return -EINVAL;
 	}
 
+	page = gfn_to_page(kvm, gfn);
+	if (is_error_page(page)) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
+		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
+		return -EINVAL;
+	}
+
 	qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path
  2019-11-13 16:46 ` [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path Greg Kurz
@ 2019-11-13 17:00   ` Cédric Le Goater
  0 siblings, 0 replies; 4+ messages in thread
From: Cédric Le Goater @ 2019-11-13 17:00 UTC (permalink / raw)
  To: Greg Kurz, Paul Mackerras
  Cc: Michael Ellerman, Benjamin Herrenschmidt, David Gibson,
	Lijun Pan, Satheesh Rajendran, Laurent Vivier, kvm-ppc,
	linuxppc-dev, stable, linux-kernel

On 13/11/2019 17:46, Greg Kurz wrote:
> We need to check the host page size is big enough to accomodate the
> EQ. Let's do this before taking a reference on the EQ page to avoid
> a potential leak if the check fails.
> 
> Cc: stable@vger.kernel.org # v5.2
> Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
> Signed-off-by: Greg Kurz <groug@kaod.org>


Reviewed-by: Cédric Le Goater <clg@kaod.org>

> ---
>  arch/powerpc/kvm/book3s_xive_native.c |   13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 0e1fc5a16729..d83adb1e1490 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -630,12 +630,6 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
>  
>  	srcu_idx = srcu_read_lock(&kvm->srcu);
>  	gfn = gpa_to_gfn(kvm_eq.qaddr);
> -	page = gfn_to_page(kvm, gfn);
> -	if (is_error_page(page)) {
> -		srcu_read_unlock(&kvm->srcu, srcu_idx);
> -		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
> -		return -EINVAL;
> -	}
>  
>  	page_size = kvm_host_page_size(kvm, gfn);
>  	if (1ull << kvm_eq.qshift > page_size) {
> @@ -644,6 +638,13 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
>  		return -EINVAL;
>  	}
>  
> +	page = gfn_to_page(kvm, gfn);
> +	if (is_error_page(page)) {
> +		srcu_read_unlock(&kvm->srcu, srcu_idx);
> +		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
> +		return -EINVAL;
> +	}
> +
>  	qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
>  	srcu_read_unlock(&kvm->srcu, srcu_idx);
>  
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one
  2019-11-13 16:46 [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Greg Kurz
  2019-11-13 16:46 ` [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path Greg Kurz
@ 2019-11-20 16:38 ` Lijun Pan
  1 sibling, 0 replies; 4+ messages in thread
From: Lijun Pan @ 2019-11-20 16:38 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paul Mackerras, Laurent Vivier, linux-kernel, kvm-ppc,
	Satheesh Rajendran, Cédric Le Goater, Lijun Pan, stable,
	linuxppc-dev, David Gibson



> On Nov 13, 2019, at 10:46 AM, Greg Kurz <groug@kaod.org> wrote:
> 
> The EQ page is allocated by the guest and then passed to the hypervisor
> with the H_INT_SET_QUEUE_CONFIG hcall. A reference is taken on the page
> before handing it over to the HW. This reference is dropped either when
> the guest issues the H_INT_RESET hcall or when the KVM device is released.
> But, the guest can legitimately call H_INT_SET_QUEUE_CONFIG several times,
> either to reset the EQ (vCPU hot unplug) or to set a new EQ (guest reboot).
> In both cases the existing EQ page reference is leaked because we simply
> overwrite it in the XIVE queue structure without calling put_page().
> 
> This is especially visible when the guest memory is backed with huge pages:
> start a VM up to the guest userspace, either reboot it or unplug a vCPU,
> quit QEMU. The leak is observed by comparing the value of HugePages_Free in
> /proc/meminfo before and after the VM is run.
> 
> Ideally we'd want the XIVE code to handle the EQ page de-allocation at the
> platform level. This isn't the case right now because the various XIVE
> drivers have different allocation needs. It could maybe worth introducing
> hooks for this purpose instead of exposing XIVE internals to the drivers,
> but this is certainly a huge work to be done later.
> 
> In the meantime, for easier backport, fix both vCPU unplug and guest reboot
> leaks by introducing a wrapper around xive_native_configure_queue() that
> does the necessary cleanup.
> 
> Reported-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Cc: stable@vger.kernel.org # v5.2
> Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Signed-off-by: Greg Kurz <groug@kaod.org>

Tested-by: Lijun Pan <ljp@linux.ibm.com>

> ---
> v2: use wrapper as suggested by Cedric
> ---
> arch/powerpc/kvm/book3s_xive_native.c |   31 ++++++++++++++++++++++---------
> 1 file changed, 22 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index 34bd123fa024..0e1fc5a16729 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -50,6 +50,24 @@ static void kvmppc_xive_native_cleanup_queue(struct kvm_vcpu *vcpu, int prio)
> 	}
> }
> 
> +static int kvmppc_xive_native_configure_queue(u32 vp_id, struct xive_q *q,
> +					      u8 prio, __be32 *qpage,
> +					      u32 order, bool can_escalate)
> +{
> +	int rc;
> +	__be32 *qpage_prev = q->qpage;
> +
> +	rc = xive_native_configure_queue(vp_id, q, prio, qpage, order,
> +					 can_escalate);
> +	if (rc)
> +		return rc;
> +
> +	if (qpage_prev)
> +		put_page(virt_to_page(qpage_prev));
> +
> +	return rc;
> +}
> +
> void kvmppc_xive_native_cleanup_vcpu(struct kvm_vcpu *vcpu)
> {
> 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
> @@ -575,19 +593,14 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> 		q->guest_qaddr  = 0;
> 		q->guest_qshift = 0;
> 
> -		rc = xive_native_configure_queue(xc->vp_id, q, priority,
> -						 NULL, 0, true);
> +		rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
> +							NULL, 0, true);
> 		if (rc) {
> 			pr_err("Failed to reset queue %d for VCPU %d: %d\n",
> 			       priority, xc->server_num, rc);
> 			return rc;
> 		}
> 
> -		if (q->qpage) {
> -			put_page(virt_to_page(q->qpage));
> -			q->qpage = NULL;
> -		}
> -
> 		return 0;
> 	}
> 
> @@ -646,8 +659,8 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
> 	  * OPAL level because the use of END ESBs is not supported by
> 	  * Linux.
> 	  */
> -	rc = xive_native_configure_queue(xc->vp_id, q, priority,
> -					 (__be32 *) qaddr, kvm_eq.qshift, true);
> +	rc = kvmppc_xive_native_configure_queue(xc->vp_id, q, priority,
> +					(__be32 *) qaddr, kvm_eq.qshift, true);
> 	if (rc) {
> 		pr_err("Failed to configure queue %d for VCPU %d: %d\n",
> 		       priority, xc->server_num, rc);
> 


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-11-20 16:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13 16:46 [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Greg Kurz
2019-11-13 16:46 ` [PATCH v2 2/2] KVM: PPC: Book3S HV: XIVE: Fix potential page leak on error path Greg Kurz
2019-11-13 17:00   ` Cédric Le Goater
2019-11-20 16:38 ` [PATCH v2 1/2] KVM: PPC: Book3S HV: XIVE: Free previous EQ page when setting up a new one Lijun Pan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).