[PATCH v4 0/2] let archs decide for vcpu ids

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v4 0/2] let archs decide for vcpu ids
@ 2016-04-21 14:14 Greg Kurz
  2016-04-21 14:15 ` [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS Greg Kurz
  2016-04-21 14:20 ` [PATCH v4 2/2] KVM: move vcpu id checking to archs Greg Kurz
  0 siblings, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2016-04-21 14:14 UTC (permalink / raw)
  To: Paolo Bonzini, james.hogan, mingo
  Cc: linux-mips, kvm, rkrcmar, linux-kernel, David Hildenbrand,
	qemu-ppc, Cornelia Huck, Paul Mackerras, David Gibson

This series mostly addresses Radim's comments on my previous patch
"KVM: remove buggy vcpu id check on vcpu creation":
- prepended a patch to fix kvm_get_vcpu_by_id()
- updated the KVM API documentation

---

Greg Kurz (2):
      KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS
      KVM: move vcpu id checking to archs


 Documentation/virtual/kvm/api.txt |    7 +++----
 arch/mips/kvm/mips.c              |    7 ++++++-
 arch/x86/kvm/x86.c                |    3 +++
 include/linux/kvm_host.h          |    7 ++++---
 virt/kvm/kvm_main.c               |    3 ---
 5 files changed, 16 insertions(+), 11 deletions(-)

--
Greg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS
  2016-04-21 14:14 [PATCH v4 0/2] let archs decide for vcpu ids Greg Kurz
@ 2016-04-21 14:15 ` Greg Kurz
  2016-04-21 14:17   ` David Hildenbrand
                     ` (2 more replies)
  2016-04-21 14:20 ` [PATCH v4 2/2] KVM: move vcpu id checking to archs Greg Kurz
  1 sibling, 3 replies; 22+ messages in thread
From: Greg Kurz @ 2016-04-21 14:15 UTC (permalink / raw)
  To: Paolo Bonzini, james.hogan, mingo
  Cc: linux-mips, kvm, rkrcmar, linux-kernel, David Hildenbrand,
	qemu-ppc, Cornelia Huck, Paul Mackerras, David Gibson

Commit c896939f7cff ("KVM: use heuristic for fast VCPU lookup by id") added
a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.

This patch simply reverses the logic so that we only try fast path if the
vcpu id can be tried as an index in kvm->vcpus[]. The slow path is not
affected by the change.

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
 include/linux/kvm_host.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 5276fe0916fc..23bfe1bd159c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -447,12 +447,13 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
 
 static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
 {
-	struct kvm_vcpu *vcpu;
+	struct kvm_vcpu *vcpu = NULL;
 	int i;
 
-	if (id < 0 || id >= KVM_MAX_VCPUS)
+	if (id < 0)
 		return NULL;
-	vcpu = kvm_get_vcpu(kvm, id);
+	if (id < KVM_MAX_VCPUS)
+		vcpu = kvm_get_vcpu(kvm, id);
 	if (vcpu && vcpu->vcpu_id == id)
 		return vcpu;
 	kvm_for_each_vcpu(i, vcpu, kvm)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS
  2016-04-21 14:15 ` [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS Greg Kurz
@ 2016-04-21 14:17   ` David Hildenbrand
  2016-04-21 14:30     ` Greg Kurz
  2016-04-26  7:44   ` Cornelia Huck
  2016-04-27  9:40   ` Gerg Kurz
  2 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2016-04-21 14:17 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, rkrcmar,
	linux-kernel, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

> Commit c896939f7cff ("KVM: use heuristic for fast VCPU lookup by id") added
> a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
> problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.
> 
> This patch simply reverses the logic so that we only try fast path if the
> vcpu id can be tried as an index in kvm->vcpus[]. The slow path is not
> affected by the change.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
>  include/linux/kvm_host.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5276fe0916fc..23bfe1bd159c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -447,12 +447,13 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> 
>  static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
>  {
> -	struct kvm_vcpu *vcpu;
> +	struct kvm_vcpu *vcpu = NULL;
>  	int i;
> 
> -	if (id < 0 || id >= KVM_MAX_VCPUS)
> +	if (id < 0)
>  		return NULL;
> -	vcpu = kvm_get_vcpu(kvm, id);
> +	if (id < KVM_MAX_VCPUS)
> +		vcpu = kvm_get_vcpu(kvm, id);

Maybe this check even should go into kvm_get_vcpu()

>  	if (vcpu && vcpu->vcpu_id == id)
>  		return vcpu;
>  	kvm_for_each_vcpu(i, vcpu, kvm)
> 

Anyhow,

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>

David

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-21 14:14 [PATCH v4 0/2] let archs decide for vcpu ids Greg Kurz
  2016-04-21 14:15 ` [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS Greg Kurz
@ 2016-04-21 14:20 ` Greg Kurz
  2016-04-21 16:00   ` Radim Krčmář
  2016-04-22  9:21   ` Wei Yang
  1 sibling, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2016-04-21 14:20 UTC (permalink / raw)
  To: Paolo Bonzini, james.hogan, mingo
  Cc: linux-mips, kvm, rkrcmar, linux-kernel, David Hildenbrand,
	qemu-ppc, Cornelia Huck, Paul Mackerras, David Gibson

Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
introduced a check to prevent potential kernel memory corruption in case
the vcpu id is too great.

Unfortunately this check assumes vcpu ids grow in sequence with a common
difference of 1, which is wrong: archs are free to use vcpu id as they fit.
For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
1024, guests may be limited down to 128 vcpus on POWER8.

This means the check does not belong here and should be moved to some arch
specific function: kvm_arch_vcpu_create() looks like a good candidate.

ARM and s390 already have such a check.

I could not spot any path in the PowerPC or common KVM code where a vcpu
id is used as described in the above commit: I believe PowerPC can live
without this check.

In the end, this patch simply moves the check to MIPS and x86. While here,
we also update the documentation to dissociate vcpu ids from the maximum
number of vcpus per virtual machine.

Acked-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
---
v4: - updated subject for more clarity on what the patch does
    - added James's and Connie's A-b tags
    - updated documentation

 Documentation/virtual/kvm/api.txt |    7 +++----
 arch/mips/kvm/mips.c              |    7 ++++++-
 arch/x86/kvm/x86.c                |    3 +++
 virt/kvm/kvm_main.c               |    3 ---
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4d0542c5206b..486a1d783b82 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -199,11 +199,10 @@ Type: vm ioctl
 Parameters: vcpu id (apic id on x86)
 Returns: vcpu fd on success, -1 on error
 
-This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
-in the range [0, max_vcpus).
+This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.
 
-The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
-the KVM_CHECK_EXTENSION ioctl() at run-time.
+The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
+KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
 The maximum possible value for max_vcpus can be retrieved using the
 KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
 
diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 70ef1a43c114..0278ea146db5 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	int err, size, offset;
 	void *gebase;
 	int i;
+	struct kvm_vcpu *vcpu;
 
-	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
+	if (id >= KVM_MAX_VCPUS) {
+		err = -EINVAL;
+		goto out;
+	}
 
+	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
 	if (!vcpu) {
 		err = -ENOMEM;
 		goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b7798c7b210..7738202edcce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
 {
 	struct kvm_vcpu *vcpu;
 
+	if (id >= KVM_MAX_VCPUS)
+		return ERR_PTR(-EINVAL);
+
 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
 		printk_once(KERN_WARNING
 		"kvm: SMP vm created on host with unstable TSC; "
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4fd482fb9260..6b6cca3cb488 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 	int r;
 	struct kvm_vcpu *vcpu;
 
-	if (id >= KVM_MAX_VCPUS)
-		return -EINVAL;
-
 	vcpu = kvm_arch_vcpu_create(kvm, id);
 	if (IS_ERR(vcpu))
 		return PTR_ERR(vcpu);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS
  2016-04-21 14:17   ` David Hildenbrand
@ 2016-04-21 14:30     ` Greg Kurz
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Kurz @ 2016-04-21 14:30 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, rkrcmar,
	linux-kernel, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

On Thu, 21 Apr 2016 16:17:29 +0200
David Hildenbrand <dahi@linux.vnet.ibm.com> wrote:

> > Commit c896939f7cff ("KVM: use heuristic for fast VCPU lookup by id") added
> > a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
> > problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.
> > 
> > This patch simply reverses the logic so that we only try fast path if the
> > vcpu id can be tried as an index in kvm->vcpus[]. The slow path is not
> > affected by the change.
> > 
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> >  include/linux/kvm_host.h |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index 5276fe0916fc..23bfe1bd159c 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -447,12 +447,13 @@ static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
> > 
> >  static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
> >  {
> > -	struct kvm_vcpu *vcpu;
> > +	struct kvm_vcpu *vcpu = NULL;
> >  	int i;
> > 
> > -	if (id < 0 || id >= KVM_MAX_VCPUS)
> > +	if (id < 0)
> >  		return NULL;
> > -	vcpu = kvm_get_vcpu(kvm, id);
> > +	if (id < KVM_MAX_VCPUS)
> > +		vcpu = kvm_get_vcpu(kvm, id);  
> 
> Maybe this check even should go into kvm_get_vcpu()
> 

Yeah possibly, but there are 19 users for kvm_get_vcpu() and I'm not sure if
none of them is on a hot path where this extra check could hurt... maybe this
can be done in a cleanup patch afterwards ?

> >  	if (vcpu && vcpu->vcpu_id == id)
> >  		return vcpu;
> >  	kvm_for_each_vcpu(i, vcpu, kvm)
> >   
> 
> Anyhow,
> 
> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
> 
> David

Thanks for the review !

Cheers.

--
Greg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-21 14:20 ` [PATCH v4 2/2] KVM: move vcpu id checking to archs Greg Kurz
@ 2016-04-21 16:00   ` Radim Krčmář
  2016-04-21 16:45     ` Greg Kurz
  2016-04-22  9:21   ` Wei Yang
  1 sibling, 1 reply; 22+ messages in thread
From: Radim Krčmář @ 2016-04-21 16:00 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

2016-04-21 16:20+0200, Greg Kurz:
> Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> introduced a check to prevent potential kernel memory corruption in case
> the vcpu id is too great.
> 
> Unfortunately this check assumes vcpu ids grow in sequence with a common
> difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> 1024, guests may be limited down to 128 vcpus on POWER8.
> 
> This means the check does not belong here and should be moved to some arch
> specific function: kvm_arch_vcpu_create() looks like a good candidate.
> 
> ARM and s390 already have such a check.
> 
> I could not spot any path in the PowerPC or common KVM code where a vcpu
> id is used as described in the above commit: I believe PowerPC can live
> without this check.
> 
> In the end, this patch simply moves the check to MIPS and x86. While here,
> we also update the documentation to dissociate vcpu ids from the maximum
> number of vcpus per virtual machine.
> 
> Acked-by: James Hogan <james.hogan@imgtec.com>
> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
> v4: - updated subject for more clarity on what the patch does
>     - added James's and Connie's A-b tags
>     - updated documentation
> 
>  Documentation/virtual/kvm/api.txt |    7 +++----
>  arch/mips/kvm/mips.c              |    7 ++++++-
>  arch/x86/kvm/x86.c                |    3 +++
>  virt/kvm/kvm_main.c               |    3 ---
>  4 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 4d0542c5206b..486a1d783b82 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -199,11 +199,10 @@ Type: vm ioctl
>  Parameters: vcpu id (apic id on x86)
>  Returns: vcpu fd on success, -1 on error
>  
> -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> -in the range [0, max_vcpus).
> +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.

Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
provided too high vcpu_id to an old KVM or because new KVM failed in
other areas.  Not a huge problem (because I expect that userspace will
die on both), but a new KVM_CAP would be able to disambiguate it.

Toggleable capability doesn't seem necessary and only PowerPC changes,
so the capability could be arch specific ... I think that a generic one
makes more sense, though.

Userspace also doesn't know the vcpu id limit anymore, and it might
care.  What do you think about returning the arch-specific limit (or the
highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?

I think this would also clarify the connection between VCPU limit and
VCPU_ID limit.  Or is a boolean cap better?

> -The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
> -the KVM_CHECK_EXTENSION ioctl() at run-time.
> +The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
> +KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
>  The maximum possible value for max_vcpus can be retrieved using the
>  KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-21 16:00   ` Radim Krčmář
@ 2016-04-21 16:45     ` Greg Kurz
  2016-04-21 17:36       ` Radim Krčmář
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kurz @ 2016-04-21 16:45 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

On Thu, 21 Apr 2016 18:00:19 +0200
Radim Krčmář <rkrcmar@redhat.com> wrote:

> 2016-04-21 16:20+0200, Greg Kurz:
> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > introduced a check to prevent potential kernel memory corruption in case
> > the vcpu id is too great.
> > 
> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > 
> > This means the check does not belong here and should be moved to some arch
> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > 
> > ARM and s390 already have such a check.
> > 
> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > id is used as described in the above commit: I believe PowerPC can live
> > without this check.
> > 
> > In the end, this patch simply moves the check to MIPS and x86. While here,
> > we also update the documentation to dissociate vcpu ids from the maximum
> > number of vcpus per virtual machine.
> > 
> > Acked-by: James Hogan <james.hogan@imgtec.com>
> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > ---
> > v4: - updated subject for more clarity on what the patch does
> >     - added James's and Connie's A-b tags
> >     - updated documentation
> > 
> >  Documentation/virtual/kvm/api.txt |    7 +++----
> >  arch/mips/kvm/mips.c              |    7 ++++++-
> >  arch/x86/kvm/x86.c                |    3 +++
> >  virt/kvm/kvm_main.c               |    3 ---
> >  4 files changed, 12 insertions(+), 8 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > index 4d0542c5206b..486a1d783b82 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -199,11 +199,10 @@ Type: vm ioctl
> >  Parameters: vcpu id (apic id on x86)
> >  Returns: vcpu fd on success, -1 on error
> >  
> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> > -in the range [0, max_vcpus).
> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.  
> 
> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> provided too high vcpu_id to an old KVM or because new KVM failed in
> other areas.  Not a huge problem (because I expect that userspace will
> die on both), but a new KVM_CAP would be able to disambiguate it.
> 
> Toggleable capability doesn't seem necessary and only PowerPC changes,
> so the capability could be arch specific ... I think that a generic one
> makes more sense, though.
>

I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.

So I understand your concern but would we have a user for this ?

> Userspace also doesn't know the vcpu id limit anymore, and it might
> care.  What do you think about returning the arch-specific limit (or the
> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
> 

This is partly true: only arch agnostic code would be lost.

Moreover this is a problem for powerpc only at the moment and userspace code
can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.

For other architectures, it is simply KVM_MAX_VCPUS.

> I think this would also clarify the connection between VCPU limit and
> VCPU_ID limit.  Or is a boolean cap better?
> 

Well, I'm not fan of adding a generic API to handle a corner case... maybe later
if we have other scenarios where vcpu ids need to cross the limit ?

> > -The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
> > -the KVM_CHECK_EXTENSION ioctl() at run-time.
> > +The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
> > +KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> >  The maximum possible value for max_vcpus can be retrieved using the
> >  KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.  
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-21 16:45     ` Greg Kurz
@ 2016-04-21 17:36       ` Radim Krčmář
  2016-04-22  9:25         ` Greg Kurz
  0 siblings, 1 reply; 22+ messages in thread
From: Radim Krčmář @ 2016-04-21 17:36 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

2016-04-21 18:45+0200, Greg Kurz:
> On Thu, 21 Apr 2016 18:00:19 +0200
> Radim Krčmář <rkrcmar@redhat.com> wrote:
>> 2016-04-21 16:20+0200, Greg Kurz:
>> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>> > introduced a check to prevent potential kernel memory corruption in case
>> > the vcpu id is too great.
>> > 
>> > Unfortunately this check assumes vcpu ids grow in sequence with a common
>> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>> > 1024, guests may be limited down to 128 vcpus on POWER8.
>> > 
>> > This means the check does not belong here and should be moved to some arch
>> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
>> > 
>> > ARM and s390 already have such a check.
>> > 
>> > I could not spot any path in the PowerPC or common KVM code where a vcpu
>> > id is used as described in the above commit: I believe PowerPC can live
>> > without this check.
>> > 
>> > In the end, this patch simply moves the check to MIPS and x86. While here,
>> > we also update the documentation to dissociate vcpu ids from the maximum
>> > number of vcpus per virtual machine.
>> > 
>> > Acked-by: James Hogan <james.hogan@imgtec.com>
>> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
>> > ---
>> > v4: - updated subject for more clarity on what the patch does
>> >     - added James's and Connie's A-b tags
>> >     - updated documentation
>> > 
>> >  Documentation/virtual/kvm/api.txt |    7 +++----
>> >  arch/mips/kvm/mips.c              |    7 ++++++-
>> >  arch/x86/kvm/x86.c                |    3 +++
>> >  virt/kvm/kvm_main.c               |    3 ---
>> >  4 files changed, 12 insertions(+), 8 deletions(-)
>> > 
>> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> > index 4d0542c5206b..486a1d783b82 100644
>> > --- a/Documentation/virtual/kvm/api.txt
>> > +++ b/Documentation/virtual/kvm/api.txt
>> > @@ -199,11 +199,10 @@ Type: vm ioctl
>> >  Parameters: vcpu id (apic id on x86)
>> >  Returns: vcpu fd on success, -1 on error
>> >  
>> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
>> > -in the range [0, max_vcpus).
>> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.  
>> 
>> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
>> provided too high vcpu_id to an old KVM or because new KVM failed in
>> other areas.  Not a huge problem (because I expect that userspace will
>> die on both), but a new KVM_CAP would be able to disambiguate it.
>> 
>> Toggleable capability doesn't seem necessary and only PowerPC changes,
>> so the capability could be arch specific ... I think that a generic one
>> makes more sense, though.
>>
> 
> I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.

Yes, userspace cannot disambiguate, but would have the option of not
doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.

> So I understand your concern but would we have a user for this ?

I think so, new userspace on pre-patch KVM is the most likely one.

Userspace cannot tell that KVM doesn't support the extension and
behaving like on patched KVM would result in a failure with cryptic
error message, because KVM only returns EINVAL.

Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
recognizing that the user wanted too much?

>> Userspace also doesn't know the vcpu id limit anymore, and it might
>> care.  What do you think about returning the arch-specific limit (or the
>> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
>> 
> 
> This is partly true: only arch agnostic code would be lost.
> 
> Moreover this is a problem for powerpc only at the moment and userspace code
> can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.

How would that work on KVM without this patch?

> For other architectures, it is simply KVM_MAX_VCPUS.

(Other architectures would not implement the capability.)

>> I think this would also clarify the connection between VCPU limit and
>> VCPU_ID limit.  Or is a boolean cap better?
>> 
> 
> Well, I'm not fan of adding a generic API to handle a corner case...

I don't like it either, but I think that introducing the capability is
worth avoided problems.

>                                                                      maybe later
> if we have other scenarios where vcpu ids need to cross the limit ?

x86 is going to have that soon too -- vcpu_id will be able to range from
0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
won't be improved to actually scale, so MAX_CPUS will remain lower.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-21 14:20 ` [PATCH v4 2/2] KVM: move vcpu id checking to archs Greg Kurz
  2016-04-21 16:00   ` Radim Krčmář
@ 2016-04-22  9:21   ` Wei Yang
  2016-04-22  9:30     ` Greg Kurz
  1 sibling, 1 reply; 22+ messages in thread
From: Wei Yang @ 2016-04-22  9:21 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, rkrcmar,
	linux-kernel, David Hildenbrand, qemu-ppc, Cornelia Huck,
	Paul Mackerras, David Gibson

Hi, Greg

One confusion.

There are 5 kvm_arch_vcpu_create() while in this patch you changed 2 of them.
Some particular reason?

On Thu, Apr 21, 2016 at 04:20:53PM +0200, Greg Kurz wrote:
>Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>introduced a check to prevent potential kernel memory corruption in case
>the vcpu id is too great.
>
>Unfortunately this check assumes vcpu ids grow in sequence with a common
>difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>1024, guests may be limited down to 128 vcpus on POWER8.
>
>This means the check does not belong here and should be moved to some arch
>specific function: kvm_arch_vcpu_create() looks like a good candidate.
>
>ARM and s390 already have such a check.
>
>I could not spot any path in the PowerPC or common KVM code where a vcpu
>id is used as described in the above commit: I believe PowerPC can live
>without this check.
>
>In the end, this patch simply moves the check to MIPS and x86. While here,
>we also update the documentation to dissociate vcpu ids from the maximum
>number of vcpus per virtual machine.
>
>Acked-by: James Hogan <james.hogan@imgtec.com>
>Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
>---
>v4: - updated subject for more clarity on what the patch does
>    - added James's and Connie's A-b tags
>    - updated documentation
>
> Documentation/virtual/kvm/api.txt |    7 +++----
> arch/mips/kvm/mips.c              |    7 ++++++-
> arch/x86/kvm/x86.c                |    3 +++
> virt/kvm/kvm_main.c               |    3 ---
> 4 files changed, 12 insertions(+), 8 deletions(-)
>
>diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>index 4d0542c5206b..486a1d783b82 100644
>--- a/Documentation/virtual/kvm/api.txt
>+++ b/Documentation/virtual/kvm/api.txt
>@@ -199,11 +199,10 @@ Type: vm ioctl
> Parameters: vcpu id (apic id on x86)
> Returns: vcpu fd on success, -1 on error
> 
>-This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
>-in the range [0, max_vcpus).
>+This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.
> 
>-The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
>-the KVM_CHECK_EXTENSION ioctl() at run-time.
>+The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
>+KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> The maximum possible value for max_vcpus can be retrieved using the
> KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> 
>diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
>index 70ef1a43c114..0278ea146db5 100644
>--- a/arch/mips/kvm/mips.c
>+++ b/arch/mips/kvm/mips.c
>@@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> 	int err, size, offset;
> 	void *gebase;
> 	int i;
>+	struct kvm_vcpu *vcpu;
> 
>-	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>+	if (id >= KVM_MAX_VCPUS) {
>+		err = -EINVAL;
>+		goto out;
>+	}
> 
>+	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> 	if (!vcpu) {
> 		err = -ENOMEM;
> 		goto out;
>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>index 9b7798c7b210..7738202edcce 100644
>--- a/arch/x86/kvm/x86.c
>+++ b/arch/x86/kvm/x86.c
>@@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
> {
> 	struct kvm_vcpu *vcpu;
> 
>+	if (id >= KVM_MAX_VCPUS)
>+		return ERR_PTR(-EINVAL);
>+
> 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
> 		printk_once(KERN_WARNING
> 		"kvm: SMP vm created on host with unstable TSC; "
>diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>index 4fd482fb9260..6b6cca3cb488 100644
>--- a/virt/kvm/kvm_main.c
>+++ b/virt/kvm/kvm_main.c
>@@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> 	int r;
> 	struct kvm_vcpu *vcpu;
> 
>-	if (id >= KVM_MAX_VCPUS)
>-		return -EINVAL;
>-
> 	vcpu = kvm_arch_vcpu_create(kvm, id);
> 	if (IS_ERR(vcpu))
> 		return PTR_ERR(vcpu);
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Richard Yang\nHelp you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-21 17:36       ` Radim Krčmář
@ 2016-04-22  9:25         ` Greg Kurz
  2016-04-22 10:22           ` Cornelia Huck
                             ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Greg Kurz @ 2016-04-22  9:25 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

Hi Radim !

On Thu, 21 Apr 2016 19:36:11 +0200
Radim Krčmář <rkrcmar@redhat.com> wrote:

> 2016-04-21 18:45+0200, Greg Kurz:
> > On Thu, 21 Apr 2016 18:00:19 +0200
> > Radim Krčmář <rkrcmar@redhat.com> wrote:  
> >> 2016-04-21 16:20+0200, Greg Kurz:  
> >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> >> > introduced a check to prevent potential kernel memory corruption in case
> >> > the vcpu id is too great.
> >> > 
> >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> >> > 
> >> > This means the check does not belong here and should be moved to some arch
> >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> >> > 
> >> > ARM and s390 already have such a check.
> >> > 
> >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> >> > id is used as described in the above commit: I believe PowerPC can live
> >> > without this check.
> >> > 
> >> > In the end, this patch simply moves the check to MIPS and x86. While here,
> >> > we also update the documentation to dissociate vcpu ids from the maximum
> >> > number of vcpus per virtual machine.
> >> > 
> >> > Acked-by: James Hogan <james.hogan@imgtec.com>
> >> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> >> > ---
> >> > v4: - updated subject for more clarity on what the patch does
> >> >     - added James's and Connie's A-b tags
> >> >     - updated documentation
> >> > 
> >> >  Documentation/virtual/kvm/api.txt |    7 +++----
> >> >  arch/mips/kvm/mips.c              |    7 ++++++-
> >> >  arch/x86/kvm/x86.c                |    3 +++
> >> >  virt/kvm/kvm_main.c               |    3 ---
> >> >  4 files changed, 12 insertions(+), 8 deletions(-)
> >> > 
> >> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >> > index 4d0542c5206b..486a1d783b82 100644
> >> > --- a/Documentation/virtual/kvm/api.txt
> >> > +++ b/Documentation/virtual/kvm/api.txt
> >> > @@ -199,11 +199,10 @@ Type: vm ioctl
> >> >  Parameters: vcpu id (apic id on x86)
> >> >  Returns: vcpu fd on success, -1 on error
> >> >  
> >> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> >> > -in the range [0, max_vcpus).
> >> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.    
> >> 
> >> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> >> provided too high vcpu_id to an old KVM or because new KVM failed in
> >> other areas.  Not a huge problem (because I expect that userspace will
> >> die on both), but a new KVM_CAP would be able to disambiguate it.
> >> 
> >> Toggleable capability doesn't seem necessary and only PowerPC changes,
> >> so the capability could be arch specific ... I think that a generic one
> >> makes more sense, though.
> >>  
> > 
> > I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> > returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.  
> 
> Yes, userspace cannot disambiguate, but would have the option of not
> doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.
> 

It makes sense indeed.

> > So I understand your concern but would we have a user for this ?  
> 
> I think so, new userspace on pre-patch KVM is the most likely one.
> 
> Userspace cannot tell that KVM doesn't support the extension and
> behaving like on patched KVM would result in a failure with cryptic
> error message, because KVM only returns EINVAL.
> 

This is already the case with or without the patch... which only changes
things for PowerPC userspace. And in the case of QEMU, we're already
violating the spec with the way we compute vcpu ids.

> Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
> recognizing that the user wanted too much?
> 

No. The error is caught in generic code and QEMU exits for all archs.

And BTW, how would QEMU guess that vcpu id is too high ? I see at
least three paths that can return EINVAL...

> >> Userspace also doesn't know the vcpu id limit anymore, and it might
> >> care.  What do you think about returning the arch-specific limit (or the
> >> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
> >>   
> > 
> > This is partly true: only arch agnostic code would be lost.
> > 
> > Moreover this is a problem for powerpc only at the moment and userspace code
> > can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.  
> 
> How would that work on KVM without this patch?
> 

It doesn't work for PowerPC :)

KVM_CAP_MAX_VCPUS indicates we can can start, say, 1024 vcpus and
KVM_CAP_PPC_SMT indicates the host has 8 threads per core.

KVM_CREATE_VCPU returns EINVAL when we start the 128th one because
it has vcpu_id == 128 * 8 == 1024.

Of course we can patch QEMU to restrict the maximum number of vcpus
to MAX_VCPUS / PPC_SMT for PowerPC, but it would be infortunate since
KVM for PowerPC is sized to run MAX_VCPUS... :\

> > For other architectures, it is simply KVM_MAX_VCPUS.  
> 
> (Other architectures would not implement the capability.)
> 

So this would be KVM_CAP_PPC_MAX_VCPU_ID ?

> >> I think this would also clarify the connection between VCPU limit and
> >> VCPU_ID limit.  Or is a boolean cap better?
> >>   
> > 
> > Well, I'm not fan of adding a generic API to handle a corner case...  
> 
> I don't like it either, but I think that introducing the capability is
> worth avoided problems.
> 

I admit that having separate capabilities for the number of vcpus and the
maximum vcpu id fixes the confusion once and for all.

> >                                                                      maybe later
> > if we have other scenarios where vcpu ids need to cross the limit ?  
> 
> x86 is going to have that soon too -- vcpu_id will be able to range from
> 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
> won't be improved to actually scale, so MAX_CPUS will remain lower.
> 

Do you have some pointers to share so that we can see the broader picture ?

Thanks !

--
Greg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22  9:21   ` Wei Yang
@ 2016-04-22  9:30     ` Greg Kurz
  2016-04-23  0:51       ` Wei Yang
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kurz @ 2016-04-22  9:30 UTC (permalink / raw)
  To: Wei Yang
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, rkrcmar,
	linux-kernel, David Hildenbrand, qemu-ppc, Cornelia Huck,
	Paul Mackerras, David Gibson

On Fri, 22 Apr 2016 17:21:03 +0800
Wei Yang <richard.weiyang@huawei.com> wrote:

> Hi, Greg
> 

Hi Wei !

> One confusion.
> 
> There are 5 kvm_arch_vcpu_create() while in this patch you changed 2 of them.
> Some particular reason?
> 

Yes and the reason is given in the changelog:
- ARM and s390 already have such a check
- PowerPC can live without this check
- this patch simply moves the check to MIPS and x86

Does it clarify ?

Cheers.

--
Greg

> On Thu, Apr 21, 2016 at 04:20:53PM +0200, Greg Kurz wrote:
> >Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> >introduced a check to prevent potential kernel memory corruption in case
> >the vcpu id is too great.
> >
> >Unfortunately this check assumes vcpu ids grow in sequence with a common
> >difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> >For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> >mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> >1024, guests may be limited down to 128 vcpus on POWER8.
> >
> >This means the check does not belong here and should be moved to some arch
> >specific function: kvm_arch_vcpu_create() looks like a good candidate.
> >
> >ARM and s390 already have such a check.
> >
> >I could not spot any path in the PowerPC or common KVM code where a vcpu
> >id is used as described in the above commit: I believe PowerPC can live
> >without this check.
> >
> >In the end, this patch simply moves the check to MIPS and x86. While here,
> >we also update the documentation to dissociate vcpu ids from the maximum
> >number of vcpus per virtual machine.
> >
> >Acked-by: James Hogan <james.hogan@imgtec.com>
> >Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> >Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> >---
> >v4: - updated subject for more clarity on what the patch does
> >    - added James's and Connie's A-b tags
> >    - updated documentation
> >
> > Documentation/virtual/kvm/api.txt |    7 +++----
> > arch/mips/kvm/mips.c              |    7 ++++++-
> > arch/x86/kvm/x86.c                |    3 +++
> > virt/kvm/kvm_main.c               |    3 ---
> > 4 files changed, 12 insertions(+), 8 deletions(-)
> >
> >diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> >index 4d0542c5206b..486a1d783b82 100644
> >--- a/Documentation/virtual/kvm/api.txt
> >+++ b/Documentation/virtual/kvm/api.txt
> >@@ -199,11 +199,10 @@ Type: vm ioctl
> > Parameters: vcpu id (apic id on x86)
> > Returns: vcpu fd on success, -1 on error
> > 
> >-This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> >-in the range [0, max_vcpus).
> >+This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.
> > 
> >-The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
> >-the KVM_CHECK_EXTENSION ioctl() at run-time.
> >+The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
> >+KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> > The maximum possible value for max_vcpus can be retrieved using the
> > KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
> > 
> >diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> >index 70ef1a43c114..0278ea146db5 100644
> >--- a/arch/mips/kvm/mips.c
> >+++ b/arch/mips/kvm/mips.c
> >@@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
> > 	int err, size, offset;
> > 	void *gebase;
> > 	int i;
> >+	struct kvm_vcpu *vcpu;
> > 
> >-	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> >+	if (id >= KVM_MAX_VCPUS) {
> >+		err = -EINVAL;
> >+		goto out;
> >+	}
> > 
> >+	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
> > 	if (!vcpu) {
> > 		err = -ENOMEM;
> > 		goto out;
> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >index 9b7798c7b210..7738202edcce 100644
> >--- a/arch/x86/kvm/x86.c
> >+++ b/arch/x86/kvm/x86.c
> >@@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
> > {
> > 	struct kvm_vcpu *vcpu;
> > 
> >+	if (id >= KVM_MAX_VCPUS)
> >+		return ERR_PTR(-EINVAL);
> >+
> > 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
> > 		printk_once(KERN_WARNING
> > 		"kvm: SMP vm created on host with unstable TSC; "
> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >index 4fd482fb9260..6b6cca3cb488 100644
> >--- a/virt/kvm/kvm_main.c
> >+++ b/virt/kvm/kvm_main.c
> >@@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> > 	int r;
> > 	struct kvm_vcpu *vcpu;
> > 
> >-	if (id >= KVM_MAX_VCPUS)
> >-		return -EINVAL;
> >-
> > 	vcpu = kvm_arch_vcpu_create(kvm, id);
> > 	if (IS_ERR(vcpu))
> > 		return PTR_ERR(vcpu);
> >
> >--
> >To unsubscribe from this list: send the line "unsubscribe kvm" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html  
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22  9:25         ` Greg Kurz
@ 2016-04-22 10:22           ` Cornelia Huck
  2016-04-22 11:19           ` Igor Mammedov
  2016-04-22 13:40           ` Radim Krčmář
  2 siblings, 0 replies; 22+ messages in thread
From: Cornelia Huck @ 2016-04-22 10:22 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Radim Krčmář,
	Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Paul Mackerras, David Gibson

On Fri, 22 Apr 2016 11:25:38 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> On Thu, 21 Apr 2016 19:36:11 +0200
> Radim Krčmář <rkrcmar@redhat.com> wrote:

> > > For other architectures, it is simply KVM_MAX_VCPUS.  
> > 
> > (Other architectures would not implement the capability.)
> > 
> 
> So this would be KVM_CAP_PPC_MAX_VCPU_ID ?
> 
> > >> I think this would also clarify the connection between VCPU limit and
> > >> VCPU_ID limit.  Or is a boolean cap better?
> > >>   
> > > 
> > > Well, I'm not fan of adding a generic API to handle a corner case...  
> > 
> > I don't like it either, but I think that introducing the capability is
> > worth avoided problems.
> > 
> 
> I admit that having separate capabilities for the number of vcpus and the
> maximum vcpu id fixes the confusion once and for all.

Yes, and I think that the new max_vpcu_id cap should be generic for
that reason.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22  9:25         ` Greg Kurz
  2016-04-22 10:22           ` Cornelia Huck
@ 2016-04-22 11:19           ` Igor Mammedov
  2016-04-22 13:48             ` Radim Krčmář
  2016-04-22 13:40           ` Radim Krčmář
  2 siblings, 1 reply; 22+ messages in thread
From: Igor Mammedov @ 2016-04-22 11:19 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Radim Krčmář,
	Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

On Fri, 22 Apr 2016 11:25:38 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> Hi Radim !
> 
> On Thu, 21 Apr 2016 19:36:11 +0200
> Radim Krčmář <rkrcmar@redhat.com> wrote:
> 
> > 2016-04-21 18:45+0200, Greg Kurz:  
> > > On Thu, 21 Apr 2016 18:00:19 +0200
> > > Radim Krčmář <rkrcmar@redhat.com> wrote:    
> > >> 2016-04-21 16:20+0200, Greg Kurz:    
[...]
> > >                                                                      maybe later
> > > if we have other scenarios where vcpu ids need to cross the limit ?    
> > 
> > x86 is going to have that soon too -- vcpu_id will be able to range from
> > 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
> > won't be improved to actually scale, so MAX_CPUS will remain lower.
> >  
That's not true, x86 is going to stick with KVM_MAX_VCPUS/qemu's max_cpus,
the only thing that is going to change is that max supported APIC ID
value will be in range 0 to 2^32-1 vs current 8bit one
and since APIC ID is not vcpu_id so it won't affect vcpu_id.
 
> 
> Do you have some pointers to share so that we can see the broader picture ?
> 
> Thanks !
> 
> --
> Greg
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22  9:25         ` Greg Kurz
  2016-04-22 10:22           ` Cornelia Huck
  2016-04-22 11:19           ` Igor Mammedov
@ 2016-04-22 13:40           ` Radim Krčmář
  2016-04-22 14:50             ` Greg Kurz
  2 siblings, 1 reply; 22+ messages in thread
From: Radim Krčmář @ 2016-04-22 13:40 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

2016-04-22 11:25+0200, Greg Kurz:
> Hi Radim !
> 
> On Thu, 21 Apr 2016 19:36:11 +0200
> Radim Krčmář <rkrcmar@redhat.com> wrote:
> 
> > 2016-04-21 18:45+0200, Greg Kurz:
> > > On Thu, 21 Apr 2016 18:00:19 +0200
> > > Radim Krčmář <rkrcmar@redhat.com> wrote:  
> > >> 2016-04-21 16:20+0200, Greg Kurz:  
> > >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > >> > introduced a check to prevent potential kernel memory corruption in case
> > >> > the vcpu id is too great.
> > >> > 
> > >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > >> > 
> > >> > This means the check does not belong here and should be moved to some arch
> > >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > >> > 
> > >> > ARM and s390 already have such a check.
> > >> > 
> > >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > >> > id is used as described in the above commit: I believe PowerPC can live
> > >> > without this check.
> > >> > 
> > >> > In the end, this patch simply moves the check to MIPS and x86. While here,
> > >> > we also update the documentation to dissociate vcpu ids from the maximum
> > >> > number of vcpus per virtual machine.
> > >> > 
> > >> > Acked-by: James Hogan <james.hogan@imgtec.com>
> > >> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > >> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > >> > ---
> > >> > v4: - updated subject for more clarity on what the patch does
> > >> >     - added James's and Connie's A-b tags
> > >> >     - updated documentation
> > >> > 
> > >> >  Documentation/virtual/kvm/api.txt |    7 +++----
> > >> >  arch/mips/kvm/mips.c              |    7 ++++++-
> > >> >  arch/x86/kvm/x86.c                |    3 +++
> > >> >  virt/kvm/kvm_main.c               |    3 ---
> > >> >  4 files changed, 12 insertions(+), 8 deletions(-)
> > >> > 
> > >> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > >> > index 4d0542c5206b..486a1d783b82 100644
> > >> > --- a/Documentation/virtual/kvm/api.txt
> > >> > +++ b/Documentation/virtual/kvm/api.txt
> > >> > @@ -199,11 +199,10 @@ Type: vm ioctl
> > >> >  Parameters: vcpu id (apic id on x86)
> > >> >  Returns: vcpu fd on success, -1 on error
> > >> >  
> > >> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> > >> > -in the range [0, max_vcpus).
> > >> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.    
> > >> 
> > >> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> > >> provided too high vcpu_id to an old KVM or because new KVM failed in
> > >> other areas.  Not a huge problem (because I expect that userspace will
> > >> die on both), but a new KVM_CAP would be able to disambiguate it.
> > >> 
> > >> Toggleable capability doesn't seem necessary and only PowerPC changes,
> > >> so the capability could be arch specific ... I think that a generic one
> > >> makes more sense, though.
> > >>  
> > > 
> > > I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> > > returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.  
> > 
> > Yes, userspace cannot disambiguate, but would have the option of not
> > doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.
> > 
> 
> It makes sense indeed.
> 
>> > So I understand your concern but would we have a user for this ?  
>> 
>> I think so, new userspace on pre-patch KVM is the most likely one.
>> 
>> Userspace cannot tell that KVM doesn't support the extension and
>> behaving like on patched KVM would result in a failure with cryptic
>> error message, because KVM only returns EINVAL.
>> 
> 
> This is already the case with or without the patch... which only changes
> things for PowerPC userspace.

I guess that the error message from QEMU should be improved then ...
The spec is quite clear that KVM is going to fail because of invalid
vcpu_id.

x86 QEMU does this when the amount of CPUs doesn't fit APIC constraints:
  # qemu-kvm -smp 160,cores=9
  qemu-system-x86_64: max_cpus is too large. APIC ID of last CPU is 278

Not perfectly understandable to the uninitiated, but it gets the point
across.

>                               And in the case of QEMU, we're already
> violating the spec with the way we compute vcpu ids.

The numbering strategy is valid.  KVM spec never said that numbering
cannot be sparse/arbitrary, just that vcpu_id must not exceed MAX_VCPUS.

>> Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
>> recognizing that the user wanted too much?
>> 
> 
> No. The error is caught in generic code and QEMU exits for all archs.
> 
> And BTW, how would QEMU guess that vcpu id is too high ? I see at
> least three paths that can return EINVAL...

I agree, -EINVAL is the problem. :)
(Returning -ERANGE would have made more sense.)

The userspace has to guess if it tries fails, but if it was aware of the
actual limit, it could assume that the creation failed because of high
vcpu_id and report it as a user-amendable error ...
Userspace would better do some sanity checks beforehand instead of
trying and failing, though.

>> >> Userspace also doesn't know the vcpu id limit anymore, and it might
>> >> care.  What do you think about returning the arch-specific limit (or the
>> >> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
>> >>   
>> > 
>> > This is partly true: only arch agnostic code would be lost.
>> > 
>> > Moreover this is a problem for powerpc only at the moment and userspace code
>> > can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.  
>> 
>> How would that work on KVM without this patch?
>> 
> 
> It doesn't work for PowerPC :)
> 
> KVM_CAP_MAX_VCPUS indicates we can can start, say, 1024 vcpus and
> KVM_CAP_PPC_SMT indicates the host has 8 threads per core.
> 
> KVM_CREATE_VCPU returns EINVAL when we start the 128th one because
> it has vcpu_id == 128 * 8 == 1024.
> 
> Of course we can patch QEMU to restrict the maximum number of vcpus
> to MAX_VCPUS / PPC_SMT for PowerPC, but it would be infortunate since
> KVM for PowerPC is sized to run MAX_VCPUS... :\

Yeah.  If we introduced the capability, then QEMU would restrict vcpu_id
to MAX_VCPUS or MAX_VCPU_ID and the number of VCPUS always to MAX_VCPUS.

>> > For other architectures, it is simply KVM_MAX_VCPUS.  
>> 
>> (Other architectures would not implement the capability.)
>> 
> 
> So this would be KVM_CAP_PPC_MAX_VCPU_ID ?

No, need.  The capability could be generic, because the concept is
arch-neutral (and it looks like x86 will use it too).

Unimplemented capabilities return 0, which is why other architectures
don't need to implement it.  Nothing can break:  userspace on old KVM
will get 0 for the capability, so it's going to behave as if we didn't
have this patch and this patch made sure that nothing actually changed
for other architectures.

>> >                                                                      maybe later
>> > if we have other scenarios where vcpu ids need to cross the limit ?  
>> 
>> x86 is going to have that soon too -- vcpu_id will be able to range from
>> 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
>> won't be improved to actually scale, so MAX_CPUS will remain lower.
>> 
> 
> Do you have some pointers to share so that we can see the broader picture ?

At the moment, the most concrete one is x2APIC spec, sorry.

KVM currently supports only xAPIC, which has 8 bit APIC ID.
(The x2APIC interface that you might see is paravirtualized.)

vcpu_id is APIC ID on x86 and x2APIC still has sparse allocation that
encodes topology.  KVM is going to support standard x2APIC (when QEMU
supports interrupt remapping), which opens a possibility where we'd end
up in exactly the same situation where PowerPC is now -- architectural
code would handle vcpu_id bigger than MAX_CPUS, but userspace couldn't
make us of it, because of API limitations.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22 11:19           ` Igor Mammedov
@ 2016-04-22 13:48             ` Radim Krčmář
  0 siblings, 0 replies; 22+ messages in thread
From: Radim Krčmář @ 2016-04-22 13:48 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Greg Kurz, Paolo Bonzini, james.hogan, mingo, linux-mips, kvm,
	linux-kernel, David Hildenbrand, qemu-ppc, Cornelia Huck,
	Paul Mackerras, David Gibson

2016-04-22 13:19+0200, Igor Mammedov:
> On Fri, 22 Apr 2016 11:25:38 +0200
> Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:
>> On Thu, 21 Apr 2016 19:36:11 +0200
>> Radim Krčmář <rkrcmar@redhat.com> wrote:
>> > 2016-04-21 18:45+0200, Greg Kurz:  
>> > > On Thu, 21 Apr 2016 18:00:19 +0200
>> > > Radim Krčmář <rkrcmar@redhat.com> wrote:    
>> > >> 2016-04-21 16:20+0200, Greg Kurz:    
>[...]
>> > >                                                                      maybe later
>> > > if we have other scenarios where vcpu ids need to cross the limit ?    
>> > 
>> > x86 is going to have that soon too -- vcpu_id will be able to range from
>> > 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
>> > won't be improved to actually scale, so MAX_CPUS will remain lower.
>> >  
> That's not true, x86 is going to stick with KVM_MAX_VCPUS/qemu's max_cpus,
> the only thing that is going to change is that max supported APIC ID
> value will be in range 0 to 2^32-1 vs current 8bit one
> and since APIC ID is not vcpu_id so it won't affect vcpu_id.

I wish it wasn't.  vcpu_id is the initial APIC ID -- at least the spec
says so and KVM code behaves like that (QEMU does too).

It doesn't have to be so, though, KVM_SET_LAPIC provides the interface
to set APIC ID.  We'd decouple these two and change some related things.
(And add yet another cap for that? :])

I'll see what would be really needed,

thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22 13:40           ` Radim Krčmář
@ 2016-04-22 14:50             ` Greg Kurz
  2016-04-25 14:15               ` Radim Krčmář
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kurz @ 2016-04-22 14:50 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

On Fri, 22 Apr 2016 15:40:30 +0200
Radim Krčmář <rkrcmar@redhat.com> wrote:

> 2016-04-22 11:25+0200, Greg Kurz:
> > Hi Radim !
> > 
> > On Thu, 21 Apr 2016 19:36:11 +0200
> > Radim Krčmář <rkrcmar@redhat.com> wrote:
> >   
> > > 2016-04-21 18:45+0200, Greg Kurz:  
> > > > On Thu, 21 Apr 2016 18:00:19 +0200
> > > > Radim Krčmář <rkrcmar@redhat.com> wrote:    
> > > >> 2016-04-21 16:20+0200, Greg Kurz:    
> > > >> > Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
> > > >> > introduced a check to prevent potential kernel memory corruption in case
> > > >> > the vcpu id is too great.
> > > >> > 
> > > >> > Unfortunately this check assumes vcpu ids grow in sequence with a common
> > > >> > difference of 1, which is wrong: archs are free to use vcpu id as they fit.
> > > >> > For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
> > > >> > mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
> > > >> > 1024, guests may be limited down to 128 vcpus on POWER8.
> > > >> > 
> > > >> > This means the check does not belong here and should be moved to some arch
> > > >> > specific function: kvm_arch_vcpu_create() looks like a good candidate.
> > > >> > 
> > > >> > ARM and s390 already have such a check.
> > > >> > 
> > > >> > I could not spot any path in the PowerPC or common KVM code where a vcpu
> > > >> > id is used as described in the above commit: I believe PowerPC can live
> > > >> > without this check.
> > > >> > 
> > > >> > In the end, this patch simply moves the check to MIPS and x86. While here,
> > > >> > we also update the documentation to dissociate vcpu ids from the maximum
> > > >> > number of vcpus per virtual machine.
> > > >> > 
> > > >> > Acked-by: James Hogan <james.hogan@imgtec.com>
> > > >> > Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
> > > >> > Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> > > >> > ---
> > > >> > v4: - updated subject for more clarity on what the patch does
> > > >> >     - added James's and Connie's A-b tags
> > > >> >     - updated documentation
> > > >> > 
> > > >> >  Documentation/virtual/kvm/api.txt |    7 +++----
> > > >> >  arch/mips/kvm/mips.c              |    7 ++++++-
> > > >> >  arch/x86/kvm/x86.c                |    3 +++
> > > >> >  virt/kvm/kvm_main.c               |    3 ---
> > > >> >  4 files changed, 12 insertions(+), 8 deletions(-)
> > > >> > 
> > > >> > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > >> > index 4d0542c5206b..486a1d783b82 100644
> > > >> > --- a/Documentation/virtual/kvm/api.txt
> > > >> > +++ b/Documentation/virtual/kvm/api.txt
> > > >> > @@ -199,11 +199,10 @@ Type: vm ioctl
> > > >> >  Parameters: vcpu id (apic id on x86)
> > > >> >  Returns: vcpu fd on success, -1 on error
> > > >> >  
> > > >> > -This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
> > > >> > -in the range [0, max_vcpus).
> > > >> > +This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.      
> > > >> 
> > > >> Userspace won't be able to tell if KVM_CREATE_VCPU failed because it
> > > >> provided too high vcpu_id to an old KVM or because new KVM failed in
> > > >> other areas.  Not a huge problem (because I expect that userspace will
> > > >> die on both), but a new KVM_CAP would be able to disambiguate it.
> > > >> 
> > > >> Toggleable capability doesn't seem necessary and only PowerPC changes,
> > > >> so the capability could be arch specific ... I think that a generic one
> > > >> makes more sense, though.
> > > >>    
> > > > 
> > > > I'm not sure userspace can disambiguate all the cases where KVM_CREATE_VCPU
> > > > returns EINVAL already... and, FWIW, QEMU simply exits if it gets an error.    
> > > 
> > > Yes, userspace cannot disambiguate, but would have the option of not
> > > doing something that is destined to fail, like with KVM_CAP_MAX_VCPU.
> > >   
> > 
> > It makes sense indeed.
> >   
> >> > So I understand your concern but would we have a user for this ?    
> >> 
> >> I think so, new userspace on pre-patch KVM is the most likely one.
> >> 
> >> Userspace cannot tell that KVM doesn't support the extension and
> >> behaving like on patched KVM would result in a failure with cryptic
> >> error message, because KVM only returns EINVAL.
> >>   
> > 
> > This is already the case with or without the patch... which only changes
> > things for PowerPC userspace.  
> 
> I guess that the error message from QEMU should be improved then ...

Definitely.

> The spec is quite clear that KVM is going to fail because of invalid
> vcpu_id.
> 

Agreed.

> x86 QEMU does this when the amount of CPUs doesn't fit APIC constraints:
>   # qemu-kvm -smp 160,cores=9
>   qemu-system-x86_64: max_cpus is too large. APIC ID of last CPU is 278
> 
> Not perfectly understandable to the uninitiated, but it gets the point
> across.
> 
> >                               And in the case of QEMU, we're already
> > violating the spec with the way we compute vcpu ids.  
> 
> The numbering strategy is valid.  KVM spec never said that numbering
> cannot be sparse/arbitrary, just that vcpu_id must not exceed MAX_VCPUS.
> 

You're right... it's just conflicting with the API when trying to run guests with
less threads per core than the host (the situation where we get higher vcpu ids).

This should probably be documented somewhere.

> >> Btw. PowerPC QEMU tries vcpu_id >= KVM_MAX_VCPUS and fails, instead of
> >> recognizing that the user wanted too much?
> >>   
> > 
> > No. The error is caught in generic code and QEMU exits for all archs.
> > 
> > And BTW, how would QEMU guess that vcpu id is too high ? I see at
> > least three paths that can return EINVAL...  
> 
> I agree, -EINVAL is the problem. :)
> (Returning -ERANGE would have made more sense.)
> 
> The userspace has to guess if it tries fails, but if it was aware of the
> actual limit, it could assume that the creation failed because of high
> vcpu_id and report it as a user-amendable error ...
> Userspace would better do some sanity checks beforehand instead of
> trying and failing, though.
> 

I'm convinced. :)

> >> >> Userspace also doesn't know the vcpu id limit anymore, and it might
> >> >> care.  What do you think about returning the arch-specific limit (or the
> >> >> highest positive integer) as int in KVM_CAP_MAX_VCPU_ID?
> >> >>     
> >> > 
> >> > This is partly true: only arch agnostic code would be lost.
> >> > 
> >> > Moreover this is a problem for powerpc only at the moment and userspace code
> >> > can compute the vcpu_id limit out of KVM_CAP_MAX_VCPUS and KVM_CAP_PPC_SMT.    
> >> 
> >> How would that work on KVM without this patch?
> >>   
> > 
> > It doesn't work for PowerPC :)
> > 
> > KVM_CAP_MAX_VCPUS indicates we can can start, say, 1024 vcpus and
> > KVM_CAP_PPC_SMT indicates the host has 8 threads per core.
> > 
> > KVM_CREATE_VCPU returns EINVAL when we start the 128th one because
> > it has vcpu_id == 128 * 8 == 1024.
> > 
> > Of course we can patch QEMU to restrict the maximum number of vcpus
> > to MAX_VCPUS / PPC_SMT for PowerPC, but it would be infortunate since
> > KVM for PowerPC is sized to run MAX_VCPUS... :\  
> 
> Yeah.  If we introduced the capability, then QEMU would restrict vcpu_id
> to MAX_VCPUS or MAX_VCPU_ID and the number of VCPUS always to MAX_VCPUS.
> 

Ok. I'll give a try.

Just to be sure I haven't missed something:
- change the spec to introduce the MAX_VCPU_ID concept
- update all related checks in KVM
- provide a KVM_CAP_MAX_VCPU_ID for userspace

> >> > For other architectures, it is simply KVM_MAX_VCPUS.    
> >> 
> >> (Other architectures would not implement the capability.)
> >>   
> > 
> > So this would be KVM_CAP_PPC_MAX_VCPU_ID ?  
> 
> No, need.  The capability could be generic, because the concept is
> arch-neutral (and it looks like x86 will use it too).
> 
> Unimplemented capabilities return 0, which is why other architectures
> don't need to implement it.  Nothing can break:  userspace on old KVM
> will get 0 for the capability, so it's going to behave as if we didn't
> have this patch and this patch made sure that nothing actually changed
> for other architectures.
> 
> >> >                                                                      maybe later
> >> > if we have other scenarios where vcpu ids need to cross the limit ?    
> >> 
> >> x86 is going to have that soon too -- vcpu_id will be able to range from
> >> 0 to 2^32-1 (or 2^31), but MAX_CPUS related data structures probably
> >> won't be improved to actually scale, so MAX_CPUS will remain lower.
> >>   
> > 
> > Do you have some pointers to share so that we can see the broader picture ?  
> 
> At the moment, the most concrete one is x2APIC spec, sorry.
> 
> KVM currently supports only xAPIC, which has 8 bit APIC ID.
> (The x2APIC interface that you might see is paravirtualized.)
> 
> vcpu_id is APIC ID on x86 and x2APIC still has sparse allocation that
> encodes topology.  KVM is going to support standard x2APIC (when QEMU
> supports interrupt remapping), which opens a possibility where we'd end
> up in exactly the same situation where PowerPC is now -- architectural
> code would handle vcpu_id bigger than MAX_CPUS, but userspace couldn't
> make us of it, because of API limitations.
> 

Thanks for the clarification !

Cheers.

--
Greg

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22  9:30     ` Greg Kurz
@ 2016-04-23  0:51       ` Wei Yang
  0 siblings, 0 replies; 22+ messages in thread
From: Wei Yang @ 2016-04-23  0:51 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Wei Yang, Paolo Bonzini, james.hogan, mingo, linux-mips, kvm,
	rkrcmar, linux-kernel, David Hildenbrand, qemu-ppc,
	Cornelia Huck, Paul Mackerras, David Gibson

On Fri, Apr 22, 2016 at 11:30:45AM +0200, Greg Kurz wrote:
>On Fri, 22 Apr 2016 17:21:03 +0800
>Wei Yang <richard.weiyang@huawei.com> wrote:
>
>> Hi, Greg
>> 
>
>Hi Wei !
>
>> One confusion.
>> 
>> There are 5 kvm_arch_vcpu_create() while in this patch you changed 2 of them.
>> Some particular reason?
>> 
>
>Yes and the reason is given in the changelog:
>- ARM and s390 already have such a check
>- PowerPC can live without this check
>- this patch simply moves the check to MIPS and x86
>
>Does it clarify ?
>

Sure, thanks :-)

>Cheers.
>
>--
>Greg
>
>> On Thu, Apr 21, 2016 at 04:20:53PM +0200, Greg Kurz wrote:
>> >Commit 338c7dbadd26 ("KVM: Improve create VCPU parameter (CVE-2013-4587)")
>> >introduced a check to prevent potential kernel memory corruption in case
>> >the vcpu id is too great.
>> >
>> >Unfortunately this check assumes vcpu ids grow in sequence with a common
>> >difference of 1, which is wrong: archs are free to use vcpu id as they fit.
>> >For example, QEMU originated vcpu ids for PowerPC cpus running in boot3s_hv
>> >mode, can grow with a common difference of 2, 4 or 8: if KVM_MAX_VCPUS is
>> >1024, guests may be limited down to 128 vcpus on POWER8.
>> >
>> >This means the check does not belong here and should be moved to some arch
>> >specific function: kvm_arch_vcpu_create() looks like a good candidate.
>> >
>> >ARM and s390 already have such a check.
>> >
>> >I could not spot any path in the PowerPC or common KVM code where a vcpu
>> >id is used as described in the above commit: I believe PowerPC can live
>> >without this check.
>> >
>> >In the end, this patch simply moves the check to MIPS and x86. While here,
>> >we also update the documentation to dissociate vcpu ids from the maximum
>> >number of vcpus per virtual machine.
>> >
>> >Acked-by: James Hogan <james.hogan@imgtec.com>
>> >Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
>> >Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
>> >---
>> >v4: - updated subject for more clarity on what the patch does
>> >    - added James's and Connie's A-b tags
>> >    - updated documentation
>> >
>> > Documentation/virtual/kvm/api.txt |    7 +++----
>> > arch/mips/kvm/mips.c              |    7 ++++++-
>> > arch/x86/kvm/x86.c                |    3 +++
>> > virt/kvm/kvm_main.c               |    3 ---
>> > 4 files changed, 12 insertions(+), 8 deletions(-)
>> >
>> >diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> >index 4d0542c5206b..486a1d783b82 100644
>> >--- a/Documentation/virtual/kvm/api.txt
>> >+++ b/Documentation/virtual/kvm/api.txt
>> >@@ -199,11 +199,10 @@ Type: vm ioctl
>> > Parameters: vcpu id (apic id on x86)
>> > Returns: vcpu fd on success, -1 on error
>> > 
>> >-This API adds a vcpu to a virtual machine.  The vcpu id is a small integer
>> >-in the range [0, max_vcpus).
>> >+This API adds a vcpu to a virtual machine.  The vcpu id is a positive integer.
>> > 
>> >-The recommended max_vcpus value can be retrieved using the KVM_CAP_NR_VCPUS of
>> >-the KVM_CHECK_EXTENSION ioctl() at run-time.
>> >+The recommended maximum number of vcpus (max_vcpus) can be retrieved using the
>> >+KVM_CAP_NR_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
>> > The maximum possible value for max_vcpus can be retrieved using the
>> > KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
>> > 
>> >diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
>> >index 70ef1a43c114..0278ea146db5 100644
>> >--- a/arch/mips/kvm/mips.c
>> >+++ b/arch/mips/kvm/mips.c
>> >@@ -248,9 +248,14 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
>> > 	int err, size, offset;
>> > 	void *gebase;
>> > 	int i;
>> >+	struct kvm_vcpu *vcpu;
>> > 
>> >-	struct kvm_vcpu *vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>> >+	if (id >= KVM_MAX_VCPUS) {
>> >+		err = -EINVAL;
>> >+		goto out;
>> >+	}
>> > 
>> >+	vcpu = kzalloc(sizeof(struct kvm_vcpu), GFP_KERNEL);
>> > 	if (!vcpu) {
>> > 		err = -ENOMEM;
>> > 		goto out;
>> >diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >index 9b7798c7b210..7738202edcce 100644
>> >--- a/arch/x86/kvm/x86.c
>> >+++ b/arch/x86/kvm/x86.c
>> >@@ -7358,6 +7358,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm,
>> > {
>> > 	struct kvm_vcpu *vcpu;
>> > 
>> >+	if (id >= KVM_MAX_VCPUS)
>> >+		return ERR_PTR(-EINVAL);
>> >+
>> > 	if (check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0)
>> > 		printk_once(KERN_WARNING
>> > 		"kvm: SMP vm created on host with unstable TSC; "
>> >diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> >index 4fd482fb9260..6b6cca3cb488 100644
>> >--- a/virt/kvm/kvm_main.c
>> >+++ b/virt/kvm/kvm_main.c
>> >@@ -2272,9 +2272,6 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
>> > 	int r;
>> > 	struct kvm_vcpu *vcpu;
>> > 
>> >-	if (id >= KVM_MAX_VCPUS)
>> >-		return -EINVAL;
>> >-
>> > 	vcpu = kvm_arch_vcpu_create(kvm, id);
>> > 	if (IS_ERR(vcpu))
>> > 		return PTR_ERR(vcpu);
>> >
>> >--
>> >To unsubscribe from this list: send the line "unsubscribe kvm" in
>> >the body of a message to majordomo@vger.kernel.org
>> >More majordomo info at  http://vger.kernel.org/majordomo-info.html  
>> 

-- 
Richard Yang\nHelp you, Help me

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-22 14:50             ` Greg Kurz
@ 2016-04-25 14:15               ` Radim Krčmář
  2016-04-25 14:30                 ` Greg Kurz
  0 siblings, 1 reply; 22+ messages in thread
From: Radim Krčmář @ 2016-04-25 14:15 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

2016-04-22 16:50+0200, Greg Kurz:
> Just to be sure I haven't missed something:
> - change the spec to introduce the MAX_VCPU_ID concept
> - update all related checks in KVM
> - provide a KVM_CAP_MAX_VCPU_ID for userspace

That is it, thanks a lot!

(From nitpicks that come to my mind ... MAX_VCPU_ID should not be able
 to lower the VCPU_ID limit below MAX_VCPUS.)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 2/2] KVM: move vcpu id checking to archs
  2016-04-25 14:15               ` Radim Krčmář
@ 2016-04-25 14:30                 ` Greg Kurz
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Kurz @ 2016-04-25 14:30 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

On Mon, 25 Apr 2016 16:15:22 +0200
Radim Krčmář <rkrcmar@redhat.com> wrote:

> 2016-04-22 16:50+0200, Greg Kurz:
> > Just to be sure I haven't missed something:
> > - change the spec to introduce the MAX_VCPU_ID concept
> > - update all related checks in KVM
> > - provide a KVM_CAP_MAX_VCPU_ID for userspace  
> 
> That is it, thanks a lot!
> 
> (From nitpicks that come to my mind ... MAX_VCPU_ID should not be able
>  to lower the VCPU_ID limit below MAX_VCPUS.)
> 

Indeed it shouldn't. Thanks for the tip !

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS
  2016-04-21 14:15 ` [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS Greg Kurz
  2016-04-21 14:17   ` David Hildenbrand
@ 2016-04-26  7:44   ` Cornelia Huck
  2016-04-27  9:40   ` Gerg Kurz
  2 siblings, 0 replies; 22+ messages in thread
From: Cornelia Huck @ 2016-04-26  7:44 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, rkrcmar,
	linux-kernel, David Hildenbrand, qemu-ppc, Paul Mackerras,
	David Gibson

On Thu, 21 Apr 2016 16:15:05 +0200
Greg Kurz <gkurz@linux.vnet.ibm.com> wrote:

> Commit c896939f7cff ("KVM: use heuristic for fast VCPU lookup by id") added
> a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
> problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.
> 
> This patch simply reverses the logic so that we only try fast path if the
> vcpu id can be tried as an index in kvm->vcpus[]. The slow path is not
> affected by the change.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---
>  include/linux/kvm_host.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)

Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS
  2016-04-21 14:15 ` [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS Greg Kurz
  2016-04-21 14:17   ` David Hildenbrand
  2016-04-26  7:44   ` Cornelia Huck
@ 2016-04-27  9:40   ` Gerg Kurz
  2016-04-27 14:40     ` Radim Krčmář
  2 siblings, 1 reply; 22+ messages in thread
From: Gerg Kurz @ 2016-04-27  9:40 UTC (permalink / raw)
  To: Paolo Bonzini, james.hogan, mingo
  Cc: linux-mips, kvm, rkrcmar, linux-kernel, David Hildenbrand,
	qemu-ppc, Cornelia Huck, Paul Mackerras, David Gibson,
	Radim Krčmář


Quoting Greg Kurz <gkurz@linux.vnet.ibm.com>:

> Commit c896939f7cff ("KVM: use heuristic for fast VCPU lookup by id") added
> a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
> problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.
>
> This patch simply reverses the logic so that we only try fast path if the
> vcpu id can be tried as an index in kvm->vcpus[]. The slow path is not
> affected by the change.
>
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> ---

Radim,

I think this sanity check is only needed because kvm_get_vcpu() use the
id as an index in kvm->vcpus[]. Checking against the new KVM_MAX_VCPU_ID
would be clearly wrong here.

And this patch got two R-b tags already. Do you agree we keep it ?

Cheers.

--
Greg

>  include/linux/kvm_host.h |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 5276fe0916fc..23bfe1bd159c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -447,12 +447,13 @@ static inline struct kvm_vcpu  
> *kvm_get_vcpu(struct kvm *kvm, int i)
>
>  static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
>  {
> -	struct kvm_vcpu *vcpu;
> +	struct kvm_vcpu *vcpu = NULL;
>  	int i;
>
> -	if (id < 0 || id >= KVM_MAX_VCPUS)
> +	if (id < 0)
>  		return NULL;
> -	vcpu = kvm_get_vcpu(kvm, id);
> +	if (id < KVM_MAX_VCPUS)
> +		vcpu = kvm_get_vcpu(kvm, id);
>  	if (vcpu && vcpu->vcpu_id == id)
>  		return vcpu;
>  	kvm_for_each_vcpu(i, vcpu, kvm)
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS
  2016-04-27  9:40   ` Gerg Kurz
@ 2016-04-27 14:40     ` Radim Krčmář
  0 siblings, 0 replies; 22+ messages in thread
From: Radim Krčmář @ 2016-04-27 14:40 UTC (permalink / raw)
  To: Greg Kurz
  Cc: Paolo Bonzini, james.hogan, mingo, linux-mips, kvm, linux-kernel,
	David Hildenbrand, qemu-ppc, Cornelia Huck, Paul Mackerras,
	David Gibson

2016-04-27 05:40-0400, Gerg Kurz:
> Quoting Greg Kurz <gkurz@linux.vnet.ibm.com>:
> 
>> Commit c896939f7cff ("KVM: use heuristic for fast VCPU lookup by id") added
>> a return path that prevents vcpu ids to exceed KVM_MAX_VCPUS. This is a
>> problem for powerpc where vcpu ids can grow up to 8*KVM_MAX_VCPUS.
>> 
>> This patch simply reverses the logic so that we only try fast path if the
>> vcpu id can be tried as an index in kvm->vcpus[]. The slow path is not
>> affected by the change.
>> 
>> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
>> ---
> 
> Radim,
> 
> I think this sanity check is only needed because kvm_get_vcpu() use the
> id as an index in kvm->vcpus[]. Checking against the new KVM_MAX_VCPU_ID
> would be clearly wrong here.

I agree, checking KVM_MAX_VCPU_ID would be pointless.

> And this patch got two R-b tags already. Do you agree we keep it ?

Yes, thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-04-27 14:40 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-21 14:14 [PATCH v4 0/2] let archs decide for vcpu ids Greg Kurz
2016-04-21 14:15 ` [PATCH v4 1/2] KVM: remove NULL return path for vcpu ids >= KVM_MAX_VCPUS Greg Kurz
2016-04-21 14:17   ` David Hildenbrand
2016-04-21 14:30     ` Greg Kurz
2016-04-26  7:44   ` Cornelia Huck
2016-04-27  9:40   ` Gerg Kurz
2016-04-27 14:40     ` Radim Krčmář
2016-04-21 14:20 ` [PATCH v4 2/2] KVM: move vcpu id checking to archs Greg Kurz
2016-04-21 16:00   ` Radim Krčmář
2016-04-21 16:45     ` Greg Kurz
2016-04-21 17:36       ` Radim Krčmář
2016-04-22  9:25         ` Greg Kurz
2016-04-22 10:22           ` Cornelia Huck
2016-04-22 11:19           ` Igor Mammedov
2016-04-22 13:48             ` Radim Krčmář
2016-04-22 13:40           ` Radim Krčmář
2016-04-22 14:50             ` Greg Kurz
2016-04-25 14:15               ` Radim Krčmář
2016-04-25 14:30                 ` Greg Kurz
2016-04-22  9:21   ` Wei Yang
2016-04-22  9:30     ` Greg Kurz
2016-04-23  0:51       ` Wei Yang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).