All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fixes for broken commit 48afe6e4eabf, Windows fails to boot
@ 2021-05-29  9:13 ` Claudio Fontana
  0 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-05-29  9:13 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Eduardo Habkost,
	Siddharth Chandrasekaran, Philippe Mathieu-Daudé
  Cc: Claudio Fontana, Michael S . Tsirkin, kvm, Marcelo Tosatti,
	Richard Henderson, Cameron Esfahani, Roman Bolshakov, qemu-devel

Commit ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
introduced two bugs that break cpu max and host in the refactoring,
by running initializations in the wrong order.

This small series of two patches is an attempt to correct the situation.

Please provide your test results and feedback, thanks!

Claudio

Claudio Fontana (2):
  i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
  i386: run accel_cpu_instance_init as instance_post_init

 target/i386/cpu.c | 66 +++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 31 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 0/2] Fixes for broken commit 48afe6e4eabf, Windows fails to boot
@ 2021-05-29  9:13 ` Claudio Fontana
  0 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-05-29  9:13 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Eduardo Habkost,
	Siddharth Chandrasekaran, Philippe Mathieu-Daudé
  Cc: kvm, Michael S . Tsirkin, Marcelo Tosatti, Richard Henderson,
	qemu-devel, Cameron Esfahani, Roman Bolshakov, Claudio Fontana

Commit ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
introduced two bugs that break cpu max and host in the refactoring,
by running initializations in the wrong order.

This small series of two patches is an attempt to correct the situation.

Please provide your test results and feedback, thanks!

Claudio

Claudio Fontana (2):
  i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
  i386: run accel_cpu_instance_init as instance_post_init

 target/i386/cpu.c | 66 +++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 31 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
  2021-05-29  9:13 ` Claudio Fontana
@ 2021-05-29  9:13   ` Claudio Fontana
  -1 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-05-29  9:13 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Eduardo Habkost,
	Siddharth Chandrasekaran, Philippe Mathieu-Daudé
  Cc: Claudio Fontana, Michael S . Tsirkin, kvm, Marcelo Tosatti,
	Richard Henderson, Cameron Esfahani, Roman Bolshakov, qemu-devel

we need to expand features first, before we attempt to check them
in the accel realizefn code called by cpu_exec_realizefn().

At the same time we need checks for code_urev and host_cpuid_required,
and modifications to cpu->mwait to happen after the initial setting
of them inside the accel realizefn code.

Partial Fix.

Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 9e211ac2ce..6bcb7dbc2c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
     Error *local_err = NULL;
     static bool ht_warned;
 
-    /* Process Hyper-V enlightenments */
-    x86_cpu_hyperv_realize(cpu);
-
-    cpu_exec_realizefn(cs, &local_err);
-    if (local_err != NULL) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
-    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
-        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
-        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
-        goto out;
-    }
-
-    if (cpu->ucode_rev == 0) {
-        /* The default is the same as KVM's.  */
-        if (IS_AMD_CPU(env)) {
-            cpu->ucode_rev = 0x01000065;
-        } else {
-            cpu->ucode_rev = 0x100000000ULL;
-        }
-    }
-
-    /* mwait extended info: needed for Core compatibility */
-    /* We always wake on interrupt even if host does not have the capability */
-    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
-
     if (cpu->apic_id == UNASSIGNED_APIC_ID) {
         error_setg(errp, "apic-id property was not initialized properly");
         return;
@@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
            & CPUID_EXT2_AMD_ALIASES);
     }
 
+    /* Process Hyper-V enlightenments */
+    x86_cpu_hyperv_realize(cpu);
+
+    cpu_exec_realizefn(cs, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
+        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
+        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
+        goto out;
+    }
+
+    if (cpu->ucode_rev == 0) {
+        /* The default is the same as KVM's.  */
+        if (IS_AMD_CPU(env)) {
+            cpu->ucode_rev = 0x01000065;
+        } else {
+            cpu->ucode_rev = 0x100000000ULL;
+        }
+    }
+
+    /* mwait extended info: needed for Core compatibility */
+    /* We always wake on interrupt even if host does not have the capability */
+    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
+
     /* For 64bit systems think about the number of physical bits to present.
      * ideally this should be the same as the host; anything other than matching
      * the host can cause incorrect guest behaviour.
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
@ 2021-05-29  9:13   ` Claudio Fontana
  0 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-05-29  9:13 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Eduardo Habkost,
	Siddharth Chandrasekaran, Philippe Mathieu-Daudé
  Cc: kvm, Michael S . Tsirkin, Marcelo Tosatti, Richard Henderson,
	qemu-devel, Cameron Esfahani, Roman Bolshakov, Claudio Fontana

we need to expand features first, before we attempt to check them
in the accel realizefn code called by cpu_exec_realizefn().

At the same time we need checks for code_urev and host_cpuid_required,
and modifications to cpu->mwait to happen after the initial setting
of them inside the accel realizefn code.

Partial Fix.

Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 9e211ac2ce..6bcb7dbc2c 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
     Error *local_err = NULL;
     static bool ht_warned;
 
-    /* Process Hyper-V enlightenments */
-    x86_cpu_hyperv_realize(cpu);
-
-    cpu_exec_realizefn(cs, &local_err);
-    if (local_err != NULL) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
-    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
-        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
-        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
-        goto out;
-    }
-
-    if (cpu->ucode_rev == 0) {
-        /* The default is the same as KVM's.  */
-        if (IS_AMD_CPU(env)) {
-            cpu->ucode_rev = 0x01000065;
-        } else {
-            cpu->ucode_rev = 0x100000000ULL;
-        }
-    }
-
-    /* mwait extended info: needed for Core compatibility */
-    /* We always wake on interrupt even if host does not have the capability */
-    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
-
     if (cpu->apic_id == UNASSIGNED_APIC_ID) {
         error_setg(errp, "apic-id property was not initialized properly");
         return;
@@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
            & CPUID_EXT2_AMD_ALIASES);
     }
 
+    /* Process Hyper-V enlightenments */
+    x86_cpu_hyperv_realize(cpu);
+
+    cpu_exec_realizefn(cs, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
+        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
+        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
+        goto out;
+    }
+
+    if (cpu->ucode_rev == 0) {
+        /* The default is the same as KVM's.  */
+        if (IS_AMD_CPU(env)) {
+            cpu->ucode_rev = 0x01000065;
+        } else {
+            cpu->ucode_rev = 0x100000000ULL;
+        }
+    }
+
+    /* mwait extended info: needed for Core compatibility */
+    /* We always wake on interrupt even if host does not have the capability */
+    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
+
     /* For 64bit systems think about the number of physical bits to present.
      * ideally this should be the same as the host; anything other than matching
      * the host can cause incorrect guest behaviour.
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] i386: run accel_cpu_instance_init as instance_post_init
  2021-05-29  9:13 ` Claudio Fontana
@ 2021-05-29  9:13   ` Claudio Fontana
  -1 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-05-29  9:13 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Eduardo Habkost,
	Siddharth Chandrasekaran, Philippe Mathieu-Daudé
  Cc: Claudio Fontana, Michael S . Tsirkin, kvm, Marcelo Tosatti,
	Richard Henderson, Cameron Esfahani, Roman Bolshakov, qemu-devel

This partially fixes host and max cpu initialization,
by running the accel cpu initialization only after all instance
init functions are called for all X86 cpu subclasses.

Partial Fix.

Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 target/i386/cpu.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6bcb7dbc2c..ae148fbd2f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6422,6 +6422,11 @@ static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc,
     x86_cpu_register_bit_prop(xcc, name, w, bitnr);
 }
 
+static void x86_cpu_post_initfn(Object *obj)
+{
+    accel_cpu_instance_init(CPU(obj));
+}
+
 static void x86_cpu_initfn(Object *obj)
 {
     X86CPU *cpu = X86_CPU(obj);
@@ -6473,9 +6478,6 @@ static void x86_cpu_initfn(Object *obj)
     if (xcc->model) {
         x86_cpu_load_model(cpu, xcc->model);
     }
-
-    /* if required, do accelerator-specific cpu initializations */
-    accel_cpu_instance_init(CPU(obj));
 }
 
 static int64_t x86_cpu_get_arch_id(CPUState *cs)
@@ -6810,6 +6812,8 @@ static const TypeInfo x86_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(X86CPU),
     .instance_init = x86_cpu_initfn,
+    .instance_post_init = x86_cpu_post_initfn,
+
     .abstract = true,
     .class_size = sizeof(X86CPUClass),
     .class_init = x86_cpu_common_class_init,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] i386: run accel_cpu_instance_init as instance_post_init
@ 2021-05-29  9:13   ` Claudio Fontana
  0 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-05-29  9:13 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini, Eduardo Habkost,
	Siddharth Chandrasekaran, Philippe Mathieu-Daudé
  Cc: kvm, Michael S . Tsirkin, Marcelo Tosatti, Richard Henderson,
	qemu-devel, Cameron Esfahani, Roman Bolshakov, Claudio Fontana

This partially fixes host and max cpu initialization,
by running the accel cpu initialization only after all instance
init functions are called for all X86 cpu subclasses.

Partial Fix.

Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 target/i386/cpu.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6bcb7dbc2c..ae148fbd2f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6422,6 +6422,11 @@ static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc,
     x86_cpu_register_bit_prop(xcc, name, w, bitnr);
 }
 
+static void x86_cpu_post_initfn(Object *obj)
+{
+    accel_cpu_instance_init(CPU(obj));
+}
+
 static void x86_cpu_initfn(Object *obj)
 {
     X86CPU *cpu = X86_CPU(obj);
@@ -6473,9 +6478,6 @@ static void x86_cpu_initfn(Object *obj)
     if (xcc->model) {
         x86_cpu_load_model(cpu, xcc->model);
     }
-
-    /* if required, do accelerator-specific cpu initializations */
-    accel_cpu_instance_init(CPU(obj));
 }
 
 static int64_t x86_cpu_get_arch_id(CPUState *cs)
@@ -6810,6 +6812,8 @@ static const TypeInfo x86_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(X86CPU),
     .instance_init = x86_cpu_initfn,
+    .instance_post_init = x86_cpu_post_initfn,
+
     .abstract = true,
     .class_size = sizeof(X86CPUClass),
     .class_init = x86_cpu_common_class_init,
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
  2021-05-29  9:13   ` Claudio Fontana
@ 2021-06-01 18:48     ` Eduardo Habkost
  -1 siblings, 0 replies; 18+ messages in thread
From: Eduardo Habkost @ 2021-06-01 18:48 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Peter Maydell, Paolo Bonzini, Siddharth Chandrasekaran,
	Philippe Mathieu-Daudé,
	Michael S . Tsirkin, kvm, Marcelo Tosatti, Richard Henderson,
	Cameron Esfahani, Roman Bolshakov, qemu-devel, Vitaly Kuznetsov

+Vitaly

On Sat, May 29, 2021 at 11:13:12AM +0200, Claudio Fontana wrote:
> we need to expand features first, before we attempt to check them
> in the accel realizefn code called by cpu_exec_realizefn().
> 
> At the same time we need checks for code_urev and host_cpuid_required,
> and modifications to cpu->mwait to happen after the initial setting
> of them inside the accel realizefn code.

I miss an explanation why those 3 steps need to happen after
accel realizefn.

I'm worried by the fragility of the ordering.  If there are
specific things that must be done before/after
cpu_exec_realizefn(), this needs to be clear in the code.

> 
> Partial Fix.
> 
> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")

Shouldn't this be
  f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
?

Also, it looks like part of the ordering change was made in
commit 30565f10e907 ("cpu: call AccelCPUClass::cpu_realizefn in
cpu_exec_realizefn").



> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> ---
>  target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 28 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 9e211ac2ce..6bcb7dbc2c 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>      Error *local_err = NULL;
>      static bool ht_warned;
>  
> -    /* Process Hyper-V enlightenments */
> -    x86_cpu_hyperv_realize(cpu);

Vitaly, is this reordering going to affect the Hyper-V cleanup
work you are doing?  It seems harmless and it makes sense to keep
the "realize" functions close together, but I'd like to confirm.

> -
> -    cpu_exec_realizefn(cs, &local_err);
> -    if (local_err != NULL) {
> -        error_propagate(errp, local_err);
> -        return;
> -    }
> -
> -    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> -        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> -        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> -        goto out;
> -    }
> -
> -    if (cpu->ucode_rev == 0) {
> -        /* The default is the same as KVM's.  */
> -        if (IS_AMD_CPU(env)) {
> -            cpu->ucode_rev = 0x01000065;
> -        } else {
> -            cpu->ucode_rev = 0x100000000ULL;
> -        }
> -    }
> -
> -    /* mwait extended info: needed for Core compatibility */
> -    /* We always wake on interrupt even if host does not have the capability */
> -    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> -
>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
>          error_setg(errp, "apic-id property was not initialized properly");
>          return;
> @@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>             & CPUID_EXT2_AMD_ALIASES);
>      }
>  
> +    /* Process Hyper-V enlightenments */
> +    x86_cpu_hyperv_realize(cpu);
> +
> +    cpu_exec_realizefn(cs, &local_err);

I'm worried by the reordering of cpu_exec_realizefn().  That
function does a lot, and reordering it might have even more
unwanted side effects.

I wonder if it wouldn't be easier to revert commit 30565f10e907
("cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn").


> +    if (local_err != NULL) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> +        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> +        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> +        goto out;
> +    }
> +
> +    if (cpu->ucode_rev == 0) {
> +        /* The default is the same as KVM's.  */
> +        if (IS_AMD_CPU(env)) {
> +            cpu->ucode_rev = 0x01000065;
> +        } else {
> +            cpu->ucode_rev = 0x100000000ULL;
> +        }
> +    }
> +
> +    /* mwait extended info: needed for Core compatibility */
> +    /* We always wake on interrupt even if host does not have the capability */
> +    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> +

The dependency between those lines and cpu_exec_realizefn() is
completely unclear here.  What can we do to make this clearer and
less fragile?

Note that this is not a comment on this fix, specifically, but on
how the initialization ordering is easy to break here.


>      /* For 64bit systems think about the number of physical bits to present.
>       * ideally this should be the same as the host; anything other than matching
>       * the host can cause incorrect guest behaviour.
> -- 
> 2.26.2
> 

-- 
Eduardo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
@ 2021-06-01 18:48     ` Eduardo Habkost
  0 siblings, 0 replies; 18+ messages in thread
From: Eduardo Habkost @ 2021-06-01 18:48 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Peter Maydell, Marcelo Tosatti, kvm, Siddharth Chandrasekaran,
	Michael S . Tsirkin, Richard Henderson, qemu-devel,
	Cameron Esfahani, Roman Bolshakov, Paolo Bonzini,
	Vitaly Kuznetsov, Philippe Mathieu-Daudé

+Vitaly

On Sat, May 29, 2021 at 11:13:12AM +0200, Claudio Fontana wrote:
> we need to expand features first, before we attempt to check them
> in the accel realizefn code called by cpu_exec_realizefn().
> 
> At the same time we need checks for code_urev and host_cpuid_required,
> and modifications to cpu->mwait to happen after the initial setting
> of them inside the accel realizefn code.

I miss an explanation why those 3 steps need to happen after
accel realizefn.

I'm worried by the fragility of the ordering.  If there are
specific things that must be done before/after
cpu_exec_realizefn(), this needs to be clear in the code.

> 
> Partial Fix.
> 
> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")

Shouldn't this be
  f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
?

Also, it looks like part of the ordering change was made in
commit 30565f10e907 ("cpu: call AccelCPUClass::cpu_realizefn in
cpu_exec_realizefn").



> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> ---
>  target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 28 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 9e211ac2ce..6bcb7dbc2c 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>      Error *local_err = NULL;
>      static bool ht_warned;
>  
> -    /* Process Hyper-V enlightenments */
> -    x86_cpu_hyperv_realize(cpu);

Vitaly, is this reordering going to affect the Hyper-V cleanup
work you are doing?  It seems harmless and it makes sense to keep
the "realize" functions close together, but I'd like to confirm.

> -
> -    cpu_exec_realizefn(cs, &local_err);
> -    if (local_err != NULL) {
> -        error_propagate(errp, local_err);
> -        return;
> -    }
> -
> -    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> -        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> -        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> -        goto out;
> -    }
> -
> -    if (cpu->ucode_rev == 0) {
> -        /* The default is the same as KVM's.  */
> -        if (IS_AMD_CPU(env)) {
> -            cpu->ucode_rev = 0x01000065;
> -        } else {
> -            cpu->ucode_rev = 0x100000000ULL;
> -        }
> -    }
> -
> -    /* mwait extended info: needed for Core compatibility */
> -    /* We always wake on interrupt even if host does not have the capability */
> -    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> -
>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
>          error_setg(errp, "apic-id property was not initialized properly");
>          return;
> @@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>             & CPUID_EXT2_AMD_ALIASES);
>      }
>  
> +    /* Process Hyper-V enlightenments */
> +    x86_cpu_hyperv_realize(cpu);
> +
> +    cpu_exec_realizefn(cs, &local_err);

I'm worried by the reordering of cpu_exec_realizefn().  That
function does a lot, and reordering it might have even more
unwanted side effects.

I wonder if it wouldn't be easier to revert commit 30565f10e907
("cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn").


> +    if (local_err != NULL) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> +        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> +        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> +        goto out;
> +    }
> +
> +    if (cpu->ucode_rev == 0) {
> +        /* The default is the same as KVM's.  */
> +        if (IS_AMD_CPU(env)) {
> +            cpu->ucode_rev = 0x01000065;
> +        } else {
> +            cpu->ucode_rev = 0x100000000ULL;
> +        }
> +    }
> +
> +    /* mwait extended info: needed for Core compatibility */
> +    /* We always wake on interrupt even if host does not have the capability */
> +    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> +

The dependency between those lines and cpu_exec_realizefn() is
completely unclear here.  What can we do to make this clearer and
less fragile?

Note that this is not a comment on this fix, specifically, but on
how the initialization ordering is easy to break here.


>      /* For 64bit systems think about the number of physical bits to present.
>       * ideally this should be the same as the host; anything other than matching
>       * the host can cause incorrect guest behaviour.
> -- 
> 2.26.2
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] i386: run accel_cpu_instance_init as instance_post_init
  2021-05-29  9:13   ` Claudio Fontana
@ 2021-06-01 18:59     ` Eduardo Habkost
  -1 siblings, 0 replies; 18+ messages in thread
From: Eduardo Habkost @ 2021-06-01 18:59 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Peter Maydell, Paolo Bonzini, Siddharth Chandrasekaran,
	Philippe Mathieu-Daudé,
	Michael S . Tsirkin, kvm, Marcelo Tosatti, Richard Henderson,
	Cameron Esfahani, Roman Bolshakov, qemu-devel

On Sat, May 29, 2021 at 11:13:13AM +0200, Claudio Fontana wrote:
> This partially fixes host and max cpu initialization,
> by running the accel cpu initialization only after all instance
> init functions are called for all X86 cpu subclasses.

Can you describe what exactly are the initialization ordering
dependencies that were broken?

> 
> Partial Fix.

What does "partial fix" mean?

> 
> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> Signed-off-by: Claudio Fontana <cfontana@suse.de>

The fix looks simple and may be obvious, my only concerns are:

1. Testing.  Luckily we are a bit early in the release cycle so
   we have some time for that.
2. Describing more clearly what exactly was wrong.  This can be
   fixed manually in the commit message when applying the patch.


An even better long term solution would be removing the
initialization ordering dependencies and make
accel_cpu_instance_init() safe to be called earlier.  Would that
be doable?


> ---
>  target/i386/cpu.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 6bcb7dbc2c..ae148fbd2f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6422,6 +6422,11 @@ static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc,
>      x86_cpu_register_bit_prop(xcc, name, w, bitnr);
>  }
>  
> +static void x86_cpu_post_initfn(Object *obj)
> +{
> +    accel_cpu_instance_init(CPU(obj));
> +}
> +
>  static void x86_cpu_initfn(Object *obj)
>  {
>      X86CPU *cpu = X86_CPU(obj);
> @@ -6473,9 +6478,6 @@ static void x86_cpu_initfn(Object *obj)
>      if (xcc->model) {
>          x86_cpu_load_model(cpu, xcc->model);
>      }
> -
> -    /* if required, do accelerator-specific cpu initializations */
> -    accel_cpu_instance_init(CPU(obj));
>  }
>  
>  static int64_t x86_cpu_get_arch_id(CPUState *cs)
> @@ -6810,6 +6812,8 @@ static const TypeInfo x86_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(X86CPU),
>      .instance_init = x86_cpu_initfn,
> +    .instance_post_init = x86_cpu_post_initfn,
> +
>      .abstract = true,
>      .class_size = sizeof(X86CPUClass),
>      .class_init = x86_cpu_common_class_init,
> -- 
> 2.26.2
> 

-- 
Eduardo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] i386: run accel_cpu_instance_init as instance_post_init
@ 2021-06-01 18:59     ` Eduardo Habkost
  0 siblings, 0 replies; 18+ messages in thread
From: Eduardo Habkost @ 2021-06-01 18:59 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Peter Maydell, Marcelo Tosatti, kvm, Siddharth Chandrasekaran,
	Michael S . Tsirkin, Richard Henderson, qemu-devel,
	Cameron Esfahani, Roman Bolshakov, Paolo Bonzini,
	Philippe Mathieu-Daudé

On Sat, May 29, 2021 at 11:13:13AM +0200, Claudio Fontana wrote:
> This partially fixes host and max cpu initialization,
> by running the accel cpu initialization only after all instance
> init functions are called for all X86 cpu subclasses.

Can you describe what exactly are the initialization ordering
dependencies that were broken?

> 
> Partial Fix.

What does "partial fix" mean?

> 
> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> Signed-off-by: Claudio Fontana <cfontana@suse.de>

The fix looks simple and may be obvious, my only concerns are:

1. Testing.  Luckily we are a bit early in the release cycle so
   we have some time for that.
2. Describing more clearly what exactly was wrong.  This can be
   fixed manually in the commit message when applying the patch.


An even better long term solution would be removing the
initialization ordering dependencies and make
accel_cpu_instance_init() safe to be called earlier.  Would that
be doable?


> ---
>  target/i386/cpu.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 6bcb7dbc2c..ae148fbd2f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -6422,6 +6422,11 @@ static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc,
>      x86_cpu_register_bit_prop(xcc, name, w, bitnr);
>  }
>  
> +static void x86_cpu_post_initfn(Object *obj)
> +{
> +    accel_cpu_instance_init(CPU(obj));
> +}
> +
>  static void x86_cpu_initfn(Object *obj)
>  {
>      X86CPU *cpu = X86_CPU(obj);
> @@ -6473,9 +6478,6 @@ static void x86_cpu_initfn(Object *obj)
>      if (xcc->model) {
>          x86_cpu_load_model(cpu, xcc->model);
>      }
> -
> -    /* if required, do accelerator-specific cpu initializations */
> -    accel_cpu_instance_init(CPU(obj));
>  }
>  
>  static int64_t x86_cpu_get_arch_id(CPUState *cs)
> @@ -6810,6 +6812,8 @@ static const TypeInfo x86_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(X86CPU),
>      .instance_init = x86_cpu_initfn,
> +    .instance_post_init = x86_cpu_post_initfn,
> +
>      .abstract = true,
>      .class_size = sizeof(X86CPUClass),
>      .class_init = x86_cpu_common_class_init,
> -- 
> 2.26.2
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] i386: run accel_cpu_instance_init as instance_post_init
  2021-06-01 18:59     ` Eduardo Habkost
@ 2021-06-03  7:38       ` Claudio Fontana
  -1 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-06-03  7:38 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Peter Maydell, Paolo Bonzini, Siddharth Chandrasekaran,
	Philippe Mathieu-Daudé,
	Michael S . Tsirkin, kvm, Marcelo Tosatti, Richard Henderson,
	Cameron Esfahani, Roman Bolshakov, qemu-devel

On 6/1/21 8:59 PM, Eduardo Habkost wrote:
> On Sat, May 29, 2021 at 11:13:13AM +0200, Claudio Fontana wrote:
>> This partially fixes host and max cpu initialization,
>> by running the accel cpu initialization only after all instance
>> init functions are called for all X86 cpu subclasses.
> 
> Can you describe what exactly are the initialization ordering
> dependencies that were broken?
> 
>>
>> Partial Fix.
> 
> What does "partial fix" mean?

Hi Eduardo, this is in response to the report that Windows + UEFI is not booting anymore:

https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg06692.html

Probably "Partial Fix" does not make sense in the context of the commit log, as there the context of the overall issue is not present.

> 
>>
>> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> 
> The fix looks simple and may be obvious, my only concerns are:
> 
> 1. Testing.  Luckily we are a bit early in the release cycle so
>    we have some time for that.


Indeed, and unfortunately neither make check nor gitlab CI capture this issue.
Smoke testing a boot with OVMF_VARS.secboot.fd does:

./build/x86_64-softmmu/qemu-system-x86_64 \
        -cpu host \
        -enable-kvm \
        -name test,debug-threads=on \
        -smp 1,threads=1,cores=1,sockets=1 \
        -m 4G \
        -net nic -net user \
        -boot d,menu=on \
        -usbdevice tablet \
        -machine q35,smm=on \
        -drive if=pflash,format=raw,readonly=on,unit=0,file="./OVMF_CODE.secboot.fd" \
        -drive if=pflash,format=raw,unit=1,file="./OVMF_VARS.secboot.fd" \
        -global ICH9-LPC.disable_s3=1 \
        -global driver=cfi.pflash01,property=secure,value=on \
        -cdrom "./Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO" \
        -device ahci,id=ahci

> 2. Describing more clearly what exactly was wrong.  This can be
>    fixed manually in the commit message when applying the patch.
> 

"max features" cpu models (cpu "max" and "host", which is child of "max") rely on the instance initialization code
to set max_features = true;

this is then checked during feature expansion to add features not explicitly set by the user.

The original code in the instance initialization was performing different actions depending on the accelerator,
with a fallback "else" that was used to set properties for TCG and all other accelerators not explicitly handled (KVM and HVF).

My patch went on to split the code that was accelerator-specific, into kvm accel-cpu and hvf accel-cpu,
to perform the accelerator-specific initializations there, for the case where cpu->max_features is true.

However I inserted the call to the accel cpu initialization at the end of the x86 base class code initialization (x86_cpu_initfn).

That was wrong, because the code in the accel cpu for KVM, for example, ends up being called _before_ the subclass ("max") initialization,
so it has no chance to see "max_features = true", and so does not initialize the KVM properties.
And further, since the subclass "max" code gets called after kvm-cpu accel, it goes on and writes defaults properties that were supposed to be overridden for the KVM specialization.


> 
> An even better long term solution would be removing the
> initialization ordering dependencies and make
> accel_cpu_instance_init() safe to be called earlier.  Would that
> be doable?


I think that the max_features mechanism would need to be reworked for that, currently I do not see how to make this doable with the existing way x86 is initialized and realized.

If the post_init method works for everybody I personally think we should go with that,
along with an improvement of our tests (make check + CI) to capture such problems specifically looking at cpu "max" and "host".

Thanks,

Claudio

> 
> 
>> ---
>>  target/i386/cpu.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 6bcb7dbc2c..ae148fbd2f 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -6422,6 +6422,11 @@ static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc,
>>      x86_cpu_register_bit_prop(xcc, name, w, bitnr);
>>  }
>>  
>> +static void x86_cpu_post_initfn(Object *obj)
>> +{
>> +    accel_cpu_instance_init(CPU(obj));
>> +}
>> +
>>  static void x86_cpu_initfn(Object *obj)
>>  {
>>      X86CPU *cpu = X86_CPU(obj);
>> @@ -6473,9 +6478,6 @@ static void x86_cpu_initfn(Object *obj)
>>      if (xcc->model) {
>>          x86_cpu_load_model(cpu, xcc->model);
>>      }
>> -
>> -    /* if required, do accelerator-specific cpu initializations */
>> -    accel_cpu_instance_init(CPU(obj));
>>  }
>>  
>>  static int64_t x86_cpu_get_arch_id(CPUState *cs)
>> @@ -6810,6 +6812,8 @@ static const TypeInfo x86_cpu_type_info = {
>>      .parent = TYPE_CPU,
>>      .instance_size = sizeof(X86CPU),
>>      .instance_init = x86_cpu_initfn,
>> +    .instance_post_init = x86_cpu_post_initfn,
>> +
>>      .abstract = true,
>>      .class_size = sizeof(X86CPUClass),
>>      .class_init = x86_cpu_common_class_init,
>> -- 
>> 2.26.2
>>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] i386: run accel_cpu_instance_init as instance_post_init
@ 2021-06-03  7:38       ` Claudio Fontana
  0 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-06-03  7:38 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Peter Maydell, Marcelo Tosatti, kvm, Siddharth Chandrasekaran,
	Michael S . Tsirkin, Richard Henderson, qemu-devel,
	Cameron Esfahani, Roman Bolshakov, Paolo Bonzini,
	Philippe Mathieu-Daudé

On 6/1/21 8:59 PM, Eduardo Habkost wrote:
> On Sat, May 29, 2021 at 11:13:13AM +0200, Claudio Fontana wrote:
>> This partially fixes host and max cpu initialization,
>> by running the accel cpu initialization only after all instance
>> init functions are called for all X86 cpu subclasses.
> 
> Can you describe what exactly are the initialization ordering
> dependencies that were broken?
> 
>>
>> Partial Fix.
> 
> What does "partial fix" mean?

Hi Eduardo, this is in response to the report that Windows + UEFI is not booting anymore:

https://lists.gnu.org/archive/html/qemu-devel/2021-05/msg06692.html

Probably "Partial Fix" does not make sense in the context of the commit log, as there the context of the overall issue is not present.

> 
>>
>> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> 
> The fix looks simple and may be obvious, my only concerns are:
> 
> 1. Testing.  Luckily we are a bit early in the release cycle so
>    we have some time for that.


Indeed, and unfortunately neither make check nor gitlab CI capture this issue.
Smoke testing a boot with OVMF_VARS.secboot.fd does:

./build/x86_64-softmmu/qemu-system-x86_64 \
        -cpu host \
        -enable-kvm \
        -name test,debug-threads=on \
        -smp 1,threads=1,cores=1,sockets=1 \
        -m 4G \
        -net nic -net user \
        -boot d,menu=on \
        -usbdevice tablet \
        -machine q35,smm=on \
        -drive if=pflash,format=raw,readonly=on,unit=0,file="./OVMF_CODE.secboot.fd" \
        -drive if=pflash,format=raw,unit=1,file="./OVMF_VARS.secboot.fd" \
        -global ICH9-LPC.disable_s3=1 \
        -global driver=cfi.pflash01,property=secure,value=on \
        -cdrom "./Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO" \
        -device ahci,id=ahci

> 2. Describing more clearly what exactly was wrong.  This can be
>    fixed manually in the commit message when applying the patch.
> 

"max features" cpu models (cpu "max" and "host", which is child of "max") rely on the instance initialization code
to set max_features = true;

this is then checked during feature expansion to add features not explicitly set by the user.

The original code in the instance initialization was performing different actions depending on the accelerator,
with a fallback "else" that was used to set properties for TCG and all other accelerators not explicitly handled (KVM and HVF).

My patch went on to split the code that was accelerator-specific, into kvm accel-cpu and hvf accel-cpu,
to perform the accelerator-specific initializations there, for the case where cpu->max_features is true.

However I inserted the call to the accel cpu initialization at the end of the x86 base class code initialization (x86_cpu_initfn).

That was wrong, because the code in the accel cpu for KVM, for example, ends up being called _before_ the subclass ("max") initialization,
so it has no chance to see "max_features = true", and so does not initialize the KVM properties.
And further, since the subclass "max" code gets called after kvm-cpu accel, it goes on and writes defaults properties that were supposed to be overridden for the KVM specialization.


> 
> An even better long term solution would be removing the
> initialization ordering dependencies and make
> accel_cpu_instance_init() safe to be called earlier.  Would that
> be doable?


I think that the max_features mechanism would need to be reworked for that, currently I do not see how to make this doable with the existing way x86 is initialized and realized.

If the post_init method works for everybody I personally think we should go with that,
along with an improvement of our tests (make check + CI) to capture such problems specifically looking at cpu "max" and "host".

Thanks,

Claudio

> 
> 
>> ---
>>  target/i386/cpu.c | 10 +++++++---
>>  1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 6bcb7dbc2c..ae148fbd2f 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -6422,6 +6422,11 @@ static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc,
>>      x86_cpu_register_bit_prop(xcc, name, w, bitnr);
>>  }
>>  
>> +static void x86_cpu_post_initfn(Object *obj)
>> +{
>> +    accel_cpu_instance_init(CPU(obj));
>> +}
>> +
>>  static void x86_cpu_initfn(Object *obj)
>>  {
>>      X86CPU *cpu = X86_CPU(obj);
>> @@ -6473,9 +6478,6 @@ static void x86_cpu_initfn(Object *obj)
>>      if (xcc->model) {
>>          x86_cpu_load_model(cpu, xcc->model);
>>      }
>> -
>> -    /* if required, do accelerator-specific cpu initializations */
>> -    accel_cpu_instance_init(CPU(obj));
>>  }
>>  
>>  static int64_t x86_cpu_get_arch_id(CPUState *cs)
>> @@ -6810,6 +6812,8 @@ static const TypeInfo x86_cpu_type_info = {
>>      .parent = TYPE_CPU,
>>      .instance_size = sizeof(X86CPU),
>>      .instance_init = x86_cpu_initfn,
>> +    .instance_post_init = x86_cpu_post_initfn,
>> +
>>      .abstract = true,
>>      .class_size = sizeof(X86CPUClass),
>>      .class_init = x86_cpu_common_class_init,
>> -- 
>> 2.26.2
>>
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
  2021-06-01 18:48     ` Eduardo Habkost
@ 2021-06-03  8:13       ` Claudio Fontana
  -1 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-06-03  8:13 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Peter Maydell, Paolo Bonzini, Siddharth Chandrasekaran,
	Philippe Mathieu-Daudé,
	Michael S . Tsirkin, kvm, Marcelo Tosatti, Richard Henderson,
	Cameron Esfahani, Roman Bolshakov, qemu-devel, Vitaly Kuznetsov

On 6/1/21 8:48 PM, Eduardo Habkost wrote:
> +Vitaly
> 
> On Sat, May 29, 2021 at 11:13:12AM +0200, Claudio Fontana wrote:
>> we need to expand features first, before we attempt to check them
>> in the accel realizefn code called by cpu_exec_realizefn().
>>
>> At the same time we need checks for code_urev and host_cpuid_required,
>> and modifications to cpu->mwait to happen after the initial setting
>> of them inside the accel realizefn code.
> 
> I miss an explanation why those 3 steps need to happen after
> accel realizefn.
> 
> I'm worried by the fragility of the ordering.  If there are
> specific things that must be done before/after
> cpu_exec_realizefn(), this needs to be clear in the code.

Hi Eduardo,

indeed the initialization and realization code for x86 appears to be very sensitive to ordering.
This continues to be the case after the fix.

cpu_exec_realizefn



> 
>>
>> Partial Fix.
>>
>> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> 
> Shouldn't this be
>   f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> ?
> 
> Also, it looks like part of the ordering change was made in
> commit 30565f10e907 ("cpu: call AccelCPUClass::cpu_realizefn in
> cpu_exec_realizefn").
> 
> 
> 
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>> ---
>>  target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
>>  1 file changed, 28 insertions(+), 28 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 9e211ac2ce..6bcb7dbc2c 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>      Error *local_err = NULL;
>>      static bool ht_warned;
>>  
>> -    /* Process Hyper-V enlightenments */
>> -    x86_cpu_hyperv_realize(cpu);
> 
> Vitaly, is this reordering going to affect the Hyper-V cleanup
> work you are doing?  It seems harmless and it makes sense to keep
> the "realize" functions close together, but I'd like to confirm.
> 
>> -
>> -    cpu_exec_realizefn(cs, &local_err);
>> -    if (local_err != NULL) {
>> -        error_propagate(errp, local_err);
>> -        return;
>> -    }
>> -
>> -    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
>> -        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
>> -        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
>> -        goto out;
>> -    }
>> -
>> -    if (cpu->ucode_rev == 0) {
>> -        /* The default is the same as KVM's.  */
>> -        if (IS_AMD_CPU(env)) {
>> -            cpu->ucode_rev = 0x01000065;
>> -        } else {
>> -            cpu->ucode_rev = 0x100000000ULL;
>> -        }
>> -    }
>> -
>> -    /* mwait extended info: needed for Core compatibility */
>> -    /* We always wake on interrupt even if host does not have the capability */
>> -    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>> -
>>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
>>          error_setg(errp, "apic-id property was not initialized properly");
>>          return;
>> @@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>             & CPUID_EXT2_AMD_ALIASES);
>>      }
>>  
>> +    /* Process Hyper-V enlightenments */
>> +    x86_cpu_hyperv_realize(cpu);
>> +
>> +    cpu_exec_realizefn(cs, &local_err);
> 
> I'm worried by the reordering of cpu_exec_realizefn().  That
> function does a lot, and reordering it might have even more
> unwanted side effects.
> 
> I wonder if it wouldn't be easier to revert commit 30565f10e907
> ("cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn").

the part that is wrong in that commit does not I think have to do with where the accel_cpu::cpu_realizefn() is called, but rather
the move of the call to cpu_exec_realizefn (now including the accel_cpu::cpu_realizefn) to the very beginning of the function.

This was done due to the fact that the accel-specific code needs to be called before the code:

* if (cpu->ucode_rev == 0) {

(meaning "use default if nothing accelerator specific has been set"), as this could be set by accel-specific code,

* cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;

(as the mwait is written to by cpu "host" kvm/hvf-specific code when enabling cpu pm),

* if (cpu->phys_bits && ...

(as the phys bits can be set by calling the host CPUID)

But I missed there that we cannot move the call before the expansion of cpu features,
as the accel realize code checks and enables additional features assuming expansion has already happened.

This was what was breaking the cpu "host" phys bits, even after correcting the cpu instance initialization order,
as the KVM/HVF -specific code would do the adjustment of phys bits to match the host only if:

* if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {



> 
> 
>> +    if (local_err != NULL) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
>> +        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
>> +        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
>> +        goto out;
>> +    }
>> +
>> +    if (cpu->ucode_rev == 0) {
>> +        /* The default is the same as KVM's.  */
>> +        if (IS_AMD_CPU(env)) {
>> +            cpu->ucode_rev = 0x01000065;
>> +        } else {
>> +            cpu->ucode_rev = 0x100000000ULL;
>> +        }
>> +    }
>> +
>> +    /* mwait extended info: needed for Core compatibility */
>> +    /* We always wake on interrupt even if host does not have the capability */
>> +    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>> +
> 
> The dependency between those lines and cpu_exec_realizefn() is
> completely unclear here.  What can we do to make this clearer and
> less fragile?

Should I add something similar to my comment above?

There _is_ something already in the patch that I added as I detected these dependencies,
but I notably did not mention the mwait one, and missed completely the cpu expansion issue.

It's in kvm-cpu.c:

static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
{
    /*                                                                                                                                      
     * The realize order is important, since x86_cpu_realize() checks if                                                                    
     * nothing else has been set by the user (or by accelerators) in                                                                        
     * cpu->ucode_rev and cpu->phys_bits.                                                                                                   
     *                                                                                                                                      
     * realize order:                                                                                                                       
     * kvm_cpu -> host_cpu -> x86_cpu                                                                                                       
     */

Maybe there is a better place to document this, where we could also describe in more detail the other dependencies?

On my side the main question would be: did I miss some other order dependency?

Knowing exactly what the current code assumptions are, and where those dependencies lie
would be preferable I think compared with reverting the commits.

Something able to cover this with reliable tests would be a good way to feel more confident with the resolution,
to make sure that something else is not hiding in there..

> 
> Note that this is not a comment on this fix, specifically, but on
> how the initialization ordering is easy to break here.
> 
> 
>>      /* For 64bit systems think about the number of physical bits to present.
>>       * ideally this should be the same as the host; anything other than matching
>>       * the host can cause incorrect guest behaviour.
>> -- 
>> 2.26.2
>>
> 

Thanks,

Claudio

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
@ 2021-06-03  8:13       ` Claudio Fontana
  0 siblings, 0 replies; 18+ messages in thread
From: Claudio Fontana @ 2021-06-03  8:13 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Peter Maydell, Marcelo Tosatti, kvm, Siddharth Chandrasekaran,
	Michael S . Tsirkin, Richard Henderson, qemu-devel,
	Cameron Esfahani, Roman Bolshakov, Paolo Bonzini,
	Vitaly Kuznetsov, Philippe Mathieu-Daudé

On 6/1/21 8:48 PM, Eduardo Habkost wrote:
> +Vitaly
> 
> On Sat, May 29, 2021 at 11:13:12AM +0200, Claudio Fontana wrote:
>> we need to expand features first, before we attempt to check them
>> in the accel realizefn code called by cpu_exec_realizefn().
>>
>> At the same time we need checks for code_urev and host_cpuid_required,
>> and modifications to cpu->mwait to happen after the initial setting
>> of them inside the accel realizefn code.
> 
> I miss an explanation why those 3 steps need to happen after
> accel realizefn.
> 
> I'm worried by the fragility of the ordering.  If there are
> specific things that must be done before/after
> cpu_exec_realizefn(), this needs to be clear in the code.

Hi Eduardo,

indeed the initialization and realization code for x86 appears to be very sensitive to ordering.
This continues to be the case after the fix.

cpu_exec_realizefn



> 
>>
>> Partial Fix.
>>
>> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> 
> Shouldn't this be
>   f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> ?
> 
> Also, it looks like part of the ordering change was made in
> commit 30565f10e907 ("cpu: call AccelCPUClass::cpu_realizefn in
> cpu_exec_realizefn").
> 
> 
> 
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>> ---
>>  target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
>>  1 file changed, 28 insertions(+), 28 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 9e211ac2ce..6bcb7dbc2c 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>      Error *local_err = NULL;
>>      static bool ht_warned;
>>  
>> -    /* Process Hyper-V enlightenments */
>> -    x86_cpu_hyperv_realize(cpu);
> 
> Vitaly, is this reordering going to affect the Hyper-V cleanup
> work you are doing?  It seems harmless and it makes sense to keep
> the "realize" functions close together, but I'd like to confirm.
> 
>> -
>> -    cpu_exec_realizefn(cs, &local_err);
>> -    if (local_err != NULL) {
>> -        error_propagate(errp, local_err);
>> -        return;
>> -    }
>> -
>> -    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
>> -        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
>> -        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
>> -        goto out;
>> -    }
>> -
>> -    if (cpu->ucode_rev == 0) {
>> -        /* The default is the same as KVM's.  */
>> -        if (IS_AMD_CPU(env)) {
>> -            cpu->ucode_rev = 0x01000065;
>> -        } else {
>> -            cpu->ucode_rev = 0x100000000ULL;
>> -        }
>> -    }
>> -
>> -    /* mwait extended info: needed for Core compatibility */
>> -    /* We always wake on interrupt even if host does not have the capability */
>> -    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>> -
>>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
>>          error_setg(errp, "apic-id property was not initialized properly");
>>          return;
>> @@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>             & CPUID_EXT2_AMD_ALIASES);
>>      }
>>  
>> +    /* Process Hyper-V enlightenments */
>> +    x86_cpu_hyperv_realize(cpu);
>> +
>> +    cpu_exec_realizefn(cs, &local_err);
> 
> I'm worried by the reordering of cpu_exec_realizefn().  That
> function does a lot, and reordering it might have even more
> unwanted side effects.
> 
> I wonder if it wouldn't be easier to revert commit 30565f10e907
> ("cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn").

the part that is wrong in that commit does not I think have to do with where the accel_cpu::cpu_realizefn() is called, but rather
the move of the call to cpu_exec_realizefn (now including the accel_cpu::cpu_realizefn) to the very beginning of the function.

This was done due to the fact that the accel-specific code needs to be called before the code:

* if (cpu->ucode_rev == 0) {

(meaning "use default if nothing accelerator specific has been set"), as this could be set by accel-specific code,

* cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;

(as the mwait is written to by cpu "host" kvm/hvf-specific code when enabling cpu pm),

* if (cpu->phys_bits && ...

(as the phys bits can be set by calling the host CPUID)

But I missed there that we cannot move the call before the expansion of cpu features,
as the accel realize code checks and enables additional features assuming expansion has already happened.

This was what was breaking the cpu "host" phys bits, even after correcting the cpu instance initialization order,
as the KVM/HVF -specific code would do the adjustment of phys bits to match the host only if:

* if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {



> 
> 
>> +    if (local_err != NULL) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
>> +        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
>> +        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
>> +        goto out;
>> +    }
>> +
>> +    if (cpu->ucode_rev == 0) {
>> +        /* The default is the same as KVM's.  */
>> +        if (IS_AMD_CPU(env)) {
>> +            cpu->ucode_rev = 0x01000065;
>> +        } else {
>> +            cpu->ucode_rev = 0x100000000ULL;
>> +        }
>> +    }
>> +
>> +    /* mwait extended info: needed for Core compatibility */
>> +    /* We always wake on interrupt even if host does not have the capability */
>> +    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>> +
> 
> The dependency between those lines and cpu_exec_realizefn() is
> completely unclear here.  What can we do to make this clearer and
> less fragile?

Should I add something similar to my comment above?

There _is_ something already in the patch that I added as I detected these dependencies,
but I notably did not mention the mwait one, and missed completely the cpu expansion issue.

It's in kvm-cpu.c:

static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
{
    /*                                                                                                                                      
     * The realize order is important, since x86_cpu_realize() checks if                                                                    
     * nothing else has been set by the user (or by accelerators) in                                                                        
     * cpu->ucode_rev and cpu->phys_bits.                                                                                                   
     *                                                                                                                                      
     * realize order:                                                                                                                       
     * kvm_cpu -> host_cpu -> x86_cpu                                                                                                       
     */

Maybe there is a better place to document this, where we could also describe in more detail the other dependencies?

On my side the main question would be: did I miss some other order dependency?

Knowing exactly what the current code assumptions are, and where those dependencies lie
would be preferable I think compared with reverting the commits.

Something able to cover this with reliable tests would be a good way to feel more confident with the resolution,
to make sure that something else is not hiding in there..

> 
> Note that this is not a comment on this fix, specifically, but on
> how the initialization ordering is easy to break here.
> 
> 
>>      /* For 64bit systems think about the number of physical bits to present.
>>       * ideally this should be the same as the host; anything other than matching
>>       * the host can cause incorrect guest behaviour.
>> -- 
>> 2.26.2
>>
> 

Thanks,

Claudio


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
  2021-06-01 18:48     ` Eduardo Habkost
@ 2021-06-03  8:45       ` Vitaly Kuznetsov
  -1 siblings, 0 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2021-06-03  8:45 UTC (permalink / raw)
  To: Eduardo Habkost, Claudio Fontana
  Cc: Peter Maydell, Paolo Bonzini, Siddharth Chandrasekaran,
	Philippe Mathieu-Daudé,
	Michael S . Tsirkin, kvm, Marcelo Tosatti, Richard Henderson,
	Cameron Esfahani, Roman Bolshakov, qemu-devel

Eduardo Habkost <ehabkost@redhat.com> writes:

>> 
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 9e211ac2ce..6bcb7dbc2c 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>      Error *local_err = NULL;
>>      static bool ht_warned;
>>  
>> -    /* Process Hyper-V enlightenments */
>> -    x86_cpu_hyperv_realize(cpu);
>
> Vitaly, is this reordering going to affect the Hyper-V cleanup
> work you are doing?  It seems harmless and it makes sense to keep
> the "realize" functions close together, but I'd like to confirm.
>

Currently, x86_cpu_hyperv_realize() is designed to run before
kvm_hyperv_expand_features() (and thus x86_cpu_expand_features()):
x86_cpu_hyperv_realize() sets some default values to
cpu->hyperv_vendor/hyperv_interface_id/hyperv_version_id... but in
'hv-passthrough' mode these are going to be overwritten by KVM's values.

By changing the ordering, this patch changes the logic so QEMU's default
values will always be used, even in 'hv-passthrough' mode. This is
undesireable. I'd suggest we keep x86_cpu_hyperv_realize() call where it
is now, I'll think about possible cleanup later (when both this patch
and the rest of my cleanup lands).

-- 
Vitaly


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
@ 2021-06-03  8:45       ` Vitaly Kuznetsov
  0 siblings, 0 replies; 18+ messages in thread
From: Vitaly Kuznetsov @ 2021-06-03  8:45 UTC (permalink / raw)
  To: Eduardo Habkost, Claudio Fontana
  Cc: Peter Maydell, Marcelo Tosatti, kvm, Siddharth Chandrasekaran,
	Michael S . Tsirkin, Richard Henderson, qemu-devel,
	Cameron Esfahani, Roman Bolshakov, Paolo Bonzini,
	Philippe Mathieu-Daudé

Eduardo Habkost <ehabkost@redhat.com> writes:

>> 
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 9e211ac2ce..6bcb7dbc2c 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>      Error *local_err = NULL;
>>      static bool ht_warned;
>>  
>> -    /* Process Hyper-V enlightenments */
>> -    x86_cpu_hyperv_realize(cpu);
>
> Vitaly, is this reordering going to affect the Hyper-V cleanup
> work you are doing?  It seems harmless and it makes sense to keep
> the "realize" functions close together, but I'd like to confirm.
>

Currently, x86_cpu_hyperv_realize() is designed to run before
kvm_hyperv_expand_features() (and thus x86_cpu_expand_features()):
x86_cpu_hyperv_realize() sets some default values to
cpu->hyperv_vendor/hyperv_interface_id/hyperv_version_id... but in
'hv-passthrough' mode these are going to be overwritten by KVM's values.

By changing the ordering, this patch changes the logic so QEMU's default
values will always be used, even in 'hv-passthrough' mode. This is
undesireable. I'd suggest we keep x86_cpu_hyperv_realize() call where it
is now, I'll think about possible cleanup later (when both this patch
and the rest of my cleanup lands).

-- 
Vitaly



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
  2021-06-03  8:13       ` Claudio Fontana
@ 2021-06-03 18:16         ` Eduardo Habkost
  -1 siblings, 0 replies; 18+ messages in thread
From: Eduardo Habkost @ 2021-06-03 18:16 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Peter Maydell, Paolo Bonzini, Siddharth Chandrasekaran,
	Philippe Mathieu-Daudé,
	Michael S . Tsirkin, kvm, Marcelo Tosatti, Richard Henderson,
	Cameron Esfahani, Roman Bolshakov, qemu-devel, Vitaly Kuznetsov

On Thu, Jun 03, 2021 at 10:13:30AM +0200, Claudio Fontana wrote:
> On 6/1/21 8:48 PM, Eduardo Habkost wrote:
> > +Vitaly
> > 
> > On Sat, May 29, 2021 at 11:13:12AM +0200, Claudio Fontana wrote:
> >> we need to expand features first, before we attempt to check them
> >> in the accel realizefn code called by cpu_exec_realizefn().
> >>
> >> At the same time we need checks for code_urev and host_cpuid_required,
> >> and modifications to cpu->mwait to happen after the initial setting
> >> of them inside the accel realizefn code.
> > 
> > I miss an explanation why those 3 steps need to happen after
> > accel realizefn.
> > 
> > I'm worried by the fragility of the ordering.  If there are
> > specific things that must be done before/after
> > cpu_exec_realizefn(), this needs to be clear in the code.
> 
> Hi Eduardo,
> 
> indeed the initialization and realization code for x86 appears to be very sensitive to ordering.
> This continues to be the case after the fix.
> 
> cpu_exec_realizefn
> 
> 
> 
> > 
> >>
> >> Partial Fix.
> >>
> >> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> > 
> > Shouldn't this be
> >   f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> > ?
> > 
> > Also, it looks like part of the ordering change was made in
> > commit 30565f10e907 ("cpu: call AccelCPUClass::cpu_realizefn in
> > cpu_exec_realizefn").
> > 
> > 
> > 
> >> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >> ---
> >>  target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
> >>  1 file changed, 28 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >> index 9e211ac2ce..6bcb7dbc2c 100644
> >> --- a/target/i386/cpu.c
> >> +++ b/target/i386/cpu.c
> >> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>      Error *local_err = NULL;
> >>      static bool ht_warned;
> >>  
> >> -    /* Process Hyper-V enlightenments */
> >> -    x86_cpu_hyperv_realize(cpu);
> > 
> > Vitaly, is this reordering going to affect the Hyper-V cleanup
> > work you are doing?  It seems harmless and it makes sense to keep
> > the "realize" functions close together, but I'd like to confirm.
> > 
> >> -
> >> -    cpu_exec_realizefn(cs, &local_err);
> >> -    if (local_err != NULL) {
> >> -        error_propagate(errp, local_err);
> >> -        return;
> >> -    }
> >> -
> >> -    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> >> -        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> >> -        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> >> -        goto out;
> >> -    }
> >> -
> >> -    if (cpu->ucode_rev == 0) {
> >> -        /* The default is the same as KVM's.  */
> >> -        if (IS_AMD_CPU(env)) {
> >> -            cpu->ucode_rev = 0x01000065;
> >> -        } else {
> >> -            cpu->ucode_rev = 0x100000000ULL;
> >> -        }
> >> -    }
> >> -
> >> -    /* mwait extended info: needed for Core compatibility */
> >> -    /* We always wake on interrupt even if host does not have the capability */
> >> -    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> >> -
> >>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
> >>          error_setg(errp, "apic-id property was not initialized properly");
> >>          return;
> >> @@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>             & CPUID_EXT2_AMD_ALIASES);
> >>      }
> >>  
> >> +    /* Process Hyper-V enlightenments */
> >> +    x86_cpu_hyperv_realize(cpu);
> >> +
> >> +    cpu_exec_realizefn(cs, &local_err);
> > 
> > I'm worried by the reordering of cpu_exec_realizefn().  That
> > function does a lot, and reordering it might have even more
> > unwanted side effects.
> > 
> > I wonder if it wouldn't be easier to revert commit 30565f10e907
> > ("cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn").
> 
> the part that is wrong in that commit does not I think have to do with where the accel_cpu::cpu_realizefn() is called, but rather
> the move of the call to cpu_exec_realizefn (now including the accel_cpu::cpu_realizefn) to the very beginning of the function.

Oh, I didn't notice commit 30565f10e907 also moved the
cpu_exec_realizefn() call to the beginning of the function.  So
moving it back (like you do her) actually seems to be a good
idea.

> 
> This was done due to the fact that the accel-specific code needs to be called before the code:
> 
> * if (cpu->ucode_rev == 0) {
> 
> (meaning "use default if nothing accelerator specific has been set"), as this could be set by accel-specific code,
> 
> * cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> 
> (as the mwait is written to by cpu "host" kvm/hvf-specific code when enabling cpu pm),
> 
> * if (cpu->phys_bits && ...
> 
> (as the phys bits can be set by calling the host CPUID)
> 
> But I missed there that we cannot move the call before the expansion of cpu features,
> as the accel realize code checks and enables additional features assuming expansion has already happened.
> 
> This was what was breaking the cpu "host" phys bits, even after correcting the cpu instance initialization order,
> as the KVM/HVF -specific code would do the adjustment of phys bits to match the host only if:
> 
> * if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
> 
> 
> 
> > 
> > 
> >> +    if (local_err != NULL) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +
> >> +    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> >> +        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> >> +        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> >> +        goto out;
> >> +    }
> >> +
> >> +    if (cpu->ucode_rev == 0) {
> >> +        /* The default is the same as KVM's.  */
> >> +        if (IS_AMD_CPU(env)) {
> >> +            cpu->ucode_rev = 0x01000065;
> >> +        } else {
> >> +            cpu->ucode_rev = 0x100000000ULL;
> >> +        }
> >> +    }
> >> +
> >> +    /* mwait extended info: needed for Core compatibility */
> >> +    /* We always wake on interrupt even if host does not have the capability */
> >> +    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> >> +
> > 
> > The dependency between those lines and cpu_exec_realizefn() is
> > completely unclear here.  What can we do to make this clearer and
> > less fragile?
> 
> Should I add something similar to my comment above?
> 
> There _is_ something already in the patch that I added as I detected these dependencies,
> but I notably did not mention the mwait one, and missed completely the cpu expansion issue.
> 
> It's in kvm-cpu.c:
> 
> static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
> {
>     /*                                                                                                                                      
>      * The realize order is important, since x86_cpu_realize() checks if                                                                    
>      * nothing else has been set by the user (or by accelerators) in                                                                        
>      * cpu->ucode_rev and cpu->phys_bits.                                                                                                   
>      *                                                                                                                                      
>      * realize order:                                                                                                                       
>      * kvm_cpu -> host_cpu -> x86_cpu                                                                                                       
>      */
> 
> Maybe there is a better place to document this, where we could also describe in more detail the other dependencies?

I would describe it in (at least) two places:

1. Documentation of AccelCPUClass.cpu_realizefn() should indicate
   what is allowed and not allowed for people implementing
   accelerators.
2. Comments at x86_cpu_realizefn() indicating the dependencies
   and why ordering is important.

> 
> On my side the main question would be: did I miss some other order dependency?
> 
> Knowing exactly what the current code assumptions are, and where those dependencies lie
> would be preferable I think compared with reverting the commits.

Absolutely.  I was just trying to figure out what's the simplest
and most trivial fix possible for the issue.

> 
> Something able to cover this with reliable tests would be a good way to feel more confident with the resolution,
> to make sure that something else is not hiding in there..

Yes, this kind of bug should have been caught by automated test
cases somehow.

> 
> > 
> > Note that this is not a comment on this fix, specifically, but on
> > how the initialization ordering is easy to break here.
> > 
> > 
> >>      /* For 64bit systems think about the number of physical bits to present.
> >>       * ideally this should be the same as the host; anything other than matching
> >>       * the host can cause incorrect guest behaviour.
> >> -- 
> >> 2.26.2
> >>
> > 
> 
> Thanks,
> 
> Claudio
> 

-- 
Eduardo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn
@ 2021-06-03 18:16         ` Eduardo Habkost
  0 siblings, 0 replies; 18+ messages in thread
From: Eduardo Habkost @ 2021-06-03 18:16 UTC (permalink / raw)
  To: Claudio Fontana
  Cc: Peter Maydell, Marcelo Tosatti, kvm, Siddharth Chandrasekaran,
	Michael S . Tsirkin, Richard Henderson, qemu-devel,
	Cameron Esfahani, Roman Bolshakov, Paolo Bonzini,
	Vitaly Kuznetsov, Philippe Mathieu-Daudé

On Thu, Jun 03, 2021 at 10:13:30AM +0200, Claudio Fontana wrote:
> On 6/1/21 8:48 PM, Eduardo Habkost wrote:
> > +Vitaly
> > 
> > On Sat, May 29, 2021 at 11:13:12AM +0200, Claudio Fontana wrote:
> >> we need to expand features first, before we attempt to check them
> >> in the accel realizefn code called by cpu_exec_realizefn().
> >>
> >> At the same time we need checks for code_urev and host_cpuid_required,
> >> and modifications to cpu->mwait to happen after the initial setting
> >> of them inside the accel realizefn code.
> > 
> > I miss an explanation why those 3 steps need to happen after
> > accel realizefn.
> > 
> > I'm worried by the fragility of the ordering.  If there are
> > specific things that must be done before/after
> > cpu_exec_realizefn(), this needs to be clear in the code.
> 
> Hi Eduardo,
> 
> indeed the initialization and realization code for x86 appears to be very sensitive to ordering.
> This continues to be the case after the fix.
> 
> cpu_exec_realizefn
> 
> 
> 
> > 
> >>
> >> Partial Fix.
> >>
> >> Fixes: 48afe6e4eabf ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> > 
> > Shouldn't this be
> >   f5cc5a5c1686 ("i386: split cpu accelerators from cpu.c, using AccelCPUClass")
> > ?
> > 
> > Also, it looks like part of the ordering change was made in
> > commit 30565f10e907 ("cpu: call AccelCPUClass::cpu_realizefn in
> > cpu_exec_realizefn").
> > 
> > 
> > 
> >> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >> ---
> >>  target/i386/cpu.c | 56 +++++++++++++++++++++++------------------------
> >>  1 file changed, 28 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >> index 9e211ac2ce..6bcb7dbc2c 100644
> >> --- a/target/i386/cpu.c
> >> +++ b/target/i386/cpu.c
> >> @@ -6133,34 +6133,6 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>      Error *local_err = NULL;
> >>      static bool ht_warned;
> >>  
> >> -    /* Process Hyper-V enlightenments */
> >> -    x86_cpu_hyperv_realize(cpu);
> > 
> > Vitaly, is this reordering going to affect the Hyper-V cleanup
> > work you are doing?  It seems harmless and it makes sense to keep
> > the "realize" functions close together, but I'd like to confirm.
> > 
> >> -
> >> -    cpu_exec_realizefn(cs, &local_err);
> >> -    if (local_err != NULL) {
> >> -        error_propagate(errp, local_err);
> >> -        return;
> >> -    }
> >> -
> >> -    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> >> -        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> >> -        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> >> -        goto out;
> >> -    }
> >> -
> >> -    if (cpu->ucode_rev == 0) {
> >> -        /* The default is the same as KVM's.  */
> >> -        if (IS_AMD_CPU(env)) {
> >> -            cpu->ucode_rev = 0x01000065;
> >> -        } else {
> >> -            cpu->ucode_rev = 0x100000000ULL;
> >> -        }
> >> -    }
> >> -
> >> -    /* mwait extended info: needed for Core compatibility */
> >> -    /* We always wake on interrupt even if host does not have the capability */
> >> -    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> >> -
> >>      if (cpu->apic_id == UNASSIGNED_APIC_ID) {
> >>          error_setg(errp, "apic-id property was not initialized properly");
> >>          return;
> >> @@ -6190,6 +6162,34 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>             & CPUID_EXT2_AMD_ALIASES);
> >>      }
> >>  
> >> +    /* Process Hyper-V enlightenments */
> >> +    x86_cpu_hyperv_realize(cpu);
> >> +
> >> +    cpu_exec_realizefn(cs, &local_err);
> > 
> > I'm worried by the reordering of cpu_exec_realizefn().  That
> > function does a lot, and reordering it might have even more
> > unwanted side effects.
> > 
> > I wonder if it wouldn't be easier to revert commit 30565f10e907
> > ("cpu: call AccelCPUClass::cpu_realizefn in cpu_exec_realizefn").
> 
> the part that is wrong in that commit does not I think have to do with where the accel_cpu::cpu_realizefn() is called, but rather
> the move of the call to cpu_exec_realizefn (now including the accel_cpu::cpu_realizefn) to the very beginning of the function.

Oh, I didn't notice commit 30565f10e907 also moved the
cpu_exec_realizefn() call to the beginning of the function.  So
moving it back (like you do her) actually seems to be a good
idea.

> 
> This was done due to the fact that the accel-specific code needs to be called before the code:
> 
> * if (cpu->ucode_rev == 0) {
> 
> (meaning "use default if nothing accelerator specific has been set"), as this could be set by accel-specific code,
> 
> * cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> 
> (as the mwait is written to by cpu "host" kvm/hvf-specific code when enabling cpu pm),
> 
> * if (cpu->phys_bits && ...
> 
> (as the phys bits can be set by calling the host CPUID)
> 
> But I missed there that we cannot move the call before the expansion of cpu features,
> as the accel realize code checks and enables additional features assuming expansion has already happened.
> 
> This was what was breaking the cpu "host" phys bits, even after correcting the cpu instance initialization order,
> as the KVM/HVF -specific code would do the adjustment of phys bits to match the host only if:
> 
> * if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
> 
> 
> 
> > 
> > 
> >> +    if (local_err != NULL) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +
> >> +    if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
> >> +        g_autofree char *name = x86_cpu_class_get_model_name(xcc);
> >> +        error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
> >> +        goto out;
> >> +    }
> >> +
> >> +    if (cpu->ucode_rev == 0) {
> >> +        /* The default is the same as KVM's.  */
> >> +        if (IS_AMD_CPU(env)) {
> >> +            cpu->ucode_rev = 0x01000065;
> >> +        } else {
> >> +            cpu->ucode_rev = 0x100000000ULL;
> >> +        }
> >> +    }
> >> +
> >> +    /* mwait extended info: needed for Core compatibility */
> >> +    /* We always wake on interrupt even if host does not have the capability */
> >> +    cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> >> +
> > 
> > The dependency between those lines and cpu_exec_realizefn() is
> > completely unclear here.  What can we do to make this clearer and
> > less fragile?
> 
> Should I add something similar to my comment above?
> 
> There _is_ something already in the patch that I added as I detected these dependencies,
> but I notably did not mention the mwait one, and missed completely the cpu expansion issue.
> 
> It's in kvm-cpu.c:
> 
> static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
> {
>     /*                                                                                                                                      
>      * The realize order is important, since x86_cpu_realize() checks if                                                                    
>      * nothing else has been set by the user (or by accelerators) in                                                                        
>      * cpu->ucode_rev and cpu->phys_bits.                                                                                                   
>      *                                                                                                                                      
>      * realize order:                                                                                                                       
>      * kvm_cpu -> host_cpu -> x86_cpu                                                                                                       
>      */
> 
> Maybe there is a better place to document this, where we could also describe in more detail the other dependencies?

I would describe it in (at least) two places:

1. Documentation of AccelCPUClass.cpu_realizefn() should indicate
   what is allowed and not allowed for people implementing
   accelerators.
2. Comments at x86_cpu_realizefn() indicating the dependencies
   and why ordering is important.

> 
> On my side the main question would be: did I miss some other order dependency?
> 
> Knowing exactly what the current code assumptions are, and where those dependencies lie
> would be preferable I think compared with reverting the commits.

Absolutely.  I was just trying to figure out what's the simplest
and most trivial fix possible for the issue.

> 
> Something able to cover this with reliable tests would be a good way to feel more confident with the resolution,
> to make sure that something else is not hiding in there..

Yes, this kind of bug should have been caught by automated test
cases somehow.

> 
> > 
> > Note that this is not a comment on this fix, specifically, but on
> > how the initialization ordering is easy to break here.
> > 
> > 
> >>      /* For 64bit systems think about the number of physical bits to present.
> >>       * ideally this should be the same as the host; anything other than matching
> >>       * the host can cause incorrect guest behaviour.
> >> -- 
> >> 2.26.2
> >>
> > 
> 
> Thanks,
> 
> Claudio
> 

-- 
Eduardo



^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2021-06-03 18:17 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-29  9:13 [PATCH 0/2] Fixes for broken commit 48afe6e4eabf, Windows fails to boot Claudio Fontana
2021-05-29  9:13 ` Claudio Fontana
2021-05-29  9:13 ` [PATCH 1/2] i386: reorder call to cpu_exec_realizefn in x86_cpu_realizefn Claudio Fontana
2021-05-29  9:13   ` Claudio Fontana
2021-06-01 18:48   ` Eduardo Habkost
2021-06-01 18:48     ` Eduardo Habkost
2021-06-03  8:13     ` Claudio Fontana
2021-06-03  8:13       ` Claudio Fontana
2021-06-03 18:16       ` Eduardo Habkost
2021-06-03 18:16         ` Eduardo Habkost
2021-06-03  8:45     ` Vitaly Kuznetsov
2021-06-03  8:45       ` Vitaly Kuznetsov
2021-05-29  9:13 ` [PATCH 2/2] i386: run accel_cpu_instance_init as instance_post_init Claudio Fontana
2021-05-29  9:13   ` Claudio Fontana
2021-06-01 18:59   ` Eduardo Habkost
2021-06-01 18:59     ` Eduardo Habkost
2021-06-03  7:38     ` Claudio Fontana
2021-06-03  7:38       ` Claudio Fontana

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.