* [RFC PATCH v1] x86/cpu: initialize the CPU concurrently
@ 2020-11-25 11:54 Zhenyu Ye
  2020-12-18 17:17 ` Igor Mammedov
  2020-12-18 18:47 ` Eduardo Habkost
  0 siblings, 2 replies; 5+ messages in thread
From: Zhenyu Ye @ 2020-11-25 11:54 UTC (permalink / raw)
  To: Michael S. Tsirkin, marcel.apfelbaum, Paolo Bonzini,
	richard.henderson, ehabkost
  Cc: yebiaoxiang, qemu-devel, Xiexiangyou, yezhenyu2

From 0b4318c9dbf6fa152ec14eab29837ea06e2d78e5 Mon Sep 17 00:00:00 2001
From: eillon <yezhenyu2@huawei.com>
Date: Wed, 25 Nov 2020 19:17:03 +0800
Subject: [PATCH] x86/cpu: initialize the CPU concurrently

Currently we initialize CPUs one by one in qemu_init_vcpu(); every CPU
has to wait until cpu->created becomes true.  When
cpus_accel->create_vcpu_thread takes a long time or the number of CPUs
is large, this prolongs the boot time.

This patch initializes the CPU concurrently.

Signed-off-by: eillon <yezhenyu2@huawei.com>
---
 hw/i386/x86.c         | 7 +++++--
 include/hw/core/cpu.h | 3 +++
 include/hw/i386/x86.h | 3 ++-
 softmmu/cpus.c        | 9 +++++++--
 4 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 5944fc44ed..a98f68b220 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -101,9 +101,11 @@ uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
 }


-void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
+void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id,
+                 bool last_cpu, Error **errp)
 {
     Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
+    ((CPUState *)cpu)->last_cpu = last_cpu;

     if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
         goto out;
@@ -135,7 +137,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
                                                       ms->smp.max_cpus - 1) + 1;
     possible_cpus = mc->possible_cpu_arch_ids(ms);
     for (i = 0; i < ms->smp.cpus; i++) {
-        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
+        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
+                    i == ms->smp.cpus - 1, &error_fatal);
     }
 }

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 3d92c967ff..b7e05e2d58 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -467,6 +467,9 @@ struct CPUState {

     /* track IOMMUs whose translations we've cached in the TCG TLB */
     GArray *iommu_notifiers;
+
+    /* The last cpu to init. */
+    bool last_cpu;
 };

 typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 739fac5087..0f7a6e5d16 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -84,7 +84,8 @@ void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms);
 uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
                                     unsigned int cpu_index);

-void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
+void x86_cpu_new(X86MachineState *pcms, int64_t apic_id,
+                 bool last_cpu, Error **errp);
 void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
 CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
                                              unsigned cpu_index);
diff --git a/softmmu/cpus.c b/softmmu/cpus.c
index e46ac68ad0..5a8eae28ab 100644
--- a/softmmu/cpus.c
+++ b/softmmu/cpus.c
@@ -621,8 +621,13 @@ void qemu_init_vcpu(CPUState *cpu)
     g_assert(cpus_accel != NULL && cpus_accel->create_vcpu_thread != NULL);
     cpus_accel->create_vcpu_thread(cpu);

-    while (!cpu->created) {
-        qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
+    if (cpu->last_cpu) {
+        CPUState *cs = first_cpu;
+        CPU_FOREACH(cs) {
+            while (!cs->created) {
+                qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
+            }
+        }
     }
 }

-- 
2.22.0.windows.1




* Re: [RFC PATCH v1] x86/cpu: initialize the CPU concurrently
  2020-11-25 11:54 [RFC PATCH v1] x86/cpu: initialize the CPU concurrently Zhenyu Ye
@ 2020-12-18 17:17 ` Igor Mammedov
  2020-12-21 11:26   ` Zhenyu Ye
  2020-12-18 18:47 ` Eduardo Habkost
  1 sibling, 1 reply; 5+ messages in thread
From: Igor Mammedov @ 2020-12-18 17:17 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: ehabkost, Michael S. Tsirkin, richard.henderson, qemu-devel,
	Xiexiangyou, yebiaoxiang, Paolo Bonzini

On Wed, 25 Nov 2020 19:54:17 +0800
Zhenyu Ye <yezhenyu2@huawei.com> wrote:

> From 0b4318c9dbf6fa152ec14eab29837ea06e2d78e5 Mon Sep 17 00:00:00 2001
> From: eillon <yezhenyu2@huawei.com>
> Date: Wed, 25 Nov 2020 19:17:03 +0800
> Subject: [PATCH] x86/cpu: initialize the CPU concurrently
> 
> Currently we initialize CPUs one by one in qemu_init_vcpu(); every CPU
> has to wait until cpu->created becomes true.  When
> cpus_accel->create_vcpu_thread takes a long time or the number of CPUs
> is large, this prolongs the boot time.
> 
> This patch initializes the CPU concurrently.

Could you benchmark and share results from before and after this patch?
(It would be even better to add them to the commit message.)

> 
> Signed-off-by: eillon <yezhenyu2@huawei.com>
> ---
>  hw/i386/x86.c         | 7 +++++--
>  include/hw/core/cpu.h | 3 +++
>  include/hw/i386/x86.h | 3 ++-
>  softmmu/cpus.c        | 9 +++++++--
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 5944fc44ed..a98f68b220 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -101,9 +101,11 @@ uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
>  }
> 
> 
> -void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
> +void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id,
> +                 bool last_cpu, Error **errp)
>  {
>      Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
> +    ((CPUState *)cpu)->last_cpu = last_cpu;
> 
>      if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
>          goto out;
> @@ -135,7 +137,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>                                                        ms->smp.max_cpus - 1) + 1;
>      possible_cpus = mc->possible_cpu_arch_ids(ms);
>      for (i = 0; i < ms->smp.cpus; i++) {
> -        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
> +        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
> +                    i == ms->smp.cpus - 1, &error_fatal);
>      }
>  }
> 
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 3d92c967ff..b7e05e2d58 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -467,6 +467,9 @@ struct CPUState {
> 
>      /* track IOMMUs whose translations we've cached in the TCG TLB */
>      GArray *iommu_notifiers;
> +
> +    /* The last cpu to init. */
> +    bool last_cpu;
>  };
> 
>  typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index 739fac5087..0f7a6e5d16 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -84,7 +84,8 @@ void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms);
>  uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
>                                      unsigned int cpu_index);
> 
> -void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
> +void x86_cpu_new(X86MachineState *pcms, int64_t apic_id,
> +                 bool last_cpu, Error **errp);
>  void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
>  CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
>                                               unsigned cpu_index);
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index e46ac68ad0..5a8eae28ab 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -621,8 +621,13 @@ void qemu_init_vcpu(CPUState *cpu)
>      g_assert(cpus_accel != NULL && cpus_accel->create_vcpu_thread != NULL);
>      cpus_accel->create_vcpu_thread(cpu);
> 
> -    while (!cpu->created) {
> -        qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
> +    if (cpu->last_cpu) {
> +        CPUState *cs = first_cpu;
> +        CPU_FOREACH(cs) {
> +            while (!cs->created) {
> +                qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
> +            }
> +        }
>      }
>  }
> 




* Re: [RFC PATCH v1] x86/cpu: initialize the CPU concurrently
  2020-11-25 11:54 [RFC PATCH v1] x86/cpu: initialize the CPU concurrently Zhenyu Ye
  2020-12-18 17:17 ` Igor Mammedov
@ 2020-12-18 18:47 ` Eduardo Habkost
  2020-12-21 11:23   ` Zhenyu Ye
  1 sibling, 1 reply; 5+ messages in thread
From: Eduardo Habkost @ 2020-12-18 18:47 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: Michael S. Tsirkin, richard.henderson, qemu-devel, Xiexiangyou,
	yebiaoxiang, Paolo Bonzini

Hi,

Thanks for the patch, and sorry for taking so long to look at it.

On Wed, Nov 25, 2020 at 07:54:17PM +0800, Zhenyu Ye wrote:
> From 0b4318c9dbf6fa152ec14eab29837ea06e2d78e5 Mon Sep 17 00:00:00 2001
> From: eillon <yezhenyu2@huawei.com>
> Date: Wed, 25 Nov 2020 19:17:03 +0800
> Subject: [PATCH] x86/cpu: initialize the CPU concurrently
> 
> Currently we initialize CPUs one by one in qemu_init_vcpu(); every CPU
> has to wait until cpu->created becomes true.  When
> cpus_accel->create_vcpu_thread takes a long time or the number of CPUs
> is large, this prolongs the boot time.
> 

How long was boot time before and after the patch?

> This patch initializes the CPU concurrently.
> 
> Signed-off-by: eillon <yezhenyu2@huawei.com>
> ---
>  hw/i386/x86.c         | 7 +++++--
>  include/hw/core/cpu.h | 3 +++
>  include/hw/i386/x86.h | 3 ++-
>  softmmu/cpus.c        | 9 +++++++--
>  4 files changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index 5944fc44ed..a98f68b220 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -101,9 +101,11 @@ uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
>  }
> 
> 
> -void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
> +void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id,
> +                 bool last_cpu, Error **errp)
>  {
>      Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
> +    ((CPUState *)cpu)->last_cpu = last_cpu;

Please use the CPU() macro instead of direct casts.  Preferably
with a separate variable.  e.g.:

  Object *obj = object_new(MACHINE(x86ms)->cpu_type);
  CPUState *cpu = CPU(obj);

> 
>      if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
>          goto out;
> @@ -135,7 +137,8 @@ void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
>                                                        ms->smp.max_cpus - 1) + 1;
>      possible_cpus = mc->possible_cpu_arch_ids(ms);
>      for (i = 0; i < ms->smp.cpus; i++) {
> -        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, &error_fatal);
> +        x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
> +                    i == ms->smp.cpus - 1, &error_fatal);
>      }
>  }
> 
> diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
> index 3d92c967ff..b7e05e2d58 100644
> --- a/include/hw/core/cpu.h
> +++ b/include/hw/core/cpu.h
> @@ -467,6 +467,9 @@ struct CPUState {
> 
>      /* track IOMMUs whose translations we've cached in the TCG TLB */
>      GArray *iommu_notifiers;
> +
> +    /* The last cpu to init. */
> +    bool last_cpu;
>  };
> 
>  typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
> diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
> index 739fac5087..0f7a6e5d16 100644
> --- a/include/hw/i386/x86.h
> +++ b/include/hw/i386/x86.h
> @@ -84,7 +84,8 @@ void init_topo_info(X86CPUTopoInfo *topo_info, const X86MachineState *x86ms);
>  uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
>                                      unsigned int cpu_index);
> 
> -void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
> +void x86_cpu_new(X86MachineState *pcms, int64_t apic_id,
> +                 bool last_cpu, Error **errp);
>  void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
>  CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
>                                               unsigned cpu_index);
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index e46ac68ad0..5a8eae28ab 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -621,8 +621,13 @@ void qemu_init_vcpu(CPUState *cpu)
>      g_assert(cpus_accel != NULL && cpus_accel->create_vcpu_thread != NULL);
>      cpus_accel->create_vcpu_thread(cpu);
> 
> -    while (!cpu->created) {
> -        qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
> +    if (cpu->last_cpu) {

What about all the other architectures that don't set
last_cpu=true?  They will never wait for the CPU to be created.

> +        CPUState *cs = first_cpu;
> +        CPU_FOREACH(cs) {
> +            while (!cs->created) {
> +                qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
> +            }
> +        }

I suggest doing this "wait for all CPUs" step outside qemu_init_vcpu().

What about not making the last CPU special, and instead providing an
optional mechanism to wait for all VCPU threads after the CPU
objects have been created?  e.g.:

  struct CPUState {
      ...
      /*
       * If true, qemu_init_vcpu() will not wait for the VCPU thread to be created
       * before returning.
       */
      bool async_init;
  };

  void qemu_wait_vcpu_thread_init(CPUState *cpu)
  {
      while (!cpu->created) {
          qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
      }
  }

  void qemu_wait_all_vcpu_threads_init(void)
  {
      CPU_FOREACH(cpu) {
          qemu_wait_vcpu_thread_init(cpu);
      }
  }

  void qemu_init_vcpu(CPUState *cpu)
  {
      ...
      if (!cpu->async_init) {
          qemu_wait_vcpu_thread_init(cpu);
      }
  }

  void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, bool async, Error **errp)
  {
      ...
      cpu->async_init = async;
      qdev_realize(DEVICE(cpu), NULL, errp);
      ...
  }

  void x86_cpus_init(...)
  {
      ...
      for (i = 0; i < ms->smp.cpus; i++) {
          x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, true, &error_fatal);
      }
      qemu_wait_all_vcpu_threads_init();
  }

>      }
>  }
> 
> -- 
> 2.22.0.windows.1
> 

-- 
Eduardo




* Re: [RFC PATCH v1] x86/cpu: initialize the CPU concurrently
  2020-12-18 18:47 ` Eduardo Habkost
@ 2020-12-21 11:23   ` Zhenyu Ye
  0 siblings, 0 replies; 5+ messages in thread
From: Zhenyu Ye @ 2020-12-21 11:23 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Michael S. Tsirkin, richard.henderson, qemu-devel, Xiexiangyou,
	yebiaoxiang, Paolo Bonzini

Hi Eduardo,

Thanks for your review.

On 2020/12/19 2:47, Eduardo Habkost wrote:
> On Wed, Nov 25, 2020 at 07:54:17PM +0800, Zhenyu Ye wrote:
>> From 0b4318c9dbf6fa152ec14eab29837ea06e2d78e5 Mon Sep 17 00:00:00 2001
>> From: eillon <yezhenyu2@huawei.com>
>> Date: Wed, 25 Nov 2020 19:17:03 +0800
>> Subject: [PATCH] x86/cpu: initialize the CPU concurrently
>>
>> Currently we initialize CPUs one by one in qemu_init_vcpu(); every CPU
>> has to wait until cpu->created becomes true.  When
>> cpus_accel->create_vcpu_thread takes a long time or the number of CPUs
>> is large, this prolongs the boot time.
>>
> 
> How long was boot time before and after the patch?
> 

When using HAXM as the accelerator on Windows, it takes at least
200ms to initialize one CPU.  For a 4-core VM, it takes:

	before		800ms -- 1000ms
	after		200ms

Information about the processor on the host:

	Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz


> 
> I suggest doing this "wait for all CPUs" step outside qemu_init_vcpu().
> 
> What about not making the last CPU special, and just providing a
> optional mechanism to wait for all VCPU threads after the CPU
> objects were created?  e.g.:
> 

Thanks for your suggestion and I will send PATCH v2 soon.

Thanks,
Zhenyu



* Re: [RFC PATCH v1] x86/cpu: initialize the CPU concurrently
  2020-12-18 17:17 ` Igor Mammedov
@ 2020-12-21 11:26   ` Zhenyu Ye
  0 siblings, 0 replies; 5+ messages in thread
From: Zhenyu Ye @ 2020-12-21 11:26 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: ehabkost, Michael S. Tsirkin, richard.henderson, qemu-devel,
	Xiexiangyou, yebiaoxiang, Paolo Bonzini

Hi Igor Mammedov,

Thanks for your review.

On 2020/12/19 1:17, Igor Mammedov wrote:
> On Wed, 25 Nov 2020 19:54:17 +0800
> Zhenyu Ye <yezhenyu2@huawei.com> wrote:
> 
>> From 0b4318c9dbf6fa152ec14eab29837ea06e2d78e5 Mon Sep 17 00:00:00 2001
>> From: eillon <yezhenyu2@huawei.com>
>> Date: Wed, 25 Nov 2020 19:17:03 +0800
>> Subject: [PATCH] x86/cpu: initialize the CPU concurrently
>>
>> Currently we initialize CPUs one by one in qemu_init_vcpu(); every CPU
>> has to wait until cpu->created becomes true.  When
>> cpus_accel->create_vcpu_thread takes a long time or the number of CPUs
>> is large, this prolongs the boot time.
>>
>> This patch initializes the CPU concurrently.
> 
> Could you benchmark and share results from before and after this patch?
> (It would be even better to add them to the commit message.)
> 

When using HAXM as the accelerator on Windows, it takes at least
200ms to initialize one CPU.  For a 4-core VM, it takes:

	before		800ms -- 1000ms
	after		200ms

Information about the processor on the host:

	Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz

See another email (to Eduardo) for more information.

Thanks,
Zhenyu


