All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] hostmem: change default prealloc threads number
       [not found] <20210927131951.1810823-1-user@n248-145-203>
@ 2021-09-27 15:16 ` David Hildenbrand
  2021-09-27 16:47   ` [External] " Nan Wang
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2021-09-27 15:16 UTC (permalink / raw)
  To: Nan Wang, imammedo; +Cc: mikughoull, ehabkost, qemu-devel

On 27.09.21 15:19, Nan Wang wrote:
> From: "wangnan.light" <wangnan.light@bytedance.com>
> 
> the default number of prealloc threads is 1, for huge memory backend
> file, single thread touch page is really slow.
> We can adjust thread number by prealloc-threads property, but if the
> default value updated to MachineState::smp::cpus may be better.
> For example, old version of qemu(prealloc-threads have not been
> introduced yet), the value of threads num is MachineState::smp::cpus,
> so if someone use the same commandline to start current verion of qemu
> and old version of qemu which will lead to different behaviors.

The introducing patch mentions:

commit ffac16fab33bb42f17e47624985220c1fd864e9d
Author: Igor Mammedov <imammedo@redhat.com>
Date:   Wed Feb 19 11:09:50 2020 -0500

     hostmem: introduce "prealloc-threads" property

     the property will allow user to specify number of threads to use
     in pre-allocation stage. It also will allow to reduce implicit
     hostmem dependency on current_machine.
     On object creation it will default to 1, but via machine
     compat property it will be updated to MachineState::smp::cpus
     to keep current behavior for hostmem and main RAM (which is
     now also hostmem based).

So it looks like we want to do the latter via compat properties eventually.

However, I'd like to note that more prealloc threads might be good for 
large backends, and might be bad for small backends. To me, it feels 
like a workload that relies on this should really do this manually. So I 
am still not sure if this is the right thing to do.

Note that qapi/qom.json:

"@prealloc-threads: number of CPU threads to use for prealloc (default: 
1", so that doc would be wrong now.

Why exactly can't workload that cares not simply set this manually? 
Performance tuning smells like something to be done manually for a 
specific workload.

> 
> Signed-off-by: wangnan.light <wangnan.light@bytedance.com>
> ---
>   backends/hostmem.c | 2 +-
>   hw/core/machine.c  | 5 +++++
>   2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/backends/hostmem.c b/backends/hostmem.c
> index 4c05862ed5..c4a249b7e6 100644
> --- a/backends/hostmem.c
> +++ b/backends/hostmem.c
> @@ -273,7 +273,7 @@ static void host_memory_backend_init(Object *obj)
>       backend->merge = machine_mem_merge(machine);
>       backend->dump = machine_dump_guest_core(machine);
>       backend->reserve = true;
> -    backend->prealloc_threads = 1;
> +    backend->prealloc_threads = machine_smp_cpus(machine);
>   }
>   
>   static void host_memory_backend_post_init(Object *obj)
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 067f42b528..95ba5b1477 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -1065,6 +1065,11 @@ bool machine_dump_guest_core(MachineState *machine)
>       return machine->dump_guest_core;
>   }
>   
> +bool machine_smp_cpus(MachineState *machine)
> +{
> +    return machine->smp.cpus;
> +}
> +
>   bool machine_mem_merge(MachineState *machine)
>   {
>       return machine->mem_merge;
> 


-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH] hostmem: change default prealloc threads number
  2021-09-27 15:16 ` [PATCH] hostmem: change default prealloc threads number David Hildenbrand
@ 2021-09-27 16:47   ` Nan Wang
  2021-09-29  9:05     ` Igor Mammedov
  0 siblings, 1 reply; 6+ messages in thread
From: Nan Wang @ 2021-09-27 16:47 UTC (permalink / raw)
  To: David Hildenbrand, imammedo; +Cc: mikughoull, ehabkost, qemu-devel



On 2021/9/27 11:16, David Hildenbrand wrote:
> On 27.09.21 15:19, Nan Wang wrote:
>> From: "wangnan.light" <wangnan.light@bytedance.com>
>>
>> the default number of prealloc threads is 1, for huge memory backend
>> file, single thread touch page is really slow.
>> We can adjust thread number by prealloc-threads property, but if the
>> default value updated to MachineState::smp::cpus may be better.
>> For example, old version of qemu(prealloc-threads have not been
>> introduced yet), the value of threads num is MachineState::smp::cpus,
>> so if someone use the same commandline to start current verion of qemu
>> and old version of qemu which will lead to different behaviors.
> 
> The introducing patch mentions:
> 
> commit ffac16fab33bb42f17e47624985220c1fd864e9d
> Author: Igor Mammedov <imammedo@redhat.com>
> Date:   Wed Feb 19 11:09:50 2020 -0500
> 
>      hostmem: introduce "prealloc-threads" property
> 
>      the property will allow user to specify number of threads to use
>      in pre-allocation stage. It also will allow to reduce implicit
>      hostmem dependency on current_machine.
>      On object creation it will default to 1, but via machine
>      compat property it will be updated to MachineState::smp::cpus
>      to keep current behavior for hostmem and main RAM (which is
>      now also hostmem based).
> 
> So it looks like we want to do the latter via compat properties eventually.
> 
> However, I'd like to note that more prealloc threads might be good for 
> large backends, and might be bad for small backends. To me, it feels 
> like a workload that relies on this should really do this manually. So I 
> am still not sure if this is the right thing to do.
Yes, I agree with you "more prealloc threas are good for large backends, 
and bad for small backends". But I think most situation large backends 
always with large vcpu numbers and small backens always with small vcpu 
numbers, because most users will not create a vm with large vcpu numbers 
with small memory.


> 
> Note that qapi/qom.json:
> 
> "@prealloc-threads: number of CPU threads to use for prealloc (default: 
> 1", so that doc would be wrong now.
> 
> Why exactly can't workload that cares not simply set this manually? 
> Performance tuning smells like something to be done manually for a 
> specific workload.
>
It is a simply way that let workload set the prealloc threads manually. 
For example, for large backends it set many prealloc threads, and set 1 
prealloc threads manually for small backends. Yes, workload can 
`maunally` set prealloc thread to 1, rather than use `default` value 1.
So when workload want to(or maybe just forget specify the 
prealloc-threads property) use the default value, I think the 
MachineState::smp::cpus maybe better than 1.


>>
>> Signed-off-by: wangnan.light <wangnan.light@bytedance.com>
>> ---
>>   backends/hostmem.c | 2 +-
>>   hw/core/machine.c  | 5 +++++
>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/backends/hostmem.c b/backends/hostmem.c
>> index 4c05862ed5..c4a249b7e6 100644
>> --- a/backends/hostmem.c
>> +++ b/backends/hostmem.c
>> @@ -273,7 +273,7 @@ static void host_memory_backend_init(Object *obj)
>>       backend->merge = machine_mem_merge(machine);
>>       backend->dump = machine_dump_guest_core(machine);
>>       backend->reserve = true;
>> -    backend->prealloc_threads = 1;
>> +    backend->prealloc_threads = machine_smp_cpus(machine);
>>   }
>>   static void host_memory_backend_post_init(Object *obj)
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index 067f42b528..95ba5b1477 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -1065,6 +1065,11 @@ bool machine_dump_guest_core(MachineState 
>> *machine)
>>       return machine->dump_guest_core;
>>   }
>> +bool machine_smp_cpus(MachineState *machine)
>> +{
>> +    return machine->smp.cpus;
>> +}
>> +
>>   bool machine_mem_merge(MachineState *machine)
>>   {
>>       return machine->mem_merge;
>>
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH] hostmem: change default prealloc threads number
  2021-09-27 16:47   ` [External] " Nan Wang
@ 2021-09-29  9:05     ` Igor Mammedov
  2021-09-29  9:10       ` David Hildenbrand
  2021-09-29  9:22       ` Daniel P. Berrangé
  0 siblings, 2 replies; 6+ messages in thread
From: Igor Mammedov @ 2021-09-29  9:05 UTC (permalink / raw)
  To: Nan Wang; +Cc: mikughoull, qemu-devel, ehabkost, David Hildenbrand

On Tue, 28 Sep 2021 00:47:01 +0800
Nan Wang <wangnan.light@bytedance.com> wrote:

> On 2021/9/27 11:16, David Hildenbrand wrote:
> > On 27.09.21 15:19, Nan Wang wrote:  
> >> From: "wangnan.light" <wangnan.light@bytedance.com>
> >>
> >> the default number of prealloc threads is 1, for huge memory backend
> >> file, single thread touch page is really slow.
> >> We can adjust thread number by prealloc-threads property, but if the
> >> default value updated to MachineState::smp::cpus may be better.
> >> For example, old version of qemu(prealloc-threads have not been
> >> introduced yet), the value of threads num is MachineState::smp::cpus,
> >> so if someone use the same commandline to start current verion of qemu
> >> and old version of qemu which will lead to different behaviors.  
> > 
> > The introducing patch mentions:
> > 
> > commit ffac16fab33bb42f17e47624985220c1fd864e9d
> > Author: Igor Mammedov <imammedo@redhat.com>
> > Date:   Wed Feb 19 11:09:50 2020 -0500
> > 
> >      hostmem: introduce "prealloc-threads" property
> > 
> >      the property will allow user to specify number of threads to use
> >      in pre-allocation stage. It also will allow to reduce implicit
> >      hostmem dependency on current_machine.
> >      On object creation it will default to 1, but via machine
> >      compat property it will be updated to MachineState::smp::cpus
> >      to keep current behavior for hostmem and main RAM (which is
> >      now also hostmem based).
> > 
> > So it looks like we want to do the latter via compat properties eventually.
> > 
> > However, I'd like to note that more prealloc threads might be good for 
> > large backends, and might be bad for small backends. To me, it feels 
> > like a workload that relies on this should really do this manually. So I 
> > am still not sure if this is the right thing to do.  
> Yes, I agree with you "more prealloc threas are good for large backends, 
> and bad for small backends". But I think most situation large backends 
> always with large vcpu numbers and small backens always with small vcpu 
> numbers, because most users will not create a vm with large vcpu numbers 
> with small memory.
> 
> 
> > 
> > Note that qapi/qom.json:
> > 
> > "@prealloc-threads: number of CPU threads to use for prealloc (default: 
> > 1", so that doc would be wrong now.
> > 
> > Why exactly can't workload that cares not simply set this manually? 
> > Performance tuning smells like something to be done manually for a 
> > specific workload.
> >  
> It is a simply way that let workload set the prealloc threads manually. 
> For example, for large backends it set many prealloc threads, and set 1 
> prealloc threads manually for small backends. Yes, workload can 
> `maunally` set prealloc thread to 1, rather than use `default` value 1.
> So when workload want to(or maybe just forget specify the 
> prealloc-threads property) use the default value, I think the 
> MachineState::smp::cpus maybe better than 1.

as commit mentioned by David states, it creates implicit dependency
on Machine and we were working getting rid of such dependencies
where it's possible.

So if you have to change prealloc-threads to a larger number,
you can try to use specific machine compat properties to do it,
instead of pushing machine to generic backend code. But 'good'
default for your workload doesn't guaranties it's a good one
another.

My preference is that user (mgmt layer) should set property
explicitly if it cares. It's leads to more stable VM config,
as opposed to using defaults which could change over time and
unexpectedly 'regress' such VMs, and can factor in host/workload
specific nuances without need to change QEMU.


> >> Signed-off-by: wangnan.light <wangnan.light@bytedance.com>
> >> ---
> >>   backends/hostmem.c | 2 +-
> >>   hw/core/machine.c  | 5 +++++
> >>   2 files changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/backends/hostmem.c b/backends/hostmem.c
> >> index 4c05862ed5..c4a249b7e6 100644
> >> --- a/backends/hostmem.c
> >> +++ b/backends/hostmem.c
> >> @@ -273,7 +273,7 @@ static void host_memory_backend_init(Object *obj)
> >>       backend->merge = machine_mem_merge(machine);
> >>       backend->dump = machine_dump_guest_core(machine);
> >>       backend->reserve = true;
> >> -    backend->prealloc_threads = 1;
> >> +    backend->prealloc_threads = machine_smp_cpus(machine);
> >>   }
> >>   static void host_memory_backend_post_init(Object *obj)
> >> diff --git a/hw/core/machine.c b/hw/core/machine.c
> >> index 067f42b528..95ba5b1477 100644
> >> --- a/hw/core/machine.c
> >> +++ b/hw/core/machine.c
> >> @@ -1065,6 +1065,11 @@ bool machine_dump_guest_core(MachineState 
> >> *machine)
> >>       return machine->dump_guest_core;
> >>   }
> >> +bool machine_smp_cpus(MachineState *machine)
> >> +{
> >> +    return machine->smp.cpus;
> >> +}
> >> +
> >>   bool machine_mem_merge(MachineState *machine)
> >>   {
> >>       return machine->mem_merge;
> >>  
> > 
> >   
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH] hostmem: change default prealloc threads number
  2021-09-29  9:05     ` Igor Mammedov
@ 2021-09-29  9:10       ` David Hildenbrand
  2021-09-29  9:22       ` Daniel P. Berrangé
  1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2021-09-29  9:10 UTC (permalink / raw)
  To: Igor Mammedov, Nan Wang; +Cc: mikughoull, ehabkost, qemu-devel

On 29.09.21 11:05, Igor Mammedov wrote:
> On Tue, 28 Sep 2021 00:47:01 +0800
> Nan Wang <wangnan.light@bytedance.com> wrote:
> 
>> On 2021/9/27 11:16, David Hildenbrand wrote:
>>> On 27.09.21 15:19, Nan Wang wrote:
>>>> From: "wangnan.light" <wangnan.light@bytedance.com>
>>>>
>>>> the default number of prealloc threads is 1, for huge memory backend
>>>> file, single thread touch page is really slow.
>>>> We can adjust thread number by prealloc-threads property, but if the
>>>> default value updated to MachineState::smp::cpus may be better.
>>>> For example, old version of qemu(prealloc-threads have not been
>>>> introduced yet), the value of threads num is MachineState::smp::cpus,
>>>> so if someone use the same commandline to start current verion of qemu
>>>> and old version of qemu which will lead to different behaviors.
>>>
>>> The introducing patch mentions:
>>>
>>> commit ffac16fab33bb42f17e47624985220c1fd864e9d
>>> Author: Igor Mammedov <imammedo@redhat.com>
>>> Date:   Wed Feb 19 11:09:50 2020 -0500
>>>
>>>       hostmem: introduce "prealloc-threads" property
>>>
>>>       the property will allow user to specify number of threads to use
>>>       in pre-allocation stage. It also will allow to reduce implicit
>>>       hostmem dependency on current_machine.
>>>       On object creation it will default to 1, but via machine
>>>       compat property it will be updated to MachineState::smp::cpus
>>>       to keep current behavior for hostmem and main RAM (which is
>>>       now also hostmem based).
>>>
>>> So it looks like we want to do the latter via compat properties eventually.
>>>
>>> However, I'd like to note that more prealloc threads might be good for
>>> large backends, and might be bad for small backends. To me, it feels
>>> like a workload that relies on this should really do this manually. So I
>>> am still not sure if this is the right thing to do.
>> Yes, I agree with you "more prealloc threas are good for large backends,
>> and bad for small backends". But I think most situation large backends
>> always with large vcpu numbers and small backens always with small vcpu
>> numbers, because most users will not create a vm with large vcpu numbers
>> with small memory.
>>
>>
>>>
>>> Note that qapi/qom.json:
>>>
>>> "@prealloc-threads: number of CPU threads to use for prealloc (default:
>>> 1", so that doc would be wrong now.
>>>
>>> Why exactly can't workload that cares not simply set this manually?
>>> Performance tuning smells like something to be done manually for a
>>> specific workload.
>>>   
>> It is a simply way that let workload set the prealloc threads manually.
>> For example, for large backends it set many prealloc threads, and set 1
>> prealloc threads manually for small backends. Yes, workload can
>> `maunally` set prealloc thread to 1, rather than use `default` value 1.
>> So when workload want to(or maybe just forget specify the
>> prealloc-threads property) use the default value, I think the
>> MachineState::smp::cpus maybe better than 1.
> 
> as commit mentioned by David states, it creates implicit dependency
> on Machine and we were working getting rid of such dependencies
> where it's possible.
> 
> So if you have to change prealloc-threads to a larger number,
> you can try to use specific machine compat properties to do it,
> instead of pushing machine to generic backend code. But 'good'
> default for your workload doesn't guaranties it's a good one
> another.
> 
> My preference is that user (mgmt layer) should set property
> explicitly if it cares. It's leads to more stable VM config,
> as opposed to using defaults which could change over time and
> unexpectedly 'regress' such VMs, and can factor in host/workload
> specific nuances without need to change QEMU.

Exactly my thoughts, thanks Igor.

-- 
Thanks,

David / dhildenb



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH] hostmem: change default prealloc threads number
  2021-09-29  9:05     ` Igor Mammedov
  2021-09-29  9:10       ` David Hildenbrand
@ 2021-09-29  9:22       ` Daniel P. Berrangé
  2021-09-29 14:46         ` Igor Mammedov
  1 sibling, 1 reply; 6+ messages in thread
From: Daniel P. Berrangé @ 2021-09-29  9:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: mikughoull, David Hildenbrand, qemu-devel, ehabkost, Nan Wang

On Wed, Sep 29, 2021 at 11:05:31AM +0200, Igor Mammedov wrote:
> On Tue, 28 Sep 2021 00:47:01 +0800
> Nan Wang <wangnan.light@bytedance.com> wrote:
> 
> > On 2021/9/27 11:16, David Hildenbrand wrote:
> > > On 27.09.21 15:19, Nan Wang wrote:  
> > >> From: "wangnan.light" <wangnan.light@bytedance.com>
> > >>
> > >> the default number of prealloc threads is 1, for huge memory backend
> > >> file, single thread touch page is really slow.
> > >> We can adjust thread number by prealloc-threads property, but if the
> > >> default value updated to MachineState::smp::cpus may be better.
> > >> For example, old version of qemu(prealloc-threads have not been
> > >> introduced yet), the value of threads num is MachineState::smp::cpus,
> > >> so if someone use the same commandline to start current verion of qemu
> > >> and old version of qemu which will lead to different behaviors.  
> > > 
> > > The introducing patch mentions:
> > > 
> > > commit ffac16fab33bb42f17e47624985220c1fd864e9d
> > > Author: Igor Mammedov <imammedo@redhat.com>
> > > Date:   Wed Feb 19 11:09:50 2020 -0500
> > > 
> > >      hostmem: introduce "prealloc-threads" property
> > > 
> > >      the property will allow user to specify number of threads to use
> > >      in pre-allocation stage. It also will allow to reduce implicit
> > >      hostmem dependency on current_machine.
> > >      On object creation it will default to 1, but via machine
> > >      compat property it will be updated to MachineState::smp::cpus
> > >      to keep current behavior for hostmem and main RAM (which is
> > >      now also hostmem based).
> > > 
> > > So it looks like we want to do the latter via compat properties eventually.
> > > 
> > > However, I'd like to note that more prealloc threads might be good for 
> > > large backends, and might be bad for small backends. To me, it feels 
> > > like a workload that relies on this should really do this manually. So I 
> > > am still not sure if this is the right thing to do.  
> > Yes, I agree with you "more prealloc threas are good for large backends, 
> > and bad for small backends". But I think most situation large backends 
> > always with large vcpu numbers and small backens always with small vcpu 
> > numbers, because most users will not create a vm with large vcpu numbers 
> > with small memory.
> > 
> > 
> > > 
> > > Note that qapi/qom.json:
> > > 
> > > "@prealloc-threads: number of CPU threads to use for prealloc (default: 
> > > 1", so that doc would be wrong now.
> > > 
> > > Why exactly can't workload that cares not simply set this manually? 
> > > Performance tuning smells like something to be done manually for a 
> > > specific workload.
> > >  
> > It is a simply way that let workload set the prealloc threads manually. 
> > For example, for large backends it set many prealloc threads, and set 1 
> > prealloc threads manually for small backends. Yes, workload can 
> > `maunally` set prealloc thread to 1, rather than use `default` value 1.
> > So when workload want to(or maybe just forget specify the 
> > prealloc-threads property) use the default value, I think the 
> > MachineState::smp::cpus maybe better than 1.
> 
> as commit mentioned by David states, it creates implicit dependency
> on Machine and we were working getting rid of such dependencies
> where it's possible.
> 
> So if you have to change prealloc-threads to a larger number,
> you can try to use specific machine compat properties to do it,
> instead of pushing machine to generic backend code. But 'good'
> default for your workload doesn't guaranties it's a good one
> another.
> 
> My preference is that user (mgmt layer) should set property
> explicitly if it cares. It's leads to more stable VM config,
> as opposed to using defaults which could change over time and
> unexpectedly 'regress' such VMs, and can factor in host/workload
> specific nuances without need to change QEMU.

Setting prealloc_threads to match vCPUs count feels like it is making
an assumption that if we've allowed 4 vCPUs, it is OK for the prealloc
to consume 4 host CPUs. This assumption could be tricky when QEMU is
strictly pinned to host CPUs, as vCPU threads are pinned to some pCPUs
but emulator threads might be pinned differently.

Would there still be a performance advantage to prealloc_threads > 1,
if all non-vCPU threads are pinned to the same single host CPU ?


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [External] Re: [PATCH] hostmem: change default prealloc threads number
  2021-09-29  9:22       ` Daniel P. Berrangé
@ 2021-09-29 14:46         ` Igor Mammedov
  0 siblings, 0 replies; 6+ messages in thread
From: Igor Mammedov @ 2021-09-29 14:46 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: mikughoull, David Hildenbrand, qemu-devel, ehabkost, Nan Wang

On Wed, 29 Sep 2021 10:22:39 +0100
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Wed, Sep 29, 2021 at 11:05:31AM +0200, Igor Mammedov wrote:
> > On Tue, 28 Sep 2021 00:47:01 +0800
> > Nan Wang <wangnan.light@bytedance.com> wrote:
> >   
> > > On 2021/9/27 11:16, David Hildenbrand wrote:  
> > > > On 27.09.21 15:19, Nan Wang wrote:    
> > > >> From: "wangnan.light" <wangnan.light@bytedance.com>
> > > >>
> > > >> the default number of prealloc threads is 1, for huge memory backend
> > > >> file, single thread touch page is really slow.
> > > >> We can adjust thread number by prealloc-threads property, but if the
> > > >> default value updated to MachineState::smp::cpus may be better.
> > > >> For example, old version of qemu(prealloc-threads have not been
> > > >> introduced yet), the value of threads num is MachineState::smp::cpus,
> > > >> so if someone use the same commandline to start current verion of qemu
> > > >> and old version of qemu which will lead to different behaviors.    
> > > > 
> > > > The introducing patch mentions:
> > > > 
> > > > commit ffac16fab33bb42f17e47624985220c1fd864e9d
> > > > Author: Igor Mammedov <imammedo@redhat.com>
> > > > Date:   Wed Feb 19 11:09:50 2020 -0500
> > > > 
> > > >      hostmem: introduce "prealloc-threads" property
> > > > 
> > > >      the property will allow user to specify number of threads to use
> > > >      in pre-allocation stage. It also will allow to reduce implicit
> > > >      hostmem dependency on current_machine.
> > > >      On object creation it will default to 1, but via machine
> > > >      compat property it will be updated to MachineState::smp::cpus
> > > >      to keep current behavior for hostmem and main RAM (which is
> > > >      now also hostmem based).
> > > > 
> > > > So it looks like we want to do the latter via compat properties eventually.
> > > > 
> > > > However, I'd like to note that more prealloc threads might be good for 
> > > > large backends, and might be bad for small backends. To me, it feels 
> > > > like a workload that relies on this should really do this manually. So I 
> > > > am still not sure if this is the right thing to do.    
> > > Yes, I agree with you "more prealloc threas are good for large backends, 
> > > and bad for small backends". But I think most situation large backends 
> > > always with large vcpu numbers and small backens always with small vcpu 
> > > numbers, because most users will not create a vm with large vcpu numbers 
> > > with small memory.
> > > 
> > >   
> > > > 
> > > > Note that qapi/qom.json:
> > > > 
> > > > "@prealloc-threads: number of CPU threads to use for prealloc (default: 
> > > > 1", so that doc would be wrong now.
> > > > 
> > > > Why exactly can't workload that cares not simply set this manually? 
> > > > Performance tuning smells like something to be done manually for a 
> > > > specific workload.
> > > >    
> > > It is a simply way that let workload set the prealloc threads manually. 
> > > For example, for large backends it set many prealloc threads, and set 1 
> > > prealloc threads manually for small backends. Yes, workload can 
> > > `maunally` set prealloc thread to 1, rather than use `default` value 1.
> > > So when workload want to(or maybe just forget specify the 
> > > prealloc-threads property) use the default value, I think the 
> > > MachineState::smp::cpus maybe better than 1.  
> > 
> > as commit mentioned by David states, it creates implicit dependency
> > on Machine and we were working getting rid of such dependencies
> > where it's possible.
> > 
> > So if you have to change prealloc-threads to a larger number,
> > you can try to use specific machine compat properties to do it,
> > instead of pushing machine to generic backend code. But 'good'
> > default for your workload doesn't guaranties it's a good one
> > another.
> > 
> > My preference is that user (mgmt layer) should set property
> > explicitly if it cares. It's leads to more stable VM config,
> > as opposed to using defaults which could change over time and
> > unexpectedly 'regress' such VMs, and can factor in host/workload
> > specific nuances without need to change QEMU.  
> 
> Setting prealloc_threads to match vCPUs count feels like it is making
> an assumption that if we've allowed 4 vCPUs, it is OK for the prealloc
> to consume 4 host CPUs. This assumption could be tricky when QEMU is
> strictly pinned to host CPUs, as vCPU threads are pinned to some pCPUs
> but emulator threads might be pinned differently.
> 
> Would there still be a performance advantage to prealloc_threads > 1,
> if all non-vCPU threads are pinned to the same single host CPU ?

I'd imagine it will only introduce unnecessary task contention.
Current conservative default (1) is the best we can do without
knowing workload/host configuration, since it affects host and
already running VMs less than higher number of pre-allocation
threads.
 

> 
> 
> Regards,
> Daniel



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-09-29 14:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20210927131951.1810823-1-user@n248-145-203>
2021-09-27 15:16 ` [PATCH] hostmem: change default prealloc threads number David Hildenbrand
2021-09-27 16:47   ` [External] " Nan Wang
2021-09-29  9:05     ` Igor Mammedov
2021-09-29  9:10       ` David Hildenbrand
2021-09-29  9:22       ` Daniel P. Berrangé
2021-09-29 14:46         ` Igor Mammedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.