xen-devel.lists.xenproject.org archive mirror
* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
@ 2019-05-10 11:22 Juergen Gross
  2019-05-10 11:22 ` [Xen-devel] " Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2019-05-10 11:22 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall,
	xen-devel, Ian Jackson, Roger Pau Monne

On 10/05/2019 12:29, Dario Faggioli wrote:
> On Fri, 2019-05-10 at 11:00 +0200, Juergen Gross wrote:
>> On 10/05/2019 10:53, Jan Beulich wrote:
>>>>>> On 08.05.19 at 16:36, <jgross@suse.com> wrote:
>>>>
>>>> With sched-gran=core or sched-gran=socket offlining a single cpu
>>>> results
>>>> in moving the complete core or socket to cpupool_free_cpus and
>>>> then
>>>> offlining from there. Only complete cores/sockets can be moved to
>>>> any
>>>> cpupool. When onlining a cpu it is added to cpupool_free_cpus and
>>>> if
>>>> the core/socket is completely online it will automatically be
>>>> added to
>>>> Pool-0 (as today any single onlined cpu).
>>>
>>> Well, this is in line with what was discussed on the call
>>> yesterday, so
>>> I think it's an acceptable initial state to end up in. Albeit, just
>>> for
>>> completeness, I'm not convinced there's no use for "smt-
>>> {dis,en}able"
>>> anymore with core-aware scheduling implemented just in Xen - it
>>> may still be considered useful as long as we don't expose proper
>>> topology to guests, for them to be able to do something similar.
>>
>> As the extra complexity for supporting that is significant I'd like
>> to
>> at least postpone it. And with the (later) introduction of per-
>> cpupool
>> smt on/off I guess this would be even less important.
>>
> I agree.
> 
> Isn't it the case that (but note that I'm just thinking out loud here),
> if we make smt= and sched-gran= per-cpupool, the user gains the chance
> to use both, if he/she wants (e.g., for testing)?

Yes.

> If yes, is such a thing valuable enough that it'd make sense to work
> on that, as a first thing, I mean?

My planned roadmap is:

1. this series
2. scheduler clean-up
3. per-cpupool smt and granularity

> We'd still forbid moving things from pools with different
> configuration, at least at the beginning, of course.

Right, allowing that would be 4.


Juergen


* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-10  9:00             ` Juergen Gross
  2019-05-10 10:29               ` Dario Faggioli
@ 2019-05-10 11:17               ` Jan Beulich
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2019-05-10 11:17 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 10.05.19 at 11:00, <jgross@suse.com> wrote:
> On 10/05/2019 10:53, Jan Beulich wrote:
>>>>> On 08.05.19 at 16:36, <jgross@suse.com> wrote:
>>> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>>>> handled (hence the RFC status of the series).
>>>>>
>>>>> When cpu hotplug is being added it might be appropriate to switch the
>>>>> scheme as you suggested. Right now the current solution is much more
>>>>> simple.
>>>>
>>>> I see (I did notice the cover letter remark, but managed to not
>>>> honor it when writing the reply), but I'm unconvinced if incurring
>>>> more code churn by not dealing with things the "dynamic" way
>>>> right away is indeed the "more simple" (overall) solution.
>>>
>>> I have started to address cpu on/offlining now.
>>>
>>> There are multiple design decisions to take.
>>>
>>> 1. Interaction between sched-gran and smt boot parameters
>>> 2. Interaction between sched-gran and xen-hptool smt switching
>>> 3. Interaction between sched-gran and single cpu on/offlining
>>>
>>> Right now any guest won't see a difference regarding sched-gran
>>> selection. This means we don't have to think about potential migration
>>> restrictions. This might change in future when we want to enable the
>>> guest to e.g. use core scheduling themselves in order to mitigate
>>> against side channel attacks within the guest.
>>>
>>> The most simple solution would be (and I'd like to send out V1 of my
>>> series with that implemented):
>>>
>>> sched-gran=core and sched-gran=socket don't allow dynamical switching
>>> of smt via xen-hptool.
>>>
>>> With sched-gran=core or sched-gran=socket offlining a single cpu results
>>> in moving the complete core or socket to cpupool_free_cpus and then
>>> offlining from there. Only complete cores/sockets can be moved to any
>>> cpupool. When onlining a cpu it is added to cpupool_free_cpus and if
>>> the core/socket is completely online it will automatically be added to
>>> Pool-0 (as today any single onlined cpu).
>> 
>> Well, this is in line with what was discussed on the call yesterday, so
>> I think it's an acceptable initial state to end up in. Albeit, just for
>> completeness, I'm not convinced there's no use for "smt-{dis,en}able"
>> anymore with core-aware scheduling implemented just in Xen - it
>> may still be considered useful as long as we don't expose proper
>> topology to guests, for them to be able to do something similar.
> 
> As the extra complexity for supporting that is significant I'd like to
> at least postpone it.

Understood.

> And with the (later) introduction of per-cpupool
> smt on/off I guess this would be even less important.

Likely, since pools themselves can be created and destroyed
dynamically. At that point this would basically be a more
fine-grained smt-{en,dis}able.

Jan




* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-10  9:00             ` Juergen Gross
@ 2019-05-10 10:29               ` Dario Faggioli
  2019-05-10 11:17               ` Jan Beulich
  1 sibling, 0 replies; 16+ messages in thread
From: Dario Faggioli @ 2019-05-10 10:29 UTC (permalink / raw)
  To: Juergen Gross, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall,
	xen-devel, Ian Jackson, Roger Pau Monne


On Fri, 2019-05-10 at 11:00 +0200, Juergen Gross wrote:
> On 10/05/2019 10:53, Jan Beulich wrote:
> > > > > On 08.05.19 at 16:36, <jgross@suse.com> wrote:
> > > 
> > > With sched-gran=core or sched-gran=socket offlining a single cpu
> > > results
> > > in moving the complete core or socket to cpupool_free_cpus and
> > > then
> > > offlining from there. Only complete cores/sockets can be moved to
> > > any
> > > cpupool. When onlining a cpu it is added to cpupool_free_cpus and
> > > if
> > > the core/socket is completely online it will automatically be
> > > added to
> > > Pool-0 (as today any single onlined cpu).
> > 
> > Well, this is in line with what was discussed on the call
> > yesterday, so
> > I think it's an acceptable initial state to end up in. Albeit, just
> > for
> > completeness, I'm not convinced there's no use for "smt-
> > {dis,en}able"
> > anymore with core-aware scheduling implemented just in Xen - it
> > may still be considered useful as long as we don't expose proper
> > topology to guests, for them to be able to do something similar.
> 
> As the extra complexity for supporting that is significant I'd like
> to
> at least postpone it. And with the (later) introduction of per-
> cpupool
> smt on/off I guess this would be even less important.
> 
I agree.

Isn't it the case that (but note that I'm just thinking out loud here),
if we make smt= and sched-gran= per-cpupool, the user gains the chance
to use both, if he/she wants (e.g., for testing)?

If yes, is such a thing valuable enough that it'd make sense to work
on that, as a first thing, I mean?

We'd still forbid moving things from pools with different
configuration, at least at the beginning, of course.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
@ 2019-05-10  9:00             ` Juergen Gross
  2019-05-10 10:29               ` Dario Faggioli
  2019-05-10 11:17               ` Jan Beulich
  0 siblings, 2 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-10  9:00 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

On 10/05/2019 10:53, Jan Beulich wrote:
>>>> On 08.05.19 at 16:36, <jgross@suse.com> wrote:
>> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>>> handled (hence the RFC status of the series).
>>>>
>>>> When cpu hotplug is being added it might be appropriate to switch the
>>>> scheme as you suggested. Right now the current solution is much more
>>>> simple.
>>>
>>> I see (I did notice the cover letter remark, but managed to not
>>> honor it when writing the reply), but I'm unconvinced if incurring
>>> more code churn by not dealing with things the "dynamic" way
>>> right away is indeed the "more simple" (overall) solution.
>>
>> I have started to address cpu on/offlining now.
>>
>> There are multiple design decisions to take.
>>
>> 1. Interaction between sched-gran and smt boot parameters
>> 2. Interaction between sched-gran and xen-hptool smt switching
>> 3. Interaction between sched-gran and single cpu on/offlining
>>
>> Right now any guest won't see a difference regarding sched-gran
>> selection. This means we don't have to think about potential migration
>> restrictions. This might change in future when we want to enable the
>> guest to e.g. use core scheduling themselves in order to mitigate
>> against side channel attacks within the guest.
>>
>> The most simple solution would be (and I'd like to send out V1 of my
>> series with that implemented):
>>
>> sched-gran=core and sched-gran=socket don't allow dynamical switching
>> of smt via xen-hptool.
>>
>> With sched-gran=core or sched-gran=socket offlining a single cpu results
>> in moving the complete core or socket to cpupool_free_cpus and then
>> offlining from there. Only complete cores/sockets can be moved to any
>> cpupool. When onlining a cpu it is added to cpupool_free_cpus and if
>> the core/socket is completely online it will automatically be added to
>> Pool-0 (as today any single onlined cpu).
> 
> Well, this is in line with what was discussed on the call yesterday, so
> I think it's an acceptable initial state to end up in. Albeit, just for
> completeness, I'm not convinced there's no use for "smt-{dis,en}able"
> anymore with core-aware scheduling implemented just in Xen - it
> may still be considered useful as long as we don't expose proper
> topology to guests, for them to be able to do something similar.

As the extra complexity for supporting that is significant I'd like to
at least postpone it. And with the (later) introduction of per-cpupool
smt on/off I guess this would be even less important.


Juergen


* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-08 14:36         ` Juergen Gross
@ 2019-05-10  8:53           ` Jan Beulich
       [not found]           ` <5CD53C1C020000780022D706@suse.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2019-05-10  8:53 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 08.05.19 at 16:36, <jgross@suse.com> wrote:
> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>> handled (hence the RFC status of the series).
>>>
>>> When cpu hotplug is being added it might be appropriate to switch the
>>> scheme as you suggested. Right now the current solution is much more
>>> simple.
>> 
>> I see (I did notice the cover letter remark, but managed to not
>> honor it when writing the reply), but I'm unconvinced if incurring
>> more code churn by not dealing with things the "dynamic" way
>> right away is indeed the "more simple" (overall) solution.
> 
> I have started to address cpu on/offlining now.
> 
> There are multiple design decisions to take.
> 
> 1. Interaction between sched-gran and smt boot parameters
> 2. Interaction between sched-gran and xen-hptool smt switching
> 3. Interaction between sched-gran and single cpu on/offlining
> 
> Right now any guest won't see a difference regarding sched-gran
> selection. This means we don't have to think about potential migration
> restrictions. This might change in future when we want to enable the
> guest to e.g. use core scheduling themselves in order to mitigate
> against side channel attacks within the guest.
> 
> The most simple solution would be (and I'd like to send out V1 of my
> series with that implemented):
> 
> sched-gran=core and sched-gran=socket don't allow dynamical switching
> of smt via xen-hptool.
> 
> With sched-gran=core or sched-gran=socket offlining a single cpu results
> in moving the complete core or socket to cpupool_free_cpus and then
> offlining from there. Only complete cores/sockets can be moved to any
> cpupool. When onlining a cpu it is added to cpupool_free_cpus and if
> the core/socket is completely online it will automatically be added to
> Pool-0 (as today any single onlined cpu).

Well, this is in line with what was discussed on the call yesterday, so
I think it's an acceptable initial state to end up in. Albeit, just for
completeness, I'm not convinced there's no use for "smt-{dis,en}able"
anymore with core-aware scheduling implemented just in Xen - it
may still be considered useful as long as we don't expose proper
topology to guests, for them to be able to do something similar.

> The next steps (for future patches) could be:
> 
> - per-cpupool smt settings (static at cpupool creation, moving a domain
>   between cpupools with different smt settings not supported)
> 
> - support moving domains between cpupools with different smt settings
>   (a guest started with smt=0 would only ever use 1 thread per core)

Yes, in its most general terms: Such movement may be wasteful, but
should be possible to be carried out safely in all cases.

Jan




* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-06 10:01       ` Jan Beulich
@ 2019-05-08 14:36         ` Juergen Gross
  2019-05-10  8:53           ` Jan Beulich
       [not found]           ` <5CD53C1C020000780022D706@suse.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-08 14:36 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Dario Faggioli,
	Julien Grall, xen-devel, Ian Jackson, Roger Pau Monne

On 06/05/2019 12:01, Jan Beulich wrote:
>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>> And that was mentioned in the cover letter: cpu hotplug is not yet
>> handled (hence the RFC status of the series).
>>
>> When cpu hotplug is being added it might be appropriate to switch the
>> scheme as you suggested. Right now the current solution is much more
>> simple.
> 
> I see (I did notice the cover letter remark, but managed to not
> honor it when writing the reply), but I'm unconvinced if incurring
> more code churn by not dealing with things the "dynamic" way
> right away is indeed the "more simple" (overall) solution.

I have started to address cpu on/offlining now.

There are multiple design decisions to take.

1. Interaction between sched-gran and smt boot parameters
2. Interaction between sched-gran and xen-hptool smt switching
3. Interaction between sched-gran and single cpu on/offlining

Right now any guest won't see a difference regarding sched-gran
selection. This means we don't have to think about potential migration
restrictions. This might change in future when we want to enable the
guest to e.g. use core scheduling themselves in order to mitigate
against side channel attacks within the guest.

The most simple solution would be (and I'd like to send out V1 of my
series with that implemented):

sched-gran=core and sched-gran=socket don't allow dynamical switching
of smt via xen-hptool.

With sched-gran=core or sched-gran=socket offlining a single cpu results
in moving the complete core or socket to cpupool_free_cpus and then
offlining from there. Only complete cores/sockets can be moved to any
cpupool. When onlining a cpu it is added to cpupool_free_cpus and if
the core/socket is completely online it will automatically be added to
Pool-0 (as today any single onlined cpu).
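
[ Illustration, not from the series: the switch below mirrors the one the
  patch adds to scheduler_percpu_init() and shows which CPUs have to be
  moved together under each granularity; the helper name is made up. ]

    static const cpumask_t *sched_gran_cpumask(unsigned int cpu)
    {
        switch ( opt_sched_granularity )
        {
        case SCHED_GRAN_core:
            return per_cpu(cpu_sibling_mask, cpu);   /* whole core */
        case SCHED_GRAN_socket:
            return per_cpu(cpu_core_mask, cpu);      /* whole socket */
        default:
            return cpumask_of(cpu);                  /* single cpu */
        }
    }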

The next steps (for future patches) could be:

- per-cpupool smt settings (static at cpupool creation, moving a domain
  between cpupools with different smt settings not supported)

- support moving domains between cpupools with different smt settings
  (a guest started with smt=0 would only ever use 1 thread per core)

- support per-cpupool scheduling granularity

Thoughts?


Juergen


* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
@ 2019-05-06 13:29 Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-06 13:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

On 06/05/2019 15:14, Jan Beulich wrote:
>>>> On 06.05.19 at 14:23, <jgross@suse.com> wrote:
>> On 06/05/2019 13:58, Jan Beulich wrote:
>>>>>> On 06.05.19 at 12:20, <jgross@suse.com> wrote:
>>>> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>>>>> On 06/05/2019 10:57, Jan Beulich wrote:
>>>>>>> . Yet then I'm a little puzzled by its use here in the first place.
>>>>>>> Generally I think for_each_cpu() uses in __init functions are
>>>>>>> problematic, as they then require further code elsewhere to
>>>>>>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>>>>>>> notifiers is typically more appropriate.
>>>>>>
>>>>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>>>>> handled (hence the RFC status of the series).
>>>>>>
>>>>>> When cpu hotplug is being added it might be appropriate to switch the
>>>>>> scheme as you suggested. Right now the current solution is much more
>>>>>> simple.
>>>>>
>>>>> I see (I did notice the cover letter remark, but managed to not
>>>>> honor it when writing the reply), but I'm unconvinced if incurring
>>>>> more code churn by not dealing with things the "dynamic" way
>>>>> right away is indeed the "more simple" (overall) solution.
>>>>
>>>> Especially with hotplug things are becoming more complicated: I'd like
>>>> to have the final version fall back to smaller granularities in case
>>>> e.g. the user has selected socket scheduling and two sockets have
>>>> different numbers of cores. With hotplug such a situation might be
>>>> discovered only with some domUs already running, so how should we
>>>> react in that case? Doing panic() is no option, so either we reject
>>>> onlining the additional socket, or we adapt by dynamically modifying the
>>>> scheduling granularity. Without that being discussed I don't think it
>>>> makes sense to put a lot effort into a solution which is going to be
>>>> rejected in the end.
>>>
>>> Hmm, where's the symmetry requirement coming from? Socket
>>> scheduling should mean as many vCPU-s on one socket as there
>>> are cores * threads; similarly core scheduling (number of threads).
>>> Statically partitioning domains would seem an intermediate step
>>> at best only anyway, as that requires (on average) leaving more
>>> resources (cores/threads) idle than with a dynamic partitioning
>>> model.
>>
>> And that is exactly the purpose of core/socket scheduling. How else
>> would it be possible (in future) to pass through the topology below
>> the scheduling granularity to the guest?
> 
> True. Albeit nevertheless an (at least) unfortunate limitation.
> 
>> And how should it be of any
>> use for fighting security issues due to side channel attacks?
> 
> From Xen's pov all is still fine afaict. It's the lack of (correct)
> topology exposure (as per above) which would make guest
> side mitigation impossible.
> 
>>> As to your specific question how to react: Since bringing online
>>> a full new socket implies bringing online all its cores / threads one
>>> by one anyway, a "too small" socket in your scheme would
>>> simply result in the socket remaining unused until "enough"
>>> cores/threads have appeared. Similarly the socket would go
>>> out of use as soon as one of its cores/threads gets offlined.
>>
>> Yes, this is a possible way to do it. It should be spelled out,
>> though.
>>
>>> Obviously this ends up problematic for the last usable socket.
>>
>> Yes, like today for the last cpu/thread.
> 
> Well, only kind of. It's quite expected that the last thread
> can't be offlined. I'd call it rather unexpected that a random
> thread on the last socket can't be offlined just because each
> other socket also has a single offline thread: There might
> still be hundreds of online threads in this case, after all.

You'd need to offline the related thread in all active guests. Otherwise
(from the guest's point of view) a cpu suddenly disappears.

> 
>>> But with the static partitioning you describe I also can't really
>>> see how "xen-hptool smt-disable" is going to work.
>>
>> It won't work. It just makes no sense to use it with core scheduling
>> active.
> 
> Why not? Disabling HT may be for purposes other than mitigating
> vulnerabilities like L1TF. And the system is in a symmetric state
> at the beginning and end of the entire operation; it's merely
> intermediate state which doesn't fit the expectations you set forth.

It is like bare metal: You can't physically unplug a single thread. This
is possible only for complete sockets.

It would theoretically be possible to have a test whether all guests
have the related cpus offlined in order to offline them in Xen. IMHO
this would be overkill: as an admin you have to decide whether you want
to use core scheduling or you want the ability to switch off SMT on the
fly.

You can still boot e.g. with sched-gran=socket and smt=off.

Another possibility would be to make sched-gran and SMT per cpupool.
In that case I'd like to make those attributes static at creation time
of the cpupool, though.
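
[ Illustration, not from the series: one possible shape of such static
  per-cpupool settings; the type and field names are invented here. ]

    /* Fixed when the cpupool is created, never changed afterwards. */
    struct cpupool_sched_cfg {
        enum sched_gran gran;   /* cpu / core / socket granularity */
        bool smt;               /* use sibling threads in this pool? */
    };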


Juergen


* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-06 12:23             ` Juergen Gross
@ 2019-05-06 13:14               ` Jan Beulich
  0 siblings, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2019-05-06 13:14 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 06.05.19 at 14:23, <jgross@suse.com> wrote:
> On 06/05/2019 13:58, Jan Beulich wrote:
>>>>> On 06.05.19 at 12:20, <jgross@suse.com> wrote:
>>> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>>>> On 06/05/2019 10:57, Jan Beulich wrote:
>>>>>> . Yet then I'm a little puzzled by its use here in the first place.
>>>>>> Generally I think for_each_cpu() uses in __init functions are
>>>>>> problematic, as they then require further code elsewhere to
>>>>>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>>>>>> notifiers is typically more appropriate.
>>>>>
>>>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>>>> handled (hence the RFC status of the series).
>>>>>
>>>>> When cpu hotplug is being added it might be appropriate to switch the
>>>>> scheme as you suggested. Right now the current solution is much more
>>>>> simple.
>>>>
>>>> I see (I did notice the cover letter remark, but managed to not
>>>> honor it when writing the reply), but I'm unconvinced if incurring
>>>> more code churn by not dealing with things the "dynamic" way
>>>> right away is indeed the "more simple" (overall) solution.
>>>
>>> Especially with hotplug things are becoming more complicated: I'd like
>>> to have the final version fall back to smaller granularities in case
>>> e.g. the user has selected socket scheduling and two sockets have
>>> different numbers of cores. With hotplug such a situation might be
>>> discovered only with some domUs already running, so how should we
>>> react in that case? Doing panic() is no option, so either we reject
>>> onlining the additional socket, or we adapt by dynamically modifying the
>>> scheduling granularity. Without that being discussed I don't think it
>>> makes sense to put a lot effort into a solution which is going to be
>>> rejected in the end.
>> 
>> Hmm, where's the symmetry requirement coming from? Socket
>> scheduling should mean as many vCPU-s on one socket as there
>> are cores * threads; similarly core scheduling (number of threads).
>> Statically partitioning domains would seem an intermediate step
>> at best only anyway, as that requires (on average) leaving more
>> resources (cores/threads) idle than with a dynamic partitioning
>> model.
> 
> And that is exactly the purpose of core/socket scheduling. How else
> would it be possible (in future) to pass through the topology below
> the scheduling granularity to the guest?

True. Albeit nevertheless an (at least) unfortunate limitation.

> And how should it be of any
> use for fighting security issues due to side channel attacks?

From Xen's pov all is still fine afaict. It's the lack of (correct)
topology exposure (as per above) which would make guest
side mitigation impossible.

>> As to your specific question how to react: Since bringing online
>> a full new socket implies bringing online all its cores / threads one
>> by one anyway, a "too small" socket in your scheme would
>> simply result in the socket remaining unused until "enough"
>> cores/threads have appeared. Similarly the socket would go
>> out of use as soon as one of its cores/threads gets offlined.
> 
> Yes, this is a possible way to do it. It should be spelled out,
> though.
> 
>> Obviously this ends up problematic for the last usable socket.
> 
> Yes, like today for the last cpu/thread.

Well, only kind of. It's quite expected that the last thread
can't be offlined. I'd call it rather unexpected that a random
thread on the last socket can't be offlined just because each
other socket also has a single offline thread: There might
still be hundreds of online threads in this case, after all.

>> But with the static partitioning you describe I also can't really
>> see how "xen-hptool smt-disable" is going to work.
> 
> It won't work. It just makes no sense to use it with core scheduling
> active.

Why not? Disabling HT may be for purposes other than mitigating
vulnerabilities like L1TF. And the system is in a symmetric state
at the beginning and end of the entire operation; it's merely
intermediate state which doesn't fit the expectations you set forth.

Jan



* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
@ 2019-05-06 12:23             ` Juergen Gross
  2019-05-06 13:14               ` Jan Beulich
  0 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2019-05-06 12:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

On 06/05/2019 13:58, Jan Beulich wrote:
>>>> On 06.05.19 at 12:20, <jgross@suse.com> wrote:
>> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>>> On 06/05/2019 10:57, Jan Beulich wrote:
>>>>> . Yet then I'm a little puzzled by its use here in the first place.
>>>>> Generally I think for_each_cpu() uses in __init functions are
>>>>> problematic, as they then require further code elsewhere to
>>>>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>>>>> notifiers is typically more appropriate.
>>>>
>>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>>> handled (hence the RFC status of the series).
>>>>
>>>> When cpu hotplug is being added it might be appropriate to switch the
>>>> scheme as you suggested. Right now the current solution is much more
>>>> simple.
>>>
>>> I see (I did notice the cover letter remark, but managed to not
>>> honor it when writing the reply), but I'm unconvinced if incurring
>>> more code churn by not dealing with things the "dynamic" way
>>> right away is indeed the "more simple" (overall) solution.
>>
>> Especially with hotplug things are becoming more complicated: I'd like
>> to have the final version fall back to smaller granularities in case
>> e.g. the user has selected socket scheduling and two sockets have
>> different numbers of cores. With hotplug such a situation might be
>> discovered only with some domUs already running, so how should we
>> react in that case? Doing panic() is no option, so either we reject
>> onlining the additional socket, or we adapt by dynamically modifying the
>> scheduling granularity. Without that being discussed I don't think it
>> makes sense to put a lot effort into a solution which is going to be
>> rejected in the end.
> 
> Hmm, where's the symmetry requirement coming from? Socket
> scheduling should mean as many vCPU-s on one socket as there
> are cores * threads; similarly core scheduling (number of threads).
> Statically partitioning domains would seem an intermediate step
> at best only anyway, as that requires (on average) leaving more
> resources (cores/threads) idle than with a dynamic partitioning
> model.

And that is exactly the purpose of core/socket scheduling. How else
would it be possible (in future) to pass through the topology below
the scheduling granularity to the guest? And how should it be of any
use for fighting security issues due to side channel attacks?

> As to your specific question how to react: Since bringing online
> a full new socket implies bringing online all its cores / threads one
> by one anyway, a "too small" socket in your scheme would
> simply result in the socket remaining unused until "enough"
> cores/threads have appeared. Similarly the socket would go
> out of use as soon as one of its cores/threads gets offlined.

Yes, this is a possible way to do it. It should be spelled out,
though.

> Obviously this ends up problematic for the last usable socket.

Yes, like today for the last cpu/thread.

> But with the static partitioning you describe I also can't really
> see how "xen-hptool smt-disable" is going to work.

It won't work. It just makes no sense to use it with core scheduling
active. In the best case it would render all cores but the last one
unusable (disabling the last one would be refused) and all domains
would then fight for that last core.


Juergen


* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-06 10:20         ` Juergen Gross
@ 2019-05-06 11:58           ` Jan Beulich
       [not found]           ` <5CD02161020000780022C257@suse.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2019-05-06 11:58 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 06.05.19 at 12:20, <jgross@suse.com> wrote:
> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>> On 06/05/2019 10:57, Jan Beulich wrote:
>>>> . Yet then I'm a little puzzled by its use here in the first place.
>>>> Generally I think for_each_cpu() uses in __init functions are
>>>> problematic, as they then require further code elsewhere to
>>>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>>>> notifiers is typically more appropriate.
>>>
>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>> handled (hence the RFC status of the series).
>>>
>>> When cpu hotplug is being added it might be appropriate to switch the
>>> scheme as you suggested. Right now the current solution is much more
>>> simple.
>> 
>> I see (I did notice the cover letter remark, but managed to not
>> honor it when writing the reply), but I'm unconvinced if incurring
>> more code churn by not dealing with things the "dynamic" way
>> right away is indeed the "more simple" (overall) solution.
> 
> Especially with hotplug things are becoming more complicated: I'd like
> to have the final version fall back to smaller granularities in case
> e.g. the user has selected socket scheduling and two sockets have
> different numbers of cores. With hotplug such a situation might be
> discovered only with some domUs already running, so how should we
> react in that case? Doing panic() is no option, so either we reject
> onlining the additional socket, or we adapt by dynamically modifying the
> scheduling granularity. Without that being discussed I don't think it
> makes sense to put a lot effort into a solution which is going to be
> rejected in the end.

Hmm, where's the symmetry requirement coming from? Socket
scheduling should mean as many vCPU-s on one socket as there
are cores * threads; similarly core scheduling (number of threads).
Statically partitioning domains would seem an intermediate step
at best only anyway, as that requires (on average) leaving more
resources (cores/threads) idle than with a dynamic partitioning
model.
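
[ Illustration: on a hypothetical 2-socket machine with 8 cores of
  2 threads each per socket, a socket-granularity scheduling unit spans
  8 * 2 = 16 CPUs and thus runs 16 vCPUs of one domain at a time, while
  a core-granularity unit spans just the 2 sibling threads. ]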

As to your specific question how to react: Since bringing online
a full new socket implies bringing online all its cores / threads one
by one anyway, a "too small" socket in your scheme would
simply result in the socket remaining unused until "enough"
cores/threads have appeared. Similarly the socket would go
out of use as soon as one of its cores/threads gets offlined.
Obviously this ends up problematic for the last usable socket.

But with the static partitioning you describe I also can't really
see how "xen-hptool smt-disable" is going to work.

Jan




* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
@ 2019-05-06 10:20         ` Juergen Gross
  2019-05-06 11:58           ` Jan Beulich
       [not found]           ` <5CD02161020000780022C257@suse.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-06 10:20 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

On 06/05/2019 12:01, Jan Beulich wrote:
>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>> On 06/05/2019 10:57, Jan Beulich wrote:
>>>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>>>>  void scheduler_percpu_init(unsigned int cpu)
>>>>  {
>>>>      struct scheduler *sched = per_cpu(scheduler, cpu);
>>>>      struct sched_resource *sd = per_cpu(sched_res, cpu);
>>>> +    const cpumask_t *mask;
>>>> +    unsigned int master_cpu;
>>>> +    spinlock_t *lock;
>>>> +    struct sched_item *old_item, *master_item;
>>>> +
>>>> +    if ( system_state == SYS_STATE_resume )
>>>> +        return;
>>>> +
>>>> +    switch ( opt_sched_granularity )
>>>> +    {
>>>> +    case SCHED_GRAN_cpu:
>>>> +        mask = cpumask_of(cpu);
>>>> +        break;
>>>> +    case SCHED_GRAN_core:
>>>> +        mask = per_cpu(cpu_sibling_mask, cpu);
>>>> +        break;
>>>> +    case SCHED_GRAN_socket:
>>>> +        mask = per_cpu(cpu_core_mask, cpu);
>>>> +        break;
>>>> +    default:
>>>> +        ASSERT_UNREACHABLE();
>>>> +        return;
>>>> +    }
>>>>  
>>>> -    if ( system_state != SYS_STATE_resume )
>>>> +    if ( cpu == 0 || cpumask_weight(mask) == 1 )
>>>
>>> At least outside of x86 specific code I think we should avoid
>>> introducing (further?) assumptions that seeing CPU 0 on a
>>> CPU initialization path implies this being while booting the
>>> system. I wonder anyway whether the right side of the ||
>>> doesn't render the left side redundant.
>>
>> On the boot cpu this function is called before e.g. cpu_sibling_mask
>> is initialized. I can have a try using:
>>
>> if ( cpumask_weight(mask) <= 1 )
> 
> Or re-order things such that it gets set in time?

That might be difficult.

I've ended up with:

if ( !mask || cpumask_weight(mask) == 1 )

> 
>>>> +static unsigned int __init sched_check_granularity(void)
>>>> +{
>>>> +    unsigned int cpu;
>>>> +    unsigned int siblings, gran = 0;
>>>> +
>>>> +    for_each_online_cpu( cpu )
>>>
>>> You want to decide for one of two possible styles, but not a mixture
>>> of both:
>>>
>>>     for_each_online_cpu ( cpu )
>>>
>>> or
>>>
>>>     for_each_online_cpu(cpu)
>>
>> Sorry, will correct.
>>
>>>
>>> . Yet then I'm a little puzzled by its use here in the first place.
>>> Generally I think for_each_cpu() uses in __init functions are
>>> problematic, as they then require further code elsewhere to
>>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>>> notifiers is typically more appropriate.
>>
>> And that was mentioned in the cover letter: cpu hotplug is not yet
>> handled (hence the RFC status of the series).
>>
>> When cpu hotplug is being added it might be appropriate to switch the
>> scheme as you suggested. Right now the current solution is much more
>> simple.
> 
> I see (I did notice the cover letter remark, but managed to not
> honor it when writing the reply), but I'm unconvinced if incurring
> more code churn by not dealing with things the "dynamic" way
> right away is indeed the "more simple" (overall) solution.

Especially with hotplug things are becoming more complicated: I'd like
to have the final version fall back to smaller granularities in case
e.g. the user has selected socket scheduling and two sockets have
different numbers of cores. With hotplug such a situation might be
discovered only with some domUs already running, so how should we
react in that case? Doing panic() is no option, so either we reject
onlining the additional socket, or we adapt by dynamically modifying the
scheduling granularity. Without that being discussed I don't think it
makes sense to put a lot effort into a solution which is going to be
rejected in the end.
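
[ For context: the sched_check_granularity() helper quoted above in this
  mail is where such a decision would hang off. Below is a rough sketch of
  the symmetry check it needs to make; the body is guessed here for
  illustration and is not taken from the patch. ]

    static unsigned int __init sched_check_granularity(void)
    {
        unsigned int cpu;
        unsigned int siblings, gran = 0;

        for_each_online_cpu ( cpu )
        {
            switch ( opt_sched_granularity )
            {
            case SCHED_GRAN_cpu:
                siblings = 1;
                break;
            case SCHED_GRAN_core:
                siblings = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
                break;
            case SCHED_GRAN_socket:
                siblings = cpumask_weight(per_cpu(cpu_core_mask, cpu));
                break;
            default:
                ASSERT_UNREACHABLE();
                siblings = 1;
                break;
            }

            if ( gran == 0 )
                gran = siblings;
            else if ( gran != siblings )
                return 0;    /* asymmetric topology: reject or fall back */
        }

        return gran;
    }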

I'm fine with doing a proper implementation for the non-RFC variant
with a generally accepted design.


Juergen


* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-06  9:23     ` Juergen Gross
@ 2019-05-06 10:01       ` Jan Beulich
  2019-05-08 14:36         ` Juergen Gross
       [not found]       ` <5CD005E7020000780022C1B5@suse.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Jan Beulich @ 2019-05-06 10:01 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
> On 06/05/2019 10:57, Jan Beulich wrote:
>>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>>>  void scheduler_percpu_init(unsigned int cpu)
>>>  {
>>>      struct scheduler *sched = per_cpu(scheduler, cpu);
>>>      struct sched_resource *sd = per_cpu(sched_res, cpu);
>>> +    const cpumask_t *mask;
>>> +    unsigned int master_cpu;
>>> +    spinlock_t *lock;
>>> +    struct sched_item *old_item, *master_item;
>>> +
>>> +    if ( system_state == SYS_STATE_resume )
>>> +        return;
>>> +
>>> +    switch ( opt_sched_granularity )
>>> +    {
>>> +    case SCHED_GRAN_cpu:
>>> +        mask = cpumask_of(cpu);
>>> +        break;
>>> +    case SCHED_GRAN_core:
>>> +        mask = per_cpu(cpu_sibling_mask, cpu);
>>> +        break;
>>> +    case SCHED_GRAN_socket:
>>> +        mask = per_cpu(cpu_core_mask, cpu);
>>> +        break;
>>> +    default:
>>> +        ASSERT_UNREACHABLE();
>>> +        return;
>>> +    }
>>>  
>>> -    if ( system_state != SYS_STATE_resume )
>>> +    if ( cpu == 0 || cpumask_weight(mask) == 1 )
>> 
>> At least outside of x86 specific code I think we should avoid
>> introducing (further?) assumptions that seeing CPU 0 on a
>> CPU initialization path implies this being while booting the
>> system. I wonder anyway whether the right side of the ||
>> doesn't render the left side redundant.
> 
> On the boot cpu this function is called before e.g. cpu_sibling_mask
> is initialized. I can have a try using:
> 
> if ( cpumask_weight(mask) <= 1 )

Or re-order things such that it gets set in time?

>>> +static unsigned int __init sched_check_granularity(void)
>>> +{
>>> +    unsigned int cpu;
>>> +    unsigned int siblings, gran = 0;
>>> +
>>> +    for_each_online_cpu( cpu )
>> 
>> You want to decide for one of two possible styles, but not a mixture
>> of both:
>> 
>>     for_each_online_cpu ( cpu )
>> 
>> or
>> 
>>     for_each_online_cpu(cpu)
> 
> Sorry, will correct.
> 
>> 
>> . Yet then I'm a little puzzled by its use here in the first place.
>> Generally I think for_each_cpu() uses in __init functions are
>> problematic, as they then require further code elsewhere to
>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>> notifiers is typically more appropriate.
> 
> And that was mentioned in the cover letter: cpu hotplug is not yet
> handled (hence the RFC status of the series).
> 
> When cpu hotplug is being added it might be appropriate to switch the
> scheme as you suggested. Right now the current solution is much more
> simple.

I see (I did notice the cover letter remark, but managed to not
honor it when writing the reply), but I'm unconvinced if incurring
more code churn by not dealing with things the "dynamic" way
right away is indeed the "more simple" (overall) solution.

Jan




* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
       [not found]   ` <5CCFF6F1020000780022C12B@suse.com>
       [not found]     ` <ac57c420-a72e-7570-db8f-27e4693c2755@suse.com>
@ 2019-05-06  9:23     ` Juergen Gross
  2019-05-06 10:01       ` Jan Beulich
       [not found]       ` <5CD005E7020000780022C1B5@suse.com>
  1 sibling, 2 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-06  9:23 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

On 06/05/2019 10:57, Jan Beulich wrote:
>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>> --- a/xen/arch/x86/setup.c
>> +++ b/xen/arch/x86/setup.c
>> @@ -1701,6 +1701,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>>          printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
>>      smp_cpus_done();
>>  
>> +    scheduler_smp_init();
>> +
>>      do_initcalls();
> 
> This placement and the actual implementation of the function make
> me wonder: Why didn't you make this an initcall, thus taking care of
> Arm (at least in an abstract way) at the same time?

Hmm, true. Will change.

> 
>>  void scheduler_percpu_init(unsigned int cpu)
>>  {
>>      struct scheduler *sched = per_cpu(scheduler, cpu);
>>      struct sched_resource *sd = per_cpu(sched_res, cpu);
>> +    const cpumask_t *mask;
>> +    unsigned int master_cpu;
>> +    spinlock_t *lock;
>> +    struct sched_item *old_item, *master_item;
>> +
>> +    if ( system_state == SYS_STATE_resume )
>> +        return;
>> +
>> +    switch ( opt_sched_granularity )
>> +    {
>> +    case SCHED_GRAN_cpu:
>> +        mask = cpumask_of(cpu);
>> +        break;
>> +    case SCHED_GRAN_core:
>> +        mask = per_cpu(cpu_sibling_mask, cpu);
>> +        break;
>> +    case SCHED_GRAN_socket:
>> +        mask = per_cpu(cpu_core_mask, cpu);
>> +        break;
>> +    default:
>> +        ASSERT_UNREACHABLE();
>> +        return;
>> +    }
>>  
>> -    if ( system_state != SYS_STATE_resume )
>> +    if ( cpu == 0 || cpumask_weight(mask) == 1 )
> 
> At least outside of x86 specific code I think we should avoid
> introducing (further?) assumptions that seeing CPU 0 on a
> CPU initialization path implies this being while booting the
> system. I wonder anyway whether the right side of the ||
> doesn't render the left side redundant.

On the boot cpu this function is called before e.g. cpu_sibling_mask
is initialized. I can have a try using:

if ( cpumask_weight(mask) <= 1 )

> 
>> +static unsigned int __init sched_check_granularity(void)
>> +{
>> +    unsigned int cpu;
>> +    unsigned int siblings, gran = 0;
>> +
>> +    for_each_online_cpu( cpu )
> 
> You want to decide for one of two possible styles, but not a mixture
> of both:
> 
>     for_each_online_cpu ( cpu )
> 
> or
> 
>     for_each_online_cpu(cpu)

Sorry, will correct.

> 
> . Yet then I'm a little puzzled by its use here in the first place.
> Generally I think for_each_cpu() uses in __init functions are
> problematic, as they then require further code elsewhere to
> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
> notifiers is typically more appropriate.

And that was mentioned in the cover letter: cpu hotplug is not yet
handled (hence the RFC status of the series).

When cpu hotplug is being added it might be appropriate to switch the
scheme as you suggested. Right now the current solution is much more
simple.


Juergen


* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-06  6:56 ` [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum Juergen Gross
@ 2019-05-06  8:57   ` Jan Beulich
       [not found]   ` <5CCFF6F1020000780022C12B@suse.com>
  1 sibling, 0 replies; 16+ messages in thread
From: Jan Beulich @ 2019-05-06  8:57 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -1701,6 +1701,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
>          printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
>      smp_cpus_done();
>  
> +    scheduler_smp_init();
> +
>      do_initcalls();

This placement and the actual implementation of the function make
me wonder: Why didn't you make this an initcall, thus taking care of
Arm (at least in an abstract way) at the same time?
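
[ For reference, a minimal sketch of that suggestion, assuming the
  function body can stay as it is; this is not code from the series: ]

    /* xen/common/schedule.c: registered instead of being called
     * explicitly from x86's __start_xen(), so Arm runs it too via
     * do_initcalls(). */
    static int __init scheduler_smp_init(void)
    {
        /* ... merge sched resources according to opt_sched_granularity ... */
        return 0;
    }
    __initcall(scheduler_smp_init);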

>  void scheduler_percpu_init(unsigned int cpu)
>  {
>      struct scheduler *sched = per_cpu(scheduler, cpu);
>      struct sched_resource *sd = per_cpu(sched_res, cpu);
> +    const cpumask_t *mask;
> +    unsigned int master_cpu;
> +    spinlock_t *lock;
> +    struct sched_item *old_item, *master_item;
> +
> +    if ( system_state == SYS_STATE_resume )
> +        return;
> +
> +    switch ( opt_sched_granularity )
> +    {
> +    case SCHED_GRAN_cpu:
> +        mask = cpumask_of(cpu);
> +        break;
> +    case SCHED_GRAN_core:
> +        mask = per_cpu(cpu_sibling_mask, cpu);
> +        break;
> +    case SCHED_GRAN_socket:
> +        mask = per_cpu(cpu_core_mask, cpu);
> +        break;
> +    default:
> +        ASSERT_UNREACHABLE();
> +        return;
> +    }
>  
> -    if ( system_state != SYS_STATE_resume )
> +    if ( cpu == 0 || cpumask_weight(mask) == 1 )

At least outside of x86 specific code I think we should avoid
introducing (further?) assumptions that seeing CPU 0 on a
CPU initialization path implies this being while booting the
system. I wonder anyway whether the right side of the ||
doesn't render the left side redundant.

> +static unsigned int __init sched_check_granularity(void)
> +{
> +    unsigned int cpu;
> +    unsigned int siblings, gran = 0;
> +
> +    for_each_online_cpu( cpu )

You want to decide for one of two possible styles, but not a mixture
of both:

    for_each_online_cpu ( cpu )

or

    for_each_online_cpu(cpu)

. Yet then I'm a little puzzled by its use here in the first place.
Generally I think for_each_cpu() uses in __init functions are
problematic, as they then require further code elsewhere to
deal with hot-onlining. A pre-SMP-initcall plus use of CPU
notifiers is typically more appropriate.
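
[ A minimal sketch of the pre-SMP-initcall plus CPU-notifier pattern as
  used elsewhere in Xen; the function and variable names here are made up
  and none of this is code from the series: ]

    #include <xen/cpu.h>
    #include <xen/init.h>
    #include <xen/notifier.h>

    static int sched_gran_cpu_callback(struct notifier_block *nfb,
                                       unsigned long action, void *hcpu)
    {
        /* (unsigned long)hcpu is the CPU being brought up / taken down. */
        switch ( action )
        {
        case CPU_ONLINE:
            /* (Re-)evaluate granularity bookkeeping for the new CPU here. */
            break;
        case CPU_DEAD:
            /* Undo the per-CPU granularity bookkeeping here. */
            break;
        }

        return NOTIFY_DONE;
    }

    static struct notifier_block sched_gran_nfb = {
        .notifier_call = sched_gran_cpu_callback
    };

    static int __init sched_gran_presmp_init(void)
    {
        register_cpu_notifier(&sched_gran_nfb);
        return 0;
    }
    presmp_initcall(sched_gran_presmp_init);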

Jan




* [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
  2019-05-06  6:55 [PATCH RFC V2 00/45] xen: add core scheduling support Juergen Gross
@ 2019-05-06  6:56 ` Juergen Gross
  2019-05-06  8:57   ` Jan Beulich
       [not found]   ` <5CCFF6F1020000780022C12B@suse.com>
  0 siblings, 2 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-06  6:56 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Tim Deegan, Julien Grall, Jan Beulich, Dario Faggioli,
	Roger Pau Monné

Add a scheduling granularity enum ("cpu", "core", "socket") for
specification of the scheduling granularity. Initially it is set to
"cpu"; this can be modified via the new boot parameter (x86 only)
"sched-gran".

According to the selected granularity, sched_granularity is set after
all cpus are online. The sched items of the idle vcpus and the sched
resources of the physical cpus need to be combined in case
sched_granularity > 1; this happens before the init_pdata hook of
the active scheduler is called.

A test is added verifying that all sched resources hold the same number
of cpus. For now we panic if this is not the case.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2:
- fixed freeing of sched_res when merging cpus
- rename parameter to "sched-gran" (Jan Beulich)
- rename parameter option from "thread" to "cpu" (Jan Beulich)
---
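[ Not part of the patch: for illustration, the values accepted by the new
  x86 boot parameter added below would be selected on the Xen command
  line like this: ]

    sched-gran=cpu       (default: schedule single threads, as today)
    sched-gran=core      (co-schedule all sibling threads of a core)
    sched-gran=socket    (co-schedule all threads of a socket)
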
 xen/arch/x86/setup.c       |   2 +
 xen/common/schedule.c      | 155 +++++++++++++++++++++++++++++++++++++++++++--
 xen/include/xen/sched-if.h |   4 +-
 xen/include/xen/sched.h    |   1 +
 4 files changed, 153 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3440794275..83854eeef8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1701,6 +1701,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
     smp_cpus_done();
 
+    scheduler_smp_init();
+
     do_initcalls();
 
     if ( opt_watchdog ) 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 4336f2bdf8..3e68259411 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -55,9 +55,32 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
 int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
+static enum {
+    SCHED_GRAN_cpu,
+    SCHED_GRAN_core,
+    SCHED_GRAN_socket
+} opt_sched_granularity = SCHED_GRAN_cpu;
+
+#ifdef CONFIG_X86
+static int __init sched_select_granularity(const char *str)
+{
+    if (strcmp("cpu", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_cpu;
+    else if (strcmp("core", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_core;
+    else if (strcmp("socket", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_socket;
+    else
+        return -EINVAL;
+
+    return 0;
+}
+custom_param("sched-gran", sched_select_granularity);
+#endif
+
 /* Number of vcpus per struct sched_item. */
 static unsigned int sched_granularity = 1;
-const cpumask_t *sched_res_mask = &cpumask_all;
+cpumask_var_t sched_res_mask;
 
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
@@ -323,6 +346,8 @@ static void sched_free_item(struct sched_item *item, struct vcpu *v)
     if ( item->vcpu == v )
         item->vcpu = v->next_in_list;
 
+    item->runstate_cnt[v->runstate.state]--;
+
     if ( !cnt )
         sched_free_item_mem(item);
 }
@@ -2113,8 +2138,14 @@ static int cpu_schedule_up(unsigned int cpu)
     sd = xzalloc(struct sched_resource);
     if ( sd == NULL )
         return -ENOMEM;
+    if ( !zalloc_cpumask_var(&sd->cpus) )
+    {
+        xfree(sd);
+        return -ENOMEM;
+    }
+
     sd->processor = cpu;
-    sd->cpus = cpumask_of(cpu);
+    cpumask_copy(sd->cpus, cpumask_of(cpu));
     per_cpu(sched_res, cpu) = sd;
 
     per_cpu(scheduler, cpu) = &ops;
@@ -2170,30 +2201,92 @@ static int cpu_schedule_up(unsigned int cpu)
     return 0;
 }
 
+static void sched_free_sched_res(struct sched_resource *sd)
+{
+    kill_timer(&sd->s_timer);
+    free_cpumask_var(sd->cpus);
+
+    xfree(sd);
+}
+
 static void cpu_schedule_down(unsigned int cpu)
 {
     struct sched_resource *sd = per_cpu(sched_res, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
+    cpumask_clear_cpu(cpu, sd->cpus);
+    per_cpu(sched_res, cpu) = NULL;
+
+    if ( cpumask_weight(sd->cpus) )
+        return;
+
     sched_free_pdata(sched, sd->sched_priv, cpu);
     sched_free_vdata(sched, idle_vcpu[cpu]->sched_item->priv);
 
     idle_vcpu[cpu]->sched_item->priv = NULL;
     sd->sched_priv = NULL;
+    cpumask_clear_cpu(cpu, sched_res_mask);
 
-    kill_timer(&sd->s_timer);
-
-    xfree(per_cpu(sched_res, cpu));
-    per_cpu(sched_res, cpu) = NULL;
+    sched_free_sched_res(sd);
 }
 
 void scheduler_percpu_init(unsigned int cpu)
 {
     struct scheduler *sched = per_cpu(scheduler, cpu);
     struct sched_resource *sd = per_cpu(sched_res, cpu);
+    const cpumask_t *mask;
+    unsigned int master_cpu;
+    spinlock_t *lock;
+    struct sched_item *old_item, *master_item;
+
+    if ( system_state == SYS_STATE_resume )
+        return;
+
+    switch ( opt_sched_granularity )
+    {
+    case SCHED_GRAN_cpu:
+        mask = cpumask_of(cpu);
+        break;
+    case SCHED_GRAN_core:
+        mask = per_cpu(cpu_sibling_mask, cpu);
+        break;
+    case SCHED_GRAN_socket:
+        mask = per_cpu(cpu_core_mask, cpu);
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        return;
+    }
 
-    if ( system_state != SYS_STATE_resume )
+    if ( cpu == 0 || cpumask_weight(mask) == 1 )
+    {
+        cpumask_set_cpu(cpu, sched_res_mask);
         sched_init_pdata(sched, sd->sched_priv, cpu);
+        return;
+    }
+
+    master_cpu = cpumask_first(mask);
+    master_item = idle_vcpu[master_cpu]->sched_item;
+    lock = pcpu_schedule_lock(master_cpu);
+
+    /* Merge idle_vcpu item and sched_resource into master cpu. */
+    old_item = idle_vcpu[cpu]->sched_item;
+    idle_vcpu[cpu]->sched_item = master_item;
+    per_cpu(sched_res, cpu) = per_cpu(sched_res, master_cpu);
+    per_cpu(sched_res_idx, cpu) = cpumask_weight(per_cpu(sched_res, cpu)->cpus);
+    cpumask_set_cpu(cpu, per_cpu(sched_res, cpu)->cpus);
+    master_item->runstate_cnt[RUNSTATE_running] +=
+            old_item->runstate_cnt[RUNSTATE_running];
+    master_item->runstate_cnt[RUNSTATE_runnable] +=
+            old_item->runstate_cnt[RUNSTATE_runnable];
+
+    pcpu_schedule_unlock(lock, master_cpu);
+
+    sched_free_pdata(sched, sd->sched_priv, cpu);
+    sched_free_vdata(sched, old_item->priv);
+
+    sched_free_sched_res(sd);
+    sched_free_item_mem(old_item);
 }
 
 static int cpu_schedule_callback(
@@ -2273,6 +2366,51 @@ static struct notifier_block cpu_schedule_nfb = {
     .notifier_call = cpu_schedule_callback
 };
 
+static unsigned int __init sched_check_granularity(void)
+{
+    unsigned int cpu;
+    unsigned int siblings, gran = 0;
+
+    for_each_online_cpu( cpu )
+    {
+        switch ( opt_sched_granularity )
+        {
+        case SCHED_GRAN_cpu:
+            /* If granularity is "thread" we are fine already. */
+            return 1;
+        case SCHED_GRAN_core:
+            siblings = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
+            break;
+        case SCHED_GRAN_socket:
+            siblings = cpumask_weight(per_cpu(cpu_core_mask, cpu));
+            break;
+        default:
+            ASSERT_UNREACHABLE();
+            return 0;
+        }
+
+        if ( gran == 0 )
+            gran = siblings;
+        else if ( gran != siblings )
+            return 0;
+    }
+
+    return gran;
+}
+
+/* Setup data for selected scheduler granularity. */
+void __init scheduler_smp_init(void)
+{
+    unsigned int gran;
+
+    gran = sched_check_granularity();
+    if ( gran == 0 )
+        panic("Illegal cpu configuration for scheduling granularity!\n"
+              "Please use thread scheduling.\n");
+
+    sched_granularity = gran;
+}
+
 /* Initialise the data structures. */
 void __init scheduler_init(void)
 {
@@ -2304,6 +2442,9 @@ void __init scheduler_init(void)
         printk("Using '%s' (%s)\n", ops.name, ops.opt_name);
     }
 
+    if ( !zalloc_cpumask_var(&sched_res_mask) )
+        BUG();
+
     if ( cpu_schedule_up(0) )
         BUG();
     register_cpu_notifier(&cpu_schedule_nfb);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index f16d81ab4a..86525da77b 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -23,7 +23,7 @@ extern cpumask_t cpupool_free_cpus;
 extern int sched_ratelimit_us;
 
 /* Scheduling resource mask. */
-extern const cpumask_t *sched_res_mask;
+extern cpumask_var_t sched_res_mask;
 
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
@@ -45,7 +45,7 @@ struct sched_resource {
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
     unsigned            processor;
-    const cpumask_t    *cpus;           /* cpus covered by this struct     */
+    cpumask_var_t       cpus;           /* cpus covered by this struct     */
 };
 
 #define curr_on_cpu(c)    (per_cpu(sched_res, c)->curr)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 5629602de5..0c37c2b55e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -681,6 +681,7 @@ void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
 void scheduler_percpu_init(unsigned int cpu);
+void scheduler_smp_init(void);
 int  sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int  sched_init_domain(struct domain *d, int poolid);
-- 
2.16.4


