* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
@ 2019-05-10 11:22 ` Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-10 11:22 UTC (permalink / raw)
  To: Dario Faggioli, Jan Beulich
  Cc: Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Tim Deegan, Julien Grall,
	xen-devel, Ian Jackson, Roger Pau Monne

On 10/05/2019 12:29, Dario Faggioli wrote:
> On Fri, 2019-05-10 at 11:00 +0200, Juergen Gross wrote:
>> On 10/05/2019 10:53, Jan Beulich wrote:
>>>>>> On 08.05.19 at 16:36, <jgross@suse.com> wrote:
>>>>
>>>> With sched-gran=core or sched-gran=socket offlining a single cpu
>>>> results in moving the complete core or socket to cpupool_free_cpus
>>>> and then offlining from there. Only complete cores/sockets can be
>>>> moved to any cpupool. When onlining a cpu it is added to
>>>> cpupool_free_cpus and if the core/socket is completely online it
>>>> will automatically be added to Pool-0 (as today any single onlined
>>>> cpu).
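
For illustration, the onlining rule quoted above boils down to roughly
the following; sched_gran_siblings() and cpupool_assign_unit() are
made-up helper names standing in for "the cpumask of the cpu's
core/socket" and "move those cpus into a pool", not the series' actual
code:

static void sched_gran_cpu_online(unsigned int cpu)
{
    /* A newly onlined cpu first shows up among the free cpus. */
    cpumask_set_cpu(cpu, &cpupool_free_cpus);

    /* Only a completely online core/socket is moved into Pool-0. */
    if ( cpumask_subset(sched_gran_siblings(cpu), &cpu_online_map) )
        cpupool_assign_unit(cpupool0, sched_gran_siblings(cpu));
}
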
>>>
>>> Well, this is in line with what was discussed on the call yesterday,
>>> so I think it's an acceptable initial state to end up in. Albeit,
>>> just for completeness, I'm not convinced there's no use for
>>> "smt-{dis,en}able" anymore with core-aware scheduling implemented
>>> just in Xen - it may still be considered useful as long as we don't
>>> expose proper topology to guests, for them to be able to do
>>> something similar.
>>
>> As the extra complexity for supporting that is significant I'd like
>> to at least postpone it. And with the (later) introduction of
>> per-cpupool smt on/off I guess this would be even less important.
>>
> I agree.
> 
> Isn't it the case that (but note that I'm just thinking out loud here),
> if we make smt= and sched-gran= per-cpupool, the user gains the chance
> to use both, if he/she wants (e.g., for testing)?

Yes.

> If yes, is such a thing valuable enough that it'd make sense to work
> on that, as a first thing, I mean?

My planned roadmap is:

1. this series
2. scheduler clean-up
3. per-cpupool smt and granularity

> We'd still forbid moving things between pools with different
> configurations, at least at the beginning, of course.

Right, allowing that would be 4.


Juergen

* Re: [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum
@ 2019-05-06 13:29 Juergen Gross
  0 siblings, 0 replies; 16+ messages in thread
From: Juergen Gross @ 2019-05-06 13:29 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tim Deegan, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
	George Dunlap, Andrew Cooper, Ian Jackson, Dario Faggioli,
	Julien Grall, xen-devel, Roger Pau Monne

On 06/05/2019 15:14, Jan Beulich wrote:
>>>> On 06.05.19 at 14:23, <jgross@suse.com> wrote:
>> On 06/05/2019 13:58, Jan Beulich wrote:
>>>>>> On 06.05.19 at 12:20, <jgross@suse.com> wrote:
>>>> On 06/05/2019 12:01, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 11:23, <jgross@suse.com> wrote:
>>>>>> On 06/05/2019 10:57, Jan Beulich wrote:
>>>>>>> . Yet then I'm a little puzzled by its use here in the first place.
>>>>>>> Generally I think for_each_cpu() uses in __init functions are
>>>>>>> problematic, as they then require further code elsewhere to
>>>>>>> deal with hot-onlining. A pre-SMP-initcall plus use of CPU
>>>>>>> notifiers is typically more appropriate.
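
For reference, the pattern referred to here (pre-SMP initcall plus CPU
notifier) looks roughly like this; the callback body and all names are
only a sketch, not code from the series:

#include <xen/cpu.h>
#include <xen/init.h>
#include <xen/notifier.h>

static int cpu_sched_gran_callback(struct notifier_block *nfb,
                                   unsigned long action, void *hcpu)
{
    /* hcpu carries the number of the cpu being on-/offlined. */
    switch ( action )
    {
    case CPU_ONLINE:
        /* e.g. add the cpu to its core/socket scheduling unit */
        break;
    case CPU_DEAD:
        /* e.g. drop the cpu from its scheduling unit again */
        break;
    }

    return NOTIFY_DONE;
}

static struct notifier_block cpu_sched_gran_nfb = {
    .notifier_call = cpu_sched_gran_callback
};

static int __init cpu_sched_gran_presmp_init(void)
{
    register_cpu_notifier(&cpu_sched_gran_nfb);
    return 0;
}
presmp_initcall(cpu_sched_gran_presmp_init);
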
>>>>>>
>>>>>> And that was mentioned in the cover letter: cpu hotplug is not yet
>>>>>> handled (hence the RFC status of the series).
>>>>>>
>>>>>> When cpu hotplug is being added it might be appropriate to switch the
>>>>>> scheme as you suggested. Right now the current solution is much more
>>>>>> simple.
>>>>>
>>>>> I see (I did notice the cover letter remark, but managed to not
>>>>> honor it when writing the reply), but I'm unconvinced that incurring
>>>>> more code churn by not dealing with things the "dynamic" way
>>>>> right away is indeed the "more simple" (overall) solution.
>>>>
>>>> Especially with hotplug things are becoming more complicated: I'd like
>>>> to have the final version fall back to smaller granularities in case
>>>> e.g. the user has selected socket scheduling and two sockets have
>>>> different numbers of cores. With hotplug such a situation might be
>>>> discovered only with some domUs already running, so how should we
>>>> react in that case? Doing panic() is not an option, so either we reject
>>>> onlining the additional socket, or we adapt by dynamically modifying the
>>>> scheduling granularity. Without that being discussed I don't think it
>>>> makes sense to put a lot of effort into a solution which is going to be
>>>> rejected in the end.
>>>
>>> Hmm, where's the symmetry requirement coming from? Socket
>>> scheduling should mean as many vCPU-s on one socket as there
>>> are cores * threads; similarly core scheduling (number of threads).
>>> Statically partitioning domains would seem an intermediate step
>>> at best only anyway, as that requires (on average) leaving more
>>> resources (cores/threads) idle than with a dynamic partitioning
>>> model.
>>
>> And that is exactly the purpose of core/socket scheduling. How else
>> would it be possible (in future) to pass through the topology below
>> the scheduling granularity to the guest?
> 
> True. Nevertheless an (at least) unfortunate limitation.
> 
>> And how should it be of any
>> use for fighting security issues due to side channel attacks?
> 
> From Xen's pov all is still fine afaict. It's the lack of (correct)
> topology exposure (as per above) which would make guest
> side mitigation impossible.
> 
>>> As to your specific question how to react: Since bringing online
>>> a full new socket implies bringing online all its cores / threads one
>>> by one anyway, a "too small" socket in your scheme would
>>> simply result in the socket remaining unused until "enough"
>>> cores/threads have appeared. Similarly the socket would go
>>> out of use as soon as one of its cores/threads gets offlined.
>>
>> Yes, this is a possible way to do it. It should be spelled out,
>> though.
>>
>>> Obviously this ends up problematic for the last usable socket.
>>
>> Yes, like today for the last cpu/thread.
> 
> Well, only kind of. It's quite expected that the last thread
> can't be offlined. I'd call it rather unexpected that a random
> thread on the last socket can't be offlined just because each
> other socket also has a single offline thread: There might
> still be hundreds of online threads in this case, after all.

You'd need to offline the related thread in all active guests. Otherwise
(from the guest's point of view) a cpu suddenly disappears.

> 
>>> But with the static partitioning you describe I also can't really
>>> see how "xen-hptool smt-disable" is going to work.
>>
>> It won't work. It just makes no sense to use it with core scheduling
>> active.
> 
> Why not? Disabling HT may be for purposes other than mitigating
> vulnerabilities like L1TF. And the system is in a symmetric state
> at the beginning and end of the entire operation; it's merely
> intermediate state which doesn't fit the expectations you set forth.

It is like bare metal: You can't physically unplug a single thread. This
is possible only for complete sockets.

It would theoretically be possible to test whether all guests have the
related cpus offlined in order to offline them in Xen. IMHO this would
be overkill: as an admin you have to decide whether you want to use
core scheduling or you want the ability to switch off SMT on the fly.

You can still boot e.g. with sched-gran=socket and smt=off.

Another possibility would be to make sched-gran and SMT per cpupool.
In that case I'd like those attributes to be static at creation time of
the cpupool, though.
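
A purely hypothetical sketch of what that could mean -- the names are
invented and only illustrate "fixed at pool creation, immutable
afterwards":

struct cpupool_config {
    unsigned int gran;          /* scheduling granularity: cpu/core/socket */
    bool         smt_allowed;   /* whether sibling threads may be used */
};

A pool would get such a config when it is created; changing it later,
or moving cpus between pools with differing configs, would simply be
refused.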


Juergen

* [PATCH RFC V2 00/45] xen: add core scheduling support
@ 2019-05-06  6:55 Juergen Gross
  2019-05-06  6:56 ` [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum Juergen Gross
  0 siblings, 1 reply; 16+ messages in thread
From: Juergen Gross @ 2019-05-06  6:55 UTC (permalink / raw)
  To: xen-devel
  Cc: Juergen Gross, Tim Deegan, Stefano Stabellini, Wei Liu,
	Konrad Rzeszutek Wilk, George Dunlap, Andrew Cooper, Ian Jackson,
	Robert VanVossen, Dario Faggioli, Julien Grall, Josh Whitehead,
	Meng Xu, Jan Beulich, Roger Pau Monné

Add support for core- and socket-scheduling in the Xen hypervisor.

Via boot parameter sched-gran=core (or sched-gran=socket)
it is possible to change the scheduling granularity from cpu (the
default) to either whole cores or even sockets.

All logical cpus (threads) of the core or socket are always scheduled
together. This means that on a core only vcpus of the same domain will
be active at any one time, and those vcpus will always be scheduled at
the same time.

This is achieved by switching the scheduler to no longer see vcpus as
the primary object to schedule, but "schedule items". Each schedule
item consists of as many vcpus as each core has threads on the current
system. The vcpu->item relation is fixed.
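
As a rough sketch (not the actual code from the series; only sched-gran=,
struct sched_item and the vcpu->item relation are taken from the
description above, the field names are illustrative):

enum sched_gran {
    SCHED_GRAN_CPU,     /* default: one vcpu per schedule item */
    SCHED_GRAN_CORE,    /* sched-gran=core: one item per core */
    SCHED_GRAN_SOCKET,  /* sched-gran=socket: one item per socket */
};

/* A schedule item bundles the vcpus which are always scheduled together. */
struct sched_item {
    struct domain     *domain;      /* all vcpus belong to the same domain */
    struct vcpu       *vcpu_list;   /* one vcpu per thread of the core/socket */
    unsigned int       item_id;
    struct sched_item *next_in_list;
    void              *priv;        /* per-scheduler private data */
};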

I have done some very basic performance testing: on a 4 cpu system
(2 cores with 2 threads each) I did a "make -j 4" for building the Xen
hypervisor. This test has been run in dom0, once with no other guest
active and once with another guest with 4 vcpus running the same test.
The results are (always elapsed time, system time, user time):

sched-gran=cpu,    no other guest: 116.10 177.65 207.84
sched-gran=core,   no other guest: 114.04 175.47 207.45
sched-gran=cpu,    other guest:    202.30 334.21 384.63
sched-gran=core,   other guest:    207.24 293.04 371.37

All tests have been performed with credit2; the other schedulers are
untested so far.

Cpupools are not yet working, as moving cpus between cpupools needs
more work (this is the reason for the series still being RFC). Same
applies to cpu add/remove.

Changes in RFC V2:
- ARM is building now
- HVM domains are working now
- idling will always be done with idle_vcpu active
- other small changes, see individual patches

Juergen Gross (45):
  xen/sched: add inline wrappers for calling per-scheduler functions
  xen/sched: use new sched_item instead of vcpu in scheduler interfaces
  xen/sched: alloc struct sched_item for each vcpu
  xen/sched: move per-vcpu scheduler private data pointer to sched_item
  xen/sched: build a linked list of struct sched_item
  xen/sched: introduce struct sched_resource
  xen/sched: let pick_cpu return a scheduler resource
  xen/sched: switch schedule_data.curr to point at sched_item
  xen/sched: move per cpu scheduler private data into struct
    sched_resource
  xen/sched: switch vcpu_schedule_lock to item_schedule_lock
  xen/sched: move some per-vcpu items to struct sched_item
  xen/sched: add scheduler helpers hiding vcpu
  xen/sched: add domain pointer to struct sched_item
  xen/sched: add id to struct sched_item
  xen/sched: rename scheduler related perf counters
  xen/sched: switch struct task_slice from vcpu to sched_item
  xen/sched: add is_running indicator to struct sched_item
  xen/sched: make null scheduler vcpu agnostic.
  xen/sched: make rt scheduler vcpu agnostic.
  xen/sched: make credit scheduler vcpu agnostic.
  xen/sched: make credit2 scheduler vcpu agnostic.
  xen/sched: make arinc653 scheduler vcpu agnostic.
  xen: add sched_item_pause_nosync() and sched_item_unpause()
  xen: let vcpu_create() select processor
  xen/sched: use sched_resource cpu instead smp_processor_id in
    schedulers
  xen/sched: switch schedule() from vcpus to sched_items
  xen/sched: switch sched_move_irqs() to take sched_item as parameter
  xen: switch from for_each_vcpu() to for_each_sched_item()
  xen/sched: add runstate counters to struct sched_item
  xen/sched: rework and rename vcpu_force_reschedule()
  xen/sched: Change vcpu_migrate_*() to operate on schedule item
  xen/sched: move struct task_slice into struct sched_item
  xen/sched: add code to sync scheduling of all vcpus of a sched item
  xen/sched: introduce item_runnable_state()
  xen/sched: add support for multiple vcpus per sched item where missing
  x86: make loading of GDT at context switch more modular
  x86: optimize loading of GDT at context switch
  xen/sched: modify cpupool_domain_cpumask() to be an item mask
  xen/sched: support allocating multiple vcpus into one sched item
  xen/sched: add a scheduler_percpu_init() function
  xen/sched: add a percpu resource index
  xen/sched: add fall back to idle vcpu when scheduling item
  xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware
  xen/sched: carve out freeing sched_item memory into dedicated function
  xen/sched: add scheduling granularity enum

 xen/arch/arm/domain.c            |    2 +-
 xen/arch/arm/domain_build.c      |   13 +-
 xen/arch/arm/smpboot.c           |    2 +
 xen/arch/x86/dom0_build.c        |   10 +-
 xen/arch/x86/domain.c            |   95 ++-
 xen/arch/x86/hvm/dom0_build.c    |    9 +-
 xen/arch/x86/pv/dom0_build.c     |   10 +-
 xen/arch/x86/pv/emul-priv-op.c   |    1 +
 xen/arch/x86/pv/shim.c           |    4 +-
 xen/arch/x86/pv/traps.c          |    5 +-
 xen/arch/x86/setup.c             |    2 +
 xen/arch/x86/smpboot.c           |    2 +
 xen/arch/x86/traps.c             |    9 +-
 xen/common/cpupool.c             |   30 +-
 xen/common/domain.c              |   34 +-
 xen/common/domctl.c              |   23 +-
 xen/common/keyhandler.c          |    4 +-
 xen/common/sched_arinc653.c      |  258 ++++----
 xen/common/sched_credit.c        |  743 ++++++++++-----------
 xen/common/sched_credit2.c       | 1119 +++++++++++++++----------------
 xen/common/sched_null.c          |  424 ++++++------
 xen/common/sched_rt.c            |  544 +++++++--------
 xen/common/schedule.c            | 1348 +++++++++++++++++++++++++++++---------
 xen/common/softirq.c             |    6 +-
 xen/common/wait.c                |    4 +-
 xen/include/asm-arm/current.h    |    1 +
 xen/include/asm-x86/cpuidle.h    |    2 +-
 xen/include/asm-x86/current.h    |    7 +-
 xen/include/asm-x86/dom0_build.h |    3 +-
 xen/include/asm-x86/smp.h        |    3 +
 xen/include/xen/domain.h         |    3 +-
 xen/include/xen/perfc_defn.h     |   32 +-
 xen/include/xen/sched-if.h       |  418 ++++++++++--
 xen/include/xen/sched.h          |   95 ++-
 xen/include/xen/softirq.h        |    1 +
 35 files changed, 3198 insertions(+), 2068 deletions(-)

-- 
2.16.4


