* [RFC] Display the cpu of sched domain in procfs
@ 2020-02-19  7:15 Chen Yu
  2020-02-19  8:13 ` Valentin Schneider
  0 siblings, 1 reply; 6+ messages in thread
From: Chen Yu @ 2020-02-19  7:15 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Mel Gorman, Tony Luck, Aubrey Li, Tim Chen, Dave Hansen,
	Borislav Petkov

Problem:
The sched domain topology is not always consistent with the CPU topology
exposed at /sys/devices/system/cpu/cpuX/topology, which makes it hard for
monitoring tools to distinguish the CPUs in different sched domains.

For example, on x86, if there are NUMA nodes within a package, say with
SNC (Sub-NUMA Clustering) enabled, then no die sched domain is created,
only NUMA sched domains. As a result, you cannot tell what the sched
domain hierarchy is by looking only at
/sys/devices/system/cpu/cpuX/topology.

Although appending sched_debug to the kernel command line would show the
sched domain CPU topology, it is only printed once during boot, which
makes it hard to track at run time.

Proposal:
Add a *span* file under each procfs sched_domain directory to represent
the set of CPUs in that sched domain.

Question:
*Before sending the patch out, may I have your opinions on whether this
is doable?*


Here is the sample output on an SNC system after the patch has been applied:
grep . /proc/sys/kernel/sched_domain/cpu0/domain*/span
/proc/sys/kernel/sched_domain/cpu0/domain0/span:0,96          (SMT domain)
/proc/sys/kernel/sched_domain/cpu0/domain1/span:0-3,7-9,13-15,19-20,96-99,103-105,109-111,115-116   (MC domain)
/proc/sys/kernel/sched_domain/cpu0/domain2/span:0-23,96-119   (NUMA domain)
/proc/sys/kernel/sched_domain/cpu0/domain3/span:0-191         (NUMA domain)


FYI, the corresponding CPU topology is:
grep . /sys/devices/system/cpu/cpu0/topology/*cpus_list
/sys/devices/system/cpu/cpu0/topology/core_cpus_list:0,96
/sys/devices/system/cpu/cpu0/topology/die_cpus_list:0-23,96-119
/sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-47,96-143
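
For illustration only (not part of the proposed patch): a minimal sketch in
Python of how a monitoring tool could expand such kernel cpulist strings,
e.g. "0-23,96-119", into explicit CPU sets; the file read at the end is just
an example.

def parse_cpulist(s):
    """Expand a kernel cpulist string like "0-23,96-119" into a set of ints."""
    cpus = set()
    for part in s.strip().split(','):
        if not part:
            continue
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

with open('/sys/devices/system/cpu/cpu0/topology/package_cpus_list') as f:
    print(sorted(parse_cpulist(f.read())))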


thanks,
Chenyu


* Re: [RFC] Display the cpu of sched domain in procfs
  2020-02-19  7:15 [RFC] Display the cpu of sched domain in procfs Chen Yu
@ 2020-02-19  8:13 ` Valentin Schneider
  2020-02-19  8:53   ` Dietmar Eggemann
  2020-02-19 10:00   ` Chen Yu
  0 siblings, 2 replies; 6+ messages in thread
From: Valentin Schneider @ 2020-02-19  8:13 UTC (permalink / raw)
  To: Chen Yu, Linux Kernel Mailing List
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Mel Gorman, Tony Luck, Aubrey Li, Tim Chen, Dave Hansen,
	Borislav Petkov

Hi,

On 19/02/2020 07:15, Chen Yu wrote:
> Problem:
> The sched domain topology is not always consistent with the CPU topology
> exposed at /sys/devices/system/cpu/cpuX/topology, which makes it hard for
> monitoring tools to distinguish the CPUs in different sched domains.
> 
> For example, on x86, if there are NUMA nodes within a package, say with
> SNC (Sub-NUMA Clustering) enabled, then no die sched domain is created,
> only NUMA sched domains. As a result, you cannot tell what the sched
> domain hierarchy is by looking only at
> /sys/devices/system/cpu/cpuX/topology.
> 
> Although appending sched_debug to the kernel command line would show the
> sched domain CPU topology, it is only printed once during boot, which
> makes it hard to track at run time.
> 

It should be (and in my experience, is) printed any time there is a sched
domain update - hotplug, cpusets; IOW, not just at boot.

e.g. if I hotplug out a CPU:

root@valsch-juno:~# echo 0 > /sys/devices/system/cpu/cpu3/online
[40150.882586] CPU3: shutdown
[40150.885383] psci: CPU3 killed (polled 0 ms)
[40150.891362] CPU0 attaching NULL sched-domain.
[40150.895954] CPU1 attaching NULL sched-domain.
[40150.900433] CPU2 attaching NULL sched-domain.
[40150.906583] CPU3 attaching NULL sched-domain.
[40150.910998] CPU4 attaching NULL sched-domain.
[40150.915444] CPU5 attaching NULL sched-domain.
[40150.920108] CPU0 attaching sched-domain(s):
[40150.924396]  domain-0: span=0,4-5 level=MC
[40150.928592]   groups: 0:{ span=0 cap=444 }, 4:{ span=4 cap=445 }, 5:{ span=5 cap=446 }
[40150.936684]   domain-1: span=0-2,4-5 level=DIE
[40150.941207]    groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2041 }
[40150.948107] CPU1 attaching sched-domain(s):
[40150.952342]  domain-0: span=1-2 level=MC
[40150.956311]   groups: 1:{ span=1 cap=1020 }, 2:{ span=2 cap=1021 }
[40150.962592]   domain-1: span=0-2,4-5 level=DIE
[40150.967082]    groups: 1:{ span=1-2 cap=2041 }, 0:{ span=0,4-5 cap=1335 }
[40150.973984] CPU2 attaching sched-domain(s):
[40150.978208]  domain-0: span=1-2 level=MC
[40150.982176]   groups: 2:{ span=2 cap=1021 }, 1:{ span=1 cap=1021 }
[40150.988431]   domain-1: span=0-2,4-5 level=DIE
[40150.992922]    groups: 1:{ span=1-2 cap=2042 }, 0:{ span=0,4-5 cap=1335 }
[40150.999819] CPU4 attaching sched-domain(s):
[40151.004045]  domain-0: span=0,4-5 level=MC
[40151.008186]   groups: 4:{ span=4 cap=445 }, 5:{ span=5 cap=446 }, 0:{ span=0 cap=444 }
[40151.016220]   domain-1: span=0-2,4-5 level=DIE
[40151.020722]    groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2044 }
[40151.027619] CPU5 attaching sched-domain(s):
[40151.031843]  domain-0: span=0,4-5 level=MC
[40151.035985]   groups: 5:{ span=5 cap=446 }, 0:{ span=0 cap=444 }, 4:{ span=4 cap=445 }
[40151.044021]   domain-1: span=0-2,4-5 level=DIE
[40151.048512]    groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2043 }
[40151.055440] root domain span: 0-2,4-5 (max cpu_capacity = 1024)


Same for setting up cpusets:

root@valsch-juno:~# cgset -r cpuset.mems=0 asym
root@valsch-juno:~# cgset -r cpuset.cpu_exclusive=1 asym
root@valsch-juno:~# 
root@valsch-juno:~# cgcreate -g cpuset:smp
root@valsch-juno:~# cgset -r cpuset.cpus=4-5 smp
root@valsch-juno:~# cgset -r cpuset.mems=0 smp
root@valsch-juno:~# cgset -r cpuset.cpu_exclusive=1 smp
root@valsch-juno:~# 
root@valsch-juno:~# cgset -r cpuset.sched_load_balance=0 .
[40224.135466] CPU0 attaching NULL sched-domain.
[40224.140038] CPU1 attaching NULL sched-domain.
[40224.144531] CPU2 attaching NULL sched-domain.
[40224.148951] CPU3 attaching NULL sched-domain.
[40224.153366] CPU4 attaching NULL sched-domain.
[40224.157811] CPU5 attaching NULL sched-domain.
[40224.162394] CPU0 attaching sched-domain(s):
[40224.166623]  domain-0: span=0,3 level=MC
[40224.170624]   groups: 0:{ span=0 cap=445 }, 3:{ span=3 cap=446 }
[40224.176709]   domain-1: span=0-3 level=DIE
[40224.180884]    groups: 0:{ span=0,3 cap=891 }, 1:{ span=1-2 cap=2044 }
[40224.187497] CPU1 attaching sched-domain(s):
[40224.191753]  domain-0: span=1-2 level=MC
[40224.195724]   groups: 1:{ span=1 cap=1021 }, 2:{ span=2 cap=1023 }
[40224.202010]   domain-1: span=0-3 level=DIE
[40224.206154]    groups: 1:{ span=1-2 cap=2044 }, 0:{ span=0,3 cap=890 }
[40224.212792] CPU2 attaching sched-domain(s):
[40224.217020]  domain-0: span=1-2 level=MC
[40224.220989]   groups: 2:{ span=2 cap=1023 }, 1:{ span=1 cap=1020 }
[40224.227244]   domain-1: span=0-3 level=DIE
[40224.231386]    groups: 1:{ span=1-2 cap=2042 }, 0:{ span=0,3 cap=889 }
[40224.238025] CPU3 attaching sched-domain(s):
[40224.242252]  domain-0: span=0,3 level=MC
[40224.246221]   groups: 3:{ span=3 cap=446 }, 0:{ span=0 cap=443 }
[40224.252329]   domain-1: span=0-3 level=DIE
[40224.256474]    groups: 0:{ span=0,3 cap=889 }, 1:{ span=1-2 cap=2042 }
[40224.263142] root domain span: 0-3 (max cpu_capacity = 1024)
[40224.268945] CPU4 attaching sched-domain(s):
[40224.273200]  domain-0: span=4-5 level=MC
[40224.277173]   groups: 4:{ span=4 cap=446 }, 5:{ span=5 cap=444 }
[40224.283291] CPU5 attaching sched-domain(s):
[40224.287517]  domain-0: span=4-5 level=MC
[40224.291487]   groups: 5:{ span=5 cap=444 }, 4:{ span=4 cap=446 }
[40224.297584] root domain span: 4-5 (max cpu_capacity = 446)
[40224.303185] rd 4-5: CPUs do not have asymmetric capacities

So in short, if you have sched_debug enabled, whatever is reported as the
last sched domain hierarchy in dmesg will be the one in use.

Now, if you have a userspace that tries to be clever and wants to use this
information then yes, this isn't ideal, but then that's a different matter.
I think exposing the NUMA boundaries is fair game - and they already are
via /sys/devices/system/node/node*/. I'm not sure we'd want to expose more
(e.g. MC span), ideally that is something you shouldn't really have to care
about - that's the scheduler's job.
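
As an illustration (assuming only the standard sysfs node layout, nothing
from this thread), those per-node boundaries can already be read with a few
lines of Python:

import glob, os, re

nodes = sorted(glob.glob('/sys/devices/system/node/node[0-9]*'),
               key=lambda p: int(re.search(r'\d+$', p).group()))
for node in nodes:
    with open(os.path.join(node, 'cpulist')) as f:
        print(os.path.basename(node), f.read().strip())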

> Proposal:
> Add a *span* file under each procfs sched_domain directory to represent
> the set of CPUs in that sched domain.
> 
> Question:
> *Before sending the patch out, may I have your opinions on whether this
> is doable?*
> 
> 
> Here is the sample output on an SNC system after the patch has been applied:
> grep . /proc/sys/kernel/sched_domain/cpu0/domain*/span
> /proc/sys/kernel/sched_domain/cpu0/domain0/span:0,96          (SMT domain)
> /proc/sys/kernel/sched_domain/cpu0/domain1/span:0-3,7-9,13-15,19-20,96-99,103-105,109-111,115-116   (MC domain)
> /proc/sys/kernel/sched_domain/cpu0/domain2/span:0-23,96-119   (NUMA domain)
> /proc/sys/kernel/sched_domain/cpu0/domain3/span:0-191         (NUMA domain)
> 
> 
> FYI, the corresponding CPU topology is:
> grep . /sys/devices/system/cpu/cpu0/topology/*cpus_list
> /sys/devices/system/cpu/cpu0/topology/core_cpus_list:0,96
> /sys/devices/system/cpu/cpu0/topology/die_cpus_list:0-23,96-119
> /sys/devices/system/cpu/cpu0/topology/package_cpus_list:0-47,96-143
> 
> 
> thanks,
> Chenyu
> 


* Re: [RFC] Display the cpu of sched domain in procfs
  2020-02-19  8:13 ` Valentin Schneider
@ 2020-02-19  8:53   ` Dietmar Eggemann
  2020-02-19 10:01     ` Chen Yu
  2020-02-19 10:00   ` Chen Yu
  1 sibling, 1 reply; 6+ messages in thread
From: Dietmar Eggemann @ 2020-02-19  8:53 UTC (permalink / raw)
  To: Valentin Schneider, Chen Yu, Linux Kernel Mailing List
  Cc: Peter Zijlstra, Ingo Molnar, Juri Lelli, Vincent Guittot,
	Mel Gorman, Tony Luck, Aubrey Li, Tim Chen, Dave Hansen,
	Borislav Petkov

On 19/02/2020 09:13, Valentin Schneider wrote:
> Hi,
> 
> On 19/02/2020 07:15, Chen Yu wrote:
>> Problem:
>> The sched domain topology is not always consistent with the CPU topology
>> exposed at /sys/devices/system/cpu/cpuX/topology, which makes it hard for
>> monitoring tools to distinguish the CPUs in different sched domains.
>>
>> For example, on x86, if there are NUMA nodes within a package, say with
>> SNC (Sub-NUMA Clustering) enabled, then no die sched domain is created,
>> only NUMA sched domains. As a result, you cannot tell what the sched
>> domain hierarchy is by looking only at
>> /sys/devices/system/cpu/cpuX/topology.
>>
>> Although appending sched_debug to the kernel command line would show the
>> sched domain CPU topology, it is only printed once during boot, which
>> makes it hard to track at run time.

What about /proc/schedstat?

E.g. on Intel Xeon CPU E5-2690 v2

$ cat /proc/schedstat | head
version 15
timestamp 4486170100
cpu0 0 0 0 0 0 0 59501267037720 16902762382193 1319621004
domain0 00,00100001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00,3ff003ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

        ^^^^^^^^^^^

cpu1 0 0 0 0 0 0 56045879920164 16758983055410 1318489275
domain0 00,00200002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 00,3ff003ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
...
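
For reference, the mask next to each domainN entry is a comma-separated hex
cpumask (most significant 32-bit word first), where bit i corresponds to
CPU i. A rough, illustrative sketch of decoding it in user space:

def mask_to_cpus(mask):
    # e.g. "00,00100001" -> [0, 20]
    val = int(mask.replace(',', ''), 16)
    return [cpu for cpu in range(val.bit_length()) if (val >> cpu) & 1]

cpu = None
with open('/proc/schedstat') as f:
    for line in f:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith('cpu'):
            cpu = fields[0]
        elif fields[0].startswith('domain'):
            print(cpu, fields[0], mask_to_cpus(fields[1]))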



* Re: [RFC] Display the cpu of sched domain in procfs
  2020-02-19  8:13 ` Valentin Schneider
  2020-02-19  8:53   ` Dietmar Eggemann
@ 2020-02-19 10:00   ` Chen Yu
  2020-02-19 10:46     ` Valentin Schneider
  1 sibling, 1 reply; 6+ messages in thread
From: Chen Yu @ 2020-02-19 10:00 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Mel Gorman, Tony Luck, Aubrey Li,
	Tim Chen, Dave Hansen, Borislav Petkov

Hi Valentin,
Thanks very much for looking at my question,
On Wed, Feb 19, 2020 at 4:13 PM Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> Hi,
>
> On 19/02/2020 07:15, Chen Yu wrote:
> > Problem:
> > The sched domain topology is not always consistent with the CPU topology
> > exposed at /sys/devices/system/cpu/cpuX/topology, which makes it hard for
> > monitoring tools to distinguish the CPUs in different sched domains.
> >
> > For example, on x86, if there are NUMA nodes within a package, say with
> > SNC (Sub-NUMA Clustering) enabled, then no die sched domain is created,
> > only NUMA sched domains. As a result, you cannot tell what the sched
> > domain hierarchy is by looking only at
> > /sys/devices/system/cpu/cpuX/topology.
> >
> > Although appending sched_debug to the kernel command line would show the
> > sched domain CPU topology, it is only printed once during boot, which
> > makes it hard to track at run time.
> >
>
> It should be (and in my experience, is) printed any time there is a sched
> domain update - hotplug, cpusets; IOW, not just at boot.
>
Right, it will be printed whenever the domains change.
> e.g. if I hotplug out a CPU:
>
> root@valsch-juno:~# echo 0 > /sys/devices/system/cpu/cpu3/online
> [40150.882586] CPU3: shutdown
> [40150.885383] psci: CPU3 killed (polled 0 ms)
> [40150.891362] CPU0 attaching NULL sched-domain.
> [40150.895954] CPU1 attaching NULL sched-domain.
> [40150.900433] CPU2 attaching NULL sched-domain.
> [40150.906583] CPU3 attaching NULL sched-domain.
> [40150.910998] CPU4 attaching NULL sched-domain.
> [40150.915444] CPU5 attaching NULL sched-domain.
> [40150.920108] CPU0 attaching sched-domain(s):
> [40150.924396]  domain-0: span=0,4-5 level=MC
> [40150.928592]   groups: 0:{ span=0 cap=444 }, 4:{ span=4 cap=445 }, 5:{ span=5 cap=446 }
> [40150.936684]   domain-1: span=0-2,4-5 level=DIE
> [40150.941207]    groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2041 }
> [40150.948107] CPU1 attaching sched-domain(s):
> [40150.952342]  domain-0: span=1-2 level=MC
> [40150.956311]   groups: 1:{ span=1 cap=1020 }, 2:{ span=2 cap=1021 }
> [40150.962592]   domain-1: span=0-2,4-5 level=DIE
> [40150.967082]    groups: 1:{ span=1-2 cap=2041 }, 0:{ span=0,4-5 cap=1335 }
> [40150.973984] CPU2 attaching sched-domain(s):
> [40150.978208]  domain-0: span=1-2 level=MC
> [40150.982176]   groups: 2:{ span=2 cap=1021 }, 1:{ span=1 cap=1021 }
> [40150.988431]   domain-1: span=0-2,4-5 level=DIE
> [40150.992922]    groups: 1:{ span=1-2 cap=2042 }, 0:{ span=0,4-5 cap=1335 }
> [40150.999819] CPU4 attaching sched-domain(s):
> [40151.004045]  domain-0: span=0,4-5 level=MC
> [40151.008186]   groups: 4:{ span=4 cap=445 }, 5:{ span=5 cap=446 }, 0:{ span=0 cap=444 }
> [40151.016220]   domain-1: span=0-2,4-5 level=DIE
> [40151.020722]    groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2044 }
> [40151.027619] CPU5 attaching sched-domain(s):
> [40151.031843]  domain-0: span=0,4-5 level=MC
> [40151.035985]   groups: 5:{ span=5 cap=446 }, 0:{ span=0 cap=444 }, 4:{ span=4 cap=445 }
> [40151.044021]   domain-1: span=0-2,4-5 level=DIE
> [40151.048512]    groups: 0:{ span=0,4-5 cap=1335 }, 1:{ span=1-2 cap=2043 }
> [40151.055440] root domain span: 0-2,4-5 (max cpu_capacity = 1024)
>
>
> Same for setting up cpusets:
>
> root@valsch-juno:~# cgset -r cpuset.mems=0 asym
> root@valsch-juno:~# cgset -r cpuset.cpu_exclusive=1 asym
> root@valsch-juno:~#
> root@valsch-juno:~# cgcreate -g cpuset:smp
> root@valsch-juno:~# cgset -r cpuset.cpus=4-5 smp
> root@valsch-juno:~# cgset -r cpuset.mems=0 smp
> root@valsch-juno:~# cgset -r cpuset.cpu_exclusive=1 smp
> root@valsch-juno:~#
> root@valsch-juno:~# cgset -r cpuset.sched_load_balance=0 .
> [40224.135466] CPU0 attaching NULL sched-domain.
> [40224.140038] CPU1 attaching NULL sched-domain.
> [40224.144531] CPU2 attaching NULL sched-domain.
> [40224.148951] CPU3 attaching NULL sched-domain.
> [40224.153366] CPU4 attaching NULL sched-domain.
> [40224.157811] CPU5 attaching NULL sched-domain.
> [40224.162394] CPU0 attaching sched-domain(s):
> [40224.166623]  domain-0: span=0,3 level=MC
> [40224.170624]   groups: 0:{ span=0 cap=445 }, 3:{ span=3 cap=446 }
> [40224.176709]   domain-1: span=0-3 level=DIE
> [40224.180884]    groups: 0:{ span=0,3 cap=891 }, 1:{ span=1-2 cap=2044 }
> [40224.187497] CPU1 attaching sched-domain(s):
> [40224.191753]  domain-0: span=1-2 level=MC
> [40224.195724]   groups: 1:{ span=1 cap=1021 }, 2:{ span=2 cap=1023 }
> [40224.202010]   domain-1: span=0-3 level=DIE
> [40224.206154]    groups: 1:{ span=1-2 cap=2044 }, 0:{ span=0,3 cap=890 }
> [40224.212792] CPU2 attaching sched-domain(s):
> [40224.217020]  domain-0: span=1-2 level=MC
> [40224.220989]   groups: 2:{ span=2 cap=1023 }, 1:{ span=1 cap=1020 }
> [40224.227244]   domain-1: span=0-3 level=DIE
> [40224.231386]    groups: 1:{ span=1-2 cap=2042 }, 0:{ span=0,3 cap=889 }
> [40224.238025] CPU3 attaching sched-domain(s):
> [40224.242252]  domain-0: span=0,3 level=MC
> [40224.246221]   groups: 3:{ span=3 cap=446 }, 0:{ span=0 cap=443 }
> [40224.252329]   domain-1: span=0-3 level=DIE
> [40224.256474]    groups: 0:{ span=0,3 cap=889 }, 1:{ span=1-2 cap=2042 }
> [40224.263142] root domain span: 0-3 (max cpu_capacity = 1024)
> [40224.268945] CPU4 attaching sched-domain(s):
> [40224.273200]  domain-0: span=4-5 level=MC
> [40224.277173]   groups: 4:{ span=4 cap=446 }, 5:{ span=5 cap=444 }
> [40224.283291] CPU5 attaching sched-domain(s):
> [40224.287517]  domain-0: span=4-5 level=MC
> [40224.291487]   groups: 5:{ span=5 cap=444 }, 4:{ span=4 cap=446 }
> [40224.297584] root domain span: 4-5 (max cpu_capacity = 446)
> [40224.303185] rd 4-5: CPUs do not have asymmetric capacities
>
> So in short, if you have sched_debug enabled, whatever is reported as the
> last sched domain hierarchy in dmesg will be the one in use.
>
> Now, if you have a userspace that tries to be clever and wants to use this
> information then yes, this isn't ideal, but then that's a different matter.
The dmesg output might be lost if someone has used dmesg -c to clear the log,
and /var/log/dmesg is not always there. Also, in some environments it is not
common to trigger a sched domain update after boot.
But anyway, the information printed by sched_debug is very useful for knowing
the topology.
> I think exposing the NUMA boundaries is fair game - and they already are
> via /sys/devices/system/node/node*/.
It seems that the NUMA sysfs cannot fully reflect the SNC topology; it only
has the *leaf* NUMA node information. Say, node0 and node1 might form one
sched domain.
> I'm not sure we'd want to expose more
> (e.g. MC span), ideally that is something you shouldn't really have to care
> about - that's the scheduler's job.
I agree; this just aims to facilitate user space, or is just for debugging
purposes.
But as Dietmar replied, /proc/schedstat already exposes it.


thanks again,
Chenyu


* Re: [RFC] Display the cpu of sched domain in procfs
  2020-02-19  8:53   ` Dietmar Eggemann
@ 2020-02-19 10:01     ` Chen Yu
  0 siblings, 0 replies; 6+ messages in thread
From: Chen Yu @ 2020-02-19 10:01 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Valentin Schneider, Linux Kernel Mailing List, Peter Zijlstra,
	Ingo Molnar, Juri Lelli, Vincent Guittot, Mel Gorman, Tony Luck,
	Aubrey Li, Tim Chen, Dave Hansen, Borislav Petkov

Hi Dietmar,

On Wed, Feb 19, 2020 at 4:53 PM Dietmar Eggemann
<dietmar.eggemann@arm.com> wrote:
>
> On 19/02/2020 09:13, Valentin Schneider wrote:
> > Hi,
> >
> > On 19/02/2020 07:15, Chen Yu wrote:
> >> Problem:
> >> The sched domain topology is not always consistent with the CPU topology
> >> exposed at /sys/devices/system/cpu/cpuX/topology, which makes it hard for
> >> monitoring tools to distinguish the CPUs in different sched domains.
> >>
> >> For example, on x86, if there are NUMA nodes within a package, say with
> >> SNC (Sub-NUMA Clustering) enabled, then no die sched domain is created,
> >> only NUMA sched domains. As a result, you cannot tell what the sched
> >> domain hierarchy is by looking only at
> >> /sys/devices/system/cpu/cpuX/topology.
> >>
> >> Although appending sched_debug to the kernel command line would show the
> >> sched domain CPU topology, it is only printed once during boot, which
> >> makes it hard to track at run time.
>
> What about /proc/schedstat?
>
That's it! I did not notice it before; this should work, although user space
might need to parse the format.


-- 
Thanks,
Chenyu
> E.g. on Intel Xeon CPU E5-2690 v2
>
> $ cat /proc/schedstat | head
> version 15
> timestamp 4486170100
> cpu0 0 0 0 0 0 0 59501267037720 16902762382193 1319621004
> domain0 00,00100001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain1 00,3ff003ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
>         ^^^^^^^^^^^
>
> cpu1 0 0 0 0 0 0 56045879920164 16758983055410 1318489275
> domain0 00,00200002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain1 00,3ff003ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> ...
>


* Re: [RFC] Display the cpu of sched domain in procfs
  2020-02-19 10:00   ` Chen Yu
@ 2020-02-19 10:46     ` Valentin Schneider
  0 siblings, 0 replies; 6+ messages in thread
From: Valentin Schneider @ 2020-02-19 10:46 UTC (permalink / raw)
  To: Chen Yu
  Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar,
	Juri Lelli, Vincent Guittot, Mel Gorman, Tony Luck, Aubrey Li,
	Tim Chen, Dave Hansen, Borislav Petkov

On 19/02/2020 10:00, Chen Yu wrote:
>> Now, if you have a userspace that tries to be clever and wants to use this
>> information then yes, this isn't ideal, but then that's a different matter.
> The dmesg output might be lost if someone has used dmesg -c to clear the log,
> and /var/log/dmesg is not always there. Also, in some environments it is not
> common to trigger a sched domain update after boot.
> But anyway, the information printed by sched_debug is very useful for knowing
> the topology.
>> I think exposing the NUMA boundaries is fair game - and they already are
>> via /sys/devices/system/node/node*/.
> It seems that the NUMA sysfs cannot fully reflect the SNC topology; it only
> has the *leaf* NUMA node information. Say, node0 and node1 might form one
> sched domain.

Right, but if you have leaves + distance table, then userspace can try to
be clever about it without being exposed to scheduler innards.
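
For example (an illustrative sketch that assumes only the standard sysfs
node layout, not something posted in this thread), grouping nodes by
increasing distance from node0 is one way to do that:

import glob, re

node_ids = sorted(int(re.search(r'\d+$', p).group())
                  for p in glob.glob('/sys/devices/system/node/node[0-9]*'))
# Row of the node distance table for node0: distances to each node, in
# node-id order.
with open('/sys/devices/system/node/node0/distance') as f:
    row0 = [int(x) for x in f.read().split()]

for d in sorted(set(row0)):
    group = [n for n, dist in zip(node_ids, row0) if dist <= d]
    print('nodes within distance %d of node0: %s' % (d, group))

Mapping each such group to CPUs via node*/cpulist then gives spans comparable
to the NUMA sched domains, without reading any scheduler internals.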



Thread overview: 6+ messages
2020-02-19  7:15 [RFC] Display the cpu of sched domain in procfs Chen Yu
2020-02-19  8:13 ` Valentin Schneider
2020-02-19  8:53   ` Dietmar Eggemann
2020-02-19 10:01     ` Chen Yu
2020-02-19 10:00   ` Chen Yu
2020-02-19 10:46     ` Valentin Schneider
