linux-kernel.vger.kernel.org archive mirror
* sysfs topology for arm64 cluster_id
@ 2015-01-14  0:47 Jon Masters
  2015-01-14 11:24 ` Arnd Bergmann
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jon Masters @ 2015-01-14  0:47 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: linux-kernel

Hi Folks,

TLDR: I would like to consider the value of adding something like
"cluster_siblings" or similar in sysfs to describe ARM topology.

A quick question on intended data representation in /sysfs topology
before I ask the team on this end to go down the (wrong?) path. On ARM
systems today, we have a hierarchical CPU topology:

                 Socket ---- Coherent Interconnect --- Socket
                   |                                    |
         Cluster0 ... ClusterN                Cluster0 ... ClusterN
            |             |                      |             |
      Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
        |       |      |        |           |       |      |       |
     T0..TN  T0..TN  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN

Where we might (or might not) have threads in individual cores (a la SMT
- it's allowed in the architecture at any rate) and we group cores
together into units of clusters usually 2-4 cores in size (though this
varies between implementations, some of which have different but similar
concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
cores). There are multiple clusters per "socket", and there might be an
arbitrary number of sockets. We'll start to enable NUMA soon.

The existing ARM architectural code understands expressing topology in
terms of the above, but it doesn't quite map these concepts directly in
sysfs (it does not expose cluster_ids, for example). Currently, a cpu-map
in DeviceTree can expose hierarchies (including nested clusters) and this
is parsed at boot time to populate scheduler information, as well as the
topology files in sysfs (if that is provided - none of the reference
devicetrees upstream do this today, but some exist). But the cluster
information itself isn't quite exposed (whereas other whacky
architectural concepts such as s390 books are exposed already today).

Anyway. We have a small problem with tools such as those in util-linux
(lscpu) getting confused as a result of translating x86-isms to ARM. For
example, the lscpu utility calculates the number of sockets using the
following computation:

nsockets = desc->ncpus / nthreads / ncores

(number of sockets = total number of online processing elements /
threads within a single core / cores within a single socket)
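
To make the failure mode concrete, here is a rough C sketch of that
derivation (my own illustration, not the actual util-linux source; the
sysfs file names are the real ones, the parsing is simplified):

/*
 * Rough sketch of the lscpu-style socket calculation (illustration
 * only, NOT the util-linux code).  It counts bits in cpu0's sysfs
 * sibling masks and divides.
 */
#include <stdio.h>

/* Count set bits in a sysfs hex mask such as "00000000,000000ff". */
static int mask_weight(const char *path)
{
    char buf[256];
    FILE *f = fopen(path, "r");
    int bits = 0;

    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    for (char *p = buf; *p; p++) {
        if (*p == ',' || *p == '\n')
            continue;
        int v = (*p <= '9') ? *p - '0' : (*p | 0x20) - 'a' + 10;
        bits += __builtin_popcount(v);
    }
    fclose(f);
    return bits;
}

int main(void)
{
    /* 8 online CPUs, e.g. parsed from /sys/devices/system/cpu/online */
    int ncpus = 8;
    int nthreads = mask_weight(
        "/sys/devices/system/cpu/cpu0/topology/thread_siblings");
    int nsibs = mask_weight(
        "/sys/devices/system/cpu/cpu0/topology/core_siblings");

    if (nthreads <= 0 || nsibs <= 0)
        return 1;

    /*
     * If core_siblings only covers a 2-core cluster (because the kernel
     * exports the cluster as the "physical package"), ncores is 2 and
     * the final division reports 4 "sockets" on the single-socket
     * 8-core system shown below.
     */
    int ncores = nsibs / nthreads;
    printf("Socket(s): %d\n", ncpus / nthreads / ncores);
    return 0;
}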

If you're not careful, you can end up with something like:

# lscpu
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             4

Now we can argue that the system in question needs an updated cpu-map
(it'll actually be something ACPI-based, but I'm keeping this discussion
to DT to avoid taking that piece further here, and you can assume I'm
booting any test boxes in further work on this using DeviceTree prior to
switching the result over to ACPI), but either way, util-linux is
thinking in an x86-centric sense of what these files mean. And I think
the existing topology/cpu-map stuff in arm64 is doing the same.

Is it not a good idea to expose the cluster details directly in sysfs
and have these utilities understand the possible extra level in the
calculation? Or do we want to just fudge the numbers (as seems to be the
case in some systems I am seeing) to make the x86 model add up?

Let me know the preferred course...

Jon.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sysfs topology for arm64 cluster_id
  2015-01-14  0:47 sysfs topology for arm64 cluster_id Jon Masters
@ 2015-01-14 11:24 ` Arnd Bergmann
  2015-01-14 16:41   ` Don Dutile
  2015-01-14 16:07 ` Don Dutile
  2015-01-14 17:00 ` Mark Rutland
  2 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2015-01-14 11:24 UTC (permalink / raw)
  To: linux-arm-kernel; +Cc: Jon Masters, linux-kernel

On Tuesday 13 January 2015 19:47:00 Jon Masters wrote:
> Hi Folks,
> 
> TLDR: I would like to consider the value of adding something like
> "cluster_siblings" or similar in sysfs to describe ARM topology.
> 
> A quick question on intended data representation in /sysfs topology
> before I ask the team on this end to go down the (wrong?) path. On ARM
> systems today, we have a hierarchical CPU topology:
> 
>                  Socket ---- Coherent Interonnect ---- Socket
>                    |                                    |
>          Cluster0 ... ClusterN                Cluster0 ... ClusterN
>             |             |                      |             |
>       Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
>         |       |      |        |           |       |      |       |
>      T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
> 
> Where we might (or might not) have threads in individual cores (a la SMT
> - it's allowed in the architecture at any rate) and we group cores
> together into units of clusters usually 2-4 cores in size (though this
> varies between implementations, some of which have different but similar
> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
> cores). There are multiple clusters per "socket", and there might be an
> arbitrary number of sockets. We'll start to enable NUMA soon.

Have you taken a look at the NUMA patches that Ganapatrao Kulkarni
has sent out? These encode the system-wide topology based on the model
from IBM Power machines.

> Is it not a good idea to expose the cluster details directly in sysfs
> and have these utilities understand the possible extra level in the
> calculation? Or do we want to just fudge the numbers (as seems to be the
> case in some systems I am seeing) to make the x86 model add up?
> 
> Let me know the preferred course...

I like the idea of encoding the topology independently of the specific
levels implemented in hardware. We could use the same model that we have
in DT to represent things to user space, or user space could directly
access the "arm,associativity" properties in
/sys/firmware/devicetree/base, but that would not be portable to
ACPI-based systems.

In the platform that Ganapatrao is interested in, there are no clusters,
but they have two levels of NUMA topology (sockets and boards), and
I could well imagine systems that have more than those two, or systems
that have multiple levels below a socket (e.g. chip, cluster, core,
thread) that all share the same NUMA node because they have a common
memory controller.

It would be nice to find a good representation for sysfs that covers
all of these cases, and that also shows the associativity of I/O
devices.

	Arnd

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sysfs topology for arm64 cluster_id
  2015-01-14  0:47 sysfs topology for arm64 cluster_id Jon Masters
  2015-01-14 11:24 ` Arnd Bergmann
@ 2015-01-14 16:07 ` Don Dutile
  2015-01-14 17:00 ` Mark Rutland
  2 siblings, 0 replies; 9+ messages in thread
From: Don Dutile @ 2015-01-14 16:07 UTC (permalink / raw)
  To: Jon Masters, linux-arm-kernel; +Cc: linux-kernel

On 01/13/2015 07:47 PM, Jon Masters wrote:
> Hi Folks,
>
> TLDR: I would like to consider the value of adding something like
> "cluster_siblings" or similar in sysfs to describe ARM topology.
>
> A quick question on intended data representation in /sysfs topology
> before I ask the team on this end to go down the (wrong?) path. On ARM
> systems today, we have a hierarchical CPU topology:
>
>                   Socket ---- Coherent Interonnect ---- Socket
>                     |                                    |
>           Cluster0 ... ClusterN                Cluster0 ... ClusterN
>              |             |                      |             |
>        Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
>          |       |      |        |           |       |      |       |
>       T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
>
> Where we might (or might not) have threads in individual cores (a la SMT
> - it's allowed in the architecture at any rate) and we group cores
> together into units of clusters usually 2-4 cores in size (though this
> varies between implementations, some of which have different but similar
> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
> cores). There are multiple clusters per "socket", and there might be an
> arbitrary number of sockets. We'll start to enable NUMA soon.
>
> The existing ARM architectural code understands expressing topology in
> terms of the above, but it doesn't quite map these concepts directly in
> sysfs (does not expose cluster_ids as an example). Currently, a cpu-map
> in DeviceTree can expose hierarchies (included nested clusters) and this
> is parsed at boot time to populate scheduler information, as well as the
> topology files in sysfs (if that is provided - none of the reference
> devicetrees upstream do this today, but some exist). But the cluster
> information itself isn't quite exposed (whereas other whacky
> architectural concepts such as s390 books are exposed already today).
>
> Anyway. We have a small problem with tools such as those in util-linux
> (lscpu) getting confused as a result of translating x86-isms to ARM. For
> example, the lscpu utility calculates the number of sockets using the
> following computation:
>
> nsockets = desc->ncpus / nthreads / ncores
>
> (number of sockets = total number of online processing elements /
> threads within a single core / cores within a single socket)
>
> If you're not careful, you can end up with something like:
>
> # lscpu
> Architecture:          aarch64
> Byte Order:            Little Endian
> CPU(s):                8
> On-line CPU(s) list:   0-7
> Thread(s) per core:    1
> Core(s) per socket:    2
> Socket(s):             4
>
Basically, in the top-most diagram, lscpu (& hwloc) are equating Cluster<N>
with socket<N>.  I'm curious how the sysfs NUMA info will be interpreted
when/if that is turned on for arm64.

> Now we can argue that the system in question needs an updated cpu-map
> (it'll actually be something ACPI but I'm keeping this discussion to DT
> to avoid that piece further in discussion, and you can assume I'm
> booting any test boxes in further work on this using DeviceTree prior to
> switching the result over to ACPI) but either way, util-linux is
> thinking in an x86-centric sense of what these files mean. And I think
> the existing topology/cpu-map stuff in arm64 is doing the same.
>
The above values are extracted from the MPIDR:Affx fields and are currently
independent of DT & ACPI.
The Aff1 field is the 'cluster-id' and is being used to associate CPUs (via cpu masks)
with their siblings. lscpu & hwloc associate cpu-nums & siblings with sockets via the above
calculation, which doesn't quite show how siblings enter the equation:
       ncores = CPU_COUNT_S(setsize, core_siblings) / nthreads;

Note: in the arm(32) tree, what was 'socket-id' is 'cluster-id' in arm64;
       I believe this 'mapping' (backporting/association) is one root problem
       in the arch/arm64/kernel/topology.c code.
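
For reference, here is a condensed paraphrase of how that code derives the
ids from the MPIDR Affx fields (illustration only, not verbatim kernel
source; the struct and function names here are made up):

/* Extract affinity level 0-2 from an MPIDR value (Aff3 lives at bits
 * [39:32] and is not covered by this simplified macro). */
#define MPIDR_AFF(mpidr, level)  (((mpidr) >> ((level) * 8)) & 0xff)

struct cpu_topo {
    int thread_id;
    int core_id;
    int cluster_id;   /* exported by sysfs as physical_package_id */
};

static void mpidr_to_topology(unsigned long mpidr, int mt, struct cpu_topo *t)
{
    if (mt) {
        /* SMT part: Aff0 = thread, Aff1 = core, Aff2 = cluster */
        t->thread_id  = MPIDR_AFF(mpidr, 0);
        t->core_id    = MPIDR_AFF(mpidr, 1);
        t->cluster_id = MPIDR_AFF(mpidr, 2);
    } else {
        /* Single-threaded part: Aff0 = core, Aff1 = cluster */
        t->thread_id  = -1;
        t->core_id    = MPIDR_AFF(mpidr, 0);
        t->cluster_id = MPIDR_AFF(mpidr, 1);
    }
    /* core_siblings is then built from CPUs with a matching cluster_id,
     * which is why lscpu sees each 2-core cluster as a "socket". */
}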

Now, a simple fix -- though one requiring lots of fun, cross-architecture
testing -- would be to change lscpu to use the sysfs physical_package_id to
get Socket(s) correct.  Yet that won't fix the 'Core(s) per socket' value
above, because that is derived from the sibling masks, which are generated
from the cluster-id.  This change would also require arm(64) to implement
DT & ACPI methods to map physical CPUs to sockets (missing at the moment).

And modifying the cluster-id and/or the sibling masks creates non-topology
(non-lscpu, non-hwloc) issues like breaking gic init code paths which use
the cluster-id information as well. ... some 'empirical data' to note
if anyone thinks it's just a topology-presentation issue.

> Is it not a good idea to expose the cluster details directly in sysfs
> and have these utilities understand the possible extra level in the
> calculation? Or do we want to just fudge the numbers (as seems to be the
> case in some systems I am seeing) to make the x86 model add up?
>
Short-term, I'm trying to develop a reasonable 'fudge' for lscpu & hwloc
that doesn't impact the (proper) operation of the gic code.
I haven't dug deep enough yet, but this also requires a check on how the
scheduler uses cpu-cache-sibling associativity when selecting the optimal
CPU to schedule threads on.

> Let me know the preferred course...
>
> Jon.
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sysfs topology for arm64 cluster_id
  2015-01-14 11:24 ` Arnd Bergmann
@ 2015-01-14 16:41   ` Don Dutile
  0 siblings, 0 replies; 9+ messages in thread
From: Don Dutile @ 2015-01-14 16:41 UTC (permalink / raw)
  To: Arnd Bergmann, linux-arm-kernel; +Cc: Jon Masters, linux-kernel

On 01/14/2015 06:24 AM, Arnd Bergmann wrote:
> On Tuesday 13 January 2015 19:47:00 Jon Masters wrote:
>> Hi Folks,
>>
>> TLDR: I would like to consider the value of adding something like
>> "cluster_siblings" or similar in sysfs to describe ARM topology.
>>
>> A quick question on intended data representation in /sysfs topology
>> before I ask the team on this end to go down the (wrong?) path. On ARM
>> systems today, we have a hierarchical CPU topology:
>>
>>                   Socket ---- Coherent Interonnect ---- Socket
>>                     |                                    |
>>           Cluster0 ... ClusterN                Cluster0 ... ClusterN
>>              |             |                      |             |
>>        Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
>>          |       |      |        |           |       |      |       |
>>       T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
>>
>> Where we might (or might not) have threads in individual cores (a la SMT
>> - it's allowed in the architecture at any rate) and we group cores
>> together into units of clusters usually 2-4 cores in size (though this
>> varies between implementations, some of which have different but similar
>> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
>> cores). There are multiple clusters per "socket", and there might be an
>> arbitrary number of sockets. We'll start to enable NUMA soon.
>
> Have you taken a look at the NUMA patches that Ganapatrao Kulkarni
> has sent out? These encode the system-wide topology based on the model
> from IBM Power machines.
>
Thanks for that ptr!  I'll take a look at this code today.

>> Is it not a good idea to expose the cluster details directly in sysfs
>> and have these utilities understand the possible extra level in the
>> calculation? Or do we want to just fudge the numbers (as seems to be the
>> case in some systems I am seeing) to make the x86 model add up?
>>
>> Let me know the preferred course...
>
> I like the idea of encoding the topology independent of the specific
> levels implemented in hardware, and we could use that same model
> that we have in DT to represent things to user space, or that
> can directly access the "arm,associativity" properties in
> /sys/firmware/devicetree/base, but that would not be portable to
> ACPI based systems.
>
> In the platform that Ganapatrao is interested in, there are no clusters,
> but they have two levels of NUMA topology (sockets and boards), and
> I could well imagine systems that have more than those two, or systems
> that have multiple levels below a socket (e.g. chip, cluster, core,
> thread) that all share the same NUMA node because they have a common
> memory controller.
>
> It would be nice to find a good representation for sysfs that covers
> all of these cases, and that also shows the associativity of I/O
> devices.
>
Caches too (and cpu associativity to them, esp. L2)

> 	Arnd
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sysfs topology for arm64 cluster_id
  2015-01-14  0:47 sysfs topology for arm64 cluster_id Jon Masters
  2015-01-14 11:24 ` Arnd Bergmann
  2015-01-14 16:07 ` Don Dutile
@ 2015-01-14 17:00 ` Mark Rutland
  2015-01-14 17:18   ` Jon Masters
  2 siblings, 1 reply; 9+ messages in thread
From: Mark Rutland @ 2015-01-14 17:00 UTC (permalink / raw)
  To: jcm; +Cc: linux-arm-kernel, linux-kernel

On Wed, Jan 14, 2015 at 12:47:00AM +0000, Jon Masters wrote:
> Hi Folks,
> 
> TLDR: I would like to consider the value of adding something like
> "cluster_siblings" or similar in sysfs to describe ARM topology.
> 
> A quick question on intended data representation in /sysfs topology
> before I ask the team on this end to go down the (wrong?) path. On ARM
> systems today, we have a hierarchical CPU topology:
> 
>                  Socket ---- Coherent Interonnect ---- Socket
>                    |                                    |
>          Cluster0 ... ClusterN                Cluster0 ... ClusterN
>             |             |                      |             |
>       Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
>         |       |      |        |           |       |      |       |
>      T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
> 
> Where we might (or might not) have threads in individual cores (a la SMT
> - it's allowed in the architecture at any rate) and we group cores
> together into units of clusters usually 2-4 cores in size (though this
> varies between implementations, some of which have different but similar
> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
> cores). There are multiple clusters per "socket", and there might be an
> arbitrary number of sockets. We'll start to enable NUMA soon.

I have a slight disagreement with the diagram above.

The MPIDR_EL1.Aff* fields and the cpu-map bindings currently only
describe the hierarchy, without any information on the relative
weighting between levels, and without any mapping to HW concepts such as
sockets. What these happen to map to is specific to a particular system,
and the hierarchy may be carved up in a number of possible ways
(including "virtual" clusters). There are also 24 RES0 bits that could
potentially become additional Aff fields we may need to describe in
future.
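
For reference, the bit layout in question, as C-style masks (an
illustration, not a definition copied from any header):

/*
 * MPIDR_EL1: Aff0 = bits [7:0], Aff1 = [15:8], Aff2 = [23:16],
 * Aff3 = [39:32]; MT = bit 24, U = bit 30; bits [63:40] are the
 * 24 RES0 bits mentioned above that could become further Aff fields.
 */
#define MPIDR_AFF0(m)  (((m) >>  0) & 0xffUL)
#define MPIDR_AFF1(m)  (((m) >>  8) & 0xffUL)
#define MPIDR_AFF2(m)  (((m) >> 16) & 0xffUL)
#define MPIDR_AFF3(m)  (((m) >> 32) & 0xffUL)
#define MPIDR_MT(m)    (((m) >> 24) & 0x1UL)  /* threads share at Aff0 */
#define MPIDR_U(m)     (((m) >> 30) & 0x1UL)  /* uniprocessor */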

"socket", "package", etc are meaningless unless the system provides a
mapping of Aff levels to these. We can't guess how the HW is actually
organised.

Mark.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sysfs topology for arm64 cluster_id
  2015-01-14 17:00 ` Mark Rutland
@ 2015-01-14 17:18   ` Jon Masters
       [not found]     ` <CALRxmdA+qa+MxkT-Gx-Me2Of5EX+Zobz6HtWRuVK7hhG=zxpmg@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Masters @ 2015-01-14 17:18 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-arm-kernel, linux-kernel, Don Dutile

On 01/14/2015 12:00 PM, Mark Rutland wrote:
> On Wed, Jan 14, 2015 at 12:47:00AM +0000, Jon Masters wrote:
>> Hi Folks,
>>
>> TLDR: I would like to consider the value of adding something like
>> "cluster_siblings" or similar in sysfs to describe ARM topology.
>>
>> A quick question on intended data representation in /sysfs topology
>> before I ask the team on this end to go down the (wrong?) path. On ARM
>> systems today, we have a hierarchical CPU topology:
>>
>>                  Socket ---- Coherent Interonnect ---- Socket
>>                    |                                    |
>>          Cluster0 ... ClusterN                Cluster0 ... ClusterN
>>             |             |                      |             |
>>       Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
>>         |       |      |        |           |       |      |       |
>>      T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
>>
>> Where we might (or might not) have threads in individual cores (a la SMT
>> - it's allowed in the architecture at any rate) and we group cores
>> together into units of clusters usually 2-4 cores in size (though this
>> varies between implementations, some of which have different but similar
>> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
>> cores). There are multiple clusters per "socket", and there might be an
>> arbitrary number of sockets. We'll start to enable NUMA soon.
> 
> I have a slight disagreement with the diagram above.

Thanks for the clarification - note that I was *explicitly not* saying
that the MPIDR Affinity bits sufficiently described the system :) Nor do
I think cpu-map does cover everything we want today.

> The MPIDR_EL1.Aff* fields and the cpu-map bindings currently only
> describe the hierarchy, without any information on the relative
> weighting between levels, and without any mapping to HW concepts such as
> sockets. What these happen to map to is specific to a particular system,
> and the hierarchy may be carved up in a number of possible ways
> (including "virtual" clusters). There are also 24 RES0 bits that could
> potentially become additional Aff fields we may need to describe in
> future.

> "socket", "package", etc are meaningless unless the system provides a
> mapping of Aff levels to these. We can't guess how the HW is actually
> organised.

The replies I got from you and Arnd gel with my thinking that we want
something generic enough in Linux to handle this in a non-architectural
way (real topology, not just hierarchies). That should also cover the
kind of cluster-like stuff e.g. AMD with NUMA on HT on a single socket
and other stuff. So...it sounds like we need "something" to add to our
understanding of hierarchy, and that "something" is in sysfs. A proposal
needs to be derived (I think Don will followup since he is keen to poke
at this). We'll go back to the ACPI ASWG folks to add whatever is
missing to future ACPI bindings after that discussion.

Jon.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: sysfs topology for arm64 cluster_id
       [not found]     ` <CALRxmdA+qa+MxkT-Gx-Me2Of5EX+Zobz6HtWRuVK7hhG=zxpmg@mail.gmail.com>
@ 2016-07-01 15:54       ` Stuart Yoder
  2016-07-01 17:25         ` Don Dutile
  2016-08-05 14:16         ` Christopher Covington
  0 siblings, 2 replies; 9+ messages in thread
From: Stuart Yoder @ 2016-07-01 15:54 UTC (permalink / raw)
  To: Jon Masters, Mark Rutland, linux-arm-kernel, linux-kernel, Don Dutile
  Cc: Will Deacon, Catalin Marinas, Peter Newton

Re-opening a thread from back in early 2015...

> -----Original Message-----
> From: Jon Masters <jcm@redhat.com>
> Date: Wed, Jan 14, 2015 at 11:18 AM
> Subject: Re: sysfs topology for arm64 cluster_id
> To: Mark Rutland <mark.rutland@arm.com>
> Cc: "linux-arm-kernel@lists.infradead.org"
> <linux-arm-kernel@lists.infradead.org>, "linux-kernel@vger.kernel.org"
> <linux-kernel@vger.kernel.org>, Don Dutile <ddutile@redhat.com>
> 
> 
> On 01/14/2015 12:00 PM, Mark Rutland wrote:
> > On Wed, Jan 14, 2015 at 12:47:00AM +0000, Jon Masters wrote:
> >> Hi Folks,
> >>
> >> TLDR: I would like to consider the value of adding something like
> >> "cluster_siblings" or similar in sysfs to describe ARM topology.
> >>
> >> A quick question on intended data representation in /sysfs topology
> >> before I ask the team on this end to go down the (wrong?) path. On ARM
> >> systems today, we have a hierarchical CPU topology:
> >>
> >>                  Socket ---- Coherent Interonnect ---- Socket
> >>                    |                                    |
> >>          Cluster0 ... ClusterN                Cluster0 ... ClusterN
> >>             |             |                      |             |
> >>       Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
> >>         |       |      |        |           |       |      |       |
> >>      T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
> >>
> >> Where we might (or might not) have threads in individual cores (a la SMT
> >> - it's allowed in the architecture at any rate) and we group cores
> >> together into units of clusters usually 2-4 cores in size (though this
> >> varies between implementations, some of which have different but similar
> >> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
> >> cores). There are multiple clusters per "socket", and there might be an
> >> arbitrary number of sockets. We'll start to enable NUMA soon.
> >
> > I have a slight disagreement with the diagram above.
> 
> Thanks for the clarification - note that I was *explicitly not* saying
> that the MPIDR Affinity bits sufficiently described the system :) Nor do
> I think cpu-map does cover everything we want today.
> 
> > The MPIDR_EL1.Aff* fields and the cpu-map bindings currently only
> > describe the hierarchy, without any information on the relative
> > weighting between levels, and without any mapping to HW concepts such as
> > sockets. What these happen to map to is specific to a particular system,
> > and the hierarchy may be carved up in a number of possible ways
> > (including "virtual" clusters). There are also 24 RES0 bits that could
> > potentially become additional Aff fields we may need to describe in
> > future.
> 
> > "socket", "package", etc are meaningless unless the system provides a
> > mapping of Aff levels to these. We can't guess how the HW is actually
> > organised.
> 
> The replies I got from you and Arnd gel with my thinking that we want
> something generic enough in Linux to handle this in a non-architectural
> way (real topology, not just hierarchies). That should also cover the
> kind of cluster-like stuff e.g. AMD with NUMA on HT on a single socket
> and other stuff. So...it sounds like we need "something" to add to our
> understanding of hierarchy, and that "something" is in sysfs. A proposal
> needs to be derived (I think Don will followup since he is keen to poke
> at this). We'll go back to the ACPI ASWG folks to add whatever is
> missing to future ACPI bindings after that discussion.

So, whatever happened to this?

We are running into issues with some DPDK code on arm64 that makes assumptions
about the existence of a NUMA-based system based on the physical_package_id
in sysfs. On A57 CPUs, since physical_package_id represents 'cluster',
things go a bit haywire.
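
As an illustration of the kind of assumption involved (a hypothetical
sketch, not the actual DPDK source), code along these lines counts the
distinct physical_package_id values and treats each one as a separate
socket/NUMA domain:

#include <stdio.h>

#define MAX_CPUS 64

int main(void)
{
    char path[128];
    int seen[MAX_CPUS] = { 0 };
    int nsockets = 0;

    for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/topology/physical_package_id",
                 cpu);
        FILE *f = fopen(path, "r");
        int pkg;

        if (!f)
            break;          /* no more CPUs */
        if (fscanf(f, "%d", &pkg) == 1 && pkg >= 0 && pkg < MAX_CPUS &&
            !seen[pkg]++)
            nsockets++;
        fclose(f);
    }

    /*
     * On an 8-core, 4-cluster A57 system this reports 4 "sockets", so
     * per-socket memory pools and the like get sized for hardware that
     * isn't actually there.
     */
    printf("assumed sockets/NUMA domains: %d\n", nsockets);
    return 0;
}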

Granted this particular app has an x86-centric assumption in it, but what is the
longer term view of how topologies should be represented?

This thread seemed to be heading in the direction of a solution, but 
then it seems to have just stopped.

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sysfs topology for arm64 cluster_id
  2016-07-01 15:54       ` Stuart Yoder
@ 2016-07-01 17:25         ` Don Dutile
  2016-08-05 14:16         ` Christopher Covington
  1 sibling, 0 replies; 9+ messages in thread
From: Don Dutile @ 2016-07-01 17:25 UTC (permalink / raw)
  To: Stuart Yoder, Jon Masters, Mark Rutland, linux-arm-kernel, linux-kernel
  Cc: Catalin Marinas, Peter Newton, Will Deacon

On 07/01/2016 11:54 AM, Stuart Yoder wrote:
> Re-opening a thread from back in early 2015...
>
>> -----Original Message-----
>> From: Jon Masters <jcm@redhat.com>
>> Date: Wed, Jan 14, 2015 at 11:18 AM
>> Subject: Re: sysfs topology for arm64 cluster_id
>> To: Mark Rutland <mark.rutland@arm.com>
>> Cc: "linux-arm-kernel@lists.infradead.org"
>> <linux-arm-kernel@lists.infradead.org>, "linux-kernel@vger.kernel.org"
>> <linux-kernel@vger.kernel.org>, Don Dutile <ddutile@redhat.com>
>>
>>
>> On 01/14/2015 12:00 PM, Mark Rutland wrote:
>>> On Wed, Jan 14, 2015 at 12:47:00AM +0000, Jon Masters wrote:
>>>> Hi Folks,
>>>>
>>>> TLDR: I would like to consider the value of adding something like
>>>> "cluster_siblings" or similar in sysfs to describe ARM topology.
>>>>
>>>> A quick question on intended data representation in /sysfs topology
>>>> before I ask the team on this end to go down the (wrong?) path. On ARM
>>>> systems today, we have a hierarchical CPU topology:
>>>>
>>>>                   Socket ---- Coherent Interonnect ---- Socket
>>>>                     |                                    |
>>>>           Cluster0 ... ClusterN                Cluster0 ... ClusterN
>>>>              |             |                      |             |
>>>>        Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
>>>>          |       |      |        |           |       |      |       |
>>>>       T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
>>>>
>>>> Where we might (or might not) have threads in individual cores (a la SMT
>>>> - it's allowed in the architecture at any rate) and we group cores
>>>> together into units of clusters usually 2-4 cores in size (though this
>>>> varies between implementations, some of which have different but similar
>>>> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
>>>> cores). There are multiple clusters per "socket", and there might be an
>>>> arbitrary number of sockets. We'll start to enable NUMA soon.
>>>
>>> I have a slight disagreement with the diagram above.
>>
>> Thanks for the clarification - note that I was *explicitly not* saying
>> that the MPIDR Affinity bits sufficiently described the system :) Nor do
>> I think cpu-map does cover everything we want today.
>>
>>> The MPIDR_EL1.Aff* fields and the cpu-map bindings currently only
>>> describe the hierarchy, without any information on the relative
>>> weighting between levels, and without any mapping to HW concepts such as
>>> sockets. What these happen to map to is specific to a particular system,
>>> and the hierarchy may be carved up in a number of possible ways
>>> (including "virtual" clusters). There are also 24 RES0 bits that could
>>> potentially become additional Aff fields we may need to describe in
>>> future.
>>
>>> "socket", "package", etc are meaningless unless the system provides a
>>> mapping of Aff levels to these. We can't guess how the HW is actually
>>> organised.
>>
>> The replies I got from you and Arnd gel with my thinking that we want
>> something generic enough in Linux to handle this in a non-architectural
>> way (real topology, not just hierarchies). That should also cover the
>> kind of cluster-like stuff e.g. AMD with NUMA on HT on a single socket
>> and other stuff. So...it sounds like we need "something" to add to our
>> understanding of hierarchy, and that "something" is in sysfs. A proposal
>> needs to be derived (I think Don will followup since he is keen to poke
>> at this). We'll go back to the ACPI ASWG folks to add whatever is
>> missing to future ACPI bindings after that discussion.
>
> So, whatever happened to this?
>
> We are running into issues with some DPDK code on arm64 that makes assumptions
> about the existence of a NUMA-based system based on the physical_package_id
> in sysfs. On A57 cpus since physical_package_id represents 'cluster'
> things go a bit haywire.
>
> Granted this particular app has an x86-centric assumption in it, but what is the
> longer term view of how topologies should be represented?
>
> This thread seemed to be heading in the direction of a solution, but
> then it seems to have just stopped.
>
> Thanks,
> Stuart
>
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Unlike what jcm stated, the simplest/fastest solution is an architecture-specific solution.
The problem with aarch64: the MPIDR is unarchitected past the core level -- what the
hierarchy information means is vendor dependent.

What aarch64 lacks is the CPUID *equivalent* of x86, which has a very detailed, architected
specification (and linux kernel implementation) to appropriately map cores (and threads) to
caches, and memory nodes/clusters/chunks to cores (threads of a core have an obvious mem association).

So, someone has to architect the x86 CPUID equivalent.  It doesn't have to be in the i-stream,
as it is on x86, but for servers -- and that's where your DPDK case lives -- nearly all server sw
(b/c most servers these days have lots of cores & memory) gropes the sysfs space to determine
topology and does the equivalent, topology-dependent optimizations in the apps.
A proposal that was bantered around RH was yet-another-ACPI structure... which could be
populated on x86 as well, and would provide an architecture-agnostic replacement for the
now-architecture-specific core/thread/memory (/io) topology information.

Unfortunately, I don't have the cycles to lend to this effort, as I've taken over the RDMA stack
in RHEL (from dledford, who is now the upstream maintainer for the rdma list).
As advanced layered products like DPDK are ported to arm64,
this issue will reach critical mass quickly, when dog-n-pony-shows turn into benchmark comparisons.

Thanks for raising the issue on the appropriate lists.
Perhaps some real effort will be made to finally resolve the issue.

- Don

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: sysfs topology for arm64 cluster_id
  2016-07-01 15:54       ` Stuart Yoder
  2016-07-01 17:25         ` Don Dutile
@ 2016-08-05 14:16         ` Christopher Covington
  1 sibling, 0 replies; 9+ messages in thread
From: Christopher Covington @ 2016-08-05 14:16 UTC (permalink / raw)
  To: Stuart Yoder, Jon Masters, Mark Rutland, linux-arm-kernel,
	linux-kernel, Don Dutile
  Cc: Catalin Marinas, Peter Newton, Will Deacon, Shanker Donthineni,
	Vikram Sethi, Mark Salter, Mark Langsdorf, Mark Brown

Hi Stuart,

On 07/01/2016 11:54 AM, Stuart Yoder wrote:
> Re-opening a thread from back in early 2015...
> 
>> -----Original Message-----
>> From: Jon Masters <jcm@redhat.com>
>> Date: Wed, Jan 14, 2015 at 11:18 AM
>> Subject: Re: sysfs topology for arm64 cluster_id
>> To: Mark Rutland <mark.rutland@arm.com>
>> Cc: "linux-arm-kernel@lists.infradead.org"
>> <linux-arm-kernel@lists.infradead.org>, "linux-kernel@vger.kernel.org"
>> <linux-kernel@vger.kernel.org>, Don Dutile <ddutile@redhat.com>
>>
>>
>> On 01/14/2015 12:00 PM, Mark Rutland wrote:
>>> On Wed, Jan 14, 2015 at 12:47:00AM +0000, Jon Masters wrote:
>>>> Hi Folks,
>>>>
>>>> TLDR: I would like to consider the value of adding something like
>>>> "cluster_siblings" or similar in sysfs to describe ARM topology.
>>>>
>>>> A quick question on intended data representation in /sysfs topology
>>>> before I ask the team on this end to go down the (wrong?) path. On ARM
>>>> systems today, we have a hierarchical CPU topology:
>>>>
>>>>                  Socket ---- Coherent Interonnect ---- Socket
>>>>                    |                                    |
>>>>          Cluster0 ... ClusterN                Cluster0 ... ClusterN
>>>>             |             |                      |             |
>>>>       Core0...CoreN  Core0...CoreN        Core0...CoreN  Core0...CoreN
>>>>         |       |      |        |           |       |      |       |
>>>>      T0..TN  T0..Tn  T0..TN  T0..TN       T0..TN T0..TN  T0..TN  T0..TN
>>>>
>>>> Where we might (or might not) have threads in individual cores (a la SMT
>>>> - it's allowed in the architecture at any rate) and we group cores
>>>> together into units of clusters usually 2-4 cores in size (though this
>>>> varies between implementations, some of which have different but similar
>>>> concepts, such as AppliedMicro Potenza PMDs CPU complexes of dual
>>>> cores). There are multiple clusters per "socket", and there might be an
>>>> arbitrary number of sockets. We'll start to enable NUMA soon.
>>>
>>> I have a slight disagreement with the diagram above.
>>
>> Thanks for the clarification - note that I was *explicitly not* saying
>> that the MPIDR Affinity bits sufficiently described the system :) Nor do
>> I think cpu-map does cover everything we want today.
>>
>>> The MPIDR_EL1.Aff* fields and the cpu-map bindings currently only
>>> describe the hierarchy, without any information on the relative
>>> weighting between levels, and without any mapping to HW concepts such as
>>> sockets. What these happen to map to is specific to a particular system,
>>> and the hierarchy may be carved up in a number of possible ways
>>> (including "virtual" clusters). There are also 24 RES0 bits that could
>>> potentially become additional Aff fields we may need to describe in
>>> future.
>>
>>> "socket", "package", etc are meaningless unless the system provides a
>>> mapping of Aff levels to these. We can't guess how the HW is actually
>>> organised.
>>
>> The replies I got from you and Arnd gel with my thinking that we want
>> something generic enough in Linux to handle this in a non-architectural
>> way (real topology, not just hierarchies). That should also cover the
>> kind of cluster-like stuff e.g. AMD with NUMA on HT on a single socket
>> and other stuff. So...it sounds like we need "something" to add to our
>> understanding of hierarchy, and that "something" is in sysfs. A proposal
>> needs to be derived (I think Don will followup since he is keen to poke
>> at this). We'll go back to the ACPI ASWG folks to add whatever is
>> missing to future ACPI bindings after that discussion.
> 
> So, whatever happened to this?
> 
> We are running into issues with some DPDK code on arm64 that makes assumptions
> about the existence of a NUMA-based system based on the physical_package_id
> in sysfs. On A57 cpus since physical_package_id represents 'cluster' 
> things go a bit haywire.
> 
> Granted this particular app has an x86-centric assumption in it, but what is the
> longer term view of how topologies should be represented?
> 
> This thread seemed to be heading in the direction of a solution, but 
> then it seems to have just stopped.

Can you elaborate a little more on the specifics of the DPDK failure? Would the following change fix it? This should make physical_package_id in sysfs read as -1 (default definition from include/linux/topology.h), while preserving the cluster affinity information for kernel scheduling purposes.

Thanks,
Cov

--- 8< ---
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 8b57339..f1095e7 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -13,7 +13,6 @@ struct cpu_topology {
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
 
-#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code
Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-08-05 14:16 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-14  0:47 sysfs topology for arm64 cluster_id Jon Masters
2015-01-14 11:24 ` Arnd Bergmann
2015-01-14 16:41   ` Don Dutile
2015-01-14 16:07 ` Don Dutile
2015-01-14 17:00 ` Mark Rutland
2015-01-14 17:18   ` Jon Masters
     [not found]     ` <CALRxmdA+qa+MxkT-Gx-Me2Of5EX+Zobz6HtWRuVK7hhG=zxpmg@mail.gmail.com>
2016-07-01 15:54       ` Stuart Yoder
2016-07-01 17:25         ` Don Dutile
2016-08-05 14:16         ` Christopher Covington
