From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
To: Borislav Petkov <bp@alien8.de>
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, leo.duran@amd.com,
	yazen.ghannam@amd.com, Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket
Date: Tue, 27 Jun 2017 20:07:10 +0700	[thread overview]
Message-ID: <bb167d94-828b-2d5b-642a-d56e8db392d0@amd.com> (raw)
In-Reply-To: <20170627104803.wlhsqhaylbeqod37@pd.tnic>

Hi Boris,

On 6/27/17 17:48, Borislav Petkov wrote:
> On Tue, Jun 27, 2017 at 01:40:52AM -0500, Suravee Suthikulpanit wrote:
>> According to the Documentation/x86/topology.txt, AMD nomenclature for
>> package is NUMA node (or die).
>
> You guys keep talking about die so you should add the definition of
> "die" to that document so that we're all on the same page. In a separate
> patch.

What we are trying to point out here is that (NUMA) "node" and "die" are the same
thing on most AMD processors; we are not trying to introduce another term here.

>> However, this is not the case on AMD family17h multi-die processor
>> platforms, which can have up to 4 dies per socket as shown in the
>> following system topology.
>
> So what exactly does that mean? A die is a package on ZN and you can have up to 4 packages on a physical socket?

Yes. 4 packages (or 4 dies, or 4 NUMA nodes) in a socket.
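
For reference, this die count also matches what CPUID reports: on family 17h,
CPUID Fn8000_001E_ECX carries the NodeId of the CPU in bits [7:0] and
(NodesPerProcessor - 1) in bits [10:8]. A minimal sketch of pulling both out
(the variable names are just for illustration):

    u32 eax, ebx, ecx, edx, node_id, nodes_per_socket;

    cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);

    node_id          = ecx & 0xff;              /* which die this CPU is on */
    nodes_per_socket = ((ecx >> 8) & 7) + 1;    /* 4 on the parts above */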

>> Current logic interpretes package as socket (i.e. phys_proc_id is
>> socket id), which results in setting x86_has_numa_in_package, and omits
>> the DIE schedule domain.
>
> Well, this is what we do on MCM Magny-Cours. Btw, here's where you
> explain *why* you need the DIE domain.

As I described in the cover letter, this patch series changes how the kernel
derives the CPU "package", from package-as-socket to package-as-die, in order
to fix the following issues on AMD family 17h multi-die processor platforms:

  * irqbalance fails to allocate IRQs to individual CPUs within a die.

  * The scheduler fails to load-balance across the 8 threads within a die
    (e.g. when running an 8-thread application with taskset -c 0-7), since
    the DIE sched-domain is omitted due to x86_has_numa_in_package.

These issues are fixed once the package is properly interpreted as the die.
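
To make that concrete, here is a hypothetical sketch (not the actual diff) of
the package-as-die idea: take the NodeId reported by CPUID Fn8000_001E and use
it as phys_proc_id, so the "package" the rest of the kernel sees is the die.
The function name below is made up purely for illustration:

    /* Hypothetical sketch only, not the submitted patch. */
    static void amd_set_package_as_die(struct cpuinfo_x86 *c)
    {
            u32 eax, ebx, ecx, edx;

            if (!boot_cpu_has(X86_FEATURE_TOPOEXT))
                    return;

            cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);

            /* ECX[7:0] is the NodeId, i.e. the die this CPU sits on. */
            c->phys_proc_id = ecx & 0xff;
    }

With phys_proc_id identifying a die, a package no longer spans multiple NUMA
nodes, so x86_has_numa_in_package stays off and the DIE sched-domain comes back.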

For MCM Magny-Cours (family 15h), we do not have this issue because the MC
sched-domain has the same cpumask as the DIE sched-domain. This is not the
case for Zen (family 17h), where the MC sched-domain is the CCX (2 CCXs per die).
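
The underlying reason is that the MC sched-domain is built from the
last-level-cache sharing mask; roughly, from arch/x86/kernel/smpboot.c:

    const struct cpumask *cpu_coregroup_mask(int cpu)
    {
            return cpu_llc_shared_mask(cpu);
    }

On Magny-Cours the L3 is shared by the whole die, so MC already covers the
die; on Zen each CCX has its own L3, so MC only covers half a die and the
DIE level carries real information.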

>> However, NUMA schedule domains are derived from
>> SRAT/SLIT, which assumes NUMA node is a die,
>
> Ok, so I don't understand this: SRAT/SLIT should basically say that you
> have 4 *logical* NUMA nodes on that *physical* socket. Is that what you
> want? Or is not what you want? Because it makes sense to me that BIOS
> should describe how the logical grouping of the cores in a node is.
> And x86_numa_in_package_topology roughly means that we rely on SRAT/SLIT
> to tell us the topology.

The SRAT/SLIT tables are doing the right thing: they describe the topology
among the *logical* NUMA nodes (dies). So, for the system view below:

>> System View (with 2 socket) :
>>            --------------------
>>            |     -------------|------
>>            |     |            |     |
>>          ------------       ------------
>>          | D1 -- D0 |       | D7 -- D6 |
>>          | |  \/ |  |       | |  \/ |  |
>>  SOCKET0 | |  /\ |  |       | |  /\ |  | SOCKET1
>>          | D2 -- D3 |       | D4 -- D5 |
>>          ------------       ------------
>>            |     |            |     |
>>            ------|------------|     |
>>                  --------------------

the SLIT table shows:

node   0   1   2   3   4   5   6   7
   0:  10  16  16  16  32  32  32  32
   1:  16  10  16  16  32  32  32  32
   2:  16  16  10  16  32  32  32  32
   3:  16  16  16  10  32  32  32  32
   4:  32  32  32  32  10  16  16  16
   5:  32  32  32  32  16  10  16  16
   6:  32  32  32  32  16  16  10  16
   7:  32  32  32  32  16  16  16  10

However, SRAT/SLIT does not describe the die itself. So, using
x86_numa_in_package_topology on a multi-die Zen processor results in the DIE
sched-domain being omitted for the CPUs within a die.
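
Concretely, quoting roughly (from memory) the topology tables in
arch/x86/kernel/smpboot.c: the numa-in-package table simply has no DIE level,
while the default table does:

    static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
    #ifdef CONFIG_SCHED_SMT
            { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
    #endif
    #ifdef CONFIG_SCHED_MC
            { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
    #endif
            { NULL, },                              /* no DIE level here */
    };

    static struct sched_domain_topology_level x86_topology[] = {
    #ifdef CONFIG_SCHED_SMT
            { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
    #endif
    #ifdef CONFIG_SCHED_MC
            { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
    #endif
            { cpu_cpu_mask, SD_INIT_NAME(DIE) },
            { NULL, },
    };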

> So why can't SRAT/SLIT do that just like they did on Magny-Cours MCM
> packages?

Again, on Magny-Cours MCM the DIE sched-domain is the same as the MC
sched-domain, so we can omit the DIE sched-domain. Here is an example from
/proc/schedstat:

Magny-Cours cpu0
domain0 00000000,00000003 (SMT)
domain1 00000000,000000ff (MC which is the same as DIE)
domain2 00ff00ff,00ffffff (NUMA 1 hop)
domain3 ffffffff,ffffffff (NUMA platform)

Zen cpu0 (package-as-socket)
domain0 00000000,00000001,00000000,00000001 (SMT)
domain1 00000000,0000000f,00000000,0000000f (MC ccx)
domain2 00000000,ffffffff,00000000,ffffffff (NUMA socket)
domain3 ffffffff,ffffffff,ffffffff,ffffffff (NUMA platform)

Zen cpu0 (package-as-die)
domain0 00000000,00000001,00000000,00000001 (SMT)
domain1 00000000,0000000f,00000000,0000000f (MC ccx)
domain2 00000000,000000ff,00000000,000000ff (DIE)
domain3 00000000,ffffffff,00000000,ffffffff (NUMA socket)
domain4 ffffffff,ffffffff,ffffffff,ffffffff (NUMA platform)

Hope this helps clarify the purpose of this patch series. If there are no
other concerns with this direction, I'll split and clean up the patch series
per your initial review comments.

Thanks,
Suravee


Thread overview: 26+ messages
2017-06-27  6:40 [PATCH 0/2] x86/CPU/AMD: Fix multi-die processor topology info Suravee Suthikulpanit
2017-06-27  6:40 ` [PATCH 1/2] x86/CPU/AMD: Present package as die instead of socket Suravee Suthikulpanit
2017-06-27 10:48   ` Borislav Petkov
2017-06-27 13:07     ` Suravee Suthikulpanit [this message]
2017-06-27 13:42       ` Borislav Petkov
2017-06-27 16:54         ` Suravee Suthikulpanit
2017-06-27 17:44           ` Borislav Petkov
2017-06-27 18:32             ` Ghannam, Yazen
2017-06-27 18:44               ` Borislav Petkov
2017-06-27 20:26             ` Suravee Suthikulpanit
2017-06-28  9:12               ` Borislav Petkov
2017-06-27 14:21       ` Thomas Gleixner
2017-06-27 14:54         ` Brice Goglin
2017-06-27 15:48         ` Duran, Leo
2017-06-27 16:11           ` Borislav Petkov
2017-06-27 16:23             ` Duran, Leo
2017-06-27 16:31               ` Borislav Petkov
2017-06-27 16:42                 ` Duran, Leo
2017-06-27 16:45                   ` Borislav Petkov
2017-06-27 17:04                     ` Duran, Leo
2017-06-27 16:19           ` Thomas Gleixner
2017-06-27 16:34             ` Duran, Leo
2017-06-27 15:55         ` Suravee Suthikulpanit
2017-06-27 16:16           ` Borislav Petkov
2017-07-05 15:50       ` Peter Zijlstra
2017-06-27  6:40 ` [PATCH 2/2] x86/CPU/AMD: Use L3 Cache info from CPUID to determine LLC ID Suravee Suthikulpanit
