linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Borislav Petkov <bp@alien8.de>
To: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Naveen Krishna Chatradhi <nchatrad@amd.com>,
	linux-edac@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, mchehab@kernel.org,
	Muralidhara M K <muralimk@amd.com>
Subject: Re: [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Thu, 2 Sep 2021 19:30:24 +0200	[thread overview]
Message-ID: <YTEKMHqjY/IUBfgl@zn.tnic> (raw)
In-Reply-To: <YS/Dsc2gWGGCWnbs@yaz-ubuntu>

On Wed, Sep 01, 2021 at 06:17:21PM +0000, Yazen Ghannam wrote:
> These devices aren't officially GPUs, since they don't have graphics/video
> capabilities. Can we come up with a new term for this class of devices? Maybe
> accelerators or something?
> 
> In any case, GPU is still used throughout documentation and code, so it's fair
> to just stick with "gpu".

Hmm, yeah, everybody is talking about special-purpose processing units
now, i.e., accelerators or whatever they call them. I guess this is the
new fancy thing since sliced bread.

Well, what are those PCI IDs going to represent? Devices which have RAS
capabilities on them?

We have this nomenclature called "uncore" in the perf subsystem for
counters which are not part of the CPU core or whatever. But there we
use that term on AMD already so that might cause confusion.

But I guess the type of those devices doesn't matter for amd_nb.c,
right?

All that thing cares for is having an array of northbridges, each with
the respective PCI devices and that's it. So for amd_nb.c I think that
differentiation doesn't matter... but keep reading...

> We use the Node ID to index into the amd_northbridge.nb array, e.g. in
> node_to_amd_nb().
> 
> We can get the Node ID of a GPU node when processing an MCA error as in Patch
> 2 of this set. The hardware is going to give us a value of 8 or more.
> 
> So, for example, if we set up the "nb" array like this for 1 CPU and 2 GPUs:
> [ID:Type] : [0: CPU], [8: GPU], [9: GPU]
>  
> Then I think we'll need some more processing at runtime to map, for example,
> an error from GPU Node 9 to NB array Index 2, etc.
> 
> Or we can manage this at init time like this:
> [0: CPU], [1: NULL], [2: NULL], [3: NULL], [4: NULL], [5: NULL], [6: NULL],
> [7, NULL], [8: GPU], [9: GPU]
> 
> And at runtime, the code which does Node ID to NB entry just works. This
> applies to node_to_amd_nb(), places where we loop over amd_nb_num(), etc.
> 
> What do you think?

Ok, looking at patch 2, it does:

	node_id = ((m->ipid >> 44) & 0xF);

So how ugly would it become if you do here:

	node_id = ((m->ipid >> 44) & 0xF);
	node_id -= accel_id_offset;

where that accel_id_offset is the thing you've read out from one of the
Data Fabric registers before?

This way, the gap between CPU IDs and accel IDs is gone and in the
software view, there is none.

Or are we reading other hardware registers which are aware of that gap
and we would have to remove it again to get the proper index? And if so,
and if it becomes real ugly, maybe we will have to bite the bullet and
do the gap in the array but that would be yucky...

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

  reply	other threads:[~2021-09-02 17:29 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-30 15:28 [PATCH 0/7] x86/edac/amd64: Add support for noncpu nodes Naveen Krishna Chatradhi
2021-06-30 15:28 ` [PATCH 1/7] x86/amd_nb: Add Aldebaran device to PCI IDs Naveen Krishna Chatradhi
2021-07-19 19:28   ` Yazen Ghannam
2021-08-10 12:45     ` Chatradhi, Naveen Krishna
2021-08-10 13:54       ` Borislav Petkov
2021-06-30 15:28 ` [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-07-19 20:25   ` Yazen Ghannam
2021-08-10 12:45     ` Chatradhi, Naveen Krishna
2021-06-30 15:28 ` [PATCH 3/7] EDAC/mc: Add new HBM2 memory type Naveen Krishna Chatradhi
2021-07-19 20:47   ` Yazen Ghannam
2021-07-19 21:16     ` Luck, Tony
2021-07-20 12:13       ` Zhuo, Qiuxu
2021-07-20 16:30         ` [PATCH] EDAC/skx_common: Set the memory type correctly for HBM memory Luck, Tony
2021-07-20 16:33     ` [PATCH 3/7] EDAC/mc: Add new HBM2 memory type Luck, Tony
2021-06-30 15:28 ` [PATCH 4/7] EDAC/mce_amd: extract node id from InstanceHi in IPID Naveen Krishna Chatradhi
2021-07-29 16:32   ` Yazen Ghannam
2021-08-10 12:45     ` Chatradhi, Naveen Krishna
2021-06-30 15:28 ` [PATCH 5/7] EDAC/amd64: Enumerate memory on noncpu nodes Naveen Krishna Chatradhi
2021-07-29 17:55   ` Yazen Ghannam
2021-08-10 12:49     ` Chatradhi, Naveen Krishna
2021-06-30 15:28 ` [PATCH 6/7] EDAC/amd64: Add address translation support for DF3.5 Naveen Krishna Chatradhi
2021-07-29 17:59   ` Yazen Ghannam
2021-07-29 18:13     ` Chatradhi, Naveen Krishna
2021-06-30 15:28 ` [PATCH 7/7] EDAC/amd64: Add fixed UMC to CS mapping Naveen Krishna Chatradhi
2021-08-06  7:43 ` [PATCH v2 0/3] x86/edac/amd64: Add support for noncpu nodes Naveen Krishna Chatradhi
2021-08-06  7:43   ` [PATCH v2 1/3] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-08-20 15:36     ` Yazen Ghannam
2021-08-06  7:43   ` [PATCH v2 2/3] EDAC/mce_amd: Extract node id from InstanceHi in IPID Naveen Krishna Chatradhi
2021-08-20 15:43     ` Yazen Ghannam
2021-08-06  7:43   ` [PATCH v2 3/3] EDAC/amd64: Enumerate memory on noncpu nodes Naveen Krishna Chatradhi
2021-08-20 17:02     ` Yazen Ghannam
2021-08-23 16:56       ` Chatradhi, Naveen Krishna
2021-08-23 18:54     ` [PATCH v3 0/3] x86/edac/amd64: Add support for " Naveen Krishna Chatradhi
2021-08-23 18:54       ` [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-08-25 10:42         ` Borislav Petkov
2021-09-01 18:17           ` Yazen Ghannam
2021-09-02 17:30             ` Borislav Petkov [this message]
2021-09-13 18:07               ` Yazen Ghannam
2021-09-16 16:36                 ` Borislav Petkov
2021-10-11 14:26           ` Chatradhi, Naveen Krishna
2021-10-11 18:08             ` Borislav Petkov
2021-08-23 18:54       ` [PATCH v3 2/3] EDAC/mce_amd: Extract node id from MCA_IPID Naveen Krishna Chatradhi
2021-08-27 10:24         ` Borislav Petkov
2021-09-01 18:18           ` Yazen Ghannam
2021-08-23 18:54       ` [PATCH v3 3/3] EDAC/amd64: Enumerate memory on noncpu nodes Naveen Krishna Chatradhi
2021-08-27 11:30         ` Borislav Petkov
2021-09-01 18:42           ` Yazen Ghannam
2021-09-08 18:41             ` Borislav Petkov
2021-09-13 18:19               ` Yazen Ghannam
2021-10-14 18:50       ` [PATCH v4 0/4] x86/edac/amd64: Add support for " Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH v4 1/4] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH v4 2/4] EDAC/mce_amd: Extract node id from MCA_IPID Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH 3/4] EDAC/amd64: Extend family ops functions Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes Naveen Krishna Chatradhi
2021-10-14 18:53       ` [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes Naveen Krishna Chatradhi
2021-10-14 18:53         ` [PATCH v4 1/4] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-10-21 15:52           ` Yazen Ghannam
2021-10-14 18:53         ` [PATCH v4 2/4] EDAC/mce_amd: Extract node id from MCA_IPID Naveen Krishna Chatradhi
2021-10-21 15:55           ` Yazen Ghannam
2021-10-14 18:53         ` [PATCH v4 3/4] EDAC/amd64: Extend family ops functions Naveen Krishna Chatradhi
2021-10-21 16:27           ` Yazen Ghannam
2021-10-14 18:54         ` [PATCH v4 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes Naveen Krishna Chatradhi
2021-10-21 16:40           ` Yazen Ghannam
2021-10-14 19:53         ` [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes Borislav Petkov
2021-10-15 12:18           ` Chatradhi, Naveen Krishna
2021-10-15 19:25             ` Borislav Petkov
2021-08-20 17:07   ` [PATCH v2 0/3] " Yazen Ghannam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YTEKMHqjY/IUBfgl@zn.tnic \
    --to=bp@alien8.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=muralimk@amd.com \
    --cc=nchatrad@amd.com \
    --cc=x86@kernel.org \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).