All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Chatradhi, Naveen Krishna" <NaveenKrishna.Chatradhi@amd.com>
To: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"mchehab@kernel.org" <mchehab@kernel.org>
Subject: RE: [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Tue, 10 Aug 2021 12:45:20 +0000	[thread overview]
Message-ID: <BL1PR12MB5286AF6D10A89BBA9DB569EBE8F79@BL1PR12MB5286.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20210719202554.GB19451@aus-x-yghannam.amd.com>

[Public]

Hi Yazen

Regards,
Naveenk

-----Original Message-----
From: Ghannam, Yazen <Yazen.Ghannam@amd.com> 
Sent: Tuesday, July 20, 2021 1:56 AM
To: Chatradhi, Naveen Krishna <NaveenKrishna.Chatradhi@amd.com>
Cc: linux-edac@vger.kernel.org; x86@kernel.org; linux-kernel@vger.kernel.org; bp@alien8.de; mingo@redhat.com; mchehab@kernel.org
Subject: Re: [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran

On Wed, Jun 30, 2021 at 08:58:23PM +0530, Naveen Krishna Chatradhi wrote:
> From: Muralidhara M K <muralimk@amd.com>
> 
> On newer heterogeneous systems from AMD, there is a possibility of 
> having GPU nodes along with CPU nodes with the MCA banks. The GPU 
> nodes (noncpu nodes) starts enumerating from northbridge index 8.
>

"there is a possibility of having GPU nodes along with CPU nodes with the MCA banks" doesn't read clearly to me. It could be more explicit.
For example, "On newer systems...the CPUs manages MCA errors reported from the GPUs. Enumerate the GPU nodes with the AMD NB framework to support EDAC, etc." or something like this.

Also, "northbridge index" isn't a hardware thing rather it's an internal Linux value. I think you are referring to the "AMD Node ID" value from CPUID. The GPUs don't have CPUID, so the "AMD Node ID" value can't be directly read like for CPUs. But the current hardware implementation is such that the GPU nodes are enumerated in sequential order based on the PCI hierarchy, and the first GPU node is assumed to have an "AMD Node ID" value of 8 (the second GPU node has 9, etc.). With this implemenation detail, the Data Fabric on the GPU nodes can be accessed the same way as the Data Fabric on CPU nodes.

> Aldebaran GPUs have 2 root ports, with 4 misc port for each root.
> 

I don't fully understand this sentence. There are 2 "Nodes"/Data Fabrics per GPU package, but what do "4 misc port for each root" mean? In any case, is this relevant to this patch?

Also, there should be an imperitive in the commit message, i.e. "Add ...".
[naveenk:] Modified the commit message

> Signed-off-by: Muralidhara M K <muralimk@amd.com>
> Signed-off-by: Naveen Krishna Chatradhi <nchatrad@amd.com>
> ---
>  arch/x86/include/asm/amd_nb.h |  6 ++++
>  arch/x86/kernel/amd_nb.c      | 62 ++++++++++++++++++++++++++++++++---
>  2 files changed, 63 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/amd_nb.h 
> b/arch/x86/include/asm/amd_nb.h index 00d1a400b7a1..e71581cf00e3 
> 100644
> --- a/arch/x86/include/asm/amd_nb.h
> +++ b/arch/x86/include/asm/amd_nb.h
> @@ -79,6 +79,12 @@ struct amd_northbridge_info {
>  
>  #ifdef CONFIG_AMD_NB
>  
> +/*
> + * On Newer heterogeneous systems from AMD with CPU and GPU nodes 
> +connected
> + * via xGMI links, the NON CPU Nodes are enumerated from index 8  */
> +#define NONCPU_NODE_INDEX	8

"Newer" doesn't need to be capatilized. And there should be a period at the end of the sentence.

I don't think "xGMI links" would mean much to most folks. I think the implication here is that the CPUs and GPUs are connected directly together (or rather their Data Fabrics are connected) like is done with
2 socket CPU systems and also within a socket for Multi-chip Module
(MCM) CPUs like Naples.
[naveenk:] Modified the message

> +
>  u16 amd_nb_num(void);
>  bool amd_nb_has_feature(unsigned int feature);  struct 
> amd_northbridge *node_to_amd_nb(int node); diff --git 
> a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c index 
> 5884dfa619ff..489003e850dd 100644
> --- a/arch/x86/kernel/amd_nb.c
> +++ b/arch/x86/kernel/amd_nb.c
> @@ -26,6 +26,8 @@
>  #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F4 0x1444
>  #define PCI_DEVICE_ID_AMD_19H_DF_F4	0x1654
>  #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F4 0x166e
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT	0x14bb
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4	0x14d4
>

These PCI IDs look correct.

>  /* Protect the PCI config register pairs used for SMN. */  static 
> DEFINE_MUTEX(smn_mutex); @@ -94,6 +96,21 @@ static const struct 
> pci_device_id hygon_nb_link_ids[] = {
>  	{}
>  };
>  
> +static const struct pci_device_id amd_noncpu_root_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT) },
> +	{}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_misc_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3) },
> +	{}
> +};
> +
> +static const struct pci_device_id amd_noncpu_nb_link_ids[] = {
> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F4) },
> +	{}
> +};
> +

I think separating the CPU and non-CPU IDs is a good idea.

>  const struct amd_nb_bus_dev_range amd_nb_bus_dev_ranges[] __initconst = {
>  	{ 0x00, 0x18, 0x20 },
>  	{ 0xff, 0x00, 0x20 },
> @@ -182,11 +199,16 @@ int amd_cache_northbridges(void)
>  	const struct pci_device_id *misc_ids = amd_nb_misc_ids;
>  	const struct pci_device_id *link_ids = amd_nb_link_ids;
>  	const struct pci_device_id *root_ids = amd_root_ids;
> +
> +	const struct pci_device_id *noncpu_misc_ids = amd_noncpu_nb_misc_ids;
> +	const struct pci_device_id *noncpu_link_ids = amd_noncpu_nb_link_ids;
> +	const struct pci_device_id *noncpu_root_ids = amd_noncpu_root_ids;
> +
>  	struct pci_dev *root, *misc, *link;
>  	struct amd_northbridge *nb;
>  	u16 roots_per_misc = 0;
> -	u16 misc_count = 0;
> -	u16 root_count = 0;
> +	u16 misc_count = 0, misc_count_noncpu = 0;
> +	u16 root_count = 0, root_count_noncpu = 0;
>  	u16 i, j;
>  
>  	if (amd_northbridges.num)
> @@ -205,10 +227,16 @@ int amd_cache_northbridges(void)
>  	if (!misc_count)
>  		return -ENODEV;
>  
> +	while ((misc = next_northbridge(misc, noncpu_misc_ids)) != NULL)
> +		misc_count_noncpu++;
> +
>  	root = NULL;
>  	while ((root = next_northbridge(root, root_ids)) != NULL)
>  		root_count++;
>  
> +	while ((root = next_northbridge(root, noncpu_root_ids)) != NULL)
> +		root_count_noncpu++;
> +
>  	if (root_count) {
>  		roots_per_misc = root_count / misc_count;
>  
> @@ -222,15 +250,27 @@ int amd_cache_northbridges(void)
>  		}
>  	}
>  
> -	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> +	/*
> +	 * The valid amd_northbridges are in between (0 ~ misc_count) and
> +	 * (NONCPU_NODE_INDEX ~ NONCPU_NODE_INDEX + misc_count_noncpu)
> +	 */

This comment isn't clear to me. Is it even necessary?
[naveenk:] moved the message

> +	if (misc_count_noncpu)
> +		/*
> +		 * There are NONCPU Nodes with pci root ports starting at index 8
> +		 * allocate few extra cells for simplicity in handling the indexes
> +		 */

I think this comment can be more explicit. The first non-CPU Node ID starts at 8 even if there are fewer than 8 CPU nodes. To maintain the AMD Node ID to Linux amd_nb indexing scheme, allocate the number of GPU nodes plus 8. Some allocated amd_northbridge structures will go unused when the number of CPU nodes is less than 8, but this tradeoff is to keep things relatively simple.

> +		amd_northbridges.num = NONCPU_NODE_INDEX + misc_count_noncpu;
> +	else
> +		amd_northbridges.num = misc_count;

The if-else statements should have {}s even though there's only a single line of code in each. This is just to make it easier to read multiple lines. Or the second code comment can be merged with the first outside the if-else.
[naveenk:] Done

> +
> +	nb = kcalloc(amd_northbridges.num, sizeof(struct amd_northbridge), 
> +GFP_KERNEL);
>  	if (!nb)
>  		return -ENOMEM;
>  
>  	amd_northbridges.nb = nb;
> -	amd_northbridges.num = misc_count;
>  
>  	link = misc = root = NULL;
> -	for (i = 0; i < amd_northbridges.num; i++) {
> +	for (i = 0; i < misc_count; i++) {
>  		node_to_amd_nb(i)->root = root =
>  			next_northbridge(root, root_ids);
>  		node_to_amd_nb(i)->misc = misc =
> @@ -251,6 +291,18 @@ int amd_cache_northbridges(void)
>  			root = next_northbridge(root, root_ids);
>  	}
>  
> +	link = misc = root = NULL;

This line can go inside the if statement below.
[naveenk:] Done

I'm not sure it's totally necessary since the GPU devices should be listed after the CPU devices. But I guess better safe than sorry in case that implementation detail doesn't hold in the future. If you keep it, then I think you should do the same above when finding the counts.

> +	if (misc_count_noncpu) {
> +		for (i = NONCPU_NODE_INDEX; i < NONCPU_NODE_INDEX + misc_count_noncpu; i++) {
> +			node_to_amd_nb(i)->root = root =
> +				next_northbridge(root, noncpu_root_ids);
> +			node_to_amd_nb(i)->misc = misc =
> +				next_northbridge(misc, noncpu_misc_ids);
> +			node_to_amd_nb(i)->link = link =
> +				next_northbridge(link, noncpu_link_ids);
> +		}
> +	}
> +
>  	if (amd_gart_present())
>  		amd_northbridges.flags |= AMD_NB_GART;
>  
> --

Thanks,
Yazen
[naveenk:] Than you

  reply	other threads:[~2021-08-10 12:45 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-30 15:28 [PATCH 0/7] x86/edac/amd64: Add support for noncpu nodes Naveen Krishna Chatradhi
2021-06-30 15:28 ` [PATCH 1/7] x86/amd_nb: Add Aldebaran device to PCI IDs Naveen Krishna Chatradhi
2021-07-19 19:28   ` Yazen Ghannam
2021-08-10 12:45     ` Chatradhi, Naveen Krishna
2021-08-10 13:54       ` Borislav Petkov
2021-06-30 15:28 ` [PATCH 2/7] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-07-19 20:25   ` Yazen Ghannam
2021-08-10 12:45     ` Chatradhi, Naveen Krishna [this message]
2021-06-30 15:28 ` [PATCH 3/7] EDAC/mc: Add new HBM2 memory type Naveen Krishna Chatradhi
2021-07-19 20:47   ` Yazen Ghannam
2021-07-19 21:16     ` Luck, Tony
2021-07-20 12:13       ` Zhuo, Qiuxu
2021-07-20 16:30         ` [PATCH] EDAC/skx_common: Set the memory type correctly for HBM memory Luck, Tony
2021-07-20 16:33     ` [PATCH 3/7] EDAC/mc: Add new HBM2 memory type Luck, Tony
2021-06-30 15:28 ` [PATCH 4/7] EDAC/mce_amd: extract node id from InstanceHi in IPID Naveen Krishna Chatradhi
2021-07-29 16:32   ` Yazen Ghannam
2021-08-10 12:45     ` Chatradhi, Naveen Krishna
2021-06-30 15:28 ` [PATCH 5/7] EDAC/amd64: Enumerate memory on noncpu nodes Naveen Krishna Chatradhi
2021-07-29 17:55   ` Yazen Ghannam
2021-08-10 12:49     ` Chatradhi, Naveen Krishna
2021-06-30 15:28 ` [PATCH 6/7] EDAC/amd64: Add address translation support for DF3.5 Naveen Krishna Chatradhi
2021-07-29 17:59   ` Yazen Ghannam
2021-07-29 18:13     ` Chatradhi, Naveen Krishna
2021-06-30 15:28 ` [PATCH 7/7] EDAC/amd64: Add fixed UMC to CS mapping Naveen Krishna Chatradhi
2021-08-06  7:43 ` [PATCH v2 0/3] x86/edac/amd64: Add support for noncpu nodes Naveen Krishna Chatradhi
2021-08-06  7:43   ` [PATCH v2 1/3] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-08-20 15:36     ` Yazen Ghannam
2021-08-06  7:43   ` [PATCH v2 2/3] EDAC/mce_amd: Extract node id from InstanceHi in IPID Naveen Krishna Chatradhi
2021-08-20 15:43     ` Yazen Ghannam
2021-08-06  7:43   ` [PATCH v2 3/3] EDAC/amd64: Enumerate memory on noncpu nodes Naveen Krishna Chatradhi
2021-08-20 17:02     ` Yazen Ghannam
2021-08-23 16:56       ` Chatradhi, Naveen Krishna
2021-08-23 18:54     ` [PATCH v3 0/3] x86/edac/amd64: Add support for " Naveen Krishna Chatradhi
2021-08-23 18:54       ` [PATCH v3 1/3] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-08-25 10:42         ` Borislav Petkov
2021-09-01 18:17           ` Yazen Ghannam
2021-09-02 17:30             ` Borislav Petkov
2021-09-13 18:07               ` Yazen Ghannam
2021-09-16 16:36                 ` Borislav Petkov
2021-10-11 14:26           ` Chatradhi, Naveen Krishna
2021-10-11 18:08             ` Borislav Petkov
2021-08-23 18:54       ` [PATCH v3 2/3] EDAC/mce_amd: Extract node id from MCA_IPID Naveen Krishna Chatradhi
2021-08-27 10:24         ` Borislav Petkov
2021-09-01 18:18           ` Yazen Ghannam
2021-08-23 18:54       ` [PATCH v3 3/3] EDAC/amd64: Enumerate memory on noncpu nodes Naveen Krishna Chatradhi
2021-08-27 11:30         ` Borislav Petkov
2021-09-01 18:42           ` Yazen Ghannam
2021-09-08 18:41             ` Borislav Petkov
2021-09-13 18:19               ` Yazen Ghannam
2021-10-14 18:50       ` [PATCH v4 0/4] x86/edac/amd64: Add support for " Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH v4 1/4] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH v4 2/4] EDAC/mce_amd: Extract node id from MCA_IPID Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH 3/4] EDAC/amd64: Extend family ops functions Naveen Krishna Chatradhi
2021-10-14 18:50         ` [PATCH 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes Naveen Krishna Chatradhi
2021-10-14 18:53       ` [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes Naveen Krishna Chatradhi
2021-10-14 18:53         ` [PATCH v4 1/4] x86/amd_nb: Add support for northbridges on Aldebaran Naveen Krishna Chatradhi
2021-10-21 15:52           ` Yazen Ghannam
2021-10-14 18:53         ` [PATCH v4 2/4] EDAC/mce_amd: Extract node id from MCA_IPID Naveen Krishna Chatradhi
2021-10-21 15:55           ` Yazen Ghannam
2021-10-14 18:53         ` [PATCH v4 3/4] EDAC/amd64: Extend family ops functions Naveen Krishna Chatradhi
2021-10-21 16:27           ` Yazen Ghannam
2021-10-14 18:54         ` [PATCH v4 4/4] EDAC/amd64: Enumerate memory on Aldebaran GPU nodes Naveen Krishna Chatradhi
2021-10-21 16:40           ` Yazen Ghannam
2021-10-14 19:53         ` [PATCH v4 0/4] x86/edac/amd64: Add support for noncpu nodes Borislav Petkov
2021-10-15 12:18           ` Chatradhi, Naveen Krishna
2021-10-15 19:25             ` Borislav Petkov
2021-08-20 17:07   ` [PATCH v2 0/3] " Yazen Ghannam

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BL1PR12MB5286AF6D10A89BBA9DB569EBE8F79@BL1PR12MB5286.namprd12.prod.outlook.com \
    --to=naveenkrishna.chatradhi@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=bp@alien8.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=mingo@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.