From: Borislav Petkov <bp@alien8.de>
To: Naveen Krishna Chatradhi <nchatrad@amd.com>
Cc: linux-edac@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	mchehab@kernel.org, yazen.ghannam@amd.com,
	Muralidhara M K <muralimk@amd.com>
Subject: Re: [PATCH v6 1/5] x86/amd_nb: Add support for northbridges on Aldebaran
Date: Mon, 1 Nov 2021 18:28:18 +0100
Message-ID: <YYAjssgwjBw/vkf0@zn.tnic>
In-Reply-To: <20211028130106.15701-2-nchatrad@amd.com>

On Thu, Oct 28, 2021 at 06:31:02PM +0530, Naveen Krishna Chatradhi wrote:
> +/* GPU Data Fabric ID Device 24 Function 1 */
> +#define PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1 0x14d1
> +
> +/* DF18xF1 registers on Aldebaran GPU */
> +#define REG_LOCAL_NODE_TYPE_MAP		0x144
> +#define REG_RMT_NODE_TYPE_MAP		0x148

Move those defines up, along with the rest of them. While at it, you can
align them all vertically.

> +
> +/*
> + * Newer AMD CPUs and GPUs whose data fabrics can be connected via custom xGMI

"Newer" is a commit message type of adjective and doesn't belong in
permanent comments because when years pass, they won't be "newer"
anymore. IOW, you can simply drop it here.

> + * links, comes with registers to gather local and remote node type map info.

"come"

> + *
> + * "Local Node Type" refers to nodes with the same type as that from which the
> + * register is read, and "Remote Node Type" refers to nodes with a different type.

This sure sounds weird.

With my simplistic thinking I'd assume "local" is the CPU and "remote"
is the GPU...

> + * This function, reads the registers from GPU DF function 1.
> + * Hence, local nodes are GPU and remote nodes are CPUs.
> + */
> +static int amd_get_node_map(void)
> +{
> +	struct amd_node_map *nodemap;
> +	struct pci_dev *pdev;
> +	u32 tmp;
> +
> +	pdev = pci_get_device(PCI_VENDOR_ID_AMD,
> +			      PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F1, NULL);
> +	if (!pdev) {
> +		pr_debug("DF Func1 PCI device not found on this node.\n");
> +		return -ENODEV;
> +	}
> +
> +	nodemap = kmalloc(sizeof(*nodemap), GFP_KERNEL);

You allocate a whopping 4 bytes? Just do

	struct amd_node_map nodemap;

in the nb info descriptor.
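
To illustrate, here is a minimal userspace sketch. It assumes struct
amd_node_map holds just the two 16-bit fields assigned below (its
definition is not quoted here), and amd_nb_info_sketch is a hypothetical
stand-in for the nb info descriptor. Embedding the struct by value
removes both the allocation and its -ENOMEM path:

```c
#include <stdint.h>

/* Assumed layout: two 16-bit fields, 4 bytes in total. */
struct amd_node_map {
	uint16_t gpu_node_start_id;
	uint16_t cpu_node_count;
};

/* Hypothetical stand-in for the nb info descriptor. */
struct amd_nb_info_sketch {
	uint16_t num;
	struct amd_node_map nodemap;	/* embedded by value, no kmalloc() */
};

static struct amd_nb_info_sketch nb_info_sketch;

/* Fill the embedded map in place; there is nothing to allocate or free. */
static void set_node_map(uint16_t gpu_start, uint16_t cpu_count)
{
	nb_info_sketch.nodemap.gpu_node_start_id = gpu_start;
	nb_info_sketch.nodemap.cpu_node_count = cpu_count;
}
```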

I still need to see whether those node maps and functions you're adding
and exporting even make sense but that will happen later.

> +	if (!nodemap)
> +		return -ENOMEM;
> +
> +	pci_read_config_dword(pdev, REG_LOCAL_NODE_TYPE_MAP, &tmp);

Check retval.

> +	nodemap->gpu_node_start_id = tmp & 0xFFF;
> +
> +	pci_read_config_dword(pdev, REG_RMT_NODE_TYPE_MAP, &tmp);

Ditto.
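
Roughly like the userspace sketch below. read_cfg_dword_stub() is a
hypothetical stand-in for pci_read_config_dword() and the register
values it returns are made up. In the kernel the accessor returns a
PCIBIOS_* code, which pcibios_err_to_errno() can turn into an errno
before it is propagated:

```c
#include <errno.h>
#include <stdint.h>

#define REG_LOCAL_NODE_TYPE_MAP	0x144
#define REG_RMT_NODE_TYPE_MAP	0x148

/* Test knob: register offset whose read should fail, -1 for none. */
static int fail_reg = -1;

/* Hypothetical stand-in for pci_read_config_dword(): 0 on success. */
static int read_cfg_dword_stub(int reg, uint32_t *val)
{
	if (reg == fail_reg)
		return -EIO;

	/* Made-up register contents, for illustration only. */
	*val = (reg == REG_LOCAL_NODE_TYPE_MAP) ? 0x00000004 : 0x00020000;
	return 0;
}

/* Bail out on the first failed read instead of using a stale tmp. */
static int get_node_map_sketch(uint16_t *gpu_start, uint16_t *cpu_count)
{
	uint32_t tmp;
	int ret;

	ret = read_cfg_dword_stub(REG_LOCAL_NODE_TYPE_MAP, &tmp);
	if (ret)
		return ret;
	*gpu_start = tmp & 0xFFF;

	ret = read_cfg_dword_stub(REG_RMT_NODE_TYPE_MAP, &tmp);
	if (ret)
		return ret;
	*cpu_count = (tmp >> 16) & 0xFFF;

	return 0;
}
```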

> +	nodemap->cpu_node_count = tmp >> 16 & 0xFFF;
> +
> +	amd_northbridges.nodemap = nodemap;
> +	return 0;
> +}
> +
>  static struct pci_dev *next_northbridge(struct pci_dev *dev,
>  					const struct pci_device_id *ids)
>  {
> @@ -230,6 +297,27 @@ int amd_df_indirect_read(u16 node, u8 func, u16 reg, u8 instance_id, u32 *lo)
>  }
>  EXPORT_SYMBOL_GPL(amd_df_indirect_read);
>  
> +struct pci_dev *get_root_devs(struct pci_dev *root,

static

> +			      const struct pci_device_id *root_ids,
> +			      u16 roots_per_misc)
> +{
> +	u16 j;
> +
> +	/*
> +	 * If there are more PCI root devices than data fabric/
> +	 * system management network interfaces, then the (N)
> +	 * PCI roots per DF/SMN interface are functionally the
> +	 * same (for DF/SMN access) and N-1 are redundant.  N-1
> +	 * PCI roots should be skipped per DF/SMN interface so
> +	 * the following DF/SMN interfaces get mapped to
> +	 * correct PCI roots.
> +	 */
> +	for (j = 0; j < roots_per_misc; j++)
> +		root = next_northbridge(root, root_ids);
> +
> +	return root;
> +}
> +
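
For reference, one way to read the comment above, as a userspace sketch.
The caller of get_root_devs() is not quoted here, so the mapping below
is an assumption: roots are modelled as plain indices, each DF/SMN
interface takes the first root of its group and the remaining N-1 are
stepped over.

```c
/*
 * Assumed mapping: with N = roots_per_misc, misc i uses root i * N and
 * the next N-1 roots are treated as redundant and skipped.
 */
static void map_miscs_to_roots(unsigned int roots_per_misc,
			       unsigned int misc_count,
			       unsigned int *root_of_misc)
{
	unsigned int root = 0, i, j;

	for (i = 0; i < misc_count; i++) {
		root_of_misc[i] = root;

		/* Step past this root and its N-1 redundant siblings. */
		for (j = 0; j < roots_per_misc; j++)
			root++;
	}
}
```

With four roots and two interfaces (N = 2), the interfaces end up on
roots 0 and 2.
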
>  int amd_cache_northbridges(void)
>  {
>  	const struct pci_device_id *misc_ids = amd_nb_misc_ids;
> @@ -237,10 +325,10 @@ int amd_cache_northbridges(void)
>  	const struct pci_device_id *root_ids = amd_root_ids;
>  	struct pci_dev *root, *misc, *link;
>  	struct amd_northbridge *nb;
> -	u16 roots_per_misc = 0;
> -	u16 misc_count = 0;
> -	u16 root_count = 0;
> -	u16 i, j;
> +	u16 roots_per_misc = 0, gpu_roots_per_misc = 0;
> +	u16 misc_count = 0, gpu_misc_count = 0;
> +	u16 root_count = 0, gpu_root_count = 0;
> +	u16 i;
>  
>  	if (amd_northbridges.num)
>  		return 0;
> @@ -252,15 +340,23 @@ int amd_cache_northbridges(void)
>  	}
>  
>  	misc = NULL;
> -	while ((misc = next_northbridge(misc, misc_ids)) != NULL)
> -		misc_count++;
> +	while ((misc = next_northbridge(misc, misc_ids)) != NULL) {

Just remove that redundant "!= NULL" at the end, while at it.

> +		if (misc->device == PCI_DEVICE_ID_AMD_ALDEBARAN_DF_F3)
> +			gpu_misc_count++;
> +		else
> +			misc_count++;
> +	}
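
IOW, something like this userspace sketch. next_dev() is a hypothetical
stand-in for next_northbridge() walking a plain array, and
GPU_MISC_ID_SKETCH is a placeholder, not a real device ID. The
assignment's value is the loop condition, so no explicit comparison is
needed:

```c
#include <stddef.h>
#include <stdint.h>

#define GPU_MISC_ID_SKETCH	0xffff	/* placeholder, not a real device ID */

struct dev_sketch {
	uint16_t device;
};

/* Hypothetical stand-in for next_northbridge(): next entry, NULL at end. */
static const struct dev_sketch *next_dev(const struct dev_sketch *tbl,
					 size_t n,
					 const struct dev_sketch *cur)
{
	size_t idx = cur ? (size_t)(cur - tbl) + 1 : 0;

	return idx < n ? &tbl[idx] : NULL;
}

/* Count GPU and non-GPU misc devices in one pass. */
static void count_miscs(const struct dev_sketch *tbl, size_t n,
			uint16_t *cpu, uint16_t *gpu)
{
	const struct dev_sketch *misc = NULL;

	*cpu = 0;
	*gpu = 0;

	while ((misc = next_dev(tbl, n, misc))) {
		if (misc->device == GPU_MISC_ID_SKETCH)
			(*gpu)++;
		else
			(*cpu)++;
	}
}
```
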
>  
>  	if (!misc_count)
>  		return -ENODEV;
>  
>  	root = NULL;
> -	while ((root = next_northbridge(root, root_ids)) != NULL)
> -		root_count++;
> +	while ((root = next_northbridge(root, root_ids)) != NULL) {
> +		if (root->device == PCI_DEVICE_ID_AMD_ALDEBARAN_ROOT)
> +			gpu_root_count++;
> +		else
> +			root_count++;
> +	}
>  
>  	if (root_count) {
>  		roots_per_misc = root_count / misc_count;
> @@ -275,33 +371,37 @@ int amd_cache_northbridges(void)
>  		}
>  	}
>  
> -	nb = kcalloc(misc_count, sizeof(struct amd_northbridge), GFP_KERNEL);
> +	/*
> +	 * The number of miscs, roots and roots_per_misc might vary on different
> +	 * nodes of a heterogeneous system.
> +	 * Calculate roots_per_misc accordingly in order to skip the redundant
> +	 * roots and map the DF/SMN interfaces to correct PCI roots.
> +	 */

Reflow that comment so that it is a block.

> +	if (gpu_root_count && gpu_misc_count) {
> +		int ret = amd_get_node_map();
> +

^ Superfluous newline.

> +		if (ret)
> +			return ret;
> +
> +		gpu_roots_per_misc = gpu_root_count / gpu_misc_count;
> +	}
> +

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette