* [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu
@ 2021-10-14 18:12 Jonathan Kim
  2021-10-14 18:46 ` Felix Kuehling
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Kim @ 2021-10-14 18:12 UTC
  To: amd-gfx; +Cc: Felix.Kuehling, Sean.Keely, Jonathan Kim

ROCr needs to be able to identify all devices that have direct access to
fine grain memory, which should include CPUs that are connected to GPUs
over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
CPU is part of the hive.

v2: fixup to ensure all numa nodes get the hive id mapped

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 98cca5f2b27f..9fda4ee03813 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1296,6 +1296,26 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 
 	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
 
+	adev = (struct amdgpu_device *)(gpu->kgd);
+
+	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
+	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
+		int i;
+
+		for (i = 0; i < proximity_domain; i++) {
+			struct kfd_topology_device *to_dev =
+						kfd_topology_device_by_proximity_domain(i);
+
+			if (!to_dev)
+				continue;
+
+			if (to_dev->gpu)
+				break;
+
+			to_dev->node_props.hive_id = gpu->hive_id;
+		}
+	}
+
 	/* Check to see if this gpu device exists in the topology_device_list.
 	 * If so, assign the gpu to that device,
 	 * else create a Virtual CRAT for this gpu device and then parse that
@@ -1457,7 +1477,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		dev->node_props.max_waves_per_simd = 10;
 	}
 
-	adev = (struct amdgpu_device *)(dev->gpu->kgd);
 	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
 	dev->node_props.capability |=
 		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?
-- 
2.25.1



* Re: [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu
  2021-10-14 18:12 [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu Jonathan Kim
@ 2021-10-14 18:46 ` Felix Kuehling
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2021-10-14 18:46 UTC
  To: Jonathan Kim, amd-gfx; +Cc: Sean.Keely

On 2021-10-14 at 2:12 p.m., Jonathan Kim wrote:
> ROCr needs to be able to identify all devices that have direct access to
> fine grain memory, which should include CPUs that are connected to GPUs
> over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
> CPU is part of the hive.
>
> v2: fixup to ensure all numa nodes get the hive id mapped
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 98cca5f2b27f..9fda4ee03813 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1296,6 +1296,26 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>  
>  	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
>  
> +	adev = (struct amdgpu_device *)(gpu->kgd);
> +
> +	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
> +	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
> +		int i;
> +
> +		for (i = 0; i < proximity_domain; i++) {
> +			struct kfd_topology_device *to_dev =
> +						kfd_topology_device_by_proximity_domain(i);

Sorry, one more nit-pick. This loop is pretty inefficient (O(n^2))
because kfd_topology_device_by_proximity_domain does a linear search
itself. It would be more efficient to just loop over the
topology_device_list directly here (while holding the read lock):

>         down_read(&topology_lock);
>
>         list_for_each_entry(top_dev, &topology_device_list, list) {
>                 ...
Regards,
  Felix
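
For readers following the complexity argument above, here is a minimal
userspace sketch of the two loop shapes. It is not the kernel code:
struct topo_node, node_by_index, assign_by_index and assign_by_walk are
hypothetical names, a plain singly linked list stands in for
topology_device_list, and the topology_lock read lock is omitted. The
visit counters exist only to make the cost difference observable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct kfd_topology_device. */
struct topo_node {
	struct topo_node *next;
	int has_gpu;		/* non-zero for GPU nodes */
	uint64_t hive_id;
};

/* Linear lookup, analogous to searching the list by proximity domain. */
static struct topo_node *node_by_index(struct topo_node *head, int idx,
				       unsigned long *visits)
{
	struct topo_node *n;
	int i = 0;

	for (n = head; n; n = n->next, i++) {
		(*visits)++;
		if (i == idx)
			return n;
	}
	return NULL;
}

/* v2 shape: one list search per index, so O(n^2) node visits overall. */
static unsigned long assign_by_index(struct topo_node *head, int count,
				     uint64_t hive_id)
{
	unsigned long visits = 0;
	int i;

	for (i = 0; i < count; i++) {
		struct topo_node *n = node_by_index(head, i, &visits);

		if (!n)
			continue;
		if (n->has_gpu)
			break;
		n->hive_id = hive_id;
	}
	return visits;
}

/* v3 shape: a single pass over the list, O(n) node visits. */
static unsigned long assign_by_walk(struct topo_node *head, uint64_t hive_id)
{
	unsigned long visits = 0;
	struct topo_node *n;

	for (n = head; n; n = n->next) {
		visits++;
		if (n->has_gpu)
			break;
		n->hive_id = hive_id;
	}
	return visits;
}
```

With four CPU nodes followed by one GPU node, assign_by_index touches
1 + 2 + 3 + 4 + 5 = 15 nodes, while assign_by_walk touches 5. Both
leave the same hive ID on the four CPU nodes and stop at the GPU node.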


> +
> +			if (!to_dev)
> +				continue;
> +
> +			if (to_dev->gpu)
> +				break;
> +
> +			to_dev->node_props.hive_id = gpu->hive_id;
> +		}
> +	}
> +
>  	/* Check to see if this gpu device exists in the topology_device_list.
>  	 * If so, assign the gpu to that device,
>  	 * else create a Virtual CRAT for this gpu device and then parse that
> @@ -1457,7 +1477,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>  		dev->node_props.max_waves_per_simd = 10;
>  	}
>  
> -	adev = (struct amdgpu_device *)(dev->gpu->kgd);
>  	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
>  	dev->node_props.capability |=
>  		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?


* Re: [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu
  2021-10-15 15:11 Jonathan Kim
@ 2021-10-15 21:52 ` Felix Kuehling
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2021-10-15 21:52 UTC
  To: Jonathan Kim, amd-gfx; +Cc: Sean.Keely


On 2021-10-15 11:11 a.m., Jonathan Kim wrote:
> ROCr needs to be able to identify all devices that have direct access to
> fine grain memory, which should include CPUs that are connected to GPUs
> over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
> CPU is part of the hive.
>
> v3: avoid quadratic search by doing linear list read instead of querying
> per proximity id
>
> v2: fixup to ensure all numa nodes get the hive id mapped
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 19 ++++++++++++++++++-
>   1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 98cca5f2b27f..dd593ad0614a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1296,6 +1296,24 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>   
>   	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
>   
> +	adev = (struct amdgpu_device *)(gpu->kgd);
> +
> +	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
> +	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
> +		struct kfd_topology_device *top_dev;
> +
> +		down_read(&topology_lock);
> +
> +		list_for_each_entry(top_dev, &topology_device_list, list) {
> +			if (top_dev->gpu)
> +				break;
> +
> +			top_dev->node_props.hive_id = gpu->hive_id;
> +		}
> +
> +		up_read(&topology_lock);
> +	}
> +
>   	/* Check to see if this gpu device exists in the topology_device_list.
>   	 * If so, assign the gpu to that device,
>   	 * else create a Virtual CRAT for this gpu device and then parse that
> @@ -1457,7 +1475,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>   		dev->node_props.max_waves_per_simd = 10;
>   	}
>   
> -	adev = (struct amdgpu_device *)(dev->gpu->kgd);
>   	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
>   	dev->node_props.capability |=
>   		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?


* [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu
@ 2021-10-15 15:11 Jonathan Kim
  2021-10-15 21:52 ` Felix Kuehling
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Kim @ 2021-10-15 15:11 UTC
  To: amd-gfx; +Cc: Felix.Kuehling, Sean.Keely, Jonathan Kim

ROCr needs to be able to identify all devices that have direct access to
fine grain memory, which should include CPUs that are connected to GPUs
over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
CPU is part of the hive.

v3: avoid quadratic search by doing linear list read instead of querying
per proximity id

v2: fixup to ensure all numa nodes get the hive id mapped

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 98cca5f2b27f..dd593ad0614a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1296,6 +1296,24 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 
 	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
 
+	adev = (struct amdgpu_device *)(gpu->kgd);
+
+	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
+	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
+		struct kfd_topology_device *top_dev;
+
+		down_read(&topology_lock);
+
+		list_for_each_entry(top_dev, &topology_device_list, list) {
+			if (top_dev->gpu)
+				break;
+
+			top_dev->node_props.hive_id = gpu->hive_id;
+		}
+
+		up_read(&topology_lock);
+	}
+
 	/* Check to see if this gpu device exists in the topology_device_list.
 	 * If so, assign the gpu to that device,
 	 * else create a Virtual CRAT for this gpu device and then parse that
@@ -1457,7 +1475,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		dev->node_props.max_waves_per_simd = 10;
 	}
 
-	adev = (struct amdgpu_device *)(dev->gpu->kgd);
 	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
 	dev->node_props.capability |=
 		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?
-- 
2.25.1



* Re: [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu
  2021-10-14 17:44 Jonathan Kim
@ 2021-10-14 17:55 ` Felix Kuehling
  0 siblings, 0 replies; 6+ messages in thread
From: Felix Kuehling @ 2021-10-14 17:55 UTC
  To: Jonathan Kim, amd-gfx; +Cc: Sean.Keely

On 2021-10-14 at 1:44 p.m., Jonathan Kim wrote:
> ROCr needs to be able to identify all devices that have direct access to
> fine grain memory, which should include CPUs that are connected to GPUs
> over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
> CPU is part of the hive.
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 98cca5f2b27f..d04c48dfd72b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -1296,6 +1296,27 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>  
>  	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
>  
> +	adev = (struct amdgpu_device *)(gpu->kgd);
> +
> +	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
> +	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
> +		int i;
> +
> +		for (i = 0; i < proximity_domain; i++) {
> +			struct kfd_topology_device *to_dev =
> +						kfd_topology_device_by_proximity_domain(i);
> +
> +			if (!to_dev)
> +				continue;
> +
> +			if (to_dev->gpu)
> +				break;
> +
> +			to_dev->node_props.hive_id = gpu->hive_id;
> +			break;

On a NUMA system there will be multiple CPU nodes (e.g. in NPS-4 mode).
The "break" statement here means you'll only update the hive ID on the
first NUMA node.

Other than that, this change makes sense.

Regards,
  Felix
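
The effect of that trailing "break" can be reproduced in a small
userspace sketch. This is an illustration under assumptions, not the
kernel code: struct numa_node and tag_cpu_nodes are hypothetical names,
an array stands in for the topology list, and the stop_after_first flag
only exists to switch between the v1 behaviour and the corrected one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a topology entry (CPU or GPU node). */
struct numa_node {
	int has_gpu;		/* non-zero for GPU nodes */
	uint64_t hive_id;
};

/*
 * Tag the leading CPU nodes with hive_id and return how many were tagged.
 * stop_after_first != 0 models the extra "break" in v1 of the patch.
 */
static size_t tag_cpu_nodes(struct numa_node *nodes, size_t count,
			    uint64_t hive_id, int stop_after_first)
{
	size_t tagged = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (nodes[i].has_gpu)
			break;		/* CPU nodes come first; stop at a GPU */

		nodes[i].hive_id = hive_id;
		tagged++;

		if (stop_after_first)
			break;		/* v1 behaviour: only the first node */
	}
	return tagged;
}
```

For four CPU nodes (as in NPS-4 mode) followed by one GPU node, the
function tags one node with stop_after_first set and four nodes without
it, which is the difference the v2 fixup addresses.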


> +		}
> +	}
> +
>  	/* Check to see if this gpu device exists in the topology_device_list.
>  	 * If so, assign the gpu to that device,
>  	 * else create a Virtual CRAT for this gpu device and then parse that
> @@ -1457,7 +1478,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>  		dev->node_props.max_waves_per_simd = 10;
>  	}
>  
> -	adev = (struct amdgpu_device *)(dev->gpu->kgd);
>  	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
>  	dev->node_props.capability |=
>  		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?


* [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu
@ 2021-10-14 17:44 Jonathan Kim
  2021-10-14 17:55 ` Felix Kuehling
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Kim @ 2021-10-14 17:44 UTC
  To: amd-gfx; +Cc: Felix.Kuehling, Sean.Keely, Jonathan Kim

ROCr needs to be able to identify all devices that have direct access to
fine grain memory, which should include CPUs that are connected to GPUs
over xGMI. The GPU hive ID can be mapped onto the CPU hive ID since the
CPU is part of the hive.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 98cca5f2b27f..d04c48dfd72b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1296,6 +1296,27 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 
 	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
 
+	adev = (struct amdgpu_device *)(gpu->kgd);
+
+	/* Include the CPU in xGMI hive if xGMI connected by assigning it the hive ID. */
+	if (gpu->hive_id && adev->gmc.xgmi.connected_to_cpu) {
+		int i;
+
+		for (i = 0; i < proximity_domain; i++) {
+			struct kfd_topology_device *to_dev =
+						kfd_topology_device_by_proximity_domain(i);
+
+			if (!to_dev)
+				continue;
+
+			if (to_dev->gpu)
+				break;
+
+			to_dev->node_props.hive_id = gpu->hive_id;
+			break;
+		}
+	}
+
 	/* Check to see if this gpu device exists in the topology_device_list.
 	 * If so, assign the gpu to that device,
 	 * else create a Virtual CRAT for this gpu device and then parse that
@@ -1457,7 +1478,6 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		dev->node_props.max_waves_per_simd = 10;
 	}
 
-	adev = (struct amdgpu_device *)(dev->gpu->kgd);
 	/* kfd only concerns sram ecc on GFX and HBM ecc on UMC */
 	dev->node_props.capability |=
 		((adev->ras_enabled & BIT(AMDGPU_RAS_BLOCK__GFX)) != 0) ?
-- 
2.25.1



end of thread, other threads:[~2021-10-15 21:52 UTC | newest]

Thread overview: 6+ messages
2021-10-14 18:12 [PATCH] drm/amdkfd: map gpu hive id to xgmi connected cpu Jonathan Kim
2021-10-14 18:46 ` Felix Kuehling
  -- strict thread matches above, loose matches on Subject: below --
2021-10-15 15:11 Jonathan Kim
2021-10-15 21:52 ` Felix Kuehling
2021-10-14 17:44 Jonathan Kim
2021-10-14 17:55 ` Felix Kuehling
