* [PATCH 00/37] KFD dGPU topology and initialization
@ 2017-12-09  4:08 Felix Kuehling
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2017-12-09  4:09 ` [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root Felix Kuehling
  0 siblings, 2 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling

This patch series adds support for dGPU topology to KFD and implements
everything needed to initialize KFD on dGPUs.

This series is still missing the dGPU memory management APIs, so it won't
be able to run any user mode tests yet. But device information about CPUs
and supported dGPUs should be reported correctly in
/sys/class/kfd/kfd/topology/nodes/*.
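
For example, with one CPU and one supported dGPU the topology should show
up as two nodes. Per the sysfs layout created in kfd_topology.c, each node
directory contains name, gpu_id and properties files plus mem_banks/,
caches/ and io_links/ subdirectories (the node numbers below are just an
example):

  /sys/class/kfd/kfd/topology/nodes/0   (CPU node)
  /sys/class/kfd/kfd/topology/nodes/1   (dGPU node)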

Patches 1-10 are small fixes and additions to the topology code.
Patches 11-19 reorganize the topology code to prepare for dGPU support.
Patches 20 and 21 add topology support for CPUs and dGPUs respectively.
Patches 22-28 add more topology features and fixes/workarounds.
Patch 29 adds a helper to the PCI core for enabling PCIe atomics.
Patches 30-36 enable KFD initialization on dGPUs.
Patch 37 enables KFD initialization on supported dGPUs in AMDGPU.

This is my last patch series this year. I worked hard to finish this
before my year-end vacation.

Amber Lin (1):
  drm/amdkfd: Add perf counters to topology

Ben Goz (1):
  drm/amdkfd: Add AQL Queue Memory flag on topology

Felix Kuehling (13):
  drm/amdkfd: Group up CRAT related functions
  drm/amdkfd: Turn verbose topology messages into pr_debug
  drm/amdkfd: Simplify counting of memory banks
  drm/amdkfd: Add topology support for CPUs
  drm/amdkfd: Module option to disable CRAT table
  drm/amdkfd: Conditionally enable PCIe atomics
  drm/amdkfd: Make IOMMUv2 code conditional
  drm/amdkfd: Make sched_policy a per-device setting
  drm/amdkfd: Add dGPU support to the device queue manager
  drm/amdkfd: Add dGPU support to the MQD manager
  drm/amdkfd: Add dGPU support to kernel_queue_init
  drm/amdkfd: Add dGPU device IDs and device info
  drm/amdgpu: Enable KFD initialization on dGPUs

Flora Cui (3):
  drm/amd: add new interface to query cu info
  drm/amdgpu: add amdgpu interface to query cu info
  drm/amdkfd: Update number of compute unit from KGD

Harish Kasiviswanathan (13):
  drm/amd: Add get_local_mem_info to KGD-KFD interface
  drm/amdgpu: Implement get_local_mem_info
  drm/amdkfd: Stop using get_vmem_size KGD-KFD interface
  drm/amdkfd: Remove deprecated get_vmem_size
  drm/amd: Remove get_vmem_size from KGD-KFD interface
  drm/amdkfd: Topology: Fix location_id
  drm/amdkfd: Reorganize CRAT fetching from ACPI
  drm/amdkfd: Decouple CRAT parsing from device list update
  drm/amdkfd: Support enumerating non-GPU devices
  drm/amdkfd: sync IOLINK defines to thunk spec
  drm/amdkfd: Fix sibling_map[] size
  drm/amdkfd: Add topology support for dGPUs
  drm/amdkfd: Ignore ACPI CRAT for non-APU systems

Jay Cornwall (1):
  PCI: Add pci_enable_atomic_ops_to_root

Kent Russell (3):
  drm/amdkfd: Coding style cleanup
  drm/amdgpu: Add support for reporting VRAM usage
  drm/amdkfd: Add support for displaying VRAM usage

Philip Cox (1):
  drm/amdkfd: Fixup incorrect info in the CZ CRAT table

Yong Zhao (1):
  drm/amdkfd: Fix memory leaks in kfd topology

 drivers/gpu/drm/amd/amdgpu/amdgpu.h                |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   65 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |    4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |    4 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c              |    7 +
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c              |    5 +
 drivers/gpu/drm/amd/amdkfd/Kconfig                 |    2 +-
 drivers/gpu/drm/amd/amdkfd/Makefile                |    2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |    3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c              | 1271 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h              |   42 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c            |    3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  230 +++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |   33 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    5 +
 .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |   56 +
 .../drm/amd/amdkfd/kfd_device_queue_manager_vi.c   |   93 ++
 drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |    7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    7 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   35 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   21 +
 drivers/gpu/drm/amd/amdkfd/kfd_pasid.c             |    2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |   21 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   17 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |    3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c          | 1054 +++++++++-------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |   39 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   35 +-
 drivers/pci/pci.c                                  |   81 ++
 include/linux/pci.h                                |    1 +
 include/uapi/linux/pci_regs.h                      |    2 +
 35 files changed, 2656 insertions(+), 512 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_crat.c

-- 
2.7.4


* [PATCH 01/37] drm/amd: add new interface to query cu info
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 02/37] drm/amdgpu: add amdgpu " Felix Kuehling
                     ` (27 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Flora Cui, Harish Kasiviswanathan

From: Flora Cui <flora.cui@amd.com>
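
This adds a kfd_cu_info structure to the KGD-KFD interface and a
get_cu_info callback through which amdkfd can query compute unit (CU)
counts and SIMD parameters from the base driver. As a rough usage sketch
(not part of this patch; kfd2kgd and kgd are the existing struct kfd_dev
fields), a KFD-side caller would do:

	struct kfd_cu_info cu_info;

	kfd->kfd2kgd->get_cu_info(kfd->kgd, &cu_info);
	pr_debug("active CUs %u, SIMDs per CU %u\n",
		 cu_info.cu_active_number, cu_info.simd_per_cu);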

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index fe3079a..3a93ffe 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -46,6 +46,20 @@ enum kfd_preempt_type {
 	KFD_PREEMPT_TYPE_WAVEFRONT_RESET,
 };
 
+struct kfd_cu_info {
+	uint32_t num_shader_engines;
+	uint32_t num_shader_arrays_per_engine;
+	uint32_t num_cu_per_sh;
+	uint32_t cu_active_number;
+	uint32_t cu_ao_mask;
+	uint32_t simd_per_cu;
+	uint32_t max_waves_per_simd;
+	uint32_t wave_front_size;
+	uint32_t max_scratch_slots_per_cu;
+	uint32_t lds_size;
+	uint32_t cu_bitmap[4][4];
+};
+
 enum kgd_memory_pool {
 	KGD_POOL_SYSTEM_CACHEABLE = 1,
 	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
@@ -153,6 +167,8 @@ struct tile_config {
  *
  * @get_tile_config: Returns GPU-specific tiling mode information
  *
+ * @get_cu_info: Retrieves activated cu info
+ *
  * This structure contains function pointers to services that the kgd driver
  * provides to amdkfd driver.
  *
@@ -239,6 +255,9 @@ struct kfd2kgd_calls {
 	void (*set_scratch_backing_va)(struct kgd_dev *kgd,
 				uint64_t va, uint32_t vmid);
 	int (*get_tile_config)(struct kgd_dev *kgd, struct tile_config *config);
+
+	void (*get_cu_info)(struct kgd_dev *kgd,
+			struct kfd_cu_info *cu_info);
 };
 
 /**
-- 
2.7.4


* [PATCH 02/37] drm/amdgpu: add amdgpu interface to query cu info
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2017-12-09  4:08   ` [PATCH 01/37] drm/amd: add new interface to query cu info Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 03/37] drm/amd: Add get_local_mem_info to KGD-KFD interface Felix Kuehling
                     ` (26 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Flora Cui, Harish Kasiviswanathan

From: Flora Cui <flora.cui@amd.com>

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h               |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 23 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c             |  7 +++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c             |  5 +++++
 7 files changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cbcb6a1..a6552c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -954,6 +954,7 @@ struct amdgpu_gfx_config {
 };
 
 struct amdgpu_cu_info {
+	uint32_t simd_per_cu;
 	uint32_t max_waves_per_simd;
 	uint32_t wave_front_size;
 	uint32_t max_scratch_slots_per_cu;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index f7fa767..cfb7827 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -271,3 +271,26 @@ uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd)
 
 	return amdgpu_dpm_get_sclk(adev, false) / 100;
 }
+
+void get_cu_info(struct kgd_dev *kgd, struct kfd_cu_info *cu_info)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
+	struct amdgpu_cu_info acu_info = adev->gfx.cu_info;
+
+	memset(cu_info, 0, sizeof(*cu_info));
+	if (sizeof(cu_info->cu_bitmap) != sizeof(acu_info.bitmap))
+		return;
+
+	cu_info->cu_active_number = acu_info.number;
+	cu_info->cu_ao_mask = acu_info.ao_cu_mask;
+	memcpy(&cu_info->cu_bitmap[0], &acu_info.bitmap[0],
+	       sizeof(acu_info.bitmap));
+	cu_info->num_shader_engines = adev->gfx.config.max_shader_engines;
+	cu_info->num_shader_arrays_per_engine = adev->gfx.config.max_sh_per_se;
+	cu_info->num_cu_per_sh = adev->gfx.config.max_cu_per_sh;
+	cu_info->simd_per_cu = acu_info.simd_per_cu;
+	cu_info->max_waves_per_simd = acu_info.max_waves_per_simd;
+	cu_info->wave_front_size = acu_info.wave_front_size;
+	cu_info->max_scratch_slots_per_cu = acu_info.max_scratch_slots_per_cu;
+	cu_info->lds_size = acu_info.lds_size;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 8d689ab..a8fa225 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -60,6 +60,7 @@ uint64_t get_vmem_size(struct kgd_dev *kgd);
 uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
 
 uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
+void get_cu_info(struct kgd_dev *kgd, struct kfd_cu_info *cu_info);
 
 #define read_user_wptr(mmptr, wptr, dst)				\
 	({								\
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 12feba8..c9b98d0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -200,6 +200,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
+	.get_cu_info = get_cu_info
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index b380495..c538e30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -161,6 +161,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
+	.get_cu_info = get_cu_info
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 5c8a7a4..43f9e10 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -48,6 +48,8 @@
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 
+#define NUM_SIMD_PER_CU 0x4 /* missing from the gfx_7 IP headers */
+
 #define GFX7_NUM_GFX_RINGS     1
 #define GFX7_MEC_HPD_SIZE      2048
 
@@ -5282,6 +5284,11 @@ static void gfx_v7_0_get_cu_info(struct amdgpu_device *adev)
 
 	cu_info->number = active_cu_number;
 	cu_info->ao_cu_mask = ao_cu_mask;
+	cu_info->simd_per_cu = NUM_SIMD_PER_CU;
+	cu_info->max_waves_per_simd = 10;
+	cu_info->max_scratch_slots_per_cu = 32;
+	cu_info->wave_front_size = 64;
+	cu_info->lds_size = 64;
 }
 
 const struct amdgpu_ip_block_version gfx_v7_0_ip_block =
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 9ecdf62..0270028 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -7132,6 +7132,11 @@ static void gfx_v8_0_get_cu_info(struct amdgpu_device *adev)
 
 	cu_info->number = active_cu_number;
 	cu_info->ao_cu_mask = ao_cu_mask;
+	cu_info->simd_per_cu = NUM_SIMD_PER_CU;
+	cu_info->max_waves_per_simd = 10;
+	cu_info->max_scratch_slots_per_cu = 32;
+	cu_info->wave_front_size = 64;
+	cu_info->lds_size = 64;
 }
 
 const struct amdgpu_ip_block_version gfx_v8_0_ip_block =
-- 
2.7.4


* [PATCH 03/37] drm/amd: Add get_local_mem_info to KGD-KFD interface
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2017-12-09  4:08   ` [PATCH 01/37] drm/amd: add new interface to query cu info Felix Kuehling
  2017-12-09  4:08   ` [PATCH 02/37] drm/amdgpu: add amdgpu " Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 04/37] drm/amdgpu: Implement get_local_mem_info Felix Kuehling
                     ` (25 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Ben Goz, Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Add get_local_mem_info, which provides more information about local
memory than get_vmem_size:
* public and private framebuffer sizes
* VRAM width
* memory clock
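
As a usage sketch (not part of this patch; this is how a later patch in
the series computes the total local memory size):

	struct kfd_local_mem_info mem_info;
	uint64_t local_mem_size;

	gpu->kfd2kgd->get_local_mem_info(gpu->kgd, &mem_info);
	local_mem_size = mem_info.local_mem_size_private +
			mem_info.local_mem_size_public;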

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 3a93ffe..c58389c 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -60,6 +60,14 @@ struct kfd_cu_info {
 	uint32_t cu_bitmap[4][4];
 };
 
+/* For getting GPU local memory information from KGD */
+struct kfd_local_mem_info {
+	uint64_t local_mem_size_private;
+	uint64_t local_mem_size_public;
+	uint32_t vram_width;
+	uint32_t mem_clk_max;
+};
+
 enum kgd_memory_pool {
 	KGD_POOL_SYSTEM_CACHEABLE = 1,
 	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
@@ -122,6 +130,8 @@ struct tile_config {
  *
  * @get_vmem_size: Retrieves (physical) size of VRAM
  *
+ * @get_local_mem_info: Retrieves information about GPU local memory
+ *
  * @get_gpu_clock_counter: Retrieves GPU clock counter
  *
  * @get_max_engine_clock_in_mhz: Retrieves maximum GPU clock in MHz
@@ -181,6 +191,8 @@ struct kfd2kgd_calls {
 	void (*free_gtt_mem)(struct kgd_dev *kgd, void *mem_obj);
 
 	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
+	void (*get_local_mem_info)(struct kgd_dev *kgd,
+			struct kfd_local_mem_info *mem_info);
 	uint64_t (*get_gpu_clock_counter)(struct kgd_dev *kgd);
 
 	uint32_t (*get_max_engine_clock_in_mhz)(struct kgd_dev *kgd);
-- 
2.7.4


* [PATCH 04/37] drm/amdgpu: Implement get_local_mem_info
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 03/37] drm/amd: Add get_local_mem_info to KGD-KFD interface Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
       [not found]     ` <1512792555-26042-5-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2017-12-09  4:08   ` [PATCH 05/37] drm/amdkfd: Stop using get_vmem_size KGD-KFD interface Felix Kuehling
                     ` (24 subsequent siblings)
  28 siblings, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Ben Goz, Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Implement the new KGD-KFD interface function get_local_mem_info. If the
VRAM aperture lies within the device's DMA mask, the CPU-visible part of
VRAM is reported as public memory and the rest as private; otherwise all
of VRAM is reported as private.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 30 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
 4 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index cfb7827..56f6c12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -252,6 +252,36 @@ uint64_t get_vmem_size(struct kgd_dev *kgd)
 	return adev->mc.real_vram_size;
 }
 
+void get_local_mem_info(struct kgd_dev *kgd,
+			struct kfd_local_mem_info *mem_info)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
+	uint64_t address_mask = adev->dev->dma_mask ? ~*adev->dev->dma_mask :
+					     ~((1ULL << 32) - 1);
+	resource_size_t aper_limit = adev->mc.aper_base + adev->mc.aper_size;
+
+	memset(mem_info, 0, sizeof(*mem_info));
+	if (!(adev->mc.aper_base & address_mask || aper_limit & address_mask)) {
+		mem_info->local_mem_size_public = adev->mc.visible_vram_size;
+		mem_info->local_mem_size_private = adev->mc.real_vram_size -
+				adev->mc.visible_vram_size;
+	} else {
+		mem_info->local_mem_size_public = 0;
+		mem_info->local_mem_size_private = adev->mc.real_vram_size;
+	}
+	mem_info->vram_width = adev->mc.vram_width;
+
+	pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
+			adev->mc.aper_base, aper_limit,
+			mem_info->local_mem_size_public,
+			mem_info->local_mem_size_private);
+
+	if (amdgpu_sriov_vf(adev))
+		mem_info->mem_clk_max = adev->clock.default_mclk / 100;
+	else
+		mem_info->mem_clk_max = amdgpu_dpm_get_mclk(adev, false) / 100;
+}
+
 uint64_t get_gpu_clock_counter(struct kgd_dev *kgd)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index a8fa225..bc5385a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -57,6 +57,8 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **cpu_ptr);
 void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj);
 uint64_t get_vmem_size(struct kgd_dev *kgd);
+void get_local_mem_info(struct kgd_dev *kgd,
+			struct kfd_local_mem_info *mem_info);
 uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
 
 uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index c9b98d0..b705608 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -174,6 +174,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.init_gtt_mem_allocation = alloc_gtt_mem,
 	.free_gtt_mem = free_gtt_mem,
 	.get_vmem_size = get_vmem_size,
+	.get_local_mem_info = get_local_mem_info,
 	.get_gpu_clock_counter = get_gpu_clock_counter,
 	.get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
 	.alloc_pasid = amdgpu_vm_alloc_pasid,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index c538e30..b0e581a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -133,6 +133,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.init_gtt_mem_allocation = alloc_gtt_mem,
 	.free_gtt_mem = free_gtt_mem,
 	.get_vmem_size = get_vmem_size,
+	.get_local_mem_info = get_local_mem_info,
 	.get_gpu_clock_counter = get_gpu_clock_counter,
 	.get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
 	.alloc_pasid = amdgpu_vm_alloc_pasid,
-- 
2.7.4


* [PATCH 05/37] drm/amdkfd: Stop using get_vmem_size KGD-KFD interface
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 04/37] drm/amdgpu: Implement get_local_mem_info Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 06/37] drm/amdkfd: Remove deprecated get_vmem_size Felix Kuehling
                     ` (23 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Ben Goz, Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

get_vmem_size() is deprecated. Instead use get_local_mem_info().

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 9d03a56..cb0303a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1073,11 +1073,15 @@ static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
 	uint32_t buf[7];
 	uint64_t local_mem_size;
 	int i;
+	struct kfd_local_mem_info local_mem_info;
 
 	if (!gpu)
 		return 0;
 
-	local_mem_size = gpu->kfd2kgd->get_vmem_size(gpu->kgd);
+	gpu->kfd2kgd->get_local_mem_info(gpu->kgd, &local_mem_info);
+
+	local_mem_size = local_mem_info.local_mem_size_private +
+			local_mem_info.local_mem_size_public;
 
 	buf[0] = gpu->pdev->devfn;
 	buf[1] = gpu->pdev->subsystem_vendor;
-- 
2.7.4


* [PATCH 06/37] drm/amdkfd: Remove deprecated get_vmem_size
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 05/37] drm/amdkfd: Stop using get_vmem_size KGD-KFD interface Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 07/37] drm/amd: Remove get_vmem_size from KGD-KFD interface Felix Kuehling
                     ` (22 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 10 ----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 -
 4 files changed, 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 56f6c12..972ecf0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -242,16 +242,6 @@ void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj)
 	kfree(mem);
 }
 
-uint64_t get_vmem_size(struct kgd_dev *kgd)
-{
-	struct amdgpu_device *adev =
-		(struct amdgpu_device *)kgd;
-
-	BUG_ON(kgd == NULL);
-
-	return adev->mc.real_vram_size;
-}
-
 void get_local_mem_info(struct kgd_dev *kgd,
 			struct kfd_local_mem_info *mem_info)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index bc5385a..eed7dea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -56,7 +56,6 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **mem_obj, uint64_t *gpu_addr,
 			void **cpu_ptr);
 void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj);
-uint64_t get_vmem_size(struct kgd_dev *kgd);
 void get_local_mem_info(struct kgd_dev *kgd,
 			struct kfd_local_mem_info *mem_info);
 uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index b705608..c9e2fbe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -173,7 +173,6 @@ static int get_tile_config(struct kgd_dev *kgd,
 static const struct kfd2kgd_calls kfd2kgd = {
 	.init_gtt_mem_allocation = alloc_gtt_mem,
 	.free_gtt_mem = free_gtt_mem,
-	.get_vmem_size = get_vmem_size,
 	.get_local_mem_info = get_local_mem_info,
 	.get_gpu_clock_counter = get_gpu_clock_counter,
 	.get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index b0e581a..72ff646 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -132,7 +132,6 @@ static int get_tile_config(struct kgd_dev *kgd,
 static const struct kfd2kgd_calls kfd2kgd = {
 	.init_gtt_mem_allocation = alloc_gtt_mem,
 	.free_gtt_mem = free_gtt_mem,
-	.get_vmem_size = get_vmem_size,
 	.get_local_mem_info = get_local_mem_info,
 	.get_gpu_clock_counter = get_gpu_clock_counter,
 	.get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
-- 
2.7.4


* [PATCH 07/37] drm/amd: Remove get_vmem_size from KGD-KFD interface
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 06/37] drm/amdkfd: Remove deprecated get_vmem_size Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 08/37] drm/amdkfd: Update number of compute unit from KGD Felix Kuehling
                     ` (21 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index c58389c..0899cee 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -128,8 +128,6 @@ struct tile_config {
  *
  * @free_gtt_mem: Frees a buffer that was allocated on the gart aperture
  *
- * @get_vmem_size: Retrieves (physical) size of VRAM
- *
  * @get_local_mem_info: Retrieves information about GPU local memory
  *
  * @get_gpu_clock_counter: Retrieves GPU clock counter
@@ -190,7 +188,6 @@ struct kfd2kgd_calls {
 
 	void (*free_gtt_mem)(struct kgd_dev *kgd, void *mem_obj);
 
-	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
 	void (*get_local_mem_info)(struct kgd_dev *kgd,
 			struct kfd_local_mem_info *mem_info);
 	uint64_t (*get_gpu_clock_counter)(struct kgd_dev *kgd);
-- 
2.7.4


* [PATCH 08/37] drm/amdkfd: Update number of compute unit from KGD
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 07/37] drm/amd: Remove get_vmem_size from KGD-KFD interface Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 09/37] drm/amdkfd: Topology: Fix location_id Felix Kuehling
                     ` (20 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Flora Cui

From: Flora Cui <flora.cui@amd.com>

Overwrite the active simd_count with the value queried from KGD at driver
loading time. This is based on the assumption that the
GC_USER_SHADER_ARRAY_CONFIG register won't change afterwards.

V2: remove the incorrect simd_count reported at module load time.

Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Yair Shachar <yair.shachar@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index cb0303a..ca2e51a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -133,8 +133,7 @@ static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
 	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
 	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
 		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
-	pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
-				cu->processor_id_low);
+	pr_info("CU GPU: id_base=%d\n", cu->processor_id_low);
 }
 
 /* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
@@ -1124,6 +1123,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 {
 	uint32_t gpu_id;
 	struct kfd_topology_device *dev;
+	struct kfd_cu_info cu_info;
 	int res;
 
 	gpu_id = kfd_generate_gpu_id(gpu);
@@ -1161,6 +1161,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 
 	dev->gpu_id = gpu_id;
 	gpu->id = gpu_id;
+	dev->gpu->kfd2kgd->get_cu_info(dev->gpu->kgd, &cu_info);
+	dev->node_props.simd_count = dev->node_props.simd_per_cu *
+			cu_info.cu_active_number;
 	dev->node_props.vendor_id = gpu->pdev->vendor;
 	dev->node_props.device_id = gpu->pdev->device;
 	dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
-- 
2.7.4


* [PATCH 09/37] drm/amdkfd: Topology: Fix location_id
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 08/37] drm/amdkfd: Update number of compute unit from KGD Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 10/37] drm/amdkfd: Fix memory leaks in kfd topology Felix Kuehling
                     ` (19 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Fix location_id format to match Thunk specification.
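
For reference, PCI_DEVID() in include/linux/pci.h packs the bus number
and devfn into the canonical 16-bit encoding:

	#define PCI_DEVID(bus, devfn)	((((u16)(bus)) << 8) | (devfn))

The old code shifted the bus number into bits 31:24 instead, which does
not match the location_id format the Thunk expects.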

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index ca2e51a..b614746 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1166,8 +1166,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 			cu_info.cu_active_number;
 	dev->node_props.vendor_id = gpu->pdev->vendor;
 	dev->node_props.device_id = gpu->pdev->device;
-	dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
-			(gpu->pdev->devfn & 0xffffff);
+	dev->node_props.location_id = PCI_DEVID(gpu->pdev->bus->number,
+		gpu->pdev->devfn);
 	/*
 	 * TODO: Retrieve max engine clock values from KGD
 	 */
-- 
2.7.4


* [PATCH 10/37] drm/amdkfd: Fix memory leaks in kfd topology
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 09/37] drm/amdkfd: Topology: Fix location_id Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 11/37] drm/amdkfd: Group up CRAT related functions Felix Kuehling
                     ` (18 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Yong Zhao, Felix Kuehling

From: Yong Zhao <yong.zhao@amd.com>

A kobject created with kobject_create_and_add() can be freed with
kobject_put() once there are no references to it any more. However, a
kobject whose memory was allocated with kzalloc() has to set up a release
callback in its ktype in order to free that memory when the reference
count drops to 0. Otherwise the memory is leaked.
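
A minimal sketch of the problematic pattern (illustration only; kobj,
some_type and parent stand in for the topology kobjects and ktypes in
kfd_topology.c, and the actual fix is the release callback added in the
diff below):

	struct kobject *kobj = kzalloc(sizeof(*kobj), GFP_KERNEL);

	kobject_init_and_add(kobj, &some_type, parent, "example");
	/* ... */
	kobject_put(kobj);	/* frees kobj only if some_type.release
				 * calls kfree(); otherwise it leaks */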

Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b614746..9b9824f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -501,11 +501,17 @@ static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
 	return ret;
 }
 
+static void kfd_topology_kobj_release(struct kobject *kobj)
+{
+	kfree(kobj);
+}
+
 static const struct sysfs_ops sysprops_ops = {
 	.show = sysprops_show,
 };
 
 static struct kobj_type sysprops_type = {
+	.release = kfd_topology_kobj_release,
 	.sysfs_ops = &sysprops_ops,
 };
 
@@ -541,6 +547,7 @@ static const struct sysfs_ops iolink_ops = {
 };
 
 static struct kobj_type iolink_type = {
+	.release = kfd_topology_kobj_release,
 	.sysfs_ops = &iolink_ops,
 };
 
@@ -568,6 +575,7 @@ static const struct sysfs_ops mem_ops = {
 };
 
 static struct kobj_type mem_type = {
+	.release = kfd_topology_kobj_release,
 	.sysfs_ops = &mem_ops,
 };
 
@@ -607,6 +615,7 @@ static const struct sysfs_ops cache_ops = {
 };
 
 static struct kobj_type cache_type = {
+	.release = kfd_topology_kobj_release,
 	.sysfs_ops = &cache_ops,
 };
 
@@ -729,6 +738,7 @@ static const struct sysfs_ops node_ops = {
 };
 
 static struct kobj_type node_type = {
+	.release = kfd_topology_kobj_release,
 	.sysfs_ops = &node_ops,
 };
 
-- 
2.7.4


* [PATCH 11/37] drm/amdkfd: Group up CRAT related functions
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 10/37] drm/amdkfd: Fix memory leaks in kfd topology Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 12/37] drm/amdkfd: Coding style cleanup Felix Kuehling
                     ` (17 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Amber Lin, Felix Kuehling

Take CRAT-related functions out of kfd_topology.c and place them in
kfd_crat.c. This is the initial step toward supporting more CRAT
features, e.g. creating a virtual CRAT table for KFD devices that have
no CRAT table of their own.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/Makefile       |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 350 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h     |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 332 +---------------------------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   3 +-
 5 files changed, 360 insertions(+), 330 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_crat.c

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index 5263e4d..153fb31 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -14,7 +14,7 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_process_queue_manager.o kfd_device_queue_manager.o \
 		kfd_device_queue_manager_cik.o kfd_device_queue_manager_vi.o \
 		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
-		kfd_dbgdev.o kfd_dbgmgr.o
+		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
 amdkfd-$(CONFIG_DEBUG_FS) += kfd_debugfs.o
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
new file mode 100644
index 0000000..1e331be
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -0,0 +1,350 @@
+/*
+ * Copyright 2015-2017 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+#include <linux/acpi.h>
+#include "kfd_crat.h"
+#include "kfd_topology.h"
+
+static int topology_crat_parsed;
+extern struct list_head topology_device_list;
+extern struct kfd_system_properties sys_props;
+
+static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
+		struct crat_subtype_computeunit *cu)
+{
+	dev->node_props.cpu_cores_count = cu->num_cpu_cores;
+	dev->node_props.cpu_core_id_base = cu->processor_id_low;
+	if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
+		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
+
+	pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
+			cu->processor_id_low);
+}
+
+static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
+		struct crat_subtype_computeunit *cu)
+{
+	dev->node_props.simd_id_base = cu->processor_id_low;
+	dev->node_props.simd_count = cu->num_simd_cores;
+	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
+	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
+	dev->node_props.wave_front_size = cu->wave_front_size;
+	dev->node_props.mem_banks_count = cu->num_banks;
+	dev->node_props.array_count = cu->num_arrays;
+	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
+	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
+	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
+	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
+		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
+	pr_info("CU GPU: id_base=%d\n", cu->processor_id_low);
+}
+
+/* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
+static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
+{
+	struct kfd_topology_device *dev;
+	int i = 0;
+
+	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
+			cu->proximity_domain, cu->hsa_capability);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (cu->proximity_domain == i) {
+			if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
+				kfd_populated_cu_info_cpu(dev, cu);
+
+			if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
+				kfd_populated_cu_info_gpu(dev, cu);
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+/*
+ * kfd_parse_subtype_mem is called when the topology mutex is
+ * already acquired
+ */
+static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
+{
+	struct kfd_mem_properties *props;
+	struct kfd_topology_device *dev;
+	int i = 0;
+
+	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
+			mem->promixity_domain);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (mem->promixity_domain == i) {
+			props = kfd_alloc_struct(props);
+			if (props == NULL)
+				return -ENOMEM;
+
+			if (dev->node_props.cpu_cores_count == 0)
+				props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
+			else
+				props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
+
+			if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
+				props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
+			if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
+				props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
+
+			props->size_in_bytes =
+				((uint64_t)mem->length_high << 32) +
+							mem->length_low;
+			props->width = mem->width;
+
+			dev->mem_bank_count++;
+			list_add_tail(&props->list, &dev->mem_props);
+
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+/*
+ * kfd_parse_subtype_cache is called when the topology mutex
+ * is already acquired
+ */
+static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
+{
+	struct kfd_cache_properties *props;
+	struct kfd_topology_device *dev;
+	uint32_t id;
+
+	id = cache->processor_id_low;
+
+	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
+	list_for_each_entry(dev, &topology_device_list, list)
+		if (id == dev->node_props.cpu_core_id_base ||
+		    id == dev->node_props.simd_id_base) {
+			props = kfd_alloc_struct(props);
+			if (props == NULL)
+				return -ENOMEM;
+
+			props->processor_id_low = id;
+			props->cache_level = cache->cache_level;
+			props->cache_size = cache->cache_size;
+			props->cacheline_size = cache->cache_line_size;
+			props->cachelines_per_tag = cache->lines_per_tag;
+			props->cache_assoc = cache->associativity;
+			props->cache_latency = cache->cache_latency;
+
+			if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_DATA;
+			if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
+			if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_CPU;
+			if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_HSACU;
+
+			dev->cache_count++;
+			dev->node_props.caches_count++;
+			list_add_tail(&props->list, &dev->cache_props);
+
+			break;
+		}
+
+	return 0;
+}
+
+/*
+ * kfd_parse_subtype_iolink is called when the topology mutex
+ * is already acquired
+ */
+static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
+{
+	struct kfd_iolink_properties *props;
+	struct kfd_topology_device *dev;
+	uint32_t i = 0;
+	uint32_t id_from;
+	uint32_t id_to;
+
+	id_from = iolink->proximity_domain_from;
+	id_to = iolink->proximity_domain_to;
+
+	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (id_from == i) {
+			props = kfd_alloc_struct(props);
+			if (props == NULL)
+				return -ENOMEM;
+
+			props->node_from = id_from;
+			props->node_to = id_to;
+			props->ver_maj = iolink->version_major;
+			props->ver_min = iolink->version_minor;
+
+			/*
+			 * weight factor (derived from CDIR), currently always 1
+			 */
+			props->weight = 1;
+
+			props->min_latency = iolink->minimum_latency;
+			props->max_latency = iolink->maximum_latency;
+			props->min_bandwidth = iolink->minimum_bandwidth_mbs;
+			props->max_bandwidth = iolink->maximum_bandwidth_mbs;
+			props->rec_transfer_size =
+					iolink->recommended_transfer_size;
+
+			dev->io_link_count++;
+			dev->node_props.io_links_count++;
+			list_add_tail(&props->list, &dev->io_link_props);
+
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
+{
+	struct crat_subtype_computeunit *cu;
+	struct crat_subtype_memory *mem;
+	struct crat_subtype_cache *cache;
+	struct crat_subtype_iolink *iolink;
+	int ret = 0;
+
+	switch (sub_type_hdr->type) {
+	case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
+		cu = (struct crat_subtype_computeunit *)sub_type_hdr;
+		ret = kfd_parse_subtype_cu(cu);
+		break;
+	case CRAT_SUBTYPE_MEMORY_AFFINITY:
+		mem = (struct crat_subtype_memory *)sub_type_hdr;
+		ret = kfd_parse_subtype_mem(mem);
+		break;
+	case CRAT_SUBTYPE_CACHE_AFFINITY:
+		cache = (struct crat_subtype_cache *)sub_type_hdr;
+		ret = kfd_parse_subtype_cache(cache);
+		break;
+	case CRAT_SUBTYPE_TLB_AFFINITY:
+		/*
+		 * For now, nothing to do here
+		 */
+		pr_info("Found TLB entry in CRAT table (not processing)\n");
+		break;
+	case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
+		/*
+		 * For now, nothing to do here
+		 */
+		pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
+		break;
+	case CRAT_SUBTYPE_IOLINK_AFFINITY:
+		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
+		ret = kfd_parse_subtype_iolink(iolink);
+		break;
+	default:
+		pr_warn("Unknown subtype (%d) in CRAT\n",
+				sub_type_hdr->type);
+	}
+
+	return ret;
+}
+
+int kfd_parse_crat_table(void *crat_image)
+{
+	struct kfd_topology_device *top_dev;
+	struct crat_subtype_generic *sub_type_hdr;
+	uint16_t node_id;
+	int ret;
+	struct crat_header *crat_table = (struct crat_header *)crat_image;
+	uint16_t num_nodes;
+	uint32_t image_len;
+
+	if (!crat_image)
+		return -EINVAL;
+
+	num_nodes = crat_table->num_domains;
+	image_len = crat_table->length;
+
+	pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
+
+	for (node_id = 0; node_id < num_nodes; node_id++) {
+		top_dev = kfd_create_topology_device();
+		if (!top_dev) {
+			kfd_release_live_view();
+			return -ENOMEM;
+		}
+	}
+
+	sys_props.platform_id =
+		(*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
+	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
+	sys_props.platform_rev = crat_table->revision;
+
+	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
+	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
+			((char *)crat_image) + image_len) {
+		if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
+			ret = kfd_parse_subtype(sub_type_hdr);
+			if (ret != 0) {
+				kfd_release_live_view();
+				return ret;
+			}
+		}
+
+		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+				sub_type_hdr->length);
+	}
+
+	sys_props.generation_count++;
+	topology_crat_parsed = 1;
+
+	return 0;
+}
+
+int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
+{
+	struct acpi_table_header *crat_table;
+	acpi_status status;
+
+	if (!size)
+		return -EINVAL;
+
+	/*
+	 * Fetch the CRAT table from ACPI
+	 */
+	status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
+	if (status == AE_NOT_FOUND) {
+		pr_warn("CRAT table not found\n");
+		return -ENODATA;
+	} else if (ACPI_FAILURE(status)) {
+		const char *err = acpi_format_exception(status);
+
+		pr_err("CRAT table error: %s\n", err);
+		return -EINVAL;
+	}
+
+	if (*size >= crat_table->length && crat_image != NULL)
+		memcpy(crat_image, crat_table, crat_table->length);
+
+	*size = crat_table->length;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
index a374fa3..15371cb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -291,4 +291,7 @@ struct cdit_header {
 
 #pragma pack()
 
+int kfd_topology_get_crat_acpi(void *crat_image, size_t *size);
+int kfd_parse_crat_table(void *crat_image);
+
 #endif /* KFD_CRAT_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 9b9824f..2b3fe95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -34,9 +34,8 @@
 #include "kfd_topology.h"
 #include "kfd_device_queue_manager.h"
 
-static struct list_head topology_device_list;
-static int topology_crat_parsed;
-static struct kfd_system_properties sys_props;
+struct list_head topology_device_list;
+struct kfd_system_properties sys_props;
 
 static DECLARE_RWSEM(topology_lock);
 
@@ -76,276 +75,6 @@ struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
 	return device;
 }
 
-static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
-{
-	struct acpi_table_header *crat_table;
-	acpi_status status;
-
-	if (!size)
-		return -EINVAL;
-
-	/*
-	 * Fetch the CRAT table from ACPI
-	 */
-	status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
-	if (status == AE_NOT_FOUND) {
-		pr_warn("CRAT table not found\n");
-		return -ENODATA;
-	} else if (ACPI_FAILURE(status)) {
-		const char *err = acpi_format_exception(status);
-
-		pr_err("CRAT table error: %s\n", err);
-		return -EINVAL;
-	}
-
-	if (*size >= crat_table->length && crat_image != NULL)
-		memcpy(crat_image, crat_table, crat_table->length);
-
-	*size = crat_table->length;
-
-	return 0;
-}
-
-static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
-		struct crat_subtype_computeunit *cu)
-{
-	dev->node_props.cpu_cores_count = cu->num_cpu_cores;
-	dev->node_props.cpu_core_id_base = cu->processor_id_low;
-	if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
-		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
-
-	pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
-			cu->processor_id_low);
-}
-
-static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
-		struct crat_subtype_computeunit *cu)
-{
-	dev->node_props.simd_id_base = cu->processor_id_low;
-	dev->node_props.simd_count = cu->num_simd_cores;
-	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
-	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
-	dev->node_props.wave_front_size = cu->wave_front_size;
-	dev->node_props.mem_banks_count = cu->num_banks;
-	dev->node_props.array_count = cu->num_arrays;
-	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
-	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
-	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
-	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
-		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
-	pr_info("CU GPU: id_base=%d\n", cu->processor_id_low);
-}
-
-/* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
-static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
-{
-	struct kfd_topology_device *dev;
-	int i = 0;
-
-	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
-			cu->proximity_domain, cu->hsa_capability);
-	list_for_each_entry(dev, &topology_device_list, list) {
-		if (cu->proximity_domain == i) {
-			if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
-				kfd_populated_cu_info_cpu(dev, cu);
-
-			if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
-				kfd_populated_cu_info_gpu(dev, cu);
-			break;
-		}
-		i++;
-	}
-
-	return 0;
-}
-
-/*
- * kfd_parse_subtype_mem is called when the topology mutex is
- * already acquired
- */
-static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
-{
-	struct kfd_mem_properties *props;
-	struct kfd_topology_device *dev;
-	int i = 0;
-
-	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
-			mem->promixity_domain);
-	list_for_each_entry(dev, &topology_device_list, list) {
-		if (mem->promixity_domain == i) {
-			props = kfd_alloc_struct(props);
-			if (props == NULL)
-				return -ENOMEM;
-
-			if (dev->node_props.cpu_cores_count == 0)
-				props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
-			else
-				props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
-
-			if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
-				props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
-			if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
-				props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
-
-			props->size_in_bytes =
-				((uint64_t)mem->length_high << 32) +
-							mem->length_low;
-			props->width = mem->width;
-
-			dev->mem_bank_count++;
-			list_add_tail(&props->list, &dev->mem_props);
-
-			break;
-		}
-		i++;
-	}
-
-	return 0;
-}
-
-/*
- * kfd_parse_subtype_cache is called when the topology mutex
- * is already acquired
- */
-static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
-{
-	struct kfd_cache_properties *props;
-	struct kfd_topology_device *dev;
-	uint32_t id;
-
-	id = cache->processor_id_low;
-
-	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
-	list_for_each_entry(dev, &topology_device_list, list)
-		if (id == dev->node_props.cpu_core_id_base ||
-		    id == dev->node_props.simd_id_base) {
-			props = kfd_alloc_struct(props);
-			if (props == NULL)
-				return -ENOMEM;
-
-			props->processor_id_low = id;
-			props->cache_level = cache->cache_level;
-			props->cache_size = cache->cache_size;
-			props->cacheline_size = cache->cache_line_size;
-			props->cachelines_per_tag = cache->lines_per_tag;
-			props->cache_assoc = cache->associativity;
-			props->cache_latency = cache->cache_latency;
-
-			if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
-				props->cache_type |= HSA_CACHE_TYPE_DATA;
-			if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
-				props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
-			if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
-				props->cache_type |= HSA_CACHE_TYPE_CPU;
-			if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
-				props->cache_type |= HSA_CACHE_TYPE_HSACU;
-
-			dev->cache_count++;
-			dev->node_props.caches_count++;
-			list_add_tail(&props->list, &dev->cache_props);
-
-			break;
-		}
-
-	return 0;
-}
-
-/*
- * kfd_parse_subtype_iolink is called when the topology mutex
- * is already acquired
- */
-static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
-{
-	struct kfd_iolink_properties *props;
-	struct kfd_topology_device *dev;
-	uint32_t i = 0;
-	uint32_t id_from;
-	uint32_t id_to;
-
-	id_from = iolink->proximity_domain_from;
-	id_to = iolink->proximity_domain_to;
-
-	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
-	list_for_each_entry(dev, &topology_device_list, list) {
-		if (id_from == i) {
-			props = kfd_alloc_struct(props);
-			if (props == NULL)
-				return -ENOMEM;
-
-			props->node_from = id_from;
-			props->node_to = id_to;
-			props->ver_maj = iolink->version_major;
-			props->ver_min = iolink->version_minor;
-
-			/*
-			 * weight factor (derived from CDIR), currently always 1
-			 */
-			props->weight = 1;
-
-			props->min_latency = iolink->minimum_latency;
-			props->max_latency = iolink->maximum_latency;
-			props->min_bandwidth = iolink->minimum_bandwidth_mbs;
-			props->max_bandwidth = iolink->maximum_bandwidth_mbs;
-			props->rec_transfer_size =
-					iolink->recommended_transfer_size;
-
-			dev->io_link_count++;
-			dev->node_props.io_links_count++;
-			list_add_tail(&props->list, &dev->io_link_props);
-
-			break;
-		}
-		i++;
-	}
-
-	return 0;
-}
-
-static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
-{
-	struct crat_subtype_computeunit *cu;
-	struct crat_subtype_memory *mem;
-	struct crat_subtype_cache *cache;
-	struct crat_subtype_iolink *iolink;
-	int ret = 0;
-
-	switch (sub_type_hdr->type) {
-	case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
-		cu = (struct crat_subtype_computeunit *)sub_type_hdr;
-		ret = kfd_parse_subtype_cu(cu);
-		break;
-	case CRAT_SUBTYPE_MEMORY_AFFINITY:
-		mem = (struct crat_subtype_memory *)sub_type_hdr;
-		ret = kfd_parse_subtype_mem(mem);
-		break;
-	case CRAT_SUBTYPE_CACHE_AFFINITY:
-		cache = (struct crat_subtype_cache *)sub_type_hdr;
-		ret = kfd_parse_subtype_cache(cache);
-		break;
-	case CRAT_SUBTYPE_TLB_AFFINITY:
-		/*
-		 * For now, nothing to do here
-		 */
-		pr_info("Found TLB entry in CRAT table (not processing)\n");
-		break;
-	case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
-		/*
-		 * For now, nothing to do here
-		 */
-		pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
-		break;
-	case CRAT_SUBTYPE_IOLINK_AFFINITY:
-		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
-		ret = kfd_parse_subtype_iolink(iolink);
-		break;
-	default:
-		pr_warn("Unknown subtype (%d) in CRAT\n",
-				sub_type_hdr->type);
-	}
-
-	return ret;
-}
-
 static void kfd_release_topology_device(struct kfd_topology_device *dev)
 {
 	struct kfd_mem_properties *mem;
@@ -380,7 +109,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
 	sys_props.num_devices--;
 }
 
-static void kfd_release_live_view(void)
+void kfd_release_live_view(void)
 {
 	struct kfd_topology_device *dev;
 
@@ -393,7 +122,7 @@ static void kfd_release_live_view(void)
 	memset(&sys_props, 0, sizeof(sys_props));
 }
 
-static struct kfd_topology_device *kfd_create_topology_device(void)
+struct kfd_topology_device *kfd_create_topology_device(void)
 {
 	struct kfd_topology_device *dev;
 
@@ -413,58 +142,6 @@ static struct kfd_topology_device *kfd_create_topology_device(void)
 	return dev;
 }
 
-static int kfd_parse_crat_table(void *crat_image)
-{
-	struct kfd_topology_device *top_dev;
-	struct crat_subtype_generic *sub_type_hdr;
-	uint16_t node_id;
-	int ret;
-	struct crat_header *crat_table = (struct crat_header *)crat_image;
-	uint16_t num_nodes;
-	uint32_t image_len;
-
-	if (!crat_image)
-		return -EINVAL;
-
-	num_nodes = crat_table->num_domains;
-	image_len = crat_table->length;
-
-	pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
-
-	for (node_id = 0; node_id < num_nodes; node_id++) {
-		top_dev = kfd_create_topology_device();
-		if (!top_dev) {
-			kfd_release_live_view();
-			return -ENOMEM;
-		}
-	}
-
-	sys_props.platform_id =
-		(*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
-	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
-	sys_props.platform_rev = crat_table->revision;
-
-	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
-	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
-			((char *)crat_image) + image_len) {
-		if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
-			ret = kfd_parse_subtype(sub_type_hdr);
-			if (ret != 0) {
-				kfd_release_live_view();
-				return ret;
-			}
-		}
-
-		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
-				sub_type_hdr->length);
-	}
-
-	sys_props.generation_count++;
-	topology_crat_parsed = 1;
-
-	return 0;
-}
-
 
 #define sysfs_show_gen_prop(buffer, fmt, ...) \
 		snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
@@ -1016,7 +693,6 @@ int kfd_topology_init(void)
 	 */
 	INIT_LIST_HEAD(&topology_device_list);
 	init_rwsem(&topology_lock);
-	topology_crat_parsed = 0;
 
 	memset(&sys_props, 0, sizeof(sys_props));
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index c3ddb9b..9996458 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -164,6 +164,7 @@ struct kfd_system_properties {
 	struct attribute	attr_props;
 };
 
-
+struct kfd_topology_device *kfd_create_topology_device(void);
+void kfd_release_live_view(void);
 
 #endif /* __KFD_TOPOLOGY_H__ */
-- 
2.7.4
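
For illustration of the interface change in the last hunk (a minimal
sketch, not part of the patch; locking elided as in the original
callers), code outside kfd_topology.c can now build up and tear down
topology devices roughly like this:

    struct kfd_topology_device *top_dev;

    top_dev = kfd_create_topology_device();
    if (!top_dev) {
        /* release everything built so far */
        kfd_release_live_view();
        return -ENOMEM;
    }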


* [PATCH 12/37] drm/amdkfd: Coding style cleanup
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 11/37] drm/amdkfd: Group up CRAT related functions Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 13/37] drm/amdkfd: Reorganize CRAT fetching from ACPI Felix Kuehling
                     ` (16 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Kent Russell

From: Kent Russell <kent.russell@amd.com>

Minor cleanup that was missed previously because the code had moved around.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 12 ++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h     |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 1e331be..f2dda60 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -91,11 +91,11 @@ static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
 	int i = 0;
 
 	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
-			mem->promixity_domain);
+			mem->proximity_domain);
 	list_for_each_entry(dev, &topology_device_list, list) {
-		if (mem->promixity_domain == i) {
+		if (mem->proximity_domain == i) {
 			props = kfd_alloc_struct(props);
-			if (props == NULL)
+			if (!props)
 				return -ENOMEM;
 
 			if (dev->node_props.cpu_cores_count == 0)
@@ -141,7 +141,7 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
 		if (id == dev->node_props.cpu_core_id_base ||
 		    id == dev->node_props.simd_id_base) {
 			props = kfd_alloc_struct(props);
-			if (props == NULL)
+			if (!props)
 				return -ENOMEM;
 
 			props->processor_id_low = id;
@@ -190,7 +190,7 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
 	list_for_each_entry(dev, &topology_device_list, list) {
 		if (id_from == i) {
 			props = kfd_alloc_struct(props);
-			if (props == NULL)
+			if (!props)
 				return -ENOMEM;
 
 			props->node_from = id_from;
@@ -260,7 +260,7 @@ static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
 		ret = kfd_parse_subtype_iolink(iolink);
 		break;
 	default:
-		pr_warn("Unknown subtype (%d) in CRAT\n",
+		pr_warn("Unknown subtype %d in CRAT\n",
 				sub_type_hdr->type);
 	}
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
index 15371cb..920697b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -127,7 +127,7 @@ struct crat_subtype_memory {
 	uint8_t		length;
 	uint16_t	reserved;
 	uint32_t	flags;
-	uint32_t	promixity_domain;
+	uint32_t	proximity_domain;
 	uint32_t	base_addr_low;
 	uint32_t	base_addr_high;
 	uint32_t	length_low;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 2b3fe95..b6cf785 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -895,7 +895,7 @@ int kfd_topology_remove_device(struct kfd_dev *gpu)
 
 	up_write(&topology_lock);
 
-	if (res == 0)
+	if (!res)
 		kfd_notify_gpu_change(gpu_id, 0);
 
 	return res;
-- 
2.7.4


* [PATCH 13/37] drm/amdkfd: Reorganize CRAT fetching from ACPI
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 12/37] drm/amdkfd: Coding style cleanup Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 14/37] drm/amdkfd: Decouple CRAT parsing from device list update Felix Kuehling
                     ` (15 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan, Kent Russell

From: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>

Reorganize and rename the kfd_topology_get_crat_acpi function so that
acpi_get_table(..) needs to be called only once. This will also aid
the dGPU topology implementation.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
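For illustration, the resulting calling sequence is roughly the
following (a minimal sketch; error handling abbreviated):

    void *crat_image = NULL;
    size_t image_size = 0;
    int ret;

    ret = kfd_create_crat_image_acpi(&crat_image, &image_size);
    if (!ret)
        ret = kfd_parse_crat_table(crat_image);
    /* ... */
    kfd_destroy_crat_image(crat_image); /* kfree(), NULL-safe */
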
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 41 +++++++++++++++++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h     |  3 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 40 ++++++++++++++----------------
 3 files changed, 54 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index f2dda60..e264f5d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -319,17 +319,29 @@ int kfd_parse_crat_table(void *crat_image)
 	return 0;
 }
 
-int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
+/*
+ * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
+ * copies CRAT from ACPI (if available).
+ * NOTE: Call kfd_destroy_crat_image to free CRAT image memory
+ *
+ *	@crat_image: CRAT read from ACPI. If no CRAT in ACPI then
+ *		     crat_image will be NULL
+ *	@size: [OUT] size of crat_image
+ *
+ *	Return 0 if successful else return -ve value
+ */
+int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
 {
 	struct acpi_table_header *crat_table;
 	acpi_status status;
+	void *pcrat_image;
 
-	if (!size)
+	if (!crat_image)
 		return -EINVAL;
 
-	/*
-	 * Fetch the CRAT table from ACPI
-	 */
+	*crat_image = NULL;
+
+	/* Fetch the CRAT table from ACPI */
 	status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
 	if (status == AE_NOT_FOUND) {
 		pr_warn("CRAT table not found\n");
@@ -341,10 +353,25 @@ int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
 		return -EINVAL;
 	}
 
-	if (*size >= crat_table->length && crat_image != NULL)
-		memcpy(crat_image, crat_table, crat_table->length);
+	pcrat_image = kmalloc(crat_table->length, GFP_KERNEL);
+	if (!pcrat_image)
+		return -ENOMEM;
+
+	memcpy(pcrat_image, crat_table, crat_table->length);
 
+	*crat_image = pcrat_image;
 	*size = crat_table->length;
 
 	return 0;
 }
+
+/*
+ * kfd_destroy_crat_image
+ *
+ *	@crat_image: [IN] - crat_image from kfd_create_crat_image_xxx(..)
+ *
+ */
+void kfd_destroy_crat_image(void *crat_image)
+{
+	kfree(crat_image);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
index 920697b..da83105 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -291,7 +291,8 @@ struct cdit_header {
 
 #pragma pack()
 
-int kfd_topology_get_crat_acpi(void *crat_image, size_t *size);
+int kfd_create_crat_image_acpi(void **crat_image, size_t *size);
+void kfd_destroy_crat_image(void *crat_image);
 int kfd_parse_crat_table(void *crat_image);
 
 #endif /* KFD_CRAT_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b6cf785..35da4af 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -699,35 +699,31 @@ int kfd_topology_init(void)
 	/*
 	 * Get the CRAT image from the ACPI
 	 */
-	ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
-	if (ret == 0 && image_size > 0) {
-		pr_info("Found CRAT image with size=%zd\n", image_size);
-		crat_image = kmalloc(image_size, GFP_KERNEL);
-		if (!crat_image) {
-			ret = -ENOMEM;
-			pr_err("No memory for allocating CRAT image\n");
+	ret = kfd_create_crat_image_acpi(&crat_image, &image_size);
+	if (!ret) {
+		ret = kfd_parse_crat_table(crat_image);
+		if (ret)
 			goto err;
-		}
-		ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
-
-		if (ret == 0) {
-			down_write(&topology_lock);
-			ret = kfd_parse_crat_table(crat_image);
-			if (ret == 0)
-				ret = kfd_topology_update_sysfs();
-			up_write(&topology_lock);
-		} else {
-			pr_err("Couldn't get CRAT table size from ACPI\n");
-		}
-		kfree(crat_image);
 	} else if (ret == -ENODATA) {
+		/* TODO: Create fake CRAT table */
 		ret = 0;
+		goto err;
 	} else {
 		pr_err("Couldn't get CRAT table size from ACPI\n");
+		goto err;
 	}
 
+	down_write(&topology_lock);
+	ret = kfd_topology_update_sysfs();
+	up_write(&topology_lock);
+
+	if (!ret)
+		pr_info("Finished initializing topology\n");
+	else
+		pr_err("Failed to update topology in sysfs ret=%d\n", ret);
+
 err:
-	pr_info("Finished initializing topology ret=%d\n", ret);
+	kfd_destroy_crat_image(crat_image);
 	return ret;
 }
 
@@ -747,7 +743,7 @@ static void kfd_debug_print_topology(void)
 		pr_info("Node: %d\n", i);
 		pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
 		pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
-		pr_info("\tSIMD count: %d", dev->node_props.simd_count);
+		pr_info("\tSIMD count: %d\n", dev->node_props.simd_count);
 		i++;
 	}
 }
-- 
2.7.4


* [PATCH 14/37] drm/amdkfd: Decouple CRAT parsing from device list update
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (12 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 13/37] drm/amdkfd: Reorganize CRAT fetching from ACPI Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 15/37] drm/amdkfd: Support enumerating non-GPU devices Felix Kuehling
                     ` (14 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan, Kent Russell

From: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>

Currently, CRAT parsing is intertwined with topology_device_list, so
repeated calls to kfd_parse_crat_table() will fail. Decouple
kfd_parse_crat_table() from topology_device_list.

kfd_parse_crat_table() will parse the CRAT and add topology devices to
a temporary list, temp_topology_device_list, and then
kfd_topology_update_device_list will move the contents from the
temporary list to the master list.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
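In outline, kfd_topology_init() now builds the topology in two steps
(a condensed sketch of the code below):

    struct list_head temp_topology_device_list;

    INIT_LIST_HEAD(&temp_topology_device_list);
    ret = kfd_parse_crat_table(crat_image,
                               &temp_topology_device_list, 0);
    /* ... */
    down_write(&topology_lock);
    kfd_topology_update_device_list(&temp_topology_device_list,
                                    &topology_device_list);
    up_write(&topology_lock);
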
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 118 +++++++++++++++++-------------
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h     |   3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  84 ++++++++++++++-------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 +-
 4 files changed, 132 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index e264f5d..cd31dfb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -23,8 +23,6 @@
 #include "kfd_crat.h"
 #include "kfd_topology.h"
 
-static int topology_crat_parsed;
-extern struct list_head topology_device_list;
 extern struct kfd_system_properties sys_props;
 
 static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
@@ -57,16 +55,18 @@ static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
 	pr_info("CU GPU: id_base=%d\n", cu->processor_id_low);
 }
 
-/* kfd_parse_subtype_cu is called when the topology mutex is already acquired */
-static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
+/* kfd_parse_subtype_cu - parse compute unit subtypes and attach it to correct
+ * topology device present in the device_list
+ */
+static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu,
+				struct list_head *device_list)
 {
 	struct kfd_topology_device *dev;
-	int i = 0;
 
 	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
 			cu->proximity_domain, cu->hsa_capability);
-	list_for_each_entry(dev, &topology_device_list, list) {
-		if (cu->proximity_domain == i) {
+	list_for_each_entry(dev, device_list, list) {
+		if (cu->proximity_domain == dev->proximity_domain) {
 			if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
 				kfd_populated_cu_info_cpu(dev, cu);
 
@@ -74,26 +74,24 @@ static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
 				kfd_populated_cu_info_gpu(dev, cu);
 			break;
 		}
-		i++;
 	}
 
 	return 0;
 }
 
-/*
- * kfd_parse_subtype_mem is called when the topology mutex is
- * already acquired
+/* kfd_parse_subtype_mem - parse memory subtypes and attach it to correct
+ * topology device present in the device_list
  */
-static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
+static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem,
+				struct list_head *device_list)
 {
 	struct kfd_mem_properties *props;
 	struct kfd_topology_device *dev;
-	int i = 0;
 
 	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
 			mem->proximity_domain);
-	list_for_each_entry(dev, &topology_device_list, list) {
-		if (mem->proximity_domain == i) {
+	list_for_each_entry(dev, device_list, list) {
+		if (mem->proximity_domain == dev->proximity_domain) {
 			props = kfd_alloc_struct(props);
 			if (!props)
 				return -ENOMEM;
@@ -118,17 +116,16 @@ static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
 
 			break;
 		}
-		i++;
 	}
 
 	return 0;
 }
 
-/*
- * kfd_parse_subtype_cache is called when the topology mutex
- * is already acquired
+/* kfd_parse_subtype_cache - parse cache subtypes and attach it to correct
+ * topology device present in the device_list
  */
-static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
+static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
+			struct list_head *device_list)
 {
 	struct kfd_cache_properties *props;
 	struct kfd_topology_device *dev;
@@ -137,7 +134,7 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
 	id = cache->processor_id_low;
 
 	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
-	list_for_each_entry(dev, &topology_device_list, list)
+	list_for_each_entry(dev, device_list, list)
 		if (id == dev->node_props.cpu_core_id_base ||
 		    id == dev->node_props.simd_id_base) {
 			props = kfd_alloc_struct(props);
@@ -171,15 +168,14 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
 	return 0;
 }
 
-/*
- * kfd_parse_subtype_iolink is called when the topology mutex
- * is already acquired
+/* kfd_parse_subtype_iolink - parse iolink subtypes and attach it to correct
+ * topology device present in the device_list
  */
-static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
+static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
+					struct list_head *device_list)
 {
 	struct kfd_iolink_properties *props;
 	struct kfd_topology_device *dev;
-	uint32_t i = 0;
 	uint32_t id_from;
 	uint32_t id_to;
 
@@ -187,8 +183,8 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
 	id_to = iolink->proximity_domain_to;
 
 	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
-	list_for_each_entry(dev, &topology_device_list, list) {
-		if (id_from == i) {
+	list_for_each_entry(dev, device_list, list) {
+		if (id_from == dev->proximity_domain) {
 			props = kfd_alloc_struct(props);
 			if (!props)
 				return -ENOMEM;
@@ -216,13 +212,18 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
 
 			break;
 		}
-		i++;
 	}
 
 	return 0;
 }
 
-static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
+/* kfd_parse_subtype - parse subtypes and attach it to correct topology device
+ * present in the device_list
+ *	@sub_type_hdr - subtype section of crat_image
+ *	@device_list - list of topology devices present in this crat_image
+ */
+static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr,
+				struct list_head *device_list)
 {
 	struct crat_subtype_computeunit *cu;
 	struct crat_subtype_memory *mem;
@@ -233,15 +234,15 @@ static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
 	switch (sub_type_hdr->type) {
 	case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
 		cu = (struct crat_subtype_computeunit *)sub_type_hdr;
-		ret = kfd_parse_subtype_cu(cu);
+		ret = kfd_parse_subtype_cu(cu, device_list);
 		break;
 	case CRAT_SUBTYPE_MEMORY_AFFINITY:
 		mem = (struct crat_subtype_memory *)sub_type_hdr;
-		ret = kfd_parse_subtype_mem(mem);
+		ret = kfd_parse_subtype_mem(mem, device_list);
 		break;
 	case CRAT_SUBTYPE_CACHE_AFFINITY:
 		cache = (struct crat_subtype_cache *)sub_type_hdr;
-		ret = kfd_parse_subtype_cache(cache);
+		ret = kfd_parse_subtype_cache(cache, device_list);
 		break;
 	case CRAT_SUBTYPE_TLB_AFFINITY:
 		/*
@@ -257,7 +258,7 @@ static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
 		break;
 	case CRAT_SUBTYPE_IOLINK_AFFINITY:
 		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
-		ret = kfd_parse_subtype_iolink(iolink);
+		ret = kfd_parse_subtype_iolink(iolink, device_list);
 		break;
 	default:
 		pr_warn("Unknown subtype %d in CRAT\n",
@@ -267,12 +268,23 @@ static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
 	return ret;
 }
 
-int kfd_parse_crat_table(void *crat_image)
+/* kfd_parse_crat_table - parse CRAT table. For each node present in CRAT
+ * create a kfd_topology_device and add in to device_list. Also parse
+ * CRAT subtypes and attach it to appropriate kfd_topology_device
+ *	@crat_image - input image containing CRAT
+ *	@device_list - [OUT] list of kfd_topology_device generated after
+ *		       parsing crat_image
+ *	@proximity_domain - Proximity domain of the first device in the table
+ *
+ *	Return - 0 if successful else -ve value
+ */
+int kfd_parse_crat_table(void *crat_image, struct list_head *device_list,
+			 uint32_t proximity_domain)
 {
 	struct kfd_topology_device *top_dev;
 	struct crat_subtype_generic *sub_type_hdr;
 	uint16_t node_id;
-	int ret;
+	int ret = 0;
 	struct crat_header *crat_table = (struct crat_header *)crat_image;
 	uint16_t num_nodes;
 	uint32_t image_len;
@@ -280,17 +292,26 @@ int kfd_parse_crat_table(void *crat_image)
 	if (!crat_image)
 		return -EINVAL;
 
+	if (!list_empty(device_list)) {
+		pr_warn("Error device list should be empty\n");
+		return -EINVAL;
+	}
+
 	num_nodes = crat_table->num_domains;
 	image_len = crat_table->length;
 
 	pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
 
 	for (node_id = 0; node_id < num_nodes; node_id++) {
-		top_dev = kfd_create_topology_device();
-		if (!top_dev) {
-			kfd_release_live_view();
-			return -ENOMEM;
-		}
+		top_dev = kfd_create_topology_device(device_list);
+		if (!top_dev)
+			break;
+		top_dev->proximity_domain = proximity_domain++;
+	}
+
+	if (!top_dev) {
+		ret = -ENOMEM;
+		goto err;
 	}
 
 	sys_props.platform_id =
@@ -302,21 +323,20 @@ int kfd_parse_crat_table(void *crat_image)
 	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
 			((char *)crat_image) + image_len) {
 		if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
-			ret = kfd_parse_subtype(sub_type_hdr);
-			if (ret != 0) {
-				kfd_release_live_view();
-				return ret;
-			}
+			ret = kfd_parse_subtype(sub_type_hdr, device_list);
+			if (ret)
+				break;
 		}
 
 		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
 				sub_type_hdr->length);
 	}
 
-	sys_props.generation_count++;
-	topology_crat_parsed = 1;
+err:
+	if (ret)
+		kfd_release_topology_device_list(device_list);
 
-	return 0;
+	return ret;
 }
 
 /*
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
index da83105..4e683ae 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -293,6 +293,7 @@ struct cdit_header {
 
 int kfd_create_crat_image_acpi(void **crat_image, size_t *size);
 void kfd_destroy_crat_image(void *crat_image);
-int kfd_parse_crat_table(void *crat_image);
+int kfd_parse_crat_table(void *crat_image, struct list_head *device_list,
+			 uint32_t proximity_domain);
 
 #endif /* KFD_CRAT_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 35da4af..f64350b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -34,7 +34,8 @@
 #include "kfd_topology.h"
 #include "kfd_device_queue_manager.h"
 
-struct list_head topology_device_list;
+/* topology_device_list - Master list of all topology devices */
+static struct list_head topology_device_list;
 struct kfd_system_properties sys_props;
 
 static DECLARE_RWSEM(topology_lock);
@@ -105,24 +106,27 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
 	}
 
 	kfree(dev);
-
-	sys_props.num_devices--;
 }
 
-void kfd_release_live_view(void)
+void kfd_release_topology_device_list(struct list_head *device_list)
 {
 	struct kfd_topology_device *dev;
 
-	while (topology_device_list.next != &topology_device_list) {
-		dev = container_of(topology_device_list.next,
-				 struct kfd_topology_device, list);
+	while (!list_empty(device_list)) {
+		dev = list_first_entry(device_list,
+				       struct kfd_topology_device, list);
 		kfd_release_topology_device(dev);
+	}
 }
 
+static void kfd_release_live_view(void)
+{
+	kfd_release_topology_device_list(&topology_device_list);
 	memset(&sys_props, 0, sizeof(sys_props));
 }
 
-struct kfd_topology_device *kfd_create_topology_device(void)
+struct kfd_topology_device *kfd_create_topology_device(
+				struct list_head *device_list)
 {
 	struct kfd_topology_device *dev;
 
@@ -136,8 +140,7 @@ struct kfd_topology_device *kfd_create_topology_device(void)
 	INIT_LIST_HEAD(&dev->cache_props);
 	INIT_LIST_HEAD(&dev->io_link_props);
 
-	list_add_tail(&dev->list, &topology_device_list);
-	sys_props.num_devices++;
+	list_add_tail(&dev->list, device_list);
 
 	return dev;
 }
@@ -682,16 +685,32 @@ static void kfd_topology_release_sysfs(void)
 	}
 }
 
+/* Called with write topology_lock acquired */
+static void kfd_topology_update_device_list(struct list_head *temp_list,
+					struct list_head *master_list)
+{
+	while (!list_empty(temp_list)) {
+		list_move_tail(temp_list->next, master_list);
+		sys_props.num_devices++;
+	}
+}
+
 int kfd_topology_init(void)
 {
 	void *crat_image = NULL;
 	size_t image_size = 0;
 	int ret;
+	struct list_head temp_topology_device_list;
 
-	/*
-	 * Initialize the head for the topology device list
+	/* topology_device_list - Master list of all topology devices
+	 * temp_topology_device_list - temporary list created while parsing CRAT
+	 * or VCRAT. Once parsing is complete the contents of list is moved to
+	 * topology_device_list
 	 */
+
+	/* Initialize the head for the both the lists */
 	INIT_LIST_HEAD(&topology_device_list);
+	INIT_LIST_HEAD(&temp_topology_device_list);
 	init_rwsem(&topology_lock);
 
 	memset(&sys_props, 0, sizeof(sys_props));
@@ -701,7 +720,8 @@ int kfd_topology_init(void)
 	 */
 	ret = kfd_create_crat_image_acpi(&crat_image, &image_size);
 	if (!ret) {
-		ret = kfd_parse_crat_table(crat_image);
+		ret = kfd_parse_crat_table(crat_image,
+					   &temp_topology_device_list, 0);
 		if (ret)
 			goto err;
 	} else if (ret == -ENODATA) {
@@ -714,12 +734,15 @@ int kfd_topology_init(void)
 	}
 
 	down_write(&topology_lock);
+	kfd_topology_update_device_list(&temp_topology_device_list,
+					&topology_device_list);
 	ret = kfd_topology_update_sysfs();
 	up_write(&topology_lock);
 
-	if (!ret)
+	if (!ret) {
+		sys_props.generation_count++;
 		pr_info("Finished initializing topology\n");
-	else
+	} else
 		pr_err("Failed to update topology in sysfs ret=%d\n", ret);
 
 err:
@@ -729,8 +752,10 @@ int kfd_topology_init(void)
 
 void kfd_topology_shutdown(void)
 {
+	down_write(&topology_lock);
 	kfd_topology_release_sysfs();
 	kfd_release_live_view();
+	up_write(&topology_lock);
 }
 
 static void kfd_debug_print_topology(void)
@@ -806,13 +831,15 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 	uint32_t gpu_id;
 	struct kfd_topology_device *dev;
 	struct kfd_cu_info cu_info;
-	int res;
+	int res = 0;
+	struct list_head temp_topology_device_list;
+
+	INIT_LIST_HEAD(&temp_topology_device_list);
 
 	gpu_id = kfd_generate_gpu_id(gpu);
 
 	pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
 
-	down_write(&topology_lock);
 	/*
 	 * Try to assign the GPU to existing topology device (generated from
 	 * CRAT table
@@ -821,11 +848,12 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 	if (!dev) {
 		pr_info("GPU was not found in the current topology. Extending.\n");
 		kfd_debug_print_topology();
-		dev = kfd_create_topology_device();
+		dev = kfd_create_topology_device(&temp_topology_device_list);
 		if (!dev) {
 			res = -ENOMEM;
 			goto err;
 		}
+
 		dev->gpu = gpu;
 
 		/*
@@ -833,12 +861,18 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		 * GPU vBIOS
 		 */
 
+		down_write(&topology_lock);
+		kfd_topology_update_device_list(&temp_topology_device_list,
+			&topology_device_list);
+
 		/* Update the SYSFS tree, since we added another topology
 		 * device
 		 */
 		if (kfd_topology_update_sysfs() < 0)
 			kfd_topology_release_sysfs();
 
+		up_write(&topology_lock);
+
 	}
 
 	dev->gpu_id = gpu_id;
@@ -859,30 +893,26 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		pr_info("Adding doorbell packet type capability\n");
 	}
 
-	res = 0;
-
-err:
-	up_write(&topology_lock);
-
-	if (res == 0)
+	if (!res)
 		kfd_notify_gpu_change(gpu_id, 1);
-
+err:
 	return res;
 }
 
 int kfd_topology_remove_device(struct kfd_dev *gpu)
 {
-	struct kfd_topology_device *dev;
+	struct kfd_topology_device *dev, *tmp;
 	uint32_t gpu_id;
 	int res = -ENODEV;
 
 	down_write(&topology_lock);
 
-	list_for_each_entry(dev, &topology_device_list, list)
+	list_for_each_entry_safe(dev, tmp, &topology_device_list, list)
 		if (dev->gpu == gpu) {
 			gpu_id = dev->gpu_id;
 			kfd_remove_sysfs_node_entry(dev);
 			kfd_release_topology_device(dev);
+			sys_props.num_devices--;
 			res = 0;
 			if (kfd_topology_update_sysfs() < 0)
 				kfd_topology_release_sysfs();
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 9996458..0d98b61 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -135,6 +135,7 @@ struct kfd_iolink_properties {
 struct kfd_topology_device {
 	struct list_head		list;
 	uint32_t			gpu_id;
+	uint32_t			proximity_domain;
 	struct kfd_node_properties	node_props;
 	uint32_t			mem_bank_count;
 	struct list_head		mem_props;
@@ -164,7 +165,8 @@ struct kfd_system_properties {
 	struct attribute	attr_props;
 };
 
-struct kfd_topology_device *kfd_create_topology_device(void);
-void kfd_release_live_view(void);
+struct kfd_topology_device *kfd_create_topology_device(
+		struct list_head *device_list);
+void kfd_release_topology_device_list(struct list_head *device_list);
 
 #endif /* __KFD_TOPOLOGY_H__ */
-- 
2.7.4


* [PATCH 15/37] drm/amdkfd: Support enumerating non-GPU devices
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (13 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 14/37] drm/amdkfd: Decouple CRAT parsing from device list update Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 16/37] drm/amdkfd: sync IOLINK defines to thunk spec Felix Kuehling
                     ` (13 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Modify the kfd_topology_enum_kfd_devices(..) function to support
non-GPU nodes. The function used to return NULL when it encountered a
non-GPU (e.g. CPU) node, which caused kfd_ioctl_create_event and
kfd_init_apertures to fail on Intel + Tonga systems.

kfd_topology_enum_kfd_devices will now iterate over all nodes and
return a valid kfd_dev only for nodes with a GPU.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
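The resulting iteration idiom for callers looks roughly like this
(a sketch based on the kfd_init_apertures() change below):

    struct kfd_dev *dev;
    uint8_t id = 0;

    while (kfd_topology_enum_kfd_devices(id, &dev) == 0) {
        if (!dev) {
            id++;   /* non-GPU node, skip it */
            continue;
        }
        /* ... use dev ... */
        id++;
    }
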
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c |  7 ++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_pasid.c       |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c    | 18 +++++++++++-------
 4 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index c59384b..7377513 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -300,9 +300,14 @@ int kfd_init_apertures(struct kfd_process *process)
 	struct kfd_process_device *pdd;
 
 	/*Iterating over all devices*/
-	while ((dev = kfd_topology_enum_kfd_devices(id)) != NULL &&
+	while (kfd_topology_enum_kfd_devices(id, &dev) == 0 &&
 		id < NUM_OF_SUPPORTED_GPUS) {
 
+		if (!dev) {
+			id++; /* Skip non GPU devices */
+			continue;
+		}
+
 		pdd = kfd_create_process_device_data(dev, process);
 		if (!pdd) {
 			pr_err("Failed to create process device data\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
index d6a7961..15fff44 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
@@ -59,7 +59,7 @@ unsigned int kfd_pasid_alloc(void)
 		struct kfd_dev *dev = NULL;
 		unsigned int i = 0;
 
-		while ((dev = kfd_topology_enum_kfd_devices(i)) != NULL) {
+		while ((kfd_topology_enum_kfd_devices(i, &dev)) == 0) {
 			if (dev && dev->kfd2kgd) {
 				kfd2kgd = dev->kfd2kgd;
 				break;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0c96a6b..69a6206 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -670,7 +670,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu);
 int kfd_topology_remove_device(struct kfd_dev *gpu);
 struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
 struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
-struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx);
+int kfd_topology_enum_kfd_devices(uint8_t idx, struct kfd_dev **kdev);
 
 /* Interrupts */
 int kfd_interrupt_init(struct kfd_dev *dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index f64350b..b2d2b7e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -927,22 +927,26 @@ int kfd_topology_remove_device(struct kfd_dev *gpu)
 	return res;
 }
 
-/*
- * When idx is out of bounds, the function will return NULL
+/* kfd_topology_enum_kfd_devices - Enumerate through all devices in KFD
+ *	topology. If GPU device is found @idx, then valid kfd_dev pointer is
+ *	returned through @kdev
+ * Return -	0: On success (@kdev will be NULL for non GPU nodes)
+ *		-1: If end of list
  */
-struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
+int kfd_topology_enum_kfd_devices(uint8_t idx, struct kfd_dev **kdev)
 {
 
 	struct kfd_topology_device *top_dev;
-	struct kfd_dev *device = NULL;
 	uint8_t device_idx = 0;
 
+	*kdev = NULL;
 	down_read(&topology_lock);
 
 	list_for_each_entry(top_dev, &topology_device_list, list) {
 		if (device_idx == idx) {
-			device = top_dev->gpu;
-			break;
+			*kdev = top_dev->gpu;
+			up_read(&topology_lock);
+			return 0;
 		}
 
 		device_idx++;
@@ -950,7 +954,7 @@ struct kfd_dev *kfd_topology_enum_kfd_devices(uint8_t idx)
 
 	up_read(&topology_lock);
 
-	return device;
+	return -1;
 
 }
 
-- 
2.7.4


* [PATCH 16/37] drm/amdkfd: sync IOLINK defines to thunk spec
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (14 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 15/37] drm/amdkfd: Support enumerating non-GPU devices Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 17/37] drm/amdkfd: Turn verbose topology messages into pr_debug Felix Kuehling
                     ` (12 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Sync the IO link defines with the current Thunk spec, v1.07, dated
Feb 1, 2016.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
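For illustration, a consumer of these defines might decode a link
roughly as follows (a sketch; "iolink" stands for a hypothetical
struct crat_subtype_iolink pointer). Note the inverted polarity: the
old COHERENCY flag becomes NON_COHERENT, so a clear bit now means the
property is present:

    bool coherent  = !(iolink->flags & CRAT_IOLINK_FLAGS_NON_COHERENT);
    bool atomics32 = !(iolink->flags & CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT);
    bool atomics64 = !(iolink->flags & CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT);
    bool p2p_dma   = !(iolink->flags & CRAT_IOLINK_FLAGS_NO_PEER_TO_PEER_DMA);
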
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
index 4e683ae..3ac55a6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -222,9 +222,12 @@ struct crat_subtype_ccompute {
 /*
  * HSA IO Link Affinity structure and definitions
  */
-#define CRAT_IOLINK_FLAGS_ENABLED	0x00000001
-#define CRAT_IOLINK_FLAGS_COHERENCY	0x00000002
-#define CRAT_IOLINK_FLAGS_RESERVED	0xfffffffc
+#define CRAT_IOLINK_FLAGS_ENABLED	(1 << 0)
+#define CRAT_IOLINK_FLAGS_NON_COHERENT	(1 << 1)
+#define CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT (1 << 2)
+#define CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT (1 << 3)
+#define CRAT_IOLINK_FLAGS_NO_PEER_TO_PEER_DMA (1 << 4)
+#define CRAT_IOLINK_FLAGS_RESERVED_MASK 0xffffffe0
 
 /*
  * IO interface types
@@ -232,8 +235,16 @@ struct crat_subtype_ccompute {
 #define CRAT_IOLINK_TYPE_UNDEFINED	0
 #define CRAT_IOLINK_TYPE_HYPERTRANSPORT	1
 #define CRAT_IOLINK_TYPE_PCIEXPRESS	2
-#define CRAT_IOLINK_TYPE_OTHER		3
-#define CRAT_IOLINK_TYPE_MAX		255
+#define CRAT_IOLINK_TYPE_AMBA 3
+#define CRAT_IOLINK_TYPE_MIPI 4
+#define CRAT_IOLINK_TYPE_QPI_1_1 5
+#define CRAT_IOLINK_TYPE_RESERVED1 6
+#define CRAT_IOLINK_TYPE_RESERVED2 7
+#define CRAT_IOLINK_TYPE_RAPID_IO 8
+#define CRAT_IOLINK_TYPE_INFINIBAND 9
+#define CRAT_IOLINK_TYPE_RESERVED3 10
+#define CRAT_IOLINK_TYPE_OTHER 11
+#define CRAT_IOLINK_TYPE_MAX 255
 
 #define CRAT_IOLINK_RESERVED_LENGTH 24
 
-- 
2.7.4


* [PATCH 17/37] drm/amdkfd: Turn verbose topology messages into pr_debug
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (15 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 16/37] drm/amdkfd: sync IOLINK defines to thunk spec Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 18/37] drm/amdkfd: Simplify counting of memory banks Felix Kuehling
                     ` (11 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
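Note that pr_debug() output is compiled out unless DEBUG is defined
for the file; with CONFIG_DYNAMIC_DEBUG the messages can instead be
enabled at runtime, e.g. (illustrative):

    echo 'file kfd_crat.c +p' > /sys/kernel/debug/dynamic_debug/control
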
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 17 +++++++++--------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  2 +-
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index cd31dfb..8a0a9a0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -33,7 +33,7 @@ static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
 	if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
 		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
 
-	pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
+	pr_debug("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
 			cu->processor_id_low);
 }
 
@@ -52,7 +52,7 @@ static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
 	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
 	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
 		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
-	pr_info("CU GPU: id_base=%d\n", cu->processor_id_low);
+	pr_debug("CU GPU: id_base=%d\n", cu->processor_id_low);
 }
 
 /* kfd_parse_subtype_cu - parse compute unit subtypes and attach it to correct
@@ -63,7 +63,7 @@ static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu,
 {
 	struct kfd_topology_device *dev;
 
-	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
+	pr_debug("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
 			cu->proximity_domain, cu->hsa_capability);
 	list_for_each_entry(dev, device_list, list) {
 		if (cu->proximity_domain == dev->proximity_domain) {
@@ -88,7 +88,7 @@ static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem,
 	struct kfd_mem_properties *props;
 	struct kfd_topology_device *dev;
 
-	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
+	pr_debug("Found memory entry in CRAT table with proximity_domain=%d\n",
 			mem->proximity_domain);
 	list_for_each_entry(dev, device_list, list) {
 		if (mem->proximity_domain == dev->proximity_domain) {
@@ -133,7 +133,7 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
 
 	id = cache->processor_id_low;
 
-	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
+	pr_debug("Found cache entry in CRAT table with processor_id=%d\n", id);
 	list_for_each_entry(dev, device_list, list)
 		if (id == dev->node_props.cpu_core_id_base ||
 		    id == dev->node_props.simd_id_base) {
@@ -182,7 +182,8 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
 	id_from = iolink->proximity_domain_from;
 	id_to = iolink->proximity_domain_to;
 
-	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
+	pr_debug("Found IO link entry in CRAT table with id_from=%d\n",
+			id_from);
 	list_for_each_entry(dev, device_list, list) {
 		if (id_from == dev->proximity_domain) {
 			props = kfd_alloc_struct(props);
@@ -248,13 +249,13 @@ static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr,
 		/*
 		 * For now, nothing to do here
 		 */
-		pr_info("Found TLB entry in CRAT table (not processing)\n");
+		pr_debug("Found TLB entry in CRAT table (not processing)\n");
 		break;
 	case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
 		/*
 		 * For now, nothing to do here
 		 */
-		pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
+		pr_debug("Found CCOMPUTE entry in CRAT table (not processing)\n");
 		break;
 	case CRAT_SUBTYPE_IOLINK_AFFINITY:
 		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index b2d2b7e..001e473 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -890,7 +890,7 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 
 	if (dev->gpu->device_info->asic_family == CHIP_CARRIZO) {
 		dev->node_props.capability |= HSA_CAP_DOORBELL_PACKET_TYPE;
-		pr_info("Adding doorbell packet type capability\n");
+		pr_debug("Adding doorbell packet type capability\n");
 	}
 
 	if (!res)
-- 
2.7.4


* [PATCH 18/37] drm/amdkfd: Simplify counting of memory banks
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (16 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 17/37] drm/amdkfd: Turn verbose topology messages into pr_debug Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 19/37] drm/amdkfd: Fix sibling_map[] size Felix Kuehling
                     ` (10 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling

Only count memory banks in one place. Ignore the redundant num_banks
entry in crat_subtype_computeunit.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     |  3 +--
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 14 ++------------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  1 -
 3 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 8a0a9a0..ea1e0af 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -45,7 +45,6 @@ static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
 	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
 	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
 	dev->node_props.wave_front_size = cu->wave_front_size;
-	dev->node_props.mem_banks_count = cu->num_banks;
 	dev->node_props.array_count = cu->num_arrays;
 	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
 	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
@@ -111,7 +110,7 @@ static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem,
 							mem->length_low;
 			props->width = mem->width;
 
-			dev->mem_bank_count++;
+			dev->node_props.mem_banks_count++;
 			list_add_tail(&props->list, &dev->mem_props);
 
 			break;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 001e473..17e8daf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -335,18 +335,8 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 			dev->node_props.cpu_cores_count);
 	sysfs_show_32bit_prop(buffer, "simd_count",
 			dev->node_props.simd_count);
-
-	if (dev->mem_bank_count < dev->node_props.mem_banks_count) {
-		pr_info_once("mem_banks_count truncated from %d to %d\n",
-				dev->node_props.mem_banks_count,
-				dev->mem_bank_count);
-		sysfs_show_32bit_prop(buffer, "mem_banks_count",
-				dev->mem_bank_count);
-	} else {
-		sysfs_show_32bit_prop(buffer, "mem_banks_count",
-				dev->node_props.mem_banks_count);
-	}
-
+	sysfs_show_32bit_prop(buffer, "mem_banks_count",
+			dev->node_props.mem_banks_count);
 	sysfs_show_32bit_prop(buffer, "caches_count",
 			dev->node_props.caches_count);
 	sysfs_show_32bit_prop(buffer, "io_links_count",
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 0d98b61..17b2d43 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -137,7 +137,6 @@ struct kfd_topology_device {
 	uint32_t			gpu_id;
 	uint32_t			proximity_domain;
 	struct kfd_node_properties	node_props;
-	uint32_t			mem_bank_count;
 	struct list_head		mem_props;
 	uint32_t			cache_count;
 	struct list_head		cache_props;
-- 
2.7.4


* [PATCH 19/37] drm/amdkfd: Fix sibling_map[] size
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (17 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 18/37] drm/amdkfd: Simplify counting of memory banks Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 20/37] drm/amdkfd: Add topology support for CPUs Felix Kuehling
                     ` (9 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Change kfd_cache_properties.sibling_map[256] to
kfd_cache_properties.sibling_map[32]. Since CRAT uses a bitmap for
sibling_map, it is more efficient to use a bitmap in the KFD structure
as well.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
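With the bitmap layout, CPU n is a sibling iff bit (n % 8) of
sibling_map[n / 8] is set; 32 bytes cover the same 256 CPUs as the old
byte-per-CPU array. A hypothetical helper (a sketch, not part of this
patch) would be:

    static bool kfd_is_sibling(struct kfd_cache_properties *cache,
                               unsigned int cpu)
    {
        return cache->sibling_map[cpu / 8] & (1 << (cpu % 8));
    }
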
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 20 +++++++++++++-------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 +---
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 17e8daf..622feda 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -263,7 +263,7 @@ static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
 		char *buffer)
 {
 	ssize_t ret;
-	uint32_t i;
+	uint32_t i, j;
 	struct kfd_cache_properties *cache;
 
 	/* Making sure that the buffer is an empty string */
@@ -281,12 +281,18 @@ static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
 	sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
 	sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
 	snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
-	for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
-		ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
-				buffer, cache->sibling_map[i],
-				(i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
-						"\n" : ",");
-
+	for (i = 0; i < CRAT_SIBLINGMAP_SIZE; i++)
+		for (j = 0; j < sizeof(cache->sibling_map[0])*8; j++) {
+			/* Check each bit */
+			if (cache->sibling_map[i] & (1 << j))
+				ret = snprintf(buffer, PAGE_SIZE,
+					 "%s%d%s", buffer, 1, ",");
+			else
+				ret = snprintf(buffer, PAGE_SIZE,
+					 "%s%d%s", buffer, 0, ",");
+		}
+	/* Replace the last "," with end of line */
+	*(buffer + strlen(buffer) - 1) = 0xA;
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 17b2d43..50a741b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -91,8 +91,6 @@ struct kfd_mem_properties {
 	struct attribute	attr;
 };
 
-#define KFD_TOPOLOGY_CPU_SIBLINGS 256
-
 #define HSA_CACHE_TYPE_DATA		0x00000001
 #define HSA_CACHE_TYPE_INSTRUCTION	0x00000002
 #define HSA_CACHE_TYPE_CPU		0x00000004
@@ -109,7 +107,7 @@ struct kfd_cache_properties {
 	uint32_t		cache_assoc;
 	uint32_t		cache_latency;
 	uint32_t		cache_type;
-	uint8_t			sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
+	uint8_t			sibling_map[CRAT_SIBLINGMAP_SIZE];
 	struct kobject		*kobj;
 	struct attribute	attr;
 };
-- 
2.7.4


* [PATCH 20/37] drm/amdkfd: Add topology support for CPUs
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (18 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 19/37] drm/amdkfd: Fix sibling_map[] size Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
  2017-12-09  4:08   ` [PATCH 21/37] drm/amdkfd: Add topology support for dGPUs Felix Kuehling
                     ` (8 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

Currently, the KFD topology information is generated by parsing the CRAT
(ACPI) table. However, at present the CRAT table is available only on AMD
APUs. To support CPUs on systems without a CRAT table, the KFD driver will
create a Virtual CRAT (VCRAT) table, and the existing code will then parse
that table to generate the topology.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
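In rough outline, the CPU VCRAT is built by walking the online NUMA
nodes (a hypothetical sketch using the helpers introduced below;
cu_hdr and mem_hdr stand for pointers advanced through the VCRAT
image, and the real code also handles IO links and space checks):

    for_each_online_node(numa_node_id) {
        ret = kfd_fill_cu_for_cpu(numa_node_id, &avail_size,
                                  proximity_domain, cu_hdr);
        if (ret)
            return ret;
        ret = kfd_fill_mem_info_for_cpu(numa_node_id, &avail_size,
                                        proximity_domain, mem_hdr);
        if (ret)
            return ret;
        proximity_domain++;
    }
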
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 321 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h     |   9 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 190 +++++++++++++++---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   3 +
 5 files changed, 489 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index ea1e0af..00732ec 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -21,10 +21,9 @@
  */
 #include <linux/acpi.h>
 #include "kfd_crat.h"
+#include "kfd_priv.h"
 #include "kfd_topology.h"
 
-extern struct kfd_system_properties sys_props;
-
 static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
 		struct crat_subtype_computeunit *cu)
 {
@@ -281,7 +280,7 @@ static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr,
 int kfd_parse_crat_table(void *crat_image, struct list_head *device_list,
 			 uint32_t proximity_domain)
 {
-	struct kfd_topology_device *top_dev;
+	struct kfd_topology_device *top_dev = NULL;
 	struct crat_subtype_generic *sub_type_hdr;
 	uint16_t node_id;
 	int ret = 0;
@@ -314,10 +313,10 @@ int kfd_parse_crat_table(void *crat_image, struct list_head *device_list,
 		goto err;
 	}
 
-	sys_props.platform_id =
-		(*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
-	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
-	sys_props.platform_rev = crat_table->revision;
+	memcpy(top_dev->oem_id, crat_table->oem_id, CRAT_OEMID_LENGTH);
+	memcpy(top_dev->oem_table_id, crat_table->oem_table_id,
+			CRAT_OEMTABLEID_LENGTH);
+	top_dev->oem_revision = crat_table->oem_revision;
 
 	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
 	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
@@ -385,8 +384,312 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
 	return 0;
 }
 
-/*
- * kfd_destroy_crat_image
+/* Memory required to create Virtual CRAT.
+ * Since there is no easy way to predict the amount of memory required, the
+ * following amounts are allocated for the CPU and GPU Virtual CRAT. This is
+ * expected to cover all known conditions. But to be safe, an additional
+ * check is put in the code to ensure we don't write past the allocation.
+ */
+#define VCRAT_SIZE_FOR_CPU	(2 * PAGE_SIZE)
+#define VCRAT_SIZE_FOR_GPU	(3 * PAGE_SIZE)
+
+/* kfd_fill_cu_for_cpu - Fill in Compute info for the given CPU NUMA node
+ *
+ *	@numa_node_id: CPU NUMA node id
+ *	@avail_size: Available size in the memory
+ *	@proximity_domain: CRAT proximity domain assigned to this node
+ *	@sub_type_hdr: Memory into which compute info will be filled in
+ *
+ *	Return 0 if successful else return -ve value
+ */
+static int kfd_fill_cu_for_cpu(int numa_node_id, int *avail_size,
+				int proximity_domain,
+				struct crat_subtype_computeunit *sub_type_hdr)
+{
+	const struct cpumask *cpumask;
+
+	*avail_size -= sizeof(struct crat_subtype_computeunit);
+	if (*avail_size < 0)
+		return -ENOMEM;
+
+	memset(sub_type_hdr, 0, sizeof(struct crat_subtype_computeunit));
+
+	/* Fill in subtype header data */
+	sub_type_hdr->type = CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY;
+	sub_type_hdr->length = sizeof(struct crat_subtype_computeunit);
+	sub_type_hdr->flags = CRAT_SUBTYPE_FLAGS_ENABLED;
+
+	cpumask = cpumask_of_node(numa_node_id);
+
+	/* Fill in CU data */
+	sub_type_hdr->flags |= CRAT_CU_FLAGS_CPU_PRESENT;
+	sub_type_hdr->proximity_domain = proximity_domain;
+	sub_type_hdr->processor_id_low = kfd_numa_node_to_apic_id(numa_node_id);
+	if (sub_type_hdr->processor_id_low == -1)
+		return -EINVAL;
+
+	sub_type_hdr->num_cpu_cores = cpumask_weight(cpumask);
+
+	return 0;
+}
+
+/* kfd_fill_mem_info_for_cpu - Fill in Memory info for the given CPU NUMA node
+ *
+ *	@numa_node_id: CPU NUMA node id
+ *	@avail_size: Available size in the memory
+ *	@proximity_domain: CRAT proximity domain assigned to this node
+ *	@sub_type_hdr: Memory into which memory info will be filled in
+ *
+ *	Return 0 if successful else return -ve value
+ */
+static int kfd_fill_mem_info_for_cpu(int numa_node_id, int *avail_size,
+			int proximity_domain,
+			struct crat_subtype_memory *sub_type_hdr)
+{
+	uint64_t mem_in_bytes = 0;
+	pg_data_t *pgdat;
+	int zone_type;
+
+	*avail_size -= sizeof(struct crat_subtype_memory);
+	if (*avail_size < 0)
+		return -ENOMEM;
+
+	memset(sub_type_hdr, 0, sizeof(struct crat_subtype_memory));
+
+	/* Fill in subtype header data */
+	sub_type_hdr->type = CRAT_SUBTYPE_MEMORY_AFFINITY;
+	sub_type_hdr->length = sizeof(struct crat_subtype_memory);
+	sub_type_hdr->flags = CRAT_SUBTYPE_FLAGS_ENABLED;
+
+	/* Fill in Memory Subunit data */
+
+	/* Unlike si_meminfo, si_meminfo_node is not exported. So
+	 * the following lines are duplicated from si_meminfo_node
+	 * function
+	 */
+	pgdat = NODE_DATA(numa_node_id);
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
+		mem_in_bytes += pgdat->node_zones[zone_type].managed_pages;
+	mem_in_bytes <<= PAGE_SHIFT;
+
+	sub_type_hdr->length_low = lower_32_bits(mem_in_bytes);
+	sub_type_hdr->length_high = upper_32_bits(mem_in_bytes);
+	sub_type_hdr->proximity_domain = proximity_domain;
+
+	return 0;
+}
+
+static int kfd_fill_iolink_info_for_cpu(int numa_node_id, int *avail_size,
+				uint32_t *num_entries,
+				struct crat_subtype_iolink *sub_type_hdr)
+{
+	int nid;
+	struct cpuinfo_x86 *c = &cpu_data(0);
+	uint8_t link_type;
+
+	if (c->x86_vendor == X86_VENDOR_AMD)
+		link_type = CRAT_IOLINK_TYPE_HYPERTRANSPORT;
+	else
+		link_type = CRAT_IOLINK_TYPE_QPI_1_1;
+
+	*num_entries = 0;
+
+	/* Create IO links from this node to other CPU nodes */
+	for_each_online_node(nid) {
+		if (nid == numa_node_id) /* node itself */
+			continue;
+
+		*avail_size -= sizeof(struct crat_subtype_iolink);
+		if (*avail_size < 0)
+			return -ENOMEM;
+
+		memset(sub_type_hdr, 0, sizeof(struct crat_subtype_iolink));
+
+		/* Fill in subtype header data */
+		sub_type_hdr->type = CRAT_SUBTYPE_IOLINK_AFFINITY;
+		sub_type_hdr->length = sizeof(struct crat_subtype_iolink);
+		sub_type_hdr->flags = CRAT_SUBTYPE_FLAGS_ENABLED;
+
+		/* Fill in IO link data */
+		sub_type_hdr->proximity_domain_from = numa_node_id;
+		sub_type_hdr->proximity_domain_to = nid;
+		sub_type_hdr->io_interface_type = link_type;
+
+		(*num_entries)++;
+		sub_type_hdr++;
+	}
+
+	return 0;
+}
+
+/* kfd_create_vcrat_image_cpu - Create Virtual CRAT for CPU
+ *
+ *	@pcrat_image: Memory into which the CPU VCRAT is written
+ *	@size:	[IN] allocated size of crat_image.
+ *		[OUT] actual size of data filled in crat_image
+ */
+static int kfd_create_vcrat_image_cpu(void *pcrat_image, size_t *size)
+{
+	struct crat_header *crat_table = (struct crat_header *)pcrat_image;
+	struct acpi_table_header *acpi_table;
+	acpi_status status;
+	struct crat_subtype_generic *sub_type_hdr;
+	int avail_size = *size;
+	int numa_node_id;
+	uint32_t entries = 0;
+	int ret = 0;
+
+	if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_CPU)
+		return -EINVAL;
+
+	/* Fill in CRAT Header.
+	 * Modify length and total_entries as subunits are added.
+	 */
+	avail_size -= sizeof(struct crat_header);
+	if (avail_size < 0)
+		return -ENOMEM;
+
+	memset(crat_table, 0, sizeof(struct crat_header));
+	memcpy(&crat_table->signature, CRAT_SIGNATURE,
+			sizeof(crat_table->signature));
+	crat_table->length = sizeof(struct crat_header);
+
+	status = acpi_get_table("DSDT", 0, &acpi_table);
+	if (status == AE_NOT_FOUND)
+		pr_warn("DSDT table not found for OEM information\n");
+	else {
+		crat_table->oem_revision = acpi_table->revision;
+		memcpy(crat_table->oem_id, acpi_table->oem_id,
+				CRAT_OEMID_LENGTH);
+		memcpy(crat_table->oem_table_id, acpi_table->oem_table_id,
+				CRAT_OEMTABLEID_LENGTH);
+	}
+	crat_table->total_entries = 0;
+	crat_table->num_domains = 0;
+
+	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
+
+	for_each_online_node(numa_node_id) {
+		if (kfd_numa_node_to_apic_id(numa_node_id) == -1)
+			continue;
+
+		/* Fill in Subtype: Compute Unit */
+		ret = kfd_fill_cu_for_cpu(numa_node_id, &avail_size,
+			crat_table->num_domains,
+			(struct crat_subtype_computeunit *)sub_type_hdr);
+		if (ret < 0)
+			return ret;
+		crat_table->length += sub_type_hdr->length;
+		crat_table->total_entries++;
+
+		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+			sub_type_hdr->length);
+
+		/* Fill in Subtype: Memory */
+		ret = kfd_fill_mem_info_for_cpu(numa_node_id, &avail_size,
+			crat_table->num_domains,
+			(struct crat_subtype_memory *)sub_type_hdr);
+		if (ret < 0)
+			return ret;
+		crat_table->length += sub_type_hdr->length;
+		crat_table->total_entries++;
+
+		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+			sub_type_hdr->length);
+
+		/* Fill in Subtype: IO Link */
+		ret = kfd_fill_iolink_info_for_cpu(numa_node_id, &avail_size,
+				&entries,
+				(struct crat_subtype_iolink *)sub_type_hdr);
+		if (ret < 0)
+			return ret;
+		crat_table->length += (sub_type_hdr->length * entries);
+		crat_table->total_entries += entries;
+
+		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+				sub_type_hdr->length * entries);
+
+		crat_table->num_domains++;
+	}
+
+	/* TODO: Add cache Subtype for CPU.
+	 * Currently, CPU cache information is available in function
+	 * detect_cache_attributes(cpu) defined in the file
+	 * ./arch/x86/kernel/cpu/intel_cacheinfo.c. This function is not
+	 * exported and to get the same information the code needs to be
+	 * duplicated.
+	 */
+
+	*size = crat_table->length;
+	pr_info("Virtual CRAT table created for CPU\n");
+
+	return 0;
+}
+
+/* kfd_create_crat_image_virtual - Allocates memory for CRAT image and
+ *		creates a Virtual CRAT (VCRAT) image
+ *
+ * NOTE: Call kfd_destroy_crat_image to free CRAT image memory
+ *
+ *	@crat_image: VCRAT image created because ACPI does not have a
+ *		     CRAT for this device
+ *	@size: [OUT] size of virtual crat_image
+ *	@flags:	COMPUTE_UNIT_CPU - Create VCRAT for CPU device
+ *		COMPUTE_UNIT_GPU - Create VCRAT for GPU
+ *		(COMPUTE_UNIT_CPU | COMPUTE_UNIT_GPU) - Create VCRAT for APU
+ *			-- this option is not currently implemented.
+ *			The assumption is that all AMD APUs will have CRAT
+ *	@kdev: Valid kfd_device required if flags contain COMPUTE_UNIT_GPU
+ *	@proximity_domain: Proximity domain assigned to the device node
+ *
+ *	Return 0 if successful else return -ve value
+ */
+int kfd_create_crat_image_virtual(void **crat_image, size_t *size,
+				  int flags, struct kfd_dev *kdev,
+				  uint32_t proximity_domain)
+{
+	void *pcrat_image = NULL;
+	int ret = 0;
+
+	if (!crat_image)
+		return -EINVAL;
+
+	*crat_image = NULL;
+
+	/* Allocate VCRAT_SIZE_FOR_CPU for the CPU virtual CRAT image and
+	 * VCRAT_SIZE_FOR_GPU for the GPU virtual CRAT image. This should
+	 * cover all the current conditions. A check is in place so we
+	 * never write beyond the allocated size.
+	 */
+	switch (flags) {
+	case COMPUTE_UNIT_CPU:
+		pcrat_image = kmalloc(VCRAT_SIZE_FOR_CPU, GFP_KERNEL);
+		if (!pcrat_image)
+			return -ENOMEM;
+		*size = VCRAT_SIZE_FOR_CPU;
+		ret = kfd_create_vcrat_image_cpu(pcrat_image, size);
+		break;
+	case COMPUTE_UNIT_GPU:
+		/* TODO: */
+		ret = -EINVAL;
+		pr_err("VCRAT not implemented for dGPU\n");
+		break;
+	case (COMPUTE_UNIT_CPU | COMPUTE_UNIT_GPU):
+		/* TODO: */
+		ret = -EINVAL;
+		pr_err("VCRAT not implemented for APU\n");
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (!ret)
+		*crat_image = pcrat_image;
+	else
+		kfree(pcrat_image);
+
+	return ret;
+}
+
+
+/* kfd_destroy_crat_image
  *
  *	@crat_image: [IN] - crat_image from kfd_create_crat_image_xxx(..)
  *
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
index 3ac55a6..aaa43ab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -44,6 +44,10 @@
 
 #define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
 
+/* Compute Unit flags */
+#define COMPUTE_UNIT_CPU	(1 << 0)  /* Create Virtual CRAT for CPU */
+#define COMPUTE_UNIT_GPU	(1 << 1)  /* Create Virtual CRAT for GPU */
+
 struct crat_header {
 	uint32_t	signature;
 	uint32_t	length;
@@ -302,9 +306,14 @@ struct cdit_header {
 
 #pragma pack()
 
+struct kfd_dev;
+
 int kfd_create_crat_image_acpi(void **crat_image, size_t *size);
 void kfd_destroy_crat_image(void *crat_image);
 int kfd_parse_crat_table(void *crat_image, struct list_head *device_list,
 			 uint32_t proximity_domain);
+int kfd_create_crat_image_virtual(void **crat_image, size_t *size,
+				  int flags, struct kfd_dev *kdev,
+				  uint32_t proximity_domain);
 
 #endif /* KFD_CRAT_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 69a6206..aeee9d4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -671,6 +671,7 @@ int kfd_topology_remove_device(struct kfd_dev *gpu);
 struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
 struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
 int kfd_topology_enum_kfd_devices(uint8_t idx, struct kfd_dev **kdev);
+int kfd_numa_node_to_apic_id(int numa_node_id);
 
 /* Interrupts */
 int kfd_interrupt_init(struct kfd_dev *dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 622feda..9aa6004 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -28,6 +28,8 @@
 #include <linux/hash.h>
 #include <linux/cpufreq.h>
 #include <linux/log2.h>
+#include <linux/dmi.h>
+#include <linux/atomic.h>
 
 #include "kfd_priv.h"
 #include "kfd_crat.h"
@@ -36,9 +38,10 @@
 
 /* topology_device_list - Master list of all topology devices */
 static struct list_head topology_device_list;
-struct kfd_system_properties sys_props;
+static struct kfd_system_properties sys_props;
 
 static DECLARE_RWSEM(topology_lock);
+static atomic_t topology_crat_proximity_domain;
 
 struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
 {
@@ -691,12 +694,92 @@ static void kfd_topology_update_device_list(struct list_head *temp_list,
 	}
 }
 
+static void kfd_debug_print_topology(void)
+{
+	struct kfd_topology_device *dev;
+
+	down_read(&topology_lock);
+
+	dev = list_last_entry(&topology_device_list,
+			struct kfd_topology_device, list);
+	if (dev) {
+		if (dev->node_props.cpu_cores_count &&
+				dev->node_props.simd_count) {
+			pr_info("Topology: Add APU node [0x%0x:0x%0x]\n",
+				dev->node_props.device_id,
+				dev->node_props.vendor_id);
+		} else if (dev->node_props.cpu_cores_count)
+			pr_info("Topology: Add CPU node\n");
+		else if (dev->node_props.simd_count)
+			pr_info("Topology: Add dGPU node [0x%0x:0x%0x]\n",
+				dev->node_props.device_id,
+				dev->node_props.vendor_id);
+	}
+	up_read(&topology_lock);
+}
+
+/* Helper function for initializing the platform_xx members of
+ * kfd_system_properties. Uses OEM info from the last CPU/APU node.
+ */
+static void kfd_update_system_properties(void)
+{
+	struct kfd_topology_device *dev;
+
+	down_read(&topology_lock);
+	dev = list_last_entry(&topology_device_list,
+			struct kfd_topology_device, list);
+	if (dev) {
+		sys_props.platform_id =
+			(*((uint64_t *)dev->oem_id)) & CRAT_OEMID_64BIT_MASK;
+		sys_props.platform_oem = *((uint64_t *)dev->oem_table_id);
+		sys_props.platform_rev = dev->oem_revision;
+	}
+	up_read(&topology_lock);
+}
+
+static void find_system_memory(const struct dmi_header *dm,
+	void *private)
+{
+	struct kfd_mem_properties *mem;
+	u16 mem_width, mem_clock;
+	struct kfd_topology_device *kdev =
+		(struct kfd_topology_device *)private;
+	const u8 *dmi_data = (const u8 *)(dm + 1);
+
+	if (dm->type == DMI_ENTRY_MEM_DEVICE && dm->length >= 0x15) {
+		mem_width = (u16)(*(const u16 *)(dmi_data + 0x6));
+		mem_clock = (u16)(*(const u16 *)(dmi_data + 0x11));
+		list_for_each_entry(mem, &kdev->mem_props, list) {
+			if (mem_width != 0xFFFF && mem_width != 0)
+				mem->width = mem_width;
+			if (mem_clock != 0)
+				mem->mem_clk_max = mem_clock;
+		}
+	}
+}
+
+/* kfd_add_non_crat_information - Add information that is not currently
+ *	defined in CRAT but is necessary for KFD topology
+ * @kdev - topology device to which the additional info is added
+ */
+static void kfd_add_non_crat_information(struct kfd_topology_device *kdev)
+{
+	/* Check if CPU only node. */
+	if (!kdev->gpu) {
+		/* Add system memory information */
+		dmi_walk(find_system_memory, kdev);
+	}
+	/* TODO: For GPU node, rearrange code from kfd_topology_add_device */
+}
+
 int kfd_topology_init(void)
 {
 	void *crat_image = NULL;
 	size_t image_size = 0;
 	int ret;
 	struct list_head temp_topology_device_list;
+	int cpu_only_node = 0;
+	struct kfd_topology_device *kdev;
+	int proximity_domain;
 
 	/* topology_device_list - Master list of all topology devices
 	 * temp_topology_device_list - temporary list created while parsing CRAT
@@ -711,36 +794,78 @@ int kfd_topology_init(void)
 
 	memset(&sys_props, 0, sizeof(sys_props));
 
+	/* Proximity domains in ACPI CRAT tables start counting at
+	 * 0. The same should be true for virtual CRAT tables created
+	 * at this stage. GPUs added later in kfd_topology_add_device
+	 * use a counter.
+	 */
+	proximity_domain = 0;
+
 	/*
-	 * Get the CRAT image from the ACPI
+	 * Get the CRAT image from ACPI. If ACPI doesn't have one,
+	 * create a virtual CRAT.
+	 * NOTE: The current implementation expects all AMD APUs to have
+	 *	a CRAT. If no CRAT is available, the system is assumed
+	 *	to be CPU-only.
 	 */
 	ret = kfd_create_crat_image_acpi(&crat_image, &image_size);
 	if (!ret) {
 		ret = kfd_parse_crat_table(crat_image,
-					   &temp_topology_device_list, 0);
-		if (ret)
+					   &temp_topology_device_list,
+					   proximity_domain);
+		if (ret) {
+			kfd_release_topology_device_list(
+				&temp_topology_device_list);
+			kfd_destroy_crat_image(crat_image);
+			crat_image = NULL;
+		}
+	}
+
+	if (!crat_image) {
+		ret = kfd_create_crat_image_virtual(&crat_image, &image_size,
+						    COMPUTE_UNIT_CPU, NULL,
+						    proximity_domain);
+		cpu_only_node = 1;
+		if (ret) {
+			pr_err("Error creating VCRAT table for CPU\n");
+			return ret;
+		}
+
+		ret = kfd_parse_crat_table(crat_image,
+					   &temp_topology_device_list,
+					   proximity_domain);
+		if (ret) {
+			pr_err("Error parsing VCRAT table for CPU\n");
 			goto err;
-	} else if (ret == -ENODATA) {
-		/* TODO: Create fake CRAT table */
-		ret = 0;
-		goto err;
-	} else {
-		pr_err("Couldn't get CRAT table size from ACPI\n");
-		goto err;
+		}
 	}
 
 	down_write(&topology_lock);
 	kfd_topology_update_device_list(&temp_topology_device_list,
 					&topology_device_list);
+	atomic_set(&topology_crat_proximity_domain, sys_props.num_devices-1);
 	ret = kfd_topology_update_sysfs();
 	up_write(&topology_lock);
 
 	if (!ret) {
 		sys_props.generation_count++;
+		kfd_update_system_properties();
+		kfd_debug_print_topology();
 		pr_info("Finished initializing topology\n");
 	} else
 		pr_err("Failed to update topology in sysfs ret=%d\n", ret);
 
+	/* For nodes with GPU, this information gets added
+	 * when GPU is detected (kfd_topology_add_device).
+	 */
+	if (cpu_only_node) {
+		/* Add additional information to CPU only node created above */
+		down_write(&topology_lock);
+		kdev = list_first_entry(&topology_device_list,
+				struct kfd_topology_device, list);
+		up_write(&topology_lock);
+		kfd_add_non_crat_information(kdev);
+	}
+
 err:
 	kfd_destroy_crat_image(crat_image);
 	return ret;
@@ -754,21 +879,6 @@ void kfd_topology_shutdown(void)
 	up_write(&topology_lock);
 }
 
-static void kfd_debug_print_topology(void)
-{
-	struct kfd_topology_device *dev;
-	uint32_t i = 0;
-
-	pr_info("DEBUG PRINT OF TOPOLOGY:");
-	list_for_each_entry(dev, &topology_device_list, list) {
-		pr_info("Node: %d\n", i);
-		pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
-		pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
-		pr_info("\tSIMD count: %d\n", dev->node_props.simd_count);
-		i++;
-	}
-}
-
 static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
 {
 	uint32_t hashout;
@@ -954,6 +1064,34 @@ int kfd_topology_enum_kfd_devices(uint8_t idx, struct kfd_dev **kdev)
 
 }
 
+static int kfd_cpumask_to_apic_id(const struct cpumask *cpumask)
+{
+	const struct cpuinfo_x86 *cpuinfo;
+	int first_cpu_of_numa_node;
+
+	if (!cpumask || cpumask == cpu_none_mask)
+		return -1;
+	first_cpu_of_numa_node = cpumask_first(cpumask);
+	if (first_cpu_of_numa_node >= nr_cpu_ids)
+		return -1;
+	cpuinfo = &cpu_data(first_cpu_of_numa_node);
+
+	return cpuinfo->apicid;
+}
+
+/* kfd_numa_node_to_apic_id - Returns the APIC ID of the first logical processor
+ *	of the given NUMA node (numa_node_id)
+ * Return -1 on failure
+ */
+int kfd_numa_node_to_apic_id(int numa_node_id)
+{
+	if (numa_node_id == -1) {
+		pr_warn("Invalid NUMA Node. Use online CPU mask\n");
+		return kfd_cpumask_to_apic_id(cpu_online_mask);
+	}
+	return kfd_cpumask_to_apic_id(cpumask_of_node(numa_node_id));
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 int kfd_debugfs_hqds_by_device(struct seq_file *m, void *data)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 50a741b..8668189 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -148,6 +148,9 @@ struct kfd_topology_device {
 	struct attribute		attr_gpuid;
 	struct attribute		attr_name;
 	struct attribute		attr_props;
+	uint8_t				oem_id[CRAT_OEMID_LENGTH];
+	uint8_t				oem_table_id[CRAT_OEMTABLEID_LENGTH];
+	uint32_t			oem_revision;
 };
 
 struct kfd_system_properties {
-- 
2.7.4
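A note on the proximity-domain numbering this patch introduces: CRAT/VCRAT parsing at init hands out domains 0..N-1, and topology_crat_proximity_domain is seeded with num_devices - 1 so that GPUs added later continue the sequence. A user-space sketch of the scheme, with C11 atomics standing in for the kernel's atomic_t:

#include <stdatomic.h>
#include <stdio.h>

static atomic_int crat_proximity_domain;

int main(void)
{
	int num_crat_devices = 2;	/* e.g. two CPU NUMA nodes from VCRAT */

	/* Seed with the last domain id handed out by CRAT parsing,
	 * mirroring atomic_set(..., sys_props.num_devices - 1).
	 */
	atomic_init(&crat_proximity_domain, num_crat_devices - 1);

	/* Each dGPU added later takes the next free id, as
	 * atomic_inc_return() does in kfd_topology_add_device().
	 */
	printf("GPU 0 -> domain %d\n",
	       atomic_fetch_add(&crat_proximity_domain, 1) + 1);
	printf("GPU 1 -> domain %d\n",
	       atomic_fetch_add(&crat_proximity_domain, 1) + 1);
	return 0;
}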


* [PATCH 21/37] drm/amdkfd: Add topology support for dGPUs
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (19 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 20/37] drm/amdkfd: Add topology support for CPUs Felix Kuehling
@ 2017-12-09  4:08   ` Felix Kuehling
       [not found]     ` <1512792555-26042-22-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2017-12-09  4:09   ` [PATCH 22/37] drm/amdkfd: Add perf counters to topology Felix Kuehling
                     ` (7 subsequent siblings)
  28 siblings, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:08 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Jay Cornwall, Amber Lin, Ben Goz, Felix Kuehling,
	Harish Kasiviswanathan, Kent Russell

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Generate and parse VCRAT tables for dGPUs in kfd_topology_add_device.

Some information that isn't available in the CRAT table is patched
into the topology after parsing.

HSA_CAP_DOORBELL_TYPE_1_0 is dependent on the ASIC feature
CP_HQD_PQ_CONTROL.SLOT_BASED_WPTR, which was not introduced in VI
until Carrizo. Report HSA_CAP_DOORBELL_TYPE_PRE_1_0 on Tonga ASICs.
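The doorbell capability described above ends up as a two-bit field in the node's capability word (bits 12-13, per the new masks in kfd_topology.h). A small sketch of the encoding, with the constants copied from the patch and encode_doorbell_type a hypothetical helper:

#include <stdint.h>
#include <stdio.h>

#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK	0x00003000
#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT	12
#define HSA_CAP_DOORBELL_TYPE_PRE_1_0		0x0
#define HSA_CAP_DOORBELL_TYPE_1_0		0x1

static uint32_t encode_doorbell_type(uint32_t capability, uint32_t type)
{
	return capability | ((type << HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
			     HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
}

int main(void)
{
	/* Tonga lacks SLOT_BASED_WPTR, so it reports the pre-1.0 format */
	printf("tonga:   0x%08x\n",
	       encode_doorbell_type(0, HSA_CAP_DOORBELL_TYPE_PRE_1_0));
	printf("carrizo: 0x%08x\n",
	       encode_doorbell_type(0, HSA_CAP_DOORBELL_TYPE_1_0));
	return 0;
}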

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 594 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.h     |   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 188 ++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   8 +-
 5 files changed, 746 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 00732ec..ba7577b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -20,10 +20,117 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 #include <linux/acpi.h>
+#include <linux/amd-iommu.h>
 #include "kfd_crat.h"
 #include "kfd_priv.h"
 #include "kfd_topology.h"
 
+/* GPU Processor ID base for dGPUs for which VCRAT needs to be created.
+ * GPU processor IDs are expressed with Bit[31]=1.
+ * The base is set to 0x8000_0000 + 0x1000 to avoid collision with GPU IDs
+ * used in the CRAT.
+ */
+static uint32_t gpu_processor_id_low = 0x80001000;
+
+/* Return the next available gpu_processor_id and increment it for the
+ * next GPU
+ *	@total_cu_count - Total CUs present in the GPU including ones
+ *			  masked off
+ */
+static inline unsigned int get_and_inc_gpu_processor_id(
+				unsigned int total_cu_count)
+{
+	int current_id = gpu_processor_id_low;
+
+	gpu_processor_id_low += total_cu_count;
+	return current_id;
+}
+
+/* Static table to describe GPU Cache information */
+struct kfd_gpu_cache_info {
+	uint32_t	cache_size;
+	uint32_t	cache_level;
+	uint32_t	flags;
+	/* Indicates how many Compute Units share this cache
+	 * Value = 1 indicates the cache is not shared
+	 */
+	uint32_t	num_cu_shared;
+};
+
+static struct kfd_gpu_cache_info kaveri_cache_info[] = {
+	{
+		/* TCP L1 Cache per CU */
+		.cache_size = 16,
+		.cache_level = 1,
+		.flags = (CRAT_CACHE_FLAGS_ENABLED |
+				CRAT_CACHE_FLAGS_DATA_CACHE |
+				CRAT_CACHE_FLAGS_SIMD_CACHE),
+		.num_cu_shared = 1,
+
+	},
+	{
+		/* Scalar L1 Instruction Cache (in SQC module) per bank */
+		.cache_size = 16,
+		.cache_level = 1,
+		.flags = (CRAT_CACHE_FLAGS_ENABLED |
+				CRAT_CACHE_FLAGS_INST_CACHE |
+				CRAT_CACHE_FLAGS_SIMD_CACHE),
+		.num_cu_shared = 2,
+	},
+	{
+		/* Scalar L1 Data Cache (in SQC module) per bank */
+		.cache_size = 8,
+		.cache_level = 1,
+		.flags = (CRAT_CACHE_FLAGS_ENABLED |
+				CRAT_CACHE_FLAGS_DATA_CACHE |
+				CRAT_CACHE_FLAGS_SIMD_CACHE),
+		.num_cu_shared = 2,
+	},
+
+	/* TODO: Add L2 Cache information */
+};
+
+
+static struct kfd_gpu_cache_info carrizo_cache_info[] = {
+	{
+		/* TCP L1 Cache per CU */
+		.cache_size = 16,
+		.cache_level = 1,
+		.flags = (CRAT_CACHE_FLAGS_ENABLED |
+				CRAT_CACHE_FLAGS_DATA_CACHE |
+				CRAT_CACHE_FLAGS_SIMD_CACHE),
+		.num_cu_shared = 1,
+	},
+	{
+		/* Scalar L1 Instruction Cache (in SQC module) per bank */
+		.cache_size = 8,
+		.cache_level = 1,
+		.flags = (CRAT_CACHE_FLAGS_ENABLED |
+				CRAT_CACHE_FLAGS_INST_CACHE |
+				CRAT_CACHE_FLAGS_SIMD_CACHE),
+		.num_cu_shared = 4,
+	},
+	{
+		/* Scalar L1 Data Cache (in SQC module) per bank. */
+		.cache_size = 4,
+		.cache_level = 1,
+		.flags = (CRAT_CACHE_FLAGS_ENABLED |
+				CRAT_CACHE_FLAGS_DATA_CACHE |
+				CRAT_CACHE_FLAGS_SIMD_CACHE),
+		.num_cu_shared = 4,
+	},
+
+	/* TODO: Add L2 Cache information */
+};
+
+/* NOTE: In future if more information is added to struct kfd_gpu_cache_info
+ * the following ASICs may need a separate table.
+ */
+#define hawaii_cache_info kaveri_cache_info
+#define tonga_cache_info carrizo_cache_info
+#define fiji_cache_info  carrizo_cache_info
+#define polaris10_cache_info carrizo_cache_info
+#define polaris11_cache_info carrizo_cache_info
+
 static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
 		struct crat_subtype_computeunit *cu)
 {
@@ -44,7 +151,7 @@ static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
 	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
 	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
 	dev->node_props.wave_front_size = cu->wave_front_size;
-	dev->node_props.array_count = cu->num_arrays;
+	dev->node_props.array_count = cu->array_count;
 	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
 	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
 	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
@@ -94,9 +201,16 @@ static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem,
 			if (!props)
 				return -ENOMEM;
 
-			if (dev->node_props.cpu_cores_count == 0)
-				props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
-			else
+			/* We're on GPU node */
+			if (dev->node_props.cpu_cores_count == 0) {
+				/* APU */
+				if (mem->visibility_type == 0)
+					props->heap_type =
+						HSA_MEM_HEAP_TYPE_FB_PRIVATE;
+				/* dGPU */
+				else
+					props->heap_type = mem->visibility_type;
+			} else
 				props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
 
 			if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
@@ -128,13 +242,29 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
 	struct kfd_cache_properties *props;
 	struct kfd_topology_device *dev;
 	uint32_t id;
+	uint32_t total_num_of_cu;
 
 	id = cache->processor_id_low;
 
 	pr_debug("Found cache entry in CRAT table with processor_id=%d\n", id);
-	list_for_each_entry(dev, device_list, list)
-		if (id == dev->node_props.cpu_core_id_base ||
-		    id == dev->node_props.simd_id_base) {
+	list_for_each_entry(dev, device_list, list) {
+		total_num_of_cu = (dev->node_props.array_count *
+					dev->node_props.cu_per_simd_array);
+
+		/* Cache information in CRAT doesn't have proximity_domain
+		 * information as it is associated with a CPU core or GPU
+		 * Compute Unit. So map the cache using the CPU core ID or
+		 * SIMD (GPU) ID.
+		 * TODO: This works because currently we can safely assume
+		 *  that Compute Units are parsed before caches are parsed.
+		 *  In the future, remove this dependency
+		 */
+		if ((id >= dev->node_props.cpu_core_id_base &&
+			id <= dev->node_props.cpu_core_id_base +
+				dev->node_props.cpu_cores_count) ||
+			(id >= dev->node_props.simd_id_base &&
+			id < dev->node_props.simd_id_base +
+				total_num_of_cu)) {
 			props = kfd_alloc_struct(props);
 			if (!props)
 				return -ENOMEM;
@@ -146,6 +276,8 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
 			props->cachelines_per_tag = cache->lines_per_tag;
 			props->cache_assoc = cache->associativity;
 			props->cache_latency = cache->cache_latency;
+			memcpy(props->sibling_map, cache->sibling_map,
+					sizeof(props->sibling_map));
 
 			if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
 				props->cache_type |= HSA_CACHE_TYPE_DATA;
@@ -162,6 +294,7 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
 
 			break;
 		}
+	}
 
 	return 0;
 }
@@ -172,8 +305,8 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
 static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
 					struct list_head *device_list)
 {
-	struct kfd_iolink_properties *props;
-	struct kfd_topology_device *dev;
+	struct kfd_iolink_properties *props = NULL, *props2;
+	struct kfd_topology_device *dev, *cpu_dev;
 	uint32_t id_from;
 	uint32_t id_to;
 
@@ -192,11 +325,12 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
 			props->node_to = id_to;
 			props->ver_maj = iolink->version_major;
 			props->ver_min = iolink->version_minor;
+			props->iolink_type = iolink->io_interface_type;
 
-			/*
-			 * weight factor (derived from CDIR), currently always 1
-			 */
-			props->weight = 1;
+			if (props->iolink_type == CRAT_IOLINK_TYPE_PCIEXPRESS)
+				props->weight = 20;
+			else
+				props->weight = node_distance(id_from, id_to);
 
 			props->min_latency = iolink->minimum_latency;
 			props->max_latency = iolink->maximum_latency;
@@ -208,11 +342,29 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
 			dev->io_link_count++;
 			dev->node_props.io_links_count++;
 			list_add_tail(&props->list, &dev->io_link_props);
-
 			break;
 		}
 	}
 
+	/* CPU topology is created before GPUs are detected, so CPU->GPU
+	 * links are not built at that time. If a PCIe type is discovered, it
+	 * means a GPU is detected and we are adding GPU->CPU to the topology.
+	 * At this time, also add the corresponding CPU->GPU link.
+	 */
+	if (props && props->iolink_type == CRAT_IOLINK_TYPE_PCIEXPRESS) {
+		cpu_dev = kfd_topology_device_by_proximity_domain(id_to);
+		if (!cpu_dev)
+			return -ENODEV;
+		/* same everything but the other direction */
+		props2 = kmemdup(props, sizeof(*props2), GFP_KERNEL);
+		if (!props2)
+			return -ENOMEM;
+		props2->node_from = id_to;
+		props2->node_to = id_from;
+		props2->kobj = NULL;
+		cpu_dev->io_link_count++;
+		cpu_dev->node_props.io_links_count++;
+		list_add_tail(&props2->list, &cpu_dev->io_link_props);
+	}
+
 	return 0;
 }
 
@@ -338,6 +490,176 @@ int kfd_parse_crat_table(void *crat_image, struct list_head *device_list,
 	return ret;
 }
 
+/* Helper function. See kfd_fill_gpu_cache_info for parameter description */
+static int fill_in_pcache(struct crat_subtype_cache *pcache,
+				struct kfd_gpu_cache_info *pcache_info,
+				struct kfd_cu_info *cu_info,
+				int mem_available,
+				int cu_bitmask,
+				int cache_type, unsigned int cu_processor_id,
+				int cu_block)
+{
+	unsigned int cu_sibling_map_mask;
+	int first_active_cu;
+
+	/* First check if enough memory is available */
+	if (sizeof(struct crat_subtype_cache) > mem_available)
+		return -ENOMEM;
+
+	cu_sibling_map_mask = cu_bitmask;
+	cu_sibling_map_mask >>= cu_block;
+	cu_sibling_map_mask &=
+		((1 << pcache_info[cache_type].num_cu_shared) - 1);
+	first_active_cu = ffs(cu_sibling_map_mask);
+
+	/* A CU can be inactive. In case of a shared cache, find the first
+	 * active CU. In case of a non-shared cache, check if the CU is
+	 * inactive and, if so, skip it.
+	 */
+	if (first_active_cu) {
+		memset(pcache, 0, sizeof(struct crat_subtype_cache));
+		pcache->type = CRAT_SUBTYPE_CACHE_AFFINITY;
+		pcache->length = sizeof(struct crat_subtype_cache);
+		pcache->flags = pcache_info[cache_type].flags;
+		pcache->processor_id_low = cu_processor_id
+					 + (first_active_cu - 1);
+		pcache->cache_level = pcache_info[cache_type].cache_level;
+		pcache->cache_size = pcache_info[cache_type].cache_size;
+
+		/* Sibling map is w.r.t processor_id_low, so shift out
+		 * inactive CU
+		 */
+		cu_sibling_map_mask =
+			cu_sibling_map_mask >> (first_active_cu - 1);
+
+		pcache->sibling_map[0] = (uint8_t)(cu_sibling_map_mask & 0xFF);
+		pcache->sibling_map[1] =
+				(uint8_t)((cu_sibling_map_mask >> 8) & 0xFF);
+		pcache->sibling_map[2] =
+				(uint8_t)((cu_sibling_map_mask >> 16) & 0xFF);
+		pcache->sibling_map[3] =
+				(uint8_t)((cu_sibling_map_mask >> 24) & 0xFF);
+		return 0;
+	}
+	return 1;
+}
+
+/* kfd_fill_gpu_cache_info - Fill GPU cache info using kfd_gpu_cache_info
+ * tables
+ *
+ *	@kdev - [IN] GPU device
+ *	@gpu_processor_id - [IN] GPU processor ID to which these caches
+ *			    associate
+ *	@available_size - [IN] Amount of memory available in pcache
+ *	@cu_info - [IN] Compute Unit info obtained from KGD
+ *	@pcache - [OUT] memory into which cache data is to be filled in.
+ *	@size_filled - [OUT] amount of data used up in pcache.
+ *	@num_of_entries - [OUT] number of caches added
+ */
+static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
+			int gpu_processor_id,
+			int available_size,
+			struct kfd_cu_info *cu_info,
+			struct crat_subtype_cache *pcache,
+			int *size_filled,
+			int *num_of_entries)
+{
+	struct kfd_gpu_cache_info *pcache_info;
+	int num_of_cache_types = 0;
+	int i, j, k;
+	int ct = 0;
+	int mem_available = available_size;
+	unsigned int cu_processor_id;
+	int ret;
+
+	switch (kdev->device_info->asic_family) {
+	case CHIP_KAVERI:
+		pcache_info = kaveri_cache_info;
+		num_of_cache_types = ARRAY_SIZE(kaveri_cache_info);
+		break;
+	case CHIP_HAWAII:
+		pcache_info = hawaii_cache_info;
+		num_of_cache_types = ARRAY_SIZE(hawaii_cache_info);
+		break;
+	case CHIP_CARRIZO:
+		pcache_info = carrizo_cache_info;
+		num_of_cache_types = ARRAY_SIZE(carrizo_cache_info);
+		break;
+	case CHIP_TONGA:
+		pcache_info = tonga_cache_info;
+		num_of_cache_types = ARRAY_SIZE(tonga_cache_info);
+		break;
+	case CHIP_FIJI:
+		pcache_info = fiji_cache_info;
+		num_of_cache_types = ARRAY_SIZE(fiji_cache_info);
+		break;
+	case CHIP_POLARIS10:
+		pcache_info = polaris10_cache_info;
+		num_of_cache_types = ARRAY_SIZE(polaris10_cache_info);
+		break;
+	case CHIP_POLARIS11:
+		pcache_info = polaris11_cache_info;
+		num_of_cache_types = ARRAY_SIZE(polaris11_cache_info);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	*size_filled = 0;
+	*num_of_entries = 0;
+
+	/* For each type of cache listed in the kfd_gpu_cache_info table,
+	 * go through all available Compute Units.
+	 * If kfd_gpu_cache_info.num_cu_shared = 1, the [i,j,k] loop
+	 * visits every available CU; otherwise it considers only one
+	 * CU from each shared group.
+	 */
+
+	for (ct = 0; ct < num_of_cache_types; ct++) {
+		cu_processor_id = gpu_processor_id;
+		for (i = 0; i < cu_info->num_shader_engines; i++) {
+			for (j = 0; j < cu_info->num_shader_arrays_per_engine;
+				j++) {
+				for (k = 0; k < cu_info->num_cu_per_sh;
+					k += pcache_info[ct].num_cu_shared) {
+
+					ret = fill_in_pcache(pcache,
+						pcache_info,
+						cu_info,
+						mem_available,
+						cu_info->cu_bitmap[i][j],
+						ct,
+						cu_processor_id,
+						k);
+
+					if (ret < 0)
+						break;
+
+					if (!ret) {
+						pcache++;
+						(*num_of_entries)++;
+						mem_available -=
+							sizeof(*pcache);
+						(*size_filled) +=
+							sizeof(*pcache);
+					}
+
+					/* Move to next CU block */
+					cu_processor_id +=
+						pcache_info[ct].num_cu_shared;
+				}
+			}
+		}
+	}
+
+	pr_debug("Added [%d] GPU cache entries\n", *num_of_entries);
+
+	return 0;
+}
+
 /*
  * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
  * copies CRAT from ACPI (if available).
@@ -624,6 +946,239 @@ static int kfd_create_vcrat_image_cpu(void *pcrat_image, size_t *size)
 	return 0;
 }
 
+static int kfd_fill_gpu_memory_affinity(int *avail_size,
+		struct kfd_dev *kdev, uint8_t type, uint64_t size,
+		struct crat_subtype_memory *sub_type_hdr,
+		uint32_t proximity_domain,
+		const struct kfd_local_mem_info *local_mem_info)
+{
+	*avail_size -= sizeof(struct crat_subtype_memory);
+	if (*avail_size < 0)
+		return -ENOMEM;
+
+	memset((void *)sub_type_hdr, 0, sizeof(struct crat_subtype_memory));
+	sub_type_hdr->type = CRAT_SUBTYPE_MEMORY_AFFINITY;
+	sub_type_hdr->length = sizeof(struct crat_subtype_memory);
+	sub_type_hdr->flags |= CRAT_SUBTYPE_FLAGS_ENABLED;
+
+	sub_type_hdr->proximity_domain = proximity_domain;
+
+	pr_debug("Fill gpu memory affinity - type 0x%x size 0x%llx\n",
+			type, size);
+
+	sub_type_hdr->length_low = lower_32_bits(size);
+	sub_type_hdr->length_high = upper_32_bits(size);
+
+	sub_type_hdr->width = local_mem_info->vram_width;
+	sub_type_hdr->visibility_type = type;
+
+	return 0;
+}
+
+/* kfd_fill_gpu_direct_io_link - Fill in direct io link from GPU
+ * to its NUMA node
+ *	@avail_size: Available size in the memory
+ *	@kdev - [IN] GPU device
+ *	@sub_type_hdr: Memory into which io link info will be filled in
+ *	@proximity_domain - proximity domain of the GPU node
+ *
+ *	Return 0 if successful else return -ve value
+ */
+static int kfd_fill_gpu_direct_io_link(int *avail_size,
+			struct kfd_dev *kdev,
+			struct crat_subtype_iolink *sub_type_hdr,
+			uint32_t proximity_domain)
+{
+	*avail_size -= sizeof(struct crat_subtype_iolink);
+	if (*avail_size < 0)
+		return -ENOMEM;
+
+	memset((void *)sub_type_hdr, 0, sizeof(struct crat_subtype_iolink));
+
+	/* Fill in subtype header data */
+	sub_type_hdr->type = CRAT_SUBTYPE_IOLINK_AFFINITY;
+	sub_type_hdr->length = sizeof(struct crat_subtype_iolink);
+	sub_type_hdr->flags |= CRAT_SUBTYPE_FLAGS_ENABLED;
+
+	/* Fill in IOLINK subtype.
+	 * TODO: Fill-in other fields of iolink subtype
+	 */
+	sub_type_hdr->io_interface_type = CRAT_IOLINK_TYPE_PCIEXPRESS;
+	sub_type_hdr->proximity_domain_from = proximity_domain;
+#ifdef CONFIG_NUMA
+	if (kdev->pdev->dev.numa_node == NUMA_NO_NODE)
+		sub_type_hdr->proximity_domain_to = 0;
+	else
+		sub_type_hdr->proximity_domain_to = kdev->pdev->dev.numa_node;
+#else
+	sub_type_hdr->proximity_domain_to = 0;
+#endif
+	return 0;
+}
+
+/* kfd_create_vcrat_image_gpu - Create Virtual CRAT for GPU
+ *
+ *	@pcrat_image: Memory into which the GPU VCRAT is written
+ *	@size:	[IN] allocated size of crat_image.
+ *		[OUT] actual size of data filled in crat_image
+ */
+static int kfd_create_vcrat_image_gpu(void *pcrat_image,
+				      size_t *size, struct kfd_dev *kdev,
+				      uint32_t proximity_domain)
+{
+	struct crat_header *crat_table = (struct crat_header *)pcrat_image;
+	struct crat_subtype_generic *sub_type_hdr;
+	struct crat_subtype_computeunit *cu;
+	struct kfd_cu_info cu_info;
+	struct amd_iommu_device_info iommu_info;
+	int avail_size = *size;
+	uint32_t total_num_of_cu;
+	int num_of_cache_entries = 0;
+	int cache_mem_filled = 0;
+	int ret = 0;
+	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
+					 AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
+					 AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
+	struct kfd_local_mem_info local_mem_info;
+
+	if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
+		return -EINVAL;
+
+	/* Fill the CRAT Header.
+	 * Modify length and total_entries as subunits are added.
+	 */
+	avail_size -= sizeof(struct crat_header);
+	if (avail_size < 0)
+		return -ENOMEM;
+
+	memset(crat_table, 0, sizeof(struct crat_header));
+
+	memcpy(&crat_table->signature, CRAT_SIGNATURE,
+			sizeof(crat_table->signature));
+	/* Change length as we add more subtypes */
+	crat_table->length = sizeof(struct crat_header);
+	crat_table->num_domains = 1;
+	crat_table->total_entries = 0;
+
+	/* Fill in Subtype: Compute Unit
+	 * First fill in the sub type header and then sub type data
+	 */
+	avail_size -= sizeof(struct crat_subtype_computeunit);
+	if (avail_size < 0)
+		return -ENOMEM;
+
+	sub_type_hdr = (struct crat_subtype_generic *)(crat_table + 1);
+	memset(sub_type_hdr, 0, sizeof(struct crat_subtype_computeunit));
+
+	sub_type_hdr->type = CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY;
+	sub_type_hdr->length = sizeof(struct crat_subtype_computeunit);
+	sub_type_hdr->flags = CRAT_SUBTYPE_FLAGS_ENABLED;
+
+	/* Fill CU subtype data */
+	cu = (struct crat_subtype_computeunit *)sub_type_hdr;
+	cu->flags |= CRAT_CU_FLAGS_GPU_PRESENT;
+	cu->proximity_domain = proximity_domain;
+
+	kdev->kfd2kgd->get_cu_info(kdev->kgd, &cu_info);
+	cu->num_simd_per_cu = cu_info.simd_per_cu;
+	cu->num_simd_cores = cu_info.simd_per_cu * cu_info.cu_active_number;
+	cu->max_waves_simd = cu_info.max_waves_per_simd;
+
+	cu->wave_front_size = cu_info.wave_front_size;
+	cu->array_count = cu_info.num_shader_arrays_per_engine *
+		cu_info.num_shader_engines;
+	total_num_of_cu = (cu->array_count * cu_info.num_cu_per_sh);
+	cu->processor_id_low = get_and_inc_gpu_processor_id(total_num_of_cu);
+	cu->num_cu_per_array = cu_info.num_cu_per_sh;
+	cu->max_slots_scatch_cu = cu_info.max_scratch_slots_per_cu;
+	cu->num_banks = cu_info.num_shader_engines;
+	cu->lds_size_in_kb = cu_info.lds_size;
+
+	cu->hsa_capability = 0;
+
+	/* Check if this node supports IOMMU. During parsing this flag will
+	 * translate to HSA_CAP_ATS_PRESENT
+	 */
+	iommu_info.flags = 0;
+	if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
+		if ((iommu_info.flags & required_iommu_flags) ==
+				required_iommu_flags)
+			cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
+	}
+
+	crat_table->length += sub_type_hdr->length;
+	crat_table->total_entries++;
+
+	/* Fill in Subtype: Memory. Only on systems with large BAR (no
+	 * private FB), report memory as public. On other systems
+	 * report the total FB size (public+private) as a single
+	 * private heap.
+	 */
+	kdev->kfd2kgd->get_local_mem_info(kdev->kgd, &local_mem_info);
+	sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+			sub_type_hdr->length);
+
+	if (local_mem_info.local_mem_size_private == 0)
+		ret = kfd_fill_gpu_memory_affinity(&avail_size,
+				kdev, HSA_MEM_HEAP_TYPE_FB_PUBLIC,
+				local_mem_info.local_mem_size_public,
+				(struct crat_subtype_memory *)sub_type_hdr,
+				proximity_domain,
+				&local_mem_info);
+	else
+		ret = kfd_fill_gpu_memory_affinity(&avail_size,
+				kdev, HSA_MEM_HEAP_TYPE_FB_PRIVATE,
+				local_mem_info.local_mem_size_public +
+				local_mem_info.local_mem_size_private,
+				(struct crat_subtype_memory *)sub_type_hdr,
+				proximity_domain,
+				&local_mem_info);
+	if (ret < 0)
+		return ret;
+
+	crat_table->length += sizeof(struct crat_subtype_memory);
+	crat_table->total_entries++;
+
+	/* TODO: Fill in cache information. This information is NOT readily
+	 * available in KGD
+	 */
+	sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+		sub_type_hdr->length);
+	ret = kfd_fill_gpu_cache_info(kdev, cu->processor_id_low,
+				avail_size,
+				&cu_info,
+				(struct crat_subtype_cache *)sub_type_hdr,
+				&cache_mem_filled,
+				&num_of_cache_entries);
+
+	if (ret < 0)
+		return ret;
+
+	crat_table->length += cache_mem_filled;
+	crat_table->total_entries += num_of_cache_entries;
+	avail_size -= cache_mem_filled;
+
+	/* Fill in Subtype: IO_LINKS
+	 *  Only direct links are added here, i.e. the link from the GPU
+	 *  to its NUMA node. Indirect links are added by userspace.
+	 */
+	sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+		cache_mem_filled);
+	ret = kfd_fill_gpu_direct_io_link(&avail_size, kdev,
+		(struct crat_subtype_iolink *)sub_type_hdr, proximity_domain);
+
+	if (ret < 0)
+		return ret;
+
+	crat_table->length += sub_type_hdr->length;
+	crat_table->total_entries++;
+
+	*size = crat_table->length;
+	pr_info("Virtual CRAT table created for GPU\n");
+
+	return ret;
+}
+
 /* kfd_create_crat_image_virtual - Allocates memory for CRAT image and
  *		creates a Virtual CRAT (VCRAT) image
  *
@@ -667,9 +1222,14 @@ int kfd_create_crat_image_virtual(void **crat_image, size_t *size,
 		ret = kfd_create_vcrat_image_cpu(pcrat_image, size);
 		break;
 	case COMPUTE_UNIT_GPU:
-		/* TODO: */
-		ret = -EINVAL;
-		pr_err("VCRAT not implemented for dGPU\n");
+		if (!kdev)
+			return -EINVAL;
+		pcrat_image = kmalloc(VCRAT_SIZE_FOR_GPU, GFP_KERNEL);
+		if (!pcrat_image)
+			return -ENOMEM;
+		*size = VCRAT_SIZE_FOR_GPU;
+		ret = kfd_create_vcrat_image_gpu(pcrat_image, size, kdev,
+						 proximity_domain);
 		break;
 	case (COMPUTE_UNIT_CPU | COMPUTE_UNIT_GPU):
 		/* TODO: */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
index aaa43ab..c97979c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
@@ -109,7 +109,7 @@ struct crat_subtype_computeunit {
 	uint8_t		wave_front_size;
 	uint8_t		num_banks;
 	uint16_t	micro_engine_id;
-	uint8_t		num_arrays;
+	uint8_t		array_count;
 	uint8_t		num_cu_per_array;
 	uint8_t		num_simd_per_cu;
 	uint8_t		max_slots_scatch_cu;
@@ -137,7 +137,8 @@ struct crat_subtype_memory {
 	uint32_t	length_low;
 	uint32_t	length_high;
 	uint32_t	width;
-	uint8_t		reserved2[CRAT_MEMORY_RESERVED_LENGTH];
+	uint8_t		visibility_type; /* for virtual (dGPU) CRAT */
+	uint8_t		reserved2[CRAT_MEMORY_RESERVED_LENGTH - 1];
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index aeee9d4..f0327c2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -668,6 +668,8 @@ int kfd_topology_init(void);
 void kfd_topology_shutdown(void);
 int kfd_topology_add_device(struct kfd_dev *gpu);
 int kfd_topology_remove_device(struct kfd_dev *gpu);
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+						uint32_t proximity_domain);
 struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
 struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
 int kfd_topology_enum_kfd_devices(uint8_t idx, struct kfd_dev **kdev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 9aa6004..7fe7ee0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -43,6 +43,25 @@ static struct kfd_system_properties sys_props;
 static DECLARE_RWSEM(topology_lock);
 static atomic_t topology_crat_proximity_domain;
 
+struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
+						uint32_t proximity_domain)
+{
+	struct kfd_topology_device *top_dev;
+	struct kfd_topology_device *device = NULL;
+
+	down_read(&topology_lock);
+
+	list_for_each_entry(top_dev, &topology_device_list, list)
+		if (top_dev->proximity_domain == proximity_domain) {
+			device = top_dev;
+			break;
+		}
+
+	up_read(&topology_lock);
+
+	return device;
+}
+
 struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
 {
 	struct kfd_topology_device *top_dev;
@@ -79,6 +98,7 @@ struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
 	return device;
 }
 
+/* Called with write topology_lock acquired */
 static void kfd_release_topology_device(struct kfd_topology_device *dev)
 {
 	struct kfd_mem_properties *mem;
@@ -394,8 +414,7 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 		}
 
 		sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
-			dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(
-					dev->gpu->kgd));
+			dev->node_props.max_engine_clk_fcompute);
 
 		sysfs_show_64bit_prop(buffer, "local_mem_size",
 				(unsigned long long int) 0);
@@ -597,6 +616,7 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
 	return 0;
 }
 
+/* Called with write topology lock acquired */
 static int kfd_build_sysfs_node_tree(void)
 {
 	struct kfd_topology_device *dev;
@@ -613,6 +633,7 @@ static int kfd_build_sysfs_node_tree(void)
 	return 0;
 }
 
+/* Called with write topology lock acquired */
 static void kfd_remove_sysfs_node_tree(void)
 {
 	struct kfd_topology_device *dev;
@@ -908,19 +929,26 @@ static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
 
 	return hashout;
 }
-
+/* kfd_assign_gpu - Attach @gpu to the correct kfd topology device. If
+ *		the GPU device is not already present in the topology device
+ *		list then return NULL. This means a new topology device has to
+ *		be created for this GPU.
+ * TODO: Rather than assigning @gpu to the first topology device without
+ *		a GPU attached, it would be better to have a more stringent check.
+ */
 static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
 {
 	struct kfd_topology_device *dev;
 	struct kfd_topology_device *out_dev = NULL;
 
+	down_write(&topology_lock);
 	list_for_each_entry(dev, &topology_device_list, list)
 		if (!dev->gpu && (dev->node_props.simd_count > 0)) {
 			dev->gpu = gpu;
 			out_dev = dev;
 			break;
 		}
-
+	up_write(&topology_lock);
 	return out_dev;
 }
 
@@ -932,6 +960,45 @@ static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
 	 */
 }
 
+/* kfd_fill_mem_clk_max_info - Since CRAT doesn't have memory clock info,
+ *		patch this after CRAT parsing.
+ */
+static void kfd_fill_mem_clk_max_info(struct kfd_topology_device *dev)
+{
+	struct kfd_mem_properties *mem;
+	struct kfd_local_mem_info local_mem_info;
+
+	if (!dev)
+		return;
+
+	/* Currently, the amdgpu driver (amdgpu_mc) deals only with GPUs
+	 * with a single bank of VRAM local memory.
+	 * For dGPUs - VCRAT reports only one bank of local memory
+	 * For APUs - If CRAT from ACPI reports more than one bank, then
+	 *	all the banks will report the same mem_clk_max information
+	 */
+	dev->gpu->kfd2kgd->get_local_mem_info(dev->gpu->kgd,
+		&local_mem_info);
+
+	list_for_each_entry(mem, &dev->mem_props, list)
+		mem->mem_clk_max = local_mem_info.mem_clk_max;
+}
+
+static void kfd_fill_iolink_non_crat_info(struct kfd_topology_device *dev)
+{
+	struct kfd_iolink_properties *link;
+
+	if (!dev || !dev->gpu)
+		return;
+
+	/* The GPU only creates direct links, so apply the flags to all of them */
+	if (dev->gpu->device_info->asic_family == CHIP_HAWAII)
+		list_for_each_entry(link, &dev->io_link_props, list)
+			link->flags = CRAT_IOLINK_FLAGS_ENABLED |
+				CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
+				CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
+}
+
 int kfd_topology_add_device(struct kfd_dev *gpu)
 {
 	uint32_t gpu_id;
@@ -939,6 +1006,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 	struct kfd_cu_info cu_info;
 	int res = 0;
 	struct list_head temp_topology_device_list;
+	void *crat_image = NULL;
+	size_t image_size = 0;
+	int proximity_domain;
 
 	INIT_LIST_HEAD(&temp_topology_device_list);
 
@@ -946,27 +1016,33 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 
 	pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
 
-	/*
-	 * Try to assign the GPU to existing topology device (generated from
-	 * CRAT table
+	proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
+
+	/* Check to see if this gpu device exists in the topology_device_list.
+	 * If so, assign the gpu to that device,
+	 * else create a Virtual CRAT for this gpu device and then parse that
+	 * CRAT to create a new topology device. Once created, assign the GPU
+	 * to that topology device
 	 */
 	dev = kfd_assign_gpu(gpu);
 	if (!dev) {
-		pr_info("GPU was not found in the current topology. Extending.\n");
-		kfd_debug_print_topology();
-		dev = kfd_create_topology_device(&temp_topology_device_list);
-		if (!dev) {
-			res = -ENOMEM;
+		res = kfd_create_crat_image_virtual(&crat_image, &image_size,
+						    COMPUTE_UNIT_GPU, gpu,
+						    proximity_domain);
+		if (res) {
+			pr_err("Error creating VCRAT for GPU (ID: 0x%x)\n",
+			       gpu_id);
+			return res;
+		}
+		res = kfd_parse_crat_table(crat_image,
+					   &temp_topology_device_list,
+					   proximity_domain);
+		if (res) {
+			pr_err("Error parsing VCRAT for GPU (ID: 0x%x)\n",
+			       gpu_id);
 			goto err;
 		}
 
-		dev->gpu = gpu;
-
-		/*
-		 * TODO: Make a call to retrieve topology information from the
-		 * GPU vBIOS
-		 */
-
 		down_write(&topology_lock);
 		kfd_topology_update_device_list(&temp_topology_device_list,
 			&topology_device_list);
@@ -974,34 +1050,86 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		/* Update the SYSFS tree, since we added another topology
 		 * device
 		 */
-		if (kfd_topology_update_sysfs() < 0)
-			kfd_topology_release_sysfs();
-
+		res = kfd_topology_update_sysfs();
 		up_write(&topology_lock);
 
+		if (!res)
+			sys_props.generation_count++;
+		else
+			pr_err("Failed to update GPU (ID: 0x%x) to sysfs topology. res=%d\n",
+						gpu_id, res);
+		dev = kfd_assign_gpu(gpu);
+		if (WARN_ON(!dev)) {
+			res = -ENODEV;
+			goto err;
+		}
 	}
 
 	dev->gpu_id = gpu_id;
 	gpu->id = gpu_id;
+
+	/* TODO: Move the following lines to function
+	 *	kfd_add_non_crat_information
+	 */
+
+	/* Fill-in additional information that is not available in CRAT but
+	 * needed for the topology
+	 */
+
 	dev->gpu->kfd2kgd->get_cu_info(dev->gpu->kgd, &cu_info);
-	dev->node_props.simd_count = dev->node_props.simd_per_cu *
-			cu_info.cu_active_number;
+	dev->node_props.simd_arrays_per_engine =
+		cu_info.num_shader_arrays_per_engine;
+
 	dev->node_props.vendor_id = gpu->pdev->vendor;
 	dev->node_props.device_id = gpu->pdev->device;
 	dev->node_props.location_id = PCI_DEVID(gpu->pdev->bus->number,
 		gpu->pdev->devfn);
-	/*
-	 * TODO: Retrieve max engine clock values from KGD
-	 */
-
-	if (dev->gpu->device_info->asic_family == CHIP_CARRIZO) {
-		dev->node_props.capability |= HSA_CAP_DOORBELL_PACKET_TYPE;
+	dev->node_props.max_engine_clk_fcompute =
+		dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
+	dev->node_props.max_engine_clk_ccompute =
+		cpufreq_quick_get_max(0) / 1000;
+
+	kfd_fill_mem_clk_max_info(dev);
+	kfd_fill_iolink_non_crat_info(dev);
+
+	switch (dev->gpu->device_info->asic_family) {
+	case CHIP_KAVERI:
+	case CHIP_HAWAII:
+	case CHIP_TONGA:
+		dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_PRE_1_0 <<
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+		break;
+	case CHIP_CARRIZO:
+	case CHIP_FIJI:
+	case CHIP_POLARIS10:
+	case CHIP_POLARIS11:
 		pr_debug("Adding doorbell packet type capability\n");
+		dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_1_0 <<
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+		break;
+	default:
+		WARN(1, "Unexpected ASIC family %u",
+		     dev->gpu->device_info->asic_family);
 	}
 
+	/* Fix errors in CZ CRAT.
+	 * simd_count: Carrizo CRAT reports wrong simd_count, probably
+	 *		because it doesn't consider masked out CUs
+	 * capability flag: Carrizo CRAT doesn't report IOMMU
+	 *		flags. TODO: Fix this.
+	 */
+	if (dev->gpu->device_info->asic_family == CHIP_CARRIZO)
+		dev->node_props.simd_count =
+			cu_info.simd_per_cu * cu_info.cu_active_number;
+
+	kfd_debug_print_topology();
+
 	if (!res)
 		kfd_notify_gpu_change(gpu_id, 1);
 err:
+	kfd_destroy_crat_image(crat_image);
 	return res;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 8668189..55de56f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -39,8 +39,12 @@
 #define HSA_CAP_WATCH_POINTS_SUPPORTED		0x00000080
 #define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK	0x00000f00
 #define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT	8
-#define HSA_CAP_RESERVED			0xfffff000
-#define HSA_CAP_DOORBELL_PACKET_TYPE		0x00001000
+#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK	0x00003000
+#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT	12
+#define HSA_CAP_RESERVED			0xffffc000
+
+#define HSA_CAP_DOORBELL_TYPE_PRE_1_0		0x0
+#define HSA_CAP_DOORBELL_TYPE_1_0		0x1
 
 struct kfd_node_properties {
 	uint32_t cpu_cores_count;
-- 
2.7.4
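To see what fill_in_pcache computes, consider one SQC bank shared by four CUs on a chip where CU 0 is harvested. Below is a self-contained user-space rework of the mask math, for illustration only (the variable names follow the patch, the numbers are a made-up example):

#include <stdint.h>
#include <stdio.h>
#include <strings.h>	/* ffs() */

int main(void)
{
	uint32_t cu_bitmap = 0x0000fffe;  /* CU 0 masked off, 15 active */
	int num_cu_shared = 4;		  /* e.g. one SQC shared by 4 CUs */
	int cu_block = 0;		  /* first block of 4 CUs */

	/* Select the bits belonging to this cache's CU block */
	uint32_t mask = (cu_bitmap >> cu_block) &
			((1u << num_cu_shared) - 1);
	int first_active_cu = ffs(mask);

	if (first_active_cu) {
		/* sibling map is relative to the first active CU */
		mask >>= first_active_cu - 1;
		printf("processor_id offset %d, sibling_map[0]=0x%02x\n",
		       first_active_cu - 1, mask & 0xff);
	}
	return 0;
}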


* [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (20 preceding siblings ...)
  2017-12-09  4:08   ` [PATCH 21/37] drm/amdkfd: Add topology support for dGPUs Felix Kuehling
@ 2017-12-09  4:09   ` Felix Kuehling
       [not found]     ` <1512792555-26042-23-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2017-12-09  4:09   ` [PATCH 23/37] drm/amdkfd: Fixup incorrect info in the CZ CRAT table Felix Kuehling
                     ` (6 subsequent siblings)
  28 siblings, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Amber Lin, Felix Kuehling, Kent Russell

From: Amber Lin <Amber.Lin@amd.com>

KFD provides support for privileged hardware blocks whose performance
counters are accessed via MMIO registers; the IOMMU is one such block.
Most performance counter properties required by the Thunk are available
at /sys/bus/event_source/devices/amd_iommu. This patch adds properties
to the topology in KFD sysfs for information that is not available
there. They are shown as
/sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>,
e.g. /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
For dGPUs, which don't have an IOMMU, nothing appears under
/sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
 2 files changed, 127 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 7fe7ee0..52d20f5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
 	struct kfd_mem_properties *mem;
 	struct kfd_cache_properties *cache;
 	struct kfd_iolink_properties *iolink;
+	struct kfd_perf_properties *perf;
 
 	list_del(&dev->list);
 
@@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
 		kfree(iolink);
 	}
 
+	while (dev->perf_props.next != &dev->perf_props) {
+		perf = container_of(dev->perf_props.next,
+				struct kfd_perf_properties, list);
+		list_del(&perf->list);
+		kfree(perf);
+	}
+
 	kfree(dev);
 }
 
@@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
 	INIT_LIST_HEAD(&dev->mem_props);
 	INIT_LIST_HEAD(&dev->cache_props);
 	INIT_LIST_HEAD(&dev->io_link_props);
+	INIT_LIST_HEAD(&dev->perf_props);
 
 	list_add_tail(&dev->list, device_list);
 
@@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
 	.sysfs_ops = &cache_ops,
 };
 
+/****** Sysfs of Performance Counters ******/
+
+struct kfd_perf_attr {
+	struct kobj_attribute attr;
+	uint32_t data;
+};
+
+static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
+			char *buf)
+{
+	struct kfd_perf_attr *attr;
+
+	buf[0] = 0;
+	attr = container_of(attrs, struct kfd_perf_attr, attr);
+	if (!attr->data) /* invalid data for PMC */
+		return 0;
+	else
+		return sysfs_show_32bit_val(buf, attr->data);
+}
+
+#define KFD_PERF_DESC(_name, _data)			\
+{							\
+	.attr  = __ATTR(_name, 0444, perf_show, NULL),	\
+	.data = _data,					\
+}
+
+static struct kfd_perf_attr perf_attr_iommu[] = {
+	KFD_PERF_DESC(max_concurrent, 0),
+	KFD_PERF_DESC(num_counters, 0),
+	KFD_PERF_DESC(counter_ids, 0),
+};
+/****************************************/
+
 static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 		char *buffer)
 {
@@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
 	struct kfd_iolink_properties *iolink;
 	struct kfd_cache_properties *cache;
 	struct kfd_mem_properties *mem;
+	struct kfd_perf_properties *perf;
 
 	if (dev->kobj_iolink) {
 		list_for_each_entry(iolink, &dev->io_link_props, list)
@@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
 		dev->kobj_mem = NULL;
 	}
 
+	if (dev->kobj_perf) {
+		list_for_each_entry(perf, &dev->perf_props, list) {
+			kfree(perf->attr_group);
+			perf->attr_group = NULL;
+		}
+		kobject_del(dev->kobj_perf);
+		kobject_put(dev->kobj_perf);
+		dev->kobj_perf = NULL;
+	}
+
 	if (dev->kobj_node) {
 		sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
 		sysfs_remove_file(dev->kobj_node, &dev->attr_name);
@@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
 	struct kfd_iolink_properties *iolink;
 	struct kfd_cache_properties *cache;
 	struct kfd_mem_properties *mem;
+	struct kfd_perf_properties *perf;
 	int ret;
-	uint32_t i;
+	uint32_t i, num_attrs;
+	struct attribute **attrs;
 
 	if (WARN_ON(dev->kobj_node))
 		return -EEXIST;
@@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
 	if (!dev->kobj_iolink)
 		return -ENOMEM;
 
+	dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
+	if (!dev->kobj_perf)
+		return -ENOMEM;
+
 	/*
 	 * Creating sysfs files for node properties
 	 */
@@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
 		if (ret < 0)
 			return ret;
 		i++;
-}
+	}
+
+	/* All hardware blocks have the same number of attributes. */
+	num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
+	list_for_each_entry(perf, &dev->perf_props, list) {
+		perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
+			* num_attrs + sizeof(struct attribute_group),
+			GFP_KERNEL);
+		if (!perf->attr_group)
+			return -ENOMEM;
+
+		attrs = (struct attribute **)(perf->attr_group + 1);
+		if (!strcmp(perf->block_name, "iommu")) {
+		/* The IOMMU's num_counters and counter_ids are already
+		 * exposed under /sys/bus/event_source/devices/amd_iommu,
+		 * so we don't duplicate them here.
+		 */
+			perf_attr_iommu[0].data = perf->max_concurrent;
+			for (i = 0; i < num_attrs; i++)
+				attrs[i] = &perf_attr_iommu[i].attr.attr;
+		}
+		perf->attr_group->name = perf->block_name;
+		perf->attr_group->attrs = attrs;
+		ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
+		if (ret < 0)
+			return ret;
+	}
 
 	return 0;
 }
@@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
 		}
 	}
 }
+
+/*
+ * Performance counter information is not part of CRAT, but we expose it in
+ * sysfs under the topology directory so the Thunk can retrieve it.
+ * This function is called before updating sysfs.
+ */
+static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
+{
+	struct kfd_perf_properties *props;
+
+	if (amd_iommu_pc_supported()) {
+		props = kfd_alloc_struct(props);
+		if (!props)
+			return -ENOMEM;
+		strcpy(props->block_name, "iommu");
+		props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
+			amd_iommu_pc_get_max_counters(0); /* assume one iommu */
+		list_add_tail(&props->list, &kdev->perf_props);
+	}
+
+	return 0;
+}
+
 /* kfd_add_non_crat_information - Add information that is not currently
  *	defined in CRAT but is necessary for KFD topology
  * @dev - topology device to which addition info is added
@@ -860,6 +968,10 @@ int kfd_topology_init(void)
 		}
 	}
 
+	kdev = list_first_entry(&temp_topology_device_list,
+				struct kfd_topology_device, list);
+	kfd_add_perf_to_topology(kdev);
+
 	down_write(&topology_lock);
 	kfd_topology_update_device_list(&temp_topology_device_list,
 					&topology_device_list);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 55de56f..b9f3142 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -134,6 +134,13 @@ struct kfd_iolink_properties {
 	struct attribute	attr;
 };
 
+struct kfd_perf_properties {
+	struct list_head	list;
+	char			block_name[16];
+	uint32_t		max_concurrent;
+	struct attribute_group	*attr_group;
+};
+
 struct kfd_topology_device {
 	struct list_head		list;
 	uint32_t			gpu_id;
@@ -144,11 +151,13 @@ struct kfd_topology_device {
 	struct list_head		cache_props;
 	uint32_t			io_link_count;
 	struct list_head		io_link_props;
+	struct list_head		perf_props;
 	struct kfd_dev			*gpu;
 	struct kobject			*kobj_node;
 	struct kobject			*kobj_mem;
 	struct kobject			*kobj_cache;
 	struct kobject			*kobj_iolink;
+	struct kobject			*kobj_perf;
 	struct attribute		attr_gpuid;
 	struct attribute		attr_name;
 	struct attribute		attr_props;
@@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
 		struct list_head *device_list);
 void kfd_release_topology_device_list(struct list_head *device_list);
 
+extern bool amd_iommu_pc_supported(void);
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+
 #endif /* __KFD_TOPOLOGY_H__ */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 62+ messages in thread
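
A minimal sketch of how user mode might read the new perf properties;
the node index 0 and the helper itself are illustrative assumptions
(real code would enumerate the nodes directory):

    #include <stdio.h>

    /* Read the iommu/max_concurrent property added by this patch.
     * Returns 0 on success. On dGPU-only systems the perf directory
     * is empty and fopen() fails, which is expected.
     */
    static int read_iommu_max_concurrent(unsigned int *val)
    {
            FILE *f = fopen("/sys/devices/virtual/kfd/kfd/topology"
                            "/nodes/0/perf/iommu/max_concurrent", "r");
            int ret = 0;

            if (!f)
                    return -1;
            if (fscanf(f, "%u", val) != 1)
                    ret = -1;
            fclose(f);
            return ret;
    }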

* [PATCH 23/37] drm/amdkfd: Fixup incorrect info in the CZ CRAT table
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (21 preceding siblings ...)
  2017-12-09  4:09   ` [PATCH 22/37] drm/amdkfd: Add perf counters to topology Felix Kuehling
@ 2017-12-09  4:09   ` Felix Kuehling
  2017-12-09  4:09   ` [PATCH 24/37] drm/amdkfd: Add AQL Queue Memory flag on topology Felix Kuehling
                     ` (5 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Philip Cox, Felix Kuehling, Kent Russell

From: Philip Cox <Philip.Cox@amd.com>

* Wrong value for max_waves_per_simd
* Missing ATC capability bit

Signed-off-by: Philip Cox <Philip.Cox@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 52d20f5..80bc71d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1229,12 +1229,15 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 	/* Fix errors in CZ CRAT.
 	 * simd_count: Carrizo CRAT reports wrong simd_count, probably
 	 *		because it doesn't consider masked out CUs
-	 * capability flag: Carrizo CRAT doesn't report IOMMU
-	 *		flags. TODO: Fix this.
+	 * max_waves_per_simd: Carrizo reports wrong max_waves_per_simd
+	 * capability flag: Carrizo CRAT doesn't report IOMMU flags
 	 */
-	if (dev->gpu->device_info->asic_family == CHIP_CARRIZO)
+	if (dev->gpu->device_info->asic_family == CHIP_CARRIZO) {
 		dev->node_props.simd_count =
 			cu_info.simd_per_cu * cu_info.cu_active_number;
+		dev->node_props.max_waves_per_simd = 10;
+		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
+	}
 
 	kfd_debug_print_topology();
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 24/37] drm/amdkfd: Add AQL Queue Memory flag on topology
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (22 preceding siblings ...)
  2017-12-09  4:09   ` [PATCH 23/37] drm/amdkfd: Fixup incorrect info in the CZ CRAT table Felix Kuehling
@ 2017-12-09  4:09   ` Felix Kuehling
  2017-12-09  4:09   ` [PATCH 25/37] drm/amdkfd: Module option to disable CRAT table Felix Kuehling
                     ` (4 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Ben Goz, Felix Kuehling

From: Ben Goz <ben.goz@amd.com>

This is needed for enabling a user-mode workaround for an AQL queue
wrapping HW bug on Tonga.

Signed-off-by: Ben Goz <ben.goz@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 80bc71d..e7daf2c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -455,6 +455,10 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 				HSA_CAP_WATCH_POINTS_TOTALBITS_MASK);
 		}
 
+		if (dev->gpu->device_info->asic_family == CHIP_TONGA)
+			dev->node_props.capability |=
+					HSA_CAP_AQL_QUEUE_DOUBLE_MAP;
+
 		sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
 			dev->node_props.max_engine_clk_fcompute);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index b9f3142..53fca1f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -45,6 +45,7 @@
 
 #define HSA_CAP_DOORBELL_TYPE_PRE_1_0		0x0
 #define HSA_CAP_DOORBELL_TYPE_1_0		0x1
+#define HSA_CAP_AQL_QUEUE_DOUBLE_MAP		0x00004000
 
 struct kfd_node_properties {
 	uint32_t cpu_cores_count;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 62+ messages in thread
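
A sketch of the user-mode check this flag enables. The capability value
would be read from the node's properties file; everything except the
HSA_CAP_AQL_QUEUE_DOUBLE_MAP define is an illustrative assumption:

    #include <stdint.h>

    #define HSA_CAP_AQL_QUEUE_DOUBLE_MAP 0x00004000

    /* When set (Tonga), user mode maps the AQL queue buffer twice,
     * back to back, so a wrapping read/write pointer always targets
     * valid addresses despite the HW bug.
     */
    static int needs_aql_double_map(uint32_t capability)
    {
            return (capability & HSA_CAP_AQL_QUEUE_DOUBLE_MAP) != 0;
    }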

* [PATCH 25/37] drm/amdkfd: Module option to disable CRAT table
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (23 preceding siblings ...)
  2017-12-09  4:09   ` [PATCH 24/37] drm/amdkfd: Add AQL Queue Memory flag on topology Felix Kuehling
@ 2017-12-09  4:09   ` Felix Kuehling
  2017-12-09  4:09   ` [PATCH 26/37] drm/amdkfd: Ignore ACPI CRAT for non-APU systems Felix Kuehling
                     ` (3 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling

Some systems have broken CRAT tables. Add a module option to ignore
the CRAT table.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c   | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index ba7577b..a028623 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -694,6 +694,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
 		return -EINVAL;
 	}
 
+	if (ignore_crat) {
+		pr_info("CRAT table disabled by module option\n");
+		return -ENODATA;
+	}
+
 	pcrat_image = kmalloc(crat_table->length, GFP_KERNEL);
 	if (!pcrat_image)
 		return -ENOMEM;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index f50e494..3ac72be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -69,6 +69,11 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
 	"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = enable)");
 
+int ignore_crat;
+module_param(ignore_crat, int, 0444);
+MODULE_PARM_DESC(ignore_crat,
+	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
+
 static int amdkfd_init_completed;
 
 int kgd2kfd_init(unsigned int interface_version,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f0327c2..6a48d29 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -104,6 +104,12 @@ extern int cwsr_enable;
  */
 extern int send_sigterm;
 
+/*
+ * Ignore CRAT table during KFD initialization, can be used to work around
+ * broken CRAT tables on some AMD systems
+ */
+extern int ignore_crat;
+
 /**
  * enum kfd_sched_policy
  *
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 26/37] drm/amdkfd: Ignore ACPI CRAT for non-APU systems
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (24 preceding siblings ...)
  2017-12-09  4:09   ` [PATCH 25/37] drm/amdkfd: Module option to disable CRAT table Felix Kuehling
@ 2017-12-09  4:09   ` Felix Kuehling
  2017-12-09  4:09   ` [PATCH 27/37] drm/amdgpu: Add support for reporting VRAM usage Felix Kuehling
                     ` (2 subsequent siblings)
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Some AMD motherboards without an APU have a broken CRAT table which
causes KFD initialization failures or incorrect information about
NUMA nodes, CPU cores or system memory. Ignore CRAT tables without
GPUs and rely on KFD's code to create a CRAT table for the CPU.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index e7daf2c..7f0d41e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -904,6 +904,25 @@ static void kfd_add_non_crat_information(struct kfd_topology_device *kdev)
 	/* TODO: For GPU node, rearrange code from kfd_topology_add_device */
 }
 
+/* kfd_is_acpi_crat_invalid - CRAT from ACPI is valid only for AMD APU devices.
+ *	Ignore CRAT for all other devices. AMD APU is identified if both CPU
+ *	and GPU cores are present.
+ * @device_list - topology device list created by parsing ACPI CRAT table.
+ * @return - TRUE if invalid, FALSE if valid.
+ */
+static bool kfd_is_acpi_crat_invalid(struct list_head *device_list)
+{
+	struct kfd_topology_device *dev;
+
+	list_for_each_entry(dev, device_list, list) {
+		if (dev->node_props.cpu_cores_count &&
+			dev->node_props.simd_count)
+			return false;
+	}
+	pr_info("Ignoring ACPI CRAT on non-APU system\n");
+	return true;
+}
+
 int kfd_topology_init(void)
 {
 	void *crat_image = NULL;
@@ -936,7 +955,7 @@ int kfd_topology_init(void)
 
 	/*
 	 * Get the CRAT image from the ACPI. If ACPI doesn't have one
-	 * create a virtual CRAT.
+	 * or if ACPI CRAT is invalid create a virtual CRAT.
 	 * NOTE: The current implementation expects all AMD APUs to have
 	 *	CRAT. If no CRAT is available, it is assumed to be a CPU
 	 */
@@ -945,7 +964,8 @@ int kfd_topology_init(void)
 		ret = kfd_parse_crat_table(crat_image,
 					   &temp_topology_device_list,
 					   proximity_domain);
-		if (ret) {
+		if (ret ||
+		    kfd_is_acpi_crat_invalid(&temp_topology_device_list)) {
 			kfd_release_topology_device_list(
 				&temp_topology_device_list);
 			kfd_destroy_crat_image(crat_image);
@@ -971,7 +991,6 @@ int kfd_topology_init(void)
 			goto err;
 		}
 	}
-
 	kdev = list_first_entry(&temp_topology_device_list,
 				struct kfd_topology_device, list);
 	kfd_add_perf_to_topology(kdev);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 27/37] drm/amdgpu: Add support for reporting VRAM usage
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (25 preceding siblings ...)
  2017-12-09  4:09   ` [PATCH 26/37] drm/amdkfd: Ignore ACPI CRAT for non-APU systems Felix Kuehling
@ 2017-12-09  4:09   ` Felix Kuehling
  2017-12-09  4:09   ` [PATCH 28/37] drm/amdkfd: Add support for displaying " Felix Kuehling
  2017-12-10 10:26   ` [PATCH 00/37] KFD dGPU topology and initialization Oded Gabbay
  28 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Kent Russell

From: Kent Russell <kent.russell@amd.com>

Add functions to report the vram_usage from the amdgpu_device

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 7 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 3 ++-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   | 3 +++
 5 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 972ecf0..51284bf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -314,3 +314,10 @@ void get_cu_info(struct kgd_dev *kgd, struct kfd_cu_info *cu_info)
 	cu_info->max_scratch_slots_per_cu = acu_info.max_scratch_slots_per_cu;
 	cu_info->lds_size = acu_info.lds_size;
 }
+
+uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
+
+	return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index eed7dea..2a519f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -62,6 +62,7 @@ uint64_t get_gpu_clock_counter(struct kgd_dev *kgd);
 
 uint32_t get_max_engine_clock_in_mhz(struct kgd_dev *kgd);
 void get_cu_info(struct kgd_dev *kgd, struct kfd_cu_info *cu_info);
+uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd);
 
 #define read_user_wptr(mmptr, wptr, dst)				\
 	({								\
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index c9e2fbe..3d60e1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -200,7 +200,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
-	.get_cu_info = get_cu_info
+	.get_cu_info = get_cu_info,
+	.get_vram_usage = amdgpu_amdkfd_get_vram_usage
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 72ff646..66b513e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -161,7 +161,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
-	.get_cu_info = get_cu_info
+	.get_cu_info = get_cu_info,
+	.get_vram_usage = amdgpu_amdkfd_get_vram_usage
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 0899cee..a6752bd 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -177,6 +177,8 @@ struct tile_config {
  *
  * @get_cu_info: Retrieves activated cu info
  *
+ * @get_vram_usage: Returns current VRAM usage
+ *
  * This structure contains function pointers to services that the kgd driver
  * provides to amdkfd driver.
  *
@@ -267,6 +269,7 @@ struct kfd2kgd_calls {
 
 	void (*get_cu_info)(struct kgd_dev *kgd,
 			struct kfd_cu_info *cu_info);
+	uint64_t (*get_vram_usage)(struct kgd_dev *kgd);
 };
 
 /**
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (26 preceding siblings ...)
  2017-12-09  4:09   ` [PATCH 27/37] drm/amdgpu: Add support for reporting VRAM usage Felix Kuehling
@ 2017-12-09  4:09   ` Felix Kuehling
       [not found]     ` <1512792555-26042-29-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2017-12-10 10:26   ` [PATCH 00/37] KFD dGPU topology and initialization Oded Gabbay
  28 siblings, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Felix Kuehling, Kent Russell

From: Kent Russell <kent.russell@amd.com>

Add a sysfs file in topology (node/x/memory_banks/X/used_memory) that
reports the current VRAM usage for that node. Only works for GPU nodes
at this time.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49 +++++++++++++++++++++++++++----
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
 2 files changed, 46 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 7f0d41e..7f04038 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -186,6 +186,8 @@ struct kfd_topology_device *kfd_create_topology_device(
 		sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
 #define sysfs_show_32bit_val(buffer, value) \
 		sysfs_show_gen_prop(buffer, "%u\n", value)
+#define sysfs_show_64bit_val(buffer, value) \
+		sysfs_show_gen_prop(buffer, "%llu\n", value)
 #define sysfs_show_str_val(buffer, value) \
 		sysfs_show_gen_prop(buffer, "%s\n", value)
 
@@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
 {
 	ssize_t ret;
 	struct kfd_mem_properties *mem;
+	uint64_t used_mem;
 
 	/* Making sure that the buffer is an empty string */
 	buffer[0] = 0;
 
-	mem = container_of(attr, struct kfd_mem_properties, attr);
+	if (strcmp(attr->name, "used_memory") == 0) {
+		mem = container_of(attr, struct kfd_mem_properties, attr_used);
+		if (mem->gpu) {
+			used_mem = mem->gpu->kfd2kgd->get_vram_usage(
+								mem->gpu->kgd);
+			return sysfs_show_64bit_val(buffer, used_mem);
+		}
+		/* TODO: Report APU/CPU-allocated memory; For now return 0 */
+		return 0;
+	}
+
+	mem = container_of(attr, struct kfd_mem_properties, attr_props);
 	sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
 	sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
 	sysfs_show_32bit_prop(buffer, "flags", mem->flags);
@@ -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
 	if (dev->kobj_mem) {
 		list_for_each_entry(mem, &dev->mem_props, list)
 			if (mem->kobj) {
-				kfd_remove_sysfs_file(mem->kobj, &mem->attr);
+				/* TODO: Remove when CPU/APU supported */
+				if (dev->node_props.cpu_cores_count == 0)
+					sysfs_remove_file(mem->kobj,
+							&mem->attr_used);
+				kfd_remove_sysfs_file(mem->kobj,
+						&mem->attr_props);
 				mem->kobj = NULL;
 			}
 		kobject_del(dev->kobj_mem);
@@ -629,12 +648,23 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
 		if (ret < 0)
 			return ret;
 
-		mem->attr.name = "properties";
-		mem->attr.mode = KFD_SYSFS_FILE_MODE;
-		sysfs_attr_init(&mem->attr);
-		ret = sysfs_create_file(mem->kobj, &mem->attr);
+		mem->attr_props.name = "properties";
+		mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&mem->attr_props);
+		ret = sysfs_create_file(mem->kobj, &mem->attr_props);
 		if (ret < 0)
 			return ret;
+
+		/* TODO: Support APU/CPU memory usage */
+		if (dev->node_props.cpu_cores_count == 0) {
+			mem->attr_used.name = "used_memory";
+			mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
+			sysfs_attr_init(&mem->attr_used);
+			ret = sysfs_create_file(mem->kobj, &mem->attr_used);
+			if (ret < 0)
+				return ret;
+		}
+
 		i++;
 	}
 
@@ -1075,15 +1105,22 @@ static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
 {
 	struct kfd_topology_device *dev;
 	struct kfd_topology_device *out_dev = NULL;
+	struct kfd_mem_properties *mem;
 
 	down_write(&topology_lock);
 	list_for_each_entry(dev, &topology_device_list, list)
 		if (!dev->gpu && (dev->node_props.simd_count > 0)) {
 			dev->gpu = gpu;
 			out_dev = dev;
+
+			/* Assign mem->gpu */
+			list_for_each_entry(mem, &dev->mem_props, list)
+				mem->gpu = dev->gpu;
+
 			break;
 		}
 	up_write(&topology_lock);
+
 	return out_dev;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 53fca1f..0f698d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -93,7 +93,9 @@ struct kfd_mem_properties {
 	uint32_t		width;
 	uint32_t		mem_clk_max;
 	struct kobject		*kobj;
-	struct attribute	attr;
+	struct kfd_dev		*gpu;
+	struct attribute	attr_props;
+	struct attribute	attr_used;
 };
 
 #define HSA_CACHE_TYPE_DATA		0x00000001
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 62+ messages in thread
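
A sketch of reading the new used_memory file from user mode, using the
path format described in the commit message; the helper and its error
handling are illustrative assumptions:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Returns the VRAM usage of one memory bank in bytes, or 0 on
     * failure (the file itself also reports 0 for CPU/APU nodes,
     * which are not supported yet).
     */
    static uint64_t read_used_memory(int node, int bank)
    {
            char path[128];
            uint64_t used = 0;
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/devices/virtual/kfd/kfd/topology/nodes/%d"
                     "/memory_banks/%d/used_memory", node, bank);
            f = fopen(path, "r");
            if (!f)
                    return 0;
            if (fscanf(f, "%" SCNu64, &used) != 1)
                    used = 0;
            fclose(f);
            return used;
    }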

* [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root
  2017-12-09  4:08 [PATCH 00/37] KFD dGPU topology and initialization Felix Kuehling
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-09  4:09 ` Felix Kuehling
  2017-12-12 23:27   ` Bjorn Helgaas
  1 sibling, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2017-12-09  4:09 UTC (permalink / raw)
  To: oded.gabbay, amd-gfx; +Cc: Jay Cornwall, linux-pci, Felix Kuehling

From: Jay Cornwall <Jay.Cornwall@amd.com>

The PCIe 3.0 AtomicOp (6.15) feature allows atomic transactions to be
requested by, routed through and completed by PCIe components. Routing and
completion do not require software support. Component support for each is
detectable via the DEVCAP2 register.

AtomicOp requests are permitted only if a component's
DEVCTL2.ATOMICOP_REQUESTER_ENABLE field is set. This capability cannot be
detected but is a no-op if set on a component with no support. These
requests can only be serviced if the upstream components support AtomicOp
completion and/or routing to a component which does.

A concrete example is the AMD Fiji-class GPU, which is specified to
support AtomicOp requests, routed through a PLX 8747 switch (advertising
AtomicOp routing) to a Haswell host bridge (advertising AtomicOp
completion support). When AtomicOp requests are disabled, the GPU logs
attempted requests to an MMIO register for debugging.

Add pci_enable_atomic_ops_to_root for per-device control over AtomicOp
requests. Upstream bridges are checked for AtomicOp routing capability and
the call fails if any lack this capability. The root port is checked for
AtomicOp completion capabilities and the call fails if it does not support
any. Routes to other PCIe components are not checked for AtomicOp routing
and completion capabilities.

v2: Check for AtomicOp route to root port with AtomicOp completion
v3: Style fixes
v4: Endpoint to root port only, check upstream egress blocking
v5: Rebase, use existing PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK define

CC: linux-pci@vger.kernel.org
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/pci/pci.c             | 81 +++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h           |  1 +
 include/uapi/linux/pci_regs.h |  2 ++
 3 files changed, 84 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6078dfc..89a8bb0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2966,6 +2966,87 @@ bool pci_acs_path_enabled(struct pci_dev *start,
 }
 
 /**
+ * pci_enable_atomic_ops_to_root - enable AtomicOp requests to root port
+ * @dev: the PCI device
+ *
+ * Return 0 if the device is capable of generating AtomicOp requests,
+ * all upstream bridges support AtomicOp routing, egress blocking is disabled
+ * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
+ * 128-bit AtomicOp completion, or negative otherwise.
+ */
+int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
+{
+	struct pci_bus *bus = dev->bus;
+
+	if (!pci_is_pcie(dev))
+		return -EINVAL;
+
+	switch (pci_pcie_type(dev)) {
+	/*
+	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
+	 * to implement AtomicOp requester capabilities.
+	 */
+	case PCI_EXP_TYPE_ENDPOINT:
+	case PCI_EXP_TYPE_LEG_END:
+	case PCI_EXP_TYPE_RC_END:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	while (bus->parent) {
+		struct pci_dev *bridge = bus->self;
+		u32 cap;
+
+		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
+
+		switch (pci_pcie_type(bridge)) {
+		/*
+		 * Upstream, downstream and root ports may implement AtomicOp
+		 * routing capabilities. AtomicOp routing via a root port is
+		 * not considered.
+		 */
+		case PCI_EXP_TYPE_UPSTREAM:
+		case PCI_EXP_TYPE_DOWNSTREAM:
+			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
+				return -EINVAL;
+			break;
+
+		/*
+		 * Root ports are permitted to implement AtomicOp completion
+		 * capabilities.
+		 */
+		case PCI_EXP_TYPE_ROOT_PORT:
+			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
+				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
+				return -EINVAL;
+			break;
+		}
+
+		/*
+		 * Upstream ports may block AtomicOps on egress.
+		 */
+		if (pci_pcie_type(bridge) == PCI_EXP_TYPE_UPSTREAM) {
+			u32 ctl2;
+
+			pcie_capability_read_dword(bridge, PCI_EXP_DEVCTL2,
+						   &ctl2);
+			if (ctl2 & PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK)
+				return -EINVAL;
+		}
+
+		bus = bus->parent;
+	}
+
+	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
+				 PCI_EXP_DEVCTL2_ATOMIC_REQ);
+
+	return 0;
+}
+EXPORT_SYMBOL(pci_enable_atomic_ops_to_root);
+
+/**
  * pci_swizzle_interrupt_pin - swizzle INTx for device behind bridge
  * @dev: the PCI device
  * @pin: the INTx pin (1=INTA, 2=INTB, 3=INTC, 4=INTD)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f4f8ee5..2a39f63 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2062,6 +2062,7 @@ void pci_request_acs(void);
 bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
 bool pci_acs_path_enabled(struct pci_dev *start,
 			  struct pci_dev *end, u16 acs_flags);
+int pci_enable_atomic_ops_to_root(struct pci_dev *dev);
 
 #define PCI_VPD_LRDT			0x80	/* Large Resource Data Type */
 #define PCI_VPD_LRDT_ID(x)		((x) | PCI_VPD_LRDT)
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index f8d5804..45f251a 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -623,7 +623,9 @@
 #define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
 #define  PCI_EXP_DEVCAP2_ARI		0x00000020 /* Alternative Routing-ID */
 #define  PCI_EXP_DEVCAP2_ATOMIC_ROUTE	0x00000040 /* Atomic Op routing */
+#define  PCI_EXP_DEVCAP2_ATOMIC_COMP32	0x00000080 /* 32b AtomicOp completion */
 #define PCI_EXP_DEVCAP2_ATOMIC_COMP64	0x00000100 /* Atomic 64-bit compare */
+#define  PCI_EXP_DEVCAP2_ATOMIC_COMP128	0x00000200 /* 128b AtomicOp completion*/
 #define  PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
 #define  PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
 #define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread
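
A sketch of how an endpoint driver might consume the new helper at
probe time; the surrounding function is an illustrative assumption
(amdkfd's actual use is in the "Conditionally enable PCIe atomics"
patch of this series):

    #include <linux/pci.h>

    /* Try to enable AtomicOp requests from this endpoint to the root
     * port. On failure the path lacks routing or completion support,
     * and the driver must fall back to a mode that does not rely on
     * PCIe atomics.
     */
    static bool dev_supports_pcie_atomics(struct pci_dev *pdev)
    {
            return pci_enable_atomic_ops_to_root(pdev) == 0;
    }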

* Re: [PATCH 00/37] KFD dGPU topology and initialization
       [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (27 preceding siblings ...)
  2017-12-09  4:09   ` [PATCH 28/37] drm/amdkfd: Add support for displaying " Felix Kuehling
@ 2017-12-10 10:26   ` Oded Gabbay
  28 siblings, 0 replies; 62+ messages in thread
From: Oded Gabbay @ 2017-12-10 10:26 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Sat, Dec 9, 2017 at 6:08 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> This patch series adds support for dGPU topology to KFD and implements
> everything needed to initialize KFD on dGPUs.

I'm excited!

>
> This is still missing dGPU memory management APIs, so it's not going to
> be able to run any user mode tests yet. But device information about CPUs
> and supported dGPUs should be reported correctly in
> /sys/class/kfd/kfd/topology/nodes/*.
>
> Patches 1-10 are small fixes and additions to the topology code.
> Patches 11-19 reorganize the topology code to prepare for dGPU support.
> Patches 20 and 21 add topology support for CPUs and dGPUs respectively.
> Patches 22-28 add more topology features and fixes/workarounds.
> Patch 29 adds a helper to enable PCIe atomics to the PCI driver.
We'll need to find someone to RB this patch.

> Patches 30-36 enable KFD initialization on dGPUs
> Patch 37 enables KFD initialization on supported dGPUs in AMDGPU.
>
> This is my last patch series this year. I worked hard to finish this
> before my year-end vacation.

I know it was hard work. Well done! You deserve a big Christmas
present this year ;)

I'll try to go over all of the patches in time for 4.16, and I really
think we can make it.
Are you available for fixes in these patches in case the need arises?

Thanks,
Oded

>
> Amber Lin (1):
>   drm/amdkfd: Add perf counters to topology
>
> Ben Goz (1):
>   drm/amdkfd: Add AQL Queue Memory flag on topology
>
> Felix Kuehling (13):
>   drm/amdkfd: Group up CRAT related functions
>   drm/amdkfd: Turn verbose topology messages into pr_debug
>   drm/amdkfd: Simplify counting of memory banks
>   drm/amdkfd: Add topology support for CPUs
>   drm/amdkfd: Module option to disable CRAT table
>   drm/amdkfd: Conditionally enable PCIe atomics
>   drm/amdkfd: Make IOMMUv2 code conditional
>   drm/amdkfd: Make sched_policy a per-device setting
>   drm/amdkfd: Add dGPU support to the device queue manager
>   drm/amdkfd: Add dGPU support to the MQD manager
>   drm/amdkfd: Add dGPU support to kernel_queue_init
>   drm/amdkfd: Add dGPU device IDs and device info
>   drm/amdgpu: Enable KFD initialization on dGPUs
>
> Flora Cui (3):
>   drm/amd: add new interface to query cu info
>   drm/amdgpu: add amdgpu interface to query cu info
>   drm/amdkfd: Update number of compute unit from KGD
>
> Harish Kasiviswanathan (13):
>   drm/amd: Add get_local_mem_info to KGD-KFD interface
>   drm/amdgpu: Implement get_local_mem_info
>   drm/amdkfd: Stop using get_vmem_size KGD-KFD interface
>   drm/amdkfd: Remove deprecated get_vmem_size
>   drm/amd: Remove get_vmem_size from KGD-KFD interface
>   drm/amdkfd: Topology: Fix location_id
>   drm/amdkfd: Reorganize CRAT fetching from ACPI
>   drm/amdkfd: Decouple CRAT parsing from device list update
>   drm/amdkfd: Support enumerating non-GPU devices
>   drm/amdkfd: sync IOLINK defines to thunk spec
>   drm/amdkfd: Fix sibling_map[] size
>   drm/amdkfd: Add topology support for dGPUs
>   drm/amdkfd: Ignore ACPI CRAT for non-APU systems
>
> Jay Cornwall (1):
>   PCI: Add pci_enable_atomic_ops_to_root
>
> Kent Russell (3):
>   drm/amdkfd: Coding style cleanup
>   drm/amdgpu: Add support for reporting VRAM usage
>   drm/amdkfd: Add support for displaying VRAM usage
>
> Philip Cox (1):
>   drm/amdkfd: Fixup incorrect info in the CZ CRAT table
>
> Yong Zhao (1):
>   drm/amdkfd: Fix memory leaks in kfd topology
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h                |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   65 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |    4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |    4 +-
>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c              |    7 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c              |    5 +
>  drivers/gpu/drm/amd/amdkfd/Kconfig                 |    2 +-
>  drivers/gpu/drm/amd/amdkfd/Makefile                |    2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |    3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c              | 1271 ++++++++++++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.h              |   42 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.c            |    3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  230 +++-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |   33 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    5 +
>  .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |   56 +
>  .../drm/amd/amdkfd/kfd_device_queue_manager_vi.c   |   93 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +
>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |    7 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |    5 +
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    7 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   35 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   21 +
>  drivers/gpu/drm/amd/amdkfd/kfd_pasid.c             |    2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |   21 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   17 +-
>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |    3 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c          | 1054 +++++++++-------
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |   39 +-
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   35 +-
>  drivers/pci/pci.c                                  |   81 ++
>  include/linux/pci.h                                |    1 +
>  include/uapi/linux/pci_regs.h                      |    2 +
>  35 files changed, 2656 insertions(+), 512 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>
> --
> 2.7.4
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 21/37] drm/amdkfd: Add topology support for dGPUs
       [not found]     ` <1512792555-26042-22-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-11 14:46       ` Oded Gabbay
       [not found]         ` <CAFCwf10_AOKQaQU31Mnn+2fO=awvO-DWaM7bTzO-khjk=yw+8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Oded Gabbay @ 2017-12-11 14:46 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Jay Cornwall, Amber Lin, Ben Goz, Harish Kasiviswanathan,
	amd-gfx list, Kent Russell

On Sat, Dec 9, 2017 at 6:08 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>
> Generate and parse VCRAT tables for dGPUs in kfd_topology_add_device.
>
> Some information that isn't available in the CRAT table is patched
> into the topology after parsing.
>
> HSA_CAP_DOORBELL_TYPE_1_0 is dependent on the ASIC feature
> CP_HQD_PQ_CONTROL.SLOT_BASED_WPTR, which was not introduced in VI
> until Carrizo. Report HSA_CAP_DOORBELL_TYPE_PRE_1_0 on Tonga ASICs.
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
> Signed-off-by: Ben Goz <ben.goz@amd.com>
> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
> Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
> Signed-off-by: Kent Russell <kent.russell@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 594 +++++++++++++++++++++++++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.h     |   5 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |   2 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 188 ++++++++--
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   8 +-
>  5 files changed, 746 insertions(+), 51 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 00732ec..ba7577b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -20,10 +20,117 @@
>   * OTHER DEALINGS IN THE SOFTWARE.
>   */
>  #include <linux/acpi.h>
> +#include <linux/amd-iommu.h>
>  #include "kfd_crat.h"
>  #include "kfd_priv.h"
>  #include "kfd_topology.h"
>
> +/* GPU Processor ID base for dGPUs for which VCRAT needs to be created.
> + * GPU processor ID are expressed with Bit[31]=1.
> + * The base is set to 0x8000_0000 + 0x1000 to avoid collision with GPU IDs
> + * used in the CRAT.
> + */
> +static uint32_t gpu_processor_id_low = 0x80001000;
> +
> +/* Return the next available gpu_processor_id and increment it for next GPU
> + *     @total_cu_count - Total CUs present in the GPU including ones
> + *                       masked off
> + */
> +static inline unsigned int get_and_inc_gpu_processor_id(
> +                               unsigned int total_cu_count)
> +{
> +       int current_id = gpu_processor_id_low;
> +
> +       gpu_processor_id_low += total_cu_count;
> +       return current_id;
> +}
> +
> +/* Static table to describe GPU Cache information */
> +struct kfd_gpu_cache_info {
> +       uint32_t        cache_size;
> +       uint32_t        cache_level;
> +       uint32_t        flags;
> +       /* Indicates how many Compute Units share this cache
> +        * Value = 1 indicates the cache is not shared
> +        */
> +       uint32_t        num_cu_shared;
> +};
> +
> +static struct kfd_gpu_cache_info kaveri_cache_info[] = {
> +       {
> +               /* TCP L1 Cache per CU */
> +               .cache_size = 16,
> +               .cache_level = 1,
> +               .flags = (CRAT_CACHE_FLAGS_ENABLED |
> +                               CRAT_CACHE_FLAGS_DATA_CACHE |
> +                               CRAT_CACHE_FLAGS_SIMD_CACHE),
> +               .num_cu_shared = 1,
> +
> +       },
> +       {
> +               /* Scalar L1 Instruction Cache (in SQC module) per bank */
> +               .cache_size = 16,
> +               .cache_level = 1,
> +               .flags = (CRAT_CACHE_FLAGS_ENABLED |
> +                               CRAT_CACHE_FLAGS_INST_CACHE |
> +                               CRAT_CACHE_FLAGS_SIMD_CACHE),
> +               .num_cu_shared = 2,
> +       },
> +       {
> +               /* Scalar L1 Data Cache (in SQC module) per bank */
> +               .cache_size = 8,
> +               .cache_level = 1,
> +               .flags = (CRAT_CACHE_FLAGS_ENABLED |
> +                               CRAT_CACHE_FLAGS_DATA_CACHE |
> +                               CRAT_CACHE_FLAGS_SIMD_CACHE),
> +               .num_cu_shared = 2,
> +       },
> +
> +       /* TODO: Add L2 Cache information */
> +};
> +
> +
> +static struct kfd_gpu_cache_info carrizo_cache_info[] = {
> +       {
> +               /* TCP L1 Cache per CU */
> +               .cache_size = 16,
> +               .cache_level = 1,
> +               .flags = (CRAT_CACHE_FLAGS_ENABLED |
> +                               CRAT_CACHE_FLAGS_DATA_CACHE |
> +                               CRAT_CACHE_FLAGS_SIMD_CACHE),
> +               .num_cu_shared = 1,
> +       },
> +       {
> +               /* Scalar L1 Instruction Cache (in SQC module) per bank */
> +               .cache_size = 8,
> +               .cache_level = 1,
> +               .flags = (CRAT_CACHE_FLAGS_ENABLED |
> +                               CRAT_CACHE_FLAGS_INST_CACHE |
> +                               CRAT_CACHE_FLAGS_SIMD_CACHE),
> +               .num_cu_shared = 4,
> +       },
> +       {
> +               /* Scalar L1 Data Cache (in SQC module) per bank. */
> +               .cache_size = 4,
> +               .cache_level = 1,
> +               .flags = (CRAT_CACHE_FLAGS_ENABLED |
> +                               CRAT_CACHE_FLAGS_DATA_CACHE |
> +                               CRAT_CACHE_FLAGS_SIMD_CACHE),
> +               .num_cu_shared = 4,
> +       },
> +
> +       /* TODO: Add L2 Cache information */
> +};
> +
> +/* NOTE: In future if more information is added to struct kfd_gpu_cache_info
> + * the following ASICs may need a separate table.
> + */
> +#define hawaii_cache_info kaveri_cache_info
> +#define tonga_cache_info carrizo_cache_info
> +#define fiji_cache_info  carrizo_cache_info
> +#define polaris10_cache_info carrizo_cache_info
> +#define polaris11_cache_info carrizo_cache_info
> +
>  static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
>                 struct crat_subtype_computeunit *cu)
>  {
> @@ -44,7 +151,7 @@ static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
>         dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
>         dev->node_props.max_waves_per_simd = cu->max_waves_simd;
>         dev->node_props.wave_front_size = cu->wave_front_size;
> -       dev->node_props.array_count = cu->num_arrays;
> +       dev->node_props.array_count = cu->array_count;
>         dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
>         dev->node_props.simd_per_cu = cu->num_simd_per_cu;
>         dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
> @@ -94,9 +201,16 @@ static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem,
>                         if (!props)
>                                 return -ENOMEM;
>
> -                       if (dev->node_props.cpu_cores_count == 0)
> -                               props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
> -                       else
> +                       /* We're on GPU node */
> +                       if (dev->node_props.cpu_cores_count == 0) {
> +                               /* APU */
> +                               if (mem->visibility_type == 0)
> +                                       props->heap_type =
> +                                               HSA_MEM_HEAP_TYPE_FB_PRIVATE;
> +                               /* dGPU */
> +                               else
> +                                       props->heap_type = mem->visibility_type;
> +                       } else
>                                 props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
>
>                         if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
> @@ -128,13 +242,29 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
>         struct kfd_cache_properties *props;
>         struct kfd_topology_device *dev;
>         uint32_t id;
> +       uint32_t total_num_of_cu;
>
>         id = cache->processor_id_low;
>
>         pr_debug("Found cache entry in CRAT table with processor_id=%d\n", id);
> -       list_for_each_entry(dev, device_list, list)
> -               if (id == dev->node_props.cpu_core_id_base ||
> -                   id == dev->node_props.simd_id_base) {
> +       list_for_each_entry(dev, device_list, list) {
> +               total_num_of_cu = (dev->node_props.array_count *
> +                                       dev->node_props.cu_per_simd_array);
> +
> +                * Cache information in CRAT doesn't have proximity_domain
> +                * information as it is associated with a CPU core or GPU
> +                * Compute Unit. So map the cache using the CPU core ID or
> +                * SIMD (GPU) ID.
> +                * TODO: This works because currently we can safely assume that
> +                *  Compute Units are parsed before caches are parsed. In
> +                *  future, remove this dependency
> +                */
> +               if ((id >= dev->node_props.cpu_core_id_base &&
> +                       id <= dev->node_props.cpu_core_id_base +
> +                               dev->node_props.cpu_cores_count) ||
> +                       (id >= dev->node_props.simd_id_base &&
> +                       id < dev->node_props.simd_id_base +
> +                               total_num_of_cu)) {
>                         props = kfd_alloc_struct(props);
>                         if (!props)
>                                 return -ENOMEM;
> @@ -146,6 +276,8 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
>                         props->cachelines_per_tag = cache->lines_per_tag;
>                         props->cache_assoc = cache->associativity;
>                         props->cache_latency = cache->cache_latency;
> +                       memcpy(props->sibling_map, cache->sibling_map,
> +                                       sizeof(props->sibling_map));
>
>                         if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
>                                 props->cache_type |= HSA_CACHE_TYPE_DATA;
> @@ -162,6 +294,7 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
>
>                         break;
>                 }
> +       }
>
>         return 0;
>  }
> @@ -172,8 +305,8 @@ static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache,
>  static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
>                                         struct list_head *device_list)
>  {
> -       struct kfd_iolink_properties *props;
> -       struct kfd_topology_device *dev;
> +       struct kfd_iolink_properties *props = NULL, *props2;
> +       struct kfd_topology_device *dev, *cpu_dev;
>         uint32_t id_from;
>         uint32_t id_to;
>
> @@ -192,11 +325,12 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
>                         props->node_to = id_to;
>                         props->ver_maj = iolink->version_major;
>                         props->ver_min = iolink->version_minor;
> +                       props->iolink_type = iolink->io_interface_type;
>
> -                       /*
> -                        * weight factor (derived from CDIR), currently always 1
> -                        */
> -                       props->weight = 1;
> +                       if (props->iolink_type == CRAT_IOLINK_TYPE_PCIEXPRESS)
> +                               props->weight = 20;
> +                       else
> +                               props->weight = node_distance(id_from, id_to);
>
>                         props->min_latency = iolink->minimum_latency;
>                         props->max_latency = iolink->maximum_latency;
> @@ -208,11 +342,29 @@ static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink,
>                         dev->io_link_count++;
>                         dev->node_props.io_links_count++;
>                         list_add_tail(&props->list, &dev->io_link_props);
> -
>                         break;
>                 }
>         }
>
> +       /* CPU topology is created before GPUs are detected, so CPU->GPU
> +        * links are not built at that time. If a PCIe type is discovered, it
> +        * means a GPU is detected and we are adding GPU->CPU to the topology.
> +        * At this time, also add the corresponding CPU->GPU link.
> +        */
> +       if (props && props->iolink_type == CRAT_IOLINK_TYPE_PCIEXPRESS) {
> +               cpu_dev = kfd_topology_device_by_proximity_domain(id_to);
> +               if (!cpu_dev)
> +                       return -ENODEV;
> +               /* same everything but the other direction */
> +               props2 = kmemdup(props, sizeof(*props2), GFP_KERNEL);
> +               if (!props2)
> +                       return -ENOMEM;
> +               props2->node_from = id_to;
> +               props2->node_to = id_from;
> +               props2->kobj = NULL;
> +               cpu_dev->io_link_count++;
> +               cpu_dev->node_props.io_links_count++;
> +               list_add_tail(&props2->list, &cpu_dev->io_link_props);
> +       }
> +
>         return 0;
>  }
>
> @@ -338,6 +490,176 @@ int kfd_parse_crat_table(void *crat_image, struct list_head *device_list,
>         return ret;
>  }
>
> +/* Helper function. See kfd_fill_gpu_cache_info for parameter description */
> +static int fill_in_pcache(struct crat_subtype_cache *pcache,
> +                               struct kfd_gpu_cache_info *pcache_info,
> +                               struct kfd_cu_info *cu_info,
> +                               int mem_available,
> +                               int cu_bitmask,
> +                               int cache_type, unsigned int cu_processor_id,
> +                               int cu_block)
> +{
> +       unsigned int cu_sibling_map_mask;
> +       int first_active_cu;
> +
> +       /* First check if enough memory is available */
> +       if (sizeof(struct crat_subtype_cache) > mem_available)
> +               return -ENOMEM;
> +
> +       cu_sibling_map_mask = cu_bitmask;
> +       cu_sibling_map_mask >>= cu_block;
> +       cu_sibling_map_mask &=
> +               ((1 << pcache_info[cache_type].num_cu_shared) - 1);
> +       first_active_cu = ffs(cu_sibling_map_mask);
> +
> +       /* CUs could be inactive. In case of a shared cache, find the first
> +        * active CU; in case of a non-shared cache, check whether the CU is
> +        * inactive and, if so, skip it.
> +        */
> +       if (first_active_cu) {
> +               memset(pcache, 0, sizeof(struct crat_subtype_cache));
> +               pcache->type = CRAT_SUBTYPE_CACHE_AFFINITY;
> +               pcache->length = sizeof(struct crat_subtype_cache);
> +               pcache->flags = pcache_info[cache_type].flags;
> +               pcache->processor_id_low = cu_processor_id
> +                                        + (first_active_cu - 1);
> +               pcache->cache_level = pcache_info[cache_type].cache_level;
> +               pcache->cache_size = pcache_info[cache_type].cache_size;
> +
> +               /* Sibling map is w.r.t processor_id_low, so shift out
> +                * inactive CU
> +                */
> +               cu_sibling_map_mask =
> +                       cu_sibling_map_mask >> (first_active_cu - 1);
> +
> +               pcache->sibling_map[0] = (uint8_t)(cu_sibling_map_mask & 0xFF);
> +               pcache->sibling_map[1] =
> +                               (uint8_t)((cu_sibling_map_mask >> 8) & 0xFF);
> +               pcache->sibling_map[2] =
> +                               (uint8_t)((cu_sibling_map_mask >> 16) & 0xFF);
> +               pcache->sibling_map[3] =
> +                               (uint8_t)((cu_sibling_map_mask >> 24) & 0xFF);
> +               return 0;
> +       }
> +       return 1;
> +}
> +
> +/* kfd_fill_gpu_cache_info - Fill GPU cache info using kfd_gpu_cache_info
> + * tables
> + *
> + *     @kdev - [IN] GPU device
> + *     @gpu_processor_id - [IN] GPU processor ID with which these caches
> + *                         are associated
> + *     @available_size - [IN] Amount of memory available in pcache
> + *     @cu_info - [IN] Compute Unit info obtained from KGD
> + *     @pcache - [OUT] memory into which cache data is to be filled in.
> + *     @size_filled - [OUT] amount of data used up in pcache.
> + *     @num_of_entries - [OUT] number of caches added
> + */
> +static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
> +                       int gpu_processor_id,
> +                       int available_size,
> +                       struct kfd_cu_info *cu_info,
> +                       struct crat_subtype_cache *pcache,
> +                       int *size_filled,
> +                       int *num_of_entries)
> +{
> +       struct kfd_gpu_cache_info *pcache_info;
> +       int num_of_cache_types = 0;
> +       int i, j, k;
> +       int ct = 0;
> +       int mem_available = available_size;
> +       unsigned int cu_processor_id;
> +       int ret;
> +
> +       switch (kdev->device_info->asic_family) {
> +       case CHIP_KAVERI:
> +               pcache_info = kaveri_cache_info;
> +               num_of_cache_types = ARRAY_SIZE(kaveri_cache_info);
> +               break;
> +       case CHIP_HAWAII:
> +               pcache_info = hawaii_cache_info;
> +               num_of_cache_types = ARRAY_SIZE(hawaii_cache_info);
> +               break;
> +       case CHIP_CARRIZO:
> +               pcache_info = carrizo_cache_info;
> +               num_of_cache_types = ARRAY_SIZE(carrizo_cache_info);
> +               break;
> +       case CHIP_TONGA:
> +               pcache_info = tonga_cache_info;
> +               num_of_cache_types = ARRAY_SIZE(tonga_cache_info);
> +               break;
> +       case CHIP_FIJI:
> +               pcache_info = fiji_cache_info;
> +               num_of_cache_types = ARRAY_SIZE(fiji_cache_info);
> +               break;
> +       case CHIP_POLARIS10:
> +               pcache_info = polaris10_cache_info;
> +               num_of_cache_types = ARRAY_SIZE(polaris10_cache_info);
> +               break;
> +       case CHIP_POLARIS11:
> +               pcache_info = polaris11_cache_info;
> +               num_of_cache_types = ARRAY_SIZE(polaris11_cache_info);
> +               break;
> +       default:
> +               return -EINVAL;
> +       }
> +
> +       *size_filled = 0;
> +       *num_of_entries = 0;
> +
> +       /* For each type of cache listed in the kfd_gpu_cache_info table,
> +        * go through all available Compute Units.
> +        * The [i,j,k] loop will
> +        *              if kfd_gpu_cache_info.num_cu_shared == 1
> +        *                      visit every available CU individually
> +        *              if kfd_gpu_cache_info.num_cu_shared != 1
> +        *                      consider only one CU per shared group,
> +        *                      e.g. with num_cu_shared = 4, k steps 0, 4, 8, ...
> +        */
> +
> +       for (ct = 0; ct < num_of_cache_types; ct++) {
> +               cu_processor_id = gpu_processor_id;
> +               for (i = 0; i < cu_info->num_shader_engines; i++) {
> +                       for (j = 0; j < cu_info->num_shader_arrays_per_engine;
> +                               j++) {
> +                               for (k = 0; k < cu_info->num_cu_per_sh;
> +                                       k += pcache_info[ct].num_cu_shared) {
> +
> +                                       ret = fill_in_pcache(pcache,
> +                                               pcache_info,
> +                                               cu_info,
> +                                               mem_available,
> +                                               cu_info->cu_bitmap[i][j],
> +                                               ct,
> +                                               cu_processor_id,
> +                                               k);
> +
> +                                       if (ret < 0)
> +                                               break;
> +
> +                                       if (!ret) {
> +                                               pcache++;
> +                                               (*num_of_entries)++;
> +                                               mem_available -=
> +                                                       sizeof(*pcache);
> +                                               (*size_filled) +=
> +                                                       sizeof(*pcache);
> +                                       }
> +
> +                                       /* Move to next CU block */
> +                                       cu_processor_id +=
> +                                               pcache_info[ct].num_cu_shared;
> +                               }
> +                       }
> +               }
> +       }
> +
> +       pr_debug("Added [%d] GPU cache entries\n", *num_of_entries);
> +
> +       return 0;
> +}
> +
>  /*
>   * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>   * copies CRAT from ACPI (if available).
> @@ -624,6 +946,239 @@ static int kfd_create_vcrat_image_cpu(void *pcrat_image, size_t *size)
>         return 0;
>  }
>
> +static int kfd_fill_gpu_memory_affinity(int *avail_size,
> +               struct kfd_dev *kdev, uint8_t type, uint64_t size,
> +               struct crat_subtype_memory *sub_type_hdr,
> +               uint32_t proximity_domain,
> +               const struct kfd_local_mem_info *local_mem_info)
> +{
> +       *avail_size -= sizeof(struct crat_subtype_memory);
> +       if (*avail_size < 0)
> +               return -ENOMEM;
> +
> +       memset((void *)sub_type_hdr, 0, sizeof(struct crat_subtype_memory));
> +       sub_type_hdr->type = CRAT_SUBTYPE_MEMORY_AFFINITY;
> +       sub_type_hdr->length = sizeof(struct crat_subtype_memory);
> +       sub_type_hdr->flags |= CRAT_SUBTYPE_FLAGS_ENABLED;
> +
> +       sub_type_hdr->proximity_domain = proximity_domain;
> +
> +       pr_debug("Fill gpu memory affinity - type 0x%x size 0x%llx\n",
> +                       type, size);
> +
> +       sub_type_hdr->length_low = lower_32_bits(size);
> +       sub_type_hdr->length_high = upper_32_bits(size);
> +
> +       sub_type_hdr->width = local_mem_info->vram_width;
> +       sub_type_hdr->visibility_type = type;
> +
> +       return 0;
> +}
> +
> +/* kfd_fill_gpu_direct_io_link - Fill in direct io link from GPU
> + * to its NUMA node
> + *     @avail_size - [IN/OUT] Available size remaining in the CRAT image
> + *     @kdev - [IN] GPU device
> + *     @sub_type_hdr - [OUT] Memory into which the io link info is filled
> + *     @proximity_domain - [IN] proximity domain of the GPU node
> + *
> + *     Return 0 if successful, otherwise a negative errno value
> + */
> +static int kfd_fill_gpu_direct_io_link(int *avail_size,
> +                       struct kfd_dev *kdev,
> +                       struct crat_subtype_iolink *sub_type_hdr,
> +                       uint32_t proximity_domain)
> +{
> +       *avail_size -= sizeof(struct crat_subtype_iolink);
> +       if (*avail_size < 0)
> +               return -ENOMEM;
> +
> +       memset((void *)sub_type_hdr, 0, sizeof(struct crat_subtype_iolink));
> +
> +       /* Fill in subtype header data */
> +       sub_type_hdr->type = CRAT_SUBTYPE_IOLINK_AFFINITY;
> +       sub_type_hdr->length = sizeof(struct crat_subtype_iolink);
> +       sub_type_hdr->flags |= CRAT_SUBTYPE_FLAGS_ENABLED;
> +
> +       /* Fill in IOLINK subtype.
> +        * TODO: Fill-in other fields of iolink subtype
> +        */
> +       sub_type_hdr->io_interface_type = CRAT_IOLINK_TYPE_PCIEXPRESS;
> +       sub_type_hdr->proximity_domain_from = proximity_domain;
> +#ifdef CONFIG_NUMA
> +       if (kdev->pdev->dev.numa_node == NUMA_NO_NODE)
Had to add #include <linux/pci.h> at the head of the file to make this
line compile

Oded
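
For reference, a minimal sketch of the fix described above (only the
#include is what was actually added; the helper below is hypothetical
and just mirrors the #ifdef CONFIG_NUMA logic in the hunk that follows):

    #include <linux/pci.h>  /* struct pci_dev, so kdev->pdev->dev resolves */

    /* Hypothetical helper: fall back to proximity domain 0 when the
     * PCI device has no NUMA affinity.
     */
    static int kfd_gpu_numa_node(struct kfd_dev *kdev)
    {
    #ifdef CONFIG_NUMA
            if (kdev->pdev->dev.numa_node != NUMA_NO_NODE)
                    return kdev->pdev->dev.numa_node;
    #endif
            return 0;
    }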


> +               sub_type_hdr->proximity_domain_to = 0;
> +       else
> +               sub_type_hdr->proximity_domain_to = kdev->pdev->dev.numa_node;
> +#else
> +       sub_type_hdr->proximity_domain_to = 0;
> +#endif
> +       return 0;
> +}
> +
> +/* kfd_create_vcrat_image_gpu - Create Virtual CRAT for GPU
> + *
> + *     @pcrat_image: [OUT] buffer into which the GPU VCRAT is filled in
> + *     @size:  [IN] allocated size of crat_image.
> + *             [OUT] actual size of data filled in crat_image
> + */
> +static int kfd_create_vcrat_image_gpu(void *pcrat_image,
> +                                     size_t *size, struct kfd_dev *kdev,
> +                                     uint32_t proximity_domain)
> +{
> +       struct crat_header *crat_table = (struct crat_header *)pcrat_image;
> +       struct crat_subtype_generic *sub_type_hdr;
> +       struct crat_subtype_computeunit *cu;
> +       struct kfd_cu_info cu_info;
> +       struct amd_iommu_device_info iommu_info;
> +       int avail_size = *size;
> +       uint32_t total_num_of_cu;
> +       int num_of_cache_entries = 0;
> +       int cache_mem_filled = 0;
> +       int ret = 0;
> +       const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
> +                                        AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
> +                                        AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> +       struct kfd_local_mem_info local_mem_info;
> +
> +       if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
> +               return -EINVAL;
> +
> +       /* Fill the CRAT Header.
> +        * Modify length and total_entries as subunits are added.
> +        */
> +       avail_size -= sizeof(struct crat_header);
> +       if (avail_size < 0)
> +               return -ENOMEM;
> +
> +       memset(crat_table, 0, sizeof(struct crat_header));
> +
> +       memcpy(&crat_table->signature, CRAT_SIGNATURE,
> +                       sizeof(crat_table->signature));
> +       /* Change length as we add more subtypes*/
> +       crat_table->length = sizeof(struct crat_header);
> +       crat_table->num_domains = 1;
> +       crat_table->total_entries = 0;
> +
> +       /* Fill in Subtype: Compute Unit
> +        * First fill in the sub type header and then sub type data
> +        */
> +       avail_size -= sizeof(struct crat_subtype_computeunit);
> +       if (avail_size < 0)
> +               return -ENOMEM;
> +
> +       sub_type_hdr = (struct crat_subtype_generic *)(crat_table + 1);
> +       memset(sub_type_hdr, 0, sizeof(struct crat_subtype_computeunit));
> +
> +       sub_type_hdr->type = CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY;
> +       sub_type_hdr->length = sizeof(struct crat_subtype_computeunit);
> +       sub_type_hdr->flags = CRAT_SUBTYPE_FLAGS_ENABLED;
> +
> +       /* Fill CU subtype data */
> +       cu = (struct crat_subtype_computeunit *)sub_type_hdr;
> +       cu->flags |= CRAT_CU_FLAGS_GPU_PRESENT;
> +       cu->proximity_domain = proximity_domain;
> +
> +       kdev->kfd2kgd->get_cu_info(kdev->kgd, &cu_info);
> +       cu->num_simd_per_cu = cu_info.simd_per_cu;
> +       cu->num_simd_cores = cu_info.simd_per_cu * cu_info.cu_active_number;
> +       cu->max_waves_simd = cu_info.max_waves_per_simd;
> +
> +       cu->wave_front_size = cu_info.wave_front_size;
> +       cu->array_count = cu_info.num_shader_arrays_per_engine *
> +               cu_info.num_shader_engines;
> +       total_num_of_cu = (cu->array_count * cu_info.num_cu_per_sh);
> +       cu->processor_id_low = get_and_inc_gpu_processor_id(total_num_of_cu);
> +       cu->num_cu_per_array = cu_info.num_cu_per_sh;
> +       cu->max_slots_scatch_cu = cu_info.max_scratch_slots_per_cu;
> +       cu->num_banks = cu_info.num_shader_engines;
> +       cu->lds_size_in_kb = cu_info.lds_size;
> +
> +       cu->hsa_capability = 0;
> +
> +       /* Check if this node supports IOMMU. During parsing this flag will
> +        * translate to HSA_CAP_ATS_PRESENT
> +        */
> +       iommu_info.flags = 0;
> +       if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
> +               if ((iommu_info.flags & required_iommu_flags) ==
> +                               required_iommu_flags)
> +                       cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
> +       }
> +
> +       crat_table->length += sub_type_hdr->length;
> +       crat_table->total_entries++;
> +
> +       /* Fill in Subtype: Memory. Only on systems with large BAR (no
> +        * private FB), report memory as public. On other systems
> +        * report the total FB size (public+private) as a single
> +        * private heap.
> +        */
> +       kdev->kfd2kgd->get_local_mem_info(kdev->kgd, &local_mem_info);
> +       sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
> +                       sub_type_hdr->length);
> +
> +       if (local_mem_info.local_mem_size_private == 0)
> +               ret = kfd_fill_gpu_memory_affinity(&avail_size,
> +                               kdev, HSA_MEM_HEAP_TYPE_FB_PUBLIC,
> +                               local_mem_info.local_mem_size_public,
> +                               (struct crat_subtype_memory *)sub_type_hdr,
> +                               proximity_domain,
> +                               &local_mem_info);
> +       else
> +               ret = kfd_fill_gpu_memory_affinity(&avail_size,
> +                               kdev, HSA_MEM_HEAP_TYPE_FB_PRIVATE,
> +                               local_mem_info.local_mem_size_public +
> +                               local_mem_info.local_mem_size_private,
> +                               (struct crat_subtype_memory *)sub_type_hdr,
> +                               proximity_domain,
> +                               &local_mem_info);
> +       if (ret < 0)
> +               return ret;
> +
> +       crat_table->length += sizeof(struct crat_subtype_memory);
> +       crat_table->total_entries++;
> +
> +       /* TODO: Fill in cache information. This information is NOT readily
> +        * available in KGD
> +        */
> +       sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
> +               sub_type_hdr->length);
> +       ret = kfd_fill_gpu_cache_info(kdev, cu->processor_id_low,
> +                               avail_size,
> +                               &cu_info,
> +                               (struct crat_subtype_cache *)sub_type_hdr,
> +                               &cache_mem_filled,
> +                               &num_of_cache_entries);
> +
> +       if (ret < 0)
> +               return ret;
> +
> +       crat_table->length += cache_mem_filled;
> +       crat_table->total_entries += num_of_cache_entries;
> +       avail_size -= cache_mem_filled;
> +
> +       /* Fill in Subtype: IO_LINKS
> +        *  Only direct links are added here, i.e. the link from the GPU
> +        *  to its NUMA node. Indirect links are added by userspace.
> +        */
> +       sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
> +               cache_mem_filled);
> +       ret = kfd_fill_gpu_direct_io_link(&avail_size, kdev,
> +               (struct crat_subtype_iolink *)sub_type_hdr, proximity_domain);
> +
> +       if (ret < 0)
> +               return ret;
> +
> +       crat_table->length += sub_type_hdr->length;
> +       crat_table->total_entries++;
> +
> +       *size = crat_table->length;
> +       pr_info("Virtual CRAT table created for GPU\n");
> +
> +       return ret;
> +}
> +
>  /* kfd_create_crat_image_virtual - Allocates memory for CRAT image and
>   *             creates a Virtual CRAT (VCRAT) image
>   *
> @@ -667,9 +1222,14 @@ int kfd_create_crat_image_virtual(void **crat_image, size_t *size,
>                 ret = kfd_create_vcrat_image_cpu(pcrat_image, size);
>                 break;
>         case COMPUTE_UNIT_GPU:
> -               /* TODO: */
> -               ret = -EINVAL;
> -               pr_err("VCRAT not implemented for dGPU\n");
> +               if (!kdev)
> +                       return -EINVAL;
> +               pcrat_image = kmalloc(VCRAT_SIZE_FOR_GPU, GFP_KERNEL);
> +               if (!pcrat_image)
> +                       return -ENOMEM;
> +               *size = VCRAT_SIZE_FOR_GPU;
> +               ret = kfd_create_vcrat_image_gpu(pcrat_image, size, kdev,
> +                                                proximity_domain);
>                 break;
>         case (COMPUTE_UNIT_CPU | COMPUTE_UNIT_GPU):
>                 /* TODO: */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
> index aaa43ab..c97979c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.h
> @@ -109,7 +109,7 @@ struct crat_subtype_computeunit {
>         uint8_t         wave_front_size;
>         uint8_t         num_banks;
>         uint16_t        micro_engine_id;
> -       uint8_t         num_arrays;
> +       uint8_t         array_count;
>         uint8_t         num_cu_per_array;
>         uint8_t         num_simd_per_cu;
>         uint8_t         max_slots_scatch_cu;
> @@ -137,7 +137,8 @@ struct crat_subtype_memory {
>         uint32_t        length_low;
>         uint32_t        length_high;
>         uint32_t        width;
> -       uint8_t         reserved2[CRAT_MEMORY_RESERVED_LENGTH];
> +       uint8_t         visibility_type; /* for virtual (dGPU) CRAT */
> +       uint8_t         reserved2[CRAT_MEMORY_RESERVED_LENGTH - 1];
>  };
>
>  /*
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index aeee9d4..f0327c2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -668,6 +668,8 @@ int kfd_topology_init(void);
>  void kfd_topology_shutdown(void);
>  int kfd_topology_add_device(struct kfd_dev *gpu);
>  int kfd_topology_remove_device(struct kfd_dev *gpu);
> +struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
> +                                               uint32_t proximity_domain);
>  struct kfd_dev *kfd_device_by_id(uint32_t gpu_id);
>  struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev);
>  int kfd_topology_enum_kfd_devices(uint8_t idx, struct kfd_dev **kdev);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 9aa6004..7fe7ee0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -43,6 +43,25 @@ static struct kfd_system_properties sys_props;
>  static DECLARE_RWSEM(topology_lock);
>  static atomic_t topology_crat_proximity_domain;
>
> +struct kfd_topology_device *kfd_topology_device_by_proximity_domain(
> +                                               uint32_t proximity_domain)
> +{
> +       struct kfd_topology_device *top_dev;
> +       struct kfd_topology_device *device = NULL;
> +
> +       down_read(&topology_lock);
> +
> +       list_for_each_entry(top_dev, &topology_device_list, list)
> +               if (top_dev->proximity_domain == proximity_domain) {
> +                       device = top_dev;
> +                       break;
> +               }
> +
> +       up_read(&topology_lock);
> +
> +       return device;
> +}
> +
>  struct kfd_dev *kfd_device_by_id(uint32_t gpu_id)
>  {
>         struct kfd_topology_device *top_dev;
> @@ -79,6 +98,7 @@ struct kfd_dev *kfd_device_by_pci_dev(const struct pci_dev *pdev)
>         return device;
>  }
>
> +/* Called with write topology_lock acquired */
>  static void kfd_release_topology_device(struct kfd_topology_device *dev)
>  {
>         struct kfd_mem_properties *mem;
> @@ -394,8 +414,7 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>                 }
>
>                 sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
> -                       dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(
> -                                       dev->gpu->kgd));
> +                       dev->node_props.max_engine_clk_fcompute);
>
>                 sysfs_show_64bit_prop(buffer, "local_mem_size",
>                                 (unsigned long long int) 0);
> @@ -597,6 +616,7 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>         return 0;
>  }
>
> +/* Called with write topology lock acquired */
>  static int kfd_build_sysfs_node_tree(void)
>  {
>         struct kfd_topology_device *dev;
> @@ -613,6 +633,7 @@ static int kfd_build_sysfs_node_tree(void)
>         return 0;
>  }
>
> +/* Called with write topology lock acquired */
>  static void kfd_remove_sysfs_node_tree(void)
>  {
>         struct kfd_topology_device *dev;
> @@ -908,19 +929,26 @@ static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
>
>         return hashout;
>  }
> -
> +/* kfd_assign_gpu - Attach @gpu to the correct kfd topology device. If
> + *             the GPU device is not already present in the topology device
> + *             list then return NULL. This means a new topology device has to
> + *             be created for this GPU.
> + * TODO: Rather than assigning @gpu to the first topology device without
> + *             a GPU attached, it would be better to have a more stringent check.
> + */
>  static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
>  {
>         struct kfd_topology_device *dev;
>         struct kfd_topology_device *out_dev = NULL;
>
> +       down_write(&topology_lock);
>         list_for_each_entry(dev, &topology_device_list, list)
>                 if (!dev->gpu && (dev->node_props.simd_count > 0)) {
>                         dev->gpu = gpu;
>                         out_dev = dev;
>                         break;
>                 }
> -
> +       up_write(&topology_lock);
>         return out_dev;
>  }
>
> @@ -932,6 +960,45 @@ static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
>          */
>  }
>
> +/* kfd_fill_mem_clk_max_info - Since CRAT doesn't have memory clock info,
> + *             patch this after CRAT parsing.
> + */
> +static void kfd_fill_mem_clk_max_info(struct kfd_topology_device *dev)
> +{
> +       struct kfd_mem_properties *mem;
> +       struct kfd_local_mem_info local_mem_info;
> +
> +       if (!dev)
> +               return;
> +
> +       /* Currently, the amdgpu driver (amdgpu_mc) deals only with GPUs
> +        * with a single bank of VRAM local memory.
> +        * For dGPUs - VCRAT reports only one bank of local memory.
> +        * For APUs - If the CRAT from ACPI reports more than one bank,
> +        *      all banks will report the same mem_clk_max information.
> +        */
> +       dev->gpu->kfd2kgd->get_local_mem_info(dev->gpu->kgd,
> +               &local_mem_info);
> +
> +       list_for_each_entry(mem, &dev->mem_props, list)
> +               mem->mem_clk_max = local_mem_info.mem_clk_max;
> +}
> +
> +static void kfd_fill_iolink_non_crat_info(struct kfd_topology_device *dev)
> +{
> +       struct kfd_iolink_properties *link;
> +
> +       if (!dev || !dev->gpu)
> +               return;
> +
> +       /* GPU only creates direct links, so apply the flags setting to all */
> +       if (dev->gpu->device_info->asic_family == CHIP_HAWAII)
> +               list_for_each_entry(link, &dev->io_link_props, list)
> +                       link->flags = CRAT_IOLINK_FLAGS_ENABLED |
> +                               CRAT_IOLINK_FLAGS_NO_ATOMICS_32_BIT |
> +                               CRAT_IOLINK_FLAGS_NO_ATOMICS_64_BIT;
> +}
> +
>  int kfd_topology_add_device(struct kfd_dev *gpu)
>  {
>         uint32_t gpu_id;
> @@ -939,6 +1006,9 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>         struct kfd_cu_info cu_info;
>         int res = 0;
>         struct list_head temp_topology_device_list;
> +       void *crat_image = NULL;
> +       size_t image_size = 0;
> +       int proximity_domain;
>
>         INIT_LIST_HEAD(&temp_topology_device_list);
>
> @@ -946,27 +1016,33 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>
>         pr_debug("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
>
> -       /*
> -        * Try to assign the GPU to existing topology device (generated from
> -        * CRAT table
> +       proximity_domain = atomic_inc_return(&topology_crat_proximity_domain);
> +
> +       /* Check to see if this gpu device exists in the topology_device_list.
> +        * If so, assign the gpu to that device,
> +        * else create a Virtual CRAT for this gpu device and then parse that
> +        * CRAT to create a new topology device. Once created assign the gpu to
> +        * that topology device
>          */
>         dev = kfd_assign_gpu(gpu);
>         if (!dev) {
> -               pr_info("GPU was not found in the current topology. Extending.\n");
> -               kfd_debug_print_topology();
> -               dev = kfd_create_topology_device(&temp_topology_device_list);
> -               if (!dev) {
> -                       res = -ENOMEM;
> +               res = kfd_create_crat_image_virtual(&crat_image, &image_size,
> +                                                   COMPUTE_UNIT_GPU, gpu,
> +                                                   proximity_domain);
> +               if (res) {
> +                       pr_err("Error creating VCRAT for GPU (ID: 0x%x)\n",
> +                              gpu_id);
> +                       return res;
> +               }
> +               res = kfd_parse_crat_table(crat_image,
> +                                          &temp_topology_device_list,
> +                                          proximity_domain);
> +               if (res) {
> +                       pr_err("Error parsing VCRAT for GPU (ID: 0x%x)\n",
> +                              gpu_id);
>                         goto err;
>                 }
>
> -               dev->gpu = gpu;
> -
> -               /*
> -                * TODO: Make a call to retrieve topology information from the
> -                * GPU vBIOS
> -                */
> -
>                 down_write(&topology_lock);
>                 kfd_topology_update_device_list(&temp_topology_device_list,
>                         &topology_device_list);
> @@ -974,34 +1050,86 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>                 /* Update the SYSFS tree, since we added another topology
>                  * device
>                  */
> -               if (kfd_topology_update_sysfs() < 0)
> -                       kfd_topology_release_sysfs();
> -
> +               res = kfd_topology_update_sysfs();
>                 up_write(&topology_lock);
>
> +               if (!res)
> +                       sys_props.generation_count++;
> +               else
> +                       pr_err("Failed to update GPU (ID: 0x%x) to sysfs topology. res=%d\n",
> +                                               gpu_id, res);
> +               dev = kfd_assign_gpu(gpu);
> +               if (WARN_ON(!dev)) {
> +                       res = -ENODEV;
> +                       goto err;
> +               }
>         }
>
>         dev->gpu_id = gpu_id;
>         gpu->id = gpu_id;
> +
> +       /* TODO: Move the following lines to function
> +        *      kfd_add_non_crat_information
> +        */
> +
> +       /* Fill-in additional information that is not available in CRAT but
> +        * needed for the topology
> +        */
> +
>         dev->gpu->kfd2kgd->get_cu_info(dev->gpu->kgd, &cu_info);
> -       dev->node_props.simd_count = dev->node_props.simd_per_cu *
> -                       cu_info.cu_active_number;
> +       dev->node_props.simd_arrays_per_engine =
> +               cu_info.num_shader_arrays_per_engine;
> +
>         dev->node_props.vendor_id = gpu->pdev->vendor;
>         dev->node_props.device_id = gpu->pdev->device;
>         dev->node_props.location_id = PCI_DEVID(gpu->pdev->bus->number,
>                 gpu->pdev->devfn);
> -       /*
> -        * TODO: Retrieve max engine clock values from KGD
> -        */
> -
> -       if (dev->gpu->device_info->asic_family == CHIP_CARRIZO) {
> -               dev->node_props.capability |= HSA_CAP_DOORBELL_PACKET_TYPE;
> +       dev->node_props.max_engine_clk_fcompute =
> +               dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
> +       dev->node_props.max_engine_clk_ccompute =
> +               cpufreq_quick_get_max(0) / 1000;
> +
> +       kfd_fill_mem_clk_max_info(dev);
> +       kfd_fill_iolink_non_crat_info(dev);
> +
> +       switch (dev->gpu->device_info->asic_family) {
> +       case CHIP_KAVERI:
> +       case CHIP_HAWAII:
> +       case CHIP_TONGA:
> +               dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_PRE_1_0 <<
> +                       HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
> +                       HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
> +               break;
> +       case CHIP_CARRIZO:
> +       case CHIP_FIJI:
> +       case CHIP_POLARIS10:
> +       case CHIP_POLARIS11:
>                 pr_debug("Adding doorbell packet type capability\n");
> +               dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_1_0 <<
> +                       HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
> +                       HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
> +               break;
> +       default:
> +               WARN(1, "Unexpected ASIC family %u",
> +                    dev->gpu->device_info->asic_family);
>         }
>
> +       /* Fix errors in CZ CRAT.
> +        * simd_count: Carrizo CRAT reports wrong simd_count, probably
> +        *              because it doesn't consider masked out CUs
> +        * capability flag: Carrizo CRAT doesn't report IOMMU
> +        *              flags. TODO: Fix this.
> +        */
> +       if (dev->gpu->device_info->asic_family == CHIP_CARRIZO)
> +               dev->node_props.simd_count =
> +                       cu_info.simd_per_cu * cu_info.cu_active_number;
> +
> +       kfd_debug_print_topology();
> +
>         if (!res)
>                 kfd_notify_gpu_change(gpu_id, 1);
>  err:
> +       kfd_destroy_crat_image(crat_image);
>         return res;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index 8668189..55de56f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -39,8 +39,12 @@
>  #define HSA_CAP_WATCH_POINTS_SUPPORTED         0x00000080
>  #define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK    0x00000f00
>  #define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT   8
> -#define HSA_CAP_RESERVED                       0xfffff000
> -#define HSA_CAP_DOORBELL_PACKET_TYPE           0x00001000
> +#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK   0x00003000
> +#define HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT  12
> +#define HSA_CAP_RESERVED                       0xffffc000
> +
> +#define HSA_CAP_DOORBELL_TYPE_PRE_1_0          0x0
> +#define HSA_CAP_DOORBELL_TYPE_1_0              0x1
>
>  struct kfd_node_properties {
>         uint32_t cpu_cores_count;
> --
> 2.7.4
>

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]     ` <1512792555-26042-23-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-11 15:23       ` Oded Gabbay
       [not found]         ` <CAFCwf11fXHVD+OaQ3RXFNb+wttvqZM8b=7+UfhrGw_uSoiph0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Oded Gabbay @ 2017-12-11 15:23 UTC (permalink / raw)
  To: Felix Kuehling, Alex Deucher, Christian König
  Cc: Amber Lin, Kent Russell, amd-gfx list

On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> From: Amber Lin <Amber.Lin@amd.com>
>
> For hardware blocks whose performance counters are accessed via MMIO
> registers, KFD provides support for those privileged blocks. IOMMU is
> one of those privileged blocks. Most performance counter properties
> required by Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>  This patch adds properties to topology in KFD sysfs for information not
> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/ formatted as
> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, i.e.
> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
> For dGPUs, which don't have an IOMMU, nothing appears under
> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
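
For illustration, reading one of these new files from user space would
look something like this (the path is taken from the commit message
above; the value shown is made up):

    $ cat /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent
    8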

I don't feel comfortable with this patch. It seems to me you didn't
have anywhere to put these counters so you just stuck them in a place
the thunk already reads because it was "convenient" for you to do it.
But, as you point out in a comment later, these counters have nothing
to do with topology.
So this just feels wrong and I would like to:

a. Get additional opinions on it. Christian? Alex? What do you think?
How are the GPU's GFX counters exposed?
b. Ask why not use an IOCTL to get the counters? (A rough sketch of
what that could look like follows below.)
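
A hypothetical sketch of what such an IOCTL could look like (all names
and numbers below are invented for illustration; amdkfd's real uAPI
lives in include/uapi/linux/kfd_ioctl.h):

    /* Hypothetical uAPI -- not part of this series */
    struct kfd_ioctl_get_perf_counters_args {
            __u32 node_id;          /* to KFD:   topology node to query */
            __u32 max_concurrent;   /* from KFD: max concurrent counters */
            __u32 num_counters;     /* from KFD: number of counters */
            __u32 pad;
    };

    #define AMDKFD_IOC_GET_PERF_COUNTERS \
            _IOWR('K', 0x20, struct kfd_ioctl_get_perf_counters_args)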

btw, I tried to search for other drivers that do this (expose perf
counters in sysfs) and didn't find any (it wasn't an exhaustive search
so I may have missed some).

Thanks,
Oded




>
> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
> Signed-off-by: Kent Russell <kent.russell@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>  2 files changed, 127 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 7fe7ee0..52d20f5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>         struct kfd_mem_properties *mem;
>         struct kfd_cache_properties *cache;
>         struct kfd_iolink_properties *iolink;
> +       struct kfd_perf_properties *perf;
>
>         list_del(&dev->list);
>
> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>                 kfree(iolink);
>         }
>
> +       while (dev->perf_props.next != &dev->perf_props) {
> +               perf = container_of(dev->perf_props.next,
> +                               struct kfd_perf_properties, list);
> +               list_del(&perf->list);
> +               kfree(perf);
> +       }
> +
>         kfree(dev);
>  }
>
> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>         INIT_LIST_HEAD(&dev->mem_props);
>         INIT_LIST_HEAD(&dev->cache_props);
>         INIT_LIST_HEAD(&dev->io_link_props);
> +       INIT_LIST_HEAD(&dev->perf_props);
>
>         list_add_tail(&dev->list, device_list);
>
> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>         .sysfs_ops = &cache_ops,
>  };
>
> +/****** Sysfs of Performance Counters ******/
> +
> +struct kfd_perf_attr {
> +       struct kobj_attribute attr;
> +       uint32_t data;
> +};
> +
> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
> +                       char *buf)
> +{
> +       struct kfd_perf_attr *attr;
> +
> +       buf[0] = 0;
> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
> +       if (!attr->data) /* invalid data for PMC */
> +               return 0;
> +       else
> +               return sysfs_show_32bit_val(buf, attr->data);
> +}
> +
> +#define KFD_PERF_DESC(_name, _data)                    \
> +{                                                      \
> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
> +       .data = _data,                                  \
> +}
> +
> +static struct kfd_perf_attr perf_attr_iommu[] = {
> +       KFD_PERF_DESC(max_concurrent, 0),
> +       KFD_PERF_DESC(num_counters, 0),
> +       KFD_PERF_DESC(counter_ids, 0),
> +};
> +/****************************************/
> +
>  static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>                 char *buffer)
>  {
> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>         struct kfd_iolink_properties *iolink;
>         struct kfd_cache_properties *cache;
>         struct kfd_mem_properties *mem;
> +       struct kfd_perf_properties *perf;
>
>         if (dev->kobj_iolink) {
>                 list_for_each_entry(iolink, &dev->io_link_props, list)
> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>                 dev->kobj_mem = NULL;
>         }
>
> +       if (dev->kobj_perf) {
> +               list_for_each_entry(perf, &dev->perf_props, list) {
> +                       kfree(perf->attr_group);
> +                       perf->attr_group = NULL;
> +               }
> +               kobject_del(dev->kobj_perf);
> +               kobject_put(dev->kobj_perf);
> +               dev->kobj_perf = NULL;
> +       }
> +
>         if (dev->kobj_node) {
>                 sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>                 sysfs_remove_file(dev->kobj_node, &dev->attr_name);
> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>         struct kfd_iolink_properties *iolink;
>         struct kfd_cache_properties *cache;
>         struct kfd_mem_properties *mem;
> +       struct kfd_perf_properties *perf;
>         int ret;
> -       uint32_t i;
> +       uint32_t i, num_attrs;
> +       struct attribute **attrs;
>
>         if (WARN_ON(dev->kobj_node))
>                 return -EEXIST;
> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>         if (!dev->kobj_iolink)
>                 return -ENOMEM;
>
> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
> +       if (!dev->kobj_perf)
> +               return -ENOMEM;
> +
>         /*
>          * Creating sysfs files for node properties
>          */
> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>                 if (ret < 0)
>                         return ret;
>                 i++;
> -}
> +       }
> +
> +       /* All hardware blocks have the same number of attributes. */
> +       num_attrs = ARRAY_SIZE(perf_attr_iommu);
> +       list_for_each_entry(perf, &dev->perf_props, list) {
> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
> +                       * num_attrs + sizeof(struct attribute_group),
> +                       GFP_KERNEL);
> +               if (!perf->attr_group)
> +                       return -ENOMEM;
> +
> +               attrs = (struct attribute **)(perf->attr_group + 1);
> +               if (!strcmp(perf->block_name, "iommu")) {
> +                       /* The IOMMU's num_counters and counter_ids are shown
> +                        * under /sys/bus/event_source/devices/amd_iommu; we
> +                        * don't duplicate them here.
> +                        */
> +                       perf_attr_iommu[0].data = perf->max_concurrent;
> +                       for (i = 0; i < num_attrs; i++)
> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
> +               }
> +               perf->attr_group->name = perf->block_name;
> +               perf->attr_group->attrs = attrs;
> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
> +               if (ret < 0)
> +                       return ret;
> +       }
>
>         return 0;
>  }
> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>                 }
>         }
>  }
> +
> +/*
> + * Performance counter information is not part of CRAT, but we would like
> + * to put it in sysfs under the topology directory for the Thunk to read.
> + * This function is called before updating sysfs.
> + */
> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
> +{
> +       struct kfd_perf_properties *props;
> +
> +       if (amd_iommu_pc_supported()) {
> +               props = kfd_alloc_struct(props);
> +               if (!props)
> +                       return -ENOMEM;
> +               strcpy(props->block_name, "iommu");
> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
> +               list_add_tail(&props->list, &kdev->perf_props);
> +       }
> +
> +       return 0;
> +}
> +
>  /* kfd_add_non_crat_information - Add information that is not currently
>   *     defined in CRAT but is necessary for KFD topology
>   * @dev - topology device to which addition info is added
> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>                 }
>         }
>
> +       kdev = list_first_entry(&temp_topology_device_list,
> +                               struct kfd_topology_device, list);
> +       kfd_add_perf_to_topology(kdev);
> +
>         down_write(&topology_lock);
>         kfd_topology_update_device_list(&temp_topology_device_list,
>                                         &topology_device_list);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index 55de56f..b9f3142 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>         struct attribute        attr;
>  };
>
> +struct kfd_perf_properties {
> +       struct list_head        list;
> +       char                    block_name[16];
> +       uint32_t                max_concurrent;
> +       struct attribute_group  *attr_group;
> +};
> +
>  struct kfd_topology_device {
>         struct list_head                list;
>         uint32_t                        gpu_id;
> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>         struct list_head                cache_props;
>         uint32_t                        io_link_count;
>         struct list_head                io_link_props;
> +       struct list_head                perf_props;
>         struct kfd_dev                  *gpu;
>         struct kobject                  *kobj_node;
>         struct kobject                  *kobj_mem;
>         struct kobject                  *kobj_cache;
>         struct kobject                  *kobj_iolink;
> +       struct kobject                  *kobj_perf;
>         struct attribute                attr_gpuid;
>         struct attribute                attr_name;
>         struct attribute                attr_props;
> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>                 struct list_head *device_list);
>  void kfd_release_topology_device_list(struct list_head *device_list);
>
> +extern bool amd_iommu_pc_supported(void);
> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
> +
>  #endif /* __KFD_TOPOLOGY_H__ */
> --
> 2.7.4
>

* Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]     ` <1512792555-26042-29-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-11 15:32       ` Oded Gabbay
       [not found]         ` <CAFCwf12qZXOyS3iwHd5WnBLKce9Kf+7gMu720uiN_TJSzuFngA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Oded Gabbay @ 2017-12-11 15:32 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Kent Russell, amd-gfx list

On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> From: Kent Russell <kent.russell@amd.com>
>
> Add a sysfs file in topology (node/x/memory_banks/X/used_memory) that
> reports the current VRAM usage for that node. Only works for GPU nodes
> at this time.
>
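
For illustration, reading the new file would look something like this
(path per the commit message above; the byte count is made up):

    $ cat /sys/devices/virtual/kfd/kfd/topology/nodes/1/memory_banks/0/used_memory
    268435456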

As with patch 22 (perf counters), I would not expect this information
to be included in the topology. It doesn't describe a property of the
device, but its current state.
Oded

> Signed-off-by: Kent Russell <kent.russell@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49 +++++++++++++++++++++++++++----
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>  2 files changed, 46 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 7f0d41e..7f04038 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -186,6 +186,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>                 sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
>  #define sysfs_show_32bit_val(buffer, value) \
>                 sysfs_show_gen_prop(buffer, "%u\n", value)
> +#define sysfs_show_64bit_val(buffer, value) \
> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>  #define sysfs_show_str_val(buffer, value) \
>                 sysfs_show_gen_prop(buffer, "%s\n", value)
>
> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
>  {
>         ssize_t ret;
>         struct kfd_mem_properties *mem;
> +       uint64_t used_mem;
>
>         /* Making sure that the buffer is an empty string */
>         buffer[0] = 0;
>
> -       mem = container_of(attr, struct kfd_mem_properties, attr);
> +       if (strcmp(attr->name, "used_memory") == 0) {
> +               mem = container_of(attr, struct kfd_mem_properties, attr_used);
> +               if (mem->gpu) {
> +                       used_mem = mem->gpu->kfd2kgd->get_vram_usage(
> +                                                               mem->gpu->kgd);
> +                       return sysfs_show_64bit_val(buffer, used_mem);
> +               }
> +               /* TODO: Report APU/CPU-allocated memory; For now return 0 */
> +               return 0;
> +       }
> +
> +       mem = container_of(attr, struct kfd_mem_properties, attr_props);
>         sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
>         sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
>         sysfs_show_32bit_prop(buffer, "flags", mem->flags);
> @@ -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>         if (dev->kobj_mem) {
>                 list_for_each_entry(mem, &dev->mem_props, list)
>                         if (mem->kobj) {
> -                               kfd_remove_sysfs_file(mem->kobj, &mem->attr);
> +                               /* TODO: Remove when CPU/APU supported */
> +                               if (dev->node_props.cpu_cores_count == 0)
> +                                       sysfs_remove_file(mem->kobj,
> +                                                       &mem->attr_used);
> +                               kfd_remove_sysfs_file(mem->kobj,
> +                                               &mem->attr_props);
>                                 mem->kobj = NULL;
>                         }
>                 kobject_del(dev->kobj_mem);
> @@ -629,12 +648,23 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>                 if (ret < 0)
>                         return ret;
>
> -               mem->attr.name = "properties";
> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
> -               sysfs_attr_init(&mem->attr);
> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
> +               mem->attr_props.name = "properties";
> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
> +               sysfs_attr_init(&mem->attr_props);
> +               ret = sysfs_create_file(mem->kobj, &mem->attr_props);
>                 if (ret < 0)
>                         return ret;
> +
> +               /* TODO: Support APU/CPU memory usage */
> +               if (dev->node_props.cpu_cores_count == 0) {
> +                       mem->attr_used.name = "used_memory";
> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
> +                       sysfs_attr_init(&mem->attr_used);
> +                       ret = sysfs_create_file(mem->kobj, &mem->attr_used);
> +                       if (ret < 0)
> +                               return ret;
> +               }
> +
>                 i++;
>         }
>
> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
>  {
>         struct kfd_topology_device *dev;
>         struct kfd_topology_device *out_dev = NULL;
> +       struct kfd_mem_properties *mem;
>
>         down_write(&topology_lock);
>         list_for_each_entry(dev, &topology_device_list, list)
>                 if (!dev->gpu && (dev->node_props.simd_count > 0)) {
>                         dev->gpu = gpu;
>                         out_dev = dev;
> +
> +                       /* Assign mem->gpu */
> +                       list_for_each_entry(mem, &dev->mem_props, list)
> +                               mem->gpu = dev->gpu;
> +
>                         break;
>                 }
>         up_write(&topology_lock);
> +
>         return out_dev;
>  }
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index 53fca1f..0f698d8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>         uint32_t                width;
>         uint32_t                mem_clk_max;
>         struct kobject          *kobj;
> -       struct attribute        attr;
> +       struct kfd_dev          *gpu;
> +       struct attribute        attr_props;
> +       struct attribute        attr_used;
>  };
>
>  #define HSA_CACHE_TYPE_DATA            0x00000001
> --
> 2.7.4
>

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]         ` <CAFCwf11fXHVD+OaQ3RXFNb+wttvqZM8b=7+UfhrGw_uSoiph0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-11 15:47           ` Alex Deucher
       [not found]             ` <CADnq5_OAhLbnA+07J+T+V82=Q0SZwB4r8zUv7D+KDXrqV==+GQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-12-11 15:53           ` Christian König
  2017-12-11 19:54           ` Felix Kuehling
  2 siblings, 1 reply; 62+ messages in thread
From: Alex Deucher @ 2017-12-11 15:47 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Amber Lin, Felix Kuehling, amd-gfx list, Kent Russell,
	Alex Deucher, Christian König

On Mon, Dec 11, 2017 at 10:23 AM, Oded Gabbay <oded.gabbay@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>> From: Amber Lin <Amber.Lin@amd.com>
>>
>> For hardware blocks whose performance counters are accessed via MMIO
>> registers, KFD provides support for those privileged blocks. IOMMU is
>> one of those privileged blocks. Most performance counter properties
>> required by Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>>  This patch adds properties to topology in KFD sysfs for information not
>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/ formatted as
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, i.e.
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>> For dGPUs, which don't have an IOMMU, nothing appears under
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
>
> I don't feel comfortable with this patch. It seems to me you didn't
> have anywhere to put these counters so you just stuck them in a place
> the thunk already reads because it was "convenient" for you to do it.
> But, as you point out in a comment later, these counters have nothing
> to do with topology.
> So this just feels wrong and I would like to:
>
> a. Get additional opinions on it. Christian? Alex? What do you think?
> How are the GPU's GFX counters exposed?

They are handled as part of the command stream and exposed via OpenGL
extensions.

> b. Ask why not use an IOCTL to get the counters?

I'm not sure that is really any better.

>
> btw, I tried to search for other drivers that do this (expose perf
> counters in sysfs) and didn't find any (it wasn't an exhaustive search
> so I may have missed some).

There is the perf subsystem, but I'm not sure how well it works for
device counters and tying the sampling to events and such.
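
For concreteness, routing these counters through the perf subsystem
would mean registering a device PMU, roughly like the sketch below
(all kfd_iommu_* names are hypothetical, and the MMIO access is
stubbed out):

    #include <linux/perf_event.h>

    static int kfd_iommu_event_init(struct perf_event *event)
    {
            /* only accept events that target this PMU */
            if (event->attr.type != event->pmu->type)
                    return -ENOENT;
            return 0;
    }

    static int kfd_iommu_event_add(struct perf_event *event, int flags)
    {
            return 0;       /* program/start the HW counter here */
    }

    static void kfd_iommu_event_del(struct perf_event *event, int flags)
    {
            /* stop the HW counter here */
    }

    static void kfd_iommu_event_read(struct perf_event *event)
    {
            /* publish the current count; 0 stands in for an MMIO read */
            local64_set(&event->count, 0);
    }

    static struct pmu kfd_iommu_pmu = {
            .task_ctx_nr = perf_invalid_context,
            .event_init  = kfd_iommu_event_init,
            .add         = kfd_iommu_event_add,
            .del         = kfd_iommu_event_del,
            .read        = kfd_iommu_event_read,
    };

    /* at init: perf_pmu_register(&kfd_iommu_pmu, "amd_kfd_iommu", -1); */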

Alex

>
> Thanks,
> Oded
>
>
>
>
>>
>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>  2 files changed, 127 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 7fe7ee0..52d20f5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>         struct kfd_mem_properties *mem;
>>         struct kfd_cache_properties *cache;
>>         struct kfd_iolink_properties *iolink;
>> +       struct kfd_perf_properties *perf;
>>
>>         list_del(&dev->list);
>>
>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>                 kfree(iolink);
>>         }
>>
>> +       while (dev->perf_props.next != &dev->perf_props) {
>> +               perf = container_of(dev->perf_props.next,
>> +                               struct kfd_perf_properties, list);
>> +               list_del(&perf->list);
>> +               kfree(perf);
>> +       }
>> +
>>         kfree(dev);
>>  }
>>
>> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>>         INIT_LIST_HEAD(&dev->mem_props);
>>         INIT_LIST_HEAD(&dev->cache_props);
>>         INIT_LIST_HEAD(&dev->io_link_props);
>> +       INIT_LIST_HEAD(&dev->perf_props);
>>
>>         list_add_tail(&dev->list, device_list);
>>
>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>         .sysfs_ops = &cache_ops,
>>  };
>>
>> +/****** Sysfs of Performance Counters ******/
>> +
>> +struct kfd_perf_attr {
>> +       struct kobj_attribute attr;
>> +       uint32_t data;
>> +};
>> +
>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
>> +                       char *buf)
>> +{
>> +       struct kfd_perf_attr *attr;
>> +
>> +       buf[0] = 0;
>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>> +       if (!attr->data) /* invalid data for PMC */
>> +               return 0;
>> +       else
>> +               return sysfs_show_32bit_val(buf, attr->data);
>> +}
>> +
>> +#define KFD_PERF_DESC(_name, _data)                    \
>> +{                                                      \
>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>> +       .data = _data,                                  \
>> +}
>> +
>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>> +       KFD_PERF_DESC(max_concurrent, 0),
>> +       KFD_PERF_DESC(num_counters, 0),
>> +       KFD_PERF_DESC(counter_ids, 0),
>> +};
>> +/****************************************/
>> +
>>  static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>                 char *buffer)
>>  {
>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>         struct kfd_iolink_properties *iolink;
>>         struct kfd_cache_properties *cache;
>>         struct kfd_mem_properties *mem;
>> +       struct kfd_perf_properties *perf;
>>
>>         if (dev->kobj_iolink) {
>>                 list_for_each_entry(iolink, &dev->io_link_props, list)
>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>                 dev->kobj_mem = NULL;
>>         }
>>
>> +       if (dev->kobj_perf) {
>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>> +                       kfree(perf->attr_group);
>> +                       perf->attr_group = NULL;
>> +               }
>> +               kobject_del(dev->kobj_perf);
>> +               kobject_put(dev->kobj_perf);
>> +               dev->kobj_perf = NULL;
>> +       }
>> +
>>         if (dev->kobj_node) {
>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>         struct kfd_iolink_properties *iolink;
>>         struct kfd_cache_properties *cache;
>>         struct kfd_mem_properties *mem;
>> +       struct kfd_perf_properties *perf;
>>         int ret;
>> -       uint32_t i;
>> +       uint32_t i, num_attrs;
>> +       struct attribute **attrs;
>>
>>         if (WARN_ON(dev->kobj_node))
>>                 return -EEXIST;
>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>         if (!dev->kobj_iolink)
>>                 return -ENOMEM;
>>
>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>> +       if (!dev->kobj_perf)
>> +               return -ENOMEM;
>> +
>>         /*
>>          * Creating sysfs files for node properties
>>          */
>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>                 if (ret < 0)
>>                         return ret;
>>                 i++;
>> -}
>> +       }
>> +
>> +       /* All hardware blocks have the same number of attributes. */
>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>> +                       * num_attrs + sizeof(struct attribute_group),
>> +                       GFP_KERNEL);
>> +               if (!perf->attr_group)
>> +                       return -ENOMEM;
>> +
>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>> +               if (!strcmp(perf->block_name, "iommu")) {
>> +               /* Information of IOMMU's num_counters and counter_ids is shown
>> +                * under /sys/bus/event_source/devices/amd_iommu. We don't
>> +                * duplicate here.
>> +                */
>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>> +                       for (i = 0; i < num_attrs; i++)
>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>> +               }
>> +               perf->attr_group->name = perf->block_name;
>> +               perf->attr_group->attrs = attrs;
>> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
>> +               if (ret < 0)
>> +                       return ret;
>> +       }
>>
>>         return 0;
>>  }
>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>>                 }
>>         }
>>  }
>> +
>> +/*
>> + * Performance counters information is not part of CRAT but we would like to
>> + * put them in the sysfs under topology directory for Thunk to get the data.
>> + * This function is called before updating the sysfs.
>> + */
>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>> +{
>> +       struct kfd_perf_properties *props;
>> +
>> +       if (amd_iommu_pc_supported()) {
>> +               props = kfd_alloc_struct(props);
>> +               if (!props)
>> +                       return -ENOMEM;
>> +               strcpy(props->block_name, "iommu");
>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>> +               list_add_tail(&props->list, &kdev->perf_props);
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>  /* kfd_add_non_crat_information - Add information that is not currently
>>   *     defined in CRAT but is necessary for KFD topology
>>   * @dev - topology device to which addition info is added
>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>                 }
>>         }
>>
>> +       kdev = list_first_entry(&temp_topology_device_list,
>> +                               struct kfd_topology_device, list);
>> +       kfd_add_perf_to_topology(kdev);
>> +
>>         down_write(&topology_lock);
>>         kfd_topology_update_device_list(&temp_topology_device_list,
>>                                         &topology_device_list);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index 55de56f..b9f3142 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>         struct attribute        attr;
>>  };
>>
>> +struct kfd_perf_properties {
>> +       struct list_head        list;
>> +       char                    block_name[16];
>> +       uint32_t                max_concurrent;
>> +       struct attribute_group  *attr_group;
>> +};
>> +
>>  struct kfd_topology_device {
>>         struct list_head                list;
>>         uint32_t                        gpu_id;
>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>         struct list_head                cache_props;
>>         uint32_t                        io_link_count;
>>         struct list_head                io_link_props;
>> +       struct list_head                perf_props;
>>         struct kfd_dev                  *gpu;
>>         struct kobject                  *kobj_node;
>>         struct kobject                  *kobj_mem;
>>         struct kobject                  *kobj_cache;
>>         struct kobject                  *kobj_iolink;
>> +       struct kobject                  *kobj_perf;
>>         struct attribute                attr_gpuid;
>>         struct attribute                attr_name;
>>         struct attribute                attr_props;
>> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>                 struct list_head *device_list);
>>  void kfd_release_topology_device_list(struct list_head *device_list);
>>
>> +extern bool amd_iommu_pc_supported(void);
>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>> +
>>  #endif /* __KFD_TOPOLOGY_H__ */
>> --
>> 2.7.4
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]         ` <CAFCwf11fXHVD+OaQ3RXFNb+wttvqZM8b=7+UfhrGw_uSoiph0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-12-11 15:47           ` Alex Deucher
@ 2017-12-11 15:53           ` Christian König
       [not found]             ` <79968e85-d5dd-59ad-c6c1-9ca6f93a3c37-5C7GfCeVMHo@public.gmane.org>
  2017-12-11 19:54           ` Felix Kuehling
  2 siblings, 1 reply; 62+ messages in thread
From: Christian König @ 2017-12-11 15:53 UTC (permalink / raw)
  To: Oded Gabbay, Felix Kuehling, Alex Deucher
  Cc: Amber Lin, Kent Russell, amd-gfx list

On 11.12.2017 at 16:23, Oded Gabbay wrote:
> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>> From: Amber Lin <Amber.Lin@amd.com>
>>
>> For hardware blocks whose performance counters are accessed via MMIO
>> registers, KFD provides support for those privileged blocks. IOMMU is
>> one of those privileged blocks. Most performance counter properties
>> required by the Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>> This patch adds properties to the topology in KFD sysfs for information not
>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/, formatted as
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, e.g.
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>> For dGPUs, which don't have an IOMMU, nothing appears under
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
> I don't feel comfortable with this patch. It seems to me you didn't
> have anywhere to put these counters so you just stuck them in a place
> the thunk already reads because it was "convenient" for you to do it.
> But, as you point out in a comment later, these counters have nothing
> to do with topology.
> So this just feels wrong and I would like to:
>
> a. Get additional opinions on it. Christian? Alex? What do you think?

I agree that this looks odd.

But the fact that any device-specific information shows up outside of
/sys/devices/pci0000:00... is strange to start with.

Please don't tell me that we have built up a secondary topology parallel
to the Linux device tree here?

In other words, for my Vega10 the real subdirectory for any
device-specific config is:
./devices/pci0000:00/0000:00:02.1/0000:01:00.0/0000:02:00.0/0000:03:00.0

With the following files as symlink to it:
./bus/pci/devices/0000:03:00.0
./bus/pci/drivers/amdgpu/0000:03:00.0

>   How are the GPU's GFX counters exposed?

We discussed that internally but haven't decided on anything AFAIK.

> b. Ask why not use an IOCTL to get the counters?

Per-process counters are directly readable inside the command submission
affecting them.

Only the timer tick is exposed as an IOCTL as well, IIRC.

Regards,
Christian.
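
A side note on the paths above: the /sys/bus/pci entries are symlinks
into the canonical /sys/devices/... hierarchy, so a userspace tool can
resolve them with realpath(). A minimal sketch, reusing the BDF from the
Vega10 example:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main(void)
{
        char resolved[PATH_MAX];

        /* Prints something like
         * /sys/devices/pci0000:00/0000:00:02.1/.../0000:03:00.0
         */
        if (realpath("/sys/bus/pci/devices/0000:03:00.0", resolved))
                printf("%s\n", resolved);
        else
                perror("realpath");
        return 0;
}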

>
> btw, I tried to search for other drivers that do this (expose perf
> counters in sysfs) and didn't find any (it wasn't an exhaustive search
> so I may have missed some).
>
> Thanks,
> Oded
>
>
>
>
>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>   2 files changed, 127 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 7fe7ee0..52d20f5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>          struct kfd_mem_properties *mem;
>>          struct kfd_cache_properties *cache;
>>          struct kfd_iolink_properties *iolink;
>> +       struct kfd_perf_properties *perf;
>>
>>          list_del(&dev->list);
>>
>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>                  kfree(iolink);
>>          }
>>
>> +       while (dev->perf_props.next != &dev->perf_props) {
>> +               perf = container_of(dev->perf_props.next,
>> +                               struct kfd_perf_properties, list);
>> +               list_del(&perf->list);
>> +               kfree(perf);
>> +       }
>> +
>>          kfree(dev);
>>   }
>>
>> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>>          INIT_LIST_HEAD(&dev->mem_props);
>>          INIT_LIST_HEAD(&dev->cache_props);
>>          INIT_LIST_HEAD(&dev->io_link_props);
>> +       INIT_LIST_HEAD(&dev->perf_props);
>>
>>          list_add_tail(&dev->list, device_list);
>>
>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>          .sysfs_ops = &cache_ops,
>>   };
>>
>> +/****** Sysfs of Performance Counters ******/
>> +
>> +struct kfd_perf_attr {
>> +       struct kobj_attribute attr;
>> +       uint32_t data;
>> +};
>> +
>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
>> +                       char *buf)
>> +{
>> +       struct kfd_perf_attr *attr;
>> +
>> +       buf[0] = 0;
>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>> +       if (!attr->data) /* invalid data for PMC */
>> +               return 0;
>> +       else
>> +               return sysfs_show_32bit_val(buf, attr->data);
>> +}
>> +
>> +#define KFD_PERF_DESC(_name, _data)                    \
>> +{                                                      \
>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>> +       .data = _data,                                  \
>> +}
>> +
>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>> +       KFD_PERF_DESC(max_concurrent, 0),
>> +       KFD_PERF_DESC(num_counters, 0),
>> +       KFD_PERF_DESC(counter_ids, 0),
>> +};
>> +/****************************************/
>> +
>>   static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>                  char *buffer)
>>   {
>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>          struct kfd_iolink_properties *iolink;
>>          struct kfd_cache_properties *cache;
>>          struct kfd_mem_properties *mem;
>> +       struct kfd_perf_properties *perf;
>>
>>          if (dev->kobj_iolink) {
>>                  list_for_each_entry(iolink, &dev->io_link_props, list)
>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>                  dev->kobj_mem = NULL;
>>          }
>>
>> +       if (dev->kobj_perf) {
>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>> +                       kfree(perf->attr_group);
>> +                       perf->attr_group = NULL;
>> +               }
>> +               kobject_del(dev->kobj_perf);
>> +               kobject_put(dev->kobj_perf);
>> +               dev->kobj_perf = NULL;
>> +       }
>> +
>>          if (dev->kobj_node) {
>>                  sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>                  sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>          struct kfd_iolink_properties *iolink;
>>          struct kfd_cache_properties *cache;
>>          struct kfd_mem_properties *mem;
>> +       struct kfd_perf_properties *perf;
>>          int ret;
>> -       uint32_t i;
>> +       uint32_t i, num_attrs;
>> +       struct attribute **attrs;
>>
>>          if (WARN_ON(dev->kobj_node))
>>                  return -EEXIST;
>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>          if (!dev->kobj_iolink)
>>                  return -ENOMEM;
>>
>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>> +       if (!dev->kobj_perf)
>> +               return -ENOMEM;
>> +
>>          /*
>>           * Creating sysfs files for node properties
>>           */
>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>                  if (ret < 0)
>>                          return ret;
>>                  i++;
>> -}
>> +       }
>> +
>> +       /* All hardware blocks have the same number of attributes. */
>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>> +                       * num_attrs + sizeof(struct attribute_group),
>> +                       GFP_KERNEL);
>> +               if (!perf->attr_group)
>> +                       return -ENOMEM;
>> +
>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>> +               if (!strcmp(perf->block_name, "iommu")) {
>> +               /* Information of IOMMU's num_counters and counter_ids is shown
>> +                * under /sys/bus/event_source/devices/amd_iommu. We don't
>> +                * duplicate here.
>> +                */
>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>> +                       for (i = 0; i < num_attrs; i++)
>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>> +               }
>> +               perf->attr_group->name = perf->block_name;
>> +               perf->attr_group->attrs = attrs;
>> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
>> +               if (ret < 0)
>> +                       return ret;
>> +       }
>>
>>          return 0;
>>   }
>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>>                  }
>>          }
>>   }
>> +
>> +/*
>> + * Performance counters information is not part of CRAT but we would like to
>> + * put them in the sysfs under topology directory for Thunk to get the data.
>> + * This function is called before updating the sysfs.
>> + */
>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>> +{
>> +       struct kfd_perf_properties *props;
>> +
>> +       if (amd_iommu_pc_supported()) {
>> +               props = kfd_alloc_struct(props);
>> +               if (!props)
>> +                       return -ENOMEM;
>> +               strcpy(props->block_name, "iommu");
>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>> +               list_add_tail(&props->list, &kdev->perf_props);
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>   /* kfd_add_non_crat_information - Add information that is not currently
>>    *     defined in CRAT but is necessary for KFD topology
>>    * @dev - topology device to which addition info is added
>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>                  }
>>          }
>>
>> +       kdev = list_first_entry(&temp_topology_device_list,
>> +                               struct kfd_topology_device, list);
>> +       kfd_add_perf_to_topology(kdev);
>> +
>>          down_write(&topology_lock);
>>          kfd_topology_update_device_list(&temp_topology_device_list,
>>                                          &topology_device_list);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index 55de56f..b9f3142 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>          struct attribute        attr;
>>   };
>>
>> +struct kfd_perf_properties {
>> +       struct list_head        list;
>> +       char                    block_name[16];
>> +       uint32_t                max_concurrent;
>> +       struct attribute_group  *attr_group;
>> +};
>> +
>>   struct kfd_topology_device {
>>          struct list_head                list;
>>          uint32_t                        gpu_id;
>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>          struct list_head                cache_props;
>>          uint32_t                        io_link_count;
>>          struct list_head                io_link_props;
>> +       struct list_head                perf_props;
>>          struct kfd_dev                  *gpu;
>>          struct kobject                  *kobj_node;
>>          struct kobject                  *kobj_mem;
>>          struct kobject                  *kobj_cache;
>>          struct kobject                  *kobj_iolink;
>> +       struct kobject                  *kobj_perf;
>>          struct attribute                attr_gpuid;
>>          struct attribute                attr_name;
>>          struct attribute                attr_props;
>> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>                  struct list_head *device_list);
>>   void kfd_release_topology_device_list(struct list_head *device_list);
>>
>> +extern bool amd_iommu_pc_supported(void);
>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>> +
>>   #endif /* __KFD_TOPOLOGY_H__ */
>> --
>> 2.7.4
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 21/37] drm/amdkfd: Add topology support for dGPUs
       [not found]         ` <CAFCwf10_AOKQaQU31Mnn+2fO=awvO-DWaM7bTzO-khjk=yw+8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-11 16:20           ` Felix Kuehling
  0 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2017-12-11 16:20 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Jay Cornwall, Amber Lin, Ben Goz, Harish Kasiviswanathan,
	amd-gfx list, Kent Russell

On 2017-12-11 09:46 AM, Oded Gabbay wrote:
>> +
>> +       /* Fill in IOLINK subtype.
>> +        * TODO: Fill-in other fields of iolink subtype
>> +        */
>> +       sub_type_hdr->io_interface_type = CRAT_IOLINK_TYPE_PCIEXPRESS;
>> +       sub_type_hdr->proximity_domain_from = proximity_domain;
>> +#ifdef CONFIG_NUMA
>> +       if (kdev->pdev->dev.numa_node == NUMA_NO_NODE)
> Had to add #include <linux/pci.h> at the head of the file to make this
> line compile
Thanks for catching that.

I removed some "unnecessary" includes as I was splitting a larger patch
into CPU and dGPU support for topology. I think the kernel config I used
for testing didn't have NUMA enabled for some reason, so I didn't
catch this.

Regards,
  Felix
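
For reference, the pattern in question reduces to something like the
sketch below (illustrative only, not the patch itself). struct device
only has a numa_node field when CONFIG_NUMA is set, and struct pci_dev
is only fully defined once linux/pci.h is included, which is why the
missing include broke only NUMA-enabled configs:

#include <linux/pci.h>  /* struct pci_dev, pdev->dev */
#include <linux/numa.h> /* NUMA_NO_NODE */

static int example_node_of(struct pci_dev *pdev)
{
#ifdef CONFIG_NUMA
        if (pdev->dev.numa_node == NUMA_NO_NODE)
                return 0;       /* no NUMA info; fall back to node 0 */
        return pdev->dev.numa_node;
#else
        return 0;
#endif
}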

>
> Oded
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]             ` <79968e85-d5dd-59ad-c6c1-9ca6f93a3c37-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-11 16:32               ` Oded Gabbay
       [not found]                 ` <CAFCwf10FhChovT58CFzb0LEVGG4jZBocdbb_57iSRj9GnenGyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Oded Gabbay @ 2017-12-11 16:32 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Deucher, Amber Lin, Felix Kuehling, Kent Russell, amd-gfx list

On Mon, Dec 11, 2017 at 5:53 PM, Christian König
<christian.koenig@amd.com> wrote:
> On 11.12.2017 at 16:23, Oded Gabbay wrote:
>>
>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com>
>> wrote:
>>>
>>> From: Amber Lin <Amber.Lin@amd.com>
>>>
>>> For hardware blocks whose performance counters are accessed via MMIO
>>> registers, KFD provides support for those privileged blocks. IOMMU is
>>> one of those privileged blocks. Most performance counter properties
>>> required by the Thunk are available at
>>> /sys/bus/event_source/devices/amd_iommu.
>>> This patch adds properties to the topology in KFD sysfs for information not
>>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/, formatted as
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>,
>>> e.g.
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>>> For dGPUs, which don't have an IOMMU, nothing appears under
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
>>
>> I don't feel comfortable with this patch. It seems to me you didn't
>> have anywhere to put these counters so you just stuck them in a place
>> the thunk already reads because it was "convenient" for you to do it.
>> But, as you point out in a comment later, these counters have nothing
>> to do with topology.
>> So this just feels wrong and I would like to:
>>
>> a. Get additional opinions on it. Christian? Alex? What do you think?
>
>
> I agree that this looks odd.
>
> But the fact that any device-specific information shows up outside of
> /sys/devices/pci0000:00... is strange to start with.
>
> Please don't tell me that we have built up a secondary topology parallel to
> the Linux device tree here?

I hate to disappoint but the answer is that we do have a secondary
topology dedicated to kfd under
/sys/devices/virtual/kfd/kfd/topology/nodes/X

This is part of the original design of the driver and it wasn't
trivial to upstream, but at the end of the day it got upstreamed.
I think the base argument was (and still is) that we expose a single
char-dev for ALL the GPUs/APUs that are present in the system, and
therefore, the thunk layer should get that information in one single
place under the kfd driver folder in sysfs.
I guess we could have done things differently, but that would have
required more "integration" with the gfx driver, which some people
at the time might have thought to be more difficult than writing
our own implementation.
Anyway, that's all in the past and I doubt it will change now
unless/until amdkfd is abolished and all functionality moves to
amdgpu.

Thanks,
Oded
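
To make the consumer side concrete: reading such a topology property
from userspace is a plain sysfs file read. A minimal sketch, with the
path taken from the commit message and error handling abbreviated:

#include <stdio.h>

int main(void)
{
        unsigned int max_concurrent = 0;
        FILE *f = fopen("/sys/devices/virtual/kfd/kfd/topology/"
                        "nodes/0/perf/iommu/max_concurrent", "r");

        if (!f)
                return 1; /* no perf entry, e.g. a dGPU-only system */
        if (fscanf(f, "%u", &max_concurrent) == 1)
                printf("max_concurrent: %u\n", max_concurrent);
        fclose(f);
        return 0;
}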
>
> In other words, for my Vega10 the real subdirectory for any device-specific
> config is:
> ./devices/pci0000:00/0000:00:02.1/0000:01:00.0/0000:02:00.0/0000:03:00.0
>
> With the following files as symlink to it:
> ./bus/pci/devices/0000:03:00.0
> ./bus/pci/drivers/amdgpu/0000:03:00.0
>
>>   How are the GPU's GFX counters exposed?
>
>
> We discussed that internally but haven't decided on anything AFAIK.
>
>> b. Ask why not use an IOCTL to get the counters?
>
>
> Per-process counters are directly readable inside the command submission
> affecting them.
>
> Only the timer tick is exposed as an IOCTL as well, IIRC.
>
> Regards,
> Christian.
>
>
>>
>> btw, I tried to search for other drivers that do this (expose perf
>> counters in sysfs) and didn't find any (it wasn't an exhaustive search
>> so I may have missed some).
>>
>> Thanks,
>> Oded
>>
>>
>>
>>
>>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116
>>> +++++++++++++++++++++++++++++-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>>   2 files changed, 127 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 7fe7ee0..52d20f5 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct
>>> kfd_topology_device *dev)
>>>          struct kfd_mem_properties *mem;
>>>          struct kfd_cache_properties *cache;
>>>          struct kfd_iolink_properties *iolink;
>>> +       struct kfd_perf_properties *perf;
>>>
>>>          list_del(&dev->list);
>>>
>>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct
>>> kfd_topology_device *dev)
>>>                  kfree(iolink);
>>>          }
>>>
>>> +       while (dev->perf_props.next != &dev->perf_props) {
>>> +               perf = container_of(dev->perf_props.next,
>>> +                               struct kfd_perf_properties, list);
>>> +               list_del(&perf->list);
>>> +               kfree(perf);
>>> +       }
>>> +
>>>          kfree(dev);
>>>   }
>>>
>>> @@ -162,6 +170,7 @@ struct kfd_topology_device
>>> *kfd_create_topology_device(
>>>          INIT_LIST_HEAD(&dev->mem_props);
>>>          INIT_LIST_HEAD(&dev->cache_props);
>>>          INIT_LIST_HEAD(&dev->io_link_props);
>>> +       INIT_LIST_HEAD(&dev->perf_props);
>>>
>>>          list_add_tail(&dev->list, device_list);
>>>
>>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>>          .sysfs_ops = &cache_ops,
>>>   };
>>>
>>> +/****** Sysfs of Performance Counters ******/
>>> +
>>> +struct kfd_perf_attr {
>>> +       struct kobj_attribute attr;
>>> +       uint32_t data;
>>> +};
>>> +
>>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute
>>> *attrs,
>>> +                       char *buf)
>>> +{
>>> +       struct kfd_perf_attr *attr;
>>> +
>>> +       buf[0] = 0;
>>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>>> +       if (!attr->data) /* invalid data for PMC */
>>> +               return 0;
>>> +       else
>>> +               return sysfs_show_32bit_val(buf, attr->data);
>>> +}
>>> +
>>> +#define KFD_PERF_DESC(_name, _data)                    \
>>> +{                                                      \
>>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>>> +       .data = _data,                                  \
>>> +}
>>> +
>>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>>> +       KFD_PERF_DESC(max_concurrent, 0),
>>> +       KFD_PERF_DESC(num_counters, 0),
>>> +       KFD_PERF_DESC(counter_ids, 0),
>>> +};
>>> +/****************************************/
>>> +
>>>   static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>>                  char *buffer)
>>>   {
>>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct
>>> kfd_topology_device *dev)
>>>          struct kfd_iolink_properties *iolink;
>>>          struct kfd_cache_properties *cache;
>>>          struct kfd_mem_properties *mem;
>>> +       struct kfd_perf_properties *perf;
>>>
>>>          if (dev->kobj_iolink) {
>>>                  list_for_each_entry(iolink, &dev->io_link_props, list)
>>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct
>>> kfd_topology_device *dev)
>>>                  dev->kobj_mem = NULL;
>>>          }
>>>
>>> +       if (dev->kobj_perf) {
>>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>>> +                       kfree(perf->attr_group);
>>> +                       perf->attr_group = NULL;
>>> +               }
>>> +               kobject_del(dev->kobj_perf);
>>> +               kobject_put(dev->kobj_perf);
>>> +               dev->kobj_perf = NULL;
>>> +       }
>>> +
>>>          if (dev->kobj_node) {
>>>                  sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>>                  sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct
>>> kfd_topology_device *dev,
>>>          struct kfd_iolink_properties *iolink;
>>>          struct kfd_cache_properties *cache;
>>>          struct kfd_mem_properties *mem;
>>> +       struct kfd_perf_properties *perf;
>>>          int ret;
>>> -       uint32_t i;
>>> +       uint32_t i, num_attrs;
>>> +       struct attribute **attrs;
>>>
>>>          if (WARN_ON(dev->kobj_node))
>>>                  return -EEXIST;
>>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct
>>> kfd_topology_device *dev,
>>>          if (!dev->kobj_iolink)
>>>                  return -ENOMEM;
>>>
>>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>>> +       if (!dev->kobj_perf)
>>> +               return -ENOMEM;
>>> +
>>>          /*
>>>           * Creating sysfs files for node properties
>>>           */
>>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct
>>> kfd_topology_device *dev,
>>>                  if (ret < 0)
>>>                          return ret;
>>>                  i++;
>>> -}
>>> +       }
>>> +
>>> +       /* All hardware blocks have the same number of attributes. */
>>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>>> +                       * num_attrs + sizeof(struct attribute_group),
>>> +                       GFP_KERNEL);
>>> +               if (!perf->attr_group)
>>> +                       return -ENOMEM;
>>> +
>>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>>> +               if (!strcmp(perf->block_name, "iommu")) {
>>> +               /* Information of IOMMU's num_counters and counter_ids is
>>> shown
>>> +                * under /sys/bus/event_source/devices/amd_iommu. We
>>> don't
>>> +                * duplicate here.
>>> +                */
>>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>>> +                       for (i = 0; i < num_attrs; i++)
>>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>>> +               }
>>> +               perf->attr_group->name = perf->block_name;
>>> +               perf->attr_group->attrs = attrs;
>>> +               ret = sysfs_create_group(dev->kobj_perf,
>>> perf->attr_group);
>>> +               if (ret < 0)
>>> +                       return ret;
>>> +       }
>>>
>>>          return 0;
>>>   }
>>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct
>>> dmi_header *dm,
>>>                  }
>>>          }
>>>   }
>>> +
>>> +/*
>>> + * Performance counters information is not part of CRAT but we would
>>> like to
>>> + * put them in the sysfs under topology directory for Thunk to get the
>>> data.
>>> + * This function is called before updating the sysfs.
>>> + */
>>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>> +{
>>> +       struct kfd_perf_properties *props;
>>> +
>>> +       if (amd_iommu_pc_supported()) {
>>> +               props = kfd_alloc_struct(props);
>>> +               if (!props)
>>> +                       return -ENOMEM;
>>> +               strcpy(props->block_name, "iommu");
>>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>> +                       amd_iommu_pc_get_max_counters(0); /* assume one
>>> iommu */
>>> +               list_add_tail(&props->list, &kdev->perf_props);
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>>   /* kfd_add_non_crat_information - Add information that is not currently
>>>    *     defined in CRAT but is necessary for KFD topology
>>>    * @dev - topology device to which addition info is added
>>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>>                  }
>>>          }
>>>
>>> +       kdev = list_first_entry(&temp_topology_device_list,
>>> +                               struct kfd_topology_device, list);
>>> +       kfd_add_perf_to_topology(kdev);
>>> +
>>>          down_write(&topology_lock);
>>>          kfd_topology_update_device_list(&temp_topology_device_list,
>>>                                          &topology_device_list);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> index 55de56f..b9f3142 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>>          struct attribute        attr;
>>>   };
>>>
>>> +struct kfd_perf_properties {
>>> +       struct list_head        list;
>>> +       char                    block_name[16];
>>> +       uint32_t                max_concurrent;
>>> +       struct attribute_group  *attr_group;
>>> +};
>>> +
>>>   struct kfd_topology_device {
>>>          struct list_head                list;
>>>          uint32_t                        gpu_id;
>>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>>          struct list_head                cache_props;
>>>          uint32_t                        io_link_count;
>>>          struct list_head                io_link_props;
>>> +       struct list_head                perf_props;
>>>          struct kfd_dev                  *gpu;
>>>          struct kobject                  *kobj_node;
>>>          struct kobject                  *kobj_mem;
>>>          struct kobject                  *kobj_cache;
>>>          struct kobject                  *kobj_iolink;
>>> +       struct kobject                  *kobj_perf;
>>>          struct attribute                attr_gpuid;
>>>          struct attribute                attr_name;
>>>          struct attribute                attr_props;
>>> @@ -173,4 +182,8 @@ struct kfd_topology_device
>>> *kfd_create_topology_device(
>>>                  struct list_head *device_list);
>>>   void kfd_release_topology_device_list(struct list_head *device_list);
>>>
>>> +extern bool amd_iommu_pc_supported(void);
>>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>> +
>>>   #endif /* __KFD_TOPOLOGY_H__ */
>>> --
>>> 2.7.4
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]         ` <CAFCwf12qZXOyS3iwHd5WnBLKce9Kf+7gMu720uiN_TJSzuFngA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-11 16:40           ` Oded Gabbay
       [not found]             ` <CAFCwf129dU00ocL=btDWh0bjHVaX-JsqWcwAr1uLpJmvj1gkKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Oded Gabbay @ 2017-12-11 16:40 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Kent Russell, amd-gfx list

On Mon, Dec 11, 2017 at 5:32 PM, Oded Gabbay <oded.gabbay@gmail.com> wrote:
> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>> From: Kent Russell <kent.russell@amd.com>
>>
>> Add a sysfs file in topology (node/x/memory_banks/X/used_memory) that
>> reports the current VRAM usage for that node. It only works for GPU
>> nodes at this time.
>>
>
> As with patch 22 (perf counters), I would not expect this information
> to be included in the topology. It doesn't describe the properties of
> the device, but rather its current state.
> Oded

For example, in amdgpu, the VRAM usage is reported in the INFO IOCTL
(AMDGPU_INFO_VRAM_USAGE). See the function amdgpu_info_ioctl().

Thanks,

Oded
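
For comparison, a minimal userspace sketch of that amdgpu query
(assuming fd is an open render node of an amdgpu device; uses the uapi
header, error handling abbreviated):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>

static int query_vram_usage(int fd, uint64_t *usage)
{
        struct drm_amdgpu_info request;

        memset(&request, 0, sizeof(request));
        request.return_pointer = (uintptr_t)usage;
        request.return_size = sizeof(*usage);
        request.query = AMDGPU_INFO_VRAM_USAGE;

        /* On success the kernel writes the current VRAM usage,
         * in bytes, into *usage.
         */
        return ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &request);
}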


>
>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49 +++++++++++++++++++++++++++----
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>>  2 files changed, 46 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 7f0d41e..7f04038 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -186,6 +186,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>                 sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
>>  #define sysfs_show_32bit_val(buffer, value) \
>>                 sysfs_show_gen_prop(buffer, "%u\n", value)
>> +#define sysfs_show_64bit_val(buffer, value) \
>> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>>  #define sysfs_show_str_val(buffer, value) \
>>                 sysfs_show_gen_prop(buffer, "%s\n", value)
>>
>> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
>>  {
>>         ssize_t ret;
>>         struct kfd_mem_properties *mem;
>> +       uint64_t used_mem;
>>
>>         /* Making sure that the buffer is an empty string */
>>         buffer[0] = 0;
>>
>> -       mem = container_of(attr, struct kfd_mem_properties, attr);
>> +       if (strcmp(attr->name, "used_memory") == 0) {
>> +               mem = container_of(attr, struct kfd_mem_properties, attr_used);
>> +               if (mem->gpu) {
>> +                       used_mem = mem->gpu->kfd2kgd->get_vram_usage(
>> +                                                               mem->gpu->kgd);
>> +                       return sysfs_show_64bit_val(buffer, used_mem);
>> +               }
>> +               /* TODO: Report APU/CPU-allocated memory; For now return 0 */
>> +               return 0;
>> +       }
>> +
>> +       mem = container_of(attr, struct kfd_mem_properties, attr_props);
>>         sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
>>         sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
>>         sysfs_show_32bit_prop(buffer, "flags", mem->flags);
>> @@ -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>         if (dev->kobj_mem) {
>>                 list_for_each_entry(mem, &dev->mem_props, list)
>>                         if (mem->kobj) {
>> -                               kfd_remove_sysfs_file(mem->kobj, &mem->attr);
>> +                               /* TODO: Remove when CPU/APU supported */
>> +                               if (dev->node_props.cpu_cores_count == 0)
>> +                                       sysfs_remove_file(mem->kobj,
>> +                                                       &mem->attr_used);
>> +                               kfd_remove_sysfs_file(mem->kobj,
>> +                                               &mem->attr_props);
>>                                 mem->kobj = NULL;
>>                         }
>>                 kobject_del(dev->kobj_mem);
>> @@ -629,12 +648,23 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>                 if (ret < 0)
>>                         return ret;
>>
>> -               mem->attr.name = "properties";
>> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
>> -               sysfs_attr_init(&mem->attr);
>> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
>> +               mem->attr_props.name = "properties";
>> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
>> +               sysfs_attr_init(&mem->attr_props);
>> +               ret = sysfs_create_file(mem->kobj, &mem->attr_props);
>>                 if (ret < 0)
>>                         return ret;
>> +
>> +               /* TODO: Support APU/CPU memory usage */
>> +               if (dev->node_props.cpu_cores_count == 0) {
>> +                       mem->attr_used.name = "used_memory";
>> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
>> +                       sysfs_attr_init(&mem->attr_used);
>> +                       ret = sysfs_create_file(mem->kobj, &mem->attr_used);
>> +                       if (ret < 0)
>> +                               return ret;
>> +               }
>> +
>>                 i++;
>>         }
>>
>> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
>>  {
>>         struct kfd_topology_device *dev;
>>         struct kfd_topology_device *out_dev = NULL;
>> +       struct kfd_mem_properties *mem;
>>
>>         down_write(&topology_lock);
>>         list_for_each_entry(dev, &topology_device_list, list)
>>                 if (!dev->gpu && (dev->node_props.simd_count > 0)) {
>>                         dev->gpu = gpu;
>>                         out_dev = dev;
>> +
>> +                       /* Assign mem->gpu */
>> +                       list_for_each_entry(mem, &dev->mem_props, list)
>> +                               mem->gpu = dev->gpu;
>> +
>>                         break;
>>                 }
>>         up_write(&topology_lock);
>> +
>>         return out_dev;
>>  }
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index 53fca1f..0f698d8 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>>         uint32_t                width;
>>         uint32_t                mem_clk_max;
>>         struct kobject          *kobj;
>> -       struct attribute        attr;
>> +       struct kfd_dev          *gpu;
>> +       struct attribute        attr_props;
>> +       struct attribute        attr_used;
>>  };
>>
>>  #define HSA_CACHE_TYPE_DATA            0x00000001
>> --
>> 2.7.4
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]             ` <CADnq5_OAhLbnA+07J+T+V82=Q0SZwB4r8zUv7D+KDXrqV==+GQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-11 16:45               ` Oded Gabbay
  0 siblings, 0 replies; 62+ messages in thread
From: Oded Gabbay @ 2017-12-11 16:45 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Amber Lin, Felix Kuehling, amd-gfx list, Kent Russell,
	Alex Deucher, Christian König

On Mon, Dec 11, 2017 at 5:47 PM, Alex Deucher <alexdeucher@gmail.com> wrote:
> On Mon, Dec 11, 2017 at 10:23 AM, Oded Gabbay <oded.gabbay@gmail.com> wrote:
>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>>> From: Amber Lin <Amber.Lin@amd.com>
>>>
>>> For hardware blocks whose performance counters are accessed via MMIO
>>> registers, KFD provides support for those privileged blocks. IOMMU is
>>> one of those privileged blocks. Most performance counter properties
>>> required by the Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>>> This patch adds properties to the topology in KFD sysfs for information not
>>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/, formatted as
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, e.g.
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>>> For dGPUs, which don't have an IOMMU, nothing appears under
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
>>
>> I don't feel comfortable with this patch. It seems to me you didn't
>> have anywhere to put these counters so you just stuck them in a place
>> the thunk already reads because it was "convenient" for you to do it.
>> But, as you point out in a comment later, these counters have nothing
>> to do with topology.
>> So this just feels wrong and I would like to:
>>
>> a. Get additional opinions on it. Christian? Alex? What do you think?
>> How are the GPU's GFX counters exposed?
>
> They are handled as part of the command stream and exposed via OpenGL
> extensions.
>
>> b. Ask why not use an IOCTL to get the counters?
>
> I'm not sure that is really any better.

I imagined it should be part of some INFO IOCTL, similar to the one in amdgpu.
Btw, there is another patch about adding VRAM usage to sysfs, which I
also don't like, because this type of info is reported in
amdgpu_info_ioctl().
I think there might be a place for an amdkfd_info_ioctl.

Thanks,
Oded
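
To sketch what such an interface could look like (purely hypothetical;
none of these names exist in the kernel, and the layout is modeled
loosely on struct drm_amdgpu_info):

#include <linux/types.h>

/* Hypothetical query id */
#define KFD_IOC_INFO_VRAM_USAGE 0x1

/* Hypothetical argument struct for an amdkfd INFO ioctl */
struct kfd_ioctl_info_args {
        __u64 return_pointer;   /* user buffer for the result */
        __u32 return_size;      /* size of that buffer */
        __u32 query;            /* e.g. KFD_IOC_INFO_VRAM_USAGE */
        __u32 gpu_id;           /* topology node the query refers to */
        __u32 pad;
};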

>
>>
>> btw, I tried to search for other drivers that do this (expose perf
>> counters in sysfs) and didn't find any (it wasn't an exhaustive search
>> so I may have missed some).
>
> There is the perf subsystem, but I'm not sure how well it works for
> device counters and tying the sampling to events and such.
>
> Alex
>
>>
>> Thanks,
>> Oded
>>
>>
>>
>>
>>>
>>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>>  2 files changed, 127 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 7fe7ee0..52d20f5 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>         struct kfd_mem_properties *mem;
>>>         struct kfd_cache_properties *cache;
>>>         struct kfd_iolink_properties *iolink;
>>> +       struct kfd_perf_properties *perf;
>>>
>>>         list_del(&dev->list);
>>>
>>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>                 kfree(iolink);
>>>         }
>>>
>>> +       while (dev->perf_props.next != &dev->perf_props) {
>>> +               perf = container_of(dev->perf_props.next,
>>> +                               struct kfd_perf_properties, list);
>>> +               list_del(&perf->list);
>>> +               kfree(perf);
>>> +       }
>>> +
>>>         kfree(dev);
>>>  }
>>>
>>> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>         INIT_LIST_HEAD(&dev->mem_props);
>>>         INIT_LIST_HEAD(&dev->cache_props);
>>>         INIT_LIST_HEAD(&dev->io_link_props);
>>> +       INIT_LIST_HEAD(&dev->perf_props);
>>>
>>>         list_add_tail(&dev->list, device_list);
>>>
>>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>>         .sysfs_ops = &cache_ops,
>>>  };
>>>
>>> +/****** Sysfs of Performance Counters ******/
>>> +
>>> +struct kfd_perf_attr {
>>> +       struct kobj_attribute attr;
>>> +       uint32_t data;
>>> +};
>>> +
>>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
>>> +                       char *buf)
>>> +{
>>> +       struct kfd_perf_attr *attr;
>>> +
>>> +       buf[0] = 0;
>>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>>> +       if (!attr->data) /* invalid data for PMC */
>>> +               return 0;
>>> +       else
>>> +               return sysfs_show_32bit_val(buf, attr->data);
>>> +}
>>> +
>>> +#define KFD_PERF_DESC(_name, _data)                    \
>>> +{                                                      \
>>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>>> +       .data = _data,                                  \
>>> +}
>>> +
>>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>>> +       KFD_PERF_DESC(max_concurrent, 0),
>>> +       KFD_PERF_DESC(num_counters, 0),
>>> +       KFD_PERF_DESC(counter_ids, 0),
>>> +};
>>> +/****************************************/
>>> +
>>>  static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>>                 char *buffer)
>>>  {
>>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>         struct kfd_iolink_properties *iolink;
>>>         struct kfd_cache_properties *cache;
>>>         struct kfd_mem_properties *mem;
>>> +       struct kfd_perf_properties *perf;
>>>
>>>         if (dev->kobj_iolink) {
>>>                 list_for_each_entry(iolink, &dev->io_link_props, list)
>>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>                 dev->kobj_mem = NULL;
>>>         }
>>>
>>> +       if (dev->kobj_perf) {
>>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>>> +                       kfree(perf->attr_group);
>>> +                       perf->attr_group = NULL;
>>> +               }
>>> +               kobject_del(dev->kobj_perf);
>>> +               kobject_put(dev->kobj_perf);
>>> +               dev->kobj_perf = NULL;
>>> +       }
>>> +
>>>         if (dev->kobj_node) {
>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>         struct kfd_iolink_properties *iolink;
>>>         struct kfd_cache_properties *cache;
>>>         struct kfd_mem_properties *mem;
>>> +       struct kfd_perf_properties *perf;
>>>         int ret;
>>> -       uint32_t i;
>>> +       uint32_t i, num_attrs;
>>> +       struct attribute **attrs;
>>>
>>>         if (WARN_ON(dev->kobj_node))
>>>                 return -EEXIST;
>>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>         if (!dev->kobj_iolink)
>>>                 return -ENOMEM;
>>>
>>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>>> +       if (!dev->kobj_perf)
>>> +               return -ENOMEM;
>>> +
>>>         /*
>>>          * Creating sysfs files for node properties
>>>          */
>>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>                 if (ret < 0)
>>>                         return ret;
>>>                 i++;
>>> -}
>>> +       }
>>> +
>>> +       /* All hardware blocks have the same number of attributes. */
>>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>>> +                       * num_attrs + sizeof(struct attribute_group),
>>> +                       GFP_KERNEL);
>>> +               if (!perf->attr_group)
>>> +                       return -ENOMEM;
>>> +
>>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>>> +               if (!strcmp(perf->block_name, "iommu")) {
>>> +               /* Information of IOMMU's num_counters and counter_ids is shown
>>> +                * under /sys/bus/event_source/devices/amd_iommu. We don't
>>> +                * duplicate here.
>>> +                */
>>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>>> +                       for (i = 0; i < num_attrs; i++)
>>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>>> +               }
>>> +               perf->attr_group->name = perf->block_name;
>>> +               perf->attr_group->attrs = attrs;
>>> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
>>> +               if (ret < 0)
>>> +                       return ret;
>>> +       }
>>>
>>>         return 0;
>>>  }
>>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>>>                 }
>>>         }
>>>  }
>>> +
>>> +/*
>>> + * Performance counters information is not part of CRAT but we would like to
>>> + * put them in the sysfs under topology directory for Thunk to get the data.
>>> + * This function is called before updating the sysfs.
>>> + */
>>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>> +{
>>> +       struct kfd_perf_properties *props;
>>> +
>>> +       if (amd_iommu_pc_supported()) {
>>> +               props = kfd_alloc_struct(props);
>>> +               if (!props)
>>> +                       return -ENOMEM;
>>> +               strcpy(props->block_name, "iommu");
>>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>>> +               list_add_tail(&props->list, &kdev->perf_props);
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>>  /* kfd_add_non_crat_information - Add information that is not currently
>>>   *     defined in CRAT but is necessary for KFD topology
>>>   * @dev - topology device to which addition info is added
>>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>>                 }
>>>         }
>>>
>>> +       kdev = list_first_entry(&temp_topology_device_list,
>>> +                               struct kfd_topology_device, list);
>>> +       kfd_add_perf_to_topology(kdev);
>>> +
>>>         down_write(&topology_lock);
>>>         kfd_topology_update_device_list(&temp_topology_device_list,
>>>                                         &topology_device_list);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> index 55de56f..b9f3142 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>>         struct attribute        attr;
>>>  };
>>>
>>> +struct kfd_perf_properties {
>>> +       struct list_head        list;
>>> +       char                    block_name[16];
>>> +       uint32_t                max_concurrent;
>>> +       struct attribute_group  *attr_group;
>>> +};
>>> +
>>>  struct kfd_topology_device {
>>>         struct list_head                list;
>>>         uint32_t                        gpu_id;
>>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>>         struct list_head                cache_props;
>>>         uint32_t                        io_link_count;
>>>         struct list_head                io_link_props;
>>> +       struct list_head                perf_props;
>>>         struct kfd_dev                  *gpu;
>>>         struct kobject                  *kobj_node;
>>>         struct kobject                  *kobj_mem;
>>>         struct kobject                  *kobj_cache;
>>>         struct kobject                  *kobj_iolink;
>>> +       struct kobject                  *kobj_perf;
>>>         struct attribute                attr_gpuid;
>>>         struct attribute                attr_name;
>>>         struct attribute                attr_props;
>>> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>                 struct list_head *device_list);
>>>  void kfd_release_topology_device_list(struct list_head *device_list);
>>>
>>> +extern bool amd_iommu_pc_supported(void);
>>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>> +
>>>  #endif /* __KFD_TOPOLOGY_H__ */
>>> --
>>> 2.7.4
>>>

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]                 ` <CAFCwf10FhChovT58CFzb0LEVGG4jZBocdbb_57iSRj9GnenGyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-11 17:26                   ` Christian König
  0 siblings, 0 replies; 62+ messages in thread
From: Christian König @ 2017-12-11 17:26 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Alex Deucher, Amber Lin, Felix Kuehling, Kent Russell, amd-gfx list

On 11.12.2017 at 17:32, Oded Gabbay wrote:
> On Mon, Dec 11, 2017 at 5:53 PM, Christian König
> <christian.koenig@amd.com> wrote:
>> On 11.12.2017 at 16:23, Oded Gabbay wrote:
>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com>
>>> wrote:
>>>> From: Amber Lin <Amber.Lin@amd.com>
>>>>
>>>> For hardware blocks whose performance counters are accessed via MMIO
>>>> registers, KFD provides the support for those privileged blocks. IOMMU is
>>>> one of those privileged blocks. Most performance counter properties
>>>> required by Thunk are available at
>>>> /sys/bus/event_source/devices/amd_iommu.
>>>>    This patch adds properties to topology in KFD sysfs for information not
>>>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/ formatted as
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>,
>>>> i.e.
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>>>> For dGPUs, which don't have an IOMMU, nothing appears under
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
>>> I don't feel comfortable with this patch. It seems to me you didn't
>>> have anywhere to put these counters so you just stuck them in a place
>>> the thunk already reads because it was "convenient" for you to do it.
>>> But, as you point out in a comment later, these counters have nothing
>>> to do with topology.
>>> So this just feels wrong and I would like to:
>>>
>>> a. get additional opinions on it. Christian? Alex? What do you think?
>>
>> I agree that this looks odd.
>>
>> But the fact that any device-specific information shows up outside of
>> /sys/devices/pci0000:00... is strange to start with.
>>
>> Please don't tell me that we have built up a secondary topology parallel
>> to the Linux device tree here?
> I hate to disappoint but the answer is that we do have a secondary
> topology dedicated to kfd under
> /sys/devices/virtual/kfd/kfd/topology/nodes/X
>
> This is part of the original design of the driver and it wasn't
> trivial to upstream, but at the end of the day it got upstreamed.
> I think the base argument was (and still is) that we expose a single
> char-dev for ALL the GPUs/APUs that are present in the system, and
> therefore, the thunk layer should get that information in one single
> place under the kfd driver folder in sysfs.
> I guess we could have done things differently, but that would have
> required more "integration" with the gfx driver, which some people
> at the time thought would be more difficult than writing our own
> implementation.
> Anyway, that's all in the past and I doubt it will change now
> unless/until amdkfd is abolished and all functionality moves to
> amdgpu.

Well the requirement that KFD should be able to discover its devices 
actually makes sense.

But we should probably make 
/sys/devices/virtual/kfd/kfd/topology/nodes/X just a symlink to the 
device directory in sysfs like everybody else does.

If that isn't possible because of file name clashes we should at least 
point it to a subdirectory.
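
A minimal sketch of that idea, assuming the PCI device is reachable as
dev->gpu->pdev for GPU nodes (kfd_link_node_to_device is a hypothetical
helper, not part of this series):

#include <linux/kernel.h>
#include <linux/pci.h>
#include <linux/sysfs.h>
#include "kfd_priv.h"
#include "kfd_topology.h"

/* Sketch only: expose nodes/<id> as a symlink to the real PCI device
 * directory instead of a parallel kobject hierarchy. CPU-only nodes
 * have no PCI device, so they would still need their own directory.
 */
static int kfd_link_node_to_device(struct kobject *nodes_kobj,
				   struct kfd_topology_device *dev,
				   int node_id)
{
	char name[16];

	if (!dev->gpu || !dev->gpu->pdev)
		return -ENODEV;

	snprintf(name, sizeof(name), "%d", node_id);
	/* creates nodes/<id> -> .../pci0000:00/.../0000:03:00.0 */
	return sysfs_create_link(nodes_kobj, &dev->gpu->pdev->dev.kobj,
				 name);
}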

Regards,
Christian.

>
> Thanks,
> Oded
>> In other words, for my Vega10 the real subdirectory for any device-specific
>> config is:
>> ./devices/pci0000:00/0000:00:02.1/0000:01:00.0/0000:02:00.0/0000:03:00.0
>>
>> With the following files as symlinks to it:
>> ./bus/pci/devices/0000:03:00.0
>> ./bus/pci/drivers/amdgpu/0000:03:00.0
>>
>>>    How are the GPU's GFX counters exposed?
>>
>> We discussed that internally but haven't decided on anything AFAIK.
>>
>>> b. Ask why not use an IOCTL to get the counters?
>>
>> Per process counters are directly readable inside the command submission
>> affecting them.
>>
>> Only the timer tick is exposed as IOCTL as well IIRC.
>>
>> Regards,
>> Christian.
>>
>>
>>> btw, I tried to search for other drivers that do this (expose perf
>>> counters in sysfs) and didn't find any (it wasn't an exhaustive search
>>> so I may have missed some).
>>>
>>> Thanks,
>>> Oded
>>>
>>>
>>>
>>>
>>>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116
>>>> +++++++++++++++++++++++++++++-
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>>>    2 files changed, 127 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> index 7fe7ee0..52d20f5 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct
>>>> kfd_topology_device *dev)
>>>>           struct kfd_mem_properties *mem;
>>>>           struct kfd_cache_properties *cache;
>>>>           struct kfd_iolink_properties *iolink;
>>>> +       struct kfd_perf_properties *perf;
>>>>
>>>>           list_del(&dev->list);
>>>>
>>>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct
>>>> kfd_topology_device *dev)
>>>>                   kfree(iolink);
>>>>           }
>>>>
>>>> +       while (dev->perf_props.next != &dev->perf_props) {
>>>> +               perf = container_of(dev->perf_props.next,
>>>> +                               struct kfd_perf_properties, list);
>>>> +               list_del(&perf->list);
>>>> +               kfree(perf);
>>>> +       }
>>>> +
>>>>           kfree(dev);
>>>>    }
>>>>
>>>> @@ -162,6 +170,7 @@ struct kfd_topology_device
>>>> *kfd_create_topology_device(
>>>>           INIT_LIST_HEAD(&dev->mem_props);
>>>>           INIT_LIST_HEAD(&dev->cache_props);
>>>>           INIT_LIST_HEAD(&dev->io_link_props);
>>>> +       INIT_LIST_HEAD(&dev->perf_props);
>>>>
>>>>           list_add_tail(&dev->list, device_list);
>>>>
>>>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>>>           .sysfs_ops = &cache_ops,
>>>>    };
>>>>
>>>> +/****** Sysfs of Performance Counters ******/
>>>> +
>>>> +struct kfd_perf_attr {
>>>> +       struct kobj_attribute attr;
>>>> +       uint32_t data;
>>>> +};
>>>> +
>>>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute
>>>> *attrs,
>>>> +                       char *buf)
>>>> +{
>>>> +       struct kfd_perf_attr *attr;
>>>> +
>>>> +       buf[0] = 0;
>>>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>>>> +       if (!attr->data) /* invalid data for PMC */
>>>> +               return 0;
>>>> +       else
>>>> +               return sysfs_show_32bit_val(buf, attr->data);
>>>> +}
>>>> +
>>>> +#define KFD_PERF_DESC(_name, _data)                    \
>>>> +{                                                      \
>>>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>>>> +       .data = _data,                                  \
>>>> +}
>>>> +
>>>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>>>> +       KFD_PERF_DESC(max_concurrent, 0),
>>>> +       KFD_PERF_DESC(num_counters, 0),
>>>> +       KFD_PERF_DESC(counter_ids, 0),
>>>> +};
>>>> +/****************************************/
>>>> +
>>>>    static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>>>                   char *buffer)
>>>>    {
>>>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct
>>>> kfd_topology_device *dev)
>>>>           struct kfd_iolink_properties *iolink;
>>>>           struct kfd_cache_properties *cache;
>>>>           struct kfd_mem_properties *mem;
>>>> +       struct kfd_perf_properties *perf;
>>>>
>>>>           if (dev->kobj_iolink) {
>>>>                   list_for_each_entry(iolink, &dev->io_link_props, list)
>>>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct
>>>> kfd_topology_device *dev)
>>>>                   dev->kobj_mem = NULL;
>>>>           }
>>>>
>>>> +       if (dev->kobj_perf) {
>>>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>>>> +                       kfree(perf->attr_group);
>>>> +                       perf->attr_group = NULL;
>>>> +               }
>>>> +               kobject_del(dev->kobj_perf);
>>>> +               kobject_put(dev->kobj_perf);
>>>> +               dev->kobj_perf = NULL;
>>>> +       }
>>>> +
>>>>           if (dev->kobj_node) {
>>>>                   sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>>>                   sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>>>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct
>>>> kfd_topology_device *dev,
>>>>           struct kfd_iolink_properties *iolink;
>>>>           struct kfd_cache_properties *cache;
>>>>           struct kfd_mem_properties *mem;
>>>> +       struct kfd_perf_properties *perf;
>>>>           int ret;
>>>> -       uint32_t i;
>>>> +       uint32_t i, num_attrs;
>>>> +       struct attribute **attrs;
>>>>
>>>>           if (WARN_ON(dev->kobj_node))
>>>>                   return -EEXIST;
>>>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct
>>>> kfd_topology_device *dev,
>>>>           if (!dev->kobj_iolink)
>>>>                   return -ENOMEM;
>>>>
>>>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>>>> +       if (!dev->kobj_perf)
>>>> +               return -ENOMEM;
>>>> +
>>>>           /*
>>>>            * Creating sysfs files for node properties
>>>>            */
>>>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct
>>>> kfd_topology_device *dev,
>>>>                   if (ret < 0)
>>>>                           return ret;
>>>>                   i++;
>>>> -}
>>>> +       }
>>>> +
>>>> +       /* All hardware blocks have the same number of attributes. */
>>>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>>>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>>>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>>>> +                       * num_attrs + sizeof(struct attribute_group),
>>>> +                       GFP_KERNEL);
>>>> +               if (!perf->attr_group)
>>>> +                       return -ENOMEM;
>>>> +
>>>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>>>> +               if (!strcmp(perf->block_name, "iommu")) {
>>>> +               /* Information of IOMMU's num_counters and counter_ids is
>>>> shown
>>>> +                * under /sys/bus/event_source/devices/amd_iommu. We
>>>> don't
>>>> +                * duplicate here.
>>>> +                */
>>>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>>>> +                       for (i = 0; i < num_attrs; i++)
>>>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>>>> +               }
>>>> +               perf->attr_group->name = perf->block_name;
>>>> +               perf->attr_group->attrs = attrs;
>>>> +               ret = sysfs_create_group(dev->kobj_perf,
>>>> perf->attr_group);
>>>> +               if (ret < 0)
>>>> +                       return ret;
>>>> +       }
>>>>
>>>>           return 0;
>>>>    }
>>>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct
>>>> dmi_header *dm,
>>>>                   }
>>>>           }
>>>>    }
>>>> +
>>>> +/*
>>>> + * Performance counters information is not part of CRAT but we would
>>>> like to
>>>> + * put them in the sysfs under topology directory for Thunk to get the
>>>> data.
>>>> + * This function is called before updating the sysfs.
>>>> + */
>>>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>>> +{
>>>> +       struct kfd_perf_properties *props;
>>>> +
>>>> +       if (amd_iommu_pc_supported()) {
>>>> +               props = kfd_alloc_struct(props);
>>>> +               if (!props)
>>>> +                       return -ENOMEM;
>>>> +               strcpy(props->block_name, "iommu");
>>>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>>> +                       amd_iommu_pc_get_max_counters(0); /* assume one
>>>> iommu */
>>>> +               list_add_tail(&props->list, &kdev->perf_props);
>>>> +       }
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>>    /* kfd_add_non_crat_information - Add information that is not currently
>>>>     *     defined in CRAT but is necessary for KFD topology
>>>>     * @dev - topology device to which addition info is added
>>>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>>>                   }
>>>>           }
>>>>
>>>> +       kdev = list_first_entry(&temp_topology_device_list,
>>>> +                               struct kfd_topology_device, list);
>>>> +       kfd_add_perf_to_topology(kdev);
>>>> +
>>>>           down_write(&topology_lock);
>>>>           kfd_topology_update_device_list(&temp_topology_device_list,
>>>>                                           &topology_device_list);
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> index 55de56f..b9f3142 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>>>           struct attribute        attr;
>>>>    };
>>>>
>>>> +struct kfd_perf_properties {
>>>> +       struct list_head        list;
>>>> +       char                    block_name[16];
>>>> +       uint32_t                max_concurrent;
>>>> +       struct attribute_group  *attr_group;
>>>> +};
>>>> +
>>>>    struct kfd_topology_device {
>>>>           struct list_head                list;
>>>>           uint32_t                        gpu_id;
>>>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>>>           struct list_head                cache_props;
>>>>           uint32_t                        io_link_count;
>>>>           struct list_head                io_link_props;
>>>> +       struct list_head                perf_props;
>>>>           struct kfd_dev                  *gpu;
>>>>           struct kobject                  *kobj_node;
>>>>           struct kobject                  *kobj_mem;
>>>>           struct kobject                  *kobj_cache;
>>>>           struct kobject                  *kobj_iolink;
>>>> +       struct kobject                  *kobj_perf;
>>>>           struct attribute                attr_gpuid;
>>>>           struct attribute                attr_name;
>>>>           struct attribute                attr_props;
>>>> @@ -173,4 +182,8 @@ struct kfd_topology_device
>>>> *kfd_create_topology_device(
>>>>                   struct list_head *device_list);
>>>>    void kfd_release_topology_device_list(struct list_head *device_list);
>>>>
>>>> +extern bool amd_iommu_pc_supported(void);
>>>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>>> +
>>>>    #endif /* __KFD_TOPOLOGY_H__ */
>>>> --
>>>> 2.7.4
>>>>


* Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]             ` <CAFCwf129dU00ocL=btDWh0bjHVaX-JsqWcwAr1uLpJmvj1gkKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-11 17:28               ` Christian König
       [not found]                 ` <64ce9318-7aa1-d654-7d41-e24cb7ad056a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Christian König @ 2017-12-11 17:28 UTC (permalink / raw)
  To: Oded Gabbay, Felix Kuehling; +Cc: Kent Russell, amd-gfx list

On 11.12.2017 at 17:40, Oded Gabbay wrote:
> On Mon, Dec 11, 2017 at 5:32 PM, Oded Gabbay <oded.gabbay@gmail.com> wrote:
>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>>> From: Kent Russell <kent.russell@amd.com>
>>>
>>> Add a sysfs file in topology (node/x/memory_banks/X/used_memory) that
>>> reports the current VRAM usage for that node. Only works for GPU nodes
>>> at this time.
>>>
>> As with patch 22 (perf counters), I would not expect this information
>> to be included in the topology. It doesn't describe the properties of
>> the device, but its current state.
>> Oded
> For example, in amdgpu, the VRAM usage is reported in the INFO IOCTL
> (AMDGPU_INFO_VRAM_USAGE). See function amdgpu_info_ioctl().

Yep, completely agree.

That stuff is runtime properties, not static attributes, configuration,
or setup.

So either debugfs or an IOCTL are the two best options as far as I can see.
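
As a point of reference, a minimal user-space sketch of the IOCTL route
Oded mentioned, using the existing AMDGPU_INFO_VRAM_USAGE query (the
render node path is an assumption and varies per system):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <libdrm/amdgpu_drm.h> /* or <amdgpu_drm.h> via pkg-config libdrm */

int main(void)
{
	struct drm_amdgpu_info request;
	uint64_t vram_usage = 0;
	int fd = open("/dev/dri/renderD128", O_RDWR); /* assumed node */

	if (fd < 0)
		return 1;

	memset(&request, 0, sizeof(request));
	/* The kernel writes the result through this user pointer. */
	request.return_pointer = (uintptr_t)&vram_usage;
	request.return_size = sizeof(vram_usage);
	request.query = AMDGPU_INFO_VRAM_USAGE;

	if (ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &request) == 0)
		printf("VRAM used: %llu bytes\n",
		       (unsigned long long)vram_usage);

	close(fd);
	return 0;
}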

Christian.

>
> Thanks,
>
> Oded
>
>
>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49 +++++++++++++++++++++++++++----
>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>>>   2 files changed, 46 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 7f0d41e..7f04038 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -186,6 +186,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>                  sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
>>>   #define sysfs_show_32bit_val(buffer, value) \
>>>                  sysfs_show_gen_prop(buffer, "%u\n", value)
>>> +#define sysfs_show_64bit_val(buffer, value) \
>>> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>>>   #define sysfs_show_str_val(buffer, value) \
>>>                  sysfs_show_gen_prop(buffer, "%s\n", value)
>>>
>>> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
>>>   {
>>>          ssize_t ret;
>>>          struct kfd_mem_properties *mem;
>>> +       uint64_t used_mem;
>>>
>>>          /* Making sure that the buffer is an empty string */
>>>          buffer[0] = 0;
>>>
>>> -       mem = container_of(attr, struct kfd_mem_properties, attr);
>>> +       if (strcmp(attr->name, "used_memory") == 0) {
>>> +               mem = container_of(attr, struct kfd_mem_properties, attr_used);
>>> +               if (mem->gpu) {
>>> +                       used_mem = mem->gpu->kfd2kgd->get_vram_usage(
>>> +                                                               mem->gpu->kgd);
>>> +                       return sysfs_show_64bit_val(buffer, used_mem);
>>> +               }
>>> +               /* TODO: Report APU/CPU-allocated memory; For now return 0 */
>>> +               return 0;
>>> +       }
>>> +
>>> +       mem = container_of(attr, struct kfd_mem_properties, attr_props);
>>>          sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
>>>          sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
>>>          sysfs_show_32bit_prop(buffer, "flags", mem->flags);
>>> @@ -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>          if (dev->kobj_mem) {
>>>                  list_for_each_entry(mem, &dev->mem_props, list)
>>>                          if (mem->kobj) {
>>> -                               kfd_remove_sysfs_file(mem->kobj, &mem->attr);
>>> +                               /* TODO: Remove when CPU/APU supported */
>>> +                               if (dev->node_props.cpu_cores_count == 0)
>>> +                                       sysfs_remove_file(mem->kobj,
>>> +                                                       &mem->attr_used);
>>> +                               kfd_remove_sysfs_file(mem->kobj,
>>> +                                               &mem->attr_props);
>>>                                  mem->kobj = NULL;
>>>                          }
>>>                  kobject_del(dev->kobj_mem);
>>> @@ -629,12 +648,23 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>                  if (ret < 0)
>>>                          return ret;
>>>
>>> -               mem->attr.name = "properties";
>>> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
>>> -               sysfs_attr_init(&mem->attr);
>>> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
>>> +               mem->attr_props.name = "properties";
>>> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
>>> +               sysfs_attr_init(&mem->attr_props);
>>> +               ret = sysfs_create_file(mem->kobj, &mem->attr_props);
>>>                  if (ret < 0)
>>>                          return ret;
>>> +
>>> +               /* TODO: Support APU/CPU memory usage */
>>> +               if (dev->node_props.cpu_cores_count == 0) {
>>> +                       mem->attr_used.name = "used_memory";
>>> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
>>> +                       sysfs_attr_init(&mem->attr_used);
>>> +                       ret = sysfs_create_file(mem->kobj, &mem->attr_used);
>>> +                       if (ret < 0)
>>> +                               return ret;
>>> +               }
>>> +
>>>                  i++;
>>>          }
>>>
>>> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
>>>   {
>>>          struct kfd_topology_device *dev;
>>>          struct kfd_topology_device *out_dev = NULL;
>>> +       struct kfd_mem_properties *mem;
>>>
>>>          down_write(&topology_lock);
>>>          list_for_each_entry(dev, &topology_device_list, list)
>>>                  if (!dev->gpu && (dev->node_props.simd_count > 0)) {
>>>                          dev->gpu = gpu;
>>>                          out_dev = dev;
>>> +
>>> +                       /* Assign mem->gpu */
>>> +                       list_for_each_entry(mem, &dev->mem_props, list)
>>> +                               mem->gpu = dev->gpu;
>>> +
>>>                          break;
>>>                  }
>>>          up_write(&topology_lock);
>>> +
>>>          return out_dev;
>>>   }
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> index 53fca1f..0f698d8 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>>>          uint32_t                width;
>>>          uint32_t                mem_clk_max;
>>>          struct kobject          *kobj;
>>> -       struct attribute        attr;
>>> +       struct kfd_dev          *gpu;
>>> +       struct attribute        attr_props;
>>> +       struct attribute        attr_used;
>>>   };
>>>
>>>   #define HSA_CACHE_TYPE_DATA            0x00000001
>>> --
>>> 2.7.4
>>>

* Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]                 ` <64ce9318-7aa1-d654-7d41-e24cb7ad056a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-11 19:52                   ` Felix Kuehling
       [not found]                     ` <71874366-c697-8e90-de59-4e5f1d4f797b-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2017-12-11 19:52 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, Oded Gabbay; +Cc: Kent Russell, amd-gfx list

On 2017-12-11 12:28 PM, Christian König wrote:
> On 11.12.2017 at 17:40, Oded Gabbay wrote:
>> On Mon, Dec 11, 2017 at 5:32 PM, Oded Gabbay <oded.gabbay@gmail.com>
>> wrote:
>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling
>>> <Felix.Kuehling@amd.com> wrote:
>>>> From: Kent Russell <kent.russell@amd.com>
>>>>
>>>> Add a sysfs file in topology (node/x/memory_banks/X/used_memory) that
>>>> reports the current VRAM usage for that node. Only works for GPU nodes
>>>> at this time.
>>>>
>>> As with patch 22 (perf counters), I would not expect this information
>>> to be included in the topology. It doesn't describe the properties of
>>> the device, but its current state.
>>> Oded
>> For example, in amdgpu, the VRAM usage is reported in the INFO IOCTL
>> (AMDGPU_INFO_VRAM_USAGE). See function amdgpu_info_ioctl().
>
> Yep, completely agree.
>
> That stuff is runtime properties, not static attributes, configuration,
> or setup.
>
> So either debugfs or an IOCTL are the two best options as far as I can see.

Right. I admit, this feature was a bit of a hack to quickly enable the
HIP team without having to change a bunch of interfaces (ioctls, Thunk,
and Runtime).

This patch isn't critical for enabling dGPU support. I'll drop it for
now and we can reimplement it properly later.

Regards,
  Felix

>
> Christian.
>
>>
>> Thanks,
>>
>> Oded
>>
>>
>>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49
>>>> +++++++++++++++++++++++++++----
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>>>>   2 files changed, 46 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> index 7f0d41e..7f04038 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -186,6 +186,8 @@ struct kfd_topology_device
>>>> *kfd_create_topology_device(
>>>>                  sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
>>>>   #define sysfs_show_32bit_val(buffer, value) \
>>>>                  sysfs_show_gen_prop(buffer, "%u\n", value)
>>>> +#define sysfs_show_64bit_val(buffer, value) \
>>>> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>>>>   #define sysfs_show_str_val(buffer, value) \
>>>>                  sysfs_show_gen_prop(buffer, "%s\n", value)
>>>>
>>>> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj,
>>>> struct attribute *attr,
>>>>   {
>>>>          ssize_t ret;
>>>>          struct kfd_mem_properties *mem;
>>>> +       uint64_t used_mem;
>>>>
>>>>          /* Making sure that the buffer is an empty string */
>>>>          buffer[0] = 0;
>>>>
>>>> -       mem = container_of(attr, struct kfd_mem_properties, attr);
>>>> +       if (strcmp(attr->name, "used_memory") == 0) {
>>>> +               mem = container_of(attr, struct kfd_mem_properties,
>>>> attr_used);
>>>> +               if (mem->gpu) {
>>>> +                       used_mem = mem->gpu->kfd2kgd->get_vram_usage(
>>>> +                                                              
>>>> mem->gpu->kgd);
>>>> +                       return sysfs_show_64bit_val(buffer, used_mem);
>>>> +               }
>>>> +               /* TODO: Report APU/CPU-allocated memory; For now
>>>> return 0 */
>>>> +               return 0;
>>>> +       }
>>>> +
>>>> +       mem = container_of(attr, struct kfd_mem_properties,
>>>> attr_props);
>>>>          sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
>>>>          sysfs_show_64bit_prop(buffer, "size_in_bytes",
>>>> mem->size_in_bytes);
>>>>          sysfs_show_32bit_prop(buffer, "flags", mem->flags);
>>>> @@ -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct
>>>> kfd_topology_device *dev)
>>>>          if (dev->kobj_mem) {
>>>>                  list_for_each_entry(mem, &dev->mem_props, list)
>>>>                          if (mem->kobj) {
>>>> -                               kfd_remove_sysfs_file(mem->kobj,
>>>> &mem->attr);
>>>> +                               /* TODO: Remove when CPU/APU
>>>> supported */
>>>> +                               if (dev->node_props.cpu_cores_count
>>>> == 0)
>>>> +                                       sysfs_remove_file(mem->kobj,
>>>> +                                                      
>>>> &mem->attr_used);
>>>> +                               kfd_remove_sysfs_file(mem->kobj,
>>>> +                                               &mem->attr_props);
>>>>                                  mem->kobj = NULL;
>>>>                          }
>>>>                  kobject_del(dev->kobj_mem);
>>>> @@ -629,12 +648,23 @@ static int kfd_build_sysfs_node_entry(struct
>>>> kfd_topology_device *dev,
>>>>                  if (ret < 0)
>>>>                          return ret;
>>>>
>>>> -               mem->attr.name = "properties";
>>>> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
>>>> -               sysfs_attr_init(&mem->attr);
>>>> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
>>>> +               mem->attr_props.name = "properties";
>>>> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
>>>> +               sysfs_attr_init(&mem->attr_props);
>>>> +               ret = sysfs_create_file(mem->kobj, &mem->attr_props);
>>>>                  if (ret < 0)
>>>>                          return ret;
>>>> +
>>>> +               /* TODO: Support APU/CPU memory usage */
>>>> +               if (dev->node_props.cpu_cores_count == 0) {
>>>> +                       mem->attr_used.name = "used_memory";
>>>> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
>>>> +                       sysfs_attr_init(&mem->attr_used);
>>>> +                       ret = sysfs_create_file(mem->kobj,
>>>> &mem->attr_used);
>>>> +                       if (ret < 0)
>>>> +                               return ret;
>>>> +               }
>>>> +
>>>>                  i++;
>>>>          }
>>>>
>>>> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device
>>>> *kfd_assign_gpu(struct kfd_dev *gpu)
>>>>   {
>>>>          struct kfd_topology_device *dev;
>>>>          struct kfd_topology_device *out_dev = NULL;
>>>> +       struct kfd_mem_properties *mem;
>>>>
>>>>          down_write(&topology_lock);
>>>>          list_for_each_entry(dev, &topology_device_list, list)
>>>>                  if (!dev->gpu && (dev->node_props.simd_count > 0)) {
>>>>                          dev->gpu = gpu;
>>>>                          out_dev = dev;
>>>> +
>>>> +                       /* Assign mem->gpu */
>>>> +                       list_for_each_entry(mem, &dev->mem_props,
>>>> list)
>>>> +                               mem->gpu = dev->gpu;
>>>> +
>>>>                          break;
>>>>                  }
>>>>          up_write(&topology_lock);
>>>> +
>>>>          return out_dev;
>>>>   }
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> index 53fca1f..0f698d8 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>>>>          uint32_t                width;
>>>>          uint32_t                mem_clk_max;
>>>>          struct kobject          *kobj;
>>>> -       struct attribute        attr;
>>>> +       struct kfd_dev          *gpu;
>>>> +       struct attribute        attr_props;
>>>> +       struct attribute        attr_used;
>>>>   };
>>>>
>>>>   #define HSA_CACHE_TYPE_DATA            0x00000001
>>>> -- 
>>>> 2.7.4
>>>>

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]         ` <CAFCwf11fXHVD+OaQ3RXFNb+wttvqZM8b=7+UfhrGw_uSoiph0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-12-11 15:47           ` Alex Deucher
  2017-12-11 15:53           ` Christian König
@ 2017-12-11 19:54           ` Felix Kuehling
       [not found]             ` <02742353-8c54-4bed-e013-f29de62a43d5-5C7GfCeVMHo@public.gmane.org>
  2 siblings, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2017-12-11 19:54 UTC (permalink / raw)
  To: Oded Gabbay, Alex Deucher, Christian König
  Cc: Amber Lin, Kent Russell, amd-gfx list

On 2017-12-11 10:23 AM, Oded Gabbay wrote:
> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>> From: Amber Lin <Amber.Lin@amd.com>
>>
>> For hardware blocks whose performance counters are accessed via MMIO
>> registers, KFD provides the support for those privileged blocks. IOMMU is
>> one of those privileged blocks. Most performance counter properties
>> required by Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>>  This patch adds properties to topology in KFD sysfs for information not
>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/ formatted as
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, i.e.
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>> For dGPUs, which don't have an IOMMU, nothing appears under
>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
> I don't feel comfortable with this patch. It seems to me you didn't
> have anywhere to put these counters so you just stuck them in a place
> the thunk already reads because it was "convenient" for you to do it.
> But, as you point out in a comment later, these counters have nothing
> to do with topology.
> So this just feels wrong and I would like to:
>
> a. get additional opinions on it. Christian? Alex? What do you think?
> How are the GPU's GFX counters exposed?
> b. Ask why not use an IOCTL to get the counters?

I see the performance counter information as similar to other information
provided in the topology, such as memory, caches, CUs, etc. That's why
it makes sense to me to report it in the topology.

If this is controversial, I can drop the patch for now. It's not
critically needed for enabling dGPU support.
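
For what it's worth, a small sketch of how a consumer like the Thunk
could pick up one of the new properties (path taken from the commit
message; node 0 being the APU node is an assumption):

#include <stdio.h>

int main(void)
{
	unsigned int max_concurrent = 0;
	FILE *f = fopen("/sys/devices/virtual/kfd/kfd/topology/"
			"nodes/0/perf/iommu/max_concurrent", "r");

	if (!f)
		return 1; /* no IOMMU perf block exposed on this node */

	if (fscanf(f, "%u", &max_concurrent) == 1)
		printf("max_concurrent: %u\n", max_concurrent);

	fclose(f);
	return 0;
}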

Regards,
  Felix

>
> btw, I tried to search for other drivers that do this (expose perf
> counters in sysfs) and didn't find any (it wasn't an exhaustive search
> so I may have missed some).
>
> Thanks,
> Oded
>
>
>
>
>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>  2 files changed, 127 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 7fe7ee0..52d20f5 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>         struct kfd_mem_properties *mem;
>>         struct kfd_cache_properties *cache;
>>         struct kfd_iolink_properties *iolink;
>> +       struct kfd_perf_properties *perf;
>>
>>         list_del(&dev->list);
>>
>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>                 kfree(iolink);
>>         }
>>
>> +       while (dev->perf_props.next != &dev->perf_props) {
>> +               perf = container_of(dev->perf_props.next,
>> +                               struct kfd_perf_properties, list);
>> +               list_del(&perf->list);
>> +               kfree(perf);
>> +       }
>> +
>>         kfree(dev);
>>  }
>>
>> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>>         INIT_LIST_HEAD(&dev->mem_props);
>>         INIT_LIST_HEAD(&dev->cache_props);
>>         INIT_LIST_HEAD(&dev->io_link_props);
>> +       INIT_LIST_HEAD(&dev->perf_props);
>>
>>         list_add_tail(&dev->list, device_list);
>>
>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>         .sysfs_ops = &cache_ops,
>>  };
>>
>> +/****** Sysfs of Performance Counters ******/
>> +
>> +struct kfd_perf_attr {
>> +       struct kobj_attribute attr;
>> +       uint32_t data;
>> +};
>> +
>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
>> +                       char *buf)
>> +{
>> +       struct kfd_perf_attr *attr;
>> +
>> +       buf[0] = 0;
>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>> +       if (!attr->data) /* invalid data for PMC */
>> +               return 0;
>> +       else
>> +               return sysfs_show_32bit_val(buf, attr->data);
>> +}
>> +
>> +#define KFD_PERF_DESC(_name, _data)                    \
>> +{                                                      \
>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>> +       .data = _data,                                  \
>> +}
>> +
>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>> +       KFD_PERF_DESC(max_concurrent, 0),
>> +       KFD_PERF_DESC(num_counters, 0),
>> +       KFD_PERF_DESC(counter_ids, 0),
>> +};
>> +/****************************************/
>> +
>>  static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>                 char *buffer)
>>  {
>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>         struct kfd_iolink_properties *iolink;
>>         struct kfd_cache_properties *cache;
>>         struct kfd_mem_properties *mem;
>> +       struct kfd_perf_properties *perf;
>>
>>         if (dev->kobj_iolink) {
>>                 list_for_each_entry(iolink, &dev->io_link_props, list)
>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>                 dev->kobj_mem = NULL;
>>         }
>>
>> +       if (dev->kobj_perf) {
>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>> +                       kfree(perf->attr_group);
>> +                       perf->attr_group = NULL;
>> +               }
>> +               kobject_del(dev->kobj_perf);
>> +               kobject_put(dev->kobj_perf);
>> +               dev->kobj_perf = NULL;
>> +       }
>> +
>>         if (dev->kobj_node) {
>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>         struct kfd_iolink_properties *iolink;
>>         struct kfd_cache_properties *cache;
>>         struct kfd_mem_properties *mem;
>> +       struct kfd_perf_properties *perf;
>>         int ret;
>> -       uint32_t i;
>> +       uint32_t i, num_attrs;
>> +       struct attribute **attrs;
>>
>>         if (WARN_ON(dev->kobj_node))
>>                 return -EEXIST;
>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>         if (!dev->kobj_iolink)
>>                 return -ENOMEM;
>>
>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>> +       if (!dev->kobj_perf)
>> +               return -ENOMEM;
>> +
>>         /*
>>          * Creating sysfs files for node properties
>>          */
>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>                 if (ret < 0)
>>                         return ret;
>>                 i++;
>> -}
>> +       }
>> +
>> +       /* All hardware blocks have the same number of attributes. */
>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>> +                       * num_attrs + sizeof(struct attribute_group),
>> +                       GFP_KERNEL);
>> +               if (!perf->attr_group)
>> +                       return -ENOMEM;
>> +
>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>> +               if (!strcmp(perf->block_name, "iommu")) {
>> +               /* Information of IOMMU's num_counters and counter_ids is shown
>> +                * under /sys/bus/event_source/devices/amd_iommu. We don't
>> +                * duplicate here.
>> +                */
>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>> +                       for (i = 0; i < num_attrs; i++)
>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>> +               }
>> +               perf->attr_group->name = perf->block_name;
>> +               perf->attr_group->attrs = attrs;
>> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
>> +               if (ret < 0)
>> +                       return ret;
>> +       }
>>
>>         return 0;
>>  }
>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>>                 }
>>         }
>>  }
>> +
>> +/*
>> + * Performance counters information is not part of CRAT but we would like to
>> + * put them in the sysfs under topology directory for Thunk to get the data.
>> + * This function is called before updating the sysfs.
>> + */
>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>> +{
>> +       struct kfd_perf_properties *props;
>> +
>> +       if (amd_iommu_pc_supported()) {
>> +               props = kfd_alloc_struct(props);
>> +               if (!props)
>> +                       return -ENOMEM;
>> +               strcpy(props->block_name, "iommu");
>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>> +               list_add_tail(&props->list, &kdev->perf_props);
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>  /* kfd_add_non_crat_information - Add information that is not currently
>>   *     defined in CRAT but is necessary for KFD topology
>>   * @dev - topology device to which addition info is added
>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>                 }
>>         }
>>
>> +       kdev = list_first_entry(&temp_topology_device_list,
>> +                               struct kfd_topology_device, list);
>> +       kfd_add_perf_to_topology(kdev);
>> +
>>         down_write(&topology_lock);
>>         kfd_topology_update_device_list(&temp_topology_device_list,
>>                                         &topology_device_list);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index 55de56f..b9f3142 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>         struct attribute        attr;
>>  };
>>
>> +struct kfd_perf_properties {
>> +       struct list_head        list;
>> +       char                    block_name[16];
>> +       uint32_t                max_concurrent;
>> +       struct attribute_group  *attr_group;
>> +};
>> +
>>  struct kfd_topology_device {
>>         struct list_head                list;
>>         uint32_t                        gpu_id;
>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>         struct list_head                cache_props;
>>         uint32_t                        io_link_count;
>>         struct list_head                io_link_props;
>> +       struct list_head                perf_props;
>>         struct kfd_dev                  *gpu;
>>         struct kobject                  *kobj_node;
>>         struct kobject                  *kobj_mem;
>>         struct kobject                  *kobj_cache;
>>         struct kobject                  *kobj_iolink;
>> +       struct kobject                  *kobj_perf;
>>         struct attribute                attr_gpuid;
>>         struct attribute                attr_name;
>>         struct attribute                attr_props;
>> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>                 struct list_head *device_list);
>>  void kfd_release_topology_device_list(struct list_head *device_list);
>>
>> +extern bool amd_iommu_pc_supported(void);
>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>> +
>>  #endif /* __KFD_TOPOLOGY_H__ */
>> --
>> 2.7.4
>>


* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]             ` <02742353-8c54-4bed-e013-f29de62a43d5-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-12  8:15               ` Oded Gabbay
       [not found]                 ` <CAFCwf13RLA1pnL+Srh2WVz-YuNgS0yE__F3-+LUHY=9hsEH7MQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Oded Gabbay @ 2017-12-12  8:15 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Alex Deucher, Amber Lin, Christian König, amd-gfx list,
	Kent Russell

On Mon, Dec 11, 2017 at 9:54 PM, Felix Kuehling <felix.kuehling@amd.com> wrote:
> On 2017-12-11 10:23 AM, Oded Gabbay wrote:
>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>>> From: Amber Lin <Amber.Lin@amd.com>
>>>
>>> For hardware blocks whose performance counters are accessed via MMIO
>>> registers, KFD provides the support for those privileged blocks. IOMMU is
>>> one of those privileged blocks. Most performance counter properties
>>> required by Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>>>  This patch adds properties to topology in KFD sysfs for information not
>>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/ formatted as
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, i.e.
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>>> For dGPUs, which don't have an IOMMU, nothing appears under
>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
>> I don't feel comfortable with this patch. It seems to me you didn't
>> have anywhere to put these counters so you just stuck them in a place
>> the thunk already reads because it was "convenient" for you to do it.
>> But, as you point out in a comment later, these counters have nothing
>> to do with topology.
>> So this just feels wrong and I would like to:
>>
>> a. get additional opinions on it. Christian? Alex? What do you think?
>> How are the GPU's GFX counters exposed?
>> b. Ask why not use an IOCTL to get the counters?
>
> I see the performance counter information as similar to other information
> provided in the topology, such as memory, caches, CUs, etc. That's why
> it makes sense to me to report it in the topology.
>
> If this is controversial, I can drop the patch for now. It's not
> critically needed for enabling dGPU support.
>
> Regards,
>   Felix

Felix,
Is the perf counter information part of the snapshot that the thunk
takes before opening the device, or is it constantly being sampled?
If it's a one-shot thing, then I think that's somewhat acceptable.

Oded

>
>>
>> btw, I tried to search for other drivers that do this (expose perf
>> counters in sysfs) and didn't find any (it wasn't an exhaustive search
>> so I may have missed some).
>>
>> Thanks,
>> Oded
>>
>>
>>
>>
>>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>>  2 files changed, 127 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 7fe7ee0..52d20f5 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>         struct kfd_mem_properties *mem;
>>>         struct kfd_cache_properties *cache;
>>>         struct kfd_iolink_properties *iolink;
>>> +       struct kfd_perf_properties *perf;
>>>
>>>         list_del(&dev->list);
>>>
>>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>                 kfree(iolink);
>>>         }
>>>
>>> +       while (dev->perf_props.next != &dev->perf_props) {
>>> +               perf = container_of(dev->perf_props.next,
>>> +                               struct kfd_perf_properties, list);
>>> +               list_del(&perf->list);
>>> +               kfree(perf);
>>> +       }
>>> +
>>>         kfree(dev);
>>>  }
>>>
>>> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>         INIT_LIST_HEAD(&dev->mem_props);
>>>         INIT_LIST_HEAD(&dev->cache_props);
>>>         INIT_LIST_HEAD(&dev->io_link_props);
>>> +       INIT_LIST_HEAD(&dev->perf_props);
>>>
>>>         list_add_tail(&dev->list, device_list);
>>>
>>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>>         .sysfs_ops = &cache_ops,
>>>  };
>>>
>>> +/****** Sysfs of Performance Counters ******/
>>> +
>>> +struct kfd_perf_attr {
>>> +       struct kobj_attribute attr;
>>> +       uint32_t data;
>>> +};
>>> +
>>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
>>> +                       char *buf)
>>> +{
>>> +       struct kfd_perf_attr *attr;
>>> +
>>> +       buf[0] = 0;
>>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>>> +       if (!attr->data) /* invalid data for PMC */
>>> +               return 0;
>>> +       else
>>> +               return sysfs_show_32bit_val(buf, attr->data);
>>> +}
>>> +
>>> +#define KFD_PERF_DESC(_name, _data)                    \
>>> +{                                                      \
>>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>>> +       .data = _data,                                  \
>>> +}
>>> +
>>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>>> +       KFD_PERF_DESC(max_concurrent, 0),
>>> +       KFD_PERF_DESC(num_counters, 0),
>>> +       KFD_PERF_DESC(counter_ids, 0),
>>> +};
>>> +/****************************************/
>>> +
>>>  static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>>                 char *buffer)
>>>  {
>>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>         struct kfd_iolink_properties *iolink;
>>>         struct kfd_cache_properties *cache;
>>>         struct kfd_mem_properties *mem;
>>> +       struct kfd_perf_properties *perf;
>>>
>>>         if (dev->kobj_iolink) {
>>>                 list_for_each_entry(iolink, &dev->io_link_props, list)
>>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>                 dev->kobj_mem = NULL;
>>>         }
>>>
>>> +       if (dev->kobj_perf) {
>>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>>> +                       kfree(perf->attr_group);
>>> +                       perf->attr_group = NULL;
>>> +               }
>>> +               kobject_del(dev->kobj_perf);
>>> +               kobject_put(dev->kobj_perf);
>>> +               dev->kobj_perf = NULL;
>>> +       }
>>> +
>>>         if (dev->kobj_node) {
>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>         struct kfd_iolink_properties *iolink;
>>>         struct kfd_cache_properties *cache;
>>>         struct kfd_mem_properties *mem;
>>> +       struct kfd_perf_properties *perf;
>>>         int ret;
>>> -       uint32_t i;
>>> +       uint32_t i, num_attrs;
>>> +       struct attribute **attrs;
>>>
>>>         if (WARN_ON(dev->kobj_node))
>>>                 return -EEXIST;
>>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>         if (!dev->kobj_iolink)
>>>                 return -ENOMEM;
>>>
>>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>>> +       if (!dev->kobj_perf)
>>> +               return -ENOMEM;
>>> +
>>>         /*
>>>          * Creating sysfs files for node properties
>>>          */
>>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>                 if (ret < 0)
>>>                         return ret;
>>>                 i++;
>>> -}
>>> +       }
>>> +
>>> +       /* All hardware blocks have the same number of attributes. */
>>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>>> +                       * num_attrs + sizeof(struct attribute_group),
>>> +                       GFP_KERNEL);
>>> +               if (!perf->attr_group)
>>> +                       return -ENOMEM;
>>> +
>>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>>> +               if (!strcmp(perf->block_name, "iommu")) {
>>> +               /* Information of IOMMU's num_counters and counter_ids is shown
>>> +                * under /sys/bus/event_source/devices/amd_iommu. We don't
>>> +                * duplicate here.
>>> +                */
>>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>>> +                       for (i = 0; i < num_attrs; i++)
>>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>>> +               }
>>> +               perf->attr_group->name = perf->block_name;
>>> +               perf->attr_group->attrs = attrs;
>>> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
>>> +               if (ret < 0)
>>> +                       return ret;
>>> +       }
>>>
>>>         return 0;
>>>  }
>>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>>>                 }
>>>         }
>>>  }
>>> +
>>> +/*
>>> + * Performance counters information is not part of CRAT but we would like to
>>> + * put them in the sysfs under topology directory for Thunk to get the data.
>>> + * This function is called before updating the sysfs.
>>> + */
>>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>> +{
>>> +       struct kfd_perf_properties *props;
>>> +
>>> +       if (amd_iommu_pc_supported()) {
>>> +               props = kfd_alloc_struct(props);
>>> +               if (!props)
>>> +                       return -ENOMEM;
>>> +               strcpy(props->block_name, "iommu");
>>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>>> +               list_add_tail(&props->list, &kdev->perf_props);
>>> +       }
>>> +
>>> +       return 0;
>>> +}
>>> +
>>>  /* kfd_add_non_crat_information - Add information that is not currently
>>>   *     defined in CRAT but is necessary for KFD topology
>>>   * @dev - topology device to which addition info is added
>>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>>                 }
>>>         }
>>>
>>> +       kdev = list_first_entry(&temp_topology_device_list,
>>> +                               struct kfd_topology_device, list);
>>> +       kfd_add_perf_to_topology(kdev);
>>> +
>>>         down_write(&topology_lock);
>>>         kfd_topology_update_device_list(&temp_topology_device_list,
>>>                                         &topology_device_list);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> index 55de56f..b9f3142 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>>         struct attribute        attr;
>>>  };
>>>
>>> +struct kfd_perf_properties {
>>> +       struct list_head        list;
>>> +       char                    block_name[16];
>>> +       uint32_t                max_concurrent;
>>> +       struct attribute_group  *attr_group;
>>> +};
>>> +
>>>  struct kfd_topology_device {
>>>         struct list_head                list;
>>>         uint32_t                        gpu_id;
>>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>>         struct list_head                cache_props;
>>>         uint32_t                        io_link_count;
>>>         struct list_head                io_link_props;
>>> +       struct list_head                perf_props;
>>>         struct kfd_dev                  *gpu;
>>>         struct kobject                  *kobj_node;
>>>         struct kobject                  *kobj_mem;
>>>         struct kobject                  *kobj_cache;
>>>         struct kobject                  *kobj_iolink;
>>> +       struct kobject                  *kobj_perf;
>>>         struct attribute                attr_gpuid;
>>>         struct attribute                attr_name;
>>>         struct attribute                attr_props;
>>> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>                 struct list_head *device_list);
>>>  void kfd_release_topology_device_list(struct list_head *device_list);
>>>
>>> +extern bool amd_iommu_pc_supported(void);
>>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>> +
>>>  #endif /* __KFD_TOPOLOGY_H__ */
>>> --
>>> 2.7.4
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* RE: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]                     ` <71874366-c697-8e90-de59-4e5f1d4f797b-5C7GfCeVMHo@public.gmane.org>
@ 2017-12-12 11:11                       ` Russell, Kent
       [not found]                         ` <BN6PR1201MB01800128BE70E4BF0E4DEE4085340-6iU6OBHu2P/H0AMcJMwsYmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Russell, Kent @ 2017-12-12 11:11 UTC (permalink / raw)
  To: Kuehling, Felix, Koenig, Christian, Oded Gabbay; +Cc: amd-gfx list

That's alright. I admit that it was a bit self-serving, in that I was asked to get the information out somewhere and it was the simplest solution. I can see if I can come up with a more acceptable option in a future patch set, but I think Felix is right in that we can just drop this one for now; it's definitely not worth holding up the rest of the patches over.

 Kent 

-----Original Message-----
From: Kuehling, Felix 
Sent: Monday, December 11, 2017 2:52 PM
To: Koenig, Christian; Oded Gabbay
Cc: Russell, Kent; amd-gfx list
Subject: Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage

On 2017-12-11 12:28 PM, Christian König wrote:
> On 11.12.2017 at 17:40, Oded Gabbay wrote:
>> On Mon, Dec 11, 2017 at 5:32 PM, Oded Gabbay <oded.gabbay@gmail.com>
>> wrote:
>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling 
>>> <Felix.Kuehling@amd.com> wrote:
>>>> From: Kent Russell <kent.russell@amd.com>
>>>>
>>>> Add a sysfs file in topology (node/x/memory_banks/X/used_memory) 
>>>> that reports the current VRAM usage for that node. Only works for 
>>>> GPU nodes at this time.
>>>>
>>> As with patch 22 (perf counters), I would not expect this 
>>> information to be included in the topology. It doesn't describe the 
>>> properties of the device, but a current state.
>>> Oded
>> For example, in amdgpu, the VRAM usage is reported in the INFO IOCTL 
>> (AMDGPU_INFO_VRAM_USAGE). See function  amdgpu_info_ioctl()
>
> Yep, completely agree.
>
> That stuff is runtime properties and not static attribute nor 
> configuration or setup.
>
> So either debugfs or IOCTL are the two best options as far as I can see.

Right. I admit, this feature was a bit of a hack to quickly enable the HIP team without having to change a bunch of interfaces (ioctls, Thunk, and Runtime).

This patch isn't critical for enabling dGPU support. I'll drop it for now and we can reimplement it properly later.
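
When we do reimplement it, a per-device debugfs file may be all that's
needed. A rough sketch (the kfd_debugfs_dir() helper is hypothetical;
this reuses the get_vram_usage() KGD hook added earlier in the series):

  static int kfd_vram_used_get(void *data, u64 *val)
  {
          struct kfd_dev *kfd = data;

          /* same source the sysfs file reads: KGD-reported VRAM usage */
          *val = kfd->kfd2kgd->get_vram_usage(kfd->kgd);
          return 0;
  }
  DEFINE_SIMPLE_ATTRIBUTE(kfd_vram_used_fops, kfd_vram_used_get,
                          NULL, "%llu\n");

  /* at device init, under a per-device debugfs dir (hypothetical): */
  debugfs_create_file("vram_used", 0444, kfd_debugfs_dir(kfd),
                      kfd, &kfd_vram_used_fops);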

Regards,
  Felix

>
> Christian.
>
>>
>> Thanks,
>>
>> Oded
>>
>>
>>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49
>>>> +++++++++++++++++++++++++++----
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>>>>   2 files changed, 46 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> index 7f0d41e..7f04038 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -186,6 +186,8 @@ struct kfd_topology_device 
>>>> *kfd_create_topology_device(
>>>>                  sysfs_show_gen_prop(buffer, "%s %llu\n", name, 
>>>> value)
>>>>   #define sysfs_show_32bit_val(buffer, value) \
>>>>                  sysfs_show_gen_prop(buffer, "%u\n", value)
>>>> +#define sysfs_show_64bit_val(buffer, value) \
>>>> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>>>>   #define sysfs_show_str_val(buffer, value) \
>>>>                  sysfs_show_gen_prop(buffer, "%s\n", value)
>>>>
>>>> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj, 
>>>> struct attribute *attr,
>>>>   {
>>>>          ssize_t ret;
>>>>          struct kfd_mem_properties *mem;
>>>> +       uint64_t used_mem;
>>>>
>>>>          /* Making sure that the buffer is an empty string */
>>>>          buffer[0] = 0;
>>>>
>>>> -       mem = container_of(attr, struct kfd_mem_properties, attr);
>>>> +       if (strcmp(attr->name, "used_memory") == 0) {
>>>> +               mem = container_of(attr, struct kfd_mem_properties,
>>>> attr_used);
>>>> +               if (mem->gpu) {
>>>> +                       used_mem = 
>>>> +mem->gpu->kfd2kgd->get_vram_usage(
>>>> +                                                              
>>>> mem->gpu->kgd);
>>>> +                       return sysfs_show_64bit_val(buffer, 
>>>> +used_mem);
>>>> +               }
>>>> +               /* TODO: Report APU/CPU-allocated memory; For now
>>>> return 0 */
>>>> +               return 0;
>>>> +       }
>>>> +
>>>> +       mem = container_of(attr, struct kfd_mem_properties,
>>>> attr_props);
>>>>          sysfs_show_32bit_prop(buffer, "heap_type", 
>>>> mem->heap_type);
>>>>          sysfs_show_64bit_prop(buffer, "size_in_bytes",
>>>> mem->size_in_bytes);
>>>>          sysfs_show_32bit_prop(buffer, "flags", mem->flags); @@ 
>>>> -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct
>>>> kfd_topology_device *dev)
>>>>          if (dev->kobj_mem) {
>>>>                  list_for_each_entry(mem, &dev->mem_props, list)
>>>>                          if (mem->kobj) {
>>>> -                               kfd_remove_sysfs_file(mem->kobj, 
>>>> &mem->attr);
>>>> +                               /* TODO: Remove when CPU/APU
>>>> supported */
>>>> +                               if (dev->node_props.cpu_cores_count
>>>> == 0)
>>>> +                                       
>>>> +sysfs_remove_file(mem->kobj,
>>>> +                                                      
>>>> &mem->attr_used);
>>>> +                               kfd_remove_sysfs_file(mem->kobj,
>>>> +                                               &mem->attr_props);
>>>>                                  mem->kobj = NULL;
>>>>                          }
>>>>                  kobject_del(dev->kobj_mem); @@ -629,12 +648,23 @@ 
>>>> static int kfd_build_sysfs_node_entry(struct kfd_topology_device 
>>>> *dev,
>>>>                  if (ret < 0)
>>>>                          return ret;
>>>>
>>>> -               mem->attr.name = "properties";
>>>> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
>>>> -               sysfs_attr_init(&mem->attr);
>>>> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
>>>> +               mem->attr_props.name = "properties";
>>>> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
>>>> +               sysfs_attr_init(&mem->attr_props);
>>>> +               ret = sysfs_create_file(mem->kobj, 
>>>> +&mem->attr_props);
>>>>                  if (ret < 0)
>>>>                          return ret;
>>>> +
>>>> +               /* TODO: Support APU/CPU memory usage */
>>>> +               if (dev->node_props.cpu_cores_count == 0) {
>>>> +                       mem->attr_used.name = "used_memory";
>>>> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
>>>> +                       sysfs_attr_init(&mem->attr_used);
>>>> +                       ret = sysfs_create_file(mem->kobj,
>>>> &mem->attr_used);
>>>> +                       if (ret < 0)
>>>> +                               return ret;
>>>> +               }
>>>> +
>>>>                  i++;
>>>>          }
>>>>
>>>> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device 
>>>> *kfd_assign_gpu(struct kfd_dev *gpu)
>>>>   {
>>>>          struct kfd_topology_device *dev;
>>>>          struct kfd_topology_device *out_dev = NULL;
>>>> +       struct kfd_mem_properties *mem;
>>>>
>>>>          down_write(&topology_lock);
>>>>          list_for_each_entry(dev, &topology_device_list, list)
>>>>                  if (!dev->gpu && (dev->node_props.simd_count > 0)) 
>>>> {
>>>>                          dev->gpu = gpu;
>>>>                          out_dev = dev;
>>>> +
>>>> +                       /* Assign mem->gpu */
>>>> +                       list_for_each_entry(mem, &dev->mem_props,
>>>> list)
>>>> +                               mem->gpu = dev->gpu;
>>>> +
>>>>                          break;
>>>>                  }
>>>>          up_write(&topology_lock);
>>>> +
>>>>          return out_dev;
>>>>   }
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> index 53fca1f..0f698d8 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>>>>          uint32_t                width;
>>>>          uint32_t                mem_clk_max;
>>>>          struct kobject          *kobj;
>>>> -       struct attribute        attr;
>>>> +       struct kfd_dev          *gpu;
>>>> +       struct attribute        attr_props;
>>>> +       struct attribute        attr_used;
>>>>   };
>>>>
>>>>   #define HSA_CACHE_TYPE_DATA            0x00000001
>>>> --
>>>> 2.7.4
>>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root
  2017-12-09  4:09 ` [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root Felix Kuehling
@ 2017-12-12 23:27   ` Bjorn Helgaas
  2017-12-12 23:42     ` Jason Gunthorpe
  2018-01-02 23:41       ` Felix Kuehling
  0 siblings, 2 replies; 62+ messages in thread
From: Bjorn Helgaas @ 2017-12-12 23:27 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: oded.gabbay, amd-gfx, Jay Cornwall, linux-pci, Ram Amrani,
	Doug Ledford, Michal Kalderon, Ariel Elior, Jason Gunthorpe

[+cc Ram, Michal, Ariel, Doug, Jason]

The [29/37] in the subject makes it look like this is part of a larger
series, but I can't find the rest of it on linux-pci or linux-kernel.

I don't want to merge a new interface unless there's an in-tree user
of it.  I assume the rest of the series includes a user.

On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
> From: Jay Cornwall <Jay.Cornwall@amd.com>
> 
> The PCIe 3.0 AtomicOp (6.15) feature allows atomic transactions to be
> requested by, routed through and completed by PCIe components. Routing and
> completion do not require software support. Component support for each is
> detectable via the DEVCAP2 register.
> 
> AtomicOp requests are permitted only if a component's
> DEVCTL2.ATOMICOP_REQUESTER_ENABLE field is set. This capability cannot be
> detected but is a no-op if set on a component with no support. These
> requests can only be serviced if the upstream components support AtomicOp
> completion and/or routing to a component which does.
> 
> A concrete example is the AMD Fiji-class GPU, which is specified to
> support AtomicOp requests, routed through a PLX 8747 switch (advertising
> AtomicOp routing) to a Haswell host bridge (advertising AtomicOp
> completion support). When AtomicOp requests are disabled the GPU logs
> attempts to initiate requests to an MMIO register for debugging.
> 
> Add pci_enable_atomic_ops_to_root for per-device control over AtomicOp
> requests. Upstream bridges are checked for AtomicOp routing capability and
> the call fails if any lack this capability. The root port is checked for
> AtomicOp completion capabilities and the call fails if it does not support
> any. Routes to other PCIe components are not checked for AtomicOp routing
> and completion capabilities.
> 
> v2: Check for AtomicOp route to root port with AtomicOp completion
> v3: Style fixes
> v4: Endpoint to root port only, check upstream egress blocking
> v5: Rebase, use existing PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK define
> 
> CC: linux-pci@vger.kernel.org
> Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/pci/pci.c             | 81 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/pci.h           |  1 +
>  include/uapi/linux/pci_regs.h |  2 ++
>  3 files changed, 84 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 6078dfc..89a8bb0 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2966,6 +2966,87 @@ bool pci_acs_path_enabled(struct pci_dev *start,
>  }
>  
>  /**
> + * pci_enable_atomic_ops_to_root - enable AtomicOp requests to root port
> + * @dev: the PCI device
> + *
> + * Return 0 if the device is capable of generating AtomicOp requests,

I don't believe this part.

You return 0 if the upstream path can route AtomicOps and the Root
Port can complete them.  But there's nothing here that's conditional
on capabilities of *dev*.

You could read back PCI_EXP_DEVCTL2 to see if
PCI_EXP_DEVCTL2_ATOMIC_REQ was writable, but even then, you can't
really tell what the device is capable of.
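
The read-back check would look something like this (a sketch only; as
noted, a writable bit still doesn't prove the endpoint can actually
generate AtomicOp requests):

  static bool atomic_req_writable(struct pci_dev *dev)
  {
          u16 orig, probe;

          pcie_capability_read_word(dev, PCI_EXP_DEVCTL2, &orig);
          pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
                                   PCI_EXP_DEVCTL2_ATOMIC_REQ);
          pcie_capability_read_word(dev, PCI_EXP_DEVCTL2, &probe);
          /* restore the original control value after probing */
          pcie_capability_write_word(dev, PCI_EXP_DEVCTL2, orig);

          return probe & PCI_EXP_DEVCTL2_ATOMIC_REQ;
  }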

> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
> + * 128-bit AtomicOp completion, or negative otherwise.
> + */
> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
> +{
> +	struct pci_bus *bus = dev->bus;
> +
> +	if (!pci_is_pcie(dev))
> +		return -EINVAL;
> +
> +	switch (pci_pcie_type(dev)) {
> +	/*
> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
> +	 * to implement AtomicOp requester capabilities.
> +	 */
> +	case PCI_EXP_TYPE_ENDPOINT:
> +	case PCI_EXP_TYPE_LEG_END:
> +	case PCI_EXP_TYPE_RC_END:
> +		break;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	while (bus->parent) {
> +		struct pci_dev *bridge = bus->self;
> +		u32 cap;
> +
> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
> +
> +		switch (pci_pcie_type(bridge)) {
> +		/*
> +		 * Upstream, downstream and root ports may implement AtomicOp
> +		 * routing capabilities. AtomicOp routing via a root port is
> +		 * not considered.
> +		 */
> +		case PCI_EXP_TYPE_UPSTREAM:
> +		case PCI_EXP_TYPE_DOWNSTREAM:
> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> +				return -EINVAL;
> +			break;
> +
> +		/*
> +		 * Root ports are permitted to implement AtomicOp completion
> +		 * capabilities.
> +		 */
> +		case PCI_EXP_TYPE_ROOT_PORT:
> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
> +				return -EINVAL;
> +			break;
> +		}

IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
initiate AtomicOps that target system memory.  This interface
(pci_enable_atomic_ops_to_root()) doesn't specify what size operations
the driver wants to do.  If the GPU requests a 128-bit op and the Root
Port doesn't support it, I think we'll see an Unsupported Request
error.

Do you need to extend this interface so the driver can specify what
sizes it wants?

The existing code in qedr_pci_set_atomic() is very similar.  We should
make this new interface work for both places, then actually use it in
qedr_pci_set_atomic().
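
One possible shape, as a suggestion only: pass in a mask of the
completion capabilities the driver actually needs, and fail unless the
Root Port advertises all of them, e.g.:

  int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
  ...
          case PCI_EXP_TYPE_ROOT_PORT:
                  if ((cap & cap_mask) != cap_mask)
                          return -EINVAL;
                  break;
  ...

A 64-bit-only requester like qedr would then ask for exactly what it
needs:

  rc = pci_enable_atomic_ops_to_root(pdev, PCI_EXP_DEVCAP2_ATOMIC_COMP64);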

> +
> +		/*
> +		 * Upstream ports may block AtomicOps on egress.
> +		 */
> +		if (pci_pcie_type(bridge) == PCI_EXP_TYPE_UPSTREAM) {

pci_pcie_type() is not a reliable method for determining the function
of a switch port.  There are systems where the upstream port is
labeled as a downstream port, e.g.,
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c8fc9339409d

> +			u32 ctl2;
> +
> +			pcie_capability_read_dword(bridge, PCI_EXP_DEVCTL2,
> +						   &ctl2);
> +			if (ctl2 & PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK)
> +				return -EINVAL;
> +		}
> +
> +		bus = bus->parent;
> +	}
> +
> +	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
> +				 PCI_EXP_DEVCTL2_ATOMIC_REQ);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(pci_enable_atomic_ops_to_root);
> +
> +/**
>   * pci_swizzle_interrupt_pin - swizzle INTx for device behind bridge
>   * @dev: the PCI device
>   * @pin: the INTx pin (1=INTA, 2=INTB, 3=INTC, 4=INTD)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f4f8ee5..2a39f63 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2062,6 +2062,7 @@ void pci_request_acs(void);
>  bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
>  bool pci_acs_path_enabled(struct pci_dev *start,
>  			  struct pci_dev *end, u16 acs_flags);
> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev);
>  
>  #define PCI_VPD_LRDT			0x80	/* Large Resource Data Type */
>  #define PCI_VPD_LRDT_ID(x)		((x) | PCI_VPD_LRDT)
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index f8d5804..45f251a 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -623,7 +623,9 @@
>  #define PCI_EXP_DEVCAP2		36	/* Device Capabilities 2 */
>  #define  PCI_EXP_DEVCAP2_ARI		0x00000020 /* Alternative Routing-ID */
>  #define  PCI_EXP_DEVCAP2_ATOMIC_ROUTE	0x00000040 /* Atomic Op routing */
> +#define  PCI_EXP_DEVCAP2_ATOMIC_COMP32	0x00000080 /* 32b AtomicOp completion */
>  #define PCI_EXP_DEVCAP2_ATOMIC_COMP64	0x00000100 /* Atomic 64-bit compare */
> +#define  PCI_EXP_DEVCAP2_ATOMIC_COMP128	0x00000200 /* 128b AtomicOp completion*/

The comments should be similar.  I think yours are better than the
original, so please change the original to

  /* 64b AtomicOp completion */

so they all match.

>  #define  PCI_EXP_DEVCAP2_LTR		0x00000800 /* Latency tolerance reporting */
>  #define  PCI_EXP_DEVCAP2_OBFF_MASK	0x000c0000 /* OBFF support mechanism */
>  #define  PCI_EXP_DEVCAP2_OBFF_MSG	0x00040000 /* New message signaling */
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root
  2017-12-12 23:27   ` Bjorn Helgaas
@ 2017-12-12 23:42     ` Jason Gunthorpe
  2017-12-13  7:22         ` Oded Gabbay
  2018-01-02 23:41       ` Felix Kuehling
  1 sibling, 1 reply; 62+ messages in thread
From: Jason Gunthorpe @ 2017-12-12 23:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Felix Kuehling, oded.gabbay, amd-gfx, Jay Cornwall, linux-pci,
	Ram Amrani, Doug Ledford, Michal Kalderon, Ariel Elior

On Tue, Dec 12, 2017 at 05:27:07PM -0600, Bjorn Helgaas wrote:
> [+cc Ram, Michal, Ariel, Doug, Jason]
> 
> The [29/37] in the subject makes it look like this is part of a larger
> series, but I can't find the rest of it on linux-pci or linux-kernel.

Didn't find the cover letter, but the AMD patchworks captured the series:

https://patchwork.freedesktop.org/project/amd-xorg-ddx/patches/

> I don't want to merge a new interface unless there's an in-tree user
> of it.  I assume the rest of the series includes a user.

Looks like it.

I would also guess we will see users in drivers/infiniband emerge as
CPU coherent atomics are also a topic our hardware drivers will be
interested in. But I am not aware of any pending patches.

Jason

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root
@ 2017-12-13  7:22         ` Oded Gabbay
  0 siblings, 0 replies; 62+ messages in thread
From: Oded Gabbay @ 2017-12-13  7:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bjorn Helgaas, Felix Kuehling, amd-gfx list, Jay Cornwall,
	linux-pci, Ram Amrani, Doug Ledford, Michal Kalderon,
	Ariel Elior

On Wed, Dec 13, 2017 at 1:42 AM, Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Tue, Dec 12, 2017 at 05:27:07PM -0600, Bjorn Helgaas wrote:
>> [+cc Ram, Michal, Ariel, Doug, Jason]
>>
>> The [29/37] in the subject makes it look like this is part of a larger
>> series, but I can't find the rest of it on linux-pci or linux-kernel.
>
> Didn't find the cover letter, but the AMD patchworks captured the series..
>
> https://patchwork.freedesktop.org/project/amd-xorg-ddx/patches/

Hi,
This patchset is mainly for the amdkfd driver, which is used for
running the HSA framework on AMD's APUs and, in the near future, dGPUs.
This driver has been in the kernel since 3.19.
PCIe atomics were not required for APUs because their GPU part is
integrated with the CPU, and the two have atomic access between them.

For enabling HSA on dGPUs (such as Fiji, Vega, Polaris), which connect
through PCIe, we need PCIe atomics support.
This patchset starts to upstream the dGPU support, and one of its
prerequisites is the patch under discussion.

>
>> I don't want to merge a new interface unless there's an in-tree user
>> of it.  I assume the rest of the series includes a user.
>
> Looks like it.
So, yes, there is a user in the kernel and there is an entire
open-source userspace framework around it, called ROCm
(https://github.com/RadeonOpenCompute/ROCm)

Oded

>
> I would also guess we will see users in drivers/infiniband emerge as
> CPU coherent atomics are also a topic our hardware drivers will be
> interested in. But I am not aware of any pending patches.
>
> Jason

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]                         ` <BN6PR1201MB01800128BE70E4BF0E4DEE4085340-6iU6OBHu2P/H0AMcJMwsYmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-12-13  7:23                           ` Oded Gabbay
       [not found]                             ` <CAFCwf12RbD-AaecdAjurqyAYXU6UqvpdLM+mpc7VdvXuYtp=TA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Oded Gabbay @ 2017-12-13  7:23 UTC (permalink / raw)
  To: Russell, Kent; +Cc: Kuehling, Felix, Koenig, Christian, amd-gfx list

On Tue, Dec 12, 2017 at 1:11 PM, Russell, Kent <Kent.Russell@amd.com> wrote:
> That's alright. I admit that it was a bit self-serving, in that I was asked to get the information out somewhere and it was the simplest solution. I can see if I can come up with a more acceptable option in a future patch set, but I think Felix is right in that we can just drop this one for now; it's definitely not worth holding up the rest of the patches over.
>
>  Kent
Sure, np. I'll drop this for now.

Oded

>
> -----Original Message-----
> From: Kuehling, Felix
> Sent: Monday, December 11, 2017 2:52 PM
> To: Koenig, Christian; Oded Gabbay
> Cc: Russell, Kent; amd-gfx list
> Subject: Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
>
> On 2017-12-11 12:28 PM, Christian König wrote:
>> On 11.12.2017 at 17:40, Oded Gabbay wrote:
>>> On Mon, Dec 11, 2017 at 5:32 PM, Oded Gabbay <oded.gabbay@gmail.com>
>>> wrote:
>>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling
>>>> <Felix.Kuehling@amd.com> wrote:
>>>>> From: Kent Russell <kent.russell@amd.com>
>>>>>
>>>>> Add a sysfs file in topology (node/x/memory_banks/X/used_memory)
>>>>> that reports the current VRAM usage for that node. Only works for
>>>>> GPU nodes at this time.
>>>>>
>>>> As with patch 22 (perf counters), I would not expect this
>>>> information to be included in the topology. It doesn't describe the
>>>> properties of the device, but a current state.
>>>> Oded
>>> For example, in amdgpu, the VRAM usage is reported in the INFO IOCTL
>>> (AMDGPU_INFO_VRAM_USAGE). See function  amdgpu_info_ioctl()
>>
>> Yep, completely agree.
>>
>> That stuff is runtime properties and not static attribute nor
>> configuration or setup.
>>
>> So either debugfs or IOCTL are the two best options as far as I can see.
>
> Right. I admit, this feature was a bit of a hack to quickly enable the HIP team without having to change a bunch of interfaces (ioctls, Thunk, and Runtime).
>
> This patch isn't critical for enabling dGPU support. I'll drop it for now and we can reimplement it properly later.
>
> Regards,
>   Felix
>
>>
>> Christian.
>>
>>>
>>> Thanks,
>>>
>>> Oded
>>>
>>>
>>>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49
>>>>> +++++++++++++++++++++++++++----
>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>>>>>   2 files changed, 46 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> index 7f0d41e..7f04038 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> @@ -186,6 +186,8 @@ struct kfd_topology_device
>>>>> *kfd_create_topology_device(
>>>>>                  sysfs_show_gen_prop(buffer, "%s %llu\n", name,
>>>>> value)
>>>>>   #define sysfs_show_32bit_val(buffer, value) \
>>>>>                  sysfs_show_gen_prop(buffer, "%u\n", value)
>>>>> +#define sysfs_show_64bit_val(buffer, value) \
>>>>> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>>>>>   #define sysfs_show_str_val(buffer, value) \
>>>>>                  sysfs_show_gen_prop(buffer, "%s\n", value)
>>>>>
>>>>> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj,
>>>>> struct attribute *attr,
>>>>>   {
>>>>>          ssize_t ret;
>>>>>          struct kfd_mem_properties *mem;
>>>>> +       uint64_t used_mem;
>>>>>
>>>>>          /* Making sure that the buffer is an empty string */
>>>>>          buffer[0] = 0;
>>>>>
>>>>> -       mem = container_of(attr, struct kfd_mem_properties, attr);
>>>>> +       if (strcmp(attr->name, "used_memory") == 0) {
>>>>> +               mem = container_of(attr, struct kfd_mem_properties,
>>>>> attr_used);
>>>>> +               if (mem->gpu) {
>>>>> +                       used_mem =
>>>>> +mem->gpu->kfd2kgd->get_vram_usage(
>>>>> +
>>>>> mem->gpu->kgd);
>>>>> +                       return sysfs_show_64bit_val(buffer,
>>>>> +used_mem);
>>>>> +               }
>>>>> +               /* TODO: Report APU/CPU-allocated memory; For now
>>>>> return 0 */
>>>>> +               return 0;
>>>>> +       }
>>>>> +
>>>>> +       mem = container_of(attr, struct kfd_mem_properties,
>>>>> attr_props);
>>>>>          sysfs_show_32bit_prop(buffer, "heap_type",
>>>>> mem->heap_type);
>>>>>          sysfs_show_64bit_prop(buffer, "size_in_bytes",
>>>>> mem->size_in_bytes);
>>>>>          sysfs_show_32bit_prop(buffer, "flags", mem->flags); @@
>>>>> -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct
>>>>> kfd_topology_device *dev)
>>>>>          if (dev->kobj_mem) {
>>>>>                  list_for_each_entry(mem, &dev->mem_props, list)
>>>>>                          if (mem->kobj) {
>>>>> -                               kfd_remove_sysfs_file(mem->kobj,
>>>>> &mem->attr);
>>>>> +                               /* TODO: Remove when CPU/APU
>>>>> supported */
>>>>> +                               if (dev->node_props.cpu_cores_count
>>>>> == 0)
>>>>> +
>>>>> +sysfs_remove_file(mem->kobj,
>>>>> +
>>>>> &mem->attr_used);
>>>>> +                               kfd_remove_sysfs_file(mem->kobj,
>>>>> +                                               &mem->attr_props);
>>>>>                                  mem->kobj = NULL;
>>>>>                          }
>>>>>                  kobject_del(dev->kobj_mem); @@ -629,12 +648,23 @@
>>>>> static int kfd_build_sysfs_node_entry(struct kfd_topology_device
>>>>> *dev,
>>>>>                  if (ret < 0)
>>>>>                          return ret;
>>>>>
>>>>> -               mem->attr.name = "properties";
>>>>> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
>>>>> -               sysfs_attr_init(&mem->attr);
>>>>> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
>>>>> +               mem->attr_props.name = "properties";
>>>>> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
>>>>> +               sysfs_attr_init(&mem->attr_props);
>>>>> +               ret = sysfs_create_file(mem->kobj,
>>>>> +&mem->attr_props);
>>>>>                  if (ret < 0)
>>>>>                          return ret;
>>>>> +
>>>>> +               /* TODO: Support APU/CPU memory usage */
>>>>> +               if (dev->node_props.cpu_cores_count == 0) {
>>>>> +                       mem->attr_used.name = "used_memory";
>>>>> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
>>>>> +                       sysfs_attr_init(&mem->attr_used);
>>>>> +                       ret = sysfs_create_file(mem->kobj,
>>>>> &mem->attr_used);
>>>>> +                       if (ret < 0)
>>>>> +                               return ret;
>>>>> +               }
>>>>> +
>>>>>                  i++;
>>>>>          }
>>>>>
>>>>> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device
>>>>> *kfd_assign_gpu(struct kfd_dev *gpu)
>>>>>   {
>>>>>          struct kfd_topology_device *dev;
>>>>>          struct kfd_topology_device *out_dev = NULL;
>>>>> +       struct kfd_mem_properties *mem;
>>>>>
>>>>>          down_write(&topology_lock);
>>>>>          list_for_each_entry(dev, &topology_device_list, list)
>>>>>                  if (!dev->gpu && (dev->node_props.simd_count > 0))
>>>>> {
>>>>>                          dev->gpu = gpu;
>>>>>                          out_dev = dev;
>>>>> +
>>>>> +                       /* Assign mem->gpu */
>>>>> +                       list_for_each_entry(mem, &dev->mem_props,
>>>>> list)
>>>>> +                               mem->gpu = dev->gpu;
>>>>> +
>>>>>                          break;
>>>>>                  }
>>>>>          up_write(&topology_lock);
>>>>> +
>>>>>          return out_dev;
>>>>>   }
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>> index 53fca1f..0f698d8 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>>>>>          uint32_t                width;
>>>>>          uint32_t                mem_clk_max;
>>>>>          struct kobject          *kobj;
>>>>> -       struct attribute        attr;
>>>>> +       struct kfd_dev          *gpu;
>>>>> +       struct attribute        attr_props;
>>>>> +       struct attribute        attr_used;
>>>>>   };
>>>>>
>>>>>   #define HSA_CACHE_TYPE_DATA            0x00000001
>>>>> --
>>>>> 2.7.4
>>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]                             ` <CAFCwf12RbD-AaecdAjurqyAYXU6UqvpdLM+mpc7VdvXuYtp=TA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-16 20:36                               ` Felix Kühling
       [not found]                                 ` <f64e2633-9424-26d4-0a35-166a2b8a62c8-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Felix Kühling @ 2017-12-16 20:36 UTC (permalink / raw)
  To: Oded Gabbay, Russell, Kent
  Cc: Kuehling, Felix, Koenig, Christian, amd-gfx list


On 13.12.2017 at 08:23, Oded Gabbay wrote:
> On Tue, Dec 12, 2017 at 1:11 PM, Russell, Kent <Kent.Russell-5C7GfCeVMHo@public.gmane.org> wrote:
>> That's alright. I admit that it was a bit self-serving, in that I was asked to get the information out somewhere and it was the simplest solution. I can see if I can come up with a more acceptable option in a future patch set, but I think Felix is right in that we can just drop this one for now; it's definitely not worth holding up the rest of the patches over.
>>
>>  Kent
> Sure, np. I'll drop this for now.

FWIW, there is precedent for this type of information in sysfs. See
/sys/devices/virtual/drm/ttm/memory_accounting/kernel/used_memory.

Regards,
  Felix

>
> Oded
>
>> -----Original Message-----
>> From: Kuehling, Felix
>> Sent: Monday, December 11, 2017 2:52 PM
>> To: Koenig, Christian; Oded Gabbay
>> Cc: Russell, Kent; amd-gfx list
>> Subject: Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
>>
>> On 2017-12-11 12:28 PM, Christian König wrote:
>>> On 11.12.2017 at 17:40, Oded Gabbay wrote:
>>>> On Mon, Dec 11, 2017 at 5:32 PM, Oded Gabbay <oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>> wrote:
>>>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling
>>>>> <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org> wrote:
>>>>>> From: Kent Russell <kent.russell-5C7GfCeVMHo@public.gmane.org>
>>>>>>
>>>>>> Add a sysfs file in topology (node/x/memory_banks/X/used_memory)
>>>>>> that reports the current VRAM usage for that node. Only works for
>>>>>> GPU nodes at this time.
>>>>>>
>>>>> As with patch 22 (perf counters), I would not expect this
>>>>> information to be included in the topology. It doesn't describe the
>>>>> properties of the device, but a current state.
>>>>> Oded
>>>> For example, in amdgpu, the VRAM usage is reported in the INFO IOCTL
>>>> (AMDGPU_INFO_VRAM_USAGE). See function  amdgpu_info_ioctl()
>>> Yep, completely agree.
>>>
>>> That stuff is runtime properties and not static attribute nor
>>> configuration or setup.
>>>
>>> So either debugfs or IOCTL are the two best options as far as I can see.
>> Right. I admit, this feature was a bit of a hack to quickly enable the HIP team without having to change a bunch of interfaces (ioctls, Thunk, and Runtime).
>>
>> This patch isn't critical for enabling dGPU support. I'll drop it for now and we can reimplement it properly later.
>>
>> Regards,
>>   Felix
>>
>>> Christian.
>>>
>>>> Thanks,
>>>>
>>>> Oded
>>>>
>>>>
>>>>>> Signed-off-by: Kent Russell <kent.russell-5C7GfCeVMHo@public.gmane.org>
>>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>>>>>> ---
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49
>>>>>> +++++++++++++++++++++++++++----
>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>>>>>>   2 files changed, 46 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>> index 7f0d41e..7f04038 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>> @@ -186,6 +186,8 @@ struct kfd_topology_device
>>>>>> *kfd_create_topology_device(
>>>>>>                  sysfs_show_gen_prop(buffer, "%s %llu\n", name,
>>>>>> value)
>>>>>>   #define sysfs_show_32bit_val(buffer, value) \
>>>>>>                  sysfs_show_gen_prop(buffer, "%u\n", value)
>>>>>> +#define sysfs_show_64bit_val(buffer, value) \
>>>>>> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>>>>>>   #define sysfs_show_str_val(buffer, value) \
>>>>>>                  sysfs_show_gen_prop(buffer, "%s\n", value)
>>>>>>
>>>>>> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj,
>>>>>> struct attribute *attr,
>>>>>>   {
>>>>>>          ssize_t ret;
>>>>>>          struct kfd_mem_properties *mem;
>>>>>> +       uint64_t used_mem;
>>>>>>
>>>>>>          /* Making sure that the buffer is an empty string */
>>>>>>          buffer[0] = 0;
>>>>>>
>>>>>> -       mem = container_of(attr, struct kfd_mem_properties, attr);
>>>>>> +       if (strcmp(attr->name, "used_memory") == 0) {
>>>>>> +               mem = container_of(attr, struct kfd_mem_properties,
>>>>>> attr_used);
>>>>>> +               if (mem->gpu) {
>>>>>> +                       used_mem =
>>>>>> +mem->gpu->kfd2kgd->get_vram_usage(
>>>>>> +
>>>>>> mem->gpu->kgd);
>>>>>> +                       return sysfs_show_64bit_val(buffer,
>>>>>> +used_mem);
>>>>>> +               }
>>>>>> +               /* TODO: Report APU/CPU-allocated memory; For now
>>>>>> return 0 */
>>>>>> +               return 0;
>>>>>> +       }
>>>>>> +
>>>>>> +       mem = container_of(attr, struct kfd_mem_properties,
>>>>>> attr_props);
>>>>>>          sysfs_show_32bit_prop(buffer, "heap_type",
>>>>>> mem->heap_type);
>>>>>>          sysfs_show_64bit_prop(buffer, "size_in_bytes",
>>>>>> mem->size_in_bytes);
>>>>>>          sysfs_show_32bit_prop(buffer, "flags", mem->flags); @@
>>>>>> -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct
>>>>>> kfd_topology_device *dev)
>>>>>>          if (dev->kobj_mem) {
>>>>>>                  list_for_each_entry(mem, &dev->mem_props, list)
>>>>>>                          if (mem->kobj) {
>>>>>> -                               kfd_remove_sysfs_file(mem->kobj,
>>>>>> &mem->attr);
>>>>>> +                               /* TODO: Remove when CPU/APU
>>>>>> supported */
>>>>>> +                               if (dev->node_props.cpu_cores_count
>>>>>> == 0)
>>>>>> +
>>>>>> +sysfs_remove_file(mem->kobj,
>>>>>> +
>>>>>> &mem->attr_used);
>>>>>> +                               kfd_remove_sysfs_file(mem->kobj,
>>>>>> +                                               &mem->attr_props);
>>>>>>                                  mem->kobj = NULL;
>>>>>>                          }
>>>>>>                  kobject_del(dev->kobj_mem); @@ -629,12 +648,23 @@
>>>>>> static int kfd_build_sysfs_node_entry(struct kfd_topology_device
>>>>>> *dev,
>>>>>>                  if (ret < 0)
>>>>>>                          return ret;
>>>>>>
>>>>>> -               mem->attr.name = "properties";
>>>>>> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
>>>>>> -               sysfs_attr_init(&mem->attr);
>>>>>> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
>>>>>> +               mem->attr_props.name = "properties";
>>>>>> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
>>>>>> +               sysfs_attr_init(&mem->attr_props);
>>>>>> +               ret = sysfs_create_file(mem->kobj,
>>>>>> +&mem->attr_props);
>>>>>>                  if (ret < 0)
>>>>>>                          return ret;
>>>>>> +
>>>>>> +               /* TODO: Support APU/CPU memory usage */
>>>>>> +               if (dev->node_props.cpu_cores_count == 0) {
>>>>>> +                       mem->attr_used.name = "used_memory";
>>>>>> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
>>>>>> +                       sysfs_attr_init(&mem->attr_used);
>>>>>> +                       ret = sysfs_create_file(mem->kobj,
>>>>>> &mem->attr_used);
>>>>>> +                       if (ret < 0)
>>>>>> +                               return ret;
>>>>>> +               }
>>>>>> +
>>>>>>                  i++;
>>>>>>          }
>>>>>>
>>>>>> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device
>>>>>> *kfd_assign_gpu(struct kfd_dev *gpu)
>>>>>>   {
>>>>>>          struct kfd_topology_device *dev;
>>>>>>          struct kfd_topology_device *out_dev = NULL;
>>>>>> +       struct kfd_mem_properties *mem;
>>>>>>
>>>>>>          down_write(&topology_lock);
>>>>>>          list_for_each_entry(dev, &topology_device_list, list)
>>>>>>                  if (!dev->gpu && (dev->node_props.simd_count > 0))
>>>>>> {
>>>>>>                          dev->gpu = gpu;
>>>>>>                          out_dev = dev;
>>>>>> +
>>>>>> +                       /* Assign mem->gpu */
>>>>>> +                       list_for_each_entry(mem, &dev->mem_props,
>>>>>> list)
>>>>>> +                               mem->gpu = dev->gpu;
>>>>>> +
>>>>>>                          break;
>>>>>>                  }
>>>>>>          up_write(&topology_lock);
>>>>>> +
>>>>>>          return out_dev;
>>>>>>   }
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>> index 53fca1f..0f698d8 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>>>>>>          uint32_t                width;
>>>>>>          uint32_t                mem_clk_max;
>>>>>>          struct kobject          *kobj;
>>>>>> -       struct attribute        attr;
>>>>>> +       struct kfd_dev          *gpu;
>>>>>> +       struct attribute        attr_props;
>>>>>> +       struct attribute        attr_used;
>>>>>>   };
>>>>>>
>>>>>>   #define HSA_CACHE_TYPE_DATA            0x00000001
>>>>>> --
>>>>>> 2.7.4
>>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]                 ` <CAFCwf13RLA1pnL+Srh2WVz-YuNgS0yE__F3-+LUHY=9hsEH7MQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-12-16 20:48                   ` Felix Kühling
       [not found]                     ` <fec4e14f-5ccf-1266-e0c4-47089b6c4d0c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Felix Kühling @ 2017-12-16 20:48 UTC (permalink / raw)
  To: Oded Gabbay, Felix Kuehling
  Cc: Alex Deucher, Amber Lin, Christian König, amd-gfx list,
	Kent Russell


On 12.12.2017 at 09:15, Oded Gabbay wrote:
> On Mon, Dec 11, 2017 at 9:54 PM, Felix Kuehling <felix.kuehling-5C7GfCeVMHo@public.gmane.org> wrote:
>> On 2017-12-11 10:23 AM, Oded Gabbay wrote:
>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling-urvtwAKJhsc@public.gmane.org> wrote:
>>>> From: Amber Lin <Amber.Lin-5C7GfCeVMHo@public.gmane.org>
>>>>
>>>> For hardware blocks whose performance counters are accessed via MMIO
>>>> registers, KFD provides the support for those privileged blocks. IOMMU is
>>>> one of those privileged blocks. Most performance counter properties
>>>> required by Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>>>>  This patch adds properties to topology in KFD sysfs for information not
>>>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/ formatted as
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, i.e.
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>>>> For dGPUs, who don't have IOMMU, nothing appears under
>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
>>> I don't feel comfortable with this patch. It seems to me you didn't
>>> have anywhere to put these counters so you just stuck them in a place
>>> the thunk already reads because it was "convenient" for you to do it.
>>> But, as you point out in a comment later, these counters have nothing
>>> to do with topology.
>>> So this just feels wrong and I would like to:
>>>
>>> a. get additional opinions on it. Christian? Alex? What do you think?
>>> How are the GPU's GFX counters exposed?
>>> b. ask why not use an IOCTL to get the counters?
>> I see the performance counter information similar to other information
>> provided in the topology, such as memory, caches, CUs, etc. That's why
>> it makes sense for me to report it in the topology.
>>
>> If this is controversial, I can drop the patch for now. It's not
>> critically needed for enabling dGPU support.
>>
>> Regards,
>>   Felix
> Felix,
> Is the perf counter information part of the snapshot that the thunk
> takes before opening the device, or is it constantly being sampled?
> If it's a one-shot thing, then I think that's somewhat acceptable.

It's currently read in hsaKmtOpen. But I think that could be changed to
be done as part of the snapshot. Either way, it's a one-shot thing.
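
For illustration, the one-shot read from user space amounts to reading
the sysfs file once during initialization (a sketch, not the actual
Thunk code; the path is the one from the commit message):

  #include <stdio.h>
  #include <stdint.h>

  static uint32_t kfd_read_max_concurrent(void)
  {
          uint32_t val = 0;
          FILE *f = fopen("/sys/devices/virtual/kfd/kfd/topology"
                          "/nodes/0/perf/iommu/max_concurrent", "r");

          if (f) {
                  if (fscanf(f, "%u", &val) != 1)
                          val = 0;  /* treat unreadable as absent */
                  fclose(f);
          }
          return val;  /* 0 on dGPUs, where no perf block exists */
  }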

Regards,
  Felix

>
> Oded
>
>>> btw, I tried to search for other drivers that do this (expose perf
>>> counters in sysfs) and didn't find any (it wasn't an exhaustive search,
>>> so I may have missed some).
>>>
>>> Thanks,
>>> Oded
>>>
>>>
>>>
>>>
>>>> Signed-off-by: Amber Lin <Amber.Lin-5C7GfCeVMHo@public.gmane.org>
>>>> Signed-off-by: Kent Russell <kent.russell-5C7GfCeVMHo@public.gmane.org>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>>>  2 files changed, 127 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> index 7fe7ee0..52d20f5 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>>         struct kfd_mem_properties *mem;
>>>>         struct kfd_cache_properties *cache;
>>>>         struct kfd_iolink_properties *iolink;
>>>> +       struct kfd_perf_properties *perf;
>>>>
>>>>         list_del(&dev->list);
>>>>
>>>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>>                 kfree(iolink);
>>>>         }
>>>>
>>>> +       while (dev->perf_props.next != &dev->perf_props) {
>>>> +               perf = container_of(dev->perf_props.next,
>>>> +                               struct kfd_perf_properties, list);
>>>> +               list_del(&perf->list);
>>>> +               kfree(perf);
>>>> +       }
>>>> +
>>>>         kfree(dev);
>>>>  }
>>>>
>>>> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>>         INIT_LIST_HEAD(&dev->mem_props);
>>>>         INIT_LIST_HEAD(&dev->cache_props);
>>>>         INIT_LIST_HEAD(&dev->io_link_props);
>>>> +       INIT_LIST_HEAD(&dev->perf_props);
>>>>
>>>>         list_add_tail(&dev->list, device_list);
>>>>
>>>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>>>         .sysfs_ops = &cache_ops,
>>>>  };
>>>>
>>>> +/****** Sysfs of Performance Counters ******/
>>>> +
>>>> +struct kfd_perf_attr {
>>>> +       struct kobj_attribute attr;
>>>> +       uint32_t data;
>>>> +};
>>>> +
>>>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
>>>> +                       char *buf)
>>>> +{
>>>> +       struct kfd_perf_attr *attr;
>>>> +
>>>> +       buf[0] = 0;
>>>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>>>> +       if (!attr->data) /* invalid data for PMC */
>>>> +               return 0;
>>>> +       else
>>>> +               return sysfs_show_32bit_val(buf, attr->data);
>>>> +}
>>>> +
>>>> +#define KFD_PERF_DESC(_name, _data)                    \
>>>> +{                                                      \
>>>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>>>> +       .data = _data,                                  \
>>>> +}
>>>> +
>>>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>>>> +       KFD_PERF_DESC(max_concurrent, 0),
>>>> +       KFD_PERF_DESC(num_counters, 0),
>>>> +       KFD_PERF_DESC(counter_ids, 0),
>>>> +};
>>>> +/****************************************/
>>>> +
>>>>  static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>>>                 char *buffer)
>>>>  {
>>>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>>         struct kfd_iolink_properties *iolink;
>>>>         struct kfd_cache_properties *cache;
>>>>         struct kfd_mem_properties *mem;
>>>> +       struct kfd_perf_properties *perf;
>>>>
>>>>         if (dev->kobj_iolink) {
>>>>                 list_for_each_entry(iolink, &dev->io_link_props, list)
>>>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>>                 dev->kobj_mem = NULL;
>>>>         }
>>>>
>>>> +       if (dev->kobj_perf) {
>>>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>>>> +                       kfree(perf->attr_group);
>>>> +                       perf->attr_group = NULL;
>>>> +               }
>>>> +               kobject_del(dev->kobj_perf);
>>>> +               kobject_put(dev->kobj_perf);
>>>> +               dev->kobj_perf = NULL;
>>>> +       }
>>>> +
>>>>         if (dev->kobj_node) {
>>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>>>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>>         struct kfd_iolink_properties *iolink;
>>>>         struct kfd_cache_properties *cache;
>>>>         struct kfd_mem_properties *mem;
>>>> +       struct kfd_perf_properties *perf;
>>>>         int ret;
>>>> -       uint32_t i;
>>>> +       uint32_t i, num_attrs;
>>>> +       struct attribute **attrs;
>>>>
>>>>         if (WARN_ON(dev->kobj_node))
>>>>                 return -EEXIST;
>>>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>>         if (!dev->kobj_iolink)
>>>>                 return -ENOMEM;
>>>>
>>>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>>>> +       if (!dev->kobj_perf)
>>>> +               return -ENOMEM;
>>>> +
>>>>         /*
>>>>          * Creating sysfs files for node properties
>>>>          */
>>>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>>                 if (ret < 0)
>>>>                         return ret;
>>>>                 i++;
>>>> -}
>>>> +       }
>>>> +
>>>> +       /* All hardware blocks have the same number of attributes. */
>>>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>>>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>>>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>>>> +                       * num_attrs + sizeof(struct attribute_group),
>>>> +                       GFP_KERNEL);
>>>> +               if (!perf->attr_group)
>>>> +                       return -ENOMEM;
>>>> +
>>>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>>>> +               if (!strcmp(perf->block_name, "iommu")) {
>>>> +               /* Information of IOMMU's num_counters and counter_ids is shown
>>>> +                * under /sys/bus/event_source/devices/amd_iommu. We don't
>>>> +                * duplicate here.
>>>> +                */
>>>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>>>> +                       for (i = 0; i < num_attrs; i++)
>>>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>>>> +               }
>>>> +               perf->attr_group->name = perf->block_name;
>>>> +               perf->attr_group->attrs = attrs;
>>>> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
>>>> +               if (ret < 0)
>>>> +                       return ret;
>>>> +       }
>>>>
>>>>         return 0;
>>>>  }
>>>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>>>>                 }
>>>>         }
>>>>  }
>>>> +
>>>> +/*
>>>> + * Performance counters information is not part of CRAT but we would like to
>>>> + * put them in the sysfs under topology directory for Thunk to get the data.
>>>> + * This function is called before updating the sysfs.
>>>> + */
>>>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>>> +{
>>>> +       struct kfd_perf_properties *props;
>>>> +
>>>> +       if (amd_iommu_pc_supported()) {
>>>> +               props = kfd_alloc_struct(props);
>>>> +               if (!props)
>>>> +                       return -ENOMEM;
>>>> +               strcpy(props->block_name, "iommu");
>>>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>>> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>>>> +               list_add_tail(&props->list, &kdev->perf_props);
>>>> +       }
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>>  /* kfd_add_non_crat_information - Add information that is not currently
>>>>   *     defined in CRAT but is necessary for KFD topology
>>>>   * @dev - topology device to which addition info is added
>>>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>>>                 }
>>>>         }
>>>>
>>>> +       kdev = list_first_entry(&temp_topology_device_list,
>>>> +                               struct kfd_topology_device, list);
>>>> +       kfd_add_perf_to_topology(kdev);
>>>> +
>>>>         down_write(&topology_lock);
>>>>         kfd_topology_update_device_list(&temp_topology_device_list,
>>>>                                         &topology_device_list);
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> index 55de56f..b9f3142 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>>>         struct attribute        attr;
>>>>  };
>>>>
>>>> +struct kfd_perf_properties {
>>>> +       struct list_head        list;
>>>> +       char                    block_name[16];
>>>> +       uint32_t                max_concurrent;
>>>> +       struct attribute_group  *attr_group;
>>>> +};
>>>> +
>>>>  struct kfd_topology_device {
>>>>         struct list_head                list;
>>>>         uint32_t                        gpu_id;
>>>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>>>         struct list_head                cache_props;
>>>>         uint32_t                        io_link_count;
>>>>         struct list_head                io_link_props;
>>>> +       struct list_head                perf_props;
>>>>         struct kfd_dev                  *gpu;
>>>>         struct kobject                  *kobj_node;
>>>>         struct kobject                  *kobj_mem;
>>>>         struct kobject                  *kobj_cache;
>>>>         struct kobject                  *kobj_iolink;
>>>> +       struct kobject                  *kobj_perf;
>>>>         struct attribute                attr_gpuid;
>>>>         struct attribute                attr_name;
>>>>         struct attribute                attr_props;
>>>> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>>                 struct list_head *device_list);
>>>>  void kfd_release_topology_device_list(struct list_head *device_list);
>>>>
>>>> +extern bool amd_iommu_pc_supported(void);
>>>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>>> +
>>>>  #endif /* __KFD_TOPOLOGY_H__ */
>>>> --
>>>> 2.7.4
>>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread
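
Aside: the sysfs group setup in this patch packs a struct attribute_group
and its NULL-terminated attribute pointer array into a single kzalloc, with
attrs pointing just past the group header. A minimal sketch of the pattern
follows (illustrative only; the function name and the exact allocation size
are assumptions, not the patch's code):

	static int foo_create_group(struct kobject *kobj, unsigned int num_attrs)
	{
		struct attribute_group *grp;
		struct attribute **attrs;

		/* One allocation: the group header followed by num_attrs + 1
		 * pointers.  kzalloc() zeroes the memory, so the final
		 * pointer is already the NULL terminator sysfs expects.
		 */
		grp = kzalloc(sizeof(*grp) + (num_attrs + 1) * sizeof(*attrs),
			      GFP_KERNEL);
		if (!grp)
			return -ENOMEM;

		attrs = (struct attribute **)(grp + 1);
		grp->name = "iommu";	/* block name, as in the patch */
		grp->attrs = attrs;	/* caller fills attrs[0..num_attrs-1] */

		return sysfs_create_group(kobj, grp);
	}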

* Re: [PATCH 22/37] drm/amdkfd: Add perf counters to topology
       [not found]                     ` <fec4e14f-5ccf-1266-e0c4-47089b6c4d0c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-19  9:40                       ` Oded Gabbay
  0 siblings, 0 replies; 62+ messages in thread
From: Oded Gabbay @ 2017-12-19  9:40 UTC (permalink / raw)
  To: Felix Kühling
  Cc: Amber Lin, Felix Kuehling, amd-gfx list, Kent Russell,
	Alex Deucher, Christian König

On Sat, Dec 16, 2017 at 10:48 PM, Felix Kühling
<felix.kuehling@gmail.com> wrote:
> Am 12.12.2017 um 09:15 schrieb Oded Gabbay:
>> On Mon, Dec 11, 2017 at 9:54 PM, Felix Kuehling <felix.kuehling@amd.com> wrote:
>>> On 2017-12-11 10:23 AM, Oded Gabbay wrote:
>>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>>>>> From: Amber Lin <Amber.Lin@amd.com>
>>>>>
>>>>> For hardware blocks whose performance counters are accessed via MMIO
>>>>> registers, KFD provides the support for those privileged blocks. IOMMU is
>>>>> one of those privileged blocks. Most performance counter properties
>>>>> required by Thunk are available at /sys/bus/event_source/devices/amd_iommu.
>>>>>  This patch adds properties to topology in KFD sysfs for information not
>>>>> available in /sys/bus/event_source/devices/amd_iommu. They are shown at
>>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/ formatted as
>>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/<block>/<property>, i.e.
>>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf/iommu/max_concurrent.
>>>>> For dGPUs, who don't have IOMMU, nothing appears under
>>>>> /sys/devices/virtual/kfd/kfd/topology/nodes/0/perf.
>>>> I don't feel comfortable with this patch. It seems to me you didn't
>>>> have anywhere to put these counters so you just stuck them in a place
>>>> the thunk already reads because it was "convenient" for you to do it.
>>>> But, as you point out in a comment later, these counters have nothing
>>>> to do with topology.
>>>> So this just feels wrong and I would like to:
>>>>
>>>> a. Get additional opinions on it. Christian? Alex? What do you think?
>>>> How are the GPU's GFX counters exposed?
>>>> b. Ask why not use an IOCTL to get the counters?
>>> I see the performance counter information similar to other information
>>> provided in the topology, such as memory, caches, CUs, etc. That's why
>>> it makes sense for me to report it in the topology.
>>>
>>> If this is controversial, I can drop the patch for now. It's not
>>> critically needed for enabling dGPU support.
>>>
>>> Regards,
>>>   Felix
>> Felix,
>> Is the perf counter information part of the snapshot that the thunk
>> takes before opening the device, or is it constantly being sampled?
>> If it's a one-shot thing, then I think that's somewhat acceptable.
>
> It's currently read in hsaKmtOpen. But I think that could be changed to
> be done as part of the snapshot. Either way, it's a one-shot thing.
>
> Regards,
>   Felix
>
ok, so I think we can accept this as it is.

Oded

>>
>> Oded
>>
>>>> BTW, I tried to search for other drivers that do this (expose perf
>>>> counters in sysfs) and didn't find any (it wasn't an exhaustive search,
>>>> so I may have missed some).
>>>>
>>>> Thanks,
>>>> Oded
>>>>
>>>>
>>>>
>>>>
>>>>> Signed-off-by: Amber Lin <Amber.Lin@amd.com>
>>>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 116 +++++++++++++++++++++++++++++-
>>>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  13 ++++
>>>>>  2 files changed, 127 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> index 7fe7ee0..52d20f5 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> @@ -104,6 +104,7 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>>>         struct kfd_mem_properties *mem;
>>>>>         struct kfd_cache_properties *cache;
>>>>>         struct kfd_iolink_properties *iolink;
>>>>> +       struct kfd_perf_properties *perf;
>>>>>
>>>>>         list_del(&dev->list);
>>>>>
>>>>> @@ -128,6 +129,13 @@ static void kfd_release_topology_device(struct kfd_topology_device *dev)
>>>>>                 kfree(iolink);
>>>>>         }
>>>>>
>>>>> +       while (dev->perf_props.next != &dev->perf_props) {
>>>>> +               perf = container_of(dev->perf_props.next,
>>>>> +                               struct kfd_perf_properties, list);
>>>>> +               list_del(&perf->list);
>>>>> +               kfree(perf);
>>>>> +       }
>>>>> +
>>>>>         kfree(dev);
>>>>>  }
>>>>>
>>>>> @@ -162,6 +170,7 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>>>         INIT_LIST_HEAD(&dev->mem_props);
>>>>>         INIT_LIST_HEAD(&dev->cache_props);
>>>>>         INIT_LIST_HEAD(&dev->io_link_props);
>>>>> +       INIT_LIST_HEAD(&dev->perf_props);
>>>>>
>>>>>         list_add_tail(&dev->list, device_list);
>>>>>
>>>>> @@ -328,6 +337,39 @@ static struct kobj_type cache_type = {
>>>>>         .sysfs_ops = &cache_ops,
>>>>>  };
>>>>>
>>>>> +/****** Sysfs of Performance Counters ******/
>>>>> +
>>>>> +struct kfd_perf_attr {
>>>>> +       struct kobj_attribute attr;
>>>>> +       uint32_t data;
>>>>> +};
>>>>> +
>>>>> +static ssize_t perf_show(struct kobject *kobj, struct kobj_attribute *attrs,
>>>>> +                       char *buf)
>>>>> +{
>>>>> +       struct kfd_perf_attr *attr;
>>>>> +
>>>>> +       buf[0] = 0;
>>>>> +       attr = container_of(attrs, struct kfd_perf_attr, attr);
>>>>> +       if (!attr->data) /* invalid data for PMC */
>>>>> +               return 0;
>>>>> +       else
>>>>> +               return sysfs_show_32bit_val(buf, attr->data);
>>>>> +}
>>>>> +
>>>>> +#define KFD_PERF_DESC(_name, _data)                    \
>>>>> +{                                                      \
>>>>> +       .attr  = __ATTR(_name, 0444, perf_show, NULL),  \
>>>>> +       .data = _data,                                  \
>>>>> +}
>>>>> +
>>>>> +static struct kfd_perf_attr perf_attr_iommu[] = {
>>>>> +       KFD_PERF_DESC(max_concurrent, 0),
>>>>> +       KFD_PERF_DESC(num_counters, 0),
>>>>> +       KFD_PERF_DESC(counter_ids, 0),
>>>>> +};
>>>>> +/****************************************/
>>>>> +
>>>>>  static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>>>>>                 char *buffer)
>>>>>  {
>>>>> @@ -452,6 +494,7 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>>>         struct kfd_iolink_properties *iolink;
>>>>>         struct kfd_cache_properties *cache;
>>>>>         struct kfd_mem_properties *mem;
>>>>> +       struct kfd_perf_properties *perf;
>>>>>
>>>>>         if (dev->kobj_iolink) {
>>>>>                 list_for_each_entry(iolink, &dev->io_link_props, list)
>>>>> @@ -488,6 +531,16 @@ static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
>>>>>                 dev->kobj_mem = NULL;
>>>>>         }
>>>>>
>>>>> +       if (dev->kobj_perf) {
>>>>> +               list_for_each_entry(perf, &dev->perf_props, list) {
>>>>> +                       kfree(perf->attr_group);
>>>>> +                       perf->attr_group = NULL;
>>>>> +               }
>>>>> +               kobject_del(dev->kobj_perf);
>>>>> +               kobject_put(dev->kobj_perf);
>>>>> +               dev->kobj_perf = NULL;
>>>>> +       }
>>>>> +
>>>>>         if (dev->kobj_node) {
>>>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
>>>>>                 sysfs_remove_file(dev->kobj_node, &dev->attr_name);
>>>>> @@ -504,8 +557,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>>>         struct kfd_iolink_properties *iolink;
>>>>>         struct kfd_cache_properties *cache;
>>>>>         struct kfd_mem_properties *mem;
>>>>> +       struct kfd_perf_properties *perf;
>>>>>         int ret;
>>>>> -       uint32_t i;
>>>>> +       uint32_t i, num_attrs;
>>>>> +       struct attribute **attrs;
>>>>>
>>>>>         if (WARN_ON(dev->kobj_node))
>>>>>                 return -EEXIST;
>>>>> @@ -534,6 +589,10 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>>>         if (!dev->kobj_iolink)
>>>>>                 return -ENOMEM;
>>>>>
>>>>> +       dev->kobj_perf = kobject_create_and_add("perf", dev->kobj_node);
>>>>> +       if (!dev->kobj_perf)
>>>>> +               return -ENOMEM;
>>>>> +
>>>>>         /*
>>>>>          * Creating sysfs files for node properties
>>>>>          */
>>>>> @@ -611,7 +670,33 @@ static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
>>>>>                 if (ret < 0)
>>>>>                         return ret;
>>>>>                 i++;
>>>>> -}
>>>>> +       }
>>>>> +
>>>>> +       /* All hardware blocks have the same number of attributes. */
>>>>> +       num_attrs = sizeof(perf_attr_iommu)/sizeof(struct kfd_perf_attr);
>>>>> +       list_for_each_entry(perf, &dev->perf_props, list) {
>>>>> +               perf->attr_group = kzalloc(sizeof(struct kfd_perf_attr)
>>>>> +                       * num_attrs + sizeof(struct attribute_group),
>>>>> +                       GFP_KERNEL);
>>>>> +               if (!perf->attr_group)
>>>>> +                       return -ENOMEM;
>>>>> +
>>>>> +               attrs = (struct attribute **)(perf->attr_group + 1);
>>>>> +               if (!strcmp(perf->block_name, "iommu")) {
>>>>> +               /* Information of IOMMU's num_counters and counter_ids is shown
>>>>> +                * under /sys/bus/event_source/devices/amd_iommu. We don't
>>>>> +                * duplicate here.
>>>>> +                */
>>>>> +                       perf_attr_iommu[0].data = perf->max_concurrent;
>>>>> +                       for (i = 0; i < num_attrs; i++)
>>>>> +                               attrs[i] = &perf_attr_iommu[i].attr.attr;
>>>>> +               }
>>>>> +               perf->attr_group->name = perf->block_name;
>>>>> +               perf->attr_group->attrs = attrs;
>>>>> +               ret = sysfs_create_group(dev->kobj_perf, perf->attr_group);
>>>>> +               if (ret < 0)
>>>>> +                       return ret;
>>>>> +       }
>>>>>
>>>>>         return 0;
>>>>>  }
>>>>> @@ -778,6 +863,29 @@ static void find_system_memory(const struct dmi_header *dm,
>>>>>                 }
>>>>>         }
>>>>>  }
>>>>> +
>>>>> +/*
>>>>> + * Performance counters information is not part of CRAT but we would like to
>>>>> + * put them in the sysfs under topology directory for Thunk to get the data.
>>>>> + * This function is called before updating the sysfs.
>>>>> + */
>>>>> +static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>>>> +{
>>>>> +       struct kfd_perf_properties *props;
>>>>> +
>>>>> +       if (amd_iommu_pc_supported()) {
>>>>> +               props = kfd_alloc_struct(props);
>>>>> +               if (!props)
>>>>> +                       return -ENOMEM;
>>>>> +               strcpy(props->block_name, "iommu");
>>>>> +               props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>>>> +                       amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>>>>> +               list_add_tail(&props->list, &kdev->perf_props);
>>>>> +       }
>>>>> +
>>>>> +       return 0;
>>>>> +}
>>>>> +
>>>>>  /* kfd_add_non_crat_information - Add information that is not currently
>>>>>   *     defined in CRAT but is necessary for KFD topology
>>>>>   * @dev - topology device to which addition info is added
>>>>> @@ -860,6 +968,10 @@ int kfd_topology_init(void)
>>>>>                 }
>>>>>         }
>>>>>
>>>>> +       kdev = list_first_entry(&temp_topology_device_list,
>>>>> +                               struct kfd_topology_device, list);
>>>>> +       kfd_add_perf_to_topology(kdev);
>>>>> +
>>>>>         down_write(&topology_lock);
>>>>>         kfd_topology_update_device_list(&temp_topology_device_list,
>>>>>                                         &topology_device_list);
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>> index 55de56f..b9f3142 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>> @@ -134,6 +134,13 @@ struct kfd_iolink_properties {
>>>>>         struct attribute        attr;
>>>>>  };
>>>>>
>>>>> +struct kfd_perf_properties {
>>>>> +       struct list_head        list;
>>>>> +       char                    block_name[16];
>>>>> +       uint32_t                max_concurrent;
>>>>> +       struct attribute_group  *attr_group;
>>>>> +};
>>>>> +
>>>>>  struct kfd_topology_device {
>>>>>         struct list_head                list;
>>>>>         uint32_t                        gpu_id;
>>>>> @@ -144,11 +151,13 @@ struct kfd_topology_device {
>>>>>         struct list_head                cache_props;
>>>>>         uint32_t                        io_link_count;
>>>>>         struct list_head                io_link_props;
>>>>> +       struct list_head                perf_props;
>>>>>         struct kfd_dev                  *gpu;
>>>>>         struct kobject                  *kobj_node;
>>>>>         struct kobject                  *kobj_mem;
>>>>>         struct kobject                  *kobj_cache;
>>>>>         struct kobject                  *kobj_iolink;
>>>>> +       struct kobject                  *kobj_perf;
>>>>>         struct attribute                attr_gpuid;
>>>>>         struct attribute                attr_name;
>>>>>         struct attribute                attr_props;
>>>>> @@ -173,4 +182,8 @@ struct kfd_topology_device *kfd_create_topology_device(
>>>>>                 struct list_head *device_list);
>>>>>  void kfd_release_topology_device_list(struct list_head *device_list);
>>>>>
>>>>> +extern bool amd_iommu_pc_supported(void);
>>>>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>>>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>>>> +
>>>>>  #endif /* __KFD_TOPOLOGY_H__ */
>>>>> --
>>>>> 2.7.4
>>>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread
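
Aside: on the consumer side, the one-shot read Oded and Felix discuss would
look roughly like this from user space (a sketch; the node index 0 and the
exact path are assumptions based on the commit message, not Thunk code):

	#include <stdio.h>
	#include <stdlib.h>

	static const char path[] = "/sys/devices/virtual/kfd/kfd/topology"
				   "/nodes/0/perf/iommu/max_concurrent";

	int main(void)
	{
		unsigned int max_concurrent;
		FILE *f = fopen(path, "r");

		if (!f) {
			perror("fopen");
			return EXIT_FAILURE;
		}
		if (fscanf(f, "%u", &max_concurrent) != 1) {
			fclose(f);
			fprintf(stderr, "unexpected file contents\n");
			return EXIT_FAILURE;
		}
		fclose(f);
		printf("IOMMU max_concurrent: %u\n", max_concurrent);
		return EXIT_SUCCESS;
	}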

* Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
       [not found]                                 ` <f64e2633-9424-26d4-0a35-166a2b8a62c8-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-19  9:42                                   ` Oded Gabbay
  0 siblings, 0 replies; 62+ messages in thread
From: Oded Gabbay @ 2017-12-19  9:42 UTC (permalink / raw)
  To: Felix Kühling
  Cc: Kuehling, Felix, Russell, Kent, amd-gfx list, Koenig, Christian

On Sat, Dec 16, 2017 at 10:36 PM, Felix Kühling
<felix.kuehling@gmail.com> wrote:
> Am 13.12.2017 um 08:23 schrieb Oded Gabbay:
>> On Tue, Dec 12, 2017 at 1:11 PM, Russell, Kent <Kent.Russell@amd.com> wrote:
>>> That's alright. I admit it was a bit self-serving: I was asked to find somewhere to expose this information, and this was the simplest solution. I can see if I can come up with a more acceptable option in a future patch set, but for now I think Felix is right that we can just drop this one; it's definitely not worth holding up the rest of the patches over.
>>>
>>>  Kent
>> Sure, np. I'll drop this for now.
>
> FWIW, there is precedent for this type of information in sysfs. See
> /sys/devices/virtual/drm/ttm/memory_accounting/kernel/used_memory.
>
> Regards,
>   Felix

I understand, but I don't think that's applicable in our case. Who
works directly against TTM? I'm not familiar enough with it, but it
seems more like a debug feature.
I think it is better to follow the amdgpu programming model.

Oded

>
>>
>> Oded
>>
>>> -----Original Message-----
>>> From: Kuehling, Felix
>>> Sent: Monday, December 11, 2017 2:52 PM
>>> To: Koenig, Christian; Oded Gabbay
>>> Cc: Russell, Kent; amd-gfx list
>>> Subject: Re: [PATCH 28/37] drm/amdkfd: Add support for displaying VRAM usage
>>>
>>> On 2017-12-11 12:28 PM, Christian König wrote:
>>>> Am 11.12.2017 um 17:40 schrieb Oded Gabbay:
>>>>> On Mon, Dec 11, 2017 at 5:32 PM, Oded Gabbay <oded.gabbay@gmail.com>
>>>>> wrote:
>>>>>> On Sat, Dec 9, 2017 at 6:09 AM, Felix Kuehling
>>>>>> <Felix.Kuehling@amd.com> wrote:
>>>>>>> From: Kent Russell <kent.russell@amd.com>
>>>>>>>
>>>>>>> Add a sysfs file in topology (node/x/memory_banks/X/used_memory)
>>>>>>> that reports the current VRAM usage for that node. Only works for
>>>>>>> GPU nodes at this time.
>>>>>>>
>>>>>> As with patch 22 (perf counters), I would not expect this
>>>>>> information to be included in the topology. It doesn't describe the
>>>>>> properties of the device, but rather its current state.
>>>>>> Oded
>>>>> For example, in amdgpu, the VRAM usage is reported in the INFO IOCTL
>>>>> (AMDGPU_INFO_VRAM_USAGE). See function  amdgpu_info_ioctl()
>>>> Yep, completely agree.
>>>>
>>>> That stuff is runtime properties and not static attribute nor
>>>> configuration or setup.
>>>>
>>>> So either debugfs or IOCTL are the two best options as far as I can see.
>>> Right. I admit, this feature was a bit of a hack to quickly enable the HIP team without having to change a bunch of interfaces (ioctls, Thunk, and Runtime).
>>>
>>> This patch isn't critical for enabling dGPU support. I'll drop it for now and we can reimplement it properly later.
>>>
>>> Regards,
>>>   Felix
>>>
>>>> Christian.
>>>>
>>>>> Thanks,
>>>>>
>>>>> Oded
>>>>>
>>>>>
>>>>>>> Signed-off-by: Kent Russell <kent.russell@amd.com>
>>>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 49
>>>>>>> +++++++++++++++++++++++++++----
>>>>>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  4 ++-
>>>>>>>   2 files changed, 46 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>>> index 7f0d41e..7f04038 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>>>> @@ -186,6 +186,8 @@ struct kfd_topology_device
>>>>>>> *kfd_create_topology_device(
>>>>>>>                  sysfs_show_gen_prop(buffer, "%s %llu\n", name,
>>>>>>> value)
>>>>>>>   #define sysfs_show_32bit_val(buffer, value) \
>>>>>>>                  sysfs_show_gen_prop(buffer, "%u\n", value)
>>>>>>> +#define sysfs_show_64bit_val(buffer, value) \
>>>>>>> +               sysfs_show_gen_prop(buffer, "%llu\n", value)
>>>>>>>   #define sysfs_show_str_val(buffer, value) \
>>>>>>>                  sysfs_show_gen_prop(buffer, "%s\n", value)
>>>>>>>
>>>>>>> @@ -268,11 +270,23 @@ static ssize_t mem_show(struct kobject *kobj,
>>>>>>> struct attribute *attr,
>>>>>>>   {
>>>>>>>          ssize_t ret;
>>>>>>>          struct kfd_mem_properties *mem;
>>>>>>> +       uint64_t used_mem;
>>>>>>>
>>>>>>>          /* Making sure that the buffer is an empty string */
>>>>>>>          buffer[0] = 0;
>>>>>>>
>>>>>>> -       mem = container_of(attr, struct kfd_mem_properties, attr);
>>>>>>> +       if (strcmp(attr->name, "used_memory") == 0) {
>>>>>>> +               mem = container_of(attr, struct kfd_mem_properties,
>>>>>>> attr_used);
>>>>>>> +               if (mem->gpu) {
>>>>>>> +                       used_mem =
>>>>>>> +mem->gpu->kfd2kgd->get_vram_usage(
>>>>>>> +
>>>>>>> mem->gpu->kgd);
>>>>>>> +                       return sysfs_show_64bit_val(buffer,
>>>>>>> +used_mem);
>>>>>>> +               }
>>>>>>> +               /* TODO: Report APU/CPU-allocated memory; For now
>>>>>>> return 0 */
>>>>>>> +               return 0;
>>>>>>> +       }
>>>>>>> +
>>>>>>> +       mem = container_of(attr, struct kfd_mem_properties,
>>>>>>> attr_props);
>>>>>>>          sysfs_show_32bit_prop(buffer, "heap_type",
>>>>>>> mem->heap_type);
>>>>>>>          sysfs_show_64bit_prop(buffer, "size_in_bytes",
>>>>>>> mem->size_in_bytes);
>>>>>>>          sysfs_show_32bit_prop(buffer, "flags", mem->flags); @@
>>>>>>> -527,7 +541,12 @@ static void kfd_remove_sysfs_node_entry(struct
>>>>>>> kfd_topology_device *dev)
>>>>>>>          if (dev->kobj_mem) {
>>>>>>>                  list_for_each_entry(mem, &dev->mem_props, list)
>>>>>>>                          if (mem->kobj) {
>>>>>>> -                               kfd_remove_sysfs_file(mem->kobj,
>>>>>>> &mem->attr);
>>>>>>> +                               /* TODO: Remove when CPU/APU
>>>>>>> supported */
>>>>>>> +                               if (dev->node_props.cpu_cores_count
>>>>>>> == 0)
>>>>>>> +
>>>>>>> +sysfs_remove_file(mem->kobj,
>>>>>>> +
>>>>>>> &mem->attr_used);
>>>>>>> +                               kfd_remove_sysfs_file(mem->kobj,
>>>>>>> +                                               &mem->attr_props);
>>>>>>>                                  mem->kobj = NULL;
>>>>>>>                          }
>>>>>>>                  kobject_del(dev->kobj_mem); @@ -629,12 +648,23 @@
>>>>>>> static int kfd_build_sysfs_node_entry(struct kfd_topology_device
>>>>>>> *dev,
>>>>>>>                  if (ret < 0)
>>>>>>>                          return ret;
>>>>>>>
>>>>>>> -               mem->attr.name = "properties";
>>>>>>> -               mem->attr.mode = KFD_SYSFS_FILE_MODE;
>>>>>>> -               sysfs_attr_init(&mem->attr);
>>>>>>> -               ret = sysfs_create_file(mem->kobj, &mem->attr);
>>>>>>> +               mem->attr_props.name = "properties";
>>>>>>> +               mem->attr_props.mode = KFD_SYSFS_FILE_MODE;
>>>>>>> +               sysfs_attr_init(&mem->attr_props);
>>>>>>> +               ret = sysfs_create_file(mem->kobj,
>>>>>>> +&mem->attr_props);
>>>>>>>                  if (ret < 0)
>>>>>>>                          return ret;
>>>>>>> +
>>>>>>> +               /* TODO: Support APU/CPU memory usage */
>>>>>>> +               if (dev->node_props.cpu_cores_count == 0) {
>>>>>>> +                       mem->attr_used.name = "used_memory";
>>>>>>> +                       mem->attr_used.mode = KFD_SYSFS_FILE_MODE;
>>>>>>> +                       sysfs_attr_init(&mem->attr_used);
>>>>>>> +                       ret = sysfs_create_file(mem->kobj,
>>>>>>> &mem->attr_used);
>>>>>>> +                       if (ret < 0)
>>>>>>> +                               return ret;
>>>>>>> +               }
>>>>>>> +
>>>>>>>                  i++;
>>>>>>>          }
>>>>>>>
>>>>>>> @@ -1075,15 +1105,22 @@ static struct kfd_topology_device
>>>>>>> *kfd_assign_gpu(struct kfd_dev *gpu)
>>>>>>>   {
>>>>>>>          struct kfd_topology_device *dev;
>>>>>>>          struct kfd_topology_device *out_dev = NULL;
>>>>>>> +       struct kfd_mem_properties *mem;
>>>>>>>
>>>>>>>          down_write(&topology_lock);
>>>>>>>          list_for_each_entry(dev, &topology_device_list, list)
>>>>>>>                  if (!dev->gpu && (dev->node_props.simd_count > 0))
>>>>>>> {
>>>>>>>                          dev->gpu = gpu;
>>>>>>>                          out_dev = dev;
>>>>>>> +
>>>>>>> +                       /* Assign mem->gpu */
>>>>>>> +                       list_for_each_entry(mem, &dev->mem_props,
>>>>>>> list)
>>>>>>> +                               mem->gpu = dev->gpu;
>>>>>>> +
>>>>>>>                          break;
>>>>>>>                  }
>>>>>>>          up_write(&topology_lock);
>>>>>>> +
>>>>>>>          return out_dev;
>>>>>>>   }
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>>> index 53fca1f..0f698d8 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>>>>> @@ -93,7 +93,9 @@ struct kfd_mem_properties {
>>>>>>>          uint32_t                width;
>>>>>>>          uint32_t                mem_clk_max;
>>>>>>>          struct kobject          *kobj;
>>>>>>> -       struct attribute        attr;
>>>>>>> +       struct kfd_dev          *gpu;
>>>>>>> +       struct attribute        attr_props;
>>>>>>> +       struct attribute        attr_used;
>>>>>>>   };
>>>>>>>
>>>>>>>   #define HSA_CACHE_TYPE_DATA            0x00000001
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread
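
Aside: the "amdgpu programming model" Oded refers to is the INFO ioctl. A
minimal user-space sketch of querying VRAM usage that way (the render node
path and the libdrm include location are assumptions):

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <drm/amdgpu_drm.h>	/* from the libdrm headers */

	int main(void)
	{
		struct drm_amdgpu_info request = {0};
		uint64_t vram_usage = 0;
		int fd = open("/dev/dri/renderD128", O_RDWR);

		if (fd < 0)
			return 1;

		/* Same query amdgpu_info_ioctl() serves in the kernel */
		request.return_pointer = (uintptr_t)&vram_usage;
		request.return_size = sizeof(vram_usage);
		request.query = AMDGPU_INFO_VRAM_USAGE;

		if (ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &request) == 0)
			printf("VRAM used: %llu bytes\n",
			       (unsigned long long)vram_usage);

		close(fd);
		return 0;
	}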

* Re: [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root
@ 2018-01-02 23:41       ` Felix Kuehling
  0 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2018-01-02 23:41 UTC (permalink / raw)
  To: Bjorn Helgaas, Jay Cornwall
  Cc: oded.gabbay, amd-gfx, linux-pci, Ram Amrani, Doug Ledford,
	Michal Kalderon, Ariel Elior, Jason Gunthorpe

On 2017-12-12 06:27 PM, Bjorn Helgaas wrote:
> [+cc Ram, Michal, Ariel, Doug, Jason]
>
> The [29/37] in the subject makes it look like this is part of a larger
> series, but I can't find the rest of it on linux-pci or linux-kernel.
>
> I don't want to merge a new interface unless there's an in-tree user
> of it.  I assume the rest of the series includes a user.
>
> On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
[snip]
>> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
>> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
>> + * 128-bit AtomicOp completion, or negative otherwise.
>> + */
>> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
>> +{
>> +	struct pci_bus *bus = dev->bus;
>> +
>> +	if (!pci_is_pcie(dev))
>> +		return -EINVAL;
>> +
>> +	switch (pci_pcie_type(dev)) {
>> +	/*
>> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
>> +	 * to implement AtomicOp requester capabilities.
>> +	 */
>> +	case PCI_EXP_TYPE_ENDPOINT:
>> +	case PCI_EXP_TYPE_LEG_END:
>> +	case PCI_EXP_TYPE_RC_END:
>> +		break;
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +
>> +	while (bus->parent) {
>> +		struct pci_dev *bridge = bus->self;
>> +		u32 cap;
>> +
>> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
>> +
>> +		switch (pci_pcie_type(bridge)) {
>> +		/*
>> +		 * Upstream, downstream and root ports may implement AtomicOp
>> +		 * routing capabilities. AtomicOp routing via a root port is
>> +		 * not considered.
>> +		 */
>> +		case PCI_EXP_TYPE_UPSTREAM:
>> +		case PCI_EXP_TYPE_DOWNSTREAM:
>> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
>> +				return -EINVAL;
>> +			break;
>> +
>> +		/*
>> +		 * Root ports are permitted to implement AtomicOp completion
>> +		 * capabilities.
>> +		 */
>> +		case PCI_EXP_TYPE_ROOT_PORT:
>> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
>> +				return -EINVAL;
>> +			break;
>> +		}
> IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
> initiate AtomicOps that target system memory.  This interface
> (pci_enable_atomic_ops_to_root()) doesn't specify what size operations
> the driver wants to do.  If the GPU requests a 128-bit op and the Root
> Port doesn't support it, I think we'll see an Unsupported Request
> error.
>
> Do you need to extend this interface so the driver can specify what
> sizes it wants?
>
> The existing code in qedr_pci_set_atomic() is very similar.  We should
> make this new interface work for both places, then actually use it in
> qedr_pci_set_atomic().

Hi Bjorn, Doug, Ram,

I just discussed this with Jay, and he noticed that qedr_pci_set_atomic
seems to use a different criterion for finding the completer of atomic
requests. Jay's function expects the root port to have a parent, which
was the case on the systems he tested. But Ram's function looks for a
bridge without a parent and checks completion capabilities on that. Jay
believes that to be a root complex, not a root port.

According to the spec, "Root ports are permitted to implement AtomicOp
completion capabilities." It talks about a root port, not a root complex.

Can you help us understand which interpretation is correct? And how do we
correctly identify the root port when checking completion capabilities?
Are there valid topologies where a root port does not have a parent?

Regards,
  Felix

^ permalink raw reply	[flat|nested] 62+ messages in thread
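
Aside: a sketch of how a driver would consume this interface during probe
(hypothetical driver code matching the signature in this version of the
patch; the merged interface may differ):

	static int foo_probe(struct pci_dev *pdev,
			     const struct pci_device_id *id)
	{
		/* Ask for AtomicOp routing from this endpoint up to the
		 * root port; fall back if the topology cannot support it.
		 */
		if (pci_enable_atomic_ops_to_root(pdev) < 0) {
			dev_info(&pdev->dev,
				 "PCIe atomics not supported, disabling feature\n");
			/* continue without device-initiated atomics */
		}

		return 0;
	}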


* Re: [PATCH 04/37] drm/amdgpu: Implement get_local_mem_info
       [not found]     ` <1512792555-26042-5-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-04 23:35       ` Dave Airlie
       [not found]         ` <CAPM=9typVidxNLS0+9JcUOcwohcOi469Ke56GcPDZKAh2Tg6fg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Dave Airlie @ 2018-01-04 23:35 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Oded Gabbay, Ben Goz, Harish Kasiviswanathan, amd-gfx mailing list

On 9 December 2017 at 14:08, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>
> Implement new kgd-kfd interface function get_local_mem_info.
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
> Signed-off-by: Ben Goz <ben.goz@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 30 +++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
>  4 files changed, 34 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index cfb7827..56f6c12 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -252,6 +252,36 @@ uint64_t get_vmem_size(struct kgd_dev *kgd)
>         return adev->mc.real_vram_size;
>  }
>
> +void get_local_mem_info(struct kgd_dev *kgd,
> +                       struct kfd_local_mem_info *mem_info)
> +{
> +       struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
> +       uint64_t address_mask = adev->dev->dma_mask ? ~*adev->dev->dma_mask :
> +                                            ~((1ULL << 32) - 1);
> +       resource_size_t aper_limit = adev->mc.aper_base + adev->mc.aper_size;
> +
> +       memset(mem_info, 0, sizeof(*mem_info));
> +       if (!(adev->mc.aper_base & address_mask || aper_limit & address_mask)) {
> +               mem_info->local_mem_size_public = adev->mc.visible_vram_size;
> +               mem_info->local_mem_size_private = adev->mc.real_vram_size -
> +                               adev->mc.visible_vram_size;
> +       } else {
> +               mem_info->local_mem_size_public = 0;
> +               mem_info->local_mem_size_private = adev->mc.real_vram_size;
> +       }
> +       mem_info->vram_width = adev->mc.vram_width;
> +
> +       pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
> +                       adev->mc.aper_base, aper_limit,
> +                       mem_info->local_mem_size_public,
> +                       mem_info->local_mem_size_private);

This patch introduces:

In file included from
/home/airlied/devel/kernel/drm-next/include/linux/kernel.h:14:0,
                 from
/home/airlied/devel/kernel/drm-next/include/asm-generic/bug.h:18,
                 from
/home/airlied/devel/kernel/drm-next/arch/arm/include/asm/bug.h:60,
                 from /home/airlied/devel/kernel/drm-next/include/linux/bug.h:5,
                 from
/home/airlied/devel/kernel/drm-next/include/linux/thread_info.h:12,
                 from
/home/airlied/devel/kernel/drm-next/include/asm-generic/current.h:5,
                 from ./arch/arm/include/generated/asm/current.h:1,
                 from
/home/airlied/devel/kernel/drm-next/include/linux/sched.h:12,
                 from
/home/airlied/devel/kernel/drm-next/arch/arm/include/asm/mmu_context.h:17,
                 from
/home/airlied/devel/kernel/drm-next/include/linux/mmu_context.h:5,
                 from
/home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:29,
                 from
/home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:23:
/home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:
In function ‘get_local_mem_info’:
/home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:11:
warning: format ‘%llx’ expects argument of type ‘long long unsigned
int’, but argument 3 has type ‘resource_size_t {aka unsigned int}’
[-Wformat=]
  pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
           ^
/home/airlied/devel/kernel/drm-next/include/linux/printk.h:285:21:
note: in definition of macro ‘pr_fmt’
 #define pr_fmt(fmt) fmt
                     ^~~
/home/airlied/devel/kernel/drm-next/include/linux/printk.h:333:2:
note: in expansion of macro ‘dynamic_pr_debug’
  dynamic_pr_debug(fmt, ##__VA_ARGS__)
  ^~~~~~~~~~~~~~~~
/home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:2:
note: in expansion of macro ‘pr_debug’
  pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
  ^~~~~~~~
/home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:11:
warning: format ‘%llx’ expects argument of type ‘long long unsigned
int’, but argument 4 has type ‘resource_size_t {aka unsigned int}’
[-Wformat=]
  pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
           ^
/home/airlied/devel/kernel/drm-next/include/linux/printk.h:285:21:
note: in definition of macro ‘pr_fmt’
 #define pr_fmt(fmt) fmt
                     ^~~
/home/airlied/devel/kernel/drm-next/include/linux/printk.h:333:2:
note: in expansion of macro ‘dynamic_pr_debug’
  dynamic_pr_debug(fmt, ##__VA_ARGS__)
  ^~~~~~~~~~~~~~~~
/home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:2:
note: in expansion of macro ‘pr_debug’
  pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
  ^~~~~~~~

On 32-bit arm build.

Dave.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root
  2018-01-02 23:41       ` Felix Kuehling
  (?)
@ 2018-01-04 23:40       ` Bjorn Helgaas
  2018-01-05  0:09         ` Felix Kuehling
  -1 siblings, 1 reply; 62+ messages in thread
From: Bjorn Helgaas @ 2018-01-04 23:40 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Jay Cornwall, oded.gabbay, amd-gfx, linux-pci, Ram Amrani,
	Doug Ledford, Michal Kalderon, Ariel Elior, Jason Gunthorpe

On Tue, Jan 02, 2018 at 06:41:17PM -0500, Felix Kuehling wrote:
> On 2017-12-12 06:27 PM, Bjorn Helgaas wrote:
> > [+cc Ram, Michal, Ariel, Doug, Jason]
> >
> > The [29/37] in the subject makes it look like this is part of a larger
> > series, but I can't find the rest of it on linux-pci or linux-kernel.
> >
> > I don't want to merge a new interface unless there's an in-tree user
> > of it.  I assume the rest of the series includes a user.
> >
> > On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
> [snip]
> >> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
> >> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
> >> + * 128-bit AtomicOp completion, or negative otherwise.
> >> + */
> >> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
> >> +{
> >> +	struct pci_bus *bus = dev->bus;
> >> +
> >> +	if (!pci_is_pcie(dev))
> >> +		return -EINVAL;
> >> +
> >> +	switch (pci_pcie_type(dev)) {
> >> +	/*
> >> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
> >> +	 * to implement AtomicOp requester capabilities.
> >> +	 */
> >> +	case PCI_EXP_TYPE_ENDPOINT:
> >> +	case PCI_EXP_TYPE_LEG_END:
> >> +	case PCI_EXP_TYPE_RC_END:
> >> +		break;
> >> +	default:
> >> +		return -EINVAL;
> >> +	}
> >> +
> >> +	while (bus->parent) {
> >> +		struct pci_dev *bridge = bus->self;
> >> +		u32 cap;
> >> +
> >> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
> >> +
> >> +		switch (pci_pcie_type(bridge)) {
> >> +		/*
> >> +		 * Upstream, downstream and root ports may implement AtomicOp
> >> +		 * routing capabilities. AtomicOp routing via a root port is
> >> +		 * not considered.
> >> +		 */
> >> +		case PCI_EXP_TYPE_UPSTREAM:
> >> +		case PCI_EXP_TYPE_DOWNSTREAM:
> >> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> >> +				return -EINVAL;
> >> +			break;
> >> +
> >> +		/*
> >> +		 * Root ports are permitted to implement AtomicOp completion
> >> +		 * capabilities.
> >> +		 */
> >> +		case PCI_EXP_TYPE_ROOT_PORT:
> >> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
> >> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
> >> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
> >> +				return -EINVAL;
> >> +			break;
> >> +		}
> > IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
> > initiate AtomicOps that target system memory.  This interface
> > (pci_enable_atomic_ops_to_root()) doesn't specify what size operations
> > the driver wants to do.  If the GPU requests a 128-bit op and the Root
> > Port doesn't support it, I think we'll see an Unsupported Request
> > error.
> >
> > Do you need to extend this interface so the driver can specify what
> > sizes it wants?
> >
> > The existing code in qedr_pci_set_atomic() is very similar.  We should
> > make this new interface work for both places, then actually use it in
> > qedr_pci_set_atomic().
> 
> Hi Bjorn, Doug, Ram,
> 
> I just discussed this with Jay, and he noticed that qedr_pci_set_atomic
> seems to use a different criteria to find the completer for atomic
> requests. Jay's function expects the root port to have a parent, which
> was the case on the systems he tested. But Ram's function looks for a
> bridge without a parent and checks completion capabilities on that. Jay
> believes that to be a root complex, not a root port.

By "Ram's function", I guess you mean qedr_pci_set_atomic()?

That starts with a PCIe device ("pdev"; it assumes but does not check
that this is a PCIe device), and traverses through all the bridges
leading to it.  Usually this will be:

  endpoint -> root port
  endpoint -> switch downstream port -> switch upstream port -> root port

Or there may be additional switches in the middle.  The code is
actually not quite correct because it is legal to have this:

  endpoint -> PCI-to-PCIe bridge -> conventional PCI bridge -> ...

and qedr_pci_set_atomic() will traverse up through the conventional
part of the hierarchy, where there is no PCI_EXP_DEVCAP2.

In general, a Root Port is the root of a PCIe hierarchy and there is
no parent device.  E.g., on my laptop:

  00:1c.0 Intel Root Port (bridge to [bus 02])
  00:1c.2 Intel Root Port (bridge to [bus 04])

What sort of parent do you expect?  As I mentioned, it's legal to have
a PCI/PCI-X to PCIe bridge inside a conventional PCI hierarchy, but
that's a little unusual.

> According to the spec, "Root ports are permitted to implement AtomicOp
> completion capabilities." It talks about a root port, not a root complex.
> 
> Can you help us understand, which interpretation is correct? And how to
> correctly identify the root port for checking completion capabilities?

If you start with a PCIe device and traverse upstream, you should
eventually reach a Root Port or a PCI/PCI-X to PCIe bridge.

> Are there valid topologies where a root port does not have a parent?

I don't understand this because Root Ports normally do not have
parents.

PCIe devices other than Root Ports normally have a Root Port (or
PCI/PCI-X to PCIe bridge) at the root of the PCIe hierarchy, but there
are definitely exceptions.

For example, there are some systems where the Root Port is not visible to
Linux, e.g.,
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ee8bdfb6568d
On systems like that, I don't think you can safely use AtomicOps.

Bjorn

^ permalink raw reply	[flat|nested] 62+ messages in thread
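
Aside: Bjorn's description of the traversal can be condensed into a small
helper. This is a sketch only (it uses pci_upstream_bridge() where the
quoted patch walks bus->parent, but the effect is the same):

	/* Walk upstream from a PCIe device to whatever sits at the root
	 * of its hierarchy.  Normally that is a Root Port; behind a
	 * PCI(-X)-to-PCIe bridge, or when the Root Port is hidden from
	 * Linux, it may be something else, and AtomicOp completion
	 * cannot safely be assumed.
	 */
	static struct pci_dev *foo_hierarchy_root(struct pci_dev *dev)
	{
		struct pci_dev *bridge = pci_upstream_bridge(dev);

		while (bridge && pci_upstream_bridge(bridge))
			bridge = pci_upstream_bridge(bridge);

		return bridge;	/* may be NULL */
	}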

* Re: [PATCH 04/37] drm/amdgpu: Implement get_local_mem_info
       [not found]         ` <CAPM=9typVidxNLS0+9JcUOcwohcOi469Ke56GcPDZKAh2Tg6fg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-01-04 23:54           ` Felix Kuehling
       [not found]             ` <997ef22c-f58b-ab06-9ba0-bf3ee290d8f7-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 62+ messages in thread
From: Felix Kuehling @ 2018-01-04 23:54 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Oded Gabbay, Ben Goz, Harish Kasiviswanathan, amd-gfx mailing list

I see. resource_size_t (an alias for phys_addr_t) is u64 on x86_64, but
probably only u32 on 32-bit ARM. This depends on
CONFIG_PHYS_ADDR_T_64BIT; see include/linux/types.h:

    #ifdef CONFIG_PHYS_ADDR_T_64BIT
    typedef u64 phys_addr_t;
    #else
    typedef u32 phys_addr_t;
    #endif

The easiest solution is probably to define aper_limit as uint64_t.

Regards,
  Felix


On 2018-01-04 06:35 PM, Dave Airlie wrote:
> On 9 December 2017 at 14:08, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>> From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>>
>> Implement new kgd-kfd interface function get_local_mem_info.
>>
>> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 30 +++++++++++++++++++++++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  2 ++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
>>  4 files changed, 34 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> index cfb7827..56f6c12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> @@ -252,6 +252,36 @@ uint64_t get_vmem_size(struct kgd_dev *kgd)
>>         return adev->mc.real_vram_size;
>>  }
>>
>> +void get_local_mem_info(struct kgd_dev *kgd,
>> +                       struct kfd_local_mem_info *mem_info)
>> +{
>> +       struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
>> +       uint64_t address_mask = adev->dev->dma_mask ? ~*adev->dev->dma_mask :
>> +                                            ~((1ULL << 32) - 1);
>> +       resource_size_t aper_limit = adev->mc.aper_base + adev->mc.aper_size;
>> +
>> +       memset(mem_info, 0, sizeof(*mem_info));
>> +       if (!(adev->mc.aper_base & address_mask || aper_limit & address_mask)) {
>> +               mem_info->local_mem_size_public = adev->mc.visible_vram_size;
>> +               mem_info->local_mem_size_private = adev->mc.real_vram_size -
>> +                               adev->mc.visible_vram_size;
>> +       } else {
>> +               mem_info->local_mem_size_public = 0;
>> +               mem_info->local_mem_size_private = adev->mc.real_vram_size;
>> +       }
>> +       mem_info->vram_width = adev->mc.vram_width;
>> +
>> +       pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>> +                       adev->mc.aper_base, aper_limit,
>> +                       mem_info->local_mem_size_public,
>> +                       mem_info->local_mem_size_private);
> This patch introduces:
>
> In file included from
> /home/airlied/devel/kernel/drm-next/include/linux/kernel.h:14:0,
>                  from
> /home/airlied/devel/kernel/drm-next/include/asm-generic/bug.h:18,
>                  from
> /home/airlied/devel/kernel/drm-next/arch/arm/include/asm/bug.h:60,
>                  from /home/airlied/devel/kernel/drm-next/include/linux/bug.h:5,
>                  from
> /home/airlied/devel/kernel/drm-next/include/linux/thread_info.h:12,
>                  from
> /home/airlied/devel/kernel/drm-next/include/asm-generic/current.h:5,
>                  from ./arch/arm/include/generated/asm/current.h:1,
>                  from
> /home/airlied/devel/kernel/drm-next/include/linux/sched.h:12,
>                  from
> /home/airlied/devel/kernel/drm-next/arch/arm/include/asm/mmu_context.h:17,
>                  from
> /home/airlied/devel/kernel/drm-next/include/linux/mmu_context.h:5,
>                  from
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:29,
>                  from
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:23:
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:
> In function ‘get_local_mem_info’:
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:11:
> warning: format ‘%llx’ expects argument of type ‘long long unsigned
> int’, but argument 3 has type ‘resource_size_t {aka unsigned int}’
> [-Wformat=]
>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>            ^
> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:285:21:
> note: in definition of macro ‘pr_fmt’
>  #define pr_fmt(fmt) fmt
>                      ^~~
> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:333:2:
> note: in expansion of macro ‘dynamic_pr_debug’
>   dynamic_pr_debug(fmt, ##__VA_ARGS__)
>   ^~~~~~~~~~~~~~~~
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:2:
> note: in expansion of macro ‘pr_debug’
>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>   ^~~~~~~~
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:11:
> warning: format ‘%llx’ expects argument of type ‘long long unsigned
> int’, but argument 4 has type ‘resource_size_t {aka unsigned int}’
> [-Wformat=]
>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>            ^
> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:285:21:
> note: in definition of macro ‘pr_fmt’
>  #define pr_fmt(fmt) fmt
>                      ^~~
> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:333:2:
> note: in expansion of macro ‘dynamic_pr_debug’
>   dynamic_pr_debug(fmt, ##__VA_ARGS__)
>   ^~~~~~~~~~~~~~~~
> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:2:
> note: in expansion of macro ‘pr_debug’
>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>   ^~~~~~~~
>
> On 32-bit arm build.
>
> Dave.


* Re: [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root
  2018-01-04 23:40       ` Bjorn Helgaas
@ 2018-01-05  0:09         ` Felix Kuehling
  0 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2018-01-05  0:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Jay Cornwall, oded.gabbay, amd-gfx, linux-pci, Ram Amrani,
	Doug Ledford, Michal Kalderon, Ariel Elior, Jason Gunthorpe

Hi Bjorn,

I figured it out. The only difference between the functions is whether
they use a struct pci_bus * or a struct pci_dev * as the cursor; they
are in fact equivalent. The last loop iteration in
pci_enable_atomic_ops_to_root corresponds to the code after the loop
in qedr_pci_set_atomic: both handle the root port.

I think my confusion was based on the incorrect assumption that
bridge->bus->self is the same device as bridge. In fact, bridge->bus is
the parent bus.
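
To make the equivalence concrete, here's a minimal sketch of the two
cursor styles (illustrative only, not the exact code from either
patch). Both helpers return the same device:

static struct pci_dev *root_port_by_bus(struct pci_dev *dev)
{
        struct pci_bus *bus = dev->bus;
        struct pci_dev *bridge = NULL;

        while (bus->parent) {
                bridge = bus->self;     /* bridge leading to 'bus' */
                bus = bus->parent;
        }
        return bridge;  /* the root port on the last iteration */
}

static struct pci_dev *root_port_by_dev(struct pci_dev *dev)
{
        struct pci_dev *bridge = pci_upstream_bridge(dev);

        while (bridge && pci_upstream_bridge(bridge))
                bridge = pci_upstream_bridge(bridge);
        return bridge;  /* NULL if no root port is visible */
}

That's why the last loop iteration of one matches the post-loop code
of the other.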

I sent out an updated patch that addresses your comments on the
previous version. It should be general enough to replace
qedr_pci_set_atomic.

Regards,
  Felix


On 2018-01-04 06:40 PM, Bjorn Helgaas wrote:
> On Tue, Jan 02, 2018 at 06:41:17PM -0500, Felix Kuehling wrote:
>> On 2017-12-12 06:27 PM, Bjorn Helgaas wrote:
>>> [+cc Ram, Michal, Ariel, Doug, Jason]
>>>
>>> The [29/37] in the subject makes it look like this is part of a larger
>>> series, but I can't find the rest of it on linux-pci or linux-kernel.
>>>
>>> I don't want to merge a new interface unless there's an in-tree user
>>> of it.  I assume the rest of the series includes a user.
>>>
>>> On Fri, Dec 08, 2017 at 11:09:07PM -0500, Felix Kuehling wrote:
>> [snip]
>>>> + * all upstream bridges support AtomicOp routing, egress blocking is disabled
>>>> + * on all upstream ports, and the root port supports 32-bit, 64-bit and/or
>>>> + * 128-bit AtomicOp completion, or negative otherwise.
>>>> + */
>>>> +int pci_enable_atomic_ops_to_root(struct pci_dev *dev)
>>>> +{
>>>> +	struct pci_bus *bus = dev->bus;
>>>> +
>>>> +	if (!pci_is_pcie(dev))
>>>> +		return -EINVAL;
>>>> +
>>>> +	switch (pci_pcie_type(dev)) {
>>>> +	/*
>>>> +	 * PCIe 3.0, 6.15 specifies that endpoints and root ports are permitted
>>>> +	 * to implement AtomicOp requester capabilities.
>>>> +	 */
>>>> +	case PCI_EXP_TYPE_ENDPOINT:
>>>> +	case PCI_EXP_TYPE_LEG_END:
>>>> +	case PCI_EXP_TYPE_RC_END:
>>>> +		break;
>>>> +	default:
>>>> +		return -EINVAL;
>>>> +	}
>>>> +
>>>> +	while (bus->parent) {
>>>> +		struct pci_dev *bridge = bus->self;
>>>> +		u32 cap;
>>>> +
>>>> +		pcie_capability_read_dword(bridge, PCI_EXP_DEVCAP2, &cap);
>>>> +
>>>> +		switch (pci_pcie_type(bridge)) {
>>>> +		/*
>>>> +		 * Upstream, downstream and root ports may implement AtomicOp
>>>> +		 * routing capabilities. AtomicOp routing via a root port is
>>>> +		 * not considered.
>>>> +		 */
>>>> +		case PCI_EXP_TYPE_UPSTREAM:
>>>> +		case PCI_EXP_TYPE_DOWNSTREAM:
>>>> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
>>>> +				return -EINVAL;
>>>> +			break;
>>>> +
>>>> +		/*
>>>> +		 * Root ports are permitted to implement AtomicOp completion
>>>> +		 * capabilities.
>>>> +		 */
>>>> +		case PCI_EXP_TYPE_ROOT_PORT:
>>>> +			if (!(cap & (PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
>>>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP64 |
>>>> +				     PCI_EXP_DEVCAP2_ATOMIC_COMP128)))
>>>> +				return -EINVAL;
>>>> +			break;
>>>> +		}
>>> IIUC, you want to enable an endpoint, e.g., an AMD Fiji-class GPU, to
>>> initiate AtomicOps that target system memory.  This interface
>>> (pci_enable_atomic_ops_to_root()) doesn't specify what size operations
>>> the driver wants to do.  If the GPU requests a 128-bit op and the Root
>>> Port doesn't support it, I think we'll see an Unsupported Request
>>> error.
>>>
>>> Do you need to extend this interface so the driver can specify what
>>> sizes it wants?
>>>
>>> The existing code in qedr_pci_set_atomic() is very similar.  We should
>>> make this new interface work for both places, then actually use it in
>>> qedr_pci_set_atomic().
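
One way to address the size question would be to let the caller pass
in the completion capabilities it needs. A hypothetical helper, not
part of the quoted patch:

static bool root_port_supports_atomic_sizes(struct pci_dev *root_port,
                                            u32 comp_caps)
{
        u32 cap;

        /* comp_caps is a mask of PCI_EXP_DEVCAP2_ATOMIC_COMP{32,64,128};
         * an unsupported size then fails at enable time instead of as
         * an Unsupported Request error at run time.
         */
        pcie_capability_read_dword(root_port, PCI_EXP_DEVCAP2, &cap);
        return (cap & comp_caps) == comp_caps;
}

(The interface eventually merged upstream takes a cap_mask argument
for exactly this reason.)
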
>> Hi Bjorn, Doug, Ram,
>>
>> I just discussed this with Jay, and he noticed that qedr_pci_set_atomic
>> seems to use a different criterion to find the completer for atomic
>> requests. Jay's function expects the root port to have a parent, which
>> was the case on the systems he tested. But Ram's function looks for a
>> bridge without a parent and checks completion capabilities on that. Jay
>> believes that to be a root complex, not a root port.
> By "Ram's function", I guess you mean qedr_pci_set_atomic()?
>
> That starts with a PCIe device ("pdev"; it assumes but does not check
> that this is a PCIe device), and traverses through all the bridges
> leading to it.  Usually this will be:
>
>   endpoint -> root port
>   endpoint -> switch downstream port -> switch upstream port -> root port
>
> Or there may be additional switches in the middle.  The code is
> actually not quite correct because it is legal to have this:
>
>   endpoint -> PCI-to-PCIe bridge -> conventional PCI bridge -> ...
>
> and qedr_pci_set_atomic() will traverse up through the conventional
> part of the hierarchy, where there is no PCI_EXP_DEVCAP2.
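
That caveat can be made explicit in the traversal itself; a sketch,
assuming the same loop shape as the quoted patch:

static int check_path_is_pcie(struct pci_dev *dev)
{
        struct pci_bus *bus = dev->bus;

        while (bus->parent) {
                struct pci_dev *bridge = bus->self;

                /* A conventional PCI bridge in the path has no
                 * PCI_EXP_DEVCAP2, so AtomicOp routing cannot be
                 * guaranteed.
                 */
                if (!pci_is_pcie(bridge))
                        return -EINVAL;
                bus = bus->parent;
        }
        return 0;
}
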
>
> In general, a Root Port is the root of a PCIe hierarchy and there is
> no parent device.  E.g., on my laptop:
>
>   00:1c.0 Intel Root Port (bridge to [bus 02])
>   00:1c.2 Intel Root Port (bridge to [bus 04])
>
> What sort of parent do you expect?  As I mentioned, it's legal to have
> a PCI/PCI-X to PCIe bridge inside a conventional PCI hierarchy, but
> that's a little unusual.
>
>> According to the spec, "Root ports are permitted to implement AtomicOp
>> completion capabilities." It talks about a root port, not a root complex.
>>
>> Can you help us understand, which interpretation is correct? And how to
>> correctly identify the root port for checking completion capabilities?
> If you start with a PCIe device and traverse upstream, you should
> eventually reach a Root Port or a PCI/PCI-X to PCIe bridge.
>
>> Are there valid topologies where a root port does not have a parent?
> I don't understand this because Root Ports normally do not have
> parents.
>
> PCIe devices other than Root Ports normally have a Root Port (or
> PCI/PCI-X to PCIe bridge) at the root of the PCIe hierarchy, but there
> are definitely exceptions.
>
> For example, there are some systems where the Root Port is not visible to
> Linux, e.g.,
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ee8bdfb6568d
> On systems like that, I don't think you can safely use AtomicOps.
>
> Bjorn


* Re: [PATCH 04/37] drm/amdgpu: Implement get_local_mem_info
       [not found]             ` <997ef22c-f58b-ab06-9ba0-bf3ee290d8f7-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-05  0:16               ` Felix Kuehling
  0 siblings, 0 replies; 62+ messages in thread
From: Felix Kuehling @ 2018-01-05  0:16 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Oded Gabbay, Ben Goz, Harish Kasiviswanathan, amd-gfx mailing list

On 2018-01-04 06:54 PM, Felix Kuehling wrote:
> I see. resource_size_t (alias for phys_addr_t) is u64 on x86 and x86_64.
> But probably only u32 on ARM-32. This depends on
> CONFIG_PHYS_ADDR_T_64BIT: include/linux/types.h:
>
>     #ifdef CONFIG_PHYS_ADDR_T_64BIT
>     typedef u64 phys_addr_t;
>     #else
>     typedef u32 phys_addr_t;
>     #endif
>
> The easiest solution is probably to define aper_limit as uint64_t.

That said, KFD isn't supported on 32-bit kernels anyway, so we probably
don't need to compile this file there at all. That would require more
changes elsewhere to make sure no one tries to call these functions on
32-bit kernels, or to define empty stubs.
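
For the warning itself, a minimal sketch of the type fix (following
the uint64_t suggestion above; aper_base is still resource_size_t and
needs a cast):

        /* uint64_t is 64 bits everywhere, so %llx matches even on
         * 32-bit builds without CONFIG_PHYS_ADDR_T_64BIT ...
         */
        uint64_t aper_limit = adev->mc.aper_base + adev->mc.aper_size;

        /* ... and the remaining resource_size_t argument is cast */
        pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
                 (uint64_t)adev->mc.aper_base, aper_limit,
                 mem_info->local_mem_size_public,
                 mem_info->local_mem_size_private);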

>
> Regards,
>   Felix
>
>
> On 2018-01-04 06:35 PM, Dave Airlie wrote:
>> On 9 December 2017 at 14:08, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>>> From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>>>
>>> Implement new kgd-kfd interface function get_local_mem_info.
>>>
>>> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>>> Signed-off-by: Ben Goz <ben.goz@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 30 +++++++++++++++++++++++
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  2 ++
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
>>>  4 files changed, 34 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> index cfb7827..56f6c12 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> @@ -252,6 +252,36 @@ uint64_t get_vmem_size(struct kgd_dev *kgd)
>>>         return adev->mc.real_vram_size;
>>>  }
>>>
>>> +void get_local_mem_info(struct kgd_dev *kgd,
>>> +                       struct kfd_local_mem_info *mem_info)
>>> +{
>>> +       struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
>>> +       uint64_t address_mask = adev->dev->dma_mask ? ~*adev->dev->dma_mask :
>>> +                                            ~((1ULL << 32) - 1);
>>> +       resource_size_t aper_limit = adev->mc.aper_base + adev->mc.aper_size;
>>> +
>>> +       memset(mem_info, 0, sizeof(*mem_info));
>>> +       if (!(adev->mc.aper_base & address_mask || aper_limit & address_mask)) {
>>> +               mem_info->local_mem_size_public = adev->mc.visible_vram_size;
>>> +               mem_info->local_mem_size_private = adev->mc.real_vram_size -
>>> +                               adev->mc.visible_vram_size;
>>> +       } else {
>>> +               mem_info->local_mem_size_public = 0;
>>> +               mem_info->local_mem_size_private = adev->mc.real_vram_size;
>>> +       }
>>> +       mem_info->vram_width = adev->mc.vram_width;
>>> +
>>> +       pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>>> +                       adev->mc.aper_base, aper_limit,
>>> +                       mem_info->local_mem_size_public,
>>> +                       mem_info->local_mem_size_private);
>> This patch introduces:
>>
>> In file included from
>> /home/airlied/devel/kernel/drm-next/include/linux/kernel.h:14:0,
>>                  from
>> /home/airlied/devel/kernel/drm-next/include/asm-generic/bug.h:18,
>>                  from
>> /home/airlied/devel/kernel/drm-next/arch/arm/include/asm/bug.h:60,
>>                  from /home/airlied/devel/kernel/drm-next/include/linux/bug.h:5,
>>                  from
>> /home/airlied/devel/kernel/drm-next/include/linux/thread_info.h:12,
>>                  from
>> /home/airlied/devel/kernel/drm-next/include/asm-generic/current.h:5,
>>                  from ./arch/arm/include/generated/asm/current.h:1,
>>                  from
>> /home/airlied/devel/kernel/drm-next/include/linux/sched.h:12,
>>                  from
>> /home/airlied/devel/kernel/drm-next/arch/arm/include/asm/mmu_context.h:17,
>>                  from
>> /home/airlied/devel/kernel/drm-next/include/linux/mmu_context.h:5,
>>                  from
>> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h:29,
>>                  from
>> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:23:
>> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:
>> In function ‘get_local_mem_info’:
>> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:11:
>> warning: format ‘%llx’ expects argument of type ‘long long unsigned
>> int’, but argument 3 has type ‘resource_size_t {aka unsigned int}’
>> [-Wformat=]
>>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>>            ^
>> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:285:21:
>> note: in definition of macro ‘pr_fmt’
>>  #define pr_fmt(fmt) fmt
>>                      ^~~
>> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:333:2:
>> note: in expansion of macro ‘dynamic_pr_debug’
>>   dynamic_pr_debug(fmt, ##__VA_ARGS__)
>>   ^~~~~~~~~~~~~~~~
>> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:2:
>> note: in expansion of macro ‘pr_debug’
>>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>>   ^~~~~~~~
>> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:11:
>> warning: format ‘%llx’ expects argument of type ‘long long unsigned
>> int’, but argument 4 has type ‘resource_size_t {aka unsigned int}’
>> [-Wformat=]
>>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>>            ^
>> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:285:21:
>> note: in definition of macro ‘pr_fmt’
>>  #define pr_fmt(fmt) fmt
>>                      ^~~
>> /home/airlied/devel/kernel/drm-next/include/linux/printk.h:333:2:
>> note: in expansion of macro ‘dynamic_pr_debug’
>>   dynamic_pr_debug(fmt, ##__VA_ARGS__)
>>   ^~~~~~~~~~~~~~~~
>> /home/airlied/devel/kernel/drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:297:2:
>> note: in expansion of macro ‘pr_debug’
>>   pr_debug("Address base: 0x%llx limit 0x%llx public 0x%llx private 0x%llx\n",
>>   ^~~~~~~~
>>
>> On 32-bit arm build.
>>
>> Dave.


end of thread, other threads:[~2018-01-05  0:16 UTC | newest]

Thread overview: 62+ messages
2017-12-09  4:08 [PATCH 00/37] KFD dGPU topology and initialization Felix Kuehling
     [not found] ` <1512792555-26042-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2017-12-09  4:08   ` [PATCH 01/37] drm/amd: add new interface to query cu info Felix Kuehling
2017-12-09  4:08   ` [PATCH 02/37] drm/amdgpu: add amdgpu " Felix Kuehling
2017-12-09  4:08   ` [PATCH 03/37] drm/amd: Add get_local_mem_info to KGD-KFD interface Felix Kuehling
2017-12-09  4:08   ` [PATCH 04/37] drm/amdgpu: Implement get_local_mem_info Felix Kuehling
     [not found]     ` <1512792555-26042-5-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-01-04 23:35       ` Dave Airlie
     [not found]         ` <CAPM=9typVidxNLS0+9JcUOcwohcOi469Ke56GcPDZKAh2Tg6fg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-01-04 23:54           ` Felix Kuehling
     [not found]             ` <997ef22c-f58b-ab06-9ba0-bf3ee290d8f7-5C7GfCeVMHo@public.gmane.org>
2018-01-05  0:16               ` Felix Kuehling
2017-12-09  4:08   ` [PATCH 05/37] drm/amdkfd: Stop using get_vmem_size KGD-KFD interface Felix Kuehling
2017-12-09  4:08   ` [PATCH 06/37] drm/amdkfd: Remove deprecated get_vmem_size Felix Kuehling
2017-12-09  4:08   ` [PATCH 07/37] drm/amd: Remove get_vmem_size from KGD-KFD interface Felix Kuehling
2017-12-09  4:08   ` [PATCH 08/37] drm/amdkfd: Update number of compute unit from KGD Felix Kuehling
2017-12-09  4:08   ` [PATCH 09/37] drm/amdkfd: Topology: Fix location_id Felix Kuehling
2017-12-09  4:08   ` [PATCH 10/37] drm/amdkfd: Fix memory leaks in kfd topology Felix Kuehling
2017-12-09  4:08   ` [PATCH 11/37] drm/amdkfd: Group up CRAT related functions Felix Kuehling
2017-12-09  4:08   ` [PATCH 12/37] drm/amdkfd: Coding style cleanup Felix Kuehling
2017-12-09  4:08   ` [PATCH 13/37] drm/amdkfd: Reorganize CRAT fetching from ACPI Felix Kuehling
2017-12-09  4:08   ` [PATCH 14/37] drm/amdkfd: Decouple CRAT parsing from device list update Felix Kuehling
2017-12-09  4:08   ` [PATCH 15/37] drm/amdkfd: Support enumerating non-GPU devices Felix Kuehling
2017-12-09  4:08   ` [PATCH 16/37] drm/amdkfd: sync IOLINK defines to thunk spec Felix Kuehling
2017-12-09  4:08   ` [PATCH 17/37] drm/amdkfd: Turn verbose topology messages into pr_debug Felix Kuehling
2017-12-09  4:08   ` [PATCH 18/37] drm/amdkfd: Simplify counting of memory banks Felix Kuehling
2017-12-09  4:08   ` [PATCH 19/37] drm/amdkfd: Fix sibling_map[] size Felix Kuehling
2017-12-09  4:08   ` [PATCH 20/37] drm/amdkfd: Add topology support for CPUs Felix Kuehling
2017-12-09  4:08   ` [PATCH 21/37] drm/amdkfd: Add topology support for dGPUs Felix Kuehling
     [not found]     ` <1512792555-26042-22-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2017-12-11 14:46       ` Oded Gabbay
     [not found]         ` <CAFCwf10_AOKQaQU31Mnn+2fO=awvO-DWaM7bTzO-khjk=yw+8w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-11 16:20           ` Felix Kuehling
2017-12-09  4:09   ` [PATCH 22/37] drm/amdkfd: Add perf counters to topology Felix Kuehling
     [not found]     ` <1512792555-26042-23-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2017-12-11 15:23       ` Oded Gabbay
     [not found]         ` <CAFCwf11fXHVD+OaQ3RXFNb+wttvqZM8b=7+UfhrGw_uSoiph0A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-11 15:47           ` Alex Deucher
     [not found]             ` <CADnq5_OAhLbnA+07J+T+V82=Q0SZwB4r8zUv7D+KDXrqV==+GQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-11 16:45               ` Oded Gabbay
2017-12-11 15:53           ` Christian König
     [not found]             ` <79968e85-d5dd-59ad-c6c1-9ca6f93a3c37-5C7GfCeVMHo@public.gmane.org>
2017-12-11 16:32               ` Oded Gabbay
     [not found]                 ` <CAFCwf10FhChovT58CFzb0LEVGG4jZBocdbb_57iSRj9GnenGyA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-11 17:26                   ` Christian König
2017-12-11 19:54           ` Felix Kuehling
     [not found]             ` <02742353-8c54-4bed-e013-f29de62a43d5-5C7GfCeVMHo@public.gmane.org>
2017-12-12  8:15               ` Oded Gabbay
     [not found]                 ` <CAFCwf13RLA1pnL+Srh2WVz-YuNgS0yE__F3-+LUHY=9hsEH7MQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-16 20:48                   ` Felix Kühling
     [not found]                     ` <fec4e14f-5ccf-1266-e0c4-47089b6c4d0c-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-19  9:40                       ` Oded Gabbay
2017-12-09  4:09   ` [PATCH 23/37] drm/amdkfd: Fixup incorrect info in the CZ CRAT table Felix Kuehling
2017-12-09  4:09   ` [PATCH 24/37] drm/amdkfd: Add AQL Queue Memory flag on topology Felix Kuehling
2017-12-09  4:09   ` [PATCH 25/37] drm/amdkfd: Module option to disable CRAT table Felix Kuehling
2017-12-09  4:09   ` [PATCH 26/37] drm/amdkfd: Ignore ACPI CRAT for non-APU systems Felix Kuehling
2017-12-09  4:09   ` [PATCH 27/37] drm/amdgpu: Add support for reporting VRAM usage Felix Kuehling
2017-12-09  4:09   ` [PATCH 28/37] drm/amdkfd: Add support for displaying " Felix Kuehling
     [not found]     ` <1512792555-26042-29-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2017-12-11 15:32       ` Oded Gabbay
     [not found]         ` <CAFCwf12qZXOyS3iwHd5WnBLKce9Kf+7gMu720uiN_TJSzuFngA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-11 16:40           ` Oded Gabbay
     [not found]             ` <CAFCwf129dU00ocL=btDWh0bjHVaX-JsqWcwAr1uLpJmvj1gkKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-11 17:28               ` Christian König
     [not found]                 ` <64ce9318-7aa1-d654-7d41-e24cb7ad056a-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-11 19:52                   ` Felix Kuehling
     [not found]                     ` <71874366-c697-8e90-de59-4e5f1d4f797b-5C7GfCeVMHo@public.gmane.org>
2017-12-12 11:11                       ` Russell, Kent
     [not found]                         ` <BN6PR1201MB01800128BE70E4BF0E4DEE4085340-6iU6OBHu2P/H0AMcJMwsYmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-12-13  7:23                           ` Oded Gabbay
     [not found]                             ` <CAFCwf12RbD-AaecdAjurqyAYXU6UqvpdLM+mpc7VdvXuYtp=TA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-12-16 20:36                               ` Felix Kühling
     [not found]                                 ` <f64e2633-9424-26d4-0a35-166a2b8a62c8-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-19  9:42                                   ` Oded Gabbay
2017-12-10 10:26   ` [PATCH 00/37] KFD dGPU topology and initialization Oded Gabbay
2017-12-09  4:09 ` [PATCH 29/37] PCI: Add pci_enable_atomic_ops_to_root Felix Kuehling
2017-12-12 23:27   ` Bjorn Helgaas
2017-12-12 23:42     ` Jason Gunthorpe
2017-12-13  7:22       ` Oded Gabbay
2017-12-13  7:22         ` Oded Gabbay
2018-01-02 23:41     ` Felix Kuehling
2018-01-02 23:41       ` Felix Kuehling
2018-01-04 23:40       ` Bjorn Helgaas
2018-01-05  0:09         ` Felix Kuehling
