* [PATCH v5 0/11] HMM profiler interface
@ 2022-06-28 14:50 Philip Yang
  2022-06-28 14:50 ` [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers Philip Yang
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

This implements KFD profiling APIs to expose HMM migration and
recoverable page fault profiling data. The ROCm profiler is shared-linked
with the application to collect the profiling data and expose it to
application developers, who can tune their applications based on how the
address range attributes affect behavior and performance. Kernel perf and
ftrace require superuser permission to collect data, so they are not
suitable for the ROCm profiler.

The profiling data consists of per-process, per-device events delivered
through the existing SMI (system management interface) event API. Each
event log is one line of text containing the event-specific information.

User space usage examples:
patches 9/11 and 10/11 (Thunk libhsakmt) are based on
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface

patch 11/11 (ROCr Basic-SVM-profiler) is based on
https://github.com/RadeonOpenCompute/ROCR-Runtime

v5:
 * Fix multi-thread profiling support
 * Added user space usage example Thunk and ROCr patch

v4:
 * Add event helper function
 * Rebase to 5.16 kernel

v3:
 * Changes from Felix's review

v2:
 * Keep existing events behaviour
 * Use ktime_get_boottime_ns() as timestamp to correlate with other APIs
 * Use compact message layout, stick with existing message convention
 * Add unmap from GPU event

Philip Yang (8):
  drm/amdkfd: Add KFD SMI event IDs and triggers
  drm/amdkfd: Enable per process SMI event
  drm/amdkfd: Add GPU recoverable fault SMI event
  drm/amdkfd: Add migration SMI event
  drm/amdkfd: Add user queue eviction restore SMI event
  drm/amdkfd: Add unmap from GPU SMI event
  drm/amdkfd: Asynchronously free smi_client
  drm/amdkfd: Bump KFD API version for SMI profiling event

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  12 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c      |  53 +++++--
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h      |   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  15 +-
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   | 134 ++++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h   |  21 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  64 ++++++---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h          |   2 +-
 include/uapi/linux/kfd_ioctl.h                |  40 +++++-
 13 files changed, 293 insertions(+), 65 deletions(-)


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-30 14:46   ` Felix Kuehling
  2022-06-28 14:50 ` [PATCH v5 2/11] drm/amdkfd: Enable per process SMI event Philip Yang
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

Define new system management interface event IDs for migration, GPU
recoverable page fault, user queue eviction, restore and unmap-from-GPU
events, along with the corresponding event triggers. These will be
implemented in the following patches.
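The new IDs select events through the 64-bit mask built with KFD_SMI_EVENT_MASK_FROM_INDEX(). A minimal sketch of how a client might select the migration events (the enum values below mirror the uapi additions in this patch; the helper function name is illustrative):

```c
#include <stdint.h>

/* Mirror of the uapi helper: bit (i - 1) selects event i. */
#define KFD_SMI_EVENT_MASK_FROM_INDEX(i) (1ULL << ((i) - 1))

enum {	/* subset of the IDs added by this patch */
	KFD_SMI_EVENT_MIGRATE_START = 5,
	KFD_SMI_EVENT_MIGRATE_END = 6,
	KFD_SMI_EVENT_ALL_PROCESS = 64,
};

static uint64_t migrate_events_mask(int all_processes)
{
	uint64_t mask =
		KFD_SMI_EVENT_MASK_FROM_INDEX(KFD_SMI_EVENT_MIGRATE_START) |
		KFD_SMI_EVENT_MASK_FROM_INDEX(KFD_SMI_EVENT_MIGRATE_END);

	if (all_processes)	/* needs superuser permission, see patch 2/11 */
		mask |= KFD_SMI_EVENT_MASK_FROM_INDEX(KFD_SMI_EVENT_ALL_PROCESS);
	return mask;
}
```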

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 37 ++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index c648ed7c5ff1..f239e260796b 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -468,6 +468,43 @@ enum kfd_smi_event {
 	KFD_SMI_EVENT_THERMAL_THROTTLE = 2,
 	KFD_SMI_EVENT_GPU_PRE_RESET = 3,
 	KFD_SMI_EVENT_GPU_POST_RESET = 4,
+	KFD_SMI_EVENT_MIGRATE_START = 5,
+	KFD_SMI_EVENT_MIGRATE_END = 6,
+	KFD_SMI_EVENT_PAGE_FAULT_START = 7,
+	KFD_SMI_EVENT_PAGE_FAULT_END = 8,
+	KFD_SMI_EVENT_QUEUE_EVICTION = 9,
+	KFD_SMI_EVENT_QUEUE_RESTORE = 10,
+	KFD_SMI_EVENT_UNMAP_FROM_GPU = 11,
+
+	/*
+	 * Max event number, used as a flag bit to receive events from all
+	 * processes. This requires superuser permission; without it, no
+	 * events are received from any process. Without this flag, only
+	 * events from the same process are received.
+	 */
+	KFD_SMI_EVENT_ALL_PROCESS = 64
+};
+
+enum KFD_MIGRATE_TRIGGERS {
+	KFD_MIGRATE_TRIGGER_PREFETCH,
+	KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU,
+	KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU,
+	KFD_MIGRATE_TRIGGER_TTM_EVICTION
+};
+
+enum KFD_QUEUE_EVICTION_TRIGGERS {
+	KFD_QUEUE_EVICTION_TRIGGER_SVM,
+	KFD_QUEUE_EVICTION_TRIGGER_USERPTR,
+	KFD_QUEUE_EVICTION_TRIGGER_TTM,
+	KFD_QUEUE_EVICTION_TRIGGER_SUSPEND,
+	KFD_QUEUE_EVICTION_CRIU_CHECKPOINT,
+	KFD_QUEUE_EVICTION_CRIU_RESTORE
+};
+
+enum KFD_SVM_UNMAP_TRIGGERS {
+	KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY,
+	KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY_MIGRATE,
+	KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU
 };
 
 #define KFD_SMI_EVENT_MASK_FROM_INDEX(i) (1ULL << ((i) - 1))
-- 
2.35.1



* [PATCH v5 2/11] drm/amdkfd: Enable per process SMI event
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
  2022-06-28 14:50 ` [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-28 14:50 ` [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault " Philip Yang
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

A process receives events from the same process by default. Add a flag
to allow receiving events from all processes; this requires superuser
permission.

Events sent with pid 0 are delivered to all processes, to keep the
default behavior of the existing SMI events.
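A minimal sketch of how a client might enable an event mask, assuming (as in the existing kfd_smi_ev_write() path) that the SMI event fd's write() consumes a raw 64-bit mask:

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Sketch: enable an event mask on an already-opened SMI event fd.
 * Assumes write() on the fd consumes a binary 64-bit event mask. */
static int smi_enable_events(int fd, uint64_t mask)
{
	ssize_t n = write(fd, &mask, sizeof(mask));

	return n == (ssize_t)sizeof(mask) ? 0 : -1;
}
```

With CAP_SYS_ADMIN, setting the KFD_SMI_EVENT_ALL_PROCESS bit in the mask additionally delivers events from other processes.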

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 37 +++++++++++++++------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index f2e1d506ba21..55ed026435e2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -38,6 +38,8 @@ struct kfd_smi_client {
 	uint64_t events;
 	struct kfd_dev *dev;
 	spinlock_t lock;
+	pid_t pid;
+	bool suser;
 };
 
 #define MAX_KFIFO_SIZE	1024
@@ -151,16 +153,27 @@ static int kfd_smi_ev_release(struct inode *inode, struct file *filep)
 	return 0;
 }
 
-static void add_event_to_kfifo(struct kfd_dev *dev, unsigned int smi_event,
-			      char *event_msg, int len)
+static bool kfd_smi_ev_enabled(pid_t pid, struct kfd_smi_client *client,
+			       unsigned int event)
+{
+	uint64_t all = KFD_SMI_EVENT_MASK_FROM_INDEX(KFD_SMI_EVENT_ALL_PROCESS);
+	uint64_t events = READ_ONCE(client->events);
+
+	if (pid && client->pid != pid && !(client->suser && (events & all)))
+		return false;
+
+	return events & KFD_SMI_EVENT_MASK_FROM_INDEX(event);
+}
+
+static void add_event_to_kfifo(pid_t pid, struct kfd_dev *dev,
+			       unsigned int smi_event, char *event_msg, int len)
 {
 	struct kfd_smi_client *client;
 
 	rcu_read_lock();
 
 	list_for_each_entry_rcu(client, &dev->smi_clients, list) {
-		if (!(READ_ONCE(client->events) &
-				KFD_SMI_EVENT_MASK_FROM_INDEX(smi_event)))
+		if (!kfd_smi_ev_enabled(pid, client, smi_event))
 			continue;
 		spin_lock(&client->lock);
 		if (kfifo_avail(&client->fifo) >= len) {
@@ -176,9 +189,9 @@ static void add_event_to_kfifo(struct kfd_dev *dev, unsigned int smi_event,
 	rcu_read_unlock();
 }
 
-__printf(3, 4)
-static void kfd_smi_event_add(struct kfd_dev *dev, unsigned int event,
-			      char *fmt, ...)
+__printf(4, 5)
+static void kfd_smi_event_add(pid_t pid, struct kfd_dev *dev,
+			      unsigned int event, char *fmt, ...)
 {
 	char fifo_in[KFD_SMI_EVENT_MSG_SIZE];
 	int len;
@@ -193,7 +206,7 @@ static void kfd_smi_event_add(struct kfd_dev *dev, unsigned int event,
 	len += vsnprintf(fifo_in + len, sizeof(fifo_in) - len, fmt, args);
 	va_end(args);
 
-	add_event_to_kfifo(dev, event, fifo_in, len);
+	add_event_to_kfifo(pid, dev, event, fifo_in, len);
 }
 
 void kfd_smi_event_update_gpu_reset(struct kfd_dev *dev, bool post_reset)
@@ -206,13 +219,13 @@ void kfd_smi_event_update_gpu_reset(struct kfd_dev *dev, bool post_reset)
 		event = KFD_SMI_EVENT_GPU_PRE_RESET;
 		++(dev->reset_seq_num);
 	}
-	kfd_smi_event_add(dev, event, "%x\n", dev->reset_seq_num);
+	kfd_smi_event_add(0, dev, event, "%x\n", dev->reset_seq_num);
 }
 
 void kfd_smi_event_update_thermal_throttling(struct kfd_dev *dev,
 					     uint64_t throttle_bitmask)
 {
-	kfd_smi_event_add(dev, KFD_SMI_EVENT_THERMAL_THROTTLE, "%llx:%llx\n",
+	kfd_smi_event_add(0, dev, KFD_SMI_EVENT_THERMAL_THROTTLE, "%llx:%llx\n",
 			  throttle_bitmask,
 			  amdgpu_dpm_get_thermal_throttling_counter(dev->adev));
 }
@@ -227,7 +240,7 @@ void kfd_smi_event_update_vmfault(struct kfd_dev *dev, uint16_t pasid)
 	if (!task_info.pid)
 		return;
 
-	kfd_smi_event_add(dev, KFD_SMI_EVENT_VMFAULT, "%x:%s\n",
+	kfd_smi_event_add(0, dev, KFD_SMI_EVENT_VMFAULT, "%x:%s\n",
 			  task_info.pid, task_info.task_name);
 }
 
@@ -251,6 +264,8 @@ int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
 	spin_lock_init(&client->lock);
 	client->events = 0;
 	client->dev = dev;
+	client->pid = current->tgid;
+	client->suser = capable(CAP_SYS_ADMIN);
 
 	spin_lock(&dev->smi_lock);
 	list_add_rcu(&client->list, &dev->smi_clients);
-- 
2.35.1



* [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault SMI event
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
  2022-06-28 14:50 ` [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers Philip Yang
  2022-06-28 14:50 ` [PATCH v5 2/11] drm/amdkfd: Enable per process SMI event Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-30 14:19   ` Felix Kuehling
  2022-06-28 14:50 ` [PATCH v5 4/11] drm/amdkfd: Add migration " Philip Yang
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

Use ktime_get_boottime_ns() as the timestamp to correlate with other
APIs. Output the timestamp when the GPU recoverable fault starts and
when recovery finishes, whether a migration happened or only the GPU
page table was updated to recover, the fault address, and whether it
was a read or write fault.
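The format strings added below ("%lld -%d @%lx(%x) %c\n") imply a fixed payload layout. A hedged sketch of parsing it from user space (the struct and field names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdio.h>

struct kfd_fault_event {
	long long ns;		/* boottime timestamp, ns */
	int pid;
	unsigned long addr;	/* faulting address */
	unsigned int gpu_id;
	char kind;		/* 'R'/'W' on start, 'M'/'U' on end */
};

/* Parse the payload formatted as "%lld -%d @%lx(%x) %c\n".
 * Returns 0 on success, -1 on a malformed line. */
static int parse_fault_event(const char *s, struct kfd_fault_event *e)
{
	return sscanf(s, "%lld -%d @%lx(%x) %c", &e->ns, &e->pid,
		      &e->addr, &e->gpu_id, &e->kind) == 5 ? 0 : -1;
}
```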

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 17 +++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h |  6 +++++-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c        | 17 +++++++++++++----
 drivers/gpu/drm/amd/amdkfd/kfd_svm.h        |  2 +-
 4 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index 55ed026435e2..b7e68283925f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -244,6 +244,23 @@ void kfd_smi_event_update_vmfault(struct kfd_dev *dev, uint16_t pasid)
 			  task_info.pid, task_info.task_name);
 }
 
+void kfd_smi_event_page_fault_start(struct kfd_dev *dev, pid_t pid,
+				    unsigned long address, bool write_fault,
+				    ktime_t ts)
+{
+	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_PAGE_FAULT_START,
+			  "%lld -%d @%lx(%x) %c\n", ktime_to_ns(ts), pid,
+			  address, dev->id, write_fault ? 'W' : 'R');
+}
+
+void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
+				  unsigned long address, bool migration)
+{
+	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_PAGE_FAULT_END,
+			  "%lld -%d @%lx(%x) %c\n", ktime_get_boottime_ns(),
+			  pid, address, dev->id, migration ? 'M' : 'U');
+}
+
 int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
 {
 	struct kfd_smi_client *client;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
index dfe101c21166..7903718cd9eb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
@@ -29,5 +29,9 @@ void kfd_smi_event_update_vmfault(struct kfd_dev *dev, uint16_t pasid);
 void kfd_smi_event_update_thermal_throttling(struct kfd_dev *dev,
 					     uint64_t throttle_bitmask);
 void kfd_smi_event_update_gpu_reset(struct kfd_dev *dev, bool post_reset);
-
+void kfd_smi_event_page_fault_start(struct kfd_dev *dev, pid_t pid,
+				    unsigned long address, bool write_fault,
+				    ktime_t ts);
+void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
+				  unsigned long address, bool migration);
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index d6fc00d51c8c..2ad08a1f38dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -32,6 +32,7 @@
 #include "kfd_priv.h"
 #include "kfd_svm.h"
 #include "kfd_migrate.h"
+#include "kfd_smi_events.h"
 
 #ifdef dev_fmt
 #undef dev_fmt
@@ -1617,7 +1618,7 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
 	svm_range_unreserve_bos(&ctx);
 
 	if (!r)
-		prange->validate_timestamp = ktime_to_us(ktime_get());
+		prange->validate_timestamp = ktime_get_boottime();
 
 	return r;
 }
@@ -2694,11 +2695,12 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	struct svm_range_list *svms;
 	struct svm_range *prange;
 	struct kfd_process *p;
-	uint64_t timestamp;
+	ktime_t timestamp = ktime_get_boottime();
 	int32_t best_loc;
 	int32_t gpuidx = MAX_GPU_INSTANCE;
 	bool write_locked = false;
 	struct vm_area_struct *vma;
+	bool migration = false;
 	int r = 0;
 
 	if (!KFD_IS_SVM_API_SUPPORTED(adev->kfd.dev)) {
@@ -2775,9 +2777,9 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 		goto out_unlock_range;
 	}
 
-	timestamp = ktime_to_us(ktime_get()) - prange->validate_timestamp;
 	/* skip duplicate vm fault on different pages of same range */
-	if (timestamp < AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING) {
+	if (ktime_before(timestamp, ktime_add_ns(prange->validate_timestamp,
+				AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING))) {
 		pr_debug("svms 0x%p [0x%lx %lx] already restored\n",
 			 svms, prange->start, prange->last);
 		r = 0;
@@ -2813,7 +2815,11 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 		 svms, prange->start, prange->last, best_loc,
 		 prange->actual_loc);
 
+	kfd_smi_event_page_fault_start(adev->kfd.dev, p->lead_thread->pid, addr,
+				       write_fault, timestamp);
+
 	if (prange->actual_loc != best_loc) {
+		migration = true;
 		if (best_loc) {
 			r = svm_migrate_to_vram(prange, best_loc, mm);
 			if (r) {
@@ -2842,6 +2848,9 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 		pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx] to gpus\n",
 			 r, svms, prange->start, prange->last);
 
+	kfd_smi_event_page_fault_end(adev->kfd.dev, p->lead_thread->pid, addr,
+				     migration);
+
 out_unlock_range:
 	mutex_unlock(&prange->migrate_mutex);
 out_unlock_svms:
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
index 2d54147b4dda..eab7f6d3b13c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
@@ -125,7 +125,7 @@ struct svm_range {
 	uint32_t			actual_loc;
 	uint8_t				granularity;
 	atomic_t			invalid;
-	uint64_t			validate_timestamp;
+	ktime_t				validate_timestamp;
 	struct mmu_interval_notifier	notifier;
 	struct svm_work_list_item	work_item;
 	struct list_head		deferred_list;
-- 
2.35.1



* [PATCH v5 4/11] drm/amdkfd: Add migration SMI event
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (2 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault " Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-30 14:29   ` Felix Kuehling
  2022-06-28 14:50 ` [PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore " Philip Yang
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

For the migration start and end events, output the timestamp when the
migration starts and ends, the svm range address and size, the GPU id
of the migration source and destination, and the svm range attributes.

The migration trigger can be prefetch, a CPU or GPU page fault, or TTM
eviction.
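The MIGRATE_START format string added below ("%lld -%d @%lx(%lx) %x->%x %x:%x %d\n") implies the following payload layout; a hedged user-space parsing sketch (struct and field names are illustrative, not from the patch):

```c
#include <assert.h>
#include <stdio.h>

struct kfd_migrate_start {
	long long ns;			/* boottime timestamp, ns */
	int pid;
	unsigned long start, npages;	/* range start and size, in pages */
	unsigned int from, to;		/* 0 = system memory, else GPU id */
	unsigned int prefetch_loc, preferred_loc;
	int trigger;			/* enum KFD_MIGRATE_TRIGGERS */
};

/* Parse the payload formatted as "%lld -%d @%lx(%lx) %x->%x %x:%x %d\n".
 * Returns 0 on success, -1 on a malformed line. */
static int parse_migrate_start(const char *s, struct kfd_migrate_start *m)
{
	return sscanf(s, "%lld -%d @%lx(%lx) %x->%x %x:%x %d",
		      &m->ns, &m->pid, &m->start, &m->npages,
		      &m->from, &m->to, &m->prefetch_loc,
		      &m->preferred_loc, &m->trigger) == 9 ? 0 : -1;
}
```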

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c    | 53 ++++++++++++++++-----
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.h    |  5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 22 +++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h |  8 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c        | 16 ++++---
 5 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index fb8a94e52656..9667015a6cbc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -32,6 +32,7 @@
 #include "kfd_priv.h"
 #include "kfd_svm.h"
 #include "kfd_migrate.h"
+#include "kfd_smi_events.h"
 
 #ifdef dev_fmt
 #undef dev_fmt
@@ -402,8 +403,9 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 static long
 svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 			struct vm_area_struct *vma, uint64_t start,
-			uint64_t end)
+			uint64_t end, uint32_t trigger)
 {
+	struct kfd_process *p = container_of(prange->svms, struct kfd_process, svms);
 	uint64_t npages = (end - start) >> PAGE_SHIFT;
 	struct kfd_process_device *pdd;
 	struct dma_fence *mfence = NULL;
@@ -430,6 +432,11 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 	migrate.dst = migrate.src + npages;
 	scratch = (dma_addr_t *)(migrate.dst + npages);
 
+	kfd_smi_event_migration_start(adev->kfd.dev, p->lead_thread->pid,
+				      start >> PAGE_SHIFT, end >> PAGE_SHIFT,
+				      0, adev->kfd.dev->id, prange->prefetch_loc,
+				      prange->preferred_loc, trigger);
+
 	r = migrate_vma_setup(&migrate);
 	if (r) {
 		dev_err(adev->dev, "%s: vma setup fail %d range [0x%lx 0x%lx]\n",
@@ -458,6 +465,10 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
 	svm_migrate_copy_done(adev, mfence);
 	migrate_vma_finalize(&migrate);
 
+	kfd_smi_event_migration_end(adev->kfd.dev, p->lead_thread->pid,
+				    start >> PAGE_SHIFT, end >> PAGE_SHIFT,
+				    0, adev->kfd.dev->id, trigger);
+
 	svm_range_dma_unmap(adev->dev, scratch, 0, npages);
 	svm_range_free_dma_mappings(prange);
 
@@ -479,6 +490,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
  * @prange: range structure
  * @best_loc: the device to migrate to
  * @mm: the process mm structure
+ * @trigger: reason of migration
  *
  * Context: Process context, caller hold mmap read lock, svms lock, prange lock
  *
@@ -487,7 +499,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
  */
 static int
 svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
-			struct mm_struct *mm)
+			struct mm_struct *mm, uint32_t trigger)
 {
 	unsigned long addr, start, end;
 	struct vm_area_struct *vma;
@@ -524,7 +536,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
 			break;
 
 		next = min(vma->vm_end, end);
-		r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next);
+		r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next, trigger);
 		if (r < 0) {
 			pr_debug("failed %ld to migrate\n", r);
 			break;
@@ -655,8 +667,10 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
  */
 static long
 svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
-		       struct vm_area_struct *vma, uint64_t start, uint64_t end)
+		       struct vm_area_struct *vma, uint64_t start, uint64_t end,
+		       uint32_t trigger)
 {
+	struct kfd_process *p = container_of(prange->svms, struct kfd_process, svms);
 	uint64_t npages = (end - start) >> PAGE_SHIFT;
 	unsigned long upages = npages;
 	unsigned long cpages = 0;
@@ -685,6 +699,11 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 	migrate.dst = migrate.src + npages;
 	scratch = (dma_addr_t *)(migrate.dst + npages);
 
+	kfd_smi_event_migration_start(adev->kfd.dev, p->lead_thread->pid,
+				      start >> PAGE_SHIFT, end >> PAGE_SHIFT,
+				      adev->kfd.dev->id, 0, prange->prefetch_loc,
+				      prange->preferred_loc, trigger);
+
 	r = migrate_vma_setup(&migrate);
 	if (r) {
 		dev_err(adev->dev, "%s: vma setup fail %d range [0x%lx 0x%lx]\n",
@@ -715,6 +734,11 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
 
 	svm_migrate_copy_done(adev, mfence);
 	migrate_vma_finalize(&migrate);
+
+	kfd_smi_event_migration_end(adev->kfd.dev, p->lead_thread->pid,
+				    start >> PAGE_SHIFT, end >> PAGE_SHIFT,
+				    adev->kfd.dev->id, 0, trigger);
+
 	svm_range_dma_unmap(adev->dev, scratch, 0, npages);
 
 out_free:
@@ -732,13 +756,15 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
  * svm_migrate_vram_to_ram - migrate svm range from device to system
  * @prange: range structure
  * @mm: process mm, use current->mm if NULL
+ * @trigger: reason of migration
  *
  * Context: Process context, caller hold mmap read lock, prange->migrate_mutex
  *
  * Return:
  * 0 - OK, otherwise error code
  */
-int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
+int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm,
+			    uint32_t trigger)
 {
 	struct amdgpu_device *adev;
 	struct vm_area_struct *vma;
@@ -779,7 +805,7 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
 		}
 
 		next = min(vma->vm_end, end);
-		r = svm_migrate_vma_to_ram(adev, prange, vma, addr, next);
+		r = svm_migrate_vma_to_ram(adev, prange, vma, addr, next, trigger);
 		if (r < 0) {
 			pr_debug("failed %ld to migrate prange %p\n", r, prange);
 			break;
@@ -802,6 +828,7 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
  * @prange: range structure
  * @best_loc: the device to migrate to
  * @mm: process mm, use current->mm if NULL
+ * @trigger: reason of migration
  *
  * Context: Process context, caller hold mmap read lock, svms lock, prange lock
  *
@@ -810,7 +837,7 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
  */
 static int
 svm_migrate_vram_to_vram(struct svm_range *prange, uint32_t best_loc,
-			 struct mm_struct *mm)
+			 struct mm_struct *mm, uint32_t trigger)
 {
 	int r, retries = 3;
 
@@ -822,7 +849,7 @@ svm_migrate_vram_to_vram(struct svm_range *prange, uint32_t best_loc,
 	pr_debug("from gpu 0x%x to gpu 0x%x\n", prange->actual_loc, best_loc);
 
 	do {
-		r = svm_migrate_vram_to_ram(prange, mm);
+		r = svm_migrate_vram_to_ram(prange, mm, trigger);
 		if (r)
 			return r;
 	} while (prange->actual_loc && --retries);
@@ -830,17 +857,17 @@ svm_migrate_vram_to_vram(struct svm_range *prange, uint32_t best_loc,
 	if (prange->actual_loc)
 		return -EDEADLK;
 
-	return svm_migrate_ram_to_vram(prange, best_loc, mm);
+	return svm_migrate_ram_to_vram(prange, best_loc, mm, trigger);
 }
 
 int
 svm_migrate_to_vram(struct svm_range *prange, uint32_t best_loc,
-		    struct mm_struct *mm)
+		    struct mm_struct *mm, uint32_t trigger)
 {
 	if  (!prange->actual_loc)
-		return svm_migrate_ram_to_vram(prange, best_loc, mm);
+		return svm_migrate_ram_to_vram(prange, best_loc, mm, trigger);
 	else
-		return svm_migrate_vram_to_vram(prange, best_loc, mm);
+		return svm_migrate_vram_to_vram(prange, best_loc, mm, trigger);
 
 }
 
@@ -909,7 +936,7 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
 		goto out_unlock_prange;
 	}
 
-	r = svm_migrate_vram_to_ram(prange, mm);
+	r = svm_migrate_vram_to_ram(prange, mm, KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);
 	if (r)
 		pr_debug("failed %d migrate 0x%p [0x%lx 0x%lx] to ram\n", r,
 			 prange, prange->start, prange->last);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
index 2f5b3394c9ed..b3f0754b32fa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
@@ -41,8 +41,9 @@ enum MIGRATION_COPY_DIR {
 };
 
 int svm_migrate_to_vram(struct svm_range *prange,  uint32_t best_loc,
-			struct mm_struct *mm);
-int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm);
+			struct mm_struct *mm, uint32_t trigger);
+int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm,
+			    uint32_t trigger);
 unsigned long
 svm_migrate_addr_to_pfn(struct amdgpu_device *adev, unsigned long addr);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index b7e68283925f..ec4d278c2a47 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -261,6 +261,28 @@ void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
 			  pid, address, dev->id, migration ? 'M' : 'U');
 }
 
+void kfd_smi_event_migration_start(struct kfd_dev *dev, pid_t pid,
+				   unsigned long start, unsigned long end,
+				   uint32_t from, uint32_t to,
+				   uint32_t prefetch_loc, uint32_t preferred_loc,
+				   uint32_t trigger)
+{
+	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_MIGRATE_START,
+			  "%lld -%d @%lx(%lx) %x->%x %x:%x %d\n",
+			  ktime_get_boottime_ns(), pid, start, end - start,
+			  from, to, prefetch_loc, preferred_loc, trigger);
+}
+
+void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
+				 unsigned long start, unsigned long end,
+				 uint32_t from, uint32_t to, uint32_t trigger)
+{
+	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_MIGRATE_END,
+			  "%lld -%d @%lx(%lx) %x->%x %d\n",
+			  ktime_get_boottime_ns(), pid, start, end - start,
+			  from, to, trigger);
+}
+
 int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
 {
 	struct kfd_smi_client *client;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
index 7903718cd9eb..ec5d74a2fef4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
@@ -34,4 +34,12 @@ void kfd_smi_event_page_fault_start(struct kfd_dev *dev, pid_t pid,
 				    ktime_t ts);
 void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
 				  unsigned long address, bool migration);
+void kfd_smi_event_migration_start(struct kfd_dev *dev, pid_t pid,
+			     unsigned long start, unsigned long end,
+			     uint32_t from, uint32_t to,
+			     uint32_t prefetch_loc, uint32_t preferred_loc,
+			     uint32_t trigger);
+void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
+			     unsigned long start, unsigned long end,
+			     uint32_t from, uint32_t to, uint32_t trigger);
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 2ad08a1f38dd..5cead2a0e819 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -2821,7 +2821,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	if (prange->actual_loc != best_loc) {
 		migration = true;
 		if (best_loc) {
-			r = svm_migrate_to_vram(prange, best_loc, mm);
+			r = svm_migrate_to_vram(prange, best_loc, mm,
+					KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU);
 			if (r) {
 				pr_debug("svm_migrate_to_vram failed (%d) at %llx, falling back to system memory\n",
 					 r, addr);
@@ -2829,12 +2830,14 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 				 * VRAM failed
 				 */
 				if (prange->actual_loc)
-					r = svm_migrate_vram_to_ram(prange, mm);
+					r = svm_migrate_vram_to_ram(prange, mm,
+					   KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU);
 				else
 					r = 0;
 			}
 		} else {
-			r = svm_migrate_vram_to_ram(prange, mm);
+			r = svm_migrate_vram_to_ram(prange, mm,
+					KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU);
 		}
 		if (r) {
 			pr_debug("failed %d to migrate svms %p [0x%lx 0x%lx]\n",
@@ -3157,12 +3160,12 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange,
 		return 0;
 
 	if (!best_loc) {
-		r = svm_migrate_vram_to_ram(prange, mm);
+		r = svm_migrate_vram_to_ram(prange, mm, KFD_MIGRATE_TRIGGER_PREFETCH);
 		*migrated = !r;
 		return r;
 	}
 
-	r = svm_migrate_to_vram(prange, best_loc, mm);
+	r = svm_migrate_to_vram(prange, best_loc, mm, KFD_MIGRATE_TRIGGER_PREFETCH);
 	*migrated = !r;
 
 	return r;
@@ -3220,7 +3223,8 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
 		mutex_lock(&prange->migrate_mutex);
 		do {
 			r = svm_migrate_vram_to_ram(prange,
-						svm_bo->eviction_fence->mm);
+						svm_bo->eviction_fence->mm,
+						KFD_MIGRATE_TRIGGER_TTM_EVICTION);
 		} while (!r && prange->actual_loc && --retries);
 
 		if (!r && prange->actual_loc)
-- 
2.35.1



* [PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore SMI event
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (3 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH v5 4/11] drm/amdkfd: Add migration " Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-30 14:36   ` Felix Kuehling
  2022-06-28 14:50 ` [PATCH v5 6/11] drm/amdkfd: Add unmap from GPU " Philip Yang
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

Output user queue eviction and restore events. User queue eviction may
be triggered by the svm or userptr MMU notifiers, TTM eviction, device
suspend, or CRIU checkpoint and restore.

User queue restore may be rescheduled if another eviction happens while
restoring.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 12 ++++---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 15 ++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   | 35 +++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h   |  4 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  6 ++--
 9 files changed, 69 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index b25b41f50213..73bf8b5f2aa9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -336,7 +336,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
 }
 #endif
 /* KGD2KFD callbacks */
-int kgd2kfd_quiesce_mm(struct mm_struct *mm);
+int kgd2kfd_quiesce_mm(struct mm_struct *mm, uint32_t trigger);
 int kgd2kfd_resume_mm(struct mm_struct *mm);
 int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
 						struct dma_fence *fence);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 5ba9070d8722..6a7e045ddcc5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -32,6 +32,7 @@
 #include "amdgpu_dma_buf.h"
 #include <uapi/linux/kfd_ioctl.h>
 #include "amdgpu_xgmi.h"
+#include "kfd_smi_events.h"
 
 /* Userptr restore delay, just long enough to allow consecutive VM
  * changes to accumulate
@@ -2381,7 +2382,7 @@ int amdgpu_amdkfd_evict_userptr(struct kgd_mem *mem,
 	evicted_bos = atomic_inc_return(&process_info->evicted_bos);
 	if (evicted_bos == 1) {
 		/* First eviction, stop the queues */
-		r = kgd2kfd_quiesce_mm(mm);
+		r = kgd2kfd_quiesce_mm(mm, KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
 		if (r)
 			pr_err("Failed to quiesce KFD\n");
 		schedule_delayed_work(&process_info->restore_userptr_work,
@@ -2655,13 +2656,16 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
 
 unlock_out:
 	mutex_unlock(&process_info->lock);
-	mmput(mm);
-	put_task_struct(usertask);
 
 	/* If validation failed, reschedule another attempt */
-	if (evicted_bos)
+	if (evicted_bos) {
 		schedule_delayed_work(&process_info->restore_userptr_work,
 			msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+
+		kfd_smi_event_queue_restore_rescheduled(mm);
+	}
+	mmput(mm);
+	put_task_struct(usertask);
 }
 
 /** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index a0246b4bae6b..6abfe10229a2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2428,7 +2428,7 @@ static int criu_restore(struct file *filep,
 	 * Set the process to evicted state to avoid running any new queues before all the memory
 	 * mappings are ready.
 	 */
-	ret = kfd_process_evict_queues(p);
+	ret = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_CRIU_RESTORE);
 	if (ret)
 		goto exit_unlock;
 
@@ -2547,7 +2547,7 @@ static int criu_process_info(struct file *filep,
 		goto err_unlock;
 	}
 
-	ret = kfd_process_evict_queues(p);
+	ret = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_CRIU_CHECKPOINT);
 	if (ret)
 		goto err_unlock;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index c8fee0dbfdcb..6ec0e9f0927d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -837,7 +837,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
 }
 
-int kgd2kfd_quiesce_mm(struct mm_struct *mm)
+int kgd2kfd_quiesce_mm(struct mm_struct *mm, uint32_t trigger)
 {
 	struct kfd_process *p;
 	int r;
@@ -851,7 +851,7 @@ int kgd2kfd_quiesce_mm(struct mm_struct *mm)
 		return -ESRCH;
 
 	WARN(debug_evictions, "Evicting pid %d", p->lead_thread->pid);
-	r = kfd_process_evict_queues(p);
+	r = kfd_process_evict_queues(p, trigger);
 
 	kfd_unref_process(p);
 	return r;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 59ba50ce54d3..b9e7e9c52853 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -946,7 +946,7 @@ static inline struct kfd_process_device *kfd_process_device_from_gpuidx(
 }
 
 void kfd_unref_process(struct kfd_process *p);
-int kfd_process_evict_queues(struct kfd_process *p);
+int kfd_process_evict_queues(struct kfd_process *p, uint32_t trigger);
 int kfd_process_restore_queues(struct kfd_process *p);
 void kfd_suspend_all_processes(void);
 int kfd_resume_all_processes(void);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index a13e60d48b73..fc38a4d81420 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_device_queue_manager.h"
 #include "kfd_iommu.h"
 #include "kfd_svm.h"
+#include "kfd_smi_events.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1736,7 +1737,7 @@ struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm)
  * Eviction is reference-counted per process-device. This means multiple
  * evictions from different sources can be nested safely.
  */
-int kfd_process_evict_queues(struct kfd_process *p)
+int kfd_process_evict_queues(struct kfd_process *p, uint32_t trigger)
 {
 	int r = 0;
 	int i;
@@ -1745,6 +1746,9 @@ int kfd_process_evict_queues(struct kfd_process *p)
 	for (i = 0; i < p->n_pdds; i++) {
 		struct kfd_process_device *pdd = p->pdds[i];
 
+		kfd_smi_event_queue_eviction(pdd->dev, p->lead_thread->pid,
+					     trigger);
+
 		r = pdd->dev->dqm->ops.evict_process_queues(pdd->dev->dqm,
 							    &pdd->qpd);
 		/* evict return -EIO if HWS is hang or asic is resetting, in this case
@@ -1769,6 +1773,9 @@ int kfd_process_evict_queues(struct kfd_process *p)
 
 		if (n_evicted == 0)
 			break;
+
+		kfd_smi_event_queue_restore(pdd->dev, p->lead_thread->pid);
+
 		if (pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
 							      &pdd->qpd))
 			pr_err("Failed to restore queues\n");
@@ -1788,6 +1795,8 @@ int kfd_process_restore_queues(struct kfd_process *p)
 	for (i = 0; i < p->n_pdds; i++) {
 		struct kfd_process_device *pdd = p->pdds[i];
 
+		kfd_smi_event_queue_restore(pdd->dev, p->lead_thread->pid);
+
 		r = pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
 							      &pdd->qpd);
 		if (r) {
@@ -1849,7 +1858,7 @@ static void evict_process_worker(struct work_struct *work)
 	flush_delayed_work(&p->restore_work);
 
 	pr_debug("Started evicting pasid 0x%x\n", p->pasid);
-	ret = kfd_process_evict_queues(p);
+	ret = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_TTM);
 	if (!ret) {
 		dma_fence_signal(p->ef);
 		dma_fence_put(p->ef);
@@ -1916,7 +1925,7 @@ void kfd_suspend_all_processes(void)
 		cancel_delayed_work_sync(&p->eviction_work);
 		cancel_delayed_work_sync(&p->restore_work);
 
-		if (kfd_process_evict_queues(p))
+		if (kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SUSPEND))
 			pr_err("Failed to suspend process 0x%x\n", p->pasid);
 		dma_fence_signal(p->ef);
 		dma_fence_put(p->ef);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index ec4d278c2a47..3917c38204d0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -283,6 +283,41 @@ void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
 			  from, to, trigger);
 }
 
+void kfd_smi_event_queue_eviction(struct kfd_dev *dev, pid_t pid,
+				  uint32_t trigger)
+{
+	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_QUEUE_EVICTION,
+			  "%lld -%d %x %d\n", ktime_get_boottime_ns(), pid,
+			  dev->id, trigger);
+}
+
+void kfd_smi_event_queue_restore(struct kfd_dev *dev, pid_t pid)
+{
+	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_QUEUE_RESTORE,
+			  "%lld -%d %x\n", ktime_get_boottime_ns(), pid,
+			  dev->id);
+}
+
+void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm)
+{
+	struct kfd_process *p;
+	int i;
+
+	p = kfd_lookup_process_by_mm(mm);
+	if (!p)
+		return;
+
+	for (i = 0; i < p->n_pdds; i++) {
+		struct kfd_process_device *pdd = p->pdds[i];
+
+		kfd_smi_event_add(p->lead_thread->pid, pdd->dev,
+				  KFD_SMI_EVENT_QUEUE_RESTORE,
+				  "%lld -%d %x %c\n", ktime_get_boottime_ns(),
+				  p->lead_thread->pid, pdd->dev->id, 'R');
+	}
+	kfd_unref_process(p);
+}
+
 int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
 {
 	struct kfd_smi_client *client;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
index ec5d74a2fef4..b23292637239 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
@@ -42,4 +42,8 @@ void kfd_smi_event_migration_start(struct kfd_dev *dev, pid_t pid,
 void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
 			     unsigned long start, unsigned long end,
 			     uint32_t from, uint32_t to, uint32_t trigger);
+void kfd_smi_event_queue_eviction(struct kfd_dev *dev, pid_t pid,
+				  uint32_t trigger);
+void kfd_smi_event_queue_restore(struct kfd_dev *dev, pid_t pid);
+void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm);
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 5cead2a0e819..ddc1e4651919 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1730,14 +1730,16 @@ static void svm_range_restore_work(struct work_struct *work)
 	mutex_unlock(&svms->lock);
 	mmap_write_unlock(mm);
 	mutex_unlock(&process_info->lock);
-	mmput(mm);
 
 	/* If validation failed, reschedule another attempt */
 	if (evicted_ranges) {
 		pr_debug("reschedule to restore svm range\n");
 		schedule_delayed_work(&svms->restore_work,
 			msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
+
+		kfd_smi_event_queue_restore_rescheduled(mm);
 	}
+	mmput(mm);
 }
 
 /**
@@ -1793,7 +1795,7 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
 			 prange->svms, prange->start, prange->last);
 
 		/* First eviction, stop the queues */
-		r = kgd2kfd_quiesce_mm(mm);
+		r = kgd2kfd_quiesce_mm(mm, KFD_QUEUE_EVICTION_TRIGGER_SVM);
 		if (r)
 			pr_debug("failed to quiesce KFD\n");
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v5 6/11] drm/amdkfd: Add unmap from GPU SMI event
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (4 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore " Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-30 14:39   ` Felix Kuehling
  2022-06-28 14:50 ` [PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client Philip Yang
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

An SVM range is unmapped from GPUs when the range is unmapped from the CPU,
or, with XNACK on, from the MMU notifier when the range is evicted or
migrated.
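
The unmap payload added below is "%lld -%d @%lx(%lx) %x %d" (timestamp,
pid, start address, size, gpu id, trigger). A userspace parser sketch,
mirroring the format string in the patch (function name illustrative):

```c
#include <assert.h>
#include <stdio.h>

/* Parse the payload of an unmap-from-GPU SMI log line:
 * "%lld -%d @%lx(%lx) %x %d" -> timestamp, pid, address, size,
 * gpu id, trigger. Returns 1 on success, 0 on a malformed line. */
static int parse_unmap_from_gpu(const char *msg, long long *timestamp,
				int *pid, unsigned long *addr,
				unsigned long *size, unsigned int *gpu_id,
				int *trigger)
{
	return sscanf(msg, "%lld -%d @%lx(%lx) %x %d",
		      timestamp, pid, addr, size, gpu_id, trigger) == 6;
}
```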

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  9 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h |  3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c        | 25 +++++++++++++++------
 3 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index 3917c38204d0..e5896b7a16dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -318,6 +318,15 @@ void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm)
 	kfd_unref_process(p);
 }
 
+void kfd_smi_event_unmap_from_gpu(struct kfd_dev *dev, pid_t pid,
+				  unsigned long address, unsigned long last,
+				  uint32_t trigger)
+{
+	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_UNMAP_FROM_GPU,
+			  "%lld -%d @%lx(%lx) %x %d\n", ktime_get_boottime_ns(),
+			  pid, address, last - address + 1, dev->id, trigger);
+}
+
 int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
 {
 	struct kfd_smi_client *client;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
index b23292637239..76fe4e0ec2d2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
@@ -46,4 +46,7 @@ void kfd_smi_event_queue_eviction(struct kfd_dev *dev, pid_t pid,
 				  uint32_t trigger);
 void kfd_smi_event_queue_restore(struct kfd_dev *dev, pid_t pid);
 void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm);
+void kfd_smi_event_unmap_from_gpu(struct kfd_dev *dev, pid_t pid,
+				  unsigned long address, unsigned long last,
+				  uint32_t trigger);
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index ddc1e4651919..bf888ae84c92 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1200,7 +1200,7 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 
 static int
 svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start,
-			  unsigned long last)
+			  unsigned long last, uint32_t trigger)
 {
 	DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE);
 	struct kfd_process_device *pdd;
@@ -1232,6 +1232,9 @@ svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start,
 			return -EINVAL;
 		}
 
+		kfd_smi_event_unmap_from_gpu(pdd->dev, p->lead_thread->pid,
+					     start, last, trigger);
+
 		r = svm_range_unmap_from_gpu(pdd->dev->adev,
 					     drm_priv_to_vm(pdd->drm_priv),
 					     start, last, &fence);
@@ -1759,7 +1762,8 @@ static void svm_range_restore_work(struct work_struct *work)
  */
 static int
 svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
-		unsigned long start, unsigned long last)
+		unsigned long start, unsigned long last,
+		enum mmu_notifier_event event)
 {
 	struct svm_range_list *svms = prange->svms;
 	struct svm_range *pchild;
@@ -1804,6 +1808,12 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
 			msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
 	} else {
 		unsigned long s, l;
+		uint32_t trigger;
+
+		if (event == MMU_NOTIFY_MIGRATE)
+			trigger = KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY_MIGRATE;
+		else
+			trigger = KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY;
 
 		pr_debug("invalidate unmap svms 0x%p [0x%lx 0x%lx] from GPUs\n",
 			 prange->svms, start, last);
@@ -1812,13 +1822,13 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
 			s = max(start, pchild->start);
 			l = min(last, pchild->last);
 			if (l >= s)
-				svm_range_unmap_from_gpus(pchild, s, l);
+				svm_range_unmap_from_gpus(pchild, s, l, trigger);
 			mutex_unlock(&pchild->lock);
 		}
 		s = max(start, prange->start);
 		l = min(last, prange->last);
 		if (l >= s)
-			svm_range_unmap_from_gpus(prange, s, l);
+			svm_range_unmap_from_gpus(prange, s, l, trigger);
 	}
 
 	return r;
@@ -2232,6 +2242,7 @@ static void
 svm_range_unmap_from_cpu(struct mm_struct *mm, struct svm_range *prange,
 			 unsigned long start, unsigned long last)
 {
+	uint32_t trigger = KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU;
 	struct svm_range_list *svms;
 	struct svm_range *pchild;
 	struct kfd_process *p;
@@ -2259,14 +2270,14 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, struct svm_range *prange,
 		s = max(start, pchild->start);
 		l = min(last, pchild->last);
 		if (l >= s)
-			svm_range_unmap_from_gpus(pchild, s, l);
+			svm_range_unmap_from_gpus(pchild, s, l, trigger);
 		svm_range_unmap_split(mm, prange, pchild, start, last);
 		mutex_unlock(&pchild->lock);
 	}
 	s = max(start, prange->start);
 	l = min(last, prange->last);
 	if (l >= s)
-		svm_range_unmap_from_gpus(prange, s, l);
+		svm_range_unmap_from_gpus(prange, s, l, trigger);
 	svm_range_unmap_split(mm, prange, prange, start, last);
 
 	if (unmap_parent)
@@ -2333,7 +2344,7 @@ svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
 		svm_range_unmap_from_cpu(mni->mm, prange, start, last);
 		break;
 	default:
-		svm_range_evict(prange, mni->mm, start, last);
+		svm_range_evict(prange, mni->mm, start, last, range->event);
 		break;
 	}
 
-- 
2.35.1



* [PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (5 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH v5 6/11] drm/amdkfd: Add unmap from GPU " Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-30 14:45   ` Felix Kuehling
  2022-06-28 14:50 ` [PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event Philip Yang
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

The synchronize_rcu call may take several milliseconds, which noticeably
slows down applications closing the SMI event handle. Use call_rcu to free
client->fifo and the client asynchronously, eliminating the synchronize_rcu
call in the user thread.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index e5896b7a16dd..0472b56de245 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
@@ -38,6 +38,7 @@ struct kfd_smi_client {
 	uint64_t events;
 	struct kfd_dev *dev;
 	spinlock_t lock;
+	struct rcu_head rcu;
 	pid_t pid;
 	bool suser;
 };
@@ -137,6 +138,14 @@ static ssize_t kfd_smi_ev_write(struct file *filep, const char __user *user,
 	return sizeof(events);
 }
 
+static void kfd_smi_ev_client_free(struct rcu_head *p)
+{
+	struct kfd_smi_client *ev = container_of(p, struct kfd_smi_client, rcu);
+
+	kfifo_free(&ev->fifo);
+	kfree(ev);
+}
+
 static int kfd_smi_ev_release(struct inode *inode, struct file *filep)
 {
 	struct kfd_smi_client *client = filep->private_data;
@@ -146,10 +155,7 @@ static int kfd_smi_ev_release(struct inode *inode, struct file *filep)
 	list_del_rcu(&client->list);
 	spin_unlock(&dev->smi_lock);
 
-	synchronize_rcu();
-	kfifo_free(&client->fifo);
-	kfree(client);
-
+	call_rcu(&client->rcu, kfd_smi_ev_client_free);
 	return 0;
 }
 
-- 
2.35.1



* [PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (6 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-30 14:45   ` Felix Kuehling
  2022-06-28 14:50 ` [PATCH 9/11] libhsakmt: hsaKmtGetNodeProperties add gpu_id Philip Yang
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

Indicate that SMI profiling events are available.
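
Userspace should gate use of the new events on the reported ioctl version.
A minimal check sketch (the helper name is illustrative; the 1.10 threshold
comes from the version bump below):

```c
#include <assert.h>
#include <stdbool.h>

/* True when the KFD ioctl version reported by the kernel is at least
 * 1.10, i.e. the SMI profiler event log is available. */
static bool kfd_has_smi_profiling(unsigned int major, unsigned int minor)
{
	return major > 1 || (major == 1 && minor >= 10);
}
```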

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f239e260796b..b024e8ba865d 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -35,9 +35,10 @@
  * - 1.7 - Checkpoint Restore (CRIU) API
  * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs
  * - 1.9 - Add available memory ioctl
+ * - 1.10 - Add SMI profiler event log
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 9
+#define KFD_IOCTL_MINOR_VERSION 10
 
 struct kfd_ioctl_get_version_args {
 	__u32 major_version;	/* from KFD */
-- 
2.35.1



* [PATCH 9/11] libhsakmt: hsaKmtGetNodeProperties add gpu_id
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (7 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-28 14:50 ` [PATCH 10/11] libhsakmt: add open SMI event handle Philip Yang
  2022-06-28 14:50 ` [PATCH 11/11] ROCR-Runtime Basic SVM profiler Philip Yang
  10 siblings, 0 replies; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

Add KFDGpuID to HsaNodeProperties to return the gpu_id to the upper layer.
The gpu_id is a hash ID generated by KFD to distinguish GPUs on the system.
ROCr and ROCProfiler will use gpu_id to analyze SMI event messages.

Change-Id: I6eabe6849230e04120674f5bc55e6ea254a532d6
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 include/hsakmttypes.h |  4 +++-
 src/fmm.c             |  7 +++----
 src/libhsakmt.h       |  1 -
 src/topology.c        | 26 ++++++++++++--------------
 4 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h
index 9063f85..ab2591b 100644
--- a/include/hsakmttypes.h
+++ b/include/hsakmttypes.h
@@ -328,7 +328,9 @@ typedef struct _HsaNodeProperties
 
     HSAuint32       VGPRSizePerCU;     // VGPR size in bytes per CU
     HSAuint32       SGPRSizePerCU;     // SGPR size in bytes per CU
-    HSAuint8        Reserved[12];
+
+    HSAuint32       KFDGpuID;          // GPU Hash ID generated by KFD
+    HSAuint8        Reserved[8];
 } HsaNodeProperties;
 
 
diff --git a/src/fmm.c b/src/fmm.c
index 35da3b8..92b76e1 100644
--- a/src/fmm.c
+++ b/src/fmm.c
@@ -2170,7 +2170,6 @@ HSAKMT_STATUS fmm_init_process_apertures(unsigned int NumNodes)
 {
 	uint32_t i;
 	int32_t gpu_mem_id = 0;
-	uint32_t gpu_id;
 	HsaNodeProperties props;
 	struct kfd_process_device_apertures *process_apertures;
 	uint32_t num_of_sysfs_nodes;
@@ -2235,14 +2234,14 @@ HSAKMT_STATUS fmm_init_process_apertures(unsigned int NumNodes)
 
 	for (i = 0; i < NumNodes; i++) {
 		memset(&props, 0, sizeof(props));
-		ret = topology_sysfs_get_node_props(i, &props, &gpu_id, NULL, NULL);
+		ret = topology_sysfs_get_node_props(i, &props, NULL, NULL);
 		if (ret != HSAKMT_STATUS_SUCCESS)
 			goto sysfs_parse_failed;
 
 		topology_setup_is_dgpu_param(&props);
 
 		/* Skip non-GPU nodes */
-		if (gpu_id != 0) {
+		if (props.KFDGpuID) {
 			int fd = open_drm_render_device(props.DrmRenderMinor);
 			if (fd <= 0) {
 				ret = HSAKMT_STATUS_ERROR;
@@ -2254,7 +2253,7 @@ HSAKMT_STATUS fmm_init_process_apertures(unsigned int NumNodes)
 			gpu_mem[gpu_mem_count].EngineId.ui32.Stepping = props.EngineId.ui32.Stepping;
 
 			gpu_mem[gpu_mem_count].drm_render_fd = fd;
-			gpu_mem[gpu_mem_count].gpu_id = gpu_id;
+			gpu_mem[gpu_mem_count].gpu_id = props.KFDGpuID;
 			gpu_mem[gpu_mem_count].local_mem_size = props.LocalMemSize;
 			gpu_mem[gpu_mem_count].device_id = props.DeviceId;
 			gpu_mem[gpu_mem_count].node_id = i;
diff --git a/src/libhsakmt.h b/src/libhsakmt.h
index e4246e0..822744b 100644
--- a/src/libhsakmt.h
+++ b/src/libhsakmt.h
@@ -173,7 +173,6 @@ HSAKMT_STATUS validate_nodeid_array(uint32_t **gpu_id_array,
 		uint32_t NumberOfNodes, uint32_t *NodeArray);
 
 HSAKMT_STATUS topology_sysfs_get_node_props(uint32_t node_id, HsaNodeProperties *props,
-					uint32_t *gpu_id,
 					bool *p2p_links, uint32_t *num_p2pLinks);
 HSAKMT_STATUS topology_sysfs_get_system_props(HsaSystemProperties *props);
 void topology_setup_is_dgpu_param(HsaNodeProperties *props);
diff --git a/src/topology.c b/src/topology.c
index 81ff62f..99a6a03 100644
--- a/src/topology.c
+++ b/src/topology.c
@@ -56,7 +56,6 @@
 #define KFD_SYSFS_PATH_NODES "/sys/devices/virtual/kfd/kfd/topology/nodes"
 
 typedef struct {
-	uint32_t gpu_id;
 	HsaNodeProperties node;
 	HsaMemoryProperties *mem;     /* node->NumBanks elements */
 	HsaCacheProperties *cache;
@@ -1037,7 +1036,6 @@ static int topology_get_marketing_name(int minor, uint16_t *marketing_name)
 
 HSAKMT_STATUS topology_sysfs_get_node_props(uint32_t node_id,
 					    HsaNodeProperties *props,
-					    uint32_t *gpu_id,
 					    bool *p2p_links,
 					    uint32_t *num_p2pLinks)
 {
@@ -1056,13 +1054,14 @@ HSAKMT_STATUS topology_sysfs_get_node_props(uint32_t node_id,
 	HSAKMT_STATUS ret = HSAKMT_STATUS_SUCCESS;
 
 	assert(props);
-	assert(gpu_id);
 	ret = topology_sysfs_map_node_id(node_id, &sys_node_id);
 	if (ret != HSAKMT_STATUS_SUCCESS)
 		return ret;
 
 	/* Retrieve the GPU ID */
-	ret = topology_sysfs_get_gpu_id(sys_node_id, gpu_id);
+	ret = topology_sysfs_get_gpu_id(sys_node_id, &props->KFDGpuID);
+	if (ret != HSAKMT_STATUS_SUCCESS)
+		return ret;
 
 	read_buf = malloc(PAGE_SIZE);
 	if (!read_buf)
@@ -1723,7 +1722,7 @@ static int32_t gpu_get_direct_link_cpu(uint32_t gpu_node, node_props_t *node_pro
 	HsaIoLinkProperties *props = node_props[gpu_node].link;
 	uint32_t i;
 
-	if (!node_props[gpu_node].gpu_id || !props ||
+	if (!node_props[gpu_node].node.KFDGpuID || !props ||
 			node_props[gpu_node].node.NumIOLinks == 0)
 		return -1;
 
@@ -1776,7 +1775,7 @@ static HSAKMT_STATUS get_indirect_iolink_info(uint32_t node1, uint32_t node2,
 		return HSAKMT_STATUS_INVALID_PARAMETER;
 
 	/* CPU->CPU is not an indirect link */
-	if (!node_props[node1].gpu_id && !node_props[node2].gpu_id)
+	if (!node_props[node1].node.KFDGpuID && !node_props[node2].node.KFDGpuID)
 		return HSAKMT_STATUS_INVALID_NODE_UNIT;
 
 	if (node_props[node1].node.HiveID &&
@@ -1784,16 +1783,16 @@ static HSAKMT_STATUS get_indirect_iolink_info(uint32_t node1, uint32_t node2,
 	    node_props[node1].node.HiveID == node_props[node2].node.HiveID)
 		return HSAKMT_STATUS_INVALID_PARAMETER;
 
-	if (node_props[node1].gpu_id)
+	if (node_props[node1].node.KFDGpuID)
 		dir_cpu1 = gpu_get_direct_link_cpu(node1, node_props);
-	if (node_props[node2].gpu_id)
+	if (node_props[node2].node.KFDGpuID)
 		dir_cpu2 = gpu_get_direct_link_cpu(node2, node_props);
 
 	if (dir_cpu1 < 0 && dir_cpu2 < 0)
 		return HSAKMT_STATUS_ERROR;
 
 	/* if the node2(dst) is GPU , it need to be large bar for host access*/
-	if (node_props[node2].gpu_id) {
+	if (node_props[node2].node.KFDGpuID) {
 		for (i = 0; i < node_props[node2].node.NumMemoryBanks; ++i)
 			if (node_props[node2].mem[i].HeapType ==
 				HSA_HEAPTYPE_FRAME_BUFFER_PUBLIC)
@@ -1922,7 +1921,6 @@ retry:
 		for (i = 0; i < sys_props.NumNodes; i++) {
 			ret = topology_sysfs_get_node_props(i,
 					&temp_props[i].node,
-					&temp_props[i].gpu_id,
 					&p2p_links, &num_p2pLinks);
 			if (ret != HSAKMT_STATUS_SUCCESS) {
 				free_properties(temp_props, i);
@@ -1963,7 +1961,7 @@ retry:
 						goto err;
 					}
 				}
-			} else if (!temp_props[i].gpu_id) { /* a CPU node */
+			} else if (!temp_props[i].node.KFDGpuID) { /* a CPU node */
 				ret = topology_get_cpu_cache_props(
 						i, cpuinfo, &temp_props[i]);
 				if (ret != HSAKMT_STATUS_SUCCESS) {
@@ -2104,7 +2102,7 @@ HSAKMT_STATUS validate_nodeid(uint32_t nodeid, uint32_t *gpu_id)
 	if (!g_props || !g_system || g_system->NumNodes <= nodeid)
 		return HSAKMT_STATUS_INVALID_NODE_UNIT;
 	if (gpu_id)
-		*gpu_id = g_props[nodeid].gpu_id;
+		*gpu_id = g_props[nodeid].node.KFDGpuID;
 
 	return HSAKMT_STATUS_SUCCESS;
 }
@@ -2114,7 +2112,7 @@ HSAKMT_STATUS gpuid_to_nodeid(uint32_t gpu_id, uint32_t *node_id)
 	uint64_t node_idx;
 
 	for (node_idx = 0; node_idx < g_system->NumNodes; node_idx++) {
-		if (g_props[node_idx].gpu_id == gpu_id) {
+		if (g_props[node_idx].node.KFDGpuID == gpu_id) {
 			*node_id = node_idx;
 			return HSAKMT_STATUS_SUCCESS;
 		}
@@ -2383,7 +2381,7 @@ uint16_t get_device_id_by_gpu_id(HSAuint32 gpu_id)
 		return 0;
 
 	for (i = 0; i < g_system->NumNodes; i++) {
-		if (g_props[i].gpu_id == gpu_id)
+		if (g_props[i].node.KFDGpuID == gpu_id)
 			return g_props[i].node.DeviceId;
 	}
 
-- 
2.35.1



* [PATCH 10/11] libhsakmt: add open SMI event handle
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (8 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH 9/11] libhsakmt: hsaKmtGetNodeProperties add gpu_id Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  2022-06-28 14:50 ` [PATCH 11/11] ROCR-Runtime Basic SVM profiler Philip Yang
  10 siblings, 0 replies; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Philip Yang, Felix.Kuehling

System Management Interface events are read from an anonymous file handle.
This helper wraps the ioctl interface to get the anonymous file handle for
a GPU node ID.

Define SMI event IDs and event triggers, copying the same values from
kfd_ioctl.h to avoid translation.
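
After hsaKmtOpenSMI returns the fd, events are enabled by writing a 64-bit
mask to it. A self-contained sketch of building the "all events" mask from
the macros introduced below (values copied here so the example compiles on
its own):

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the hsakmttypes.h additions in this patch. */
#define HSA_SMI_EVENT_MASK_FROM_INDEX(i) (1ULL << ((i) - 1))
#define HSA_SMI_EVENT_INDEX_MAX 12

/* Mask with one bit set per defined event (indices 1..11); writing
 * this value to the SMI fd subscribes to all per-process events. */
static uint64_t smi_all_events_mask(void)
{
	return HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_INDEX_MAX) - 1;
}
```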

Change-Id: I5c8ba5301473bb3b80bb4e2aa33a9f675bedb001
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
---
 include/hsakmt.h      | 16 ++++++++++++++
 include/hsakmttypes.h | 49 +++++++++++++++++++++++++++++++++++++++++++
 src/events.c          | 27 ++++++++++++++++++++++++
 src/libhsakmt.ver     |  1 +
 4 files changed, 93 insertions(+)

diff --git a/include/hsakmt.h b/include/hsakmt.h
index abc617f..ca586ba 100644
--- a/include/hsakmt.h
+++ b/include/hsakmt.h
@@ -877,6 +877,22 @@ hsaKmtGetXNACKMode(
     HSAint32 * enable  // OUT: returns XNACK value.
 );
 
+/**
+   Open anonymous file handle to enable events and read SMI events.
+
+   To enable events, write 64bit events mask to fd, event enums as bit index.
+   for example, event mask (HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_INDEX_MAX) - 1) to enable all events
+
+   Read event from fd is not blocking, use poll with timeout value to check if event is available.
+   Event is dropped if kernel event fifo is full.
+*/
+HSAKMT_STATUS
+HSAKMTAPI
+hsaKmtOpenSMI(
+    HSAuint32 NodeId,   // IN: GPU node_id to receive the SMI event from
+    int *fd             // OUT: anonymous file handle
+);
+
 #ifdef __cplusplus
 }   //extern "C"
 #endif
diff --git a/include/hsakmttypes.h b/include/hsakmttypes.h
index ab2591b..690e001 100644
--- a/include/hsakmttypes.h
+++ b/include/hsakmttypes.h
@@ -1354,6 +1354,55 @@ typedef struct _HSA_SVM_ATTRIBUTE {
 	HSAuint32 value; // attribute value
 } HSA_SVM_ATTRIBUTE;
 
+typedef enum _HSA_SMI_EVENT {
+	HSA_SMI_EVENT_NONE = 0, /* not used */
+	HSA_SMI_EVENT_VMFAULT = 1, /* event start counting at 1 */
+	HSA_SMI_EVENT_THERMAL_THROTTLE = 2,
+	HSA_SMI_EVENT_GPU_PRE_RESET = 3,
+	HSA_SMI_EVENT_GPU_POST_RESET = 4,
+	HSA_SMI_EVENT_MIGRATE_START = 5,
+	HSA_SMI_EVENT_MIGRATE_END = 6,
+	HSA_SMI_EVENT_PAGE_FAULT_START = 7,
+	HSA_SMI_EVENT_PAGE_FAULT_END = 8,
+	HSA_SMI_EVENT_QUEUE_EVICTION = 9,
+	HSA_SMI_EVENT_QUEUE_RESTORE = 10,
+	HSA_SMI_EVENT_UNMAP_FROM_GPU = 11,
+	HSA_SMI_EVENT_INDEX_MAX = 12,
+
+	/*
+	 * max event number, as a flag bit to get events from all processes,
+	 * this requires super user permission, otherwise will not be able to
+	 * receive event from any process. Without this flag to receive events
+	 * from same process.
+	 */
+	HSA_SMI_EVENT_ALL_PROCESS = 64
+} HSA_EVENT_TYPE;
+
+typedef enum _HSA_MIGRATE_TRIGGERS {
+	HSA_MIGRATE_TRIGGER_PREFETCH,
+	HSA_MIGRATE_TRIGGER_PAGEFAULT_GPU,
+	HSA_MIGRATE_TRIGGER_PAGEFAULT_CPU,
+	HSA_MIGRATE_TRIGGER_TTM_EVICTION
+} HSA_MIGRATE_TRIGGERS;
+
+typedef enum _HSA_QUEUE_EVICTION_TRIGGERS {
+	HSA_QUEUE_EVICTION_TRIGGER_SVM,
+	HSA_QUEUE_EVICTION_TRIGGER_USERPTR,
+	HSA_QUEUE_EVICTION_TRIGGER_TTM,
+	HSA_QUEUE_EVICTION_TRIGGER_SUSPEND,
+	HSA_QUEUE_EVICTION_CRIU_CHECKPOINT,
+	HSA_QUEUE_EVICTION_CRIU_RESTORE
+} HSA_QUEUE_EVICTION_TRIGGERS;
+
+typedef enum _HSA_SVM_UNMAP_TRIGGERS {
+	HSA_SVM_UNMAP_TRIGGER_MMU_NOTIFY,
+	HSA_SVM_UNMAP_TRIGGER_MMU_NOTIFY_MIGRATE,
+	HSA_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU
+} HSA_SVM_UNMAP_TRIGGERS;
+
+#define HSA_SMI_EVENT_MASK_FROM_INDEX(i) (1ULL << ((i) - 1))
+#define HSA_SMI_EVENT_MSG_SIZE	96
+
 #pragma pack(pop, hsakmttypes_h)
 
 
diff --git a/src/events.c b/src/events.c
index d4c751c..06d3959 100644
--- a/src/events.c
+++ b/src/events.c
@@ -339,3 +339,30 @@ out:
 
 	return result;
 }
+
+HSAKMT_STATUS HSAKMTAPI hsaKmtOpenSMI(HSAuint32 NodeId, int *fd)
+{
+	struct kfd_ioctl_smi_events_args args;
+	HSAKMT_STATUS result;
+	uint32_t gpuid;
+
+	CHECK_KFD_OPEN();
+
+	pr_debug("[%s] node %d\n", __func__, NodeId);
+
+	result = validate_nodeid(NodeId, &gpuid);
+	if (result != HSAKMT_STATUS_SUCCESS) {
+		pr_err("[%s] invalid node ID: %d\n", __func__, NodeId);
+		return result;
+	}
+
+	args.gpuid = gpuid;
+	result = kmtIoctl(kfd_fd, AMDKFD_IOC_SMI_EVENTS, &args);
+	if (result) {
+		pr_debug("open SMI event fd failed %s\n", strerror(errno));
+		return HSAKMT_STATUS_ERROR;
+	}
+
+	*fd = args.anon_fd;
+	return HSAKMT_STATUS_SUCCESS;
+}
diff --git a/src/libhsakmt.ver b/src/libhsakmt.ver
index 50c309d..46370c6 100644
--- a/src/libhsakmt.ver
+++ b/src/libhsakmt.ver
@@ -69,6 +69,7 @@ hsaKmtSVMSetAttr;
 hsaKmtSVMGetAttr;
 hsaKmtSetXNACKMode;
 hsaKmtGetXNACKMode;
+hsaKmtOpenSMI;
 
 local: *;
 };
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 11/11] ROCR-Runtime Basic SVM profiler
  2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
                   ` (9 preceding siblings ...)
  2022-06-28 14:50 ` [PATCH 10/11] libhsakmt: add open SMI event handle Philip Yang
@ 2022-06-28 14:50 ` Philip Yang
  10 siblings, 0 replies; 20+ messages in thread
From: Philip Yang @ 2022-06-28 14:50 UTC (permalink / raw)
  To: amd-gfx; +Cc: Felix.Kuehling, Sean Keely

From: Sean Keely <Sean.Keely@amd.com>

Mostly a demo at this point.  Logs SVM (aka HMM) info to
HSA_SVM_PROFILE if set.

Example: HSA_SVM_PROFILE=log.txt SomeApp

Change-Id: Ib6fd688f661a21b2c695f586b833be93662a15f4
---
 src/CMakeLists.txt                |   1 +
 src/core/inc/amd_gpu_agent.h      |   3 +
 src/core/inc/runtime.h            |   9 +
 src/core/inc/svm_profiler.h       |  67 ++++++
 src/core/runtime/runtime.cpp      |   8 +
 src/core/runtime/svm_profiler.cpp | 364 ++++++++++++++++++++++++++++++
 src/core/util/flag.h              |   6 +
 7 files changed, 458 insertions(+)
 create mode 100644 src/core/inc/svm_profiler.h
 create mode 100644 src/core/runtime/svm_profiler.cpp

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 8fb02b14..1b7bf9b0 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -180,6 +180,7 @@ set ( SRCS core/util/lnx/os_linux.cpp
            core/runtime/signal.cpp
            core/runtime/queue.cpp
            core/runtime/cache.cpp
+           core/runtime/svm_profiler.cpp
            core/common/shared.cpp
            core/common/hsa_table_interface.cpp
            loader/executable.cpp
diff --git a/src/core/inc/amd_gpu_agent.h b/src/core/inc/amd_gpu_agent.h
index ed64d5be..fbdccaae 100644
--- a/src/core/inc/amd_gpu_agent.h
+++ b/src/core/inc/amd_gpu_agent.h
@@ -283,6 +283,9 @@ class GpuAgent : public GpuAgentInt {
   // @brief Returns Hive ID
   __forceinline uint64_t HiveId() const override { return  properties_.HiveID; }
 
+  // @brief Returns KFD's GPU id, a hash used internally by KFD.
+  __forceinline uint64_t KfdGpuID() const { return properties_.KFDGpuID; }
+
   // @brief Returns node property.
   __forceinline const HsaNodeProperties& properties() const {
     return properties_;
diff --git a/src/core/inc/runtime.h b/src/core/inc/runtime.h
index 9f5b8acc..13190c75 100644
--- a/src/core/inc/runtime.h
+++ b/src/core/inc/runtime.h
@@ -50,6 +50,7 @@
 #include <memory>
 #include <tuple>
 #include <utility>
+#include <thread>
 
 #include "core/inc/hsa_ext_interface.h"
 #include "core/inc/hsa_internal.h"
@@ -60,6 +61,7 @@
 #include "core/inc/memory_region.h"
 #include "core/inc/signal.h"
 #include "core/inc/interrupt_signal.h"
+#include "core/inc/svm_profiler.h"
 #include "core/util/flag.h"
 #include "core/util/locks.h"
 #include "core/util/os.h"
@@ -312,6 +314,8 @@ class Runtime {
 
   const std::vector<uint32_t>& gpu_ids() { return gpu_ids_; }
 
+  Agent* agent_by_gpuid(uint32_t gpuid) { return agents_by_gpuid_[gpuid]; }
+
   Agent* region_gpu() { return region_gpu_; }
 
   const std::vector<const MemoryRegion*>& system_regions_fine() const {
@@ -508,6 +512,9 @@ class Runtime {
   // Agent map containing all agents indexed by their KFD node IDs.
   std::map<uint32_t, std::vector<Agent*> > agents_by_node_;
 
+  // Agent map containing all agents indexed by their KFD gpuid.
+  std::map<uint32_t, Agent*> agents_by_gpuid_;
+
   // Agent list containing all compatible gpu agent ids in the platform.
   std::vector<uint32_t> gpu_ids_;
 
@@ -590,6 +597,8 @@ class Runtime {
   // Kfd version
   KfdVersion_t kfd_version;
 
+  std::unique_ptr<AMD::SvmProfileControl> svm_profile_;
+
   // Frees runtime memory when the runtime library is unloaded if safe to do so.
   // Failure to release the runtime indicates an incorrect application but is
   // common (example: calls library routines at process exit).
diff --git a/src/core/inc/svm_profiler.h b/src/core/inc/svm_profiler.h
new file mode 100644
index 00000000..064965c7
--- /dev/null
+++ b/src/core/inc/svm_profiler.h
@@ -0,0 +1,67 @@
+////////////////////////////////////////////////////////////////////////////////
+//
+// The University of Illinois/NCSA
+// Open Source License (NCSA)
+//
+// Copyright (c) 2022-2022, Advanced Micro Devices, Inc. All rights reserved.
+//
+// Developed by:
+//
+//                 AMD Research and AMD HSA Software Development
+//
+//                 Advanced Micro Devices, Inc.
+//
+//                 www.amd.com
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to
+// deal with the Software without restriction, including without limitation
+// the rights to use, copy, modify, merge, publish, distribute, sublicense,
+// and/or sell copies of the Software, and to permit persons to whom the
+// Software is furnished to do so, subject to the following conditions:
+//
+//  - Redistributions of source code must retain the above copyright notice,
+//    this list of conditions and the following disclaimers.
+//  - Redistributions in binary form must reproduce the above copyright
+//    notice, this list of conditions and the following disclaimers in
+//    the documentation and/or other materials provided with the distribution.
+//  - Neither the names of Advanced Micro Devices, Inc,
+//    nor the names of its contributors may be used to endorse or promote
+//    products derived from this Software without specific prior written
+//    permission.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+// DEALINGS WITH THE SOFTWARE.
+//
+////////////////////////////////////////////////////////////////////////////////
+
+#ifndef HSA_RUNTIME_CORE_INC_SVM_PROFILER_H_
+#define HSA_RUNTIME_CORE_INC_SVM_PROFILER_H_
+
+#include <vector>
+#include <string>
+#include <thread>
+
+namespace rocr {
+namespace AMD {
+
+    class SvmProfileControl {
+    public:
+      SvmProfileControl();
+      ~SvmProfileControl();
+
+    private:
+      template <typename... Args> std::string format(const char* format, Args... arg);
+      int event;
+      std::thread* thread;
+      std::vector<char> format_buffer;
+    };
+
+} // namespace AMD
+} // namespace rocr
+#endif // header guard
diff --git a/src/core/runtime/runtime.cpp b/src/core/runtime/runtime.cpp
index 40ebc35e..cb7ba992 100644
--- a/src/core/runtime/runtime.cpp
+++ b/src/core/runtime/runtime.cpp
@@ -48,6 +48,7 @@
 #include <string>
 #include <thread>
 #include <vector>
+#include <cstdio>
 
 #include "core/common/shared.h"
 #include "core/inc/hsa_ext_interface.h"
@@ -158,6 +159,8 @@ void Runtime::RegisterAgent(Agent* agent) {
   if (agent->device_type() == Agent::DeviceType::kAmdCpuDevice) {
     cpu_agents_.push_back(agent);
 
+    agents_by_gpuid_[0] = agent;
+
     // Add cpu regions to the system region list.
     for (const core::MemoryRegion* region : agent->regions()) {
       if (region->fine_grain()) {
@@ -1375,10 +1378,15 @@ hsa_status_t Runtime::Load() {
   // Load tools libraries
   LoadTools();
 
+  // Load svm profiler
+  svm_profile_.reset(new AMD::SvmProfileControl);
+
   return HSA_STATUS_SUCCESS;
 }
 
 void Runtime::Unload() {
+  svm_profile_.reset(nullptr);
+
   UnloadTools();
   UnloadExtensions();
 
diff --git a/src/core/runtime/svm_profiler.cpp b/src/core/runtime/svm_profiler.cpp
new file mode 100644
index 00000000..537b3a05
--- /dev/null
+++ b/src/core/runtime/svm_profiler.cpp
@@ -0,0 +1,364 @@
+////////////////////////////////////////////////////////////////////////////////
+//
+// The University of Illinois/NCSA
+// Open Source License (NCSA)
+//
+// Copyright (c) 2022-2022, Advanced Micro Devices, Inc. All rights reserved.
+//
+// Developed by:
+//
+//                 AMD Research and AMD HSA Software Development
+//
+//                 Advanced Micro Devices, Inc.
+//
+//                 www.amd.com
+//
+// Permission is hereby granted, free of charge, to any person obtaining a copy
+// of this software and associated documentation files (the "Software"), to
+// deal with the Software without restriction, including without limitation
+// the rights to use, copy, modify, merge, publish, distribute, sublicense,
+// and/or sell copies of the Software, and to permit persons to whom the
+// Software is furnished to do so, subject to the following conditions:
+//
+//  - Redistributions of source code must retain the above copyright notice,
+//    this list of conditions and the following disclaimers.
+//  - Redistributions in binary form must reproduce the above copyright
+//    notice, this list of conditions and the following disclaimers in
+//    the documentation and/or other materials provided with the distribution.
+//  - Neither the names of Advanced Micro Devices, Inc,
+//    nor the names of its contributors may be used to endorse or promote
+//    products derived from this Software without specific prior written
+//    permission.
+//
+// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+// THE CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+// OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+// ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+// DEALINGS WITH THE SOFTWARE.
+//
+////////////////////////////////////////////////////////////////////////////////
+
+#include "core/inc/svm_profiler.h"
+
+#include <stdint.h>
+#include <algorithm>
+#include <sys/eventfd.h>
+#include <poll.h>
+
+#include "hsakmt/hsakmt.h"
+
+#include "core/util/utils.h"
+#include "core/inc/runtime.h"
+#include "core/inc/agent.h"
+#include "core/inc/amd_gpu_agent.h"
+
+namespace rocr {
+namespace AMD {
+
+static const char* smi_event_string(uint32_t event) {
+  static const char* strings[] = {"NONE",
+                                  "VMFAULT",
+                                  "THERMAL_THROTTLE",
+                                  "GPU_PRE_RESET",
+                                  "GPU_POST_RESET",
+                                  "MIGRATE_START",
+                                  "MIGRATE_END",
+                                  "PAGE_FAULT_START",
+                                  "PAGE_FAULT_END",
+                                  "QUEUE_EVICTION",
+                                  "QUEUE_RESTORE",
+                                  "UNMAP_FROM_GPU",
+                                  "UNKNOWN"};
+
+  event = std::min<uint32_t>(event, sizeof(strings) / sizeof(char*) - 1);
+  return strings[event];
+}
+
+static const char* smi_migrate_string(uint32_t trigger) {
+  static const char* strings[] = {"PREFETCH",
+                                  "PAGEFAULT_GPU",
+                                  "PAGEFAULT_CPU",
+                                  "TTM_EVICTION",
+                                  "UNKNOWN"};
+
+  trigger = std::min<uint32_t>(trigger, sizeof(strings) / sizeof(char*) - 1);
+  return strings[trigger];
+}
+
+static const char* smi_eviction_string(uint32_t trigger) {
+  static const char* strings[] = {"SVM",
+                                  "USERPTR",
+                                  "TTM",
+                                  "SUSPEND",
+                                  "CRIU_CHECKPOINT",
+                                  "CRIU_RESTORE",
+                                  "UNKNOWN"};
+
+  trigger = std::min<uint32_t>(trigger, sizeof(strings) / sizeof(char*) - 1);
+  return strings[trigger];
+}
+
+static const char* smi_unmap_string(uint32_t trigger) {
+  static const char* strings[] = {"MMU_NOTIFY",
+                                  "MMU_NOTIFY_MIGRATE",
+                                  "UNMAP_FROM_CPU",
+                                  "UNKNOWN"};
+
+  trigger = std::min<uint32_t>(trigger, sizeof(strings) / sizeof(char*) - 1);
+  return strings[trigger];
+}
+
+SvmProfileControl::SvmProfileControl() : event(-1), thread(nullptr) {
+  event = eventfd(0, EFD_CLOEXEC);
+  if (event == -1) return;
+
+  thread = new std::thread([&]() {
+    if (core::Runtime::runtime_singleton_->flag().svm_profile().empty()) return;
+    FILE* logFile = fopen(core::Runtime::runtime_singleton_->flag().svm_profile().c_str(), "a");
+    if (logFile == NULL) return;
+    MAKE_NAMED_SCOPE_GUARD(logGuard, [&]() { fclose(logFile); });
+
+    std::vector<pollfd> files;
+    files.resize(core::Runtime::runtime_singleton_->gpu_agents().size() + 1);
+    files[0].fd = event;
+    files[0].events = POLLIN;
+    files[0].revents = 0;
+
+    HSAuint64 events = 0;
+    events = HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_MIGRATE_START) |
+        HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_MIGRATE_END) |
+        HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_PAGE_FAULT_START) |
+        HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_PAGE_FAULT_END) |
+        HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_QUEUE_EVICTION) |
+        HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_QUEUE_RESTORE) |
+        HSA_SMI_EVENT_MASK_FROM_INDEX(HSA_SMI_EVENT_UNMAP_FROM_GPU);
+
+    for (int i = 0; i < core::Runtime::runtime_singleton_->gpu_agents().size(); i++) {
+      auto err =
+          hsaKmtOpenSMI(core::Runtime::runtime_singleton_->gpu_agents()[i]->node_id(), &files[i + 1].fd);
+      assert(err == HSAKMT_STATUS_SUCCESS);
+      files[i + 1].events = POLLIN;
+      files[i + 1].revents = 0;
+      // Enable collecting masked events.
+      auto wrote = write(files[i + 1].fd, &events, sizeof(events));
+      assert(wrote == sizeof(events));
+    }
+    MAKE_NAMED_SCOPE_GUARD(smiGuard, [&]() {
+      for (int i = 1; i < files.size(); i++) {
+        close(files[i].fd);
+      }
+    });
+
+    std::vector<std::string> smi_records;
+    smi_records.resize(core::Runtime::runtime_singleton_->gpu_agents().size() + 1);
+    char buffer[HSA_SMI_EVENT_MSG_SIZE + 1];
+
+    auto format_agent = [this](uint32_t gpuid) {
+      std::string ret;
+      core::Agent* agent = core::Runtime::runtime_singleton_->agent_by_gpuid(gpuid);
+      if (agent->device_type() == core::Agent::kAmdCpuDevice)
+        return std::string("CPU");
+      else
+        return format("GPU%u(%p)", ((AMD::GpuAgent*)agent)->enumeration_index(),
+                      agent->public_handle());
+    };
+
+    while (true) {
+      int ready = poll(&files[0], files.size(), -1);
+      if (ready < 1) {
+        assert(false && "poll failed!");
+        return;
+      }
+
+      for (int i = 1; i < files.size(); i++) {
+        if (files[i].revents & POLLIN) {
+          memset(buffer, 0, sizeof(buffer));
+          auto len = read(files[i].fd, buffer, sizeof(buffer) - 1);
+          if (len > 0) {
+            buffer[len] = '\0';
+            // printf("%s\n", buffer);
+            // fprintf(logFile, "%s\n", buffer);
+
+            smi_records[i] += buffer;
+
+            while (true) {
+              size_t pos = smi_records[i].find('\n');
+              if (pos == std::string::npos) break;
+
+              std::string line = smi_records[i].substr(0, pos);
+              smi_records[i].erase(0, pos + 1);
+
+              const char* cursor;
+              cursor = line.c_str();
+
+              // Event records follow the format:
+              //   event_id timestamp -pid event_specific_info trigger
+              // timestamp, pid, and trigger are in decimal; all other
+              // fields are in hex. The event_specific substring is
+              // listed for each event type. See kfd_ioctl.h for more info.
+              int event_id;
+              uint64_t time;
+              int pid;
+              int offset = 0;
+              int args = sscanf(cursor, "%x %lu -%u%n", &event_id, &time, &pid, &offset);
+              assert(args == 3 && "Parsing error!");
+
+              std::string detail;
+              cursor += offset + 1;
+              switch (event_id) {
+                //@addr(size) from->to prefetch_location:preferred_location
+                case HSA_SMI_EVENT_MIGRATE_START: {
+                  uint64_t addr;
+                  uint32_t size;
+                  uint32_t from, to;
+                  uint32_t trigger = 0;
+                  uint32_t fetch, pref;
+                  args = sscanf(cursor, "@%lx(%x) %x->%x %x:%x %u", &addr, &size, &from, &to,
+                                &fetch, &pref, &trigger);
+                  assert(args == 7 && "Parsing error!");
+
+                  addr *= 4096;
+                  size *= 4096;
+
+                  std::string from_agent = format_agent(from);
+                  std::string to_agent = format_agent(to);
+                  std::string range = format("[%p, %p]", addr, addr + size - 1);
+                  std::string cause = smi_migrate_string(trigger);
+                  detail = cause + " " + from_agent + "->" + to_agent + " " + range;
+                  break;
+                }
+                //@addr(size) from->to
+                case HSA_SMI_EVENT_MIGRATE_END: {
+                  uint64_t addr;
+                  uint32_t size;
+                  uint32_t from, to;
+                  uint32_t trigger;
+                  args = sscanf(cursor, "@%lx(%x) %x->%x %u", &addr, &size, &from, &to, &trigger);
+                  assert(args == 5 && "Parsing error!");
+
+                  addr *= 4096;
+                  size *= 4096;
+
+                  std::string from_agent = format_agent(from);
+                  std::string to_agent = format_agent(to);
+                  std::string range = format("[%p, %p]", addr, addr + size - 1);
+                  std::string cause = smi_migrate_string(trigger);
+                  detail = cause + " " + from_agent + "->" + to_agent + " " + range;
+                  break;
+                }
+                //@addr(gpu_id) W/R
+                case HSA_SMI_EVENT_PAGE_FAULT_START: {
+                  uint64_t addr;
+                  uint32_t gpuid;
+                  char mode;
+                  args = sscanf(cursor, "@%lx(%x) %c", &addr, &gpuid, &mode);
+
+                  addr *= 4096;
+
+                  assert(args == 3 && "Parsing error!");
+                  std::string agent = format_agent(gpuid);
+                  std::string range = std::to_string(addr);
+                  std::string cause = (mode == 'W') ? "Write" : "Read";
+                  detail = cause + " " + agent + " " + range;
+                  break;
+                }
+                //@addr(gpu_id) M/U  (migration / page table update)
+                case HSA_SMI_EVENT_PAGE_FAULT_END: {
+                  uint64_t addr;
+                  uint32_t gpuid;
+                  char mode;
+                  args = sscanf(cursor, "@%lx(%x) %c", &addr, &gpuid, &mode);
+                  assert(args == 3 && "Parsing error!");
+
+                  addr *= 4096;
+
+                  std::string agent = format_agent(gpuid);
+                  std::string range = std::to_string(addr);
+                  std::string cause = (mode == 'M') ? "Migration" : "Map";
+                  detail = cause + " " + agent + " " + range;
+                  break;
+                }
+                // gpu_id
+                case HSA_SMI_EVENT_QUEUE_EVICTION: {
+                  uint32_t gpuid;
+                  uint32_t trigger;
+                  args = sscanf(cursor, "%x %u", &gpuid, &trigger);
+                  assert(args == 2 && "Parsing error!");
+                  std::string agent = format_agent(gpuid);
+                  std::string cause = smi_eviction_string(trigger);
+                  detail = cause + " " + agent;
+                  break;
+                }
+                // gpu_id
+                case HSA_SMI_EVENT_QUEUE_RESTORE: {
+                  uint32_t gpuid;
+                  uint32_t trigger;
+                  args = sscanf(cursor, "%x %u", &gpuid, &trigger);
+                  assert(args == 2 && "Parsing error!");
+                  std::string agent = format_agent(gpuid);
+                  std::string cause = smi_eviction_string(trigger);
+                  detail = cause + " " + agent;
+                  break;
+                }
+                //@addr(size) gpu_id
+                case HSA_SMI_EVENT_UNMAP_FROM_GPU: {
+                  uint64_t addr;
+                  uint32_t size;
+                  uint32_t gpuid;
+                  uint32_t trigger;
+                  args = sscanf(cursor, "@%lx(%x) %x %u", &addr, &size, &gpuid, &trigger);
+                  assert(args == 4 && "Parsing error!");
+
+                  addr *= 4096;
+                  size *= 4096;
+
+                  std::string gpu = format_agent(gpuid);
+                  std::string range = format("[%p, %p]", addr, addr + size - 1);
+                  std::string cause = smi_unmap_string(trigger);
+                  detail = cause + " " + gpu + " " + range;
+                  break;
+                }
+                default:;
+              }
+
+              std::string record = std::string("ROCr HMM event: ") + std::to_string(time) + " " +
+                  smi_event_string(event_id) + " " + detail;
+              // printf("%s\n", record.c_str());
+              fprintf(logFile, "%s\n", record.c_str());
+            }
+          } else {
+            auto err = errno;
+            const char* msg = strerror(err);
+            // printf("ROCr HMM event error: Read returned %ld, %s (%d)\n", len, msg, err);
+            fprintf(logFile, "ROCr HMM event error: Read returned %ld, %s (%d)\n", len, msg, err);
+          }
+          files[i].revents = 0;
+        }
+      }
+
+      if (files[0].revents & POLLIN) return;
+    }
+  });
+}
+
+SvmProfileControl::~SvmProfileControl() {
+  if (event != -1) eventfd_write(event, 1);
+  // thread is nullptr if eventfd creation failed in the constructor.
+  if (thread) {
+    thread->join();
+    delete thread;
+  }
+  if (event != -1) close(event);
+}
+
+template <typename... Args>
+std::string SvmProfileControl::format(const char* format, Args... args) {
+  // data() is valid even while the buffer is empty (unlike &buf[0]);
+  // size the buffer from the first snprintf pass, then format for real.
+  int len = snprintf(format_buffer.data(), format_buffer.size(), format, args...);
+  if (len < 0) return std::string();
+  if (static_cast<size_t>(len) + 1 > format_buffer.size()) {
+    format_buffer.resize(len + 1);
+    snprintf(format_buffer.data(), format_buffer.size(), format, args...);
+  }
+  return std::string(format_buffer.data());
+}
+
+} // namespace AMD
+} // namespace rocr
diff --git a/src/core/util/flag.h b/src/core/util/flag.h
index 045a6d0c..212ab013 100644
--- a/src/core/util/flag.h
+++ b/src/core/util/flag.h
@@ -153,6 +153,9 @@ class Flag {
     // Will become opt-out and possibly removed in future releases.
     var = os::GetEnvVar("HSA_COOP_CU_COUNT");
     coop_cu_count_ = (var == "1") ? true : false;
+
+    var = os::GetEnvVar("HSA_SVM_PROFILE");
+    svm_profile_ = var;
   }
 
   void parse_masks(uint32_t maxGpu, uint32_t maxCU) {
@@ -221,6 +224,8 @@ class Flag {
 
   bool coop_cu_count() const { return coop_cu_count_; }
 
+  const std::string& svm_profile() const { return svm_profile_; }
+
  private:
   bool check_flat_scratch_;
   bool enable_vm_fault_message_;
@@ -252,6 +257,7 @@ class Flag {
   size_t scratch_mem_size_;
 
   std::string tools_lib_names_;
+  std::string svm_profile_;
 
   size_t force_sdma_size_;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault SMI event
  2022-06-28 14:50 ` [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault " Philip Yang
@ 2022-06-30 14:19   ` Felix Kuehling
  2022-06-30 14:53     ` philip yang
  0 siblings, 1 reply; 20+ messages in thread
From: Felix Kuehling @ 2022-06-30 14:19 UTC (permalink / raw)
  To: Philip Yang, amd-gfx


On 2022-06-28 10:50, Philip Yang wrote:
> Use ktime_get_boottime_ns() as the timestamp to correlate with other
> APIs. When a GPU recoverable fault starts and ends, output a timestamp,
> whether migration happened or only the GPU page table was updated to
> recover, the fault address, and whether it was a read or write fault.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 17 +++++++++++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h |  6 +++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c        | 17 +++++++++++++----
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.h        |  2 +-
>   4 files changed, 36 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> index 55ed026435e2..b7e68283925f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> @@ -244,6 +244,23 @@ void kfd_smi_event_update_vmfault(struct kfd_dev *dev, uint16_t pasid)
>   			  task_info.pid, task_info.task_name);
>   }
>   
> +void kfd_smi_event_page_fault_start(struct kfd_dev *dev, pid_t pid,
> +				    unsigned long address, bool write_fault,
> +				    ktime_t ts)
> +{
> +	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_PAGE_FAULT_START,
> +			  "%lld -%d @%lx(%x) %c\n", ktime_to_ns(ts), pid,
> +			  address, dev->id, write_fault ? 'W' : 'R');
> +}
> +
> +void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
> +				  unsigned long address, bool migration)
> +{
> +	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_PAGE_FAULT_END,
> +			  "%lld -%d @%lx(%x) %c\n", ktime_get_boottime_ns(),
> +			  pid, address, dev->id, migration ? 'M' : 'U');
> +}
> +
>   int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
>   {
>   	struct kfd_smi_client *client;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> index dfe101c21166..7903718cd9eb 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> @@ -29,5 +29,9 @@ void kfd_smi_event_update_vmfault(struct kfd_dev *dev, uint16_t pasid);
>   void kfd_smi_event_update_thermal_throttling(struct kfd_dev *dev,
>   					     uint64_t throttle_bitmask);
>   void kfd_smi_event_update_gpu_reset(struct kfd_dev *dev, bool post_reset);
> -
> +void kfd_smi_event_page_fault_start(struct kfd_dev *dev, pid_t pid,
> +				    unsigned long address, bool write_fault,
> +				    ktime_t ts);
> +void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
> +				  unsigned long address, bool migration);
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index d6fc00d51c8c..2ad08a1f38dd 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -32,6 +32,7 @@
>   #include "kfd_priv.h"
>   #include "kfd_svm.h"
>   #include "kfd_migrate.h"
> +#include "kfd_smi_events.h"
>   
>   #ifdef dev_fmt
>   #undef dev_fmt
> @@ -1617,7 +1618,7 @@ static int svm_range_validate_and_map(struct mm_struct *mm,
>   	svm_range_unreserve_bos(&ctx);
>   
>   	if (!r)
> -		prange->validate_timestamp = ktime_to_us(ktime_get());
> +		prange->validate_timestamp = ktime_get_boottime();
>   
>   	return r;
>   }
> @@ -2694,11 +2695,12 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>   	struct svm_range_list *svms;
>   	struct svm_range *prange;
>   	struct kfd_process *p;
> -	uint64_t timestamp;
> +	ktime_t timestamp = ktime_get_boottime();
>   	int32_t best_loc;
>   	int32_t gpuidx = MAX_GPU_INSTANCE;
>   	bool write_locked = false;
>   	struct vm_area_struct *vma;
> +	bool migration = false;
>   	int r = 0;
>   
>   	if (!KFD_IS_SVM_API_SUPPORTED(adev->kfd.dev)) {
> @@ -2775,9 +2777,9 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>   		goto out_unlock_range;
>   	}
>   
> -	timestamp = ktime_to_us(ktime_get()) - prange->validate_timestamp;
>   	/* skip duplicate vm fault on different pages of same range */
> -	if (timestamp < AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING) {
> +	if (ktime_before(timestamp, ktime_add_ns(prange->validate_timestamp,
> +				AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING))) {

You changed the timestamp units from us to ns. I think you'll need to 
update AMDGPU_SVM_RANGE_RETRY_FAULT_PENDING (multiply it by 1000) to 
account for that.

Other than that, this patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


>   		pr_debug("svms 0x%p [0x%lx %lx] already restored\n",
>   			 svms, prange->start, prange->last);
>   		r = 0;
> @@ -2813,7 +2815,11 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>   		 svms, prange->start, prange->last, best_loc,
>   		 prange->actual_loc);
>   
> +	kfd_smi_event_page_fault_start(adev->kfd.dev, p->lead_thread->pid, addr,
> +				       write_fault, timestamp);
> +
>   	if (prange->actual_loc != best_loc) {
> +		migration = true;
>   		if (best_loc) {
>   			r = svm_migrate_to_vram(prange, best_loc, mm);
>   			if (r) {
> @@ -2842,6 +2848,9 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>   		pr_debug("failed %d to map svms 0x%p [0x%lx 0x%lx] to gpus\n",
>   			 r, svms, prange->start, prange->last);
>   
> +	kfd_smi_event_page_fault_end(adev->kfd.dev, p->lead_thread->pid, addr,
> +				     migration);
> +
>   out_unlock_range:
>   	mutex_unlock(&prange->migrate_mutex);
>   out_unlock_svms:
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> index 2d54147b4dda..eab7f6d3b13c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.h
> @@ -125,7 +125,7 @@ struct svm_range {
>   	uint32_t			actual_loc;
>   	uint8_t				granularity;
>   	atomic_t			invalid;
> -	uint64_t			validate_timestamp;
> +	ktime_t				validate_timestamp;
>   	struct mmu_interval_notifier	notifier;
>   	struct svm_work_list_item	work_item;
>   	struct list_head		deferred_list;

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v5 4/11] drm/amdkfd: Add migration SMI event
  2022-06-28 14:50 ` [PATCH v5 4/11] drm/amdkfd: Add migration " Philip Yang
@ 2022-06-30 14:29   ` Felix Kuehling
  0 siblings, 0 replies; 20+ messages in thread
From: Felix Kuehling @ 2022-06-30 14:29 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

On 2022-06-28 10:50, Philip Yang wrote:
> For the migration start and end events, output the timestamp when
> migration starts and ends, the svm range address and size, the GPU ids
> of the migration source and destination, and the svm range attributes.
>
> The migration trigger can be prefetch, a CPU or GPU page fault, or TTM
> eviction.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.c    | 53 ++++++++++++++++-----
>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.h    |  5 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 22 +++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h |  8 ++++
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c        | 16 ++++---
>   5 files changed, 83 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> index fb8a94e52656..9667015a6cbc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
> @@ -32,6 +32,7 @@
>   #include "kfd_priv.h"
>   #include "kfd_svm.h"
>   #include "kfd_migrate.h"
> +#include "kfd_smi_events.h"
>   
>   #ifdef dev_fmt
>   #undef dev_fmt
> @@ -402,8 +403,9 @@ svm_migrate_copy_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>   static long
>   svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>   			struct vm_area_struct *vma, uint64_t start,
> -			uint64_t end)
> +			uint64_t end, uint32_t trigger)
>   {
> +	struct kfd_process *p = container_of(prange->svms, struct kfd_process, svms);
>   	uint64_t npages = (end - start) >> PAGE_SHIFT;
>   	struct kfd_process_device *pdd;
>   	struct dma_fence *mfence = NULL;
> @@ -430,6 +432,11 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>   	migrate.dst = migrate.src + npages;
>   	scratch = (dma_addr_t *)(migrate.dst + npages);
>   
> +	kfd_smi_event_migration_start(adev->kfd.dev, p->lead_thread->pid,
> +				      start >> PAGE_SHIFT, end >> PAGE_SHIFT,
> +				      0, adev->kfd.dev->id, prange->prefetch_loc,
> +				      prange->preferred_loc, trigger);
> +
>   	r = migrate_vma_setup(&migrate);
>   	if (r) {
>   		dev_err(adev->dev, "%s: vma setup fail %d range [0x%lx 0x%lx]\n",
> @@ -458,6 +465,10 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>   	svm_migrate_copy_done(adev, mfence);
>   	migrate_vma_finalize(&migrate);
>   
> +	kfd_smi_event_migration_end(adev->kfd.dev, p->lead_thread->pid,
> +				    start >> PAGE_SHIFT, end >> PAGE_SHIFT,
> +				    0, adev->kfd.dev->id, trigger);
> +
>   	svm_range_dma_unmap(adev->dev, scratch, 0, npages);
>   	svm_range_free_dma_mappings(prange);
>   
> @@ -479,6 +490,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>    * @prange: range structure
>    * @best_loc: the device to migrate to
>    * @mm: the process mm structure
> + * @trigger: reason of migration
>    *
>    * Context: Process context, caller hold mmap read lock, svms lock, prange lock
>    *
> @@ -487,7 +499,7 @@ svm_migrate_vma_to_vram(struct amdgpu_device *adev, struct svm_range *prange,
>    */
>   static int
>   svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
> -			struct mm_struct *mm)
> +			struct mm_struct *mm, uint32_t trigger)
>   {
>   	unsigned long addr, start, end;
>   	struct vm_area_struct *vma;
> @@ -524,7 +536,7 @@ svm_migrate_ram_to_vram(struct svm_range *prange, uint32_t best_loc,
>   			break;
>   
>   		next = min(vma->vm_end, end);
> -		r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next);
> +		r = svm_migrate_vma_to_vram(adev, prange, vma, addr, next, trigger);
>   		if (r < 0) {
>   			pr_debug("failed %ld to migrate\n", r);
>   			break;
> @@ -655,8 +667,10 @@ svm_migrate_copy_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
>    */
>   static long
>   svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
> -		       struct vm_area_struct *vma, uint64_t start, uint64_t end)
> +		       struct vm_area_struct *vma, uint64_t start, uint64_t end,
> +		       uint32_t trigger)
>   {
> +	struct kfd_process *p = container_of(prange->svms, struct kfd_process, svms);
>   	uint64_t npages = (end - start) >> PAGE_SHIFT;
>   	unsigned long upages = npages;
>   	unsigned long cpages = 0;
> @@ -685,6 +699,11 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
>   	migrate.dst = migrate.src + npages;
>   	scratch = (dma_addr_t *)(migrate.dst + npages);
>   
> +	kfd_smi_event_migration_start(adev->kfd.dev, p->lead_thread->pid,
> +				      start >> PAGE_SHIFT, end >> PAGE_SHIFT,
> +				      adev->kfd.dev->id, 0, prange->prefetch_loc,
> +				      prange->preferred_loc, trigger);
> +
>   	r = migrate_vma_setup(&migrate);
>   	if (r) {
>   		dev_err(adev->dev, "%s: vma setup fail %d range [0x%lx 0x%lx]\n",
> @@ -715,6 +734,11 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
>   
>   	svm_migrate_copy_done(adev, mfence);
>   	migrate_vma_finalize(&migrate);
> +
> +	kfd_smi_event_migration_end(adev->kfd.dev, p->lead_thread->pid,
> +				    start >> PAGE_SHIFT, end >> PAGE_SHIFT,
> +				    adev->kfd.dev->id, 0, trigger);
> +
>   	svm_range_dma_unmap(adev->dev, scratch, 0, npages);
>   
>   out_free:
> @@ -732,13 +756,15 @@ svm_migrate_vma_to_ram(struct amdgpu_device *adev, struct svm_range *prange,
>    * svm_migrate_vram_to_ram - migrate svm range from device to system
>    * @prange: range structure
>    * @mm: process mm, use current->mm if NULL
> + * @trigger: reason of migration
>    *
>    * Context: Process context, caller hold mmap read lock, prange->migrate_mutex
>    *
>    * Return:
>    * 0 - OK, otherwise error code
>    */
> -int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
> +int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm,
> +			    uint32_t trigger)
>   {
>   	struct amdgpu_device *adev;
>   	struct vm_area_struct *vma;
> @@ -779,7 +805,7 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
>   		}
>   
>   		next = min(vma->vm_end, end);
> -		r = svm_migrate_vma_to_ram(adev, prange, vma, addr, next);
> +		r = svm_migrate_vma_to_ram(adev, prange, vma, addr, next, trigger);
>   		if (r < 0) {
>   			pr_debug("failed %ld to migrate prange %p\n", r, prange);
>   			break;
> @@ -802,6 +828,7 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
>    * @prange: range structure
>    * @best_loc: the device to migrate to
>    * @mm: process mm, use current->mm if NULL
> + * @trigger: reason of migration
>    *
>    * Context: Process context, caller hold mmap read lock, svms lock, prange lock
>    *
> @@ -810,7 +837,7 @@ int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm)
>    */
>   static int
>   svm_migrate_vram_to_vram(struct svm_range *prange, uint32_t best_loc,
> -			 struct mm_struct *mm)
> +			 struct mm_struct *mm, uint32_t trigger)
>   {
>   	int r, retries = 3;
>   
> @@ -822,7 +849,7 @@ svm_migrate_vram_to_vram(struct svm_range *prange, uint32_t best_loc,
>   	pr_debug("from gpu 0x%x to gpu 0x%x\n", prange->actual_loc, best_loc);
>   
>   	do {
> -		r = svm_migrate_vram_to_ram(prange, mm);
> +		r = svm_migrate_vram_to_ram(prange, mm, trigger);
>   		if (r)
>   			return r;
>   	} while (prange->actual_loc && --retries);
> @@ -830,17 +857,17 @@ svm_migrate_vram_to_vram(struct svm_range *prange, uint32_t best_loc,
>   	if (prange->actual_loc)
>   		return -EDEADLK;
>   
> -	return svm_migrate_ram_to_vram(prange, best_loc, mm);
> +	return svm_migrate_ram_to_vram(prange, best_loc, mm, trigger);
>   }
>   
>   int
>   svm_migrate_to_vram(struct svm_range *prange, uint32_t best_loc,
> -		    struct mm_struct *mm)
> +		    struct mm_struct *mm, uint32_t trigger)
>   {
>   	if  (!prange->actual_loc)
> -		return svm_migrate_ram_to_vram(prange, best_loc, mm);
> +		return svm_migrate_ram_to_vram(prange, best_loc, mm, trigger);
>   	else
> -		return svm_migrate_vram_to_vram(prange, best_loc, mm);
> +		return svm_migrate_vram_to_vram(prange, best_loc, mm, trigger);
>   
>   }
>   
> @@ -909,7 +936,7 @@ static vm_fault_t svm_migrate_to_ram(struct vm_fault *vmf)
>   		goto out_unlock_prange;
>   	}
>   
> -	r = svm_migrate_vram_to_ram(prange, mm);
> +	r = svm_migrate_vram_to_ram(prange, mm, KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU);
>   	if (r)
>   		pr_debug("failed %d migrate 0x%p [0x%lx 0x%lx] to ram\n", r,
>   			 prange, prange->start, prange->last);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
> index 2f5b3394c9ed..b3f0754b32fa 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.h
> @@ -41,8 +41,9 @@ enum MIGRATION_COPY_DIR {
>   };
>   
>   int svm_migrate_to_vram(struct svm_range *prange,  uint32_t best_loc,
> -			struct mm_struct *mm);
> -int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm);
> +			struct mm_struct *mm, uint32_t trigger);
> +int svm_migrate_vram_to_ram(struct svm_range *prange, struct mm_struct *mm,
> +			    uint32_t trigger);
>   unsigned long
>   svm_migrate_addr_to_pfn(struct amdgpu_device *adev, unsigned long addr);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> index b7e68283925f..ec4d278c2a47 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> @@ -261,6 +261,28 @@ void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
>   			  pid, address, dev->id, migration ? 'M' : 'U');
>   }
>   
> +void kfd_smi_event_migration_start(struct kfd_dev *dev, pid_t pid,
> +				   unsigned long start, unsigned long end,
> +				   uint32_t from, uint32_t to,
> +				   uint32_t prefetch_loc, uint32_t preferred_loc,
> +				   uint32_t trigger)
> +{
> +	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_MIGRATE_START,
> +			  "%lld -%d @%lx(%lx) %x->%x %x:%x %d\n",
> +			  ktime_get_boottime_ns(), pid, start, end - start,
> +			  from, to, prefetch_loc, preferred_loc, trigger);
> +}
> +
> +void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
> +				 unsigned long start, unsigned long end,
> +				 uint32_t from, uint32_t to, uint32_t trigger)
> +{
> +	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_MIGRATE_END,
> +			  "%lld -%d @%lx(%lx) %x->%x %d\n",
> +			  ktime_get_boottime_ns(), pid, start, end - start,
> +			  from, to, trigger);
> +}
> +
>   int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
>   {
>   	struct kfd_smi_client *client;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> index 7903718cd9eb..ec5d74a2fef4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> @@ -34,4 +34,12 @@ void kfd_smi_event_page_fault_start(struct kfd_dev *dev, pid_t pid,
>   				    ktime_t ts);
>   void kfd_smi_event_page_fault_end(struct kfd_dev *dev, pid_t pid,
>   				  unsigned long address, bool migration);
> +void kfd_smi_event_migration_start(struct kfd_dev *dev, pid_t pid,
> +			     unsigned long start, unsigned long end,
> +			     uint32_t from, uint32_t to,
> +			     uint32_t prefetch_loc, uint32_t preferred_loc,
> +			     uint32_t trigger);
> +void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
> +			     unsigned long start, unsigned long end,
> +			     uint32_t from, uint32_t to, uint32_t trigger);
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 2ad08a1f38dd..5cead2a0e819 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -2821,7 +2821,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>   	if (prange->actual_loc != best_loc) {
>   		migration = true;
>   		if (best_loc) {
> -			r = svm_migrate_to_vram(prange, best_loc, mm);
> +			r = svm_migrate_to_vram(prange, best_loc, mm,
> +					KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU);
>   			if (r) {
>   				pr_debug("svm_migrate_to_vram failed (%d) at %llx, falling back to system memory\n",
>   					 r, addr);
> @@ -2829,12 +2830,14 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
>   				 * VRAM failed
>   				 */
>   				if (prange->actual_loc)
> -					r = svm_migrate_vram_to_ram(prange, mm);
> +					r = svm_migrate_vram_to_ram(prange, mm,
> +					   KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU);
>   				else
>   					r = 0;
>   			}
>   		} else {
> -			r = svm_migrate_vram_to_ram(prange, mm);
> +			r = svm_migrate_vram_to_ram(prange, mm,
> +					KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU);
>   		}
>   		if (r) {
>   			pr_debug("failed %d to migrate svms %p [0x%lx 0x%lx]\n",
> @@ -3157,12 +3160,12 @@ svm_range_trigger_migration(struct mm_struct *mm, struct svm_range *prange,
>   		return 0;
>   
>   	if (!best_loc) {
> -		r = svm_migrate_vram_to_ram(prange, mm);
> +		r = svm_migrate_vram_to_ram(prange, mm, KFD_MIGRATE_TRIGGER_PREFETCH);
>   		*migrated = !r;
>   		return r;
>   	}
>   
> -	r = svm_migrate_to_vram(prange, best_loc, mm);
> +	r = svm_migrate_to_vram(prange, best_loc, mm, KFD_MIGRATE_TRIGGER_PREFETCH);
>   	*migrated = !r;
>   
>   	return r;
> @@ -3220,7 +3223,8 @@ static void svm_range_evict_svm_bo_worker(struct work_struct *work)
>   		mutex_lock(&prange->migrate_mutex);
>   		do {
>   			r = svm_migrate_vram_to_ram(prange,
> -						svm_bo->eviction_fence->mm);
> +						svm_bo->eviction_fence->mm,
> +						KFD_MIGRATE_TRIGGER_TTM_EVICTION);
>   		} while (!r && prange->actual_loc && --retries);
>   
>   		if (!r && prange->actual_loc)


* Re: [PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore SMI event
  2022-06-28 14:50 ` [PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore " Philip Yang
@ 2022-06-30 14:36   ` Felix Kuehling
  0 siblings, 0 replies; 20+ messages in thread
From: Felix Kuehling @ 2022-06-30 14:36 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

Am 2022-06-28 um 10:50 schrieb Philip Yang:
> Output user queue eviction and restore events. User queue eviction may
> be triggered by the svm or userptr MMU notifier, TTM eviction, device
> suspend, or CRIU checkpoint and restore.
>
> User queue restore may be rescheduled if eviction happens again while
> restoring.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  2 +-
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 12 ++++---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  4 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  4 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 15 ++++++--
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c   | 35 +++++++++++++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h   |  4 +++
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c          |  6 ++--
>   9 files changed, 69 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index b25b41f50213..73bf8b5f2aa9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -336,7 +336,7 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
>   }
>   #endif
>   /* KGD2KFD callbacks */
> -int kgd2kfd_quiesce_mm(struct mm_struct *mm);
> +int kgd2kfd_quiesce_mm(struct mm_struct *mm, uint32_t trigger);
>   int kgd2kfd_resume_mm(struct mm_struct *mm);
>   int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
>   						struct dma_fence *fence);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index 5ba9070d8722..6a7e045ddcc5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -32,6 +32,7 @@
>   #include "amdgpu_dma_buf.h"
>   #include <uapi/linux/kfd_ioctl.h>
>   #include "amdgpu_xgmi.h"
> +#include "kfd_smi_events.h"
>   
>   /* Userptr restore delay, just long enough to allow consecutive VM
>    * changes to accumulate
> @@ -2381,7 +2382,7 @@ int amdgpu_amdkfd_evict_userptr(struct kgd_mem *mem,
>   	evicted_bos = atomic_inc_return(&process_info->evicted_bos);
>   	if (evicted_bos == 1) {
>   		/* First eviction, stop the queues */
> -		r = kgd2kfd_quiesce_mm(mm);
> +		r = kgd2kfd_quiesce_mm(mm, KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
>   		if (r)
>   			pr_err("Failed to quiesce KFD\n");
>   		schedule_delayed_work(&process_info->restore_userptr_work,
> @@ -2655,13 +2656,16 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
>   
>   unlock_out:
>   	mutex_unlock(&process_info->lock);
> -	mmput(mm);
> -	put_task_struct(usertask);
>   
>   	/* If validation failed, reschedule another attempt */
> -	if (evicted_bos)
> +	if (evicted_bos) {
>   		schedule_delayed_work(&process_info->restore_userptr_work,
>   			msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
> +
> +		kfd_smi_event_queue_restore_rescheduled(mm);
> +	}
> +	mmput(mm);
> +	put_task_struct(usertask);
>   }
>   
>   /** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index a0246b4bae6b..6abfe10229a2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -2428,7 +2428,7 @@ static int criu_restore(struct file *filep,
>   	 * Set the process to evicted state to avoid running any new queues before all the memory
>   	 * mappings are ready.
>   	 */
> -	ret = kfd_process_evict_queues(p);
> +	ret = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_CRIU_RESTORE);
>   	if (ret)
>   		goto exit_unlock;
>   
> @@ -2547,7 +2547,7 @@ static int criu_process_info(struct file *filep,
>   		goto err_unlock;
>   	}
>   
> -	ret = kfd_process_evict_queues(p);
> +	ret = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_CRIU_CHECKPOINT);
>   	if (ret)
>   		goto err_unlock;
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index c8fee0dbfdcb..6ec0e9f0927d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -837,7 +837,7 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
>   	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
>   }
>   
> -int kgd2kfd_quiesce_mm(struct mm_struct *mm)
> +int kgd2kfd_quiesce_mm(struct mm_struct *mm, uint32_t trigger)
>   {
>   	struct kfd_process *p;
>   	int r;
> @@ -851,7 +851,7 @@ int kgd2kfd_quiesce_mm(struct mm_struct *mm)
>   		return -ESRCH;
>   
>   	WARN(debug_evictions, "Evicting pid %d", p->lead_thread->pid);
> -	r = kfd_process_evict_queues(p);
> +	r = kfd_process_evict_queues(p, trigger);
>   
>   	kfd_unref_process(p);
>   	return r;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 59ba50ce54d3..b9e7e9c52853 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -946,7 +946,7 @@ static inline struct kfd_process_device *kfd_process_device_from_gpuidx(
>   }
>   
>   void kfd_unref_process(struct kfd_process *p);
> -int kfd_process_evict_queues(struct kfd_process *p);
> +int kfd_process_evict_queues(struct kfd_process *p, uint32_t trigger);
>   int kfd_process_restore_queues(struct kfd_process *p);
>   void kfd_suspend_all_processes(void);
>   int kfd_resume_all_processes(void);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index a13e60d48b73..fc38a4d81420 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -43,6 +43,7 @@ struct mm_struct;
>   #include "kfd_device_queue_manager.h"
>   #include "kfd_iommu.h"
>   #include "kfd_svm.h"
> +#include "kfd_smi_events.h"
>   
>   /*
>    * List of struct kfd_process (field kfd_process).
> @@ -1736,7 +1737,7 @@ struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm)
>    * Eviction is reference-counted per process-device. This means multiple
>    * evictions from different sources can be nested safely.
>    */
> -int kfd_process_evict_queues(struct kfd_process *p)
> +int kfd_process_evict_queues(struct kfd_process *p, uint32_t trigger)
>   {
>   	int r = 0;
>   	int i;
> @@ -1745,6 +1746,9 @@ int kfd_process_evict_queues(struct kfd_process *p)
>   	for (i = 0; i < p->n_pdds; i++) {
>   		struct kfd_process_device *pdd = p->pdds[i];
>   
> +		kfd_smi_event_queue_eviction(pdd->dev, p->lead_thread->pid,
> +					     trigger);
> +
>   		r = pdd->dev->dqm->ops.evict_process_queues(pdd->dev->dqm,
>   							    &pdd->qpd);
>   		/* evict return -EIO if HWS is hang or asic is resetting, in this case
> @@ -1769,6 +1773,9 @@ int kfd_process_evict_queues(struct kfd_process *p)
>   
>   		if (n_evicted == 0)
>   			break;
> +
> +		kfd_smi_event_queue_restore(pdd->dev, p->lead_thread->pid);
> +
>   		if (pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
>   							      &pdd->qpd))
>   			pr_err("Failed to restore queues\n");
> @@ -1788,6 +1795,8 @@ int kfd_process_restore_queues(struct kfd_process *p)
>   	for (i = 0; i < p->n_pdds; i++) {
>   		struct kfd_process_device *pdd = p->pdds[i];
>   
> +		kfd_smi_event_queue_restore(pdd->dev, p->lead_thread->pid);
> +
>   		r = pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
>   							      &pdd->qpd);
>   		if (r) {
> @@ -1849,7 +1858,7 @@ static void evict_process_worker(struct work_struct *work)
>   	flush_delayed_work(&p->restore_work);
>   
>   	pr_debug("Started evicting pasid 0x%x\n", p->pasid);
> -	ret = kfd_process_evict_queues(p);
> +	ret = kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_TTM);
>   	if (!ret) {
>   		dma_fence_signal(p->ef);
>   		dma_fence_put(p->ef);
> @@ -1916,7 +1925,7 @@ void kfd_suspend_all_processes(void)
>   		cancel_delayed_work_sync(&p->eviction_work);
>   		cancel_delayed_work_sync(&p->restore_work);
>   
> -		if (kfd_process_evict_queues(p))
> +		if (kfd_process_evict_queues(p, KFD_QUEUE_EVICTION_TRIGGER_SUSPEND))
>   			pr_err("Failed to suspend process 0x%x\n", p->pasid);
>   		dma_fence_signal(p->ef);
>   		dma_fence_put(p->ef);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> index ec4d278c2a47..3917c38204d0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> @@ -283,6 +283,41 @@ void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
>   			  from, to, trigger);
>   }
>   
> +void kfd_smi_event_queue_eviction(struct kfd_dev *dev, pid_t pid,
> +				  uint32_t trigger)
> +{
> +	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_QUEUE_EVICTION,
> +			  "%lld -%d %x %d\n", ktime_get_boottime_ns(), pid,
> +			  dev->id, trigger);
> +}
> +
> +void kfd_smi_event_queue_restore(struct kfd_dev *dev, pid_t pid)
> +{
> +	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_QUEUE_RESTORE,
> +			  "%lld -%d %x\n", ktime_get_boottime_ns(), pid,
> +			  dev->id);
> +}
> +
> +void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm)
> +{
> +	struct kfd_process *p;
> +	int i;
> +
> +	p = kfd_lookup_process_by_mm(mm);
> +	if (!p)
> +		return;
> +
> +	for (i = 0; i < p->n_pdds; i++) {
> +		struct kfd_process_device *pdd = p->pdds[i];
> +
> +		kfd_smi_event_add(p->lead_thread->pid, pdd->dev,
> +				  KFD_SMI_EVENT_QUEUE_RESTORE,
> +				  "%lld -%d %x %c\n", ktime_get_boottime_ns(),
> +				  p->lead_thread->pid, pdd->dev->id, 'R');
> +	}
> +	kfd_unref_process(p);
> +}
> +
>   int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
>   {
>   	struct kfd_smi_client *client;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> index ec5d74a2fef4..b23292637239 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> @@ -42,4 +42,8 @@ void kfd_smi_event_migration_start(struct kfd_dev *dev, pid_t pid,
>   void kfd_smi_event_migration_end(struct kfd_dev *dev, pid_t pid,
>   			     unsigned long start, unsigned long end,
>   			     uint32_t from, uint32_t to, uint32_t trigger);
> +void kfd_smi_event_queue_eviction(struct kfd_dev *dev, pid_t pid,
> +				  uint32_t trigger);
> +void kfd_smi_event_queue_restore(struct kfd_dev *dev, pid_t pid);
> +void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm);
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 5cead2a0e819..ddc1e4651919 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1730,14 +1730,16 @@ static void svm_range_restore_work(struct work_struct *work)
>   	mutex_unlock(&svms->lock);
>   	mmap_write_unlock(mm);
>   	mutex_unlock(&process_info->lock);
> -	mmput(mm);
>   
>   	/* If validation failed, reschedule another attempt */
>   	if (evicted_ranges) {
>   		pr_debug("reschedule to restore svm range\n");
>   		schedule_delayed_work(&svms->restore_work,
>   			msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
> +
> +		kfd_smi_event_queue_restore_rescheduled(mm);
>   	}
> +	mmput(mm);
>   }
>   
>   /**
> @@ -1793,7 +1795,7 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>   			 prange->svms, prange->start, prange->last);
>   
>   		/* First eviction, stop the queues */
> -		r = kgd2kfd_quiesce_mm(mm);
> +		r = kgd2kfd_quiesce_mm(mm, KFD_QUEUE_EVICTION_TRIGGER_SVM);
>   		if (r)
>   			pr_debug("failed to quiesce KFD\n");
>   


* Re: [PATCH v5 6/11] drm/amdkfd: Add unmap from GPU SMI event
  2022-06-28 14:50 ` [PATCH v5 6/11] drm/amdkfd: Add unmap from GPU " Philip Yang
@ 2022-06-30 14:39   ` Felix Kuehling
  0 siblings, 0 replies; 20+ messages in thread
From: Felix Kuehling @ 2022-06-30 14:39 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

Am 2022-06-28 um 10:50 schrieb Philip Yang:
> An SVM range is unmapped from the GPUs when the range is unmapped from
> the CPU, or, with xnack on, from the MMU notifier when the range is
> evicted or migrated.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c |  9 ++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h |  3 +++
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c        | 25 +++++++++++++++------
>   3 files changed, 30 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> index 3917c38204d0..e5896b7a16dd 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> @@ -318,6 +318,15 @@ void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm)
>   	kfd_unref_process(p);
>   }
>   
> +void kfd_smi_event_unmap_from_gpu(struct kfd_dev *dev, pid_t pid,
> +				  unsigned long address, unsigned long last,
> +				  uint32_t trigger)
> +{
> +	kfd_smi_event_add(pid, dev, KFD_SMI_EVENT_UNMAP_FROM_GPU,
> +			  "%lld -%d @%lx(%lx) %x %d\n", ktime_get_boottime_ns(),
> +			  pid, address, last - address + 1, dev->id, trigger);
> +}
> +
>   int kfd_smi_event_open(struct kfd_dev *dev, uint32_t *fd)
>   {
>   	struct kfd_smi_client *client;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> index b23292637239..76fe4e0ec2d2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.h
> @@ -46,4 +46,7 @@ void kfd_smi_event_queue_eviction(struct kfd_dev *dev, pid_t pid,
>   				  uint32_t trigger);
>   void kfd_smi_event_queue_restore(struct kfd_dev *dev, pid_t pid);
>   void kfd_smi_event_queue_restore_rescheduled(struct mm_struct *mm);
> +void kfd_smi_event_unmap_from_gpu(struct kfd_dev *dev, pid_t pid,
> +				  unsigned long address, unsigned long last,
> +				  uint32_t trigger);
>   #endif
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index ddc1e4651919..bf888ae84c92 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -1200,7 +1200,7 @@ svm_range_unmap_from_gpu(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>   
>   static int
>   svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start,
> -			  unsigned long last)
> +			  unsigned long last, uint32_t trigger)
>   {
>   	DECLARE_BITMAP(bitmap, MAX_GPU_INSTANCE);
>   	struct kfd_process_device *pdd;
> @@ -1232,6 +1232,9 @@ svm_range_unmap_from_gpus(struct svm_range *prange, unsigned long start,
>   			return -EINVAL;
>   		}
>   
> +		kfd_smi_event_unmap_from_gpu(pdd->dev, p->lead_thread->pid,
> +					     start, last, trigger);
> +
>   		r = svm_range_unmap_from_gpu(pdd->dev->adev,
>   					     drm_priv_to_vm(pdd->drm_priv),
>   					     start, last, &fence);
> @@ -1759,7 +1762,8 @@ static void svm_range_restore_work(struct work_struct *work)
>    */
>   static int
>   svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
> -		unsigned long start, unsigned long last)
> +		unsigned long start, unsigned long last,
> +		enum mmu_notifier_event event)
>   {
>   	struct svm_range_list *svms = prange->svms;
>   	struct svm_range *pchild;
> @@ -1804,6 +1808,12 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>   			msecs_to_jiffies(AMDGPU_SVM_RANGE_RESTORE_DELAY_MS));
>   	} else {
>   		unsigned long s, l;
> +		uint32_t trigger;
> +
> +		if (event == MMU_NOTIFY_MIGRATE)
> +			trigger = KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY_MIGRATE;
> +		else
> +			trigger = KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY;
>   
>   		pr_debug("invalidate unmap svms 0x%p [0x%lx 0x%lx] from GPUs\n",
>   			 prange->svms, start, last);
> @@ -1812,13 +1822,13 @@ svm_range_evict(struct svm_range *prange, struct mm_struct *mm,
>   			s = max(start, pchild->start);
>   			l = min(last, pchild->last);
>   			if (l >= s)
> -				svm_range_unmap_from_gpus(pchild, s, l);
> +				svm_range_unmap_from_gpus(pchild, s, l, trigger);
>   			mutex_unlock(&pchild->lock);
>   		}
>   		s = max(start, prange->start);
>   		l = min(last, prange->last);
>   		if (l >= s)
> -			svm_range_unmap_from_gpus(prange, s, l);
> +			svm_range_unmap_from_gpus(prange, s, l, trigger);
>   	}
>   
>   	return r;
> @@ -2232,6 +2242,7 @@ static void
>   svm_range_unmap_from_cpu(struct mm_struct *mm, struct svm_range *prange,
>   			 unsigned long start, unsigned long last)
>   {
> +	uint32_t trigger = KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU;
>   	struct svm_range_list *svms;
>   	struct svm_range *pchild;
>   	struct kfd_process *p;
> @@ -2259,14 +2270,14 @@ svm_range_unmap_from_cpu(struct mm_struct *mm, struct svm_range *prange,
>   		s = max(start, pchild->start);
>   		l = min(last, pchild->last);
>   		if (l >= s)
> -			svm_range_unmap_from_gpus(pchild, s, l);
> +			svm_range_unmap_from_gpus(pchild, s, l, trigger);
>   		svm_range_unmap_split(mm, prange, pchild, start, last);
>   		mutex_unlock(&pchild->lock);
>   	}
>   	s = max(start, prange->start);
>   	l = min(last, prange->last);
>   	if (l >= s)
> -		svm_range_unmap_from_gpus(prange, s, l);
> +		svm_range_unmap_from_gpus(prange, s, l, trigger);
>   	svm_range_unmap_split(mm, prange, prange, start, last);
>   
>   	if (unmap_parent)
> @@ -2333,7 +2344,7 @@ svm_range_cpu_invalidate_pagetables(struct mmu_interval_notifier *mni,
>   		svm_range_unmap_from_cpu(mni->mm, prange, start, last);
>   		break;
>   	default:
> -		svm_range_evict(prange, mni->mm, start, last);
> +		svm_range_evict(prange, mni->mm, start, last, range->event);
>   		break;
>   	}
>   


* Re: [PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client
  2022-06-28 14:50 ` [PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client Philip Yang
@ 2022-06-30 14:45   ` Felix Kuehling
  0 siblings, 0 replies; 20+ messages in thread
From: Felix Kuehling @ 2022-06-30 14:45 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

Am 2022-06-28 um 10:50 schrieb Philip Yang:
> synchronize_rcu may take several milliseconds, which noticeably slows
> down applications closing the SMI event handle. Use call_rcu to free
> client->fifo and the client asynchronously, eliminating the
> synchronize_rcu call in the user thread.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> index e5896b7a16dd..0472b56de245 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
> @@ -38,6 +38,7 @@ struct kfd_smi_client {
>   	uint64_t events;
>   	struct kfd_dev *dev;
>   	spinlock_t lock;
> +	struct rcu_head rcu;
>   	pid_t pid;
>   	bool suser;
>   };
> @@ -137,6 +138,14 @@ static ssize_t kfd_smi_ev_write(struct file *filep, const char __user *user,
>   	return sizeof(events);
>   }
>   
> +static void kfd_smi_ev_client_free(struct rcu_head *p)
> +{
> +	struct kfd_smi_client *ev = container_of(p, struct kfd_smi_client, rcu);
> +
> +	kfifo_free(&ev->fifo);
> +	kfree(ev);
> +}
> +
>   static int kfd_smi_ev_release(struct inode *inode, struct file *filep)
>   {
>   	struct kfd_smi_client *client = filep->private_data;
> @@ -146,10 +155,7 @@ static int kfd_smi_ev_release(struct inode *inode, struct file *filep)
>   	list_del_rcu(&client->list);
>   	spin_unlock(&dev->smi_lock);
>   
> -	synchronize_rcu();
> -	kfifo_free(&client->fifo);
> -	kfree(client);
> -
> +	call_rcu(&client->rcu, kfd_smi_ev_client_free);
>   	return 0;
>   }
>   


* Re: [PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event
  2022-06-28 14:50 ` [PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event Philip Yang
@ 2022-06-30 14:45   ` Felix Kuehling
  0 siblings, 0 replies; 20+ messages in thread
From: Felix Kuehling @ 2022-06-30 14:45 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

Am 2022-06-28 um 10:50 schrieb Philip Yang:
> Indicate that SMI profiling events are available.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   include/uapi/linux/kfd_ioctl.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index f239e260796b..b024e8ba865d 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -35,9 +35,10 @@
>    * - 1.7 - Checkpoint Restore (CRIU) API
>    * - 1.8 - CRIU - Support for SDMA transfers with GTT BOs
>    * - 1.9 - Add available memory ioctl
> + * - 1.10 - Add SMI profiler event log
>    */
>   #define KFD_IOCTL_MAJOR_VERSION 1
> -#define KFD_IOCTL_MINOR_VERSION 9
> +#define KFD_IOCTL_MINOR_VERSION 10
>   
>   struct kfd_ioctl_get_version_args {
>   	__u32 major_version;	/* from KFD */


* Re: [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers
  2022-06-28 14:50 ` [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers Philip Yang
@ 2022-06-30 14:46   ` Felix Kuehling
  0 siblings, 0 replies; 20+ messages in thread
From: Felix Kuehling @ 2022-06-30 14:46 UTC (permalink / raw)
  To: Philip Yang, amd-gfx

Am 2022-06-28 um 10:50 schrieb Philip Yang:
> Define new system management interface event IDs for migration, GPU
> recoverable page fault, user queue eviction, restore, and unmap from
> GPU events, along with the corresponding event triggers. These will
> be implemented in the following patches.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   include/uapi/linux/kfd_ioctl.h | 37 ++++++++++++++++++++++++++++++++++
>   1 file changed, 37 insertions(+)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index c648ed7c5ff1..f239e260796b 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -468,6 +468,43 @@ enum kfd_smi_event {
>   	KFD_SMI_EVENT_THERMAL_THROTTLE = 2,
>   	KFD_SMI_EVENT_GPU_PRE_RESET = 3,
>   	KFD_SMI_EVENT_GPU_POST_RESET = 4,
> +	KFD_SMI_EVENT_MIGRATE_START = 5,
> +	KFD_SMI_EVENT_MIGRATE_END = 6,
> +	KFD_SMI_EVENT_PAGE_FAULT_START = 7,
> +	KFD_SMI_EVENT_PAGE_FAULT_END = 8,
> +	KFD_SMI_EVENT_QUEUE_EVICTION = 9,
> +	KFD_SMI_EVENT_QUEUE_RESTORE = 10,
> +	KFD_SMI_EVENT_UNMAP_FROM_GPU = 11,
> +
> +	/*
> +	 * Max event number, used as a flag bit to receive events from all
> +	 * processes. This requires superuser permission; otherwise no
> +	 * events from any process are received. Without this flag, only
> +	 * events from the caller's own process are received.
> +	 */
> +	KFD_SMI_EVENT_ALL_PROCESS = 64
> +};
> +
> +enum KFD_MIGRATE_TRIGGERS {
> +	KFD_MIGRATE_TRIGGER_PREFETCH,
> +	KFD_MIGRATE_TRIGGER_PAGEFAULT_GPU,
> +	KFD_MIGRATE_TRIGGER_PAGEFAULT_CPU,
> +	KFD_MIGRATE_TRIGGER_TTM_EVICTION
> +};
> +
> +enum KFD_QUEUE_EVICTION_TRIGGERS {
> +	KFD_QUEUE_EVICTION_TRIGGER_SVM,
> +	KFD_QUEUE_EVICTION_TRIGGER_USERPTR,
> +	KFD_QUEUE_EVICTION_TRIGGER_TTM,
> +	KFD_QUEUE_EVICTION_TRIGGER_SUSPEND,
> +	KFD_QUEUE_EVICTION_CRIU_CHECKPOINT,
> +	KFD_QUEUE_EVICTION_CRIU_RESTORE
> +};
> +
> +enum KFD_SVM_UNMAP_TRIGGERS {
> +	KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY,
> +	KFD_SVM_UNMAP_TRIGGER_MMU_NOTIFY_MIGRATE,
> +	KFD_SVM_UNMAP_TRIGGER_UNMAP_FROM_CPU
>   };
>   
>   #define KFD_SMI_EVENT_MASK_FROM_INDEX(i) (1ULL << ((i) - 1))


* Re: [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault SMI event
  2022-06-30 14:19   ` Felix Kuehling
@ 2022-06-30 14:53     ` philip yang
  0 siblings, 0 replies; 20+ messages in thread
From: philip yang @ 2022-06-30 14:53 UTC (permalink / raw)
  To: Felix Kuehling, Philip Yang, amd-gfx

[-- Attachment #1: Type: text/html, Size: 13618 bytes --]


end of thread, other threads:[~2022-06-30 14:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-28 14:50 [PATCH v5 0/11] HMM profiler interface Philip Yang
2022-06-28 14:50 ` [PATCH v5 1/11] drm/amdkfd: Add KFD SMI event IDs and triggers Philip Yang
2022-06-30 14:46   ` Felix Kuehling
2022-06-28 14:50 ` [PATCH v5 2/11] drm/amdkfd: Enable per process SMI event Philip Yang
2022-06-28 14:50 ` [PATCH v5 3/11] drm/amdkfd: Add GPU recoverable fault " Philip Yang
2022-06-30 14:19   ` Felix Kuehling
2022-06-30 14:53     ` philip yang
2022-06-28 14:50 ` [PATCH v5 4/11] drm/amdkfd: Add migration " Philip Yang
2022-06-30 14:29   ` Felix Kuehling
2022-06-28 14:50 ` [PATCH v5 5/11] drm/amdkfd: Add user queue eviction restore " Philip Yang
2022-06-30 14:36   ` Felix Kuehling
2022-06-28 14:50 ` [PATCH v5 6/11] drm/amdkfd: Add unmap from GPU " Philip Yang
2022-06-30 14:39   ` Felix Kuehling
2022-06-28 14:50 ` [PATCH v5 7/11] drm/amdkfd: Asynchronously free smi_client Philip Yang
2022-06-30 14:45   ` Felix Kuehling
2022-06-28 14:50 ` [PATCH v5 8/11] drm/amdkfd: Bump KFD API version for SMI profiling event Philip Yang
2022-06-30 14:45   ` Felix Kuehling
2022-06-28 14:50 ` [PATCH 9/11] libhsakmt: hsaKmtGetNodeProperties add gpu_id Philip Yang
2022-06-28 14:50 ` [PATCH 10/11] libhsakmt: add open SMI event handle Philip Yang
2022-06-28 14:50 ` [PATCH 11/11] ROCR-Runtime Basic SVM profiler Philip Yang
