All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Kim <jonathan.kim@amd.com>
To: <amd-gfx@lists.freedesktop.org>, <dri-devel@lists.freedesktop.org>
Cc: Felix.Kuehling@amd.com, Jonathan.Kim@amd.com
Subject: [PATCH 30/32] drm/amdkfd: add debug queue snapshot operation
Date: Wed, 25 Jan 2023 14:53:59 -0500	[thread overview]
Message-ID: <20230125195401.4183544-31-jonathan.kim@amd.com> (raw)
In-Reply-To: <20230125195401.4183544-1-jonathan.kim@amd.com>

Allow the debugger to get a snapshot of a specified number of queues
containing various queue property information that is copied to the
debugger.

Since the debugger doesn't know how many queues exist at any given time,
allow the debugger to pass the requested number of snapshots as 0 to get
the actual number of potential snapshots to use for a subsequent snapshot
request for actual information.

To prevent future ABI breakage, pass in the requested entry_size.
The KFD will return it's own entry_size in case the debugger still wants
log the information in a core dump on sizing failure.

Also allow the debugger to clear exceptions when doing a snapshot.

v3: fix uninitialized return and change queue snapshot to type void for
proper increment on buffer copy.
use memset 0 to init snapshot entry to clear struct padding.

v2: change buf_size arg to num_queues for clarity.
fix minimum entry size calculation.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  6 +++
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 36 ++++++++++++++++
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  5 +++
 .../amd/amdkfd/kfd_process_queue_manager.c    | 41 +++++++++++++++++++
 5 files changed, 91 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d3d2026b6e65..93b288233577 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2965,6 +2965,12 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
 				&args->query_exception_info.info_size);
 		break;
 	case KFD_IOC_DBG_TRAP_GET_QUEUE_SNAPSHOT:
+		r = pqm_get_queue_snapshot(&target->pqm,
+				args->queue_snapshot.exception_mask,
+				(void __user *)args->queue_snapshot.snapshot_buf_ptr,
+				&args->queue_snapshot.num_queues,
+				&args->queue_snapshot.entry_size);
+		break;
 	case KFD_IOC_DBG_TRAP_GET_DEVICE_SNAPSHOT:
 		pr_warn("Debug op %i not supported yet\n", args->op);
 		r = -EACCES;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 7792fe9491c5..5ae504a512f0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3000,6 +3000,42 @@ int suspend_queues(struct kfd_process *p,
 	return total_suspended;
 }
 
+static uint32_t set_queue_type_for_user(struct queue_properties *q_props)
+{
+	switch (q_props->type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+		return q_props->format == KFD_QUEUE_FORMAT_PM4
+					? KFD_IOC_QUEUE_TYPE_COMPUTE
+					: KFD_IOC_QUEUE_TYPE_COMPUTE_AQL;
+	case KFD_QUEUE_TYPE_SDMA:
+		return KFD_IOC_QUEUE_TYPE_SDMA;
+	case KFD_QUEUE_TYPE_SDMA_XGMI:
+		return KFD_IOC_QUEUE_TYPE_SDMA_XGMI;
+	default:
+		WARN_ONCE(true, "queue type not recognized!");
+		return 0xffffffff;
+	};
+}
+
+void set_queue_snapshot_entry(struct queue *q,
+			      uint64_t exception_clear_mask,
+			      struct kfd_queue_snapshot_entry *qss_entry)
+{
+	qss_entry->ring_base_address = q->properties.queue_address;
+	qss_entry->write_pointer_address = (uint64_t)q->properties.write_ptr;
+	qss_entry->read_pointer_address = (uint64_t)q->properties.read_ptr;
+	qss_entry->ctx_save_restore_address =
+				q->properties.ctx_save_restore_area_address;
+	qss_entry->ctx_save_restore_area_size =
+				q->properties.ctx_save_restore_area_size;
+	qss_entry->exception_status = q->properties.exception_status;
+	qss_entry->queue_id = q->properties.queue_id;
+	qss_entry->gpu_id = q->device->id;
+	qss_entry->ring_size = (uint32_t)q->properties.queue_size;
+	qss_entry->queue_type = set_queue_type_for_user(&q->properties);
+	q->properties.exception_status &= ~exception_clear_mask;
+}
+
 int debug_lock_and_unmap(struct device_queue_manager *dqm)
 {
 	int r;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 7ccf8d0d1867..89d4a5b293a5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -296,6 +296,9 @@ int suspend_queues(struct kfd_process *p,
 int resume_queues(struct kfd_process *p,
 		uint32_t num_queues,
 		uint32_t *usr_queue_id_array);
+void set_queue_snapshot_entry(struct queue *q,
+			      uint64_t exception_clear_mask,
+			      struct kfd_queue_snapshot_entry *qss_entry);
 int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cfc50d1690c7..cc7816db60eb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1302,6 +1302,11 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 		       void __user *ctl_stack,
 		       u32 *ctl_stack_used_size,
 		       u32 *save_area_used_size);
+int pqm_get_queue_snapshot(struct process_queue_manager *pqm,
+			   uint64_t exception_clear_mask,
+			   void __user *buf,
+			   int *num_qss_entries,
+			   uint32_t *entry_size);
 
 int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
 			      uint64_t fence_value,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 0ae6026c7d69..221cd4b03f1c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -576,6 +576,47 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 						       save_area_used_size);
 }
 
+int pqm_get_queue_snapshot(struct process_queue_manager *pqm,
+			   uint64_t exception_clear_mask,
+			   void __user *buf,
+			   int *num_qss_entries,
+			   uint32_t *entry_size)
+{
+	struct process_queue_node *pqn;
+	uint32_t tmp_entry_size = *entry_size, tmp_qss_entries = *num_qss_entries;
+	int r = 0;
+
+	*num_qss_entries = 0;
+	if (!(*entry_size))
+		return -EINVAL;
+
+	*entry_size = min_t(size_t, *entry_size, sizeof(struct kfd_queue_snapshot_entry));
+	mutex_lock(&pqm->process->event_mutex);
+
+	list_for_each_entry(pqn, &pqm->queues, process_queue_list) {
+		if (!pqn->q)
+			continue;
+
+		if (*num_qss_entries < tmp_qss_entries) {
+			struct kfd_queue_snapshot_entry src;
+
+			memset(&src, 0, sizeof(src));
+
+			set_queue_snapshot_entry(pqn->q, exception_clear_mask, &src);
+
+			if (copy_to_user(buf, &src, *entry_size)) {
+				r = -EFAULT;
+				break;
+			}
+			buf += tmp_entry_size;
+		}
+		*num_qss_entries += 1;
+	}
+
+	mutex_unlock(&pqm->process->event_mutex);
+	return r;
+}
+
 static int get_queue_data_sizes(struct kfd_process_device *pdd,
 				struct queue *q,
 				uint32_t *mqd_size,
-- 
2.25.1


  parent reply	other threads:[~2023-01-25 19:55 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-25 19:53 [PATCH 00/32] Upstream of kernel support for AMDGPU ISA debugging Jonathan Kim
2023-01-25 19:53 ` [PATCH 01/32] drm/amdkfd: add debug and runtime enable interface Jonathan Kim
2023-02-16 22:16   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 02/32] drm/amdkfd: display debug capabilities Jonathan Kim
2023-02-16 22:24   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 03/32] drm/amdkfd: prepare per-process debug enable and disable Jonathan Kim
2023-02-16 23:44   ` Felix Kuehling
2023-03-23 19:12     ` Kim, Jonathan
2023-03-23 20:08       ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 04/32] drm/amdgpu: add kgd hw debug mode setting interface Jonathan Kim
2023-01-25 19:53 ` [PATCH 05/32] drm/amdgpu: setup hw debug registers on driver initialization Jonathan Kim
2023-02-16 22:39   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 06/32] drm/amdgpu: add gfx9 hw debug mode enable and disable calls Jonathan Kim
2023-01-29  5:12   ` kernel test robot
2023-01-29  5:12     ` kernel test robot
2023-02-16 22:54   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 07/32] drm/amdgpu: add gfx9.4.1 " Jonathan Kim
2023-01-29  6:34   ` kernel test robot
2023-01-29  6:34     ` kernel test robot
2023-02-16 23:01   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 08/32] drm/amdgpu: add gfx10 " Jonathan Kim
2023-01-29  7:55   ` kernel test robot
2023-01-29  7:55     ` kernel test robot
2023-02-16 23:11   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 09/32] drm/amdgpu: add gfx9.4.2 " Jonathan Kim
2023-02-16 23:14   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 10/32] drm/amdgpu: add gfx11 " Jonathan Kim
2023-02-16 23:19   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 11/32] drm/amdgpu: add configurable grace period for unmap queues Jonathan Kim
2023-03-20 19:19   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 12/32] drm/amdkfd: prepare map process for single process debug devices Jonathan Kim
2023-03-20 20:06   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 13/32] drm/amdgpu: prepare map process for multi-process " Jonathan Kim
2023-03-20 20:16   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 14/32] drm/amdgpu: expose debug api for mes Jonathan Kim
2023-03-20 20:47   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 15/32] drm/amdkfd: prepare trap workaround for gfx11 Jonathan Kim
2023-03-20 21:49   ` Felix Kuehling
2023-03-23 13:50     ` Kim, Jonathan
2023-03-23 14:00       ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 16/32] drm/amdkfd: add per process hw trap enable and disable functions Jonathan Kim
2023-03-20 23:06   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 17/32] drm/amdkfd: add raise exception event function Jonathan Kim
2023-03-20 23:18   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 18/32] drm/amdkfd: add send exception operation Jonathan Kim
2023-03-20 23:26   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 19/32] drm/amdkfd: add runtime enable operation Jonathan Kim
2023-03-21  0:31   ` Felix Kuehling
2023-03-23 19:45     ` Kim, Jonathan
2023-01-25 19:53 ` [PATCH 20/32] drm/amdkfd: add debug trap enabled flag to tma Jonathan Kim
2023-01-25 19:53 ` [PATCH 21/32] drm/amdkfd: update process interrupt handling for debug events Jonathan Kim
2023-03-21 21:07   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 22/32] drm/amdkfd: add debug set exceptions enabled operation Jonathan Kim
2023-01-25 19:53 ` [PATCH 23/32] drm/amdkfd: add debug wave launch override operation Jonathan Kim
2023-03-21 21:37   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 24/32] drm/amdkfd: add debug wave launch mode operation Jonathan Kim
2023-03-21 21:42   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 25/32] drm/amdkfd: add debug suspend and resume process queues operation Jonathan Kim
2023-03-21 22:16   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 26/32] drm/amdkfd: add debug set and clear address watch points operation Jonathan Kim
2023-03-22 21:38   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 27/32] drm/amdkfd: add debug set flags operation Jonathan Kim
2023-03-22 21:47   ` Felix Kuehling
2023-01-25 19:53 ` [PATCH 28/32] drm/amdkfd: add debug query event operation Jonathan Kim
2023-01-25 19:53 ` [PATCH 29/32] drm/amdkfd: add debug query exception info operation Jonathan Kim
2023-01-25 19:53 ` Jonathan Kim [this message]
2023-03-22 21:52   ` [PATCH 30/32] drm/amdkfd: add debug queue snapshot operation Felix Kuehling
2023-01-25 19:54 ` [PATCH 31/32] drm/amdkfd: add debug device " Jonathan Kim
2023-03-22 21:54   ` Felix Kuehling
2023-01-25 19:54 ` [PATCH 32/32] drm/amdkfd: bump kfd ioctl minor version for debug api availability Jonathan Kim
2023-03-22 21:56   ` Felix Kuehling

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230125195401.4183544-31-jonathan.kim@amd.com \
    --to=jonathan.kim@amd.com \
    --cc=Felix.Kuehling@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=dri-devel@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.