From: Felix Kuehling <felix.kuehling@amd.com>
To: Jonathan Kim <jonathan.kim@amd.com>, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 22/29] drm/amdkfd: add debug suspend and resume process queues operation
Date: Tue, 29 Nov 2022 18:55:16 -0500
Message-ID: <b05ea898-76c8-f19c-06d8-c176e7877d8d@amd.com>
In-Reply-To: <20221031162359.445805-22-jonathan.kim@amd.com>


On 2022-10-31 12:23, Jonathan Kim wrote:
> In order to inspect waves from the saved context at any point during a
> debug session, the debugger must be able to preempt queues to trigger
> context save by suspending them.
>
> On queue suspend, the KFD will copy the context save header information
> so that the debugger can correctly crawl the appropriate size of the saved
> context. The debugger must then also be allowed to resume suspended queues.
>
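For context, my reading of how the debugger consumes that header: it sits at
the start of the queue's context save area, and the offsets/sizes in it tell
user mode where the control stack and wave state live. Roughly, with
ctx_save_area pointing at the queue's ctx_save_restore_area in the debugger's
view (user-mode sketch only; parse_control_stack()/parse_wave_state() are
made-up placeholders, not part of this series):

	struct kfd_context_save_area_header *hdr = ctx_save_area;

	/* offsets are relative to the start of the save area */
	parse_control_stack((char *)ctx_save_area + hdr->control_stack_offset,
			    hdr->control_stack_size);
	parse_wave_state((char *)ctx_save_area + hdr->wave_state_offset,
			 hdr->wave_state_size);
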
> A newly created queue cannot be suspended because queue ids are recycled
> after destruction, so the debugger needs to know when this has occurred.
> Query functions that clear a given queue's new-queue status will be added
> later.
>
> A queue cannot be destroyed while it is suspended, to preserve its saved
> context during debugger inspection.  Have queue destruction block while
> a queue is suspended and unblock when it is resumed.  Likewise, if a
> queue is about to be destroyed, it cannot be suspended.
>
> Return the number of queues successfully suspended or resumed along with
> a per-queue status array, where the upper bits of each queue's status
> show that the request was invalid (new/destroyed queue suspend request,
> missing queue) or that an error occurred (HWS in a fatal state, so it
> cannot suspend or resume queues).
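
Also, just to confirm I understand the status encoding from the paragraph
above: after the ioctl returns, user mode would decode the returned array
along these lines (sketch only, assuming the same KFD_DBG_QUEUE_*_MASK bits
this patch uses for the copy-back are what user mode sees):

	for (i = 0; i < num_queues; i++) {
		uint32_t qid = queue_ids[i] &
			~(KFD_DBG_QUEUE_INVALID_MASK | KFD_DBG_QUEUE_ERROR_MASK);

		if (queue_ids[i] & KFD_DBG_QUEUE_ERROR_MASK)
			printf("queue %u: HWS error on suspend/resume\n", qid);
		else if (queue_ids[i] & KFD_DBG_QUEUE_INVALID_MASK)
			printf("queue %u: invalid (new, destroyed or not found)\n", qid);
		else
			printf("queue %u: ok\n", qid);
	}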
>
> Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>

Some nit-picks inline. Other than that, this patch looks good to me.


> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  12 +
>   drivers/gpu/drm/amd/amdkfd/kfd_debug.c        |   7 +
>   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 401 +++++++++++++++++-
>   .../drm/amd/amdkfd/kfd_device_queue_manager.h |  11 +
>   .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |  10 +
>   .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |  14 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |   5 +-
>   7 files changed, 454 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 63665279ce4d..ec26c51177f9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -410,6 +410,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>   	pr_debug("Write ptr address   == 0x%016llX\n",
>   			args->write_pointer_address);
>   
> +	kfd_dbg_ev_raise(KFD_EC_MASK(EC_QUEUE_NEW), p, dev, queue_id, false, NULL, 0);
>   	return 0;
>   
>   err_create_queue:
> @@ -2903,7 +2904,18 @@ static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process *p, v
>   				args->launch_mode.launch_mode);
>   		break;
>   	case KFD_IOC_DBG_TRAP_SUSPEND_QUEUES:
> +		r = suspend_queues(target,
> +				args->suspend_queues.num_queues,
> +				args->suspend_queues.grace_period,
> +				args->suspend_queues.exception_mask,
> +				(uint32_t *)args->suspend_queues.queue_array_ptr);
> +
> +		break;
>   	case KFD_IOC_DBG_TRAP_RESUME_QUEUES:
> +		r = resume_queues(target, false,
> +				args->resume_queues.num_queues,
> +				(uint32_t *)args->resume_queues.queue_array_ptr);
> +		break;
>   	case KFD_IOC_DBG_TRAP_SET_NODE_ADDRESS_WATCH:
>   	case KFD_IOC_DBG_TRAP_CLEAR_NODE_ADDRESS_WATCH:
>   	case KFD_IOC_DBG_TRAP_SET_FLAGS:
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> index 210851f2cdb3..afa56aad316b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> @@ -274,6 +274,13 @@ void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind, int unwind
>   
>   		count++;
>   	}
> +
> +	if (!unwind) {
> +		int resume_count = resume_queues(target, true, 0, NULL);
> +
> +		if (resume_count)
> +			pr_debug("Resumed %d queues\n", resume_count);
> +	}
>   }
>   
>   static void kfd_dbg_clean_exception_status(struct kfd_process *target)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index bf4787b4dc6c..589efbefc8dc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -921,6 +921,79 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q,
>   	return retval;
>   }
>   
> +/* suspend_single_queue does not lock the dqm like the
> + * evict_process_queues_cpsch or evict_process_queues_nocpsch. You should
> + * lock the dqm before calling, and unlock after calling.
> + *
> + * The reason we don't lock the dqm is because this function may be
> + * called on multiple queues in a loop, so rather than locking/unlocking
> + * multiple times, we will just keep the dqm locked for all of the calls.
> + */
> +static int suspend_single_queue(struct device_queue_manager *dqm,
> +				      struct kfd_process_device *pdd,
> +				      struct queue *q)
> +{
> +	bool is_new;
> +
> +	if (q->properties.is_suspended)
> +		return 0;
> +
> +	pr_debug("Suspending PASID %u queue [%i]\n",
> +			pdd->process->pasid,
> +			q->properties.queue_id);
> +
> +	is_new = q->properties.exception_status & KFD_EC_MASK(EC_QUEUE_NEW);
> +
> +	if (is_new || q->properties.is_being_destroyed) {
> +		pr_debug("Suspend: skip %s queue id %i\n",
> +				is_new ? "new" : "destroyed",
> +				q->properties.queue_id);
> +		return -EBUSY;
> +	}
> +
> +	q->properties.is_suspended = true;
> +	if (q->properties.is_active) {
> +		decrement_queue_count(dqm, &pdd->qpd, q);
> +		q->properties.is_active = false;
> +	}
> +
> +	return 0;
> +}
> +
> +/* resume_single_queue does not lock the dqm like the functions
> + * restore_process_queues_cpsch or restore_process_queues_nocpsch. You should
> + * lock the dqm before calling, and unlock after calling.
> + *
> + * The reason we don't lock the dqm is because this function may be
> + * called on multiple queues in a loop, so rather than locking/unlocking
> + * multiple times, we will just keep the dqm locked for all of the calls.
> + */
> +static void resume_single_queue(struct device_queue_manager *dqm,
> +				      struct qcm_process_device *qpd,
> +				      struct queue *q)
> +{
> +	struct kfd_process_device *pdd;
> +	uint64_t pd_base;
> +
> +	if (!q->properties.is_suspended)
> +		return;
> +
> +	pdd = qpd_to_pdd(qpd);
> +	/* Retrieve PD base */
> +	pd_base = amdgpu_amdkfd_gpuvm_get_process_page_dir(pdd->drm_priv);

This variable seems to be unused.


> +
> +	pr_debug("Restoring from suspend PASID %u queue [%i]\n",
> +			    pdd->process->pasid,
> +			    q->properties.queue_id);
> +
> +	q->properties.is_suspended = false;
> +
> +	if (QUEUE_IS_ACTIVE(q->properties)) {
> +		q->properties.is_active = true;
> +		increment_queue_count(dqm, qpd, q);
> +	}
> +}
> +
>   static int evict_process_queues_nocpsch(struct device_queue_manager *dqm,
>   					struct qcm_process_device *qpd)
>   {
> @@ -1885,6 +1958,31 @@ static int execute_queues_cpsch(struct device_queue_manager *dqm,
>   	return map_queues_cpsch(dqm);
>   }
>   
> +static int wait_on_destroy_queue(struct device_queue_manager *dqm,
> +				 struct queue *q)
> +{
> +	struct kfd_process_device *pdd = kfd_get_process_device_data(q->device,
> +								q->process);
> +	int ret = 0;
> +
> +	if (pdd->qpd.is_debug)
> +		return ret;
> +
> +	q->properties.is_being_destroyed = true;
> +
> +	if (pdd->process->debug_trap_enabled && q->properties.is_suspended) {
> +		dqm_unlock(dqm);
> +		mutex_unlock(&q->process->mutex);
> +		ret = wait_event_interruptible(dqm->destroy_wait,
> +						!q->properties.is_suspended);
> +
> +		mutex_lock(&q->process->mutex);
> +		dqm_lock(dqm);
> +	}
> +
> +	return ret;
> +}
> +
>   static int destroy_queue_cpsch(struct device_queue_manager *dqm,
>   				struct qcm_process_device *qpd,
>   				struct queue *q)
> @@ -1904,11 +2002,16 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
>   				q->properties.queue_id);
>   	}
>   
> -	retval = 0;
> -
>   	/* remove queue from list to prevent rescheduling after preemption */
>   	dqm_lock(dqm);
>   
> +	retval = wait_on_destroy_queue(dqm, q);
> +
> +	if (retval) {
> +		dqm_unlock(dqm);
> +		return retval;
> +	}
> +
>   	if (qpd->is_debug) {
>   		/*
>   		 * error, currently we do not allow to destroy a queue
> @@ -1954,7 +2057,17 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
>   
>   	dqm_unlock(dqm);
>   
> -	/* Do free_mqd after dqm_unlock(dqm) to avoid circular locking */
> +	/*
> +	 * Do free_mqd and delete raise event after dqm_unlock(dqm) to avoid

I think this was meant to say "... raise delete event ...".


> +	 * circular locking
> +	 */
> +	kfd_dbg_ev_raise(KFD_EC_MASK(EC_DEVICE_QUEUE_DELETE),
> +			qpd->pqm->process,
> +			q->device,
> +			-1,
> +			false,
> +			NULL,
> +			0);

One line per parameter seems excessive here. The last 4 parameters are 
basically N/A. I think this is more readable:

+	kfd_dbg_ev_raise(KFD_EC_MASK(EC_DEVICE_QUEUE_DELETE),
+			qpd->pqm->process, q->device,
+			-1, false, NULL, 0);


>   	mqd_mgr->free_mqd(mqd_mgr, q->mqd, q->mqd_mem_obj);
>   
>   	return retval;
> @@ -2418,8 +2531,10 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
>   		goto out_free;
>   	}
>   
> -	if (!dqm->ops.initialize(dqm))
> +	if (!dqm->ops.initialize(dqm)) {
> +		init_waitqueue_head(&dqm->destroy_wait);
>   		return dqm;
> +	}
>   
>   out_free:
>   	kfree(dqm);
> @@ -2557,6 +2672,284 @@ int release_debug_trap_vmid(struct device_queue_manager *dqm,
>   	return r;
>   }
>   
> +#define QUEUE_NOT_FOUND		-1
> +/* invalidate queue operation in array */
> +static void q_array_invalidate(uint32_t num_queues, uint32_t *queue_ids)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_queues; i++)
> +		queue_ids[i] |= KFD_DBG_QUEUE_INVALID_MASK;
> +}
> +
> +/* find queue index in array */
> +static int q_array_get_index(unsigned int queue_id,
> +		uint32_t num_queues,
> +		uint32_t *queue_ids)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_queues; i++)
> +		if (queue_id == (queue_ids[i] & ~KFD_DBG_QUEUE_INVALID_MASK))
> +			return i;
> +
> +	return QUEUE_NOT_FOUND;
> +}
> +
> +struct copy_context_work_handler_workarea {
> +	struct work_struct copy_context_work;
> +	struct kfd_process *p;
> +};
> +
> +static void copy_context_work_handler (struct work_struct *work)
> +{
> +	struct copy_context_work_handler_workarea *workarea;
> +	struct mqd_manager *mqd_mgr;
> +	struct queue *q;
> +	struct mm_struct *mm;
> +	struct kfd_process *p;
> +	uint32_t tmp_ctl_stack_used_size, tmp_save_area_used_size;
> +	int i;
> +
> +	workarea = container_of(work,
> +			struct copy_context_work_handler_workarea,
> +			copy_context_work);
> +
> +	p = workarea->p;
> +	mm = get_task_mm(p->lead_thread);
> +
> +	if (!mm)
> +		return;
> +
> +	kthread_use_mm(mm);
> +	for (i = 0; i < p->n_pdds; i++) {
> +		struct kfd_process_device *pdd = p->pdds[i];
> +		struct device_queue_manager *dqm = pdd->dev->dqm;
> +		struct qcm_process_device *qpd = &pdd->qpd;
> +
> +		list_for_each_entry(q, &qpd->queues_list, list) {
> +			mqd_mgr = dqm->mqd_mgrs[KFD_MQD_TYPE_CP];
> +
> +			/* We ignore the return value from get_wave_state
> +			 * because
> +			 * i) right now, it always returns 0, and
> +			 * ii) if we hit an error, we would continue to the
> +			 *      next queue anyway.
> +			 */
> +			mqd_mgr->get_wave_state(mqd_mgr,
> +					q->mqd,
> +					(void __user *)	q->properties.ctx_save_restore_area_address,
> +					&tmp_ctl_stack_used_size,
> +					&tmp_save_area_used_size);
> +		}
> +	}
> +	kthread_unuse_mm(mm);
> +	mmput(mm);
> +}
> +
> +static uint32_t *get_queue_ids(uint32_t num_queues, uint32_t *usr_queue_id_array)
> +{
> +	size_t array_size = num_queues * sizeof(uint32_t);
> +	uint32_t *queue_ids = NULL;
> +
> +	if (!usr_queue_id_array)
> +		return NULL;
> +
> +	queue_ids = kzalloc(array_size, GFP_KERNEL);
> +	if (!queue_ids)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (copy_from_user(queue_ids, usr_queue_id_array, array_size))
> +		return ERR_PTR(-EFAULT);
> +
> +	return queue_ids;
> +}
> +
> +int resume_queues(struct kfd_process *p,
> +		bool resume_all_queues,
> +		uint32_t num_queues,
> +		uint32_t *usr_queue_id_array)
> +{
> +	uint32_t *queue_ids = get_queue_ids(num_queues, usr_queue_id_array);
> +	int total_resumed = 0;
> +	int i;
> +
> +	if (!resume_all_queues && IS_ERR(queue_ids))
> +		return PTR_ERR(queue_ids);
> +
> +	/* mask all queues as invalid.  unmask per successful request */
> +	if (!resume_all_queues)
> +		q_array_invalidate(num_queues, queue_ids);
> +
> +	for (i = 0; i < p->n_pdds; i++) {
> +		struct kfd_process_device *pdd = p->pdds[i];
> +		struct device_queue_manager *dqm = pdd->dev->dqm;
> +		struct qcm_process_device *qpd = &pdd->qpd;
> +		struct queue *q;
> +		int r, per_device_resumed = 0;
> +
> +		dqm_lock(dqm);
> +
> +		/* unmask queues that resume or already resumed as valid */
> +		list_for_each_entry(q, &qpd->queues_list, list) {
> +			int q_idx = QUEUE_NOT_FOUND;
> +
> +			if (queue_ids)
> +				q_idx = q_array_get_index(
> +						q->properties.queue_id,
> +						num_queues,
> +						queue_ids);
> +
> +			if (resume_all_queues || q_idx != QUEUE_NOT_FOUND) {
> +				resume_single_queue(dqm, &pdd->qpd, q);
> +				if (queue_ids)
> +					queue_ids[q_idx] &=
> +						~KFD_DBG_QUEUE_INVALID_MASK;
> +				per_device_resumed++;
> +			}
> +		}
> +
> +		if (!per_device_resumed) {
> +			dqm_unlock(dqm);
> +			continue;
> +		}
> +
> +		r = execute_queues_cpsch(dqm,
> +					KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES,
> +					0,
> +					USE_DEFAULT_GRACE_PERIOD);
> +		if (r) {
> +			pr_err("Failed to resume process queues\n");
> +			if (!resume_all_queues) {
> +				list_for_each_entry(q, &qpd->queues_list, list) {
> +					int q_idx = q_array_get_index(
> +							q->properties.queue_id,
> +							num_queues,
> +							queue_ids);
> +
> +					/* mask queue as error on resume fail */
> +					if (q_idx != QUEUE_NOT_FOUND)
> +						queue_ids[q_idx] |=
> +							KFD_DBG_QUEUE_ERROR_MASK;
> +				}
> +			}
> +		} else {
> +			wake_up_all(&dqm->destroy_wait);
> +			total_resumed += per_device_resumed;
> +		}
> +
> +		dqm_unlock(dqm);
> +	}
> +
> +	if (copy_to_user((void __user *)usr_queue_id_array, queue_ids,
> +			num_queues * sizeof(uint32_t)))

I think we should skip this if resume_all_queues is true, because the
queue array pointer is NULL in that case. I think we get away with it
because num_queues is also 0 and copy_to_user becomes a NOP, but it seems
a bit weird.

	if (!resume_all_queues && copy_to_user(...))

Also, if it's legal to call this from user mode with usr_queue_id_array 
== NULL, we should instead check

	if (usr_queue_id_array && copy_to_user(...))

Maybe we could just replace the resume_all_queues parameter with a check 
for !usr_queue_id_array.
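
Something like this for the copy-back, combining both checks (untested, just
to illustrate):

	/* only copy back if the caller actually passed an array */
	if (usr_queue_id_array &&
	    copy_to_user((void __user *)usr_queue_id_array, queue_ids,
			num_queues * sizeof(uint32_t)))
		pr_err("copy_to_user failed on queue resume\n");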

Regards,
   Felix


> +		pr_err("copy_to_user failed on queue resume\n");
> +
> +	kfree(queue_ids);
> +
> +	return total_resumed;
> +}
> +
> +int suspend_queues(struct kfd_process *p,
> +			uint32_t num_queues,
> +			uint32_t grace_period,
> +			uint64_t exception_clear_mask,
> +			uint32_t *usr_queue_id_array)
> +{
> +	uint32_t *queue_ids = get_queue_ids(num_queues, usr_queue_id_array);
> +	int total_suspended = 0;
> +	int i;
> +
> +	if (IS_ERR(queue_ids))
> +		return PTR_ERR(queue_ids);
> +
> +	/* mask all queues as invalid.  umask on successful request */
> +	q_array_invalidate(num_queues, queue_ids);
> +
> +	for (i = 0; i < p->n_pdds; i++) {
> +		struct kfd_process_device *pdd = p->pdds[i];
> +		struct device_queue_manager *dqm = pdd->dev->dqm;
> +		struct qcm_process_device *qpd = &pdd->qpd;
> +		struct queue *q;
> +		int r, per_device_suspended = 0;
> +
> +		mutex_lock(&p->event_mutex);
> +		dqm_lock(dqm);
> +
> +		/* unmask queues that suspend or already suspended */
> +		list_for_each_entry(q, &qpd->queues_list, list) {
> +			int q_idx = q_array_get_index(q->properties.queue_id,
> +							num_queues,
> +							queue_ids);
> +
> +			if (q_idx != QUEUE_NOT_FOUND &&
> +					!suspend_single_queue(dqm, pdd, q)) {
> +				queue_ids[q_idx] &=
> +					~KFD_DBG_QUEUE_INVALID_MASK;
> +				per_device_suspended++;
> +			}
> +		}
> +
> +		if (!per_device_suspended) {
> +			dqm_unlock(dqm);
> +			mutex_unlock(&p->event_mutex);
> +			continue;
> +		}
> +
> +		r = execute_queues_cpsch(dqm,
> +			KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0,
> +			grace_period);
> +
> +		if (r)
> +			pr_err("Failed to suspend process queues.\n");
> +		else
> +			total_suspended += per_device_suspended;
> +
> +		list_for_each_entry(q, &qpd->queues_list, list) {
> +			int q_idx = q_array_get_index(q->properties.queue_id,
> +						num_queues, queue_ids);
> +
> +			if (q_idx == QUEUE_NOT_FOUND)
> +				continue;
> +
> +			/* mask queue as error on suspend fail */
> +			if (r)
> +				queue_ids[q_idx] |= KFD_DBG_QUEUE_ERROR_MASK;
> +			else if (exception_clear_mask)
> +				q->properties.exception_status &=
> +							~exception_clear_mask;
> +		}
> +
> +		dqm_unlock(dqm);
> +		mutex_unlock(&p->event_mutex);
> +		amdgpu_device_flush_hdp(dqm->dev->adev, NULL);
> +	}
> +
> +	if (total_suspended) {
> +		struct copy_context_work_handler_workarea copy_context_worker;
> +
> +		INIT_WORK_ONSTACK(
> +				&copy_context_worker.copy_context_work,
> +				copy_context_work_handler);
> +
> +		copy_context_worker.p = p;
> +
> +		schedule_work(&copy_context_worker.copy_context_work);
> +
> +
> +		flush_work(&copy_context_worker.copy_context_work);
> +		destroy_work_on_stack(&copy_context_worker.copy_context_work);
> +	}
> +
> +	if (copy_to_user((void __user *)usr_queue_id_array, queue_ids,
> +			num_queues * sizeof(uint32_t)))
> +		pr_err("copy_to_user failed on queue suspend\n");
> +
> +	kfree(queue_ids);
> +
> +	return total_suspended;
> +}
> +
>   int debug_lock_and_unmap(struct device_queue_manager *dqm)
>   {
>   	int r;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index bef3be84c5cc..12643528684c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -259,6 +259,8 @@ struct device_queue_manager {
>   	struct kfd_mem_obj	hiq_sdma_mqd;
>   	bool			sched_running;
>   	uint32_t		wait_times;
> +
> +	wait_queue_head_t	destroy_wait;
>   };
>   
>   void device_queue_manager_init_cik(
> @@ -286,6 +288,15 @@ int reserve_debug_trap_vmid(struct device_queue_manager *dqm,
>   			struct qcm_process_device *qpd);
>   int release_debug_trap_vmid(struct device_queue_manager *dqm,
>   			struct qcm_process_device *qpd);
> +int suspend_queues(struct kfd_process *p,
> +			uint32_t num_queues,
> +			uint32_t grace_period,
> +			uint64_t exception_clear_mask,
> +			uint32_t *usr_queue_id_array);
> +int resume_queues(struct kfd_process *p,
> +		bool resume_all_queues,
> +		uint32_t num_queues,
> +		uint32_t *usr_queue_id_array);
>   int debug_lock_and_unmap(struct device_queue_manager *dqm);
>   int debug_map_and_unlock(struct device_queue_manager *dqm);
>   int debug_refresh_runlist(struct device_queue_manager *dqm);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> index cb484ace17de..d74862755213 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
> @@ -237,6 +237,7 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
>   			  u32 *save_area_used_size)
>   {
>   	struct v10_compute_mqd *m;
> +	struct kfd_context_save_area_header header;
>   
>   	m = get_mqd(mqd);
>   
> @@ -255,6 +256,15 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
>   	 * accessible to user mode
>   	 */
>   
> +	header.control_stack_size = *ctl_stack_used_size;
> +	header.wave_state_size = *save_area_used_size;
> +
> +	header.wave_state_offset = m->cp_hqd_wg_state_offset;
> +	header.control_stack_offset = m->cp_hqd_cntl_stack_offset;
> +
> +	if (copy_to_user(ctl_stack, &header, sizeof(header)))
> +		return -EFAULT;
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> index 86f1cf090246..f05a2bed655a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> @@ -289,6 +289,7 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
>   			  u32 *save_area_used_size)
>   {
>   	struct v9_mqd *m;
> +	struct kfd_context_save_area_header header;
>   
>   	/* Control stack is located one page after MQD. */
>   	void *mqd_ctl_stack = (void *)((uintptr_t)mqd + PAGE_SIZE);
> @@ -300,7 +301,18 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
>   	*save_area_used_size = m->cp_hqd_wg_state_offset -
>   		m->cp_hqd_cntl_stack_size;
>   
> -	if (copy_to_user(ctl_stack, mqd_ctl_stack, m->cp_hqd_cntl_stack_size))
> +	header.control_stack_size = *ctl_stack_used_size;
> +	header.wave_state_size = *save_area_used_size;
> +
> +	header.wave_state_offset = m->cp_hqd_wg_state_offset;
> +	header.control_stack_offset = m->cp_hqd_cntl_stack_offset;
> +
> +	if (copy_to_user(ctl_stack, &header, sizeof(header)))
> +		return -EFAULT;
> +
> +	if (copy_to_user(ctl_stack + m->cp_hqd_cntl_stack_offset,
> +				mqd_ctl_stack + m->cp_hqd_cntl_stack_offset,
> +				*ctl_stack_used_size))
>   		return -EFAULT;
>   
>   	return 0;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index bd3d8a0b61b7..3d529c7499f8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -477,6 +477,8 @@ struct queue_properties {
>   	uint32_t doorbell_off;
>   	bool is_interop;
>   	bool is_evicted;
> +	bool is_suspended;
> +	bool is_being_destroyed;
>   	bool is_active;
>   	bool is_gws;
>   	/* Not relevant for user mode queues in cp scheduling */
> @@ -499,7 +501,8 @@ struct queue_properties {
>   #define QUEUE_IS_ACTIVE(q) ((q).queue_size > 0 &&	\
>   			    (q).queue_address != 0 &&	\
>   			    (q).queue_percent > 0 &&	\
> -			    !(q).is_evicted)
> +			    !(q).is_evicted &&		\
> +			    !(q).is_suspended)
>   
>   enum mqd_update_flag {
>   	UPDATE_FLAG_CU_MASK = 0,
