amd-gfx.lists.freedesktop.org archive mirror
From: Felix Kuehling <felix.kuehling@amd.com>
To: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>,
	amd-gfx@lists.freedesktop.org
Cc: alexander.deucher@amd.com
Subject: Re: [Patch v3 3/4] drm/amdkfd: refactor runtime pm for baco
Date: Fri, 7 Feb 2020 16:49:57 -0500
Message-ID: <e73350f5-c604-8f2d-97fb-5c3226dfcf74@amd.com>
In-Reply-To: <20200207000911.19166-4-rajneesh.bhardwaj@amd.com>

One more nit-pick and one error-handling problem inline.

On 2020-02-06 7:09 p.m., Rajneesh Bhardwaj wrote:
> So far the kfd driver implemented the same routines for runtime and system
> wide suspend and resume (s2idle or mem). During system wide suspend the
> kfd acquires an atomic lock that prevents any more user processes from
> creating queues and interacting with the kfd driver and amd gpu. This
> mechanism created a problem when the amdgpu device is runtime suspended with
> BACO enabled. Any application that relies on the kfd driver fails to load
> because the driver reports a locked kfd device since the gpu is runtime
> suspended.
>
> However, in an ideal case, when the gpu is runtime suspended the kfd driver
> should be able to:
>
>   - auto resume amdgpu driver whenever a client requests compute service
>   - prevent runtime suspend for amdgpu while kfd is in use
>
> This change refactors the amdgpu and amdkfd drivers to support BACO and
> runtime power management.
>
> Reviewed-by: Oak Zeng <oak.zeng@amd.com>
> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 12 +++----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  8 ++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 +--
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c    | 29 +++++++++-------
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h      |  1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c   | 40 ++++++++++++++++++++--
>   6 files changed, 68 insertions(+), 26 deletions(-)
>
[snip]
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 98dcbb96b2e2..6d6c25fe2677 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -31,6 +31,7 @@
>   #include <linux/compat.h>
>   #include <linux/mman.h>
>   #include <linux/file.h>
> +#include <linux/pm_runtime.h>
>   #include "amdgpu_amdkfd.h"
>   #include "amdgpu.h"
>   
> @@ -527,6 +528,16 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
>   		kfree(pdd->qpd.doorbell_bitmap);
>   		idr_destroy(&pdd->alloc_idr);
>   
> +		/*
> +		 * before destroying pdd, make sure to report availability
> +		 * for auto suspend
> +		 */
> +		if (pdd->runtime_inuse) {
> +			pm_runtime_mark_last_busy(pdd->dev->ddev->dev);
> +			pm_runtime_put_autosuspend(pdd->dev->ddev->dev);
> +			pdd->runtime_inuse = false;
> +		}
> +
>   		kfree(pdd);
>   	}
>   }
> @@ -844,6 +855,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>   	pdd->process = p;
>   	pdd->bound = PDD_UNBOUND;
>   	pdd->already_dequeued = false;
> +	pdd->runtime_inuse = false;
>   	list_add(&pdd->per_device_list, &p->per_device_data);
>   
>   	/* Init idr used for memory handle translation */
> @@ -933,15 +945,39 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> +	/*
> +	 * signal runtime-pm system to auto resume and prevent
> +	 * further runtime suspend once device pdd is created until
> +	 * pdd is destroyed.
> +	 */
> +	if (!pdd->runtime_inuse) {
> +		err = pm_runtime_get_sync(dev->ddev->dev);
> +		if (err < 0)
> +			return ERR_PTR(err);
> +	}
> +
>   	err = kfd_iommu_bind_process_to_device(pdd);
>   	if (err)
> -		return ERR_PTR(err);
> +		goto out;
>   
>   	err = kfd_process_device_init_vm(pdd, NULL);
>   	if (err)
> -		return ERR_PTR(err);
> +		goto out;
> +
> +	if (!err)

This "if" is also redundant. If there was an error, you already did goto 
out. pdd->runtime_inuse should be set whenever we return successfully 
from this function, so logically there should be no extra "if".
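A minimal userspace model of the success path may make this concrete. It assumes hypothetical stand-ins: `usage_count` for the device's runtime-PM usage counter, `model_get_sync()` and `model_put_autosuspend()` for pm_runtime_get_sync() and the pm_runtime_mark_last_busy()/pm_runtime_put_autosuspend() pair, `struct model_pdd` for struct kfd_process_device, and an `init_err` argument for the combined result of the IOMMU bind and VM init. Because every failing step already jumps to `out`, the flag is set unconditionally just before the successful return. This is a sketch of the control flow, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one device's runtime-PM usage counter. */
static int usage_count;

/* Hypothetical, reduced stand-in for struct kfd_process_device. */
struct model_pdd {
	bool runtime_inuse;
};

/* Stand-in for pm_runtime_get_sync(); always succeeds in this model. */
static int model_get_sync(void)
{
	usage_count++;
	return 0;
}

/* Stand-in for pm_runtime_mark_last_busy() + pm_runtime_put_autosuspend(). */
static void model_put_autosuspend(void)
{
	assert(usage_count > 0);
	usage_count--;
}

/*
 * Model of kfd_bind_process_to_device(). 'init_err' stands in for the
 * combined result of the IOMMU bind and VM init steps (0 on success).
 */
static int model_bind(struct model_pdd *pdd, int init_err)
{
	int err;

	if (!pdd->runtime_inuse) {
		err = model_get_sync();
		if (err < 0)
			return err;
	}

	err = init_err;
	if (err)
		goto out;

	/* Only reached on success, so no "if (!err)" is needed here. */
	pdd->runtime_inuse = true;
	return 0;

out:
	/* Drop only a reference taken by this call. */
	if (!pdd->runtime_inuse)
		model_put_autosuspend();
	return err;
}

/* Model of the runtime-PM cleanup in kfd_process_destroy_pdds(). */
static void model_destroy(struct model_pdd *pdd)
{
	if (pdd->runtime_inuse) {
		model_put_autosuspend();
		pdd->runtime_inuse = false;
	}
}
```

With this shape, binding the same pdd twice leaves `usage_count` at 1, and the destroy path brings it back to 0.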


> +		/*
> +		 * make sure that runtime_usage counter is incremented
> +		 * just once per pdd
> +		 */
> +		pdd->runtime_inuse = true;
>   
>   	return pdd;
> +
> +out:
> +	/* balance runpm reference count and exit with error */

I think you need an "if (!pdd->runtime_inuse)" here. If this function 
didn't call pm_runtime_get_sync above, you shouldn't do the cleanup 
below. Otherwise you risk getting unbalanced usage counters. In other 
words, you need to use the same condition for pm_runtime_get_sync and 
the cleanup.
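The guarded cleanup can be sketched the same way, again as a userspace model with hypothetical stand-ins (`usage_count`, `resume_err` and the `model_*` helpers) rather than the real kernel calls. The `out:` label tests the same `!pdd->runtime_inuse` condition that guarded the get, so a pdd that already held a reference keeps it. One further point the model takes from documented runtime-PM behavior: pm_runtime_get_sync() increments the usage counter even when the resume fails, so the early-return path may also need a pm_runtime_put_noidle() to stay balanced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one device's runtime-PM usage counter. */
static int usage_count;

/* Hypothetical knob: make the modeled resume fail with this error. */
static int resume_err;

/* Hypothetical, reduced stand-in for struct kfd_process_device. */
struct model_pdd {
	bool runtime_inuse;
};

/*
 * Stand-in for pm_runtime_get_sync(): like the real call, it bumps the
 * usage counter even when the resume fails.
 */
static int model_get_sync(void)
{
	usage_count++;
	return resume_err;
}

/* Stand-in for pm_runtime_put_noidle(): drop the count, no idle check. */
static void model_put_noidle(void)
{
	assert(usage_count > 0);
	usage_count--;
}

/* Stand-in for pm_runtime_mark_last_busy() + pm_runtime_put_autosuspend(). */
static void model_put_autosuspend(void)
{
	assert(usage_count > 0);
	usage_count--;
}

/* Model of kfd_bind_process_to_device(); 'init_err' models IOMMU/VM init. */
static int model_bind(struct model_pdd *pdd, int init_err)
{
	int err;

	if (!pdd->runtime_inuse) {
		err = model_get_sync();
		if (err < 0) {
			/* The failed get still took a reference; give it back. */
			model_put_noidle();
			return err;
		}
	}

	err = init_err;
	if (err)
		goto out;

	pdd->runtime_inuse = true;
	return 0;

out:
	/* Same condition as the get above: undo only what this call took. */
	if (!pdd->runtime_inuse)
		model_put_autosuspend();
	return err;
}
```

In this model a failed bind on a fresh pdd returns the counter to 0, a failed re-bind of a pdd already in use leaves its reference in place, and a failed resume leaves the counter unchanged.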

Regards,
   Felix


> +	pm_runtime_mark_last_busy(dev->ddev->dev);
> +	pm_runtime_put_autosuspend(dev->ddev->dev);
> +	return ERR_PTR(err);
>   }
>   
>   struct kfd_process_device *kfd_get_first_process_device_data(
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Thread overview: 6+ messages
2020-02-07  0:09 [Patch v3 0/4] Enable BACO with KFD Rajneesh Bhardwaj
2020-02-07  0:09 ` [Patch v3 1/4] drm/amdgpu: Fix missing error check in suspend Rajneesh Bhardwaj
2020-02-07  0:09 ` [Patch v3 2/4] drm/amdkfd: show warning when kfd is locked Rajneesh Bhardwaj
2020-02-07  0:09 ` [Patch v3 3/4] drm/amdkfd: refactor runtime pm for baco Rajneesh Bhardwaj
2020-02-07 21:49   ` Felix Kuehling [this message]
2020-02-07  0:09 ` [Patch v3 4/4] drm/amdgpu/runpm: enable runpm on baco capable VI+ asics Rajneesh Bhardwaj
