From: Tvrtko Ursulin
Date: Wed, 25 May 2022 17:22:32 +0100
Subject: Re: [PATCH v4 12/13] drm/msm: Utilize gpu scheduler priorities
To: Rob Clark
Cc: Rob Clark, dri-devel, Sai Prakash Ranjan, Jonathan Marek, David Airlie,
    freedreno, Sharat Masetty, Akhil P Oommen, Jordan Crouse,
    "open list:DRM DRIVER FOR MSM ADRENO GPU", Sean Paul, Christian König,
    open list

On 25/05/2022 14:41, Rob Clark wrote:
> On Wed, May 25, 2022 at 2:46 AM Tvrtko Ursulin wrote:
>>
>>
>> On 24/05/2022 15:50, Rob Clark wrote:
>>> On Tue, May 24, 2022 at 6:45 AM Tvrtko Ursulin wrote:
>>>>
>>>>
>>>> On 23/05/2022 23:53, Rob Clark wrote:
>>>>> On Mon, May 23, 2022 at 7:45 AM Tvrtko Ursulin wrote:
>>>>>>
>>>>>>
>>>>>> Hi Rob,
>>>>>>
>>>>>> On 28/07/2021 02:06, Rob Clark wrote:
>>>>>>> From: Rob Clark
>>>>>>>
>>>>>>> The drm/scheduler provides additional prioritization on top of
>>>>>>> that provided by however many ringbuffers (each with its own
>>>>>>> priority level) are supported on a given generation.  Expose the
>>>>>>> additional levels of priority to userspace and map the userspace
>>>>>>> priority back to ring (first level of priority) and scheduler
>>>>>>> priority (additional priority levels within the ring).
>>>>>>>
>>>>>>> Signed-off-by: Rob Clark
>>>>>>> Acked-by: Christian König
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/msm/adreno/adreno_gpu.c |  4 +-
>>>>>>>   drivers/gpu/drm/msm/msm_gem_submit.c    |  4 +-
>>>>>>>   drivers/gpu/drm/msm/msm_gpu.h           | 58 ++++++++++++++++++++++++-
>>>>>>>   drivers/gpu/drm/msm/msm_submitqueue.c   | 35 +++++++--------
>>>>>>>   include/uapi/drm/msm_drm.h              | 14 +++++-
>>>>>>>   5 files changed, 88 insertions(+), 27 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>> index bad4809b68ef..748665232d29 100644
>>>>>>> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
>>>>>>> @@ -261,8 +261,8 @@ int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
>>>>>>>                         return ret;
>>>>>>>                 }
>>>>>>>                 return -EINVAL;
>>>>>>> -       case MSM_PARAM_NR_RINGS:
>>>>>>> -               *value = gpu->nr_rings;
>>>>>>> +       case MSM_PARAM_PRIORITIES:
>>>>>>> +               *value = gpu->nr_rings * NR_SCHED_PRIORITIES;
>>>>>>>                 return 0;
>>>>>>>         case MSM_PARAM_PP_PGTABLE:
>>>>>>>                 *value = 0;
>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
>>>>>>> index 450efe59abb5..c2ecec5b11c4 100644
>>>>>>> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
>>>>>>> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
>>>>>>> @@ -59,7 +59,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>>>>>>>         submit->gpu = gpu;
>>>>>>>         submit->cmd = (void *)&submit->bos[nr_bos];
>>>>>>>         submit->queue = queue;
>>>>>>> -       submit->ring = gpu->rb[queue->prio];
>>>>>>> +       submit->ring = gpu->rb[queue->ring_nr];
>>>>>>>         submit->fault_dumped = false;
>>>>>>>
>>>>>>>         INIT_LIST_HEAD(&submit->node);
>>>>>>> @@ -749,7 +749,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>>>>>>>         /* Get a unique identifier for the submission for logging purposes */
>>>>>>>         submitid = atomic_inc_return(&ident) - 1;
>>>>>>>
>>>>>>> -       ring = gpu->rb[queue->prio];
>>>>>>> +       ring = gpu->rb[queue->ring_nr];
>>>>>>>         trace_msm_gpu_submit(pid_nr(pid), ring->id, submitid,
>>>>>>>                 args->nr_bos, args->nr_cmds);
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
>>>>>>> index b912cacaecc0..0e4b45bff2e6 100644
>>>>>>> --- a/drivers/gpu/drm/msm/msm_gpu.h
>>>>>>> +++ b/drivers/gpu/drm/msm/msm_gpu.h
>>>>>>> @@ -250,6 +250,59 @@ struct msm_gpu_perfcntr {
>>>>>>>         const char *name;
>>>>>>>   };
>>>>>>>
>>>>>>> +/*
>>>>>>> + * The number of priority levels provided by drm gpu scheduler.  The
>>>>>>> + * DRM_SCHED_PRIORITY_KERNEL priority level is treated specially in some
>>>>>>> + * cases, so we don't use it (no need for kernel generated jobs).
>>>>>>> + */
>>>>>>> +#define NR_SCHED_PRIORITIES (1 + DRM_SCHED_PRIORITY_HIGH - DRM_SCHED_PRIORITY_MIN)
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * msm_gpu_convert_priority - Map userspace priority to ring # and sched priority
>>>>>>> + *
>>>>>>> + * @gpu:        the gpu instance
>>>>>>> + * @prio:       the userspace priority level
>>>>>>> + * @ring_nr:    [out] the ringbuffer the userspace priority maps to
>>>>>>> + * @sched_prio: [out] the gpu scheduler priority level which the userspace
>>>>>>> + *              priority maps to
>>>>>>> + *
>>>>>>> + * With drm/scheduler providing its own level of prioritization, our total
>>>>>>> + * number of available priority levels is (nr_rings * NR_SCHED_PRIORITIES).
>>>>>>> + * Each ring is associated with its own scheduler instance.  However, our
>>>>>>> + * UABI is that lower numerical values are higher priority.  So mapping the
>>>>>>> + * single userspace priority level into ring_nr and sched_prio takes some
>>>>>>> + * care.  The userspace provided priority (when a submitqueue is created)
>>>>>>> + * is mapped to ring nr and scheduler priority as such:
>>>>>>> + *
>>>>>>> + *   ring_nr    = userspace_prio / NR_SCHED_PRIORITIES
>>>>>>> + *   sched_prio = NR_SCHED_PRIORITIES -
>>>>>>> + *                (userspace_prio % NR_SCHED_PRIORITIES) - 1
>>>>>>> + *
>>>>>>> + * This allows generations without preemption (nr_rings==1) to have some
>>>>>>> + * amount of prioritization, and provides more priority levels for gens
>>>>>>> + * that do have preemption.
>>>>>>
>>>>>> I am exploring how different drivers handle priority levels and this
>>>>>> caught my eye.
>>>>>>
>>>>>> Is the implication of the last paragraphs that on hw with nr_rings > 1,
>>>>>> ring + 1 preempts ring?
>>>>>
>>>>> Other way around, at least from the uabi standpoint.  Ie. ring[0]
>>>>> preempts ring[1].
>>>>
>>>> Ah yes, I figured it out from the comments but then confused myself when
>>>> writing the email.
>>>>
>>>>>> If so, I am wondering whether the "spreading" of user-visible priorities
>>>>>> by NR_SCHED_PRIORITIES creates non-preemptable levels within every
>>>>>> "bucket", or how does that work?
>>>>>
>>>>> So, preemption is possible between any priority level before run_job()
>>>>> gets called, which writes the job into the ringbuffer.  After that
>>>>
>>>> Hmm how?  Before run_job() the jobs are not runnable, sitting in the
>>>> scheduler queues, right?
>>>
>>> I mean, if prio[0]+prio[1]+prio[2] map to a single ring, submit A on
>>> prio[1] could be executed after submit B on prio[2] provided that
>>> run_job(submitA) hasn't happened yet.  So I guess it isn't "really"
>>> preemption because the submit hasn't started running on the GPU yet.
>>> But rather just scheduling according to priority.
>>>
>>>>> point, you only have "bucket" level preemption, because
>>>>> NR_SCHED_PRIORITIES levels of priority get mapped to a single FIFO
>>>>> ringbuffer.
>>>>
>>>> Right, and you have one GPU with four rings, which means you expose 12
>>>> priority levels to userspace, did I get that right?
>>>
>>> Correct
>>>
>>>> If so how do you convey in the ABI that not all these priority levels
>>>> are equal?  Like userspace can submit at prio 4 and expect prio 3 to
>>>> preempt, as would prio 2 preempt prio 3.  While actual behaviour will
>>>> not match - 3 will not preempt 4.
>>>
>>> It isn't really exposed to userspace, but perhaps it should be..
>>> Userspace just knows that, to the extent possible, the kernel will try
>>> to execute prio 3 before prio 4.
>>>
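To make that mapping concrete, here is a stand-alone, purely illustrative
mock-up of the formula from the patch comment above (not the actual driver
code); it assumes the three usable drm/sched levels between
DRM_SCHED_PRIORITY_MIN and DRM_SCHED_PRIORITY_HIGH, and a hypothetical
4-ring GPU:

#include <stdio.h>

#define NR_SCHED_PRIORITIES 3   /* assumed: MIN, NORMAL, HIGH */

int main(void)
{
	unsigned int nr_rings = 4;  /* hypothetical gen with 4 rings */
	unsigned int prio;

	for (prio = 0; prio < nr_rings * NR_SCHED_PRIORITIES; prio++) {
		unsigned int ring_nr = prio / NR_SCHED_PRIORITIES;
		unsigned int sched_prio = NR_SCHED_PRIORITIES -
					  (prio % NR_SCHED_PRIORITIES) - 1;

		/* lower userspace prio is more important; ring 0 preempts ring 1 */
		printf("userspace prio %2u -> ring %u, sched prio %u\n",
		       prio, ring_nr, sched_prio);
	}

	return 0;
}

Running it prints the 12 userspace levels: prio 0-2 all land on ring 0 (the
highest-priority ring) with sched prio 2, 1, 0, prio 3-5 on ring 1, and so
on, which is exactly the "bucket" behaviour discussed above.
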
>>>> Also, does your userspace stack (EGL/Vulkan) use the priorities?  I had
>>>> a quick peek in Mesa but did not spot it - although I am not really at
>>>> home there yet so maybe I missed it.
>>>
>>> Yes, there is an EGL extension:
>>>
>>> https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
>>>
>>> It is pretty limited, it only exposes three priority levels.
>>
>> Right, is that wired up on msm?  And if it is, or could be, how do/would
>> you map the three priority levels for GPUs which expose 3 priority
>> levels versus the ones which expose 12?
>
> We don't yet, but probably should, expose a cap to indicate to
> userspace the # of hw rings vs # of levels of sched priority.

What bothers me is the question of whether this setup provides a
consistent benefit.  Why would userspace use anything other than the
"real" (hardware) priority levels on chips where they are available?

For instance, if you exposed 4 instead of 12 on such a platform, would
that be better or worse?  Yes, you could only map three directly to
drm/sched and one would have to be "fake".  Like:

  hw prio 0 -> drm/sched 2
  hw prio 1 -> drm/sched 1
  hw prio 2 -> drm/sched 0
  hw prio 3 -> drm/sched 0

Not saying that's nice either.  Perhaps the answer is that drm/sched
needs more flexibility if it wants to be widely used.

For instance, in the i915 uapi we have priority as an int from -1023 to
+1023, and a matching implementation on some platforms, until the new
ones which are GuC firmware based, where we need to squash that to
low/normal/high.  So the thinking was that drm/sched happens to align
with GuC.  But then we have your hw where it doesn't seem to.

Regards,

Tvrtko

>> Is it doable properly without leaking the drm/sched internal
>> implementation detail of three priority levels?  Or if you went the
>> other way and only exposed up to max 3 levels, then you lose one
>> priority level your hardware supports, which is also not good.
>>
>> It is all quite interesting because your hardware is completely
>> different from ours in this respect.  In our case i915 decides when to
>> preempt, hardware has no concept of priority (*).
>
> It is really pretty much all in firmware..  a6xx is the first gen that
> could do actual (non-cooperative) preemption (but that isn't
> implemented yet in upstream driver)
>
> BR,
> -R
>
>> Regards,
>>
>> Tvrtko
>>
>> (*) Almost no concept of priority in hardware - we do have it on new
>> GPUs and only on a subset of engine classes where render and compute
>> share the EUs.  But I think it's way different from Adreno.
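
As a closing aside on the squashing idea mentioned above: purely as an
illustration, and not i915 or drm/sched code, collapsing a wide integer
uapi priority range such as -1023..+1023 down to three scheduler levels
could look roughly like this:

#include <stdio.h>

enum sched_level { LEVEL_LOW, LEVEL_NORMAL, LEVEL_HIGH };

/* hypothetical squash: negative -> low, default 0 -> normal, positive -> high */
static enum sched_level squash_priority(int uapi_prio)
{
	if (uapi_prio < 0)
		return LEVEL_LOW;
	if (uapi_prio > 0)
		return LEVEL_HIGH;
	return LEVEL_NORMAL;
}

int main(void)
{
	const int samples[] = { -1023, -1, 0, 1, 1023 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("uapi prio %5d -> level %d\n",
		       samples[i], squash_priority(samples[i]));

	return 0;
}

The obvious downside, as with the hw prio 2/3 rows in the table above, is
that distinct uapi values collapse onto the same level, so the relative
ordering userspace asked for is only partially honoured.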