From: Emil Velikov <emil.l.velikov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: amd-gfx mailing list
	<amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org>
Subject: Re: [PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
Date: Fri, 3 Mar 2017 14:48:07 +0000
Message-ID: <CACvgo53tA_NHaFO_rELLjdozAf_kmqkRv70CT7Y1M5_cLu4wBw@mail.gmail.com>
In-Reply-To: <782283a5-3871-0827-ed2c-9069a6dc6734-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

On 2 March 2017 at 03:52, Andres Rodriguez <andresx7@gmail.com> wrote:
>
>
> On 2017-02-28 08:13 PM, Emil Velikov wrote:
>>
>> Hi Andres,
>>
>> There are a couple of nitpicks below, but feel free to address those as
>> follow-up. Assuming they're correct, of course ;-)
>
>
> As much as I'd like to let future me deal with those issues, the UAPI
> behavior is something I'd like to get nailed down early and avoid changing.
>
> So any nitpicks here are more than welcome now (better than later :) )
>
>>
>> On 28 February 2017 at 22:14, Andres Rodriguez <andresx7@gmail.com> wrote:
>>>
>>> Add a new context creation parameter to express a global context
>>> priority.
>>>
>>> Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
>>> priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
>>> (default) contexts.
>>>
>>> v2: Instead of using flags, repurpose __pad
>>> v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
>>> v4: Validate usermode priority and store it
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  1 +
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c       | 41 +++++++++++++++++++++++----
>>>  drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
>>>  include/uapi/drm/amdgpu_drm.h                 |  7 ++++-
>>>  4 files changed, 44 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index e30c47e..366f6d3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
>>>         struct amd_sched_entity entity;
>>>  };
>>>
>>>  struct amdgpu_ctx {
>>>         struct kref             refcount;
>>>         struct amdgpu_device    *adev;
>>>         unsigned                reset_counter;
>>>         spinlock_t              ring_lock;
>>>         struct dma_fence        **fences;
>>>         struct amdgpu_ctx_ring  rings[AMDGPU_MAX_RINGS];
>>> +       int                     priority;
>>>         bool preamble_presented;
>>>  };
>>>
>>>  struct amdgpu_ctx_mgr {
>>>         struct amdgpu_device    *adev;
>>>         struct mutex            lock;
>>>         /* protected by lock */
>>>         struct idr              ctx_handles;
>>>  };
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index 400c66b..22a15d6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -18,47 +18,75 @@
>>>   * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>>   * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>>   * OTHER DEALINGS IN THE SOFTWARE.
>>>   *
>>>   * Authors: monk liu <monk.liu@amd.com>
>>>   */
>>>
>>>  #include <drm/drmP.h>
>>>  #include "amdgpu.h"
>>>
>>> -static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
>>> +static enum amd_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
>>> +{
>>> +       switch (amdgpu_priority) {
>>> +       case AMDGPU_CTX_PRIORITY_HIGH:
>>> +               return AMD_SCHED_PRIORITY_HIGH;
>>> +       case AMDGPU_CTX_PRIORITY_NORMAL:
>>> +               return AMD_SCHED_PRIORITY_NORMAL;
>>> +       default:
>>> +               WARN(1, "Invalid context priority %d\n", amdgpu_priority);
>>> +               return AMD_SCHED_PRIORITY_NORMAL;
>>> +       }
>>> +}
>>> +
>>> +static int amdgpu_ctx_init(struct amdgpu_device *adev,
>>> +                               uint32_t priority,
>>> +                               struct amdgpu_ctx *ctx)
>>>  {
>>>         unsigned i, j;
>>>         int r;
>>> +       enum amd_sched_priority sched_priority;
>>> +
>>> +       sched_priority = amdgpu_to_sched_priority(priority);
>>> +
>>
>> This will trigger dmesg spam on normal user input. I'd keep the WARN
>> in amdgpu_to_sched_priority, but move the function call after the
>> validation of priority.
>> Thinking about it, the input validation really belongs in the ioctl -
>> amdgpu_ctx_ioctl().
>>
>
> Agreed.
>
>>> +       if (priority >= AMDGPU_CTX_PRIORITY_NUM)
>>> +               return -EINVAL;
>>> +
>>> +       if (sched_priority < 0 || sched_priority >= AMD_SCHED_MAX_PRIORITY)
>>> +               return -EINVAL;
>>> +
>>> +       if (sched_priority == AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
>>
>> This is not obvious in either the commit message or the UAPI. I'd
>> suggest adding a comment in the latter.
>
>
> Will do.
>
>> If memory serves - high prio will _not_ work with render nodes, so
>> you really want to cover and/or explain why.
>
>
> High priority will work fine with render nodes. I'm testing with radv,
> which actually uses render nodes.
>
> But I've had my fair share of cases where two bugs cancel each other out.
> So if you do have some insight into why this should be the case, let me know.
>
Right - I got confused by the CAP_SYS_ADMIN capability. What I meant above
is: why do we need it, and does it impose a specific workflow or set of
permissions on userspace in order to use high priority?
This is the type of thing that, I think, must be documented in the UAPI header.
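
Roughly what I have in mind, as a sketch only - the identifiers below are
the ones from your patch, while the comment wording and exact placement
are just an example:

  /* include/uapi/drm/amdgpu_drm.h - next to the AMDGPU_CTX_PRIORITY_*
   * defines (example wording):
   *
   * AMDGPU_CTX_PRIORITY_HIGH requires CAP_SYS_ADMIN at context creation
   * time; unprivileged callers get -EACCES.
   */

  /* amdgpu_ctx_ioctl() - validate before calling amdgpu_ctx_alloc() */
  priority = args->in.priority;
  if (priority >= AMDGPU_CTX_PRIORITY_NUM)
          return -EINVAL;
  if (priority == AMDGPU_CTX_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
          return -EACCES;

That way the WARN in amdgpu_to_sched_priority() is reserved for genuine
driver bugs, while malformed user input is rejected in the ioctl without
any dmesg noise.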

>>
>>> +               return -EACCES;
>>>
>>>         memset(ctx, 0, sizeof(*ctx));
>>>         ctx->adev = adev;
>>> +       ctx->priority = priority;
>>>         kref_init(&ctx->refcount);
>>>         spin_lock_init(&ctx->ring_lock);
>>>         ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
>>>                               sizeof(struct dma_fence*), GFP_KERNEL);
>>>         if (!ctx->fences)
>>>                 return -ENOMEM;
>>>
>>>         for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>                 ctx->rings[i].sequence = 1;
>>>                 ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
>>>         }
>>>
>>>         ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
>>>
>>>         /* create context entity for each ring */
>>>         for (i = 0; i < adev->num_rings; i++) {
>>>                 struct amdgpu_ring *ring = adev->rings[i];
>>>                 struct amd_sched_rq *rq;
>>>
>>> -               rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
>>> +               rq = &ring->sched.sched_rq[sched_priority];
>>>                 r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
>>>                                           rq, amdgpu_sched_jobs);
>>>                 if (r)
>>>                         goto failed;
>>>         }
>>>
>>>         return 0;
>>>
>>>  failed:
>>>         for (j = 0; j < i; j++)
>>> @@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>>>         kfree(ctx->fences);
>>>         ctx->fences = NULL;
>>>
>>>         for (i = 0; i < adev->num_rings; i++)
>>>                 amd_sched_entity_fini(&adev->rings[i]->sched,
>>>                                       &ctx->rings[i].entity);
>>>  }
>>>
>>>  static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>>>                             struct amdgpu_fpriv *fpriv,
>>> +                           uint32_t priority,
>>>                             uint32_t *id)
>>>  {
>>>         struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>>>         struct amdgpu_ctx *ctx;
>>>         int r;
>>>
>>>         ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>>>         if (!ctx)
>>>                 return -ENOMEM;
>>>
>>>         mutex_lock(&mgr->lock);
>>>         r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
>>>         if (r < 0) {
>>>                 mutex_unlock(&mgr->lock);
>>>                 kfree(ctx);
>>>                 return r;
>>>         }
>>> +
>>>         *id = (uint32_t)r;
>>> -       r = amdgpu_ctx_init(adev, ctx);
>>> +       r = amdgpu_ctx_init(adev, priority, ctx);
>>>         if (r) {
>>>                 idr_remove(&mgr->ctx_handles, *id);
>>>                 *id = 0;
>>>                 kfree(ctx);
>>>         }
>>>         mutex_unlock(&mgr->lock);
>>>         return r;
>>>  }
>>>
>>>  static void amdgpu_ctx_do_release(struct kref *ref)
>>> @@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
>>>         ctx->reset_counter = reset_counter;
>>>
>>>         mutex_unlock(&mgr->lock);
>>>         return 0;
>>>  }
>>>
>>>  int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>>>                      struct drm_file *filp)
>>>  {
>>>         int r;
>>> -       uint32_t id;
>>> +       uint32_t id, priority;
>>>
>>>         union drm_amdgpu_ctx *args = data;
>>>         struct amdgpu_device *adev = dev->dev_private;
>>>         struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>>
>>>         r = 0;
>>>         id = args->in.ctx_id;
>>> +       priority = args->in.priority;
>>>
>> Hmm, we don't seem to be doing any in.flags validation - not cool.
>> Someone seriously needs to add that and check the remaining ioctls.
>> At the same time - I think you want to add a flag bit "HAS_PRIORITY"
>> [or similar] and honour in.priority only when that is set.
>>
>> Even if the USM drivers are safe, this will break for some poor soul who
>> is learning how to program their GPU. "My program was running before -
>> I updated the kernel and it no longer does :-("
>
>
> Improving validation of already existing parameters is probably better
> served by a separate series (so in this case I will take you up on the
> earlier offer)
>
Agreed.

> As far as a HAS_PRIORITY flag goes, I'm not sure it would really be
> necessary. libdrm_amdgpu has zeroed its data structures since its first
> commit (0936139), so the change should be backwards compatible with all
> libdrm_amdgpu versions.
>
> If someone is bypassing libdrm_amdgpu and hitting the IOCTLs directly, then
> I hope they know what they are doing. The only apps I've really seen do this
> are fuzzers, and they probably wouldn't care.
>
> I'm assuming we do have the flexibility to rely on the usermode library
> as the point of backwards compatibility.
>
Since there is no validation in the first place, anyone can feed
garbage into the ioctl without knowing that they're doing it wrong.
Afaict the rule tends to be: kernel updates should never break
userspace.
And by the time you realise there is a breakage here, it'll be too late.

And even if we ignore all that for a moment, flags is there exactly for
things like HAS_PRIORITY, and adding/managing it is trivial - see the
sketch below.
Then again, if you feel so strongly against ~5 lines of code, so be it :-(
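
For the record, those ~5 lines would look roughly like this - the flag
name is only a placeholder of mine, not something from the patch:

  /* include/uapi/drm/amdgpu_drm.h */
  #define AMDGPU_CTX_ALLOC_HAS_PRIORITY (1 << 0)  /* placeholder name */

  /* amdgpu_ctx_ioctl() */
  if (args->in.flags & ~AMDGPU_CTX_ALLOC_HAS_PRIORITY)
          return -EINVAL;  /* reject unknown flags */

  if (args->in.flags & AMDGPU_CTX_ALLOC_HAS_PRIORITY)
          priority = args->in.priority;
  else
          priority = AMDGPU_CTX_PRIORITY_NORMAL;

Old userspace that leaves in.flags at zero keeps the current behaviour,
and anything that has been feeding garbage into the padding keeps
working as well.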

Thanks
Emil
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
