From: Daniel Vetter <daniel@ffwll.ch>
To: Tejun Heo <tj@kernel.org>
Cc: "Kenny Ho" <Kenny.Ho@amd.com>,
	"Kuehling, Felix" <felix.kuehling@amd.com>,
	jsparks@cray.com, dri-devel <dri-devel@lists.freedesktop.org>,
	"Kenny Ho" <y2kenny@gmail.com>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Greathouse, Joseph" <joseph.greathouse@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	cgroups@vger.kernel.org,
	"Christian König" <christian.koenig@amd.com>
Subject: Re: [PATCH v2 00/11] new cgroup controller for gpu/drm subsystem
Date: Tue, 14 Apr 2020 14:20:15 +0200	[thread overview]
Message-ID: <20200414122015.GR3456981@phenom.ffwll.local> (raw)
In-Reply-To: <20200413191136.GI60335@mtj.duckdns.org>

On Mon, Apr 13, 2020 at 03:11:36PM -0400, Tejun Heo wrote:
> Hello, Kenny.
> 
> On Tue, Mar 24, 2020 at 02:49:27PM -0400, Kenny Ho wrote:
> > Can you elaborate more on what the missing pieces are?
> 
> Sorry about the long delay, but I think we've been going in circles for quite
> a while now. Let's try to make it really simple as the first step. How about
> something like the following?
> 
> * gpu.weight (should it be gpu.compute.weight? idk) - A single number
>   per-device weight similar to io.weight, which distributes computation
>   resources in a work-conserving way.
> 
> * gpu.memory.high - A single number per-device on-device memory limit.
> 
> The above two, if they work well, should already be plenty useful. And my guess
> is that getting the above working well will be plenty challenging already, even
> though it's already excluding work-conserving memory distribution. So let's
> please do that as the first step and see what more would be needed from there.
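
To spell out the weight semantics this implies (assuming the same
proportional-share model as io.weight, which the proposal points at):
when several sibling cgroups compete for the same device,

    share_i = gpu.weight_i / sum(gpu.weight_j over all busy siblings j)

    e.g. siblings with weights 100, 200 and 100 split a fully loaded
    device roughly 25% / 50% / 25%, and whatever an idle sibling leaves
    unused is handed to the busy ones, which is what makes it
    work-conserving.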

This agrees with my understanding of the consensus here and of what's
reasonably possible across different gpus. And in case this isn't clear:
this is very much me talking with my drm co-maintainer hat on, not with a
gpu vendor hat on (since that's implied somewhere further down the
discussion). My understanding from talking with a few other folks is that
the cpumask-style CU-weight thing is not something any other gpu can
reasonably support (and we have about 6+ of those in-tree), whereas some
work-conserving computation resource thing should be doable for anyone
with a scheduler. Plus or minus more or less the same issues as io
devices: there might be quite big latencies involved in going from one
client to the other, because gpu pipelines are deep and pre-emption on
gpus is rather slow. And of course not all gpu "requests" use equal
amounts of resources (different engines and so on, just to begin with),
the same way not all io requests are created equal. Plus, since we do
have a shared scheduler used by at least most drivers, this shouldn't be
too hard to get done somewhat consistently across drivers.
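
As a rough usage sketch (the gpu.weight / gpu.memory.high files below are
just the interface proposed above, not something any kernel exposes
today, and keying the value by drm device major:minor the way the io.*
controllers do is my assumption):

    # give this cgroup twice the default share of gpu time on drm device 226:0
    echo "226:0 200" > /sys/fs/cgroup/mygroup/gpu.weight

    # and soft-limit its on-device (vram) allocations to 4 GiB
    echo "226:0 4G" > /sys/fs/cgroup/mygroup/gpu.memory.high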

tl;dr: Acked by me.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
