From: "Tamminen, Eero T" <eero.t.tamminen@intel.com>
To: "daniel@ffwll.ch" <daniel@ffwll.ch>,
	"Welty, Brian" <brian.welty@intel.com>
Cc: "tvrtko.ursulin@linux.intel.com" <tvrtko.ursulin@linux.intel.com>,
	"airlied@linux.ie" <airlied@linux.ie>,
	"Kenny.Ho@amd.com" <Kenny.Ho@amd.com>,
	"intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"chris@chris-wilson.co.uk" <chris@chris-wilson.co.uk>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"tj@kernel.org" <tj@kernel.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"christian.koenig@amd.com" <christian.koenig@amd.com>
Subject: Re: [RFC PATCH 8/9] drm/gem: Associate GEM objects with drm cgroup
Date: Mon, 10 May 2021 16:06:31 +0000
Message-ID: <09a39aa58c064a8f8e696a091a1b5edd22ef58b9.camel@intel.com>
In-Reply-To: <CAKMK7uG7EWv93EbRcMRCm+opi=7fQPMOv2z1R6GBhJXb6--28w@mail.gmail.com>

Hi,

Mon, 2021-05-10 at 17:36 +0200, Daniel Vetter wrote:
> 
...
> > If DRM allows user-space to exhaust all of system memory, this seems
> > to be a gap in enforcement of MEMCG limits for system memory.
> > I tried to look into it when this was discussed in the past....
> > My guess is that shmem_read_mapping_page_gfp() ->
> > shmem_getpage_gfp() is not choosing the correct MM to charge against
> > in the use case of drivers using shmemfs for backing gem buffers.
> 
> Yeah we know about this one since forever. The bug report is roughly
> as old as the gem/ttm memory managers :-/ So another problem might be
> that if we now suddenly include gpu memory in the memcg accounting, we
> start breaking a bunch of workloads that worked just fine beforehand.
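
For reference, the allocation path being discussed looks roughly like
this in a driver - a simplified sketch along the lines of
drm_gem_get_pages(), with error unwinding omitted; only
shmem_read_mapping_page_gfp() and the structs are real kernel API, the
helper itself is made up:

    #include <drm/drm_gem.h>
    #include <linux/shmem_fs.h>

    /* pin the shmemfs pages backing a GEM object (sketch) */
    static int pin_backing_pages(struct drm_gem_object *obj,
                                 struct page **pages)
    {
            struct address_space *mapping = obj->filp->f_mapping;
            pgoff_t i, n = obj->size >> PAGE_SHIFT;
            struct page *page;

            for (i = 0; i < n; i++) {
                    /* the memcg charge happens inside shmem, against
                     * whichever task is current here - often not the
                     * client that owns the buffer */
                    page = shmem_read_mapping_page_gfp(mapping, i,
                                                       GFP_HIGHUSER);
                    if (IS_ERR(page))
                            return PTR_ERR(page);
                    pages[i] = page;
            }
            return 0;
    }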

It's not the first time that tightening security has required adapting
the settings of running workloads...

A workload's GPU memory usage needs to be a significant portion of the
application's real memory usage before the workload hits limits that
were set for it earlier.  Therefore I think it is definitely something
that a user setting such limits actually cares about.
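
For example, such a limit is typically set up along these lines (a
minimal sketch using the real cgroup v2 control files memory.max and
cgroup.procs; the cgroup name, limit and pid are made up):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    static void write_file(const char *path, const char *val)
    {
            FILE *f = fopen(path, "w");

            if (!f || fputs(val, f) == EOF)
                    exit(1);
            fclose(f);
    }

    int main(void)
    {
            mkdir("/sys/fs/cgroup/gpuapp", 0755);
            /* hard limit: 512 MiB; shmem-backed GEM buffers would count
             * against this once they are charged to the right memcg */
            write_file("/sys/fs/cgroup/gpuapp/memory.max", "536870912");
            /* move an already running app (pid 1234) into the group */
            write_file("/sys/fs/cgroup/gpuapp/cgroup.procs", "1234");
            return 0;
    }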

=> I think the important thing is that the reason for the failures is
clear from the OOM message.  That works much better if GPU-related
memory usage is stated explicitly in that message, once that memory
starts to be accounted as system memory.
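
For illustration, a memcg OOM kill currently reports something like the
line below (format approximate, numbers made up); shmemfs-backed GEM
buffers are buried in shmem-rss with nothing marking them as GPU
allocations:

    Memory cgroup out of memory: Killed process 1234 (gl_app)
    total-vm:2097152kB, anon-rss:102400kB, file-rss:4096kB,
    shmem-rss:917504kB, UID:1000 pgtables:2048kB oom_score_adj:0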


	- Eero

