[RFC PATCH 0/5] cgroup support for GPU devices

* [RFC PATCH 0/5] cgroup support for GPU devices
@ 2019-05-01 14:04 Brian Welty
  2019-05-01 14:04 ` [RFC PATCH 1/5] cgroup: Add cgroup_subsys per-device registration framework Brian Welty
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Brian Welty @ 2019-05-01 14:04 UTC (permalink / raw)
  To: cgroups, Tejun Heo, Li Zefan, Johannes Weiner, linux-mm,
	Michal Hocko, Vladimir Davydov, dri-devel, David Airlie,
	Daniel Vetter, intel-gfx, Jani Nikula, Joonas Lahtinen,
	Rodrigo Vivi, Christian König, Alex Deucher, ChunMing Zhou,
	Jérôme Glisse

In containerized or virtualized environments, there is desire to have
controls in place for resources that can be consumed by users of a GPU
device.  This RFC patch series proposes a framework for integrating 
use of existing cgroup controllers into device drivers.
The i915 driver is updated in this series as our primary use case to
leverage this framework and to serve as an example for discussion.

The patch series enables device drivers to use cgroups to control the
following resources within a GPU (or other accelerator device):
*  control allocation of device memory (reuse of memcg)
and with future work, we could extend to:
*  track and control share of GPU time (reuse of cpu/cpuacct)
*  apply mask of allowed execution engines (reuse of cpusets)

Instead of introducing a new cgroup subsystem for GPU devices, a new
framework is proposed to allow devices to register with existing cgroup
controllers, which creates per-device cgroup_subsys_state within the
cgroup.  This gives device drivers their own private cgroup controls
(such as memory limits or other parameters) to be applied to device
resources instead of host system resources.
Device drivers (GPU or other) are then able to reuse the existing cgroup
controls, instead of inventing similar ones.

Per-device controls would be exposed in cgroup filesystem as:
    mount/<cgroup_name>/<subsys_name>.devices/<dev_name>/<subsys_files>
such as (for example):
    mount/<cgroup_name>/memory.devices/<dev_name>/memory.max
    mount/<cgroup_name>/memory.devices/<dev_name>/memory.current
    mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.stat
    mount/<cgroup_name>/cpu.devices/<dev_name>/cpu.weight

The drm/i915 patch in this series is based on top of other RFC work [1]
for i915 device memory support.

AMD [2] and Intel [3] have proposed related work in this area within the
last few years, listed below as reference.  This new RFC reuses existing
cgroup controllers and takes a different approach than prior work.

Finally, some potential discussion points for this series:
* merge proposed <subsys_name>.devices into a single devices directory?
* allow devices to have multiple registrations for subsets of resources?
* document a 'common charging policy' for device drivers to follow?

[1] https://patchwork.freedesktop.org/series/56683/
[2] https://lists.freedesktop.org/archives/dri-devel/2018-November/197106.html
[3] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html

Brian Welty (5):
  cgroup: Add cgroup_subsys per-device registration framework
  cgroup: Change kernfs_node for directories to store
    cgroup_subsys_state
  memcg: Add per-device support to memory cgroup subsystem
  drm: Add memory cgroup registration and DRIVER_CGROUPS feature bit
  drm/i915: Use memory cgroup for enforcing device memory limit

 drivers/gpu/drm/drm_drv.c                  |  12 +
 drivers/gpu/drm/drm_gem.c                  |   7 +
 drivers/gpu/drm/i915/i915_drv.c            |   2 +-
 drivers/gpu/drm/i915/intel_memory_region.c |  24 +-
 include/drm/drm_device.h                   |   3 +
 include/drm/drm_drv.h                      |   8 +
 include/drm/drm_gem.h                      |  11 +
 include/linux/cgroup-defs.h                |  28 ++
 include/linux/cgroup.h                     |   3 +
 include/linux/memcontrol.h                 |  10 +
 kernel/cgroup/cgroup-v1.c                  |  10 +-
 kernel/cgroup/cgroup.c                     | 310 ++++++++++++++++++---
 mm/memcontrol.c                            | 183 +++++++++++-
 13 files changed, 552 insertions(+), 59 deletions(-)

-- 
2.21.0

^ permalink raw reply	[flat|nested] 19+ messages in thread