* [PATCH RFC 0/5] DRM cgroup controller
@ 2018-11-20 18:58 Kenny Ho
From: Kenny Ho @ 2018-11-20 18:58 UTC
  To: y2kenny, Kenny.Ho, cgroups, dri-devel, amd-gfx, intel-gfx

The purpose of this patch series is to start a discussion for a generic cgroup
controller for the drm subsystem.  The design proposed here is a very early one.
We are hoping to engage the community as we develop the idea.


Background
==========
Control Groups/cgroup provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with specialized
behaviour, such as accounting/limiting the resources which processes in a cgroup
can access [1].  Weights, limits, protections and allocations are the main
resource distribution models.  Existing cgroup controllers include cpu, memory,
io, rdma and more.  cgroup is one of the foundational technologies that enable
the popular container-based method of application deployment and management.

Direct Rendering Manager/drm contains code intended to support the needs of
complex graphics devices. Graphics drivers in the kernel may make use of DRM
functions to make tasks like memory management, interrupt handling and DMA
easier, and provide a uniform interface to applications.  The DRM has also
developed beyond traditional graphics applications to support compute/GPGPU
applications.


Motivations
===========
As GPUs grow beyond the realm of desktop/workstation graphics into areas like
data center clusters and IoT, there is an increasing need to monitor and
regulate the GPU as a resource, like cpu, memory and io.

Matt Roper from Intel began working on a similar idea in early 2018 [2] for the
purpose of managing GPU priority using the cgroup hierarchy.  While that
particular use case may not warrant a standalone drm cgroup controller, there
are other use cases where having one can be useful [3].  Monitoring GPU
resources such as VRAM and buffers, CUs (compute units [AMD's nomenclature])/EUs
(execution units [Intel's nomenclature]) and GPU job scheduling [4] can help
sysadmins get a better understanding of an application's usage profile.
Regulating the usage of these resources can further help sysadmins optimize
workload deployment on limited GPU resources.

With the increased importance of machine learning, data science and other
cloud-based applications, GPUs are already in production use in data centers
today [5,6,7].  Existing GPU resource management is very coarse-grained,
however, as sysadmins are only able to distribute workloads on a per-GPU basis
[8].  An alternative is to use GPU virtualization (with or without SR-IOV), but
it generally acts on the entire GPU instead of the specific resources in a GPU.
With a drm cgroup controller, we can enable alternative, fine-grained, sub-GPU
resource management (in addition to what may be available via GPU
virtualization).

In addition to production use, the DRM cgroup can also help with testing
graphics application robustness by providing a means to artificially limit the
DRM resources available to the applications.

Challenges
==========
While there is common infrastructure in DRM that is shared across many vendors
(the scheduler [4], for example), there are also aspects of DRM that are vendor
specific.  To accommodate this, we borrowed the mechanism that cgroup uses to
handle different kinds of cgroup controllers.

Resources for DRM are also often device (GPU) specific instead of system
specific, and a system may contain more than one GPU.  For this, we borrowed
some of the ideas from the RDMA cgroup controller.

Approach
========
To experiment with the idea of a DRM cgroup, we would like to start with basic
accounting and statistics, then continue to iterate and add regulating
mechanisms into the driver.
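
To make the starting point more concrete, here is a minimal user-space sketch of
what per-device accounting in a cgroup-like hierarchy could look like.  It is a
toy model only, not code from this series: every name (toy_drmcg,
toy_drmcg_charge_bo, TOY_MAX_DEVICES, ...) is hypothetical, and charging every
ancestor on each event is an assumption about how hierarchical totals might be
kept.  The two counters mirror the two events the series accounts for (buffer
object creation requests and command submissions), tracked per device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names come from the patch series. */
#define TOY_MAX_DEVICES 4

struct toy_drmcg {
	struct toy_drmcg *parent;               /* NULL for the root group */
	uint64_t bo_bytes[TOY_MAX_DEVICES];     /* buffer bytes requested, per device */
	uint64_t submissions[TOY_MAX_DEVICES];  /* command submissions, per device */
};

/* Zero all counters and link the group to its parent. */
static void toy_drmcg_init(struct toy_drmcg *cg, struct toy_drmcg *parent)
{
	*cg = (struct toy_drmcg){ .parent = parent };
}

/*
 * Account a buffer request against one device.  The charge is applied to
 * the group and to every ancestor, so each group's counter covers its
 * whole subtree (an assumption of this sketch).
 */
static void toy_drmcg_charge_bo(struct toy_drmcg *cg, unsigned int dev,
				uint64_t bytes)
{
	assert(dev < TOY_MAX_DEVICES);
	for (; cg != NULL; cg = cg->parent)
		cg->bo_bytes[dev] += bytes;
}

/* Account one command submission against one device, same propagation. */
static void toy_drmcg_charge_submit(struct toy_drmcg *cg, unsigned int dev)
{
	assert(dev < TOY_MAX_DEVICES);
	for (; cg != NULL; cg = cg->parent)
		cg->submissions[dev]++;
}
```

With a root group and one child, charging 4096 bytes to the child on device 1
raises bo_bytes[1] in both groups, while the counters for the other devices
stay untouched.  Keeping the counters per device is what allows a later limit
to be expressed for a single GPU rather than for the system as a whole.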

[1] https://www.kernel.org/doc/Documentation/cgroup-v1/cgroups.txt
[2] https://lists.freedesktop.org/archives/intel-gfx/2018-January/153156.html
[3] https://www.spinics.net/lists/cgroups/msg20720.html
[4] https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/scheduler
[5] https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
[6] https://blog.openshift.com/gpu-accelerated-sql-queries-with-postgresql-pg-strom-in-openshift-3-10/
[7] https://github.com/RadeonOpenCompute/k8s-device-plugin
[8] https://github.com/kubernetes/kubernetes/issues/52757


Kenny Ho (5):
  cgroup: Introduce cgroup for drm subsystem
  cgroup: Add mechanism to register vendor specific DRM devices
  drm/amdgpu: Add DRM cgroup support for AMD devices
  drm/amdgpu: Add accounting of command submission via DRM cgroup
  drm/amdgpu: Add accounting of buffer object creation request via DRM
    cgroup

 drivers/gpu/drm/amd/amdgpu/Makefile         |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |   7 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.c | 147 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.h |  27 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c     |  13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c    |  15 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h    |   5 +-
 include/drm/drm_cgroup.h                    |  39 ++++++
 include/drm/drmcgrp_vendors.h               |   8 ++
 include/linux/cgroup_drm.h                  |  58 ++++++++
 include/linux/cgroup_subsys.h               |   4 +
 include/uapi/drm/amdgpu_drm.h               |  24 +++-
 init/Kconfig                                |   5 +
 kernel/cgroup/Makefile                      |   1 +
 kernel/cgroup/drm.c                         | 130 +++++++++++++++++
 16 files changed, 484 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_drmcgrp.h
 create mode 100644 include/drm/drm_cgroup.h
 create mode 100644 include/drm/drmcgrp_vendors.h
 create mode 100644 include/linux/cgroup_drm.h
 create mode 100644 kernel/cgroup/drm.c

-- 
2.19.1


